Hello,
I can reproduce the slowpath warning in trunk with my QEMU setup. When I start a server, some of the clients show this warning.
I've tried Andrew's patch and cannot see any memory leaks: memory usage varies within a 100k window. However, some of the servers crash after some time (30 minutes) without any special trigger that I am aware of. The stack trace is this:
root@OpenWrt:/# ------------[ cut here ]------------
WARNING: at lib/kref.c:43 kref_get+0x18/0x30()
Hardware name:
Modules linked in: via_velocity via_rhine tg3 sis900 r8169 pcnet32 ne2k_pci 8390 e1000 e100 batman_adv 8139too 3c59x nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox libphy ipt_REJECT xt_TCPMSS ipt_LOG xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc natsemi crc_ccitt ipv6
Pid: 570, comm: bat_events Not tainted 2.6.31.1 #32
Call Trace:
 [<c10dd608>] ? kref_get+0x18/0x30
 [<c101f55f>] ? warn_slowpath_common+0x7f/0xb0
 [<c10dd608>] ? kref_get+0x18/0x30
 [<c101f5a3>] ? warn_slowpath_null+0x13/0x20
 [<c10dd608>] ? kref_get+0x18/0x30
 [<c2ac51a2>] ? proc_vis_read_prim_sec+0x352/0x5a0 [batman_adv]
 [<c2ac4ee0>] ? proc_vis_read_prim_sec+0x90/0x5a0 [batman_adv]
 [<c102d23a>] ? worker_thread+0xca/0x150
 [<c102fd80>] ? autoremove_wake_function+0x0/0x50
 [<c102d170>] ? worker_thread+0x0/0x150
 [<c102fbc3>] ? kthread+0x73/0x90
 [<c102fb50>] ? kthread+0x0/0x90
 [<c1003813>] ? kernel_thread_helper+0x7/0x14
---[ end trace 4f770856ce7e0712 ]---
------------[ cut here ]------------
kernel BUG at mm/slub.c:2929!
invalid opcode: 0000 [#1]
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in: via_velocity via_rhine tg3 sis900 r8169 pcnet32 ne2k_pci 8390 e1000 e100 batman_adv 8139too 3c59x nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox libphy ipt_REJECT xt_TCPMSS ipt_LOG xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc natsemi crc_ccitt ipv6

Pid: 570, comm: bat_events Tainted: G W (2.6.31.1 #32)
EIP: 0060:[<c1064669>] EFLAGS: 00010046 CPU: 0
EIP is at kfree+0x49/0xc0
EAX: 00000000 EBX: c1f5dd94 ECX: 40080000 EDX: c133cba0
ESI: c2ac5540 EDI: c1f5dd80 EBP: 00000216 ESP: c0d06eec
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process bat_events (pid: 570, ti=c0d06000 task=c18d0000 task.ti=c0d06000)
Stack:
 00000002 c1f5dd94 c1f5dd94 c2ac5540 c1f5dda4 c10dd5cf c1f5dda4 00000000
<0> c2ac538b c0d06f68 c0e0504a c1f5dd84 c1f5dd94 c1f5dd80 c1f5dd80 000022a0
<0> 00000046 c0e05024 c0e05030 00000286 00000086 c1f5dda4 0000003f 00000080
Call Trace:
 [<c2ac5540>] ? vis_init+0x150/0x1b0 [batman_adv]
 [<c10dd5cf>] ? kref_put+0x4f/0x70
 [<c2ac538b>] ? proc_vis_read_prim_sec+0x53b/0x5a0 [batman_adv]
 [<c2ac4ee0>] ? proc_vis_read_prim_sec+0x90/0x5a0 [batman_adv]
 [<c102d23a>] ? worker_thread+0xca/0x150
 [<c102fd80>] ? autoremove_wake_function+0x0/0x50
 [<c102d170>] ? worker_thread+0x0/0x150
 [<c102fbc3>] ? kthread+0x73/0x90
 [<c102fb50>] ? kthread+0x0/0x90
 [<c1003813>] ? kernel_thread_helper+0x7/0x14
Code: 00 40 a1 80 cb 2e c1 c1 ea 0c c1 e2 05 01 c2 8b 0a 89 c8 25 00 80 00 00 66 85 c0 74 05 8b 52 0c 8b 0a 84 c9 78 24 f6 c5 c0 75 09 <0f> 0b 90 8d 74 26 00 eb fe 8b 5c 24 08 89 d0 8b 74 24 0c 8b 7c
EIP: [<c1064669>] kfree+0x49/0xc0 SS:ESP 0068:c0d06eec
---[ end trace 4f770856ce7e0713 ]---
Please note that at that moment I had not queried the vis_data file, so the proc_vis_read_prim_sec call is probably not correctly reported. Furthermore, there are some more stack traces in vfs/mount, and then the boxes reboot; we are probably writing to some bad memory location.
I've changed Andrew's patch a little bit and could not crash my 9 KVM instances when running it for some hours last night. I've added kref_get() calls when the info packets get referenced from the send_list; this reference is removed with kref_put() in send_vis_packets() after sending the packet. The disadvantage of this solution is that, in a race condition, we might (unnecessarily) send out an old packet. No memory leaks as far as I could see after running for some hours.
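To illustrate the idea, here is a rough userspace sketch of the reference handling described above. It is not the attached patch: "struct ref" is only a toy stand-in for the kernel's struct kref, and all names in it (vis_info_sketch, send_list_add, table_remove, send_pending, demo_race) are made up for the example. It shows why the queued packet stays valid after it has been dropped elsewhere, and why an old packet may still be sent in that case.

```c
/* Userspace sketch only: "struct ref" is a toy stand-in for the kernel's
 * struct kref, and every name below is hypothetical (not from the patch). */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ref { int count; };

struct vis_info_sketch {
	struct ref refcount;
	int payload;
};

static int freed_count;                    /* number of objects released so far */
static struct vis_info_sketch *send_slot;  /* one-entry stand-in for send_list */

static void ref_init(struct ref *r) { r->count = 1; }

static void ref_get(struct ref *r)
{
	assert(r->count > 0);   /* taking a reference on a dead object is a bug */
	r->count++;
}

static void ref_put(struct ref *r, void (*release)(struct ref *))
{
	assert(r->count > 0);
	if (--r->count == 0)
		release(r);
}

static void info_release(struct ref *r)
{
	/* recover the containing object from its embedded counter */
	struct vis_info_sketch *info = (struct vis_info_sketch *)
		((char *)r - offsetof(struct vis_info_sketch, refcount));
	free(info);
	freed_count++;
}

static struct vis_info_sketch *info_new(int payload)
{
	struct vis_info_sketch *info = malloc(sizeof(*info));
	assert(info != NULL);
	ref_init(&info->refcount);   /* initial reference: the owning table */
	info->payload = payload;
	return info;
}

static void send_list_add(struct vis_info_sketch *info)
{
	ref_get(&info->refcount);    /* the send list holds its own reference */
	send_slot = info;
}

static void table_remove(struct vis_info_sketch *info)
{
	ref_put(&info->refcount, info_release);  /* drop the table's reference */
}

static int send_pending(void)
{
	struct vis_info_sketch *info = send_slot;
	int sent = info->payload;    /* "send" the (possibly stale) packet */
	send_slot = NULL;
	ref_put(&info->refcount, info_release);  /* drop the send list's reference */
	return sent;
}

/* Returns 1 if the queued packet survived removal from the table. */
static int demo_race(void)
{
	freed_count = 0;
	struct vis_info_sketch *info = info_new(42);
	send_list_add(info);
	table_remove(info);          /* packet replaced while still queued */
	int alive_while_queued = (freed_count == 0);
	int sent = send_pending();
	return alive_while_queued && sent == 42 && freed_count == 1;
}
```

In demo_race() the object is only freed by the final put in send_pending(), so the send path never touches freed memory; the price is that the stale payload still goes out once.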
Please give it a try (patch is attached to this e-mail)!
Best regards,
Simon
On Thu, Feb 11, 2010 at 11:01:56AM +0100, Andrew Lunn wrote:
On Thu, Feb 11, 2010 at 10:46:59AM +0100, Andrew Lunn wrote:
Hi Linus
Here is a new version of the patch. I've tested it this time using five UML machines. It should not immediately oops now.
Instead it will leak memory and crash after a while...
I will try to find the memory leak.
Andrew