Hi Martin
On Monday, 24.11.2014, at 09:24 +0100, Martin Hundebøll wrote:
> Can you help me do a quick sum-up?
At the beginning of the month a bug occurred on our two Arch Linux gateways, both with MM and NC enabled. They first ran batman-adv 2014.2.0 and later the batman-adv shipped with the kernel; the kernel was first 3.14.23-ARCH and later 3.17.1-ARCH. The bug cannot be analysed on those VMs because I don't have access to their consoles.
I set up a Gentoo Linux VM (batman-adv as shipped with kernel 3.16.6) with console access to make sure the bug is not Arch-specific. The bug occurred after three days. After a kernel panic the console can no longer be scrolled, so capturing a complete backtrace was impossible. The end of the backtrace was the same as the one I reported. I recompiled the kernel (3.16.7) with debug symbols enabled and switched to it after one more crash.
> - At first it crashed at regular intervals (0 - 72 hours) with the
> backtrace you posted initially.
No, at irregular intervals (0 - 72 hours). I don't think it has anything to do with time. With the Arch VMs I tried the following: one machine ran as gw server, the other as gw client. After the first VM crashed, I immediately switched the other one to gw server. Almost immediately this machine crashed as well. I think the cause has to be a bogus user packet.
I don't know which user sends the bogus packets, and I cannot ask our users what they are doing that crashes our gateways.
I also don't know whether the crash on the Arch VMs is the same as the one on the Gentoo VM, for which I reported the backtrace, but I assume so.
> - Then you disabled NC. Did it stop crashing at that point?
NC had been disabled for only 20 h before I patched the kernel, so I can't say for sure whether disabling it stops the crashes.
> - Then we enabled NC and added my patch, and it still does not crash?
After patching, I enabled NC again to reproduce the bug. The VM crashed after 27 h. I could not retrieve the backtrace because I had set the 'crashkernel' option too low. The next crash happened after 32:38:59. There was no batadv_frag_merge_packets entry in the kernel ring buffer.
> I remember you said it crashed with the distro-provided batman-adv module. Did you ensure to use the same version when running with my patch?
Yes. I patched /usr/src/linux/net/batman-adv/fragmentation.c. I use the batman-adv provided with the kernel for all reproduction steps. Running 'make modules' recompiled only the batman-adv module, which I then reloaded.
> I haven't had time to dig into the reproduction of the crash, but I think I will do so regardless.
Please tell me if you need any more information.
The VM's uptime is now 39 h. It survived Saturday evening and Sunday without a crash. I think the bug is NC-related, but let's wait a few more days, until next Monday, to be sure. In that time the users might do whatever they did in the past and trigger the bug.
Thank you for your time and for making B.A.T.M.A.N.-adv better.
Best regards
Philipp
________________________ Freifunk Rheinland e. V. – Funkzelle Wuppertal –