Hi Simon,
On 6/12/2008, at 8:51 AM, Simon Wunderlich wrote:
Hey Scott,
On Fri, Dec 05, 2008 at 11:40:30PM +1300, Scott Raynel wrote:
Hi Simon,
On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:
Hey Scott,
thank you very much for the fix! Can you confirm whether this bug is related to https://dev.open-mesh.net/batman/ticket/86 ? That bug has very likely been caused by memory corruption, but I couldn't find where. (I have not experienced any kernel panics from it, however ...).
It is quite possible that they are related. The slab error states that a memory allocation was overwritten - the same problem as my patch fixed. However, I can't confirm whether it is the same memory allocation or a different one. The stack trace I got specifically mentioned the kfree() in send_own_packet(), whereas this stack trace does not.
Is that bug easily reproducible? It will be a couple of days before I can try to look at it.
Yep, it was quite easy: just turn it on and off a few times (echo a device name, then an empty string, into /proc/net/batman-adv/interfaces). The warning appeared after 10 repetitions in my qemu instance. No crash, only this warning.
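The on/off cycle described above could be scripted roughly as follows. This is only a sketch: the /proc path is the one quoted in the message, but the device name "eth0", the one-second delay, and the function name are assumptions made for illustration, and writing a bare newline is assumed to be equivalent to echoing nothing.

```python
import time
from pathlib import Path

# Path quoted in the message above.
PROC_FILE = Path("/proc/net/batman-adv/interfaces")


def toggle_interface(device, cycles=10, proc_file=PROC_FILE, delay=1.0):
    """Add `device` to the batman device and remove it again, `cycles` times.

    `device`, `cycles` and `delay` are illustrative parameters, not values
    taken from the thread (apart from the "10 times" mentioned there).
    """
    for _ in range(cycles):
        # Roughly: echo <device> > /proc/net/batman-adv/interfaces
        proc_file.write_text(device + "\n")
        time.sleep(delay)
        # Roughly: echo "" > /proc/net/batman-adv/interfaces
        proc_file.write_text("\n")
        time.sleep(delay)
    return cycles
```

Run as root, toggle_interface("eth0") would perform ten add/remove cycles; watch dmesg afterwards for the slab warning.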
I can't reproduce this bug before my patch is applied because the bug it fixes always gets in the way :)
After applying the patch I seem to be able to consistently lock up the system by adding and removing an interface from the batman device several times. The box still replies to pings, but I can't SSH in. This does not trigger the slab debugger. I've used the magic SysRq interface to see what's going on, and printing the current task shows it hanging in the cancel_rearming_delayed_work() call in shutdown_module(). This might be related to the scheduling-while-atomic bugs. I'll keep looking into this as I get time, but things are pretty busy here at the moment.
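For anyone wanting to repeat the SysRq step without a keyboard on the box, here is a rough sketch that writes a command key to /proc/sysrq-trigger. It assumes a shell that is still usable (for example a serial console opened before the hang) and a kernel with SysRq enabled. The helper name and the choice of keys are illustrative, not the exact commands used above.

```python
from pathlib import Path

SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")

# A few standard SysRq command keys that help when a task appears stuck.
SYSRQ_KEYS = {
    "l": "backtrace of all active CPUs",
    "p": "registers and flags of the current CPU",
    "t": "state of all tasks",
    "w": "tasks in uninterruptible (blocked) state",
}


def sysrq(key, trigger=SYSRQ_TRIGGER):
    """Send one SysRq command key; the output lands in the kernel log."""
    if key not in SYSRQ_KEYS:
        raise ValueError("unsupported SysRq key: %r" % (key,))
    trigger.write_text(key)
    return SYSRQ_KEYS[key]
```

Calling sysrq("w") as root and then reading dmesg should list blocked tasks, which is where a thread stuck waiting on delayed work would be expected to show up.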
Cheers,
-- Scott Raynel WAND Network Research Group Department of Computer Science University of Waikato New Zealand