Hi there,
I've been spending some time tracking down a bug that's been causing memory corruption followed by random kernel panics. Thanks to the kernel's slab memory debugger I tracked it down to a kfree in send.c that was freeing a block of memory that had been written to past the end of its allocation.
Turned out to be a simple typo, which I've fixed in the following patch. When resizing the packet_buff struct in batman_if, the new length was being updated but the old length was being used for the kmalloc(), causing something later to think it had more memory allocated to write to, hence writing past the end of the allocation.
Signed-off-by: Scott Raynel <scottraynel@gmail.com>
Index: send.c
===================================================================
--- send.c (revision 1105)
+++ send.c (working copy)
@@ -159,7 +159,7 @@
 	if ((hna_local_changed) && (batman_if->if_num == 0)) {
 
 		new_len = sizeof(struct batman_packet) + (num_hna * ETH_ALEN);
-		new_buf = kmalloc(batman_if->pack_buff_len, GFP_ATOMIC);
+		new_buf = kmalloc(new_len, GFP_ATOMIC);
 
 		/* keep old buffer if kmalloc should fail */
 		if (new_buf) {
Cheers,
--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
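For readers following the thread, here is a minimal userspace sketch of the resize pattern that the patch corrects. It is not the batman-adv source: `struct iface`, `resize_packet_buf()`, and `HDR_LEN` are hypothetical stand-ins, and malloc() stands in for kmalloc(). The point it illustrates is that the size passed to the allocator and the length recorded afterwards must be the same value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6
#define HDR_LEN 16 /* hypothetical stand-in for sizeof(struct batman_packet) */

/* Hypothetical stand-in for the relevant fields of batman_if. */
struct iface {
	unsigned char *pack_buff;
	size_t pack_buff_len;
};

/*
 * Resize the packet buffer to hold the header plus num_hna addresses.
 * Returns 0 on success, or -1 if the allocation fails, in which case
 * the old buffer and the old length are both kept.
 */
static int resize_packet_buf(struct iface *ifp, size_t num_hna)
{
	size_t new_len = HDR_LEN + num_hna * ETH_ALEN;
	unsigned char *new_buf;

	assert(ifp != NULL && ifp->pack_buff != NULL);

	/* The bug was allocating ifp->pack_buff_len (the OLD length) here. */
	new_buf = malloc(new_len);
	if (!new_buf)
		return -1;

	/* Carry over the header only; the caller rewrites the address list. */
	memcpy(new_buf, ifp->pack_buff, HDR_LEN);
	free(ifp->pack_buff);
	ifp->pack_buff = new_buf;
	ifp->pack_buff_len = new_len;
	return 0;
}
```

With the allocation sized by `new_len`, any later code that writes up to `pack_buff_len` bytes stays inside the buffer. With the old length passed to the allocator, the same write would run past the end whenever the buffer grows.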
Hey,
> Turned out to be a simple typo, which I've fixed in the following patch. When resizing the packet_buff struct in batman_if, the new length was being updated but the old length was being used for the kmalloc(), causing something later to think it had more memory allocated to write to, hence writing past the end of the allocation.
Wow, nice catch! I happily applied your patch (revision 1173). :-)
Regards,
Marek
Hey Scott,
Thank you very much for the fix! Can you confirm if this bug is related to https://dev.open-mesh.net/batman/ticket/86 ? That bug was very likely caused by memory corruption, but I couldn't find where. (I have not experienced any kernel panics from it, however.)
Thanks, best regards
Simon
On Thu, Dec 04, 2008 at 02:14:27PM +1300, Scott Raynel wrote:
> Hi there,
> I've been spending some time tracking down a bug that's been causing memory corruption followed by random kernel panics. Thanks to the kernel's slab memory debugger I tracked it down to a kfree in send.c that was freeing a block of memory that had been written to past the end of its allocation.
> Turned out to be a simple typo, which I've fixed in the following patch. When resizing the packet_buff struct in batman_if, the new length was being updated but the old length was being used for the kmalloc(), causing something later to think it had more memory allocated to write to, hence writing past the end of the allocation.
> Signed-off-by: Scott Raynel <scottraynel@gmail.com>
> Index: send.c
> ===================================================================
> --- send.c (revision 1105)
> +++ send.c (working copy)
> @@ -159,7 +159,7 @@
>  	if ((hna_local_changed) && (batman_if->if_num == 0)) {
>  
>  		new_len = sizeof(struct batman_packet) + (num_hna * ETH_ALEN);
> -		new_buf = kmalloc(batman_if->pack_buff_len, GFP_ATOMIC);
> +		new_buf = kmalloc(new_len, GFP_ATOMIC);
>  
>  		/* keep old buffer if kmalloc should fail */
>  		if (new_buf) {
> Cheers,
> --
> Scott Raynel
> WAND Network Research Group
> Department of Computer Science
> University of Waikato
> New Zealand
B.A.T.M.A.N mailing list B.A.T.M.A.N@open-mesh.net https://list.open-mesh.net/mm/listinfo/b.a.t.m.a.n
Hi Simon,
On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:
> Hey Scott,
> Thank you very much for the fix! Can you confirm if this bug is related to https://dev.open-mesh.net/batman/ticket/86 ? That bug was very likely caused by memory corruption, but I couldn't find where. (I have not experienced any kernel panics from it, however.)
It is quite possible that they are related. The slab error states that a memory allocation was overwritten, which is the same problem that my patch fixed. However, I can't confirm whether it is the same memory allocation or a different one. The stack trace I got specifically mentioned the kfree() in send_own_packet(), whereas this stack trace does not.
Is that bug easily reproducible? It will be a couple of days before I can try to look at it.
Also, the stack trace is confusing as it appears to indicate a kfree() within hardif_min_mtu(), which I can't find :)
I'll try to do some stress testing of the module with the slab debugger turned on for a while and see what happens.
Cheers,
--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
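As background to the "allocation was overwritten" report discussed above, here is a rough userspace sketch of the red-zone idea that lets a debugging allocator notice such an overwrite when the block is freed. This is an analogy only, not the kernel's slab debugger: `rz_alloc()`, `rz_intact()`, `rz_free()`, and the marker value are hypothetical names and choices made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE_LEN 8
#define REDZONE_BYTE 0xbb /* arbitrary marker value chosen for this sketch */

/* Allocate len usable bytes followed by a marker-filled red zone. */
static unsigned char *rz_alloc(size_t len)
{
	unsigned char *p = malloc(len + REDZONE_LEN);

	if (p)
		memset(p + len, REDZONE_BYTE, REDZONE_LEN);
	return p;
}

/* Return 1 if the red zone after the len usable bytes is intact, else 0. */
static int rz_intact(const unsigned char *p, size_t len)
{
	assert(p != NULL);

	for (size_t i = 0; i < REDZONE_LEN; i++)
		if (p[len + i] != REDZONE_BYTE)
			return 0;
	return 1;
}

/* Check the red zone, then free. Returns 0 if clean, -1 if overwritten. */
static int rz_free(unsigned char *p, size_t len)
{
	int ok = rz_intact(p, len);

	free(p);
	return ok ? 0 : -1;
}
```

Because the check runs at free time, the report points at the free call rather than at the code that did the out-of-bounds write. That is consistent with the first trace naming the kfree() in send_own_packet() even though the faulty size was chosen at the kmalloc().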
Hey Scott,
On Fri, Dec 05, 2008 at 11:40:30PM +1300, Scott Raynel wrote:
> Hi Simon,
> On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:
>> Hey Scott,
>> Thank you very much for the fix! Can you confirm if this bug is related to https://dev.open-mesh.net/batman/ticket/86 ? That bug was very likely caused by memory corruption, but I couldn't find where. (I have not experienced any kernel panics from it, however.)
> It is quite possible that they are related. The slab error states that a memory allocation was overwritten, which is the same problem that my patch fixed. However, I can't confirm whether it is the same memory allocation or a different one. The stack trace I got specifically mentioned the kfree() in send_own_packet(), whereas this stack trace does not.
> Is that bug easily reproducible? It will be a couple of days before I can try to look at it.
Yep, it was quite easy: just turn it on and off a few times (echo a device name and then nothing into /proc/net/batman-adv/interfaces). The warning appeared after 10 iterations in my qemu instance. No crash, only this warning.
> Also, the stack trace is confusing as it appears to indicate a kfree() within hardif_min_mtu(), which I can't find :)
That's the problem; that is what confused me at this point. :/
> I'll try to do some stress testing of the module with the slab debugger turned on for a while and see what happens.
Sounds great. Thanks for your hard work. :)
Best regards,
Simon
Hi Simon,
On 6/12/2008, at 8:51 AM, Simon Wunderlich wrote:
> Hey Scott,
> On Fri, Dec 05, 2008 at 11:40:30PM +1300, Scott Raynel wrote:
>> Hi Simon,
>> On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:
>>> Hey Scott,
>>> Thank you very much for the fix! Can you confirm if this bug is related to https://dev.open-mesh.net/batman/ticket/86 ? That bug was very likely caused by memory corruption, but I couldn't find where. (I have not experienced any kernel panics from it, however.)
>> It is quite possible that they are related. The slab error states that a memory allocation was overwritten, which is the same problem that my patch fixed. However, I can't confirm whether it is the same memory allocation or a different one. The stack trace I got specifically mentioned the kfree() in send_own_packet(), whereas this stack trace does not.
>> Is that bug easily reproducible? It will be a couple of days before I can try to look at it.
> Yep, it was quite easy: just turn it on and off a few times (echo a device name and then nothing into /proc/net/batman-adv/interfaces). The warning appeared after 10 iterations in my qemu instance. No crash, only this warning.
I can't reproduce this bug before my patch is applied because the bug it fixes always gets in the way :)
After applying the patch I seem to be able to consistently lock up the system by adding and removing an interface from the batman device several times. The box still replies to pings, but I can't SSH in. This does not trigger the slab debugger. I used the magic SysRq interface to see what's going on, and printing the current task suggests it is hanging in the cancel_rearming_delayed_work() call in shutdown_module(). This might be related to the scheduling-while-atomic bugs. I'll keep looking into this as I get time, but things are pretty busy here at the moment.
Cheers,
--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand