I am using batman 1256 on a very recent openwrt (linux version 2.6.28.10) as well as a bit older one (linux version 2.6.26.8).
With batgat installed, I have problems with the kernel crashing when turning the gateway on and off. I start batman with -r 2. If I detect an uplink, I issue -c -g 11000. If I lose the link, I issue -c -r 2. It is this final -c -r 2 that causes the kernel to either crash with a bad page on the next process that is created, have a null pointer error, or have a recursion error.
If I run batman without batgat, I don't get any crashes.
Everything works fine otherwise. Except one thing that just came to mind, I had to remove -DDEBUG_MALLOC -DMEMORY_USAGE because batman wouldn't do anything without crashing because of magic number problems. Could this be because I am on Big Endian hardware?
Could anyone else see if they have the same problem? All you have to do is have batman running with batgat installed, start issuing batmand -c -g 11000 ; batmand -c -r 2 multiple times and see if their system stays stable.
Thanks.
Hi, thanks for your report. I am currently running some stress tests on x86 and mips and couldn't reproduce any such problems. So I have some questions regarding your configuration.
On Tuesday 19 May 2009 16:27:25 Nathan Wharton wrote:
I am using batman 1256 on a very recent openwrt (linux version 2.6.28.10) as well as a bit older one (linux version 2.6.26.8).
What is your target architecture in openwrt? Have you tried to reproduce that problem on another architecture?
With batgat installed, I have problems with the kernel crashing when turning the gateway on and off. I start batman with -r 2. If I detect an uplink, I issue -c -g 11000. If I lose the link, I issue -c -r 2. It is this final -c -r 2 that causes the kernel to either crash with a bad page on the next process that is created, have a null pointer error, or have a recursion error.
Can you create a readable kernel backtrace with ksymoops?
If I run batman without batgat, I don't get any crashes.
Everything works fine otherwise. Except one thing that just came to mind, I had to remove -DDEBUG_MALLOC -DMEMORY_USAGE because batman wouldn't do anything without crashing because of magic number problems. Could this be because I am on Big Endian hardware?
I am also running it on big endian hardware and it seems to work. Does it happen right after the start or was extra interaction needed? What was the error output?
Could anyone else see if they have the same problem? All you have to do is have batman running with batgat installed, start issuing batmand -c -g 11000 ; batmand -c -r 2 multiple times and see if their system stays stable.
I have been running it in a while true loop for an hour on x86 and mips, on isolated and non-isolated (single partner) nodes, and didn't get such problems.
Regards, Sven
Thanks for your reply. I will answer inline below:
On Tue, May 19, 2009 at 2:21 PM, Sven Eckelmann sven.eckelmann@gmx.de wrote:
Hi, thanks for your report. I am currently running some stress tests on x86 and mips and couldn't reproduce any such problems. So I have some questions regarding your configuration.
On Tuesday 19 May 2009 16:27:25 Nathan Wharton wrote:
I am using batman 1256 on a very recent openwrt (linux version 2.6.28.10) as well as a bit older one (linux version 2.6.26.8).
What is your target architecture in openwrt? Have you tried to reproduce that problem on another architecture?
The target is a Gateworks Avila 2348-4 board, which has an IXP425. I haven't tried another target yet.
With batgat installed, I have problems with the kernel crashing when turning the gateway on and off. I start batman with -r 2. If I detect an uplink, I issue -c -g 11000. If I lose the link, I issue -c -r 2. It is this final -c -r 2 that causes the kernel to either crash with a bad page on the next process that is created, have a null pointer error, or have a recursion error.
Can you create a readable kernel backtrace with ksymoops?
I can, but it is never in the batman process, which is why I didn't think it was batman until I figured out how to reproduce it. For example:
=====================================
root@SchaferRobotics_1_3:/# batmand -c -g 11000
WARNING: You are using the unstable batman branch. If you are interested in *using* batman get the latest stable release !
root@SchaferRobotics_1_3:/# batmand -c
WARNING: You are using the unstable batman branch. If you are interested in *using* batman get the latest stable release !
batmand -g 12MBit/1536KBit -a 10.1.3.0/24 -a 10.255.1.3/32 -d 3 --hop-penalty 5 --purge-timeout 10000 ath0 eth0
root@SchaferRobotics_1_3:/# batmand -c -r 2
WARNING: You are using the unstable batman branch. If you are interested in *using* batman get the latest stable release !
Bad page state in process 'volts_temp'
page:c0335440 flags:0x00000000 mapping:00000000 mapcount:0 count:-1
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0028680>] (dump_stack+0x0/0x14) from [<c0064a08>] (bad_page+0x74/0xb4)
[<c0064994>] (bad_page+0x0/0xb4) from [<c0065a0c>] (get_page_from_freelist+0x45c/0x4a0)
 r6:c02bd7e8 r5:c02be02c r4:c0335440
[<c00655b0>] (get_page_from_freelist+0x0/0x4a0) from [<c0065afc>] (__alloc_pages_internal+0xac/0x3e0)
[<c0065a50>] (__alloc_pages_internal+0x0/0x3e0) from [<c0065e50>] (__get_free_pages+0x20/0x54)
[<c0065e30>] (__get_free_pages+0x0/0x54) from [<c0033af4>] (copy_process+0x90/0xd40)
[<c0033a64>] (copy_process+0x0/0xd40) from [<c0034924>] (do_fork+0x70/0x2a4)
[<c00348b4>] (do_fork+0x0/0x2a4) from [<c0027c00>] (sys_fork+0x30/0x38)
[<c0027bd0>] (sys_fork+0x0/0x38) from [<c0024de0>] (ret_fast_syscall+0x0/0x2c)
=====================================
volts_temp, in this case, happens to be the next process that tried to run. I get a similar trace even if it is another process.
If I run batman without batgat, I don't get any crashes.
Everything works fine otherwise. Except one thing that just came to mind, I had to remove -DDEBUG_MALLOC -DMEMORY_USAGE because batman wouldn't do anything without crashing because of magic number problems. Could this be because I am on Big Endian hardware?
I am also running it on big endian hardware and it seems to work. Does it happen right after the start or was extra interaction needed? What was the error output?
It happens right after the start, and the error is debugRealloc - invalid magic number in trailer.
Could anyone else see if they have the same problem? All you have to do is have batman running with batgat installed, start issuing batmand -c -g 11000 ; batmand -c -r 2 multiple times and see if their system stays stable.
I have been running it in a while true loop for an hour on x86 and mips, on isolated and non-isolated (single partner) nodes, and didn't get such problems.
Here is a little more on our setup:
All boards run the same software. Each board has 2 mesh interfaces. One is a radio, one is wired. So, batman runs on 2 interfaces on every board. Each board has a downstream wired interface with a dhcp server. batman announces this network. This downstream network is different for every board due to a group/node numbering scheme. The network is 10.group.node.0/24. Group and Node are 1-250. The wireless interface is 10.0.group.node, and the wired interface is 10.255.group.node.
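As a rough illustration of the numbering scheme just described, the three addresses of a board can be derived from its group and node numbers. The function names below are hypothetical and only the 10.group.node.0/24, 10.0.group.node and 10.255.group.node layout comes from the description; addresses are returned as host-order 32-bit values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack four octets into a host-order IPv4 address. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Group and node are stated to be in the range 1-250. */
static void check_ids(unsigned group, unsigned node)
{
	assert(group >= 1 && group <= 250);
	assert(node >= 1 && node <= 250);
}

/* Downstream DHCP network announced by batman: 10.group.node.0/24 */
uint32_t lan_network(unsigned group, unsigned node)
{
	check_ids(group, node);
	return ipv4(10, group, node, 0);
}

/* Wireless mesh interface: 10.0.group.node */
uint32_t wireless_addr(unsigned group, unsigned node)
{
	check_ids(group, node);
	return ipv4(10, 0, group, node);
}

/* Wired mesh interface: 10.255.group.node */
uint32_t wired_addr(unsigned group, unsigned node)
{
	check_ids(group, node);
	return ipv4(10, 255, group, node);
}

/* Print a host-order address in dotted-quad form. */
void print_ipv4(uint32_t addr)
{
	printf("%u.%u.%u.%u", (unsigned)(addr >> 24) & 0xffu,
	       (unsigned)(addr >> 16) & 0xffu,
	       (unsigned)(addr >> 8) & 0xffu, (unsigned)addr & 0xffu);
}
```

For group 1, node 3 this yields 10.1.3.0/24, 10.0.1.3 and 10.255.1.3, which are the values that appear in the batmand output for SchaferRobotics_1_3.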
A board can have an optional second radio, and if it does, it is used to try to find an open wireless access point. A board can also have an optional cellular modem and will try to use it if it does.
If a default route gets set by one of these options, batmand -c -g is used. If the default route goes away, -c -r is used.
The boards are then either used as a mesh network extender, to provide access to the mesh to a computer, or attached to a mobile platform which can be controlled from any computer with access to the mesh.
The --hop-penalty of 5 was tested to be the best value for a mobile platform just on the edge of needing to hop.
The --purge-timeout of 10000 is so that any boards that have been turned off don't hang around long.
The current setup I am testing is 3 boards. 1 in the middle has a wireless connection to one and a wired connection to the other. The node in the middle has the optional wireless uplink. The node connected via wired has the optional cellular uplink.
I appreciate you trying it out. I'll try looking a bit deeper.
On Wednesday 20 May 2009 04:38:31 Nathan Wharton wrote:
Everything works fine otherwise. Except one thing that just came to mind, I had to remove -DDEBUG_MALLOC -DMEMORY_USAGE because batman wouldn't do anything without crashing because of magic number problems. Could this be because I am on Big Endian hardware?
I am also running it on big endian hardware and it seems to work. Does it happen right after the start or was extra interaction needed? What was the error output?
It happens right after the start, and the error is debugRealloc - invalid magic number in trailer.
The DEBUG_MALLOC option enables additional functions within batman that allow it to easily trace back malloc bugs. A simple core dump might help but sometimes it is hard to say where it is coming from. Could you post the exact "invalid trailer number" line ?
I'm a bit confused here:
- Is batman crashing ?
- Is the kernel crashing ?
- Is batman crashing if you use the batgat module ?
Regards, Marek
On Tue, May 19, 2009 at 8:30 PM, Marek Lindner lindner_marek@yahoo.de wrote:
On Wednesday 20 May 2009 04:38:31 Nathan Wharton wrote:
Everything works fine otherwise. Except one thing that just came to mind, I had to remove -DDEBUG_MALLOC -DMEMORY_USAGE because batman wouldn't do anything without crashing because of magic number problems. Could this be because I am on Big Endian hardware?
I am also running it on big endian hardware and it seems to work. Does it happen right after the start or was extra interaction needed? What was the error output?
It happens right after the start, and the error is debugRealloc - invalid magic number in trailer.
The DEBUG_MALLOC option enables additional functions within batman that allow it to easily trace back malloc bugs. A simple core dump might help but sometimes it is hard to say where it is coming from. Could you post the exact "invalid trailer number" line ?
Here is the output when I have DEBUG_MALLOC on from start to finish:
=========================================
root@SchaferRobotics_1_3:/# batmand -d 3 -r 2 -a 10.1.3.0/24 --disable-client-nat --hop-penalty 5 --purge-timeout 10000 ath0 eth0
WARNING: You are using the unstable batman branch. If you are interested in *using* batman get the latest stable release !
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
Interface activated: ath0
Using interface ath0 with address 10.0.1.3 and broadcast address 10.0.255.255
Interface activated: eth0
Using interface eth0 with address 10.255.1.3 and broadcast address 10.255.255.255
B.A.T.M.A.N. 0.3.2-beta rv1256 (compatibility version 5)
Adding throw route to 127.0.0.0/8 via 0.0.0.0 (table 68 - lo)
Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - eth1)
Adding throw route to 10.0.0.0/16 via 0.0.0.0 (table 68 - ath0)
debug level: 3
routing class: 2
Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - unknown)
Error - can't add throw route to 10.1.3.0/24 via 0.0.0.0 (table 68): File exists
debugRealloc - invalid magic number in trailer: 78183456, malloc tag = 15
Deleting throw route to 127.0.0.0/8 via 0.0.0.0 (table 68 - lo)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - eth1)
Deleting throw route to 10.0.0.0/16 via 0.0.0.0 (table 68 - ath0)
Interface deactivated: ath0
Interface deactivated: eth0
=========================================
I'm a bit confused here:
- Is batman crashing ?
- Is the kernel crashing ?
- Is batman crashing if you use the batgat module ?
This is only happening when using the batgat module. The kernel is crashing. If it happens to not reboot, I see that batman is in a device wait state and can't be killed.
I see that some more patches were added recently. I will try them and see if anything changes.
On Wednesday 20 May 2009 22:34:29 Nathan Wharton wrote:
Here is the output when I have DEBUG_MALLOC on from start to finish:
root@SchaferRobotics_1_3:/# batmand -d 3 -r 2 -a 10.1.3.0/24 --disable-client-na t --hop-penalty 5 --purge-timeout 10000 ath0 eth0 WARNING: You are using the unstable batman branch. If you are interested in *using* batman get the lat est stable release ! B.A.T.M.A.N. 0.3.2-beta rv1256 (compatibility version 5)
Thanks, that helps.
Adding throw route to 127.0.0.0/8 via 0.0.0.0 (table 68 - lo) Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - eth1) Adding throw route to 10.0.0.0/16 via 0.0.0.0 (table 68 - ath0) debug level: 3 routing class: 2 Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown) Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown) Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown) Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - unknown) Error - can't add throw route to 10.1.3.0/24 via 0.0.0.0 (table 68): File exists debugRealloc - invalid magic number in trailer: 78183456, malloc tag = 15 Deleting throw route to 127.0.0.0/8 via 0.0.0.0 (table 68 - lo) Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - eth1) Deleting throw route to 10.0.0.0/16 via 0.0.0.0 (table 68 - ath0)
Ok, from your output plus the invalid magic number we can say that it seems somewhat related to your HNA settings. A few more questions:
- In this case the batgat module is not involved and still it crashes ?!
- Is your network up & running ? Does batman receive messages from neighbor nodes (you can track that via debug level 4) ?
- Does batman also crash in a disconnected environment ?
This is only happening when using the batgat module. The kernel is crashing. If it happens to not reboot, I see that batman is in a device wait state and can't be killed.
The log you just provided is not about a kernel crash - it's "just" the batman daemon. Are we hunting 2 different bugs ?
I see that some more patches were added recently. I will try them and see if anything changes.
Ok, keep us posted.
Regards, Marek
On Wed, May 20, 2009 at 11:10 AM, Marek Lindner lindner_marek@yahoo.de wrote:
Ok, from your output plus the invalid magic number we can say that it seems somewhat related to your HNA settings. A few more questions:
- In this case the batgat module is not involved and still it crashes ?!
- Is your network up & running ? Does batman receive messages from neighbor nodes (you can track that via debug level 4) ?
- Does batman also crash in a disconnected environment ?
In this case, it does the same thing whether or not batgat is installed.
Debug level 4 gives:
========================================
WARNING: You are using the unstable batman branch. If you are interested in *using* batman get the latest stable release !
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
Interface activated: ath0
Using interface ath0 with address 10.0.1.3 and broadcast address 10.0.255.255
Interface activated: eth0
Using interface eth0 with address 10.255.1.3 and broadcast address 10.255.255.255
B.A.T.M.A.N. 0.3.2-beta rv1256 (compatibility version 5)
[ 30] Adding throw route to 127.0.0.0/8 via 0.0.0.0 (table 68 - lo)
[ 30] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - eth1)
[ 30] Adding throw route to 10.0.0.0/16 via 0.0.0.0 (table 68 - ath0)
debug level: 4
routing class: 2
[ 30] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
[ 30] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
[ 30] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
[ 30] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - unknown)
[ 30] Error - can't add throw route to 10.1.3.0/24 via 0.0.0.0 (table 68): File exists
[ 30] Error - can't add throw route to 10.1.3.0/24 via 0.0.0.0 (table 68): File exists
[ 30] debugRealloc - invalid magic number in trailer: 78183456, malloc tag = 15
[ 30] debugRealloc - invalid magic number in trailer: 78183456, malloc tag = 15
[ 30] Deleting throw route to 127.0.0.0/8 via 0.0.0.0 (table 68 - lo)
[ 100] Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - eth1)
[ 130] Deleting throw route to 10.0.0.0/16 via 0.0.0.0 (table 68 - ath0)
========================================
It does this while not connected.
This is only happening when using the batgat module. The kernel is crashing. If it happens to not reboot, I see that batman is in a device wait state and can't be killed.
The log you just provided is not about a kernel crash - it's "just" the batman daemon. Are we hunting 2 different bugs ?
If you consider 1 bug being the debug_malloc stuff not working, and the other being batgat possibly crashing the kernel, then yes. If I turn off debug malloc, then everything works fine, except using batgat and going from gateway to routing class.
On Thursday 21 May 2009 01:01:43 Nathan Wharton wrote:
In this case, it does the same thing whether or not batgat is installed.
Ok.
I miss a couple of things in your output - do you use the plain sources from open-mesh.net or do you apply custom patches ?
Debug level 4 gives:
WARNING: You are using the unstable batman branch. If you are interested in *using* batman get the lat est stable release ! Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown) Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown) Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown) Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown) Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown) Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
Your log indicates that all routes are still present and batman tries to clean them up while starting. As you can see here table 68 is not mentioned. On my machine I get:
Deleting throw route to 105.0.0.0/8 via 0.0.0.0 (table 68 - unknown)
[ 30] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 65) [ 30] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 66)
Here we lack the message that says we found a new HNA: Adding HNA to announce network list: 105.0.0.0/8
It does this while not connected.
I could make a patch that produces more debug output to get to the root of it but first we have to make sure we run the same code ...
If you consider 1 bug being the debug_malloc stuff not working, and the other being batgat possibly crashing the kernel, then yes. If I turn off debug malloc, then everything works fine, except using batgat and going from gateway to routing class.
Ok, let's do the malloc stuff first and then we move to the batgat issue.
Just to be clear here: DEBUG_MALLOC is not the problem - it just makes the problem visible. Every time batman allocates memory, the debugger allocates more than needed so that it can add its debugging information. Now that debugging information is being overwritten, and the debugger tells you so (including a hint towards the source of the problem). If you deactivate the debugger, the memory will still be overwritten but you won't notice it! It can destroy arbitrary structures in memory, which may take hours to lead to a crash (if at all). Maybe it leads to broken routing entries ..
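To illustrate the mechanism described above, here is a minimal sketch of a guarded allocator. It is not the code from batman's allocate.c: the names guard_malloc, guard_check and guard_free and the magic value are made up for this example, and it copies the guards with memcpy so that the sketch itself never performs an unaligned access.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_MAGIC 0x12345678u /* arbitrary value chosen for this sketch */

struct guard_header {
	uint32_t magic;
	uint32_t length; /* size of the user region in bytes */
};

/* Allocate length bytes with a magic number in front of and behind them. */
void *guard_malloc(uint32_t length)
{
	struct guard_header hdr = { GUARD_MAGIC, length };
	uint32_t magic = GUARD_MAGIC;
	unsigned char *raw = malloc(sizeof(hdr) + length + sizeof(magic));

	if (raw == NULL)
		return NULL;

	memcpy(raw, &hdr, sizeof(hdr));
	memcpy(raw + sizeof(hdr) + length, &magic, sizeof(magic));
	return raw + sizeof(hdr);
}

/* Return 1 if both guards are intact, 0 if one of them was overwritten. */
int guard_check(const void *user)
{
	const unsigned char *p = user;
	struct guard_header hdr;
	uint32_t trailer;

	assert(user != NULL);
	memcpy(&hdr, p - sizeof(hdr), sizeof(hdr));
	if (hdr.magic != GUARD_MAGIC)
		return 0;

	memcpy(&trailer, p + hdr.length, sizeof(trailer));
	return trailer == GUARD_MAGIC;
}

/* Release a region obtained from guard_malloc(). */
void guard_free(void *user)
{
	if (user != NULL)
		free((unsigned char *)user - sizeof(struct guard_header));
}
```

Writing even one byte past the end of the user region changes the trailer, so guard_check() reports it. Without the guards the same write would silently land in whatever happens to follow the allocation.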
Regards, Marek
On Wed, May 20, 2009 at 2:02 PM, Marek Lindner lindner_marek@yahoo.de wrote:
I miss a couple of things in your output - do you use the plain sources from open-mesh.net or do you apply custom patches ?
I am using OpenWRT, and it doesn't have any patches. It does get the source from open-mesh.net.
Your log indicates that all routes are still present and batman tries to clean them up while starting. As you can see here table 68 is not mentioned. .... Here we lack the message that says we found a new HNA: Adding HNA to announce network list: 105.0.0.0/8 .... I could make a patch that produces more debug output to get to the root of it but first we have to make sure we run the same code ...
I don't know where the table 68 entries might have gone, or the HNA.
How about using 1269? Here is the latest -d 4 output:
=========================================
WARNING: You are using the unstable batman branch. If you are interested in *using* batman get the latest stable release !
Deleting throw route to 10.255.1.3/32 via 0.0.0.0 (table 66 - unknown)
Deleting throw route to 10.255.1.3/32 via 0.0.0.0 (table 66 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
Deleting throw route to 10.255.1.3/32 via 0.0.0.0 (table 68 - unknown)
Deleting throw route to 10.255.1.3/32 via 0.0.0.0 (table 68 - unknown)
Deleting throw route to 10.255.1.3/32 via 0.0.0.0 (table 65 - unknown)
Deleting throw route to 10.255.1.3/32 via 0.0.0.0 (table 65 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
Deleting throw route to 10.255.1.3/32 via 0.0.0.0 (table 67 - unknown)
Deleting throw route to 10.255.1.3/32 via 0.0.0.0 (table 67 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
Interface activated: ath0
Using interface ath0 with address 10.0.1.3 and broadcast address 10.0.255.255
Interface activated: eth0
Using interface eth0 with address 10.255.1.3 and broadcast address 10.255.255.255
B.A.T.M.A.N. 0.3.2-beta rv1269 (compatibility version 5)
[ 30] Adding throw route to 127.0.0.0/8 via 0.0.0.0 (table 68 - lo)
[ 30] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - eth1)
[ 30] Adding throw route to 10.0.0.0/16 via 0.0.0.0 (table 68 - ath0)
debug level: 4
routing class: 2
[ 30] schedule_own_packet(): ath0
[ 30] schedule_own_packet(): eth0
[ 30]
[ 940]
[ 950] Sending own packet (originator 10.255.1.3, seqno 1, TTL 2) on interface eth0
[ 950] schedule_own_packet(): eth0
[ 950]
[ 950] Received BATMAN packet via NB: 10.255.1.3, IF: eth0 10.255.1.3 (from OG: 10.255.1.3, via old OG: 10.255.1.3, seqno 1, tq 255, TTL 2, V 5, IDF 0)
[ 950] Drop packet: received my own broadcast (sender: 10.255.1.3)
[ 950]
[ 1010]
[ 1020] Sending own packet (originator 10.0.1.3, seqno 1, TTL 50, IDF off) on interface ath0
[ 1020] Sending own packet (originator 10.0.1.3, seqno 1, TTL 50, IDF off) on interface eth0
[ 1020] schedule_own_packet(): ath0
[ 1020]
[ 1020] Received BATMAN packet via NB: 10.0.1.3, IF: ath0 10.0.1.3 (from OG: 10.0.1.3, via old OG: 10.0.1.3, seqno 1, tq 255, TTL 50, V 5, IDF 0)
[ 1020] Drop packet: received my own broadcast (sender: 10.0.1.3)
[ 1020]
[ 1020] Received BATMAN packet via NB: 10.255.1.3, IF: eth0 10.255.1.3 (from OG: 10.0.1.3, via old OG: 10.0.1.3, seqno 1, tq 255, TTL 50, V 5, IDF 0)
[ 1020] Drop packet: received my own broadcast (sender: 10.255.1.3)
[ 1020]
[ 1990]
[ 2000] Sending own packet (originator 10.255.1.3, seqno 2, TTL 2) on interface eth0
[ 2000] schedule_own_packet(): eth0
[ 2000] ------------------ DEBUG ------------------
[ 2000] Forward list
[ 2000] 10.0.1.3 at 2022
[ 2000] 10.255.1.3 at 2913
[ 2000] Originator list
[ 2000] Originator (#/255) Nexthop [outgoingIF]: Potential nexthops
[ 2000] No batman nodes in range ...
[ 2000] ---------------------------------------------- END DEBUG
[ 2000] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 65 - unknown)
[ 2000] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 66 - unknown)
[ 2000] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 67 - unknown)
[ 2000] Adding throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - unknown)
[ 2000] Error - can't add throw route to 10.1.3.0/24 via 0.0.0.0 (table 68): File exists
[ 2000] Error - can't add throw route to 10.1.3.0/24 via 0.0.0.0 (table 68): File exists
[ 2000] Adding throw route to 10.255.1.3/32 via 0.0.0.0 (table 65 - unknown)
[ 2000] Adding throw route to 10.255.1.3/32 via 0.0.0.0 (table 66 - unknown)
[ 2000] Adding throw route to 10.255.1.3/32 via 0.0.0.0 (table 67 - unknown)
[ 2000] Adding throw route to 10.255.1.3/32 via 0.0.0.0 (table 68 - unknown)
[ 2010] debugRealloc - invalid magic number in trailer: 78183456, malloc tag = 15
[ 2010] debugRealloc - invalid magic number in trailer: 78183456, malloc tag = 15
[ 2010] Deleting throw route to 127.0.0.0/8 via 0.0.0.0 (table 68 - lo)
[ 2090] Deleting throw route to 10.1.3.0/24 via 0.0.0.0 (table 68 - eth1)
[ 2130] Deleting throw route to 10.0.0.0/16 via 0.0.0.0 (table 68 - ath0)
=========================================
Ok, let's do the malloc stuff first and then we move to the batgat issue.
Just to be clear here: DEBUG_MALLOC is not the problem - it just makes the problem visible. Every time batman allocates memory, the debugger allocates more than needed so that it can add its debugging information. Now that debugging information is being overwritten, and the debugger tells you so (including a hint towards the source of the problem). If you deactivate the debugger, the memory will still be overwritten but you won't notice it! It can destroy arbitrary structures in memory, which may take hours to lead to a crash (if at all). Maybe it leads to broken routing entries ..
That sounds good to me. I had just turned it off to see if it was just giving false errors, and everything ran fine until trying to do something new with batgat.
Architectures with a special alignment for load and store operations on datatypes bigger than bytes will return a prealigned memory region when calling malloc. When we add our data structure before and after this region we destroy this alignment. To fix this problem we add special regions with "magic" padding data. To be sure that it is big enough for every load/store operation we use the alignment for uintmax_t or a pointer even when the architecture only supports smaller load/store operations.
Signed-off-by: Sven Eckelmann sven.eckelmann@gmx.de
---
 batman/allocate.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 69 insertions(+), 8 deletions(-)

diff --git a/batman/allocate.c b/batman/allocate.c
index 3cb1d65..a779504 100644
--- a/batman/allocate.c
+++ b/batman/allocate.c
@@ -67,6 +67,44 @@ struct memoryUsage
 };

+static size_t getHeaderPad() {
+	size_t pad = sizeof(uintmax_t) - (sizeof(struct chunkHeader) % sizeof(uintmax_t));
+	if (pad == sizeof(uintmax_t))
+		return 0;
+	else
+		return pad;
+}
+
+static size_t getTrailerPad(size_t length) {
+	size_t pad = sizeof(uintmax_t) - (length % sizeof(uintmax_t));
+	if (pad == sizeof(uintmax_t))
+		return 0;
+	else
+		return pad;
+}
+
+static void fillPadding(unsigned char* padding, size_t length) {
+	unsigned char c = 0x00;
+	size_t i;
+
+	for (i = 0; i < length; i++) {
+		c += 0xA7;
+		padding[i] = c;
+	}
+}
+
+static int checkPadding(unsigned char* padding, size_t length) {
+	unsigned char c = 0x00;
+	size_t i;
+
+	for (i = 0; i < length; i++) {
+		c += 0xA7;
+		if (padding[i] != c)
+			return 0;
+	}
+	return 1;
+}
+
 static void addMemory( uint32_t length, int32_t tag )
 {

 	struct memoryUsage *walker;
@@ -176,7 +214,7 @@ void checkIntegrity(void)

 	memory = (unsigned char *)walker;

-	chunkTrailer = (struct chunkTrailer *)(memory + sizeof(struct chunkHeader) + walker->length);
+	chunkTrailer = (struct chunkTrailer *)(memory + sizeof(struct chunkHeader) + getHeaderPad() + walker->length + getTrailerPad(walker->length));

 	if (chunkTrailer->magicNumber != MAGIC_NUMBER)
 	{
@@ -209,7 +247,7 @@ void *debugMalloc(uint32_t length, int32_t tag)

 	/* printf("sizeof(struct chunkHeader) = %u, sizeof (struct chunkTrailer) = %u\n", sizeof (struct chunkHeader), sizeof (struct chunkTrailer)); */

-	memory = malloc(length + sizeof(struct chunkHeader) + sizeof(struct chunkTrailer));
+	memory = malloc(length + sizeof(struct chunkHeader) + sizeof(struct chunkTrailer) + getHeaderPad() + getTrailerPad(length));

 	if (memory == NULL)
 	{
@@ -218,8 +256,11 @@ void *debugMalloc(uint32_t length, int32_t tag)
 	}

 	chunkHeader = (struct chunkHeader *)memory;
-	chunk = memory + sizeof(struct chunkHeader);
-	chunkTrailer = (struct chunkTrailer *)(memory + sizeof(struct chunkHeader) + length);
+	chunk = memory + sizeof(struct chunkHeader) + getHeaderPad();
+	chunkTrailer = (struct chunkTrailer *)(memory + sizeof(struct chunkHeader) + length + getHeaderPad() + getTrailerPad(length));
+
+	fillPadding((unsigned char*)chunkHeader + sizeof(struct chunkHeader), getHeaderPad());
+	fillPadding(chunk + length, getTrailerPad(length));

 	chunkHeader->length = length;
 	chunkHeader->tag = tag;
@@ -251,7 +292,7 @@ void *debugRealloc(void *memoryParameter, uint32_t length, int32_t tag)

 	if (memoryParameter) { /* if memoryParameter==NULL, realloc() should work like malloc() !! */
 		memory = memoryParameter;
-		chunkHeader = (struct chunkHeader *)(memory - sizeof(struct chunkHeader));
+		chunkHeader = (struct chunkHeader *)(memory - sizeof(struct chunkHeader) - getHeaderPad());

 		if (chunkHeader->magicNumber != MAGIC_NUMBER)
 		{
@@ -259,13 +300,23 @@ void *debugRealloc(void *memoryParameter, uint32_t length, int32_t tag)
 			restore_and_exit(0);
 		}

-		chunkTrailer = (struct chunkTrailer *)(memory + chunkHeader->length);
+		if (checkPadding(memory - getHeaderPad(), getHeaderPad()) == 0) {
+			debug_output( 0, "debugRealloc - invalid magic padding in header, malloc tag = %d\n", chunkHeader->tag );
+			restore_and_exit(0);
+		}
+
+		chunkTrailer = (struct chunkTrailer *)(memory + chunkHeader->length + getTrailerPad(chunkHeader->length));

 		if (chunkTrailer->magicNumber != MAGIC_NUMBER)
 		{
 			debug_output( 0, "debugRealloc - invalid magic number in trailer: %08x, malloc tag = %d\n", chunkTrailer->magicNumber, chunkHeader->tag );
 			restore_and_exit(0);
 		}
+
+		if (checkPadding(memory + chunkHeader->length, getTrailerPad(chunkHeader->length)) == 0) {
+			debug_output( 0, "debugRealloc - invalid magic padding in trailer, malloc tag = %d\n", chunkHeader->tag );
+			restore_and_exit(0);
+		}
 	}

@@ -292,7 +343,7 @@ void debugFree(void *memoryParameter, int tag)
 	struct chunkHeader *previous;

 	memory = memoryParameter;
-	chunkHeader = (struct chunkHeader *)(memory - sizeof(struct chunkHeader));
+	chunkHeader = (struct chunkHeader *)(memory - sizeof(struct chunkHeader) - getHeaderPad());

 	if (chunkHeader->magicNumber != MAGIC_NUMBER)
 	{
@@ -300,6 +351,11 @@ void debugFree(void *memoryParameter, int tag)
 		restore_and_exit(0);
 	}

+	if (checkPadding(memory - getHeaderPad(), getHeaderPad()) == 0) {
+		debug_output( 0, "debugFree - invalid magic padding in header, malloc tag = %d\n", chunkHeader->tag );
+		restore_and_exit(0);
+	}
+
 	previous = NULL;

 	pthread_mutex_lock(&chunk_mutex);
@@ -326,7 +382,7 @@ void debugFree(void *memoryParameter, int tag)

 	pthread_mutex_unlock(&chunk_mutex);

-	chunkTrailer = (struct chunkTrailer *)(memory + chunkHeader->length);
+	chunkTrailer = (struct chunkTrailer *)(memory + chunkHeader->length + getTrailerPad(chunkHeader->length));

 	if (chunkTrailer->magicNumber != MAGIC_NUMBER)
 	{
@@ -334,6 +390,11 @@ void debugFree(void *memoryParameter, int tag)
 		restore_and_exit(0);
 	}

+	if (checkPadding(memory + chunkHeader->length, getTrailerPad(chunkHeader->length)) == 0) {
+		debug_output( 0, "debugFree - invalid magic padding in trailer, malloc tag = %d\n", chunkHeader->tag );
+		restore_and_exit(0);
+	}
+
 #if defined MEMORY_USAGE

 	removeMemory( chunkHeader->tag, tag );
On Thursday 28 May 2009 18:40:08 Sven Eckelmann wrote:
Architectures with a special alignment for load and store operations on datatypes bigger than bytes will return a prealigned memory region when calling malloc. When we add our data structure before and after this region we destroy this alignment. To fix this problem we add special regions with "magic" padding data. To be sure that it is big enough for every load/store operation we use the alignment for uintmax_t or a pointer even when the architecture only supports smaller load/store operations.
Signed-off-by: Sven Eckelmann sven.eckelmann@gmx.de
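The padding arithmetic this commit message describes can be sketched in a few lines of C. This is only an illustration and not the code from the patch: the helper names `pad_for` and `user_offset` are made up here, and the alignment is assumed to be sizeof(uintmax_t), as the message suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed after `size` bytes
 * so that whatever follows starts on a multiple of `align`. */
static size_t pad_for(size_t size, size_t align)
{
	size_t rest = size % align;

	return rest == 0 ? 0 : align - rest;
}

/* Offset of the user data in a chunk that is assumed to be laid out as
 * [header][header pad][user data][trailer pad][trailer]. */
static size_t user_offset(size_t header_size)
{
	return header_size + pad_for(header_size, sizeof(uintmax_t));
}
```

Because malloc already returns a suitably aligned block, keeping the user data at a multiple of the alignment inside that block preserves the alignment the caller expects.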
@Nathan: Could you let me know if these patches work for you ? If so I'll commit them.
Regards, Marek
On Fri, May 29, 2009 at 2:02 AM, Marek Lindner lindner_marek@yahoo.de wrote:
@Nathan: Could you let me know if these patches work for you ? If so I'll commit them.
Regards, Marek
I set /proc/cpu/alignment to 4 (raise bus error) and I get a bus error:
Program received signal SIGBUS, Bus error.
list_add_tail (new=0x29368, head=0x28819) at list-batman.c:68
68		__list_add( new, head->prev, (struct list_head *)head );
(gdb) l
63	 * Insert a new entry before the specified head.
64	 * This is useful for implementing queues.
65	 */
66	void list_add_tail( struct list_head *new, struct list_head_first *head ) {
67
68		__list_add( new, head->prev, (struct list_head *)head );
69
70		head->prev = new;
71
72	}
On Friday 29 May 2009 16:00:40 Nathan Wharton wrote:
I set /proc/cpu/alignment to 4 (raise bus error) and I get a bus error:
Program received signal SIGBUS, Bus error.
list_add_tail (new=0x29368, head=0x28819) at list-batman.c:68
68		__list_add( new, head->prev, (struct list_head *)head );
(gdb) l
63	 * Insert a new entry before the specified head.
64	 * This is useful for implementing queues.
65	 */
66	void list_add_tail( struct list_head *new, struct list_head_first *head ) {
67
68		__list_add( new, head->prev, (struct list_head *)head );
69
70		head->prev = new;
71
72	}
Did you apply the patches by hand? At the moment none of my patches is available in trunk. Since you ran it under gdb, could you please attach a full backtrace?
Best regards, Sven
On Mon, Jun 1, 2009 at 11:44 AM, Sven Eckelmann sven.eckelmann@gmx.de wrote:
Did you apply the patches by hand? At the moment none of my patches is available in trunk. Since you ran it under gdb, could you please attach a full backtrace?
Best regards, Sven
I had to copy the patches out of the e-mail.
Here is the back trace:
#0  list_add_tail (new=0x29bf0, head=0x298c9) at list-batman.c:68
#1  0x0000ee7c in _hna_global_add (orig_node=0x29f80, hna_element=0x29ba8) at hna.c:371
#2  0x0000f160 in hna_global_add (orig_node=0x29f80, new_hna=<value optimized out>, new_hna_len=<value optimized out>) at hna.c:529
#3  0x000099c8 in update_routes (orig_node=0x29f80, neigh_node=0x2a080, hna_recv_buff=0xbead1591 "\n\002\001", hna_buff_len=10) at batman.c:377
#4  0x0000c730 in update_orig (orig_node=0x29f80, in=0xbead157f, neigh=167772673, if_incoming=0x27678, hna_recv_buff=0xbead1591 "\n\002\001", hna_buff_len=-16723, is_duplicate=0 '\0', curr_time=3199014207) at originator.c:227
#5  0x0000a7e0 in batman () at batman.c:956
#6  0x000148d4 in main (argc=14, argv=0xbead1e14) at posix/posix.c:629
Looks like debugMalloc didn't return an aligned value for head. I'll step through that and see what I see.
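The observation about head can be checked with a little arithmetic on the pointer values from frame #0. The sketch below is only an illustration; `is_aligned` is a made-up helper, and a 4-byte alignment requirement is assumed for the target.

```c
#include <stdint.h>

/* Returns 1 if addr is a multiple of align.
 * align is assumed to be a power of two. */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
	return (addr & (align - 1)) == 0;
}
```

For the values in the backtrace, is_aligned(0x298c9, 4) is 0 because the address ends in binary ...01, while is_aligned(0x29bf0, 4) is 1. So `new` is fine and only `head` is misaligned.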
On Monday 01 June 2009 20:03:43 Nathan Wharton wrote:
I had to copy the patches out of the e-mail.
Here is the back trace:
#0  list_add_tail (new=0x29bf0, head=0x298c9) at list-batman.c:68
#1  0x0000ee7c in _hna_global_add (orig_node=0x29f80, hna_element=0x29ba8) at hna.c:371
#2  0x0000f160 in hna_global_add (orig_node=0x29f80, new_hna=<value optimized out>, new_hna_len=<value optimized out>) at hna.c:529
#3  0x000099c8 in update_routes (orig_node=0x29f80, neigh_node=0x2a080, hna_recv_buff=0xbead1591 "\n\002\001", hna_buff_len=10) at batman.c:377
#4  0x0000c730 in update_orig (orig_node=0x29f80, in=0xbead157f, neigh=167772673, if_incoming=0x27678, hna_recv_buff=0xbead1591 "\n\002\001", hna_buff_len=-16723, is_duplicate=0 '\0', curr_time=3199014207) at originator.c:227
#5  0x0000a7e0 in batman () at batman.c:956
#6  0x000148d4 in main (argc=14, argv=0xbead1e14) at posix/posix.c:629
Looks like debugMalloc didn't return an aligned value for head. I'll step through that and see what I see.
OK, I think I see the problem. malloc returned a valid, aligned address. list_add_tail gets a pointer to a member of hna_global_entry. This structure is packed, so every operation on it has to be safe for unaligned data. If you look at it more closely, you will notice that orig_list sits at offset 9 (assuming 4 bytes for a pointer), which of course is not aligned to 4 bytes. And here is the problem: the compiler only generates the safe operations for unaligned data if it knows that the data is unaligned. Because the argument is cast when list_add_tail is called, the compiler does not know that this parameter is unaligned, and the alignment fault occurs.
So my question to Marek: does "struct hna_global_entry" really need to be packed (hna.h:57)? If not, we should remove the attribute and this problem should be gone. And what about "struct hna_element"?
Thank you for your work, Nathan :)
Regards, Sven
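The offset mentioned above can be reproduced with offsetof. The following is a sketch under assumptions: `struct list_head_first` is a two-pointer stand-in for the real batman type, and the struct name `packed_entry` is invented for the illustration. The field order follows hna_global_entry as shown in the patch later in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real batman types; only their sizes matter here. */
struct orig_node;
struct list_head_first { void *next, *prev; };

/* Same field order as hna_global_entry, fully packed. */
struct packed_entry {
	uint32_t addr;
	uint8_t netmask;
	struct orig_node *curr_orig_node;
	struct list_head_first orig_list;
} __attribute__((packed));

/* Byte offset of orig_list: 4 (addr) + 1 (netmask) + one pointer. */
static size_t orig_list_offset(void)
{
	return offsetof(struct packed_entry, orig_list);
}
```

With 4-byte pointers this gives 5 + 4 = 9, the odd offset described above. On a host with 8-byte pointers it gives 13, which is just as misaligned.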
On Mon, Jun 1, 2009 at 2:35 PM, Sven Eckelmann sven.eckelmann@gmx.de wrote:
OK, I think I see the problem. malloc returned a valid, aligned address. list_add_tail gets a pointer to a member of hna_global_entry. This structure is packed, so every operation on it has to be safe for unaligned data. If you look at it more closely, you will notice that orig_list sits at offset 9 (assuming 4 bytes for a pointer), which of course is not aligned to 4 bytes. And here is the problem: the compiler only generates the safe operations for unaligned data if it knows that the data is unaligned. Because the argument is cast when list_add_tail is called, the compiler does not know that this parameter is unaligned, and the alignment fault occurs.
So my question to Marek: does "struct hna_global_entry" really need to be packed (hna.h:57)? If not, we should remove the attribute and this problem should be gone. And what about "struct hna_element"?
Thank you for your work, Nathan :)
You are welcome, thanks for your help.
The crashing of batgat on unloading turns out to be socket 4306 not being ready to be reused yet. I worked around this by using:

batmand -c -r 0 ; sleep 1 ; batmand -c -g 11000

and

batmand -c -g 0 ; sleep 1 ; batmand -c -r 2
Marek helped figure that out on irc.
On Tuesday 02 June 2009 03:35:07 Sven Eckelmann wrote:
So my question to Marek: does "struct hna_global_entry" really need to be packed (hna.h:57)? If not, we should remove the attribute and this problem should be gone. And what about "struct hna_element"?
The first 5 bytes of both structs are used as base for the hash index. If the compiler changes the order or something similar it might not work.
Regards, Marek
On Tuesday 02 June 2009 06:36:41 Marek Lindner wrote:
The first 5 bytes of both structs are used as base for the hash index. If the compiler changes the order or something similar it might not work.
Ok, then it should be safe to force the alignment of the pointers in hna_global_entry. Everything else seems to be much more complicated and doesn't create much cleaner code.
Best regards, Sven
Architectures like SuperARM or XScale need aligned data for multi-byte operations. GCC can generate instruction sequences for packed data, but it must know that something will not be aligned. Since list_add operates on untyped data through void pointers, GCC cannot know that hna_global_entry is packed and will emit only the fast, unsafe version of the load and store operations. Because only the first 5 bytes of hna_global_entry have to be packed, we can force the remaining elements to be aligned without changing the relative addresses of those first bytes.
Signed-off-by: Sven Eckelmann sven.eckelmann@gmx.de
---
 batman/hna.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/batman/hna.h b/batman/hna.h
index 6063324..3e7049e 100644
--- a/batman/hna.h
+++ b/batman/hna.h
@@ -58,8 +58,8 @@ struct hna_global_entry
 {
 	uint32_t addr;
 	uint8_t netmask;
-	struct orig_node *curr_orig_node;
-	struct list_head_first orig_list;
+	struct orig_node *curr_orig_node ALIGN_WORD;
+	struct list_head_first orig_list ALIGN_WORD;
 } __attribute__((packed));
 
 struct hna_orig_ptr
On Tue, Jun 2, 2009 at 12:56 PM, Sven Eckelmann sven.eckelmann@gmx.de wrote:
Architectures like SuperARM or XScale need aligned data for multi-byte operations. GCC can generate instruction sequences for packed data, but it must know that something will not be aligned. Since list_add operates on untyped data through void pointers, GCC cannot know that hna_global_entry is packed and will emit only the fast, unsafe version of the load and store operations. Because only the first 5 bytes of hna_global_entry have to be packed, we can force the remaining elements to be aligned without changing the relative addresses of those first bytes.
It looks good here. I am running this combined with the previous 3 patches with cpu/alignment set to bus error on problems.
Architectures like SuperARM or XScale need aligned data for multi-byte operations. GCC can generate instruction sequences for packed data, but it must know that something will not be aligned. Since list_add operates on untyped data through void pointers, GCC cannot know that hna_global_entry is packed and will emit only the fast, unsafe version of the load and store operations. Because only the first 5 bytes of hna_global_entry have to be packed, we can force the remaining elements to be aligned without changing the relative addresses of those first bytes.
Signed-off-by: Sven Eckelmann sven.eckelmann@gmx.de
---
 batman/batman.h |    1 +
 batman/hna.h    |    4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/batman/batman.h b/batman/batman.h
index d6b00cf..23f8e9a 100644
--- a/batman/batman.h
+++ b/batman/batman.h
@@ -153,6 +153,7 @@
 
 #define BATMANUNUSED(x) (x)__attribute__((unused))
 #define ALIGN_WORD __attribute__ ((aligned(sizeof(TYPE_OF_WORD))))
+#define ALIGN_POINTER __attribute__ ((aligned(sizeof(void*))))
 
diff --git a/batman/hna.h b/batman/hna.h
index 6063324..a046857 100644
--- a/batman/hna.h
+++ b/batman/hna.h
@@ -58,8 +58,8 @@ struct hna_global_entry
 {
 	uint32_t addr;
 	uint8_t netmask;
-	struct orig_node *curr_orig_node;
-	struct list_head_first orig_list;
+	struct orig_node *curr_orig_node ALIGN_POINTER;
+	struct list_head_first orig_list ALIGN_POINTER;
 } __attribute__((packed));
 
 struct hna_orig_ptr
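The effect of the per-member attribute can be checked with offsetof. This is a sketch under assumptions, not part of the patch: `struct list_head_first` is a two-pointer stand-in for the real type, `aligned_entry` and `pointers_are_aligned` are invented names, and a GCC-compatible compiler is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define ALIGN_POINTER __attribute__ ((aligned(sizeof(void*))))

/* Stand-ins for the real batman types; only their sizes matter here. */
struct orig_node;
struct list_head_first { void *next, *prev; };

/* Packed as a whole, but the two pointer-carrying members are
 * pushed back to a pointer-aligned offset. */
struct aligned_entry {
	uint32_t addr;
	uint8_t netmask;
	struct orig_node *curr_orig_node ALIGN_POINTER;
	struct list_head_first orig_list ALIGN_POINTER;
} __attribute__((packed));

/* Returns 1 if both annotated members start on a pointer boundary. */
static int pointers_are_aligned(void)
{
	return offsetof(struct aligned_entry, curr_orig_node) % sizeof(void *) == 0
	    && offsetof(struct aligned_entry, orig_list) % sizeof(void *) == 0;
}
```

The first 5 bytes (addr at offset 0, netmask at offset 4) stay where they were, so the hash index that Marek mentioned still covers the same bytes. Only padding after netmask is added.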
On Wednesday 03 June 2009 18:39:26 Sven Eckelmann wrote:
Architectures like SuperARM or XScale need aligned data for multi-byte operations. GCC can generate instruction sequences for packed data, but it must know that something will not be aligned. Since list_add operates on untyped data through void pointers, GCC cannot know that hna_global_entry is packed and will emit only the fast, unsafe version of the load and store operations. Because only the first 5 bytes of hna_global_entry have to be packed, we can force the remaining elements to be aligned without changing the relative addresses of those first bytes.
Sven, thanks a lot for your patches and thanks for your debugging help, Nathan. I just applied these patches. :-)
Regards, Marek
Signed-off-by: Sven Eckelmann sven.eckelmann@gmx.de
---
 batman/allocate.c |   24 ++++++++++++++++++++----
 batman/batman.h   |    2 ++
 batman/bitarray.h |    4 +---
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/batman/allocate.c b/batman/allocate.c
index a779504..5e28c71 100644
--- a/batman/allocate.c
+++ b/batman/allocate.c
@@ -68,16 +68,32 @@ struct memoryUsage
 
 static size_t getHeaderPad() {
-	size_t pad = sizeof(uintmax_t) - (sizeof(struct chunkHeader) % sizeof(uintmax_t));
-	if (pad == sizeof(uintmax_t))
+	size_t alignwith, pad;
+
+	if (sizeof(TYPE_OF_WORD) > sizeof(void*))
+		alignwith = sizeof(TYPE_OF_WORD);
+	else
+		alignwith = sizeof(void*);
+
+	pad = alignwith - (sizeof(struct chunkHeader) % alignwith);
+
+	if (pad == alignwith)
 		return 0;
 	else
 		return pad;
 }
 
 static size_t getTrailerPad(size_t length) {
-	size_t pad = sizeof(uintmax_t) - (length % sizeof(uintmax_t));
-	if (pad == sizeof(uintmax_t))
+	size_t alignwith, pad;
+
+	if (sizeof(TYPE_OF_WORD) > sizeof(void*))
+		alignwith = sizeof(TYPE_OF_WORD);
+	else
+		alignwith = sizeof(void*);
+
+	pad = alignwith - (length % alignwith);
+
+	if (pad == alignwith)
 		return 0;
 	else
 		return pad;
diff --git a/batman/batman.h b/batman/batman.h
index c02ce8d..1cc5896 100644
--- a/batman/batman.h
+++ b/batman/batman.h
@@ -31,6 +31,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#define TYPE_OF_WORD uintmax_t /* you should choose something big, if you don't want to waste cpu */
+
 #include "list-batman.h"
 #include "bitarray.h"
 #include "hash.h"
diff --git a/batman/bitarray.h b/batman/bitarray.h
index 5472ef1..0bb0710 100644
--- a/batman/bitarray.h
+++ b/batman/bitarray.h
@@ -21,10 +21,8 @@
 
-#define TYPE_OF_WORD unsigned long /* you should choose something big, if you don't want to waste cpu */
-#define WORD_BIT_SIZE ( sizeof(TYPE_OF_WORD) * 8 )
 #include "batman.h"
-
+#define WORD_BIT_SIZE ( sizeof(TYPE_OF_WORD) * 8 )
 
 void bit_init( TYPE_OF_WORD *seq_bits );
Buffers of char do not have to be specially aligned on every architecture. But when such a buffer is accessed through a larger data type and the compiler does not know about the missing alignment, it generates unsafe instructions because it assumes that the data is word aligned.
Signed-off-by: Sven Eckelmann sven.eckelmann@gmx.de
---
 batman/batman.h       |    1 +
 batman/linux/route.c  |    6 +++---
 batman/posix/tunnel.c |    2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/batman/batman.h b/batman/batman.h
index 1cc5896..d6b00cf 100644
--- a/batman/batman.h
+++ b/batman/batman.h
@@ -152,6 +152,7 @@
 
 #define BATMANUNUSED(x) (x)__attribute__((unused))
+#define ALIGN_WORD __attribute__ ((aligned(sizeof(TYPE_OF_WORD))))
 
diff --git a/batman/linux/route.c b/batman/linux/route.c
index 4d46955..0c7b932 100644
--- a/batman/linux/route.c
+++ b/batman/linux/route.c
@@ -185,7 +185,7 @@ void add_del_route(uint32_t dest, uint8_t netmask, uint32_t router, uint32_t src
 		struct rtmsg rtm;
 		char buff[4 * (sizeof(struct rtattr) + 4)];
 	} *req;
-	char req_buf[NLMSG_LENGTH(sizeof(struct req_s))];
+	char req_buf[NLMSG_LENGTH(sizeof(struct req_s))] ALIGN_WORD;
 
 	iov.iov_base = buf;
 	iov.iov_len = sizeof(buf);
@@ -369,7 +369,7 @@ void add_del_rule(uint32_t network, uint8_t netmask, int8_t rt_table, uint32_t p
 		struct rtmsg rtm;
 		char buff[2 * (sizeof(struct rtattr) + 4)];
 	} *req;
-	char req_buf[NLMSG_LENGTH(sizeof(struct req_s))];
+	char req_buf[NLMSG_LENGTH(sizeof(struct req_s))] ALIGN_WORD;
 
 	iov.iov_base = buf;
 	iov.iov_len = sizeof(buf);
@@ -634,7 +634,7 @@ int flush_routes_rules(int8_t is_rule)
 	struct req_s {
 		struct rtmsg rtm;
 	} *req;
-	char req_buf[NLMSG_LENGTH(sizeof(struct req_s))];
+	char req_buf[NLMSG_LENGTH(sizeof(struct req_s))] ALIGN_WORD;
 
 	struct rtattr *rtap;
 
diff --git a/batman/posix/tunnel.c b/batman/posix/tunnel.c
index 4263794..1cfb501 100644
--- a/batman/posix/tunnel.c
+++ b/batman/posix/tunnel.c
@@ -567,7 +567,7 @@ void *gw_listen(void *BATMANUNUSED(arg)) {
 	unsigned char buff[1501];
 	int32_t res, max_sock, buff_len, tun_fd, tun_ifi;
 	uint32_t addr_len, client_timeout, current_time;
-	uint8_t my_tun_ip[4], next_free_ip[4];
+	uint8_t my_tun_ip[4] ALIGN_WORD, next_free_ip[4] ALIGN_WORD;
 	struct hashtable_t *wip_hash, *vip_hash;
 	struct list_head_first free_ip_list;
 	fd_set wait_sockets, tmp_wait_sockets;
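What the ALIGN_WORD annotation buys for a char buffer can be shown with a small check. The sketch assumes a GCC-compatible compiler. The two macros are copied from the patches in this thread, while the buffer size of 64 and the function name `buffer_is_word_aligned` are invented for the illustration.

```c
#include <stdint.h>

#define TYPE_OF_WORD uintmax_t
#define ALIGN_WORD __attribute__ ((aligned(sizeof(TYPE_OF_WORD))))

/* Returns 1 if a stack buffer declared with ALIGN_WORD starts on a
 * multiple of sizeof(TYPE_OF_WORD). Without the attribute a char array
 * is only guaranteed an alignment of 1. */
static int buffer_is_word_aligned(void)
{
	char req_buf[64] ALIGN_WORD;

	return ((uintptr_t)req_buf % sizeof(TYPE_OF_WORD)) == 0;
}
```

In route.c the buffer is later accessed through a struct pointer, which is why its start address has to satisfy the alignment of the largest member of that struct.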
b.a.t.m.a.n@lists.open-mesh.org