Found the bug: it happens when the difference between the received and the last seqno is exactly -64.
We also have bugs in this regard in the current implementation:
If seq_diff is -64, bit_get_packet() calls bit_mark(). However, there is a check inside which ignores -64, so nothing bad happens.
If seq_diff is +64, bit_get_packet() calls bit_shift(). There is no check in bit_shift(), so one byte just outside the sequence window is read.
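To illustrate the two boundary cases, here is a minimal sketch of a window update that rejects a difference of exactly -64 or +64 before any array access. It is not the batman-adv code: the 64-bit window size is inferred from the numbers above, and window_mark(), window_shift(), window_update() and the result enum are hypothetical names made up for this example.

```c
#include <stdint.h>
#include <string.h>

#define WINDOW_SIZE  64                 /* bits in the window (assumed from the +/-64 above) */
#define WINDOW_BYTES (WINDOW_SIZE / 8)

enum window_result { WINDOW_OLD, WINDOW_NEW, WINDOW_OUT_OF_RANGE };

/* Hypothetical helper: set bit n; refuses any index outside 0..WINDOW_SIZE-1. */
static int window_mark(uint8_t *bits, int32_t n)
{
	if (n < 0 || n >= WINDOW_SIZE)
		return 0;
	bits[n / 8] |= (uint8_t)(1u << (n % 8));
	return 1;
}

/* Hypothetical helper: age every bit by n positions; refuses n outside 1..WINDOW_SIZE-1. */
static int window_shift(uint8_t *bits, int32_t n)
{
	uint8_t tmp[WINDOW_BYTES] = {0};
	int32_t i;

	if (n <= 0 || n >= WINDOW_SIZE)
		return 0;
	for (i = 0; i + n < WINDOW_SIZE; i++)
		if (bits[i / 8] & (1u << (i % 8)))
			tmp[(i + n) / 8] |= (uint8_t)(1u << ((i + n) % 8));
	memcpy(bits, tmp, WINDOW_BYTES);
	return 1;
}

/* seq_diff = received seqno - last seqno; bit 0 stands for the newest seqno. */
static enum window_result window_update(uint8_t *bits, int32_t seq_diff)
{
	/* Old packet inside the window: valid differences are -63..0. */
	if (seq_diff <= 0 && seq_diff > -WINDOW_SIZE) {
		window_mark(bits, -seq_diff);
		return WINDOW_OLD;
	}
	/* Newer packet inside the window: valid differences are 1..63. */
	if (seq_diff > 0 && seq_diff < WINDOW_SIZE) {
		window_shift(bits, seq_diff);
		window_mark(bits, 0);
		return WINDOW_NEW;
	}
	/* Exactly -64 or +64, or further away: never touch the array. */
	return WINDOW_OUT_OF_RANGE;
}
```

In this sketch both helpers carry their own range check, so a caller that forgets the outer test still cannot read or write past the window. What the caller does on WINDOW_OUT_OF_RANGE (reset the window, drop the packet, start a protection period) is left open here.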
I will send a cleaned-up patch in the next few minutes. We should also fix batman (layer 3) in this regard; adding the protection window there as well might be a good idea.
best regards, Simon
On Tue, Apr 06, 2010 at 07:58:09PM +0200, Simon Wunderlich wrote:
Hi Linus,
I've verified and can reproduce the problem. The queue limitation patch removes the OOM problems, but the same packets are still being broadcast. It is always the same sequence number that is sent many times, although the same packet should not be sent more than 3 times.
All nodes but the original sender flood the same packets on all interfaces ...
I'll look into this, thanks. Simon
On Tue, Apr 06, 2010 at 03:11:05PM +0200, Linus Lüssing wrote:
On Tue, Apr 06, 2010 at 12:41:29PM +0200, Simon Wunderlich wrote:
Hi Linus,
From the timestamps of the messages (the printk is removed in the submitted version of the patch, BTW), we can see that there is a 30-second period between the starts of the protection time, as it is supposed to be.
I guess you have stopped your broadcast ping after ~900 seconds, but still receive packets some time later. Do you have some dumps or any analysis data for this?
Ehm, no, I stopped after 1-2 seconds :). And well, "some" packets after? A is still receiving about 3000-4000 packets per second. I made some dumps, you can find them here:

http://x-realis.dyndns.org/Freifunk/batman-log/mesh1.cap
http://x-realis.dyndns.org/Freifunk/batman-log/mesh2.cap

For the virtual machines, I've just been bridging their tap interfaces on the host system, so no vde_switch/wirefilter is involved. mesh1.cap is the capture from the bridge between A and B, mesh2.cap from the bridge between B and C.

A's mac addr: XX:...:XX:X1
B's mac addrs: XX:...:XX:X3
C's mac addr: XX:...:XX:X2

After some seconds, B and C also seem to relay the same packet all the time (in this dump 2702, but it is a different seqno every time I restart the whole setup).
I will try to rebuild your setup and turn the broadcast replies on later.
Ok. As said above, I've just connected the nodes via bridges. After all three nodes were connected and running, I ran the following commands on node A:
    ifconfig bat0 up
    ifconfig bat0 192.168.123.1/24
    echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
    ping -b -f 192.168.123.255 > /dev/null

--> and stopped the ping command after 1 to 2 seconds.