I've been lurking on this list for a few weeks, but finally tried compiling and running the batman-adv module from the svn sources.
It seems to work fine between my two test machines so far, but I noticed some weird sequence numbers when doing a dump with 'batctl td'. In the sequence below, I had 'rover' running and active on rover's eth0, then I activated the eth1 interface on hilltop for a few seconds, then de-activated it.
It seems like when rover re-sends the OGM from hilltop, it gets the sequence number wrong (1->256, 2->512, 3->768). Is this my lack of understanding of how it should work, a bug in batctl's display or a bug in batman-adv?
rover is kernel 2.6.31-20-generic (Ubuntu karmic), hilltop is kernel 2.6.26-2-686 (Debian lenny)
$ ./batctl td eth1 23:37:22.022554 BAT rover-eth0: OGM via neigh rover-eth0, seqno 481, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:23.002547 BAT rover-eth0: OGM via neigh rover-eth0, seqno 482, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:23.982469 BAT rover-eth0: OGM via neigh rover-eth0, seqno 483, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:24.962417 BAT rover-eth0: OGM via neigh rover-eth0, seqno 484, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:25.962320 BAT rover-eth0: OGM via neigh rover-eth0, seqno 485, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:26.943490 BAT rover-eth0: OGM via neigh rover-eth0, seqno 486, tq 0, ttl 49, v 9, flags [D..], length 42 23:37:27.041437 BAT rover-eth0: OGM via neigh hilltop-eth1, seqno 486, tq 0, ttl 49, v 9, flags [D..], length 28 23:37:27.893422 BAT hilltop-eth1: OGM via neigh hilltop-eth1, seqno 1, tq 255, ttl 50, v 9, flags [..F], length 22 23:37:27.922233 BAT rover-eth0: OGM via neigh rover-eth0, seqno 487, tq 0, ttl 49, v 9, flags [D..], length 42 23:37:27.978267 BAT hilltop-eth1: OGM via neigh rover-eth0, seqno 256, tq 10, ttl 49, v 9, flags [D..], length 42 23:37:28.025425 BAT rover-eth0: OGM via neigh hilltop-eth1, seqno 487, tq 0, ttl 49, v 9, flags [D..], length 28 23:37:28.873428 BAT hilltop-eth1: OGM via neigh hilltop-eth1, seqno 2, tq 255, ttl 50, v 9, flags [..F], length 28 23:37:28.902203 BAT rover-eth0: OGM via neigh rover-eth0, seqno 488, tq 10, ttl 49, v 9, flags [D..], length 42 23:37:28.958188 BAT hilltop-eth1: OGM via neigh rover-eth0, seqno 512, tq 21, ttl 49, v 9, flags [D..], length 42 23:37:29.005443 BAT rover-eth0: OGM via neigh hilltop-eth1, seqno 488, tq 10, ttl 49, v 9, flags [D..], length 28 23:37:29.873428 BAT hilltop-eth1: OGM via neigh hilltop-eth1, seqno 3, tq 255, ttl 50, v 9, flags [..F], length 28 23:37:29.902152 BAT rover-eth0: OGM via neigh rover-eth0, seqno 489, tq 22, ttl 49, v 9, flags [D..], length 42 23:37:29.970180 BAT hilltop-eth1: OGM via neigh rover-eth0, seqno 768, tq 30, ttl 49, v 9, flags [D..], length 42 23:37:30.009433 BAT rover-eth0: OGM via neigh hilltop-eth1, seqno 489, tq 22, ttl 49, v 9, flags [D..], length 28 23:37:30.882142 BAT rover-eth0: OGM via neigh rover-eth0, seqno 490, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:31.882087 BAT rover-eth0: OGM via neigh rover-eth0, seqno 491, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:32.862046 BAT rover-eth0: OGM via neigh rover-eth0, seqno 492, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:33.861989 BAT rover-eth0: OGM via neigh rover-eth0, seqno 493, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:34.861946 BAT rover-eth0: OGM via neigh rover-eth0, seqno 494, tq 255, ttl 50, v 9, flags [..F], length 42
-Kevin
Further information: the strange behaviour seems to stop once the sequence number reaches 1024.
-Kevin
On Sun, Mar 14, 2010 at 12:35:50AM +0000, Kevin Steen wrote:
I've been lurking on this list for a few weeks, but finally tried compiling and running the batman-adv module from the svn sources.
It seems to work fine between my two test machines so far, but I noticed some weird sequence numbers when doing a dump with 'batctl td'. In the sequence below, I had 'rover' running and active on rover's eth0, then I activated the eth1 interface on hilltop for a few seconds, then de-activated it.
It seems like when rover re-sends the OGM from hilltop, it gets the sequence number wrong (1->256, 2->512, 3->768). Is this my lack of understanding of how it should work, a bug in batctl's display or a bug in batman-adv?
rover is kernel 2.6.31-20-generic (Ubuntu karmic), hilltop is kernel 2.6.26-2-686 (Debian lenny)
$ ./batctl td eth1 23:37:22.022554 BAT rover-eth0: OGM via neigh rover-eth0, seqno 481, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:23.002547 BAT rover-eth0: OGM via neigh rover-eth0, seqno 482, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:23.982469 BAT rover-eth0: OGM via neigh rover-eth0, seqno 483, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:24.962417 BAT rover-eth0: OGM via neigh rover-eth0, seqno 484, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:25.962320 BAT rover-eth0: OGM via neigh rover-eth0, seqno 485, tq 255, ttl 50, v 9, flags [..F], length 42 23:37:26.943490 BAT rover-eth0: OGM via neigh rover-eth0, seqno 486, tq 0, ttl 49, v 9, flags [D..], length 42 23:37:27.041437 BAT rover-eth0: OGM via neigh hilltop-eth1, seqno 486, tq 0, ttl 49, v 9, flags [D..], length 28 23:37:27.893422 BAT hilltop-eth1: OGM via neigh hilltop-eth1, seqno 1, tq 255, ttl 50, v 9, flags [..F], length 22 23:37:27.922233 BAT rover-eth0: OGM via neigh rover-eth0, seqno 487, tq 0, ttl 49, v 9, flags [D..], length 42
Thanks for testing the code. I doubt that this is a bug in the batman-adv code. Can you please make a capture of the raw packet data using tcpdump and/or wireshark? I think that this is a batctl bug, but I don't have time to check it right now. You can do the comparison by yourself using the wireshark-batman-adv dissector linked in the open-mesh.net wiki.
Best regards, Sven
On Sun, Mar 14, 2010 at 01:04:10PM +0100, Sven Eckelmann wrote:
Thanks for testing the code. I doubt that this is a bug in the batman-adv code. Can you please make a capture of the raw packet data using tcpdump and/or wireshark? I think that this is a batctl bug, but I don't have time to check it right now. You can do the comparison by yourself using the wireshark-batman-adv dissector linked in the open-mesh.net wiki.
Just checked the output of a uml tcpdump versus a tcpdump of the corresponding tap device on the host. The result on the host ist correct and the output from the uml has the two bytes from the sequence number exchanged. So it seems that the skb is changed without cloning the data which is changed.
The problem appears when data is resend by the other node and we retreive them. It doesn't seem to happen when we receive data from the other node with the originator also being the other node. This is not the data the network really sees - only tcpdump will see it.
So it is real a problem with batman-adv and not batctl (the code of batctl isn't that complex that it could create such a problem).
Best regards, Sven
"tcpdump" and "batctl td" will receive packets with a wrong sequence number on systems with a different endianess than network byte order. This happens due to the reordering of bytes in the function which handles aggregated bat packets. The function which receives the bat packets must ensure that these buffers aren't shared with anything else before that function tries to write into it. Otherwise it has to copy the buffers so it is save again to change them.
Signed-off-by: Sven Eckelmann sven.eckelmann@gmx.de --- batman-adv-kernelland/routing.c | 16 ++++++++++++---- 1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/batman-adv-kernelland/routing.c b/batman-adv-kernelland/routing.c index 8610b22..0051259 100644 --- a/batman-adv-kernelland/routing.c +++ b/batman-adv-kernelland/routing.c @@ -680,6 +680,7 @@ int recv_bat_packet(struct sk_buff *skb, { struct ethhdr *ethhdr; unsigned long flags; + struct sk_buff *skb_old;
/* drop packet if it has not necessary minimum size */ if (skb_headlen(skb) < sizeof(struct batman_packet)) @@ -695,12 +696,19 @@ int recv_bat_packet(struct sk_buff *skb, if (is_bcast(ethhdr->h_source)) return NET_RX_DROP;
- spin_lock_irqsave(&orig_hash_lock, flags); /* TODO: we use headlen instead of "length", because * only this data is paged in. */ - /* TODO: is another skb_copy needed here? there will be - * written on the data, but nobody (?) should further use - * this data */ + + /* create a copy of the skb, if needed, to modify it. */ + if (!skb_clone_writable(skb, skb_headlen(skb))) { + skb_old = skb; + skb = skb_copy(skb, GFP_ATOMIC); + if (!skb) + return NET_RX_DROP; + kfree_skb(skb_old); + } + + spin_lock_irqsave(&orig_hash_lock, flags); receive_aggr_bat_packet(ethhdr, skb->data, skb_headlen(skb),
"tcpdump" and "batctl td" will receive packets with a wrong sequence number on systems with a different endianess than network byte order. This happens due to the reordering of bytes in the function which handles aggregated bat packets. The function which receives the bat packets must ensure that these buffers aren't shared with anything else before that function tries to write into it. Otherwise it has to copy the buffers so it is save again to change them.
Reported-by: Kevin Steen batman@kevinsteen.net Signed-off-by: Sven Eckelmann sven.eckelmann@gmx.de --- batman-adv-kernelland/routing.c | 16 ++++++++++++---- 1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/batman-adv-kernelland/routing.c b/batman-adv-kernelland/routing.c index 8610b22..0051259 100644 --- a/batman-adv-kernelland/routing.c +++ b/batman-adv-kernelland/routing.c @@ -680,6 +680,7 @@ int recv_bat_packet(struct sk_buff *skb, { struct ethhdr *ethhdr; unsigned long flags; + struct sk_buff *skb_old;
/* drop packet if it has not necessary minimum size */ if (skb_headlen(skb) < sizeof(struct batman_packet)) @@ -695,12 +696,19 @@ int recv_bat_packet(struct sk_buff *skb, if (is_bcast(ethhdr->h_source)) return NET_RX_DROP;
- spin_lock_irqsave(&orig_hash_lock, flags); /* TODO: we use headlen instead of "length", because * only this data is paged in. */ - /* TODO: is another skb_copy needed here? there will be - * written on the data, but nobody (?) should further use - * this data */ + + /* create a copy of the skb, if needed, to modify it. */ + if (!skb_clone_writable(skb, skb_headlen(skb))) { + skb_old = skb; + skb = skb_copy(skb, GFP_ATOMIC); + if (!skb) + return NET_RX_DROP; + kfree_skb(skb_old); + } + + spin_lock_irqsave(&orig_hash_lock, flags); receive_aggr_bat_packet(ethhdr, skb->data, skb_headlen(skb),
On Monday 15 March 2010 01:15:55 Sven Eckelmann wrote:
"tcpdump" and "batctl td" will receive packets with a wrong sequence number on systems with a different endianess than network byte order. This happens due to the reordering of bytes in the function which handles aggregated bat packets. The function which receives the bat packets must ensure that these buffers aren't shared with anything else before that function tries to write into it. Otherwise it has to copy the buffers so it is save again to change them.
Applied in rev 1598.
Thanks for this quick fix!
Cheers, Marek
Works correctly on my systems now - thanks.
-Kevin
b.a.t.m.a.n@lists.open-mesh.org