Is there a known issue concerning the DAT functionality in batman-adv 2014.4.0?
I have a problem with looping and multiplying ARP packets, which cause ARP storms in a setup with DAT and BLA enabled. My setup consists of 6 mesh nodes, 3 of which are connected to the same backbone network. I connected a Windows PC to the backbone; it has an open SSH connection to one of the mesh nodes that is not directly connected to the backbone. Running arp -d to clear the ARP cache of the PC forces it to send an ARP request for the mesh node used for the SSH session. In a Wireshark capture taken on the backbone I then see multiple copies of that ARP request, and also multiple ARP replies from the mesh node. Sometimes BLA gratuitous ARP packets seem to loop as well, but it is easier to trigger the behaviour with regular ARPs (via arp -d on a PC). Non-ARP packets don't seem to be affected, and apart from the wasted bandwidth in the mesh and backbone I have no problems with normal network communication in the mesh.
I could provide the mentioned wireshark recordings made in the backbone network with a switch using port mirroring if someone explains how to provide such a file to the mailing list (I guess simple attachments are not allowed?).
If I disable DAT, everything looks fine again, i.e. there are no more duplicated ARP packets (except for a few ARP replies from the mesh node that are received twice, which could be a race for claiming the device?).
Regards, Andreas
.................................................................. PHOENIX CONTACT ELECTRONICS GmbH
Sitz der Gesellschaft / registered office of the company: 31812 Bad Pyrmont USt-Id-Nr.: DE811742156 Amtsgericht Hannover HRB 100528 / district court Hannover HRB 100528 Geschäftsführer / Executive Board: Roland Bent, Dr. Martin Heubeck ___________________________________________________________________ Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren, jegliche anderweitige Verwendung sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. ---------------------------------------------------------------------------------------------------- This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure, distribution or other use of the material or parts thereof is strictly forbidden. ___________________________________________________________________
Hi Andreas,
so far we don't have any known DAT regression in 2014.4.0.
Could you please provide a more detailed description about your setup including how the nodes have their bridges configured and what interfaces have been added to batman-adv?
Thanks!
On 13/03/15 08:28, Andreas Pape wrote:
> Is there a known issue concerning the DAT functionality in batman-adv 2014.4.0? [...]
Hello Antonio,
My mesh nodes use a WLAN interface (ath0) in ad-hoc mode as the only hard interface of bat0. bat0 is attached to a Linux bridge br0 together with the Ethernet interface eth0. The WLAN interface ath0 is not part of that bridge. The only interface with an IP address assigned is the bridge br0.
As mentioned, I use 6 mesh nodes with the setup described above. 3 of them are only reachable via the mesh (their eth0 interface is not connected to any other Ethernet device), and 3 are connected via eth0 to the same Ethernet switch. The Windows PC is also connected to that switch.
I am using batman-adv 2014.4.0 in combination with a fairly old Linux kernel, 2.6.32.26, on an embedded device. If I enable BLA and DAT and send a ping from the Windows PC to one of the mesh nodes that is not connected to the Ethernet backbone, I see the ARP request sent by the PC multiplied, and an even higher number of corresponding ARP replies in the backbone network. I am not sure how many of those replies are really sent by the mesh node that was the original target of the ARP request. Furthermore, as soon as the PC sends the first broadcast ARP request (after the arp -d command mentioned earlier), I get lots of "bat0: received packet with own address as source address" and some "eth0: received ...." kernel log messages.
If I disable DAT on all 6 devices, the ARP traffic visible in the backbone network looks normal to me: there is only one broadcast ARP request from the PC and only one ARP reply.
In the meantime I enabled DAT debug messages on one of my gateways between the Ethernet backbone and the mesh. After clearing the ARP cache of the PC with arp -d, I see the following output of batctl l:
  Parsing outgoing ARP REQUEST
  ARP MSG : [src: <mac of the PC> - 192.168.0.50 dst: 00:00:00:00:00:00 - 192.168.0.101]
  Entry updated 192.168.0.50 <mac of the PC>
  ARP request replied locally
  ARP Request for 192.168.0.101: fallback prevented
  Parsing incoming ARP REPLY
  ARP MSG: [src: <mac of the mesh node> - 192.168.0.101 dst: <mac of the PC> - 192.168.0.50]
  * encapsulated within a UNICAST packet
  Entry updated: 192.168.0.101 <mac of the mesh node>
  Entry updated: 192.168.0.50 <mac of the PC>
followed by a flood of additional messages of a similar kind. From this log, and from what I have understood so far about BLA and DAT from open-mesh.org and a short look into the source code, I conclude that the gateway already knew the MAC address the PC was looking for ("ARP request replied locally") and did not forward the request as a broadcast into the mesh. Nevertheless, the gateway received an ARP reply from the mesh. I guess the original ARP request broadcast was forwarded into the mesh by at least one of the other two backbone gateways, and a reply was sent by someone else (another mesh node with DAT enabled, or the mesh node being searched for).
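The cache behaviour read off the log above can be modelled roughly as follows. This is a toy Python model written for this thread, not batman-adv code: the class, its method names and the MAC addresses in the usage lines are all hypothetical, and the "fallback prevented" step is not modelled at all.

```python
class DatCache:
    """Toy IP -> MAC cache standing in for a node's local DAT entries."""

    def __init__(self):
        self.entries = {}

    def handle_outgoing_arp_request(self, src_ip, src_mac, target_ip):
        # Learn the requester first (the "Entry updated" line after the request).
        self.entries[src_ip] = src_mac
        # A cache hit means the request can be answered without asking the mesh
        # ("ARP request replied locally").
        target_mac = self.entries.get(target_ip)
        if target_mac is not None:
            return ("reply_locally", target_mac)
        return ("query_mesh", None)

    def handle_incoming_arp_reply(self, src_ip, src_mac, dst_ip, dst_mac):
        # Both ends of a reply are learned (the two "Entry updated" lines).
        self.entries[src_ip] = src_mac
        self.entries[dst_ip] = dst_mac


# Made-up MAC addresses; the IPs are the ones from the log.
cache = DatCache()
cache.handle_incoming_arp_reply("192.168.0.101", "02:00:00:00:00:65",
                                "192.168.0.50", "02:00:00:00:00:32")
print(cache.handle_outgoing_arp_request("192.168.0.50", "02:00:00:00:00:32",
                                        "192.168.0.101"))
```

In this model a gateway that has seen one reply answers every later request for 192.168.0.101 from its cache, which matches the "replied locally" line but does not explain where the additional reply from the mesh comes from.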
This leads me to the question of whether combining DAT with a BLA setup is supported by design and should work, or whether DAT is only reasonable to use when a backbone network has a single gateway into the mesh (as depicted in the DAT wiki on open-mesh.org).
Thanks for the support and regards, Andreas
From: Antonio Quartulli antonio@meshcoding.com
To: The list for a Better Approach To Mobile Ad-hoc Networking b.a.t.m.a.n@lists.open-mesh.org
Date: 13.03.2015 13:22
Subject: Re: [B.A.T.M.A.N.] DAT broken in 2014.4.0?
Sent by: "B.A.T.M.A.N" b.a.t.m.a.n-bounces@lists.open-mesh.org
> so far we don't have any known DAT regression in 2014.4.0. [...]
In the meantime I dug a little deeper into this. DAT as such works, but it has some side effects on the backbone network in a setup like mine, with several mesh nodes connected to the same backbone network and BLA enabled. I see two main issues:
1. The original broadcast ARP request sent by the PC loops back into the backbone network. As far as I have figured out, this comes from the encapsulation of the original ARP broadcast into a BATADV_UNICAST_4ADDR frame. The BLA code responsible for preventing looping broadcasts does not handle it, because for BLA this is a unicast frame.
2. All ARP replies are forwarded into the backbone by all possible gateways. If a gateway gets responses from up to 3 remote DAT candidates, the total number of ARP replies seen becomes 3 times the number of gateways used.
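To put a number on the second issue, the multiplication can be written down in a few lines of Python. This is only a sketch of the arithmetic described above, not batman-adv code; the function name and its parameters are made up for illustration.

```python
def replies_seen_on_backbone(gateways: int, dat_candidates: int = 3) -> int:
    """Worst case from the description above: each gateway forwards one
    reply per DAT candidate that answered (function name is hypothetical)."""
    return gateways * dat_candidates


# The setup in this thread: 3 gateways on the backbone, up to 3 DAT candidates.
print(replies_seen_on_backbone(gateways=3))
```

For the 3 gateways used here this gives up to 9 ARP replies on the backbone for a single request.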
I am not sure whether this is specific to the old kernel version I use, but I tried to overcome the two issues with the following measures:
1. Drop BATADV_UNICAST_4ADDR DHT_GET frames received from another gateway as long as we cannot answer the forwarded ARP request.
2. Make sure that only a gateway which has claimed the source MAC of an ARP reply forwards this reply to the backbone network.
3. Drop received ARP replies as soon as we have a local DAT entry for the source MAC of the ARP reply. In this case it is most likely that the device has already sent a reply.
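The three measures can be sketched as three independent checks. The Python below is a toy model of the description above and not the actual patch: the class, its fields and its method names are hypothetical, and the order in which a real implementation would apply the checks is not specified in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Toy state of one backbone gateway (all names are hypothetical)."""
    other_gateways: set = field(default_factory=set)  # MACs of gateways on the same backbone
    dat_cache: dict = field(default_factory=dict)     # local DAT entries, IP -> MAC
    claimed_macs: set = field(default_factory=set)    # MACs this gateway has claimed via BLA

    def drop_dht_get(self, sender_mac: str, target_ip: str) -> bool:
        """Measure 1: drop a DHT_GET from another gateway while we cannot answer it."""
        return sender_mac in self.other_gateways and target_ip not in self.dat_cache

    def forward_reply_to_backbone(self, reply_src_mac: str) -> bool:
        """Measure 2: only the gateway that claimed the reply's source MAC forwards it."""
        return reply_src_mac in self.claimed_macs

    def drop_received_reply(self, reply_src_mac: str) -> bool:
        """Measure 3: drop a received reply once a local DAT entry exists for its source MAC."""
        return reply_src_mac in self.dat_cache.values()


# Made-up addresses for illustration.
gw = Gateway(other_gateways={"02:00:00:00:00:02"},
             dat_cache={"192.168.0.101": "02:00:00:00:00:65"},
             claimed_macs={"02:00:00:00:00:65"})
print(gw.drop_dht_get("02:00:00:00:00:02", "192.168.0.102"))
print(gw.forward_reply_to_backbone("02:00:00:00:00:65"))
print(gw.drop_received_reply("02:00:00:00:00:65"))
```

Keeping the checks separate makes it easy to see which measure suppresses which duplicate: the first addresses the looping request, the second and third the multiplied replies.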
With these measures I see "clean" ARP request/reply behaviour in the backbone network. As a further improvement, I added snooping of all incoming IP traffic on the mesh soft interface: I use the source MAC and source IP to update the local DAT cache. My goal was to keep ARP request/reply traffic, and the broadcast traffic that comes with it, as low as possible in the mesh.
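The snooping idea can be illustrated outside the kernel with a short Python sketch that pulls the source MAC and source IP out of a raw Ethernet/IPv4 frame and stores them in a dictionary. It assumes untagged Ethernet II frames carrying IPv4; the function name and the dictionary used as a cache are hypothetical and only stand in for the real DAT cache.

```python
import socket
import struct

ETH_P_IP = 0x0800   # EtherType for IPv4
ETH_HLEN = 14       # dst MAC (6) + src MAC (6) + EtherType (2)


def snoop_ipv4_frame(frame: bytes, dat_cache: dict) -> bool:
    """Record src IP -> src MAC of an untagged Ethernet/IPv4 frame.

    Returns True if the cache was updated. VLAN tags, IPv6 and sanity
    checks on the addresses are deliberately left out of this sketch.
    """
    if len(frame) < ETH_HLEN + 20:   # too short for Ethernet + minimal IPv4 header
        return False
    src_mac = frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_IP:
        return False
    # The IPv4 source address sits at offset 12 of the IP header.
    src_ip = socket.inet_ntoa(frame[ETH_HLEN + 12:ETH_HLEN + 16])
    dat_cache[src_ip] = ":".join(f"{b:02x}" for b in src_mac)
    return True
```

Every IP frame then refreshes the sender's entry, so a later ARP request for that address has a better chance of being answered from the cache.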
If there is interest, I could send a patch to the mailing list with the changes, based on the batman-adv git master. But I warn you in advance: I am not a very skilled kernel programmer, nor do I have any experience in using git ;-)
Regards, Andreas
"B.A.T.M.A.N" b.a.t.m.a.n-bounces@lists.open-mesh.org wrote on 13.03.2015 15:35:53:
From: Andreas Pape APape@phoenixcontact.com
To: The list for a Better Approach To Mobile Ad-hoc Networking b.a.t.m.a.n@lists.open-mesh.org
Date: 13.03.2015 15:57
Subject: Re: [B.A.T.M.A.N.] DAT broken in 2014.4.0?
Sent by: "B.A.T.M.A.N" b.a.t.m.a.n-bounces@lists.open-mesh.org
> my mesh nodes use a wlan interface in adhoc mode as the only hard_if in bat0. [...]
On 18/03/15 11:45, Andreas Pape wrote:
> In the meantime I dug a little deeper into this. DAT as such works, but it has some side effects on the backbone network in a setup like mine, with several mesh nodes connected to the same backbone network and BLA enabled. I see two main issues:
> 1. The original broadcast ARP request sent by the PC loops back into the backbone network. As far as I have figured out, this comes from the encapsulation of the original ARP broadcast into a BATADV_UNICAST_4ADDR frame. The BLA code responsible for preventing looping broadcasts does not handle it, because for BLA this is a unicast frame.
Good point.
> 2. All ARP replies are forwarded into the backbone by all possible gateways. If a gateway gets responses from up to 3 remote DAT candidates, the total number of ARP replies seen becomes 3 times the number of gateways used.
This is probably a consequence of point 1, right?
> I am not sure whether this is specific to the old kernel version I use, but I tried to overcome the two issues with the following measures:
> 1. Drop BATADV_UNICAST_4ADDR DHT_GET frames received from another gateway as long as we cannot answer the forwarded ARP request.
> 2. Make sure that only a gateway which has claimed the source MAC of an ARP reply forwards this reply to the backbone network.
> 3. Drop received ARP replies as soon as we have a local DAT entry for the source MAC of the ARP reply. In this case it is most likely that the device has already sent a reply.
> With these measures I see "clean" ARP request/reply behaviour in the backbone network. As a further improvement, I added snooping of all incoming IP traffic on the mesh soft interface: I use the source MAC and source IP to update the local DAT cache. My goal was to keep ARP request/reply traffic, and the broadcast traffic that comes with it, as low as possible in the mesh.
> If there is interest, I could send a patch to the mailing list with the changes, based on the batman-adv git master. But I warn you in advance: I am not a very skilled kernel programmer, nor do I have any experience in using git ;-)
I am not sure I understand your proposed solution 100%, but a patch is worth a thousand words :) Please do send it, and don't worry about your skill, everybody has to start somewhere!
Regards,
b.a.t.m.a.n@lists.open-mesh.org