[please don't send me private mails about batman-adv - unless you have a really good reason to do so. And if not stated otherwise, I must assume that you actually wanted to send your message to the mailing list]
On Thursday, 28 May 2020 21:18:36 CEST Steve Newcomb wrote:
My first guess is that the underlying interface (mesh0) stopped transporting unicast frames. Did you check this by setting an IP on mesh0 and pinging between these devices using the IPv4 ping?
Not sure what the phrase "to set an IP on mesh0" means, if not simply to endow the corresponding bridge with a static IP. Which is what I'm doing.
Not sure what "IPv4 ping" means. I've disabled IPv6, so I'm not using anything but IPv4.
I am assuming that mesh0 is the device which was added to bat0 as slave. Please replace this with whatever you are using.

    # on device 1
    ip addr add 192.168.23.1/24 dev mesh0
    # on device 2
    ip addr add 192.168.23.2/24 dev mesh0

If "IPv4 ping" means "the ordinary Linux ping command", then, yes, I've tried that.
The IPv4 ping was just a placeholder for "not batman-adv ping packets". So you can also use ICMPv6 if you prefer. Just make sure to send it over the underlying ("slave") interface of batman-adv. And not on bat0 or any higher layer bridge/vlan/... interface.
With the addresses mentioned earlier:

    # on device 1
    ping 192.168.23.2
    # on device 2
    ping 192.168.23.1

And also observe with tcpdump what is received by the other end.
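As a rough sketch of the capture step (not part of the original mail), the helper below only prints a tcpdump invocation so it can be reviewed before being run as root on the receiving device. The function name `capture_cmd` and the filter `icmp or arp` are assumptions for illustration; `mesh0` is the interface name assumed in this thread.

```shell
#!/bin/sh
# Interface name is an assumption from the thread; override with IFACE=...
IFACE="${IFACE:-mesh0}"

# Hypothetical helper: print the capture command instead of running it.
# -n  do not resolve names
# -e  show the link-layer header, so source/destination MACs are visible
# -i  capture on the underlying interface, not on bat0 or a bridge
capture_cmd() {
    printf 'tcpdump -n -e -i %s icmp or arp\n' "$1"
}

capture_cmd "$IFACE"
```

Including `arp` in the filter is a design choice: if the ping fails, it helps to see whether the (broadcast) ARP request arrives while the unicast reply or echo request does not.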
100% packet loss when the offline condition occurs. `batctl o`, on the other hand, looks just fine.
Sounds to me like "mesh0" is still able to transport broadcast frames (which are used for the OGMs - which "create" the originator lists in `batctl o`). And if you cannot send unicast frames anymore on mesh0 then something is wrong with the unicast part.
For example, when you are using encryption for the mesh0 link, maybe the group key is still set correctly but something happened with the pairwise key and it is now "corrupted".
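To tell broadcast from unicast frames in a `tcpdump -e` capture, one can look at the destination MAC address. The following is a minimal sketch, not something from the original mail: `classify_mac` is a hypothetical helper, and the addresses in the usage lines are made up. It relies on the standard IEEE 802 rule that the least significant bit of the first octet marks a group (multicast/broadcast) address.

```shell
#!/bin/sh
# Hypothetical helper: classify a destination MAC address
# as it appears in `tcpdump -e` output.
classify_mac() {
    # Normalize to lower case so the broadcast comparison is reliable.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    if [ "$mac" = "ff:ff:ff:ff:ff:ff" ]; then
        echo broadcast
        return
    fi
    first_octet=${mac%%:*}
    # Bit 0 of the first octet is the group bit (set for multicast).
    if [ $(( 0x$first_octet & 1 )) -eq 1 ]; then
        echo multicast
    else
        echo unicast
    fi
}

classify_mac ff:ff:ff:ff:ff:ff   # prints "broadcast"
classify_mac 02:00:00:00:00:01   # prints "unicast"
```

If the capture on the receiving side shows only frames classified as broadcast while the pings (unicast) never arrive, that would be consistent with the guess above that only the unicast path is affected.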
Kind regards, Sven
Thanks very much for the advice and clues. I'll report what happens.
By the way, the problem *never* occurs when all devices are inside my house. It only happens in the field. It will take a long time to do this test, because I'll have to set one device up, first, in a remote location, wait for the problem to occur, and then perform the test. If the problem doesn't occur, I assume that would be significant, too.
On 5/28/20 3:31 PM, Sven Eckelmann wrote:
[please don't send me private mails about batman-adv - unless you have a really good reason to do so. And if not stated otherwise, I must assume that you actually wanted to send your message to the mailing list]
I did. Oops.
b.a.t.m.a.n@lists.open-mesh.org