I'm working on a port of the BATMAN-ADV protocol to FreeRTOS, specifically running on the ESP32 from Espressif.
I'm currently sending and receiving what seem to be correctly formed OGM messages from BATMAN IV. For now, my goal is interoperability with Raspberry Pis running BATMAN from stock Raspbian.
From the RasPi, I am able to see the ESP32 as a neighbor, but not as an originator:

pi@pi-16:~ $ sudo batctl n
[B.A.T.M.A.N. adv 2017.3, MainIF/MAC: wlan0/b8:27:eb:a0:2a:53 (bat0/e6:46:65:5b:c4:c4 BATMAN_IV)]
IF             Neighbor              last-seen
wlan0          30:ae:a4:3b:9a:d4    0.090s
pi@pi-16:~ $ sudo batctl o
[B.A.T.M.A.N. adv 2017.3, MainIF/MAC: wlan0/b8:27:eb:a0:2a:53 (bat0/e6:46:65:5b:c4:c4 BATMAN_IV)]
Originator last-seen (#/255) Nexthop [outgoingIF]

Can someone explain to me why this might be occurring, and what I'm missing? Currently, the code is doing two things: sending a new OGM with an incrementing sequence number every 1000 ms, and re-broadcasting any received OGMs with a decremented TTL, modified TQ, and DirectLink flag set whenever it receives them (also every 1000 ms or so). Internally, I'm getting close to having the routing algorithm implemented, but it looks like I'm doing something "wrong" in the eyes of the BATMAN implementation on the Pis. I'd be happy to provide packet dumps on request.
Thank you in advance. John Gorkos
On Monday, 1 October 2018 20:01:25 CEST John Gorkos wrote: [...]
From the RasPi, I am able to see the ESP32 as a neighbor, but not as an originator:
[...]
Can someone explain to me why this might be occurring, and what I'm missing?
Looks like no one answered here, most likely because it could be anything. You are comparing two different concepts here: neighbors and originators. The neighbor list is a rather new feature [1] (compared to the actual B.A.T.M.A.N. IV implementation). So a good idea would be to download the newest batman-adv + batctl [2] and compile it with debugging enabled [3] to get more insight [4].
I also don't know what kind of batctl you are using at the moment. Let us assume for now that the bug is not in batctl (just check whether the newest batctl receives the same info via netlink as the file in debugfs [5]). Keep in mind that both originator output methods remove some originators from the list when they look "invalid" [6,7].
Kind regards, Sven
[1] https://git.open-mesh.org/batman-adv.git/commit/fed2826b490ce1daaf039a87a5b2...
[2] https://www.open-mesh.org/projects/open-mesh/wiki/Download
[3] https://www.open-mesh.org/projects/batman-adv/wiki/Faq#Log-file-doesnt-exist...
[4] https://www.open-mesh.org/projects/batman-adv/wiki/Understand-your-batman-ad...
[5] https://www.open-mesh.org/projects/batman-adv/wiki/Understand-your-batman-ad...
[6] https://git.open-mesh.org/batman-adv.git/blob/6f6cc4f54909127b236069207fe569...
[7] https://git.open-mesh.org/batman-adv.git/blob/6f6cc4f54909127b236069207fe569...
Hi John,
On Mon, Oct 01, 2018 at 11:01:25AM -0700, John Gorkos wrote:
I'm working on a port of the BATMAN-ADV protocol to FreeRTOS, specifically running on the ESP32 from Espressif.
Awesome! I've never seen batman-adv running on such a device. And it looks like you are already quite close to being the first one accomplishing this.
[...] Can someone explain to me why this might be occurring, and what I'm missing? Currently, the code is doing two things: sending a new OGM with an incrementing sequence number every 1000 ms,
and re-broadcasting any received OGMs with a decremented TTL, modified TQ, and DirectLink flag set whenever it receives them (also every 1000 ms or so).
This sounds wrong. You should not rebroadcast every OGM. You should only rebroadcast the one from the best candidate. I'm wondering whether this could cause issues. Or are you testing with a single Espressif device and a single Pi only for now?
If so, then it's probably more likely that just some flags or addresses got mixed up. Would it be possible for you to share a tcpdump capture from the Pi? You can also check with "batctl log" whether the Pi seems to recognize the ESP32 as an originator.
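For checking flags and addresses in a capture, a small decoder can help. The following is only a sketch: the header layout (compat version 15, 24 bytes) and the flag bit values are written from memory of the batman-adv packet header and are assumptions to verify against the source tree in use. The names `parse_ogm`, `Ogm`, and `mac` are hypothetical helpers, not part of batctl or batman-adv.

```python
import struct
from typing import NamedTuple

# ASSUMPTION: BATMAN IV OGM header layout for compat version 15, in network
# byte order. Check every field against packet.h in the tree you build from.
# type, version, ttl, flags, seqno, orig, prev_sender, reserved, tq, tvlv_len
OGM_FORMAT = "!BBBBI6s6sBBH"
OGM_LEN = struct.calcsize(OGM_FORMAT)

# ASSUMPTION: flag bit values, also to be verified against the source.
FLAG_NOT_BEST_NEXT_HOP = 0x01
FLAG_PRIMARIES_FIRST_HOP = 0x02
FLAG_DIRECTLINK = 0x04


class Ogm(NamedTuple):
    packet_type: int
    version: int
    ttl: int
    flags: int
    seqno: int
    orig: str
    prev_sender: str
    tq: int
    tvlv_len: int


def mac(raw: bytes) -> str:
    """Format six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ogm(payload: bytes) -> Ogm:
    """Decode the fixed OGM header found after the Ethernet header."""
    if len(payload) < OGM_LEN:
        raise ValueError("payload too short for an OGM header")
    (ptype, version, ttl, flags, seqno,
     orig, prev, _reserved, tq, tvlv_len) = struct.unpack_from(OGM_FORMAT, payload)
    return Ogm(ptype, version, ttl, flags, seqno,
               mac(orig), mac(prev), tq, tvlv_len)
```

Feeding it the bytes that follow the 14-byte Ethernet header of each captured frame makes it easy to compare the originator address, previous sender, TTL, and DirectLink bit of the ESP32's OGMs against those a Pi sends.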
Also, have you tested with two Pis and no Espressif device? Are the Pis themselves forming a mesh just fine for you? Just to rule out that the issue is on the Pi side.
Regards, Linus
Thank you, Linus and Sven, for your responses. My comments are in-line.
On 10/6/18 05:14, Linus Lüssing wrote:
Hi John,
On Mon, Oct 01, 2018 at 11:01:25AM -0700, John Gorkos wrote:
I'm working on a port of the BATMAN-ADV protocol to FreeRTOS, specifically running on the ESP32 from Espressif.
Awesome! I've never seen batman-adv running on such a device. And it looks like you are already quite close to being the first one accomplishing this.
Getting closer. This weekend, I got the Raspi recognizing the ESP32 as both an originator and a neighbor:

pi@pi-16:~ $ sudo batctl n
[B.A.T.M.A.N. adv 2017.3, MainIF/MAC: wlan0/b8:27:eb:a0:2a:53 (bat0/ca:bf:41:a6:af:81 BATMAN_IV)]
IF             Neighbor              last-seen
wlan0          02:ae:a4:3b:9a:d4    0.150s
pi@pi-16:~ $ sudo batctl o
[B.A.T.M.A.N. adv 2017.3, MainIF/MAC: wlan0/b8:27:eb:a0:2a:53 (bat0/ca:bf:41:a6:af:81 BATMAN_IV)]
   Originator        last-seen (#/255) Nexthop           [outgoingIF]
 * 02:ae:a4:3b:9a:d4    0.430s   (255) 02:ae:a4:3b:9a:d4 [ wlan0]

[...] Can someone explain to me why this might be occurring, and what I'm missing? Currently, the code is doing two things: sending a new OGM with an incrementing sequence number every 1000 ms,
and re-broadcasting any received OGMs with a decremented TTL, modified TQ, and DirectLink flag set whenever it receives them (also every 1000 ms or so).
This sounds wrong. You should not rebroadcast every OGM. You should only rebroadcast the one from the best candidate. I'm wondering whether this could cause issues. Or are you testing with a single Espressif device and a single Pi only for now?
I must have a misunderstanding of the BATMAN IV OGM mechanism. My understanding is that ALL 1-hop (direct) neighbors will rebroadcast the heard OGM, and beyond that, only best-hop candidates. Right now, I am just testing with a single pi and single ESP32, but I have an ample supply of both. I figured I'd crawl before I tried to walk.
If so, then it's probably more likely that just some flags or addresses got mixed up. Would it be possible for you to share a tcpdump capture from the Pi? You can also check with "batctl log" whether the Pi seems to recognize the ESP32 as an originator.
Raspbian comes with debugfs for Batman disabled, so I wound up recompiling BATMAN for the pi with extra debugging enabled. It's been invaluable in tracking down where I'm wrong.
Also, have you tested with two Pis and no Espressif device? Are the Pis themselves forming a mesh just fine for you? Just to rule out that the issue is on the Pi side.
Regards, Linus
So, to be clear, I'm working on BATMAN IV and not BATMAN V. Based on the open-mesh docs, BATMAN V networks don't rebroadcast OGMs but instead use the Echo Location Packets. Am I reading that wrong? If there's a compelling reason to move to BATMAN V, I'd be happy to take a look at it. We don't anticipate more than 4 hops between links, very low traffic (ESP32s really don't have the capacity to generate a lot of traffic quickly) and no more than 20 or so devices in a mesh for starters.
Right now, I'm working on the unicast VLAN TVLV response for the VLAN TVLV query sent by the Pi when my ESP32 first comes up and begins announcing. The Pi is quite insistent that it get an answer... :)
Thank you! John Gorkos
On Monday, 8 October 2018 18:43:37 CEST John Gorkos wrote: [...]
This sounds wrong. You should not rebroadcast every OGM. You should only rebroadcast the one from the best candidate. I'm wondering whether this could cause issues. Or are you testing with a single Espressif device and a single Pi only for now?
I must have a misunderstanding of the BATMAN IV OGM mechanism. My understanding is that ALL 1-hop (direct) neighbors will rebroadcast the heard OGM, and beyond that, only best-hop candidates.
Yes, single hop neighbors have special rules. But it is also important to check for the incoming+outgoing interface [1].
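To make the distinction between single-hop and multi-hop OGMs concrete, here is a rough sketch of a rebroadcast decision. It is one reading of the rules discussed in this thread plus details recalled from the Linux implementation, not a copy of it: the function name `rebroadcast`, its parameters, the flag bit values, and the TTL threshold for differing interfaces are all assumptions to check against the code referenced in [1].

```python
from typing import Optional, Tuple

# ASSUMPTION: flag bit values, to be verified against the batman-adv source.
FLAG_NOT_BEST_NEXT_HOP = 0x01
FLAG_DIRECTLINK = 0x04


def rebroadcast(ttl: int, flags: int, *, single_hop: bool,
                from_best_next_hop: bool, bidirectional: bool,
                duplicate: bool,
                same_interface: bool = True) -> Optional[Tuple[int, int]]:
    """Return (new_ttl, new_flags) to rebroadcast, or None to drop the OGM."""
    if single_hop:
        # Direct neighbors are echoed even before the link is known to be
        # bidirectional. ASSUMPTION: a low-TTL OGM is not carried over to a
        # different outgoing interface.
        if ttl <= 2 and not same_interface:
            return None
    else:
        # Multi-hop OGMs: only fresh ones that arrived over a bidirectional
        # link from the currently selected best next hop.
        if not bidirectional or duplicate or not from_best_next_hop:
            return None

    if ttl <= 1:
        return None  # TTL exhausted, nothing left to forward

    new_flags = flags & ~(FLAG_NOT_BEST_NEXT_HOP | FLAG_DIRECTLINK)
    if single_hop:
        new_flags |= FLAG_DIRECTLINK
        if not from_best_next_hop:
            new_flags |= FLAG_NOT_BEST_NEXT_HOP
    return ttl - 1, new_flags
```

With a single Pi and a single ESP32 every OGM is a single-hop OGM, so this logic only starts to matter once a third node joins the mesh.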
[...]
So, to be clear, I'm working on BATMAN IV and not BATMAN V. Based on the open-mesh docs, BATMAN V networks don't rebroadcast OGMs but instead use the Echo Location Packets. Am I reading that wrong?
Yes, you are reading that wrong. ELP is used in a link-local context (detecting neighbors and announcing a node to its neighbors). OGM2 packets are used to inform the complete mesh.
If there's a compelling reason to move to BATMAN V, I'd be happy to take a look at it.
B.A.T.M.A.N. V uses a different metric. IV uses one that is basically packet-loss based for direct neighbors, while V uses the expected throughput for direct neighbors. Unfortunately, the ELP documentation [2] in the wiki is outdated [3].
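In IV the metric travels in each OGM as the TQ value (0 to 255), and every node that rebroadcasts an OGM scales it down first, which is the "modified TQ" step mentioned earlier in the thread. A minimal sketch of that scaling follows. The function name `apply_hop_penalty` is hypothetical, and the default penalty of 30 is an assumption, since the value is configurable per mesh interface.

```python
TQ_MAX = 255  # TQ is carried as a single byte in the OGM


def apply_hop_penalty(tq: int, hop_penalty: int = 30) -> int:
    """Scale a received TQ down before rebroadcasting the OGM.

    ASSUMPTION: hop_penalty defaults to 30 here; check the value
    configured on the mesh you are joining.
    """
    if not 0 <= tq <= TQ_MAX:
        raise ValueError("tq must fit in one byte")
    if not 0 <= hop_penalty <= TQ_MAX:
        raise ValueError("hop_penalty must be between 0 and 255")
    # Integer arithmetic only, so it ports directly to a microcontroller.
    return tq * (TQ_MAX - hop_penalty) // TQ_MAX
```

Because the penalty compounds on every hop, longer paths lose to shorter ones of equal link quality, which keeps a small mesh of a few hops from preferring detours.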
Kind regards, Sven
[1] https://git.open-mesh.org/batman-adv.git/blob/6f6cc4f54909127b236069207fe569...
[2] https://www.open-mesh.org/projects/batman-adv/wiki/ELP
[3] https://www.open-mesh.org/issues/363
b.a.t.m.a.n@lists.open-mesh.org