Hi Robert,
On 05/01/18 09:16, Robert Bates wrote:
Hi Simon,
Thanks for the reply. I'd already proposed changing the timer value to my team (e.g., to 1 hour), and we're likely going to get that done (again, we license this product from another developer, and don't even have direct access to the code under our current arrangement). I agree that it seems to be the most straightforward solution, but others in the team feel that the desired fix will involve either some type of "ARP if unknown", or a timer-based ARP mechanism on the part of some other component/piece of software.
What exactly do you mean by "ARP if unknown"? In theory, the ARP packet should be sent by precisely the client that is now unknown. Are you proposing some kind of reactive discovery of unknown clients?
I like your idea of periodically reading tt_local and pinging the clients. I'm going to bring that up when we have the next in-person discussion about a fix (a few people are still out on holiday vacation). I like the fact that with that approach, the additional traffic introduced is only toward locally connected clients, and not over the mesh. Each node is doing this with its locally connected clients, circumventing timeout and removal. I like it. You mentioned some potential pitfalls. We will talk through what those might look like in our customer environments, but my sense is that there is no significant downside. I could be wrong.
Don't forget that this can get racy: by the time you iterate over the local table, the clients you are interested in may already have disappeared (unless you make the probe reliable and keep its interval shorter than the TT timeout).
At the same time, how do you distinguish between clients that have to be pinged and clients that do not?
Another thing: a simple ICMP ping from the mesh node to the local client won't be enough, because no packet generated by the client will enter the bat0 interface, so the client won't be detected by batman-adv.
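
To put a rough number on the race mentioned above, here is a minimal sketch (Python, not batman-adv code) of how short the probe interval would have to be so that an entry is still refreshed before it times out, even if a few probes in a row go unanswered. The function name, the "tolerated losses" idea and the safety margin are my own assumptions for illustration; the 1 hour timeout is just the value you mentioned, and it says nothing about how the probe itself gets the client's traffic into bat0.

```python
def max_probe_interval(tt_timeout_s: float,
                       tolerated_losses: int,
                       margin_s: float = 0.0) -> float:
    """Upper bound (in seconds) for the probe interval.

    Worst case: the entry was last refreshed at t=0 and the next
    `tolerated_losses` probes all fail. The probe after those fires at
    (tolerated_losses + 1) * interval and must land before the timeout,
    minus a safety margin for scheduling jitter (hypothetical knob).
    """
    if tolerated_losses < 0:
        raise ValueError("tolerated_losses must be >= 0")
    usable = tt_timeout_s - margin_s
    if usable <= 0:
        raise ValueError("margin must be smaller than the TT timeout")
    return usable / (tolerated_losses + 1)


if __name__ == "__main__":
    # Assumed values: 1 hour timeout, survive 2 lost probes, 60 s margin.
    print(max_probe_interval(3600, 2, 60))  # 1180.0
```

So with a 1 hour timeout and two tolerated losses you would have to probe at least every 1180 seconds or so; with a shorter timeout the interval shrinks accordingly, and the extra local traffic grows.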
Cheers,