On Thursday 06 August 2009 16:44:28 Yang Su wrote:
- You are right. Under certain circumstance, this approach may lead to less
optimal route selection. Just to understand batman algorithm better, I try to make an example (please correct me if something's wrong here): There are two direct neighbors via which a sender communicates with a receiver. Via the good neighbor, their is a single good route with long delay. Between the bad neighbor and the receiver there are many parallel bad routes. Each bad route has high packet loss rate but very short delay. The receiver's OGMs arrive at the sender first via the bad neighbor. Although there is substantial OGM loss on each individual bad routes, by aggregation over those parallel bad routes, the bad neighbor is still able to relay most of sender's OGM to the receiver. As a result, if the sender already chooses the bad neighbor as the next hop, it will stick to it for quite long time and does not switch to the good neighbor.
It is possble to simplify your example more to clearly illustrate the problem. Scenario: Host A wants to send data to host B. Consider the beautiful graphic I attached. ;-) The link between A and B is quite asymetric. No packet loss from B to A but heavy packet loss the other way round. Now, B periodically sends its OGMs and all OGMs will arrive at A directly and via the neighbor whereas the OGMs via the neighbor always be slower than the directly received ones. A should choose the longer path via the neighbor because it has no packet loss but the "fastest sequence number wins" algorithm won't allow that.
- This approach achieves the similar goal as the seqno-based one: switch to
a neighbor only when this neighbor is better and has more up-to-date information than the current best route. But this approach is less constrained and seems not to suffer from the above problem for the seqno-based approach. I have tested the patch on our test environment. It works pretty well on the test cases.
Great! I comitted the patch immediately. Thanks again for your thorough analysis and ideas. :-)
- The problem might happen when the neighbor thinks that I am his best
neighbor (his good number is actually from me). In this case, my estimation of that neighbor's TQ is actually decided by my own TQ (my TQ minus one hop penalty). When my TQ drops down, the correct behavior should be that my estimation of that neighbor's TQ also drops down accordingly. However, when echo cancellation steps in, my estimation of that neighbor's TQ will not be updated and remains to be the stale value. This might cause problems (e.g. looping) in corner cases even with the new patch. Does this analysis make sense or I miss some details of the echo cancellation?
Unless you found another bug the echo cancellation should not hinder the propagation of our own TQ value. Once we (uml2) received a packet we will rebroadcast it. Our neighbor (uml1) will receive it (unless the packet got lost) and update its routing tables accordingly. Now, he will rebroadcast the packet as well which will be killed by the echo cancellation on uml1 as this packet does not contain new information. But even if the TQ received via uml2 is worse uml1 might still have a good TQ from uml3 in its routing table which might get rebroadcasted.
Regards, Marek