- This approach achieves a similar goal to the seqno-based one: switch to a neighbor only when this neighbor is better and has more up-to-date information than the current best route. But this approach is less constrained and does not seem to suffer from the problem described above for the seqno-based approach. I have tested the patch in our test environment. It works pretty well on the test cases.
Great! I committed the patch immediately. Thanks again for your thorough analysis and ideas. :-)
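For readers following along, the switching rule quoted above can be sketched roughly as below. This is only an illustration and not the committed patch: the `Route` type, the `freshness` field, and `should_switch` are hypothetical names, and how the patch actually measures "more up-to-date" is not stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class Route:
    neighbor: str
    tq: int         # transmit quality toward the destination (0..255)
    freshness: int  # hypothetical marker of how recent the info is; larger is newer


def should_switch(current: Route, candidate: Route) -> bool:
    """Switch only if the candidate is both better and more up to date."""
    return candidate.tq > current.tq and candidate.freshness > current.freshness


current = Route("uml1", tq=180, freshness=10)
better_and_newer = Route("uml3", tq=200, freshness=11)
better_but_stale = Route("uml3", tq=200, freshness=9)

print(should_switch(current, better_and_newer))
print(should_switch(current, better_but_stale))
```

The second candidate has the higher TQ but older information, so under this rule the current route is kept.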
- Some updates: I did more thorough testing. It turns out that this patch does not really solve the problem. The method does not seem able to completely eliminate all the forms of looping that occur in the test cases. This results in a special pattern: the rerouting is fast (~15 seconds), then (after 3~5 seconds) the network enters the looping state, and it normally takes a long time to recover. Last time I ran the tests, I stopped them right after I observed the successful rerouting. I haven't looked into the cause of this problem yet. Any input is welcome.
- The problem might happen when the neighbor thinks that I am his best neighbor (his good number actually comes from me). In this case, my estimation of that neighbor's TQ is actually determined by my own TQ (my TQ minus one hop penalty). When my TQ drops, the correct behavior should be that my estimation of that neighbor's TQ drops accordingly. However, when echo cancellation steps in, my estimation of that neighbor's TQ will not be updated and remains at the stale value. This might cause problems (e.g. looping) in corner cases even with the new patch. Does this analysis make sense, or am I missing some details of the echo cancellation?
Unless you found another bug, the echo cancellation should not hinder the propagation of our own TQ value. Once we (uml2) have received a packet, we will rebroadcast it. Our neighbor (uml1) will receive it (unless the packet got lost) and update its routing tables accordingly. Now he will rebroadcast the packet as well, which will be killed by the echo cancellation on uml1 as this packet does not contain new information. But even if the TQ received via uml2 is worse, uml1 might still have a good TQ from uml3 in its routing table, which might get rebroadcast.
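The concern raised in the quoted analysis can be made concrete with a toy model. It only illustrates the hypothesis (an estimate that stays stale when the update carrying the lower TQ is suppressed); it does not show that batman actually behaves this way, which is exactly what the reply above disputes. `HOP_PENALTY = 10`, the TQ values, and both function names are assumptions chosen for the example.

```python
HOP_PENALTY = 10  # assumed penalty per hop, for illustration only


def neighbor_tq_via_me(my_tq: int) -> int:
    """TQ the neighbor advertises when its best route goes through me."""
    return max(my_tq - HOP_PENALTY, 0)


def update_estimate(old_estimate: int, my_tq: int, echo_suppressed: bool) -> int:
    """My estimate of the neighbor's TQ after my own TQ has changed.

    If the packet carrying the new value is dropped as an echo,
    the old estimate is kept unchanged (the hypothesised failure).
    """
    if echo_suppressed:
        return old_estimate
    return neighbor_tq_via_me(my_tq)


estimate = neighbor_tq_via_me(200)  # estimate while my TQ is still 200
my_tq = 120                         # my own TQ then drops to 120

fresh = update_estimate(estimate, my_tq, echo_suppressed=False)
stale = update_estimate(estimate, my_tq, echo_suppressed=True)

# With the stale estimate the neighbor looks better than my own route,
# even though its value was derived from my old TQ.
print(fresh, stale, stale > my_tq)
```

In this model the fresh estimate stays below my own TQ, while the stale one exceeds it, which is the condition under which I could pick that neighbor as next hop and form a loop.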
- In the following example (also appended in the attachment :) ), if I understand the current echo cancellation implementation correctly, batman will enter permanent looping between A and B. In this example, A sends to F, and all the links are perfect and have the same delay. The only exception is link A-E, which is an asymmetric link.
Regards, Yang (See attached file: looping.pdf)