- This approach achieves a similar goal to the seqno-based one: switch to
a neighbor only when this neighbor is better and has more up-to-date
information than the current best route. But this approach is less
constrained and does not seem to suffer from the problem described above
for the seqno-based approach. I have tested the patch in our test
environment. It works pretty well on the test cases.
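The switching rule described above can be sketched roughly as follows.
This is only an illustration, not the patch itself: the `Route` type, the
`freshness` field (any comparable value where larger means newer
information) and `should_switch` are hypothetical names, and the 0..255 TQ
range is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Route:
    neighbor: str      # name of the next hop
    tq: int            # transmit quality, assumed 0..255, higher is better
    freshness: float   # hypothetical: larger means more recent information


def should_switch(current: Route, candidate: Route) -> bool:
    """Switch only if the candidate is both better and more up to date."""
    is_better = candidate.tq > current.tq
    is_fresher = candidate.freshness > current.freshness
    return is_better and is_fresher


current = Route("B", tq=200, freshness=10.0)
print(should_switch(current, Route("C", tq=220, freshness=12.0)))  # better and fresher
print(should_switch(current, Route("D", tq=220, freshness=9.0)))   # better but stale
```

A better TQ alone is not enough under this rule: a candidate whose
information is older than the current best route is ignored.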
Great! I committed the patch immediately. Thanks again for your thorough
analysis and ideas. :-)
- Some updates: I did more thorough testing. It turns out that this patch
does not really solve the problem. This method does not seem able to
completely eliminate all the forms of looping that happen in the test
cases. This results in a special pattern: the rerouting is fast (~15
seconds), then (after 3-5 seconds) the network enters the looping state,
and it normally takes a long time to recover. Last time I did tests, I
stopped them right after I observed the successful rerouting. I have not
looked into the cause of this problem yet. Any input is welcome.
- The problem might happen when the neighbor thinks that I am his best
neighbor (his good number is actually from me). In this case, my
estimation of that neighbor's TQ is actually decided by my own TQ (my TQ
minus one hop penalty). When my TQ drops, the correct behavior should be
that my estimation of that neighbor's TQ also drops accordingly. However,
when echo cancellation steps in, my estimation of that neighbor's TQ will
not be updated and stays at the stale value. This might cause problems
(e.g. looping) in corner cases even with the new patch. Does this analysis
make sense, or am I missing some details of the echo cancellation?
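The scenario hypothesized above can be written down as a toy model. It
only illustrates the suspected behavior and is not how batman actually
implements echo cancellation: `HOP_PENALTY`, `neighbor_tq_via_me` and
`simulate` are hypothetical names, the penalty value is made up, and the
assumption that the first echo is accepted so that an estimate exists at
all is a simplification.

```python
HOP_PENALTY = 10  # hypothetical value, for illustration only


def neighbor_tq_via_me(my_tq: int) -> int:
    """TQ the neighbor advertises when I am his best neighbor."""
    return max(my_tq - HOP_PENALTY, 0)


def simulate(my_tq_history, echo_cancellation: bool):
    """Return (my TQ, my estimate of the neighbor's TQ) for each step."""
    estimate = None
    result = []
    for my_tq in my_tq_history:
        advertised = neighbor_tq_via_me(my_tq)
        # Simplification: the first echo is accepted so an estimate exists.
        # Later echoes only update the estimate if they are not dropped.
        if estimate is None or not echo_cancellation:
            estimate = advertised
        result.append((my_tq, estimate))
    return result


history = [250, 150, 50]  # my own TQ dropping over time
print(simulate(history, echo_cancellation=True))
print(simulate(history, echo_cancellation=False))
```

In this model, with echo cancellation the estimate stays at 240 while my
own TQ falls to 50, so the neighbor looks better than I am even though his
value came from me. Without it, the estimate follows my TQ down (240, 140,
40).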
Unless you found another bug, the echo cancellation should not hinder the
propagation of our own TQ value. Once we (uml2) receive a packet, we will
rebroadcast it. Our neighbor (uml1) will receive it (unless the packet got
lost) and update its routing tables accordingly. Now, he will rebroadcast
the packet as well, which will be killed by the echo cancellation on uml1
as this packet does not contain new information.
But even if the TQ received via uml2 is worse, uml1 might still have a good
TQ from uml3 in its routing table which might get rebroadcasted.
- In the following example (also appended in the attachment :) ), if I
understand the current echo cancellation implementation correctly, batman
will enter permanent looping between A and B. In this example, A sends to
F, and all the links are perfect and have the same delay. The only
exception is link A-E, which is an asymmetric link.
Regards,
Yang
(See attached file: looping.pdf)