On Thursday 06 August 2009 16:44:28 Yang Su wrote:
- You are right. Under certain circumstance, this
approach may lead to less
optimal route selection. Just to understand batman algorithm better, I try
to make an example (please correct me if something's wrong here): There are
two direct neighbors via which a sender communicates with a receiver. Via
the good neighbor, their is a single good route with long delay. Between
the bad neighbor and the receiver there are many parallel bad routes. Each
bad route has high packet loss rate but very short delay. The receiver's
OGMs arrive at the sender first via the bad neighbor. Although there is
substantial OGM loss on each individual bad routes, by aggregation over
those parallel bad routes, the bad neighbor is still able to relay most of
sender's OGM to the receiver. As a result, if the sender already chooses
the bad neighbor as the next hop, it will stick to it for quite long time
and does not switch to the good neighbor.
It is possble to simplify your example more to clearly illustrate the problem.
Scenario: Host A wants to send data to host B. Consider the beautiful graphic
I attached. ;-)
The link between A and B is quite asymetric. No packet loss from B to A but
heavy packet loss the other way round. Now, B periodically sends its OGMs and
all OGMs will arrive at A directly and via the neighbor whereas the OGMs via
the neighbor always be slower than the directly received ones. A should choose
the longer path via the neighbor because it has no packet loss but the
"fastest sequence number wins" algorithm won't allow that.
- This approach achieves the similar goal as the
seqno-based one: switch to
a neighbor only when this neighbor is better and has more up-to-date
information than the current best route. But this approach is less
constrained and seems not to suffer from the above problem for the
seqno-based approach. I have tested the patch on our test environment. It
works pretty well on the test cases.
Great! I comitted the patch immediately. Thanks again for your thorough
analysis and ideas. :-)
- The problem might happen when the neighbor thinks
that I am his best
neighbor (his good number is actually from me). In this case, my estimation
of that neighbor's TQ is actually decided by my own TQ (my TQ minus one hop
penalty). When my TQ drops down, the correct behavior should be that my
estimation of that neighbor's TQ also drops down accordingly. However, when
echo cancellation steps in, my estimation of that neighbor's TQ will not be
updated and remains to be the stale value. This might cause problems (e.g.
looping) in corner cases even with the new patch. Does this analysis make
sense or I miss some details of the echo cancellation?
Unless you found another bug the echo cancellation should not hinder the
propagation of our own TQ value. Once we (uml2) received a packet we will
rebroadcast it. Our neighbor (uml1) will receive it (unless the packet got
lost) and update its routing tables accordingly. Now, he will rebroadcast the
packet as well which will be killed by the echo cancellation on uml1 as this
packet does not contain new information.
But even if the TQ received via uml2 is worse uml1 might still have a good TQ
from uml3 in its routing table which might get rebroadcasted.
Regards,
Marek