Hi all, as announced some weeks ago, I'm working on a modification of the B.A.T.M.A.N. routing metric that takes the link bit-rate into account. As already explained, in my opinion it is important to consider the bit-rate because there is no correlation between TQ and the physical capacity of a link.
I have called the new metric PCE (Physical Capacity Estimation), which is defined for each path as the minimum value of (tq_local * bit-rate) among all the links forming it.
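For clarity, here is a minimal sketch of that definition in C. All names are illustrative, not taken from the actual patch, and the bit-rate unit is an assumption:

#include <stdint.h>

struct pce_link {
        uint8_t tq_local;   /* local link quality, 0..255 */
        uint32_t bitrate;   /* link bit-rate, e.g. in units of 100 kbit/s */
};

/* PCE of a path: the minimum of (tq_local * bit-rate) over its links. */
static uint64_t pce_path_metric(const struct pce_link *links, unsigned int n)
{
        uint64_t pce = UINT64_MAX;
        unsigned int i;

        for (i = 0; i < n; i++) {
                uint64_t link_pce = (uint64_t)links[i].tq_local * links[i].bitrate;

                if (link_pce < pce)
                        pce = link_pce;
        }

        return n ? pce : 0;
}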
After a quick and dirty modification of the code, I ran some tests on two simple topologies to verify whether there is an improvement or not. These preliminary tests suggest that the improvement is remarkable.
The attached report contains a more detailed explanation of the modification and the results of the tests. Any comments will be appreciated!
Regards.
Hi,
> I have called the new metric PCE (Physical Capacity Estimation), which is defined for each path as the minimum value of (tq_local * bit-rate) among all the links forming it.
> After a quick and dirty modification of the code, I ran some tests on two simple topologies to verify whether there is an improvement or not. These preliminary tests suggest that the improvement is remarkable.
thanks for posting your paper. I have a few remarks:
* You claim more than once in your paper that route flapping degrades performance, without providing detailed explanations. Do you have further proofs / observations why and how route flapping leads to performance degradations? In your test setup the performance penalty comes from the fact that a suboptimal path is chosen, not because the route changed. You would need two equally good paths to test the performance influence of route flapping.
* The concept of attaching all hops and their information to the OGM brings us dangerously close to the problems other routing protocols suffer from (especially Link State): the further a node is from the source, the more outdated its information is. Imagine a 10 hop path - how can we rely on this information at the end of that path? Furthermore, making a decision at the end of the path does not mean the next hop (which has newer information) agrees with us on that path. It might send the packet somewhere else. All that depends on the point at which the bandwidth influences the routing decision. Do you mind providing a more precise example of how the route switching happens? I hope you thought about loops..
* Taking throughput into the routing equation is an old dream that often clashes with reality. Any idea how to solve the following real world issues:
  - Different drivers have different bit-rate APIs (the mac80211 stack is supposed to address that but we are not there yet).
  - Bit-rates are not a fixed value either - minstrel for example maintains a table of 3 different bit-rates. How do you obtain one value? What about the other bit-rate algorithms?
  - How to handle interfaces that have no bit-rate (VPN / cable / etc)?
  - Bit-rates are a best guess too, which might drop rapidly as soon as you try to actually use the link.
Regards, Marek
2011/5/7 Marek Lindner lindner_marek@yahoo.de:
> Hi,
>
>> I have called the new metric PCE (Physical Capacity Estimation), which is defined for each path as the minimum value of (tq_local * bit-rate) among all the links forming it.
>> After a quick and dirty modification of the code, I ran some tests on two simple topologies to verify whether there is an improvement or not. These preliminary tests suggest that the improvement is remarkable.
>
> thanks for posting your paper. I have a few remarks:
>
> - You claim more than once in your paper that route flapping degrades performance, without providing detailed explanations. Do you have further proofs / observations why and how route flapping leads to performance degradations? In your test setup the performance penalty comes from the fact that a suboptimal path is chosen, not because the route changed. You would need two equally good paths to test the performance influence of route flapping.
You are right, route flapping requires a deeper study, using equivalent links and measuring whether there is a real performance degradation or not. In any case, the fact that batman changes routes frequently is proven. I will certainly do some experiments regarding this aspect.
> - The concept of attaching all hops and their information to the OGM brings us dangerously close to the problems other routing protocols suffer from (especially Link State): the further a node is from the source, the more outdated its information is. Imagine a 10 hop path - how can we rely on this information at the end of that path? Furthermore, making a decision at the end of the path does not mean the next hop (which has newer information) agrees with us on that path. It might send the packet somewhere else. All that depends on the point at which the bandwidth influences the routing decision. Do you mind providing a more precise example of how the route switching happens? I hope you thought about loops..
It is not real Link State routing; maybe the document is not completely clear. Before rebroadcasting the OGM, every node attaches its current best path toward the originator node. I could do this without adding a hop list, by simply substituting the TQ with the PCE. This modification, as explained in the report, has been done to permit the future addition of further information about the links such as, for example, the load.
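To illustrate the hop-list-free variant, a minimal sketch (names illustrative, not from the patch): a node receiving an OGM that already carries the path PCE only has to fold in its own incoming link before rebroadcasting.

#include <stdint.h>

/* Combine the local link with the path PCE carried by the OGM: the path
 * metric stays the minimum of (tq_local * bit-rate) over all links, but
 * no hop list has to travel with the OGM. */
static uint64_t pce_update_on_rebroadcast(uint64_t ogm_pce,
                                          uint8_t tq_local, uint32_t bitrate)
{
        uint64_t link_pce = (uint64_t)tq_local * bitrate;

        return link_pce < ogm_pce ? link_pce : ogm_pce;
}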
> - Taking throughput into the routing equation is an old dream that often clashes with reality. Any idea how to solve the following real world issues:
>
> - Different drivers have different bit-rate APIs (the mac80211 stack is supposed to address that but we are not there yet).
This is a major problem. I have based my work on mac80211, which is the de-facto future standard. I know it is not very mature yet, but it is promising. For example, it now also works on Foneras.
> - Bit-rates are not a fixed value either - minstrel for example maintains a table of 3 different bit-rates. How do you obtain one value? What about the other bit-rate algorithms?
As far as I know, in the minstrel case the value given by the driver is the current "best-throughput" one. In any case, the reported rate is the one most used by the wireless card.
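As an illustration of how such a value could be read, a hedged sketch using the cfg80211 station-info API - note this assumes a kernel where cfg80211 exports such a call (the names below match what the API looks like in much newer trees than we run today, so treat them as an assumption, not working code for the current kernel):

#include <net/cfg80211.h>

/* Ask cfg80211 for the current tx bit-rate toward a neighbour.
 * Returns units of 100 kbit/s, or 0 if no rate information is
 * available (so the caller can fall back to TQ-only routing). */
static u32 pce_neigh_bitrate(struct net_device *dev, const u8 *neigh_addr)
{
        struct station_info sinfo;
        u32 rate = 0;

        if (cfg80211_get_station(dev, neigh_addr, &sinfo))
                return 0;

        if (sinfo.filled & BIT(NL80211_STA_INFO_TX_BITRATE))
                rate = cfg80211_calculate_bitrate(&sinfo.txrate);

        /* release any per-TID stats the driver may have attached */
        cfg80211_sinfo_release_content(&sinfo);

        return rate;
}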
> - How to handle interfaces that have no bit-rate (VPN / cable / etc)?
I am thinking of reading the net-device type and assigning wired devices their nominal bit-rate (for example 100 Mbit/s for an Ethernet device). Where the bit-rate information is not available (an old driver, for example..), it is maybe better to assign a bit-rate of 0 and use only the TQ for the routing decision. Any suggestion is welcome!
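A minimal sketch of that fallback, under the same assumptions as above (illustrative names, bit-rate in units of 100 kbit/s):

#include <linux/netdevice.h>
#include <linux/if_arp.h>

/* Pick a bit-rate for interfaces the wireless path cannot answer for. */
static u32 pce_fallback_bitrate(const struct net_device *dev)
{
        /* Wired Ethernet: assume the nominal 100 Mbit/s mentioned above. */
        if (dev->type == ARPHRD_ETHER && !dev->ieee80211_ptr)
                return 1000;

        /* No usable bit-rate (VPN, old driver, ...): 0 tells the caller
         * to fall back to the TQ alone. */
        return 0;
}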
> - Bit-rates are a best guess too, which might drop rapidly as soon as you try to actually use the link.
In the minstrel case, no estimation of the bit-rate is done: the bit-rate is obtained only by measuring real traffic. For the other algorithms I surely have to do more accurate research. In addition, I assume that links are used periodically, otherwise why is routing necessary? :)
> Regards, Marek
Thank you very much for your valuable comments!
PS: thanks to Sven Eckelmann for fixing my patch!! :)
Hi Daniele
> It is not real Link State routing; maybe the document is not completely clear. Before rebroadcasting the OGM, every node attaches its current best path toward the originator node. I could do this without adding a hop list, by simply substituting the TQ with the PCE.
Have you tried this? How well does this work?
Not adding path information will be a big help in avoiding routing loops. It seems to me that just the simple idea of weighting the local TQ with a bit-rate factor might help.
The hard part is correctly handling the total air time. One 36 Mbps hop is better than two 54 Mbps hops, because of the total air time.
It would also be interesting to see how bit rates change with usage. It could be that you pick a next hop partially based on its bit rate, which puts load onto the next hop, and its bit rate drops, since it is now being used more and so suffers more interference. The dropping bit rate causes the next hop route decision to change.
It could be that you are actually introducing more route flipping.
What would be interesting is to look at the effect of route flipping. Is it bad for performance? If so, can hysteresis be added without causing routing loops? There was some work posted here a couple of months ago that started to look at these questions. Unfortunately, it did not consider whether routing loops would be introduced.
Andrew
2011/5/8 Andrew Lunn andrew@lunn.ch:
> Hi Daniele
>
>> It is not real Link State routing; maybe the document is not completely clear. Before rebroadcasting the OGM, every node attaches its current best path toward the originator node. I could do this without adding a hop list, by simply substituting the TQ with the PCE.
>
> Have you tried this? How well does this work?
I've not tried this, but it is perfectly equivalent to appending the current best path to the OGM: I allow the next node to do the same operation that the previous node has done.
> Not adding path information will be a big help in avoiding routing loops. It seems to me that just the simple idea of weighting the local TQ with a bit-rate factor might help.
Routing loops can occur exactly as they can occur with the current protocol, I think. I did not introduce any new cause of loops: I use the list of hops received in the OGM only for the PCE computation. Anyhow, I will think deeply about this potential problem.
> The hard part is correctly handling the total air time. One 36 Mbps hop is better than two 54 Mbps hops, because of the total air time.
I explained this in the report. I do not calculate the exact total air time, but I weight the metric with the path length when comparing it with the current best route. For example, doubling the length requires 4 times more PCE.
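A minimal sketch of that comparison rule as I understand it - dividing by the square of the hop count, cross-multiplied to stay in integer arithmetic (names illustrative, overflow handling omitted):

#include <stdint.h>
#include <stdbool.h>

/* A candidate path wins only if cand_pce / cand_len^2 exceeds
 * best_pce / best_len^2, i.e. doubling the length costs 4x the PCE. */
static bool pce_candidate_is_better(uint64_t cand_pce, uint32_t cand_len,
                                    uint64_t best_pce, uint32_t best_len)
{
        return cand_pce * best_len * best_len >
               best_pce * cand_len * cand_len;
}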
> It would also be interesting to see how bit rates change with usage. It could be that you pick a next hop partially based on its bit rate, which puts load onto the next hop, and its bit rate drops, since it is now being used more and so suffers more interference. The dropping bit rate causes the next hop route decision to change.
>
> It could be that you are actually introducing more route flipping.
Certainly the bit-rate is subject to frequent oscillations, so I maintain an exponential moving average of the PCE that smooths these oscillations. The bit-rate actually varies for various reasons, such as node distance, channel occupation, etc., and all these factors are important for the routing decision. I take the bit-rate, which summarizes all this information, and use it to weight the TQ.
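For concreteness, a sketch of such an exponential moving average in integer arithmetic; the 1/8 weight is an assumption for illustration, not the value used in the patch:

#include <stdint.h>

/* Roughly avg <- (7 * avg + sample) / 8, without floating point. */
static uint64_t pce_ewma(uint64_t avg, uint64_t sample)
{
        return avg - (avg >> 3) + (sample >> 3);
}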
> What would be interesting is to look at the effect of route flipping. Is it bad for performance? If so, can hysteresis be added without causing routing loops? There was some work posted here a couple of months ago that started to look at these questions. Unfortunately, it did not consider whether routing loops would be introduced.
I've seen that work and it seems interesting. Routing loops have not been considered yet in any published paper, as far as I know. :(
> Andrew
Thank you for your comments!
Hi,
>> - The concept of attaching all hops and their information to the OGM brings us dangerously close to the problems other routing protocols suffer from (especially Link State): the further a node is from the source, the more outdated its information is. Imagine a 10 hop path - how can we rely on this information at the end of that path? Furthermore, making a decision at the end of the path does not mean the next hop (which has newer information) agrees with us on that path. It might send the packet somewhere else. All that depends on the point at which the bandwidth influences the routing decision. Do you mind providing a more precise example of how the route switching happens? I hope you thought about loops..
>
> It is not real Link State routing; maybe the document is not completely clear. Before rebroadcasting the OGM, every node attaches its current best path toward the originator node. I could do this without adding a hop list, by simply substituting the TQ with the PCE. This modification, as explained in the report, has been done to permit the future addition of further information about the links such as, for example, the load.
No, it is not exactly like Link State but it goes in the same direction. The problem is this: the more information you propagate through the mesh, the more you have to pay attention to not make decisions based on outdated data. In a 4 node testbed you will not see this, but as your mesh grows you will. Your document explains how the PCE is calculated, but I could not find the section which explains how every individual node comes to a routing decision. That part is essential for the routing loops. In addition, you gain no value by sending all this information. Even if every node had up-to-date information about all other nodes in the mesh and made a decision based on that, the next hop can come to a completely different decision and send the packet somewhere else. That is one of the main reasons why I don't see the multipath routing working as you envision in the paper.
>> - Bit-rates are not a fixed value either - minstrel for example maintains a table of 3 different bit-rates. How do you obtain one value? What about the other bit-rate algorithms?
>
> As far as I know, in the minstrel case the value given by the driver is the current "best-throughput" one. In any case, the reported rate is the one most used by the wireless card.
Ok, your results are not accurate then. Not sure whether this is a problem or not.
>> - Bit-rates are a best guess too, which might drop rapidly as soon as you try to actually use the link.
>
> In the minstrel case, no estimation of the bit-rate is done: the bit-rate is obtained only by measuring real traffic. For the other algorithms I surely have to do more accurate research.
What makes you think this is not an "estimation"? On wireless you never know the actual throughput unless you use the link. Even then the outcome is highly variable depending on whether you have neighboring nodes that also send traffic, whether there is sudden interference (a microwave is turned on) and so forth and so on.
> In addition, I assume that links are used periodically, otherwise why is routing necessary? :)
You assume a steady flow of data which rarely is the case. Imagine a user surfing the web - you'll have peaks of traffic and then longer periods of no traffic. During the peak time the bandwidth estimate will drop and recover quickly as soon as the link is not used anymore (I have seen that many times).
Cheers, Marek
2011/5/8 Marek Lindner lindner_marek@yahoo.de:
> Hi,
>
>>> - The concept of attaching all hops and their information to the OGM brings us dangerously close to the problems other routing protocols suffer from (especially Link State): the further a node is from the source, the more outdated its information is. Imagine a 10 hop path - how can we rely on this information at the end of that path? Furthermore, making a decision at the end of the path does not mean the next hop (which has newer information) agrees with us on that path. It might send the packet somewhere else. All that depends on the point at which the bandwidth influences the routing decision. Do you mind providing a more precise example of how the route switching happens? I hope you thought about loops..
>>
>> It is not real Link State routing; maybe the document is not completely clear. Before rebroadcasting the OGM, every node attaches its current best path toward the originator node. I could do this without adding a hop list, by simply substituting the TQ with the PCE. This modification, as explained in the report, has been done to permit the future addition of further information about the links such as, for example, the load.
>
> No, it is not exactly like Link State but it goes in the same direction. The problem is this: the more information you propagate through the mesh, the more you have to pay attention to not make decisions based on outdated data. In a 4 node testbed you will not see this, but as your mesh grows you will. Your document explains how the PCE is calculated, but I could not find the section which explains how every individual node comes to a routing decision. That part is essential for the routing loops.
You are perfectly right. I will revise the report as soon as possible to add this information. Certainly routing loops are the worst problem that can occur, so I will think deeply about it. Thanks for the suggestion!
> In addition, you gain no value by sending all this information. Even if every node had up-to-date information about all other nodes in the mesh and made a decision based on that, the next hop can come to a completely different decision and send the packet somewhere else. That is one of the main reasons why I don't see the multipath routing working as you envision in the paper.
Mmh.. if I have up-to-date information, maybe using the OGM sequence number I can tell my neighbor to use a specific path, in particular the best path that existed when that specific OGM traversed the network. And this can also be valid for a hypothetical multi-path routing.
Now that I think about it, with this trick the absence of routing loops can also be demonstrated, because at every moment I know the path that a packet will follow.
>>> - Bit-rates are not a fixed value either - minstrel for example maintains a table of 3 different bit-rates. How do you obtain one value? What about the other bit-rate algorithms?
>>
>> As far as I know, in the minstrel case the value given by the driver is the current "best-throughput" one. In any case, the reported rate is the one most used by the wireless card.
>
> Ok, your results are not accurate then. Not sure whether this is a problem or not.
>>> - Bit-rates are a best guess too, which might drop rapidly as soon as you try to actually use the link.
>>
>> In the minstrel case, no estimation of the bit-rate is done: the bit-rate is obtained only by measuring real traffic. For the other algorithms I surely have to do more accurate research.
>
> What makes you think this is not an "estimation"? On wireless you never know the actual throughput unless you use the link. Even then the outcome is highly variable depending on whether you have neighboring nodes that also send traffic, whether there is sudden interference (a microwave is turned on) and so forth and so on.
As I answered to Andrew Lunn, I can exploit the bit-rate variability, which summarizes many aspects of a link (load, node distance, microwave ovens, ..). So we can carefully use this information, suitably averaged over time, to obtain a better description of a link.
>> In addition, I assume that links are used periodically, otherwise why is routing necessary? :)
>
> You assume a steady flow of data which rarely is the case. Imagine a user surfing the web - you'll have peaks of traffic and then longer periods of no traffic. During the peak time the bandwidth estimate will drop and recover quickly as soon as the link is not used anymore (I have seen that many times).
As said before, this is not necessarily a defect; it is a better understanding of the link conditions. When the load is heavy in one sector of the network, packets are routed in another direction, as already happens with the current batman (and this is very valuable!)
> Cheers, Marek