I couldn't find anything about this in my reading.
Is there a mechanism to identify throughput between two nodes, or between a client and a destination, for the purpose of routing through the highest-throughput path?
For instance, track the maximum throughput on each interface over time, compare it to the current throughput on the interface, and send what is left over as 'available throughput' along with the TQ to each neighbor. Now, instead of adding up the numbers as with TQ, just take the min: take the lowest available throughput along the path, and let a node use the highest-throughput path that has acceptable TQ. As a node's interface gets loaded up (vs the observed maximum) it will inform its neighbors in the same way it informs them of the TQ of each interface, and the neighbors can then choose to route around it if there is a good alternate path.
Basically, I'm thinking that TQ isn't the only or most optimal way to route, simply because available throughput should be considered somehow. As far as TQ is concerned, a 56k link with 0.001% packet loss is better than a 54Mb link with 0.1% packet loss, even though both are solid interfaces and the 54Mb link should be preferred heavily over the 56k link. The 56k link probably will start dropping packets once it is saturated, but one client might run up against slow ACKs keeping the remote server from sending enough data to saturate the connection, so that client could be getting a very slow connection while a 54Mb link sits virtually unused.
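To make the 'take the min' part concrete, here is a rough Python sketch of the two aggregation rules side by side. This is not batman-adv code; the 0-255 TQ scale is batman-adv's, everything else (numbers, names) is made up for illustration:

    # Per-hop TQ (0..255 as in batman-adv) roughly degrades multiplicatively along
    # a path, while 'available throughput' is a bottleneck: take the minimum.

    TQ_MAX = 255

    def path_tq(hop_tqs):
        tq = TQ_MAX
        for hop in hop_tqs:
            tq = tq * hop // TQ_MAX
        return tq

    def path_available(hop_avail_mbit):
        return min(hop_avail_mbit)

    # Made-up numbers: a short clean path with little headroom vs a longer clean
    # path with lots of headroom.
    print(path_tq([250, 248, 251]), path_available([10, 5, 10]))            # higher TQ, 5Mb free
    print(path_tq([249, 247, 250, 248]), path_available([10, 10, 10, 10]))  # a bit lower TQ, 10Mb free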
On Sat, Mar 31, 2012 at 04:22:06 -0600, dan wrote:
Hello Dan,
there is no such mechanism in batman-adv right now. It is a really good idea and we already started to think about it during WBMv5. Personally, I'm applying to GSoC 2012 and I'm proposing something similar.
I also think that packet loss alone is not the correct way to go.
Regarding your idea, how would you measure the maximum throughput?
Cheers,
An easy way might be to just pull the info from ifconfig on a timer:
eth0  RX bytes:2841699391 (2.6 GiB)  TX bytes:2681928474 (2.4 GiB)
+ 5 seconds
      RX bytes:2842069649 (2.6 GiB)  TX bytes:2682282969 (2.4 GiB)
= RX of 361.5 KiB, TX of 346.2 KiB over the interval
Just update a stored value with the MAX of both as they change.
Compare the stored value to the last 5-second interval to see what amount of the connection is available. In my case, I have a 20Mb/10Mb connection, so I have 19.3Mb/9.3Mb available. If I know the connection speed reliably then I should be able to assign it statically; otherwise it should just be based on historical observations. Wireless links are unpredictable, so we have to rely on observation, while wired or higher-end backhaul links are more predictable, so it may be best to just use a set value there.
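A minimal sketch of that polling loop in Python, reading the same counters from sysfs instead of parsing ifconfig output. The 5-second interval and the 'available = highest rate seen minus current rate' rule are just the idea above, nothing official:

    import time

    def read_bytes(iface):
        # Same counters ifconfig reports, straight from sysfs.
        with open("/sys/class/net/%s/statistics/rx_bytes" % iface) as f:
            rx = int(f.read())
        with open("/sys/class/net/%s/statistics/tx_bytes" % iface) as f:
            tx = int(f.read())
        return rx, tx

    def watch(iface, interval=5):
        max_rx = max_tx = 0.0
        prev_rx, prev_tx = read_bytes(iface)
        while True:
            time.sleep(interval)
            rx, tx = read_bytes(iface)
            rx_rate = (rx - prev_rx) / float(interval)   # bytes/s over this interval
            tx_rate = (tx - prev_tx) / float(interval)
            prev_rx, prev_tx = rx, tx
            max_rx = max(max_rx, rx_rate)                # highest rate ever observed
            max_tx = max(max_tx, tx_rate)
            avail_rx = max_rx - rx_rate                  # what we would advertise alongside TQ
            avail_tx = max_tx - tx_rate
            print("%s: rx %.0f B/s (avail %.0f), tx %.0f B/s (avail %.0f)"
                  % (iface, rx_rate, avail_rx, tx_rate, avail_tx))

    # watch("eth0")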
It would be better if we could get total throughput on the wireless link if possible.
It would also be useful to identify half duplex vs full duplex. Half duplex should be the aggregate of the RX and TX with some ratio applied (70/30 by default for download vs upload; this should be tunable), while full duplex wouldn't have a ratio applied.
Devices like an SAF freemile or Ubiquiti AirFibre are full duplex, so the SAF would be 100Mb FD and the AirFibre could be 700Mb FD.
I am assuming that each node knows which direction is upload and which is download, but that might not be true unless the gateway config is used... I'm not sure if that would be another type of advertisement or what.
On Sat, Mar 31, 2012 at 11:02 PM, dan dandenson@gmail.com wrote:
That sounds like a nice start! I'm not sure, though, if that number includes retransmissions and/or unacknowledged frames. IIRC I've seen that TX counter grow on an interface with such a lossy link that it was just sending traffic (trying to reach the distant AP) but getting nothing in return (the AP couldn't hear me).
So far, with OpenWrt and ath9k / ath9k_htc, I've found the "tx bitrate" (or MCS level) a fair indicator of the link capacity. If it says "65.0 MBit/s MCS 7" I cannot tell exactly how much goodput it will have, but I can be pretty confident that it will be better than when it says "6.5 MBit/s MCS 1".
On Sat, Mar 31, 2012 at 11:39 PM, Guido Iribarren guidoiribarren@buenosaireslibre.org wrote:
Oh, I forgot to add: iw does discriminate between successfully sent packets and unsuccessful retries or failed packets. For example:
# batctl o -n
[B.A.T.M.A.N. adv 2012.0.0, MainIF/MAC: wlan0-1/56:e6:fc:be:29:d3 (bat0)]
  Originator        last-seen (#/255)           Nexthop [outgoingIF]:   Potential nexthops ...
56:e6:fc:be:26:13    0.270s   (145) 56:e6:fc:b9:b6:48 [     wlan1]: 56:e6:fc:b9:b6:47 (134) 56:e6:fc:b9:b6:48 (145)

### Let's find out how good that nexthop 56:e6:fc:b9:b6:48 looks on wlan1
# iw wlan1 station get 56:e6:fc:b9:b6:48
Station 56:e6:fc:b9:b6:48 (on wlan1)
        inactive time:  20 ms
        rx bytes:       3593056580
        rx packets:     2728661
        tx bytes:       915427424
        tx packets:     1677330
        tx retries:     0
        tx failed:      0
        signal:         -71 dBm
        signal avg:     -70 dBm
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     72.2 MBit/s MCS 7 short GI
Those "tx bytes" do not include retries or failed transmissions, only ACKed packets (AFAIK)
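If someone wanted to script against that, here is a quick and entirely hypothetical way to pull "tx bytes" and "tx bitrate" out of iw's plain-text output, assuming the layout shown above (which can vary between iw and driver versions):

    import re
    import subprocess

    def station_tx_stats(ifname, mac):
        # Hypothetical helper: scrape two fields from `iw <if> station get <mac>`.
        out = subprocess.check_output(["iw", ifname, "station", "get", mac]).decode()
        tx_bytes = int(re.search(r"tx bytes:\s+(\d+)", out).group(1))
        rate = re.search(r"tx bitrate:\s+([\d.]+) MBit/s", out)
        return tx_bytes, float(rate.group(1)) if rate else None

    # station_tx_stats("wlan1", "56:e6:fc:b9:b6:48")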
Of course, making batman peek into this would definitely deviate from the bat-idea of "I don't care whether the physical layer is wired, wifi, wimax or avian carriers".
my 2c
Good point on ifconfig's potential limitations.
One issue with using the wireless sync rate is that Fresnel zone incursions will let you sync at one rate but have the link fall apart as the throughput grows. I have seen a 130Mb link degrade to 54Mb as soon as it started passing significant traffic.
On Sunday, April 01, 2012 05:39:05 Guido Iribarren wrote:
So far, with OpenWrt and ath9k / ath9k_htc, I've found the "tx bitrate" (or MCS level) a fair indicator of the link capacity. If it says "65.0 MBit/s MCS 7" I cannot tell exactly how much goodput it will have, but I can be pretty confident that it will be better than when it says "6.5 MBit/s MCS 1".
This info is pretty useless since the reported tx rate is the rate used to send the last packet. If the last packet is sent via broadcast the tx rate will show the broadcast tx rate.
Regards, Marek
On Sunday, April 01, 2012 05:02:02 dan wrote:
This approach has some drawbacks: we can only assess the maximum throughput by saturating the link, and saturating the link isn't what we really want to do because it means nobody else can use it to transfer data. Furthermore, what if nobody transmits anything within your interval? The counters won't increase and the link will be considered "bad"?
Regards, Marek
I am making some assumptions. I assume that the link will at some point become saturated. If we simply track the maximum then we can advertise an available amount. This might be tunable to a % of actual if trying to avoid saturating a link is a goal.
Here is an example:

        Y
        |
A---B---C---D---E---X
 \             /
  F---G---H---I
Every node A-I has an equal amount of bandwidth available. All the links are the same quality and have the same connection rate (let's say 10Mb aggregate).
A is looking for the best route to X, which is the gateway. TQ says that A<>X via B and A<>X via F are both good, but A<>X via B has a slightly better TQ because it is one hop shorter.
The catch here is that Y is sending traffic through C at a rate of 5Mb aggregate. This means that in our route selection, A<>X via B is identified as the best path because TQ sees clean links with very little packet loss. Y is not sending at a rate that saturates C so performance is still good and TQ is still good.
We should have a mechanism that identifies that although A<>B and A<>F are both good links, A should prefer to route through A>F>G>H>I>E>X, because the most restrictive available bandwidth on that path is 10Mb, while on the A>B>C>D>E>X path the C node can only provide 5Mb aggregate.
How do we identify this? Well, if the C radio has historically transferred 10Mb (on the interface closest to the gateway) and we have tracked it, we can take the current 5Mb away from that and see that there is 5Mb remaining. This does also assume that a specific interface has a consistent speed.
Each node could simply ask the next node closer to the gateway for the available speed on the path. Each node would offer the lowest speed available, either the upstream node's advertised speed or its own. Since batman-adv only really cares about routing to the next neighbor with ideal TQ, this method plays right into the batman-adv system.
I don't suggest making throughput the #1 route selection method, only something to be used when links of similar quality are available. In this case, A<>B and A<>F are very similar quality, so we would use available throughput in the decision making. Have a tunable threshold for the TQ difference between paths before this load balancing is taken into account.
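As a rough sketch of that selection rule (illustrative Python, hypothetical names and threshold; each neighbour's advertised value is assumed to already be the min of its own headroom and whatever its upstream advertised, as described above):

    def pick_next_hop(candidates, tq_threshold=10):
        # candidates: [{'nexthop': 'B', 'tq': 247, 'avail_mbit': 5}, ...]
        best_tq = max(c["tq"] for c in candidates)
        # Only consider next hops whose TQ is within the tunable threshold of the best.
        similar = [c for c in candidates if best_tq - c["tq"] <= tq_threshold]
        # Among those, prefer the path with the most advertised headroom.
        return max(similar, key=lambda c: c["avail_mbit"])

    # In the topology above: via B the bottleneck is C with 5Mb left, via F it is 10Mb.
    print(pick_next_hop([{"nexthop": "B", "tq": 247, "avail_mbit": 5},
                         {"nexthop": "F", "tq": 243, "avail_mbit": 10}]))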
Send the throughput information frequently, so that as a node takes on traffic because of its available bandwidth, its advertised headroom drops and it becomes less likely to be routed through.
I would add that it is probably a good idea to try to lock in a route for a given source, or the routes might jump around. If a client device is downloading at high speed, once batman-adv has selected a route it should stick with it for that client for some period unless the TQ on the link plummets. Otherwise a route might jump back and forth between two paths: as one path gets more saturated, the other starts looking very attractive and the route switches, creating an oscillating pattern.
I have another thought on how to determine maximum speed, but it is more 'destructive': have batman-adv run a test on each link for tx, rx, and bi-directional traffic, store the results, and consider these the interface's potential. Also identify whether an interface is FD or HD. Retest on an interval, and/or when the TQ on a link is consistently worse than when it was last tested. If the test was thorough enough, it would be able to identify at what throughput ping times and packet loss spike, and so give an effective 'safe' maximum vs the absolute maximum.
On Monday, April 02, 2012 00:23:36 dan wrote:
I am making some assumptions. I assume that the link will at some point become saturated. If we simply track the maximum then we can advertise an available amount.
This will result in a metric optimizing paths for the highest throughput ever recorded. In reality one can easily observe many links with variable throughput. Sometimes you get a spike of high throughput although the average speed is lower. Or your wifi environment changes with a negative impact on the throughput.
How do we identify this? Well, if the C radio has historically transferred 10Mb (on the interface closest to the gateway) and we have tracked it, we can take the current 5Mb away from that and see that there is 5Mb remaining. This does also assume that a specific interface has a consistent speed.
That is not a safe assumption.
I don't suggest making throughput the #1 route selection method, only something to be used when links of similar quality are available. In this case, A<>B and A<>F are very similar quality, so we would use available throughput in the decision making. Have a tunable threshold for the TQ difference between paths before this load balancing is taken into account.
Interesting idea. Have to think about this a little bit.
I have another thought on how to determine maximum speed, but it is more 'destructive': have batman-adv run a test on each link for tx, rx, and bi-directional traffic, store the results, and consider these the interface's potential. Also identify whether an interface is FD or HD. Retest on an interval, and/or when the TQ on a link is consistently worse than when it was last tested. If the test was thorough enough, it would be able to identify at what throughput ping times and packet loss spike, and so give an effective 'safe' maximum vs the absolute maximum.
Yes, we still have the "costly" way of detecting the link throughput ourselves. What do you think about the idea of asking the wifi rate algorithm for the link speed?
Regards, Marek
On Mon, Apr 2, 2012 at 3:32 AM, Marek Lindner lindner_marek@yahoo.de wrote:
This will result in a metric optimizing paths for the highest throughput ever recorded. In reality one can easily observe many links with variable throughput. Sometimes you get a spike of high throughput although the average speed is lower. Or your wifi environment changes with a negative impact on the throughput.
True. Maybe not keep the maximum. Maybe watch the interface queue and measure the throughput when frames start to get queued, and update the 'max' speed whenever an interface starts to queue frames.
How do we identify this? Well, if the C radio has historically transferred 10Mb (on the interface closest to the gateway) and we have tracked it, we can take the current 5Mb away from that and see that there is 5Mb remaining. This does also assume that a specific interface has a consistent speed.
That is not a safe assumption.
Maybe it's OK to assume that an interface has a consistent speed for some period of time...
Yes, we still have the "costly" way of detecting the link throughput ourselves. What do you think about the idea of asking the wifi rate algorithm for the link speed?
I am a WISP. In my experience, the wifi sync rate isn't reliable. In perfect conditions yes, but when there is Fresnel zone incursion on a wireless link, the rate algorithm can't account for reflected signals as noise because they don't exist yet. Not until you start transferring data does the signal get reflected back (as noise) and the radio has to adjust the rate down. The problem is that this happens after you have dropped 5% of your packets, which drops the TQ on the link and it is effectively down until the traffic stops. Then the reflections stop, the link speed climbs back up, very light use (pings, OGMs) travels safely and TQ rises. Rinse and repeat.
I wish I had a really great solution to this. I don't really have anything to complain about; batman-adv is already a mile ahead of the next best mesh routing protocol :)
On Monday, April 02, 2012 15:35:35 dan wrote:
This will result in a metric optimizing paths for the highest throughput ever recorded. In reality one can easily observe many links with variable throughput. Sometimes you get a spike of high throughput although the average speed is lower. Or your wifi environment changes with a negative impact on the throughput.
True. Maybe not keep the maximum. Maybe watch the interface queue and measure the throughput when frames start to get queued, and update the 'max' speed whenever an interface starts to queue frames.
Watching the interface queue would be very interesting for other features as well but turns out to be hard in practice. A while ago Simon and others tried to improve the interface alternating / bonding by monitoring the fill status of the queue. But the wifi stack does not report the fill status. Even if it did we don't know what is going on in the hardware.
We still have the problem that some links might be idle, therefore we will have to generate traffic before we can evaluate these links.
Yes, we still have the "costly" way of detecting the link throughput ourselves. What do you think about the idea of asking the wifi rate algorithm for the link speed?
I am a WISP. In my experience, the wifi sync rate isn't reliable. In perfect conditions yes, but when there is Fresnel zone incursion on a wireless link, the rate algorithm can't account for reflected signals as noise because they don't exist yet. Not until you start transferring data does the signal get reflected back (as noise) and the radio has to adjust the rate down. The problem is that this happens after you have dropped 5% of your packets, which drops the TQ on the link and it is effectively down until the traffic stops. Then the reflections stop, the link speed climbs back up, very light use (pings, OGMs) travels safely and TQ rises. Rinse and repeat.
Sounds very similar to the problem above: Without traffic we can't be sure about the possible throughput.
I wish I had a really great solution to this. I don't really have anything to complain about; batman-adv is already a mile ahead of the next best mesh routing protocol :)
Thanks for the flowers! :-) Still, we have some work ahead of us. Throughput based routing is a hot topic we want to work on. All ideas are welcome.
Cheers, Marek
Watching the interface queue would be very interesting for other features as well but turns out to be hard in practice. A while ago Simon and others tried to improve the interface alternating / bonding by monitoring the fill status of the queue. But the wifi stack does not report the fill status. Even if it did we don't know what is going on in the hardware.
Another issue I suspect is that in ad-hoc networks the channel might be saturated by two other nodes. A third node may not be able to receive the tx from one of the first two nodes and therefore wouldn't know how saturated the channel was.
We still have the problem that some links might be idle, therefore we will have to generate traffic before we can evaluate these links.
Sounds very similar to the problem above: Without traffic we can't be sure about the possible throughput.
In ad-hoc networks, every node that is using the same channel would need to know every other node's current throughput on that interface to know how saturated the channel is, at least in its vicinity. The node would basically need to track the MACs it can see within 1 hop and then request the throughput from those nodes to see what is in use 'on the air'. That is going to be difficult, or very expensive in overhead, pulling the throughput from visible nodes and adding it up to determine channel saturation and capacity...
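A toy illustration of that accounting; the per-neighbour throughput reports and the usable channel capacity number are assumptions, not something batman-adv provides today:

    # Sum what this node and its 1-hop neighbours say they are pushing on the
    # channel; 'channel_capacity_mbit' is a guess, not something we can query.

    def channel_utilisation(own_mbit, neighbour_mbit, channel_capacity_mbit):
        used = own_mbit + sum(neighbour_mbit)
        return min(1.0, used / float(channel_capacity_mbit))

    # e.g. we push 3Mb, two visible neighbours report 2Mb and 4Mb, channel ~30Mb usable
    print(channel_utilisation(3, [2, 4], 30))   # 0.3 -> still plenty of air time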
As for the issue of having to generate traffic to know where to route traffic, maybe reverse the train of thought.
Initially assume that all links are quite similar. Maybe an 'N' node will have a different default than a 'G' node, Ethernet, etc. Then adjust down from the assumed average. You might assume that all of your nodes with a certain type of interface have the potential for the same throughput, but if a node begins dropping TQ when a link gets saturated, then for some period of time we know what the link is actually capable of delivering. This may not stay true for very long, so the downward adjustment should swing back up to the baseline over time. Consistent traffic will keep the known maximum populated; light traffic will let the node drift back to the average.
Track this over time. If a loaded node routinely has its throughput number brought down to 75, then we can adjust the node's default to 75. If the node becomes more capable because of some environmental change that we are not aware of, it will still transfer at full speed, and the history will show the number swinging up from 75 to 85. Adjust the default for the node based on the trends. If we record that TQ dropped at throughput X 15 times in the last 3 days, we should adjust our throughput number to just below that point as the default. If that changes, the recent history will reflect it and the default will be changed back to the interface-type default.
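A sketch of that 'cap when TQ drops, drift back toward the baseline' estimator, with arbitrary constants:

    class CapacityEstimate(object):
        # All constants here are placeholders, not tuned values.
        def __init__(self, baseline_mbit, recovery_per_interval=0.5):
            self.baseline = float(baseline_mbit)   # default for this interface type
            self.estimate = float(baseline_mbit)
            self.recovery = recovery_per_interval  # how fast we creep back up

        def on_tq_drop(self, load_mbit):
            # The link fell apart at this load: cap the estimate just below it.
            self.estimate = min(self.estimate, load_mbit * 0.95)

        def on_quiet_interval(self):
            # No trouble observed: drift back toward the interface-type default.
            self.estimate = min(self.baseline, self.estimate + self.recovery)

    cap = CapacityEstimate(baseline_mbit=91)   # e.g. a dual-stream N default
    cap.on_tq_drop(75)                         # TQ plummeted while pushing ~75Mb
    cap.on_quiet_interval()                    # later: slowly recover toward 91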
If we manually assign a node to a certain default speed from some known values then we have a baseline.
Certain things might be initially assumable: half-duplex links are 70/30 rx/tx (from the receiver's perspective), so 100Mb aggregate means 70Mb for our purposes; full-duplex links are 100/100, so 100Mb really means 100Mb here.
Assuming a connection level of MCS12 instead of max:

  single stream N, 20MHz:   65Mb  HD * .7 = 45
  dual stream N:           130Mb  HD * .7 = 91
  G:                        27Mb     * .7 = 19
  Ethernet:                100Mb     * 1  = 100
  Gigabit:                1000Mb     * 1  = 1000

Set the wireless G interface on a node to 19, the Ethernet to 100.
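The same defaults expressed as a tiny hypothetical helper (raw rate, a duplex flag, and the tunable usable ratio):

    def default_capacity(raw_mbit, full_duplex, usable_ratio=0.7):
        # Raw link rate plus a tunable ratio for half-duplex links.
        return raw_mbit * (1.0 if full_duplex else usable_ratio)

    defaults = {
        "wlan-g":     default_capacity(27,   False),  # ~19
        "wlan-n-1x1": default_capacity(65,   False),  # ~45
        "wlan-n-2x2": default_capacity(130,  False),  # 91
        "eth-100":    default_capacity(100,  True),   # 100
        "eth-1000":   default_capacity(1000, True),   # 1000
    }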
If historically, we see that the link falls apart at about 16Mb, we need to adjust the default.
This might be some process that sits outside of batman-adv. batman-adv should handle distributing the throughput numbers, but another daemon could handle the math and update the throughput numbers. This would allow batman-adv to stay pure as far as interfaces go. The helper daemon could handle the differences between using ifconfig, ethtool, iw, etc. to determine throughput.