I've just come upon an interesting paper that experimentally compares the performance of OLSR, BATMAN and Babel.
Real-world Performance of Current Proactive Multi-hop Mesh Protocols. M. Abolhasan, B. Hagelstein, J. C.-P. Wang.
http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1747&context=infopapers
Short summary: see Table II on the last page.
A few comments on the paper:
1. Section II (the informal description of the protocols) doesn't make much sense. Ignore it.
2. They evaluated original OLSR, not OLSR-ETX as used by our friends in Vienna and Berlin.
3. The results in Figure 3 would appear to imply that there's a bug in Babel -- it loses a packet every time it switches routes. I think I understand why.
4. They ran the routing daemons with the default parameters. This means that BATMAN ran with an OGM interval of 1 second, while Babel used a Hello interval of 4 seconds. It would have been interesting to see the results with similar parameters (a sketch of how the intervals could be aligned follows below).
5. They didn't measure the amount of routing protocol traffic.
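For what it's worth, aligning the intervals would have been a one-line change per daemon. The flag and option names below are from memory and should be double-checked against each daemon's man page; they are meant as an illustration, not as the exact invocations used in the paper:

    # babeld: hello interval in seconds on wireless interfaces (default 4)
    babeld -h 1 wlan0

    # batmand: originator (OGM) interval in milliseconds (default 1000)
    batmand -o 1000 wlan0

    # olsrd: per-interface hello interval in olsrd.conf (default 2.0 seconds)
    Interface "wlan0"
    {
        HelloInterval 1.0
    }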
Juliusz
On Friday, 06 November 2009, at 20:05:31, Juliusz Chroboczek wrote:
I've just come upon an interesting paper that experimentally compares the performance of OLSR, BATMAN and Babel.
Real-world Performance of Current Proactive Multi-hop Mesh Protocols. M. Abolhasan, B. Hagelstein, J. C.-P. Wang.
http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1747&context=infopapers
Interesting link, Juliusz, thank you.
Short summary: see Table II on the last page.
A few comments on the paper:
Section II (the informal description of the protocols) doesn't make much sense. Ignore it.
They evaluated original OLSR, not OLSR-ETX as used by our friends in Vienna and Berlin.
This is the bane of the OLSR protocol. Most people not doing mesh research just "use the RFC-compatible OLSR" to compare it with anything else... and discover (as we all know) that the hop-count metric does not work. I'm currently fighting to get a simple ETX implementation included in the coming OLSRv2 RFC.
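For readers who haven't seen it, the ETX idea fits in a few lines. The sketch below is purely illustrative (the names and example numbers are mine), not the olsr.org or OLSRv2 code:

    # Illustrative ETX sketch, not the olsr.org implementation.
    # lq  = fraction of the neighbour's hellos that we received
    # nlq = fraction of our hellos that the neighbour reports receiving

    def etx(lq, nlq):
        """Expected number of transmissions needed to get one packet
        across the link and its acknowledgement back."""
        if lq <= 0.0 or nlq <= 0.0:
            return float("inf")   # link considered unusable
        return 1.0 / (lq * nlq)

    def route_metric(links):
        """A route's cost is the sum of its per-link ETX values,
        rather than the number of hops."""
        return sum(etx(lq, nlq) for lq, nlq in links)

    # Two lossy hops (80% and 60% delivery in each direction) cost more
    # than three perfect hops, which is exactly where hop count misleads.
    print(route_metric([(0.8, 0.8), (0.6, 0.6)]))   # ~4.34
    print(route_metric([(1.0, 1.0)] * 3))           # 3.0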
Henning Rogge
On Saturday 07 November 2009 03:05:31 Juliusz Chroboczek wrote:
I've just come upon an interesting paper that experimentally compares the performance of OLSR, BATMAN and Babel.
Real-world Performance of Current Proactive Multi-hop Mesh Protocols. M. Abolhasan, B. Hagelstein, J. C.-P. Wang.
http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1747&context=infopapers
Very interesting analysis - thanks for sharing this with us.
- They ran the routing daemons with the default parameters. This means that BATMAN ran with an OGM interval of 1 second, while Babel used a Hello interval of 4 seconds. It would have been interesting to see the results with similar parameters.
Although I get your point, you probably share my belief in default options, hence it is the right thing to compare. Useful defaults are the first step towards a working protocol. ;)
Cheers, Marek
PS: You might or might not have noticed that our list is now open for everyone to post (without prior registration).
- They ran the routing daemons with the default parameters. This means that BATMAN ran with an OGM interval of 1 second, while Babel used a Hello interval of 4 seconds.
Although I get your point, you probably share my belief in default options, hence it is the right thing to compare.
Fully agreed -- I would have liked to see a comparison with similar parameters *in addition* to the comparison with the default parameters.
Juliusz
I would have liked to see a comparison with the OLSR.org default settings (ETX metric) instead of the hop-count metric. As Henning already pointed out, the hop-count metric is not useful at all and was abandoned long ago.
Nobody in our freifunk/funkfeuer networks actually uses hop count, and that is why we have been able to achieve mesh sizes of 1000+ nodes *in practice* with OLSR.
a.
On Nov 7, 2009, at 5:42 PM, Juliusz Chroboczek wrote:
- They ran the routing daemons with the default parameters. This
means that BATMAN ran with an OGM interval of 1 second, while Babel used a Hello interval of 4 seconds.
Although I get your point, you probably share my belief in default options, hence it is the right thing to compare.
Fully agreed -- I would have liked to see a comparison with similar parameters *in addition* to the comparison with the default parameters.
Juliusz
I would have liked to see a comparison with the OLSR.org default settings (ETX metric) instead of the hop-count metric.
Agreed.
Note however that this doesn't entirely explain why OLSR collapsed in their tests. If you look at Table I, you'll notice that Babel did in fact choose the shortest hop-count route, and that Babel and OLSR exhibited similar levels of route flapping; in other words, in this particular test ETX and shortest-hop coincide. However, Figure 2 indicates that OLSR's throughput was half that of Babel, and in Figure 3, OLSR's packet delivery ratio was just 75%.
Juliusz
On Sunday, 08 November 2009, at 02:08:48, Juliusz Chroboczek wrote:
I would have liked to see a comparison with the OLSR.org default settings (ETX metric) instead of the hop-count metric.
Agreed.
Note however that this doesn't entirely explain why OLSR collapsed in their tests. If you look at Table I, you'll notice that Babel did in fact choose the shortest hop-count route, and that Babel and OLSR exhibited similar levels of route flapping; in other words, in this particular test ETX and shortest-hop coincide. However, Figure 2 indicates that OLSR's throughput was half that of Babel, and in Figure 3, OLSR's packet delivery ratio was just 75%.
Good question. Maybe they activated an aggressive MPR setting and hit a known bug in the Dijkstra algorithm that can create problems with certain MPR settings (the bug is fixed in the development tree, and the current 0.5.6 displays an error message for the non-working settings).
Henning
Hi All,
Many thanks for all your comments. Just a couple of points of clarification.
1. The aim of this paper was to study all the protocols in their default settings. We did not switch off ETX with olsr (note that we used the olsr version from olsr.org); in fact, the link quality metric was left at 2 by default (see the config excerpt after this list). We are well aware that ETX provides more stable routes than hop count.
2. In terms of looking at performance with different parameters, we will be doing this in our future studies. Also, note that the conference paper was limited to four pages.
3. In terms of overheads, given that this was a small-scale indoor test-bed, we believed the amount of overhead introduced into the network was not significant enough to adversely affect it. So we did not look into overheads for this paper; however, we would do this for larger test-beds.
4. We previously ran OLSR on various outdoor and indoor test-beds, and at the time we were doing the experiments, BATMAN and BABEL were more stable.
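For clarity, the setting in question looks like this in olsrd.conf (an illustrative excerpt, not our exact configuration file):

    # LinkQualityLevel 2 enables the ETX link-quality metric for both
    # MPR selection and route calculation; 0 would fall back to the
    # RFC 3626 hop-count behaviour.
    LinkQualityLevel    2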
We would be interested to hear what other aspects of the protocols (such as different parameters) you would like to see studied.
Kind regards, Mehran
----- Original Message ----- From: "Juliusz Chroboczek" jch@pps.jussieu.fr To: babel-users@lists.alioth.debian.org; olsr-users@lists.olsr.org; b.a.t.m.a.n@lists.open-mesh.net Cc: "Mehran Abolhasan" mehrana@uow.edu.au; "Brett Hagelstein" bretth@uow.edu.au; "Chun-Ping Wang" jerryw@uow.edu.au Sent: Saturday, November 07, 2009 6:05 AM Subject: A peer-reviewed assessment of OLSR, BATMAN and Babel
On Mon, Nov 9, 2009 at 05:46, Dr. Mehran Abolhasan mehrana@uow.edu.au wrote:
Hi All,
Many thanks for all your comments. Just a couple of points of clarification.
- The aim of this paper was to study all the protocols in their default settings. We did not switch off ETX with olsr (note that we used the olsr version from olsr.org); in fact, the link quality metric was left at 2 by default. We are well aware that ETX provides more stable routes than hop count.
The problem is that the "RFC" mode of olsr.org is not well tested, the MPR algorithm is broken in 0.5.6, and RFC-conformant OLSR networks do not work in practice for non-trivial networks. I'm surprised that you got results this good.
- In terms of looking at performance with different parameters, we will be doing this in our future studies. Also, note that the conference paper was limited to four pages.
If you plan further tests, feel free to contact the olsr mailing lists for some parameter suggestions. ;)
- In terms of overheads, given that this was a small-scale indoor test-bed, we believed the amount of overhead introduced into the network was not significant enough to adversely affect it. So we did not look into overheads for this paper; however, we would do this for larger test-beds.
- We previously ran OLSR on various outdoor and indoor test-beds, and at the time we were doing the experiments, BATMAN and BABEL were more stable.
No one is surprised (without ETX).
Henning Rogge
Hello to both of you, and thanks a lot for your paper.
MA:
- The aim of this paper was to study all the protocols in their default
settings.
I realise that. My comments were just a quick guide to the paper, to make it easier for folks to read; they were in no way a criticism.
MA:
We did not switch off ETX with olsr
CPW:
The results shown in our paper are actually the performance of OLSR with ETX.
In Section IV.A, at the bottom of the first column, you say
OLSR, which based on hop-count metric, always maintains the minimum number of hops for any given destination.
I'd really like this to be clarified; it is part of an ongoing debate in our community (which I'm not going to summarise right now, since I'm trying to avoid a flamewar).
MA:
- In terms of overheads, given that this was a small-scale indoor test-bed, we believed the amount of overhead introduced into the network was not significant enough to adversely affect it.
I agree. However, this is exactly the kind of information that would be interesting to me -- Babel makes extensive use of reactive requests and updates, and the heuristics involved are somewhat tricky to tune. The packet counts are the information I need in order to find out whether I got it right.
(I'm not quite sure what to measure here, but I'd say that the values of interest are the average number of packets sent per unit of time, the average, median and standard deviation of packet size, and the average interval between two packets -- the latter being an estimate of how bursty the routing traffic is).
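Concretely, a capture of just the routing traffic reduced to one "timestamp size" pair per line (an assumed format, easy enough to produce from tcpdump output) would already tell me what I want to know; a rough sketch of the summary I have in mind:

    # Sketch: summarise routing-protocol traffic from a capture.
    # Assumed input on stdin: one "timestamp size" pair per line.
    import sys
    import statistics

    def summarise(lines):
        samples = sorted(tuple(map(float, l.split()[:2]))
                         for l in lines if l.strip())
        times = [t for t, _ in samples]
        sizes = [s for _, s in samples]
        gaps = [b - a for a, b in zip(times, times[1:])]
        duration = times[-1] - times[0]
        return {
            "packets per second": len(samples) / duration if duration else float("nan"),
            "mean packet size": statistics.mean(sizes),
            "median packet size": statistics.median(sizes),
            "packet size std dev": statistics.stdev(sizes) if len(sizes) > 1 else 0.0,
            "mean inter-packet gap": statistics.mean(gaps) if gaps else float("nan"),
        }

    if __name__ == "__main__":
        for name, value in summarise(sys.stdin.readlines()).items():
            print("%-22s %.3f" % (name, value))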
CPW:
You may also be aware that the protocols we used are already outdated. This is because this work was done back in 2008.
The results are still very interesting. While our routing daemons have gone through extensive tuning since then, the basic principles have not changed, and your paper validates the approaches that we've taken.
CPW:
We are more than happy to undertake another study with more recent protocols.
That would be excellent. We all know how time-consuming it is to set up a testbed and perform quantitative measurements, and we're most grateful for any data you're willing to share.
Two comments:
In Figure 3, you show that Babel tends to have a packet loss rate of 1%. Since you had redundant routes in your setup, and were running normal 802.11 with link-layer ARQ, this should not happen -- and the BATMAN results show that, indeed, a 0% loss rate is achievable.
I suspect that the explanation is that Babel occasionally loses a packet when it switches routes -- that would be consistent with the RCF given in Table I. Do you have any extra data on this test that you'd be willing to share? In particular, do you know whether the losses were evenly distributed, or whether they came in bursts?
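Even just the list of sequence numbers that were received would answer that: the lengths of the runs of missing sequence numbers tell isolated losses apart from bursts. A rough sketch (the input format is assumed):

    # Sketch: tell isolated losses apart from bursts, given the sorted
    # list of sequence numbers that actually arrived.
    def loss_bursts(received):
        bursts = []
        for prev, cur in zip(received, received[1:]):
            missing = cur - prev - 1
            if missing > 0:
                bursts.append(missing)
        return bursts

    # Example: sequence number 4 is an isolated loss, 7-9 is a burst of 3.
    print(loss_bursts([1, 2, 3, 5, 6, 10]))   # -> [1, 3]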
Finally, in Section IV.C, you say
BABEL had the fastest route convergence time with a fastest repair time of nine seconds. Interestingly, it was found in the bandwidth test that the route changed in as little as two seconds when parallel paths were available. This suggests the route preference algorithm is more active than the route repair algorithm.
This is, unfortunately, an effect that I don't know how to avoid. Since you were running with a Hello interval of 2 seconds, you have shown that Babel repairs a broken route after 2 to 3 hellos; anything more aggressive than that, and you risk massive amounts of route flapping, since a single lost packet could too easily cause a route switch.
However, you have witnessed route flaps happening in two seconds in the worst case -- given two neighbours A and B, two seconds is the average time for receiving first a packet from A, then a packet from B; if the two routes have very similar metrics, slight oscillations of ETX might cause Babel to switch after just two hellos.
As you rightly point out, the two facts taken together indicate that the link cost computation and the hysteresis in Babel do not interact quite right. This is the trickiest part of Babel, since it appears to be a pure engineering tradeoff, and I don't see any theoretical arguments to guide me in its design.
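To make the tradeoff concrete, the hysteresis boils down to something like the sketch below. This is a deliberately simplified illustration with an invented threshold, not what babeld actually implements:

    # Deliberately simplified sketch of route-selection hysteresis;
    # not babeld's actual code, and the threshold below is invented.
    SWITCH_THRESHOLD = 64   # metric units; lower metric is better

    def select_route(current, candidates):
        """current is a (route, metric) pair or None; candidates is a
        non-empty list of (route, metric) pairs.  We only switch away
        from the current route when a competitor is better by a clear
        margin, so that small ETX oscillations between near-equal
        routes don't cause constant flapping."""
        best = min(candidates, key=lambda r: r[1])
        if current is None:
            return best
        if best[1] + SWITCH_THRESHOLD < current[1]:
            return best           # clearly better: switch
        return current            # otherwise stick with what we have

    # A marginally better route does not win; a clearly better one does.
    print(select_route(("via A", 300), [("via A", 300), ("via B", 290)]))
    print(select_route(("via A", 300), [("via A", 300), ("via B", 180)]))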
Thanks again for your work,
Juliusz