I thought it might be a good idea to continue the discussion about MTU handling in a new thread on the mailing list, to share some experiences we have already had with it here in Lübeck. Hope this is fine with you guys :).
In our setup we first started with the defaults, using wifi and ethernet links with the standard MTU of 1500, which leads to an MTU of 1476 on bat0. After discovering that we could actually increase the MTU on both the wifi and ethernet interfaces of our Dlink DIR-300 routers running OpenWRT, we decided to give an MTU of 1524 a try. In the beginning this was absolutely fine for our test setups, as we had a quite homogeneous network. But as soon as we tried to connect some x86 desktop PCs running BATMAN-Adv to the same network, things got quite complicated. So far we have found exactly one laptop with a 1 GBit ethernet card that is able to increase the MTU to 1524 on a recent 2.6.30 kernel. We tried about five random network cards we had available here in our cellar, and none of them was able to increase the MTU (including my laptop's BCM5787M GBit ethernet card). So at the moment I'm truly considering switching those Dlink routers back to an MTU of 1500, for the sake of compatibility at least on local area networks.
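In case anyone wants to test their own cards quickly: setting the MTU ("ifconfig eth0 mtu 1524") boils down to a SIOCSIFMTU ioctl, so a minimal probe can look roughly like this (just an illustrative sketch, interface name and MTU hardcoded):

/* probe whether a NIC / driver accepts a larger MTU */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>

int main(void)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
    ifr.ifr_mtu = 1524;

    if (ioctl(fd, SIOCSIFMTU, &ifr) < 0)
        perror("SIOCSIFMTU");            /* driver refused the MTU */
    else
        printf("MTU 1524 accepted\n");

    close(fd);
    return 0;
}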
The motivation for using the higher MTU of 1524 in the beginning was the rumour that there might be some client devices (which we would get into the network by bridging bat0 with wlan0, for instance) not able to handle any MTU smaller than 1500. But in fact it turned out that nowadays basically all devices are able to do both IPv4 and IPv6 PMTU discovery on layer 3. So tests showed that if all BATMAN nodes are using an MTU of 1476 on bat0 (or the bridge on top of it), everything seems to work fine.
So now the question remains: what to do with links where you can only get an MTU smaller than 1500? Well, this is usually the case when you are tunneling (over the internet), where the effective MTU is mostly lower than 1500 because of the VPN overhead (+ PPPoE). Luckily there might be a nice workaround already: since version 1.0.10 of the tinc VPN daemon, which was released three weeks ago, it handles packets that are too large for the PMTU (which tinc discovers on its own) differently in switch mode (which is needed to transport BATMAN-Adv ethernet frames). Usually you choose UDP mode in tinc, and any ethernet frame will then be encapsulated in UDP. But if tinc discovers that a packet is too big for the PMTU it has measured for that link, it will use TCP encapsulation instead, letting the kernel fragment the packet automatically.
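To make the behaviour clearer, here is my guess at the decision in C - to be clear, this is NOT tinc's actual code, and struct peer plus the two send helpers are invented for illustration:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct peer { int pmtu; };  /* path MTU the daemon discovered for this link */

static void send_udp(struct peer *p, const uint8_t *f, size_t len)
{ (void)f; printf("UDP: %zu bytes (pmtu %d)\n", len, p->pmtu); }

static void send_tcp(struct peer *p, const uint8_t *f, size_t len)
{ (void)f; (void)p; printf("TCP fallback: %zu bytes, kernel fragments\n", len); }

static void forward_frame(struct peer *p, const uint8_t *frame, size_t len)
{
    if ((int)len <= p->pmtu)
        send_udp(p, frame, len);     /* fits: cheap datagram path */
    else
        send_tcp(p, frame, len);     /* too big: stream gets fragmented */
}

int main(void)
{
    struct peer p = { 1451 };
    uint8_t frame[1514] = { 0 };

    forward_frame(&p, frame, 600);   /* -> UDP */
    forward_frame(&p, frame, 1514);  /* -> TCP fallback */
    return 0;
}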
As in a mesh network the bottleneck is usually the wifi and not the internet uplink, the extra overhead created by fragmentation on the internet uplinks might not be "harmful" to the network's average bandwidth. And as the packet loss on an internet uplink not running at its bandwidth limit is very, very low, latencies shouldn't increase too much either.
I'm going to test the usability of the current tinc version for transporting BATMAN-Adv packets soon; I don't know yet whether this "feature" works at all. But anyone else, please feel free to do the same :).
Yes, and it would probably be nice to have "emergency" fragmentation in BATMAN-Adv already, but as has been mentioned before, this is not trivial to implement. I also don't see how PMTU discovery could be done instead of fragmentation... probably even tougher because of the dynamically changing topology :).
Cheers, Linus
On Tuesday 10 November 2009 03:02:06 Linus Lüssing wrote:
> We tried about five random network cards we had available here in our cellar, and none of them was able to increase the MTU (including my laptop's BCM5787M GBit ethernet card). So at the moment I'm truly considering switching those Dlink routers back to an MTU of 1500, for the sake of compatibility at least on local area networks.
Sure, as I said: not all cards / drivers support that feature.
> The motivation for using the higher MTU of 1524 in the beginning was the rumour that there might be some client devices (which we would get into the network by bridging bat0 with wlan0, for instance) not able to handle any MTU smaller than 1500. But in fact it turned out that nowadays basically all devices are able to do both IPv4 and IPv6 PMTU discovery on layer 3. So tests showed that if all BATMAN nodes are using an MTU of 1476 on bat0 (or the bridge on top of it), everything seems to work fine.
Sorry, I can't follow you here. If the whole network is a switched environment, how could the clients perform working PMTU discovery? Sure, all clients are able to do PMTU discovery (I don't think anybody doubted that), but it won't work. :) Client sends 1500 bytes -> router receives the frame (no IP!) and drops the packet. Where should the "fragmentation needed" packet come from? That only works if you route packets instead of switching them.
> Usually you choose UDP mode in tinc, and any ethernet frame will then be encapsulated in UDP. But if tinc discovers that a packet is too big for the PMTU it has measured for that link, it will use TCP encapsulation instead, letting the kernel fragment the packet automatically.
That is nice, but it only works because tinc uses IP addresses (unlike batman-adv). AFAIK you use tinc to connect internet endpoints, hence your packet probably looks like this: [ETHER][IP][UDP/TCP][BATMAN-HDR][PAYLOAD], whereas the packets sent by batman-adv look like this: [ETHER][BATMAN-HDR][PAYLOAD]
> As in a mesh network the bottleneck is usually the wifi and not the internet uplink, the extra overhead created by fragmentation on the internet uplinks might not be "harmful" to the network's average bandwidth. And as the packet loss on an internet uplink not running at its bandwidth limit is very, very low, latencies shouldn't increase too much either.
I think you underestimate the performance impact. AFAIK IPv6 does not support the classical IPv4 fragmentation anymore (intermediate routers won't fragment packets but drop them).
Regards, Marek
Hey,
On Tue, Nov 10, 2009 at 06:34:41PM +0800, Marek Lindner wrote:
>> The motivation for using the higher MTU of 1524 in the beginning was the rumour that there might be some client devices (which we would get into the network by bridging bat0 with wlan0, for instance) not able to handle any MTU smaller than 1500. But in fact it turned out that nowadays basically all devices are able to do both IPv4 and IPv6 PMTU discovery on layer 3. So tests showed that if all BATMAN nodes are using an MTU of 1476 on bat0 (or the bridge on top of it), everything seems to work fine.
> Sorry, I can't follow you here. If the whole network is a switched environment, how could the clients perform working PMTU discovery? Sure, all clients are able to do PMTU discovery (I don't think anybody doubted that), but it won't work. :) Client sends 1500 bytes -> router receives the frame (no IP!) and drops the packet. Where should the "fragmentation needed" packet come from? That only works if you route packets instead of switching them.
Maybe most clients support that, but consider the way back. E.g. at TU Chemnitz and in many other corporate or university networks, the firewalls block ICMP messages, including the PMTU packets. This means wifi client A may correctly send the smaller packets to server B, but B might send a big packet back, because the PMTU messages are blocked at the firewall, and that packet then gets dropped in the mesh network.
This might be considered a problem of the server's network firewall, but unfortunately things like these exist. There are also misconfigured desktop firewalls which block ICMP packets. It's more a question of configuration than of support.
regards, Simon
>> The motivation for using the higher MTU of 1524 in the beginning was the rumour that there might be some client devices (which we would get into the network by bridging bat0 with wlan0, for instance) not able to handle any MTU smaller than 1500. But in fact it turned out that nowadays basically all devices are able to do both IPv4 and IPv6 PMTU discovery on layer 3. So tests showed that if all BATMAN nodes are using an MTU of 1476 on bat0 (or the bridge on top of it), everything seems to work fine.
> Sorry, I can't follow you here. If the whole network is a switched environment, how could the clients perform working PMTU discovery? Sure, all clients are able to do PMTU discovery (I don't think anybody doubted that), but it won't work. :) Client sends 1500 bytes -> router receives the frame (no IP!) and drops the packet. Where should the "fragmentation needed" packet come from? That only works if you route packets instead of switching them.
Ah, wait, I forgot one thing: it worked for our hotspots because the CoovaChilli internet gateway had an MTU equal to the PMTU all the way through the mesh. But you are right, we will probably get into trouble when we have two mesh clients which are bridged to each other and have an MTU set to 1500...
I'm wondering what you think of how tinc handles this at the moment in switch mode: it just "fakes" an ICMPv4/v6 message with the address of the destination whenever a hop gets an IP packet bigger than the link MTU. This might sound like a good idea at first sight, but the disadvantage is that you get into trouble in IPSec-only networks (which are quite rare at the moment, yes :) ).
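For the IPv4 case such a fake is essentially an ICMP "destination unreachable / fragmentation needed" error; roughly like this (my sketch with Linux headers, NOT tinc's actual code - checksum and the quoted original IP header + 8 bytes are left out for brevity):

#include <linux/icmp.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>

/* fill in an ICMP type 3 / code 4 header advertising the link MTU */
static size_t build_frag_needed(unsigned char *buf, unsigned short link_mtu)
{
    struct icmphdr icmp;

    memset(&icmp, 0, sizeof(icmp));
    icmp.type = ICMP_DEST_UNREACH;       /* type 3 */
    icmp.code = ICMP_FRAG_NEEDED;        /* code 4 */
    icmp.un.frag.mtu = htons(link_mtu);  /* MTU of the too-small hop */

    memcpy(buf, &icmp, sizeof(icmp));
    return sizeof(icmp);
}

int main(void)
{
    unsigned char buf[sizeof(struct icmphdr)];

    printf("ICMP header: %zu bytes, advertised MTU 1451\n",
           build_frag_needed(buf, 1451));
    return 0;
}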
>> Usually you choose UDP mode in tinc, and any ethernet frame will then be encapsulated in UDP. But if tinc discovers that a packet is too big for the PMTU it has measured for that link, it will use TCP encapsulation instead, letting the kernel fragment the packet automatically.
> That is nice, but it only works because tinc uses IP addresses (unlike batman-adv). AFAIK you use tinc to connect internet endpoints, hence your packet probably looks like this: [ETHER][IP][UDP/TCP][BATMAN-HDR][PAYLOAD], whereas the packets sent by batman-adv look like this: [ETHER][BATMAN-HDR][PAYLOAD]
Nope, tinc is able to create a TUN (router mode) or a TAP (switch mode) network adapter, so it is able to actually transport the original ethernet frame as well: [ETHER][IP][UDP/TCP][ETHER][BATMAN-HDR][PAYLOAD]
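Counting header bytes for that encapsulation with the usual sizes (tinc's own small header left out, as I don't know it offhand; the 24 bytes are just the 1500 - 1476 batman-adv overhead from above):

#include <stdio.h>

int main(void)
{
    const int path_mtu   = 1500;  /* MTU towards the internet endpoint */
    const int ip_hdr     = 20;    /* outer IPv4 header */
    const int udp_hdr    = 8;     /* outer UDP header */
    const int ether_hdr  = 14;    /* tunnelled ethernet header */
    const int batman_hdr = 24;    /* batman-adv overhead (1500 - 1476) */

    printf("payload fitting without fragmentation: %d bytes\n",
           path_mtu - ip_hdr - udp_hdr - ether_hdr - batman_hdr);
    return 0;
}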
>> As in a mesh network the bottleneck is usually the wifi and not the internet uplink, the extra overhead created by fragmentation on the internet uplinks might not be "harmful" to the network's average bandwidth. And as the packet loss on an internet uplink not running at its bandwidth limit is very, very low, latencies shouldn't increase too much either.
> I think you underestimate the performance impact. AFAIK IPv6 does not support the classical IPv4 fragmentation anymore (intermediate routers won't fragment packets but drop them).
Hmm, yes, true, IPv6 does not support fragmentation at all... so this would only work over the crappy old IPv4 internet, and even there it is just a workaround to quickly squeeze some fragmented packets through.
I also somehow liked Andrew's suggestion about header compression with segmentation as a fallback mechanism. If this segmentation occurred in rare situations only and were transparent to the upper layers, well, why not :).
Cheers, Linus
On Thursday 12 November 2009 08:51:49 Linus Lüssing wrote:
> Ah, wait, I forgot one thing: it worked for our hotspots because the CoovaChilli internet gateway had an MTU equal to the PMTU all the way through the mesh. But you are right, we will probably get into trouble when we have two mesh clients which are bridged to each other and have an MTU set to 1500...
Ok, then we are on the same page. :)
> I'm wondering what you think of how tinc handles this at the moment in switch mode: it just "fakes" an ICMPv4/v6 message with the address of the destination whenever a hop gets an IP packet bigger than the link MTU. This might sound like a good idea at first sight, but the disadvantage is that you get into trouble in IPSec-only networks (which are quite rare at the moment, yes :) ).
This sounds rather hacky - I can think of more scenarios in which that approach will fail (encryption being one of them). The compression idea Andrew was talking about sounds much more promising.
> Nope, tinc is able to create a TUN (router mode) or a TAP (switch mode) network adapter, so it is able to actually transport the original ethernet frame as well: [ETHER][IP][UDP/TCP][ETHER][BATMAN-HDR][PAYLOAD]
That is beside the point. Tinc is able to let the kernel handle the fragmentation because it forwards packets on layer 3 and not on layer 2 (even if it can encapsulate layer 2 packets). Batman-adv forwards on layer 2 ...
Regards, Marek
>> I think you underestimate the performance impact. AFAIK IPv6 does not support the classical IPv4 fragmentation anymore (intermediate routers won't fragment packets but drop them).
> Hmm, yes, true, IPv6 does not support fragmentation at all...
It does, but it's done end-to-end, not at intermediate routers. See Section 5 of RFC 2460.
Juliusz
>> Hmm, yes, true, IPv6 does not support fragmentation at all...
> It does, but it's done end-to-end, not at intermediate routers. See Section 5 of RFC 2460.
Ah, ok, yes, you're right, forgot about the fragmentation header in IPv6 :).
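(For reference, that is the 8-byte extension header from Section 5 of RFC 2460; it is inserted by the sending host, routers never fragment. There is even a ready-made definition in the libc headers:)

#include <netinet/ip6.h>
#include <stdio.h>

int main(void)
{
    /* next header, reserved, offset + flags, identification */
    printf("IPv6 fragment header: %zu bytes\n", sizeof(struct ip6_frag));
    return 0;
}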
If BATMAN were doing transparent link fragmentation towards its neighbours as a "fall-back" mechanism alongside header compression, would it be better to also do something like "aggregation"? For example, if we received a packet of 1500 bytes which does not fit through a bat0 interface with an MTU of 1476, we could wait a maximum of maybe 5 ms for another packet and send a first fragment of (1476 - fragmentation information overhead) bytes. A second fragment would then be sent with the rest of the first packet and the beginning of the second one. Wouldn't that cause a lot less loss of throughput than splitting the packet into two fragments of equal size, as IPv4 fragmentation does? In fact, the cost would only be the additional fragmentation header overhead of a couple of bytes, which matters especially in lossy environments like wifi, right? Any concerns about such an approach?
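To put some (made-up) numbers on it - assuming a bat0 MTU of 1476 and a hypothetical 4-byte fragment header:

#include <stdio.h>

int main(void)
{
    const int mtu = 1476, frag_hdr = 4, pkt = 1500;
    const int per_frag = mtu - frag_hdr;   /* payload per fragment: 1472 */

    /* naive IPv4-style split: two half-empty frames */
    printf("naive: 2 frames of %d payload bytes each\n", pkt / 2);

    /* aggregated: first fragment full, the small rest shares a frame
     * with the beginning of the next queued packet */
    printf("aggregated: frame 1 = %d bytes, frame 2 = %d bytes + next packet\n",
           per_frag, pkt - per_frag);
    return 0;
}

So instead of two ~750-byte frames per 1500-byte packet you would mostly be sending full-sized frames, losing only the few header bytes per fragment.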
Cheers, Linus
On Mon, Nov 23, 2009 at 03:21:51PM +0100, Linus Lüssing wrote:
>>> Hmm, yes, true, IPv6 does not support fragmentation at all...
>> It does, but it's done end-to-end, not at intermediate routers. See Section 5 of RFC 2460.
> Ah, ok, yes, you're right, forgot about the fragmentation header in IPv6 :).
By the way, does anyone know whether it is possible to encapsulate ethernet frames / packets from a bridge port in IPv6? RFC 4303 (Encapsulating Security Payload) looks like only tunnelling of IP traffic is possible with the standardised features of IPv6.
Cheers, Linus