Thought it might be a good idea to continue the discussion about MTU handling in a new thread on the mailing list, to share some experiences we have already had with this here in Lübeck. Hope this is fine with you guys :).
In our setup we first started with the default configuration, using wifi and ethernet links with the default MTU of 1500, which results in an MTU of 1476 on bat0. After discovering that we could actually increase the MTU on both the wifi and ethernet interfaces of our D-Link DIR-300 routers running OpenWRT, we decided to give an MTU of 1524 a try. In the beginning this was absolutely fine for our test setups, as we had a quite homogeneous network. But as soon as we tried to connect some x86 desktop PCs running BATMAN-Adv to the same network, things got quite complicated: so far we have exactly one gigabit ethernet card, in a laptop, that is able to increase its MTU to 1524 on a recent 2.6.30 kernel. We tried about 5 random network cards we had available here in our cellar and none of them were able to increase the MTU (including my laptop's BCM5787M gigabit ethernet card). So at the moment I'm truly considering switching those D-Link routers back to an MTU of 1500 for the sake of compatibility, at least on local area networks.
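For reference, this is roughly how we tested whether a card accepts the larger MTU (just a sketch; the interface names are examples and you need the ip utility, e.g. from OpenWRT's busybox):

    # Try to raise the MTU on the underlying links (eth0/wlan0 are examples).
    # The kernel/driver will refuse the command if the card cannot handle it.
    ip link set dev eth0 mtu 1524
    ip link set dev wlan0 mtu 1524

    # Verify which MTU the driver actually accepted:
    ip link show dev eth0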
The motivation for using the higher MTU of 1524 in the beginning was the rumour that there might be some client devices (which we would get into the network by bridging bat0 with wlan0, for instance) not able to handle any MTU smaller than 1500. But in fact it turned out that these days basically all devices are able to do both IPv4 and IPv6 PMTU discovery on layer 3. So tests showed that if all BATMAN nodes use an MTU of 1476 on bat0 (or the overlying bridge), everything seems to work fine.
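If anyone wants to reproduce that setup, it looks roughly like this (a sketch only; the interface names are examples and I'm assuming a current batctl plus the usual bridge utilities):

    # Mesh on the ad-hoc wifi, clients bridged in via wlan0 (names are examples).
    batctl if add adhoc0              # interface carrying the mesh traffic
    ip link set dev bat0 mtu 1476     # 1500 minus the batman-adv encapsulation
    brctl addbr br-lan
    brctl addif br-lan bat0           # bridge bat0 with the client wifi;
    brctl addif br-lan wlan0          # the bridge MTU drops to 1476 as well
    ip link set dev br-lan up

Clients behind wlan0 then simply discover the 1476 path MTU on layer 3.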
So now the question remains: what to do with links where you can only get an MTU smaller than 1500? Well, this is usually just the case when you are tunnelling (over the internet), where the effective MTU is mostly lower than 1500 because of the VPN overhead (+ PPPoE). Luckily, there might be a nice workaround already: since version 1.0.10 of the tinc VPN daemon, which was released 3 weeks ago, tinc handles packets that are too large for the PMTU (which tinc discovers on its own) differently in switch mode (which is needed to transport BATMAN-Adv ethernet frames). Usually you choose UDP mode in tinc, and every ethernet frame is then encapsulated in UDP. But if tinc discovers that a packet is too big for the PMTU it determined for a link, it will do TCP encapsulation instead, letting the kernel fragment the packet automatically.
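For those who want to give it a try, a minimal tinc configuration could look like this (just a sketch; the net name "mesh" and the node names are made up, Mode = switch is the relevant part):

    # /etc/tinc/mesh/tinc.conf  ("mesh", nodeA and nodeB are made-up names)
    Name = nodeA
    # Switch mode forwards raw ethernet frames, which BATMAN-Adv needs:
    Mode = switch
    ConnectTo = nodeB

If I'm not mistaken, tinc names its tap interface after the net by default, so the resulting "mesh" interface could then simply be added with "batctl if add mesh".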
As in a mesh network it is usually not the internet uplink but the wifi that is the bottleneck, the extra overhead created by fragmentation on the internet uplinks might not be "harmful" to the network's average bandwidth. And as the packet loss on an internet uplink that is not running at its bandwidth limit is very, very low, latencies shouldn't increase too much either.
I'm going to test the usability of the current tinc version for transporting BATMAN-Adv packets soon; I don't know yet whether this "feature" works at all. But anyone else, please feel free to do the same :).
Yes, and it would probably be nice to have "emergency" fragmentation in BATMAN-Adv itself, but as has been mentioned before, this is not trivial to implement. I also don't see how PMTU discovery might be done instead of fragmentation... probably even tougher because of the dynamically changing topology :).
Cheers, Linus