Hi Chris,
I hope it's okay that I'm attaching our chat log here: http://pastebin.org/89225 (it will be stored for a month). Also, for reference, the two captures from your router:
http://filebin.ca/hzoxmj (athX)
http://filebin.ca/xtwoa (bat0)
They show quite clearly that batman-adv and/or the kernel drops the ARP replies which the router wants to put into the bat0 interface, as you described below. I couldn't spot anything wrong with the ARP replies in the second dump, though.
Has anyone else seen this "protocol 4305 is buggy, dev ath1" message before? I could only find 6-10 year old posts on mailing lists about this topic...
On Sun, Feb 07, 2010 at 09:54:38PM +0100, x@muc.ccc.de wrote:
hi!
as openwrt 8.09.2 still ships with an old batman-adv 0.1 module, i tried to compile a batman-adv 0.2 module. the compile worked, the module loads, originators see each other, but on the openwrt box on bat0 tx packets stays 0 while tx dropped obviously increases with each packet to be transmitted.
the setup:
laptop   debian squeeze amd64                      2.6.31.12   batman-adv 0.2
laptop   debian sid x86                            2.6.32      batman-adv 0.2
ap       openwrt 8.09.2 ixp4xx/armeb (cambria)     2.6.26.8    batman-adv 0.2
the facts:
all bridges and iptables switched off.
with plain ip on the wlan interfaces, pinging between all nodes works fine (when within reach).
all three nodes have the respective two other nodes listed as originators, and if all are within reach of each other, with originator=nexthop.
pinging via bat0 works between the two laptops.
pinging the laptops via bat0 from the ap results in no packets seen on the laptops' bat0.
pinging the ap via bat0 from a laptop results in incoming arp-requests and outgoing arp-replies seen on the ap's bat0 - but again, the arp-replies aren't seen on the laptops' bat0 (nor on the laptops' wlan interfaces).
on the ap's bat0, the tx packets counter stays at 0, while the tx dropped counter seems to increase with each packet that should be sent over it.
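For anyone who wants to reproduce the counter observation, here is a minimal sketch of how the two values could be sampled. It assumes the kernel exposes per-interface counters under /sys/class/net/<iface>/statistics/ and that Python is available on the node (probably not the case on the ap itself, where reading the same files with cat would do). The function names are made up for illustration.

```python
from pathlib import Path


def read_tx_counters(iface, sysfs_root="/sys/class/net"):
    """Read tx_packets and tx_dropped for one interface from sysfs."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {
        name: int((stats_dir / name).read_text().strip())
        for name in ("tx_packets", "tx_dropped")
    }


def tx_delta(before, after):
    """Difference between two counter snapshots taken at different times."""
    return {name: after[name] - before[name] for name in before}


def all_tx_dropped(delta):
    """True if nothing was transmitted but drops were counted in between."""
    return delta["tx_packets"] == 0 and delta["tx_dropped"] > 0
```

Usage would be to call read_tx_counters("bat0") once, run a few pings over bat0, call it again, and pass both snapshots to tx_delta. On the ap described above, all_tx_dropped would be expected to return True for that delta.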
i enabled all logging (15) on the ap and the laptops, but found no hint in there...
the only interesting messages seem to be in dmesg, saying: protocol 4305 is buggy, dev ath1
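A small sketch for picking this message out of a saved dmesg dump. It assumes the number in the message is printed in hexadecimal, and that 0x4305 is the ethertype batman-adv uses for its own frames. Both are assumptions here and worth checking against the kernel and batman-adv sources. The names are illustrative only.

```python
import re

# Assumption: ethertype used by batman-adv for its frames (unverified here).
BATMAN_ADV_ETHERTYPE = 0x4305

# Matches lines such as "protocol 4305 is buggy, dev ath1".
_BUGGY_RE = re.compile(r"protocol ([0-9a-fA-F]{4}) is buggy, dev (\S+)")


def parse_buggy_protocol(line):
    """Return (protocol, device) from a matching log line, else None.

    The protocol field is interpreted as hexadecimal (an assumption).
    """
    match = _BUGGY_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


def is_batman_adv_frame(line):
    """True if the line reports the assumed batman-adv ethertype."""
    parsed = parse_buggy_protocol(line)
    return parsed is not None and parsed[0] == BATMAN_ADV_ETHERTYPE
```

If the assumptions hold, the message would be about batman-adv's own frames leaving via ath1 rather than about unrelated traffic.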
so to me it seems like all tx packets on bat0 on the ap are dropped, while everything else seems to work as it's supposed to.
i then tried to compile the current (r1568) version from svn for the ap. again, the compile worked, but the ap just freezes immediately when i try to load it.
I had also tried some Debian stable versions with a 2.6.26 kernel, and you're right: in one of the last maintenance patches, a bug was introduced for kernel versions < 2.6.29. (I made another post with some call traces here: https://lists.open-mesh.org/pipermail/b.a.t.m.a.n/2010-February/002282.html)
i thought about trying a newer kernel for the ap, but from openwrt there's a special cambria kernel and i haven't found its config and also don't know what patches might have been applied, so i haven't had much hope for any helpful result along this path...
regards,
Chris
Cheers, Linus