Hello everybody,
I'm interested in whether there has been any progress on bug #173 (http://www.open-mesh.org/issues/173).
I'm currently observing something similar on an embedded system running an older kernel, 2.6.32.26. Batman-adv versions up to 2013.1.0 work flawlessly out of the box; all newer versions show the phenomenon described in bug #173. In my case I found out that the batadv_batman_skb_recv function is never called again as soon as I add bat0 to the bridge interface I use. If I use the bat0 interface outside a bridge, everything works fine up to the latest version I tested (2014.4.0), even with the old kernel.
Regards, Andreas Pape
On Wednesday, February 18, 2015 08:35:49 Andreas Pape wrote:
I'm interested in whether there has been any progress on bug #173 (http://www.open-mesh.org/issues/173).
I'm currently observing something similar on an embedded system running an older kernel, 2.6.32.26. Batman-adv versions up to 2013.1.0 work flawlessly out of the box; all newer versions show the phenomenon described in bug #173. In my case I found out that the batadv_batman_skb_recv function is never called again as soon as I add bat0 to the bridge interface I use. If I use the bat0 interface outside a bridge, everything works fine up to the latest version I tested (2014.4.0), even with the old kernel.
Can you please try the attached patch and check whether it makes any difference? If the symptoms are the same, please provide step-by-step instructions on how you create / configure your interfaces.
Thanks, Marek
Hi,
I adapted your patch to batman-adv-2014.4.0, without success. With the patched version of batman-adv I also ran into an additional issue: I was no longer able to destroy the virtual wireless interface used for the ad-hoc connection over which I try to run batman-adv (error message: unregister_netdevice: waiting for ath0 to become free).
With the unpatched 2014.4.0 I did the following test on two of my devices:
1. created a virtual wireless interface ath0 in ad-hoc mode
2. iwconfig ath0 essid TEST
3. iwconfig ath0 channel 40
4. ifconfig ath0 up
5. batctl if add ath0
After this the two devices connected and I could see the respective neighbor via the batctl o command on both devices. So far so good. But with batctl td bat0 I can see OGM packets sent with the MAC address of the device's own WLAN interface, and also from the neighbour this device is connected to via WLAN. Is this OK?
6. created a bridge interface via brctl addbr br0
7. added the bat0 interface to the bridge via brctl addif br0 bat0
As soon as I do this, batadv_batman_skb_recv isn't called anymore (I put a printk at the beginning of that function for debugging). Furthermore, batctl o shows that the mesh communication starts timing out (the last-seen time for the originator/neighbor exceeds the OGM send interval and increases continuously).
The interesting point in this state is that batctl td bat0 still shows the reception of OGM messages from the neighbour and from the device's own WLAN interface, as mentioned above.
As mentioned, I use kernel version 2.6.32.26, and batman-adv/batctl versions up to 2013.1.0 work with the same configuration steps.
Thanks for the support, Andreas
From: Marek Lindner mareklindner@neomailbox.ch To: The list for a Better Approach To Mobile Ad-hoc Networking b.a.t.m.a.n@lists.open-mesh.org, Date: 18.02.2015 12:28 Subject: Re: [B.A.T.M.A.N.] Question concerning batman-adv bug #173 "Mesh packets on bat0" Sent by: "B.A.T.M.A.N" b.a.t.m.a.n-bounces@lists.open-mesh.org
Thanks, Marek [Attachment "0001-do-not-call-master-netdev_ops-ndo_init.patch" deleted by Andreas Pape/Phoenix Contact] [Attachment "signature.asc" deleted by Andreas Pape/Phoenix Contact]
On Wednesday, February 18, 2015 13:28:27 Andreas Pape wrote:
I adapted your patch to batman-adv-2014.4.0, without success. With the patched version of batman-adv I also ran into an additional issue: I was no longer able to destroy the virtual wireless interface used for the ad-hoc connection over which I try to run batman-adv (error message: unregister_netdevice: waiting for ath0 to become free).
It is very possible that the supplied patches have side effects. Right now, I am trying to figure out which part of the code introduced with 2013.2.0 causes the malfunction. I prepared some more patches which deactivate more code, most notably rtnl code added with 2013.2.0. Please give it a try and let me know how it goes.
With the unpatched 2014.4.0 I did the following test on two of my devices:
- created a virtual wireless interface ath0 in ad-hoc mode
- iwconfig ath0 essid TEST
- iwconfig ath0 channel 40
- ifconfig ath0 up
- batctl if add ath0
After this the two devices connected and I could see the respective neighbor via the batctl o command on both devices. So far so good.
At this point the mesh is working as you expect? Can you transport payload across the mesh? If so, this is a deviation from #173, wouldn't you agree?
But with batctl td bat0 I can see OGM packets sent with the MAC address of the device's own WLAN interface, and also from the neighbour this device is connected to via WLAN. Is this OK?
Yes, batman-adv continues to use the mac addresses of the interfaces you configure.
Cheers, Marek
At this point the mesh is working as you expect? Can you transport payload across the mesh? If so, this is a deviation from #173, wouldn't you agree?
Before adding bat0 to the bridge br0, I can communicate via the mesh interface. I configured IP addresses for the bat0 interfaces on my devices and ping worked without problems. But I understood from #173 that pinging was possible in that case, too. I'm referring to #173 because I can see the OGM messages received via WLAN at the bat0 interface as well, which was not the case in 2013.1.0 and earlier, if I remember my tests correctly...
In the meantime I found out that not only does batman-adv stop receiving the OGM messages at ath0, but wpa_supplicant also stops receiving EAPOL frames as soon as bat0 is attached to the bridge br0, if I try to use WPA on the mesh interface (wpa_supplicant -i ath0). But I can see the EAPOL frames at the bridge interface br0 (via batctl td br0). Strange.
I'll come back to you as soon as I have tested your latest patches.
Thanks for your support, Andreas
Hi Marek,
good news: the three patches you sent, applied together, solved the problem as far as I have tested so far. Now bat0 works in combination with the bridge, and Ethernet traffic is also correctly bridged into the mini-mesh setup I use. Furthermore, no OGM messages are visible at bat0 anymore (with the command batctl td bat0).
Do you see a chance to add these changes to the compatibility code for older kernels (I guess for kernels < 2.6.39)?
Thanks for the excellent help, Andreas
On Wednesday, February 18, 2015 16:22:18 Andreas Pape wrote:
good news: the three patches you sent, applied together, solved the problem as far as I have tested so far. Now bat0 works in combination with the bridge, and Ethernet traffic is also correctly bridged into the mini-mesh setup I use. Furthermore, no OGM messages are visible at bat0 anymore (with the command batctl td bat0).
Do you see a chance to add these changes to the compatibility code for older kernels (I guess for kernels < 2.6.39)?
Sounds like we are on a good path! Glad to hear that!
Ultimately, we want to fix the problem and support the older kernels correctly. We still don't know what exactly caused the problem. Can you try the patches one by one and let me know whether a single one also fixes the compat issue? You already tried the first patch, so patches 2 and 3 remain.
Cheers, Marek
Hi Marek,
I reverted the changes step by step, starting with patch 0002-remove-netdev_calls.patch, as patch 1 did not help and I had already tried out the changes to compat.h and soft-interface.c contained in patch 3 myself earlier today.
The essential call is in patch 2, as assumed. As soon as I add the netdev_master_upper_dev_link call back to the compiled code, the problem starts to occur (the mesh stops working as soon as bat0 is added to the bridge, and OGM packets can be seen at bat0). It seems that this call behaves differently in older kernels than in newer ones.
Apart from the netdev_master_upper_dev_link call, I haven't tried re-adding the other excluded parts. If you are interested, I can test this tomorrow, too.
Regards, Andreas
On Wednesday, February 18, 2015 17:10:12 Andreas Pape wrote:
The essential call is in patch 2, as assumed. As soon as I add the netdev_master_upper_dev_link call back to the compiled code, the problem starts to occur (the mesh stops working as soon as bat0 is added to the bridge, and OGM packets can be seen at bat0). It seems that this call behaves differently in older kernels than in newer ones.
Apart from the netdev_master_upper_dev_link call, I haven't tried re-adding the other excluded parts. If you are interested, I can test this tomorrow, too.
Can you try the attached patch without applying any of the previous patches? This patch is meant to fix the compat issue without harming any functionality and could be included in the next release.
Cheers, Marek
Hi Marek,
the problem seems to be a little bit more complex. Your latest patch does not solve the problem.
But I found out that commenting out the following line in your patch makes bat0 work:
slave->master = master;
But as this is the core of "enslaving" a device to a master device, I guess commenting it out breaks the whole concept behind it (I'm not a skilled kernel developer). From this I conclude that there might be a bug somewhere deeper in the kernel version I use. I don't want to give up too early, but it looks a little as if this "enslaving concept using rtnl" might not be usable with these older kernels. What do you think?
Regards, Andreas
On Thursday, February 19, 2015 09:31:08 Andreas Pape wrote:
the problem seems to be a little bit more complex. Your latest patch does not solve the problem.
But I found out that commenting out the following line in your patch makes bat0 work:
slave->master = master;
But as this is the core of "enslaving" a device to a master device, I guess commenting it out breaks the whole concept behind it (I'm not a skilled kernel developer). From this I conclude that there might be a bug somewhere deeper in the kernel version I use. I don't want to give up too early, but it looks a little as if this "enslaving concept using rtnl" might not be usable with these older kernels. What do you think?
Please try the attached patch instead. This time we are replacing the function with our own function that does nothing at all. The net_dev->master variable seems to be reserved for interface bonding and shouldn't be touched at all on these ancient kernels.
Cheers, Marek
I started with a freshly unpacked source code of the 2014.4.0 release and applied only your latest patch.
As expected from the tests done so far, effectively not calling netdev_set_master on the old kernel allows the latest batman-adv version to be used with older kernels (at least with 2.6.32, which I tested).
I think this patch is worth integrating into the next batman-adv version.
Regards, Andreas
"B.A.T.M.A.N" b.a.t.m.a.n-bounces@lists.open-mesh.org schrieb am 19.02.2015 10:40:34:
From: Marek Lindner mareklindner@neomailbox.ch To: The list for a Better Approach To Mobile Ad-hoc Networking b.a.t.m.a.n@lists.open-mesh.org, Date: 19.02.2015 10:42 Subject: Re: [B.A.T.M.A.N.] Reply: Re: Reply: Reply: Re: Reply: Re: Question concerning batman-adv bug #173 "Mesh packets on bat0" Sent by: "B.A.T.M.A.N" b.a.t.m.a.n-bounces@lists.open-mesh.org
Cheers, Marek [Attachment "0001-batman-adv-ignore-netdev_set_master-calls-on-kernels.patch" deleted by Andreas Pape/Phoenix Contact] [Attachment "signature.asc" deleted by Andreas Pape/Phoenix Contact]
On Thursday, February 19, 2015 11:28:24 Andreas Pape wrote:
I started with a freshly unpacked source code of the 2014.4.0 release and applied only your latest patch.
As expected from the tests done so far, effectively not calling netdev_set_master on the old kernel allows the latest batman-adv version to be used with older kernels (at least with 2.6.32, which I tested).
I think this patch is worth integrating into the next batman-adv version.
Thanks for testing! I'll stage the patch for inclusion.
Cheers, Marek