Hi,
We are trying to get OpenWRT with batman-adv 2011.4.0 to play nice using ap_isolation. However, as soon as ap_isolation is turned on, the meshed APs are no longer accessible from our Linux firewall, to which at least one AP node is physically connected. The following pastebin post shows the exact configuration and details:
Any clues ?
Thanks,
Andre
How is the Linux firewall connected to batman-adv-AP#1? Via eth0? Can you please also add the output of "batctl if" and "batctl sn" on all devices?
What happens if you disable the bridge loop avoidance ? Does it have any effect ?
You are implying that disabling and then re-enabling the ap isolation also fixes the "ping problem" ?
Regards, Marek
Ok, I have added the requested information to the pastebin:
You are implying that disabling and then re-enabling the ap isolation also fixes the "ping problem" ?
Yes, I see the confirmation in the logs that ap_isolation was turned on, but contrary to the 'cold boot' condition where ap_isolation is on, the ping works after a disable/enable cycle.
What happens if you disable the bridge loop avoidance ? Does it have any effect ?
I'll test that and report.
Sorry for the reply on the other post, getting tired I guess...
So...
Ok, disabling loop avoidance has no effect.
Has ap_isolation been tested in this type of scenario?
On Friday, January 27, 2012 04:39:24 Andre Courchesne wrote:
Sorry for the reply on the other post, getting tired I guess...
So...
Ok, disabling loop avoidance has no effect.
Has ap_isolation been tested in this type of scenario?
Antonio is the real expert on the AP isolation, but judging from your debug output I'd say the AP isolation has nothing to do with your problem. The fact that a disable/enable cycle makes the ping work points in the same direction.
The AP isolation only drops packets from wireless clients. You should see a "W" in the flags section of the "batctl tg" dump:

  02:69:fe:45:a3:cf ( 1) via ae:86:74:01:b4:94 ( 2) [...]

Try connecting with a wireless device and you will see it. Unless we have a bug the traffic from the linux firewall should not be dropped (I see no W).
You could get packet dumps to find out where the pings are dropped. Maybe that brings us closer to understanding what is going on.
Regards, Marek
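The "W" check Marek describes can be automated once a "batctl tg" dump has been saved to a file. The following is only a minimal sketch: the line layout is assumed from the single entry quoted above, the content of the flags field is elided in this thread, and the function name `wifi_clients` is made up for illustration.

```python
import re

# Assumed entry shape, modelled on the one line quoted in this thread:
#   <client MAC> ( n) via <originator MAC> ( n) [flags]
# The numbers in parentheses are skipped; only the flags field is inspected.
MAC = r"(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}"
ENTRY = re.compile(
    r"(?P<client>" + MAC + r")\s*\(\s*\d+\)\s+via\s+"
    r"(?P<orig>" + MAC + r")\s*\(\s*\d+\)\s*\[(?P<flags>[^\]]*)\]"
)


def wifi_clients(dump):
    """Return the client MACs whose flags field contains a 'W'."""
    return [
        m.group("client")
        for m in ENTRY.finditer(dump)
        if "W" in m.group("flags")
    ]
```

If the firewall's MAC address never shows up in the returned list, that would be consistent with Marek's reading that AP isolation should not be dropping its traffic.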
Ok, did a bit of tcpdump and my test was the following:
tcpdump running on the linux firewall on the NIC (5.0.0.1) where AP#1 is connected.
From the linux server I ping AP#1 (T004) and as expected I see:
16:37:38.013775 arp who-has 5.0.0.1 tell T004
16:37:38.013792 arp reply 5.0.0.1 is-at 00:0e:2e:bd:d7:88 (oui Unknown)
16:37:39.010099 IP 5.0.0.1 > T004: ICMP echo request, id 22020, seq 24, length 64
16:37:39.010345 IP T004 > 5.0.0.1: ICMP echo reply, id 22020, seq 24, length 64
Pinging AP#2 (T003), I only get:
16:39:05.165998 arp who-has T003 tell 5.0.0.1
But there are no arp replies.
If I disable ap_isolation on AP#1 and make sure T003 is not in the ARP table, the ping eventually succeeds, but it takes a while:
[root@andre-test~]# ping T003
PING T003 (5.1.180.144) 56(84) bytes of data.
From 5.0.0.1 icmp_seq=2 Destination Host Unreachable
From 5.0.0.1 icmp_seq=3 Destination Host Unreachable
From 5.0.0.1 icmp_seq=4 Destination Host Unreachable
From 5.0.0.1 icmp_seq=6 Destination Host Unreachable
From 5.0.0.1 icmp_seq=7 Destination Host Unreachable
From 5.0.0.1 icmp_seq=8 Destination Host Unreachable
From 5.0.0.1 icmp_seq=10 Destination Host Unreachable
From 5.0.0.1 icmp_seq=11 Destination Host Unreachable
From 5.0.0.1 icmp_seq=12 Destination Host Unreachable
From 5.0.0.1 icmp_seq=15 Destination Host Unreachable
From 5.0.0.1 icmp_seq=16 Destination Host Unreachable
From 5.0.0.1 icmp_seq=19 Destination Host Unreachable
From 5.0.0.1 icmp_seq=20 Destination Host Unreachable
64 bytes from T003 (5.1.180.144): icmp_seq=23 ttl=64 time=4.84 ms
64 bytes from T003 (5.1.180.144): icmp_seq=24 ttl=64 time=2.91 ms
64 bytes from T003 (5.1.180.144): icmp_seq=25 ttl=64 time=2.53 ms
64 bytes from T003 (5.1.180.144): icmp_seq=26 ttl=64 time=1.47 ms
64 bytes from T003 (5.1.180.144): icmp_seq=27 ttl=64 time=1.50 ms
64 bytes from T003 (5.1.180.144): icmp_seq=28 ttl=64 time=1.46 ms
64 bytes from T003 (5.1.180.144): icmp_seq=29 ttl=64 time=1.47 ms

With the tcpdump traces:

16:44:38.378149 arp who-has 5.0.0.1 tell T003
16:44:38.378215 arp reply 5.0.0.1 is-at 00:0e:2e:bd:d7:88 (oui Unknown)
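For longer captures, spotting requests that never get a reply by eye becomes tedious. The sketch below scans saved tcpdump text for "arp who-has" requests with no matching "arp reply" in the same capture. It assumes the textual format shown in the traces in this message, and the function name `unanswered_arp` is hypothetical. It compares targets exactly as tcpdump printed them, so capturing with `-n` (numeric addresses) is probably the safer input.

```python
import re

# Patterns follow the tcpdump text quoted in this message, e.g.
#   "arp who-has T003 tell 5.0.0.1"
#   "arp reply 5.0.0.1 is-at 00:0e:2e:bd:d7:88 (oui Unknown)"
WHO_HAS = re.compile(r"arp who-has (\S+) tell (\S+)")
REPLY = re.compile(r"arp reply (\S+) is-at (\S+)")


def unanswered_arp(capture):
    """Return targets that were requested but never answered, in first-seen order."""
    answered = {m.group(1) for m in REPLY.finditer(capture)}
    missing = []
    for m in WHO_HAS.finditer(capture):
        target = m.group(1)
        if target not in answered and target not in missing:
            missing.append(target)
    return missing
```

Run against the two captures above, this would flag T003 in the failing case and nothing in the working one.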
Hello Andre,
On Thu, Jan 26, 2012 at 04:46:44 -0500, Andre Courchesne wrote:
Ok, did a bit of tcpdump and my test was the following:
[cut]
Thank you for reporting this issue and sending us the dumps. Actually it is very hard to link the ap isolation mechanism to this problem.
First of all I would like to make a simple test. Could you please dump the packets received on T003 and see whether the ARP request (the first one that receives no reply) reaches the node (T003)?
In particular, I would suggest using wireshark (it can parse batman packets) and sniffing at the same time on both the physical interface used by the mesh (I'd say wlan0) and bat0.
Then tell us if you see the ARP request on both interfaces, on wlan0 only or on none of them.
Another question: why are you using the bridge loop avoidance? If possible I would like you to disable any optional feature you have, in order to have the cleanest testbed possible. I know that you already tried to disable it without effect, but it is better to perform the test without any other "noise".
Cheers,
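The three outcomes Antonio asks about (ARP request seen on both interfaces, on wlan0 only, or on neither) each suggest a different place to look next. The mapping below is one plausible reading of the test, not something stated in this thread; the function name and the returned labels are made up for illustration.

```python
def locate_drop(seen_on_wlan0, seen_on_bat0):
    """Map the two capture results on T003 to a hint about where to look next."""
    if seen_on_wlan0 and seen_on_bat0:
        # The request was delivered to T003's stack, so the reply path is suspect.
        return "delivered"
    if seen_on_wlan0:
        # The frame arrived over the mesh interface but was not handed to bat0.
        return "dropped-on-node"
    if seen_on_bat0:
        # Unexpected combination: the mesh may be using a different interface
        # than wlan0, or the capture on wlan0 missed the frame.
        return "check-capture"
    # Nothing arrived at all, so the loss happens before T003.
    return "never-arrived"
```

For example, `locate_drop(True, False)` returns `"dropped-on-node"`, which would point at processing on T003 itself rather than at the path from AP#1.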
Hi Antonio,
Thanks for the reply. I will attempt these tests today and provide you as much feedback as possible.
We are using loop avoidance because in some (if not all) of the installations we will be doing, there will be multiple APs wired to the same network to provide redundancy. And if we move to LoopAvoidance-II, if I understand correctly, it should also provide bandwidth balancing, correct?
On Fri, Jan 27, 2012 at 09:30:05AM -0500, Andre Courchesne wrote:
Hi Antonio,
Thanks for the reply. I will attempt these tests today and provide you as much feedback as possible.
Thank you.
We are using loop avoidance because in some (if not all) of the installations we will be doing, there will be multiple APs wired to the same network to provide redundancy. And if we move to LoopAvoidance-II, if I understand correctly, it should also provide bandwidth balancing, correct?
It depends on what you mean. Incoming traffic will enter the LAN through the "best" (depending on the TQ) node, while the current implementation, IIRC, provides only one fixed entry point. But please, don't mix topics :)
Cheers,
-- Antonio Quartulli
..each of us alone is worth nothing.. Ernesto "Che" Guevara
FYI, I am forced to put my OpenWRT/batman-adv development on hold, probably until the end of March. A new project is taking priority, so I may ping you guys again then.
Thanks for all the help.
b.a.t.m.a.n@lists.open-mesh.org