[B.A.T.M.A.N.] alfred and batadv-vis issue


gary

18 Oct 2018

5:53 a.m.

Hello guys,

I set up a testbed like this (BBN = backbone node):

Switch ---------->BBN1 --------------->MP1
   |
   ------->BBN2---------------->MP2

MP1 and MP2 may select BBN1 or BBN2 as gateway. On both MP1 and MP2, I start alfred and batadv-vis as follows:

alfred -i br0 -4 224.0.0.1 -m &
batadv-vis -i bat0 -s

When I run batadv-vis again on MP1, I can't get the info from MP2. If I ping MP2's IP address from MP1, MP1 learns MP2's MAC address via ARP, and when I then run batadv-vis again, I do get the data from MP2.

Is there any configuration option or method to get MP2's info without the ping?

Thanks, Gary


Sven Eckelmann

18 Oct 2018

6:22 a.m.

On Thursday, 18 October 2018 13:53:44 CEST gary wrote:

...

I set up a testbed like this (BBN = backbone node):

Switch ---------->BBN1 --------------->MP1
   |
   ------->BBN2---------------->MP2

MP1 and MP2 may select BBN1 or BBN2 as gateway. On both MP1 and MP2, I start alfred and batadv-vis as follows:

alfred -i br0 -4 224.0.0.1 -m &
batadv-vis -i bat0 -s

Looks like you are using the experimental IPv4 support, which I don't want to actively support. So Jonathan Haws should take care of this.

...

When I run batadv-vis again on MP1, I can't get the info from MP2. If I ping MP2's IP address from MP1, MP1 learns MP2's MAC address via ARP, and when I then run batadv-vis again, I do get the data from MP2.

Is there any configuration option or method to get MP2's info without the ping?

Kind regards, Sven

Jonathan Haws

22 Oct 2018

5:36 p.m.

On Thu, 2018-10-18 at 08:22 +0200, Sven Eckelmann wrote:

...

On Thursday, 18 October 2018 13:53:44 CEST gary wrote:

...
I set up a testbed like this (BBN = backbone node):

Switch ---------->BBN1 --------------->MP1
   |
   ------->BBN2---------------->MP2

MP1 and MP2 may select BBN1 or BBN2 as gateway. On both MP1 and MP2, I start alfred and batadv-vis as follows:

alfred -i br0 -4 224.0.0.1 -m &
batadv-vis -i bat0 -s

Looks like you are using the experimental IPv4 support, which I don't want to actively support. So Jonathan Haws should take care of this.

I am aware of this issue, but was under the impression I was the only one it affected. I'm working on a fix (basically: check the ARP table, and if there is no record for the MAC, run an ARP query; if that query fails, report an error) and I hope to have it working soon.
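The "check the ARP table" step corresponds to the Linux SIOCGARP ioctl, the same call Gary's later diff shows inside alfred's ipv4_arp_request(). The sketch below is a minimal illustration assuming a Linux host; fill_arpreq() and arp_cache_lookup() are hypothetical helpers, not alfred code. It distinguishes the two outcomes that come up later in this thread: no entry at all (the ioctl fails) and an entry that exists but is not yet complete (ATF_COM unset).

```c
#include <arpa/inet.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: prepare a SIOCGARP query for one IPv4 address
 * on one interface. */
void fill_arpreq(struct arpreq *req, const char *ifname, struct in_addr ip)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)&req->arp_pa;

	memset(req, 0, sizeof(*req));
	sin->sin_family = AF_INET;
	sin->sin_addr = ip;
	/* memset above guarantees NUL termination */
	strncpy(req->arp_dev, ifname, sizeof(req->arp_dev) - 1);
}

/* Hypothetical helper: return 0 and copy the MAC into mac[] if the kernel
 * holds a complete entry; return -1 if there is no entry (the ioctl fails,
 * e.g. with ENXIO) or the entry is still incomplete (ATF_COM not set).
 * sock can be any AF_INET socket, e.g. one created with SOCK_DGRAM. */
int arp_cache_lookup(int sock, const char *ifname, struct in_addr ip,
		     unsigned char mac[6])
{
	struct arpreq req;

	fill_arpreq(&req, ifname, ip);
	if (ioctl(sock, SIOCGARP, &req) < 0)
		return -1;
	if (!(req.arp_flags & ATF_COM))
		return -1;
	memcpy(mac, req.arp_ha.sa_data, 6);
	return 0;
}
```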

...

...
When I run batadv-vis again on MP1, I can't get the info from MP2. If I ping MP2's IP address from MP1, MP1 learns MP2's MAC address via ARP, and when I then run batadv-vis again, I do get the data from MP2.

Is there any configuration option or method to get MP2's info without the ping?

We have been working around this issue in the same manner as you describe for the time being.

Thanks, Jon

gary

23 Oct 2018

3:02 a.m.

Hi Jon,

Looking forward to your fix. Thanks.

I have a question about this issue. alfred should send the info in multicast packets, so the packets' destination MAC address should be a multicast MAC address. A packet with a multicast MAC address should be received by all group members, shouldn't it? Why does the mesh point need the peer's MAC address to send the info?
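Gary's premise holds for the sending side: the destination MAC of an IPv4 multicast frame is derived from the group address itself (the fixed prefix 01:00:5e followed by the low 23 bits of the address, per RFC 1112), so transmitting to 224.0.0.1 needs no ARP lookup of any peer. A minimal illustrative sketch; ipv4_mcast_to_mac() is a hypothetical name, not alfred code. As the replies in this thread explain, the peer MAC matters on the receiving side, where alfred keys stored data by the sender's MAC address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Map an IPv4 multicast group to its Ethernet multicast MAC (RFC 1112):
 * fixed prefix 01:00:5e, then the low 23 bits of the group address. */
void ipv4_mcast_to_mac(struct in_addr group, uint8_t mac[6])
{
	uint32_t addr = ntohl(group.s_addr);

	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (addr >> 16) & 0x7f; /* bit 24 of the group is dropped */
	mac[4] = (addr >> 8) & 0xff;
	mac[5] = addr & 0xff;
}
```

For alfred's default IPv4 group in this thread, 224.0.0.1 maps to 01:00:5e:00:00:01.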

Regards, Gary

-----Original Message-----
From: Jonathan Haws jhaws@sdl.usu.edu
Sent: 23 October 2018 1:36
To: sven@narfation.org
Cc: b.a.t.m.a.n@lists.open-mesh.org; guohuizou2000@sina.com
Subject: Re: [B.A.T.M.A.N.] alfred and batadv-vis issue


Jonathan Haws

4:04 a.m.

On Tue, 2018-10-23 at 11:02 +0800, gary wrote:

...

Hi Jon,

Looking forward to your fix. Thanks.

It's still at least a week or so out...

...

I have a question about this issue. alfred should send the info in multicast packets, so the packets' destination MAC address should be a multicast MAC address. A packet with a multicast MAC address should be received by all group members, shouldn't it? Why does the mesh point need the peer's MAC address to send the info?

The rest of the devs may be able to answer better, but I believe it is how alfred does its data storage and mapping of the source. When I looked at this a few months ago, the way it was using the ARP table to pull the MAC address didn't work right and required a separate ARP request to work properly.

When I have a chance to dig back into it and develop the fix, I can provide more details, but that's what I recall right now. Based on the symptoms, it didn't make sense to me at first either, but when I dug through the code I realized that was what was required; I just didn't have time to put the fix in then.

...


Sven Eckelmann

6:29 a.m.

On Tuesday, 23 October 2018 04:04:42 CEST Jonathan Haws wrote: [...]

...

...
I have a question about this issue. alfred should send the info in multicast packets, so the packets' destination MAC address should be a multicast MAC address. A packet with a multicast MAC address should be received by all group members, shouldn't it? Why does the mesh point need the peer's MAC address to send the info?

The rest of the devs may be able to answer better, but I believe it is how alfred does its data storage and mapping of the source.

Correct: alfred stores the data from each server keyed by its MAC address. It also looks up the TQ of master servers via the MAC address. And the MAC address is expected to be extractable from the EUI-64-based link-local IPv6 address.
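To make the IPv6 side concrete: in a modified-EUI-64 link-local address, the lower 64 bits are the MAC with ff:fe inserted in the middle and the universal/local bit flipped, so the MAC can be read back without any neighbor table. A minimal sketch, assuming the address really was generated from a MAC (privacy or stable-privacy addresses are not, and are rejected here only if they lack the ff:fe filler); mac_from_eui64_ll() is a hypothetical name, not the helper alfred actually uses.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Return 0 and fill mac[] if addr is link-local (fe80::/10) with an
 * EUI-64 interface identifier (bytes 11 and 12 equal ff:fe);
 * return -1 otherwise. */
int mac_from_eui64_ll(const struct in6_addr *addr, uint8_t mac[6])
{
	const uint8_t *b = addr->s6_addr;

	if (b[0] != 0xfe || (b[1] & 0xc0) != 0x80)
		return -1; /* not link-local */
	if (b[11] != 0xff || b[12] != 0xfe)
		return -1; /* interface ID not derived from a MAC */

	mac[0] = b[8] ^ 0x02; /* undo the universal/local bit flip */
	mac[1] = b[9];
	mac[2] = b[10];
	mac[3] = b[13];
	mac[4] = b[14];
	mac[5] = b[15];
	return 0;
}
```

For example, fe80::21a:2bff:fe3c:4d5e yields 00:1a:2b:3c:4d:5e. IPv4 addresses carry no such embedded MAC, which is why the IPv4 codepath has to consult the neighbor table instead.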

And of course, Jonathan's IPv4 implementation must get the MAC address by different means. It uses the function ipv4_arp_request to get the information from the IPv4 neighbor table, and it looks like his implementation is missing a reliable way to fill this neighbor table.

Kind regards, Sven

gary

9:52 a.m.

Hi Sven and Jon,

Could you add the MAC address to the payload of the multicast packets, so we can get the MAC from the payload without caring whether it is IPv4 or IPv6?

Regards, Gary

-----Original Message-----
From: Sven Eckelmann sven@narfation.org
Sent: 23 October 2018 14:29
To: Jonathan Haws jhaws@sdl.usu.edu
Cc: guohuizou2000@sina.com; b.a.t.m.a.n@lists.open-mesh.org
Subject: Re: [B.A.T.M.A.N.] alfred and batadv-vis issue


Sven Eckelmann

10:25 a.m.

On Tuesday, 23 October 2018 17:52:54 CEST gary wrote: [...]

...

Could you add the MAC address to the payload of the multicast packets, so we can get the MAC from the payload without caring whether it is IPv4 or IPv6?

You can do a lot - but I will not accept such a change in alfred.

Kind regards, Sven

Jonathan Haws

1:50 p.m.

On Tue, 2018-10-23 at 08:29 +0200, Sven Eckelmann wrote:

...

On Tuesday, 23 October 2018 04:04:42 CEST Jonathan Haws wrote: [...]

...
...
I have a question about this issue. alfred should send the info in multicast packets, so the packets' destination MAC address should be a multicast MAC address. A packet with a multicast MAC address should be received by all group members, shouldn't it? Why does the mesh point need the peer's MAC address to send the info?

The rest of the devs may be able to answer better, but I believe it is how alfred does its data storage and mapping of the source.

Correct: alfred stores the data from each server keyed by its MAC address. It also looks up the TQ of master servers via the MAC address. And the MAC address is expected to be extractable from the EUI-64-based link-local IPv6 address.

Thanks, Sven, for the clarification!

...

And of course, Jonathan's IPv4 implementation must get the MAC address by different means. It uses the function ipv4_arp_request to get the information from the IPv4 neighbor table, and it looks like his implementation is missing a reliable way to fill this neighbor table.

Yes - my plan is to implement a manual ARP request; if that fails, the behavior will be as it is now. On success, however, it will have the correct MAC and everything will be good to go.

Thanks!

Sven Eckelmann

2:06 p.m.

On Tuesday, 23 October 2018 13:50:42 CEST Jonathan Haws wrote: [...]

...

...
And of course, Jonathan's IPv4 implementation must get the MAC address by different means. It uses the function ipv4_arp_request to get the information from the IPv4 neighbor table, and it looks like his implementation is missing a reliable way to fill this neighbor table.

Yes - my plan is to implement a manual ARP request; if that fails, the behavior will be as it is now. On success, however, it will have the correct MAC and everything will be good to go.

Manual ARP request sounds weird. What about https://git.open-mesh.org/batctl.git/blob/83faa3126d6cc984fff10760aa975bacec... and https://git.open-mesh.org/batctl.git/blob/83faa3126d6cc984fff10760aa975bacec...

Kind regards, Sven

Jonathan Haws

2:11 p.m.

On Tue, 2018-10-23 at 16:06 +0200, Sven Eckelmann wrote:

...

On Tuesday, 23 October 2018 13:50:42 CEST Jonathan Haws wrote: [...]

...
...
And of course, Jonathan's IPv4 implementation must get the MAC address by different means. It uses the function ipv4_arp_request to get the information from the IPv4 neighbor table, and it looks like his implementation is missing a reliable way to fill this neighbor table.

Yes - my plan is to implement a manual ARP request; if that fails, the behavior will be as it is now. On success, however, it will have the correct MAC and everything will be good to go.

Manual ARP request sounds weird. What about

https://git.open-mesh.org/batctl.git/blob/83faa3126d6cc984fff10760aa975bacec...

...

and

https://git.open-mesh.org/batctl.git/blob/83faa3126d6cc984fff10760aa975bacec...

...

Are these new routines? I may have been looking at old code and don't remember seeing these there. Am I right that these will do what we need - perform a request on the network if the entry is not in the cache?

Thanks!

Sven Eckelmann

2:16 p.m.

On Tuesday, 23 October 2018 14:11:41 CEST Jonathan Haws wrote: [...]

...

https://git.open-mesh.org/batctl.git/blob/83faa3126d6cc984fff10760aa975bacec...

...
and

https://git.open-mesh.org/batctl.git/blob/83faa3126d6cc984fff10760aa975bacec...

...
Are these new routines? I may have been looking at old code and don't remember seeing these there. Am I right that these will do what we need - perform a request on the network if the entry is not in the cache?

These routines are not in alfred yet - they are in batctl. And we use them there for the translate subcommand (translate an IP to the responsible originator).

And yes, they check whether the entry is available and, if not, try to send something towards the remote device. They only have to be adjusted for alfred and integrated into your IPv4 codepath(s).
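The "send something towards the remote device" trick works because the kernel must resolve a neighbor before it can transmit any unicast IPv4 packet to it, so even a one-byte UDP datagram triggers an ARP request and leaves an entry (complete or incomplete) behind. A minimal sketch of that idea, assuming Linux; build_probe_addr() and probe_neighbor() are hypothetical names, and sending to UDP port 9 (the discard service) is an assumption of this sketch, not a detail copied from batctl's routines.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: destination for the probe datagram. */
void build_probe_addr(struct sockaddr_in *dst, struct in_addr ip)
{
	memset(dst, 0, sizeof(*dst));
	dst->sin_family = AF_INET;
	dst->sin_port = htons(9); /* discard service; an assumption */
	dst->sin_addr = ip;
}

/* Fire and forget: the payload is irrelevant, only the ARP resolution the
 * kernel performs before transmitting matters. Returns 0 if the datagram
 * was handed to the kernel, -1 on error. */
int probe_neighbor(struct in_addr ip)
{
	struct sockaddr_in dst;
	char byte = 0;
	ssize_t ret;
	int sock;

	sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
	if (sock < 0)
		return -1;

	build_probe_addr(&dst, ip);
	ret = sendto(sock, &byte, sizeof(byte), 0,
		     (const struct sockaddr *)&dst, sizeof(dst));
	close(sock);
	return ret < 0 ? -1 : 0;
}
```

Nothing has to listen on the remote port; the point is the ARP exchange the kernel performs, not the datagram itself. The caller still has to wait briefly and re-read the neighbor table afterwards.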

Kind regards, Sven

Jonathan Haws

24 Oct 2018

6:39 p.m.

On Tue, 2018-10-23 at 16:16 +0200, Sven Eckelmann wrote:

...

On Tuesday, 23 October 2018 14:11:41 CEST Jonathan Haws wrote: [...]

...

https://git.open-mesh.org/batctl.git/blob/83faa3126d6cc984fff10760aa975bacec...

...

...
...
and

https://git.open-mesh.org/batctl.git/blob/83faa3126d6cc984fff10760aa975bacec...

...

...
...
Are these new routines? I may have been looking at old code and don't remember seeing these there. Am I right that these will do what we need - perform a request on the network if the entry is not in the cache?

These routines are not in alfred yet - they are in batctl. And we use them there for the translate subcommand (translate an IP to the responsible originator).

And yes, they check whether the entry is available and, if not, try to send something towards the remote device. They only have to be adjusted for alfred and integrated into your IPv4 codepath(s).

Sven and Gary,

I just submitted a patch that pulls the request_mac_resolve() routine from batctl, modifies it appropriately, and uses it when the MAC address cannot be resolved from the cache.

I've tested this with my VM setup here and it works properly (after I verified that the nodes were not sharing messages first).

Gary - can you try the patch with your setup and make sure it solves the problem in your setup as well?

And Sven, thanks for pointing out those routines - this approach makes much more sense!

Thanks! Jon

Sven Eckelmann

25 Oct 2018

6:15 a.m.

On Wednesday, 24 October 2018 18:39:43 CEST Jonathan Haws wrote: [...]

...

I just submitted a patch that pulls the request_mac_resolve() routine from batctl, modifies it appropriately, and uses it when the MAC address cannot be resolved from the cache.

I've tested this with my VM setup here and it works properly (after I verified that the nodes were not sharing messages first).

Gary - can you try the patch with your setup and make sure it solves the problem in your setup as well?

The patch can be found at https://patchwork.open-mesh.org/patch/17552/ (or directly on the mailing list)

Please reply via mail with a line

Tested-by: FirstName LastName <guohuizou2000@sina.com>

when you've successfully tested it (FirstName and LastName have to be replaced with your actual name).

Kind regards, Sven

gary

9:06 a.m.

Hi Sven and Jon,

The patch does NOT work for me.

I reviewed the code again and found the issue. The following change may make my testbed work:

--- a/util.c
+++ b/util.c
@@ -122,14 +122,14 @@ int ipv4_arp_request(struct interface *interface, const alfred_addr *addr,
 	arpreq.arp_dev[sizeof(arpreq.arp_dev) - 1] = '\0';
 
 	if (ioctl(interface->netsock, SIOCGARP, &arpreq) < 0)
-		return -1;
-
-	while (retries-- && !(arpreq.arp_flags & ATF_COM)) {
-		ipv4_request_mac_resolve(addr);
-		usleep(200000);
-
-		if (ioctl(interface->netsock, SIOCGARP, &arpreq) < 0)
-			return -1;
+	{
+		while (retries-- && !(arpreq.arp_flags & ATF_COM)) {
+			ipv4_request_mac_resolve(addr);
+			usleep(200000);
+
+			if (ioctl(interface->netsock, SIOCGARP, &arpreq) < 0)
+				return -1;
+		}
 	}

Regards, Gary

-----Original Message-----
From: Sven Eckelmann sven@narfation.org
Sent: 25 October 2018 14:15
To: Jonathan Haws jhaws@sdl.usu.edu
Cc: guohuizou2000@sina.com; b.a.t.m.a.n@lists.open-mesh.org
Subject: Re: [B.A.T.M.A.N.] alfred and batadv-vis issue


Jonathan Haws

29 Oct 2018

4:07 p.m.

...

The patch does NOT work for me.

I reviewed the code again and found the issue. The following change may make my testbed work:

Gary - can you describe to me some more details of your test setup? How are BBN1 and BBN2 configured? Are they machines with two different Ethernet interfaces (i.e. each having one connected to the downstream node and one connected to the switch)? Are they more or less acting as routers (i.e. doing IP forwarding)?

The testing I have been doing is just connecting MP1 and MP2 to the same switch and having them on the same subnet. Is that not the case in your setup?

The other thing I am interested in is the result of the first ioctl() call. I'm guessing it has to be failing, or else the patch would have worked for you. Can you add a print statement before the return that gives the error string as well as the interface and IP (i.e. interface->interface and addr->ipv4.s_addr)? That would help me find the root cause.

One thing to note (and Sven, maybe you can tell me if this is expected): in my testing I found that alfred also reaches this ipv4_arp_request call for the local node, so the very first ioctl() fails with "No such device or address". Should there be a check that discards the local node before the lookup, or is it okay to always do the lookup and then discard the result?

Sven Eckelmann

4:48 p.m.

On Monday, 29 October 2018 17:07:26 CET Jonathan Haws wrote: [...]

...

One thing to note (and Sven, maybe you can tell me if this is expected): in my testing I found that alfred also reaches this ipv4_arp_request call for the local node, so the very first ioctl() fails with "No such device or address". Should there be a check that discards the local node before the lookup, or is it okay to always do the lookup and then discard the result?

Uhm, this sounds extremely wrong to me. Why would you receive your own UDP packets (push_data, announce_master, status_txend) again in the first place? See netsock_own_address for the code which drops such packets in the main recv function - you should know it because you've tried to modify it for IPv4 support.

But your memory initialization is completely broken and has to be fixed. Right now, you are just comparing uninitialized memory regions against each other.
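The hazard Sven describes is generic: if a union that can hold either an IPv4 or an IPv6 address is only partly written and then compared with memcmp(), the untouched bytes are whatever happened to be on the stack, so two logically equal addresses can compare unequal. A minimal sketch, assuming alfred's alfred_addr is a union with ipv4 and ipv6 members (the stand-in addr_sketch and both helpers are hypothetical): zero the whole object first, then fill the active member.

```c
#include <netinet/in.h>
#include <string.h>

/* Hypothetical stand-in for an IPv4/IPv6 address union. */
union addr_sketch {
	struct in_addr ipv4;
	struct in6_addr ipv6;
};

/* Store an IPv4 address so that whole-object comparisons are meaningful:
 * the 12 bytes beyond the IPv4 member are guaranteed to be zero. */
void addr_set_ipv4(union addr_sketch *a, struct in_addr ip)
{
	memset(a, 0, sizeof(*a));
	a->ipv4 = ip;
}

/* Byte-wise comparison over the full union size. */
int addr_equal(const union addr_sketch *a, const union addr_sketch *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Without the memset(), the result of addr_equal() for two IPv4 addresses would depend on leftover stack contents.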

Kind regards, Sven

Jonathan Haws

5:25 p.m.

On Mon, 2018-10-29 at 17:48 +0100, Sven Eckelmann wrote:

...

On Monday, 29 October 2018 17:07:26 CET Jonathan Haws wrote: [...]

...
One thing to note (and Sven, maybe you can tell me if this is expected): in my testing I found that alfred also reaches this ipv4_arp_request call for the local node, so the very first ioctl() fails with "No such device or address". Should there be a check that discards the local node before the lookup, or is it okay to always do the lookup and then discard the result?

Uhm, this sounds extremely wrong to me. Why would you receive your own UDP packets (push_data, announce_master, status_txend) again in the first place? See netsock_own_address for the code which drops such packets in the main recv function - you should know it because you've tried to modify it for IPv4 support.

I need to double check - I had thought I explicitly disabled IP_MULTICAST_LOOP, but if not I need to. This would be my first guess at why we'd be seeing our own UDP packets.
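Multicast loopback is a per-socket IPv4 option. A minimal sketch, assuming a Linux UDP socket; disable_mcast_loop() is a hypothetical helper. Note the trade-off: with IP_MULTICAST_LOOP off, no socket on the same host receives this socket's multicast datagrams, whereas a receive-side check such as the netsock_own_address filtering Sven mentions drops only the node's own packets and leaves loopback intact.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Stop multicast datagrams sent on sock from being looped back to
 * listeners on the local host. Returns 0 on success, -1 on error. */
int disable_mcast_loop(int sock)
{
	int loop = 0;

	return setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP, &loop,
			  sizeof(loop));
}
```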

...

But your memory initialization is completely broken and has to be fixed. Right now, you are just comparing uninitialized memory regions against each other.

It looks like you fixed the memory initialization issue in your latest commit (recv.c, zeroing the alfred_source variable)? I'll pull this and see how it affects the behavior.

Sven Eckelmann

5:34 p.m.

On Monday, 29 October 2018 18:25:19 CET Jonathan Haws wrote: [...]

...

...
But your memory initialization is completely broken and has to be fixed. Right now, you are just comparing uninitialized memory regions against each other.

It looks like you fixed the memory initialization issue in your latest commit (recv.c, zeroing the alfred_source variable)? I'll pull this and see how it affects the behavior.

There is no commit yet; it is still queued in patchwork [1]. I am still waiting for a reply with a Tested-by: ... line from you.

Kind regards, Sven

[1] https://patchwork.open-mesh.org/patch/17596/

gary

30 Oct 2018

5:24 a.m.

In my test setup, both BBN1/2 and MP1/2 are in one subnet.

ipv4_arp_request ret = -1 interface =0x120020070 sipaddr = 0x50505f7 (the first try, sipaddr is right)
ipv4_arp_request ret = 0 interface =0x120020070 sipaddr = 0x50505f7 (the second try with latest patch)

The cause should be that there is no ARP entry for the source IP address on the first try.

-----Original Message-----
From: Jonathan Haws jhaws@sdl.usu.edu
Sent: 30 October 2018 0:07
To: guohuizou2000@sina.com; sven@narfation.org
Cc: b.a.t.m.a.n@lists.open-mesh.org
Subject: Re: [B.A.T.M.A.N.] alfred and batadv-vis issue


Jonathan Haws

2:20 p.m.

On Tue, 2018-10-30 at 13:24 +0800, gary wrote:

...

In my test setup, both BBN1/2 and MP1/2 are in one subnet.

ipv4_arp_request ret = -1 interface =0x120020070 sipaddr = 0x50505f7 (the first try, sipaddr is right)
ipv4_arp_request ret = 0 interface =0x120020070 sipaddr = 0x50505f7 (the second try with latest patch)

The cause should be that there is no ARP entry for the source IP address on the first try.

Right - that is what I found as well as I was doing more testing. My previous testing had an entry that was simply being marked as incomplete instead of being fully flushed.
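The two failure modes in this thread, no neighbor entry at all (Gary's first ioctl() returns -1) and an entry that exists but is still incomplete (Jon's earlier VM setup), can be handled by one loop that treats both as "not resolved yet". The sketch below shows that control flow under stated assumptions; it is not the submitted patch. The lookup and probe steps are passed in as function pointers so the logic can be exercised against a simulated cache without a network, and every name as well as the retry count in the usage is hypothetical.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical hooks: lookup returns 0 and fills mac if a complete entry
 * exists, -1 if there is no entry or it is still incomplete. */
typedef int (*lookup_fn)(void *ctx, uint8_t mac[6]);
typedef void (*probe_fn)(void *ctx);

int resolve_mac(void *ctx, lookup_fn lookup, probe_fn probe, int retries,
		unsigned int delay_us, uint8_t mac[6])
{
	if (lookup(ctx, mac) == 0)
		return 0; /* cache hit, nothing to do */

	/* missing and incomplete entries take the same path */
	while (retries-- > 0) {
		probe(ctx); /* make the kernel send an ARP request */
		usleep(delay_us);
		if (lookup(ctx, mac) == 0)
			return 0;
	}
	return -1; /* still unresolved: caller reports an error */
}

/* Simulated neighbor cache for trying the loop without a network:
 * the entry becomes complete after probes_needed probes. */
struct sim_cache {
	int probes_sent;
	int probes_needed;
};

static int sim_lookup(void *ctx, uint8_t mac[6])
{
	struct sim_cache *c = ctx;

	if (c->probes_sent < c->probes_needed)
		return -1;
	for (int i = 0; i < 6; i++)
		mac[i] = (uint8_t)(0x10 + i);
	return 0;
}

static void sim_probe(void *ctx)
{
	((struct sim_cache *)ctx)->probes_sent++;
}
```

With a real network, the hooks would wrap a SIOCGARP lookup and a probe datagram like the ones discussed earlier in the thread.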

The latest patch resolves that oversight. From the results you sent it appears that the latest patch is working for you. If so, can you update the patch with a Tested By: name <email> line in patchwork ( https://patchwork.open-mesh.org/patch/17597/)?

Thanks! Jon

Sven Eckelmann

2:27 p.m.

On Tuesday, 30 October 2018 15:20:56 CET Jonathan Haws wrote: [...]

...

The latest patch resolves that oversight. From the results you sent it appears that the latest patch is working for you. If so, can you update the patch with a Tested By: name <email> line in patchwork ( https://patchwork.open-mesh.org/patch/17597/)?

No, the line must be "Tested-by: $Full $Name <$email@$address.$something>"; anything else is not detected by Patchwork. Your last "Tested By: $Full $Name <$email@$address.$something>", for example, wasn't detected, and I had to add it to the commit message manually.
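For anyone scripting their replies, a rough check of the trailer shape Sven spells out: the exact "Tested-by:" capitalization and hyphen, a name, then an address in angle brackets at the end of the line. This is a hypothetical helper that only mirrors the format quoted above; it is not Patchwork's actual parser, and it expects the trailing newline to be stripped.

```c
#include <stdbool.h>
#include <string.h>

/* Loose check for "Tested-by: Full Name <user@host>" (hypothetical). */
bool is_tested_by_line(const char *line)
{
	const char *prefix = "Tested-by: ";
	size_t plen = strlen(prefix);
	size_t len = strlen(line);
	const char *lt, *gt, *at;

	if (strncmp(line, prefix, plen) != 0)
		return false; /* e.g. "Tested By:" is rejected */
	lt = strchr(line, '<');
	gt = strrchr(line, '>');
	if (!lt || !gt || gt != line + len - 1)
		return false; /* address must close the line */
	if (lt == line + plen)
		return false; /* no name before the address */
	at = strchr(lt, '@');
	return at && at > lt + 1 && at < gt - 1;
}
```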

Kind regards, Sven


b.a.t.m.a.n@lists.open-mesh.org


participants (3)

gary
Jonathan Haws
Sven Eckelmann