Hi
I am having a problem getting alfred retrieve info from other nodes. I am running openwrt and alfred 2018.0.
I am able to start the alfred server with no errors both from boot and the command line.
root@OpenWrt:~# /etc/init.d/alfred restart /etc/init.d/alfred: waiting 30 secs for br-lan address... /etc/init.d/alfred: starting alfred /etc/init.d/alfred: starting batadv-vis
I can see it with ps
4290 root 816 S /usr/sbin/alfred -i br-lan -m -b bat0
I can issue a set and then a receive and I see the value I set but only locally. It does not get any info from the other nodes in my network. They are all running alfred servers as master.
I have added a firewall rule to open the port.
If I have 2 nodes running alfred and I run tcpdump -i br-lan udp port 16962 - I see packets from the other node:
16:21:49.279086 IP6 fe80::2d0:12ff:fe00:f0c7.16962 > ip6-allnodes.16962: UDP, length 4 16:21:51.907852 IP6 fe80::2d0:12ff:fe2e:f1bd.16962 > ip6-allnodes.16962: UDP, length 4 16:21:59.282603 IP6 fe80::2d0:12ff:fe00:f0c7.16962 > ip6-allnodes.16962: UDP, length 4 16:22:01.907227 IP6 fe80::2d0:12ff:fe2e:f1bd.16962 > ip6-allnodes.16962: UDP, length 4 16:22:09.285498 IP6 fe80::2d0:12ff:fe00:f0c7.16962 > ip6-allnodes.16962: UDP, length 4 16:22:11.919876 IP6 fe80::2d0:12ff:fe2e:f1bd.16962 > ip6-allnodes.16962: UDP, length 4 16:22:19.288912 IP6 fe80::2d0:12ff:fe00:f0c7.16962 > ip6-allnodes.16962: UDP, length 4 16:22:21.916609 IP6 fe80::2d0:12ff:fe2e:f1bd.16962 > ip6-allnodes.16962: UDP, length 4
I think these might be the master announcements but when I issue an alfred -r # - I do not see any new messages in the tcpdump.
Output of netstat -tunpl shows alfred
4290/alfred udp 0 0 fe80::2d0:12ff:fe00:f0c7:16962 :::* 4290/alfred
This is the output of ip a s br-lan
12: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:d0:12:00:f0:c7 brd ff:ff:ff:ff:ff:ff inet 192.168.2.91/24 brd 192.168.2.255 scope global br-lan valid_lft forever preferred_lft forever inet6 fd5b:763:2819::1/64 scope global noprefixroute valid_lft forever preferred_lft forever inet6 fe80::2d0:12ff:fe00:f0c7/64 scope link valid_lft forever preferred_lft forever
Ive rebuilt alfred, tried different versions from the repo, nothing works. I really would appreciate any help. Beth
On Wednesday, 6 February 2019 17.56.00 CET beth.flippo@telegrid.com wrote: [...]
I am having a problem getting alfred retrieve info from other nodes. I am running openwrt and alfred 2018.0.
Which OpenWrt version should that be? There is no release with 2018.0 and the current version in master is also newer.
I am able to start the alfred server with no errors both from boot and the command line.
root@OpenWrt:~# /etc/init.d/alfred restart /etc/init.d/alfred: waiting 30 secs for br-lan address... /etc/init.d/alfred: starting alfred /etc/init.d/alfred: starting batadv-vis
I can see it with ps
4290 root 816 S /usr/sbin/alfred -i br-lan -m -b bat0
I can issue a set and then a receive and I see the value I set but only locally. It does not get any info from the other nodes in my network. They are all running alfred servers as master.
Have you checked whether process_alfred_announce_master (recv.c) rejects the incoming packet for some reason and thus doesn't add it to the server list?
If not, have you checked whether sync_data (send.c) really sends out the data via push_data to the remote server?
I have added a firewall rule to open the port.
If I have 2 nodes running alfred and I run tcpdump -i br-lan udp port 16962 - I see packets from the other node:
16:21:49.279086 IP6 fe80::2d0:12ff:fe00:f0c7.16962 > ip6-allnodes.16962: UDP, length 4
[...]
I think these might be the master announcements but when I issue an alfred -r # - I do not see any new messages in the tcpdump.
If all daemons are in server mode then they will not create a request via `alfred -r`. Instead, they regularly sync [1] their facts between each other (same port but different packet type).
Kind regards, Sven
[1] https://www.open-mesh.org/projects/alfred/wiki/Alfred_architecture#Synchroni...
We are using a gateworks board and built the latest version from their bsp but we had batman-adv, batctl and Alfred ipk built at version 2018.0. Batman and batctl are both working.
root@OpenWrt:~# uname -a Linux OpenWrt 4.4.0 #16 SMP Tue Jan 29 19:41:46 UTC 2019 armv7l GNU/Linux root@OpenWrt:~# cat /etc/openwrt_release DISTRIB_ID='Gateworks' DISTRIB_RELEASE='Gateworks Ventana 16.02@dd98aee-dirty' DISTRIB_REVISION='r48868' DISTRIB_CODENAME='designated_driver' DISTRIB_TARGET='imx6/generic' DISTRIB_DESCRIPTION='Gateworks Designated Driver Gateworks Ventana 16.02@dd98aee-dirty' DISTRIB_TAINTS='no-all busybox' root@OpenWrt:~#
-----Original Message----- From: Sven Eckelmann sven@narfation.org Sent: Wednesday, February 6, 2019 1:14 PM To: b.a.t.m.a.n@lists.open-mesh.org Cc: beth.flippo@telegrid.com; s.pinsky@telegrid.com Subject: Re: [B.A.T.M.A.N.] Alfred problem reading other nodes
On Wednesday, 6 February 2019 17.56.00 CET beth.flippo@telegrid.com wrote: [...]
I am having a problem getting alfred retrieve info from other nodes. I am running openwrt and alfred 2018.0.
Which OpenWrt version should that be? There is no release with 2018.0 and the current version in master is also newer.
I am able to start the alfred server with no errors both from boot and the command line.
root@OpenWrt:~# /etc/init.d/alfred restart /etc/init.d/alfred: waiting 30 secs for br-lan address... /etc/init.d/alfred: starting alfred /etc/init.d/alfred: starting batadv-vis
I can see it with ps
4290 root 816 S /usr/sbin/alfred -i br-lan -m -b bat0
I can issue a set and then a receive and I see the value I set but only locally. It does not get any info from the other nodes in my network. They are all running alfred servers as master.
Have you checked whether process_alfred_announce_master (recv.c) rejects the incoming packet for some reason and thus doesn't add it to the server list?
If not, have you checked whether sync_data (send.c) really sends out the data via push_data to the remote server?
I have added a firewall rule to open the port.
If I have 2 nodes running alfred and I run 'tcpdump -i br-lan udp port 16962' - I see packets from the other node:
16:21:49.279086 IP6 fe80::2d0:12ff:fe00:f0c7.16962 > ip6-allnodes.16962: UDP, length 4
[...]
I think these might be the master announcements but when I issue an alfred -r # - I do not see any new messages in the tcpdump.
If all daemons are in server mode then they will not create a request via `alfred -r`. Instead, they regularly sync [1] their facts between each other (same port but different packet type).
Kind regards, Sven
[1] https://www.open-mesh.org/projects/alfred/wiki/Alfred_architecture#Synchroni zation
Hi Beth,
as Sven said, if both nodes are running in master mode then they should regularly synchronize their data. This happens by sending unicast (!) UDP packets to their respective link-local IPv6 addresses. In your dump, we can only see broadcasts of 4 bytes, most likely the announcements. But as you said, two different hosts are sending those announcements, so that part seems to work.
Since announcement from both ends are visible, it could still be that your firewall drops it. Did you try disabling the firewall completely (e.g. /etc/ init.d/firewall stop, iptable -F etc ...)?
If that doesn't help, the best way would be to instrument the code (recv.c as Sven suggested) to see if the announcements are received. we don't really have debug code there.
Cheers, Simon
On Wednesday, February 6, 2019 1:47:14 PM CET beth.flippo@telegrid.com wrote:
We are using a gateworks board and built the latest version from their bsp but we had batman-adv, batctl and Alfred ipk built at version 2018.0. Batman and batctl are both working.
root@OpenWrt:~# uname -a Linux OpenWrt 4.4.0 #16 SMP Tue Jan 29 19:41:46 UTC 2019 armv7l GNU/Linux root@OpenWrt:~# cat /etc/openwrt_release DISTRIB_ID='Gateworks' DISTRIB_RELEASE='Gateworks Ventana 16.02@dd98aee-dirty' DISTRIB_REVISION='r48868' DISTRIB_CODENAME='designated_driver' DISTRIB_TARGET='imx6/generic' DISTRIB_DESCRIPTION='Gateworks Designated Driver Gateworks Ventana 16.02@dd98aee-dirty' DISTRIB_TAINTS='no-all busybox' root@OpenWrt:~#
-----Original Message----- From: Sven Eckelmann sven@narfation.org Sent: Wednesday, February 6, 2019 1:14 PM To: b.a.t.m.a.n@lists.open-mesh.org Cc: beth.flippo@telegrid.com; s.pinsky@telegrid.com Subject: Re: [B.A.T.M.A.N.] Alfred problem reading other nodes
On Wednesday, 6 February 2019 17.56.00 CET beth.flippo@telegrid.com wrote: [...]
I am having a problem getting alfred retrieve info from other nodes. I am running openwrt and alfred 2018.0.
Which OpenWrt version should that be? There is no release with 2018.0 and the current version in master is also newer.
I am able to start the alfred server with no errors both from boot and the command line.
root@OpenWrt:~# /etc/init.d/alfred restart /etc/init.d/alfred: waiting 30 secs for br-lan address... /etc/init.d/alfred: starting alfred /etc/init.d/alfred: starting batadv-vis
I can see it with ps
4290 root 816 S /usr/sbin/alfred -i br-lan -m -b bat0
I can issue a set and then a receive and I see the value I set but only locally. It does not get any info from the other nodes in my network. They are all running alfred servers as master.
Have you checked whether process_alfred_announce_master (recv.c) rejects the incoming packet for some reason and thus doesn't add it to the server list?
If not, have you checked whether sync_data (send.c) really sends out the data via push_data to the remote server?
I have added a firewall rule to open the port.
If I have 2 nodes running alfred and I run 'tcpdump -i br-lan udp port 16962' - I see packets from the other node:
16:21:49.279086 IP6 fe80::2d0:12ff:fe00:f0c7.16962 > ip6-allnodes.16962: UDP, length 4
[...]
I think these might be the master announcements but when I issue an alfred -r # - I do not see any new messages in the tcpdump.
If all daemons are in server mode then they will not create a request via `alfred -r`. Instead, they regularly sync [1] their facts between each other (same port but different packet type).
Kind regards, Sven
[1] https://www.open-mesh.org/projects/alfred/wiki/Alfred_architecture#Synchroni zation
Hi -
So I added a lot of debug in both send.c and recv.c. Turns out that only multicast works - unicast does not. When I changed the destination on push packets to the Alfred multicast address then they were received on the other side.
Unicast does not produce any errors at the sendto function and looks like it is working but tcpdump doesn't see it and nothing gets to the receive side. Tcpdump does see the multicast announcement packets.
I have a ton of output from netstat and tcpdump but I don't think it is really useful to throw at you guys.
We are still googling it over here but at least we know it is on the send side and not a firewall issue on the receive side.
Any idea why this would happen? Anything thing we should try?
Thanks again for the help!
Beth
-----Original Message----- From: Simon Wunderlich sw@simonwunderlich.de Sent: Wednesday, February 6, 2019 2:03 PM To: b.a.t.m.a.n@lists.open-mesh.org Cc: beth.flippo@telegrid.com; 'Sven Eckelmann' sven@narfation.org; s.pinsky@telegrid.com Subject: Re: [B.A.T.M.A.N.] Alfred problem reading other nodes
Hi Beth,
as Sven said, if both nodes are running in master mode then they should regularly synchronize their data. This happens by sending unicast (!) UDP packets to their respective link-local IPv6 addresses. In your dump, we can only see broadcasts of 4 bytes, most likely the announcements. But as you said, two different hosts are sending those announcements, so that part seems to work.
Since announcement from both ends are visible, it could still be that your firewall drops it. Did you try disabling the firewall completely (e.g. /etc/ init.d/firewall stop, iptable -F etc ...)?
If that doesn't help, the best way would be to instrument the code (recv.c as Sven suggested) to see if the announcements are received. we don't really have debug code there.
Cheers, Simon
On Wednesday, February 6, 2019 1:47:14 PM CET beth.flippo@telegrid.com wrote:
We are using a gateworks board and built the latest version from their bsp but we had batman-adv, batctl and Alfred ipk built at version 2018.0. Batman and batctl are both working.
root@OpenWrt:~# uname -a Linux OpenWrt 4.4.0 #16 SMP Tue Jan 29 19:41:46 UTC 2019 armv7l GNU/Linux root@OpenWrt:~# cat /etc/openwrt_release DISTRIB_ID='Gateworks' DISTRIB_RELEASE='Gateworks Ventana 16.02@dd98aee-dirty' DISTRIB_REVISION='r48868' DISTRIB_CODENAME='designated_driver' DISTRIB_TARGET='imx6/generic' DISTRIB_DESCRIPTION='Gateworks Designated Driver Gateworks Ventana 16.02@dd98aee-dirty' DISTRIB_TAINTS='no-all busybox' root@OpenWrt:~#
-----Original Message----- From: Sven Eckelmann sven@narfation.org Sent: Wednesday, February 6, 2019 1:14 PM To: b.a.t.m.a.n@lists.open-mesh.org Cc: beth.flippo@telegrid.com; s.pinsky@telegrid.com Subject: Re: [B.A.T.M.A.N.] Alfred problem reading other nodes
On Wednesday, 6 February 2019 17.56.00 CET beth.flippo@telegrid.com wrote: [...]
I am having a problem getting alfred retrieve info from other nodes. I am running openwrt and alfred 2018.0.
Which OpenWrt version should that be? There is no release with 2018.0 and the current version in master is also newer.
I am able to start the alfred server with no errors both from boot and the command line.
root@OpenWrt:~# /etc/init.d/alfred restart /etc/init.d/alfred: waiting 30 secs for br-lan address... /etc/init.d/alfred: starting alfred /etc/init.d/alfred: starting batadv-vis
I can see it with ps
4290 root 816 S /usr/sbin/alfred -i br-lan -m -b bat0
I can issue a set and then a receive and I see the value I set but only locally. It does not get any info from the other nodes in my network. They are all running alfred servers as master.
Have you checked whether process_alfred_announce_master (recv.c) rejects the incoming packet for some reason and thus doesn't add it to the
server list?
If not, have you checked whether sync_data (send.c) really sends out the data via push_data to the remote server?
I have added a firewall rule to open the port.
If I have 2 nodes running alfred and I run 'tcpdump -i br-lan udp port 16962' - I see packets from the other node:
16:21:49.279086 IP6 fe80::2d0:12ff:fe00:f0c7.16962 > ip6-allnodes.16962: UDP, length 4
[...]
I think these might be the master announcements but when I issue an alfred -r # - I do not see any new messages in the tcpdump.
If all daemons are in server mode then they will not create a request via `alfred -r`. Instead, they regularly sync [1] their facts between each other (same port but different packet type).
Kind regards, Sven
[1] https://www.open-mesh.org/projects/alfred/wiki/Alfred_architecture#Syn chroni zation
On Friday, 8 February 2019 16.58.46 CET beth.flippo@telegrid.com wrote: [...]
Unicast does not produce any errors at the sendto function and looks like it is working but tcpdump doesn't see it and nothing gets to the receive side. Tcpdump does see the multicast announcement packets.
You could try to use perf to figure out where the packet might have been lost. Here here for example some output when I use `ip6tables -t filter -P OUTPUT DROP` and then try to ping something via:
sudo perf ftrace ping -6 google.de > test
Of course, it doesn't give you a direct answer but you can at least try to make an educated guess by searching for the position when the skb is freed:
34843 3) | ip6_send_skb() { 34844 3) | ip6_local_out() { 34845 3) | __ip6_local_out() { 34846 3) | nf_hook_slow() { 34847 3) | nft_do_chain_ipv6 [nf_tables]() { 34848 3) 0.146 us | ipv6_find_hdr(); 34849 3) 0.602 us | nft_do_chain [nf_tables](); 34850 3) 1.259 us | } 34851 3) | kfree_skb() { 34852 3) | skb_release_all() { 34853 3) | skb_release_head_state() { 34854 3) 0.105 us | dst_release(); 34855 3) | sock_wfree() { 34856 3) 0.271 us | sock_def_write_space(); 34857 3) 0.505 us | } 34858 3) 1.072 us | } 34859 3) | skb_release_data() { 34860 3) | skb_free_head() { 34861 3) 0.103 us | kfree(); 34862 3) 0.294 us | } 34863 3) 0.533 us | } 34864 3) 2.016 us | } 34865 3) | kfree_skbmem() { 34866 3) 0.150 us | kmem_cache_free(); 34867 3) 0.333 us | } 34868 3) 2.629 us | } 34869 3) 4.498 us | } 34870 3) 4.781 us | } 34871 3) 4.968 us | } 34872 3) 5.168 us | } 34873 3) 7.532 us | }
And without this filter, the function continues and at the end it is queued up in the ethernet HW and at some point really given to the ethernet driver (e1000_xmit_frame).
7) | ip6_send_skb() { 7) | ip6_local_out() { 7) | __ip6_local_out() { 7) | nf_hook_slow() { 7) | nft_do_chain_ipv6 [nf_tables]() { 7) 0.221 us | ipv6_find_hdr(); 7) 0.529 us | nft_do_chain [nf_tables](); 7) 1.405 us | } 7) 1.966 us | } 7) 2.354 us | } 7) | ip6_output() { 7) | ip6_finish_output() { 7) 0.785 us | __cgroup_bpf_run_filter_skb(); 7) 0.581 us | ip6_mtu(); 7) | ip6_finish_output2() { 7) | neigh_resolve_output() { 7) | eth_header() { 7) 0.538 us | skb_push(); 7) 1.858 us | } 7) | dev_queue_xmit() { 7) | __dev_queue_xmit() { 7) 0.541 us | dst_release(); 7) 0.903 us | netdev_pick_tx(); 7) | pfifo_fast_enqueue() { 7) 0.585 us | _raw_spin_lock(); 7) 1.892 us | } 7) 0.583 us | _raw_spin_trylock(); 7) | __qdisc_run() { 7) 1.029 us | pfifo_fast_dequeue(); 7) 0.858 us | pfifo_fast_dequeue(); 7) | sch_direct_xmit() { 7) | validate_xmit_skb_list() { 7) | validate_xmit_skb() { 7) | netif_skb_features() { 7) 0.537 us | passthru_features_check(); 7) 0.518 us | skb_network_protocol(); 7) 2.956 us | } 7) 0.579 us | validate_xmit_xfrm(); 7) 5.413 us | } 7) 6.473 us | } 7) 0.434 us | _raw_spin_lock(); 7) | dev_hard_start_xmit() { 7) | dev_queue_xmit_nit() { 7) | skb_clone() { 7) | kmem_cache_alloc() { 7) 0.563 us | should_failslab(); 7) 0.540 us | memcg_kmem_put_cache(); 7) 2.755 us | } 7) | __skb_clone() { 7) 0.681 us | __copy_skb_header(); 7) 1.915 us | } 7) 6.311 us | } 7) 0.721 us | ktime_get_with_offset(); 7) | packet_rcv() { 7) 1.027 us | skb_push(); 7) | __bpf_prog_run32() { 7) 2.280 us | ___bpf_prog_run(); 7) 3.418 us | } 7) | consume_skb() { 7) | skb_release_all() { 7) 0.668 us | skb_release_head_state(); 7) 0.615 us | skb_release_data(); 7) 2.927 us | } 7) | kfree_skbmem() { 7) 0.712 us | kmem_cache_free(); 7) 1.705 us | } 7) 6.126 us | } 7) + 13.655 us | } 7) + 23.417 us | } 7) | e1000_xmit_frame [e1000e]() { 7) 0.723 us | e1000_maybe_stop_tx [e1000e](); 7) 0.711 us | swiotlb_map_page(); 7) 0.542 us | swiotlb_dma_mapping_error(); 7) 0.444 us | e1000_maybe_stop_tx [e1000e](); 7) 6.934 us | } 7) + 32.409 us | } 7) + 41.556 us | } 7) 0.572 us | pfifo_fast_dequeue(); 7) + 47.066 us | } 7) 0.574 us | __local_bh_enable_ip(); 7) + 56.331 us | } 7) + 57.439 us | } 7) + 61.122 us | } 7) 0.588 us | __local_bh_enable_ip(); 7) + 63.722 us | }
You can also see this successful transmit in the net tracepoints:
sudo perf trace --syscalls -e 'net:*' -e 'skb:*' ping google.de [...] 7013.921 ( ): ping/18832 sendto(fd: 4socket:[266397], buff: 0x558076ff39c0, len: 64, flags: CONFIRM, addr: 0x558076ff1940, addr_len: 28) ... 7013.962 ( ): net:net_dev_queue:dev=enp0s31f6 skbaddr=0xffff94a2328e2e00 len=118 7013.981 ( ): skb:consume_skb:skbaddr=0xffff94a2328e3f00 7013.994 ( ): net:net_dev_start_xmit:dev=enp0s31f6 queue_mapping=0 skbaddr=0xffff94a2328e2e00 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x86dd ip_summed=0 len=118 data_len=0 network_offset=14 transport_offset_valid=1 transport_offset=54 tx_flags=0 gso_size=0 gso_segs=0 gso_type=0 7014.006 ( ): net:net_dev_xmit:dev=enp0s31f6 skbaddr=0xffff94a2328e2e00 len=118 rc=0 7013.921 ( 0.094 ms): ping/18832 ... [continued]: sendto()) = 64 64 bytes from prg03s06-in-x03.1e100.net (2a00:1450:4014:80d::2003): icmp_seq=8 ttl=55 time=42.5 ms 7014.024 ( ): ping/18832 recvmsg(fd: 4socket:[266397], msg: 0x7ffcc8a9d000 ) ... 7056.519 ( ): skb:skb_copy_datagram_iovec:skbaddr=0xffff94a208eeac00 len=64 7056.525 ( ): skb:consume_skb:skbaddr=0xffff94a208eeac00 7014.024 (42.504 ms): ping/18832 ... [continued]: recvmsg()) = 64
For the filtered testcase you can see the kfree_skb:
9221.434 ( ): ping/18913 sendto(fd: 4socket:[259911], buff: 0x5561881769c0, len: 64, addr: 0x556188174940, addr_len: 28) ... 9221.473 ( ): skb:kfree_skb:skbaddr=0xffff94a2382df600 protocol=34525 location=0xffffffff9701a307
Kind regards, Sven
b.a.t.m.a.n@lists.open-mesh.org