INFO: rcu detected stall in netlink_sendmsg (4)
by syzbot
Hello,
syzbot found the following crash on:
HEAD commit: ae661dec Merge branch 'ifla_xdp_expected_fd'
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12245647e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=b5acf5ac38a50651
dashboard link: https://syzkaller.appspot.com/bug?extid=0fb70e87d8e0ac278fe9
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0fb70e87d8e0ac278fe9(a)syzkaller.appspotmail.com
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 0-....: (1 GPs behind) idle=5c2/1/0x4000000000000002 softirq=376075/376076 fqs=5176
(t=10500 jiffies g=506061 q=176208)
NMI backtrace for cpu 0
CPU: 0 PID: 17281 Comm: syz-executor.5 Not tainted 5.6.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x231/0x27e lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x169/0x1b3 kernel/rcu/tree_stall.h:254
print_cpu_stall kernel/rcu/tree_stall.h:475 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:549 [inline]
rcu_pending kernel/rcu/tree.c:3030 [inline]
rcu_sched_clock_irq.cold+0x518/0xc55 kernel/rcu/tree.c:2276
update_process_times+0x25/0x60 kernel/time/timer.c:1726
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:171
tick_sched_timer+0x4e/0x140 kernel/time/tick-sched.c:1314
__run_hrtimer kernel/time/hrtimer.c:1517 [inline]
__hrtimer_run_queues+0x32c/0xdd0 kernel/time/hrtimer.c:1579
hrtimer_interrupt+0x312/0x770 kernel/time/hrtimer.c:1641
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1119 [inline]
smp_apic_timer_interrupt+0x15b/0x600 arch/x86/kernel/apic/apic.c:1144
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
</IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:759 [inline]
RIP: 0010:lock_release+0x45f/0x7c0 kernel/locking/lockdep.c:4505
Code: 94 08 00 00 00 00 00 00 48 c1 e8 03 80 3c 10 00 0f 85 d0 02 00 00 48 83 3d 6d 1d 1b 08 00 0f 84 71 01 00 00 48 8b 3c 24 57 9d <0f> 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 48 01 c3 48 c7 03 00
RSP: 0018:ffffc90003d9ec30 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff12e7698 RBX: 1ffff920007b3d89 RCX: 1ffff110098769b9
RDX: dffffc0000000000 RSI: 1ffff110098769c5 RDI: 0000000000000282
RBP: ffff88804c3b4540 R08: 0000000000000004 R09: fffffbfff14cc269
R10: fffffbfff14cc268 R11: ffffffff8a661347 R12: bc95c6993a9665e0
R13: ffffffff87a36fb1 R14: ffff88804c3b4dd0 R15: 0000000000000003
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:174 [inline]
_raw_spin_unlock_bh+0x12/0x30 kernel/locking/spinlock.c:207
spin_unlock_bh include/linux/spinlock.h:383 [inline]
batadv_tt_local_purge_pending_clients+0x2a1/0x3b0 net/batman-adv/translation-table.c:3914
batadv_tt_local_resize_to_mtu+0x96/0x130 net/batman-adv/translation-table.c:4198
batadv_update_min_mtu net/batman-adv/hard-interface.c:626 [inline]
batadv_hardif_activate_interface.part.0.cold+0xc6/0x294 net/batman-adv/hard-interface.c:653
batadv_hardif_activate_interface net/batman-adv/hard-interface.c:800 [inline]
batadv_hardif_enable_interface+0x9f2/0xaa0 net/batman-adv/hard-interface.c:792
batadv_softif_slave_add+0x92/0x150 net/batman-adv/soft-interface.c:859
do_set_master net/core/rtnetlink.c:2470 [inline]
do_set_master+0x1d7/0x230 net/core/rtnetlink.c:2443
do_setlink+0xaa2/0x3680 net/core/rtnetlink.c:2605
__rtnl_newlink+0xad5/0x1590 net/core/rtnetlink.c:3266
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3391
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5454
netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2478
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2343
___sys_sendmsg+0x100/0x170 net/socket.c:2397
__sys_sendmsg+0xec/0x1b0 net/socket.c:2430
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45c849
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f043b72fc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f043b7306d4 RCX: 000000000045c849
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000009f5 R14: 00000000004ccac9 R15: 000000000076bf0c
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller(a)googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
1 year, 11 months
Network stops passing traffic randomly
by smartwires@gmail.com
I have been battling a weird problem recently. It occurs on two separate networks, one with 2 nodes and the other with 3 nodes. The network runs fine, and then all of a sudden the clients cannot reach the Internet. I have seen this on both OpenWrt 19.07 and 18.06, and a reboot of the gateway corrects the problem. This is what I have observed:
1. The gateway is up and running and able to reach the Internet.
2. `batctl o` shows the neighbor(s).
3. `batctl ping [MAC]` fails.
root@Main-GW:~# batctl o
[B.A.T.M.A.N. adv openwrt-2018.1-5, MainIF/MAC: mesh0/e8:5b:b7:00:10:73 (bat0/22:55:4d:3e:5f:8f BATMAN_IV)]
Originator last-seen (#/255) Nexthop [outgoingIF]
* e8:5b:b7:00:10:6b 0.880s (255) e8:5b:b7:00:10:6b [ mesh0]
root@Main-GW:~# batctl ping e8:5b:b7:00:10:6b
PING e8:5b:b7:00:10:6b (e8:5b:b7:00:10:6b) 20(48) bytes of data
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
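Steps 2 and 3 can be scripted so that the check is repeated the same way every time the problem appears. This is a minimal sketch, not part of the original report. It assumes the `batctl o` table layout quoted above; other batctl versions may format it differently. `originators` and `ping_originators` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch (assumption: "batctl o" prints the table layout quoted in this mail).

# originators: read "batctl o" output on stdin and print each originator MAC
# once. The best route is marked with a leading "*" field; other routes start
# directly with the MAC. Header lines are skipped because their first field
# is not a 17-character MAC address.
originators() {
    awk '{ f = ($1 == "*") ? $2 : $1 }
         length(f) == 17 && f ~ /^[0-9a-f:]+$/ && !seen[f]++ { print f }'
}

# ping_originators: hypothetical helper that runs "batctl ping" with 3 probes
# against every originator currently listed by "batctl o" (steps 2 and 3).
ping_originators() {
    local mac
    for mac in $(batctl o | originators); do
        echo "== $mac"
        batctl ping -c 3 "$mac"
    done
}
```

Run `ping_originators` on the gateway while the clients are offline. If every originator is listed but all pings time out, the failure is in unicast delivery rather than in route discovery.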
2 years
QoS over batman-adv
by Xuebing Wang
Hi Simon and community,
We have been using batman-adv on OpenWrt 15.05 with ath9k chips for over
3 years, and it works great.
We are exploring the idea of QoS over batman-adv to transmit a small
quantity of high-priority data. Any suggestions? Thanks.
Xuebing Wang
2 years
Re: Network stops passing traffic randomly
by Sven Eckelmann
[please don't send me private mails about batman-adv unless you have a
really good reason to do so. And if not stated otherwise, I must assume
that you actually wanted to send your message to the mailing list]
On Thursday, 28 May 2020 21:18:36 CEST Steve Newcomb wrote:
> > My first guess is that the underlying interfaces (mesh0) stopped transporting
> > unicast frames. Did you check this by setting an IP on mesh0 and pinging between
> > these devices using the IPv4 ping?
> Not sure what the phrase "to set an IP on mesh0" means, if not simply to
> endow the corresponding bridge with a static IP. Which is what I'm doing.
>
> Not sure what "IPv4 ping" means. I've disabled IPv6, so I'm not using
> anything but IPv4.
I am assuming that mesh0 is the device which was added to bat0 as a slave.
Please replace it with whatever you are using:
# on device 1
ip addr add 192.168.23.1/24 dev mesh0
# on device 2
ip addr add 192.168.23.2/24 dev mesh0
> If "IPv4 ping" means "the ordinary Linux ping command", then, yes, I've
> tried that.
The IPv4 ping was just a placeholder for "not batman-adv ping packets". So you
can also use ICMPv6 if you prefer. Just make sure to send it over the
underlying ("slave") interface of batman-adv. And not on bat0 or any higher
layer bridge/vlan/... interface.
With the addresses mentioned earlier:
# on device 1
ping 192.168.23.2
# on device 2
ping 192.168.23.1
Also use tcpdump to observe what is received by the other end.
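The check described above can be wrapped in a small script so that the ping is guaranteed to go out over the slave interface (mesh0) and not bat0. This is a sketch under assumptions: the interface name and the 192.168.23.x addresses are the example values from this mail, and `loss_pct` / `check_slave_link` are hypothetical helper names. The loss parser expects the usual "N% packet loss" summary line printed by iputils and BusyBox ping.

```shell
#!/usr/bin/env bash
# Sketch: test plain (non-batman-adv) unicast over the batman-adv slave
# interface. mesh0 and 192.168.23.x are the example values from the mail.

# loss_pct: read ping output on stdin and print the packet-loss percentage
# taken from the summary line, e.g. "... 100% packet loss, ..." -> 100.
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# check_slave_link IFACE PEER_IP: send 4 pings to PEER_IP out of IFACE.
# Fails on 100% loss, or when no summary line could be parsed at all.
check_slave_link() {
    local iface=$1 peer=$2 loss
    loss=$(ping -c 4 -I "$iface" "$peer" 2>&1 | loss_pct)
    echo "$iface -> $peer: ${loss:-unknown}% packet loss"
    [ -n "$loss" ] && [ "$loss" != "100" ]
}
```

On device 1 this would be called as `check_slave_link mesh0 192.168.23.2`. At the same time, `tcpdump -n -e -i mesh0 icmp` on device 2 shows whether the echo requests arrive at all, and from which MAC address.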
> 100% packet loss when the offline condition occurs. Batctl
> o, on the other hand, looks just fine.
It sounds to me like "mesh0" is still able to transport broadcast frames (which
are used for the OGMs, which "create" the originator list in `batctl o`).
If you can no longer send unicast frames on mesh0, then something is wrong
with the unicast part.
For example, if you are using encryption for the mesh0 link, maybe the group
key is still set correctly but something happened to the pairwise key and it
is now "corrupted".
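The distinction drawn here depends only on the destination MAC address. A frame is group-addressed when the least significant bit of the first octet (the I/G bit) is set, and ff:ff:ff:ff:ff:ff is broadcast. In WPA2-style link encryption, group-addressed frames are protected with a group key and individually addressed frames with a pairwise key. That is why OGMs can keep flowing while unicast fails. The following sketch only illustrates that mapping. `frame_kind` and `link_key_for` are hypothetical helper names, and the sketch does not query any real key state.

```shell
#!/usr/bin/env bash
# Sketch: classify a destination MAC and name the kind of link key that
# protects a frame sent to it (illustration only, no real key lookup).

# frame_kind MAC: prints "broadcast", "multicast" or "unicast".
frame_kind() {
    local mac first
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    if [ "$mac" = "ff:ff:ff:ff:ff:ff" ]; then
        echo broadcast          # e.g. batman-adv OGMs
        return
    fi
    first=${mac%%:*}
    # I/G bit = least significant bit of the first octet.
    if [ $(( 0x$first & 1 )) -eq 1 ]; then
        echo multicast
    else
        echo unicast            # e.g. batctl ping to a neighbor
    fi
}

# link_key_for MAC: group-addressed frames use the group key,
# individually addressed frames use the pairwise key.
link_key_for() {
    case $(frame_kind "$1") in
        unicast) echo "pairwise key" ;;
        *)       echo "group key" ;;
    esac
}
```

For the neighbor in this thread, `link_key_for e8:5b:b7:00:10:6b` names the pairwise key, while OGMs sent to ff:ff:ff:ff:ff:ff depend only on the group key.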
Kind regards,
Sven
2 years
[PATCH 0/3] pull request for net-next: batman-adv 2020-05-26
by Simon Wunderlich
Hi David,
here is a small cleanup pull request of batman-adv to go into net-next.
Please pull or let me know of any problem!
Thank you,
Simon
The following changes since commit 1a33e10e4a95cb109ff1145098175df3113313ef:
net: partially revert dynamic lockdep key changes (2020-05-04 12:05:56 -0700)
are available in the Git repository at:
git://git.open-mesh.org/linux-merge.git tags/batadv-next-for-davem-20200526
for you to fetch changes up to 9ad346c90509ebd983f60da7d082f261ad329507:
batman-adv: Revert "disable ethtool link speed detection when auto negotiation off" (2020-05-26 09:23:33 +0200)
----------------------------------------------------------------
This cleanup patchset includes the following patches:
- Fix revert dynamic lockdep key changes for batman-adv,
by Sven Eckelmann
- use rcu_replace_pointer() where appropriate, by Antonio Quartulli
- Revert "disable ethtool link speed detection when auto negotiation
off", by Sven Eckelmann
----------------------------------------------------------------
Antonio Quartulli (1):
batman-adv: use rcu_replace_pointer() where appropriate
Sven Eckelmann (2):
batman-adv: Revert "Drop lockdep.h include for soft-interface.c"
batman-adv: Revert "disable ethtool link speed detection when auto negotiation off"
net/batman-adv/bat_v_elp.c | 15 +--------------
net/batman-adv/gateway_client.c | 4 ++--
net/batman-adv/hard-interface.c | 4 ++--
net/batman-adv/routing.c | 4 ++--
net/batman-adv/soft-interface.c | 1 +
5 files changed, 8 insertions(+), 20 deletions(-)
2 years, 1 month
Batman-adv packet retranslation
by Alexey Ermakov
Hi, All.
There is a problem with relaying (retransmitting) packets in B.A.T.M.A.N. V mode.
I have 3 stations, st1, st2 and st3, each with one active network interface.
This interface is configured so that st1 and st3 can see only st2.
I expect that if I join these stations in a bat0 network, then station
2 will act as a repeater and all three stations will be visible in the
bat0 network.
This works fine if I select the B.A.T.M.A.N. IV algorithm, but doesn't
work with B.A.T.M.A.N. V.
--
Ermakov Alexey.
2 years, 1 month
Re: Network stops passing traffic randomly
by Sven Eckelmann
On Monday, 25 May 2020 15:19:22 CEST Daniel Ghansah wrote:
> [B.A.T.M.A.N. adv openwrt-2018.1-5, MainIF/MAC: mesh0/e8:5b:b7:00:10:73
Just noticed another thing - why is your revision of batman-adv so low? The
OpenWrt 18.06.x version of batman-adv is already at 2018.1-11.
Kind regards,
Sven
2 years, 1 month
Re: Network stops passing traffic randomly
by Sven Eckelmann
On Monday, 25 May 2020 15:19:22 CEST Daniel Ghansah wrote:
> Hi Sven,
> Yes I did ping via the IP, there is no response, I am using IPV4
Ok, when your underlying layer (mesh0) is not working then you should not
expect batman-adv to work.
Btw, the `batctl dc` output suggests that you tested the IPv4 ping on bat0
rather than on mesh0, which is not what I asked for.
Kind regards,
Sven
2 years, 1 month