Hello,
syzbot found the following issue on:
HEAD commit: 41bccc98fb79 Linux 6.8-rc2
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=14200118180000
kernel config: https://syzkaller.appspot.com/x/.config?x=451a1e62b11ea4a6
dashboard link: https://syzkaller.appspot.com/bug?extid=a6a4b5bb3da165594cff
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0772069e29cf/disk-41bccc98.raw....
vmlinux: https://storage.googleapis.com/syzbot-assets/659d3f0755b7/vmlinux-41bccc98.x...
kernel image: https://storage.googleapis.com/syzbot-assets/7780a45c3e51/Image-41bccc98.gz....

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a6a4b5bb3da165594cff@syzkaller.appspotmail.com
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [syz-executor.0:28718]
Modules linked in:
irq event stamp: 45929391
hardirqs last enabled at (45929390): [<ffff8000801d9dc8>] __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386
hardirqs last disabled at (45929391): [<ffff80008ad57108>] __el1_irq arch/arm64/kernel/entry-common.c:499 [inline]
hardirqs last disabled at (45929391): [<ffff80008ad57108>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:517
softirqs last enabled at (2040): [<ffff80008002189c>] softirq_handle_end kernel/softirq.c:399 [inline]
softirqs last enabled at (2040): [<ffff80008002189c>] __do_softirq+0xac8/0xce4 kernel/softirq.c:582
softirqs last disabled at (2052): [<ffff80008aacbc40>] spin_lock_bh include/linux/spinlock.h:356 [inline]
softirqs last disabled at (2052): [<ffff80008aacbc40>] batadv_tt_local_resize_to_mtu+0x60/0x154 net/batman-adv/translation-table.c:3949
CPU: 1 PID: 28718 Comm: syz-executor.0 Not tainted 6.8.0-rc2-syzkaller-g41bccc98fb79 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : should_resched arch/arm64/include/asm/preempt.h:79 [inline]
pc : __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:388
lr : __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386
sp : ffff80009a0670b0
x29: ffff80009a0670c0 x28: ffff70001340ce60 x27: ffff80009a0673d0
x26: ffff00011e860290 x25: ffff0000d08a9f08 x24: 0000000000000001
x23: 1fffe00023d4d3c1 x22: dfff800000000000 x21: ffff80008aacbf98
x20: 0000000000000202 x19: ffff00011ea69e08 x18: ffff80009a066800
x17: 77656e2074696620 x16: ffff80008031ffc8 x15: 0000000000000001
x14: 1fffe0001ba5a290 x13: 0000000000000000 x12: 0000000000000003
x11: 0000000000040000 x10: 0000000000000003 x9 : 0000000000000000
x8 : 0000000002bcd3ae x7 : ffff80008aacbe30 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000002 x1 : ffff80008aecd7e0 x0 : ffff80012545c000
Call trace:
 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:27 [inline]
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:49 [inline]
 __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:386
 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
 _raw_spin_unlock_bh+0x3c/0x4c kernel/locking/spinlock.c:210
 spin_unlock_bh include/linux/spinlock.h:396 [inline]
 batadv_tt_local_purge+0x264/0x2e8 net/batman-adv/translation-table.c:1356
 batadv_tt_local_resize_to_mtu+0xa0/0x154 net/batman-adv/translation-table.c:3956
 batadv_update_min_mtu+0x74/0xa4 net/batman-adv/hard-interface.c:651
 batadv_netlink_set_mesh+0x50c/0x1078 net/batman-adv/netlink.c:500
 genl_family_rcv_msg_doit net/netlink/genetlink.c:1113 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:1193 [inline]
 genl_rcv_msg+0x874/0xb6c net/netlink/genetlink.c:1208
 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2543
 genl_rcv+0x38/0x50 net/netlink/genetlink.c:1217
 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
 netlink_unicast+0x65c/0x898 net/netlink/af_netlink.c:1367
 netlink_sendmsg+0x83c/0xb20 net/netlink/af_netlink.c:1908
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0x56c/0x840 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmsg+0x26c/0x33c net/socket.c:2667
 __do_sys_sendmsg net/socket.c:2676 [inline]
 __se_sys_sendmsg net/socket.c:2674 [inline]
 __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2674
 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
 el0_svc+0x54/0x158 arch/arm64/kernel/entry-common.c:678
 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-syzkaller-g41bccc98fb79 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : arch_local_irq_enable+0x8/0xc arch/arm64/include/asm/irqflags.h:51
lr : default_idle_call+0xf8/0x128 kernel/sched/idle.c:103
sp : ffff80008ebe7cd0
x29: ffff80008ebe7cd0 x28: dfff800000000000 x27: 1ffff00011d7cfa8
x26: ffff80008ec6d000 x25: 0000000000000000 x24: 0000000000000001
x23: 1ffff00011d8da74 x22: ffff80008ec6d3a0 x21: 0000000000000000
x20: ffff80008ec94e00 x19: ffff8000802cff08 x18: 1fffe000367ff796
x17: ffff80008ec6d000 x16: ffff8000802cf7cc x15: 0000000000000001
x14: 1fffe00036801310 x13: 0000000000000000 x12: 0000000000000003
x11: 0000000000000001 x10: 0000000000000003 x9 : 0000000000000000
x8 : 0000000000bf0413 x7 : ffff800080461668 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008ad5af48
x2 : 0000000000000000 x1 : ffff80008aecd7e0 x0 : ffff80012543a000
Call trace:
 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:27 [inline]
 arch_local_irq_enable+0x8/0xc arch/arm64/include/asm/irqflags.h:49
 cpuidle_idle_call kernel/sched/idle.c:170 [inline]
 do_idle+0x1f0/0x4e8 kernel/sched/idle.c:312
 cpu_startup_entry+0x5c/0x74 kernel/sched/idle.c:410
 rest_init+0x2dc/0x2f4 init/main.c:730
 start_kernel+0x0/0x4e8 init/main.c:827
 start_kernel+0x3e8/0x4e8 init/main.c:1072
 __primary_switched+0xb4/0xbc arch/arm64/kernel/head.S:523
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup
On Mon, Feb 12, 2024 at 11:26 AM syzbot syzbot+a6a4b5bb3da165594cff@syzkaller.appspotmail.com wrote:
This patch [1] looks suspicious
I think batman-adv should reject too small MTU values.
[1]
commit d8e42a2b0addf238be8b3b37dcd9795a5c1be459
Author: Sven Eckelmann <sven@narfation.org>
Date: Wed Jul 19 10:01:15 2023 +0200
batman-adv: Don't increase MTU when set by user
If the user set an MTU value, it usually means that there are special requirements for the MTU. But if an interface got activated, the MTU was always recalculated and then the user-set value was overwritten.
The only reason why this user set value has to be overwritten, is when the MTU has to be decreased because batman-adv is not able to transfer packets with the user specified size.
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
On Monday, 12 February 2024 11:41:38 CET Eric Dumazet wrote:
This patch [1] looks suspicious
It shouldn't be caused by this, but this might be another way to trigger the problem. The problem would be visible even without it when an MTU is explicitly set. But no reproducer is available, so I can't actually check what is going on.
I think batman-adv should reject too small MTU values.
You are referring to the size calculated by batadv_tt_local_table_transmit_size(), right? And yes, I would agree that it looks suspicious and might not have been correctly integrated in batadv_max_header_len() when commit a19d3d85e1b8 ("batman-adv: limit local translation table max size") introduced the code. But I think we also need to remove interfaces again when receiving NETDEV_CHANGEMTU and an interface no longer has the correct size. So I have to check how best to do this.

Kind regards,
Sven
On Monday, 12 February 2024 11:26:24 CET syzbot wrote:
#syz test
From 5984ace8f8df7cf8d6f98ded0eebe7d962028992 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven@narfation.org>
Date: Mon, 12 Feb 2024 13:10:33 +0100
Subject: [PATCH] batman-adv: Avoid infinite loop trying to resize local TT
If the MTU of one of the attached interfaces becomes too small to transmit the local translation table, then the table must be resized to fit inside all fragments (when fragmentation is enabled) or a single packet.

But if the MTU becomes too low to transmit even the header plus the VLAN-specific part, then resizing the local TT will never succeed. This can happen, for example, when the usable space is 110 bytes and 11 VLANs are on top of batman-adv. In this case, at least 116 bytes would be needed. There will just be an endless spam of
batman_adv: batadv0: Forced to purge local tt entries to fit new maximum fragment MTU (110)
in the log, but the function will never finish. The problem here is that the timeout is halved in each step and then stagnates at 0, so the loop is never able to reduce the table any further.

Other scenarios with a similar result are possible. For example, the number of BATADV_TT_CLIENT_NOPURGE entries in the local TT can be too high to fit inside a packet. Such a scenario can therefore also happen with only a single VLAN + 7 non-purgeable addresses, which requires at least 120 bytes.
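The 116-byte and 120-byte figures above can be checked with a small calculation. This is only a sketch: the per-structure byte counts below are assumptions (they are not stated in this thread) chosen to mirror how batadv_tt_local_table_transmit_size() is understood to add up a fixed header, one block per VLAN, and one block per TT entry. The constant and function names are hypothetical.

```c
#include <stddef.h>

/*
 * Assumed on-wire sizes in bytes. These values are NOT taken from the
 * thread; they are assumptions that happen to reproduce the 116 and
 * 120 byte figures quoted in the patch description.
 */
enum {
	ASSUMED_UNICAST_TVLV_LEN = 20, /* unicast TVLV packet header */
	ASSUMED_TVLV_HDR_LEN     = 4,  /* TVLV container header */
	ASSUMED_TT_DATA_LEN      = 4,  /* TT data header */
	ASSUMED_TT_VLAN_LEN      = 8,  /* per-VLAN block */
	ASSUMED_TT_CHANGE_LEN    = 12, /* per-entry block */
};

/* Hypothetical helper: bytes needed to transmit a local TT of this shape. */
size_t tt_transmit_size(size_t num_vlan, size_t num_entries)
{
	return ASSUMED_UNICAST_TVLV_LEN + ASSUMED_TVLV_HDR_LEN +
	       ASSUMED_TT_DATA_LEN +
	       num_vlan * ASSUMED_TT_VLAN_LEN +
	       num_entries * ASSUMED_TT_CHANGE_LEN;
}
```

Under these assumptions, tt_transmit_size(11, 0) covers the header plus 11 VLAN blocks, and tt_transmit_size(1, 7) covers one VLAN with 7 non-purgeable entries; neither can shrink below the 110 usable bytes of the example, no matter how often the table is purged.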
While this should be handled proactively when:
* interface with too low MTU is added
* VLAN is added
* non-purgeable local mac is added
* MTU of an attached interface is reduced
* fragmentation setting gets disabled (which most likely requires dropping attached interfaces)

not all of these scenarios can be prevented because batman-adv is only consuming events without the possibility to prevent these actions (non-purgeable MAC address added, MTU of an attached interface is reduced). It is therefore necessary to also make sure that the code is able to handle situations where incompatible system configurations are already present.
Cc: stable@vger.kernel.org
Fixes: a19d3d85e1b8 ("batman-adv: limit local translation table max size")
Reported-by: syzbot+a6a4b5bb3da165594cff@syzkaller.appspotmail.com
Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
 net/batman-adv/translation-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index b95c36765d04..2243cec18ecc 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3948,7 +3948,7 @@ void batadv_tt_local_resize_to_mtu(struct net_device *soft_iface)

 	spin_lock_bh(&bat_priv->tt.commit_lock);

-	while (true) {
+	while (timeout) {
 		table_size = batadv_tt_local_table_transmit_size(bat_priv);
 		if (packet_size_max >= table_size)
 			break;
-- 
2.39.2
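The effect of the one-line change can be illustrated with a userspace model of the loop. This is a sketch under stated assumptions, not the kernel code: the table is reduced to a `fixed` part that can never be purged and a `purgeable` part, and each purge is assumed (purely for illustration) to drop half of the purgeable bytes. The function name, its parameters, and the iteration cap are hypothetical.

```c
#include <stdbool.h>

/*
 * Model of the resize loop. Returns the number of purge rounds that
 * were run, or -1 if max_iter rounds passed without the loop ending
 * (standing in for "would spin forever").
 *
 * stop_at_zero == false models the old `while (true)` loop,
 * stop_at_zero == true  models the patched `while (timeout)` loop.
 */
int resize_rounds(int fixed, int purgeable, int packet_size_max,
		  int timeout, bool stop_at_zero, int max_iter)
{
	int rounds = 0;

	while (stop_at_zero ? timeout != 0 : true) {
		/* Table fits: nothing more to do. */
		if (packet_size_max >= fixed + purgeable)
			break;

		/* Safety cap so the model itself always terminates. */
		if (rounds == max_iter)
			return -1;

		purgeable /= 2; /* assumed purge behaviour, illustration only */
		timeout /= 2;   /* halved in each step, stagnates at 0 */
		rounds++;
	}

	return rounds;
}
```

With a fixed part of 116 bytes, nothing purgeable, and 110 usable bytes, the old loop form never reaches its break and hits the cap, while the patched form stops once the timeout has been halved down to 0. When the table can be made to fit, both forms behave the same.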
On Monday, 12 February 2024 11:26:24 CET syzbot wrote:
#syz test
This crash does not have a reproducer. I cannot test it.
syzbot has found a reproducer for the following issue on:
HEAD commit: 707081b61156 Merge branch 'for-next/core', remote-tracking..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=134d4fa5180000
kernel config: https://syzkaller.appspot.com/x/.config?x=caeac3f3565b057a
dashboard link: https://syzkaller.appspot.com/bug?extid=a6a4b5bb3da165594cff
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=139a4c81180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=108b0ac9180000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/6cad68bf7532/disk-707081b6.raw....
vmlinux: https://storage.googleapis.com/syzbot-assets/1a27e5400778/vmlinux-707081b6.x...
kernel image: https://storage.googleapis.com/syzbot-assets/67dfc53755d0/Image-707081b6.gz....

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a6a4b5bb3da165594cff@syzkaller.appspotmail.com
watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [syz-executor227:7772]
Modules linked in:
irq event stamp: 5373
hardirqs last enabled at (5372): [<ffff80008ad68de8>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline]
hardirqs last enabled at (5372): [<ffff80008ad68de8>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95
hardirqs last disabled at (5373): [<ffff80008ad66a78>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline]
hardirqs last disabled at (5373): [<ffff80008ad66a78>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551
softirqs last enabled at (542): [<ffff800088e9a56c>] spin_unlock_bh include/linux/spinlock.h:396 [inline]
softirqs last enabled at (542): [<ffff800088e9a56c>] release_sock+0x154/0x1b8 net/core/sock.c:3547
softirqs last disabled at (548): [<ffff800088eaf8bc>] spin_lock_bh include/linux/spinlock.h:356 [inline]
softirqs last disabled at (548): [<ffff800088eaf8bc>] lock_sock_nested+0x74/0x11c net/core/sock.c:3526
CPU: 0 PID: 7772 Comm: syz-executor227 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : queued_spin_lock_slowpath+0x15c/0xcf8 kernel/locking/qspinlock.c:383
lr : queued_spin_lock_slowpath+0x168/0xcf8 kernel/locking/qspinlock.c:383
sp : ffff800097ca76c0
x29: ffff800097ca7760 x28: 1fffe00018e1be6b x27: 1ffff00012f94ee4
x26: dfff800000000000 x25: 1fffe00018e1be6d x24: ffff800097ca76e0
x23: ffff800097ca7720 x22: ffff700012f94edc x21: 0000000000000001
x20: 0000000000000001 x19: ffff0000c70df358 x18: 0000000000000000
x17: 0000000000000000 x16: ffff8000809fd934 x15: 0000000000000001
x14: 1fffe00018e1be6b x13: 0000000000000000 x12: 0000000000000000
x11: ffff600018e1be6c x10: 1fffe00018e1be6b x9 : 0000000000000000
x8 : 0000000000000001 x7 : ffff800088eaf8bc x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff80008ae5db50
x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000001
Call trace:
 __cmpwait_case_8 arch/arm64/include/asm/cmpxchg.h:229 [inline]
 __cmpwait arch/arm64/include/asm/cmpxchg.h:257 [inline]
 queued_spin_lock_slowpath+0x15c/0xcf8 kernel/locking/qspinlock.c:383
 queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
 do_raw_spin_lock+0x320/0x348 kernel/locking/spinlock_debug.c:116
 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:127 [inline]
 _raw_spin_lock_bh+0x50/0x60 kernel/locking/spinlock.c:178
 spin_lock_bh include/linux/spinlock.h:356 [inline]
 lock_sock_nested+0x74/0x11c net/core/sock.c:3526
 lock_sock include/net/sock.h:1691 [inline]
 tipc_sendstream+0x50/0x84 net/tipc/socket.c:1550
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0x56c/0x840 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmsg+0x26c/0x33c net/socket.c:2667
 __do_sys_sendmsg net/socket.c:2676 [inline]
 __se_sys_sendmsg net/socket.c:2674 [inline]
 __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2674
 __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152
 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
b.a.t.m.a.n@lists.open-mesh.org