On Thursday, 6 October 2016 01:43:07 CEST Linus Lüssing wrote:
The most prominent general protection fault I was experiencing when quickly removing and adding interfaces to batman-adv is the following:
I am personally not sure whether this should go through net.git or through net-next.git. If you think it should go through net-next, then maybe it would be good to state quite early in the commit message that mdelay(...) is required to cause the problem?
[ 1137.316136] general protection fault: 0000 [#1] SMP
[...]
[ 1137.320038] Call Trace:
[ 1137.320038] [<ffffffffa0363294>] batadv_hardif_disable_interface+0x29a/0x3a6 [batman_adv]
[ 1137.320038] [<ffffffffa0373db4>] batadv_softif_destroy_netlink+0x4b/0xa4 [batman_adv]
[ 1137.320038] [<ffffffff813b52f3>] __rtnl_link_unregister+0x48/0x92
[ 1137.320038] [<ffffffff813b9240>] rtnl_link_unregister+0xc1/0xdb
[ 1137.320038] [<ffffffff8108547c>] ? bit_waitqueue+0x87/0x87
[ 1137.320038] [<ffffffffa03850d2>] batadv_exit+0x1a/0xf48 [batman_adv]
[ 1137.320038] [<ffffffff810c26f9>] SyS_delete_module+0x136/0x1b0
[ 1137.320038] [<ffffffff8144dc65>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 1137.320038] [<ffffffff8108aaca>] ? trace_hardirqs_off_caller+0x37/0xa6
[ 1137.320038] Code: 89 f7 e8 21 bd 0d e1 4d 85 e4 75 0e 31 f6 48 c7 c7 50 d7 3b a0 e8 50 16 f2 e0 49 8b 9c 24 28 01 00 00 48 85 db 0f 84 b2 00 00 00 <48> 8b 03 4d 85 ed 48 89 45 c8 74 09 4c 39 ab f8 00 00 00 75 1c
[ 1137.320038] RIP [<ffffffffa0371852>] batadv_purge_outstanding_packets+0x1c8/0x291 [batman_adv]
[ 1137.320038] RSP <ffff88001da5fd78>
[ 1137.451885] ---[ end trace 803b9bdc6a4a952b ]---
[ 1137.453154] Kernel panic - not syncing: Fatal exception in interrupt
[ 1137.457143] Kernel Offset: disabled
[ 1137.457143] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
Can we reduce the length of some lines here? Especially the modules line (which is not really interesting - I hope) to something like "Modules linked in: batman-adv(O-) <...>". Also please remove the "[ 1137.457143] " and just use 2/4 spaces in front of the snippet.
It can be easily reproduced with some carefully placed msleeps()/mdelay()s.
The issue is, that on interface removal, any still running worker thread of a forwarding packet will race with the interface purging routine to free a forwarding packet. Temporarilly giving up a spin-lock to be able
s/Temporarilly/Temporarily/
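To make the race described above concrete, here is a minimal userspace sketch of the general "claim under the lock" idea. It is not the batman-adv code: struct fwd_packet, the queued flag, list_lock and fwd_packet_claim() are hypothetical names, and a pthread mutex stands in for the spin_lock_bh() protected list. Whoever clears the flag first while holding the lock owns the packet and may free it; the other contender sees that it is already gone and backs off.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a forwarding packet (not the kernel struct) */
struct fwd_packet {
	bool queued;	/* true while the packet sits on the shared list */
	int payload;
};

/* Stand-in for the spinlock that protects the forwarding list */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Try to take exclusive ownership of a queued packet.
 * Returns true for exactly one caller; only that caller may free it.
 */
static bool fwd_packet_claim(struct fwd_packet *pkt)
{
	bool won;

	pthread_mutex_lock(&list_lock);
	won = pkt->queued;
	pkt->queued = false;	/* off the list: nobody else may claim it */
	pthread_mutex_unlock(&list_lock);

	return won;
}
```

Both the worker and the purging routine would call the claim helper before touching the packet, so a double free needs two winners, which the lock rules out.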
[...]
PS: checkpatch throws the following at me, but seems to be bogus?
-------------------------------------------------------------------
/tmp/0001-batman-adv-fix-race-conditions-on-interface-removal.patch
-------------------------------------------------------------------
CHECK: spinlock_t definition without comment
+ spinlock_t *lock);

total: 0 errors, 0 warnings, 1 checks, 411 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

/tmp/0001-batman-adv-fix-race-conditions-on-interface-removal.patch has style problems, please review.
Yes, this is bogus and a deficiency of checkpatch.pl. But since we run checkpatch each day and I don't want to find a way to fix it in checkpatch.pl, maybe you can shorten it in send.h?
bool batadv_forw_packet_steal(struct batadv_forw_packet *packet, spinlock_t *l);
[...]
+bool batadv_forw_packet_steal(struct batadv_forw_packet *forw_packet,
+			      spinlock_t *lock)
+{
+	struct hlist_head head = HLIST_HEAD_INIT;
+	/* did purging routine steal it earlier? */
+	spin_lock_bh(lock);
+	if (batadv_forw_packet_was_stolen(forw_packet)) {
+		spin_unlock_bh(lock);
+		return false;
+	}
+	hlist_del(&forw_packet->list);
+	/* Just to spot misuse of this function */
+	hlist_add_head(&forw_packet->bm_list, &head);
+	hlist_add_fake(&forw_packet->bm_list);
Sorry, I don't get how this extra hlist_add_head should spot misuse. You first add the packet to a list (on the stack) and then set its pprev pointer to point to itself. So you basically have a fake hashed node with its next pointer set to NULL. Wouldn't it be better to use INIT_HLIST_NODE here instead of hlist_add_head? I would even say that INIT_HLIST_NODE isn't needed here because you already did this in batadv_forw_packet_alloc.
But I would assume that you actually only wanted hlist_add_fake for the WARN_ONCE in batadv_forw_packet_queue, right?
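To illustrate the point about the node state, here is a small userspace sketch. The hlist helpers are simplified re-implementations modelled on include/linux/list.h (no WRITE_ONCE, no poisoning), and mark_via_stack_head() / mark_via_fake_only() are hypothetical names for the two variants under discussion. Under these assumptions, both variants leave the node in the same state: next is NULL and pprev points at the node's own next field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace copies of the kernel hlist types and helpers */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

static void INIT_HLIST_NODE(struct hlist_node *n)
{
	n->next = NULL;
	n->pprev = NULL;
}

static bool hlist_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Make the node look hashed without putting it on any list */
static void hlist_add_fake(struct hlist_node *n)
{
	n->pprev = &n->next;
}

static bool hlist_fake(struct hlist_node *n)
{
	return n->pprev == &n->next;
}

/* Variant from the patch: add to an on-stack head, then fake the node */
static void mark_via_stack_head(struct hlist_node *n)
{
	struct hlist_head head = { .first = NULL };

	hlist_add_head(n, &head);
	hlist_add_fake(n);
}

/* Suggested variant: the node is already initialized, so only fake it */
static void mark_via_fake_only(struct hlist_node *n)
{
	hlist_add_fake(n);
}
```

Since hlist_add_fake() overwrites pprev anyway, the on-stack head only contributes next = NULL, which an initialized node already has.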
[...]
+/**
+ * batadv_forw_packet_queue - try to queue a forwarding packet
+ * @forw_packet: the forwarding packet to queue
+ * @lock: a key to the store (e.g. forw_{bat,bcast}_list_lock)
+ * @head: the shelve to queue it on (e.g. forw_{bat,bcast}_list)
+ * @send_time: timestamp (jiffies) when the packet is to be sent
+ *
+ * This function tries to (re)queue a forwarding packet. If packet was stolen
+ * earlier then the shop owner will (usually) keep quiet about it.
Can "shop owner" please be replaced with some relevant information for batman-adv?
+ * Caller needs to ensure that forw_packet->delayed_work was initialized.
+ */
+static void batadv_forw_packet_queue(struct batadv_forw_packet *forw_packet,
+				     spinlock_t *lock, struct hlist_head *head,
+				     unsigned long send_time)
+{
+	spin_lock_bh(lock);
+	/* did purging routine steal it from us? */
+	if (batadv_forw_packet_was_stolen(forw_packet)) {
+		/* If you got it for free() without trouble, then
+		 * don't get back into the queue after stealing...
+		 */
+		WARN_ONCE(hlist_fake(&forw_packet->bm_list),
+			  "Oh oh... the kernel OOPs are on our tail now... Jim won't bail us out this time!\n");
Can this be replaced with a less funny but more helpful message?
[...]
+/**
+ * batadv_purge_outstanding_packets - stop/purge scheduled bcast/OGMv1 packets
+ * @bat_priv: the bat priv with all the soft interface information
+ * @hard_iface:	the hard interface to cancel and purge bcast/ogm packets on
Please replace the tab between " @hard_iface:" and "the hard in" with a space
[...]
@@ -21,6 +21,7 @@
 #include "main.h"
 #include <linux/compiler.h>
+#include <linux/spinlock_types.h>
 #include <linux/types.h>
This include is actually correct, but I am currently mapping linux/spinlock_types.h to linux/spinlock.h in iwyu. So it would be easier for me if this include were changed to linux/spinlock.h.
I am not sure about all the crime-related puns in this patch, but the idea makes sense and it also cleans up some of the forwarding packet code.
Kind regards,
Sven