On Sun, Jan 27, 2019 at 10:47:08PM +0100, Linus Lüssing wrote: [...]
The crash itself is triggered by this check:
BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
in here:
https://elixir.bootlin.com/linux/v4.9.146/source/net/netfilter/nf_conntrack_...
I had tried the nf_reset()s and Wang's patch but with no success.
Skimming through the code, I noticed that there aren't many opportunities for the hnnode to become zero, i.e. for its pprev pointer to be NULL, which is what hlist_nulls_unhashed() tests. There are several hlist_nulls_del_rcu() calls, but no hlist_nulls_del_init_rcu() calls, for instance.
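For anyone following along, here is a small userspace model of why that distinction matters. It is a simplified sketch, not the kernel's hlist_nulls code: struct node, node_del(), node_del_init() and MODEL_POISON are made-up names, and the nulls end markers and RCU primitives of the real list are left out. It assumes that the unhashed test is a NULL check on pprev, that a plain delete leaves a non-NULL poison value in pprev, and that only the del_init variant resets pprev to NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model; all names here are hypothetical. */
struct node {
	struct node *next;
	struct node **pprev;	/* address of the pointer that points at us */
};

/* Stand-in for a list poison value: any non-NULL marker will do. */
static struct node *poison_target;
#define MODEL_POISON (&poison_target)

/* Models the unhashed test: true only when pprev is NULL. */
static int node_unhashed(const struct node *n)
{
	return n->pprev == NULL;
}

static void node_add_head(struct node *n, struct node **head)
{
	n->next = *head;
	n->pprev = head;
	if (*head)
		(*head)->pprev = &n->next;
	*head = n;
}

static void node_unlink(struct node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/* Models a plain delete: unlink, then poison pprev (non-NULL). */
static void node_del(struct node *n)
{
	node_unlink(n);
	n->pprev = MODEL_POISON;
}

/* Models a del_init variant: unlink, then reset pprev to NULL. */
static void node_del_init(struct node *n)
{
	if (!node_unhashed(n)) {
		node_unlink(n);
		n->pprev = NULL;
	}
}

/* Deletes one node each way and checks what the unhashed test reports. */
static void model_demo(void)
{
	struct node *head = NULL;
	struct node a, b;

	node_add_head(&a, &head);
	node_add_head(&b, &head);

	node_del(&a);
	assert(!node_unhashed(&a));	/* poisoned, so not "unhashed" */

	node_del_init(&b);
	assert(node_unhashed(&b));	/* only this path yields NULL */
	assert(head == NULL);
}
```

In this model a node that was only ever removed with the plain delete never reports as unhashed, so a NULL pprev has to come from somewhere else, for example from memory that was freed and reused.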
That started to make me wonder whether something from "outside" might be setting the hnnode to zero - and yeah...
I missed that batadv_send_skb_unicast() always frees/consumes the skb, whatever it returns... and I was freeing the skb myself whenever that call returned something other than NET_XMIT_SUCCESS. So a double kfree_skb()... I'm a bit surprised that things did not crash more often...
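To spell out the ownership rule I tripped over, here is a minimal userspace sketch. None of this is the batman-adv or sk_buff API: struct buf, send_consuming(), caller_fixed() and the XMIT_* values are hypothetical stand-ins, with malloc()/free() playing the role of skb allocation and kfree_skb().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the batman-adv or sk_buff API. */
struct buf {
	int len;
};

enum { XMIT_SUCCESS = 0, XMIT_DROP = 1 };

static int frees;	/* counts releases so the demo can check for exactly one */

static void buf_free(struct buf *b)
{
	frees++;
	free(b);
}

/* Always consumes b, on success and on failure alike. */
static int send_consuming(struct buf *b, int fail)
{
	buf_free(b);
	return fail ? XMIT_DROP : XMIT_SUCCESS;
}

/* Correct caller: never touch b again after handing it over. */
static int caller_fixed(int fail)
{
	struct buf *b = malloc(sizeof(*b));
	int ret;

	if (!b)
		return XMIT_DROP;
	b->len = 0;

	ret = send_consuming(b, fail);
	/*
	 * The buggy variant did this here, releasing b a second time:
	 *	if (ret != XMIT_SUCCESS)
	 *		buf_free(b);
	 */
	return ret;
}

/* Runs both paths and checks each buffer was released exactly once. */
static void ownership_demo(void)
{
	frees = 0;
	assert(caller_fixed(0) == XMIT_SUCCESS);
	assert(caller_fixed(1) == XMIT_DROP);
	assert(frees == 2);
}
```

The point is only that once a function is documented to consume its argument on every path, the caller's error branch must not release it again.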
Sorry for the noise :-(. But thanks for all the help and quick responses!