Linus Lüssing wrote:
Ah, oki doki, didn't know about commit 5d4b5a4d and yes, a revert of that commit looks kind of similar to my patch.
Commit 5d4b5a4d together with your statement confuse me a little. The commit message does not say anything about a locking dependancy issue, but seems to be a performance patch (which does not seem as such a severe problem to me, as removing/adding interfaces / removing the batman-adv module does not happen that frequently in common setups ;) ). Could you explain a little further which combinations of locks could introduce a problem?
No
Hmm, for the "and explain why it is save to use the spin_lock only" part, aggreed, I think it was initially a mistake of mine. And usually this would not protect us from a new interface being added or an interface being removed from batman during a NETDEV_REGISTER/UNREGISTER event while we are trying to flush the if_list. However, just before calling hardif_remove_interfaces(), we are calling unregister_netdevice_notifier(&hard_if_notifier). So as far as I know, no hardif_add_interface() or hardif_remove_interface() and according list_add/del_rcu for the if_list should be called anymore.
Interesting assumption, but how did you ensure that everything is in a synchronous state? The network core also uses rcu - and it doesn't use the atomic notifier functions.
Cheers, Linus
PS: And it's actually not an "unhandled kernel paging request" but a "Null pointer dereference". Also see this ticket: http://www.open-mesh.org/issues/147
I'm also wondering why we'd actually need the rtnl_lock() in hardif_remove_interfaces() then with that reasoning. What operation in hardif_remove_interface() (without the 's') needs to be protected with the rtnl_lock(), could be place the rtnl_lock a little tighter instead to also fix the issue from here? http://www.open-mesh.org/issues/145
See 132b776c22c9b71962a3ed9a33e5b6f9218dae1b
I will propose two different patches.
Regards, Sven