On Sat, Apr 16, 2011 at 09:54:48AM +0200, Sven Eckelmann wrote:
Linus Lüssing wrote:
hardif_remove_interfaces() removes all hard interfaces from the hardif_list before freeing and cleaning up any device. However the clean up procedures in orig_hash_del_if() (hardif_remove_interface()->hardif_disable_interface()-> orig_hash_del_if()) need the other interfaces still to be present in the hardif_list. Otherwise it won't renumber any preceding interfaces, which leads to an unhandled kernel paging request in orig_node_del_if()'s "/* copy second part */" due to wrong hard_if->if_num's.
With this commit the interface removal on module shutdown will be down in the same way as removing single interfaces from batman only: One interface will be removed and cleaned at a time.
Signed-off-by: Linus Lüssing linus.luessing@web.de
Please use --patience as requested in http://www.open-mesh.org/wiki/open-mesh/Contribute
Please show us (as part of the commit message) why the information in http://www.open-mesh.org/projects/batman-adv/repository/revisions/132b776c22... isn't valid anymore and explain why it is save to use the spin_lock only inside the loop (but it would have to protect the loop in normal situations).
Kind regards, Sven
Hi Sven,
Ah, oki doki, didn't know about commit 5d4b5a4d and yes, a revert of that commit looks kind of similar to my patch.
Commit 5d4b5a4d together with your statement confuse me a little. The commit message does not say anything about a locking dependancy issue, but seems to be a performance patch (which does not seem as such a severe problem to me, as removing/adding interfaces / removing the batman-adv module does not happen that frequently in common setups ;) ). Could you explain a little further which combinations of locks could introduce a problem?
Hmm, for the "and explain why it is save to use the spin_lock only" part, aggreed, I think it was initially a mistake of mine. And usually this would not protect us from a new interface being added or an interface being removed from batman during a NETDEV_REGISTER/UNREGISTER event while we are trying to flush the if_list. However, just before calling hardif_remove_interfaces(), we are calling unregister_netdevice_notifier(&hard_if_notifier). So as far as I know, no hardif_add_interface() or hardif_remove_interface() and according list_add/del_rcu for the if_list should be called anymore.
Cheers, Linus
PS: And it's actually not an "unhandled kernel paging request" but a "Null pointer dereference". Also see this ticket: http://www.open-mesh.org/issues/147
I'm also wondering why we'd actually need the rtnl_lock() in hardif_remove_interfaces() then with that reasoning. What operation in hardif_remove_interface() (without the 's') needs to be protected with the rtnl_lock(), could be place the rtnl_lock a little tighter instead to also fix the issue from here? http://www.open-mesh.org/issues/145