hardif_remove_interfaces() removes all hard interfaces from the hardif_list before freeing and cleaning up any device. However the clean up procedures in orig_hash_del_if() (hardif_remove_interface()->hardif_disable_interface()-> orig_hash_del_if()) need the other interfaces still to be present in the hardif_list. Otherwise it won't renumber any preceding interfaces, which leads to an unhandled kernel paging request in orig_node_del_if()'s "/* copy second part */" due to wrong hard_if->if_num's.
With this commit the interface removal on module shutdown will be done in the same way as removing single interfaces from batman only: One interface will be removed and cleaned at a time.
Signed-off-by: Linus Lüssing linus.luessing@web.de
---
 hard-interface.c |   15 ++++-----------
 1 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/hard-interface.c b/hard-interface.c
index b3058e4..f039a3d 100644
--- a/hard-interface.c
+++ b/hard-interface.c
@@ -490,20 +490,13 @@ static void hardif_remove_interface(struct hard_iface *hard_iface)
 void hardif_remove_interfaces(void)
 {
 	struct hard_iface *hard_iface, *hard_iface_tmp;
-	struct list_head if_queue;
 
-	INIT_LIST_HEAD(&if_queue);
-
-	spin_lock(&hardif_list_lock);
-	list_for_each_entry_safe(hard_iface, hard_iface_tmp,
-				 &hardif_list, list) {
+	rtnl_lock();
+	list_for_each_entry_safe(hard_iface, hard_iface_tmp, &hardif_list, list) {
+		spin_lock(&hardif_list_lock);
 		list_del_rcu(&hard_iface->list);
-		list_add_tail(&hard_iface->list, &if_queue);
-	}
-	spin_unlock(&hardif_list_lock);
+		spin_unlock(&hardif_list_lock);
 
-	rtnl_lock();
-	list_for_each_entry_safe(hard_iface, hard_iface_tmp, &if_queue, list) {
 		hardif_remove_interface(hard_iface);
 	}
 	rtnl_unlock();
Linus Lüssing wrote:
hardif_remove_interfaces() removes all hard interfaces from the hardif_list before freeing and cleaning up any device. However the clean up procedures in orig_hash_del_if() (hardif_remove_interface()->hardif_disable_interface()-> orig_hash_del_if()) need the other interfaces still to be present in the hardif_list. Otherwise it won't renumber any preceding interfaces, which leads to an unhandled kernel paging request in orig_node_del_if()'s "/* copy second part */" due to wrong hard_if->if_num's.
With this commit the interface removal on module shutdown will be done in the same way as removing single interfaces from batman only: One interface will be removed and cleaned at a time.
Signed-off-by: Linus Lüssing linus.luessing@web.de
Please use --patience as requested in http://www.open-mesh.org/wiki/open-mesh/Contribute
Please show us (as part of the commit message) why the information in http://www.open-mesh.org/projects/batman-adv/repository/revisions/132b776c22... isn't valid anymore and explain why it is safe to use the spin_lock only inside the loop (in normal situations it would have to protect the whole loop).
Kind regards, Sven
Sven Eckelmann wrote:
Linus Lüssing wrote:
hardif_remove_interfaces() removes all hard interfaces from the hardif_list before freeing and cleaning up any device. However the clean up procedures in orig_hash_del_if() (hardif_remove_interface()->hardif_disable_interface()-> orig_hash_del_if()) need the other interfaces still to be present in the hardif_list. Otherwise it won't renumber any preceding interfaces, which leads to an unhandled kernel paging request in orig_node_del_if()'s "/* copy second part */" due to wrong hard_if->if_num's.
With this commit the interface removal on module shutdown will be done in the same way as removing single interfaces from batman only: One interface will be removed and cleaned at a time.
Signed-off-by: Linus Lüssing linus.luessing@web.de
Please use --patience as requested in http://www.open-mesh.org/wiki/open-mesh/Contribute
Please show us (as part of the commit message) why the information in http://www.open-mesh.org/projects/batman-adv/repository/revisions/132b776c22c9b71962a3ed9a33e5b6f9218dae1b isn't valid anymore and explain why it is safe to use the spin_lock only inside the loop (in normal situations it would have to protect the whole loop).
Sorry, this was not the correct commit. The commit which fixed a problematic locking behaviour was 5d4b5a4d, but I didn't give any lockdep output there.
The other question must still be answered.
Btw. what is the status of the sysfs_addrm_finish vs. rtnl_lock patch?
Kind regards, Sven
On Sat, Apr 16, 2011 at 09:54:48AM +0200, Sven Eckelmann wrote:
Linus Lüssing wrote:
hardif_remove_interfaces() removes all hard interfaces from the hardif_list before freeing and cleaning up any device. However the clean up procedures in orig_hash_del_if() (hardif_remove_interface()->hardif_disable_interface()-> orig_hash_del_if()) need the other interfaces still to be present in the hardif_list. Otherwise it won't renumber any preceding interfaces, which leads to an unhandled kernel paging request in orig_node_del_if()'s "/* copy second part */" due to wrong hard_if->if_num's.
With this commit the interface removal on module shutdown will be done in the same way as removing single interfaces from batman only: One interface will be removed and cleaned at a time.
Signed-off-by: Linus Lüssing linus.luessing@web.de
Please use --patience as requested in http://www.open-mesh.org/wiki/open-mesh/Contribute
Please show us (as part of the commit message) why the information in http://www.open-mesh.org/projects/batman-adv/repository/revisions/132b776c22... isn't valid anymore and explain why it is safe to use the spin_lock only inside the loop (in normal situations it would have to protect the whole loop).
Kind regards, Sven
Hi Sven,
Ah, oki doki, didn't know about commit 5d4b5a4d and yes, a revert of that commit looks kind of similar to my patch.
Commit 5d4b5a4d together with your statement confuses me a little. The commit message does not say anything about a locking dependency issue, but seems to be a performance patch (which does not seem like such a severe problem to me, as removing/adding interfaces / removing the batman-adv module does not happen that frequently in common setups ;) ). Could you explain a little further which combinations of locks could introduce a problem?
Hmm, for the "and explain why it is safe to use the spin_lock only" part, agreed, I think it was initially a mistake of mine. Usually this would not protect us from a new interface being added, or an interface being removed from batman, during a NETDEV_REGISTER/UNREGISTER event while we are trying to flush the if_list. However, just before calling hardif_remove_interfaces(), we are calling unregister_netdevice_notifier(&hard_if_notifier). So as far as I know, neither hardif_add_interface() nor hardif_remove_interface(), nor the corresponding list_add/del_rcu on the if_list, should be called anymore.
Cheers, Linus
PS: And it's actually not an "unhandled kernel paging request" but a "Null pointer dereference". Also see this ticket: http://www.open-mesh.org/issues/147
With that reasoning, I'm also wondering why we'd actually need the rtnl_lock() in hardif_remove_interfaces() at all. What operation in hardif_remove_interface() (without the 's') needs to be protected by the rtnl_lock()? Could we place the rtnl_lock a little tighter instead, to also fix the issue from here? http://www.open-mesh.org/issues/145
Linus Lüssing wrote:
Ah, oki doki, didn't know about commit 5d4b5a4d and yes, a revert of that commit looks kind of similar to my patch.
Commit 5d4b5a4d together with your statement confuses me a little. The commit message does not say anything about a locking dependency issue, but seems to be a performance patch (which does not seem like such a severe problem to me, as removing/adding interfaces / removing the batman-adv module does not happen that frequently in common setups ;) ). Could you explain a little further which combinations of locks could introduce a problem?
No
Hmm, for the "and explain why it is safe to use the spin_lock only" part, agreed, I think it was initially a mistake of mine. Usually this would not protect us from a new interface being added, or an interface being removed from batman, during a NETDEV_REGISTER/UNREGISTER event while we are trying to flush the if_list. However, just before calling hardif_remove_interfaces(), we are calling unregister_netdevice_notifier(&hard_if_notifier). So as far as I know, neither hardif_add_interface() nor hardif_remove_interface(), nor the corresponding list_add/del_rcu on the if_list, should be called anymore.
Interesting assumption, but how did you ensure that everything is in a synchronous state? The network core also uses rcu - and it doesn't use the atomic notifier functions.
Cheers, Linus
PS: And it's actually not an "unhandled kernel paging request" but a "Null pointer dereference". Also see this ticket: http://www.open-mesh.org/issues/147
With that reasoning, I'm also wondering why we'd actually need the rtnl_lock() in hardif_remove_interfaces() at all. What operation in hardif_remove_interface() (without the 's') needs to be protected by the rtnl_lock()? Could we place the rtnl_lock a little tighter instead, to also fix the issue from here? http://www.open-mesh.org/issues/145
See 132b776c22c9b71962a3ed9a33e5b6f9218dae1b
I will propose two different patches.
Regards, Sven
b.a.t.m.a.n@lists.open-mesh.org