During the module shutdown procedure in batman_exit(), an RCU callback is scheduled (batman_exit -> hardif_remove_interfaces -> hardif_remove_interface -> call_rcu). However, when the kernel unloads the module, the RCU callback might not have been executed yet, resulting in an "unable to handle kernel paging request" in __rcu_process_callback afterwards, causing the kernel to freeze. Therefore, we should always flush all RCU callback functions scheduled during the shutdown procedure.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
---
 main.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/main.c b/main.c
index 209a46b..e8acb46 100644
--- a/main.c
+++ b/main.c
@@ -73,6 +73,8 @@ static void __exit batman_exit(void)
 	flush_workqueue(bat_event_workqueue);
 	destroy_workqueue(bat_event_workqueue);
 	bat_event_workqueue = NULL;
+
+	synchronize_net();
 }
 
 int mesh_init(struct net_device *soft_iface)
@@ -135,9 +137,6 @@ void mesh_free(struct net_device *soft_iface)
 	hna_local_free(bat_priv);
 	hna_global_free(bat_priv);
 
-	synchronize_net();
-
-	synchronize_rcu();
 	atomic_set(&bat_priv->mesh_state, MESH_INACTIVE);
 }
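The ordering problem described in the commit message can be illustrated with a userspace toy model. This is not kernel code: defer_call() is a hypothetical stand-in for call_rcu(), flush_deferred() is a hypothetical stand-in for "wait until already-queued callbacks have run", and module_loaded marks whether the module's code and data still exist. The model ignores real RCU details such as grace periods and per-CPU callback lists. It only shows why the wait has to come after the last call_rcu() and before the module goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel or in
 * batman-adv. defer_call() plays the role of call_rcu(). */
#define MAX_DEFERRED 8

typedef void (*deferred_fn)(void *arg);

struct deferred {
	deferred_fn fn;
	void *arg;
};

static struct deferred queue[MAX_DEFERRED];
static size_t queued;

/* false once the (hypothetical) module has been "unloaded" */
static bool module_loaded = true;
/* counts callbacks that ran after their module was gone */
static int use_after_unload;

/* Queue a callback for later execution; fails if the queue is full. */
static bool defer_call(deferred_fn fn, void *arg)
{
	if (queued == MAX_DEFERRED)
		return false;
	queue[queued].fn = fn;
	queue[queued].arg = arg;
	queued++;
	return true;
}

/* Run every pending callback in FIFO order, then empty the queue. */
static void flush_deferred(void)
{
	for (size_t i = 0; i < queued; i++)
		queue[i].fn(queue[i].arg);
	queued = 0;
}

/* Stands in for the callback scheduled by hardif_remove_interface();
 * its code belongs to the "module". */
static void free_interface_cb(void *arg)
{
	int *freed = arg;

	if (!module_loaded)
		use_after_unload++; /* where the kernel would fault */
	(*freed)++;
}

/* Model of module exit: schedule the callback, optionally wait for
 * pending callbacks, then "unload". Anything still queued runs
 * afterwards, as the RCU core would run it. */
static void module_exit_model(bool wait_before_unload, int *freed)
{
	module_loaded = true;
	use_after_unload = 0;
	defer_call(free_interface_cb, freed);
	if (wait_before_unload)
		flush_deferred();
	module_loaded = false;
	flush_deferred(); /* callbacks run after the unload */
}
```

Calling module_exit_model(false, &freed) leaves use_after_unload at 1, mirroring the crash; passing true runs the callback while the module is still present, which is the effect the patch is after by waiting at the end of batman_exit() rather than earlier in mesh_free().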
On Mon, Sep 06, 2010 at 01:29:53AM +0200, Linus Lüssing wrote:
> During the module shutdown procedure in batman_exit(), an RCU callback is scheduled (batman_exit -> hardif_remove_interfaces -> hardif_remove_interface -> call_rcu). However, when the kernel unloads the module, the RCU callback might not have been executed yet, resulting in an "unable to handle kernel paging request" in __rcu_process_callback afterwards, causing the kernel to freeze. Therefore, we should always flush all RCU callback functions scheduled during the shutdown procedure.
I am really confused by your patch. I would have expected you to add a synchronize_rcu to batman_exit and nothing more. Instead, I see a synchronize_net added and both synchronize_net and synchronize_rcu removed from mesh_free. This doesn't seem to match at all. Could you please explain why it is implemented that way?
thanks, Sven
Hi Sven,
synchronize_net already contains a synchronize_rcu at its end, so the synchronize_rcu right after it in the batman code has always been redundant.
I've removed the synchronize_rcu instead of the synchronize_net to be on the safe side. I guess no more packets should usually arrive anyway, as the batman packet type is no longer registered. But I wasn't sure whether the might_sleep() in synchronize_net() might be needed for something, so I didn't dare to remove synchronize_net.
If someone says it would be OK to remove synchronize_net() instead, I can make a new patch, no problem.
Cheers, Linus
Linus Lüssing wrote:
> On Mon, Sep 06, 2010 at 09:30:46AM +0200, Sven Eckelmann wrote:
> > [...]
> synchronize_net already contains a synchronize_rcu at its end, so the synchronize_rcu right after it in the batman code has always been redundant.
> I've removed the synchronize_rcu instead of the synchronize_net to be on the safe side. I guess no more packets should usually arrive anyway, as the batman packet type is no longer registered. But I wasn't sure whether the might_sleep() in synchronize_net() might be needed for something, so I didn't dare to remove synchronize_net.
> If someone says it would be OK to remove synchronize_net() instead, I can make a new patch, no problem.
OK, it would have been nice to state such things in the commit message (otherwise the stable@kernel.org maintainers will drop such a patch quite easily). Marek and I have figured out why it only happens in revisions 1765 and 1766, so it will not be a patch for stable.
And the might_sleep is only for debugging purposes. But yes, it makes sense to use synchronize_net here (for example, because of the dev_remove_pack call before it).
That means the patch seems technically OK, but I didn't like the explanation, given that we might have to justify it that way to the stable@kernel.org guys.
So I would ack the patch with a minor change to the commit message. Instead of
During the module shutdown procedure in batman_exit(), an RCU callback is scheduled (batman_exit -> hardif_remove_interfaces -> hardif_remove_interface -> call_rcu). However, when the kernel unloads the module, the RCU callback might not have been executed yet, resulting in an "unable to handle kernel paging request" in __rcu_process_callback afterwards, causing the kernel to freeze. Therefore, we should always flush all RCU callback functions scheduled during the shutdown procedure.
something like
During the module shutdown procedure in batman_exit(), an RCU callback is scheduled (batman_exit -> hardif_remove_interfaces -> hardif_remove_interface -> call_rcu). However, when the kernel unloads the module, the RCU callback might not have been executed yet, resulting in an "unable to handle kernel paging request" in __rcu_process_callback afterwards, causing the kernel to freeze.
The synchronize_net and synchronize_rcu in mesh_free are currently called before the call_rcu in hardif_remove_interface and have no real effect on it.
Therefore, we should always flush all RCU callback functions scheduled during the shutdown procedure using synchronize_net. The call to synchronize_rcu can be omitted because synchronize_net already calls it.
thanks, Sven
Yep, sounds good :). Thanks for reviewing and the info about synchronize_net.
Cheers, Linus
From: Linus Lüssing <linus.luessing@web.de>
During the module shutdown procedure in batman_exit(), an RCU callback is scheduled (batman_exit -> hardif_remove_interfaces -> hardif_remove_interface -> call_rcu). However, when the kernel unloads the module, the RCU callback might not have been executed yet, resulting in an "unable to handle kernel paging request" in __rcu_process_callback afterwards, causing the kernel to freeze.
The synchronize_net and synchronize_rcu in mesh_free are currently called before the call_rcu in hardif_remove_interface and have no real effect on it.
Therefore, we should always flush all RCU callback functions scheduled during the shutdown procedure using synchronize_net. The call to synchronize_rcu can be omitted because synchronize_net already calls it.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Sven Eckelmann <sven.eckelmann@gmx.de>
---
 batman-adv/main.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/batman-adv/main.c b/batman-adv/main.c
index 209a46b..e8acb46 100644
--- a/batman-adv/main.c
+++ b/batman-adv/main.c
@@ -73,6 +73,8 @@ static void __exit batman_exit(void)
 	flush_workqueue(bat_event_workqueue);
 	destroy_workqueue(bat_event_workqueue);
 	bat_event_workqueue = NULL;
+
+	synchronize_net();
 }
 
 int mesh_init(struct net_device *soft_iface)
@@ -135,9 +137,6 @@ void mesh_free(struct net_device *soft_iface)
 	hna_local_free(bat_priv);
 	hna_global_free(bat_priv);
 
-	synchronize_net();
-
-	synchronize_rcu();
 	atomic_set(&bat_priv->mesh_state, MESH_INACTIVE);
 }
On Monday 06 September 2010 14:45:24 Sven Eckelmann wrote:
> Therefore, we should always flush all RCU callback functions scheduled during the shutdown procedure using synchronize_net. The call to synchronize_rcu can be omitted because synchronize_net already calls it.
Applied in revision 1788.
Thanks, Marek
b.a.t.m.a.n@lists.open-mesh.org