Currently the clean up routine hits race conditions that generate general protection faults, due to a lack of proper coordination.
The TT and originator tables are accessed by several components through their periodic workers, therefore it is safe to move the clean up of the two to the very end, when all the workers have already been stopped.
Moreover the originator clean up function also accesses the TT table to remove global entries, and it does so by means of RCU callbacks. However such callbacks can generate a protection fault because the TT table they want to access is not there anymore, while it still existed when the RCU scheduling was done.
Hence it is safe to move the originator clean up procedure to the very end, so that nothing is modified between the function invocation and the execution of the scheduled RCU callbacks.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
This patch fixes the general protection fault deterministically generated by the nc_worker and the tt_global_del_orig
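For reviewers, the ordering problem can be pictured with a small userspace model. This is only a sketch and not batman-adv code: the struct, the flags and every sketch_* name are hypothetical, and a plain "pending" flag stands in for an RCU callback that runs some time after it was scheduled. It assumes the callback acts on the table state it observed when it was scheduled, which is the situation described in the commit message.

```c
/* Toy model of the teardown ordering; all names are hypothetical. */
struct sketch_mesh {
	int tt_alive;       /* 1 while the TT table exists */
	int cb_pending;     /* 1 while a deferred callback is queued */
	int cb_saw_tt;      /* table state observed at scheduling time */
	int stale_accesses; /* callbacks that ran against a vanished table */
};

/* Models freeing the TT table. */
static void sketch_tt_free(struct sketch_mesh *m)
{
	m->tt_alive = 0;
}

/* Models the originator clean up: it only schedules deferred work. */
static void sketch_originator_free(struct sketch_mesh *m)
{
	m->cb_saw_tt = m->tt_alive;
	m->cb_pending = 1;
}

/* Models the later execution of the scheduled callback. */
static void sketch_run_callbacks(struct sketch_mesh *m)
{
	if (!m->cb_pending)
		return;
	m->cb_pending = 0;

	/* Table was already gone at scheduling time: nothing to touch. */
	if (!m->cb_saw_tt)
		return;

	/* Table existed at scheduling time but was freed in between:
	 * in the kernel this is where the fault would happen.
	 */
	if (!m->tt_alive)
		m->stale_accesses++;
}

/* Returns how many callbacks hit a table that disappeared under them. */
static int sketch_teardown(int originator_last)
{
	struct sketch_mesh m = { .tt_alive = 1 };

	if (originator_last) {
		/* order after this patch */
		sketch_tt_free(&m);
		sketch_originator_free(&m);
	} else {
		/* order before this patch */
		sketch_originator_free(&m);
		sketch_tt_free(&m);
	}
	sketch_run_callbacks(&m);

	return m.stale_accesses;
}
```

In this model sketch_teardown(0) reports one stale access and sketch_teardown(1) reports none, because with the originator clean up as the last step nothing changes between scheduling and execution.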
 main.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/main.c b/main.c
index 7dac212..8b5474c 100644
--- a/main.c
+++ b/main.c
@@ -163,14 +163,22 @@ void batadv_mesh_free(struct net_device *soft_iface)
 	batadv_vis_quit(bat_priv);
 
 	batadv_gw_node_purge(bat_priv);
-	batadv_originator_free(bat_priv);
 	batadv_nc_free(bat_priv);
+	batadv_dat_free(bat_priv);
+	batadv_bla_free(bat_priv);
 
+	/* free the TT and the originator tables only after having terminated all
+	 * the other minor components which may use these structures for their
+	 * purposes
+	 */
 	batadv_tt_free(bat_priv);
 
-	batadv_bla_free(bat_priv);
-
-	batadv_dat_free(bat_priv);
+	/* since the originator clean up routine is the one accessing the TT
+	 * tables as well, invoke it as last step so that any race condition is
+	 * avoided when the RCU callbacks scheduled by this function access the
+	 * TT data
+	 */
+	batadv_originator_free(bat_priv);
 
 	free_percpu(bat_priv->bat_counters);
 	bat_priv->bat_counters = NULL;