On Tuesday 15 December 2015 14:15:33 Linus Lüssing wrote:
On Mon, Dec 14, 2015 at 07:56:19PM +0100, Sven Eckelmann wrote:
On Monday 07 December 2015 23:12:42 Linus Lüssing wrote:
On Sat, Nov 28, 2015 at 09:21:02AM +0100, Sven Eckelmann wrote:
mcast.mla_list is protected by tt.commit_lock (see batadv_mcast_mla_tt_add, batadv_mcast_mla_list_free and batadv_mcast_mla_tt_retract).
mcast.mla_list changes should be protected by the non-parallel code flow: During runtime, batadv_mcast_mla_tt_update() is only called from the self-rearming OGM scheduler thread - batadv_mcast_mla_tt_update() will never run more than once at the same time.
The second place for mcast.mla_list changes, batadv_mcast_free(), is called only on shutdown after the OGM scheduling thread was stopped.
The two functions with the lockdep assert are
- batadv_mcast_mla_list_free
- batadv_mcast_mla_tt_retract
(batadv_mcast_mla_tt_add looks basically like batadv_mcast_mla_list_free)
The call graphs are attached and these graphs have (pure) starting nodes which are not only batadv_exit and batadv_iv_ogm_schedule. Parts of them look like they are only protected because of tt.commit_lock.
Thanks for these pictures! (btw. which tool did you use for that?)
It is just doxygen with the dot (graphviz) enabled for all (also undocumented + local/static functions).
The two non-colliding paths I had in mind were batadv_iv_ogm_schedule() and batadv_mcast_free(), which looks like:
batadv_mesh_free() -> batadv_purge_outstanding_packets() -> cancel_delayed_work_sync() ! [...] -> batadv_mcast_free()
Which ensures that no batadv_iv_ogm_schedule() is running before calling batadv_mcast_free().
That seems to be right.
But I missed the path from batadv_update_min_mtu()... However, it should not race with batadv_mcast_free() either, which is called once the last hard-iface gets disabled:
batadv_hardif_disable_interface() -> batadv_purge_outstanding_packets() -> cancel_delayed_work_sync() ! -> dev_put(soft_iface) [ if last hard-iface, then soft-iface is out of scope for any new batadv_update_mtu() and gets freed: ] -> batadv_softif_free() -> batadv_mesh_free() -> batadv_mcast_free()
But with yet another path it is getting even more, rediculously complicated... Just proving that trying to keep a lock-less update for mla_list here is a bad, unmaintainable approach :).
Yes, and I am currently not trusting the reference/rcu code in batman-adv anymore. It seems to have a lot of backpointers (yes, loops are very bad for reference counting and the workarounds chosen in the code for that don't look very sane either). I actually don't want to check this mess to find out if there is a situation where were there can be an (incorrectly counted) reference somewhere back to it. Let us just assume that everything is fine here... better for everyones sanity.
So I'm definitely in favor of having some spinlock to refer to for mcast.mla_list update, even if it isn't necessary for batadv_mcast_free(). But I don't see a race for the current mla_list updates either (otherwise I'd need a specific, more verbose example, I guess...).
The assert was more for having a common lock for all accesses to this list (ogm_schedule, mtu change event, hardif add/del, softif free/add). This lock already existed indirectly and protected the updates. The free functions should have the same protection (especially when changes are done to the code in the future which might call these functions in a different context). So right now the patch from Simon only works around the lockdep warning and prepares the code for the "future". So I don't think it is required for - stable - especially when you are sure that there is no race between _free and _retract.
Kind regards, Sven