From: Simon Wunderlich simon@open-mesh.com
While testing, we got something like this:
WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] Call Trace: [<ffffffff815fc597>] dump_stack+0x4b/0x64 [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] [...]
Signed-off-by: Simon Wunderlich simon@open-mesh.com --- Changes to PATCH: * rebase on maint, add actual splat --- net/batman-adv/multicast.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index eb76386..75fa501 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -802,7 +802,9 @@ void batadv_mcast_free(struct batadv_priv *bat_priv) batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 1); batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 1);
+ spin_lock_bh(&bat_priv->tt.commit_lock); batadv_mcast_mla_tt_retract(bat_priv, NULL); + spin_unlock_bh(&bat_priv->tt.commit_lock); }
/**
On Mon, Nov 30, 2015 at 05:34:01PM +0100, Simon Wunderlich wrote:
diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index eb76386..75fa501 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -802,7 +802,9 @@ void batadv_mcast_free(struct batadv_priv *bat_priv) batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 1); batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 1);
- spin_lock_bh(&bat_priv->tt.commit_lock); batadv_mcast_mla_tt_retract(bat_priv, NULL);
- spin_unlock_bh(&bat_priv->tt.commit_lock);
}
See comments in the other thread before applying.
From: Simon Wunderlich simon@open-mesh.com
While testing, we got something like this:
WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] Call Trace: [<ffffffff815fc597>] dump_stack+0x4b/0x64 [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] [...]
Signed-off-by: Simon Wunderlich simon@open-mesh.com
Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep") Acked-by: Linus Lüssing linus.luessing@c0d3.blue
On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote:
From: Simon Wunderlich simon@open-mesh.com
While testing, we got something like this:
WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] Call Trace: [<ffffffff815fc597>] dump_stack+0x4b/0x64 [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] [...]
Signed-off-by: Simon Wunderlich simon@open-mesh.com
Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep") Acked-by: Linus Lüssing linus.luessing@c0d3.blue
I am confused why this should fix a commit which is there to find such problems.
Kind regards, Sven
On 19/12/15 17:32, Sven Eckelmann wrote:
On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote:
From: Simon Wunderlich simon@open-mesh.com
While testing, we got something like this:
WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] Call Trace: [<ffffffff815fc597>] dump_stack+0x4b/0x64 [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] [...]
Signed-off-by: Simon Wunderlich simon@open-mesh.com
Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep") Acked-by: Linus Lüssing linus.luessing@c0d3.blue
I am confused why this should fix a commit which is there to find such problems.
Am I wrong or also the commit message is misleading? Looks like it is "fixing" a bogus lockdep check...but as far as I can say this patch is making sure that the lock is held in that particular point.
If we want to "fix the lockdep" then we should remove the lockdep_assert_held(), but I don't think this is the goal of the whole discussion.
Cheers,
On Sat, Dec 19, 2015 at 07:09:02PM +0800, Antonio Quartulli wrote:
Am I wrong or also the commit message is misleading? Looks like it is "fixing" a bogus lockdep check...
Yes and no. The lockdep checks placed are not bogus as is. They are fine for batadv_iv_ogm_schedule() vs. batadv_update_mtu(), so no need to remove it.
but as far as I can say this patch is making sure that the lock is held in that particular point.
There is no locking issue from batadv_mcast_free() here as it is in the freeing phase of the soft-iface. The soft-iface is supposed to be out-of-scope for anything else by now. This patch therefore can't be a fix for a race regarding mcast.mla_list.
However it does fix an issue with a lockdep warning. Which makes it suitable for stable to at least remove that splat for v4.3 users.
Regards, Linus
On Saturday 19 December 2015 19:09:02 Antonio Quartulli wrote:
On 19/12/15 17:32, Sven Eckelmann wrote:
On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote:
From: Simon Wunderlich simon@open-mesh.com
While testing, we got something like this:
WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] Call Trace: [<ffffffff815fc597>] dump_stack+0x4b/0x64 [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] [...]
Signed-off-by: Simon Wunderlich simon@open-mesh.com
Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep") Acked-by: Linus Lüssing linus.luessing@c0d3.blue
I am confused why this should fix a commit which is there to find such problems.
Am I wrong or also the commit message is misleading? Looks like it is "fixing" a bogus lockdep check...but as far as I can say this patch is making sure that the lock is held in that particular point.
No, it doesn't say this anywhere. It says that it fixes the "lockdep splat"
If we want to "fix the lockdep" then we should remove the lockdep_assert_held(), but I don't think this is the goal of the whole discussion.
No, the lockdep check should not be removed because a lock is necessary for other situations when this list is modified as explained earlier [1]. You have multiple concurrent calling contexts for this function (hard_if_event with something like mtu update + something like batadv_iv_ogm_schedule) which require exclusive access to this list. Right now it is just protected by a weird "named" lock (from the translation table and not from the mcast).
And no, the solution here is not to add a special codepath which (for no sane reason) requires no locks or synchronized frees - even when this seemed to have become the new trend in the batman-adv codebase.
Kind regards, Sven
[1] https://lists.open-mesh.org/pipermail/b.a.t.m.a.n/2015-December/013876.html
On Monday, November 30, 2015 17:34:01 Simon Wunderlich wrote:
From: Simon Wunderlich simon@open-mesh.com
While testing, we got something like this:
WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] Call Trace: [<ffffffff815fc597>] dump_stack+0x4b/0x64 [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] [...]
Signed-off-by: Simon Wunderlich simon@open-mesh.com
Changes to PATCH:
- rebase on maint, add actual splat
net/batman-adv/multicast.c | 2 ++ 1 file changed, 2 insertions(+)
Applied in revision 025b743.
Thanks, Marek
b.a.t.m.a.n@lists.open-mesh.org