On Thu, Aug 04, 2016 at 09:29:20AM +0200, Sven Eckelmann wrote:
On Donnerstag, 4. August 2016 01:43:29 CEST Linus Lüssing wrote: [...]
So the regular purging routines will break the cycle in the end when they reduce the refcount of the hardif_neigh_node to zero.
Ok, lets go through it with following idea:
- the cycle exists (including the missing batadv_orig_node::ifinfo_list in my earlier example graph)
- batadv_purge_orig is called via the worker (for some reason with a large delay)
- _batadv_purge_orig goes through all entries in the orig hashtable
- batadv_purge_orig_node is called
- batadv_has_timed_out returns "hey, you are old - go away"
- batadv_purge_orig_node now only does for this orig_entry:
- batadv_gw_node_delete(bat_priv, orig_node);
- hlist_del_rcu(&orig_node->hash_entry);
- batadv_tt_global_del_orig(orig_node->bat_priv, orig_node, -1, "originator timed out");
- batadv_orig_node_put(orig_node);
So the most cleanup work (batadv_orig_node_release) is only done when the orig_node refcnt reaches zero.
The batadv_purge_orig_ifinfo, batadv_purge_orig_neighbors and the rest from batadv_purge_orig_node is only done when the originator has not yet reached its timeout. I would therefore guess it working because you are lucky and your batadv_purge_orig_neighbors and batadv_purge_orig_ifinfo sometimes(tm) solve this problem for you.
Did some more tests today. Confirmed.
batadv_purge_orig_node has 2*PURGE_TIMEOUT, while purge_orig_neighbours only has 1*PURGE_TIMEOUT. All hardif_neigh_release() and orig_node_release() calls happened as expected, even with the patch of this thread.
purge_orig_neighbors seems to reset the orig_node->last_bonding_candidate to NULL and break the cycle.
If I switch the two purging intervals, then I have the memory leak with this patch. Without this patch, no memory leak.
I'll send a v3 with a simple six bytes orig address copy instead of a full orig_node reference.
Thanks for this awesome finding of a rare but nasty bug! Even though it'll probably mostly not happen due to the big timeout differences, it would nevertheless have been very, very difficult to spot & reproduce later on...