On Mon, Dec 29, 2014 at 11:52:53AM +0800, Marek Lindner wrote:
> The issue this patch fixes is caused by
> never being called because of not yet released references to the
> orig-node. References which were supposed to be released through
Could you please provide addition insight as to which references are still
held ? I did look around but nothing obvious jumped at me.
The batadv_tt_global_entry->orig_list holds the reference to the
orig-node. Usually this reference is released after
BATADV_PURGE_TIMEOUT through: _batadv_purge_orig()->
batadv_tt_global_del_orig() which purges this global tt entry and
releases the reference to the orig-node.
However, if between two batadv_purge_orig_node() calls the orig-node
timeout grew to 2*BATADV_PURGE_TIMEOUT then this call path isn't
reached (*). Instead the according orig-node is removed from the
originator hash in _batadv_purge_orig(), the batadv_update_route()
part is skipped and won't be reached anymore.
It seems that in that case batadv_orig_node_free_rcu()->batadv_tt_global_del_orig()
is supposed to purge the global tt entry and to release the orig-node
reference but it's not called because batadv_orig_node_free_rcu() is
only called once all references are freed: A chicken 'n' egg
Generally, it wouldn't be bad if the commit message went into deeper detail
describing the nature of the bug instead of the middle section above to make
it easy to understand what is being fixed.
Hm, hm, my intention was to somehow/somewhere document how these
two, small and rare bugs together created this severe bug.
The issue fixed by 8a2ad5204674 was the main trigger for (*) in
But we can also skip that middle section if you want. Then I'll
just add a note to the ticket on redmine.