On Saturday 13 December 2014 23:32:15 Linus Lüssing wrote:
This patch fixes a potential memory leak which can
occur once an
originator times out. On timeout the according global translation table
entry might not get purged correctly. Furthermore, the non purged TT
entry will cause its orig-node to leak, too. Which additionally can lead
to the new multicast optimization feature not kicking in because of a
therefore bogus counter.
So far, I am with you ..
In the wild with larger mesh networks we saw this leak
quite regularly,
resulting in routers to reboot or killed processes. This was because
of a combination of two bugs: The bug fixed by commit
"batman-adv: fix delayed foreign originator recognition" (8a2ad5204674)
amplified this memory leak heavily. Since that commit I'd expect
it to happen rarely, probably only in paused and resumed VMs and
devices previously in stand-by.
This section shouldn't be part of the official commit message. It is hardly
relevant to the reviewer how often a memleak occurs and whether or not you
need a VM to trigger it. The provided commit id isn't valid in the Linux tree.
The issue this patch fixes is caused by
batadv_orig_node_free_rcu()
never being called because of not yet released references to the
orig-node. References which were supposed to be released through
batadv_orig_node_free_rcu()->batadv_tt_global_del_orig().
Could you please provide addition insight as to which references are still
held ? I did look around but nothing obvious jumped at me.
Generally, it wouldn't be bad if the commit message went into deeper detail
describing the nature of the bug instead of the middle section above to make
it easy to understand what is being fixed.
Cheers,
Marek