On Wed, Jun 12, 2024 at 04:39:15PM +0200, Linus Lüssing wrote:
On Wed, Jun 12, 2024 at 07:06:04AM -0700, Paul E. McKenney wrote:
Let me make sure that I understand...
You need rcu_barrier() to wait for any memory passed to kfree_rcu() to actually be freed? If so, please explain why you need this, as in what bad thing happens if the actual kfree() happens later.
(I could imagine something involving OOM avoidance, but I need to hear your code's needs rather than my imaginations.)
Thanx, Paul
We have allocated a kmem-cache for some objects, which are like batman-adv's version of a bridge's FDB entry.
The very last thing we do before unloading the module is freeing/destroying this kmem-cache with a call to kmem_cache_destroy().
As far as I understand, before calling kmem_cache_destroy() we need to ensure that all objects previously allocated from this kmem-cache have been freed. At least we get this kernel splat (from SLUB?) otherwise. I'm not quite sure whether anything worse than this noise in dmesg would occur, though, other than maybe a stale, zero-object entry remaining in /proc/slabinfo, which gets duplicated every time we repeat loading+unloading the module. At least these entries would be a memory leak, I suppose?
# after insmod/rmmod'ing batman-adv 6 times:
$ cat /proc/slabinfo | grep batadv_tl_cache
batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0
batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0
batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0
batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0
batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0
batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0
That's why we added this rcu_barrier() call on module shutdown, in the batman-adv module __exit function right before the kmem_cache_destroy() calls, hoping that it would wait for all call_rcu() / kfree_rcu() callbacks and their final kfree() to finish. This worked when we were using call_rcu() with our own callback containing a kfree(). With kfree_rcu(), however, this somehow no longer seems to be the case (or, more likely, I'm missing something else, such as some other bug within the batman-adv code?).
It is quite possible that some of the recent energy-saving changes have caused rcu_barrier() to not wait for all kfree_rcu() memory to be freed. Which is timely, given a bunch of recently proposed changes that seemed like a good idea to me at the time. ;-)
Thanx, Paul
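The shutdown ordering discussed in this thread can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions, not kernel code: every name in it (toy_cache, toy_defer(), toy_barrier(), toy_unload(), ...) is hypothetical, and the split into two deferred-free queues is an assumption that mirrors the hypothesis raised above, namely that a barrier might wait for call_rcu()-style callbacks without waiting for memory queued through a separately batched kfree_rcu()-style path. It is not a description of how RCU is actually implemented.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a kmem_cache: it only counts live objects. */
struct toy_cache {
	int live;
};

/* Toy object that can sit on a deferred-free queue. */
struct toy_obj {
	struct toy_obj *next;
};

/*
 * ASSUMPTION of this model: two independent deferred-free queues.
 * cb_queue models callbacks queued call_rcu()-style; batch_queue
 * models a separately batched kfree_rcu()-style path.
 */
static struct toy_obj *cb_queue;
static struct toy_obj *batch_queue;

static struct toy_obj *toy_alloc(struct toy_cache *c)
{
	struct toy_obj *o = calloc(1, sizeof(*o));

	if (o)
		c->live++;
	return o;
}

/* Queue an object for a later free instead of freeing it now. */
static void toy_defer(struct toy_obj **queue, struct toy_obj *o)
{
	o->next = *queue;
	*queue = o;
}

/* Actually free everything on one queue. */
static void toy_drain(struct toy_obj **queue, struct toy_cache *c)
{
	while (*queue) {
		struct toy_obj *o = *queue;

		*queue = o->next;
		free(o);
		c->live--;
	}
}

/* Models a barrier that only waits for the callback queue. */
static void toy_barrier(struct toy_cache *c)
{
	toy_drain(&cb_queue, c);
}

/* Returns the number of objects still live; nonzero models the splat. */
static int toy_cache_destroy(const struct toy_cache *c)
{
	return c->live;
}

/* Models module unload: defer three frees, barrier, then destroy. */
static int toy_unload(int use_batch_queue)
{
	struct toy_cache cache = { 0 };
	int leftover;
	int i;

	for (i = 0; i < 3; i++) {
		struct toy_obj *o = toy_alloc(&cache);

		if (!o)
			break;
		toy_defer(use_batch_queue ? &batch_queue : &cb_queue, o);
	}

	toy_barrier(&cache);
	leftover = toy_cache_destroy(&cache);

	/* The batched frees only happen here, after the destroy. */
	toy_drain(&batch_queue, &cache);
	return leftover;
}
```

In this model, toy_unload(0) returns 0 because the barrier drains the queue that holds the objects, while toy_unload(1) returns 3 because the objects are still waiting on the other queue when the cache is destroyed. Whether the real kfree_rcu() path behaves this way is exactly the open question in the thread.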