On Samstag, 6. August 2016 03:11:12 CEST Linus Lüssing wrote:
On Thu, Aug 04, 2016 at 07:56:44AM +0200, Sven Eckelmann wrote:
Ok, looks like the neigh_list edge in my graph is incorrectly there. But
what
about last_bonding_candidate? This definitely has a reference counter (and needs it) and thus your code would cause problems when bonding is enabled.
One more tiny thing, that I noticed during rereading & playing with that part of the code:
The usage of orig_node->last_bonding_candidate looks a lot like rcu-pointer style. However it is read & set without any rcu_dereference() / rcu_assign() wrappers.
It is set+accessed with a spinlock.
With rcu on the read side (and spinlock on the write side) we should guarantee that the rcu protected pointer is mostly read [1]. This is not the case here and thus it is using spinlocks at the moment.
Is something else protecting a simultaneous read (for instance from the neighbor table output) of last_bonding_candidate while being changed via batadv_last_bonding_replace()?
See above
What has the neighbor table output to do with last_bonding_candidate? It doesn't seem to access this pointer. Maybe I am missing something here. Can you please give an example how these are connected.
RCU also doesn't protect the changes to the object itself. It only helps us not to access invalid (free'd) memory regions when another context just replaced our pointer. This is done by splitting the previously (more or less) single operation of removing the object into two phases. One is to remove the object from the datastructure (list, pointer, ..) and then only reclaiming the memory when no one (correctly using rcu) can see the old object anymore. It is possible in the time between the removal from the datastructure that one context sees the new object when accessing the datastructure and another doesn't. Similar things can happen when new objects are added to the datastructure.
Just using the spinlock on both sides gives us the same protection (without the conflicting views of the datastructure from different contexts) - but it would be slower when we have multiple readers (and nearly no writer) trying to access the pointer all the time. This doesn't seem to be the case here because the main function batadv_find_router is at the same time reader+writer.
And just to remind the reader: The content of the object referenced in the datastructure still has to be protected with a spinlock (or something similar) in case multiple contexts may want to change the content.
And to be fair: There is one case were a spinlock is missing in batadv_find_router
last_candidate = orig_node->last_bonding_candidate; if (last_candidate) last_cand_router = rcu_dereference(last_candidate->router);
I had this on my list but mostly forgot about it while chasing the reference counting bugs. Maybe you found more problems but I am not sure which ones :)
Kind regards, Sven
[1] https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt search for "read-mostly" situations