As far as I found the synchronization is provided by delayed work subsystem. It is based on the WORK_STRUCT_PENDING_BIT in work->data field.
The cancel_delayed_work_sync() atomically sets this bit and queue_delayed_work() checks it before scheduling new delayed work.
The problem is caused by the INIT_DELAYED_WORK() call inside batadv_dat_start_timer(). This call happens before the queue_delayed_work() call and clears this bit.
Best regards,
Vlad
On 08.06.2023 08:24, Keller, Jacob E wrote:
-----Original Message----- From: Jakub Kicinski kuba@kernel.org Sent: Wednesday, June 7, 2023 10:01 PM To: Simon Wunderlich sw@simonwunderlich.de Cc: davem@davemloft.net; netdev@vger.kernel.org; b.a.t.m.a.n@lists.open- mesh.org; Vladislav Efanov VEfanov@ispras.ru; stable@kernel.org; Sven Eckelmann sven@narfation.org Subject: Re: [PATCH 1/1] batman-adv: Broken sync while rescheduling delayed work
On Wed, 7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote:
The reason for these issues is the lack of synchronization. Delayed work (batadv_dat_purge) schedules new timer/work while the device is being deleted. As the result new timer/delayed work is set after cancel_delayed_work_sync() was called. So after the device is freed the timer list contains pointer to already freed memory.
I guess this is better than status quo but is the fix really complete? We're still not preventing the timer / work from getting scheduled and staying alive after the netdev has been freed, right?
Yea, I would expect some synchronization mechanism to ensure that after cancel_delayed_work_sync() you can't queue the work again.
I know for timers there is recently timer_shutdown_sync() which can be used to guarantee a timer can't re-arm at all, and its intended for some situations where there is a cyclic dependency...
Thanks, Jake