Wiphy should be locked before calling rdev_get_station() (see lockdep assert in ieee80211_get_station()).
This fixes the following kernel NULL dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000003001000 [0000000000000050] pgd=0800000002dca003, p4d=0800000002dca003, pud=08000000028e9003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] SMP Modules linked in: netconsole dwc3_meson_g12a dwc3_of_simple dwc3 ip_gre gre ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath CPU: 0 PID: 1091 Comm: kworker/u8:0 Not tainted 6.4.0-02144-g565f9a3a7911-dirty #705 Hardware name: RPT (r1) (DT) Workqueue: bat_events batadv_v_elp_throughput_metric_update pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : ath10k_sta_statistics+0x10/0x2dc [ath10k_core] lr : sta_set_sinfo+0xcc/0xbd4 sp : ffff000007b43ad0 x29: ffff000007b43ad0 x28: ffff0000071fa900 x27: ffff00000294ca98 x26: ffff000006830880 x25: ffff000006830880 x24: ffff00000294c000 x23: 0000000000000001 x22: ffff000007b43c90 x21: ffff800008898acc x20: ffff00000294c6e8 x19: ffff000007b43c90 x18: 0000000000000000 x17: 445946354d552d78 x16: 62661f7200000000 x15: 57464f445946354d x14: 0000000000000000 x13: 00000000000000e3 x12: d5f0acbcebea978e x11: 00000000000000e3 x10: 000000010048fe41 x9 : 0000000000000000 x8 : ffff000007b43d90 x7 : 000000007a1e2125 x6 : 0000000000000000 x5 : ffff0000024e0900 x4 : ffff800000a0250c x3 : ffff000007b43c90 x2 : ffff00000294ca98 x1 : ffff000006831920 x0 : 0000000000000000 Call trace: ath10k_sta_statistics+0x10/0x2dc [ath10k_core] sta_set_sinfo+0xcc/0xbd4 ieee80211_get_station+0x2c/0x44 cfg80211_get_station+0x80/0x154 batadv_v_elp_get_throughput+0x138/0x1fc batadv_v_elp_throughput_metric_update+0x1c/0xa4 process_one_work+0x1ec/0x414 worker_thread+0x70/0x46c kthread+0xdc/0xe0 ret_from_fork+0x10/0x20 Code: a9bb7bfd 910003fd a90153f3 f9411c40 (f9402814)
Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API") Signed-off-by: Remi Pommarel repk@triplefau.lt --- net/wireless/util.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/wireless/util.c b/net/wireless/util.c index 2bde8a354631..082c6f9c5416 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -2549,6 +2549,7 @@ int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr, { struct cfg80211_registered_device *rdev; struct wireless_dev *wdev; + int ret;
wdev = dev->ieee80211_ptr; if (!wdev) @@ -2560,7 +2561,11 @@ int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr,
memset(sinfo, 0, sizeof(*sinfo));
- return rdev_get_station(rdev, dev, mac_addr, sinfo); + wiphy_lock(&rdev->wiphy); + ret = rdev_get_station(rdev, dev, mac_addr, sinfo); + wiphy_unlock(&rdev->wiphy); + + return ret; } EXPORT_SYMBOL(cfg80211_get_station);
On Sat May 18, 2024 at 5:50 PM CEST, Remi Pommarel wrote:
Wiphy should be locked before calling rdev_get_station() (see lockdep assert in ieee80211_get_station()).
This fixes the following kernel NULL dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000003001000 [0000000000000050] pgd=0800000002dca003, p4d=0800000002dca003, pud=08000000028e9003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] SMP Modules linked in: netconsole dwc3_meson_g12a dwc3_of_simple dwc3 ip_gre gre ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath CPU: 0 PID: 1091 Comm: kworker/u8:0 Not tainted 6.4.0-02144-g565f9a3a7911-dirty #705 Hardware name: RPT (r1) (DT) Workqueue: bat_events batadv_v_elp_throughput_metric_update pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : ath10k_sta_statistics+0x10/0x2dc [ath10k_core] lr : sta_set_sinfo+0xcc/0xbd4 sp : ffff000007b43ad0 x29: ffff000007b43ad0 x28: ffff0000071fa900 x27: ffff00000294ca98 x26: ffff000006830880 x25: ffff000006830880 x24: ffff00000294c000 x23: 0000000000000001 x22: ffff000007b43c90 x21: ffff800008898acc x20: ffff00000294c6e8 x19: ffff000007b43c90 x18: 0000000000000000 x17: 445946354d552d78 x16: 62661f7200000000 x15: 57464f445946354d x14: 0000000000000000 x13: 00000000000000e3 x12: d5f0acbcebea978e x11: 00000000000000e3 x10: 000000010048fe41 x9 : 0000000000000000 x8 : ffff000007b43d90 x7 : 000000007a1e2125 x6 : 0000000000000000 x5 : ffff0000024e0900 x4 : ffff800000a0250c x3 : ffff000007b43c90 x2 : ffff00000294ca98 x1 : ffff000006831920 x0 : 0000000000000000 Call trace: ath10k_sta_statistics+0x10/0x2dc [ath10k_core] sta_set_sinfo+0xcc/0xbd4 ieee80211_get_station+0x2c/0x44 cfg80211_get_station+0x80/0x154 batadv_v_elp_get_throughput+0x138/0x1fc batadv_v_elp_throughput_metric_update+0x1c/0xa4 process_one_work+0x1ec/0x414 worker_thread+0x70/0x46c kthread+0xdc/0xe0 ret_from_fork+0x10/0x20 Code: a9bb7bfd 910003fd a90153f3 f9411c40 (f9402814)
Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API") Signed-off-by: Remi Pommarel repk@triplefau.lt
Reviewed-by: Nicolas Escande nico.escande@gmail.com
On Sat, 2024-05-18 at 17:50 +0200, Remi Pommarel wrote:
Wiphy should be locked before calling rdev_get_station() (see lockdep assert in ieee80211_get_station()).
This fixes the following kernel NULL dereference:
How do you get a NULL pointer dereference from a locking issue? Was something else removing the station simultaneously?
johannes
Hi,
On 18/05/2024 17:50, Remi Pommarel wrote:
Wiphy should be locked before calling rdev_get_station() (see lockdep assert in ieee80211_get_station()).
Adding the lock is fine as nowadays it is taken in pre_doit and released in post_doit (with some exceptions). Therefore when invoking .get_station from a side path the lock should be taken too.
It was actually a05829a7222e9d10c416dd2dbbf3929fe6646b89 that introduced this requirement AFAICS.
This fixes the following kernel NULL dereference:
As already said by Johannes, I am not sure it truly fixes this NULL dereference though.
Have you checked where in ath10k_sta_statistics this is exactly happening? Do you think some sta was partly released and thus fields were NULLified?
Regards,
On Tue, May 21, 2024 at 09:43:56AM +0200, Antonio Quartulli wrote:
Hi,
On 18/05/2024 17:50, Remi Pommarel wrote:
Wiphy should be locked before calling rdev_get_station() (see lockdep assert in ieee80211_get_station()).
Adding the lock is fine as nowadays it is taken in pre_doit and released in post_doit (with some exceptions). Therefore when invoking .get_station from a side path the lock should be taken too.
It was actually a05829a7222e9d10c416dd2dbbf3929fe6646b89 that introduced this requirement AFAICS.
IIUC lock requirement was already there before this commit, only it was on rtnl lock to be taken instead of wiphy one.
This fixes the following kernel NULL dereference:
As already said by Johannes, I am not sure it truly fixes this NULL dereference though.
Have you checked where in ath10k_sta_statistics this is exactly happening? Do you think some sta was partly released and thus fields were NULLified?
ath10k_sta_statistics+0x10 is associated with the arsta->arvif->ar statement. It crashes because arsta->arvif is NULL.
Here is a scenario that explains the crash where the same sta disconnects then reconnects quickly (e.g. OPEN network):
CPU0: CPU1:
batadv_v_elp_periodic_work() queue_work(batadv_v_elp_get_throughput)
ieee80211_del_station() ieee80211_add_station() sta_info_insert() list_add_tail_rcu() ath10k_sta_state() memset(arsta, 0, sizeof(arsta)) batadv_v_elp_get_throughput() cfg80211_get_station() ieee80211_get_station() <-- Find sta with its MAC in list ath10k_sta_statistics() arsta->arvif->ar <-- arsta is still zeroed
In other words if the same sta has enough time to disconnect then reconnect before batadv_v_elp_get_throughput get scheduled, sta could be partially initialized when ath10k_sta_statistics() is called. Locking the wiphy ensure sta is fully initialized if found.
Thanks,
Hi,
On 21/05/2024 14:15, Remi Pommarel wrote:
On Tue, May 21, 2024 at 09:43:56AM +0200, Antonio Quartulli wrote:
Hi,
On 18/05/2024 17:50, Remi Pommarel wrote:
Wiphy should be locked before calling rdev_get_station() (see lockdep assert in ieee80211_get_station()).
Adding the lock is fine as nowadays it is taken in pre_doit and released in post_doit (with some exceptions). Therefore when invoking .get_station from a side path the lock should be taken too.
It was actually a05829a7222e9d10c416dd2dbbf3929fe6646b89 that introduced this requirement AFAICS.
IIUC lock requirement was already there before this commit, only it was on rtnl lock to be taken instead of wiphy one.
You're right.
This fixes the following kernel NULL dereference:
As already said by Johannes, I am not sure it truly fixes this NULL dereference though.
Have you checked where in ath10k_sta_statistics this is exactly happening? Do you think some sta was partly released and thus fields were NULLified?
ath10k_sta_statistics+0x10 is associated with the arsta->arvif->ar statement. It crashes because arsta->arvif is NULL.
Here is a scenario that explains the crash where the same sta disconnects then reconnects quickly (e.g. OPEN network):
CPU0: CPU1:
batadv_v_elp_periodic_work() queue_work(batadv_v_elp_get_throughput)
ieee80211_del_station() ieee80211_add_station() sta_info_insert() list_add_tail_rcu() ath10k_sta_state() memset(arsta, 0, sizeof(arsta))
batadv_v_elp_get_throughput() cfg80211_get_station() ieee80211_get_station() <-- Find sta with its MAC in list ath10k_sta_statistics() arsta->arvif->ar <-- arsta is still zeroed
In other words if the same sta has enough time to disconnect then reconnect before batadv_v_elp_get_throughput get scheduled, sta could be partially initialized when ath10k_sta_statistics() is called. Locking the wiphy ensure sta is fully initialized if found.
We were just wondering how you could get the arvif being NULL and your explanation makes sense.
This said, holding the lock is required when invoking get_station via netlink, therefore it's meaningful to hold it even when side invoking it from another module.
Acked-by: Antonio Quartulli a@unstable.cc
b.a.t.m.a.n@lists.open-mesh.org