On Thursday, 9 January 2025 03:27:56 CET Andy Strohman wrote:
Reference counting is used to ensure that batadv_hardif_neigh_node and batadv_hard_iface are not freed before/during batadv_v_elp_throughput_metric_update work is finished.
But there isn't a guarantee that the hard if will remain associated with a soft interface up until the work is finished.
This fixes a crash triggered by reboot that looks like this:
Call trace: batadv_v_mesh_free+0xd0/0x4dc [batman_adv] batadv_v_elp_throughput_metric_update+0x1c/0xa4 process_one_work+0x178/0x398 worker_thread+0x2e8/0x4d0 kthread+0xd8/0xdc ret_from_fork+0x10/0x20
(the batadv_v_mesh_free call is misleading, and does not actually happen)
I am not 100% sure how you build batman-adv but when you've used the external kernel module then you can use [1,2]:
make EXTRA_CFLAGS="-fno-inline -Og -fno-optimize-sibling-calls -fno-reorder-blocks -fno-ipa-cp-clone -fno-partial-inlining" KERNELPATH=...
to get actually useful backtraces. Unfortunately, compile time checks sometimes need inlining and compilations fails or some kernel configurations with '-fno-inline'. If this happens to you then you can at least try to use the rest of the extra flags.
[1] https://www.open-mesh.org/projects/devtools/wiki/Kernel_hacking_Debian_image... [2] https://www.open-mesh.org/projects/devtools/wiki/Kernel_debugging_with_kgdb#...
I was able to make the issue happen more reliably by changing hardif_neigh->bat_v.metric_work work to be delayed work. This allowed me to track down and confirm the fix.
Signed-off-by: Andy Strohman andrew@andrewstrohman.com
Please add before your Signed-off-by line following extra line:
Fixes: 5c3245172c01 ("batman-adv: ELP - compute the metric based on the estimated throughput")
net/batman-adv/bat_v_elp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c index 1d704574..7daaad9c 100644 --- a/net/batman-adv/bat_v_elp.c +++ b/net/batman-adv/bat_v_elp.c @@ -140,7 +140,7 @@ static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh) }
default_throughput:
- if (!(hard_iface->bat_v.flags & BATADV_WARNING_DEFAULT)) {
- if (!(hard_iface->bat_v.flags & BATADV_WARNING_DEFAULT) && hard_iface->soft_iface) { batadv_info(hard_iface->soft_iface, "WiFi driver or ethtool info does not provide information about link speeds on interface %s, therefore defaulting to hardcoded throughput values of %u.%1u Mbps. Consider overriding the throughput manually or checking your driver.\n", hard_iface->net_dev->name,
I would prefer something more explanatory instead of adding more conditions at the end of actually interesting checks. Something more like:
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c index 1d704574..185b063f 100644 --- a/net/batman-adv/bat_v_elp.c +++ b/net/batman-adv/bat_v_elp.c @@ -66,12 +66,19 @@ static void batadv_v_elp_start_timer(struct batadv_hard_iface *hard_iface) static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh) { struct batadv_hard_iface *hard_iface = neigh->if_incoming; + struct net_device *soft_iface = hard_iface->soft_iface; struct ethtool_link_ksettings link_settings; struct net_device *real_netdev; struct station_info sinfo; u32 throughput; int ret;
+ /* don't query throughput when no longer associated with any + * batman-adv interface + */ + if (!soft_iface) + return BATADV_THROUGHPUT_DEFAULT_VALUE; + /* if the user specified a customised value for this interface, then * return it directly */ @@ -141,7 +148,7 @@ static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
default_throughput: if (!(hard_iface->bat_v.flags & BATADV_WARNING_DEFAULT)) { - batadv_info(hard_iface->soft_iface, + batadv_info(soft_iface, "WiFi driver or ethtool info does not provide information about link speeds on interface %s, therefore defaulting to hardcoded throughput values of %u.%1u Mbps. Consider overriding the throughput manually or checking your driver.\n", hard_iface->net_dev->name, BATADV_THROUGHPUT_DEFAULT_VALUE / 10,