So far, if we wanted to bridge VLAN tagged frames into the mesh one would need to manually create an according VLAN interface on top of bat0 first, to trigger batman-adv to create the according structures for a VID.
With this change the VLAN from bridged-in clients is now automatically detected and added to the translation table on the fly.
Signed-off-by: Linus Lüssing linus.luessing@c0d3.blue --- v2: fix a typo, a missing "to" in the commit message
net/batman-adv/hard-interface.c | 2 +- net/batman-adv/multicast.c | 8 +- net/batman-adv/soft-interface.c | 121 ++++++++++++++++------------- net/batman-adv/soft-interface.h | 6 +- net/batman-adv/translation-table.c | 19 ++--- net/batman-adv/translation-table.h | 4 +- 6 files changed, 88 insertions(+), 72 deletions(-)
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c index 96a412beab2d..f5826dd8752c 100644 --- a/net/batman-adv/hard-interface.c +++ b/net/batman-adv/hard-interface.c @@ -946,7 +946,7 @@ static int batadv_hard_if_event_softif(unsigned long event, switch (event) { case NETDEV_REGISTER: bat_priv = netdev_priv(net_dev); - batadv_softif_create_vlan(bat_priv, BATADV_NO_FLAGS); + batadv_softif_create_vlan_own(bat_priv, BATADV_NO_FLAGS); break; }
diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index 38fab5e46ae2..61e765352e29 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -724,6 +724,7 @@ static void batadv_mcast_mla_tt_add(struct batadv_priv *bat_priv, { struct batadv_hw_addr *mcast_entry; struct hlist_node *tmp; + int ret;
if (!mcast_list) return; @@ -733,9 +734,10 @@ static void batadv_mcast_mla_tt_add(struct batadv_priv *bat_priv, &bat_priv->mcast.mla_list)) continue;
- if (!batadv_tt_local_add(bat_priv->soft_iface, - mcast_entry->addr, BATADV_NO_FLAGS, - BATADV_NULL_IFINDEX, BATADV_NO_MARK)) + ret = batadv_tt_local_add(bat_priv->soft_iface, + mcast_entry->addr, BATADV_NO_FLAGS, + BATADV_NULL_IFINDEX, BATADV_NO_MARK); + if (ret <= 0) continue;
hlist_del(&mcast_entry->list); diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 30ecbc2ef1fd..fb647798e5c8 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -141,6 +141,10 @@ static int batadv_interface_set_mac_addr(struct net_device *dev, void *p)
rcu_read_lock(); hlist_for_each_entry_rcu(vlan, &bat_priv->softif_vlan_list, list) { + /* we don't use this VID ourself, avoid adding us to it */ + if (!batadv_is_my_client(bat_priv, old_addr, vlan->vid)) + continue; + batadv_tt_local_remove(bat_priv, old_addr, vlan->vid, "mac address changed", false); batadv_tt_local_add(dev, addr->sa_data, vlan->vid, @@ -549,13 +553,15 @@ struct batadv_softif_vlan *batadv_softif_vlan_get(struct batadv_priv *bat_priv, }
/** - * batadv_softif_create_vlan() - allocate the needed resources for a new vlan + * batadv_softif_create_vlan() - create a softif vlan struct * @bat_priv: the bat priv with all the soft interface information * @vid: the VLAN identifier * - * Return: 0 on success, a negative error otherwise. + * Return: a pointer to the newly allocated softif vlan struct on success, NULL + * otherwise. */ -int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid) +static struct batadv_softif_vlan * +batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid) { struct batadv_softif_vlan *vlan;
@@ -563,15 +569,14 @@ int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
vlan = batadv_softif_vlan_get(bat_priv, vid); if (vlan) { - batadv_softif_vlan_put(vlan); spin_unlock_bh(&bat_priv->softif_vlan_list_lock); - return -EEXIST; + return vlan; }
vlan = kzalloc(sizeof(*vlan), GFP_ATOMIC); if (!vlan) { spin_unlock_bh(&bat_priv->softif_vlan_list_lock); - return -ENOMEM; + return NULL; }
vlan->bat_priv = bat_priv; @@ -584,34 +589,71 @@ int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid) hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list); spin_unlock_bh(&bat_priv->softif_vlan_list_lock);
+ return vlan; +} + +/** + * batadv_softif_vlan_get_or_create() - retrieve or create a softif vlan struct + * @bat_priv: the bat priv with all the soft interface information + * @vid: the VLAN identifier + * + * Return: the softif vlan struct if found or created or NULL otherwise. + */ +struct batadv_softif_vlan * +batadv_softif_vlan_get_or_create(struct batadv_priv *bat_priv, + unsigned short vid) +{ + struct batadv_softif_vlan *vlan = batadv_softif_vlan_get(bat_priv, vid); + + if (vlan) + return vlan; + + return batadv_softif_create_vlan(bat_priv, vid); +} + +/** + * batadv_softif_create_vlan_own() - add our own softif to the local TT + * @bat_priv: the bat priv with all the soft interface information + * @vid: the VLAN identifier + * + * Adds the MAC address of our own soft interface with the given VLAN ID as + * a permanent local TT entry. + * + * Return: 0 on success, a negative error otherwise. + */ +int batadv_softif_create_vlan_own(struct batadv_priv *bat_priv, + unsigned short vid) +{ + int ret; + /* add a new TT local entry. This one will be marked with the NOPURGE * flag */ - batadv_tt_local_add(bat_priv->soft_iface, - bat_priv->soft_iface->dev_addr, vid, - BATADV_NULL_IFINDEX, BATADV_NO_MARK); - - /* don't return reference to new softif_vlan */ - batadv_softif_vlan_put(vlan); + ret = batadv_tt_local_add(bat_priv->soft_iface, + bat_priv->soft_iface->dev_addr, vid, + BATADV_NULL_IFINDEX, BATADV_NO_MARK); + if (ret < 0) + return ret;
return 0; }
/** - * batadv_softif_destroy_vlan() - remove and destroy a softif_vlan object + * batadv_softif_destroy_vlan_own() - remove our own softif from the local TT * @bat_priv: the bat priv with all the soft interface information - * @vlan: the object to remove + * @vid: the VLAN identifier + * + * Removes the MAC address of our own soft interface with the given VLAN ID from + * the local TT. */ -static void batadv_softif_destroy_vlan(struct batadv_priv *bat_priv, - struct batadv_softif_vlan *vlan) +static void batadv_softif_destroy_vlan_own(struct batadv_priv *bat_priv, + unsigned short vid) { /* explicitly remove the associated TT local entry because it is marked * with the NOPURGE flag */ - batadv_tt_local_remove(bat_priv, bat_priv->soft_iface->dev_addr, - vlan->vid, "vlan interface destroyed", false); - - batadv_softif_vlan_put(vlan); + batadv_tt_local_remove(bat_priv, bat_priv->soft_iface->dev_addr, vid, + "vlan interface destroyed", false); }
/** @@ -629,7 +671,6 @@ static int batadv_interface_add_vid(struct net_device *dev, __be16 proto, unsigned short vid) { struct batadv_priv *bat_priv = netdev_priv(dev); - struct batadv_softif_vlan *vlan;
/* only 802.1Q vlans are supported. * batman-adv does not know how to handle other types @@ -639,25 +680,7 @@ static int batadv_interface_add_vid(struct net_device *dev, __be16 proto,
vid |= BATADV_VLAN_HAS_TAG;
- /* if a new vlan is getting created and it already exists, it means that - * it was not deleted yet. batadv_softif_vlan_get() increases the - * refcount in order to revive the object. - * - * if it does not exist then create it. - */ - vlan = batadv_softif_vlan_get(bat_priv, vid); - if (!vlan) - return batadv_softif_create_vlan(bat_priv, vid); - - /* add a new TT local entry. This one will be marked with the NOPURGE - * flag. This must be added again, even if the vlan object already - * exists, because the entry was deleted by kill_vid() - */ - batadv_tt_local_add(bat_priv->soft_iface, - bat_priv->soft_iface->dev_addr, vid, - BATADV_NULL_IFINDEX, BATADV_NO_MARK); - - return 0; + return batadv_softif_create_vlan_own(bat_priv, vid); }
/** @@ -676,7 +699,6 @@ static int batadv_interface_kill_vid(struct net_device *dev, __be16 proto, unsigned short vid) { struct batadv_priv *bat_priv = netdev_priv(dev); - struct batadv_softif_vlan *vlan;
/* only 802.1Q vlans are supported. batman-adv does not know how to * handle other types @@ -684,15 +706,7 @@ static int batadv_interface_kill_vid(struct net_device *dev, __be16 proto, if (proto != htons(ETH_P_8021Q)) return -EINVAL;
- vlan = batadv_softif_vlan_get(bat_priv, vid | BATADV_VLAN_HAS_TAG); - if (!vlan) - return -ENOENT; - - batadv_softif_destroy_vlan(bat_priv, vlan); - - /* finally free the vlan object */ - batadv_softif_vlan_put(vlan); - + batadv_softif_destroy_vlan_own(bat_priv, vid | BATADV_VLAN_HAS_TAG); return 0; }
@@ -1099,7 +1113,6 @@ static void batadv_softif_destroy_netlink(struct net_device *soft_iface, { struct batadv_priv *bat_priv = netdev_priv(soft_iface); struct batadv_hard_iface *hard_iface; - struct batadv_softif_vlan *vlan;
list_for_each_entry(hard_iface, &batadv_hardif_list, list) { if (hard_iface->soft_iface == soft_iface) @@ -1107,11 +1120,7 @@ static void batadv_softif_destroy_netlink(struct net_device *soft_iface, }
/* destroy the "untagged" VLAN */ - vlan = batadv_softif_vlan_get(bat_priv, BATADV_NO_FLAGS); - if (vlan) { - batadv_softif_destroy_vlan(bat_priv, vlan); - batadv_softif_vlan_put(vlan); - } + batadv_softif_destroy_vlan_own(bat_priv, BATADV_NO_FLAGS);
unregister_netdevice_queue(soft_iface, head); } diff --git a/net/batman-adv/soft-interface.h b/net/batman-adv/soft-interface.h index 9f2003f1a497..7050ccd304df 100644 --- a/net/batman-adv/soft-interface.h +++ b/net/batman-adv/soft-interface.h @@ -21,10 +21,14 @@ void batadv_interface_rx(struct net_device *soft_iface, struct batadv_orig_node *orig_node); bool batadv_softif_is_valid(const struct net_device *net_dev); extern struct rtnl_link_ops batadv_link_ops; -int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid); +int batadv_softif_create_vlan_own(struct batadv_priv *bat_priv, + unsigned short vid); void batadv_softif_vlan_release(struct kref *ref); struct batadv_softif_vlan *batadv_softif_vlan_get(struct batadv_priv *bat_priv, unsigned short vid); +struct batadv_softif_vlan * +batadv_softif_vlan_get_or_create(struct batadv_priv *bat_priv, + unsigned short vid);
/** * batadv_softif_vlan_put() - decrease the vlan object refcounter and diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index b21ff3c36b07..ca74a46c1d93 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -591,8 +591,8 @@ static void batadv_tt_global_free(struct batadv_priv *bat_priv, * * Return: true if the client was successfully added, false otherwise. */ -bool batadv_tt_local_add(struct net_device *soft_iface, const u8 *addr, - unsigned short vid, int ifindex, u32 mark) +int batadv_tt_local_add(struct net_device *soft_iface, const u8 *addr, + unsigned short vid, int ifindex, u32 mark) { struct batadv_priv *bat_priv = netdev_priv(soft_iface); struct batadv_tt_local_entry *tt_local; @@ -604,10 +604,10 @@ bool batadv_tt_local_add(struct net_device *soft_iface, const u8 *addr, struct hlist_head *head; struct batadv_tt_orig_list_entry *orig_entry; int hash_added, table_size, packet_size_max; - bool ret = false; bool roamed_back = false; u8 remote_flags; u32 match_mark; + int ret = 0;
if (ifindex != BATADV_NULL_IFINDEX) in_dev = dev_get_by_index(net, ifindex); @@ -658,21 +658,22 @@ bool batadv_tt_local_add(struct net_device *soft_iface, const u8 *addr, net_ratelimited_function(batadv_info, soft_iface, "Local translation table size (%i) exceeds maximum packet size (%i); Ignoring new local tt entry: %pM\n", table_size, packet_size_max, addr); + ret = -E2BIG; goto out; }
tt_local = kmem_cache_alloc(batadv_tl_cache, GFP_ATOMIC); - if (!tt_local) + if (!tt_local) { + ret = -ENOMEM; goto out; + }
/* increase the refcounter of the related vlan */ - vlan = batadv_softif_vlan_get(bat_priv, vid); + vlan = batadv_softif_vlan_get_or_create(bat_priv, vid); if (!vlan) { - net_ratelimited_function(batadv_info, soft_iface, - "adding TT local entry %pM to non-existent VLAN %d\n", - addr, batadv_print_vid(vid)); kmem_cache_free(batadv_tl_cache, tt_local); tt_local = NULL; + ret = -ENOMEM; goto out; }
@@ -769,7 +770,7 @@ bool batadv_tt_local_add(struct net_device *soft_iface, const u8 *addr, if (remote_flags ^ (tt_local->common.flags & BATADV_TT_REMOTE_MASK)) batadv_tt_local_event(bat_priv, tt_local, BATADV_NO_FLAGS);
- ret = true; + ret = 1; out: batadv_hardif_put(in_hardif); dev_put(in_dev); diff --git a/net/batman-adv/translation-table.h b/net/batman-adv/translation-table.h index d18740d9a22b..bbdda8488c14 100644 --- a/net/batman-adv/translation-table.h +++ b/net/batman-adv/translation-table.h @@ -16,8 +16,8 @@ #include <linux/types.h>
int batadv_tt_init(struct batadv_priv *bat_priv); -bool batadv_tt_local_add(struct net_device *soft_iface, const u8 *addr, - unsigned short vid, int ifindex, u32 mark); +int batadv_tt_local_add(struct net_device *soft_iface, const u8 *addr, + unsigned short vid, int ifindex, u32 mark); u16 batadv_tt_local_remove(struct batadv_priv *bat_priv, const u8 *addr, unsigned short vid, const char *message, bool roaming);
On Wed, Jun 12, 2024 at 11:39:44PM +0200, Linus Lüssing wrote:
So far, if we wanted to bridge VLAN tagged frames into the mesh one would need to manually create an according VLAN interface on top of bat0 first, to trigger batman-adv to create the according structures for a VID.
With this change the VLAN from bridged-in clients is now automatically detected and added to the translation table on the fly.
Signed-off-by: Linus Lüssing linus.luessing@c0d3.blue
Just wanted to add two more remarks / oddities I was stumbling over when working on and testing this and reading the code. Which might be something worth additional patches in the future.
Minor thing, in the following case I would have expected the P/NOPURGE flag to be added, but it wasn't:
``` root@linus-work:/# batctl tl [B.A.T.M.A.N. adv 2024.1-10-g2ee1b45a-dirty, MainIF/MAC: ens3/02:04:64:a4:39:c1 (bat0/02:11:00:00:00:01 BATMAN_V), TTVN: 12] Client VID Flags Last seen (CRC ) 02:11:00:00:00:01 -1 [.P....] 0.000 (0x1350f3f4) 02:11:00:00:00:02 -1 [......] 3.788 (0x1350f3f4) root@linus-work:/# ip link set dev bat0 addr 02:11:00:00:00:02 root@linus-work:/# batctl tl [B.A.T.M.A.N. adv 2024.1-10-g2ee1b45a-dirty, MainIF/MAC: ens3/02:04:64:a4:39:c1 (bat0/02:11:00:00:00:02 BATMAN_V), TTVN: 13] Client VID Flags Last seen (CRC ) 02:11:00:00:00:02 -1 [......] 1.872 (0x8d84de4b) root@linus-work:/# ```
Don't see a functional issue right now though. If it were timing out and deleted and then a frame is sent from bat0 then it would be readded with the P/NOPURGE flag.
Secondly, discovered this yesterday when trying to add multiple VIDs quickly like this:
``` for i in `seq 3 8`; do mz bat0 \ -a 02:33:02:34:00:01 \ -b bcast "81:00 00:0$i 08:00 ca:fe:ca:fe" sleep 1 done ```
Then this won't work. Whenever adding a new VID broadcast traffic seems to stale for 30 seconds from this node due to the "num requests in flight" check in batadv_bla_tx(): https://git.open-mesh.org/batman-adv.git/blob/36cfdc4401412d5a00231b1fd65a95...
So repeating that command every 30 seconds will add one more VID every time. Then after 6x 30 seconds you could use all VIDs as expected, as long as they don't time out. With "batctl bl 0" no such delay + broadcast filtering happens and VIDs would be usable immediately with that loop for the mausezahn command. -> Might make sense to make bat_priv->bla.num_requests per VLAN in the future instead? (would that make sense? or am I missing something else or an easier solution?)
Regards, Linus
And one more thought/idea that just came to my mind:
Maybe after adding dynamic VLAN detection from traffic we could then also exclude the annoying, actually typically unused VID 0 from the static addition? That way we would save quite a bit of overhead as each VLAN currently quite significantly increases the OGM size.
So basically thinking of simply adding a "if (!vid) return" to (.ndo_vlan_rx_add_vid =) batadv_interface_add_vid().
One could still use VID 0 on bat0, it just wouldn't be detected as quickly/immediately, would take an actual payload packet to be added.
Currently VID 0 is added because we set "dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER" in batadv_softif_init_early(). Which in turn triggers: net/8021q/vlan.c:vlan_device_event()->vlan_vid_add(dev, htons(ETH_P_8021Q), 0) And triggers the message "adding VLAN 0 to HW filter on device bat0" in dmesg.
But of course, first the BLA induced 30 seconds broadcast filtering with each added VLAN would need to be solved. As that might be a too easy DoS potential, I guess? Could still need some feedback on how to best solve that (bat_priv->bla.num_requests per VLAN?).
Regards, Linus
Hi there,
We've been running this for a few time and it's very usefull. So is there any news on merging this into the kernel ? Or is the BLA thing blocking ?
However, this patch seems to leak some vlan entries on softif interface deletion. For me simply running a kernel with kmemeleak & doing a simple ip link del bat0 shows splats like theese:
This should take care of kmemleak reports as this one: unreferenced object 0xffff00000cadff00 (size 128): comm "xxx", pid 913, jiffies 4294914994 (age 77762.956s) hex dump (first 32 bytes): 40 c9 35 05 00 00 ff ff 00 00 00 00 00 00 00 00 @.5............. 39 77 ff 61 01 00 00 00 00 00 00 00 00 00 00 00 9w.a............ backtrace: [<00000000f10febf1>] __kmem_cache_alloc_node+0x1d4/0x340 [<00000000178f97a6>] kmalloc_trace+0x40/0x128 [<0000000087db8410>] batadv_softif_vlan_get_or_create+0xa0/0x1c0 [<000000009f648859>] batadv_tt_local_add+0x7ec/0x10f8 [<00000000c0dacbb0>] batadv_softif_create_vlan_own+0x48/0x60 [<00000000676cacd0>] batadv_hard_if_event+0x1a0/0xb58 [<00000000cd053741>] notifier_call_chain+0xb0/0x220 [<0000000019022ed7>] raw_notifier_call_chain+0x1c/0x30 [<00000000e22f9034>] call_netdevice_notifiers_info+0x6c/0xc0 [<00000000fb639003>] register_netdevice+0x5ec/0x778 [<00000000e2ac250c>] batadv_softif_newlink+0x48/0x68 [<00000000b65a146a>] __rtnl_newlink+0x81c/0xb10 [<00000000a91fbe5b>] rtnl_newlink+0x60/0x90 [<0000000043273284>] rtnetlink_rcv_msg+0x3d8/0x568 [<00000000da2bd331>] netlink_rcv_skb+0xc0/0x1e0 [<000000007523d87a>] rtnetlink_rcv+0x1c/0x30
So I fixed it by removing the extra refs that is hold for the vlan entry when added to the global list. For me it seems superfluous. This way it also means that when a vlan entry gets deleted (because the last tt entry holding a ref to it gets dropped) the vlan is freed instead of staying there forever.
Or maybe I missed something ?
--- net/batman-adv/soft-interface.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index b61f35918b5d..d7de54734725 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -599,7 +599,6 @@ batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
atomic_set(&vlan->ap_isolation, 0);
- kref_get(&vlan->refcount); hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list); spin_unlock_bh(&bat_priv->softif_vlan_list_lock);
On Thursday, 14 November 2024 14:48:52 CET Nicolas Escande wrote:
Hi there,
We've been running this for a few time and it's very usefull. So is there any news on merging this into the kernel ? Or is the BLA thing blocking ?
I am not in favour of changing the behavior of batman-adv. Now everyone can increase the number of managed VLANs without any control by the node admin.
And as Linus wrote, it also shows odd behaviors. And Antonio also didn't write his opinion here. I have therefore downgraded it from PATCH to RFC (instead of simply rejecting it).
[...]
Or maybe I missed something ?
net/batman-adv/soft-interface.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index b61f35918b5d..d7de54734725 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -599,7 +599,6 @@ batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
atomic_set(&vlan->ap_isolation, 0);
- kref_get(&vlan->refcount); hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list); spin_unlock_bh(&bat_priv->softif_vlan_list_lock);
This ref is for the VLAN list entry (just one line below the kref_get). This patch is therefore definitely wrong.
Kind regards, Sven
On Thu Nov 14, 2024 at 2:58 PM CET, Sven Eckelmann wrote:
On Thursday, 14 November 2024 14:48:52 CET Nicolas Escande wrote:
Hi there,
We've been running this for a few time and it's very usefull. So is there any news on merging this into the kernel ? Or is the BLA thing blocking ?
I am not in favour of changing the behavior of batman-adv. Now everyone can increase the number of managed VLANs without any control by the node admin.
Well ok but but it makes configuration so much easier in a setup where we have many vlan that I still find this usefull.
And as Linus wrote, it also shows odd behaviors. And Antonio also didn't write his opinion here. I have therefore downgraded it from PATCH to RFC (instead of simply rejecting it).
This is about the BLA behaviour ? On a setup that doesn't use BLA like mine it makes things so much easier to configure that I still find this patch usefull :)
[...]
Or maybe I missed something ?
net/batman-adv/soft-interface.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index b61f35918b5d..d7de54734725 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -599,7 +599,6 @@ batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
atomic_set(&vlan->ap_isolation, 0);
- kref_get(&vlan->refcount); hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list); spin_unlock_bh(&bat_priv->softif_vlan_list_lock);
This ref is for the VLAN list entry (just one line below the kref_get). This patch is therefore definitely wrong.
That's the part I have trouble understanding, we keep 1 ref for the list, 1 ref per TT entry using this vlan. And on interface deletion, we clear the TT tables (so we go back to a refcount of 1 for the global list), but I don't see where we clear the list itself ?
Kind regards, Sven
Hi Nicolas,
Many thanks for your feedback!
On Thu, Nov 14, 2024 at 03:53:11PM +0100, Nicolas Escande wrote:
On Thu Nov 14, 2024 at 2:58 PM CET, Sven Eckelmann wrote:
On Thursday, 14 November 2024 14:48:52 CET Nicolas Escande wrote:
Hi there,
We've been running this for a few time and it's very usefull. So is there any news on merging this into the kernel ? Or is the BLA thing blocking ?
Yes, I think this is at least one of the blocking issues. I didn't quite get my head fully wrapped around the BLA code yet. (Any help/guidance still appreciated :D. BLA is the one code part, other than network coding, that I haven't really looked into and digested so far. And what clashes right now is a specific internal that isn't fully documented on the BLA wiki pages, unless I overlooked it?)
I am not in favour of changing the behavior of batman-adv. Now everyone can increase the number of managed VLANs without any control by the node admin.
Valid concern. Especially as each added VLAN is quite costly for an OGM at the moment. A VLAN with one MAC address adds 12 bytes to a 24 bytes base OGM IV length (excluding ethernet header and other TVLVs).
However in a way for non-VLAN TT entries this is also partially a concern, right? Someone could also flood source MAC addresses in an uncontrolled way, too. (Though would likely have to actively keep roaming to create a constant extra overhead.)
Maybe it would make sense to check how the Linux bridge approaches this?
It seems there is an admin configurable limit of BR_MULTICAST_DEFAULT_HASH_MAX = 4096 entries for the MDB (multicast MAC addresses). And a configurable fdb_max_learned (disabled by default, for backwards compatibility reasons according to bdb4dfda3b) for the FDB (unicast MAC addresse). Thirdly, it seems 4096 VLANs are allowed (VLAN_N_VID), the maximum amount. Though this one does not seem configurable.
Would it maybe make sense to have a knob and by default set the limit to 8 or 16 VLANs? A default which could maybe be increased if the OGM size was decoupled from the amount of VLANs in the future?
---
The two reasons I would like to have a dynamic VLAN feature, especially for wireless community mesh networks:
Allow a group of people to setup and use their own address space when the centrally managed, default one does not match their needs. -> decentralization
Allow to get rid of the unused VID=0 and VID=1 entries by default, only dynamically learn them, to typically reduce the OGM overhead by at least 2*12 bytes. To partially mitigate the overhead regression we introduced by adding TT in compat 15.
---
Just my overall, conceptual thoughts on this feature. Will look into the refcount issue later, thanks for reporting!
Regards, Linus
On Thu Nov 14, 2024 at 7:06 PM CET, Linus Lüssing wrote:
Hi Nicolas,
Many thanks for your feedback!
[...]
I am not in favour of changing the behavior of batman-adv. Now everyone can increase the number of managed VLANs without any control by the node admin.
Valid concern. Especially as each added VLAN is quite costly for an OGM at the moment. A VLAN with one MAC address adds 12 bytes to a 24 bytes base OGM IV length (excluding ethernet header and other TVLVs).
However in a way for non-VLAN TT entries this is also partially a concern, right? Someone could also flood source MAC addresses in an uncontrolled way, too. (Though would likely have to actively keep roaming to create a constant extra overhead.)
Maybe it would make sense to check how the Linux bridge approaches this?
It seems there is an admin configurable limit of BR_MULTICAST_DEFAULT_HASH_MAX = 4096 entries for the MDB (multicast MAC addresses). And a configurable fdb_max_learned (disabled by default, for backwards compatibility reasons according to bdb4dfda3b) for the FDB (unicast MAC addresse). Thirdly, it seems 4096 VLANs are allowed (VLAN_N_VID), the maximum amount. Though this one does not seem configurable.
Would it maybe make sense to have a knob and by default set the limit to 8 or 16 VLANs? A default which could maybe be increased if the OGM size was decoupled from the amount of VLANs in the future?
I really like this idea. This either could be a compile time knob or a dynamic one to ensure we don't easilly have a too big OGM
The two reasons I would like to have a dynamic VLAN feature, especially for wireless community mesh networks:
Allow a group of people to setup and use their own address space when the centrally managed, default one does not match their needs. -> decentralization
Allow to get rid of the unused VID=0 and VID=1 entries by default, only dynamically learn them, to typically reduce the OGM overhead by at least 2*12 bytes. To partially mitigate the overhead regression we introduced by adding TT in compat 15.
Just my overall, conceptual thoughts on this feature. Will look into the refcount issue later, thanks for reporting!
Regards, Linus
Thanks
On Thursday, 14 November 2024 14:58:43 CET Sven Eckelmann wrote:
Or maybe I missed something ?
net/batman-adv/soft-interface.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index b61f35918b5d..d7de54734725 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -599,7 +599,6 @@ batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
atomic_set(&vlan->ap_isolation, 0);
kref_get(&vlan->refcount);
hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list); spin_unlock_bh(&bat_priv->softif_vlan_list_lock);
This ref is for the VLAN list entry (just one line below the kref_get). This patch is therefore definitely wrong.
Ok, had a look at the surrounding code from this patch and it looks too me like the reason for what a ref++ stands for was changed and so this initialization also needs to be changed. So, I have to correct my original statement about your patch, Nicholas. It is definitely an improvement to the refcnt (with this patch, not before). But at the same time, it should be made clear by Linus what the reference counter is used for. Especially because it was completely different before the patch and different then most other things in batman-adv.
Starting with not simply dropping the kref_get but with a small comment explaining that there is "no kref_get for list because only TT entries per VLAN and temporary references on stack are tracked. list entries will be removed automatically when refcount reaches zero".
Kind regards, Sven
b.a.t.m.a.n@lists.open-mesh.org