From: Simon Wunderlich simon@open-mesh.com
When changing configuration on top of batman-adv, for example in firmwares, there are two cases which appeared to be problematic and causing initial temporary loops when starting batman-adv.
* when activating batman-adv with vlans and bridges, it may take longer than 30 seconds to start up the bridges. If the grace period is over by that time, the BLA grace period may already be over. Therefore increase the grace period to 60 seconds. * When changing the network configuration, e.g. adding or removing wired interface in a bridge, it may be useful to re-start the grace period. Therefore the bridge loop avoidance state is dropped when BLA is turned off, so turning off/on bla can be used to restart the grace period.
These two fixes addressed the problems we had in our setups.
The patches are based on top of the current master (since its technically more a feature than a fix), plus the kerneldoc fix sent last week.
Cheers, Simon
Simon Wunderlich (2): batman-adv: purge bridge loop avoidance when its disabled batman-adv: increase BLA wait periods to 6
net/batman-adv/bridge_loop_avoidance.c | 20 ++++++++++++++++++++ net/batman-adv/bridge_loop_avoidance.h | 1 + net/batman-adv/main.h | 2 +- net/batman-adv/sysfs.c | 4 +++- 4 files changed, 25 insertions(+), 2 deletions(-)
From: Simon Wunderlich simon@open-mesh.com
When bridge loop avoidance is disabled through sysfs, the internal datastructures are not disabled, but only BLA operations are disabled. To be sure that they are removed, purge the data immediately. That is especially useful if a firmwares network state is changed, and the BLA wait periods should restart on the new network.
Signed-off-by: Simon Wunderlich simon@open-mesh.com --- net/batman-adv/bridge_loop_avoidance.c | 20 ++++++++++++++++++++ net/batman-adv/bridge_loop_avoidance.h | 1 + net/batman-adv/sysfs.c | 4 +++- 3 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c index e068ad9..d9b034a 100644 --- a/net/batman-adv/bridge_loop_avoidance.c +++ b/net/batman-adv/bridge_loop_avoidance.c @@ -1247,6 +1247,26 @@ void batadv_bla_update_orig_address(struct batadv_priv *bat_priv, }
/** + * batadv_bla_status_update - purge bla interfaces if necessary + * @net_dev: the soft interface net device + */ +void batadv_bla_status_update(struct net_device *net_dev) +{ + struct batadv_priv *bat_priv = netdev_priv(net_dev); + struct batadv_hard_iface *primary_if; + + primary_if = batadv_primary_if_get_selected(bat_priv); + if (!primary_if) + return; + + /* this function already purges everything when bla is disabled, + * so just call that one. + */ + batadv_bla_update_orig_address(bat_priv, primary_if, primary_if); + batadv_hardif_free_ref(primary_if); +} + +/** * batadv_bla_periodic_work - performs periodic bla work * @work: kernel work struct * diff --git a/net/batman-adv/bridge_loop_avoidance.h b/net/batman-adv/bridge_loop_avoidance.h index 025152b..db1aeb5 100644 --- a/net/batman-adv/bridge_loop_avoidance.h +++ b/net/batman-adv/bridge_loop_avoidance.h @@ -42,6 +42,7 @@ int batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv, void batadv_bla_update_orig_address(struct batadv_priv *bat_priv, struct batadv_hard_iface *primary_if, struct batadv_hard_iface *oldif); +void batadv_bla_status_update(struct net_device *net_dev); int batadv_bla_init(struct batadv_priv *bat_priv); void batadv_bla_free(struct batadv_priv *bat_priv);
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c index 2bcc7f6..9569494 100644 --- a/net/batman-adv/sysfs.c +++ b/net/batman-adv/sysfs.c @@ -40,6 +40,7 @@ #include "distributed-arp-table.h" #include "gateway_client.h" #include "gateway_common.h" +#include "bridge_loop_avoidance.h" #include "hard-interface.h" #include "network-coding.h" #include "packet.h" @@ -549,7 +550,8 @@ static ssize_t batadv_store_isolation_mark(struct kobject *kobj, BATADV_ATTR_SIF_BOOL(aggregated_ogms, S_IRUGO | S_IWUSR, NULL); BATADV_ATTR_SIF_BOOL(bonding, S_IRUGO | S_IWUSR, NULL); #ifdef CONFIG_BATMAN_ADV_BLA -BATADV_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, NULL); +BATADV_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, + batadv_bla_status_update); #endif #ifdef CONFIG_BATMAN_ADV_DAT BATADV_ATTR_SIF_BOOL(distributed_arp_table, S_IRUGO | S_IWUSR,
On Monday, November 09, 2015 16:20:52 Simon Wunderlich wrote:
-BATADV_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, NULL); +BATADV_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR,
batadv_bla_status_update);
#endif
Are we sure this is correct ? The post function is called whether or not there actually was a change in the setting. The check in __batadv_store_bool_attr() is this:
ret = batadv_store_bool_attr(buff, count, net_dev, attr->name, attr_store); if (post_func && ret) post_func(net_dev);
Let's ignore for now that ret should be changed to check for '> 0' to avoid calling post_func() in case of an error. The return value is always non- negative unless the input is broken. You could enable BLA while it already is enabled which would reset all claim tables. Is that intended ?
Cheers, Marek
On Tuesday 17 November 2015 16:01:27 Marek Lindner wrote:
On Monday, November 09, 2015 16:20:52 Simon Wunderlich wrote:
-BATADV_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, NULL); +BATADV_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR,
batadv_bla_status_update);
#endif
Are we sure this is correct ? The post function is called whether or not there actually was a change in the setting. The check in __batadv_store_bool_attr() is this:
ret = batadv_store_bool_attr(buff, count, net_dev, attr->name, attr_store); if (post_func && ret) post_func(net_dev);
Let's ignore for now that ret should be changed to check for '> 0' to avoid calling post_func() in case of an error. The return value is always non- negative unless the input is broken. You could enable BLA while it already is enabled which would reset all claim tables. Is that intended ?
Its not intended, although my initial thought was that it didn't hurt too much - the backbone gateway and claim tables would be dropped and the interface would go into the "protected" state again, not allowing broadcasts for 30 (or 60 seconds, if the second patch is applied).
However, since you brought up this point, I think we should really change the behaviour of batadv_store_bool_attr() and friends, only calling post_func if there really was a change. I've checked the other functions which use that, and there shouldn't be any problem with that as far as I see - they do all some changes which depend on actual changes of the respective parameter. The other update functions are:
* batadv_dat_status_update * batadv_update_min_mtu * batadv_post_gw_reselect * batadv_nc_status_update
If you agree, I'd send another patch to change the behaviour as proposed.
Cheers, Simon
On Monday, November 09, 2015 16:20:52 Simon Wunderlich wrote:
From: Simon Wunderlich simon@open-mesh.com
When bridge loop avoidance is disabled through sysfs, the internal datastructures are not disabled, but only BLA operations are disabled. To be sure that they are removed, purge the data immediately. That is especially useful if a firmwares network state is changed, and the BLA wait periods should restart on the new network.
Signed-off-by: Simon Wunderlich simon@open-mesh.com
net/batman-adv/bridge_loop_avoidance.c | 20 ++++++++++++++++++++ net/batman-adv/bridge_loop_avoidance.h | 1 + net/batman-adv/sysfs.c | 4 +++- 3 files changed, 24 insertions(+), 1 deletion(-)
Applied in revision 07ed3c3.
Thanks, Marek
From: Simon Wunderlich simon@open-mesh.com
If networks take a long time to come up, e.g. due to lossy links, then the bridge loop avoidance wait time to suppress broadcasts may not wait long enough and detect a backbone before the mesh is brought up. Increasing the wait period further to 60 seconds makes this scenario less likely.
Signed-off-by: Simon Wunderlich simon@open-mesh.com --- net/batman-adv/main.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h index 79e2052..dcca44b 100644 --- a/net/batman-adv/main.h +++ b/net/batman-adv/main.h @@ -109,7 +109,7 @@ #define BATADV_MAX_AGGREGATION_MS 100
#define BATADV_BLA_PERIOD_LENGTH 10000 /* 10 seconds */ -#define BATADV_BLA_BACKBONE_TIMEOUT (BATADV_BLA_PERIOD_LENGTH * 3) +#define BATADV_BLA_BACKBONE_TIMEOUT (BATADV_BLA_PERIOD_LENGTH * 6) #define BATADV_BLA_CLAIM_TIMEOUT (BATADV_BLA_PERIOD_LENGTH * 10) #define BATADV_BLA_WAIT_PERIODS 3
On Monday, November 09, 2015 16:20:53 Simon Wunderlich wrote:
From: Simon Wunderlich simon@open-mesh.com
If networks take a long time to come up, e.g. due to lossy links, then the bridge loop avoidance wait time to suppress broadcasts may not wait long enough and detect a backbone before the mesh is brought up. Increasing the wait period further to 60 seconds makes this scenario less likely.
Signed-off-by: Simon Wunderlich simon@open-mesh.com
net/batman-adv/main.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Applied in revision c57af3a.
Thanks, Marek
b.a.t.m.a.n@lists.open-mesh.org