During broadcast queueing, the skb_reset_mac_header() sets the skb to a place invalid for a MAC header, pointing right into the batman-adv broadcast packet. Luckily, no one seems to actually use eth_hdr(skb) afterwards until batadv_send_skb_packet() resets the header to a valid position again.
Therefore removing this unnecessary, weird skb_reset_mac_header() call.
Reviewed-by: Sven Eckelmann sven@narfation.org Signed-off-by: Linus Lüssing linus.luessing@c0d3.blue ---
Changes in v2: * none
net/batman-adv/send.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c index 8d4e1f5..97bdb0c 100644 --- a/net/batman-adv/send.c +++ b/net/batman-adv/send.c @@ -586,8 +586,6 @@ int batadv_add_bcast_packet_to_list(struct batadv_priv *bat_priv, bcast_packet = (struct batadv_bcast_packet *)newskb->data; bcast_packet->ttl--;
- skb_reset_mac_header(newskb); - forw_packet->skb = newskb;
INIT_DELAYED_WORK(&forw_packet->delayed_work,
With this patch, (re)broadcasting on a specific interfaces is avoided:
* No neighbor: There is no need to broadcast on an interface if there is no node behind it.
* Single neighbor is source: If there is just one neighbor on an interface and if this neighbor is the one we actually got this broadcast packet from, then we do not need to echo it back.
* Single neighbor is originator: If there is just one neighbor on an interface and if this neighbor is the originator of this broadcast packet, then we do not need to echo it back.
Goodies for BATMAN V:
("Upgrade your BATMAN IV network to V now to get these for free!")
Thanks to the split of OGMv1 into two packet types, OGMv2 and ELP that is, we can now apply the same opitimizations stated above to OGMv2 packets, too.
Furthermore, with BATMAN V, rebroadcasts can be reduced in certain multi interface cases, too, where BATMAN IV cannot. This is thanks to the removal of the "secondary interface originator" concept in BATMAN V.
Signed-off-by: Linus Lüssing linus.luessing@c0d3.blue ---
Changes in v2: * Changed some "if"s to "switch"es (thanks Sven!) * Moved kref_get() closer to assignment in neigh_node_create() (thanks Sven!) * Added missing hardif_put() calls (urgh... thanks Sven!) * Added kerneldoc for new hardif_neigh_node->orig_node in types.h * Removed "###" in this commit message (seems like patchwork didn't like it)
net/batman-adv/bat_v_ogm.c | 56 +++++++++++++++++++++++++++++++++++++++++ net/batman-adv/hard-interface.c | 53 ++++++++++++++++++++++++++++++++++++++ net/batman-adv/hard-interface.h | 17 +++++++++++++ net/batman-adv/originator.c | 14 ++++++++--- net/batman-adv/routing.c | 1 + net/batman-adv/send.c | 52 ++++++++++++++++++++++++++++++++++++++ net/batman-adv/soft-interface.c | 2 ++ net/batman-adv/types.h | 2 ++ 8 files changed, 193 insertions(+), 4 deletions(-)
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c index 1aeeadc..7a2b274 100644 --- a/net/batman-adv/bat_v_ogm.c +++ b/net/batman-adv/bat_v_ogm.c @@ -140,6 +140,7 @@ static void batadv_v_ogm_send(struct work_struct *work) unsigned char *ogm_buff, *pkt_buff; int ogm_buff_len; u16 tvlv_len = 0; + int ret;
bat_v = container_of(work, struct batadv_priv_bat_v, ogm_wq.work); bat_priv = container_of(bat_v, struct batadv_priv, bat_v); @@ -182,6 +183,31 @@ static void batadv_v_ogm_send(struct work_struct *work) if (!kref_get_unless_zero(&hard_iface->refcount)) continue;
+ ret = batadv_hardif_no_broadcast(hard_iface, NULL, NULL); + if (ret) { + char *type; + + switch (ret) { + case BATADV_HARDIF_BCAST_NORECIPIENT: + type = "no neighbor"; + break; + case BATADV_HARDIF_BCAST_DUPFWD: + type = "single neighbor is source"; + break; + case BATADV_HARDIF_BCAST_DUPORIG: + type = "single neighbor is originator"; + break; + default: + type = "unknown"; + } + + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "OGM2 from ourselve on %s surpressed: %s\n", + hard_iface->net_dev->name, type); + + batadv_hardif_put(hard_iface); + continue; + } + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Sending own OGM2 packet (originator %pM, seqno %u, throughput %u, TTL %d) on interface %s [%pM]\n", ogm_packet->orig, ntohl(ogm_packet->seqno), @@ -651,6 +677,7 @@ static void batadv_v_ogm_process(const struct sk_buff *skb, int ogm_offset, struct batadv_hard_iface *hard_iface; struct batadv_ogm2_packet *ogm_packet; u32 ogm_throughput, link_throughput, path_throughput; + int ret;
ethhdr = eth_hdr(skb); ogm_packet = (struct batadv_ogm2_packet *)(skb->data + ogm_offset); @@ -716,6 +743,35 @@ static void batadv_v_ogm_process(const struct sk_buff *skb, int ogm_offset, if (!kref_get_unless_zero(&hard_iface->refcount)) continue;
+ ret = batadv_hardif_no_broadcast(hard_iface, + hardif_neigh->orig_node, + ogm_packet->orig); + + if (ret) { + char *type; + + switch (ret) { + case BATADV_HARDIF_BCAST_NORECIPIENT: + type = "no neighbor"; + break; + case BATADV_HARDIF_BCAST_DUPFWD: + type = "single neighbor is source"; + break; + case BATADV_HARDIF_BCAST_DUPORIG: + type = "single neighbor is originator"; + break; + default: + type = "unknown"; + } + + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "OGM2 packet from %pM on %s surpressed: %s\n", + ogm_packet->orig, hard_iface->net_dev->name, + type); + + batadv_hardif_put(hard_iface); + continue; + } + batadv_v_ogm_process_per_outif(bat_priv, ethhdr, ogm_packet, orig_node, neigh_node, if_incoming, hard_iface); diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c index 08ce361..39bb5e3 100644 --- a/net/batman-adv/hard-interface.c +++ b/net/batman-adv/hard-interface.c @@ -228,6 +228,59 @@ bool batadv_is_wifi_netdev(struct net_device *net_device) return false; }
+/** + * batadv_hardif_no_broadcast - check whether (re)broadcast is necessary + * @if_outgoing: + * @orig_neigh: orig node of the forwarder we just got the packet from + * (NULL if we originated) + * @orig_addr: the originator of this packet + * + * Checks whether a packet needs to be (re)broadcasted on the given interface. + * + * Return: + * BATADV_HARDIF_BCAST_NORECIPIENT: No neighbor on interface + * BATADV_HARDIF_BCAST_DUPFWD: Just one neighbor, but it is the forwarder + * BATADV_HARDIF_BCAST_DUPORIG: Just one neighbor, but it is the originator + * BATADV_HARDIF_BCAST_OK: Several neighbors, must broadcast + */ +int batadv_hardif_no_broadcast(struct batadv_hard_iface *if_outgoing, + struct batadv_orig_node *orig_neigh, + u8 *orig_addr) +{ + struct batadv_hardif_neigh_node *hardif_neigh; + struct hlist_node *first; + int ret = BATADV_HARDIF_BCAST_OK; + + rcu_read_lock(); + + /* 0 neighbors -> no (re)broadcast */ + first = rcu_dereference(hlist_first_rcu(&if_outgoing->neigh_list)); + if (!first) { + ret = BATADV_HARDIF_BCAST_NORECIPIENT; + goto out; + } + + /* >1 neighbors -> (re)brodcast */ + if (rcu_dereference(hlist_next_rcu(first))) + goto out; + + hardif_neigh = hlist_entry(first, struct batadv_hardif_neigh_node, + list); + + /* 1 neighbor, is the one we received from -> no rebroadcast */ + if (orig_neigh && hardif_neigh->orig_node == orig_neigh) + ret = BATADV_HARDIF_BCAST_DUPFWD; + + /* 1 neighbor, is the originator -> no rebroadcast */ + if (orig_neigh && orig_addr && + batadv_compare_eth(hardif_neigh->orig_node->orig, orig_addr)) + ret = BATADV_HARDIF_BCAST_DUPORIG; + +out: + rcu_read_unlock(); + return ret; +} + static struct batadv_hard_iface * batadv_hardif_get_active(const struct net_device *soft_iface) { diff --git a/net/batman-adv/hard-interface.h b/net/batman-adv/hard-interface.h index a76724d..1831bc9 100644 --- a/net/batman-adv/hard-interface.h +++ b/net/batman-adv/hard-interface.h @@ -40,6 +40,20 @@ enum batadv_hard_if_state { };
/** + * enum batadv_hard_if_bcast - broadcast avoidance options + * @BATADV_HARDIF_BCAST_OK: Do broadcast on according hard interface + * @BATADV_HARDIF_BCAST_NORECIPIENT: Broadcast not needed, there is no recipient + * @BATADV_HARDIF_BCAST_DUPFWD: There is just the neighbor we got it from + * @BATADV_HARDIF_BCAST_DUPORIG: There is just the originator + */ +enum batadv_hard_if_bcast { + BATADV_HARDIF_BCAST_OK = 0, + BATADV_HARDIF_BCAST_NORECIPIENT, + BATADV_HARDIF_BCAST_DUPFWD, + BATADV_HARDIF_BCAST_DUPORIG, +}; + +/** * enum batadv_hard_if_cleanup - Cleanup modi for soft_iface after slave removal * @BATADV_IF_CLEANUP_KEEP: Don't automatically delete soft-interface * @BATADV_IF_CLEANUP_AUTO: Delete soft-interface after last slave was removed @@ -63,6 +77,9 @@ void batadv_hardif_remove_interfaces(void); int batadv_hardif_min_mtu(struct net_device *soft_iface); void batadv_update_min_mtu(struct net_device *soft_iface); void batadv_hardif_release(struct kref *ref); +int batadv_hardif_no_broadcast(struct batadv_hard_iface *if_outgoing, + struct batadv_orig_node *orig_neigh, + u8 *orig_addr);
/** * batadv_hardif_put - decrement the hard interface refcounter and possibly diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 5f3bfc4..00c934d 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -236,6 +236,7 @@ static void batadv_hardif_neigh_release(struct kref *ref) spin_unlock_bh(&hardif_neigh->if_incoming->neigh_list_lock);
batadv_hardif_put(hardif_neigh->if_incoming); + batadv_orig_node_put(hardif_neigh->orig_node); kfree_rcu(hardif_neigh, rcu); }
@@ -517,7 +518,8 @@ batadv_neigh_node_get(const struct batadv_orig_node *orig_node, */ static struct batadv_hardif_neigh_node * batadv_hardif_neigh_create(struct batadv_hard_iface *hard_iface, - const u8 *neigh_addr) + const u8 *neigh_addr, + struct batadv_orig_node *orig_node) { struct batadv_priv *bat_priv = netdev_priv(hard_iface->soft_iface); struct batadv_hardif_neigh_node *hardif_neigh = NULL; @@ -539,6 +541,9 @@ batadv_hardif_neigh_create(struct batadv_hard_iface *hard_iface, hardif_neigh->if_incoming = hard_iface; hardif_neigh->last_seen = jiffies;
+ kref_get(&orig_node->refcount); + hardif_neigh->orig_node = orig_node; + kref_init(&hardif_neigh->refcount);
if (bat_priv->algo_ops->neigh.hardif_init) @@ -561,7 +566,8 @@ out: */ static struct batadv_hardif_neigh_node * batadv_hardif_neigh_get_or_create(struct batadv_hard_iface *hard_iface, - const u8 *neigh_addr) + const u8 *neigh_addr, + struct batadv_orig_node *orig_node) { struct batadv_hardif_neigh_node *hardif_neigh = NULL;
@@ -570,7 +576,7 @@ batadv_hardif_neigh_get_or_create(struct batadv_hard_iface *hard_iface, if (hardif_neigh) return hardif_neigh;
- return batadv_hardif_neigh_create(hard_iface, neigh_addr); + return batadv_hardif_neigh_create(hard_iface, neigh_addr, orig_node); }
/** @@ -630,7 +636,7 @@ batadv_neigh_node_create(struct batadv_orig_node *orig_node, goto out;
hardif_neigh = batadv_hardif_neigh_get_or_create(hard_iface, - neigh_addr); + neigh_addr, orig_node); if (!hardif_neigh) goto out;
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index 610f2c4..c1e5aa7 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -1142,6 +1142,7 @@ int batadv_recv_bcast_packet(struct sk_buff *skb, goto out;
batadv_skb_set_priority(skb, sizeof(struct batadv_bcast_packet)); + skb_set_inner_mac_header(skb, sizeof(struct batadv_bcast_packet));
/* rebroadcast packet */ batadv_add_bcast_packet_to_list(bat_priv, skb, 1); diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c index 97bdb0c..e997daa 100644 --- a/net/batman-adv/send.c +++ b/net/batman-adv/send.c @@ -603,11 +603,16 @@ err: static void batadv_send_outstanding_bcast_packet(struct work_struct *work) { struct batadv_hard_iface *hard_iface; + struct batadv_hardif_neigh_node *neigh_node; + struct batadv_orig_node *orig_neigh; struct delayed_work *delayed_work; struct batadv_forw_packet *forw_packet; + struct batadv_bcast_packet *bcast_packet; struct sk_buff *skb1; struct net_device *soft_iface; struct batadv_priv *bat_priv; + u8 *neigh_addr; + int ret = 0;
delayed_work = to_delayed_work(work); forw_packet = container_of(delayed_work, struct batadv_forw_packet, @@ -625,6 +630,8 @@ static void batadv_send_outstanding_bcast_packet(struct work_struct *work) if (batadv_dat_drop_broadcast_packet(bat_priv, forw_packet)) goto out;
+ bcast_packet = (struct batadv_bcast_packet *)forw_packet->skb->data; + /* rebroadcast packet */ rcu_read_lock(); list_for_each_entry_rcu(hard_iface, &batadv_hardif_list, list) { @@ -634,6 +641,51 @@ static void batadv_send_outstanding_bcast_packet(struct work_struct *work) if (forw_packet->num_packets >= hard_iface->num_bcasts) continue;
+ /* hint for own origin -> no neigh_node */ + if (skb_mac_header(forw_packet->skb) == + skb_inner_mac_header(forw_packet->skb)) { + neigh_node = NULL; + } else { + neigh_addr = eth_hdr(forw_packet->skb)->h_source; + neigh_node = batadv_hardif_neigh_get(hard_iface, + neigh_addr); + } + + orig_neigh = neigh_node ? neigh_node->orig_node : NULL; + + ret = batadv_hardif_no_broadcast(hard_iface, orig_neigh, + bcast_packet->orig); + + if (ret) { + char *type; + + switch (ret) { + case BATADV_HARDIF_BCAST_NORECIPIENT: + type = "no neighbor"; + break; + case BATADV_HARDIF_BCAST_DUPFWD: + type = "single neighbor is source"; + break; + case BATADV_HARDIF_BCAST_DUPORIG: + type = "single neighbor is originator"; + break; + default: + type = "unknown"; + } + + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "BCAST packet from orig %pM on %s surpressed: %s\n", + bcast_packet->orig, + hard_iface->net_dev->name, type); + + if (neigh_node) + batadv_hardif_neigh_put(neigh_node); + + continue; + } + + if (neigh_node) + batadv_hardif_neigh_put(neigh_node); + if (!kref_get_unless_zero(&hard_iface->refcount)) continue;
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 49e16b6..483b93d 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -315,6 +315,8 @@ send: if (batadv_dat_snoop_outgoing_arp_request(bat_priv, skb)) brd_delay = msecs_to_jiffies(ARP_REQ_DELAY);
+ skb_reset_inner_mac_header(skb); + if (batadv_skb_head_push(skb, sizeof(*bcast_packet)) < 0) goto dropped;
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h index b3dd1a3..6611ca5 100644 --- a/net/batman-adv/types.h +++ b/net/batman-adv/types.h @@ -409,6 +409,7 @@ struct batadv_hardif_neigh_node_bat_v { * @list: list node for batadv_hard_iface::neigh_list * @addr: the MAC address of the neighboring interface * @if_incoming: pointer to incoming hard-interface + * @orig_node: the originator this neighbor node belongs to * @last_seen: when last packet via this neighbor was received * @bat_v: B.A.T.M.A.N. V private data * @refcount: number of contexts the object is used @@ -418,6 +419,7 @@ struct batadv_hardif_neigh_node { struct hlist_node list; u8 addr[ETH_ALEN]; struct batadv_hard_iface *if_incoming; + struct batadv_orig_node *orig_node; unsigned long last_seen; #ifdef CONFIG_BATMAN_ADV_BATMAN_V struct batadv_hardif_neigh_node_bat_v bat_v;
On Montag, 1. August 2016 22:38:46 CEST Linus Lüssing wrote:
--- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -236,6 +236,7 @@ static void batadv_hardif_neigh_release(struct kref *ref) spin_unlock_bh(&hardif_neigh->if_incoming->neigh_list_lock);
batadv_hardif_put(hardif_neigh->if_incoming);
- batadv_orig_node_put(hardif_neigh->orig_node); kfree_rcu(hardif_neigh, rcu);
}
[...]
@@ -539,6 +541,9 @@ batadv_hardif_neigh_create(struct batadv_hard_iface *hard_iface, hardif_neigh->if_incoming = hard_iface; hardif_neigh->last_seen = jiffies;
kref_get(&orig_node->refcount);
hardif_neigh->orig_node = orig_node;
kref_init(&hardif_neigh->refcount);
if (bat_priv->algo_ops->neigh.hardif_init)
[...]
@@ -418,6 +419,7 @@ struct batadv_hardif_neigh_node { struct hlist_node list; u8 addr[ETH_ALEN]; struct batadv_hard_iface *if_incoming;
- struct batadv_orig_node *orig_node; unsigned long last_seen;
#ifdef CONFIG_BATMAN_ADV_BATMAN_V struct batadv_hardif_neigh_node_bat_v bat_v;
Isn't this causing the reference counting cycle (aka really, really, really bad):
batadv_orig_node::last_bonding_candidate -> batadv_orig_info batadv_orig_node::neigh_list -> batadv_neigh_node batadv_orig_ifino::router -> batadv_neigh_node batadv_neigh_node::hardif_neigh -> batadv_hardif_neigh_node batadv_hardif_neigh_node::orig_node -> batadv_orig_node
See the attached graphic for a visualization.
Kind regards, Sven
On Wed, Aug 03, 2016 at 09:59:27PM +0200, Sven Eckelmann wrote:
Isn't this causing the reference counting cycle (aka really, really, really bad):
Hm, I see what you are getting at. Could indeed be a nasty bug...
But, actually, just tested in VMs, seems like I'm getting a call to batadv_hardif_neigh_release() just fine. Which means it counted to zero successfully o_O?
I don't understand why it seems to work right now :D. Will dig deeper.
On Wed, Aug 03, 2016 at 11:25:53PM +0200, Linus Lüssing wrote:
On Wed, Aug 03, 2016 at 09:59:27PM +0200, Sven Eckelmann wrote:
Isn't this causing the reference counting cycle (aka really, really, really bad):
Hm, I see what you are getting at. Could indeed be a nasty bug...
But, actually, just tested in VMs, seems like I'm getting a call to batadv_hardif_neigh_release() just fine. Which means it counted to zero successfully o_O?
I don't understand why it seems to work right now :D. Will dig deeper.
Ok, I think it just works because hardif_neigh_create() intializes the nodes refcount to one (in contrast to other allocating functions which initialize to 2). In the end, once neigh_node_create() finishes, it stays at one.
So the regular purging routines will break the cycle in the end when they reduce the refcount of the hardif_neigh_node to zero.
On Donnerstag, 4. August 2016 01:43:29 CEST Linus Lüssing wrote:
On Wed, Aug 03, 2016 at 11:25:53PM +0200, Linus Lüssing wrote:
On Wed, Aug 03, 2016 at 09:59:27PM +0200, Sven Eckelmann wrote:
Isn't this causing the reference counting cycle (aka really, really, really bad):
Hm, I see what you are getting at. Could indeed be a nasty bug...
But, actually, just tested in VMs, seems like I'm getting a call to batadv_hardif_neigh_release() just fine. Which means it counted to zero successfully o_O?
I don't understand why it seems to work right now :D. Will dig deeper.
Ok, I think it just works because hardif_neigh_create() intializes the nodes refcount to one (in contrast to other allocating functions which initialize to 2). In the end, once neigh_node_create() finishes, it stays at one.
So the regular purging routines will break the cycle in the end when they reduce the refcount of the hardif_neigh_node to zero.
Ok, looks like the neigh_list edge in my graph is incorrectly there. But what about last_bonding_candidate? This definitely has a reference counter (and needs it) and thus your code would cause problems when bonding is enabled.
Kind regards, Sven
On Donnerstag, 4. August 2016 07:56:44 CEST Sven Eckelmann wrote:
On Donnerstag, 4. August 2016 01:43:29 CEST Linus Lüssing wrote:
On Wed, Aug 03, 2016 at 11:25:53PM +0200, Linus Lüssing wrote:
On Wed, Aug 03, 2016 at 09:59:27PM +0200, Sven Eckelmann wrote:
Isn't this causing the reference counting cycle (aka really, really, really bad):
Hm, I see what you are getting at. Could indeed be a nasty bug...
But, actually, just tested in VMs, seems like I'm getting a call to batadv_hardif_neigh_release() just fine. Which means it counted to zero successfully o_O?
I don't understand why it seems to work right now :D. Will dig deeper.
Ok, I think it just works because hardif_neigh_create() intializes the nodes refcount to one (in contrast to other allocating functions which initialize to 2). In the end, once neigh_node_create() finishes, it stays at one.
So the regular purging routines will break the cycle in the end when they reduce the refcount of the hardif_neigh_node to zero.
Ok, looks like the neigh_list edge in my graph is incorrectly there.
[...]
Just checked the code and my example graph seems to be correct in regards of this edge. There is an reference for the list in the code:
static struct batadv_neigh_node * batadv_neigh_node_create(struct batadv_orig_node *orig_node, struct batadv_hard_iface *hard_iface, const u8 *neigh_addr) { [...] /* extra reference for return */ kref_init(&neigh_node->refcount);
kref_get(&neigh_node->refcount); hlist_add_head_rcu(&neigh_node->list, &orig_node->neigh_list);
[...] return neigh_node; }
The function which doesn't get a reference for its list is batadv_hardif_neigh_create. It doesn't store a reference in batadv_hardif_neigh_create for the list because its lifetime doesn't depend on the list & just uses it as management helper and deletes the list entry in its _release function). But this list cannot be found in the example graph.
Kind regards, Sven
On Thu, Aug 04, 2016 at 07:56:44AM +0200, Sven Eckelmann wrote:
Ok, looks like the neigh_list edge in my graph is incorrectly there. But what about last_bonding_candidate? This definitely has a reference counter (and needs it) and thus your code would cause problems when bonding is enabled.
One more tiny thing, that I noticed during rereading & playing with that part of the code:
The usage of orig_node->last_bonding_candidate looks a lot like rcu-pointer style. However it is read & set without any rcu_dereference() / rcu_assign() wrappers.
Is something else protecting a simultaneous read (for instance from the neighbor table output) of last_bonding_candidate while being changed via batadv_last_bonding_replace()?
On Samstag, 6. August 2016 03:11:12 CEST Linus Lüssing wrote:
On Thu, Aug 04, 2016 at 07:56:44AM +0200, Sven Eckelmann wrote:
Ok, looks like the neigh_list edge in my graph is incorrectly there. But
what
about last_bonding_candidate? This definitely has a reference counter (and needs it) and thus your code would cause problems when bonding is enabled.
One more tiny thing, that I noticed during rereading & playing with that part of the code:
The usage of orig_node->last_bonding_candidate looks a lot like rcu-pointer style. However it is read & set without any rcu_dereference() / rcu_assign() wrappers.
It is set+accessed with a spinlock.
With rcu on the read side (and spinlock on the write side) we should guarantee that the rcu protected pointer is mostly read [1]. This is not the case here and thus it is using spinlocks at the moment.
Is something else protecting a simultaneous read (for instance from the neighbor table output) of last_bonding_candidate while being changed via batadv_last_bonding_replace()?
See above
What has the neighbor table output to do with last_bonding_candidate? It doesn't seem to access this pointer. Maybe I am missing something here. Can you please give an example how these are connected.
RCU also doesn't protect the changes to the object itself. It only helps us not to access invalid (free'd) memory regions when another context just replaced our pointer. This is done by splitting the previously (more or less) single operation of removing the object into two phases. One is to remove the object from the datastructure (list, pointer, ..) and then only reclaiming the memory when no one (correctly using rcu) can see the old object anymore. It is possible in the time between the removal from the datastructure that one context sees the new object when accessing the datastructure and another doesn't. Similar things can happen when new objects are added to the datastructure.
Just using the spinlock on both sides gives us the same protection (without the conflicting views of the datastructure from different contexts) - but it would be slower when we have multiple readers (and nearly no writer) trying to access the pointer all the time. This doesn't seem to be the case here because the main function batadv_find_router is at the same time reader+writer.
And just to remind the reader: The content of the object referenced in the datastructure still has to be protected with a spinlock (or something similar) in case multiple contexts may want to change the content.
And to be fair: There is one case were a spinlock is missing in batadv_find_router
last_candidate = orig_node->last_bonding_candidate; if (last_candidate) last_cand_router = rcu_dereference(last_candidate->router);
I had this on my list but mostly forgot about it while chasing the reference counting bugs. Maybe you found more problems but I am not sure which ones :)
Kind regards, Sven
[1] https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt search for "read-mostly" situations
On Sat, Aug 06, 2016 at 10:13:38AM +0200, Sven Eckelmann wrote:
And to be fair: There is one case were a spinlock is missing in batadv_find_router
last_candidate = orig_node->last_bonding_candidate; if (last_candidate) last_cand_router = rcu_dereference(last_candidate->router);
I had this on my list but mostly forgot about it while chasing the reference counting bugs. Maybe you found more problems but I am not sure which ones :)
Right, that was the part that startled me in the first place :-). (bc. of the rcu_read_lock() one line earlier, I falsely assumed that the author wanted to have it rcu-locked for the reader-side - but you are right, spinlocking for both reader and writer side is another option :) )
On Donnerstag, 4. August 2016 01:43:29 CEST Linus Lüssing wrote: [...]
So the regular purging routines will break the cycle in the end when they reduce the refcount of the hardif_neigh_node to zero.
Ok, lets go through it with following idea:
1. the cycle exists (including the missing batadv_orig_node::ifinfo_list in my earlier example graph) 2. batadv_purge_orig is called via the worker (for some reason with a large delay) 3. _batadv_purge_orig goes through all entries in the orig hashtable 4. batadv_purge_orig_node is called 5. batadv_has_timed_out returns "hey, you are old - go away" 6. batadv_purge_orig_node now only does for this orig_entry: - batadv_gw_node_delete(bat_priv, orig_node); - hlist_del_rcu(&orig_node->hash_entry); - batadv_tt_global_del_orig(orig_node->bat_priv, orig_node, -1, "originator timed out"); - batadv_orig_node_put(orig_node);
So the most cleanup work (batadv_orig_node_release) is only done when the orig_node refcnt reaches zero.
The batadv_purge_orig_ifinfo, batadv_purge_orig_neighbors and the rest from batadv_purge_orig_node is only done when the originator has not yet reached its timeout. I would therefore guess it working because you are lucky and your batadv_purge_orig_neighbors and batadv_purge_orig_ifinfo sometimes(tm) solve this problem for you.
Kind regards, Sven
On Thu, Aug 04, 2016 at 09:29:20AM +0200, Sven Eckelmann wrote:
On Donnerstag, 4. August 2016 01:43:29 CEST Linus Lüssing wrote: [...]
So the regular purging routines will break the cycle in the end when they reduce the refcount of the hardif_neigh_node to zero.
Ok, lets go through it with following idea:
- the cycle exists (including the missing batadv_orig_node::ifinfo_list in my earlier example graph)
- batadv_purge_orig is called via the worker (for some reason with a large delay)
- _batadv_purge_orig goes through all entries in the orig hashtable
- batadv_purge_orig_node is called
- batadv_has_timed_out returns "hey, you are old - go away"
- batadv_purge_orig_node now only does for this orig_entry:
- batadv_gw_node_delete(bat_priv, orig_node);
- hlist_del_rcu(&orig_node->hash_entry);
- batadv_tt_global_del_orig(orig_node->bat_priv, orig_node, -1, "originator timed out");
- batadv_orig_node_put(orig_node);
So the most cleanup work (batadv_orig_node_release) is only done when the orig_node refcnt reaches zero.
The batadv_purge_orig_ifinfo, batadv_purge_orig_neighbors and the rest from batadv_purge_orig_node is only done when the originator has not yet reached its timeout. I would therefore guess it working because you are lucky and your batadv_purge_orig_neighbors and batadv_purge_orig_ifinfo sometimes(tm) solve this problem for you.
Did some more tests today. Confirmed.
batadv_purge_orig_node has 2*PURGE_TIMEOUT, while purge_orig_neighbours only has 1*PURGE_TIMEOUT. All hardif_neigh_release() and orig_node_release() calls happened as expected, even with the patch of this thread.
purge_orig_neighbors seems to reset the orig_node->last_bonding_candidate to NULL and break the cycle.
If I switch the two purging intervals, then I have the memory leak with this patch. Without this patch, no memory leak.
I'll send a v3 with a simple six bytes orig address copy instead of a full orig_node reference.
Thanks for this awesome finding of a rare but nasty bug! Even though it'll probably mostly not happen due to the big timeout differences, it would nevertheless have been very, very difficult to spot & reproduce later on...
b.a.t.m.a.n@lists.open-mesh.org