The receiving interface might have used GRO to receive more fragments than MAX_SKB_FRAGS fragments. In this case, these will not be stored in skb_shinfo(skb)->frags but merged into the frag list.
batman-adv relies on the function skb_split to split packets up into multiple smaller packets which are not larger than the MTU on the outgoing interface. But this function cannot handle frag_list entries and is only operating on skb_shinfo(skb)->frags. If it is then still trying to split such an skb and xmit'ing it on an interface without support for NETIF_F_FRAGLIST then validate_xmit_skb() will try to linearize it. But this fails due to inconsistent information and __pskb_pull_tail will trigger a BUG_ON after skb_copy_bits() returns an error.
In case of entries in frag_list, just linearize the skb before operating on it with skb_split().
Reported-by: Felix Kaechele felix@kaechele.ca Fixes: 9de347143505 ("batman-adv: layer2 unicast packet fragmentation") Signed-off-by: Sven Eckelmann sven@narfation.org --- net/batman-adv/fragmentation.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c index 0899a729..c120c7c6 100644 --- a/net/batman-adv/fragmentation.c +++ b/net/batman-adv/fragmentation.c @@ -475,6 +475,17 @@ int batadv_frag_send_packet(struct sk_buff *skb, goto free_skb; }
+ /* GRO might have added fragments to the fragment list instead of + * frags[]. But this is not handled by skb_split and must be + * linearized to avoid incorrect length information after all + * batman-adv fragments were created and submitted to the + * hard-interface + */ + if (skb_has_frag_list(skb) && __skb_linearize(skb)) { + ret = -ENOMEM; + goto free_skb; + } + /* Create one header to be copied to all fragments */ frag_header.packet_type = BATADV_UNICAST_FRAG; frag_header.version = BATADV_COMPAT_VERSION;
Hi there,
initial testing shows that this patch seems to fix the issue. We are currently at 30 minutes of uptime on our fairly busy mesh which is already 15-30 times better than before.
Thanks for the super quick turnaround on this one, especially on easter weekend.
I will report back after some more uptime, but I have a feeling that if it is working right now it will probably continue to function just fine.
Thanks again!
Regards, Felix
On Sat, Apr 16, 2022 at 02:24:34PM +0200, Sven Eckelmann wrote:
The receiving interface might have used GRO to receive more fragments than MAX_SKB_FRAGS fragments. In this case, these will not be stored in skb_shinfo(skb)->frags but merged into the frag list.
batman-adv relies on the function skb_split to split packets up into multiple smaller packets which are not larger than the MTU on the outgoing interface. But this function cannot handle frag_list entries and is only operating on skb_shinfo(skb)->frags. If it is then still trying to split such an skb and xmit'ing it on an interface without support for NETIF_F_FRAGLIST then validate_xmit_skb() will try to linearize it. But this fails due to inconsistent information and __pskb_pull_tail will trigger a BUG_ON after skb_copy_bits() returns an error.
In case of entries in frag_list, just linearize the skb before operating on it with skb_split().
Hi Sven
This is not an area of the kernel i'm very familiar with. But i'm wondering, is this a BATMAN specific problem, or a generic problem? Should the fix be in BATMAN, or the core?
Andrew
On Saturday, 16 April 2022 16:21:19 CEST Andrew Lunn wrote:
This is not an area of the kernel i'm very familiar with. But i'm wondering, is this a BATMAN specific problem, or a generic problem? Should the fix be in BATMAN, or the core?
I understand what you mean. But let me cite two places which required to operate on parts of the frag lists:
/* If we need update frag list, we are in troubles. * Certainly, it is possible to add an offset to skb data, * but taking into account that pulling is expected to * be very rare operation, it is worth to fight against * further bloating skb head and crucify ourselves here instead. * Pure masohism, indeed. 8)8) */
/* Misery. We are in troubles, going to mincer fragments... */
And since I cannot reproduce this here at the moment, I've decided not to start with that.
Kind regards, Sven
On Sat, Apr 16, 2022 at 07:17:28PM +0200, Sven Eckelmann wrote:
On Saturday, 16 April 2022 16:21:19 CEST Andrew Lunn wrote:
This is not an area of the kernel i'm very familiar with. But i'm wondering, is this a BATMAN specific problem, or a generic problem? Should the fix be in BATMAN, or the core?
I understand what you mean. But let me cite two places which required to operate on parts of the frag lists:
/* If we need update frag list, we are in troubles. * Certainly, it is possible to add an offset to skb data, * but taking into account that pulling is expected to * be very rare operation, it is worth to fight against * further bloating skb head and crucify ourselves here instead. * Pure masohism, indeed. 8)8) */
/* Misery. We are in troubles, going to mincer fragments... */
And since I cannot reproduce this here at the moment, I've decided not to start with that.
O.K. The BUG_ON() should at least catch other drivers hitting the same issue, and hopefully a search engine will point to this discussion.
However, i suggest you post the fix to netdev, and see what others think.
Andrew
b.a.t.m.a.n@lists.open-mesh.org