skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the skb checksum, otherwise log spam of the form "bat0: hw csum failure" will result when packets with CHECKSUM_COMPLETE are received (at least in some setups, e.g. when stacking batman-adv on top of VXLAN).
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
---
I don't know what the exact circumstances are that trigger the log spam, but it seems this was broken forever (I could also reproduce the issue with our compat-14 legacy branch)... so please ask David to queue this up for stable :)
 net/batman-adv/soft-interface.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index c95e2b26..edeffcb9 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -459,13 +459,7 @@ void batadv_interface_rx(struct net_device *soft_iface,
 
 	/* skb->dev & skb->pkt_type are set here */
 	skb->protocol = eth_type_trans(skb, soft_iface);
-
-	/* should not be necessary anymore as we use skb_pull_rcsum()
-	 * TODO: please verify this and remove this TODO
-	 * -- Dec 21st 2009, Simon Wunderlich
-	 */
-
-	/* skb->ip_summed = CHECKSUM_UNNECESSARY; */
+	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
 
 	batadv_inc_counter(bat_priv, BATADV_CNT_RX);
 	batadv_add_counter(bat_priv, BATADV_CNT_RX_BYTES,
On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
> skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the skb checksum, otherwise log spam of the form "bat0: hw csum failure" will result when packets with CHECKSUM_COMPLETE are received (at least in some setups, e.g. when stacking batman-adv on top of VXLAN).
Would be nice to have a better explanation here.
The comment previously assumed that skb_pull_rcsum would be enough. But the problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The actual pull of the ethernet header (with skb_pull_inline) happens inside eth_type_trans. Or did I miss anything?
[...]
> I don't know what the exact circumstances are that trigger the log spam, but it seems this was broken forever (I could also reproduce the issue with our compat-14 legacy branch)... so please ask David to queue this up for stable :)
Yes, this is broken since earliest commits. The most relevant commit in batman-adv is:
Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
But I would propose to use following in the kernel tree:
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
The 4.15 release will be soon(tm) and Simon is currently on vacation. So we will most likely postpone the submission to David until Simon found a way out of the snow and after 4.15 is released...
But it would be nice when some people could test the patch [1] (together with vxlan?) on batman-adv or batman-adv-legacy. And please provide a "Tested-by: Full Name <email@example.org>" [2] reply when it works.
Thanks,
Sven
[1] https://patchwork.open-mesh.org/patch/17250/
[2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-...
On 01/22/2018 09:52 PM, Sven Eckelmann wrote:
> On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
> > skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the skb checksum, otherwise log spam of the form "bat0: hw csum failure" will result when packets with CHECKSUM_COMPLETE are received (at least in some setups, e.g. when stacking batman-adv on top of VXLAN).
> Would be nice to have a better explanation here.
> The comment previously assumed that skb_pull_rcsum would be enough. But the problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The actual pull of the ethernet header (with skb_pull_inline) happens inside eth_type_trans. Or did I miss anything?
This is correct, eth_type_trans() contains a simple skb_pull(), so the csum must be adjusted afterwards (grepping the kernel for eth_type_trans will find a lot of this). I can send a v2 with a better commit message later.
> [...]
> > I don't know what the exact circumstances are that trigger the log spam, but it seems this was broken forever (I could also reproduce the issue with our compat-14 legacy branch)... so please ask David to queue this up for stable :)
> Yes, this is broken since earliest commits. The most relevant commit in batman-adv is:
> Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
> But I would propose to use following in the kernel tree:
> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> The 4.15 release will be soon(tm) and Simon is currently on vacation. So we will most likely postpone the submission to David until Simon found a way out of the snow and after 4.15 is released...
> But it would be nice when some people could test the patch [1] (together with vxlan?) on batman-adv or batman-adv-legacy. And please provide a "Tested-by: Full Name <email@example.org>" [2] reply when it works.
> Thanks,
> Sven
I've tested this on Kernel 4.14.14 (everything working correctly now) and 4.4.110 (here, there are still checksum errors; it seems on older kernels, the checksum handling in VXLAN is broken too? Still debugging this...)
Matthias
> [1] https://patchwork.open-mesh.org/patch/17250/
> [2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-...
On 01/22/2018 10:18 PM, Matthias Schiffer wrote:
> On 01/22/2018 09:52 PM, Sven Eckelmann wrote:
> > On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
> > > skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the skb checksum, otherwise log spam of the form "bat0: hw csum failure" will result when packets with CHECKSUM_COMPLETE are received (at least in some setups, e.g. when stacking batman-adv on top of VXLAN).
> > Would be nice to have a better explanation here.
> > The comment previously assumed that skb_pull_rcsum would be enough. But the problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The actual pull of the ethernet header (with skb_pull_inline) happens inside eth_type_trans. Or did I miss anything?
> This is correct, eth_type_trans() contains a simple skb_pull(), so the csum must be adjusted afterwards (grepping the kernel for eth_type_trans will find a lot of this). I can send a v2 with a better commit message later.
> > [...]
> > > I don't know what the exact circumstances are that trigger the log spam, but it seems this was broken forever (I could also reproduce the issue with our compat-14 legacy branch)... so please ask David to queue this up for stable :)
> > Yes, this is broken since earliest commits. The most relevant commit in batman-adv is:
> > Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
> > But I would propose to use following in the kernel tree:
> > Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> > The 4.15 release will be soon(tm) and Simon is currently on vacation. So we will most likely postpone the submission to David until Simon found a way out of the snow and after 4.15 is released...
> > But it would be nice when some people could test the patch [1] (together with vxlan?) on batman-adv or batman-adv-legacy. And please provide a "Tested-by: Full Name <email@example.org>" [2] reply when it works.
> > Thanks,
> > Sven
> I've tested this on Kernel 4.14.14 (everything working correctly now) and 4.4.110 (here, there are still checksum errors; it seems on older kernels, the checksum handling in VXLAN is broken too? Still debugging this...)
I've found the issue of this other checksum problem: batman-adv fragmentation code doesn't handle the checksum on reassembly at all. I think the best option here is to simply set ip_summed to CHECKSUM_NONE on reassembly, I will send another patch for that.
The IP fragmentation code does more fancy things when all fragments have CHECKSUM_COMPLETE, adding up the checksums of the fragments under certain circumstances. This only works because IP fragments are guaranteed to be split at even byte offsets (multiples of 8, actually); as far as I can tell, batman-adv allows odd fragment sizes, making it impossible to add up the 16bit checksums in the general case.
Matthias
> > [1] https://patchwork.open-mesh.org/patch/17250/
> > [2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-...
Anno domini 2018 Matthias Schiffer scripsit:
Hi,
> On 01/22/2018 10:18 PM, Matthias Schiffer wrote:
> > On 01/22/2018 09:52 PM, Sven Eckelmann wrote:
> > > On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
> > > > skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the skb checksum, otherwise log spam of the form "bat0: hw csum failure" will result when packets with CHECKSUM_COMPLETE are received (at least in some setups, e.g. when stacking batman-adv on top of VXLAN).
> > > Would be nice to have a better explanation here.
> > > The comment previously assumed that skb_pull_rcsum would be enough. But the problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The actual pull of the ethernet header (with skb_pull_inline) happens inside eth_type_trans. Or did I miss anything?
> > This is correct, eth_type_trans() contains a simple skb_pull(), so the csum must be adjusted afterwards (grepping the kernel for eth_type_trans will find a lot of this). I can send a v2 with a better commit message later.
> > > [...]
> > > > I don't know what the exact circumstances are that trigger the log spam, but it seems this was broken forever (I could also reproduce the issue with our compat-14 legacy branch)... so please ask David to queue this up for stable :)
> > > Yes, this is broken since earliest commits. The most relevant commit in batman-adv is:
> > > Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
> > > But I would propose to use following in the kernel tree:
> > > Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> > > The 4.15 release will be soon(tm) and Simon is currently on vacation. So we will most likely postpone the submission to David until Simon found a way out of the snow and after 4.15 is released...
> > > But it would be nice when some people could test the patch [1] (together with vxlan?) on batman-adv or batman-adv-legacy. And please provide a "Tested-by: Full Name <email@example.org>" [2] reply when it works.
> > > Thanks,
> > > Sven
> > I've tested this on Kernel 4.14.14 (everything working correctly now) and 4.4.110 (here, there are still checksum errors; it seems on older kernels, the checksum handling in VXLAN is broken too? Still debugging this...)
> I've found the issue of this other checksum problem: batman-adv fragmentation code doesn't handle the checksum on reassembly at all. I think the best option here is to simply set ip_summed to CHECKSUM_NONE on reassembly, I will send another patch for that.
> The IP fragmentation code does more fancy things when all fragments have CHECKSUM_COMPLETE, adding up the checksums of the fragments under certain circumstances. This only works because IP fragments are guaranteed to be split at even byte offsets (multiples of 8, actually); as far as I can tell, batman-adv allows odd fragment sizes, making it impossible to add up the 16bit checksums in the general case.
And
Tested-by: Maximilian Wilhelm <max@sdn.clinic>
to the fix for fragmentation.c, too.
Disclaimer: As MTUs are calculated accordingly in our backbone, fragmentation of VXLAN packets isn't an issue and we did not see these messages before. I can confirm that I still don't see any now, meaning the log spam from the previous fix is still fixed and no new issues have arisen as of now.
Thanks a lot! <3
Best
Max
Anno domini 2018 Sven Eckelmann scripsit:
> On Monday, 22 January 2018 20:24:50 CET Matthias Schiffer wrote:
> [...]
> > I don't know what the exact circumstances are that trigger the log spam, but it seems this was broken forever (I could also reproduce the issue with our compat-14 legacy branch)... so please ask David to queue this up for stable :)
> Yes, this is broken since earliest commits. The most relevant commit in batman-adv is:
> Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs")
> But I would propose to use following in the kernel tree:
> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> The 4.15 release will be soon(tm) and Simon is currently on vacation. So we will most likely postpone the submission to David until Simon found a way out of the snow and after 4.15 is released...
> But it would be nice when some people could test the patch [1] (together with vxlan?) on batman-adv or batman-adv-legacy. And please provide a "Tested-by: Full Name <email@example.org>" [2] reply when it works.
I took a Debian Kernel package (4.14.13-1~bpo9+1), applied the patch and deployed the package on a gateway running BATMAN over VTEPs. The log messages from previous kernels don't show up anymore <3.
Tested-by: Maximilian Wilhelm <max@sdn.clinic>
Thanks!
Best
Max
b.a.t.m.a.n@lists.open-mesh.org