The last round of multicast patches send to the batman-adv mailinglist to add support for the multicast optimizations in bridged scenarios, too, unfortunately had one major conceptual flaw: It could lead to packet loss. It's not sufficient to have the unicasting of reports implemented on bridge-nodes only. Nodes without bridges need to either unicast them, too or as done with this patch flood them uncoditionally.
The issue is described in detail here:
https://www.open-mesh.org/projects/batman-adv/wiki/Multicast-optimizations-l...
This patch substitutes and greatly simplifies the following patchset: "batman-adv: Unicasting multicast reports to querier-node only"
The reason for doing this is, that I think the disadvantages of this simpler approach noted in the wikipage above (e.g. report overhead in large mesh networks) can be better compensated by segmenting the IGMP/MLD domain via ebtables (drop all queries and reports coming out of bat0 or before entering bat0) and the "multicast_router" bridge port flag on bat0 (flood all multicast to this bat0 port, let batman-adv deal with the optimizations from here on). Segmenting the IGMP/MLD domain also has the nice advantage of being able to run a querier on every mesh node which therefore is a way more decentral approach.
Cheers, Linus
-----
Changelog
v1: * Removed query snooping and state * Squashed all three patches into one * Renamed "batadv_mcast_tvlv_ogm_handler_v1()" to *_v2() * Added explicit icmpv6.h include * Rebased on top of master
Old changelog of "batman-adv: Unicasting multicast reports to querier-node only" v6: * compat: copied copyright headers from original upstream c files * compat: unified ordering in compat c files: -> copyright header, then includes, then kernel specific functions v5: * Removed RFC tag: Needed exports got merged to net-next and are going to be available with Linux 4.2 * Redid compat solution - now fully backwards compatible down to 2.6.33 v4: * excluded bridge part from this patchset, they should hopefully be added to net-next soon * Added a compat solution (PATCH 3/3) * Removed Kconfig-depends as by David's suggestion the needed parsing functions for MLD are going to be forced built-ins even if IPv6 is going to be built as a module * Removed unused variable 'int ret' in batadv_mcast_is_report_ipv6() * Adjusted to new folder structure v3: * Adding Kconfig-depends and #if's (so basically adding similar dependancy constraints as the bridge code has, except that there are no depends if batman-adv gets compiled without multicast optimizations) -> the case of IPv6=M and batman-adv=y is still impossible if multicast optimizations are enabled; but I don't see the practical demand for that either - people who use IPv6 as a module will probably also want to use batman-adv as a module v2: * various bugfixes (now runtime tested, too - should(tm) work) * added netdev+bridge mailinglists
With this patch IGMP or MLD reports are always flooded. This is necessary for the upcoming bridge integration:
An IGMPv2/MLDv1 querier does not actively join the multicast group the reports are sent to. Because of this, this would lead to snooping bridges/switches not being able to learn of multicast listeners in the mesh and wrongly shutting down ports for multicast traffic to the mesh, leading to packet loss.
This patch increments the multicast tvlv version number to 2.
Signed-off-by: Linus Lüssing linus.luessing@c0d3.blue --- Makefile | 4 + compat-include/linux/igmp.h | 13 ++ compat-include/linux/skbuff.h | 14 ++ compat-include/net/addrconf.h | 13 ++ compat-include/net/ip6_checksum.h | 18 +++ compat-include/net/ipv6.h | 17 +++ compat-include/net/mld.h | 52 +++++++ compat-sources/Makefile | 3 + compat-sources/net/core/skbuff.c | 179 ++++++++++++++++++++++++ compat-sources/net/ipv4/igmp.c | 241 +++++++++++++++++++++++++++++++++ compat-sources/net/ipv6/mcast_snoop.c | 216 +++++++++++++++++++++++++++++ net/batman-adv/multicast.c | 76 +++++++++-- net/batman-adv/soft-interface.c | 2 + 13 files changed, 840 insertions(+), 8 deletions(-) create mode 100644 compat-include/linux/igmp.h create mode 100644 compat-include/net/addrconf.h create mode 100644 compat-include/net/ip6_checksum.h create mode 100644 compat-include/net/ipv6.h create mode 100644 compat-include/net/mld.h create mode 100644 compat-sources/Makefile create mode 100644 compat-sources/net/core/skbuff.c create mode 100644 compat-sources/net/ipv4/igmp.c create mode 100644 compat-sources/net/ipv6/mcast_snoop.c
diff --git a/Makefile b/Makefile index ee3be1d..69056d7 100644 --- a/Makefile +++ b/Makefile @@ -50,6 +50,10 @@ ifneq ($(REVISION),) NOSTDINC_FLAGS += -DBATADV_SOURCE_VERSION="$(REVISION)" endif
+include $(PWD)/compat-sources/Makefile + +export batman-adv-y + BUILD_FLAGS := \ M=$(PWD)/net/batman-adv \ CONFIG_BATMAN_ADV=m \ diff --git a/compat-include/linux/igmp.h b/compat-include/linux/igmp.h new file mode 100644 index 0000000..f61ab79 --- /dev/null +++ b/compat-include/linux/igmp.h @@ -0,0 +1,13 @@ +#ifndef _NET_BATMAN_ADV_COMPAT_LINUX_IGMP_H_ +#define _NET_BATMAN_ADV_COMPAT_LINUX_IGMP_H_ + +#include <linux/version.h> +#include_next <linux/igmp.h> + +#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0) + +int ip_mc_check_igmp(struct sk_buff *skb, struct sk_buff **skb_trimmed); + +#endif /* < KERNEL_VERSION(4, 2, 0) */ + +#endif /* _NET_BATMAN_ADV_COMPAT_LINUX_IGMP_H_ */ diff --git a/compat-include/linux/skbuff.h b/compat-include/linux/skbuff.h index d363cc0..1d4f569 100644 --- a/compat-include/linux/skbuff.h +++ b/compat-include/linux/skbuff.h @@ -89,6 +89,20 @@ static inline void skb_reset_mac_len(struct sk_buff *skb)
#define pskb_copy_for_clone pskb_copy
+__sum16 skb_checksum_simple_validate(struct sk_buff *skb); + +__sum16 +skb_checksum_validate(struct sk_buff *skb, int proto, + __wsum (*compute_pseudo)(struct sk_buff *skb, int proto)); + #endif /* < KERNEL_VERSION(3, 16, 0) */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0) + +struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, + unsigned int transport_len, + __sum16(*skb_chkf)(struct sk_buff *skb)); + +#endif /* < KERNEL_VERSION(4, 2, 0) */ + #endif /* _NET_BATMAN_ADV_COMPAT_LINUX_SKBUFF_H_ */ diff --git a/compat-include/net/addrconf.h b/compat-include/net/addrconf.h new file mode 100644 index 0000000..69c45d0 --- /dev/null +++ b/compat-include/net/addrconf.h @@ -0,0 +1,13 @@ +#ifndef _NET_BATMAN_ADV_COMPAT_NET_ADDRCONF_H_ +#define _NET_BATMAN_ADV_COMPAT_NET_ADDRCONF_H_ + +#include <linux/version.h> +#include_next <net/addrconf.h> + +#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0) + +int ipv6_mc_check_mld(struct sk_buff *skb, struct sk_buff **skb_trimmed); + +#endif /* < KERNEL_VERSION(4, 2, 0) */ + +#endif /* _NET_BATMAN_ADV_COMPAT_NET_ADDRCONF_H_ */ diff --git a/compat-include/net/ip6_checksum.h b/compat-include/net/ip6_checksum.h new file mode 100644 index 0000000..fda0c07 --- /dev/null +++ b/compat-include/net/ip6_checksum.h @@ -0,0 +1,18 @@ +#ifndef _NET_BATMAN_ADV_COMPAT_NET_IP6_CHECKSUM_H_ +#define _NET_BATMAN_ADV_COMPAT_NET_IP6_CHECKSUM_H_ + +#include <linux/version.h> +#include_next <net/ip6_checksum.h> + +#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 16, 0) + +static inline __wsum ip6_compute_pseudo(struct sk_buff *skb, int proto) +{ + return ~csum_unfold(csum_ipv6_magic(&ipv6_hdr(skb)->saddr, + &ipv6_hdr(skb)->daddr, + skb->len, proto, 0)); +} + +#endif /* < KERNEL_VERSION(3, 16, 0) */ + +#endif /* _NET_BATMAN_ADV_COMPAT_NET_IP6_CHECKSUM_H_ */ diff --git a/compat-include/net/ipv6.h b/compat-include/net/ipv6.h new file mode 100644 index 0000000..1e190d8 --- /dev/null +++ b/compat-include/net/ipv6.h @@ -0,0 +1,17 @@ +#ifndef _NET_BATMAN_ADV_COMPAT_NET_IPV6_H_ +#define _NET_BATMAN_ADV_COMPAT_NET_IPV6_H_ + +#include <linux/version.h> +#include_next <net/ipv6.h> + +#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 3, 0) + +#define ipv6_skip_exthdr(skb, start, nexthdrp, frag_offp) \ + ({ \ + (void)frag_offp; \ + ipv6_skip_exthdr(skb, start, nexthdrp); \ + }) + +#endif /* < KERNEL_VERSION(3, 3, 0) */ + +#endif /* _NET_BATMAN_ADV_COMPAT_NET_IPV6_H_ */ diff --git a/compat-include/net/mld.h b/compat-include/net/mld.h new file mode 100644 index 0000000..e041eb6 --- /dev/null +++ b/compat-include/net/mld.h @@ -0,0 +1,52 @@ +#ifndef _NET_BATMAN_ADV_COMPAT_NET_MLD_H_ +#define _NET_BATMAN_ADV_COMPAT_NET_MLD_H_ + +#include <linux/version.h> + +#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 35) +#include_next <net/mld.h> +#endif /* >= KERNEL_VERSION(2, 6, 35) */ + +#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 35) +struct mld_msg { + struct icmp6hdr mld_hdr; + struct in6_addr mld_mca; +}; + +#define mld_type mld_hdr.icmp6_type + +struct mld2_grec { + __u8 grec_type; + __u8 grec_auxwords; + __be16 grec_nsrcs; + struct in6_addr grec_mca; + struct in6_addr grec_src[0]; +}; + +struct mld2_report { + struct icmp6hdr mld2r_hdr; + struct mld2_grec mld2r_grec[0]; +}; + +struct mld2_query { + struct icmp6hdr mld2q_hdr; + struct in6_addr mld2q_mca; +#if defined(__LITTLE_ENDIAN_BITFIELD) + __u8 mld2q_qrv:3, + mld2q_suppress:1, + mld2q_resv2:4; +#elif defined(__BIG_ENDIAN_BITFIELD) + __u8 mld2q_resv2:4, + mld2q_suppress:1, + mld2q_qrv:3; +#else +#error "Please fix <asm/byteorder.h>" +#endif + __u8 mld2q_qqic; + __be16 mld2q_nsrcs; + struct in6_addr mld2q_srcs[0]; +}; + +#endif /* < KERNEL_VERSION(2, 6, 35) */ + +#endif /* _NET_BATMAN_ADV_COMPAT_NET_MLD_H_ */ diff --git a/compat-sources/Makefile b/compat-sources/Makefile new file mode 100644 index 0000000..c364ded --- /dev/null +++ b/compat-sources/Makefile @@ -0,0 +1,3 @@ +batman-adv-y += ../../compat-sources/net/core/skbuff.o +batman-adv-y += ../../compat-sources/net/ipv4/igmp.o +batman-adv-y += ../../compat-sources/net/ipv6/mcast_snoop.o diff --git a/compat-sources/net/core/skbuff.c b/compat-sources/net/core/skbuff.c new file mode 100644 index 0000000..7ad23fe --- /dev/null +++ b/compat-sources/net/core/skbuff.c @@ -0,0 +1,179 @@ +/* + * Routines having to do with the 'struct sk_buff' memory handlers. + * + * Authors: Alan Cox alan@lxorguk.ukuu.org.uk + * Florian La Roche rzsfl@rz.uni-sb.de + * + * Fixes: + * Alan Cox : Fixed the worst of the load + * balancer bugs. + * Dave Platt : Interrupt stacking fix. + * Richard Kooijman : Timestamp fixes. + * Alan Cox : Changed buffer format. + * Alan Cox : destructor hook for AF_UNIX etc. + * Linus Torvalds : Better skb_clone. + * Alan Cox : Added skb_copy. + * Alan Cox : Added all the changed routines Linus + * only put in the headers + * Ray VanTassle : Fixed --skb->lock in free + * Alan Cox : skb_copy copy arp field + * Andi Kleen : slabified it. + * Robert Olsson : Removed skb_head_pool + * + * NOTE: + * The __skb_ routines should be called with interrupts + * disabled, or you better be *real* sure that the operation is atomic + * with respect to whatever list is being frobbed (e.g. via lock_sock() + * or via disabling bottom half handlers, etc). + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include <linux/ipv6.h> +#include <linux/skbuff.h> + +#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 16, 0) + +/* Compare with: + * "bridge: multicast: call skb_checksum_{simple_, }validate" + */ +__sum16 skb_checksum_simple_validate(struct sk_buff *skb) +{ + switch (skb->ip_summed) { + case CHECKSUM_COMPLETE: + if (!csum_fold(skb->csum)) + break; + /* fall through */ + case CHECKSUM_NONE: + skb->csum = 0; + if (skb_checksum_complete(skb)) + return -EINVAL; + } + + return 0; +} + +/* Watch out: Not as generic as upstream + * - redefines this method to only fit with ICMPV6 + * + * Compare with: + * "bridge: multicast: call skb_checksum_{simple_, }validate" + */ +__sum16 +skb_checksum_validate(struct sk_buff *skb, int proto, + __wsum (*compute_pseudo)(struct sk_buff *skb, int proto)) +{ + const struct ipv6hdr *ip6h = ipv6_hdr(skb); + + switch (skb->ip_summed) { + case CHECKSUM_COMPLETE: + if (!csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr, skb->len, + IPPROTO_ICMPV6, skb->csum)) + break; + /*FALLTHROUGH*/ + case CHECKSUM_NONE: + skb->csum = ~csum_unfold(csum_ipv6_magic(&ip6h->saddr, + &ip6h->daddr, + skb->len, + IPPROTO_ICMPV6, 0)); + if (__skb_checksum_complete(skb)) + return -EINVAL; + } + + return 0; +} + +#endif /* < KERNEL_VERSION(3, 16, 0) */ + +#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0) + +/** + * skb_checksum_maybe_trim - maybe trims the given skb + * @skb: the skb to check + * @transport_len: the data length beyond the network header + * + * Checks whether the given skb has data beyond the given transport length. + * If so, returns a cloned skb trimmed to this transport length. + * Otherwise returns the provided skb. Returns NULL in error cases + * (e.g. transport_len exceeds skb length or out-of-memory). + * + * Caller needs to set the skb transport header and release the returned skb. + * Provided skb is consumed. + */ +static struct sk_buff *skb_checksum_maybe_trim(struct sk_buff *skb, + unsigned int transport_len) +{ + struct sk_buff *skb_chk; + unsigned int len = skb_transport_offset(skb) + transport_len; + int ret; + + if (skb->len < len) { + kfree_skb(skb); + return NULL; + } else if (skb->len == len) { + return skb; + } + + skb_chk = skb_clone(skb, GFP_ATOMIC); + kfree_skb(skb); + + if (!skb_chk) + return NULL; + + ret = pskb_trim_rcsum(skb_chk, len); + if (ret) { + kfree_skb(skb_chk); + return NULL; + } + + return skb_chk; +} + +/** + * skb_checksum_trimmed - validate checksum of an skb + * @skb: the skb to check + * @transport_len: the data length beyond the network header + * @skb_chkf: checksum function to use + * + * Applies the given checksum function skb_chkf to the provided skb. + * Returns a checked and maybe trimmed skb. Returns NULL on error. + * + * If the skb has data beyond the given transport length, then a + * trimmed & cloned skb is checked and returned. + * + * Caller needs to set the skb transport header and release the returned skb. + * Provided skb is consumed. + */ +struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, + unsigned int transport_len, + __sum16(*skb_chkf)(struct sk_buff *skb)) +{ + struct sk_buff *skb_chk; + unsigned int offset = skb_transport_offset(skb); + __sum16 ret; + + skb_chk = skb_checksum_maybe_trim(skb, transport_len); + if (!skb_chk) + return NULL; + + if (!pskb_may_pull(skb_chk, offset)) { + kfree_skb(skb_chk); + return NULL; + } + + __skb_pull(skb_chk, offset); + ret = skb_chkf(skb_chk); + __skb_push(skb_chk, offset); + + if (ret) { + kfree_skb(skb_chk); + return NULL; + } + + return skb_chk; +} + +#endif /* < KERNEL_VERSION(4, 2, 0) */ diff --git a/compat-sources/net/ipv4/igmp.c b/compat-sources/net/ipv4/igmp.c new file mode 100644 index 0000000..457a05e --- /dev/null +++ b/compat-sources/net/ipv4/igmp.c @@ -0,0 +1,241 @@ +/* + * Linux NET3: Internet Group Management Protocol [IGMP] + * + * This code implements the IGMP protocol as defined in RFC1112. There has + * been a further revision of this protocol since which is now supported. + * + * If you have trouble with this module be careful what gcc you have used, + * the older version didn't come out right using gcc 2.5.8, the newer one + * seems to fall out with gcc 2.6.2. + * + * Authors: + * Alan Cox alan@lxorguk.ukuu.org.uk + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Fixes: + * + * Alan Cox : Added lots of __inline__ to optimise + * the memory usage of all the tiny little + * functions. + * Alan Cox : Dumped the header building experiment. + * Alan Cox : Minor tweaks ready for multicast routing + * and extended IGMP protocol. + * Alan Cox : Removed a load of inline directives. Gcc 2.5.8 + * writes utterly bogus code otherwise (sigh) + * fixed IGMP loopback to behave in the manner + * desired by mrouted, fixed the fact it has been + * broken since 1.3.6 and cleaned up a few minor + * points. + * + * Chih-Jen Chang : Tried to revise IGMP to Version 2 + * Tsu-Sheng Tsao E-mail: chihjenc@scf.usc.edu and tsusheng@scf.usc.edu + * The enhancements are mainly based on Steve Deering's + * ipmulti-3.5 source code. + * Chih-Jen Chang : Added the igmp_get_mrouter_info and + * Tsu-Sheng Tsao igmp_set_mrouter_info to keep track of + * the mrouted version on that device. + * Chih-Jen Chang : Added the max_resp_time parameter to + * Tsu-Sheng Tsao igmp_heard_query(). Using this parameter + * to identify the multicast router version + * and do what the IGMP version 2 specified. + * Chih-Jen Chang : Added a timer to revert to IGMP V2 router + * Tsu-Sheng Tsao if the specified time expired. + * Alan Cox : Stop IGMP from 0.0.0.0 being accepted. + * Alan Cox : Use GFP_ATOMIC in the right places. + * Christian Daudt : igmp timer wasn't set for local group + * memberships but was being deleted, + * which caused a "del_timer() called + * from %p with timer not initialized\n" + * message (960131). + * Christian Daudt : removed del_timer from + * igmp_timer_expire function (960205). + * Christian Daudt : igmp_heard_report now only calls + * igmp_timer_expire if tm->running is + * true (960216). + * Malcolm Beattie : ttl comparison wrong in igmp_rcv made + * igmp_heard_query never trigger. Expiry + * miscalculation fixed in igmp_heard_query + * and random() made to return unsigned to + * prevent negative expiry times. + * Alexey Kuznetsov: Wrong group leaving behaviour, backport + * fix from pending 2.1.x patches. + * Alan Cox: Forget to enable FDDI support earlier. + * Alexey Kuznetsov: Fixed leaving groups on device down. + * Alexey Kuznetsov: Accordance to igmp-v2-06 draft. + * David L Stevens: IGMPv3 support, with help from + * Vinay Kulkarni + */ + +#include <linux/igmp.h> +#include <linux/ip.h> +#include <linux/skbuff.h> +#include <net/ip.h> + +#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0) + +static int ip_mc_check_iphdr(struct sk_buff *skb) +{ + const struct iphdr *iph; + unsigned int len; + unsigned int offset = skb_network_offset(skb) + sizeof(*iph); + + if (!pskb_may_pull(skb, offset)) + return -EINVAL; + + iph = ip_hdr(skb); + + if (iph->version != 4 || ip_hdrlen(skb) < sizeof(*iph)) + return -EINVAL; + + offset += ip_hdrlen(skb) - sizeof(*iph); + + if (!pskb_may_pull(skb, offset)) + return -EINVAL; + + iph = ip_hdr(skb); + + if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) + return -EINVAL; + + len = skb_network_offset(skb) + ntohs(iph->tot_len); + if (skb->len < len || len < offset) + return -EINVAL; + + skb_set_transport_header(skb, offset); + + return 0; +} + +static int ip_mc_check_igmp_reportv3(struct sk_buff *skb) +{ + unsigned int len = skb_transport_offset(skb); + + len += sizeof(struct igmpv3_report); + + return pskb_may_pull(skb, len) ? 0 : -EINVAL; +} + +static int ip_mc_check_igmp_query(struct sk_buff *skb) +{ + unsigned int len = skb_transport_offset(skb); + + len += sizeof(struct igmphdr); + if (skb->len < len) + return -EINVAL; + + /* IGMPv{1,2}? */ + if (skb->len != len) { + /* or IGMPv3? */ + len += sizeof(struct igmpv3_query) - sizeof(struct igmphdr); + if (skb->len < len || !pskb_may_pull(skb, len)) + return -EINVAL; + } + + /* RFC2236+RFC3376 (IGMPv2+IGMPv3) require the multicast link layer + * all-systems destination addresses (224.0.0.1) for general queries + */ + if (!igmp_hdr(skb)->group && + ip_hdr(skb)->daddr != htonl(INADDR_ALLHOSTS_GROUP)) + return -EINVAL; + + return 0; +} + +static int ip_mc_check_igmp_msg(struct sk_buff *skb) +{ + switch (igmp_hdr(skb)->type) { + case IGMP_HOST_LEAVE_MESSAGE: + case IGMP_HOST_MEMBERSHIP_REPORT: + case IGMPV2_HOST_MEMBERSHIP_REPORT: + /* fall through */ + return 0; + case IGMPV3_HOST_MEMBERSHIP_REPORT: + return ip_mc_check_igmp_reportv3(skb); + case IGMP_HOST_MEMBERSHIP_QUERY: + return ip_mc_check_igmp_query(skb); + default: + return -ENOMSG; + } +} + +static inline __sum16 ip_mc_validate_checksum(struct sk_buff *skb) +{ + return skb_checksum_simple_validate(skb); +} + +static int __ip_mc_check_igmp(struct sk_buff *skb, struct sk_buff **skb_trimmed) + +{ + struct sk_buff *skb_chk; + unsigned int transport_len; + unsigned int len = skb_transport_offset(skb) + sizeof(struct igmphdr); + int ret; + + transport_len = ntohs(ip_hdr(skb)->tot_len) - ip_hdrlen(skb); + + skb_get(skb); + skb_chk = skb_checksum_trimmed(skb, transport_len, + ip_mc_validate_checksum); + if (!skb_chk) + return -EINVAL; + + if (!pskb_may_pull(skb_chk, len)) { + kfree_skb(skb_chk); + return -EINVAL; + } + + ret = ip_mc_check_igmp_msg(skb_chk); + if (ret) { + kfree_skb(skb_chk); + return ret; + } + + if (skb_trimmed) + *skb_trimmed = skb_chk; + else + kfree_skb(skb_chk); + + return 0; +} + +/** + * ip_mc_check_igmp - checks whether this is a sane IGMP packet + * @skb: the skb to validate + * @skb_trimmed: to store an skb pointer trimmed to IPv4 packet tail (optional) + * + * Checks whether an IPv4 packet is a valid IGMP packet. If so sets + * skb network and transport headers accordingly and returns zero. + * + * -EINVAL: A broken packet was detected, i.e. it violates some internet + * standard + * -ENOMSG: IP header validation succeeded but it is not an IGMP packet. + * -ENOMEM: A memory allocation failure happened. + * + * Optionally, an skb pointer might be provided via skb_trimmed (or set it + * to NULL): After parsing an IGMP packet successfully it will point to + * an skb which has its tail aligned to the IP packet end. This might + * either be the originally provided skb or a trimmed, cloned version if + * the skb frame had data beyond the IP packet. A cloned skb allows us + * to leave the original skb and its full frame unchanged (which might be + * desirable for layer 2 frame jugglers). + * + * The caller needs to release a reference count from any returned skb_trimmed. + */ +int ip_mc_check_igmp(struct sk_buff *skb, struct sk_buff **skb_trimmed) +{ + int ret = ip_mc_check_iphdr(skb); + + if (ret < 0) + return ret; + + if (ip_hdr(skb)->protocol != IPPROTO_IGMP) + return -ENOMSG; + + return __ip_mc_check_igmp(skb, skb_trimmed); +} + +#endif /* < KERNEL_VERSION(4, 2, 0) */ diff --git a/compat-sources/net/ipv6/mcast_snoop.c b/compat-sources/net/ipv6/mcast_snoop.c new file mode 100644 index 0000000..3f46ed3 --- /dev/null +++ b/compat-sources/net/ipv6/mcast_snoop.c @@ -0,0 +1,216 @@ +/* Copyright (C) 2010: YOSHIFUJI Hideaki yoshfuji@linux-ipv6.org + * Copyright (C) 2015: Linus Lüssing linus.luessing@c0d3.blue + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see http://www.gnu.org/licenses/. + * + * + * Based on the MLD support added to br_multicast.c by YOSHIFUJI Hideaki. + */ + +#include <linux/skbuff.h> +#include <net/ipv6.h> +#include <net/mld.h> +#include <net/addrconf.h> +#include <net/ip6_checksum.h> + +#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0) + +static int ipv6_mc_check_ip6hdr(struct sk_buff *skb) +{ + const struct ipv6hdr *ip6h; + unsigned int len; + unsigned int offset = skb_network_offset(skb) + sizeof(*ip6h); + + if (!pskb_may_pull(skb, offset)) + return -EINVAL; + + ip6h = ipv6_hdr(skb); + + if (ip6h->version != 6) + return -EINVAL; + + len = offset + ntohs(ip6h->payload_len); + if (skb->len < len || len <= offset) + return -EINVAL; + + return 0; +} + +static int ipv6_mc_check_exthdrs(struct sk_buff *skb) +{ + const struct ipv6hdr *ip6h; + int offset; + u8 nexthdr; + __be16 frag_off; + + ip6h = ipv6_hdr(skb); + + if (ip6h->nexthdr != IPPROTO_HOPOPTS) + return -ENOMSG; + + nexthdr = ip6h->nexthdr; + offset = skb_network_offset(skb) + sizeof(*ip6h); + offset = ipv6_skip_exthdr(skb, offset, &nexthdr, &frag_off); + + if (offset < 0) + return -EINVAL; + + if (nexthdr != IPPROTO_ICMPV6) + return -ENOMSG; + + skb_set_transport_header(skb, offset); + + return 0; +} + +static int ipv6_mc_check_mld_reportv2(struct sk_buff *skb) +{ + unsigned int len = skb_transport_offset(skb); + + len += sizeof(struct mld2_report); + + return pskb_may_pull(skb, len) ? 0 : -EINVAL; +} + +static int ipv6_mc_check_mld_query(struct sk_buff *skb) +{ + struct mld_msg *mld; + unsigned int len = skb_transport_offset(skb); + + /* RFC2710+RFC3810 (MLDv1+MLDv2) require link-local source addresses */ + if (!(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL)) + return -EINVAL; + + len += sizeof(struct mld_msg); + if (skb->len < len) + return -EINVAL; + + /* MLDv1? */ + if (skb->len != len) { + /* or MLDv2? */ + len += sizeof(struct mld2_query) - sizeof(struct mld_msg); + if (skb->len < len || !pskb_may_pull(skb, len)) + return -EINVAL; + } + + mld = (struct mld_msg *)skb_transport_header(skb); + + /* RFC2710+RFC3810 (MLDv1+MLDv2) require the multicast link layer + * all-nodes destination address (ff02::1) for general queries + */ + if (ipv6_addr_any(&mld->mld_mca) && + !ipv6_addr_is_ll_all_nodes(&ipv6_hdr(skb)->daddr)) + return -EINVAL; + + return 0; +} + +static int ipv6_mc_check_mld_msg(struct sk_buff *skb) +{ + struct mld_msg *mld = (struct mld_msg *)skb_transport_header(skb); + + switch (mld->mld_type) { + case ICMPV6_MGM_REDUCTION: + case ICMPV6_MGM_REPORT: + /* fall through */ + return 0; + case ICMPV6_MLD2_REPORT: + return ipv6_mc_check_mld_reportv2(skb); + case ICMPV6_MGM_QUERY: + return ipv6_mc_check_mld_query(skb); + default: + return -ENOMSG; + } +} + +static inline __sum16 ipv6_mc_validate_checksum(struct sk_buff *skb) +{ + return skb_checksum_validate(skb, IPPROTO_ICMPV6, ip6_compute_pseudo); +} + +static int __ipv6_mc_check_mld(struct sk_buff *skb, + struct sk_buff **skb_trimmed) + +{ + struct sk_buff *skb_chk = NULL; + unsigned int transport_len; + unsigned int len = skb_transport_offset(skb) + sizeof(struct mld_msg); + int ret; + + transport_len = ntohs(ipv6_hdr(skb)->payload_len); + transport_len -= skb_transport_offset(skb) - sizeof(struct ipv6hdr); + + skb_get(skb); + skb_chk = skb_checksum_trimmed(skb, transport_len, + ipv6_mc_validate_checksum); + if (!skb_chk) + return -EINVAL; + + if (!pskb_may_pull(skb_chk, len)) { + kfree_skb(skb_chk); + return -EINVAL; + } + + ret = ipv6_mc_check_mld_msg(skb_chk); + if (ret) { + kfree_skb(skb_chk); + return ret; + } + + if (skb_trimmed) + *skb_trimmed = skb_chk; + else + kfree_skb(skb_chk); + + return 0; +} + +/** + * ipv6_mc_check_mld - checks whether this is a sane MLD packet + * @skb: the skb to validate + * @skb_trimmed: to store an skb pointer trimmed to IPv6 packet tail (optional) + * + * Checks whether an IPv6 packet is a valid MLD packet. If so sets + * skb network and transport headers accordingly and returns zero. + * + * -EINVAL: A broken packet was detected, i.e. it violates some internet + * standard + * -ENOMSG: IP header validation succeeded but it is not an MLD packet. + * -ENOMEM: A memory allocation failure happened. + * + * Optionally, an skb pointer might be provided via skb_trimmed (or set it + * to NULL): After parsing an MLD packet successfully it will point to + * an skb which has its tail aligned to the IP packet end. This might + * either be the originally provided skb or a trimmed, cloned version if + * the skb frame had data beyond the IP packet. A cloned skb allows us + * to leave the original skb and its full frame unchanged (which might be + * desirable for layer 2 frame jugglers). + * + * The caller needs to release a reference count from any returned skb_trimmed. + */ +int ipv6_mc_check_mld(struct sk_buff *skb, struct sk_buff **skb_trimmed) +{ + int ret; + + ret = ipv6_mc_check_ip6hdr(skb); + if (ret < 0) + return ret; + + ret = ipv6_mc_check_exthdrs(skb); + if (ret < 0) + return ret; + + return __ipv6_mc_check_mld(skb, skb_trimmed); +} + +#endif /* < KERNEL_VERSION(4, 2, 0) */ diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index c64d9b1..3543587 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -25,7 +25,9 @@ #include <linux/errno.h> #include <linux/etherdevice.h> #include <linux/fs.h> +#include <linux/icmpv6.h> #include <linux/if_ether.h> +#include <linux/igmp.h> #include <linux/in6.h> #include <linux/in.h> #include <linux/ip.h> @@ -42,6 +44,7 @@ #include <linux/types.h> #include <net/addrconf.h> #include <net/ipv6.h> +#include <net/mld.h>
#include "packet.h" #include "translation-table.h" @@ -225,7 +228,7 @@ static bool batadv_mcast_mla_tvlv_update(struct batadv_priv *bat_priv) if (batadv_mcast_has_bridge(bat_priv)) { if (bat_priv->mcast.enabled) { batadv_tvlv_container_unregister(bat_priv, - BATADV_TVLV_MCAST, 1); + BATADV_TVLV_MCAST, 2); bat_priv->mcast.enabled = false; }
@@ -234,7 +237,7 @@ static bool batadv_mcast_mla_tvlv_update(struct batadv_priv *bat_priv)
if (!bat_priv->mcast.enabled || mcast_data.flags != bat_priv->mcast.flags) { - batadv_tvlv_container_register(bat_priv, BATADV_TVLV_MCAST, 1, + batadv_tvlv_container_register(bat_priv, BATADV_TVLV_MCAST, 2, &mcast_data, sizeof(mcast_data)); bat_priv->mcast.flags = mcast_data.flags; bat_priv->mcast.enabled = true; @@ -272,6 +275,30 @@ out: }
/** + * batadv_mcast_is_report_ipv4 - check for IGMP reports + * @skb: the ethernet frame destined for the mesh + * + * Checks whether the given frame is an IGMP report and if so returns true. + * Otherwise returns false. + * + * This call might reallocate skb data. + */ +static bool batadv_mcast_is_report_ipv4(struct sk_buff *skb) +{ + if (ip_mc_check_igmp(skb, NULL) < 0) + return false; + + switch (igmp_hdr(skb)->type) { + case IGMP_HOST_MEMBERSHIP_REPORT: + case IGMPV2_HOST_MEMBERSHIP_REPORT: + case IGMPV3_HOST_MEMBERSHIP_REPORT: + return true; + } + + return false; +} + +/** * batadv_mcast_forw_mode_check_ipv4 - check for optimized forwarding potential * @bat_priv: the bat priv with all the soft interface information * @skb: the IPv4 packet to check @@ -293,6 +320,9 @@ static int batadv_mcast_forw_mode_check_ipv4(struct batadv_priv *bat_priv, if (!pskb_may_pull(skb, sizeof(struct ethhdr) + sizeof(*iphdr))) return -ENOMEM;
+ if (batadv_mcast_is_report_ipv4(skb)) + return -EINVAL; + iphdr = ip_hdr(skb);
/* TODO: Implement Multicast Router Discovery (RFC4286), @@ -309,6 +339,30 @@ static int batadv_mcast_forw_mode_check_ipv4(struct batadv_priv *bat_priv, return 0; }
+#if IS_ENABLED(CONFIG_IPV6) +/** + * batadv_mcast_is_report_ipv6 - check for MLD reports + * @skb: the ethernet frame destined for the mesh + * + * Checks whether the given frame is an MLD report and if so returns true. + * Otherwise returns false. + * + * This call might reallocate skb data. + */ +static bool batadv_mcast_is_report_ipv6(struct sk_buff *skb) +{ + if (ipv6_mc_check_mld(skb, NULL) < 0) + return false; + + switch (icmp6_hdr(skb)->icmp6_type) { + case ICMPV6_MGM_REPORT: + case ICMPV6_MLD2_REPORT: + return true; + } + + return false; +} + /** * batadv_mcast_forw_mode_check_ipv6 - check for optimized forwarding potential * @bat_priv: the bat priv with all the soft interface information @@ -331,6 +385,9 @@ static int batadv_mcast_forw_mode_check_ipv6(struct batadv_priv *bat_priv, if (!pskb_may_pull(skb, sizeof(struct ethhdr) + sizeof(*ip6hdr))) return -ENOMEM;
+ if (batadv_mcast_is_report_ipv6(skb)) + return -EINVAL; + ip6hdr = ipv6_hdr(skb);
/* TODO: Implement Multicast Router Discovery (RFC4286), @@ -347,6 +404,7 @@ static int batadv_mcast_forw_mode_check_ipv6(struct batadv_priv *bat_priv,
return 0; } +#endif
/** * batadv_mcast_forw_mode_check - check for optimized forwarding potential @@ -376,9 +434,11 @@ static int batadv_mcast_forw_mode_check(struct batadv_priv *bat_priv, case ETH_P_IP: return batadv_mcast_forw_mode_check_ipv4(bat_priv, skb, is_unsnoopable); +#if IS_ENABLED(CONFIG_IPV6) case ETH_P_IPV6: return batadv_mcast_forw_mode_check_ipv6(bat_priv, skb, is_unsnoopable); +#endif default: return -EINVAL; } @@ -712,14 +772,14 @@ static void batadv_mcast_want_ipv6_update(struct batadv_priv *bat_priv, }
/** - * batadv_mcast_tvlv_ogm_handler_v1 - process incoming multicast tvlv container + * batadv_mcast_tvlv_ogm_handler_v2 - process incoming multicast tvlv container * @bat_priv: the bat priv with all the soft interface information * @orig: the orig_node of the ogm * @flags: flags indicating the tvlv state (see batadv_tvlv_handler_flags) * @tvlv_value: tvlv buffer containing the multicast data * @tvlv_value_len: tvlv buffer length */ -static void batadv_mcast_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, +static void batadv_mcast_tvlv_ogm_handler_v2(struct batadv_priv *bat_priv, struct batadv_orig_node *orig, u8 flags, void *tvlv_value, @@ -772,8 +832,8 @@ static void batadv_mcast_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, */ void batadv_mcast_init(struct batadv_priv *bat_priv) { - batadv_tvlv_handler_register(bat_priv, batadv_mcast_tvlv_ogm_handler_v1, - NULL, BATADV_TVLV_MCAST, 1, + batadv_tvlv_handler_register(bat_priv, batadv_mcast_tvlv_ogm_handler_v2, + NULL, BATADV_TVLV_MCAST, 2, BATADV_TVLV_HANDLER_OGM_CIFNOTFND); }
@@ -783,8 +843,8 @@ void batadv_mcast_init(struct batadv_priv *bat_priv) */ void batadv_mcast_free(struct batadv_priv *bat_priv) { - batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 1); - batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 1); + batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 2); + batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 2);
batadv_mcast_mla_tt_retract(bat_priv, NULL); } diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index df1ff2c..73b956b 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -206,6 +206,8 @@ static int batadv_interface_tx(struct sk_buff *skb, if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE) goto dropped;
+ skb_set_network_header(skb, ETH_HLEN); + soft_iface->trans_start = jiffies; vid = batadv_get_vid(skb, 0); ethhdr = eth_hdr(skb);
On Sunday 21 June 2015 15:36:17 Linus Lüssing wrote:
With this patch IGMP or MLD reports are always flooded. This is necessary for the upcoming bridge integration:
An IGMPv2/MLDv1 querier does not actively join the multicast group the reports are sent to. Because of this, this would lead to snooping bridges/switches not being able to learn of multicast listeners in the mesh and wrongly shutting down ports for multicast traffic to the mesh, leading to packet loss.
This patch increments the multicast tvlv version number to 2.
Signed-off-by: Linus Lüssing linus.luessing@c0d3.blue
After discussing with Linus, I agree that we should use the more simple approach and focus our optimizations effort on actual multicast traffic instead of the reports. I think this approach is much more simpler and easier to debug - since the multicast subject itself is complex enough on its own.
The patch is pretty much straight forward, most of the code is actually adopted from the Linux kernel. Therefore:
Acked-by: Simon Wunderlich sw@simonwunderlich.de
Thank you! Simon
On Sunday, June 21, 2015 15:36:17 Linus Lüssing wrote:
With this patch IGMP or MLD reports are always flooded. This is necessary for the upcoming bridge integration:
An IGMPv2/MLDv1 querier does not actively join the multicast group the reports are sent to. Because of this, this would lead to snooping bridges/switches not being able to learn of multicast listeners in the mesh and wrongly shutting down ports for multicast traffic to the mesh, leading to packet loss.
I am not quite clear on what the patch does. It helps to support a feature that is yet to come (upcoming bridge integration) or improves the situation today ?
/**
- batadv_mcast_forw_mode_check - check for optimized forwarding potential
@@ -376,9 +434,11 @@ static int batadv_mcast_forw_mode_check(struct batadv_priv *bat_priv, case ETH_P_IP: return batadv_mcast_forw_mode_check_ipv4(bat_priv, skb, is_unsnoopable); +#if IS_ENABLED(CONFIG_IPV6) case ETH_P_IPV6: return batadv_mcast_forw_mode_check_ipv6(bat_priv, skb, is_unsnoopable); +#endif default: return -EINVAL; }
This hunk seems not really related to the patch itself ?
/**
- batadv_mcast_tvlv_ogm_handler_v1 - process incoming multicast tvlv
container + * batadv_mcast_tvlv_ogm_handler_v2 - process incoming multicast tvlv container * @bat_priv: the bat priv with all the soft interface information * @orig: the orig_node of the ogm
- @flags: flags indicating the tvlv state (see batadv_tvlv_handler_flags)
- @tvlv_value: tvlv buffer containing the multicast data
- @tvlv_value_len: tvlv buffer length
*/ -static void batadv_mcast_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, +static void batadv_mcast_tvlv_ogm_handler_v2(struct batadv_priv *bat_priv, struct batadv_orig_node *orig, u8 flags, void *tvlv_value,
By removing mcast v1 you effectively are breaking compatibility with all nodes running v1 and require everyone to upgrade to v2. Is there a good reason for that ? The multicast optimizations can't easily work with v1 and v2 ?
By reading your patch it does not become clear why v2 is needed in the first place. The tvlv's content seems to be identical ?
@@ -206,6 +206,8 @@ static int batadv_interface_tx(struct sk_buff *skb, if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE) goto dropped;
- skb_set_network_header(skb, ETH_HLEN);
- soft_iface->trans_start = jiffies; vid = batadv_get_vid(skb, 0); ethhdr = eth_hdr(skb);
Why is this change part of the patch ?
Cheers, Marek
On Sun, Jun 28, 2015 at 09:37:20AM +0800, Marek Lindner wrote:
On Sunday, June 21, 2015 15:36:17 Linus Lüssing wrote:
With this patch IGMP or MLD reports are always flooded. This is necessary for the upcoming bridge integration:
An IGMPv2/MLDv1 querier does not actively join the multicast group the reports are sent to. Because of this, this would lead to snooping bridges/switches not being able to learn of multicast listeners in the mesh and wrongly shutting down ports for multicast traffic to the mesh, leading to packet loss.
I am not quite clear on what the patch does. It helps to support a feature that is yet to come (upcoming bridge integration) or improves the situation today ?
The former. It's not fixing or improving anything for the current implementation.
/**
- batadv_mcast_forw_mode_check - check for optimized forwarding potential
@@ -376,9 +434,11 @@ static int batadv_mcast_forw_mode_check(struct batadv_priv *bat_priv, case ETH_P_IP: return batadv_mcast_forw_mode_check_ipv4(bat_priv, skb, is_unsnoopable); +#if IS_ENABLED(CONFIG_IPV6) case ETH_P_IPV6: return batadv_mcast_forw_mode_check_ipv6(bat_priv, skb, is_unsnoopable); +#endif default: return -EINVAL; }
This hunk seems not really related to the patch itself ?
I think it's necessary for the new ipv6_mc_check_mld() which isn't there if building a kernel without any IPv6 support.
/**
- batadv_mcast_tvlv_ogm_handler_v1 - process incoming multicast tvlv
container + * batadv_mcast_tvlv_ogm_handler_v2 - process incoming multicast tvlv container * @bat_priv: the bat priv with all the soft interface information * @orig: the orig_node of the ogm
- @flags: flags indicating the tvlv state (see batadv_tvlv_handler_flags)
- @tvlv_value: tvlv buffer containing the multicast data
- @tvlv_value_len: tvlv buffer length
*/ -static void batadv_mcast_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, +static void batadv_mcast_tvlv_ogm_handler_v2(struct batadv_priv *bat_priv, struct batadv_orig_node *orig, u8 flags, void *tvlv_value,
By removing mcast v1 you effectively are breaking compatibility with all nodes running v1 and require everyone to upgrade to v2.
No, no v1 nodes are required to upgrade. v1 and v2 nodes are still able to communicate. v1 nodes (like any pre v1 node) might downgrade the mesh to a mcast-optimizations-disabled state for now though, yes.
Is there a good reason for that ? The multicast optimizations can't easily work with v1 and v2 ?
By reading your patch it does not become clear why v2 is needed in the first place. The tvlv's content seems to be identical ?
We cannot safely use the multicast optimizations with bridges if there are nodes which do not handle reports properly. The bump isn't needed for any packet changes but the internal, local behaviour of a node.
I haven't heard of anyone using the multicast optimization feature in practice yet (that is, a setup without bridges), so I think it is safe to do a version bump?
@@ -206,6 +206,8 @@ static int batadv_interface_tx(struct sk_buff *skb, if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE) goto dropped;
- skb_set_network_header(skb, ETH_HLEN);
- soft_iface->trans_start = jiffies; vid = batadv_get_vid(skb, 0); ethhdr = eth_hdr(skb);
Why is this change part of the patch ?
Hm, good question :). Need to recheck, I vaguely remember having had issues with the IP header parsing due to unset skb network headers.
Cheers, Marek
Cheers, Linus
On Sunday, June 28, 2015 05:59:14 Linus Lüssing wrote:
I am not quite clear on what the patch does. It helps to support a feature that is yet to come (upcoming bridge integration) or improves the situation today ?
The former. It's not fixing or improving anything for the current implementation.
If this patch isn't doing anything (for now) maybe it should be merged when it becomes useful ?
+#if IS_ENABLED(CONFIG_IPV6)
case ETH_P_IPV6: return batadv_mcast_forw_mode_check_ipv6(bat_priv, skb, is_unsnoopable);
+#endif
default: return -EINVAL;
}
This hunk seems not really related to the patch itself ?
I think it's necessary for the new ipv6_mc_check_mld() which isn't there if building a kernel without any IPv6 support.
You may want to send a separate patch then ? This change seems unrelated to the rest.
By removing mcast v1 you effectively are breaking compatibility with all nodes running v1 and require everyone to upgrade to v2.
No, no v1 nodes are required to upgrade. v1 and v2 nodes are still able to communicate. v1 nodes (like any pre v1 node) might downgrade the mesh to a mcast-optimizations-disabled state for now though, yes.
I do understand that 'normal' packet exchange is not affected. However, the TVLVs were introduced with the intend of maintaining best possible compatibility with future versions. With the first tvlv version bump we already require upgrading everyone or compatibility is already broken ? Is there really nothing we can do ? v2 could at least be compatible to v1 ?
We cannot safely use the multicast optimizations with bridges if there are nodes which do not handle reports properly. The bump isn't needed for any packet changes but the internal, local behaviour of a node.
I haven't heard of anyone using the multicast optimization feature in practice yet (that is, a setup without bridges), so I think it is safe to do a version bump?
A version bump for a feature which does not do anything useful yet (see my initial question) ? How likely is it that we will need another version bump by the time this feature does become useful ?
Hm, good question :). Need to recheck, I vaguely remember having had issues with the IP header parsing due to unset skb network headers.
If it is related to this patch please add a comment. The change is non- obvious.
Cheers, Marek
On Sun, Jun 28, 2015 at 09:21:34PM +0800, Marek Lindner wrote:
On Sunday, June 28, 2015 05:59:14 Linus Lüssing wrote:
I am not quite clear on what the patch does. It helps to support a feature that is yet to come (upcoming bridge integration) or improves the situation today ?
The former. It's not fixing or improving anything for the current implementation.
If this patch isn't doing anything (for now) maybe it should be merged when it becomes useful ?
Hm, good point. The intention was to keep things in small chunks for easy reviewing. With the unicasted reports approach there were more lines of code changed and an actual (but broken / insufficient) change / "improvement". Now it got rather small with no practical advantage so far. So yes, could be added to the multicast-bridge patchset.
+#if IS_ENABLED(CONFIG_IPV6)
case ETH_P_IPV6: return batadv_mcast_forw_mode_check_ipv6(bat_priv, skb, is_unsnoopable);
+#endif
default: return -EINVAL;
}
This hunk seems not really related to the patch itself ?
I think it's necessary for the new ipv6_mc_check_mld() which isn't there if building a kernel without any IPv6 support.
You may want to send a separate patch then ? This change seems unrelated to the rest.
The use of ipv6_mc_check_mld() is added in this patch. I think it's necessary in this patch?
I'm not quite sure if I understand. If I move this "#if" and the one around the function definition to a seperate patch after this one then I get for net-next (with the version number in the Makefile changed to 4.2) with CONFIG_IPV6 disabled:
"WARNING: "ipv6_mc_check_mld" [/home/tux/dev/batman-adv-t_x/net/batman-adv/batman-adv.ko] undefined!"
The "#if"'s aren't necessary prior this patch, so making an extra patch before this patch here doesn't make sense to me either. The "#if's" become necessary with the changes introduced in this patch.
By removing mcast v1 you effectively are breaking compatibility with all nodes running v1 and require everyone to upgrade to v2.
No, no v1 nodes are required to upgrade. v1 and v2 nodes are still able to communicate. v1 nodes (like any pre v1 node) might downgrade the mesh to a mcast-optimizations-disabled state for now though, yes.
I do understand that 'normal' packet exchange is not affected. However, the TVLVs were introduced with the intend of maintaining best possible compatibility with future versions. With the first tvlv version bump we already require upgrading everyone or compatibility is already broken ? Is there really nothing we can do ? v2 could at least be compatible to v1 ?
Hm, you mean while v1 nodes would downgrade a bridged v2 setup, whether we could at least keep things compatible in a non-bridged setup with a mix of v1 and v2 nodes?
The alternative to bumping the version number would be to introduce another flag into the mcast tvlv and keep counters and lists similar to the num_disabled or want_* ones. So maybe about 80 more lines of code and additional state and complexity to maintain.
My current feeling is that these 80 lines of code would never be used / useful for someone:
The only benefit of this approach arises if someone is using a non-bridged mesh of v2014.2.0 to v2015.0 without any v2014.4.0 or v2014.1.0 nodes. For one thing there aren't many people using non-bridged setups I think. For another these few people probably don't rely on the multicast feature since without bridges there usually isn't that much multicast traffic to safe with the current feature.
We cannot safely use the multicast optimizations with bridges if there are nodes which do not handle reports properly. The bump isn't needed for any packet changes but the internal, local behaviour of a node.
I haven't heard of anyone using the multicast optimization feature in practice yet (that is, a setup without bridges), so I think it is safe to do a version bump?
A version bump for a feature which does not do anything useful yet (see my initial question) ? How likely is it that we will need another version bump by the time this feature does become useful ?
Hopefully none ;). The next revision for the multicast bridge integration is ready for submission in the branch linus/multicast-bridge.
Of coures, if more people would like to dig into the conceptual part of IGMP/MLD and the planned integration with batman-adv as described on the wiki, I'd definitely welcome that.
But I'm starting to get a good feeling about this :P. The concept isn't easy but it's now been thought about and discussed a lot between Simon and me over the last two years.
I'm also angry with myself that I missed this issue before, but I think it wasn't really a trivial one either. On the plus side, it made me really paranoid and I probably got over the whole concept and the wiki pages another ten times to make sure there isn't another one year set back because of having missed something. So I can't say with 100% certainity that something else was missed and could introduce another version bump but I'm also quite confident, that it's rather unlikely that we might have missed another such issue.
Hm, good question :). Need to recheck, I vaguely remember having had issues with the IP header parsing due to unset skb network headers.
If it is related to this patch please add a comment. The change is non- obvious.
Oki doki.
Cheers, Marek
Cheers, Linus
On Mon, Jun 29, 2015 at 04:52:48AM +0200, Linus Lüssing wrote:
The alternative to bumping the version number would be to introduce another flag into the mcast tvlv and keep counters and lists similar to the num_disabled or want_* ones. So maybe about 80 more lines of code and additional state and complexity to maintain.
Another alternative I just had to think about would be to register both a v1 and v2 TVLV container while only registering a handler for v2. That'd add just two or four extra lines of code with the same result. The disadvantage would be a few bytes extra overhead for dragging a v1 TVLV in the OGMs along.
Actually, I think I'd kinda like that as it has a reasonable benefit-to-cost ratio. What do others think?
(And maybe such legacy multicast v1 TVLV could then safely be obsoleted/removed in four years or after evaluating whether someone still needs it.)
I was asked to give some abstract, non-technical explanation for this drastic change to the previous patchsets':
There are conceptual issues with the always-unicast approach for some, maybe rare scenarios.
(One could argue on the technical level though whether we'd care about these issues - the bridge for instance doesn't and has a manual switch as a workaround to change to always-flood for specific bridge ports; and I generally wouldn't care for a Freifunk mesh network either but think I found a "better approach" for less overhead there)
Cheers, Linus
On Sun, Jun 21, 2015 at 03:36:16PM +0200, Linus Lüssing wrote:
The last round of multicast patches send to the batman-adv mailinglist to add support for the multicast optimizations in bridged scenarios, too, unfortunately had one major conceptual flaw: It could lead to packet loss. It's not sufficient to have the unicasting of reports implemented on bridge-nodes only. Nodes without bridges need to either unicast them, too or as done with this patch flood them uncoditionally.
The issue is described in detail here:
https://www.open-mesh.org/projects/batman-adv/wiki/Multicast-optimizations-l...
This patch substitutes and greatly simplifies the following patchset: "batman-adv: Unicasting multicast reports to querier-node only"
The reason for doing this is, that I think the disadvantages of this simpler approach noted in the wikipage above (e.g. report overhead in large mesh networks) can be better compensated by segmenting the IGMP/MLD domain via ebtables (drop all queries and reports coming out of bat0 or before entering bat0) and the "multicast_router" bridge port flag on bat0 (flood all multicast to this bat0 port, let batman-adv deal with the optimizations from here on). Segmenting the IGMP/MLD domain also has the nice advantage of being able to run a querier on every mesh node which therefore is a way more decentral approach.
Cheers, Linus
Changelog
v1:
- Removed query snooping and state
- Squashed all three patches into one
- Renamed "batadv_mcast_tvlv_ogm_handler_v1()" to *_v2()
- Added explicit icmpv6.h include
- Rebased on top of master
Old changelog of "batman-adv: Unicasting multicast reports to querier-node only" v6:
- compat: copied copyright headers from original upstream c files
- compat: unified ordering in compat c files: -> copyright header, then includes, then kernel specific functions
v5:
- Removed RFC tag: Needed exports got merged to net-next and are going to be available with Linux 4.2
- Redid compat solution - now fully backwards compatible down to 2.6.33
v4:
- excluded bridge part from this patchset, they should hopefully be added to net-next soon
- Added a compat solution (PATCH 3/3)
- Removed Kconfig-depends as by David's suggestion the needed parsing functions for MLD are going to be forced built-ins even if IPv6 is going to be built as a module
- Removed unused variable 'int ret' in batadv_mcast_is_report_ipv6()
- Adjusted to new folder structure
v3:
- Adding Kconfig-depends and #if's (so basically adding similar dependancy constraints as the bridge code has, except that there are no depends if batman-adv gets compiled without multicast optimizations) -> the case of IPv6=M and batman-adv=y is still impossible if multicast optimizations are enabled; but I don't see the practical demand for that either - people who use IPv6 as a module will probably also want to use batman-adv as a module
v2:
- various bugfixes (now runtime tested, too - should(tm) work)
- added netdev+bridge mailinglists
b.a.t.m.a.n@lists.open-mesh.org