The following commit has been merged in the master branch:
commit 3ab0a7a0c349a1d7beb2bb371a62669d1528269d
Merge: 92ec804f3dbf0d986f8e10850bfff14f316d7aaf 805c6d3c19210c90c109107d189744e960eae025
Author: David S. Miller <davem@davemloft.net>
Date:   Tue Sep 22 16:45:34 2020 -0700
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Two minor conflicts:
1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing its initial assignment.
2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One change pretty-prints the port mode differently, while the other makes the driver obtain the port mode from the port node rather than the switch node (a minimal sketch of the resolved logic follows the sign-off below).
Signed-off-by: David S. Miller <davem@davemloft.net>
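For illustration only, a minimal standalone sketch of the shape the resolved ksz9477 hunk takes: the chosen mode comes from the per-port setting, and the strap-detected mode is appended with an " instead of " suffix only when it differs. This is not the in-tree driver code; the enum, phy_modes() and report_port_mode() below are mocked stand-ins, not kernel definitions.

#include <stdio.h>

/* Mocked stand-ins for the kernel's phy_interface_t / phy_modes() --
 * purely illustrative, not the driver's real definitions. */
enum phy_mode { MODE_NA, MODE_MII, MODE_RGMII, MODE_RGMII_ID };

static const char *phy_modes(enum phy_mode m)
{
	switch (m) {
	case MODE_MII:      return "mii";
	case MODE_RGMII:    return "rgmii";
	case MODE_RGMII_ID: return "rgmii-id";
	default:            return "";
	}
}

/* Print the mode chosen for the port, mentioning the mode detected from
 * the XMII strap register only when it differs -- the same shape as the
 * merged dev_info() call in ksz9477_config_cpu_port(). */
static void report_port_mode(int port, enum phy_mode chosen, enum phy_mode detected)
{
	const char *prev_msg  = "";
	const char *prev_mode = "";

	if (detected != MODE_NA && detected != chosen) {
		prev_msg  = " instead of ";
		prev_mode = phy_modes(detected);
	}

	printf("Port%d: using phy mode %s%s%s\n",
	       port, phy_modes(chosen), prev_msg, prev_mode);
}

int main(void)
{
	report_port_mode(5, MODE_RGMII_ID, MODE_RGMII); /* differs: both modes printed */
	report_port_mode(6, MODE_RGMII, MODE_RGMII);    /* same: no suffix */
	return 0;
}

Built with any C compiler, the first call prints both modes and the second prints only the chosen one, mirroring how the merged hunk reports the per-port phy-mode.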
diff --combined Documentation/networking/ethtool-netlink.rst index 2c8e0ddf548e,b5a79881551f..30b98245979f --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@@ -68,7 -68,6 +68,7 @@@ the flags may not apply to requests. Re ================================= =================================== ``ETHTOOL_FLAG_COMPACT_BITSETS`` use compact format bitsets in reply ``ETHTOOL_FLAG_OMIT_REPLY`` omit optional reply (_SET and _ACT) + ``ETHTOOL_FLAG_STATS`` include optional device statistics ================================= ===================================
New request flags should follow the general idea that if the flag is not set, @@@ -207,6 -206,7 +207,7 @@@ Userspace to kernel ``ETHTOOL_MSG_TSINFO_GET`` get timestamping info ``ETHTOOL_MSG_CABLE_TEST_ACT`` action start cable test ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT`` action start raw TDR cable test + ``ETHTOOL_MSG_TUNNEL_INFO_GET`` get tunnel offload info ===================================== ================================
Kernel to userspace: @@@ -240,6 -240,7 +241,7 @@@ ``ETHTOOL_MSG_TSINFO_GET_REPLY`` timestamping info ``ETHTOOL_MSG_CABLE_TEST_NTF`` Cable test results ``ETHTOOL_MSG_CABLE_TEST_TDR_NTF`` Cable test TDR results + ``ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY`` tunnel offload info ===================================== =================================
``GET`` requests are sent by userspace applications to retrieve device @@@ -990,18 -991,8 +992,18 @@@ Kernel response contents ``ETHTOOL_A_PAUSE_AUTONEG`` bool pause autonegotiation ``ETHTOOL_A_PAUSE_RX`` bool receive pause frames ``ETHTOOL_A_PAUSE_TX`` bool transmit pause frames + ``ETHTOOL_A_PAUSE_STATS`` nested pause statistics ===================================== ====== ==========================
+``ETHTOOL_A_PAUSE_STATS`` are reported if ``ETHTOOL_FLAG_STATS`` was set +in ``ETHTOOL_A_HEADER_FLAGS``. +It will be empty if driver did not report any statistics. Drivers fill in +the statistics in the following structure: + +.. kernel-doc:: include/linux/ethtool.h + :identifiers: ethtool_pause_stats + +Each member has a corresponding attribute defined.
PAUSE_SET ============ @@@ -1374,4 -1365,5 +1376,5 @@@ are netlink only ``ETHTOOL_SFECPARAM`` n/a n/a ''ETHTOOL_MSG_CABLE_TEST_ACT'' n/a ''ETHTOOL_MSG_CABLE_TEST_TDR_ACT'' + n/a ``ETHTOOL_MSG_TUNNEL_INFO_GET`` =================================== ===================================== diff --combined MAINTAINERS index e3c1c70057e4,9350506a1127..42c69d2eeece --- a/MAINTAINERS +++ b/MAINTAINERS @@@ -1286,7 -1286,7 +1286,7 @@@ S: Supporte F: Documentation/devicetree/bindings/net/apm-xgene-enet.txt F: Documentation/devicetree/bindings/net/apm-xgene-mdio.txt F: drivers/net/ethernet/apm/xgene/ -F: drivers/net/phy/mdio-xgene.c +F: drivers/net/mdio/mdio-xgene.c
APPLIED MICRO (APM) X-GENE SOC PMU M: Khuong Dinh khuong@os.amperecomputing.com @@@ -1694,7 -1694,6 +1694,6 @@@ F: arch/arm/mach-cns3xxx
ARM/CAVIUM THUNDER NETWORK DRIVER M: Sunil Goutham sgoutham@marvell.com - M: Robert Richter rrichter@marvell.com L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Supported F: drivers/net/ethernet/cavium/thunder/ @@@ -3948,8 -3947,8 +3947,8 @@@ W: https://wireless.wiki.kernel.org/en/ F: drivers/net/wireless/ath/carl9170/
CAVIUM I2C DRIVER - M: Robert Richter rrichter@marvell.com - S: Supported + M: Robert Richter rric@kernel.org + S: Odd Fixes W: http://www.marvell.com F: drivers/i2c/busses/i2c-octeon* F: drivers/i2c/busses/i2c-thunderx* @@@ -3964,8 -3963,8 +3963,8 @@@ W: http://www.marvell.co F: drivers/net/ethernet/cavium/liquidio/
CAVIUM MMC DRIVER - M: Robert Richter rrichter@marvell.com - S: Supported + M: Robert Richter rric@kernel.org + S: Odd Fixes W: http://www.marvell.com F: drivers/mmc/host/cavium*
@@@ -3977,9 -3976,9 +3976,9 @@@ W: http://www.marvell.co F: drivers/crypto/cavium/cpt/
CAVIUM THUNDERX2 ARM64 SOC - M: Robert Richter rrichter@marvell.com + M: Robert Richter rric@kernel.org L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) - S: Maintained + S: Odd Fixes F: Documentation/devicetree/bindings/arm/cavium-thunder2.txt F: arch/arm64/boot/dts/cavium/thunder2-99xx*
@@@ -4258,6 -4257,8 +4257,8 @@@ S: Maintaine F: .clang-format
CLANG/LLVM BUILD SUPPORT + M: Nathan Chancellor natechancellor@gmail.com + M: Nick Desaulniers ndesaulniers@google.com L: clang-built-linux@googlegroups.com S: Supported W: https://clangbuiltlinux.github.io/ @@@ -4407,12 -4408,6 +4408,6 @@@ T: git git://git.infradead.org/users/hc F: fs/configfs/ F: include/linux/configfs.h
- CONNECTOR - M: Evgeniy Polyakov zbr@ioremap.net - L: netdev@vger.kernel.org - S: Maintained - F: drivers/connector/ - CONSOLE SUBSYSTEM M: Greg Kroah-Hartman gregkh@linuxfoundation.org S: Supported @@@ -4709,15 -4704,6 +4704,15 @@@ S: Supporte W: http://www.chelsio.com F: drivers/crypto/chelsio
+CXGB4 INLINE CRYPTO DRIVER +M: Ayush Sawal ayush.sawal@chelsio.com +M: Vinay Kumar Yadav vinay.yadav@chelsio.com +M: Rohit Maheshwari rohitm@chelsio.com +L: netdev@vger.kernel.org +S: Supported +W: http://www.chelsio.com +F: drivers/net/ethernet/chelsio/inline_crypto/ + CXGB4 ETHERNET DRIVER (CXGB4) M: Vishal Kulkarni vishal@chelsio.com L: netdev@vger.kernel.org @@@ -6188,7 -6174,7 +6183,7 @@@ F: Documentation/devicetree/bindings/ed F: drivers/edac/aspeed_edac.c
EDAC-BLUEFIELD - M: Shravan Kumar Ramani sramani@nvidia.com + M: Shravan Kumar Ramani shravankr@nvidia.com S: Supported F: drivers/edac/bluefield_edac.c
@@@ -6200,16 -6186,15 +6195,15 @@@ F: drivers/edac/highbank
EDAC-CAVIUM OCTEON M: Ralf Baechle ralf@linux-mips.org - M: Robert Richter rrichter@marvell.com L: linux-edac@vger.kernel.org L: linux-mips@vger.kernel.org S: Supported F: drivers/edac/octeon_edac*
EDAC-CAVIUM THUNDERX - M: Robert Richter rrichter@marvell.com + M: Robert Richter rric@kernel.org L: linux-edac@vger.kernel.org - S: Supported + S: Odd Fixes F: drivers/edac/thunderx_edac*
EDAC-CORE @@@ -6217,7 -6202,7 +6211,7 @@@ M: Borislav Petkov <bp@alien8.de M: Mauro Carvalho Chehab mchehab@kernel.org M: Tony Luck tony.luck@intel.com R: James Morse james.morse@arm.com - R: Robert Richter rrichter@marvell.com + R: Robert Richter rric@kernel.org L: linux-edac@vger.kernel.org S: Supported T: git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next @@@ -6530,14 -6515,11 +6524,14 @@@ F: Documentation/devicetree/bindings/ne F: Documentation/devicetree/bindings/net/mdio* F: Documentation/devicetree/bindings/net/qca,ar803x.yaml F: Documentation/networking/phy.rst +F: drivers/net/mdio/ +F: drivers/net/pcs/ F: drivers/net/phy/ F: drivers/of/of_mdio.c F: drivers/of/of_net.c F: include/dt-bindings/net/qca-ar803x.h F: include/linux/*mdio*.h +F: include/linux/mdio/*.h F: include/linux/of_net.h F: include/linux/phy.h F: include/linux/phy_fixed.h @@@ -6913,6 -6895,14 +6907,14 @@@ L: linuxppc-dev@lists.ozlabs.or S: Maintained F: drivers/dma/fsldma.*
+ FREESCALE DSPI DRIVER + M: Vladimir Oltean olteanv@gmail.com + L: linux-spi@vger.kernel.org + S: Maintained + F: Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt + F: drivers/spi/spi-fsl-dspi.c + F: include/linux/spi/spi-fsl-dspi.h + FREESCALE ENETC ETHERNET DRIVERS M: Claudiu Manoil claudiu.manoil@nxp.com L: netdev@vger.kernel.org @@@ -8284,7 -8274,7 +8286,7 @@@ IA64 (Itanium) PLATFOR M: Tony Luck tony.luck@intel.com M: Fenghua Yu fenghua.yu@intel.com L: linux-ia64@vger.kernel.org - S: Maintained + S: Odd Fixes T: git git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git F: Documentation/ia64/ F: arch/ia64/ @@@ -8333,8 -8323,9 +8335,9 @@@ S: Supporte F: drivers/pci/hotplug/rpaphp*
IBM Power SRIOV Virtual NIC Device Driver - M: Thomas Falcon tlfalcon@linux.ibm.com - M: John Allen jallen@linux.ibm.com + M: Dany Madden drt@linux.ibm.com + M: Lijun Pan ljp@linux.ibm.com + M: Sukadev Bhattiprolu sukadev@linux.ibm.com L: netdev@vger.kernel.org S: Supported F: drivers/net/ethernet/ibm/ibmvnic.* @@@ -8348,7 -8339,7 +8351,7 @@@ F: arch/powerpc/platforms/powernv/copy- F: arch/powerpc/platforms/powernv/vas*
IBM Power Virtual Ethernet Device Driver - M: Thomas Falcon tlfalcon@linux.ibm.com + M: Cristobal Forno cforno12@linux.ibm.com L: netdev@vger.kernel.org S: Supported F: drivers/net/ethernet/ibm/ibmveth.* @@@ -9255,7 -9246,7 +9258,7 @@@ F: drivers/firmware/iscsi_ibft
ISCSI EXTENSIONS FOR RDMA (ISER) INITIATOR M: Sagi Grimberg sagi@grimberg.me - M: Max Gurtovoy maxg@nvidia.com + M: Max Gurtovoy mgurtovoy@nvidia.com L: linux-rdma@vger.kernel.org S: Supported W: http://www.openfabrics.org @@@ -9804,7 -9795,7 +9807,7 @@@ F: drivers/scsi/53c700
LEAKING_ADDRESSES M: Tobin C. Harding me@tobin.cc - M: Tycho Andersen tycho@tycho.ws + M: Tycho Andersen tycho@tycho.pizza L: kernel-hardening@lists.openwall.com S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/tobin/leaks.git @@@ -10310,13 -10301,6 +10313,13 @@@ S: Maintaine W: http://linux-test-project.github.io/ T: git git://github.com/linux-test-project/ltp.git
+LYNX PCS MODULE +M: Ioana Ciornei ioana.ciornei@nxp.com +L: netdev@vger.kernel.org +S: Supported +F: drivers/net/pcs/pcs-lynx.c +F: include/linux/pcs-lynx.h + M68K ARCHITECTURE M: Geert Uytterhoeven geert@linux-m68k.org L: linux-m68k@lists.linux-m68k.org @@@ -10524,7 -10508,7 +10527,7 @@@ M: Tobias Waldekranz <tobias@waldekranz L: netdev@vger.kernel.org S: Maintained F: Documentation/devicetree/bindings/net/marvell,mvusb.yaml -F: drivers/net/phy/mdio-mvusb.c +F: drivers/net/mdio/mdio-mvusb.c
MARVELL XENON MMC/SD/SDIO HOST CONTROLLER DRIVER M: Hu Ziji huziji@marvell.com @@@ -10671,15 -10655,6 +10674,15 @@@ L: linux-input@vger.kernel.or S: Maintained F: drivers/hid/hid-mcp2221.c
+MCP25XXFD SPI-CAN NETWORK DRIVER +M: Marc Kleine-Budde mkl@pengutronix.de +M: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org +R: Thomas Kopp thomas.kopp@microchip.com +L: linux-can@vger.kernel.org +S: Maintained +F: Documentation/devicetree/bindings/net/can/microchip,mcp25xxfd.yaml +F: drivers/net/can/spi/mcp25xxfd/ + MCP4018 AND MCP4531 MICROCHIP DIGITAL POTENTIOMETER DRIVERS M: Peter Rosin peda@axentia.se L: linux-iio@vger.kernel.org @@@ -11062,6 -11037,7 +11065,7 @@@ F: drivers/char/hw_random/mtk-rng.
MEDIATEK SWITCH DRIVER M: Sean Wang sean.wang@mediatek.com + M: Landen Chao Landen.Chao@mediatek.com L: netdev@vger.kernel.org S: Maintained F: drivers/net/dsa/mt7530.* @@@ -12075,6 -12051,7 +12079,7 @@@ Q: http://patchwork.ozlabs.org/project/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git F: Documentation/devicetree/bindings/net/ + F: drivers/connector/ F: drivers/net/ F: include/linux/etherdevice.h F: include/linux/fcdevice.h @@@ -13474,10 -13451,10 +13479,10 @@@ F: Documentation/devicetree/bindings/pc F: drivers/pci/controller/dwc/*artpec*
PCIE DRIVER FOR CAVIUM THUNDERX - M: Robert Richter rrichter@marvell.com + M: Robert Richter rric@kernel.org L: linux-pci@vger.kernel.org L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) - S: Supported + S: Odd Fixes F: drivers/pci/controller/pci-thunder-*
PCIE DRIVER FOR HISILICON @@@ -14416,7 -14393,7 +14421,7 @@@ M: Rob Clark <robdclark@gmail.com L: iommu@lists.linux-foundation.org L: linux-arm-msm@vger.kernel.org S: Maintained - F: drivers/iommu/qcom_iommu.c + F: drivers/iommu/arm/arm-smmu/qcom_iommu.c
QUALCOMM IPCC MAILBOX DRIVER M: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org @@@ -15301,11 -15278,10 +15306,11 @@@ F: drivers/media/platform/s3c-camif F: include/media/drv-intf/s3c_camif.h
SAMSUNG S3FWRN5 NFC DRIVER -M: Robert Baldyga r.baldyga@samsung.com +M: Krzysztof Kozlowski krzk@kernel.org M: Krzysztof Opasiak k.opasiak@samsung.com L: linux-nfc@lists.01.org (moderated for non-subscribers) -S: Supported +S: Maintained +F: Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml F: drivers/nfc/s3fwrn5
SAMSUNG S5C73M3 CAMERA DRIVER @@@ -15598,6 -15574,7 +15603,7 @@@ F: include/uapi/linux/sed SECURITY CONTACT M: Security Officers security@kernel.org S: Supported + F: Documentation/admin-guide/security-bugs.rst
SECURITY SUBSYSTEM M: James Morris jmorris@namei.org @@@ -15689,7 -15666,6 +15695,7 @@@ L: netdev@vger.kernel.or S: Maintained F: drivers/net/phy/phylink.c F: drivers/net/phy/sfp* +F: include/linux/mdio/mdio-i2c.h F: include/linux/phylink.h F: include/linux/sfp.h K: phylink.h|struct\s+phylink|.phylink|>phylink_|phylink_(autoneg|clear|connect|create|destroy|disconnect|ethtool|helper|mac|mii|of|set|start|stop|test|validate) @@@ -16774,8 -16750,8 +16780,8 @@@ SYNOPSYS DESIGNWARE ETHERNET XPCS DRIVE M: Jose Abreu Jose.Abreu@synopsys.com L: netdev@vger.kernel.org S: Supported -F: drivers/net/phy/mdio-xpcs.c -F: include/linux/mdio-xpcs.h +F: drivers/net/pcs/pcs-xpcs.c +F: include/linux/pcs/pcs-xpcs.h
SYNOPSYS DESIGNWARE I2C DRIVER M: Jarkko Nikula jarkko.nikula@linux.intel.com @@@ -17267,8 -17243,8 +17273,8 @@@ S: Maintaine F: drivers/net/thunderbolt.c
THUNDERX GPIO DRIVER - M: Robert Richter rrichter@marvell.com - S: Maintained + M: Robert Richter rric@kernel.org + S: Odd Fixes F: drivers/gpio/gpio-thunderx.c
TI AM437X VPFE DRIVER diff --combined drivers/net/dsa/microchip/ksz9477.c index b62dd64470a8,2f5506ac7d19..153664bf0e20 --- a/drivers/net/dsa/microchip/ksz9477.c +++ b/drivers/net/dsa/microchip/ksz9477.c @@@ -1208,7 -1208,7 +1208,7 @@@ static void ksz9477_port_setup(struct k
/* configure MAC to 1G & RGMII mode */ ksz_pread8(dev, port, REG_PORT_XMII_CTRL_1, &data8); - switch (dev->interface) { + switch (p->interface) { case PHY_INTERFACE_MODE_MII: ksz9477_set_xmii(dev, 0, &data8); ksz9477_set_gbit(dev, false, &data8); @@@ -1229,15 -1229,12 +1229,15 @@@ ksz9477_set_gbit(dev, true, &data8); data8 &= ~PORT_RGMII_ID_IG_ENABLE; data8 &= ~PORT_RGMII_ID_EG_ENABLE; - if (dev->interface == PHY_INTERFACE_MODE_RGMII_ID || - dev->interface == PHY_INTERFACE_MODE_RGMII_RXID) + if (p->interface == PHY_INTERFACE_MODE_RGMII_ID || + p->interface == PHY_INTERFACE_MODE_RGMII_RXID) data8 |= PORT_RGMII_ID_IG_ENABLE; - if (dev->interface == PHY_INTERFACE_MODE_RGMII_ID || - dev->interface == PHY_INTERFACE_MODE_RGMII_TXID) + if (p->interface == PHY_INTERFACE_MODE_RGMII_ID || + p->interface == PHY_INTERFACE_MODE_RGMII_TXID) data8 |= PORT_RGMII_ID_EG_ENABLE; + /* On KSZ9893, disable RGMII in-band status support */ + if (dev->features & IS_9893) + data8 &= ~PORT_MII_MAC_MODE; p->phydev.speed = SPEED_1000; break; } @@@ -1268,37 -1265,36 +1268,46 @@@ static void ksz9477_config_cpu_port(str for (i = 0; i < dev->port_cnt; i++) { if (dsa_is_cpu_port(ds, i) && (dev->cpu_ports & (1 << i))) { phy_interface_t interface; + const char *prev_msg; + const char *prev_mode;
dev->cpu_port = i; dev->host_mask = (1 << dev->cpu_port); dev->port_mask |= dev->host_mask; + p = &dev->ports[i];
/* Read from XMII register to determine host port * interface. If set specifically in device tree * note the difference to help debugging. */ interface = ksz9477_get_interface(dev, i); - if (!dev->interface) - dev->interface = interface; - if (interface && interface != dev->interface) { + if (!p->interface) { + if (dev->compat_interface) { + dev_warn(dev->dev, + "Using legacy switch "phy-mode" property, because it is missing on port %d node. " + "Please update your device tree.\n", + i); + p->interface = dev->compat_interface; + } else { + p->interface = interface; + } + } - if (interface && interface != p->interface) - dev_info(dev->dev, - "use %s instead of %s\n", - phy_modes(p->interface), - phy_modes(interface)); ++ if (interface && interface != p->interface) { + prev_msg = " instead of "; + prev_mode = phy_modes(interface); + } else { + prev_msg = ""; + prev_mode = ""; + } + dev_info(dev->dev, + "Port%d: using phy mode %s%s%s\n", + i, - phy_modes(dev->interface), ++ phy_modes(p->interface), + prev_msg, + prev_mode);
/* enable cpu port */ ksz9477_port_setup(dev, i, true); - p = &dev->ports[dev->cpu_port]; p->vid_member = dev->port_mask; p->on = 1; } @@@ -1439,12 -1435,10 +1448,12 @@@ static int ksz9477_switch_detect(struc /* Default capability is gigabit capable. */ dev->features = GBIT_SUPPORT;
+ dev_dbg(dev->dev, "Switch detect: ID=%08x%02x\n", id32, data8); id_hi = (u8)(id32 >> 16); id_lo = (u8)(id32 >> 8); if ((id_lo & 0xf) == 3) { /* Chip is from KSZ9893 design. */ + dev_info(dev->dev, "Found KSZ9893\n"); dev->features |= IS_9893;
/* Chip does not support gigabit. */ @@@ -1453,7 -1447,6 +1462,7 @@@ dev->mib_port_cnt = 3; dev->phy_port_cnt = 2; } else { + dev_info(dev->dev, "Found KSZ9477 or compatible\n"); /* Chip uses new XMII register definitions. */ dev->features |= NEW_XMII;
diff --combined drivers/net/dsa/microchip/ksz_common.c index a31738662d95,8e755b50c9c1..cb534547c715 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@@ -388,6 -388,8 +388,8 @@@ int ksz_switch_register(struct ksz_devi const struct ksz_dev_ops *ops) { phy_interface_t interface; + struct device_node *port; + unsigned int port_num; int ret;
if (dev->pdata) @@@ -400,9 -402,8 +402,9 @@@
if (dev->reset_gpio) { gpiod_set_value_cansleep(dev->reset_gpio, 1); - mdelay(10); + usleep_range(10000, 12000); gpiod_set_value_cansleep(dev->reset_gpio, 0); + usleep_range(100, 1000); }
mutex_init(&dev->dev_mutex); @@@ -422,10 -423,19 +424,19 @@@ /* Host port interface will be self detected, or specifically set in * device tree. */ + for (port_num = 0; port_num < dev->port_cnt; ++port_num) + dev->ports[port_num].interface = PHY_INTERFACE_MODE_NA; if (dev->dev->of_node) { ret = of_get_phy_mode(dev->dev->of_node, &interface); if (ret == 0) - dev->interface = interface; + dev->compat_interface = interface; + for_each_available_child_of_node(dev->dev->of_node, port) { + if (of_property_read_u32(port, "reg", &port_num)) + continue; + if (port_num >= dev->port_cnt) + return -EINVAL; + of_get_phy_mode(port, &dev->ports[port_num].interface); + } dev->synclko_125 = of_property_read_bool(dev->dev->of_node, "microchip,synclko-125"); } diff --combined drivers/net/dsa/ocelot/felix.c index f9a7034be0c7,01427cd08448..5f395d4119ac --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@@ -19,7 -19,6 +19,7 @@@ #include <linux/of_net.h> #include <linux/pci.h> #include <linux/of.h> +#include <linux/pcs-lynx.h> #include <net/pkt_sched.h> #include <net/dsa.h> #include "felix.h" @@@ -197,16 -196,27 +197,16 @@@ static void felix_phylink_validate(stru felix->info->phylink_validate(ocelot, port, supported, state); }
-static int felix_phylink_mac_pcs_get_state(struct dsa_switch *ds, int port, - struct phylink_link_state *state) -{ - struct ocelot *ocelot = ds->priv; - struct felix *felix = ocelot_to_felix(ocelot); - - if (felix->info->pcs_link_state) - felix->info->pcs_link_state(ocelot, port, state); - - return 0; -} - static void felix_phylink_mac_config(struct dsa_switch *ds, int port, unsigned int link_an_mode, const struct phylink_link_state *state) { struct ocelot *ocelot = ds->priv; struct felix *felix = ocelot_to_felix(ocelot); + struct dsa_port *dp = dsa_to_port(ds, port);
- if (felix->info->pcs_config) - felix->info->pcs_config(ocelot, port, link_an_mode, state); + if (felix->pcs[port]) + phylink_set_pcs(dp->pl, &felix->pcs[port]->pcs); }
static void felix_phylink_mac_link_down(struct dsa_switch *ds, int port, @@@ -296,6 -306,10 +296,6 @@@ static void felix_phylink_mac_link_up(s ocelot_fields_write(ocelot, port, QSYS_SWITCH_PORT_MODE_PORT_ENA, 1);
- if (felix->info->pcs_link_up) - felix->info->pcs_link_up(ocelot, port, link_an_mode, interface, - speed, duplex); - if (felix->info->port_sched_speed_set) felix->info->port_sched_speed_set(ocelot, port, speed); } @@@ -538,6 -552,23 +538,6 @@@ static int felix_init_structs(struct fe return 0; }
-static struct ptp_clock_info ocelot_ptp_clock_info = { - .owner = THIS_MODULE, - .name = "felix ptp", - .max_adj = 0x7fffffff, - .n_alarm = 0, - .n_ext_ts = 0, - .n_per_out = OCELOT_PTP_PINS_NUM, - .n_pins = OCELOT_PTP_PINS_NUM, - .pps = 0, - .gettime64 = ocelot_ptp_gettime64, - .settime64 = ocelot_ptp_settime64, - .adjtime = ocelot_ptp_adjtime, - .adjfine = ocelot_ptp_adjfine, - .verify = ocelot_ptp_verify, - .enable = ocelot_ptp_enable, -}; - /* Hardware initialization done here so that we can allocate structures with * devm without fear of dsa_register_switch returning -EPROBE_DEFER and causing * us to allocate structures twice (leak memory) and map PCI memory twice @@@ -554,9 -585,12 +554,12 @@@ static int felix_setup(struct dsa_switc if (err) return err;
- ocelot_init(ocelot); + err = ocelot_init(ocelot); + if (err) + return err; + if (ocelot->ptp) { - err = ocelot_init_timestamp(ocelot, &ocelot_ptp_clock_info); + err = ocelot_init_timestamp(ocelot, felix->info->ptp_caps); if (err) { dev_err(ocelot->dev, "Timestamp initialization failed\n"); @@@ -596,6 -630,11 +599,6 @@@
ds->mtu_enforcement_ingress = true; ds->configure_vlan_while_not_filtering = true; - /* It looks like the MAC/PCS interrupt register - PM0_IEVENT (0x8040) - * isn't instantiated for the Felix PF. - * In-band AN may take a few ms to complete, so we need to poll. - */ - ds->pcs_poll = true;
return 0; } @@@ -604,10 -643,13 +607,13 @@@ static void felix_teardown(struct dsa_s { struct ocelot *ocelot = ds->priv; struct felix *felix = ocelot_to_felix(ocelot); + int port;
if (felix->info->mdio_bus_free) felix->info->mdio_bus_free(ocelot);
+ for (port = 0; port < ocelot->num_phys_ports; port++) + ocelot_deinit_port(ocelot, port); ocelot_deinit_timestamp(ocelot); /* stop workqueue thread */ ocelot_deinit(ocelot); @@@ -751,6 -793,7 +757,6 @@@ const struct dsa_switch_ops felix_switc .get_sset_count = felix_get_sset_count, .get_ts_info = felix_get_ts_info, .phylink_validate = felix_phylink_validate, - .phylink_mac_link_state = felix_phylink_mac_pcs_get_state, .phylink_mac_config = felix_phylink_mac_config, .phylink_mac_link_down = felix_phylink_mac_link_down, .phylink_mac_link_up = felix_phylink_mac_link_up, @@@ -780,5 -823,31 +786,5 @@@ .cls_flower_add = felix_cls_flower_add, .cls_flower_del = felix_cls_flower_del, .cls_flower_stats = felix_cls_flower_stats, - .port_setup_tc = felix_port_setup_tc, + .port_setup_tc = felix_port_setup_tc, }; - -static int __init felix_init(void) -{ - int err; - - err = pci_register_driver(&felix_vsc9959_pci_driver); - if (err) - return err; - - err = platform_driver_register(&seville_vsc9953_driver); - if (err) - return err; - - return 0; -} -module_init(felix_init); - -static void __exit felix_exit(void) -{ - pci_unregister_driver(&felix_vsc9959_pci_driver); - platform_driver_unregister(&seville_vsc9953_driver); -} -module_exit(felix_exit); - -MODULE_DESCRIPTION("Felix Switch driver"); -MODULE_LICENSE("GPL v2"); diff --combined drivers/net/dsa/ocelot/felix_vsc9959.c index 79ddc4ba27a3,6855c94256f8..3ab6d6847c5b --- a/drivers/net/dsa/ocelot/felix_vsc9959.c +++ b/drivers/net/dsa/ocelot/felix_vsc9959.c @@@ -9,7 -9,6 +9,7 @@@ #include <soc/mscc/ocelot_sys.h> #include <soc/mscc/ocelot.h> #include <linux/packing.h> +#include <linux/pcs-lynx.h> #include <net/pkt_sched.h> #include <linux/iopoll.h> #include <linux/mdio.h> @@@ -296,15 -295,15 +296,15 @@@ static const u32 vsc9959_sys_regmap[] };
static const u32 vsc9959_ptp_regmap[] = { - REG(PTP_PIN_CFG, 0x000000), - REG(PTP_PIN_TOD_SEC_MSB, 0x000004), - REG(PTP_PIN_TOD_SEC_LSB, 0x000008), - REG(PTP_PIN_TOD_NSEC, 0x00000c), - REG(PTP_PIN_WF_HIGH_PERIOD, 0x000014), - REG(PTP_PIN_WF_LOW_PERIOD, 0x000018), - REG(PTP_CFG_MISC, 0x0000a0), - REG(PTP_CLK_CFG_ADJ_CFG, 0x0000a4), - REG(PTP_CLK_CFG_ADJ_FREQ, 0x0000a8), + REG(PTP_PIN_CFG, 0x000000), + REG(PTP_PIN_TOD_SEC_MSB, 0x000004), + REG(PTP_PIN_TOD_SEC_LSB, 0x000008), + REG(PTP_PIN_TOD_NSEC, 0x00000c), + REG(PTP_PIN_WF_HIGH_PERIOD, 0x000014), + REG(PTP_PIN_WF_LOW_PERIOD, 0x000018), + REG(PTP_CFG_MISC, 0x0000a0), + REG(PTP_CLK_CFG_ADJ_CFG, 0x0000a4), + REG(PTP_CLK_CFG_ADJ_FREQ, 0x0000a8), };
static const u32 vsc9959_gcb_regmap[] = { @@@ -646,17 -645,17 +646,17 @@@ static struct vcap_field vsc9959_vcap_i [VCAP_IS2_HK_DIP_EQ_SIP] = {118, 1}, /* IP4_TCP_UDP (TYPE=100) */ [VCAP_IS2_HK_TCP] = {119, 1}, - [VCAP_IS2_HK_L4_SPORT] = {120, 16}, - [VCAP_IS2_HK_L4_DPORT] = {136, 16}, + [VCAP_IS2_HK_L4_DPORT] = {120, 16}, + [VCAP_IS2_HK_L4_SPORT] = {136, 16}, [VCAP_IS2_HK_L4_RNG] = {152, 8}, [VCAP_IS2_HK_L4_SPORT_EQ_DPORT] = {160, 1}, [VCAP_IS2_HK_L4_SEQUENCE_EQ0] = {161, 1}, - [VCAP_IS2_HK_L4_URG] = {162, 1}, - [VCAP_IS2_HK_L4_ACK] = {163, 1}, - [VCAP_IS2_HK_L4_PSH] = {164, 1}, - [VCAP_IS2_HK_L4_RST] = {165, 1}, - [VCAP_IS2_HK_L4_SYN] = {166, 1}, - [VCAP_IS2_HK_L4_FIN] = {167, 1}, + [VCAP_IS2_HK_L4_FIN] = {162, 1}, + [VCAP_IS2_HK_L4_SYN] = {163, 1}, + [VCAP_IS2_HK_L4_RST] = {164, 1}, + [VCAP_IS2_HK_L4_PSH] = {165, 1}, + [VCAP_IS2_HK_L4_ACK] = {166, 1}, + [VCAP_IS2_HK_L4_URG] = {167, 1}, [VCAP_IS2_HK_L4_1588_DOM] = {168, 8}, [VCAP_IS2_HK_L4_1588_VER] = {176, 4}, /* IP4_OTHER (TYPE=101) */ @@@ -719,23 -718,6 +719,23 @@@ static const struct vcap_props vsc9959_ }, };
+static const struct ptp_clock_info vsc9959_ptp_caps = { + .owner = THIS_MODULE, + .name = "felix ptp", + .max_adj = 0x7fffffff, + .n_alarm = 0, + .n_ext_ts = 0, + .n_per_out = OCELOT_PTP_PINS_NUM, + .n_pins = OCELOT_PTP_PINS_NUM, + .pps = 0, + .gettime64 = ocelot_ptp_gettime64, + .settime64 = ocelot_ptp_settime64, + .adjtime = ocelot_ptp_adjtime, + .adjfine = ocelot_ptp_adjfine, + .verify = ocelot_ptp_verify, + .enable = ocelot_ptp_enable, +}; + #define VSC9959_INIT_TIMEOUT 50000 #define VSC9959_GCB_RST_SLEEP 100 #define VSC9959_SYS_RAMINIT_SLEEP 80 @@@ -744,7 -726,7 +744,7 @@@ static int vsc9959_gcb_soft_rst_status( { int val;
- regmap_field_read(ocelot->regfields[GCB_SOFT_RST_SWC_RST], &val); + ocelot_field_read(ocelot, GCB_SOFT_RST_SWC_RST, &val);
return val; } @@@ -754,15 -736,12 +754,15 @@@ static int vsc9959_sys_ram_init_status( return ocelot_read(ocelot, SYS_RAM_INIT); }
+/* CORE_ENA is in SYS:SYSTEM:RESET_CFG + * RAM_INIT is in SYS:RAM_CTRL:RAM_INIT + */ static int vsc9959_reset(struct ocelot *ocelot) { int val, err;
/* soft-reset the switch core */ - regmap_field_write(ocelot->regfields[GCB_SOFT_RST_SWC_RST], 1); + ocelot_field_write(ocelot, GCB_SOFT_RST_SWC_RST, 1);
err = readx_poll_timeout(vsc9959_gcb_soft_rst_status, ocelot, val, !val, VSC9959_GCB_RST_SLEEP, VSC9959_INIT_TIMEOUT); @@@ -782,11 -761,352 +782,11 @@@ }
/* enable switch core */ - regmap_field_write(ocelot->regfields[SYS_RESET_CFG_CORE_ENA], 1); + ocelot_field_write(ocelot, SYS_RESET_CFG_CORE_ENA, 1);
return 0; }
-/* We enable SGMII AN only when the PHY has managed = "in-band-status" in the - * device tree. If we are in MLO_AN_PHY mode, we program directly state->speed - * into the PCS, which is retrieved out-of-band over MDIO. This also has the - * benefit of working with SGMII fixed-links, like downstream switches, where - * both link partners attempt to operate as AN slaves and therefore AN never - * completes. But it also has the disadvantage that some PHY chips don't pass - * traffic if SGMII AN is enabled but not completed (acknowledged by us), so - * setting MLO_AN_INBAND is actually required for those. - */ -static void vsc9959_pcs_config_sgmii(struct phy_device *pcs, - unsigned int link_an_mode, - const struct phylink_link_state *state) -{ - int bmsr, bmcr; - - /* Some PHYs like VSC8234 don't like it when AN restarts on - * their system side and they restart line side AN too, going - * into an endless link up/down loop. Don't restart PCS AN if - * link is up already. - * We do check that AN is enabled just in case this is the 1st - * call, PCS detects a carrier but AN is disabled from power on - * or by boot loader. - */ - bmcr = phy_read(pcs, MII_BMCR); - if (bmcr < 0) - return; - - bmsr = phy_read(pcs, MII_BMSR); - if (bmsr < 0) - return; - - if ((bmcr & BMCR_ANENABLE) && (bmsr & BMSR_LSTATUS)) - return; - - /* SGMII spec requires tx_config_Reg[15:0] to be exactly 0x4001 - * for the MAC PCS in order to acknowledge the AN. - */ - phy_write(pcs, MII_ADVERTISE, ADVERTISE_SGMII | - ADVERTISE_LPACK); - - phy_write(pcs, ENETC_PCS_IF_MODE, - ENETC_PCS_IF_MODE_SGMII_EN | - ENETC_PCS_IF_MODE_USE_SGMII_AN); - - /* Adjust link timer for SGMII */ - phy_write(pcs, ENETC_PCS_LINK_TIMER1, - ENETC_PCS_LINK_TIMER1_VAL); - phy_write(pcs, ENETC_PCS_LINK_TIMER2, - ENETC_PCS_LINK_TIMER2_VAL); - - phy_set_bits(pcs, MII_BMCR, BMCR_ANENABLE); -} - -static void vsc9959_pcs_config_usxgmii(struct phy_device *pcs, - unsigned int link_an_mode, - const struct phylink_link_state *state) -{ - /* Configure device ability for the USXGMII Replicator */ - phy_write_mmd(pcs, MDIO_MMD_VEND2, MII_ADVERTISE, - MDIO_USXGMII_2500FULL | - MDIO_USXGMII_LINK | - ADVERTISE_SGMII | - ADVERTISE_LPACK); -} - -void vsc9959_pcs_config(struct ocelot *ocelot, int port, - unsigned int link_an_mode, - const struct phylink_link_state *state) -{ - struct felix *felix = ocelot_to_felix(ocelot); - struct phy_device *pcs = felix->pcs[port]; - - if (!pcs) - return; - - /* The PCS does not implement the BMSR register fully, so capability - * detection via genphy_read_abilities does not work. Since we can get - * the PHY config word from the LPA register though, there is still - * value in using the generic phy_resolve_aneg_linkmode function. So - * populate the supported and advertising link modes manually here. 
- */ - linkmode_set_bit_array(phy_basic_ports_array, - ARRAY_SIZE(phy_basic_ports_array), - pcs->supported); - linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, pcs->supported); - linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, pcs->supported); - linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, pcs->supported); - linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, pcs->supported); - linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT, pcs->supported); - linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, pcs->supported); - if (pcs->interface == PHY_INTERFACE_MODE_2500BASEX || - pcs->interface == PHY_INTERFACE_MODE_USXGMII) - linkmode_set_bit(ETHTOOL_LINK_MODE_2500baseX_Full_BIT, - pcs->supported); - if (pcs->interface != PHY_INTERFACE_MODE_2500BASEX) - linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, - pcs->supported); - phy_advertise_supported(pcs); - - if (!phylink_autoneg_inband(link_an_mode)) - return; - - switch (pcs->interface) { - case PHY_INTERFACE_MODE_SGMII: - case PHY_INTERFACE_MODE_QSGMII: - vsc9959_pcs_config_sgmii(pcs, link_an_mode, state); - break; - case PHY_INTERFACE_MODE_2500BASEX: - phydev_err(pcs, "AN not supported on 3.125GHz SerDes lane\n"); - break; - case PHY_INTERFACE_MODE_USXGMII: - vsc9959_pcs_config_usxgmii(pcs, link_an_mode, state); - break; - default: - dev_err(ocelot->dev, "Unsupported link mode %s\n", - phy_modes(pcs->interface)); - } -} - -static void vsc9959_pcs_link_up_sgmii(struct phy_device *pcs, - unsigned int link_an_mode, - int speed, int duplex) -{ - u16 if_mode = ENETC_PCS_IF_MODE_SGMII_EN; - - switch (speed) { - case SPEED_1000: - if_mode |= ENETC_PCS_IF_MODE_SGMII_SPEED(ENETC_PCS_SPEED_1000); - break; - case SPEED_100: - if_mode |= ENETC_PCS_IF_MODE_SGMII_SPEED(ENETC_PCS_SPEED_100); - break; - case SPEED_10: - if_mode |= ENETC_PCS_IF_MODE_SGMII_SPEED(ENETC_PCS_SPEED_10); - break; - default: - phydev_err(pcs, "Invalid PCS speed %d\n", speed); - return; - } - - if (duplex == DUPLEX_HALF) - if_mode |= ENETC_PCS_IF_MODE_DUPLEX_HALF; - - phy_write(pcs, ENETC_PCS_IF_MODE, if_mode); - phy_clear_bits(pcs, MII_BMCR, BMCR_ANENABLE); -} - -/* 2500Base-X is SerDes protocol 7 on Felix and 6 on ENETC. It is a SerDes lane - * clocked at 3.125 GHz which encodes symbols with 8b/10b and does not have - * auto-negotiation of any link parameters. Electrically it is compatible with - * a single lane of XAUI. - * The hardware reference manual wants to call this mode SGMII, but it isn't - * really, since the fundamental features of SGMII: - * - Downgrading the link speed by duplicating symbols - * - Auto-negotiation - * are not there. - * The speed is configured at 1000 in the IF_MODE and BMCR MDIO registers - * because the clock frequency is actually given by a PLL configured in the - * Reset Configuration Word (RCW). - * Since there is no difference between fixed speed SGMII w/o AN and 802.3z w/o - * AN, we call this PHY interface type 2500Base-X. In case a PHY negotiates a - * lower link speed on line side, the system-side interface remains fixed at - * 2500 Mbps and we do rate adaptation through pause frames. 
- */ -static void vsc9959_pcs_link_up_2500basex(struct phy_device *pcs, - unsigned int link_an_mode, - int speed, int duplex) -{ - u16 if_mode = ENETC_PCS_IF_MODE_SGMII_SPEED(ENETC_PCS_SPEED_2500) | - ENETC_PCS_IF_MODE_SGMII_EN; - - if (duplex == DUPLEX_HALF) - if_mode |= ENETC_PCS_IF_MODE_DUPLEX_HALF; - - phy_write(pcs, ENETC_PCS_IF_MODE, if_mode); - phy_clear_bits(pcs, MII_BMCR, BMCR_ANENABLE); -} - -void vsc9959_pcs_link_up(struct ocelot *ocelot, int port, - unsigned int link_an_mode, - phy_interface_t interface, - int speed, int duplex) -{ - struct felix *felix = ocelot_to_felix(ocelot); - struct phy_device *pcs = felix->pcs[port]; - - if (!pcs) - return; - - if (phylink_autoneg_inband(link_an_mode)) - return; - - switch (interface) { - case PHY_INTERFACE_MODE_SGMII: - case PHY_INTERFACE_MODE_QSGMII: - vsc9959_pcs_link_up_sgmii(pcs, link_an_mode, speed, duplex); - break; - case PHY_INTERFACE_MODE_2500BASEX: - vsc9959_pcs_link_up_2500basex(pcs, link_an_mode, speed, - duplex); - break; - case PHY_INTERFACE_MODE_USXGMII: - phydev_err(pcs, "USXGMII only supports in-band AN for now\n"); - break; - default: - dev_err(ocelot->dev, "Unsupported link mode %s\n", - phy_modes(pcs->interface)); - } -} - -static void vsc9959_pcs_link_state_resolve(struct phy_device *pcs, - struct phylink_link_state *state) -{ - state->an_complete = pcs->autoneg_complete; - state->an_enabled = pcs->autoneg; - state->link = pcs->link; - state->duplex = pcs->duplex; - state->speed = pcs->speed; - /* SGMII AN does not negotiate flow control, but that's ok, - * since phylink already knows that, and does: - * link_state.pause |= pl->phy_state.pause; - */ - state->pause = MLO_PAUSE_NONE; - - phydev_dbg(pcs, - "mode=%s/%s/%s adv=%*pb lpa=%*pb link=%u an_enabled=%u an_complete=%u\n", - phy_modes(pcs->interface), - phy_speed_to_str(pcs->speed), - phy_duplex_to_str(pcs->duplex), - __ETHTOOL_LINK_MODE_MASK_NBITS, pcs->advertising, - __ETHTOOL_LINK_MODE_MASK_NBITS, pcs->lp_advertising, - pcs->link, pcs->autoneg, pcs->autoneg_complete); -} - -static void vsc9959_pcs_link_state_sgmii(struct phy_device *pcs, - struct phylink_link_state *state) -{ - int err; - - err = genphy_update_link(pcs); - if (err < 0) - return; - - if (pcs->autoneg_complete) { - u16 lpa = phy_read(pcs, MII_LPA); - - mii_lpa_to_linkmode_lpa_sgmii(pcs->lp_advertising, lpa); - - phy_resolve_aneg_linkmode(pcs); - } -} - -static void vsc9959_pcs_link_state_2500basex(struct phy_device *pcs, - struct phylink_link_state *state) -{ - int err; - - err = genphy_update_link(pcs); - if (err < 0) - return; - - pcs->speed = SPEED_2500; - pcs->asym_pause = true; - pcs->pause = true; -} - -static void vsc9959_pcs_link_state_usxgmii(struct phy_device *pcs, - struct phylink_link_state *state) -{ - int status, lpa; - - status = phy_read_mmd(pcs, MDIO_MMD_VEND2, MII_BMSR); - if (status < 0) - return; - - pcs->autoneg = true; - pcs->autoneg_complete = !!(status & BMSR_ANEGCOMPLETE); - pcs->link = !!(status & BMSR_LSTATUS); - - if (!pcs->link || !pcs->autoneg_complete) - return; - - lpa = phy_read_mmd(pcs, MDIO_MMD_VEND2, MII_LPA); - if (lpa < 0) - return; - - switch (lpa & MDIO_USXGMII_SPD_MASK) { - case MDIO_USXGMII_10: - pcs->speed = SPEED_10; - break; - case MDIO_USXGMII_100: - pcs->speed = SPEED_100; - break; - case MDIO_USXGMII_1000: - pcs->speed = SPEED_1000; - break; - case MDIO_USXGMII_2500: - pcs->speed = SPEED_2500; - break; - default: - break; - } - - if (lpa & MDIO_USXGMII_FULL_DUPLEX) - pcs->duplex = DUPLEX_FULL; - else - pcs->duplex = DUPLEX_HALF; -} - -void 
vsc9959_pcs_link_state(struct ocelot *ocelot, int port, - struct phylink_link_state *state) -{ - struct felix *felix = ocelot_to_felix(ocelot); - struct phy_device *pcs = felix->pcs[port]; - - if (!pcs) - return; - - pcs->speed = SPEED_UNKNOWN; - pcs->duplex = DUPLEX_UNKNOWN; - pcs->pause = 0; - pcs->asym_pause = 0; - - switch (pcs->interface) { - case PHY_INTERFACE_MODE_SGMII: - case PHY_INTERFACE_MODE_QSGMII: - vsc9959_pcs_link_state_sgmii(pcs, state); - break; - case PHY_INTERFACE_MODE_2500BASEX: - vsc9959_pcs_link_state_2500basex(pcs, state); - break; - case PHY_INTERFACE_MODE_USXGMII: - vsc9959_pcs_link_state_usxgmii(pcs, state); - break; - default: - return; - } - - vsc9959_pcs_link_state_resolve(pcs, state); -} - static void vsc9959_phylink_validate(struct ocelot *ocelot, int port, unsigned long *supported, struct phylink_link_state *state) @@@ -875,7 -1195,7 +875,7 @@@ static int vsc9959_mdio_bus_alloc(struc int rc;
felix->pcs = devm_kcalloc(dev, felix->info->num_ports, - sizeof(struct phy_device *), + sizeof(struct lynx_pcs *), GFP_KERNEL); if (!felix->pcs) { dev_err(dev, "failed to allocate array for PCS PHYs\n"); @@@ -926,26 -1246,18 +926,26 @@@
for (port = 0; port < felix->info->num_ports; port++) { struct ocelot_port *ocelot_port = ocelot->ports[port]; - struct phy_device *pcs; - bool is_c45 = false; + struct mdio_device *pcs; + struct lynx_pcs *lynx;
- if (ocelot_port->phy_mode == PHY_INTERFACE_MODE_USXGMII) - is_c45 = true; + if (dsa_is_unused_port(felix->ds, port)) + continue; + + if (ocelot_port->phy_mode == PHY_INTERFACE_MODE_INTERNAL) + continue;
- pcs = get_phy_device(felix->imdio, port, is_c45); + pcs = mdio_device_create(felix->imdio, port); if (IS_ERR(pcs)) continue;
- pcs->interface = ocelot_port->phy_mode; - felix->pcs[port] = pcs; + lynx = lynx_pcs_create(pcs); + if (!lynx) { + mdio_device_free(pcs); + continue; + } + + felix->pcs[port] = lynx;
dev_info(dev, "Found PCS at internal MDIO address %d\n", port); } @@@ -953,19 -1265,18 +953,19 @@@ return 0; }
-void vsc9959_mdio_bus_free(struct ocelot *ocelot) +static void vsc9959_mdio_bus_free(struct ocelot *ocelot) { struct felix *felix = ocelot_to_felix(ocelot); int port;
for (port = 0; port < ocelot->num_phys_ports; port++) { - struct phy_device *pcs = felix->pcs[port]; + struct lynx_pcs *pcs = felix->pcs[port];
if (!pcs) continue;
- put_device(&pcs->mdio.dev); + mdio_device_free(pcs->mdio); + lynx_pcs_destroy(pcs); } mdiobus_unregister(felix->imdio); } @@@ -1186,13 -1497,15 +1186,13 @@@ static const struct felix_info felix_in .num_tx_queues = FELIX_NUM_TC, .switch_pci_bar = 4, .imdio_pci_bar = 0, + .ptp_caps = &vsc9959_ptp_caps, .mdio_bus_alloc = vsc9959_mdio_bus_alloc, .mdio_bus_free = vsc9959_mdio_bus_free, - .pcs_config = vsc9959_pcs_config, - .pcs_link_up = vsc9959_pcs_link_up, - .pcs_link_state = vsc9959_pcs_link_state, .phylink_validate = vsc9959_phylink_validate, .prevalidate_phy_mode = vsc9959_prevalidate_phy_mode, - .port_setup_tc = vsc9959_port_setup_tc, - .port_sched_speed_set = vsc9959_sched_speed_set, + .port_setup_tc = vsc9959_port_setup_tc, + .port_sched_speed_set = vsc9959_sched_speed_set, .xmit_template_populate = vsc9959_xmit_template_populate, };
@@@ -1328,13 -1641,9 +1328,13 @@@ static struct pci_device_id felix_ids[ }; MODULE_DEVICE_TABLE(pci, felix_ids);
-struct pci_driver felix_vsc9959_pci_driver = { +static struct pci_driver felix_vsc9959_pci_driver = { .name = "mscc_felix", .id_table = felix_ids, .probe = felix_pci_probe, .remove = felix_pci_remove, }; +module_pci_driver(felix_vsc9959_pci_driver); + +MODULE_DESCRIPTION("Felix Switch driver"); +MODULE_LICENSE("GPL v2"); diff --combined drivers/net/dsa/ocelot/seville_vsc9953.c index 12c7fb9c2c3f,29df0797ecf5..b0ff90c0ae16 --- a/drivers/net/dsa/ocelot/seville_vsc9953.c +++ b/drivers/net/dsa/ocelot/seville_vsc9953.c @@@ -7,7 -7,6 +7,7 @@@ #include <soc/mscc/ocelot_sys.h> #include <soc/mscc/ocelot.h> #include <linux/of_platform.h> +#include <linux/pcs-lynx.h> #include <linux/packing.h> #include <linux/iopoll.h> #include "felix.h" @@@ -16,12 -15,23 +16,12 @@@ #define VSC9953_VCAP_IS2_ENTRY_WIDTH 376 #define VSC9953_VCAP_PORT_CNT 10
-#define MSCC_MIIM_REG_STATUS 0x0 -#define MSCC_MIIM_STATUS_STAT_BUSY BIT(3) -#define MSCC_MIIM_REG_CMD 0x8 -#define MSCC_MIIM_CMD_OPR_WRITE BIT(1) -#define MSCC_MIIM_CMD_OPR_READ BIT(2) -#define MSCC_MIIM_CMD_WRDATA_SHIFT 4 -#define MSCC_MIIM_CMD_REGAD_SHIFT 20 -#define MSCC_MIIM_CMD_PHYAD_SHIFT 25 -#define MSCC_MIIM_CMD_VLD BIT(31) -#define MSCC_MIIM_REG_DATA 0xC -#define MSCC_MIIM_DATA_ERROR (BIT(16) | BIT(17)) - -#define MSCC_PHY_REG_PHY_CFG 0x0 -#define PHY_CFG_PHY_ENA (BIT(0) | BIT(1) | BIT(2) | BIT(3)) -#define PHY_CFG_PHY_COMMON_RESET BIT(4) -#define PHY_CFG_PHY_RESET (BIT(5) | BIT(6) | BIT(7) | BIT(8)) -#define MSCC_PHY_REG_PHY_STATUS 0x4 +#define MSCC_MIIM_CMD_OPR_WRITE BIT(1) +#define MSCC_MIIM_CMD_OPR_READ BIT(2) +#define MSCC_MIIM_CMD_WRDATA_SHIFT 4 +#define MSCC_MIIM_CMD_REGAD_SHIFT 20 +#define MSCC_MIIM_CMD_PHYAD_SHIFT 25 +#define MSCC_MIIM_CMD_VLD BIT(31)
static const u32 vsc9953_ana_regmap[] = { REG(ANA_ADVLEARN, 0x00b500), @@@ -649,17 -659,17 +649,17 @@@ static struct vcap_field vsc9953_vcap_i [VCAP_IS2_HK_DIP_EQ_SIP] = {122, 1}, /* IP4_TCP_UDP (TYPE=100) */ [VCAP_IS2_HK_TCP] = {123, 1}, - [VCAP_IS2_HK_L4_SPORT] = {124, 16}, - [VCAP_IS2_HK_L4_DPORT] = {140, 16}, + [VCAP_IS2_HK_L4_DPORT] = {124, 16}, + [VCAP_IS2_HK_L4_SPORT] = {140, 16}, [VCAP_IS2_HK_L4_RNG] = {156, 8}, [VCAP_IS2_HK_L4_SPORT_EQ_DPORT] = {164, 1}, [VCAP_IS2_HK_L4_SEQUENCE_EQ0] = {165, 1}, - [VCAP_IS2_HK_L4_URG] = {166, 1}, - [VCAP_IS2_HK_L4_ACK] = {167, 1}, - [VCAP_IS2_HK_L4_PSH] = {168, 1}, - [VCAP_IS2_HK_L4_RST] = {169, 1}, - [VCAP_IS2_HK_L4_SYN] = {170, 1}, - [VCAP_IS2_HK_L4_FIN] = {171, 1}, + [VCAP_IS2_HK_L4_FIN] = {166, 1}, + [VCAP_IS2_HK_L4_SYN] = {167, 1}, + [VCAP_IS2_HK_L4_RST] = {168, 1}, + [VCAP_IS2_HK_L4_PSH] = {169, 1}, + [VCAP_IS2_HK_L4_ACK] = {170, 1}, + [VCAP_IS2_HK_L4_URG] = {171, 1}, /* IP4_OTHER (TYPE=101) */ [VCAP_IS2_HK_IP4_L3_PROTO] = {123, 8}, [VCAP_IS2_HK_L3_PAYLOAD] = {131, 56}, @@@ -809,10 -819,6 +809,10 @@@ out return err; }
+/* CORE_ENA is in SYS:SYSTEM:RESET_CFG + * MEM_INIT is in SYS:SYSTEM:RESET_CFG + * MEM_ENA is in SYS:SYSTEM:RESET_CFG + */ static int vsc9953_reset(struct ocelot *ocelot) { int val, err; @@@ -828,8 -834,8 +828,8 @@@ }
/* initialize switch mem ~40us */ - ocelot_field_write(ocelot, SYS_RESET_CFG_MEM_INIT, 1); ocelot_field_write(ocelot, SYS_RESET_CFG_MEM_ENA, 1); + ocelot_field_write(ocelot, SYS_RESET_CFG_MEM_INIT, 1);
err = readx_poll_timeout(vsc9953_sys_ram_init_status, ocelot, val, !val, VSC9953_SYS_RAMINIT_SLEEP, @@@ -840,6 -846,7 +840,6 @@@ }
/* enable switch core */ - ocelot_field_write(ocelot, SYS_RESET_CFG_MEM_ENA, 1); ocelot_field_write(ocelot, SYS_RESET_CFG_CORE_ENA, 1);
return 0; @@@ -953,27 -960,18 +953,27 @@@ static int vsc9953_mdio_bus_alloc(struc
for (port = 0; port < felix->info->num_ports; port++) { struct ocelot_port *ocelot_port = ocelot->ports[port]; - struct phy_device *pcs; int addr = port + 4; + struct mdio_device *pcs; + struct lynx_pcs *lynx; + + if (dsa_is_unused_port(felix->ds, port)) + continue;
if (ocelot_port->phy_mode == PHY_INTERFACE_MODE_INTERNAL) continue;
- pcs = get_phy_device(felix->imdio, addr, false); + pcs = mdio_device_create(felix->imdio, addr); if (IS_ERR(pcs)) continue;
- pcs->interface = ocelot_port->phy_mode; - felix->pcs[port] = pcs; + lynx = lynx_pcs_create(pcs); + if (!lynx) { + mdio_device_free(pcs); + continue; + } + + felix->pcs[port] = lynx;
dev_info(dev, "Found PCS at internal MDIO address %d\n", addr); } @@@ -981,23 -979,6 +981,23 @@@ return 0; }
+static void vsc9953_mdio_bus_free(struct ocelot *ocelot) +{ + struct felix *felix = ocelot_to_felix(ocelot); + int port; + + for (port = 0; port < ocelot->num_phys_ports; port++) { + struct lynx_pcs *pcs = felix->pcs[port]; + + if (!pcs) + continue; + + mdio_device_free(pcs->mdio); + lynx_pcs_destroy(pcs); + } + mdiobus_unregister(felix->imdio); +} + static void vsc9953_xmit_template_populate(struct ocelot *ocelot, int port) { struct ocelot_port *ocelot_port = ocelot->ports[port]; @@@ -1027,11 -1008,14 +1027,11 @@@ static const struct felix_info seville_ .vcap_is2_keys = vsc9953_vcap_is2_keys, .vcap_is2_actions = vsc9953_vcap_is2_actions, .vcap = vsc9953_vcap_props, - .shared_queue_sz = 128 * 1024, + .shared_queue_sz = 2048 * 1024, .num_mact_rows = 2048, .num_ports = 10, .mdio_bus_alloc = vsc9953_mdio_bus_alloc, - .mdio_bus_free = vsc9959_mdio_bus_free, - .pcs_config = vsc9959_pcs_config, - .pcs_link_up = vsc9959_pcs_link_up, - .pcs_link_state = vsc9959_pcs_link_state, + .mdio_bus_free = vsc9953_mdio_bus_free, .phylink_validate = vsc9953_phylink_validate, .prevalidate_phy_mode = vsc9953_prevalidate_phy_mode, .xmit_template_populate = vsc9953_xmit_template_populate, @@@ -1110,7 -1094,7 +1110,7 @@@ static const struct of_device_id sevill }; MODULE_DEVICE_TABLE(of, seville_of_match);
-struct platform_driver seville_vsc9953_driver = { +static struct platform_driver seville_vsc9953_driver = { .probe = seville_probe, .remove = seville_remove, .driver = { @@@ -1118,7 -1102,3 +1118,7 @@@ .of_match_table = of_match_ptr(seville_of_match), }, }; +module_platform_driver(seville_vsc9953_driver); + +MODULE_DESCRIPTION("Seville Switch driver"); +MODULE_LICENSE("GPL v2"); diff --combined drivers/net/dsa/rtl8366.c index 7c09ed747bc0,a8c5a934c3d3..c58ca324a4b2 --- a/drivers/net/dsa/rtl8366.c +++ b/drivers/net/dsa/rtl8366.c @@@ -36,113 -36,12 +36,113 @@@ int rtl8366_mc_is_used(struct realtek_s } EXPORT_SYMBOL_GPL(rtl8366_mc_is_used);
+/** + * rtl8366_obtain_mc() - retrieve or allocate a VLAN member configuration + * @smi: the Realtek SMI device instance + * @vid: the VLAN ID to look up or allocate + * @vlanmc: the pointer will be assigned to a pointer to a valid member config + * if successful + * @return: index of a new member config or negative error number + */ +static int rtl8366_obtain_mc(struct realtek_smi *smi, int vid, + struct rtl8366_vlan_mc *vlanmc) +{ + struct rtl8366_vlan_4k vlan4k; + int ret; + int i; + + /* Try to find an existing member config entry for this VID */ + for (i = 0; i < smi->num_vlan_mc; i++) { + ret = smi->ops->get_vlan_mc(smi, i, vlanmc); + if (ret) { + dev_err(smi->dev, "error searching for VLAN MC %d for VID %d\n", + i, vid); + return ret; + } + + if (vid == vlanmc->vid) + return i; + } + + /* We have no MC entry for this VID, try to find an empty one */ + for (i = 0; i < smi->num_vlan_mc; i++) { + ret = smi->ops->get_vlan_mc(smi, i, vlanmc); + if (ret) { + dev_err(smi->dev, "error searching for VLAN MC %d for VID %d\n", + i, vid); + return ret; + } + + if (vlanmc->vid == 0 && vlanmc->member == 0) { + /* Update the entry from the 4K table */ + ret = smi->ops->get_vlan_4k(smi, vid, &vlan4k); + if (ret) { + dev_err(smi->dev, "error looking for 4K VLAN MC %d for VID %d\n", + i, vid); + return ret; + } + + vlanmc->vid = vid; + vlanmc->member = vlan4k.member; + vlanmc->untag = vlan4k.untag; + vlanmc->fid = vlan4k.fid; + ret = smi->ops->set_vlan_mc(smi, i, vlanmc); + if (ret) { + dev_err(smi->dev, "unable to set/update VLAN MC %d for VID %d\n", + i, vid); + return ret; + } + + dev_dbg(smi->dev, "created new MC at index %d for VID %d\n", + i, vid); + return i; + } + } + + /* MC table is full, try to find an unused entry and replace it */ + for (i = 0; i < smi->num_vlan_mc; i++) { + int used; + + ret = rtl8366_mc_is_used(smi, i, &used); + if (ret) + return ret; + + if (!used) { + /* Update the entry from the 4K table */ + ret = smi->ops->get_vlan_4k(smi, vid, &vlan4k); + if (ret) + return ret; + + vlanmc->vid = vid; + vlanmc->member = vlan4k.member; + vlanmc->untag = vlan4k.untag; + vlanmc->fid = vlan4k.fid; + ret = smi->ops->set_vlan_mc(smi, i, vlanmc); + if (ret) { + dev_err(smi->dev, "unable to set/update VLAN MC %d for VID %d\n", + i, vid); + return ret; + } + dev_dbg(smi->dev, "recycled MC at index %i for VID %d\n", + i, vid); + return i; + } + } + + dev_err(smi->dev, "all VLAN member configurations are in use\n"); + return -ENOSPC; +} + int rtl8366_set_vlan(struct realtek_smi *smi, int vid, u32 member, u32 untag, u32 fid) { + struct rtl8366_vlan_mc vlanmc; struct rtl8366_vlan_4k vlan4k; + int mc; int ret; - int i; + + if (!smi->ops->is_vlan_valid(smi, vid)) + return -EINVAL;
dev_dbg(smi->dev, "setting VLAN%d 4k members: 0x%02x, untagged: 0x%02x\n", @@@ -164,58 -63,133 +164,58 @@@ "resulting VLAN%d 4k members: 0x%02x, untagged: 0x%02x\n", vid, vlan4k.member, vlan4k.untag);
- /* Try to find an existing MC entry for this VID */ - for (i = 0; i < smi->num_vlan_mc; i++) { - struct rtl8366_vlan_mc vlanmc; - - ret = smi->ops->get_vlan_mc(smi, i, &vlanmc); - if (ret) - return ret; - - if (vid == vlanmc.vid) { - /* update the MC entry */ - vlanmc.member |= member; - vlanmc.untag |= untag; - vlanmc.fid = fid; - - ret = smi->ops->set_vlan_mc(smi, i, &vlanmc); + /* Find or allocate a member config for this VID */ + ret = rtl8366_obtain_mc(smi, vid, &vlanmc); + if (ret < 0) + return ret; + mc = ret;
- dev_dbg(smi->dev, - "resulting VLAN%d MC members: 0x%02x, untagged: 0x%02x\n", - vid, vlanmc.member, vlanmc.untag); + /* Update the MC entry */ + vlanmc.member |= member; + vlanmc.untag |= untag; + vlanmc.fid = fid;
- break; - } - } + /* Commit updates to the MC entry */ + ret = smi->ops->set_vlan_mc(smi, mc, &vlanmc); + if (ret) + dev_err(smi->dev, "failed to commit changes to VLAN MC index %d for VID %d\n", + mc, vid); + else + dev_dbg(smi->dev, + "resulting VLAN%d MC members: 0x%02x, untagged: 0x%02x\n", + vid, vlanmc.member, vlanmc.untag);
return ret; } EXPORT_SYMBOL_GPL(rtl8366_set_vlan);
-int rtl8366_get_pvid(struct realtek_smi *smi, int port, int *val) -{ - struct rtl8366_vlan_mc vlanmc; - int ret; - int index; - - ret = smi->ops->get_mc_index(smi, port, &index); - if (ret) - return ret; - - ret = smi->ops->get_vlan_mc(smi, index, &vlanmc); - if (ret) - return ret; - - *val = vlanmc.vid; - return 0; -} -EXPORT_SYMBOL_GPL(rtl8366_get_pvid); - int rtl8366_set_pvid(struct realtek_smi *smi, unsigned int port, unsigned int vid) { struct rtl8366_vlan_mc vlanmc; - struct rtl8366_vlan_4k vlan4k; + int mc; int ret; - int i; - - /* Try to find an existing MC entry for this VID */ - for (i = 0; i < smi->num_vlan_mc; i++) { - ret = smi->ops->get_vlan_mc(smi, i, &vlanmc); - if (ret) - return ret; - - if (vid == vlanmc.vid) { - ret = smi->ops->set_vlan_mc(smi, i, &vlanmc); - if (ret) - return ret; - - ret = smi->ops->set_mc_index(smi, port, i); - return ret; - } - } - - /* We have no MC entry for this VID, try to find an empty one */ - for (i = 0; i < smi->num_vlan_mc; i++) { - ret = smi->ops->get_vlan_mc(smi, i, &vlanmc); - if (ret) - return ret; - - if (vlanmc.vid == 0 && vlanmc.member == 0) { - /* Update the entry from the 4K table */ - ret = smi->ops->get_vlan_4k(smi, vid, &vlan4k); - if (ret) - return ret; - - vlanmc.vid = vid; - vlanmc.member = vlan4k.member; - vlanmc.untag = vlan4k.untag; - vlanmc.fid = vlan4k.fid; - ret = smi->ops->set_vlan_mc(smi, i, &vlanmc); - if (ret) - return ret; - - ret = smi->ops->set_mc_index(smi, port, i); - return ret; - } - } - - /* MC table is full, try to find an unused entry and replace it */ - for (i = 0; i < smi->num_vlan_mc; i++) { - int used; - - ret = rtl8366_mc_is_used(smi, i, &used); - if (ret) - return ret;
- if (!used) { - /* Update the entry from the 4K table */ - ret = smi->ops->get_vlan_4k(smi, vid, &vlan4k); - if (ret) - return ret; + if (!smi->ops->is_vlan_valid(smi, vid)) + return -EINVAL;
- vlanmc.vid = vid; - vlanmc.member = vlan4k.member; - vlanmc.untag = vlan4k.untag; - vlanmc.fid = vlan4k.fid; - ret = smi->ops->set_vlan_mc(smi, i, &vlanmc); - if (ret) - return ret; + /* Find or allocate a member config for this VID */ + ret = rtl8366_obtain_mc(smi, vid, &vlanmc); + if (ret < 0) + return ret; + mc = ret;
- ret = smi->ops->set_mc_index(smi, port, i); - return ret; - } + ret = smi->ops->set_mc_index(smi, port, mc); + if (ret) { + dev_err(smi->dev, "set PVID: failed to set MC index %d for port %d\n", + mc, port); + return ret; }
- dev_err(smi->dev, - "all VLAN member configurations are in use\n"); + dev_dbg(smi->dev, "set PVID: the PVID for port %d set to %d using existing MC index %d\n", + port, vid, mc);
- return -ENOSPC; + return 0; } EXPORT_SYMBOL_GPL(rtl8366_set_pvid);
@@@ -415,8 -389,7 +415,8 @@@ void rtl8366_vlan_add(struct dsa_switc if (!smi->ops->is_vlan_valid(smi, vid)) return;
- dev_info(smi->dev, "add VLAN on port %d, %s, %s\n", + dev_info(smi->dev, "add VLAN %d on port %d, %s, %s\n", + vlan->vid_begin, port, untagged ? "untagged" : "tagged", pvid ? " PVID" : "no PVID"); @@@ -425,29 -398,34 +425,29 @@@ dev_err(smi->dev, "port is DSA or CPU port\n");
for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) { - int pvid_val = 0; - - dev_info(smi->dev, "add VLAN %04x\n", vid); member |= BIT(port);
if (untagged) untag |= BIT(port);
- /* To ensure that we have a valid MC entry for this VLAN, - * initialize the port VLAN ID here. - */ - ret = rtl8366_get_pvid(smi, port, &pvid_val); - if (ret < 0) { - dev_err(smi->dev, "could not lookup PVID for port %d\n", - port); - return; - } - if (pvid_val == 0) { - ret = rtl8366_set_pvid(smi, port, vid); - if (ret < 0) - return; - } - ret = rtl8366_set_vlan(smi, vid, member, untag, 0); if (ret) dev_err(smi->dev, "failed to set up VLAN %04x", vid); + + if (!pvid) + continue; + + ret = rtl8366_set_pvid(smi, port, vid); + if (ret) + dev_err(smi->dev, + "failed to set PVID on port %d to VLAN %04x", + port, vid); + + if (!ret) + dev_dbg(smi->dev, "VLAN add: added VLAN %d with PVID on port %d\n", + vid, port); } } EXPORT_SYMBOL_GPL(rtl8366_vlan_add); @@@ -474,13 -452,19 +474,19 @@@ int rtl8366_vlan_del(struct dsa_switch return ret;
if (vid == vlanmc.vid) { - /* clear VLAN member configurations */ - vlanmc.vid = 0; - vlanmc.priority = 0; - vlanmc.member = 0; - vlanmc.untag = 0; - vlanmc.fid = 0; - + /* Remove this port from the VLAN */ + vlanmc.member &= ~BIT(port); + vlanmc.untag &= ~BIT(port); + /* + * If no ports are members of this VLAN + * anymore then clear the whole member + * config so it can be reused. + */ + if (!vlanmc.member && vlanmc.untag) { + vlanmc.vid = 0; + vlanmc.priority = 0; + vlanmc.fid = 0; + } ret = smi->ops->set_vlan_mc(smi, i, &vlanmc); if (ret) { dev_err(smi->dev, diff --combined drivers/net/ethernet/broadcom/bnxt/bnxt.c index 53f64ca673c3,7b7e8b7883c8..65c298f1f333 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@@ -3782,6 -3782,7 +3782,7 @@@ static int bnxt_hwrm_func_qstat_ext(str return -EOPNOTSUPP;
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_QSTATS_EXT, -1, -1); + req.fid = cpu_to_le16(0xffff); req.flags = FUNC_QSTATS_EXT_REQ_FLAGS_COUNTER_MASK; mutex_lock(&bp->hwrm_cmd_lock); rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); @@@ -3852,7 -3853,7 +3853,7 @@@ static void bnxt_init_stats(struct bnx tx_masks = stats->hw_masks; tx_count = sizeof(struct tx_port_stats_ext) / 8;
- flags = FUNC_QSTATS_EXT_REQ_FLAGS_COUNTER_MASK; + flags = PORT_QSTATS_EXT_REQ_FLAGS_COUNTER_MASK; rc = bnxt_hwrm_port_qstats_ext(bp, flags); if (rc) { mask = (1ULL << 40) - 1; @@@ -4305,7 -4306,7 +4306,7 @@@ static int bnxt_hwrm_do_send_msg(struc u32 bar_offset = BNXT_GRCPF_REG_CHIMP_COMM; u16 dst = BNXT_HWRM_CHNL_CHIMP;
- if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) + if (BNXT_NO_FW_ACCESS(bp)) return -EBUSY;
if (msg_len > BNXT_HWRM_MAX_REQ_LEN) { @@@ -5723,7 -5724,7 +5724,7 @@@ static int hwrm_ring_free_send_msg(stru struct hwrm_ring_free_output *resp = bp->hwrm_cmd_resp_addr; u16 error_code;
- if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) + if (BNXT_NO_FW_ACCESS(bp)) return 0;
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_RING_FREE, cmpl_ring_id, -1); @@@ -7817,7 -7818,7 +7818,7 @@@ static int bnxt_set_tpa(struct bnxt *bp
if (set_tpa) tpa_flags = bp->flags & BNXT_FLAG_TPA; - else if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) + else if (BNXT_NO_FW_ACCESS(bp)) return 0; for (i = 0; i < bp->nr_vnics; i++) { rc = bnxt_hwrm_vnic_set_tpa(bp, i, tpa_flags); @@@ -8634,9 -8635,10 +8635,9 @@@ static void bnxt_del_napi(struct bnxt * for (i = 0; i < bp->cp_nr_rings; i++) { struct bnxt_napi *bnapi = bp->bnapi[i];
- napi_hash_del(&bnapi->napi); - netif_napi_del(&bnapi->napi); + __netif_napi_del(&bnapi->napi); } - /* We called napi_hash_del() before netif_napi_del(), we need + /* We called __netif_napi_del(), we need * to respect an RCU grace period before freeing napi structures. */ synchronize_net(); @@@ -9310,18 -9312,16 +9311,16 @@@ static ssize_t bnxt_show_temp(struct de struct hwrm_temp_monitor_query_output *resp; struct bnxt *bp = dev_get_drvdata(dev); u32 len = 0; + int rc;
resp = bp->hwrm_cmd_resp_addr; bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_TEMP_MONITOR_QUERY, -1, -1); mutex_lock(&bp->hwrm_cmd_lock); - if (!_hwrm_send_message_silent(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT)) + rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); + if (!rc) len = sprintf(buf, "%u\n", resp->temp * 1000); /* display millidegree */ mutex_unlock(&bp->hwrm_cmd_lock); - - if (len) - return len; - - return sprintf(buf, "unknown\n"); + return rc ?: len; } static SENSOR_DEVICE_ATTR(temp1_input, 0444, bnxt_show_temp, NULL, 0);
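The bnxt_del_napi() change above relies on __netif_napi_del() only unhashing and unlisting the NAPI instance; unlike netif_napi_del() it does not wait for an RCU grace period itself, so the caller batches all deletions and pays for a single synchronize_net() before any napi memory is freed. A minimal sketch of that batched teardown (ring and field names here are hypothetical, not from this driver):

	for (i = 0; i < n_rings; i++)
		__netif_napi_del(&rings[i].napi);	/* unhash + unlist only */

	synchronize_net();	/* one grace period covers every instance */

	for (i = 0; i < n_rings; i++)
		kfree(rings[i].napi_mem);	/* safe to free napi storage only now */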
@@@ -9341,7 -9341,16 +9340,16 @@@ static void bnxt_hwmon_close(struct bnx
static void bnxt_hwmon_open(struct bnxt *bp) { + struct hwrm_temp_monitor_query_input req = {0}; struct pci_dev *pdev = bp->pdev; + int rc; + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_TEMP_MONITOR_QUERY, -1, -1); + rc = hwrm_send_message_silent(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); + if (rc == -EACCES || rc == -EOPNOTSUPP) { + bnxt_hwmon_close(bp); + return; + }
if (bp->hwmon_dev) return; @@@ -11778,6 -11787,10 +11786,10 @@@ static void bnxt_remove_one(struct pci_ if (BNXT_PF(bp)) bnxt_sriov_disable(bp);
+ clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state); + bnxt_cancel_sp_work(bp); + bp->sp_event = 0; + bnxt_dl_fw_reporters_destroy(bp, true); if (BNXT_PF(bp)) devlink_port_type_clear(&bp->dl_port); @@@ -11785,9 -11798,6 +11797,6 @@@ unregister_netdev(dev); bnxt_dl_unregister(bp); bnxt_shutdown_tc(bp); - clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state); - bnxt_cancel_sp_work(bp); - bp->sp_event = 0;
bnxt_clear_int_mode(bp); bnxt_hwrm_func_drv_unrgtr(bp); @@@ -12088,7 -12098,7 +12097,7 @@@ static int bnxt_init_mac_addr(struct bn static void bnxt_vpd_read_info(struct bnxt *bp) { struct pci_dev *pdev = bp->pdev; - int i, len, pos, ro_size; + int i, len, pos, ro_size, size; ssize_t vpd_size; u8 *vpd_data;
@@@ -12123,7 -12133,8 +12132,8 @@@ if (len + pos > vpd_size) goto read_sn;
- strlcpy(bp->board_partno, &vpd_data[pos], min(len, BNXT_VPD_FLD_LEN)); + size = min(len, BNXT_VPD_FLD_LEN - 1); + memcpy(bp->board_partno, &vpd_data[pos], size);
read_sn: pos = pci_vpd_find_info_keyword(vpd_data, i, ro_size, @@@ -12136,7 -12147,8 +12146,8 @@@ if (len + pos > vpd_size) goto exit;
- strlcpy(bp->board_serialno, &vpd_data[pos], min(len, BNXT_VPD_FLD_LEN)); + size = min(len, BNXT_VPD_FLD_LEN - 1); + memcpy(bp->board_serialno, &vpd_data[pos], size); exit: kfree(vpd_data); } diff --combined drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 5a65f28ef771,fecdfd875af1..6a3453f46d9a --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@@ -1322,6 -1322,9 +1322,9 @@@ static int bnxt_get_regs_len(struct net struct bnxt *bp = netdev_priv(dev); int reg_len;
+ if (!BNXT_PF(bp)) + return -EOPNOTSUPP; + reg_len = BNXT_PXP_REG_LEN;
if (bp->fw_cap & BNXT_FW_CAP_PCIE_STATS_SUPPORTED) @@@ -1778,22 -1781,6 +1781,22 @@@ static void bnxt_get_pauseparam(struct epause->tx_pause = !!(link_info->req_flow_ctrl & BNXT_LINK_PAUSE_TX); }
+static void bnxt_get_pause_stats(struct net_device *dev, + struct ethtool_pause_stats *epstat) +{ + struct bnxt *bp = netdev_priv(dev); + u64 *rx, *tx; + + if (BNXT_VF(bp) || !(bp->flags & BNXT_FLAG_PORT_STATS)) + return; + + rx = bp->port_stats.sw_stats; + tx = bp->port_stats.sw_stats + BNXT_TX_PORT_STATS_BYTE_OFFSET / 8; + + epstat->rx_pause_frames = BNXT_GET_RX_PORT_STATS64(rx, rx_pause_frames); + epstat->tx_pause_frames = BNXT_GET_TX_PORT_STATS64(tx, tx_pause_frames); +} + static int bnxt_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam *epause) { @@@ -1804,9 -1791,12 +1807,12 @@@ if (!BNXT_PHY_CFG_ABLE(bp)) return -EOPNOTSUPP;
+ mutex_lock(&bp->link_lock); if (epause->autoneg) { - if (!(link_info->autoneg & BNXT_AUTONEG_SPEED)) - return -EINVAL; + if (!(link_info->autoneg & BNXT_AUTONEG_SPEED)) { + rc = -EINVAL; + goto pause_exit; + }
link_info->autoneg |= BNXT_AUTONEG_FLOW_CTRL; if (bp->hwrm_spec_code >= 0x10201) @@@ -1827,11 -1817,11 +1833,11 @@@ if (epause->tx_pause) link_info->req_flow_ctrl |= BNXT_LINK_PAUSE_TX;
- if (netif_running(dev)) { - mutex_lock(&bp->link_lock); + if (netif_running(dev)) rc = bnxt_hwrm_set_pause(bp); - mutex_unlock(&bp->link_lock); - } + + pause_exit: + mutex_unlock(&bp->link_lock); return rc; }
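bnxt_get_pause_stats() above implements the new ethtool .get_pause_stats operation. The ethtool core pre-initializes every counter to ETHTOOL_STAT_NOT_SET, so a driver assigns only the statistics its hardware actually keeps and leaves the rest untouched. A minimal sketch of such a callback for a hypothetical driver (the foo_* names are illustrative, not part of this patch):

	static void foo_get_pause_stats(struct net_device *dev,
					struct ethtool_pause_stats *stats)
	{
		struct foo_priv *priv = netdev_priv(dev);

		/* counters left at ETHTOOL_STAT_NOT_SET are simply not reported */
		stats->rx_pause_frames = foo_hw_counter(priv, FOO_RX_PAUSE_FRAMES);
		stats->tx_pause_frames = foo_hw_counter(priv, FOO_TX_PAUSE_FRAMES);
	}

The callback is wired up through the driver's struct ethtool_ops, as done for bnxt at the end of this file.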
@@@ -2568,8 -2558,7 +2574,7 @@@ static int bnxt_set_eee(struct net_devi struct bnxt *bp = netdev_priv(dev); struct ethtool_eee *eee = &bp->eee; struct bnxt_link_info *link_info = &bp->link_info; - u32 advertising = - _bnxt_fw_to_ethtool_adv_spds(link_info->advertising, 0); + u32 advertising; int rc = 0;
if (!BNXT_PHY_CFG_ABLE(bp)) @@@ -2578,19 -2567,23 +2583,23 @@@ if (!(bp->flags & BNXT_FLAG_EEE_CAP)) return -EOPNOTSUPP;
+ mutex_lock(&bp->link_lock); + advertising = _bnxt_fw_to_ethtool_adv_spds(link_info->advertising, 0); if (!edata->eee_enabled) goto eee_ok;
if (!(link_info->autoneg & BNXT_AUTONEG_SPEED)) { netdev_warn(dev, "EEE requires autoneg\n"); - return -EINVAL; + rc = -EINVAL; + goto eee_exit; } if (edata->tx_lpi_enabled) { if (bp->lpi_tmr_hi && (edata->tx_lpi_timer > bp->lpi_tmr_hi || edata->tx_lpi_timer < bp->lpi_tmr_lo)) { netdev_warn(dev, "Valid LPI timer range is %d and %d microsecs\n", bp->lpi_tmr_lo, bp->lpi_tmr_hi); - return -EINVAL; + rc = -EINVAL; + goto eee_exit; } else if (!bp->lpi_tmr_hi) { edata->tx_lpi_timer = eee->tx_lpi_timer; } @@@ -2600,7 -2593,8 +2609,8 @@@ } else if (edata->advertised & ~advertising) { netdev_warn(dev, "EEE advertised %x must be a subset of autoneg advertised speeds %x\n", edata->advertised, advertising); - return -EINVAL; + rc = -EINVAL; + goto eee_exit; }
eee->advertised = edata->advertised; @@@ -2612,6 -2606,8 +2622,8 @@@ eee_ok if (netif_running(dev)) rc = bnxt_hwrm_set_link_setting(bp, false, true);
+ eee_exit: + mutex_unlock(&bp->link_lock); return rc; }
@@@ -3661,7 -3657,6 +3673,7 @@@ const struct ethtool_ops bnxt_ethtool_o ETHTOOL_COALESCE_USE_ADAPTIVE_RX, .get_link_ksettings = bnxt_get_link_ksettings, .set_link_ksettings = bnxt_set_link_ksettings, + .get_pause_stats = bnxt_get_pause_stats, .get_pauseparam = bnxt_get_pauseparam, .set_pauseparam = bnxt_set_pauseparam, .get_drvinfo = bnxt_get_drvinfo, diff --combined drivers/net/ethernet/cadence/macb_main.c index 830c537bc08c,9179f7b0b900..9c8f40e8a721 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@@ -647,8 -647,7 +647,7 @@@ static void macb_mac_link_up(struct phy ctrl |= GEM_BIT(GBE); }
- /* We do not support MLO_PAUSE_RX yet */ - if (tx_pause) + if (rx_pause) ctrl |= MACB_BIT(PAE);
macb_set_tx_clk(bp->tx_clk, speed, ndev); @@@ -1466,9 -1465,9 +1465,9 @@@ static int macb_poll(struct napi_struc return work_done; }
-static void macb_hresp_error_task(unsigned long data) +static void macb_hresp_error_task(struct tasklet_struct *t) { - struct macb *bp = (struct macb *)data; + struct macb *bp = from_tasklet(bp, t, hresp_err_tasklet); struct net_device *dev = bp->dev; struct macb_queue *queue; unsigned int q; @@@ -4560,7 -4559,8 +4559,7 @@@ static int macb_probe(struct platform_d goto err_out_unregister_mdio; }
- tasklet_init(&bp->hresp_err_tasklet, macb_hresp_error_task, - (unsigned long)bp); + tasklet_setup(&bp->hresp_err_tasklet, macb_hresp_error_task);
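This macb hunk, like the ibmvnic one later in this merge, moves to the newer tasklet API: tasklet_setup() registers a callback that receives the tasklet pointer, and the handler recovers its containing structure with from_tasklet() (a container_of() wrapper) instead of casting an unsigned long cookie. A generic sketch of the pattern (structure and field names are illustrative):

	struct foo {
		struct tasklet_struct tl;
		/* ... */
	};

	static void foo_tasklet_fn(struct tasklet_struct *t)
	{
		struct foo *foo = from_tasklet(foo, t, tl);

		/* work on foo */
	}

	/* at setup time */
	tasklet_setup(&foo->tl, foo_tasklet_fn);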
netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n", macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID), diff --combined drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c index f6c1ec140e09,481498585ead..6ec5f2f26f05 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c @@@ -604,14 -604,17 +604,14 @@@ int cxgb4_get_free_ftid(struct net_devi /* If the new rule wants to get inserted into * HPFILTER region, but its prio is greater * than the rule with the highest prio in HASH - * region, then reject the rule. - */ - if (t->tc_hash_tids_max_prio && - tc_prio > t->tc_hash_tids_max_prio) - break; - - /* If there's not enough slots available - * in HPFILTER region, then move on to - * normal FILTER region immediately. + * region, or if there's not enough slots + * available in HPFILTER region, then skip + * trying to insert this rule into HPFILTER + * region and directly go to the next region. */ - if (ftid + n > t->nhpftids) { + if ((t->tc_hash_tids_max_prio && + tc_prio > t->tc_hash_tids_max_prio) || + (ftid + n) > t->nhpftids) { ftid = t->nhpftids; continue; } @@@ -1908,13 -1911,16 +1908,16 @@@ out static int configure_filter_tcb(struct adapter *adap, unsigned int tid, struct filter_entry *f) { - if (f->fs.hitcnts) + if (f->fs.hitcnts) { set_tcb_field(adap, f, tid, TCB_TIMESTAMP_W, - TCB_TIMESTAMP_V(TCB_TIMESTAMP_M) | + TCB_TIMESTAMP_V(TCB_TIMESTAMP_M), + TCB_TIMESTAMP_V(0ULL), + 1); + set_tcb_field(adap, f, tid, TCB_RTT_TS_RECENT_AGE_W, TCB_RTT_TS_RECENT_AGE_V(TCB_RTT_TS_RECENT_AGE_M), - TCB_TIMESTAMP_V(0ULL) | TCB_RTT_TS_RECENT_AGE_V(0ULL), 1); + }
if (f->fs.newdmac) set_tcb_tflag(adap, f, tid, TF_CCTRL_ECE_S, 1, diff --combined drivers/net/ethernet/dec/tulip/de2104x.c index e648724c2c36,2610efe4f873..d9f6c19940ef --- a/drivers/net/ethernet/dec/tulip/de2104x.c +++ b/drivers/net/ethernet/dec/tulip/de2104x.c @@@ -85,7 -85,7 +85,7 @@@ MODULE_PARM_DESC (rx_copybreak, "de2104 #define DSL CONFIG_DE2104X_DSL #endif
- #define DE_RX_RING_SIZE 64 + #define DE_RX_RING_SIZE 128 #define DE_TX_RING_SIZE 64 #define DE_RING_BYTES \ ((sizeof(struct de_desc) * DE_RX_RING_SIZE) + \ @@@ -443,23 -443,21 +443,23 @@@ static void de_rx (struct de_private *d }
if (!copying_skb) { - pci_unmap_single(de->pdev, mapping, - buflen, PCI_DMA_FROMDEVICE); + dma_unmap_single(&de->pdev->dev, mapping, buflen, + DMA_FROM_DEVICE); skb_put(skb, len);
mapping = de->rx_skb[rx_tail].mapping = - pci_map_single(de->pdev, copy_skb->data, - buflen, PCI_DMA_FROMDEVICE); + dma_map_single(&de->pdev->dev, copy_skb->data, + buflen, DMA_FROM_DEVICE); de->rx_skb[rx_tail].skb = copy_skb; } else { - pci_dma_sync_single_for_cpu(de->pdev, mapping, len, PCI_DMA_FROMDEVICE); + dma_sync_single_for_cpu(&de->pdev->dev, mapping, len, + DMA_FROM_DEVICE); skb_reserve(copy_skb, RX_OFFSET); skb_copy_from_linear_data(skb, skb_put(copy_skb, len), len); - pci_dma_sync_single_for_device(de->pdev, mapping, len, PCI_DMA_FROMDEVICE); + dma_sync_single_for_device(&de->pdev->dev, mapping, + len, DMA_FROM_DEVICE);
/* We'll reuse the original ring buffer. */ skb = copy_skb; @@@ -556,15 -554,13 +556,15 @@@ static void de_tx (struct de_private *d goto next;
if (unlikely(skb == DE_SETUP_SKB)) { - pci_unmap_single(de->pdev, de->tx_skb[tx_tail].mapping, - sizeof(de->setup_frame), PCI_DMA_TODEVICE); + dma_unmap_single(&de->pdev->dev, + de->tx_skb[tx_tail].mapping, + sizeof(de->setup_frame), + DMA_TO_DEVICE); goto next; }
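The de2104x hunks in this file are a mechanical move from the legacy pci_* DMA wrappers to the generic DMA API: the struct pci_dev argument becomes &pdev->dev, the PCI_DMA_* direction constants become DMA_*, and pci_alloc_consistent() (which implied GFP_ATOMIC) becomes dma_alloc_coherent() with an explicit GFP flag. A condensed sketch of the correspondence (illustrative, not taken from the driver):

	/* streaming mapping */
	addr = dma_map_single(&pdev->dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(&pdev->dev, addr))
		goto err;
	/* ... */
	dma_unmap_single(&pdev->dev, addr, len, DMA_TO_DEVICE);

	/* coherent ring allocation */
	ring = dma_alloc_coherent(&pdev->dev, size, &ring_dma, GFP_KERNEL);
	/* ... */
	dma_free_coherent(&pdev->dev, size, ring, ring_dma);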
- pci_unmap_single(de->pdev, de->tx_skb[tx_tail].mapping, - skb->len, PCI_DMA_TODEVICE); + dma_unmap_single(&de->pdev->dev, de->tx_skb[tx_tail].mapping, + skb->len, DMA_TO_DEVICE);
if (status & LastFrag) { if (status & TxError) { @@@ -624,8 -620,7 +624,8 @@@ static netdev_tx_t de_start_xmit (struc txd = &de->tx_ring[entry];
len = skb->len; - mapping = pci_map_single(de->pdev, skb->data, len, PCI_DMA_TODEVICE); + mapping = dma_map_single(&de->pdev->dev, skb->data, len, + DMA_TO_DEVICE); if (entry == (DE_TX_RING_SIZE - 1)) flags |= RingEnd; if (!tx_free || (tx_free == (DE_TX_RING_SIZE / 2))) @@@ -768,8 -763,8 +768,8 @@@ static void __de_set_rx_mode (struct ne
de->tx_skb[entry].skb = DE_SETUP_SKB; de->tx_skb[entry].mapping = mapping = - pci_map_single (de->pdev, de->setup_frame, - sizeof (de->setup_frame), PCI_DMA_TODEVICE); + dma_map_single(&de->pdev->dev, de->setup_frame, + sizeof(de->setup_frame), DMA_TO_DEVICE);
/* Put the setup frame on the Tx list. */ txd = &de->tx_ring[entry]; @@@ -1284,10 -1279,8 +1284,10 @@@ static int de_refill_rx (struct de_priv if (!skb) goto err_out;
- de->rx_skb[i].mapping = pci_map_single(de->pdev, - skb->data, de->rx_buf_sz, PCI_DMA_FROMDEVICE); + de->rx_skb[i].mapping = dma_map_single(&de->pdev->dev, + skb->data, + de->rx_buf_sz, + DMA_FROM_DEVICE); de->rx_skb[i].skb = skb;
de->rx_ring[i].opts1 = cpu_to_le32(DescOwn); @@@ -1320,8 -1313,7 +1320,8 @@@ static int de_init_rings (struct de_pri
static int de_alloc_rings (struct de_private *de) { - de->rx_ring = pci_alloc_consistent(de->pdev, DE_RING_BYTES, &de->ring_dma); + de->rx_ring = dma_alloc_coherent(&de->pdev->dev, DE_RING_BYTES, + &de->ring_dma, GFP_KERNEL); if (!de->rx_ring) return -ENOMEM; de->tx_ring = &de->rx_ring[DE_RX_RING_SIZE]; @@@ -1341,9 -1333,8 +1341,9 @@@ static void de_clean_rings (struct de_p
for (i = 0; i < DE_RX_RING_SIZE; i++) { if (de->rx_skb[i].skb) { - pci_unmap_single(de->pdev, de->rx_skb[i].mapping, - de->rx_buf_sz, PCI_DMA_FROMDEVICE); + dma_unmap_single(&de->pdev->dev, + de->rx_skb[i].mapping, de->rx_buf_sz, + DMA_FROM_DEVICE); dev_kfree_skb(de->rx_skb[i].skb); } } @@@ -1353,15 -1344,15 +1353,15 @@@ if ((skb) && (skb != DE_DUMMY_SKB)) { if (skb != DE_SETUP_SKB) { de->dev->stats.tx_dropped++; - pci_unmap_single(de->pdev, - de->tx_skb[i].mapping, - skb->len, PCI_DMA_TODEVICE); + dma_unmap_single(&de->pdev->dev, + de->tx_skb[i].mapping, + skb->len, DMA_TO_DEVICE); dev_kfree_skb(skb); } else { - pci_unmap_single(de->pdev, - de->tx_skb[i].mapping, - sizeof(de->setup_frame), - PCI_DMA_TODEVICE); + dma_unmap_single(&de->pdev->dev, + de->tx_skb[i].mapping, + sizeof(de->setup_frame), + DMA_TO_DEVICE); } } } @@@ -1373,8 -1364,7 +1373,8 @@@ static void de_free_rings (struct de_private *de) { de_clean_rings(de); - pci_free_consistent(de->pdev, DE_RING_BYTES, de->rx_ring, de->ring_dma); + dma_free_coherent(&de->pdev->dev, DE_RING_BYTES, de->rx_ring, + de->ring_dma); de->rx_ring = NULL; de->tx_ring = NULL; } diff --combined drivers/net/ethernet/huawei/hinic/hinic_main.c index 2c63e3a690cd,28581bd8ce07..19d01def891f --- a/drivers/net/ethernet/huawei/hinic/hinic_main.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c @@@ -24,7 -24,6 +24,7 @@@ #include <linux/delay.h> #include <linux/err.h>
+#include "hinic_debugfs.h" #include "hinic_hw_qp.h" #include "hinic_hw_dev.h" #include "hinic_devlink.h" @@@ -154,8 -153,6 +154,8 @@@ static int create_txqs(struct hinic_de if (!nic_dev->txqs) return -ENOMEM;
+ hinic_sq_dbgfs_init(nic_dev); + for (i = 0; i < num_txqs; i++) { struct hinic_sq *sq = hinic_hwdev_get_sq(nic_dev->hwdev, i);
@@@ -165,32 -162,36 +165,50 @@@ "Failed to init Txq\n"); goto err_init_txq; } + + err = hinic_sq_debug_add(nic_dev, i); + if (err) { + netif_err(nic_dev, drv, netdev, + "Failed to add SQ%d debug\n", i); + goto err_add_sq_dbg; + } + }
return 0;
+err_add_sq_dbg: + hinic_clean_txq(&nic_dev->txqs[i]); err_init_txq: - for (j = 0; j < i; j++) + for (j = 0; j < i; j++) { + hinic_sq_debug_rem(nic_dev->txqs[j].sq); hinic_clean_txq(&nic_dev->txqs[j]); + } + + hinic_sq_dbgfs_uninit(nic_dev);
devm_kfree(&netdev->dev, nic_dev->txqs); return err; }
+ static void enable_txqs_napi(struct hinic_dev *nic_dev) + { + int num_txqs = hinic_hwdev_num_qps(nic_dev->hwdev); + int i; + + for (i = 0; i < num_txqs; i++) + napi_enable(&nic_dev->txqs[i].napi); + } + + static void disable_txqs_napi(struct hinic_dev *nic_dev) + { + int num_txqs = hinic_hwdev_num_qps(nic_dev->hwdev); + int i; + + for (i = 0; i < num_txqs; i++) + napi_disable(&nic_dev->txqs[i].napi); + } + /** * free_txqs - Free the Logical Tx Queues of specific NIC device * @nic_dev: the specific NIC device @@@ -203,12 -204,8 +221,12 @@@ static void free_txqs(struct hinic_dev if (!nic_dev->txqs) return;
- for (i = 0; i < num_txqs; i++) + for (i = 0; i < num_txqs; i++) { + hinic_sq_debug_rem(nic_dev->txqs[i].sq); hinic_clean_txq(&nic_dev->txqs[i]); + } + + hinic_sq_dbgfs_uninit(nic_dev);
devm_kfree(&netdev->dev, nic_dev->txqs); nic_dev->txqs = NULL; @@@ -234,8 -231,6 +252,8 @@@ static int create_rxqs(struct hinic_de if (!nic_dev->rxqs) return -ENOMEM;
+ hinic_rq_dbgfs_init(nic_dev); + for (i = 0; i < num_rxqs; i++) { struct hinic_rq *rq = hinic_hwdev_get_rq(nic_dev->hwdev, i);
@@@ -245,26 -240,13 +263,26 @@@ "Failed to init rxq\n"); goto err_init_rxq; } + + err = hinic_rq_debug_add(nic_dev, i); + if (err) { + netif_err(nic_dev, drv, netdev, + "Failed to add RQ%d debug\n", i); + goto err_add_rq_dbg; + } }
return 0;
+err_add_rq_dbg: + hinic_clean_rxq(&nic_dev->rxqs[i]); err_init_rxq: - for (j = 0; j < i; j++) + for (j = 0; j < i; j++) { + hinic_rq_debug_rem(nic_dev->rxqs[j].rq); hinic_clean_rxq(&nic_dev->rxqs[j]); + } + + hinic_rq_dbgfs_uninit(nic_dev);
devm_kfree(&netdev->dev, nic_dev->rxqs); return err; @@@ -282,12 -264,8 +300,12 @@@ static void free_rxqs(struct hinic_dev if (!nic_dev->rxqs) return;
- for (i = 0; i < num_rxqs; i++) + for (i = 0; i < num_rxqs; i++) { + hinic_rq_debug_rem(nic_dev->rxqs[i].rq); hinic_clean_rxq(&nic_dev->rxqs[i]); + } + + hinic_rq_dbgfs_uninit(nic_dev);
devm_kfree(&netdev->dev, nic_dev->rxqs); nic_dev->rxqs = NULL; @@@ -440,6 -418,8 +458,8 @@@ int hinic_open(struct net_device *netde goto err_create_txqs; }
+ enable_txqs_napi(nic_dev); + err = create_rxqs(nic_dev); if (err) { netif_err(nic_dev, drv, netdev, @@@ -524,6 -504,7 +544,7 @@@ err_port_state }
err_create_rxqs: + disable_txqs_napi(nic_dev); free_txqs(nic_dev);
err_create_txqs: @@@ -537,6 -518,9 +558,9 @@@ int hinic_close(struct net_device *netd struct hinic_dev *nic_dev = netdev_priv(netdev); unsigned int flags;
+ /* Disable txq napi first to avoid rewaking txq in free_tx_poll */ + disable_txqs_napi(nic_dev); + down(&nic_dev->mgmt_lock);
flags = nic_dev->flags; @@@ -929,16 -913,11 +953,16 @@@ static void netdev_features_init(struc netdev->hw_features = NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_RXCSUM | NETIF_F_LRO | - NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX; + NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | + NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM;
netdev->vlan_features = netdev->hw_features;
netdev->features = netdev->hw_features | NETIF_F_HW_VLAN_CTAG_FILTER; + + netdev->hw_enc_features = NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_SCTP_CRC | + NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN | + NETIF_F_GSO_UDP_TUNNEL_CSUM | NETIF_F_GSO_UDP_TUNNEL; }
static void hinic_refresh_nic_cfg(struct hinic_dev *nic_dev) @@@ -1305,16 -1284,6 +1329,16 @@@ static int nic_dev_init(struct pci_dev goto err_init_intr; }
+ hinic_dbg_init(nic_dev); + + hinic_func_tbl_dbgfs_init(nic_dev); + + err = hinic_func_table_debug_add(nic_dev); + if (err) { + dev_err(&pdev->dev, "Failed to add func_table debug\n"); + goto err_add_func_table_dbg; + } + err = register_netdev(netdev); if (err) { dev_err(&pdev->dev, "Failed to register netdev\n"); @@@ -1324,10 -1293,6 +1348,10 @@@ return 0;
err_reg_netdev: + hinic_func_table_debug_rem(nic_dev); +err_add_func_table_dbg: + hinic_func_tbl_dbgfs_uninit(nic_dev); + hinic_dbg_uninit(nic_dev); hinic_free_intr_coalesce(nic_dev); err_init_intr: err_set_pfc: @@@ -1450,12 -1415,6 +1474,12 @@@ static void hinic_remove(struct pci_de
unregister_netdev(netdev);
+ hinic_func_table_debug_rem(nic_dev); + + hinic_func_tbl_dbgfs_uninit(nic_dev); + + hinic_dbg_uninit(nic_dev); + hinic_free_intr_coalesce(nic_dev);
hinic_port_del_mac(nic_dev, netdev->dev_addr, 0); @@@ -1510,17 -1469,4 +1534,17 @@@ static struct pci_driver hinic_driver .sriov_configure = hinic_pci_sriov_configure, };
-module_pci_driver(hinic_driver); +static int __init hinic_module_init(void) +{ + hinic_dbg_register_debugfs(HINIC_DRV_NAME); + return pci_register_driver(&hinic_driver); +} + +static void __exit hinic_module_exit(void) +{ + pci_unregister_driver(&hinic_driver); + hinic_dbg_unregister_debugfs(); +} + +module_init(hinic_module_init); +module_exit(hinic_module_exit); diff --combined drivers/net/ethernet/huawei/hinic/hinic_rx.c index f403a6711e97,d0072f5e7efc..070a7cc6392e --- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c @@@ -543,18 -543,25 +543,25 @@@ static int rx_request_irq(struct hinic_ if (err) { netif_err(nic_dev, drv, rxq->netdev, "Failed to set RX interrupt coalescing attribute\n"); - rx_del_napi(rxq); - return err; + goto err_req_irq; }
err = request_irq(rq->irq, rx_irq, 0, rxq->irq_name, rxq); - if (err) { - rx_del_napi(rxq); - return err; - } + if (err) + goto err_req_irq;
cpumask_set_cpu(qp->q_id % num_online_cpus(), &rq->affinity_mask); - return irq_set_affinity_hint(rq->irq, &rq->affinity_mask); + err = irq_set_affinity_hint(rq->irq, &rq->affinity_mask); + if (err) + goto err_irq_affinity; + + return 0; + + err_irq_affinity: + free_irq(rq->irq, rxq); + err_req_irq: + rx_del_napi(rxq); + return err; }
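The rx_request_irq() rework above is an error-unwind fix: each failure point now releases exactly what was set up before it, so a failed irq_set_affinity_hint() also frees the interrupt it just requested. The general shape, sketched with generic names:

	err = request_irq(irq, handler, 0, name, ctx);
	if (err)
		goto err_del_napi;		/* nothing else to undo yet */

	err = irq_set_affinity_hint(irq, &mask);
	if (err)
		goto err_free_irq;		/* must also undo request_irq() */

	return 0;

err_free_irq:
	free_irq(irq, ctx);
err_del_napi:
	netif_napi_del(&napi);
	return err;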
static void rx_free_irq(struct hinic_rxq *rxq) @@@ -588,7 -595,7 +595,7 @@@ int hinic_init_rxq(struct hinic_rxq *rx rxq_stats_init(rxq);
rxq->irq_name = devm_kasprintf(&netdev->dev, GFP_KERNEL, - "hinic_rxq%d", qp->q_id); + "%s_rxq%d", netdev->name, qp->q_id); if (!rxq->irq_name) return -ENOMEM;
diff --combined drivers/net/ethernet/huawei/hinic/hinic_tx.c index c249b7e6e432,c1f81e9144a1..8da7d46363b2 --- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c @@@ -357,7 -357,6 +357,7 @@@ static int offload_csum(struct hinic_sq enum hinic_l4_offload_type l4_offload; u32 offset, l4_len, network_hdr_len; enum hinic_l3_offload_type l3_type; + u32 tunnel_type = NOT_TUNNEL; union hinic_l3 ip; union hinic_l4 l4; u8 l4_proto; @@@ -368,55 -367,27 +368,55 @@@ if (skb->encapsulation) { u32 l4_tunnel_len;
+ tunnel_type = TUNNEL_UDP_NO_CSUM; ip.hdr = skb_network_header(skb);
- if (ip.v4->version == 4) + if (ip.v4->version == 4) { l3_type = IPV4_PKT_NO_CHKSUM_OFFLOAD; - else if (ip.v4->version == 6) + l4_proto = ip.v4->protocol; + } else if (ip.v4->version == 6) { + unsigned char *exthdr; + __be16 frag_off; l3_type = IPV6_PKT; - else + tunnel_type = TUNNEL_UDP_CSUM; + exthdr = ip.hdr + sizeof(*ip.v6); + l4_proto = ip.v6->nexthdr; + l4.hdr = skb_transport_header(skb); + if (l4.hdr != exthdr) + ipv6_skip_exthdr(skb, exthdr - skb->data, + &l4_proto, &frag_off); + } else { l3_type = L3TYPE_UNKNOWN; + l4_proto = IPPROTO_RAW; + }
hinic_task_set_outter_l3(task, l3_type, skb_network_header_len(skb));
- l4_tunnel_len = skb_inner_network_offset(skb) - - skb_transport_offset(skb); - - hinic_task_set_tunnel_l4(task, TUNNEL_UDP_NO_CSUM, - l4_tunnel_len); + switch (l4_proto) { + case IPPROTO_UDP: + l4_tunnel_len = skb_inner_network_offset(skb) - + skb_transport_offset(skb); + ip.hdr = skb_inner_network_header(skb); + l4.hdr = skb_inner_transport_header(skb); + network_hdr_len = skb_inner_network_header_len(skb); + break; + case IPPROTO_IPIP: + case IPPROTO_IPV6: + tunnel_type = NOT_TUNNEL; + l4_tunnel_len = 0; + + ip.hdr = skb_inner_network_header(skb); + l4.hdr = skb_transport_header(skb); + network_hdr_len = skb_network_header_len(skb); + break; + default: + /* Unsupported tunnel packet, disable csum offload */ + skb_checksum_help(skb); + return 0; + }
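For the IPv6 outer-header case above, the transport protocol can sit behind a chain of extension headers, which is why the patch walks them with ipv6_skip_exthdr() rather than trusting ip.v6->nexthdr directly. A small sketch of how that helper is typically used (illustrative):

	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
	__be16 frag_off;
	int l4_off;

	l4_off = ipv6_skip_exthdr(skb,
				  skb_network_offset(skb) + sizeof(struct ipv6hdr),
				  &nexthdr, &frag_off);
	if (l4_off < 0)
		return -EINVAL;		/* malformed extension header chain */
	/* nexthdr now holds the real L4 protocol (e.g. IPPROTO_UDP),
	 * l4_off its offset from skb->data
	 */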
- ip.hdr = skb_inner_network_header(skb); - l4.hdr = skb_inner_transport_header(skb); - network_hdr_len = skb_inner_network_header_len(skb); + hinic_task_set_tunnel_l4(task, tunnel_type, l4_tunnel_len); } else { ip.hdr = skb_network_header(skb); l4.hdr = skb_transport_header(skb); @@@ -746,8 -717,8 +746,8 @@@ static int free_tx_poll(struct napi_str netdev_txq = netdev_get_tx_queue(txq->netdev, qp->q_id);
__netif_tx_lock(netdev_txq, smp_processor_id()); - - netif_wake_subqueue(nic_dev->netdev, qp->q_id); + if (!netif_testing(nic_dev->netdev)) + netif_wake_subqueue(nic_dev->netdev, qp->q_id);
__netif_tx_unlock(netdev_txq);
@@@ -774,18 -745,6 +774,6 @@@ return budget; }
- static void tx_napi_add(struct hinic_txq *txq, int weight) - { - netif_napi_add(txq->netdev, &txq->napi, free_tx_poll, weight); - napi_enable(&txq->napi); - } - - static void tx_napi_del(struct hinic_txq *txq) - { - napi_disable(&txq->napi); - netif_napi_del(&txq->napi); - } - static irqreturn_t tx_irq(int irq, void *data) { struct hinic_txq *txq = data; @@@ -819,7 -778,7 +807,7 @@@ static int tx_request_irq(struct hinic_
qp = container_of(sq, struct hinic_qp, sq);
- tx_napi_add(txq, nic_dev->tx_weight); + netif_napi_add(txq->netdev, &txq->napi, free_tx_poll, nic_dev->tx_weight);
hinic_hwdev_msix_set(nic_dev->hwdev, sq->msix_entry, TX_IRQ_NO_PENDING, TX_IRQ_NO_COALESC, @@@ -836,14 -795,14 +824,14 @@@ if (err) { netif_err(nic_dev, drv, txq->netdev, "Failed to set TX interrupt coalescing attribute\n"); - tx_napi_del(txq); + netif_napi_del(&txq->napi); return err; }
err = request_irq(sq->irq, tx_irq, 0, txq->irq_name, txq); if (err) { dev_err(&pdev->dev, "Failed to request Tx irq\n"); - tx_napi_del(txq); + netif_napi_del(&txq->napi); return err; }
@@@ -855,7 -814,7 +843,7 @@@ static void tx_free_irq(struct hinic_tx struct hinic_sq *sq = txq->sq;
free_irq(sq->irq, txq); - tx_napi_del(txq); + netif_napi_del(&txq->napi); }
/** @@@ -894,14 -853,14 +882,14 @@@ int hinic_init_txq(struct hinic_txq *tx goto err_alloc_free_sges; }
- irqname_len = snprintf(NULL, 0, "hinic_txq%d", qp->q_id) + 1; + irqname_len = snprintf(NULL, 0, "%s_txq%d", netdev->name, qp->q_id) + 1; txq->irq_name = devm_kzalloc(&netdev->dev, irqname_len, GFP_KERNEL); if (!txq->irq_name) { err = -ENOMEM; goto err_alloc_irqname; }
- sprintf(txq->irq_name, "hinic_txq%d", qp->q_id); + sprintf(txq->irq_name, "%s_txq%d", netdev->name, qp->q_id);
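The TX queue naming above sizes the buffer with a counting snprintf(NULL, 0, ...) call and then formats into it; the RX side of this same series uses devm_kasprintf(), which folds the two steps into one managed allocation. A sketch of the shorter form (illustrative only, not what the patch does for TX):

	txq->irq_name = devm_kasprintf(&netdev->dev, GFP_KERNEL, "%s_txq%d",
				       netdev->name, qp->q_id);
	if (!txq->irq_name)
		return -ENOMEM;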
err = hinic_hwdev_hw_ci_addr_set(hwdev, sq, CI_UPDATE_NO_PENDING, CI_UPDATE_NO_COALESC); diff --combined drivers/net/ethernet/ibm/ibmvnic.c index 6d320be47e60,1b702a43a5d0..a151ff37fda2 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@@ -104,7 -104,8 +104,7 @@@ static int send_login(struct ibmvnic_ad static void send_cap_queries(struct ibmvnic_adapter *adapter); static int init_sub_crqs(struct ibmvnic_adapter *); static int init_sub_crq_irqs(struct ibmvnic_adapter *adapter); -static int ibmvnic_init(struct ibmvnic_adapter *); -static int ibmvnic_reset_init(struct ibmvnic_adapter *); +static int ibmvnic_reset_init(struct ibmvnic_adapter *, bool reset); static void release_crq_queue(struct ibmvnic_adapter *); static int __ibmvnic_set_mac(struct net_device *, u8 *); static int init_crq_queue(struct ibmvnic_adapter *adapter); @@@ -296,7 -297,8 +296,7 @@@ static void deactivate_rx_pools(struct { int i;
- for (i = 0; i < be32_to_cpu(adapter->login_rsp_buf->num_rxadd_subcrqs); - i++) + for (i = 0; i < adapter->num_active_rx_pools; i++) adapter->rx_pool[i].active = 0; }
@@@ -304,7 -306,6 +304,7 @@@ static void replenish_rx_pool(struct ib struct ibmvnic_rx_pool *pool) { int count = pool->size - atomic_read(&pool->available); + u64 handle = adapter->rx_scrq[pool->index]->handle; struct device *dev = &adapter->vdev->dev; int buffers_added = 0; unsigned long lpar_rc; @@@ -313,6 -314,7 +313,6 @@@ unsigned int offset; dma_addr_t dma_addr; unsigned char *dst; - u64 *handle_array; int shift = 0; int index; int i; @@@ -320,6 -322,10 +320,6 @@@ if (!pool->active) return;
- handle_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + - be32_to_cpu(adapter->login_rsp_buf-> - off_rxadd_subcrqs)); - for (i = 0; i < count; ++i) { skb = alloc_skb(pool->buff_size, GFP_ATOMIC); if (!skb) { @@@ -363,7 -369,8 +363,7 @@@ #endif sub_crq.rx_add.len = cpu_to_be32(pool->buff_size << shift);
- lpar_rc = send_subcrq(adapter, handle_array[pool->index], - &sub_crq); + lpar_rc = send_subcrq(adapter, handle, &sub_crq); if (lpar_rc != H_SUCCESS) goto failure;
@@@ -400,7 -407,8 +400,7 @@@ static void replenish_pools(struct ibmv int i;
adapter->replenish_task_cycles++; - for (i = 0; i < be32_to_cpu(adapter->login_rsp_buf->num_rxadd_subcrqs); - i++) { + for (i = 0; i < adapter->num_active_rx_pools; i++) { if (adapter->rx_pool[i].active) replenish_rx_pool(adapter, &adapter->rx_pool[i]); } @@@ -467,23 -475,25 +467,23 @@@ static int init_stats_token(struct ibmv static int reset_rx_pools(struct ibmvnic_adapter *adapter) { struct ibmvnic_rx_pool *rx_pool; + u64 buff_size; int rx_scrqs; int i, j, rc; - u64 *size_array;
if (!adapter->rx_pool) return -1;
- size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + - be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size)); - - rx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_rxadd_subcrqs); + buff_size = adapter->cur_rx_buf_sz; + rx_scrqs = adapter->num_active_rx_pools; for (i = 0; i < rx_scrqs; i++) { rx_pool = &adapter->rx_pool[i];
netdev_dbg(adapter->netdev, "Re-setting rx_pool[%d]\n", i);
- if (rx_pool->buff_size != be64_to_cpu(size_array[i])) { + if (rx_pool->buff_size != buff_size) { free_long_term_buff(adapter, &rx_pool->long_term_buff); - rx_pool->buff_size = be64_to_cpu(size_array[i]); + rx_pool->buff_size = buff_size; rc = alloc_long_term_buff(adapter, &rx_pool->long_term_buff, rx_pool->size * @@@ -551,11 -561,13 +551,11 @@@ static int init_rx_pools(struct net_dev struct device *dev = &adapter->vdev->dev; struct ibmvnic_rx_pool *rx_pool; int rxadd_subcrqs; - u64 *size_array; + u64 buff_size; int i, j;
- rxadd_subcrqs = - be32_to_cpu(adapter->login_rsp_buf->num_rxadd_subcrqs); - size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + - be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size)); + rxadd_subcrqs = adapter->num_active_rx_scrqs; + buff_size = adapter->cur_rx_buf_sz;
adapter->rx_pool = kcalloc(rxadd_subcrqs, sizeof(struct ibmvnic_rx_pool), @@@ -573,11 -585,11 +573,11 @@@ netdev_dbg(adapter->netdev, "Initializing rx_pool[%d], %lld buffs, %lld bytes each\n", i, adapter->req_rx_add_entries_per_subcrq, - be64_to_cpu(size_array[i])); + buff_size);
rx_pool->size = adapter->req_rx_add_entries_per_subcrq; rx_pool->index = i; - rx_pool->buff_size = be64_to_cpu(size_array[i]); + rx_pool->buff_size = buff_size; rx_pool->active = 1;
rx_pool->free_map = kcalloc(rx_pool->size, sizeof(int), @@@ -643,7 -655,7 +643,7 @@@ static int reset_tx_pools(struct ibmvni if (!adapter->tx_pool) return -1;
- tx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs); + tx_scrqs = adapter->num_active_tx_pools; for (i = 0; i < tx_scrqs; i++) { rc = reset_one_tx_pool(adapter, &adapter->tso_pool[i]); if (rc) @@@ -732,7 -744,7 +732,7 @@@ static int init_tx_pools(struct net_dev int tx_subcrqs; int i, rc;
- tx_subcrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs); + tx_subcrqs = adapter->num_active_tx_scrqs; adapter->tx_pool = kcalloc(tx_subcrqs, sizeof(struct ibmvnic_tx_pool), GFP_KERNEL); if (!adapter->tx_pool) @@@ -968,7 -980,7 +968,7 @@@ static int set_link_state(struct ibmvni return -1; }
- if (adapter->init_done_rc == 1) { + if (adapter->init_done_rc == PARTIALSUCCESS) { /* Partial success, delay and re-send */ mdelay(1000); resend = true; @@@ -1518,9 -1530,9 +1518,9 @@@ static netdev_tx_t ibmvnic_xmit(struct unsigned int offset; int num_entries = 1; unsigned char *dst; - u64 *handle_array; int index = 0; u8 proto = 0; + u64 handle; netdev_tx_t ret = NETDEV_TX_OK;
if (test_bit(0, &adapter->resetting)) { @@@ -1547,7 -1559,8 +1547,7 @@@
tx_scrq = adapter->tx_scrq[queue_num]; txq = netdev_get_tx_queue(netdev, skb_get_queue_mapping(skb)); - handle_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + - be32_to_cpu(adapter->login_rsp_buf->off_txsubm_subcrqs)); + handle = tx_scrq->handle;
index = tx_pool->free_map[tx_pool->consumer_index];
@@@ -1659,14 -1672,14 +1659,14 @@@ ret = NETDEV_TX_OK; goto tx_err_out; } - lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num], + lpar_rc = send_subcrq_indirect(adapter, handle, (u64)tx_buff->indir_dma, (u64)num_entries); dma_unmap_single(dev, tx_buff->indir_dma, sizeof(tx_buff->indir_arr), DMA_TO_DEVICE); } else { tx_buff->num_entries = num_entries; - lpar_rc = send_subcrq(adapter, handle_array[queue_num], + lpar_rc = send_subcrq(adapter, handle, &tx_crq); } if (lpar_rc != H_SUCCESS) { @@@ -1861,7 -1874,7 +1861,7 @@@ static int do_change_param_reset(struc return rc; }
- rc = ibmvnic_reset_init(adapter); + rc = ibmvnic_reset_init(adapter, true); if (rc) return IBMVNIC_INIT_FAILED;
@@@ -1979,7 -1992,7 +1979,7 @@@ static int do_reset(struct ibmvnic_adap goto out; }
- rc = ibmvnic_reset_init(adapter); + rc = ibmvnic_reset_init(adapter, true); if (rc) { rc = IBMVNIC_INIT_FAILED; goto out; @@@ -2019,16 -2032,18 +2019,18 @@@
} else { rc = reset_tx_pools(adapter); - if (rc) + if (rc) { netdev_dbg(adapter->netdev, "reset tx pools failed (%d)\n", rc); goto out; + }
rc = reset_rx_pools(adapter); - if (rc) + if (rc) { netdev_dbg(adapter->netdev, "reset rx pools failed (%d)\n", rc); goto out; + } } ibmvnic_disable_irqs(adapter); } @@@ -2093,7 -2108,7 +2095,7 @@@ static int do_hard_reset(struct ibmvnic return rc; }
- rc = ibmvnic_init(adapter); + rc = ibmvnic_reset_init(adapter, false); if (rc) return rc;
@@@ -3568,7 -3583,8 +3570,7 @@@ static int ibmvnic_send_crq(struct ibmv if (rc) { if (rc == H_CLOSED) { dev_warn(dev, "CRQ Queue closed\n"); - if (test_bit(0, &adapter->resetting)) - ibmvnic_reset(adapter, VNIC_RESET_FATAL); + /* do not reset, report the fail, wait for passive init from server */ }
dev_warn(dev, "Send error (rc=%d)\n", rc); @@@ -3579,31 -3595,14 +3581,31 @@@
static int ibmvnic_send_crq_init(struct ibmvnic_adapter *adapter) { + struct device *dev = &adapter->vdev->dev; union ibmvnic_crq crq; + int retries = 100; + int rc;
memset(&crq, 0, sizeof(crq)); crq.generic.first = IBMVNIC_CRQ_INIT_CMD; crq.generic.cmd = IBMVNIC_CRQ_INIT; netdev_dbg(adapter->netdev, "Sending CRQ init\n");
- return ibmvnic_send_crq(adapter, &crq); + do { + rc = ibmvnic_send_crq(adapter, &crq); + if (rc != H_CLOSED) + break; + retries--; + msleep(50); + + } while (retries > 0); + + if (rc) { + dev_err(dev, "Failed to send init request, rc = %d\n", rc); + return rc; + } + + return 0; }
static int send_version_xchg(struct ibmvnic_adapter *adapter) @@@ -4308,11 -4307,6 +4310,11 @@@ static int handle_login_rsp(union ibmvn struct net_device *netdev = adapter->netdev; struct ibmvnic_login_rsp_buffer *login_rsp = adapter->login_rsp_buf; struct ibmvnic_login_buffer *login = adapter->login_buf; + u64 *tx_handle_array; + u64 *rx_handle_array; + int num_tx_pools; + int num_rx_pools; + u64 *size_array; int i;
dma_unmap_single(dev, adapter->login_buf_token, adapter->login_buf_sz, @@@ -4347,30 -4341,6 +4349,30 @@@ ibmvnic_remove(adapter->vdev); return -EIO; } + size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + + be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size)); + /* variable buffer sizes are not supported, so just read the + * first entry. + */ + adapter->cur_rx_buf_sz = be64_to_cpu(size_array[0]); + + num_tx_pools = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs); + num_rx_pools = be32_to_cpu(adapter->login_rsp_buf->num_rxadd_subcrqs); + + tx_handle_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + + be32_to_cpu(adapter->login_rsp_buf->off_txsubm_subcrqs)); + rx_handle_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + + be32_to_cpu(adapter->login_rsp_buf->off_rxadd_subcrqs)); + + for (i = 0; i < num_tx_pools; i++) + adapter->tx_scrq[i]->handle = tx_handle_array[i]; + + for (i = 0; i < num_rx_pools; i++) + adapter->rx_scrq[i]->handle = rx_handle_array[i]; + + adapter->num_active_tx_scrqs = num_tx_pools; + adapter->num_active_rx_scrqs = num_rx_pools; + release_login_rsp_buffer(adapter); release_login_buffer(adapter); complete(&adapter->init_done);
@@@ -4842,9 -4812,9 +4844,9 @@@ static irqreturn_t ibmvnic_interrupt(in return IRQ_HANDLED; }
-static void ibmvnic_tasklet(void *data) +static void ibmvnic_tasklet(struct tasklet_struct *t) { - struct ibmvnic_adapter *adapter = data; + struct ibmvnic_adapter *adapter = from_tasklet(adapter, t, tasklet); struct ibmvnic_crq_queue *queue = &adapter->crq; union ibmvnic_crq *crq; unsigned long flags; @@@ -4979,7 -4949,8 +4981,7 @@@ static int init_crq_queue(struct ibmvni
retrc = 0;
- tasklet_init(&adapter->tasklet, (void *)ibmvnic_tasklet, - (unsigned long)adapter); + tasklet_setup(&adapter->tasklet, (void *)ibmvnic_tasklet);
netdev_dbg(adapter->netdev, "registering irq 0x%x\n", vdev->irq); snprintf(crq->name, sizeof(crq->name), "ibmvnic-%x", @@@ -5015,7 -4986,7 +5017,7 @@@ map_failed return retrc; }
-static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) +static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter, bool reset) { struct device *dev = &adapter->vdev->dev; unsigned long timeout = msecs_to_jiffies(30000); @@@ -5024,19 -4995,12 +5026,19 @@@
adapter->from_passive_init = false;
- old_num_rx_queues = adapter->req_rx_queues; - old_num_tx_queues = adapter->req_tx_queues; + if (reset) { + old_num_rx_queues = adapter->req_rx_queues; + old_num_tx_queues = adapter->req_tx_queues; + reinit_completion(&adapter->init_done); + }
- reinit_completion(&adapter->init_done); adapter->init_done_rc = 0; - ibmvnic_send_crq_init(adapter); + rc = ibmvnic_send_crq_init(adapter); + if (rc) { + dev_err(dev, "Send crq init failed with error %d\n", rc); + return rc; + } + if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { dev_err(dev, "Initialization sequence timed out\n"); return -1; @@@ -5053,8 -5017,7 +5055,8 @@@ return -1; }
- if (test_bit(0, &adapter->resetting) && !adapter->wait_for_reset && + if (reset && + test_bit(0, &adapter->resetting) && !adapter->wait_for_reset && adapter->reset_reason != VNIC_RESET_MOBILITY) { if (adapter->req_rx_queues != old_num_rx_queues || adapter->req_tx_queues != old_num_tx_queues) { @@@ -5082,6 -5045,48 +5084,6 @@@ return rc; }
-static int ibmvnic_init(struct ibmvnic_adapter *adapter) -{ - struct device *dev = &adapter->vdev->dev; - unsigned long timeout = msecs_to_jiffies(30000); - int rc; - - adapter->from_passive_init = false; - - adapter->init_done_rc = 0; - ibmvnic_send_crq_init(adapter); - if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { - dev_err(dev, "Initialization sequence timed out\n"); - return -1; - } - - if (adapter->init_done_rc) { - release_crq_queue(adapter); - return adapter->init_done_rc; - } - - if (adapter->from_passive_init) { - adapter->state = VNIC_OPEN; - adapter->from_passive_init = false; - return -1; - } - - rc = init_sub_crqs(adapter); - if (rc) { - dev_err(dev, "Initialization of sub crqs failed\n"); - release_crq_queue(adapter); - return rc; - } - - rc = init_sub_crq_irqs(adapter); - if (rc) { - dev_err(dev, "Failed to initialize sub crq irqs\n"); - release_crq_queue(adapter); - } - - return rc; -} - static struct device_attribute dev_attr_failover;
static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) @@@ -5144,7 -5149,7 +5146,7 @@@ goto ibmvnic_init_fail; }
- rc = ibmvnic_init(adapter); + rc = ibmvnic_reset_init(adapter, false); if (rc && rc != EAGAIN) goto ibmvnic_init_fail; } while (rc == EAGAIN); @@@ -5294,7 -5299,8 +5296,7 @@@ static unsigned long ibmvnic_get_desire for (i = 0; i < adapter->req_tx_queues + adapter->req_rx_queues; i++) ret += 4 * PAGE_SIZE; /* the scrq message queue */
- for (i = 0; i < be32_to_cpu(adapter->login_rsp_buf->num_rxadd_subcrqs); - i++) + for (i = 0; i < adapter->num_active_rx_pools; i++) ret += adapter->rx_pool[i].size * IOMMU_PAGE_ALIGN(adapter->rx_pool[i].buff_size, tbl);
diff --combined drivers/net/ethernet/marvell/mvneta.c index e5ddefe5cade,c4345e3d616f..14df3aec285d --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@@ -330,6 -330,7 +330,6 @@@ #define MVNETA_SKB_HEADROOM ALIGN(max(NET_SKB_PAD, XDP_PACKET_HEADROOM), 8) #define MVNETA_SKB_PAD (SKB_DATA_ALIGN(sizeof(struct skb_shared_info) + \ MVNETA_SKB_HEADROOM)) -#define MVNETA_SKB_SIZE(len) (SKB_DATA_ALIGN(len) + MVNETA_SKB_PAD) #define MVNETA_MAX_RX_BUF_SIZE (PAGE_SIZE - MVNETA_SKB_PAD)
#define IS_TSO_HEADER(txq, addr) \ @@@ -751,12 -752,13 +751,12 @@@ static void mvneta_txq_inc_put(struct m static void mvneta_mib_counters_clear(struct mvneta_port *pp) { int i; - u32 dummy;
/* Perform dummy reads from MIB counters */ for (i = 0; i < MVNETA_MIB_LATE_COLLISION; i += 4) - dummy = mvreg_read(pp, (MVNETA_MIB_COUNTERS_BASE + i)); - dummy = mvreg_read(pp, MVNETA_RX_DISCARD_FRAME_COUNT); - dummy = mvreg_read(pp, MVNETA_OVERRUN_FRAME_COUNT); + mvreg_read(pp, (MVNETA_MIB_COUNTERS_BASE + i)); + mvreg_read(pp, MVNETA_RX_DISCARD_FRAME_COUNT); + mvreg_read(pp, MVNETA_OVERRUN_FRAME_COUNT); }
/* Get System Network Statistics */ @@@ -2027,11 -2029,11 +2027,11 @@@ mvneta_xdp_put_buff(struct mvneta_port struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); int i;
- page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data), - sync_len, napi); for (i = 0; i < sinfo->nr_frags; i++) page_pool_put_full_page(rxq->page_pool, skb_frag_page(&sinfo->frags[i]), napi); + page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data), + sync_len, napi); }
static int @@@ -2225,7 -2227,8 +2225,7 @@@ mvneta_swbm_rx_frame(struct mvneta_por struct mvneta_rx_desc *rx_desc, struct mvneta_rx_queue *rxq, struct xdp_buff *xdp, int *size, - struct page *page, - struct mvneta_stats *stats) + struct page *page) { unsigned char *data = page_address(page); int data_len = -MVNETA_MH_SIZE, len; @@@ -2233,7 -2236,7 +2233,7 @@@ enum dma_data_direction dma_dir; struct skb_shared_info *sinfo;
- if (MVNETA_SKB_SIZE(rx_desc->data_size) > PAGE_SIZE) { + if (rx_desc->data_size > MVNETA_MAX_RX_BUF_SIZE) { len = MVNETA_MAX_RX_BUF_SIZE; data_len += len; } else { @@@ -2304,8 -2307,11 +2304,8 @@@ mvneta_swbm_build_skb(struct mvneta_por { struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); int i, num_frags = sinfo->nr_frags; - skb_frag_t frags[MAX_SKB_FRAGS]; struct sk_buff *skb;
- memcpy(frags, sinfo->frags, sizeof(skb_frag_t) * num_frags); - skb = build_skb(xdp->data_hard_start, PAGE_SIZE); if (!skb) return ERR_PTR(-ENOMEM); @@@ -2317,12 -2323,12 +2317,12 @@@ mvneta_rx_csum(pp, desc_status, skb);
for (i = 0; i < num_frags; i++) { - struct page *page = skb_frag_page(&frags[i]); + skb_frag_t *frag = &sinfo->frags[i];
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, - page, skb_frag_off(&frags[i]), - skb_frag_size(&frags[i]), PAGE_SIZE); - page_pool_release_page(rxq->page_pool, page); + skb_frag_page(frag), skb_frag_off(frag), + skb_frag_size(frag), PAGE_SIZE); + page_pool_release_page(rxq->page_pool, skb_frag_page(frag)); }
return skb; @@@ -2375,10 -2381,14 +2375,14 @@@ static int mvneta_rx_swbm(struct napi_s desc_status = rx_desc->status;
mvneta_swbm_rx_frame(pp, rx_desc, rxq, &xdp_buf, - &size, page, &ps); + &size, page); } else { - if (unlikely(!xdp_buf.data_hard_start)) + if (unlikely(!xdp_buf.data_hard_start)) { + rx_desc->buf_phys_addr = 0; + page_pool_put_full_page(rxq->page_pool, page, + true); continue; + }
mvneta_swbm_add_rx_fragment(pp, rx_desc, rxq, &xdp_buf, &size, page); diff --combined drivers/net/ethernet/mellanox/mlx5/core/en.h index 95aab8b429cf,90d5caabd6af..e9010ceb5ea9 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@@ -265,7 -265,6 +265,7 @@@ enum MLX5E_RQ_STATE_NO_CSUM_COMPLETE, MLX5E_RQ_STATE_CSUM_FULL, /* cqe_csum_full hw bit is set */ MLX5E_RQ_STATE_FPGA_TLS, /* FPGA TLS enabled */ + MLX5E_RQ_STATE_MINI_CQE_HW_STRIDX /* set when mini_cqe_resp_stride_index cap is used */ };
struct mlx5e_cq { @@@ -443,7 -442,7 +443,7 @@@ struct mlx5e_xdpsq struct mlx5e_cq cq;
/* read only */ - struct xdp_umem *umem; + struct xsk_buff_pool *xsk_pool; struct mlx5_wq_cyc wq; struct mlx5e_xdpsq_stats *stats; mlx5e_fp_xmit_xdp_frame_check xmit_xdp_frame_check; @@@ -601,13 -600,13 +601,13 @@@ struct mlx5e_rq struct dim dim; /* Dynamic Interrupt Moderation */
/* XDP */ - struct bpf_prog *xdp_prog; + struct bpf_prog __rcu *xdp_prog; struct mlx5e_xdpsq *xdpsq; DECLARE_BITMAP(flags, 8); struct page_pool *page_pool;
/* AF_XDP zero-copy */ - struct xdp_umem *umem; + struct xsk_buff_pool *xsk_pool;
struct work_struct recover_work;
@@@ -730,13 -729,12 +730,13 @@@ struct mlx5e_hv_vhca_stats_agent #endif
struct mlx5e_xsk { - /* UMEMs are stored separately from channels, because we don't want to - * lose them when channels are recreated. The kernel also stores UMEMs, - * but it doesn't distinguish between zero-copy and non-zero-copy UMEMs, - * so rely on our mechanism. + /* XSK buffer pools are stored separately from channels, + * because we don't want to lose them when channels are + * recreated. The kernel also stores buffer pool, but it doesn't + * distinguish between zero-copy and non-zero-copy UMEMs, so + * rely on our mechanism. */ - struct xdp_umem **umems; + struct xsk_buff_pool **pools; u16 refcnt; bool ever_used; }; @@@ -895,7 -893,7 +895,7 @@@ struct mlx5e_xsk_param struct mlx5e_rq_param; int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk, - struct xdp_umem *umem, struct mlx5e_rq *rq); + struct xsk_buff_pool *xsk_pool, struct mlx5e_rq *rq); int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time); void mlx5e_deactivate_rq(struct mlx5e_rq *rq); void mlx5e_close_rq(struct mlx5e_rq *rq); @@@ -905,7 -903,7 +905,7 @@@ int mlx5e_open_icosq(struct mlx5e_chann struct mlx5e_sq_param *param, struct mlx5e_icosq *sq); void mlx5e_close_icosq(struct mlx5e_icosq *sq); int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, - struct mlx5e_sq_param *param, struct xdp_umem *umem, + struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool, struct mlx5e_xdpsq *sq, bool is_redirect); void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq);
@@@ -1007,7 -1005,6 +1007,6 @@@ int mlx5e_update_nic_rx(struct mlx5e_pr void mlx5e_update_carrier(struct mlx5e_priv *priv); int mlx5e_close(struct net_device *netdev); int mlx5e_open(struct net_device *netdev); - void mlx5e_update_ndo_stats(struct mlx5e_priv *priv);
void mlx5e_queue_update_stats(struct mlx5e_priv *priv); int mlx5e_bits_invert(unsigned long a, int size); diff --combined drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index 145592788de5,b28df21981a1..562bc465f82b --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@@ -122,7 -122,7 +122,7 @@@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di, u32 *len, struct xdp_buff *xdp) { - struct bpf_prog *prog = READ_ONCE(rq->xdp_prog); + struct bpf_prog *prog = rcu_dereference(rq->xdp_prog); u32 act; int err;
@@@ -201,7 -201,7 +201,7 @@@ static void mlx5e_xdp_mpwqe_session_sta pi = mlx5e_xdpsq_get_next_pi(sq, MLX5_SEND_WQE_MAX_WQEBBS); session->wqe = MLX5E_TX_FETCH_WQE(sq, pi);
- prefetchw(session->wqe->data); + net_prefetchw(session->wqe->data); session->ds_count = MLX5E_XDP_TX_EMPTY_DS_COUNT; session->pkt_count = 0;
@@@ -322,7 -322,7 +322,7 @@@ mlx5e_xmit_xdp_frame(struct mlx5e_xdps
struct mlx5e_xdpsq_stats *stats = sq->stats;
- prefetchw(wqe); + net_prefetchw(wqe);
if (unlikely(dma_len < MLX5E_XDP_MIN_INLINE || sq->hw_mtu < dma_len)) { stats->err++; @@@ -445,7 -445,7 +445,7 @@@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_c } while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
if (xsk_frames) - xsk_umem_complete_tx(sq->umem, xsk_frames); + xsk_tx_completed(sq->xsk_pool, xsk_frames);
sq->stats->cqes += i;
@@@ -475,7 -475,7 +475,7 @@@ void mlx5e_free_xdpsq_descs(struct mlx5 }
if (xsk_frames) - xsk_umem_complete_tx(sq->umem, xsk_frames); + xsk_tx_completed(sq->xsk_pool, xsk_frames); }
int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, @@@ -563,3 -563,4 +563,3 @@@ void mlx5e_set_xmit_fp(struct mlx5e_xdp sq->xmit_xdp_frame = is_mpw ? mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame; } - diff --combined drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c index bb6669d2a916,40db27bf790b..8e7b877d8a12 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c @@@ -31,7 -31,6 +31,6 @@@ struct sk_buff *mlx5e_xsk_skb_from_cqe_ { struct xdp_buff *xdp = wi->umr.dma_info[page_idx].xsk; u32 cqe_bcnt32 = cqe_bcnt; - bool consumed;
/* Check packet size. Note LRO doesn't use linear SKB */ if (unlikely(cqe_bcnt > rq->hw_mtu)) { @@@ -48,13 -47,9 +47,9 @@@
xdp->data_end = xdp->data + cqe_bcnt32; xdp_set_data_meta_invalid(xdp); - xsk_buff_dma_sync_for_cpu(xdp); - prefetch(xdp->data); + xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool); + net_prefetch(xdp->data);
- rcu_read_lock(); - consumed = mlx5e_xdp_handle(rq, NULL, &cqe_bcnt32, xdp); - rcu_read_unlock(); - /* Possible flows: * - XDP_REDIRECT to XSKMAP: * The page is owned by the userspace from now. @@@ -70,7 -65,7 +65,7 @@@ * allocated first from the Reuse Ring, so it has enough space. */
- if (likely(consumed)) { + if (likely(mlx5e_xdp_handle(rq, NULL, &cqe_bcnt32, xdp))) { if (likely(__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))) __set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */ return NULL; /* page/packet was consumed by XDP */ @@@ -88,7 -83,6 +83,6 @@@ struct sk_buff *mlx5e_xsk_skb_from_cqe_ u32 cqe_bcnt) { struct xdp_buff *xdp = wi->di->xsk; - bool consumed;
/* wi->offset is not used in this function, because xdp->data and the * DMA address point directly to the necessary place. Furthermore, the @@@ -99,19 -93,15 +93,15 @@@
xdp->data_end = xdp->data + cqe_bcnt; xdp_set_data_meta_invalid(xdp); - xsk_buff_dma_sync_for_cpu(xdp); - prefetch(xdp->data); + xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool); + net_prefetch(xdp->data);
if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) { rq->stats->wqe_err++; return NULL; }
- rcu_read_lock(); - consumed = mlx5e_xdp_handle(rq, NULL, &cqe_bcnt, xdp); - rcu_read_unlock(); - - if (likely(consumed)) + if (likely(mlx5e_xdp_handle(rq, NULL, &cqe_bcnt, xdp))) return NULL; /* page/packet was consumed by XDP */
/* XDP_PASS: copy the data from the UMEM to a new SKB. The frame reuse diff --combined drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c index 662a1dafeaad,55e65a438de7..4e574ac73019 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c @@@ -45,7 -45,7 +45,7 @@@ static void mlx5e_build_xsk_cparam(stru }
int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params, - struct mlx5e_xsk_param *xsk, struct xdp_umem *umem, + struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool, struct mlx5e_channel *c) { struct mlx5e_channel_param *cparam; @@@ -64,7 -64,7 +64,7 @@@ if (unlikely(err)) goto err_free_cparam;
- err = mlx5e_open_rq(c, params, &cparam->rq, xsk, umem, &c->xskrq); + err = mlx5e_open_rq(c, params, &cparam->rq, xsk, pool, &c->xskrq); if (unlikely(err)) goto err_close_rx_cq;
@@@ -72,13 -72,13 +72,13 @@@ if (unlikely(err)) goto err_close_rq;
- /* Create a separate SQ, so that when the UMEM is disabled, we could + /* Create a separate SQ, so that when the buff pool is disabled, we could * close this SQ safely and stop receiving CQEs. In other case, e.g., if - * the XDPSQ was used instead, we might run into trouble when the UMEM + * the XDPSQ was used instead, we might run into trouble when the buff pool * is disabled and then reenabled, but the SQ continues receiving CQEs - * from the old UMEM. + * from the old buff pool. */ - err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, umem, &c->xsksq, true); + err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, pool, &c->xsksq, true); if (unlikely(err)) goto err_close_tx_cq;
@@@ -106,8 -106,7 +106,7 @@@ err_free_cparam void mlx5e_close_xsk(struct mlx5e_channel *c) { clear_bit(MLX5E_CHANNEL_STATE_XSK, c->state); - napi_synchronize(&c->napi); - synchronize_rcu(); /* Sync with the XSK wakeup. */ + synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */
mlx5e_close_rq(&c->xskrq); mlx5e_close_cq(&c->xskrq.cq); diff --combined drivers/net/ethernet/mellanox/mlx5/core/en_main.c index b057a6c3a6d5,b3cda7b6e5e1..4d30a81cbadc --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@@ -57,7 -57,7 +57,7 @@@ #include "en/monitor_stats.h" #include "en/health.h" #include "en/params.h" -#include "en/xsk/umem.h" +#include "en/xsk/pool.h" #include "en/xsk/setup.h" #include "en/xsk/rx.h" #include "en/xsk/tx.h" @@@ -158,16 -158,6 +158,6 @@@ static void mlx5e_update_carrier_work(s mutex_unlock(&priv->state_lock); }
- void mlx5e_update_ndo_stats(struct mlx5e_priv *priv) - { - int i; - - for (i = mlx5e_nic_stats_grps_num(priv) - 1; i >= 0; i--) - if (mlx5e_nic_stats_grps[i]->update_stats_mask & - MLX5E_NDO_UPDATE_STATS) - mlx5e_nic_stats_grps[i]->update_stats(priv); - } - static void mlx5e_update_stats_work(struct work_struct *work) { struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv, @@@ -363,7 -353,7 +353,7 @@@ static void mlx5e_rq_err_cqe_work(struc static int mlx5e_alloc_rq(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_xsk_param *xsk, - struct xdp_umem *umem, + struct xsk_buff_pool *xsk_pool, struct mlx5e_rq_param *rqp, struct mlx5e_rq *rq) { @@@ -389,9 -379,9 +379,9 @@@ rq->mdev = mdev; rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); rq->xdpsq = &c->rq_xdpsq; - rq->umem = umem; + rq->xsk_pool = xsk_pool;
- if (rq->umem) + if (rq->xsk_pool) rq->stats = &c->priv->channel_stats[c->ix].xskrq; else rq->stats = &c->priv->channel_stats[c->ix].rq; @@@ -399,7 -389,7 +389,7 @@@
if (params->xdp_prog) bpf_prog_inc(params->xdp_prog); - rq->xdp_prog = params->xdp_prog; + RCU_INIT_POINTER(rq->xdp_prog, params->xdp_prog);
rq_xdp_ix = rq->ix; if (xsk) @@@ -408,7 -398,7 +398,7 @@@ if (err < 0) goto err_rq_wq_destroy;
- rq->buff.map_dir = rq->xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE; + rq->buff.map_dir = params->xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE; rq->buff.headroom = mlx5e_get_rq_headroom(mdev, params, xsk); pool_size = 1 << params->log_rq_mtu_frames;
@@@ -477,7 -467,7 +467,7 @@@ if (xsk) { err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL); - xsk_buff_set_rxq_info(rq->umem, &rq->xdp_rxq); + xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq); } else { /* Create a page_pool and register it with rxq */ pp_params.order = 0; @@@ -564,8 -554,8 +554,8 @@@ err_free }
err_rq_wq_destroy: - if (rq->xdp_prog) - bpf_prog_put(rq->xdp_prog); + if (params->xdp_prog) + bpf_prog_put(params->xdp_prog); xdp_rxq_info_unreg(&rq->xdp_rxq); page_pool_destroy(rq->page_pool); mlx5_wq_destroy(&rq->wq_ctrl); @@@ -575,10 -565,16 +565,16 @@@
static void mlx5e_free_rq(struct mlx5e_rq *rq) { + struct mlx5e_channel *c = rq->channel; + struct bpf_prog *old_prog = NULL; int i;
- if (rq->xdp_prog) - bpf_prog_put(rq->xdp_prog); + /* drop_rq has neither channel nor xdp_prog. */ + if (c) + old_prog = rcu_dereference_protected(rq->xdp_prog, + lockdep_is_held(&c->priv->state_lock)); + if (old_prog) + bpf_prog_put(old_prog);
switch (rq->wq_type) { case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ: @@@ -816,11 -812,11 +812,11 @@@ void mlx5e_free_rx_descs(struct mlx5e_r
int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk, - struct xdp_umem *umem, struct mlx5e_rq *rq) + struct xsk_buff_pool *xsk_pool, struct mlx5e_rq *rq) { int err;
- err = mlx5e_alloc_rq(c, params, xsk, umem, param, rq); + err = mlx5e_alloc_rq(c, params, xsk, xsk_pool, param, rq); if (err) return err;
@@@ -848,13 -844,6 +844,13 @@@ if (MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_NO_CSUM_COMPLETE) || c->xdp) __set_bit(MLX5E_RQ_STATE_NO_CSUM_COMPLETE, &c->rq.state);
+ /* For CQE compression on striding RQ, use stride index provided by + * HW if capability is supported. + */ + if (MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_STRIDING_RQ) && + MLX5_CAP_GEN(c->mdev, mini_cqe_resp_stride_index)) + __set_bit(MLX5E_RQ_STATE_MINI_CQE_HW_STRIDX, &c->rq.state); + return 0;
err_destroy_rq: @@@ -874,7 -863,7 +870,7 @@@ void mlx5e_activate_rq(struct mlx5e_rq void mlx5e_deactivate_rq(struct mlx5e_rq *rq) { clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state); - napi_synchronize(&rq->channel->napi); /* prevent mlx5e_post_rx_wqes */ + synchronize_rcu(); /* Sync with NAPI to prevent mlx5e_post_rx_wqes. */ }
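The mlx5e changes in this file convert rq->xdp_prog to an __rcu pointer (RCU_INIT_POINTER at allocation, rcu_dereference in the XDP datapath, rcu_dereference_protected under the state lock on teardown) and replace napi_synchronize() with synchronize_rcu() when deactivating queues: NAPI poll runs inside an RCU read-side critical section, so one grace period is enough to guarantee no in-flight poll still sees the old program or the old queue state. A condensed sketch of the pattern (simplified names, not a verbatim excerpt):

	/* control path, under priv->state_lock */
	old = rcu_dereference_protected(rq->xdp_prog,
					lockdep_is_held(&priv->state_lock));
	rcu_assign_pointer(rq->xdp_prog, new_prog);
	synchronize_rcu();			/* wait out in-flight NAPI polls */
	if (old)
		bpf_prog_put(old);

	/* datapath, inside the NAPI poll */
	prog = rcu_dereference(rq->xdp_prog);
	if (prog)
		act = bpf_prog_run_xdp(prog, xdp);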
void mlx5e_close_rq(struct mlx5e_rq *rq) @@@ -932,7 -921,7 +928,7 @@@ static int mlx5e_alloc_xdpsq_db(struct
static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, - struct xdp_umem *umem, + struct xsk_buff_pool *xsk_pool, struct mlx5e_sq_param *param, struct mlx5e_xdpsq *sq, bool is_redirect) @@@ -948,9 -937,9 +944,9 @@@ sq->uar_map = mdev->mlx5e_res.bfreg.map; sq->min_inline_mode = params->tx_min_inline_mode; sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); - sq->umem = umem; + sq->xsk_pool = xsk_pool;
- sq->stats = sq->umem ? + sq->stats = sq->xsk_pool ? &c->priv->channel_stats[c->ix].xsksq : is_redirect ? &c->priv->channel_stats[c->ix].xdpsq : @@@ -1319,12 -1308,10 +1315,10 @@@ void mlx5e_tx_disable_queue(struct netd
static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq) { - struct mlx5e_channel *c = sq->channel; struct mlx5_wq_cyc *wq = &sq->wq;
clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state); - /* prevent netif_tx_wake_queue */ - napi_synchronize(&c->napi); + synchronize_rcu(); /* Sync with NAPI to prevent netif_tx_wake_queue. */
mlx5e_tx_disable_queue(sq->txq);
@@@ -1399,10 -1386,8 +1393,8 @@@ void mlx5e_activate_icosq(struct mlx5e_
void mlx5e_deactivate_icosq(struct mlx5e_icosq *icosq) { - struct mlx5e_channel *c = icosq->channel; - clear_bit(MLX5E_SQ_STATE_ENABLED, &icosq->state); - napi_synchronize(&c->napi); + synchronize_rcu(); /* Sync with NAPI. */ }
void mlx5e_close_icosq(struct mlx5e_icosq *sq) @@@ -1415,13 -1400,13 +1407,13 @@@ }
int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, - struct mlx5e_sq_param *param, struct xdp_umem *umem, + struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool, struct mlx5e_xdpsq *sq, bool is_redirect) { struct mlx5e_create_sq_param csp = {}; int err;
- err = mlx5e_alloc_xdpsq(c, params, umem, param, sq, is_redirect); + err = mlx5e_alloc_xdpsq(c, params, xsk_pool, param, sq, is_redirect); if (err) return err;
@@@ -1481,7 -1466,7 +1473,7 @@@ void mlx5e_close_xdpsq(struct mlx5e_xdp struct mlx5e_channel *c = sq->channel;
clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state); - napi_synchronize(&c->napi); + synchronize_rcu(); /* Sync with NAPI. */
mlx5e_destroy_sq(c->mdev, sq->sqn); mlx5e_free_xdpsq_descs(sq); @@@ -1914,7 -1899,7 +1906,7 @@@ static u8 mlx5e_enumerate_lag_port(stru static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, struct mlx5e_params *params, struct mlx5e_channel_param *cparam, - struct xdp_umem *umem, + struct xsk_buff_pool *xsk_pool, struct mlx5e_channel **cp) { int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); @@@ -1953,9 -1938,9 +1945,9 @@@ if (unlikely(err)) goto err_napi_del;
- if (umem) { - mlx5e_build_xsk_param(umem, &xsk); - err = mlx5e_open_xsk(priv, params, &xsk, umem, c); + if (xsk_pool) { + mlx5e_build_xsk_param(xsk_pool, &xsk); + err = mlx5e_open_xsk(priv, params, &xsk, xsk_pool, c); if (unlikely(err)) goto err_close_queues; } @@@ -2189,7 -2174,6 +2181,7 @@@ void mlx5e_build_rx_cq_param(struct mlx struct mlx5e_cq_param *param) { struct mlx5_core_dev *mdev = priv->mdev; + bool hw_stridx = false; void *cqc = param->cqc; u8 log_cq_size;
@@@ -2197,7 -2181,6 +2189,7 @@@ case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ: log_cq_size = mlx5e_mpwqe_get_log_rq_size(params, xsk) + mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk); + hw_stridx = MLX5_CAP_GEN(mdev, mini_cqe_resp_stride_index); break; default: /* MLX5_WQ_TYPE_CYCLIC */ log_cq_size = params->log_rq_mtu_frames; @@@ -2205,8 -2188,7 +2197,8 @@@
MLX5_SET(cqc, cqc, log_cq_size, log_cq_size); if (MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS)) { - MLX5_SET(cqc, cqc, mini_cqe_res_format, MLX5_CQE_FORMAT_CSUM); + MLX5_SET(cqc, cqc, mini_cqe_res_format, hw_stridx ? + MLX5_CQE_FORMAT_CSUM_STRIDX : MLX5_CQE_FORMAT_CSUM); MLX5_SET(cqc, cqc, cqe_comp_en, 1); }
@@@ -2319,12 -2301,12 +2311,12 @@@ int mlx5e_open_channels(struct mlx5e_pr
mlx5e_build_channel_param(priv, &chs->params, cparam); for (i = 0; i < chs->num; i++) { - struct xdp_umem *umem = NULL; + struct xsk_buff_pool *xsk_pool = NULL;
if (chs->params.xdp_prog) - umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, i); + xsk_pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, i);
- err = mlx5e_open_channel(priv, i, &chs->params, cparam, umem, &chs->c[i]); + err = mlx5e_open_channel(priv, i, &chs->params, cparam, xsk_pool, &chs->c[i]); if (err) goto err_close_channels; } @@@ -3577,6 -3559,7 +3569,7 @@@ void mlx5e_fold_sw_stats64(struct mlx5e
s->rx_packets += rq_stats->packets + xskrq_stats->packets; s->rx_bytes += rq_stats->bytes + xskrq_stats->bytes; + s->multicast += rq_stats->mcast_packets + xskrq_stats->mcast_packets;
for (j = 0; j < priv->max_opened_tc; j++) { struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j]; @@@ -3592,7 -3575,6 +3585,6 @@@ voi mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct mlx5e_priv *priv = netdev_priv(dev); - struct mlx5e_vport_stats *vstats = &priv->stats.vport; struct mlx5e_pport_stats *pstats = &priv->stats.pport;
/* In switchdev mode, monitor counters doesn't monitor @@@ -3627,12 -3609,6 +3619,6 @@@ stats->rx_errors = stats->rx_length_errors + stats->rx_crc_errors + stats->rx_frame_errors; stats->tx_errors = stats->tx_aborted_errors + stats->tx_carrier_errors; - - /* vport multicast also counts packets that are dropped due to steering - * or rx out of buffer - */ - stats->multicast = - VPORT_COUNTER_GET(vstats, received_eth_multicast.packets); }
static void mlx5e_set_rx_mode(struct net_device *dev) @@@ -3902,14 -3878,13 +3888,14 @@@ static bool mlx5e_xsk_validate_mtu(stru u16 ix;
for (ix = 0; ix < chs->params.num_channels; ix++) { - struct xdp_umem *umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, ix); + struct xsk_buff_pool *xsk_pool = + mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, ix); struct mlx5e_xsk_param xsk;
- if (!umem) + if (!xsk_pool) continue;
- mlx5e_build_xsk_param(umem, &xsk); + mlx5e_build_xsk_param(xsk_pool, &xsk);
if (!mlx5e_validate_xsk_param(new_params, &xsk, mdev)) { u32 hr = mlx5e_get_linear_rq_headroom(new_params, &xsk); @@@ -4341,6 -4316,16 +4327,16 @@@ static int mlx5e_xdp_allowed(struct mlx return 0; }
+ static void mlx5e_rq_replace_xdp_prog(struct mlx5e_rq *rq, struct bpf_prog *prog) + { + struct bpf_prog *old_prog; + + old_prog = rcu_replace_pointer(rq->xdp_prog, prog, + lockdep_is_held(&rq->channel->priv->state_lock)); + if (old_prog) + bpf_prog_put(old_prog); + } + static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog) { struct mlx5e_priv *priv = netdev_priv(netdev); @@@ -4399,29 -4384,10 +4395,10 @@@ */ for (i = 0; i < priv->channels.num; i++) { struct mlx5e_channel *c = priv->channels.c[i]; - bool xsk_open = test_bit(MLX5E_CHANNEL_STATE_XSK, c->state); - - clear_bit(MLX5E_RQ_STATE_ENABLED, &c->rq.state); - if (xsk_open) - clear_bit(MLX5E_RQ_STATE_ENABLED, &c->xskrq.state); - napi_synchronize(&c->napi); - /* prevent mlx5e_poll_rx_cq from accessing rq->xdp_prog */ - - old_prog = xchg(&c->rq.xdp_prog, prog); - if (old_prog) - bpf_prog_put(old_prog); - - if (xsk_open) { - old_prog = xchg(&c->xskrq.xdp_prog, prog); - if (old_prog) - bpf_prog_put(old_prog); - }
- set_bit(MLX5E_RQ_STATE_ENABLED, &c->rq.state); - if (xsk_open) - set_bit(MLX5E_RQ_STATE_ENABLED, &c->xskrq.state); - /* napi_schedule in case we have missed anything */ - napi_schedule(&c->napi); + mlx5e_rq_replace_xdp_prog(&c->rq, prog); + if (test_bit(MLX5E_CHANNEL_STATE_XSK, c->state)) + mlx5e_rq_replace_xdp_prog(&c->xskrq, prog); }
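With rq->xdp_prog now behind RCU, the program swap above no longer has to stop and restart the queues: rcu_replace_pointer() exchanges the pointer under the state lock and the datapath picks up the new program on its next rcu_dereference(). A rough sketch of the writer/reader pairing, with made-up demo_* types standing in for the real mlx5e structures:

struct demo_priv {
	struct mutex state_lock;
};

struct demo_rq {
	struct bpf_prog __rcu *xdp_prog;
	struct demo_priv *priv;
};

/* Writer: swap the program while holding the lock named in the lockdep
 * condition, then drop the reference to the program that was replaced.
 */
static void demo_replace_xdp_prog(struct demo_rq *rq, struct bpf_prog *prog)
{
	struct bpf_prog *old;

	old = rcu_replace_pointer(rq->xdp_prog, prog,
				  lockdep_is_held(&rq->priv->state_lock));
	if (old)
		bpf_prog_put(old);
}

/* Reader, called from the RX datapath inside the NAPI RCU section. */
static u32 demo_run_xdp(struct demo_rq *rq, struct xdp_buff *xdp)
{
	struct bpf_prog *prog = rcu_dereference(rq->xdp_prog);

	if (!prog)
		return XDP_PASS;
	return bpf_prog_run_xdp(prog, xdp);
}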
unlock: @@@ -4434,8 -4400,8 +4411,8 @@@ static int mlx5e_xdp(struct net_device switch (xdp->command) { case XDP_SETUP_PROG: return mlx5e_xdp_set(dev, xdp->prog); - case XDP_SETUP_XSK_UMEM: - return mlx5e_xsk_setup_umem(dev, xdp->xsk.umem, + case XDP_SETUP_XSK_POOL: + return mlx5e_xsk_setup_pool(dev, xdp->xsk.pool, xdp->xsk.queue_id); default: return -EINVAL; @@@ -5211,7 -5177,7 +5188,7 @@@ static const struct mlx5e_profile mlx5e .enable = mlx5e_nic_enable, .disable = mlx5e_nic_disable, .update_rx = mlx5e_update_nic_rx, - .update_stats = mlx5e_update_ndo_stats, + .update_stats = mlx5e_stats_update_ndo_stats, .update_carrier = mlx5e_update_carrier, .rx_handlers = &mlx5e_rx_handlers_nic, .max_tc = MLX5E_MAX_NUM_TC, diff --combined drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 110addbb8992,e979bff64c49..97ba2da56cf9 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@@ -288,14 -288,6 +288,14 @@@ static u32 mlx5e_rep_get_rxfh_indir_siz return mlx5e_ethtool_get_rxfh_indir_size(priv); }
+static void mlx5e_uplink_rep_get_pause_stats(struct net_device *netdev, + struct ethtool_pause_stats *stats) +{ + struct mlx5e_priv *priv = netdev_priv(netdev); + + mlx5e_stats_pause_get(priv, stats); +} + static void mlx5e_uplink_rep_get_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pauseparam) { @@@ -370,7 -362,6 +370,7 @@@ static const struct ethtool_ops mlx5e_u .set_rxfh = mlx5e_set_rxfh, .get_rxnfc = mlx5e_get_rxnfc, .set_rxnfc = mlx5e_set_rxnfc, + .get_pause_stats = mlx5e_uplink_rep_get_pause_stats, .get_pauseparam = mlx5e_uplink_rep_get_pauseparam, .set_pauseparam = mlx5e_uplink_rep_set_pauseparam, }; @@@ -1180,7 -1171,7 +1180,7 @@@ static const struct mlx5e_profile mlx5e .cleanup_tx = mlx5e_cleanup_rep_tx, .enable = mlx5e_rep_enable, .update_rx = mlx5e_update_rep_rx, - .update_stats = mlx5e_update_ndo_stats, + .update_stats = mlx5e_stats_update_ndo_stats, .rx_handlers = &mlx5e_rx_handlers_rep, .max_tc = 1, .rq_groups = MLX5E_NUM_RQ_GROUPS(REGULAR), @@@ -1198,7 -1189,7 +1198,7 @@@ static const struct mlx5e_profile mlx5e .enable = mlx5e_uplink_rep_enable, .disable = mlx5e_uplink_rep_disable, .update_rx = mlx5e_update_rep_rx, - .update_stats = mlx5e_update_ndo_stats, + .update_stats = mlx5e_stats_update_ndo_stats, .update_carrier = mlx5e_update_carrier, .rx_handlers = &mlx5e_rx_handlers_rep, .max_tc = MLX5E_MAX_NUM_TC, @@@ -1219,22 -1210,16 +1219,22 @@@ is_devlink_port_supported(const struct static int register_devlink_port(struct mlx5_core_dev *dev, struct mlx5e_rep_priv *rpriv) { + struct mlx5_esw_offload *offloads = &dev->priv.eswitch->offloads; struct devlink *devlink = priv_to_devlink(dev); struct mlx5_eswitch_rep *rep = rpriv->rep; struct devlink_port_attrs attrs = {}; struct netdev_phys_item_id ppid = {}; unsigned int dl_port_index = 0; + u32 controller_num = 0; + bool external; u16 pfnum;
if (!is_devlink_port_supported(dev, rpriv)) return 0;
+ external = mlx5_core_is_ecpf_esw_manager(dev); + if (external) + controller_num = offloads->host_number + 1; mlx5e_rep_get_port_parent_id(rpriv->netdev, &ppid); dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, rep->vport); pfnum = PCI_FUNC(dev->pdev->devfn); @@@ -1247,13 -1232,12 +1247,13 @@@ } else if (rep->vport == MLX5_VPORT_PF) { memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len); rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len; - devlink_port_attrs_pci_pf_set(&rpriv->dl_port, pfnum); + devlink_port_attrs_pci_pf_set(&rpriv->dl_port, controller_num, + pfnum, external); } else if (mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport)) { memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len); rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len; - devlink_port_attrs_pci_vf_set(&rpriv->dl_port, - pfnum, rep->vport - 1); + devlink_port_attrs_pci_vf_set(&rpriv->dl_port, controller_num, + pfnum, rep->vport - 1, external); } return devlink_port_register(devlink, &rpriv->dl_port, dl_port_index); } diff --combined drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index c9c82b14060a,64c8ac5eabf6..310533fc9950 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@@ -30,6 -30,7 +30,6 @@@ * SOFTWARE. */
-#include <linux/prefetch.h> #include <linux/ip.h> #include <linux/ipv6.h> #include <linux/tcp.h> @@@ -52,6 -53,7 +52,7 @@@ #include "en/xsk/rx.h" #include "en/health.h" #include "en/params.h" + #include "en/txrx.h"
static struct sk_buff * mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, @@@ -137,17 -139,8 +138,17 @@@ static inline void mlx5e_decompress_cqe title->check_sum = mini_cqe->checksum; title->op_own &= 0xf0; title->op_own |= 0x01 & (cqcc >> wq->fbc.log_sz); - title->wqe_counter = cpu_to_be16(cqd->wqe_counter);
+ /* state bit set implies linked-list striding RQ wq type and + * HW stride index capability supported + */ + if (test_bit(MLX5E_RQ_STATE_MINI_CQE_HW_STRIDX, &rq->state)) { + title->wqe_counter = mini_cqe->stridx; + return; + } + + /* HW stride index capability not supported */ + title->wqe_counter = cpu_to_be16(cqd->wqe_counter); if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) cqd->wqe_counter += mpwrq_get_cqe_consumed_strides(title); else @@@ -289,8 -282,8 +290,8 @@@ static inline int mlx5e_page_alloc_pool static inline int mlx5e_page_alloc(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info) { - if (rq->umem) - return mlx5e_xsk_page_alloc_umem(rq, dma_info); + if (rq->xsk_pool) + return mlx5e_xsk_page_alloc_pool(rq, dma_info); else return mlx5e_page_alloc_pool(rq, dma_info); } @@@ -321,7 -314,7 +322,7 @@@ static inline void mlx5e_page_release(s struct mlx5e_dma_info *dma_info, bool recycle) { - if (rq->umem) + if (rq->xsk_pool) /* The `recycle` parameter is ignored, and the page is always * put into the Reuse Ring, because there is no way to return * the page to the userspace when the interface goes down. @@@ -408,14 -401,14 +409,14 @@@ static int mlx5e_alloc_rx_wqes(struct m int err; int i;
- if (rq->umem) { + if (rq->xsk_pool) { int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags;
/* Check in advance that we have enough frames, instead of * allocating one-by-one, failing and moving frames to the * Reuse Ring. */ - if (unlikely(!xsk_buff_can_alloc(rq->umem, pages_desired))) + if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, pages_desired))) return -ENOMEM; }
@@@ -513,8 -506,8 +514,8 @@@ static int mlx5e_alloc_rx_mpwqe(struct /* Check in advance that we have enough frames, instead of allocating * one-by-one, failing and moving frames to the Reuse Ring. */ - if (rq->umem && - unlikely(!xsk_buff_can_alloc(rq->umem, MLX5_MPWRQ_PAGES_PER_WQE))) { + if (rq->xsk_pool && + unlikely(!xsk_buff_can_alloc(rq->xsk_pool, MLX5_MPWRQ_PAGES_PER_WQE))) { err = -ENOMEM; goto err; } @@@ -762,7 -755,7 +763,7 @@@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post * the driver when it refills the Fill Ring. * 2. Otherwise, busy poll by rescheduling the NAPI poll. */ - if (unlikely(alloc_err == -ENOMEM && rq->umem)) + if (unlikely(alloc_err == -ENOMEM && rq->xsk_pool)) return true;
return false; @@@ -1088,6 -1081,9 +1089,9 @@@ static inline void mlx5e_build_rx_skb(s mlx5e_enable_ecn(rq, skb);
skb->protocol = eth_type_trans(skb, netdev); + + if (unlikely(mlx5e_skb_is_multicast(skb))) + stats->mcast_packets++; }
static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq, @@@ -1140,7 -1136,6 +1144,6 @@@ mlx5e_skb_from_cqe_linear(struct mlx5e_ struct xdp_buff xdp; struct sk_buff *skb; void *va, *data; - bool consumed; u32 frag_size;
va = page_address(di->page) + wi->offset; @@@ -1149,14 -1144,11 +1152,11 @@@
dma_sync_single_range_for_cpu(rq->pdev, di->addr, wi->offset, frag_size, DMA_FROM_DEVICE); - prefetchw(va); /* xdp_frame data area */ - prefetch(data); + net_prefetchw(va); /* xdp_frame data area */ + net_prefetch(data);
- rcu_read_lock(); mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &xdp); - consumed = mlx5e_xdp_handle(rq, di, &cqe_bcnt, &xdp); - rcu_read_unlock(); - if (consumed) + if (mlx5e_xdp_handle(rq, di, &cqe_bcnt, &xdp)) return NULL; /* page/packet was consumed by XDP */
rx_headroom = xdp.data - xdp.data_hard_start; @@@ -1192,7 -1184,7 +1192,7 @@@ mlx5e_skb_from_cqe_nonlinear(struct mlx return NULL; }
- prefetchw(skb->data); + net_prefetchw(skb->data);
while (byte_cnt) { u16 frag_consumed_bytes = @@@ -1407,7 -1399,7 +1407,7 @@@ mlx5e_skb_from_cqe_mpwrq_nonlinear(stru return NULL; }
- prefetchw(skb->data); + net_prefetchw(skb->data);
if (unlikely(frag_offset >= PAGE_SIZE)) { di++; @@@ -1446,7 -1438,6 +1446,6 @@@ mlx5e_skb_from_cqe_mpwrq_linear(struct struct sk_buff *skb; void *va, *data; u32 frag_size; - bool consumed;
/* Check packet size. Note LRO doesn't use linear SKB */ if (unlikely(cqe_bcnt > rq->hw_mtu)) { @@@ -1460,14 -1451,11 +1459,11 @@@
dma_sync_single_range_for_cpu(rq->pdev, di->addr, head_offset, frag_size, DMA_FROM_DEVICE); - prefetchw(va); /* xdp_frame data area */ - prefetch(data); + net_prefetchw(va); /* xdp_frame data area */ + net_prefetch(data);
- rcu_read_lock(); mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt32, &xdp); - consumed = mlx5e_xdp_handle(rq, di, &cqe_bcnt32, &xdp); - rcu_read_unlock(); - if (consumed) { + if (mlx5e_xdp_handle(rq, di, &cqe_bcnt32, &xdp)) { if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) __set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */ return NULL; /* page/packet was consumed by XDP */ diff --combined drivers/net/ethernet/mellanox/mlx5/core/en_stats.c index 6d5e54b964c0,f6383bc2bc3f..0d34befc5761 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c @@@ -54,6 -54,18 +54,18 @@@ unsigned int mlx5e_stats_total_num(stru return total; }
+ void mlx5e_stats_update_ndo_stats(struct mlx5e_priv *priv) + { + mlx5e_stats_grp_t *stats_grps = priv->profile->stats_grps; + const unsigned int num_stats_grps = stats_grps_num(priv); + int i; + + for (i = num_stats_grps - 1; i >= 0; i--) + if (stats_grps[i]->update_stats && + stats_grps[i]->update_stats_mask & MLX5E_NDO_UPDATE_STATS) + stats_grps[i]->update_stats(priv); + } + void mlx5e_stats_update(struct mlx5e_priv *priv) { mlx5e_stats_grp_t *stats_grps = priv->profile->stats_grps; @@@ -677,35 -689,6 +689,35 @@@ static MLX5E_DECLARE_STATS_GRP_OP_UPDAT mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0); }
+#define MLX5E_READ_CTR64_BE_F(ptr, c) \ + be64_to_cpu(*(__be64 *)((char *)ptr + \ + MLX5_BYTE_OFF(ppcnt_reg, \ + counter_set.eth_802_3_cntrs_grp_data_layout.c##_high))) + +void mlx5e_stats_pause_get(struct mlx5e_priv *priv, + struct ethtool_pause_stats *pause_stats) +{ + u32 ppcnt_ieee_802_3[MLX5_ST_SZ_DW(ppcnt_reg)]; + struct mlx5_core_dev *mdev = priv->mdev; + u32 in[MLX5_ST_SZ_DW(ppcnt_reg)] = {}; + int sz = MLX5_ST_SZ_BYTES(ppcnt_reg); + + if (!MLX5_BASIC_PPCNT_SUPPORTED(mdev)) + return; + + MLX5_SET(ppcnt_reg, in, local_port, 1); + MLX5_SET(ppcnt_reg, in, grp, MLX5_IEEE_802_3_COUNTERS_GROUP); + mlx5_core_access_reg(mdev, in, sz, ppcnt_ieee_802_3, + sz, MLX5_REG_PPCNT, 0, 0); + + pause_stats->tx_pause_frames = + MLX5E_READ_CTR64_BE_F(ppcnt_ieee_802_3, + a_pause_mac_ctrl_frames_transmitted); + pause_stats->rx_pause_frames = + MLX5E_READ_CTR64_BE_F(ppcnt_ieee_802_3, + a_pause_mac_ctrl_frames_received); +} + #define PPORT_2863_OFF(c) \ MLX5_BYTE_OFF(ppcnt_reg, \ counter_set.eth_2863_cntrs_grp_data_layout.c##_high) diff --combined drivers/net/ethernet/mellanox/mlx5/core/en_stats.h index 9d9ee269a041,562263d62141..771f30cb0700 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h @@@ -103,10 -103,8 +103,11 @@@ unsigned int mlx5e_stats_total_num(stru void mlx5e_stats_update(struct mlx5e_priv *priv); void mlx5e_stats_fill(struct mlx5e_priv *priv, u64 *data, int idx); void mlx5e_stats_fill_strings(struct mlx5e_priv *priv, u8 *data); + void mlx5e_stats_update_ndo_stats(struct mlx5e_priv *priv);
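MLX5E_READ_CTR64_BE_F(), added above, pulls one 64-bit IEEE 802.3 counter out of the raw PPCNT register dump: each counter sits at a fixed byte offset and is stored big endian, so the driver loads eight bytes starting at the _high half and byte-swaps them. A standalone userspace illustration of that decoding (plain C; the buffer contents and offset are invented for the demo):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <endian.h>

/* Decode one big-endian 64-bit counter at a byte offset in a register dump. */
static uint64_t read_ctr64_be(const void *dump, size_t off)
{
	uint64_t be;

	memcpy(&be, (const uint8_t *)dump + off, sizeof(be));
	return be64toh(be);
}

int main(void)
{
	uint8_t dump[16] = { 0 };
	uint8_t ctr[8] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x2c };

	memcpy(dump + 8, ctr, sizeof(ctr));	/* counter value 300 at offset 8 */
	printf("rx pause frames: %llu\n",
	       (unsigned long long)read_ctr64_be(dump, 8));
	return 0;
}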
+void mlx5e_stats_pause_get(struct mlx5e_priv *priv, + struct ethtool_pause_stats *pause_stats); + /* Concrete NIC Stats */
struct mlx5e_sw_stats { @@@ -122,6 -120,7 +123,7 @@@ u64 tx_nop; u64 rx_lro_packets; u64 rx_lro_bytes; + u64 rx_mcast_packets; u64 rx_ecn_mark; u64 rx_removed_vlan_packets; u64 rx_csum_unnecessary; @@@ -301,6 -300,7 +303,7 @@@ struct mlx5e_rq_stats u64 csum_none; u64 lro_packets; u64 lro_bytes; + u64 mcast_packets; u64 ecn_mark; u64 removed_vlan_packets; u64 xdp_drop; diff --combined drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 817c503693fc,1c93f92d9210..28053c3c4380 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@@ -1290,11 -1290,8 +1290,8 @@@ static void mlx5e_tc_del_fdb_flow(struc
mlx5e_put_flow_tunnel_id(flow);
- if (flow_flag_test(flow, NOT_READY)) { + if (flow_flag_test(flow, NOT_READY)) remove_unready_flow(flow); - kvfree(attr->parse_attr); - return; - }
if (mlx5e_is_offloaded_flow(flow)) { if (flow_flag_test(flow, SLOW)) @@@ -1315,6 -1312,8 +1312,8 @@@ } kvfree(attr->parse_attr);
+ mlx5_tc_ct_match_del(priv, &flow->esw_attr->ct_attr); + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) mlx5e_detach_mod_hdr(priv, flow);
@@@ -2615,7 -2614,6 +2614,7 @@@ static struct mlx5_fields fields[] = OFFLOAD(DIPV6_31_0, 32, U32_MAX, ip6.daddr.s6_addr32[3], 0, dst_ipv4_dst_ipv6.ipv6_layout.ipv6[12]), OFFLOAD(IPV6_HOPLIMIT, 8, U8_MAX, ip6.hop_limit, 0, ttl_hoplimit), + OFFLOAD(IP_DSCP, 16, 0xc00f, ip6, 0, ip_dscp),
OFFLOAD(TCP_SPORT, 16, U16_MAX, tcp.source, 0, tcp_sport), OFFLOAD(TCP_DPORT, 16, U16_MAX, tcp.dest, 0, tcp_dport), @@@ -2626,6 -2624,22 +2625,22 @@@ OFFLOAD(UDP_DPORT, 16, U16_MAX, udp.dest, 0, udp_dport), };
+ static unsigned long mask_to_le(unsigned long mask, int size) + { + __be32 mask_be32; + __be16 mask_be16; + + if (size == 32) { + mask_be32 = (__force __be32)(mask); + mask = (__force unsigned long)cpu_to_le32(be32_to_cpu(mask_be32)); + } else if (size == 16) { + mask_be32 = (__force __be32)(mask); + mask_be16 = *(__be16 *)&mask_be32; + mask = (__force unsigned long)cpu_to_le16(be16_to_cpu(mask_be16)); + } + + return mask; + } static int offload_pedit_fields(struct mlx5e_priv *priv, int namespace, struct pedit_headers_action *hdrs, @@@ -2639,9 -2653,7 +2654,7 @@@ u32 *s_masks_p, *a_masks_p, s_mask, a_mask; struct mlx5e_tc_mod_hdr_acts *mod_acts; struct mlx5_fields *f; - unsigned long mask; - __be32 mask_be32; - __be16 mask_be16; + unsigned long mask, field_mask; int err; u8 cmd;
@@@ -2707,14 -2719,7 +2720,7 @@@ if (skip) continue;
- if (f->field_bsize == 32) { - mask_be32 = (__force __be32)(mask); - mask = (__force unsigned long)cpu_to_le32(be32_to_cpu(mask_be32)); - } else if (f->field_bsize == 16) { - mask_be32 = (__force __be32)(mask); - mask_be16 = *(__be16 *)&mask_be32; - mask = (__force unsigned long)cpu_to_le16(be16_to_cpu(mask_be16)); - } + mask = mask_to_le(mask, f->field_bsize);
first = find_first_bit(&mask, f->field_bsize); next_z = find_next_zero_bit(&mask, f->field_bsize, first); @@@ -2745,9 -2750,10 +2751,10 @@@ if (cmd == MLX5_ACTION_TYPE_SET) { int start;
+ field_mask = mask_to_le(f->field_mask, f->field_bsize); + /* if field is bit sized it can start not from first bit */ - start = find_first_bit((unsigned long *)&f->field_mask, - f->field_bsize); + start = find_first_bit(&field_mask, f->field_bsize);
MLX5_SET(set_action_in, action, offset, first - start); /* length is num of bits to be written, zero means length of 32 */ @@@ -3944,16 -3950,6 +3951,16 @@@ static int parse_tc_fdb_actions(struct action |= MLX5_FLOW_CONTEXT_ACTION_DROP | MLX5_FLOW_CONTEXT_ACTION_COUNT; break; + case FLOW_ACTION_TRAP: + if (!flow_offload_has_one_action(flow_action)) { + NL_SET_ERR_MSG_MOD(extack, + "action trap is supported as a sole action only"); + return -EOPNOTSUPP; + } + action |= (MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | + MLX5_FLOW_CONTEXT_ACTION_COUNT); + attr->flags |= MLX5_ESW_ATTR_FLAG_SLOW_PATH; + break; case FLOW_ACTION_MPLS_PUSH: if (!MLX5_CAP_ESW_FLOWTABLE_FDB(priv->mdev, reformat_l2_to_l3_tunnel) || @@@ -4413,8 -4409,8 +4420,8 @@@ __mlx5e_add_fdb_flow(struct mlx5e_priv goto err_free;
/* actions validation depends on parsing the ct matches first */ - err = mlx5_tc_ct_parse_match(priv, &parse_attr->spec, f, - &flow->esw_attr->ct_attr, extack); + err = mlx5_tc_ct_match_add(priv, &parse_attr->spec, f, + &flow->esw_attr->ct_attr, extack); if (err) goto err_free;
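mask_to_le(), factored out earlier in this file's diff, normalizes a mask that was loaded from a network-byte-order header field before offload_pedit_fields() scans it with find_first_bit(); the same conversion is now also applied to f->field_mask for set actions. A loose standalone illustration of the 32-bit case (userspace C; the sample mask value is arbitrary):

#include <stdint.h>
#include <stdio.h>
#include <endian.h>

/* Rough userspace mirror of the 32-bit branch of mask_to_le():
 * big-endian -> CPU -> little-endian, i.e. a byte swap of the value as it
 * was loaded from memory, on either host endianness.
 */
static uint32_t mask32_to_le(uint32_t mask)
{
	return htole32(be32toh(mask));
}

int main(void)
{
	uint32_t raw = 0x0000ff00;	/* as read from the pedit mask array */
	uint32_t le = mask32_to_le(raw);

	printf("raw 0x%08x -> 0x%08x, first set bit %d\n",
	       raw, le, __builtin_ctz(le));
	return 0;
}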
diff --combined drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index b23d20e16495,1bcf2609dca8..00bcf97cecbc --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@@ -1219,35 -1219,37 +1219,37 @@@ static int esw_create_offloads_fdb_tabl } esw->fdb_table.offloads.send_to_vport_grp = g;
- /* create peer esw miss group */ - memset(flow_group_in, 0, inlen); + if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) { + /* create peer esw miss group */ + memset(flow_group_in, 0, inlen);
- esw_set_flow_group_source_port(esw, flow_group_in); + esw_set_flow_group_source_port(esw, flow_group_in);
- if (!mlx5_eswitch_vport_match_metadata_enabled(esw)) { - match_criteria = MLX5_ADDR_OF(create_flow_group_in, - flow_group_in, - match_criteria); + if (!mlx5_eswitch_vport_match_metadata_enabled(esw)) { + match_criteria = MLX5_ADDR_OF(create_flow_group_in, + flow_group_in, + match_criteria);
- MLX5_SET_TO_ONES(fte_match_param, match_criteria, - misc_parameters.source_eswitch_owner_vhca_id); + MLX5_SET_TO_ONES(fte_match_param, match_criteria, + misc_parameters.source_eswitch_owner_vhca_id);
- MLX5_SET(create_flow_group_in, flow_group_in, - source_eswitch_owner_vhca_id_valid, 1); - } + MLX5_SET(create_flow_group_in, flow_group_in, + source_eswitch_owner_vhca_id_valid, 1); + }
- MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, ix); - MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, - ix + esw->total_vports - 1); - ix += esw->total_vports; + MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, ix); + MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, + ix + esw->total_vports - 1); + ix += esw->total_vports;
- g = mlx5_create_flow_group(fdb, flow_group_in); - if (IS_ERR(g)) { - err = PTR_ERR(g); - esw_warn(dev, "Failed to create peer miss flow group err(%d)\n", err); - goto peer_miss_err; + g = mlx5_create_flow_group(fdb, flow_group_in); + if (IS_ERR(g)) { + err = PTR_ERR(g); + esw_warn(dev, "Failed to create peer miss flow group err(%d)\n", err); + goto peer_miss_err; + } + esw->fdb_table.offloads.peer_miss_grp = g; } - esw->fdb_table.offloads.peer_miss_grp = g;
/* create miss group */ memset(flow_group_in, 0, inlen); @@@ -1281,7 -1283,8 +1283,8 @@@ miss_rule_err: mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp); miss_err: - mlx5_destroy_flow_group(esw->fdb_table.offloads.peer_miss_grp); + if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) + mlx5_destroy_flow_group(esw->fdb_table.offloads.peer_miss_grp); peer_miss_err: mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp); send_vport_err: @@@ -1305,7 -1308,8 +1308,8 @@@ static void esw_destroy_offloads_fdb_ta mlx5_del_flow_rules(esw->fdb_table.offloads.miss_rule_multi); mlx5_del_flow_rules(esw->fdb_table.offloads.miss_rule_uni); mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp); - mlx5_destroy_flow_group(esw->fdb_table.offloads.peer_miss_grp); + if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) + mlx5_destroy_flow_group(esw->fdb_table.offloads.peer_miss_grp); mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
mlx5_esw_chains_destroy(esw); @@@ -1864,6 -1868,18 +1868,6 @@@ esw_check_vport_match_metadata_supporte return true; }
-static bool -esw_check_vport_match_metadata_mandatory(const struct mlx5_eswitch *esw) -{ - return mlx5_core_mp_enabled(esw->dev); -} - -static bool esw_use_vport_metadata(const struct mlx5_eswitch *esw) -{ - return esw_check_vport_match_metadata_mandatory(esw) && - esw_check_vport_match_metadata_supported(esw); -} - u32 mlx5_esw_match_metadata_alloc(struct mlx5_eswitch *esw) { u32 num_vports = GENMASK(ESW_VPORT_BITS - 1, 0) - 1; @@@ -1896,6 -1912,9 +1900,6 @@@ void mlx5_esw_match_metadata_free(struc static int esw_offloads_vport_metadata_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) { - if (vport->vport == MLX5_VPORT_UPLINK) - return 0; - vport->default_metadata = mlx5_esw_match_metadata_alloc(esw); vport->metadata = vport->default_metadata; return vport->metadata ? 0 : -ENOSPC; @@@ -1904,56 -1923,26 +1908,56 @@@ static void esw_offloads_vport_metadata_cleanup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) { - if (vport->vport == MLX5_VPORT_UPLINK || !vport->default_metadata) + if (!vport->default_metadata) return;
WARN_ON(vport->metadata != vport->default_metadata); mlx5_esw_match_metadata_free(esw, vport->default_metadata); }
+static void esw_offloads_metadata_uninit(struct mlx5_eswitch *esw) +{ + struct mlx5_vport *vport; + int i; + + if (!mlx5_eswitch_vport_match_metadata_enabled(esw)) + return; + + mlx5_esw_for_all_vports_reverse(esw, i, vport) + esw_offloads_vport_metadata_cleanup(esw, vport); +} + +static int esw_offloads_metadata_init(struct mlx5_eswitch *esw) +{ + struct mlx5_vport *vport; + int err; + int i; + + if (!mlx5_eswitch_vport_match_metadata_enabled(esw)) + return 0; + + mlx5_esw_for_all_vports(esw, i, vport) { + err = esw_offloads_vport_metadata_setup(esw, vport); + if (err) + goto metadata_err; + } + + return 0; + +metadata_err: + esw_offloads_metadata_uninit(esw); + return err; +} + int esw_vport_create_offloads_acl_tables(struct mlx5_eswitch *esw, struct mlx5_vport *vport) { int err;
- err = esw_offloads_vport_metadata_setup(esw, vport); - if (err) - goto metadata_err; - err = esw_acl_ingress_ofld_setup(esw, vport); if (err) - goto ingress_err; + return err;
if (mlx5_eswitch_is_vf_vport(esw, vport->vport)) { err = esw_acl_egress_ofld_setup(esw, vport); @@@ -1965,6 -1954,9 +1969,6 @@@
egress_err: esw_acl_ingress_ofld_cleanup(esw, vport); -ingress_err: - esw_offloads_vport_metadata_cleanup(esw, vport); -metadata_err: return err; }
@@@ -1974,14 -1966,22 +1978,14 @@@ esw_vport_destroy_offloads_acl_tables(s { esw_acl_egress_ofld_cleanup(vport); esw_acl_ingress_ofld_cleanup(esw, vport); - esw_offloads_vport_metadata_cleanup(esw, vport); }
static int esw_create_uplink_offloads_acl_tables(struct mlx5_eswitch *esw) { struct mlx5_vport *vport; - int err; - - if (esw_use_vport_metadata(esw)) - esw->flags |= MLX5_ESWITCH_VPORT_MATCH_METADATA;
vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); - err = esw_vport_create_offloads_acl_tables(esw, vport); - if (err) - esw->flags &= ~MLX5_ESWITCH_VPORT_MATCH_METADATA; - return err; + return esw_vport_create_offloads_acl_tables(esw, vport); }
static void esw_destroy_uplink_offloads_acl_tables(struct mlx5_eswitch *esw) @@@ -1990,6 -1990,7 +1994,6 @@@
vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); esw_vport_destroy_offloads_acl_tables(esw, vport); - esw->flags &= ~MLX5_ESWITCH_VPORT_MATCH_METADATA; }
static int esw_offloads_steering_init(struct mlx5_eswitch *esw) @@@ -2113,24 -2114,6 +2117,24 @@@ int mlx5_esw_funcs_changed_handler(stru return NOTIFY_OK; }
+static int mlx5_esw_host_number_init(struct mlx5_eswitch *esw) +{ + const u32 *query_host_out; + + if (!mlx5_core_is_ecpf_esw_manager(esw->dev)) + return 0; + + query_host_out = mlx5_esw_query_functions(esw->dev); + if (IS_ERR(query_host_out)) + return PTR_ERR(query_host_out); + + /* Mark non local controller with non zero controller number. */ + esw->offloads.host_number = MLX5_GET(query_esw_functions_out, query_host_out, + host_params_context.host_number); + kvfree(query_host_out); + return 0; +} + int esw_offloads_enable(struct mlx5_eswitch *esw) { struct mlx5_vport *vport; @@@ -2145,17 -2128,6 +2149,17 @@@ mutex_init(&esw->offloads.termtbl_mutex); mlx5_rdma_enable_roce(esw->dev);
+ err = mlx5_esw_host_number_init(esw); + if (err) + goto err_metadata; + + if (esw_check_vport_match_metadata_supported(esw)) + esw->flags |= MLX5_ESWITCH_VPORT_MATCH_METADATA; + + err = esw_offloads_metadata_init(esw); + if (err) + goto err_metadata; + err = esw_set_passing_vport_metadata(esw, true); if (err) goto err_vport_metadata; @@@ -2188,9 -2160,6 +2192,9 @@@ err_uplink err_steering_init: esw_set_passing_vport_metadata(esw, false); err_vport_metadata: + esw_offloads_metadata_uninit(esw); +err_metadata: + esw->flags &= ~MLX5_ESWITCH_VPORT_MATCH_METADATA; mlx5_rdma_disable_roce(esw->dev); mutex_destroy(&esw->offloads.termtbl_mutex); return err; @@@ -2224,8 -2193,6 +2228,8 @@@ void esw_offloads_disable(struct mlx5_e esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK); esw_set_passing_vport_metadata(esw, false); esw_offloads_steering_cleanup(esw); + esw_offloads_metadata_uninit(esw); + esw->flags &= ~MLX5_ESWITCH_VPORT_MATCH_METADATA; mlx5_rdma_disable_roce(esw->dev); mutex_destroy(&esw->offloads.termtbl_mutex); esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE; diff --combined drivers/net/ethernet/qlogic/qed/qed_dev.c index f7f08e6a3acf,3db181f3617a..d2f5855b2ea7 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@@ -3973,7 -3973,6 +3973,7 @@@ static int qed_hw_get_nvm_info(struct q struct qed_mcp_link_speed_params *ext_speed; struct qed_mcp_link_capabilities *p_caps; struct qed_mcp_link_params *link; + int i;
/* Read global nvm_cfg address */ nvm_cfg_addr = qed_rd(p_hwfn, p_ptt, MISC_REG_GEN_PURP_CR0); @@@ -4254,7 -4253,8 +4254,8 @@@ cdev->mf_bits = BIT(QED_MF_LLH_MAC_CLSS) | BIT(QED_MF_LLH_PROTO_CLSS) | BIT(QED_MF_LL2_NON_UNICAST) | - BIT(QED_MF_INTER_PF_SWITCH); + BIT(QED_MF_INTER_PF_SWITCH) | + BIT(QED_MF_DISABLE_ARFS); break; case NVM_CFG1_GLOB_MF_MODE_DEFAULT: cdev->mf_bits = BIT(QED_MF_LLH_MAC_CLSS) | @@@ -4267,6 -4267,14 +4268,14 @@@
DP_INFO(p_hwfn, "Multi function mode is 0x%lx\n", cdev->mf_bits); + + /* In CMT the PF is unknown when the GFS block processes the + * packet. Therefore cannot use searcher as it has a per PF + * database, and thus ARFS must be disabled. + * + */ + if (QED_IS_CMT(cdev)) + cdev->mf_bits |= BIT(QED_MF_DISABLE_ARFS); }
DP_INFO(p_hwfn, "Multi function mode is 0x%lx\n", @@@ -4291,14 -4299,6 +4300,14 @@@ __set_bit(QED_DEV_CAP_ROCE, &p_hwfn->hw_info.device_capabilities);
+ /* Read device serial number information from shmem */ + addr = MCP_REG_SCRATCH + nvm_cfg1_offset + + offsetof(struct nvm_cfg1, glob) + + offsetof(struct nvm_cfg1_glob, serial_number); + + for (i = 0; i < 4; i++) + p_hwfn->hw_info.part_num[i] = qed_rd(p_hwfn, p_ptt, addr + i * 4); + return qed_mcp_fill_shmem_func_info(p_hwfn, p_ptt); }
diff --combined drivers/net/ethernet/qlogic/qed/qed_main.c index f2ab2b086946,50e5eb22e60a..5bd58c65e163 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@@ -39,7 -39,6 +39,7 @@@ #include "qed_hw.h" #include "qed_selftest.h" #include "qed_debug.h" +#include "qed_devlink.h"
#define QED_ROCE_QPS (8192) #define QED_ROCE_DPIS (8) @@@ -445,6 -444,8 +445,8 @@@ int qed_fill_dev_info(struct qed_dev *c dev_info->fw_eng = FW_ENGINEERING_VERSION; dev_info->b_inter_pf_switch = test_bit(QED_MF_INTER_PF_SWITCH, &cdev->mf_bits); + if (!test_bit(QED_MF_DISABLE_ARFS, &cdev->mf_bits)) + dev_info->b_arfs_capable = true; dev_info->tx_switching = true;
if (hw_info->b_wol_support == QED_WOL_SUPPORT_PME) @@@ -479,7 -480,6 +481,7 @@@ }
dev_info->mtu = hw_info->mtu; + cdev->common_dev_info = *dev_info;
return 0; } @@@ -512,6 -512,107 +514,6 @@@ static int qed_set_power_state(struct q return 0; }
-struct qed_devlink { - struct qed_dev *cdev; -}; - -enum qed_devlink_param_id { - QED_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX, - QED_DEVLINK_PARAM_ID_IWARP_CMT, -}; - -static int qed_dl_param_get(struct devlink *dl, u32 id, - struct devlink_param_gset_ctx *ctx) -{ - struct qed_devlink *qed_dl; - struct qed_dev *cdev; - - qed_dl = devlink_priv(dl); - cdev = qed_dl->cdev; - ctx->val.vbool = cdev->iwarp_cmt; - - return 0; -} - -static int qed_dl_param_set(struct devlink *dl, u32 id, - struct devlink_param_gset_ctx *ctx) -{ - struct qed_devlink *qed_dl; - struct qed_dev *cdev; - - qed_dl = devlink_priv(dl); - cdev = qed_dl->cdev; - cdev->iwarp_cmt = ctx->val.vbool; - - return 0; -} - -static const struct devlink_param qed_devlink_params[] = { - DEVLINK_PARAM_DRIVER(QED_DEVLINK_PARAM_ID_IWARP_CMT, - "iwarp_cmt", DEVLINK_PARAM_TYPE_BOOL, - BIT(DEVLINK_PARAM_CMODE_RUNTIME), - qed_dl_param_get, qed_dl_param_set, NULL), -}; - -static const struct devlink_ops qed_dl_ops; - -static int qed_devlink_register(struct qed_dev *cdev) -{ - union devlink_param_value value; - struct qed_devlink *qed_dl; - struct devlink *dl; - int rc; - - dl = devlink_alloc(&qed_dl_ops, sizeof(*qed_dl)); - if (!dl) - return -ENOMEM; - - qed_dl = devlink_priv(dl); - - cdev->dl = dl; - qed_dl->cdev = cdev; - - rc = devlink_register(dl, &cdev->pdev->dev); - if (rc) - goto err_free; - - rc = devlink_params_register(dl, qed_devlink_params, - ARRAY_SIZE(qed_devlink_params)); - if (rc) - goto err_unregister; - - value.vbool = false; - devlink_param_driverinit_value_set(dl, - QED_DEVLINK_PARAM_ID_IWARP_CMT, - value); - - devlink_params_publish(dl); - cdev->iwarp_cmt = false; - - return 0; - -err_unregister: - devlink_unregister(dl); - -err_free: - cdev->dl = NULL; - devlink_free(dl); - - return rc; -} - -static void qed_devlink_unregister(struct qed_dev *cdev) -{ - if (!cdev->dl) - return; - - devlink_params_unregister(cdev->dl, qed_devlink_params, - ARRAY_SIZE(qed_devlink_params)); - - devlink_unregister(cdev->dl); - devlink_free(cdev->dl); -} - /* probing */ static struct qed_dev *qed_probe(struct pci_dev *pdev, struct qed_probe_params *params) @@@ -540,6 -641,12 +542,6 @@@ } DP_INFO(cdev, "PCI init completed successfully\n");
- rc = qed_devlink_register(cdev); - if (rc) { - DP_INFO(cdev, "Failed to register devlink.\n"); - goto err2; - } - rc = qed_hw_prepare(cdev, QED_PCI_DEFAULT); if (rc) { DP_ERR(cdev, "hw prepare failed\n"); @@@ -569,6 -676,8 +571,6 @@@ static void qed_remove(struct qed_dev *
qed_set_power_state(cdev, PCI_D3hot);
- qed_devlink_unregister(cdev); - qed_free_cdev(cdev); }
@@@ -734,7 -843,7 +736,7 @@@ static irqreturn_t qed_single_int(int i
/* Slowpath interrupt */ if (unlikely(status & 0x1)) { - tasklet_schedule(hwfn->sp_dpc); + tasklet_schedule(&hwfn->sp_dpc); status &= ~0x1; rc = IRQ_HANDLED; } @@@ -780,7 -889,7 +782,7 @@@ int qed_slowpath_irq_req(struct qed_hwf id, cdev->pdev->bus->number, PCI_SLOT(cdev->pdev->devfn), hwfn->abs_pf_id); rc = request_irq(cdev->int_params.msix_table[id].vector, - qed_msix_sp_int, 0, hwfn->name, hwfn->sp_dpc); + qed_msix_sp_int, 0, hwfn->name, &hwfn->sp_dpc); } else { unsigned long flags = 0;
@@@ -812,8 -921,8 +814,8 @@@ static void qed_slowpath_tasklet_flush( * enable function makes this sequence a flush-like operation. */ if (p_hwfn->b_sp_dpc_enabled) { - tasklet_disable(p_hwfn->sp_dpc); - tasklet_enable(p_hwfn->sp_dpc); + tasklet_disable(&p_hwfn->sp_dpc); + tasklet_enable(&p_hwfn->sp_dpc); } }
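The qed hunks above follow from embedding the slowpath tasklet directly in the hwfn structure instead of keeping a separately allocated one, so every call site now takes the address of the member. A small sketch of that ownership pattern in the tasklet_setup()/from_tasklet() style (my_hwfn and the callback are illustrative; the driver's own wiring may differ):

struct my_hwfn {
	struct tasklet_struct sp_dpc;	/* embedded, no separate allocation */
	bool sp_dpc_enabled;
};

static void my_sp_dpc(struct tasklet_struct *t)
{
	struct my_hwfn *hwfn = from_tasklet(hwfn, t, sp_dpc);

	/* ... handle slowpath events for hwfn ... */
}

static void my_hwfn_int_init(struct my_hwfn *hwfn)
{
	tasklet_setup(&hwfn->sp_dpc, my_sp_dpc);
	hwfn->sp_dpc_enabled = true;
}

static irqreturn_t my_sp_irq(int irq, void *dev_id)
{
	struct my_hwfn *hwfn = dev_id;

	tasklet_schedule(&hwfn->sp_dpc);	/* address of the embedded member */
	return IRQ_HANDLED;
}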
@@@ -842,7 -951,7 +844,7 @@@ static void qed_slowpath_irq_free(struc break; synchronize_irq(cdev->int_params.msix_table[i].vector); free_irq(cdev->int_params.msix_table[i].vector, - cdev->hwfns[i].sp_dpc); + &cdev->hwfns[i].sp_dpc); } } else { if (QED_LEADING_HWFN(cdev)->b_int_requested) @@@ -861,11 -970,11 +863,11 @@@ static int qed_nic_stop(struct qed_dev struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
if (p_hwfn->b_sp_dpc_enabled) { - tasklet_disable(p_hwfn->sp_dpc); + tasklet_disable(&p_hwfn->sp_dpc); p_hwfn->b_sp_dpc_enabled = false; DP_VERBOSE(cdev, NETIF_MSG_IFDOWN, "Disabled sp tasklet [hwfn %d] at %p\n", - i, p_hwfn->sp_dpc); + i, &p_hwfn->sp_dpc); } }
@@@ -2817,7 -2926,7 +2819,7 @@@ static int qed_set_led(struct qed_dev * return status; }
-static int qed_recovery_process(struct qed_dev *cdev) +int qed_recovery_process(struct qed_dev *cdev) { struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev); struct qed_ptt *p_ptt; @@@ -3005,9 -3114,6 +3007,9 @@@ const struct qed_common_ops qed_common_ .get_link = &qed_get_current_link, .drain = &qed_drain, .update_msglvl = &qed_init_dp, + .devlink_register = qed_devlink_register, + .devlink_unregister = qed_devlink_unregister, + .report_fatal_error = qed_report_fatal_error, .dbg_all_data = &qed_dbg_all_data, .dbg_all_data_size = &qed_dbg_all_data_size, .chain_alloc = &qed_chain_alloc, diff --combined drivers/net/ethernet/qlogic/qede/qede_main.c index 20d2296beb79,9e1f41ba766c..05e3a3b60269 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@@ -804,7 -804,7 +804,7 @@@ static void qede_init_ndev(struct qede_ NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_HW_TC;
- if (!IS_VF(edev) && edev->dev_info.common.num_hwfns == 1) + if (edev->dev_info.common.b_arfs_capable) hw_features |= NETIF_F_NTUPLE;
if (edev->dev_info.common.vxlan_enable || @@@ -1170,23 -1170,10 +1170,23 @@@ static int __qede_probe(struct pci_dev rc = -ENOMEM; goto err2; } + + edev->devlink = qed_ops->common->devlink_register(cdev); + if (IS_ERR(edev->devlink)) { + DP_NOTICE(edev, "Cannot register devlink\n"); + edev->devlink = NULL; + /* Go on, we can live without devlink */ + } } else { struct net_device *ndev = pci_get_drvdata(pdev);
edev = netdev_priv(ndev); + + if (edev->devlink) { + struct qed_devlink *qdl = devlink_priv(edev->devlink); + + qdl->cdev = cdev; + } edev->cdev = cdev; memset(&edev->stats, 0, sizeof(edev->stats)); memcpy(&edev->dev_info, &dev_info, sizeof(dev_info)); @@@ -1238,10 -1225,7 +1238,10 @@@ err4: qede_rdma_dev_remove(edev, (mode == QEDE_PROBE_RECOVERY)); err3: - free_netdev(edev->ndev); + if (mode != QEDE_PROBE_RECOVERY) + free_netdev(edev->ndev); + else + edev->cdev = NULL; err2: qed_ops->common->slowpath_stop(cdev); err1: @@@ -1312,11 -1296,6 +1312,11 @@@ static void __qede_remove(struct pci_de qed_ops->common->slowpath_stop(cdev); if (system_state == SYSTEM_POWER_OFF) return; + + if (mode != QEDE_REMOVE_RECOVERY && edev->devlink) { + qed_ops->common->devlink_unregister(edev->devlink); + edev->devlink = NULL; + } qed_ops->common->remove(cdev); edev->cdev = NULL;
@@@ -2295,7 -2274,7 +2295,7 @@@ static void qede_unload(struct qede_de qede_vlan_mark_nonconfigured(edev); edev->ops->fastpath_stop(edev->cdev);
- if (!IS_VF(edev) && edev->dev_info.common.num_hwfns == 1) { + if (edev->dev_info.common.b_arfs_capable) { qede_poll_for_freeing_arfs_filters(edev); qede_free_arfs(edev); } @@@ -2362,10 -2341,9 +2362,9 @@@ static int qede_load(struct qede_dev *e if (rc) goto err2;
- if (!IS_VF(edev) && edev->dev_info.common.num_hwfns == 1) { - rc = qede_alloc_arfs(edev); - if (rc) - DP_NOTICE(edev, "aRFS memory allocation failed\n"); + if (qede_alloc_arfs(edev)) { + edev->ndev->features &= ~NETIF_F_NTUPLE; + edev->dev_info.common.b_arfs_capable = false; }
qede_napi_add_enable(edev); @@@ -2476,8 -2454,7 +2475,8 @@@ static int qede_close(struct net_devic
qede_unload(edev, QEDE_UNLOAD_NORMAL, false);
- edev->ops->common->update_drv_state(edev->cdev, false); + if (edev->cdev) + edev->ops->common->update_drv_state(edev->cdev, false);
return 0; } @@@ -2599,12 -2576,19 +2598,12 @@@ static void qede_atomic_hw_err_handler(
static void qede_generic_hw_err_handler(struct qede_dev *edev) { - struct qed_dev *cdev = edev->cdev; - DP_NOTICE(edev, "Generic sleepable HW error handling started - err_flags 0x%lx\n", edev->err_flags);
- /* Trigger a recovery process. - * This is placed in the sleep requiring section just to make - * sure it is the last one, and that all the other operations - * were completed. - */ - if (test_bit(QEDE_ERR_IS_RECOVERABLE, &edev->err_flags)) - edev->ops->common->recovery_process(cdev); + if (edev->devlink) + edev->ops->common->report_fatal_error(edev->devlink, edev->last_err_type);
clear_bit(QEDE_ERR_IS_HANDLED, &edev->err_flags);
@@@ -2658,7 -2642,6 +2657,7 @@@ static void qede_schedule_hw_err_handle return; }
+ edev->last_err_type = err_type; qede_set_hw_err_flags(edev, err_type); qede_atomic_hw_err_handler(edev); set_bit(QEDE_SP_HW_ERR, &edev->sp_flags); diff --combined drivers/net/ethernet/ti/cpsw_new.c index a3528c5c823f,15672d0a4de6..26073cd63e33 --- a/drivers/net/ethernet/ti/cpsw_new.c +++ b/drivers/net/ethernet/ti/cpsw_new.c @@@ -17,6 -17,7 +17,7 @@@ #include <linux/phy.h> #include <linux/phy/phy.h> #include <linux/delay.h> + #include <linux/pinctrl/consumer.h> #include <linux/pm_runtime.h> #include <linux/gpio/consumer.h> #include <linux/of.h> @@@ -1243,6 -1244,7 +1244,6 @@@ static int cpsw_probe_dt(struct cpsw_co
data->active_slave = 0; data->channels = CPSW_MAX_QUEUES; - data->ale_entries = CPSW_ALE_NUM_ENTRIES; data->dual_emac = true; data->bd_ram_size = CPSW_BD_RAM_SIZE; data->mac_control = 0; @@@ -2069,9 -2071,61 +2070,61 @@@ static int cpsw_remove(struct platform_ return 0; }
+ static int __maybe_unused cpsw_suspend(struct device *dev) + { + struct cpsw_common *cpsw = dev_get_drvdata(dev); + int i; + + rtnl_lock(); + + for (i = 0; i < cpsw->data.slaves; i++) { + struct net_device *ndev = cpsw->slaves[i].ndev; + + if (!(ndev && netif_running(ndev))) + continue; + + cpsw_ndo_stop(ndev); + } + + rtnl_unlock(); + + /* Select sleep pin state */ + pinctrl_pm_select_sleep_state(dev); + + return 0; + } + + static int __maybe_unused cpsw_resume(struct device *dev) + { + struct cpsw_common *cpsw = dev_get_drvdata(dev); + int i; + + /* Select default pin state */ + pinctrl_pm_select_default_state(dev); + + /* shut up ASSERT_RTNL() warning in netif_set_real_num_tx/rx_queues */ + rtnl_lock(); + + for (i = 0; i < cpsw->data.slaves; i++) { + struct net_device *ndev = cpsw->slaves[i].ndev; + + if (!(ndev && netif_running(ndev))) + continue; + + cpsw_ndo_open(ndev); + } + + rtnl_unlock(); + + return 0; + } + + static SIMPLE_DEV_PM_OPS(cpsw_pm_ops, cpsw_suspend, cpsw_resume); + static struct platform_driver cpsw_driver = { .driver = { .name = "cpsw-switch", + .pm = &cpsw_pm_ops, .of_match_table = cpsw_of_mtable, }, .probe = cpsw_probe, diff --combined drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c index 991ca9e32be3,3c07d1bbe1c6..d4989e0cd7be --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c @@@ -664,9 -664,15 +664,15 @@@ static void pkt_align(struct sk_buff *p /* To check if there's window offered */ static bool data_ok(struct brcmf_sdio *bus) { - /* Reserve TXCTL_CREDITS credits for txctl */ - return (bus->tx_max - bus->tx_seq) > TXCTL_CREDITS && - ((bus->tx_max - bus->tx_seq) & 0x80) == 0; + u8 tx_rsv = 0; + + /* Reserve TXCTL_CREDITS credits for txctl when it is ready to send */ + if (bus->ctrl_frame_stat) + tx_rsv = TXCTL_CREDITS; + + return (bus->tx_max - bus->tx_seq - tx_rsv) != 0 && + ((bus->tx_max - bus->tx_seq - tx_rsv) & 0x80) == 0; + }
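The reworked data_ok() above reserves TXCTL_CREDITS only while a control frame is actually pending (bus->ctrl_frame_stat) and keeps the usual 8-bit sequence-window test: the data window is open when the remaining credit count is non-zero and its wrap bit (0x80) is clear. A standalone sketch of that arithmetic (plain C; the TXCTL_CREDITS value here is a demo assumption, not taken from the driver):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TXCTL_CREDITS 2	/* demo value */

/* Window open: credits remain after the optional control reservation and
 * the 8-bit difference has not wrapped (bit 7 clear).
 */
static bool data_ok(uint8_t tx_max, uint8_t tx_seq, bool ctrl_pending)
{
	uint8_t rsv = ctrl_pending ? TXCTL_CREDITS : 0;
	uint8_t room = (uint8_t)(tx_max - tx_seq - rsv);

	return room != 0 && (room & 0x80) == 0;
}

int main(void)
{
	printf("%d\n", data_ok(10, 9, true));   /* 0: one credit left, two reserved */
	printf("%d\n", data_ok(10, 7, true));   /* 1: three credits, one usable for data */
	printf("%d\n", data_ok(10, 10, false)); /* 0: window exhausted */
	return 0;
}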
/* To check if there's window offered */ @@@ -4272,9 -4278,8 +4278,9 @@@ static void brcmf_sdio_firmware_callbac brcmf_sdiod_writeb(sdiod, SBSDIO_FUNC1_MESBUSYCTRL, CY_43012_MESBUSYCTRL, &err); break; + case SDIO_DEVICE_ID_BROADCOM_4329: case SDIO_DEVICE_ID_BROADCOM_4339: - brcmf_dbg(INFO, "set F2 watermark to 0x%x*4 bytes for 4339\n", + brcmf_dbg(INFO, "set F2 watermark to 0x%x*4 bytes\n", CY_4339_F2_WATERMARK); brcmf_sdiod_writeb(sdiod, SBSDIO_WATERMARK, CY_4339_F2_WATERMARK, &err); @@@ -4287,7 -4292,7 +4293,7 @@@ CY_4339_MESBUSYCTRL, &err); break; case SDIO_DEVICE_ID_BROADCOM_43455: - brcmf_dbg(INFO, "set F2 watermark to 0x%x*4 bytes for 43455\n", + brcmf_dbg(INFO, "set F2 watermark to 0x%x*4 bytes\n", CY_43455_F2_WATERMARK); brcmf_sdiod_writeb(sdiod, SBSDIO_WATERMARK, CY_43455_F2_WATERMARK, &err); @@@ -4300,7 -4305,9 +4306,7 @@@ CY_43455_MESBUSYCTRL, &err); break; case SDIO_DEVICE_ID_BROADCOM_4359: - /* fallthrough */ case SDIO_DEVICE_ID_BROADCOM_4354: - /* fallthrough */ case SDIO_DEVICE_ID_BROADCOM_4356: brcmf_dbg(INFO, "set F2 watermark to 0x%x*4 bytes\n", CY_435X_F2_WATERMARK); diff --combined drivers/net/wireless/marvell/mwifiex/fw.h index 1f02c5058aed,d9f8bdbc817b..470d669c7f14 --- a/drivers/net/wireless/marvell/mwifiex/fw.h +++ b/drivers/net/wireless/marvell/mwifiex/fw.h @@@ -513,10 -513,10 +513,10 @@@ enum mwifiex_channel_flags
#define RF_ANTENNA_AUTO 0xFFFF
-#define HostCmd_SET_SEQ_NO_BSS_INFO(seq, num, type) { \ - (((seq) & 0x00ff) | \ - (((num) & 0x000f) << 8)) | \ - (((type) & 0x000f) << 12); } +#define HostCmd_SET_SEQ_NO_BSS_INFO(seq, num, type) \ + ((((seq) & 0x00ff) | \ + (((num) & 0x000f) << 8)) | \ + (((type) & 0x000f) << 12))
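HostCmd_SET_SEQ_NO_BSS_INFO is rewritten above from a braced statement into a plain parenthesized expression, so it can be used safely anywhere a 16-bit value is expected. A small standalone check of the packing it performs (C; the macro is re-declared under a demo name and the input values are arbitrary):

#include <stdint.h>
#include <stdio.h>

/* Same packing as the fixed macro: usable directly as an expression. */
#define SET_SEQ_NO_BSS_INFO(seq, num, type) \
	((((seq) & 0x00ff) | (((num) & 0x000f) << 8)) | (((type) & 0x000f) << 12))

int main(void)
{
	uint16_t v = SET_SEQ_NO_BSS_INFO(0x42, 3, 1);

	printf("packed 0x%04x: seq %u bss_num %u bss_type %u\n",
	       v, v & 0xff, (v >> 8) & 0xf, (v >> 12) & 0xf);
	return 0;
}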
#define HostCmd_GET_SEQ_NO(seq) \ ((seq) & HostCmd_SEQ_NUM_MASK) @@@ -954,7 -954,7 +954,7 @@@ struct mwifiex_tkip_param struct mwifiex_aes_param { u8 pn[WPA_PN_SIZE]; __le16 key_len; - u8 key[WLAN_KEY_LEN_CCMP]; + u8 key[WLAN_KEY_LEN_CCMP_256]; } __packed;
struct mwifiex_wapi_param { diff --combined drivers/net/wireless/mediatek/mt76/mt7615/mcu.c index 084982eb6abd,bd316dbd9041..7781530fb3e6 --- a/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c @@@ -650,12 -650,12 +650,12 @@@ mt7615_mcu_add_beacon_offload(struct mt memcpy(req.pkt + MT_TXD_SIZE, skb->data, skb->len); req.pkt_len = cpu_to_le16(MT_TXD_SIZE + skb->len); req.tim_ie_pos = cpu_to_le16(MT_TXD_SIZE + offs.tim_offset); - if (offs.csa_counter_offs[0]) { + if (offs.cntdwn_counter_offs[0]) { u16 csa_offs;
- csa_offs = MT_TXD_SIZE + offs.csa_counter_offs[0] - 4; + csa_offs = MT_TXD_SIZE + offs.cntdwn_counter_offs[0] - 4; req.csa_ie_pos = cpu_to_le16(csa_offs); - req.csa_cnt = skb->data[offs.csa_counter_offs[0]]; + req.csa_cnt = skb->data[offs.cntdwn_counter_offs[0]]; } dev_kfree_skb(skb);
@@@ -1713,10 -1713,10 +1713,10 @@@ mt7615_mcu_uni_add_beacon_offload(struc req.beacon_tlv.pkt_len = cpu_to_le16(MT_TXD_SIZE + skb->len); req.beacon_tlv.tim_ie_pos = cpu_to_le16(MT_TXD_SIZE + offs.tim_offset);
- if (offs.csa_counter_offs[0]) { + if (offs.cntdwn_counter_offs[0]) { u16 csa_offs;
- csa_offs = MT_TXD_SIZE + offs.csa_counter_offs[0] - 4; + csa_offs = MT_TXD_SIZE + offs.cntdwn_counter_offs[0] - 4; req.beacon_tlv.csa_ie_pos = cpu_to_le16(csa_offs); } dev_kfree_skb(skb); @@@ -2128,7 -2128,8 +2128,8 @@@ static int mt7615_load_n9(struct mt7615 sizeof(dev->mt76.hw->wiphy->fw_version), "%.10s-%.15s", hdr->fw_ver, hdr->build_date);
- if (!strncmp(hdr->fw_ver, "2.0", sizeof(hdr->fw_ver))) { + if (!is_mt7615(&dev->mt76) && + !strncmp(hdr->fw_ver, "2.0", sizeof(hdr->fw_ver))) { dev->fw_ver = MT7615_FIRMWARE_V2; dev->mcu_ops = &sta_update_ops; } else { diff --combined drivers/s390/net/qeth_l2_main.c index 54e02518ce08,6384f7adba66..e12ac32b8b47 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@@ -17,13 -17,10 +17,13 @@@ #include <linux/kernel.h> #include <linux/slab.h> #include <linux/etherdevice.h> +#include <linux/if_bridge.h> #include <linux/list.h> #include <linux/hash.h> #include <linux/hashtable.h> +#include <net/switchdev.h> #include <asm/chsc.h> +#include <asm/css_chars.h> #include <asm/setup.h> #include "qeth_core.h" #include "qeth_l2.h" @@@ -33,7 -30,6 +33,7 @@@ static void qeth_bridge_state_change(st struct qeth_ipa_cmd *cmd); static void qeth_addr_change_event(struct qeth_card *card, struct qeth_ipa_cmd *cmd); +static bool qeth_bridgeport_is_in_use(struct qeth_card *card); static void qeth_l2_vnicc_set_defaults(struct qeth_card *card); static void qeth_l2_vnicc_init(struct qeth_card *card); static bool qeth_l2_vnicc_recover_timeout(struct qeth_card *card, u32 vnicc, @@@ -277,37 -273,8 +277,37 @@@ static int qeth_l2_vlan_rx_kill_vid(str return qeth_l2_send_setdelvlan(card, vid, IPA_CMD_DELVLAN); }
+static void qeth_l2_set_pnso_mode(struct qeth_card *card, + enum qeth_pnso_mode mode) +{ + spin_lock_irq(get_ccwdev_lock(CARD_RDEV(card))); + WRITE_ONCE(card->info.pnso_mode, mode); + spin_unlock_irq(get_ccwdev_lock(CARD_RDEV(card))); + + if (mode == QETH_PNSO_NONE) + drain_workqueue(card->event_wq); +} + +static void qeth_l2_dev2br_fdb_flush(struct qeth_card *card) +{ + struct switchdev_notifier_fdb_info info; + + QETH_CARD_TEXT(card, 2, "fdbflush"); + + info.addr = NULL; + /* flush all VLANs: */ + info.vid = 0; + info.added_by_user = false; + info.offloaded = true; + + call_switchdev_notifiers(SWITCHDEV_FDB_FLUSH_TO_BRIDGE, + card->dev, &info.info, NULL); +} + static void qeth_l2_stop_card(struct qeth_card *card) { + struct qeth_priv *priv = netdev_priv(card->dev); + QETH_CARD_TEXT(card, 2, "stopcard");
qeth_set_allowed_threads(card, 0, 1); @@@ -317,21 -284,15 +317,21 @@@
if (card->state == CARD_STATE_SOFTSETUP) { qeth_clear_ipacmd_list(card); - qeth_drain_output_queues(card); card->state = CARD_STATE_DOWN; }
qeth_qdio_clear_card(card, 0); + qeth_drain_output_queues(card); qeth_clear_working_pool_list(card); - flush_workqueue(card->event_wq); + qeth_l2_set_pnso_mode(card, QETH_PNSO_NONE); qeth_flush_local_addrs(card); card->info.promisc_mode = 0; + + if (priv->brport_features & BR_LEARNING_SYNC) { + rtnl_lock(); + qeth_l2_dev2br_fdb_flush(card); + rtnl_unlock(); + } }
static int qeth_l2_request_initial_mac(struct qeth_card *card) @@@ -670,7 -631,6 +670,7 @@@ static void qeth_l2_set_rx_mode(struct /** * qeth_l2_pnso() - perform network subchannel operation * @card: qeth_card structure pointer + * @oc: Operation Code * @cnc: Boolean Change-Notification Control * @cb: Callback function will be executed for each element * of the address list @@@ -681,7 -641,7 +681,7 @@@ * control" is set, further changes in the address list will be reported * via the IPA command. */ -static int qeth_l2_pnso(struct qeth_card *card, int cnc, +static int qeth_l2_pnso(struct qeth_card *card, u8 oc, int cnc, void (*cb)(void *priv, struct chsc_pnso_naid_l2 *entry), void *priv) { @@@ -692,14 -652,13 +692,14 @@@ int i, size, elems; int rc;
- QETH_CARD_TEXT(card, 2, "PNSO"); rr = (struct chsc_pnso_area *)get_zeroed_page(GFP_KERNEL); if (rr == NULL) return -ENOMEM; do { + QETH_CARD_TEXT(card, 2, "PNSO"); /* on the first iteration, naihdr.resume_token will be zero */ - rc = ccw_device_pnso(ddev, rr, rr->naihdr.resume_token, cnc); + rc = ccw_device_pnso(ddev, rr, oc, rr->naihdr.resume_token, + cnc); if (rc) continue; if (cb == NULL) @@@ -735,218 -694,6 +735,218 @@@ return rc; }
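qeth_l2_pnso() now takes the PNSO operation code as a parameter, so one helper serves both the existing bridgeport queries and the new address-information operation, and it keeps re-issuing the channel command with the resume token returned by the previous response until the list is exhausted. A generic sketch of that resume-token paging pattern (all names here are illustrative; the real helper calls ccw_device_pnso() as shown above):

struct my_dev;

struct my_entry {
	u8 mac[6];
	u16 vid;
};

struct my_response {
	u64 resume_token;		/* zero once the listing is complete */
	unsigned int nr_entries;
	struct my_entry entries[32];
};

/* Placeholder for the real channel command (ccw_device_pnso() in qeth). */
int my_issue_query(struct my_dev *dev, u8 oc, u64 resume,
		   struct my_response *resp);

static int query_all_entries(struct my_dev *dev, u8 oc,
			     void (*cb)(void *priv, struct my_entry *e),
			     void *priv)
{
	struct my_response *resp;
	u64 resume = 0;
	unsigned int i;
	int rc;

	resp = kzalloc(sizeof(*resp), GFP_KERNEL);
	if (!resp)
		return -ENOMEM;

	do {
		rc = my_issue_query(dev, oc, resume, resp);
		if (rc)
			break;

		for (i = 0; i < resp->nr_entries; i++)
			cb(priv, &resp->entries[i]);

		resume = resp->resume_token;	/* zero means no more data */
	} while (resume);

	kfree(resp);
	return rc;
}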
+static bool qeth_is_my_net_if_token(struct qeth_card *card, + struct net_if_token *token) +{ + return ((card->info.ddev_devno == token->devnum) && + (card->info.cssid == token->cssid) && + (card->info.iid == token->iid) && + (card->info.ssid == token->ssid) && + (card->info.chpid == token->chpid) && + (card->info.chid == token->chid)); +} + +/** + * qeth_l2_dev2br_fdb_notify() - update fdb of master bridge + * @card: qeth_card structure pointer + * @code: event bitmask: high order bit 0x80 set to + * 1 - removal of an object + * 0 - addition of an object + * Object type(s): + * 0x01 - VLAN, 0x02 - MAC, 0x03 - VLAN and MAC + * @token: "network token" structure identifying 'physical' location + * of the target + * @addr_lnid: structure with MAC address and VLAN ID of the target + */ +static void qeth_l2_dev2br_fdb_notify(struct qeth_card *card, u8 code, + struct net_if_token *token, + struct mac_addr_lnid *addr_lnid) +{ + struct switchdev_notifier_fdb_info info; + u8 ntfy_mac[ETH_ALEN]; + + ether_addr_copy(ntfy_mac, addr_lnid->mac); + /* Ignore VLAN only changes */ + if (!(code & IPA_ADDR_CHANGE_CODE_MACADDR)) + return; + /* Ignore mcast entries */ + if (is_multicast_ether_addr(ntfy_mac)) + return; + /* Ignore my own addresses */ + if (qeth_is_my_net_if_token(card, token)) + return; + + info.addr = ntfy_mac; + /* don't report VLAN IDs */ + info.vid = 0; + info.added_by_user = false; + info.offloaded = true; + + if (code & IPA_ADDR_CHANGE_CODE_REMOVAL) { + call_switchdev_notifiers(SWITCHDEV_FDB_DEL_TO_BRIDGE, + card->dev, &info.info, NULL); + QETH_CARD_TEXT(card, 4, "andelmac"); + QETH_CARD_TEXT_(card, 4, + "mc%012lx", ether_addr_to_u64(ntfy_mac)); + } else { + call_switchdev_notifiers(SWITCHDEV_FDB_ADD_TO_BRIDGE, + card->dev, &info.info, NULL); + QETH_CARD_TEXT(card, 4, "anaddmac"); + QETH_CARD_TEXT_(card, 4, + "mc%012lx", ether_addr_to_u64(ntfy_mac)); + } +} + +static void qeth_l2_dev2br_an_set_cb(void *priv, + struct chsc_pnso_naid_l2 *entry) +{ + u8 code = IPA_ADDR_CHANGE_CODE_MACADDR; + struct qeth_card *card = priv; + + if (entry->addr_lnid.lnid < VLAN_N_VID) + code |= IPA_ADDR_CHANGE_CODE_VLANID; + qeth_l2_dev2br_fdb_notify(card, code, + (struct net_if_token *)&entry->nit, + (struct mac_addr_lnid *)&entry->addr_lnid); +} + +/** + * qeth_l2_dev2br_an_set() - + * Enable or disable 'dev to bridge network address notification' + * @card: qeth_card structure pointer + * @enable: Enable or disable 'dev to bridge network address notification' + * + * Returns negative errno-compatible error indication or 0 on success. + * + * On enable, emits a series of address notifications for all + * currently registered hosts. 
+ * + * Must be called under rtnl_lock + */ +static int qeth_l2_dev2br_an_set(struct qeth_card *card, bool enable) +{ + int rc; + + if (enable) { + QETH_CARD_TEXT(card, 2, "anseton"); + rc = qeth_l2_pnso(card, PNSO_OC_NET_ADDR_INFO, 1, + qeth_l2_dev2br_an_set_cb, card); + if (rc == -EAGAIN) + /* address notification enabled, but inconsistent + * addresses reported -> disable address notification + */ + qeth_l2_pnso(card, PNSO_OC_NET_ADDR_INFO, 0, + NULL, NULL); + } else { + QETH_CARD_TEXT(card, 2, "ansetoff"); + rc = qeth_l2_pnso(card, PNSO_OC_NET_ADDR_INFO, 0, NULL, NULL); + } + + return rc; +} + +static int qeth_l2_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq, + struct net_device *dev, u32 filter_mask, + int nlflags) +{ + struct qeth_priv *priv = netdev_priv(dev); + struct qeth_card *card = dev->ml_priv; + u16 mode = BRIDGE_MODE_UNDEF; + + /* Do not even show qeth devs that cannot do bridge_setlink */ + if (!priv->brport_hw_features || !netif_device_present(dev) || + qeth_bridgeport_is_in_use(card)) + return -EOPNOTSUPP; + + return ndo_dflt_bridge_getlink(skb, pid, seq, dev, + mode, priv->brport_features, + priv->brport_hw_features, + nlflags, filter_mask, NULL); +} + +static const struct nla_policy qeth_brport_policy[IFLA_BRPORT_MAX + 1] = { + [IFLA_BRPORT_LEARNING_SYNC] = { .type = NLA_U8 }, +}; + +/** + * qeth_l2_bridge_setlink() - set bridgeport attributes + * @dev: netdevice + * @nlh: netlink message header + * @flags: bridge flags (here: BRIDGE_FLAGS_SELF) + * @extack: extended ACK report struct + * + * Called under rtnl_lock + */ +static int qeth_l2_bridge_setlink(struct net_device *dev, struct nlmsghdr *nlh, + u16 flags, struct netlink_ext_ack *extack) +{ + struct qeth_priv *priv = netdev_priv(dev); + struct nlattr *bp_tb[IFLA_BRPORT_MAX + 1]; + struct qeth_card *card = dev->ml_priv; + struct nlattr *attr, *nested_attr; + bool enable, has_protinfo = false; + int rem1, rem2; + int rc; + + if (!netif_device_present(dev)) + return -ENODEV; + if (!(priv->brport_hw_features)) + return -EOPNOTSUPP; + + nlmsg_for_each_attr(attr, nlh, sizeof(struct ifinfomsg), rem1) { + if (nla_type(attr) == IFLA_PROTINFO) { + rc = nla_parse_nested(bp_tb, IFLA_BRPORT_MAX, attr, + qeth_brport_policy, extack); + if (rc) + return rc; + has_protinfo = true; + } else if (nla_type(attr) == IFLA_AF_SPEC) { + nla_for_each_nested(nested_attr, attr, rem2) { + if (nla_type(nested_attr) == IFLA_BRIDGE_FLAGS) + continue; + NL_SET_ERR_MSG_ATTR(extack, nested_attr, + "Unsupported attribute"); + return -EINVAL; + } + } else { + NL_SET_ERR_MSG_ATTR(extack, attr, "Unsupported attribute"); + return -EINVAL; + } + } + if (!has_protinfo) + return 0; + if (!bp_tb[IFLA_BRPORT_LEARNING_SYNC]) + return -EINVAL; + enable = !!nla_get_u8(bp_tb[IFLA_BRPORT_LEARNING_SYNC]); + + if (enable == !!(priv->brport_features & BR_LEARNING_SYNC)) + return 0; + + mutex_lock(&card->sbp_lock); + /* do not change anything if BridgePort is enabled */ + if (qeth_bridgeport_is_in_use(card)) { + NL_SET_ERR_MSG(extack, "n/a (BridgePort)"); + rc = -EBUSY; + } else if (enable) { + qeth_l2_set_pnso_mode(card, QETH_PNSO_ADDR_INFO); + rc = qeth_l2_dev2br_an_set(card, true); + if (rc) + qeth_l2_set_pnso_mode(card, QETH_PNSO_NONE); + else + priv->brport_features |= BR_LEARNING_SYNC; + } else { + rc = qeth_l2_dev2br_an_set(card, false); + if (!rc) { + qeth_l2_set_pnso_mode(card, QETH_PNSO_NONE); + priv->brport_features ^= BR_LEARNING_SYNC; + qeth_l2_dev2br_fdb_flush(card); + } + } + mutex_unlock(&card->sbp_lock); + + return rc; +} + static const struct 
net_device_ops qeth_l2_netdev_ops = { .ndo_open = qeth_open, .ndo_stop = qeth_stop, @@@ -962,9 -709,7 +962,9 @@@ .ndo_vlan_rx_kill_vid = qeth_l2_vlan_rx_kill_vid, .ndo_tx_timeout = qeth_tx_timeout, .ndo_fix_features = qeth_fix_features, - .ndo_set_features = qeth_set_features + .ndo_set_features = qeth_set_features, + .ndo_bridge_getlink = qeth_l2_bridge_getlink, + .ndo_bridge_setlink = qeth_l2_bridge_setlink, };
static const struct net_device_ops qeth_osn_netdev_ops = { @@@ -1065,78 -810,8 +1065,78 @@@ static void qeth_l2_setup_bridgeport_at if (card->options.sbp.hostnotification) { if (qeth_bridgeport_an_set(card, 1)) card->options.sbp.hostnotification = 0; + } +} + +/** + * qeth_l2_detect_dev2br_support() - + * Detect whether this card supports 'dev to bridge fdb network address + * change notification' and thus can support the learning_sync bridgeport + * attribute + * @card: qeth_card structure pointer + * + * This is a destructive test and must be called before dev2br or + * bridgeport address notification is enabled! + */ +static void qeth_l2_detect_dev2br_support(struct qeth_card *card) +{ + struct qeth_priv *priv = netdev_priv(card->dev); + bool dev2br_supported; + int rc; + + QETH_CARD_TEXT(card, 2, "d2brsup"); + if (!IS_IQD(card)) + return; + + /* dev2br requires valid cssid,iid,chid */ + if (!card->info.ids_valid) { + dev2br_supported = false; + } else if (css_general_characteristics.enarf) { + dev2br_supported = true; } else { - qeth_bridgeport_an_set(card, 0); + /* Old machines don't have the feature bit: + * Probe by testing whether a disable succeeds + */ + rc = qeth_l2_pnso(card, PNSO_OC_NET_ADDR_INFO, 0, NULL, NULL); + dev2br_supported = !rc; + } + QETH_CARD_TEXT_(card, 2, "D2Bsup%02x", dev2br_supported); + + if (dev2br_supported) + priv->brport_hw_features |= BR_LEARNING_SYNC; + else + priv->brport_hw_features &= ~BR_LEARNING_SYNC; +} + +static void qeth_l2_enable_brport_features(struct qeth_card *card) +{ + struct qeth_priv *priv = netdev_priv(card->dev); + int rc; + + if (priv->brport_features & BR_LEARNING_SYNC) { + if (priv->brport_hw_features & BR_LEARNING_SYNC) { + qeth_l2_set_pnso_mode(card, QETH_PNSO_ADDR_INFO); + rc = qeth_l2_dev2br_an_set(card, true); + if (rc == -EAGAIN) { + /* Recoverable error, retry once */ + qeth_l2_set_pnso_mode(card, QETH_PNSO_NONE); + qeth_l2_dev2br_fdb_flush(card); + qeth_l2_set_pnso_mode(card, QETH_PNSO_ADDR_INFO); + rc = qeth_l2_dev2br_an_set(card, true); + } + if (rc) { + netdev_err(card->dev, + "failed to enable bridge learning_sync: %d\n", + rc); + qeth_l2_set_pnso_mode(card, QETH_PNSO_NONE); + qeth_l2_dev2br_fdb_flush(card); + priv->brport_features ^= BR_LEARNING_SYNC; + } + } else { + dev_warn(&card->gdev->dev, + "bridge learning_sync not supported\n"); + priv->brport_features ^= BR_LEARNING_SYNC; + } } }
@@@ -1154,9 -829,6 +1154,9 @@@ static int qeth_l2_set_online(struct qe goto out_remove; }
+ /* query before bridgeport_notification may be enabled */ + qeth_l2_detect_dev2br_support(card); + mutex_lock(&card->sbp_lock); qeth_bridgeport_query_support(card); if (card->options.sbp.supported_funcs) { @@@ -1199,7 -871,6 +1199,7 @@@
netif_device_attach(dev); qeth_enable_hw_features(dev); + qeth_l2_enable_brport_features(card);
if (card->info.open_when_online) { card->info.open_when_online = 0; @@@ -1419,14 -1090,15 +1419,14 @@@ static void qeth_bridge_emit_host_event struct qeth_bridge_state_data { struct work_struct worker; struct qeth_card *card; - struct qeth_sbp_state_change qports; + u8 role; + u8 state; };
static void qeth_bridge_state_change_worker(struct work_struct *work) { struct qeth_bridge_state_data *data = container_of(work, struct qeth_bridge_state_data, worker); - /* We are only interested in the first entry - local port */ - struct qeth_sbp_port_entry *entry = &data->qports.entry[0]; char env_locrem[32]; char env_role[32]; char env_state[32]; @@@ -1437,16 -1109,22 +1437,16 @@@ NULL };
- /* Role should not change by itself, but if it did, */ - /* information from the hardware is authoritative. */ - mutex_lock(&data->card->sbp_lock); - data->card->options.sbp.role = entry->role; - mutex_unlock(&data->card->sbp_lock); - snprintf(env_locrem, sizeof(env_locrem), "BRIDGEPORT=statechange"); snprintf(env_role, sizeof(env_role), "ROLE=%s", - (entry->role == QETH_SBP_ROLE_NONE) ? "none" : - (entry->role == QETH_SBP_ROLE_PRIMARY) ? "primary" : - (entry->role == QETH_SBP_ROLE_SECONDARY) ? "secondary" : + (data->role == QETH_SBP_ROLE_NONE) ? "none" : + (data->role == QETH_SBP_ROLE_PRIMARY) ? "primary" : + (data->role == QETH_SBP_ROLE_SECONDARY) ? "secondary" : "<INVALID>"); snprintf(env_state, sizeof(env_state), "STATE=%s", - (entry->state == QETH_SBP_STATE_INACTIVE) ? "inactive" : - (entry->state == QETH_SBP_STATE_STANDBY) ? "standby" : - (entry->state == QETH_SBP_STATE_ACTIVE) ? "active" : + (data->state == QETH_SBP_STATE_INACTIVE) ? "inactive" : + (data->state == QETH_SBP_STATE_STANDBY) ? "standby" : + (data->state == QETH_SBP_STATE_ACTIVE) ? "active" : "<INVALID>"); kobject_uevent_env(&data->card->gdev->dev.kobj, KOBJ_CHANGE, env); @@@ -1456,8 -1134,10 +1456,8 @@@ static void qeth_bridge_state_change(struct qeth_card *card, struct qeth_ipa_cmd *cmd) { - struct qeth_sbp_state_change *qports = - &cmd->data.sbp.data.state_change; + struct qeth_sbp_port_data *qports = &cmd->data.sbp.data.port_data; struct qeth_bridge_state_data *data; - int extrasize;
QETH_CARD_TEXT(card, 2, "brstchng"); if (qports->num_entries == 0) { @@@ -1468,125 -1148,34 +1468,125 @@@ QETH_CARD_TEXT_(card, 2, "BPsz%04x", qports->entry_length); return; } - extrasize = sizeof(struct qeth_sbp_port_entry) * qports->num_entries; - data = kzalloc(sizeof(struct qeth_bridge_state_data) + extrasize, - GFP_ATOMIC); + + data = kzalloc(sizeof(*data), GFP_ATOMIC); if (!data) { QETH_CARD_TEXT(card, 2, "BPSalloc"); return; } INIT_WORK(&data->worker, qeth_bridge_state_change_worker); data->card = card; - memcpy(&data->qports, qports, - sizeof(struct qeth_sbp_state_change) + extrasize); + /* Information for the local port: */ + data->role = qports->entry[0].role; + data->state = qports->entry[0].state; + queue_work(card->event_wq, &data->worker); }
struct qeth_addr_change_data { - struct work_struct worker; + struct delayed_work dwork; struct qeth_card *card; struct qeth_ipacmd_addr_change ac_event; };
+static void qeth_l2_dev2br_worker(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + struct qeth_addr_change_data *data; + struct qeth_card *card; + struct qeth_priv *priv; + unsigned int i; + int rc; + + data = container_of(dwork, struct qeth_addr_change_data, dwork); + card = data->card; + priv = netdev_priv(card->dev); + + QETH_CARD_TEXT(card, 4, "dev2brew"); + + if (READ_ONCE(card->info.pnso_mode) == QETH_PNSO_NONE) + goto free; + + /* Potential re-config in progress, try again later: */ + if (!rtnl_trylock()) { + queue_delayed_work(card->event_wq, dwork, + msecs_to_jiffies(100)); + return; + } + if (!netif_device_present(card->dev)) + goto out_unlock; + + if (data->ac_event.lost_event_mask) { + QETH_DBF_MESSAGE(3, + "Address change notification overflow on device %x\n", + CARD_DEVID(card)); + /* Card fdb and bridge fdb are out of sync, card has stopped + * notifications (no need to drain_workqueue). Purge all + * 'extern_learn' entries from the parent bridge and restart + * the notifications. + */ + qeth_l2_dev2br_fdb_flush(card); + rc = qeth_l2_dev2br_an_set(card, true); + if (rc) { + /* TODO: if we want to retry after -EAGAIN, be + * aware there could be stale entries in the + * workqueue now, that need to be drained. + * For now we give up: + */ + netdev_err(card->dev, + "bridge learning_sync failed to recover: %d\n", + rc); + WRITE_ONCE(card->info.pnso_mode, + QETH_PNSO_NONE); + /* To remove fdb entries reported by an_set: */ + qeth_l2_dev2br_fdb_flush(card); + priv->brport_features ^= BR_LEARNING_SYNC; + } else { + QETH_DBF_MESSAGE(3, + "Address Notification resynced on device %x\n", + CARD_DEVID(card)); + } + } else { + for (i = 0; i < data->ac_event.num_entries; i++) { + struct qeth_ipacmd_addr_change_entry *entry = + &data->ac_event.entry[i]; + qeth_l2_dev2br_fdb_notify(card, + entry->change_code, + &entry->token, + &entry->addr_lnid); + } + } + +out_unlock: + rtnl_unlock(); + +free: + kfree(data); +} + static void qeth_addr_change_event_worker(struct work_struct *work) { - struct qeth_addr_change_data *data = - container_of(work, struct qeth_addr_change_data, worker); + struct delayed_work *dwork = to_delayed_work(work); + struct qeth_addr_change_data *data; + struct qeth_card *card; int i;
+ data = container_of(dwork, struct qeth_addr_change_data, dwork); + card = data->card; + QETH_CARD_TEXT(data->card, 4, "adrchgew"); + + if (READ_ONCE(card->info.pnso_mode) == QETH_PNSO_NONE) + goto free; + if (data->ac_event.lost_event_mask) { + /* Potential re-config in progress, try again later: */ + if (!mutex_trylock(&card->sbp_lock)) { + queue_delayed_work(card->event_wq, dwork, + msecs_to_jiffies(100)); + return; + } + dev_info(&data->card->gdev->dev, "Address change notification stopped on %s (%s)\n", data->card->dev->name, @@@ -1595,9 -1184,8 +1595,9 @@@ : (data->ac_event.lost_event_mask == 0x02) ? "Bridge port state change" : "Unknown reason"); - mutex_lock(&data->card->sbp_lock); + data->card->options.sbp.hostnotification = 0; + card->info.pnso_mode = QETH_PNSO_NONE; mutex_unlock(&data->card->sbp_lock); qeth_bridge_emit_host_event(data->card, anev_abort, 0, NULL, NULL); @@@ -1611,8 -1199,6 +1611,8 @@@ &entry->token, &entry->addr_lnid); } + +free: kfree(data); }
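Both workers above follow the same trylock-and-requeue pattern: if the lock that guards a potential re-configuration (the RTNL in the dev2br worker, sbp_lock in the bridgeport worker) cannot be taken, the delayed work re-queues itself with a short delay rather than blocking the event workqueue. A minimal sketch of that pattern with purely hypothetical names (my_event, my_handle_event and the use of system_wq are illustrative, not part of the patch):

struct my_event {
        struct delayed_work dwork;
        /* event payload ... */
};

static void my_event_worker(struct work_struct *work)
{
        struct delayed_work *dwork = to_delayed_work(work);
        struct my_event *ev = container_of(dwork, struct my_event, dwork);

        /* A re-configuration may hold the lock; back off instead of blocking. */
        if (!rtnl_trylock()) {
                queue_delayed_work(system_wq, dwork, msecs_to_jiffies(100));
                return;
        }

        my_handle_event(ev);            /* hypothetical payload handler */

        rtnl_unlock();
        kfree(ev);
}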
@@@ -1624,9 -1210,6 +1624,9 @@@ static void qeth_addr_change_event(stru struct qeth_addr_change_data *data; int extrasize;
+ if (card->info.pnso_mode == QETH_PNSO_NONE) + return; + QETH_CARD_TEXT(card, 4, "adrchgev"); if (cmd->hdr.return_code != 0x0000) { if (cmd->hdr.return_code == 0x0010) { @@@ -1646,14 -1229,11 +1646,14 @@@ QETH_CARD_TEXT(card, 2, "ACNalloc"); return; } - INIT_WORK(&data->worker, qeth_addr_change_event_worker); + if (card->info.pnso_mode == QETH_PNSO_BRIDGEPORT) + INIT_DELAYED_WORK(&data->dwork, qeth_addr_change_event_worker); + else + INIT_DELAYED_WORK(&data->dwork, qeth_l2_dev2br_worker); data->card = card; memcpy(&data->ac_event, hostevs, sizeof(struct qeth_ipacmd_addr_change) + extrasize); - queue_work(card->event_wq, &data->worker); + queue_delayed_work(card->event_wq, &data->dwork, 0); }
/* SETBRIDGEPORT support; sending commands */ @@@ -1838,8 -1418,8 +1838,8 @@@ static int qeth_bridgeport_query_ports_ struct qeth_reply *reply, unsigned long data) { struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data; - struct qeth_sbp_query_ports *qports = &cmd->data.sbp.data.query_ports; struct _qeth_sbp_cbctl *cbctl = (struct _qeth_sbp_cbctl *)reply->param; + struct qeth_sbp_port_data *qports; int rc;
QETH_CARD_TEXT(card, 2, "brqprtcb"); @@@ -1847,7 -1427,6 +1847,7 @@@ if (rc) return rc;
+ qports = &cmd->data.sbp.data.port_data; if (qports->entry_length != sizeof(struct qeth_sbp_port_entry)) { QETH_CARD_TEXT_(card, 2, "SBPs%04x", qports->entry_length); return -EINVAL; @@@ -1975,15 -1554,9 +1975,15 @@@ int qeth_bridgeport_an_set(struct qeth_
if (enable) { qeth_bridge_emit_host_event(card, anev_reset, 0, NULL, NULL); - rc = qeth_l2_pnso(card, 1, qeth_bridgeport_an_set_cb, card); - } else - rc = qeth_l2_pnso(card, 0, NULL, NULL); + qeth_l2_set_pnso_mode(card, QETH_PNSO_BRIDGEPORT); + rc = qeth_l2_pnso(card, PNSO_OC_NET_BRIDGE_INFO, 1, + qeth_bridgeport_an_set_cb, card); + if (rc) + qeth_l2_set_pnso_mode(card, QETH_PNSO_NONE); + } else { + rc = qeth_l2_pnso(card, PNSO_OC_NET_BRIDGE_INFO, 0, NULL, NULL); + qeth_l2_set_pnso_mode(card, QETH_PNSO_NONE); + } return rc; }
@@@ -2278,7 -1851,7 +2278,7 @@@ int qeth_l2_vnicc_get_timeout(struct qe }
/* check if VNICC is currently enabled */ -bool qeth_l2_vnicc_is_in_use(struct qeth_card *card) +static bool _qeth_l2_vnicc_is_in_use(struct qeth_card *card) { if (!card->options.vnicc.sup_chars) return false; @@@ -2293,21 -1866,6 +2293,21 @@@ return true; }
+/** + * qeth_bridgeport_allowed - are any qeth_bridgeport functions allowed? + * @card: qeth_card structure pointer + * + * qeth_bridgeport functionality is mutually exclusive with usage of the + * VNIC Characteristics and dev2br address notifications + */ +bool qeth_bridgeport_allowed(struct qeth_card *card) +{ + struct qeth_priv *priv = netdev_priv(card->dev); + + return (!_qeth_l2_vnicc_is_in_use(card) && + !(priv->brport_features & BR_LEARNING_SYNC)); +} + /* recover user timeout setting */ static bool qeth_l2_vnicc_recover_timeout(struct qeth_card *card, u32 vnicc, u32 *timeout) diff --combined drivers/s390/net/qeth_l3_main.c index 767c5bb7c24c,09ef518ca1ea..33fdad1a6887 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@@ -314,8 -314,7 +314,8 @@@ static int qeth_l3_setdelip_cb(struct q }
static int qeth_l3_send_setdelmc(struct qeth_card *card, - struct qeth_ipaddr *addr, int ipacmd) + struct qeth_ipaddr *addr, + enum qeth_ipa_cmds ipacmd) { struct qeth_cmd_buffer *iob; struct qeth_ipa_cmd *cmd; @@@ -1169,11 -1168,11 +1169,11 @@@ static void qeth_l3_stop_card(struct qe if (card->state == CARD_STATE_SOFTSETUP) { qeth_l3_clear_ip_htable(card, 1); qeth_clear_ipacmd_list(card); - qeth_drain_output_queues(card); card->state = CARD_STATE_DOWN; }
qeth_qdio_clear_card(card, 0); + qeth_drain_output_queues(card); qeth_clear_working_pool_list(card); flush_workqueue(card->event_wq); qeth_flush_local_addrs(card); diff --combined fs/io_uring.c index 522b891dd187,8b426aa29668..c9aea6c44372 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@@ -1753,6 -1753,9 +1753,9 @@@ static int io_req_task_work_add(struct struct io_ring_ctx *ctx = req->ctx; int ret, notify;
+ if (tsk->flags & PF_EXITING) + return -ESRCH; + /* * SQPOLL kernel thread doesn't need notification, just a wakeup. For * all other cases, use TWA_SIGNAL unconditionally to ensure we're @@@ -1787,8 -1790,10 +1790,10 @@@ static void __io_req_task_cancel(struc static void io_req_task_cancel(struct callback_head *cb) { struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work); + struct io_ring_ctx *ctx = req->ctx;
__io_req_task_cancel(req, -ECANCELED); + percpu_ref_put(&ctx->refs); }
static void __io_req_task_submit(struct io_kiocb *req) @@@ -2010,6 -2015,12 +2015,12 @@@ static inline unsigned int io_put_rw_kb
static inline bool io_run_task_work(void) { + /* + * Not safe to run on exiting task, and the task_work handling will + * not add work to such a task. + */ + if (unlikely(current->flags & PF_EXITING)) + return false; if (current->task_works) { __set_current_state(TASK_RUNNING); task_work_run(); @@@ -2283,13 -2294,17 +2294,17 @@@ static bool io_resubmit_prep(struct io_ goto end_req; }
- ret = io_import_iovec(rw, req, &iovec, &iter, false); - if (ret < 0) - goto end_req; - ret = io_setup_async_rw(req, iovec, inline_vecs, &iter, false); - if (!ret) + if (!req->io) { + ret = io_import_iovec(rw, req, &iovec, &iter, false); + if (ret < 0) + goto end_req; + ret = io_setup_async_rw(req, iovec, inline_vecs, &iter, false); + if (!ret) + return true; + kfree(iovec); + } else { return true; - kfree(iovec); + } end_req: req_set_fail_links(req); io_req_complete(req, ret); @@@ -2980,14 -2995,15 +2995,15 @@@ static inline int io_rw_prep_async(stru bool force_nonblock) { struct io_async_rw *iorw = &req->io->rw; + struct iovec *iov; ssize_t ret;
- iorw->iter.iov = iorw->fast_iov; - ret = __io_import_iovec(rw, req, (struct iovec **) &iorw->iter.iov, - &iorw->iter, !force_nonblock); + iorw->iter.iov = iov = iorw->fast_iov; + ret = __io_import_iovec(rw, req, &iov, &iorw->iter, !force_nonblock); if (unlikely(ret < 0)) return ret;
+ iorw->iter.iov = iov; io_req_map_rw(req, iorw->iter.iov, iorw->fast_iov, &iorw->iter); return 0; } @@@ -3114,6 -3130,7 +3130,7 @@@ static int io_read(struct io_kiocb *req struct iov_iter __iter, *iter = &__iter; ssize_t io_size, ret, ret2; size_t iov_count; + bool no_async;
if (req->io) iter = &req->io->rw.iter; @@@ -3131,7 -3148,8 +3148,8 @@@ kiocb->ki_flags &= ~IOCB_NOWAIT;
/* If the file doesn't support async, just async punt */ - if (force_nonblock && !io_file_supports_async(req->file, READ)) + no_async = force_nonblock && !io_file_supports_async(req->file, READ); + if (no_async) goto copy_iov;
ret = rw_verify_area(READ, req->file, io_kiocb_ppos(kiocb), iov_count); @@@ -3175,6 -3193,8 +3193,8 @@@ copy_iov ret = ret2; goto out_free; } + if (no_async) + return -EAGAIN; /* it's copied and will be cleaned with ->io */ iovec = NULL; /* now use our persistent iterator, if we aren't already */ @@@ -3507,8 -3527,6 +3527,6 @@@ static int __io_openat_prep(struct io_k const char __user *fname; int ret;
- if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL))) - return -EINVAL; if (unlikely(sqe->ioprio || sqe->buf_index)) return -EINVAL; if (unlikely(req->flags & REQ_F_FIXED_FILE)) @@@ -3535,6 -3553,8 +3553,8 @@@ static int io_openat_prep(struct io_kio { u64 flags, mode;
+ if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL))) + return -EINVAL; if (req->flags & REQ_F_NEED_CLEANUP) return 0; mode = READ_ONCE(sqe->len); @@@ -3549,6 -3569,8 +3569,8 @@@ static int io_openat2_prep(struct io_ki size_t len; int ret;
+ if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL))) + return -EINVAL; if (req->flags & REQ_F_NEED_CLEANUP) return 0; how = u64_to_user_ptr(READ_ONCE(sqe->addr2)); @@@ -3766,7 -3788,7 +3788,7 @@@ static int io_epoll_ctl_prep(struct io_ #if defined(CONFIG_EPOLL) if (sqe->ioprio || sqe->buf_index) return -EINVAL; - if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) + if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL))) return -EINVAL;
req->epoll.epfd = READ_ONCE(sqe->fd); @@@ -3881,7 -3903,7 +3903,7 @@@ static int io_fadvise(struct io_kiocb *
static int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { - if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) + if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL))) return -EINVAL; if (sqe->ioprio || sqe->buf_index) return -EINVAL; @@@ -4927,12 -4949,6 +4949,12 @@@ static bool io_arm_poll_handler(struct mask |= POLLIN | POLLRDNORM; if (def->pollout) mask |= POLLOUT | POLLWRNORM; + + /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */ + if ((req->opcode == IORING_OP_RECVMSG) && + (req->sr_msg.msg_flags & MSG_ERRQUEUE)) + mask &= ~POLLIN; + mask |= POLLERR | POLLPRI;
ipt.pt._qproc = io_async_queue_proc; @@@ -5404,6 -5420,8 +5426,8 @@@ static int io_async_cancel(struct io_ki static int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { + if (unlikely(req->ctx->flags & IORING_SETUP_SQPOLL)) + return -EINVAL; if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT))) return -EINVAL; if (sqe->ioprio || sqe->rw_flags) @@@ -5454,6 -5472,8 +5478,8 @@@ static int io_req_defer_prep(struct io_ if (unlikely(ret)) return ret;
+ io_prep_async_work(req); + switch (req->opcode) { case IORING_OP_NOP: break; @@@ -8029,6 -8049,28 +8055,28 @@@ static bool io_match_link(struct io_kio return false; }
+ static inline bool io_match_files(struct io_kiocb *req, + struct files_struct *files) + { + return (req->flags & REQ_F_WORK_INITIALIZED) && req->work.files == files; + } + + static bool io_match_link_files(struct io_kiocb *req, + struct files_struct *files) + { + struct io_kiocb *link; + + if (io_match_files(req, files)) + return true; + if (req->flags & REQ_F_LINK_HEAD) { + list_for_each_entry(link, &req->link_list, link_list) { + if (io_match_files(link, files)) + return true; + } + } + return false; + } + /* * We're looking to cancel 'req' because it's holding on to our files, but * 'req' could be a link to another request. See if it is, and cancel that @@@ -8103,12 -8145,38 +8151,38 @@@ static void io_attempt_cancel(struct io io_timeout_remove_link(ctx, req); }
+ static void io_cancel_defer_files(struct io_ring_ctx *ctx, + struct files_struct *files) + { + struct io_defer_entry *de = NULL; + LIST_HEAD(list); + + spin_lock_irq(&ctx->completion_lock); + list_for_each_entry_reverse(de, &ctx->defer_list, list) { + if (io_match_link_files(de->req, files)) { + list_cut_position(&list, &ctx->defer_list, &de->list); + break; + } + } + spin_unlock_irq(&ctx->completion_lock); + + while (!list_empty(&list)) { + de = list_first_entry(&list, struct io_defer_entry, list); + list_del_init(&de->list); + req_set_fail_links(de->req); + io_put_req(de->req); + io_req_complete(de->req, -ECANCELED); + kfree(de); + } + } + static void io_uring_cancel_files(struct io_ring_ctx *ctx, struct files_struct *files) { if (list_empty_careful(&ctx->inflight_list)) return;
+ io_cancel_defer_files(ctx, files); /* cancel all at once, should be faster than doing it one by one*/ io_wq_cancel_cb(ctx->io_wq, io_wq_files_match, files, true);
@@@ -8137,6 -8205,8 +8211,8 @@@ /* cancel this request, or head link requests */ io_attempt_cancel(ctx, cancel_req); io_put_req(cancel_req); + /* cancellations _may_ trigger task work */ + io_run_task_work(); schedule(); finish_wait(&ctx->inflight_wait, &wait); } diff --combined include/linux/netdevice.h index fef0eb96cf69,7bd4fcdd0738..a431c3229cbf --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@@ -70,7 -70,6 +70,7 @@@ struct udp_tunnel_nic struct bpf_prog; struct xdp_buff;
+void synchronize_net(void); void netdev_set_default_ethtool_ops(struct net_device *dev, const struct ethtool_ops *ops);
@@@ -355,7 -354,7 +355,7 @@@ enum NAPI_STATE_MISSED, /* reschedule a napi */ NAPI_STATE_DISABLE, /* Disable pending */ NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */ - NAPI_STATE_HASHED, /* In NAPI hash (busy polling possible) */ + NAPI_STATE_LISTED, /* NAPI added to system lists */ NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */ NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */ }; @@@ -365,7 -364,7 +365,7 @@@ enum NAPIF_STATE_MISSED = BIT(NAPI_STATE_MISSED), NAPIF_STATE_DISABLE = BIT(NAPI_STATE_DISABLE), NAPIF_STATE_NPSVC = BIT(NAPI_STATE_NPSVC), - NAPIF_STATE_HASHED = BIT(NAPI_STATE_HASHED), + NAPIF_STATE_LISTED = BIT(NAPI_STATE_LISTED), NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL), NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), }; @@@ -489,6 -488,20 +489,6 @@@ static inline bool napi_complete(struc return napi_complete_done(n, 0); }
-/** - * napi_hash_del - remove a NAPI from global table - * @napi: NAPI context - * - * Warning: caller must observe RCU grace period - * before freeing memory containing @napi, if - * this function returns true. - * Note: core networking stack automatically calls it - * from netif_napi_del(). - * Drivers might want to call this helper to combine all - * the needed RCU grace periods into a single one. - */ -bool napi_hash_del(struct napi_struct *napi); - /** * napi_disable - prevent NAPI from scheduling * @n: NAPI context @@@ -605,7 -618,7 +605,7 @@@ struct netdev_queue /* Subordinate device that the queue has been assigned to */ struct net_device *sb_dev; #ifdef CONFIG_XDP_SOCKETS - struct xdp_umem *umem; + struct xsk_buff_pool *pool; #endif /* * write-mostly part @@@ -627,16 -640,11 +627,16 @@@ extern int sysctl_fb_tunnels_only_for_init_net; extern int sysctl_devconf_inherit_init_net;
+/* + * sysctl_fb_tunnels_only_for_init_net == 0 : For all netns + * == 1 : For initns only + * == 2 : For none. + */ static inline bool net_has_fallback_tunnels(const struct net *net) { - return net == &init_net || - !IS_ENABLED(CONFIG_SYSCTL) || - !sysctl_fb_tunnels_only_for_init_net; + return !IS_ENABLED(CONFIG_SYSCTL) || + !sysctl_fb_tunnels_only_for_init_net || + (net == &init_net && sysctl_fb_tunnels_only_for_init_net == 1); }
static inline int netdev_queue_numa_node_read(const struct netdev_queue *q) @@@ -743,7 -751,7 +743,7 @@@ struct netdev_rx_queue struct net_device *dev; struct xdp_rxq_info xdp_rxq; #ifdef CONFIG_XDP_SOCKETS - struct xdp_umem *umem; + struct xsk_buff_pool *pool; #endif } ____cacheline_aligned_in_smp;
@@@ -871,7 -879,7 +871,7 @@@ enum bpf_netdev_command /* BPF program for offload callbacks, invoked at program load time. */ BPF_OFFLOAD_MAP_ALLOC, BPF_OFFLOAD_MAP_FREE, - XDP_SETUP_XSK_UMEM, + XDP_SETUP_XSK_POOL, };
struct bpf_prog_offload_ops; @@@ -905,9 -913,9 +905,9 @@@ struct netdev_bpf struct { struct bpf_offloaded_map *offmap; }; - /* XDP_SETUP_XSK_UMEM */ + /* XDP_SETUP_XSK_POOL */ struct { - struct xdp_umem *umem; + struct xsk_buff_pool *pool; u16 queue_id; } xsk; }; @@@ -1776,6 -1784,7 +1776,7 @@@ enum netdev_priv_flags * the watchdog (see dev_watchdog()) * @watchdog_timer: List of timers * + * @proto_down_reason: reason a netdev interface is held down * @pcpu_refcnt: Number of references to this device * @todo_list: Delayed register/unregister * @link_watch_list: XXX: need comments on this one @@@ -1840,6 -1849,7 +1841,7 @@@ * @udp_tunnel_nic_info: static structure describing the UDP tunnel * offload capabilities of the device * @udp_tunnel_nic: UDP tunnel offload state + * @xdp_state: stores info on attached XDP BPF programs * * FIXME: cleanup struct net_device such that network protocol info * moves out. @@@ -2185,22 -2195,6 +2187,22 @@@ int netdev_get_num_tc(struct net_devic return dev->num_tc; }
+static inline void net_prefetch(void *p) +{ + prefetch(p); +#if L1_CACHE_BYTES < 128 + prefetch((u8 *)p + L1_CACHE_BYTES); +#endif +} + +static inline void net_prefetchw(void *p) +{ + prefetchw(p); +#if L1_CACHE_BYTES < 128 + prefetchw((u8 *)p + L1_CACHE_BYTES); +#endif +} + void netdev_unbind_sb_channel(struct net_device *dev, struct net_device *sb_dev); int netdev_bind_sb_channel_queue(struct net_device *dev, @@@ -2355,27 -2349,13 +2357,27 @@@ static inline void netif_tx_napi_add(st netif_napi_add(dev, napi, poll, weight); }
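The net_prefetch()/net_prefetchw() helpers added earlier in this hunk prefetch a second cache line when L1_CACHE_BYTES is smaller than 128, so headers that straddle a 64-byte line are warmed with a single call. A hedged sketch of how a receive path might use the read variant; the function name is invented for illustration:

static void my_dev_rx_one(struct napi_struct *napi, void *data, unsigned int len)
{
        struct sk_buff *skb;

        net_prefetch(data);             /* warm the first ~128 bytes of payload */

        skb = napi_alloc_skb(napi, len);
        if (unlikely(!skb))
                return;

        skb_put_data(skb, data, len);
        skb->protocol = eth_type_trans(skb, napi->dev);
        napi_gro_receive(napi, skb);
}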
+/** + * __netif_napi_del - remove a NAPI context + * @napi: NAPI context + * + * Warning: caller must observe RCU grace period before freeing memory + * containing @napi. Drivers might want to call this helper to combine + * all the needed RCU grace periods into a single one. + */ +void __netif_napi_del(struct napi_struct *napi); + /** * netif_napi_del - remove a NAPI context * @napi: NAPI context * * netif_napi_del() removes a NAPI context from the network device NAPI list */ -void netif_napi_del(struct napi_struct *napi); +static inline void netif_napi_del(struct napi_struct *napi) +{ + __netif_napi_del(napi); + synchronize_net(); +}
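As the kernel-doc above spells out, netif_napi_del() is now __netif_napi_del() plus synchronize_net(), so a driver tearing down many NAPI instances can batch the RCU grace periods into one. A minimal sketch under assumed names (my_priv, num_queues and the napi array are hypothetical):

static void my_dev_free_napis(struct my_priv *priv)
{
        unsigned int i;

        /* Unlist every instance first ... */
        for (i = 0; i < priv->num_queues; i++)
                __netif_napi_del(&priv->napi[i]);

        /* ... then pay for a single RCU grace period. */
        synchronize_net();
}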
struct napi_gro_cb { /* Virtual address of skb_shinfo(skb)->frags[0].page + offset. */ @@@ -2799,6 -2779,7 +2801,6 @@@ static inline void unregister_netdevice int netdev_refcnt_read(const struct net_device *dev); void free_netdev(struct net_device *dev); void netdev_freemem(struct net_device *dev); -void synchronize_net(void); int init_dummy_netdev(struct net_device *dev);
struct net_device *netdev_get_xmit_slave(struct net_device *dev, @@@ -4678,6 -4659,16 +4680,6 @@@ int netdev_class_create_file_ns(const s void netdev_class_remove_file_ns(const struct class_attribute *class_attr, const void *ns);
-static inline int netdev_class_create_file(const struct class_attribute *class_attr) -{ - return netdev_class_create_file_ns(class_attr, NULL); -} - -static inline void netdev_class_remove_file(const struct class_attribute *class_attr) -{ - netdev_class_remove_file_ns(class_attr, NULL); -} - extern const struct kobj_ns_type_operations net_ns_type_operations;
const char *netdev_drivername(const struct net_device *dev); diff --combined include/linux/qed/qed_if.h index 56fa55841d39,cdd73afc4c46..57fb295ea41a --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@@ -21,7 -21,6 +21,7 @@@ #include <linux/qed/common_hsi.h> #include <linux/qed/qed_chain.h> #include <linux/io-64-nonatomic-lo-hi.h> +#include <net/devlink.h>
enum dcbx_protocol_type { DCBX_PROTOCOL_ISCSI, @@@ -624,6 -623,7 +624,7 @@@ struct qed_dev_info #define QED_MFW_VERSION_3_OFFSET 24
u32 flash_size; + bool b_arfs_capable; bool b_inter_pf_switch; bool tx_switching; bool rdma_supported; @@@ -780,11 -780,6 +781,11 @@@ enum qed_nvm_flash_cmd QED_NVM_FLASH_CMD_NVM_MAX, };
+struct qed_devlink { + struct qed_dev *cdev; + struct devlink_health_reporter *fw_reporter; +}; + struct qed_common_cb_ops { void (*arfs_filter_op)(void *dev, void *fltr, u8 fw_rc); void (*link_update)(void *dev, struct qed_link_output *link); @@@ -850,9 -845,10 +851,9 @@@ struct qed_common_ops struct qed_dev* (*probe)(struct pci_dev *dev, struct qed_probe_params *params);
- void (*remove)(struct qed_dev *cdev); + void (*remove)(struct qed_dev *cdev);
- int (*set_power_state)(struct qed_dev *cdev, - pci_power_t state); + int (*set_power_state)(struct qed_dev *cdev, pci_power_t state);
void (*set_name) (struct qed_dev *cdev, char name[]);
@@@ -860,51 -856,50 +861,51 @@@ * PF params required for the call before slowpath_start is * documented within the qed_pf_params structure definition. */ - void (*update_pf_params)(struct qed_dev *cdev, - struct qed_pf_params *params); - int (*slowpath_start)(struct qed_dev *cdev, - struct qed_slowpath_params *params); + void (*update_pf_params)(struct qed_dev *cdev, + struct qed_pf_params *params);
- int (*slowpath_stop)(struct qed_dev *cdev); + int (*slowpath_start)(struct qed_dev *cdev, + struct qed_slowpath_params *params); + + int (*slowpath_stop)(struct qed_dev *cdev);
/* Requests to use `cnt' interrupts for fastpath. * upon success, returns number of interrupts allocated for fastpath. */ - int (*set_fp_int)(struct qed_dev *cdev, - u16 cnt); + int (*set_fp_int)(struct qed_dev *cdev, u16 cnt);
/* Fills `info' with pointers required for utilizing interrupts */ - int (*get_fp_int)(struct qed_dev *cdev, - struct qed_int_info *info); - - u32 (*sb_init)(struct qed_dev *cdev, - struct qed_sb_info *sb_info, - void *sb_virt_addr, - dma_addr_t sb_phy_addr, - u16 sb_id, - enum qed_sb_type type); - - u32 (*sb_release)(struct qed_dev *cdev, - struct qed_sb_info *sb_info, - u16 sb_id, - enum qed_sb_type type); - - void (*simd_handler_config)(struct qed_dev *cdev, - void *token, - int index, - void (*handler)(void *)); - - void (*simd_handler_clean)(struct qed_dev *cdev, - int index); - int (*dbg_grc)(struct qed_dev *cdev, - void *buffer, u32 *num_dumped_bytes); + int (*get_fp_int)(struct qed_dev *cdev, struct qed_int_info *info); + + u32 (*sb_init)(struct qed_dev *cdev, + struct qed_sb_info *sb_info, + void *sb_virt_addr, + dma_addr_t sb_phy_addr, + u16 sb_id, + enum qed_sb_type type); + + u32 (*sb_release)(struct qed_dev *cdev, + struct qed_sb_info *sb_info, + u16 sb_id, + enum qed_sb_type type); + + void (*simd_handler_config)(struct qed_dev *cdev, + void *token, + int index, + void (*handler)(void *)); + + void (*simd_handler_clean)(struct qed_dev *cdev, int index); + + int (*dbg_grc)(struct qed_dev *cdev, void *buffer, u32 *num_dumped_bytes);
int (*dbg_grc_size)(struct qed_dev *cdev);
- int (*dbg_all_data) (struct qed_dev *cdev, void *buffer); + int (*dbg_all_data)(struct qed_dev *cdev, void *buffer);
- int (*dbg_all_data_size) (struct qed_dev *cdev); + int (*dbg_all_data_size)(struct qed_dev *cdev); + + int (*report_fatal_error)(struct devlink *devlink, + enum qed_hw_err_type err_type);
/** * @brief can_link_change - can the instance change the link or not @@@ -1143,10 -1138,6 +1144,10 @@@ * */ int (*set_grc_config)(struct qed_dev *cdev, u32 cfg_id, u32 val); + + struct devlink* (*devlink_register)(struct qed_dev *cdev); + + void (*devlink_unregister)(struct devlink *devlink); };
#define MASK_FIELD(_name, _value) \ diff --combined include/net/netlink.h index fdd317f8fde4,8e0eb2c9c528..b2cf34f53e55 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@@ -181,6 -181,8 +181,6 @@@ enum NLA_S64, NLA_BITFIELD32, NLA_REJECT, - NLA_EXACT_LEN, - NLA_MIN_LEN, __NLA_TYPE_MAX, };
@@@ -197,11 -199,11 +197,11 @@@ struct netlink_range_validation_signed enum nla_policy_validation { NLA_VALIDATE_NONE, NLA_VALIDATE_RANGE, + NLA_VALIDATE_RANGE_WARN_TOO_LONG, NLA_VALIDATE_MIN, NLA_VALIDATE_MAX, NLA_VALIDATE_RANGE_PTR, NLA_VALIDATE_FUNCTION, - NLA_VALIDATE_WARN_TOO_LONG, };
/** @@@ -220,7 -222,7 +220,7 @@@ * NLA_NUL_STRING Maximum length of string (excluding NUL) * NLA_FLAG Unused * NLA_BINARY Maximum length of attribute payload - * NLA_MIN_LEN Minimum length of attribute payload + * (but see also below with the validation type) * NLA_NESTED, * NLA_NESTED_ARRAY Length verification is done by checking len of * nested header (or empty); len field is used if @@@ -235,6 -237,11 +235,6 @@@ * just like "All other" * NLA_BITFIELD32 Unused * NLA_REJECT Unused - * NLA_EXACT_LEN Attribute should have exactly this length, otherwise - * it is rejected or warned about, the latter happening - * if and only if the `validation_type' is set to - * NLA_VALIDATE_WARN_TOO_LONG. - * NLA_MIN_LEN Minimum length of attribute payload * All other Minimum length of attribute payload * * Meaning of validation union: @@@ -289,11 -296,6 +289,11 @@@ * pointer to a struct netlink_range_validation_signed * that indicates the min/max values. * Use NLA_POLICY_FULL_RANGE_SIGNED(). + * + * NLA_BINARY If the validation type is like the ones for integers + * above, then the min/max length (not value like for + * integers) of the attribute is enforced. + * * All other Unused - but note that it's a union * * Meaning of `validate' field, use via NLA_POLICY_VALIDATE_FN: @@@ -307,7 -309,7 +307,7 @@@ * static const struct nla_policy my_policy[ATTR_MAX+1] = { * [ATTR_FOO] = { .type = NLA_U16 }, * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ }, - * [ATTR_BAZ] = { .type = NLA_EXACT_LEN, .len = sizeof(struct mystruct) }, + * [ATTR_BAZ] = NLA_POLICY_EXACT_LEN(sizeof(struct mystruct)), * [ATTR_GOO] = NLA_POLICY_BITFIELD32(myvalidflags), * }; */ @@@ -333,10 -335,9 +333,10 @@@ struct nla_policy * nesting validation starts here. * * Additionally, it means that NLA_UNSPEC is actually NLA_REJECT - * for any types >= this, so need to use NLA_MIN_LEN to get the - * previous pure { .len = xyz } behaviour. The advantage of this - * is that types not specified in the policy will be rejected. + * for any types >= this, so need to use NLA_POLICY_MIN_LEN() to + * get the previous pure { .len = xyz } behaviour. The advantage + * of this is that types not specified in the policy will be + * rejected. * * For completely new families it should be set to 1 so that the * validation is enforced for all attributes. For existing ones @@@ -348,6 -349,12 +348,6 @@@ }; };
-#define NLA_POLICY_EXACT_LEN(_len) { .type = NLA_EXACT_LEN, .len = _len } -#define NLA_POLICY_EXACT_LEN_WARN(_len) \ - { .type = NLA_EXACT_LEN, .len = _len, \ - .validation_type = NLA_VALIDATE_WARN_TOO_LONG, } -#define NLA_POLICY_MIN_LEN(_len) { .type = NLA_MIN_LEN, .len = _len } - #define NLA_POLICY_ETH_ADDR NLA_POLICY_EXACT_LEN(ETH_ALEN) #define NLA_POLICY_ETH_ADDR_COMPAT NLA_POLICY_EXACT_LEN_WARN(ETH_ALEN)
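With the NLA_EXACT_LEN and NLA_MIN_LEN attribute types removed, the same length checks are now expressed as NLA_BINARY plus a length range; the replacement NLA_POLICY_EXACT_LEN(), NLA_POLICY_EXACT_LEN_WARN() and NLA_POLICY_MIN_LEN() macros appear further down in this hunk. A hedged example policy using them, in a file that already includes net/netlink.h and linux/if_ether.h; the MY_ATTR_* names are made up:

static const struct nla_policy my_policy[MY_ATTR_MAX + 1] = {
        [MY_ATTR_MAC]    = NLA_POLICY_EXACT_LEN(ETH_ALEN),
        [MY_ATTR_COOKIE] = NLA_POLICY_MIN_LEN(8),
        [MY_ATTR_BLOB]   = NLA_POLICY_RANGE(NLA_BINARY, 1, 64),        /* 1..64 bytes */
        [MY_ATTR_PORT]   = NLA_POLICY_MAX(NLA_U16, 1023),
};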
@@@ -363,21 -370,19 +363,21 @@@ { .type = NLA_BITFIELD32, .bitfield32_valid = valid }
#define __NLA_ENSURE(condition) BUILD_BUG_ON_ZERO(!(condition)) -#define NLA_ENSURE_UINT_TYPE(tp) \ +#define NLA_ENSURE_UINT_OR_BINARY_TYPE(tp) \ (__NLA_ENSURE(tp == NLA_U8 || tp == NLA_U16 || \ tp == NLA_U32 || tp == NLA_U64 || \ - tp == NLA_MSECS) + tp) + tp == NLA_MSECS || \ + tp == NLA_BINARY) + tp) #define NLA_ENSURE_SINT_TYPE(tp) \ (__NLA_ENSURE(tp == NLA_S8 || tp == NLA_S16 || \ tp == NLA_S32 || tp == NLA_S64) + tp) -#define NLA_ENSURE_INT_TYPE(tp) \ +#define NLA_ENSURE_INT_OR_BINARY_TYPE(tp) \ (__NLA_ENSURE(tp == NLA_S8 || tp == NLA_U8 || \ tp == NLA_S16 || tp == NLA_U16 || \ tp == NLA_S32 || tp == NLA_U32 || \ tp == NLA_S64 || tp == NLA_U64 || \ - tp == NLA_MSECS) + tp) + tp == NLA_MSECS || \ + tp == NLA_BINARY) + tp) #define NLA_ENSURE_NO_VALIDATION_PTR(tp) \ (__NLA_ENSURE(tp != NLA_BITFIELD32 && \ tp != NLA_REJECT && \ @@@ -385,14 -390,14 +385,14 @@@ tp != NLA_NESTED_ARRAY) + tp)
#define NLA_POLICY_RANGE(tp, _min, _max) { \ - .type = NLA_ENSURE_INT_TYPE(tp), \ + .type = NLA_ENSURE_INT_OR_BINARY_TYPE(tp), \ .validation_type = NLA_VALIDATE_RANGE, \ .min = _min, \ .max = _max \ }
#define NLA_POLICY_FULL_RANGE(tp, _range) { \ - .type = NLA_ENSURE_UINT_TYPE(tp), \ + .type = NLA_ENSURE_UINT_OR_BINARY_TYPE(tp), \ .validation_type = NLA_VALIDATE_RANGE_PTR, \ .range = _range, \ } @@@ -404,13 -409,13 +404,13 @@@ }
#define NLA_POLICY_MIN(tp, _min) { \ - .type = NLA_ENSURE_INT_TYPE(tp), \ + .type = NLA_ENSURE_INT_OR_BINARY_TYPE(tp), \ .validation_type = NLA_VALIDATE_MIN, \ .min = _min, \ }
#define NLA_POLICY_MAX(tp, _max) { \ - .type = NLA_ENSURE_INT_TYPE(tp), \ + .type = NLA_ENSURE_INT_OR_BINARY_TYPE(tp), \ .validation_type = NLA_VALIDATE_MAX, \ .max = _max, \ } @@@ -422,15 -427,6 +422,15 @@@ .len = __VA_ARGS__ + 0, \ }
+#define NLA_POLICY_EXACT_LEN(_len) NLA_POLICY_RANGE(NLA_BINARY, _len, _len) +#define NLA_POLICY_EXACT_LEN_WARN(_len) { \ + .type = NLA_BINARY, \ + .validation_type = NLA_VALIDATE_RANGE_WARN_TOO_LONG, \ + .min = _len, \ + .max = _len \ +} +#define NLA_POLICY_MIN_LEN(_len) NLA_POLICY_MIN(NLA_BINARY, _len) + /** * struct nl_info - netlink source information * @nlh: Netlink message header of original request @@@ -730,7 -726,6 +730,6 @@@ static inline int __nlmsg_parse(const s * @hdrlen: length of family specific header * @tb: destination array with maxtype+1 elements * @maxtype: maximum attribute type to be expected - * @validate: validation strictness * @extack: extended ACK report struct * * See nla_parse() @@@ -828,7 -823,6 +827,6 @@@ static inline int nla_validate_deprecat * @len: length of attribute stream * @maxtype: maximum attribute type to be expected * @policy: validation policy - * @validate: validation strictness * @extack: extended ACK report struct * * Validates all attributes in the specified attribute stream against the diff --combined include/uapi/linux/ethtool_netlink.h index 9cee6df01a10,72ba36be9655..e2bf36e6964b --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@@ -79,6 -79,7 +79,7 @@@ enum ETHTOOL_MSG_TSINFO_GET_REPLY, ETHTOOL_MSG_CABLE_TEST_NTF, ETHTOOL_MSG_CABLE_TEST_TDR_NTF, + ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY,
/* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@@ -91,12 -92,9 +92,12 @@@ #define ETHTOOL_FLAG_COMPACT_BITSETS (1 << 0) /* provide optional reply for SET or ACT requests */ #define ETHTOOL_FLAG_OMIT_REPLY (1 << 1) +/* request statistics, if supported by the driver */ +#define ETHTOOL_FLAG_STATS (1 << 2)
#define ETHTOOL_FLAG_ALL (ETHTOOL_FLAG_COMPACT_BITSETS | \ - ETHTOOL_FLAG_OMIT_REPLY) + ETHTOOL_FLAG_OMIT_REPLY | \ + ETHTOOL_FLAG_STATS)
enum { ETHTOOL_A_HEADER_UNSPEC, @@@ -379,25 -377,12 +380,25 @@@ enum ETHTOOL_A_PAUSE_AUTONEG, /* u8 */ ETHTOOL_A_PAUSE_RX, /* u8 */ ETHTOOL_A_PAUSE_TX, /* u8 */ + ETHTOOL_A_PAUSE_STATS, /* nest - _PAUSE_STAT_* */
/* add new constants above here */ __ETHTOOL_A_PAUSE_CNT, ETHTOOL_A_PAUSE_MAX = (__ETHTOOL_A_PAUSE_CNT - 1) };
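ETHTOOL_A_PAUSE_STATS above is a nest carrying the ETHTOOL_A_PAUSE_STAT_* counters defined next, reported only when ETHTOOL_FLAG_STATS is set in the request header. A hedged sketch of the driver side, assuming the ethtool_ops get_pause_stats hook and struct ethtool_pause_stats from the same series; the my_dev_* helpers and MY_HW_* stat ids are invented:

static void my_dev_get_pause_stats(struct net_device *dev,
                                   struct ethtool_pause_stats *stats)
{
        struct my_dev_priv *priv = netdev_priv(dev);

        stats->tx_pause_frames = my_dev_read_hw_stat(priv, MY_HW_TX_PAUSE);
        stats->rx_pause_frames = my_dev_read_hw_stat(priv, MY_HW_RX_PAUSE);
}

static const struct ethtool_ops my_dev_ethtool_ops = {
        .get_pause_stats        = my_dev_get_pause_stats,
        /* other callbacks ... */
};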
+enum { + ETHTOOL_A_PAUSE_STAT_UNSPEC, + ETHTOOL_A_PAUSE_STAT_PAD, + + ETHTOOL_A_PAUSE_STAT_TX_FRAMES, + ETHTOOL_A_PAUSE_STAT_RX_FRAMES, + + /* add new constants above here */ + __ETHTOOL_A_PAUSE_STAT_CNT, + ETHTOOL_A_PAUSE_STAT_MAX = (__ETHTOOL_A_PAUSE_STAT_CNT - 1) +}; + /* EEE */
enum { diff --combined kernel/bpf/hashtab.c index fe0e06284d33,7df28a45c66b..3395cf140d22 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@@ -9,7 -9,6 +9,7 @@@ #include <linux/rculist_nulls.h> #include <linux/random.h> #include <uapi/linux/btf.h> +#include <linux/rcupdate_trace.h> #include "percpu_freelist.h" #include "bpf_lru_list.h" #include "map_in_map.h" @@@ -578,7 -577,8 +578,7 @@@ static void *__htab_map_lookup_elem(str struct htab_elem *l; u32 hash, key_size;
- /* Must be called with rcu_read_lock. */ - WARN_ON_ONCE(!rcu_read_lock_held()); + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size;
@@@ -941,7 -941,7 +941,7 @@@ static int htab_map_update_elem(struct /* unknown flags */ return -EINVAL;
- WARN_ON_ONCE(!rcu_read_lock_held()); + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size;
@@@ -1032,7 -1032,7 +1032,7 @@@ static int htab_lru_map_update_elem(str /* unknown flags */ return -EINVAL;
- WARN_ON_ONCE(!rcu_read_lock_held()); + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size;
@@@ -1220,7 -1220,7 +1220,7 @@@ static int htab_map_delete_elem(struct u32 hash, key_size; int ret = -ENOENT;
- WARN_ON_ONCE(!rcu_read_lock_held()); + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size;
@@@ -1252,7 -1252,7 +1252,7 @@@ static int htab_lru_map_delete_elem(str u32 hash, key_size; int ret = -ENOENT;
- WARN_ON_ONCE(!rcu_read_lock_held()); + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size;
@@@ -1622,7 -1622,6 +1622,6 @@@ struct bpf_iter_seq_hash_map_info struct bpf_map *map; struct bpf_htab *htab; void *percpu_value_buf; // non-zero means percpu hash - unsigned long flags; u32 bucket_id; u32 skip_elems; }; @@@ -1632,7 -1631,6 +1631,6 @@@ bpf_hash_map_seq_find_next(struct bpf_i struct htab_elem *prev_elem) { const struct bpf_htab *htab = info->htab; - unsigned long flags = info->flags; u32 skip_elems = info->skip_elems; u32 bucket_id = info->bucket_id; struct hlist_nulls_head *head; @@@ -1656,19 -1654,18 +1654,18 @@@
/* not found, unlock and go to the next bucket */ b = &htab->buckets[bucket_id++]; - htab_unlock_bucket(htab, b, flags); + rcu_read_unlock(); skip_elems = 0; }
for (i = bucket_id; i < htab->n_buckets; i++) { b = &htab->buckets[i]; - flags = htab_lock_bucket(htab, b); + rcu_read_lock();
count = 0; head = &b->head; hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) { if (count >= skip_elems) { - info->flags = flags; info->bucket_id = i; info->skip_elems = count; return elem; @@@ -1676,7 -1673,7 +1673,7 @@@ count++; }
- htab_unlock_bucket(htab, b, flags); + rcu_read_unlock(); skip_elems = 0; }
@@@ -1754,14 -1751,10 +1751,10 @@@ static int bpf_hash_map_seq_show(struc
static void bpf_hash_map_seq_stop(struct seq_file *seq, void *v) { - struct bpf_iter_seq_hash_map_info *info = seq->private; - if (!v) (void)__bpf_hash_map_seq_show(seq, NULL); else - htab_unlock_bucket(info->htab, - &info->htab->buckets[info->bucket_id], - info->flags); + rcu_read_unlock(); }
static int bpf_iter_init_hash_map(void *priv_data, @@@ -1810,7 -1803,6 +1803,7 @@@ static const struct bpf_iter_seq_info i
static int htab_map_btf_id; const struct bpf_map_ops htab_map_ops = { + .map_meta_equal = bpf_map_meta_equal, .map_alloc_check = htab_map_alloc_check, .map_alloc = htab_map_alloc, .map_free = htab_map_free, @@@ -1828,7 -1820,6 +1821,7 @@@
static int htab_lru_map_btf_id; const struct bpf_map_ops htab_lru_map_ops = { + .map_meta_equal = bpf_map_meta_equal, .map_alloc_check = htab_map_alloc_check, .map_alloc = htab_map_alloc, .map_free = htab_map_free, @@@ -1949,7 -1940,6 +1942,7 @@@ static void htab_percpu_map_seq_show_el
static int htab_percpu_map_btf_id; const struct bpf_map_ops htab_percpu_map_ops = { + .map_meta_equal = bpf_map_meta_equal, .map_alloc_check = htab_map_alloc_check, .map_alloc = htab_map_alloc, .map_free = htab_map_free, @@@ -1966,7 -1956,6 +1959,7 @@@
static int htab_lru_percpu_map_btf_id; const struct bpf_map_ops htab_lru_percpu_map_ops = { + .map_meta_equal = bpf_map_meta_equal, .map_alloc_check = htab_map_alloc_check, .map_alloc = htab_map_alloc, .map_free = htab_map_free, diff --combined kernel/bpf/inode.c index b48a56f53495,18f4969552ac..dd4b7fd60ee7 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@@ -20,7 -20,6 +20,7 @@@ #include <linux/filter.h> #include <linux/bpf.h> #include <linux/bpf_trace.h> +#include "preload/bpf_preload.h"
enum bpf_type { BPF_TYPE_UNSPEC = 0, @@@ -227,10 -226,12 +227,12 @@@ static void *map_seq_next(struct seq_fi else prev_key = key;
+ rcu_read_lock(); if (map->ops->map_get_next_key(map, prev_key, key)) { map_iter(m)->done = true; - return NULL; + key = NULL; } + rcu_read_unlock(); return key; }
@@@ -370,10 -371,9 +372,10 @@@ static struct dentry bpf_lookup(struct inode *dir, struct dentry *dentry, unsigned flags) { /* Dots in names (e.g. "/sys/fs/bpf/foo.bar") are reserved for future - * extensions. + * extensions. That allows populate_bpffs() to create special files. */ - if (strchr(dentry->d_name.name, '.')) + if ((dir->i_mode & S_IALLUGO) && + strchr(dentry->d_name.name, '.')) return ERR_PTR(-EPERM);
return simple_lookup(dir, dentry, flags); @@@ -411,27 -411,6 +413,27 @@@ static const struct inode_operations bp .unlink = simple_unlink, };
+/* pin iterator link into bpffs */ +static int bpf_iter_link_pin_kernel(struct dentry *parent, + const char *name, struct bpf_link *link) +{ + umode_t mode = S_IFREG | S_IRUSR; + struct dentry *dentry; + int ret; + + inode_lock(parent->d_inode); + dentry = lookup_one_len(name, parent, strlen(name)); + if (IS_ERR(dentry)) { + inode_unlock(parent->d_inode); + return PTR_ERR(dentry); + } + ret = bpf_mkobj_ops(dentry, mode, link, &bpf_link_iops, + &bpf_iter_fops); + dput(dentry); + inode_unlock(parent->d_inode); + return ret; +} + static int bpf_obj_do_pin(const char __user *pathname, void *raw, enum bpf_type type) { @@@ -661,91 -640,6 +663,91 @@@ static int bpf_parse_param(struct fs_co return 0; }
+struct bpf_preload_ops *bpf_preload_ops; +EXPORT_SYMBOL_GPL(bpf_preload_ops); + +static bool bpf_preload_mod_get(void) +{ + /* If bpf_preload.ko wasn't loaded earlier then load it now. + * When bpf_preload is built into vmlinux the module's __init + * function will populate it. + */ + if (!bpf_preload_ops) { + request_module("bpf_preload"); + if (!bpf_preload_ops) + return false; + } + /* And grab the reference, so the module doesn't disappear while the + * kernel is interacting with the kernel module and its UMD. + */ + if (!try_module_get(bpf_preload_ops->owner)) { + pr_err("bpf_preload module get failed.\n"); + return false; + } + return true; +} + +static void bpf_preload_mod_put(void) +{ + if (bpf_preload_ops) + /* now user can "rmmod bpf_preload" if necessary */ + module_put(bpf_preload_ops->owner); +} + +static DEFINE_MUTEX(bpf_preload_lock); + +static int populate_bpffs(struct dentry *parent) +{ + struct bpf_preload_info objs[BPF_PRELOAD_LINKS] = {}; + struct bpf_link *links[BPF_PRELOAD_LINKS] = {}; + int err = 0, i; + + /* grab the mutex to make sure the kernel interactions with bpf_preload + * UMD are serialized + */ + mutex_lock(&bpf_preload_lock); + + /* if bpf_preload.ko wasn't built into vmlinux then load it */ + if (!bpf_preload_mod_get()) + goto out; + + if (!bpf_preload_ops->info.tgid) { + /* preload() will start UMD that will load BPF iterator programs */ + err = bpf_preload_ops->preload(objs); + if (err) + goto out_put; + for (i = 0; i < BPF_PRELOAD_LINKS; i++) { + links[i] = bpf_link_by_id(objs[i].link_id); + if (IS_ERR(links[i])) { + err = PTR_ERR(links[i]); + goto out_put; + } + } + for (i = 0; i < BPF_PRELOAD_LINKS; i++) { + err = bpf_iter_link_pin_kernel(parent, + objs[i].link_name, links[i]); + if (err) + goto out_put; + /* do not unlink successfully pinned links even + * if later link fails to pin + */ + links[i] = NULL; + } + /* finish() will tell UMD process to exit */ + err = bpf_preload_ops->finish(); + if (err) + goto out_put; + } +out_put: + bpf_preload_mod_put(); +out: + mutex_unlock(&bpf_preload_lock); + for (i = 0; i < BPF_PRELOAD_LINKS && err; i++) + if (!IS_ERR_OR_NULL(links[i])) + bpf_link_put(links[i]); + return err; +} + static int bpf_fill_super(struct super_block *sb, struct fs_context *fc) { static const struct tree_descr bpf_rfiles[] = { { "" } }; @@@ -762,8 -656,8 +764,8 @@@ inode = sb->s_root->d_inode; inode->i_op = &bpf_dir_iops; inode->i_mode &= ~S_IALLUGO; + populate_bpffs(sb->s_root); inode->i_mode |= S_ISVTX | opts->mode; - return 0; }
@@@ -813,8 -707,6 +815,8 @@@ static int __init bpf_init(void { int ret;
+ mutex_init(&bpf_preload_lock); + ret = sysfs_create_mount_point(fs_kobj, "bpf"); if (ret) return ret; diff --combined mm/filemap.c index 054d93a86f8a,5202e38ab79e..8c8f2deee4e3 --- a/mm/filemap.c +++ b/mm/filemap.c @@@ -827,10 -827,10 +827,10 @@@ int replace_page_cache_page(struct pag } EXPORT_SYMBOL_GPL(replace_page_cache_page);
-static int __add_to_page_cache_locked(struct page *page, - struct address_space *mapping, - pgoff_t offset, gfp_t gfp_mask, - void **shadowp) +noinline int __add_to_page_cache_locked(struct page *page, + struct address_space *mapping, + pgoff_t offset, gfp_t gfp_mask, + void **shadowp) { XA_STATE(xas, &mapping->i_pages, offset); int huge = PageHuge(page); @@@ -988,9 -988,43 +988,43 @@@ void __init pagecache_init(void page_writeback_init(); }
+ /* + * The page wait code treats the "wait->flags" somewhat unusually, because + * we have multiple different kinds of waits, not just the usual "exclusive" + * one. + * + * We have: + * + * (a) no special bits set: + * + * We're just waiting for the bit to be released, and when a waker + * calls the wakeup function, we set WQ_FLAG_WOKEN and wake it up, + * and remove it from the wait queue. + * + * Simple and straightforward. + * + * (b) WQ_FLAG_EXCLUSIVE: + * + * The waiter is waiting to get the lock, and only one waiter should + * be woken up to avoid any thundering herd behavior. We'll set the + * WQ_FLAG_WOKEN bit, wake it up, and remove it from the wait queue. + * + * This is the traditional exclusive wait. + * + * (c) WQ_FLAG_EXCLUSIVE | WQ_FLAG_CUSTOM: + * + * The waiter is waiting to get the bit, and additionally wants the + * lock to be transferred to it for fair lock behavior. If the lock + * cannot be taken, we stop walking the wait queue without waking + * the waiter. + * + * This is the "fair lock handoff" case, and in addition to setting + * WQ_FLAG_WOKEN, we set WQ_FLAG_DONE to let the waiter easily see + * that it now has the lock. + */ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, void *arg) { - int ret; + unsigned int flags; struct wait_page_key *key = arg; struct wait_page_queue *wait_page = container_of(wait, struct wait_page_queue, wait); @@@ -999,35 -1033,44 +1033,44 @@@ return 0;
/* - * If it's an exclusive wait, we get the bit for it, and - * stop walking if we can't. - * - * If it's a non-exclusive wait, then the fact that this - * wake function was called means that the bit already - * was cleared, and we don't care if somebody then - * re-took it. + * If it's a lock handoff wait, we get the bit for it, and + * stop walking (and do not wake it up) if we can't. */ - ret = 0; - if (wait->flags & WQ_FLAG_EXCLUSIVE) { - if (test_and_set_bit(key->bit_nr, &key->page->flags)) + flags = wait->flags; + if (flags & WQ_FLAG_EXCLUSIVE) { + if (test_bit(key->bit_nr, &key->page->flags)) return -1; - ret = 1; + if (flags & WQ_FLAG_CUSTOM) { + if (test_and_set_bit(key->bit_nr, &key->page->flags)) + return -1; + flags |= WQ_FLAG_DONE; + } } - wait->flags |= WQ_FLAG_WOKEN;
+ /* + * We are holding the wait-queue lock, but the waiter that + * is waiting for this will be checking the flags without + * any locking. + * + * So update the flags atomically, and wake up the waiter + * afterwards to avoid any races. This store-release pairs + * with the load-acquire in wait_on_page_bit_common(). + */ + smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN); wake_up_state(wait->private, mode);
/* * Ok, we have successfully done what we're waiting for, * and we can unconditionally remove the wait entry. * - * Note that this has to be the absolute last thing we do, - * since after list_del_init(&wait->entry) the wait entry + * Note that this pairs with the "finish_wait()" in the + * waiter, and has to be the absolute last thing we do. + * After this list_del_init(&wait->entry) the wait entry * might be de-allocated and the process might even have * exited. */ list_del_init_careful(&wait->entry); - return ret; + return (flags & WQ_FLAG_EXCLUSIVE) != 0; }
static void wake_up_page_bit(struct page *page, int bit_nr) @@@ -1107,8 -1150,8 +1150,8 @@@ enum behavior };
/* - * Attempt to check (or get) the page bit, and mark the - * waiter woken if successful. + * Attempt to check (or get) the page bit, and mark us done + * if successful. */ static inline bool trylock_page_bit_common(struct page *page, int bit_nr, struct wait_queue_entry *wait) @@@ -1119,13 -1162,17 +1162,17 @@@ } else if (test_bit(bit_nr, &page->flags)) return false;
- wait->flags |= WQ_FLAG_WOKEN; + wait->flags |= WQ_FLAG_WOKEN | WQ_FLAG_DONE; return true; }
+ /* How many times do we accept lock stealing from under a waiter? */ + int sysctl_page_lock_unfairness = 5; + static inline int wait_on_page_bit_common(wait_queue_head_t *q, struct page *page, int bit_nr, int state, enum behavior behavior) { + int unfairness = sysctl_page_lock_unfairness; struct wait_page_queue wait_page; wait_queue_entry_t *wait = &wait_page.wait; bool thrashing = false; @@@ -1143,11 -1190,18 +1190,18 @@@ }
init_wait(wait); - wait->flags = behavior == EXCLUSIVE ? WQ_FLAG_EXCLUSIVE : 0; wait->func = wake_page_function; wait_page.page = page; wait_page.bit_nr = bit_nr;
+ repeat: + wait->flags = 0; + if (behavior == EXCLUSIVE) { + wait->flags = WQ_FLAG_EXCLUSIVE; + if (--unfairness < 0) + wait->flags |= WQ_FLAG_CUSTOM; + } + /* * Do one last check whether we can get the * page bit synchronously. @@@ -1170,27 -1224,63 +1224,63 @@@
/* * From now on, all the logic will be based on - * the WQ_FLAG_WOKEN flag, and the and the page - * bit testing (and setting) will be - or has - * already been - done by the wake function. + * the WQ_FLAG_WOKEN and WQ_FLAG_DONE flag, to + * see whether the page bit testing has already + * been done by the wake function. * * We can drop our reference to the page. */ if (behavior == DROP) put_page(page);
+ /* + * Note that until the "finish_wait()", or until + * we see the WQ_FLAG_WOKEN flag, we need to + * be very careful with the 'wait->flags', because + * we may race with a waker that sets them. + */ for (;;) { + unsigned int flags; + set_current_state(state);
- if (signal_pending_state(state, current)) + /* Loop until we've been woken or interrupted */ + flags = smp_load_acquire(&wait->flags); + if (!(flags & WQ_FLAG_WOKEN)) { + if (signal_pending_state(state, current)) + break; + + io_schedule(); + continue; + } + + /* If we were non-exclusive, we're done */ + if (behavior != EXCLUSIVE) break;
- if (wait->flags & WQ_FLAG_WOKEN) + /* If the waker got the lock for us, we're done */ + if (flags & WQ_FLAG_DONE) break;
- io_schedule(); + /* + * Otherwise, if we're getting the lock, we need to + * try to get it ourselves. + * + * And if that fails, we'll have to retry this all. + */ + if (unlikely(test_and_set_bit(bit_nr, &page->flags))) + goto repeat; + + wait->flags |= WQ_FLAG_DONE; + break; }
+ /* + * If a signal happened, this 'finish_wait()' may remove the last + * waiter from the wait-queues, but the PageWaiters bit will remain + * set. That's ok. The next wakeup will take care of it, and trying + * to do it here would be difficult and prone to races. + */ finish_wait(q, wait);
if (thrashing) { @@@ -1200,12 -1290,20 +1290,20 @@@ }
/* - * A signal could leave PageWaiters set. Clearing it here if - * !waitqueue_active would be possible (by open-coding finish_wait), - * but still fail to catch it in the case of wait hash collision. We - * already can fail to clear wait hash collision cases, so don't - * bother with signals either. + * NOTE! The wait->flags weren't stable until we've done the + * 'finish_wait()', and we could have exited the loop above due + * to a signal, and had a wakeup event happen after the signal + * test but before the 'finish_wait()'. + * + * So only after the finish_wait() can we reliably determine + * if we got woken up or not, so we can now figure out the final + * return value based on that state without races. + * + * Also note that WQ_FLAG_WOKEN is sufficient for a non-exclusive + * waiter, but an exclusive one requires WQ_FLAG_DONE. */ + if (behavior == EXCLUSIVE) + return wait->flags & WQ_FLAG_DONE ? 0 : -EINTR;
return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR; } diff --combined net/batman-adv/bridge_loop_avoidance.c index ab6cec3c7586,c350ab63cd54..ba0027d1f2df --- a/net/batman-adv/bridge_loop_avoidance.c +++ b/net/batman-adv/bridge_loop_avoidance.c @@@ -25,6 -25,7 +25,7 @@@ #include <linux/lockdep.h> #include <linux/netdevice.h> #include <linux/netlink.h> + #include <linux/preempt.h> #include <linux/rculist.h> #include <linux/rcupdate.h> #include <linux/seq_file.h> @@@ -83,11 -84,12 +84,12 @@@ static inline u32 batadv_choose_claim(c */ static inline u32 batadv_choose_backbone_gw(const void *data, u32 size) { - const struct batadv_bla_claim *claim = (struct batadv_bla_claim *)data; + const struct batadv_bla_backbone_gw *gw; u32 hash = 0;
- hash = jhash(&claim->addr, sizeof(claim->addr), hash); - hash = jhash(&claim->vid, sizeof(claim->vid), hash); + gw = (struct batadv_bla_backbone_gw *)data; + hash = jhash(&gw->orig, sizeof(gw->orig), hash); + hash = jhash(&gw->vid, sizeof(gw->vid), hash);
return hash % size; } @@@ -1579,13 -1581,16 +1581,16 @@@ int batadv_bla_init(struct batadv_priv }
/** - * batadv_bla_check_bcast_duplist() - Check if a frame is in the broadcast dup. + * batadv_bla_check_duplist() - Check if a frame is in the broadcast dup. * @bat_priv: the bat priv with all the soft interface information - * @skb: contains the bcast_packet to be checked + * @skb: contains the multicast packet to be checked + * @payload_ptr: pointer to position inside the head buffer of the skb + * marking the start of the data to be CRC'ed + * @orig: originator mac address, NULL if unknown * - * check if it is on our broadcast list. Another gateway might - * have sent the same packet because it is connected to the same backbone, - * so we have to remove this duplicate. + * Check if it is on our broadcast list. Another gateway might have sent the + * same packet because it is connected to the same backbone, so we have to + * remove this duplicate. * * This is performed by checking the CRC, which will tell us * with a good chance that it is the same packet. If it is furthermore @@@ -1594,19 -1599,17 +1599,17 @@@ * * Return: true if a packet is in the duplicate list, false otherwise. */ - bool batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv, - struct sk_buff *skb) + static bool batadv_bla_check_duplist(struct batadv_priv *bat_priv, + struct sk_buff *skb, u8 *payload_ptr, + const u8 *orig) { - int i, curr; - __be32 crc; - struct batadv_bcast_packet *bcast_packet; struct batadv_bcast_duplist_entry *entry; bool ret = false; - - bcast_packet = (struct batadv_bcast_packet *)skb->data; + int i, curr; + __be32 crc;
/* calculate the crc ... */ - crc = batadv_skb_crc32(skb, (u8 *)(bcast_packet + 1)); + crc = batadv_skb_crc32(skb, payload_ptr);
spin_lock_bh(&bat_priv->bla.bcast_duplist_lock);
@@@ -1625,8 -1628,21 +1628,21 @@@ if (entry->crc != crc) continue;
- if (batadv_compare_eth(entry->orig, bcast_packet->orig)) - continue; + /* are the originators both known and not anonymous? */ + if (orig && !is_zero_ether_addr(orig) && + !is_zero_ether_addr(entry->orig)) { + /* If known, check if the new frame came from + * the same originator: + * We are safe to take identical frames from the + * same orig, if known, as multiplications in + * the mesh are detected via the (orig, seqno) pair. + * So we can be a bit more liberal here and allow + * identical frames from the same orig which the source + * host might have sent multiple times on purpose. + */ + if (batadv_compare_eth(entry->orig, orig)) + continue; + }
/* this entry seems to match: same crc, not too old, * and from another gw. therefore return true to forbid it. @@@ -1642,7 -1658,14 +1658,14 @@@ entry = &bat_priv->bla.bcast_duplist[curr]; entry->crc = crc; entry->entrytime = jiffies; - ether_addr_copy(entry->orig, bcast_packet->orig); + + /* known originator */ + if (orig) + ether_addr_copy(entry->orig, orig); + /* anonymous originator */ + else + eth_zero_addr(entry->orig); + bat_priv->bla.bcast_duplist_curr = curr;
out: @@@ -1651,6 -1674,48 +1674,48 @@@ return ret; }
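The duplicate check above reduces, per cached entry, to a small predicate on the CRC and the originator. A minimal sketch of that predicate, written only as an illustration of the logic in batadv_bla_check_duplist(); the helper name is made up, and the entrytime aging plus the ring-buffer bookkeeping are deliberately left out:

static bool example_is_foreign_duplicate(const struct batadv_bcast_duplist_entry *entry,
					 __be32 crc, const u8 *orig)
{
	/* different payload: not a duplicate of this entry */
	if (entry->crc != crc)
		return false;

	/* identical frames from the same, known originator are tolerated */
	if (orig && !is_zero_ether_addr(orig) &&
	    !is_zero_ether_addr(entry->orig) &&
	    batadv_compare_eth(entry->orig, orig))
		return false;

	/* same payload from another (or an anonymous) gateway: duplicate */
	return true;
}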
+ /** + * batadv_bla_check_ucast_duplist() - Check if a frame is in the broadcast dup. + * @bat_priv: the bat priv with all the soft interface information + * @skb: contains the multicast packet to be checked, decapsulated from a + * unicast_packet + * + * Check if it is on our broadcast list. Another gateway might have sent the + * same packet because it is connected to the same backbone, so we have to + * remove this duplicate. + * + * Return: true if a packet is in the duplicate list, false otherwise. + */ + static bool batadv_bla_check_ucast_duplist(struct batadv_priv *bat_priv, + struct sk_buff *skb) + { + return batadv_bla_check_duplist(bat_priv, skb, (u8 *)skb->data, NULL); + } + + /** + * batadv_bla_check_bcast_duplist() - Check if a frame is in the broadcast dup. + * @bat_priv: the bat priv with all the soft interface information + * @skb: contains the bcast_packet to be checked + * + * Check if it is on our broadcast list. Another gateway might have sent the + * same packet because it is connected to the same backbone, so we have to + * remove this duplicate. + * + * Return: true if a packet is in the duplicate list, false otherwise. + */ + bool batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv, + struct sk_buff *skb) + { + struct batadv_bcast_packet *bcast_packet; + u8 *payload_ptr; + + bcast_packet = (struct batadv_bcast_packet *)skb->data; + payload_ptr = (u8 *)(bcast_packet + 1); + + return batadv_bla_check_duplist(bat_priv, skb, payload_ptr, + bcast_packet->orig); + } + /** * batadv_bla_is_backbone_gw_orig() - Check if the originator is a gateway for * the VLAN identified by vid. @@@ -1798,7 -1863,7 +1863,7 @@@ batadv_bla_loopdetect_check(struct bata
ret = queue_work(batadv_event_workqueue, &backbone_gw->report_work);
- /* backbone_gw is unreferenced in the report work function function + /* backbone_gw is unreferenced in the report work function * if queue_work() call was successful */ if (!ret) @@@ -1812,7 -1877,7 +1877,7 @@@ * @bat_priv: the bat priv with all the soft interface information * @skb: the frame to be checked * @vid: the VLAN ID of the frame - * @is_bcast: the packet came in a broadcast packet type. + * @packet_type: the batman packet type this frame came in * * batadv_bla_rx avoidance checks if: * * we have to race for a claim @@@ -1824,7 -1889,7 +1889,7 @@@ * further process the skb. */ bool batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, - unsigned short vid, bool is_bcast) + unsigned short vid, int packet_type) { struct batadv_bla_backbone_gw *backbone_gw; struct ethhdr *ethhdr; @@@ -1846,9 -1911,32 +1911,32 @@@ goto handled;
if (unlikely(atomic_read(&bat_priv->bla.num_requests))) - /* don't allow broadcasts while requests are in flight */ - if (is_multicast_ether_addr(ethhdr->h_dest) && is_bcast) - goto handled; + /* don't allow multicast packets while requests are in flight */ + if (is_multicast_ether_addr(ethhdr->h_dest)) + /* Both broadcast flooding or multicast-via-unicasts + * delivery might send to multiple backbone gateways + * sharing the same LAN and therefore need to coordinate + * which backbone gateway forwards into the LAN, + * by claiming the payload source address. + * + * Broadcast flooding and multicast-via-unicasts + * delivery use the following two batman packet types. + * Note: explicitly exclude BATADV_UNICAST_4ADDR, + * as the DHCP gateway feature will send explicitly + * to only one BLA gateway, so the claiming process + * should be avoided there. + */ + if (packet_type == BATADV_BCAST || + packet_type == BATADV_UNICAST) + goto handled; + + /* potential duplicates from foreign BLA backbone gateways via + * multicast-in-unicast packets + */ + if (is_multicast_ether_addr(ethhdr->h_dest) && + packet_type == BATADV_UNICAST && + batadv_bla_check_ucast_duplist(bat_priv, skb)) + goto handled;
ether_addr_copy(search_claim.addr, ethhdr->h_source); search_claim.vid = vid; @@@ -1883,13 -1971,14 +1971,14 @@@ goto allow; }
- /* if it is a broadcast ... */ - if (is_multicast_ether_addr(ethhdr->h_dest) && is_bcast) { + /* if it is a multicast ... */ + if (is_multicast_ether_addr(ethhdr->h_dest) && + (packet_type == BATADV_BCAST || packet_type == BATADV_UNICAST)) { /* ... drop it. the responsible gateway is in charge. * - * We need to check is_bcast because with the gateway + * We need to check packet type because with the gateway * feature, broadcasts (like DHCP requests) may be sent - * using a unicast packet type. + * using a unicast 4 address packet type. See comment above. */ goto handled; } else { diff --combined net/batman-adv/multicast.c index 1622c3f5898f,ca24a2e522b7..0746fe2c2c04 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@@ -51,6 -51,7 +51,7 @@@ #include <uapi/linux/batadv_packet.h> #include <uapi/linux/batman_adv.h>
+ #include "bridge_loop_avoidance.h" #include "hard-interface.h" #include "hash.h" #include "log.h" @@@ -207,7 -208,7 +208,7 @@@ static u8 batadv_mcast_mla_rtr_flags_br return BATADV_MCAST_WANT_NO_RTR4 | BATADV_MCAST_WANT_NO_RTR6;
/* TODO: ask the bridge if a multicast router is present (the bridge - * is capable of performing proper RFC4286 multicast multicast router + * is capable of performing proper RFC4286 multicast router * discovery) instead of searching for a ff02::2 listener here */ ret = br_multicast_list_adjacent(dev, &bridge_mcast_list); @@@ -1434,6 -1435,35 +1435,35 @@@ batadv_mcast_forw_mode(struct batadv_pr return BATADV_FORW_ALL; }
+ /** + * batadv_mcast_forw_send_orig() - send a multicast packet to an originator + * @bat_priv: the bat priv with all the soft interface information + * @skb: the multicast packet to send + * @vid: the vlan identifier + * @orig_node: the originator to send the packet to + * + * Return: NET_XMIT_DROP in case of error or NET_XMIT_SUCCESS otherwise. + */ + int batadv_mcast_forw_send_orig(struct batadv_priv *bat_priv, + struct sk_buff *skb, + unsigned short vid, + struct batadv_orig_node *orig_node) + { + /* Avoid sending multicast-in-unicast packets to other BLA + * gateways - they already got the frame from the LAN side + * we share with them. + * TODO: Refactor to take BLA into account earlier, to avoid + * reducing the mcast_fanout count. + */ + if (batadv_bla_is_backbone_gw_orig(bat_priv, orig_node->orig, vid)) { + dev_kfree_skb(skb); + return NET_XMIT_SUCCESS; + } + + return batadv_send_skb_unicast(bat_priv, skb, BATADV_UNICAST, 0, + orig_node, vid); + } + /** * batadv_mcast_forw_tt() - forwards a packet to multicast listeners * @bat_priv: the bat priv with all the soft interface information @@@ -1471,8 -1501,8 +1501,8 @@@ batadv_mcast_forw_tt(struct batadv_pri break; }
- batadv_send_skb_unicast(bat_priv, newskb, BATADV_UNICAST, 0, - orig_entry->orig_node, vid); + batadv_mcast_forw_send_orig(bat_priv, newskb, vid, + orig_entry->orig_node); } rcu_read_unlock();
@@@ -1513,8 -1543,7 +1543,7 @@@ batadv_mcast_forw_want_all_ipv4(struct break; }
- batadv_send_skb_unicast(bat_priv, newskb, BATADV_UNICAST, 0, - orig_node, vid); + batadv_mcast_forw_send_orig(bat_priv, newskb, vid, orig_node); } rcu_read_unlock(); return ret; @@@ -1551,8 -1580,7 +1580,7 @@@ batadv_mcast_forw_want_all_ipv6(struct break; }
- batadv_send_skb_unicast(bat_priv, newskb, BATADV_UNICAST, 0, - orig_node, vid); + batadv_mcast_forw_send_orig(bat_priv, newskb, vid, orig_node); } rcu_read_unlock(); return ret; @@@ -1618,8 -1646,7 +1646,7 @@@ batadv_mcast_forw_want_all_rtr4(struct break; }
- batadv_send_skb_unicast(bat_priv, newskb, BATADV_UNICAST, 0, - orig_node, vid); + batadv_mcast_forw_send_orig(bat_priv, newskb, vid, orig_node); } rcu_read_unlock(); return ret; @@@ -1656,8 -1683,7 +1683,7 @@@ batadv_mcast_forw_want_all_rtr6(struct break; }
- batadv_send_skb_unicast(bat_priv, newskb, BATADV_UNICAST, 0, - orig_node, vid); + batadv_mcast_forw_send_orig(bat_priv, newskb, vid, orig_node); } rcu_read_unlock(); return ret; diff --combined net/batman-adv/soft-interface.c index 9d3974ba11ed,cdde943c1b83..82e7ca886605 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@@ -364,9 -364,8 +364,8 @@@ send goto dropped; ret = batadv_send_skb_via_gw(bat_priv, skb, vid); } else if (mcast_single_orig) { - ret = batadv_send_skb_unicast(bat_priv, skb, - BATADV_UNICAST, 0, - mcast_single_orig, vid); + ret = batadv_mcast_forw_send_orig(bat_priv, skb, vid, + mcast_single_orig); } else if (forw_mode == BATADV_FORW_SOME) { ret = batadv_mcast_forw_send(bat_priv, skb, vid); } else { @@@ -425,10 -424,10 +424,10 @@@ void batadv_interface_rx(struct net_dev struct vlan_ethhdr *vhdr; struct ethhdr *ethhdr; unsigned short vid; - bool is_bcast; + int packet_type;
batadv_bcast_packet = (struct batadv_bcast_packet *)skb->data; - is_bcast = (batadv_bcast_packet->packet_type == BATADV_BCAST); + packet_type = batadv_bcast_packet->packet_type;
skb_pull_rcsum(skb, hdr_size); skb_reset_mac_header(skb); @@@ -471,7 -470,7 +470,7 @@@ /* Let the bridge loop avoidance check the packet. If will * not handle it, we can safely push it up. */ - if (batadv_bla_rx(bat_priv, skb, vid, is_bcast)) + if (batadv_bla_rx(bat_priv, skb, vid, packet_type)) goto out;
if (orig_node) @@@ -649,7 -648,7 +648,7 @@@ static void batadv_softif_destroy_vlan( /** * batadv_interface_add_vid() - ndo_add_vid API implementation * @dev: the netdev of the mesh interface - * @proto: protocol of the the vlan id + * @proto: protocol of the vlan id * @vid: identifier of the new vlan * * Set up all the internal structures for handling the new vlan on top of the @@@ -707,7 -706,7 +706,7 @@@ static int batadv_interface_add_vid(str /** * batadv_interface_kill_vid() - ndo_kill_vid API implementation * @dev: the netdev of the mesh interface - * @proto: protocol of the the vlan id + * @proto: protocol of the vlan id * @vid: identifier of the deleted vlan * * Destroy all the internal structures used to handle the vlan identified by vid diff --combined net/bridge/br_vlan.c index 199deb2adf60,61c94cefa843..002bbc93209d --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@@ -140,7 -140,7 +140,7 @@@ static int __vlan_vid_del(struct net_de return err == -EOPNOTSUPP ? 0 : err; }
-/* Returns a master vlan, if it didn't exist it gets created. In all cases a +/* Returns a master vlan, if it didn't exist it gets created. In all cases * a reference is taken to the master vlan before returning. */ static struct net_bridge_vlan * @@@ -1288,11 -1288,13 +1288,13 @@@ void br_vlan_get_stats(const struct net } }
- static int __br_vlan_get_pvid(const struct net_device *dev, - struct net_bridge_port *p, u16 *p_pvid) + int br_vlan_get_pvid(const struct net_device *dev, u16 *p_pvid) { struct net_bridge_vlan_group *vg; + struct net_bridge_port *p;
+ ASSERT_RTNL(); + p = br_port_get_check_rtnl(dev); if (p) vg = nbp_vlan_group(p); else if (netif_is_bridge_master(dev)) @@@ -1303,18 -1305,23 +1305,23 @@@ *p_pvid = br_get_pvid(vg); return 0; } - - int br_vlan_get_pvid(const struct net_device *dev, u16 *p_pvid) - { - ASSERT_RTNL(); - - return __br_vlan_get_pvid(dev, br_port_get_check_rtnl(dev), p_pvid); - } EXPORT_SYMBOL_GPL(br_vlan_get_pvid);
int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 *p_pvid) { - return __br_vlan_get_pvid(dev, br_port_get_check_rcu(dev), p_pvid); + struct net_bridge_vlan_group *vg; + struct net_bridge_port *p; + + p = br_port_get_check_rcu(dev); + if (p) + vg = nbp_vlan_group_rcu(p); + else if (netif_is_bridge_master(dev)) + vg = br_vlan_group_rcu(netdev_priv(dev)); + else + return -EINVAL; + + *p_pvid = br_get_pvid(vg); + return 0; } EXPORT_SYMBOL_GPL(br_vlan_get_pvid_rcu);
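With the RTNL and RCU flavours now open coded separately, the calling convention stays the same: br_vlan_get_pvid() is for control-path callers that already hold RTNL, br_vlan_get_pvid_rcu() for lockless readers. A small, hypothetical pair of callers under those locking assumptions; both helpers return 0 on success or -EINVAL if the device is neither a bridge nor a bridge port:

static u16 example_pvid_control_path(struct net_device *dev)
{
	u16 pvid = 0;

	/* RTNL must already be held; br_vlan_get_pvid() asserts it */
	br_vlan_get_pvid(dev, &pvid);
	return pvid;
}

static u16 example_pvid_datapath(struct net_device *dev)
{
	u16 pvid = 0;

	rcu_read_lock();
	br_vlan_get_pvid_rcu(dev, &pvid);
	rcu_read_unlock();
	return pvid;
}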
@@@ -1884,8 -1891,8 +1891,8 @@@ out_err }
static const struct nla_policy br_vlan_db_policy[BRIDGE_VLANDB_ENTRY_MAX + 1] = { - [BRIDGE_VLANDB_ENTRY_INFO] = { .type = NLA_EXACT_LEN, - .len = sizeof(struct bridge_vlan_info) }, + [BRIDGE_VLANDB_ENTRY_INFO] = + NLA_POLICY_EXACT_LEN(sizeof(struct bridge_vlan_info)), [BRIDGE_VLANDB_ENTRY_RANGE] = { .type = NLA_U16 }, [BRIDGE_VLANDB_ENTRY_STATE] = { .type = NLA_U8 }, [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] = { .type = NLA_NESTED }, diff --combined net/core/dev.c index bd9c8510d86f,266073e300b5..38a172a63318 --- a/net/core/dev.c +++ b/net/core/dev.c @@@ -98,7 -98,6 +98,7 @@@ #include <net/busy_poll.h> #include <linux/rtnetlink.h> #include <linux/stat.h> +#include <net/dsa.h> #include <net/dst.h> #include <net/dst_metadata.h> #include <net/pkt_sched.h> @@@ -1131,7 -1130,7 +1131,7 @@@ EXPORT_SYMBOL(__dev_get_by_flags) * @name: name string * * Network device names need to be valid file names to - * to allow sysfs to work. We also disallow any kind of + * allow sysfs to work. We also disallow any kind of * whitespace. */ bool dev_valid_name(const char *name) @@@ -5193,7 -5192,7 +5193,7 @@@ skip_classify } }
- if (unlikely(skb_vlan_tag_present(skb))) { + if (unlikely(skb_vlan_tag_present(skb)) && !netdev_uses_dsa(skb->dev)) { check_vlan_id: if (skb_vlan_tag_get_id(skb)) { /* Vlan id is non 0 and vlan_do_receive() above couldn't @@@ -5622,60 -5621,17 +5622,60 @@@ static void flush_backlog(struct work_s local_bh_enable(); }
+static bool flush_required(int cpu) +{ +#if IS_ENABLED(CONFIG_RPS) + struct softnet_data *sd = &per_cpu(softnet_data, cpu); + bool do_flush; + + local_irq_disable(); + rps_lock(sd); + + /* as insertion into process_queue happens with the rps lock held, + * process_queue access may race only with dequeue + */ + do_flush = !skb_queue_empty(&sd->input_pkt_queue) || + !skb_queue_empty_lockless(&sd->process_queue); + rps_unlock(sd); + local_irq_enable(); + + return do_flush; +#endif + /* without RPS we can't safely check input_pkt_queue: during a + * concurrent remote skb_queue_splice() we can detect as empty both + * input_pkt_queue and process_queue even if the latter could end-up + * containing a lot of packets. + */ + return true; +} + static void flush_all_backlogs(void) { + static cpumask_t flush_cpus; unsigned int cpu;
+ /* since we are under rtnl lock protection we can use static data + * for the cpumask and avoid allocating on stack the possibly + * large mask + */ + ASSERT_RTNL(); + get_online_cpus();
- for_each_online_cpu(cpu) - queue_work_on(cpu, system_highpri_wq, - per_cpu_ptr(&flush_works, cpu)); + cpumask_clear(&flush_cpus); + for_each_online_cpu(cpu) { + if (flush_required(cpu)) { + queue_work_on(cpu, system_highpri_wq, + per_cpu_ptr(&flush_works, cpu)); + cpumask_set_cpu(cpu, &flush_cpus); + } + }
- for_each_online_cpu(cpu) + /* we can have in flight packet[s] on the cpus we are not flushing, + * synchronize_net() in rollback_registered_many() will take care of + * them + */ + for_each_cpu(cpu, &flush_cpus) flush_work(per_cpu_ptr(&flush_works, cpu));
put_online_cpus(); @@@ -6337,7 -6293,7 +6337,7 @@@ EXPORT_SYMBOL(__napi_schedule) * @n: napi context * * Test if NAPI routine is already running, and if not mark - * it as running. This is used as a condition variable + * it as running. This is used as a condition variable to * insure only one NAPI poll instance runs. We also make * sure there is no pending NAPI disable. */ @@@ -6577,7 -6533,8 +6577,7 @@@ EXPORT_SYMBOL(napi_busy_loop)
static void napi_hash_add(struct napi_struct *napi) { - if (test_bit(NAPI_STATE_NO_BUSY_POLL, &napi->state) || - test_and_set_bit(NAPI_STATE_HASHED, &napi->state)) + if (test_bit(NAPI_STATE_NO_BUSY_POLL, &napi->state)) return;
spin_lock(&napi_hash_lock); @@@ -6598,14 -6555,20 +6598,14 @@@ /* Warning : caller is responsible to make sure rcu grace period * is respected before freeing memory containing @napi */ -bool napi_hash_del(struct napi_struct *napi) +static void napi_hash_del(struct napi_struct *napi) { - bool rcu_sync_needed = false; - spin_lock(&napi_hash_lock);
- if (test_and_clear_bit(NAPI_STATE_HASHED, &napi->state)) { - rcu_sync_needed = true; - hlist_del_rcu(&napi->napi_hash_node); - } + hlist_del_init_rcu(&napi->napi_hash_node); + spin_unlock(&napi_hash_lock); - return rcu_sync_needed; } -EXPORT_SYMBOL_GPL(napi_hash_del);
static enum hrtimer_restart napi_watchdog(struct hrtimer *timer) { @@@ -6637,11 -6600,7 +6637,11 @@@ static void init_gro_hash(struct napi_s void netif_napi_add(struct net_device *dev, struct napi_struct *napi, int (*poll)(struct napi_struct *, int), int weight) { + if (WARN_ON(test_and_set_bit(NAPI_STATE_LISTED, &napi->state))) + return; + INIT_LIST_HEAD(&napi->poll_list); + INIT_HLIST_NODE(&napi->napi_hash_node); hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED); napi->timer.function = napi_watchdog; init_gro_hash(napi); @@@ -6694,19 -6653,18 +6694,19 @@@ static void flush_gro_hash(struct napi_ }
/* Must be called in process context */ -void netif_napi_del(struct napi_struct *napi) +void __netif_napi_del(struct napi_struct *napi) { - might_sleep(); - if (napi_hash_del(napi)) - synchronize_net(); - list_del_init(&napi->dev_list); + if (!test_and_clear_bit(NAPI_STATE_LISTED, &napi->state)) + return; + + napi_hash_del(napi); + list_del_rcu(&napi->dev_list); napi_free_frags(napi);
flush_gro_hash(napi); napi->gro_bitmask = 0; } -EXPORT_SYMBOL(netif_napi_del); +EXPORT_SYMBOL(__netif_napi_del);
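For drivers the calling convention is unchanged: netif_napi_add()/netif_napi_del() remain the entry points (the latter presumably as a wrapper around __netif_napi_del() now that the export is renamed), but a second netif_napi_add() on the same napi context now trips the WARN_ON() on NAPI_STATE_LISTED above. A minimal, hypothetical driver pairing; names prefixed mydrv_ are placeholders:

struct mydrv_priv {
	struct napi_struct napi;
};

static int mydrv_poll(struct napi_struct *napi, int budget)
{
	int work_done = 0;

	/* ... process up to 'budget' packets, incrementing work_done ... */

	if (work_done < budget)
		napi_complete_done(napi, work_done);
	return work_done;
}

/* probe path: register exactly once */
static void mydrv_napi_setup(struct net_device *ndev, struct mydrv_priv *priv)
{
	netif_napi_add(ndev, &priv->napi, mydrv_poll, NAPI_POLL_WEIGHT);
}

/* remove path: delete once; an RCU grace period is still required before
 * the memory holding the napi struct is freed
 */
static void mydrv_napi_teardown(struct mydrv_priv *priv)
{
	netif_napi_del(&priv->napi);
}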
static int napi_poll(struct napi_struct *n, struct list_head *repoll) { @@@ -8689,7 -8647,7 +8689,7 @@@ int dev_get_port_parent_id(struct net_d if (!first.id_len) first = *ppid; else if (memcmp(&first, ppid, sizeof(*ppid))) - return -ENODATA; + return -EOPNOTSUPP; }
return err; @@@ -9512,7 -9470,7 +9512,7 @@@ int __netdev_update_features(struct net /* driver might be less strict about feature dependencies */ features = netdev_fix_features(dev, features);
- /* some features can't be enabled if they're off an an upper device */ + /* some features can't be enabled if they're off on an upper device */ netdev_for_each_upper_dev_rcu(dev, upper, iter) features = netdev_sync_upper_features(dev, upper, features);
@@@ -10028,12 -9986,10 +10028,12 @@@ EXPORT_SYMBOL(netdev_refcnt_read) * We can get stuck here if buggy protocols don't correctly * call dev_put. */ +#define WAIT_REFS_MIN_MSECS 1 +#define WAIT_REFS_MAX_MSECS 250 static void netdev_wait_allrefs(struct net_device *dev) { unsigned long rebroadcast_time, warning_time; - int refcnt; + int wait = 0, refcnt;
linkwatch_forget_dev(dev);
@@@ -10067,13 -10023,7 +10067,13 @@@ rebroadcast_time = jiffies; }
- msleep(250); + if (!wait) { + rcu_barrier(); + wait = WAIT_REFS_MIN_MSECS; + } else { + msleep(wait); + wait = min(wait << 1, WAIT_REFS_MAX_MSECS); + }
refcnt = netdev_refcnt_read(dev);
diff --combined net/core/filter.c index 2ad9c0ef1946,21eaf3b182f2..08f577114acc --- a/net/core/filter.c +++ b/net/core/filter.c @@@ -4459,7 -4459,6 +4459,7 @@@ static int _bpf_setsockopt(struct sock } else { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); + unsigned long timeout;
if (optlen != sizeof(int)) return -EINVAL; @@@ -4481,20 -4480,6 +4481,20 @@@ tp->snd_ssthresh = val; } break; + case TCP_BPF_DELACK_MAX: + timeout = usecs_to_jiffies(val); + if (timeout > TCP_DELACK_MAX || + timeout < TCP_TIMEOUT_MIN) + return -EINVAL; + inet_csk(sk)->icsk_delack_max = timeout; + break; + case TCP_BPF_RTO_MIN: + timeout = usecs_to_jiffies(val); + if (timeout > TCP_RTO_MIN || + timeout < TCP_TIMEOUT_MIN) + return -EINVAL; + inet_csk(sk)->icsk_rto_min = timeout; + break; case TCP_SAVE_SYN: if (val < 0 || val > 1) ret = -EINVAL; @@@ -4565,9 -4550,9 +4565,9 @@@ static int _bpf_getsockopt(struct sock tp = tcp_sk(sk);
if (optlen <= 0 || !tp->saved_syn || - optlen > tp->saved_syn[0]) + optlen > tcp_saved_syn_len(tp->saved_syn)) goto err_clear; - memcpy(optval, tp->saved_syn + 1, optlen); + memcpy(optval, tp->saved_syn->data, optlen); break; default: goto err_clear; @@@ -4669,99 -4654,9 +4669,99 @@@ static const struct bpf_func_proto bpf_ .arg5_type = ARG_CONST_SIZE, };
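From a BPF program the two new options are driven through the existing bpf_setsockopt() helper; the microsecond value, once converted to jiffies, has to stay within [TCP_TIMEOUT_MIN, TCP_RTO_MIN] resp. [TCP_TIMEOUT_MIN, TCP_DELACK_MAX], otherwise the call fails with -EINVAL. A minimal sock_ops sketch, assuming the usual libbpf boilerplate; the chosen callbacks and the 20/30 ms values are picked purely for illustration:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#ifndef SOL_TCP
#define SOL_TCP 6
#endif

SEC("sockops")
int set_tcp_timers(struct bpf_sock_ops *skops)
{
	int rto_min_us = 30000;		/* 30 ms */
	int delack_max_us = 20000;	/* 20 ms */

	switch (skops->op) {
	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
		bpf_setsockopt(skops, SOL_TCP, TCP_BPF_RTO_MIN,
			       &rto_min_us, sizeof(rto_min_us));
		bpf_setsockopt(skops, SOL_TCP, TCP_BPF_DELACK_MAX,
			       &delack_max_us, sizeof(delack_max_us));
		break;
	}
	return 1;
}

char _license[] SEC("license") = "GPL";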
+static int bpf_sock_ops_get_syn(struct bpf_sock_ops_kern *bpf_sock, + int optname, const u8 **start) +{ + struct sk_buff *syn_skb = bpf_sock->syn_skb; + const u8 *hdr_start; + int ret; + + if (syn_skb) { + /* sk is a request_sock here */ + + if (optname == TCP_BPF_SYN) { + hdr_start = syn_skb->data; + ret = tcp_hdrlen(syn_skb); + } else if (optname == TCP_BPF_SYN_IP) { + hdr_start = skb_network_header(syn_skb); + ret = skb_network_header_len(syn_skb) + + tcp_hdrlen(syn_skb); + } else { + /* optname == TCP_BPF_SYN_MAC */ + hdr_start = skb_mac_header(syn_skb); + ret = skb_mac_header_len(syn_skb) + + skb_network_header_len(syn_skb) + + tcp_hdrlen(syn_skb); + } + } else { + struct sock *sk = bpf_sock->sk; + struct saved_syn *saved_syn; + + if (sk->sk_state == TCP_NEW_SYN_RECV) + /* synack retransmit. bpf_sock->syn_skb will + * not be available. It has to resort to + * saved_syn (if it is saved). + */ + saved_syn = inet_reqsk(sk)->saved_syn; + else + saved_syn = tcp_sk(sk)->saved_syn; + + if (!saved_syn) + return -ENOENT; + + if (optname == TCP_BPF_SYN) { + hdr_start = saved_syn->data + + saved_syn->mac_hdrlen + + saved_syn->network_hdrlen; + ret = saved_syn->tcp_hdrlen; + } else if (optname == TCP_BPF_SYN_IP) { + hdr_start = saved_syn->data + + saved_syn->mac_hdrlen; + ret = saved_syn->network_hdrlen + + saved_syn->tcp_hdrlen; + } else { + /* optname == TCP_BPF_SYN_MAC */ + + /* TCP_SAVE_SYN may not have saved the mac hdr */ + if (!saved_syn->mac_hdrlen) + return -ENOENT; + + hdr_start = saved_syn->data; + ret = saved_syn->mac_hdrlen + + saved_syn->network_hdrlen + + saved_syn->tcp_hdrlen; + } + } + + *start = hdr_start; + return ret; +} + BPF_CALL_5(bpf_sock_ops_getsockopt, struct bpf_sock_ops_kern *, bpf_sock, int, level, int, optname, char *, optval, int, optlen) { + if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && + optname >= TCP_BPF_SYN && optname <= TCP_BPF_SYN_MAC) { + int ret, copy_len = 0; + const u8 *start; + + ret = bpf_sock_ops_get_syn(bpf_sock, optname, &start); + if (ret > 0) { + copy_len = ret; + if (optlen < copy_len) { + copy_len = optlen; + ret = -ENOSPC; + } + + memcpy(optval, start, copy_len); + } + + /* Zero out unused buffer at the end */ + memset(optval + copy_len, 0, optlen - copy_len); + + return ret; + } + return _bpf_getsockopt(bpf_sock->sk, level, optname, optval, optlen); }
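The saved or in-flight SYN becomes readable through bpf_getsockopt() with the new TCP_BPF_SYN, TCP_BPF_SYN_IP and TCP_BPF_SYN_MAC names: a positive return is the number of header bytes copied (any unused tail of the buffer is zeroed), -ENOSPC means the buffer was smaller than the headers (the leading bytes are still written), -ENOENT means nothing was saved. A hedged fragment, reusing the includes from the sketch above; reading at the established callback assumes TCP_SAVE_SYN was enabled earlier (for instance via bpf_setsockopt() at BPF_SOCK_OPS_TCP_LISTEN_CB), since the SYN skb itself is gone by then:

SEC("sockops")
int read_syn_headers(struct bpf_sock_ops *skops)
{
	char syn_hdrs[128] = {};
	int len;

	if (skops->op == BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB) {
		len = bpf_getsockopt(skops, SOL_TCP, TCP_BPF_SYN_IP,
				     syn_hdrs, sizeof(syn_hdrs));
		/* len > 0: syn_hdrs holds the IP[46] + TCP header bytes of
		 * the peer's SYN
		 */
	}
	return 1;
}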
@@@ -4943,6 -4838,7 +4943,7 @@@ static int bpf_ipv4_fib_lookup(struct n fl4.saddr = params->ipv4_src; fl4.fl4_sport = params->sport; fl4.fl4_dport = params->dport; + fl4.flowi4_multipath_hash = 0;
if (flags & BPF_FIB_LOOKUP_DIRECT) { u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN; @@@ -6255,232 -6151,6 +6256,232 @@@ static const struct bpf_func_proto bpf_ .arg3_type = ARG_ANYTHING, };
+static const u8 *bpf_search_tcp_opt(const u8 *op, const u8 *opend, + u8 search_kind, const u8 *magic, + u8 magic_len, bool *eol) +{ + u8 kind, kind_len; + + *eol = false; + + while (op < opend) { + kind = op[0]; + + if (kind == TCPOPT_EOL) { + *eol = true; + return ERR_PTR(-ENOMSG); + } else if (kind == TCPOPT_NOP) { + op++; + continue; + } + + if (opend - op < 2 || opend - op < op[1] || op[1] < 2) + /* Something is wrong in the received header. + * Follow the TCP stack's tcp_parse_options() + * and just bail here. + */ + return ERR_PTR(-EFAULT); + + kind_len = op[1]; + if (search_kind == kind) { + if (!magic_len) + return op; + + if (magic_len > kind_len - 2) + return ERR_PTR(-ENOMSG); + + if (!memcmp(&op[2], magic, magic_len)) + return op; + } + + op += kind_len; + } + + return ERR_PTR(-ENOMSG); +} + +BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock, + void *, search_res, u32, len, u64, flags) +{ + bool eol, load_syn = flags & BPF_LOAD_HDR_OPT_TCP_SYN; + const u8 *op, *opend, *magic, *search = search_res; + u8 search_kind, search_len, copy_len, magic_len; + int ret; + + /* 2 byte is the minimal option len except TCPOPT_NOP and + * TCPOPT_EOL which are useless for the bpf prog to learn + * and this helper disallow loading them also. + */ + if (len < 2 || flags & ~BPF_LOAD_HDR_OPT_TCP_SYN) + return -EINVAL; + + search_kind = search[0]; + search_len = search[1]; + + if (search_len > len || search_kind == TCPOPT_NOP || + search_kind == TCPOPT_EOL) + return -EINVAL; + + if (search_kind == TCPOPT_EXP || search_kind == 253) { + /* 16 or 32 bit magic. +2 for kind and kind length */ + if (search_len != 4 && search_len != 6) + return -EINVAL; + magic = &search[2]; + magic_len = search_len - 2; + } else { + if (search_len) + return -EINVAL; + magic = NULL; + magic_len = 0; + } + + if (load_syn) { + ret = bpf_sock_ops_get_syn(bpf_sock, TCP_BPF_SYN, &op); + if (ret < 0) + return ret; + + opend = op + ret; + op += sizeof(struct tcphdr); + } else { + if (!bpf_sock->skb || + bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB) + /* This bpf_sock->op cannot call this helper */ + return -EPERM; + + opend = bpf_sock->skb_data_end; + op = bpf_sock->skb->data + sizeof(struct tcphdr); + } + + op = bpf_search_tcp_opt(op, opend, search_kind, magic, magic_len, + &eol); + if (IS_ERR(op)) + return PTR_ERR(op); + + copy_len = op[1]; + ret = copy_len; + if (copy_len > len) { + ret = -ENOSPC; + copy_len = len; + } + + memcpy(search_res, op, copy_len); + return ret; +} + +static const struct bpf_func_proto bpf_sock_ops_load_hdr_opt_proto = { + .func = bpf_sock_ops_load_hdr_opt, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_PTR_TO_MEM, + .arg3_type = ARG_CONST_SIZE, + .arg4_type = ARG_ANYTHING, +}; + +BPF_CALL_4(bpf_sock_ops_store_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock, + const void *, from, u32, len, u64, flags) +{ + u8 new_kind, new_kind_len, magic_len = 0, *opend; + const u8 *op, *new_op, *magic = NULL; + struct sk_buff *skb; + bool eol; + + if (bpf_sock->op != BPF_SOCK_OPS_WRITE_HDR_OPT_CB) + return -EPERM; + + if (len < 2 || flags) + return -EINVAL; + + new_op = from; + new_kind = new_op[0]; + new_kind_len = new_op[1]; + + if (new_kind_len > len || new_kind == TCPOPT_NOP || + new_kind == TCPOPT_EOL) + return -EINVAL; + + if (new_kind_len > bpf_sock->remaining_opt_len) + return -ENOSPC; + + /* 253 is another experimental kind */ + if (new_kind == TCPOPT_EXP || new_kind == 253) { + if (new_kind_len < 4) + return -EINVAL; + /* Match for the 
2 byte magic also. + * RFC 6994: the magic could be 2 or 4 bytes. + * Hence, matching by 2 byte only is on the + * conservative side but it is the right + * thing to do for the 'search-for-duplication' + * purpose. + */ + magic = &new_op[2]; + magic_len = 2; + } + + /* Check for duplication */ + skb = bpf_sock->skb; + op = skb->data + sizeof(struct tcphdr); + opend = bpf_sock->skb_data_end; + + op = bpf_search_tcp_opt(op, opend, new_kind, magic, magic_len, + &eol); + if (!IS_ERR(op)) + return -EEXIST; + + if (PTR_ERR(op) != -ENOMSG) + return PTR_ERR(op); + + if (eol) + /* The option has been ended. Treat it as no more + * header option can be written. + */ + return -ENOSPC; + + /* No duplication found. Store the header option. */ + memcpy(opend, from, new_kind_len); + + bpf_sock->remaining_opt_len -= new_kind_len; + bpf_sock->skb_data_end += new_kind_len; + + return 0; +} + +static const struct bpf_func_proto bpf_sock_ops_store_hdr_opt_proto = { + .func = bpf_sock_ops_store_hdr_opt, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_PTR_TO_MEM, + .arg3_type = ARG_CONST_SIZE, + .arg4_type = ARG_ANYTHING, +}; + +BPF_CALL_3(bpf_sock_ops_reserve_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock, + u32, len, u64, flags) +{ + if (bpf_sock->op != BPF_SOCK_OPS_HDR_OPT_LEN_CB) + return -EPERM; + + if (flags || len < 2) + return -EINVAL; + + if (len > bpf_sock->remaining_opt_len) + return -ENOSPC; + + bpf_sock->remaining_opt_len -= len; + + return 0; +} + +static const struct bpf_func_proto bpf_sock_ops_reserve_hdr_opt_proto = { + .func = bpf_sock_ops_reserve_hdr_opt, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_ANYTHING, +}; + #endif /* CONFIG_INET */
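Taken together, the three helpers are meant to be called from the header-option callbacks added by this series: bpf_reserve_hdr_opt() from BPF_SOCK_OPS_HDR_OPT_LEN_CB, bpf_store_hdr_opt() from BPF_SOCK_OPS_WRITE_HDR_OPT_CB, and bpf_load_hdr_opt() when parsing. A rough sock_ops sketch that writes and later re-parses a made-up RFC 6994 experimental option (kind 254 with an arbitrary 2-byte magic); boilerplate as in the sketches above, and the option layout, magic value and callback wiring are illustrative assumptions, not part of this commit:

struct tcp_exp_opt {
	__u8 kind;
	__u8 len;
	__u8 magic[2];
	__u8 data[2];
} __attribute__((packed));

SEC("sockops")
int hdr_opt_demo(struct bpf_sock_ops *skops)
{
	struct tcp_exp_opt opt = {
		.kind	= 254,			/* TCPOPT_EXP */
		.len	= sizeof(opt),
		.magic	= { 0xeb, 0x9f },	/* hypothetical ExID */
	};

	switch (skops->op) {
	case BPF_SOCK_OPS_TCP_CONNECT_CB:
	case BPF_SOCK_OPS_TCP_LISTEN_CB:
		/* ask for the option length/write/parse callbacks */
		bpf_sock_ops_cb_flags_set(skops,
					  BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG |
					  BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG);
		break;
	case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
		/* first pass: only book the bytes */
		bpf_reserve_hdr_opt(skops, sizeof(opt), 0);
		break;
	case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
		/* second pass: emit the option into the reserved space */
		opt.data[0] = 0x01;
		bpf_store_hdr_opt(skops, &opt, sizeof(opt), 0);
		break;
	case BPF_SOCK_OPS_PARSE_HDR_OPT_CB:
		/* receive side: search by kind plus the 2-byte magic
		 * (search length 4); on success the whole matching option
		 * (kind, length, payload) is copied back into 'opt'
		 */
		opt.len = 4;
		bpf_load_hdr_opt(skops, &opt, sizeof(opt), 0);
		break;
	}
	return 1;
}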
bool bpf_helper_changes_pkt_data(void *func) @@@ -6509,9 -6179,6 +6510,9 @@@ func == bpf_lwt_seg6_store_bytes || func == bpf_lwt_seg6_adjust_srh || func == bpf_lwt_seg6_action || +#endif +#ifdef CONFIG_INET + func == bpf_sock_ops_store_hdr_opt || #endif func == bpf_lwt_in_push_encap || func == bpf_lwt_xmit_push_encap) @@@ -6884,12 -6551,6 +6885,12 @@@ sock_ops_func_proto(enum bpf_func_id fu case BPF_FUNC_sk_storage_delete: return &bpf_sk_storage_delete_proto; #ifdef CONFIG_INET + case BPF_FUNC_load_hdr_opt: + return &bpf_sock_ops_load_hdr_opt_proto; + case BPF_FUNC_store_hdr_opt: + return &bpf_sock_ops_store_hdr_opt_proto; + case BPF_FUNC_reserve_hdr_opt: + return &bpf_sock_ops_reserve_hdr_opt_proto; case BPF_FUNC_tcp_sock: return &bpf_tcp_sock_proto; #endif /* CONFIG_INET */ @@@ -7405,8 -7066,6 +7406,6 @@@ static int bpf_gen_ld_abs(const struct bool indirect = BPF_MODE(orig->code) == BPF_IND; struct bpf_insn *insn = insn_buf;
- /* We're guaranteed here that CTX is in R6. */ - *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_CTX); if (!indirect) { *insn++ = BPF_MOV64_IMM(BPF_REG_2, orig->imm); } else { @@@ -7414,6 -7073,8 +7413,8 @@@ if (orig->imm) *insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, orig->imm); } + /* We're guaranteed here that CTX is in R6. */ + *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_CTX);
switch (BPF_SIZE(orig->code)) { case BPF_B: @@@ -7689,20 -7350,6 +7690,20 @@@ static bool sock_ops_is_valid_access(in return false; info->reg_type = PTR_TO_SOCKET_OR_NULL; break; + case offsetof(struct bpf_sock_ops, skb_data): + if (size != sizeof(__u64)) + return false; + info->reg_type = PTR_TO_PACKET; + break; + case offsetof(struct bpf_sock_ops, skb_data_end): + if (size != sizeof(__u64)) + return false; + info->reg_type = PTR_TO_PACKET_END; + break; + case offsetof(struct bpf_sock_ops, skb_tcp_flags): + bpf_ctx_record_field_size(info, size_default); + return bpf_ctx_narrow_access_ok(off, size, + size_default); default: if (size != size_default) return false; @@@ -8804,22 -8451,17 +8805,22 @@@ static u32 sock_ops_convert_ctx_access( return insn - insn_buf;
switch (si->off) { - case offsetof(struct bpf_sock_ops, op) ... + case offsetof(struct bpf_sock_ops, op): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern, + op), + si->dst_reg, si->src_reg, + offsetof(struct bpf_sock_ops_kern, op)); + break; + + case offsetof(struct bpf_sock_ops, replylong[0]) ... offsetof(struct bpf_sock_ops, replylong[3]): - BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, op) != - sizeof_field(struct bpf_sock_ops_kern, op)); BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, reply) != sizeof_field(struct bpf_sock_ops_kern, reply)); BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, replylong) != sizeof_field(struct bpf_sock_ops_kern, replylong)); off = si->off; - off -= offsetof(struct bpf_sock_ops, op); - off += offsetof(struct bpf_sock_ops_kern, op); + off -= offsetof(struct bpf_sock_ops, replylong[0]); + off += offsetof(struct bpf_sock_ops_kern, replylong[0]); if (type == BPF_WRITE) *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, off); @@@ -9040,49 -8682,6 +9041,49 @@@ case offsetof(struct bpf_sock_ops, sk): SOCK_OPS_GET_SK(); break; + case offsetof(struct bpf_sock_ops, skb_data_end): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern, + skb_data_end), + si->dst_reg, si->src_reg, + offsetof(struct bpf_sock_ops_kern, + skb_data_end)); + break; + case offsetof(struct bpf_sock_ops, skb_data): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern, + skb), + si->dst_reg, si->src_reg, + offsetof(struct bpf_sock_ops_kern, + skb)); + *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1); + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data), + si->dst_reg, si->dst_reg, + offsetof(struct sk_buff, data)); + break; + case offsetof(struct bpf_sock_ops, skb_len): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern, + skb), + si->dst_reg, si->src_reg, + offsetof(struct bpf_sock_ops_kern, + skb)); + *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1); + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, len), + si->dst_reg, si->dst_reg, + offsetof(struct sk_buff, len)); + break; + case offsetof(struct bpf_sock_ops, skb_tcp_flags): + off = offsetof(struct sk_buff, cb); + off += offsetof(struct tcp_skb_cb, tcp_flags); + *target_size = sizeof_field(struct tcp_skb_cb, tcp_flags); + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern, + skb), + si->dst_reg, si->src_reg, + offsetof(struct bpf_sock_ops_kern, + skb)); + *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1); + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct tcp_skb_cb, + tcp_flags), + si->dst_reg, si->dst_reg, off); + break; } return insn - insn_buf; } @@@ -9924,7 -9523,7 +9925,7 @@@ BPF_CALL_1(bpf_skc_to_tcp6_sock, struc * trigger an explicit type generation here. */ BTF_TYPE_EMIT(struct tcp6_sock); - if (sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP && + if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP && sk->sk_family == AF_INET6) return (unsigned long)sk;
@@@ -9942,7 -9541,7 +9943,7 @@@ const struct bpf_func_proto bpf_skc_to_
BPF_CALL_1(bpf_skc_to_tcp_sock, struct sock *, sk) { - if (sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP) + if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP) return (unsigned long)sk;
return (unsigned long)NULL; @@@ -9960,12 -9559,12 +9961,12 @@@ const struct bpf_func_proto bpf_skc_to_ BPF_CALL_1(bpf_skc_to_tcp_timewait_sock, struct sock *, sk) { #ifdef CONFIG_INET - if (sk->sk_prot == &tcp_prot && sk->sk_state == TCP_TIME_WAIT) + if (sk && sk->sk_prot == &tcp_prot && sk->sk_state == TCP_TIME_WAIT) return (unsigned long)sk; #endif
#if IS_BUILTIN(CONFIG_IPV6) - if (sk->sk_prot == &tcpv6_prot && sk->sk_state == TCP_TIME_WAIT) + if (sk && sk->sk_prot == &tcpv6_prot && sk->sk_state == TCP_TIME_WAIT) return (unsigned long)sk; #endif
@@@ -9984,12 -9583,12 +9985,12 @@@ const struct bpf_func_proto bpf_skc_to_ BPF_CALL_1(bpf_skc_to_tcp_request_sock, struct sock *, sk) { #ifdef CONFIG_INET - if (sk->sk_prot == &tcp_prot && sk->sk_state == TCP_NEW_SYN_RECV) + if (sk && sk->sk_prot == &tcp_prot && sk->sk_state == TCP_NEW_SYN_RECV) return (unsigned long)sk; #endif
#if IS_BUILTIN(CONFIG_IPV6) - if (sk->sk_prot == &tcpv6_prot && sk->sk_state == TCP_NEW_SYN_RECV) + if (sk && sk->sk_prot == &tcpv6_prot && sk->sk_state == TCP_NEW_SYN_RECV) return (unsigned long)sk; #endif
@@@ -10011,7 -9610,7 +10012,7 @@@ BPF_CALL_1(bpf_skc_to_udp6_sock, struc * trigger an explicit type generation here. */ BTF_TYPE_EMIT(struct udp6_sock); - if (sk_fullsock(sk) && sk->sk_protocol == IPPROTO_UDP && + if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_UDP && sk->sk_type == SOCK_DGRAM && sk->sk_family == AF_INET6) return (unsigned long)sk;
diff --combined net/dsa/slave.c index 2d52bfba110a,16e5f98d4882..e7c1d62fde99 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@@ -303,36 -303,13 +303,36 @@@ static int dsa_slave_port_attr_set(stru return ret; }
+/* Must be called under rcu_read_lock() */ +static int +dsa_slave_vlan_check_for_8021q_uppers(struct net_device *slave, + const struct switchdev_obj_port_vlan *vlan) +{ + struct net_device *upper_dev; + struct list_head *iter; + + netdev_for_each_upper_dev_rcu(slave, upper_dev, iter) { + u16 vid; + + if (!is_vlan_dev(upper_dev)) + continue; + + vid = vlan_dev_vlan_id(upper_dev); + if (vid >= vlan->vid_begin && vid <= vlan->vid_end) + return -EBUSY; + } + + return 0; +} + static int dsa_slave_vlan_add(struct net_device *dev, const struct switchdev_obj *obj, struct switchdev_trans *trans) { + struct net_device *master = dsa_slave_to_master(dev); struct dsa_port *dp = dsa_slave_to_port(dev); struct switchdev_obj_port_vlan vlan; - int err; + int vid, err;
if (obj->orig_dev != dev) return -EOPNOTSUPP; @@@ -342,17 -319,6 +342,17 @@@
vlan = *SWITCHDEV_OBJ_PORT_VLAN(obj);
+ /* Deny adding a bridge VLAN when there is already an 802.1Q upper with + * the same VID. + */ + if (trans->ph_prepare && br_vlan_enabled(dp->bridge_dev)) { + rcu_read_lock(); + err = dsa_slave_vlan_check_for_8021q_uppers(dev, &vlan); + rcu_read_unlock(); + if (err) + return err; + } + err = dsa_port_vlan_add(dp, &vlan, trans); if (err) return err; @@@ -367,12 -333,6 +367,12 @@@ if (err) return err;
+ for (vid = vlan.vid_begin; vid <= vlan.vid_end; vid++) { + err = vlan_vid_add(master, htons(ETH_P_8021Q), vid); + if (err) + return err; + } + return 0; }
@@@ -416,10 -376,7 +416,10 @@@ static int dsa_slave_port_obj_add(struc static int dsa_slave_vlan_del(struct net_device *dev, const struct switchdev_obj *obj) { + struct net_device *master = dsa_slave_to_master(dev); struct dsa_port *dp = dsa_slave_to_port(dev); + struct switchdev_obj_port_vlan *vlan; + int vid, err;
if (obj->orig_dev != dev) return -EOPNOTSUPP; @@@ -427,19 -384,10 +427,19 @@@ if (dsa_port_skip_vlan_configuration(dp)) return 0;
+ vlan = SWITCHDEV_OBJ_PORT_VLAN(obj); + /* Do not deprogram the CPU port as it may be shared with other user * ports which can be members of this VLAN as well. */ - return dsa_port_vlan_del(dp, SWITCHDEV_OBJ_PORT_VLAN(obj)); + err = dsa_port_vlan_del(dp, vlan); + if (err) + return err; + + for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) + vlan_vid_del(master, htons(ETH_P_8021Q), vid); + + return 0; }
static int dsa_slave_port_obj_del(struct net_device *dev, @@@ -1284,66 -1232,64 +1284,66 @@@ static int dsa_slave_get_ts_info(struc static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid) { + struct net_device *master = dsa_slave_to_master(dev); struct dsa_port *dp = dsa_slave_to_port(dev); - struct bridge_vlan_info info; + struct switchdev_obj_port_vlan vlan = { + .obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN, + .vid_begin = vid, + .vid_end = vid, + /* This API only allows programming tagged, non-PVID VIDs */ + .flags = 0, + }; + struct switchdev_trans trans; int ret;
- /* Check for a possible bridge VLAN entry now since there is no - * need to emulate the switchdev prepare + commit phase. - */ - if (dp->bridge_dev) { - if (dsa_port_skip_vlan_configuration(dp)) - return 0; + /* User port... */ + trans.ph_prepare = true; + ret = dsa_port_vlan_add(dp, &vlan, &trans); + if (ret) + return ret;
- /* br_vlan_get_info() returns -EINVAL or -ENOENT if the - * device, respectively the VID is not found, returning - * 0 means success, which is a failure for us here. - */ - ret = br_vlan_get_info(dp->bridge_dev, vid, &info); - if (ret == 0) - return -EBUSY; - } + trans.ph_prepare = false; + ret = dsa_port_vlan_add(dp, &vlan, &trans); + if (ret) + return ret;
- ret = dsa_port_vid_add(dp, vid, 0); + /* And CPU port... */ + trans.ph_prepare = true; + ret = dsa_port_vlan_add(dp->cpu_dp, &vlan, &trans); if (ret) return ret;
- ret = dsa_port_vid_add(dp->cpu_dp, vid, 0); + trans.ph_prepare = false; + ret = dsa_port_vlan_add(dp->cpu_dp, &vlan, &trans); if (ret) return ret;
- return 0; + return vlan_vid_add(master, proto, vid); }
static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid) { + struct net_device *master = dsa_slave_to_master(dev); struct dsa_port *dp = dsa_slave_to_port(dev); - struct bridge_vlan_info info; - int ret; - - /* Check for a possible bridge VLAN entry now since there is no - * need to emulate the switchdev prepare + commit phase. - */ - if (dp->bridge_dev) { - if (dsa_port_skip_vlan_configuration(dp)) - return 0; - - /* br_vlan_get_info() returns -EINVAL or -ENOENT if the - * device, respectively the VID is not found, returning - * 0 means success, which is a failure for us here. - */ - ret = br_vlan_get_info(dp->bridge_dev, vid, &info); - if (ret == 0) - return -EBUSY; - } + struct switchdev_obj_port_vlan vlan = { + .vid_begin = vid, + .vid_end = vid, + /* This API only allows programming tagged, non-PVID VIDs */ + .flags = 0, + }; + int err;
/* Do not deprogram the CPU port as it may be shared with other user * ports which can be members of this VLAN as well. */ - return dsa_port_vid_del(dp, vid); + err = dsa_port_vlan_del(dp, &vlan); + if (err) + return err; + + vlan_vid_del(master, proto, vid); + + return 0; }
struct dsa_hw_port { @@@ -1838,7 -1784,7 +1838,7 @@@ int dsa_slave_create(struct dsa_port *p rtnl_lock(); ret = dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN); rtnl_unlock(); - if (ret) + if (ret && ret != -EOPNOTSUPP) dev_warn(ds->dev, "nonfatal error %d setting MTU on port %d\n", ret, port->index);
@@@ -1846,23 -1792,34 +1846,35 @@@
ret = dsa_slave_phy_setup(slave_dev); if (ret) { - netdev_err(master, "error %d setting up slave PHY for %s\n", - ret, slave_dev->name); + netdev_err(slave_dev, + "error %d setting up PHY for tree %d, switch %d, port %d\n", + ret, ds->dst->index, ds->index, port->index); goto out_gcells; }
dsa_slave_notify(slave_dev, DSA_PORT_REGISTER);
- ret = register_netdev(slave_dev); + rtnl_lock(); + + ret = register_netdevice(slave_dev); if (ret) { netdev_err(master, "error %d registering interface %s\n", ret, slave_dev->name); + rtnl_unlock(); goto out_phy; }
+ ret = netdev_upper_dev_link(master, slave_dev, NULL); + + rtnl_unlock(); + + if (ret) + goto out_unregister; + return 0;
+ out_unregister: + unregister_netdev(slave_dev); out_phy: rtnl_lock(); phylink_disconnect_phy(p->dp->pl); @@@ -1879,16 -1836,18 +1891,18 @@@ out_free
void dsa_slave_destroy(struct net_device *slave_dev) { + struct net_device *master = dsa_slave_to_master(slave_dev); struct dsa_port *dp = dsa_slave_to_port(slave_dev); struct dsa_slave_priv *p = netdev_priv(slave_dev);
netif_carrier_off(slave_dev); rtnl_lock(); + netdev_upper_dev_unlink(master, slave_dev); + unregister_netdevice(slave_dev); phylink_disconnect_phy(dp->pl); rtnl_unlock();
dsa_slave_notify(slave_dev, DSA_PORT_UNREGISTER); - unregister_netdev(slave_dev); phylink_destroy(dp->pl); gro_cells_destroy(&p->gcells); free_percpu(p->stats64); @@@ -1921,9 -1880,9 +1935,9 @@@ static int dsa_slave_changeupper(struc return err; }
-static int dsa_slave_upper_vlan_check(struct net_device *dev, - struct netdev_notifier_changeupper_info * - info) +static int +dsa_prevent_bridging_8021q_upper(struct net_device *dev, + struct netdev_notifier_changeupper_info *info) { struct netlink_ext_ack *ext_ack; struct net_device *slave; @@@ -1953,56 -1912,14 +1967,56 @@@ return NOTIFY_DONE; }
+static int +dsa_slave_check_8021q_upper(struct net_device *dev, + struct netdev_notifier_changeupper_info *info) +{ + struct dsa_port *dp = dsa_slave_to_port(dev); + struct net_device *br = dp->bridge_dev; + struct bridge_vlan_info br_info; + struct netlink_ext_ack *extack; + int err = NOTIFY_DONE; + u16 vid; + + if (!br || !br_vlan_enabled(br)) + return NOTIFY_DONE; + + extack = netdev_notifier_info_to_extack(&info->info); + vid = vlan_dev_vlan_id(info->upper_dev); + + /* br_vlan_get_info() returns -EINVAL or -ENOENT if the + * device, respectively the VID is not found, returning + * 0 means success, which is a failure for us here. + */ + err = br_vlan_get_info(br, vid, &br_info); + if (err == 0) { + NL_SET_ERR_MSG_MOD(extack, + "This VLAN is already configured by the bridge"); + return notifier_from_errno(-EBUSY); + } + + return NOTIFY_DONE; +} + static int dsa_slave_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr);
- if (event == NETDEV_CHANGEUPPER) { + switch (event) { + case NETDEV_PRECHANGEUPPER: { + struct netdev_notifier_changeupper_info *info = ptr; + + if (!dsa_slave_dev_check(dev)) + return dsa_prevent_bridging_8021q_upper(dev, ptr); + + if (is_vlan_dev(info->upper_dev)) + return dsa_slave_check_8021q_upper(dev, ptr); + break; + } + case NETDEV_CHANGEUPPER: if (!dsa_slave_dev_check(dev)) - return dsa_slave_upper_vlan_check(dev, ptr); + return NOTIFY_DONE;
return dsa_slave_changeupper(dev, ptr); } diff --combined net/ipv4/inet_diag.c index 93816d47e55a,f1bd95f243b3..366a4507b5a3 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@@ -125,7 -125,6 +125,7 @@@ int inet_diag_msg_attrs_fill(struct soc bool net_admin) { const struct inet_sock *inet = inet_sk(sk); + struct inet_diag_sockopt inet_sockopt;
if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown)) goto errout; @@@ -181,30 -180,14 +181,30 @@@ r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk)); r->idiag_inode = sock_i_ino(sk);
+ memset(&inet_sockopt, 0, sizeof(inet_sockopt)); + inet_sockopt.recverr = inet->recverr; + inet_sockopt.is_icsk = inet->is_icsk; + inet_sockopt.freebind = inet->freebind; + inet_sockopt.hdrincl = inet->hdrincl; + inet_sockopt.mc_loop = inet->mc_loop; + inet_sockopt.transparent = inet->transparent; + inet_sockopt.mc_all = inet->mc_all; + inet_sockopt.nodefrag = inet->nodefrag; + inet_sockopt.bind_address_no_port = inet->bind_address_no_port; + inet_sockopt.recverr_rfc4884 = inet->recverr_rfc4884; + inet_sockopt.defer_connect = inet->defer_connect; + if (nla_put(skb, INET_DIAG_SOCKOPT, sizeof(inet_sockopt), + &inet_sockopt)) + goto errout; + return 0; errout: return 1; } EXPORT_SYMBOL_GPL(inet_diag_msg_attrs_fill);
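On the wire the new attribute is simply the two-byte bitfield struct filled in above (struct inet_diag_sockopt from the uapi header added by this series, one bit per inet_sock field). A hypothetical userspace reader, assuming the INET_DIAG_SOCKOPT attribute has already been located while walking the inet_diag response:

#include <stdio.h>
#include <linux/netlink.h>
#include <linux/inet_diag.h>

/* 'attr' points at a well-formed INET_DIAG_SOCKOPT attribute */
static void print_sockopt(const struct nlattr *attr)
{
	const struct inet_diag_sockopt *so = (const void *)(attr + 1);

	printf("recverr %u transparent %u freebind %u defer_connect %u\n",
	       so->recverr, so->transparent, so->freebind, so->defer_connect);
}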
- static void inet_diag_parse_attrs(const struct nlmsghdr *nlh, int hdrlen, - struct nlattr **req_nlas) + static int inet_diag_parse_attrs(const struct nlmsghdr *nlh, int hdrlen, + struct nlattr **req_nlas) { struct nlattr *nla; int remaining; @@@ -212,9 -195,13 +212,13 @@@ nlmsg_for_each_attr(nla, nlh, hdrlen, remaining) { int type = nla_type(nla);
+ if (type == INET_DIAG_REQ_PROTOCOL && nla_len(nla) != sizeof(u32)) + return -EINVAL; + if (type < __INET_DIAG_REQ_MAX) req_nlas[type] = nla; } + return 0; }
static int inet_diag_get_protocol(const struct inet_diag_req_v2 *req, @@@ -591,7 -578,10 +595,10 @@@ static int inet_diag_cmd_exact(int cmd int err, protocol;
memset(&dump_data, 0, sizeof(dump_data)); - inet_diag_parse_attrs(nlh, hdrlen, dump_data.req_nlas); + err = inet_diag_parse_attrs(nlh, hdrlen, dump_data.req_nlas); + if (err) + return err; + protocol = inet_diag_get_protocol(req, &dump_data);
handler = inet_diag_lock_handler(protocol); @@@ -1197,8 -1187,11 +1204,11 @@@ static int __inet_diag_dump_start(struc if (!cb_data) return -ENOMEM;
- inet_diag_parse_attrs(nlh, hdrlen, cb_data->req_nlas); - + err = inet_diag_parse_attrs(nlh, hdrlen, cb_data->req_nlas); + if (err) { + kfree(cb_data); + return err; + } nla = cb_data->inet_diag_nla_bc; if (nla) { err = inet_diag_bc_audit(nla, skb); diff --combined net/ipv4/ip_output.c index 5fb536ff51f0,e6f2ada9e7d5..8b200252c655 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@@ -74,6 -74,7 +74,7 @@@ #include <net/icmp.h> #include <net/checksum.h> #include <net/inetpeer.h> + #include <net/inet_ecn.h> #include <net/lwtunnel.h> #include <linux/bpf-cgroup.h> #include <linux/igmp.h> @@@ -142,8 -143,7 +143,8 @@@ static inline int ip_select_ttl(struct * */ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk, - __be32 saddr, __be32 daddr, struct ip_options_rcu *opt) + __be32 saddr, __be32 daddr, struct ip_options_rcu *opt, + u8 tos) { struct inet_sock *inet = inet_sk(sk); struct rtable *rt = skb_rtable(skb); @@@ -156,7 -156,7 +157,7 @@@ iph = ip_hdr(skb); iph->version = 4; iph->ihl = 5; - iph->tos = inet->tos; + iph->tos = tos; iph->ttl = ip_select_ttl(inet, &rt->dst); iph->daddr = (opt && opt->opt.srr ? opt->opt.faddr : daddr); iph->saddr = saddr; @@@ -997,7 -997,7 +998,7 @@@ static int __ip_append_data(struct soc
fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0); maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen; - maxnonfragsize = ip_sk_ignore_df(sk) ? 0xFFFF : mtu; + maxnonfragsize = ip_sk_ignore_df(sk) ? IP_MAX_MTU : mtu;
if (cork->length + length > maxnonfragsize - fragheaderlen) { ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, @@@ -1352,7 -1352,7 +1353,7 @@@ ssize_t ip_append_page(struct sock *sk if (cork->flags & IPCORK_OPT) opt = cork->opt;
- if (!(rt->dst.dev->features&NETIF_F_SG)) + if (!(rt->dst.dev->features & NETIF_F_SG)) return -EOPNOTSUPP;
hh_len = LL_RESERVED_SPACE(rt->dst.dev); @@@ -1537,7 -1537,7 +1538,7 @@@ struct sk_buff *__ip_make_skb(struct so ip_select_ident(net, skb, sk);
if (opt) { - iph->ihl += opt->optlen>>2; + iph->ihl += opt->optlen >> 2; ip_options_build(skb, opt, cork->addr, rt, 0); }
@@@ -1704,7 -1704,7 +1705,7 @@@ void ip_send_unicast_reply(struct sock if (IS_ERR(rt)) return;
- inet_sk(sk)->tos = arg->tos; + inet_sk(sk)->tos = arg->tos & ~INET_ECN_MASK;
sk->sk_protocol = ip_hdr(skb)->protocol; sk->sk_bound_dev_if = arg->bound_dev_if; diff --combined net/ipv4/route.c index 2c05b863ae43,58642b29a499..d15a78b26dfa --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@@ -623,7 -623,7 +623,7 @@@ static inline u32 fnhe_hashfun(__be32 d u32 hval;
net_get_random_once(&fnhe_hashrnd, sizeof(fnhe_hashrnd)); - hval = jhash_1word((__force u32) daddr, fnhe_hashrnd); + hval = jhash_1word((__force u32)daddr, fnhe_hashrnd); return hash_32(hval, FNHE_HASH_SHIFT); }
@@@ -786,8 -786,10 +786,10 @@@ static void __ip_do_redirect(struct rta neigh_event_send(n, NULL); } else { if (fib_lookup(net, fl4, &res, 0) == 0) { - struct fib_nh_common *nhc = FIB_RES_NHC(res); + struct fib_nh_common *nhc;
+ fib_select_path(net, &res, fl4, skb); + nhc = FIB_RES_NHC(res); update_or_create_fnhe(nhc, fl4->daddr, new_gw, 0, false, jiffies + ip_rt_gc_timeout); @@@ -1013,14 -1015,14 +1015,15 @@@ out: kfree_skb(skb) static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) { struct dst_entry *dst = &rt->dst; + struct net *net = dev_net(dst->dev); - u32 old_mtu = ipv4_mtu(dst); struct fib_result res; bool lock = false; + u32 old_mtu;
if (ip_mtu_locked(dst)) return;
+ old_mtu = ipv4_mtu(dst); if (old_mtu < mtu) return;
@@@ -1034,9 -1036,11 +1037,11 @@@ return;
rcu_read_lock(); - if (fib_lookup(dev_net(dst->dev), fl4, &res, 0) == 0) { - struct fib_nh_common *nhc = FIB_RES_NHC(res); + if (fib_lookup(net, fl4, &res, 0) == 0) { + struct fib_nh_common *nhc;
+ fib_select_path(net, &res, fl4, NULL); + nhc = FIB_RES_NHC(res); update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock, jiffies + ip_rt_mtu_expires); } @@@ -1062,7 -1066,7 +1067,7 @@@ static void ip_rt_update_pmtu(struct ds void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu, int oif, u8 protocol) { - const struct iphdr *iph = (const struct iphdr *) skb->data; + const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt; u32 mark = IP4_REPLY_MARK(net, skb->mark); @@@ -1079,7 -1083,7 +1084,7 @@@ EXPORT_SYMBOL_GPL(ipv4_update_pmtu)
static void __ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) { - const struct iphdr *iph = (const struct iphdr *) skb->data; + const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt;
@@@ -1097,7 -1101,7 +1102,7 @@@
void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) { - const struct iphdr *iph = (const struct iphdr *) skb->data; + const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt; struct dst_entry *odst = NULL; @@@ -1127,7 -1131,7 +1132,7 @@@ new = true; }
- __ip_rt_update_pmtu((struct rtable *) xfrm_dst_path(&rt->dst), &fl4, mtu); + __ip_rt_update_pmtu((struct rtable *)xfrm_dst_path(&rt->dst), &fl4, mtu);
if (!dst_check(&rt->dst, 0)) { if (new) @@@ -1152,7 -1156,7 +1157,7 @@@ EXPORT_SYMBOL_GPL(ipv4_sk_update_pmtu) void ipv4_redirect(struct sk_buff *skb, struct net *net, int oif, u8 protocol) { - const struct iphdr *iph = (const struct iphdr *) skb->data; + const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt;
@@@ -1168,7 -1172,7 +1173,7 @@@ EXPORT_SYMBOL_GPL(ipv4_redirect)
void ipv4_sk_redirect(struct sk_buff *skb, struct sock *sk) { - const struct iphdr *iph = (const struct iphdr *) skb->data; + const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt; struct net *net = sock_net(sk); @@@ -1308,7 -1312,7 +1313,7 @@@ static unsigned int ipv4_default_advmss
static unsigned int ipv4_mtu(const struct dst_entry *dst) { - const struct rtable *rt = (const struct rtable *) dst; + const struct rtable *rt = (const struct rtable *)dst; unsigned int mtu = rt->rt_pmtu;
if (!mtu || time_after_eq(jiffies, rt->dst.expires)) @@@ -2148,6 -2152,7 +2153,7 @@@ static int ip_route_input_slow(struct s fl4.daddr = daddr; fl4.saddr = saddr; fl4.flowi4_uid = sock_net_uid(net, NULL); + fl4.flowi4_multipath_hash = 0;
if (fib4_rules_early_flow_dissect(net, skb, &fl4, &_flkeys)) { flkeys = &_flkeys; @@@ -2668,8 -2673,6 +2674,6 @@@ struct rtable *ip_route_output_key_hash fib_select_path(net, res, fl4, skb);
dev_out = FIB_RES_DEV(*res); - fl4->flowi4_oif = dev_out->ifindex; -
make_route: rth = __mkroute_output(res, fl4, orig_oif, dev_out, flags); diff --combined net/ipv6/ip6_fib.c index 44d68ed70f24,4a664ad4f4d4..141c0a4c569a --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@@ -1812,14 -1812,10 +1812,14 @@@ static struct fib6_node *fib6_repair_tr
children = 0; child = NULL; - if (fn_r) - child = fn_r, children |= 1; - if (fn_l) - child = fn_l, children |= 2; + if (fn_r) { + child = fn_r; + children |= 1; + } + if (fn_l) { + child = fn_l; + children |= 2; + }
if (children == 3 || FIB6_SUBTREE(fn) #ifdef CONFIG_IPV6_SUBTREES @@@ -1997,14 -1993,19 +1997,19 @@@ static void fib6_del_route(struct fib6_ /* Need to own table->tb6_lock */ int fib6_del(struct fib6_info *rt, struct nl_info *info) { - struct fib6_node *fn = rcu_dereference_protected(rt->fib6_node, - lockdep_is_held(&rt->fib6_table->tb6_lock)); - struct fib6_table *table = rt->fib6_table; struct net *net = info->nl_net; struct fib6_info __rcu **rtp; struct fib6_info __rcu **rtp_next; + struct fib6_table *table; + struct fib6_node *fn; + + if (rt == net->ipv6.fib6_null_entry) + return -ENOENT;
- if (!fn || rt == net->ipv6.fib6_null_entry) + table = rt->fib6_table; + fn = rcu_dereference_protected(rt->fib6_node, + lockdep_is_held(&table->tb6_lock)); + if (!fn) return -ENOENT;
WARN_ON(!(fn->fn_flags & RTN_RTINFO)); diff --combined net/ipv6/route.c index e8ee20720fe0,fb075d9545b9..bde6d48a9942 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@@ -4202,7 -4202,7 +4202,7 @@@ static struct fib6_info *rt6_add_route_ .fc_nlinfo.nl_net = net, };
- cfg.fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO, + cfg.fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO; cfg.fc_dst = *prefix; cfg.fc_gateway = *gwaddr;
@@@ -5284,10 -5284,9 +5284,10 @@@ static int ip6_route_multipath_del(stru { struct fib6_config r_cfg; struct rtnexthop *rtnh; + int last_err = 0; int remaining; int attrlen; - int err = 1, last_err = 0; + int err;
remaining = cfg->fc_mp_len; rtnh = (struct rtnexthop *)cfg->fc_mp; diff --combined net/mac80211/mlme.c index 50a9b9025725,2e400b0ff696..2489c6c64c2d --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@@ -2432,6 -2432,23 +2432,6 @@@ static void ieee80211_set_disassoc(stru sdata->encrypt_headroom = IEEE80211_ENCRYPT_HEADROOM; }
-void ieee80211_sta_rx_notify(struct ieee80211_sub_if_data *sdata, - struct ieee80211_hdr *hdr) -{ - /* - * We can postpone the mgd.timer whenever receiving unicast frames - * from AP because we know that the connection is working both ways - * at that time. But multicast frames (and hence also beacons) must - * be ignored here, because we need to trigger the timer during - * data idle periods for sending the periodic probe request to the - * AP we're connected to. - */ - if (is_multicast_ether_addr(hdr->addr1)) - return; - - ieee80211_sta_reset_conn_monitor(sdata); -} - static void ieee80211_reset_ap_probe(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; @@@ -2504,13 -2521,21 +2504,13 @@@ void ieee80211_sta_tx_notify(struct iee { ieee80211_sta_tx_wmm_ac_notify(sdata, hdr, tx_time);
- if (!ieee80211_is_data(hdr->frame_control)) - return; - - if (ieee80211_is_any_nullfunc(hdr->frame_control) && - sdata->u.mgd.probe_send_count > 0) { - if (ack) - ieee80211_sta_reset_conn_monitor(sdata); - else - sdata->u.mgd.nullfunc_failed = true; - ieee80211_queue_work(&sdata->local->hw, &sdata->work); + if (!ieee80211_is_any_nullfunc(hdr->frame_control) || + !sdata->u.mgd.probe_send_count) return; - }
- if (ack) - ieee80211_sta_reset_conn_monitor(sdata); + if (!ack) + sdata->u.mgd.nullfunc_failed = true; + ieee80211_queue_work(&sdata->local->hw, &sdata->work); }
static void ieee80211_mlme_send_probe_req(struct ieee80211_sub_if_data *sdata, @@@ -3523,9 -3548,6 +3523,9 @@@ static bool ieee80211_assoc_success(str goto out; }
+ if (sdata->wdev.use_4addr) + drv_sta_set_4addr(local, sdata, &sta->sta, true); + mutex_unlock(&sdata->local->sta_mtx);
/* @@@ -3583,8 -3605,8 +3583,8 @@@ * Start timer to probe the connection to the AP now. * Also start the timer that will detect beacon loss. */ - ieee80211_sta_rx_notify(sdata, (struct ieee80211_hdr *)mgmt); ieee80211_sta_reset_beacon_monitor(sdata); + ieee80211_sta_reset_conn_monitor(sdata);
ret = true; out: @@@ -4555,26 -4577,10 +4555,26 @@@ static void ieee80211_sta_conn_mon_time from_timer(sdata, t, u.mgd.conn_mon_timer); struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; struct ieee80211_local *local = sdata->local; + struct sta_info *sta; + unsigned long timeout;
if (sdata->vif.csa_active && !ifmgd->csa_waiting_bcn) return;
+ sta = sta_info_get(sdata, ifmgd->bssid); + if (!sta) + return; + + timeout = sta->status_stats.last_ack; + if (time_before(sta->status_stats.last_ack, sta->rx_stats.last_rx)) + timeout = sta->rx_stats.last_rx; + timeout += IEEE80211_CONNECTION_IDLE_TIME; + + if (time_is_before_jiffies(timeout)) { + mod_timer(&ifmgd->conn_mon_timer, round_jiffies_up(timeout)); + return; + } + ieee80211_queue_work(&local->hw, &ifmgd->monitor_work); }
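The rewritten connection-monitor timer above picks the more recent of the station's last-ACK and last-RX timestamps, adds the idle interval, and then either re-arms the timer or queues the monitor work depending on how that deadline compares to jiffies. The comparisons must be wraparound-safe, which the kernel gets from time_before()/time_is_before_jiffies(). Below is a minimal userspace sketch of that arithmetic; ticks_before(), pick_deadline() and IDLE_TICKS are made-up names standing in for the kernel helpers.

    #include <stdio.h>

    #define IDLE_TICKS 30000UL    /* stand-in for IEEE80211_CONNECTION_IDLE_TIME */

    /* wraparound-safe "a happened before b", same idea as the kernel's time_before() */
    static int ticks_before(unsigned long a, unsigned long b)
    {
        return (long)(a - b) < 0;
    }

    /* deadline = most recent activity + idle interval */
    static unsigned long pick_deadline(unsigned long last_ack, unsigned long last_rx)
    {
        unsigned long latest = last_ack;

        if (ticks_before(last_ack, last_rx))
            latest = last_rx;
        return latest + IDLE_TICKS;
    }

    int main(void)
    {
        unsigned long now = 1000000UL;
        unsigned long deadline = pick_deadline(now - 10000, now - 20000);

        /* deadline is still ahead of "now", so a driver would re-arm rather than probe */
        printf("probe now? %s\n", ticks_before(deadline, now) ? "yes" : "no");
        return 0;
    }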
@@@ -4855,6 -4861,7 +4855,7 @@@ static int ieee80211_prep_channel(struc struct ieee80211_supported_band *sband; struct cfg80211_chan_def chandef; bool is_6ghz = cbss->channel->band == NL80211_BAND_6GHZ; + bool is_5ghz = cbss->channel->band == NL80211_BAND_5GHZ; struct ieee80211_bss *bss = (void *)cbss->priv; int ret; u32 i; @@@ -4873,7 -4880,7 +4874,7 @@@ ifmgd->flags |= IEEE80211_STA_DISABLE_HE; }
- if (!sband->vht_cap.vht_supported && !is_6ghz) { + if (!sband->vht_cap.vht_supported && is_5ghz) { ifmgd->flags |= IEEE80211_STA_DISABLE_VHT; ifmgd->flags |= IEEE80211_STA_DISABLE_HE; } diff --combined net/mac80211/rx.c index b04d6e01a346,a959ebf56852..7f88a1b2215c --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@@ -451,7 -451,8 +451,8 @@@ ieee80211_add_rx_radiotap_header(struc else if (status->bw == RATE_INFO_BW_5) channel_flags |= IEEE80211_CHAN_QUARTER;
- if (status->band == NL80211_BAND_5GHZ) + if (status->band == NL80211_BAND_5GHZ || + status->band == NL80211_BAND_6GHZ) channel_flags |= IEEE80211_CHAN_OFDM | IEEE80211_CHAN_5GHZ; else if (status->encoding != RX_ENC_LEGACY) channel_flags |= IEEE80211_CHAN_DYN | IEEE80211_CHAN_2GHZ; @@@ -1811,6 -1812,9 +1812,6 @@@ ieee80211_rx_h_sta_process(struct ieee8 sta->rx_stats.last_rate = sta_stats_encode_rate(status); }
- if (rx->sdata->vif.type == NL80211_IFTYPE_STATION) - ieee80211_sta_rx_notify(rx->sdata, hdr); - sta->rx_stats.fragments++;
u64_stats_update_begin(&rx->sta->rx_stats.syncp); @@@ -2896,7 -2900,7 +2897,7 @@@ ieee80211_rx_h_mesh_fwding(struct ieee8 fwd_hdr->frame_control &= ~cpu_to_le16(IEEE80211_FCTL_RETRY); info = IEEE80211_SKB_CB(fwd_skb); memset(info, 0, sizeof(*info)); - info->flags |= IEEE80211_TX_INTFL_NEED_TXPROCESSING; + info->control.flags |= IEEE80211_TX_INTCFL_NEED_TXPROCESSING; info->control.vif = &rx->sdata->vif; info->control.jiffies = jiffies; if (is_multicast_ether_addr(fwd_hdr->addr1)) { @@@ -4145,6 -4149,7 +4146,6 @@@ void ieee80211_check_fast_rx(struct sta fastrx.sa_offs = offsetof(struct ieee80211_hdr, addr2); fastrx.expected_ds_bits = 0; } else { - fastrx.sta_notify = sdata->u.mgd.probe_send_count > 0; fastrx.da_offs = offsetof(struct ieee80211_hdr, addr1); fastrx.sa_offs = offsetof(struct ieee80211_hdr, addr3); fastrx.expected_ds_bits = @@@ -4374,6 -4379,11 +4375,6 @@@ static bool ieee80211_invoke_fast_rx(st pskb_trim(skb, skb->len - fast_rx->icv_len)) goto drop;
- if (unlikely(fast_rx->sta_notify)) { - ieee80211_sta_rx_notify(rx->sdata, hdr); - fast_rx->sta_notify = false; - } - /* statistics part of ieee80211_rx_h_sta_process() */ if (!(status->flag & RX_FLAG_NO_SIGNAL_VAL)) { stats->last_signal = status->signal; diff --combined net/mac80211/vht.c index 7e601d067d53,d1b64d0751f2..fb0e3a657d2d --- a/net/mac80211/vht.c +++ b/net/mac80211/vht.c @@@ -168,10 -168,7 +168,7 @@@ ieee80211_vht_cap_ie_to_sta_vht_cap(str /* take some capabilities as-is */ cap_info = le32_to_cpu(vht_cap_ie->vht_cap_info); vht_cap->cap = cap_info; - vht_cap->cap &= IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895 | - IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991 | - IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_11454 | - IEEE80211_VHT_CAP_RXLDPC | + vht_cap->cap &= IEEE80211_VHT_CAP_RXLDPC | IEEE80211_VHT_CAP_VHT_TXOP_PS | IEEE80211_VHT_CAP_HTC_VHT | IEEE80211_VHT_CAP_MAX_A_MPDU_LENGTH_EXPONENT_MASK | @@@ -180,6 -177,9 +177,9 @@@ IEEE80211_VHT_CAP_RX_ANTENNA_PATTERN | IEEE80211_VHT_CAP_TX_ANTENNA_PATTERN;
+ vht_cap->cap |= min_t(u32, cap_info & IEEE80211_VHT_CAP_MAX_MPDU_MASK, + own_cap.cap & IEEE80211_VHT_CAP_MAX_MPDU_MASK); + /* and some based on our own capabilities */ switch (own_cap.cap & IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_MASK) { case IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ: @@@ -315,6 -315,10 +315,6 @@@
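Rather than passing the peer's MAX_MPDU_LENGTH bits through unchanged, the vht.c hunk above ORs in the minimum of the peer's and the local hardware's 2-bit max-MPDU field, so the negotiated limit can never exceed what the hardware supports. A toy illustration of clamping such a bitfield; the 0x3 mask and clamp_max_mpdu() are illustrative stand-ins, not the mac80211 definitions.

    #include <stdio.h>

    #define MAX_MPDU_MASK 0x3u   /* illustrative 2-bit field: 0=3895, 1=7991, 2=11454 */

    /* keep the smaller of the peer's and our own encoded max-MPDU value */
    static unsigned int clamp_max_mpdu(unsigned int peer_cap, unsigned int own_cap)
    {
        unsigned int peer = peer_cap & MAX_MPDU_MASK;
        unsigned int own = own_cap & MAX_MPDU_MASK;

        return peer < own ? peer : own;
    }

    int main(void)
    {
        /* peer advertises the largest MPDU class (2) but we only support class 1 */
        printf("negotiated field: %u\n", clamp_max_mpdu(0x2, 0x1));
        return 0;
    }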
sta->sta.bandwidth = ieee80211_sta_cur_vht_bw(sta);
- /* If HT IE reported 3839 bytes only, stay with that size. */ - if (sta->sta.max_amsdu_len == IEEE80211_MAX_MPDU_LEN_HT_3839) - return; - switch (vht_cap->cap & IEEE80211_VHT_CAP_MAX_MPDU_MASK) { case IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_11454: sta->sta.max_amsdu_len = IEEE80211_MAX_MPDU_LEN_VHT_11454; diff --combined net/mptcp/pm_netlink.c index 6947f4fee6b9,770da3627848..b4a9624d7bf2 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@@ -23,6 -23,8 +23,6 @@@ static int pm_nl_pernet_id
struct mptcp_pm_addr_entry { struct list_head list; - unsigned int flags; - int ifindex; struct mptcp_addr_info addr; struct rcu_head rcu; }; @@@ -64,6 -66,16 +64,16 @@@ static bool addresses_equal(const struc return a->port == b->port; }
+ static bool address_zero(const struct mptcp_addr_info *addr) + { + struct mptcp_addr_info zero; + + memset(&zero, 0, sizeof(zero)); + zero.family = addr->family; + + return addresses_equal(addr, &zero, false); + } + static void local_address(const struct sock_common *skc, struct mptcp_addr_info *addr) { @@@ -117,7 -129,7 +127,7 @@@ select_local_address(const struct pm_nl rcu_read_lock(); spin_lock_bh(&msk->join_list_lock); list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { - if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) + if (!(entry->addr.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) continue;
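The new address_zero() helper above builds a zeroed template carrying only the peer's address family and reuses addresses_equal() against it, so an "unspecified" address is detected without open-coding per-family checks. A rough userspace rendering of the same idea; struct addr and addr_is_zero() are invented names, and the field-wise comparison stands in for addresses_equal().

    #include <stdio.h>
    #include <string.h>

    /* illustrative address record; field names are made up */
    struct addr {
        int family;
        unsigned char ip[16];
        unsigned short port;
    };

    /* "is this address unspecified?": compare against a zeroed template that
     * only carries the same family */
    static int addr_is_zero(const struct addr *a)
    {
        struct addr zero;

        memset(&zero, 0, sizeof(zero));
        zero.family = a->family;

        return memcmp(a->ip, zero.ip, sizeof(zero.ip)) == 0 &&
               a->port == zero.port;
    }

    int main(void)
    {
        struct addr a = { .family = 2 };                                   /* 0.0.0.0:0 */
        struct addr b = { .family = 2, .ip = {127, 0, 0, 1}, .port = 80 };

        printf("%d %d\n", addr_is_zero(&a), addr_is_zero(&b));
        return 0;
    }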
/* avoid any address already in use by subflows and @@@ -148,7 -160,7 +158,7 @@@ select_signal_address(struct pm_nl_pern * can lead to additional addresses not being announced. */ list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { - if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) + if (!(entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) continue; if (i++ == pos) { ret = entry; @@@ -169,9 -181,9 +179,9 @@@ static void check_work_pending(struct m
static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { + struct mptcp_addr_info remote = { 0 }; struct sock *sk = (struct sock *)msk; struct mptcp_pm_addr_entry *local; - struct mptcp_addr_info remote; struct pm_nl_pernet *pernet;
pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id); @@@ -208,7 -220,8 +218,7 @@@ msk->pm.subflows++; check_work_pending(msk); spin_unlock_bh(&msk->pm.lock); - __mptcp_subflow_connect(sk, local->ifindex, - &local->addr, &remote); + __mptcp_subflow_connect(sk, &local->addr, &remote); spin_lock_bh(&msk->pm.lock); return; } @@@ -254,13 -267,13 +264,13 @@@ void mptcp_pm_nl_add_addr_received(stru local.family = remote.family;
spin_unlock_bh(&msk->pm.lock); - __mptcp_subflow_connect((struct sock *)msk, 0, &local, &remote); + __mptcp_subflow_connect((struct sock *)msk, &local, &remote); spin_lock_bh(&msk->pm.lock); }
static bool address_use_port(struct mptcp_pm_addr_entry *entry) { - return (entry->flags & + return (entry->addr.flags & (MPTCP_PM_ADDR_FLAG_SIGNAL | MPTCP_PM_ADDR_FLAG_SUBFLOW)) == MPTCP_PM_ADDR_FLAG_SIGNAL; } @@@ -290,9 -303,9 +300,9 @@@ static int mptcp_pm_nl_append_new_local goto out; }
- if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL) + if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL) pernet->add_addr_signal_max++; - if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) + if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) pernet->local_addr_max++;
entry->addr.id = pernet->next_id++; @@@ -320,10 -333,13 +330,13 @@@ int mptcp_pm_nl_get_local_id(struct mpt * addr */ local_address((struct sock_common *)msk, &msk_local); - local_address((struct sock_common *)msk, &skc_local); + local_address((struct sock_common *)skc, &skc_local); if (addresses_equal(&msk_local, &skc_local, false)) return 0;
+ if (address_zero(&skc_local)) + return 0; + pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
rcu_read_lock(); @@@ -338,13 -354,12 +351,13 @@@ return ret;
/* address not found, add to local list */ - entry = kmalloc(sizeof(*entry), GFP_KERNEL); + entry = kmalloc(sizeof(*entry), GFP_ATOMIC); if (!entry) return -ENOMEM;
- entry->flags = 0; entry->addr = skc_local; + entry->addr.ifindex = 0; + entry->addr.flags = 0; ret = mptcp_pm_nl_append_new_local_addr(pernet, entry); if (ret < 0) kfree(entry); @@@ -382,8 -397,8 +395,8 @@@ mptcp_pm_addr_policy[MPTCP_PM_ADDR_ATTR [MPTCP_PM_ADDR_ATTR_FAMILY] = { .type = NLA_U16, }, [MPTCP_PM_ADDR_ATTR_ID] = { .type = NLA_U8, }, [MPTCP_PM_ADDR_ATTR_ADDR4] = { .type = NLA_U32, }, - [MPTCP_PM_ADDR_ATTR_ADDR6] = { .type = NLA_EXACT_LEN, - .len = sizeof(struct in6_addr), }, + [MPTCP_PM_ADDR_ATTR_ADDR6] = + NLA_POLICY_EXACT_LEN(sizeof(struct in6_addr)), [MPTCP_PM_ADDR_ATTR_PORT] = { .type = NLA_U16 }, [MPTCP_PM_ADDR_ATTR_FLAGS] = { .type = NLA_U32 }, [MPTCP_PM_ADDR_ATTR_IF_IDX] = { .type = NLA_S32 }, @@@ -458,17 -473,14 +471,17 @@@ static int mptcp_pm_parse_addr(struct n entry->addr.addr.s_addr = nla_get_in_addr(tb[addr_addr]);
skip_family: - if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX]) - entry->ifindex = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]); + if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX]) { + u32 val = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]); + + entry->addr.ifindex = val; + }
if (tb[MPTCP_PM_ADDR_ATTR_ID]) entry->addr.id = nla_get_u8(tb[MPTCP_PM_ADDR_ATTR_ID]);
if (tb[MPTCP_PM_ADDR_ATTR_FLAGS]) - entry->flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]); + entry->addr.flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]);
return 0; } @@@ -536,9 -548,9 +549,9 @@@ static int mptcp_nl_cmd_del_addr(struc ret = -EINVAL; goto out; } - if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL) + if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL) pernet->add_addr_signal_max--; - if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) + if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) pernet->local_addr_max--;
pernet->addrs--; @@@ -594,10 -606,10 +607,10 @@@ static int mptcp_nl_fill_addr(struct sk goto nla_put_failure; if (nla_put_u8(skb, MPTCP_PM_ADDR_ATTR_ID, addr->id)) goto nla_put_failure; - if (nla_put_u32(skb, MPTCP_PM_ADDR_ATTR_FLAGS, entry->flags)) + if (nla_put_u32(skb, MPTCP_PM_ADDR_ATTR_FLAGS, entry->addr.flags)) goto nla_put_failure; - if (entry->ifindex && - nla_put_s32(skb, MPTCP_PM_ADDR_ATTR_IF_IDX, entry->ifindex)) + if (entry->addr.ifindex && + nla_put_s32(skb, MPTCP_PM_ADDR_ATTR_IF_IDX, entry->addr.ifindex)) goto nla_put_failure;
if (addr->family == AF_INET && diff --combined net/mptcp/subflow.c index 34d6230df017,9ead43f79023..141d555b7bd2 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@@ -20,7 -20,6 +20,7 @@@ #include <net/ip6_route.h> #endif #include <net/mptcp.h> +#include <uapi/linux/mptcp.h> #include "protocol.h" #include "mib.h"
@@@ -435,7 -434,6 +435,7 @@@ static void mptcp_sock_destruct(struct sock_orphan(sk); }
+ skb_rbtree_purge(&mptcp_sk(sk)->out_of_order_queue); mptcp_token_destroy(mptcp_sk(sk)); inet_sock_destruct(sk); } @@@ -806,25 -804,16 +806,25 @@@ validate_seq return MAPPING_OK; }
-static int subflow_read_actor(read_descriptor_t *desc, - struct sk_buff *skb, - unsigned int offset, size_t len) +static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb, + u64 limit) { - size_t copy_len = min(desc->count, len); - - desc->count -= copy_len; - - pr_debug("flushed %zu bytes, %zu left", copy_len, desc->count); - return copy_len; + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); + bool fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN; + u32 incr; + + incr = limit >= skb->len ? skb->len + fin : limit; + + pr_debug("discarding=%d len=%d seq=%d", incr, skb->len, + subflow->map_subflow_seq); + MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DUPDATA); + tcp_sk(ssk)->copied_seq += incr; + if (!before(tcp_sk(ssk)->copied_seq, TCP_SKB_CB(skb)->end_seq)) + sk_eat_skb(ssk, skb); + if (mptcp_subflow_get_map_offset(subflow) >= subflow->map_data_len) + subflow->map_valid = 0; + if (incr) + tcp_cleanup_rbuf(ssk, incr); }
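mptcp_subflow_discard_data() above replaces the old tcp_read_sock()-based flushing: it advances copied_seq by the whole skb (plus one sequence slot for a FIN) when the limit covers it, otherwise by just the limit, and only eats the skb once it has been fully consumed. The increment computation in isolation, as a hedged sketch; discard_len() is not a kernel helper.

    #include <stdio.h>

    /* how far to advance the receive counter: the whole buffer (plus one
     * sequence slot for a FIN) when the limit covers it, otherwise the limit */
    static unsigned int discard_len(unsigned int limit, unsigned int buf_len, int fin)
    {
        return limit >= buf_len ? buf_len + (fin ? 1 : 0) : limit;
    }

    int main(void)
    {
        printf("%u\n", discard_len(100, 60, 1));  /* whole 60-byte buffer + FIN -> 61 */
        printf("%u\n", discard_len(40, 60, 0));   /* partial discard -> 40 */
        return 0;
    }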
static bool subflow_check_data_avail(struct sock *ssk) @@@ -836,13 -825,13 +836,13 @@@
pr_debug("msk=%p ssk=%p data_avail=%d skb=%p", subflow->conn, ssk, subflow->data_avail, skb_peek(&ssk->sk_receive_queue)); + if (!skb_peek(&ssk->sk_receive_queue)) + subflow->data_avail = 0; if (subflow->data_avail) return true;
msk = mptcp_sk(subflow->conn); for (;;) { - u32 map_remaining; - size_t delta; u64 ack_seq; u64 old_ack;
@@@ -860,7 -849,6 +860,7 @@@ subflow->map_data_len = skb->len; subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq - subflow->ssn_offset; + subflow->data_avail = MPTCP_SUBFLOW_DATA_AVAIL; return true; }
@@@ -888,18 -876,42 +888,18 @@@ ack_seq = mptcp_subflow_get_mapped_dsn(subflow); pr_debug("msk ack_seq=%llx subflow ack_seq=%llx", old_ack, ack_seq); - if (ack_seq == old_ack) + if (ack_seq == old_ack) { + subflow->data_avail = MPTCP_SUBFLOW_DATA_AVAIL; break; + } else if (after64(ack_seq, old_ack)) { + subflow->data_avail = MPTCP_SUBFLOW_OOO_DATA; + break; + }
/* only accept in-sequence mapping. Old values are spurious - * retransmission; we can hit "future" values on active backup - * subflow switch, we relay on retransmissions to get - * in-sequence data. - * Cuncurrent subflows support will require subflow data - * reordering + * retransmission */ - map_remaining = subflow->map_data_len - - mptcp_subflow_get_map_offset(subflow); - if (before64(ack_seq, old_ack)) - delta = min_t(size_t, old_ack - ack_seq, map_remaining); - else - delta = min_t(size_t, ack_seq - old_ack, map_remaining); - - /* discard mapped data */ - pr_debug("discarding %zu bytes, current map len=%d", delta, - map_remaining); - if (delta) { - read_descriptor_t desc = { - .count = delta, - }; - int ret; - - ret = tcp_read_sock(ssk, &desc, subflow_read_actor); - if (ret < 0) { - ssk->sk_err = -ret; - goto fatal; - } - if (ret < delta) - return false; - if (delta == map_remaining) - subflow->map_valid = 0; - } + mptcp_subflow_discard_data(ssk, skb, old_ack - ack_seq); } return true;
@@@ -910,13 -922,13 +910,13 @@@ fatal ssk->sk_error_report(ssk); tcp_set_state(ssk, TCP_CLOSE); tcp_send_active_reset(ssk, GFP_ATOMIC); + subflow->data_avail = 0; return false; }
bool mptcp_subflow_data_available(struct sock *sk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); - struct sk_buff *skb;
/* check if current mapping is still valid */ if (subflow->map_valid && @@@ -929,7 -941,15 +929,7 @@@ subflow->map_data_len); }
- if (!subflow_check_data_avail(sk)) { - subflow->data_avail = 0; - return false; - } - - skb = skb_peek(&sk->sk_receive_queue); - subflow->data_avail = skb && - before(tcp_sk(sk)->copied_seq, TCP_SKB_CB(skb)->end_seq); - return subflow->data_avail; + return subflow_check_data_avail(sk); }
/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy, @@@ -976,10 -996,8 +976,10 @@@ static void subflow_write_space(struct struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); struct sock *parent = subflow->conn;
- sk_stream_write_space(sk); - if (sk_stream_is_writeable(sk)) { + if (!sk_stream_is_writeable(sk)) + return; + + if (sk_stream_is_writeable(parent)) { set_bit(MPTCP_SEND_SPACE, &mptcp_sk(parent)->flags); smp_mb__after_atomic(); /* set SEND_SPACE before sk_stream_write_space clears NOSPACE */ @@@ -1038,12 -1056,14 +1038,13 @@@ static void mptcp_info2sockaddr(const s #endif }
-int __mptcp_subflow_connect(struct sock *sk, int ifindex, - const struct mptcp_addr_info *loc, +int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc, const struct mptcp_addr_info *remote) { struct mptcp_sock *msk = mptcp_sk(sk); struct mptcp_subflow_context *subflow; struct sockaddr_storage addr; + int remote_id = remote->id; int local_id = loc->id; struct socket *sf; struct sock *ssk; @@@ -1082,18 -1102,19 +1083,19 @@@ if (loc->family == AF_INET6) addrlen = sizeof(struct sockaddr_in6); #endif - ssk->sk_bound_dev_if = ifindex; + ssk->sk_bound_dev_if = loc->ifindex; err = kernel_bind(sf, (struct sockaddr *)&addr, addrlen); if (err) goto failed;
mptcp_crypto_key_sha(subflow->remote_key, &remote_token, NULL); - pr_debug("msk=%p remote_token=%u local_id=%d", msk, remote_token, - local_id); + pr_debug("msk=%p remote_token=%u local_id=%d remote_id=%d", msk, + remote_token, local_id, remote_id); subflow->remote_token = remote_token; subflow->local_id = local_id; + subflow->remote_id = remote_id; subflow->request_join = 1; - subflow->request_bkup = 1; + subflow->request_bkup = !!(loc->flags & MPTCP_PM_ADDR_FLAG_BACKUP); mptcp_info2sockaddr(remote, &addr);
err = kernel_connect(sf, (struct sockaddr *)&addr, addrlen, O_NONBLOCK); @@@ -1328,6 -1349,7 +1330,7 @@@ static void subflow_ulp_clone(const str new_ctx->fully_established = 1; new_ctx->backup = subflow_req->backup; new_ctx->local_id = subflow_req->local_id; + new_ctx->remote_id = subflow_req->remote_id; new_ctx->token = subflow_req->token; new_ctx->thmac = subflow_req->thmac; } diff --combined net/netfilter/nf_conntrack_netlink.c index 89d99f6dfd0a,c3a4214dc958..3d0fd33be018 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@@ -851,7 -851,6 +851,6 @@@ static int ctnetlink_done(struct netlin }
struct ctnetlink_filter { - u_int32_t cta_flags; u8 family;
u_int32_t orig_flags; @@@ -906,10 -905,6 +905,6 @@@ static int ctnetlink_parse_tuple_filter struct nf_conntrack_zone *zone, u_int32_t flags);
- /* applied on filters */ - #define CTA_FILTER_F_CTA_MARK (1 << 0) - #define CTA_FILTER_F_CTA_MARK_MASK (1 << 1) - static struct ctnetlink_filter * ctnetlink_alloc_filter(const struct nlattr * const cda[], u8 family) { @@@ -930,14 -925,10 +925,10 @@@ #ifdef CONFIG_NF_CONNTRACK_MARK if (cda[CTA_MARK]) { filter->mark.val = ntohl(nla_get_be32(cda[CTA_MARK])); - filter->cta_flags |= CTA_FILTER_FLAG(CTA_MARK); - - if (cda[CTA_MARK_MASK]) { + if (cda[CTA_MARK_MASK]) filter->mark.mask = ntohl(nla_get_be32(cda[CTA_MARK_MASK])); - filter->cta_flags |= CTA_FILTER_FLAG(CTA_MARK_MASK); - } else { + else filter->mark.mask = 0xffffffff; - } } else if (cda[CTA_MARK_MASK]) { err = -EINVAL; goto err_filter; @@@ -1117,11 -1108,7 +1108,7 @@@ static int ctnetlink_filter_match(struc }
#ifdef CONFIG_NF_CONNTRACK_MARK - if ((filter->cta_flags & CTA_FILTER_FLAG(CTA_MARK_MASK)) && - (ct->mark & filter->mark.mask) != filter->mark.val) - goto ignore_entry; - else if ((filter->cta_flags & CTA_FILTER_FLAG(CTA_MARK)) && - ct->mark != filter->mark.val) + if ((ct->mark & filter->mark.mask) != filter->mark.val) goto ignore_entry; #endif
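With the CTA_FILTER_F_CTA_MARK* flags removed, the dump filter above always applies a mask/value test; when userspace supplies only a mark, the earlier hunk defaults the mask to 0xffffffff so the test degenerates to an exact match. A small standalone illustration of that convention; mark_matches() is a made-up name.

    #include <stdio.h>
    #include <stdint.h>

    /* match if the masked mark equals the filter value; a filter that only set a
     * value uses the all-ones mask, which makes this an exact comparison */
    static int mark_matches(uint32_t ct_mark, uint32_t val, uint32_t mask)
    {
        return (ct_mark & mask) == val;
    }

    int main(void)
    {
        /* exact match: mask defaults to 0xffffffff */
        printf("%d\n", mark_matches(0x1234, 0x1234, 0xffffffff));
        /* match on the top byte only */
        printf("%d\n", mark_matches(0x12ff, 0x1200, 0xff00));
        return 0;
    }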
@@@ -1404,7 -1391,8 +1391,8 @@@ ctnetlink_parse_tuple_filter(const stru if (err < 0) return err;
- + if (l3num != NFPROTO_IPV4 && l3num != NFPROTO_IPV6) + return -EOPNOTSUPP; tuple->src.l3num = l3num;
if (flags & CTA_FILTER_FLAG(CTA_IP_DST) || @@@ -2509,6 -2497,7 +2497,6 @@@ ctnetlink_ct_stat_cpu_fill_info(struct
if (nla_put_be32(skb, CTA_STATS_FOUND, htonl(st->found)) || nla_put_be32(skb, CTA_STATS_INVALID, htonl(st->invalid)) || - nla_put_be32(skb, CTA_STATS_IGNORE, htonl(st->ignore)) || nla_put_be32(skb, CTA_STATS_INSERT, htonl(st->insert)) || nla_put_be32(skb, CTA_STATS_INSERT_FAILED, htonl(st->insert_failed)) || @@@ -2516,9 -2505,7 +2504,9 @@@ nla_put_be32(skb, CTA_STATS_EARLY_DROP, htonl(st->early_drop)) || nla_put_be32(skb, CTA_STATS_ERROR, htonl(st->error)) || nla_put_be32(skb, CTA_STATS_SEARCH_RESTART, - htonl(st->search_restart))) + htonl(st->search_restart)) || + nla_put_be32(skb, CTA_STATS_CLASH_RESOLVE, + htonl(st->clash_resolve))) goto nla_put_failure;
nlmsg_end(skb, nlh); diff --combined net/netfilter/nf_tables_api.c index 84c0c1aaae99,4603b667973a..97fb6f776114 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@@ -650,8 -650,6 +650,8 @@@ static const struct nla_policy nft_tabl .len = NFT_TABLE_MAXNAMELEN - 1 }, [NFTA_TABLE_FLAGS] = { .type = NLA_U32 }, [NFTA_TABLE_HANDLE] = { .type = NLA_U64 }, + [NFTA_TABLE_USERDATA] = { .type = NLA_BINARY, + .len = NFT_USERDATA_MAXLEN } };
static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net, @@@ -678,11 -676,6 +678,11 @@@ NFTA_TABLE_PAD)) goto nla_put_failure;
+ if (table->udata) { + if (nla_put(skb, NFTA_TABLE_USERDATA, table->udlen, table->udata)) + goto nla_put_failure; + } + nlmsg_end(skb, nlh); return 0;
@@@ -691,6 -684,18 +691,18 @@@ nla_put_failure return -1; }
+ struct nftnl_skb_parms { + bool report; + }; + #define NFT_CB(skb) (*(struct nftnl_skb_parms*)&((skb)->cb)) + + static void nft_notify_enqueue(struct sk_buff *skb, bool report, + struct list_head *notify_list) + { + NFT_CB(skb).report = report; + list_add_tail(&skb->list, notify_list); + } + static void nf_tables_table_notify(const struct nft_ctx *ctx, int event) { struct sk_buff *skb; @@@ -722,8 -727,7 +734,7 @@@ goto err; }
- nfnetlink_send(skb, ctx->net, ctx->portid, NFNLGRP_NFTABLES, - ctx->report, GFP_KERNEL); + nft_notify_enqueue(skb, ctx->report, &ctx->net->nft.notify_list); return; err: nfnetlink_set_err(ctx->net, ctx->portid, NFNLGRP_NFTABLES, -ENOBUFS); @@@ -984,9 -988,8 +995,9 @@@ static int nf_tables_newtable(struct ne int family = nfmsg->nfgen_family; const struct nlattr *attr; struct nft_table *table; - u32 flags = 0; struct nft_ctx ctx; + u32 flags = 0; + u16 udlen = 0; int err;
lockdep_assert_held(&net->nft.commit_mutex); @@@ -1022,16 -1025,6 +1033,16 @@@ if (table->name == NULL) goto err_strdup;
+ if (nla[NFTA_TABLE_USERDATA]) { + udlen = nla_len(nla[NFTA_TABLE_USERDATA]); + table->udata = kzalloc(udlen, GFP_KERNEL); + if (table->udata == NULL) + goto err_table_udata; + + nla_memcpy(table->udata, nla[NFTA_TABLE_USERDATA], udlen); + table->udlen = udlen; + } + err = rhltable_init(&table->chains_ht, &nft_chain_ht_params); if (err) goto err_chain_ht; @@@ -1054,8 -1047,6 +1065,8 @@@ err_trans: rhltable_destroy(&table->chains_ht); err_chain_ht: + kfree(table->udata); +err_table_udata: kfree(table->name); err_strdup: kfree(table); @@@ -1488,8 -1479,7 +1499,7 @@@ static void nf_tables_chain_notify(cons goto err; }
- nfnetlink_send(skb, ctx->net, ctx->portid, NFNLGRP_NFTABLES, - ctx->report, GFP_KERNEL); + nft_notify_enqueue(skb, ctx->report, &ctx->net->nft.notify_list); return; err: nfnetlink_set_err(ctx->net, ctx->portid, NFNLGRP_NFTABLES, -ENOBUFS); @@@ -2827,8 -2817,7 +2837,7 @@@ static void nf_tables_rule_notify(cons goto err; }
- nfnetlink_send(skb, ctx->net, ctx->portid, NFNLGRP_NFTABLES, - ctx->report, GFP_KERNEL); + nft_notify_enqueue(skb, ctx->report, &ctx->net->nft.notify_list); return; err: nfnetlink_set_err(ctx->net, ctx->portid, NFNLGRP_NFTABLES, -ENOBUFS); @@@ -3857,8 -3846,7 +3866,7 @@@ static void nf_tables_set_notify(const goto err; }
- nfnetlink_send(skb, ctx->net, portid, NFNLGRP_NFTABLES, ctx->report, - gfp_flags); + nft_notify_enqueue(skb, ctx->report, &ctx->net->nft.notify_list); return; err: nfnetlink_set_err(ctx->net, portid, NFNLGRP_NFTABLES, -ENOBUFS); @@@ -4979,8 -4967,7 +4987,7 @@@ static void nf_tables_setelem_notify(co goto err; }
- nfnetlink_send(skb, net, portid, NFNLGRP_NFTABLES, ctx->report, - GFP_KERNEL); + nft_notify_enqueue(skb, ctx->report, &ctx->net->nft.notify_list); return; err: nfnetlink_set_err(net, portid, NFNLGRP_NFTABLES, -ENOBUFS); @@@ -5750,8 -5737,6 +5757,8 @@@ static const struct nla_policy nft_obj_ [NFTA_OBJ_TYPE] = { .type = NLA_U32 }, [NFTA_OBJ_DATA] = { .type = NLA_NESTED }, [NFTA_OBJ_HANDLE] = { .type = NLA_U64}, + [NFTA_OBJ_USERDATA] = { .type = NLA_BINARY, + .len = NFT_USERDATA_MAXLEN }, };
static struct nft_object *nft_obj_init(const struct nft_ctx *ctx, @@@ -5899,7 -5884,6 +5906,7 @@@ static int nf_tables_newobj(struct net struct nft_object *obj; struct nft_ctx ctx; u32 objtype; + u16 udlen; int err;
if (!nla[NFTA_OBJ_TYPE] || @@@ -5944,7 -5928,7 +5951,7 @@@ obj = nft_obj_init(&ctx, type, nla[NFTA_OBJ_DATA]); if (IS_ERR(obj)) { err = PTR_ERR(obj); - goto err1; + goto err_init; } obj->key.table = table; obj->handle = nf_tables_alloc_handle(table); @@@ -5952,44 -5936,32 +5959,44 @@@ obj->key.name = nla_strdup(nla[NFTA_OBJ_NAME], GFP_KERNEL); if (!obj->key.name) { err = -ENOMEM; - goto err2; + goto err_strdup; + } + + if (nla[NFTA_OBJ_USERDATA]) { + udlen = nla_len(nla[NFTA_OBJ_USERDATA]); + obj->udata = kzalloc(udlen, GFP_KERNEL); + if (obj->udata == NULL) + goto err_userdata; + + nla_memcpy(obj->udata, nla[NFTA_OBJ_USERDATA], udlen); + obj->udlen = udlen; }
err = nft_trans_obj_add(&ctx, NFT_MSG_NEWOBJ, obj); if (err < 0) - goto err3; + goto err_trans;
err = rhltable_insert(&nft_objname_ht, &obj->rhlhead, nft_objname_ht_params); if (err < 0) - goto err4; + goto err_obj_ht;
list_add_tail_rcu(&obj->list, &table->objects); table->use++; return 0; -err4: +err_obj_ht: /* queued in transaction log */ INIT_LIST_HEAD(&obj->list); return err; -err3: +err_trans: kfree(obj->key.name); -err2: +err_userdata: + kfree(obj->udata); +err_strdup: if (obj->ops->destroy) obj->ops->destroy(&ctx, obj); kfree(obj); -err1: +err_init: module_put(type->owner); return err; } @@@ -6021,10 -5993,6 +6028,10 @@@ static int nf_tables_fill_obj_info(stru NFTA_OBJ_PAD)) goto nla_put_failure;
+ if (obj->udata && + nla_put(skb, NFTA_OBJ_USERDATA, obj->udlen, obj->udata)) + goto nla_put_failure; + nlmsg_end(skb, nlh); return 0;
@@@ -6314,7 -6282,7 +6321,7 @@@ void nft_obj_notify(struct net *net, co goto err; }
- nfnetlink_send(skb, net, portid, NFNLGRP_NFTABLES, report, gfp); + nft_notify_enqueue(skb, report, &net->nft.notify_list); return; err: nfnetlink_set_err(net, portid, NFNLGRP_NFTABLES, -ENOBUFS); @@@ -7124,8 -7092,7 +7131,7 @@@ static void nf_tables_flowtable_notify( goto err; }
- nfnetlink_send(skb, ctx->net, ctx->portid, NFNLGRP_NFTABLES, - ctx->report, GFP_KERNEL); + nft_notify_enqueue(skb, ctx->report, &ctx->net->nft.notify_list); return; err: nfnetlink_set_err(ctx->net, ctx->portid, NFNLGRP_NFTABLES, -ENOBUFS); @@@ -7734,6 -7701,41 +7740,41 @@@ static void nf_tables_commit_release(st mutex_unlock(&net->nft.commit_mutex); }
+ static void nft_commit_notify(struct net *net, u32 portid) + { + struct sk_buff *batch_skb = NULL, *nskb, *skb; + unsigned char *data; + int len; + + list_for_each_entry_safe(skb, nskb, &net->nft.notify_list, list) { + if (!batch_skb) { + new_batch: + batch_skb = skb; + len = NLMSG_GOODSIZE - skb->len; + list_del(&skb->list); + continue; + } + len -= skb->len; + if (len > 0 && NFT_CB(skb).report == NFT_CB(batch_skb).report) { + data = skb_put(batch_skb, skb->len); + memcpy(data, skb->data, skb->len); + list_del(&skb->list); + kfree_skb(skb); + continue; + } + nfnetlink_send(batch_skb, net, portid, NFNLGRP_NFTABLES, + NFT_CB(batch_skb).report, GFP_KERNEL); + goto new_batch; + } + + if (batch_skb) { + nfnetlink_send(batch_skb, net, portid, NFNLGRP_NFTABLES, + NFT_CB(batch_skb).report, GFP_KERNEL); + } + + WARN_ON_ONCE(!list_empty(&net->nft.notify_list)); + } + static int nf_tables_commit(struct net *net, struct sk_buff *skb) { struct nft_trans *trans, *next; @@@ -7936,6 -7938,7 +7977,7 @@@ } }
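nft_commit_notify() above drains net->nft.notify_list and coalesces the queued notification skbs into larger messages, flushing a batch whenever the next skb would overrun the NLMSG_GOODSIZE budget or carries a different report flag. A minimal userspace sketch of that budget-based coalescing; struct msg, flush_batch() and BATCH_BUDGET are invented for illustration.

    #include <stdio.h>

    #define BATCH_BUDGET 64   /* stand-in for NLMSG_GOODSIZE */

    struct msg {
        int len;
        int report;   /* must match within one batch */
    };

    static void flush_batch(int first, int last, int used)
    {
        printf("flush msgs %d..%d (%d bytes)\n", first, last, used);
    }

    int main(void)
    {
        struct msg q[] = { {20, 1}, {30, 1}, {25, 1}, {10, 0}, {40, 0} };
        int n = sizeof(q) / sizeof(q[0]);
        int start = 0, used = 0;

        for (int i = 0; i < n; i++) {
            /* start a new batch if this msg would overflow the budget
             * or its report flag differs from the current batch */
            if (i > start &&
                (used + q[i].len > BATCH_BUDGET || q[i].report != q[start].report)) {
                flush_batch(start, i - 1, used);
                start = i;
                used = 0;
            }
            used += q[i].len;
        }
        if (n)
            flush_batch(start, n - 1, used);
        return 0;
    }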
+ nft_commit_notify(net, NETLINK_CB(skb).portid); nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN); nf_tables_commit_release(net);
@@@ -8760,6 -8763,7 +8802,7 @@@ static int __net_init nf_tables_init_ne INIT_LIST_HEAD(&net->nft.tables); INIT_LIST_HEAD(&net->nft.commit_list); INIT_LIST_HEAD(&net->nft.module_list); + INIT_LIST_HEAD(&net->nft.notify_list); mutex_init(&net->nft.commit_mutex); net->nft.base_seq = 1; net->nft.validate_state = NFT_VALIDATE_SKIP; @@@ -8776,6 -8780,7 +8819,7 @@@ static void __net_exit nf_tables_exit_n mutex_unlock(&net->nft.commit_mutex); WARN_ON_ONCE(!list_empty(&net->nft.tables)); WARN_ON_ONCE(!list_empty(&net->nft.module_list)); + WARN_ON_ONCE(!list_empty(&net->nft.notify_list)); }
static struct pernet_operations nf_tables_net_ops = { diff --combined net/tipc/link.c index 97dc4b5fb20b,cef38a910107..06b880da2a8e --- a/net/tipc/link.c +++ b/net/tipc/link.c @@@ -216,6 -216,11 +216,6 @@@ enum #define TIPC_BC_RETR_LIM (jiffies + msecs_to_jiffies(10)) #define TIPC_UC_RETR_TIME (jiffies + msecs_to_jiffies(1))
-/* - * Interval between NACKs when packets arrive out of order - */ -#define TIPC_NACK_INTV (TIPC_MIN_LINK_WIN * 2) - /* Link FSM states: */ enum { @@@ -527,7 -532,8 +527,8 @@@ bool tipc_link_create(struct net *net, * tipc_link_bc_create - create new link to be used for broadcast * @net: pointer to associated network namespace * @mtu: mtu to be used initially if no peers - * @window: send window to be used + * @min_win: minimal send window to be used by link + * @max_win: maximal send window to be used by link * @inputq: queue to put messages ready for delivery * @namedq: queue to put binding table update messages ready for delivery * @link: return value, pointer to put the created link @@@ -1250,11 -1256,6 +1251,11 @@@ static bool tipc_data_input(struct tipc case MSG_FRAGMENTER: case BCAST_PROTOCOL: return false; +#ifdef CONFIG_TIPC_CRYPTO + case MSG_CRYPTO: + tipc_crypto_msg_rcv(l->net, skb); + return true; +#endif default: pr_warn("Dropping received illegal msg type\n"); kfree_skb(skb); diff --combined net/tipc/msg.c index 2d9a383b8192,52e93ba4d8e2..11a429f4c6cd --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@@ -150,7 -150,8 +150,8 @@@ int tipc_buf_append(struct sk_buff **he if (fragid == FIRST_FRAGMENT) { if (unlikely(head)) goto err; - if (unlikely(skb_unclone(frag, GFP_ATOMIC))) + frag = skb_unshare(frag, GFP_ATOMIC); + if (unlikely(!frag)) goto err; head = *headbuf = frag; *buf = NULL; @@@ -581,7 -582,7 +582,7 @@@ bundle * @pos: position in outer message of msg to be extracted. * Returns position of next msg * Consumes outer buffer when last packet extracted - * Returns true when when there is an extracted buffer, otherwise false + * Returns true when there is an extracted buffer, otherwise false */ bool tipc_msg_extract(struct sk_buff *skb, struct sk_buff **iskb, int *pos) { diff --combined net/tipc/socket.c index 0f894aca98ed,11b27ddc75ba..e795a8a2955b --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@@ -52,9 -52,10 +52,9 @@@ #define NAGLE_START_MAX 1024 #define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */ #define CONN_PROBING_INTV msecs_to_jiffies(3600000) /* [ms] => 1 h */ -#define TIPC_FWD_MSG 1 #define TIPC_MAX_PORT 0xffffffff #define TIPC_MIN_PORT 1 -#define TIPC_ACK_RATE 4 /* ACK at 1/4 of of rcv window size */ +#define TIPC_ACK_RATE 4 /* ACK at 1/4 of rcv window size */
enum { TIPC_LISTEN = TCP_LISTEN, @@@ -2770,10 -2771,7 +2770,7 @@@ static int tipc_shutdown(struct socket
trace_tipc_sk_shutdown(sk, NULL, TIPC_DUMP_ALL, " "); __tipc_shutdown(sock, TIPC_CONN_SHUTDOWN); - if (tipc_sk_type_connectionless(sk)) - sk->sk_shutdown = SHUTDOWN_MASK; - else - sk->sk_shutdown = SEND_SHUTDOWN; + sk->sk_shutdown = SHUTDOWN_MASK;
if (sk->sk_state == TIPC_DISCONNECTING) { /* Discard any unreceived messages */ diff --combined net/wireless/util.c index ac2bb1a80f2b,6fa99df52f86..f01746894a4e --- a/net/wireless/util.c +++ b/net/wireless/util.c @@@ -95,7 -95,7 +95,7 @@@ u32 ieee80211_channel_to_freq_khz(int c /* see 802.11ax D6.1 27.3.23.2 */ if (chan == 2) return MHZ_TO_KHZ(5935); - if (chan <= 253) + if (chan <= 233) return MHZ_TO_KHZ(5950 + chan * 5); break; case NL80211_BAND_60GHZ: @@@ -111,33 -111,6 +111,33 @@@ } EXPORT_SYMBOL(ieee80211_channel_to_freq_khz);
+enum nl80211_chan_width +ieee80211_s1g_channel_width(const struct ieee80211_channel *chan) +{ + if (WARN_ON(!chan || chan->band != NL80211_BAND_S1GHZ)) + return NL80211_CHAN_WIDTH_20_NOHT; + + /*S1G defines a single allowed channel width per channel. + * Extract that width here. + */ + if (chan->flags & IEEE80211_CHAN_1MHZ) + return NL80211_CHAN_WIDTH_1; + else if (chan->flags & IEEE80211_CHAN_2MHZ) + return NL80211_CHAN_WIDTH_2; + else if (chan->flags & IEEE80211_CHAN_4MHZ) + return NL80211_CHAN_WIDTH_4; + else if (chan->flags & IEEE80211_CHAN_8MHZ) + return NL80211_CHAN_WIDTH_8; + else if (chan->flags & IEEE80211_CHAN_16MHZ) + return NL80211_CHAN_WIDTH_16; + + pr_err("unknown channel width for channel at %dKHz?\n", + ieee80211_channel_to_khz(chan)); + + return NL80211_CHAN_WIDTH_1; +} +EXPORT_SYMBOL(ieee80211_s1g_channel_width); + int ieee80211_freq_khz_to_channel(u32 freq) { /* TODO: just handle MHz for now */ @@@ -426,11 -399,6 +426,11 @@@ unsigned int __attribute_const__ ieee80 { unsigned int hdrlen = 24;
+ if (ieee80211_is_ext(fc)) { + hdrlen = 4; + goto out; + } + if (ieee80211_is_data(fc)) { if (ieee80211_has_a4(fc)) hdrlen = 30; diff --combined net/xdp/xdp_umem.c index a7227b447228,b010bfde0149..56d052bc65cb --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@@ -23,6 -23,162 +23,6 @@@
static DEFINE_IDA(umem_ida);
-void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) -{ - unsigned long flags; - - if (!xs->tx) - return; - - spin_lock_irqsave(&umem->xsk_tx_list_lock, flags); - list_add_rcu(&xs->list, &umem->xsk_tx_list); - spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags); -} - -void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) -{ - unsigned long flags; - - if (!xs->tx) - return; - - spin_lock_irqsave(&umem->xsk_tx_list_lock, flags); - list_del_rcu(&xs->list); - spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags); -} - -/* The umem is stored both in the _rx struct and the _tx struct as we do - * not know if the device has more tx queues than rx, or the opposite. - * This might also change during run time. - */ -static int xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem, - u16 queue_id) -{ - if (queue_id >= max_t(unsigned int, - dev->real_num_rx_queues, - dev->real_num_tx_queues)) - return -EINVAL; - - if (queue_id < dev->real_num_rx_queues) - dev->_rx[queue_id].umem = umem; - if (queue_id < dev->real_num_tx_queues) - dev->_tx[queue_id].umem = umem; - - return 0; -} - -struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, - u16 queue_id) -{ - if (queue_id < dev->real_num_rx_queues) - return dev->_rx[queue_id].umem; - if (queue_id < dev->real_num_tx_queues) - return dev->_tx[queue_id].umem; - - return NULL; -} -EXPORT_SYMBOL(xdp_get_umem_from_qid); - -static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id) -{ - if (queue_id < dev->real_num_rx_queues) - dev->_rx[queue_id].umem = NULL; - if (queue_id < dev->real_num_tx_queues) - dev->_tx[queue_id].umem = NULL; -} - -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id, u16 flags) -{ - bool force_zc, force_copy; - struct netdev_bpf bpf; - int err = 0; - - ASSERT_RTNL(); - - force_zc = flags & XDP_ZEROCOPY; - force_copy = flags & XDP_COPY; - - if (force_zc && force_copy) - return -EINVAL; - - if (xdp_get_umem_from_qid(dev, queue_id)) - return -EBUSY; - - err = xdp_reg_umem_at_qid(dev, umem, queue_id); - if (err) - return err; - - umem->dev = dev; - umem->queue_id = queue_id; - - if (flags & XDP_USE_NEED_WAKEUP) { - umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; - /* Tx needs to be explicitly woken up the first time. - * Also for supporting drivers that do not implement this - * feature. They will always have to call sendto(). - */ - xsk_set_tx_need_wakeup(umem); - } - - dev_hold(dev); - - if (force_copy) - /* For copy-mode, we are done. 
*/ - return 0; - - if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) { - err = -EOPNOTSUPP; - goto err_unreg_umem; - } - - bpf.command = XDP_SETUP_XSK_UMEM; - bpf.xsk.umem = umem; - bpf.xsk.queue_id = queue_id; - - err = dev->netdev_ops->ndo_bpf(dev, &bpf); - if (err) - goto err_unreg_umem; - - umem->zc = true; - return 0; - -err_unreg_umem: - if (!force_zc) - err = 0; /* fallback to copy mode */ - if (err) - xdp_clear_umem_at_qid(dev, queue_id); - return err; -} - -void xdp_umem_clear_dev(struct xdp_umem *umem) -{ - struct netdev_bpf bpf; - int err; - - ASSERT_RTNL(); - - if (!umem->dev) - return; - - if (umem->zc) { - bpf.command = XDP_SETUP_XSK_UMEM; - bpf.xsk.umem = NULL; - bpf.xsk.queue_id = umem->queue_id; - - err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); - - if (err) - WARN(1, "failed to disable umem!\n"); - } - - xdp_clear_umem_at_qid(umem->dev, umem->queue_id); - - dev_put(umem->dev); - umem->dev = NULL; - umem->zc = false; -} - static void xdp_umem_unpin_pages(struct xdp_umem *umem) { unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true); @@@ -39,33 -195,38 +39,33 @@@ static void xdp_umem_unaccount_pages(st } }
-static void xdp_umem_release(struct xdp_umem *umem) +static void xdp_umem_addr_unmap(struct xdp_umem *umem) { - rtnl_lock(); - xdp_umem_clear_dev(umem); - rtnl_unlock(); - - ida_simple_remove(&umem_ida, umem->id); + vunmap(umem->addrs); + umem->addrs = NULL; +}
- if (umem->fq) { - xskq_destroy(umem->fq); - umem->fq = NULL; - } +static int xdp_umem_addr_map(struct xdp_umem *umem, struct page **pages, + u32 nr_pages) +{ + umem->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); + if (!umem->addrs) + return -ENOMEM; + return 0; +}
- if (umem->cq) { - xskq_destroy(umem->cq); - umem->cq = NULL; - } +static void xdp_umem_release(struct xdp_umem *umem) +{ + umem->zc = false; + ida_simple_remove(&umem_ida, umem->id);
- xp_destroy(umem->pool); + xdp_umem_addr_unmap(umem); xdp_umem_unpin_pages(umem);
xdp_umem_unaccount_pages(umem); kfree(umem); }
-static void xdp_umem_release_deferred(struct work_struct *work) -{ - struct xdp_umem *umem = container_of(work, struct xdp_umem, work); - - xdp_umem_release(umem); -} - void xdp_get_umem(struct xdp_umem *umem) { refcount_inc(&umem->users); @@@ -76,8 -237,10 +76,8 @@@ void xdp_put_umem(struct xdp_umem *umem if (!umem) return;
- if (refcount_dec_and_test(&umem->users)) { - INIT_WORK(&umem->work, xdp_umem_release_deferred); - schedule_work(&umem->work); - } + if (refcount_dec_and_test(&umem->users)) + xdp_umem_release(umem); }
static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address) @@@ -140,10 -303,10 +140,10 @@@ static int xdp_umem_account_pages(struc
static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) { + u32 npgs_rem, chunk_size = mr->chunk_size, headroom = mr->headroom; bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG; - u32 chunk_size = mr->chunk_size, headroom = mr->headroom; u64 npgs, addr = mr->addr, size = mr->len; - unsigned int chunks, chunks_per_page; + unsigned int chunks, chunks_rem; int err;
if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) { @@@ -156,7 -319,8 +156,7 @@@ return -EINVAL; }
- if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG | - XDP_UMEM_USES_NEED_WAKEUP)) + if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG) return -EINVAL;
if (!unaligned_chunks && !is_power_of_2(chunk_size)) @@@ -172,19 -336,18 +172,18 @@@ if ((addr + size) < addr) return -EINVAL;
- npgs = size >> PAGE_SHIFT; + npgs = div_u64_rem(size, PAGE_SIZE, &npgs_rem); + if (npgs_rem) + npgs++; if (npgs > U32_MAX) return -EINVAL;
- chunks = (unsigned int)div_u64(size, chunk_size); + chunks = (unsigned int)div_u64_rem(size, chunk_size, &chunks_rem); if (chunks == 0) return -EINVAL;
- if (!unaligned_chunks) { - chunks_per_page = PAGE_SIZE / chunk_size; - if (chunks < chunks_per_page || chunks % chunks_per_page) - return -EINVAL; - } + if (!unaligned_chunks && chunks_rem) + return -EINVAL;
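The reworked xdp_umem_reg() sizing above keeps the remainders explicit (div_u64_rem), so a trailing partial page simply rounds the page count up, while a trailing partial chunk is only tolerated when the UMEM was registered with unaligned chunks. A small userspace sketch of that validation arithmetic using plain / and %; PAGE_SZ and umem_layout_ok() are illustrative, and checks such as the power-of-two chunk size are omitted.

    #include <stdio.h>

    #define PAGE_SZ 4096ULL

    /* returns 1 if a umem of 'size' bytes can be carved into 'chunk'-byte chunks;
     * a trailing partial chunk is only tolerated in "unaligned" mode */
    static int umem_layout_ok(unsigned long long size, unsigned long long chunk,
                              int unaligned)
    {
        unsigned long long npgs = size / PAGE_SZ + (size % PAGE_SZ ? 1 : 0);
        unsigned long long chunks = size / chunk;
        unsigned long long chunks_rem = size % chunk;

        if (!npgs || !chunks)
            return 0;
        if (!unaligned && chunks_rem)
            return 0;
        return 1;
    }

    int main(void)
    {
        /* 1 MiB area, 2 KiB chunks: divides evenly, fine either way */
        printf("%d\n", umem_layout_ok(1 << 20, 2048, 0));
        /* 1 MiB + 100 bytes: the leftover is only allowed in unaligned mode */
        printf("%d %d\n", umem_layout_ok((1 << 20) + 100, 2048, 0),
                          umem_layout_ok((1 << 20) + 100, 2048, 1));
        return 0;
    }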
if (headroom >= chunk_size - XDP_PACKET_HEADROOM) return -EINVAL; @@@ -192,13 -355,13 +191,13 @@@ umem->size = size; umem->headroom = headroom; umem->chunk_size = chunk_size; + umem->chunks = chunks; umem->npgs = (u32)npgs; umem->pgs = NULL; umem->user = NULL; umem->flags = mr->flags; - INIT_LIST_HEAD(&umem->xsk_tx_list); - spin_lock_init(&umem->xsk_tx_list_lock);
+ INIT_LIST_HEAD(&umem->xsk_dma_list); refcount_set(&umem->users, 1);
err = xdp_umem_account_pages(umem); @@@ -209,13 -372,15 +208,13 @@@ if (err) goto out_account;
- umem->pool = xp_create(umem->pgs, umem->npgs, chunks, chunk_size, - headroom, size, unaligned_chunks); - if (!umem->pool) { - err = -ENOMEM; - goto out_pin; - } + err = xdp_umem_addr_map(umem, umem->pgs, umem->npgs); + if (err) + goto out_unpin; + return 0;
-out_pin: +out_unpin: xdp_umem_unpin_pages(umem); out_account: xdp_umem_unaccount_pages(umem); @@@ -247,3 -412,8 +246,3 @@@ struct xdp_umem *xdp_umem_create(struc
return umem; } - -bool xdp_umem_validate_queues(struct xdp_umem *umem) -{ - return umem->fq && umem->cq; -} diff --combined tools/lib/bpf/Makefile index adbe994610f2,9ae8f4ef0aac..f43249696d9f --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@@ -1,9 -1,6 +1,9 @@@ # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) # Most of this file is copied from tools/lib/traceevent/Makefile
+RM ?= rm +srctree = $(abs_srctree) + LIBBPF_VERSION := $(shell \ grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \ sort -rV | head -n1 | cut -d'_' -f2) @@@ -59,10 -56,10 +59,10 @@@ ifndef VERBOS endif
FEATURE_USER = .libbpf -FEATURE_TESTS = libelf libelf-mmap zlib bpf reallocarray +FEATURE_TESTS = libelf zlib bpf FEATURE_DISPLAY = libelf zlib bpf
- INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi + INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/include/uapi FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES)
check_feat := 1 @@@ -101,8 -98,16 +101,8 @@@ els CFLAGS := -g -Wall endif
-ifeq ($(feature-libelf-mmap), 1) - override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT -endif - -ifeq ($(feature-reallocarray), 0) - override CFLAGS += -DCOMPAT_NEED_REALLOCARRAY -endif - # Append required CFLAGS -override CFLAGS += $(EXTRA_WARNINGS) +override CFLAGS += $(EXTRA_WARNINGS) -Wno-switch-enum override CFLAGS += -Werror -Wall override CFLAGS += -fPIC override CFLAGS += $(INCLUDES) @@@ -147,6 -152,7 +147,7 @@@ GLOBAL_SYM_COUNT = $(shell readelf -s - awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}' | \ sort -u | wc -l) VERSIONED_SYM_COUNT = $(shell readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \ + awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}' | \ grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
CMD_TARGETS = $(LIB_TARGET) $(PC_FILE) @@@ -191,7 -197,7 +192,7 @@@ $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $ @ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
$(OUTPUT)libbpf.a: $(BPF_IN_STATIC) - $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^ + $(QUIET_LINK)$(RM) -f $@; $(AR) rcs $@ $^
$(OUTPUT)libbpf.pc: $(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \ @@@ -214,6 -220,7 +215,7 @@@ check_abi: $(OUTPUT)libbpf.s awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'| \ sort -u > $(OUTPUT)libbpf_global_syms.tmp; \ readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \ + awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'| \ grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | \ sort -u > $(OUTPUT)libbpf_versioned_syms.tmp; \ diff -u $(OUTPUT)libbpf_global_syms.tmp \ @@@ -264,10 -271,10 +266,10 @@@ install: install_lib install_pkgconfig ### Cleaning rules
config-clean: - $(call QUIET_CLEAN, config) + $(call QUIET_CLEAN, feature-detect) $(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
-clean: +clean: config-clean $(call QUIET_CLEAN, libbpf) $(RM) -rf $(CMD_TARGETS) \ *~ .*.d .*.cmd LIBBPF-CFLAGS $(BPF_HELPER_DEFS) \ $(SHARED_OBJDIR) $(STATIC_OBJDIR) \ @@@ -294,7 -301,7 +296,7 @@@ cscope cscope -b -q -I $(srctree)/include -f cscope.out
tags: - rm -f TAGS tags + $(RM) -f TAGS tags ls *.c *.h | xargs $(TAGS_PROG) -a
# Declare the contents of the .PHONY variable as phony. We keep that diff --combined tools/lib/bpf/libbpf.c index b688aadf09c5,7253b833576c..46d727b45c81 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@@ -44,6 -44,7 +44,6 @@@ #include <sys/vfs.h> #include <sys/utsname.h> #include <sys/resource.h> -#include <tools/libc_compat.h> #include <libelf.h> #include <gelf.h> #include <zlib.h> @@@ -55,6 -56,9 +55,6 @@@ #include "libbpf_internal.h" #include "hashmap.h"
-/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - #ifndef EM_BPF #define EM_BPF 247 #endif @@@ -63,8 -67,6 +63,8 @@@ #define BPF_FS_MAGIC 0xcafe4a11 #endif
+#define BPF_INSN_SZ (sizeof(struct bpf_insn)) + /* vsprintf() in __base_pr() uses nonliteral format string. It may break * compilation if user enables corresponding warning. Disable it explicitly. */ @@@ -152,35 -154,34 +152,35 @@@ static void pr_perm_msg(int err ___err; }) #endif
-#ifdef HAVE_LIBELF_MMAP_SUPPORT -# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ_MMAP -#else -# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ -#endif - static inline __u64 ptr_to_u64(const void *ptr) { return (__u64) (unsigned long) ptr; }
-struct bpf_capabilities { +enum kern_feature_id { /* v4.14: kernel support for program & map names. */ - __u32 name:1; + FEAT_PROG_NAME, /* v5.2: kernel support for global data sections. */ - __u32 global_data:1; + FEAT_GLOBAL_DATA, + /* BTF support */ + FEAT_BTF, /* BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO support */ - __u32 btf_func:1; + FEAT_BTF_FUNC, /* BTF_KIND_VAR and BTF_KIND_DATASEC support */ - __u32 btf_datasec:1; - /* BPF_F_MMAPABLE is supported for arrays */ - __u32 array_mmap:1; + FEAT_BTF_DATASEC, /* BTF_FUNC_GLOBAL is supported */ - __u32 btf_func_global:1; + FEAT_BTF_GLOBAL_FUNC, + /* BPF_F_MMAPABLE is supported for arrays */ + FEAT_ARRAY_MMAP, /* kernel support for expected_attach_type in BPF_PROG_LOAD */ - __u32 exp_attach_type:1; + FEAT_EXP_ATTACH_TYPE, + /* bpf_probe_read_{kernel,user}[_str] helpers */ + FEAT_PROBE_READ_KERN, + __FEAT_CNT, };
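The libbpf hunk above replaces the per-object bpf_capabilities bitfield with a global enum of feature IDs that a kernel_supports() helper can probe lazily and cache. A rough userspace sketch of that probe-once-and-cache pattern; the probe functions and the feat_results array here are invented and do not reflect libbpf's actual probing code.

    #include <stdio.h>

    enum feature_id {
        FEAT_PROG_NAME,
        FEAT_GLOBAL_DATA,
        FEAT_BTF_FUNC,
        __FEAT_CNT,
    };

    /* 0 = not probed yet, 1 = missing, 2 = present */
    static int feat_results[__FEAT_CNT];

    /* stand-in probes; a real implementation would poke the kernel here */
    static int probe_prog_name(void)   { return 1; }
    static int probe_global_data(void) { return 0; }
    static int probe_btf_func(void)    { return 1; }

    static int (*probes[__FEAT_CNT])(void) = {
        [FEAT_PROG_NAME]   = probe_prog_name,
        [FEAT_GLOBAL_DATA] = probe_global_data,
        [FEAT_BTF_FUNC]    = probe_btf_func,
    };

    static int kernel_supports(enum feature_id id)
    {
        if (!feat_results[id])
            feat_results[id] = probes[id]() ? 2 : 1;
        return feat_results[id] == 2;
    }

    int main(void)
    {
        printf("prog_name: %d\n", kernel_supports(FEAT_PROG_NAME));
        printf("global_data: %d\n", kernel_supports(FEAT_GLOBAL_DATA));
        /* second call hits the cache instead of re-probing */
        printf("prog_name again: %d\n", kernel_supports(FEAT_PROG_NAME));
        return 0;
    }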
+static bool kernel_supports(enum kern_feature_id feat_id); + enum reloc_type { RELO_LD64, RELO_CALL, @@@ -208,7 -209,6 +208,7 @@@ struct bpf_sec_def bool is_exp_attach_type_optional; bool is_attachable; bool is_attach_btf; + bool is_sleepable; attach_fn_t attach_fn; };
@@@ -253,6 -253,8 +253,6 @@@ struct bpf_program __u32 func_info_rec_size; __u32 func_info_cnt;
- struct bpf_capabilities *caps; - void *line_info; __u32 line_info_rec_size; __u32 line_info_cnt; @@@ -401,7 -403,6 +401,7 @@@ struct bpf_object Elf_Data *rodata; Elf_Data *bss; Elf_Data *st_ops_data; + size_t shstrndx; /* section index for section name strings */ size_t strtabidx; struct { GElf_Shdr shdr; @@@ -435,18 -436,12 +435,18 @@@ void *priv; bpf_object_clear_priv_t clear_priv;
- struct bpf_capabilities caps; - char path[]; }; #define obj_elf_valid(o) ((o)->efile.elf)
+static const char *elf_sym_str(const struct bpf_object *obj, size_t off); +static const char *elf_sec_str(const struct bpf_object *obj, size_t off); +static Elf_Scn *elf_sec_by_idx(const struct bpf_object *obj, size_t idx); +static Elf_Scn *elf_sec_by_name(const struct bpf_object *obj, const char *name); +static int elf_sec_hdr(const struct bpf_object *obj, Elf_Scn *scn, GElf_Shdr *hdr); +static const char *elf_sec_name(const struct bpf_object *obj, Elf_Scn *scn); +static Elf_Data *elf_sec_data(const struct bpf_object *obj, Elf_Scn *scn); + void bpf_program__unload(struct bpf_program *prog) { int i; @@@ -508,7 -503,7 +508,7 @@@ static char *__bpf_program__pin_name(st }
static int -bpf_program__init(void *data, size_t size, char *section_name, int idx, +bpf_program__init(void *data, size_t size, const char *section_name, int idx, struct bpf_program *prog) { const size_t bpf_insn_sz = sizeof(struct bpf_insn); @@@ -557,7 -552,7 +557,7 @@@ errout
static int bpf_object__add_program(struct bpf_object *obj, void *data, size_t size, - char *section_name, int idx) + const char *section_name, int idx) { struct bpf_program prog, *progs; int nr_progs, err; @@@ -566,10 -561,11 +566,10 @@@ if (err) return err;
- prog.caps = &obj->caps; progs = obj->programs; nr_progs = obj->nr_programs;
- progs = reallocarray(progs, nr_progs + 1, sizeof(progs[0])); + progs = libbpf_reallocarray(progs, nr_progs + 1, sizeof(progs[0])); if (!progs) { /* * In this case the original obj->programs @@@ -582,7 -578,7 +582,7 @@@ return -ENOMEM; }
- pr_debug("found program %s\n", prog.section_name); + pr_debug("elf: found program '%s'\n", prog.section_name); obj->programs = progs; obj->nr_programs = nr_progs + 1; prog.obj = obj; @@@ -602,7 -598,8 +602,7 @@@ bpf_object__init_prog_names(struct bpf_
prog = &obj->programs[pi];
- for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name; - si++) { + for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name; si++) { GElf_Sym sym;
if (!gelf_getsym(symbols, si, &sym)) @@@ -612,9 -609,11 +612,9 @@@ if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL) continue;
- name = elf_strptr(obj->efile.elf, - obj->efile.strtabidx, - sym.st_name); + name = elf_sym_str(obj, sym.st_name); if (!name) { - pr_warn("failed to get sym name string for prog %s\n", + pr_warn("prog '%s': failed to get symbol name\n", prog->section_name); return -LIBBPF_ERRNO__LIBELF; } @@@ -624,14 -623,17 +624,14 @@@ name = ".text";
if (!name) { - pr_warn("failed to find sym for prog %s\n", + pr_warn("prog '%s': failed to find program symbol\n", prog->section_name); return -EINVAL; }
prog->name = strdup(name); - if (!prog->name) { - pr_warn("failed to allocate memory for prog sym %s\n", - name); + if (!prog->name) return -ENOMEM; - } }
return 0; @@@ -1064,18 -1066,13 +1064,18 @@@ static void bpf_object__elf_finish(stru obj->efile.obj_buf_sz = 0; }
+/* if libelf is old and doesn't support mmap(), fall back to read() */ +#ifndef ELF_C_READ_MMAP +#define ELF_C_READ_MMAP ELF_C_READ +#endif + static int bpf_object__elf_init(struct bpf_object *obj) { int err = 0; GElf_Ehdr *ep;
if (obj_elf_valid(obj)) { - pr_warn("elf init: internal error\n"); + pr_warn("elf: init internal error\n"); return -LIBBPF_ERRNO__LIBELF; }
@@@ -1093,44 -1090,31 +1093,44 @@@
err = -errno; cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg)); - pr_warn("failed to open %s: %s\n", obj->path, cp); + pr_warn("elf: failed to open %s: %s\n", obj->path, cp); return err; }
- obj->efile.elf = elf_begin(obj->efile.fd, - LIBBPF_ELF_C_READ_MMAP, NULL); + obj->efile.elf = elf_begin(obj->efile.fd, ELF_C_READ_MMAP, NULL); }
if (!obj->efile.elf) { - pr_warn("failed to open %s as ELF file\n", obj->path); + pr_warn("elf: failed to open %s as ELF file: %s\n", obj->path, elf_errmsg(-1)); err = -LIBBPF_ERRNO__LIBELF; goto errout; }
if (!gelf_getehdr(obj->efile.elf, &obj->efile.ehdr)) { - pr_warn("failed to get EHDR from %s\n", obj->path); + pr_warn("elf: failed to get ELF header from %s: %s\n", obj->path, elf_errmsg(-1)); err = -LIBBPF_ERRNO__FORMAT; goto errout; } ep = &obj->efile.ehdr;
+ if (elf_getshdrstrndx(obj->efile.elf, &obj->efile.shstrndx)) { + pr_warn("elf: failed to get section names section index for %s: %s\n", + obj->path, elf_errmsg(-1)); + err = -LIBBPF_ERRNO__FORMAT; + goto errout; + } + + /* Elf is corrupted/truncated, avoid calling elf_strptr. */ + if (!elf_rawdata(elf_getscn(obj->efile.elf, obj->efile.shstrndx), NULL)) { + pr_warn("elf: failed to get section names strings from %s: %s\n", + obj->path, elf_errmsg(-1)); + return -LIBBPF_ERRNO__FORMAT; + } + /* Old LLVM set e_machine to EM_NONE */ if (ep->e_type != ET_REL || (ep->e_machine && ep->e_machine != EM_BPF)) { - pr_warn("%s is not an eBPF object file\n", obj->path); + pr_warn("elf: %s is not a valid eBPF object file\n", obj->path); err = -LIBBPF_ERRNO__FORMAT; goto errout; } @@@ -1152,7 -1136,7 +1152,7 @@@ static int bpf_object__check_endianness #else # error "Unrecognized __BYTE_ORDER__" #endif - pr_warn("endianness mismatch.\n"); + pr_warn("elf: endianness mismatch in %s.\n", obj->path); return -LIBBPF_ERRNO__ENDIAN; }
@@@ -1187,10 -1171,55 +1187,10 @@@ static bool bpf_map_type__is_map_in_map return false; }
-static int bpf_object_search_section_size(const struct bpf_object *obj, - const char *name, size_t *d_size) -{ - const GElf_Ehdr *ep = &obj->efile.ehdr; - Elf *elf = obj->efile.elf; - Elf_Scn *scn = NULL; - int idx = 0; - - while ((scn = elf_nextscn(elf, scn)) != NULL) { - const char *sec_name; - Elf_Data *data; - GElf_Shdr sh; - - idx++; - if (gelf_getshdr(scn, &sh) != &sh) { - pr_warn("failed to get section(%d) header from %s\n", - idx, obj->path); - return -EIO; - } - - sec_name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name); - if (!sec_name) { - pr_warn("failed to get section(%d) name from %s\n", - idx, obj->path); - return -EIO; - } - - if (strcmp(name, sec_name)) - continue; - - data = elf_getdata(scn, 0); - if (!data) { - pr_warn("failed to get section(%d) data from %s(%s)\n", - idx, name, obj->path); - return -EIO; - } - - *d_size = data->d_size; - return 0; - } - - return -ENOENT; -} - int bpf_object__section_size(const struct bpf_object *obj, const char *name, __u32 *size) { int ret = -ENOENT; - size_t d_size;
*size = 0; if (!name) { @@@ -1208,13 -1237,9 +1208,13 @@@ if (obj->efile.st_ops_data) *size = obj->efile.st_ops_data->d_size; } else { - ret = bpf_object_search_section_size(obj, name, &d_size); - if (!ret) - *size = d_size; + Elf_Scn *scn = elf_sec_by_name(obj, name); + Elf_Data *data = elf_sec_data(obj, scn); + + if (data) { + ret = 0; /* found it */ + *size = data->d_size; + } }
return *size ? 0 : ret; @@@ -1239,7 -1264,8 +1239,7 @@@ int bpf_object__variable_offset(const s GELF_ST_TYPE(sym.st_info) != STT_OBJECT) continue;
- sname = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name); + sname = elf_sym_str(obj, sym.st_name); if (!sname) { pr_warn("failed to get sym name string for var %s\n", name); @@@ -1264,7 -1290,7 +1264,7 @@@ static struct bpf_map *bpf_object__add_ return &obj->maps[obj->nr_maps++];
new_cap = max((size_t)4, obj->maps_cap * 3 / 2); - new_maps = realloc(obj->maps, new_cap * sizeof(*obj->maps)); + new_maps = libbpf_reallocarray(obj->maps, new_cap, sizeof(*obj->maps)); if (!new_maps) { pr_warn("alloc maps for object failed\n"); return ERR_PTR(-ENOMEM); @@@ -1716,12 -1742,12 +1716,12 @@@ static int bpf_object__init_user_maps(s if (!symbols) return -EINVAL;
- scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx); - if (scn) - data = elf_getdata(scn, NULL); + + scn = elf_sec_by_idx(obj, obj->efile.maps_shndx); + data = elf_sec_data(obj, scn); if (!scn || !data) { - pr_warn("failed to get Elf_Data from map section %d\n", - obj->efile.maps_shndx); + pr_warn("elf: failed to get legacy map definitions for %s\n", + obj->path); return -EINVAL; }
@@@ -1743,12 -1769,12 +1743,12 @@@ nr_maps++; } /* Assume equally sized map definitions */ - pr_debug("maps in %s: %d maps in %zd bytes\n", - obj->path, nr_maps, data->d_size); + pr_debug("elf: found %d legacy map definitions (%zd bytes) in %s\n", + nr_maps, data->d_size, obj->path);
if (!data->d_size || nr_maps == 0 || (data->d_size % nr_maps) != 0) { - pr_warn("unable to determine map definition size section %s, %d maps in %zd bytes\n", - obj->path, nr_maps, data->d_size); + pr_warn("elf: unable to determine legacy map definition size in %s\n", + obj->path); return -EINVAL; } map_def_sz = data->d_size / nr_maps; @@@ -1769,7 -1795,8 +1769,7 @@@ if (IS_ERR(map)) return PTR_ERR(map);
- map_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name); + map_name = elf_sym_str(obj, sym.st_name); if (!map_name) { pr_warn("failed to get map #%d name sym string for obj %s\n", i, obj->path); @@@ -1857,29 -1884,6 +1857,29 @@@ resolve_func_ptr(const struct btf *btf return btf_is_func_proto(t) ? t : NULL; }
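For reference, a minimal sketch of the kind of legacy (non-BTF) map definition that bpf_object__init_user_maps() parses out of the "maps" ELF section. It assumes the struct bpf_map_def layout shipped with libbpf's bpf_helpers.h (or an equivalent local definition); the map name and sizes are illustrative only. Because all such definitions must be equally sized, map_def_sz above is simply data->d_size / nr_maps.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* one fixed-size record per map symbol in the "maps" section */
struct bpf_map_def SEC("maps") legacy_map = {
        .type        = BPF_MAP_TYPE_ARRAY,
        .key_size    = sizeof(__u32),
        .value_size  = sizeof(__u64),
        .max_entries = 64,
};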
+static const char *btf_kind_str(const struct btf_type *t) +{ + switch (btf_kind(t)) { + case BTF_KIND_UNKN: return "void"; + case BTF_KIND_INT: return "int"; + case BTF_KIND_PTR: return "ptr"; + case BTF_KIND_ARRAY: return "array"; + case BTF_KIND_STRUCT: return "struct"; + case BTF_KIND_UNION: return "union"; + case BTF_KIND_ENUM: return "enum"; + case BTF_KIND_FWD: return "fwd"; + case BTF_KIND_TYPEDEF: return "typedef"; + case BTF_KIND_VOLATILE: return "volatile"; + case BTF_KIND_CONST: return "const"; + case BTF_KIND_RESTRICT: return "restrict"; + case BTF_KIND_FUNC: return "func"; + case BTF_KIND_FUNC_PROTO: return "func_proto"; + case BTF_KIND_VAR: return "var"; + case BTF_KIND_DATASEC: return "datasec"; + default: return "unknown"; + } +} + /* * Fetch integer attribute of BTF map definition. Such attributes are * represented using a pointer to an array, in which dimensionality of array @@@ -1896,8 -1900,8 +1896,8 @@@ static bool get_map_field_int(const cha const struct btf_type *arr_t;
if (!btf_is_ptr(t)) { - pr_warn("map '%s': attr '%s': expected PTR, got %u.\n", - map_name, name, btf_kind(t)); + pr_warn("map '%s': attr '%s': expected PTR, got %s.\n", + map_name, name, btf_kind_str(t)); return false; }
@@@ -1908,8 -1912,8 +1908,8 @@@ return false; } if (!btf_is_array(arr_t)) { - pr_warn("map '%s': attr '%s': expected ARRAY, got %u.\n", - map_name, name, btf_kind(arr_t)); + pr_warn("map '%s': attr '%s': expected ARRAY, got %s.\n", + map_name, name, btf_kind_str(arr_t)); return false; } arr_info = btf_array(arr_t); @@@ -1920,7 -1924,7 +1920,7 @@@ static int build_map_pin_path(struct bpf_map *map, const char *path) { char buf[PATH_MAX]; - int err, len; + int len;
if (!path) path = "/sys/fs/bpf"; @@@ -1931,7 -1935,11 +1931,7 @@@ else if (len >= PATH_MAX) return -ENAMETOOLONG;
- err = bpf_map__set_pin_path(map, buf); - if (err) - return err; - - return 0; + return bpf_map__set_pin_path(map, buf); }
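By contrast, here is a minimal sketch of a BTF-defined map whose attributes end up in get_map_field_int() and build_map_pin_path(). In bpf_helpers.h, __uint(name, val) expands to "int (*name)[val]", so the integer value is recovered from the array dimension as described in the comment above; with pinning set to LIBBPF_PIN_BY_NAME, build_map_pin_path() composes "/sys/fs/bpf/<map name>" from the default pin root. The map, key and value types/names below are illustrative.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1024);              /* parsed via array dimensionality */
        __uint(pinning, LIBBPF_PIN_BY_NAME);    /* -> /sys/fs/bpf/hypothetical_map */
        __type(key, __u32);
        __type(value, __u64);
} hypothetical_map SEC(".maps");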
@@@ -1999,8 -2007,8 +1999,8 @@@ static int parse_btf_map_def(struct bpf return -EINVAL; } if (!btf_is_ptr(t)) { - pr_warn("map '%s': key spec is not PTR: %u.\n", - map->name, btf_kind(t)); + pr_warn("map '%s': key spec is not PTR: %s.\n", + map->name, btf_kind_str(t)); return -EINVAL; } sz = btf__resolve_size(obj->btf, t->type); @@@ -2041,8 -2049,8 +2041,8 @@@ return -EINVAL; } if (!btf_is_ptr(t)) { - pr_warn("map '%s': value spec is not PTR: %u.\n", - map->name, btf_kind(t)); + pr_warn("map '%s': value spec is not PTR: %s.\n", + map->name, btf_kind_str(t)); return -EINVAL; } sz = btf__resolve_size(obj->btf, t->type); @@@ -2099,14 -2107,14 +2099,14 @@@ t = skip_mods_and_typedefs(obj->btf, btf_array(t)->type, NULL); if (!btf_is_ptr(t)) { - pr_warn("map '%s': map-in-map inner def is of unexpected kind %u.\n", - map->name, btf_kind(t)); + pr_warn("map '%s': map-in-map inner def is of unexpected kind %s.\n", + map->name, btf_kind_str(t)); return -EINVAL; } t = skip_mods_and_typedefs(obj->btf, t->type, NULL); if (!btf_is_struct(t)) { - pr_warn("map '%s': map-in-map inner def is of unexpected kind %u.\n", - map->name, btf_kind(t)); + pr_warn("map '%s': map-in-map inner def is of unexpected kind %s.\n", + map->name, btf_kind_str(t)); return -EINVAL; }
@@@ -2197,8 -2205,8 +2197,8 @@@ static int bpf_object__init_user_btf_ma return -EINVAL; } if (!btf_is_var(var)) { - pr_warn("map '%s': unexpected var kind %u.\n", - map_name, btf_kind(var)); + pr_warn("map '%s': unexpected var kind %s.\n", + map_name, btf_kind_str(var)); return -EINVAL; } if (var_extra->linkage != BTF_VAR_GLOBAL_ALLOCATED && @@@ -2210,8 -2218,8 +2210,8 @@@
def = skip_mods_and_typedefs(obj->btf, var->type, NULL); if (!btf_is_struct(def)) { - pr_warn("map '%s': unexpected def kind %u.\n", - map_name, btf_kind(var)); + pr_warn("map '%s': unexpected def kind %s.\n", + map_name, btf_kind_str(var)); return -EINVAL; } if (def->size > vi->size) { @@@ -2251,11 -2259,12 +2251,11 @@@ static int bpf_object__init_user_btf_ma if (obj->efile.btf_maps_shndx < 0) return 0;
- scn = elf_getscn(obj->efile.elf, obj->efile.btf_maps_shndx); - if (scn) - data = elf_getdata(scn, NULL); + scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx); + data = elf_sec_data(obj, scn); if (!scn || !data) { - pr_warn("failed to get Elf_Data from map section %d (%s)\n", - obj->efile.btf_maps_shndx, MAPS_ELF_SEC); + pr_warn("elf: failed to get %s map definitions for %s\n", + MAPS_ELF_SEC, obj->path); return -EINVAL; }
@@@ -2313,28 -2322,36 +2313,28 @@@ static int bpf_object__init_maps(struc
static bool section_have_execinstr(struct bpf_object *obj, int idx) { - Elf_Scn *scn; GElf_Shdr sh;
- scn = elf_getscn(obj->efile.elf, idx); - if (!scn) - return false; - - if (gelf_getshdr(scn, &sh) != &sh) + if (elf_sec_hdr(obj, elf_sec_by_idx(obj, idx), &sh)) return false;
- if (sh.sh_flags & SHF_EXECINSTR) - return true; - - return false; + return sh.sh_flags & SHF_EXECINSTR; }
static bool btf_needs_sanitization(struct bpf_object *obj) { - bool has_func_global = obj->caps.btf_func_global; - bool has_datasec = obj->caps.btf_datasec; - bool has_func = obj->caps.btf_func; + bool has_func_global = kernel_supports(FEAT_BTF_GLOBAL_FUNC); + bool has_datasec = kernel_supports(FEAT_BTF_DATASEC); + bool has_func = kernel_supports(FEAT_BTF_FUNC);
return !has_func || !has_datasec || !has_func_global; }
static void bpf_object__sanitize_btf(struct bpf_object *obj, struct btf *btf) { - bool has_func_global = obj->caps.btf_func_global; - bool has_datasec = obj->caps.btf_datasec; - bool has_func = obj->caps.btf_func; + bool has_func_global = kernel_supports(FEAT_BTF_GLOBAL_FUNC); + bool has_datasec = kernel_supports(FEAT_BTF_DATASEC); + bool has_func = kernel_supports(FEAT_BTF_FUNC); struct btf_type *t; int i, j, vlen;
@@@ -2482,7 -2499,7 +2482,7 @@@ static int bpf_object__load_vmlinux_btf int err;
/* CO-RE relocations need kernel BTF */ - if (obj->btf_ext && obj->btf_ext->field_reloc_info.len) + if (obj->btf_ext && obj->btf_ext->core_relo_info.len) need_vmlinux_btf = true;
bpf_object__for_each_program(prog, obj) { @@@ -2516,15 -2533,6 +2516,15 @@@ static int bpf_object__sanitize_and_loa if (!obj->btf) return 0;
+ if (!kernel_supports(FEAT_BTF)) { + if (kernel_needs_btf(obj)) { + err = -EOPNOTSUPP; + goto report; + } + pr_debug("Kernel doesn't support BTF, skipping uploading it.\n"); + return 0; + } + sanitize = btf_needs_sanitization(obj); if (sanitize) { const void *raw_data; @@@ -2550,7 -2558,6 +2550,7 @@@ } btf__free(kern_btf); } +report: if (err) { btf_mandatory = kernel_needs_btf(obj); pr_warn("Error loading .BTF into kernel: %d. %s\n", err, @@@ -2562,199 -2569,61 +2562,199 @@@ return err; }
+static const char *elf_sym_str(const struct bpf_object *obj, size_t off) +{ + const char *name; + + name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, off); + if (!name) { + pr_warn("elf: failed to get section name string at offset %zu from %s: %s\n", + off, obj->path, elf_errmsg(-1)); + return NULL; + } + + return name; +} + +static const char *elf_sec_str(const struct bpf_object *obj, size_t off) +{ + const char *name; + + name = elf_strptr(obj->efile.elf, obj->efile.shstrndx, off); + if (!name) { + pr_warn("elf: failed to get section name string at offset %zu from %s: %s\n", + off, obj->path, elf_errmsg(-1)); + return NULL; + } + + return name; +} + +static Elf_Scn *elf_sec_by_idx(const struct bpf_object *obj, size_t idx) +{ + Elf_Scn *scn; + + scn = elf_getscn(obj->efile.elf, idx); + if (!scn) { + pr_warn("elf: failed to get section(%zu) from %s: %s\n", + idx, obj->path, elf_errmsg(-1)); + return NULL; + } + return scn; +} + +static Elf_Scn *elf_sec_by_name(const struct bpf_object *obj, const char *name) +{ + Elf_Scn *scn = NULL; + Elf *elf = obj->efile.elf; + const char *sec_name; + + while ((scn = elf_nextscn(elf, scn)) != NULL) { + sec_name = elf_sec_name(obj, scn); + if (!sec_name) + return NULL; + + if (strcmp(sec_name, name) != 0) + continue; + + return scn; + } + return NULL; +} + +static int elf_sec_hdr(const struct bpf_object *obj, Elf_Scn *scn, GElf_Shdr *hdr) +{ + if (!scn) + return -EINVAL; + + if (gelf_getshdr(scn, hdr) != hdr) { + pr_warn("elf: failed to get section(%zu) header from %s: %s\n", + elf_ndxscn(scn), obj->path, elf_errmsg(-1)); + return -EINVAL; + } + + return 0; +} + +static const char *elf_sec_name(const struct bpf_object *obj, Elf_Scn *scn) +{ + const char *name; + GElf_Shdr sh; + + if (!scn) + return NULL; + + if (elf_sec_hdr(obj, scn, &sh)) + return NULL; + + name = elf_sec_str(obj, sh.sh_name); + if (!name) { + pr_warn("elf: failed to get section(%zu) name from %s: %s\n", + elf_ndxscn(scn), obj->path, elf_errmsg(-1)); + return NULL; + } + + return name; +} + +static Elf_Data *elf_sec_data(const struct bpf_object *obj, Elf_Scn *scn) +{ + Elf_Data *data; + + if (!scn) + return NULL; + + data = elf_getdata(scn, 0); + if (!data) { + pr_warn("elf: failed to get section(%zu) %s data from %s: %s\n", + elf_ndxscn(scn), elf_sec_name(obj, scn) ?: "<?>", + obj->path, elf_errmsg(-1)); + return NULL; + } + + return data; +} + +static bool is_sec_name_dwarf(const char *name) +{ + /* approximation, but the actual list is too long */ + return strncmp(name, ".debug_", sizeof(".debug_") - 1) == 0; +} + +static bool ignore_elf_section(GElf_Shdr *hdr, const char *name) +{ + /* no special handling of .strtab */ + if (hdr->sh_type == SHT_STRTAB) + return true; + + /* ignore .llvm_addrsig section as well */ + if (hdr->sh_type == 0x6FFF4C03 /* SHT_LLVM_ADDRSIG */) + return true; + + /* no subprograms will lead to an empty .text section, ignore it */ + if (hdr->sh_type == SHT_PROGBITS && hdr->sh_size == 0 && + strcmp(name, ".text") == 0) + return true; + + /* DWARF sections */ + if (is_sec_name_dwarf(name)) + return true; + + if (strncmp(name, ".rel", sizeof(".rel") - 1) == 0) { + name += sizeof(".rel") - 1; + /* DWARF section relocations */ + if (is_sec_name_dwarf(name)) + return true; + + /* .BTF and .BTF.ext don't need relocations */ + if (strcmp(name, BTF_ELF_SEC) == 0 || + strcmp(name, BTF_EXT_ELF_SEC) == 0) + return true; + } + + return false; +} + static int bpf_object__elf_collect(struct bpf_object *obj) { Elf *elf = obj->efile.elf; - GElf_Ehdr *ep = 
&obj->efile.ehdr; Elf_Data *btf_ext_data = NULL; Elf_Data *btf_data = NULL; Elf_Scn *scn = NULL; int idx = 0, err = 0;
- /* Elf is corrupted/truncated, avoid calling elf_strptr. */ - if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL)) { - pr_warn("failed to get e_shstrndx from %s\n", obj->path); - return -LIBBPF_ERRNO__FORMAT; - } - while ((scn = elf_nextscn(elf, scn)) != NULL) { - char *name; + const char *name; GElf_Shdr sh; Elf_Data *data;
idx++; - if (gelf_getshdr(scn, &sh) != &sh) { - pr_warn("failed to get section(%d) header from %s\n", - idx, obj->path); + + if (elf_sec_hdr(obj, scn, &sh)) return -LIBBPF_ERRNO__FORMAT; - }
- name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name); - if (!name) { - pr_warn("failed to get section(%d) name from %s\n", - idx, obj->path); + name = elf_sec_str(obj, sh.sh_name); + if (!name) return -LIBBPF_ERRNO__FORMAT; - }
- data = elf_getdata(scn, 0); - if (!data) { - pr_warn("failed to get section(%d) data from %s(%s)\n", - idx, name, obj->path); + if (ignore_elf_section(&sh, name)) + continue; + + data = elf_sec_data(obj, scn); + if (!data) return -LIBBPF_ERRNO__FORMAT; - } - pr_debug("section(%d) %s, size %ld, link %d, flags %lx, type=%d\n", + + pr_debug("elf: section(%d) %s, size %ld, link %d, flags %lx, type=%d\n", idx, name, (unsigned long)data->d_size, (int)sh.sh_link, (unsigned long)sh.sh_flags, (int)sh.sh_type);
if (strcmp(name, "license") == 0) { - err = bpf_object__init_license(obj, - data->d_buf, - data->d_size); + err = bpf_object__init_license(obj, data->d_buf, data->d_size); if (err) return err; } else if (strcmp(name, "version") == 0) { - err = bpf_object__init_kversion(obj, - data->d_buf, - data->d_size); + err = bpf_object__init_kversion(obj, data->d_buf, data->d_size); if (err) return err; } else if (strcmp(name, "maps") == 0) { @@@ -2767,7 -2636,8 +2767,7 @@@ btf_ext_data = data; } else if (sh.sh_type == SHT_SYMTAB) { if (obj->efile.symbols) { - pr_warn("bpf: multiple SYMTAB in %s\n", - obj->path); + pr_warn("elf: multiple symbol tables in %s\n", obj->path); return -LIBBPF_ERRNO__FORMAT; } obj->efile.symbols = data; @@@ -2780,8 -2650,16 +2780,8 @@@ err = bpf_object__add_program(obj, data->d_buf, data->d_size, name, idx); - if (err) { - char errmsg[STRERR_BUFSIZE]; - char *cp; - - cp = libbpf_strerror_r(-err, errmsg, - sizeof(errmsg)); - pr_warn("failed to alloc program %s (%s): %s", - name, obj->path, cp); + if (err) return err; - } } else if (strcmp(name, DATA_SEC) == 0) { obj->efile.data = data; obj->efile.data_shndx = idx; @@@ -2792,8 -2670,7 +2792,8 @@@ obj->efile.st_ops_data = data; obj->efile.st_ops_shndx = idx; } else { - pr_debug("skip section(%d) %s\n", idx, name); + pr_info("elf: skipping unrecognized data section(%d) %s\n", + idx, name); } } else if (sh.sh_type == SHT_REL) { int nr_sects = obj->efile.nr_reloc_sects; @@@ -2804,33 -2681,34 +2804,33 @@@ if (!section_have_execinstr(obj, sec) && strcmp(name, ".rel" STRUCT_OPS_SEC) && strcmp(name, ".rel" MAPS_ELF_SEC)) { - pr_debug("skip relo %s(%d) for section(%d)\n", - name, idx, sec); + pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n", + idx, name, sec, + elf_sec_name(obj, elf_sec_by_idx(obj, sec)) ?: "<?>"); continue; }
- sects = reallocarray(sects, nr_sects + 1, - sizeof(*obj->efile.reloc_sects)); - if (!sects) { - pr_warn("reloc_sects realloc failed\n"); + sects = libbpf_reallocarray(sects, nr_sects + 1, + sizeof(*obj->efile.reloc_sects)); + if (!sects) return -ENOMEM; - }
obj->efile.reloc_sects = sects; obj->efile.nr_reloc_sects++;
obj->efile.reloc_sects[nr_sects].shdr = sh; obj->efile.reloc_sects[nr_sects].data = data; - } else if (sh.sh_type == SHT_NOBITS && - strcmp(name, BSS_SEC) == 0) { + } else if (sh.sh_type == SHT_NOBITS && strcmp(name, BSS_SEC) == 0) { obj->efile.bss = data; obj->efile.bss_shndx = idx; } else { - pr_debug("skip section(%d) %s\n", idx, name); + pr_info("elf: skipping section(%d) %s (size %zu)\n", idx, name, + (size_t)sh.sh_size); } }
if (!obj->efile.strtabidx || obj->efile.strtabidx > idx) { - pr_warn("Corrupted ELF file: index of strtab invalid\n"); + pr_warn("elf: symbol strings section missing or invalid in %s\n", obj->path); return -LIBBPF_ERRNO__FORMAT; } return bpf_object__init_btf(obj, btf_data, btf_ext_data); @@@ -2991,13 -2869,14 +2991,13 @@@ static int bpf_object__collect_externs( if (!obj->efile.symbols) return 0;
- scn = elf_getscn(obj->efile.elf, obj->efile.symbols_shndx); - if (!scn) - return -LIBBPF_ERRNO__FORMAT; - if (gelf_getshdr(scn, &sh) != &sh) + scn = elf_sec_by_idx(obj, obj->efile.symbols_shndx); + if (elf_sec_hdr(obj, scn, &sh)) return -LIBBPF_ERRNO__FORMAT; - n = sh.sh_size / sh.sh_entsize;
+ n = sh.sh_size / sh.sh_entsize; pr_debug("looking for externs among %d symbols...\n", n); + for (i = 0; i < n; i++) { GElf_Sym sym;
@@@ -3005,12 -2884,13 +3005,12 @@@ return -LIBBPF_ERRNO__FORMAT; if (!sym_is_extern(&sym)) continue; - ext_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name); + ext_name = elf_sym_str(obj, sym.st_name); if (!ext_name || !ext_name[0]) continue;
ext = obj->externs; - ext = reallocarray(ext, obj->nr_extern + 1, sizeof(*ext)); + ext = libbpf_reallocarray(ext, obj->nr_extern + 1, sizeof(*ext)); if (!ext) return -ENOMEM; obj->externs = ext; @@@ -3229,7 -3109,7 +3229,7 @@@ bpf_object__section_to_libbpf_map_type(
static int bpf_program__record_reloc(struct bpf_program *prog, struct reloc_desc *reloc_desc, - __u32 insn_idx, const char *name, + __u32 insn_idx, const char *sym_name, const GElf_Sym *sym, const GElf_Rel *rel) { struct bpf_insn *insn = &prog->insns[insn_idx]; @@@ -3237,25 -3117,22 +3237,25 @@@ struct bpf_object *obj = prog->obj; __u32 shdr_idx = sym->st_shndx; enum libbpf_map_type type; + const char *sym_sec_name; struct bpf_map *map;
/* sub-program call relocation */ if (insn->code == (BPF_JMP | BPF_CALL)) { if (insn->src_reg != BPF_PSEUDO_CALL) { - pr_warn("incorrect bpf_call opcode\n"); + pr_warn("prog '%s': incorrect bpf_call opcode\n", prog->name); return -LIBBPF_ERRNO__RELOC; } /* text_shndx can be 0, if no default "main" program exists */ if (!shdr_idx || shdr_idx != obj->efile.text_shndx) { - pr_warn("bad call relo against section %u\n", shdr_idx); + sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx)); + pr_warn("prog '%s': bad call relo against '%s' in section '%s'\n", + prog->name, sym_name, sym_sec_name); return -LIBBPF_ERRNO__RELOC; } - if (sym->st_value % 8) { - pr_warn("bad call relo offset: %zu\n", - (size_t)sym->st_value); + if (sym->st_value % BPF_INSN_SZ) { + pr_warn("prog '%s': bad call relo against '%s' at offset %zu\n", + prog->name, sym_name, (size_t)sym->st_value); return -LIBBPF_ERRNO__RELOC; } reloc_desc->type = RELO_CALL; @@@ -3266,8 -3143,8 +3266,8 @@@ }
if (insn->code != (BPF_LD | BPF_IMM | BPF_DW)) { - pr_warn("invalid relo for insns[%d].code 0x%x\n", - insn_idx, insn->code); + pr_warn("prog '%s': invalid relo against '%s' for insns[%d].code 0x%x\n", + prog->name, sym_name, insn_idx, insn->code); return -LIBBPF_ERRNO__RELOC; }
@@@ -3282,12 -3159,12 +3282,12 @@@ break; } if (i >= n) { - pr_warn("extern relo failed to find extern for sym %d\n", - sym_idx); + pr_warn("prog '%s': extern relo failed to find extern for '%s' (%d)\n", + prog->name, sym_name, sym_idx); return -LIBBPF_ERRNO__RELOC; } - pr_debug("found extern #%d '%s' (sym %d) for insn %u\n", - i, ext->name, ext->sym_idx, insn_idx); + pr_debug("prog '%s': found extern #%d '%s' (sym %d) for insn #%u\n", + prog->name, i, ext->name, ext->sym_idx, insn_idx); reloc_desc->type = RELO_EXTERN; reloc_desc->insn_idx = insn_idx; reloc_desc->sym_off = i; /* sym_off stores extern index */ @@@ -3295,19 -3172,18 +3295,19 @@@ }
if (!shdr_idx || shdr_idx >= SHN_LORESERVE) { - pr_warn("invalid relo for '%s' in special section 0x%x; forgot to initialize global var?..\n", - name, shdr_idx); + pr_warn("prog '%s': invalid relo against '%s' in special section 0x%x; forgot to initialize global var?..\n", + prog->name, sym_name, shdr_idx); return -LIBBPF_ERRNO__RELOC; }
type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx); + sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx));
/* generic map reference relocation */ if (type == LIBBPF_MAP_UNSPEC) { if (!bpf_object__shndx_is_maps(obj, shdr_idx)) { - pr_warn("bad map relo against section %u\n", - shdr_idx); + pr_warn("prog '%s': bad map relo against '%s' in section '%s'\n", + prog->name, sym_name, sym_sec_name); return -LIBBPF_ERRNO__RELOC; } for (map_idx = 0; map_idx < nr_maps; map_idx++) { @@@ -3316,14 -3192,14 +3316,14 @@@ map->sec_idx != sym->st_shndx || map->sec_offset != sym->st_value) continue; - pr_debug("found map %zd (%s, sec %d, off %zu) for insn %u\n", - map_idx, map->name, map->sec_idx, + pr_debug("prog '%s': found map %zd (%s, sec %d, off %zu) for insn #%u\n", + prog->name, map_idx, map->name, map->sec_idx, map->sec_offset, insn_idx); break; } if (map_idx >= nr_maps) { - pr_warn("map relo failed to find map for sec %u, off %zu\n", - shdr_idx, (size_t)sym->st_value); + pr_warn("prog '%s': map relo failed to find map for section '%s', off %zu\n", + prog->name, sym_sec_name, (size_t)sym->st_value); return -LIBBPF_ERRNO__RELOC; } reloc_desc->type = RELO_LD64; @@@ -3335,22 -3211,21 +3335,22 @@@
/* global data map relocation */ if (!bpf_object__shndx_is_data(obj, shdr_idx)) { - pr_warn("bad data relo against section %u\n", shdr_idx); + pr_warn("prog '%s': bad data relo against section '%s'\n", + prog->name, sym_sec_name); return -LIBBPF_ERRNO__RELOC; } for (map_idx = 0; map_idx < nr_maps; map_idx++) { map = &obj->maps[map_idx]; if (map->libbpf_type != type) continue; - pr_debug("found data map %zd (%s, sec %d, off %zu) for insn %u\n", - map_idx, map->name, map->sec_idx, map->sec_offset, - insn_idx); + pr_debug("prog '%s': found data map %zd (%s, sec %d, off %zu) for insn %u\n", + prog->name, map_idx, map->name, map->sec_idx, + map->sec_offset, insn_idx); break; } if (map_idx >= nr_maps) { - pr_warn("data relo failed to find map for sec %u\n", - shdr_idx); + pr_warn("prog '%s': data relo failed to find map for section '%s'\n", + prog->name, sym_sec_name); return -LIBBPF_ERRNO__RELOC; }
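To make the relocation classes above concrete, a hedged sketch of BPF C code that produces them; the map, program and variable names are made up. Taking the address of a map emits a BPF_LD | BPF_IMM | BPF_DW (ldimm64) instruction with an ELF relocation against the map's symbol in the maps section, which bpf_program__record_reloc() records as RELO_LD64; referencing a global variable instead resolves against the internal .data/.bss/.rodata map.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 16);
        __type(key, __u32);
        __type(value, __u64);
} hypothetical_map SEC(".maps");

__u32 key;                              /* zero-initialized global -> .bss */

SEC("tracepoint/syscalls/sys_enter_write")
int handle_write(void *ctx)
{
        /* &hypothetical_map becomes an ldimm64 insn + map relocation,
         * &key becomes an ldimm64 insn + global data relocation */
        __u64 *val = bpf_map_lookup_elem(&hypothetical_map, &key);

        return val ? 0 : 1;
}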
@@@ -3366,17 -3241,9 +3366,17 @@@ bpf_program__collect_reloc(struct bpf_p Elf_Data *data, struct bpf_object *obj) { Elf_Data *symbols = obj->efile.symbols; + const char *relo_sec_name, *sec_name; + size_t sec_idx = shdr->sh_info; int err, i, nrels;
- pr_debug("collecting relocating info for: '%s'\n", prog->section_name); + relo_sec_name = elf_sec_str(obj, shdr->sh_name); + sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx)); + if (!relo_sec_name || !sec_name) + return -EINVAL; + + pr_debug("sec '%s': collecting relocation for section(%zu) '%s'\n", + relo_sec_name, sec_idx, sec_name); nrels = shdr->sh_size / shdr->sh_entsize;
prog->reloc_desc = malloc(sizeof(*prog->reloc_desc) * nrels); @@@ -3387,34 -3254,35 +3387,34 @@@ prog->nr_reloc = nrels;
for (i = 0; i < nrels; i++) { - const char *name; + const char *sym_name; __u32 insn_idx; GElf_Sym sym; GElf_Rel rel;
if (!gelf_getrel(data, i, &rel)) { - pr_warn("relocation: failed to get %d reloc\n", i); + pr_warn("sec '%s': failed to get relo #%d\n", relo_sec_name, i); return -LIBBPF_ERRNO__FORMAT; } if (!gelf_getsym(symbols, GELF_R_SYM(rel.r_info), &sym)) { - pr_warn("relocation: symbol %"PRIx64" not found\n", - GELF_R_SYM(rel.r_info)); + pr_warn("sec '%s': symbol 0x%zx not found for relo #%d\n", + relo_sec_name, (size_t)GELF_R_SYM(rel.r_info), i); return -LIBBPF_ERRNO__FORMAT; } - if (rel.r_offset % sizeof(struct bpf_insn)) + if (rel.r_offset % BPF_INSN_SZ) { + pr_warn("sec '%s': invalid offset 0x%zx for relo #%d\n", + relo_sec_name, (size_t)GELF_R_SYM(rel.r_info), i); return -LIBBPF_ERRNO__FORMAT; + }
- insn_idx = rel.r_offset / sizeof(struct bpf_insn); - name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name) ? : "<?>"; + insn_idx = rel.r_offset / BPF_INSN_SZ; + sym_name = elf_sym_str(obj, sym.st_name) ?: "<?>";
- pr_debug("relo for shdr %u, symb %zu, value %zu, type %d, bind %d, name %d ('%s'), insn %u\n", - (__u32)sym.st_shndx, (size_t)GELF_R_SYM(rel.r_info), - (size_t)sym.st_value, GELF_ST_TYPE(sym.st_info), - GELF_ST_BIND(sym.st_info), sym.st_name, name, - insn_idx); + pr_debug("sec '%s': relo #%d: insn #%u against '%s'\n", + relo_sec_name, i, insn_idx, sym_name);
err = bpf_program__record_reloc(prog, &prog->reloc_desc[i], - insn_idx, name, &sym, &rel); + insn_idx, sym_name, &sym, &rel); if (err) return err; } @@@ -3565,14 -3433,8 +3565,14 @@@ bpf_object__probe_loading(struct bpf_ob return 0; }
-static int -bpf_object__probe_name(struct bpf_object *obj) +static int probe_fd(int fd) +{ + if (fd >= 0) + close(fd); + return fd >= 0; +} + +static int probe_kern_prog_name(void) { struct bpf_load_program_attr attr; struct bpf_insn insns[] = { @@@ -3590,10 -3452,16 +3590,10 @@@ attr.license = "GPL"; attr.name = "test"; ret = bpf_load_program_xattr(&attr, NULL, 0); - if (ret >= 0) { - obj->caps.name = 1; - close(ret); - } - - return 0; + return probe_fd(ret); }
-static int -bpf_object__probe_global_data(struct bpf_object *obj) +static int probe_kern_global_data(void) { struct bpf_load_program_attr prg_attr; struct bpf_create_map_attr map_attr; @@@ -3630,23 -3498,16 +3630,23 @@@ prg_attr.license = "GPL";
ret = bpf_load_program_xattr(&prg_attr, NULL, 0); - if (ret >= 0) { - obj->caps.global_data = 1; - close(ret); - } - close(map); - return 0; + return probe_fd(ret); +} + +static int probe_kern_btf(void) +{ + static const char strs[] = "\0int"; + __u32 types[] = { + /* int */ + BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), + }; + + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs))); }
-static int bpf_object__probe_btf_func(struct bpf_object *obj) +static int probe_kern_btf_func(void) { static const char strs[] = "\0int\0x\0a"; /* void x(int a) {} */ @@@ -3659,12 -3520,20 +3659,12 @@@ /* FUNC x */ /* [3] */ BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2), }; - int btf_fd;
- btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), - strs, sizeof(strs)); - if (btf_fd >= 0) { - obj->caps.btf_func = 1; - close(btf_fd); - return 1; - } - - return 0; + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs))); }
-static int bpf_object__probe_btf_func_global(struct bpf_object *obj) +static int probe_kern_btf_func_global(void) { static const char strs[] = "\0int\0x\0a"; /* static void x(int a) {} */ @@@ -3677,12 -3546,20 +3677,12 @@@ /* FUNC x BTF_FUNC_GLOBAL */ /* [3] */ BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2), }; - int btf_fd;
- btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), - strs, sizeof(strs)); - if (btf_fd >= 0) { - obj->caps.btf_func_global = 1; - close(btf_fd); - return 1; - } - - return 0; + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs))); }
-static int bpf_object__probe_btf_datasec(struct bpf_object *obj) +static int probe_kern_btf_datasec(void) { static const char strs[] = "\0x\0.data"; /* static int a; */ @@@ -3696,12 -3573,20 +3696,12 @@@ BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4), BTF_VAR_SECINFO_ENC(2, 0, 4), }; - int btf_fd; - - btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), - strs, sizeof(strs)); - if (btf_fd >= 0) { - obj->caps.btf_datasec = 1; - close(btf_fd); - return 1; - }
- return 0; + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs))); }
-static int bpf_object__probe_array_mmap(struct bpf_object *obj) +static int probe_kern_array_mmap(void) { struct bpf_create_map_attr attr = { .map_type = BPF_MAP_TYPE_ARRAY, @@@ -3710,17 -3595,27 +3710,17 @@@ .value_size = sizeof(int), .max_entries = 1, }; - int fd; - - fd = bpf_create_map_xattr(&attr); - if (fd >= 0) { - obj->caps.array_mmap = 1; - close(fd); - return 1; - }
- return 0; + return probe_fd(bpf_create_map_xattr(&attr)); }
-static int -bpf_object__probe_exp_attach_type(struct bpf_object *obj) +static int probe_kern_exp_attach_type(void) { struct bpf_load_program_attr attr; struct bpf_insn insns[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }; - int fd;
memset(&attr, 0, sizeof(attr)); /* use any valid combination of program type and (optional) @@@ -3734,91 -3629,36 +3734,91 @@@ attr.insns_cnt = ARRAY_SIZE(insns); attr.license = "GPL";
- fd = bpf_load_program_xattr(&attr, NULL, 0); - if (fd >= 0) { - obj->caps.exp_attach_type = 1; - close(fd); - return 1; - } - return 0; + return probe_fd(bpf_load_program_xattr(&attr, NULL, 0)); }
-static int -bpf_object__probe_caps(struct bpf_object *obj) -{ - int (*probe_fn[])(struct bpf_object *obj) = { - bpf_object__probe_name, - bpf_object__probe_global_data, - bpf_object__probe_btf_func, - bpf_object__probe_btf_func_global, - bpf_object__probe_btf_datasec, - bpf_object__probe_array_mmap, - bpf_object__probe_exp_attach_type, +static int probe_kern_probe_read_kernel(void) +{ + struct bpf_load_program_attr attr; + struct bpf_insn insns[] = { + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */ + BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */ + BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */ + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel), + BPF_EXIT_INSN(), }; - int i, ret;
- for (i = 0; i < ARRAY_SIZE(probe_fn); i++) { - ret = probe_fn[i](obj); - if (ret < 0) - pr_debug("Probe #%d failed with %d.\n", i, ret); + memset(&attr, 0, sizeof(attr)); + attr.prog_type = BPF_PROG_TYPE_KPROBE; + attr.insns = insns; + attr.insns_cnt = ARRAY_SIZE(insns); + attr.license = "GPL"; + + return probe_fd(bpf_load_program_xattr(&attr, NULL, 0)); +} + +enum kern_feature_result { + FEAT_UNKNOWN = 0, + FEAT_SUPPORTED = 1, + FEAT_MISSING = 2, +}; + +typedef int (*feature_probe_fn)(void); + +static struct kern_feature_desc { + const char *desc; + feature_probe_fn probe; + enum kern_feature_result res; +} feature_probes[__FEAT_CNT] = { + [FEAT_PROG_NAME] = { + "BPF program name", probe_kern_prog_name, + }, + [FEAT_GLOBAL_DATA] = { + "global variables", probe_kern_global_data, + }, + [FEAT_BTF] = { + "minimal BTF", probe_kern_btf, + }, + [FEAT_BTF_FUNC] = { + "BTF functions", probe_kern_btf_func, + }, + [FEAT_BTF_GLOBAL_FUNC] = { + "BTF global function", probe_kern_btf_func_global, + }, + [FEAT_BTF_DATASEC] = { + "BTF data section and variable", probe_kern_btf_datasec, + }, + [FEAT_ARRAY_MMAP] = { + "ARRAY map mmap()", probe_kern_array_mmap, + }, + [FEAT_EXP_ATTACH_TYPE] = { + "BPF_PROG_LOAD expected_attach_type attribute", + probe_kern_exp_attach_type, + }, + [FEAT_PROBE_READ_KERN] = { + "bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel, } +};
- return 0; +static bool kernel_supports(enum kern_feature_id feat_id) +{ + struct kern_feature_desc *feat = &feature_probes[feat_id]; + int ret; + + if (READ_ONCE(feat->res) == FEAT_UNKNOWN) { + ret = feat->probe(); + if (ret > 0) { + WRITE_ONCE(feat->res, FEAT_SUPPORTED); + } else if (ret == 0) { + WRITE_ONCE(feat->res, FEAT_MISSING); + } else { + pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret); + WRITE_ONCE(feat->res, FEAT_MISSING); + } + } + + return READ_ONCE(feat->res) == FEAT_SUPPORTED; }
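The table-driven probing above replaces the per-object caps flags. As a purely hypothetical sketch (FEAT_EXAMPLE and probe_kern_example() are made up and not part of this commit), a new feature check would follow the same shape: a probe that funnels a feature-specific attempt through probe_fd(), plus one feature_probes[] entry keyed by a new enum kern_feature_id value, after which callers simply ask kernel_supports(FEAT_EXAMPLE).

static int probe_kern_example(void)
{
        struct bpf_create_map_attr attr = {
                .map_type = BPF_MAP_TYPE_HASH,
                .key_size = sizeof(int),
                .value_size = sizeof(int),
                .max_entries = 1,
        };

        /* success or failure of the attempt is the whole answer */
        return probe_fd(bpf_create_map_xattr(&attr));
}

/* ...and in feature_probes[]:
 *      [FEAT_EXAMPLE] = { "example feature", probe_kern_example, },
 */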
static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd) @@@ -3920,7 -3760,7 +3920,7 @@@ static int bpf_object__create_map(struc
memset(&create_attr, 0, sizeof(create_attr));
- if (obj->caps.name) + if (kernel_supports(FEAT_PROG_NAME)) create_attr.name = map->name; create_attr.map_ifindex = map->map_ifindex; create_attr.map_type = def->type; @@@ -4171,10 -4011,6 +4171,10 @@@ struct bpf_core_spec const struct btf *btf; /* high-level spec: named fields and array indices only */ struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN]; + /* original unresolved (no skip_mods_or_typedefs) root type ID */ + __u32 root_type_id; + /* CO-RE relocation kind */ + enum bpf_core_relo_kind relo_kind; /* high-level spec length */ int len; /* raw, low-level spec: 1-to-1 with accessor spec string */ @@@ -4205,66 -4041,8 +4205,66 @@@ static bool is_flex_arr(const struct bt return acc->idx == btf_vlen(t) - 1; }
+static const char *core_relo_kind_str(enum bpf_core_relo_kind kind) +{ + switch (kind) { + case BPF_FIELD_BYTE_OFFSET: return "byte_off"; + case BPF_FIELD_BYTE_SIZE: return "byte_sz"; + case BPF_FIELD_EXISTS: return "field_exists"; + case BPF_FIELD_SIGNED: return "signed"; + case BPF_FIELD_LSHIFT_U64: return "lshift_u64"; + case BPF_FIELD_RSHIFT_U64: return "rshift_u64"; + case BPF_TYPE_ID_LOCAL: return "local_type_id"; + case BPF_TYPE_ID_TARGET: return "target_type_id"; + case BPF_TYPE_EXISTS: return "type_exists"; + case BPF_TYPE_SIZE: return "type_size"; + case BPF_ENUMVAL_EXISTS: return "enumval_exists"; + case BPF_ENUMVAL_VALUE: return "enumval_value"; + default: return "unknown"; + } +} + +static bool core_relo_is_field_based(enum bpf_core_relo_kind kind) +{ + switch (kind) { + case BPF_FIELD_BYTE_OFFSET: + case BPF_FIELD_BYTE_SIZE: + case BPF_FIELD_EXISTS: + case BPF_FIELD_SIGNED: + case BPF_FIELD_LSHIFT_U64: + case BPF_FIELD_RSHIFT_U64: + return true; + default: + return false; + } +} + +static bool core_relo_is_type_based(enum bpf_core_relo_kind kind) +{ + switch (kind) { + case BPF_TYPE_ID_LOCAL: + case BPF_TYPE_ID_TARGET: + case BPF_TYPE_EXISTS: + case BPF_TYPE_SIZE: + return true; + default: + return false; + } +} + +static bool core_relo_is_enumval_based(enum bpf_core_relo_kind kind) +{ + switch (kind) { + case BPF_ENUMVAL_EXISTS: + case BPF_ENUMVAL_VALUE: + return true; + default: + return false; + } +} + /* - * Turn bpf_field_reloc into a low- and high-level spec representation, + * Turn bpf_core_relo into a low- and high-level spec representation, * validating correctness along the way, as well as calculating resulting * field bit offset, specified by accessor string. Low-level spec captures * every single level of nestedness, including traversing anonymous @@@ -4293,17 -4071,10 +4293,17 @@@ * - field 'a' access (corresponds to '2' in low-level spec); * - array element #3 access (corresponds to '3' in low-level spec). * + * Type-based relocations (TYPE_EXISTS/TYPE_SIZE, + * TYPE_ID_LOCAL/TYPE_ID_TARGET) don't capture any field information. Their + * spec and raw_spec are kept empty. + * + * Enum value-based relocations (ENUMVAL_EXISTS/ENUMVAL_VALUE) use access + * string to specify enumerator's value index that need to be relocated. */ -static int bpf_core_spec_parse(const struct btf *btf, +static int bpf_core_parse_spec(const struct btf *btf, __u32 type_id, const char *spec_str, + enum bpf_core_relo_kind relo_kind, struct bpf_core_spec *spec) { int access_idx, parsed_len, i; @@@ -4318,15 -4089,6 +4318,15 @@@
memset(spec, 0, sizeof(*spec)); spec->btf = btf; + spec->root_type_id = type_id; + spec->relo_kind = relo_kind; + + /* type-based relocations don't have a field access string */ + if (core_relo_is_type_based(relo_kind)) { + if (strcmp(spec_str, "0")) + return -EINVAL; + return 0; + }
/* parse spec_str="0:1:2:3:4" into array raw_spec=[0, 1, 2, 3, 4] */ while (*spec_str) { @@@ -4343,28 -4105,16 +4343,28 @@@ if (spec->raw_len == 0) return -EINVAL;
- /* first spec value is always reloc type array index */ t = skip_mods_and_typedefs(btf, type_id, &id); if (!t) return -EINVAL;
access_idx = spec->raw_spec[0]; - spec->spec[0].type_id = id; - spec->spec[0].idx = access_idx; + acc = &spec->spec[0]; + acc->type_id = id; + acc->idx = access_idx; spec->len++;
+ if (core_relo_is_enumval_based(relo_kind)) { + if (!btf_is_enum(t) || spec->raw_len > 1 || access_idx >= btf_vlen(t)) + return -EINVAL; + + /* record enumerator name in a first accessor */ + acc->name = btf__name_by_offset(btf, btf_enum(t)[access_idx].name_off); + return 0; + } + + if (!core_relo_is_field_based(relo_kind)) + return -EINVAL; + sz = btf__resolve_size(btf, id); if (sz < 0) return sz; @@@ -4422,8 -4172,8 +4422,8 @@@ return sz; spec->bit_offset += access_idx * sz * 8; } else { - pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %d\n", - type_id, spec_str, i, id, btf_kind(t)); + pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %s\n", + type_id, spec_str, i, id, btf_kind_str(t)); return -EINVAL; } } @@@ -4473,16 -4223,16 +4473,16 @@@ static struct ids_vec *bpf_core_find_ca { size_t local_essent_len, targ_essent_len; const char *local_name, *targ_name; - const struct btf_type *t; + const struct btf_type *t, *local_t; struct ids_vec *cand_ids; __u32 *new_ids; int i, err, n;
- t = btf__type_by_id(local_btf, local_type_id); - if (!t) + local_t = btf__type_by_id(local_btf, local_type_id); + if (!local_t) return ERR_PTR(-EINVAL);
- local_name = btf__name_by_offset(local_btf, t->name_off); + local_name = btf__name_by_offset(local_btf, local_t->name_off); if (str_is_empty(local_name)) return ERR_PTR(-EINVAL); local_essent_len = bpf_core_essential_name_len(local_name); @@@ -4494,11 -4244,12 +4494,11 @@@ n = btf__get_nr_types(targ_btf); for (i = 1; i <= n; i++) { t = btf__type_by_id(targ_btf, i); - targ_name = btf__name_by_offset(targ_btf, t->name_off); - if (str_is_empty(targ_name)) + if (btf_kind(t) != btf_kind(local_t)) continue;
- t = skip_mods_and_typedefs(targ_btf, i, NULL); - if (!btf_is_composite(t) && !btf_is_array(t)) + targ_name = btf__name_by_offset(targ_btf, t->name_off); + if (str_is_empty(targ_name)) continue;
targ_essent_len = bpf_core_essential_name_len(targ_name); @@@ -4506,12 -4257,11 +4506,12 @@@ continue;
if (strncmp(local_name, targ_name, local_essent_len) == 0) { - pr_debug("[%d] %s: found candidate [%d] %s\n", - local_type_id, local_name, i, targ_name); - new_ids = reallocarray(cand_ids->data, - cand_ids->len + 1, - sizeof(*cand_ids->data)); + pr_debug("CO-RE relocating [%d] %s %s: found target candidate [%d] %s %s\n", + local_type_id, btf_kind_str(local_t), + local_name, i, btf_kind_str(t), targ_name); + new_ids = libbpf_reallocarray(cand_ids->data, + cand_ids->len + 1, + sizeof(*cand_ids->data)); if (!new_ids) { err = -ENOMEM; goto err_out; @@@ -4526,9 -4276,8 +4526,9 @@@ err_out return ERR_PTR(err); }
-/* Check two types for compatibility, skipping const/volatile/restrict and - * typedefs, to ensure we are relocating compatible entities: +/* Check two types for compatibility for the purpose of field access + * relocation. const/volatile/restrict and typedefs are skipped to ensure we + * are relocating semantically compatible entities: * - any two STRUCTs/UNIONs are compatible and can be mixed; * - any two FWDs are compatible, if their names match (modulo flavor suffix); * - any two PTRs are always compatible; @@@ -4662,119 -4411,25 +4662,119 @@@ static int bpf_core_match_member(const /* matching named field */ struct bpf_core_accessor *targ_acc;
- targ_acc = &spec->spec[spec->len++]; - targ_acc->type_id = targ_id; - targ_acc->idx = i; - targ_acc->name = targ_name; + targ_acc = &spec->spec[spec->len++]; + targ_acc->type_id = targ_id; + targ_acc->idx = i; + targ_acc->name = targ_name; + + *next_targ_id = m->type; + found = bpf_core_fields_are_compat(local_btf, + local_member->type, + targ_btf, m->type); + if (!found) + spec->len--; /* pop accessor */ + return found; + } + /* member turned out not to be what we looked for */ + spec->bit_offset -= bit_offset; + spec->raw_len--; + } + + return 0; +} + +/* Check local and target types for compatibility. This check is used for + * type-based CO-RE relocations and follow slightly different rules than + * field-based relocations. This function assumes that root types were already + * checked for name match. Beyond that initial root-level name check, names + * are completely ignored. Compatibility rules are as follows: + * - any two STRUCTs/UNIONs/FWDs/ENUMs/INTs are considered compatible, but + * kind should match for local and target types (i.e., STRUCT is not + * compatible with UNION); + * - for ENUMs, the size is ignored; + * - for INT, size and signedness are ignored; + * - for ARRAY, dimensionality is ignored, element types are checked for + * compatibility recursively; + * - CONST/VOLATILE/RESTRICT modifiers are ignored; + * - TYPEDEFs/PTRs are compatible if types they pointing to are compatible; + * - FUNC_PROTOs are compatible if they have compatible signature: same + * number of input args and compatible return and argument types. + * These rules are not set in stone and probably will be adjusted as we get + * more experience with using BPF CO-RE relocations. + */ +static int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id, + const struct btf *targ_btf, __u32 targ_id) +{ + const struct btf_type *local_type, *targ_type; + int depth = 32; /* max recursion depth */ + + /* caller made sure that names match (ignoring flavor suffix) */ + local_type = btf__type_by_id(local_btf, local_id); + targ_type = btf__type_by_id(targ_btf, targ_id); + if (btf_kind(local_type) != btf_kind(targ_type)) + return 0; + +recur: + depth--; + if (depth < 0) + return -EINVAL;
- *next_targ_id = m->type; - found = bpf_core_fields_are_compat(local_btf, - local_member->type, - targ_btf, m->type); - if (!found) - spec->len--; /* pop accessor */ - return found; + local_type = skip_mods_and_typedefs(local_btf, local_id, &local_id); + targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id); + if (!local_type || !targ_type) + return -EINVAL; + + if (btf_kind(local_type) != btf_kind(targ_type)) + return 0; + + switch (btf_kind(local_type)) { + case BTF_KIND_UNKN: + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + case BTF_KIND_ENUM: + case BTF_KIND_FWD: + return 1; + case BTF_KIND_INT: + /* just reject deprecated bitfield-like integers; all other + * integers are by default compatible between each other + */ + return btf_int_offset(local_type) == 0 && btf_int_offset(targ_type) == 0; + case BTF_KIND_PTR: + local_id = local_type->type; + targ_id = targ_type->type; + goto recur; + case BTF_KIND_ARRAY: + local_id = btf_array(local_type)->type; + targ_id = btf_array(targ_type)->type; + goto recur; + case BTF_KIND_FUNC_PROTO: { + struct btf_param *local_p = btf_params(local_type); + struct btf_param *targ_p = btf_params(targ_type); + __u16 local_vlen = btf_vlen(local_type); + __u16 targ_vlen = btf_vlen(targ_type); + int i, err; + + if (local_vlen != targ_vlen) + return 0; + + for (i = 0; i < local_vlen; i++, local_p++, targ_p++) { + skip_mods_and_typedefs(local_btf, local_p->type, &local_id); + skip_mods_and_typedefs(targ_btf, targ_p->type, &targ_id); + err = bpf_core_types_are_compat(local_btf, local_id, targ_btf, targ_id); + if (err <= 0) + return err; } - /* member turned out not to be what we looked for */ - spec->bit_offset -= bit_offset; - spec->raw_len--; - }
- return 0; + /* tail recurse for return type check */ + skip_mods_and_typedefs(local_btf, local_type->type, &local_id); + skip_mods_and_typedefs(targ_btf, targ_type->type, &targ_id); + goto recur; + } + default: + pr_warn("unexpected kind %s relocated, local [%d], target [%d]\n", + btf_kind_str(local_type), local_id, targ_id); + return 0; + } }
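The name matching in bpf_core_find_cands() and the compatibility rules above are what make local "type flavors" work: bpf_core_essential_name_len() strips a trailing "___<something>" suffix, so differently named local variants can all be matched against the same target type. Below is a hedged sketch of the BPF-C side using the bpf_core_type_exists()/bpf_core_type_size() macros from bpf_core_read.h in the same series; the struct names are made up and only illustrate the convention.

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

struct hypothetical_struct {            /* matched against target "hypothetical_struct" */
        int counter;
};

struct hypothetical_struct___v2 {       /* "___v2" flavor suffix is ignored when matching */
        long counter;
};

static int pick_size(void)
{
        if (bpf_core_type_exists(struct hypothetical_struct___v2))
                return bpf_core_type_size(struct hypothetical_struct___v2);

        return bpf_core_type_size(struct hypothetical_struct);
}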
/* @@@ -4792,51 -4447,10 +4792,51 @@@ static int bpf_core_spec_match(struct b
memset(targ_spec, 0, sizeof(*targ_spec)); targ_spec->btf = targ_btf; + targ_spec->root_type_id = targ_id; + targ_spec->relo_kind = local_spec->relo_kind; + + if (core_relo_is_type_based(local_spec->relo_kind)) { + return bpf_core_types_are_compat(local_spec->btf, + local_spec->root_type_id, + targ_btf, targ_id); + }
local_acc = &local_spec->spec[0]; targ_acc = &targ_spec->spec[0];
+ if (core_relo_is_enumval_based(local_spec->relo_kind)) { + size_t local_essent_len, targ_essent_len; + const struct btf_enum *e; + const char *targ_name; + + /* has to resolve to an enum */ + targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id, &targ_id); + if (!btf_is_enum(targ_type)) + return 0; + + local_essent_len = bpf_core_essential_name_len(local_acc->name); + + for (i = 0, e = btf_enum(targ_type); i < btf_vlen(targ_type); i++, e++) { + targ_name = btf__name_by_offset(targ_spec->btf, e->name_off); + targ_essent_len = bpf_core_essential_name_len(targ_name); + if (targ_essent_len != local_essent_len) + continue; + if (strncmp(local_acc->name, targ_name, local_essent_len) == 0) { + targ_acc->type_id = targ_id; + targ_acc->idx = i; + targ_acc->name = targ_name; + targ_spec->len++; + targ_spec->raw_spec[targ_spec->raw_len] = targ_acc->idx; + targ_spec->raw_len++; + return 1; + } + } + return 0; + } + + if (!core_relo_is_field_based(local_spec->relo_kind)) + return -EINVAL; + for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) { targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id, &targ_id); @@@ -4893,29 -4507,18 +4893,29 @@@ }
static int bpf_core_calc_field_relo(const struct bpf_program *prog, - const struct bpf_field_reloc *relo, + const struct bpf_core_relo *relo, const struct bpf_core_spec *spec, __u32 *val, bool *validate) { - const struct bpf_core_accessor *acc = &spec->spec[spec->len - 1]; - const struct btf_type *t = btf__type_by_id(spec->btf, acc->type_id); + const struct bpf_core_accessor *acc; + const struct btf_type *t; __u32 byte_off, byte_sz, bit_off, bit_sz; const struct btf_member *m; const struct btf_type *mt; bool bitfield; __s64 sz;
+ if (relo->kind == BPF_FIELD_EXISTS) { + *val = spec ? 1 : 0; + return 0; + } + + if (!spec) + return -EUCLEAN; /* request instruction poisoning */ + + acc = &spec->spec[spec->len - 1]; + t = btf__type_by_id(spec->btf, acc->type_id); + /* a[n] accessor needs special handling */ if (!acc->name) { if (relo->kind == BPF_FIELD_BYTE_OFFSET) { @@@ -5001,158 -4604,21 +5001,158 @@@ break; case BPF_FIELD_EXISTS: default: - pr_warn("prog '%s': unknown relo %d at insn #%d\n", - bpf_program__title(prog, false), - relo->kind, relo->insn_off / 8); - return -EINVAL; + return -EOPNOTSUPP; + } + + return 0; +} + +static int bpf_core_calc_type_relo(const struct bpf_core_relo *relo, + const struct bpf_core_spec *spec, + __u32 *val) +{ + __s64 sz; + + /* type-based relos return zero when target type is not found */ + if (!spec) { + *val = 0; + return 0; + } + + switch (relo->kind) { + case BPF_TYPE_ID_TARGET: + *val = spec->root_type_id; + break; + case BPF_TYPE_EXISTS: + *val = 1; + break; + case BPF_TYPE_SIZE: + sz = btf__resolve_size(spec->btf, spec->root_type_id); + if (sz < 0) + return -EINVAL; + *val = sz; + break; + case BPF_TYPE_ID_LOCAL: + /* BPF_TYPE_ID_LOCAL is handled specially and shouldn't get here */ + default: + return -EOPNOTSUPP; + } + + return 0; +} + +static int bpf_core_calc_enumval_relo(const struct bpf_core_relo *relo, + const struct bpf_core_spec *spec, + __u32 *val) +{ + const struct btf_type *t; + const struct btf_enum *e; + + switch (relo->kind) { + case BPF_ENUMVAL_EXISTS: + *val = spec ? 1 : 0; + break; + case BPF_ENUMVAL_VALUE: + if (!spec) + return -EUCLEAN; /* request instruction poisoning */ + t = btf__type_by_id(spec->btf, spec->spec[0].type_id); + e = btf_enum(t) + spec->spec[0].idx; + *val = e->val; + break; + default: + return -EOPNOTSUPP; }
return 0; }
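On the program side, the field and enum-value relocations computed by the calc_*_relo() helpers above typically come from the bpf_core_field_exists()/BPF_CORE_READ() and bpf_core_enum_value_exists()/bpf_core_enum_value() macros in bpf_core_read.h; the sketch below uses made-up type, field and enumerator names. Per the logic above, a missing field or enumerator makes the corresponding *_EXISTS relocation resolve to 0, while an actual use of the missing value is poisoned (the -EUCLEAN path) rather than failing the whole object load.

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

enum hypothetical_state { HYPOTHETICAL_ON = 1, };

struct hypothetical_struct {
        int new_field;
};

static int read_state(struct hypothetical_struct *s)
{
        int v = 0;

        if (bpf_core_field_exists(s->new_field))        /* BPF_FIELD_EXISTS */
                v = BPF_CORE_READ(s, new_field);        /* BPF_FIELD_BYTE_OFFSET */

        if (bpf_core_enum_value_exists(enum hypothetical_state,
                                       HYPOTHETICAL_ON)) /* BPF_ENUMVAL_EXISTS */
                v += bpf_core_enum_value(enum hypothetical_state,
                                         HYPOTHETICAL_ON); /* BPF_ENUMVAL_VALUE */

        return v;
}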
+struct bpf_core_relo_res +{ + /* expected value in the instruction, unless validate == false */ + __u32 orig_val; + /* new value that needs to be patched up to */ + __u32 new_val; + /* relocation unsuccessful, poison instruction, but don't fail load */ + bool poison; + /* some relocations can't be validated against orig_val */ + bool validate; +}; + +/* Calculate original and target relocation values, given local and target + * specs and relocation kind. These values are calculated for each candidate. + * If there are multiple candidates, resulting values should all be consistent + * with each other. Otherwise, libbpf will refuse to proceed due to ambiguity. + * If instruction has to be poisoned, *poison will be set to true. + */ +static int bpf_core_calc_relo(const struct bpf_program *prog, + const struct bpf_core_relo *relo, + int relo_idx, + const struct bpf_core_spec *local_spec, + const struct bpf_core_spec *targ_spec, + struct bpf_core_relo_res *res) +{ + int err = -EOPNOTSUPP; + + res->orig_val = 0; + res->new_val = 0; + res->poison = false; + res->validate = true; + + if (core_relo_is_field_based(relo->kind)) { + err = bpf_core_calc_field_relo(prog, relo, local_spec, &res->orig_val, &res->validate); + err = err ?: bpf_core_calc_field_relo(prog, relo, targ_spec, &res->new_val, NULL); + } else if (core_relo_is_type_based(relo->kind)) { + err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val); + err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val); + } else if (core_relo_is_enumval_based(relo->kind)) { + err = bpf_core_calc_enumval_relo(relo, local_spec, &res->orig_val); + err = err ?: bpf_core_calc_enumval_relo(relo, targ_spec, &res->new_val); + } + + if (err == -EUCLEAN) { + /* EUCLEAN is used to signal instruction poisoning request */ + res->poison = true; + err = 0; + } else if (err == -EOPNOTSUPP) { + /* EOPNOTSUPP means unknown/unsupported relocation */ + pr_warn("prog '%s': relo #%d: unrecognized CO-RE relocation %s (%d) at insn #%d\n", + bpf_program__title(prog, false), relo_idx, + core_relo_kind_str(relo->kind), relo->kind, relo->insn_off / 8); + } + + return err; +} + +/* + * Turn instruction for which CO_RE relocation failed into invalid one with + * distinct signature. + */ +static void bpf_core_poison_insn(struct bpf_program *prog, int relo_idx, + int insn_idx, struct bpf_insn *insn) +{ + pr_debug("prog '%s': relo #%d: substituting insn #%d w/ invalid insn\n", + bpf_program__title(prog, false), relo_idx, insn_idx); + insn->code = BPF_JMP | BPF_CALL; + insn->dst_reg = 0; + insn->src_reg = 0; + insn->off = 0; + /* if this instruction is reachable (not a dead code), + * verifier will complain with the following message: + * invalid func unknown#195896080 + */ + insn->imm = 195896080; /* => 0xbad2310 => "bad relo" */ +} + +static bool is_ldimm64(struct bpf_insn *insn) +{ + return insn->code == (BPF_LD | BPF_IMM | BPF_DW); +} + /* * Patch relocatable BPF instruction. * * Patched value is determined by relocation kind and target specification. - * For field existence relocation target spec will be NULL if field is not - * found. + * For existence relocations target spec will be NULL if field/type is not found. * Expected insn->imm value is determined using relocation kind and local * spec, and is checked before patching instruction. If actual insn->imm value * is wrong, bail out with error. @@@ -5160,43 -4626,58 +5160,43 @@@ * Currently three kinds of BPF instructions are supported: * 1. rX = <imm> (assignment with immediate operand); * 2. 
rX += <imm> (arithmetic operations with immediate operand); + * 3. rX = <imm64> (load with 64-bit immediate value). */ -static int bpf_core_reloc_insn(struct bpf_program *prog, - const struct bpf_field_reloc *relo, +static int bpf_core_patch_insn(struct bpf_program *prog, + const struct bpf_core_relo *relo, int relo_idx, - const struct bpf_core_spec *local_spec, - const struct bpf_core_spec *targ_spec) + const struct bpf_core_relo_res *res) { __u32 orig_val, new_val; struct bpf_insn *insn; - bool validate = true; - int insn_idx, err; + int insn_idx; __u8 class;
- if (relo->insn_off % sizeof(struct bpf_insn)) + if (relo->insn_off % BPF_INSN_SZ) return -EINVAL; - insn_idx = relo->insn_off / sizeof(struct bpf_insn); + insn_idx = relo->insn_off / BPF_INSN_SZ; insn = &prog->insns[insn_idx]; class = BPF_CLASS(insn->code);
- if (relo->kind == BPF_FIELD_EXISTS) { - orig_val = 1; /* can't generate EXISTS relo w/o local field */ - new_val = targ_spec ? 1 : 0; - } else if (!targ_spec) { - pr_debug("prog '%s': relo #%d: substituting insn #%d w/ invalid insn\n", - bpf_program__title(prog, false), relo_idx, insn_idx); - insn->code = BPF_JMP | BPF_CALL; - insn->dst_reg = 0; - insn->src_reg = 0; - insn->off = 0; - /* if this instruction is reachable (not a dead code), - * verifier will complain with the following message: - * invalid func unknown#195896080 + if (res->poison) { + /* poison second part of ldimm64 to avoid confusing error from + * verifier about "unknown opcode 00" */ - insn->imm = 195896080; /* => 0xbad2310 => "bad relo" */ + if (is_ldimm64(insn)) + bpf_core_poison_insn(prog, relo_idx, insn_idx + 1, insn + 1); + bpf_core_poison_insn(prog, relo_idx, insn_idx, insn); return 0; - } else { - err = bpf_core_calc_field_relo(prog, relo, local_spec, - &orig_val, &validate); - if (err) - return err; - err = bpf_core_calc_field_relo(prog, relo, targ_spec, - &new_val, NULL); - if (err) - return err; }
+ orig_val = res->orig_val; + new_val = res->new_val; + switch (class) { case BPF_ALU: case BPF_ALU64: if (BPF_SRC(insn->code) != BPF_K) return -EINVAL; - if (validate && insn->imm != orig_val) { + if (res->validate && insn->imm != orig_val) { pr_warn("prog '%s': relo #%d: unexpected insn #%d (ALU/ALU64) value: got %u, exp %u -> %u\n", bpf_program__title(prog, false), relo_idx, insn_idx, insn->imm, orig_val, new_val); @@@ -5211,8 -4692,8 +5211,8 @@@ case BPF_LDX: case BPF_ST: case BPF_STX: - if (validate && insn->off != orig_val) { - pr_warn("prog '%s': relo #%d: unexpected insn #%d (LD/LDX/ST/STX) value: got %u, exp %u -> %u\n", + if (res->validate && insn->off != orig_val) { + pr_warn("prog '%s': relo #%d: unexpected insn #%d (LDX/ST/STX) value: got %u, exp %u -> %u\n", bpf_program__title(prog, false), relo_idx, insn_idx, insn->off, orig_val, new_val); return -EINVAL; @@@ -5229,37 -4710,8 +5229,37 @@@ bpf_program__title(prog, false), relo_idx, insn_idx, orig_val, new_val); break; + case BPF_LD: { + __u64 imm; + + if (!is_ldimm64(insn) || + insn[0].src_reg != 0 || insn[0].off != 0 || + insn_idx + 1 >= prog->insns_cnt || + insn[1].code != 0 || insn[1].dst_reg != 0 || + insn[1].src_reg != 0 || insn[1].off != 0) { + pr_warn("prog '%s': relo #%d: insn #%d (LDIMM64) has unexpected form\n", + bpf_program__title(prog, false), relo_idx, insn_idx); + return -EINVAL; + } + + imm = insn[0].imm + ((__u64)insn[1].imm << 32); + if (res->validate && imm != orig_val) { + pr_warn("prog '%s': relo #%d: unexpected insn #%d (LDIMM64) value: got %llu, exp %u -> %u\n", + bpf_program__title(prog, false), relo_idx, + insn_idx, (unsigned long long)imm, + orig_val, new_val); + return -EINVAL; + } + + insn[0].imm = new_val; + insn[1].imm = 0; /* currently only 32-bit values are supported */ + pr_debug("prog '%s': relo #%d: patched insn #%d (LDIMM64) imm64 %llu -> %u\n", + bpf_program__title(prog, false), relo_idx, insn_idx, + (unsigned long long)imm, new_val); + break; + } default: - pr_warn("prog '%s': relo #%d: trying to relocate unrecognized insn #%d, code:%x, src:%x, dst:%x, off:%x, imm:%x\n", + pr_warn("prog '%s': relo #%d: trying to relocate unrecognized insn #%d, code:0x%x, src:0x%x, dst:0x%x, off:0x%x, imm:0x%x\n", bpf_program__title(prog, false), relo_idx, insn_idx, insn->code, insn->src_reg, insn->dst_reg, insn->off, insn->imm); @@@ -5276,48 -4728,29 +5276,48 @@@ static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec) { const struct btf_type *t; + const struct btf_enum *e; const char *s; __u32 type_id; int i;
- type_id = spec->spec[0].type_id; + type_id = spec->root_type_id; t = btf__type_by_id(spec->btf, type_id); s = btf__name_by_offset(spec->btf, t->name_off); - libbpf_print(level, "[%u] %s + ", type_id, s);
- for (i = 0; i < spec->raw_len; i++) - libbpf_print(level, "%d%s", spec->raw_spec[i], - i == spec->raw_len - 1 ? " => " : ":"); + libbpf_print(level, "[%u] %s %s", type_id, btf_kind_str(t), str_is_empty(s) ? "<anon>" : s);
- libbpf_print(level, "%u.%u @ &x", - spec->bit_offset / 8, spec->bit_offset % 8); + if (core_relo_is_type_based(spec->relo_kind)) + return;
- for (i = 0; i < spec->len; i++) { - if (spec->spec[i].name) - libbpf_print(level, ".%s", spec->spec[i].name); - else - libbpf_print(level, "[%u]", spec->spec[i].idx); + if (core_relo_is_enumval_based(spec->relo_kind)) { + t = skip_mods_and_typedefs(spec->btf, type_id, NULL); + e = btf_enum(t) + spec->raw_spec[0]; + s = btf__name_by_offset(spec->btf, e->name_off); + + libbpf_print(level, "::%s = %u", s, e->val); + return; }
+ if (core_relo_is_field_based(spec->relo_kind)) { + for (i = 0; i < spec->len; i++) { + if (spec->spec[i].name) + libbpf_print(level, ".%s", spec->spec[i].name); + else if (i > 0 || spec->spec[i].idx > 0) + libbpf_print(level, "[%u]", spec->spec[i].idx); + } + + libbpf_print(level, " ("); + for (i = 0; i < spec->raw_len; i++) + libbpf_print(level, "%s%d", i == 0 ? "" : ":", spec->raw_spec[i]); + + if (spec->bit_offset % 8) + libbpf_print(level, " @ offset %u.%u)", + spec->bit_offset / 8, spec->bit_offset % 8); + else + libbpf_print(level, " @ offset %u)", spec->bit_offset / 8); + return; + } }
static size_t bpf_core_hash_fn(const void *key, void *ctx) @@@ -5381,23 -4814,22 +5381,23 @@@ static void *u32_as_hash_key(__u32 x * CPU-wise compared to prebuilding a map from all local type names to * a list of candidate type names. It's also sped up by caching resolved * list of matching candidates per each local "root" type ID, that has at - * least one bpf_field_reloc associated with it. This list is shared + * least one bpf_core_relo associated with it. This list is shared * between multiple relocations for the same type ID and is updated as some * of the candidates are pruned due to structural incompatibility. */ -static int bpf_core_reloc_field(struct bpf_program *prog, - const struct bpf_field_reloc *relo, - int relo_idx, - const struct btf *local_btf, - const struct btf *targ_btf, - struct hashmap *cand_cache) +static int bpf_core_apply_relo(struct bpf_program *prog, + const struct bpf_core_relo *relo, + int relo_idx, + const struct btf *local_btf, + const struct btf *targ_btf, + struct hashmap *cand_cache) { const char *prog_name = bpf_program__title(prog, false); - struct bpf_core_spec local_spec, cand_spec, targ_spec; + struct bpf_core_spec local_spec, cand_spec, targ_spec = {}; const void *type_key = u32_as_hash_key(relo->type_id); - const struct btf_type *local_type, *cand_type; - const char *local_name, *cand_name; + struct bpf_core_relo_res cand_res, targ_res; + const struct btf_type *local_type; + const char *local_name; struct ids_vec *cand_ids; __u32 local_id, cand_id; const char *spec_str; @@@ -5409,49 -4841,32 +5409,49 @@@ return -EINVAL;
local_name = btf__name_by_offset(local_btf, local_type->name_off); - if (str_is_empty(local_name)) + if (!local_name) return -EINVAL;
spec_str = btf__name_by_offset(local_btf, relo->access_str_off); if (str_is_empty(spec_str)) return -EINVAL;
- err = bpf_core_spec_parse(local_btf, local_id, spec_str, &local_spec); + err = bpf_core_parse_spec(local_btf, local_id, spec_str, relo->kind, &local_spec); if (err) { - pr_warn("prog '%s': relo #%d: parsing [%d] %s + %s failed: %d\n", - prog_name, relo_idx, local_id, local_name, spec_str, - err); + pr_warn("prog '%s': relo #%d: parsing [%d] %s %s + %s failed: %d\n", + prog_name, relo_idx, local_id, btf_kind_str(local_type), + str_is_empty(local_name) ? "<anon>" : local_name, + spec_str, err); return -EINVAL; }
- pr_debug("prog '%s': relo #%d: kind %d, spec is ", prog_name, relo_idx, - relo->kind); + pr_debug("prog '%s': relo #%d: kind <%s> (%d), spec is ", prog_name, + relo_idx, core_relo_kind_str(relo->kind), relo->kind); bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec); libbpf_print(LIBBPF_DEBUG, "\n");
+ /* TYPE_ID_LOCAL relo is special and doesn't need candidate search */ + if (relo->kind == BPF_TYPE_ID_LOCAL) { + targ_res.validate = true; + targ_res.poison = false; + targ_res.orig_val = local_spec.root_type_id; + targ_res.new_val = local_spec.root_type_id; + goto patch_insn; + } + + /* libbpf doesn't support candidate search for anonymous types */ + if (str_is_empty(spec_str)) { + pr_warn("prog '%s': relo #%d: <%s> (%d) relocation doesn't support anonymous types\n", + prog_name, relo_idx, core_relo_kind_str(relo->kind), relo->kind); + return -EOPNOTSUPP; + } + if (!hashmap__find(cand_cache, type_key, (void **)&cand_ids)) { cand_ids = bpf_core_find_cands(local_btf, local_id, targ_btf); if (IS_ERR(cand_ids)) { - pr_warn("prog '%s': relo #%d: target candidate search failed for [%d] %s: %ld", - prog_name, relo_idx, local_id, local_name, - PTR_ERR(cand_ids)); + pr_warn("prog '%s': relo #%d: target candidate search failed for [%d] %s %s: %ld", + prog_name, relo_idx, local_id, btf_kind_str(local_type), + local_name, PTR_ERR(cand_ids)); return PTR_ERR(cand_ids); } err = hashmap__set(cand_cache, type_key, cand_ids, NULL, NULL); @@@ -5463,51 -4878,36 +5463,51 @@@
for (i = 0, j = 0; i < cand_ids->len; i++) { cand_id = cand_ids->data[i]; - cand_type = btf__type_by_id(targ_btf, cand_id); - cand_name = btf__name_by_offset(targ_btf, cand_type->name_off); - - err = bpf_core_spec_match(&local_spec, targ_btf, - cand_id, &cand_spec); - pr_debug("prog '%s': relo #%d: matching candidate #%d %s against spec ", - prog_name, relo_idx, i, cand_name); - bpf_core_dump_spec(LIBBPF_DEBUG, &cand_spec); - libbpf_print(LIBBPF_DEBUG, ": %d\n", err); + err = bpf_core_spec_match(&local_spec, targ_btf, cand_id, &cand_spec); if (err < 0) { - pr_warn("prog '%s': relo #%d: matching error: %d\n", - prog_name, relo_idx, err); + pr_warn("prog '%s': relo #%d: error matching candidate #%d ", + prog_name, relo_idx, i); + bpf_core_dump_spec(LIBBPF_WARN, &cand_spec); + libbpf_print(LIBBPF_WARN, ": %d\n", err); return err; } + + pr_debug("prog '%s': relo #%d: %s candidate #%d ", prog_name, + relo_idx, err == 0 ? "non-matching" : "matching", i); + bpf_core_dump_spec(LIBBPF_DEBUG, &cand_spec); + libbpf_print(LIBBPF_DEBUG, "\n"); + if (err == 0) continue;
+ err = bpf_core_calc_relo(prog, relo, relo_idx, &local_spec, &cand_spec, &cand_res); + if (err) + return err; + if (j == 0) { + targ_res = cand_res; targ_spec = cand_spec; } else if (cand_spec.bit_offset != targ_spec.bit_offset) { - /* if there are many candidates, they should all - * resolve to the same bit offset + /* if there are many field relo candidates, they + * should all resolve to the same bit offset */ - pr_warn("prog '%s': relo #%d: offset ambiguity: %u != %u\n", + pr_warn("prog '%s': relo #%d: field offset ambiguity: %u != %u\n", prog_name, relo_idx, cand_spec.bit_offset, targ_spec.bit_offset); return -EINVAL; + } else if (cand_res.poison != targ_res.poison || cand_res.new_val != targ_res.new_val) { + /* all candidates should result in the same relocation + * decision and value, otherwise it's dangerous to + * proceed due to ambiguity + */ + pr_warn("prog '%s': relo #%d: relocation decision ambiguity: %s %u != %s %u\n", + prog_name, relo_idx, + cand_res.poison ? "failure" : "success", cand_res.new_val, + targ_res.poison ? "failure" : "success", targ_res.new_val); + return -EINVAL; }
- cand_ids->data[j++] = cand_spec.spec[0].type_id; + cand_ids->data[j++] = cand_spec.root_type_id; }
/* @@@ -5526,25 -4926,19 +5526,25 @@@ * as well as expected case, depending whether instruction w/ * relocation is guarded in some way that makes it unreachable (dead * code) if relocation can't be resolved. This is handled in - * bpf_core_reloc_insn() uniformly by replacing that instruction with + * bpf_core_patch_insn() uniformly by replacing that instruction with * BPF helper call insn (using invalid helper ID). If that instruction * is indeed unreachable, then it will be ignored and eliminated by * verifier. If it was an error, then verifier will complain and point * to a specific instruction number in its log. */ - if (j == 0) - pr_debug("prog '%s': relo #%d: no matching targets found for [%d] %s + %s\n", - prog_name, relo_idx, local_id, local_name, spec_str); + if (j == 0) { + pr_debug("prog '%s': relo #%d: no matching targets found\n", + prog_name, relo_idx); + + /* calculate single target relo result explicitly */ + err = bpf_core_calc_relo(prog, relo, relo_idx, &local_spec, NULL, &targ_res); + if (err) + return err; + }
- /* bpf_core_reloc_insn should know how to handle missing targ_spec */ - err = bpf_core_reloc_insn(prog, relo, relo_idx, &local_spec, - j ? &targ_spec : NULL); +patch_insn: + /* bpf_core_patch_insn() should know how to handle missing targ_spec */ + err = bpf_core_patch_insn(prog, relo, relo_idx, &targ_res); if (err) { pr_warn("prog '%s': relo #%d: failed to patch insn at offset %d: %d\n", prog_name, relo_idx, relo->insn_off, err); @@@ -5555,10 -4949,10 +5555,10 @@@ }
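For context on the "poisoning" mentioned in the comment above (an illustrative sketch, not lines taken from this diff): when no candidate matches, the patching step rewrites the instruction into a call to a deliberately invalid helper, roughly:

	insn->code = BPF_JMP | BPF_CALL;
	insn->dst_reg = 0;
	insn->src_reg = 0;
	insn->off = 0;
	insn->imm = 195896080; /* 0xbad2310 - reads as "bad relo" */

If that instruction is unreachable (e.g. guarded by a CO-RE existence check), the verifier's dead code elimination silently drops it; otherwise the load fails with an easily recognizable bogus helper ID in the verifier log.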
static int -bpf_core_reloc_fields(struct bpf_object *obj, const char *targ_btf_path) +bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path) { const struct btf_ext_info_sec *sec; - const struct bpf_field_reloc *rec; + const struct bpf_core_relo *rec; const struct btf_ext_info *seg; struct hashmap_entry *entry; struct hashmap *cand_cache = NULL; @@@ -5567,9 -4961,6 +5567,9 @@@ const char *sec_name; int i, err = 0;
+ if (obj->btf_ext->core_relo_info.len == 0) + return 0; + if (targ_btf_path) targ_btf = btf__parse_elf(targ_btf_path, NULL); else @@@ -5585,7 -4976,7 +5585,7 @@@ goto out; }
- seg = &obj->btf_ext->field_reloc_info; + seg = &obj->btf_ext->core_relo_info; for_each_btf_ext_sec(seg, sec) { sec_name = btf__name_by_offset(obj->btf, sec->sec_name_off); if (str_is_empty(sec_name)) { @@@ -5606,15 -4997,15 +5606,15 @@@ goto out; }
- pr_debug("prog '%s': performing %d CO-RE offset relocs\n", + pr_debug("sec '%s': found %d CO-RE relocations\n", sec_name, sec->num_info);
for_each_btf_ext_rec(seg, sec, i, rec) { - err = bpf_core_reloc_field(prog, rec, i, obj->btf, - targ_btf, cand_cache); + err = bpf_core_apply_relo(prog, rec, i, obj->btf, + targ_btf, cand_cache); if (err) { pr_warn("prog '%s': relo #%d: failed to relocate: %d\n", - sec_name, i, err); + prog->name, i, err); goto out; } } @@@ -5633,6 -5024,17 +5633,6 @@@ out return err; }
-static int -bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path) -{ - int err = 0; - - if (obj->btf_ext->field_reloc_info.len) - err = bpf_core_reloc_fields(obj, targ_btf_path); - - return err; -} - static int bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj, struct reloc_desc *relo) @@@ -5649,7 -5051,7 +5649,7 @@@ return -LIBBPF_ERRNO__RELOC; } new_cnt = prog->insns_cnt + text->insns_cnt; - new_insn = reallocarray(prog->insns, new_cnt, sizeof(*insn)); + new_insn = libbpf_reallocarray(prog->insns, new_cnt, sizeof(*insn)); if (!new_insn) { pr_warn("oom in prog realloc\n"); return -ENOMEM; @@@ -5734,8 -5136,7 +5734,8 @@@ bpf_program__relocate(struct bpf_progra return err; break; default: - pr_warn("relo #%d: bad relo type %d\n", i, relo->type); + pr_warn("prog '%s': relo #%d: bad relo type %d\n", + prog->name, i, relo->type); return -EINVAL; } } @@@ -5770,8 -5171,7 +5770,8 @@@ bpf_object__relocate(struct bpf_object
err = bpf_program__relocate(prog, obj); if (err) { - pr_warn("failed to relocate '%s'\n", prog->section_name); + pr_warn("prog '%s': failed to relocate data references: %d\n", + prog->name, err); return err; } break; @@@ -5786,8 -5186,7 +5786,8 @@@
err = bpf_program__relocate(prog, obj); if (err) { - pr_warn("failed to relocate '%s'\n", prog->section_name); + pr_warn("prog '%s': failed to relocate calls: %d\n", + prog->name, err); return err; } } @@@ -5804,8 -5203,8 +5804,8 @@@ static int bpf_object__collect_map_relo int i, j, nrels, new_sz; const struct btf_var_secinfo *vi = NULL; const struct btf_type *sec, *var, *def; + struct bpf_map *map = NULL, *targ_map; const struct btf_member *member; - struct bpf_map *map, *targ_map; const char *name, *mname; Elf_Data *symbols; unsigned int moff; @@@ -5831,7 -5230,8 +5831,7 @@@ i, (size_t)GELF_R_SYM(rel.r_info)); return -LIBBPF_ERRNO__FORMAT; } - name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name) ? : "<?>"; + name = elf_sym_str(obj, sym.st_name) ?: "<?>"; if (sym.st_shndx != obj->efile.btf_maps_shndx) { pr_warn(".maps relo #%d: '%s' isn't a BTF-defined map\n", i, name); @@@ -5893,7 -5293,7 +5893,7 @@@ moff /= bpf_ptr_sz; if (moff >= map->init_slots_sz) { new_sz = moff + 1; - tmp = realloc(map->init_slots, new_sz * host_ptr_sz); + tmp = libbpf_reallocarray(map->init_slots, new_sz, host_ptr_sz); if (!tmp) return -ENOMEM; map->init_slots = tmp; @@@ -5948,51 -5348,6 +5948,51 @@@ static int bpf_object__collect_reloc(st return 0; }
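The .maps relocation handling above (init_slots grown via libbpf_reallocarray()) is what backs declarative map-in-map initialization. As a rough sketch of the BPF-side declaration that produces such relocations (names made up for illustration, using the usual bpf_helpers.h macros):

	struct inner_map {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 1);
		__type(key, int);
		__type(value, int);
	} inner_map1 SEC(".maps");

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
		__uint(max_entries, 4);
		__type(key, int);
		__array(values, struct inner_map);
	} outer_map SEC(".maps") = {
		.values = { [0] = &inner_map1 },
	};

Each '&inner_mapN' initializer becomes an ELF relocation against the .maps section; the code above resolves it to the target map and records it in init_slots[moff] so the outer map slot can be populated at load time.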
+static bool insn_is_helper_call(struct bpf_insn *insn, enum bpf_func_id *func_id) +{ + if (BPF_CLASS(insn->code) == BPF_JMP && + BPF_OP(insn->code) == BPF_CALL && + BPF_SRC(insn->code) == BPF_K && + insn->src_reg == 0 && + insn->dst_reg == 0) { + *func_id = insn->imm; + return true; + } + return false; +} + +static int bpf_object__sanitize_prog(struct bpf_object* obj, struct bpf_program *prog) +{ + struct bpf_insn *insn = prog->insns; + enum bpf_func_id func_id; + int i; + + for (i = 0; i < prog->insns_cnt; i++, insn++) { + if (!insn_is_helper_call(insn, &func_id)) + continue; + + /* on kernels that don't yet support + * bpf_probe_read_{kernel,user}[_str] helpers, fall back + * to bpf_probe_read() which works well for old kernels + */ + switch (func_id) { + case BPF_FUNC_probe_read_kernel: + case BPF_FUNC_probe_read_user: + if (!kernel_supports(FEAT_PROBE_READ_KERN)) + insn->imm = BPF_FUNC_probe_read; + break; + case BPF_FUNC_probe_read_kernel_str: + case BPF_FUNC_probe_read_user_str: + if (!kernel_supports(FEAT_PROBE_READ_KERN)) + insn->imm = BPF_FUNC_probe_read_str; + break; + default: + break; + } + } + return 0; +} + static int load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt, char *license, __u32 kern_version, int *pfd) @@@ -6009,12 -5364,12 +6009,12 @@@ memset(&load_attr, 0, sizeof(struct bpf_load_program_attr)); load_attr.prog_type = prog->type; /* old kernels might not support specifying expected_attach_type */ - if (!prog->caps->exp_attach_type && prog->sec_def && + if (!kernel_supports(FEAT_EXP_ATTACH_TYPE) && prog->sec_def && prog->sec_def->is_exp_attach_type_optional) load_attr.expected_attach_type = 0; else load_attr.expected_attach_type = prog->expected_attach_type; - if (prog->caps->name) + if (kernel_supports(FEAT_PROG_NAME)) load_attr.name = prog->name; load_attr.insns = insns; load_attr.insns_cnt = insns_cnt; @@@ -6032,7 -5387,7 +6032,7 @@@ } /* specify func_info/line_info only if kernel supports them */ btf_fd = bpf_object__btf_fd(prog->obj); - if (btf_fd >= 0 && prog->obj->caps.btf_func) { + if (btf_fd >= 0 && kernel_supports(FEAT_BTF_FUNC)) { load_attr.prog_btf_fd = btf_fd; load_attr.func_info = prog->func_info; load_attr.func_info_rec_size = prog->func_info_rec_size; @@@ -6070,7 -5425,7 +6070,7 @@@ retry_load free(log_buf); goto retry_load; } - ret = -errno; + ret = errno ? -errno : -LIBBPF_ERRNO__LOAD; cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); pr_warn("load bpf program failed: %s\n", cp); pr_perm_msg(ret); @@@ -6207,19 -5562,13 +6207,19 @@@ bpf_object__load_progs(struct bpf_objec size_t i; int err;
+ for (i = 0; i < obj->nr_programs; i++) { + prog = &obj->programs[i]; + err = bpf_object__sanitize_prog(obj, prog); + if (err) + return err; + } + for (i = 0; i < obj->nr_programs; i++) { prog = &obj->programs[i]; if (bpf_program__is_function_storage(prog, obj)) continue; if (!prog->load) { - pr_debug("prog '%s'('%s'): skipped loading\n", - prog->name, prog->section_name); + pr_debug("prog '%s': skipped loading\n", prog->name); continue; } prog->log_level |= log_level; @@@ -6292,8 -5641,6 +6292,8 @@@ __bpf_object__open(const char *path, co /* couldn't guess, but user might manually specify */ continue;
+ if (prog->sec_def->is_sleepable) + prog->prog_flags |= BPF_F_SLEEPABLE; bpf_program__set_type(prog, prog->sec_def->prog_type); bpf_program__set_expected_attach_type(prog, prog->sec_def->expected_attach_type); @@@ -6403,11 -5750,11 +6403,11 @@@ static int bpf_object__sanitize_maps(st bpf_object__for_each_map(m, obj) { if (!bpf_map__is_internal(m)) continue; - if (!obj->caps.global_data) { + if (!kernel_supports(FEAT_GLOBAL_DATA)) { pr_warn("kernel doesn't support global data\n"); return -ENOTSUP; } - if (!obj->caps.array_mmap) + if (!kernel_supports(FEAT_ARRAY_MMAP)) m->def.map_flags ^= BPF_F_MMAPABLE; }
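BPF_F_SLEEPABLE is set here automatically for programs placed in the new '.s'-suffixed sections registered further down (fentry.s/, fmod_ret.s/, fexit.s/, lsm.s/). Purely as an illustration of what such a program looks like (hook and names chosen arbitrarily):

	SEC("lsm.s/file_open")
	int BPF_PROG(check_file_open, struct file *file)
	{
		/* sleepable context: helpers that may sleep,
		 * e.g. bpf_copy_from_user(), become usable here
		 */
		return 0;
	}

Programs in the existing non-'.s' sections keep their previous, non-sleepable behavior.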
@@@ -6557,6 -5904,7 +6557,6 @@@ int bpf_object__load_xattr(struct bpf_o }
err = bpf_object__probe_loading(obj); - err = err ? : bpf_object__probe_caps(obj); err = err ? : bpf_object__resolve_externs(obj, obj->kconfig); err = err ? : bpf_object__sanitize_and_load_btf(obj); err = err ? : bpf_object__sanitize_maps(obj); @@@ -7365,7 -6713,7 +7365,7 @@@ int bpf_program__fd(const struct bpf_pr
size_t bpf_program__size(const struct bpf_program *prog) { - return prog->insns_cnt * sizeof(struct bpf_insn); + return prog->insns_cnt * BPF_INSN_SZ; }
int bpf_program__set_prep(struct bpf_program *prog, int nr_instances, @@@ -7562,21 -6910,6 +7562,21 @@@ static const struct bpf_sec_def section .expected_attach_type = BPF_TRACE_FEXIT, .is_attach_btf = true, .attach_fn = attach_trace), + SEC_DEF("fentry.s/", TRACING, + .expected_attach_type = BPF_TRACE_FENTRY, + .is_attach_btf = true, + .is_sleepable = true, + .attach_fn = attach_trace), + SEC_DEF("fmod_ret.s/", TRACING, + .expected_attach_type = BPF_MODIFY_RETURN, + .is_attach_btf = true, + .is_sleepable = true, + .attach_fn = attach_trace), + SEC_DEF("fexit.s/", TRACING, + .expected_attach_type = BPF_TRACE_FEXIT, + .is_attach_btf = true, + .is_sleepable = true, + .attach_fn = attach_trace), SEC_DEF("freplace/", EXT, .is_attach_btf = true, .attach_fn = attach_trace), @@@ -7584,11 -6917,6 +7584,11 @@@ .is_attach_btf = true, .expected_attach_type = BPF_LSM_MAC, .attach_fn = attach_lsm), + SEC_DEF("lsm.s/", LSM, + .is_attach_btf = true, + .is_sleepable = true, + .expected_attach_type = BPF_LSM_MAC, + .attach_fn = attach_lsm), SEC_DEF("iter/", TRACING, .expected_attach_type = BPF_TRACE_ITER, .is_attach_btf = true, @@@ -7794,7 -7122,8 +7794,7 @@@ static int bpf_object__collect_st_ops_r return -LIBBPF_ERRNO__FORMAT; }
- name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name) ? : "<?>"; + name = elf_sym_str(obj, sym.st_name) ?: "<?>"; map = find_struct_ops_map_by_offset(obj, rel.r_offset); if (!map) { pr_warn("struct_ops reloc: cannot find map at rel.r_offset %zu\n", @@@ -8311,7 -7640,7 +8311,7 @@@ int bpf_prog_load_xattr(const struct bp
prog->prog_ifindex = attr->ifindex; prog->log_level = attr->log_level; - prog->prog_flags = attr->prog_flags; + prog->prog_flags |= attr->prog_flags; if (!first_prog) first_prog = prog; } @@@ -9265,7 -8594,7 +9265,7 @@@ struct perf_buffer *perf_buffer__new(in struct perf_buffer_params p = {}; struct perf_event_attr attr = { 0, };
- attr.config = PERF_COUNT_SW_BPF_OUTPUT, + attr.config = PERF_COUNT_SW_BPF_OUTPUT; attr.type = PERF_TYPE_SOFTWARE; attr.sample_type = PERF_SAMPLE_RAW; attr.sample_period = 1; @@@ -9503,11 -8832,6 +9503,11 @@@ static int perf_buffer__process_records return 0; }
+int perf_buffer__epoll_fd(const struct perf_buffer *pb) +{ + return pb->epoll_fd; +} + int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms) { int i, cnt, err; @@@ -9525,55 -8849,6 +9525,55 @@@ return cnt < 0 ? -errno : cnt; }
+/* Return number of PERF_EVENT_ARRAY map slots set up by this perf_buffer + * manager. + */ +size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb) +{ + return pb->cpu_cnt; +} + +/* + * Return perf_event FD of a ring buffer in *buf_idx* slot of + * PERF_EVENT_ARRAY BPF map. This FD can be polled for new data using + * select()/poll()/epoll() Linux syscalls. + */ +int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx) +{ + struct perf_cpu_buf *cpu_buf; + + if (buf_idx >= pb->cpu_cnt) + return -EINVAL; + + cpu_buf = pb->cpu_bufs[buf_idx]; + if (!cpu_buf) + return -ENOENT; + + return cpu_buf->fd; +} + +/* + * Consume data from perf ring buffer corresponding to slot *buf_idx* in + * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to + * consume, do nothing and return success. + * Returns: + * - 0 on success; + * - <0 on failure. + */ +int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx) +{ + struct perf_cpu_buf *cpu_buf; + + if (buf_idx >= pb->cpu_cnt) + return -EINVAL; + + cpu_buf = pb->cpu_bufs[buf_idx]; + if (!cpu_buf) + return -ENOENT; + + return perf_buffer__process_records(pb, cpu_buf); +} + int perf_buffer__consume(struct perf_buffer *pb) { int i, err; @@@ -9586,7 -8861,7 +9586,7 @@@
err = perf_buffer__process_records(pb, cpu_buf); if (err) { - pr_warn("error while processing records: %d\n", err); + pr_warn("perf_buffer: failed to process records in buffer #%d: %d\n", i, err); return err; } }
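The new per-buffer accessors (perf_buffer__epoll_fd(), perf_buffer__buffer_cnt(), perf_buffer__buffer_fd(), perf_buffer__consume_buffer()) let callers drive polling themselves instead of going through perf_buffer__poll(). A minimal usage sketch, assuming an already-created struct perf_buffer *pb (error handling and cleanup omitted):

	#include <sys/epoll.h>
	#include <bpf/libbpf.h>

	static void drain_manually(struct perf_buffer *pb)
	{
		int epfd = epoll_create1(EPOLL_CLOEXEC);
		struct epoll_event ev, events[16];
		size_t i;
		int n, j;

		for (i = 0; i < perf_buffer__buffer_cnt(pb); i++) {
			int fd = perf_buffer__buffer_fd(pb, i);

			if (fd < 0)
				continue; /* slot not set up, e.g. offline CPU */
			ev.events = EPOLLIN;
			ev.data.u64 = i;
			epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
		}

		n = epoll_wait(epfd, events, 16, -1);
		for (j = 0; j < n; j++)
			/* consume only the ring that became readable */
			perf_buffer__consume_buffer(pb, events[j].data.u64);
	}

Alternatively, perf_buffer__epoll_fd() exposes the perf_buffer's own epoll instance so it can be nested into an application's existing event loop.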