Commit Graph

126905 Commits

Author SHA1 Message Date
Linus Torvalds
70cd33d34c Merge tag 'efi-next-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ard Biesheuvel via Borislav Petkov:
 "A few cleanups left and right, some of which were part of a initrd
  measured boot series that needs some more work, and so only the
  cleanup patches have been included for this release"

* tag 'efi-next-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/arm64: Update debug prints to reflect other entropy sources
  efi: x86: clean up previous struct mm switching
  efi: x86: move mixed mode stack PA variable out of 'efi_scratch'
  efi/libstub: move TPM related prototypes into efistub.h
  efi/libstub: fix prototype of efi_tcg2_protocol::get_event_log()
  efi/libstub: whitespace cleanup
  efi: ia64: move IA64-only declarations to new asm/efi.h header
2021-02-20 19:09:26 -08:00
Linus Torvalds
e767b3530a Merge tag 'arm-drivers-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
 "Updates for SoC specific drivers include a few subsystems that have
  their own maintainers but send them through the soc tree:

  SCMI firmware:
   - add support for a completion interrupt

  Reset controllers:
   - new driver for BCM4908
   - new devm_reset_control_get_optional_exclusive_released() function

  Memory controllers:
   - Renesas RZ/G2 support
   - Tegra124 interconnect support
   - Allow more drivers to be loadable modules

  TEE/optee firmware:
   - minor code cleanup

  The other half of this is SoC specific drivers that do not belong into
  any other subsystem, most of them living in drivers/soc:

   - Allwinner/sunxi power management work
   - Allwinner H616 support

   - ASpeed AST2600 system identification support

   - AT91 SAMA7G5 SoC ID driver
   - AT91 SoC driver cleanups

   - Broadcom BCM4908 power management bus support

   - Marvell mbus cleanups

   - Mediatek MT8167 power domain support

   - Qualcomm socinfo driver support for PMIC
   - Qualcomm SoC identification for many more products

   - TI Keystone driver cleanups for PRUSS and elsewhere"

* tag 'arm-drivers-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (89 commits)
  soc: aspeed: socinfo: Add new systems
  soc: aspeed: snoop: Add clock control logic
  memory: tegra186-emc: Replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE
  memory: samsung: exynos5422-dmc: Correct function names in kerneldoc
  memory: ti-emif-pm: Drop of_match_ptr from of_device_id table
  optee: simplify i2c access
  drivers: soc: atmel: fix type for same7
  tee: optee: remove need_resched() before cond_resched()
  soc: qcom: ocmem: don't return NULL in of_get_ocmem
  optee: sync OP-TEE headers
  tee: optee: fix 'physical' typos
  drivers: optee: use flexible-array member instead of zero-length array
  tee: fix some comment typos in header files
  soc: ti: k3-ringacc: Use of_device_get_match_data()
  soc: ti: pruss: Refactor the CFG sub-module init
  soc: mediatek: pm-domains: Don't print an error if child domain is deferred
  soc: mediatek: pm-domains: Add domain regulator supply
  dt-bindings: power: Add domain regulator supply
  soc: mediatek: cmdq: Remove cmdq_pkt_flush()
  soc: mediatek: pm-domains: Add support for mt8167
  ...
2021-02-20 18:42:28 -08:00
Linus Torvalds
82851fce61 Merge tag 'arm-dt-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC devicetree updates from Arnd Bergmann:
 "After the last release contained a surprising amount of new 32-bit
  machines, this time two thirds of the code changes are for 64-bit.

  The usual updates to existing files include:

   - Device tree compiler warning fixes for Berlin, Renesas, SoCFPGA,
     nomadik, stm32, Allwinner, TI Keystone

   - Support for additional devices on existing machines on Renesas,
     SoCFPGA, at91, hisilicon, OMAP, Tegra, TI K3, Allwinner, Broadcom,
     ux500, Mediatek, Marvell Armada, Marvell MMP, ZynqMP, AMLogic,
     Qualcomm, i.MX, Layerscape, Actions, ASpeed, Toshiba

   - Cleanups and minor fixes for Renesas, at91, mstar, ux500, Samsung,
     stm32, Tegra, Broadcom, Mediatek, Marvell MMP, AMLogic, Qualcomm,
     i.MX, Rockchip, ASpeed, Zynq

  Only three new SoCs this time, but a number of boards across:

  Renesas:
   - Two Beacon EmbeddedWorks boards (RZ/G2H and RZ/G2N based)

  Intel SoCFPGA:
   - eASIC N5X board (N5X)

  ST-Ericsson Ux500:
   - Samsung GT-I9070 (Janice) phone (u8500)

  TI OMAP:
   - MYIR Tech Limited development board (AM335X)

  Allwinner/sunxi:
   - SL631 Action Camera (V3)
   - PineTab Early Adopter tablet (A64)

  Broadcom:
   - BCM4906 networking chip
   - Netgear R8000P router (BCM4906)

  AMLogic:
   - Hardkernel ODROID-HC4 development board (SM1)
   - Beelink GS-King-X TV Box (S922X)

  Qualcomm:
   - Snapdragon 888 / SM8350 high-end phone SoC
   - Qualcomm SDX55 5G modem as standalone SoC
   - Snapdragon MTP reference board (SM8350)
   - Snapdragon MTP reference board (SDX55)
   - Sony Kitakami phones: Xperia Z3+/Z4/Z5 (APQ8094)
   - Alcatel Idol 3 phone (MSM8916)
   - ASUS Zenfone 2 Laser phone (MSM8916)
   - BQ Aquaris X5 aka Longcheer L8910 phone (MSM8916)
   - OnePlus6 phone (SDM845)
   - OnePlus6T phone (SDM845)
   - Alfa Network AP120C-AC access point (IPQ4018)

  NXP i.MX6 (32-bit):
   - Plymovent BAS base system controller for filter systems (imx6dl)
   - Protonic MVT industrial touchscreen terminals (imx6dl)
   - Protonic PRTI6G reference board (imx6ul)
   - Kverneland UT1, UT1Q, UT1P, TGO agricultural terminals (imx6q/dl/qp)

  NXP i.MX8 (64-bit)
   - Beacon i.MX8M Nano development kit (imx8mn)
   - Boundary Devices i.MX8MM Nitrogen SBC (imx8mm)
   - Gateworks Venice i.MX 8M Mini Development Kits (imx8mm)
   - phyBOARD-Pollux-i.MX8MP (imx8mp)
   - Purism Librem5 Evergreen phone (imx8mp)
   - Kontron SMARC-sAL28 system-on-module(imx8mp)

  Rockchip:
   - NanoPi M4B Single-board computer (RK3399)
   - Radxa Rock Pi E router SBC (RK3328)

  ASpeed:
   - Ampere Mt. Jade, a BMC for an x86 server (AST2500)
   - IBM Everest, a BMC for a Power10 server (AST2600)
   - Supermicro x11spi, a BMC for an ARM server (AST2500)

  Zynq:
   - Ebang EBAZ4205, FPGA board (Zynq-7000)
   - ZynqMP zcu104 revC reference platform (ZynqMP)"

* tag 'arm-dt-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (584 commits)
  ARM: dts: aspeed: align GPIO hog names with dtschema
  ARM: dts: aspeed: fix PCA95xx GPIO expander properties on Portwell
  dt-bindings: spi: zynq: Convert Zynq QSPI binding to yaml
  arm: dts: visconti: Add DT support for Toshiba Visconti5 GPIO driver
  ARM: dts: aspeed: ast2600evb: Add enable ehci and uhci
  ARM: dts: aspeed: mowgli: Add i2c rtc device
  ARM: dts: aspeed: amd-ethanolx: Enable secondary LPC snooping address
  dt-bindings: arm: xilinx: Add missing Zturn boards
  ARM: dts: ebaz4205: add pinctrl entries for switches
  ARM: dts: add Ebang EBAZ4205 device tree
  dt-bindings: arm: add Ebang EBAZ4205 board
  dt-bindings: add ebang vendor prefix
  ARM: dts: aspeed: Add Everest BMC machine
  ARM: dts: aspeed: inspur-fp5280g2: Add ipsps1 driver
  ARM: dts: aspeed: inspur-fp5280g2: Add GPIO line names
  ARM: dts: aspeed: Add Supermicro x11spi BMC machine
  ARM: dts: aspeed: g220a: Fix some gpio
  ARM: dts: aspeed: g220a: Enable ipmb
  ARM: dts: aspeed: rainier: Add eMMC clock phase compensation
  ARM: dts: aspeed: Add LCLK to lpc-snoop
  ...
2021-02-20 18:34:53 -08:00
Linus Torvalds
584ce3c9b4 Merge tag 'arm-platform-removal-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC platform removals from Arnd Bergmann:
 "There are a lot of platforms that have not seen any interesting code
  changes in the past five years or more.

  I made a list and asked around which ones are no longer in use, and
  received confirmation about six ARM platforms and the TI C6x
  architecture that have all reached the end of their life upstream,
  with no known users remaining:

   - efm32 - added in 2011, first Cortex-M, no notable changes after 2013

   - picoxcell - added in 2011, abandoned after 2012 acquisition

   - prima2 - added in 20111, no notable changes since 2015

   - tango - added in 2015, sporadic changes until 2017, but abandoned

   - u300 - added in 2009, no notable changes since 2013

   - zx - added in 2015 for both 32, 2017 for 64 bit, no notable changes

   - arch/c6x - added in 2011, but work stalled soon after that

  A number of other platforms on the original list turned out to still
  have users. In some cases there are out-of-tree patches and users that
  plan to contribute them in the future, in other cases the code is
  complete and works reliably"

Link: https://lore.kernel.org/lkml/CAK8P3a2DZ8xQp7R=H=wewHnT2=a_=M53QsZOueMVEf7tOZLKNg@mail.gmail.com/

* tag 'arm-platform-removal-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: remove u300 platform
  ARM: remove tango platform
  ARM: remove zte zx platform
  ARM: remove sirf prima2/atlas platforms
  c6x: remove architecture
  MAINTAINERS: Remove deleted platform efm32
  ARM: drop efm32 platform
  ARM: Remove PicoXcell platform support
  ARM: dts: Remove PicoXcell platforms
2021-02-20 18:16:30 -08:00
Linus Torvalds
51e6d17809 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
 "Here is what we have this merge window:

   1) Support SW steering for mlx5 Connect-X6Dx, from Yevgeny Kliteynik.

   2) Add RSS multi group support to octeontx2-pf driver, from Geetha
      Sowjanya.

   3) Add support for KS8851 PHY. From Marek Vasut.

   4) Add support for GarfieldPeak bluetooth controller from Kiran K.

   5) Add support for half-duplex tcan4x5x can controllers.

   6) Add batch skb rx processing to bcrm63xx_enet, from Sieng Piaw
      Liew.

   7) Rework RX port offload infrastructure, particularly wrt, UDP
      tunneling, from Jakub Kicinski.

   8) Add BCM72116 PHY support, from Florian Fainelli.

   9) Remove Dsa specific notifiers, they are unnecessary. From Vladimir
      Oltean.

  10) Add support for picosecond rx delay in dwmac-meson8b chips. From
      Martin Blumenstingl.

  11) Support TSO on xfrm interfaces from Eyal Birger.

  12) Add support for MP_PRIO to mptcp stack, from Geliang Tang.

  13) Support BCM4908 integrated switch, from Rafał Miłecki.

  14) Support for directly accessing kernel module variables via module
      BTF info, from Andrii Naryiko.

  15) Add DASH (esktop and mobile Architecture for System Hardware)
      support to r8169 driver, from Heiner Kallweit.

  16) Add rx vlan filtering to dpaa2-eth, from Ionut-robert Aron.

  17) Add support for 100 base0x SFP devices, from Bjarni Jonasson.

  18) Support link aggregation in DSA, from Tobias Waldekranz.

  19) Support for bitwidse atomics in bpf, from Brendan Jackman.

  20) SmartEEE support in at803x driver, from Russell King.

  21) Add support for flow based tunneling to GTP, from Pravin B Shelar.

  22) Allow arbitrary number of interconnrcts in ipa, from Alex Elder.

  23) TLS RX offload for bonding, from Tariq Toukan.

  24) RX decap offklload support in mac80211, from Felix Fietkou.

  25) devlink health saupport in octeontx2-af, from George Cherian.

  26) Add TTL attr to SCM_TIMESTAMP_OPT_STATS, from Yousuk Seung

  27) Delegated actionss support in mptcp, from Paolo Abeni.

  28) Support receive timestamping when doin zerocopy tcp receive. From
      Arjun Ray.

  29) HTB offload support for mlx5, from Maxim Mikityanskiy.

  30) UDP GRO forwarding, from Maxim Mikityanskiy.

  31) TAPRIO offloading in dsa hellcreek driver, from Kurt Kanzenbach.

  32) Weighted random twos choice algorithm for ipvs, from Darby Payne.

  33) Fix netdev registration deadlock, from Johannes Berg.

  34) Various conversions to new tasklet api, from EmilRenner Berthing.

  35) Bulk skb allocations in veth, from Lorenzo Bianconi.

  36) New ethtool interface for lane setting, from Danielle Ratson.

  37) Offload failiure notifications for routes, from Amit Cohen.

  38) BCM4908 support, from Rafał Miłecki.

  39) Support several new iwlwifi chips, from Ihab Zhaika.

  40) Flow drector support for ipv6 in i40e, from Przemyslaw Patynowski.

  41) Support for mhi prrotocols, from Loic Poulain.

  42) Optimize bpf program stats.

  43) Implement RFC6056, for better port randomization, from Eric
      Dumazet.

  44) hsr tag offloading support from George McCollister.

  45) Netpoll support in qede, from Bhaskar Upadhaya.

  46) 2005/400g speed support in bonding 3ad mode, from Nikolay
      Aleksandrov.

  47) Netlink event support in mptcp, from Florian Westphal.

  48) Better skbuff caching, from Alexander Lobakin.

  49) MRP (Media Redundancy Protocol) offloading in DSA and a few
      drivers, from Horatiu Vultur.

  50) mqprio saupport in mvneta, from Maxime Chevallier.

  51) Remove of_phy_attach, no longer needed, from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1766 commits)
  octeontx2-pf: Fix otx2_get_fecparam()
  cteontx2-pf: cn10k: Prevent harmless double shift bugs
  net: stmmac: Add PCI bus info to ethtool driver query output
  ptp: ptp_clockmatrix: clean-up - parenthesis around a == b are unnecessary
  ptp: ptp_clockmatrix: Simplify code - remove unnecessary `err` variable.
  ptp: ptp_clockmatrix: Coding style - tighten vertical spacing.
  ptp: ptp_clockmatrix: Clean-up dev_*() messages.
  ptp: ptp_clockmatrix: Remove unused header declarations.
  ptp: ptp_clockmatrix: Add alignment of 1 PPS to idtcm_perout_enable.
  ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.
  net: stmmac: dwmac-sun8i: Add a shutdown callback
  net: stmmac: dwmac-sun8i: Minor probe function cleanup
  net: stmmac: dwmac-sun8i: Use reset_control_reset
  net: stmmac: dwmac-sun8i: Remove unnecessary PHY power check
  net: stmmac: dwmac-sun8i: Return void from PHY unpower
  r8169: use macro pm_ptr
  net: mdio: Remove of_phy_attach()
  net: mscc: ocelot: select PACKING in the Kconfig
  net: re-solve some conflicts after net -> net-next merge
  net: dsa: tag_rtl4_a: Support also egress tags
  ...
2021-02-20 17:45:32 -08:00
Wong Vee Khee
20e07e2c3c net: stmmac: Add PCI bus info to ethtool driver query output
This patch populates the PCI bus info in the ethtool driver query data.

Users will be able to view PCI bus info using 'ethtool -i <interface>'.

Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 14:25:50 -08:00
David S. Miller
32511f8e49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add two helper functions to release one table and hooks from
   the netns and netlink event path.

2) Add table ownership infrastructure, this new infrastructure allows
   users to bind a table (and its content) to a process through the
   netlink socket.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:19:24 -08:00
Florian Fainelli
96313e1db8 net: mdio: Remove of_phy_attach()
We have no in-tree users, also update the sfp-phylink.rst documentation
to indicate that phy_attach_direct() is used instead of of_phy_attach().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:17:49 -08:00
David S. Miller
d489ded1a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-02-16 17:51:13 -08:00
Robert Hancock
b834489bce net: phy: Add is_on_sfp_module flag and phy_on_sfp helper
Add a flag and helper function to indicate that a PHY device is part of
an SFP module, which is set on attach. This can be used by PHY drivers
to handle SFP-specific quirks or behavior.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 15:23:23 -08:00
Robert Hancock
3afd021899 net: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S
The default configuration for the BCM54616S PHY may not match the desired
mode when using 1000BaseX or SGMII interface modes, such as when it is on
an SFP module. Add code to explicitly set the correct mode using
programming sequences provided by Bel-Fuse:

https://www.belfuse.com/resources/datasheets/powersolutions/ds-bps-sfp-1gbt-05-series.pdf
https://www.belfuse.com/resources/datasheets/powersolutions/ds-bps-sfp-1gbt-06-series.pdf

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 15:23:23 -08:00
Vlad Buslov
396d7f23ad net: sched: fix police ext initialization
When police action is created by cls API tcf_exts_validate() first
conditional that calls tcf_action_init_1() directly, the action idr is not
updated according to latest changes in action API that require caller to
commit newly created action to idr with tcf_idr_insert_many(). This results
such action not being accessible through act API and causes crash reported
by syzbot:

==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline]
BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline]
BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204

CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 __kasan_report mm/kasan/report.c:400 [inline]
 kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
 check_memory_region_inline mm/kasan/generic.c:179 [inline]
 check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
 __tcf_idr_release net/sched/act_api.c:178 [inline]
 tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
 tc_action_net_exit include/net/act_api.h:151 [inline]
 police_exit_net+0x168/0x360 net/sched/act_police.c:390
 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G    B             5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 panic+0x306/0x73d kernel/panic.c:231
 end_report+0x58/0x5e mm/kasan/report.c:100
 __kasan_report mm/kasan/report.c:403 [inline]
 kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413
 check_memory_region_inline mm/kasan/generic.c:179 [inline]
 check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
 __tcf_idr_release net/sched/act_api.c:178 [inline]
 tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
 tc_action_net_exit include/net/act_api.h:151 [inline]
 police_exit_net+0x168/0x360 net/sched/act_police.c:390
 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Kernel Offset: disabled

Fix the issue by calling tcf_idr_insert_many() after successful action
initialization.

Fixes: 0fedc63fad ("net_sched: commit action insertions together")
Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:59:19 -08:00
David S. Miller
44c3203975 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
pull-request: mlx5-next 2021-02-16

The patches in this pr are already submitted and reviewed through the
netdev and rdma mailing lists.

The series includes mlx5 HW bits and definitions for mlx5 real time clock
translation and handling in the mlx5 driver clock module to enable and
support such mode [1]

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20210212223042.449816-7-saeed@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:53:30 -08:00
Horatiu Vultur
c595c4330d net: dsa: add MRP support
Add support for offloading MRP in HW. Currently implement the switchdev
calls 'SWITCHDEV_OBJ_ID_MRP', 'SWITCHDEV_OBJ_ID_RING_ROLE_MRP',
to allow to create MRP instances and to set the role of these instances.

Add DSA_NOTIFIER_MRP_ADD/DEL and DSA_NOTIFIER_MRP_ADD/DEL_RING_ROLE
which calls to .port_mrp_add/del and .port_mrp_add/del_ring_role in the
DSA driver for the switch.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:47:46 -08:00
Horatiu Vultur
d8ea7ff399 net: mscc: ocelot: Add support for MRP
Add basic support for MRP. The HW will just trap all MRP frames on the
ring ports to CPU and allow the SW to process them. In this way it is
possible to for this node to behave both as MRM and MRC.

Current limitations are:
- it doesn't support Interconnect roles.
- it supports only a single ring.
- the HW should be able to do forwarding of MRP Test frames so the SW
  will not need to do this. So it would be able to have the role MRC
  without SW support.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:47:46 -08:00
Horatiu Vultur
c513efa20c switchdev: mrp: Extend ring_role_mrp and in_role_mrp
Add the member sw_backup to the structures switchdev_obj_ring_role_mrp
and switchdev_obj_in_role_mrp. In this way the SW can call the driver in
2 ways, once when sw_backup is set to false, meaning that the driver
should implement this completely in HW. And if that is not supported the
SW will call again but with sw_backup set to true, meaning that the
HW should help or allow the SW to run the protocol.

For example when role is MRM, if the HW can't detect when it stops
receiving MRP Test frames but it can trap these frames to CPU, then it
needs to return -EOPNOTSUPP when sw_backup is false and return 0 when
sw_backup is true.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:47:46 -08:00
Horatiu Vultur
405be6b46b switchdev: mrp: Remove CONFIG_BRIDGE_MRP
Remove #IS_ENABLED(CONFIG_BRIDGE_MRP) from switchdev.h. This will
simplify the code implements MRP callbacks and will be similar with the
vlan filtering.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:47:46 -08:00
Pavana Sharma
7331d1d462 net: phy: Add 5GBASER interface mode
Add 5GBASE-R phy interface mode

Signed-off-by: Pavana Sharma <pavana.sharma@digi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:15:12 -08:00
Florian Fainelli
32aeba1f7a tg3: Remove unused PHY_BRCM flags
The tg3 driver tried to communicate towards the PHY driver whether it
wanted RGMII in-band signaling enabled or disabled however there is
nothing that looks at those flags in drivers/net/phy/broadcom.c so this
does do not anything.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:11:57 -08:00
Eran Ben Elisha
d6f3dc8f50 net/mlx5: Move all internal timer metadata into a dedicated struct
Internal timer mode (SW clock) requires some PTP clock related metadata
structs. Real time mode (HW clock) will not need these metadata structs.
This separation emphasize the different interfaces for HW clock and SW
clock.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-16 14:04:54 -08:00
Eran Ben Elisha
ae02d41551 net/mlx5: Add register layout to support real-time time-stamp
Add needed structure layouts and defines for MTUTC (Management UTC)
register. MTUTC will be used for cyc2time HW translation.

In addition, add cyc2time modify capability bit and init segment HCA
real time address.

Finally, add capability bits indicating which time-stamping format is
supported per SQ and RQ. Add ts_format in the queue's context layout to
allow configuration.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-16 14:04:54 -08:00
David S. Miller
b8af417e4d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-02-16

The following pull-request contains BPF updates for your *net-next* tree.

There's a small merge conflict between 7eeba1706e ("tcp: Add receive timestamp
support for receive zerocopy.") from net-next tree and 9cacf81f81 ("bpf: Remove
extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows:

  [...]
                lock_sock(sk);
                err = tcp_zerocopy_receive(sk, &zc, &tss);
                err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
                                                          &zc, &len, err);
                release_sock(sk);
  [...]

We've added 116 non-merge commits during the last 27 day(s) which contain
a total of 156 files changed, 5662 insertions(+), 1489 deletions(-).

The main changes are:

1) Adds support of pointers to types with known size among global function
   args to overcome the limit on max # of allowed args, from Dmitrii Banshchikov.

2) Add bpf_iter for task_vma which can be used to generate information similar
   to /proc/pid/maps, from Song Liu.

3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow
   rewriting bind user ports from BPF side below the ip_unprivileged_port_start
   range, both from Stanislav Fomichev.

4) Prevent recursion on fentry/fexit & sleepable programs and allow map-in-map
   as well as per-cpu maps for the latter, from Alexei Starovoitov.

5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer
   for sleepable programs, both from KP Singh.

6) Extend verifier to enable variable offset read/write access to the BPF
   program stack, from Andrei Matei.

7) Improve tc & XDP MTU handling and add a new bpf_check_mtu() helper to
   query device MTU from programs, from Jesper Dangaard Brouer.

8) Allow bpf_get_socket_cookie() helper also be called from [sleepable] BPF
   tracing programs, from Florent Revest.

9) Extend x86 JIT to pad JMPs with NOPs for helping image to converge when
   otherwise too many passes are required, from Gary Lin.

10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function
    verification both related to zero-extension handling, from Ilya Leoshkevich.

11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa.

12) Batch of AF_XDP selftest cleanups and small performance improvement
    for libbpf's xsk map redirect for newer kernels, from Björn Töpel.

13) Follow-up BPF doc and verifier improvements around atomics with
    BPF_FETCH, from Brendan Jackman.

14) Permit zero-sized data sections e.g. if ELF .rodata section contains
    read-only data from local variables, from Yonghong Song.

15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 13:14:06 -08:00
Aharon Landau
a6a217dddc net/mlx5: Add new timestamp mode bits
These fields declare which timestamp mode is supported by the device
per RQ/SQ/QP.

In addition add the ts_format field to the select the mode for
RQ/SQ/QP.

Link: https://lore.kernel.org/r/20210209131107.698833-2-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-02-16 16:10:39 +02:00
Heiner Kallweit
93e8990c24 net: phy: rename PHY_IGNORE_INTERRUPT to PHY_MAC_INTERRUPT
Some internal PHY's have their events like link change reported by the
MAC interrupt. We have PHY_IGNORE_INTERRUPT to deal with this scenario.
I'm not too happy with this name. We don't ignore interrupts, typically
there is no interrupt exposed at a PHY level. So let's rename it to
PHY_MAC_INTERRUPT. This is in line with phy_mac_interrupt(), which is
called from the MAC interrupt handler to handle PHY events.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 15:20:49 -08:00
Florian Fainelli
5d4358ede8 net: phy: broadcom: Allow BCM54210E to configure APD
BCM54210E/BCM50212E has been verified to work correctly with the
auto-power down configuration done by bcm54xx_adjust_rxrefclk(), add it
to the list of PHYs working.

While we are at it, provide an appropriate name for the bit we are
changing which disables the RXC and TXC during auto-power down when
there is no energy on the cable.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 15:15:25 -08:00
Florian Fainelli
17d3a83afb net: phy: broadcom: Remove unused flags
We have a number of unused flags defined today and since we are scarce
on space and may need to introduce new flags in the future remove and
shift every existing flag down into a contiguous assignment.
PHY_BCM_FLAGS_MODE_1000BX was only used internally for the BCM54616S
PHY, so we allocate a driver private structure instead to store that
flag instead of canibalizing one from phydev->dev_flags for that
purpose.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 15:15:25 -08:00
Geliang Tang
0caf3ada24 mptcp: add local addr info in mptcp_info
Add mptcpi_local_addr_used and mptcpi_local_addr_max in struct mptcp_info.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 15:09:14 -08:00
Vladimir Oltean
419dfaed7c net: bridge: fix switchdev_port_attr_set stub when CONFIG_SWITCHDEV=n
The switchdev_port_attr_set function prototype was updated only for the
case where CONFIG_SWITCHDEV=y|m, leaving a prototype mismatch with the
stub definition for the disabled case. This results in a build error, so
update that function too.

Fixes: dcbdf1350e ("net: bridge: propagate extack through switchdev_port_attr_set")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 13:15:10 -08:00
Tong Zhang
d0a0bbe7b0 atm: idt77252: fix build broken on amd64
idt77252 is broken and wont load on amd64 systems
  modprobe idt77252 shows the following

    idt77252_init: skb->cb is too small (48 < 56)

  Add packed attribute to struct idt77252_skb_prv and struct atm_skb_data
  so that the total size can be <= sizeof(skb->cb)
  Also convert runtime size check to buildtime size check in
  idt77252_init()

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 12:36:27 -08:00
Pablo Neira Ayuso
6001a930ce netfilter: nftables: introduce table ownership
A userspace daemon like firewalld might need to monitor for netlink
updates to detect its ruleset removal by the (global) flush ruleset
command to ensure ruleset persistency. This adds extra complexity from
userspace and, for some little time, the firewall policy is not in
place.

This patch adds the NFT_TABLE_F_OWNER flag which allows a userspace
program to own the table that creates in exclusivity.

Tables that are owned...

- can only be updated and removed by the owner, non-owners hit EPERM if
  they try to update it or remove it.
- are destroyed when the owner closes the netlink socket or the process
  is gone (implicit netlink socket closure).
- are skipped by the global flush ruleset command.
- are listed in the global ruleset.

The userspace process that sets on the NFT_TABLE_F_OWNER flag need to
leave open the netlink socket.

A new NFTA_TABLE_OWNER netlink attribute specifies the netlink port ID
to identify the owner from userspace.

This patch also updates error reporting when an unknown table flag is
specified to change it from EINVAL to EOPNOTSUPP given that EINVAL is
usually reserved to report for malformed netlink messages to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-15 18:17:15 +01:00
Vladimir Oltean
89153ed6eb net: dsa: propagate extack to .port_vlan_filtering
Some drivers can't dynamically change the VLAN filtering option, or
impose some restrictions, it would be nice to propagate this info
through netlink instead of printing it to a kernel log that might never
be read. Also netlink extack includes the module that emitted the
message, which means that it's easier to figure out which ones are
driver-generated errors as opposed to command misuse.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:38:12 -08:00
Vladimir Oltean
31046a5fd9 net: dsa: propagate extack to .port_vlan_add
Allow drivers to communicate their restrictions to user space directly,
instead of printing to the kernel log. Where the conversion would have
been lossy and things like VLAN ID could no longer be conveyed (due to
the lack of support for printf format specifier in netlink extack), I
chose to keep the messages in full form to the kernel log only, and
leave it up to individual driver maintainers to move more messages to
extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:38:11 -08:00
Vladimir Oltean
dcbdf1350e net: bridge: propagate extack through switchdev_port_attr_set
The benefit is the ability to propagate errors from switchdev drivers
for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:38:11 -08:00
Vladimir Oltean
0a6f17c6ae net: dsa: tag_ocelot_8021q: add support for PTP timestamping
For TX timestamping, we use the felix_txtstamp method which is common
with the regular (non-8021q) ocelot tagger. This method says that skb
deferral is needed, prepares a timestamp request ID, and puts a clone of
the skb in a queue waiting for the timestamp IRQ.

felix_txtstamp is called by dsa_skb_tx_timestamp() just before the
tagger's xmit method. In the tagger xmit, we divert the packets
classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based
injection registers, and we declare them as dead towards dsa_slave_xmit.
If not PTP, we proceed with normal tag_8021q stuff.

Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is
matched to the TX timestamp retrieved from the switch's FIFO based on
the timestamp request ID, and the clone is delivered to the stack.

On RX, thanks to the VCAP IS2 rule that redirects the frames with an
EtherType for 1588 towards two destinations:
- the CPU port module (for MMIO based extraction) and
- if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port
the relevant data path processing starts in the ptp_classify_raw BPF
classifier installed by DSA in the RX data path (post tagger, which is
completely unaware that it saw a PTP packet).

This time we can't reuse the same implementation of .port_rxtstamp that
also works with the default ocelot tagger. That is because felix_rxtstamp
is given an skb with a freshly stripped DSA header, and it says "I don't
need deferral for its RX timestamp, it's right in it, let me show you";
and it just points to the header right behind skb->data, from where it
unpacks the timestamp and annotates the skb with it.

The same thing cannot happen with tag_ocelot_8021q, because for one
thing, the skb did not have an extraction frame header in the first
place, but a VLAN tag with no timestamp information. So the code paths
in felix_rxtstamp for the regular and 8021q tagger are completely
independent. With tag_8021q, the timestamp must come from the packet's
duplicate delivered to the CPU port module, but there is potentially
complex logic to be handled [ and prone to reordering ] if we were to
just start reading packets from the CPU port module, and try to match
them to the one we received over Ethernet and which needs an RX
timestamp. So we do something simple: we tell DSA "give me some time to
think" (we request skb deferral by returning false from .port_rxtstamp)
and we just drop the frame we got over Ethernet with no attempt to match
it to anything - we just treat it as a notification that there's data to
be processed from the CPU port module's queues. Then we proceed to read
the packets from those, one by one, which we deliver up the stack,
timestamped, using netif_rx - the same function that any driver would
use anyway if it needed RX timestamp deferral. So the assumption is that
we'll come across the PTP packet that triggered the CPU extraction
notification eventually, but we don't know when exactly. Thanks to the
VCAP IS2 trap/redirect rule and the exclusion of the CPU port module
from the flooding replicators, only PTP frames should be present in the
CPU port module's RX queues anyway.

There is just one conflict between the VCAP IS2 trapping rule and the
semantics of the BPF classifier. Namely, ptp_classify_raw() deems
general messages as non-timestampable, but still, those are trapped to
the CPU port module since they have an EtherType of ETH_P_1588. So, if
the "no XTR IRQ" workaround is in place, we need to run another BPF
classifier on the frames extracted over MMIO, to avoid duplicates being
sent to the stack (once over Ethernet, once over MMIO). It doesn't look
like it's possible to install VCAP IS2 rules based on keys extracted
from the 1588 frame headers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean
924ee317f7 net: mscc: ocelot: refactor ocelot_xtr_irq_handler into ocelot_xtr_poll
Since the felix DSA driver will need to poll the CPU port module for
extracted frames as well, let's create some common functions that read
an Extraction Frame Header, and then an skb, from a CPU extraction
group.

We abuse the struct ocelot_ops :: port_to_netdev function a little bit,
in order to retrieve the DSA port net_device or the ocelot switchdev
net_device based on the source port information from the Extraction
Frame Header, but it's all in the benefit of code simplification -
netdev_alloc_skb needs it. Originally, the port_to_netdev method was
intended for parsing act->dev from tc flower offload code.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean
7c4bb540e9 net: dsa: tag_ocelot: create separate tagger for Seville
The ocelot tagger is a hot mess currently, it relies on memory
initialized by the attached driver for basic frame transmission.
This is against all that DSA tagging protocols stand for, which is that
the transmission and reception of a DSA-tagged frame, the data path,
should be independent from the switch control path, because the tag
protocol is in principle hot-pluggable and reusable across switches
(even if in practice it wasn't until very recently). But if another
driver like dsa_loop wants to make use of tag_ocelot, it couldn't.

This was done to have common code between Felix and Ocelot, which have
one bit difference in the frame header format. Quoting from commit
67c2404922 ("net: dsa: felix: create a template for the DSA tags on
xmit"):

    Other alternatives have been analyzed, such as:
    - Create a separate tag_seville.c: too much code duplication for just 1
      bit field difference.
    - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like
      tag_brcm.c, which would have a separate .xmit function. Again, too
      much code duplication for just 1 bit field difference.
    - Allocate the template from the init function of the tag_ocelot.c
      module, instead of from the driver: couldn't figure out a method of
      accessing the correct port template corresponding to the correct
      tagger in the .xmit function.

The really interesting part is that Seville should have had its own
tagging protocol defined - it is not compatible on the wire with Ocelot,
even for that single bit. In principle, a packet generated by
DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain
way, but when booted on NXP T1040 it would look differently. The reverse
is also true: a packet generated by a Seville switch would be
interpreted incorrectly by Wireshark if it was told it was generated by
an Ocelot switch.

Actually things are a bit more nuanced. If we concentrate only on the
DSA tag, what I said above is true, but Ocelot/Seville also support an
optional DSA tag prefix, which can be short or long, and it is possible
to distinguish the two taggers based on an integer constant put in that
prefix. Nonetheless, creating a separate tagger is still justified,
since the tag prefix is optional, and without it, there is again no way
to distinguish.

Claiming backwards binary compatibility is a bit more tough, since I've
already changed the format of tag_ocelot once, in commit 5124197ce5
("net: dsa: tag_ocelot: use a short prefix on both ingress and egress").
Therefore I am not very concerned with treating this as a bugfix and
backporting it to stable kernels (which would be another mess due to the
fact that there would be lots of conflicts with the other DSA_TAG_PROTO*
definitions). It's just simpler to say that the string values of the
taggers have ABI value starting with kernel 5.12, which will be when the
changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging
goes live.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean
40d3f295b5 net: mscc: ocelot: use common tag parsing code with DSA
The Injection Frame Header and Extraction Frame Header that the switch
prepends to frames over the NPI port is also prepended to frames
delivered over the CPU port module's queues.

Let's unify the handling of the frame headers by making the ocelot
driver call some helpers exported by the DSA tagger. Among other things,
this allows us to get rid of the strange cpu_to_be32 when transmitting
the Injection Frame Header on ocelot, since the packing API uses
network byte order natively (when "quirks" is 0).

The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and
the cpu extraction queue mask to something, but the code doesn't do it,
so we don't do it either.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean
137ffbc4bb net: mscc: ocelot: refactor ocelot_port_inject_frame out of ocelot_port_xmit
The felix DSA driver will inject some frames through register MMIO, same
as ocelot switchdev currently does. So we need to be able to reuse the
common code.

Also create some shim definitions, since the DSA tagger can be compiled
without support for the switch driver.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Alexander Lobakin
9243adfc31 skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing
napi_frags_finish() and napi_skb_finish() can only be called inside
NAPI Rx context, so we can feed NAPI cache with skbuff_heads that
got NAPI_MERGED_FREE verdict instead of immediate freeing.
Replace __kfree_skb() with __kfree_skb_defer() in napi_skb_finish()
and move napi_skb_free_stolen_head() to skbuff.c, so it can drop skbs
to NAPI cache.
As many drivers call napi_alloc_skb()/napi_get_frags() on their
receive path, this becomes especially useful.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13 14:32:04 -08:00
Alexander Lobakin
f450d539c0 skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
Instead of just bulk-flushing skbuff_heads queued up through
napi_consume_skb() or __kfree_skb_defer(), try to reuse them
on allocation path.
If the cache is empty on allocation, bulk-allocate the first
16 elements, which is more efficient than per-skb allocation.
If the cache is full on freeing, bulk-wipe the second half of
the cache (32 elements).
This also includes custom KASAN poisoning/unpoisoning to be
double sure there are no use-after-free cases.

To not change current behaviour, introduce a new function,
napi_build_skb(), to optionally use a new approach later
in drivers.

Note on selected bulk size, 16:
 - this equals to XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE
   and especially VETH_XDP_BATCH, which is also used to
   bulk-allocate skbuff_heads and was tested on powerful
   setups;
 - this also showed the best performance in the actual
   test series (from the array of {8, 16, 32}).

Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves
Suggested-by: Eric Dumazet <edumazet@google.com>   # KASAN poisoning
Cc: Dmitry Vyukov <dvyukov@google.com>             # Help with KASAN
Cc: Paolo Abeni <pabeni@redhat.com>                # Reduced batch size
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13 14:32:04 -08:00
Alexander Lobakin
fec6e49b63 skbuff: remove __kfree_skb_flush()
This function isn't much needed as NAPI skb queue gets bulk-freed
anyway when there's no more room, and even may reduce the efficiency
of bulk operations.
It will be even less needed after reusing skb cache on allocation path,
so remove it and this way lighten network softirqs a bit.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13 14:32:03 -08:00
Dmitrii Banshchikov
e5069b9c23 bpf: Support pointers in global func args
Add an ability to pass a pointer to a type with known size in arguments
of a global function. Such pointers may be used to overcome the limit on
the maximum number of arguments, avoid expensive and tricky workarounds
and to have multiple output arguments.

A referenced type may contain pointers but indirect access through them
isn't supported.

The implementation consists of two parts.  If a global function has an
argument that is a pointer to a type with known size then:

  1) In btf_check_func_arg_match(): check that the corresponding
register points to NULL or to a valid memory region that is large enough
to contain the expected argument's type.

  2) In btf_prepare_func_args(): set the corresponding register type to
PTR_TO_MEM_OR_NULL and its size to the size of the expected type.

Only global functions are supported because allowance of pointers for
static functions might break validation. Consider the following
scenario. A static function has a pointer argument. A caller passes
pointer to its stack memory. Because the callee can change referenced
memory verifier cannot longer assume any particular slot type of the
caller's stack memory hence the slot type is changed to SLOT_MISC.  If
there is an operation that relies on slot type other than SLOT_MISC then
verifier won't be able to infer safety of the operation.

When verifier sees a static function that has a pointer argument
different from PTR_TO_CTX then it skips arguments check and continues
with "inline" validation with more information available. The operation
that relies on the particular slot type now succeeds.

Because global functions were not allowed to have pointer arguments
different from PTR_TO_CTX it's not possible to break existing and valid
code.

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-4-me@ubique.spb.ru
2021-02-12 17:37:23 -08:00
Eric Dumazet
05dc72aba3 tcp: factorize logic into tcp_epollin_ready()
Both tcp_data_ready() and tcp_stream_is_readable() share the same logic.

Add tcp_epollin_ready() helper to avoid duplication.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:28:26 -08:00
Eric Dumazet
f969dc5a88 tcp: fix SO_RCVLOWAT related hangs under mem pressure
While commit 24adbc1676 ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
fixed an issue vs too small sk_rcvbuf for given sk_rcvlowat constraint,
it missed to address issue caused by memory pressure.

1) If we are under memory pressure and socket receive queue is empty.
First incoming packet is allowed to be queued, after commit
76dfa60820 ("tcp: allow one skb to be received per socket under memory pressure")

But we do not send EPOLLIN yet, in case tcp_data_ready() sees sk_rcvlowat
is bigger than skb length.

2) Then, when next packet comes, it is dropped, and we directly
call sk->sk_data_ready().

3) If application is using poll(), tcp_poll() will then use
tcp_stream_is_readable() and decide the socket receive queue is
not yet filled, so nothing will happen.

Even when sender retransmits packets, phases 2) & 3) repeat
and flow is effectively frozen, until memory pressure is off.

Fix is to consider tcp_under_memory_pressure() to take care
of global memory pressure or memcg pressure.

Fixes: 24adbc1676 ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Arjun Roy <arjunroy@google.com>
Suggested-by: Wei Wang <weiwan@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:28:26 -08:00
Vladimir Oltean
421741ea56 net: mscc: ocelot: offload bridge port flags to device
We should not be unconditionally enabling address learning, since doing
that is actively detrimential when a port is standalone and not offloading
a bridge. Namely, if a port in the switch is standalone and others are
offloading the bridge, then we could enter a situation where we learn an
address towards the standalone port, but the bridged ports could not
forward the packet there, because the CPU is the only path between the
standalone and the bridged ports. The solution of course is to not
enable address learning unless the bridge asks for it.

We need to set up the initial port flags for no learning and flooding
everything, and also when the port joins and leaves the bridge.
The flood configuration was already configured ok for standalone mode
in ocelot_init, we just need to disable learning in ocelot_init_port.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:05 -08:00
Vladimir Oltean
b360d94f1b net: mscc: ocelot: use separate flooding PGID for broadcast
In preparation of offloading the bridge port flags which have
independent settings for unknown multicast and for broadcast, we should
also start reserving one destination Port Group ID for the flooding of
broadcast packets, to allow configuring it individually.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:05 -08:00
Vladimir Oltean
a8b659e7ff net: dsa: act as passthrough for bridge port flags
There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
expressed by the bridge through switchdev, and not all of them can be
emulated by DSA mid-layer API at the same time.

One possible configuration is when the bridge offloads the port flags
using a mask that has a single bit set - therefore only one feature
should change. However, DSA currently groups together unicast and
multicast flooding in the .port_egress_floods method, which limits our
options when we try to add support for turning off broadcast flooding:
do we extend .port_egress_floods with a third parameter which b53 and
mv88e6xxx will ignore? But that means that the DSA layer, which
currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
see that .port_egress_floods is implemented, and will report that all 3
types of flooding are supported - not necessarily true.

Another configuration is when the user specifies more than one flag at
the same time, in the same netlink message. If we were to create one
individual function per offloadable bridge port flag, we would limit the
expressiveness of the switch driver of refusing certain combinations of
flag values. For example, a switch may not have an explicit knob for
flooding of unknown multicast, just for flooding in general. In that
case, the only correct thing to do is to allow changes to BR_FLOOD and
BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
a separate .port_set_unicast_flood and .port_set_multicast_flood would
not allow the driver to possibly reject that.

Also, DSA doesn't consider it necessary to inform the driver that a
SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
just calls .port_egress_floods for the CPU port. When we'll add support
for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
problem because the flood settings will need to be held statefully in
the DSA middle layer, otherwise changing the mrouter port attribute will
impact the flooding attribute. And that's _assuming_ that the underlying
hardware doesn't have anything else to do when a multicast router
attaches to a port than flood unknown traffic to it.  If it does, there
will need to be a dedicated .port_set_mrouter anyway.

So we need to let the DSA drivers see the exact form that the bridge
passes this switchdev attribute in, otherwise we are standing in the
way. Therefore we also need to use this form of language when
communicating to the driver that it needs to configure its initial
(before bridge join) and final (after bridge leave) port flags.

The b53 and mv88e6xxx drivers are converted to the passthrough API and
their implementation of .port_egress_floods is split into two: a
function that configures unicast flooding and another for multicast.
The mv88e6xxx implementation is quite hairy, and it turns out that
the implementations of unknown unicast flooding are actually the same
for 6185 and for 6352:

behind the confusing names actually lie two individual bits:
NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)

so there was no reason to entangle them in the first place.

Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
PORT_CTL0, which has the exact same bit index. I have left the
implementations separate though, for the only reason that the names are
different enough to confuse me, since I am not able to double-check with
a user manual. The multicast flooding setting for 6185 is in a different
register than for 6352 though.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
Vladimir Oltean
e18f4c18ab net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes
This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.

This patch also changes drivers to make use of the "mask" field for edge
detection when possible.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
Vladimir Oltean
4c08c586ff net: switchdev: propagate extack to port attributes
When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
David S. Miller
21cc70c75b Merge tag 'mac80211-next-for-net-next-2021-02-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
Last set of updates:
 * more minstrel work from Felix to reduce the
   probing overhead
 * QoS for nl80211 control port frames
 * STBC injection support
 * and a couple of small fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 16:48:52 -08:00