Commit Graph

1351117 Commits

Author SHA1 Message Date
Stanislav Fomichev
1901066aab netdevsim: add dummy device notifiers
In order to exercise and verify notifiers' locking assumptions,
register dummy notifiers (via register_netdevice_notifier_dev_net).
Share notifier event handler that enforces the assumptions with
lock_debug.c (rename and export rtnl_net_debug_event as
netdev_debug_event). Add ops lock asserts to netdev_debug_event.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:32:08 -07:00
Stanislav Fomichev
b912d599d3 net: rename rtnl_net_debug to lock_debug
And make it selected by CONFIG_DEBUG_NET. Don't rename any of
the structs/functions. Next patch will use rtnl_net_debug_event in
netdevsim.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-5-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:32:08 -07:00
Stanislav Fomichev
8965c160b8 net: use netif_disable_lro in ipv6_add_dev
ipv6_add_dev might call dev_disable_lro which unconditionally grabs
instance lock, so it will deadlock during NETDEV_REGISTER. Switch
to netif_disable_lro.

Make sure all callers hold the instance lock as well.

Cc: Cosmin Ratiu <cratiu@nvidia.com>
Fixes: ad7c7b2172 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-4-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:32:08 -07:00
Stanislav Fomichev
4c975fd700 net: hold instance lock during NETDEV_REGISTER/UP
Callers of inetdev_init can come from several places with inconsistent
expectation about netdev instance lock. Grab instance lock during
REGISTER (plus UP). Also solve the inconsistency with UNREGISTER
where it was locked only during the netns move path.

WARNING: CPU: 10 PID: 1479 at ./include/net/netdev_lock.h:54
__netdev_update_features+0x65f/0xca0
__warn+0x81/0x180
__netdev_update_features+0x65f/0xca0
report_bug+0x156/0x180
handle_bug+0x4f/0x90
exc_invalid_op+0x13/0x60
asm_exc_invalid_op+0x16/0x20
__netdev_update_features+0x65f/0xca0
netif_disable_lro+0x30/0x1d0
inetdev_init+0x12f/0x1f0
inetdev_event+0x48b/0x870
notifier_call_chain+0x38/0xf0
register_netdevice+0x741/0x8b0
register_netdev+0x1f/0x40
mlx5e_probe+0x4e3/0x8e0 [mlx5_core]
auxiliary_bus_probe+0x3f/0x90
really_probe+0xc3/0x3a0
__driver_probe_device+0x80/0x150
driver_probe_device+0x1f/0x90
__device_attach_driver+0x7d/0x100
bus_for_each_drv+0x80/0xd0
__device_attach+0xb4/0x1c0
bus_probe_device+0x91/0xa0
device_add+0x657/0x870

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Cosmin Ratiu <cratiu@nvidia.com>
Fixes: ad7c7b2172 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-3-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:32:08 -07:00
Stanislav Fomichev
d2ccd0560d net: switch to netif_disable_lro in inetdev_init
Cosmin reports the following deadlock:
dump_stack_lvl+0x62/0x90
print_deadlock_bug+0x274/0x3b0
__lock_acquire+0x1229/0x2470
lock_acquire+0xb7/0x2b0
__mutex_lock+0xa6/0xd20
dev_disable_lro+0x20/0x80
inetdev_init+0x12f/0x1f0
inetdev_event+0x48b/0x870
notifier_call_chain+0x38/0xf0
netif_change_net_namespace+0x72e/0x9f0
do_setlink.isra.0+0xd5/0x1220
rtnl_newlink+0x7ea/0xb50
rtnetlink_rcv_msg+0x459/0x5e0
netlink_rcv_skb+0x54/0x100
netlink_unicast+0x193/0x270
netlink_sendmsg+0x204/0x450

Switch to netif_disable_lro which assumes the caller holds the instance
lock. inetdev_init is called for the blackhole device (which is a sw device
and doesn't grab the instance lock) and from REGISTER/UNREGISTER notifiers.
We already hold the instance lock for REGISTER notifier during
netns change and we'll soon hold the lock during other paths.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Cosmin Ratiu <cratiu@nvidia.com>
Fixes: ad7c7b2172 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:32:08 -07:00
Linus Torvalds
5916a6fbc0 Merge tag 'rtc-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
 "We see a net reduction of the number of lines of code thanks to the
  removal of a now unused driver and a testing tool that is not used
  anymore. Apart from this, the max31335 driver gets support for a new
  part number and pm8xxx gets UEFI support.

  Core:

   - setdate is removed as it has better replacements

   - skip alarms with a second resolution when we know the RTC doesn't
     support those.

  Subsystem:

   - remove unnecessary private struct members

   - use devm_pm_set_wake_irq where relevant

  Drivers:

   - ds1307: stop disabling alarms on probe for DS1337, DS1339, DS1341
     and DS3231

   - max31335: add max31331 support

   - pcf50633 is removed as support for the related SoC has been removed

   - pcf85063: properly handle POR failures"

* tag 'rtc-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
  rtc: remove 'setdate' test program
  selftest: rtc: skip some tests if the alarm only supports minutes
  rtc: mt6397: drop unused defines
  rtc: pcf85063: replace dev_err+return with return dev_err_probe
  rtc: pcf85063: do a SW reset if POR failed
  rtc: max31335: Add driver support for max31331
  dt-bindings: rtc: max31335: Add max31331 support
  rtc: cros-ec: Avoid a couple of -Wflex-array-member-not-at-end warnings
  dt-bindings: rtc: pcf2127: Reference spi-peripheral-props.yaml
  rtc: rzn1: implement one-second accuracy for alarms
  rtc: pcf50633: Remove
  rtc: pm8xxx: implement qcom,no-alarm flag for non-HLOS owned alarm
  rtc: pm8xxx: mitigate flash wear
  rtc: pm8xxx: add support for uefi offset
  dt-bindings: rtc: qcom-pm8xxx: document qcom,no-alarm flag
  rtc: rv3032: drop WADA
  rtc: rv3032: fix EERD location
  rtc: pm8xxx: switch to devm_device_init_wakeup
  rtc: pm8xxx: fix possible race condition
  rtc: mpfs: switch to devm_device_init_wakeup
  ...
2025-04-03 15:31:14 -07:00
Lorenzo Bianconi
09bccf56db net: airoha: Validate egress gdm port in airoha_ppe_foe_entry_prepare()
Dev pointer in airoha_ppe_foe_entry_prepare routine is not strictly
a device allocated by airoha_eth driver since it is an egress device
and the flowtable can contain even wlan, pppoe or vlan devices. E.g:

flowtable ft {
        hook ingress priority filter
        devices = { eth1, lan1, lan2, lan3, lan4, wlan0 }
        flags offload                               ^
                                                    |
                     "not allocated by airoha_eth" --
}

In this case airoha_get_dsa_port() will just return the original device
pointer and we can't assume netdev priv pointer points to an
airoha_gdm_port struct.
Fix the issue by validating the egress gdm port in the
airoha_ppe_foe_entry_prepare routine before accessing the net_device priv
pointer.

Fixes: 00a7678310 ("net: airoha: Introduce flowtable offload support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250401-airoha-validate-egress-gdm-port-v4-1-c7315d33ce10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:18:16 -07:00
David Oberhollenzer
a58d882841 net: dsa: mv88e6xxx: properly shutdown PPU re-enable timer on destroy
The mv88e6xxx has an internal PPU that polls PHY state. If we want to
access the internal PHYs, we need to disable the PPU first. Because
that is a slow operation, a 10ms timer is used to re-enable it,
canceled with every access, so bulk operations effectively only
disable it once and re-enable it some 10ms after the last access.

If a PHY is accessed and then the mv88e6xxx module is removed before
the 10ms are up, the PPU re-enable ends up accessing a dangling pointer.

This especially affects probing during bootup. The MDIO bus and PHY
registration may succeed, but registration with the DSA framework
may fail later on (e.g. because the CPU port depends on another,
very slow device that isn't done probing yet, returning -EPROBE_DEFER).
In this case, probe() fails, but the MDIO subsystem may already have
accessed the MDIO bus or PHYs, arming the timer.

This is fixed as follows:
 - If probe fails after mv88e6xxx_phy_init(), make sure we also call
   mv88e6xxx_phy_destroy() before returning
 - In mv88e6xxx_remove(), make sure we do the teardown in the correct
   order, calling mv88e6xxx_phy_destroy() after unregistering the
   switch device.
 - In mv88e6xxx_phy_destroy(), destroy both the timer and the work item
   that the timer might schedule, synchronously waiting in case one of
   the callbacks already fired and destroying the timer first, before
   waiting for the work item.
 - Access to the PPU is guarded by a mutex, the worker acquires it
   with a mutex_trylock(), not proceeding with the expensive shutdown
   if that fails. We grab the mutex in mv88e6xxx_phy_destroy() to make
   sure the slow PPU shutdown is already done or won't even enter, when
   we wait for the work item.

Fixes: 2e5f032095 ("dsa: add support for the Marvell 88E6131 switch chip")
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20250401135705.92760-1-david.oberhollenzer@sigma-star.at
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:14:13 -07:00
Loic Poulain
40eb4a0434 MAINTAINERS: Update Loic Poulain's email address
Update Loic Poulain's email address to @oss.qualcomm.com.

Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250401145344.10669-1-loic.poulain@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:13:19 -07:00
Fernando Fernandez Mancera
7ac6ea4a3e ipv6: fix omitted netlink attributes when using RTEXT_FILTER_SKIP_STATS
Using RTEXT_FILTER_SKIP_STATS is incorrectly skipping non-stats IPv6
netlink attributes on link dump. This causes issues on userspace tools,
e.g. iproute2 is not rendering address generation mode as it should due
to missing netlink attribute.

Move the filling of IFLA_INET6_STATS and IFLA_INET6_ICMP6STATS to a
helper function guarded by a flag check to avoid hitting the same
situation in the future.

Fixes: d5566fd72e ("rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250402121751.3108-1-ffmancera@riseup.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:11:29 -07:00
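
The fix in 7ac6ea4a3e above can be pictured with a toy model. This is a minimal sketch, not the kernel code: the dict-based "message" and both function names are hypothetical, and the flag value is assumed to match the uapi header. It only shows why confining the flag check to a stats-only helper keeps non-stats attributes from being skipped.

```python
# Illustrative model only: the attribute names are real netlink attribute
# names, but the dict "message" and both helpers are hypothetical.
RTEXT_FILTER_SKIP_STATS = 1 << 3  # assumed to match the uapi rtnetlink header


def fill_stats(msg, ext_filter_mask):
    """Add the two stats attributes unless the caller asked to skip them."""
    if ext_filter_mask & RTEXT_FILTER_SKIP_STATS:
        return
    msg["IFLA_INET6_STATS"] = "<ip6 counters>"
    msg["IFLA_INET6_ICMP6STATS"] = "<icmp6 counters>"


def fill_ifla6_attrs(msg, ext_filter_mask):
    """Fill IPv6 link attributes; only the stats depend on the filter mask."""
    msg["IFLA_INET6_FLAGS"] = 0
    fill_stats(msg, ext_filter_mask)
    # Non-stats attributes are emitted regardless of the filter mask.
    msg["IFLA_INET6_ADDR_GEN_MODE"] = "eui64"
    return msg
```

With the check kept inside the helper, a later attribute cannot end up behind the skip-stats condition by accident.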
Taehee Yoo
e4546c6498 eth: bnxt: fix deadlock in the mgmt_ops
When queue is being reset, callbacks of mgmt_ops are called by
netdev_nl_bind_rx_doit().
The netdev_nl_bind_rx_doit() first acquires netdev_lock() and then calls
callbacks.
So, mgmt_ops callbacks should not acquire netdev_lock() internally.

The bnxt_queue_{start | stop}() calls napi_{enable | disable}() but they
internally acquire netdev_lock().
So, deadlock occurs.

To avoid deadlock, napi_{enable | disable}_locked() should be used
instead.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Fixes: cae03e5bdd ("net: hold netdev instance lock during queue operations")
Link: https://patch.msgid.link/20250402133123.840173-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:10:52 -07:00
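
The locking pattern behind e4546c6498 above can be sketched in user space. This is a toy model under stated assumptions: the `Netdev` class and all function bodies are hypothetical stand-ins, a non-recursive `threading.Lock` plays the role of the netdev instance lock, and a short acquire timeout stands in for the hang that a real self-deadlock would cause.

```python
import threading


class Netdev:
    """Toy stand-in for a net_device with a non-recursive instance lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.napi_enabled = False


def napi_enable_locked(dev):
    # Caller must already hold dev.lock.
    assert dev.lock.locked()
    dev.napi_enabled = True


def napi_enable(dev):
    # Takes the lock itself. The timeout stands in for the hang that a
    # non-recursive mutex would produce when acquired twice by one thread.
    if not dev.lock.acquire(timeout=0.1):
        raise RuntimeError("deadlock: instance lock already held")
    try:
        napi_enable_locked(dev)
    finally:
        dev.lock.release()


def queue_start(dev, use_locked_variant):
    # Models a queue-management callback that runs with dev.lock held.
    if use_locked_variant:
        napi_enable_locked(dev)
    else:
        napi_enable(dev)


def bind_rx(dev, use_locked_variant):
    # Models the caller that takes the lock before invoking callbacks.
    with dev.lock:
        queue_start(dev, use_locked_variant)
```

Calling `bind_rx(dev, False)` raises in this model, while `bind_rx(dev, True)` completes, which is the difference between the unlocked and `_locked` variants that the commit describes.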
Dmitry Safonov
e5ddf19dbc net/selftests: Add loopback link local route for self-connect
self-connect-ipv6 got slightly flaky on netdev:
> # timeout set to 120
> # selftests: net/tcp_ao: self-connect_ipv6
> # 1..5
> # # 708[lib/setup.c:250] rand seed 1742872572
> # TAP version 13
> # # 708[lib/proc.c:213]    Snmp6            Ip6OutNoRoutes: 0 => 1
> # not ok 1 # error 708[self-connect.c:70] failed to connect()
> # ok 2 No unexpected trace events during the test run
> # # Planned tests != run tests (5 != 2)
> # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:1
> ok 1 selftests: net/tcp_ao: self-connect_ipv6

I can not reproduce it on my machines, but judging by "Ip6OutNoRoutes"
there is no route to the local_addr (::1).

Looking at the kernel code, I see that kernel does add link-local
address automatically in init_loopback(), but that is called from
ipv6 notifier block. So, in turn the userspace that brought up
the loopback interface may see rtnetlink ACK earlier than
addrconf_notify() does its job (at least, on a slow VM such as netdev).
Probably, for ipv4 it's the same, judging by inetdev_event().

The fix is quite simple: set the link-local route straight after
bringing up the loopback interface. That will make it synchronous.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250402-tcp-ao-selfconnect-flake-v1-1-8388d629ef3d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:08:31 -07:00
Edward Cree
8241ecec1c sfc: fix NULL dereferences in ef100_process_design_param()
Since cited commit, ef100_probe_main() and hence also
 ef100_check_design_params() run before efx->net_dev is created;
 consequently, we cannot netif_set_tso_max_size() or _segs() at this
 point.
Move those netif calls to ef100_probe_netdev(), and also replace
 netif_err within the design params code with pci_err.

Reported-by: Kyungwook Boo <bookyungwook@gmail.com>
Fixes: 98ff4c7c8a ("sfc: Separate netdev probe/remove from PCI probe/remove")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250401225439.2401047-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:08:25 -07:00
Joshua Washington
15970e1b23 gve: handle overflow when reporting TX consumed descriptors
When the tx tail is less than the head (in cases of wraparound), the TX
consumed descriptor statistic in DQ will be reported as
UINT32_MAX - head + tail, which is incorrect. Mask the difference of
head and tail according to the ring size when reporting the statistic.

Cc: stable@vger.kernel.org
Fixes: 2c9198356d ("gve: Add consumed counts to ethtool stats")
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250402001037.2717315-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:07:27 -07:00
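
The arithmetic in 15970e1b23 above can be checked with a small sketch. It assumes, beyond what the commit message states, that the ring size is a power of two and that head and tail are ring indices; the function names are hypothetical.

```python
U32 = 0xFFFFFFFF


def consumed_unmasked(head, tail):
    # Unsigned 32-bit subtraction: wraps to a huge value when tail < head.
    return (tail - head) & U32


def consumed_masked(head, tail, ring_size):
    # Assumes ring_size is a power of two, so (ring_size - 1) is a bit mask.
    assert ring_size > 0 and ring_size & (ring_size - 1) == 0
    return (tail - head) & (ring_size - 1)
```

For a 256-entry ring with head 250 and tail 4, the unmasked difference lands just below 2^32, while the masked difference is 10, the number of slots between head and tail across the wrap.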
Linus Torvalds
e8b4712852 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM and clkdev updates from Russell King:

 - Simplify ARM_MMU_KEEP usage

 - Add Rust support for ARM architecture version 7

 - Align IPIs reported in /proc/interrupts

 - require linker to support KEEP within OVERLAY

 - add KEEP() for ARM vectors

 - add __printf() attribute for clkdev functions

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9445/1: clkdev: Mark some functions with __printf() attribute
  ARM: 9444/1: add KEEP() keyword to ARM_VECTORS
  ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE
  ARM: 9442/1: smp: Fix IPI alignment in /proc/interrupts
  ARM: 9441/1: rust: Enable Rust support for ARMv7
  ARM: 9439/1: arm32: simplify ARM_MMU_KEEP usage
2025-04-03 12:21:44 -07:00
Linus Torvalds
aa18761a44 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - Fix max_pfn calculation when hotplugging memory so that it never
   decreases

 - Fix dereference of unused source register in the MOPS SET operation
   fault handling

 - Fix NULL calling in do_compat_alignment_fixup() when the 32-bit user
   space does an unaligned LDREX/STREX

 - Add the HiSilicon HIP09 processor to the Spectre-BHB affected CPUs

 - Drop unused code pud accessors (special/mkspecial)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Don't call NULL in do_compat_alignment_fixup()
  arm64: Add support for HIP09 Spectre-BHB mitigation
  arm64: mm: Drop dead code for pud special bit handling
  arm64: mops: Do not dereference src reg for a set operation
  arm64: mm: Correct the update of max_pfn
2025-04-03 12:07:01 -07:00
Linus Torvalds
531a62f223 Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Fix BPF selftests expectations of assembler output and struct layout
   (Song Liu and Yonghong Song)

 - Fix XSK error code when queue is full (Wang Liang)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Fix verifier_private_stack test failure
  selftests/bpf: Fix verifier_bpf_fastcall test
  selftests/bpf: Fix tests after fields reorder in struct file
  xsk: Fix __xsk_generic_xmit() error code when cq is full
2025-04-03 11:55:41 -07:00
Linus Torvalds
5a2b5cb76c Merge tag 'mm-nonmm-stable-2025-04-02-22-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-MM updates from Andrew Morton:
 "One bugfix and a couple of small late-arriving updates"

* tag 'mm-nonmm-stable-2025-04-02-22-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib: scatterlist: fix sg_split_phys to preserve original scatterlist offsets
  lib/sort.c: add _nonatomic() variants with cond_resched()
  mailmap: add an entry for Nicolas Schier
2025-04-03 11:16:57 -07:00
Linus Torvalds
8c7c1b5506 Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:

 - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
   Rapoport fixes a couple of issues with the just-merged "arch, mm:
   reduce code duplication in mem_init()" series

 - The series "MAINTAINERS: add my isub-entries to MM part." from Mike
   Rapoport does some maintenance on MAINTAINERS

 - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
   cleanup work to the page mapping code

 - The series "mseal system mappings" from Jeff Xu permits sealing of
   "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
   compat-mode), sigpage (arm compat-mode)

 - Plus the usual shower of singleton patches

* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mseal sysmap: add arch-support txt
  mseal sysmap: enable s390
  selftest: test system mappings are sealed
  mseal sysmap: update mseal.rst
  mseal sysmap: uprobe mapping
  mseal sysmap: enable arm64
  mseal sysmap: enable x86-64
  mseal sysmap: generic vdso vvar mapping
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: kernel config and header change
  mm: pgtable: remove tlb_remove_page_ptdesc()
  x86: pgtable: convert to use tlb_remove_ptdesc()
  riscv: pgtable: unconditionally use tlb_remove_ptdesc()
  mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
  mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
  mm: pgtable: make generic tlb_remove_table() use struct ptdesc
  microblaze/mm: put mm_cmdline_setup() in .init.text section
  mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
  MAINTAINERS: mm: add entry for secretmem
  MAINTAINERS: mm: add entry for numa memblocks and numa emulation
  ...
2025-04-03 11:10:00 -07:00
Linus Torvalds
204e9a18f1 Merge tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM hotfixes from Andrew Morton:
 "Five hotfixes. Three are cc:stable and the remainder address post-6.14
  issues or aren't considered necessary for -stable kernels.

  All patches are for MM"

* tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
  mm/hugetlb: move hugetlb_sysctl_init() to the __init section
  mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock
  mm/hugetlb_vmemmap: fix memory loads ordering
  mm/userfaultfd: fix release hang over concurrent GUP
2025-04-03 10:47:47 -07:00
Linus Torvalds
ea59cb7423 Merge tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:

 - Calling scx_bpf_create_dsq() with the same ID would succeed creating
   duplicate DSQs. Fix it to return -EEXIST.

 - scx_select_cpu_dfl() fixes and cleanups.

 - Synchronize tool/sched_ext with external scheduler repo. While this
   isn't a fix, there's no risk to the kernel and it's better if they
   stay synced closer.

* tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  tools/sched_ext: Sync with scx repo
  sched_ext: initialize built-in idle state before ops.init()
  sched_ext: create_dsq: Return -EEXIST on duplicate request
  sched_ext: Remove a meaningless conditional goto in scx_select_cpu_dfl()
  sched_ext: idle: Fix return code of scx_select_cpu_dfl()
2025-04-03 10:03:38 -07:00
Linus Torvalds
41677970ad Merge tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled

   The tracing of arguments in the function tracer depends on some
   functions that are only defined when PROBE_EVENTS_BTF_ARGS is
   enabled. In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same
   configs as the function argument tracing requires. Just have the
   function argument tracing depend on PROBE_EVENTS_BTF_ARGS.

 - Free module_delta for persistent ring buffer instance

   When an instance holds the persistent ring buffer, it allocates a
   helper array to hold the deltas between where modules are loaded on
   the last boot and the current boot. This array needs to be freed when
   the instance is freed.

 - Add cond_resched() to loop in ftrace_graph_set_hash()

   The hash functions in ftrace loop over every function that can be
   enabled by ftrace. This can be 50,000 functions or more. This loop is
   known to trigger soft lockup warnings and requires a cond_resched().
   The loop in ftrace_graph_set_hash() was missing it.

 - Fix the event format verifier to include "%*p.." arguments

   To prevent events from dereferencing stale pointers that can happen
   if a trace event uses a dereference pointer to something that was not
   copied into the ring buffer and can be freed by the time the trace is
   read, a verifier is called. At boot or module load, the verifier
   scans the print format string for pointers that can be dereferenced
   and it checks the arguments to make sure they do not contain
   something that can be freed. The "%*p" was not handled, which would
   add another argument and cause the verifier to not only not verify
   this pointer, but it will look at the wrong argument for every
   pointer after that.

 - Fix mcount sorttable building for different endian type target

   When modifying the ELF file to sort the mcount_loc table in the
   sorttable.c code, the endianness of the file and the host is used to
   determine if the bytes need to be swapped when calculations are done.
   A change was made to the sorting of the mcount_loc that read the
   values from the ELF file into an array and the swap happened on the
   filling of the array. But one of the calculations of the array still
   did the swap when it did not need to. This caused building on a
   little endian machine for a big endian target to not find the mcount
   function in the 'nm' table and it zeroed it out, causing there to be
   no functions available to trace.

 - Add goto out_unlock jump to rv_register_monitor() on error path

   One of the error paths in rv_register_monitor() just returned the
   error when it should have jumped to the out_unlock label to release
   the mutex.

* tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix missing unlock on double nested monitors return path
  scripts/sorttable: Fix endianness handling in build-time mcount sort
  tracing: Verify event formats that have "%*p.."
  ftrace: Add cond_resched() to ftrace_graph_set_hash()
  tracing: Free module_delta on freeing of persistent ring buffer
  ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
2025-04-03 09:52:44 -07:00
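
The "%*p" item in the tracing pull above comes down to argument counting: a `*` width consumes an argument of its own, so a scanner that ignores it pairs every later pointer with the wrong argument. The sketch below is a simplified user-space model, not the kernel verifier; the regular expression and the function name are hypothetical and cover only a subset of printf syntax.

```python
import re

# Simplified printf-style conversion: flags, width (digits or '*'),
# precision (digits or '*'), length modifier, conversion character.
SPEC = re.compile(r"%([-+ #0]*)(\*|\d+)?(?:\.(\*|\d+))?(hh|h|ll|l|z)?([a-zA-Z%])")


def pointer_arg_indexes(fmt, handle_star=True):
    """Return the argument indexes that %p conversions in fmt refer to."""
    arg = 0
    found = []
    for m in SPEC.finditer(fmt):
        _flags, width, precision, _length, conv = m.groups()
        if conv == "%":
            continue  # "%%" is a literal percent sign and takes no argument
        if handle_star:
            # Each '*' takes its value from the argument list.
            arg += (width == "*") + (precision == "*")
        if conv == "p":
            found.append(arg)
        arg += 1
    return found
```

For the format `"%s %*ph %p"` the pointers sit at argument indexes 2 and 3. With `handle_star=False` the scanner reports 1 and 2, so it inspects the length argument instead of the first pointer and is off by one for the pointer that follows.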
Kent Overstreet
77ad1df82b bcachefs: Fix "journal stuck" during recovery
If we crash when the journal pin fifo is completely full - i.e. we're
at the maximum number of dirty journal entries - that may put us in a
sticky situation in recovery, as journal replay will need to be able to
open new journal entries in order to get going.

bch2_fs_journal_start() already had provisions for resizing the journal
pin fifo if needed, but it needs a fudge factor to ensure there's room
for journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
Kent Overstreet
2581f89ac8 bcachefs: backpointer_get_key: check for null from peek_slot()
peek_slot() doesn't normally return bkey_s_c_null - except when we ask
for a key at a btree level that doesn't exist, which can happen here.

We might want to revisit this, but we'll have to look over all the
places where we use peek_slot() on interior nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
Kent Overstreet
39ebd74864 bcachefs: Fix null ptr deref in invalidate_one_bucket()
bch2_backpointer_get_key() returns bkey_s_c_null when the target isn't
found.

backpointer_get_key() flags the error, so there's nothing else to do
here - just skip it and move on.

Link: https://github.com/koverstreet/bcachefs/issues/847
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
Kent Overstreet
83d539b1b0 bcachefs: Fix check_snapshot_exists() restart handling
Codepaths that create entries in the snapshots btree currently call
bch2_mark_snapshot(), which updates the in-memory snapshot table, before
transaction commit.

This is because bch2_mark_snapshot() is an atomic trigger, run with
btree write locks held, and isn't allowed to fail - but it might need to
reallocate the table, hence we call it early when we're still allowed to
fail.

This is generally harmless - if we fail, we'll have left an entry in the
snapshots table around, but nothing will reference it and it'll get
overwritten if reused by another transaction.

But check_snapshot_exists(), which reconstructs snapshots when the
snapshots btree has been corrupted or lost, was erroneously rechecking if
the snapshot exists inside the transaction commit loop - so on
transaction restart (in this case mem_realloced), the second iteration
would return without repairing.

This code needs some cleanup: splitting out a "maybe realloc snapshots
table" helper would have avoided this, that will be in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
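
The restart problem described in 83d539b1b0 above can be reproduced in a toy model. Everything here is hypothetical (the `Fs` class, the `Restart` exception, and both repair functions); it only mirrors the control flow from the commit message: the in-memory table is updated before commit, so an existence check placed inside the retry loop sees the entry after a restart and returns without repairing.

```python
class Restart(Exception):
    """Stands in for a transaction restart (e.g. after a table realloc)."""


class Fs:
    def __init__(self):
        self.table = set()      # in-memory snapshot table
        self.btree = set()      # persistent snapshots btree
        self.restarts_left = 1  # force exactly one restart

    def snapshot_exists(self, snap_id):
        return snap_id in self.table

    def try_create(self, snap_id):
        # The in-memory table is updated before the commit...
        self.table.add(snap_id)
        if self.restarts_left:
            self.restarts_left -= 1
            raise Restart()
        # ...and the btree entry only lands when the commit succeeds.
        self.btree.add(snap_id)


def repair_buggy(fs, snap_id):
    while True:
        if fs.snapshot_exists(snap_id):  # rechecked on every iteration
            return
        try:
            fs.try_create(snap_id)
            return
        except Restart:
            continue


def repair_fixed(fs, snap_id):
    if fs.snapshot_exists(snap_id):      # checked once, outside the loop
        return
    while True:
        try:
            fs.try_create(snap_id)
            return
        except Restart:
            continue
```

In this model `repair_buggy` leaves the btree empty after one restart, while `repair_fixed` retries until the entry is written.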
Bharadwaj Raju
570f5050bb bcachefs: use nonblocking variant of print_string_as_lines in error path
The inconsistency error path calls print_string_as_lines, which calls
console_lock, which is a potentially-sleeping function and so can't be
called in an atomic context.

Replace calls to it with the nonblocking variant which is safe to call.

Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:42 -04:00
Kent Overstreet
b2ffadcc7f bcachefs: Fix scheduling while atomic from logging changes
Two fixes from the recent logging changes:

bch2_inconsistent() and bch2_fs_inconsistent() can be called from interrupt
context, or with rcu_read_lock() held.

The one syzbot found is in
  bch2_bkey_pick_read_device
  bch2_dev_rcu
  bch2_fs_inconsistent

We're starting to lift the printbufs up to higher levels so we
can emit better log messages and print them all in one go (avoid
garbling), so that conversion will help with spotting these in the
future; when we declare a printbuf it must be flagged if we're in an
atomic context.

Secondly, in btree_node_write_endio:

00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6
00085 preempt_count: 10001, expected: 0
00085 RCU nest depth: 0, expected: 0
00085 4 locks held by bch-reclaim/fa6/618:
00085  #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198
00085  #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440
00085  #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440
00085  #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130
00085 irq event stamp: 328
00085 hardirqs last  enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0
00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60
00085 softirqs last  enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118
00085 softirqs last disabled at (0): [<0000000000000000>] 0x0
00085 Preemption disabled at:
00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90
00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798
00085 Hardware name: linux,dummy-virt (DT)
00085 Call trace:
00085  show_stack+0x1c/0x30 (C)
00085  dump_stack_lvl+0x84/0xc0
00085  dump_stack+0x14/0x20
00085  __might_resched+0x180/0x288
00085  __might_sleep+0x4c/0x88
00085  __kmalloc_node_track_caller_noprof+0x34c/0x3e0
00085  krealloc_noprof+0x1a0/0x2d8
00085  bch2_printbuf_make_room+0x9c/0x120
00085  bch2_prt_printf+0x60/0x1b8
00085  btree_node_write_endio+0x1b0/0x2d8
00085  bio_endio+0x138/0x1f0
00085  btree_node_write_endio+0xe8/0x2d8
00085  bio_endio+0x138/0x1f0
00085  blk_update_request+0x220/0x4c0
00085  blk_mq_end_request+0x28/0x148
00085  virtblk_request_done+0x64/0xe8
00085  blk_mq_complete_request+0x34/0x40
00085  virtblk_done+0x78/0x130
00085  vring_interrupt+0x6c/0xb0
00085  __handle_irq_event_percpu+0x8c/0x2e0
00085  handle_irq_event+0x50/0xb0
00085  handle_fasteoi_irq+0xc4/0x250
00085  handle_irq_desc+0x44/0x60
00085  generic_handle_domain_irq+0x20/0x30
00085  gic_handle_irq+0x54/0xc8
00085  call_on_irq_stack+0x24/0x40

Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:42 -04:00
Wentao Liang
9364f17ba4 bcachefs: Add error handling for zlib_deflateInit2()
In attempt_compress(), the return value of zlib_deflateInit2() needs to be
checked. A proper implementation can be found in pstore_compress().

Add an error check and return 0 immediately if the initialization fails.

Fixes: 986e9842fb ("bcachefs: Compression levels")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:42 -04:00
Palmer Dabbelt
3eb64093f5 Merge tag 'riscv-mw2-6.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into for-next
riscv patches for 6.15-rc1, part 2

* A bunch of fixes:
  - 2 fixes for purgatory code bugs that prevented kexec from working
  - Work around an issue with gcc-15

* tag 'riscv-mw2-6.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux:
  riscv: Add norvc after .option arch in runtime const
  riscv: Make sure toolchain supports zba before using zba instructions
  riscv/purgatory: 4B align purgatory_start
  riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
  selftests: riscv: fix v_exec_initval_nolibc.c
  riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
  riscv: print hartid on bringup
  dt-bindings: riscv: document vector crypto requirements
  dt-bindings: riscv: add vector sub-extension dependencies
  dt-bindings: riscv: d requires f
  RISC-V: add f & d extension validation checks
  RISC-V: add vector crypto extension validation checks
  RISC-V: add vector extension validation checks
2025-04-03 08:53:19 -07:00
Ming Lei
01b91bf14f block: don't grab elevator lock during queue initialization
->elevator_lock depends on queue freeze lock, see block/blk-sysfs.c.

queue freeze lock depends on fs_reclaim.

So don't grab elevator lock during queue initialization which needs to
call kmalloc(GFP_KERNEL), and we can cut the dependency between
->elevator_lock and fs_reclaim, then the lockdep warning can be killed.

This is safe because the elevator setting isn't ready to run during
queue initialization.

There is no such issue in __blk_mq_update_nr_hw_queues() because
memalloc_noio_save() is called before acquiring the elevator lock.

Fixes the following lockdep warning:

https://lore.kernel.org/linux-block/67e6b425.050a0220.2f068f.007b.GAE@google.com/

Reported-by: syzbot+4c7e0f9b94ad65811efb@syzkaller.appspotmail.com
Cc: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250403105402.1334206-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-03 08:32:03 -06:00
Pavel Begunkov
390513642e io_uring: always do atomic put from iowq
io_uring always switches requests to atomic refcounting for iowq
execution before there is any parallelism by setting REQ_F_REFCOUNT,
and the flag is not cleared until the request completes. That should be
fine as long as the compiler doesn't make up a nonexistent value for
the flags; however, KCSAN still complains when the request owner changes
other flag bits:

BUG: KCSAN: data-race in io_req_task_cancel / io_wq_free_work
...
read to 0xffff888117207448 of 8 bytes by task 3871 on cpu 0:
 req_ref_put_and_test io_uring/refs.h:22 [inline]

Skip REQ_F_REFCOUNT checks for iowq, we know it's set.

Reported-by: syzbot+903a2ad71fb3f1e47cf5@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d880bc27fb8c3209b54641be4ff6ac02b0e5789a.1743679736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-03 08:31:57 -06:00
Lin Ma
1b755d8eb1 netfilter: nft_tunnel: fix geneve_opt type confusion addition
When handling multiple NFTA_TUNNEL_KEY_OPTS_GENEVE attributes, the
parsing logic should place every geneve_opt structure one by one
compactly. Hence, when deciding the next geneve_opt position, the
pointer addition should be in units of char *.

However, the current implementation erroneously does the type conversion
before the addition, which leads to a heap out-of-bounds write.

[    6.989857] ==================================================================
[    6.990293] BUG: KASAN: slab-out-of-bounds in nft_tunnel_obj_init+0x977/0xa70
[    6.990725] Write of size 124 at addr ffff888005f18974 by task poc/178
[    6.991162]
[    6.991259] CPU: 0 PID: 178 Comm: poc-oob-write Not tainted 6.1.132 #1
[    6.991655] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[    6.992281] Call Trace:
[    6.992423]  <TASK>
[    6.992586]  dump_stack_lvl+0x44/0x5c
[    6.992801]  print_report+0x184/0x4be
[    6.993790]  kasan_report+0xc5/0x100
[    6.994252]  kasan_check_range+0xf3/0x1a0
[    6.994486]  memcpy+0x38/0x60
[    6.994692]  nft_tunnel_obj_init+0x977/0xa70
[    6.995677]  nft_obj_init+0x10c/0x1b0
[    6.995891]  nf_tables_newobj+0x585/0x950
[    6.996922]  nfnetlink_rcv_batch+0xdf9/0x1020
[    6.998997]  nfnetlink_rcv+0x1df/0x220
[    6.999537]  netlink_unicast+0x395/0x530
[    7.000771]  netlink_sendmsg+0x3d0/0x6d0
[    7.001462]  __sock_sendmsg+0x99/0xa0
[    7.001707]  ____sys_sendmsg+0x409/0x450
[    7.002391]  ___sys_sendmsg+0xfd/0x170
[    7.003145]  __sys_sendmsg+0xea/0x170
[    7.004359]  do_syscall_64+0x5e/0x90
[    7.005817]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[    7.006127] RIP: 0033:0x7ec756d4e407
[    7.006339] Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 faf
[    7.007364] RSP: 002b:00007ffed5d46760 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[    7.007827] RAX: ffffffffffffffda RBX: 00007ec756cc4740 RCX: 00007ec756d4e407
[    7.008223] RDX: 0000000000000000 RSI: 00007ffed5d467f0 RDI: 0000000000000003
[    7.008620] RBP: 00007ffed5d468a0 R08: 0000000000000000 R09: 0000000000000000
[    7.009039] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[    7.009429] R13: 00007ffed5d478b0 R14: 00007ec756ee5000 R15: 00005cbd4e655cb8

Fix this bug with correct pointer addition and conversion in parse
and dump code.
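
The difference between converting before and after the addition can be modelled with plain byte offsets. The sketch below is an assumption-laden illustration, not the nft_tunnel code: GeneveOptHeader is a simplified stand-in for the fixed part of struct geneve_opt (assumed to be 4 bytes), and both helper functions are hypothetical names that only compute where the next option would be placed.

```python
import ctypes


class GeneveOptHeader(ctypes.Structure):
    # Simplified stand-in for the fixed part of struct geneve_opt (assumption).
    _fields_ = [
        ("opt_class", ctypes.c_uint16),
        ("type", ctypes.c_uint8),
        ("length_and_flags", ctypes.c_uint8),
    ]


HEADER_SIZE = ctypes.sizeof(GeneveOptHeader)


def next_offset_cast_then_add(used_bytes: int) -> int:
    """Model of (struct geneve_opt *)base + used_bytes: scaled by the struct size."""
    return used_bytes * HEADER_SIZE


def next_offset_add_then_cast(used_bytes: int) -> int:
    """Model of (struct geneve_opt *)((char *)base + used_bytes): a byte offset."""
    return used_bytes


# One option already stored: the header plus 8 bytes of option data.
used = HEADER_SIZE + 8
print(next_offset_cast_then_add(used))  # 48: far past the end of the used area
print(next_offset_add_then_cast(used))  # 12: directly after the first option
```

With the cast applied first, every byte already used is multiplied by the structure size, so the second option lands well beyond the compact position and can run off the end of the buffer.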

Fixes: 925d844696 ("netfilter: nft_tunnel: add support for geneve opts")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-03 13:32:03 +02:00
Mathieu Desnoyers
169eae7711 rseq: Eliminate useless task_work on execve
Eliminate a useless task_work on execve by moving the call to
rseq_set_notify_resume() from sched_mm_cid_after_execve() to the error
path of bprm_execve().

The call to rseq_set_notify_resume() from sched_mm_cid_after_execve() is
pointless in the success case, because rseq_execve() will clear the rseq
pointer before returning to userspace.

sched_mm_cid_after_execve() is called from both the success and error
paths of bprm_execve(). The call to rseq_set_notify_resume() is needed
on error because the mm_cid may have changed.

Also move the rseq_execve() to right after sched_mm_cid_after_execve()
in bprm_execve().

[ mingo: Merged to a recent upstream kernel, extended the changelog. ]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250327132945.1558783-1-mathieu.desnoyers@efficios.com
2025-04-03 13:10:47 +02:00
Oleg Nesterov
975776841e sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
kernel/sched/isolation.c obviously makes no sense without CONFIG_SMP, but
the Kconfig entry we have right now:

	config CPU_ISOLATION
		bool "CPU isolation"
		depends on SMP || COMPILE_TEST

allows the creation of pointless .config's which cause
build failures.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250330134955.GA7910@redhat.com

Closes: https://lore.kernel.org/oe-kbuild-all/202503260646.lrUqD3j5-lkp@intel.com/
2025-04-03 13:08:04 +02:00
Antoine Tenart
3a0a3ff659 net: decrease cached dst counters in dst_release
Upstream fix ac888d5886 ("net: do not delay dst_entries_add() in
dst_release()") moved decrementing the dst count from dst_destroy to
dst_release to avoid accessing already freed data in case of netns
dismantle. However in case CONFIG_DST_CACHE is enabled and OvS+tunnels
are used, this fix is incomplete as the same issue will be seen for
cached dsts:

  Unable to handle kernel paging request at virtual address ffff5aabf6b5c000
  Call trace:
   percpu_counter_add_batch+0x3c/0x160 (P)
   dst_release+0xec/0x108
   dst_cache_destroy+0x68/0xd8
   dst_destroy+0x13c/0x168
   dst_destroy_rcu+0x1c/0xb0
   rcu_do_batch+0x18c/0x7d0
   rcu_core+0x174/0x378
   rcu_core_si+0x18/0x30

Fix this by invalidating the cache, and thus decrementing cached dst
counters, in dst_release too.

Fixes: d71785ffc7 ("net: add dst_cache to ovs vxlan lwtunnel")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250326173634.31096-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-03 13:05:07 +02:00
Christian Marangi
12e0b15b19 crypto: inside-secure/eip93 - acquire lock on eip93_put_descriptor hash
In the EIP93 HASH functions, eip93_put_descriptor is called without
acquiring the lock. This is problematic when multiple threads execute hash
operations.

Acquire the ring write lock when calling eip93_put_descriptor to
prevent concurrent access from corrupting the ring pointers.

Fixes: 9739f5f93b ("crypto: eip93 - Add Inside Secure SafeXcel EIP-93 crypto engine support")
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-03 19:04:33 +08:00
Dave Airlie
526da2436b Merge tag 'amd-drm-next-6.15-2025-03-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.15-2025-03-27:

amdgpu:
- Guard against potential division by 0 in fan code
- Zero RPM support for SMU 14.0.2
- Properly handle SI and CIK support being disabled
- PSR fixes
- DML2 fixes
- DP Link training fix
- Vblank fixes
- RAS fixes
- Partitioning fix
- SDMA fix
- SMU 13.0.x fixes
- Rom fetching fix
- MES fixes
- Queue reset fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250328004749.3392457-1-alexander.deucher@amd.com
2025-04-03 15:53:37 +10:00
Dave Airlie
227bcf2c55 Merge tag 'drm-xe-next-fixes-2025-03-27' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Driver Changes:
- Fix NULL pointer dereference on error path
- Add missing HW workaround for BMG
- Fix survivability mode not triggering
- Fix build warning when DRM_FBDEV_EMULATION is not set

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/vxy5kwdkzgp2u2umnyxv4ygslmdlvzjl22xotzxaw55dv7plpz@34miqxkbvggu
2025-04-03 15:52:39 +10:00
Dave Airlie
41ae768afb Merge tag 'drm-intel-next-fixes-2025-03-25' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 fixes for v6.15 merge window:
- Bounds check for scalers in DSC prefill latency computation
- Fix build by adding a missing include

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/878qota36x.fsf@intel.com
2025-04-03 15:52:15 +10:00
Dave Airlie
889f32b4d7 Merge tag 'drm-misc-next-fixes-2025-03-27' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Short summary of fixes pull:

adp:
- Fix error handling in plane setup

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250327141835.GA96037@linux.fritz.box
2025-04-03 15:46:30 +10:00
Linus Torvalds
a2cc6ff5ec Merge tag 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire update from Takashi Sakamoto:
 "A single commit to use the common helper function for on-stack
  trailing array to enqueue any isochronous packet by the requests
  from userspace applications"

* tag 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: avoid -Wflex-array-member-not-at-end warning
2025-04-02 22:41:04 -07:00
Yonghong Song
3f8ad18f81 selftests/bpf: Fix verifier_private_stack test failure
Several verifier_private_stack tests failed with latest bpf-next.
For example, for 'Private stack, single prog' subtest, the
jitted code:
  func #0:
  0:      f3 0f 1e fa                             endbr64
  4:      0f 1f 44 00 00                          nopl    (%rax,%rax)
  9:      0f 1f 00                                nopl    (%rax)
  c:      55                                      pushq   %rbp
  d:      48 89 e5                                movq    %rsp, %rbp
  10:     f3 0f 1e fa                             endbr64
  14:     49 b9 58 74 8a 8f 7d 60 00 00           movabsq $0x607d8f8a7458, %r9
  1e:     65 4c 03 0c 25 28 c0 48 87              addq    %gs:-0x78b73fd8, %r9
  27:     bf 2a 00 00 00                          movl    $0x2a, %edi
  2c:     49 89 b9 00 ff ff ff                    movq    %rdi, -0x100(%r9)
  33:     31 c0                                   xorl    %eax, %eax
  35:     c9                                      leave
  36:     e9 20 5d 0f e1                          jmp     0xffffffffe10f5d5b

The insn 'addq %gs:-0x78b73fd8, %r9' does not match the expected
regex 'addq %gs:0x{{.*}}, %r9' and this caused test failure.

Fix it by changing '%gs:0x{{.*}}' to '%gs:{{.*}}' to accommodate the
possible negative offset. A few other subtests are fixed in a similar way.
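
The mismatch can be reproduced with ordinary regular expressions. This is a rough sketch, assuming the selftest's '{{.*}}' placeholder behaves like '.*' and that runs of spaces may be matched with '\s+'; the pattern names and the matches() helper are illustrative only.

```python
import re

# The instruction from the jitted code quoted above.
insn = "addq    %gs:-0x78b73fd8, %r9"

# Approximate Python translations of the old and new expectations.
old_pattern = re.compile(r"addq\s+%gs:0x.*, %r9")
new_pattern = re.compile(r"addq\s+%gs:.*, %r9")


def matches(pattern: re.Pattern, line: str) -> bool:
    """Return True if the expectation is found anywhere in the line."""
    return pattern.search(line) is not None


print(matches(old_pattern, insn))  # False: "-0x..." does not start with "0x"
print(matches(new_pattern, insn))  # True: any offset text is accepted
```

The relaxed pattern still accepts a positive offset such as '%gs:0x1234', so it covers both layouts.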

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250331033828.365077-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02 21:55:44 -07:00
Song Liu
14d84357a0 selftests/bpf: Fix verifier_bpf_fastcall test
Commit [1] moves percpu data on x86 from address 0x000... to address
0xfff...

Before [1]:

159020: 0000000000030700     0 OBJECT  GLOBAL DEFAULT   23 pcpu_hot

After [1]:

152602: ffffffff83a3e034     4 OBJECT  GLOBAL DEFAULT   35 pcpu_hot

As a result, verifier_bpf_fastcall tests should now expect a negative
value for pcpu_hot, IOW, the disassembly should show "r=" instead of
"w=".

Fix this in the test.
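
To see why the new symbol value counts as negative, the two addresses quoted above can be reinterpreted as signed 64-bit integers. The helper name as_signed64 is an illustrative assumption; the sketch only shows the two's-complement arithmetic and does not model the verifier or the disassembler output.

```python
def as_signed64(value: int) -> int:
    """Interpret a 64-bit unsigned value as a two's-complement signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


before = 0x0000000000030700  # pcpu_hot value before [1]
after = 0xFFFFFFFF83A3E034   # pcpu_hot value after [1]

print(hex(as_signed64(before)))  # 0x30700
print(hex(as_signed64(after)))   # -0x7c5c1fcc
```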

Note that a later change created a new variable "cpu_number" for
bpf_get_smp_processor_id() [2]. The inlining logic is updated properly
as part of this change, so there is no need to fix anything on the
kernel side.

[1] commit 9d7de2aa8b ("x86/percpu/64: Use relative percpu offsets")
[2] commit 01c7bc5198 ("x86/smp: Move cpu number to percpu hot section")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250328193124.808784-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02 21:55:43 -07:00
Song Liu
00387808d3 selftests/bpf: Fix tests after fields reorder in struct file
The change in struct file [1] moved f_ref to the 3rd cache line.
It made *(u64 *)file dereference invalid from the verifier point of view,
because btf_struct_walk() walks into f_lock field, which is 4-byte long.

Fix the selftests to dereference the file pointer as a 4-byte access.

[1] commit e249056c91 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250327185528.1740787-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02 21:55:43 -07:00
Wang Liang
5d0b204654 xsk: Fix __xsk_generic_xmit() error code when cq is full
When the cq reservation fails, the error code is not set and keeps its
initial value of zero in __xsk_generic_xmit(). That means the packet is not
sent successfully but sendto() returns success.

Considering the impact on uapi, returning -EAGAIN is a good idea. The cq is
usually full because it has not been released in time, so trying to send the
message again is appropriate.
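
From the caller's side, the retry behaviour that -EAGAIN invites might look like the sketch below. Everything here is an assumption for illustration: kick_tx stands for whatever call triggers transmission (for example a sendto() on the socket), drain_completions stands for the application reaping completion queue entries, and max_tries is an arbitrary bound.

```python
import errno


def send_with_retry(kick_tx, drain_completions, max_tries=8):
    """Call kick_tx(); on EAGAIN, reap completions and try again."""
    for _ in range(max_tries):
        try:
            return kick_tx()
        except OSError as exc:
            if exc.errno != errno.EAGAIN:
                raise  # a different failure: do not hide it
            # The completion queue was full: free entries before retrying.
            drain_completions()
    raise TimeoutError("completion queue still full after retries")
```

Before the fix, such a caller would see success and never retry, even though the packet was not sent.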

The bug dates back to the very early implementation of xsk, so the Fixes tag
targets the commit that introduced the changes in
xsk_cq_reserve_addr_locked that this fix depends on.

Fixes: e6c4047f51 ("xsk: Use xsk_buff_pool directly for cq functions")
Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250227081052.4096337-1-wangliang74@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02 21:55:43 -07:00
Linus Torvalds
5014bebee0 Merge tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:

 - dm-crypt: switch to using the crc32 library

 - dm-verity, dm-integrity, dm-crypt: documentation improvement

 - dm-vdo fixes

 - dm-stripe: enable inline crypto passthrough

 - dm-integrity: set ti->error on memory allocation failure

 - dm-bufio: remove unused return value

 - dm-verity: do forward error correction on metadata I/O errors

 - dm: fix unconditional IO throttle caused by REQ_PREFLUSH

 - dm cache: prevent BUG_ON by blocking retries on failed device resumes

 - dm cache: support shrinking the origin device

 - dm: restrict dm device size to 2^63-512 bytes

 - dm-delay: support zoned devices

 - dm-verity: support block number limits for different ioprio classes

 - dm-integrity: fix non-constant-time tag verification (security bug)

 - dm-verity, dm-ebs: fix prefetch-vs-suspend race

* tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
  dm-ebs: fix prefetch-vs-suspend race
  dm-verity: fix prefetch-vs-suspend race
  dm-integrity: fix non-constant-time tag verification
  dm-verity: support block number limits for different ioprio classes
  dm-delay: support zoned devices
  dm: restrict dm device size to 2^63-512 bytes
  dm cache: support shrinking the origin device
  dm cache: prevent BUG_ON by blocking retries on failed device resumes
  dm vdo indexer: reorder uds_request to reduce padding
  dm: fix unconditional IO throttle caused by REQ_PREFLUSH
  dm vdo: rework processing of loaded refcount byte arrays
  dm vdo: remove remaining ring references
  dm-verity: do forward error correction on metadata I/O errors
  dm-bufio: remove unused return value
  dm-integrity: set ti->error on memory allocation failure
  dm: Enable inline crypto passthrough for striped target
  dm vdo slab-depot: read refcount blocks in large chunks at load time
  dm vdo vio-pool: allow variable-sized metadata vios
  dm vdo vio-pool: support pools with multiple data blocks per vio
  dm vdo vio-pool: add a pool pointer to pooled_vio
  ...
2025-04-02 21:27:59 -07:00
Tingmao Wang
4210030d8b docs: fs/9p: Add missing "not" in cache documentation
A quick fix for what I assume is a typo.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20250330213443.98434-1-m@maowtm.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2025-04-03 12:31:11 +09:00
Linus Torvalds
447d2d272e Merge tag 'libnvdimm-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ira Weiny:
 "Most of the code changes are to remove dead code.

  The bug fixes are minor, Syzkaller and one for broken devices which
  are unlikely to be in the field. So no need to backport them.

   - two patches to remove dead code: nd_attach_ndns() and
     nd_region_conflict() have not been used since 2017 and 2019
     respectively

   - Fix divide-by-0 if device returns a broken LSA value

   - Fix Syzkaller reported bug"

* tag 'libnvdimm-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm/labels: Fix divide error in nd_label_data_init()
  libnvdimm: Remove unused nd_attach_ndns
  libnvdimm: Remove unused nd_region_conflict
  acpi: nfit: fix narrowing conversion in acpi_nfit_ctl
2025-04-02 20:27:18 -07:00
Linus Torvalds
01ecadbe09 Merge tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dave Jiang:

 - Add support for Global Persistent Flush (GPF)

 - Cleanup of DPA partition metadata handling:
     - Remove the CXL_DECODER_MIXED enum that's not needed anymore
     - Introduce helpers to access resource and perf meta data
     - Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
     - Make cxl_dpa_alloc() DPA partition number agnostic
     - Remove cxl_decoder_mode
     - Cleanup partition size and perf helpers

 - Remove unused CXL partition values

 - Add logging support for CXL CPER endpoint and port protocol errors:
     - Prefix protocol error struct and function names with cxl_
     - Move protocol error definitions and structures to a common location
     - Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h
     - Add support in GHES to process CXL CPER protocol errors
     - Process CXL CPER protocol errors
     - Add trace logging for CXL PCIe port RAS errors

 - Remove redundant gp_port init

 - Add validation of cxl device serial number

 - CXL ABI documentation updates/fixups

 - A series that uses guard() to clean up open coded mutex lockings and
   remove gotos for error handling.

 - Some followup patches to support dirty shutdown accounting:
     - Add helper to retrieve DVSEC offset for dirty shutdown registers
     - Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown()
     - Add support for dirty shutdown count via sysfs
     - cxl_test support for dirty shutdown

 - A series to support CXL mailbox Features commands.

   Mostly in preparation for CXL EDAC code to utilize the Features
   commands. It's also in preparation for CXL fwctl support to utilize
   the CXL Features. The commands include "Get Supported Features", "Get
   Feature", and "Set Feature".

 - A series to add the extended linear cache support described by the
   ACPI HMAT table.

   The addition helps enumerate the cache and also provides additional
   RAS reporting support for configuration with extended linear cache.
   (and related fixes for the series).

 - An update to cxl_test to support a 3-way capable CFMWS

 - A documentation fix to remove unused "mixed mode"

* tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (39 commits)
  cxl/region: Fix the first aliased address miscalculation
  cxl/region: Quiet some dev_warn()s in extended linear cache setup
  cxl/Documentation: Remove 'mixed' from sysfs mode doc
  cxl: Fix warning from emitting resource_size_t as long long int on 32bit systems
  cxl/test: Define a CFMWS capable of a 3 way HB interleave
  cxl/mem: Do not return error if CONFIG_CXL_MCE unset
  tools/testing/cxl: Set Shutdown State support
  cxl/pmem: Export dirty shutdown count via sysfs
  cxl/pmem: Rename cxl_dirty_shutdown_state()
  cxl/pci: Introduce cxl_gpf_get_dvsec()
  cxl/pci: Support Global Persistent Flush (GPF)
  cxl: Document missing sysfs files
  cxl: Plug typos in ABI doc
  cxl/pmem: debug invalid serial number data
  cxl/cdat: Remove redundant gp_port initialization
  cxl/memdev: Remove unused partition values
  cxl/region: Drop goto pattern of construct_region()
  cxl/region: Drop goto pattern in cxl_dax_region_alloc()
  cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc()
  cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free()
  ...
2025-04-02 20:04:43 -07:00