Commit Graph

89455 Commits

Author SHA1 Message Date
Linus Torvalds
ecbcffe3b7 Merge tag 'ata-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fix from Damien Le Moal:

 - Avoid deadlocks on resume from sleep by delaying scsi rescan until
   the scsi device is also fully resumed.

* tag 'ata-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-scsi: Avoid deadlock on rescan after device resume
2023-06-18 09:48:39 -07:00
Damien Le Moal
6aa0365a3c ata: libata-scsi: Avoid deadlock on rescan after device resume
When an ATA port is resumed from sleep, the port is reset and a power
management request issued to libata EH to reset the port and rescanning
the device(s) attached to the port. Device rescanning is done by
scheduling an ata_scsi_dev_rescan() work, which will execute
scsi_rescan_device().

However, scsi_rescan_device() takes the generic device lock, which is
also taken by dpm_resume() when the SCSI device is resumed as well. If
a device rescan execution starts before the completion of the SCSI
device resume, the rcu locking used to refresh the cached VPD pages of
the device, combined with the generic device locking from
scsi_rescan_device() and from dpm_resume() can cause a deadlock.

Avoid this situation by changing struct ata_port scsi_rescan_task to be
a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is
modified to check if the SCSI device associated with the ATA device that
must be rescanned is not suspended. If the SCSI device is still
suspended, ata_scsi_dev_rescan() returns early and reschedule itself for
execution after an arbitrary delay of 5ms.

Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Joe Breuer <linux-kernel@jmbreuer.net>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217530
Fixes: a19a93e4c6 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Joe Breuer <linux-kernel@jmbreuer.net>
2023-06-18 12:00:49 +09:00
Linus Torvalds
b73056e9f8 Merge tag 'urgent-rcu.2023.06.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
 "This fixes a spinlock-initialization regression in SRCU that causes
  the SRCU notifier to fail.

  The fix simply adds the initialization, but introduces a #ifdef
  because there is no spinlock to initialize for the Tiny SRCU used in
  !SMP builds.

  Yes, it would be nice to abstract this somehow in order to hide it in
  SRCU, but I still don't see a good way of doing this"

* tag 'urgent-rcu.2023.06.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  notifier: Initialize new struct srcu_usage field
2023-06-16 11:43:15 -07:00
Linus Torvalds
93fd8eb053 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
 "This is an unusually large bunch of bug fixes for the later rc cycle,
  rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
  itself, and mlx5 is fixing a bunch of "queue counters" related bugs.

  There is one highly notable bug fix regarding the qkey. This small
  security check was missed in the original 2005 implementation and it
  allows some significant issues.

  Summary:

   - Two rtrs bug fixes for error unwind bugs

   - Several rxe bug fixes:
      * Incorrect Rx packet validation
      * Using memory without a refcount
      * Syzkaller found use before initialization
      * Regression fix for missing locking with the tasklet conversion
        from this merge window

   - Have bnxt report the correct link properties to userspace, this was
     a regression in v6.3

   - Several mlx5 bug fixes:
      * Kernel crash triggerable by userspace for the RAW ethernet
        profile
      * Defend against steering refcounting issues created by userspace
      * Incorrect change of QP port affinity parameters in some LAG
        configurations

   - Fix mlx5 Q counters:
      * Do not over allocate Q counters to allow userspace to use the
        full port capacity
      * Kernel crash triggered by eswitch due to mis-use of Q counters
      * Incorrect mlx5_device for Q counters in some LAG configurations

   - Properly implement the IBA spec restricting privileged qkeys to
     root

   - Always an error when reading from a disassociated device's event
     queue

   - isert bug fixes:
      * Avoid a deadlock with the CM handler and CM ID destruction
      * Correct list corruption due to incorrect locking
      * Fix a use after free around connection tear down"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rxe: Fix rxe_cq_post
  IB/isert: Fix incorrect release of isert connection
  IB/isert: Fix possible list corruption in CMA handler
  IB/isert: Fix dead lock in ib_isert
  RDMA/mlx5: Fix affinity assignment
  IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
  RDMA/uverbs: Restrict usage of privileged QKEYs
  RDMA/cma: Always set static rate to 0 for RoCE
  RDMA/mlx5: Fix Q-counters query in LAG mode
  RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
  RDMA/mlx5: Fix Q-counters per vport allocation
  RDMA/mlx5: Create an indirect flow table for steering anchor
  RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
  RDMA/rxe: Fix the use-before-initialization error of resp_pkts
  RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
  RDMA/rxe: Fix ref count error in check_rkey()
  RDMA/rxe: Fix packet length checks
  RDMA/rtrs: Fix rxe_dealloc_pd warning
  RDMA/rtrs: Fix the last iu->buf leak in err path
2023-06-15 20:13:56 -07:00
Mark Bloch
617f5db1a6 RDMA/mlx5: Fix affinity assignment
The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.

However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.

The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.

To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.

Fixes: 802dcc7fc5 ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11 11:27:17 +03:00
Linus Torvalds
859c745951 Merge tag 'arm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "Most of the changes this time are for the Qualcomm Snapdragon
  platforms.

  There are bug fixes for error handling in Qualcomm icc-bwmon,
  rpmh-rsc, ramp_controller and rmtfs driver as well as the AMD tee
  firmware driver and a missing initialization in the Arm ff-a firmware
  driver. The Qualcomm RPMh and EDAC drivers need some rework to work
  correctly on all supported chips.

  The DT fixes include:

   - i.MX8 fixes for gpio, pinmux and clock settings

   - ADS touchscreen gpio polarity settings in several machines

   - Address dtb warnings for caches, panel and input-enable properties
     on Qualcomm platforms

   - Incorrect data on qualcomm platforms fir SA8155P power domains,
     SM8550 LLCC, SC7180-lite SDRAM frequencies and SM8550 soundwire

   - Remoteproc firmware paths are corrected for Sony Xperia 10 IV"

* tag 'arm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (36 commits)
  firmware: arm_ffa: Set handle field to zero in memory descriptor
  ARM: dts: Fix erroneous ADS touchscreen polarities
  arm64: dts: imx8mn-beacon: Fix SPI CS pinmux
  arm64: dts: imx8-ss-dma: assign default clock rate for lpuarts
  arm64: dts: imx8qm-mek: correct GPIOs for USDHC2 CD and WP signals
  EDAC/qcom: Get rid of hardcoded register offsets
  EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
  arm64: dts: qcom: sm8550: Use the correct LLCC register scheme
  dt-bindings: cache: qcom,llcc: Fix SM8550 description
  arm64: dts: qcom: sc7180-lite: Fix SDRAM freq for misidentified sc7180-lite boards
  arm64: dts: qcom: sm8550: use uint16 for Soundwire interval
  soc: qcom: rpmhpd: Add SA8155P power domains
  arm64: dts: qcom: Split out SA8155P and use correct RPMh power domains
  dt-bindings: power: qcom,rpmpd: Add SA8155P
  soc: qcom: Rename ice to qcom_ice to avoid module name conflict
  soc: qcom: rmtfs: Fix error code in probe()
  soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe()
  ARM: dts: at91: sama7g5ek: fix debounce delay property for shdwc
  ARM: at91: pm: fix imbalanced reference counter for ethernet devices
  arm64: dts: qcom: sm6375-pdx225: Fix remoteproc firmware paths
  ...
2023-06-10 13:01:09 -07:00
Linus Torvalds
25041a4c02 Merge tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from can, wifi, netfilter, bluetooth and ebpf.

  Current release - regressions:

   - bpf: sockmap: avoid potential NULL dereference in
     sk_psock_verdict_data_ready()

   - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()

   - phylink: actually fix ksettings_set() ethtool call

   - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3

  Current release - new code bugs:

   - wifi: mt76: fix possible NULL pointer dereference in
     mt7996_mac_write_txwi()

  Previous releases - regressions:

   - netfilter: fix NULL pointer dereference in nf_confirm_cthelper

   - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS

   - openvswitch: fix upcall counter access before allocation

   - bluetooth:
      - fix use-after-free in hci_remove_ltk/hci_remove_irk
      - fix l2cap_disconnect_req deadlock

   - nic: bnxt_en: prevent kernel panic when receiving unexpected
     PHC_UPDATE event

  Previous releases - always broken:

   - core: annotate rfs lockless accesses

   - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values

   - netfilter: add null check for nla_nest_start_noflag() in
     nft_dump_basechain_hook()

   - bpf: fix UAF in task local storage

   - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294

   - ipv6: rpl: fix route of death.

   - tcp: gso: really support BIG TCP

   - mptcp: fixes for user-space PM address advertisement

   - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT

   - can: avoid possible use-after-free when j1939_can_rx_register fails

   - batman-adv: fix UaF while rescheduling delayed work

   - eth: qede: fix scheduling while atomic

   - eth: ice: make writes to /dev/gnssX synchronous"

* tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
  bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
  bnxt_en: Skip firmware fatal error recovery if chip is not accessible
  bnxt_en: Query default VLAN before VNIC setup on a VF
  bnxt_en: Don't issue AP reset during ethtool's reset operation
  bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()
  net: bcmgenet: Fix EEE implementation
  eth: ixgbe: fix the wake condition
  eth: bnxt: fix the wake condition
  lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
  bpf: Add extra path pointer check to d_path helper
  net: sched: fix possible refcount leak in tc_chain_tmplt_add()
  net: sched: act_police: fix sparse errors in tcf_police_dump()
  net: openvswitch: fix upcall counter access before allocation
  net: sched: move rtm_tca_policy declaration to include file
  ice: make writes to /dev/gnssX synchronous
  net: sched: add rcu annotations around qdisc->qdisc_sleeping
  rfs: annotate lockless accesses to RFS sock flow table
  rfs: annotate lockless accesses to sk->sk_rxhash
  virtio_net: use control_buf for coalesce params
  ...
2023-06-08 09:27:19 -07:00
Chen-Yu Tsai
de29a96acc notifier: Initialize new struct srcu_usage field
In commit 95433f7263 ("srcu: Begin offloading srcu_struct fields to
srcu_update"), a new struct srcu_usage field was added, but was not
properly initialized. This led to a "spinlock bad magic" BUG when the
SRCU notifier was ever used. This was observed in the MediaTek CCI
devfreq driver on next-20230525. The trimmed stack trace is as follows:

    BUG: spinlock bad magic on CPU#4, swapper/0/1
     lock: 0xffffff80ff529ac0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
    Call trace:
     spin_bug+0xa4/0xe8
     do_raw_spin_lock+0xec/0x120
     _raw_spin_lock_irqsave+0x78/0xb8
     synchronize_srcu+0x3c/0x168
     srcu_notifier_chain_unregister+0x5c/0xa0
     cpufreq_unregister_notifier+0x94/0xe0
     devfreq_passive_event_handler+0x7c/0x3e0
     devfreq_remove_device+0x48/0xe8

Add __SRCU_USAGE_INIT() to SRCU_NOTIFIER_INIT() so that srcu_usage gets
initialized properly.

Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 95433f7263 ("srcu: Begin offloading srcu_struct fields to srcu_update")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Sachin Sant <sachinp@linux.ibm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-06-07 13:42:02 -07:00
Eric Dumazet
d636fc5dd6 net: sched: add rcu annotations around qdisc->qdisc_sleeping
syzbot reported a race around qdisc->qdisc_sleeping [1]

It is time we add proper annotations to reads and writes to/from
qdisc->qdisc_sleeping.

[1]
BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu

read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1:
qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331
__tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174
tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547
rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0:
dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115
qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103
tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693
rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023

Fixes: 3a7d0d07a3 ("net: sched: extend Qdisc with rcu")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 10:25:39 +01:00
Eric Dumazet
5c3b74a92a rfs: annotate lockless accesses to RFS sock flow table
Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table.

This also prevents a (smart ?) compiler to remove the condition in:

if (table->ents[index] != newval)
        table->ents[index] = newval;

We need the condition to avoid dirtying a shared cache line.

Fixes: fec5e652e5 ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 10:08:45 +01:00
Linus Torvalds
846b065da6 Merge tag 'platform-drivers-x86-v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:

 - various Microsoft Surface support fixes

 - one fix for the INT3472 driver

* tag 'platform-drivers-x86-v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: int3472: Avoid crash in unregistering regulator gpio
  platform/surface: aggregator_tabletsw: Add support for book mode in POS subsystem
  platform/surface: aggregator_tabletsw: Add support for book mode in KIP subsystem
  platform/surface: aggregator: Allow completion work-items to be executed in parallel
  platform/surface: aggregator: Make to_ssam_device_driver() respect constness
2023-06-06 05:42:21 -07:00
Arnd Bergmann
be02e1fc71 Merge tag 'qcom-driver-fixes-for-6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
Qualcomm driver fixes for 6.4

Error paths is corrected across icc-bwmon, rpmh-rsc, ramp_controller and
rmtfs. The ice module is renamed qcom_ice, to avoid clashing with
existing "ice" driver.

SA8155P-specific RPMh power-domains are introduced to avoid the code
trying to access resources that exists on SM8150, but not on SA8155P.

Lastly, changes to the EDAC driver to fix an issue where the driver
performs mmio based on the wrong register map.

* tag 'qcom-driver-fixes-for-6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  EDAC/qcom: Get rid of hardcoded register offsets
  EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
  dt-bindings: cache: qcom,llcc: Fix SM8550 description
  soc: qcom: rpmhpd: Add SA8155P power domains
  dt-bindings: power: qcom,rpmpd: Add SA8155P
  soc: qcom: Rename ice to qcom_ice to avoid module name conflict
  soc: qcom: rmtfs: Fix error code in probe()
  soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe()
  soc: qcom: rpmh-rsc: drop redundant unsigned >=0 comparision
  soc: qcom: icc-bwmon: fix incorrect error code passed to dev_err_probe()

Link: https://lore.kernel.org/r/20230601141058.2246039-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-06-05 17:35:28 +02:00
Linus Torvalds
209835e8ec Merge tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are a bunch of tiny char/misc/other driver fixes for 6.4-rc5 that
  resolve a number of reported issues. Included in here are:

   - iio driver fixes

   - fpga driver fixes

   - test_firmware bugfixes

   - fastrpc driver tiny bugfixes

   - MAINTAINERS file updates for some subsystems

  All of these have been in linux-next this past week with no reported
  issues"

* tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (34 commits)
  test_firmware: fix the memory leak of the allocated firmware buffer
  test_firmware: fix a memory leak with reqs buffer
  test_firmware: prevent race conditions by a correct implementation of locking
  firmware_loader: Fix a NULL vs IS_ERR() check
  MAINTAINERS: Vaibhav Gupta is the new ipack maintainer
  dt-bindings: fpga: replace Ivan Bornyakov maintainership
  MAINTAINERS: update Microchip MPF FPGA reviewers
  misc: fastrpc: reject new invocations during device removal
  misc: fastrpc: return -EPIPE to invocations on device removal
  misc: fastrpc: Reassign memory ownership only for remote heap
  misc: fastrpc: Pass proper scm arguments for secure map request
  iio: imu: inv_icm42600: fix timestamp reset
  iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag
  dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value
  iio: dac: mcp4725: Fix i2c_master_send() return value handling
  iio: accel: kx022a fix irq getting
  iio: bu27034: Ensure reset is written
  iio: dac: build ad5758 driver when AD5758 is selected
  iio: addac: ad74413: fix resistance input processing
  iio: light: vcnl4035: fixed chip ID check
  ...
2023-06-04 08:32:30 -04:00
Linus Torvalds
8b435e4025 Merge tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are some USB driver and core fixes for 6.4-rc5. Most of these are
  tiny driver fixes, including:

   - udc driver bugfix

   - f_fs gadget driver bugfix

   - cdns3 driver bugfix

   - typec bugfixes

  But the "big" thing in here is a fix yet-again for how the USB buffers
  are handled from userspace when dealing with DMA issues. The changes
  were discussed a lot, and tested a lot, on the list, and acked by the
  relevant mm maintainers and have been in linux-next all this past week
  with no reported problems"

* tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tps6598x: Fix broken polling mode after system suspend/resume
  mm: page_table_check: Ensure user pages are not slab pages
  mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM
  usb: usbfs: Use consistent mmap functions
  usb: usbfs: Enforce page requirements for mmap
  dt-bindings: usb: snps,dwc3: Fix "snps,hsphy_interface" type
  usb: gadget: udc: fix NULL dereference in remove()
  usb: gadget: f_fs: Add unbind event before functionfs_unbind
  usb: cdns3: fix NCM gadget RX speed 20x slow than expection at iMX8QM
2023-06-04 07:31:48 -04:00
Linus Torvalds
a746ca666a Merge tag 'nfsd-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:

 - Two minor bug fixes

* tag 'nfsd-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix double fget() bug in __write_ports_addfd()
  nfsd: make a copy of struct iattr before calling notify_change
2023-06-02 13:38:55 -04:00
Linus Torvalds
792fc92140 Merge tag 'efi-fixes-for-v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
 "A few minor fixes for EFI, one of which fixes the reported boot
  regression when booting x86 kernels using the BIOS based loader built
  into the hypervisor framework on macOS.

   - fix harmless warning in zboot code on 'make clean'

   - add some missing prototypes

   - fix boot regressions triggered by PE/COFF header image minor
     version bump"

* tag 'efi-fixes-for-v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Bump stub image version for macOS HVF compatibility
  efi: fix missing prototype warnings
  efi/libstub: zboot: Avoid eager evaluation of objcopy flags
2023-06-01 20:43:11 -04:00
Linus Torvalds
714069daa5 Merge tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Happy Wear a Dress Day.

  Fairly standard-sized batch of fixes, accounting for the lack of
  sub-tree submissions this week. The mlx5 IRQ fixes are notable, people
  were complaining about that. No fires burning.

  Current release - regressions:

   - eth: mlx5e:
      - multiple fixes for dynamic IRQ allocation
      - prevent encap offload when neigh update is running

   - eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters

  Current release - new code bugs:

   - eth: mlx5e: DR, add missing mutex init/destroy in pattern manager

  Previous releases - always broken:

   - tcp: deny tcp_disconnect() when threads are waiting

   - sched: prevent ingress Qdiscs from getting installed in random
     locations in the hierarchy and moving around

   - sched: flower: fix possible OOB write in fl_set_geneve_opt()

   - netlink: fix NETLINK_LIST_MEMBERSHIPS length report

   - udp6: fix race condition in udp6_sendmsg & connect

   - tcp: fix mishandling when the sack compression is deferred

   - rtnetlink: validate link attributes set at creation time

   - mptcp: fix connect timeout handling

   - eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked

   - eth: amd-xgbe: fix the false linkup in xgbe_phy_status

   - eth: mlx5e:
      - fix corner cases in internal buffer configuration
      - drain health before unregistering devlink

   - usb: qmi_wwan: set DTR quirk for BroadMobi BM818

  Misc:

   - tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if
     user_mss set"

* tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
  mptcp: fix active subflow finalization
  mptcp: add annotations around sk->sk_shutdown accesses
  mptcp: fix data race around msk->first access
  mptcp: consolidate passive msk socket initialization
  mptcp: add annotations around msk->subflow accesses
  mptcp: fix connect timeout handling
  rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
  rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
  rtnetlink: call validate_linkmsg in rtnl_create_link
  ice: recycle/free all of the fragments from multi-buffer frame
  net: phy: mxl-gpy: extend interrupt fix to all impacted variants
  net: renesas: rswitch: Fix return value in error path of xmit
  net: dsa: mv88e6xxx: Increase wait after reset deactivation
  net: ipa: Use correct value for IPA_STATUS_SIZE
  tcp: fix mishandling when the sack compression is deferred.
  net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
  sfc: fix error unwinds in TC offload
  net/mlx5: Read embedded cpu after init bit cleared
  net/mlx5e: Fix error handling in mlx5e_refresh_tirs
  net/mlx5: Ensure af_desc.mask is properly initialized
  ...
2023-06-01 17:29:18 -04:00
Mike Christie
f9010dbdce fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process.  2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.

To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c.  Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn.  vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit.  This collection clears
__fatal_signal_pending.  This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.

For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file.  To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec.  The
coredump code and de_thread have been modified to ignore vhost threads.

Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.

Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.

Fixes: 6e890c5d50 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-01 17:15:33 -04:00
Linus Torvalds
1874a42a7d Merge tag 'firewire-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
 "A single patch to use a flexible array rather than a zero-length one"

* tag 'firewire-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: Replace zero-length array with flexible-array member
2023-06-01 11:18:20 -04:00
Gustavo A. R. Silva
4420528254 firewire: Replace zero-length array with flexible-array member
Zero-length and one-element arrays are deprecated, and we are moving
towards adopting C99 flexible-array members, instead.

Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
sound/firewire/amdtp-stream.c: In function ‘build_it_pkt_header’:
sound/firewire/amdtp-stream.c:694:17: warning: ‘generate_cip_header’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
  694 |                 generate_cip_header(s, cip_header, data_block_counter, syt);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/firewire/amdtp-stream.c:694:17: note: referencing argument 2 of type ‘__be32[2]’ {aka ‘unsigned int[2]’}
sound/firewire/amdtp-stream.c:667:13: note: in a call to function ‘generate_cip_header’
  667 | static void generate_cip_header(struct amdtp_stream *s, __be32 cip_header[2],
      |             ^~~~~~~~~~~~~~~~~~~

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].

Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/303
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/ZHT0V3SpvHyxCv5W@work
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2023-06-01 22:41:14 +09:00
Dan Carpenter
c034203b6a nfsd: fix double fget() bug in __write_ports_addfd()
The bug here is that you cannot rely on getting the same socket
from multiple calls to fget() because userspace can influence
that.  This is a kind of double fetch bug.

The fix is to delete the svc_alien_sock() function and instead do
the checking inside the svc_addsock() function.

Fixes: 3064639423 ("nfsd: check passed socket's net matches NFSd superblock's one")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-31 09:57:14 -04:00
Maximilian Luz
ed08d937ea platform/surface: aggregator: Make to_ssam_device_driver() respect constness
Make to_ssam_device_driver() a bit safer by replacing container_of()
with container_of_const() to respect the constness of the passed in
pointer, instead of silently discarding any const specifications. This
change also makes it more similar to to_ssam_device(), which already
uses container_of_const().

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525205041.2774947-1-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-30 11:20:02 +02:00
Ruihan Li
44d0fb387b mm: page_table_check: Ensure user pages are not slab pages
The current uses of PageAnon in page table check functions can lead to
type confusion bugs between struct page and slab [1], if slab pages are
accidentally mapped into the user space. This is because slab reuses the
bits in struct page to store its internal states, which renders PageAnon
ineffective on slab pages.

Since slab pages are not expected to be mapped into the user space, this
patch adds BUG_ON(PageSlab(page)) checks to make sure that slab pages
are not inadvertently mapped. Otherwise, there must be some bugs in the
kernel.

Reported-by: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1]
Fixes: df4e817b71 ("mm: page table check")
Cc: <stable@vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20230515130958.32471-5-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 16:14:28 +01:00
Ruihan Li
0143d148d1 usb: usbfs: Enforce page requirements for mmap
The current implementation of usbdev_mmap uses usb_alloc_coherent to
allocate memory pages that will later be mapped into the user space.
Meanwhile, usb_alloc_coherent employs three different methods to
allocate memory, as outlined below:
 * If hcd->localmem_pool is non-null, it uses gen_pool_dma_alloc to
   allocate memory;
 * If DMA is not available, it uses kmalloc to allocate memory;
 * Otherwise, it uses dma_alloc_coherent.

However, it should be noted that gen_pool_dma_alloc does not guarantee
that the resulting memory will be page-aligned. Furthermore, trying to
map slab pages (i.e., memory allocated by kmalloc) into the user space
is not resonable and can lead to problems, such as a type confusion bug
when PAGE_TABLE_CHECK=y [1].

To address these issues, this patch introduces hcd_alloc_coherent_pages,
which addresses the above two problems. Specifically,
hcd_alloc_coherent_pages uses gen_pool_dma_alloc_align instead of
gen_pool_dma_alloc to ensure that the memory is page-aligned. To replace
kmalloc, hcd_alloc_coherent_pages directly allocates pages by calling
__get_free_pages.

Reported-by: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.comm
Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1]
Fixes: f7d34b445a ("USB: Add support for usbfs zerocopy.")
Fixes: ff2437befd ("usb: host: Fix excessive alignment restriction for local memory allocations")
Cc: stable@vger.kernel.org
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20230515130958.32471-2-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 16:14:28 +01:00
Linus Torvalds
8b817fded4 Merge tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
 "User events:

   - Use long instead of int for storing the enable set/clear bit, as it
     was found that big endian machines could end up using the wrong
     bits.

   - Split allocating mm and attaching it. This keeps the allocation
     separate from the registration and avoids various races.

   - Remove RCU locking around pin_user_pages_remote() as that can
     schedule. The RCU protection is no longer needed with the above
     split of mm allocation and attaching.

   - Rename the "link" fields of the various structs to something more
     meaningful.

   - Add comments around user_event_mm struct usage and locking
     requirements.

  Timerlat tracer:

   - Fix missed wakeup of timerlat thread caused by the timerlat
     interrupt triggering when tracing is off. The timer interrupt
     handler needs to always wake up the timerlat thread regardless if
     tracing is enabled or not, otherwise, it will never wake up.

  Histograms:

   - Fix regression of breaking the "stacktrace" modifier for variables.
     That modifier cannot be used for values, but can be used for
     variables that are passed from one histogram to the next. This was
     broken when adding the restriction to values as the variable logic
     used the same code.

   - Rename the special field "stacktrace" to "common_stacktrace".

     Special fields (that are not actually part of the event, but can
     act just like event fields, like 'comm' and 'timestamp') should be
     prefixed with 'common_' for consistency. To keep backward
     compatibility, 'stacktrace' can still be used (as with the special
     field 'cpu'), but can be overridden if the event has a field called
     'stacktrace'.

   - Update the synthetic event selftests to use the new name (synthetic
     events are created by histograms)

  Tracing bootup selftests:

   - Reorganize the code to keep artifacts of the selftests not compiled
     in when selftests are not configured.

   - Add various cond_resched() around the selftest code, as the
     softlock watchdog was triggering much more often. It appears that
     the kernel runs slower now with full debugging enabled.

   - While debugging ftrace with ftrace (using an instance ring buffer
     instead of the top level one), I found that the selftests were
     disabling prints to the debug instance.

     This should not happen, as the selftests only disable printing to
     the main buffer as the selftests examine the main buffer to see if
     it has what it expects, and prints can make the tests fail.

     Make the selftests only disable printing to the toplevel buffer,
     and leave the instance buffers alone"

* tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have function_graph selftest call cond_resched()
  tracing: Only make selftest conditionals affect the global_trace
  tracing: Make tracing_selftest_running/delete nops when not used
  tracing: Have tracer selftests call cond_resched() before running
  tracing: Move setting of tracing_selftest_running out of register_tracer()
  tracing/selftests: Update synthetic event selftest to use common_stacktrace
  tracing: Rename stacktrace field to common_stacktrace
  tracing/histograms: Allow variables to have some modifiers
  tracing/user_events: Document user_event_mm one-shot list usage
  tracing/user_events: Rename link fields for clarity
  tracing/user_events: Remove RCU lock while pinning pages
  tracing/user_events: Split up mm alloc and attach
  tracing/timerlat: Always wakeup the timerlat thread
  tracing/user_events: Use long vs int for atomic bit ops
2023-05-29 07:20:13 -04:00
Linus Torvalds
ac2263b588 Revert "module: error out early on concurrent load of the same module file"
This reverts commit 9828ed3f69.

Sadly, it does seem to cause failures to load modules. Johan Hovold reports:

 "This change breaks module loading during boot on the Lenovo Thinkpad
  X13s (aarch64).

  Specifically it results in indefinite probe deferral of the display
  and USB (ethernet) which makes it a pain to debug. Typing in the dark
  to acquire some logs reveals that other modules are missing as well"

Since this was applied late as a "let's try this", I'm reverting it
asap, and we can try to figure out what goes wrong later.  The excessive
parallel module loading problem is annoying, but not noticeable in
normal situations, and this was only meant as an optimistic workaround
for a user-space bug.

One possible solution may be to do the optimistic exclusive open first,
and then use a lock to serialize loading if that fails.

Reported-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/lkml/ZHRpH-JXAxA6DnzR@hovoldconsulting.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-29 06:40:33 -04:00
Greg Kroah-Hartman
bca7a46336 Merge tag 'iio-fixes-for-6.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus
Jonathan writes:

1st set of IIO fixes for the 6.4 cycle.

Usual mixed bag of issues in new code for this cycle and old issues
that have surfaced in the last few weeks.

- adi,ad_sigma_delta
  * Ensure irq lazy disable handing is not used as it breaks completion
    detection.
- adi,ad4130
  * Fix failure to remove clock provider.
- adi,ad5758
  * Wrong CONFIG variable used to control driver build.
- adi,ad7192
  * Fix repeated channel index by just expressing shorted channels
    as differential channel between a channel and itself.
- adi,ad74413
  * Fix error handling for resistance input processing to not fail
    in case of success.
- rohm,bu27034
  * Fix integration time in wrong units (should be seconds not usecs)
  * Ensure reset is actually written not detected as already set from
    regcache.
- gts helper
  * Fix wrong parameter docs.
  * Fix integration time in wrong units (should be seconds not usecs)
- fsl,imx8qxp-adc
  * Add missing vref-supply to binding doc (already used by driver)
- fsl,imx93
  * Fix sign bug in read_raw() so that error check didn't work.
- inv,icm42600
  * Fix reset of timestamp to work even if a particular sensor is off when
    the chip is first enabled.
- kionix,kx022a
  * Fix irq get form fw node to not include the 0 value.
- microchip,mcp4725
  * Fix return value from i2c_master_send() handling to nto assume 0 on
    success.
- mediatek,mt6370
  * Fix incorrect scaling of a few currents on devices with particular
    vendor IDs.
- fsl,mxs-lradc
  * Cleanup ordering issue fix.
- renesas,rcar-adc bindings
  * Fix missing vendor prefix for adi,ad7476
- st,st_accel
  * Fix handling when no ACPI _ONT method present.
- st,stm32-adc
  * Handle no adc-diff-channel present case (all single ended)
  * Handle no adc-channels present case (all differential)
- ti,palmas
  * Fix off by one bug that could allow out of bounds read if callers
    provided wrong value.
- ti,tmag5273
  * Fix a runtime PM leak on measurement error
- vishay,vcnl4035
  * Correctly mask chip ID so devices with different addresses
    don't fail the test.

* tag 'iio-fixes-for-6.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (23 commits)
  iio: imu: inv_icm42600: fix timestamp reset
  iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag
  dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value
  iio: dac: mcp4725: Fix i2c_master_send() return value handling
  iio: accel: kx022a fix irq getting
  iio: bu27034: Ensure reset is written
  iio: dac: build ad5758 driver when AD5758 is selected
  iio: addac: ad74413: fix resistance input processing
  iio: light: vcnl4035: fixed chip ID check
  dt-bindings: iio: imx8qxp-adc: add missing vref-supply
  iio: adc: stm32-adc: skip adc-channels setup if none is present
  iio: adc: stm32-adc: skip adc-diff-channels setup if none is present
  iio: adc: ad7192: Change "shorted" channels to differential
  iio: accel: st_accel: Fix invalid mount_matrix on devices without ACPI _ONT method
  iio: gts-helpers: fix integration time units
  iio: bu27034: Fix integration time
  iio: fix doc for iio_gts_find_sel_by_int_time
  iio: adc: palmas: fix off by one bugs
  iio: adc: mxs-lradc: fix the order of two cleanup operations
  iio: ad4130: Make sure clock provider gets removed
  ...
2023-05-28 20:03:33 +01:00
Akihiro Suda
36e4fc57fc efi: Bump stub image version for macOS HVF compatibility
The macOS hypervisor framework includes a host-side VMM called
VZLinuxBootLoader [1] which implements native support for booting the
Linux kernel inside a guest directly (instead of, e.g., via GRUB
installed inside the guest). On x86, it incorporates a BIOS style loader
that does not implement or expose EFI to the loaded kernel. However,
this loader appears to fail when the 'image minor version' field in the
kernel image's PE/COFF header (which is generally only used by EFI based
bootloaders) is set to any value other than 0x0. [2]

Commit e346bebbd3 ("efi: libstub: Always enable initrd command
line loader and bump version") incremented the EFI stub image minor
version to convey that all EFI stub kernels now implement support for
the initrd= command line option, and do so in a way where it can load
initrd images from any filesystem known to the EFI firmware (as opposed
to prior implementations that could only load initrds from the same
volume that the kernel image was loaded from).

Unfortunately, bumping the version to v1.1 triggers this issue in
VZLinuxBootLoader, breaking the boot on x86. So let's keep the image
minor version at 0x0, and bump the image major version instead.

While at it, convert this field to a bit field, so that individual
features are discoverable from it, as suggested by Linus. So let's bump
the major version to v3, and document the initrd= command line loading
feature as being represented by bit 1 in the mask.

Note that, due to the prior interpretation as a monotonically increasing
version field, loaders are still permitted to assume that the LoadFile2
initrd loading feature is supported for any major version value >= 1,
even if bit 0 is not set.

[1] https://developer.apple.com/documentation/virtualization/vzlinuxbootloader
[2] https://lore.kernel.org/linux-efi/CAG8fp8Teu4G9JuenQrqGndFt2Gy+V4YgJ=hN1xX7AD940YKf3A@mail.gmail.com/

Fixes: e346bebbd3 ("efi: libstub: Always enable initrd command ...")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217485
Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
[ardb: rewrite comment and commit log]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-05-28 20:45:46 +02:00
Linus Torvalds
d8f14b84fe Merge tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects fixes from Thomas Gleixner:
 "Two fixes for debugobjects:

   - Prevent the allocation path from waking up kswapd.

     That's a long standing issue due to the GFP_ATOMIC allocation flag.
     As debug objects can be invoked from pretty much any context waking
     kswapd can end up in arbitrary lock chains versus the waitqueue
     lock

   - Correct the explicit lockdep wait-type violation in
     debug_object_fill_pool()"

* tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Don't wake up kswapd from fill_pool()
  debugobjects,locking: Annotate debug_object_fill_pool() wait type violation
2023-05-28 07:15:33 -04:00
Linus Torvalds
4e893b5aa4 Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:

 - a double free fix in the Xen pvcalls backend driver

 - a fix for a regression causing the MSI related sysfs entries to not
   being created in Xen PV guests

 - a fix in the Xen blkfront driver for handling insane input data
   better

* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/pci/xen: populate MSI sysfs entries
  xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
  xen/blkfront: Only check REQ_FUA for writes
2023-05-27 09:42:56 -07:00
Manivannan Sadhasivam
cbd77119b6 EDAC/qcom: Get rid of hardcoded register offsets
The LLCC EDAC register offsets varies between each SoC. Hardcoding the
register offsets won't work and will often result in crash due to
accessing the wrong locations.

Hence, get the register offsets from the LLCC driver matching the
individual SoCs.

Cc: <stable@vger.kernel.org> # 6.0: 5365cea199 ("soc: qcom: llcc: Rename reg_offset structs to reflect LLCC version")
Cc: <stable@vger.kernel.org> # 6.0: c13d7d261e ("soc: qcom: llcc: Pass LLCC version based register offsets to EDAC driver")
Cc: <stable@vger.kernel.org> # 6.0
Fixes: a6e9d7ef25 ("soc: qcom: llcc: Add configuration data for SM8450 SoC")
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230517114635.76358-3-manivannan.sadhasivam@linaro.org
2023-05-26 20:56:55 -07:00
Linus Torvalds
18713e8a68 Merge tag 'arm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "There have not been a lot of fixes for for the soc tree in 6.4, but
  these have been sitting here for too long.

  For the devicetree side, there is one minor warning fix for vexpress,
  the rest all all for the the NXP i.MX platforms: SoC specific bugfixes
  for the iMX8 clocks and its USB-3.0 gadget device, as well as board
  specific fixes for regulators and the phy on some of the i.MX boards.

  The microchip risc-v and arm32 maintainers now also add a shared
  maintainer file entry for the arm64 parts.

  The remaining fixes are all for firmware drivers, addressing mistakes
  in the optee, scmi and ff-a firmware driver implementation, mostly in
  the error handling code, incorrect use of the alloc_workqueue()
  interface in SCMI, and compatibility with corner cases of the firmware
  implementation"

* tag 'arm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  MAINTAINERS: update arm64 Microchip entries
  arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at super speed
  dt-binding: cdns,usb3: Fix cdns,on-chip-buff-size type
  arm64: dts: colibri-imx8x: delete adc1 and dsp
  arm64: dts: colibri-imx8x: fix iris pinctrl configuration
  arm64: dts: colibri-imx8x: move pinctrl property from SoM to eval board
  arm64: dts: colibri-imx8x: fix eval board pin configuration
  arm64: dts: imx8mp: Fix video clock parents
  ARM: dts: imx6qdl-mba6: Add missing pvcie-supply regulator
  ARM: dts: imx6ull-dhcor: Set and limit the mode for PMIC buck 1, 2 and 3
  arm64: dts: imx8mn-var-som: fix PHY detection bug by adding deassert delay
  arm64: dts: imx8mn: Fix video clock parents
  firmware: arm_ffa: Set reserved/MBZ fields to zero in the memory descriptors
  firmware: arm_ffa: Fix FFA device names for logical partitions
  firmware: arm_ffa: Fix usage of partition info get count flag
  firmware: arm_ffa: Check if ffa_driver remove is present before executing
  arm64: dts: arm: add missing cache properties
  ARM: dts: vexpress: add missing cache properties
  firmware: arm_scmi: Fix incorrect alloc_workqueue() invocation
  optee: fix uninited async notif value
2023-05-26 16:17:56 -07:00
Arnd Bergmann
abf5422e82 Merge tag 'ffa-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes
Arm FF-A fixes for v6.4

Quite a few fixes to address set of assorted issues:
1. NULL pointer dereference if the ffa driver doesn't provide remove()
   callback as it is currently executed unconditionally
2. FF-A core probe failure on systems with v1.0 firmware as the new
   partition info get count flag is used unconditionally
3. Failure to register more than one logical partition or service within
   the same physical partition as the device name contains only VM ID
   which will be same for all but each will have unique UUID.
4. Rejection of certain memory interface transmissions by the receivers
   (secure partitions) as few MBZ fields are non-zero due to lack of
   explicit re-initialization of those fields

* tag 'ffa-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  firmware: arm_ffa: Set reserved/MBZ fields to zero in the memory descriptors
  firmware: arm_ffa: Fix FFA device names for logical partitions
  firmware: arm_ffa: Fix usage of partition info get count flag
  firmware: arm_ffa: Check if ffa_driver remove is present before executing

Link: https://lore.kernel.org/r/20230509143453.1188753-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-05-26 16:49:15 +02:00
Jakub Kicinski
aa866ee4b1 Merge tag 'mlx5-fixes-2023-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5 fixes 2023-05-24

This series includes bug fixes for the mlx5 driver.

* tag 'mlx5-fixes-2023-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  Documentation: net/mlx5: Wrap notes in admonition blocks
  Documentation: net/mlx5: Add blank line separator before numbered lists
  Documentation: net/mlx5: Use bullet and definition lists for vnic counters description
  Documentation: net/mlx5: Wrap vnic reporter devlink commands in code blocks
  net/mlx5: Fix check for allocation failure in comp_irqs_request_pci()
  net/mlx5: DR, Add missing mutex init/destroy in pattern manager
  net/mlx5e: Move Ethernet driver debugfs to profile init callback
  net/mlx5e: Don't attach netdev profile while handling internal error
  net/mlx5: Fix post parse infra to only parse every action once
  net/mlx5e: Use query_special_contexts cmd only once per mdev
  net/mlx5: fw_tracer, Fix event handling
  net/mlx5: SF, Drain health before removing device
  net/mlx5: Drain health before unregistering devlink
  net/mlx5e: Do not update SBCM when prio2buffer command is invalid
  net/mlx5e: Consider internal buffers size in port buffer calculations
  net/mlx5e: Prevent encap offload when neigh update is running
  net/mlx5e: Extract remaining tunnel encap code to dedicated file
====================

Link: https://lore.kernel.org/r/20230525034847.99268-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25 21:09:41 -07:00
Linus Torvalds
9828ed3f69 module: error out early on concurrent load of the same module file
It turns out that udev under certain circumstances will concurrently try
to load the same modules over-and-over excessively.  This isn't a kernel
bug, but it ends up affecting the kernel, to the point that under
certain circumstances we can fail to boot, because the kernel uses a lot
of memory to read all the module data all at once.

Note that it isn't a memory leak, it's just basically a thundering herd
problem happening at bootup with a lot of CPUs, with the worst cases
then being pretty bad.

Admittedly the worst situations are somewhat contrived: lots and lots of
CPUs, not a lot of memory, and KASAN enabled to make it all slower and
as such (unintentionally) exacerbate the problem.

Luis explains: [1]

 "My best assessment of the situation is that each CPU in udev ends up
  triggering a load of duplicate set of modules, not just one, but *a
  lot*. Not sure what heuristics udev uses to load a set of modules per
  CPU."

Petr Pavlu chimes in: [2]

 "My understanding is that udev workers are forked. An initial kmod
  context is created by the main udevd process but no sharing happens
  after the fork. It means that the mentioned memory pool logic doesn't
  really kick in.

  Multiple parallel load requests come from multiple udev workers, for
  instance, each handling an udev event for one CPU device and making
  the exactly same requests as all others are doing at the same time.

  The optimization idea would be to recognize these duplicate requests
  at the udevd/kmod level and converge them"

Note that module loading has tried to mitigate this issue before, see
for example commit 064f4536d1 ("module: avoid allocation if module is
already present and ready"), which has a few ASCII graphs on memory use
due to this same issue.

However, while that noticed that the module was already loaded, and
exited with an error early before spending any more time on setting up
the module, it didn't handle the case of multiple concurrent module
loads all being active - but not complete - at the same time.

Yes, one of them will eventually win the race and finalize its copy, and
the others will then notice that the module already exists and error
out, but while this all happens, we have tons of unnecessary concurrent
work being done.

Again, the real fix is for udev to not do that (maybe it should use
threads instead of fork, and have actual shared data structures and not
cause duplicate work). That real fix is apparently not trivial.

But it turns out that the kernel already has a pretty good model for
dealing with concurrent access to the same file: the i_writecount of the
inode.

In fact, the module loading already indirectly uses 'i_writecount' ,
because 'kernel_file_read()' will in fact do

	ret = deny_write_access(file);
	if (ret)
		return ret;
	...
	allow_write_access(file);

around the read of the file data.  We do not allow concurrent writes to
the file, and return -ETXTBUSY if the file was open for writing at the
same time as the module data is loaded from it.

And the solution to the reader concurrency problem is to simply extend
this "no concurrent writers" logic to simply be "exclusive access".

Note that "exclusive" in this context isn't really some absolute thing:
it's only exclusion from writers and from other "special readers" that
do this writer denial.  So we simply introduce a variation of that
"deny_write_access()" logic that not only denies write access, but also
requires that this is the _only_ such access that denies write access.

Which means that you can't start loading a module that is already being
loaded as a module by somebody else, or you will get the same -ETXTBSY
error that you would get if there were writers around.

[ It also means that you can't try to load a currently executing
  executable as a module, for the same reason: executables do that same
  "deny_write_access()" thing, and that's obviously where the whole
  ETXTBSY logic traditionally came from.

  This is not a problem for kernel modules, since the set of normal
  executable files and kernel module files is entirely disjoint. ]

This new function is called "exclusive_deny_write_access()", and the
implementation is trivial, in that it's just an atomic decrement of
i_writecount if it was 0 before.

To use that new exclusivity check, all we then do is wrap the module
loading with that exclusive_deny_write_access()() / allow_write_access()
pair.  The actual patch is a bit bigger than that, because we want to
surround not just the "load file data" part, but the whole module setup,
to get maximum exclusion.

So this ends up splitting up "finit_module()" into a few helper
functions to make it all very clear and legible.

In Luis' test-case (bringing up 255 vcpu's in a virtual machine [3]),
the "wasted vmalloc" space (ie module data read into a vmalloc'ed area
in order to be loaded as a module, but then discarded because somebody
else loaded the same module instead) dropped from 1.8GiB to 474kB.  Yes,
that's gigabytes to kilobytes.

It doesn't drop completely to zero, because even with this change, you
can still end up having completely serial pointless module loads, where
one udev process has loaded a module fully (and thus the kernel has
released that exclusive lock on the module file), and then another udev
process tries to load the same module again.

So while we cannot fully get rid of the fundamental bug in user space,
we _can_ get rid of the excessive concurrent thundering herd effect.

A couple of final side notes on this all:

 - This tweak only affects the "finit_module()" system call, which gives
   the kernel a file descriptor with the module data.

   You can also just feed the module data as raw data from user space
   with "init_module()" (note the lack of 'f' at the beginning), and
   obviously for that case we do _not_ have any "exclusive read" logic.

   So if you absolutely want to do things wrong in user space, and try
   to load the same module multiple times, and error out only later when
   the kernel ends up saying "you can't load the same module name
   twice", you can still do that.

   And in fact, some distros will do exactly that, because they will
   uncompress the kernel module data in user space before feeding it to
   the kernel (mainly because they haven't started using the new kernel
   side decompression yet).

   So this is not some absolute "you can't do concurrent loads of the
   same module". It's literally just a very simple heuristic that will
   catch it early in case you try to load the exact same module file at
   the same time, and in that case avoid a potentially nasty situation.

 - There is another user of "deny_write_access()": the verity code that
   enables fs-verity on a file (the FS_IOC_ENABLE_VERITY ioctl).

   If you use fs-verity and you care about verifying the kernel modules
   (which does make sense), you should do it *before* loading said
   kernel module. That may sound obvious, but now the implementation
   basically requires it. Because if you try to do it concurrently, the
   kernel may refuse to load the module file that is being set up by the
   fs-verity code.

 - This all will obviously mean that if you insist on loading the same
   module in parallel, only one module load will succeed, and the others
   will return with an error.

   That was true before too, but what is different is that the -ETXTBSY
   error can be returned *before* the success case of another process
   fully loading and instantiating the module.

   Again, that might sound obvious, and it is indeed the whole point of
   the whole change: we are much quicker to notice the whole "you're
   already in the process of loading this module".

   So it's very much intentional, but it does mean that if you just
   spray the kernel with "finit_module()", and expect that the module is
   immediately loaded afterwards without checking the return value, you
   are doing something horribly horribly wrong.

   I'd like to say that that would never happen, but the whole _reason_
   for this commit is that udev is currently doing something horribly
   horribly wrong, so ...

Link: https://lore.kernel.org/all/ZEGopJ8VAYnE7LQ2@bombadil.infradead.org/ [1]
Link: https://lore.kernel.org/all/23bd0ce6-ef78-1cd8-1f21-0e706a00424a@suse.com/ [2]
Link: https://lore.kernel.org/lkml/ZG%2Fa+nrt4%2FAAUi5z@bombadil.infradead.org/ [3]
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-25 17:07:57 -07:00
Linus Torvalds
9db898594c Merge tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - During the acl rework we merged this cycle the generic_listxattr()
   helper had to be modified in a way that in principle it would allow
   for POSIX ACLs to be reported. At least that was the impression we
   had initially. Because before the acl rework POSIX ACLs would be
   reported if the filesystem did have POSIX ACL xattr handlers in
   sb->s_xattr. That logic changed and now we can simply check whether
   the superblock has SB_POSIXACL set and if the inode has
   inode->i_{default_}acl set report the appropriate POSIX ACL name.

   However, we didn't realize that generic_listxattr() was only ever
   used by two filesystems. Both of them don't support POSIX ACLs via
   sb->s_xattr handlers and so never reported POSIX ACLs via
   generic_listxattr() even if they raised SB_POSIXACL and did contain
   inodes which had acls set. The example here is nfs4.

   As a result, generic_listxattr() suddenly started reporting POSIX
   ACLs when it wouldn't have before. Since SB_POSIXACL implies that the
   umask isn't stripped in the VFS nfs4 can't just drop SB_POSIXACL from
   the superblock as it would also alter umask handling for them.

   So just have generic_listxattr() not report POSIX ACLs as it never
   did anyway. It's documented as such.

 - Our SB_* flags currently use a signed integer and we shift the last
   bit causing UBSAN to complain about undefined behavior. Switch to
   using unsigned. While the original patch used an explicit unsigned
   bitshift it's now pretty common to rely on the BIT() macro in a lot
   of headers nowadays. So the patch has been adjusted to use that.

 - Add Namjae as ntfs reviewer. They're already active this cycle so
   let's make it explicit right now.

* tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  ntfs: Add myself as a reviewer
  fs: don't call posix_acl_listxattr in generic_listxattr
  fs: fix undefined behavior in bit shift for SB_NOUSER
2023-05-25 11:03:58 -07:00
Linus Torvalds
50fb587e6a Merge tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth and bpf.

  Current release - regressions:

   - net: fix skb leak in __skb_tstamp_tx()

   - eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs

  Current release - new code bugs:

   - handshake:
      - fix sock->file allocation
      - fix handshake_dup() ref counting

   - bluetooth:
      - fix potential double free caused by hci_conn_unlink
      - fix UAF in hci_conn_hash_flush

  Previous releases - regressions:

   - core: fix stack overflow when LRO is disabled for virtual
     interfaces

   - tls: fix strparser rx issues

   - bpf:
      - fix many sockmap/TCP related issues
      - fix a memory leak in the LRU and LRU_PERCPU hash maps
      - init the offload table earlier

   - eth: mlx5e:
      - do as little as possible in napi poll when budget is 0
      - fix using eswitch mapping in nic mode
      - fix deadlock in tc route query code

  Previous releases - always broken:

   - udplite: fix NULL pointer dereference in __sk_mem_raise_allocated()

   - raw: fix output xfrm lookup wrt protocol

   - smc: reset connection when trying to use SMCRv2 fails

   - phy: mscc: enable VSC8501/2 RGMII RX clock

   - eth: octeontx2-pf: fix TSOv6 offload

   - eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize"

* tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
  udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated().
  net: phy: mscc: enable VSC8501/2 RGMII RX clock
  net: phy: mscc: remove unnecessary phydev locking
  net: phy: mscc: add support for VSC8501
  net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE
  net/handshake: Enable the SNI extension to work properly
  net/handshake: Unpin sock->file if a handshake is cancelled
  net/handshake: handshake_genl_notify() shouldn't ignore @flags
  net/handshake: Fix uninitialized local variable
  net/handshake: Fix handshake_dup() ref counting
  net/handshake: Remove unneeded check from handshake_dup()
  ipv6: Fix out-of-bounds access in ipv6_find_tlv()
  net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
  docs: netdev: document the existence of the mail bot
  net: fix skb leak in __skb_tstamp_tx()
  r8169: Use a raw_spinlock_t for the register locks.
  page_pool: fix inconsistency for page_pool_ring_[un]lock()
  bpf, sockmap: Test progs verifier error with latest clang
  bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
  bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
  ...
2023-05-25 10:55:26 -07:00
Linus Torvalds
eb03e31813 Merge tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:

 - Fix power_supply_get_battery_info for devices without parent devices
   resulting in NULL pointer dereference

 - Fix desktop systems reporting to run on battery once a power-supply
   device with device scope appears (e.g. a HID keyboard with a battery)

 - Ratelimit debug print about driver not providing data

 - Fix race condition related to external_power_changed in multiple
   drivers (ab8500, axp288, bq25890, sc27xx, bq27xxx)

 - Fix LED trigger switching from blinking to solid-on when charging
   finishes

 - Fix multiple races in bq27xxx battery driver

 - mt6360: handle potential ENOMEM from devm_work_autocancel

 - sbs-charger: Fix SBS_CHARGER_STATUS_CHARGE_INHIBITED bit

 - rt9467: avoid passing 0 to dev_err_probe

* tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (21 commits)
  power: supply: Fix logic checking if system is running from battery
  power: supply: mt6360: add a check of devm_work_autocancel in mt6360_charger_probe
  power: supply: sbs-charger: Fix INHIBITED bit for Status reg
  power: supply: rt9467: Fix passing zero to 'dev_err_probe'
  power: supply: Ratelimit no data debug output
  power: supply: Fix power_supply_get_battery_info() if parent is NULL
  power: supply: bq24190: Call power_supply_changed() after updating input current
  power: supply: bq25890: Call power_supply_changed() after updating input current or voltage
  power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule()
  power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize
  power: supply: bq27xxx: Ensure power_supply_changed() is called on current sign changes
  power: supply: bq27xxx: Move bq27xxx_battery_update() down
  power: supply: bq27xxx: Add cache parameter to bq27xxx_battery_current_and_status()
  power: supply: bq27xxx: Fix poll_interval handling and races on remove
  power: supply: bq27xxx: Fix I2C IRQ race on remove
  power: supply: bq27xxx: Fix bq27xxx_battery_update() race condition
  power: supply: leds: Fix blink to LED on transition
  power: supply: sc27xx: Fix external_power_changed race
  power: supply: bq25890: Fix external_power_changed race
  power: supply: axp288_fuel_gauge: Fix external_power_changed race
  ...
2023-05-25 10:26:36 -07:00
Arnd Bergmann
fd936fd8ac efi: fix missing prototype warnings
The cper.c file needs to include an extra header, and efi_zboot_entry
needs an extern declaration to avoid these 'make W=1' warnings:

drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes]
drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes]

To make this easier, move the cper specific declarations to
include/linux/cper.h.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-05-25 09:26:19 +02:00
Jakub Kicinski
0c615f1cc3 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2023-05-24

We've added 19 non-merge commits during the last 10 day(s) which contain
a total of 20 files changed, 738 insertions(+), 448 deletions(-).

The main changes are:

1) Batch of BPF sockmap fixes found when running against NGINX TCP tests,
   from John Fastabend.

2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails,
   from Anton Protopopov.

3) Init the BPF offload table earlier than just late_initcall,
   from Jakub Kicinski.

4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields,
   from Will Deacon.

5) Remove a now unsupported __fallthrough in BPF samples,
   from Andrii Nakryiko.

6) Fix a typo in pkg-config call for building sign-file,
   from Jeremy Sowden.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, sockmap: Test progs verifier error with latest clang
  bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
  bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
  bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0
  bpf, sockmap: Build helper to create connected socket pair
  bpf, sockmap: Pull socket helpers out of listen test for general use
  bpf, sockmap: Incorrectly handling copied_seq
  bpf, sockmap: Wake up polling after data copy
  bpf, sockmap: TCP data stall on recv before accept
  bpf, sockmap: Handle fin correctly
  bpf, sockmap: Improved check for empty queue
  bpf, sockmap: Reschedule is now done through backlog
  bpf, sockmap: Convert schedule_work into delayed_work
  bpf, sockmap: Pass skb ownership through read_skb
  bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
  bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
  samples/bpf: Drop unnecessary fallthrough
  bpf: netdev: init the offload table earlier
  selftests/bpf: Fix pkg-config call building sign-file
====================

Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 21:57:57 -07:00
Dragos Tatulea
1db1f21cae net/mlx5e: Use query_special_contexts cmd only once per mdev
Don't query the firmware so many times (num rqs * num wqes * wqe frags)
because it slows down linearly the interface creation time when the
product is larger. Do it only once per mdev and store the result in
mlx5e_param.

Due to helper function being called from different files, move it to
an appropriate location. Rename the function with a proper prefix and
add a small cleanup.

This fix applies only for legacy rq.

Fixes: 1b1e486883 ("net/mlx5e: Use query_special_contexts for mkeys")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-24 20:44:18 -07:00
Maximilian Heyne
335b422346 x86/pci/xen: populate MSI sysfs entries
Commit bf5e758f02 ("genirq/msi: Simplify sysfs handling") reworked the
creation of sysfs entries for MSI IRQs. The creation used to be in
msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs.
Then it moved into __msi_domain_alloc_irqs which is an implementation of
domain_alloc_irqs. However, Xen comes with the only other implementation
of domain_alloc_irqs and hence doesn't run the sysfs population code
anymore.

Commit 6c796996ee ("x86/pci/xen: Fixup fallout from the PCI/MSI
overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info
but that doesn't actually have an effect because Xen uses it's own
domain_alloc_irqs implementation.

Fix this by making use of the fallback functions for sysfs population.

Fixes: bf5e758f02 ("genirq/msi: Simplify sysfs handling")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-24 18:08:49 +02:00
Steven Rostedt (Google)
4b512860bd tracing: Rename stacktrace field to common_stacktrace
The histogram and synthetic events can use a pseudo event called
"stacktrace" that will create a stacktrace at the time of the event and
use it just like it was a normal field. We have other pseudo events such
as "common_cpu" and "common_timestamp". To stay consistent with that,
convert "stacktrace" to "common_stacktrace". As this was used in older
kernels, to keep backward compatibility, this will act just like
"common_cpu" did with "cpu". That is, "cpu" will be the same as
"common_cpu" unless the event has a "cpu" field. In which case, the
event's field is used. The same is true with "stacktrace".

Also update the documentation to reflect this change.

Link: https://lore.kernel.org/linux-trace-kernel/20230523230913.6860e28d@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-23 23:38:23 -04:00
Beau Belgrave
ff9e1632d6 tracing/user_events: Document user_event_mm one-shot list usage
During 6.4 development it became clear that the one-shot list used by
the user_event_mm's next field was confusing to others. It is not clear
how this list is protected or what the next field usage is for unless
you are familiar with the code.

Add comments into the user_event_mm struct indicating lock requirement
and usage. Also document how and why this approach was used via comments
in both user_event_enabler_update() and user_event_mm_get_all() and the
rules to properly use it.

Link: https://lkml.kernel.org/r/20230519230741.669-5-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-23 21:08:33 -04:00
Beau Belgrave
dcbd1ac266 tracing/user_events: Rename link fields for clarity
Currently most list_head fields of various structs within user_events
are simply named link. This causes folks to keep additional context in
their head when working with the code, which can be confusing.

Instead of using link, describe what the actual link is, for example:
list_del_rcu(&mm->link);

Changes into:
list_del_rcu(&mm->mms_link);

The reader now is given a hint the link is to the mms global list
instead of having to remember or spot check within the code.

Link: https://lkml.kernel.org/r/20230519230741.669-4-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-23 21:08:33 -04:00
John Fastabend
405df89dd5 bpf, sockmap: Improved check for empty queue
We noticed some rare sk_buffs were stepping past the queue when system was
under memory pressure. The general theory is to skip enqueueing
sk_buffs when its not necessary which is the normal case with a system
that is properly provisioned for the task, no memory pressure and enough
cpu assigned.

But, if we can't allocate memory due to an ENOMEM error when enqueueing
the sk_buff into the sockmap receive queue we push it onto a delayed
workqueue to retry later. When a new sk_buff is received we then check
if that queue is empty. However, there is a problem with simply checking
the queue length. When a sk_buff is being processed from the ingress queue
but not yet on the sockmap msg receive queue its possible to also recv
a sk_buff through normal path. It will check the ingress queue which is
zero and then skip ahead of the pkt being processed.

Previously we used sock lock from both contexts which made the problem
harder to hit, but not impossible.

To fix instead of popping the skb from the queue entirely we peek the
skb from the queue and do the copy there. This ensures checks to the
queue length are non-zero while skb is being processed. Then finally
when the entire skb has been copied to user space queue or another
socket we pop it off the queue. This way the queue length check allows
bypassing the queue only after the list has been completely processed.

To reproduce issue we run NGINX compliance test with sockmap running and
observe some flakes in our testing that we attributed to this issue.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-5-john.fastabend@gmail.com
2023-05-23 16:10:11 +02:00
John Fastabend
29173d07f7 bpf, sockmap: Convert schedule_work into delayed_work
Sk_buffs are fed into sockmap verdict programs either from a strparser
(when the user might want to decide how framing of skb is done by attaching
another parser program) or directly through tcp_read_sock. The
tcp_read_sock is the preferred method for performance when the BPF logic is
a stream parser.

The flow for Cilium's common use case with a stream parser is,

 tcp_read_sock()
  sk_psock_verdict_recv
    ret = bpf_prog_run_pin_on_cpu()
    sk_psock_verdict_apply(sock, skb, ret)
     // if system is under memory pressure or app is slow we may
     // need to queue skb. Do this queuing through ingress_skb and
     // then kick timer to wake up handler
     skb_queue_tail(ingress_skb, skb)
     schedule_work(work);

The work queue is wired up to sk_psock_backlog(). This will then walk the
ingress_skb skb list that holds our sk_buffs that could not be handled,
but should be OK to run at some later point. However, its possible that
the workqueue doing this work still hits an error when sending the skb.
When this happens the skbuff is requeued on a temporary 'state' struct
kept with the workqueue. This is necessary because its possible to
partially send an skbuff before hitting an error and we need to know how
and where to restart when the workqueue runs next.

Now for the trouble, we don't rekick the workqueue. This can cause a
stall where the skbuff we just cached on the state variable might never
be sent. This happens when its the last packet in a flow and no further
packets come along that would cause the system to kick the workqueue from
that side.

To fix we could do simple schedule_work(), but while under memory pressure
it makes sense to back off some instead of continue to retry repeatedly. So
instead to fix convert schedule_work to schedule_delayed_work and add
backoff logic to reschedule from backlog queue on errors. Its not obvious
though what a good backoff is so use '1'.

To test we observed some flakes whil running NGINX compliance test with
sockmap we attributed these failed test to this bug and subsequent issue.

>From on list discussion. This commit

 bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock")

was intended to address similar race, but had a couple cases it missed.
Most obvious it only accounted for receiving traffic on the local socket
so if redirecting into another socket we could still get an sk_buff stuck
here. Next it missed the case where copied=0 in the recv() handler and
then we wouldn't kick the scheduler. Also its sub-optimal to require
userspace to kick the internal mechanisms of sockmap to wake it up and
copy data to user. It results in an extra syscall and requires the app
to actual handle the EAGAIN correctly.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-3-john.fastabend@gmail.com
2023-05-23 16:09:56 +02:00
Yevgeny Kliteynik
c7dd225bc2 net/mlx5: DR, Check force-loopback RC QP capability independently from RoCE
SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB
(loopback), and FL (force-loopback) QP is preferred for performance. FL is
available when RoCE is enabled or disabled based on RoCE caps.
This patch adds reading of FL capability from HCA caps in addition to the
existing reading from RoCE caps, thus fixing the case where we didn't
have loopback enabled when RoCE was disabled.

Fixes: 7304d603a5 ("net/mlx5: DR, Add support for force-loopback QP")
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Linus Torvalds
2dd0d98d62 Merge tag 'usb-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
 "Here are some USB fixes for 6.4-rc3, as well as a driver core fix that
  resolves a memory leak that shows up in USB devices easier than other
  subsystems.

  Included in here are:

   - driver core memory leak as reported and tested by syzbot and
     developers

   - dwc3 driver fixes for reported problems

   - xhci driver fixes for reported problems

   - USB gadget driver reverts to resolve regressions

   - usbtmc driver fix for syzbot reported problem

   - thunderbolt driver fixes for reported issues

   - other small USB fixes

  All of these, except for the driver core fix, have been in linux-next
  with no reported problems. The driver core fix was tested and verified
  to solve the issue by syzbot and the original reporter"

* tag 'usb-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  driver core: class: properly reference count class_dev_iter()
  xhci: Fix incorrect tracking of free space on transfer rings
  xhci-pci: Only run d3cold avoidance quirk for s2idle
  usb-storage: fix deadlock when a scsi command timeouts more than once
  usb: dwc3: fix a test for error in dwc3_core_init()
  usb: typec: tps6598x: Fix fault at module removal
  usb: gadget: u_ether: Fix host MAC address case
  usb: typec: altmodes/displayport: fix pin_assignment_show
  Revert "usb: gadget: udc: core: Invoke usb_gadget_connect only when started"
  Revert "usb: gadget: udc: core: Prevent redundant calls to pullup"
  usb: gadget: drop superfluous ':' in doc string
  usb: dwc3: debugfs: Resume dwc3 before accessing registers
  USB: UHCI: adjust zhaoxin UHCI controllers OverCurrent bit value
  usb: dwc3: fix gadget mode suspend interrupt handler issue
  usb: dwc3: gadget: Improve dwc3_gadget_suspend() and dwc3_gadget_resume()
  USB: usbtmc: Fix direction for 0-length ioctl control messages
  thunderbolt: Clear registers properly when auto clear isn't in use
2023-05-20 10:16:38 -07:00
Linus Torvalds
98be58a6e9 Merge tag 'block-6.4-2023-05-20' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - More device quirks (Sagi, Hristo, Adrian, Daniel)
     - Controller delete race (Maurizo)
     - Multipath cleanup fix (Christoph)

 - Deny writeable mmap mapping on a readonly block device (Loic)

 - Kill unused define that got introduced by accident (Christoph)

 - Error handling fix for s390 dasd (Stefan)

 - ublk locking fix (Ming)

* tag 'block-6.4-2023-05-20' of git://git.kernel.dk/linux:
  block: remove NFL4_UFLG_MASK
  block: Deny writable memory mapping if block is read-only
  s390/dasd: fix command reject error on ESE devices
  nvme-pci: Add quirk for Teamgroup MP33 SSD
  ublk: fix AB-BA lockdep warning
  nvme: do not let the user delete a ctrl before a complete initialization
  nvme-multipath: don't call blk_mark_disk_dead in nvme_mpath_remove_disk
  nvme-pci: clamp max_hw_sectors based on DMA optimized limitation
  nvme-pci: add quirk for missing secondary temperature thresholds
  nvme-pci: add NVME_QUIRK_BOGUS_NID for HS-SSD-FUTURE 2048G
2023-05-20 08:48:04 -07:00