Patch series "vma_start_write_killable"", v2.
When we added the VMA lock, we made a major oversight in not adding a
killable variant. That can run us into trouble where a thread takes the
VMA lock for read (eg handling a page fault) and then goes out to lunch
for an hour (eg doing reclaim). Another thread tries to modify the VMA,
taking the mmap_lock for write, then attempts to lock the VMA for write.
That blocks on the first thread, and ensures that every other page fault
now tries to take the mmap_lock for read. Because everything's in an
uninterruptible sleep, we can't kill the task, which makes me angry.
This patchset just adds vma_start_write_killable() and converts one caller
to use it. Most users are somewhat tricky to convert, so expect follow-up
individual patches per call-site which need careful analysis to make sure
we've done proper cleanup.
This patch (of 2):
The vma can be held read-locked for a substantial period of time, eg if
memory allocation needs to go into reclaim. It's useful to be able to
send fatal signals to threads which are waiting for the write lock.
Link: https://lkml.kernel.org/r/20251110203204.1454057-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20251110203204.1454057-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Chris Li <chriscli@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4.
Currently, guard regions are not visible to users except through
/proc/$pid/pagemap, with no explicit visibility at the VMA level.
This makes the feature less useful, as it isn't entirely apparent which
VMAs may have these entries present, especially when performing actions
which walk through memory regions such as those performed by CRIU.
This series addresses this issue by introducing the VM_MAYBE_GUARD flag
which fulfils this role, updating the smaps logic to display an entry for
these.
The semantics of this flag are that a guard region MAY be present if set
(we cannot be sure, as we can't efficiently track whether an
MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if
not set the VMA definitely does NOT have any guard regions present.
It's problematic to establish this flag without further action, because
that means that VMAs with guard regions in them become non-mergeable with
adjacent VMAs for no especially good reason.
To work around this, this series also introduces the concept of 'sticky'
VMA flags - that is flags which:
a. if set in one VMA and not in another still permit those VMAs to be
merged (if otherwise compatible).
b. When they are merged, the resultant VMA must have the flag set.
The VMA logic is updated to propagate these flags correctly.
Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve
an issue with file-backed guard regions - previously these established an
anon_vma object for file-backed mappings solely to have vma_needs_copy()
correctly propagate guard region mappings to child processes.
We introduce a new flag alias VM_COPY_ON_FORK (which currently only
specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly
for this flag and to copy page tables if it is present, which resolves
this issue.
Additionally, we add the ability for allow-listed VMA flags to be
atomically writable with only mmap/VMA read locks held.
The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure
does not cause any races by being allowed to do so.
This allows us to maintain guard region installation as a read-locked
operation and not endure the overhead of obtaining a write lock here.
Finally we introduce extensive VMA userland tests to assert that the
sticky VMA logic behaves correctly as well as guard region self tests to
assert that smaps visibility is correctly implemented.
This patch (of 9):
Currently, if a user needs to determine if guard regions are present in a
range, they have to scan all VMAs (or have knowledge of which ones might
have guard regions).
Since commit 8e2f2aeb8b ("fs/proc/task_mmu: add guard region bit to
pagemap") and the related commit a516403787 ("fs/proc: extend the
PAGEMAP_SCAN ioctl to report guard regions"), users can use either
/proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this
operation at a virtual address level.
This is not ideal, and it gives no visibility at a /proc/$pid/smaps level
that guard regions exist in ranges.
This patch remedies the situation by establishing a new VMA flag,
VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is
uncertain because we cannot reasonably determine whether a
MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and
additionally VMAs may change across merge/split).
We utilise 0x800 for this flag which makes it available to 32-bit
architectures also, a flag that was previously used by VM_DENYWRITE, which
was removed in commit 8d0920bde5 ("mm: remove VM_DENYWRITE") and hasn't
bee reused yet.
We also update the smaps logic and documentation to identify these VMAs.
Another major use of this functionality is that we can use it to identify
that we ought to copy page tables on fork.
We do not actually implement usage of this flag in mm/madvise.c yet as we
need to allow some VMA flags to be applied atomically under mmap/VMA read
lock in order to avoid the need to acquire a write lock for this purpose.
Link: https://lkml.kernel.org/r/cover.1763460113.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/swapfile.c: select swap devices of default priority round
robin", v5.
Currently, on system with multiple swap devices, swap allocation will
select one swap device according to priority. The swap device with the
highest priority will be chosen to allocate firstly.
People can specify a priority from 0 to 32767 when swapon a swap device,
or the system will set it from -2 then downwards by default. Meanwhile,
on NUMA system, the swap device with node_id will be considered first on
that NUMA node of the node_id.
In the current code, an array of plist, swap_avail_heads[nid], is used to
organize swap devices on each NUMA node. For each NUMA node, there is a
plist organizing all swap devices. The 'prio' value in the plist is the
negated value of the device's priority due to plist being sorted from low
to high. The swap device owning one node_id will be promoted to the front
position on that NUMA node, then other swap devices are put in order of
their default priority.
E.g I got a system with 8 NUMA nodes, and I setup 4 zram partition as
swap devices.
Current behaviour:
their priorities will be(note that -1 is skipped):
NAME TYPE SIZE USED PRIO
/dev/zram0 partition 16G 0B -2
/dev/zram1 partition 16G 0B -3
/dev/zram2 partition 16G 0B -4
/dev/zram3 partition 16G 0B -5
And their positions in the 8 swap_avail_lists[nid] will be:
swap_avail_lists[0]: /* node 0's available swap device list */
zram0 -> zram1 -> zram2 -> zram3
prio:1 prio:3 prio:4 prio:5
swap_avali_lists[1]: /* node 1's available swap device list */
zram1 -> zram0 -> zram2 -> zram3
prio:1 prio:2 prio:4 prio:5
swap_avail_lists[2]: /* node 2's available swap device list */
zram2 -> zram0 -> zram1 -> zram3
prio:1 prio:2 prio:3 prio:5
swap_avail_lists[3]: /* node 3's available swap device list */
zram3 -> zram0 -> zram1 -> zram2
prio:1 prio:2 prio:3 prio:4
swap_avail_lists[4-7]: /* node 4,5,6,7's available swap device list */
zram0 -> zram1 -> zram2 -> zram3
prio:2 prio:3 prio:4 prio:5
The adjustment for swap device with node_id intended to decrease the
pressure of lock contention for one swap device by taking different swap
device on different node. The adjustment was introduced in commit
a2468cc9bf ("swap: choose swap device according to numa node").
However, the adjustment is a little coarse-grained. On the node, the swap
device sharing the node's id will always be selected firstly by node's
CPUs until exhausted, then next one. And on other nodes where no swap
device shares its node id, swap device with priority '-2' will be selected
firstly until exhausted, then next with priority '-3'.
This is the swapon output during the process high pressure vm-scability
test is being taken. It's clearly showing zram0 is heavily exploited
until exhausted.
===================================
[root@hp-dl385g10-03 ~]# swapon
NAME TYPE SIZE USED PRIO
/dev/zram0 partition 16G 15.7G -2
/dev/zram1 partition 16G 3.4G -3
/dev/zram2 partition 16G 3.4G -4
/dev/zram3 partition 16G 2.6G -5
The node based strategy on selecting swap device is much better then the
old way one by one selecting swap device. However it is still
unreasonable because swap devices are assumed to have similar accessing
speed if no priority is specified when swapon. It's unfair and doesn't
make sense just because one swap device is swapped on firstly, its
priority will be higher than the one swapped on later.
So in this patchset, change is made to select the swap device round robin
if default priority. In code, the plist array swap_avail_heads[nid] is
replaced with a plist swap_avail_head which reverts commit a2468cc9bf.
Meanwhile, on top of the revert, further change is taken to make any
device w/o specified priority get the same default priority '-1'. Surely,
swap device with specified priority are always put foremost, this is not
impacted. If you care about their different accessing speed, then use
'swapon -p xx' to deploy priority for your swap devices.
New behaviour:
swap_avail_list: /* one global available swap device list */
zram0 -> zram1 -> zram2 -> zram3
prio:1 prio:1 prio:1 prio:1
This is the swapon output during the process high pressure vm-scability
being taken, all is selected round robin:
=======================================
[root@hp-dl385g10-03 linux]# swapon
NAME TYPE SIZE USED PRIO
/dev/zram0 partition 16G 12.6G -1
/dev/zram1 partition 16G 12.6G -1
/dev/zram2 partition 16G 12.6G -1
/dev/zram3 partition 16G 12.6G -1
With the change, we can see about 18% efficiency promotion as below:
vm-scability test:
==================
Test with:
usemem --init-time -O -y -x -n 31 2G (4G memcg, zram as swap)
Before: After:
System time: 637.92 s 526.74 s (lower is better)
Sum Throughput: 3546.56 MB/s 4207.56 MB/s (higher is better)
Single process Throughput: 114.40 MB/s 135.72 MB/s (higher is better)
free latency: 10138455.99 us 6810119.01 us (low is better)
This patch (of 2):
This reverts commit a2468cc9bf ("swap: choose swap device according to
numa node").
After this patch, the behaviour will change back to pre-commit
a2468cc9bf. Means the priority will be set from -1 then downwards by
default, and when swapping, it will exhault swap device one by one
according to priority from high to low. This is preparation work for
later change.
[root@hp-dl385g10-03 ~]# swapon
NAME TYPE SIZE USED PRIO
/dev/zram0 partition 16G 16G -1
/dev/zram1 partition 16G 966.2M -2
/dev/zram2 partition 16G 0B -3
/dev/zram3 partition 16G 0B -4
Link: https://lkml.kernel.org/r/20251028034308.929550-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20251028034308.929550-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Suggested-by: Chris Li <chrisl@kernel.org>
Acked-by: Chris Li <chrisl@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Kairui Song <kasong@tencent.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Allow to override defaults for shemem and tmpfs at config time. This is
consistent with how transparent hugepages can be configured.
Same results can be achieved with the existing
'transparent_hugepage_shmem' and 'transparent_hugepage_tmpfs' settings in
the kernel command line, but it is more convenient to define basic
settings at config time instead of changing kernel command line later.
Defaults for shmem and tmpfs were not changed. They are remained the same
as before: 'never' for both cases. Options 'deny' and 'force' are omitted
intentionally since these are special values and supposed to be used for
emergencies or testing and are not expected to be permanent ones.
Primary motivation for adding config option is to enable policy
enforcement at build time. In large-scale production environments (Meta's
for example), the kernel configuration is often maintained centrally close
to the kernel code itself and owned by the kernel engineers, while boot
parameters are managed independently (e.g. by provisioning systems). In
such setups, the kernel build defines the supported and expected behavior
in a single place, but there is no reliable or uniform control over the
kernel command line options.
A build-time default allows kernel integrators to enforce a predictable
hugepage policy for shmem/tmpfs on a base layer, ensuring reproducible
behavior and avoiding configuration drift caused by possible boot-time
differences.
In short, primary benefit is mostly operational: it provides a way to
codify preferred policy in the kernel configuration, which is versioned,
reviewed, and tested as part of the kernel build process, rather than
depending on potentially variable boot parameters.
[d@ilvokhin.com: v2]
Link: https://lkml.kernel.org/r/aQECPpjd-fU_TC79@shell.ilvokhin.com
Link: https://lkml.kernel.org/r/aPpv8sAa2sYgNu3L@shell.ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The kernel can throttle network sockets if the memory cgroup associated
with the corresponding socket is under memory pressure. The throttling
actions include clamping the transmit window, failing to expand receive or
send buffers, aggressively prune out-of-order receive queue, FIN deferred
to a retransmitted packet and more. Let's add memcg metric to track such
throttling actions.
At the moment memcg memory pressure is defined through vmpressure and in
future it may be defined using PSI or we may add more flexible way for the
users to define memory pressure, maybe through ebpf. However the
potential throttling actions will remain the same, so this newly
introduced metric will continue to track throttling actions irrespective
of how memcg memory pressure is defined.
Link: https://lkml.kernel.org/r/20251016161035.86161-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Daniel Sedlak <daniel.sedlak@cdn77.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull gpio fixes from Bartosz Golaszewski:
- use the firmware node of the GPIO chip, not its label for software
node lookup
- fix invalid pointer access in GPIO debugfs
- drop unused functions from gpio-tb10x
- fix a regression in gpio-aggregator: restore the set_config()
callback in the driver
- correct schema $id path in ti,twl4030 DT bindings
* tag 'gpio-fixes-for-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: tb10x: Drop unused tb10x_set_bits() function
gpio: aggregator: restore the set_config operation
gpiolib: fix invalid pointer access in debugfs
gpio: swnode: don't use the swnode's name as the key for GPIO lookup
dt-bindings: gpio: ti,twl4030: Correct the schema $id path
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. It became slightly bigger than usual due
to timing issues (holidays, etc), but all changes are rather
device-specific fixes, so not really worrisome.
- ASoC Cirrus codec fixes for AMD
- Various fixes for ASoC Intel AVS, Qualcomm, SoundWire, FSL,
Mediatek, Renesas
- A few HD-audio quirks, and USB-audio regression fixes for Presonus"
* tag 'sound-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda/realtek: Enable mic on Vaio RPL
ASoC: dt-bindings: pm4125-sdw: correct number of soundwire ports
ASoC: renesas: rz-ssi: Use proper dma_buffer_pos after resume
ASoC: soc_sdw_utils: remove cs42l43 component_name
ASoC: fsl_sai: Fix sync error in consumer mode
ASoC: Fix build for sdw_utils
ALSA: hda/realtek: Fix mute led for HP Victus 15-fa1xxx (MB 8C2D)
ASoC: rt721: fix prepare clock stop failed
ALSA: usb-audio: don't log messages meant for 1810c when initializing 1824c
ASoC: mediatek: Fix double pm_runtime_disable in remove functions
ASoC: fsl_micfil: correct the endian format for DSD
ASoC: fsl_sai: fix bit order for DSD format
ASoC: Intel: avs: Use snd_codec format when initializing probe
ASoC: Intel: avs: Disable periods-elapsed work when closing PCM
ASoC: Intel: avs: Unprepare a stream when XRUN occurs
ASoC: sdw_utils: add name_prefix for rt1321 part id
ASoC: qdsp6: q6asm: do not sleep while atomic
ASoC: Intel: soc-acpi-intel-ptl-match: Remove cs42l43 match from sdw link3
ASOC: max98090/91: fix for filter configuration: AHPF removed DMIC2_HPF added
ASoC: amd: acp: Add ACP7.0 match entries for cs35l56 and cs42l43
...
The dpll.yaml spec incorrectly omitted module-name and clock-id from the
pin-get operation reply specification, even though the kernel DPLL
implementation has always included these attributes in pin-get responses
since the initial implementation.
This spec inconsistency caused issues with the C YNL code generator.
The generated dpll_pin_get_rsp structure was missing these fields.
Fix the spec by adding module-name and clock-id to the pin-attrs reply
specification to match the actual kernel behavior.
Fixes: 3badff3a25 ("dpll: spec: Add Netlink spec in YAML")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251024185512.363376-1-poros@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for reported issues.
Included in here are:
- sh-sci serial driver fixes
- 8250_dw and _mtk driver fixes
- sc16is7xx driver bugfix
- new 8250_exar device ids added
All of these have been in linux-next this past week with no reported
issues"
* tag 'tty-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_mtk: Enable baud clock and manage in runtime PM
serial: 8250_dw: handle reset control deassert error
dt-bindings: serial: sh-sci: Fix r8a78000 interrupts
serial: sc16is7xx: remove useless enable of enhanced features
serial: 8250_exar: add support for Advantech 2 port card with Device ID 0x0018
tty: serial: sh-sci: fix RSCI FIFO overrun handling
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes and new device ids for 6.18-rc3.
Included in here are:
- new option serial driver device ids added
- dt bindings fixes for numerous platforms
- xhci bugfixes for many reported regressions
- usbio dependency bugfix
- dwc3 driver fix
- raw-gadget bugfix
All of these have been in linux-next this week with no reported issues"
* tag 'usb-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: option: add Telit FN920C04 ECM compositions
USB: serial: option: add Quectel RG255C
tcpm: switch check for role_sw device with fw_node
usb/core/quirks: Add Huawei ME906S to wakeup quirk
usb: raw-gadget: do not limit transfer length
USB: serial: option: add UNISOC UIS7720
xhci: dbc: enable back DbC in resume if it was enabled before suspend
xhci: dbc: fix bogus 1024 byte prefix if ttyDBC read races with stall event
usb: xhci-pci: Fix USB2-only root hub registration
dt-bindings: usb: qcom,snps-dwc3: Fix bindings for X1E80100
usb: misc: Add x86 dependency for Intel USBIO driver
dt-bindings: usb: switch: split out ports definition
usb: dwc3: Don't call clk_bulk_disable_unprepare() twice
dt-bindings: usb: dwc3-imx8mp: dma-range is required only for imx8mp
Pull spi fixes from Mark Brown:
"A moderately large collection of device specific changes here, mostly
fixes but also including a few new quirks and device IDs. This is all
fairly routine even for the affected devices"
* tag 'spi-fix-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dt-bindings: spi-rockchip: Add RK3506 compatible
spi: intel-pci: Add support for Intel Wildcat Lake SPI serial flash
spi: intel-pci: Add support for Arrow Lake-H SPI serial flash
spi: intel: Add support for 128M component density
spi: airoha: fix reading/writing of flashes with more than one plane per lun
spi: airoha: switch back to non-dma mode in the case of error
spi: airoha: add support of dual/quad wires spi modes to exec_op() handler
spi: airoha: return an error for continuous mode dirmap creation cases
spi: amlogic: fix spifc build error
spi: cadence-quadspi: Fix pm_runtime unbalance on dma EPROBE_DEFER
spi: spi-nxp-fspi: limit the clock rate for different sample clock source selection
spi: spi-nxp-fspi: add extra delay after dll locked
spi: spi-nxp-fspi: re-config the clock rate when operation require new clock rate
spi: dw-mmio: add error handling for reset_control_deassert()
spi: rockchip-sfc: Fix DMA-API usage
spi: dt-bindings: cadence: add soc-specific compatible strings for zynqmp and versal-net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can. Slim pickings, I'm guessing people haven't
really started testing.
Current release - new code bugs:
- eth: mlx5e:
- psp: avoid 'accel' NULL pointer dereference
- skip PPHCR register query for FEC histogram if not supported
Previous releases - regressions:
- bonding: update the slave array for broadcast mode
- rtnetlink: re-allow deleting FDB entries in user namespace
- eth: dpaa2: fix the pointer passed to PTR_ALIGN on Tx path
Previous releases - always broken:
- can: drop skb on xmit if device is in listen-only mode
- gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb()
- eth: mlx5e
- RX, fix generating skb from non-linear xdp_buff if program
trims frags
- make devcom init failures non-fatal, fix races with IPSec
Misc:
- some documentation formatting 'fixes'"
* tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net/mlx5: Fix IPsec cleanup over MPV device
net/mlx5: Refactor devcom to return NULL on failure
net/mlx5e: Skip PPHCR register query if not supported by the device
net/mlx5: Add PPHCR to PCAM supported registers mask
virtio-net: zero unused hash fields
net: phy: micrel: always set shared->phydev for LAN8814
vsock: fix lock inversion in vsock_assign_transport()
ovpn: use datagram_poll_queue for socket readiness in TCP
espintcp: use datagram_poll_queue for socket readiness
net: datagram: introduce datagram_poll_queue for custom receive queues
net: bonding: fix possible peer notify event loss or dup issue
net: hsr: prevent creation of HSR device with slaves from another netns
sctp: avoid NULL dereference when chunk data buffer is missing
ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop
net: ravb: Ensure memory write completes before ringing TX doorbell
net: ravb: Enforce descriptor type ordering
net: hibmcge: select FIXED_PHY
net: dlink: use dev_kfree_skb_any instead of dev_kfree_skb
Documentation: networking: ax25: update the mailing list info.
net: gro_cells: fix lock imbalance in gro_cells_receive()
...
Pull SCSI fixes from James Bottomley:
"All driver fixes. The big change is the storvsc one to rejig the
hyper-v channel handling to be more efficient for SMP virtual
machines"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: phy: dt-bindings: Add QMP UFS PHY compatible for Kaanapali
scsi: ufs: qcom: dt-bindings: Document the Kaanapali UFS controller
scsi: libfc: Prevent integer overflow in fc_fcp_recv_data()
scsi: qla4xxx: Fix typos in comments
scsi: storvsc: Prefer returning channel with the same CPU as on the I/O issuing CPU
Pull rustfmt fixes from Miguel Ojeda:
"Rust 'rustfmt' cleanup
'rustfmt', by default, formats imports in a way that is prone to
conflicts while merging and rebasing, since in some cases it condenses
several items into the same line.
Document in our guidelines that we will handle this for the moment
with the trailing empty comment workaround and make the tree
'rustfmt'-clean again"
* tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: bitmap: fix formatting
rust: cpufreq: fix formatting
rust: alloc: employ a trailing comment to keep vertical layout
docs: rust: add section on imports formatting
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the handling of ZCR_EL2 in NV VMs
- Pick the correct translation regime when doing a PTW on the back of
a SEA
- Prevent userspace from injecting an event into a vcpu that isn't
initialised yet
- Move timer save/restore to the sysreg handling code, fixing EL2
timer access in the process
- Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug
- Fix trapping configuration when the host isn't GICv3
- Improve the detection of HCR_EL2.E2H being RES1
- Drop a spurious 'break' statement in the S1 PTW
- Don't try to access SPE when owned by EL3
Documentation updates:
- Document the failure modes of event injection
- Document that a GICv3 guest can be created on a GICv5 host with
FEAT_GCIE_LEGACY
Selftest improvements:
- Add a selftest for the effective value of HCR_EL2.AMO
- Address build warning in the timer selftest when building with
clang
- Teach irqfd selftests about non-x86 architectures
- Add missing sysregs to the set_id_regs selftest
- Fix vcpu allocation in the vgic_lpi_stress selftest
- Correctly enable interrupts in the vgic_lpi_stress selftest
x86:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test
for the bug fixed by commit 3ccbf6f470 ("KVM: x86/mmu: Return
-EAGAIN if userspace deletes/moves memslot during prefault")
- Don't try to get PMU capabilities from perf when running a CPU with
hybrid CPUs/PMUs, as perf will rightly WARN.
guest_memfd:
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a
more generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to
explicitly set said flag to initialize memory as SHARED,
irrespective of MMAP.
The behavior merged in 6.18 is that enabling mmap() implicitly
initializes memory as SHARED, which would result in an ABI
collision for x86 CoCo VMs as their memory is currently always
initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with
private memory, to enable testing such setups, i.e. to hopefully
flush out any other lurking ABI issues before 6.18 is officially
released.
- Add testcases to the guest_memfd selftest to cover guest_memfd
without MMAP, and host userspace accesses to mmap()'d private
memory"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
arm64: Revamp HCR_EL2.E2H RES1 detection
KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list
KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit
KVM: arm64: Kill leftovers of ad-hoc timer userspace access
KVM: arm64: Fix WFxT handling of nested virt
KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure
KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure
KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure
KVM: arm64: Add timer UAPI workaround to sysreg infrastructure
KVM: arm64: Make timer_set_offset() generally accessible
KVM: arm64: Replace timer context vcpu pointer with timer_id
KVM: arm64: Introduce timer_context_to_vcpu() helper
KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests
Documentation: KVM: Update GICv3 docs for GICv5 hosts
KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests
KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
KVM: arm64: selftests: Allocate vcpus with correct size
...
KVM x86 fixes for 6.18:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
bug fixed by commit 3ccbf6f470 ("KVM: x86/mmu: Return -EAGAIN if userspace
deletes/moves memslot during prefault")
- Don't try to get PMU capabbilities from perf when running a CPU with hybrid
CPUs/PMUs, as perf will rightly WARN.
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
said flag to initialize memory as SHARED, irrespective of MMAP. The
behavior merged in 6.18 is that enabling mmap() implicitly initializes
memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
as their memory is currently always initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
memory, to enable testing such setups, i.e. to hopefully flush out any
other lurking ABI issues before 6.18 is officially released.
- Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
and host userspace accesses to mmap()'d private memory.
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All changes are rather boring
device-specific fixes and quirks:
- A few fixes for missing NULL checks
- ASoC NAU8821 fixes for jack and irq handling
- Various fixes for ASoC TAS2781, IDT821034, sc8280xp, max9809x,
wcd938x, and SoundWire
- Usual HD-audio and USB-audio quirks"
* tag 'sound-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
ALSA: hda/realtek: Fix mute led for HP Omen 17-cb0xxx
ALSA: usb-audio: fix vendor quirk for Logitech H390
ALSA: usb-audio: add volume quirks for MS LifeChat LX-3000
ASoC: amd/sdw_utils: avoid NULL deref when devm_kasprintf() fails
ASoC: max98090/91: fixed max98091 ALSA widget powering up/down
ASoC: dt-bindings: Add compatible string fsl,imx-audio-tlv320
ASoC: codecs: wcd938x-sdw: remove redundant runtime pm calls
ASoC: sdw_utils: add rt1321 part id to codec_info_list
ALSA: usb-audio: Fix NULL pointer deference in try_to_register_card
ALSA: firewire: amdtp-stream: fix enum kernel-doc warnings
ALSA: usb-audio: add mixer_playback_min_mute quirk for Logitech H390
ASoC: nau8821: Avoid unnecessary blocking in IRQ handler
ASoC: nau8821: Add DMI quirk to bypass jack debounce circuit
ASoC: nau8821: Consistently clear interrupts before unmasking
ASoC: nau8821: Generalize helper to clear IRQ status
ASoC: nau8821: Cancel jdet_work before handling jack ejection
ASoC: codecs: Fix gain setting ranges for Renesas IDT821034 codec
ASoC: tas2781: Update ti,tas2781.yaml for adding tas58xx
ASoC: tas2781: Support more newly-released amplifiers tas58xx in the driver
ASoC: qcom: sc8280xp: Add support for QCS615
...
Pull i2c fixes from Wolfram Sang:
- PM cleanup after all prerequisites are merged with rc1
- usbio: missing addition after all dependencies are in
- slimpro: DT binding schema conversion
* tag 'i2c-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
dt-bindings: i2c: Convert apm,xgene-slimpro-i2c to DT schema
i2c: usbio: Add ACPI device-id for MTL-CVF devices
i2c: Remove redundant pm_runtime_mark_last_busy() calls
`rustfmt`, by default, formats imports in a way that is prone to conflicts
while merging and rebasing, since in some cases it condenses several
items into the same line.
For instance, Linus mentioned [1] that the following case:
use crate::{
fmt,
page::AsPageIter,
};
is compressed by `rustfmt` into:
use crate::{fmt, page::AsPageIter};
which is undesirable.
Similarly, `rustfmt` may put several items in the same line even if the
braces span already multiple lines, e.g.:
use kernel::{
acpi, c_str,
device::{property, Core},
of, platform,
};
The options that control the formatting behavior around imports are
generally unstable, and `rustfmt` releases do not allow to use nightly
features, unlike the compiler and other Rust tooling [2].
For the moment, we can introduce a workaround to prevent `rustfmt`
from compressing the example above -- the "trailing empty comment":
use crate::{
fmt,
page::AsPageIter, //
};
which is reminiscent of the trailing comma behavior in other formatters.
We already used empty comments for formatting purposes in the past,
e.g. in commit b9b701fce4 ("rust: clarify the language unstable features
in use").
In addition, `rustfmt` actually reformats with a vertical layout (i.e. it
does not put two items in the same line) when seeing such a comment,
i.e. it doesn't just preserve the formatting, which is good in the sense
that we can use it to easily reformat some imports, since it matches
the style we generally want to have.
A Git merge driver would help (suggested by Gary and Wedson), though
maintainers would need to set it up, the diffs would still be larger
and the formatting rules for imports would remain hard to predict.
Thus document the style that we will follow in the coding guidelines
by introducing a new section and explain how the trailing empty comment
works there too.
We discussed the issue with upstream Rust in our usual Rust <-> Rust
for Linux meeting [3], and there have also been a few other discussions
in parallel in issues [4][5] and Zulip [6]. We will see what happens,
but upstream Rust has already created a subteam of `rustfmt` to try
to overcome the bandwidth issue [7], which is a good signal, and some
organization work has already started (e.g. tracking issues). We will
continue our discussions with them about it.
Cc: Caleb Cartwright <caleb.cartwright@outlook.com>
Cc: Yacin Tmimi <yacintmimi@gmail.com>
Cc: Manish Goregaokar <manishsmail@gmail.com>
Cc: Deadbeef <ent3rm4n@gmail.com>
Cc: Cameron Steffen <cam.steffen94@gmail.com>
Cc: Jieyou Xu <jieyouxu@outlook.com>
Link: https://lore.kernel.org/all/CAHk-=wgO7S_FZUSBbngG5vtejWOpzDfTTBkVvP3_yjJmFddbzA@mail.gmail.com/ [1]
Link: https://github.com/rust-lang/rustfmt/issues/4884 [2]
Link: https://hackmd.io/iSCyY3JTTz-g8YM-nnzTTA [3]
Link: https://github.com/rust-lang/rustfmt/issues/4991 [4]
Link: https://github.com/rust-lang/rustfmt/issues/3361 [5]
Link: https://rust-lang.zulipchat.com/#narrow/channel/392734-council/topic/rustfmt.20maintenance/near/543815381 [6]
Link: https://github.com/rust-lang/team/pull/2017 [7]
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN
Current release - regressions:
- udp: do not use skb_release_head_state() before
skb_attempt_defer_free()
- gro_cells: use nested-BH locking for gro_cell
- dpll: zl3073x: increase maximum size of flash utility
Previous releases - regressions:
- core: fix lockdep splat on device unregister
- tcp: fix tcp_tso_should_defer() vs large RTT
- tls:
- don't rely on tx_work during send()
- wait for pending async decryptions if tls_strp_msg_hold fails
- can: j1939: add missing calls in NETDEV_UNREGISTER notification
handler
- eth: lan78xx: fix lost EEPROM write timeout in
lan78xx_write_raw_eeprom
Previous releases - always broken:
- ip6_tunnel: prevent perpetual tunnel growth
- dpll: zl3073x: handle missing or corrupted flash configuration
- can: m_can: fix pm_runtime and CAN state handling
- eth:
- ixgbe: fix too early devlink_free() in ixgbe_remove()
- ixgbevf: fix mailbox API compatibility
- gve: Check valid ts bit on RX descriptor before hw timestamping
- idpf: cleanup remaining SKBs in PTP flows
- r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H"
* tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
udp: do not use skb_release_head_state() before skb_attempt_defer_free()
net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
netdevsim: set the carrier when the device goes up
selftests: tls: add test for short splice due to full skmsg
selftests: net: tls: add tests for cmsg vs MSG_MORE
tls: don't rely on tx_work during send()
tls: wait for pending async decryptions if tls_strp_msg_hold fails
tls: always set record_type in tls_process_cmsg
tls: wait for async encrypt in case of error during latter iterations of sendmsg
tls: trim encrypted message to match the plaintext on short splice
tg3: prevent use of uninitialized remote_adv and local_adv variables
MAINTAINERS: new entry for IPv6 IOAM
gve: Check valid ts bit on RX descriptor before hw timestamping
net: core: fix lockdep splat on device unregister
MAINTAINERS: add myself as maintainer for b53
selftests: net: check jq command is supported
net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
tcp: fix tcp_tso_should_defer() vs large RTT
r8152: add error handling in rtl8152_driver_init
usbnet: Fix using smp_processor_id() in preemptible code warnings
...
Merge series from Le Qi <le.qi@oss.qualcomm.com>:
This patch series adds support for the QCS615 sound card:
- Updates device tree bindings for SM8250 to include QCS615.
- Adds QCS615 support in the SC8280XP ASoC driver.
Marc Kleine-Budde says:
====================
pull-request: can 2025-10-14
The first 2 paches are by Celeste Liu and target the gS_usb driver.
The first patch remove the limitation to 3 CAN interface per USB
device. The second patch adds the missing population of
net_device->dev_port.
The next 4 patches are by me and fix the m_can driver. They add a
missing pm_runtime_disable(), fix the CAN state transition back to
Error Active and fix the state after ifup and suspend/resume.
Another patch by me targets the m_can driver, too and replaces Dong
Aisheng's old email address.
The next 2 patches are by Vincent Mailhol and update the CAN
networking Documentation.
Tetsuo Handa contributes the last patch that add missing cleanup calls
in the NETDEV_UNREGISTER notification handler.
* tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
can: add Transmitter Delay Compensation (TDC) documentation
can: remove false statement about 1:1 mapping between DLC and length
can: m_can: replace Dong Aisheng's old email address
can: m_can: fix CAN state in system PM
can: m_can: m_can_chip_config(): bring up interface in correct state
can: m_can: m_can_handle_state_errors(): fix CAN state transition to Error Active
can: m_can: m_can_plat_remove(): add missing pm_runtime_disable()
can: gs_usb: gs_make_candev(): populate net_device->dev_port
can: gs_usb: increase max interface to U8_MAX
====================
Link: https://patch.msgid.link/20251014122140.990472-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>