Pull RTC updates from Alexandre Belloni:
"Not much this cycle, there are multiple small fixes.
Core:
- use boolean values with device_init_wakeup()
Drivers:
- pcf2127: add BSM support
- pcf85063: fix possible out of bounds write"
* tag 'rtc-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: pcf2127: add BSM support
rtc: Remove hpet_rtc_dropped_irq()
dt-bindings: rtc: mxc: Document fsl,imx31-rtc
rtc: stm32: Use syscon_regmap_lookup_by_phandle_args
rtc: zynqmp: Fix optional clock name property
rtc: loongson: clear TOY_MATCH0_REG in loongson_rtc_isr()
rtc: pcf85063: fix potential OOB write in PCF85063 NVMEM read
rtc: tps6594: Fix integer overflow on 32bit systems
rtc: use boolean values with device_init_wakeup()
rtc: RTC_DRV_SPEAR should not default to y when compile-testing
Pull i2c fixes from Wolfram Sang:
- add a missing Kconfig dependency for imx-lpi2c
- in the core, handle the new per-client debugfs directory during
probe/remove, not during {un}register
* tag 'i2c-for-6.14-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Fix core-managed per-client debugfs handling
i2c: imx-lpi2c: select CONFIG_I2C_SLAVE
Pull perf tools fixes from Namhyung Kim:
"An early round of random fixes in perf tools for this cycle.
perf trace:
- Fix loading of BPF program on certain clang versions
- Fix out-of-bound access in syscalls with 6 arguments
- Skip syscall enum test if landlock syscall is not available
perf annotate:
- Fix segfaults due to invalid access in disasm arrays
perf stat:
- Fix error handling in topology parsing"
* tag 'perf-tools-fixes-for-v6.14-2025-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf cpumap: Fix die and cluster IDs
perf test: Skip syscall enum test if no landlock syscall
perf trace: Fix runtime error of index out of bounds
perf annotate: Use an array for the disassembler preference
perf trace: Fix BPF loading failure (-E2BIG)
Pull audit fix from Paul Moore:
"A minor audit patch to fix an unitialized variable problem"
* tag 'audit-pr-20250130' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: Initialize lsmctx to avoid memory allocation error
Pull ACPI fixes from Rafael Wysocki:
"Add a new ACPI-related quirk for Vexia EDU ATLA 10 tablet 5V (Hans de
Goede) and fix the MADT parsing code so that CPUs with different entry
types (LAPIC and x2APIC) are initialized in the order in which they
appear in the MADT as required by the ACPI specification (Zhang Rui)"
* tag 'acpi-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/acpi: Fix LAPIC/x2APIC parsing order
ACPI: x86: Add skip i2c clients quirk for Vexia EDU ATLA 10 tablet 5V
Pull more power management updates from Rafael Wysocki:
"These are mostly fixes on top of the previously merged power
management material with the addition of some teo cpuidle governor
updates, some of which may also be regarded as fixes:
- Add missing error handling for syscore_suspend() to the hibernation
core code (Wentao Liang)
- Revert a commit that added unused macros (Andy Shevchenko)
- Synchronize the runtime PM status of devices that were runtime-
suspended before a system-wide suspend and need to be resumed
during the subsequent system-wide resume transition (Rafael
Wysocki)
- Clean up the teo cpuidle governor and make the handling of short
idle intervals in it consistent regardless of the properties of
idle states supplied by the cpuidle driver (Rafael Wysocki)
- Fix some boost-related issues in cpufreq (Lifeng Zheng)
- Fix build issues in the s3c64xx and airoha cpufreq drivers (Viresh
Kumar)
- Remove unconditional binding of schedutil governor kthreads to the
affected CPUs if the cpufreq driver indicates that updates can
happen from any CPU (Christian Loehle)"
* tag 'pm-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: core: Synchronize runtime PM status of parents and children
cpufreq: airoha: Depends on OF
PM: Revert "Add EXPORT macros for exporting PM functions"
PM: hibernate: Add error handling for syscore_suspend()
cpufreq/schedutil: Only bind threads if needed
cpufreq: ACPI: Remove set_boost in acpi_cpufreq_cpu_init()
cpufreq: CPPC: Fix wrong max_freq in policy initialization
cpufreq: Introduce a more generic way to set default per-policy boost flag
cpufreq: Fix re-boost issue after hotplugging a CPU
cpufreq: s3c64xx: Fix compilation warning
cpuidle: teo: Skip sleep length computation for low latency constraints
cpuidle: teo: Replace time_span_ns with a flag
cpuidle: teo: Simplify handling of total events count
cpuidle: teo: Skip getting the sleep length if wakeups are very frequent
cpuidle: teo: Simplify counting events used for tick management
cpuidle: teo: Clarify two code comments
cpuidle: teo: Drop local variable prev_intercept_idx
cpuidle: teo: Combine candidate state index checks against 0
cpuidle: teo: Reorder candidate state index checks
cpuidle: teo: Rearrange idle state lookup code
Merge fixes related to system sleep for 6.14-rc1:
- Add missing error handling for syscore_suspend() to the hibernation
core code (Wentao Liang).
- Revert a commit that added unused macros (Andy Shevchenko).
- Synchronize the runtime PM status of devices that were runtime-
suspended before a system-wide suspend and need to be resumed during
the subsequent syste-wide resume transition (Rafael Wysocki).
* pm-sleep:
PM: sleep: core: Synchronize runtime PM status of parents and children
PM: Revert "Add EXPORT macros for exporting PM functions"
PM: hibernate: Add error handling for syscore_suspend()
Merge updates of the teo cpuidle governor for 6.14-rc1 that clean it
up and make the handling of short idle intervals in it consistent
regardless of the properties of idle states supplied by the cpuidle
driver.
* pm-cpuidle:
cpuidle: teo: Skip sleep length computation for low latency constraints
cpuidle: teo: Replace time_span_ns with a flag
cpuidle: teo: Simplify handling of total events count
cpuidle: teo: Skip getting the sleep length if wakeups are very frequent
cpuidle: teo: Simplify counting events used for tick management
cpuidle: teo: Clarify two code comments
cpuidle: teo: Drop local variable prev_intercept_idx
cpuidle: teo: Combine candidate state index checks against 0
cpuidle: teo: Reorder candidate state index checks
cpuidle: teo: Rearrange idle state lookup code
Pull networking fixes from Jakub Kicinski:
"Including fixes from IPSec, netfilter and Bluetooth.
Nothing really stands out, but as usual there's a slight concentration
of fixes for issues added in the last two weeks before the merge
window, and driver bugs from 6.13 which tend to get discovered upon
wider distribution.
Current release - regressions:
- net: revert RTNL changes in unregister_netdevice_many_notify()
- Bluetooth: fix possible infinite recursion of btusb_reset
- eth: adjust locking in some old drivers which protect their state
with spinlocks to avoid sleeping in atomic; core protects netdev
state with a mutex now
Previous releases - regressions:
- eth:
- mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node()
- bgmac: reduce max frame size to support just 1500 bytes; the
jumbo frame support would previously cause OOB writes, but now
fails outright
- mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid
false detection of MPTCP blackholing
Previous releases - always broken:
- mptcp: handle fastopen disconnect correctly
- xfrm:
- make sure skb->sk is a full sock before accessing its fields
- fix taking a lock with preempt disabled for RT kernels
- usb: ipheth: improve safety of packet metadata parsing; prevent
potential OOB accesses
- eth: renesas: fix missing rtnl lock in suspend/resume path"
* tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
MAINTAINERS: add Neal to TCP maintainers
net: revert RTNL changes in unregister_netdevice_many_notify()
net: hsr: fix fill_frame_info() regression vs VLAN packets
doc: mptcp: sysctl: blackhole_timeout is per-netns
mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
netfilter: nf_tables: reject mismatching sum of field_len with set key length
net: sh_eth: Fix missing rtnl lock in suspend/resume path
net: ravb: Fix missing rtnl lock in suspend/resume path
selftests/net: Add test for loading devbound XDP program in generic mode
net: xdp: Disallow attaching device-bound programs in generic mode
tcp: correct handling of extreme memory squeeze
bgmac: reduce max frame size to support just MTU 1500
vsock/test: Add test for connect() retries
vsock/test: Add test for UAF due to socket unbinding
vsock/test: Introduce vsock_connect_fd()
vsock/test: Introduce vsock_bind()
vsock: Allow retrying on connect() failure
vsock: Keep the binding until socket destruction
Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection
Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming
...
Pull documentation fixes from Jonathan Corbet:
"Two fixes for footnote-related warnings that appeared with Sphinx 8.x.
We want to encourage use of newer Sphinx - they fixed a performance
problem and the docs build takes less than half the time it used to"
* tag 'docs-6.14-2' of git://git.lwn.net/linux:
docs: power: Fix footnote reference for Toshiba Satellite P10-554
Documentation: ublk: Drop Stefan Hajnoczi's message footnote
Pull s390 fixes from Alexander Gordeev:
- Architecutre-specific ftrace recursion trylock tests were removed in
favour of the generic function_graph_enter(), but s390 got missed.
Remove this test for s390 as well.
- Add ftrace_get_symaddr() for s390, which returns the symbol address
from ftrace 'ip' parameter
* tag 's390-6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/tracing: Define ftrace_get_symaddr() for s390
s390/fgraph: Fix to remove ftrace_test_recursion_trylock()
Pull more s390 updates from Alexander Gordeev:
- The rework that uncoupled physical and virtual address spaces
inadvertently prevented KASAN shadow mappings from using large pages.
Restore large page mappings for KASAN shadows
- Add decompressor routine physmem_alloc() that may fail, unlike
physmem_alloc_or_die(). This allows callers to implement fallback
paths
- Allow falling back from large pages to smaller pages (1MB or 4KB) if
the allocation of 2GB pages in the decompressor can not be fulfilled
- Add to the decompressor boot print support of "%%" format string,
width and padding hadnling, length modifiers and decimal conversion
specifiers
- Add to the decompressor message severity levels similar to kernel
ones. Support command-line options that control console output
verbosity
- Replaces boot_printk() calls with appropriate loglevel- specific
helpers such as boot_emerg(), boot_warn(), and boot_debug().
- Collect all boot messages into a ring buffer independent of the
current log level. This is particularly useful for early crash
analysis
- If 'earlyprintk' command line parameter is not specified, store
decompressor boot messages in a ring buffer to be printed later by
the kernel, once the console driver is registered
- Add 'bootdebug' command line parameter to enable printing of
decompressor debug messages when needed. That parameters allows
message suppressing and filtering
- Dump boot messages on a decompressor crash, but only if 'bootdebug'
command line parameter is enabled
- When CONFIG_PRINTK_TIME is enabled, add timestamps to boot messages
in the same format as regular printk()
- Dump physical memory tracking information on boot: online ranges,
reserved areas and vmem allocations
- Dump virtual memory layout and randomization details
- Improve decompression error reporting and dump the message ring
buffer in case the boot failed and system halted
- Add an exception handler which handles exceptions when FPU control
register is attempted to be set to an invalid value. Remove '.fixup'
section as result of this change
- Use 'A', 'O', and 'R' inline assembly format flags, which allows
recent Clang compilers to generate better FPU code
- Rework uaccess code so it reads better and generates more efficient
code
- Cleanup futex inline assembly code
- Disable KMSAN instrumention for futex inline assemblies, which
contain dereferenced user pointers. Otherwise, shadows for the user
pointers would be accessed
- PFs which are not initially configured but in standby create only a
single-function PCI domain. If they are configured later on, sibling
PFs and their child VFs will not be added to their PCI domain
breaking SR-IOV expectations.
Fix that by allowing initially configured but in standby PFs create
multi-function PCI domains
- Add '-std=gnu11' to decompressor and purgatory CFLAGS to avoid
compile errors caused by kernel's own definitions of 'bool', 'false',
and 'true' conflicting with the C23 reserved keywords
- Fix sclp subsystem failure when a sclp console is not present
- Fix misuse of non-NULL terminated strings in vmlogrdr driver
- Various other small improvements, cleanups and fixes
* tag 's390-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (53 commits)
s390/vmlogrdr: Use array instead of string initializer
s390/vmlogrdr: Use internal_name for error messages
s390/sclp: Initialize sclp subsystem via arch_cpu_finalize_init()
s390/tools: Use array instead of string initializer
s390/vmem: Fix null-pointer-arithmetic warning in vmem_map_init()
s390: Add '-std=gnu11' to decompressor and purgatory CFLAGS
s390/bitops: Use correct constraint for arch_test_bit() inline assembly
s390/pci: Fix SR-IOV for PFs initially in standby
s390/futex: Avoid KMSAN instrumention for user pointers
s390/uaccess: Rename get_put_user_noinstr_attributes to uaccess_kmsan_or_inline
s390/futex: Cleanup futex_atomic_cmpxchg_inatomic()
s390/futex: Generate futex atomic op functions
s390/uaccess: Remove INLINE_COPY_FROM_USER and INLINE_COPY_TO_USER
s390/uaccess: Use asm goto for put_user()/get_user()
s390/uaccess: Remove usage of the oac specifier
s390/uaccess: Replace EX_TABLE_UA_LOAD_MEM exception handling
s390/uaccess: Cleanup noinstr __put_user()/__get_user() inline assembly constraints
s390/uaccess: Remove __put_user_fn()/__get_user_fn() wrappers
s390/uaccess: Move put_user() / __put_user() close to put_user() asm code
s390/uaccess: Use asm goto for __mvc_kernel_nofault()
...
Pull gpio fixes from Bartosz Golaszewski:
- update gpio-sim selftests to not fail now that we no longer allow
rmdir() on configfs entries of active devices
- remove leftover code from gpio-mxc
* tag 'gpio-fixes-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
selftests: gpio: gpio-sim: Fix missing chip disablements
gpio: mxc: remove dead code after switch to DT-only
Pull vfs d_revalidate updates from Al Viro:
"Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name and
parent have their stability guaranteed by the callers;
->d_revalidate() is the major exception.
It's easy enough for callers to supply stable values for expected name
and expected parent of the dentry being validated. That kills quite a
bit of boilerplate in ->d_revalidate() instances, along with a bunch
of races where they used to access ->d_name without sufficient
precautions"
* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: fix ->rename_sem exclusion
orangefs_d_revalidate(): use stable parent inode and name passed by caller
ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
nfs: fix ->d_revalidate() UAF on ->d_name accesses
nfs{,4}_lookup_validate(): use stable parent inode passed by caller
gfs2_drevalidate(): use stable parent inode and name passed by caller
fuse_dentry_revalidate(): use stable parent inode and name passed by caller
vfat_revalidate{,_ci}(): use stable parent inode passed by caller
exfat_d_revalidate(): use stable parent inode passed by caller
fscrypt_d_revalidate(): use stable parent inode passed by caller
ceph_d_revalidate(): propagate stable name down into request encoding
ceph_d_revalidate(): use stable parent inode passed by caller
afs_d_revalidate(): use stable name and parent inode passed by caller
Pass parent directory inode and expected name to ->d_revalidate()
generic_ci_d_compare(): use shortname_storage
ext4 fast_commit: make use of name_snapshot primitives
dissolve external_name.u into separate members
make take_dentry_name_snapshot() lockless
dcache: back inline names with a struct-wrapped array of unsigned long
make sure that DNAME_INLINE_LEN is a multiple of word size
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains one Netfilter fix:
1) Reject mismatching sum of field_len with set key length which allows
to create a set without inconsistent pipapo rule width and set key
length.
* tag 'nf-25-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: reject mismatching sum of field_len with set key length
====================
Link: https://patch.msgid.link/20250130113307.2327470-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ntfs3 fixes from Konstantin Komarov:
- unify inode corruption marking and mark them as bad immediately upon
detection of an error in attribute enumeration
- folio cleanup
* tag 'ntfs3_for_6.14' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Unify inode corruption marking with _ntfs_bad_inode()
fs/ntfs3: Mark inode as bad as soon as error detected in mi_enum_attr()
ntfs3: Remove an access to page->index
Pull bcachefs fixes from Kent Overstreet:
- second half of a fix for a bug that'd been causing oopses on
filesystems using snapshots with memory pressure (key cache fills for
snaphots btrees are tricky)
- build fix for strange compiler configurations that double stack frame
size
- "journal stuck timeout" now takes into account device latency: this
fixes some spurious warnings, and the main remaining source of SRCU
lock hold time warnings (I'm no longer seeing this in my CI, so any
users still seeing this should definitely ping me)
- fix for slow/hanging unmounts (" Improve journal pin flushing")
- some more tracepoint fixes/improvements, to chase down the "rebalance
isn't making progress" issues
* tag 'bcachefs-2025-01-29' of git://evilpiepirate.org/bcachefs:
bcachefs: Improve trace_move_extent_finish
bcachefs: Fix trace_copygc
bcachefs: Journal writes are now IOPRIO_CLASS_RT
bcachefs: Improve journal pin flushing
bcachefs: fix bch2_btree_node_flags
bcachefs: rebalance, copygc enabled are runtime opts
bcachefs: Improve decompression error messages
bcachefs: bset_blacklisted_journal_seq is now AUTOFIX
bcachefs: "Journal stuck" timeout now takes into account device latency
bcachefs: Reduce stack frame size of __bch2_str_hash_check_key()
bcachefs: Fix btree_trans_peek_key_cache()
Matthieu Baerts says:
====================
mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
Here are two small fixes for issues introduced in v6.12.
- Patch 1: reset the mpc_drop mark for other SYN retransmits, to only
consider an MPTCP blackhole when the first SYN retransmitted without
the MPTCP options is accepted, as initially intended.
- Patch 2: also mention in the doc that the blackhole_timeout sysctl
knob is per-netns, like all the others.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================
Link: https://patch.msgid.link/20250129-net-mptcp-blackhole-fix-v1-0-afe88e5a6d2c@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
All other sysctl entries mention it, and it is a per-namespace sysctl.
So mention it as well.
Fixes: 27069e7cb3 ("mptcp: disable active MPTCP in case of blackhole")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The Fixes commit mentioned this:
> An MPTCP firewall blackhole can be detected if the following SYN
> retransmission after a fallback to "plain" TCP is accepted.
But in fact, this blackhole was detected if any following SYN
retransmissions after a fallback to TCP was accepted.
That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp'
to 0, and 'mpc_drop' will never be reset to 0 after.
This is an issue, because some not so unusual situations might cause the
kernel to detect a false-positive blackhole, e.g. a client trying to
connect to a server while the network is not ready yet, causing a few
SYN retransmissions, before reaching the end server.
Fixes: 27069e7cb3 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The field length description provides the length of each separated key
field in the concatenation, each field gets rounded up to 32-bits to
calculate the pipapo rule width from pipapo_init(). The set key length
provides the total size of the key aligned to 32-bits.
Register-based arithmetics still allows for combining mismatching set
key length and field length description, eg. set key length 10 and field
description [ 5, 4 ] leading to pipapo width of 12.
Cc: stable@vger.kernel.org
Fixes: 3ce67e3793 ("netfilter: nf_tables: do not allow mismatch field size and set key length")
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix the suspend/resume path by ensuring the rtnl lock is held where
required. Calls to sh_eth_close, sh_eth_open and wol operations must be
performed under the rtnl lock to prevent conflicts with ongoing ndo
operations.
Fixes: b71af04676 ("sh_eth: add more PM methods")
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btusb: mediatek: Add locks for usb_driver_claim_interface()
- L2CAP: accept zero as a special value for MTU auto-selection
- btusb: Fix possible infinite recursion of btusb_reset
- Add ABI doc for sysfs reset
- btnxpuart: Fix glitches seen in dual A2DP streaming
* tag 'for-net-2025-01-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection
Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming
Bluetooth: Add ABI doc for sysfs reset
Bluetooth: Fix possible infinite recursion of btusb_reset
Bluetooth: btusb: mediatek: Add locks for usb_driver_claim_interface()
====================
Link: https://patch.msgid.link/20250129210057.1318963-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Testing with iperf3 using the "pasta" protocol splicer has revealed
a problem in the way tcp handles window advertising in extreme memory
squeeze situations.
Under memory pressure, a socket endpoint may temporarily advertise
a zero-sized window, but this is not stored as part of the socket data.
The reasoning behind this is that it is considered a temporary setting
which shouldn't influence any further calculations.
However, if we happen to stall at an unfortunate value of the current
window size, the algorithm selecting a new value will consistently fail
to advertise a non-zero window once we have freed up enough memory.
This means that this side's notion of the current window size is
different from the one last advertised to the peer, causing the latter
to not send any data to resolve the sitution.
The problem occurs on the iperf3 server side, and the socket in question
is a completely regular socket with the default settings for the
fedora40 kernel. We do not use SO_PEEK or SO_RCVBUF on the socket.
The following excerpt of a logging session, with own comments added,
shows more in detail what is happening:
// tcp_v4_rcv(->)
// tcp_rcv_established(->)
[5201<->39222]: ==== Activating log @ net/ipv4/tcp_input.c/tcp_data_queue()/5257 ====
[5201<->39222]: tcp_data_queue(->)
[5201<->39222]: DROPPING skb [265600160..265665640], reason: SKB_DROP_REASON_PROTO_MEM
[rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]
[copied_seq 259909392->260034360 (124968), unread 5565800, qlen 85, ofoq 0]
[OFO queue: gap: 65480, len: 0]
[5201<->39222]: tcp_data_queue(<-)
[5201<->39222]: __tcp_transmit_skb(->)
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
[5201<->39222]: tcp_select_window(->)
[5201<->39222]: (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) ? --> TRUE
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
returning 0
[5201<->39222]: tcp_select_window(<-)
[5201<->39222]: ADVERTISING WIN 0, ACK_SEQ: 265600160
[5201<->39222]: [__tcp_transmit_skb(<-)
[5201<->39222]: tcp_rcv_established(<-)
[5201<->39222]: tcp_v4_rcv(<-)
// Receive queue is at 85 buffers and we are out of memory.
// We drop the incoming buffer, although it is in sequence, and decide
// to send an advertisement with a window of zero.
// We don't update tp->rcv_wnd and tp->rcv_wup accordingly, which means
// we unconditionally shrink the window.
[5201<->39222]: tcp_recvmsg_locked(->)
[5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160
[5201<->39222]: [new_win = 0, win_now = 131184, 2 * win_now = 262368]
[5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0]
[5201<->39222]: NOT calling tcp_send_ack()
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
[5201<->39222]: __tcp_cleanup_rbuf(<-)
[rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]
[copied_seq 260040464->260040464 (0), unread 5559696, qlen 85, ofoq 0]
returning 6104 bytes
[5201<->39222]: tcp_recvmsg_locked(<-)
// After each read, the algorithm for calculating the new receive
// window in __tcp_cleanup_rbuf() finds it is too small to advertise
// or to update tp->rcv_wnd.
// Meanwhile, the peer thinks the window is zero, and will not send
// any more data to trigger an update from the interrupt mode side.
[5201<->39222]: tcp_recvmsg_locked(->)
[5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160
[5201<->39222]: [new_win = 262144, win_now = 131184, 2 * win_now = 262368]
[5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0]
[5201<->39222]: NOT calling tcp_send_ack()
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
[5201<->39222]: __tcp_cleanup_rbuf(<-)
[rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]
[copied_seq 260099840->260171536 (71696), unread 5428624, qlen 83, ofoq 0]
returning 131072 bytes
[5201<->39222]: tcp_recvmsg_locked(<-)
// The above pattern repeats again and again, since nothing changes
// between the reads.
[...]
[5201<->39222]: tcp_recvmsg_locked(->)
[5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160
[5201<->39222]: [new_win = 262144, win_now = 131184, 2 * win_now = 262368]
[5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0]
[5201<->39222]: NOT calling tcp_send_ack()
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
[5201<->39222]: __tcp_cleanup_rbuf(<-)
[rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]
[copied_seq 265600160->265600160 (0), unread 0, qlen 0, ofoq 0]
returning 54672 bytes
[5201<->39222]: tcp_recvmsg_locked(<-)
// The receive queue is empty, but no new advertisement has been sent.
// The peer still thinks the receive window is zero, and sends nothing.
// We have ended up in a deadlock situation.
Note that well behaved endpoints will send win0 probes, so the problem
will not occur.
Furthermore, we have observed that in these situations this side may
send out an updated 'th->ack_seq´ which is not stored in tp->rcv_wup
as it should be. Backing ack_seq seems to be harmless, but is of
course still wrong from a protocol viewpoint.
We fix this by updating the socket state correctly when a packet has
been dropped because of memory exhaustion and we have to advertize
a zero window.
Further testing shows that the connection recovers neatly from the
squeeze situation, and traffic can continue indefinitely.
Fixes: e2142825c1 ("net: tcp: send zero-window ACK when no memory")
Cc: Menglong Dong <menglong8.dong@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250127231304.1465565-1-jmaloy@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bgmac allocates new replacement buffer before handling each received
frame. Allocating & DMA-preparing 9724 B each time consumes a lot of CPU
time. Ideally bgmac should just respect currently set MTU but it isn't
the case right now. For now just revert back to the old limited frame
size.
This change bumps NAT masquerade speed by ~95%.
Since commit 8218f62c9c ("mm: page_frag: use initial zero offset for
page_frag_alloc_align()"), the bgmac driver fails to open its network
interface successfully and runs out of memory in the following call
stack:
bgmac_open
-> bgmac_dma_init
-> bgmac_dma_rx_skb_for_slot
-> netdev_alloc_frag
BGMAC_RX_ALLOC_SIZE = 10048 and PAGE_FRAG_CACHE_MAX_SIZE = 32768.
Eventually we land into __page_frag_alloc_align() with the following
parameters across multiple successive calls:
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=0
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=10048
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=20096
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=30144
So in that case we do indeed have offset + fragsz (40192) > size (32768)
and so we would eventually return NULL. Reverting to the older 1500
bytes MTU allows the network driver to be usable again.
Fixes: 8c7da63978 ("bgmac: configure MTU and add support for frames beyond 8192 byte size")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
[florian: expand commit message about recent commits]
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250127175159.1788246-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Preserve sockets bindings; this includes both resulting from an explicit
bind() and those implicitly bound through autobind during connect().
Prevents socket unbinding during a transport reassignment, which fixes a
use-after-free:
1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
2. transport->release() calls vsock_remove_bound() without checking if
sk was bound and moved to bound list (refcnt=1)
3. vsock_bind() assumes sk is in unbound list and before
__vsock_insert_bound(vsock_bound_sockets()) calls
__vsock_remove_bound() which does:
list_del_init(&vsk->bound_table); // nop
sock_put(&vsk->sk); // refcnt=0
BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730
Read of size 4 at addr ffff88816b46a74c by task a.out/2057
dump_stack_lvl+0x68/0x90
print_report+0x174/0x4f6
kasan_report+0xb9/0x190
__vsock_bind+0x62e/0x730
vsock_bind+0x97/0xe0
__sys_bind+0x154/0x1f0
__x64_sys_bind+0x6e/0xb0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Allocated by task 2057:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x10/0x30
__kasan_slab_alloc+0x85/0x90
kmem_cache_alloc_noprof+0x131/0x450
sk_prot_alloc+0x5b/0x220
sk_alloc+0x2c/0x870
__vsock_create.constprop.0+0x2e/0xb60
vsock_create+0xe4/0x420
__sock_create+0x241/0x650
__sys_socket+0xf2/0x1a0
__x64_sys_socket+0x6e/0xb0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 2057:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x37/0x60
__kasan_slab_free+0x4b/0x70
kmem_cache_free+0x1a1/0x590
__sk_destruct+0x388/0x5a0
__vsock_bind+0x5e1/0x730
vsock_bind+0x97/0xe0
__sys_bind+0x154/0x1f0
__x64_sys_bind+0x6e/0xb0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
RIP: 0010:refcount_warn_saturate+0xce/0x150
__vsock_bind+0x66d/0x730
vsock_bind+0x97/0xe0
__sys_bind+0x154/0x1f0
__x64_sys_bind+0x6e/0xb0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
refcount_t: underflow; use-after-free.
WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
RIP: 0010:refcount_warn_saturate+0xee/0x150
vsock_remove_bound+0x187/0x1e0
__vsock_release+0x383/0x4a0
vsock_release+0x90/0x120
__sock_release+0xa3/0x250
sock_close+0x14/0x20
__fput+0x359/0xa80
task_work_run+0x107/0x1d0
do_exit+0x847/0x2560
do_group_exit+0xb8/0x250
__x64_sys_exit_group+0x3a/0x50
x64_sys_call+0xfec/0x14f0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-1-1cf57065b770@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When audit is enabled in a kernel build, and there are no LSMs active
that support LSM labeling, it is possible that local variable lsmctx
in the AUDIT_SIGNAL_INFO handler in audit_receive_msg() could be used
before it is properly initialize. Then kmalloc() will try to allocate
a large amount of memory with the uninitialized length.
This patch corrects this problem by initializing the lsmctx to a safe
value when it is declared, which avoid errors like:
WARNING: CPU: 2 PID: 443 at mm/page_alloc.c:4727 __alloc_pages_noprof
...
ra: 9000000003059644 ___kmalloc_large_node+0x84/0x1e0
ERA: 900000000304d588 __alloc_pages_noprof+0x4c8/0x1040
CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD: 00000004 (PPLV0 +PIE -PWE)
EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
CPU: 2 UID: 0 PID: 443 Comm: auditd Not tainted 6.13.0-rc1+ #1899
...
Call Trace:
[<9000000002def6a8>] show_stack+0x30/0x148
[<9000000002debf58>] dump_stack_lvl+0x68/0xa0
[<9000000002e0fe18>] __warn+0x80/0x108
[<900000000407486c>] report_bug+0x154/0x268
[<90000000040ad468>] do_bp+0x2a8/0x320
[<9000000002dedda0>] handle_bp+0x120/0x1c0
[<900000000304d588>] __alloc_pages_noprof+0x4c8/0x1040
[<9000000003059640>] ___kmalloc_large_node+0x80/0x1e0
[<9000000003061504>] __kmalloc_noprof+0x2c4/0x380
[<9000000002f0f7ac>] audit_receive_msg+0x764/0x1530
[<9000000002f1065c>] audit_receive+0xe4/0x1c0
[<9000000003e5abe8>] netlink_unicast+0x340/0x450
[<9000000003e5ae9c>] netlink_sendmsg+0x1a4/0x4a0
[<9000000003d9ffd0>] __sock_sendmsg+0x48/0x58
[<9000000003da32f0>] __sys_sendto+0x100/0x170
[<9000000003da3374>] sys_sendto+0x14/0x28
[<90000000040ad574>] do_syscall+0x94/0x138
[<9000000002ded318>] handle_syscall+0xb8/0x158
Fixes: 6fba89813c ("lsm: ensure the correct LSM context releaser")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
[PM: resolved excessive line length in the backtrace]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull soundwire updates from Vinod Koul:
- SoundWire multi lane support to use multiple lanes if supported
- Stream handling of DEPREPARED state
- AMD wake register programming for power off mode
* tag 'soundwire-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: amd: clear wake enable register for power off mode
soundwire: generic_bandwidth_allocation: count the bandwidth of active streams only
SoundWire: pass stream to compute_params()
soundwire: generic_bandwidth_allocation: add lane in sdw_group_params
soundwire: generic_bandwidth_allocation: select data lane
soundwire: generic_bandwidth_allocation: check required freq accurately
soundwire: generic_bandwidth_allocation: correct clk_freq check in sdw_select_row_col
Soundwire: generic_bandwidth_allocation: set frame shape on fly
Soundwire: stream: program BUSCLOCK_SCALE
Soundwire: add sdw_slave_get_scale_index helper
soundwire: generic_bandwidth_allocation: skip DEPREPARED streams
soundwire: stream: set DEPREPARED state earlier
soundwire: add lane_used_bandwidth in struct sdw_bus
soundwire: mipi_disco: read lane mapping properties from ACPI
soundwire: add lane field in sdw_port_runtime
soundwire: bus: Move irq mapping cleanup into devres
Pull dmaengine updates from Vinod Koul:
"A bunch of new device support and updates to few drivers, biggest of
them amd ones.
New support:
- TI J722S CSI BCDMA controller support
- Intel idxd Panther Lake family platforms
- Allwinner F1C100s suniv DMA
- Qualcomm QCS615, QCS8300, SM8750, SA8775P GPI dma controller support
- AMD ae4dma controller support and reorganisation of amd driver
Updates:
- Channel page support for Nvidia Tegra210 adma driver
- Freescale support for S32G based platforms
- Yamilfy atmel dma bindings"
* tag 'dmaengine-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (45 commits)
dmaengine: idxd: Enable Function Level Reset (FLR) for halt
dmaengine: idxd: Refactor halt handler
dmaengine: idxd: Add idxd_device_config_save() and idxd_device_config_restore() helpers
dmaengine: idxd: Binding and unbinding IDXD device and driver
dmaengine: idxd: Add idxd_pci_probe_alloc() helper
dt-bindings: dma: atmel: Convert to json schema
dt-bindings: dma: st-stm32-dmamux: Add description for dma-cell values
dmaengine: qcom: gpi: Add GPI immediate DMA support for SPI protocol
dt-bindings: dma: adi,axi-dmac: deprecate adi,channels node
dt-bindings: dma: adi,axi-dmac: convert to yaml schema
dmaengine: mv_xor: switch to for_each_child_of_node_scoped()
dmaengine: bcm2835-dma: Prevent suspend if DMA channel is busy
dmaengine: tegra210-adma: Support channel page
dt-bindings: dma: Support channel page to nvidia,tegra210-adma
dmaengine: ti: k3-udma: Add support for J722S CSI BCDMA
dt-bindings: dma: ti: k3-bcdma: Add J722S CSI BCDMA
dmaengine: ti: edma: fix OF node reference leaks in edma_driver
dmaengine: ti: edma: make the loop condition simpler in edma_probe()
dmaengine: fsl-edma: read/write multiple registers in cyclic transactions
dmaengine: fsl-edma: add support for S32G based platforms
...
One of the possible ways to enable the input MTU auto-selection for L2CAP
connections is supposed to be through passing a special "0" value for it
as a socket option. Commit [1] added one of those into avdtp. However, it
simply wouldn't work because the kernel still treats the specified value
as invalid and denies the setting attempt. Recorded BlueZ logs include the
following:
bluetoothd[496]: profiles/audio/avdtp.c:l2cap_connect() setsockopt(L2CAP_OPTIONS): Invalid argument (22)
[1]: ae5be371a9
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4b6e228e29 ("Bluetooth: Auto tune if input MTU is set to 0")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes a regression caused by previous commit for fixing truncated
ACL data, which is causing some intermittent glitches when running two
A2DP streams.
serdev_device_write_buf() is the root cause of the glitch, which is
reverted, and the TX work will continue to write until the queue is empty.
This change fixes both issues. No A2DP streaming glitches or truncated
ACL data issue observed.
Fixes: 8023dd2204 ("Bluetooth: btnxpuart: Fix driver sending truncated data")
Fixes: 689ca16e52 ("Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets")
Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The functionality was implemented in commit 0f8a001374 ("Bluetooth:
Allow reset via sysfs")
Fixes: 0f8a001374 ("Bluetooth: Allow reset via sysfs")
Signed-off-by: Hsin-chen Chuang <chharry@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The function enters infinite recursion if the HCI device doesn't support
GPIO reset: btusb_reset -> hdev->reset -> vendor_reset -> btusb_reset...
btusb_reset shouldn't call hdev->reset after commit f07d478090
("Bluetooth: Get rid of cmd_timeout and use the reset callback")
Fixes: f07d478090 ("Bluetooth: Get rid of cmd_timeout and use the reset callback")
Signed-off-by: Hsin-chen Chuang <chharry@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The documentation for usb_driver_claim_interface() says that "the
device lock" is needed when the function is called from places other
than probe(). This appears to be the lock for the USB interface
device. The Mediatek btusb code gets called via this path:
Workqueue: hci0 hci_power_on [bluetooth]
Call trace:
usb_driver_claim_interface
btusb_mtk_claim_iso_intf
btusb_mtk_setup
hci_dev_open_sync
hci_power_on
process_scheduled_works
worker_thread
kthread
With the above call trace the device lock hasn't been claimed. Claim
it.
Without this fix, we'd sometimes see the error "Failed to claim iso
interface". Sometimes we'd even see worse errors, like a NULL pointer
dereference (where `intf->dev.driver` was NULL) with a trace like:
Call trace:
usb_suspend_both
usb_runtime_suspend
__rpm_callback
rpm_suspend
pm_runtime_work
process_scheduled_works
Both errors appear to be fixed with the proper locking.
Fixes: ceac1cb025 ("Bluetooth: btusb: mediatek: add ISO data transmission functions")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pull regulator fixes from Mark Brown:
"A couple of fixes that have come in during the merge window: one that
operates the TPS6287x devices more within the design spec and can
prevent current surges when changing voltages and another more trivial
one for error message formatting"
* tag 'regulator-fix-v6.14-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Add missing newline character
regulator: TPS6287X: Use min/max uV to get VRANGE