The following bug was triggered on a system built with
CONFIG_DEBUG_PREEMPT=y:
# echo p > /proc/sysrq-trigger
BUG: using smp_processor_id() in preemptible [00000000] code: sh/117
caller is perf_event_print_debug+0x1a/0x4c0
CPU: 3 UID: 0 PID: 117 Comm: sh Not tainted 6.11.0-rc1 #109
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x4f/0x60
check_preemption_disabled+0xc8/0xd0
perf_event_print_debug+0x1a/0x4c0
__handle_sysrq+0x140/0x180
write_sysrq_trigger+0x61/0x70
proc_reg_write+0x4e/0x70
vfs_write+0xd0/0x430
? handle_mm_fault+0xc8/0x240
ksys_write+0x9c/0xd0
do_syscall_64+0x96/0x190
entry_SYSCALL_64_after_hwframe+0x4b/0x53
This is because the commit d4b294bf84 ("perf/x86: Hybrid PMU support
for counters") took smp_processor_id() outside the irq critical section.
If a preemption occurs in perf_event_print_debug() and the task is
migrated to another cpu, we may get incorrect pmu debug information.
Move smp_processor_id() back inside the irq critical section to fix this
issue.
Fixes: d4b294bf84 ("perf/x86: Hybrid PMU support for counters")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240729220928.325449-1-lihuafei1@huawei.com
Per the example of:
!atomic_cmpxchg(&key->enabled, 0, 1)
the inverse was written as:
atomic_cmpxchg(&key->enabled, 1, 0)
except of course, that while !old is only true for old == 0, old is
true for everything except old == 0.
Fix it to read:
atomic_cmpxchg(&key->enabled, 1, 0) == 1
such that only the 1->0 transition returns true and goes on to disable
the keys.
Fixes: 83ab38ef0a ("jump_label: Fix concurrency issues in static_key_slow_dec()")
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lkml.kernel.org/r/20240731105557.GY33588@noisy.programming.kicks-ass.net
The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:
BUG: using smp_processor_id() in preemptible [00000000] code:
caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
Move it to the actual usage site which is in a non-preemptible region.
Fixes: f7d43dd206 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
Merge fixes for the int340x thermal driver handling of MSI IRQs:
- Fix MSI error path cleanup in int340x, allow it to work with a
subset of thermal MSI IRQs if some of them are not working and
make it free all MSI IRQs on module exit (Srinivas Pandruvada).
* thermal-intel:
thermal: intel: int340x: Free MSI IRQ vectors on module exit
thermal: intel: int340x: Allow limited thermal MSI support
thermal: intel: int340x: Fix kernel warning during MSI cleanup
Say there are 3 trip points A, B, C sorted in ascending temperature
order with no hysteresis. If the zone temerature is exactly equal to
B, thermal_zone_set_trips() will set the boundaries to A and C and the
hardware will not catch any crossing of B (either way) until either A
or C is crossed and the boundaries are changed.
To avoid that, use non-strict inequalities when comparing the trip
threshold to the zone temperature in thermal_zone_set_trips().
In the example above, it will cause both boundaries to be set to B,
which is desirable because an interrupt will trigger when the zone
temperature becomes different from B regardless of which way it goes.
That will allow a new interval to be set depending on the direction of
the zone temperature change.
Fixes: 893bae9223 ("thermal: trip: Make thermal_zone_set_trips() use trip thresholds")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12509184.O9o76ZdvQC@rjwysocki.net
Commit 7ba5ca32fe ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
With RME Fireface 800, this lead to a regression since
Kernels 5.14.0, causing an AB/BA deadlock competition for the
substream lock with eventual system freeze under ALSA operation:
thread 0:
* (lock A) acquire substream lock by
snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) wait for tasklet to finish by calling
tasklet_unlock_spin_wait() in
tasklet_disable_in_atomic() in
ohci_flush_iso_completions() of ohci.c
thread 1:
* (lock B) enter tasklet
* (lock A) attempt to acquire substream lock,
waiting for it to be released:
snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed() in
update_pcm_pointers() in
process_ctx_payloads() in
process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
Restore the process context work queue to prevent deadlock
AB/BA deadlock competition for ALSA substream lock of
snd_pcm_stream_lock_irq() in snd_pcm_status64()
and snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed().
revert commit 7ba5ca32fe ("ALSA: firewire-lib: operate for period
elapse event in process context")
Replace inline description to prevent future deadlock.
Cc: stable@vger.kernel.org
Fixes: 7ba5ca32fe ("ALSA: firewire-lib: operate for period elapse event in process context")
Reported-by: edmund.raile <edmund.raile@proton.me>
Closes: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/
Signed-off-by: Edmund Raile <edmund.raile@protonmail.com>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20240730195318.869840-3-edmund.raile@protonmail.com
The tx_resolution field is never read. In theory you can imagine this
field being useful for detecting whether the transmitter has the
resolution for the message you are trying to send, but I am not aware of
any hardware where this could be an issue.
Just remove.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
IR Controller could be used and updated by other processor
while kernel has been suspended.
Reinitialize IR Controller just in case while kernel is resuming.
Signed-off-by: Zelong Dong <zelong.dong@amlogic.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Pull btrfs fixes from David Sterba:
- fix regression in extent map rework when handling insertion of
overlapping compressed extent
- fix unexpected file length when appending to a file using direct io
and buffer not faulted in
- in zoned mode, fix accounting of unusable space when flipping
read-only block group back to read-write
- fix page locking when COWing an inline range, assertion failure found
by syzbot
- fix calculation of space info in debugging print
- tree-checker, add validation of data reference item
- fix a few -Wmaybe-uninitialized build warnings
* tag 'for-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: initialize location to fix -Wmaybe-uninitialized in btrfs_lookup_dentry()
btrfs: fix corruption after buffer fault in during direct IO append write
btrfs: zoned: fix zone_unusable accounting on making block group read-write again
btrfs: do not subtract delalloc from avail bytes
btrfs: make cow_file_range_inline() honor locked_page on error
btrfs: fix corrupt read due to bad offset of a compressed extent map
btrfs: tree-checker: validate dref root and objectid
Pull perf tools fixes from Namhyung Kim:
"Some more build fixes and a random crash fix:
- Fix cross-build by setting pkg-config env according to the arch
- Fix static build for missing library dependencies
- Fix Segfault when callchain has no symbols"
* tag 'perf-tools-fixes-for-v6.11-2024-07-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf docs: Document cross compilation
perf: build: Link lib 'zstd' for static build
perf: build: Link lib 'lzma' for static build
perf: build: Only link libebl.a for old libdw
perf: build: Set Python configuration for cross compilation
perf: build: Setup PKG_CONFIG_LIBDIR for cross compilation
perf tool: fix dereferencing NULL al->maps
Tony Nguyen says:
====================
ice: fix AF_XDP ZC timeout and concurrency issues
Maciej Fijalkowski says:
Changes included in this patchset address an issue that customer has
been facing when AF_XDP ZC Tx sockets were used in combination with flow
control and regular Tx traffic.
After executing:
ethtool --set-priv-flags $dev link-down-on-close on
ethtool -A $dev rx on tx on
launching multiple ZC Tx sockets on $dev + pinging remote interface (so
that regular Tx traffic is present) and then going through down/up of
$dev, Tx timeout occurred and then most of the time ice driver was unable
to recover from that state.
These patches combined together solve the described above issue on
customer side. Main focus here is to forbid producing Tx descriptors when
either carrier is not yet initialized or process of bringing interface
down has already started.
v1: https://lore.kernel.org/netdev/20240708221416.625850-1-anthony.l.nguyen@intel.com/
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: xsk: fix txq interrupt mapping
ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog
ice: improve updating ice_{t,r}x_ring::xsk_pool
ice: toggle netif_carrier when setting up XSK pool
ice: modify error handling when setting XSK pool in ndo_bpf
ice: replace synchronize_rcu with synchronize_net
ice: don't busy wait for Rx queue disable in ice_qp_dis()
ice: respect netif readiness in AF_XDP ZC related ndo's
====================
Link: https://patch.msgid.link/20240729200716.681496-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
for GSO packets.
The function already checks that a checksum requested with
VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
this might not hold for segs after segmentation.
Syzkaller demonstrated to reach this warning in skb_checksum_help
offset = skb_checksum_start_offset(skb);
ret = -EINVAL;
if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
By injecting a TSO packet:
WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
__ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4850 [inline]
netdev_start_xmit include/linux/netdevice.h:4864 [inline]
xmit_one net/core/dev.c:3595 [inline]
dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
__dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
packet_snd net/packet/af_packet.c:3073 [inline]
The geometry of the bad input packet at tcp_gso_segment:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536
ip_summed=3 complete_sw=0 valid=0 level=0)
Mitigate with stricter input validation.
csum_offset: for GSO packets, deduce the correct value from gso_type.
This is already done for USO. Extend it to TSO. Let UFO be:
udp[46]_ufo_fragment ignores these fields and always computes the
checksum in software.
csum_start: finding the real offset requires parsing to the transport
header. Do not add a parser, use existing segmentation parsing. Thanks
to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
Again test both TSO and USO. Do not test UFO for the above reason, and
do not test UDP tunnel offload.
GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be
CHECKSUM_NONE since commit 10154dbded ("udp: Allow GSO transmit
from devices with no checksum offload"), but then still these fields
are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no
need to test for ip_summed == CHECKSUM_PARTIAL first.
This revises an existing fix mentioned in the Fixes tag, which broke
small packets with GSO offload, as detected by kselftests.
Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
Fixes: e269d79c7d ("net: missing check virtio")
Cc: stable@vger.kernel.org
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MDIX status is not accurately reflecting the current state after the link
partner has manually altered its MDIX configuration while operating in forced
mode.
Access information about Auto mdix completion and pair selection from the
KSZ9131's Auto/MDI/MDI-X status register
Fixes: b64e6a8794 ("net: phy: micrel: Add PHY Auto/MDI/MDI-X set driver for KSZ9131")
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240725071125.13960-1-Raju.Lakkaraju@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull chrome-platform fix from Tzung-Bi Shih:
"Fix a race condition that sends multiple host commands at a time"
* tag 'chrome-platform-fixes-for-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_proto: Lock device when updating MKBP version
The mbigen interrupt chip has its per node registers located in a
contiguous region of page sized chunks. The code maps them into virtual
address space as a contiguous region and determines the address of a node
by using the node ID as index.
mbigen chip
|-----------------|------------|--------------|
mgn_node_0 mgn_node_1 ... mgn_node_i
|--------------| |--------------| |----------------------|
[0x0000, 0x0x0FFF] [0x1000, 0x1FFF] [i*0x1000, (i+1)*0x1000 - 1]
This works correctly up to 10 nodes, but then fails because the 11th's
array slot is used for the MGN_CLEAR registers.
mbigen chip
|-----------|--------|--------|---------------|--------|
mgn_node_0 mgn_node_1 ... mgn_clear_register ... mgn_node_i
|-----------------|
[0xA000, 0xAFFF]
Skip the MGN_CLEAR register space when calculating the offset for node IDs
greater than or equal to ten.
Fixes: a6c2f87b88 ("irqchip/mbigen: Implement the mbigen irq chip operation functions")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240730014400.1751530-1-zouyipeng@huawei.com
This clarifies the rules for min()/max()/clamp() type checking and makes
them a much more efficient macro expansion.
In particular, we now look at the type and range of the inputs to see
whether they work together, generating a mask of acceptable comparisons,
and then just verifying that the inputs have a shared case:
- an expression with a signed type can be used for
(1) signed comparisons
(2) unsigned comparisons if it is statically known to have a
non-negative value
- an expression with an unsigned type can be used for
(3) unsigned comparison
(4) signed comparisons if the type is smaller than 'int' and thus
the C integer promotion rules will make it signed anyway
Here rule (1) and (3) are obvious, and rule (2) is important in order to
allow obvious trivial constants to be used together with unsigned
values.
Rule (4) is not necessarily a good idea, but matches what we used to do,
and we have extant cases of this situation in the kernel. Notably with
bcachefs having an expression like
min(bch2_bucket_sectors_dirty(a), ca->mi.bucket_size)
where bch2_bucket_sectors_dirty() returns an 's64', and
'ca->mi.bucket_size' is of type 'u16'.
Technically that bcachefs comparison is clearly sensible on a C type
level, because the 'u16' will go through the normal C integer promotion,
and become 'int', and then we're comparing two signed values and
everything looks sane.
However, it's not entirely clear that a 'min(s64,u16)' operation makes a
lot of conceptual sense, and it's possible that we will remove rule (4).
After all, the _reason_ we have these complicated type checks is exactly
that the C type promotion rules are not very intuitive.
But at least for now the rule is in place for backwards compatibility.
Also note that rule (2) existed before, but is hugely relaxed by this
commit. It used to be true only for the simplest compile-time
non-negative integer constants. The new macro model will allow cases
where the compiler can trivially see that an expression is non-negative
even if it isn't necessarily a constant.
For example, the amdgpu driver does
min_t(size_t, sizeof(fru_info->serial), pia[addr] & 0x3F));
because our old 'min()' macro would see that 'pia[addr] & 0x3F' is of
type 'int' and clearly not a C constant expression, so doing a 'min()'
with a 'size_t' is a signedness violation.
Our new 'min()' macro still sees that 'pia[addr] & 0x3F' is of type
'int', but is smart enough to also see that it is clearly non-negative,
and thus would allow that case without any complaints.
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some pre-production Lunar Lake systems, there is a kernel
warning:
remove_proc_entry: removing non-empty directory 'irq/172'
WARNING: CPU: 0 PID: 501 at fs/proc/generic.c:717
remove_proc_entry+0x1b4/0x1e0
...
...
remove_proc_entry+0x1b4/0x1e0
report_bug+0x182/0x1b0
handle_bug+0x51/0xa0
exc_invalid_op+0x18/0x80
asm_exc_invalid_op+0x1b/0x20
remove_proc_entry+0x1b4/0x1e0
remove_proc_entry+0x1b4/0x1e0
unregister_irq_proc+0xf2/0x120
free_desc+0x41/0xe0
irq_domain_free_irqs+0x138/0x1c0
irq_free_descs+0x52/0x80
irq_domain_free_irqs+0x151/0x1c0
msi_domain_free_locked.part.0+0x17e/0x1c0
msi_domain_free_irqs_all_locked+0x74/0xc0
pci_msi_teardown_msi_irqs+0x50/0x60
pci_free_msi_irqs+0x12/0x40
pci_free_irq_vectors+0x58/0x70
On these systems, not all the MSI thermal vectors are valid. This causes
devm_request_threaded_irq() to fail for some vectors. As part of the
clean up on this error, pci_free_irq_vectors() is called without calling
devm_free_irq(). This causes the above warning.
Add a function proc_thermal_free_msi() to call devm_free_irq() for all
successfully registered IRQ handlers, then call pci_free_irq_vectors().
Call this function for MSI cleanup.
Fixes: 7a9a8c5faf ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake")
Reported-by: Yijun Shen <Yijun.shen@dell.com>
Tested-by: Yijun Shen <Yijun.shen@dell.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240723140228.865919-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On the off chance that clock value ends up being too high (by means
of skl_ddi_calculate_wrpll() having been called with big enough
value of crtc_state->port_clock * 1000), one possible consequence
may be that the result will not be able to fit into signed int.
Fix this issue by moving conversion of clock parameter from kHz to Hz
into the body of skl_ddi_calculate_wrpll(), as well as casting the
same parameter to u64 type while calculating the value for AFE clock.
This both mitigates the overflow problem and avoids possible erroneous
integer promotion mishaps.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 82d3543701 ("drm/i915/skl: Implementation of SKL DPLL programming")
Cc: stable@vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240729174035.25727-1-n.zhandarovich@fintech.ru
(cherry picked from commit 833cf12846)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Some arch + compiler combinations report a potentially unused variable
location in btrfs_lookup_dentry(). This is a false alert as the variable
is passed by value and always valid or there's an error. The compilers
cannot probably reason about that although btrfs_inode_by_name() is in
the same file.
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5603:9
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5674:5
m68k-gcc8/m68k-allmodconfig
mips-gcc8/mips-allmodconfig
powerpc-gcc5/powerpc-all{mod,yes}config
powerpc-gcc5/ppc64_defconfig
Initialize it to zero, this should fix the warnings and won't change the
behaviour as btrfs_inode_by_name() accepts only a root or inode item
types, otherwise returns an error.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/linux-btrfs/bd4e9928-17b3-9257-8ba7-6b7f9bbb639a@linux-m68k.org/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
iucv_sever_path() is called from process context and from bh context.
iucv->path is used as indicator whether somebody else is taking care of
severing the path (or it is already removed / never existed).
This needs to be done with atomic compare and swap, otherwise there is a
small window where iucv_sock_close() will try to work with a path that has
already been severed and freed by iucv_callback_connrej() called by
iucv_tasklet_fn().
Example:
[452744.123844] Call Trace:
[452744.123845] ([<0000001e87f03880>] 0x1e87f03880)
[452744.123966] [<00000000d593001e>] iucv_path_sever+0x96/0x138
[452744.124330] [<000003ff801ddbca>] iucv_sever_path+0xc2/0xd0 [af_iucv]
[452744.124336] [<000003ff801e01b6>] iucv_sock_close+0xa6/0x310 [af_iucv]
[452744.124341] [<000003ff801e08cc>] iucv_sock_release+0x3c/0xd0 [af_iucv]
[452744.124345] [<00000000d574794e>] __sock_release+0x5e/0xe8
[452744.124815] [<00000000d5747a0c>] sock_close+0x34/0x48
[452744.124820] [<00000000d5421642>] __fput+0xba/0x268
[452744.124826] [<00000000d51b382c>] task_work_run+0xbc/0xf0
[452744.124832] [<00000000d5145710>] do_notify_resume+0x88/0x90
[452744.124841] [<00000000d5978096>] system_call+0xe2/0x2c8
[452744.125319] Last Breaking-Event-Address:
[452744.125321] [<00000000d5930018>] iucv_path_sever+0x90/0x138
[452744.125324]
[452744.125325] Kernel panic - not syncing: Fatal exception in interrupt
Note that bh_lock_sock() is not serializing the tasklet context against
process context, because the check for sock_owned_by_user() and
corresponding handling is missing.
Ideas for a future clean-up patch:
A) Correct usage of bh_lock_sock() in tasklet context, as described in
Link: https://lore.kernel.org/netdev/1280155406.2899.407.camel@edumazet-laptop/
Re-enqueue, if needed. This may require adding return values to the
tasklet functions and thus changes to all users of iucv.
B) Change iucv tasklet into worker and use only lock_sock() in af_iucv.
Fixes: 7d316b9453 ("af_iucv: remove IUCV-pathes completely")
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20240729122818.947756-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The cros_ec_get_host_command_version_mask() function requires that the
caller must have ec_dev->lock mutex before calling it. This requirement
was not met and as a result it was possible that two commands were sent
to the device at the same time.
The problem was observed while using UART backend which doesn't use any
additional locks, unlike SPI backend which locks the controller until
response is received.
Fixes: f74c7557ed ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure")
Cc: stable@vger.kernel.org
Signed-off-by: Patryk Duda <patrykd@google.com>
Link: https://lore.kernel.org/r/20240730104425.607083-1-patrykd@google.com
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Matthieu Baerts says:
====================
mptcp: fix inconsistent backup usage
In all the MPTCP backup related tests, the backup flag was set on one
side, and the expected behaviour is to have both sides respecting this
decision. That's also the "natural" way, and what the users seem to
expect.
On the scheduler side, only the 'backup' field was checked, which is
supposed to be set only if the other peer flagged a subflow as backup.
But in various places, this flag was also set when the local host
flagged the subflow as backup, certainly to have the expected behaviour
mentioned above.
Patch 1 modifies the packet scheduler to check if the backup flag has
been set on both directions, not to change its behaviour after having
applied the following patches. That's what the default packet scheduler
should have done since the beginning in v5.7.
Patch 2 fixes the backup flag being mirrored on the MPJ+SYN+ACK by
accident since its introduction in v5.7. Instead, the received and sent
backup flags are properly distinguished in requests.
Patch 3 stops setting the received backup flag as well when sending an
MP_PRIO, something that was done since the MP_PRIO support in v5.12.
Patch 4 adds related and missing MIB counters to be able to easily check
if MP_JOIN are sent with a backup flag. Certainly because these counters
were not there, the behaviour that is fixed by patches here was not
properly verified.
Patch 5 validates the previous patch by extending the MPTCP Join
selftest.
Patch 6 fixes the backup support in signal endpoints: if a signal
endpoint had the backup flag, it was not set in the MPJ+SYN+ACK as
expected. It was only set for ongoing connections, but not future ones
as expected, since the introduction of the backup flag in endpoints in
v5.10.
Patch 7 validates the previous patch by extending the MPTCP Join
selftest as well.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (7):
mptcp: sched: check both directions for backup
mptcp: distinguish rcv vs sent backup flag in requests
mptcp: pm: only set request_bkup flag when sending MP_PRIO
mptcp: mib: count MPJ with backup flag
selftests: mptcp: join: validate backup in MPJ
mptcp: pm: fix backup support in signal endpoints
selftests: mptcp: join: check backup support in signal endp
include/trace/events/mptcp.h | 2 +-
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/options.c | 2 +-
net/mptcp/pm.c | 12 +++++
net/mptcp/pm_netlink.c | 19 ++++++-
net/mptcp/pm_userspace.c | 18 +++++++
net/mptcp/protocol.c | 10 ++--
net/mptcp/protocol.h | 4 ++
net/mptcp/subflow.c | 10 ++++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 72 ++++++++++++++++++++-----
11 files changed, 132 insertions(+), 21 deletions(-)
====================
Link: https://patch.msgid.link/20240727-upstream-net-20240727-mptcp-backup-signal-v1-0-f50b31604cf1@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Before the previous commit, 'signal' endpoints with the 'backup' flag
were ignored when sending the MP_JOIN.
The MPTCP Join selftest has then been modified to validate this case:
the "single address, backup" test, is now validating the MP_JOIN with a
backup flag as it is what we expect it to do with such name. The
previous version has been kept, but renamed to "single address, switch
to backup" to avoid confusions.
The "single address with port, backup" test is also now validating the
MPJ with a backup flag, which makes more sense than checking the switch
to backup with an MP_PRIO.
The "mpc backup both sides" test is now validating that the backup flag
is also set in MP_JOIN from and to the addresses used in the initial
subflow, using the special ID 0.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 4596a2c1b7 ("mptcp: allow creating non-backup subflows")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There was a support for signal endpoints, but only when the endpoint's
flag was changed during a connection. If an endpoint with the signal and
backup was already present, the MP_JOIN reply was not containing the
backup flag as expected.
That's confusing to have this inconsistent behaviour. On the other hand,
the infrastructure to set the backup flag in the SYN + ACK + MP_JOIN was
already there, it was just never set before. Now when requesting the
local ID from the path-manager, the backup status is also requested.
Note that when the userspace PM is used, the backup flag can be set if
the local address was already used before with a backup flag, e.g. if
the address was announced with the 'backup' flag, or a subflow was
created with the 'backup' flag.
Fixes: 4596a2c1b7 ("mptcp: allow creating non-backup subflows")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/507
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A peer can notify the other one that a subflow has to be treated as
"backup" by two different ways: either by sending a dedicated MP_PRIO
notification, or by setting the backup flag in the MP_JOIN handshake.
The selftests were previously monitoring the former, but not the latter.
This is what is now done here by looking at these new MIB counters when
validating the 'backup' cases:
MPTcpExtMPJoinSynBackupRx
MPTcpExtMPJoinSynAckBackupRx
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it will help to validate a new fix for an issue introduced by this
commit ID.
Fixes: 4596a2c1b7 ("mptcp: allow creating non-backup subflows")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Without such counters, it is difficult to easily debug issues with MPJ
not having the backup flags on production servers.
This is not strictly a fix, but it eases to validate the following
patches without requiring to take packet traces, to query ongoing
connections with Netlink with admin permissions, or to guess by looking
at the behaviour of the packet scheduler. Also, the modification is self
contained, isolated, well controlled, and the increments are done just
after others, there from the beginning. It looks then safe, and helpful
to backport this.
Fixes: 4596a2c1b7 ("mptcp: allow creating non-backup subflows")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The 'backup' flag from mptcp_subflow_context structure is supposed to be
set only when the other peer flagged a subflow as backup, not the
opposite.
Fixes: 067065422f ("mptcp: add the outgoing MP_PRIO support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When sending an MP_JOIN + SYN + ACK, it is possible to mark the subflow
as 'backup' by setting the flag with the same name. Before this patch,
the backup was set if the other peer set it in its MP_JOIN + SYN
request.
It is not correct: the backup flag should be set in the MPJ+SYN+ACK only
if the host asks for it, and not mirroring what was done by the other
peer. It is then required to have a dedicated bit for each direction,
similar to what is done in the subflow context.
Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The 'mptcp_subflow_context' structure has two items related to the
backup flags:
- 'backup': the subflow has been marked as backup by the other peer
- 'request_bkup': the backup flag has been set by the host
Before this patch, the scheduler was only looking at the 'backup' flag.
That can make sense in some cases, but it looks like that's not what we
wanted for the general use, because either the path-manager was setting
both of them when sending an MP_PRIO, or the receiver was duplicating
the 'backup' flag in the subflow request.
Note that the use of these two flags in the path-manager are going to be
fixed in the next commits, but this change here is needed not to modify
the behaviour.
Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 4670c8c3fb ("media: ipu-bridge: Fix Kconfig dependencies") changed
how IPU_BRIDGE dependencies are handled for all drivers, but the IPU6
variant was added the old way, which causes build time warnings when I2C is
turned off:
WARNING: unmet direct dependencies detected for IPU_BRIDGE
Depends on [n]: MEDIA_SUPPORT [=m] && PCI [=y] && MEDIA_PCI_SUPPORT [=y] && (ACPI [=y] || COMPILE_TEST [=y]) && I2C [=n]
Selected by [m]:
- VIDEO_INTEL_IPU6 [=m] && MEDIA_SUPPORT [=m] && PCI [=y] && MEDIA_PCI_SUPPORT [=y] && (ACPI [=y] || COMPILE_TEST [=y]) && VIDEO_DEV [=m] && X86 [=y] && X86_64 [=y] && HAS_DMA [=y]
To make it consistent with the other IPU drivers as well as avoid this
warning, change the 'select' into 'depends on'.
Fixes: c70281cc83 ("media: intel/ipu6: add Kconfig and Makefile")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[Sakari Ailus: Alternatively depend on !IPU_BRIDGE.]
Cc: stable@vger.kernel.org # for v6.10
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
For some reason I didn't see this issue on my arm64 or x86-64 builds,
but Stephen Rothwell reports that commit 2accfdb7ef ("profiling:
attempt to remove per-cpu profile flip buffer") left these static
variables around, and the powerpc build is unhappy about them:
kernel/profile.c:52:28: warning: 'cpu_profile_flip' defined but not used [-Wunused-variable]
52 | static DEFINE_PER_CPU(int, cpu_profile_flip);
| ^~~~~~~~~~~~~~~~
..
So remove these stale left-over remnants too.
Fixes: 2accfdb7ef ("profiling: attempt to remove per-cpu profile flip buffer")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jakub reports build failures when merging linux/master with net tree:
CXX test_cpp
In file included from <built-in>:454:
<command line>:2:9: error: '_GNU_SOURCE' macro redefined [-Werror,-Wmacro-redefined]
2 | #define _GNU_SOURCE
| ^
<built-in>:445:9: note: previous definition is here
445 | #define _GNU_SOURCE 1
The culprit is commit cc937dad85 ("selftests: centralize -D_GNU_SOURCE= to
CFLAGS in lib.mk") which unconditionally added -D_GNU_SOUCE to CLFAGS.
Apparently clang++ also unconditionally adds it for the C++ targets [0]
which causes a conflict. Add small change in the selftests makefile
to filter it out for test_cpp.
Not sure which tree it should go via, targeting bpf for now, but net
might be better?
0: https://stackoverflow.com/questions/11670581/why-is-gnu-source-defined-by-default-and-how-to-turn-it-off
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240725214029.1760809-1-sdf@fomichev.me