Commit a1e40ac5b5 ("net: gso: fix udp gso fraglist segmentation after
pull from frag_list") detected invalid geometry in frag_list skbs and
redirects them from skb_segment_list to more robust skb_segment. But some
packets with modified geometry can also hit bugs in that code. We don't
know how many such cases exist. Addressing each one by one also requires
touching the complex skb_segment code, which risks introducing bugs for
other types of skbs. Instead, linearize all these packets that fail the
basic invariants on gso fraglist skbs. That is more robust.
If only part of the fraglist payload is pulled into head_skb, it will
always cause exception when splitting skbs by skb_segment. For detailed
call stack information, see below.
Valid SKB_GSO_FRAGLIST skbs
- consist of two or more segments
- the head_skb holds the protocol headers plus first gso_size
- one or more frag_list skbs hold exactly one segment
- all but the last must be gso_size
Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
modify fraglist skbs, breaking these invariants.
In extreme cases they pull one part of data into skb linear. For UDP,
this causes three payloads with lengths of (11,11,10) bytes were
pulled tail to become (12,10,10) bytes.
The skbs no longer meets the above SKB_GSO_FRAGLIST conditions because
payload was pulled into head_skb, it needs to be linearized before pass
to regular skb_segment.
skb_segment+0xcd0/0xd14
__udp_gso_segment+0x334/0x5f4
udp4_ufo_fragment+0x118/0x15c
inet_gso_segment+0x164/0x338
skb_mac_gso_segment+0xc4/0x13c
__skb_gso_segment+0xc4/0x124
validate_xmit_skb+0x9c/0x2c0
validate_xmit_skb_list+0x4c/0x80
sch_direct_xmit+0x70/0x404
__dev_queue_xmit+0x64c/0xe5c
neigh_resolve_output+0x178/0x1c4
ip_finish_output2+0x37c/0x47c
__ip_finish_output+0x194/0x240
ip_finish_output+0x20/0xf4
ip_output+0x100/0x1a0
NF_HOOK+0xc4/0x16c
ip_forward+0x314/0x32c
ip_rcv+0x90/0x118
__netif_receive_skb+0x74/0x124
process_backlog+0xe8/0x1a4
__napi_poll+0x5c/0x1f8
net_rx_action+0x154/0x314
handle_softirqs+0x154/0x4b8
[118.376811] [C201134] rxq0_pus: [name:bug&]kernel BUG at net/core/skbuff.c:4278!
[118.376829] [C201134] rxq0_pus: [name:traps&]Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[118.470774] [C201134] rxq0_pus: [name:mrdump&]Kernel Offset: 0x178cc00000 from 0xffffffc008000000
[118.470810] [C201134] rxq0_pus: [name:mrdump&]PHYS_OFFSET: 0x40000000
[118.470827] [C201134] rxq0_pus: [name:mrdump&]pstate: 60400005 (nZCv daif +PAN -UAO)
[118.470848] [C201134] rxq0_pus: [name:mrdump&]pc : [0xffffffd79598aefc] skb_segment+0xcd0/0xd14
[118.470900] [C201134] rxq0_pus: [name:mrdump&]lr : [0xffffffd79598a5e8] skb_segment+0x3bc/0xd14
[118.470928] [C201134] rxq0_pus: [name:mrdump&]sp : ffffffc008013770
Fixes: a1e40ac5b5 ("gso: fix udp gso fraglist segmentation after pull from frag_list")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The removed dai_link->platform component cause a fail which
is exposed at runtime. (ex: when a sound tool is used)
This patch re-adds the dai_link->platform component to have
a full card registered.
Before this patch:
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 1: HDMI [HDMI], device 0: HDMI snd-soc-dummy-dai-0 []
Subdevices: 1/1
Subdevice #0: subdevice #0
$ speaker-test -D plughw:1,0 -t sine
speaker-test 1.2.8
Playback device is plughw:1,0
Stream parameters are 48000Hz, S16_LE, 1 channels
Sine wave rate is 440.0000Hz
Playback open error: -22,Invalid argument
After this patch which restores the platform component:
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: HDMI [HDMI], device 0: HDMI snd-soc-dummy-dai-0 [HDMI snd-soc-dummy-dai-0]
Subdevices: 0/1
Subdevice #0: subdevice #0
-> Resolve the playback error.
Fixes: 3b0db249cf ("ASoC: ti: remove unnecessary dai_link->platform")
Signed-off-by: Yuuki NAGAO <wf.yn386@gmail.com>
Link: https://patch.msgid.link/20250531141341.81164-1-wf.yn386@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
There is no need for separate locks for single jobs and the entire
scheduler. The dma_fence context can be protected by the scheduler lock,
allowing for removing the jobs' locks. This simplifies things and
reduces the likelyhood of deadlocks etc.
Replace the jobs' locks with the mock scheduler lock.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250527101029.56491-2-phasta@kernel.org
HDAudio transfer types utilize SDxFMT for front-end (HOST) and PPLCxFMT
for back-end (LINK) side when setting up the stream. BE's
substream->runtime duplicates FE runtime so switch to using BE's
hw_params to address incorrect format values on the LINK side when FE
and BE formats differ.
The problem is introduced with commit d070002a20 ("ASoC: Intel: avs:
HDA PCM BE operations") but the code has been shuffled around since then
so direct 'Fixes:' tag does not apply.
Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20250530141025.2942936-4-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
RPM manipulation in hda_codec_probe_complete()'s error path is
superfluous and leads to RPM usage count underflow if the
build-controls operation fails.
hda_codec_probe_complete() is called in:
1) hda_codec_probe() for all non-HDMI codecs
2) in card->late_probe() for HDMI codecs
Error path for hda_codec_probe() takes care of bus' RPM already.
For 2) if late_probe() fails, ASoC performs card cleanup what
triggers hda_codec_remote() - same treatment is in 1).
Fixes: b5df2a7dca ("ASoC: codecs: Add HD-Audio codec driver")
Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20250530141025.2942936-2-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Customer is reporting a really subtle issue where we get random DMAR
faults, hangs and other nasties for kernel migration jobs when stressing
stuff like s2idle/s3/s4. The explosions seems to happen somewhere
after resuming the system with splats looking something like:
PM: suspend exit
rfkill: input handler disabled
xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0
xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1]
xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out
The likely cause appears to be a race between suspend cancelling the
worker that processes the free_job()'s, such that we still have pending
jobs to be freed after the cancel. Following from this, on resume the
pending_list will now contain at least one already complete job, but it
looks like we call drm_sched_resubmit_jobs(), which will then call
run_job() on everything still on the pending_list. But if the job was
already complete, then all the resources tied to the job, like the bb
itself, any memory that is being accessed, the iommu mappings etc. might
be long gone since those are usually tied to the fence signalling.
This scenario can be seen in ftrace when running a slightly modified
xe_pm IGT (kernel was only modified to inject artificial latency into
free_job to make the race easier to hit):
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=0, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 1:0x1, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x3
xe_exec_queue_resubmit: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
.....
xe_exec_queue_memory_cat_error: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x3, flags=0x13
So the job_run() is clearly triggered twice for the same job, even
though the first must have already signalled to completion during
suspend. We can also see a CAT error after the re-submit.
To prevent this only resubmit jobs on the pending_list that have not yet
signalled.
v2:
- Make sure to re-arm the fence callbacks with sched_start().
v3 (Matt B):
- Stop using drm_sched_resubmit_jobs(), which appears to be deprecated
and just open-code a simple loop such that we skip calling run_job()
on anything already signalled.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: William Tseng <william.tseng@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com
For preempt_fence mode VM's we're rejecting eviction of
shared bos during VM_BIND. However, since we do this in the
move() callback, we're getting an eviction failure warning from
TTM. The TTM callback intended for these things is
eviction_valuable().
However, the latter doesn't pass in the struct ttm_operation_ctx
needed to determine whether the caller needs this.
Instead, attach the needed information to the vm under the
vm->resv, until we've been able to update TTM to provide the
needed information. And add sufficient lockdep checks to prevent
misuse and races.
v2:
- Fix a copy-paste error in xe_vm_clear_validating()
v3:
- Fix kerneldoc errors.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 0af944f0e3 ("drm/xe: Reject BO eviction if BO is bound to current VM")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250528164105.234718-1-thomas.hellstrom@linux.intel.com
tgkill is a quite old syscall since kernel 2.5.75, but unfortunately glibc
doesn't support it before 2.30. Thus some systems fail to compile the
latest UserMode Linux.
Here is the compile error I encountered when I tried to compile UML in
my system shipped with glibc-2.28.
CALL scripts/checksyscalls.sh
CC arch/um/os-Linux/sigio.o
In file included from arch/um/os-Linux/sigio.c:17:
arch/um/os-Linux/sigio.c: In function ‘write_sigio_thread’:
arch/um/os-Linux/sigio.c:49:19: error: implicit declaration of function ‘tgkill’; did you mean ‘kill’? [-Werror=implicit-function-declaration]
CATCH_EINTR(r = tgkill(pid, pid, SIGIO));
^~~~~~
./arch/um/include/shared/os.h:21:48: note: in definition of macro ‘CATCH_EINTR’
#define CATCH_EINTR(expr) while ((errno = 0, ((expr) < 0)) && (errno == EINTR))
^~~~
cc1: some warnings being treated as errors
Fix it by Replacing glibc call with raw syscall.
Fixes: 33c9da5dfb ("um: Rewrite the sigio workaround based on epoll and tgkill")
Signed-off-by: Yongting Lin <linyongting@gmail.com>
Link: https://patch.msgid.link/20250527151222.40371-1-linyongting@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
arch/um is one of the last users of CONFIG_GENERIC_IOMAP, but upon
closer look it appears that the PCI host bridge does not register
any port I/O, and the absense of both custom inb/outb functions and
a PCI_IOBASE constant means that actually trying to use port I/O
results on a NULL pointer access.
Build testing with clang confirms this by warning about this exact
problem:
include/asm-generic/io.h:549:31: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
549 | val = __raw_readb(PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
Remove all the Kconfig selects that refer to legacy port I/O
and instead just build the normal MMIO path that is emulated
by the virtio PCI host.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250509084125.1488601-1-arnd@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Replace the calls to drm_atomic_get_connector_state() with
drm_atomic_get_new_connector_state() for cases which do not require
allocating the connector state, e.g. after drm_atomic_check_only() when
the intent is to only read the new connector state.
The rational is to avoid the need to handle the potential EDEADLK error
returned by the former helper, which would require restarting the entire
atomic sequence.
Acked-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-13-74c9c4a8ac0c@collabora.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Factor out the HDMI connector initialization from
drm_kunit_helper_connector_hdmi_init_funcs() into a common
__connector_hdmi_init() function, while extending its functionality to
allow setting custom (i.e. non-default) EDID data.
Introduce a macro as a wrapper over the new helper to allow dropping the
open coded EDID setup from all test cases.
The actual conversion will be handled separately; for now just apply it
to drm_kunit_helper_connector_hdmi_init() helper.
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-9-74c9c4a8ac0c@collabora.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
In preparation to support fallback to an alternative output format, e.g.
YUV420, when RGB cannot be used for any of the available color depths,
move the bpc try loop out of hdmi_compute_config() and, instead, make it
part of hdmi_compute_format(), while adding a new parameter to the
latter holding the output format to be checked and eventually set.
Since this helper now also changes hdmi.output_bpc in addition to
hdmi.output_format, highlight the extended functionality by renaming it
to hdmi_compute_format_bpc().
This improves code reusability and further extensibility, without
introducing any functional changes.
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-5-74c9c4a8ac0c@collabora.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Evaluating the requirement to use a limited RGB quantization range
involves a verification of the output format, among others, but this is
currently performed before actually computing the format, hence relying
on the old connector state.
Move the call to hdmi_is_limited_range() after hdmi_compute_config() to
ensure the verification is done on the updated output format.
Fixes: 027d435906 ("drm/connector: hdmi: Add RGB Quantization Range to the connector state")
Reviewed-by: Dmitry Baryshkov <lumag@kernel.org>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-1-74c9c4a8ac0c@collabora.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
The DE33 is a newer version of the Allwinner Display Engine IP block,
found in the H616, H618, H700 and T507 SoCs. DE2 and DE3 are already
supported by the mainline driver.
Notable features (from the H616 datasheet and implemented):
- 4096 x 2048 (4K) output support
Other features (implemented but not in this patchset):
- AFBC ARM Frame Buffer Compression support
- YUV pipeline support
The DE2 and DE3 engines have a blender register range within the
mixer engine register map, whereas the DE33 separates this out into
a separate display group, and adds a top register map.
The DE33 also appears to remove the global double buffer control
register, present in the DE2 and DE3.
Extend the mixer to support the DE33.
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Ryan Walklin <ryan@testtoast.com>
Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Link: https://lore.kernel.org/r/20250528092431.28825-7-ryan@testtoast.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>