Commit Graph

1200115 Commits

Author SHA1 Message Date
Jane Jian
4a37c55b85 drm/amd/smu: use AverageGfxclkFrequency* to replace previous GFX Curr Clock
Report current GFX clock also from average clock value as the original
CurrClock data is not valid/accurate any more as per FW team

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 16:17:17 -04:00
Mario Limonciello
c01aebeef3 drm/amd: Fix an error handling mistake in psp_sw_init()
If the second call to amdgpu_bo_create_kernel() fails, the memory
allocated from the first call should be cleared.  If the third call
fails, the memory from the second call should be cleared.

Fixes: b95b539168 ("drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 16:16:57 -04:00
Victor Lu
9beb223f2a drm/amdgpu: Fix infinite loop in gfxhub_v1_2_xcc_gart_enable (v2)
An instance of for_each_inst() was not changed to match its new
behaviour and is causing a loop.

v2: remove tmp_mask variable

Fixes: b579ea632f ("drm/amdgpu: Modify for_each_inst macro")
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 16:15:57 -04:00
Jonathan Kim
602816c3ee drm/amdkfd: fix trap handling work around for debugging
Update the list of devices that require the cwsr trap handling
workaround for debugging use cases.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Ruili Ji <ruili.ji@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 16:13:58 -04:00
Dave Airlie
28801cc859 Merge tag 'amd-drm-fixes-6.5-2023-07-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.5-2023-07-20:

amdgpu:
- More PCIe DPM fixes for Intel platforms
- DCN3.0.1 fixes
- Virtual display timer fix
- Async flip fix
- SMU13 clock reporting fixes
- Add missing PSP firmware declaration
- DP MST fix
- DCN3.1.x fixes
- Slab out of bounds fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720133456.7826-1-alexander.deucher@amd.com
2023-07-21 12:16:47 +10:00
Dave Airlie
534a7915c6 Merge tag 'drm-intel-fixes-2023-07-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Add sentinel to xehp_oa_b_counters [perf] (Andrzej Hajda)
- Revert "drm/i915: use localized __diag_ignore_all() instead of per file" (Jani Nikula)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZLjuwhLhwab5B7RY@tursulin-desk
2023-07-21 12:15:09 +10:00
Dave Airlie
f4f19c03cf Merge tag 'drm-misc-fixes-2023-07-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Memory leak fixes in drm/client, memory access/leak fixes for
accel/qaic, another leak fix in dma-buf and three nouveau fixes around
hotplugging.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fmj5nok7zggux2lcpdtls2iknweba54wfc6o4zxq6i6s3dgi2r@7z3eawwhyhen
2023-07-21 12:14:05 +10:00
Arnd Bergmann
78e9b217d7 accel/habanalabs: add more debugfs stub helpers
Two functions got added with normal prototypes for debugfs, but not
alternative when building without it:

drivers/accel/habanalabs/common/device.c: In function 'hl_device_init':
drivers/accel/habanalabs/common/device.c:2177:14: error: implicit declaration of function 'hl_debugfs_device_init'; did you mean 'hl_debugfs_init'? [-Werror=implicit-function-declaration]
drivers/accel/habanalabs/common/device.c:2305:9: error: implicit declaration of function 'hl_debugfs_device_fini'; did you mean 'hl_debugfs_remove_file'? [-Werror=implicit-function-declaration]

Add stubs for these as well.

Fixes: 3b9abb4fa6 ("accel/habanalabs: expose debugfs files later")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Acked-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20230609120636.3969045-1-arnd@kernel.org
2023-07-20 12:51:59 +02:00
Ben Skeggs
ea293f823a drm/nouveau/kms/nv50-: init hpd_irq_lock for PIOR DP
Fixes OOPS on boards with ANX9805 DP encoders.

Cc: stable@vger.kernel.org # 6.4+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230719044051.6975-3-skeggsb@gmail.com
2023-07-19 11:08:47 +02:00
Ben Skeggs
2b5d1c29f6 drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts
Fixes crash on boards with ANX9805 TMDS/DP encoders.

Cc: stable@vger.kernel.org # 6.4+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230719044051.6975-2-skeggsb@gmail.com
2023-07-19 11:08:47 +02:00
Ben Skeggs
752a281032 drm/nouveau/i2c: fix number of aux event slots
This was completely bogus before, using maximum DCB device index rather
than maximum AUX ID to size the buffer that stores event refcounts.

*Pretty* unlikely to have been an actual problem on most configurations,
that is, unless you've got one of the rare boards that have off-chip DP.

There, it'll likely crash.

Cc: stable@vger.kernel.org # 6.4+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230719044051.6975-1-skeggsb@gmail.com
2023-07-19 11:08:47 +02:00
Guchun Chen
b13d3e9c6b drm/amdgpu: use a macro to define no xcp partition case
~0 as no xcp partition is used in several places, so improve its
definition by a macro for code consistency.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:42:54 -04:00
Guchun Chen
8e78127143 drm/amdgpu/vm: use the same xcp_id from root PD
Other PDs/PTs allocation should just use the same xcp_id as that
stored in root PD.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:42:47 -04:00
Guchun Chen
8ecee4cbc7 drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_create
Recent code set xcp_id stored from file private data when opening
device to amdgpu bo for accounting memory usage etc, but not all
VMs are attached to this fpriv structure like the vm cases in
amdgpu_mes_self_test, otherwise, KASAN will complain below out
of bound access. And more importantly, VM code should not touch
fpriv structure, so drop fpriv code handling from amdgpu_vm_pt.

[   77.292314] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.293845] Read of size 4 at addr ffff888102c48a48 by task modprobe/1069
[   77.294146] Call Trace:
[   77.294178]  <TASK>
[   77.294208]  dump_stack_lvl+0x49/0x63
[   77.294260]  print_report+0x16f/0x4a6
[   77.294307]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.295979]  ? kasan_complete_mode_report_info+0x3c/0x200
[   77.296057]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.297556]  kasan_report+0xb4/0x130
[   77.297609]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.299202]  __asan_load4+0x6f/0x90
[   77.299272]  amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.300796]  ? amdgpu_init+0x6e/0x1000 [amdgpu]
[   77.302222]  ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu]
[   77.303721]  ? preempt_count_sub+0x18/0xc0
[   77.303786]  amdgpu_vm_init+0x39e/0x870 [amdgpu]
[   77.305186]  ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu]
[   77.306683]  ? kasan_set_track+0x25/0x30
[   77.306737]  ? kasan_save_alloc_info+0x1b/0x30
[   77.306795]  ? __kasan_kmalloc+0x87/0xa0
[   77.306852]  amdgpu_mes_self_test+0x169/0x620 [amdgpu]

v2: without specifying xcp partition for PD/PT bo, the xcp id is -1.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686
Fixes: 3ebfd221c1 ("drm/amdkfd: Store xcp partition id to amdgpu bo")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:42:34 -04:00
Guchun Chen
dcaa32e1f5 drm/amdgpu: Allocate root PD on correct partition
file_priv needs to be setup firstly, otherwise, root PD
will always be allocated on partition 0, even if opening
the device from other partitions.

Fixes: 3ebfd221c1 ("drm/amdkfd: Store xcp partition id to amdgpu bo")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:42:23 -04:00
Nicholas Kazlauskas
2387ccf43e drm/amd/display: Keep PHY active for DP displays on DCN31
[Why & How]
Port of a change that went into DCN314 to keep the PHY enabled
when we have a connected and active DP display.

The PHY can hang if PHY refclk is disabled inadvertently.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Josip Pavic <josip.pavic@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:41:23 -04:00
Daniel Miess
2a9482e559 drm/amd/display: Prevent vtotal from being set to 0
[Why]
In dcn314 DML the destination pipe vtotal was being set
to the crtc adjustment vtotal_min value even in cases
where that value is 0.

[How]
Only set vtotal to the crtc adjustment vtotal_min value
in cases where the value is non-zero.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Daniel Miess <daniel.miess@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:40:56 -04:00
Zhikai Zhai
a460beefe7 drm/amd/display: Disable MPC split by default on special asic
[WHY]
All of pipes will be used when the MPC split enable on the dcn
which just has 2 pipes. Then MPO enter will trigger the minimal
transition which need programe dcn from 2 pipes MPC split to 2
pipes MPO. This action will cause lag if happen frequently.

[HOW]
Disable the MPC split for the platform which dcn resource is limited

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Zhikai Zhai <zhikai.zhai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:39:44 -04:00
Taimur Hassan
5a25cefc09 drm/amd/display: check TG is non-null before checking if enabled
[Why & How]
If there is no TG allocation we can dereference a NULL pointer when
checking if the TG is enabled.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Taimur Hassan <syed.hassan@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:33:47 -04:00
Wayne Lin
4f6d9e38c4 drm/amd/display: Add polling method to handle MST reply packet
[Why]
Specific TBT4 dock doesn't send out short HPD to notify source
that IRQ event DOWN_REP_MSG_RDY is set. Which violates the spec
and cause source can't send out streams to mst sinks.

[How]
To cover this misbehavior, add an additional polling method to detect
DOWN_REP_MSG_RDY is set. HPD driven handling method is still kept.
Just hook up our handler to drm mgr->cbs->poll_hpd_irq().

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jerry Zuo <jerry.zuo@amd.com>
Acked-by: Alan Liu <haoping.liu@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:32:37 -04:00
Srinivasan Shanmugam
87279fdf5e drm/amd/display: Clean up errors & warnings in amdgpu_dm.c
Fix the following errors & warnings reported by checkpatch:

ERROR: space required before the open brace '{'
ERROR: space required before the open parenthesis '('
ERROR: that open brace { should be on the previous line
ERROR: space prohibited before that ',' (ctx:WxW)
ERROR: else should follow close brace '}'
ERROR: open brace '{' following function definitions go on the next line
ERROR: code indent should use tabs where possible

WARNING: braces {} are not necessary for single statement blocks
WARNING: void function return statements are not generally useful
WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:32:32 -04:00
Candice Li
b9c2213cdf drm/amdgpu: Allow the initramfs generator to include psp_13_0_6_ta
Allow the initramfs generator to automatically include psp_13_0_6_ta
firmware to initramfs.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:23:11 -04:00
Alex Deucher
068c8bb10f drm/amdgpu/pm: make mclk consistent for smu 13.0.7
Use current uclk to be consistent with other dGPUs.

Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-07-18 14:22:49 -04:00
Alex Deucher
a4eb118241 drm/amdgpu/pm: make gfxclock consistent for sienna cichlid
Use average gfxclock for consistency with other dGPUs.

Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-07-18 14:22:12 -04:00
Simon Ser
1ca67aba8d drm/amd/display: only accept async flips for fast updates
Up until now, amdgpu was silently degrading to vsync when
user-space requested an async flip but the hardware didn't support
it.

The hardware doesn't support immediate flips when the update changes
the FB pitch, the DCC state, the rotation, enables or disables CRTCs
or planes, etc. This is reflected in the dm_crtc_state.update_type
field: UPDATE_TYPE_FAST means that immediate flip is supported.

Silently degrading async flips to vsync is not the expected behavior
from a uAPI point-of-view. Xorg expects async flips to fail if
unsupported, to be able to fall back to a blit. i915 already behaves
this way.

This patch aligns amdgpu with uAPI expectations and returns a failure
when an async flip is not possible.

Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-07-18 14:21:02 -04:00
Guchun Chen
b42ae87a7b drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel
In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.

do {
	xrandr --output Virtual --rotate inverted
	xrandr --output Virtual --rotate right
	xrandr --output Virtual --rotate left
	xrandr --output Virtual --rotate normal
} while (1);

NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable

It's caused by a stuck in lock dependency in such scenario on different
CPUs.

CPU1                                             CPU2
drm_crtc_vblank_off                              hrtimer_interrupt
    grab event_lock (irq disabled)                   __hrtimer_run_queues
        grab vbl_lock/vblank_time_block                  amdgpu_vkms_vblank_simulate
            amdgpu_vkms_disable_vblank                       drm_handle_vblank
                hrtimer_cancel                                         grab dev->event_lock

So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.

So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.

v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well

v3: drop warn printing (Christian)

v4: drop superfluous check of blank->enabled in timer function, as it's
guaranteed in drm_handle_vblank (Christian)

Fixes: 84ec374bd5 ("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable@vger.kernel.org
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:18:05 -04:00
Aurabindo Pillai
d1792509e1 drm/amd/display: add DCN301 specific logic for OTG programming
[Why&How]
DCN301 does not have FAMS hence the workaround needed on other DCN3x
variants related to OTG min/max selector programming is not applicable for it.
Hence isolate it and have it use the old sequence without workaround.

Fixes: 1598fc5764 ("drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+")
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:17:35 -04:00
Aurabindo Pillai
2ed5a4c461 drm/amd/display: export some optc function for reuse
[Why&How]
Make a few functions non static so that they can be reused for other
asic. This is in preparation for separating out OTG programming sequence
for DCN301

Fixes: 1598fc5764 ("drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+")
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:17:11 -04:00
Mario Limonciello
60a2dae490 drm/amd: Use amdgpu_device_pcie_dynamic_switching_supported() for SMU7
SMU7 does a check if the dGPU is inserted into a Rocket Lake system,
to turn off DPM.  Extend this check to all systems that have problems
with dynamic switching by using the
amdgpu_device_pcie_dynamic_switching_supported() helper.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 14:16:38 -04:00
Jani Nikula
2c27770a7b Revert "drm/i915: use localized __diag_ignore_all() instead of per file"
This reverts commit 88e9664434.

__diag_ignore_all() only works for GCC 8 or later.

-Woverride-init (from -Wextra, enabled in i915 Makefile) combined with
CONFIG_WERROR=y or W=e breaks the build for older GCC.

With i386_defconfig and x86_64_defconfig enabling CONFIG_WERROR=y by
default, we really need to roll back the change.

An alternative would be to disable -Woverride-init in the Makefile for
GCC <8, but the revert seems like the safest bet now.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8768
Reported-by: John Garry <john.g.garry@oracle.com>
References: https://lore.kernel.org/r/ad2601c0-84bb-c574-3702-a83ff8faf98c@oracle.com
References: https://lore.kernel.org/r/87wmzezns4.fsf@intel.com
Fixes: 88e9664434 ("drm/i915: use localized __diag_ignore_all() instead of per file")
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Tested-by: John Garry <john.g.garry@oracle.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711110214.25093-1-jani.nikula@intel.com
(cherry picked from commit 290d161045)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2023-07-17 13:39:04 +01:00
Andrzej Hajda
785b3f667b drm/i915/perf: add sentinel to xehp_oa_b_counters
Arrays passed to reg_in_range_table should end with empty record.

The patch solves KASAN detected bug with signature:
BUG: KASAN: global-out-of-bounds in xehp_is_valid_b_counter_addr+0x2c7/0x350 [i915]
Read of size 4 at addr ffffffffa1555d90 by task perf/1518

CPU: 4 PID: 1518 Comm: perf Tainted: G U 6.4.0-kasan_438-g3303d06107f3+ #1
Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPFWI1.R00.3223.D80.2305311348 05/31/2023
Call Trace:
<TASK>
...
xehp_is_valid_b_counter_addr+0x2c7/0x350 [i915]

Fixes: 0fa9349dda ("drm/i915/perf: complete programming whitelisting for XEHPSDV")
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711153410.1224997-1-andrzej.hajda@intel.com
(cherry picked from commit 2f42c5afb3)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2023-07-17 13:39:01 +01:00
Linus Torvalds
fdf0eaf114 Linux 6.5-rc2 v6.5-rc2 2023-07-16 15:10:37 -07:00
Linus Torvalds
5b8d6e8539 Merge tag 'xtensa-20230716' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fixes from Max Filippov:

 - fix interaction between unaligned exception handler and load/store
   exception handler

 - fix parsing ISS network interface specification string

 - add comment about etherdev freeing to ISS network driver

* tag 'xtensa-20230716' of https://github.com/jcmvbkbc/linux-xtensa:
  xtensa: fix unaligned and load/store configuration interaction
  xtensa: ISS: fix call to split_if_spec
  xtensa: ISS: add comment about etherdev freeing
2023-07-16 14:12:49 -07:00
Linus Torvalds
1667e630c2 Merge tag 'perf_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:

 - Fix a lockdep warning when the event given is the first one, no event
   group exists yet but the code still goes and iterates over event
   siblings

* tag 'perf_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
2023-07-16 13:46:08 -07:00
Linus Torvalds
8a3e4a6484 Merge tag 'objtool_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Borislav Petkov:

 - Mark copy_iovec_from_user() __noclone in order to prevent gcc from
   doing an inter-procedural optimization and confuse objtool

 - Initialize struct elf fully to avoid build failures

* tag 'objtool_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  iov_iter: Mark copy_iovec_from_user() noclone
  objtool: initialize all of struct elf
2023-07-16 13:34:29 -07:00
Linus Torvalds
f61a89ca11 Merge tag 'sched_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:

 - Remove a cgroup from under a polling process properly

 - Fix the idle sibling selection

* tag 'sched_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/psi: use kernfs polling functions for PSI trigger polling
  sched/fair: Use recent_used_cpu to test p->cpus_ptr
2023-07-16 13:22:08 -07:00
Linus Torvalds
ede950b019 Merge tag 'pinctrl-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
 "I'm mostly on vacation but what would vacation be without a few
  critical fixes so people can use their gaming laptops when hiding away
  from the sun (or rain)?

   - Fix a really annoying interrupt storm in the AMD driver affecting
     Asus TUF gaming notebooks

   - Fix device tree parsing in the Renesas driver"

* tag 'pinctrl-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: amd: Unify debounce handling into amd_pinconf_set()
  pinctrl: amd: Drop pull up select configuration
  pinctrl: amd: Use amd_pinconf_set() for all config options
  pinctrl: amd: Only use special debounce behavior for GPIO 0
  pinctrl: renesas: rzg2l: Handle non-unique subnode names
  pinctrl: renesas: rzv2m: Handle non-unique subnode names
2023-07-16 12:55:31 -07:00
Linus Torvalds
fe756ad021 Merge tag '6.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - Two reconnect fixes: important fix to address inFlight count to leak
   (which can leak credits), and fix for better handling a deleted share

 - DFS fix

 - SMB1 cleanup fix

 - deferred close fix

* tag '6.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix mid leak during reconnection after timeout threshold
  cifs: is_network_name_deleted should return a bool
  smb: client: fix missed ses refcounting
  smb: client: Fix -Wstringop-overflow issues
  cifs: if deferred close is disabled then close files immediately
2023-07-16 12:49:05 -07:00
Linus Torvalds
20edcec23f Merge tag 'powerpc-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Fix Speculation_Store_Bypass reporting in /proc/self/status on
   Power10

 - Fix HPT with 4K pages since recent changes by implementing pmd_same()

 - Fix 64-bit native_hpte_remove() to be irq-safe

Thanks to Aneesh Kumar K.V, Nageswara R Sastry, and Russell Currey.

* tag 'powerpc-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm/book3s64/hash/4k: Add pmd_same callback for 4K page size
  powerpc/64e: Fix obtool warnings in exceptions-64e.S
  powerpc/security: Fix Speculation_Store_Bypass reporting on Power10
  powerpc/64s: Fix native_hpte_remove() to be irq-safe
2023-07-16 12:28:04 -07:00
Linus Torvalds
6eede0686f Merge tag 'hardening-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:

 - Remove LTO-only suffixes from promoted global function symbols
   (Yonghong Song)

 - Remove unused .text..refcount section from vmlinux.lds.h (Petr Pavlu)

 - Add missing __always_inline to sparc __arch_xchg() (Arnd Bergmann)

 - Claim maintainership of string routines

* tag 'hardening-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  sparc: mark __arch_xchg() as __always_inline
  MAINTAINERS: Foolishly claim maintainership of string routines
  kallsyms: strip LTO-only suffixes from promoted global functions
  vmlinux.lds.h: Remove a reference to no longer used sections .text..refcount
2023-07-16 12:18:18 -07:00
Linus Torvalds
4b4eef57e6 Merge tag 'probes-fixes-v6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probe fixes from Masami Hiramatsu:

 - fprobe: Add a comment why fprobe will be skipped if another kprobe is
   running in fprobe_kprobe_handler().

 - probe-events: Fix some issues related to fetch-arguments:

    - Fix double counting of the string length for user-string and
      symstr. This will require longer buffer in the array case.

    - Fix not to count error code (minus value) for the total used
      length in array argument. This makes the total used length
      shorter.

    - Fix to update dynamic used data size counter only if fetcharg uses
      the dynamic size data. This may mis-count the used dynamic data
      size and corrupt data.

    - Revert "tracing: Add "(fault)" name injection to kernel probes"
      because that did not work correctly with a bug, and we agreed the
      current '(fault)' output (instead of '"(fault)"' like a string)
      explains what happened more clearly.

    - Fix to record 0-length (means fault access) data_loc data in fetch
      function itself, instead of store_trace_args(). If we record an
      array of string, this will fix to save fault access data on each
      entry of the array correctly.

* tag 'probes-fixes-v6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails
  Revert "tracing: Add "(fault)" name injection to kernel probes"
  tracing/probes: Fix to update dynamic data counter if fetcharg uses it
  tracing/probes: Fix not to count error code to total length
  tracing/probes: Fix to avoid double count of the string length on the array
  fprobes: Add a comment why fprobe_kprobe_handler exits if kprobe is running
2023-07-16 12:13:51 -07:00
Linus Torvalds
831fe284d8 Merge tag 'spi-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A couple of fairly minor driver specific fixes here, plus a bunch of
  maintainership and admin updates. Nothing too remarkable"

* tag 'spi-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  mailmap: add entry for Jonas Gorski
  MAINTAINERS: add myself for spi-bcm63xx
  spi: s3c64xx: clear loopback bit after loopback test
  spi: bcm63xx: fix max prepend length
  MAINTAINERS: Add myself as a maintainer for Microchip SPI
2023-07-15 08:51:02 -07:00
Linus Torvalds
393ea78172 Merge tag 'regmap-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
 "One fix for an out of bounds access in the interupt code here"

* tag 'regmap-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap-irq: Fix out-of-bounds access when allocating config buffers
2023-07-15 08:46:09 -07:00
Linus Torvalds
82678ab2a4 Merge tag 'iommu-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:

 - Fix a regression causing a crash on sysfs access of iommu-group
   specific files

 - Fix signedness bug in SVA code

* tag 'iommu-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/sva: Fix signedness bug in iommu_sva_alloc_pasid()
  iommu: Fix crash during syfs iommu_groups/N/type
2023-07-15 08:40:00 -07:00
Ville Syrjälä
05abb3be91 dma-buf/dma-resv: Stop leaking on krealloc() failure
Currently dma_resv_get_fences() will leak the previously
allocated array if the fence iteration got restarted and
the krealloc_array() fails.

Free the old array by hand, and make sure we still clear
the returned *fences so the caller won't end up accessing
freed memory. Some (but not all) of the callers of
dma_resv_get_fences() seem to still trawl through the
array even when dma_resv_get_fences() failed. And let's
zero out *num_fences as well for good measure.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Christian König <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Fixes: d3c80698c9 ("dma-buf: use new iterator in dma_resv_get_fences v3")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20230713194745.1751-1-ville.syrjala@linux.intel.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2023-07-15 13:57:30 +02:00
Linus Torvalds
b6e6cc1f78 Merge tag 'x86_urgent_for_6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CFI fixes from Peter Zijlstra:
 "Fix kCFI/FineIBT weaknesses

  The primary bug Alyssa noticed was that with FineIBT enabled function
  prologues have a spurious ENDBR instruction:

    __cfi_foo:
	endbr64
	subl	$hash, %r10d
	jz	1f
	ud2
	nop
    1:
    foo:
	endbr64 <--- *sadface*

  This means that any indirect call that fails to target the __cfi
  symbol and instead targets (the regular old) foo+0, will succeed due
  to that second ENDBR.

  Fixing this led to the discovery of a single indirect call that was
  still doing this: ret_from_fork(). Since that's an assembly stub the
  compiler would not generate the proper kCFI indirect call magic and it
  would not get patched.

  Brian came up with the most comprehensive fix -- convert the thing to
  C with only a very thin asm wrapper. This ensures the kernel thread
  boostrap is a proper kCFI call.

  While discussing all this, Kees noted that kCFI hashes could/should be
  poisoned to seal all functions whose address is never taken, further
  limiting the valid kCFI targets -- much like we already do for IBT.

  So what was a 'simple' observation and fix cascaded into a bunch of
  inter-related CFI infrastructure fixes"

* tag 'x86_urgent_for_6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cfi: Only define poison_cfi() if CONFIG_X86_KERNEL_IBT=y
  x86/fineibt: Poison ENDBR at +0
  x86: Rewrite ret_from_fork() in C
  x86/32: Remove schedule_tail_wrapper()
  x86/cfi: Extend ENDBR sealing to kCFI
  x86/alternative: Rename apply_ibt_endbr()
  x86/cfi: Extend {JMP,CAKK}_NOSPEC comment
2023-07-14 20:19:25 -07:00
Linus Torvalds
be522ac7cd Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "This is a bunch of small driver fixes and a larger rework of zone disk
  handling (which reaches into blk and nvme).

  The aacraid array-bounds fix is now critical since the security people
  turned on -Werror for some build tests, which now fail without it"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: storvsc: Handle SRB status value 0x30
  scsi: block: Improve checks in blk_revalidate_disk_zones()
  scsi: block: virtio_blk: Set zone limits before revalidating zones
  scsi: block: nullblk: Set zone limits before revalidating zones
  scsi: nvme: zns: Set zone limits before revalidating zones
  scsi: sd_zbc: Set zone limits before revalidating zones
  scsi: ufs: core: Add support for qTimestamp attribute
  scsi: aacraid: Avoid -Warray-bounds warning
  scsi: ufs: ufs-mediatek: Add dependency for RESET_CONTROLLER
  scsi: ufs: core: Update contact email for monitor sysfs nodes
  scsi: scsi_debug: Remove dead code
  scsi: qla2xxx: Use vmalloc_array() and vcalloc()
  scsi: fnic: Use vmalloc_array() and vcalloc()
  scsi: qla2xxx: Fix error code in qla2x00_start_sp()
  scsi: qla2xxx: Silence a static checker warning
  scsi: lpfc: Fix a possible data race in lpfc_unregister_fcf_rescan()
2023-07-14 19:57:29 -07:00
Linus Torvalds
b3bd86a049 Merge tag 'block-6.5-2023-07-14' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - Don't require quirk to use duplicate namespace identifiers
        (Christoph, Sagi)
      - One more BOGUS_NID quirk (Pankaj)
      - IO timeout and error hanlding fixes for PCI (Keith)
      - Enhanced metadata format mask fix (Ankit)
      - Association race condition fix for fibre channel (Michael)
      - Correct debugfs error checks (Minjie)
      - Use PAGE_SECTORS_SHIFT where needed (Damien)
      - Reduce kernel logs for legacy nguid attribute (Keith)
      - Use correct dma direction when unmapping metadata (Ming)

 - Fix for a flush handling regression in this release (Christoph)

 - Fix for batched request time stamping (Chengming)

 - Fix for a regression in the mq-deadline position calculation (Bart)

 - Lockdep fix for blk-crypto (Eric)

 - Fix for a regression in the Amiga partition handling changes
   (Michael)

* tag 'block-6.5-2023-07-14' of git://git.kernel.dk/linux:
  block: queue data commands from the flush state machine at the head
  blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq
  nvme-pci: fix DMA direction of unmapping integrity data
  nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices
  block/mq-deadline: Fix a bug in deadline_from_pos()
  nvme: ensure disabling pairs with unquiesce
  nvme-fc: fix race between error recovery and creating association
  nvme-fc: return non-zero status code when fails to create association
  nvme: fix parameter check in nvme_fault_inject_init()
  nvme: warn only once for legacy uuid attribute
  block: remove dead struc request->completion_data field
  nvme: fix the NVME_ID_NS_NVM_STS_MASK definition
  nvmet: use PAGE_SECTORS_SHIFT
  nvme: add BOGUS_NID quirk for Samsung SM953
  blk-crypto: use dynamic lock class for blk_crypto_profile::lock
  block/partition: fix signedness issue for Amiga partitions
2023-07-14 19:52:18 -07:00
Linus Torvalds
ec17f16432 Merge tag 'io_uring-6.5-2023-07-14' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
 "Just a single tweak for the wait logic in io_uring"

* tag 'io_uring-6.5-2023-07-14' of git://git.kernel.dk/linux:
  io_uring: Use io_schedule* in cqring wait
2023-07-14 19:46:54 -07:00
Linus Torvalds
2772d7df3c Merge tag 'riscv-for-linus-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - fix a formatting error in the hwprobe documentation

 - fix a spurious warning in the RISC-V PMU driver

 - fix memory detection on rv32 (problem does not manifest on any known
   system)

 - avoid parsing legacy parsing of I in ACPI ISA strings

* tag 'riscv-for-linus-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Don't include Zicsr or Zifencei in I from ACPI
  riscv: mm: fix truncation warning on RV32
  perf: RISC-V: Remove PERF_HES_STOPPED flag checking in riscv_pmu_start()
  Documentation: RISC-V: hwprobe: Fix a formatting error
2023-07-14 11:14:07 -07:00