Commit Graph

1311976 Commits

Author SHA1 Message Date
Tomasz Lis
4be3fca2ce drm/xe/vf: Start post-migration fixups with provisioning query
During post-migration recovery, only MMIO communication to GuC is
allowed. The VF KMD needs to use that channel to ask for the new
provisioning, which includes a new GGTT range assigned to the VF.

v2: query config only instead of handshake; no need to get pm ref as
 it's now kept through whole recovery (Michal)
v3: switched names of 'err' and 'ret'  (Michal)

Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104213449.1455694-5-tomasz.lis@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2024-11-06 15:25:30 +01:00
Tomasz Lis
1255954d9f drm/xe/vf: Send RESFIX_DONE message at end of VF restore
After restore, GuC will not answer to any messages from VF KMD until
fixups are applied. When that is done, VF KMD sends RESFIX_DONE
message to GuC, at which point GuC resumes normal operation.

This patch implements sending the RESFIX_DONE message at end of
post-migration recovery.

v2: keep pm ref during whole recovery, style fixes (Michal)
v3: assert removal to separate patch, debug message per GuC instead
  of one, comments changes (Michal)
v4: improve one debug message (Michal)

Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104213449.1455694-4-tomasz.lis@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2024-11-06 15:25:30 +01:00
Tomasz Lis
360a1f3e96 drm/xe/vf: Document SRIOV VF restore flow
This adds a documentation chapter, containing high level flow
of VF restore procedure.

v2: Better describe initial conditions, include GuC states on
  sequence diagram (Michal)
v3: moved DOC to .c (Michal)

Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104213449.1455694-3-tomasz.lis@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2024-11-06 15:24:24 +01:00
Tomasz Lis
6e6d7b41f9 drm/xe/vf: React to MIGRATED interrupt
To properly support VF Save/Restore procedure, fixups need to be
applied after PF driver finishes its part of VF Restore. The fixups
are required to adjust the ongoing execution for a hardware switch
that happened, because some GFX resources are not fully virtualized,
and assigned to a VF as range from a global pool. The VF on which
a VM is restored will often have different ranges provisioned than
the VF on which save process happened. Those resource fixups are
applied by the VF driver within a restored VM.

A VF driver gets informed that it was migrated by receiving an
interrupt from each GuC. The interrupt assigned for that purpose
is "GUC SW interrupt 0". Seeing that fields set from within the
irq handler should be the trigger for fixups.

The VF can safely do post-migration fixups on resources associated
to each GuC only after that GuC issued the MIGRATED interrupt.

This change introduces a worker to be used for post-migration fixups,
and a mechanism to schedule said worker when all GuCs sent the irq.

v2: renamed and moved functions, updated logged messages, removed
  unused includes, used anon struct (Michal)
v3: ordering, kerneldoc, asserts, debug messages,
  on_all_tiles -> on_all_gts (Michal)
v4: fixed missing header include
v5: Explained what fixups are, explained which IRQ is used, style
  fixes (Michal)

Bspec: 50868
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104213449.1455694-2-tomasz.lis@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2024-11-06 14:53:35 +01:00
Lucas De Marchi
20ade9c3f1 drm/xe: Reword exec_queue and vm lock doc
Reword documentation to possibly what it meant to be.

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05 13:38:47 -08:00
Lucas De Marchi
83db047d94 drm/xe: Stop accumulating LRC timestamp on job_free
The exec queue timestamp is only really useful when it's being queried
through the fdinfo. There's no need to update it so often, on every
job_free. Tracing a simple app like vkcube running shows an update
rate of ~ 120Hz. In case of discrete, the BO is on vram, creating a lot
of pcie transactions.

The update on job_free() is used to cover a gap: if exec
queue is created and destroyed rapidly, before a new query, the
timestamp still needs to be accumulated and accounted for in the xef.

Initial implementation in commit 6109f24f87 ("drm/xe: Add helper to
accumulate exec queue runtime") couldn't do it on the exec_queue_fini
since the xef could be gone at that point. However since commit
ce8c161cba ("drm/xe: Add ref counting for xe_file") the xef is
refcounted and the exec queue always holds a reference, making this safe
now.

Improve the fix in commit 2149ded630 ("drm/xe: Fix use after free when
client stats are captured") by reducing the frequency in which the
update is needed.

Fixes: 2149ded630 ("drm/xe: Fix use after free when client stats are captured")
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05 13:38:46 -08:00
Lucas De Marchi
a7238ee33c drm/xe: Add trace to lrc timestamp update
Help debugging when LRC timestamp is updated for a exec queue.

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05 13:38:46 -08:00
Michal Wajdeczko
43b1dd2b55 drm/xe/pf: Fix potential GGTT allocation leak
In unlikely event that we fail during sending the new VF GGTT
configuration to the GuC, we will free only the GGTT node data
struct but will miss to release the actual GGTT allocation.

This will later lead to list corruption, GGTT space leak and
finally risking crash when unloading the driver:

 [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-EIO)
 [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT

 [ ] list_add corruption. next->prev should be prev (ffff88813cfcd628), but was 0000000000000000. (next=ffff88813cfe2028).
 [ ] RIP: 0010:__list_add_valid_or_report+0x6b/0xb0
 [ ] Call Trace:
 [ ]  drm_mm_insert_node_in_range+0x2c0/0x4e0
 [ ]  xe_ggtt_node_insert+0x46/0x70 [xe]
 [ ]  pf_provision_vf_ggtt+0x7f5/0xa70 [xe]
 [ ]  xe_gt_sriov_pf_config_set_ggtt+0x5e/0x770 [xe]
 [ ]  ggtt_set+0x4b/0x70 [xe]
 [ ]  simple_attr_write_xsigned.constprop.0.isra.0+0xb0/0x110

 [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-ENOSPC)
 [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT

 [ ] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b7b: 0000 [#1] PREEMPT SMP NOPTI
 [ ] RIP: 0010:drm_mm_remove_node+0x1b7/0x390
 [ ] Call Trace:
 [ ]  <TASK>
 [ ]  ? die_addr+0x2e/0x80
 [ ]  ? exc_general_protection+0x1a1/0x3e0
 [ ]  ? asm_exc_general_protection+0x22/0x30
 [ ]  ? drm_mm_remove_node+0x1b7/0x390
 [ ]  ggtt_node_remove+0xa5/0xf0 [xe]
 [ ]  xe_ggtt_node_remove+0x35/0x70 [xe]
 [ ]  xe_ttm_bo_destroy+0x123/0x220 [xe]
 [ ]  intel_user_framebuffer_destroy+0x44/0x70 [xe]
 [ ]  intel_plane_destroy_state+0x3b/0xc0 [xe]
 [ ]  drm_atomic_state_default_clear+0x1cd/0x2f0
 [ ]  intel_atomic_state_clear+0x9/0x20 [xe]
 [ ]  __drm_atomic_state_free+0x1d/0xb0

Fix that by using pf_release_ggtt() on the error path, which now
works regardless if the node has GGTT allocation or not.

Fixes: 34e804220f ("drm/xe: Make xe_ggtt_node struct independent")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104144901.1903-1-michal.wajdeczko@intel.com
2024-11-05 20:17:48 +01:00
Matthew Brost
7d1a4258e6 drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTL
Upon failure all locks need to be dropped before returning to the user.

Fixes: 58480c1c91 ("drm/xe: Skip VMAs pin when requesting signal to the last XE_EXEC")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241105043524.4062774-3-matthew.brost@intel.com
2024-11-05 11:16:42 -08:00
Matthew Brost
07064a200b drm/xe: Fix possible exec queue leak in exec IOCTL
In a couple of places after an exec queue is looked up the exec IOCTL
returns on input errors without dropping the exec queue ref. Fix this
ensuring the exec queue ref is dropped on input error.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241105043524.4062774-2-matthew.brost@intel.com
2024-11-05 11:16:35 -08:00
Lucas De Marchi
71fb41bdd9 drm/xe: Fix case for asserts in documentation
The rendered html documentation for "Xe ASSERTs" doesn't look nice with
the mixed caps and gives the impression it was a typo. Use Title Case
Style.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241105071539.2623727-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05 05:59:01 -08:00
Lucas De Marchi
a8f6035aeb drm/xe: Wire up devcoredump in documentation
Add a documentation page to detail the device coredump as implemented by
xe - it was documented in the source code, but not visible in the
rendered (html) documentation.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241102161254.1818604-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-04 23:29:57 -08:00
Lucas De Marchi
aa06cb8351 drm/xe: Improve devcoredump documentation
Let the introduction be useful for both userspace and kernel. Also
improve the formatting to wire up later to the documentation build.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241102161254.1818604-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-04 23:29:43 -08:00
Gyeyoung Baek
db62482e32 drm/xe: Fix build error for XE_IOCTL_DBG macro
If CONFIG_DRM_USE_DYNAMIC_DEBUG is set, 'drm_dbg' function is replaced
with '__dynamic_func_call_cls', which is replaced with a do while
statement. So in the previous code, there are the following build
errors.

include/linux/dynamic_debug.h:221:58: error: expected expression before ‘do’
  221 | #define __dynamic_func_call_cls(id, cls, fmt, func, ...) do {   \
      |                                                          ^~
include/linux/dynamic_debug.h:248:9: note: in expansion of macro ‘__dynamic_func_call_cls’
  248 |         __dynamic_func_call_cls(__UNIQUE_ID(ddebug), cls, fmt, func, ##__VA_ARGS__)
      |         ^~~~~~~~~~~~~~~~~~~~~~~
include/drm/drm_print.h:425:9: note: in expansion of macro ‘_dynamic_func_call_cls’
  425 |         _dynamic_func_call_cls(cat, fmt, __drm_dev_dbg,         \
      |         ^~~~~~~~~~~~~~~~~~~~~~
include/drm/drm_print.h:504:9: note: in expansion of macro ‘drm_dev_dbg’
  504 |         drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_DRIVER, fmt, ##__VA_ARGS__)
      |         ^~~~~~~~~~~
include/drm/drm_print.h:522:33: note: in expansion of macro ‘drm_dbg_driver’
  522 | #define drm_dbg(drm, fmt, ...)  drm_dbg_driver(drm, fmt, ##__VA_ARGS__)
      |                                 ^~~~~~~~~~~~~~
drivers/gpu/drm/xe/xe_macros.h:14:21: note: in expansion of macro ‘drm_dbg’
   14 |         ((cond) && (drm_dbg(&(xe)->drm, \
      |                     ^~~~~~~
drivers/gpu/drm/xe/xe_bo.c:2029:13: note: in expansion of macro ‘XE_IOCTL_DBG’
 2029 |         if (XE_IOCTL_DBG(xe, !gem_obj))

The problem is that XE_IOCTL_DBG uses this function for conditional expr.
Use a statement expression so it can also be used on conditional checks.

v2: Modify to print when only cond is true.
v3: Modify to evaluate cond only once.

Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241102022204.155039-1-gye976@gmail.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-04 23:24:26 -08:00
Lucas De Marchi
be15f0bc4a drm/xe: Fix drm-next merge
Fix merge in commit c787c2901e ("Merge drm/drm-next into
drm-xe-next"). That workaround needs to be done only once.

While at it, also remove undesired newline before checking for error.

Fixes: c787c2901e ("Merge drm/drm-next into drm-xe-next")
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104183104.2265275-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-04 22:36:02 -08:00
Thomas Hellström
c787c2901e Merge drm/drm-next into drm-xe-next
Backmerging to get up-to-date and to bring in a fix that was
merged through drm-misc-fixes.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-04 09:21:20 +01:00
Dave Airlie
30169bb645 Backmerge v6.12-rc6 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
Backmerge Linus tree for some drm-fixes needed for msm and xe merges.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-11-04 14:25:33 +10:00
Linus Torvalds
59b723cd2a Linux 6.12-rc6 v6.12-rc6 2024-11-03 14:05:52 -10:00
Linus Torvalds
a8cc743272 Merge tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "17 hotfixes.  9 are cc:stable.  13 are MM and 4 are non-MM.

  The usual collection of singletons - please see the changelogs"

* tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
  mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
  mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
  mm: shrinker: avoid memleak in alloc_shrinker_info
  .mailmap: update e-mail address for Eugen Hristev
  vmscan,migrate: fix page count imbalance on node stats when demoting pages
  mailmap: update Jarkko's email addresses
  mm: allow set/clear page_type again
  nilfs2: fix potential deadlock with newly created symlinks
  Squashfs: fix variable overflow in squashfs_readpage_block
  kasan: remove vmalloc_percpu test
  tools/mm: -Werror fixes in page-types/slabinfo
  mm, swap: avoid over reclaim of full clusters
  mm: fix PSWPIN counter for large folios swap-in
  mm: avoid VM_BUG_ON when try to map an anon large folio to zero page.
  mm/codetag: fix null pointer check logic for ref and tag
  mm/gup: stop leaking pinned pages in low memory conditions
2024-11-03 10:25:05 -10:00
Linus Torvalds
d5aaa0bc6d Merge tag 'phy-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:

 - Qualcomm QMP driver fixes for null deref on suspend, bogus supplies
   fix and reset entries fix

 - BCM usb driver init array fix

 - cadence array offset fix

 - starfive link configuration fix

 - config dependency fix for rockchip driver

 - freescale reset signal fix before pll lock

 - tegra driver fix for error pointer check

* tag 'phy-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: tegra: xusb: Add error pointer check in xusb.c
  dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Fix X1E80100 resets entries
  phy: freescale: imx8m-pcie: Do CMN_RST just before PHY PLL lock check
  phy: phy-rockchip-samsung-hdptx: Depend on CONFIG_COMMON_CLK
  phy: ti: phy-j721e-wiz: fix usxgmii configuration
  phy: starfive: jh7110-usb: Fix link configuration to controller
  phy: qcom: qmp-pcie: drop bogus x1e80100 qref supplies
  phy: qcom: qmp-combo: move driver data initialisation earlier
  phy: qcom: qmp-usbc: fix NULL-deref on runtime suspend
  phy: qcom: qmp-usb-legacy: fix NULL-deref on runtime suspend
  phy: qcom: qmp-usb: fix NULL-deref on runtime suspend
  dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: add missing x1e80100 pipediv2 clocks
  phy: usb: disable COMMONONN for dual mode
  phy: cadence: Sierra: Fix offset of DEQ open eye algorithm control register
  phy: usb: Fix missing elements in BCM4908 USB init array
2024-11-03 10:19:34 -10:00
Linus Torvalds
e8529dcb12 Merge tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:

 - TI driver fix to set EOP for cyclic BCDMA transfers

 - sh rz-dmac driver fix for handling config with zero address

* tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: ti: k3-udma: Set EOP for all TRs in cyclic BCDMA transfer
  dmaengine: sh: rz-dmac: handle configs where one address is zero
2024-11-03 10:15:50 -10:00
Linus Torvalds
886b7e80ab Merge tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core revert from Greg KH:
 "Here is a single driver core revert for 6.12-rc6. It reverts a change
  that came in -rc1 that was supposed to resolve a reported problem, but
  caused another one, so revert it for now so that we can get this all
  worked out properly in 6.13.

  The revert has been in linux-next all week with no reported issues"

* tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Revert "driver core: Fix uevent_show() vs driver detach race"
2024-11-03 08:51:53 -10:00
Linus Torvalds
be5bfa1378 Merge tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
 "Here are some small USB and Thunderbolt driver fixes for 6.12-rc6 that
  have been sitting in my tree this week. Included in here are the
  following:

   - thunderbolt driver fixes for reported issues

   - USB typec driver fixes

   - xhci driver fixes for reported problems

   - dwc2 driver revert for a broken change

   - usb phy driver fix

   - usbip tool fix

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices
  usb: phy: Fix API devm_usb_put_phy() can not release the phy
  usb: typec: use cleanup facility for 'altmodes_node'
  usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes()
  usb: typec: qcom-pmic-typec: fix missing fwnode removal in error path
  usb: typec: qcom-pmic-typec: use fwnode_handle_put() to release fwnodes
  usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links
  Revert "usb: dwc2: Skip clock gating on Broadcom SoCs"
  xhci: Fix Link TRB DMA in command ring stopped completion event
  xhci: Use pm_runtime_get to prevent RPM on unsupported systems
  usbip: tools: Fix detach_port() invalid port error path
  thunderbolt: Honor TMU requirements in the domain when setting TMU mode
  thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan()
2024-11-03 08:48:11 -10:00
Yu Zhao
1d4832becd mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
When the MM_WALK capability is enabled, memory that is mostly accessed by
a VM appears younger than it really is, therefore this memory will be less
likely to be evicted.  Therefore, the presence of a running VM can
significantly increase swap-outs for non-VM memory, regressing the
performance for the rest of the system.

Fix this regression by always calling {ptep,pmdp}_clear_young_notify()
whenever we clear the young bits on PMDs/PTEs.

[jthoughton@google.com: fix link-time error]
Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com
Fixes: bd74fdaea1 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Reported-by: David Stevens <stevensd@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-03 10:47:03 -08:00
Yu Zhao
ddd6d8e975 mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".

Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case.  By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.

We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted.  Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.

ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:

		Tab-open latency histogram (ms)
Version		p50	mean	p95	p99	max
base		1315	1198	2347	3454	10319
mglru		2559	1311	7399	12060	43758
fix		1119	926	2470	4211	6947

This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations.  I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.

This series also makes proactive reclaim with MGLRU possible for KVM
memory.  I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later.  Andrew, let me know if you'd
like to take the test now anyway.

[1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google.com/


This patch (of 2):

The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.

Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes: bd74fdaea1 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-03 10:47:02 -08:00
Linus Torvalds
32cfb3c48e Merge tag 'char-misc-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull misc driver fixes from Greg KH:
 "Here are some small char/misc/iio fixes for 6.12-rc6 that resolve
  some reported issues. Included in here are the following:

   - small IIO driver fixes for many reported issues

   - mei driver fix for a suddenly much reported issue for an "old"
     issue.

   - MAINTAINERS update for a developer who has moved companies and
     forgot to update their old entry.

  All of these have been in linux-next this week with no reported
  issues"

* tag 'char-misc-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mei: use kvmalloc for read buffer
  MAINTAINERS: add netup_unidvb maintainer
  iio: dac: Kconfig: Fix build error for ltc2664
  iio: adc: ad7124: fix division by zero in ad7124_set_channel_odr()
  staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg()
  docs: iio: ad7380: fix supply for ad7380-4
  iio: adc: ad7380: fix supplies for ad7380-4
  iio: adc: ad7380: add missing supplies
  iio: adc: ad7380: use devm_regulator_get_enable_read_voltage()
  dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply
  iio: light: veml6030: fix microlux value calculation
  iio: gts-helper: Fix memory leaks for the error path of iio_gts_build_avail_scale_table()
  iio: gts-helper: Fix memory leaks in iio_gts_build_avail_scale_table()
2024-11-03 08:45:03 -10:00
Linus Torvalds
295ba6501d Merge tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:

 - a fix for regression in input core introduced in 6.11 preventing
   re-registering input handlers

 - a fix for adp5588-keys driver tyring to disable interrupt 0 at
   suspend when devices is used without interrupt

 - a fix for edt-ft5x06 to stop leaking regmap structure when probing
   fails and to make sure it is not released too early on removal.

* tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: fix regression when re-registering input handlers
  Input: adp5588-keys - do not try to disable interrupt 0
  Input: edt-ft5x06 - fix regmap leak when probe fails
2024-11-03 08:35:29 -10:00
Linus Torvalds
a33ab3f94f Merge tag 'kbuild-fixes-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - Fix a memory leak in modpost

 - Resolve build issues when cross-compiling RPM and Debian packages

 - Fix another regression in Kconfig

 - Fix incorrect MODULE_ALIAS() output in modpost

* tag 'kbuild-fixes-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host
  modpost: fix acpi MODULE_DEVICE_TABLE built with mismatched endianness
  kconfig: show sub-menu entries even if the prompt is hidden
  kbuild: deb-pkg: add pkg.linux-upstream.nokerneldbg build profile
  kbuild: deb-pkg: add pkg.linux-upstream.nokernelheaders build profile
  kbuild: rpm-pkg: disable kernel-devel package when cross-compiling
  sumversion: Fix a memory leak in get_src_version()
2024-11-03 08:29:02 -10:00
Linus Torvalds
b9021de3ec Merge tag 'x86-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "A trivial compile test fix for x86:

  When CONFIG_AMD_NB is not set a COMPILE_TEST of an AMD specific driver
  fails due to a missing inline stub. Add the stub to cure it"

* tag 'x86-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd_nb: Fix compile-testing without CONFIG_AMD_NB
2024-11-03 08:26:00 -10:00
Linus Torvalds
b019b4a670 Merge tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single fix for posix CPU timers.

  When a thread is cloned, the posix CPU timers are not inherited.

  If the parent has a CPU timer armed the corresponding tick dependency
  in the tasks tick_dep_mask is set and copied to the new thread, which
  means the new thread and all decendants will prevent the system to go
  into full NOHZ operation.

  Clear the tick dependency mask in copy_process() to fix this"

* tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
2024-11-03 08:22:21 -10:00
Linus Torvalds
33e83ffe4c Merge tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:

 - Plug a race between pick_next_task_fair() and try_to_wake_up() where
   both try to write to the same task, even though both paths hold a
   runqueue lock, but obviously from different runqueues.

   The problem is that the store to task::on_rq in __block_task() is
   visible to try_to_wake_up() which assumes that the task is not
   queued. Both sides then operate on the same task.

   Cure it by rearranging __block_task() so the the store to task::on_rq
   is the last operation on the task.

 - Prevent a potential NULL pointer dereference in task_numa_work()

   task_numa_work() iterates the VMAs of a process. A concurrent unmap
   of the address space can result in a NULL pointer return from
   vma_next() which is unchecked.

   Add the missing NULL pointer check to prevent this.

 - Operate on the correct scheduler policy in task_should_scx()

   task_should_scx() returns true when a task should be handled by sched
   EXT. It checks the tasks scheduling policy.

   This fails when the check is done before a policy has been set.

   Cure it by handing the policy into task_should_scx() so it operates
   on the requested value.

 - Add the missing handling of sched EXT in the delayed dequeue
   mechanism. This was simply forgotten.

* tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/ext: Fix scx vs sched_delayed
  sched: Pass correct scheduling policy to __setscheduler_class
  sched/numa: Fix the potential null pointer dereference in task_numa_work()
  sched: Fix pick_next_task_fair() vs try_to_wake_up() race
2024-11-03 08:18:28 -10:00
Linus Torvalds
68f05b251b Merge tag 'perf-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
 "perf_event_clear_cpumask() uses list_for_each_entry_rcu() without
  being in a RCU read side critical section, which triggers a
  'suspicious RCU usage' warning.

  It turns out that the list walk does not be RCU protected because the
  write side lock is held in this context.

  Change it to a regular list walk"

* tag 'perf-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix missing RCU reader protection in perf_event_clear_cpumask()
2024-11-03 08:13:52 -10:00
Linus Torvalds
8f0b844adc Merge tag 'irq-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:

 - Fix an off-by-one error in the failure path of msi_domain_alloc(),
   which causes the cleanup loop to terminate early and leaking the
   first allocated interrupt.

 - Handle a corner case in GIC-V4 versus a lazily mapped Virtual
   Processing Element (VPE). If the VPE has not been mapped because the
   guest has not yet emitted a mapping command, then the set_affinity()
   callback returns an error code, which causes the vCPU management to
   fail.

   Return success in this case without touching the hardware. This will
   be done later when the guest issues the mapping command.

* tag 'irq-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEs
  genirq/msi: Fix off-by-one error in msi_domain_alloc()
2024-11-03 08:09:25 -10:00
Masahiro Yamada
77dc55a978 modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host
When building a 64-bit kernel on a 32-bit build host, incorrect
input MODULE_ALIAS() entries may be generated.

For example, when compiling a 64-bit kernel with CONFIG_INPUT_MOUSEDEV=m
on a 64-bit build machine, you will get the correct output:

  $ grep MODULE_ALIAS drivers/input/mousedev.mod.c
  MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*110,*r*0,*1,*a*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*r*8,*a*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*14A,*r*a*0,*1,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*145,*r*a*0,*1,*18,*1C,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*110,*r*a*0,*1,*m*l*s*f*w*");

However, building the same kernel on a 32-bit machine results in
incorrect output:

  $ grep MODULE_ALIAS drivers/input/mousedev.mod.c
  MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*110,*130,*r*0,*1,*a*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*r*8,*a*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*14A,*16A,*r*a*0,*1,*20,*21,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*145,*165,*r*a*0,*1,*18,*1C,*20,*21,*38,*3C,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*110,*130,*r*a*0,*1,*20,*21,*m*l*s*f*w*");

A similar issue occurs with CONFIG_INPUT_JOYDEV=m. On a 64-bit build
machine, the output is:

  $ grep MODULE_ALIAS drivers/input/joydev.mod.c
  MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*0,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*2,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*8,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*6,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*k*120,*r*a*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*k*130,*r*a*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*k*2C0,*r*a*m*l*s*f*w*");

However, on a 32-bit machine, the output is incorrect:

  $ grep MODULE_ALIAS drivers/input/joydev.mod.c
  MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*0,*20,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*2,*22,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*8,*28,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*6,*26,*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*k*11F,*13F,*r*a*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*k*11F,*13F,*r*a*m*l*s*f*w*");
  MODULE_ALIAS("input:b*v*p*e*-e*1,*k*2C0,*2E0,*r*a*m*l*s*f*w*");

When building a 64-bit kernel, BITS_PER_LONG is defined as 64. However,
on a 32-bit build machine, the constant 1L is a signed 32-bit value.
Left-shifting it beyond 32 bits causes wraparound, and shifting by 31
or 63 bits makes it a negative value.

The fix in commit e0e9263271 ("[PATCH] PATCH: 1 line 2.6.18 bugfix:
modpost-64bit-fix.patch") is incorrect; it only addresses cases where
a 64-bit kernel is built on a 64-bit build machine, overlooking cases
on a 32-bit build machine.

Using 1ULL ensures a 64-bit width on both 32-bit and 64-bit machines,
avoiding the wraparound issue.

Fixes: e0e9263271 ("[PATCH] PATCH: 1 line 2.6.18 bugfix: modpost-64bit-fix.patch")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-03 23:58:56 +09:00
Masahiro Yamada
2e766a1f5f modpost: fix acpi MODULE_DEVICE_TABLE built with mismatched endianness
When CONFIG_SATA_AHCI_PLATFORM=m, modpost outputs incorect acpi
MODULE_ALIAS() if the endianness of the target and the build machine
do not match.

When the endianness of the target kernel and the build machine match,
the output is correct:

  $ grep 'MODULE_ALIAS("acpi' drivers/ata/ahci_platform.mod.c
  MODULE_ALIAS("acpi*:APMC0D33:*");
  MODULE_ALIAS("acpi*:010601:*");

However, when building a little-endian kernel on a big-endian machine
(or vice versa), the output is incorrect:

  $ grep 'MODULE_ALIAS("acpi' drivers/ata/ahci_platform.mod.c
  MODULE_ALIAS("acpi*:APMC0D33:*");
  MODULE_ALIAS("acpi*:0601??:*");

The 'cls' and 'cls_msk' fields are 32-bit.

DEF_FIELD() must be used instead of DEF_FIELD_ADDR() to correctly handle
endianness of these 32-bit fields.

The check 'if (cls)' was unnecessary; it never became NULL, as it was
the pointer to 'symval' plus the offset to the 'cls' field.

Fixes: 26095a01d3 ("ACPI / scan: Add support for ACPI _CLS device matching")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-03 22:52:12 +09:00
Dmitry Torokhov
071b24b54d Input: fix regression when re-registering input handlers
Commit d469647baf ("Input: simplify event handling logic") introduced
code that would set handler->events() method to either
input_handler_events_filter() or input_handler_events_default() or
input_handler_events_null(), depending on the kind of input handler
(a filter or a regular one) we are dealing with. Unfortunately this
breaks cases when we try to re-register the same filter (as is the case
with sysrq handler): after initial registration the handler will have 2
event handling methods defined, and will run afoul of the check in
input_handler_check_methods():

	input: input_handler_check_methods: only one event processing method can be defined (sysrq)
	sysrq: Failed to register input handler, error -22

Fix this by adding handle_events() method to input_handle structure and
setting it up when registering a new input handle according to event
handling methods defined in associated input_handler structure, thus
avoiding modifying the input_handler structure.

Reported-by: "Ned T. Crigler" <crigler@gmail.com>
Reported-by: Christian Heusel <christian@heusel.eu>
Tested-by: "Ned T. Crigler" <crigler@gmail.com>
Tested-by: Peter Seiderer <ps.report@gmx.net>
Fixes: d469647baf ("Input: simplify event handling logic")
Link: https://lore.kernel.org/r/Zx2iQp6csn42PJA7@xavtug
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-11-02 22:28:58 -07:00
Linus Torvalds
3e5e6c9900 Merge tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:

 - Fix two async COPY bugs found during NFS bake-a-thon

 - Fix an svcrdma memory leak

* tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  rpcrdma: Always release the rpcrdma_device's xa_array
  NFSD: Never decrement pending_async_copies on error
  NFSD: Initialize struct nfsd4_copy earlier
2024-11-02 09:27:11 -10:00
Linus Torvalds
f6a7b4ec74 Merge tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:

 - fix a sysbot reported crash on filestreams

 - Reduce cpu time spent searching for extents in a very fragmented FS

 - Check for delayed allocations before setting extsize

* tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: streamline xfs_filestream_pick_ag
  xfs: fix finding a last resort AG in xfs_filestream_pick_ag
  xfs: Reduce unnecessary searches when searching for the best extents
  xfs: Check for delayed allocations before setting extsize
2024-11-02 09:22:16 -10:00
Linus Torvalds
11066801dd Merge tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:

 - fix syntax error in frequency calculation arithmetic expression in
   intel_pstate run.sh

 - add missing cpupower dependency check intel_pstate run.sh

 - fix idmap_mount_tree_invalid test failure due to incorrect argument

 - fix watchdog-test run leaving the watchdog timer enabled causing
   system reboot. With this fix, the test disables the watchdog timer
   when it gets terminated with SIGTERM, SIGKILL, and SIGQUIT in
   addition to SIGINT

* tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/watchdog-test: Fix system accidentally reset after watchdog-test
  selftests/intel_pstate: check if cpupower is installed
  selftests/intel_pstate: fix operand expected error
  selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
2024-11-01 16:05:50 -10:00
Linus Torvalds
f7292c0934 Merge tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Avoid build errors with old 'rustc's without LLVM patch version
     (important since it impacts people that do not even enable Rust)

   - Update LLVM version for 'HAVE_CFI_ICALL_NORMALIZE_INTEGERS' in
     'depends on' condition (the fix was eventually backported rather
     than land in LLVM 19)"

* tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux:
  cfi: tweak llvm version for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
  kbuild: rust: avoid errors with old `rustc`s without LLVM patch version
2024-11-01 15:59:46 -10:00
Linus Torvalds
05b92660cd Merge tag 'pci-v6.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:

 - Enable device-specific ACS-like functionality even if the device
   doesn't advertise an ACS capability, which got broken when adding
   fancy ACS kernel parameter (Jason Gunthorpe)

* tag 'pci-v6.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: Fix pci_enable_acs() support for the ACS quirks
2024-11-01 15:44:23 -10:00
Linus Torvalds
269ce3bd62 Merge tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Regular fixes pull, nothing too out of the ordinary, the mediatek
  fixes came in a batch that I might have preferred a bit earlier but
  all seem fine, otherwise regular xe/amdgpu and a few misc ones.

  xe:
   - Fix missing HPD interrupt enabling, bringing one PM refactor with it
   - Workaround LNL GGTT invalidation not being visible to GuC
   - Avoid getting jobs stuck without a protecting timeout

  ivpu:
   - Fix firewall IRQ handling

  panthor:
   - Fix firmware initialization wrt page sizes
   - Fix handling and reporting of dead job groups

  sched:
   - Guarantee forward progress via WC_MEM_RECLAIM

  tests:
   - Fix memory leak in drm_display_mode_from_cea_vic()

  amdgpu:
   - DCN 3.5 fix
   - Vangogh SMU KASAN fix
   - SMU 13 profile reporting fix

  mediatek:
   - Fix degradation problem of alpha blending
   - Fix color format MACROs in OVL
   - Fix get efuse issue for MT8188 DPTX
   - Fix potential NULL dereference in mtk_crtc_destroy()
   - Correct dpi power-domains property
   - Add split subschema property constraints"

* tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
  drm/xe: Don't short circuit TDR on jobs not started
  drm/xe: Add mmio read before GGTT invalidate
  drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic()
  drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic()
  drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()
  drm/panthor: Report group as timedout when we fail to properly suspend
  drm/panthor: Fail job creation when the group is dead
  drm/panthor: Fix firmware initialization on systems with a page size > 4k
  accel/ivpu: Fix NOC firewall interrupt handling
  drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume
  drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling
  drm/xe: Remove runtime argument from display s/r functions
  drm/amdgpu/smu13: fix profile reporting
  drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
  Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"
  drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM
  drm/tegra: Fix NULL vs IS_ERR() check in probe()
  dt-bindings: display: mediatek: split: add subschema property constraints
  dt-bindings: display: mediatek: dpi: correct power-domains property
  drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy()
  ...
2024-11-01 15:37:09 -10:00
Linus Torvalds
b1966a1fd2 Merge tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Ira Weiny:
 "The bulk of these fixes center around an initialization order bug
  reported by Gregory Price and some additional fall out from the
  debugging effort.

  In summary, cxl_acpi and cxl_mem race and previously worked because of
  a bus_rescan_devices() while testing without modules built in.

  Unfortunately with modules built in the rescan would fail due to the
  cxl_port driver being registered late via the build order. Furthermore
  it was found bus_rescan_devices() did not guarantee a probe barrier
  which CXL was expecting. Additional fixes to cxl-test and decoder
  allocation came along as they were found in this debugging effort.

  The other fixes are pretty minor but one affects trace point data seen
  by user space.

  Summary:

   - Fix crashes when running with cxl-test code

   - Fix Trace DRAM Event Record field decodes

   - Fix module/built in initialization order errors

   - Fix use after free on decoder shutdowns

   - Fix out of order decoder allocations

   - Improve cxl-test to better reflect real world systems"

* tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/test: Improve init-order fidelity relative to real-world systems
  cxl/port: Prevent out-of-order decoder allocation
  cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
  cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
  cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
  cxl/port: Fix CXL port initialization order when the subsystem is built-in
  cxl/events: Fix Trace DRAM Event Record
  cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
2024-11-01 15:22:57 -10:00
Linus Torvalds
f4a1e8e369 Merge tag 'block-6.12-20241101' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - Fixup for a recent blk_rq_map_user_bvec() patch

 - NVMe pull request via Keith:
     - Spec compliant identification fix (Keith)
     - Module parameter to enable backward compatibility on unusual
       namespace formats (Keith)
     - Target double free fix when using keys (Vitaliy)
     - Passthrough command error handling fix (Keith)

* tag 'block-6.12-20241101' of git://git.kernel.dk/linux:
  nvme: re-fix error-handling for io_uring nvme-passthrough
  nvmet-auth: assign dh_key to NULL after kfree_sensitive
  nvme: module parameter to disable pi with offsets
  block: fix queue limits checks in blk_rq_map_user_bvec for real
  nvme: enhance cns version checking
2024-11-01 13:41:55 -10:00
Linus Torvalds
f0d3699aef Merge tag 'io_uring-6.12-20241101' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:

 - Fix not honoring IOCB_NOWAIT for starting buffered writes in terms of
   calling sb_start_write(), leading to a deadlock if someone is
   attempting to freeze the file system with writes in progress, as each
   side will end up waiting for the other to make progress.

* tag 'io_uring-6.12-20241101' of git://git.kernel.dk/linux:
  io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
2024-11-01 13:38:01 -10:00
Linus Torvalds
c426456857 Merge tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
 "Make the ACPI CPPC library use a raw spinlock for operations carried
  out in scheduler context via the schedutil governor and the ACPI CPPC
  cpufreq driver (Pierre Gondois)"

* tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: CPPC: Make rmw_lock a raw_spin_lock
2024-11-01 09:04:23 -10:00
Linus Torvalds
edf0227abd Merge tag 'gpio-fixes-for-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:

 - fix an uninitialized variable in GPIO swnode code

 - add a missing return value check for devm_mutex_init()

 - fix an old issue with debugfs output

* tag 'gpio-fixes-for-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: fix debugfs dangling chip separator
  gpiolib: fix debugfs newline separators
  gpio: sloppy-logic-analyzer: Check for error code from devm_mutex_init() call
  gpio: fix uninit-value in swnode_find_gpio
2024-11-01 09:03:02 -10:00
Dave Airlie
f99c7cca2f Merge tag 'drm-xe-fixes-2024-10-31' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix missing HPD interrupt enabling, bringing one PM refactor with it
  (Imre / Maarten)
- Workaround LNL GGTT invalidation not being visible to GuC
  (Matthew Brost)
- Avoid getting jobs stuck without a protecting timeout (Matthew Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/tsbftadm7owyizzdaqnqu7u4tqggxgeqeztlfvmj5fryxlfomi@5m5bfv2zvzmw
2024-11-02 04:44:27 +10:00
Linus Torvalds
a031e15404 Merge tag 'riscv-for-linus-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - Avoid accessing the early boot ACPI tables via unsafe memory
   attributes, which can result in incorrect ACPI table data appearing.
   This can cause all sorts of bad behavior.

 - Avoid compiler-inserted library calls in the VDSO.

 - GCC+Rust builds have been disabled, to avoid issues related to ISA
   string mismatched between the GCC and LLVM Rust implementations.

 - The NX flag is now set in the EFI PE/COFF headers, which is necessary
   for some distro GRUB versions to boot images.

 - A fix to avoid leaking DT node reference counts on ACPI systems
   during cache info parsing.

 - CPU numbers are now printed as unsigned values during hotplug.

 - A pair of build fixes for usused macros, which can trigger warnings
   on some configurations.

* tag 'riscv-for-linus-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Remove duplicated GET_RM
  riscv: Remove unused GENERATING_ASM_OFFSETS
  riscv: Use '%u' to format the output of 'cpu'
  riscv: Prevent a bad reference count on CPU nodes
  riscv: efi: Set NX compat flag in PE/COFF header
  RISC-V: disallow gcc + rust builds
  riscv: Do not use fortify in early code
  RISC-V: ACPI: fix early_ioremap to early_memremap
  riscv: vdso: Prevent the compiler from inserting calls to memset()
2024-11-01 08:26:38 -10:00
Linus Torvalds
3dfffd506e Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "The important one is a change to the way in which we handle protection
  keys around signal delivery so that we're more closely aligned with
  the x86 behaviour, however there is also a revert of the previous fix
  to disable software tag-based KASAN with GCC, since a workaround
  materialised shortly afterwards.

  I'd love to say we're done with 6.12, but we're aware of some
  longstanding fpsimd register corruption issues that we're almost at
  the bottom of resolving.

  Summary:

   - Fix handling of POR_EL0 during signal delivery so that pushing the
     signal context doesn't fail based on the pkey configuration of the
     interrupted context and align our user-visible behaviour with that
     of x86.

   - Fix a bogus pointer being passed to the CPU hotplug code from the
     Arm SDEI driver.

   - Re-enable software tag-based KASAN with GCC by using an alternative
     implementation of '__no_sanitize_address'"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
  firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
  Revert "kasan: Disable Software Tag-Based KASAN with GCC"
  kasan: Fix Software Tag-Based KASAN with GCC
2024-11-01 07:54:11 -10:00