Commit Graph

1311354 Commits

Author SHA1 Message Date
Linus Torvalds
8dcf44fcad Merge tag 'vfs-6.13.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull netfs updates from Christian Brauner:
 "Various fixes for the netfs library and related infrastructure:

  cachefiles:

   - Fix a dentry leak in cachefiles_open_file()

   - Fix incorrect length return value in
     cachefiles_ondemand_fd_write_iter()

   - Fix missing pos updates in cachefiles_ondemand_fd_write_iter()

   - Clean up in cachefiles_commit_tmpfile()

   - Fix NULL pointer dereference in object->file

   - Add a memory barrier for FSCACHE_VOLUME_CREATING

  netfs:

   - Remove call to folio_index()

   - Fix a few minor bugs in netfs_page_mkwrite()

   - Remove unnecessary references to pages"

* tag 'vfs-6.13.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs/fscache: Add a memory barrier for FSCACHE_VOLUME_CREATING
  cachefiles: Fix NULL pointer dereference in object->file
  cachefiles: Clean up in cachefiles_commit_tmpfile()
  cachefiles: Fix missing pos updates in cachefiles_ondemand_fd_write_iter()
  cachefiles: Fix incorrect length return value in cachefiles_ondemand_fd_write_iter()
  netfs: Remove unnecessary references to pages
  netfs: Fix a few minor bugs in netfs_page_mkwrite()
  netfs: Remove call to folio_index()
2024-11-18 10:26:49 -08:00
Linus Torvalds
56be9aaf98 Merge tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pagecache updates from Christian Brauner:
 "Cleanup filesystem page flag usage: This continues the work to make
  the mappedtodisk/owner_2 flag available to filesystems which don't use
  buffer heads. Further patches remove uses of Private2. This brings us
  very close to being rid of it entirely"

* tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  migrate: Remove references to Private2
  ceph: Remove call to PagePrivate2()
  btrfs: Switch from using the private_2 flag to owner_2
  mm: Remove PageMappedToDisk
  nilfs2: Convert nilfs_copy_buffer() to use folios
  fs: Move clearing of mappedtodisk to buffer.c
2024-11-18 09:54:32 -08:00
Linus Torvalds
5bb6ba448f Merge tag 'vfs-6.13.rust.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs rust file abstractions from Christian Brauner:
 "This contains the file abstractions needed by the Rust implementation
  of the Binder driver and other parts of the kernel.

  Let's treat this as a first attempt at getting something working but I
  do expect the actual interfaces to change significantly over time.
  Simply because we are still figuring out what actually works. But
  there's no point in further theorizing. Let's see how it holds up with
  actual users"

* tag 'vfs-6.13.rust.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  rust: task: adjust safety comments in Task methods
  rust: add seqfile abstraction
  rust: file: add abstraction for `poll_table`
  rust: file: add `Kuid` wrapper
  rust: file: add `FileDescriptorReservation`
  rust: security: add abstraction for secctx
  rust: cred: add Rust abstraction for `struct cred`
  rust: file: add Rust abstraction for `struct file`
  rust: task: add `Task::current_raw`
  rust: types: add `NotThreadSafe`
2024-11-18 09:51:32 -08:00
Linus Torvalds
70e7730c2a Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "Features:

   - Fixup and improve NLM and kNFSD file lock callbacks

     Last year both GFS2 and OCFS2 had some work done to make their
     locking more robust when exported over NFS. Unfortunately, part of
     that work caused both NLM (for NFS v3 exports) and kNFSD (for
     NFSv4.1+ exports) to no longer send lock notifications to clients

     This in itself is not a huge problem because most NFS clients will
     still poll the server in order to acquire a conflicted lock

     It's important for NLM and kNFSD that they do not block their
     kernel threads inside filesystem's file_lock implementations
     because that can produce deadlocks. We used to make sure of this by
     only trusting that posix_lock_file() can correctly handle blocking
     lock calls asynchronously, so the lock managers would only setup
     their file_lock requests for async callbacks if the filesystem did
     not define its own lock() file operation

     However, when GFS2 and OCFS2 grew the capability to correctly
     handle blocking lock requests asynchronously, they started
     signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
     for also trusting posix_lock_file() was inadvertently dropped, so
     now most filesystems no longer produce lock notifications when
     exported over NFS

     Fix this by using an fop_flag which greatly simplifies the problem
     and grooms the way for future uses by both filesystems and lock
     managers alike

   - Add a sysctl to delete the dentry when a file is removed instead of
     making it a negative dentry

     Commit 681ce86235 ("vfs: Delete the associated dentry when
     deleting a file") introduced an unconditional deletion of the
     associated dentry when a file is removed. However, this led to
     performance regressions in specific benchmarks, such as
     ilebench.sum_operations/s, prompting a revert in commit
     4a4be1ad3a ("Revert "vfs: Delete the associated dentry when
     deleting a file""). This reintroduces the concept conditionally
     through a sysctl

   - Expand the statmount() system call:

       * Report the filesystem subtype in a new fs_subtype field to
         e.g., report fuse filesystem subtypes

       * Report the superblock source in a new sb_source field

       * Add a new way to return filesystem specific mount options in an
         option array that returns filesystem specific mount options
         separated by zero bytes and unescaped. This allows caller's to
         retrieve filesystem specific mount options and immediately pass
         them to e.g., fsconfig() without having to unescape or split
         them

       * Report security (LSM) specific mount options in a separate
         security option array. We don't lump them together with
         filesystem specific mount options as security mount options are
         generic and most users aren't interested in them

         The format is the same as for the filesystem specific mount
         option array

   - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command

   - Optimize acl_permission_check() to avoid costly {g,u}id ownership
     checks if possible

   - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()

   - Add synchronous wakeup support for ep_poll_callback.

     Currently, epoll only uses wake_up() to wake up task. But sometimes
     there are epoll users which want to use the synchronous wakeup flag
     to give a hint to the scheduler, e.g., the Android binder driver.
     So add a wake_up_sync() define, and use wake_up_sync() when sync is
     true in ep_poll_callback()

  Fixes:

   - Fix kernel documentation for inode_insert5() and iget5_locked()

   - Annotate racy epoll check on file->f_ep

   - Make F_DUPFD_QUERY associative

   - Avoid filename buffer overrun in initramfs

   - Don't let statmount() return empty strings

   - Add a cond_resched() to dump_user_range() to avoid hogging the CPU

   - Don't query the device logical blocksize multiple times for hfsplus

   - Make filemap_read() check that the offset is positive or zero

  Cleanups:

   - Various typo fixes

   - Cleanup wbc_attach_fdatawrite_inode()

   - Add __releases annotation to wbc_attach_and_unlock_inode()

   - Add hugetlbfs tracepoints

   - Fix various vfs kernel doc parameters

   - Remove obsolete TODO comment from io_cancel()

   - Convert wbc_account_cgroup_owner() to take a folio

   - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add()

   - Reorder struct posix_acl to save 8 bytes

   - Annotate struct posix_acl with __counted_by()

   - Replace one-element array with flexible array member in freevxfs

   - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"

* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  statmount: retrieve security mount options
  vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
  statmount: add flag to retrieve unescaped options
  fs: add the ability for statmount() to report the sb_source
  writeback: wbc_attach_fdatawrite_inode out of line
  writeback: add a __releases annoation to wbc_attach_and_unlock_inode
  fs: add the ability for statmount() to report the fs_subtype
  fs: don't let statmount return empty strings
  fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
  hfsplus: don't query the device logical block size multiple times
  freevxfs: Replace one-element array with flexible array member
  fs: optimize acl_permission_check()
  initramfs: avoid filename buffer overrun
  fs/writeback: convert wbc_account_cgroup_owner to take a folio
  acl: Annotate struct posix_acl with __counted_by()
  acl: Realign struct posix_acl to save 8 bytes
  epoll: Add synchronous wakeup support for ep_poll_callback
  coredump: add cond_resched() to dump_user_range
  mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
  mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
  ...
2024-11-18 09:35:30 -08:00
Linus Torvalds
4eb98b7760 Merge tag 'vfs-6.13.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount api conversions from Christian Brauner:
 "Convert adfs, affs, befs, hfs, hfsplus, jfs, and hpfs to the new mount
  api"

* tag 'vfs-6.13.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  efs: fix the efs new mount api implementation
  ubifs: Convert ubifs to use the new mount API
  hpfs: convert hpfs to use the new mount api
  jfs: convert jfs to use the new mount api
  hfsplus: convert hfsplus to use the new mount api
  hfs: convert hfs to use the new mount api
  befs: convert befs to use the new mount api
  affs: convert affs to use the new mount api
  adfs: convert adfs to use the new mount api
2024-11-18 09:33:34 -08:00
Linus Torvalds
6ac81fd55e Merge tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs multigrain timestamps from Christian Brauner:
 "This is another try at implementing multigrain timestamps. This time
  with significant help from the timekeeping maintainers to reduce the
  performance impact.

  Thomas provided a base branch that contains the required timekeeping
  interfaces for the VFS. It serves as the base for the multi-grain
  timestamp work:

   - Multigrain timestamps allow the kernel to use fine-grained
     timestamps when an inode's attributes is being actively observed
     via ->getattr(). With this support, it's possible for a file to get
     a fine-grained timestamp, and another modified after it to get a
     coarse-grained stamp that is earlier than the fine-grained time. If
     this happens then the files can appear to have been modified in
     reverse order, which breaks VFS ordering guarantees.

     To prevent this, a floor value is maintained for multigrain
     timestamps. Whenever a fine-grained timestamp is handed out, record
     it, and when later coarse-grained stamps are handed out, ensure
     they are not earlier than that value. If the coarse-grained
     timestamp is earlier than the fine-grained floor, return the floor
     value instead.

     The timekeeper changes add a static singleton atomic64_t into
     timekeeper.c that is used to keep track of the latest fine-grained
     time ever handed out. This is tracked as a monotonic ktime_t value
     to ensure that it isn't affected by clock jumps. Because it is
     updated at different times than the rest of the timekeeper object,
     the floor value is managed independently of the timekeeper via a
     cmpxchg() operation, and sits on its own cacheline.

     Two new public timekeeper interfaces are added:

      (1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the
          later of the coarse-grained clock and the floor time

      (2) ktime_get_real_ts64_mg() gets the fine-grained clock value,
          and tries to swap it into the floor. A timespec64 is filled
          with the result.

   - The VFS has always used coarse-grained timestamps when updating the
     ctime and mtime after a change. This has the benefit of allowing
     filesystems to optimize away a lot metadata updates, down to around
     1 per jiffy, even when a file is under heavy writes.

     Unfortunately, this has always been an issue when we're exporting
     via NFSv3, which relies on timestamps to validate caches. A lot of
     changes can happen in a jiffy, so timestamps aren't sufficient to
     help the client decide when to invalidate the cache. Even with
     NFSv4, a lot of exported filesystems don't properly support a
     change attribute and are subject to the same problems with
     timestamp granularity. Other applications have similar issues with
     timestamps (e.g backup applications).

     If we were to always use fine-grained timestamps, that would
     improve the situation, but that becomes rather expensive, as the
     underlying filesystem would have to log a lot more metadata
     updates.

     This adds a way to only use fine-grained timestamps when they are
     being actively queried. Use the (unused) top bit in
     inode->i_ctime_nsec as a flag that indicates whether the current
     timestamps have been queried via stat() or the like. When it's set,
     we allow the kernel to use a fine-grained timestamp iff it's
     necessary to make the ctime show a different value.

     This solves the problem of being able to distinguish the timestamp
     between updates, but introduces a new problem: it's now possible
     for a file being changed to get a fine-grained timestamp. A file
     that is altered just a bit later can then get a coarse-grained one
     that appears older than the earlier fine-grained time. This
     violates timestamp ordering guarantees.

     This is where the earlier mentioned timkeeping interfaces help. A
     global monotonic atomic64_t value is kept that acts as a timestamp
     floor. When we go to stamp a file, we first get the latter of the
     current floor value and the current coarse-grained time. If the
     inode ctime hasn't been queried then we just attempt to stamp it
     with that value.

     If it has been queried, then first see whether the current coarse
     time is later than the existing ctime. If it is, then we accept
     that value. If it isn't, then we get a fine-grained time and try to
     swap that into the global floor. Whether that succeeds or fails, we
     take the resulting floor time, convert it to realtime and try to
     swap that into the ctime.

     We take the result of the ctime swap whether it succeeds or fails,
     since either is just as valid.

     Filesystems can opt into this by setting the FS_MGTIME fstype flag.
     Others should be unaffected (other than being subject to the same
     floor value as multigrain filesystems)"

* tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: reduce pointer chasing in is_mgtime() test
  tmpfs: add support for multigrain timestamps
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  Documentation: add a new file documenting multigrain timestamps
  fs: add percpu counters for significant multigrain timestamp events
  fs: tracepoints around multigrain timestamp events
  fs: handle delegated timestamps in setattr_copy_mgtime
  timekeeping: Add percpu counter for tracking floor swap events
  timekeeping: Add interfaces for handling timestamps with a floor value
  fs: have setattr_copy handle multigrain timestamps appropriately
  fs: add infrastructure for multigrain timestamps
2024-11-18 09:15:39 -08:00
Linus Torvalds
adc218676e Linux 6.12 v6.12 2024-11-17 14:15:08 -08:00
Linus Torvalds
f66d6acccb Merge tag 'x86_urgent_for_v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Make sure a kdump kernel with CONFIG_IMA_KEXEC enabled and booted on
   an AMD SME enabled hardware properly decrypts the ima_kexec buffer
   information passed to it from the previous kernel

 - Fix building the kernel with Clang where a non-TLS definition of the
   stack protector guard cookie leads to bogus code generation

 - Clear a wrongly advertised virtualized VMLOAD/VMSAVE feature flag on
   some Zen4 client systems as those insns are not supported on client

* tag 'x86_urgent_for_v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix a kdump kernel failure on SME system when CONFIG_IMA_KEXEC=y
  x86/stackprotector: Work around strict Clang TLS symbol requirements
  x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client
2024-11-17 09:35:51 -08:00
Linus Torvalds
4a5df37964 Merge tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
 "10 hotfixes, 7 of which are cc:stable. All singletons, please see the
  changelogs for details"

* tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: revert "mm: shmem: fix data-race in shmem_getattr()"
  ocfs2: uncache inode which has failed entering the group
  mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
  mm, doc: update read_ahead_kb for MADV_HUGEPAGE
  fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args()
  sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers
  crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
  mm/mremap: fix address wraparound in move_page_tables()
  tools/mm: fix compile error
  mm, swap: fix allocation and scanning race with swapoff
2024-11-16 16:00:38 -08:00
Andrew Morton
d1aa0c0429 mm: revert "mm: shmem: fix data-race in shmem_getattr()"
Revert d949d1d14f ("mm: shmem: fix data-race in shmem_getattr()") as
suggested by Chuck [1].  It is causing deadlocks when accessing tmpfs over
NFS.

As Hugh commented, "added just to silence a syzbot sanitizer splat: added
where there has never been any practical problem".

Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net [1]
Fixes: d949d1d14f ("mm: shmem: fix data-race in shmem_getattr()")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-16 15:30:32 -08:00
Linus Torvalds
b84eeed05a Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM fixes from Russell King:

 - Fix kernel mapping for XIP kernels

 - Fix SMP support for XIP kernels

 - Fix complication corner case with CFI

 - Fix a typo in nommu code

 - Fix cacheflush syscall when PAN is enabled on LPAE platforms

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: fix cacheflush with PAN
  ARM: 9435/1: ARM/nommu: Fix typo "absence"
  ARM: 9434/1: cfi: Fix compilation corner case
  ARM: 9420/1: smp: Fix SMP for xip kernels
  ARM: 9419/1: mm: Fix kernel memory mapping for xip kernels
2024-11-16 15:14:39 -08:00
Linus Torvalds
e06bc45bef Merge tag 'drm-fixes-2024-11-17' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fix from Dave Airlie:
 "Alex sent on a last minute revert for a amdgpu/swsmu regression:

   - revert patch to fix swsmu regression"

* tag 'drm-fixes-2024-11-17' of https://gitlab.freedesktop.org/drm/kernel:
  Revert "drm/amd/pm: correct the workload setting"
2024-11-16 15:09:14 -08:00
Dave Airlie
f48ab0a39f Merge tag 'amd-drm-fixes-6.12-2024-11-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.12-2024-11-16:

amdgpu:
- Revert a swsmu patch to fix a regression

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241116145320.2507156-1-alexander.deucher@amd.com
2024-11-17 08:12:48 +10:00
Linus Torvalds
b5a24181e4 Merge tag 'trace-ringbuffer-v6.12-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring buffer fixes from Steven Rostedt:

 - Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU
   hotplug"

   A crash that happened on cpu hotplug was actually caused by the
   incorrect ref counting that was fixed by commit 2cf9733891
   ("ring-buffer: Fix refcount setting of boot mapped buffers"). The
   removal of calling cpu hotplug callbacks on memory mapped buffers was
   not an issue even though the tests at the time pointed toward it. But
   in fact, there's a check in that code that tests to see if the
   buffers are already allocated or not, and will not allocate them
   again if they are. Not calling the cpu hotplug callbacks ended up not
   initializing the non boot CPU buffers.

   Simply remove that change.

 - Clear all CPU buffers when starting tracing in a boot mapped buffer

   To properly process events from a previous boot, the address space
   needs to be accounted for due to KASLR and the events in the buffer
   are updated accordingly when read. This also requires that when the
   buffer has tracing enabled again in the current boot that the buffers
   are reset so that events from the previous boot do not interact with
   the events of the current boot and cause confusing due to not having
   the proper meta data.

   It was found that if a CPU is taken offline, that its per CPU buffer
   is not reset when tracing starts. This allows for events to be from
   both the previous boot and the current boot to be in the buffer at
   the same time. Clear all CPU buffers when tracing is started in a
   boot mapped buffer.

* tag 'trace-ringbuffer-v6.12-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/ring-buffer: Clear all memory mapped CPU ring buffers on first recording
  Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU hotplug"
2024-11-16 08:12:43 -08:00
Alex Deucher
44f392fbf6 Revert "drm/amd/pm: correct the workload setting"
This reverts commit 74e1006430.

This causes a regression in the workload selection.
A more extensive fix is being worked on.
For now, revert.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3618
Fixes: 74e1006430 ("drm/amd/pm: correct the workload setting")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-16 09:41:11 -05:00
Linus Torvalds
e8bdb3c8be Merge tag 'riscv-for-linus-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:

 - A fix for the CPU perf driver that avoids leaking CPU ID references
   on systems without snapshot support.

* tag 'riscv-for-linus-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  drivers: perf: Fix wrong put_cpu() placement
2024-11-15 11:44:32 -08:00
Linus Torvalds
f868cd2517 Merge tag 'drm-fixes-2024-11-16' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Final week of fixes, lots of small amdgpu fixes, some i915 and xe
  fixes, the nouveau changes fix a recent regression and some laptop
  panel black screens, then a couple of other misc ones.

  It's probably a little busier than I'd like, but each fix seems fine.

  amdgpu:
   - PSR fix
   - Panel replay fixes
   - DML fix
   - vblank power fix
   - Fix video caps
   - SMU 14.0 fix
   - GPUVM fix
   - MES 12 fix
   - APU carve out fix
   - DC vbios fix
   - NBIO fix

  i915:
   - Don't load GSC on ARL-H and ARL-U if too old FW
   - Avoid potential OOPS in enabling/disabling TV output

  xe:
   - Fix unlock on exec ioctl error path
   - Fix hibernation on LNL due to ggtt getting lost
   - Fix missing runtime PM in OA release

  bridge:
   - tc358768: Fix DSI command tx

  nouveau:
   - Fix GSP AUX error handling
   - dp: Handle retires for AUX CH transfers with GSP
   - fw: Sync DMA after setup

  panthor:
   - Fix partial BO mappings to GPU

  rockchip:
   - vop: Avoid null-ptr deref in plane-state check

  vmwgfx:
   - Avoid null-ptr deref in surface creation"

* tag 'drm-fixes-2024-11-16' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
  drm/bridge: tc358768: Fix DSI command tx
  drm/vmwgfx: avoid null_ptr_deref in vmw_framebuffer_surface_create_handle
  nouveau/dp: handle retries for AUX CH transfers with GSP.
  nouveau: handle EBUSY and EAGAIN for GSP aux errors.
  nouveau: fw: sync dma after setup is called.
  drm/xe/oa: Fix "Missing outer runtime PM protection" warning
  drm/xe: handle flat ccs during hibernation on igpu
  drm/xe: improve hibernation on igpu
  drm/xe: Restore system memory GGTT mappings
  drm/xe: Ensure all locks released in exec IOCTL
  drm/panthor: Fix handling of partial GPU mapping of BOs
  drm/amd: Fix initialization mistake for NBIO 7.7.0
  Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
  drm/amd/display: Fix failure to read vram info due to static BP_RESULT
  drm/amdgpu: enable GTT fallback handling for dGPUs only
  drm/i915: Grab intel_display from the encoder to avoid potential oopsies
  drm/i915/gsc: ARL-H and ARL-U need a newer GSC FW.
  drm/amdgpu/mes12: correct kiq unmap latency
  drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()
  drm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0
  ...
2024-11-15 10:53:42 -08:00
Linus Torvalds
f539573284 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:

 - Revert a change to the VLAN logic, this broke previously working ROCE
   configurations

 - Fix a memory leak on error unwinding in bnxt_re

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  Revert "RDMA/core: Fix ENODEV error for iWARP test over vlan"
  RDMA/bnxt_re: Remove some dead code
  RDMA/bnxt_re: Fix some error handling paths in bnxt_re_probe()
2024-11-15 10:48:28 -08:00
Dave Airlie
21c1c6c7d7 Merge tag 'drm-xe-fixes-2024-11-14' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix unlock on exec ioctl error path (Matthew Brost)
- Fix hibernation on LNL due to ggtt getting lost
  (Matthew Brost / Matthew Auld)
- Fix missing runtime PM in OA release (Ashutosh)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5ntcf2ssmmvo5dsf2mdcee4guwwmpbm3xrlufgt2pdfmznzjo3@62ygo3bxkock
2024-11-16 04:31:54 +10:00
Linus Torvalds
1b597e1cf0 Merge tag 'pmdomain-v6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
 "pmdomain core:
   - Add GENPD_FLAG_DEV_NAME_FW flag to generate unique names

  pmdomain providers:
   - arm: Use FLAG_DEV_NAME_FW to ensure unique names
   - imx93-blk-ctrl: Fix the remove path

  arm_scmi/qcom-cpucp:
   - Report duplicate OPPs as firmware bugs for arm_scmi
   - Skip OPP duplicates for arm_scmi
   - Mark the qcom-cpucp mailbox irq with IRQF_NO_SUSPEND flag"

* tag 'pmdomain-v6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
  firmware: arm_scmi: Report duplicate opps as firmware bugs
  firmware: arm_scmi: Skip opp duplicates
  pmdomain: imx93-blk-ctrl: correct remove path
  pmdomain: arm: Use FLAG_DEV_NAME_FW to ensure unique names
  pmdomain: core: Add GENPD_FLAG_DEV_NAME_FW flag
2024-11-15 10:20:17 -08:00
Linus Torvalds
aa35f5446f Merge tag 'mmc-v6.12-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:

 - dw_mmc: Revert fix for IDMAC operation with pages bigger than 4K

 - sunxi-mmc: Fix A100 compatible description

* tag 'mmc-v6.12-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  Revert "mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K"
  mmc: sunxi-mmc: Fix A100 compatible description
2024-11-15 10:16:12 -08:00
Linus Torvalds
eeae5ef6bf Merge tag 'sound-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A few last-minute fixes. All changes are device-specific small fixes
  that should be pretty safe to apply"

* tag 'sound-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - update set GPIO3 to default for Thinkpad with ALC1318
  ALSA: hda/realtek: fix mute/micmute LEDs for a HP EliteBook 645 G10
  ALSA: hda/realtek - Fixed Clevo platform headset Mic issue
  ALSA: usb-audio: Fix Yamaha P-125 Quirk Entry
  ASoC: max9768: Fix event generation for playback mute
  ASoC: intel: sof_sdw: add quirk for Dell SKU
  ASoC: audio-graph-card2: Purge absent supplies for device tree nodes
2024-11-15 10:09:38 -08:00
Linus Torvalds
842c7e5834 Merge tag 'v6.12-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "Fix a regression in the MIPS CRC32C code"

* tag 'v6.12-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: mips/crc32 - fix the CRC32C implementation
2024-11-15 10:04:39 -08:00
Linus Torvalds
d79944b094 Merge tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fix from Tejun Heo:
 "One more fix for v6.12-rc7

  ops.cpu_acquire() was being invoked with the wrong kfunc mask allowing
  the operation to call kfuncs which shouldn't be allowed. Fix it by
  using SCX_KF_REST instead, which is trivial and low risk"

* tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
2024-11-15 09:59:51 -08:00
Linus Torvalds
c9dd4571ad Merge tag 'for-6.12-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
 "One more fix that seems urgent and good to have in 6.12 final.

  It could potentially lead to unexpected transaction aborts, due to
  wrong comparison and order of processing of delayed refs"

* tag 'for-6.12-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix incorrect comparison for delayed refs
2024-11-15 09:45:32 -08:00
Dmitry Antipov
737f341378 ocfs2: uncache inode which has failed entering the group
Syzbot has reported the following BUG:

kernel BUG at fs/ocfs2/uptodate.c:509!
...
Call Trace:
 <TASK>
 ? __die_body+0x5f/0xb0
 ? die+0x9e/0xc0
 ? do_trap+0x15a/0x3a0
 ? ocfs2_set_new_buffer_uptodate+0x145/0x160
 ? do_error_trap+0x1dc/0x2c0
 ? ocfs2_set_new_buffer_uptodate+0x145/0x160
 ? __pfx_do_error_trap+0x10/0x10
 ? handle_invalid_op+0x34/0x40
 ? ocfs2_set_new_buffer_uptodate+0x145/0x160
 ? exc_invalid_op+0x38/0x50
 ? asm_exc_invalid_op+0x1a/0x20
 ? ocfs2_set_new_buffer_uptodate+0x2e/0x160
 ? ocfs2_set_new_buffer_uptodate+0x144/0x160
 ? ocfs2_set_new_buffer_uptodate+0x145/0x160
 ocfs2_group_add+0x39f/0x15a0
 ? __pfx_ocfs2_group_add+0x10/0x10
 ? __pfx_lock_acquire+0x10/0x10
 ? mnt_get_write_access+0x68/0x2b0
 ? __pfx_lock_release+0x10/0x10
 ? rcu_read_lock_any_held+0xb7/0x160
 ? __pfx_rcu_read_lock_any_held+0x10/0x10
 ? smack_log+0x123/0x540
 ? mnt_get_write_access+0x68/0x2b0
 ? mnt_get_write_access+0x68/0x2b0
 ? mnt_get_write_access+0x226/0x2b0
 ocfs2_ioctl+0x65e/0x7d0
 ? __pfx_ocfs2_ioctl+0x10/0x10
 ? smack_file_ioctl+0x29e/0x3a0
 ? __pfx_smack_file_ioctl+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x43d/0x780
 ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
 ? __pfx_ocfs2_ioctl+0x10/0x10
 __se_sys_ioctl+0xfb/0x170
 do_syscall_64+0xf3/0x230
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
 </TASK>

When 'ioctl(OCFS2_IOC_GROUP_ADD, ...)' has failed for the particular
inode in 'ocfs2_verify_group_and_input()', corresponding buffer head
remains cached and subsequent call to the same 'ioctl()' for the same
inode issues the BUG() in 'ocfs2_set_new_buffer_uptodate()' (trying
to cache the same buffer head of that inode). Fix this by uncaching
the buffer head with 'ocfs2_remove_from_cache()' on error path in
'ocfs2_group_add()'.

Link: https://lkml.kernel.org/r/20241114043844.111847-1-dmantipov@yandex.ru
Fixes: 7909f2bf83 ("[PATCH 2/2] ocfs2: Implement group add for online resize")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reported-by: syzbot+453873f1588c2d75b447@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=453873f1588c2d75b447
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Dmitry Antipov <dmantipov@yandex.ru>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Jinjiang Tu
8ce41b0f9d mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
We triggered a NULL pointer dereference for ac.preferred_zoneref->zone in
alloc_pages_bulk_noprof() when the task is migrated between cpusets.

When cpuset is enabled, in prepare_alloc_pages(), ac->nodemask may be
&current->mems_allowed.  when first_zones_zonelist() is called to find
preferred_zoneref, the ac->nodemask may be modified concurrently if the
task is migrated between different cpusets.  Assuming we have 2 NUMA Node,
when traversing Node1 in ac->zonelist, the nodemask is 2, and when
traversing Node2 in ac->zonelist, the nodemask is 1.  As a result, the
ac->preferred_zoneref points to NULL zone.

In alloc_pages_bulk_noprof(), for_each_zone_zonelist_nodemask() finds a
allowable zone and calls zonelist_node_idx(ac.preferred_zoneref), leading
to NULL pointer dereference.

__alloc_pages_noprof() fixes this issue by checking NULL pointer in commit
ea57485af8 ("mm, page_alloc: fix check for NULL preferred_zone") and
commit df76cee6bb ("mm, page_alloc: remove redundant checks from alloc
fastpath").

To fix it, check NULL pointer for preferred_zoneref->zone.

Link: https://lkml.kernel.org/r/20241113083235.166798-1-tujinjiang@huawei.com
Fixes: 387ba26fb1 ("mm/page_alloc: add a bulk page allocator")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Yafang Shao
0740e54304 mm, doc: update read_ahead_kb for MADV_HUGEPAGE
MADV_HUGEPAGE is a new addition to readahead with behavior distinct from
normal pages.  To prevent confusion, we should update the documentation
accordingly.

Link: https://lkml.kernel.org/r/20241113150711.1685-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Dan Carpenter
669b0cb81e fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args()
The "arg->vec_len" variable is a u64 that comes from the user at the start
of the function.  The "arg->vec_len * sizeof(struct page_region))"
multiplication can lead to integer wrapping.  Use size_mul() to avoid
that.

Also the size_add/mul() functions work on unsigned long so for 32bit
systems we need to ensure that "arg->vec_len" fits in an unsigned long.

Link: https://lkml.kernel.org/r/39d41335-dd4d-48ed-8a7f-402c57d8ea84@stanley.mountain
Fixes: 52526ca7fd ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Qun-Wei Lin
fd7b4f9f46 sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers
When CONFIG_KASAN_SW_TAGS and CONFIG_KASAN_STACK are enabled, the
object_is_on_stack() function may produce incorrect results due to the
presence of tags in the obj pointer, while the stack pointer does not have
tags.  This discrepancy can lead to incorrect stack object detection and
subsequently trigger warnings if CONFIG_DEBUG_OBJECTS is also enabled.

Example of the warning:

ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:557 __debug_object_init+0x330/0x364
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5 #4
Hardware name: linux,dummy-virt (DT)
pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __debug_object_init+0x330/0x364
lr : __debug_object_init+0x330/0x364
sp : ffff800082ea7b40
x29: ffff800082ea7b40 x28: 98ff0000c0164518 x27: 98ff0000c0164534
x26: ffff800082d93ec8 x25: 0000000000000001 x24: 1cff0000c00172a0
x23: 0000000000000000 x22: ffff800082d93ed0 x21: ffff800081a24418
x20: 3eff800082ea7bb0 x19: efff800000000000 x18: 0000000000000000
x17: 00000000000000ff x16: 0000000000000047 x15: 206b63617473206e
x14: 0000000000000018 x13: ffff800082ea7780 x12: 0ffff800082ea78e
x11: 0ffff800082ea790 x10: 0ffff800082ea79d x9 : 34d77febe173e800
x8 : 34d77febe173e800 x7 : 0000000000000001 x6 : 0000000000000001
x5 : feff800082ea74b8 x4 : ffff800082870a90 x3 : ffff80008018d3c4
x2 : 0000000000000001 x1 : ffff800082858810 x0 : 0000000000000050
Call trace:
 __debug_object_init+0x330/0x364
 debug_object_init_on_stack+0x30/0x3c
 schedule_hrtimeout_range_clock+0xac/0x26c
 schedule_hrtimeout+0x1c/0x30
 wait_task_inactive+0x1d4/0x25c
 kthread_bind_mask+0x28/0x98
 init_rescuer+0x1e8/0x280
 workqueue_init+0x1a0/0x3cc
 kernel_init_freeable+0x118/0x200
 kernel_init+0x28/0x1f0
 ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------

Link: https://lkml.kernel.org/r/20241113042544.19095-1-qun-wei.lin@mediatek.com
Signed-off-by: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Cc: Andrew Yang <andrew.yang@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Casper Li <casper.li@mediatek.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Dave Vasilevsky
31daa34315 crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using Open Firmware. 
On these machines, the kernel refuses to boot from non-zero
PHYSICAL_START, which occurs when CRASH_DUMP is on.

Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should
default to off for them.  Users booting via some other mechanism can still
turn it on explicitly.

Does not change the default on any other architectures for the
time being.

Link: https://lkml.kernel.org/r/20240917163720.1644584-1-dave@vasilevsky.ca
Fixes: 75bc255a74 ("crash: clean up kdump related config items")
Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca>
Reported-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Closes: https://lists.debian.org/debian-powerpc/2024/07/msg00001.html
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Jann Horn
a4a282daf1 mm/mremap: fix address wraparound in move_page_tables()
On 32-bit platforms, it is possible for the expression `len + old_addr <
old_end` to be false-positive if `len + old_addr` wraps around. 
`old_addr` is the cursor in the old range up to which page table entries
have been moved; so if the operation succeeded, `old_addr` is the *end* of
the old region, and adding `len` to it can wrap.

The overflow causes mremap() to mistakenly believe that PTEs have been
copied; the consequence is that mremap() bails out, but doesn't move the
PTEs back before the new VMA is unmapped, causing anonymous pages in the
region to be lost.  So basically if userspace tries to mremap() a
private-anon region and hits this bug, mremap() will return an error and
the private-anon region's contents appear to have been zeroed.

The idea of this check is that `old_end - len` is the original start
address, and writing the check that way also makes it easier to read; so
fix the check by rearranging the comparison accordingly.

(An alternate fix would be to refactor this function by introducing an
"orig_old_start" variable or such.)


Tested in a VM with a 32-bit X86 kernel; without the patch:

```
user@horn:~/big_mremap$ cat test.c
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <err.h>
#include <sys/mman.h>

#define ADDR1 ((void*)0x60000000)
#define ADDR2 ((void*)0x10000000)
#define SIZE          0x50000000uL

int main(void) {
  unsigned char *p1 = mmap(ADDR1, SIZE, PROT_READ|PROT_WRITE,
      MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0);
  if (p1 == MAP_FAILED)
    err(1, "mmap 1");
  unsigned char *p2 = mmap(ADDR2, SIZE, PROT_NONE,
      MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0);
  if (p2 == MAP_FAILED)
    err(1, "mmap 2");
  *p1 = 0x41;
  printf("first char is 0x%02hhx\n", *p1);
  unsigned char *p3 = mremap(p1, SIZE, SIZE,
      MREMAP_MAYMOVE|MREMAP_FIXED, p2);
  if (p3 == MAP_FAILED) {
    printf("mremap() failed; first char is 0x%02hhx\n", *p1);
  } else {
    printf("mremap() succeeded; first char is 0x%02hhx\n", *p3);
  }
}
user@horn:~/big_mremap$ gcc -static -o test test.c
user@horn:~/big_mremap$ setarch -R ./test
first char is 0x41
mremap() failed; first char is 0x00
```

With the patch:

```
user@horn:~/big_mremap$ setarch -R ./test
first char is 0x41
mremap() succeeded; first char is 0x41
```

Link: https://lkml.kernel.org/r/20241111-fix-mremap-32bit-wrap-v1-1-61d6be73b722@google.com
Fixes: af8ca1c149 ("mm/mremap: optimize the start addresses in move_page_tables()")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Motiejus JakÅ`tys
a39326767c tools/mm: fix compile error
Add a missing semicolon.

Link: https://lkml.kernel.org/r/20241112171655.1662670-1-motiejus@jakstys.lt
Fixes: ece5897e5a ("tools/mm: -Werror fixes in page-types/slabinfo")
Signed-off-by: Motiejus JakÅ`tys <motiejus@jakstys.lt>
Closes: https://github.com/NixOS/nixpkgs/issues/355369
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Wladislav Wiebe <wladislav.kw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Kairui Song
0ec8bc9e88 mm, swap: fix allocation and scanning race with swapoff
There are two flags used to synchronize allocation and scanning with
swapoff: SWP_WRITEOK and SWP_SCANNING.

SWP_WRITEOK: Swapoff will first unset this flag, at this point any further
swap allocation or scanning on this device should just abort so no more
new entries will be referencing this device.  Swapoff will then unuse all
existing swap entries.

SWP_SCANNING: This flag is set when device is being scanned.  Swapoff will
wait for all scanner to stop before the final release of the swap device
structures to avoid UAF.  Note this flag is the highest used bit of
si->flags so it could be added up arithmetically, if there are multiple
scanner.

commit 5f843a9a3a ("mm: swap: separate SSD allocation from
scan_swap_map_slots()") ignored SWP_SCANNING and SWP_WRITEOK flags while
separating cluster allocation path from the old allocation path.  Add the
flags back to fix swapoff race.  The race is hard to trigger as si->lock
prevents most parallel operations, but si->lock could be dropped for
reclaim or discard.  This issue is found during code review.

This commit fixes this problem.  For SWP_SCANNING, Just like before, set
the flag before scan and remove it afterwards.

For SWP_WRITEOK, there are several places where si->lock could be dropped,
it will be error-prone and make the code hard to follow if we try to cover
these places one by one.  So just do one check before the real allocation,
which is also very similar like before.  With new cluster allocator it may
waste a bit of time iterating the clusters but won't take long, and
swapoff is not performance sensitive.

Link: https://lkml.kernel.org/r/20241112083414.78174-1-ryncsn@gmail.com
Fixes: 5f843a9a3a ("mm: swap: separate SSD allocation from scan_swap_map_slots()")
Reported-by: "Huang, Ying" <ying.huang@intel.com>
Closes: https://lore.kernel.org/linux-mm/87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com/
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 15:25:07 -08:00
Dave Airlie
1eb0de899b Merge tag 'amd-drm-fixes-6.12-2024-11-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.12-2024-11-14:

amdgpu:
- PSR fix
- Panel replay fixes
- DML fix
- vblank power fix
- Fix video caps
- SMU 14.0 fix
- GPUVM fix
- MES 12 fix
- APU carve out fix
- DC vbios fix
- NBIO fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241114143401.448210-1-alexander.deucher@amd.com
2024-11-15 06:48:57 +10:00
Dave Airlie
99d051c4b3 Merge tag 'drm-misc-fixes-2024-11-14' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:

bridge:
- tc358768: Fix DSI command tx

nouveau:
- Fix GSP AUX error handling
- dp: Handle retires for AUX CH transfers with GSP
- fw: Sync DMA after setup

panthor:
- Fix partial BO mappings to GPU

rockchip:
- vop: Avoid null-ptr deref in plane-state check

vmwgfx:
- Avoid null-ptr deref in surface creation

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241114142256.GA86810@2a02-2454-fd5e-fd00-4ce-489-4b34-bd1a.dyn6.pyur.net
2024-11-15 06:40:39 +10:00
Dave Airlie
6b76bf8f3b Merge tag 'drm-intel-fixes-2024-11-14' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Don't load GSC on ARL-H and ARL-U if too old FW
- Avoid potential OOPS in enabling/disabling TV output

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZzWksU6CMGLPfjkT@jlahtine-mobl.ger.corp.intel.com
2024-11-15 06:18:35 +10:00
Tejun Heo
a4af89cc50 sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
ops.cpu_acquire() is currently called with 0 kf_maks which is interpreted as
SCX_KF_UNLOCKED which allows all unlocked kfuncs, but ops.cpu_acquire() is
called from balance_one() under the rq lock and should only be allowed call
kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Zhao Mengmeng <zhaomzhao@126.com>
Link: http://lkml.kernel.org/r/ZzYvf2L3rlmjuKzh@slm.duckdns.org
Fixes: 245254f708 ("sched_ext: Implement sched_ext_ops.cpu_acquire/release()")
2024-11-14 08:50:58 -10:00
Linus Torvalds
cfaaa7d010 Merge tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth.

  Quite calm week. No new regression under investigation.

  Current release - regressions:

   - eth: revert "igb: Disable threaded IRQ for igb_msix_other"

  Current release - new code bugs:

   - bluetooth: btintel: direct exception event to bluetooth stack

  Previous releases - regressions:

   - core: fix data-races around sk->sk_forward_alloc

   - netlink: terminate outstanding dump on socket close

   - mptcp: error out earlier on disconnect

   - vsock: fix accept_queue memory leak

   - phylink: ensure PHY momentary link-fails are handled

   - eth: mlx5:
      - fix null-ptr-deref in add rule err flow
      - lock FTE when checking if active

   - eth: dwmac-mediatek: fix inverted handling of mediatek,mac-wol

  Previous releases - always broken:

   - sched: fix u32's systematic failure to free IDR entries for hnodes.

   - sctp: fix possible UAF in sctp_v6_available()

   - eth: bonding: add ns target multicast address to slave device

   - eth: mlx5: fix msix vectors to respect platform limit

   - eth: icssg-prueth: fix 1 PPS sync"

* tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  net: sched: u32: Add test case for systematic hnode IDR leaks
  selftests: bonding: add ns multicast group testing
  bonding: add ns target multicast address to slave device
  net: ti: icssg-prueth: Fix 1 PPS sync
  stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines
  net: Make copy_safe_from_sockptr() match documentation
  net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
  ipmr: Fix access to mfc_cache_list without lock held
  samples: pktgen: correct dev to DEV
  net: phylink: ensure PHY momentary link-fails are handled
  mptcp: pm: use _rcu variant under rcu_read_lock
  mptcp: hold pm lock when deleting entry
  mptcp: update local address flags when setting it
  net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
  MAINTAINERS: Re-add cancelled Renesas driver sections
  Revert "igb: Disable threaded IRQ for igb_msix_other"
  Bluetooth: btintel: Direct exception event to bluetooth stack
  Bluetooth: hci_core: Fix calling mgmt_device_connected
  virtio/vsock: Improve MSG_ZEROCOPY error handling
  vsock: Fix sk_error_queue memory leak
  ...
2024-11-14 10:05:33 -08:00
Linus Torvalds
4abcd80f23 Merge tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
 "This fixes one minor regression from the btree cache fixes (in the
  scan_for_btree_nodes repair path) - and the shutdown path fix is the
  big one here, in terms of bugs closed:

   - Assorted tiny syzbot fixes

   - Shutdown path fix: "bch2_btree_write_buffer_flush_going_ro()"

     The shutdown path wasn't flushing the btree write buffer, leading
     to shutting down while we still had operations in flight. This
     fixes a whole slew of syzbot bugs, and undoubtedly other strange
     heisenbugs.

* tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs:
  bcachefs: Fix assertion pop in bch2_ptr_swab()
  bcachefs: Fix journal_entry_dev_usage_to_text() overrun
  bcachefs: Allow for unknown key types in backpointers fsck
  bcachefs: Fix assertion pop in topology repair
  bcachefs: Fix hidden btree errors when reading roots
  bcachefs: Fix validate_bset() repair path
  bcachefs: Fix missing validation for bch_backpointer.level
  bcachefs: Fix bch_member.btree_bitmap_shift validation
  bcachefs: bch2_btree_write_buffer_flush_going_ro()
2024-11-14 10:00:23 -08:00
Steven Rostedt
09663753bb tracing/ring-buffer: Clear all memory mapped CPU ring buffers on first recording
The events of a memory mapped ring buffer from the previous boot should
not be mixed in with events from the current boot. There's meta data that
is used to handle KASLR so that function names can be shown properly.

Also, since the timestamps of the previous boot have no meaning to the
timestamps of the current boot, having them intermingled in a buffer can
also cause confusion because there could possibly be events in the future.

When a trace is activated the meta data is reset so that the pointers of
are now processed for the new address space. The trace buffers are reset
when tracing starts for the first time. The problem here is that the reset
only happens on online CPUs. If a CPU is offline, it does not get reset.

To demonstrate the issue, a previous boot had tracing enabled in the boot
mapped ring buffer on reboot. On the following boot, tracing has not been
started yet so the function trace from the previous boot is still visible.

 # trace-cmd show -B boot_mapped -c 3 | tail
          <idle>-0       [003] d.h2.   156.462395: __rcu_read_lock <-cpu_emergency_disable_virtualization
          <idle>-0       [003] d.h2.   156.462396: vmx_emergency_disable_virtualization_cpu <-cpu_emergency_disable_virtualization
          <idle>-0       [003] d.h2.   156.462396: __rcu_read_unlock <-__sysvec_reboot
          <idle>-0       [003] d.h2.   156.462397: stop_this_cpu <-__sysvec_reboot
          <idle>-0       [003] d.h2.   156.462397: set_cpu_online <-stop_this_cpu
          <idle>-0       [003] d.h2.   156.462397: disable_local_APIC <-stop_this_cpu
          <idle>-0       [003] d.h2.   156.462398: clear_local_APIC <-disable_local_APIC
          <idle>-0       [003] d.h2.   156.462574: mcheck_cpu_clear <-stop_this_cpu
          <idle>-0       [003] d.h2.   156.462575: mce_intel_feature_clear <-stop_this_cpu
          <idle>-0       [003] d.h2.   156.462575: lmce_supported <-mce_intel_feature_clear

Now, if CPU 3 is taken offline, and tracing is started on the memory
mapped ring buffer, the events from the previous boot in the CPU 3 ring
buffer is not reset. Now those events are using the meta data from the
current boot and produces just hex values.

 # echo 0 > /sys/devices/system/cpu/cpu3/online
 # trace-cmd start -B boot_mapped -p function
 # trace-cmd show -B boot_mapped -c 3 | tail
          <idle>-0       [003] d.h2.   156.462395: 0xffffffff9a1e3194 <-0xffffffff9a0f655e
          <idle>-0       [003] d.h2.   156.462396: 0xffffffff9a0a1d24 <-0xffffffff9a0f656f
          <idle>-0       [003] d.h2.   156.462396: 0xffffffff9a1e6bc4 <-0xffffffff9a0f7323
          <idle>-0       [003] d.h2.   156.462397: 0xffffffff9a0d12b4 <-0xffffffff9a0f732a
          <idle>-0       [003] d.h2.   156.462397: 0xffffffff9a1458d4 <-0xffffffff9a0d12e2
          <idle>-0       [003] d.h2.   156.462397: 0xffffffff9a0faed4 <-0xffffffff9a0d12e7
          <idle>-0       [003] d.h2.   156.462398: 0xffffffff9a0faaf4 <-0xffffffff9a0faef2
          <idle>-0       [003] d.h2.   156.462574: 0xffffffff9a0e3444 <-0xffffffff9a0d12ef
          <idle>-0       [003] d.h2.   156.462575: 0xffffffff9a0e4964 <-0xffffffff9a0d12ef
          <idle>-0       [003] d.h2.   156.462575: 0xffffffff9a0e3fb0 <-0xffffffff9a0e496f

Reset all CPUs when starting a boot mapped ring buffer for the first time,
and not just the online CPUs.

Fixes: 7a1d1e4b96 ("tracing/ring-buffer: Add last_boot_info file to boot instance")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-14 11:54:34 -05:00
Christian Brauner
aefff51e1c statmount: retrieve security mount options
Add the ability to retrieve security mount options. Keep them separate
from filesystem specific mount options so it's easy to tell them apart.
Also allow to retrieve them separate from other mount options as most of
the time users won't be interested in security specific mount options.

Link: https://lore.kernel.org/r/20241114-radtour-ofenrohr-ff34b567b40a@brauner
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-14 17:03:25 +01:00
Takashi Iwai
5ec23a1b53 Merge tag 'asoc-fix-v6.12-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.12

Some last updates for v6.12, one quirk plus a couple of fixes.  One is a
minor fix for a relatively obscure driver and the other is a relatively
important fix for boot hangs with some audio graph based cards.
2024-11-14 16:40:15 +01:00
Josef Bacik
7d493a5ecc btrfs: fix incorrect comparison for delayed refs
When I reworked delayed ref comparison in cf4f04325b ("btrfs: move
->parent and ->ref_root into btrfs_delayed_ref_node"), I made a mistake
and returned -1 for the case where ref1->ref_root was > than
ref2->ref_root.  This is a subtle bug that can result in improper
delayed ref running order, which can result in transaction aborts.

Fixes: cf4f04325b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
CC: stable@vger.kernel.org # 6.10+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-14 16:11:02 +01:00
Steven Rostedt
580bb355bc Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU hotplug"
A crash happened when testing cpu hotplug with respect to the memory
mapped ring buffers. It was assumed that the hot plug code was adding a
per CPU buffer that was already created that caused the crash. The real
problem was due to ref counting and was fixed by commit 2cf9733891
("ring-buffer: Fix refcount setting of boot mapped buffers").

When a per CPU buffer is created, it will not be created again even with
CPU hotplug, so the fix to not use CPU hotplug was a red herring. In fact,
it caused only the boot CPU buffer to be created, leaving the other CPU
per CPU buffers disabled.

Revert that change as it was not the culprit of the fix it was intended to
be.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241113230839.6c03640f@gandalf.local.home
Fixes: 912da2c384 ("ring-buffer: Do not have boot mapped buffers hook to CPU hotplug")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-14 10:01:00 -05:00
Alexandre Ferrieux
ca34aceb32 net: sched: u32: Add test case for systematic hnode IDR leaks
Add a tdc test case to exercise the just-fixed systematic leak of
IDR entries in u32 hnode disposal. Given the IDR in question is
confined to the range [1..0x7FF], it is sufficient to create/delete
the same filter 2048 times to fill it up and get a nonzero exit
status from "tc filter add".

Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20241113100428.360460-1-alexandre.ferrieux@orange.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-14 11:39:17 +01:00
Francesco Dolcini
32c4514455 drm/bridge: tc358768: Fix DSI command tx
Wait for the command transmission to be completed in the DSI transfer
function polling for the dc_start bit to go back to idle state after the
transmission is started.

This is documented in the datasheet and failures to do so lead to
commands corruption.

Fixes: ff1ca6397b ("drm/bridge: Add tc358768 driver")
Cc: stable@vger.kernel.org
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20240926141246.48282-1-francesco@dolcini.it
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240926141246.48282-1-francesco@dolcini.it
2024-11-14 11:29:42 +01:00
Paolo Abeni
f8d670b1ae Merge branch 'bonding-fix-ns-targets-not-work-on-hardware-nic'
Hangbin Liu says:

====================
bonding: fix ns targets not work on hardware NIC

The first patch fixed ns targets not work on hardware NIC when bonding
set arp_validate.

The second patch add a related selftest for bonding.

v4: Thanks Nikolay for the comments:
    use bond_slave_ns_maddrs_{add/del} with clear name
    fix comments typos
    remove _slave_set_ns_maddrs underscore directly
    update bond_option_arp_validate_set() change logic
v3: use ndisc_mc_map to convert the mcast mac address (Jay Vosburgh)
v2: only add/del mcast group on backup slaves when arp_validate is set (Jay Vosburgh)
    arp_validate doesn't support 3ad, tlb, alb. So let's only do it on ab mode.
====================

Link: https://patch.msgid.link/20241111101650.27685-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-14 11:16:30 +01:00
Hangbin Liu
86fb6173d1 selftests: bonding: add ns multicast group testing
Add a test to make sure the backup slaves join correct multicast group
when arp_validate enabled and ns_ip6_target is set. Here is the result:

TEST: arp_validate (active-backup ns_ip6_target arp_validate 0)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 1)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 2)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 3)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 4)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 5)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 6)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-14 11:16:28 +01:00
Hangbin Liu
8eb36164d1 bonding: add ns target multicast address to slave device
Commit 4598380f9c ("bonding: fix ns validation on backup slaves")
tried to resolve the issue where backup slaves couldn't be brought up when
receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
worked for drivers that receive all multicast messages, such as the veth
interface.

For standard drivers, the NS multicast message is silently dropped because
the slave device is not a member of the NS target multicast group.

To address this, we need to make the slave device join the NS target
multicast group, ensuring it can receive these IPv6 NS messages to validate
the slave’s status properly.

There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not support
   arp_validate), with backup slaves and slaves supporting multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
   need to be guarded by arp_validate.

Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-14 11:16:28 +01:00