Commit Graph

142770 Commits

Author SHA1 Message Date
Linus Torvalds
06cff4a58e Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "The highlights this time are support for dynamically enabling and
  disabling Clang's Shadow Call Stack at boot and a long-awaited
  optimisation to the way in which we handle the SVE register state on
  system call entry to avoid taking unnecessary traps from userspace.

  Summary:

  ACPI:
   - Enable FPDT support for boot-time profiling
   - Fix CPU PMU probing to work better with PREEMPT_RT
   - Update SMMUv3 MSI DeviceID parsing to latest IORT spec
   - APMT support for probing Arm CoreSight PMU devices

  CPU features:
   - Advertise new SVE instructions (v2.1)
   - Advertise range prefetch instruction
   - Advertise CSSC ("Common Short Sequence Compression") scalar
     instructions, adding things like min, max, abs, popcount
   - Enable DIT (Data Independent Timing) when running in the kernel
   - More conversion of system register fields over to the generated
     header

  CPU misfeatures:
   - Workaround for Cortex-A715 erratum #2645198

  Dynamic SCS:
   - Support for dynamic shadow call stacks to allow switching at
     runtime between Clang's SCS implementation and the CPU's pointer
     authentication feature when it is supported (complete with scary
     DWARF parser!)

  Tracing and debug:
   - Remove static ftrace in favour of, err, dynamic ftrace!
   - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace
     and existing arch code
   - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the
     old FTRACE_WITH_REGS
   - Extend 'crashkernel=' parameter with default value and fallback to
     placement above 4G physical if initial (low) allocation fails

  SVE:
   - Optimisation to avoid disabling SVE unconditionally on syscall
     entry and just zeroing the non-shared state on return instead

  Exceptions:
   - Rework of undefined instruction handling to avoid serialisation on
     global lock (this includes emulation of user accesses to the ID
     registers)

  Perf and PMU:
   - Support for TLP filters in Hisilicon's PCIe PMU device
   - Support for the DDR PMU present in Amlogic Meson G12 SoCs
   - Support for the terribly-named "CoreSight PMU" architecture from
     Arm (and Nvidia's implementation of said architecture)

  Misc:
   - Tighten up our boot protocol for systems with memory above 52 bits
     physical
   - Const-ify static keys to satisty jump label asm constraints
   - Trivial FFA driver cleanups in preparation for v1.1 support
   - Export the kernel_neon_* APIs as GPL symbols
   - Harden our instruction generation routines against instrumentation
   - A bunch of robustness improvements to our arch-specific selftests
   - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits)
  arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
  arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
  arm64: Prohibit instrumentation on arch_stack_walk()
  arm64:uprobe fix the uprobe SWBP_INSN in big-endian
  arm64: alternatives: add __init/__initconst to some functions/variables
  arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
  kselftest/arm64: Allow epoll_wait() to return more than one result
  kselftest/arm64: Don't drain output while spawning children
  kselftest/arm64: Hold fp-stress children until they're all spawned
  arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
  arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
  arm64/sysreg: Convert MVFR2_EL1 to automatic generation
  arm64/sysreg: Convert MVFR1_EL1 to automatic generation
  arm64/sysreg: Convert MVFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
  ...
2022-12-12 09:50:05 -08:00
Linus Torvalds
7d62159919 Merge tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:

 - Drop unregister syscore from hyperv_cleanup to avoid hang (Gaurav
   Kohli)

 - Clean up panic path for Hyper-V framebuffer (Guilherme G. Piccoli)

 - Allow IRQ remapping to work without x2apic (Nuno Das Neves)

 - Fix comments (Olaf Hering)

 - Expand hv_vp_assist_page definition (Saurabh Sengar)

 - Improvement to page reporting (Shradha Gupta)

 - Make sure TSC clocksource works when Linux runs as the root partition
   (Stanislav Kinsburskiy)

* tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Remove unregister syscore call from Hyper-V cleanup
  iommu/hyper-v: Allow hyperv irq remapping without x2apic
  clocksource: hyper-v: Add TSC page support for root partition
  clocksource: hyper-v: Use TSC PFN getter to map vvar page
  clocksource: hyper-v: Introduce TSC PFN getter
  clocksource: hyper-v: Introduce a pointer to TSC page
  x86/hyperv: Expand definition of struct hv_vp_assist_page
  PCI: hv: update comment in x86 specific hv_arch_irq_unmask
  hv: fix comment typo in vmbus_channel/low_latency
  drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  video: hyperv_fb: Avoid taking busy spinlock on panic path
  hv_balloon: Add support for configurable order free page reporting
  mm/page_reporting: Add checks for page_reporting_order param
2022-12-12 09:34:16 -08:00
Linus Torvalds
43686598b5 Merge tag 'tpmdd-next-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
 "A random collection of TPM fixes and one bug fix for trusted keys"

* tag 'tpmdd-next-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: st33zp24: remove pointless checks on probe
  tpm/tpm_crb: Fix error message in __crb_relinquish_locality()
  tpm/tpm_ftpm_tee: Fix error handling in ftpm_mod_init()
  tpm: tpm_tis: Add the missed acpi_put_table() to fix memory leak
  tpm: tpm_crb: Add the missed acpi_put_table() to fix memory leak
  tpm: acpi: Call acpi_put_table() to fix memory leak
  tpm: Add flag to use default cancellation policy
  tpm: tis_i2c: Fix sanity check interrupt enable mask
  KEYS: trusted: tee: Make registered shm dependency explicit
  tpm: Avoid function type cast of put_device()
  tpm: st33zp24: switch to using gpiod API
  tpm: st33zp24: drop support for platform data
2022-12-12 09:24:26 -08:00
Linus Torvalds
893660b0e1 Merge tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:

 - SLOB deprecation and SLUB_TINY

   The SLOB allocator adds maintenance burden and stands in the way of
   API improvements [1]. Deprecate it by renaming the config option (to
   make users notice) to CONFIG_SLOB_DEPRECATED with updated help text.
   SLUB should be used instead as SLAB will be the next on the removal
   list.

   Based on reports from a riscv k210 board with 8MB RAM, add a
   CONFIG_SLUB_TINY option to minimize SLUB's memory usage at the
   expense of scalability. This has resolved the k210 regression [2] so
   in case there are no others (that wouldn't be resolvable by further
   tweaks to SLUB_TINY) plan is to remove SLOB in a few cycles.

   Existing defconfigs with CONFIG_SLOB are converted to
   CONFIG_SLUB_TINY.

 - kmalloc() slub_debug redzone improvements

   A series from Feng Tang that builds on the tracking or requested size
   for kmalloc() allocations (for caches with debugging enabled) added
   in 6.1, to make redzone checks consider the requested size and not
   the rounded up one, in order to catch more subtle buffer overruns.
   Includes new slub_kunit test.

 - struct slab fields reordering to accomodate larger rcu_head

   RCU folks would like to grow rcu_head with debugging options, which
   breaks current struct slab layout's assumptions, so reorganize it to
   make this possible.

 - Miscellaneous improvements/fixes:
     - __alloc_size checking compiler workaround (Kees Cook)
     - Optimize and cleanup SLUB's sysfs init (Rasmus Villemoes)
     - Make SLAB compatible with PROVE_RAW_LOCK_NESTING (Jiri Kosina)
     - Correct SLUB's percpu allocation estimates (Baoquan He)
     - Re-enableS LUB's run-time failslab sysfs control (Alexander Atanasov)
     - Make tools/vm/slabinfo more user friendly when not run as root (Rong Tao)
     - Dead code removal in SLUB (Hyeonggon Yoo)

* tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (31 commits)
  mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED
  mm, slub: don't aggressively inline with CONFIG_SLUB_TINY
  mm, slub: remove percpu slabs with CONFIG_SLUB_TINY
  mm, slub: split out allocations from pre/post hooks
  mm/slub, kunit: Add a test case for kmalloc redzone check
  mm/slub, kunit: add SLAB_SKIP_KFENCE flag for cache creation
  mm, slub: refactor free debug processing
  mm, slab: ignore SLAB_RECLAIM_ACCOUNT with CONFIG_SLUB_TINY
  mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY
  mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY
  mm, slub: retain no free slabs on partial list with CONFIG_SLUB_TINY
  mm, slub: disable SYSFS support with CONFIG_SLUB_TINY
  mm, slub: add CONFIG_SLUB_TINY
  mm, slab: ignore hardened usercopy parameters when disabled
  slab: Remove special-casing of const 0 size allocations
  slab: Clean up SLOB vs kmalloc() definition
  mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head
  mm/migrate: make isolate_movable_page() skip slab pages
  mm/slab: move and adjust kernel-doc for kmem_cache_alloc
  mm/slub, percpu: correct the calculation of early percpu allocation size
  ...
2022-12-12 09:13:06 -08:00
Linus Torvalds
98d0052d0d Merge tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:

 - Add NMI-safe SRCU reader API. It uses atomic_inc() instead of
   this_cpu_inc() on strong load-store architectures.

 - Introduce new console_list_lock to synchronize a manipulation of the
   list of registered consoles and their flags.

   This is a first step in removing the big-kernel-lock-like behavior of
   console_lock(). This semaphore still serializes console->write()
   calbacks against:

      - each other. It primary prevents potential races between early
        and proper console drivers using the same device.

      - suspend()/resume() callbacks and init() operations in some
        drivers.

      - various other operations in the tty/vt and framebufer
        susbsystems. It is likely that console_lock() serializes even
        operations that are not directly conflicting with the
        console->write() callbacks here. This is the most complicated
        big-kernel-lock aspect of the console_lock() that will be hard
        to untangle.

 - Introduce new console_srcu lock that is used to safely iterate and
   access the registered console drivers under SRCU read lock.

   This is a prerequisite for introducing atomic console drivers and
   console kthreads. It will reduce the complexity of serialization
   against normal consoles and console_lock(). Also it should remove the
   risk of deadlock during critical situations, like Oops or panic, when
   only atomic consoles are registered.

 - Check whether the console is registered instead of enabled on many
   locations. It was a historical leftover.

 - Cleanly force a preferred console in xenfb code instead of a dirty
   hack.

 - A lot of code and comment clean ups and improvements.

* tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (47 commits)
  printk: htmldocs: add missing description
  tty: serial: sh-sci: use setup() callback for early console
  printk: relieve console_lock of list synchronization duties
  tty: serial: kgdboc: use console_list_lock to trap exit
  tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()
  tty: serial: kgdboc: use console_list_lock for list traversal
  tty: serial: kgdboc: use srcu console list iterator
  proc: consoles: use console_list_lock for list iteration
  tty: tty_io: use console_list_lock for list synchronization
  printk, xen: fbfront: create/use safe function for forcing preferred
  netconsole: avoid CON_ENABLED misuse to track registration
  usb: early: xhci-dbc: use console_is_registered()
  tty: serial: xilinx_uartps: use console_is_registered()
  tty: serial: samsung_tty: use console_is_registered()
  tty: serial: pic32_uart: use console_is_registered()
  tty: serial: earlycon: use console_is_registered()
  tty: hvc: use console_is_registered()
  efi: earlycon: use console_is_registered()
  tty: nfcon: use console_is_registered()
  serial_core: replace uart_console_enabled() with uart_console_registered()
  ...
2022-12-12 09:01:36 -08:00
Linus Torvalds
73fa58dca8 Merge tag 'locks-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
 "The main change here is to add the new locks_inode_context helper, and
  convert all of the places that dereference inode->i_flctx directly to
  use that instead.

  There is a new helper to indicate whether any locks are held on an
  inode. This is mostly for Ceph but may be usable elsewhere too.

  Andi Kleen requested that we print the PID when the LOCK_MAND warning
  fires, to help track down applications trying to use it.

  Finally, we added some new warnings to some of the file locking
  functions that fire when the ->fl_file and filp arguments differ. This
  helped us find some long-standing bugs in lockd. Patches for those are
  in Chuck Lever's tree and should be in his v6.2 PR. After that patch,
  people using NFSv2/v3 locking may see some warnings fire until those
  go in.

  Happy Holidays!"

* tag 'locks-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  Add process name and pid to locks warning
  nfsd: use locks_inode_context helper
  nfs: use locks_inode_context helper
  lockd: use locks_inode_context helper
  ksmbd: use locks_inode_context helper
  cifs: use locks_inode_context helper
  ceph: use locks_inode_context helper
  filelock: add a new locks_inode_context accessor function
  filelock: new helper: vfs_inode_has_locks
  filelock: WARN_ON_ONCE when ->fl_file and filp don't match
2022-12-12 08:52:53 -08:00
Linus Torvalds
7fc035058e Merge tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
 "Most are small refactorings and bug fixes, but three things stand out:
  switching timens (which got reverted before) looks solid now,
  FOLL_FORCE has been removed (no failures seen yet across several weeks
  in -next), and some whitespace cleanups (which are long overdue).

   - Add timens support (when switching mm). This version has survived
     in -next for the entire cycle (Andrei Vagin)

   - Various small bug fixes, refactoring, and readability improvements
     (Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin)

   - Remove FOLL_FORCE for stack setup (Kees Cook)

   - Whitespace cleanups (Rolf Eike Beer, Kees Cook)"

* tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_misc: fix shift-out-of-bounds in check_special_flags
  binfmt: Fix error return code in load_elf_fdpic_binary()
  exec: Remove FOLL_FORCE for stack setup
  binfmt_elf: replace IS_ERR() with IS_ERR_VALUE()
  binfmt_elf: simplify error handling in load_elf_phdrs()
  binfmt_elf: fix documented return value for load_elf_phdrs()
  exec: simplify initial stack size expansion
  binfmt: Fix whitespace issues
  exec: Add comments on check_unsafe_exec() fs counting
  ELF uapi: add spaces before '{'
  selftests/timens: add a test for vfork+exit
  fs/exec: switch timens when a task gets a new mm
2022-12-12 08:42:29 -08:00
Linus Torvalds
667161ba0a Merge tag 'seccomp-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:

 - Add missing kerndoc parameter (Randy Dunlap)

 - Improve seccomp selftest to check CAP_SYS_ADMIN (Gautam Menghani)

 - Fix allocation leak when cloned thread immediately dies (Kuniyuki
   Iwashima)

* tag 'seccomp-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: document the "filter_count" field
  seccomp: Move copy_seccomp() to no failure path.
  selftests/seccomp: Check CAP_SYS_ADMIN capability in the test mode_filter_without_nnp
2022-12-12 08:34:05 -08:00
Linus Torvalds
059c4a341d Merge tag 'pstore-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
 "A small collection of bug fixes, refactorings, and general
  improvements:

   - Reporting improvements and return path fixes (Guilherme G. Piccoli,
     Wang Yufen, Kees Cook)

   - Clean up kmsg_bytes module parameter usage (Guilherme G. Piccoli)

   - Add Guilherme to pstore MAINTAINERS entry

   - Choose friendlier allocation flags (Qiujun Huang, Stephen Boyd)"

* tag 'pstore-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: Avoid kcore oops by vmap()ing with VM_IOREMAP
  pstore/ram: Fix error return code in ramoops_probe()
  pstore: Alert on backend write error
  MAINTAINERS: Update pstore maintainers
  pstore/ram: Set freed addresses to NULL
  pstore/ram: Move internal definitions out of kernel-wide include
  pstore/ram: Move pmsg init earlier
  pstore/ram: Consolidate kfree() paths
  efi: pstore: Follow convention for the efi-pstore backend name
  pstore: Inform unregistered backend names as well
  pstore: Expose kmsg_bytes as a module parameter
  pstore: Improve error reporting in case of backend overlap
  pstore/zone: Use GFP_ATOMIC to allocate zone buffer
2022-12-12 08:31:13 -08:00
Linus Torvalds
1fab45ab6e Merge tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:

 - Documentation updates. This is the second in a series from an ongoing
   review of the RCU documentation.

 - Miscellaneous fixes.

 - Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU
   that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument
   CPU lists, causes call_rcu() to introduce delays.

   These delays result in significant power savings on nearly idle
   Android and ChromeOS systems. These savings range from a few percent
   to more than ten percent.

   This series also includes several commits that change call_rcu() to a
   new call_rcu_hurry() function that avoids these delays in a few
   cases, for example, where timely wakeups are required. Several of
   these are outside of RCU and thus have acks and reviews from the
   relevant maintainers.

 - Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe()
   for architectures that support NMIs, but which do not provide
   NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required
   by the upcoming lockless printk() work by John Ogness et al.

 - Changes providing minor but important increases in torture test
   coverage for the new RCU polled-grace-period APIs.

 - Changes to torturescript that avoid redundant kernel builds, thus
   providing about a 30% speedup for the torture.sh acceptance test.

* tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
  net: devinet: Reduce refcount before grace period
  net: Use call_rcu_hurry() for dst_release()
  workqueue: Make queue_rcu_work() use call_rcu_hurry()
  percpu-refcount: Use call_rcu_hurry() for atomic switch
  scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
  rcu/rcutorture: Use call_rcu_hurry() where needed
  rcu/rcuscale: Use call_rcu_hurry() for async reader test
  rcu/sync: Use call_rcu_hurry() instead of call_rcu
  rcuscale: Add laziness and kfree tests
  rcu: Shrinker for lazy rcu
  rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
  rcu: Make call_rcu() lazy to save power
  rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC
  srcu: Debug NMI safety even on archs that don't require it
  srcu: Explain the reason behind the read side critical section on GP start
  srcu: Warn when NMI-unsafe API is used in NMI
  arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
  arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
  rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()
  rcu-tasks: Make grace-period-age message human-readable
  ...
2022-12-12 07:47:15 -08:00
Linus Torvalds
4cee37b3a4 Merge tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "Nine hotfixes.

  Six for MM, three for other areas. Four of these patches address
  post-6.0 issues"

* tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  memcg: fix possible use-after-free in memcg_write_event_control()
  MAINTAINERS: update Muchun Song's email
  mm/gup: fix gup_pud_range() for dax
  mmap: fix do_brk_flags() modifying obviously incorrect VMAs
  mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
  tmpfs: fix data loss from failed fallocate
  kselftests: cgroup: update kmem test precision tolerance
  mm: do not BUG_ON missing brk mapping, because userspace can unmap it
  mailmap: update Matti Vaittinen's email address
2022-12-10 17:10:52 -08:00
Tejun Heo
4a7ba45b1a memcg: fix possible use-after-free in memcg_write_event_control()
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call.  As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file.  Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.

Prior to 347c4a8747 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses.  The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently dropped
the file type check with it allowing any file to slip through.  With the
invarients broken, the d_name and parent accesses can now race against
renames and removals of arbitrary files and cause use-after-free's.

Fix the bug by resurrecting the file type check in __file_cft().  Now that
cgroupfs is implemented through kernfs, checking the file operations needs
to go through a layer of indirection.  Instead, let's check the superblock
and dentry type.

Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org
Fixes: 347c4a8747 ("memcg: remove cgroup_event->cft")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>	[3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-09 18:41:17 -08:00
David Hildenbrand
630dc25e43 mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
We use "unsigned long" to store a PFN in the kernel and phys_addr_t to
store a physical address.

On a 64bit system, both are 64bit wide.  However, on a 32bit system, the
latter might be 64bit wide.  This is, for example, the case on x86 with
PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans
32bit.

The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses
that case, and assumes that the maximum PFN is limited by an 32bit
phys_addr_t.  This implies, that SWP_PFN_BITS will currently only be able
to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong.

Let's rely on the number of bits in phys_addr_t instead, but make sure to
not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in
is_pfn_swap_entry() unhappy.  Note that swp_entry_t is effectively an
unsigned long and the maximum swap offset shares that value with the swap
type.

For example, on an 8 GiB x86 PAE system with a kernel config based on
Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently
fail removing migration entries (remove_migration_ptes()), because
mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as
swp_offset_pfn() wrongly masks off PFN bits.  For example,
split_huge_page_to_list()->...->remap_page() will leave migration entries
in place and continue to unlock the page.

Later, when we stumble over these migration entries (e.g., via
/proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these
migration entries shouldn't exist anymore and the page was unlocked.

[   33.067591] kernel BUG at include/linux/swapops.h:497!
[   33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G            E      6.1.0-rc8+ #16
[   33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[   33.067606] EIP: pagemap_pmd_range+0x644/0x650
[   33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31
[   33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000
[   33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68
[   33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
[   33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0
[   33.067625] Call Trace:
[   33.067628]  ? madvise_free_pte_range+0x720/0x720
[   33.067632]  ? smaps_pte_range+0x4b0/0x4b0
[   33.067634]  walk_pgd_range+0x325/0x720
[   33.067637]  ? mt_find+0x1d6/0x3a0
[   33.067641]  ? mt_find+0x1d6/0x3a0
[   33.067643]  __walk_page_range+0x164/0x170
[   33.067646]  walk_page_range+0xf9/0x170
[   33.067648]  ? __kmem_cache_alloc_node+0x2a8/0x340
[   33.067653]  pagemap_read+0x124/0x280
[   33.067658]  ? default_llseek+0x101/0x160
[   33.067662]  ? smaps_account+0x1d0/0x1d0
[   33.067664]  vfs_read+0x90/0x290
[   33.067667]  ? do_madvise.part.0+0x24b/0x390
[   33.067669]  ? debug_smp_processor_id+0x12/0x20
[   33.067673]  ksys_pread64+0x58/0x90
[   33.067675]  __ia32_sys_ia32_pread64+0x1b/0x20
[   33.067680]  __do_fast_syscall_32+0x4c/0xc0
[   33.067683]  do_fast_syscall_32+0x29/0x60
[   33.067686]  do_SYSENTER_32+0x15/0x20
[   33.067689]  entry_SYSENTER_32+0x98/0xf1

Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it
readable and consistent.

[david@redhat.com: rely on sizeof(phys_addr_t) and min_t() instead]
  Link: https://lkml.kernel.org/r/20221206105737.69478-1-david@redhat.com
[david@redhat.com: use "int" for comparison, as we're only comparing numbers < 64]
  Link: https://lkml.kernel.org/r/1f157500-2676-7cef-a84e-9224ed64e540@redhat.com
Link: https://lkml.kernel.org/r/20221205150857.167583-1-david@redhat.com
Fixes: 0d206b5d2e ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-09 18:41:16 -08:00
Linus Torvalds
010b6761a9 Merge tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, can and netfilter.

  Current release - new code bugs:

   - bonding: ipv6: correct address used in Neighbour Advertisement
     parsing (src vs dst typo)

   - fec: properly scope IRQ coalesce setup during link up to supported
     chips only

  Previous releases - regressions:

   - Bluetooth fixes for fake CSR clones (knockoffs):
       - re-add ERR_DATA_REPORTING quirk
       - fix crash when device is replugged

   - Bluetooth:
       - silence a user-triggerable dmesg error message
       - L2CAP: fix u8 overflow, oob access
       - correct vendor codec definition
       - fix support for Read Local Supported Codecs V2

   - ti: am65-cpsw: fix RGMII configuration at SPEED_10

   - mana: fix race on per-CQ variable NAPI work_done

  Previous releases - always broken:

   - af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
     avoid null-deref

   - af_can: fix NULL pointer dereference in can_rcv_filter

   - can: slcan: fix UAF with a freed work

   - can: can327: flush TX_work on ldisc .close()

   - macsec: add missing attribute validation for offload

   - ipv6: avoid use-after-free in ip6_fragment()

   - nft_set_pipapo: actually validate intervals in fields after the
     first one

   - mvneta: prevent oob access in mvneta_config_rss()

   - ipv4: fix incorrect route flushing when table ID 0 is used, or when
     source address is deleted

   - phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"

* tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
  net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
  s390/qeth: fix use-after-free in hsci
  macsec: add missing attribute validation for offload
  net: mvneta: Fix an out of bounds check
  net: thunderbolt: fix memory leak in tbnet_open()
  ipv6: avoid use-after-free in ip6_fragment()
  net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
  net: phy: mxl-gpy: add MDINT workaround
  net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
  xen/netback: don't call kfree_skb() under spin_lock_irqsave()
  dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
  ethernet: aeroflex: fix potential skb leak in greth_init_rings()
  tipc: call tipc_lxc_xmit without holding node_read_lock
  can: esd_usb: Allow REC and TEC to return to zero
  can: can327: flush TX_work on ldisc .close()
  can: slcan: fix freed work crash
  can: af_can: fix NULL pointer dereference in can_rcv_filter
  net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
  ipv4: Fix incorrect route flushing when table ID 0 is used
  ipv4: Fix incorrect route flushing when source address is deleted
  ...
2022-12-08 15:32:13 -08:00
Tejun Heo
fbf8321238 memcg: Fix possible use-after-free in memcg_write_event_control()
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call.  As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file.  Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.

Prior to 347c4a8747 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses.  The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently
dropped the file type check with it allowing any file to slip through.
With the invarients broken, the d_name and parent accesses can now race
against renames and removals of arbitrary files and cause
use-after-free's.

Fix the bug by resurrecting the file type check in __file_cft().  Now
that cgroupfs is implemented through kernfs, checking the file
operations needs to go through a layer of indirection.  Instead, let's
check the superblock and dentry type.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 347c4a8747 ("memcg: remove cgroup_event->cft")
Cc: stable@kernel.org # v3.14+
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-08 10:40:58 -08:00
Dmitry Torokhov
04593028d7 tpm: st33zp24: drop support for platform data
Drop support for platform data from the driver because there are no
users of st33zp24_platform_data structure in the mainline kernel.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2022-12-08 16:20:46 +00:00
Dave Wysochanski
b5b52de321 fscache: Fix oops due to race with cookie_lru and use_cookie
If a cookie expires from the LRU and the LRU_DISCARD flag is set, but
the state machine has not run yet, it's possible another thread can call
fscache_use_cookie and begin to use it.

When the cookie_worker finally runs, it will see the LRU_DISCARD flag
set, transition the cookie->state to LRU_DISCARDING, which will then
withdraw the cookie.  Once the cookie is withdrawn the object is removed
the below oops will occur because the object associated with the cookie
is now NULL.

Fix the oops by clearing the LRU_DISCARD bit if another thread uses the
cookie before the cookie_worker runs.

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  ...
  CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G     E    6.0.0-5.dneg.x86_64 #1
  Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
  Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
  RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
  ...
  Call Trace:
    netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
    process_one_work+0x217/0x3e0
    worker_thread+0x4a/0x3b0
    kthread+0xd6/0x100

Fixes: 12bb21a29c ("fscache: Implement cookie user counting and resource pinning")
Reported-by: Daire Byrne <daire.byrne@gmail.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Link: https://lore.kernel.org/r/20221117115023.1350181-1-dwysocha@redhat.com/ # v1
Link: https://lore.kernel.org/r/20221117142915.1366990-1-dwysocha@redhat.com/ # v2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-07 11:49:18 -08:00
Will Deacon
10162e78ea Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (21 commits)
  arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
  drivers/perf: hisi: Add TLP filter support
  Documentation: perf: Indent filter options list of hisi-pcie-pmu
  docs: perf: Fix PMU instance name of hisi-pcie-pmu
  drivers/perf: hisi: Fix some event id for hisi-pcie-pmu
  arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
  perf/amlogic: Remove unused header inclusions of  <linux/version.h>
  perf/amlogic: Fix build error for x86_64 allmodconfig
  dt-binding: perf: Add Amlogic DDR PMU
  docs/perf: Add documentation for the Amlogic G12 DDR PMU
  perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver
  MAINTAINERS: Update HiSilicon PMU maintainers
  perf: arm_cspmu: Fix module cyclic dependency
  perf: arm_cspmu: Fix build failure on x86_64
  perf: arm_cspmu: Fix modular builds due to missing MODULE_LICENSE()s
  perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
  perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  perf/smmuv3: Fix hotplug callback leak in arm_smmu_pmu_init()
  perf/arm_dmc620: Fix hotplug callback leak in dmc620_pmu_init()
  drivers: perf: marvell_cn10k: Fix hotplug callback leak in tad_pmu_init()
  ...
2022-12-06 11:22:48 +00:00
Will Deacon
a4aebff7ef Merge branch 'for-next/ftrace' into for-next/core
* for-next/ftrace:
  ftrace: arm64: remove static ftrace
  ftrace: arm64: move from REGS to ARGS
  ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
  ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()
  ftrace: pass fregs to arch_ftrace_set_direct_caller()
2022-12-06 11:07:39 +00:00
Will Deacon
5def4a9791 Merge branch 'for-next/ffa' into for-next/core
* for-next/ffa:
  firmware: arm_ffa: Move comment before the field it is documenting
  firmware: arm_ffa: Move constants to header file
2022-12-06 11:06:04 +00:00
Will Deacon
f6ffa4c8c1 Merge branch 'for-next/dynamic-scs' into for-next/core
* for-next/dynamic-scs:
  arm64: implement dynamic shadow call stack for Clang
  scs: add support for dynamic shadow call stacks
  arm64: unwind: add asynchronous unwind tables to kernel and modules
2022-12-06 11:01:49 +00:00
Linus Torvalds
2df2adc3e6 Merge tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Fix ambiguous TRIM and DISCARD args
   - Fix removal of debugfs file for mmc_test

  MMC host:
   - mtk-sd: Add missing clk_disable_unprepare() in an error path
   - sdhci: Fix I/O voltage switch delay for UHS-I SD cards
   - sdhci-esdhc-imx: Fix CQHCI exit halt state check
   - sdhci-sprd: Fix voltage switch"

* tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-sprd: Fix no reset data and command after voltage switch
  mmc: sdhci: Fix voltage switch delay
  mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse()
  mmc: mmc_test: Fix removal of debugfs file
  mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check
  mmc: core: Fix ambiguous TRIM and DISCARD arg
2022-12-02 15:58:07 -08:00
Linus Torvalds
bdaa78c6aa Merge tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
 "15 hotfixes,  11 marked cc:stable.

  Only three or four of the latter address post-6.0 issues, which is
  hopefully a sign that things are converging"

* tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
  Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
  drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
  mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
  mm/khugepaged: fix GUP-fast interaction by sending IPI
  mm/khugepaged: take the right locks for page table retraction
  mm: migrate: fix THP's mapcount on isolation
  mm: introduce arch_has_hw_nonleaf_pmd_young()
  mm: add dummy pmd_young() for architectures not having it
  mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
  tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
  nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
  hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
  madvise: use zap_page_range_single for madvise dontneed
  mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
2022-12-02 13:39:38 -08:00
Chethan T N
93df7d56f1 Bluetooth: Remove codec id field in vendor codec definition
As per the specfication vendor codec id is defined.
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2127

Fixes: 9ae664028a ("Bluetooth: Add support for Read Local Supported Codecs V2")
Signed-off-by: Chethan T N <chethan.tumkur.narayan@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:31 -08:00
Ismael Ferreras Morezuelas
42d7731e3e Bluetooth: btusb: Fix CSR clones again by re-adding ERR_DATA_REPORTING quirk
A patch series by a Qualcomm engineer essentially removed my
quirk/workaround because they thought it was unnecessary.

It wasn't, and it broke everything again:

https://patchwork.kernel.org/project/netdevbpf/list/?series=661703&archive=both&state=*

He argues that the quirk is not necessary because the code should check
if the dongle says if it's supported or not. The problem is that for
these Chinese CSR clones they say that it would work:

= New Index: 00:00:00:00:00:00 (Primary,USB,hci0)
= Open Index: 00:00:00:00:00:00
< HCI Command: Read Local Version Information (0x04|0x0001) plen 0
> HCI Event: Command Complete (0x0e) plen 12
> [hci0] 11.276039
      Read Local Version Information (0x04|0x0001) ncmd 1
        Status: Success (0x00)
        HCI version: Bluetooth 5.0 (0x09) - Revision 2064 (0x0810)
        LMP version: Bluetooth 5.0 (0x09) - Subversion 8978 (0x2312)
        Manufacturer: Cambridge Silicon Radio (10)
...
< HCI Command: Read Local Supported Features (0x04|0x0003) plen 0
> HCI Event: Command Complete (0x0e) plen 68
> [hci0] 11.668030
      Read Local Supported Commands (0x04|0x0002) ncmd 1
        Status: Success (0x00)
        Commands: 163 entries
          ...
          Read Default Erroneous Data Reporting (Octet 18 - Bit 2)
          Write Default Erroneous Data Reporting (Octet 18 - Bit 3)
          ...
...
< HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0
= Close Index: 00:1A:7D:DA:71:XX

So bring it back wholesale.

Fixes: 63b1a7dd38 ("Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING")
Fixes: e168f69008 ("Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR")
Fixes: 766ae2422b ("Bluetooth: hci_sync: Check LMP feature bit instead of quirk")
Cc: stable@vger.kernel.org
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Tested-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com>
Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:30 -08:00
Randy Dunlap
b9069728a7 seccomp: document the "filter_count" field
Add missing kernel-doc for the 'filter_count' field in struct seccomp.

seccomp.h:40: warning: Function parameter or member 'filter_count' not described in 'seccomp'

Fixes: c818c03b66 ("seccomp: Report number of loaded filters in /proc/$pid/status")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221202073953.14677-1-rdunlap@infradead.org
2022-12-02 11:33:48 -08:00
John Ogness
6f8836756f printk, xen: fbfront: create/use safe function for forcing preferred
With commit 9e124fe16ff2("xen: Enable console tty by default in domU
if it's not a dummy") a hack was implemented to make sure that the
tty console remains the console behind the /dev/console device. The
main problem with the hack is that, after getting the console pointer
to the tty console, it is assumed the pointer is still valid after
releasing the console_sem. This assumption is incorrect and unsafe.

Make the hack safe by introducing a new function
console_force_preferred_locked() and perform the full operation
under the console_list_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-33-john.ogness@linutronix.de
2022-12-02 11:25:02 +01:00
John Ogness
452b9b2475 serial_core: replace uart_console_enabled() with uart_console_registered()
All users of uart_console_enabled() really want to know if a console
is registered. It is not reliable to check for CON_ENABLED in order
to identify if a console is registered. Use console_is_registered()
instead.

A _locked() variant is provided because uart_set_options() is always
called with the console_list_lock held and must check if a console
is registered in order to synchronize with kgdboc.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-23-john.ogness@linutronix.de
2022-12-02 11:25:01 +01:00
John Ogness
1fd4224a6b console: introduce console_is_registered()
Currently it is not possible for drivers to detect if they have
already successfully registered their console. Several drivers
have multiple paths that lead to console registration. To avoid
attempting a 2nd registration (which leads to a WARN), drivers
are implementing their own solution.

Introduce console_is_registered() so drivers can easily identify
if their console is currently registered. A _locked() variant
is also provided if the caller is already holding the
console_list_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-22-john.ogness@linutronix.de
2022-12-02 11:25:01 +01:00
John Ogness
100bdef2c1 console: introduce wrappers to read/write console flags
After switching to SRCU for console list iteration, some readers
will begin readings console->flags as a data race. Locklessly
reading console->flags provides a consistent value because there
is at most one CPU modifying console->flags and that CPU is
using only read-modify-write operations.

Introduce a wrapper for SRCU iterators to read console flags.
Introduce a matching wrapper to write to flags of registered
consoles. Writing to flags of registered consoles is synchronized
by the console_list_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-13-john.ogness@linutronix.de
2022-12-02 11:25:00 +01:00
John Ogness
4dc64682ad printk: introduce console_list_lock
Currently there exist races in register_console(), where the types
of registered consoles are checked (without holding the console_lock)
and then after acquiring the console_lock, it is assumed that the list
has not changed. Also, some code that performs console_unregister()
make similar assumptions.

It might be possible to fix these races using the console_lock. But
it would require a complex analysis of all console drivers to make
sure that the console_lock is not taken in match() and setup()
callbacks. And we really prefer to split up and reduce the
responsibilities of console_lock rather than expand its complexity.
Therefore, introduce a new console_list_lock to provide full
synchronization for any console list changes.

In addition, also use console_list_lock for synchronization of
console->flags updates. All flags are either static or modified only
during the console registration. There are only two exceptions.

The first exception is CON_ENABLED, which is also modified by
console_start()/console_stop(). Therefore, these functions must
also take the console_list_lock.

The second exception is when the flags are modified by the console
driver init code before the console is registered. These will be
ignored because they are not visible to the rest of the system
via the console_drivers list.

Note that one of the various responsibilities of the console_lock is
also intended to provide console list and console->flags
synchronization. Later changes will update call sites relying on the
console_lock for these purposes. Once all call sites have been
updated, the console_lock will be relieved of synchronizing
console_list and console->flags updates.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/87sficwokr.fsf@jogness.linutronix.de
2022-12-02 11:25:00 +01:00
John Ogness
6c4afa7914 printk: Prepare for SRCU console list protection
Provide an NMI-safe SRCU protected variant to walk the console list.

Note that all console fields are now set before adding the console
to the list to avoid the console becoming visible by SCRU readers
before being fully initialized.

This is a preparatory change for a new console infrastructure which
operates independent of the console BKL.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-4-john.ogness@linutronix.de
2022-12-02 11:24:59 +01:00
Thomas Gleixner
d9a4af5690 printk: Convert console_drivers list to hlist
Replace the open coded single linked list with a hlist so a conversion
to SRCU protected list walks can reuse the existing primitives.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-3-john.ogness@linutronix.de
2022-12-02 11:24:59 +01:00
Will Deacon
c8e320b00a firmware: arm_ffa: Move comment before the field it is documenting
This is consistent with the other comments in the struct.

Co-developed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-3-qperret@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-12-01 15:51:32 +00:00
Will Deacon
229d58e316 firmware: arm_ffa: Move constants to header file
FF-A function IDs and error codes will be needed in the hypervisor too,
so move to them to the header file where they can be shared. Rename the
version constants with an "FFA_" prefix so that they are less likely
to clash with other code in the tree.

Co-developed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-2-qperret@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-12-01 15:51:31 +00:00
Florian Westphal
c25b7a7a56 inet: ping: use hlist_nulls rcu iterator during lookup
ping_lookup() does not acquire the table spinlock, so iteration should
use hlist_nulls_for_each_entry_rcu().

Spotted during code review.

Fixes: dbca1596bb ("ping: convert to RCU lookups, get rid of rwlock")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20221129140644.28525-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-01 12:42:46 +01:00
Vlastimil Babka
dc19745ad0 Merge branch 'slub-tiny-v1r6' into slab/for-next
Merge my series [1] to deprecate the SLOB allocator.
- Renames CONFIG_SLOB to CONFIG_SLOB_DEPRECATED with deprecation notice.
- The recommended replacement is CONFIG_SLUB, optionally with the new
  CONFIG_SLUB_TINY tweaks for systems with 16MB or less RAM.
- Use cases that stopped working with CONFIG_SLUB_TINY instead of SLOB
  should be reported to linux-mm@kvack.org and slab maintainers,
  otherwise SLOB will be removed in few cycles.

[1] https://lore.kernel.org/all/20221121171202.22080-1-vbabka@suse.cz/
2022-12-01 00:14:00 +01:00
Vlastimil Babka
0af8489b02 mm, slub: remove percpu slabs with CONFIG_SLUB_TINY
SLUB gets most of its scalability by percpu slabs. However for
CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
associated code. Additionally to the slab page savings, this reduces
percpu allocator usage, and code size.

This change builds on recent commit c7323a5ad0 ("mm/slub: restrict
sysfs validation to debug caches and make it safe"), as caches with
enabled debugging also avoid percpu slabs and all allocations and
freeing ends up working with the partial list. With a bit more
refactoring by the preceding patches, use the same code paths with
CONFIG_SLUB_TINY.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
2022-12-01 00:09:09 +01:00
Andrew Morton
1d351f1894 revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
It causes build failures with unusual CC/HOSTCC combinations.

Quoting
https://lkml.kernel.org/r/A222B1E6-69B8-4085-AD1B-27BDB72CA971@goldelico.com:

  HOSTCC  scripts/mod/modpost.o - due to target missing
In file included from include/linux/string.h:5,
                 from scripts/mod/../../include/linux/license.h:5,
                 from scripts/mod/modpost.c:24:
include/linux/compiler.h:246:10: fatal error: asm/rwonce.h: No such file or directory
  246 | #include <asm/rwonce.h>
      |          ^~~~~~~~~~~~~~
compilation terminated.

...

The problem is that HOSTCC is not necessarily the same compiler or even
architecture as CC and pulling in <linux/compiler.h> or <asm/rwonce.h>
files indirectly isn't a good idea then.

My toolchain is providing HOSTCC = gcc (MacPorts) and CC = arm-linux-gnueabihf
(built from gcc source) and all running on Darwin.

If I change the include to <string.h> I can then "HOSTCC scripts/mod/modpost.c"
but then it fails for "CC kernel/module/main.c" not finding <string.h>:

  CC      kernel/module/main.o - due to target missing
In file included from kernel/module/main.c:43:0:
./include/linux/license.h:5:20: fatal error: string.h: No such file or directory
 #include <string.h>
                    ^
compilation terminated.

Reported-by: "H. Nikolaus Schaller" <hns@goldelico.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:42 -08:00
Jann Horn
2ba99c5e08 mm/khugepaged: fix GUP-fast interaction by sending IPI
Since commit 70cbc3cc78 ("mm: gup: fix the fast GUP race against THP
collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
ensure that the page table was not removed by khugepaged in between.

However, lockless_pages_from_mm() still requires that the page table is
not concurrently freed.  Fix it by sending IPIs (if the architecture uses
semi-RCU-style page table freeing) before freeing/reusing page tables.

Link: https://lkml.kernel.org/r/20221129154730.2274278-2-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-2-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-2-jannh@google.com
Fixes: ba76149f47 ("thp: khugepaged")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:42 -08:00
Juergen Gross
4aaf269c76 mm: introduce arch_has_hw_nonleaf_pmd_young()
When running as a Xen PV guests commit eed9a328aa ("mm: x86: add
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in
pmdp_test_and_clear_young():

 BUG: unable to handle page fault for address: ffff8880083374d0
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0003) - permissions violation
 PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
 Oops: 0003 [#1] PREEMPT SMP NOPTI
 CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1
 RIP: e030:pmdp_test_and_clear_young+0x25/0x40

This happens because the Xen hypervisor can't emulate direct writes to
page table entries other than PTEs.

This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young()
similar to arch_has_hw_pte_young() and test that instead of
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG.

Link: https://lkml.kernel.org/r/20221123064510.16225-1-jgross@suse.com
Fixes: eed9a328aa ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Yu Zhao <yuzhao@google.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: David Hildenbrand <david@redhat.com>	[core changes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:41 -08:00
Juergen Gross
6617da8fb5 mm: add dummy pmd_young() for architectures not having it
In order to avoid #ifdeffery add a dummy pmd_young() implementation as a
fallback.  This is required for the later patch "mm: introduce
arch_has_hw_nonleaf_pmd_young()".

Link: https://lkml.kernel.org/r/fd3ac3cd-7349-6bbd-890a-71a9454ca0b3@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:41 -08:00
Mike Kravetz
04ada095dc hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
tables associated with the address range.  For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final.  However,
__unmap_hugepage_range_final assumes the passed vma is about to be removed
and deletes the vma_lock to prevent pmd sharing as the vma is on the way
out.  In the case of madvise(MADV_DONTNEED) the vma remains, but the
missing vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.

This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing.  Subsequent faults on this vma were confused as
VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping was
not set in new pages added to the page table.  This resulted in pages that
appeared anonymous in a VM_SHARED vma and triggered the BUG.

Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap
call from unmap_vmas().  This is used to indicate the 'final' unmapping of
a hugetlb vma.  When called via MADV_DONTNEED, this flag is not set and
the vm_lock is not deleted.

[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/

Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:40 -08:00
Mike Kravetz
21b85b0952 madvise: use zap_page_range_single for madvise dontneed
This series addresses the issue first reported in [1], and fully described
in patch 2.  Patches 1 and 2 address the user visible issue and are tagged
for stable backports.

While exploring solutions to this issue, related problems with mmu
notification calls were discovered.  This is addressed in the patch
"hugetlb: remove duplicate mmu notifications:".  Since there are no user
visible effects, this third is not tagged for stable backports.

Previous discussions suggested further cleanup by removing the
routine zap_page_range.  This is possible because zap_page_range_single
is now exported, and all callers of zap_page_range pass ranges entirely
within a single vma.  This work will be done in a later patch so as not
to distract from this bug fix.

[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/


This patch (of 2):

Expose the routine zap_page_range_single to zap a range within a single
vma.  The madvise routine madvise_dontneed_single_vma can use this routine
as it explicitly operates on a single vma.  Also, update the mmu
notification range in zap_page_range_single to take hugetlb pmd sharing
into account.  This is required as MADV_DONTNEED supports hugetlb vmas.

Link: https://lkml.kernel.org/r/20221114235507.294320-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20221114235507.294320-2-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:40 -08:00
Yang Shi
dec1d352de mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
Syzbot reported the below splat:

WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node
include/linux/gfp.h:221 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221
hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221
alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Modules linked in:
CPU: 1 PID: 3646 Comm: syz-executor210 Not tainted
6.1.0-rc1-syzkaller-00454-ga70385240892 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 10/11/2022
RIP: 0010:__alloc_pages_node include/linux/gfp.h:221 [inline]
RIP: 0010:hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
RIP: 0010:alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Code: e5 01 4c 89 ee e8 6e f9 ae ff 4d 85 ed 0f 84 28 fc ff ff e8 70 fc
ae ff 48 8d 6b ff 4c 8d 63 07 e9 16 fc ff ff e8 5e fc ae ff <0f> 0b e9
96 fa ff ff 41 bc 1a 00 00 00 e9 86 fd ff ff e8 47 fc ae
RSP: 0018:ffffc90003fdf7d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888077f457c0 RSI: ffffffff81cd8f42 RDI: 0000000000000001
RBP: ffff888079388c0c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f6b48ccf700(0000) GS:ffff8880b9b00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b48a819f0 CR3: 00000000171e7000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 collapse_file+0x1ca/0x5780 mm/khugepaged.c:1715
 hpage_collapse_scan_file+0xd6c/0x17a0 mm/khugepaged.c:2156
 madvise_collapse+0x53a/0xb40 mm/khugepaged.c:2611
 madvise_vma_behavior+0xd0a/0x1cc0 mm/madvise.c:1066
 madvise_walk_vmas+0x1c7/0x2b0 mm/madvise.c:1240
 do_madvise.part.0+0x24a/0x340 mm/madvise.c:1419
 do_madvise mm/madvise.c:1432 [inline]
 __do_sys_madvise mm/madvise.c:1432 [inline]
 __se_sys_madvise mm/madvise.c:1430 [inline]
 __x64_sys_madvise+0x113/0x150 mm/madvise.c:1430
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f6b48a4eef9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 15 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6b48ccf318 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f6b48af0048 RCX: 00007f6b48a4eef9
RDX: 0000000000000019 RSI: 0000000000600003 RDI: 0000000020000000
RBP: 00007f6b48af0040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6b48aa53a4
R13: 00007f6b48bffcbf R14: 00007f6b48ccf400 R15: 0000000000022000
 </TASK>

It is because khugepaged allocates pages with __GFP_THISNODE, but the
preferred node is bogus.  The previous patch fixed the khugepaged code to
avoid allocating page from non-existing node.  But it is still racy
against memory hotremove.  There is no synchronization with the memory
hotplug so it is possible that memory gets offline during a longer taking
scanning.

So this warning still seems not quite helpful because:
  * There is no guarantee the node is online for __GFP_THISNODE context
    for all the callsites.
  * Kernel just fails the allocation regardless the warning, and it looks
    all callsites handle the allocation failure gracefully.

Although while the warning has helped to identify a buggy code, it is not
safe in general and this warning could panic the system with panic-on-warn
configuration which tends to be used surprisingly often.  So replace
VM_WARN_ON to pr_warn().  And the warning will be triggered if
__GFP_NOWARN is set since the allocator would print out warning for such
case if __GFP_NOWARN is not set.

[shy828301@gmail.com: rename nid to this_node and gfp to warn_gfp]
  Link: https://lkml.kernel.org/r/20221123193014.153983-1-shy828301@gmail.com
[akpm@linux-foundation.org: fix whitespace]
[akpm@linux-foundation.org: print gfp_mask instead of warn_gfp, per Michel]
Link: https://lkml.kernel.org/r/20221108184357.55614-3-shy828301@gmail.com
Fixes: 7d8faaf155 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reported-by: <syzbot+0044b22d177870ee974f@syzkaller.appspotmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:40 -08:00
Paul E. McKenney
87492c06e6 Merge branches 'doc.2022.10.20a', 'fixes.2022.10.21a', 'lazy.2022.11.30a', 'srcunmisafe.2022.11.09a', 'torture.2022.10.18c' and 'torturescript.2022.10.20a' into HEAD
doc.2022.10.20a: Documentation updates.
fixes.2022.10.21a: Miscellaneous fixes.
lazy.2022.11.30a: Lazy call_rcu() and NOCB updates.
srcunmisafe.2022.11.09a: NMI-safe SRCU readers.
torture.2022.10.18c: Torture-test updates.
torturescript.2022.10.20a: Torture-test scripting updates.
2022-11-30 13:20:05 -08:00
Jeff Layton
401a8b8fd5 filelock: add a new locks_inode_context accessor function
There are a number of places in the kernel that are accessing the
inode->i_flctx field without smp_load_acquire. This is required to
ensure that the caller doesn't see a partially-initialized structure.

Add a new accessor function for it to make this clear and convert all of
the relevant accesses in locks.c to use it. Also, convert
locks_free_lock_context to use the helper as well instead of just doing
a "bare" assignment.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30 05:08:10 -05:00
Jeff Layton
ab1ddef98a filelock: new helper: vfs_inode_has_locks
Ceph has a need to know whether a particular inode has any locks set on
it. It's currently tracking that by a num_locks field in its
filp->private_data, but that's problematic as it tries to decrement this
field when releasing locks and that can race with the file being torn
down.

Add a new vfs_inode_has_locks helper that just returns whether any locks
are currently held on the inode.

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30 05:06:06 -05:00
Paul E. McKenney
0cd7e350ab rcu: Make SRCU mandatory
Kernels configured with CONFIG_PRINTK=n and CONFIG_SRCU=n get build
failures.  This causes trouble for deep embedded systems.  But given
that there are more than 25 instances of "select SRCU" in the kernel,
it is hard to believe that there are many kernels running in production
without SRCU.  This commit therefore makes SRCU mandatory.  The SRCU
Kconfig option remains for backwards compatibility, and will be removed
when it is no longer used.

[ paulmck: Update per kernel test robot feedback. ]

Reported-by: John Ogness <john.ogness@linutronix.de>
Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: John Ogness <john.ogness@linutronix.de>
2022-11-29 15:00:06 -08:00
Joel Fernandes (Google)
3cb278e73b rcu: Make call_rcu() lazy to save power
Implement timer-based RCU callback batching (also known as lazy
callbacks). With this we save about 5-10% of power consumed due
to RCU requests that happen when system is lightly loaded or idle.

By default, all async callbacks (queued via call_rcu) are marked
lazy. An alternate API call_rcu_hurry() is provided for the few users,
for example synchronize_rcu(), that need the old behavior.

The batch is flushed whenever a certain amount of time has passed, or
the batch on a particular CPU grows too big. Also memory pressure will
flush it in a future patch.

To handle several corner cases automagically (such as rcu_barrier() and
hotplug), we re-use bypass lists which were originally introduced to
address lock contention, to handle lazy CBs as well. The bypass list
length has the lazy CB length included in it. A separate lazy CB length
counter is also introduced to keep track of the number of lazy CBs.

[ paulmck: Fix formatting of inline call_rcu_lazy() definition. ]
[ paulmck: Apply Zqiang feedback. ]
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Suggested-by: Paul McKenney <paulmck@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-29 14:02:23 -08:00