Commit Graph

1072576 Commits

Author SHA1 Message Date
Mamatha Inamdar
be7be1c6c6 PCI: rpaphp: Add MODULE_DESCRIPTION
This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
(descriptions taken from Kconfig file).

Signed-off-by: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200924051343.16052.9571.stgit@localhost.localdomain
2022-02-03 21:45:45 +11:00
Julia Lawall
925f76c557 powerpc/spufs: adjust list element pointer type
Other uses of &gang->aff_list_head, eg in spufs_assert_affinity, indicate
that the list elements have type spu_context, not spu as used here.  Change
the type of tmp accordingly.

This has no impact on the execution, because tmp is not used in the body of
the loop.

Fixes: c5fc8d2a92 ("[CELL] cell: add placement computation for scheduling of affinity contexts")
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1588929176-28527-1-git-send-email-Julia.Lawall@inria.fr
2022-02-03 21:40:32 +11:00
Bhaskar Chowdhury
a1c4140933 powerpc/epapr: Fix parmeters typo
s/parmeters/parameters/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210320213932.22697-1-unixbhaskar@gmail.com
2022-02-03 21:35:56 +11:00
Fabiano Rosas
b53c861059 powerpc: Fix debug print in smp_setup_cpu_maps
When figuring out the number of threads, the debug message prints "1
thread" for the first iteration of the loop, instead of the actual
number of threads calculated from the length of the
"ibm,ppc-interrupt-server#s" property.

  * /cpus/PowerPC,POWER8@20...
    ibm,ppc-interrupt-server#s -> 1 threads <--- WRONG
    thread 0 -> cpu 0 (hard id 32)
    thread 1 -> cpu 1 (hard id 33)
    thread 2 -> cpu 2 (hard id 34)
    thread 3 -> cpu 3 (hard id 35)
    thread 4 -> cpu 4 (hard id 36)
    thread 5 -> cpu 5 (hard id 37)
    thread 6 -> cpu 6 (hard id 38)
    thread 7 -> cpu 7 (hard id 39)
  * /cpus/PowerPC,POWER8@28...
    ibm,ppc-interrupt-server#s -> 8 threads
    thread 0 -> cpu 8 (hard id 40)
    thread 1 -> cpu 9 (hard id 41)
    thread 2 -> cpu 10 (hard id 42)
    thread 3 -> cpu 11 (hard id 43)
    thread 4 -> cpu 12 (hard id 44)
    thread 5 -> cpu 13 (hard id 45)
    thread 6 -> cpu 14 (hard id 46)
    thread 7 -> cpu 15 (hard id 47)
(...)

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210120181847.952106-1-farosas@linux.ibm.com
2022-02-03 20:24:59 +11:00
Michael Ellerman
961f649fb3 powerpc/ptdump: Fix sparse warning in hashpagetable.c
As reported by sparse:

  arch/powerpc/mm/ptdump/hashpagetable.c:264:29: warning: restricted __be64 degrades to integer
  arch/powerpc/mm/ptdump/hashpagetable.c:265:49: warning: restricted __be64 degrades to integer
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36: warning: incorrect type in assignment (different base types)
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36:    expected unsigned long long [usertype]
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36:    got restricted __be64 [usertype] v
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36: warning: incorrect type in assignment (different base types)
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36:    expected unsigned long long [usertype]
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36:    got restricted __be64 [usertype] r

The values returned by plpar_pte_read_4() are CPU endian, not __be64, so
assigning them to struct hash_pte confuses sparse. As a minimal fix open
code a struct to hold the values with CPU endian types.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202053039.691917-1-mpe@ellerman.id.au
2022-02-02 20:32:11 +11:00
Corentin Labbe
ccafe7c20b macintosh: macio_asic: remove useless cast for driver.name
pci_driver name is const char pointer, so the cast it not necessary.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125135421.4081740-1-clabbe@baylibre.com
2022-02-02 20:32:11 +11:00
Michael Ellerman
2e7f1e2b30 powerpc/64: Move paca allocation later in boot
Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size()
and paca allocation.

The first is that on a Radix capable machine but with "disable_radix" on
the command line, there is a window during early boot where
early_radix_enabled() is true, even though it will later become false.

  early_init_devtree:                       <- early_radix_enabled() = false
    early_init_dt_scan_cpus:                <- early_radix_enabled() = false
        ...
        check_cpu_pa_features:              <- early_radix_enabled() = false
        ...                               ^ <- early_radix_enabled() = TRUE
        allocate_paca:                    | <- early_radix_enabled() = TRUE
            ...                           |
            ppc64_bolted_size:            | <- early_radix_enabled() = TRUE
                if (early_radix_enabled())| <- early_radix_enabled() = TRUE
                    return ULONG_MAX;     |
        ...                               |
    ...                                   | <- early_radix_enabled() = TRUE
    ...                                   | <- early_radix_enabled() = TRUE
    mmu_early_init_devtree()              V
    ...                                     <- early_radix_enabled() = false

This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's
paca allocation, even though later it will return a different value.
This is not currently a bug because the paca allocation is also limited
by the RMA size, but that is very fragile.

The second issue is that when using the Hash MMU, when we call
ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet
detected whether 1T segments are available. That causes
ppc64_bolted_size() to return 256MB, even if the machine can actually
support up to 1T. This is usually OK, we generally have space below
256MB for one paca, but for a kdump kernel placed above 256MB it causes
the boot to fail.

At boot we cannot discover all the features of the machine
instantaneously, so there will always be some periods where we have
incomplete knowledge of the system. However both the above problems stem
from the fact that we allocate the boot CPU's paca (and paca pointers
array) before we decide which MMU we are using, or discover its exact
features.

Moving the paca allocation slightly later still can solve both the
issues described above, and means for a normal boot we don't do any
permanent allocations until after we've discovered the MMU.

Note that although we move the boot CPU's paca allocation later, we
still have a temporary paca (boot_paca) accessible via r13, so code that
does read only access to paca fields is safe. The only risk is that some
code writes to the boot_paca, and that write will then be lost when we
switch away from the boot_paca later in early_setup().

The additional code that runs before the paca allocation is primarily
mmu_early_init_devtree(), which is scanning the device tree and
populating globals and cur_cpu_spec with MMU related flags. I do not see
any additional code that writes to paca fields.

[1]: https://lore.kernel.org/r/20211018084434.217772-2-sourabhjain@linux.ibm.com
[2]: https://lore.kernel.org/r/20211018084434.217772-3-sourabhjain@linux.ibm.com

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124130544.408675-1-mpe@ellerman.id.au
2022-02-02 20:32:11 +11:00
Maxim Kiselev
5ebb747492 powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
On board rev A, the network interface labels for the switch ports
written on the front panel are different than on rev B and later.

This patch fixes network interface names for the switch ports according
to labels that are written on the front panel of the board rev B.
They start from ETH3 and end at ETH10.

This patch also introduces a separate device tree for rev A.
The main device tree is supposed to cover rev B and later.

Fixes: e69eb0824d ("powerpc: dts: t1040rdb: add ports for Seville Ethernet switch")
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220121091447.3412907-1-bigunclemax@gmail.com
2022-02-02 20:32:10 +11:00
Laurent Dufour
eddaa9a402 powerpc/pseries: read the lpar name from the firmware
The LPAR name may be changed after the LPAR has been started in the HMC.
In that case lparstat command is not reporting the updated value because
it reads it from the device tree which is read at boot time.

However this value could be read from RTAS.

Adding this value in the /proc/powerpc/lparcfg output allows to read the
updated value.

However the hypervisor, like Qemu/KVM, may not support this RTAS
parameter. In that case the value reported in lparcfg is read from the
device tree and so is not updated accordingly.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
[mpe: Drop doc-comment syntax, change RTAS/DT to lower case, use of_root
      to fix missing of_node_put(), use of_property_read_string()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220106161339.74656-1-ldufour@linux.ibm.com
2022-02-02 20:32:10 +11:00
Thierry Reding
d5342fdd16 powerpc: dts: Fix some I2C unit addresses
The unit-address for the Maxim MAX1237 ADCs on XPedite5200 boards don't
match the value in the "reg" property and cause a DTC warning.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211220134036.683309-1-thierry.reding@gmail.com
2022-01-31 13:45:24 +11:00
Maxim Kiselev
17846485df powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
T1040RDB has two RTL8211E-VB phys which requires setting
of internal delays for correct work.

Changing the phy-connection-type property to `rgmii-id`
will fix this issue.

Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211230151123.1258321-1-bigunclemax@gmail.com
2022-01-31 13:45:24 +11:00
Tobias Waldekranz
f529edd1b6 powerpc/e500/qemu-e500: allow core to idle without waiting
This means an idle guest won't needlessly consume an entire core on
the host, waiting for work to show up.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220112112459.1033754-1-troglobit@gmail.com
2022-01-31 13:45:24 +11:00
Michal Suchanek
b2a6f60435 powerpc: add link stack flush mitigation status in debugfs.
The link stack flush status is not visible in debugfs. It can be enabled
even when count cache flush is disabled. Add separate file for its
status.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
[mpe: Update for change to link_stack_flush_type]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191127220959.6208-1-msuchanek@suse.de
2022-01-31 13:45:23 +11:00
Sachin Sant
279d1a72c0 powerpc/xive: Export XIVE IPI information for online-only processors.
Cédric pointed out that XIVE IPI information exported via sysfs
(debug/powerpc/xive) display empty lines for processors which are
not online.

Switch to using for_each_online_cpu() so that information is
displayed for online-only processors.

Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/164146703333.19039.10920919226094771665.sendpatchset@MacBook-Pro.local
2022-01-31 13:45:23 +11:00
Linus Torvalds
26291c54e1 Linux 5.17-rc2 v5.17-rc2 2022-01-30 15:37:07 +02:00
Linus Torvalds
c5fe9de790 Merge tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:

 - Drop an unused private data field in the AIC driver

 - Various fixes to the realtek-rtl driver

 - Make the GICv3 ITS driver compile again in !SMP configurations

 - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec

 - Yet another kfree/bitmap_free conversion

 - Various DT updates (Renesas, SiFive)

* tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples
  dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts
  dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support
  irqchip/gic-v3-its: Reset each ITS's BASERn register before probe
  irqchip/gic-v3-its: Fix build for !SMP
  irqchip/loongson-pch-ms: Use bitmap_free() to free bitmap
  irqchip/realtek-rtl: Service all pending interrupts
  irqchip/realtek-rtl: Fix off-by-one in routing
  irqchip/realtek-rtl: Map control data to virq
  irqchip/apple-aic: Drop unused ipi_hwirq field
2022-01-30 15:12:02 +02:00
Linus Torvalds
27a96c4feb Merge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Prevent accesses to the per-CPU cgroup context list from another CPU
   except the one it belongs to, to avoid list corruption

 - Make sure parent events are always woken up to avoid indefinite hangs
   in the traced workload

* tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix cgroup event list management
  perf: Always wake the parent event
2022-01-30 15:02:32 +02:00
Linus Torvalds
24f4db1f3a Merge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
 "Make sure the membarrier-rseq fence commands are part of the reported
  set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY"

* tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
2022-01-30 13:09:00 +02:00
Linus Torvalds
a96d3a5b15 Merge tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Add another Intel CPU model to the list of CPUs supporting the
   processor inventory unique number

 - Allow writing to MCE thresholding sysfs files again - a previous
   change had accidentally disabled it and no one noticed. Goes to show
   how much is this stuff used

* tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
  x86/MCE/AMD: Allow thresholding interface updates after init
2022-01-30 12:55:06 +02:00
Linus Torvalds
8dd71685dc Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "12 patches.

  Subsystems affected by this patch series: sysctl, binfmt, ia64, mm
  (memory-failure, folios, kasan, and psi), selftests, and ocfs2"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  ocfs2: fix a deadlock when commit trans
  jbd2: export jbd2_journal_[grab|put]_journal_head
  psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
  psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
  mm, kasan: use compare-exchange operation to set KASAN page tag
  kasan: test: fix compatibility with FORTIFY_SOURCE
  tools/testing/scatterlist: add missing defines
  mm: page->mapping folio->mapping should have the same offset
  memory-failure: fetch compound_head after pgmap_pfn_valid()
  ia64: make IA64_MCA_RECOVERY bool instead of tristate
  binfmt_misc: fix crash when load/unload module
  include/linux/sysctl.h: fix register_sysctl_mount_point() return type
2022-01-30 11:21:50 +02:00
Joseph Qi
ddf4b773aa ocfs2: fix a deadlock when commit trans
commit 6f1b228529 introduces a regression which can deadlock as
follows:

  Task1:                              Task2:
  jbd2_journal_commit_transaction     ocfs2_test_bg_bit_allocatable
  spin_lock(&jh->b_state_lock)        jbd_lock_bh_journal_head
  __jbd2_journal_remove_checkpoint    spin_lock(&jh->b_state_lock)
  jbd2_journal_put_journal_head
  jbd_lock_bh_journal_head

Task1 and Task2 lock bh->b_state and jh->b_state_lock in different
order, which finally result in a deadlock.

So use jbd2_journal_[grab|put]_journal_head instead in
ocfs2_test_bg_bit_allocatable() to fix it.

Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.com
Fixes: 6f1b228529 ("ocfs2: fix race between searching chunks and release journal_head from buffer_head")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Tested-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Joseph Qi
4cd1103d8c jbd2: export jbd2_journal_[grab|put]_journal_head
Patch series "ocfs2: fix a deadlock case".

This fixes a deadlock case in ocfs2.  We firstly export jbd2 symbols
jbd2_journal_[grab|put]_journal_head as preparation and later use them
in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the
deadlock.

This patch (of 2):

This exports symbols jbd2_journal_[grab|put]_journal_head, which will be
used outside modules, e.g.  ocfs2.

Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Suren Baghdasaryan
44585f7bc0 psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
When CONFIG_PROC_FS is disabled psi code generates the following
warnings:

  kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=]
      1364 | static const struct proc_ops psi_cpu_proc_ops = {
           |                              ^~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=]
      1355 | static const struct proc_ops psi_memory_proc_ops = {
           |                              ^~~~~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=]
      1346 | static const struct proc_ops psi_io_proc_ops = {
           |                              ^~~~~~~~~~~~~~~

Make definitions of these structures and related functions conditional
on CONFIG_PROC_FS config.

Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com
Fixes: 0e94682b73 ("psi: introduce psi monitor")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Suren Baghdasaryan
51e50fbd3e psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
When CONFIG_CGROUPS is disabled psi code generates the following
warnings:

  kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes]
      1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group,
           |                     ^~~~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes]
      1182 | void psi_trigger_destroy(struct psi_trigger *t)
           |      ^~~~~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes]
      1249 | __poll_t psi_trigger_poll(void **trigger_ptr,
           |          ^~~~~~~~~~~~~~~~

Change the declarations of these functions in the header to provide the
prototypes even when they are unused.

Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com
Fixes: 0e94682b73 ("psi: introduce psi monitor")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Peter Collingbourne
27fe73394a mm, kasan: use compare-exchange operation to set KASAN page tag
It has been reported that the tag setting operation on newly-allocated
pages can cause the page flags to be corrupted when performed
concurrently with other flag updates as a result of the use of
non-atomic operations.

Fix the problem by using a compare-exchange loop to update the tag.

Link: https://lkml.kernel.org/r/20220120020148.1632253-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I456b24a2b9067d93968d43b4bb3351c0cec63101
Fixes: 2813b9c029 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Marco Elver
09c6304e38 kasan: test: fix compatibility with FORTIFY_SOURCE
With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
dynamic checks using __builtin_object_size(ptr), which when failed will
panic the kernel.

Because the KASAN test deliberately performs out-of-bounds operations,
the kernel panics with FORTIFY_SOURCE, for example:

 | kernel BUG at lib/string_helpers.c:910!
 | invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
 | CPU: 1 PID: 137 Comm: kunit_try_catch Tainted: G    B             5.16.0-rc3+ #3
 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
 | RIP: 0010:fortify_panic+0x19/0x1b
 | ...
 | Call Trace:
 |  kmalloc_oob_in_memset.cold+0x16/0x16
 |  ...

Fix it by also hiding `ptr` from the optimizer, which will ensure that
__builtin_object_size() does not return a valid size, preventing
fortified string functions from panicking.

Link: https://lkml.kernel.org/r/20220124160744.1244685-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nico Pache <npache@redhat.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Maor Gottlieb
0226bd64da tools/testing/scatterlist: add missing defines
The cited commits replaced preemptible with pagefault_disabled and
flush_kernel_dcache_page with flush_dcache_page respectively, hence need
to update the corresponding defines in the test.

  scatterlist.c: In function ‘sg_miter_stop’:
  scatterlist.c:919:4: warning: implicit declaration of function ‘flush_dcache_page’ [-Wimplicit-function-declaration]
      flush_dcache_page(miter->page);
      ^~~~~~~~~~~~~~~~~
  In file included from linux/scatterlist.h:8:0,
                   from scatterlist.c:9:
  scatterlist.c:922:18: warning: implicit declaration of function ‘pagefault_disabled’ [-Wimplicit-function-declaration]
      WARN_ON_ONCE(!pagefault_disabled());
                    ^
  linux/mm.h:23:25: note: in definition of macro ‘WARN_ON_ONCE’
    int __ret_warn_on = !!(condition);                      \
                           ^~~~~~~~~

Link: https://lkml.kernel.org/r/20220118082105.1737320-1-maorg@nvidia.com
Fixes: 723aca2085 ("mm/scatterlist: replace the !preemptible warning in sg_miter_stop()")
Fixes: 0e84f5dbf8 ("scatterlist: replace flush_kernel_dcache_page with flush_dcache_page")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Wei Yang
536f4217ce mm: page->mapping folio->mapping should have the same offset
As with the other members of folio, the offset of page->mapping and
folio->mapping must be the same.  The compile-time check was
inadvertently removed during development.  Add it back.

[willy@infradead.org: changelog redo]

Link: https://lkml.kernel.org/r/20220104011734.21714-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Joao Martins
61e28cf054 memory-failure: fetch compound_head after pgmap_pfn_valid()
memory_failure_dev_pagemap() at the moment assumes base pages (e.g.
dax_lock_page()).  For devmap with compound pages fetch the
compound_head in case a tail page memory failure is being handled.

Currently this is a nop, but in the advent of compound pages in
dev_pagemap it allows memory_failure_dev_pagemap() to keep working.

Without this fix memory-failure handling (i.e.  MCEs on pmem) with
device-dax configured namespaces will regress (and crash).

Link: https://lkml.kernel.org/r/20211202204422.26777-2-joao.m.martins@oracle.com
Reported-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Randy Dunlap
dbecf9b8b8 ia64: make IA64_MCA_RECOVERY bool instead of tristate
In linux-next, IA64_MCA_RECOVERY uses the (new) function
make_task_dead(), which is not exported for use by modules.  Instead of
exporting it for one user, convert IA64_MCA_RECOVERY to be a bool
Kconfig symbol.

In a config file from "kernel test robot <lkp@intel.com>" for a
different problem, this linker error was exposed when
CONFIG_IA64_MCA_RECOVERY=m.

Fixes this build error:

  ERROR: modpost: "make_task_dead" [arch/ia64/kernel/mca_recovery.ko] undefined!

Link: https://lkml.kernel.org/r/20220124213129.29306-1-rdunlap@infradead.org
Fixes: 0e25498f8c ("exit: Add and use make_task_dead.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Tong Zhang
e7f1e8834b binfmt_misc: fix crash when load/unload module
We should unregister the table upon module unload otherwise something
horrible will happen when we load binfmt_misc module again.  Also note
that we should keep value returned by register_sysctl_mount_point() and
release it later, otherwise it will leak.

Also, per Christian's comment, to fully restore the old behavior that
won't break userspace the check(binfmt_misc_header) should be
eliminated.

To reproduce:
  modprobe binfmt_misc
  modprobe -r binfmt_misc
  modprobe binfmt_misc
  modprobe -r binfmt_misc
  modprobe binfmt_misc

resulting in

  modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory

and an unhappy kernel:

  binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
  binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
  BUG: unable to handle page fault for address: fffffbfff8004802
  Call Trace:
    init_misc_binfmt+0x2d/0x1000 [binfmt_misc]

Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com
Fixes: 3ba442d533 ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Co-developed-by: Christian Brauner<brauner@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Andrew Morton
6cb917411e include/linux/sysctl.h: fix register_sysctl_mount_point() return type
The CONFIG_SYSCTL=n stub returns the wrong type.

Fixes: ee9efac48a ("sysctl: add helper to register a sysctl mount point")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Thomas Gleixner
243d308037 Merge tag 'irqchip-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:

  - Drop an unused private data field in the AIC driver

  - Various fixes to the realtek-rtl driver

  - Make the GICv3 ITS driver compile again in !SMP configurations

  - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec

  - Yet another kfree/bitmap_free conversion

  - Various DT updates (Renesas, SiFive)

Link: https://lore.kernel.org/r/20220128174217.517041-1-maz@kernel.org
2022-01-29 21:03:20 +01:00
Linus Torvalds
f8c7e4ede4 Merge tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:

 - Fix compilation warnings in new mt7621 driver (Sergio Paracuellos)

 - Restore the sysfs "rom" file for VGA shadow ROMs, which was broken
   when converting "rom" to be a static attribute (Bjorn Helgaas)

* tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/sysfs: Find shadow ROM before static attribute initialization
  PCI: mt7621: Remove unused function pcie_rmw()
  PCI: mt7621: Drop of_match_ptr() to avoid unused variable
2022-01-29 19:05:47 +02:00
Linus Torvalds
4cd90083d3 Merge tag 'gpio-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
 "Two fixes for the gpio-simulator:

   - fix a bug with hogs not being set-up in gpio-sim when user-space
     sets the chip label to an empty string

   - include the gpio-sim documentation in the index"

* tag 'gpio-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: sim: add doc file to index file
  gpio: sim: check the label length when setting up device properties
2022-01-29 15:45:33 +02:00
Linus Torvalds
e255759e5a Merge tag 'char-misc-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are two small char/misc driver fixes for 5.17-rc2 that fix some
  reported issues. They are:

   - fix up a merge issue in the at25.c driver that ended up dropping
     some lines in the driver. The removed lines ended being needed, so
     this restores it and the driver works again.

   - counter core fix where the wrong error was being returned, NULL
     should be the correct error for when memory is gone here, like the
     kmalloc() core does.

  Both of these have been in linux-next this week with no reported
  issues"

* tag 'char-misc-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  counter: fix an IS_ERR() vs NULL bug
  eeprom: at25: Restore missing allocation
2022-01-29 15:34:04 +02:00
Linus Torvalds
bb37101b36 Merge tag 'tty-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
 "Here are some small bug fixes and reverts for reported problems with
  the tty core and drivers. They include:

   - revert the fifo use for the 8250 console mode. It caused too many
     regressions and problems, and had a bug in it as well. This is
     being reworked and should show up in a later -rc1 release, but it's
     not ready for 5.17

   - rpmsg tty race fix

   - restore the cyclades.h uapi header file. Turns out a compiler test
     suite used it for some unknown reason. Bring it back just for the
     parts that are used by the builder test so they continue to build.
     No functionality is restored as no one actually has this hardware
     anymore, nor is it really tested.

   - stm32 driver fixes

   - n_gsm flow control fixes

   - pl011 driver fix

   - rs485 initialization fix

  All of these have been in linux-next this week with no reported
  problems"

* tag 'tty-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  kbuild: remove include/linux/cyclades.h from header file check
  serial: core: Initialize rs485 RTS polarity already on probe
  serial: pl011: Fix incorrect rs485 RTS polarity on set_mctrl
  serial: stm32: fix software flow control transfer
  serial: stm32: prevent TDR register overwrite when sending x_char
  tty: n_gsm: fix SW flow control encoding/handling
  serial: 8250: of: Fix mapped region size when using reg-offset property
  tty: rpmsg: Fix race condition releasing tty port
  tty: Partially revert the removal of the Cyclades public API
  tty: Add support for Brainboxes UC cards.
  Revert "tty: serial: Use fifo in 8250 console driver"
2022-01-29 15:23:13 +02:00
Linus Torvalds
44aa31a2bf Merge tag 'usb-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes for 5.17-rc2 that resolve a
  number of reported problems. These include:

   - typec driver fixes

   - xhci platform driver fixes for suspending

   - ulpi core fix

   - role.h build fix

   - new device ids

   - syzbot-reported bugfixes

   - gadget driver fixes

   - dwc3 driver fixes

   - other small fixes

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdnsp: Fix segmentation fault in cdns_lost_power function
  usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
  usb: gadget: at91_udc: fix incorrect print type
  usb: dwc3: xilinx: Fix error handling when getting USB3 PHY
  usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode
  usb: xhci-plat: fix crash when suspend if remote wake enable
  usb: common: ulpi: Fix crash in ulpi_match()
  usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
  ucsi_ccg: Check DEV_INT bit only when starting CCG4
  USB: core: Fix hang in usb_kill_urb by adding memory barriers
  usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
  usb: typec: tcpm: Do not disconnect when receiving VSAFE0V
  usb: typec: tcpm: Do not disconnect while receiving VBUS off
  usb: typec: Don't try to register component master without components
  usb: typec: Only attempt to link USB ports if there is fwnode
  usb: typec: tcpci: don't touch CC line if it's Vconn source
  usb: roles: fix include/linux/usb/role.h compile issue
2022-01-29 15:17:20 +02:00
Linus Torvalds
cb323ee75d Merge tag 'block-5.17-2022-01-28' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - NVMe pull request
      - add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs (Wu
        Zheng)
      - remove the unneeded ret variable in nvmf_dev_show (Changcheng
        Deng)

 - Fix for a hang regression introduced with a patch in the merge
   window, where low queue depth devices would not always get woken
   correctly (Laibin)

 - Small series fixing an IO accounting issue with bio backed dm devices
   (Mike, Yu)

* tag 'block-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
  dm: properly fix redundant bio-based IO accounting
  dm: revert partial fix for redundant bio-based IO accounting
  block: add bio_start_io_acct_time() to control start_time
  blk-mq: Fix wrong wakeup batch configuration which will cause hang
  nvme-fabrics: remove the unneeded ret variable in nvmf_dev_show
  nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
  blk-mq: fix missing blk_account_io_done() in error path
  block: fix memory leak in disk_register_independent_access_ranges
2022-01-29 15:01:08 +02:00
Linus Torvalds
3b58e9f3a3 Merge tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "Just two small fixes this time:

   - Fix a bug that can lead to node registration taking 1 second, when
     it should finish much quicker (Dylan)

   - Remove an unused argument from a function (Usama)"

* tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
  io_uring: remove unused argument from io_rsrc_node_alloc
  io_uring: fix bug in slow unregistering of nodes
2022-01-29 14:53:07 +02:00
Linus Torvalds
d66c1e79b9 Merge tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Fix VM debug warnings on boot triggered via __set_fixmap().

 - Fix a debug warning in the 64-bit Book3S PMU handling code.

 - Fix nested guest HFSCR handling with multiple vCPUs on Power9 or
   later.

 - Fix decrementer storm caused by a recent change, seen with some
   configs.

Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy,
Fabiano Rosas, Maxime Bizon, Nicholas Piggin, and Sachin Sant.

* tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/interrupt: Fix decrementer storm
  KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs
  powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
  powerpc/fixmap: Fix VM debug warning on unmap
2022-01-29 14:46:19 +02:00
Linus Torvalds
216e2aede2 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - Errata workarounds for Cortex-A510: broken hardware dirty bit
   management, detection code for the TRBE (tracing) bugs with the
   actual fixes going in via the CoreSight tree.

 - Cortex-X2 errata handling for TRBE (inheriting the workarounds from
   Cortex-A710).

 - Fix ex_handler_load_unaligned_zeropad() to use the correct struct
   members.

 - A couple of kselftest fixes for FPSIMD.

 - Silence the vdso "no previous prototype" warning.

 - Mark start_backtrace() notrace and NOKPROBE_SYMBOL.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: cpufeature: List early Cortex-A510 parts as having broken dbm
  kselftest/arm64: Correct logging of FPSIMD register read via ptrace
  kselftest/arm64: Skip VL_INHERIT tests for unsupported vector types
  arm64: errata: Add detection for TRBE trace data corruption
  arm64: errata: Add detection for TRBE invalid prohibited states
  arm64: errata: Add detection for TRBE ignored system register writes
  arm64: Add Cortex-A510 CPU part definition
  arm64: extable: fix load_unaligned_zeropad() reg indices
  arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL
  arm64: errata: Update ARM64_ERRATUM_[2119858|2224489] with Cortex-X2 ranges
  arm64: Add Cortex-X2 CPU part definition
  arm64: vdso: Fix "no previous prototype" warning
2022-01-29 08:57:22 +02:00
Linus Torvalds
d1e7f0919e Merge tag 'fixes-v5.17-lsm-ceph-null' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security sybsystem fix from James Morris:
 "Fix NULL pointer crash in LSM via Ceph, from Vivek Goyal"

* tag 'fixes-v5.17-lsm-ceph-null' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security, lsm: dentry_init_security() Handle multi LSM registration
2022-01-29 08:52:27 +02:00
Linus Torvalds
246e179d63 Merge tag 'docs-5.17-3' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
 "A few documentation fixes for 5.17"

* tag 'docs-5.17-3' of git://git.lwn.net/linux:
  docs/vm: Fix typo in *harden*
  Documentation: arm: marvell: Extend Avanta list
  docs: fix typo in Documentation/kernel-hacking/locking.rst
  docs: Hook the RTLA documents into the kernel docs build
2022-01-29 08:27:28 +02:00
Mike Snitzer
b879f915bc dm: properly fix redundant bio-based IO accounting
Record the start_time for a bio but defer the starting block core's IO
accounting until after IO is submitted using bio_start_io_acct_time().

This approach avoids the need to mess around with any of the
individual IO stats in response to a bio_split() that follows bio
submission.

Reported-by: Bud Brown <bubrown@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Depends-on: e45c47d1f9 ("block: add bio_start_io_acct_time() to control start_time")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-4-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-28 12:28:15 -07:00
Mike Snitzer
f524d9c95f dm: revert partial fix for redundant bio-based IO accounting
Reverts a1e1cb72d9 ("dm: fix redundant IO accounting for bios that
need splitting") because it was too narrow in scope (only addressed
redundant 'sectors[]' accounting and not ios, nsecs[], etc).

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-28 12:28:15 -07:00
Mike Snitzer
e45c47d1f9 block: add bio_start_io_acct_time() to control start_time
bio_start_io_acct_time() interface is like bio_start_io_acct() that
allows start_time to be passed in. This gives drivers the ability to
defer starting accounting until after IO is issued (but possibily not
entirely due to bio splitting).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-28 12:28:15 -07:00
Linus Torvalds
169387e2aa Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Sixteen patches, mostly minor fixes and updates; however there are
  substantive driver bug fixes in pm8001, bnx2fc, zfcp, myrs and qedf"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: myrs: Fix crash in error case
  scsi: 53c700: Remove redundant assignment to pointer SCp
  scsi: ufs: Treat link loss as fatal error
  scsi: ufs: Use generic error code in ufshcd_set_dev_pwr_mode()
  scsi: bfa: Remove useless DMA-32 fallback configuration
  scsi: hisi_sas: Remove useless DMA-32 fallback configuration
  scsi: 3w-sas: Remove useless DMA-32 fallback configuration
  scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
  scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
  scsi: pm8001: Fix bogus FW crash for maxcpus=1
  scsi: qedf: Change context reset messages to ratelimited
  scsi: qedf: Fix refcount issue when LOGO is received during TMF
  scsi: qedf: Add stag_work to all the vports
  scsi: ufs: ufshcd-pltfrm: Check the return value of devm_kstrdup()
  scsi: target: iscsi: Make sure the np under each tpg is unique
  scsi: elx: efct: Don't use GFP_KERNEL under spin lock
2022-01-28 21:17:58 +02:00
Linus Torvalds
073819e0ff Merge tag 'efi-urgent-for-v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - avoid UEFI v2.00+ runtime services on Apple Mac systems, as they have
   been reported to cause crashes, and most Macs claim to be EFI v1.10
   anyway

 - avoid a spurious boot time warning on arm64 systems with 64k pages

* tag 'efi-urgent-for-v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
  efi/libstub: arm64: Fix image check alignment at entry
2022-01-28 21:12:07 +02:00
Vivek Goyal
7f5056b9e7 security, lsm: dentry_init_security() Handle multi LSM registration
A ceph user has reported that ceph is crashing with kernel NULL pointer
dereference. Following is the backtrace.

/proc/version: Linux version 5.16.2-arch1-1 (linux@archlinux) (gcc (GCC)
11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Thu, 20 Jan 2022
16:18:29 +0000
distro / arch: Arch Linux / x86_64
SELinux is not enabled
ceph cluster version: 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)

relevant dmesg output:
[   30.947129] BUG: kernel NULL pointer dereference, address:
0000000000000000
[   30.947206] #PF: supervisor read access in kernel mode
[   30.947258] #PF: error_code(0x0000) - not-present page
[   30.947310] PGD 0 P4D 0
[   30.947342] Oops: 0000 [#1] PREEMPT SMP PTI
[   30.947388] CPU: 5 PID: 778 Comm: touch Not tainted 5.16.2-arch1-1 #1
86fbf2c313cc37a553d65deb81d98e9dcc2a3659
[   30.947486] Hardware name: Gigabyte Technology Co., Ltd. B365M
DS3H/B365M DS3H, BIOS F5 08/13/2019
[   30.947569] RIP: 0010:strlen+0x0/0x20
[   30.947616] Code: b6 07 38 d0 74 16 48 83 c7 01 84 c0 74 05 48 39 f7 75
ec 31 c0 31 d2 89 d6 89 d7 c3 48 89 f8 31 d2 89 d6 89 d7 c3 0
f 1f 40 00 <80> 3f 00 74 12 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 31
ff
[   30.947782] RSP: 0018:ffffa4ed80ffbbb8 EFLAGS: 00010246
[   30.947836] RAX: 0000000000000000 RBX: ffffa4ed80ffbc60 RCX:
0000000000000000
[   30.947904] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000000
[   30.947971] RBP: ffff94b0d15c0ae0 R08: 0000000000000000 R09:
0000000000000000
[   30.948040] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[   30.948106] R13: 0000000000000001 R14: ffffa4ed80ffbc60 R15:
0000000000000000
[   30.948174] FS:  00007fc7520f0740(0000) GS:ffff94b7ced40000(0000)
knlGS:0000000000000000
[   30.948252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.948308] CR2: 0000000000000000 CR3: 0000000104a40001 CR4:
00000000003706e0
[   30.948376] Call Trace:
[   30.948404]  <TASK>
[   30.948431]  ceph_security_init_secctx+0x7b/0x240 [ceph
49f9c4b9bf5be8760f19f1747e26da33920bce4b]
[   30.948582]  ceph_atomic_open+0x51e/0x8a0 [ceph
49f9c4b9bf5be8760f19f1747e26da33920bce4b]
[   30.948708]  ? get_cached_acl+0x4d/0xa0
[   30.948759]  path_openat+0x60d/0x1030
[   30.948809]  do_filp_open+0xa5/0x150
[   30.948859]  do_sys_openat2+0xc4/0x190
[   30.948904]  __x64_sys_openat+0x53/0xa0
[   30.948948]  do_syscall_64+0x5c/0x90
[   30.948989]  ? exc_page_fault+0x72/0x180
[   30.949034]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   30.949091] RIP: 0033:0x7fc7521e25bb
[   30.950849] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00
00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 0
0 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14
25

Core of the problem is that ceph checks for return code from
security_dentry_init_security() and if return code is 0, it assumes
everything is fine and continues to call strlen(name), which crashes.

Typically SELinux LSM returns 0 and sets name to "security.selinux" and
it is not a problem. Or if selinux is not compiled in or disabled, it
returns -EOPNOTSUP and ceph deals with it.

But somehow in this configuration, 0 is being returned and "name" is
not being initialized and that's creating the problem.

Our suspicion is that BPF LSM is registering a hook for
dentry_init_security() and returns hook default of 0.

LSM_HOOK(int, 0, dentry_init_security, struct dentry *dentry,...)

I have not been able to reproduce it just by doing CONFIG_BPF_LSM=y.
Stephen has tested the patch though and confirms it solves the problem
for him.

dentry_init_security() is written in such a way that it expects only one
LSM to register the hook. Atleast that's the expectation with current code.

If another LSM returns a hook and returns default, it will simply return
0 as of now and that will break ceph.

Hence, suggestion is that change semantics of this hook a bit. If there
are no LSMs or no LSM is taking ownership and initializing security context,
then return -EOPNOTSUP. Also allow at max one LSM to initialize security
context. This hook can't deal with multiple LSMs trying to init security
context. This patch implements this new behavior.

Reported-by: Stephen Muth <smuth4@gmail.com>
Tested-by: Stephen Muth <smuth4@gmail.com>
Suggested-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: <stable@vger.kernel.org> # 5.16.0
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
2022-01-28 10:53:26 -08:00