Commit Graph

1215471 Commits

Author SHA1 Message Date
David Vernet
7d35eb1a18 bpf, docs: s/eBPF/BPF in standards documents
There isn't really anything other than just "BPF" at this point,
so referring to it as "eBPF" in our standards document just causes
unnecessary confusion. Let's just be consistent and use "BPF".

Suggested-by: Will Hawkins <hawkinsw@obs.cr>
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230828155948.123405-4-void@manifault.com
2023-08-30 16:36:10 +02:00
David Vernet
deb8840725 bpf, docs: Add abi.rst document to standardization subdirectory
As specified in the IETF BPF charter, the BPF working group has plans to
add one or more informational documents that recommend conventions and
guidelines for producing portable BPF program binaries. The
instruction-set.rst document currently contains a "Registers and calling
convention" subsection which dictates a calling convention that belongs
in an ABI document, rather than an instruction set document. Let's move
it to a new abi.rst document so we can clean it up. The abi.rst document
will of course be significantly changed and expanded upon over time. For
now, it's really just a placeholder which will contain ABI-specific
language that doesn't belong in other documents.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230828155948.123405-3-void@manifault.com
2023-08-30 16:35:58 +02:00
David Vernet
aee1720eeb bpf, docs: Move linux-notes.rst to root bpf docs tree
In commit 4d496be9ca ("bpf,docs: Create new standardization
subdirectory"), I added a standardization/ directory to the BPF
documentation, which will contain the docs that will be standardized
as part of the effort with the IETF.

I included linux-notes.rst in that directory, but I shouldn't have. It
doesn't contain anything that will be standardized. Let's move it back
to Documentation/bpf.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230828155948.123405-2-void@manifault.com
2023-08-30 16:35:44 +02:00
Dr. David Alan Gilbert
f3a9b3758e fs/jfs: Use common ucs2 upper case table
Use the UCS-2 upper case tables from nls, that are shared
with smb.

This code in JFS is hard to test, so we're only reusing the
same tables (which are identical), not trying to reuse the
rest of the helper functions.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-30 08:55:52 -05:00
Dr. David Alan Gilbert
de54845290 fs/smb/client: Use common code in client
Now we've got the common code, use it for the client as well.
Note there's a change here where we're using the server version of
UniStrcat now which had different types (__le16 vs wchar_t) but
it's not interpreting the value other than checking for 0, however
we do need casts to keep sparse happy.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-30 08:55:52 -05:00
Dr. David Alan Gilbert
089f7f5913 fs/smb: Swing unicode common code from smb->NLS
Swing most of the inline functions and unicode tables into nls
from the copy in smb/server.  This is UCS-2 rather than most
of the rest of the code in NLS, but it currently seems like the
best place for it.

The actual unicode.c implementations vary much more between server
and client so they're unmoved.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-30 08:55:51 -05:00
Dr. David Alan Gilbert
9e74938954 fs/smb: Remove unicode 'lower' tables
The unicode glue in smb/*/..uniupr.h has a section guarded
by 'ifndef UNIUPR_NOLOWER' - but that's always
defined in smb/*/..unicode.h.  Nuke those tables.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-30 08:55:51 -05:00
Steve French
b3773b19d4 SMB3: rename macro CIFS_SERVER_IS_CHAN to avoid confusion
Since older dialects such as CIFS do not support multichannel
the macro CIFS_SERVER_IS_CHAN can be confusing (it requires SMB 3
or later) so shorten its name to "SERVER_IS_CHAN"

Suggested-by: Tom Talpey <tom@talpey.com>
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-30 08:55:02 -05:00
Luke D. Jones
acce85a7dd platform/x86: asus-wmi: corrections to egpu safety check
An incorrect if statement was preventing the enablement of the egpu.

Fixes: d49f4d1a30 ("platform/x86: asus-wmi: don't allow eGPU switching if eGPU not connected")
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230830022908.36264-2-luke@ljones.dev
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-08-30 15:25:18 +02:00
Vadim Pasternak
06469a8dc3 platform/x86: mlx-platform: Add dependency on PCI to Kconfig
Add dependency on PCI to avoid 'mlx-platform' compilation error in case
CONFIG_PCI is not set.

Failed on i386:
CONFIG_ACPI=y
CONFIG_ISA=y

Error In function 'mlxplat_pci_fpga_device_init':
implicit declaration of function 'pci_request_region':
 6204 |         err = pci_request_region(pci_dev, 0, res_name);
      |               ^~~~~~~~~~~~~~~~~~
      |               pci_request_regions

Fixes: 1316e0af2d ("platform: mellanox: mlx-platform: Introduce ACPI init flow")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Michael Shych <michaelsh@nvidia.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230829133748.58208-2-vadimp@nvidia.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-08-30 15:13:41 +02:00
Christoph Hellwig
2dcdf8c18d dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA
It makes no sense to expose CONFIG_DMA_NUMA_CMA if CONFIG_NUMA is not
enabled, and random config options shouldn't be default unless there
is a good reason.  Replace the default NUMA with a depends on to fix both
issues.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <roin.murphy@arm.com>
2023-08-30 13:52:53 +02:00
Thomas Gleixner
2b8272ff4a cpu/hotplug: Prevent self deadlock on CPU hot-unplug
Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.

    CPU1      			                 	 CPU2

T1 sets cfs_quota
   starts hrtimer cfs_bandwidth 'period_timer'
T1 is migrated to CPU2				
						T1 initiates offlining of CPU1
Hotplug operation starts
  ...
'period_timer' expires and is re-enqueued on CPU1
  ...
take_cpu_down()
  CPU1 shuts down and does not handle timers
  anymore. They have to be migrated in the
  post dead hotplug steps by the control task.

						T1 runs the post dead offline operation
					      	T1 is scheduled out
						T1 waits for 'period_timer' to expire

T1 waits there forever if it is scheduled out before it can execute the hrtimer
offline callback hrtimers_dead_cpu().

Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.

Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
2023-08-30 12:24:22 +02:00
Paul Gortmaker
96c1fa04f0 tick/rcu: Fix false positive "softirq work is pending" messages
In commit 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") the
new function report_idle_softirq() was created by breaking code out of the
existing can_stop_idle_tick() for kernels v5.18 and newer.

In doing so, the code essentially went from a one conditional:

	if (a && b && c)
		warn();

to a three conditional:

	if (!a)
		return;
	if (!b)
		return;
	if (!c)
		return;
	warn();

But that conversion got the condition for the RT specific
local_bh_blocked() wrong. The original condition was:

   	!local_bh_blocked()

but the conversion failed to negate it so it ended up as:

        if (!local_bh_blocked())
		return false;

This issue lay dormant until another fixup for the same commit was added
in commit a7e282c777 ("tick/rcu: Fix bogus ratelimit condition").
This commit realized the ratelimit was essentially set to zero instead
of ten, and hence *no* softirq pending messages would ever be issued.

Once this commit was backported via linux-stable, both the v6.1 and v6.4
preempt-rt kernels started printing out 10 instances of this at boot:

  NOHZ tick-stop error: local softirq work is pending, handler #80!!!

Remove the negation and return when local_bh_blocked() evaluates to true to
bring the correct behaviour back.

Fixes: 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Wen Yang <wenyang.linux@foxmail.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230818200757.1808398-1-paul.gortmaker@windriver.com
2023-08-30 12:20:28 +02:00
Guo Ren
5195c35ac4 csky: Fixup compile error
Add header file for asmlinkage macro.

Error log:
In file included from arch/csky/include/asm/ptrace.h:7,
                 from arch/csky/include/asm/elf.h:6,
                 from include/linux/elf.h:6,
                 from kernel/extable.c:6:
arch/csky/include/asm/traps.h:43:11: error: expected ';' before 'void'
   43 | asmlinkage void do_trap_unknown(struct pt_regs *regs);
      |           ^~~~~

Fixes: c8171a86b2 ("csky: Fixup -Wmissing-prototypes warning")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
2023-08-30 05:54:47 -04:00
Jinjie Ruan
cd59cdefc2 rbd: use list_for_each_entry() helper
Convert list_for_each() to list_for_each_entry() so that the tmp
list_head pointer and list_entry() call are no longer needed, which
can reduce a few lines of code. No functional changed.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-30 11:51:55 +02:00
Huacai Chen
9d1785590b Merge tag 'md-next-20230814-resend' into loongarch-next
LoongArch architecture changes for 6.5 (raid5/6 optimization) depend on
the md changes to fix build and work, so merge them to create a base.
2023-08-30 17:35:54 +08:00
Sergey Senozhatsky
fb5a431559 dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
__dma_entry_alloc_check_leak() calls into printk -> serial console
output (qcom geni) and grabs port->lock under free_entries_lock
spin lock, which is a reverse locking dependency chain as qcom_geni
IRQ handler can call into dma-debug code and grab free_entries_lock
under port->lock.

Move __dma_entry_alloc_check_leak() call out of free_entries_lock
scope so that we don't acquire serial console's port->lock under it.

Trimmed-down lockdep splat:

 The existing dependency chain (in reverse order) is:

               -> #2 (free_entries_lock){-.-.}-{2:2}:
        _raw_spin_lock_irqsave+0x60/0x80
        dma_entry_alloc+0x38/0x110
        debug_dma_map_page+0x60/0xf8
        dma_map_page_attrs+0x1e0/0x230
        dma_map_single_attrs.constprop.0+0x6c/0xc8
        geni_se_rx_dma_prep+0x40/0xcc
        qcom_geni_serial_isr+0x310/0x510
        __handle_irq_event_percpu+0x110/0x244
        handle_irq_event_percpu+0x20/0x54
        handle_irq_event+0x50/0x88
        handle_fasteoi_irq+0xa4/0xcc
        handle_irq_desc+0x28/0x40
        generic_handle_domain_irq+0x24/0x30
        gic_handle_irq+0xc4/0x148
        do_interrupt_handler+0xa4/0xb0
        el1_interrupt+0x34/0x64
        el1h_64_irq_handler+0x18/0x24
        el1h_64_irq+0x64/0x68
        arch_local_irq_enable+0x4/0x8
        ____do_softirq+0x18/0x24
        ...

               -> #1 (&port_lock_key){-.-.}-{2:2}:
        _raw_spin_lock_irqsave+0x60/0x80
        qcom_geni_serial_console_write+0x184/0x1dc
        console_flush_all+0x344/0x454
        console_unlock+0x94/0xf0
        vprintk_emit+0x238/0x24c
        vprintk_default+0x3c/0x48
        vprintk+0xb4/0xbc
        _printk+0x68/0x90
        register_console+0x230/0x38c
        uart_add_one_port+0x338/0x494
        qcom_geni_serial_probe+0x390/0x424
        platform_probe+0x70/0xc0
        really_probe+0x148/0x280
        __driver_probe_device+0xfc/0x114
        driver_probe_device+0x44/0x100
        __device_attach_driver+0x64/0xdc
        bus_for_each_drv+0xb0/0xd8
        __device_attach+0xe4/0x140
        device_initial_probe+0x1c/0x28
        bus_probe_device+0x44/0xb0
        device_add+0x538/0x668
        of_device_add+0x44/0x50
        of_platform_device_create_pdata+0x94/0xc8
        of_platform_bus_create+0x270/0x304
        of_platform_populate+0xac/0xc4
        devm_of_platform_populate+0x60/0xac
        geni_se_probe+0x154/0x160
        platform_probe+0x70/0xc0
        ...

               -> #0 (console_owner){-...}-{0:0}:
        __lock_acquire+0xdf8/0x109c
        lock_acquire+0x234/0x284
        console_flush_all+0x330/0x454
        console_unlock+0x94/0xf0
        vprintk_emit+0x238/0x24c
        vprintk_default+0x3c/0x48
        vprintk+0xb4/0xbc
        _printk+0x68/0x90
        dma_entry_alloc+0xb4/0x110
        debug_dma_map_sg+0xdc/0x2f8
        __dma_map_sg_attrs+0xac/0xe4
        dma_map_sgtable+0x30/0x4c
        get_pages+0x1d4/0x1e4 [msm]
        msm_gem_pin_pages_locked+0x38/0xac [msm]
        msm_gem_pin_vma_locked+0x58/0x88 [msm]
        msm_ioctl_gem_submit+0xde4/0x13ac [msm]
        drm_ioctl_kernel+0xe0/0x15c
        drm_ioctl+0x2e8/0x3f4
        vfs_ioctl+0x30/0x50
        ...

 Chain exists of:
   console_owner --> &port_lock_key --> free_entries_lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(free_entries_lock);
                                lock(&port_lock_key);
                                lock(free_entries_lock);
   lock(console_owner);

                *** DEADLOCK ***

 Call trace:
  dump_backtrace+0xb4/0xf0
  show_stack+0x20/0x30
  dump_stack_lvl+0x60/0x84
  dump_stack+0x18/0x24
  print_circular_bug+0x1cc/0x234
  check_noncircular+0x78/0xac
  __lock_acquire+0xdf8/0x109c
  lock_acquire+0x234/0x284
  console_flush_all+0x330/0x454
  console_unlock+0x94/0xf0
  vprintk_emit+0x238/0x24c
  vprintk_default+0x3c/0x48
  vprintk+0xb4/0xbc
  _printk+0x68/0x90
  dma_entry_alloc+0xb4/0x110
  debug_dma_map_sg+0xdc/0x2f8
  __dma_map_sg_attrs+0xac/0xe4
  dma_map_sgtable+0x30/0x4c
  get_pages+0x1d4/0x1e4 [msm]
  msm_gem_pin_pages_locked+0x38/0xac [msm]
  msm_gem_pin_vma_locked+0x58/0x88 [msm]
  msm_ioctl_gem_submit+0xde4/0x13ac [msm]
  drm_ioctl_kernel+0xe0/0x15c
  drm_ioctl+0x2e8/0x3f4
  vfs_ioctl+0x30/0x50
  ...

Reported-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-08-30 11:29:08 +02:00
Benjamin Block
acf00b5ef9 s390/airq: remove lsi_mask from airq_struct
Remove the field `lsi_mask` from `struct airq_struct` as it is not
utilized for any adapter interrupt, other than setting it to the default
value of 0xff.

Because nobody is using this functionality, all it does is cost a little
bit of time with each delivered adapter interrupt.

Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:28 +02:00
Heiko Carstens
a7eb28801b s390/mm: use __set_memory() variants where useful
Use the __set_memory_yy() variants instead of set_memory_yy() where
useful. This allows to make the code a bit more readable.

This also fixes the debug pagealloc case, where set_memory_4k() might be
called for an area larger than 8TB which would lead to an overflow of
the num_pages parameter of set_memory_4k().

However RELOC_HIDE() has to be used for the __set_memory_4k() case for
the time being, to avoid compiler warnings because of performing pointer
arithmetic on a NULL pointer, which has undefined behavior. This happens
because __va(0) always translates to NULL. However this will change, and
as soon as this happens the RELOC_HIDE() hack can be removed again.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:28 +02:00
Heiko Carstens
850612c8e4 s390/set_memory: add __set_memory() variant
Add a __set_memory_yy() variant for all set_memory_yy()
implementations. The new variant takes start and end void pointers,
which allows them to be used without the usual unsigned long cast.

However more important: the new variant can be used for areas larger
than 8TB. The old variant comes with an "int numpages" parameter, which
overflows with more than 8TB. Given that for debug_pagealloc
set_memory_4k() is used on the whole kernel mapping this is not only a
theoretical problem, but must be fixed.

Changing all set_memory_yy() variants only on s390 to take an "unsigned
long numpages" parameter is not possible, since the common module code
requires an int parameter from all architectures on these functions.
See module_set_memory().

Therefore change/fix this on s390 only with a new interface, and address
common code later.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:28 +02:00
Heiko Carstens
c22a4c8aaf s390/set_memory: generate all set_memory() functions
The set_memory() functions all follow the same pattern. Use a macro to
generate them, and in result remove a bit of code.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Heiko Carstens
a6e49f10f4 s390/mm: improve description of mapping permissions of prefix pages
Slightly improve the description which explains why the first prefix
page must be mapped executable when the BEAR-enhancement facility is
not installed.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Heiko Carstens
3eeb07788f s390/amode31: change type of __samode31, __eamode31, etc
For consistencs reasons change the type of __samode31, __eamode31,
__stext_amode31, and __etext_amode31 to a char pointer so they
(nearly) match the type of all other sections.

This allows for code simplifications with follow-on patches.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Heiko Carstens
c0f1d47812 s390/mm: simplify kernel mapping setup
The kernel mapping is setup in two stages: in the decompressor map all
pages with RWX permissions, and within the kernel change all mappings to
their final permissions, where most of the mappings are changed from RWX to
RWNX.

Change this and map all pages RWNX from the beginning, however without
enabling noexec via control register modification. This means that
effectively all pages are used with RWX permissions like before. When the
final permissions have been applied to the kernel mapping enable noexec via
control register modification.

This allows to remove quite a bit of non-obvious code.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Heiko Carstens
b6f10e2f66 s390: remove "noexec" option
Do the same like x86 with commit 76ea0025a2 ("x86/cpu: Remove "noexec"")
and remove the "noexec" kernel command line option.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Alexander Gordeev
7b03942ff3 s390/vmem: fix virtual vs physical address confusion
Fix virtual vs physical address confusion (which currently are the same).

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Gerald Schaefer
789dd8cb1e s390/dcssblk: fix lockdep warning
dcssblk_remove_store() holds the dcssblk_devices_sem semaphore while
calling del_gendisk(dev_info->gd), which in turn tries to acquire
disk->open_mutex. Then there is dcssblk_release(), which is called
with disk->open_mutex held, and tries to acquire dcssblk_devices_sem.

Lockdep reports this as possible circular locking dependency (CPU0 =
dcssblk_remove_store, CPU1 = dcssblk_release):

[   44.948865]  Possible unsafe locking scenario:

[   44.948866]        CPU0                    CPU1
[   44.948867]        ----                    ----
[   44.948868]   lock(&dcssblk_devices_sem);
[   44.948870]                                lock(&disk->open_mutex);
[   44.948872]                                lock(&dcssblk_devices_sem);
[   44.948874]   lock(&disk->open_mutex);
[   44.948876]
                *** DEADLOCK ***

In practice, this deadlock should not happen, since dcssblk_remove_store()
checks for dev_info->use_count != 0 after acquiring dcssblk_devices_sem,
and breaks out before calling del_gendisk(). dev_info->use_count will be
decremented in dcssblk_release(), protected by dcssblk_devices_sem.

Still there is no need for dcssblk_remove_store() to hold the
dcssblk_devices_sem until after calling del_gendisk(), as this only
protects dcssblk internal data. So fix the lockdep warning by releasing
dcssblk_devices_sem earlier. Also move the segment_unload() loop up,
similar to dcssblk_shared_store() error path, no need to do that after
calling del_gendisk().

Also change dcssblk_shared_store() error path, where dcssblk_devices_sem
was also released only after calling del_gendisk(), and a similar lockdep
warning could be triggered (but also deadlock prevented by check for
dev_info->use_count).

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:27 +02:00
Gerald Schaefer
67ce50ce01 s390/monreader: fix virtual vs physical address confusion
Fix virtual vs physical address confusion (which currently are the same).

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-30 11:03:26 +02:00
Heng Guo
e4da8c7897 net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated
commit edf391ff17 ("snmp: add missing counters for RFC 4293") had
already added OutOctets for RFC 4293. In commit 2d8dbb04c6 ("snmp: fix
OutOctets counter to include forwarded datagrams"), OutOctets was
counted again, but not removed from ip_output().

According to RFC 4293 "3.2.3. IP Statistics Tables",
ipipIfStatsOutTransmits is not equal to ipIfStatsOutForwDatagrams. So
"IPSTATS_MIB_OUTOCTETS must be incremented when incrementing" is not
accurate. And IPSTATS_MIB_OUTOCTETS should be counted after fragment.

This patch reverts commit 2d8dbb04c6 ("snmp: fix OutOctets counter to
include forwarded datagrams") and move IPSTATS_MIB_OUTOCTETS to
ip_finish_output2 for ipv4.

Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Signed-off-by: Heng Guo <heng.guo@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-30 09:44:09 +01:00
Justin Stitt
e8f13e061d x86/audit: Fix -Wmissing-variable-declarations warning for ia32_xyz_class
When building x86 defconfig with Clang-18 I get the following warnings:

  | arch/x86/ia32/audit.c:6:10: warning: no previous extern declaration for non-static variable 'ia32_dir_class' [-Wmissing-variable-declarations]
  |     6 | unsigned ia32_dir_class[] = {
  | arch/x86/ia32/audit.c:11:10: warning: no previous extern declaration for non-static variable 'ia32_chattr_class' [-Wmissing-variable-declarations]
  |    11 | unsigned ia32_chattr_class[] = {
  | arch/x86/ia32/audit.c:16:10: warning: no previous extern declaration for non-static variable 'ia32_write_class' [-Wmissing-variable-declarations]
  |    16 | unsigned ia32_write_class[] = {
  | arch/x86/ia32/audit.c:21:10: warning: no previous extern declaration for non-static variable 'ia32_read_class' [-Wmissing-variable-declarations]
  |    21 | unsigned ia32_read_class[] = {
  | arch/x86/ia32/audit.c:26:10: warning: no previous extern declaration for non-static variable 'ia32_signal_class' [-Wmissing-variable-declarations]
  |    26 | unsigned ia32_signal_class[] = {

These warnings occur due to their respective extern declarations being
scoped inside of audit_classes_init as well as only being enabled with
`CONFIG_IA32_EMULATION=y`:

  | static int __init audit_classes_init(void)
  | {
  | #ifdef CONFIG_IA32_EMULATION
  |	extern __u32 ia32_dir_class[];
  |	extern __u32 ia32_write_class[];
  |	extern __u32 ia32_read_class[];
  |	extern __u32 ia32_chattr_class[];
  |	audit_register_class(AUDIT_CLASS_WRITE_32, ia32_write_class);
  |	audit_register_class(AUDIT_CLASS_READ_32, ia32_read_class);
  |	audit_register_class(AUDIT_CLASS_DIR_WRITE_32, ia32_dir_class);
  |	audit_register_class(AUDIT_CLASS_CHATTR_32, ia32_chattr_class);
  | #endif
  |	audit_register_class(AUDIT_CLASS_WRITE, write_class);
  |	audit_register_class(AUDIT_CLASS_READ, read_class);
  |	audit_register_class(AUDIT_CLASS_DIR_WRITE, dir_class);
  |	audit_register_class(AUDIT_CLASS_CHATTR, chattr_class);
  |	return 0;
  | }

Lift the extern declarations to their own header and resolve scoping
issues (and thus fix the warnings).

Moreover, change __u32 to unsigned so that we match the definitions:

  | unsigned ia32_dir_class[] = {
  | #include <asm-generic/audit_dir_write.h>
  | ~0U
  | };
  |
  | unsigned ia32_chattr_class[] = {
  | #include <asm-generic/audit_change_attr.h>
  | ~0U
  | };
  | ...

This patch is similar to commit:

  0e5e3d4461 ("x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall()") [1]

Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/20200516123816.2680-1-b.thiel@posteo.de/ [1]
Link: https://github.com/ClangBuiltLinux/linux/issues/1920
Link: https://lore.kernel.org/r/20230829-missingvardecl-audit-v1-1-34efeb7f3539@google.com
2023-08-30 10:11:16 +02:00
NeilBrown
0d6b35283b sched/core: Report correct state for TASK_IDLE | TASK_FREEZABLE
task_state_index() ignores uninteresting state flags (such as
TASK_FREEZABLE) for most states, but for TASK_IDLE and TASK_RTLOCK_WAIT
it does not.

So if a task is waiting TASK_IDLE|TASK_FREEZABLE it gets incorrectly
reported as TASK_UNINTERRUPTIBLE or "D".  (it is planned for nfsd to
change to use this state).

Fix this by only testing the interesting bits and not the irrelevant
bits in __task_state_index()

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/169335025927.5133.4781141800413736103@noble.neil.brown.name
2023-08-30 10:08:38 +02:00
John Fastabend
35d2b7ffff bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t
Sockmap and sockhash maps are a collection of psocks that are
objects representing a socket plus a set of metadata needed
to manage the BPF programs associated with the socket. These
maps use the stab->lock to protect from concurrent operations
on the maps, e.g. trying to insert to objects into the array
at the same time in the same slot. Additionally, a sockhash map
has a bucket lock to protect iteration and insert/delete into
the hash entry.

Each psock has a psock->link which is a linked list of all the
maps that a psock is attached to. This allows a psock (socket)
to be included in multiple sockmap and sockhash maps. This
linked list is protected the psock->link_lock.

They _must_ be nested correctly to avoid deadlock:

  lock(stab->lock)
    : do BPF map operations and psock insert/delete
    lock(psock->link_lock)
       : add map to psock linked list of maps
    unlock(psock->link_lock)
  unlock(stab->lock)

For non PREEMPT_RT kernels both raw_spin_lock_t and spin_lock_t
are guaranteed to not sleep. But, with PREEMPT_RT kernels the
spin_lock_t variants may sleep. In the current code we have
many patterns like this:

   rcu_critical_section:
      raw_spin_lock(stab->lock)
         spin_lock(psock->link_lock) <- may sleep ouch
         spin_unlock(psock->link_lock)
      raw_spin_unlock(stab->lock)
   rcu_critical_section

Nesting spin_lock() inside a raw_spin_lock() violates locking
rules for PREEMPT_RT kernels. And additionally we do alloc(GFP_ATOMICS)
inside the stab->lock, but those might sleep on PREEMPT_RT kernels.
The result is splats like this:

./test_progs -t sockmap_basic
[   33.344330] bpf_testmod: loading out-of-tree module taints kernel.
[   33.441933]
[   33.442089] =============================
[   33.442421] [ BUG: Invalid wait context ]
[   33.442763] 6.5.0-rc5-01731-gec0ded2e0282 #4958 Tainted: G           O
[   33.443320] -----------------------------
[   33.443624] test_progs/2073 is trying to lock:
[   33.443960] ffff888102a1c290 (&psock->link_lock){....}-{3:3}, at: sock_map_update_common+0x2c2/0x3d0
[   33.444636] other info that might help us debug this:
[   33.444991] context-{5:5}
[   33.445183] 3 locks held by test_progs/2073:
[   33.445498]  #0: ffff88811a208d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_map_update_elem_sys+0xff/0x330
[   33.446159]  #1: ffffffff842539e0 (rcu_read_lock){....}-{1:3}, at: sock_map_update_elem_sys+0xf5/0x330
[   33.446809]  #2: ffff88810d687240 (&stab->lock){+...}-{2:2}, at: sock_map_update_common+0x177/0x3d0
[   33.447445] stack backtrace:
[   33.447655] CPU: 10 PID

To fix observe we can't readily remove the allocations (for that
we would need to use/create something similar to bpf_map_alloc). So
convert raw_spin_lock_t to spin_lock_t. We note that sock_map_update
that would trigger the allocate and potential sleep is only allowed
through sys_bpf ops and via sock_ops which precludes hw interrupts
and low level atomic sections in RT preempt kernel. On non RT
preempt kernel there are no changes here and spin locks sections
and alloc(GFP_ATOMIC) are still not sleepable.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230830053517.166611-1-john.fastabend@gmail.com
2023-08-30 09:58:42 +02:00
Eduard Zingerman
be4033d360 docs/bpf: Add description for CO-RE relocations
Add a section on CO-RE relocations to llvm_relo.rst. Describe relevant .BTF.ext
structure, `enum bpf_core_relo_kind` and `struct bpf_core_relo` in some detail.

Description is based on doc-strings from:

  - include/uapi/linux/bpf.h:struct bpf_core_relo
  - tools/lib/bpf/relo_core.c:__bpf_core_types_match()

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20230826222912.2560865-2-eddyz87@gmail.com
2023-08-30 09:53:02 +02:00
Will Hawkins
2d71a90f7e bpf, docs: Correct source of offset for program-local call
The offset to use when calculating the target of a program-local call is
in the instruction's imm field, not its offset field.

Signed-off-by: Will Hawkins <hawkinsw@obs.cr>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230826053258.1860167-1-hawkinsw@obs.cr
2023-08-30 09:49:53 +02:00
Yonghong Song
5439cfa7fe selftests/bpf: Fix flaky cgroup_iter_sleepable subtest
Occasionally, with './test_progs -j' on my vm, I will hit the
following failure:

  test_cgrp_local_storage:PASS:join_cgroup /cgrp_local_storage 0 nsec
  test_cgroup_iter_sleepable:PASS:skel_open 0 nsec
  test_cgroup_iter_sleepable:PASS:skel_load 0 nsec
  test_cgroup_iter_sleepable:PASS:attach_iter 0 nsec
  test_cgroup_iter_sleepable:PASS:iter_create 0 nsec
  test_cgroup_iter_sleepable:FAIL:cgroup_id unexpected cgroup_id: actual 1 != expected 2812
  #48/5    cgrp_local_storage/cgroup_iter_sleepable:FAIL
  #48      cgrp_local_storage:FAIL

Finally, I decided to do some investigation since the test is introduced
by myself. It turns out the reason is due to cgroup_fd with value 0.
In cgroup_iter, a cgroup_fd of value 0 means the root cgroup.

	/* from cgroup_iter.c */
        if (fd)
                cgrp = cgroup_v1v2_get_from_fd(fd);
        else if (id)
                cgrp = cgroup_get_from_id(id);
        else /* walk the entire hierarchy by default. */
                cgrp = cgroup_get_from_path("/");

That is why we got cgroup_id 1 instead of expected 2812.

Why we got a cgroup_fd 0? Nobody should really touch 'stdin' (fd 0) in
test_progs. I traced 'close' syscall with stack trace and found the root
cause, which is a bug in bpf_obj_pinning.c. Basically, the code closed
fd 0 although it should not. Fixing the bug in bpf_obj_pinning.c also
resolved the above cgroup_iter_sleepable subtest failure.

Fixes: 3b22f98e5a ("selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230827150551.1743497-1-yonghong.song@linux.dev
2023-08-30 08:45:05 +02:00
Tirthendu Sarkar
9d0a67b9d4 xsk: Fix xsk_build_skb() error: 'skb' dereferencing possible ERR_PTR()
Currently, xsk_build_skb() is a function that builds skb in two possible
ways and then is ended with common error handling.

We can distinguish four possible error paths and handling in xsk_build_skb():

 1. sock_alloc_send_skb fails: Retry (skb is NULL).
 2. skb_store_bits fails : Free skb and retry.
 3. MAX_SKB_FRAGS exceeded: Free skb, cleanup and drop packet.
 4. alloc_page fails for frag: Retry page allocation w/o freeing skb

1] and 3] can happen in xsk_build_skb_zerocopy(), which is one of the
two code paths responsible for building skb. Common error path in
xsk_build_skb() assumes that in case errno != -EAGAIN, skb is a valid
pointer, which is wrong as kernel test robot reports that in
xsk_build_skb_zerocopy() other errno values are returned for skb being
NULL.

To fix this, set -EOVERFLOW as error when MAX_SKB_FRAGS are exceeded
and packet needs to be dropped in both xsk_build_skb() and
xsk_build_skb_zerocopy() and use this to distinguish against all other
error cases. Also, add explicit kfree_skb() for 3] so that handling
of 1], 2], and 3] becomes identical where allocation needs to be retried.

Fixes: cf24f5a5fe ("xsk: add support for AF_XDP multi-buffer on Tx path")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Closes: https://lore.kernel.org/r/202307210434.OjgqFcbB-lkp@intel.com
Link: https://lore.kernel.org/bpf/20230823144713.2231808-1-tirthendu.sarkar@intel.com
2023-08-30 08:41:23 +02:00
Yafang Shao
6a8faf1070 bpftool: Fix build warnings with -Wtype-limits
Quentin reported build warnings when building bpftool :

    link.c: In function ‘perf_config_hw_cache_str’:
    link.c:86:18: warning: comparison of unsigned expression in ‘>= 0’ is always true [-Wtype-limits]
       86 |         if ((id) >= 0 && (id) < ARRAY_SIZE(array))      \
          |                  ^~
    link.c:320:20: note: in expansion of macro ‘perf_event_name’
      320 |         hw_cache = perf_event_name(evsel__hw_cache, config & 0xff);
          |                    ^~~~~~~~~~~~~~~
    [... more of the same for the other calls to perf_event_name ...]

He also pointed out the reason and the solution:

  We're always passing unsigned, so it should be safe to drop the check on
  (id) >= 0.

Fixes: 62b57e3ddd ("bpftool: Add perf event names")
Reported-by: Quentin Monnet <quentin@isovalent.com>
Suggested-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Closes: https://lore.kernel.org/bpf/a35d9a2d-54a0-49ec-9ed1-8fcf1369d3cc@isovalent.com
Link: https://lore.kernel.org/bpf/20230830030325.3786-1-laoar.shao@gmail.com
2023-08-30 08:39:00 +02:00
Yonghong Song
32337c0a28 bpf: Prevent inlining of bpf_fentry_test7()
With latest clang18, I hit test_progs failures for the following test:

  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #75      fentry_fexit:FAIL
  #76/1    fentry_test/fentry:FAIL
  #76      fentry_test:FAIL
  #80/1    fexit_test/fexit:FAIL
  #80      fexit_test:FAIL
  #110/1   kprobe_multi_test/skel_api:FAIL
  #110/2   kprobe_multi_test/link_api_addrs:FAIL
  #110/3   kprobe_multi_test/link_api_syms:FAIL
  #110/4   kprobe_multi_test/attach_api_pattern:FAIL
  #110/5   kprobe_multi_test/attach_api_addrs:FAIL
  #110/6   kprobe_multi_test/attach_api_syms:FAIL
  #110     kprobe_multi_test:FAIL

For example, for #13/2, the error messages are:

  [...]
  kprobe_multi_test_run:FAIL:kprobe_test7_result unexpected kprobe_test7_result: actual 0 != expected 1
  [...]
  kprobe_multi_test_run:FAIL:kretprobe_test7_result unexpected kretprobe_test7_result: actual 0 != expected 1

clang17 does not have this issue.

Further investigation shows that kernel func bpf_fentry_test7(), used in
the above tests, is inlined by the compiler although it is marked as
noinline.

  int noinline bpf_fentry_test7(struct bpf_fentry_test_t *arg)
  {
        return (long)arg;
  }

It is known that for simple functions like the above (e.g. just returning
a constant or an input argument), the clang compiler may still do inlining
for a noinline function. Adding 'asm volatile ("")' in the beginning of the
bpf_fentry_test7() can prevent inlining.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20230826200843.2210074-1-yonghong.song@linux.dev
2023-08-30 08:36:17 +02:00
Herbert Xu
ba22e81872 crypto: powerpc/chacha20,poly1305-p10 - Add dependency on VSX
Add dependency on VSX as otherwise the build will fail without
it.

Fixes: 161fca7e3e ("crypto: powerpc - Add chacha20/poly1305-p10 to Kconfig and Makefile")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308251906.SYawej6g-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-08-30 13:48:39 +08:00
Linus Torvalds
6c1b980a7e Merge tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-maping updates from Christoph Hellwig:

 - allow dynamic sizing of the swiotlb buffer, to cater for secure
   virtualization workloads that require all I/O to be bounce buffered
   (Petr Tesarik)

 - move a declaration to a header (Arnd Bergmann)

 - check for memory region overlap in dma-contiguous (Binglei Wang)

 - remove the somewhat dangerous runtime swiotlb-xen enablement and
   unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross)

 - per-node CMA improvements (Yajun Deng)

* tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: optimize get_max_slots()
  swiotlb: move slot allocation explanation comment where it belongs
  swiotlb: search the software IO TLB only if the device makes use of it
  swiotlb: allocate a new memory pool when existing pools are full
  swiotlb: determine potential physical address limit
  swiotlb: if swiotlb is full, fall back to a transient memory pool
  swiotlb: add a flag whether SWIOTLB is allowed to grow
  swiotlb: separate memory pool data from other allocator data
  swiotlb: add documentation and rename swiotlb_do_find_slots()
  swiotlb: make io_tlb_default_mem local to swiotlb.c
  swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated
  dma-contiguous: check for memory region overlap
  dma-contiguous: support numa CMA for specified node
  dma-contiguous: support per-numa CMA for all architectures
  dma-mapping: move arch_dma_set_mask() declaration to header
  swiotlb: unexport is_swiotlb_active
  x86: always initialize xen-swiotlb when xen-pcifront is enabling
  xen/pci: add flag for PCI passthrough being possible
2023-08-29 20:32:10 -07:00
Linus Torvalds
3d3dfeb3ae Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
 "Pretty quiet round for this release. This contains:

   - Add support for zoned storage to ublk (Andreas, Ming)

   - Series improving performance for drivers that mark themselves as
     needing a blocking context for issue (Bart)

   - Cleanup the flush logic (Chengming)

   - sed opal keyring support (Greg)

   - Fixes and improvements to the integrity support (Jinyoung)

   - Add some exports for bcachefs that we can hopefully delete again in
     the future (Kent)

   - deadline throttling fix (Zhiguo)

   - Series allowing building the kernel without buffer_head support
     (Christoph)

   - Sanitize the bio page adding flow (Christoph)

   - Write back cache fixes (Christoph)

   - MD updates via Song:
      - Fix perf regression for raid0 large sequential writes (Jan)
      - Fix split bio iostat for raid0 (David)
      - Various raid1 fixes (Heinz, Xueshi)
      - raid6test build fixes (WANG)
      - Deprecate bitmap file support (Christoph)
      - Fix deadlock with md sync thread (Yu)
      - Refactor md io accounting (Yu)
      - Various non-urgent fixes (Li, Yu, Jack)

   - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li,
     Ming, Nitesh, Ruan, Tejun, Thomas, Xu)"

* tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits)
  block: use strscpy() to instead of strncpy()
  block: sed-opal: keyring support for SED keys
  block: sed-opal: Implement IOC_OPAL_REVERT_LSP
  block: sed-opal: Implement IOC_OPAL_DISCOVERY
  blk-mq: prealloc tags when increase tagset nr_hw_queues
  blk-mq: delete redundant tagset map update when fallback
  blk-mq: fix tags leak when shrink nr_hw_queues
  ublk: zoned: support REQ_OP_ZONE_RESET_ALL
  md: raid0: account for split bio in iostat accounting
  md/raid0: Fix performance regression for large sequential writes
  md/raid0: Factor out helper for mapping and submitting a bio
  md raid1: allow writebehind to work on any leg device set WriteMostly
  md/raid1: hold the barrier until handle_read_error() finishes
  md/raid1: free the r1bio before waiting for blocked rdev
  md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io()
  blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init
  drivers/rnbd: restore sysfs interface to rnbd-client
  md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid()
  raid6: test: only check for Altivec if building on powerpc hosts
  raid6: test: make sure all intermediate and artifact files are .gitignored
  ...
2023-08-29 20:21:42 -07:00
Linus Torvalds
c1b7fcf3f6 Merge tag 'for-6.6/io_uring-2023-08-28' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
 "Fairly quiet round in terms of features, mostly just improvements all
  over the map for existing code. In detail:

   - Initial support for socket operations through io_uring. Latter half
     of this will likely land with the 6.7 kernel, then allowing things
     like get/setsockopt (Breno)

   - Cleanup of the cancel code, and then adding support for canceling
     requests with the opcode as the key (me)

   - Improvements for the io-wq locking (me)

   - Fix affinity setting for SQPOLL based io-wq (me)

   - Remove the io_uring userspace code. These were added initially as
     copies from liburing, but all of them have since bitrotted and are
     way out of date at this point. Rather than attempt to keep them in
     sync, just get rid of them. People will have liburing available
     anyway for these examples. (Pavel)

   - Series improving the CQ/SQ ring caching (Pavel)

   - Misc fixes and cleanups (Pavel, Yue, me)"

* tag 'for-6.6/io_uring-2023-08-28' of git://git.kernel.dk/linux: (47 commits)
  io_uring: move iopoll ctx fields around
  io_uring: move multishot cqe cache in ctx
  io_uring: separate task_work/waiting cache line
  io_uring: banish non-hot data to end of io_ring_ctx
  io_uring: move non aligned field to the end
  io_uring: add option to remove SQ indirection
  io_uring: compact SQ/CQ heads/tails
  io_uring: force inline io_fill_cqe_req
  io_uring: merge iopoll and normal completion paths
  io_uring: reorder cqring_flush and wakeups
  io_uring: optimise extra io_get_cqe null check
  io_uring: refactor __io_get_cqe()
  io_uring: simplify big_cqe handling
  io_uring: cqe init hardening
  io_uring: improve cqe !tracing hot path
  io_uring/rsrc: Annotate struct io_mapped_ubuf with __counted_by
  io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used
  io_uring: simplify io_run_task_work_sig return
  io_uring/rsrc: keep one global dummy_ubuf
  io_uring: never overflow io_aux_cqe
  ...
2023-08-29 20:11:33 -07:00
Linus Torvalds
adfd671676 Merge tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
 "Long ago we set out to remove the kitchen sink on kernel/sysctl.c
  arrays and placings sysctls to their own sybsystem or file to help
  avoid merge conflicts. Matthew Wilcox pointed out though that if we're
  going to do that we might as well also *save* space while at it and
  try to remove the extra last sysctl entry added at the end of each
  array, a sentintel, instead of bloating the kernel by adding a new
  sentinel with each array moved.

  Doing that was not so trivial, and has required slowing down the moves
  of kernel/sysctl.c arrays and measuring the impact on size by each new
  move.

  The complex part of the effort to help reduce the size of each sysctl
  is being done by the patient work of el señor Don Joel Granados. A lot
  of this is truly painful code refactoring and testing and then trying
  to measure the savings of each move and removing the sentinels.
  Although Joel already has code which does most of this work,
  experience with sysctl moves in the past shows is we need to be
  careful due to the slew of odd build failures that are possible due to
  the amount of random Kconfig options sysctls use.

  To that end Joel's work is split by first addressing the major
  housekeeping needed to remove the sentinels, which is part of this
  merge request. The rest of the work to actually remove the sentinels
  will be done later in future kernel releases.

  The preliminary math is showing this will all help reduce the overall
  build time size of the kernel and run time memory consumed by the
  kernel by about ~64 bytes per array where we are able to remove each
  sentinel in the future. That also means there is no more bloating the
  kernel with the extra ~64 bytes per array moved as no new sentinels
  are created"

* tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  sysctl: Use ctl_table_size as stopping criteria for list macro
  sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl
  vrf: Update to register_net_sysctl_sz
  networking: Update to register_net_sysctl_sz
  netfilter: Update to register_net_sysctl_sz
  ax.25: Update to register_net_sysctl_sz
  sysctl: Add size to register_net_sysctl function
  sysctl: Add size arg to __register_sysctl_init
  sysctl: Add size to register_sysctl
  sysctl: Add a size arg to __register_sysctl_table
  sysctl: Add size argument to init_header
  sysctl: Add ctl_table_size to ctl_table_header
  sysctl: Use ctl_table_header in list_for_each_table_entry
  sysctl: Prefer ctl_table_header in proc_sysctl
2023-08-29 17:39:15 -07:00
Linus Torvalds
daa22f5a78 Merge tag 'modules-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
 "Summary of the changes worth highlighting from most interesting to
  boring below:

   - Christoph Hellwig's symbol_get() fix to Nvidia's efforts to
     circumvent the protection he put in place in year 2020 to prevent
     proprietary modules from using GPL only symbols, and also ensuring
     proprietary modules which export symbols grandfather their taint.

     That was done through year 2020 commit 262e6ae708 ("modules:
     inherit TAINT_PROPRIETARY_MODULE"). Christoph's new fix is done by
     clarifing __symbol_get() was only ever intended to prevent module
     reference loops by Linux kernel modules and so making it only find
     symbols exported via EXPORT_SYMBOL_GPL(). The circumvention tactic
     used by Nvidia was to use symbol_get() to purposely swift through
     proprietary module symbols and completely bypass our traditional
     EXPORT_SYMBOL*() annotations and community agreed upon
     restrictions.

     A small set of preamble patches fix up a few symbols which just
     needed adjusting for this on two modules, the rtc ds1685 and the
     networking enetc module. Two other modules just needed some build
     fixing and removal of use of __symbol_get() as they can't ever be
     modular, as was done by Arnd on the ARM pxa module and Christoph
     did on the mmc au1xmmc driver.

     This is a good reminder to us that symbol_get() is just a hack to
     address things which should be fixed through Kconfig at build time
     as was done in the later patches, and so ultimately it should just
     go.

   - Extremely late minor fix for old module layout 055f23b74b
     ("module: check for exit sections in layout_sections() instead of
     module_init_section()") by James Morse for arm64. Note that this
     layout thing is old, it is *not* Song Liu's commit ac3b432839
     ("module: replace module_layout with module_memory"). The issue
     however is very odd to run into and so there was no hurry to get
     this in fast.

   - Although the fix did not go through the modules tree I'd like to
     highlight the fix by Peter Zijlstra in commit 5409730962
     ("x86/static_call: Fix __static_call_fixup()") now merged in your
     tree which came out of what was originally suspected to be a
     fallout of the the newer module layout changes by Song Liu commit
     ac3b432839 ("module: replace module_layout with module_memory")
     instead of module_init_section()"). Thanks to the report by
     Christian Bricart and the debugging by Song Liu & Peter that turned
     to be noted as a kernel regression in place since v5.19 through
     commit ee88d363d1 ("x86,static_call: Use alternative RET
     encoding").

     I highlight this to reflect and clarify that we haven't seen more
     fallout from ac3b432839 ("module: replace module_layout with
     module_memory").

   - RISC-V toolchain got mapping symbol support which prefix symbols
     with "$" to help with alignment considerations for disassembly.

     This is used to differentiate between incompatible instruction
     encodings when disassembling. RISC-V just matches what ARM/AARCH64
     did for alignment considerations and Palmer Dabbelt extended
     is_mapping_symbol() to accept these symbols for RISC-V. We already
     had support for this for all architectures but it also checked for
     the second character, the RISC-V check Dabbelt added was just for
     the "$". After a bit of testing and fallout on linux-next and based
     on feedback from Masahiro Yamada it was decided to simplify the
     check and treat the first char "$" as unique for all architectures,
     and so we no make is_mapping_symbol() for all archs if the symbol
     starts with "$".

     The most relevant commit for this for RISC-V on binutils was:

       https://sourceware.org/pipermail/binutils/2021-July/117350.html

   - A late fix by Andrea Righi (today) to make module zstd
     decompression use vmalloc() instead of kmalloc() to account for
     large compressed modules. I suspect we'll see similar things for
     other decompression algorithms soon.

   - samples/hw_breakpoint minor fixes by Rong Tao, Arnd Bergmann and
     Chen Jiahao"

* tag 'modules-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module/decompress: use vmalloc() for zstd decompression workspace
  kallsyms: Add more debug output for selftest
  ARM: module: Use module_init_layout_section() to spot init sections
  arm64: module: Use module_init_layout_section() to spot init sections
  module: Expose module_init_layout_section()
  modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules
  rtc: ds1685: use EXPORT_SYMBOL_GPL for ds1685_rtc_poweroff
  net: enetc: use EXPORT_SYMBOL_GPL for enetc_phc_index
  mmc: au1xmmc: force non-modular build and remove symbol_get usage
  ARM: pxa: remove use of symbol_get()
  samples/hw_breakpoint: mark sample_hbp as static
  samples/hw_breakpoint: fix building without module unloading
  samples/hw_breakpoint: Fix kernel BUG 'invalid opcode: 0000'
  modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols
  kernel: params: Remove unnecessary ‘0’ values from err
  module: Ignore RISC-V mapping symbols too
2023-08-29 17:32:32 -07:00
Nathan Chancellor
75d1d3a433 clk: qcom: Fix SM_GPUCC_8450 dependencies
CONFIG_SM_GCC_8450 depends on ARM64 but it is selected by
CONFIG_SM_GPUCC_8450, which can be selected on ARM, resulting in a
Kconfig warning.

WARNING: unmet direct dependencies detected for SM_GCC_8450
  Depends on [n]: COMMON_CLK [=y] && COMMON_CLK_QCOM [=y] && (ARM64 || COMPILE_TEST [=n])
  Selected by [y]:
  - SM_GPUCC_8450 [=y] && COMMON_CLK [=y] && COMMON_CLK_QCOM [=y]

Add the same dependencies to CONFIG_SM_GPUCC_8450 to resolve the
warning.

Fixes: 728692d49e ("clk: qcom: Add support for SM8450 GPUCC")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230829-fix-sm_gpucc_8550-deps-v1-1-d751f6cd35b2@kernel.org
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-08-29 15:29:39 -07:00
Linus Torvalds
d68b4b6f30 Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
   ("refactor Kconfig to consolidate KEXEC and CRASH options")

 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h")

 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands")

 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions")

 - Switch the handling of kdump from a udev scheme to in-kernel
   handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
   hot un/plug")

 - Many singleton patches to various parts of the tree

* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
  document while_each_thread(), change first_tid() to use for_each_thread()
  drivers/char/mem.c: shrink character device's devlist[] array
  x86/crash: optimize CPU changes
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  crash: hotplug support for kexec_load()
  x86/crash: add x86 crash hotplug support
  crash: memory and CPU hotplug sysfs attributes
  kexec: exclude elfcorehdr from the segment digest
  crash: add generic infrastructure for crash hotplug support
  crash: move a few code bits to setup support of crash hotplug
  kstrtox: consistently use _tolower()
  kill do_each_thread()
  nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
  scripts/bloat-o-meter: count weak symbol sizes
  treewide: drop CONFIG_EMBEDDED
  lockdep: fix static memory detection even more
  lib/vsprintf: declare no_hash_pointers in sprintf.h
  lib/vsprintf: split out sprintf() and friends
  kernel/fork: stop playing lockless games for exe_file replacement
  adfs: delete unused "union adfs_dirtail" definition
  ...
2023-08-29 14:53:51 -07:00
Chuck Lever
b38a6023da Documentation: Add missing documentation for EXPORT_OP flags
The commits that introduced these flags neglected to update the
Documentation/filesystems/nfs/exporting.rst file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
Yue Haibing
899525e892 SUNRPC: Remove unused declaration rpc_modcount()
These declarations are never implemented since the beginning of git
history. Remove these, then merge the two #ifdef block for
simplification.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
Yue Haibing
07dc19dbd1 SUNRPC: Remove unused declarations
Commit c7d7ec8f04 ("SUNRPC: Remove svc_shutdown_net()") removed
svc_close_net() implementation but left declaration in place. Remove
it.

Commit 1f11a034cd ("SUNRPC new transport for the NFSv4.1 shared
back channel") removed svc_sock_create()/svc_sock_destroy() but not
the declarations.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
Chuck Lever
6372e2ee62 NFSD: da_addr_body field missing in some GETDEVICEINFO replies
The XDR specification in RFC 8881 looks like this:

struct device_addr4 {
	layouttype4	da_layout_type;
	opaque		da_addr_body<>;
};

struct GETDEVICEINFO4resok {
	device_addr4	gdir_device_addr;
	bitmap4		gdir_notification;
};

union GETDEVICEINFO4res switch (nfsstat4 gdir_status) {
case NFS4_OK:
	GETDEVICEINFO4resok gdir_resok4;
case NFS4ERR_TOOSMALL:
	count4		gdir_mincount;
default:
	void;
};

Looking at nfsd4_encode_getdeviceinfo() ....

When the client provides a zero gd_maxcount, then the Linux NFS
server implementation encodes the da_layout_type field and then
skips the da_addr_body field completely, proceeding directly to
encode gdir_notification field.

There does not appear to be an option in the specification to skip
encoding da_addr_body. Moreover, Section 18.40.3 says:

> If the client wants to just update or turn off notifications, it
> MAY send a GETDEVICEINFO operation with gdia_maxcount set to zero.
> In that event, if the device ID is valid, the reply's da_addr_body
> field of the gdir_device_addr field will be of zero length.

Since the layout drivers are responsible for encoding the
da_addr_body field, put this fix inside the ->encode_getdeviceinfo
methods.

Fixes: 9cf514ccfa ("nfsd: implement pNFS operations")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Tom Haynes <loghyr@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00