Commit Graph

141124 Commits

Author SHA1 Message Date
Linus Torvalds
f00654007f Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:

 - Fix an accounting bug that made NR_FILE_DIRTY grow without limit
   when running xfstests

 - Convert more of mpage to use folios

 - Remove add_to_page_cache() and add_to_page_cache_locked()

 - Convert find_get_pages_range() to filemap_get_folios()

 - Improvements to the read_cache_page() family of functions

 - Remove a few unnecessary checks of PageError

 - Some straightforward filesystem conversions to use folios

 - Split PageMovable users out from address_space_operations into
   their own movable_operations

 - Convert aops->migratepage to aops->migrate_folio

 - Remove nobh support (Christoph Hellwig)

* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
  fs: remove the NULL get_block case in mpage_writepages
  fs: don't call ->writepage from __mpage_writepage
  fs: remove the nobh helpers
  jfs: stop using the nobh helper
  ext2: remove nobh support
  ntfs3: refactor ntfs_writepages
  mm/folio-compat: Remove migration compatibility functions
  fs: Remove aops->migratepage()
  secretmem: Convert to migrate_folio
  hugetlb: Convert to migrate_folio
  aio: Convert to migrate_folio
  f2fs: Convert to filemap_migrate_folio()
  ubifs: Convert to filemap_migrate_folio()
  btrfs: Convert btrfs_migratepage to migrate_folio
  mm/migrate: Add filemap_migrate_folio()
  mm/migrate: Convert migrate_page() to migrate_folio()
  nfs: Convert to migrate_folio
  btrfs: Convert btree_migratepage to migrate_folio
  mm/migrate: Convert expected_page_refs() to folio_expected_refs()
  mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
  ...
2022-08-03 10:35:43 -07:00
Linus Torvalds
e087437a6f Merge tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray
Pull XArray/IDR updates from Matthew Wilcox:

 - Add appropriate might_alloc() annotations to the XArray APIs

 - Document that the IDR is deprecated

* tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray:
  IDR: Note that the IDR API is deprecated
  XArray: Add calls to might_alloc()
2022-08-03 10:02:28 -07:00
Linus Torvalds
b6bb70f9ab Merge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Several core optimizations:

   - threadgroup_rwsem write locking is skipped when configuring
     controllers in empty subtrees.

     Combined with CLONE_INTO_CGROUP, this allows the common static
     usage pattern to not grab threadgroup_rwsem at all (glibc still
     doesn't seem ready for CLONE_INTO_CGROUP unfortunately).

   - threadgroup_rwsem used to be put into non-percpu mode by default
     due to latency concerns in specific use cases. There's no reason
     for everyone else to pay for it. Make the behavior optional.

   - psi no longer allocates memory when disabled.

  ... along with some code cleanups"

* tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Skip subtree root in cgroup_update_dfl_csses()
  cgroup: remove "no" prefixed mount options
  cgroup: Make !percpu threadgroup_rwsem operations optional
  cgroup: Add "no" prefixed mount options
  cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
  cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
  cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
  psi: dont alloc memory for psi by default
2022-08-03 09:45:08 -07:00
Kajetan Puchalski
6ab4b19900 cpuidle: Add cpu_idle_miss trace event
Add a trace event for cpuidle to track missed (too deep or too shallow)
wakeups.

After each wakeup, CPUIdle already computes whether the entered state was
optimal, above or below the desired one and updates the relevant
counters. This patch makes it possible to trace those events in addition
to just reading the counters.

The patterns of types and percentages of misses across different
workloads appear to be very consistent. This makes the trace event very
useful for comparing the relative correctness of different CPUIdle
governors for different types of workloads, or for finding the
optimal governor for a given device.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-03 17:50:58 +02:00
Rafael J. Wysocki
f6e0b468da Merge tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull operating performance points (OPP) updates for 5.20-rc1 from Viresh
Kumar:

"- Make dev_pm_opp_set_regulators() accept NULL terminated list (Viresh
   Kumar).

 - Add dev_pm_opp_set_config() and friends and migrate other
   users/helpers to using them (Viresh Kumar).

 - Add support for multiple clocks for a device (Viresh Kumar and
   Krzysztof Kozlowski).

 - Configure resources before adding OPP table for Venus (Stanimir
   Varbanov).

 - Keep reference count up for opp->np and opp_table->np while they are
   still in use (Liang He).

 - Minor cleanups (Viresh Kumar and Yang Li)."

* tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (43 commits)
  venus: pm_helpers: Fix warning in OPP during probe
  OPP: Don't drop opp->np reference while it is still in use
  OPP: Don't drop opp_table->np reference while it is still in use
  OPP: Remove dev{m}_pm_opp_of_add_table_noclk()
  PM / devfreq: tegra30: Register config_clks helper
  OPP: Allow config_clks helper for single clk case
  OPP: Provide a simple implementation to configure multiple clocks
  OPP: Assert clk_count == 1 for single clk helpers
  OPP: Add key specific assert() method to key finding helpers
  OPP: Compare bandwidths for all paths in _opp_compare_key()
  OPP: Allow multiple clocks for a device
  dt-bindings: opp: accept array of frequencies
  OPP: Make dev_pm_opp_set_opp() independent of frequency
  OPP: Reuse _opp_compare_key() in _opp_add_static_v2()
  OPP: Remove rate_not_available parameter to _opp_add()
  OPP: Use consistent names for OPP table instances
  OPP: Use generic key finding helpers for bandwidth key
  OPP: Use generic key finding helpers for level key
  OPP: Add generic key finding helpers and use them for freq APIs
  OPP: Remove dev_pm_opp_find_freq_ceil_by_volt()
  ...
2022-08-03 17:49:38 +02:00
Jeff Layton
a8af0d682a libceph: clean up ceph_osdc_start_request prototype
This function always returns 0, and ignores the nofail boolean. Drop the
nofail argument, make the function void return and fix up the callers.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 14:05:39 +02:00
Waiman Long
b6e8d40d43 sched, cpuset: Fix dl_cpu_busy() panic due to empty cs->cpus_allowed
With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating
that the cpuset will just use the effective CPUs of its parent. So
cpuset_can_attach() can call task_can_attach() with an empty mask.
This can lead to cpumask_any_and() returns nr_cpu_ids causing the call
to dl_bw_of() to crash due to percpu value access of an out of bound
CPU value. For example:

	[80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0
	  :
	[80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0
	  :
	[80468.207946] Call Trace:
	[80468.208947]  cpuset_can_attach+0xa0/0x140
	[80468.209953]  cgroup_migrate_execute+0x8c/0x490
	[80468.210931]  cgroup_update_dfl_csses+0x254/0x270
	[80468.211898]  cgroup_subtree_control_write+0x322/0x400
	[80468.212854]  kernfs_fop_write_iter+0x11c/0x1b0
	[80468.213777]  new_sync_write+0x11f/0x1b0
	[80468.214689]  vfs_write+0x1eb/0x280
	[80468.215592]  ksys_write+0x5f/0xe0
	[80468.216463]  do_syscall_64+0x5c/0x80
	[80468.224287]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix that by using effective_cpus instead. For cgroup v1, effective_cpus
is the same as cpus_allowed. For v2, effective_cpus is the real cpumask
to be used by tasks within the cpuset anyway.

Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to
reflect the change. In addition, a check is added to task_can_attach()
to guard against the possibility that cpumask_any_and() may return a
value >= nr_cpu_ids.

Fixes: 7f51412a41 ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com
2022-08-03 10:34:26 +02:00
Paolo Abeni
7c6327c77d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

net/ax25/af_ax25.c
  d7c4c9e075 ("ax25: fix incorrect dev_tracker usage")
  d62607c3fe ("net: rename reference+tracking helpers")

drivers/net/netdevsim/fib.c
  180a6a3ee6 ("netdevsim: fib: Fix reference count leak on route deletion failure")
  012ec02ae4 ("netdevsim: convert driver to use unlocked devlink API during init/fini")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-03 09:04:55 +02:00
Antony Antony
36d763509b xfrm: fix XFRMA_LASTUSED comment
It is a __u64, internally time64_t.

Fixes: bf825f81b4 ("xfrm: introduce basic mark infrastructure")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-03 07:27:37 +02:00
Jan Kara
307af6c879 mbcache: automatically delete entries from cache on freeing
Use the fact that entries with elevated refcount are not removed from
the hash and just move removal of the entry from the hash to the entry
freeing time. When doing this we also change the generic code to hold
one reference to the cache entry, not two of them, which makes code
somewhat more obvious.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-08-02 23:56:25 -04:00
Jan Kara
75896339e4 mbcache: Remove mb_cache_entry_delete()
Nobody uses mb_cache_entry_delete() anymore. Remove it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-9-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-08-02 23:56:25 -04:00
Jan Kara
3dc96bba65 mbcache: add functions to delete entry if unused
Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
it is unused and also add a function to wait for entry to become unused
- mb_cache_entry_wait_unused(). We do not share code between the two
deleting function as one of them will go away soon.

CC: stable@vger.kernel.org
Fixes: 82939d7999 ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-08-02 23:56:25 -04:00
Jan Kara
d132495856 jbd2: unexport jbd2_log_start_commit()
jbd2_log_start_commit() is not used outside of jbd2 so unexport it. Also
make __jbd2_log_start_commit() static when we are at it.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-08-02 23:52:19 -04:00
Jan Kara
68af74e92a jbd2: remove unused exports for jbd2 debugging
Jbd2 exports jbd2_journal_enable_debug and __jbd2_debug() depite the
first is used only in fs/jbd2/journal.c and the second only within jbd2
code. Remove the pointless exports make jbd2_journal_enable_debug
static.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-08-02 23:52:19 -04:00
Jan Kara
cb3b3bf22c jbd2: rename jbd_debug() to jbd2_debug()
The name of jbd_debug() is confusing as all functions inside jbd2 have
jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-08-02 23:52:19 -04:00
ZiyangZhang
4e18403d94 ublk_cmd.h: add one new ublk command: UBLK_IO_NEED_GET_DATA
Add one new ublk command: UBLK_IO_NEED_GET_DATA. It is prepared for a new
feature designed for a user application who wants to allocate IO buffer
and set IO buffer address only after it receives an IO request from
ublksrv.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/c8a64b6b51c78340da7daa9e1054608695e79619.1659011443.git.ZiyangZhang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 21:13:40 -06:00
Ming Lei
4bf9cbf3e9 ublk_drv: cleanup ublksrv_ctrl_dev_info
Remove all block device related info from ublksrv_ctrl_dev_info,
meantime reduce its size into 64 bytes because:

1) ublksrv_ctrl_dev_info becomes cleaner without including any
block related info

2) generic set/get parameter command can be used to set block
related setting easily and cleanly

3) generic set/get parameter command can be used for extending
ublk without needing more info in ublksrv_ctrl_dev_info

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220730092750.1118167-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 21:13:40 -06:00
Ming Lei
0aa73170eb ublk_drv: add SET_PARAMS/GET_PARAMS control command
Add two commands to set/get parameters generically.

One important goal of ublk is to provide generic framework for making
block device by userspace flexibly.

As one generic block device, there are still lots of block parameters,
such as max_sectors, write_cache/fua, discard related limits,
zoned parameters, ...., so this patch starts to add generic mechanism
for set/get device parameters.

Both generic block parameters(all kinds of queue settings) and ublk
feature parameters can be covered with this way, then it becomes quite
easy to extend in future.

Add two parameter types are used so far: basic(covers basic queue setting
and misc settings which can't be grouped easily) and discard, basic type
must be set, and discard type becomes optional now

This way provides mechanism to simulate any kind of generic block device
from userspace easily, from both block queue setting viewpoint or ublk
feature viewpoint.

The style of putting all parameters together is suggested by Christoph.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220730092750.1118167-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 21:13:40 -06:00
Christoph Hellwig
46754bd056 block: move ->bio_split to the gendisk
Only non-passthrough requests are split by the block layer and use the
->bio_split bio_set.  Move it from the request_queue to the gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220727162300.3089193-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 21:08:49 -06:00
Linus Torvalds
e2b5421007 Merge tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull uapi flexible array update from Gustavo Silva:
 "A treewide patch that replaces zero-length arrays with flexible-array
  members in UAPI. This has been baking in linux-next for 5 weeks now.

  '-fstrict-flex-arrays=3' is coming and we need to land these changes
  to prevent issues like these in the short future:

    fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source]
		strcpy(de3->name, ".");
		^

  Since these are all [0] to [] changes, the risk to UAPI is nearly
  zero. If this breaks anything, we can use a union with a new member
  name"

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

* tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  treewide: uapi: Replace zero-length arrays with flexible-array members
2022-08-02 19:50:47 -07:00
Linus Torvalds
665fe72a7d Merge tag 'linux-kselftest-kunit-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
 "This consists of several fixes and an important feature to discourage
  running KUnit tests on production systems. Running tests on a
  production system could leave the system in a bad state.

  Summary:

   - Add a new taint type, TAINT_TEST to signal that a test has been
     run.

     This should discourage people from running these tests on
     production systems, and to make it easier to tell if tests have
     been run accidentally (by loading the wrong configuration, etc)

   - Several documentation and tool enhancements and fixes"

* tag 'linux-kselftest-kunit-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits)
  Documentation: KUnit: Fix example with compilation error
  Documentation: kunit: Add CLI args for kunit_tool
  kcsan: test: Add a .kunitconfig to run KCSAN tests
  kunit: executor: Fix a memory leak on failure in kunit_filter_tests
  clk: explicitly disable CONFIG_UML_PCI_OVER_VIRTIO in .kunitconfig
  mmc: sdhci-of-aspeed: test: Use kunit_test_suite() macro
  nitro_enclaves: test: Use kunit_test_suite() macro
  thunderbolt: test: Use kunit_test_suite() macro
  kunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites
  kunit: unify module and builtin suite definitions
  selftest: Taint kernel when test module loaded
  module: panic: Taint the kernel when selftest modules load
  Documentation: kunit: fix example run_kunit func to allow spaces in args
  Documentation: kunit: Cleanup run_wrapper, fix x-ref
  kunit: test.h: fix a kernel-doc markup
  kunit: tool: Enable virtio/PCI by default on UML
  kunit: tool: make --kunitconfig repeatable, blindly concat
  kunit: add coverage_uml.config to enable GCOV on UML
  kunit: tool: refactor internal kconfig handling, allow overriding
  kunit: tool: introduce --qemu_args
  ...
2022-08-02 19:34:45 -07:00
Linus Torvalds
aad26f55f4 Merge tag 'docs-6.0' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
 "This was a moderately busy cycle for documentation, but nothing
  all that earth-shaking:

   - More Chinese translations, and an update to the Italian
     translations.

     The Japanese, Korean, and traditional Chinese translations
     are more-or-less unmaintained at this point, instead.

   - Some build-system performance improvements.

   - The removal of the archaic submitting-drivers.rst document,
     with the movement of what useful material that remained into
     other docs.

   - Improvements to sphinx-pre-install to, hopefully, give more
     useful suggestions.

   - A number of build-warning fixes

  Plus the usual collection of typo fixes, updates, and more"

* tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits)
  docs: efi-stub: Fix paths for x86 / arm stubs
  Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8
  Docs/zh_CN: Update the translation of pci to 5.19-rc8
  Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8
  Docs/zh_CN: Update the translation of usage to 5.19-rc8
  Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8
  Docs/zh_CN: Update the translation of sparse to 5.19-rc8
  Docs/zh_CN: Update the translation of kasan to 5.19-rc8
  Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8
  doc:it_IT: align Italian documentation
  docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst
  Documentation: process: Update email client instructions for Thunderbird
  docs: ABI: correct QEMU fw_cfg spec path
  doc/zh_CN: remove submitting-driver reference from docs
  docs: zh_TW: align to submitting-drivers removal
  docs: zh_CN: align to submitting-drivers removal
  docs: ko_KR: howto: remove reference to removed submitting-drivers
  docs: ja_JP: howto: remove reference to removed submitting-drivers
  docs: it_IT: align to submitting-drivers removal
  docs: process: remove outdated submitting-drivers.rst
  ...
2022-08-02 19:24:24 -07:00
Linus Torvalds
7d9d077c78 Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offload updates, perhaps most notably a new
   RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
   offloaded at boot time, regardless of kernel boot parameters.

   This is useful to battery-powered systems such as ChromeOS and
   Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
   parameter prevents offloaded callbacks from interfering with
   real-time workloads and with energy-efficiency mechanisms

 - Polled grace-period updates, perhaps most notably making these APIs
   account for both normal and expedited grace periods

 - Tasks RCU updates, perhaps most notably reducing the CPU overhead of
   RCU tasks trace grace periods by more than a factor of two on a
   system with 15,000 tasks.

   The reduction is expected to increase with the number of tasks, so it
   seems reasonable to hypothesize that a system with 150,000 tasks
   might see a 20-fold reduction in CPU overhead

 - Torture-test updates

 - Updates that merge RCU's dyntick-idle tracking into context tracking,
   thus reducing the overhead of transitioning to kernel mode from
   either idle or nohz_full userspace execution for kernels that track
   context independently of RCU.

   This is expected to be helpful primarily for kernels built with
   CONFIG_NO_HZ_FULL=y

* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
  rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
  rcu: Diagnose extended sync_rcu_do_polled_gp() loops
  rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
  rcutorture: Test polled expedited grace-period primitives
  rcu: Add polled expedited grace-period primitives
  rcutorture: Verify that polled GP API sees synchronous grace periods
  rcu: Make Tiny RCU grace periods visible to polled APIs
  rcu: Make polled grace-period API account for expedited grace periods
  rcu: Switch polled grace-period APIs to ->gp_seq_polled
  rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
  rcu/nocb: Add option to opt rcuo kthreads out of RT priority
  rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
  rcu/nocb: Add an option to offload all CPUs on boot
  rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
  rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
  rcu/nocb: Add/del rdp to iterate from rcuog itself
  rcu/tree: Add comment to describe GP-done condition in fqs loop
  rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
  rcu/kvfree: Remove useless monitor_todo flag
  rcu: Cleanup RCU urgency state for offline CPU
  ...
2022-08-02 19:12:45 -07:00
Linus Torvalds
c2a24a7a03 Merge tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:

   - Make proc files report fips module name and version

  Algorithms:

   - Move generic SHA1 code into lib/crypto

   - Implement Chinese Remainder Theorem for RSA

   - Remove blake2s

   - Add XCTR with x86/arm64 acceleration

   - Add POLYVAL with x86/arm64 acceleration

   - Add HCTR2

   - Add ARIA

  Drivers:

   - Add support for new CCP/PSP device ID in ccp"

* tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (89 commits)
  crypto: tcrypt - Remove the static variable initialisations to NULL
  crypto: arm64/poly1305 - fix a read out-of-bound
  crypto: hisilicon/zip - Use the bitmap API to allocate bitmaps
  crypto: hisilicon/sec - fix auth key size error
  crypto: ccree - Remove a useless dma_supported() call
  crypto: ccp - Add support for new CCP/PSP device ID
  crypto: inside-secure - Add missing MODULE_DEVICE_TABLE for of
  crypto: hisilicon/hpre - don't use GFP_KERNEL to alloc mem during softirq
  crypto: testmgr - some more fixes to RSA test vectors
  cyrpto: powerpc/aes - delete the rebundant word "block" in comments
  hwrng: via - Fix comment typo
  crypto: twofish - Fix comment typo
  crypto: rmd160 - fix Kconfig "its" grammar
  crypto: keembay-ocs-ecc - Drop if with an always false condition
  Documentation: qat: rewrite description
  Documentation: qat: Use code block for qat sysfs example
  crypto: lib - add module license to libsha1
  crypto: lib - make the sha1 library optional
  crypto: lib - move lib/sha1.c into lib/crypto/
  crypto: fips - make proc files report fips module name and version
  ...
2022-08-02 17:45:14 -07:00
Linus Torvalds
a0b09f2d6f Merge tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
 "Though there's been a decent amount of RNG-related development during
  this last cycle, not all of it is coming through this tree, as this
  cycle saw a shift toward tackling early boot time seeding issues,
  which took place in other trees as well.

  Here's a summary of the various patches:

   - The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
     option have been removed, as they overlapped with the more widely
     supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
     "random.trust_cpu". This change allowed simplifying a bit of arch
     code.

   - x86's RDRAND boot time test has been made a bit more robust, with
     RDRAND disabled if it's clearly producing bogus results. This would
     be a tip.git commit, technically, but I took it through random.git
     to avoid a large merge conflict.

   - The RNG has long since mixed in a timestamp very early in boot, on
     the premise that a computer that does the same things, but does so
     starting at different points in wall time, could be made to still
     produce a different RNG state. Unfortunately, the clock isn't set
     early in boot on all systems, so now we mix in that timestamp when
     the time is actually set.

   - User Mode Linux now uses the host OS's getrandom() syscall to
     generate a bootloader RNG seed and later on treats getrandom() as
     the platform's RDRAND-like faculty.

   - The arch_get_random_{seed_,}_long() family of functions is now
     arch_get_random_{seed_,}_longs(), which enables certain platforms,
     such as s390, to exploit considerable performance advantages from
     requesting multiple CPU random numbers at once, while at the same
     time compiling down to the same code as before on platforms like
     x86.

   - A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
     Uros.

   - A comment spelling fix"

More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:

  https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/

* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: correct spelling of "overwrites"
  random: handle archrandom with multiple longs
  um: seed rng using host OS rng
  random: use try_cmpxchg in _credit_init_bits
  timekeeping: contribute wall clock to rng on time change
  x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
  random: remove CONFIG_ARCH_RANDOM
2022-08-02 17:31:35 -07:00
Christoph Hellwig
5a97806f7d block: change the blk_queue_split calling convention
The double indirect bio leads to somewhat suboptimal code generation.
Instead return the (original or split) bio, and make sure the
request_queue arguments to the lower level helpers is passed after the
bio to avoid constant reshuffling of the argument passing registers.

Also give it and the helpers used to implement it more descriptive names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:22:53 -06:00
Hannes Reinecke
b61775d185 nvme-auth: Diffie-Hellman key exchange support
Implement Diffie-Hellman key exchange using FFDHE groups
for NVMe In-Band Authentication.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:49 -06:00
Hannes Reinecke
f50fff73d6 nvme: implement In-Band authentication
Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006.
This patch adds two new fabric options 'dhchap_secret' to specify the
pre-shared key (in ASCII respresentation according to NVMe 2.0 section
8.13.5.8 'Secret representation') and 'dhchap_ctrl_secret' to specify
the pre-shared controller key for bi-directional authentication of both
the host and the controller.
Re-authentication can be triggered by writing the PSK into the new
controller sysfs attribute 'dhchap_secret' or 'dhchap_ctrl_secret'.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[axboe: fold in clang build fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:49 -06:00
Hannes Reinecke
88b140fec0 nvme: add definitions for NVMe In-Band authentication
Add new definitions for NVMe In-band authentication as defined in
the NVMe Base Specification v2.0.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:47 -06:00
Hannes Reinecke
a116e1cdc6 lib/base64: RFC4648-compliant base64 encoding
Add RFC4648-compliant base64 encoding and decoding routines, based on
the base64url encoding in fs/crypto/fname.c.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:47 -06:00
Hannes Reinecke
9e2f284e14 crypto: add crypto_has_kpp()
Add helper function to determine if a given key-agreement protocol
primitive is supported.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:47 -06:00
Hannes Reinecke
85cc424381 crypto: add crypto_has_shash()
Add helper function to determine if a given synchronous hash is supported.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:47 -06:00
Michael Kelley
2c61c97fb1 nvme: handle the persistent internal error AER
In the NVM Express Revision 1.4 spec, Figure 145 describes possible
values for an AER with event type "Error" (value 000b). For a
Persistent Internal Error (value 03h), the host should perform a
controller reset.

Add support for this error using code that already exists for
doing a controller reset. As part of this support, introduce
two utility functions for parsing the AER type and subtype.

This new support was tested in a lab environment where we can
generate the persistent internal error on demand, and observe
both the Linux side and NVMe controller side to see that the
controller reset has been done.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:47 -06:00
Xiubo Li
b53aca4b46 ceph: fix incorrect old_size length in ceph_mds_request_args
The 'old_size' is a __le64 type since birth, not sure why the
kclient incorrectly switched it to __le32. This change is okay
won't break anything because union will always allocate more memory
than the 'open' member needed.

Rename 'file_replication' to 'pool' as ceph did. Though this 'open'
struct may never be used in kclient in future, it's confusing when
going through the ceph code.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 00:54:12 +02:00
Jeff Layton
020bc44a9f ceph: switch back to testing for NULL folio->private in ceph_dirty_folio
Willy requested that we change this back to warning on folio->private
being non-NULl. He's trying to kill off the PG_private flag, and so we'd
like to catch where it's non-NULL.

Add a VM_WARN_ON_FOLIO (since it doesn't exist yet) and change over to
using that instead of VM_BUG_ON_FOLIO along with testing the ->private
pointer.

[ xiubli: define VM_WARN_ON_FOLIO macro in case DEBUG_VM is disabled
  reported by kernel test robot <lkp@intel.com> ]

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 00:54:12 +02:00
Xiubo Li
1b7587d69e ceph: fix the incorrect comment for the ceph_mds_caps struct
The incorrect comment is misleading. Acutally the last members
in ceph_mds_caps strcut is a union for none export and export
bodies.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 00:54:12 +02:00
Luís Henriques
d93231a6bc ceph: prevent a client from exceeding the MDS maximum xattr size
The MDS tries to enforce a limit on the total key/values in extended
attributes.  However, this limit is enforced only if doing a synchronous
operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS
doesn't have a chance to enforce these limits.

This patch adds support for decoding the xattrs maximum size setting that is
distributed in the mdsmap.  Then, when setting an xattr, the kernel client
will revert to do a synchronous operation if that maximum size is exceeded.

While there, fix a dout() that would trigger a printk warning:

[   98.718078] ------------[ cut here ]------------
[   98.719012] precision 65536 too large
[   98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600
...

Link: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 00:54:12 +02:00
Xiubo Li
4f48d5da81 fs/dcache: export d_same_name() helper
Compare dentry name with case-exact name, return true if names
are same, or false.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 00:54:12 +02:00
Jeff Layton
637fa738b5 fscrypt: add fscrypt_context_for_new_inode
Most filesystems just call fscrypt_set_context on new inodes, which
usually causes a setxattr. That's a bit late for ceph, which can send
along a full set of attributes with the create request.

Doing so allows it to avoid race windows that where the new inode could
be seen by other clients without the crypto context attached. It also
avoids the separate round trip to the server.

Refactor the fscrypt code a bit to allow us to create a new crypto
context, attach it to the inode, and write it to the buffer, but without
calling set_context on it. ceph can later use this to marshal the
context into the attributes we send along with the create request.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 00:54:11 +02:00
Jeff Layton
d3e94fdc4e fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
For ceph, we want to use our own scheme for handling filenames that are
are longer than NAME_MAX after encryption and Base64 encoding. This
allows us to have a consistent view of the encrypted filenames for
clients that don't support fscrypt and clients that do but that don't
have the key.

Currently, fs/crypto only supports encrypting filenames using
fscrypt_setup_filename, but that also handles encoding nokey names. Ceph
can't use that because it handles nokey names in a different way.

Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
__fscrypt_fname_encrypted_size and add a new wrapper called
fscrypt_fname_encrypted_size that takes an inode argument rather than a
pointer to a fscrypt_policy union.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 00:54:11 +02:00
Linus Torvalds
043402495d Merge tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "Aside from the one EVM cleanup patch, all the other changes are kexec
  related.

  On different architectures different keyrings are used to verify the
  kexec'ed kernel image signature. Here are a number of preparatory
  cleanup patches and the patches themselves for making the keyrings -
  builtin_trusted_keyring, .machine, .secondary_trusted_keyring, and
  .platform - consistent across the different architectures"

* tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification
  arm64: kexec_file: use more system keyrings to verify kernel image signature
  kexec, KEYS: make the code in bzImage64_verify_sig generic
  kexec: clean up arch_kexec_kernel_verify_sig
  kexec: drop weak attribute from functions
  kexec_file: drop weak attribute from functions
  evm: Use IS_ENABLED to initialize .enabled
2022-08-02 15:21:18 -07:00
Linus Torvalds
87fe1adb66 Merge tag 'safesetid-6.0' of https://github.com/micah-morton/linux
Pull SafeSetID updates from Micah Morton:
 "This contains one commit that touches common kernel code, one that
  adds functionality internal to the SafeSetID LSM code, and a few other
  commits that only modify the SafeSetID LSM selftest.

  The commit that touches common kernel code simply adds an LSM hook in
  the setgroups() syscall that mirrors what is done for the existing LSM
  hooks in the setuid() and setgid() syscalls. This commit combined with
  the SafeSetID-specific one allow the LSM to filter setgroups() calls
  according to configured rule sets in the same way that is already done
  for setuid() and setgid()"

* tag 'safesetid-6.0' of https://github.com/micah-morton/linux:
  LSM: SafeSetID: add setgroups() testing to selftest
  LSM: SafeSetID: Add setgroups() security policy handling
  security: Add LSM hook to setgroups() syscall
  LSM: SafeSetID: add GID testing to selftest
  LSM: SafeSetID: selftest cleanup and prepare for GIDs
  LSM: SafeSetID: fix userns bug in selftest
2022-08-02 15:12:13 -07:00
Linus Torvalds
f42e1e3e40 Merge tag 'audit-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
 "Two minor audit patches: on marks a function as static, the other
  removes a redundant length check"

* tag 'audit-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: make is_audit_feature_set() static
  audit: remove redundant data_len check
2022-08-02 14:56:25 -07:00
Linus Torvalds
6991a564f5 Merge tag 'hardening-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:

 - Fix Sparse warnings with randomizd kstack (GONG, Ruiqi)

 - Replace uintptr_t with unsigned long in usercopy (Jason A. Donenfeld)

 - Fix Clang -Wforward warning in LKDTM (Justin Stitt)

 - Fix comment to correctly refer to STRICT_DEVMEM (Lukas Bulwahn)

 - Introduce dm-verity binding logic to LoadPin LSM (Matthias Kaehlcke)

 - Clean up warnings and overflow and KASAN tests (Kees Cook)

* tag 'hardening-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  dm: verity-loadpin: Drop use of dm_table_get_num_targets()
  kasan: test: Silence GCC 12 warnings
  drivers: lkdtm: fix clang -Wformat warning
  x86: mm: refer to the intended config STRICT_DEVMEM in a comment
  dm: verity-loadpin: Use CONFIG_SECURITY_LOADPIN_VERITY for conditional compilation
  LoadPin: Enable loading from trusted dm-verity devices
  dm: Add verity helpers for LoadPin
  stack: Declare {randomize_,}kstack_offset to fix Sparse warnings
  lib: overflow: Do not define 64-bit tests on 32-bit
  MAINTAINERS: Add a general "kernel hardening" section
  usercopy: use unsigned long instead of uintptr_t
2022-08-02 14:38:59 -07:00
Linus Torvalds
8374cfe647 Merge tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:

 - Refactor DM core's mempool allocation so that it clearer by not being
   split acorss files.

 - Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.

 - Optimize DM core's more common bio splitting by eliminating the use
   of bio cloning with bio_split+bio_chain. Shift that cloning cost to
   the relatively unlikely dm_io requeue case that only occurs during
   error handling. Introduces dm_io_rewind() that will clone a bio that
   reflects the subset of the original bio that must be requeued.

 - Remove DM core's dm_table_get_num_targets() wrapper and audit all
   dm_table_get_target() callers.

 - Fix potential for OOM with DM writecache target by setting a default
   MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
   whichever is smaller).

 - Fix DM writecache target's stats that are reported through
   DM-specific table info.

 - Fix use-after-free crash in dm_sm_register_threshold_callback().

 - Refine DM core's Persistent Reservation handling in preparation for
   broader work Mike Christie is doing to add compatibility with
   Microsoft Windows Failover Cluster.

 - Fix various KASAN reported bugs in the DM raid target.

 - Fix DM raid target crash due to md_handle_request() bio splitting
   that recurses to block core without properly initializing the bio's
   bi_dev.

 - Fix some code comment typos and fix some Documentation formatting.

* tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
  dm: fix dm-raid crash if md_handle_request() splits bio
  dm raid: fix address sanitizer warning in raid_resume
  dm raid: fix address sanitizer warning in raid_status
  dm: Start pr_preempt from the same starting path
  dm: Fix PR release handling for non All Registrants
  dm: Start pr_reserve from the same starting path
  dm: Allow dm_call_pr to be used for path searches
  dm: return early from dm_pr_call() if DM device is suspended
  dm thin: fix use-after-free crash in dm_sm_register_threshold_callback
  dm writecache: count number of blocks discarded, not number of discard bios
  dm writecache: count number of blocks written, not number of write bios
  dm writecache: count number of blocks read, not number of read bios
  dm writecache: return void from functions
  dm kcopyd: use __GFP_HIGHMEM when allocating pages
  dm writecache: set a default MAX_WRITEBACK_JOBS
  Documentation: dm writecache: Render status list as list
  Documentation: dm writecache: add blank line before optional parameters
  dm snapshot: fix typo in snapshot_map() comment
  dm raid: remove redundant "the" in parse_raid_params() comment
  dm cache: fix typo in 2 comment blocks
  ...
2022-08-02 14:21:25 -07:00
Linus Torvalds
c013d0af81 Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - Improve the type checking of request flags (Bart)

 - Ensure queue mapping for a single queues always picks the right queue
   (Bart)

 - Sanitize the io priority handling (Jan)

 - rq-qos race fix (Jinke)

 - Reserved tags handling improvements (John)

 - Separate memory alignment from file/disk offset aligment for O_DIRECT
   (Keith)

 - Add new ublk driver, userspace block driver using io_uring for
   communication with the userspace backend (Ming)

 - Use try_cmpxchg() to cleanup the code in various spots (Uros)

 - Finally remove bdevname() (Christoph)

 - Clean up the zoned device handling (Christoph)

 - Clean up independent access range support (Christoph)

 - Clean up and improve block sysfs handling (Christoph)

 - Clean up and improve teardown of block devices.

   This turns the usual two step process into something that is simpler
   to implement and handle in block drivers (Christoph)

 - Clean up chunk size handling (Christoph)

 - Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu,
   Ming, Sebastian, Yang, Ying)

* tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits)
  ublk_drv: fix double shift bug
  ublk_drv: make sure that correct flags(features) returned to userspace
  ublk_drv: fix error handling of ublk_add_dev
  ublk_drv: fix lockdep warning
  block: remove __blk_get_queue
  block: call blk_mq_exit_queue from disk_release for never added disks
  blk-mq: fix error handling in __blk_mq_alloc_disk
  ublk: defer disk allocation
  ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask
  ublk: fold __ublk_create_dev into ublk_ctrl_add_dev
  ublk: cleanup ublk_ctrl_uring_cmd
  ublk: simplify ublk_ch_open and ublk_ch_release
  ublk: remove the empty open and release block device operations
  ublk: remove UBLK_IO_F_PREFLUSH
  ublk: add a MAINTAINERS entry
  block: don't allow the same type rq_qos add more than once
  mmc: fix disk/queue leak in case of adding disk failure
  ublk_drv: fix an IS_ERR() vs NULL check
  ublk: remove UBLK_IO_F_INTEGRITY
  ublk_drv: remove unneeded semicolon
  ...
2022-08-02 13:46:35 -07:00
Linus Torvalds
42df1cbf6a Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block
Pull io_uring zerocopy support from Jens Axboe:
 "This adds support for efficient support for zerocopy sends through
  io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and
  UDP.

  The core network changes to support this is in a stable branch from
  Jakub that both io_uring and net-next has pulled in, and the io_uring
  changes are layered on top of that.

  All of the work has been done by Pavel"

* tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits)
  io_uring: notification completion optimisation
  io_uring: export req alloc from core
  io_uring/net: use unsigned for flags
  io_uring/net: make page accounting more consistent
  io_uring/net: checks errors of zc mem accounting
  io_uring/net: improve io_get_notif_slot types
  selftests/io_uring: test zerocopy send
  io_uring: enable managed frags with register buffers
  io_uring: add zc notification flush requests
  io_uring: rename IORING_OP_FILES_UPDATE
  io_uring: flush notifiers after sendzc
  io_uring: sendzc with fixed buffers
  io_uring: allow to pass addr into sendzc
  io_uring: account locked pages for non-fixed zc
  io_uring: wire send zc request type
  io_uring: add notification slot registration
  io_uring: add rsrc referencing for notifiers
  io_uring: complete notifiers in tw
  io_uring: cache struct io_notif
  io_uring: add zc notification infrastructure
  ...
2022-08-02 13:37:55 -07:00
Linus Torvalds
98e2474640 Merge tag 'for-5.20/io_uring-buffered-writes-2022-07-29' of git://git.kernel.dk/linux-block
Pull io_uring buffered writes support from Jens Axboe:
 "This contains support for buffered writes, specifically for XFS. btrfs
  is in progress, will be coming in the next release.

  io_uring does support buffered writes on any file type, but since the
  buffered write path just always -EAGAIN (or -EOPNOTSUPP) any attempt
  to do so if IOCB_NOWAIT is set, any buffered write will effectively be
  handled by io-wq offload. This isn't very efficient, and we even have
  specific code in io-wq to serialize buffered writes to the same inode
  to avoid further inefficiencies with thread offload.

  This is particularly sad since most buffered writes don't block, they
  simply copy data to a page and dirty it. With this pull request, we
  can handle buffered writes a lot more effiently.

  If balance_dirty_pages() needs to block, we back off on writes as
  indicated.

  This improves buffered write support by 2-3x.

  Jan Kara helped with the mm bits for this, and Stefan handled the
  fs/iomap/xfs/io_uring parts of it"

* tag 'for-5.20/io_uring-buffered-writes-2022-07-29' of git://git.kernel.dk/linux-block:
  mm: honor FGP_NOWAIT for page cache page allocation
  xfs: Add async buffered write support
  xfs: Specify lockmode when calling xfs_ilock_for_iomap()
  io_uring: Add tracepoint for short writes
  io_uring: fix issue with io_write() not always undoing sb_start_write()
  io_uring: Add support for async buffered writes
  fs: Add async write file modification handling.
  fs: Split off inode_needs_update_time and __file_update_time
  fs: add __remove_file_privs() with flags parameter
  fs: add a FMODE_BUF_WASYNC flags for f_mode
  iomap: Return -EAGAIN from iomap_write_iter()
  iomap: Add async buffered write support
  iomap: Add flags parameter to iomap_page_create()
  mm: Add balance_dirty_pages_ratelimited_flags() function
  mm: Move updates of dirty_exceeded into one place
  mm: Move starting of background writeback into the main balancing loop
2022-08-02 13:27:23 -07:00
Linus Torvalds
b349b1181d Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:

 - As per (valid) complaint in the last merge window, fs/io_uring.c has
   grown quite large these days. io_uring isn't really tied to fs
   either, as it supports a wide variety of functionality outside of
   that.

   Move the code to io_uring/ and split it into files that either
   implement a specific request type, and split some code into helpers
   as well. The code is organized a lot better like this, and io_uring.c
   is now < 4K LOC (me).

 - Deprecate the epoll_ctl opcode. It'll still work, just trigger a
   warning once if used. If we don't get any complaints on this, and I
   don't expect any, then we can fully remove it in a future release
   (me).

 - Improve the cancel hash locking (Hao)

 - kbuf cleanups (Hao)

 - Efficiency improvements to the task_work handling (Dylan, Pavel)

 - Provided buffer improvements (Dylan)

 - Add support for recv/recvmsg multishot support. This is similar to
   the accept (or poll) support for have for multishot, where a single
   SQE can trigger everytime data is received. For applications that
   expect to do more than a few receives on an instantiated socket, this
   greatly improves efficiency (Dylan).

 - Efficiency improvements for poll handling (Pavel)

 - Poll cancelation improvements (Pavel)

 - Allow specifiying a range for direct descriptor allocations (Pavel)

 - Cleanup the cqe32 handling (Pavel)

 - Move io_uring types to greatly cleanup the tracing (Pavel)

 - Tons of great code cleanups and improvements (Pavel)

 - Add a way to do sync cancelations rather than through the sqe -> cqe
   interface, as that's a lot easier to use for some use cases (me).

 - Add support to IORING_OP_MSG_RING for sending direct descriptors to a
   different ring. This avoids the usually problematic SCM case, as we
   disallow those. (me)

 - Make the per-command alloc cache we use for apoll generic, place
   limits on it, and use it for netmsg as well (me).

 - Various cleanups (me, Michal, Gustavo, Uros)

* tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits)
  io_uring: ensure REQ_F_ISREG is set async offload
  net: fix compat pointer in get_compat_msghdr()
  io_uring: Don't require reinitable percpu_ref
  io_uring: fix types in io_recvmsg_multishot_overflow
  io_uring: Use atomic_long_try_cmpxchg in __io_account_mem
  io_uring: support multishot in recvmsg
  net: copy from user before calling __get_compat_msghdr
  net: copy from user before calling __copy_msghdr
  io_uring: support 0 length iov in buffer select in compat
  io_uring: fix multishot ending when not polled
  io_uring: add netmsg cache
  io_uring: impose max limit on apoll cache
  io_uring: add abstraction around apoll cache
  io_uring: move apoll cache to poll.c
  io_uring: consolidate hash_locked io-wq handling
  io_uring: clear REQ_F_HASH_LOCKED on hash removal
  io_uring: don't race double poll setting REQ_F_ASYNC_DATA
  io_uring: don't miss setting REQ_F_DOUBLE_POLL
  io_uring: disable multishot recvmsg
  io_uring: only trace one of complete or overflow
  ...
2022-08-02 13:20:44 -07:00
Chun-Kuang Hu
d9c26e0a58 mailbox: mtk-cmdq: Remove proprietary cmdq_task_cb
rx_callback is a standard mailbox callback mechanism and could cover the
function of proprietary cmdq_task_cb, so use the standard one instead of
the proprietary one. Client driver has changed to use standard
rx_callback, so remove proprietary cmdq_task_cb.

Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2022-08-02 15:06:57 -05:00