We should only set the max segment size to unlimited if we actually
have a virt boundary. Otherwise we accidentally clear that limit
when called from the SCSI midlayer, which always calls
blk_queue_virt_boundary, even if that mask is 0.
Fixes: 7ad388d8e4 ("scsi: core: add a host / host template field for the virt boundary")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Christoph.
* 'nvme-5.3' of git://git.infradead.org/nvme:
Revert "nvme-pci: don't create a read hctx mapping without read queues"
nvme: fix multipath crash when ANA is deactivated
nvme: fix memory leak caused by incorrect subsystem free
nvme: ignore subnqn for ADATA SX6000LNP
Daniel reports that when testing an http server that uses io_uring
to poll for incoming connections, sometimes it hard crashes. This is
due to an uninitialized list member for the io_uring request. Normally
this doesn't trigger and none of the test cases caught it.
Reported-by: Daniel Kozak <kozzi11@gmail.com>
Tested-by: Daniel Kozak <kozzi11@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit 0298d54352.
With this patch, set 'poll_queues > hard queues' will lead to 'nr_read_queues = 0'
in nvme_calc_irq_sets. Then poll_queues setting can fail since dev->tagset.nr_maps
equals to 2 and nvme_pci_map_queues will not do map for poll queues.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When freeing the subsystem after finding another match with
__nvme_find_get_subsystem(), use put_device() instead of
__nvme_release_subsystem() which calls kfree() directly.
Per the documentation, put_device() should always be used
after device_initialization() is called. Otherwise, leaks
like the one below which was detected by kmemleak may occur.
Once the call of __nvme_release_subsystem() is removed it no
longer makes sense to keep the helper, so fold it back
into nvme_release_subsystem().
unreferenced object 0xffff8883d12bfbc0 (size 16):
comm "nvme", pid 2635, jiffies 4294933602 (age 739.952s)
hex dump (first 16 bytes):
6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff nvme-subsys2....
backtrace:
[<000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0
[<0000000081169e5f>] kvasprintf+0xad/0x130
[<0000000025626f25>] kvasprintf_const+0x47/0x120
[<00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120
[<000000004881f8b3>] dev_set_name+0x98/0xc0
[<000000007124dae3>] nvme_init_identify+0x1995/0x38e0
[<000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0
[<000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80
[<00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220
[<000000002259b3d5>] __vfs_write+0x66/0x120
[<000000002f6df81e>] vfs_write+0x154/0x490
[<000000007e8cfc19>] ksys_write+0x10a/0x240
[<00000000ff5c7b85>] __x64_sys_write+0x73/0xb0
[<00000000fee6d692>] do_syscall_64+0xaa/0x470
[<00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: ab9e00cc72 ("nvme: track subsystems")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The ADATA SX6000LNP NVMe SSDs have the same subnqn and, due to this, a
system with more than one of these SSDs will only have one usable.
[ 0.942706] nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C).
[ 0.943017] nvme nvme1: Removing after probe failure status: -22
02:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
71:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
There are no firmware updates available from the vendor, unfortunately.
Applying the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk for these SSDs resolves
the issue, and they all work after this patch:
/dev/nvme0n1 2J1120050420 ADATA SX6000LNP [...]
/dev/nvme1n1 2J1120050540 ADATA SX6000LNP [...]
Signed-off-by: Misha Nasledov <misha@nasledov.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Building with clang and KASAN, we get a warning about an overly large
stack frame on 32-bit architectures:
drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect'
[-Werror,-Wframe-larger-than=]
We already allocate other data dynamically in this function, so
just do the same for the shash descriptor, which makes up most of
this memory.
Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_mq_sched_completed_request is a function that checks if the elevator
related to the request has started_request implemented, but currently, none of
the available IO schedulers implement started_request, so remove both.
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
memory malloced in bch_cached_dev_run() and should be freed before
leaving from the error handling cases, otherwise it will cause
memory leak.
Fixes: 0b13efecf5 ("bcache: add return value check to bch_cached_dev_run()")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We are using PAGE_SIZE as the unit to determine if the total len in
async_list has exceeded max_pages, it's not fair for smaller io sizes.
For example, if we are doing 1k-size io streams, we will never exceed
max_pages since len >>= PAGE_SHIFT always gets zero. So use original
bytes to make it more accurate.
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hrvoje reports that when a large fixed buffer is registered and IO is
being done to the latter pages of said buffer, the IO submission time
is much worse:
reading to the start of the buffer: 11238 ns
reading to the end of the buffer: 1039879 ns
In fact, it's worse by two orders of magnitude. The reason for that is
how io_uring figures out how to setup the iov_iter. We point the iter
at the first bvec, and then use iov_iter_advance() to fast-forward to
the offset within that buffer we need.
However, that is abysmally slow, as it entails iterating the bvecs
that we setup as part of buffer registration. There's really no need
to use this generic helper, as we know it's a BVEC type iterator, and
we also know that each bvec is PAGE_SIZE in size, apart from possibly
the first and last. Hence we can just use a shift on the offset to
find the right index, and then adjust the iov_iter appropriately.
After this fix, the timings are:
reading to the start of the buffer: 10135 ns
reading to the end of the buffer: 1377 ns
Or about an 755x improvement for the tail page.
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A caller is supposed to pass in REQ_NOWAIT if we can't block for any
given operation, but O_DIRECT for block devices just ignore this. Hence
we'll block for various resource shortages on the block layer side,
like having to wait for requests.
Use the new REQ_NOWAIT_INLINE to ask for this error to be returned
inline, so we can handle it appropriately and return -EAGAIN to the
caller.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
By default, if a caller sets REQ_NOWAIT and we need to block, we'll
return -EAGAIN through the bio->bi_end_io() callback. For some use
cases, this makes it hard to use.
Allow a caller to ask for inline return of errors related to
blocking by also setting REQ_NOWAIT_INLINE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a hang issue while using fio to do some basic test. The issue
can be easily reproduced using the below script:
while true
do
fio --ioengine=io_uring -rw=write -bs=4k -numjobs=1 \
-size=1G -iodepth=64 -name=uring --filename=/dev/zero
done
After several minutes (or more), fio would block at
io_uring_enter->io_cqring_wait in order to waiting for previously
committed sqes to be completed and can't return to user anymore until
we send a SIGTERM to fio. After receiving SIGTERM, fio hangs at
io_ring_ctx_wait_and_kill with a backtrace like this:
[54133.243816] Call Trace:
[54133.243842] __schedule+0x3a0/0x790
[54133.243868] schedule+0x38/0xa0
[54133.243880] schedule_timeout+0x218/0x3b0
[54133.243891] ? sched_clock+0x9/0x10
[54133.243903] ? wait_for_completion+0xa3/0x130
[54133.243916] ? _raw_spin_unlock_irq+0x2c/0x40
[54133.243930] ? trace_hardirqs_on+0x3f/0xe0
[54133.243951] wait_for_completion+0xab/0x130
[54133.243962] ? wake_up_q+0x70/0x70
[54133.243984] io_ring_ctx_wait_and_kill+0xa0/0x1d0
[54133.243998] io_uring_release+0x20/0x30
[54133.244008] __fput+0xcf/0x270
[54133.244029] ____fput+0xe/0x10
[54133.244040] task_work_run+0x7f/0xa0
[54133.244056] do_exit+0x305/0xc40
[54133.244067] ? get_signal+0x13b/0xbd0
[54133.244088] do_group_exit+0x50/0xd0
[54133.244103] get_signal+0x18d/0xbd0
[54133.244112] ? _raw_spin_unlock_irqrestore+0x36/0x60
[54133.244142] do_signal+0x34/0x720
[54133.244171] ? exit_to_usermode_loop+0x7e/0x130
[54133.244190] exit_to_usermode_loop+0xc0/0x130
[54133.244209] do_syscall_64+0x16b/0x1d0
[54133.244221] entry_SYSCALL_64_after_hwframe+0x49/0xbe
The reason is that we had added a req to ctx->pending_async at the very
end, but it didn't get a chance to be processed. How could this happen?
fio#cpu0 wq#cpu1
io_add_to_prev_work io_sq_wq_submit_work
atomic_read() <<< 1
atomic_dec_return() << 1->0
list_empty(); <<< true;
list_add_tail()
atomic_read() << 0 or 1?
As atomic_ops.rst states, atomic_read does not guarantee that the
runtime modification by any other thread is visible yet, so we must take
care of that with a proper implicit or explicit memory barrier.
This issue was detected with the help of Jackie's <liuyun01@kylinos.cn>
Fixes: 31b5151064 ("io_uring: allow workqueue item to handle multiple buffered requests")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Oleg noticed that our checking of data.got_token is unsafe in the
cleanup case, and should really use a memory barrier. Use a wmb on the
write side, and a rmb() on the read side. We don't need one in the main
loop since we're saved by set_current_state().
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In case we get a spurious wakeup we need to make sure to re-set
ourselves to TASK_UNINTERRUPTIBLE so we don't busy wait.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we raced with somebody else getting an inflight counter we could fail
to get an inflight counter with no sleepers on the list, and thus need
to go to sleep. In this case has_sleepers should be true because we are
now relying on the waker to get our inflight counter for us. And in the
case of spurious wakeups we'd still want this to be the case. So set
has_sleepers to true if we went to sleep to make sure we're woken up the
proper way.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We saw a hang in production with WBT where there was only one waiter in
the throttle path and no outstanding IO. This is because of the
has_sleepers optimization that is used to make sure we don't steal an
inflight counter for new submitters when there are people already on the
list.
We can race with our check to see if the waitqueue has any waiters (this
is done locklessly) and the time we actually add ourselves to the
waitqueue. If this happens we'll go to sleep and never be woken up
because nobody is doing IO to wake us up.
Fix this by checking if the waitqueue has a single sleeper on the list
after we add ourselves, that way we have an uptodate view of the list.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
rq-qos sits in the io path so we want to take locks as sparingly as
possible. To accomplish this we try not to take the waitqueue head lock
unless we are sure we need to go to sleep, and we have an optimization
to make sure that we don't starve out existing waiters. Since we check
if there are existing waiters locklessly we need to be able to update
our view of the waitqueue list after we've added ourselves to the
waitqueue. Accomplish this by adding this helper to see if there is
more than just ourselves on the list.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Consider a sync bfq_queue Q that remains empty while in service, and
suppose that, when this happens, there is a fair amount of already
in-flight I/O not belonging to Q. In such a situation, I/O dispatching
may need to be plugged (until new I/O arrives for Q), for the
following reason.
The drive may decide to serve in-flight non-Q's I/O requests before
Q's ones, thereby delaying the arrival of new I/O requests for Q
(recall that Q is sync). If I/O-dispatching is not plugged, then,
while Q remains empty, a basically uncontrolled amount of I/O from
other queues may be dispatched too, possibly causing the service of
Q's I/O to be delayed even longer in the drive. This problem gets more
and more serious as the speed and the queue depth of the drive grow,
because, as these two quantities grow, the probability to find no
queue busy but many requests in flight grows too.
If Q has the same weight and priority as the other queues, then the
above delay is unlikely to cause any issue, because all queues tend to
undergo the same treatment. So, since not plugging I/O dispatching is
convenient for throughput, it is better not to plug. Things change in
case Q has a higher weight or priority than some other queue, because
Q's service guarantees may simply be violated. For this reason,
commit 1de0c4cd9e ("block, bfq: reduce idling only in symmetric
scenarios") does plug I/O in such an asymmetric scenario. Plugging
minimizes the delay induced by already in-flight I/O, and enables Q to
recover the bandwidth it may lose because of this delay.
Yet the above commit does not cover the case of weight-raised queues,
for efficiency concerns. For weight-raised queues, I/O-dispatch
plugging is activated simply if not all bfq_queues are
weight-raised. But this check does not handle the case of in-flight
requests, because a bfq_queue may become non busy *before* all its
in-flight requests are completed.
This commit performs I/O-dispatch plugging for weight-raised queues if
there are some in-flight requests.
As a practical example of the resulting recover of control, under
write load on a Samsung SSD 970 PRO, gnome-terminal starts in 1.5
seconds after this fix, against 15 seconds before the fix (as a
reference, gnome-terminal takes about 35 seconds to start with any of
the other I/O schedulers).
Fixes: 1de0c4cd9e ("block, bfq: reduce idling only in symmetric scenarios")
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The runtime configurable module parameter files are located under
/sys/module/MODULENAME/parameters, not /sys/module/MODULENAME.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, ->pd_stat() is called only when moduleparam
blkcg_debug_stats is set which prevents it from printing non-debug
policy-specific statistics. Let's move debug testing down so that
->pd_stat() can print non-debug stat too. This patch doesn't cause
any visible behavior change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We could queue a work for each req in defer and link list without
increasing async_list->cnt, so we shouldn't decrease it while exiting
from workqueue as well if we didn't process the req in async list.
Thanks to Jens Axboe <axboe@kernel.dk> for his guidance.
Fixes: 31b5151064 ("io_uring: allow workqueue item to handle multiple buffered requests")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
sq->cached_sq_head and cq->cached_cq_tail are both unsigned int. If
cached_sq_head overflows before cached_cq_tail, then we may miss a
barrier req. As cached_cq_tail always follows cached_sq_head, the NQ
should be enough.
Cc: stable@vger.kernel.org
Fixes: de0617e467 ("io_uring: add support for marking commands as draining")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull more block updates from Jens Axboe:
"A later pull request with some followup items. I had some vacation
coming up to the merge window, so certain things items were delayed a
bit. This pull request also contains fixes that came in within the
last few days of the merge window, which I didn't want to push right
before sending you a pull request.
This contains:
- NVMe pull request, mostly fixes, but also a few minor items on the
feature side that were timing constrained (Christoph et al)
- Report zones fixes (Damien)
- Removal of dead code (Damien)
- Turn on cgroup psi memstall (Josef)
- block cgroup MAINTAINERS entry (Konstantin)
- Flush init fix (Josef)
- blk-throttle low iops timing fix (Konstantin)
- nbd resize fixes (Mike)
- nbd 0 blocksize crash fix (Xiubo)
- block integrity error leak fix (Wenwen)
- blk-cgroup writeback and priority inheritance fixes (Tejun)"
* tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits)
MAINTAINERS: add entry for block io cgroup
null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED
block: Limit zone array allocation size
sd_zbc: Fix report zones buffer allocation
block: Kill gfp_t argument of blkdev_report_zones()
block: Allow mapping of vmalloc-ed buffers
block/bio-integrity: fix a memory leak bug
nvme: fix NULL deref for fabrics options
nbd: add netlink reconfigure resize support
nbd: fix crash when the blksize is zero
block: Disable write plugging for zoned block devices
block: Fix elevator name declaration
block: Remove unused definitions
nvme: fix regression upon hot device removal and insertion
blk-throttle: fix zero wait time for iops throttled group
block: Fix potential overflow in blk_report_zones()
blkcg: implement REQ_CGROUP_PUNT
blkcg, writeback: Implement wbc_blkcg_css()
blkcg, writeback: Add wbc->no_cgroup_owner
blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner()
...
Pull i2c updates from Wolfram Sang:
"New stuff from the I2C world:
- in the core, getting irqs from ACPI is now similar to OF
- new driver for MediaTek MT7621/7628/7688 SoCs
- bcm2835, i801, and tegra drivers got some more attention
- GPIO API cleanups
- cleanups in the core headers
- lots of usual driver updates"
* 'i2c/for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (74 commits)
i2c: mt7621: Fix platform_no_drv_owner.cocci warnings
i2c: cpm: remove casting dma_alloc
dt-bindings: i2c: sun6i-p2wi: Fix the binding example
dt-bindings: i2c: mv64xxx: Fix the example compatible
i2c: i801: Documentation update
i2c: i801: Add support for Intel Tiger Lake
i2c: i801: Fix PCI ID sorting
dt-bindings: i2c-stm32: document optional dmas
i2c: i2c-stm32f7: Add I2C_SMBUS_I2C_BLOCK_DATA support
i2c: core: Tidy up handling of init_irq
i2c: core: Move ACPI gpio IRQ handling into i2c_acpi_get_irq
i2c: core: Move ACPI IRQ handling to probe time
i2c: acpi: Factor out getting the IRQ from ACPI
i2c: acpi: Use available IRQ helper functions
i2c: core: Allow whole core to use i2c_dev_irq_from_resources
eeprom: at24: modify a comment referring to platform data
dt-bindings: i2c: omap: Add new compatible for J721E SoCs
dt-bindings: i2c: mv64xxx: Add YAML schemas
dt-bindings: i2c: sun6i-p2wi: Add YAML schemas
i2c: mt7621: Add MediaTek MT7621/7628/7688 I2C driver
...
Pull power supply and reset updates from Sebastian Reichel:
"Core:
- add HWMON compat layer
- new properties:
- input power limit
- input voltage limit
Drivers:
- qcom-pon: add gen2 support
- new driver for storing reboot move in NVMEM
- new driver for Wilco EC charger configuration
- simplify getting the adapter of a client"
* tag 'for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: reset: nvmem-reboot-mode: add CONFIG_OF dependency
power_supply: wilco_ec: Add charging config driver
power: supply: cros: allow to set input voltage and current limit
power: supply: add input power and voltage limit properties
power: supply: fix semicolon.cocci warnings
power: reset: nvmem-reboot-mode: use NVMEM as reboot mode write interface
dt-bindings: power: reset: add document for NVMEM based reboot-mode
reset: qcom-pon: Add support for gen2 pon
dt-bindings: power: reset: qcom: Add qcom,pm8998-pon compatibility line
power: supply: Add HWMON compatibility layer
power: supply: sbs-manager: simplify getting the adapter of a client
power: supply: rt9455_charger: simplify getting the adapter of a client
power: supply: rt5033_battery: simplify getting the adapter of a client
power: supply: max17042_battery: simplify getting the adapter of a client
power: supply: max17040_battery: simplify getting the adapter of a client
power: supply: max14656_charger_detector: simplify getting the adapter of a client
power: supply: bq25890_charger: simplify getting the adapter of a client
power: supply: bq24257_charger: simplify getting the adapter of a client
power: supply: bq24190_charger: simplify getting the adapter of a client
Pull rdma updates from Jason Gunthorpe:
"A smaller cycle this time. Notably we see another new driver, 'Soft
iWarp', and the deletion of an ancient unused driver for nes.
- Revise and simplify the signature offload RDMA MR APIs
- More progress on hoisting object allocation boiler plate code out
of the drivers
- Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib,
i40iw
- Tree wide cleanups: struct_size, put_user_page, xarray, rst doc
conversion
- Removal of obsolete ib_ucm chardev and nes driver
- netlink based discovery of chardevs and autoloading of the modules
providing them
- Move more of the rdamvt/hfi1 uapi to include/uapi/rdma
- New driver 'siw' for software based iWarp running on top of netdev,
much like rxe's software RoCE.
- mlx5 feature to report events in their raw devx format to userspace
- Expose per-object counters through rdma tool
- Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core
from netdev"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (194 commits)
RMDA/siw: Require a 64 bit arch
RDMA/siw: Mark expected switch fall-throughs
RDMA/core: Fix -Wunused-const-variable warnings
rdma/siw: Remove set but not used variable 's'
rdma/siw: Add missing dependencies on LIBCRC32C and DMA_VIRT_OPS
RDMA/siw: Add missing rtnl_lock around access to ifa
rdma/siw: Use proper enumerated type in map_cqe_status
RDMA/siw: Remove unnecessary kthread create/destroy printouts
IB/rdmavt: Fix variable shadowing issue in rvt_create_cq
RDMA/core: Fix race when resolving IP address
RDMA/core: Make rdma_counter.h compile stand alone
IB/core: Work on the caller socket net namespace in nldev_newlink()
RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
RDMA/mlx5: Set RDMA DIM to be enabled by default
RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink
RDMA/core: Provide RDMA DIM support for ULPs
linux/dim: Implement RDMA adaptive moderation (DIM)
IB/mlx5: Report correctly tag matching rendezvous capability
docs: infiniband: add it to the driver-api bookset
IB/mlx5: Implement VHCA tunnel mechanism in DEVX
...
Pull MFD updates from Lee Jones:
"Core Frameworks:
- Set 'struct device' fwnode when registering a new device
New Drivers:
- Add support for ROHM BD70528 PMIC
New Device Support:
- Add support for LP87561 4-Phase Regulator to TI LP87565 PMIC
- Add support for RK809 and RK817 to Rockchip RK808
- Add support for Lid Angle to ChromeOS core
- Add support for CS47L15 CODEC to Madera core
- Add support for CS47L92 CODEC to Madera core
- Add support for ChromeOS (legacy) Accelerometers in ChromeOS core
- Add support for Add Intel Elkhart Lake PCH to Intel LPSS
New Functionality:
- Provide regulator supply information when registering; madera-core
- Additional Device Tree support; lp87565, madera, cros-ec, rohm,bd71837-pmic
- Allow over-riding power button press via Device Tree; rohm-bd718x7
- Differentiate between running processors; cros_ec_dev
Fix-ups:
- Big header file update; cros_ec_commands.h
- Split header per-subsystem; rohm-bd718x7
- Remove superfluous code; menelaus, cs5535-mfd, cs47lXX-tables
- Trivial; sorting, coding style; intel-lpss-pci
- Only remove Power Off functionality if set locally; rk808
- Make use for Power Off Prepare(); rk808
- Fix spelling mistake in header guards; stmfx
- Properly free IDA resources
- SPDX fixups; cs47lXX-tables, madera
- Error path fixups; hi655x-pmic
Bug Fixes:
- Add missing break in case() statement
- Repair undefined behaviour when not initialising variables; arizona-core, madera-core
- Fix reference to Device Tree documentation; madera"
* tag 'mfd-next-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (45 commits)
mfd: hi655x-pmic: Fix missing return value check for devm_regmap_init_mmio_clk
mfd: madera: Fixup SPDX headers
mfd: madera: Remove some unused registers and fix some defaults
mfd: intel-lpss: Release IDA resources
mfd: intel-lpss: Add Intel Elkhart Lake PCH PCI IDs
mfd: cs5535-mfd: Remove ifdef OLPC noise
mfd: stmfx: Fix macro definition spelling
dt-bindings: mfd: Add link to ROHM BD71847 Datasheet
MAINAINERS: Swap words in INTEL PMIC MULTIFUNCTION DEVICE DRIVERS
mfd: cros_ec_dev: Register cros_ec_accel_legacy driver as a subdevice
mfd: rk808: Prepare rk805 for poweroff
mfd: rk808: Check pm_power_off pointer
mfd: cros_ec: differentiate SCP from EC by feature bit
dt-bindings: Add binding for cros-ec-rpmsg
mfd: madera: Add Madera core support for CS47L92
mfd: madera: Add Madera core support for CS47L15
mfd: madera: Update DT bindings to add additional CODECs
mfd: madera: Add supply mapping for MICVDD
mfd: madera: Fix potential uninitialised use of variable
mfd: madera: Fix bad reference to pinctrl.txt file
...
This reverts commit 031e610a6a, reversing
changes made to 52d2d44eee.
The mm changes in there we premature and not fully ack or reviewed by core mm folks,
I dropped the ball by merging them via this tree, so lets take em all back out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
mm/pgtable: drop pgtable_t variable from pte_fn_t functions
drops the token came in via the hmm tree, this caused lots of
conflicts, but applying this cleanup patch should reduce it
to something easier to handle. Just accept the token is unused
at this point.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Pull HMM updates from Jason Gunthorpe:
"Improvements and bug fixes for the hmm interface in the kernel:
- Improve clarity, locking and APIs related to the 'hmm mirror'
feature merged last cycle. In linux-next we now see AMDGPU and
nouveau to be using this API.
- Remove old or transitional hmm APIs. These are hold overs from the
past with no users, or APIs that existed only to manage cross tree
conflicts. There are still a few more of these cleanups that didn't
make the merge window cut off.
- Improve some core mm APIs:
- export alloc_pages_vma() for driver use
- refactor into devm_request_free_mem_region() to manage
DEVICE_PRIVATE resource reservations
- refactor duplicative driver code into the core dev_pagemap
struct
- Remove hmm wrappers of improved core mm APIs, instead have drivers
use the simplified API directly
- Remove DEVICE_PUBLIC
- Simplify the kconfig flow for the hmm users and core code"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
mm: remove the HMM config option
mm: sort out the DEVICE_PRIVATE Kconfig mess
mm: simplify ZONE_DEVICE page private data
mm: remove hmm_devmem_add
mm: remove hmm_vma_alloc_locked_page
nouveau: use devm_memremap_pages directly
nouveau: use alloc_page_vma directly
PCI/P2PDMA: use the dev_pagemap internal refcount
device-dax: use the dev_pagemap internal refcount
memremap: provide an optional internal refcount in struct dev_pagemap
memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
memremap: remove the data field in struct dev_pagemap
memremap: add a migrate_to_ram method to struct dev_pagemap_ops
memremap: lift the devmap_enable manipulation into devm_memremap_pages
memremap: pass a struct dev_pagemap to ->kill and ->cleanup
memremap: move dev_pagemap callbacks into a separate structure
memremap: validate the pagemap type passed to devm_memremap_pages
mm: factor out a devm_request_free_mem_region helper
mm: export alloc_pages_vma
...
Pull eCryptfs updates from Tyler Hicks:
- Fix error handling when ecryptfs_read_lower() encounters an error
- Fix read-only file creation when the eCryptfs mount is configured to
store metadata in xattrs
- Minor code cleanups
* tag 'ecryptfs-5.3-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: Change return type of ecryptfs_process_flags
ecryptfs: Make ecryptfs_xattr_handler static
ecryptfs: remove unnessesary null check in ecryptfs_keyring_auth_tok_for_sig
ecryptfs: use print_hex_dump_bytes for hexdump
eCryptfs: fix permission denied with ecryptfs_xattr mount option when create readonly file
ecryptfs: re-order a condition for static checkers
eCryptfs: fix a couple type promotion bugs
Pull UBIFS updates from Richard Weinberger:
- Support for zstd compression
- Support for offline signed filesystems
- Various fixes for regressions
* tag 'upstream-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: Don't leak orphans on memory during commit
ubifs: Check link count of inodes when killing orphans.
ubifs: Add support for zstd compression.
ubifs: support offline signed images
ubifs: remove unnecessary check in ubifs_log_start_commit
ubifs: Fix typo of output in get_cs_sqnum
ubifs: Simplify redundant code
ubifs: Correctly use tnc_next() in search_dh_cookie()
Pull UML updates from Richard Weinberger:
- A new timer mode, time travel, for testing with UML
- Many bugixes/improvements for the serial line driver
- Various bugfixes
* tag 'for-linus-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT
um: Fix kcov crash during startup
um: configs: Remove useless UEVENT_HELPER_PATH
um: Support time travel mode
um: Pass nsecs to os timer functions
um: Remove drivers/ssl.h
um: Don't garbage collect in deactivate_all_fds()
um: Silence lockdep complaint about mmap_sem
um: Remove locking in deactivate_all_fds()
um: Timer code cleanup
um: fix os_timer_one_shot()
um: Fix IRQ controller regression on console read
Pull stream_open() updates from Kirill Smelkov:
"This time on stream_open front it is only two small changes:
- the first one converts stream_open.cocci to treat all functions
that start with wait_.* as blocking. Previously it was only
wait_event_.* functions that were considered as blocking, but this
was falsely reporting several deadlock cases as only warning.
This was picked by linux-kbuild and entered mainline as commit
0c4ab18fc3 ("coccinelle: api/stream_open: treat all wait_.*()
calls as blocking"), and already merged earlier.
- the second one teaches stream_open.cocci to consider files as being
stream-like even if they use noop_llseek. It results in two more
drivers being converted to stream_open() (mousedev.c and
hid-sensor-custom.c)"
* tag 'stream_open-5.3' of https://lab.nexedi.com/kirr/linux:
*: convert stream-like files -> stream_open, even if they use noop_llseek
Pull x86 platform driver updates from Andy Shevchenko:
"Gathered a bunch of x86 platform driver changes. It's rather big,
since includes two big refactors and completely new driver:
- ASUS WMI driver got a big refactoring in order to support the TUF
Gaming laptops. Besides that, the regression with backlight being
permanently off on various EeePC laptops has been fixed.
- Accelerometer on HP ProBook 450 G0 shows wrong measurements due to
X axis being inverted. This has been fixed.
- Intel PMC core driver has been extended to be ACPI enumerated if
the DSDT provides device with _HID "INT33A1". This allows to
convert the driver to be pure platform and support new hardware
purely based on ACPI DSDT.
- From now on the Intel Speed Select Technology is supported thru a
corresponding driver. This driver provides an access to the
features of the ISST, such as Performance Profile, Core Power, Base
frequency and Turbo Frequency.
- Mellanox platform drivers has been refactored and now extended to
support more systems, including new coming ones.
- The OLPC XO-1.75 platform is now supported.
- CB4063 Beckhoff Automation board is using PMC clocks, provided via
pmc_atom driver, for ethernet controllers in a way that they can't
be managed by the clock driver. The quirk has been extended to
cover this case.
- Touchscreen on Chuwi Hi10 Plus tablet has been enabled. Meanwhile
the information of Chuwi Hi10 Air has been fixed to cover more
models based on the same platform.
- Xiaomi notebooks have WMI interface enabled. Thus, the driver to
support it has been provided. It required some extension of the
generic WMI library, which allows to propagate opaque context to
the ->probe() of the individual drivers.
This release includes debugfs clean up from Greg KH for several
drivers that drop return code check and make debugfs absence or
failure non-fatal.
Also miscellaneous fixes here and there, mostly for Acer WMI and
various Intel drivers"
* tag 'platform-drivers-x86-v5.3-1' of git://git.infradead.org/linux-platform-drivers-x86: (74 commits)
platform/x86: Fix PCENGINES_APU2 Kconfig warning
tools/power/x86/intel-speed-select: Add .gitignore file
platform/x86: mlx-platform: Fix error handling in mlxplat_init()
platform/x86: intel_pmc_core: Attach using APCI HID "INT33A1"
platform/x86: intel_pmc_core: transform Pkg C-state residency from TSC ticks into microseconds
platform/x86: asus-wmi: Use dev_get_drvdata()
Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces
platform/x86: mlx-platform: Add more reset cause attributes
platform/x86: mlx-platform: Modify DMI matching order
platform/x86: mlx-platform: Add regmap structure for the next generation systems
platform/x86: mlx-platform: Change API for i2c-mlxcpld driver activation
platform/x86: mlx-platform: Move regmap initialization before all drivers activation
MAINTAINERS: Update for Intel Speed Select Technology
tools/power/x86: A tool to validate Intel Speed Select commands
platform/x86: ISST: Restore state on resume
platform/x86: ISST: Add Intel Speed Select PUNIT MSR interface
platform/x86: ISST: Add Intel Speed Select mailbox interface via MSRs
platform/x86: ISST: Add Intel Speed Select mailbox interface via PCI
platform/x86: ISST: Add Intel Speed Select mmio interface
platform/x86: ISST: Add IOCTL to Translate Linux logical CPU to PUNIT CPU number
...
Pull mailbox updates from Jassi Brar:
- stm32: race fix by adding a spinlock
- mhu: trim included headers
- omap: add support for K3 SoCs
- imx: Irq disable fix
- bcm: tidy up extracting driver data
- tegra: make resume 'noirq'
- api: fix error handling
* tag 'mailbox-v5.3' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: handle failed named mailbox channel request
mailbox: tegra: avoid resume NULL mailboxes
mailbox: tegra: hsp: add noirq resume
mailbox: bcm-flexrm-mailbox: using dev_get_drvdata directly
mailbox: imx: Clear GIEn bit at shutdown
mailbox: omap: Add support for TI K3 SoCs
dt-bindings: mailbox: omap: Update bindings for TI K3 SoCs
mailbox: arm_mhu: reorder header inclusion and drop unneeded ones
mailbox: stm32_ipcc: add spinlock to fix channels concurrent access
Pull percpu updates from Dennis Zhou:
"This includes changes to let percpu_ref release the backing percpu
memory earlier after it has been switched to atomic in cases where the
percpu ref is not revived.
This will help recycle percpu memory earlier in cases where the
refcounts are pinned for prolonged periods of time"
* 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu_ref: release percpu memory early without PERCPU_REF_ALLOW_REINIT
md: initialize percpu refcounters using PERCU_REF_ALLOW_REINIT
io_uring: initialize percpu refcounters using PERCU_REF_ALLOW_REINIT
percpu_ref: introduce PERCPU_REF_ALLOW_REINIT flag
Pull perf fixes from Ingo Molnar:
"A number of PMU driver corner case fixes, a race fix, an event
grouping fix, plus a bunch of tooling fixes/updates"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
perf/x86/intel: Fix spurious NMI on fixed counter
perf/core: Fix exclusive events' grouping
perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs
perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs
perf/core: Fix race between close() and fork()
perf intel-pt: Fix potential NULL pointer dereference found by the smatch tool
perf intel-bts: Fix potential NULL pointer dereference found by the smatch tool
perf script: Assume native_arch for pipe mode
perf scripts python: export-to-sqlite.py: Fix DROP VIEW power_events_view
perf scripts python: export-to-postgresql.py: Fix DROP VIEW power_events_view
perf hists browser: Fix potential NULL pointer dereference found by the smatch tool
perf cs-etm: Fix potential NULL pointer dereference found by the smatch tool
perf parse-events: Remove unused variable: error
perf parse-events: Remove unused variable 'i'
perf metricgroup: Add missing list_del_init() when flushing egroups list
perf tools: Use list_del_init() more thorougly
perf tools: Use zfree() where applicable
tools lib: Adopt zalloc()/zfree() from tools/perf
perf tools: Move get_current_dir_name() cond prototype out of util.h
perf namespaces: Move the conditional setns() prototype to namespaces.h
...
Pull locking fix from Ingo Molnar:
"A single fix for a locking statistics bug"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Fix lock used or unused stats error
Pull x86 fix from Ingo Molnar:
"A single build system bugfix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Fix flip/flop vdso build bug
Pull scheduler fix from Ingo Molnar:
"Fix a sched statistics related bug that would trigger a kernel warning
on certain configs"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix preempt warning in ttwu
This patch continues 10dce8af34 (fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock) and c5bf68fe0c (*: convert stream-like files from
nonseekable_open -> stream_open) and teaches steam_open.cocci to
consider files as being stream-like not only if they have
.llseek=no_llseek, but also if they have .llseek=noop_llseek.
This is safe to do: the comment about noop_llseek says
This is an implementation of ->llseek useable for the rare special case when
userspace expects the seek to succeed but the (device) file is actually not
able to perform the seek. In this case you use noop_llseek() instead of
falling back to the default implementation of ->llseek.
and in general noop_llseek was massively added to drivers in 6038f373a3
(llseek: automatically add .llseek fop) when changing default for NULL .llseek
from NOP to no_llseek with the idea to avoid breaking compatibility, if
maybe some user-space program was using lseek on a device without caring
about the result, but caring if it was an error or not.
Amended semantic patch produces two changes when applied tree-wide:
drivers/hid/hid-sensor-custom.c:690:8-24: WARNING: hid_sensor_custom_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/input/mousedev.c:564:1-17: ERROR: mousedev_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jan Blunck <jblunck@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Removal of the NPU DMA code, used by the out-of-tree Nvidia driver,
as well as some other functions only used by drivers that haven't
(yet?) made it upstream.
- A fix for a bug in our handling of hardware watchpoints (eg. perf
record -e mem: ...) which could lead to register corruption and
kernel crashes.
- Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for
vmalloc when using the Radix MMU.
- A large but incremental rewrite of our exception handling code to
use gas macros rather than multiple levels of nested CPP macros.
And the usual small fixes, cleanups and improvements.
Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab,
Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann,
Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe
Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis
Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert
Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz,
Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N.
Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi
Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher
Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj
Jitindar Singh, Thiago Jung Bauermann, YueHaibing"
* tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits)
powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.
powerpc/eeh: Handle hugepages in ioremap space
ocxl: Update for AFU descriptor template version 1.1
powerpc/boot: pass CONFIG options in a simpler and more robust way
powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
powerpc/irq: Don't WARN continuously in arch_local_irq_restore()
powerpc/module64: Use symbolic instructions names.
powerpc/module32: Use symbolic instructions names.
powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h
powerpc/module64: Fix comment in R_PPC64_ENTRY handling
powerpc/boot: Add lzo support for uImage
powerpc/boot: Add lzma support for uImage
powerpc/boot: don't force gzipped uImage
powerpc/8xx: Add microcode patch to move SMC parameter RAM.
powerpc/8xx: Use IO accessors in microcode programming.
powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c
powerpc/8xx: refactor programming of microcode CPM params.
powerpc/8xx: refactor printing of microcode patch name.
powerpc/8xx: Refactor microcode write
powerpc/8xx: refactor writing of CPM microcode arrays
...