Commit Graph

1294908 Commits

Author SHA1 Message Date
Linus Torvalds
d1e9a63dcd Merge tag 'vfs-6.11-rc1.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
 "VFS:

   - The new 64bit mount ids start after the old mount id, i.e., at the
     first non-32 bit value. However, we started counting one id too
     late and thus lost 4294967296 as the first valid id. Fix that.

   - Update a few comments on some vfs_*() creation helpers.

   - Move copying of the xattr name out from the locks required to start
     a filesystem write.

   - Extend the filelock lock UAF fix to the compat code as well.

   - Now that we added the ability to look up an inode under RCU it's
     possible that lockless hash lookup can find and lock an inode after
     it gets I_FREEING set. It then waits until inode teardown in
     evict() is finished.

     The flag however is still set after evict() has woken up all
     waiters. If the inode lock is taken late enough on the waiting side
     after hash removal and wakeup happened the waiting thread will
     never be woken.

     Before RCU based lookup this was synchronized via the
     inode_hash_lock. But since unhashing requires the inode lock as
     well we can check whether the inode is unhashed while holding inode
     lock even without holding inode_hash_lock.

  pidfd:

   - The nsproxy structure contains nearly all of the namespaces
     associated with a task. When a namespace type isn't supported
     nsproxy might contain a NULL pointer or always point to the initial
     namespace type. The logic isn't consistent. So when deriving
     namespace fds we need to ensure that the namespace type is
     supported.

     First, so that we don't risk dereferncing NULL pointers. The
     correct bigger fix would be to change all namespaces to always set
     a valid namespace pointer in struct nsproxy independent of whether
     or not it is compiled in. But that requires quite a few changes.

     Second, so that we don't allow deriving namespace fds when the
     namespace type doesn't exist and thus when they couldn't also be
     derived via /proc/self/ns/.

   - Add missing selftests for the new pidfd ioctls to derive namespace
     fds. This simply extends the already existing testsuite.

  netfs:

   - Fix debug logging and fix kconfig variable name so it actually
     works.

   - Fix writeback that goes both to the server and cache. The streams
     are only activated once a subreq is added. When a server write
     happens the subreq doesn't need to have finished by the time the
     cache write is started. If the server write has already finished by
     the time the cache write is about to start the cache write will
     operate on a folio that might already have been reused. Fix this by
     preactivating the cache write.

   - Limit cachefiles subreq size for cache writes to MAX_RW_COUNT"

* tag 'vfs-6.11-rc1.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  inode: clarify what's locked
  vfs: Fix potential circular locking through setxattr() and removexattr()
  filelock: Fix fcntl/close race recovery compat path
  fs: use all available ids
  cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT
  netfs: Fix writeback that needs to go to both server and cache
  pidfs: add selftests for new namespace ioctls
  pidfs: handle kernels without namespaces cleanly
  pidfs: when time ns disabled add check for ioctl
  vfs: correct the comments of vfs_*() helpers
  vfs: handle __wait_on_freeing_inode() and evict() race
  netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG
  netfs: Revert "netfs: Switch debug logging to pr_debug()"
2024-07-24 09:42:51 -07:00
Benjamin Tissoires
facdbdfe0e selftests/hid: add test for attaching multiple time the same struct_ops
Turns out that we would en up in a bad state if we attempt to attach
twice the same HID-BPF struct_ops, so have a test for it.

Link: https://patch.msgid.link/20240723-fix-6-11-bpf-v1-4-b9d770346784@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-07-24 18:27:22 +02:00
Benjamin Tissoires
acd34cfc48 HID: bpf: prevent the same struct_ops to be attached more than once
If the struct_ops is already attached, we should bail out or we will
end up in various locks and pointer issues while unregistering.

Link: https://patch.msgid.link/20240723-fix-6-11-bpf-v1-3-b9d770346784@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-07-24 18:27:21 +02:00
Benjamin Tissoires
f64c1a4593 selftests/hid: disable struct_ops auto-attach
Since commit 08ac454e25 ("libbpf: Auto-attach struct_ops BPF maps in
BPF skeleton"), libbpf automatically calls bpf_map__attach_struct_ops()
on every struct_ops it sees in the bpf object. The problem is that
our test bpf object has many of them but only one should be manually
loaded at a time, or we end up locking the syscall.

Link: https://patch.msgid.link/20240723-fix-6-11-bpf-v1-2-b9d770346784@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-07-24 18:27:21 +02:00
Benjamin Tissoires
ff9fbcafba selftests/hid: fix bpf_wq new API
Since commit f56f4d541e ("bpf: helpers: fix bpf_wq_set_callback_impl
signature"), the API for bpf_wq changed a bit.

We need to update the selftests/hid code to reflect that or the
bpf program will not load.

Link: https://patch.msgid.link/20240723-fix-6-11-bpf-v1-1-b9d770346784@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-07-24 18:27:21 +02:00
Linus Torvalds
e44be00289 hostfs: fix folio conversion
Commit e3ec0fe944 ("hostfs: Convert hostfs_read_folio() to use a
folio") simplified hostfs_read_folio(), but in the process of converting
to using folios natively also mis-used the folio_zero_tail() function
due to the very confusing API of that function.

Very arguably it's folio_zero_tail() API itself that is buggy, since it
would make more sense (and the documentation kind of implies) that the
third argument would be the pointer to the beginning of the folio
buffer.

But no, the third argument to folio_zero_tail() is where we should start
zeroing the tail (even if we already also pass in the offset separately
as the second argument).

So fix the hostfs caller, and we can leave any folio_zero_tail() sanity
cleanup for later.

Reported-and-tested-by: Maciej Żenczykowski <maze@google.com>
Fixes: e3ec0fe944 ("hostfs: Convert hostfs_read_folio() to use a folio")
Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-24 09:25:15 -07:00
Luke D. Jones
e6e18021dd ALSA: hda/realtek: cs35l41: Fixup remaining asus strix models
Adjust quirks for 0x3a20, 0x3a30, 0x3a50 to match the 0x3a60. This
set has now been confirmed to work with this patch.

Signed-off-by: Luke D. Jones <luke@ljones.dev>
Fixes: 811dd426a9 ("ALSA: hda/realtek: Add quirks for Asus ROG 2024 laptops using CS35L41")
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20240723011224.115579-1-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-24 17:55:35 +02:00
Ming Lei
55fbb9a5d6 ublk: fix UBLK_CMD_DEL_DEV_ASYNC handling
In ublk_ctrl_uring_cmd(), ioctl command NR should be used for
matching _IOC_NR(cmd_op).

Fix it by adding one private macro, and this way is clean.

Fixes: 13fe8e6825 ("ublk: add UBLK_CMD_DEL_DEV_ASYNC")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240724143311.2646330-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24 09:51:46 -06:00
Yang Yang
7e04da2dc7 block: fix deadlock between sd_remove & sd_release
Our test report the following hung task:

[ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
[ 2538.459427] Call trace:
[ 2538.459430]  __switch_to+0x174/0x338
[ 2538.459436]  __schedule+0x628/0x9c4
[ 2538.459442]  schedule+0x7c/0xe8
[ 2538.459447]  schedule_preempt_disabled+0x24/0x40
[ 2538.459453]  __mutex_lock+0x3ec/0xf04
[ 2538.459456]  __mutex_lock_slowpath+0x14/0x24
[ 2538.459459]  mutex_lock+0x30/0xd8
[ 2538.459462]  del_gendisk+0xdc/0x350
[ 2538.459466]  sd_remove+0x30/0x60
[ 2538.459470]  device_release_driver_internal+0x1c4/0x2c4
[ 2538.459474]  device_release_driver+0x18/0x28
[ 2538.459478]  bus_remove_device+0x15c/0x174
[ 2538.459483]  device_del+0x1d0/0x358
[ 2538.459488]  __scsi_remove_device+0xa8/0x198
[ 2538.459493]  scsi_forget_host+0x50/0x70
[ 2538.459497]  scsi_remove_host+0x80/0x180
[ 2538.459502]  usb_stor_disconnect+0x68/0xf4
[ 2538.459506]  usb_unbind_interface+0xd4/0x280
[ 2538.459510]  device_release_driver_internal+0x1c4/0x2c4
[ 2538.459514]  device_release_driver+0x18/0x28
[ 2538.459518]  bus_remove_device+0x15c/0x174
[ 2538.459523]  device_del+0x1d0/0x358
[ 2538.459528]  usb_disable_device+0x84/0x194
[ 2538.459532]  usb_disconnect+0xec/0x300
[ 2538.459537]  hub_event+0xb80/0x1870
[ 2538.459541]  process_scheduled_works+0x248/0x4dc
[ 2538.459545]  worker_thread+0x244/0x334
[ 2538.459549]  kthread+0x114/0x1bc

[ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
[ 2538.461014] Call trace:
[ 2538.461016]  __switch_to+0x174/0x338
[ 2538.461021]  __schedule+0x628/0x9c4
[ 2538.461025]  schedule+0x7c/0xe8
[ 2538.461030]  blk_queue_enter+0xc4/0x160
[ 2538.461034]  blk_mq_alloc_request+0x120/0x1d4
[ 2538.461037]  scsi_execute_cmd+0x7c/0x23c
[ 2538.461040]  ioctl_internal_command+0x5c/0x164
[ 2538.461046]  scsi_set_medium_removal+0x5c/0xb0
[ 2538.461051]  sd_release+0x50/0x94
[ 2538.461054]  blkdev_put+0x190/0x28c
[ 2538.461058]  blkdev_release+0x28/0x40
[ 2538.461063]  __fput+0xf8/0x2a8
[ 2538.461066]  __fput_sync+0x28/0x5c
[ 2538.461070]  __arm64_sys_close+0x84/0xe8
[ 2538.461073]  invoke_syscall+0x58/0x114
[ 2538.461078]  el0_svc_common+0xac/0xe0
[ 2538.461082]  do_el0_svc+0x1c/0x28
[ 2538.461087]  el0_svc+0x38/0x68
[ 2538.461090]  el0t_64_sync_handler+0x68/0xbc
[ 2538.461093]  el0t_64_sync+0x1a8/0x1ac

  T1:				T2:
  sd_remove
  del_gendisk
  __blk_mark_disk_dead
  blk_freeze_queue_start
  ++q->mq_freeze_depth
  				bdev_release
 				mutex_lock(&disk->open_mutex)
  				sd_release
 				scsi_execute_cmd
 				blk_queue_enter
 				wait_event(!q->mq_freeze_depth)
  mutex_lock(&disk->open_mutex)

SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
this scenario. This is a classic ABBA deadlock. To fix the deadlock,
make sure we don't try to acquire disk->open_mutex after freezing
the queue.

Cc: stable@vger.kernel.org
Fixes: eec1be4c30 ("block: delete partitions later in del_gendisk")
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: and Cc: stable tags are missing. Otherwise this patch looks fine
Link: https://lore.kernel.org/r/20240724070412.22521-1-yang.yang@vivo.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24 09:51:21 -06:00
Mickaël Salaün
cc374782b6 selftests/landlock: Add cred_transfer test
Check that keyctl(KEYCTL_SESSION_TO_PARENT) preserves the parent's
restrictions.

Fixes: e1199815b4 ("selftests/landlock: Add user space tests")
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20240724.Ood5aige9she@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-07-24 17:34:56 +02:00
Jann Horn
39705a6c29 landlock: Don't lose track of restrictions on cred_transfer
When a process' cred struct is replaced, this _almost_ always invokes
the cred_prepare LSM hook; but in one special case (when
KEYCTL_SESSION_TO_PARENT updates the parent's credentials), the
cred_transfer LSM hook is used instead.  Landlock only implements the
cred_prepare hook, not cred_transfer, so KEYCTL_SESSION_TO_PARENT causes
all information on Landlock restrictions to be lost.

This basically means that a process with the ability to use the fork()
and keyctl() syscalls can get rid of all Landlock restrictions on
itself.

Fix it by adding a cred_transfer hook that does the same thing as the
existing cred_prepare hook. (Implemented by having hook_cred_prepare()
call hook_cred_transfer() so that the two functions are less likely to
accidentally diverge in the future.)

Cc: stable@kernel.org
Fixes: 385975dca5 ("landlock: Set up the security framework and manage credentials")
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20240724-landlock-houdini-fix-v1-1-df89a4560ca3@google.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-07-24 17:34:54 +02:00
Yunhui Cui
66381d3677 RISC-V: Select ACPI PPTT drivers
After adding ACPI support to populate_cache_leaves(), RISC-V can build
cacheinfo through the ACPI PPTT table, thus enabling the ACPI_PPTT
configuration.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20240617131425.7526-3-cuiyunhui@bytedance.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24 07:39:37 -07:00
Yunhui Cui
604f32ea69 riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT
Before cacheinfo can be built correctly, we need to initialize level
and type. Since RISC-V currently does not have a register group that
describes cache-related attributes like ARM64, we cannot obtain them
directly, so now we obtain cache leaves from the ACPI PPTT table
(acpi_get_cache_info()) and set the cache type through split_levels.

Suggested-by: Jeremy Linton <jeremy.linton@arm.com>
Suggested-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20240617131425.7526-2-cuiyunhui@bytedance.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24 07:39:36 -07:00
Yunhui Cui
ee3fab10cb riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init()
ci_leaf_init() is a declared static function. The implementation of the
function body and the caller do not use the parameter (struct device_node
*node) input parameter, so remove it.

Fixes: 6a24915145 ("Revert "riscv: Set more data to cacheinfo"")
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20240617131425.7526-1-cuiyunhui@bytedance.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24 07:39:35 -07:00
Sia Jee Heng
38738947db RISC-V: ACPI: Enable SPCR table for console output on RISC-V
The ACPI SPCR code has been used to enable console output for ARM64 and
X86. The same code can be reused for RISC-V. Furthermore, SPCR table is
mandated for headless system as outlined in the RISC-V BRS
Specification, chapter 6.

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20240502073751.102093-2-jeeheng.sia@starfivetech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24 07:33:37 -07:00
Jakub Kicinski
7c938e438c MAINTAINERS: make Breno the netconsole maintainer
netconsole has no maintainer, and Breno has been working on
improving it consistently for some time. So I think we found
the maintainer :)

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24 15:17:39 +01:00
Jay Vosburgh
0fa9af9611 MAINTAINERS: Update bonding entry
Update my email address, clarify support status, and delete the
web site that hasn't been used in a long time.

Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24 15:15:15 +01:00
Petr Machata
6d745cd0e9 net: nexthop: Initialize all fields in dumped nexthops
struct nexthop_grp contains two reserved fields that are not initialized by
nla_put_nh_group(), and carry garbage. This can be observed e.g. with
strace (edited for clarity):

    # ip nexthop add id 1 dev lo
    # ip nexthop add id 101 group 1
    # strace -e recvmsg ip nexthop get id 101
    ...
    recvmsg(... [{nla_len=12, nla_type=NHA_GROUP},
                 [{id=1, weight=0, resvd1=0x69, resvd2=0x67}]] ...) = 52

The fields are reserved and therefore not currently used. But as they are, they
leak kernel memory, and the fact they are not just zero complicates repurposing
of the fields for new ends. Initialize the full structure.

Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24 15:13:43 +01:00
Simon Horman
e9dbebae2e net: stmmac: Correct byte order of perfect_match
The perfect_match parameter of the update_vlan_hash operation is __le16,
and is correctly converted from host byte-order in the lone caller,
stmmac_vlan_update().

However, the implementations of this caller, dwxgmac2_update_vlan_hash()
and dwxgmac2_update_vlan_hash(), both treat this parameter as host byte
order, using the following pattern:

	u32 value = ...
	...
	writel(value | perfect_match, ...);

This is not correct because both:
1) value is host byte order; and
2) writel expects a host byte order value as it's first argument

I believe that this will break on big endian systems. And I expect it
has gone unnoticed by only being exercised on little endian systems.

The approach taken by this patch is to update the callback, and it's
caller to simply use a host byte order value.

Flagged by Sparse.
Compile tested only.

Fixes: c7ab0b8088 ("net: stmmac: Fallback to VLAN Perfect filtering if HASH is not available")
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24 15:11:44 +01:00
Pavel Begunkov
29d63b9403 io_uring: align iowq and task request error handling
There is a difference in how io_queue_sqe and io_wq_submit_work treat
error codes they get from io_issue_sqe. The first one fails anything
unknown but latter only fails when the code is negative.

It doesn't make sense to have this discrepancy, align them to the
io_queue_sqe behaviour.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c550e152bf4a290187f91a4322ddcb5d6d1f2c73.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24 08:01:49 -06:00
Pavel Begunkov
a2b72b81fb io_uring: kill REQ_F_CANCEL_SEQ
We removed the reliance on the flag by the cancellation path and now
it's unused.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e57afe566bbe4fefeb44daffb08900f2a4756577.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24 08:01:49 -06:00
Pavel Begunkov
f1dcdfcadb io_uring: simplify io_uring_cmd return
We don't have to return error code from an op handler back to core
io_uring, so once io_uring_cmd() sets the results and handles errors we
can juts return IOU_OK and simplify the code.

Note, only valid with e0b23d9953 ("io_uring: optimise ltimeout for
inline execution"), there was a problem with iopoll before.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8eae2be5b2a49236cd5f1dadbd1aa5730e9e2d4f.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24 08:01:49 -06:00
Pavel Begunkov
e142e9cd88 io_uring: fix io_match_task must_hold
The __must_hold annotation in io_match_task() uses a non existing
parameter "req", fix it.

Fixes: 6af3f48bf6 ("io_uring: fix link traversal locking")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3e65ee7709e96507cef3d93291746f2c489f2307.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24 08:01:49 -06:00
Pavel Begunkov
bd44d7e902 io_uring: don't allow netpolling with SETUP_IOPOLL
IORING_SETUP_IOPOLL rings don't have any netpoll handling, let's fail
attempts to register netpolling in this case, there might be people who
will mix up IOPOLL and netpoll.

Cc: stable@vger.kernel.org
Fixes: ef1186c1a8 ("io_uring: add register/unregister napi function")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1e7553aee0a8ae4edec6742cd6dd0c1e6914fba8.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24 08:01:49 -06:00
Pavel Begunkov
f8b632e89a io_uring: tighten task exit cancellations
io_uring_cancel_generic() should retry if any state changes like a
request is completed, however in case of a task exit it only goes for
another loop and avoids schedule() if any tracked (i.e. REQ_F_INFLIGHT)
request got completed.

Let's assume we have a non-tracked request executing in iowq and a
tracked request linked to it. Let's also assume
io_uring_cancel_generic() fails to find and cancel the request, i.e.
via io_run_local_work(), which may happen as io-wq has gaps.
Next, the request logically completes, io-wq still hold a ref but queues
it for completion via tw, which happens in
io_uring_try_cancel_requests(). After, right before prepare_to_wait()
io-wq puts the request, grabs the linked one and tries executes it, e.g.
arms polling. Finally the cancellation loop calls prepare_to_wait(),
there are no tw to run, no tracked request was completed, so the
tctx_inflight() check passes and the task is put to indefinite sleep.

Cc: stable@vger.kernel.org
Fixes: 3f48cf18f8 ("io_uring: unify files and task cancel")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/acac7311f4e02ce3c43293f8f1fda9c705d158f1.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24 08:01:49 -06:00
Jisheng Zhang
8d22d0db5b riscv: boot: remove duplicated targets line
The "targets:" is duplicated in another line, remove the one with less
targets.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20240613153053.3835-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24 06:14:06 -07:00
Jinjie Ruan
3308172276 trace: riscv: Remove deprecated kprobe on ftrace support
Since commit 7caa976546 ("ftrace: riscv: move from REGS to ARGS"),
kprobe on ftrace is not supported by riscv, because riscv's support for
FTRACE_WITH_REGS has been replaced with support for FTRACE_WITH_ARGS, and
KPROBES_ON_FTRACE will be supplanted by FPROBES. So remove the deprecated
kprobe on ftrace support, which is misunderstood.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240613111347.1745379-1-ruanjinjie@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24 06:14:05 -07:00
Hangbin Liu
863ff546fb selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
If the testing kernel doesn't support setting fdb_max_learned or show
fdb_n_learned, just skip it. Or we will get errors like

./bridge_fdb_learning_limit.sh: line 218: [: null: integer expression expected
./bridge_fdb_learning_limit.sh: line 225: [: null: integer expression expected

Fixes: 6f84090333 ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24 12:50:28 +01:00
Shigeru Yoshida
fa96c6baef tipc: Return non-zero value from tipc_udp_addr2str() on error
tipc_udp_addr2str() should return non-zero value if the UDP media
address is invalid. Otherwise, a buffer overflow access can occur in
tipc_media_addr_printf(). Fix this by returning 1 on an invalid UDP
media address.

Fixes: d0f91938be ("tipc: add ip/udp media type")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@endava.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24 12:18:03 +01:00
Rafael J. Wysocki
f7c1b0e4ae thermal: core: Back off when polling thermal zones on errors
Commit a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone
temperature is invalid") introduced a polling mechanism by which the
thermal core attampts to get a valid temperature value for thermal zones
where the .get_temp() callback returns errors to start with (for
example, due to initialization ordering woes).  However, this polling is
carried out periodically ad infinitum and every iteration of it causes
a message to be printed to the kernel log which means a lot of log noise
on systems where there are thermal zones that never get ready for some
reason.  It is also not really useful to continuously poll thermal zones
that never respond.

To address this, modify the thermal core to increase the delay between
consecutive thermal zone temperature checks after every check that fails
until it reaches a certain maximum value.  At that point, the thermal
zone in question will be disabled, but user space will be able to
reenable it if it believes that the failure is transient.

Also change the code to print messages regarding failed temperature
checks to the kernel log only twice, once when the thermal zone's
.get_temp() callback returns an error for the first time and once when
disabling the given thermal zone.  In addition, a dev_crit() message
will be printed at that point if the given thermal zone contains a
critical trip point to notify the system operator about the situation.

Fixes: a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Link: https://lore.kernel.org/linux-acpi/CAGnHSE=RyPK++UG0-wAtVKgeJxe0uzFYgLxm+RUOKKoQquW=Ow@mail.gmail.com/
Reported-by: Tom Yan <tom.ty89@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2962033.e9J7NaK4W3@rjwysocki.net
2024-07-24 12:40:23 +02:00
Peter Ujfalusi
e6fc5fcaef ASoC: SOF: ipc4-topology: Preserve the DMA Link ID for ChainDMA on unprepare
The DMA Link ID is set to the IPC message's primary during dai_config,
which is only during hw_params.
During xrun handling the hw_params is not called and the DMA Link ID
information will be lost.

All other fields in the message expected to be 0 for re-configuration, only
the DMA Link ID needs to be preserved and the in case of repeated
dai_config, it is correctly updated (masked and then set).

Cc: stable@vger.kernel.org
Fixes: ca5ce0caa6 ("ASoC: SOF: ipc4/intel: Add support for chained DMA")
Link: https://github.com/thesofproject/linux/issues/5116
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://patch.msgid.link/20240724081932.24542-3-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-07-24 11:29:13 +01:00
Peter Ujfalusi
ae67ed9010 ASoC: SOF: ipc4-topology: Only handle dai_config with HW_PARAMS for ChainDMA
The DMA Link ID is only valid in snd_sof_dai_config_data when the
dai_config is called with HW_PARAMS.

The commit that this patch fixes is actually moved a code section without
changing it, the same bug exists in the original code, needing different
patch to kernel prior to 6.9 kernels.

Cc: stable@vger.kernel.org
Fixes: 3858464de5 ("ASoC: SOF: ipc4-topology: change chain_dma handling in dai_config")
Link: https://github.com/thesofproject/linux/issues/5116
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://patch.msgid.link/20240724081932.24542-2-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-07-24 11:29:12 +01:00
Petr Vorel
ba6c664081 kbuild: rpm-pkg: Fix C locale setup
semicolon separation in LC_ALL is wrong. Either variable needs to be
exported before as a separate commit or set as part of the commit in the
beginning. Used second variant.

This fixes broken build on user's locale setup which makes 'date' binary
to produce invalid characters in rpm changelog (e.g. cs_CZ.UTF-8 'čec'):

$ make binrpm-pkg
  GEN     rpmbuild/SPECS/kernel.spec
rpmbuild -bb rpmbuild/SPECS/kernel.spec --define='_topdirlinux/rpmbuild' \
    --target x86_64-linux --build-in-place --noprep --define='_smp_mflags \
    %{nil}' $(rpm -q rpm >/dev/null 2>&1 || echo --nodeps)
Building target platforms: x86_64-linux
Building for target x86_64-linux
error: bad date in %changelog: St čec 24 2024 user <user@somehost>
make[2]: *** [scripts/Makefile.package:71: binrpm-pkg] Error 1
make[1]: *** [linux/Makefile:1546: binrpm-pkg] Error 2
make: *** [Makefile:224: __sub-make] Error 2

Fixes: 301c10908e ("kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec")
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-24 19:26:44 +09:00
Christian Brauner
f5e5e97c71 inode: clarify what's locked
In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held
but the inode is unhashed. We then release the inode_lock. So using
"locked" as parameter name is confusing. Use is_inode_hash_locked as
parameter name instead.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 11:11:40 +02:00
David Howells
c3a5e3e872 vfs: Fix potential circular locking through setxattr() and removexattr()
When using cachefiles, lockdep may emit something similar to the circular
locking dependency notice below.  The problem appears to stem from the
following:

 (1) Cachefiles manipulates xattrs on the files in its cache when called
     from ->writepages().

 (2) The setxattr() and removexattr() system call handlers get the name
     (and value) from userspace after taking the sb_writers lock, putting
     accesses of the vma->vm_lock and mm->mmap_lock inside of that.

 (3) The afs filesystem uses a per-inode lock to prevent multiple
     revalidation RPCs and in writeback vs truncate to prevent parallel
     operations from deadlocking against the server on one side and local
     page locks on the other.

Fix this by moving the getting of the name and value in {get,remove}xattr()
outside of the sb_writers lock.  This also has the minor benefits that we
don't need to reget these in the event of a retry and we never try to take
the sb_writers lock in the event we can't pull the name and value into the
kernel.

Alternative approaches that might fix this include moving the dispatch of a
write to the cache off to a workqueue or trying to do without the
validation lock in afs.  Note that this might also affect other filesystems
that use netfslib and/or cachefiles.

 ======================================================
 WARNING: possible circular locking dependency detected
 6.10.0-build2+ #956 Not tainted
 ------------------------------------------------------
 fsstress/6050 is trying to acquire lock:
 ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0

 but task is already holding lock:
 ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #4 (&vma->vm_lock->lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_write+0x3b/0x50
        vma_start_write+0x6b/0xa0
        vma_link+0xcc/0x140
        insert_vm_struct+0xb7/0xf0
        alloc_bprm+0x2c1/0x390
        kernel_execve+0x65/0x1a0
        call_usermodehelper_exec_async+0x14d/0x190
        ret_from_fork+0x24/0x40
        ret_from_fork_asm+0x1a/0x30

 -> #3 (&mm->mmap_lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        __might_fault+0x7c/0xb0
        strncpy_from_user+0x25/0x160
        removexattr+0x7f/0x100
        __do_sys_fremovexattr+0x7e/0xb0
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #2 (sb_writers#14){.+.+}-{0:0}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        percpu_down_read+0x3c/0x90
        vfs_iocb_iter_write+0xe9/0x1d0
        __cachefiles_write+0x367/0x430
        cachefiles_issue_write+0x299/0x2f0
        netfs_advance_write+0x117/0x140
        netfs_write_folio.isra.0+0x5ca/0x6e0
        netfs_writepages+0x230/0x2f0
        afs_writepages+0x4d/0x70
        do_writepages+0x1e8/0x3e0
        filemap_fdatawrite_wbc+0x84/0xa0
        __filemap_fdatawrite_range+0xa8/0xf0
        file_write_and_wait_range+0x59/0x90
        afs_release+0x10f/0x270
        __fput+0x25f/0x3d0
        __do_sys_close+0x43/0x70
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #1 (&vnode->validate_lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_read+0x95/0x200
        afs_writepages+0x37/0x70
        do_writepages+0x1e8/0x3e0
        filemap_fdatawrite_wbc+0x84/0xa0
        filemap_invalidate_inode+0x167/0x1e0
        netfs_unbuffered_write_iter+0x1bd/0x2d0
        vfs_write+0x22e/0x320
        ksys_write+0xbc/0x130
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #0 (mapping.invalidate_lock#3){++++}-{3:3}:
        check_noncircular+0x119/0x160
        check_prev_add+0x195/0x430
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_read+0x95/0x200
        filemap_fault+0x26e/0x8b0
        __do_fault+0x57/0xd0
        do_pte_missing+0x23b/0x320
        __handle_mm_fault+0x2d4/0x320
        handle_mm_fault+0x14f/0x260
        do_user_addr_fault+0x2a2/0x500
        exc_page_fault+0x71/0x90
        asm_exc_page_fault+0x22/0x30

 other info that might help us debug this:

 Chain exists of:
   mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   rlock(&vma->vm_lock->lock);
                                lock(&mm->mmap_lock);
                                lock(&vma->vm_lock->lock);
   rlock(mapping.invalidate_lock#3);

  *** DEADLOCK ***

 1 lock held by fsstress/6050:
  #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250

 stack backtrace:
 CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956
 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x80
  check_noncircular+0x119/0x160
  ? queued_spin_lock_slowpath+0x4be/0x510
  ? __pfx_check_noncircular+0x10/0x10
  ? __pfx_queued_spin_lock_slowpath+0x10/0x10
  ? mark_lock+0x47/0x160
  ? init_chain_block+0x9c/0xc0
  ? add_chain_block+0x84/0xf0
  check_prev_add+0x195/0x430
  __lock_acquire+0xaf0/0xd80
  ? __pfx___lock_acquire+0x10/0x10
  ? __lock_release.isra.0+0x13b/0x230
  lock_acquire.part.0+0x103/0x280
  ? filemap_fault+0x26e/0x8b0
  ? __pfx_lock_acquire.part.0+0x10/0x10
  ? rcu_is_watching+0x34/0x60
  ? lock_acquire+0xd7/0x120
  down_read+0x95/0x200
  ? filemap_fault+0x26e/0x8b0
  ? __pfx_down_read+0x10/0x10
  ? __filemap_get_folio+0x25/0x1a0
  filemap_fault+0x26e/0x8b0
  ? __pfx_filemap_fault+0x10/0x10
  ? find_held_lock+0x7c/0x90
  ? __pfx___lock_release.isra.0+0x10/0x10
  ? __pte_offset_map+0x99/0x110
  __do_fault+0x57/0xd0
  do_pte_missing+0x23b/0x320
  __handle_mm_fault+0x2d4/0x320
  ? __pfx___handle_mm_fault+0x10/0x10
  handle_mm_fault+0x14f/0x260
  do_user_addr_fault+0x2a2/0x500
  exc_page_fault+0x71/0x90
  asm_exc_page_fault+0x22/0x30

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/2136178.1721725194@warthog.procyon.org.uk
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: Gao Xiang <xiang@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-erofs@lists.ozlabs.org
cc: linux-fsdevel@vger.kernel.org
[brauner: fix minor issues]
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:14 +02:00
Jann Horn
f8138f2ad2 filelock: Fix fcntl/close race recovery compat path
When I wrote commit 3cad1bc010 ("filelock: Remove locks reliably when
fcntl/close race is detected"), I missed that there are two copies of the
code I was patching: The normal version, and the version for 64-bit offsets
on 32-bit kernels.
Thanks to Greg KH for stumbling over this while doing the stable
backport...

Apply exactly the same fix to the compat path for 32-bit kernels.

Fixes: c293621bbf ("[PATCH] stale POSIX lock handling")
Cc: stable@kernel.org
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:14 +02:00
Christian Brauner
8eac5358ad fs: use all available ids
The counter is unconditionally incremented for each mount allocation.
If we set it to 1ULL << 32 we're losing 4294967296 as the first valid
non-32 bit mount id.

Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-1-834113cab0d2@kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:13 +02:00
David Howells
51d37982bb cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT
Set the maximum size of a subrequest that writes to cachefiles to be
MAX_RW_COUNT so that we don't overrun the maximum write we can make to the
backing filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/1599005.1721398742@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:13 +02:00
David Howells
212be98aa1 netfs: Fix writeback that needs to go to both server and cache
When netfslib is performing writeback (ie. ->writepages), it maintains two
parallel streams of writes, one to the server and one to the cache, but it
doesn't mark either stream of writes as active until it gets some data that
needs to be written to that stream.

This is done because some folios will only be written to the cache
(e.g. copying to the cache on read is done by marking the folios and
letting writeback do the actual work) and sometimes we'll only be writing
to the server (e.g. if there's no cache).

Now, since we don't actually dispatch uploads and cache writes in parallel,
but rather flip between the streams, depending on which has the lowest
so-far-issued offset, and don't wait for the subreqs to finish before
flipping, we can end up in a situation where, say, we issue a write to the
server and this completes before we start the write to the cache.

But because we only activate a stream when we first add a subreq to it, the
result collection code may run before we manage to activate the stream -
resulting in the folio being cleaned and having the writeback-in-progress
mark removed.  At this point, the folio no longer belongs to us.

This is only really a problem for folios that need to be written to both
streams - and in that case, the upload to the server is started first,
followed by the write to the cache - and the cache write may see a bad
folio.

Fix this by activating the cache stream up front if there's a cache
available.  If there's a cache, then all data is going to be written to it.

Fixes: 288ace2f57 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/1599053.1721398818@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:13 +02:00
Christian Brauner
1bb8dce5df pidfs: add selftests for new namespace ioctls
Add selftests to verify that deriving namespace file descriptors from
pidfd file descriptors works correctly.

Link: https://lore.kernel.org/r/20240722-work-pidfs-69dbea91edab@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:13 +02:00
Christian Brauner
9b3e150464 pidfs: handle kernels without namespaces cleanly
The nsproxy structure contains nearly all of the namespaces associated
with a task. When a given namespace type is not supported by this kernel
the rules whether the corresponding pointer in struct nsproxy is NULL or
always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be
the case and for all namespace types we'd always set it to
init_<ns_type>_ns when the corresponding namespace type isn't supported.

Make sure we handle all namespaces where the pointer in struct nsproxy
can be NULL when the namespace type isn't supported.

Link: https://lore.kernel.org/r/20240722-work-pidfs-e6a83030f63e@brauner
Fixes: 5b08bd4085 ("pidfs: allow retrieval of namespace file descriptors") # mainline only
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:13 +02:00
Edward Adam Davis
f60d38cb02 pidfs: when time ns disabled add check for ioctl
syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled
CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in
open_namespace.

Fixes: 5b08bd4085 ("pidfs: allow retrieval of namespace file descriptors") # mainline only
Reported-and-tested-by: syzbot+34a0ee986f61f15da35d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Link: https://lore.kernel.org/r/tencent_7FAE8DB725EE0DD69236DDABDDDE195E4F07@qq.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:12 +02:00
Congjie Zhou
b40c8e7a03 vfs: correct the comments of vfs_*() helpers
correct the comments of vfs_*() helpers in fs/namei.c, including:
1. vfs_create()
2. vfs_mknod()
3. vfs_mkdir()
4. vfs_rmdir()
5. vfs_symlink()

All of them come from the same commit:
6521f89170 "namei: prepare for idmapped mounts"

The @dentry is actually the dentry of child directory rather than
base directory(parent directory), and thus the @dir has to be
modified due to the change of @dentry.

Signed-off-by: Congjie Zhou <zcjie0802@qq.com>
Link: https://lore.kernel.org/r/tencent_2FCF6CC9E10DC8A27AE58A5A0FE4FCE96D0A@qq.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:12 +02:00
Mateusz Guzik
5bc9ad78c2 vfs: handle __wait_on_freeing_inode() and evict() race
Lockless hash lookup can find and lock the inode after it gets the
I_FREEING flag set, at which point it blocks waiting for teardown in
evict() to finish.

However, the flag is still set even after evict() wakes up all waiters.

This results in a race where if the inode lock is taken late enough, it
can happen after both hash removal and wakeups, meaning there is nobody
to wake the racing thread up.

This worked prior to RCU-based lookup because the entire ordeal was
synchronized with the inode hash lock.

Since unhashing requires the inode lock, we can safely check whether it
happened after acquiring it.

Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Fixes: 7180f8d91f ("vfs: add rcu-based find_inode variants for iget ops")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240718151838.611807-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:52:58 +02:00
David Howells
fcad93360d netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG
CONFIG_FSCACHE_DEBUG should have been renamed to CONFIG_NETFS_DEBUG, so do
that now.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/1410796.1721333406@warthog.procyon.org.uk
cc: Uwe Kleine-König <ukleinek@kernel.org>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:15:38 +02:00
David Howells
a9d47a50cf netfs: Revert "netfs: Switch debug logging to pr_debug()"
Revert commit 163eae0fb0 to get back the
original operation of the debugging macros.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org
Link: https://lore.kernel.org/r/1410685.1721333252@warthog.procyon.org.uk
cc: Uwe Kleine-König <ukleinek@kernel.org>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:15:37 +02:00
Florian Westphal
a16909ae99 netfilter: nft_set_pipapo_avx2: disable softinterrupts
We need to disable softinterrupts, else we get following problem:

1. pipapo_avx2 called from process context; fpu usable
2. preempt_disable() called, pcpu scratchmap in use
3. softirq handles rx or tx, we re-enter pipapo_avx2
4. fpu busy, fallback to generic non-avx version
5. fallback reuses scratch map and index, which are in use
   by the preempted process

Handle this same way as generic version by first disabling
softinterrupts while the scratchmap is in use.

Fixes: f0b3d33806 ("netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version")
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-24 10:01:59 +02:00
Linus Torvalds
786c8248db Merge tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
 "Two fixes for building perf and other tools:

   - Fix breakage in tracing tools due to pkg-config for
     libtrace{event,fs}

   - Fix build of perf when libunwind is used"

* tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf dso: Fix build when libunwind is enabled
  tools/latency: Use pkg-config in lib_setup of Makefile.config
  tools/rtla: Use pkg-config in lib_setup of Makefile.config
  tools/verification: Use pkg-config in lib_setup of Makefile.config
  tools: Make pkg-config dependency checks usable by other tools
  perf build: Warn if libtracefs is not found
2024-07-23 18:15:51 -07:00
Linus Torvalds
e9e969797b Merge tag 'execve-v6.11-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fix from Kees Cook:
 "This moves the exec and binfmt_elf tests out of your way and into the
  tests/ subdirectory, following the newly ratified KUnit naming
  conventions. :)"

* tag 'execve-v6.11-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  execve: Move KUnit tests to tests/ subdirectory
2024-07-23 17:30:42 -07:00
Helge Deller
cbade82334 parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
Allow users to disable kernel warnings for unaligned memory
accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap
procfs entry.
That way users can disable those warnings in case they happen too
often.

Signed-off-by: Helge Deller <deller@gmx.de>
2024-07-24 02:04:05 +02:00