Commit Graph

1090101 Commits

Author SHA1 Message Date
Eric Biggers
cb8435dc8b ext4: reject the 'commit' option on ext2 filesystems
The 'commit' option is only applicable for ext3 and ext4 filesystems,
and has never been accepted by the ext2 filesystem driver, so the ext4
driver shouldn't allow it on ext2 filesystems.

This fixes a failure in xfstest ext4/053.

Fixes: 8dc0aa8cf0 ("ext4: check incompatible mount options while mounting ext2/3")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220510183232.172615-1-ebiggers@kernel.org
2022-05-18 11:24:22 -04:00
Yang Li
b10b6278ae ext4: remove duplicated #include of dax.h in inode.c
Fix following includecheck warning:
./fs/ext4/inode.c: linux/dax.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220504225025.44753-1-yang.lee@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-18 11:24:22 -04:00
Baokun Li
f87c7a4b08 ext4: fix race condition between ext4_write and ext4_convert_inline_data
Hulk Robot reported a BUG_ON:
 ==================================================================
 EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
 block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
 kernel BUG at fs/ext4/ext4_jbd2.c:53!
 invalid opcode: 0000 [#1] SMP KASAN PTI
 CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
 RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
 RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
 [...]
 Call Trace:
  ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
  generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
  ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
  ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
  do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
  do_iter_write+0x107/0x430 fs/read_write.c:861
  vfs_writev fs/read_write.c:934 [inline]
  do_pwritev+0x1e5/0x380 fs/read_write.c:1031
 [...]
 ==================================================================

Above issue may happen as follows:
           cpu1                     cpu2
__________________________|__________________________
do_pwritev
 vfs_writev
  do_iter_write
   ext4_file_write_iter
    ext4_buffered_write_iter
     generic_perform_write
      ext4_da_write_begin
                           vfs_fallocate
                            ext4_fallocate
                             ext4_convert_inline_data
                              ext4_convert_inline_data_nolock
                               ext4_destroy_inline_data_nolock
                                clear EXT4_STATE_MAY_INLINE_DATA
                               ext4_map_blocks
                                ext4_ext_map_blocks
                                 ext4_mb_new_blocks
                                  ext4_mb_regular_allocator
                                   ext4_mb_good_group_nolock
                                    ext4_mb_init_group
                                     ext4_mb_init_cache
                                      ext4_mb_generate_buddy  --> error
       ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
                                ext4_restore_inline_data
                                 set EXT4_STATE_MAY_INLINE_DATA
       ext4_block_write_begin
      ext4_da_write_end
       ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
       ext4_write_inline_data_end
        handle=NULL
        ext4_journal_stop(handle)
         __ext4_journal_stop
          ext4_put_nojournal(handle)
           ref_cnt = (unsigned long)handle
           BUG_ON(ref_cnt == 0)  ---> BUG_ON

The lock held by ext4_convert_inline_data is xattr_sem, but the lock
held by generic_perform_write is i_rwsem. Therefore, the two locks can
be concurrent.

To solve above issue, we add inode_lock() for ext4_convert_inline_data().
At the same time, move ext4_convert_inline_data() in front of
ext4_punch_hole(), remove similar handling from ext4_punch_hole().

Fixes: 0c8d414f16 ("ext4: let fallocate handle inline data correctly")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220428134031.4153381-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-17 14:17:40 -04:00
Zhang Yi
6493792d32 ext4: convert symlink external data block mapping to bdev
Symlink's external data block is one kind of metadata block, and now
that almost all ext4 metadata block's page cache (e.g. directory blocks,
quota blocks...) belongs to bdev backing inode except the symlink. It
is essentially worked in data=journal mode like other regular file's
data block because probably in order to make it simple for generic VFS
code handling symlinks or some other historical reasons, but the logic
of creating external data block in ext4_symlink() is complicated. and it
also make things confused if user do not want to let the filesystem
worked in data=journal mode. This patch convert the final exceptional
case and make things clean, move the mapping of the symlink's external
data block to bdev like any other metadata block does.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20220424140936.1898920-3-yi.zhang@huawei.com
2022-05-17 14:17:40 -04:00
Zhang Yi
9558cf14e8 ext4: add nowait mode for ext4_getblk()
Current ext4_getblk() might sleep if some resources are not valid or
could be race with a concurrent extents modifing procedure. So we
cannot call ext4_getblk() and ext4_map_blocks() to get map blocks in
the atomic context in some fast path (e.g. the upcoming procedure of
getting symlink external block in the RCU context), even if the map
extents have already been check and cached.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20220424140936.1898920-2-yi.zhang@huawei.com
2022-05-17 14:17:40 -04:00
Ojaswin Mujoo
e4e58e5df3 ext4: fix journal_ioprio mount option handling
In __ext4_super() we always overwrote the user specified journal_ioprio
value with a default value, expecting parse_apply_sb_mount_options() to
later correctly set ctx->journal_ioprio to the user specified value.
However, if parse_apply_sb_mount_options() returned early because of
empty sbi->es_s->s_mount_opts, the correct journal_ioprio value was
never set.

This patch fixes __ext4_super() to only use the default value if the
user has not specified any value for journal_ioprio.

Similarly, the remount behavior was to either use journal_ioprio
value specified during initial mount, or use the default value
irrespective of the journal_ioprio value specified during remount.
This patch modifies this to first check if a new value for ioprio
has been passed during remount and apply it.  If no new value is
passed, use the value specified during initial mount.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220418083545.45778-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-05-17 14:17:29 -04:00
Dmitry Monakhov
d63c00ea43 ext4: mark group as trimmed only if it was fully scanned
Otherwise nonaligned fstrim calls will works inconveniently for iterative
scanners, for example:

// trim [0,16MB] for group-1, but mark full group as trimmed
fstrim  -o $((1024*1024*128)) -l $((1024*1024*16)) ./m
// handle [16MB,16MB] for group-1, do nothing because group already has the flag.
fstrim  -o $((1024*1024*144)) -l $((1024*1024*16)) ./m

[ Update function documentation for ext4_trim_all_free -- TYT ]

Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Link: https://lore.kernel.org/r/1650214995-860245-1-git-send-email-dmtrmonakhov@yandex-team.ru
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-05-17 14:17:21 -04:00
Ye Bin
0be698ecbe ext4: fix use-after-free in ext4_rename_dir_prepare
We got issue as follows:
EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue
ext4_get_first_dir_block: bh->b_data=0xffff88810bee6000 len=34478
ext4_get_first_dir_block: *parent_de=0xffff88810beee6ae bh->b_data=0xffff88810bee6000
ext4_rename_dir_prepare: [1] parent_de=0xffff88810beee6ae
==================================================================
BUG: KASAN: use-after-free in ext4_rename_dir_prepare+0x152/0x220
Read of size 4 at addr ffff88810beee6ae by task rep/1895

CPU: 13 PID: 1895 Comm: rep Not tainted 5.10.0+ #241
Call Trace:
 dump_stack+0xbe/0xf9
 print_address_description.constprop.0+0x1e/0x220
 kasan_report.cold+0x37/0x7f
 ext4_rename_dir_prepare+0x152/0x220
 ext4_rename+0xf44/0x1ad0
 ext4_rename2+0x11c/0x170
 vfs_rename+0xa84/0x1440
 do_renameat2+0x683/0x8f0
 __x64_sys_renameat+0x53/0x60
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f45a6fc41c9
RSP: 002b:00007ffc5a470218 EFLAGS: 00000246 ORIG_RAX: 0000000000000108
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f45a6fc41c9
RDX: 0000000000000005 RSI: 0000000020000180 RDI: 0000000000000005
RBP: 00007ffc5a470240 R08: 00007ffc5a470160 R09: 0000000020000080
R10: 00000000200001c0 R11: 0000000000000246 R12: 0000000000400bb0
R13: 00007ffc5a470320 R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:00000000440015ce refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x10beee
flags: 0x200000000000000()
raw: 0200000000000000 ffffea00043ff4c8 ffffea0004325608 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88810beee580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88810beee600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88810beee680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                  ^
 ffff88810beee700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88810beee780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
ext4_rename_dir_prepare: [2] parent_de->inode=3537895424
ext4_rename_dir_prepare: [3] dir=0xffff888124170140
ext4_rename_dir_prepare: [4] ino=2
ext4_rename_dir_prepare: ent->dir->i_ino=2 parent=-757071872

Reason is first directory entry which 'rec_len' is 34478, then will get illegal
parent entry. Now, we do not check directory entry after read directory block
in 'ext4_get_first_dir_block'.
To solve this issue, check directory entry in 'ext4_get_first_dir_block'.

[ Trigger an ext4_error() instead of just warning if the directory is
  missing a '.' or '..' entry.   Also make sure we return an error code
  if the file system is corrupted.  -TYT ]

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220414025223.4113128-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-05-17 14:16:56 -04:00
Zhang Yi
4808cb5b98 ext4: add unmount filesystem message
Now that we have kernel message at mount time, system administrator
could acquire the mount time, device and options easily. But we don't
have corresponding unmounting message at umount time, so we cannot know
if someone umount a filesystem easily. Some of the modern filesystems
(e.g. xfs) have the umounting kernel message, so add one for ext4
filesystem for convenience.

 EXT4-fs (sdb): mounted filesystem with ordered data mode. Quota mode: none.
 EXT4-fs (sdb): unmounting filesystem.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220412145320.2669897-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-13 17:00:39 -04:00
Lv Ruyi
784a09951c ext4: remove unnecessary conditionals
iput() has already handled null and non-null parameter, so it is no
need to use if().

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Link: https://lore.kernel.org/r/20220411032337.2517465-1-lv.ruyi@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-13 16:27:24 -04:00
Jinke Han
af2b327581 ext4: remove unnecessary code in __mb_check_buddy
When enter elseif branch, the the MB_CHECK_ASSERT will never fail.
In addtion, the only illegal combination is 0/0, which can be caught
by the first if branch.

Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220404152243.13556-1-hanjinke.666@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-11 15:19:06 -04:00
Chin Yik Ming
fac8873527 ext4: fix spelling errors in comments
'functoin' and 'entres' should be 'function' and 'entries' respectively

Signed-off-by: Chin Yik Ming <yikming2222@gmail.com>
Link: https://lore.kernel.org/r/20220402090744.8918-1-yikming2222@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-11 15:19:06 -04:00
Yu Zhe
c30365b90a ext4: remove unnecessary type castings
remove unnecessary void* type castings.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20220401081321.73735-1-yuzhe@nfschina.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-11 15:19:06 -04:00
Ye Bin
f4534c9fc9 ext4: fix warning in ext4_handle_inode_extension
We got issue as follows:
EXT4-fs error (device loop0) in ext4_reserve_inode_write:5741: Out of memory
EXT4-fs error (device loop0): ext4_setattr:5462: inode #13: comm syz-executor.0: mark_inode_dirty error
EXT4-fs error (device loop0) in ext4_setattr:5519: Out of memory
EXT4-fs error (device loop0): ext4_ind_map_blocks:595: inode #13: comm syz-executor.0: Can't allocate blocks for non-extent mapped inodes with bigalloc
------------[ cut here ]------------
WARNING: CPU: 1 PID: 4361 at fs/ext4/file.c:301 ext4_file_write_iter+0x11c9/0x1220
Modules linked in:
CPU: 1 PID: 4361 Comm: syz-executor.0 Not tainted 5.10.0+ #1
RIP: 0010:ext4_file_write_iter+0x11c9/0x1220
RSP: 0018:ffff924d80b27c00 EFLAGS: 00010282
RAX: ffffffff815a3379 RBX: 0000000000000000 RCX: 000000003b000000
RDX: ffff924d81601000 RSI: 00000000000009cc RDI: 00000000000009cd
RBP: 000000000000000d R08: ffffffffbc5a2c6b R09: 0000902e0e52a96f
R10: ffff902e2b7c1b40 R11: ffff902e2b7c1b40 R12: 000000000000000a
R13: 0000000000000001 R14: ffff902e0e52aa10 R15: ffffffffffffff8b
FS:  00007f81a7f65700(0000) GS:ffff902e3bc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 000000012db88001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 do_iter_readv_writev+0x2e5/0x360
 do_iter_write+0x112/0x4c0
 do_pwritev+0x1e5/0x390
 __x64_sys_pwritev2+0x7e/0xa0
 do_syscall_64+0x37/0x50
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Above issue may happen as follows:
Assume
inode.i_size=4096
EXT4_I(inode)->i_disksize=4096

step 1: set inode->i_isize = 8192
ext4_setattr
  if (attr->ia_size != inode->i_size)
    EXT4_I(inode)->i_disksize = attr->ia_size;
    rc = ext4_mark_inode_dirty
       ext4_reserve_inode_write
          ext4_get_inode_loc
            __ext4_get_inode_loc
              sb_getblk --> return -ENOMEM
   ...
   if (!error)  ->will not update i_size
     i_size_write(inode, attr->ia_size);
Now:
inode.i_size=4096
EXT4_I(inode)->i_disksize=8192

step 2: Direct write 4096 bytes
ext4_file_write_iter
 ext4_dio_write_iter
   iomap_dio_rw ->return error
 if (extend)
   ext4_handle_inode_extension
     WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize);
->Then trigger warning.

To solve above issue, if mark inode dirty failed in ext4_setattr just
set 'EXT4_I(inode)->i_disksize' with old value.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220326065351.761952-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-05-11 15:18:40 -04:00
Ojaswin Mujoo
7e0d0d4400 ext4: get rid of unused DEFAULT_MB_OPTIMIZE_SCAN
After recent changes to the mb_optimize_scan mount option
the DEFAULT_MB_OPTIMIZE_SCAN is no longer needed so get
rid of it.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220315114454.104182-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-11 07:57:33 -04:00
Linus Torvalds
672c0c5173 Linux 5.18-rc5 v5.18-rc5 2022-05-01 13:57:58 -07:00
Linus Torvalds
b6b2648911 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Take care of faults occuring between the PARange and IPA range by
     injecting an exception

   - Fix S2 faults taken from a host EL0 in protected mode

   - Work around Oops caused by a PMU access from a 32bit guest when PMU
     has been created. This is a temporary bodge until we fix it for
     good.

  x86:

   - Fix potential races when walking host page table

   - Fix shadow page table leak when KVM runs nested

   - Work around bug in userspace when KVM synthesizes leaf 0x80000021
     on older (pre-EPYC) or Intel processors

  Generic (but affects only RISC-V):

   - Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: work around QEMU issue with synthetic CPUID leaves
  Revert "x86/mm: Introduce lookup_address_in_mm()"
  KVM: x86/mmu: fix potential races when walking host page table
  KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
  KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
  KVM: arm64: Inject exception on out-of-IPA-range translation fault
  KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set
  KVM: arm64: Handle host stage-2 faults from 32-bit EL0
2022-05-01 11:49:32 -07:00
Linus Torvalds
b2da7df52e Merge tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - A fix to disable PCI/MSI[-X] masking for XEN_HVM guests as that is
   solely controlled by the hypervisor

 - A build fix to make the function prototype (__warn()) as visible as
   the definition itself

 - A bunch of objtool annotation fixes which have accumulated over time

 - An ORC unwinder fix to handle bad input gracefully

 - Well, we thought the microcode gets loaded in time in order to
   restore the microcode-emulated MSRs but we thought wrong. So there's
   a fix for that to have the ordering done properly

 - Add new Intel model numbers

 - A spelling fix

* tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
  bug: Have __warn() prototype defined unconditionally
  x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config
  objtool: Use offstr() to print address of missing ENDBR
  objtool: Print data address for "!ENDBR" data warnings
  x86/xen: Add ANNOTATE_NOENDBR to startup_xen()
  x86/uaccess: Add ENDBR to __put_user_nocheck*()
  x86/retpoline: Add ANNOTATE_NOENDBR for retpolines
  x86/static_call: Add ANNOTATE_NOENDBR to static call trampoline
  objtool: Enable unreachable warnings for CLANG LTO
  x86,objtool: Explicitly mark idtentry_body()s tail REACHABLE
  x86,objtool: Mark cpu_startup_entry() __noreturn
  x86,xen,objtool: Add UNWIND hint
  lib/strn*,objtool: Enforce user_access_begin() rules
  MAINTAINERS: Add x86 unwinding entry
  x86/unwind/orc: Recheck address range after stack info was updated
  x86/cpu: Load microcode during restore_processor_state()
  x86/cpu: Add new Alderlake and Raptorlake CPU model numbers
2022-05-01 10:03:36 -07:00
Linus Torvalds
b70ed23c23 Merge tag 'objtool_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Borislav Petkov:
 "A bunch of objtool fixes to improve unwinding, sibling call detection,
  fallthrough detection and relocation handling of weak symbols when the
  toolchain strips section symbols"

* tag 'objtool_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix code relocs vs weak symbols
  objtool: Fix type of reloc::addend
  objtool: Fix function fallthrough detection for vmlinux
  objtool: Fix sibling call detection in alternatives
  objtool: Don't set 'jump_dest' for sibling calls
  x86/uaccess: Don't jump between functions
2022-05-01 09:34:54 -07:00
Linus Torvalds
d4af0c1723 Merge tag 'irq_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:

 - Fix locking when accessing device MSI descriptors

* tag 'irq_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  bus: fsl-mc-msi: Fix MSI descriptor mutex lock for msi_first_desc()
2022-05-01 09:30:47 -07:00
Linus Torvalds
57ae8a4921 Merge tag 'driver-core-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are some small driver core and kernfs fixes for some reported
  problems. They include:

   - kernfs regression that is causing oopses in 5.17 and newer releases

   - topology sysfs fixes for a few small reported problems.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: fix NULL dereferencing in kernfs_remove
  topology: Fix up build warning in topology_is_visible()
  arch_topology: Do not set llc_sibling if llc_id is invalid
  topology: make core_mask include at least cluster_siblings
  topology/sysfs: Hide PPIN on systems that do not support it.
2022-04-30 10:24:21 -07:00
Linus Torvalds
e2e5ebecca Merge tag 'char-misc-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are a small number of char/misc/other driver fixes for 5.18-rc5

  Nothing major in here, this is mostly IIO driver fixes along with some
  other small things:

   - at25 driver fix for systems without a dma-able stack

   - phy driver fixes for reported issues

   - binder driver fixes for reported issues

  All of these have been in linux-next without any reported problems"

* tag 'char-misc-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (31 commits)
  eeprom: at25: Use DMA safe buffers
  binder: Gracefully handle BINDER_TYPE_FDA objects with num_fds=0
  binder: Address corner cases in deferred copy and fixup
  phy: amlogic: fix error path in phy_g12a_usb3_pcie_probe()
  iio: imu: inv_icm42600: Fix I2C init possible nack
  iio: dac: ltc2688: fix voltage scale read
  interconnect: qcom: sdx55: Drop IP0 interconnects
  interconnect: qcom: sc7180: Drop IP0 interconnects
  phy: ti: Add missing pm_runtime_disable() in serdes_am654_probe
  phy: mapphone-mdm6600: Fix PM error handling in phy_mdm6600_probe
  phy: ti: omap-usb2: Fix error handling in omap_usb2_enable_clocks
  bus: mhi: host: pci_generic: Flush recovery worker during freeze
  bus: mhi: host: pci_generic: Add missing poweroff() PM callback
  phy: ti: tusb1210: Fix an error handling path in tusb1210_probe()
  phy: samsung: exynos5250-sata: fix missing device put in probe error paths
  phy: samsung: Fix missing of_node_put() in exynos_sata_phy_probe
  phy: ti: Fix missing of_node_put in ti_pipe3_get_sysctrl()
  phy: ti: tusb1210: Make tusb1210_chg_det_states static
  iio:dac:ad3552r: Fix an IS_ERR() vs NULL check
  iio: sx9324: Fix default precharge internal resistance register
  ...
2022-04-30 10:15:57 -07:00
Linus Torvalds
a6b5c5dc06 Merge tag 'tty-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
 "Here are some small serial driver fixes, and a larger number of GSM
  line discipline fixes for 5.18-rc5.

  These include:

   - lots of tiny n_gsm fixes for issues to resolve a number of reported
     problems. Seems that people are starting to actually use this code
     again.

   - 8250 driver fixes for some devices

   - imx serial driver fix

   - amba-pl011 driver fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (27 commits)
  tty: n_gsm: fix sometimes uninitialized warning in gsm_dlci_modem_output()
  serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device
  serial: 8250: Also set sticky MCR bits in console restoration
  tty: n_gsm: fix software flow control handling
  tty: n_gsm: fix invalid use of MSC in advanced option
  tty: n_gsm: fix broken virtual tty handling
  Revert "serial: sc16is7xx: Clear RS485 bits in the shutdown"
  tty: n_gsm: fix missing update of modem controls after DLCI open
  serial: 8250: Fix runtime PM for start_tx() for empty buffer
  serial: imx: fix overrun interrupts in DMA mode
  serial: amba-pl011: do not time out prematurely when draining tx fifo
  tty: n_gsm: fix incorrect UA handling
  tty: n_gsm: fix reset fifo race condition
  tty: n_gsm: fix missing tty wakeup in convergence layer type 2
  tty: n_gsm: fix wrong signal octets encoding in MSC
  tty: n_gsm: fix wrong command frame length field encoding
  tty: n_gsm: fix wrong command retry handling
  tty: n_gsm: fix missing explicit ldisc flush
  tty: n_gsm: fix wrong DLCI release order
  tty: n_gsm: fix insufficient txframe size
  ...
2022-04-30 10:09:14 -07:00
Linus Torvalds
da1b4042bd Merge tag 'usb-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are a number of small USB driver fixes for 5.18-rc5 for some
  reported issues and new quirks. They include:

   - dwc3 driver fixes

   - xhci driver fixes

   - typec driver fixes

   - new usb-serial driver ids

   - added new USB devices to existing quirk tables

   - other tiny fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (31 commits)
  usb: phy: generic: Get the vbus supply
  usb: dwc3: gadget: Return proper request status
  usb: dwc3: pci: add support for the Intel Meteor Lake-P
  usb: dwc3: core: Only handle soft-reset in DCTL
  usb: gadget: configfs: clear deactivation flag in configfs_composite_unbind()
  usb: misc: eud: Fix an error handling path in eud_probe()
  usb: core: Don't hold the device lock while sleeping in do_proc_control()
  usb: dwc3: Try usb-role-switch first in dwc3_drd_init
  usb: dwc3: core: Fix tx/rx threshold settings
  usb: mtu3: fix USB 3.0 dual-role-switch from device to host
  xhci: Enable runtime PM on second Alderlake controller
  usb: dwc3: fix backwards compat with rockchip devices
  dt-bindings: usb: samsung,exynos-usb2: add missing required reg
  usb: misc: fix improper handling of refcount in uss720_probe()
  USB: Fix ehci infinite suspend-resume loop issue in zhaoxin
  usb: typec: tcpm: Fix undefined behavior due to shift overflowing the constant
  usb: typec: rt1719: Fix build error without CONFIG_POWER_SUPPLY
  usb: typec: ucsi: Fix role swapping
  usb: typec: ucsi: Fix reuse of completion structure
  usb: xhci: tegra:Fix PM usage reference leak of tegra_xusb_unpowergate_partitions
  ...
2022-04-30 09:58:46 -07:00
Linus Torvalds
e9512f3668 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
 "One fix for an endless error loop with the target driver affecting
  tapes"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: pscsi: Set SCF_TREAT_READ_AS_NORMAL flag only if there is valid data
2022-04-30 09:47:59 -07:00
Linus Torvalds
8013d1d3d2 Merge tag 'soc-fixes-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:

 - A fix for a regression caused by the previous set of bugfixes
   changing tegra and at91 pinctrl properties.

   More work is needed to figure out what this should actually be, but a
   revert makes it work for the moment.

 - Defconfig regression fixes for tegra after renamed symbols

 - Build-time warning and static checker fixes for imx, op-tee, sunxi,
   meson, at91, and omap

 - More at91 DT fixes for audio, regulator and spi nodes

 - A regression fix for Renesas Hyperflash memory probe

 - A stability fix for amlogic boards, modifying the allowed cpufreq
   states

 - Multiple fixes for system suspend on omap2+

 - DT fixes for various i.MX bugs

 - A probe error fix for imx6ull-colibri MMC

 - A MAINTAINERS file entry for samsung bug reports

* tag 'soc-fixes-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
  Revert "arm: dts: at91: Fix boolean properties with values"
  bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()
  Revert "arm64: dts: tegra: Fix boolean properties with values"
  arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
  ARM: dts: imx6ull-colibri: fix vqmmc regulator
  MAINTAINERS: add Bug entry for Samsung and memory controller drivers
  memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode
  ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35
  ARM: dts: am3517-evm: Fix misc pinmuxing
  ARM: dts: am33xx-l4: Add missing touchscreen clock properties
  ARM: dts: Fix mmc order for omap3-gta04
  ARM: dts: at91: fix pinctrl phandles
  ARM: dts: at91: sama5d4_xplained: fix pinctrl phandle name
  ARM: dts: at91: Describe regulators on at91sam9g20ek
  ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek
  ARM: dts: at91: Fix boolean properties with values
  ARM: dts: at91: use generic node name for dataflash
  ARM: dts: at91: align SPI NOR node name with dtschema
  ARM: dts: at91: sama7g5ek: Align the impedance of the QSPI0's HSIO and PCB lines
  ARM: dts: at91: sama7g5ek: enable pull-up on flexcom3 console lines
  ...
2022-04-29 15:51:05 -07:00
Linus Torvalds
c0e6265e6c Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
 "A semi-large pile of clk driver fixes this time around.

  Nothing is touching the core so these fixes are fairly well contained
  to specific devices that use these clk drivers.

   - Some Allwinner SoC fixes to gracefully handle errors and mark an
     RTC clk as critical so that the RTC keeps ticking.

   - Fix AXI bus clks and RTC clk design for Microchip PolarFire SoC
     driver introduced this cycle. This has some devicetree bits acked
     by riscv maintainers. We're fixing it now so that the prior
     bindings aren't released in a major kernel version.

   - Remove a reset on Microchip PolarFire SoCs that broke when enabling
     CONFIG_PM.

   - Set a min/max for the Qualcomm graphics clk. This got broken by the
     clk rate range patches introduced this cycle"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource()
  clk: sunxi-ng: sun6i-rtc: Mark rtc-32k as critical
  riscv: dts: microchip: reparent mpfs clocks
  clk: microchip: mpfs: add RTCREF clock control
  clk: microchip: mpfs: re-parent the configurable clocks
  dt-bindings: rtc: add refclk to mpfs-rtc
  dt-bindings: clk: mpfs: add defines for two new clocks
  dt-bindings: clk: mpfs document msspll dri registers
  riscv: dts: microchip: fix usage of fic clocks on mpfs
  clk: microchip: mpfs: mark CLK_ATHENA as critical
  clk: microchip: mpfs: fix parents for FIC clocks
  clk: qcom: clk-rcg2: fix gfx3d frequency calculation
  clk: microchip: mpfs: don't reset disabled peripherals
  clk: sunxi-ng: fix not NULL terminated coccicheck error
2022-04-29 15:38:23 -07:00
Linus Torvalds
bd3d3adea9 Merge tag 'block-5.18-2022-04-29' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - Revert of a patch that caused timestamp issues (Tejun)

 - iocost warning fix (Tejun)

 - bfq warning fix (Jan)

* tag 'block-5.18-2022-04-29' of git://git.kernel.dk/linux-block:
  bfq: Fix warning in bfqq_request_over_limit()
  Revert "block: inherit request start time from bio for BLK_CGROUP"
  iocost: don't reset the inuse weight of under-weighted debtors
2022-04-29 15:28:42 -07:00
Linus Torvalds
63b7b3ea94 Merge tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "Pretty boring:

   - three patches just adding reserved field checks (me, Eugene)

   - Fixing a potential regression with IOPOLL caused by a block change
     (Joseph)"

Boring is good.

* tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block:
  io_uring: check that data field is 0 in ringfd unregister
  io_uring: fix uninitialized field in rw io_kiocb
  io_uring: check reserved fields for recv/recvmsg
  io_uring: check reserved fields for send/sendmsg
2022-04-29 14:51:57 -07:00
Linus Torvalds
bdda8303f7 Merge tag 'random-5.18-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:

 - Eric noticed that the memmove() in crng_fast_key_erasure() was bogus,
   so this has been changed to a memcpy() and the confusing situation
   clarified with a detailed comment.

 - [Half]SipHash documentation updates from Bagas and Eric, after Eric
   pointed out that the use of HalfSipHash in random.c made a bit of the
   text potentially misleading.

* tag 'random-5.18-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  Documentation: siphash: disambiguate HalfSipHash algorithm from hsiphash functions
  Documentation: siphash: enclose HalfSipHash usage example in the literal block
  Documentation: siphash: convert danger note to warning for HalfSipHash
  random: document crng_fast_key_erasure() destination possibility
2022-04-29 14:47:17 -07:00
Linus Torvalds
bd383b8e32 Merge tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client
Pull ceph client fixes from Ilya Dryomov:
 "A fix for a NULL dereference that turns out to be easily triggerable
  by fsync (marked for stable) and a false positive WARN and snap_rwsem
  locking fixups"

* tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client:
  ceph: fix possible NULL pointer dereference for req->r_session
  ceph: remove incorrect session state check
  ceph: get snap_rwsem read lock in handle_cap_export for ceph_add_cap
  libceph: disambiguate cluster/pool full log message
2022-04-29 14:37:35 -07:00
Arnd Bergmann
adee8aa22a Revert "arm: dts: at91: Fix boolean properties with values"
This reverts commit 0dc23d1a8e, which caused another regression
as the pinctrl code actually expects an integer value of 0 or 1
rather than a simple boolean property.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29 23:09:49 +02:00
Paolo Bonzini
f751d8eac1 KVM: x86: work around QEMU issue with synthetic CPUID leaves
Synthesizing AMD leaves up to 0x80000021 caused problems with QEMU,
which assumes the *host* CPUID[0x80000000].EAX is higher or equal
to what KVM_GET_SUPPORTED_CPUID reports.

This causes QEMU to issue bogus host CPUIDs when preparing the input
to KVM_SET_CPUID2.  It can even get into an infinite loop, which is
only terminated by an abort():

   cpuid_data is full, no space for cpuid(eax:0x8000001d,ecx:0x3e)

To work around this, only synthesize those leaves if 0x8000001d exists
on the host.  The synthetic 0x80000021 leaf is mostly useful on Zen2,
which satisfies the condition.

Fixes: f144c49e8c ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 15:24:58 -04:00
Linus Torvalds
3e71713c9e Merge tag 'perf-tools-fixes-for-v5.18-2022-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix Intel PT (Processor Trace) timeless decoding with perf.data
   directory.

 - ARM SPE (Statistical Profiling Extensions) address fixes, for
   synthesized events and for SPE events with physical addresses. Add a
   simple 'perf test' entry to make sure this doesn't regress.

 - Remove arch specific processing of kallsyms data to fixup symbol end
   address, fixing excessive memory consumption in the annotation code.

* tag 'perf-tools-fixes-for-v5.18-2022-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf symbol: Remove arch__symbols__fixup_end()
  perf symbol: Update symbols__fixup_end()
  perf symbol: Pass is_kallsyms to symbols__fixup_end()
  perf test: Add perf_event_attr test for Arm SPE
  perf arm-spe: Fix SPE events with phys addresses
  perf arm-spe: Fix addresses of synthesized SPE events
  perf intel-pt: Fix timeless decoding with perf.data directory
2022-04-29 11:34:07 -07:00
Linus Torvalds
2d0de93ca2 Merge tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to properly ensure a single CPU is running during patch_text().

 - A defconfig update to include RPMSG_CTRL when RPMSG_CHAR was set,
   necessary after a recent refactoring.

* tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRL
  riscv: patch_text: Fixup last cpu should be master
2022-04-29 10:44:58 -07:00
Linus Torvalds
66c2112b74 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
 "Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type.

  This is a fix to the MTE ELF ABI for a bug that was added during the
  most recent merge window as part of the coredump support.

  The issue is that the value assigned to the new PT_ARM_MEMTAG_MTE
  segment type has already been allocated to PT_AARCH64_UNWIND by the
  ELF ABI, so we've bumped the value and changed the name of the
  identifier to be better aligned with the existing one"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  elf: Fix the arm64 MTE ELF segment name and value
2022-04-29 10:36:47 -07:00
Sean Christopherson
643d95aac5 Revert "x86/mm: Introduce lookup_address_in_mm()"
Drop lookup_address_in_mm() now that KVM is providing it's own variant
of lookup_address_in_pgd() that is safe for use with user addresses, e.g.
guards against page tables being torn down.  A variant that provides a
non-init mm is inherently dangerous and flawed, as the only reason to use
an mm other than init_mm is to walk a userspace mapping, and
lookup_address_in_pgd() does not play nice with userspace mappings, e.g.
doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE()
to ensure an upper level entry isn't converted to a huge page between
checking the PAGE_SIZE bit and grabbing the address of the next level
down.

This reverts commit 13c72c060f.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <YmwIi3bXr/1yhYV/@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:40:41 -04:00
Paolo Bonzini
73331c5d84 Merge branch 'kvm-fixes-for-5.18-rc5' into HEAD
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees:

* Fix potential races when walking host page table

* Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT

* Fix shadow page table leak when KVM runs nested
2022-04-29 12:39:34 -04:00
Mingwei Zhang
44187235cb KVM: x86/mmu: fix potential races when walking host page table
KVM uses lookup_address_in_mm() to detect the hugepage size that the host
uses to map a pfn.  The function suffers from several issues:

 - no usage of READ_ONCE(*). This allows multiple dereference of the same
   page table entry. The TOCTOU problem because of that may cause KVM to
   incorrectly treat a newly generated leaf entry as a nonleaf one, and
   dereference the content by using its pfn value.

 - the information returned does not match what KVM needs; for non-present
   entries it returns the level at which the walk was terminated, as long
   as the entry is not 'none'.  KVM needs level information of only 'present'
   entries, otherwise it may regard a non-present PXE entry as a present
   large page mapping.

 - the function is not safe for mappings that can be torn down, because it
   does not disable IRQs and because it returns a PTE pointer which is never
   safe to dereference after the function returns.

So implement the logic for walking host page tables directly in KVM, and
stop using lookup_address_in_mm().

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220429031757.2042406-1-mizhang@google.com>
[Inline in host_pfn_mapping_level, ensure no semantic change for its
 callers. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:38:22 -04:00
Paolo Bonzini
d495f942f4 KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused.  Unfortunately this extensibility
mechanism has several issues:

- x86 is not writing the member, so it would not be possible to use it
  on x86 except for new events

- the member is not aligned to 64 bits, so the definition of the
  uAPI struct is incorrect for 32- on 64-bit userspace.  This is a
  problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
  usage of flags was only introduced in 5.18.

Since padding has to be introduced, place a new field in there
that tells if the flags field is valid.  To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid.  The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.

To avoid breaking compilation of userspace that was using the flags
field, provide a userspace-only union to overlap flags with data[0].
The new field is placed at the same offset for both 32- and 64-bit
userspace.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:38:22 -04:00
Sean Christopherson
86931ff720 KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
Disallow memslots and MMIO SPTEs whose gpa range would exceed the host's
MAXPHYADDR, i.e. don't create SPTEs for gfns that exceed host.MAXPHYADDR.
The TDP MMU bounds its zapping based on host.MAXPHYADDR, and so if the
guest, possibly with help from userspace, manages to coerce KVM into
creating a SPTE for an "impossible" gfn, KVM will leak the associated
shadow pages (page tables):

  WARNING: CPU: 10 PID: 1122 at arch/x86/kvm/mmu/tdp_mmu.c:57
                                kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 10 PID: 1122 Comm: set_memory_regi Tainted: G        W         5.18.0-rc1+ #293
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
  Call Trace:
   <TASK>
   kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
   kvm_destroy_vm+0x162/0x2d0 [kvm]
   kvm_vm_release+0x1d/0x30 [kvm]
   __fput+0x82/0x240
   task_work_run+0x5b/0x90
   exit_to_user_mode_prepare+0xd2/0xe0
   syscall_exit_to_user_mode+0x1d/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

On bare metal, encountering an impossible gpa in the page fault path is
well and truly impossible, barring CPU bugs, as the CPU will signal #PF
during the gva=>gpa translation (or a similar failure when stuffing a
physical address into e.g. the VMCS/VMCB).  But if KVM is running as a VM
itself, the MAXPHYADDR enumerated to KVM may not be the actual MAXPHYADDR
of the underlying hardware, in which case the hardware will not fault on
the illegal-from-KVM's-perspective gpa.

Alternatively, KVM could continue allowing the dodgy behavior and simply
zap the max possible range.  But, for hosts with MAXPHYADDR < 52, that's
a (minor) waste of cycles, and more importantly, KVM can't reasonably
support impossible memslots when running on bare metal (or with an
accurate MAXPHYADDR as a VM).  Note, limiting the overhead by checking if
KVM is running as a guest is not a safe option as the host isn't required
to announce itself to the guest in any way, e.g. doesn't need to set the
HYPERVISOR CPUID bit.

A second alternative to disallowing the memslot behavior would be to
disallow creating a VM with guest.MAXPHYADDR > host.MAXPHYADDR.  That
restriction is undesirable as there are legitimate use cases for doing
so, e.g. using the highest host.MAXPHYADDR out of a pool of heterogeneous
systems so that VMs can be migrated between hosts with different
MAXPHYADDRs without running afoul of the allow_smaller_maxphyaddr mess.

Note that any guest.MAXPHYADDR is valid with shadow paging, and it is
even useful in order to test KVM with MAXPHYADDR=52 (i.e. without
any reserved physical address bits).

The now common kvm_mmu_max_gfn() is inclusive instead of exclusive.
The memslot and TDP MMU code want an exclusive value, but the name
implies the returned value is inclusive, and the MMIO path needs an
inclusive check.

Fixes: faaf05b00a ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 524a1e4e38 ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220428233416.2446833-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:38:21 -04:00
Paolo Bonzini
484c22df5a Merge tag 'kvmarm-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.18, take #2

- Take care of faults occuring between the PARange and
  IPA range by injecting an exception

- Fix S2 faults taken from a host EL0 in protected mode

- Work around Oops caused by a PMU access from a 32bit
  guest when PMU has been created. This is a temporary
  bodge until we fix it for good.
2022-04-29 12:32:14 -04:00
Arnd Bergmann
310b663753 Merge tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes
ARM: tegra: Default configuration fixes for v5.18

This contains two updates to the default configuration needed because of
a Kconfig symbol name change. This fixes a failure that was detected in
the NVIDIA automated test farm.

* tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  ARM: config: multi v7: Enable NVIDIA Tegra video decoder driver
  ARM: tegra_defconfig: Update CONFIG_TEGRA_VDE option

Link: https://lore.kernel.org/r/20220429080626.494150-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29 16:41:22 +02:00
Eugene Syromiatnikov
303cc749c8 io_uring: check that data field is 0 in ringfd unregister
Only allow data field to be 0 in struct io_uring_rsrc_update user
arguments to allow for future possible usage.

Fixes: e7a6c00dc7 ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/20220429142218.GA28696@asgard.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-29 08:39:43 -06:00
Arnd Bergmann
73c7bcdcfd Merge tag 'imx-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.18, 2nd round:

- Fix one sparse warning on imx-weim driver.
- Fix vqmmc regulator to get UHS-I mode work on imx6ull-colibri board.
- Add missing 32.768 kHz PMIC clock for imx8mn-ddr4-evk board to fix
  bd718xx-clk probe error.

* tag 'imx-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
  ARM: dts: imx6ull-colibri: fix vqmmc regulator
  bus: imx-weim: make symbol 'weim_of_notifier' static

Link: https://lore.kernel.org/r/20220426013427.GB14615@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29 16:23:31 +02:00
Arnd Bergmann
c755ad9810 Merge tag 'sunxi-fixes-for-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes
Fix return value in RSB bus driver

* tag 'sunxi-fixes-for-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()

Link: https://lore.kernel.org/r/Ymbkd+/dDmRJz66w@kista.localdomain
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29 16:23:02 +02:00
Jan Kara
09df6a75ff bfq: Fix warning in bfqq_request_over_limit()
People are occasionally reporting a warning bfqq_request_over_limit()
triggering reporting that BFQ's idea of cgroup hierarchy (and its depth)
does not match what generic blkcg code thinks. This can actually happen
when bfqq gets moved between BFQ groups while bfqq_request_over_limit()
is running. Make sure the code is safe against BFQ queue being moved to
a different BFQ group.

Fixes: 76f1df88bb ("bfq: Limit number of requests consumed by each cgroup")
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/all/CAJCQCtTw_2C7ZSz7as5Gvq=OmnDiio=HRkQekqWpKot84sQhFA@mail.gmail.com/
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220407140738.9723-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-29 06:45:37 -06:00
Thomas Gleixner
7e0815b3e0 x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.

This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence interrupt delivery on the
affected device is not happening ever.

Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.

Fixes: 809f9267bb ("xen: map MSIs into pirqs")
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Noah Meyerhans <noahm@debian.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx
2022-04-29 14:37:39 +02:00
Linus Torvalds
38d741cb70 Merge tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Another relatively quiet week, amdgpu leads the way, some i915 display
  fixes, and a single sunxi fix.

  amdgpu:
   - Runtime pm fix
   - DCN memory leak fix in error path
   - SI DPM deadlock fix
   - S0ix fix

  amdkfd:
   - GWS fix
   - GWS support for CRIU

  i915:
   - Fix #5284: Backlight control regression on XMG Core 15 e21
   - Fix black display plane on Acer One AO532h
   - Two smaller display fixes

  sunxi:
   - Single fix removing applying PHYS_OFFSET twice"

* tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend
  drm/amd/pm: fix the deadlock issue observed on SI
  drm/amd/display: Fix memory leak in dcn21_clock_source_create
  drm/amdgpu: don't runtime suspend if there are displays attached (v3)
  drm/amdkfd: CRIU add support for GWS queues
  drm/amdkfd: Fix GWS queue count
  drm/sun4i: Remove obsolete references to PHYS_OFFSET
  drm/i915/fbc: Consult hw.crtc instead of uapi.crtc
  drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses
  drm/i915: Check EDID for HDR static metadata when choosing blc
  drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
2022-04-28 18:00:34 -07:00
Dave Airlie
9d9f720733 Merge tag 'amd-drm-fixes-5.18-2022-04-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.18-2022-04-27:

amdgpu:
- Runtime pm fix
- DCN memory leak fix in error path
- SI DPM deadlock fix
- S0ix fix

amdkfd:
- GWS fix
- GWS support for CRIU

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220428023232.5794-1-alexander.deucher@amd.com
2022-04-29 10:27:05 +10:00