Commit Graph

1107183 Commits

Author SHA1 Message Date
Linus Torvalds
296d3b3e05 Merge tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS update from Borislav Petkov:
 "A single RAS change:

   - Probe whether hardware error injection (direct MSR writes) is
     possible when injecting errors on AMD platforms. In some cases, the
     platform could prohibit those"

* tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Check whether writes to MCA_STATUS are getting ignored
2022-08-01 09:29:41 -07:00
Linus Torvalds
0fac198def Merge tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull acl updates from Christian Brauner:
 "Last cycle we introduced support for mounting overlayfs on top of
  idmapped mounts. While looking into additional testing we realized
  that posix acls don't really work correctly with stacking filesystems
  on top of idmapped layers.

  We already knew what the fix were but it would require work that is
  more suitable for the merge window so we turned off posix acls for
  v5.19 for overlayfs on top of idmapped layers with Miklos routing my
  patch upstream in 72a8e05d4f ("Merge tag 'ovl-fixes-5.19-rc7' [..]").

  This contains the work to support posix acls for overlayfs on top of
  idmapped layers. Since the posix acl fixes should use the new
  vfs{g,u}id_t work the associated branch has been merged in. (We sent a
  pull request for this earlier.)

  We've also pulled in Miklos pull request containing my patch to turn
  of posix acls on top of idmapped layers. This allowed us to avoid
  rebasing the branch which we didn't like because we were already at
  rc7 by then. Merging it in allows this branch to first fix posix acls
  and then to cleanly revert the temporary fix it brought in by commit
  4a47c6385b ("ovl: turn of SB_POSIXACL with idmapped layers
  temporarily").

  The last patch in this series adds Seth Forshee as a co-maintainer for
  idmapped mounts. Seth has been integral to all of this work and is
  also the main architect behind the filesystem idmapping work which
  ultimately made filesystems such as FUSE and overlayfs available in
  containers. He continues to be active in both development and review.
  I'm very happy he decided to help and he has my full trust. This
  increases the bus factor which is always great for work like this. I'm
  honestly very excited about this because I think in general we don't
  do great in the bringing on new maintainers department"

For more explanations of the ACL issues, see

  https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/

* tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  Add Seth Forshee as co-maintainer for idmapped mounts
  Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily"
  ovl: handle idmappings in ovl_get_acl()
  acl: make posix_acl_clone() available to overlayfs
  acl: port to vfs{g,u}id_t
  acl: move idmapped mount fixup into vfs_{g,s}etxattr()
  mnt_idmapping: add vfs[g,u]id_into_k[g,u]id()
2022-08-01 09:10:07 -07:00
Linus Torvalds
bdfae5ce38 Merge tag 'fs.idmapped.vfsuid.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull fs idmapping updates from Christian Brauner:
 "This introduces the new vfs{g,u}id_t types we agreed on. Similar to
  k{g,u}id_t the new types are just simple wrapper structs around
  regular {g,u}id_t types.

  They allow to establish a type safety boundary in the VFS for idmapped
  mounts preventing confusion betwen {g,u}ids mapped into an idmapped
  mount and {g,u}ids mapped into the caller's or the filesystem's
  idmapping.

  An initial set of helpers is introduced that allows to operate on
  vfs{g,u}id_t types. We will remove all references to non-type safe
  idmapped mounts helpers in the very near future. The patches do
  already exist.

  This converts the core attribute changing codepaths which become
  significantly easier to reason about because of this change.

  Just a few highlights here as the patches give detailed overviews of
  what is happening in the commit messages:

   - The kernel internal struct iattr contains type safe vfs{g,u}id_t
     values clearly communicating that these values have to take a given
     mount's idmapping into account.

   - The ownership values placed in struct iattr to change ownership are
     identical for idmapped and non-idmapped mounts going forward. This
     also allows to simplify stacking filesystems such as overlayfs that
     change attributes In other words, they always represent the values.

   - Instead of open coding checks for whether ownership changes have
     been requested and an actual update of the inode is required we now
     have small static inline wrappers that abstract this logic away
     removing a lot of code duplication from individual filesystems that
     all open-coded the same checks"

* tag 'fs.idmapped.vfsuid.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  mnt_idmapping: align kernel doc and parameter order
  mnt_idmapping: use new helpers in mapped_fs{g,u}id()
  fs: port HAS_UNMAPPED_ID() to vfs{g,u}id_t
  mnt_idmapping: return false when comparing two invalid ids
  attr: fix kernel doc
  attr: port attribute changes to new types
  security: pass down mount idmapping to setattr hook
  quota: port quota helpers mount ids
  fs: port to iattr ownership update helpers
  fs: introduce tiny iattr ownership update helpers
  fs: use mount types in iattr
  fs: add two type safe mapping helpers
  mnt_idmapping: add vfs{g,u}id_t
2022-08-01 08:56:55 -07:00
Linus Torvalds
e6a7cf70a3 Merge tag 'filelock-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
 "Just a couple of flock() patches from Kuniyuki Iwashima.

  The main change is that this moves a file_lock allocation from the
  slab to the stack"

* tag 'filelock-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fs/lock: Rearrange ops in flock syscall.
  fs/lock: Don't allocate file_lock in flock_make_lock().
2022-08-01 08:54:59 -07:00
Linus Torvalds
e88745dcfd Merge tag 'erofs-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
 "First of all, we'd like to add Yue Hu and Jeffle Xu as two new
  reviewers. Thank them for spending time working on EROFS!

  There is no major feature outstanding in this cycle, mainly a patchset
  I worked on to prepare for rolling hash deduplication and folios for
  compressed data as the next big features. It kills the unneeded
  PG_error flag dependency as well.

  Apart from that, there are bugfixes and cleanups as always. Details
  are listed below:

   - Add Yue Hu and Jeffle Xu as reviewers

   - Add the missing wake_up when updating lzma streams

   - Avoid consecutive detection for Highmem memory

   - Prepare for multi-reference pclusters and get rid of PG_error

   - Fix ctx->pos update for NFS export

   - minor cleanups"

* tag 'erofs-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (23 commits)
  erofs: update ctx->pos for every emitted dirent
  erofs: get rid of the leftover PAGE_SIZE in dir.c
  erofs: get rid of erofs_prepare_dio() helper
  erofs: introduce multi-reference pclusters (fully-referenced)
  erofs: record the longest decompressed size in this round
  erofs: introduce z_erofs_do_decompressed_bvec()
  erofs: try to leave (de)compressed_pages on stack if possible
  erofs: introduce struct z_erofs_decompress_backend
  erofs: get rid of `z_pagemap_global'
  erofs: clean up `enum z_erofs_collectmode'
  erofs: get rid of `enum z_erofs_page_type'
  erofs: rework online page handling
  erofs: switch compressed_pages[] to bufvec
  erofs: introduce `z_erofs_parse_in_bvecs'
  erofs: drop the old pagevec approach
  erofs: introduce bufvec to store decompressed buffers
  erofs: introduce `z_erofs_parse_out_bvecs()'
  erofs: clean up z_erofs_collector_begin()
  erofs: get rid of unneeded `inode', `map' and `sb'
  erofs: avoid consecutive detection for Highmem memory
  ...
2022-08-01 08:52:37 -07:00
Linus Torvalds
bec14d79f7 Merge tag 'fsnotify_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:

 - support for FAN_MARK_IGNORE which untangles some of the not well
   defined corner cases with fanotify ignore masks

 - small cleanups

* tag 'fsnotify_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: Fix comment typo
  fanotify: introduce FAN_MARK_IGNORE
  fanotify: cleanups for fanotify_mark() input validations
  fanotify: prepare for setting event flags in ignore mask
  fs: inotify: Fix typo in inotify comment
2022-08-01 08:50:39 -07:00
Linus Torvalds
af07685b9c Merge tag 'fs_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 and reiserfs updates from Jan Kara:
 "A fix for ext2 handling of a corrupted fs image and cleanups in ext2
  and reiserfs"

* tag 'fs_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Add more validity checks for inode counts
  fs/reiserfs/inode: remove dead code in _get_block_create_0()
  fs/ext2: replace ternary operator with min_t()
2022-08-01 08:48:37 -07:00
Linus Torvalds
eb43bbac4c Merge tag 'dlm-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:

 - Delay the cleanup of interrupted posix lock requests until the user
   space result arrives. Previously, the immediate cleanup would lead to
   extraneous warnings when the result arrived.

 - Tracepoint improvements, e.g. adding the lock resource name.

 - Delay the completion of lockspace creation until one full recovery
   cycle has completed. This allows more error cases to be returned to
   the caller.

 - Remove warnings from the locking layer about delayed network replies.
   The recently added midcomms warnings are much more useful.

 - Begin the process of deprecating two unused lock-timeout-related
   features. These features now require enabling via a Kconfig option,
   and enabling them triggers deprecation warnings. We expect to remove
   the code in v6.2.

* tag 'dlm-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  fs: dlm: move kref_put assert for lkb structs
  fs: dlm: don't use deprecated timeout features by default
  fs: dlm: add deprecation Kconfig and warnings for timeouts
  fs: dlm: remove timeout from dlm_user_adopt_orphan
  fs: dlm: remove waiter warnings
  fs: dlm: fix grammar in lowcomms output
  fs: dlm: add comment about lkb IFL flags
  fs: dlm: handle recovery result outside of ls_recover
  fs: dlm: make new_lockspace() wait until recovery completes
  fs: dlm: call dlm_lsop_recover_prep once
  fs: dlm: update comments about recovery and membership handling
  fs: dlm: add resource name to tracepoints
  fs: dlm: remove additional dereference of lksb
  fs: dlm: change ast and bast trace order
  fs: dlm: change posix lock sigint handling
  fs: dlm: use dlm_plock_info for do_unlock_close
  fs: dlm: change plock interrupted message to debug again
  fs: dlm: add pid to debug log
  fs: dlm: plock use list_first_entry
2022-08-01 08:46:53 -07:00
Alexander Aring
9585898922 fs: dlm: move kref_put assert for lkb structs
The unhold_lkb() function decrements the lock's kref, and
asserts that the ref count was not the final one.  Use the
kref_put release function (which should not be called) to
call the assert, rather than doing the assert based on the
kref_put return value.  Using kill_lkb() as the release
function doesn't make sense if we only want to assert.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01 09:31:46 -05:00
Alexander Aring
6b0afc0cc3 fs: dlm: don't use deprecated timeout features by default
This patch will disable use of deprecated timeout features if
CONFIG_DLM_DEPRECATED_API is not set.  The deprecated features
will be removed in upcoming kernel release v6.2.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01 09:31:38 -05:00
Alexander Aring
81eeb82fc2 fs: dlm: add deprecation Kconfig and warnings for timeouts
This patch adds a CONFIG_DLM_DEPRECATED_API Kconfig option
that must be enabled to use two timeout-related features
that we intend to remove in kernel v6.2.  Warnings are
printed if either is enabled and used.  Neither has ever
been used as far as we know.

. The DLM_LSFL_TIMEWARN lockspace creation flag will be
  removed, along with the associated configfs entry for
  setting the timeout.  Setting the flag and configfs file
  would cause dlm to track how long locks were waiting
  for reply messages.  After a timeout, a kernel message
  would be logged, and a netlink message would be sent
  to userspace.  Recently, midcomms messages have been
  added that produce much better logging about actual
  problems with messages.  No use has ever been found
  for the netlink messages.

. The userspace libdlm API has allowed the DLM_LKF_TIMEOUT
  flag with a timeout value to be set in lock requests.
  The lock request would be cancelled after the timeout.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01 09:31:32 -05:00
Linus Torvalds
3d7cb6b04c Linux 5.19 v5.19 2022-07-31 14:03:01 -07:00
Linus Torvalds
334c0ef642 Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
 "One-liner fix of a NULL pointer deref in the Allwinner clk driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: Fix H6 RTC clock definition
2022-07-31 09:52:20 -07:00
Linus Torvalds
89caf57540 Merge tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Update the 'mitigations=' kernel param documentation

 - Check the IBPB feature flag before enabling IBPB in firmware calls
   because cloud vendors' fantasy when it comes to creating guest
   configurations is unlimited

 - Unexport sev_es_ghcb_hv_call() before 5.19 releases now that HyperV
   doesn't need it anymore

 - Remove dead CONFIG_* items

* tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
  x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
  Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"
  x86/configs: Update configs in x86_debug.config
2022-07-31 09:26:53 -07:00
Linus Torvalds
5e4823e6da Merge tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:

 - Avoid rwsem lockups in certain situations when handling the handoff
   bit

* tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
2022-07-31 09:21:13 -07:00
Linus Torvalds
cd2715b792 Merge tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:

 - Relax the condition under which the DIMM label in ghes_edac is set in
   order to accomodate an HPE BIOS which sets only the device but not
   the bank

 - Two forgotten fixes to synopsys_edac when handling error interrupts

* tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/ghes: Set the DIMM label unconditionally
  EDAC/synopsys: Re-enable the error interrupts on v3 hw
  EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw
2022-07-31 09:12:58 -07:00
Hongnan Li
ecce9212d0 erofs: update ctx->pos for every emitted dirent
erofs_readdir update ctx->pos after filling a batch of dentries
and it may cause dir/files duplication for NFS readdirplus which
depends on ctx->pos to fill dir correctly. So update ctx->pos for
every emitted dirent in erofs_fill_dentries to fix it.

Also fix the update of ctx->pos when the initial file position has
exceeded nameoff.

Fixes: 3e917cc305 ("erofs: make filesystem exportable")
Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220722082732.30935-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-07-31 22:26:29 +08:00
Linus Torvalds
6a01025844 Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Last set of ARM fixes for 5.19:

   - fix for MAX_DMA_ADDRESS overflow

   - fix for find_*_bit performing an out of bounds memory access"

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: findbit: fix overflowing offset
  ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
2022-07-30 17:24:16 -07:00
Waiman Long
6eebd5fb20 locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
With commit d257cc8cb8 ("locking/rwsem: Make handoff bit handling more
consistent"), the writer that sets the handoff bit can be interrupted
out without clearing the bit if the wait queue isn't empty. This disables
reader and writer optimistic lock spinning and stealing.

Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock.  This is not the
case before commit d257cc8cb8 as the writer that set the handoff bit
will clear it when exiting out via the out_nolock path. This is less
efficient as the busy rwsem stays in an unlock state for a longer time.

In some cases, this new behavior may cause lockups as shown in [1] and
[2].

This patch allows a non-first writer to ignore the handoff bit if it
is not originally set or initiated by the first waiter. This patch is
shown to be effective in fixing the lockup problem reported in [1].

[1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/
[2] https://lore.kernel.org/lkml/3f02975c-1a9d-be20-32cf-f1d8e3dfafcc@oracle.com/

Fixes: d257cc8cb8 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com
2022-07-30 10:58:28 +02:00
Linus Torvalds
620725263f Merge tag 'mm-hotfixes-stable-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "Two hotfixes, both cc:stable"

* tag 'mm-hotfixes-stable-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/hmm: fault non-owner device private entries
  page_alloc: fix invalid watermark check on a negative value
2022-07-29 21:02:35 -07:00
Linus Torvalds
8a91f86f3e Merge tag 'block-5.19-2022-07-29' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
 "Just a single fix for NVMe, yet another quirk addition"

* tag 'block-5.19-2022-07-29' of git://git.kernel.dk/linux-block:
  nvme-pci: Crucial P2 has bogus namespace ids
2022-07-29 16:07:35 -07:00
Linus Torvalds
e65c6a46df Merge tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
 "Maxime had the dog^Wmailing list server eat his homework^Wmisc pull
  request.

  Two more small fixes, one in nouveau svm code and the other in
  simpledrm.

  nouveau:
   - page migration fix

  simpledrm:
   - fix mode_valid return value"

* tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm:
  nouveau/svm: Fix to migrate all requested pages
  drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
2022-07-29 13:25:31 -07:00
Dave Airlie
ce156c8a18 Merge tag 'drm-misc-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
One fix to fix simpledrm mode_valid return value, and one for page
migration in nouveau

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220729094514.sfzhc3gqjgwgal62@penduick
2022-07-30 06:09:57 +10:00
Linus Torvalds
1c8ac1c4af Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Four fixes, three in drivers.

  The two biggest fixes are ufs and the remaining driver and core fix
  are small and obvious (and the core fix is low risk)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Fix a race condition related to device management
  scsi: core: Fix warning in scsi_alloc_sgtables()
  scsi: ufs: host: Hold reference returned by of_parse_phandle()
  scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
2022-07-29 13:07:03 -07:00
Eiichi Tsukata
ea304a8b89 docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
Updates descriptions for "mitigations=off" and "mitigations=auto,nosmt"
with the respective retbleed= settings.

Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: corbet@lwn.net
Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com
2022-07-29 20:47:07 +02:00
Ralph Campbell
8a295dbbaf mm/hmm: fault non-owner device private entries
If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner page is used
to determine if the device private page should not be faulted in. 
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.

For example, if a page is migrated to GPU private memory and a RDMA fault
capable NIC tries to read the migrated page, without this patch it will
get an error.  With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.

Link: https://lkml.kernel.org/r/20220727000837.4128709-2-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20220725183615.4118795-2-rcampbell@nvidia.com
Fixes: 08ddddda66 ("mm/hmm: check the device private page owner in hmm_range_fault()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reported-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-29 11:33:37 -07:00
Jaewon Kim
9282012fc0 page_alloc: fix invalid watermark check on a negative value
There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.

This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit f27ce0e140 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle a negative value case which
can be happened when reserved_highatomic pageblock is bigger than the
actual free.

If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.

Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd does not make free pages quickly.

Handle the negative case by using MIN.

Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
Fixes: f27ce0e140 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kerenl.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-29 11:33:37 -07:00
Linus Torvalds
bb83c99d3d Merge tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix addresses for bss symbols, describing variables used in resolving
   data access in tools such as 'perf c2c' and 'perf mem'.

 - Skip symbols if SHF_ALLOC flag is not set, a technique used for
   listing deprecated symbols, its addresses are zeros, so not useful.

 - Remove undefined behavior from bpf_perf_object__next() when dealing
   with an empty bpf_objects_list list.

 - Make a ARM CoreSight disasm script work with both python2 and
   python3.

 - Sync x86's cpufeatures header with with the kernel sources.

* tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf bpf: Remove undefined behavior from bpf_perf_object__next()
  perf symbol: Skip symbols if SHF_ALLOC flag is not set
  perf symbol: Correct address for bss symbols
  perf scripts python: Let script to be python2 compliant
  tools headers cpufeatures: Sync with the kernel sources
2022-07-29 11:26:28 -07:00
Linus Torvalds
4b20426d04 Merge tag 'wq-for-5.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
 "Just one commit to suppress a spurious warning added during the 5.19
  cycle"

* tag 'wq-for-5.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Avoid a false warning in unbind_workers()
2022-07-29 11:20:40 -07:00
Linus Torvalds
506e6dfb0f Merge tag 'pm-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
 "Make some false positive RCU splats resulting from a recent intel_idle
  driver change go away (Waiman Long)"

* tag 'pm-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: Fix false positive RCU splats due to incorrect hardirqs state
2022-07-29 10:57:26 -07:00
Lai Jiangshan
46a4d679ef workqueue: Avoid a false warning in unbind_workers()
Doing set_cpus_allowed_ptr() with wq_unbound_cpumask can be possible
fails and trigger the false warning.

Use cpu_possible_mask instead when wq_unbound_cpumask has no active CPUs.

It is very easy to trigger the warning:
  Set wq_unbound_cpumask to a small set of CPUs.
  Offline all the CPUs of wq_unbound_cpumask.
  Offline an extra CPU and trigger the warning.

Fixes: 10a5a651e3 ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-07-29 07:49:02 -10:00
Linus Torvalds
e4d8b09d67 Merge tag 'riscv-for-linus-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
 "A build fix for 'make vdso_install' that avoids an issue trying to
  install the compat VDSO"

* tag 'riscv-for-linus-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: compat: vdso: Fix vdso_install target
2022-07-29 10:46:03 -07:00
Linus Torvalds
a95eb1d086 Merge tag 'loongarch-fixes-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:

 - Fix cache size calculation, stack protection attributes, ptrace's
   fpr_set and "ROM Size" in boardinfo

 - Some cleanups and improvements of assembly

 - Some cleanups of unused code and useless code

* tag 'loongarch-fixes-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Fix wrong "ROM Size" of boardinfo
  LoongArch: Fix missing fcsr in ptrace's fpr_set
  LoongArch: Fix shared cache size calculation
  LoongArch: Disable executable stack by default
  LoongArch: Remove unused variables
  LoongArch: Remove clock setting during cpu hotplug stage
  LoongArch: Remove useless header compiler.h
  LoongArch: Remove several syntactic sugar macros for branches
  LoongArch: Re-tab the assembly files
  LoongArch: Simplify "BGT foo, zero" with BGTZ
  LoongArch: Simplify "BLT foo, zero" with BLTZ
  LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
  LoongArch: Use the "move" pseudo-instruction where applicable
  LoongArch: Use the "jr" pseudo-instruction where applicable
  LoongArch: Use ABI names of registers where appropriate
2022-07-29 10:10:30 -07:00
Linus Torvalds
9d928d9b78 Merge tag 'powerpc-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Re-enable the new amdgpu display engine for powerpc, as long as the
   compiler is correctly configured.

 - Disable stack variable initialisation in prom_init to fix GCC 12
   allmodconfig.

Thanks to Dan Horák and Sudip Mukherjee.

* tag 'powerpc-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  drm/amdgpu: Re-enable DCN for 64-bit powerpc
  powerpc/64s: Disable stack variable initialisation for prom_init
2022-07-29 09:57:07 -07:00
Tiezhu Yang
45b53c9051 LoongArch: Fix wrong "ROM Size" of boardinfo
We can see the "ROM Size" is different in the following outputs:

[root@linux loongson]# cat /sys/firmware/loongson/boardinfo
BIOS Information
Vendor                  : Loongson
Version                 : vUDK2018-LoongArch-V2.0.pre-beta8
ROM Size                : 63 KB
Release Date            : 06/15/2022

Board Information
Manufacturer            : Loongson
Board Name              : Loongson-LS3A5000-7A1000-1w-A2101
Family                  : LOONGSON64

[root@linux loongson]# dmidecode | head -11
...
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
	Vendor: Loongson
	Version: vUDK2018-LoongArch-V2.0.pre-beta8
	Release Date: 06/15/2022
	ROM Size: 4 MB

According to "BIOS Information (Type 0) structure" in the SMBIOS
Reference Specification [1], it shows 64K * (n+1) is the size of
the physical device containing the BIOS if the size is less than
16M.

Additionally, we can see the related code in dmidecode [2]:

  u64 s = { .l = (code1 + 1) << 6 };

So the output of dmidecode is correct, the output of boardinfo
is wrong, fix it.

By the way, at present no need to consider the size is 16M or
greater on LoongArch, because it is usually 4M or 8M which is
enough to use.

[1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.6.0.pdf
[2] https://git.savannah.nongnu.org/cgit/dmidecode.git/tree/dmidecode.c#n347

Fixes: 628c3bb40e ("LoongArch: Add boot and setup routines")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:33 +08:00
Qi Hu
b0f3bdc002 LoongArch: Fix missing fcsr in ptrace's fpr_set
In file ptrace.c, function fpr_set does not copy fcsr data from ubuf
to kbuf. That's the reason why fcsr cannot be modified by ptrace.

This patch fixs this problem and allows users using ptrace to modify
the fcsr.

Co-developed-by: Xu Li <lixu@loongson.cn>
Signed-off-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:33 +08:00
Huacai Chen
1aea29d7c3 LoongArch: Fix shared cache size calculation
Current calculation of shared cache size is from the node (die) scope,
but we hope 'lscpu' to show the shared cache size of the whole package
for multi-die chips (e.g., Loongson-3C5000L, which contains 4 dies in
one package). So fix it by multiplying nodes_per_package.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:33 +08:00
Huacai Chen
317980e6b4 LoongArch: Disable executable stack by default
Disable executable stack for LoongArch by default, as all modern
architectures do.

Reported-by: Andreas Schwab <schwab@suse.de>
Suggested-by: WANG Xuerui <git@xen0n.name>
Link: https://sourceware.org/pipermail/binutils/2022-July/121992.html
Tested-by: WANG Xuerui <git@xen0n.name>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
Bibo Mao
3a3a4f7a65 LoongArch: Remove unused variables
There are some variables never used or referenced, this patch
removes these varaibles and make the code cleaner.

Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
Bibo Mao
71610ab1d0 LoongArch: Remove clock setting during cpu hotplug stage
On physical machine we can save power by disabling clock of hot removed
cpu. However as different platforms require different methods to
configure clocks, the code is platform-specific, and probably belongs to
firmware/pmu or cpu regulator, rather than generic arch/loongarch code.

Also, there is no such register on QEMU virt machine since the
clock/frequency regulation is not emulated.

This patch removes the hard-coded clock register accesses in generic
LoongArch cpu hotplug flow.

Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
Jun Yi
f62b7626cb LoongArch: Remove useless header compiler.h
The content of LoongArch's compiler.h is trivial, with some unused
anywhere, so inline the definitions and remove the header.

Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
WANG Xuerui
ab6e57a69d LoongArch: Remove several syntactic sugar macros for branches
These syntactic sugars have been supported by upstream binutils from the
beginning, so no need to patch them locally.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
WANG Xuerui
f5c3c22f21 LoongArch: Re-tab the assembly files
Reflow the *.S files for better stylistic consistency, namely hard tabs
after mnemonic position, and vertical alignment of the first operand
with hard tabs. Tab width is obviously 8. Some pre-existing intra-block
vertical alignments are preserved.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
WANG Xuerui
1fdb9a9249 LoongArch: Simplify "BGT foo, zero" with BGTZ
Support for the syntactic sugar is present in upstream binutils port
from the beginning. Use it for shorter lines and better consistency.
Generated code should be identical.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
WANG Xuerui
d1bc75d759 LoongArch: Simplify "BLT foo, zero" with BLTZ
Support for the syntactic sugar is present in upstream binutils port
from the beginning. Use it for shorter lines and better consistency.
Generated code should be identical.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
WANG Xuerui
d47b2dc87c LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
While B{EQ,NE}Z and B{EQ,NE} are different instructions, and the vastly
expanded range for branch destination does not really matter in the few
cases touched, use the B{EQ,NE}Z where possible for shorter lines and
better consistency (e.g. some places used "BEQ foo, zero", while some
used "BEQ zero, foo").

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
WANG Xuerui
57ce5d3eef LoongArch: Use the "move" pseudo-instruction where applicable
Some of the assembly code in the LoongArch port likely originated
from a time when the assembler did not support pseudo-instructions like
"move" or "jr", so the desugared form was used and readability suffers
(to a minor degree) as a result.

As the upstream toolchain supports these pseudo-instructions from the
beginning, migrate the existing few usages to them for better
readability.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
WANG Xuerui
07b480695d LoongArch: Use the "jr" pseudo-instruction where applicable
Some of the assembly code in the LoongArch port likely originated
from a time when the assembler did not support pseudo-instructions like
"move" or "jr", so the desugared form was used and readability suffers
(to a minor degree) as a result.

As the upstream toolchain supports these pseudo-instructions from the
beginning, migrate the existing few usages to them for better
readability.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
WANG Xuerui
d8e7f201a4 LoongArch: Use ABI names of registers where appropriate
Some of the assembly in the LoongArch port seem to come from a
prehistoric time, when the assembler didn't even have support for the
ABI names we all come to know and love, thus used raw register numbers
which hampered readability.

The usages are found with a regex match inside arch/loongarch, then
manually adjusted for those non-definitions.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-07-29 18:22:32 +08:00
Russell King (Oracle)
ec85bd369f ARM: findbit: fix overflowing offset
When offset is larger than the size of the bit array, we should not
attempt to access the array as we can perform an access beyond the
end of the array. Fix this by changing the pre-condition.

Using "cmp r2, r1; bhs ..." covers us for the size == 0 case, since
this will always take the branch when r1 is zero, irrespective of
the value of r2. This means we can fix this bug without adding any
additional code!

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-07-29 09:54:26 +01:00