__resource_resize_store() attempts to release all resources of the device
before attempting the resize. The loop, however, only covers standard BARs
(< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned,
pci_reassign_bridge_resources() finds the bridge window still has some
assigned child resources and returns -NOENT which makes
pci_resize_resource() to detect an error and abort the resize.
Change the release loop to cover all resources up to VF BARs which allows
the resize operation to release the bridge windows and attempt to assigned
them again with the different size.
If SR-IOV is enabled, disallow resize as it requires releasing also IOV
resources.
Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com
Fixes: 91fa127794 ("PCI: Expose PCIe Resizable BAR support via sysfs")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
pci_release_resource() will print "... releasing" regardless of the
resource being assigned or not. Move the print after the res->parent check
to avoid claiming the kernel would be releasing an unassigned resource.
Likely, none of the current callers pass a resource that is unassigned so
this change is mostly to correct the non-sensical order than to remove
errorneous printouts.
Link: https://lore.kernel.org/r/20250307140922.5776-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Per PCIe r6.0, sec 7.8.6.2, devices can advertise Resizable BAR sizes up to
128 TB in the Resizable BAR Capability register. Larger sizes can be
advertised via the Capability register, but that requires an API change.
Update pci_rebar_get_possible_sizes() and pbus_size_mem() to increase the
sizes we currently support from 512 GB to 128 TB.
Link: https://lore.kernel.org/r/20250307053535.44918-1-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Remove and rescan cycle can result in failure to assign a bridge window if
it becomes larger than before the remove. The bridge window size will
include space for disabled Expansion ROM, which can causes the bridge
window to not fit anymore into the same address space slot on rescan if the
Expansion ROM resource was not assigned before the remove. In addition, the
optional resource handling is not internally consistent.
The resource fitting logic supports three main types of optional resources:
- IOV BARs
- Expansion ROMs
- Bridge window size variation due to optional resources
In addition to the above, resizable BARs beyond their current size will
require handling optional variation in resource sizes within the resource
fitting algorithm (not yet done by the resource fitting code).
There are multiple inconsistencies related to optional resource handling:
a) The allocation failure of disabled expansion ROM requires special case
inside assign_requested_resources_sorted().
b) The optionality of disabled expansion ROM is not considered during
bridge window sizing in pbus_size_mem().
c) Setting resource size to zero for optional resource in pbus_size_mem()
is problematic because it makes also the alignment invalid, which is
checked by pdev_sort_resources().
Optional IOV resources have their size set to zero by pbus_size_mem()
but the information about size is stored externally in struct pci_sriov
and complex call-chain trickery in pci_resource_alignment() ensures IOV
resources return a valid alignment despite having zero resource size. A
solution that is specific to IOV resources makes it hard to use the same
solution for other types of resources such as expansion ROM.
Simply changing pbus_size_mem() is not sufficient to fully address the main
issue because it would introduce disparity between bridge window sizing and
resource allocation. Due to size-based ordering of the resource list during
assignment loop, an Expansion ROM resource could steal space from some
other resource and make the other resource not fit if the Expansion ROM is
larger than the other resource. Thus, the resource assignment functions
need to be changed as well.
Make optional resource handling more straightforward. Use
pci_resource_is_optional() to determine if a resource is optional in both
bridge window sizing and assignment failure classification to ensure they
always align. Indicate with a parameter to
assign_requested_resources_sorted() whether it should attempt to allocate
optional resources or not.
Always try first to assign all resources (also when realloc_head is not
provided). This is required for calls from
pci_assign_unassigned_root_bus_resources() that provide realloc_head only
with some of its iterations.
Non-bridge-window optional resources in realloc_head now have add_size 0.
This condition has to be detected in reassign_resources_sorted() before
reassigning them (which would fail as there is no size change). Removing
add_size=0 optional resources entirely from realloc_head might eventually
be doable but further rework in __assign_resources_sorted() is needed first
to support such a change.
Link: https://lore.kernel.org/r/20241216175632.4175-26-ilpo.jarvinen@linux.intel.com
Reported-by: Jia Yao <jia.yao@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219547
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jia Yao <jia.yao@intel.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Resetting a resource is problematic as it prevents attempting to allocate
the resource later, unless something in between restores the resource.
Similarly, if fail_head does not contain all resources that were reset,
those resources cannot be restored later.
The entire reset/restore cycle adds complexity and leaving resources in the
reset state causes issues to other code such as for checks done in
pci_enable_resources(). Take a small step towards not resetting resources
by delaying reset until the end of resource assignment and build failure
list (fail_head) in sync with the reset to avoid leaving behind resources
that cannot be restored (for the case where the caller provides fail_head
in the first place to allow restore somewhere in the callchain, as is not
all callers pass non-NULL fail_head).
Leave the Expansion ROM check temporarily in place while building the
failure list until an upcoming change that reworks optional resource
handling.
Ideally, whole resource reset could be removed but doing that in one step
would be non-tractable due to complexity of all related code.
Link: https://lore.kernel.org/r/20241216175632.4175-25-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
reassign_resources_sorted() uses resource_size() to select between
pci_assign_resource() and pci_reassign_resource(). Due to twisted way
bridge window sizing in pbus_size_mem() sets resource sizes to 0, it works
to match into IOV resources but that is going to be changed by an upcoming
change.
Replace resource_size() check with res->parent check that is the true
dividing line in between whether assign or reassign function should be used
for the resource.
Link: https://lore.kernel.org/r/20241216175632.4175-24-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
PCI resource fitting is somewhat hard to track because it performs many
actions without logging them. In the case inside
__assign_resources_sorted(), the resources are released before resource
assignment is going to be retried in a different order. That is just one
level of retries the resource fitting performs overall so tracking it
through repeated assignments or failures of a resource gets messy rather
quickly.
Simply announce the release explicitly using pci_dbg() so it is clear what
is going on with each resource.
Link: https://lore.kernel.org/r/20241216175632.4175-23-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
pci_enable_resources() checks if device's io and mem resources are all
assigned and disallows enable if any resource failed to assign (*) but
makes an exception for the case of disabled extension ROM. There are other
optional resources, however.
Add pci_resource_is_optional() and use it instead of
pci_resource_is_disabled_rom() to cover also IOV resources that are also
optional as per pbus_size_mem().
As there will be more users of pci_resource_is_optional() inside
setup-bus.c in changes coming up after this one, the function is placed
there.
(*) In practice, resource fitting code calls reset_resource() for any
resource it fails to assign which clears resource's ->flags causing
pci_enable_resources() to never detect failed resource assignments.
This seems undesirable internal logic inconsistency, effectively
reset_resource() prevents pci_enable_resources() from functioning as
intended. This is one step of many that will be needed towards removing
reset_resource().
Link: https://lore.kernel.org/r/20241216175632.4175-20-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Commit a4ac9fea01 ("PCI : Calculate right add_size") removed including
min_align into new_size in pci_reassign_resource() which is the correct
thing to do. However, it also added a snakeoil comment that the resource
would already be aligned with min_align which is incorrect.
A resource that is assigned earlier is aligned with the old alignment, NOT
with the new requested alignment (min_align) until later deep within the
reassignment callchain. Thus, remove the incorrect comment.
Link: https://lore.kernel.org/r/20241216175632.4175-18-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Commit 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
relaxed the bridge window requirements for non-optional size (size0)
but pbus_size_mem() also handles optional sizes (IOV resources) using
size1. This can manifest, e.g., as a failure to resize a BAR back to
its original size after it was first shrunk when device has a VF BAR
resource because the bridge window (size1) is enlarged beyond what is
strictly required to fit the downstream resources.
Allow using relaxed bridge window tail sizing rules also with the optional
resources (size1) so that the remove/realloc cycle during BAR resize
(smaller and back to the original size) does not fail unexpectedly due to
increase in bridge window size demand.
Also move add_align calculation to more logical place next to size1
assignment as they are strongly related to each other.
Link: https://lore.kernel.org/r/20241216175632.4175-5-ilpo.jarvinen@linux.intel.com
Fixes: 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Commit 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
relaxed bridge window tail alignment rule for the non-optional part
(size0, no add_size/add_align). The required alignment given for
pbus_upstream_space_available(), however, was add_align which relates
only to size1 alignment.
As pbus_upstream_space_available() only selects between normal and relaxed
tail alignment of the bridge window, the different alignment only makes
relaxed tail alignment to be used more often than what was intended, which
should be harmless because relaxed tail alignment itself should work in all
cases.
For consistency, change pbus_upstream_space_available() call to use
min_align which is the alignment that is going to be used for the bridge
window in the case where size0 sized allocation is attempted.
Link: https://lore.kernel.org/r/20241216175632.4175-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Commit 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
relaxed bridge window tail alignment rule for the non-optional part
(size0, no add_size/add_align). The change, however, also overwrote
add_align, which is only related to case where optional size1 related
entry is added into realloc head.
Correct this by removing the add_align overwrite.
Link: https://lore.kernel.org/r/20241216175632.4175-2-ilpo.jarvinen@linux.intel.com
Fixes: 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
7180c1d086 ("PCI: Distribute available resources for root buses, too")
breaks BAR assignment on some devices:
pci 0006:03:00.0: BAR 0 [mem 0x6300c0000000-0x6300c1ffffff 64bit pref]: assigned
pci 0006:03:00.1: BAR 0 [mem 0x6300c2000000-0x6300c3ffffff 64bit pref]: assigned
pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: can't assign; no space
pci 0006:03:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
pci 0006:03:00.1: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
The apertures of domain 0006 before 7180c1d086:
6300c0000000-63ffffffffff : PCI Bus 0006:00
6300c0000000-6300c9ffffff : PCI Bus 0006:01
6300c0000000-6300c9ffffff : PCI Bus 0006:02 # 160MB
6300c0000000-6300c8ffffff : PCI Bus 0006:03 # 144MB
6300c0000000-6300c1ffffff : 0006:03:00.0 # 32MB
6300c2000000-6300c3ffffff : 0006:03:00.1 # 32MB
6300c4000000-6300c47fffff : 0006:03:00.2 # 8MB
6300c4800000-6300c67fffff : 0006:03:00.0 # 32MB
6300c6800000-6300c87fffff : 0006:03:00.1 # 32MB
6300c9000000-6300c9bfffff : PCI Bus 0006:04 # 12MB
6300c9000000-6300c9bfffff : PCI Bus 0006:05 # 12MB
6300c9000000-6300c91fffff : PCI Bus 0006:06 # 2MB
6300c9200000-6300c93fffff : PCI Bus 0006:07 # 2MB
6300c9400000-6300c95fffff : PCI Bus 0006:08 # 2MB
6300c9600000-6300c97fffff : PCI Bus 0006:09 # 2MB
After 7180c1d086:
6300c0000000-63ffffffffff : PCI Bus 0006:00
6300c0000000-6300c9ffffff : PCI Bus 0006:01
6300c0000000-6300c9ffffff : PCI Bus 0006:02 # 160MB
6300c0000000-6300c43fffff : PCI Bus 0006:03 # 68MB
6300c0000000-6300c1ffffff : 0006:03:00.0 # 32MB
6300c2000000-6300c3ffffff : 0006:03:00.1 # 32MB
--- no space --- : 0006:03:00.2 # 8MB
--- no space --- : 0006:03:00.0 # 32MB
--- no space --- : 0006:03:00.1 # 32MB
6300c4400000-6300c4dfffff : PCI Bus 0006:04 # 10MB
6300c4400000-6300c4dfffff : PCI Bus 0006:05 # 10MB
6300c4400000-6300c45fffff : PCI Bus 0006:06 # 2MB
6300c4600000-6300c47fffff : PCI Bus 0006:07 # 2MB
6300c4800000-6300c49fffff : PCI Bus 0006:08 # 2MB
6300c4a00000-6300c4bfffff : PCI Bus 0006:09 # 2MB
We can see that the window to 0006:03 gets shrunken too much and 0006:04
eats away the window for 0006:03:00.2.
The offending commit distributes the upstream bridge's resources multiple
times to every downstream bridge, hence makes the aperture smaller than
desired because calculation of io_per_b, mmio_per_b and mmio_pref_per_b
becomes incorrect.
Instead, distribute downstream bridges' own resources to resolve the issue.
Link: https://lore.kernel.org/r/20241204022457.51322-1-kaihengf@nvidia.com
Fixes: 7180c1d086 ("PCI: Distribute available resources for root buses, too")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219540
Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Carol Soto <csoto@nvidia.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Chris Chiu <chris.chiu@canonical.com>
Pull turbostat updates from Len Brown:
- Fix regression that affinitized forked child in one-shot mode.
- Harden one-shot mode against hotplug online/offline
- Enable RAPL SysWatt column by default
- Add initial PTL, CWF platform support
- Harden initial PMT code in response to early use
- Enable first built-in PMT counter: CWF c1e residency
- Refuse to run on unsupported platforms without --force, to encourage
updating to a version that supports the system, and to avoid
no-so-useful measurement results
* tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (25 commits)
tools/power turbostat: version 2025.02.02
tools/power turbostat: Add CPU%c1e BIC for CWF
tools/power turbostat: Harden one-shot mode against cpu offline
tools/power turbostat: Fix forked child affinity regression
tools/power turbostat: Add tcore clock PMT type
tools/power turbostat: version 2025.01.14
tools/power turbostat: Allow adding PMT counters directly by sysfs path
tools/power turbostat: Allow mapping multiple PMT files with the same GUID
tools/power turbostat: Add PMT directory iterator helper
tools/power turbostat: Extend PMT identification with a sequence number
tools/power turbostat: Return default value for unmapped PMT domains
tools/power turbostat: Check for non-zero value when MSR probing
tools/power turbostat: Enhance turbostat self-performance visibility
tools/power turbostat: Add fixed RAPL PSYS divisor for SPR
tools/power turbostat: Fix PMT mmaped file size rounding
tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULT
tools/power turbostat: Add an NMI column
tools/power turbostat: add Busy% to "show idle"
tools/power turbostat: Introduce --force parameter
tools/power turbostat: Improve --help output
...
Pull sh updates from John Paul Adrian Glaubitz:
"Fixes and improvements for sh:
- replace seq_printf() with the more efficient
seq_put_decimal_ull_width() to increase performance when stress
reading /proc/interrupts (David Wang)
- migrate sh to the generic rule for built-in DTB to help avoid race
conditions during parallel builds which can occur because Kbuild
decends into arch/*/boot/dts twice (Masahiro Yamada)
- replace select with imply in the board Kconfig for enabling
hardware with complex dependencies. This addresses warnings which
were reported by the kernel test robot (Geert Uytterhoeven)"
* tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: boards: Use imply to enable hardware with complex dependencies
sh: Migrate to the generic rule for built-in DTB
sh: irq: Use seq_put_decimal_ull_width() for decimal values
Summary of Changes since 2024.11.30:
Fix regression in 2023.11.07 that affinitized forked child
in one-shot mode.
Harden one-shot mode against hotplug online/offline
Enable RAPL SysWatt column by default.
Add initial PTL, CWF platform support.
Harden initial PMT code in response to early use.
Enable first built-in PMT counter: CWF c1e residency
Refuse to run on unsupported platforms without --force,
to encourage updating to a version that supports the system,
and to avoid no-so-useful measurement results.
Signed-off-by: Len Brown <len.brown@intel.com>
Pull misc vfs cleanups from Al Viro:
"Two unrelated patches - one is a removal of long-obsolete include in
overlayfs (it used to need fs/internal.h, but the extern it wanted has
been moved back to include/linux/namei.h) and another introduces
convenience helper constructing struct qstr by a NUL-terminated
string"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add a string-to-qstr constructor
fs/overlayfs/namei.c: get rid of include ../internal.h
Pull MIPS fix from Thomas Bogendoerfer:
"Revert commit breaking sysv ipc for o32 ABI"
* tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
Revert "mips: fix shmctl/semctl/msgctl syscall for o32"
Pull more smb client updates from Steve French:
- various updates for special file handling: symlink handling,
support for creating sockets, cleanups, new mount options (e.g. to
allow disabling using reparse points for them, and to allow
overriding the way symlinks are saved), and fixes to error paths
- fix for kerberos mounts (allow IAKerb)
- SMB1 fix for stat and for setting SACL (auditing)
- fix an incorrect error code mapping
- cleanups"
* tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: (21 commits)
cifs: Fix parsing native symlinks directory/file type
cifs: update internal version number
cifs: Add support for creating WSL-style symlinks
smb3: add support for IAKerb
cifs: Fix struct FILE_ALL_INFO
cifs: Add support for creating NFS-style symlinks
cifs: Add support for creating native Windows sockets
cifs: Add mount option -o reparse=none
cifs: Add mount option -o symlink= for choosing symlink create type
cifs: Fix creating and resolving absolute NT-style symlinks
cifs: Simplify reparse point check in cifs_query_path_info() function
cifs: Remove symlink member from cifs_open_info_data union
cifs: Update description about ACL permissions
cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move to common/smb2pdu.h
cifs: Remove struct reparse_posix_data from struct cifs_open_info_data
cifs: Remove unicode parameter from parse_reparse_point() function
cifs: Fix getting and setting SACLs over SMB1
cifs: Remove intermediate object of failed create SFU call
cifs: Validate EAs for WSL reparse points
cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM
...
Pull debugfs fix from Greg KH:
"Here is a single debugfs fix from Al to resolve a reported regression
in the driver-core tree. It has been reported to fix the issue"
* tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
debugfs: Fix the missing initializations in __debugfs_file_get()
Pull misc fixes from Andrew Morton:
"21 hotfixes. 8 are cc:stable and the remainder address post-6.13
issues. 13 are for MM and 8 are for non-MM.
All are singletons, please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
MAINTAINERS: include linux-mm for xarray maintenance
revert "xarray: port tests to kunit"
MAINTAINERS: add lib/test_xarray.c
mailmap, MAINTAINERS, docs: update Carlos's email address
mm/hugetlb: fix hugepage allocation for interleaved memory nodes
mm: gup: fix infinite loop within __get_longterm_locked
mm, swap: fix reclaim offset calculation error during allocation
.mailmap: update email address for Christopher Obbard
kfence: skip __GFP_THISNODE allocations on NUMA systems
nilfs2: fix possible int overflows in nilfs_fiemap()
mm: compaction: use the proper flag to determine watermarks
kernel: be more careful about dup_mmap() failures and uprobe registering
mm/fake-numa: handle cases with no SRAT info
mm: kmemleak: fix upper boundary check for physical address objects
mailmap: add an entry for Hamza Mahfooz
MAINTAINERS: mailmap: update Yosry Ahmed's email address
scripts/gdb: fix aarch64 userspace detection in get_current_task
mm/vmscan: accumulate nr_demoted for accurate demotion statistics
ocfs2: fix incorrect CPU endianness conversion causing mount failure
mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
...
Pull media fix from Mauro Carvalho Chehab:
"A revert for a regression in the uvcvideo driver"
* tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "media: uvcvideo: Require entities to have a non-zero unique ID"
gather_bootmem_prealloc() assumes the start nid as 0 and size as
num_node_state(N_MEMORY). That means in case if memory attached numa
nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail
to scan few of these nodes.
Since memory attached numa nodes can be interleaved in any fashion, hence
ensure that the current code checks for all numa node ids
(.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that
it can distributes all nr_node_ids among the these many no. threads.
e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 0 kB
with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 2097152 kB
Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com
Fixes: b78b27d029 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com>
Suggested-by: Muchun Song <muchun.song@linux.dev>
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Gang Li <gang.li@linux.dev>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.
While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.
Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.
Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com
Fixes: 622daaff0a ("nilfs2: fiemap support")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>