linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-15 05:03:01 -05:00

Author	SHA1	Message	Date
Tony Luck	f4e0cd80d3	x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG The L3 resource has several requirements for domains. There are per-domain structures that hold the 64-bit values of counters, and elements to keep track of the overflow and limbo threads. None of these are needed for the PERF_PKG resource. The hardware counters are wide enough that they do not wrap around for decades. Define a new rdt_perf_pkg_mon_domain structure which just consists of the standard rdt_domain_hdr to keep track of domain id and CPU mask. Update resctrl_online_mon_domain() for RDT_RESOURCE_PERF_PKG. The only action needed for this resource is to create and populate domain directories if a domain is added while resctrl is mounted. Similarly resctrl_offline_mon_domain() only needs to remove domain directories. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 23:36:41 +01:00
Tony Luck	93d9fd8999	fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() Clearing a monitor group's mon_data directory is complicated because of the support for Sub-NUMA Cluster (SNC) mode. Refactor the SNC case into a helper function to make it easier to add support for a new telemetry resource. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 23:02:58 +01:00
Tony Luck	0ec1db4cac	fs/resctrl: Refactor mkdir_mondata_subdir() Population of a monitor group's mon_data directory is unreasonably complicated because of the support for Sub-NUMA Cluster (SNC) mode. Split out the SNC code into a helper function to make it easier to add support for a new telemetry resource. Move all the duplicated code to make and set owner of domain directories into the mon_add_all_files() helper and rename to _mkdir_mondata_subdir(). Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 23:02:57 +01:00
Tony Luck	51541f6ca7	x86/resctrl: Read telemetry events Introduce intel_aet_read_event() to read telemetry events for resource RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each package, so scan all of them and add up all counters. Aggregators may return an invalid data indication if they have received no records for a given RMID. The user will see "Unavailable" if none of the aggregators on a package provide valid counts. Resctrl now uses readq() so depends on X86_64. Update Kconfig. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 23:02:57 +01:00
Tony Luck	7e6df96145	x86/resctrl: Find and enable usable telemetry events Every event group has a private copy of the data of all telemetry event aggregators (aka "telemetry regions") tracking its feature type. Included may be regions that have the same feature type but tracking different GUID from the event group's. Traverse the event group's telemetry region data and mark all regions that are not usable by the event group as unusable by clearing those regions' MMIO addresses. A region is considered unusable if: 1) GUID does not match the GUID of the event group. 2) Package ID is invalid. 3) The enumerated size of the MMIO region does not match the expected value from the XML description file. Hereafter any telemetry region with an MMIO address is considered valid for the event group it is associated with. Enable all the event group's events as long as there is at least one usable region from where data for its events can be read. Enabling of an event can fail if the same event has already been enabled as part of another event group. It should never happen that the same event is described by different GUID supported by the same system so just WARN (via resctrl_enable_mon_event()) and skip the event. Note that it is architecturally possible that some telemetry events are only supported by a subset of the packages in the system. It is not expected that systems will ever do this. If they do the user will see event files in resctrl that always return "Unavailable". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 23:02:45 +01:00
Tony Luck	8ccb1f8fa6	x86,fs/resctrl: Add architectural event pointer The resctrl file system layer passes the domain, RMID, and event id to the architecture to fetch an event counter. Fetching a telemetry event counter requires additional information that is private to the architecture, for example, the offset into MMIO space from where the counter should be read. Add mon_evt::arch_priv that architecture can use for any private data related to the event. The resctrl filesystem initializes mon_evt::arch_priv when the architecture enables the event and passes it back to architecture when needing to fetch an event counter. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 16:37:08 +01:00
Tony Luck	8f6b6ad69b	x86,fs/resctrl: Fill in details of events for performance and energy GUIDs The telemetry event aggregators of the Intel Clearwater Forest CPU support two RMID-based feature types: "energy" with GUID 0x26696143¹, and "perf" with GUID 0x26557651². The event counter offsets in an aggregator's MMIO space are arranged in groups for each RMID. E.g., the "energy" counters for GUID 0x26696143 are arranged like this: MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY ... MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY After all counters there are three status registers that provide indications of how many times an aggregator was unable to process event counts, the time stamp for the most recent loss of data, and the time stamp of the most recent successful update. MMIO offset:0x2400 AGG_DATA_LOSS_COUNT MMIO offset:0x2408 AGG_DATA_LOSS_TIMESTAMP MMIO offset:0x2410 LAST_UPDATE_TIMESTAMP Define event_group structures for both of these aggregator types and define the events tracked by the aggregators in the file system code. PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point format. File system code must output as floating point values. ¹https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml ²https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml [ bp: Massage commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 16:37:07 +01:00
Tony Luck	db64994d11	fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains The feature to sum event data across multiple domains supports systems with Sub-NUMA Cluster (SNC) mode enabled. The top-level monitoring files in each "mon_L3_XX" directory provide the sum of data across all SNC nodes sharing an L3 cache instance while the "mon_sub_L3_YY" sub-directories provide the event data of the individual nodes. SNC is only associated with the L3 resource and domains and as a result the flow handling the sum of event data implicitly assumes it is working with the L3 resource and domains. Reading of telemetry events does not require to sum event data so this feature can remain dedicated to SNC and keep the implicit assumption of working with the L3 resource and domains. Add a WARN to where the implicit assumption of working with the L3 resource is made and add comments on how the structure controlling the event sum feature is used. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 16:37:07 +01:00
Tony Luck	2e53ad6668	x86,fs/resctrl: Add and initialize a resource for package scope monitoring Add a new PERF_PKG resource and introduce package level scope for monitoring telemetry events so that CPU hotplug notifiers can build domains at the package granularity. Use the physical package ID available via topology_physical_package_id() to identify the monitoring domains with package level scope. This enables user space to use: /sys/devices/system/cpu/cpuX/topology/physical_package_id to identify the monitoring domain a CPU is associated with. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-09 16:37:07 +01:00
Tony Luck	39208e73a4	x86,fs/resctrl: Add an architectural hook called for first mount Enumeration of Intel telemetry events is an asynchronous process involving several mutually dependent drivers added as auxiliary devices during the device_initcall() phase of Linux boot. The process finishes after the probe functions of these drivers completes. But this happens after resctrl_arch_late_init() is executed. Tracing the enumeration process shows that it does complete a full seven seconds before the earliest possible mount of the resctrl file system (when included in /etc/fstab for automatic mount by systemd). Add a hook for use by telemetry event enumeration and initialization and run it once at the beginning of resctrl mount without any locks held. The architecture is responsible for any required locking. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20260105191711.GBaVwON5nZn-uO6Sqg@fat_crate.local	2026-01-09 16:36:34 +01:00
Tony Luck	e37c9a3dc9	x86,fs/resctrl: Support binary fixed point event counters resctrl assumes that all monitor events can be displayed as unsigned decimal integers. Hardware architecture counters may provide some telemetry events with greater precision where the event is not a simple count, but is a measurement of some sort (e.g. Joules for energy consumed). Add a new argument to resctrl_enable_mon_event() for architecture code to inform the file system that the value for a counter is a fixed-point value with a specific number of binary places. Only allow architecture to use floating point format on events that the file system has marked with mon_evt::is_floating_point which reflects the contract with user space on how the event values are displayed. Display fixed point values with values rounded to ceil(binary_bits * log10(2)) decimal places. Special case for zero binary bits to print "{value}.0". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-05 16:10:41 +01:00
Tony Luck	ab0308aee3	x86,fs/resctrl: Handle events that can be read from any CPU resctrl assumes that monitor events can only be read from a CPU in the cpumask_t set of each domain. This is true for x86 events accessed with an MSR interface, but may not be true for other access methods such as MMIO. Introduce and use flag mon_evt::any_cpu, settable by architecture, that indicates there are no restrictions on which CPU can read that event. This flag is not supported by the L3 event reading that requires to be run on a CPU that belongs to the L3 domain of the event being read. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-05 15:38:07 +01:00
Tony Luck	dd110880e8	fs/resctrl: Make event details accessible to functions when reading events Reading monitoring event data from MMIO requires more context than the event id to be able to read the correct memory location. struct mon_evt is the appropriate place for this event specific context. Prepare for addition of extra fields to struct mon_evt by changing the calling conventions to pass a pointer to the mon_evt structure instead of just the event id. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-05 15:25:22 +01:00
Tony Luck	9c214d10c5	x86,fs/resctrl: Rename some L3 specific functions With the arrival of monitor events tied to new domains associated with a different resource it would be clearer if the L3 resource specific functions are more accurately named. Rename three groups of functions: Functions that allocate/free architecture per-RMID MBM state information: arch_domain_mbm_alloc() -> l3_mon_domain_mbm_alloc() mon_domain_free() -> l3_mon_domain_free() Functions that allocate/free filesystem per-RMID MBM state information: domain_setup_mon_state() -> domain_setup_l3_mon_state() domain_destroy_mon_state() -> domain_destroy_l3_mon_state() Initialization/exit: rdt_get_mon_l3_config() -> rdt_get_l3_mon_config() resctrl_mon_resource_init() -> resctrl_l3_mon_resource_init() resctrl_mon_resource_exit() -> resctrl_l3_mon_resource_exit() Ensure kernel-doc descriptions of these functions' return values are present and correctly formatted. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-05 11:21:55 +01:00
Tony Luck	4bc3ef46ff	x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain The upcoming telemetry event monitoring is not tied to the L3 resource and will have a new domain structure. Rename the L3 resource specific domain data structures to include "l3_" in their names to avoid confusion between the different resource specific domain structures: rdt_mon_domain -> rdt_l3_mon_domain rdt_hw_mon_domain -> rdt_hw_l3_mon_domain No functional change. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-05 11:17:25 +01:00
Tony Luck	6b10cf7b6e	x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters Convert the whole call sequence from mon_event_read() to resctrl_arch_rmid_read() to pass resource independent struct rdt_domain_hdr instead of an L3 specific domain structure to prepare for monitoring events in other resources. This additional layer of indirection obscures which aspects of event counting depend on a valid domain. Event initialization, support for assignable counters, and normal event counting implicitly depend on a valid domain while summing of domains does not. Split summing domains from the core event counting handling to make their respective dependencies obvious. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-05 11:08:58 +01:00
Tony Luck	ad5c2ff75e	fs/resctrl: Split L3 dependent parts out of __mon_event_count() Carve out the L3 resource specific event reading code into a separate helper to support reading event data from a new monitoring resource. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-05 10:18:33 +01:00
Tony Luck	97fec06d35	x86,fs/resctrl: Refactor domain create/remove using struct rdt_domain_hdr Up until now, all monitoring events were associated with the L3 resource and it made sense to use the L3 specific "struct rdt_mon_domain *" argument to functions operating on domains. Telemetry events will be tied to a new resource with its instances represented by a new domain structure that, just like struct rdt_mon_domain, starts with the generic struct rdt_domain_hdr. Prepare to support domains belonging to different resources by changing the calling convention of functions operating on domains. Pass the generic header and use that to find the domain specific structure where needed. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-04 13:48:11 +01:00
Tony Luck	03eb578b37	x86,fs/resctrl: Improve domain type checking Every resctrl resource has a list of domain structures. struct rdt_ctrl_domain and struct rdt_mon_domain both begin with struct rdt_domain_hdr with rdt_domain_hdr::type used in validity checks before accessing the domain of a particular type. Add the resource id to struct rdt_domain_hdr in preparation for a new monitoring domain structure that will be associated with a new monitoring resource. Improve existing domain validity checks with a new helper domain_header_is_valid() that checks both domain type and resource id. domain_header_is_valid() should be used before every call to container_of() that accesses a domain structure. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com	2026-01-04 07:30:10 +01:00
Linus Torvalds	04688d6128	Merge tag 'v6.19-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fix from Steve French: - Fix potential memory leak * tag 'v6.19-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix memory and information leak in smb3_reconfigure()	2025-12-26 16:19:45 -08:00
Linus Torvalds	1e5e062ad8	Merge tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Introduce DMA Rust helpers to avoid build errors when !CONFIG_HAS_DMA - Remove unnecessary (and hence incorrect) endian conversion in the Rust PCI driver sample code - Fix memory leak in the unwind path of debugfs_change_name() - Support non-const struct software_node pointers in SOFTWARE_NODE_REFERENCE(), after introducing _Generic() - Avoid NULL pointer dereference in the unwind path of simple_xattrs_free() * tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: fs/kernfs: null-ptr deref in simple_xattrs_free() software node: Also support referencing non-constant software nodes debugfs: Fix memleak in debugfs_change_name(). samples: rust: fix endianness issue in rust_driver_pci rust: dma: add helpers for architectures without CONFIG_HAS_DMA	2025-12-26 13:41:02 -08:00
Linus Torvalds	e2cc644089	Merge tag 'v6.19-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - Fix parsing of SMB1 negotiate request by adjusting offsets affected by the removal of the RFC1002 length field from the SMB header - Update minimum PDU size macros for both SMB1 and SMB2 - Rename smb2_get_msg function to smb_get_msg to better reflect its role in handling both SMB1 and SMB2 requests * tag 'v6.19-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd: smb/server: fix minimum SMB2 PDU size smb/server: fix minimum SMB1 PDU size ksmbd: rename smb2_get_msg to smb_get_msg ksmbd: Fix to handle removal of rfc1002 header from smb_hdr	2025-12-26 10:03:25 -08:00
Linus Torvalds	ccd1cdca5c	Merge tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "A set of NFSD fixes that arrived just a bit late for the 6.19 merge window. Regression fixes: - Mark variable __maybe_unused to avoid W=1 build break Stable fixes: - NFSv4 file creation neglects setting ACL - Clear TIME_DELEG in the suppattr_exclcreat bitmap - Clear SECLABEL in the suppattr_exclcreat bitmap - Fix memory leak in nfsd_create_serv error paths - Bound check rq_pages index in inline path - Return 0 on success from svc_rdma_copy_inline_range - Use rc_pageoff for memcpy byte offset - Avoid NULL deref on zero length gss_token in gss_read_proxy_verf" * tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: NFSv4 file creation neglects setting ACL NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap nfsd: fix memory leak in nfsd_create_serv error paths nfsd: Mark variable __maybe_unused to avoid W=1 build break svcrdma: bound check rq_pages index in inline path svcrdma: return 0 on success from svc_rdma_copy_inline_range svcrdma: use rc_pageoff for memcpy byte offset SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf	2025-12-24 09:23:04 -08:00
Linus Torvalds	ce93692d68	Merge tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: "Junbeom reported that synchronous reads could hit unintended EIOs under memory pressure due to incorrect error propagation in z_erofs_decompress_queue(), where earlier physical clusters in the same decompression queue may be served for another readahead. This addresses the issue by decompressing each physical cluster independently as long as disk I/Os succeed, rather than being impacted by the error status of previous physical clusters in the same queue. Summary: - Fix unexpected EIOs under memory pressure caused by recent incorrect error propagation logic" * tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix unexpected EIO under memory pressure	2025-12-24 09:15:30 -08:00
Zilin Guan	cb6d5aa9c0	cifs: Fix memory and information leak in smb3_reconfigure() In smb3_reconfigure(), if smb3_sync_session_ctx_passwords() fails, the function returns immediately without freeing and erasing the newly allocated new_password and new_password2. This causes both a memory leak and a potential information leak. Fix this by calling kfree_sensitive() on both password buffers before returning in this error case. Fixes: `0f0e357902` ("cifs: during remount, make sure passwords are in sync") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-24 11:07:15 -06:00
Will Rosenberg	2b74209458	fs/kernfs: null-ptr deref in simple_xattrs_free() There exists a null pointer dereference in simple_xattrs_free() as part of the __kernfs_new_node() routine. Within __kernfs_new_node(), err_out4 calls simple_xattr_free(), but kn->iattr may be NULL if __kernfs_setattr() was never called. As a result, the first argument to simple_xattrs_free() may be NULL + 0x38, and no NULL check is done internally, causing an incorrect pointer dereference. Add a check to ensure kn->iattr is not NULL, meaning __kernfs_setattr() has been called and kn->iattr is allocated. Note that struct kernfs_node kn is allocated with kmem_cache_zalloc, so we can assume kn->iattr will be NULL if not allocated. An alternative fix could be to not call simple_xattrs_free() at all. As was previously discussed during the initial patch, simple_xattrs_free() is not strictly needed and is included to be consistent with kernfs_free_rcu(), which also helps the function maintain correctness if changes are made in __kernfs_new_node(). Reported-by: syzbot+6aaf7f48ae034ab0ea97@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6aaf7f48ae034ab0ea97 Fixes: `382b1e8f30` ("kernfs: fix memory leak of kernfs_iattrs in __kernfs_new_node") Signed-off-by: Will Rosenberg <whrosenb@asu.edu> Link: https://patch.msgid.link/20251217060107.4171558-1-whrosenb@asu.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-12-23 16:14:43 +01:00
ChenXiaoSong	4c7d8eb9a7	smb/server: fix minimum SMB2 PDU size The minimum SMB2 PDU size should be updated to the size of `struct smb2_pdu` (that is, the size of `struct smb2_hdr` + 2). Suggested-by: David Howells <dhowells@redhat.com> Suggested-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-21 19:20:46 -06:00
ChenXiaoSong	3b9c30eb8f	smb/server: fix minimum SMB1 PDU size Since the RFC1002 header has been removed from `struct smb_hdr`, the minimum SMB1 PDU size should be updated as well. Fixes: `83bfbd0bb9` ("cifs: Remove the RFC1002 header from smb_hdr") Suggested-by: David Howells <dhowells@redhat.com> Suggested-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-21 19:20:46 -06:00
Namjae Jeon	0b444cfd8b	ksmbd: rename smb2_get_msg to smb_get_msg With the removal of the RFC1002 length field from the SMB header, smb2_get_msg is now used to get the smb1 request from the request buffer. Since this function is no longer exclusive to smb2 and now supports smb1 as well, This patch rename it to smb_get_msg to better reflect its usage. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-21 19:20:46 -06:00
David Howells	0a70cac789	ksmbd: Fix to handle removal of rfc1002 header from smb_hdr The commit that removed the RFC1002 header from struct smb_hdr didn't also fix the places in ksmbd that use it in order to provide graceful rejection of SMB1 protocol requests. Fixes: `83bfbd0bb9` ("cifs: Remove the RFC1002 header from smb_hdr") Reported-by: Namjae Jeon <linkinjeon@kernel.org> Link: https://lore.kernel.org/r/CAKYAXd9Ju4MFkkH5Jxfi1mO0AWEr=R35M3vQ_Xa7Yw34JoNZ0A@mail.gmail.com/ Cc: ChenXiaoSong <chenxiaosong.chenxiaosong@linux.dev> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-21 19:20:46 -06:00
Junbeom Yeom	4012d78562	erofs: fix unexpected EIO under memory pressure erofs readahead could fail with ENOMEM under the memory pressure because it tries to alloc_page with GFP_NOWAIT \| GFP_NORETRY, while GFP_KERNEL for a regular read. And if readahead fails (with non-uptodate folios), the original request will then fall back to synchronous read, and `.read_folio()` should return appropriate errnos. However, in scenarios where readahead and read operations compete, read operation could return an unintended EIO because of an incorrect error propagation. To resolve this, this patch modifies the behavior so that, when the PCL is for read(which means pcl.besteffort is true), it attempts actual decompression instead of propagating the privios error except initial EIO. - Page size: 4K - The original size of FileA: 16K - Compress-ratio per PCL: 50% (Uncompressed 8K -> Compressed 4K) [page0, page1] [page2, page3] [PCL0]---------[PCL1] - functions declaration: . pread(fd, buf, count, offset) . readahead(fd, offset, count) - Thread A tries to read the last 4K - Thread B tries to do readahead 8K from 4K - RA, besteffort == false - R, besteffort == true <process A> <process B> pread(FileA, buf, 4K, 12K) do readahead(page3) // failed with ENOMEM wait_lock(page3) if (!uptodate(page3)) goto do_read readahead(FileA, 4K, 8K) // Here create PCL-chain like below: // [null, page1] [page2, null] // [PCL0:RA]-----[PCL1:RA] ... do read(page3) // found [PCL1:RA] and add page3 into it, // and then, change PCL1 from RA to R ... // Now, PCL-chain is as below: // [null, page1] [page2, page3] // [PCL0:RA]-----[PCL1:R] // try to decompress PCL-chain... z_erofs_decompress_queue err = 0; // failed with ENOMEM, so page 1 // only for RA will not be uptodated. // it's okay. err = decompress([PCL0:RA], err) // However, ENOMEM propagated to next // PCL, even though PCL is not only // for RA but also for R. As a result, // it just failed with ENOMEM without // trying any decompression, so page2 // and page3 will not be uptodated. BUG HERE --> err = decompress([PCL1:R], err) return err as ENOMEM ... wait_lock(page3) if (!uptodate(page3)) return EIO <-- Return an unexpected EIO! ... Fixes: `2349d2fa02` ("erofs: sunset unneeded NOFAILs") Cc: stable@vger.kernel.org Reviewed-by: Jaewook Kim <jw5454.kim@samsung.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Junbeom Yeom <junbeom.yeom@samsung.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-12-22 00:18:53 +08:00
Linus Torvalds	f67e8a5e3e	Merge tag 'xfs-fixes-6.19-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Carlos Maiolino: "This contains a few fixes for zoned devices support, an UAF and a compiler warning, and some cleaning up" * tag 'xfs-fixes-6.19-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix the zoned RT growfs check for zone alignment xfs: validate that zoned RT devices are zone aligned xfs: fix XFS_ERRTAG_FORCE_ZERO_RANGE for zoned file system xfs: fix a memory leak in xfs_buf_item_init() xfs: fix stupid compiler warning xfs: fix a UAF problem in xattr repair xfs: ignore discard return value	2025-12-20 12:45:35 -08:00
Kuniyuki Iwashima	d412ff9e26	debugfs: Fix memleak in debugfs_change_name(). syzbot reported memleak in debugfs_change_name(). [0] When lookup_noperm_unlocked() fails, new_name is leaked. Let's fix it by reusing kfree_const() at the end of debugfs_change_name(). [0]: BUG: memory leak unreferenced object 0xffff8881110bb308 (size 8): comm "syz.0.17", pid 6090, jiffies 4294942958 hex dump (first 8 bytes): 2e 00 00 00 00 00 00 00 ........ backtrace (crc ecfc7064): kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline] slab_post_alloc_hook mm/slub.c:4953 [inline] slab_alloc_node mm/slub.c:5258 [inline] __do_kmalloc_node mm/slub.c:5651 [inline] __kmalloc_node_track_caller_noprof+0x3b2/0x670 mm/slub.c:5759 __kmemdup_nul mm/util.c:64 [inline] kstrdup+0x3c/0x80 mm/util.c:84 kstrdup_const+0x63/0x80 mm/util.c:104 kvasprintf_const+0xca/0x110 lib/kasprintf.c:48 debugfs_change_name+0xf6/0x5d0 fs/debugfs/inode.c:854 cfg80211_dev_rename+0xd8/0x110 net/wireless/core.c:149 nl80211_set_wiphy+0x102/0x1770 net/wireless/nl80211.c:3844 genl_family_rcv_msg_doit+0x11e/0x190 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x2fd/0x440 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x93/0x1d0 net/netlink/af_netlink.c:2550 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x3a3/0x4f0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x335/0x6b0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:718 [inline] __sock_sendmsg net/socket.c:733 [inline] ____sys_sendmsg+0x562/0x5a0 net/socket.c:2608 ___sys_sendmsg+0xc8/0x130 net/socket.c:2662 __sys_sendmsg+0xc7/0x140 net/socket.c:2694 Fixes: `833d2b3a07` ("Add start_renaming_two_dentries()") Reported-by: syzbot+3d7ca9c802c547f8550a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69369d82.a70a0220.38f243.009f.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251208094551.46184-1-kuniyu@google.com [ Fix minor typo in commit message. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-12-19 16:43:40 +01:00
Linus Torvalds	a91e1138b7	Merge tag 'v6.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - important fix for reconnect problem - minor cleanup * tag 'v6.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number smb: move some SMB1 definitions into common/smb1pdu.h smb: align durable reconnect v2 context to 8 byte boundary	2025-12-19 07:50:20 +12:00
Linus Torvalds	9a903e6d96	Merge tag 'fsnotify_for_v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Two fsnotify fixes. The fix from Ahelenia makes sure we generate event when modifying inode flags, the fix from Amir disables sending of events from device inodes to their parent directory as it could concievably create a usable side channel attack in case of some devices and so far we aren't aware of anybody depending on the functionality" * tag 'fsnotify_for_v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs: send fsnotify_xattr()/IN_ATTRIB from vfs_fileattr_set()/chattr(1) fsnotify: do not generate ACCESS/MODIFY events on child for special files	2025-12-19 07:41:17 +12:00
Chuck Lever	913f7cf77b	NFSD: NFSv4 file creation neglects setting ACL An NFSv4 client that sets an ACL with a named principal during file creation retrieves the ACL afterwards, and finds that it is only a default ACL (based on the mode bits) and not the ACL that was requested during file creation. This violates RFC 8881 section 6.4.1.3: "the ACL attribute is set as given". The issue occurs in nfsd_create_setattr(), which calls nfsd_attrs_valid() to determine whether to call nfsd_setattr(). However, nfsd_attrs_valid() checks only for iattr changes and security labels, but not POSIX ACLs. When only an ACL is present, the function returns false, nfsd_setattr() is skipped, and the POSIX ACL is never applied to the inode. Subsequently, when the client retrieves the ACL, the server finds no POSIX ACL on the inode and returns one generated from the file's mode bits rather than returning the originally-specified ACL. Reported-by: Aurélien Couderc <aurelien.couderc2002@gmail.com> Fixes: `c0cbe70742` ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-12-18 11:19:11 -05:00
Chuck Lever	ad3cbbb0c1	NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap >From RFC 8881: 5.8.1.14. Attribute 75: suppattr_exclcreat > The bit vector that would set all REQUIRED and RECOMMENDED > attributes that are supported by the EXCLUSIVE4_1 method of file > creation via the OPEN operation. The scope of this attribute > applies to all objects with a matching fsid. There's nothing in RFC 8881 that states that suppattr_exclcreat is or is not allowed to contain bits for attributes that are clear in the reported supported_attrs bitmask. But it doesn't make sense for an NFS server to indicate that it /doesn't/ implement an attribute, but then also indicate that clients /are/ allowed to set that attribute using OPEN(create) with EXCLUSIVE4_1. The FATTR4_WORD2_TIME_DELEG attributes are also not to be allowed for OPEN(create) with EXCLUSIVE4_1. It doesn't make sense to set a delegated timestamp on a new file. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-12-18 11:18:39 -05:00
Chuck Lever	27d17641ca	NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap >From RFC 8881: 5.8.1.14. Attribute 75: suppattr_exclcreat > The bit vector that would set all REQUIRED and RECOMMENDED > attributes that are supported by the EXCLUSIVE4_1 method of file > creation via the OPEN operation. The scope of this attribute > applies to all objects with a matching fsid. There's nothing in RFC 8881 that states that suppattr_exclcreat is or is not allowed to contain bits for attributes that are clear in the reported supported_attrs bitmask. But it doesn't make sense for an NFS server to indicate that it /doesn't/ implement an attribute, but then also indicate that clients /are/ allowed to set that attribute using OPEN(create) with EXCLUSIVE4_1. Ensure that the SECURITY_LABEL and ACL bits are not set in the suppattr_exclcreat bitmask when they are also not set in the supported_attrs bitmask. Fixes: `8c18f2052e` ("nfsd41: SUPPATTR_EXCLCREAT attribute") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-12-18 11:18:39 -05:00
Shardul Bankar	df8d829bba	nfsd: fix memory leak in nfsd_create_serv error paths When nfsd_create_serv() calls percpu_ref_init() to initialize nn->nfsd_net_ref, it allocates both a percpu reference counter and a percpu_ref_data structure (64 bytes). However, if the function fails later due to svc_create_pooled() returning NULL or svc_bind() returning an error, these allocations are not cleaned up, resulting in a memory leak. The leak manifests as: - Unreferenced percpu allocation (8 bytes per CPU) - Unreferenced percpu_ref_data structure (64 bytes) Fix this by adding percpu_ref_exit() calls in both error paths to properly clean up the percpu_ref_init() allocations. This patch fixes the percpu_ref leak in nfsd_create_serv() seen as an auxiliary leak in syzbot report 099461f8558eb0a1f4f3; the prepare_creds() and vsock-related leaks in the same report remain to be addressed separately. Reported-by: syzbot+099461f8558eb0a1f4f3@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=099461f8558eb0a1f4f3 Fixes: `47e988147f` ("nfsd: add nfsd_serv_try_get and nfsd_serv_put") Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-12-18 11:18:39 -05:00
Christoph Hellwig	dc68c0f601	xfs: fix the zoned RT growfs check for zone alignment The grofs code for zoned RT subvolums already tries to check for zone alignment, but gets it wrong by using the old instead of the new mount structure. Fixes: `01b71e64bb` ("xfs: support growfs on zoned file systems") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: stable@vger.kernel.org # v6.15 Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-12-17 10:27:02 +01:00
Christoph Hellwig	982d2616a2	xfs: validate that zoned RT devices are zone aligned Garbage collection assumes all zones contain the full amount of blocks. Mkfs already ensures this happens, but make the kernel check it as well to avoid getting into trouble due to fuzzers or mkfs bugs. Fixes: `2167eaabe2` ("xfs: define the zoned on-disk format") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: stable@vger.kernel.org # v6.15 Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-12-17 10:27:02 +01:00
Steve French	d8a4af8f3d	cifs: update internal module version number to 2.58 Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-16 17:43:02 -06:00
ZhangGuoDong	94d5b8dbc5	smb: move some SMB1 definitions into common/smb1pdu.h These definitions are only used by SMB1, so move them into the new common/smb1pdu.h. KSMBD only implements SMB_COM_NEGOTIATE, see MS-SMB2 3.3.5.2. Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-16 17:43:01 -06:00
Bharath SM	05f5e355cf	smb: align durable reconnect v2 context to 8 byte boundary Add a 4-byte Pad to create_durable_handle_reconnect_v2 so the DH2C create context is 8 byte aligned. This avoids malformed CREATE contexts on reconnect. Recent change removed this Padding, adding it back. Fixes: `81a45de432` ("smb: move create_durable_handle_reconnect_v2 to common/smb2pdu.h") Signed-off-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-16 17:42:49 -06:00
Christoph Hellwig	8dc15b7a6e	xfs: fix XFS_ERRTAG_FORCE_ZERO_RANGE for zoned file system The new XFS_ERRTAG_FORCE_ZERO_RANGE error tag added by commit ea9989668081 ("xfs: error tag to force zeroing on debug kernels") fails to account for the zoned space reservation rules and this reliably fails xfs/131 because the zeroing operation returns -EIO. Fix this by reserving enough space to zero the entire range, which requires a bit of (fairly ugly) reshuffling to do the error injection early enough to affect the space reservation. Fixes: ea9989668081 ("xfs: error tag to force zeroing on debug kernels") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-12-16 09:21:38 +01:00
Haoxiang Li	fc40459de8	xfs: fix a memory leak in xfs_buf_item_init() xfs_buf_item_get_format() may allocate memory for bip->bli_formats, free the memory in the error path. Fixes: `c3d5f0c2fb` ("xfs: complain if anyone tries to create a too-large buffer log item") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-12-16 08:50:11 +01:00
Darrick J. Wong	f067250520	xfs: fix stupid compiler warning gcc 14.2 warns about: xfs_attr_item.c: In function ‘xfs_attr_recover_work’: xfs_attr_item.c:785:9: warning: ‘ip’ may be used uninitialized [-Wmaybe-uninitialized] 785 \| xfs_trans_ijoin(tp, ip, 0); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ xfs_attr_item.c:740:42: note: ‘ip’ was declared here 740 \| struct xfs_inode *ip; \| ^~ I think this is bogus since xfs_attri_recover_work either returns a real pointer having initialized ip or an ERR_PTR having not touched it, but the tools are smarter than me so let's just null-init the variable anyway. Cc: stable@vger.kernel.org # v6.8 Fixes: `e70fb328d5` ("xfs: recreate work items when recovering intent items") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-12-16 08:50:07 +01:00
Darrick J. Wong	5990fd7569	xfs: fix a UAF problem in xattr repair The xchk_setup_xattr_buf function can allocate a new value buffer, which means that any reference to ab->value before the call could become a dangling pointer. Fix this by moving an assignment to after the buffer setup. Cc: stable@vger.kernel.org # v6.10 Fixes: `e47dcf113a` ("xfs: repair extended attributes") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-12-16 08:50:00 +01:00
Chaitanya Kulkarni	2145f447b7	xfs: ignore discard return value __blkdev_issue_discard() always returns 0, making all error checking in XFS discard functions dead code. Change xfs_discard_extents() return type to void, remove error variable, error checking, and error logging for the __blkdev_issue_discard() call in same function. Update xfs_trim_perag_extents() and xfs_trim_rtgroup_extents() to ignore the xfs_discard_extents() return value and error checking code. Update xfs_discard_rtdev_extents() to ignore __blkdev_issue_discard() return value and error checking code. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-12-16 08:49:56 +01:00
Linus Torvalds	40fbbd64bb	Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull shmem rename fixes from Al Viro: "A couple of shmem rename fixes - recent regression from tree-in-dcache series and older breakage from stable directory offsets stuff" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: shmem: fix recovery on rename failures shmem_whiteout(): fix regression from tree-in-dcache series	2025-12-16 19:44:36 +12:00

1 2 3 4 5 ...

103089 Commits