Pull hardening updates from Kees Cook:
"As has become normal, changes are scattered around the tree (either
explicitly maintainer Acked or for trivial stuff that went ignored):
- Carve out the new CONFIG_LIST_HARDENED as a more focused subset of
CONFIG_DEBUG_LIST (Marco Elver)
- Fix kallsyms lookup failure under Clang LTO (Yonghong Song)
- Clarify documentation for CONFIG_UBSAN_TRAP (Jann Horn)
- Flexible array member conversion not carried in other tree (Gustavo
A. R. Silva)
- Various strlcpy() and strncpy() removals not carried in other trees
(Azeem Shaikh, Justin Stitt)
- Convert nsproxy.count to refcount_t (Elena Reshetova)
- Add handful of __counted_by annotations not carried in other trees,
as well as an LKDTM test
- Fix build failure with gcc-plugins on GCC 14+
- Fix selftests to respect SKIP for signal-delivery tests
- Fix CFI warning for paravirt callback prototype
- Clarify documentation for seq_show_option_n() usage"
* tag 'hardening-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
LoadPin: Annotate struct dm_verity_loadpin_trusted_root_digest with __counted_by
kallsyms: Change func signature for cleanup_symbol_name()
kallsyms: Fix kallsyms_selftest failure
nsproxy: Convert nsproxy.count to refcount_t
integrity: Annotate struct ima_rule_opt_list with __counted_by
lkdtm: Add FAM_BOUNDS test for __counted_by
Compiler Attributes: counted_by: Adjust name and identifier expansion
um: refactor deprecated strncpy to memcpy
um: vector: refactor deprecated strncpy
alpha: Replace one-element array with flexible-array member
hardening: Move BUG_ON_DATA_CORRUPTION to hardening options
list: Introduce CONFIG_LIST_HARDENED
list_debug: Introduce inline wrappers for debug checks
compiler_types: Introduce the Clang __preserve_most function attribute
gcc-plugins: Rename last_stmt() for GCC 14+
selftests/harness: Actually report SKIP for signal tests
x86/paravirt: Fix tlb_remove_table function callback prototype warning
EISA: Replace all non-returning strlcpy with strscpy
perf: Replace strlcpy with strscpy
um: Remove strlcpy declaration
...
Pull vfs timestamp updates from Christian Brauner:
"This adds VFS support for multi-grain timestamps and converts tmpfs,
xfs, ext4, and btrfs to use them. This carries acks from all relevant
filesystems.
The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot of metadata updates, down to around 1 per
jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide to invalidate the cache.
Even with NFSv4, a lot of exported filesystems don't properly support
a change attribute and are subject to the same problems with timestamp
granularity. Other applications have similar issues with timestamps
(e.g., backup applications).
If we were to always use fine-grained timestamps, that would improve
the situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
This introduces fine-grained timestamps that are used when they are
actively queried.
This uses the 31st bit of the ctime tv_nsec field to indicate that
something has queried the inode for the mtime or ctime. When this flag
is set, on the next mtime or ctime update, the kernel will fetch a
fine-grained timestamp instead of the usual coarse-grained one.
As POSIX generally mandates that when the mtime changes, the ctime
must also change the kernel always stores normalized ctime values, so
only the first 30 bits of the tv_nsec field are ever used.
Filesytems can opt into this behavior by setting the FS_MGTIME flag in
the fstype. Filesystems that don't set this flag will continue to use
coarse-grained timestamps.
Various preparatory changes, fixes and cleanups are included:
- Fixup all relevant places where POSIX requires updating ctime
together with mtime. This is a wide-range of places and all
maintainers provided necessary Acks.
- Add new accessors for inode->i_ctime directly and change all
callers to rely on them. Plain accesses to inode->i_ctime are now
gone and it is accordingly rename to inode->__i_ctime and commented
as requiring accessors.
- Extend generic_fillattr() to pass in a request mask mirroring in a
sense the statx() uapi. This allows callers to pass in a request
mask to only get a subset of attributes filled in.
- Rework timestamp updates so it's possible to drop the @now
parameter the update_time() inode operation and associated helpers.
- Add inode_update_timestamps() and convert all filesystems to it
removing a bunch of open-coding"
* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
btrfs: convert to multigrain timestamps
ext4: switch to multigrain timestamps
xfs: switch to multigrain timestamps
tmpfs: add support for multigrain timestamps
fs: add infrastructure for multigrain timestamps
fs: drop the timespec64 argument from update_time
xfs: have xfs_vn_update_time gets its own timestamp
fat: make fat_update_time get its own timestamp
fat: remove i_version handling from fat_update_time
ubifs: have ubifs_update_time use inode_update_timestamps
btrfs: have it use inode_update_timestamps
fs: drop the timespec64 arg from generic_update_time
fs: pass the request_mask to generic_fillattr
fs: remove silly warning from current_time
gfs2: fix timestamp handling on quota inodes
fs: rename i_ctime field to __i_ctime
selinux: convert to ctime accessor functions
security: convert to ctime accessor functions
apparmor: convert to ctime accessor functions
sunrpc: convert to ctime accessor functions
...
Clang warns:
drivers/misc/cxl/native.c:272:20: error: unused function 'detach_spa' [-Werror,-Wunused-function]
It was created as part of some refactoring in commit 05155772f6 ("cxl:
Allocate and release the SPA with the AFU"), but has never been called
in its current form. Drop it.
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823044803.737175-1-mpe@ellerman.id.au
Memory is allocated for dynamic loading when audio daemon is trying
to attach to audioPD on DSP side. This memory is allocated from
reserved CMA memory region and needs ownership assignment to
new VMID in order to use it from audioPD.
In the current implementation, arguments are not correctly passed
to the scm call which might result in failure of dynamic loading
on audioPD. Added changes to pass correct arguments during daemon
attach request.
Fixes: 0871561055 ("misc: fastrpc: Add support for audiopd")
Cc: stable <stable@kernel.org>
Tested-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230811115643.38578-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Scatterlist table is obtained during map create request and the same
table is used for DMA mapping unmap. In case there is any failure
while getting the sg_table, ERR_PTR is returned instead of sg_table.
When the map is getting freed, there is only a non-NULL check of
sg_table which will also be true in case failure was returned instead
of sg_table. This would result in improper unmap request. Add proper
check before setting map table to avoid bad unmap request.
Fixes: c68cfb718c ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable@kernel.org>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230811115643.38578-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remote heap is used by DSP audioPD on need basis. This memory is
allocated from reserved CMA memory region and is then shared with
audioPD to use it for it's functionality.
Current implementation of remote heap is not allocating the memory
from CMA region, instead it is allocating the memory from SMMU
context bank. The arguments passed to scm call for the reassignment
of ownership is also not correct. Added changes to allocate CMA
memory and have a proper ownership reassignment.
Fixes: 532ad70c6d ("misc: fastrpc: Add mmap request assigning for static PD pool")
Cc: stable <stable@kernel.org>
Tested-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230811115643.38578-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PCI core add pci_find_vsec_capability() to query VSEC. We can use that
core API to simplify the code.
The only logical change is that pci_find_vsec_capability check the
Vendor ID before finding the VSEC.
PCI spec rev 5.0 says in 7.9.5.2 Vendor-Specific Header:
VSEC ID - This field is a vendor-defined ID number that indicates the
nature and format of the VSEC structure
Software must qualify the Vendor ID before interpreting this field.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230804075630.186054-1-wangxiongfeng2@huawei.com
Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).
The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.
Introduce CONFIG_LIST_HARDENED, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.
To generate optimal machine code with CONFIG_LIST_HARDENED:
1. Elide checking for pointer values which upon dereference would
result in an immediate access fault (i.e. minimal hardening
checks). The trade-off is lower-quality error reports.
2. Use the __preserve_most function attribute (available with Clang,
but not yet with GCC) to minimize the code footprint for calling
the reporting slow path. As a result, function size of callers is
reduced by avoiding saving registers before calling the rarely
called reporting slow path.
Note that all TUs in lib/Makefile already disable function tracing,
including list_debug.c, and __preserve_most's implied notrace has
no effect in this case.
3. Because the inline checks are a subset of the full set of checks in
__list_*_valid_or_report(), always return false if the inline
checks failed. This avoids redundant compare and conditional
branch right after return from the slow path.
As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.
Since DEBUG_LIST is functionally a superset of LIST_HARDENED, the
Kconfig variables are changed to reflect that: DEBUG_LIST selects
LIST_HARDENED, whereas LIST_HARDENED itself has no dependency on
DEBUG_LIST.
Running netperf with CONFIG_LIST_HARDENED (using a Clang compiler with
"preserve_most") shows throughput improvements, in my case of ~7% on
average (up to 20-30% on some test cases).
Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20230811151847.1594958-3-elver@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
context, other process context code should disable irq or bottom-half
before acquire the same lock, otherwise deadlock could happen if the
timer preempt the execution while the lock is held in process context
on the same CPU.
Possible deadlock scenario
bcm_vk_open()
-> bcm_vk_get_ctx()
-> spin_lock(&vk->ctx_lock)
<timer iterrupt>
-> bcm_vk_hb_poll()
-> bcm_vk_blk_drv_access()
-> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
This flaw was found using an experimental static analysis tool we are
developing for irq-related deadlock, which reported the following
warning when analyzing the linux kernel 6.4-rc7 release.
[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_ioctl
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_ioctl
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169
[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_open
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216
[Deadlock]: &vk->ctx_lock
[Interrupt]: bcm_vk_hb_poll
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
[Locking Unit]: bcm_vk_release
-->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306
As suggested by Arnd, the tentative patch fix the potential deadlocks
by replacing the timer with delay workqueue. x86_64 allyesconfig using
GCC shows no new warning. Note that no runtime testing was performed
due to no device on hand.
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Tested-by: Desmond Yan <desmond.branden@broadcom.com>
Tested-by: Desmond Yan <desmond.yan@broadcom.com>
Link: https://lore.kernel.org/r/20230629182941.13045-1-dg573847474@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # cxl
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230718143102.1065481-1-robh@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() is renamed to .remove().
There are two calls that can go wrong in tps6594_esm_remove(); for both
there is already an error message. Not returning the error code has the
only side effect of suppressing (another) error message by the core
about the error being ignored. So tps6594_esm_remove() can be converted
to return void without any loss.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230710082311.3474785-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() is renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230707073343.3396477-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A previous commit tried to come up with more generic subpool
names, but this isn't quite working: the node name was used
elsewhere to match pools to consumers which regressed the
nVidia Tegra 2/3 video decoder.
Revert back to an earlier approach using of_node_full_name()
instead of just the name to make sure the pool name is more
unique, and change both sites using this in the kernel.
It is not perfect since two SRAM nodes could have the same
subpool name but it makes the situation better than before.
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Fixes: 21e5a2d10c ("misc: sram: Generate unique names for subpools")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20230622074520.3058027-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
POSIX says:
"Upon successful completion, where nbyte is greater than 0, write()
shall mark for update the last data modification and last file status
change timestamps of the file..."
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230705190309.579783-1-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>