Commit Graph

6576 Commits

Author SHA1 Message Date
Linus Torvalds
8297b790c6 Merge tag 'pull-securityfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull securityfs updates from Al Viro:
 "Securityfs cleanups and fixes:

   - one extra reference is enough to pin a dentry down; no need for
     two. Switch to regular scheme, similar to shmem, debugfs, etc. This
     fixes a securityfs_recursive_remove() dentry leak, among other
     things.

   - we need to have the filesystem pinned to prevent the contents
     disappearing; what we do not need is pinning it for each file.
     Doing that only for files and directories in the root is enough.

   - the previous two changes allow us to get rid of the racy kludges in
     efi_secret_unlink(), where we can use simple_unlink() instead of
     securityfs_remove(). Which does not require unlocking and relocking
     the parent, with all deadlocks that invites.

   - Make securityfs_remove() take the entire subtree out, turning
     securityfs_recursive_remove() into its alias. Makes a lot more
     sense for callers and fixes a mount leak, while we are at it.

   - Making securityfs_remove() remove the entire subtree allows for
     much simpler life in most of the users - efi_secret, ima_fs, evm,
     ipe, tmp get cleaner. I hadn't touched apparmor use of securityfs,
     but I suspect that it would be useful there as well"

* tag 'pull-securityfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  tpm: don't bother with removal of files in directory we'll be removing
  ipe: don't bother with removal of files in directory we'll be removing
  evm_secfs: clear securityfs interactions
  ima_fs: get rid of lookup-by-dentry stuff
  ima_fs: don't bother with removal of files in directory we'll be removing
  efi_secret: clean securityfs use up
  make securityfs_remove() remove the entire subtree
  fix locking in efi_secret_unlink()
  securityfs: pin filesystem only for objects directly in root
  securityfs: don't pin dentries twice, once is enough...
2025-07-28 10:07:54 -07:00
Kees Cook
a8f0b1f8ef kstack_erase: Support Clang stack depth tracking
Wire up CONFIG_KSTACK_ERASE to Clang 21's new stack depth tracking
callback[1] option.

Link: https://clang.llvm.org/docs/SanitizerCoverage.html#tracing-stack-depth [1]
Acked-by: Nicolas Schier <n.schier@avm.de>
Link: https://lore.kernel.org/r/20250724055029.3623499-4-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-26 14:28:35 -07:00
Kees Cook
9ea1e8d28a stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
The Clang stack depth tracking implementation has a fixed name for
the stack depth tracking callback, "__sanitizer_cov_stack_depth", so
rename the GCC plugin function to match since the plugin has no external
dependencies on naming.

Link: https://lore.kernel.org/r/20250717232519.2984886-2-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-21 21:40:39 -07:00
Kees Cook
57fbad15c2 stackleak: Rename STACKLEAK to KSTACK_ERASE
In preparation for adding Clang sanitizer coverage stack depth tracking
that can support stack depth callbacks:

- Add the new top-level CONFIG_KSTACK_ERASE option which will be
  implemented either with the stackleak GCC plugin, or with the Clang
  stack depth callback support.
- Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE,
  but keep it for anything specific to the GCC plugin itself.
- Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named
  for what it does rather than what it protects against), but leave as
  many of the internals alone as possible to avoid even more churn.

While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS,
since that's the only place it is referenced from.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-21 21:35:01 -07:00
John Johansen
4d9d1a08b7 apparmor: fix: accept2 being specifie even when permission table is presnt
The transition to the perms32 permission table dropped the need for
the accept2 table as permissions. However accept2 can be used for
flags and may be present even when the perms32 table is present. So
instead of checking on version, check whether the table is present.

Fixes: 2e12c5f060 ("apparmor: add additional flags to extended permission.")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:31:13 -07:00
John Johansen
9afdc6abb0 apparmor: transition from a list of rules to a vector of rules
The set of rules on a profile is not dynamically extended, instead
if a new ruleset is needed a new version of the profile is created.
This allows us to use a vector of rules instead of a list, slightly
reducing memory usage and simplifying the code.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:31:06 -07:00
Peng Jiang
f9c9dce01e apparmor: fix documentation mismatches in val_mask_to_str and socket functions
This patch fixes kernel-doc warnings:
1. val_mask_to_str:
- Added missing descriptions for `size` and `table` parameters.
- Removed outdated str_size and chrs references.
2. Socket Functions:
- Makes non-null requirements clear for socket/address args.
- Standardizes return values per kernel conventions.
- Adds Unix domain socket protocol details.

These changes silence doc validation warnings and improve accuracy for
AppArmor LSM docs.

Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:19:28 -07:00
Ryan Lee
4ce7d3cf5a apparmor: remove redundant perms.allow MAY_EXEC bitflag set
This section of profile_transition that occurs after x_to_label only
happens if perms.allow already has the MAY_EXEC bit set, so we don't need
to set it again.

Fixes: 16916b17b4 ("apparmor: force auditing of conflicting attachment execs from confined")
Signed-off-by: Ryan Lee <ryan.lee@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:19:28 -07:00
John Johansen
da0edababa apparmor: fix kernel doc warnings for kernel test robot
Fix kernel doc warnings for the functions
- apparmor_socket_bind
- apparmor_unix_may_send
- apparmor_unix_stream_connect
- val_mask_to_str

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506070127.B1bc3da4-lkp@intel.com/
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:19:27 -07:00
Helge Deller
c68804199d apparmor: Fix unaligned memory accesses in KUnit test
The testcase triggers some unnecessary unaligned memory accesses on the
parisc architecture:
  Kernel: unaligned access to 0x12f28e27 in policy_unpack_test_init+0x180/0x374 (iir 0x0cdc1280)
  Kernel: unaligned access to 0x12f28e67 in policy_unpack_test_init+0x270/0x374 (iir 0x64dc00ce)

Use the existing helper functions put_unaligned_le32() and
put_unaligned_le16() to avoid such warnings on architectures which
prefer aligned memory accesses.

Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 98c0cc48e2 ("apparmor: fix policy_unpack_test on big endian systems")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:19:27 -07:00
Helge Deller
c567de2c4f apparmor: Fix 8-byte alignment for initial dfa blob streams
The dfa blob stream for the aa_dfa_unpack() function is expected to be aligned
on a 8 byte boundary.

The static nulldfa_src[] and stacksplitdfa_src[] arrays store the initial
apparmor dfa blob streams, but since they are declared as an array-of-chars
the compiler and linker will only ensure a "char" (1-byte) alignment.

Add an __aligned(8) annotation to the arrays to tell the linker to always
align them on a 8-byte boundary. This avoids runtime warnings at startup on
alignment-sensitive platforms like parisc such as:

 Kernel: unaligned access to 0x7f2a584a in aa_dfa_unpack+0x124/0x788 (iir 0xca0109f)
 Kernel: unaligned access to 0x7f2a584e in aa_dfa_unpack+0x210/0x788 (iir 0xca8109c)
 Kernel: unaligned access to 0x7f2a586a in aa_dfa_unpack+0x278/0x788 (iir 0xcb01090)

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Fixes: 98b824ff89 ("apparmor: refcount the pdb")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:19:27 -07:00
Gabriel Totev
3fa0af4cc8 apparmor: shift uid when mediating af_unix in userns
Avoid unshifted ouids for socket file operations as observed when using
AppArmor profiles in unprivileged containers with LXD or Incus.

For example, root inside container and uid 1000000 outside, with
`owner /root/sock rw,` profile entry for nc:

/root$ nc -lkU sock & nc -U sock
==> dmesg
apparmor="DENIED" operation="connect" class="file"
namespace="root//lxd-podia_<var-snap-lxd-common-lxd>" profile="sockit"
name="/root/sock" pid=3924 comm="nc" requested_mask="wr" denied_mask="wr"
fsuid=1000000 ouid=0 [<== should be 1000000]

Fix by performing uid mapping as per common_perm_cond() in lsm.c

Signed-off-by: Gabriel Totev <gabriel.totev@zetier.com>
Fixes: c05e705812 ("apparmor: add fine grained af_unix mediation")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:19:27 -07:00
Gabriel Totev
c5bf96d20f apparmor: shift ouid when mediating hard links in userns
When using AppArmor profiles inside an unprivileged container,
the link operation observes an unshifted ouid.
(tested with LXD and Incus)

For example, root inside container and uid 1000000 outside, with
`owner /root/link l,` profile entry for ln:

/root$ touch chain && ln chain link
==> dmesg
apparmor="DENIED" operation="link" class="file"
namespace="root//lxd-feet_<var-snap-lxd-common-lxd>" profile="linkit"
name="/root/link" pid=1655 comm="ln" requested_mask="l" denied_mask="l"
fsuid=1000000 ouid=0 [<== should be 1000000] target="/root/chain"

Fix by mapping inode uid of old_dentry in aa_path_link() rather than
using it directly, similarly to how it's mapped in __file_path_perm()
later in the file.

Signed-off-by: Gabriel Totev <gabriel.totev@zetier.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:19:27 -07:00
John Johansen
88fec3526e apparmor: make sure unix socket labeling is correctly updated.
When a unix socket is passed into a different confinement domain make
sure its cached mediation labeling is updated to correctly reflect
which domains are using the socket.

Fixes: c05e705812 ("apparmor: add fine grained af_unix mediation")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20 02:19:27 -07:00
Mickaël Salaün
6803b6ebb8 landlock: Fix cosmetic change
This line removal should not be there and it makes it more difficult to
backport the following patch.

Cc: Günther Noack <gnoack@google.com>
Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Fixes: 7a11275c37 ("landlock: Refactor layer helpers")
Link: https://lore.kernel.org/r/20250719104204.545188-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-07-19 12:44:16 +02:00
John Johansen
6456ccbd2f apparmor: fix regression in fs based unix sockets when using old abi
Policy loaded using abi 7 socket mediation was not being applied
correctly in all cases. In some cases with fs based unix sockets a
subset of permissions where allowed when they should have been denied.

This was happening because the check for if the socket was an fs based
unix socket came before the abi check. But the abi check is where the
correct path is selected, so having the fs unix socket check occur
early would cause the wrong code path to be used.

Fix this by pushing the fs unix to be done after the abi check.

Fixes: dcd7a55941 ("apparmor: gate make fine grained unix mediation behind v9 abi")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-15 22:39:43 -07:00
John Johansen
50d56a1a36 apparmor: fix AA_DEBUG_LABEL()
AA_DEBUG_LABEL() was not specifying it vargs, which is needed so it can
output debug parameters.

Fixes: 71e6cff3e0 ("apparmor: Improve debug print infrastructure")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-15 22:39:43 -07:00
John Johansen
a30a9fdb66 apparmor: fix af_unix auditing to include all address information
The auditing of addresses currently doesn't include the source address
and mixes source and foreign/peer under the same audit name. Fix this
so source is always addr, and the foreign/peer is peer_addr.

Fixes: c05e705812 ("apparmor: add fine grained af_unix mediation")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-15 22:39:43 -07:00
John Johansen
bc6e5f6933 apparmor: Remove use of the double lock
The use of the double lock is not necessary and problematic. Instead
pull the bits that need locks into their own sections and grab the
needed references.

Fixes: c05e705812 ("apparmor: add fine grained af_unix mediation")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-15 22:39:43 -07:00
John Johansen
6afb0a7bc9 apparmor: update kernel doc comments for xxx_label_crit_section
Add a kernel doc header for __end_current_label_crit_section(), and
update the header for __begin_current_label_crit_section().

Fixes: b42ecc5f58ef ("apparmor: make __begin_current_label_crit_section() indicate whether put is needed")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-15 22:39:43 -07:00
Mateusz Guzik
87cc7b0011 apparmor: make __begin_current_label_crit_section() indicate whether put is needed
Same as aa_get_newest_cred_label_condref().

This avoids a bunch of work overall and allows the compiler to note when no
clean up is necessary, allowing for tail calls.

This in particular happens in apparmor_file_permission(), which manages to
tail call aa_file_perm() 105 bytes in (vs a regular call 112 bytes in
followed by branches to figure out if clean up is needed).

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-15 22:39:43 -07:00
John Johansen
37a3741d27 Revert "apparmor: use SHA-256 library API instead of crypto_shash API"
This reverts commit e9ed1eb8f6.

Eric has requested that this patch be taken through the libcrypto-next
tree, instead.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-15 22:39:22 -07:00
John Johansen
aff426f359 apparmor: mitigate parser generating large xtables
Some versions of the parser are generating an xtable transition per
state in the state machine, even when the state machine isn't using
the transition table.

The parser bug is triggered by
commit 2e12c5f060 ("apparmor: add additional flags to extended permission.")

In addition to fixing this in userspace, mitigate this in the kernel
as part of the policy verification checks by detecting this situation
and adjusting to what is actually used, or if not used at all freeing
it, so we are not wasting unneeded memory on policy.

Fixes: 2e12c5f060 ("apparmor: add additional flags to extended permission.")
Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-15 22:39:07 -07:00
Eric Biggers
f93c27092a apparmor: use SHA-256 library API instead of crypto_shash API
This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value.  Just use the SHA-256
library API instead, which is much simpler and easier to use.

Acked-by: John Johansen <john.johansen@canonical.com>
Link: https://lore.kernel.org/r/20250630174805.59010-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 11:29:31 -07:00
Srish Srinivasan
bde5b1a155 integrity/platform_certs: Allow loading of keys in the static key management mode
On PLPKS enabled PowerVM LPAR, there is no provision to load signed
third-party kernel modules when the key management mode is static. This
is because keys from secure boot secvars are only loaded when the key
management mode is dynamic.

Allow loading of the trustedcadb and moduledb keys even in the static
key management mode, where the secvar format string takes the form
"ibm,plpks-sb-v0".

Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: R Nageswara Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250610211907.101384-4-ssrish@linux.ibm.com
2025-07-09 09:16:18 +05:30
Christian Brauner
ca115d7e75 tree-wide: s/struct fileattr/struct file_kattr/g
Now that we expose struct file_attr as our uapi struct rename all the
internal struct to struct file_kattr to clearly communicate that it is a
kernel internal struct. This is similar to struct mount_{k}attr and
others.

Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04 16:14:39 +02:00
Andrey Albershteyn
bd14e462bb selinux: implement inode_file_[g|s]etattr hooks
These hooks are called on inode extended attribute retrieval/change.

Cc: selinux@vger.kernel.org
Cc: Paul Moore <paul@paul-moore.com>

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-3-c4e3bc35227b@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-01 22:44:29 +02:00
Andrey Albershteyn
defdd02d78 lsm: introduce new hooks for setting/getting inode fsxattr
Introduce new hooks for setting and getting filesystem extended
attributes on inode (FS_IOC_FSGETXATTR).

Cc: selinux@vger.kernel.org
Cc: Paul Moore <paul@paul-moore.com>

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-2-c4e3bc35227b@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-01 22:44:29 +02:00
Tingmao Wang
e0a69cf2c0 landlock: Fix warning from KUnit tests
get_id_range() expects a positive value as first argument but
get_random_u8() can return 0.  Fix this by clamping it.

Validated by running the test in a for loop for 1000 times.

Note that MAX() is wrong as it is only supposed to be used for
constants, but max() is good here.

  [..]     ok 9 test_range2_rand1
  [..]     ok 10 test_range2_rand2
  [..]     ok 11 test_range2_rand15
  [..] ------------[ cut here ]------------
  [..] WARNING: CPU: 6 PID: 104 at security/landlock/id.c:99 test_range2_rand16 (security/landlock/id.c:99 (discriminator 1) security/landlock/id.c:234 (discriminator 1))
  [..] Modules linked in:
  [..] CPU: 6 UID: 0 PID: 104 Comm: kunit_try_catch Tainted: G                 N  6.16.0-rc1-dev-00001-g314a2f98b65f #1 PREEMPT(undef)
  [..] Tainted: [N]=TEST
  [..] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  [..] RIP: 0010:test_range2_rand16 (security/landlock/id.c:99 (discriminator 1) security/landlock/id.c:234 (discriminator 1))
  [..] Code: 49 c7 c0 10 70 30 82 4c 89 ff 48 c7 c6 a0 63 1e 83 49 c7 45 a0 e0 63 1e 83 e8 3f 95 17 00 e9 1f ff ff ff 0f 0b e9 df fd ff ff <0f> 0b ba 01 00 00 00 e9 68 fe ff ff 49 89 45 a8 49 8d 4d a0 45 31

  [..] RSP: 0000:ffff888104eb7c78 EFLAGS: 00010246
  [..] RAX: 0000000000000000 RBX: 000000000870822c RCX: 0000000000000000
            ^^^^^^^^^^^^^^^^
  [..]
  [..] Call Trace:
  [..]
  [..] ---[ end trace 0000000000000000 ]---
  [..]     ok 12 test_range2_rand16
  [..] # landlock_id: pass:12 fail:0 skip:0 total:12
  [..] # Totals: pass:12 fail:0 skip:0 total:12
  [..] ok 1 landlock_id

Fixes: d9d2a68ed4 ("landlock: Add unique ID generator")
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/73e28efc5b8cc394608b99d5bc2596ca917d7c4a.1750003733.git.m@maowtm.org
[mic: Minor cosmetic improvements]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-06-27 10:10:37 +02:00
Al Viro
ee79ba39b3 selinux: don't bother with selinuxfs_info_free() on failures
Failures in sel_fill_super() will be followed by sel_kill_sb(), which
will call selinuxfs_info_free() anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christian Brauner <brauner@kernel.org>
[PM: subj and description tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-24 19:39:28 -04:00
Eric W. Biederman
337490f000 exec: Correct the permission check for unsafe exec
Max Kellerman recently experienced a problem[1] when calling exec with
differing uid and euid's and he triggered the logic that is supposed
to only handle setuid executables.

When exec isn't changing anything in struct cred it doesn't make sense
to go into the code that is there to handle the case when the
credentials change.

When looking into the history of the code I discovered that this issue
was not present in Linux-2.4.0-test12 and was introduced in
Linux-2.4.0-prerelease when the logic for handling this case was moved
from prepare_binprm to compute_creds in fs/exec.c.

The bug introdused was to comparing euid in the new credentials with
uid instead of euid in the old credentials, when testing if setuid
had changed the euid.

Since triggering the keep ptrace limping along case for setuid
executables makes no sense when it was not a setuid exec revert back
to the logic present in Linux-2.4.0-test12.

This removes the confusingly named and subtlety incorrect helpers
is_setuid and is_setgid, that helped this bug to persist.

The varaiable is_setid is renamed to id_changed (it's Linux-2.4.0-test12)
as the old name describes what matters rather than it's cause.

The code removed in Linux-2.4.0-prerelease was:
-       /* Set-uid? */
-       if (mode & S_ISUID) {
-               bprm->e_uid = inode->i_uid;
-               if (bprm->e_uid != current->euid)
-                       id_change = 1;
-       }
-
-       /* Set-gid? */
-       /*
-        * If setgid is set but no group execute bit then this
-        * is a candidate for mandatory locking, not a setgid
-        * executable.
-        */
-       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-               bprm->e_gid = inode->i_gid;
-               if (!in_group_p(bprm->e_gid))
-                       id_change = 1;

Linux-2.4.0-prerelease added the current logic as:
+       if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
+           !cap_issubset(new_permitted, current->cap_permitted)) {
+                current->dumpable = 0;
+
+               lock_kernel();
+               if (must_not_trace_exec(current)
+                   || atomic_read(&current->fs->count) > 1
+                   || atomic_read(&current->files->count) > 1
+                   || atomic_read(&current->sig->count) > 1) {
+                       if(!capable(CAP_SETUID)) {
+                               bprm->e_uid = current->uid;
+                               bprm->e_gid = current->gid;
+                       }
+                       if(!capable(CAP_SETPCAP)) {
+                               new_permitted = cap_intersect(new_permitted,
+                                                       current->cap_permitted);
+                       }
+               }
+               do_unlock = 1;
+       }

I have condenced the logic from Linux-2.4.0-test12 to just:
	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);

This change is userspace visible, but I don't expect anyone to care.

For the bug that is being fixed to trigger bprm->unsafe has to be set.
The variable bprm->unsafe is set when ptracing an executable, when
sharing a working directory, or when no_new_privs is set.  Properly
testing for cases that are safe even in those conditions and doing
nothing special should not affect anyone.  Especially if they were
previously ok with their credentials getting munged

To minimize behavioural changes the code continues to set secureexec
when euid != uid or when egid != gid.

[1] https://lkml.kernel.org/r/20250306082615.174777-1-max.kellermann@ionos.com

Reported-by: Max Kellermann <max.kellermann@ionos.com>
Fixes: 64444d3d0d7f ("Linux version 2.4.0-prerelease")
v1: https://lkml.kernel.org/r/878qmxsuy8.fsf@email.froward.int.ebiederm.org
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <kees@kernel.org>
2025-06-23 10:38:39 -05:00
Paul Moore
9ab71d9204 selinux: add __GFP_NOWARN to hashtab_init() allocations
As reported by syzbot, hashtab_init() can be affected by abnormally
large policy loads which would cause the kernel's allocator to emit
a warning in some configurations.  Since the SELinux hashtab_init()
code handles the case where the allocation fails, due to a large
request or some other reason, we can safely add the __GFP_NOWARN flag
to squelch these abnormally large allocation warnings.

Reported-by: syzbot+bc2c99c2929c3d219fb3@syzkaller.appspotmail.com
Tested-by: syzbot+bc2c99c2929c3d219fb3@syzkaller.appspotmail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 17:24:57 -04:00
Stephen Smalley
951b2de06a selinux: optimize selinux_inode_getattr/permission() based on neveraudit|permissive
Extend the task avdcache to also cache whether the task SID is both
permissive and neveraudit, and return immediately if so in both
selinux_inode_getattr() and selinux_inode_permission().

The same approach could be applied to many of the hook functions
although the avdcache would need to be updated for more than directory
search checks in order for this optimization to be beneficial for checks
on objects other than directories.

To test, apply https://github.com/SELinuxProject/selinux/pull/473 to
your selinux userspace, build and install libsepol, and use the following
CIL policy module:
$ cat neverauditpermissive.cil
(typeneveraudit unconfined_t)
(typepermissive unconfined_t)

Without this module inserted, running the following commands:
   perf record make -jN # on an already built allmodconfig tree
   perf report --sort=symbol,dso
yields the following percentages (only showing __d_lookup_rcu for
reference and only showing relevant SELinux functions):
   1.65%  [k] __d_lookup_rcu
   0.53%  [k] selinux_inode_permission
   0.40%  [k] selinux_inode_getattr
   0.15%  [k] avc_lookup
   0.05%  [k] avc_has_perm
   0.05%  [k] avc_has_perm_noaudit
   0.02%  [k] avc_policy_seqno
   0.02%  [k] selinux_file_permission
   0.01%  [k] selinux_inode_alloc_security
   0.01%  [k] selinux_file_alloc_security
for a total of 1.24% for SELinux compared to 1.65% for
__d_lookup_rcu().

After running the following command to insert this module:
   semodule -i neverauditpermissive.cil
and then re-running the same perf commands from above yields
the following non-zero percentages:
   1.74%  [k] __d_lookup_rcu
   0.31%  [k] selinux_inode_permission
   0.03%  [k] selinux_inode_getattr
   0.03%  [k] avc_policy_seqno
   0.01%  [k] avc_lookup
   0.01%  [k] selinux_file_permission
   0.01%  [k] selinux_file_open
for a total of 0.40% for SELinux compared to 1.74% for
__d_lookup_rcu().

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 17:23:05 -04:00
Stephen Smalley
1106896146 selinux: introduce neveraudit types
Introduce neveraudit types i.e. types that should never trigger
audit messages. This allows the AVC to skip all audit-related
processing for such types. Note that neveraudit differs from
dontaudit not only wrt being applied for all checks with a given
source type but also in that it disables all auditing, not just
permission denials.

When a type is both a permissive type and a neveraudit type,
the security server can short-circuit the security_compute_av()
logic, allowing all permissions and not auditing any permissions.

This change just introduces the basic support but does not yet
further optimize the AVC or hook function logic when a type
is both a permissive type and a dontaudit type.

Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 17:23:04 -04:00
Stephen Smalley
fde46f60f6 selinux: change security_compute_sid to return the ssid or tsid on match
If the end result of a security_compute_sid() computation matches the
ssid or tsid, return that SID rather than looking it up again. This
avoids the problem of multiple initial SIDs that map to the same
context.

Cc: stable@vger.kernel.org
Reported-by: Guido Trentalancia <guido@trentalancia.com>
Fixes: ae254858ce ("selinux: introduce an initial SID for early boot processes")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Guido Trentalancia <guido@trentalancia.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 16:13:16 -04:00
Al Viro
5be998a218 ipe: don't bother with removal of files in directory we'll be removing
... and use securityfs_remove() instead of securityfs_recursive_remove()

Acked-by: Fan Wu <wufan@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-17 18:10:53 -04:00
Al Viro
e25fc5540c evm_secfs: clear securityfs interactions
1) creation never returns NULL; error is reported as ERR_PTR()
2) no need to remove file before removing its parent

Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-17 18:10:30 -04:00
Al Viro
d15ffbbf4d ima_fs: get rid of lookup-by-dentry stuff
lookup_template_data_hash_algo() machinery is used to locate the
matching ima_algo_array[] element at read time; securityfs
allows to stash that into inode->i_private at object creation
time, so there's no need to bother

Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-17 18:10:14 -04:00
Al Viro
22260a99d7 ima_fs: don't bother with removal of files in directory we'll be removing
removal of parent takes all children out

Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-17 18:09:52 -04:00
Al Viro
273a291dd7 apparmor: file never has NULL f_path.mnt
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-17 18:04:36 -04:00
Al Viro
d1832e648d landlock: opened file never has a negative dentry
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-17 18:03:57 -04:00
Stephen Smalley
86c8db86af selinux: fix selinux_xfrm_alloc_user() to set correct ctx_len
We should count the terminating NUL byte as part of the ctx_len.
Otherwise, UBSAN logs a warning:
  UBSAN: array-index-out-of-bounds in security/selinux/xfrm.c:99:14
  index 60 is out of range for type 'char [*]'

The allocation itself is correct so there is no actual out of bounds
indexing, just a warning.

Cc: stable@vger.kernel.org
Suggested-by: Christian Göttsche <cgzones@googlemail.com>
Link: https://lore.kernel.org/selinux/CAEjxPJ6tA5+LxsGfOJokzdPeRomBHjKLBVR6zbrg+_w3ZZbM3A@mail.gmail.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-16 19:02:22 -04:00
Paul Moore
8a71d8fa55 selinux: add a 5 second sleep to /sys/fs/selinux/user
Commit d7b6918e22 ("selinux: Deprecate /sys/fs/selinux/user") started
the deprecation process for /sys/fs/selinux/user:

  The selinuxfs "user" node allows userspace to request a list
  of security contexts that can be reached for a given SELinux
  user from a given starting context. This was used by libselinux
  when various login-style programs requested contexts for
  users, but libselinux stopped using it in 2020.
  Kernel support will be removed no sooner than Dec 2025.

A pr_warn() message has been in place since Linux v6.13, this patch
adds a five second sleep to /sys/fs/selinux/user to help make the
deprecation and upcoming removal more noticeable.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-16 18:44:03 -04:00
Kalevi Kolttonen
9fc86a85f3 lsm: trivial comment fix
Fix a typo in the security_inode_mkdir() comment block.

Signed-off-by: Kalevi Kolttonen <kalevi@kolttonen.fi>
[PM: subject tweak, add description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-16 18:43:13 -04:00
Baoquan He
aa9bb1b325 ima: add a knob ima= to allow disabling IMA in kdump kernel
Kdump kernel doesn't need IMA functionality, and enabling IMA will cost
extra memory. It would be very helpful to allow IMA to be disabled for
kdump kernel.

Hence add a knob ima=on|off here to allow turning IMA off in kdump
kernel if needed.

Note that this IMA disabling is limited to kdump kernel, please don't
abuse it in other kernel and thus serious consequences are caused.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-06-16 09:15:13 -04:00
Al Viro
29d673b150 make securityfs_remove() remove the entire subtree
... and fix the mount leak when anything's mounted there.
securityfs_recursive_remove becomes an alias for securityfs_remove -
we'll probably need to remove it in a cycle or two.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11 18:19:46 -04:00
Al Viro
e4de726502 securityfs: pin filesystem only for objects directly in root
Nothing on securityfs ever changes parents, so we don't need
to pin the internal mount if it's already pinned for parent.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11 18:00:00 -04:00
Al Viro
27cd1bf124 securityfs: don't pin dentries twice, once is enough...
incidentally, securityfs_recursive_remove() is broken without that -
it leaks dentries, since simple_recursive_removal() does not expect
anything of that sort.  It could be worked around by dput() in
remove_one() callback, but it's easier to just drop that double-get
stuff.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11 15:07:57 -04:00
Herbert Xu
488ef35601 KEYS: Invert FINAL_PUT bit
Invert the FINAL_PUT bit so that test_bit_acquire and clear_bit_unlock
can be used instead of smp_mb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-06-11 11:57:14 -07:00
Linus Torvalds
dee264c16a Merge tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull compiler version requirement update from Arnd Bergmann:
 "Require gcc-8 and binutils-2.30

  x86 already uses gcc-8 as the minimum version, this changes all other
  architectures to the same version. gcc-8 is used is Debian 10 and Red
  Hat Enterprise Linux 8, both of which are still supported, and
  binutils 2.30 is the oldest corresponding version on those.

  Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as
  the system compiler but additionally include toolchains that remain
  supported.

  With the new minimum toolchain versions, a number of workarounds for
  older versions can be dropped, in particular on x86_64 and arm64.
  Importantly, the updated compiler version allows removing two of the
  five remaining gcc plugins, as support for sancov and structeak
  features is already included in modern compiler versions.

  I tried collecting the known changes that are possible based on the
  new toolchain version, but expect that more cleanups will be possible.

  Since this touches multiple architectures, I merged the patches
  through the asm-generic tree."

* tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV
  Documentation: update binutils-2.30 version reference
  gcc-plugins: remove SANCOV gcc plugin
  Kbuild: remove structleak gcc plugin
  arm64: drop binutils version checks
  raid6: skip avx512 checks
  kbuild: require gcc-8 and binutils-2.30
2025-05-31 08:16:52 -07:00