Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Replace busylock at the Tx queuing layer with a lockless list.
Resulting in a 300% (4x) improvement on heavy TX workloads, sending
twice the number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue.
Normally we perform queue re-selection when flow comes out of idle,
but under extreme circumstances the flows may be constantly busy.
Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already
did for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is
sadly quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for
packets.
- Make MPTCP use Rx backlog processing. This lowers the lock
pressure, improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor
fit for modern container workloads, where limits are imposed using
cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of
RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
unnecessarily aggressive rcvbuf growth and overshot when the
connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC
5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API:
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
defined by the OPEN Alliance's "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in
zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads
IPsec and performs RSS.
- Add info to devlink params whether the current setting is the
default or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame
duplication for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers:
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback for
reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which
supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as
other drivers) to avoid wasting resources if feature is
unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor
format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211
debugfs interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Add support for 'syn'.
Syn is a parsing library for parsing a stream of Rust tokens into a
syntax tree of Rust source code.
Currently this library is geared toward use in Rust procedural
macros, but contains some APIs that may be useful more generally.
'syn' allows us to greatly simplify writing complex macros such as
'pin-init' (Benno has already prepared the 'syn'-based version). We
will use it in the 'macros' crate too.
'syn' is the most downloaded Rust crate (according to crates.io),
and it is also used by the Rust compiler itself. While the amount
of code is substantial, there should not be many updates needed for
these crates, and even if there are, they should not be too big,
e.g. +7k -3k lines across the 3 crates in the last year.
'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
I only modified their code to remove a third dependency
('unicode-ident') and to add the SPDX identifiers. The code can be
easily verified to exactly match upstream with the provided
scripts.
They are all licensed under "Apache-2.0 OR MIT", like the other
vendored 'alloc' crate we had for a while.
Please see the merge commit with the cover letter for more context.
- Allow 'unreachable_pub' and 'clippy::disallowed_names' for
doctests.
Examples (i.e. doctests) may want to do things like show public
items and use names such as 'foo'.
Nevertheless, we still try to keep examples as close to real code
as possible (this is part of why running Clippy on doctests is
important for us, e.g. for safety comments, which userspace Rust
does not support yet but we are stricter).
'kernel' crate:
- Replace our custom 'CStr' type with 'core::ffi::CStr'.
Using the standard library type reduces our custom code footprint,
and we retain needed custom functionality through an extension
trait and a new 'fmt!' macro which replaces the previous 'core'
import.
This started in 6.17 and continued in 6.18, and we finally land the
replacement now. This required quite some stamina from Tamir, who
split the changes in steps to prepare for the flag day change here.
- Replace 'kernel::c_str!' with C string literals.
C string literals were added in Rust 1.77, which produce '&CStr's
(the 'core' one), so now we can write:
c"hi"
instead of:
c_str!("hi")
- Add 'num' module for numerical features.
It includes the 'Integer' trait, implemented for all primitive
integer types.
It also includes the 'Bounded' integer wrapping type: an integer
value that requires only the 'N' least significant bits of the
wrapped type to be encoded:
// An unsigned 8-bit integer, of which only the 4 LSBs are used.
let v = Bounded::<u8, 4>:🆕:<15>();
assert_eq!(v.get(), 15);
'Bounded' is useful to e.g. enforce guarantees when working with
bitfields that have an arbitrary number of bits.
Values can also be constructed from simple non-constant expressions
or, for more complex ones, validated at runtime.
'Bounded' also comes with comparison and arithmetic operations
(with both their backing type and other 'Bounded's with a
compatible backing type), casts to change the backing type,
extending/shrinking and infallible/fallible conversions from/to
primitives as applicable.
- 'rbtree' module: add immutable cursor ('Cursor').
It enables to use just an immutable tree reference where
appropriate. The existing fully-featured mutable cursor is renamed
to 'CursorMut'.
kallsyms:
- Fix wrong "big" kernel symbol type read from procfs.
'pin-init' crate:
- A couple minor fixes (Benno asked me to pick these patches up for
him this cycle).
Documentation:
- Quick Start guide: add Debian 13 (Trixie).
Debian Stable is now able to build Linux, since Debian 13 (released
2025-08-09) packages Rust 1.85.0, which is recent enough.
We are planning to propose that the minimum supported Rust version
in Linux follows Debian Stable releases, with Debian 13 being the
first one we upgrade to, i.e. Rust 1.85.
MAINTAINERS:
- Add entry for the new 'num' module.
- Remove Alex as Rust maintainer: he hasn't had the time to
contribute for a few years now, so it is a no-op change in
practice.
And a few other cleanups and improvements"
* tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (53 commits)
rust: macros: support `proc-macro2`, `quote` and `syn`
rust: syn: enable support in kbuild
rust: syn: add `README.md`
rust: syn: remove `unicode-ident` dependency
rust: syn: add SPDX License Identifiers
rust: syn: import crate
rust: quote: enable support in kbuild
rust: quote: add `README.md`
rust: quote: add SPDX License Identifiers
rust: quote: import crate
rust: proc-macro2: enable support in kbuild
rust: proc-macro2: add `README.md`
rust: proc-macro2: remove `unicode_ident` dependency
rust: proc-macro2: add SPDX License Identifiers
rust: proc-macro2: import crate
rust: kbuild: support using libraries in `rustc_procmacro`
rust: kbuild: support skipping flags in `rustc_test_library`
rust: kbuild: add proc macro library support
rust: kbuild: simplify `--cfg` handling
rust: kbuild: introduce `core-flags` and `core-skip_flags`
...
Sometimes we may need to iterate over, or find an element in a read
only (or read mostly) red-black tree, and in that case we don't need a
mutable reference to the tree, which we'll however have to take to be
able to use the current (mutable) cursor implementation.
This patch adds a simple immutable cursor implementation to RBTree,
which enables us to use an immutable tree reference. The existing
(fully featured) cursor implementation is renamed to CursorMut,
while retaining its functionality.
The only existing user of the [mutable] cursor for RBTrees (binder) is
updated to match the changes.
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251014123339.2492210-1-vitaly.wool@konsulko.se
[ Applied `rustfmt`. Added intra-doc link. Fixed unclosed example.
Fixed docs description. Fixed typo and other formatting nits.
- Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
There are no scenarios where a weak increment is invalid on binder_node.
The only possible case where it could be invalid is if the kernel
delivers BR_DECREFS to the process that owns the node, and then
increments the weak refcount again, effectively "reviving" a dead node.
However, that is not possible: when the BR_DECREFS command is delivered,
the kernel removes and frees the binder_node. The fact that you were
able to call binder_inc_node_nilocked() implies that the node is not yet
destroyed, which implies that BR_DECREFS has not been delivered to
userspace, so incrementing the weak refcount is valid.
Note that it's currently possible to trigger this condition if the owner
calls BINDER_THREAD_EXIT while node->has_weak_ref is true. This causes
BC_INCREFS on binder_ref instances to fail when they should not.
Cc: stable@vger.kernel.org
Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Reported-by: Yu-Ting Tseng <yutingtseng@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251015-binder-weak-inc-v1-1-7914b092c371@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`kernel::ffi::CStr` was introduced in commit d126d23801 ("rust: str:
add `CStr` type") in November 2022 as an upstreaming of earlier work
that was done in May 2021[0]. That earlier work, having predated the
inclusion of `CStr` in `core`, largely duplicated the implementation of
`std::ffi::CStr`.
`std::ffi::CStr` was moved to `core::ffi::CStr` in Rust 1.64 in
September 2022. Hence replace `kernel::str::CStr` with `core::ffi::CStr`
to reduce our custom code footprint, and retain needed custom
functionality through an extension trait.
Add `CStr` to `ffi` and the kernel prelude.
Link: faa3cbcca0 [0]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-16-9378a54385f8@gmail.com
[ Removed assert that would now depend on the Rust version. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Fix several kernel-doc warnings in `drivers/android/binder.c` caused by
undocumented struct members and function parameters.
In particular, add missing documentation for the `@thread` parameter in
binder_free_buf_locked().
Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Binder only sends out freeze notifications when ioctl_freeze() completes
and the process has become fully frozen. However, if a freeze
notification is registered during the freeze operation, then it
registers an initial state of 'frozen'. This is a problem because if
the freeze operation fails, then the listener is not told about that
state change, leading to lost updates.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When userspace issues commands to a freeze listener, it identifies it
using a cookie. Normally this cookie uniquely identifies a freeze
listener, but when userspace clears a listener with the intent of
deleting it, it's allowed to "regret" clearing it and create a new
freeze listener for the same node using the same cookie. (IMO this was
an API mistake, but userspace relies on it.)
Currently if the active freeze listener gets fully deleted while there
are still pending duplicates, then the code incorrectly deletes the
pending duplicates too. To fix this, do not delete the entry if there
are still pending duplicates.
Since the current data structure requires a main freeze listener, we
convert one pending duplicate into the primary listener in this
scenario.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Consider the following scenario:
1. A freeze notification is delivered to thread 1.
2. The process becomes frozen or unfrozen.
3. The message for step 2 is delivered to thread 2 and ignored because
there is already a pending notification from step 1.
4. Thread 1 acknowledges the notification from step 1.
In this case, step 4 should ensure that the message ignored in step 3 is
resent as it can now be delivered.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clippy reports:
error: replacing a value of type `T` with `T::default()` is better expressed using `core::mem::take`
--> drivers/android/binder/node.rs:690:32
|
690 | _unused_capacity = mem::replace(&mut inner.freeze_list, KVVec::new());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider using: `core::mem::take(&mut inner.freeze_list)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#mem_replace_with_default
= note: `-D clippy::mem-replace-with-default` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::mem_replace_with_default)]`
The suggestion seems fine, thus apply it.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're generally not proponents of rewrites (nasty uncomfortable things
that make you late for dinner!). So why rewrite Binder?
Binder has been evolving over the past 15+ years to meet the evolving
needs of Android. Its responsibilities, expectations, and complexity
have grown considerably during that time. While we expect Binder to
continue to evolve along with Android, there are a number of factors
that currently constrain our ability to develop/maintain it. Briefly
those are:
1. Complexity: Binder is at the intersection of everything in Android and
fulfills many responsibilities beyond IPC. It has become many things
to many people, and due to its many features and their interactions
with each other, its complexity is quite high. In just 6kLOC it must
deliver transactions to the right threads. It must correctly parse
and translate the contents of transactions, which can contain several
objects of different types (e.g., pointers, fds) that can interact
with each other. It controls the size of thread pools in userspace,
and ensures that transactions are assigned to threads in ways that
avoid deadlocks where the threadpool has run out of threads. It must
track refcounts of objects that are shared by several processes by
forwarding refcount changes between the processes correctly. It must
handle numerous error scenarios and it combines/nests 13 different
locks, 7 reference counters, and atomic variables. Finally, It must
do all of this as fast and efficiently as possible. Minor performance
regressions can cause a noticeably degraded user experience.
2. Things to improve: Thousand-line functions [1], error-prone error
handling [2], and confusing structure can occur as a code base grows
organically. After more than a decade of development, this codebase
could use an overhaul.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/android/binder.c?h=v6.5#n2896
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/android/binder.c?h=v6.5#n3658
3. Security critical: Binder is a critical part of Android's sandboxing
strategy. Even Android's most de-privileged sandboxes (e.g. the
Chrome renderer, or SW Codec) have direct access to Binder. More than
just about any other component, it's important that Binder provide
robust security, and itself be robust against security
vulnerabilities.
It's #1 (high complexity) that has made continuing to evolve Binder and
resolving #2 (tech debt) exceptionally difficult without causing #3
(security issues). For Binder to continue to meet Android's needs, we
need better ways to manage (and reduce!) complexity without increasing
the risk.
The biggest change is obviously the choice of programming language. We
decided to use Rust because it directly addresses a number of the
challenges within Binder that we have faced during the last years. It
prevents mistakes with ref counting, locking, bounds checking, and also
does a lot to reduce the complexity of error handling. Additionally,
we've been able to use the more expressive type system to encode the
ownership semantics of the various structs and pointers, which takes the
complexity of managing object lifetimes out of the hands of the
programmer, reducing the risk of use-after-frees and similar problems.
Rust has many different pointer types that it uses to encode ownership
semantics into the type system, and this is probably one of the most
important aspects of how it helps in Binder. The Binder driver has a lot
of different objects that have complex ownership semantics; some
pointers own a refcount, some pointers have exclusive ownership, and
some pointers just reference the object and it is kept alive in some
other manner. With Rust, we can use a different pointer type for each
kind of pointer, which enables the compiler to enforce that the
ownership semantics are implemented correctly.
Another useful feature is Rust's error handling. Rust allows for more
simplified error handling with features such as destructors, and you get
compilation failures if errors are not properly handled. This means that
even though Rust requires you to spend more lines of code than C on
things such as writing down invariants that are left implicit in C, the
Rust driver is still slightly smaller than C binder: Rust is 5.5kLOC and
C is 5.8kLOC. (These numbers are excluding blank lines, comments,
binderfs, and any debugging facilities in C that are not yet implemented
in the Rust driver. The numbers include abstractions in rust/kernel/
that are unlikely to be used by other drivers than Binder.)
Although this rewrite completely rethinks how the code is structured and
how assumptions are enforced, we do not fundamentally change *how* the
driver does the things it does. A lot of careful thought has gone into
the existing design. The rewrite is aimed rather at improving code
health, structure, readability, robustness, security, maintainability
and extensibility. We also include more inline documentation, and
improve how assumptions in the code are enforced. Furthermore, all
unsafe code is annotated with a SAFETY comment that explains why it is
correct.
We have left the binderfs filesystem component in C. Rewriting it in
Rust would be a large amount of work and requires a lot of bindings to
the file system interfaces. Binderfs has not historically had the same
challenges with security and complexity, so rewriting binderfs seems to
have lower value than the rest of Binder.
Correctness and feature parity
------------------------------
Rust binder passes all tests that validate the correctness of Binder in
the Android Open Source Project. We can boot a device, and run a variety
of apps and functionality without issues. We have performed this both on
the Cuttlefish Android emulator device, and on a Pixel 6 Pro.
As for feature parity, Rust binder currently implements all features
that C binder supports, with the exception of some debugging facilities.
The missing debugging facilities will be added before we submit the Rust
implementation upstream.
Tracepoints
-----------
I did not include all of the tracepoints as I felt that the mechansim
for making C access fields of Rust structs should be discussed on list
separately. I also did not include the support for building Rust Binder
as a module since that requires exporting a bunch of additional symbols
on the C side.
Original RFC Link with old benchmark numbers:
https://lore.kernel.org/r/20231101-rust-binder-v1-0-08ba9197f637@google.com
Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Co-developed-by: Matt Gilbride <mattgilbride@google.com>
Signed-off-by: Matt Gilbride <mattgilbride@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250919-rust-binder-v2-1-a384b09f28dd@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A process might fail to allocate a new bitmap when trying to expand its
proc->dmap. In that case, dbitmap_grow() fails and frees the old bitmap
via dbitmap_free(). However, the driver calls dbitmap_free() again when
the same process terminates, leading to a double-free error:
==================================================================
BUG: KASAN: double-free in binder_proc_dec_tmpref+0x2e0/0x55c
Free of addr ffff00000b7c1420 by task kworker/9:1/209
CPU: 9 UID: 0 PID: 209 Comm: kworker/9:1 Not tainted 6.17.0-rc6-dirty #5 PREEMPT
Hardware name: linux,dummy-virt (DT)
Workqueue: events binder_deferred_func
Call trace:
kfree+0x164/0x31c
binder_proc_dec_tmpref+0x2e0/0x55c
binder_deferred_func+0xc24/0x1120
process_one_work+0x520/0xba4
[...]
Allocated by task 448:
__kmalloc_noprof+0x178/0x3c0
bitmap_zalloc+0x24/0x30
binder_open+0x14c/0xc10
[...]
Freed by task 449:
kfree+0x184/0x31c
binder_inc_ref_for_node+0xb44/0xe44
binder_transaction+0x29b4/0x7fbc
binder_thread_write+0x1708/0x442c
binder_ioctl+0x1b50/0x2900
[...]
==================================================================
Fix this issue by marking proc->map NULL in dbitmap_free().
Cc: stable@vger.kernel.org
Fixes: 15d9da3f81 ("binder: use bitmap for faster descriptor lookup")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Tiffany Yang <ynaffit@google.com>
Link: https://lore.kernel.org/r/20250915221248.3470154-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce a generic netlink multicast event to report binder transaction
failures to userspace. This allows subscribers to monitor these events
and take appropriate actions, such as stopping a misbehaving application
that is spamming a service with huge amount of transactions.
The multicast event contains full details of the failed transactions,
including the sender/target PIDs, payload size and specific error code.
This interface is defined using a YAML spec, from which the UAPI and
kernel headers and source are auto-generated.
Signed-off-by: Li Li <dualli@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-4-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace the t->need_reply flag with the more descriptive t->is_async and
and t->is_reply flags. The 'need_reply' flag was only used for debugging
purposes and the new flags can be used to distinguish between the type
of transactions too: sync, async and reply.
For now, only update the logging in print_binder_transaction_ilocked().
However, the new flags can be used in the future to replace the current
patterns and improve readability. e.g.:
- if (!reply && !(tr->flags & TF_ONE_WAY))
+ if (t->is_async)
This patch is in preparation for binder's generic netlink implementation
and no functional changes are intended.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the allocation of 'struct binder_transaction' to the beginning of
the binder_transaction() function, along with the initialization of all
the members that are known at that time. This minor refactoring helps to
consolidate the usage of transaction information at later points.
This patch is in preparation for binder's generic netlink implementation
and no functional changes are intended.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-2-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull char / misc / IIO / other driver updates from Greg KH:
"Here is the big set of char/misc/iio and other smaller driver
subsystems for 6.17-rc1. It's a big set this time around, with the
huge majority being in the iio subsystem with new drivers and dts
files being added there.
Highlights include:
- IIO driver updates, additions, and changes making more code const
and cleaning up some init logic
- bus_type constant conversion changes
- misc device test functions added
- rust miscdevice minor fixup
- unused function removals for some drivers
- mei driver updates
- mhi driver updates
- interconnect driver updates
- Android binder updates and test infrastructure added
- small cdx driver updates
- small comedi fixes
- small nvmem driver updates
- small pps driver updates
- some acrn virt driver fixes for printk messages
- other small driver updates
All of these have been in linux-next with no reported issues"
* tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
binder: Use seq_buf in binder_alloc kunit tests
binder: Add copyright notice to new kunit files
misc: ti_fpc202: Switch to of_fwnode_handle()
bus: moxtet: Use dev_fwnode()
pc104: move PC104 option to drivers/Kconfig
drivers: virt: acrn: Don't use %pK through printk
comedi: fix race between polling and detaching
interconnect: qcom: Add Milos interconnect provider driver
dt-bindings: interconnect: document the RPMh Network-On-Chip Interconnect in Qualcomm Milos SoC
mei: more prints with client prefix
mei: bus: use cldev in prints
bus: mhi: host: pci_generic: Add Telit FN990B40 modem support
bus: mhi: host: Detect events pointing to unexpected TREs
bus: mhi: host: pci_generic: Add Foxconn T99W696 modem
bus: mhi: host: Use str_true_false() helper
bus: mhi: host: pci_generic: Add support for EM929x and set MRU to 32768 for better performance.
bus: mhi: host: Fix endianness of BHI vector table
bus: mhi: host: pci_generic: Disable runtime PM for QDU100
bus: mhi: host: pci_generic: Fix the modem name of Foxconn T99W640
dt-bindings: interconnect: qcom,msm8998-bwmon: Allow 'nonposted-mmio'
...
Pull misc VFS updates from Al Viro:
"VFS-related cleanups in various places (mostly of the "that really
can't happen" or "there's a better way to do it" variety)"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
gpib: use file_inode()
binder_ioctl_write_read(): simplify control flow a bit
secretmem: move setting O_LARGEFILE and bumping users' count to the place where we create the file
apparmor: file never has NULL f_path.mnt
landlock: opened file never has a negative dentry
Each case tested by the binder allocator test is defined by 3 parameters:
the end alignment type of each requested buffer allocation, whether those
buffers share the front or back pages of the allotted address space, and
the order in which those buffers should be released. The alignment type
represents how a binder buffer may be laid out within or across page
boundaries and relative to other buffers, and it's used along with
whether the buffers cover part (sharing the front pages) of or all
(sharing the back pages) of the vma to calculate the sizes passed into
each test.
binder_alloc_test_alloc recursively generates each possible arrangement
of alignment types and then tests that the binder_alloc code tracks pages
correctly when those buffers are allocated and then freed in every
possible order at both ends of the address space. While they provide
comprehensive coverage, they are poor candidates to be represented as
KUnit test cases, which must be statically enumerated. For 5 buffers and
5 end alignment types, the test case array would have 750,000 entries.
This change structures the recursive calls into meaningful test cases so
that failures are easier to interpret.
Signed-off-by: Tiffany Yang <ynaffit@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250714185321.2417234-7-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Convert the existing binder_alloc_selftest tests into KUnit tests. These
tests allocate and free an exhaustive combination of buffers with
various sizes and alignments. This change allows them to be run without
blocking or otherwise interfering with other processes in binder.
This test is refactored into more meaningful cases in the subsequent
patch.
Signed-off-by: Tiffany Yang <ynaffit@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250714185321.2417234-6-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Store a pointer to the free pages list that the binder allocator should
use for a process inside of struct binder_alloc. This change allows
binder allocator code to be tested and debugged deterministically while
a system is using binder; i.e., without interfering with other binder
processes and independently of the shrinker. This is necessary to
convert the current binder_alloc_selftest into a kunit test that does
not rely on hijacking an existing binder_proc to run.
A binder process's binder_alloc->freelist should not be changed after
it is initialized. A sole exception is the process that runs the
existing binder_alloc selftest. Its freelist can be temporarily replaced
for the duration of the test because it runs as a single thread before
any pages can be added to the global binder freelist, and the test frees
every page it allocates before dropping the binder_selftest_lock. This
exception allows the existing selftest to be used to check for
regressions, but it will be dropped when the binder_alloc tests are
converted to kunit in a subsequent patch in this series.
Signed-off-by: Tiffany Yang <ynaffit@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250714185321.2417234-3-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
don't try to open-code simple_recursive_removal(), especially when
you miss things like d_invalidate()...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull char / misc / iio driver updates from Greg KH:
"Here is the big char/misc/iio and other small driver subsystem pull
request for 6.16-rc1.
Overall, a lot of individual changes, but nothing major, just the
normal constant forward progress of new device support and cleanups to
existing subsystems. Highlights in here are:
- Large IIO driver updates and additions and device tree changes
- Android binder bugfixes and logfile fixes
- mhi driver updates
- comedi driver updates
- counter driver updates and additions
- coresight driver updates and additions
- echo driver removal as there are no in-kernel users of it
- nvmem driver updates
- spmi driver updates
- new amd-sbi driver "subsystem" and drivers added
- rust miscdriver binding documentation fix
- other small driver fixes and updates (uio, w1, acrn, hpet,
xillybus, cardreader drivers, fastrpc and others)
All of these have been in linux-next for quite a while with no
reported problems"
* tag 'char-misc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (390 commits)
binder: fix yet another UAF in binder_devices
counter: microchip-tcb-capture: Add watch validation support
dt-bindings: iio: adc: Add ROHM BD79100G
iio: adc: add support for Nuvoton NCT7201
dt-bindings: iio: adc: add NCT7201 ADCs
iio: chemical: Add driver for SEN0322
dt-bindings: trivial-devices: Document SEN0322
iio: adc: ad7768-1: reorganize driver headers
iio: bmp280: zero-init buffer
iio: ssp_sensors: optimalize -> optimize
HID: sensor-hub: Fix typo and improve documentation
iio: admv1013: replace redundant ternary operator with just len
iio: chemical: mhz19b: Fix error code in probe()
iio: adc: at91-sama5d2: use IIO_DECLARE_BUFFER_WITH_TS
iio: accel: sca3300: use IIO_DECLARE_BUFFER_WITH_TS
iio: adc: ad7380: use IIO_DECLARE_DMA_BUFFER_WITH_TS
iio: adc: ad4695: rename AD4695_MAX_VIN_CHANNELS
iio: adc: ad4695: use IIO_DECLARE_DMA_BUFFER_WITH_TS
iio: introduce IIO_DECLARE_BUFFER_WITH_TS macros
iio: make IIO_DMA_MINALIGN minimum of 8 bytes
...
Pull vfs directory lookup updates from Christian Brauner:
"This contains cleanups for the lookup_one*() family of helpers.
We expose a set of functions with names containing "lookup_one_len"
and others without the "_len". This difference has nothing to do with
"len". It's rater a historical accident that can be confusing.
The functions without "_len" take a "mnt_idmap" pointer. This is found
in the "vfsmount" and that is an important question when choosing
which to use: do you have a vfsmount, or are you "inside" the
filesystem. A related question is "is permission checking relevant
here?".
nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
functions. They pass nop_mnt_idmap and refuse to work on filesystems
which have any other idmap.
This work changes nfsd and cachefile to use the lookup_one family of
functions and to explictily pass &nop_mnt_idmap which is consistent
with all other vfs interfaces used where &nop_mnt_idmap is explicitly
passed.
The remaining uses of the "_one" functions do not require permission
checks so these are renamed to be "_noperm" and the permission
checking is removed.
This series also changes these lookup function to take a qstr instead
of separate name and len. In many cases this simplifies the call"
* tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
VFS: change lookup_one_common and lookup_noperm_common to take a qstr
Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
VFS: rename lookup_one_len family to lookup_noperm and remove permission check
cachefiles: Use lookup_one() rather than lookup_one_len()
nfsd: Use lookup_one() rather than lookup_one_len()
VFS: improve interface for lookup_one functions
Commit e77aff5528 ("binderfs: fix use-after-free in binder_devices")
addressed a use-after-free where devices could be released without first
being removed from the binder_devices list. However, there is a similar
path in binder_free_proc() that was missed:
==================================================================
BUG: KASAN: slab-use-after-free in binder_remove_device+0xd4/0x100
Write of size 8 at addr ffff0000c773b900 by task umount/467
CPU: 12 UID: 0 PID: 467 Comm: umount Not tainted 6.15.0-rc7-00138-g57483a362741 #9 PREEMPT
Hardware name: linux,dummy-virt (DT)
Call trace:
binder_remove_device+0xd4/0x100
binderfs_evict_inode+0x230/0x2f0
evict+0x25c/0x5dc
iput+0x304/0x480
dentry_unlink_inode+0x208/0x46c
__dentry_kill+0x154/0x530
[...]
Allocated by task 463:
__kmalloc_cache_noprof+0x13c/0x324
binderfs_binder_device_create.isra.0+0x138/0xa60
binder_ctl_ioctl+0x1ac/0x230
[...]
Freed by task 215:
kfree+0x184/0x31c
binder_proc_dec_tmpref+0x33c/0x4ac
binder_deferred_func+0xc10/0x1108
process_one_work+0x520/0xba4
[...]
==================================================================
Call binder_remove_device() within binder_free_proc() to ensure the
device is removed from the binder_devices list before being kfreed.
Cc: stable@vger.kernel.org
Fixes: 12d909cac1 ("binderfs: add new binder devices to binder_devices")
Reported-by: syzbot+4af454407ec393de51d6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4af454407ec393de51d6
Tested-by: syzbot+4af454407ec393de51d6@syzkaller.appspotmail.com
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250524220758.915028-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Binder defines several seq_files that can be accessed via debugfs or
binderfs. Some of these files (e.g., 'state' and 'transactions')
contain more granular information about binder's internal state that
is helpful for debugging, but they also leak userspace address data
through user-defined 'cookie' or 'ptr' values. Consequently, access
to these files must be heavily restricted.
Add two new files, 'state_hashed' and 'transactions_hashed', that
reproduce the information in the original files but use the kernel's
raw pointer obfuscation to hash any potential user addresses. This
approach allows systems to grant broader access to the new files
without having to change the security policy around the existing ones.
In practice, userspace populates these fields with user addresses, but
within the driver, these values only serve as unique identifiers for
their associated binder objects. Consequently, binder logs can
obfuscate these values and still retain meaning. While this strategy
prevents leaking information about the userspace memory layout in the
existing log files, it also decouples log messages about binder
objects from their user-defined identifiers.
Acked-by: Carlos Llamas <cmllamas@google.com>
Tested-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com>
Link: https://lore.kernel.org/r/20250510013435.1520671-7-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The binder driver outputs information about each dead binder node by
iterating over the dead nodes list, and it prints the state of each live
node in the system by traversing each binder_proc's proc->nodes tree.
Both cases require similar logic to maintain the global lock ordering
while accessing each node.
Create a helper function to synchronize around printing binder nodes in
a list. Opportunistically make minor cosmetic changes to binder print
functions.
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250510013435.1520671-5-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The lookup_one_len family of functions is (now) only used internally by
a filesystem on itself either
- in a context where permission checking is irrelevant such as by a
virtual filesystem populating itself, or xfs accessing its ORPHANAGE
or dquota accessing the quota file; or
- in a context where a permission check (MAY_EXEC on the parent) has just
been performed such as a network filesystem finding in "silly-rename"
file in the same directory. This is also the context after the
_parentat() functions where currently lookup_one_qstr_excl() is used.
So the permission check is pointless.
The name "one_len" is unhelpful in understanding the purpose of these
functions and should be changed. Most of the callers pass the len as
"strlen()" so using a qstr and QSTR() can simplify the code.
This patch renames these functions (include lookup_positive_unlocked()
which is part of the family despite the name) to have a name based on
"lookup_noperm". They are changed to receive a 'struct qstr' instead
of separate name and len. In a few cases the use of QSTR() results in a
new call to strlen().
try_lookup_noperm() takes a pointer to a qstr instead of the whole
qstr. This is consistent with d_hash_and_lookup() (which is nearly
identical) and useful for lookup_noperm_unlocked().
The new lookup_noperm_common() doesn't take a qstr yet. That will be
tidied up in a subsequent patch.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
binder_internal.h is included only in the following two C files:
$ git grep binder_internal.h
drivers/android/binder.c:#include "binder_internal.h"
drivers/android/binderfs.c:#include "binder_internal.h"
Neither of these files use the EXPORT_SYMBOL macro, so including
<linux/export.h> is unnecessary.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20250217112756.1011333-1-masahiroy@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>