Commit Graph

1126431 Commits

Author SHA1 Message Date
Thomas Jarosch
b97df039a6 xfrm: Fix oops in __xfrm_state_delete()
Kernel 5.14 added a new "byseq" index to speed
up xfrm_state lookups by sequence number in commit
fe9f1d8779 ("xfrm: add state hashtable keyed by seq")

While the patch was thorough, the function pfkey_send_new_mapping()
in net/af_key.c also modifies x->km.seq and never added
the current xfrm_state to the "byseq" index.

This leads to the following kernel Ooops:
    BUG: kernel NULL pointer dereference, address: 0000000000000000
    ..
    RIP: 0010:__xfrm_state_delete+0xc9/0x1c0
    ..
    Call Trace:
    <TASK>
    xfrm_state_delete+0x1e/0x40
    xfrm_del_sa+0xb0/0x110 [xfrm_user]
    xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user]
    ? remove_entity_load_avg+0x8a/0xa0
    ? copy_to_user_state_extra+0x580/0x580 [xfrm_user]
    netlink_rcv_skb+0x51/0x100
    xfrm_netlink_rcv+0x30/0x50 [xfrm_user]
    netlink_unicast+0x1a6/0x270
    netlink_sendmsg+0x22a/0x480
    __sys_sendto+0x1a6/0x1c0
    ? __audit_syscall_entry+0xd8/0x130
    ? __audit_syscall_exit+0x249/0x2b0
    __x64_sys_sendto+0x23/0x30
    do_syscall_64+0x3a/0x90
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

Exact location of the crash in __xfrm_state_delete():
    if (x->km.seq)
        hlist_del_rcu(&x->byseq);

The hlist_node "byseq" was never populated.

The bug only triggers if a new NAT traversal mapping (changed IP or port)
is detected in esp_input_done2() / esp6_input_done2(), which in turn
indirectly calls pfkey_send_new_mapping() *if* the kernel is compiled
with CONFIG_NET_KEY and "af_key" is active.

The PF_KEYv2 message SADB_X_NAT_T_NEW_MAPPING is not part of RFC 2367.
Various implementations have been examined how they handle
the "sadb_msg_seq" header field:

- racoon (Android): does not process SADB_X_NAT_T_NEW_MAPPING
- strongswan: does not care about sadb_msg_seq
- openswan: does not care about sadb_msg_seq

There is no standard how PF_KEYv2 sadb_msg_seq should be populated
for SADB_X_NAT_T_NEW_MAPPING and it's not used in popular
implementations either. Herbert Xu suggested we should just
use the current km.seq value as is. This fixes the root cause
of the oops since we no longer modify km.seq itself.

The update of "km.seq" looks like a copy'n'paste error
from pfkey_send_acquire(). SADB_ACQUIRE must indeed assign a unique km.seq
number according to RFC 2367. It has been verified that code paths
involving pfkey_send_acquire() don't cause the same Oops.

PF_KEYv2 SADB_X_NAT_T_NEW_MAPPING support was originally added here:
    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

    commit cbc3488685b20e7b2a98ad387a1a816aada569d8
    Author:     Derek Atkins <derek@ihtfp.com>
    AuthorDate: Wed Apr 2 13:21:02 2003 -0800

        [IPSEC]: Implement UDP Encapsulation framework.

        In particular, implement ESPinUDP encapsulation for IPsec
        Nat Traversal.

A note on triggering the bug: I was not able to trigger it using VMs.
There is one VPN using a high latency link on our production VPN server
that triggered it like once a day though.

Link: https://github.com/strongswan/strongswan/issues/992
Link: https://lore.kernel.org/netdev/00959f33ee52c4b3b0084d42c430418e502db554.1652340703.git.antony.antony@secunet.com/T/
Link: https://lore.kernel.org/netdev/20221027142455.3975224-1-chenzhihao@meizu.com/T/

Fixes: fe9f1d8779 ("xfrm: add state hashtable keyed by seq")
Reported-by: Roth Mark <rothm@mail.com>
Reported-by: Zhihao Chen <chenzhihao@meizu.com>
Tested-by: Roth Mark <rothm@mail.com>
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Acked-by: Antony Antony <antony.antony@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-22 07:14:55 +01:00
Herbert Xu
7f57f8165c af_key: Fix send_acquire race with pfkey_register
The function pfkey_send_acquire may race with pfkey_register
(which could even be in a different name space).  This may result
in a buffer overrun.

Allocating the maximum amount of memory that could be used prevents
this.

Reported-by: syzbot+1e9af9185d8850e2c2fa@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-10-27 16:35:12 +02:00
Christian Langrock
4b549ccce9 xfrm: replay: Fix ESN wrap around for GSO
When using GSO it can happen that the wrong seq_hi is used for the last
packets before the wrap around. This can lead to double usage of a
sequence number. To avoid this, we should serialize this last GSO
packet.

Fixes: d7dbefc45c ("xfrm: Add xfrm_replay_overflow functions for offloading")
Co-developed-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Christian Langrock <christian.langrock@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-10-19 09:00:53 +02:00
Eyal Birger
d83f7040e1 xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available
Ido reported that a kernel warning [1] can be triggered from
user space when the kernel is compiled with CONFIG_MODULES=y and
CONFIG_XFRM=n when adding an xfrm encap type route, e.g:

$ ip route add 198.51.100.0/24 dev dummy1 encap xfrm if_id 1
Error: lwt encapsulation type not supported.

The reason for the warning is that the LWT infrastructure has an
autoloading feature which is meant only for encap types that don't
use a net device,  which is not the case in xfrm encap.

Mute this warning for xfrm encap as there's no encap module to autoload
in this case.

[1]
 WARNING: CPU: 3 PID: 2746262 at net/core/lwtunnel.c:57 lwtunnel_valid_encap_type+0x4f/0x120
[...]
 Call Trace:
  <TASK>
  rtm_to_fib_config+0x211/0x350
  inet_rtm_newroute+0x3a/0xa0
  rtnetlink_rcv_msg+0x154/0x3c0
  netlink_rcv_skb+0x49/0xf0
  netlink_unicast+0x22f/0x350
  netlink_sendmsg+0x208/0x440
  ____sys_sendmsg+0x21f/0x250
  ___sys_sendmsg+0x83/0xd0
  __sys_sendmsg+0x54/0xa0
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Reported-by: Ido Schimmel <idosch@idosch.org>
Fixes: 2c2493b9da ("xfrm: lwtunnel: add lwtunnel support for xfrm interfaces in collect_md mode")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-10-12 10:45:51 +02:00
Eyal Birger
3a5913183a xfrm: fix "disable_policy" on ipv4 early demux
The commit in the "Fixes" tag tried to avoid a case where policy check
is ignored due to dst caching in next hops.

However, when the traffic is locally consumed, the dst may be cached
in a local TCP or UDP socket as part of early demux. In this case the
"disable_policy" flag is not checked as ip_route_input_noref() was only
called before caching, and thus, packets after the initial packet in a
flow will be dropped if not matching policies.

Fix by checking the "disable_policy" flag also when a valid dst is
already available.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216557
Reported-by: Monil Patel <monil191989@gmail.com>
Fixes: e6175a2ed1 ("xfrm: fix "disable_policy" flag use when arriving from different devices")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>

----

v2: use dev instead of skb->dev
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-10-12 10:45:34 +02:00
Jakub Kicinski
1d22f78d05 Merge tag 'ieee802154-for-net-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:

====================
pull-request: ieee802154 for net 2022-10-05

Only two patches this time around. A revert from Alexander Aring to a patch
that hit net and the updated patch to fix the problem from Tetsuo Handa.

* tag 'ieee802154-for-net-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
  net/ieee802154: don't warn zero-sized raw_sendmsg()
  Revert "net/ieee802154: reject zero-sized raw_sendmsg()"
====================

Link: https://lore.kernel.org/r/20221005144508.787376-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-05 20:38:46 -07:00
Alexandru Tachici
f93719351b net: ethernet: adi: adin1110: Add check in netdev_event
Check whether this driver actually is the intended recipient of
upper change event.

Fixes: bc93e19d08 ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Link: https://lore.kernel.org/r/20221003111636.54973-1-alexandru.tachici@analog.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-05 20:32:52 -07:00
Casper Andersson
229a002759 docs: networking: phy: add missing space
Missing space between "pins'" and "strength"

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20221004073242.304425-1-casper.casan@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-05 20:32:39 -07:00
Geert Uytterhoeven
304ee24bdb net: pse-pd: PSE_REGULATOR should depend on REGULATOR
The Regulator based PSE controller driver relies on regulator support to
be enabled.  If regulator support is disabled, it will still compile
fine, but won't operate correctly.

Hence add a dependency on REGULATOR, to prevent asking the user about
this driver when configuring a kernel without regulator support.

Fixes: 66741b4e94 ("net: pse-pd: add regulator based PSE driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/709caac8873ff2a8b72b92091429be7c1a939959.1664900558.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-05 20:32:28 -07:00
Vladimir Oltean
af7b29b1de Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"
taprio_attach() has this logic at the end, which should have been
removed with the blamed patch (which is now being reverted):

	/* access to the child qdiscs is not needed in offload mode */
	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
		kfree(q->qdiscs);
		q->qdiscs = NULL;
	}

because otherwise, we make use of q->qdiscs[] even after this array was
deallocated, namely in taprio_leaf(). Therefore, whenever one would try
to attach a valid child qdisc to a fully offloaded taprio root, one
would immediately dereference a NULL pointer.

$ tc qdisc replace dev eno0 handle 8001: parent root taprio \
	num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	max-sdu 0 0 0 0 0 200 0 0 \
	base-time 200 \
	sched-entry S 80 20000 \
	sched-entry S a0 20000 \
	sched-entry S 5f 60000 \
	flags 2
$ max_frame_size=1500
$ data_rate_kbps=20000
$ port_transmit_rate_kbps=1000000
$ idleslope=$data_rate_kbps
$ sendslope=$(($idleslope - $port_transmit_rate_kbps))
$ locredit=$(($max_frame_size * $sendslope / $port_transmit_rate_kbps))
$ hicredit=$(($max_frame_size * $idleslope / $port_transmit_rate_kbps))
$ tc qdisc replace dev eno0 parent 8001:7 cbs \
	idleslope $idleslope \
	sendslope $sendslope \
	hicredit $hicredit \
	locredit $locredit \
	offload 0

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
pc : taprio_leaf+0x28/0x40
lr : qdisc_leaf+0x3c/0x60
Call trace:
 taprio_leaf+0x28/0x40
 tc_modify_qdisc+0xf0/0x72c
 rtnetlink_rcv_msg+0x12c/0x390
 netlink_rcv_skb+0x5c/0x130
 rtnetlink_rcv+0x1c/0x2c

The solution is not as obvious as the problem. The code which deallocates
q->qdiscs[] is in fact copied and pasted from mqprio, which also
deallocates the array in mqprio_attach() and never uses it afterwards.

Therefore, the identical cleanup logic of priv->qdiscs[] that
mqprio_destroy() has is deceptive because it will never take place at
qdisc_destroy() time, but just at raw ops->destroy() time (otherwise
said, priv->qdiscs[] do not last for the entire lifetime of the mqprio
root), but rather, this is just the twisted way in which the Qdisc API
understands error path cleanup should be done (Qdisc_ops :: destroy() is
called even when Qdisc_ops :: init() never succeeded).

Side note, in fact this is also what the comment in mqprio_init() says:

	/* pre-allocate qdisc, attachment can't fail */

Or reworded, mqprio's priv->qdiscs[] scheme is only meant to serve as
data passing between Qdisc_ops :: init() and Qdisc_ops :: attach().

[ this comment was also copied and pasted into the initial taprio
  commit, even though taprio_attach() came way later ]

The problem is that taprio also makes extensive use of the q->qdiscs[]
array in the software fast path (taprio_enqueue() and taprio_dequeue()),
but it does not keep a reference of its own on q->qdiscs[i] (you'd think
that since it creates these Qdiscs, it holds the reference, but nope,
this is not completely true).

To understand the difference between taprio_destroy() and mqprio_destroy()
one must look before commit 13511704f8 ("net: taprio offload: enforce
qdisc to netdev queue mapping"), because that just muddied the waters.

In the "original" taprio design, taprio always attached itself (the root
Qdisc) to all netdev TX queues, so that dev_qdisc_enqueue() would go
through taprio_enqueue().

It also called qdisc_refcount_inc() on itself for as many times as there
were netdev TX queues, in order to counter-balance what tc_get_qdisc()
does when destroying a Qdisc (simplified for brevity below):

	if (n->nlmsg_type == RTM_DELQDISC)
		err = qdisc_graft(dev, parent=NULL, new=NULL, q, extack);

qdisc_graft(where "new" is NULL so this deletes the Qdisc):

	for (i = 0; i < num_q; i++) {
		struct netdev_queue *dev_queue;

		dev_queue = netdev_get_tx_queue(dev, i);

		old = dev_graft_qdisc(dev_queue, new);
		if (new && i > 0)
			qdisc_refcount_inc(new);

		qdisc_put(old);
		~~~~~~~~~~~~~~
		this decrements taprio's refcount once for each TX queue
	}

	notify_and_destroy(net, skb, n, classid,
			   rtnl_dereference(dev->qdisc), new);
			   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
			   and this finally decrements it to zero,
			   making qdisc_put() call qdisc_destroy()

The q->qdiscs[] created using qdisc_create_dflt() (or their
replacements, if taprio_graft() was ever to get called) were then
privately freed by taprio_destroy().

This is still what is happening after commit 13511704f8 ("net: taprio
offload: enforce qdisc to netdev queue mapping"), but only for software
mode.

In full offload mode, the per-txq "qdisc_put(old)" calls from
qdisc_graft() now deallocate the child Qdiscs rather than decrement
taprio's refcount. So when notify_and_destroy(taprio) finally calls
taprio_destroy(), the difference is that the child Qdiscs were already
deallocated.

And this is exactly why the taprio_attach() comment "access to the child
qdiscs is not needed in offload mode" is deceptive too. Not only the
q->qdiscs[] array is not needed, but it is also necessary to get rid of
it as soon as possible, because otherwise, we will also call qdisc_put()
on the child Qdiscs in qdisc_destroy() -> taprio_destroy(), and this
will cause a nasty use-after-free/refcount-saturate/whatever.

In short, the problem is that since the blamed commit, taprio_leaf()
needs q->qdiscs[] to not be freed by taprio_attach(), while qdisc_destroy()
-> taprio_destroy() does need q->qdiscs[] to be freed by taprio_attach()
for full offload. Fixing one problem triggers the other.

All of this can be solved by making taprio keep its q->qdiscs[i] with a
refcount elevated at 2 (in offloaded mode where they are attached to the
netdev TX queues), both in taprio_attach() and in taprio_graft(). The
generic qdisc_graft() would just decrement the child qdiscs' refcounts
to 1, and taprio_destroy() would give them the final coup de grace.

However the rabbit hole of changes is getting quite deep, and the
complexity increases. The blamed commit was supposed to be a bug fix in
the first place, and the bug it addressed is not so significant so as to
justify further rework in stable trees. So I'd rather just revert it.
I don't know enough about multi-queue Qdisc design to make a proper
judgement right now regarding what is/isn't idiomatic use of Qdisc
concepts in taprio. I will try to study the problem more and come with a
different solution in net-next.

Fixes: 1461d212ab ("net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs")
Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20221004220100.1650558-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-05 20:32:15 -07:00
Tetsuo Handa
b12e924a2f net/ieee802154: don't warn zero-sized raw_sendmsg()
syzbot is hitting skb_assert_len() warning at __dev_queue_xmit() [1],
for PF_IEEE802154 socket's zero-sized raw_sendmsg() request is hitting
__dev_queue_xmit() with skb->len == 0.

Since PF_IEEE802154 socket's zero-sized raw_sendmsg() request was
able to return 0, don't call __dev_queue_xmit() if packet length is 0.

  ----------
  #include <sys/socket.h>
  #include <netinet/in.h>

  int main(int argc, char *argv[])
  {
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    struct iovec iov = { };
    struct msghdr hdr = { .msg_name = &addr, .msg_namelen = sizeof(addr), .msg_iov = &iov, .msg_iovlen = 1 };
    sendmsg(socket(PF_IEEE802154, SOCK_RAW, 0), &hdr, 0);
    return 0;
  }
  ----------

Note that this might be a sign that commit fd18942244 ("bpf: Don't
redirect packets with invalid pkt_len") should be reverted, for
skb->len == 0 was acceptable for at least PF_IEEE802154 socket.

Link: https://syzkaller.appspot.com/bug?extid=5ea725c25d06fb9114c4 [1]
Reported-by: syzbot <syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com>
Fixes: fd18942244 ("bpf: Don't redirect packets with invalid pkt_len")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20221005014750.3685555-2-aahringo@redhat.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-10-05 12:37:10 +02:00
Alexander Aring
2eb2756f6c Revert "net/ieee802154: reject zero-sized raw_sendmsg()"
This reverts commit 3a4d061c69.

There is a v2 which does return zero if zero length is given.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20221005014750.3685555-1-aahringo@redhat.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-10-05 12:34:07 +02:00
Linus Torvalds
0326074ff4 Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core:

   - Introduce and use a single page frag cache for allocating small skb
     heads, clawing back the 10-20% performance regression in UDP flood
     test from previous fixes.

   - Run packets which already went thru HW coalescing thru SW GRO. This
     significantly improves TCP segment coalescing and simplifies
     deployments as different workloads benefit from HW or SW GRO.

   - Shrink the size of the base zero-copy send structure.

   - Move TCP init under a new slow / sleepable version of DO_ONCE().

  BPF:

   - Add BPF-specific, any-context-safe memory allocator.

   - Add helpers/kfuncs for PKCS#7 signature verification from BPF
     programs.

   - Define a new map type and related helpers for user space -> kernel
     communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).

   - Allow targeting BPF iterators to loop through resources of one
     task/thread.

   - Add ability to call selected destructive functions. Expose
     crash_kexec() to allow BPF to trigger a kernel dump. Use
     CAP_SYS_BOOT check on the loading process to judge permissions.

   - Enable BPF to collect custom hierarchical cgroup stats efficiently
     by integrating with the rstat framework.

   - Support struct arguments for trampoline based programs. Only
     structs with size <= 16B and x86 are supported.

   - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
     sockets (instead of just TCP and UDP sockets).

   - Add a helper for accessing CLOCK_TAI for time sensitive network
     related programs.

   - Support accessing network tunnel metadata's flags.

   - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.

   - Add support for writing to Netfilter's nf_conn:mark.

  Protocols:

   - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
     (MLO) work (802.11be, WiFi 7).

   - vsock: improve support for SO_RCVLOWAT.

   - SMC: support SO_REUSEPORT.

   - Netlink: define and document how to use netlink in a "modern" way.
     Support reporting missing attributes via extended ACK.

   - IPSec: support collect metadata mode for xfrm interfaces.

   - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
     packets.

   - TCP: introduce optional per-netns connection hash table to allow
     better isolation between namespaces (opt-in, at the cost of memory
     and cache pressure).

   - MPTCP: support TCP_FASTOPEN_CONNECT.

   - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.

   - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.

   - Open vSwitch:
      - Allow specifying ifindex of new interfaces.
      - Allow conntrack and metering in non-initial user namespace.

   - TLS: support the Korean ARIA-GCM crypto algorithm.

   - Remove DECnet support.

  Driver API:

   - Allow selecting the conduit interface used by each port in DSA
     switches, at runtime.

   - Ethernet Power Sourcing Equipment and Power Device support.

   - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
     traffic class max frame size for time-based packet schedules.

   - Support PHY rate matching - adapting between differing host-side
     and link-side speeds.

   - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.

   - Validate OF (device tree) nodes for DSA shared ports; make
     phylink-related properties mandatory on DSA and CPU ports.
     Enforcing more uniformity should allow transitioning to phylink.

   - Require that flash component name used during update matches one of
     the components for which version is reported by info_get().

   - Remove "weight" argument from driver-facing NAPI API as much as
     possible. It's one of those magic knobs which seemed like a good
     idea at the time but is too indirect to use in practice.

   - Support offload of TLS connections with 256 bit keys.

  New hardware / drivers:

   - Ethernet:
      - Microchip KSZ9896 6-port Gigabit Ethernet Switch
      - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
      - Analog Devices ADIN1110 and ADIN2111 industrial single pair
        Ethernet (10BASE-T1L) MAC+PHY.
      - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).

   - Ethernet SFPs / modules:
      - RollBall / Hilink / Turris 10G copper SFPs
      - HALNy GPON module

   - WiFi:
      - CYW43439 SDIO chipset (brcmfmac)
      - CYW89459 PCIe chipset (brcmfmac)
      - BCM4378 on Apple platforms (brcmfmac)

  Drivers:

   - CAN:
      - gs_usb: HW timestamp support

   - Ethernet PHYs:
      - lan8814: cable diagnostics

   - Ethernet NICs:
      - Intel (100G):
         - implement control of FCS/CRC stripping
         - port splitting via devlink
         - L2TPv3 filtering offload
      - nVidia/Mellanox:
         - tunnel offload for sub-functions
         - MACSec offload, w/ Extended packet number and replay window
           offload
         - significantly restructure, and optimize the AF_XDP support,
           align the behavior with other vendors
      - Huawei:
         - configuring DSCP map for traffic class selection
         - querying standard FEC statistics
         - querying SerDes lane number via ethtool
      - Marvell/Cavium:
         - egress priority flow control
         - MACSec offload
      - AMD/SolarFlare:
         - PTP over IPv6 and raw Ethernet
      - small / embedded:
         - ax88772: convert to phylink (to support SFP cages)
         - altera: tse: convert to phylink
         - ftgmac100: support fixed link
         - enetc: standard Ethtool counters
         - macb: ZynqMP SGMII dynamic configuration support
         - tsnep: support multi-queue and use page pool
         - lan743x: Rx IP & TCP checksum offload
         - igc: add xdp frags support to ndo_xdp_xmit

   - Ethernet high-speed switches:
      - Marvell (prestera):
         - support SPAN port features (traffic mirroring)
         - nexthop object offloading
      - Microchip (sparx5):
         - multicast forwarding offload
         - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - support RGMII cmode
      - NXP (felix):
         - standardized ethtool counters
      - Microchip (lan966x):
         - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
         - traffic policing and mirroring
         - link aggregation / bonding offload
         - QUSGMII PHY mode support

   - Qualcomm 802.11ax WiFi (ath11k):
      - cold boot calibration support on WCN6750
      - support to connect to a non-transmit MBSSID AP profile
      - enable remain-on-channel support on WCN6750
      - Wake-on-WLAN support for WCN6750
      - support to provide transmit power from firmware via nl80211
      - support to get power save duration for each client
      - spectral scan support for 160 MHz

   - MediaTek WiFi (mt76):
      - WiFi-to-Ethernet bridging offload for MT7986 chips

   - RealTek WiFi (rtw89):
      - P2P support"

* tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
  eth: pse: add missing static inlines
  once: rename _SLOW to _SLEEPABLE
  net: pse-pd: add regulator based PSE driver
  dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
  ethtool: add interface to interact with Ethernet Power Equipment
  net: mdiobus: search for PSE nodes by parsing PHY nodes.
  net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
  net: add framework to support Ethernet PSE and PDs devices
  dt-bindings: net: phy: add PoDL PSE property
  net: marvell: prestera: Propagate nh state from hw to kernel
  net: marvell: prestera: Add neighbour cache accounting
  net: marvell: prestera: add stub handler neighbour events
  net: marvell: prestera: Add heplers to interact with fib_notifier_info
  net: marvell: prestera: Add length macros for prestera_ip_addr
  net: marvell: prestera: add delayed wq and flush wq on deinit
  net: marvell: prestera: Add strict cleanup of fib arbiter
  net: marvell: prestera: Add cleanup of allocated fib_nodes
  net: marvell: prestera: Add router nexthops ABI
  eth: octeon: fix build after netif_napi_add() changes
  net/mlx5: E-Switch, Return EBUSY if can't get mode lock
  ...
2022-10-04 13:38:03 -07:00
Linus Torvalds
522667b24f Merge tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
 "Improve user help for Landlock (documentation and sample)"

* tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Fix documentation style
  landlock: Slightly improve documentation and fix spelling
  samples/landlock: Print hints about ABI versions
2022-10-04 11:13:38 -07:00
Linus Torvalds
c645c11a2d Merge tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
 "Six audit patches for v6.1, most are pretty trivial, but a quick list
  of the highlights are below:

   - Only free the audit proctitle information on task exit. This allows
     us to cache the information and improve performance slightly.

   - Use the time_after() macro to do time comparisons instead of doing
     it directly and potentially causing ourselves problems when the
     timer wraps.

   - Convert an audit_context state comparison from a relative enum
     comparison, e.g. (x < y), to a not-equal comparison to ensure that
     we are not caught out at some unknown point in the future by an
     enum shuffle.

   - A handful of small cleanups such as tidying up comments and
     removing unused declarations"

* tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: remove selinux_audit_rule_update() declaration
  audit: use time_after to compare time
  audit: free audit_proctitle only on task exit
  audit: explicitly check audit_context->context enum value
  audit: audit_context pid unused, context enum comment fix
  audit: fix repeated words in comments
2022-10-04 11:05:43 -07:00
Linus Torvalds
3eba620e7b Merge tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:

 - The usual round of smaller fixes and cleanups all over the tree

* tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype
  x86/uaccess: Improve __try_cmpxchg64_user_asm() for x86_32
  x86: Fix various duplicate-word comment typos
  x86/boot: Remove superfluous type casting from arch/x86/boot/bitops.h
2022-10-04 10:24:11 -07:00
Linus Torvalds
193e2268a3 Merge tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource control updates from Borislav Petkov:

 - More work by James Morse to disentangle the resctrl filesystem
   generic code from the architectural one with the endgoal of plugging
   ARM's MPAM implementation into it too so that the user interface
   remains the same

 - Properly restore the MSR_MISC_FEATURE_CONTROL value instead of
   blindly overwriting it to 0

* tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
  x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data
  x86/resctrl: Rename and change the units of resctrl_cqm_threshold
  x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()
  x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
  x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
  x86/resctrl: Abstract __rmid_read()
  x86/resctrl: Allow per-rmid arch private storage to be reset
  x86/resctrl: Add per-rmid arch private storage for overflow and chunks
  x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
  x86/resctrl: Allow update_mba_bw() to update controls directly
  x86/resctrl: Remove architecture copy of mbps_val
  x86/resctrl: Switch over to the resctrl mbps_val list
  x86/resctrl: Create mba_sc configuration in the rdt_domain
  x86/resctrl: Abstract and use supports_mba_mbps()
  x86/resctrl: Remove set_mba_sc()s control array re-initialisation
  x86/resctrl: Add domain offline callback for resctrl work
  x86/resctrl: Group struct rdt_hw_domain cleanup
  x86/resctrl: Add domain online callback for resctrl work
  x86/resctrl: Merge mon_capable and mon_enabled
  ...
2022-10-04 10:14:58 -07:00
Linus Torvalds
b5f0b11353 Merge tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x75 microcode loader updates from Borislav Petkov:

 - Get rid of a single ksize() usage

 - By popular demand, print the previous microcode revision an update
   was done over

 - Remove more code related to the now gone MICROCODE_OLD_INTERFACE

 - Document the problems stemming from microcode late loading

* tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Track patch allocation size explicitly
  x86/microcode: Print previous version of microcode after reload
  x86/microcode: Remove ->request_microcode_user()
  x86/microcode: Document the whole late loading problem
2022-10-04 10:12:08 -07:00
Linus Torvalds
9bf445b65d Merge tag 'x86_paravirt_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt fix from Borislav Petkov:

 - Ensure paravirt patching site descriptors are aligned properly so
   that code can do proper arithmetic with their addresses

* tag 'x86_paravirt_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/paravirt: Ensure proper alignment
2022-10-04 10:03:40 -07:00
Linus Torvalds
901735e51e Merge tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Borislav Petkov:

 - Drop misleading "RIP" from the opcodes dumping message

 - Correct APM entry's Konfig help text

* tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/dumpstack: Don't mention RIP in "Code: "
  x86/Kconfig: Specify idle=poll instead of no-hlt
2022-10-04 10:00:27 -07:00
Linus Torvalds
bb1f11546e Merge tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm update from Borislav Petkov:

 - Use the __builtin_ffs/ctzl() compiler builtins for the constant
   argument case in the kernel's optimized ffs()/ffz() helpers in order
   to make use of the compiler's constant folding optmization passes.

* tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions
  x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions
2022-10-04 09:49:50 -07:00
Linus Torvalds
8cded8fb12 Merge tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core fixes from Borislav Petkov:

 - Make sure an INT3 is slapped after every unconditional retpoline JMP
   as both vendors suggest

 - Clean up pciserial a bit

* tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86,retpoline: Be sure to emit INT3 after JMP *%\reg
  x86/earlyprintk: Clean up pciserial
2022-10-04 09:46:22 -07:00
Linus Torvalds
5bb3a16dbe Merge tag 'x86_apic_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC update from Borislav Petkov:

 - Add support for locking the APIC in X2APIC mode to prevent SGX
   enclave leaks

* tag 'x86_apic_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Don't disable x2APIC if locked
2022-10-04 09:37:02 -07:00
Linus Torvalds
51eaa866a5 Merge tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:

 - Fix the APEI MCE callback handler to consult the hardware about the
   granularity of the memory error instead of hard-coding it

 - Offline memory pages on Intel machines after 2 errors reported per
   page

* tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Retrieve poison range from hardware
  RAS/CEC: Reduce offline page threshold for Intel systems
2022-10-04 09:33:12 -07:00
Linus Torvalds
7db99f01d1 Merge tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:

 - Print the CPU number at segfault time.

   The number printed is not always accurate (preemption is enabled at
   that time) but the print string contains "likely" and after a lot of
   back'n'forth on this, this was the consensus that was reached. See
   thread at [1].

 - After a *lot* of testing and polishing, finally the clear_user()
   improvements to inline REP; STOSB by default

Link: https://lore.kernel.org/r/5d62c1d0-7425-d5bb-ecb5-1dc3b4d7d245@intel.com [1]

* tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Print likely CPU at segfault time
  x86/clear_user: Make it faster
2022-10-04 09:21:30 -07:00
Linus Torvalds
ba94a7a900 Merge tag 'x86_sgx_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX update from Borislav Petkov:

 - Improve the documentation of a couple of SGX functions handling
   backing storage

* tag 'x86_sgx_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Improve comments for sgx_encl_lookup/alloc_backing()
2022-10-04 09:17:44 -07:00
Linus Torvalds
f8475a6749 Merge tag 'x86_timers_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RTC cleanups from Borislav Petkov:

 - Cleanup x86/rtc.c and delete duplicated functionality in favor of
   using the respective functionality from the RTC library

* tag 'x86_timers_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/rtc: Rename mach_set_rtc_mmss() to mach_set_cmos_time()
  x86/rtc: Rewrite & simplify mach_get_cmos_time() by deleting duplicated functionality
2022-10-04 09:13:21 -07:00
Linus Torvalds
3339914a58 Merge tag 'x86_platform_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform update from Borislav Petkov:
 "A single x86/platform improvement when the kernel is running as an
  ACRN guest:

   - Get TSC and CPU frequency from CPUID leaf 0x40000010 when the
     kernel is running as a guest on the ACRN hypervisor"

* tag 'x86_platform_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/acrn: Set up timekeeping
2022-10-04 09:06:35 -07:00
Linus Torvalds
bf7676251b Merge tag 'edac_updates_for_v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:

 - Add support for Skylake-S CPUs to ie31200_edac

 - Improve error decoding speed of the Intel drivers by avoiding the
   ACPI facilities but doing decoding in the driver itself

 - Other misc improvements to the Intel drivers

 - The usual cleanups and fixlets all over EDAC land

* tag 'edac_updates_for_v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/i7300: Correct the i7300_exit() function name in comment
  x86/sb_edac: Add row column translation for Broadwell
  EDAC/i10nm: Print an extra register set of retry_rd_err_log
  EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
  EDAC/skx_common: Add ChipSelect ADXL component
  EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations
  EDAC: Remove obsolete declarations in edac_module.h
  EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
  EDAC/skx_common: Make output format similar
  EDAC/skx_common: Use driver decoder first
  EDAC/mc: Drop duplicated dimm->nr_pages debug printout
  EDAC/mc: Replace spaces with tabs in memtype flags definition
  EDAC/wq: Remove unneeded flush_workqueue()
  EDAC/ie31200: Add Skylake-S support
2022-10-04 08:58:02 -07:00
Borislav Petkov
c257795609 Merge branches 'edac-drivers' and 'edac-misc' into edac-updates-for-v6.1
Combine all queued EDAC changes for submission into v6.1:

* edac-drivers:
  EDAC/ie31200: Add Skylake-S support

* edac-misc:
  EDAC/i7300: Correct the i7300_exit() function name in comment
  x86/sb_edac: Add row column translation for Broadwell
  EDAC/i10nm: Print an extra register set of retry_rd_err_log
  EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
  EDAC/skx_common: Add ChipSelect ADXL component
  EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations
  EDAC: Remove obsolete declarations in edac_module.h
  EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
  EDAC/skx_common: Make output format similar
  EDAC/skx_common: Use driver decoder first
  EDAC/mc: Drop duplicated dimm->nr_pages debug printout
  EDAC/mc: Replace spaces with tabs in memtype flags definition
  EDAC/wq: Remove unneeded flush_workqueue()

Signed-off-by: Borislav Petkov <bp@suse.de>
2022-10-04 10:00:25 +02:00
Jakub Kicinski
681bf011b9 eth: pse: add missing static inlines
build bot reports missing 'static inline' qualifiers in the header.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 18ff0bcda6 ("ethtool: add interface to interact with Ethernet Power Equipment")
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20221004040327.2034878-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 21:52:33 -07:00
Linus Torvalds
725737e7c2 Merge tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull STATX_DIOALIGN support from Eric Biggers:
 "Make statx() support reporting direct I/O (DIO) alignment information.

  This provides a generic interface for userspace programs to determine
  whether a file supports DIO, and if so with what alignment
  restrictions. Specifically, STATX_DIOALIGN works on block devices, and
  on regular files when their containing filesystem has implemented
  support.

  An interface like this has been requested for years, since the
  conditions for when DIO is supported in Linux have gotten increasingly
  complex over time. Today, DIO support and alignment requirements can
  be affected by various filesystem features such as multi-device
  support, data journalling, inline data, encryption, verity,
  compression, checkpoint disabling, log-structured mode, etc.

  Further complicating things, Linux v6.0 relaxed the traditional rule
  of DIO needing to be aligned to the block device's logical block size;
  now user buffers (but not file offsets) only need to be aligned to the
  DMA alignment.

  The approach of uplifting the XFS specific ioctl XFS_IOC_DIOINFO was
  discarded in favor of creating a clean new interface with statx().

  For more information, see the individual commits and the man page
  update[1]"

Link: https://lore.kernel.org/r/20220722074229.148925-1-ebiggers@kernel.org [1]

* tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  xfs: support STATX_DIOALIGN
  f2fs: support STATX_DIOALIGN
  f2fs: simplify f2fs_force_buffered_io()
  f2fs: move f2fs_force_buffered_io() into file.c
  ext4: support STATX_DIOALIGN
  fscrypt: change fscrypt_dio_supported() to prepare for STATX_DIOALIGN
  vfs: support STATX_DIOALIGN on block devices
  statx: add direct I/O alignment information
2022-10-03 20:33:41 -07:00
Linus Torvalds
5779aa2dac Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
 "Minor changes to convert uses of kmap() to kmap_local_page()"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fs-verity: use kmap_local_page() instead of kmap()
  fs-verity: use memcpy_from_page()
2022-10-03 20:27:34 -07:00
Linus Torvalds
438b2cdd17 Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
 "This release contains some implementation changes, but no new
  features:

   - Rework the implementation of the fscrypt filesystem-level keyring
     to not be as tightly coupled to the keyrings subsystem. This
     resolves several issues.

   - Eliminate most direct uses of struct request_queue from fs/crypto/,
     since struct request_queue is considered to be a block layer
     implementation detail.

   - Stop using the PG_error flag to track decryption failures. This is
     a prerequisite for freeing up PG_error for other uses"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: work on block_devices instead of request_queues
  fscrypt: stop holding extra request_queue references
  fscrypt: stop using keyrings subsystem for fscrypt_master_key
  fscrypt: stop using PG_error to track error status
  fscrypt: remove fscrypt_set_test_dummy_encryption()
2022-10-03 20:18:34 -07:00
Linus Torvalds
f4309528f3 Merge tag 'dlm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:

 - Fix a couple races found with a new torture test

 - Improve errors when api functions are used incorrectly

 - Improve tracing for lock requests from user space

 - Fix use after free in recently added tracing cod.

 - Small internal code cleanups

* tag 'dlm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  fs: dlm: fix possible use after free if tracing
  fs: dlm: const void resource name parameter
  fs: dlm: LSFL_CB_DELAY only for kernel lockspaces
  fs: dlm: remove DLM_LSFL_FS from uapi
  fs: dlm: trace user space callbacks
  fs: dlm: change ls_clear_proc_locks to spinlock
  fs: dlm: remove dlm_del_ast prototype
  fs: dlm: handle rcom in else if branch
  fs: dlm: allow lockspaces have zero lvblen
  fs: dlm: fix invalid derefence of sb_lvbptr
  fs: dlm: handle -EINVAL as log_error()
  fs: dlm: use __func__ for function name
  fs: dlm: handle -EBUSY first in unlock validation
  fs: dlm: handle -EBUSY first in lock arg validation
  fs: dlm: fix race between test_bit() and queue_work()
  fs: dlm: fix race in lowcomms
2022-10-03 20:11:59 -07:00
Linus Torvalds
f90497a16e Merge tag 'nfsd-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "This release is mostly bug fixes, clean-ups, and optimizations.

  One notable set of fixes addresses a subtle buffer overflow issue that
  occurs if a small RPC Call message arrives in an oversized RPC record.
  This is only possible on a framed RPC transport such as TCP.

  Because NFSD shares the receive and send buffers in one set of pages,
  an oversized RPC record steals pages from the send buffer that will be
  used to construct the RPC Reply message. NFSD must not assume that a
  full-sized buffer is always available to it; otherwise, it will walk
  off the end of the send buffer while constructing its reply.

  In this release, we also introduce the ability for the server to wait
  a moment for clients to return delegations before it responds with
  NFS4ERR_DELAY. This saves a retransmit and a network round- trip when
  a delegation recall is needed. This work will be built upon in future
  releases.

  The NFS server adds another shrinker to its collection. Because
  courtesy clients can linger for quite some time, they might be
  freeable when the server host comes under memory pressure. A new
  shrinker has been added that releases courtesy client resources during
  low memory scenarios.

  Lastly, of note: the maximum number of operations per NFSv4 COMPOUND
  that NFSD can handle is increased from 16 to 50. There are NFSv4
  client implementations that need more than 16 to successfully perform
  a mount operation that uses a pathname with many components"

* tag 'nfsd-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (53 commits)
  nfsd: extra checks when freeing delegation stateids
  nfsd: make nfsd4_run_cb a bool return function
  nfsd: fix comments about spinlock handling with delegations
  nfsd: only fill out return pointer on success in nfsd4_lookup_stateid
  NFSD: fix use-after-free on source server when doing inter-server copy
  NFSD: Cap rsize_bop result based on send buffer size
  NFSD: Rename the fields in copy_stateid_t
  nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
  nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
  nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
  nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops
  nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
  NFSD: Pack struct nfsd4_compoundres
  NFSD: Remove unused nfsd4_compoundargs::cachetype field
  NFSD: Remove "inline" directives on op_rsize_bop helpers
  NFSD: Clean up nfs4svc_encode_compoundres()
  SUNRPC: Fix typo in xdr_buf_subsegment's kdoc comment
  NFSD: Clean up WRITE arg decoders
  NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
  NFSD: Refactor common code out of dirlist helpers
  ...
2022-10-03 20:07:15 -07:00
Linus Torvalds
3497640a80 Merge tag 'erofs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
 "In this cycle, for container use cases, fscache-based shared domain is
  introduced [1] so that data blobs in the same domain will be storage
  deduplicated and it will also be used for page cache sharing later.

  Also, a special packed inode is now introduced to record inode
  fragments which keep the tail part of files by Yue Hu [2]. You can
  keep arbitary length or (at will) the whole file as a fragment and
  then fragments can be optionally compressed in the packed inode
  together and even deduplicated for smaller image sizes.

  In addition to that, global compressed data deduplication by sharing
  partial-referenced pclusters is also supported in this cycle.

  Summary:

   - Introduce fscache-based domain to share blobs between images

   - Support recording fragments in a special packed inode

   - Support partial-referenced pclusters for global compressed data
     deduplication

   - Fix an order >= MAX_ORDER warning due to crafted negative i_size

   - Several cleanups"

Link: https://lore.kernel.org/r/20220916085940.89392-1-zhujia.zj@bytedance.com [1]
Link: https://lore.kernel.org/r/cover.1663065968.git.huyue2@coolpad.com [2]

* tag 'erofs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: clean up erofs_iget()
  erofs: clean up unnecessary code and comments
  erofs: fold in z_erofs_reload_indexes()
  erofs: introduce partial-referenced pclusters
  erofs: support on-disk compressed fragments data
  erofs: support interlaced uncompressed data for compressed files
  erofs: clean up .read_folio() and .readahead() in fscache mode
  erofs: introduce 'domain_id' mount option
  erofs: Support sharing cookies in the same domain
  erofs: introduce a pseudo mnt to manage shared cookies
  erofs: introduce fscache-based domain
  erofs: code clean up for fscache
  erofs: use kill_anon_super() to kill super in fscache mode
  erofs: fix order >= MAX_ORDER warning due to crafted negative i_size
2022-10-03 20:01:40 -07:00
Linus Torvalds
8bea8ff34a Merge tag 'fs.vfsuid.fat.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull fatfs vfsuid conversion from Christian Brauner:
 "Last cycle we introduced the new vfs{g,u}id_t types that we had agreed
  on. The most important parts of the vfs have been converted but there
  are a few more places we need to switch before we can remove the old
  helpers completely.

  This cycle we converted all filesystems that called idmapped mount
  helpers directly. The affected filesystems are f2fs, fat, fuse, ksmbd,
  overlayfs, and xfs. We've sent patches for all of them. Looking at
  -next f2fs, ksmbd, overlayfs, and xfs have all picked up these patches
  and they should land in mainline during the v6.1 merge window.

  So all filesystems that have a separate tree should send the vfsuid
  conversion themselves. Onle the fat conversion is going through this
  generic fs trees because there is no fat tree.

  In order to change time settings on an inode fat checks that the
  caller either is the owner of the inode or the inode's group is in the
  caller's group list. If fat is on an idmapped mount we compare whether
  the inode mapped into the mount is equivalent to the caller's fsuid.
  If it isn't we compare whether the inode's group mapped into the mount
  is in the caller's group list.

  We now use the new vfsuid based helpers for that"

* tag 'fs.vfsuid.fat.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fat: port to vfs{g,u}id_t and associated helpers
2022-10-03 19:54:29 -07:00
Linus Torvalds
223b845253 Merge tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs acl updates from Christian Brauner:
 "These are general fixes and preparatory changes related to the ongoing
  posix acl rework. The actual rework where we build a type safe posix
  acl api wasn't ready for this merge window but we're hopeful for the
  next merge window.

  General fixes:

   - Some filesystems like 9p and cifs have to implement custom posix
     acl handlers because they require access to the dentry in order to
     set and get posix acls while the set and get inode operations
     currently don't. But the ntfs3 filesystem has no such requirement
     and thus implemented custom posix acl xattr handlers when it really
     didn't have to. So this pr contains patch that just implements set
     and get inode operations for ntfs3 and switches it to rely on the
     generic posix acl xattr handlers. (We would've appreciated reviews
     from the ntfs3 maintainers but we didn't get any. But hey, if we
     really broke it we'll fix it. But fstests for ntfs3 said it's
     fine.)

   - The posix_acl_fix_xattr_common() helper has been adapted so it can
     be used by a few more callers and avoiding open-coding the same
     checks over and over.

  Other than the two general fixes this series introduces a new helper
  vfs_set_acl_prepare(). The reason for this helper is so that we can
  mitigate one of the source that change {g,u}id values directly in the
  uapi struct. With the vfs_set_acl_prepare() helper we can move the
  idmapped mount fixup into the generic posix acl set handler.

  The advantage of this is that it allows us to remove the
  posix_acl_setxattr_idmapped_mnt() helper which so far we had to call
  in vfs_setxattr() to account for idmapped mounts. While semantically
  correct the problem with this approach was that we had to keep the
  value parameter of the generic vfs_setxattr() call as non-const. This
  is rectified in this series.

  Ultimately, we will get rid of all the extreme kludges and type
  unsafety once we have merged the posix api - hopefully during the next
  merge window - built solely around get and set inode operations. Which
  incidentally will also improve handling of posix acls in security and
  especially in integrity modesl. While this will come with temporarily
  having two inode operation for posix acls that is nothing compared to
  the problems we have right now and so well worth it. We'll end up with
  something that we can actually reason about instead of needing to
  write novels to explain what's going on"

* tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  xattr: always us is_posix_acl_xattr() helper
  acl: fix the comments of posix_acl_xattr_set
  xattr: constify value argument in vfs_setxattr()
  ovl: use vfs_set_acl_prepare()
  acl: move idmapping handling into posix_acl_xattr_set()
  acl: add vfs_set_acl_prepare()
  acl: return EOPNOTSUPP in posix_acl_fix_xattr_common()
  ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers
2022-10-03 19:48:54 -07:00
Linus Torvalds
da380aefdd Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull coredump fix from Al Viro:
 "Brown paper bag bug fix for the coredumping fix late in the 6.0
  release cycle"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [brown paperbag] fix coredump breakage
2022-10-03 18:03:58 -07:00
Linus Torvalds
26b84401da Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
 "Seven patches for the LSM layer and we've got a mix of trivial and
  significant patches. Highlights below, starting with the smaller bits
  first so they don't get lost in the discussion of the larger items:

   - Remove some redundant NULL pointer checks in the common LSM audit
     code.

   - Ratelimit the lockdown LSM's access denial messages.

     With this change there is a chance that the last visible lockdown
     message on the console is outdated/old, but it does help preserve
     the initial series of lockdown denials that started the denial
     message flood and my gut feeling is that these might be the more
     valuable messages.

   - Open userfaultfds as readonly instead of read/write.

     While this code obviously lives outside the LSM, it does have a
     noticeable impact on the LSMs with Ondrej explaining the situation
     in the commit description. It is worth noting that this patch
     languished on the VFS list for over a year without any comments
     (objections or otherwise) so I took the liberty of pulling it into
     the LSM tree after giving fair notice. It has been in linux-next
     since the end of August without any noticeable problems.

   - Add a LSM hook for user namespace creation, with implementations
     for both the BPF LSM and SELinux.

     Even though the changes are fairly small, this is the bulk of the
     diffstat as we are also including BPF LSM selftests for the new
     hook.

     It's also the most contentious of the changes in this pull request
     with Eric Biederman NACK'ing the LSM hook multiple times during its
     development and discussion upstream. While I've never taken NACK's
     lightly, I'm sending these patches to you because it is my belief
     that they are of good quality, satisfy a long-standing need of
     users and distros, and are in keeping with the existing nature of
     the LSM layer and the Linux Kernel as a whole.

     The patches in implement a LSM hook for user namespace creation
     that allows for a granular approach, configurable at runtime, which
     enables both monitoring and control of user namespaces. The general
     consensus has been that this is far preferable to the other
     solutions that have been adopted downstream including outright
     removal from the kernel, disabling via system wide sysctls, or
     various other out-of-tree mechanisms that users have been forced to
     adopt since we haven't been able to provide them an upstream
     solution for their requests. Eric has been steadfast in his
     objections to this LSM hook, explaining that any restrictions on
     the user namespace could have significant impact on userspace.
     While there is the possibility of impacting userspace, it is
     important to note that this solution only impacts userspace when it
     is requested based on the runtime configuration supplied by the
     distro/admin/user. Frederick (the pathset author), the LSM/security
     community, and myself have tried to work with Eric during
     development of this patchset to find a mutually acceptable
     solution, but Eric's approach and unwillingness to engage in a
     meaningful way have made this impossible. I have CC'd Eric directly
     on this pull request so he has a chance to provide his side of the
     story; there have been no objections outside of Eric's"

* tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lockdown: ratelimit denial messages
  userfaultfd: open userfaultfds with O_RDONLY
  selinux: Implement userns_create hook
  selftests/bpf: Add tests verifying bpf lsm userns_create hook
  bpf-lsm: Make bpf_lsm_userns_create() sleepable
  security, lsm: Introduce security_create_user_ns()
  lsm: clean up redundant NULL pointer check
2022-10-03 17:51:52 -07:00
Linus Torvalds
e816da29bc Merge tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
 "Six SELinux patches, all are simple and easily understood, but a list
  of the highlights is below:

   - Use 'grep -E' instead of 'egrep' in the SELinux policy install
     script.

     Fun fact, this seems to be GregKH's *second* dedicated SELinux
     patch since we transitioned to git (ignoring merges, the SPDX
     stuff, and a trivial fs reference removal when lustre was yanked);
     the first was back in 2011 when selinuxfs was placed in
     /sys/fs/selinux. Oh, the memories ...

   - Convert the SELinux policy boolean values to use signed integer
     types throughout the SELinux kernel code.

     Prior to this we were using a mix of signed and unsigned integers
     which was probably okay in this particular case, but it is
     definitely not a good idea in general.

   - Remove a reference to the SELinux runtime disable functionality in
     /etc/selinux/config as we are in the process of deprecating that.

     See [1] for more background on this if you missed the previous
     notes on the deprecation.

   - Minor cleanups: remove unneeded variables and function parameter
     constification"

Link: https://github.com/SELinuxProject/selinux-kernel/wiki/DEPRECATE-runtime-disable [1]

* tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: remove runtime disable message in the install_policy.sh script
  selinux: use "grep -E" instead of "egrep"
  selinux: remove the unneeded result variable
  selinux: declare read-only parameters const
  selinux: use int arrays for boolean values
  selinux: remove an unneeded variable in sel_make_class_dir_entries()
2022-10-03 17:45:15 -07:00
Jakub Kicinski
e52f7c1ddf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in the left-over fixes before the net-next pull-request.

Conflicts:

drivers/net/ethernet/mediatek/mtk_ppe.c
  ae3ed15da5 ("net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear")
  9d8cb4c096 ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc")
https://lore.kernel.org/all/6cb6893b-4921-a068-4c30-1109795110bb@tessares.net/

kernel/bpf/helpers.c
  8addbfc7b3 ("bpf: Gate dynptr API behind CAP_BPF")
  5679ff2f13 ("bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF")
  8a67f2de9b ("bpf: expose bpf_strtol and bpf_strtoul to all program types")
https://lore.kernel.org/all/20221003201957.13149-1-daniel@iogearbox.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 17:44:18 -07:00
Linus Torvalds
eafb121ec0 Merge tag 'integrity-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "Just two bug fixes"

* tag 'integrity-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  efi: Correct Macmini DMI match in uefi cert quirk
  ima: fix blocking of security.ima xattrs of unsupported algorithms
2022-10-03 17:42:12 -07:00
Linus Torvalds
74a0f84590 Merge tag 'Smack-for-6.1' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
 "Two minor code clean-ups: one removes constants left over from the old
  mount API, while the other gets rid of an unneeded variable.

  The other change fixes a flaw in handling IPv6 labeling"

* tag 'Smack-for-6.1' of https://github.com/cschaufler/smack-next:
  smack: cleanup obsolete mount option flags
  smack: lsm: remove the unneeded result variable
  SMACK: Add sk_clone_security LSM hook
2022-10-03 17:38:09 -07:00
Jason A. Donenfeld
2a4187f440 once: rename _SLOW to _SLEEPABLE
The _SLOW designation wasn't really descriptive of anything. This is
meant to be called from process context when it's possible to sleep. So
name this more aptly _SLEEPABLE, which better fits its intended use.

Fixes: 62c07983be ("once: add DO_ONCE_SLOW() for sleepable contexts")
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221003181413.1221968-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 17:34:32 -07:00
Jakub Kicinski
331834898f Merge branch 'add-generic-pse-support'
Oleksij Rempel says:

====================
add generic PSE support

Add generic support for the Ethernet Power Sourcing Equipment.
====================

Link: https://lore.kernel.org/r/20221003065202.3889095-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 17:34:07 -07:00
Oleksij Rempel
66741b4e94 net: pse-pd: add regulator based PSE driver
Add generic, regulator based PSE driver to support simple Power Sourcing
Equipment without automatic classification support.

This driver was tested on 10Bast-T1L switch with regulator based PoDL PSE.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 17:33:57 -07:00
Oleksij Rempel
f05dfdaf56 dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
Add bindings for the regulator based Ethernet PoDL PSE controller and
generic bindings for all PSE controllers.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 17:33:57 -07:00
Oleksij Rempel
18ff0bcda6 ethtool: add interface to interact with Ethernet Power Equipment
Add interface to support Power Sourcing Equipment. At current step it
provides generic way to address all variants of PSE devices as defined
in IEEE 802.3-2018 but support only objects specified for IEEE 802.3-2018 104.4
PoDL Power Sourcing Equipment (PSE).

Currently supported and mandatory objects are:
IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus
IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl

This is minimal interface needed to control PSE on each separate
ethernet port but it provides not all mandatory objects specified in
IEEE 802.3-2018.

Since "PoDL PSE" and "PSE" have similar names, but some different values
I decide to not merge them and keep separate naming schema. This should
allow as to be as close to IEEE 802.3 spec as possible and avoid name
conflicts in the future.

This implementation is connected to PHYs instead of MACs because PSE
auto classification can potentially interfere with PHY auto negotiation.
So, may be some extra PHY related initialization will be needed.

With WIP version of ethtools interaction with PSE capable link looks
as following:

$ ip l
...
5: t1l1@eth0: <BROADCAST,MULTICAST> ..
...

$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: disabled
PoDL PSE Power Detection Status: disabled

$ ethtool --set-pse t1l1 podl-pse-admin-control enable
$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power

Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 17:33:57 -07:00