Commit Graph

142697 Commits

Author SHA1 Message Date
Kumar Kartikeya Dwivedi
f71b2f6417 bpf: Refactor map->off_arr handling
Refactor map->off_arr handling into generic functions that can work on
their own without hardcoding map specific code. The btf_fields_offs
structure is now returned from btf_parse_field_offs, which can be reused
later for types in program BTF.

All functions like copy_map_value, zero_map_value call generic
underlying functions so that they can also be reused later for copying
to values allocated in programs which encode specific fields.

Later, some helper functions will also require access to this
btf_field_offs structure to be able to skip over special fields at
runtime.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221103191013.1236066-9-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-03 23:09:40 -07:00
Kumar Kartikeya Dwivedi
db55911782 bpf: Consolidate spin_lock, timer management into btf_record
Now that kptr_off_tab has been refactored into btf_record, and can hold
more than one specific field type, accomodate bpf_spin_lock and
bpf_timer as well.

While they don't require any more metadata than offset, having all
special fields in one place allows us to share the same code for
allocated user defined types and handle both map values and these
allocated objects in a similar fashion.

As an optimization, we still keep spin_lock_off and timer_off offsets in
the btf_record structure, just to avoid having to find the btf_field
struct each time their offset is needed. This is mostly needed to
manipulate such objects in a map value at runtime. It's ok to hardcode
just one offset as more than one field is disallowed.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221103191013.1236066-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-03 22:19:40 -07:00
Kumar Kartikeya Dwivedi
aa3496accc bpf: Refactor kptr_off_tab into btf_record
To prepare the BPF verifier to handle special fields in both map values
and program allocated types coming from program BTF, we need to refactor
the kptr_off_tab handling code into something more generic and reusable
across both cases to avoid code duplication.

Later patches also require passing this data to helpers at runtime, so
that they can work on user defined types, initialize them, destruct
them, etc.

The main observation is that both map values and such allocated types
point to a type in program BTF, hence they can be handled similarly. We
can prepare a field metadata table for both cases and store them in
struct bpf_map or struct btf depending on the use case.

Hence, refactor the code into generic btf_record and btf_field member
structs. The btf_record represents the fields of a specific btf_type in
user BTF. The cnt indicates the number of special fields we successfully
recognized, and field_mask is a bitmask of fields that were found, to
enable quick determination of availability of a certain field.

Subsequently, refactor the rest of the code to work with these generic
types, remove assumptions about kptr and kptr_off_tab, rename variables
to more meaningful names, etc.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221103191013.1236066-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-03 21:44:53 -07:00
Kumar Kartikeya Dwivedi
23da464dd6 bpf: Allow specifying volatile type modifier for kptrs
This is useful in particular to mark the pointer as volatile, so that
compiler treats each load and store to the field as a volatile access.
The alternative is having to define and use READ_ONCE and WRITE_ONCE in
the BPF program.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221103191013.1236066-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-03 19:31:17 -07:00
Jakub Kicinski
fbeb229a66 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03 13:21:54 -07:00
Linus Torvalds
9521c9d6a5 Merge tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth and netfilter.

  Current release - regressions:

   - net: several zerocopy flags fixes

   - netfilter: fix possible memory leak in nf_nat_init()

   - openvswitch: add missing .resv_start_op

  Previous releases - regressions:

   - neigh: fix null-ptr-deref in neigh_table_clear()

   - sched: fix use after free in red_enqueue()

   - dsa: fall back to default tagger if we can't load the one from DT

   - bluetooth: fix use-after-free in l2cap_conn_del()

  Previous releases - always broken:

   - netfilter: netlink notifier might race to release objects

   - nfc: fix potential memory leak of skb

   - bluetooth: fix use-after-free caused by l2cap_reassemble_sdu

   - bluetooth: use skb_put to set length

   - eth: tun: fix bugs for oversize packet when napi frags enabled

   - eth: lan966x: fixes for when MTU is changed

   - eth: dwmac-loongson: fix invalid mdio_node"

* tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  vsock: fix possible infinite sleep in vsock_connectible_wait_data()
  vsock: remove the unused 'wait' in vsock_connectible_recvmsg()
  ipv6: fix WARNING in ip6_route_net_exit_late()
  bridge: Fix flushing of dynamic FDB entries
  net, neigh: Fix null-ptr-deref in neigh_table_clear()
  net/smc: Fix possible leaked pernet namespace in smc_init()
  stmmac: dwmac-loongson: fix invalid mdio_node
  ibmvnic: Free rwi on reset success
  net: mdio: fix undefined behavior in bit shift for __mdiobus_register
  Bluetooth: L2CAP: Fix attempting to access uninitialized memory
  Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
  Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
  Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect
  Bluetooth: L2CAP: Fix memory leak in vhci_write
  Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
  Bluetooth: virtio_bt: Use skb_put to set length
  Bluetooth: hci_conn: Fix CIS connection dst_type handling
  Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
  netfilter: ipset: enforce documented limit to prevent allocating huge memory
  isdn: mISDN: netjet: fix wrong check of device registration
  ...
2022-11-03 10:51:59 -07:00
Linus Torvalds
4d74039149 Merge tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Fix an endian thinko in the asm-generic compat_arg_u64() which led to
   syscall arguments being swapped for some compat syscalls.

 - Fix syscall wrapper handling of syscalls with 64-bit arguments on
   32-bit kernels, which led to syscall arguments being misplaced.

 - A build fix for amdgpu on Book3E with AltiVec disabled.

Thanks to Andreas Schwab, Christian Zigotzky, and Arnd Bergmann.

* tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Select ARCH_SPLIT_ARG64
  powerpc/32: fix syscall wrappers with 64-bit arguments
  asm-generic: compat: fix compat_arg_u64() and compat_arg_u64_dual()
  powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec
2022-11-03 10:27:28 -07:00
Daniel Machon
6182d5875c net: dcb: add new apptrust attribute
Add new apptrust extension attributes to the 8021Qaz APP managed object.

Two new attributes, DCB_ATTR_DCB_APP_TRUST_TABLE and
DCB_ATTR_DCB_APP_TRUST, has been added. Trusted selectors are passed in
the nested attribute DCB_ATTR_DCB_APP_TRUST, in order of precedence.

The new attributes are meant to allow drivers, whose hw supports the
notion of trust, to be able to set whether a particular app selector is
trusted - and in which order.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-03 15:16:50 +01:00
Daniel Machon
ec32c0c42d net: dcb: add new pcp selector to app object
Add new PCP selector for the 8021Qaz APP managed object.

As the PCP selector is not part of the 8021Qaz standard, a new non-std
extension attribute DCB_ATTR_DCB_APP has been introduced. Also two
helper functions to translate between selector and app attribute type
has been added. The new selector has been given a value of 255, to
minimize the risk of future overlap of std- and non-std attributes.

The new DCB_ATTR_DCB_APP is sent alongside the ieee std attribute in the
app table. This means that the dcb_app struct can now both contain std-
and non-std app attributes. Currently there is no overlap between the
selector values of the two attributes.

The purpose of adding the PCP selector, is to be able to offload
PCP-based queue classification to the 8021Q Priority Code Point table,
see 6.9.3 of IEEE Std 802.1Q-2018.

PCP and DEI is encoded in the protocol field as 8*dei+pcp, so that a
mapping of PCP 2 and DEI 1 to priority 3 is encoded as {255, 10, 3}.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-03 15:16:50 +01:00
Jiri Slaby (SUSE)
777fa87c76 bonding (gcc13): synchronize bond_{a,t}lb_xmit() types
Both bond_alb_xmit() and bond_tlb_xmit() produce a valid warning with
gcc-13:
  drivers/net/bonding/bond_alb.c:1409:13: error: conflicting types for 'bond_tlb_xmit' due to enum/integer mismatch; have 'netdev_tx_t(struct sk_buff *, struct net_device *)' ...
  include/net/bond_alb.h:160:5: note: previous declaration of 'bond_tlb_xmit' with type 'int(struct sk_buff *, struct net_device *)'

  drivers/net/bonding/bond_alb.c:1523:13: error: conflicting types for 'bond_alb_xmit' due to enum/integer mismatch; have 'netdev_tx_t(struct sk_buff *, struct net_device *)' ...
  include/net/bond_alb.h:159:5: note: previous declaration of 'bond_alb_xmit' with type 'int(struct sk_buff *, struct net_device *)'

I.e. the return type of the declaration is int, while the definitions
spell netdev_tx_t. Synchronize both of them to the latter.

Cc: Martin Liska <mliska@suse.cz>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221031114409.10417-1-jirislaby@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-02 20:38:13 -07:00
Jakub Kicinski
b54a0d4094 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2022-11-02

We've added 70 non-merge commits during the last 14 day(s) which contain
a total of 96 files changed, 3203 insertions(+), 640 deletions(-).

The main changes are:

1) Make cgroup local storage available to non-cgroup attached BPF programs
   such as tc BPF ones, from Yonghong Song.

2) Avoid unnecessary deadlock detection and failures wrt BPF task storage
   helpers, from Martin KaFai Lau.

3) Add LLVM disassembler as default library for dumping JITed code
   in bpftool, from Quentin Monnet.

4) Various kprobe_multi_link fixes related to kernel modules,
   from Jiri Olsa.

5) Optimize x86-64 JIT with emitting BMI2-based shift instructions,
   from Jie Meng.

6) Improve BPF verifier's memory type compatibility for map key/value
   arguments, from Dave Marchevsky.

7) Only create mmap-able data section maps in libbpf when data is exposed
   via skeletons, from Andrii Nakryiko.

8) Add an autoattach option for bpftool to load all object assets,
   from Wang Yufen.

9) Various memory handling fixes for libbpf and BPF selftests,
   from Xu Kuohai.

10) Initial support for BPF selftest's vmtest.sh on arm64,
    from Manu Bretelle.

11) Improve libbpf's BTF handling to dedup identical structs,
    from Alan Maguire.

12) Add BPF CI and denylist documentation for BPF selftests,
    from Daniel Müller.

13) Check BPF cpumap max_entries before doing allocation work,
    from Florian Lehner.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits)
  samples/bpf: Fix typo in README
  bpf: Remove the obsolte u64_stats_fetch_*_irq() users.
  bpf: check max_entries before allocating memory
  bpf: Fix a typo in comment for DFS algorithm
  bpftool: Fix spelling mistake "disasembler" -> "disassembler"
  selftests/bpf: Fix bpftool synctypes checking failure
  selftests/bpf: Panic on hard/soft lockup
  docs/bpf: Add documentation for new cgroup local storage
  selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x
  selftests/bpf: Add selftests for new cgroup local storage
  selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str
  bpftool: Support new cgroup local storage
  libbpf: Support new cgroup local storage
  bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
  bpf: Refactor some inode/task/sk storage functions for reuse
  bpf: Make struct cgroup btf id global
  selftests/bpf: Tracing prog can still do lookup under busy lock
  selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection
  bpf: Add new bpf_task_storage_delete proto with no deadlock detection
  bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check
  ...
====================

Link: https://lore.kernel.org/r/20221102062120.5724-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-02 08:18:27 -07:00
Shane Parslow
d08b0f8f46 net: wwan: iosm: add rpc interface for xmm modems
Add a new iosm wwan port that connects to the modem rpc interface. This
interface provides a configuration channel, and in the case of the 7360, is
the only way to configure the modem (as it does not support mbim).

The new interface is compatible with existing software, such as
open_xdatachannel.py from the xmm7360-pci project [1].

[1] https://github.com/xmm7360/xmm7360-pci

Signed-off-by: Shane Parslow <shaneparslow808@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-02 11:51:03 +00:00
Florian Westphal
ecaf75ffd5 netlink: introduce bigendian integer types
Jakub reported that the addition of the "network_byte_order"
member in struct nla_policy increases size of 32bit platforms.

Instead of scraping the bit from elsewhere Johannes suggested
to add explicit NLA_BE types instead, so do this here.

NLA_POLICY_MAX_BE() macro is removed again, there is no need
for it: NLA_POLICY_MAX(NLA_BE.., ..) will do the right thing.

NLA_BE64 can be added later.

Fixes: 08724ef699 ("netlink: introduce NLA_POLICY_MAX_BE")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20221031123407.9158-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-01 21:29:06 -07:00
Linus Torvalds
f526d6a822 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "x86:

   - fix lock initialization race in gfn-to-pfn cache (+selftests)

   - fix two refcounting errors

   - emulator fixes

   - mask off reserved bits in CPUID

   - fix bug with disabling SGX

  RISC-V:

   - update MAINTAINERS"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign()
  KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format
  KVM: x86: emulator: update the emulation mode after CR0 write
  KVM: x86: emulator: update the emulation mode after rsm
  KVM: x86: emulator: introduce emulator_recalc_and_set_mode
  KVM: x86: emulator: em_sysexit should update ctxt->mode
  KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test
  KVM: selftests: Add tests in xen_shinfo_test to detect lock races
  KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache
  KVM: Initialize gfn_to_pfn_cache locks in dedicated helper
  KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable
  KVM: x86: Exempt pending triple fault from event injection sanity check
  MAINTAINERS: git://github -> https://github.com for kvm-riscv
  KVM: debugfs: Return retval of simple_attr_open() if it fails
  KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open()
  KVM: x86: Mask off reserved bits in CPUID.8000001FH
  KVM: x86: Mask off reserved bits in CPUID.8000001AH
  KVM: x86: Mask off reserved bits in CPUID.80000008H
  KVM: x86: Mask off reserved bits in CPUID.80000006H
  KVM: x86: Mask off reserved bits in CPUID.80000001H
2022-11-01 12:28:52 -07:00
Eric Dumazet
3bdfb04f13 net: dropreason: add SKB_DROP_REASON_FRAG_TOO_FAR
IPv4 reassembly unit can decide to drop frags based on
/proc/sys/net/ipv4/ipfrag_max_dist sysctl.

Add a specific drop reason to track this specific
and weird case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31 20:14:27 -07:00
Eric Dumazet
77adfd3a1d net: dropreason: add SKB_DROP_REASON_FRAG_REASM_TIMEOUT
Used to track skbs freed after a timeout happened
in a reassmbly unit.

Passing a @reason argument to inet_frag_rbtree_purge()
allows to use correct consumed status for frags
that have been successfully re-assembled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31 20:14:27 -07:00
Eric Dumazet
4ecbb1c27c net: dropreason: add SKB_DROP_REASON_DUP_FRAG
This is used to track when a duplicate segment received by various
reassembly units is dropped.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31 20:14:26 -07:00
Eric Dumazet
0e84afe8eb net: dropreason: add SKB_CONSUMED reason
This will allow to simply use in the future:

	kfree_skb_reason(skb, reason);

Instead of repeating sequences like:

	if (dropped)
	    kfree_skb_reason(skb, reason);
	else
	    consume_skb(skb);

For instance, following patch in the series is adding
@reason to skb_release_data() and skb_release_all(),
so that we can propagate a meaningful @reason whenever
consume_skb()/kfree_skb() have to take care of a potential frag_list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31 20:14:26 -07:00
Hangbin Liu
f3a63cce1b rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link
This patch use the new helper unregister_netdevice_many_notify() for
rtnl_delete_link(), so that the kernel could reply unicast when userspace
 set NLM_F_ECHO flag to request the new created interface info.

At the same time, the parameters of rtnl_delete_link() need to be updated
since we need nlmsghdr and portid info.

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31 18:10:21 -07:00
Hangbin Liu
1d997f1013 rtnetlink: pass netlink message header and portid to rtnl_configure_link()
This patch pass netlink message header and portid to rtnl_configure_link()
All the functions in this call chain need to add the parameters so we can
use them in the last call rtnl_notify(), and notify the userspace about
the new link info if NLM_F_ECHO flag is set.

- rtnl_configure_link()
  - __dev_notify_flags()
    - rtmsg_ifinfo()
      - rtmsg_ifinfo_event()
        - rtmsg_ifinfo_build_skb()
        - rtmsg_ifinfo_send()
	  - rtnl_notify()

Also move __dev_notify_flags() declaration to net/core/dev.h, as Jakub
suggested.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31 18:10:21 -07:00
Andreas Schwab
40ff214328 asm-generic: compat: fix compat_arg_u64() and compat_arg_u64_dual()
The macros are defined backwards.

This affects the following compat syscalls:
 - compat_sys_truncate64()
 - compat_sys_ftruncate64()
 - compat_sys_fallocate()
 - compat_sys_sync_file_range()
 - compat_sys_fadvise64_64()
 - compat_sys_readahead()
 - compat_sys_pread64()
 - compat_sys_pwrite64()

Fixes: 43d5de2b67 ("asm-generic: compat: Support BE for long long args in 32-bit ABIs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Add list of affected syscalls]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/871qqoyvni.fsf_-_@igel.home
2022-11-01 10:20:11 +11:00
Jacob Keller
1060707e38 ptp: introduce helpers to adjust by scaled parts per million
Many drivers implement the .adjfreq or .adjfine PTP op function with the
same basic logic:

  1. Determine a base frequency value
  2. Multiply this by the abs() of the requested adjustment, then divide by
     the appropriate divisor (1 billion, or 65,536 billion).
  3. Add or subtract this difference from the base frequency to calculate a
     new adjustment.

A few drivers need the difference and direction rather than the combined
new increment value.

I recently converted the Intel drivers to .adjfine and the scaled parts per
million (65.536 parts per billion) logic. To avoid overflow with minimal
loss of precision, mul_u64_u64_div_u64 was used.

The basic logic used by all of these drivers is very similar, and leads to
a lot of duplicate code to perform the same task.

Rather than keep this duplicate code, introduce diff_by_scaled_ppm and
adjust_by_scaled_ppm. These helper functions calculate the difference or
adjustment necessary based on the scaled parts per million input.

The diff_by_scaled_ppm function returns true if the difference should be
subtracted, and false otherwise.

Update the Intel drivers to use the new helper functions. Other vendor
drivers will be converted to .adjfine and this helper function in the
following changes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31 11:14:16 +00:00
Jacob Keller
b9a61b9779 ptp: add missing documentation for parameters
The ptp_find_pin_unlocked function and the ptp_system_timestamp structure
didn't document their parameters and fields. Fix this.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31 11:14:16 +00:00
Jakub Kicinski
8c2a535e08 net: geneve: fix array of flexible structures warnings
New compilers don't like flexible array of flexible structs:

  include/net/geneve.h:62:34: warning: array of flexible structures

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31 10:43:04 +00:00
Jakub Kicinski
738136a0e3 netlink: split up copies in the ack construction
Clean up the use of unsafe_memcpy() by adding a flexible array
at the end of netlink message header and splitting up the header
and data copies.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31 09:13:10 +00:00
Juhee Kang
f3fb589aeb net: remove unused netdev_unregistering()
Currently, use dev->reg_state == NETREG_UNREGISTERING to check the status
which is NETREG_UNREGISTERING, rather than using netdev_unregistering.
Also, A helper function which is netdev_unregistering on nedevice.h is no
longer used. Thus, netdev_unregistering removes from netdevice.h.

Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-30 21:56:39 +00:00
Linus Torvalds
b72018ab82 Merge tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
 "A use-after-free bugfix in the smscufx driver and various minor error
  path fixes, smaller build fixes, sysfs fixes and typos in comments in
  the stifb, sisfb, da8xxfb, xilinxfb, sm501fb, gbefb and cyber2000fb
  drivers"

* tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: cyber2000fb: fix missing pci_disable_device()
  fbdev: sisfb: use explicitly signed char
  fbdev: smscufx: Fix several use-after-free bugs
  fbdev: xilinxfb: Make xilinxfb_release() return void
  fbdev: sisfb: fix repeated word in comment
  fbdev: gbefb: Convert sysfs snprintf to sysfs_emit
  fbdev: sm501fb: Convert sysfs snprintf to sysfs_emit
  fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards
  fbdev: da8xx-fb: Fix error handling in .remove()
  fbdev: MIPS supports iomem addresses
2022-10-30 11:31:14 -07:00
Linus Torvalds
9f127546bb Merge tag 'char-misc-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
 "Some small driver fixes for 6.1-rc3.  They include:

   - iio driver bugfixes

   - counter driver bugfixes

   - coresight bugfixes, including a revert and then a second fix to get
     it right.

  All of these have been in linux-next with no reported problems"

* tag 'char-misc-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
  misc: sgi-gru: use explicitly signed char
  coresight: cti: Fix hang in cti_disable_hw()
  Revert "coresight: cti: Fix hang in cti_disable_hw()"
  counter: 104-quad-8: Fix race getting function mode and direction
  counter: microchip-tcb-capture: Handle Signal1 read and Synapse
  coresight: cti: Fix hang in cti_disable_hw()
  coresight: Fix possible deadlock with lock dependency
  counter: ti-ecap-capture: fix IS_ERR() vs NULL check
  counter: Reduce DEFINE_COUNTER_ARRAY_POLARITY() to defining counter_array
  iio: bmc150-accel-core: Fix unsafe buffer attributes
  iio: adxl367: Fix unsafe buffer attributes
  iio: adxl372: Fix unsafe buffer attributes
  iio: at91-sama5d2_adc: Fix unsafe buffer attributes
  iio: temperature: ltc2983: allocate iio channels once
  tools: iio: iio_utils: fix digit calculation
  iio: adc: stm32-adc: fix channel sampling time init
  iio: adc: mcp3911: mask out device ID in debug prints
  iio: adc: mcp3911: use correct id bits
  iio: adc: mcp3911: return proper error code on failure to allocate trigger
  iio: adc: mcp3911: fix sizeof() vs ARRAY_SIZE() bug
  ...
2022-10-30 11:22:33 -07:00
Linus Torvalds
434766058e Merge tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Rename a perf memory level event define to denote it is of CXL type

 - Add Alder and Raptor Lakes support to RAPL

 - Make sure raw sample data is output with tracepoints

* tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
  perf/x86/rapl: Add support for Intel Raptor Lake
  perf/x86/rapl: Add support for Intel AlderLake-N
  perf: Fix missing raw data on tracepoint events
2022-10-30 09:49:18 -07:00
Linus Torvalds
c6e0e874a8 Merge tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
      - make the multipath dma alignment match the non-multipath one
        (Keith Busch)
      - fix a bogus use of sg_init_marker() (Nam Cao)
      - fix circulr locking in nvme-tcp (Sagi Grimberg)

 - Initialization fix for requests allocated via the special hw queue
   allocator (John)

 - Fix for a regression added in this release with the batched
   completions of end_io backed requests (Ming)

 - Error handling leak fix for rbd (Yang)

 - Error handling leak fix for add_disk() failure (Yu)

* tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux:
  blk-mq: Properly init requests from blk_mq_alloc_request_hctx()
  blk-mq: don't add non-pt request with ->end_io to batch
  rbd: fix possible memory leak in rbd_sysfs_init()
  nvme-multipath: set queue dma alignment to 3
  nvme-tcp: fix possible circular locking when deleting a controller under memory pressure
  nvme-tcp: replace sg_init_marker() with sg_init_table()
  block: fix memory leak for elevator on add_disk failure
2022-10-29 18:06:52 -07:00
Linus Torvalds
3c339dbd13 Merge tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
 "Eight fix pre-6.0 bugs and the remainder address issues which were
  introduced in the 6.1-rc merge cycle, or address issues which aren't
  considered sufficiently serious to warrant a -stable backport"

* tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
  lib: maple_tree: remove unneeded initialization in mtree_range_walk()
  mmap: fix remap_file_pages() regression
  mm/shmem: ensure proper fallback if page faults
  mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
  x86: fortify: kmsan: fix KMSAN fortify builds
  x86: asm: make sure __put_user_size() evaluates pointer once
  Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
  x86/purgatory: disable KMSAN instrumentation
  mm: kmsan: export kmsan_copy_page_meta()
  mm: migrate: fix return value if all subpages of THPs are migrated successfully
  mm/uffd: fix vma check on userfault for wp
  mm: prep_compound_tail() clear page->private
  mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
  mm/page_isolation: fix clang deadcode warning
  fs/ext4/super.c: remove unused `deprecated_msg'
  ipc/msg.c: fix percpu_counter use after free
  memory tier, sysfs: rename attribute "nodes" to "nodelist"
  MAINTAINERS: git://github.com -> https://github.com for nilfs2
  mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
  ...
2022-10-29 17:49:33 -07:00
Willem de Bruijn
58ba426388 net/packet: add PACKET_FANOUT_FLAG_IGNORE_OUTGOING
Extend packet socket option PACKET_IGNORE_OUTGOING to fanout groups.

The socket option sets ptype.ignore_outgoing, which makes
dev_queue_xmit_nit skip the socket.

When the socket joins a fanout group, the option is not reflected in
the struct ptype of the group. dev_queue_xmit_nit only tests the
fanout ptype, so the flag is ignored once a socket joins a
fanout group.

Inheriting the option from a socket would change established behavior.
Different sockets in the group can set different flags, and can also
change them at runtime.

Testing in packet_rcv_fanout defeats the purpose of the original
patch, which is to avoid skb_clone in dev_queue_xmit_nit (esp. for
MSG_ZEROCOPY packets).

Instead, introduce a new fanout group flag with the same behavior.

Tested with https://github.com/wdebruij/kerneltools/blob/master/tests/test_psock_fanout_ignore_outgoing.c

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221027211014.3581513-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 22:00:49 -07:00
Russell King (Oracle)
d83845d224 net: sfp: move field definitions along side register index
Just as we do for the A2h enum, arrange the A0h enum to have the
field definitions next to their corresponding register index.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 21:56:18 -07:00
Russell King (Oracle)
17dd361119 net: sfp: convert register indexes from hex to decimal
The register indexes in the standards are in decimal rather than hex,
so lets specify them in decimal in the header file so we can easily
cross-reference without converting between hex and decimal.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 21:56:18 -07:00
Russell King (Oracle)
9c5a170677 net: phylink: add phylink_get_link_timer_ns() helper
Add a helper to convert the PHY interface mode to the required link
timer setting as stated by the appropriate standard. Inappropriate
interface modes return an error.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 21:48:30 -07:00
Pavel Begunkov
fee9ac0664 net: remove SOCK_SUPPORT_ZC from sockmap
sockmap replaces ->sk_prot with its own callbacks, we should remove
SOCK_SUPPORT_ZC as the new proto doesn't support msghdr::ubuf_info.

Cc: <stable@vger.kernel.org> # 6.0
Reported-by: Jakub Kicinski <kuba@kernel.org>
Fixes: e993ffe3da ("net: flag sockets supporting msghdr originated zerocopy")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 20:21:25 -07:00
Jakub Kicinski
7354c9024f netlink: hide validation union fields from kdoc
Mark the validation fields as private, users shouldn't set
them directly and they are too complicated to explain in
a more succinct way (there's already a long explanation
in the comment above).

The strict_start_type field is set directly and has a dedicated
comment so move that above the "private" section.

Link: https://lore.kernel.org/r/20221027212107.2639255-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 20:19:15 -07:00
Jakub Kicinski
196dd92a00 Kalle Valo says:
====================
pull-request: wireless-next-2022-10-28

First set of patches v6.2. mac80211 refactoring continues for Wi-Fi 7.
All mac80211 driver are now converted to use internal TX queues, this
might cause some regressions so we wanted to do this early in the
cycle.

Note: wireless tree was merged[1] to wireless-next to avoid some
conflicts with mac80211 patches between the trees. Unfortunately there
are still two smaller conflicts in net/mac80211/util.c which Stephen
also reported[2]. In the first conflict initialise scratch_len to
"params->scratch_len ?: 3 * params->len" (note number 3, not 2!) and
in the second conflict take the version which uses elems->scratch_pos.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/commit/?id=dfd2d876b3fda1790bc0239ba4c6967e25d16e91
[2] https://lore.kernel.org/all/20221020032340.5cf101c0@canb.auug.org.au/

mac80211
 - preparation for Wi-Fi 7 Multi-Link Operation (MLO) continues
 - add API to show the link STAs in debugfs
 - all mac80211 drivers are now using mac80211 internal TX queues (iTXQs)

rtw89
 - support 8852BE

rtl8xxxu
 - support RTL8188FU

brmfmac
 - support two station interfaces concurrently

bcma
 - support SPROM rev 11
====================

Link: https://lore.kernel.org/r/20221028132943.304ECC433B5@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 18:31:40 -07:00
Alexander Potapenko
78a498c3a2 x86: fortify: kmsan: fix KMSAN fortify builds
Ensure that KMSAN builds replace memset/memcpy/memmove calls with the
respective __msan_XXX functions, and that none of the macros are redefined
twice.  This should allow building kernel with both CONFIG_KMSAN and
CONFIG_FORTIFY_SOURCE.

Link: https://lkml.kernel.org/r/20221024212144.2852069-5-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:23 -07:00
Peter Xu
67eae54bc2 mm/uffd: fix vma check on userfault for wp
We used to have a report that pte-marker code can be reached even when
uffd-wp is not compiled in for file memories, here:

https://lore.kernel.org/all/YzeR+R6b4bwBlBHh@x1n/T/#u

I just got time to revisit this and found that the root cause is we simply
messed up with the vma check, so that for !PTE_MARKER_UFFD_WP system, we
will allow UFFDIO_REGISTER of MINOR & WP upon shmem as the check was
wrong:

    if (vm_flags & VM_UFFD_MINOR)
        return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);

Where we'll allow anything to pass on shmem as long as minor mode is
requested.

Axel did it right when introducing minor mode but I messed it up in
b1f9e87686 when moving code around.  Fix it.

Link: https://lkml.kernel.org/r/20221024193336.1233616-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20221024193336.1233616-2-peterx@redhat.com
Fixes: b1f9e87686 ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:22 -07:00
Linus Torvalds
f186fd2f5a Merge tag 'sound-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A collection of small fixes:

   - fixes for regressions by the recent ALSA control hash usages

   - fixes for UAF with del_timer() at removals in a few drivers

   - char signedness fixes

   - a few memory leak fixes in error paths

   - device-specific fixes / quirks for Intel SOF, AMD, HD-audio,
     USB-audio, and various ASoC codecs"

* tag 'sound-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (50 commits)
  ALSA: aoa: Fix I2S device accounting
  ALSA: Use del_timer_sync() before freeing timer
  ALSA: aoa: i2sbus: fix possible memory leak in i2sbus_add_dev()
  ALSA: rme9652: use explicitly signed char
  ALSA: au88x0: use explicitly signed char
  ALSA: hda/realtek: Add another HP ZBook G9 model quirks
  ALSA: usb-audio: Add quirks for M-Audio Fast Track C400/600
  ASoC: SOF: Intel: hda-codec: fix possible memory leak in hda_codec_device_init()
  ASoC: amd: yc: Add Lenovo Thinkbook 14+ 2022 21D0 to quirks table
  ASoC: Intel: Skylake: fix possible memory leak in skl_codec_device_init()
  ALSA: ac97: Use snd_ctl_rename() to rename a control
  ALSA: ca0106: Use snd_ctl_rename() to rename a control
  ALSA: emu10k1: Use snd_ctl_rename() to rename a control
  ALSA: hda/realtek: Use snd_ctl_rename() to rename a control
  ALSA: usb-audio: Use snd_ctl_rename() to rename a control
  ALSA: control: add snd_ctl_rename()
  ALSA: ac97: fix possible memory leak in snd_ac97_dev_register()
  ASoC: SOF: Intel: pci-tgl: fix ADL-N descriptor
  ASoC: qcom: lpass-cpu: Mark HDMI TX parity register as volatile
  ASoC: amd: yc: Adding Lenovo ThinkBook 14 Gen 4+ ARA and Lenovo ThinkBook 16 Gen 4+ ARA to the Quirks List
  ...
2022-10-28 12:20:31 -07:00
Linus Torvalds
e3493d6825 Merge tag 'drm-fixes-2022-10-28' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Regularly scheduled fixes for drm, live from a Red Hat office for the
  first time in a while.

  The core has two fixes, one for scheduler leak and one for aperture
  uninit read.

  Otherwise a single bridge fix, and msm, amdgpu/kfd and i915 have a set
  of fixes each.

  sched:
   - Stop leaking fences when killing a sched entity.

  aperture:
   - Avoid uninitialized read in aperture_remove_conflicting_pci_device()

  bridge:
   - Fix HPD on bridge/ps8640.

  msm:
   - Fix shrinker deadlock
   - Fix crash during suspend after unbind
   - Fix IRQ lifetime issues
   - Fix potential memory corruption with too many bridges
   - Fix memory corruption on GPU state capture

  amdgpu:
   - Stable pstate fix
   - SMU 13.x updates
   - SR-IOV fixes
   - PCI AER fix
   - GC 11.x fixes
   - Display fixes
   - Expose IMU firmware version for debugging
   - Plane modifier fix
   - S0i3 fix

  amdkfd:
   - Fix possible memory leak
   - Fix GC 10.x cache info reporting

  i915:
   - Extend Wa_1607297627 to Alderlake-P
   - Keep PCI autosuspend control 'on' by default on all dGPU
   - Reset frl trained flag before restarting FRL training"

* tag 'drm-fixes-2022-10-28' of git://anongit.freedesktop.org/drm/drm: (39 commits)
  fbdev/core: Avoid uninitialized read in aperture_remove_conflicting_pci_device()
  drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume
  drm/scheduler: fix fence ref counting
  drm/amd/display: Revert logic for plane modifiers
  drm/amdkfd: correct the cache info for gfx1036
  drm/amdkfd: update gfx1037 Lx cache setting
  drm/amdgpu: skip mes self test for gc 11.0.3 in recover
  drm/amd: Add IMU fw version to fw version queries
  drm/amd/display: Don't return false if no stream
  drm/amd/display: Remove wrong pipe control lock
  drm/amd/pm: allow gfxoff on gc_11_0_3
  drm/amdkfd: Fix memory leak in kfd_mem_dmamap_userptr()
  drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x
  drm/i915/dp: Reset frl trained flag before restarting FRL training
  drm/i915/dgfx: Keep PCI autosuspend control 'on' by default on all dGPU
  drm/i915: Extend Wa_1607297627 to Alderlake-P
  drm/amdgpu: Adjust MES polling timeout for sriov
  drm/amd/pm: update driver-if header for smu_v13_0_10
  drm/amdgpu: fix pstate setting issue
  drm/bridge: ps8640: Add back the 50 ms mystery delay after HPD
  ...
2022-10-28 12:10:43 -07:00
Mubashir Adnan Qureshi
71fc704768 tcp: add rcv_wnd and plb_rehash to TCP_INFO
rcv_wnd can be useful to diagnose TCP performance where receiver window
becomes the bottleneck. rehash reports the PLB and timeout triggered
rehash attempts by the TCP connection.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Mubashir Adnan Qureshi
29c1c44646 tcp: add u32 counter in tcp_sock and an SNMP counter for PLB
A u32 counter is added to tcp_sock for counting the number of PLB
triggered rehashes for a TCP connection. An SNMP counter is also
added to count overall PLB triggered rehash events for a host. These
counters are hooked up to PLB implementation for DCTCP.

TCP_NLA_REHASH is added to SCM_TIMESTAMPING_OPT_STATS that reports
the rehash attempts triggered due to PLB or timeouts. This gives
a historical view of sustained congestion or timeouts experienced
by the TCP connection.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Mubashir Adnan Qureshi
1a91bb7c3e tcp: add PLB functionality for TCP
Congestion control algorithms track PLB state and cause the connection
to trigger a path change when either of the 2 conditions is satisfied:

- No packets are in flight and (# consecutive congested rounds >=
  sysctl_tcp_plb_idle_rehash_rounds)
- (# consecutive congested rounds >= sysctl_tcp_plb_rehash_rounds)

A round (RTT) is marked as congested when congestion signal
(ECN ce_ratio) over an RTT is greater than sysctl_tcp_plb_cong_thresh.
In the event of RTO, PLB (via tcp_write_timeout()) triggers a path
change and disables congestion-triggered path changes for random time
between (sysctl_tcp_plb_suspend_rto_sec, 2*sysctl_tcp_plb_suspend_rto_sec)
to avoid hopping onto the "connectivity blackhole". RTO-triggered
path changes can still happen during this cool-off period.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Mubashir Adnan Qureshi
bd456f283b tcp: add sysctls for TCP PLB parameters
PLB (Protective Load Balancing) is a host based mechanism for load
balancing across switch links. It leverages congestion signals(e.g. ECN)
from transport layer to randomly change the path of the connection
experiencing congestion. PLB changes the path of the connection by
changing the outgoing IPv6 flow label for IPv6 connections (implemented
in Linux by calling sk_rethink_txhash()). Because of this implementation
mechanism, PLB can currently only work for IPv6 traffic. For more
information, see the SIGCOMM 2022 paper:
  https://doi.org/10.1145/3544216.3544226

This commit adds new sysctl knobs and sets their default values for
TCP PLB.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Jakub Kicinski
12dee519d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Move struct nft_payload_set definition to .c file where it is
   only used.

2) Shrink transport and inner header offset fields in the nft_pktinfo
   structure to 16-bits, from Florian Westphal.

3) Get rid of nft_objref Kbuild toggle, make it built-in into
   nf_tables. This expression is used to instantiate conntrack helpers
   in nftables. After removing the conntrack helper auto-assignment
   toggle it this feature became more important so move it to the nf_tables
   core module. Also from Florian.

4) Extend the existing function to calculate payload inner header offset
   to deal with the GRE and IPIP transport protocols.

6) Add inner expression support for nf_tables. This new expression
   provides a packet parser for tunneled packets which uses a userspace
   description of the expected inner headers. The inner expression
   invokes the payload expression (via direct call) to match on the
   inner header protocol fields using the inner link, network and
   transport header offsets.

   An example of the bytecode generated from userspace to match on
   IP source encapsulated in a VxLAN packet:

   # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
     netdev x y
       [ meta load l4proto => reg 1 ]
       [ cmp eq reg 1 0x00000011 ]
       [ payload load 2b @ transport header + 2 => reg 1 ]
       [ cmp eq reg 1 0x0000b512 ]
       [ inner type vxlan hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
       [ cmp eq reg 1 0x00000008 ]
       [ inner type vxlan hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
       [ cmp eq reg 1 0x04030201 ]

7) Store inner link, network and transport header offsets in percpu
   area to parse inner packet header once only. Matching on a different
   tunnel type invalidates existing offsets in the percpu area and it
   invokes the inner tunnel parser again.

8) Add support for inner meta matching. This support for
   NFTA_META_PROTOCOL, which specifies the inner ethertype, and
   NFT_META_L4PROTO, which specifies the inner transport protocol.

9) Extend nft_inner to parse GENEVE optional fields to calculate the
   link layer offset.

10) Update inner expression so tunnel offset points to GRE header
    to normalize tunnel header handling. This also allows to perform
    different interpretations of the GRE header from userspace.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nft_inner: set tunnel offset to GRE header offset
  netfilter: nft_inner: add geneve support
  netfilter: nft_meta: add inner match support
  netfilter: nft_inner: add percpu inner context
  netfilter: nft_inner: support for inner tunnel header matching
  netfilter: nft_payload: access ipip payload for inner offset
  netfilter: nft_payload: access GRE payload via inner offset
  netfilter: nft_objref: make it builtin
  netfilter: nf_tables: reduce nft_pktinfo by 8 bytes
  netfilter: nft_payload: move struct nft_payload_set definition where it belongs
====================

Link: https://lore.kernel.org/r/20221026132227.3287-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:41:05 -07:00
Jakub Kicinski
31f1aa4f74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
  2871edb32f ("can: kvaser_usb: Fix possible completions during init_completion")
  abb8670938 ("can: kvaser_usb_leaf: Ignore stale bus-off after start")
  8d21f5927a ("can: kvaser_usb_leaf: Fix improved state not being reported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 16:56:36 -07:00
Linus Torvalds
2375886721 Merge tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from 802.15.4 (Zigbee et al).

  Current release - regressions:

   - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1

  Current release - new code bugs:

   - mptcp: fix abba deadlock on fastopen

   - eth: stmmac: rk3588: allow multiple gmac controllers in one system

  Previous releases - regressions:

   - ip: rework the fix for dflt addr selection for connected nexthop

   - net: couple more fixes for misinterpreting bits in struct page
     after the signature was added

  Previous releases - always broken:

   - ipv6: ensure sane device mtu in tunnels

   - openvswitch: switch from WARN to pr_warn on a user-triggerable path

   - ethtool: eeprom: fix null-deref on genl_info in dump

   - ieee802154: more return code fixes for corner cases in
     dgram_sendmsg

   - mac802154: fix link-quality-indicator recording

   - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack
     offload

   - eth: fec: limit register access on i.MX6UL

   - eth: bcm4908_enet: update TX stats after actual transmission

   - can: rcar_canfd: improve IRQ handling for RZ/G2L

  Misc:

   - genetlink: piggy back on the newly added resv_op_start to enforce
     more sanity checks on new commands"

* tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  net: enetc: survive memory pressure without crashing
  kcm: do not sense pfmemalloc status in kcm_sendpage()
  net: do not sense pfmemalloc status in skb_append_pagefrags()
  net/mlx5e: Fix macsec sci endianness at rx sa update
  net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function
  net/mlx5e: Fix macsec rx security association (SA) update/delete
  net/mlx5e: Fix macsec coverity issue at rx sa update
  net/mlx5: Fix crash during sync firmware reset
  net/mlx5: Update fw fatal reporter state on PCI handlers successful recover
  net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed
  net/mlx5e: TC, Reject forwarding from internal port to internal port
  net/mlx5: Fix possible use-after-free in async command interface
  net/mlx5: ASO, Create the ASO SQ with the correct timestamp format
  net/mlx5e: Update restore chain id for slow path packets
  net/mlx5e: Extend SKB room check to include PTP-SQ
  net/mlx5: DR, Fix matcher disconnect error flow
  net/mlx5: Wait for firmware to enable CRS before pci_restore_state
  net/mlx5e: Do not increment ESN when updating IPsec ESN state
  netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed
  netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed
  ...
2022-10-27 13:36:59 -07:00
Linus Torvalds
2eb72d85ac Merge tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:

 - Fix older Clang vs recent overflow KUnit test additions (Nick
   Desaulniers, Kees Cook)

 - Fix kern-doc visibility for overflow helpers (Kees Cook)

* tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  overflow: Refactor test skips for Clang-specific issues
  overflow: disable failing tests for older clang versions
  overflow: Fix kern-doc markup for functions
2022-10-27 12:31:57 -07:00