Commit Graph

135989 Commits

Author SHA1 Message Date
Jisheng Zhang
f05fa10901 kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2.

Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by
a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.

I only modified x86, arm, arm64 and riscv, other architectures such as
sh, powerpc and s390 are better to be kept kexec code as-is so they are
not touched.

This patch (of 5):

Make the forward declarations of crashk_res, crashk_low_res and
crash_notes always visible.  Code referring to these symbols can then just
check for IS_ENABLED(CONFIG_KEXEC_CORE), instead of requiring conditional
compilation using an #ifdef, thus preparing to increase compile coverage
and simplify the code.

Link: https://lkml.kernel.org/r/20211206160514.2000-1-jszhang@kernel.org
Link: https://lkml.kernel.org/r/20211206160514.2000-2-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Randy Dunlap
abc7da58c4 init.h: improve __setup and early_param documentation
Igor noted in [1] that there are quite a few __setup() handling
functions that return incorrect values.  Doing this can be harmless, but
it can also cause strings to be added to init's argument or environment
list, polluting them.

Since __setup() handling and return values are not documented, first add
documentation for that.  Also add more documentation for early_param()
handling and return values.

For __setup() functions, returning 0 (not handled) has questionable
value if it is just a malformed option value, as in

  rodata=junk

since returning 0 would just cause "rodata=junk" to be added to init's
environment unnecessarily:

  Run /sbin/init as init process
    with arguments:
      /sbin/init
    with environment:
      HOME=/
      TERM=linux
      splash=native
      rodata=junk

Also, there are no recommendations on whether to print a warning when an
unknown parameter value is seen.  I am not addressing that here.

[1] lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru

Link: https://lkml.kernel.org/r/20220221050852.1147-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Andy Shevchenko
25cb5b7ac6 bitfield: add explicit inclusions to the example
It's not obvious that bitfield.h doesn't guarantee the bits.h inclusion
and the example in the former is confusing.  Some developers think that
it's okay to just include bitfield.h to get it working.  Change example
to explicitly include necessary headers in order to avoid confusion.

Link: https://lkml.kernel.org/r/20220207123341.47533-1-andriy.shevchenko@linux.intel.com
Fixes: 3e9b3112ec ("add basic register-field manipulation macros")
Depends-on: 8bd9cb51da ("locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a new <linux/bits.h> file")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Jan Dąbroś <jsd@semihalf.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Christophe Leroy
f334f5668b ilog2: force inlining of __ilog2_u32() and __ilog2_u64()
Building a kernel with CONFIG_CC_OPTIMISE_FOR_SIZE leads to
__ilog2_u32() being duplicated 50 times and __ilog2_u64() 3 times in
vmlinux on a tiny powerpc32 config.

__ilog2_u32() being 2 instructions it is not worth being kept out of
line, so force inlining.  Allthough the u64 version is a bit bigger,
there is still a small benefit in keeping it inlined.  On a 64 bits
config there's a real benefit.

With this change the size of vmlinux text is reduced by 1 kbytes, which
is approx 50% more than the size of the removed functions.

Before the patch there is for instance:

	c00d2a94 <__ilog2_u32>:
	c00d2a94:	7c 63 00 34 	cntlzw  r3,r3
	c00d2a98:	20 63 00 1f 	subfic  r3,r3,31
	c00d2a9c:	4e 80 00 20 	blr

	c00d36d8 <__order_base_2>:
	c00d36d8:	28 03 00 01 	cmplwi  r3,1
	c00d36dc:	40 81 00 2c 	ble     c00d3708 <__order_base_2+0x30>
	c00d36e0:	94 21 ff f0 	stwu    r1,-16(r1)
	c00d36e4:	7c 08 02 a6 	mflr    r0
	c00d36e8:	38 63 ff ff 	addi    r3,r3,-1
	c00d36ec:	90 01 00 14 	stw     r0,20(r1)
	c00d36f0:	4b ff f3 a5 	bl      c00d2a94 <__ilog2_u32>
	c00d36f4:	80 01 00 14 	lwz     r0,20(r1)
	c00d36f8:	38 63 00 01 	addi    r3,r3,1
	c00d36fc:	7c 08 03 a6 	mtlr    r0
	c00d3700:	38 21 00 10 	addi    r1,r1,16
	c00d3704:	4e 80 00 20 	blr
	c00d3708:	38 60 00 00 	li      r3,0
	c00d370c:	4e 80 00 20 	blr

With the patch it has become:

	c00d356c <__order_base_2>:
	c00d356c:	28 03 00 01 	cmplwi  r3,1
	c00d3570:	40 81 00 14 	ble     c00d3584 <__order_base_2+0x18>
	c00d3574:	38 63 ff ff 	addi    r3,r3,-1
	c00d3578:	7c 63 00 34 	cntlzw  r3,r3
	c00d357c:	20 63 00 20 	subfic  r3,r3,32
	c00d3580:	4e 80 00 20 	blr
	c00d3584:	38 60 00 00 	li      r3,0
	c00d3588:	4e 80 00 20 	blr

No more need for __order_base_2() to setup a stack frame and
save/restore caller address. And the following 'add 1' is
merged in the subtract.

Another typical use of it:

	c080ff28 <hugepagesz_setup>:
	...
	c080fff8:	7f c3 f3 78 	mr      r3,r30
	c080fffc:	4b 8f 81 f1 	bl      c01081ec <__ilog2_u32>
	c0810000:	38 63 ff f2 	addi    r3,r3,-14
	...

Becomes

	c080ff1c <hugepagesz_setup>:
	...
	c080ffec:	7f c3 00 34 	cntlzw  r3,r30
	c080fff0:	20 63 00 11 	subfic  r3,r3,17
	...

Here no need to move r30 argument to r3 then substract 14 to result.  Just
work on r30 and merge the 'sub 14' with the 'sub from 31'.

Link: https://lkml.kernel.org/r/803a2ac3d923ebcfd0dd40f5886b05cae7bb0aba.1644243860.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Rasmus Villemoes
14e83077d5 include: drop pointless __compiler_offsetof indirection
(1) compiler_types.h is unconditionally included via an -include flag
    (see scripts/Makefile.lib), and it defines __compiler_offsetof
    unconditionally.  So testing for definedness of __compiler_offsetof is
    mostly pointless.

(2) Every relevant compiler provides __builtin_offsetof (even sparse
    has had that for 14 years), and if for whatever reason one would end
    up picking up the poor man's fallback definition (C file compiler with
    completely custom CFLAGS?), newer clang versions won't treat the
    result as an Integer Constant Expression, so if used in place where
    such is required (static initializer or static_assert), one would get
    errors like

      t.c:11:16: error: static_assert expression is not an integral constant expression
      t.c:11:16: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression
      t.c:4:33: note: expanded from macro 'offsetof'
      #define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)

So just define offsetof unconditionally and directly in terms of
__builtin_offsetof.

Link: https://lkml.kernel.org/r/20220202102147.326672-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Bjorn Helgaas
179fd6ba3b Documentation/sparse: add hints about __CHECKER__
Several attributes depend on __CHECKER__, but previously there was no
clue in the tree about when __CHECKER__ might be defined.  Add hints at
the most common places (__kernel, __user, __iomem, __bitwise) and in the
sparse documentation.

Link: https://lkml.kernel.org/r/20220310220927.245704-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Bjorn Helgaas
c724c866bb linux/types.h: remove unnecessary __bitwise__
There are no users of "__bitwise__" except the definition of
"__bitwise".  Remove __bitwise__ and define __bitwise directly.

This is a follow-up to 05de97003c ("linux/types.h: enable endian
checks for all sparse builds").

[akpm@linux-foundation.org: change the tools/include/linux/types.h definition also]

Link: https://lkml.kernel.org/r/20220310220927.245704-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Linus Torvalds
551acdc3c3 Merge tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, ipsec, and wireless.

  A few last minute revert / disable and fix patches came down from our
  sub-trees. We're not waiting for any fixes at this point.

  Current release - regressions:

   - Revert "netfilter: nat: force port remap to prevent shadowing
     well-known ports", restore working conntrack on asymmetric paths

   - Revert "ath10k: drop beacon and probe response which leak from
     other channel", restore working AP and mesh mode on QCA9984

   - eth: intel: fix hang during reboot/shutdown

  Current release - new code bugs:

   - netfilter: nf_tables: disable register tracking, it needs more work
     to cover all corner cases

  Previous releases - regressions:

   - ipv6: fix skb_over_panic in __ip6_append_data when (admin-only)
     extension headers get specified

   - esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return
     value more selectively

   - bnx2x: fix driver load failure when FW not present in initrd

  Previous releases - always broken:

   - vsock: stop destroying unrelated sockets in nested virtualization

   - packet: fix slab-out-of-bounds access in packet_recvmsg()

  Misc:

   - add Paolo Abeni to networking maintainers!"

* tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits)
  iavf: Fix hang during reboot/shutdown
  net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload
  net: bcmgenet: skip invalid partial checksums
  bnx2x: fix built-in kernel driver load failure
  net: phy: mscc: Add MODULE_FIRMWARE macros
  net: dsa: Add missing of_node_put() in dsa_port_parse_of
  net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
  Revert "ath10k: drop beacon and probe response which leak from other channel"
  hv_netvsc: Add check for kvmalloc_array
  iavf: Fix double free in iavf_reset_task
  ice: destroy flow director filter mutex after releasing VSIs
  ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()
  Add Paolo Abeni to networking maintainers
  atm: eni: Add check for dma_map_single
  net/packet: fix slab-out-of-bounds access in packet_recvmsg()
  net: mdio: mscc-miim: fix duplicate debugfs entry
  net: phy: marvell: Fix invalid comparison in the resume and suspend functions
  esp6: fix check on ipv6_skip_exthdr's return value
  net: dsa: microchip: add spi_device_id tables
  netfilter: nf_tables: disable register tracking
  ...
2022-03-17 12:55:26 -07:00
Nicolas Dichtel
4ee06de772 net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
This kind of interface doesn't have a mac header. This patch fixes
bpf_redirect() to a PIM interface.

Fixes: 27b29f6305 ("bpf: add bpf_redirect() helper")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20220315092008.31423-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16 19:38:41 -07:00
Jakub Kicinski
15d703921f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net coming late
in the 5.17-rc process:

1) Revert port remap to mitigate shadowing service ports, this is causing
   problems in existing setups and this mitigation can be achieved with
   explicit ruleset, eg.

	... tcp sport < 16386 tcp dport >= 32768 masquerade random

  This patches provided a built-in policy similar to the one described above.

2) Disable register tracking infrastructure in nf_tables. Florian reported
   two issues:

   - Existing expressions with no implemented .reduce interface
     that causes data-store on register should cancel the tracking.
   - Register clobbering might be possible storing data on registers that
     are larger than 32-bits.

   This might lead to generating incorrect ruleset bytecode. These two
   issues are scheduled to be addressed in the next release cycle.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: disable register tracking
  Revert "netfilter: conntrack: tag conntracks picked up in local out hook"
  Revert "netfilter: nat: force port remap to prevent shadowing well-known ports"
====================

Link: https://lore.kernel.org/r/20220312220315.64531-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14 15:51:10 -07:00
Jiyong Park
8e6ed96376 vsock: each transport cycles only on its own sockets
When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.

There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there was h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outmost) host which are totally unrelated.

Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.

[1] https://android.googlesource.com/device/google/cuttlefish/

Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jiyong Park <jiyong@google.com>
Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11 23:14:19 -08:00
Linus Torvalds
93ce93587d Merge branch 'davidh' (fixes from David Howells)
Merge misc fixes from David Howells:
 "A set of patches for watch_queue filter issues noted by Jann. I've
  added in a cleanup patch from Christophe Jaillet to convert to using
  formal bitmap specifiers for the note allocation bitmap.

  Also two filesystem fixes (afs and cachefiles)"

* emailed patches from David Howells <dhowells@redhat.com>:
  cachefiles: Fix volume coherency attribute
  afs: Fix potential thrashing in afs writeback
  watch_queue: Make comment about setting ->defunct more accurate
  watch_queue: Fix lack of barrier/sync/lock between post and read
  watch_queue: Free the alloc bitmap when the watch_queue is torn down
  watch_queue: Fix the alloc bitmap size to reflect notes allocated
  watch_queue: Use the bitmap API when applicable
  watch_queue: Fix to always request a pow-of-2 pipe ring size
  watch_queue: Fix to release page in ->release()
  watch_queue, pipe: Free watchqueue state after clearing pipe ring
  watch_queue: Fix filter limit check
2022-03-11 10:28:32 -08:00
David Howells
413a4a6b0b cachefiles: Fix volume coherency attribute
A network filesystem may set coherency data on a volume cookie, and if
given, cachefiles will store this in an xattr on the directory in the
cache corresponding to the volume.

The function that sets the xattr just stores the contents of the volume
coherency buffer directly into the xattr, with nothing added; the
checking function, on the other hand, has a cut'n'paste error whereby it
tries to interpret the xattr contents as would be the xattr on an
ordinary file (using the cachefiles_xattr struct).  This results in a
failure to match the coherency data because the buffer ends up being
shifted by 18 bytes.

Fix this by defining a structure specifically for the volume xattr and
making both the setting and checking functions use it.

Since the volume coherency doesn't work if used, take the opportunity to
insert a reserved field for future use, set it to 0 and check that it is
0.  Log mismatch through the appropriate tracepoint.

Note that this only affects cifs; 9p, afs, ceph and nfs don't use the
volume coherency data at the moment.

Fixes: 32e150037d ("fscache, cachefiles: Store the volume coherency data")
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Steve French <smfrench@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11 10:24:37 -08:00
David Howells
c993ee0f9f watch_queue: Fix filter limit check
In watch_queue_set_filter(), there are a couple of places where we check
that the filter type value does not exceed what the type_filter bitmap
can hold.  One place calculates the number of bits by:

   if (tf[i].type >= sizeof(wfilter->type_filter) * 8)

which is fine, but the second does:

   if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG)

which is not.  This can lead to a couple of out-of-bounds writes due to
a too-large type:

 (1) __set_bit() on wfilter->type_filter
 (2) Writing more elements in wfilter->filters[] than we allocated.

Fix this by just using the proper WATCH_TYPE__NR instead, which is the
number of types we actually know about.

The bug may cause an oops looking something like:

  BUG: KASAN: slab-out-of-bounds in watch_queue_set_filter+0x659/0x740
  Write of size 4 at addr ffff88800d2c66bc by task watch_queue_oob/611
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x45/0x59
   print_address_description.constprop.0+0x1f/0x150
   ...
   kasan_report.cold+0x7f/0x11b
   ...
   watch_queue_set_filter+0x659/0x740
   ...
   __x64_sys_ioctl+0x127/0x190
   do_syscall_64+0x43/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  Allocated by task 611:
   kasan_save_stack+0x1e/0x40
   __kasan_kmalloc+0x81/0xa0
   watch_queue_set_filter+0x23a/0x740
   __x64_sys_ioctl+0x127/0x190
   do_syscall_64+0x43/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  The buggy address belongs to the object at ffff88800d2c66a0
   which belongs to the cache kmalloc-32 of size 32
  The buggy address is located 28 bytes inside of
   32-byte region [ffff88800d2c66a0, ffff88800d2c66c0)

Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11 10:17:12 -08:00
Linus Torvalds
186d32bbf0 Merge tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, and ipsec.

  Current release - regressions:

   - Bluetooth: fix unbalanced unlock in set_device_flags()

   - Bluetooth: fix not processing all entries on cmd_sync_work, make
     connect with qualcomm and intel adapters reliable

   - Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"

   - xdp: xdp_mem_allocator can be NULL in trace_mem_connect()

   - eth: ice: fix race condition and deadlock during interface enslave

  Current release - new code bugs:

   - tipc: fix incorrect order of state message data sanity check

  Previous releases - regressions:

   - esp: fix possible buffer overflow in ESP transformation

   - dsa: unlock the rtnl_mutex when dsa_master_setup() fails

   - phy: meson-gxl: fix interrupt handling in forced mode

   - smsc95xx: ignore -ENODEV errors when device is unplugged

  Previous releases - always broken:

   - xfrm: fix tunnel mode fragmentation behavior

   - esp: fix inter address family tunneling on GSO

   - tipc: fix null-deref due to race when enabling bearer

   - sctp: fix kernel-infoleak for SCTP sockets

   - eth: macb: fix lost RX packet wakeup race in NAPI receive

   - eth: intel stop disabling VFs due to PF error responses

   - eth: bcmgenet: don't claim WOL when its not available"

* tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  xdp: xdp_mem_allocator can be NULL in trace_mem_connect().
  ice: Fix race condition during interface enslave
  net: phy: meson-gxl: improve link-up behavior
  net: bcmgenet: Don't claim WOL when its not available
  net: arc_emac: Fix use after free in arc_mdio_probe()
  sctp: fix kernel-infoleak for SCTP sockets
  net: phy: correct spelling error of media in documentation
  net: phy: DP83822: clear MISR2 register to disable interrupts
  gianfar: ethtool: Fix refcount leak in gfar_get_ts_info
  selftests: pmtu.sh: Kill nettest processes launched in subshell.
  selftests: pmtu.sh: Kill tcpdump processes launched by subshell.
  NFC: port100: fix use-after-free in port100_send_complete
  net/mlx5e: SHAMPO, reduce TIR indication
  net/mlx5e: Lag, Only handle events from highest priority multipath entry
  net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
  net/mlx5: Fix a race on command flush flow
  net/mlx5: Fix size field in bufferx_reg struct
  ax25: Fix NULL pointer dereference in ax25_kill_by_device
  net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr
  net: ethernet: lpc_eth: Handle error for clk_enable
  ...
2022-03-10 16:47:58 -08:00
Colin Foster
26183cfe47 net: phy: correct spelling error of media in documentation
The header file incorrectly referenced "median-independant interface"
instead of media. Correct this typo.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Fixes: 4069a572d4 ("net: phy: Document core PHY structures")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220309062544.3073-1-colin.foster@in-advantage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 14:40:59 -08:00
Linus Torvalds
b5521fe9a9 Merge tag 'xsa396-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
 "Several Linux PV device frontends are using the grant table interfaces
  for removing access rights of the backends in ways being subject to
  race conditions, resulting in potential data leaks, data corruption by
  malicious backends, and denial of service triggered by malicious
  backends:

   - blkfront, netfront, scsifront and the gntalloc driver are testing
     whether a grant reference is still in use. If this is not the case,
     they assume that a following removal of the granted access will
     always succeed, which is not true in case the backend has mapped
     the granted page between those two operations.

     As a result the backend can keep access to the memory page of the
     guest no matter how the page will be used after the frontend I/O
     has finished. The xenbus driver has a similar problem, as it
     doesn't check the success of removing the granted access of a
     shared ring buffer.

   - blkfront, netfront, scsifront, usbfront, dmabuf, xenbus, 9p,
     kbdfront, and pvcalls are using a functionality to delay freeing a
     grant reference until it is no longer in use, but the freeing of
     the related data page is not synchronized with dropping the granted
     access.

     As a result the backend can keep access to the memory page even
     after it has been freed and then re-used for a different purpose.

   - netfront will fail a BUG_ON() assertion if it fails to revoke
     access in the rx path.

     This will result in a Denial of Service (DoS) situation of the
     guest which can be triggered by the backend"

* tag 'xsa396-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
  xen/gnttab: fix gnttab_end_foreign_access() without page specified
  xen/pvcalls: use alloc/free_pages_exact()
  xen/9p: use alloc/free_pages_exact()
  xen/usb: don't use gnttab_end_foreign_access() in xenhcd_gnttab_done()
  xen: remove gnttab_query_foreign_access()
  xen/gntalloc: don't use gnttab_query_foreign_access()
  xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
  xen/netfront: don't use gnttab_query_foreign_access() for mapped status
  xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
  xen/grant-table: add gnttab_try_end_foreign_access()
  xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
2022-03-09 20:44:17 -08:00
Ben Ben-Ishay
99a2b9be07 net/mlx5e: SHAMPO, reduce TIR indication
SHAMPO is an RQ / WQ feature, an indication was added to the TIR in the
first place to enforce suitability between connected TIR and RQ, this
enforcement does not exist in current the Firmware implementation and was
redundant in the first place.

Fixes: 83439f3c37 ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 11:39:35 -08:00
Mohammad Kabat
ac77998b7a net/mlx5: Fix size field in bufferx_reg struct
According to HW spec the field "size" should be 16 bits
in bufferx register.

Fixes: e281682bf2 ("net/mlx5_core: HW data structs/types definitions cleanup")
Signed-off-by: Mohammad Kabat <mohammadkab@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 11:39:34 -08:00
David S. Miller
cc7e2f596e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2022-03-09

1) Fix IPv6 PMTU discovery for xfrm interfaces.
   From Lina Wang.

2) Revert failing for policies and states that are
   configured with XFRMA_IF_ID 0. It broke a
   user configuration. From Kai Lueke.

3) Fix a possible buffer overflow in the ESP output path.

4) Fix ESP GSO for tunnel and BEET mode on inter address
   family tunnels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 14:48:11 +00:00
Linus Torvalds
92f90cc9fe Merge tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:

 - Fix an issue with splice on the fuse device

 - Fix a regression in the fileattr API conversion

 - Add a small userspace API improvement

* tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix pipe buffer lifetime for direct_io
  fuse: move FUSE_SUPER_MAGIC definition to magic.h
  fuse: fix fileattr op failure
2022-03-08 09:41:18 -08:00
Linus Torvalds
cd22a8bfcf Merge tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 spectre fixes from James Morse:
 "ARM64 Spectre-BHB mitigations:

   - Make EL1 vectors per-cpu

   - Add mitigation sequences to the EL1 and EL2 vectors on vulnerble
     CPUs

   - Implement ARCH_WORKAROUND_3 for KVM guests

   - Report Vulnerable when unprivileged eBPF is enabled"

* tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  arm64: Use the clearbhb instruction in mitigations
  KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
  arm64: Mitigate spectre style branch history side channels
  arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
  arm64: Add percpu vectors for EL1
  arm64: entry: Add macro for reading symbol addresses from the trampoline
  arm64: entry: Add vectors that have the bhb mitigation sequences
  arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
  arm64: entry: Allow the trampoline text to occupy multiple pages
  arm64: entry: Make the kpti trampoline's kpti sequence optional
  arm64: entry: Move trampoline macros out of ifdef'd section
  arm64: entry: Don't assume tramp_vectors is the start of the vectors
  arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
  arm64: entry: Move the trampoline data page before the text page
  arm64: entry: Free up another register on kpti's tramp_exit path
  arm64: entry: Make the trampoline cleanup optional
  KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
  arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
  arm64: entry.S: Add ventry overflow sanity checks
2022-03-08 09:27:25 -08:00
Florian Westphal
ee0a4dc9f3 Revert "netfilter: conntrack: tag conntracks picked up in local out hook"
This was a prerequisite for the ill-fated
"netfilter: nat: force port remap to prevent shadowing well-known ports".

As this has been reverted, this change can be backed out too.

Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-08 17:28:38 +01:00
Linus Torvalds
4a01e748a5 Merge tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 spectre fixes from Borislav Petkov:

 - Mitigate Spectre v2-type Branch History Buffer attacks on machines
   which support eIBRS, i.e., the hardware-assisted speculation
   restriction after it has been shown that such machines are vulnerable
   even with the hardware mitigation.

 - Do not use the default LFENCE-based Spectre v2 mitigation on AMD as
   it is insufficient to mitigate such attacks. Instead, switch to
   retpolines on all AMD by default.

 - Update the docs and add some warnings for the obviously vulnerable
   cmdline configurations.

* tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
  x86/speculation: Warn about Spectre v2 LFENCE mitigation
  x86/speculation: Update link to AMD speculation whitepaper
  x86/speculation: Use generic retpoline by default on AMD
  x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  Documentation/hw-vuln: Update spectre doc
  x86/speculation: Add eIBRS + Retpoline options
  x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
2022-03-07 17:29:47 -08:00
Linus Torvalds
06be302970 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
 "Some last minute fixes that took a while to get ready. Not
  regressions, but they look safe and seem to be worth to have"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  tools/virtio: handle fallout from folio work
  tools/virtio: fix virtio_test execution
  vhost: remove avail_event arg from vhost_update_avail_event()
  virtio: drop default for virtio-mem
  vdpa: fix use-after-free on vp_vdpa_remove
  virtio-blk: Remove BUG_ON() in virtio_queue_rq()
  virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
  vhost: fix hung thread due to erroneous iotlb entries
  vduse: Fix returning wrong type in vduse_domain_alloc_iova()
  vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
  vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
  vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
  virtio_console: break out of buf poll on remove
  virtio: document virtio_reset_device
  virtio: acknowledge all features before access
  virtio: unexport virtio_finalize_features
2022-03-07 11:32:17 -08:00
Halil Pasic
aa6f8dcbab swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer), he asked me to create an incremental fix
(after I have pointed this out the mix up, and asked him for guidance).
So here we go.

The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
  must take precedence over DMA_ATTR_SKIP_CPU_SYNC

Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
data from the swiotlb buffer.

Let me note, that if the size used with dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.

To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: ddbd89deb7 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-07 11:26:02 -08:00
Steffen Klassert
23c7f8d798 net: Fix esp GSO on inter address family tunnels.
The esp tunnel GSO handlers use skb_mac_gso_segment to
push the inner packet to the segmentation handlers.
However, skb_mac_gso_segment takes the Ethernet Protocol
ID from 'skb->protocol' which is wrong for inter address
family tunnels. We fix this by introducing a new
skb_eth_gso_segment function.

This function can be used if it is necessary to pass the
Ethernet Protocol ID directly to the segmentation handler.
First users of this function will be the esp4 and esp6
tunnel segmentation handlers.

Fixes: c35fe4106b ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07 13:14:04 +01:00
Steffen Klassert
ebe48d368e esp: Fix possible buffer overflow in ESP transformation
The maximum message size that can be send is bigger than
the  maximum site that skb_page_frag_refill can allocate.
So it is possible to write beyond the allocated buffer.

Fix this by doing a fallback to COW in that case.

v2:

Avoid get get_order() costs as suggested by Linus Torvalds.

Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Reported-by: valis <sec@valis.email>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07 13:14:03 +01:00
Juergen Gross
42baefac63 xen/gnttab: fix gnttab_end_foreign_access() without page specified
gnttab_end_foreign_access() is used to free a grant reference and
optionally to free the associated page. In case the grant is still in
use by the other side processing is being deferred. This leads to a
problem in case no page to be freed is specified by the caller: the
caller doesn't know that the page is still mapped by the other side
and thus should not be used for other purposes.

The correct way to handle this situation is to take an additional
reference to the granted page in case handling is being deferred and
to drop that reference when the grant reference could be freed
finally.

This requires that there are no users of gnttab_end_foreign_access()
left directly repurposing the granted page after the call, as this
might result in clobbered data or information leaks via the not yet
freed grant reference.

This is part of CVE-2022-23041 / XSA-396.

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V4:
- expand comment in header
V5:
- get page ref in case of kmalloc() failure, too
2022-03-07 09:48:55 +01:00
Juergen Gross
1dbd11ca75 xen: remove gnttab_query_foreign_access()
Remove gnttab_query_foreign_access(), as it is unused and unsafe to
use.

All previous use cases assumed a grant would not be in use after
gnttab_query_foreign_access() returned 0. This information is useless
in best case, as it only refers to a situation in the past, which could
have changed already.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2022-03-07 09:48:54 +01:00
Juergen Gross
6b1775f26a xen/grant-table: add gnttab_try_end_foreign_access()
Add a new grant table function gnttab_try_end_foreign_access(), which
will remove and free a grant if it is not in use.

Its main use case is to either free a grant if it is no longer in use,
or to take some other action if it is still in use. This other action
can be an error exit, or (e.g. in the case of blkfront persistent grant
feature) some special handling.

This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V2:
- new patch
V4:
- add comments to header (Jan Beulich)
2022-03-07 09:48:54 +01:00
Linus Torvalds
dcde98da99 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:

 - a fixup for Goodix touchscreen driver allowing it to work on certain
   Cherry Trail devices

 - a fix for imbalanced enable/disable regulator in Elam touchpad driver
   that became apparent when used with Asus TF103C 2-in-1 dock

 - a couple new input keycodes used on newer keyboards

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  HID: add mapping for KEY_ALL_APPLICATIONS
  HID: add mapping for KEY_DICTATE
  Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
  Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
  Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource
  Input: goodix - use the new soc_intel_is_byt() helper
  Input: samsung-keypad - properly state IOMEM dependency
2022-03-05 15:49:45 -08:00
Suren Baghdasaryan
96403e1128 mm: prevent vm_area_struct::anon_name refcount saturation
A deep process chain with many vmas could grow really high.  With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is 2147450880 and the refcounter has
headroom of 1073774592 before it reaches REFCOUNT_SATURATED
(3221225472).

Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults.  Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647).  In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (that still require heavy sharing of that anon_vma_name between
processes).

kref refcounting interface used in anon_vma_name structure will detect a
counter overflow when it reaches REFCOUNT_SATURATED value but will only
generate a warning and freeze the ref counter.  This would lead to the
refcounted object never being freed.  A determined attacker could leak
memory like that but it would be rather expensive and inefficient way to
do so.

To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED.  This should provide enough headroom for raising the
refcounts temporarily.

Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:32 -08:00
Suren Baghdasaryan
5c26f6ac94 mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible.  This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.

[surenb@google.com: fix comment]

Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:32 -08:00
Si-Wei Liu
e0077cc13b vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
No functional change introduced. vdpa bus driver such as virtio_vdpa
or vhost_vdpa is not supposed to take care of the locking for core
by its own. The locked API vdpa_set_features should suffice the
bus driver's need.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/1642206481-30721-2-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04 11:56:33 -05:00
Michael S. Tsirkin
4fa59ede95 virtio: acknowledge all features before access
The feature negotiation was designed in a way that
makes it possible for devices to know which config
fields will be accessed by drivers.

This is broken since commit 404123c2db ("virtio: allow drivers to
validate features") with fallout in at least block and net.  We have a
partial work-around in commit 2f9a174f91 ("virtio: write back
F_VERSION_1 before validate") which at least lets devices find out which
format should config space have, but this is a partial fix: guests
should not access config space without acknowledging features since
otherwise we'll never be able to change the config space format.

To fix, split finalize_features from virtio_finalize_features and
call finalize_features with all feature bits before validation,
and then - if validation changed any bits - once again after.

Since virtio_finalize_features no longer writes out features
rename it to virtio_features_ok - since that is what it does:
checks that features are ok with the device.

As a side effect, this also reduces the amount of hypervisor accesses -
we now only acknowledge features once unless we are clearing any
features when validating (which is uncommon).

IRC I think that this was more or less always the intent in the spec but
unfortunately the way the spec is worded does not say this explicitly, I
plan to address this at the spec level, too.

Acked-by: Jason Wang <jasowang@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 404123c2db ("virtio: allow drivers to validate features")
Fixes: 2f9a174f91 ("virtio: write back F_VERSION_1 before validate")
Cc: "Halil Pasic" <pasic@linux.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04 08:33:21 -05:00
Michael S. Tsirkin
838d6d3461 virtio: unexport virtio_finalize_features
virtio_finalize_features is only used internally within virtio.
No reason to export it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04 08:33:21 -05:00
William Mahon
327b89f0ac HID: add mapping for KEY_ALL_APPLICATIONS
This patch adds a new key definition for KEY_ALL_APPLICATIONS
and aliases KEY_DASHBOARD to it.

It also maps the 0x0c/0x2a2 usage code to KEY_ALL_APPLICATIONS.

Signed-off-by: William Mahon <wmahon@chromium.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220303035618.1.I3a7746ad05d270161a18334ae06e3b6db1a1d339@changeid
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-03-03 18:44:21 -08:00
William Mahon
bfa26ba343 HID: add mapping for KEY_DICTATE
Numerous keyboards are adding dictate keys which allows for text
messages to be dictated by a microphone.

This patch adds a new key definition KEY_DICTATE and maps 0x0c/0x0d8
usage code to this new keycode. Additionally hid-debug is adjusted to
recognize this new usage code as well.

Signed-off-by: William Mahon <wmahon@chromium.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220303021501.1.I5dbf50eb1a7a6734ee727bda4a8573358c6d3ec0@changeid
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-03-03 18:44:19 -08:00
Linus Torvalds
b949c21fc2 Merge tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from can, xfrm, wifi, bluetooth, and netfilter.

  Lots of various size fixes, the length of the tag speaks for itself.
  Most of the 5.17-relevant stuff comes from xfrm, wifi and bt trees
  which had been lagging as you pointed out previously. But there's also
  a larger than we'd like portion of fixes for bugs from previous
  releases.

  Three more fixes still under discussion, including and xfrm revert for
  uAPI error.

  Current release - regressions:

   - iwlwifi: don't advertise TWT support, prevent FW crash

   - xfrm: fix the if_id check in changelink

   - xen/netfront: destroy queues before real_num_tx_queues is zeroed

   - bluetooth: fix not checking MGMT cmd pending queue, make scanning
     work again

  Current release - new code bugs:

   - mptcp: make SIOCOUTQ accurate for fallback socket

   - bluetooth: access skb->len after null check

   - bluetooth: hci_sync: fix not using conn_timeout

   - smc: fix cleanup when register ULP fails

   - dsa: restore error path of dsa_tree_change_tag_proto

   - iwlwifi: fix build error for IWLMEI

   - iwlwifi: mvm: propagate error from request_ownership to the user

  Previous releases - regressions:

   - xfrm: fix pMTU regression when reported pMTU is too small

   - xfrm: fix TCP MSS calculation when pMTU is close to 1280

   - bluetooth: fix bt_skb_sendmmsg not allocating partial chunks

   - ipv6: ensure we call ipv6_mc_down() at most once, prevent leaks

   - ipv6: prevent leaks in igmp6 when input queues get full

   - fix up skbs delta_truesize in UDP GRO frag_list

   - eth: e1000e: fix possible HW unit hang after an s0ix exit

   - eth: e1000e: correct NVM checksum verification flow

   - ptp: ocp: fix large time adjustments

  Previous releases - always broken:

   - tcp: make tcp_read_sock() more robust in presence of urgent data

   - xfrm: distinguishing SAs and SPs by if_id in xfrm_migrate

   - xfrm: fix xfrm_migrate issues when address family changes

   - dcb: flush lingering app table entries for unregistered devices

   - smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error

   - mac80211: fix EAPoL rekey fail in 802.3 rx path

   - mac80211: fix forwarded mesh frames AC & queue selection

   - netfilter: nf_queue: fix socket access races and bugs

   - batman-adv: fix ToCToU iflink problems and check the result belongs
     to the expected net namespace

   - can: gs_usb, etas_es58x: fix opened_channel_cnt's accounting

   - can: rcar_canfd: register the CAN device when fully ready

   - eth: igb, igc: phy: drop premature return leaking HW semaphore

   - eth: ixgbe: xsk: change !netif_carrier_ok() handling in
     ixgbe_xmit_zc(), prevent live lock when link goes down

   - eth: stmmac: only enable DMA interrupts when ready

   - eth: sparx5: move vlan checks before any changes are made

   - eth: iavf: fix races around init, removal, resets and vlan ops

   - ibmvnic: more reset flow fixes

  Misc:

   - eth: fix return value of __setup handlers"

* tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
  ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
  net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
  ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
  selftests: mlxsw: resource_scale: Fix return value
  selftests: mlxsw: tc_police_scale: Make test more robust
  net: dcb: disable softirqs in dcbnl_flush_dev()
  bnx2: Fix an error message
  sfc: extend the locking on mcdi->seqno
  net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
  net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
  net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
  tcp: make tcp_read_sock() more robust
  bpf, sockmap: Do not ignore orig_len parameter
  net: ipa: add an interconnect dependency
  net: fix up skbs delta_truesize in UDP GRO frag_list
  iwlwifi: mvm: return value for request_ownership
  nl80211: Update bss channel on channel switch for P2P_CLIENT
  iwlwifi: fix build error for IWLMEI
  ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
  batman-adv: Don't expect inter-netns unique iflink indices
  ...
2022-03-03 11:10:56 -08:00
Eric Dumazet
2d3916f318 ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
While investigating on why a synchronize_net() has been added recently
in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
might drop skbs in some cases.

Discussion about removing synchronize_net() from ipv6_mc_down()
will happen in a different thread.

Fixes: f185de28d9 ("mld: add new workqueues for process mld events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 09:47:06 -08:00
Jakub Kicinski
4761df52f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Use kfree_rcu(ptr, rcu) variant, using kfree_rcu(ptr) was not
   intentional. From Eric Dumazet.

2) Use-after-free in netfilter hook core, from Eric Dumazet.

3) Missing rcu read lock side for netfilter egress hook,
   from Florian Westphal.

4) nf_queue assume state->sk is full socket while it might not be.
   Invoke sock_gen_put(), from Florian Westphal.

5) Add selftest to exercise the reported KASAN splat in 4)

6) Fix possible use-after-free in nf_queue in case sk_refcnt is 0.
   Also from Florian.

7) Use input interface index only for hardware offload, not for
   the software plane. This breaks tc ct action. Patch from Paul Blakey.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
  netfilter: nf_queue: handle socket prefetch
  netfilter: nf_queue: fix possible use-after-free
  selftests: netfilter: add nfqueue TCP_NEW_SYN_RECV socket race test
  netfilter: nf_queue: don't assume sk is full socket
  netfilter: egress: silence egress hook lockdep splats
  netfilter: fix use-after-free in __nf_register_net_hook()
  netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant
====================

Link: https://lore.kernel.org/r/20220301215337.378405-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01 15:13:47 -08:00
Paul Blakey
db6140e5e3 net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
After cited commit optimizted hw insertion, flow table entries are
populated with ifindex information which was intended to only be used
for HW offload. This tuple ifindex is hashed in the flow table key, so
it must be filled for lookup to be successful. But tuple ifindex is only
relevant for the netfilter flowtables (nft), so it's not filled in
act_ct flow table lookup, resulting in lookup failure, and no SW
offload and no offload teardown for TCP connection FIN/RST packets.

To fix this, add new tc ifindex field to tuple, which will
only be used for offloading, not for lookup, as it will not be
part of the tuple hash.

Fixes: 9795ded7f9 ("net/sched: act_ct: Fill offloading tuple iifidx")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-01 22:08:31 +01:00
David S. Miller
b8d06ce712 Merge tag 'wireless-for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
johannes Berg says:

====================

Some last-minute fixes:
 * rfkill
   - add missing rfill_soft_blocked() when disabled

 * cfg80211
   - handle a nla_memdup() failure correctly
   - fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo in
     Makefile

 * mac80211
   - fix EAPOL handling in 802.3 RX path
   - reject setting up aggregation sessions before
     connection is authorized to avoid timeouts or
     similar
   - handle some SAE authentication steps correctly
   - fix AC selection in mesh forwarding

 * iwlwifi
   - remove TWT support as it causes firmware crashes
     when the AP isn't behaving correctly
   - check debugfs pointer before dereferncing it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01 14:45:55 +00:00
Florian Westphal
c387307024 netfilter: nf_queue: fix possible use-after-free
Eric Dumazet says:
  The sock_hold() side seems suspect, because there is no guarantee
  that sk_refcnt is not already 0.

On failure, we cannot queue the packet and need to indicate an
error.  The packet will be dropped by the caller.

v2: split skb prefetch hunk into separate change

Fixes: 271b72c7fa ("udp: RCU handling for Unicast packets.")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-01 11:50:35 +01:00
Ben Dooks
50bb467c9e rfkill: define rfill_soft_blocked() if !RFKILL
If CONFIG_RFKILL is not set, the Intel WiFi driver will not build
the iw_mvm driver part due to the missing rfill_soft_blocked()
call. Adding a inline declaration of rfill_soft_blocked() if
CONFIG_RFKILL=n fixes the following error:

drivers/net/wireless/intel/iwlwifi/mvm/mvm.h: In function 'iwl_mvm_mei_set_sw_rfkill_state':
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h:2215:38: error: implicit declaration of function 'rfkill_soft_blocked'; did you mean 'rfkill_blocked'? [-Werror=implicit-function-declaration]
 2215 |                 mvm->hw_registered ? rfkill_soft_blocked(mvm->hw->wiphy->rfkill) : false;
      |                                      ^~~~~~~~~~~~~~~~~~~
      |                                      rfkill_blocked

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reported-by: Neill Whillans <neill.whillans@codethink.co.uk>
Fixes: 5bc9a9dd75 ("rfkill: allow to get the software rfkill state")
Link: https://lore.kernel.org/r/20220218093858.1245677-1-ben.dooks@codethink.co.uk
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-01 10:59:13 +01:00
Florian Westphal
17a8f31bba netfilter: egress: silence egress hook lockdep splats
Netfilter assumes its called with rcu_read_lock held, but in egress
hook case it may be called with BH readlock.

This triggers lockdep splat.

In order to avoid to change all rcu_dereference() to
rcu_dereference_check(..., rcu_read_lock_bh_held()), wrap nf_hook_slow
with read lock/unlock pair.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-28 22:34:04 +01:00
Linus Torvalds
719fce7539 Merge tag 'soc-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "The code changes address mostly minor problems:

   - Several NXP/FSL SoC driver fixes, addressing issues with error
     handling and compilation

   - Fix a clock disabling imbalance in gpcv2 driver.

   - Arm Juno DMA coherency issue

   - Trivial firmware driver fixes for op-tee and scmi firmware

  The remaining changes address issues in the devicetree files:

   - A timer regression for the OMAP devkit8000, which has to use the
     alternative timer.

   - A hang in the i.MX8MM power domain configuration

   - Multiple fixes for the Rockchip RK3399 addressing issues with sound
     and eMMC

   - Cosmetic fixes for i.MX8ULP, RK3xxx, and Tegra124"

* tag 'soc-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
  ARM: tegra: Move panels to AUX bus
  soc: imx: gpcv2: Fix clock disabling imbalance in error path
  soc: fsl: qe: Check of ioremap return value
  soc: fsl: qe: fix typo in a comment
  soc: fsl: guts: Add a missing memory allocation failure check
  soc: fsl: guts: Revert commit 3c0d64e867
  soc: fsl: Correct MAINTAINERS database (SOC)
  soc: fsl: Correct MAINTAINERS database (QUICC ENGINE LIBRARY)
  soc: fsl: Replace kernel.h with the necessary inclusions
  dt-bindings: fsl,layerscape-dcfg: add missing compatible for lx2160a
  dt-bindings: qoriq-clock: add missing compatible for lx2160a
  ARM: dts: Use 32KiHz oscillator on devkit8000
  ARM: dts: switch timer config to common devkit8000 devicetree
  tee: optee: fix error return code in probe function
  arm64: dts: imx8ulp: Set #thermal-sensor-cells to 1 as required
  arm64: dts: imx8mm: Fix VPU Hanging
  ARM: dts: rockchip: fix a typo on rk3288 crypto-controller
  ARM: dts: rockchip: reorder rk322x hmdi clocks
  firmware: arm_scmi: Remove space in MODULE_ALIAS name
  arm64: dts: agilex: use the compatible "intel,socfpga-agilex-hsotg"
  ...
2022-02-28 12:51:14 -08:00
Linus Torvalds
98f3e84f8d Merge tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:

 - fix a swiotlb info leak (Halil Pasic)

* tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: fix info leak with DMA_FROM_DEVICE
2022-02-27 12:42:37 -08:00
Linus Torvalds
2293be58d6 Merge tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:

 - rtla (Real-Time Linux Analysis tool):
    - fix typo in man page
    - Update API -e to -E before it is released
    - Error message fix and memory leak fix

 - Partially uninline trace event soft disable to shrink text

 - Fix function graph start up test

 - Have triggers affect the trace instance they are in and not top level

 - Have osnoise sleep in the units it says it uses

 - Remove unused ftrace stub function

 - Remove event probe redundant info from event in the buffer

 - Fix group ownership setting in tracefs

 - Ensure trace buffer is minimum size to prevent crashes

* tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  rtla/osnoise: Fix error message when failing to enable trace instance
  rtla/osnoise: Free params at the exit
  rtla/hist: Make -E the short version of --entries
  tracing: Fix selftest config check for function graph start up test
  tracefs: Set the group ownership in apply_options() not parse_options()
  tracing/osnoise: Make osnoise_main to sleep for microseconds
  ftrace: Remove unused ftrace_startup_enable() stub
  tracing: Ensure trace buffer is at least 4096 bytes large
  tracing: Uninline trace_trigger_soft_disabled() partly
  eprobes: Remove redundant event type information
  tracing: Have traceon and traceoff trigger honor the instance
  tracing: Dump stacktrace trigger to the corresponding instance
  rtla: Fix systme -> system typo on man page
2022-02-26 12:10:17 -08:00