Commit Graph

152077 Commits

Author SHA1 Message Date
Rae Moar
6c4ea2f48d kunit: add is_init test attribute
Add is_init test attribute of type bool. Add to_string, get, and filter
methods to lib/kunit/attributes.c.

Mark each of the tests in the init section with the is_init=true attribute.

Add is_init to the attributes documentation.

Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-12-18 13:21:15 -07:00
Rae Moar
d81f0d7b8b kunit: add KUNIT_INIT_TABLE to init linker section
Add KUNIT_INIT_TABLE to the INIT_DATA linker section.

Alter the KUnit macros to create init tests:
kunit_test_init_section_suites

Update lib/kunit/executor.c to run both the suites in KUNIT_TABLE and
KUNIT_INIT_TABLE.

Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-12-18 13:21:15 -07:00
Rae Moar
69dfdce1c5 kunit: move KUNIT_TABLE out of INIT_DATA
Alter the linker section of KUNIT_TABLE to move it out of INIT_DATA and
into DATA_DATA.

Data for KUnit tests does not need to be in the init section.

In order to run tests again after boot the KUnit data cannot be labeled as
init data as the kernel could write over it.

Add a KUNIT_INIT_TABLE in the next patch for KUnit tests that test init
data/functions.

Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-12-18 13:21:15 -07:00
David Gow
56778b49c9 kunit: Add a macro to wrap a deferred action function
KUnit's deferred action API accepts a void(*)(void *) function pointer
which is called when the test is exited. However, we very frequently
want to use existing functions which accept a single pointer, but which
may not be of type void*. While this is probably dodgy enough to be on
the wrong side of the C standard, it's been often used for similar
callbacks, and gcc's -Wcast-function-type seems to ignore cases where
the only difference is the type of the argument, assuming it's
compatible (i.e., they're both pointers to data).

However, clang 16 has introduced -Wcast-function-type-strict, which no
longer permits any deviation in function pointer type. This seems to be
because it'd break CFI, which validates the type of function calls.

This rather ruins our attempts to cast functions to defer them, and
leaves us with a few options. The one we've chosen is to implement a
macro which will generate a wrapper function which accepts a void*, and
casts the argument to the appropriate type.

For example, if you were trying to wrap:
void foo_close(struct foo *handle);
you could use:
KUNIT_DEFINE_ACTION_WRAPPER(kunit_action_foo_close,
			    foo_close,
			    struct foo *);

This would create a new kunit_action_foo_close() function, of type
kunit_action_t, which could be passed into kunit_add_action() and
similar functions.

In addition to defining this macro, update KUnit and its tests to use
it.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-12-18 13:21:14 -07:00
Linus Torvalds
2e3f280b24 Merge tag 'pci-v6.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:

 - Limit Max_Read_Request_Size (MRRS) on some MIPS Loongson systems
   because they don't all support MRRS > 256, and firmware doesn't
   always initialize it correctly, which meant some PCIe devices didn't
   work (Jiaxun Yang)

 - Add and use pci_enable_link_state_locked() to prevent potential
   deadlocks in vmd and qcom drivers (Johan Hovold)

 - Revert recent (v6.5) acpiphp resource assignment changes that fixed
   issues with hot-adding devices on a root bus or with large BARs, but
   introduced new issues with GPU initialization and hot-adding SCSI
   disks in QEMU VMs and (Bjorn Helgaas)

* tag 'pci-v6.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
  PCI/ASPM: Add pci_disable_link_state_locked() lockdep assert
  PCI/ASPM: Clean up __pci_disable_link_state() 'sem' parameter
  PCI: qcom: Clean up ASPM comment
  PCI: qcom: Fix potential deadlock when enabling ASPM
  PCI: vmd: Fix potential deadlock when enabling ASPM
  PCI/ASPM: Add pci_enable_link_state_locked()
  PCI: loongson: Limit MRRS to 256
2023-12-15 19:48:47 -08:00
Jens Axboe
ae1914174a cred: get rid of CONFIG_DEBUG_CREDENTIALS
This code is rarely (never?) enabled by distros, and it hasn't caught
anything in decades. Let's kill off this legacy debug code.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-15 14:19:48 -08:00
Jens Axboe
f8fa5d7692 cred: switch to using atomic_long_t
There are multiple ways to grab references to credentials, and the only
protection we have against overflowing it is the memory required to do
so.

With memory sizes only moving in one direction, let's bump the reference
count to 64-bit and move it outside the realm of feasibly overflowing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-15 14:08:46 -08:00
Linus Torvalds
3bd7d74881 Merge tag 'io_uring-6.7-2023-12-15' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
 "Just two minor fixes:

   - Fix for the io_uring socket option commands using the wrong value
     on some archs (Al)

   - Tweak to the poll lazy wake enable (me)"

* tag 'io_uring-6.7-2023-12-15' of git://git.kernel.dk/linux:
  io_uring/cmd: fix breakage in SOCKET_URING_OP_SIOC* implementation
  io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE
2023-12-15 12:20:14 -08:00
Linus Torvalds
a62aa88ba1 Merge tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "17 hotfixes. 8 are cc:stable and the other 9 pertain to post-6.6
  issues"

* tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/mglru: reclaim offlined memcgs harder
  mm/mglru: respect min_ttl_ms with memcgs
  mm/mglru: try to stop at high watermarks
  mm/mglru: fix underprotected page cache
  mm/shmem: fix race in shmem_undo_range w/THP
  Revert "selftests: error out if kernel header files are not yet built"
  crash_core: fix the check for whether crashkernel is from high memory
  x86, kexec: fix the wrong ifdeffery CONFIG_KEXEC
  sh, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
  mips, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
  m68k, kexec: fix the incorrect ifdeffery and build dependency of CONFIG_KEXEC
  loongarch, kexec: change dependency of object files
  mm/damon/core: make damon_start() waits until kdamond_fn() starts
  selftests/mm: cow: print ksft header before printing anything else
  mm: fix VMA heap bounds checking
  riscv: fix VMALLOC_START definition
  kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
2023-12-15 12:00:54 -08:00
Linus Torvalds
c7402612e2 Merge tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - regressions:

   - tcp: fix tcp_disordered_ack() vs usec TS resolution

  Current release - new code bugs:

   - dpll: sanitize possible null pointer dereference in
     dpll_pin_parent_pin_set()

   - eth: octeon_ep: initialise control mbox tasks before using APIs

  Previous releases - regressions:

   - io_uring/af_unix: disable sending io_uring over sockets

   - eth: mlx5e:
       - TC, don't offload post action rule if not supported
       - fix possible deadlock on mlx5e_tx_timeout_work

   - eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close

   - eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb()

   - eth: ena: fix DMA syncing in XDP path when SWIOTLB is on

   - eth: team: fix use-after-free when an option instance allocation
     fails

  Previous releases - always broken:

   - neighbour: don't let neigh_forced_gc() disable preemption for long

   - net: prevent mss overflow in skb_segment()

   - ipv6: support reporting otherwise unknown prefix flags in
     RTM_NEWPREFIX

   - tcp: remove acked SYN flag from packet in the transmit queue
     correctly

   - eth: octeontx2-af:
       - fix a use-after-free in rvu_nix_register_reporters
       - fix promisc mcam entry action

   - eth: dwmac-loongson: make sure MDIO is initialized before use

   - eth: atlantic: fix double free in ring reinit logic"

* tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  net: atlantic: fix double free in ring reinit logic
  appletalk: Fix Use-After-Free in atalk_ioctl
  net: stmmac: Handle disabled MDIO busses from devicetree
  net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
  dpaa2-switch: do not ask for MDB, VLAN and FDB replay
  dpaa2-switch: fix size of the dma_unmap
  net: prevent mss overflow in skb_segment()
  vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()
  Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set"
  MIPS: dts: loongson: drop incorrect dwmac fallback compatible
  stmmac: dwmac-loongson: drop useless check for compatible fallback
  stmmac: dwmac-loongson: Make sure MDIO is initialized before use
  tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set
  dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
  net: ena: Fix XDP redirection error
  net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
  net: ena: Fix xdp drops handling due to multibuf packets
  net: ena: Destroy correct number of xdp queues upon failure
  net: Remove acked SYN flag from packet in the transmit queue correctly
  qed: Fix a potential use-after-free in qed_cxt_tables_alloc
  ...
2023-12-14 13:11:49 -08:00
Jens Axboe
595e52284d io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE
There are a few quirks around using lazy wake for poll unconditionally,
and one of them is related the EPOLLEXCLUSIVE. Those may trigger
exclusive wakeups, which wake a limited number of entries in the wait
queue. If that wake number is less than the number of entries someone is
waiting for (and that someone is also using DEFER_TASKRUN), then we can
get stuck waiting for more entries while we should be processing the ones
we already got.

If we're doing exclusive poll waits, flag the request as not being
compatible with lazy wakeups.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: 6ce4a93dbb ("io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-13 08:58:15 -07:00
Yu Zhao
4376807bf2 mm/mglru: reclaim offlined memcgs harder
In the effort to reduce zombie memcgs [1], it was discovered that the
memcg LRU doesn't apply enough pressure on offlined memcgs.  Specifically,
instead of rotating them to the tail of the current generation
(MEMCG_LRU_TAIL) for a second attempt, it moves them to the next
generation (MEMCG_LRU_YOUNG) after the first attempt.

Not applying enough pressure on offlined memcgs can cause them to build
up, and this can be particularly harmful to memory-constrained systems.

On Pixel 8 Pro, launching apps for 50 cycles:
                 Before  After  Change
  Zombie memcgs  45      35     -22%

[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@mail.gmail.com/

Link: https://lkml.kernel.org/r/20231208061407.2125867-4-yuzhao@google.com
Fixes: e4dde56cd2 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: T.J. Mercier <tjmercier@google.com>
Tested-by: T.J. Mercier <tjmercier@google.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 17:20:20 -08:00
Yu Zhao
8aa4206179 mm/mglru: respect min_ttl_ms with memcgs
While investigating kswapd "consuming 100% CPU" [1] (also see "mm/mglru:
try to stop at high watermarks"), it was discovered that the memcg LRU can
breach the thrashing protection imposed by min_ttl_ms.

Before the memcg LRU:
  kswapd()
    shrink_node_memcgs()
      mem_cgroup_iter()
        inc_max_seq()  // always hit a different memcg
    lru_gen_age_node()
      mem_cgroup_iter()
        check the timestamp of the oldest generation

After the memcg LRU:
  kswapd()
    shrink_many()
      restart:
        iterate the memcg LRU:
          inc_max_seq()  // occasionally hit the same memcg
          if raced with lru_gen_rotate_memcg():
            goto restart
    lru_gen_age_node()
      mem_cgroup_iter()
        check the timestamp of the oldest generation

Specifically, when the restart happens in shrink_many(), it needs to stick
with the (memcg LRU) generation it began with.  In other words, it should
neither re-read memcg_lru->seq nor age an lruvec of a different
generation.  Otherwise it can hit the same memcg multiple times without
giving lru_gen_age_node() a chance to check the timestamp of that memcg's
oldest generation (against min_ttl_ms).

[1] https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/

Link: https://lkml.kernel.org/r/20231208061407.2125867-3-yuzhao@google.com
Fixes: e4dde56cd2 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: T.J. Mercier <tjmercier@google.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 17:20:20 -08:00
Yu Zhao
081488051d mm/mglru: fix underprotected page cache
Unmapped folios accessed through file descriptors can be underprotected. 
Those folios are added to the oldest generation based on:

1. The fact that they are less costly to reclaim (no need to walk the
   rmap and flush the TLB) and have less impact on performance (don't
   cause major PFs and can be non-blocking if needed again).
2. The observation that they are likely to be single-use. E.g., for
   client use cases like Android, its apps parse configuration files
   and store the data in heap (anon); for server use cases like MySQL,
   it reads from InnoDB files and holds the cached data for tables in
   buffer pools (anon).

However, the oldest generation can be very short lived, and if so, it
doesn't provide the PID controller with enough time to respond to a surge
of refaults.  (Note that the PID controller uses weighted refaults and
those from evicted generations only take a half of the whole weight.) In
other words, for a short lived generation, the moving average smooths out
the spike quickly.

To fix the problem:
1. For folios that are already on LRU, if they can be beyond the
   tracking range of tiers, i.e., five accesses through file
   descriptors, move them to the second oldest generation to give them
   more time to age. (Note that tiers are used by the PID controller
   to statistically determine whether folios accessed multiple times
   through file descriptors are worth protecting.)
2. When adding unmapped folios to LRU, adjust the placement of them so
   that they are not too close to the tail. The effect of this is
   similar to the above.

On Android, launching 55 apps sequentially:
                           Before     After      Change
  workingset_refault_anon  25641024   25598972   0%
  workingset_refault_file  115016834  106178438  -8%

Link: https://lkml.kernel.org/r/20231208061407.2125867-1-yuzhao@google.com
Fixes: ac35a49023 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 17:20:19 -08:00
SeongJae Park
6376a82459 mm/damon/core: make damon_start() waits until kdamond_fn() starts
The cleanup tasks of kdamond threads including reset of corresponding
DAMON context's ->kdamond field and decrease of global nr_running_ctxs
counter is supposed to be executed by kdamond_fn().  However, commit
0f91d13366 ("mm/damon: simplify stop mechanism") made neither
damon_start() nor damon_stop() ensure the corresponding kdamond has
started the execution of kdamond_fn().

As a result, the cleanup can be skipped if damon_stop() is called fast
enough after the previous damon_start().  Especially the skipped reset
of ->kdamond could cause a use-after-free.

Fix it by waiting for start of kdamond_fn() execution from
damon_start().

Link: https://lkml.kernel.org/r/20231208175018.63880-1-sj@kernel.org
Fixes: 0f91d13366 ("mm/damon: simplify stop mechanism")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Jakub Acs <acsjakub@amazon.de>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Jakub Acs <acsjakub@amazon.de>
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 17:20:17 -08:00
Kefeng Wang
d3bb89ea9c mm: fix VMA heap bounds checking
After converting selinux to VMA heap check helper, the gcl triggers an
execheap SELinux denial, which is caused by a changed logic check.

Previously selinux only checked that the VMA range was within the VMA heap
range, and the implementation checks the intersection between the two
ranges, but the corner case (vm_end=start_brk, brk=vm_start) isn't handled
correctly.

Since commit 11250fd12e ("mm: factor out VMA stack and heap checks") was
only a function extraction, it seems that the issue was introduced by
commit 0db0c01b53 ("procfs: fix /proc/<pid>/maps heap check").  Let's
fix above corner cases, meanwhile, correct the wrong indentation of the
stack and heap check helpers.

Fixes: 11250fd12e ("mm: factor out VMA stack and heap checks")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Closes: https://lore.kernel.org/selinux/CAFqZXNv0SVT0fkOK6neP9AXbj3nxJ61JAY4+zJzvxqJaeuhbFw@mail.gmail.com/
Tested-by: Ondrej Mosnacek <omosnace@redhat.com>
Link: https://lkml.kernel.org/r/20231207152525.2607420-1-wangkefeng.wang@huawei.com
Cc: David Hildenbrand <david@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 17:20:17 -08:00
Linus Torvalds
cf52eed70e Merge tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
 "Fix various bugs / regressions for ext4, including a soft lockup, a
  WARN_ON, and a BUG"

* tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix soft lockup in journal_finish_inode_data_buffers()
  ext4: fix warning in ext4_dio_write_end_io()
  jbd2: increase the journal IO's priority
  jbd2: correct the printing of write_flags in jbd2_write_superblock()
  ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
2023-12-12 11:37:04 -08:00
Linus Torvalds
eaadbbaaff Merge tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:

 - Fix a couple of potential crashes, one introduced in 6.6 and one
   in 5.10

 - Fix misbehavior of virtiofs submounts on memory pressure

 - Clarify naming in the uAPI for a recent feature

* tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
  fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
  fuse: share lookup state between submount and its parent
  docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
  fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
2023-12-12 11:06:41 -08:00
Johan Hovold
718ab82266 PCI/ASPM: Add pci_enable_link_state_locked()
Add pci_enable_link_state_locked() for enabling link states that can be
used in contexts where a pci_bus_sem read lock is already held (e.g. from
pci_walk_bus()).

This helper will be used to fix a couple of potential deadlocks where
the current helper is called with the lock already held, hence the CC
stable tag.

Fixes: f492edb40b ("PCI: vmd: Add quirk to configure PCIe ASPM and LTR")
Link: https://lore.kernel.org/r/20231128081512.19387-2-johan+linaro@kernel.org
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
[bhelgaas: include helper name in subject, commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: <stable@vger.kernel.org>	# 6.3
Cc: Michael Bottini <michael.a.bottini@linux.intel.com>
Cc: David E. Box <david.e.box@linux.intel.com>
2023-12-11 12:07:42 -06:00
Vlad Buslov
125f1c7f26 net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table
The referenced change added custom cleanup code to act_ct to delete any
callbacks registered on the parent block when deleting the
tcf_ct_flow_table instance. However, the underlying issue is that the
drivers don't obtain the reference to the tcf_ct_flow_table instance when
registering callbacks which means that not only driver callbacks may still
be on the table when deleting it but also that the driver can still have
pointers to its internal nf_flowtable and can use it concurrently which
results either warning in netfilter[0] or use-after-free.

Fix the issue by taking a reference to the underlying struct
tcf_ct_flow_table instance when registering the callback and release the
reference when unregistering. Expose new API required for such reference
counting by adding two new callbacks to nf_flowtable_type and implementing
them for act_ct flowtable_ct type. This fixes the issue by extending the
lifetime of nf_flowtable until all users have unregistered.

[0]:
[106170.938634] ------------[ cut here ]------------
[106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
[106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm
ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis
try overlay mlx5_core
[106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1
[106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
[106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00
[106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202
[106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48
[106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000
[106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700
[106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58
[106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000
[106170.951616] FS:  0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000
[106170.952329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0
[106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[106170.954766] Call Trace:
[106170.955057]  <TASK>
[106170.955315]  ? __warn+0x79/0x120
[106170.955648]  ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
[106170.956172]  ? report_bug+0x17c/0x190
[106170.956537]  ? handle_bug+0x3c/0x60
[106170.956891]  ? exc_invalid_op+0x14/0x70
[106170.957264]  ? asm_exc_invalid_op+0x16/0x20
[106170.957666]  ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core]
[106170.958172]  ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core]
[106170.958788]  ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
[106170.959339]  ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core]
[106170.959854]  ? mapping_remove+0x154/0x1d0 [mlx5_core]
[106170.960342]  ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core]
[106170.960927]  mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core]
[106170.961441]  mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core]
[106170.962001]  mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core]
[106170.962524]  mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core]
[106170.963034]  mlx5e_flow_put+0x73/0xe0 [mlx5_core]
[106170.963506]  mlx5e_put_flow_list+0x38/0x70 [mlx5_core]
[106170.964002]  mlx5e_rep_update_flows+0xec/0x290 [mlx5_core]
[106170.964525]  mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core]
[106170.965056]  process_one_work+0x13a/0x2c0
[106170.965443]  worker_thread+0x2e5/0x3f0
[106170.965808]  ? rescuer_thread+0x410/0x410
[106170.966192]  kthread+0xc6/0xf0
[106170.966515]  ? kthread_complete_and_exit+0x20/0x20
[106170.966970]  ret_from_fork+0x2d/0x50
[106170.967332]  ? kthread_complete_and_exit+0x20/0x20
[106170.967774]  ret_from_fork_asm+0x11/0x20
[106170.970466]  </TASK>
[106170.970726] ---[ end trace 0000000000000000 ]---

Fixes: 77ac5e40c4 ("net/sched: act_ct: remove and free nf_table callbacks")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11 09:59:58 +00:00
Linus Torvalds
8aa74869d2 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
 "Primarily rtrs and irdma fixes:

   - Fix uninitialized value in ib_get_eth_speed()

   - Fix hns refusing to work if userspace doesn't select the correct
     congestion control algorithm

   - Several irdma fixes - unreliable Send Queue Drain, use after free,
     64k page size bugs, device removal races

   - Several rtrs bug fixes - crashes, memory leaks, use after free, bad
     credit accounting, bogus WARN_ON

   - Typos and a MAINTAINER update"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/irdma: Avoid free the non-cqp_request scratch
  RDMA/irdma: Fix support for 64k pages
  RDMA/irdma: Ensure iWarp QP queue memory is OS paged aligned
  RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz
  RDMA/irdma: Fix UAF in irdma_sc_ccq_get_cqe_info()
  RDMA/bnxt_re: Correct module description string
  RDMA/rtrs-clt: Remove the warnings for req in_use check
  RDMA/rtrs-clt: Fix the max_send_wr setting
  RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flight
  RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is true
  RDMA/rtrs-srv: Check return values while processing info request
  RDMA/rtrs-clt: Start hb after path_up
  RDMA/rtrs-srv: Do not unconditionally enable irq
  MAINTAINERS: Add Chengchang Tang as Hisilicon RoCE maintainer
  RDMA/irdma: Add wait for suspend on SQD
  RDMA/irdma: Do not modify to SQD on error
  RDMA/hns: Fix unnecessary err return when using invalid congest control algorithm
  RDMA/core: Fix uninit-value access in ib_get_eth_speed()
2023-12-08 12:27:11 -08:00
Linus Torvalds
38bafa65b1 Merge tag 'drm-fixes-2023-12-08' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Regular weekly fixes, mostly amdgpu and i915 as usual. A couple of
  nouveau, panfrost, one core and one bridge Kconfig.

  Seems about normal for rc5.

  atomic-helpers:
   - invoke end_fb_access while owning plane state

  i915:
   - fix a missing dep for a previous fix
   - Relax BXT/GLK DSI transcoder hblank limits
   - Fix DP MST .mode_valid_ctx() return values
   - Reject DP MST modes that require bigjoiner (as it's not yet
     supported on DP MST)
   - Fix _intel_dsb_commit() variable type to allow negative values

  nouveau:
   - document some bits of gsp rm
   - flush vmm more on tu102 to avoid hangs

  panfrost:
   - fix imported dma-buf objects residency
   - fix device freq update

  bridge:
   - tc358768 - fix Kconfig

  amdgpu:
   - Disable MCBP on gfx9
   - DC vbios fix
   - eDP fix
   - dml2 UBSAN fix
   - SMU 14 fix
   - RAS fixes
   - dml KASAN/KCSAN fix
   - PSP 13 fix
   - Clockgating fixes
   - Suspend fix

  exynos:
   - fix pointer dereference
   - fix wrong error check"

* tag 'drm-fixes-2023-12-08' of git://anongit.freedesktop.org/drm/drm: (27 commits)
  drm/exynos: fix a wrong error checking
  drm/exynos: fix a potential error pointer dereference
  drm/amdgpu: fix buffer funcs setting order on suspend
  drm/amdgpu: Avoid querying DRM MGCG status
  drm/amdgpu: Update HDP 4.4.2 clock gating flags
  drm/amdgpu: Add NULL checks for function pointers
  drm/amdgpu: Restrict extended wait to PSP v13.0.6
  drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml
  drm/amdgpu: optimize the printing order of error data
  drm/amdgpu: Update fw version for boot time error query
  drm/amd/pm: support new mca smu error code decoding
  drm/amd/swsmu: update smu v14_0_0 driver if version and metrics table
  drm/amd/display: Fix array-index-out-of-bounds in dml2
  drm/amd/display: Add monitor patch for specific eDP
  drm/amd/display: Use channel_width = 2 for vram table 3.0
  drm/amdgpu: disable MCBP by default
  drm/atomic-helpers: Invoke end_fb_access while owning plane state
  drm/i915: correct the input parameter on _intel_dsb_commit()
  drm/i915/mst: Reject modes that require the bigjoiner
  drm/i915/mst: Fix .mode_valid_ctx() return values
  ...
2023-12-08 11:17:44 -08:00
Linus Torvalds
a6adef8987 Merge tag 'soc-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "Most of the changes are devicetree fixes for NXP, Mediatek, Rockchips
  Arm machines as well as Microchip RISC-V, and most of these address
  build-time warnings for spec violations and other minor issues. One of
  the Mediatek warnings was enabled by default and prevented a clean
  build.

  The ones that address serious runtime issues are all on the i.MX
  platform:

   - a boot time panic on imx8qm

   - USB hanging under load on imx8

   - regressions on the imx93 ethernet phy

  Code fixes include a minor error handling for the i.MX PMU driver, and
  a number of firmware driver fixes:

   - OP-TEE fix for supplicant based device enumeration, and a new sysfs
     attribute to needed to fix a race against userspace

   - Arm SCMI fix for possible truncation/overflow in the frequency
     computations

   - Multiple FF-A fixes for the newly added notification support"

* tag 'soc-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (55 commits)
  MAINTAINERS: change the S32G2 maintainer's email address.
  arm64: dts: rockchip: Fix eMMC Data Strobe PD on rk3588
  ARM: dts: imx28-xea: Pass the 'model' property
  ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt
  MAINTAINERS: reinstate freescale ARM64 DT directory in i.MX entry
  arm64: dts: imx8-apalis: set wifi regulator to always-on
  ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init
  arm64: dts: imx8ulp: update gpio node name to align with register address
  arm64: dts: imx93: update gpio node name to align with register address
  arm64: dts: imx93: correct mediamix power
  arm64: dts: imx8qm: Add imx8qm's own pm to avoid panic during startup
  arm64: dts: freescale: imx8-ss-dma: Fix #pwm-cells
  arm64: dts: freescale: imx8-ss-lsio: Fix #pwm-cells
  dt-bindings: pwm: imx-pwm: Unify #pwm-cells for all compatibles
  ARM: dts: imx6ul-pico: Describe the Ethernet PHY clock
  arm64: dts: imx8mp: imx8mq: Add parkmode-disable-ss-quirk on DWC3
  arm64: dts: rockchip: Fix PCI node addresses on rk3399-gru
  arm64: dts: rockchip: drop interrupt-names property from rk3588s dfi
  firmware: arm_scmi: Fix possible frequency truncation when using level indexing mode
  firmware: arm_scmi: Fix frequency truncation by promoting multiplier type
  ...
2023-12-08 08:58:39 -08:00
Linus Torvalds
8e819a7623 Merge tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "31 hotfixes. Ten of these address pre-6.6 issues and are marked
  cc:stable. The remainder address post-6.6 issues or aren't considered
  serious enough to justify backporting"

* tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()
  nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
  mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
  scripts/gdb: fix lx-device-list-bus and lx-device-list-class
  MAINTAINERS: drop Antti Palosaari
  highmem: fix a memory copy problem in memcpy_from_folio
  nilfs2: fix missing error check for sb_set_blocksize call
  kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
  units: add missing header
  drivers/base/cpu: crash data showing should depends on KEXEC_CORE
  mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions
  scripts/gdb/tasks: fix lx-ps command error
  mm/Kconfig: make userfaultfd a menuconfig
  selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS
  mm/damon/core: copy nr_accesses when splitting region
  lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
  checkstack: fix printed address
  mm/memory_hotplug: fix error handling in add_memory_resource()
  mm/memory_hotplug: add missing mem_hotplug_lock
  .mailmap: add a new address mapping for Chester Lin
  ...
2023-12-08 08:36:23 -08:00
Maciej Żenczykowski
bd4a816752 net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIX
Lorenzo points out that we effectively clear all unknown
flags from PIO when copying them to userspace in the netlink
RTM_NEWPREFIX notification.

We could fix this one at a time as new flags are defined,
or in one fell swoop - I choose the latter.

We could either define 6 new reserved flags (reserved1..6) and handle
them individually (and rename them as new flags are defined), or we
could simply copy the entire unmodified byte over - I choose the latter.

This unfortunately requires some anonymous union/struct magic,
so we add a static assert on the struct size for a little extra safety.

Cc: David Ahern <dsahern@kernel.org>
Cc: Lorenzo Colitti <lorenzo@google.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08 10:40:51 +00:00
David S. Miller
179a8b515e Merge tag 'mlx5-fixes-2023-12-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5 fixes 2023-12-04

This series provides bug fixes to mlx5 driver.

V1->V2:
  - Drop commit #9 ("net/mlx5e: Forbid devlink reload if IPSec rules are
    offloaded"), we are working on a better fix

Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08 10:30:34 +00:00
Dave Airlie
9ac4883d24 Merge tag 'drm-misc-fixes-2023-12-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.7-rc5:
- Document nouveau's GSP-RM.
- Flush vmm harder on nouveau tu102.
- Panfrost fix for imported dma-buf objects, and device frequency.
- Kconfig Build fix for tc358768.
- Call end_fb_access after atomic commit.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/05a26dc0-8cf1-4b1f-abb6-3bf471fbfc99@linux.intel.com
2023-12-08 12:16:11 +10:00
Linus Torvalds
5e3f5b81de Merge tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - veth: fix packet segmentation in veth_convert_skb_to_xdp_buff

  Current release - new code bugs:

   - tcp: assorted fixes to the new Auth Option support

  Older releases - regressions:

   - tcp: fix mid stream window clamp

   - tls: fix incorrect splice handling

   - ipv4: ip_gre: handle skb_pull() failure in ipgre_xmit()

   - dsa: mv88e6xxx: restore USXGMII support for 6393X

   - arcnet: restore support for multiple Sohard Arcnet cards

  Older releases - always broken:

   - tcp: do not accept ACK of bytes we never sent

   - require admin privileges to receive packet traces via netlink

   - packet: move reference count in packet_sock to atomic_long_t

   - bpf:
      - fix incorrect branch offset comparison with cpu=v4
      - fix prog_array_map_poke_run map poke update

   - netfilter:
      - three fixes for crashes on bad admin commands
      - xt_owner: fix race accessing sk->sk_socket, TOCTOU null-deref
      - nf_tables: fix 'exist' matching on bigendian arches

   - leds: netdev: fix RTNL handling to prevent potential deadlock

   - eth: tg3: prevent races in error/reset handling

   - eth: r8169: fix rtl8125b PAUSE storm when suspended

   - eth: r8152: improve reset and surprise removal handling

   - eth: hns: fix race between changing features and sending

   - eth: nfp: fix sleep in atomic for bonding offload"

* tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
  net/smc: fix missing byte order conversion in CLC handshake
  net: dsa: microchip: provide a list of valid protocols for xmit handler
  drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
  psample: Require 'CAP_NET_ADMIN' when joining "packets" group
  bpf: sockmap, updating the sg structure should also update curr
  net: tls, update curr on splice as well
  nfp: flower: fix for take a mutex lock in soft irq context and rcu lock
  net: dsa: mv88e6xxx: Restore USXGMII support for 6393X
  tcp: do not accept ACK of bytes we never sent
  selftests/bpf: Add test for early update in prog_array_map_poke_run
  bpf: Fix prog_array_map_poke_run map poke update
  netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
  netfilter: nf_tables: validate family when identifying table via handle
  netfilter: nf_tables: bail out on mismatching dynset and set expressions
  netfilter: nf_tables: fix 'exist' matching on bigendian arches
  netfilter: nft_set_pipapo: skip inactive elements during set walk
  netfilter: bpf: fix bad registration on nf_defrag
  leds: trigger: netdev: fix RTNL handling to prevent potential deadlock
  octeontx2-af: Update Tx link register range
  ...
2023-12-07 17:04:13 -08:00
Linus Torvalds
33d42bde99 Merge tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:

 - Fix i8042 filter resource handling, input, and suspend issues in
   asus-wmi

 - Skip zero instance WMI blocks to avoid issues with some laptops

 - Differentiate dev/production keys in mlxbf-bootctl

 - Correct surface serdev related return value to avoid leaking errno
   into userspace

 - Error checking fixes

* tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/mellanox: Check devm_hwmon_device_register_with_groups() return value
  platform/mellanox: Add null pointer checks for devm_kasprintf()
  mlxbf-bootctl: correctly identify secure boot with development keys
  platform/x86: wmi: Skip blocks with zero instances
  platform/surface: aggregator: fix recv_buf() return value
  platform/x86: asus-wmi: disable USB0 hub on ROG Ally before suspend
  platform/x86: asus-wmi: Filter Volume key presses if also reported via atkbd
  platform/x86: asus-wmi: Change q500a_i8042_filter() into a generic i8042-filter
  platform/x86: asus-wmi: Move i8042 filter install to shared asus-wmi code
2023-12-07 12:10:55 -08:00
Ido Schimmel
e03781879a drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
The "NET_DM" generic netlink family notifies drop locations over the
"events" multicast group. This is problematic since by default generic
netlink allows non-root users to listen to these notifications.

Fix by adding a new field to the generic netlink multicast group
structure that when set prevents non-root users or root without the
'CAP_SYS_ADMIN' capability (in the user namespace owning the network
namespace) from joining the group. Set this field for the "events"
group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the
nature of the information that is shared over this group.

Note that the capability check in this case will always be performed
against the initial user namespace since the family is not netns aware
and only operates in the initial network namespace.

A new field is added to the structure rather than using the "flags"
field because the existing field uses uAPI flags and it is inappropriate
to add a new uAPI flag for an internal kernel check. In net-next we can
rework the "flags" field to use internal flags and fold the new field
into it. But for now, in order to reduce the amount of changes, add a
new field.

Since the information can only be consumed by root, mark the control
plane operations that start and stop the tracing as root-only using the
'GENL_ADMIN_PERM' flag.

Tested using [1].

Before:

 # capsh -- -c ./dm_repo
 # capsh --drop=cap_sys_admin -- -c ./dm_repo

After:

 # capsh -- -c ./dm_repo
 # capsh --drop=cap_sys_admin -- -c ./dm_repo
 Failed to join "events" multicast group

[1]
 $ cat dm.c
 #include <stdio.h>
 #include <netlink/genl/ctrl.h>
 #include <netlink/genl/genl.h>
 #include <netlink/socket.h>

 int main(int argc, char **argv)
 {
 	struct nl_sock *sk;
 	int grp, err;

 	sk = nl_socket_alloc();
 	if (!sk) {
 		fprintf(stderr, "Failed to allocate socket\n");
 		return -1;
 	}

 	err = genl_connect(sk);
 	if (err) {
 		fprintf(stderr, "Failed to connect socket\n");
 		return err;
 	}

 	grp = genl_ctrl_resolve_grp(sk, "NET_DM", "events");
 	if (grp < 0) {
 		fprintf(stderr,
 			"Failed to resolve \"events\" multicast group\n");
 		return grp;
 	}

 	err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE);
 	if (err) {
 		fprintf(stderr, "Failed to join \"events\" multicast group\n");
 		return err;
 	}

 	return 0;
 }
 $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o dm_repo dm.c

Fixes: 9a8afc8d39 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231206213102.1824398-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07 09:54:02 -08:00
Jakub Kicinski
c85e5594b7 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2023-12-06

We've added 4 non-merge commits during the last 6 day(s) which contain
a total of 7 files changed, 185 insertions(+), 55 deletions(-).

The main changes are:

1) Fix race found by syzkaller on prog_array_map_poke_run when
   a BPF program's kallsym symbols were still missing, from Jiri Olsa.

2) Fix BPF verifier's branch offset comparison for BPF_JMP32 | BPF_JA,
   from Yonghong Song.

3) Fix xsk's poll handling to only set mask on bound xsk sockets,
   from Yewon Choi.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add test for early update in prog_array_map_poke_run
  bpf: Fix prog_array_map_poke_run map poke update
  xsk: Skip polling event check for unbound socket
  bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
====================

Link: https://lore.kernel.org/r/20231206220528.12093-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07 09:32:24 -08:00
Andrew Morton
0c92218f4e Merge branch 'master' into mm-hotfixes-stable 2023-12-06 17:03:50 -08:00
Su Hui
73424d00dc highmem: fix a memory copy problem in memcpy_from_folio
Clang static checker complains that value stored to 'from' is never read. 
And memcpy_from_folio() only copy the last chunk memory from folio to
destination.  Use 'to += chunk' to replace 'from += chunk' to fix this
typo problem.

Link: https://lkml.kernel.org/r/20231130034017.1210429-1-suhui@nfschina.com
Fixes: b23d03ef7a ("highmem: add memcpy_to_folio() and memcpy_from_folio()")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:49 -08:00
Andy Shevchenko
8e92157d7f units: add missing header
BITS_PER_BYTE is defined in bits.h.

Link: https://lkml.kernel.org/r/20231128174404.393393-1-andriy.shevchenko@linux.intel.com
Fixes: e8eed5f736 ("units: Add BYTES_PER_*BIT")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Damian Muszynski <damian.muszynski@intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:48 -08:00
Mike Kravetz
187da0f825 hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write
The routine __vma_private_lock tests for the existence of a reserve map
associated with a private hugetlb mapping.  A pointer to the reserve map
is in vma->vm_private_data.  __vma_private_lock was checking the pointer
for NULL.  However, it is possible that the low bits of the pointer could
be used as flags.  In such instances, vm_private_data is not NULL and not
a valid pointer.  This results in the null-ptr-deref reported by syzbot:

general protection fault, probably for non-canonical address 0xdffffc000000001d:
 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
CPU: 0 PID: 5048 Comm: syz-executor139 Not tainted 6.6.0-rc7-syzkaller-00142-g88
8cf78c29e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 1
0/09/2023
RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5004
...
Call Trace:
 <TASK>
 lock_acquire kernel/locking/lockdep.c:5753 [inline]
 lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5718
 down_write+0x93/0x200 kernel/locking/rwsem.c:1573
 hugetlb_vma_lock_write mm/hugetlb.c:300 [inline]
 hugetlb_vma_lock_write+0xae/0x100 mm/hugetlb.c:291
 __hugetlb_zap_begin+0x1e9/0x2b0 mm/hugetlb.c:5447
 hugetlb_zap_begin include/linux/hugetlb.h:258 [inline]
 unmap_vmas+0x2f4/0x470 mm/memory.c:1733
 exit_mmap+0x1ad/0xa60 mm/mmap.c:3230
 __mmput+0x12a/0x4d0 kernel/fork.c:1349
 mmput+0x62/0x70 kernel/fork.c:1371
 exit_mm kernel/exit.c:567 [inline]
 do_exit+0x9ad/0x2a20 kernel/exit.c:861
 __do_sys_exit kernel/exit.c:991 [inline]
 __se_sys_exit kernel/exit.c:989 [inline]
 __x64_sys_exit+0x42/0x50 kernel/exit.c:989
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Mask off low bit flags before checking for NULL pointer.  In addition, the
reserve map only 'belongs' to the OWNER (parent in parent/child
relationships) so also check for the OWNER flag.

Link: https://lkml.kernel.org/r/20231114012033.259600-1-mike.kravetz@oracle.com
Reported-by: syzbot+6ada951e7c0f7bc8a71e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/00000000000078d1e00608d7878b@google.com/
Fixes: bf4916922c ("hugetlbfs: extend hugetlb_vma_lock to private VMAs")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Edward Adam Davis <eadavis@qq.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:43 -08:00
Jiri Olsa
4b7de80160 bpf: Fix prog_array_map_poke_run map poke update
Lee pointed out issue found by syscaller [0] hitting BUG in prog array
map poke update in prog_array_map_poke_run function due to error value
returned from bpf_arch_text_poke function.

There's race window where bpf_arch_text_poke can fail due to missing
bpf program kallsym symbols, which is accounted for with check for
-EINVAL in that BUG_ON call.

The problem is that in such case we won't update the tail call jump
and cause imbalance for the next tail call update check which will
fail with -EBUSY in bpf_arch_text_poke.

I'm hitting following race during the program load:

  CPU 0                             CPU 1

  bpf_prog_load
    bpf_check
      do_misc_fixups
        prog_array_map_poke_track

                                    map_update_elem
                                      bpf_fd_array_map_update_elem
                                        prog_array_map_poke_run

                                          bpf_arch_text_poke returns -EINVAL

    bpf_prog_kallsyms_add

After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
poke update fails on expected jump instruction check in bpf_arch_text_poke
with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.

Similar race exists on the program unload.

Fixing this by moving the update to bpf_arch_poke_desc_update function which
makes sure we call __bpf_arch_text_poke that skips the bpf address check.

Each architecture has slightly different approach wrt looking up bpf address
in bpf_arch_text_poke, so instead of splitting the function or adding new
'checkip' argument in previous version, it seems best to move the whole
map_poke_run update as arch specific code.

  [0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810

Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Cc: Lee Jones <lee@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org
2023-12-06 22:40:16 +01:00
Arnd Bergmann
b0b2981c49 Merge tag 'ffa-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes
Arm FF-A fixes for v6.7

A bunch of fixes addressing issues around the notification support that
was added this cycle. They address issue in partition IDs handling in
ffa_notification_info_get(), notifications cleanup path and the size of
the allocation in ffa_partitions_cleanup().

It also adds check for the notification enabled state so that the drivers
registering the callbacks can be rejected if not enabled/supported.

It also moves the partitions setup operation after the notification
initialisation so that the driver has the correct state for notification
enabled/supported before the partitions are initialised/setup.

It also now allows FF-A initialisation to complete successfully even
when the notification initialisation fails as it is an optional support
in the specification. Initial support just allowed it only if the
firmware didn't support notifications.

Finally, it also adds a fix for smatch warning by declaring ffa_bus_type
structure in the header.

* tag 'ffa-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  firmware: arm_ffa: Fix ffa_notification_info_get() IDs handling
  firmware: arm_ffa: Fix the size of the allocation in ffa_partitions_cleanup()
  firmware: arm_ffa: Fix FFA notifications cleanup path
  firmware: arm_ffa: Add checks for the notification enabled state
  firmware: arm_ffa: Setup the partitions after the notification initialisation
  firmware: arm_ffa: Allow FF-A initialisation even when notification fails
  firmware: arm_ffa: Declare ffa_bus_type structure in the header

Link: https://lore.kernel.org/r/20231116191603.929767-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-06 17:35:29 +01:00
Dmitry Safonov
9396c4ee93 net/tcp: Don't store TCP-AO maclen on reqsk
This extra check doesn't work for a handshake when SYN segment has
(current_key.maclen != rnext_key.maclen). It could be amended to
preserve rnext_key.maclen instead of current_key.maclen, but that
requires a lookup on listen socket.

Originally, this extra maclen check was introduced just because it was
cheap. Drop it and convert tcp_request_sock::maclen into boolean
tcp_request_sock::used_tcp_ao.

Fixes: 06b22ef295 ("net/tcp: Wire TCP-AO to request sockets")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06 12:36:56 +01:00
Dmitry Safonov
da7dfaa6d6 net/tcp: Consistently align TCP-AO option in the header
Currently functions that pre-calculate TCP header options length use
unaligned TCP-AO header + MAC-length for skb reservation.
And the functions that actually write TCP-AO options into skb do align
the header. Nothing good can come out of this for ((maclen % 4) != 0).

Provide tcp_ao_len_aligned() helper and use it everywhere for TCP
header options space calculations.

Fixes: 1e03d32bea ("net/tcp: Add TCP-AO sign to outgoing packets")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06 12:36:55 +01:00
Thomas Zimmermann
e0f04e41e8 drm/atomic-helpers: Invoke end_fb_access while owning plane state
Invoke drm_plane_helper_funcs.end_fb_access before
drm_atomic_helper_commit_hw_done(). The latter function hands over
ownership of the plane state to the following commit, which might
free it. Releasing resources in end_fb_access then operates on undefined
state. This bug has been observed with non-blocking commits when they
are being queued up quickly.

Here is an example stack trace from the bug report. The plane state has
been free'd already, so the pages for drm_gem_fb_vunmap() are gone.

Unable to handle kernel paging request at virtual address 0000000100000049
[...]
 drm_gem_fb_vunmap+0x18/0x74
 drm_gem_end_shadow_fb_access+0x1c/0x2c
 drm_atomic_helper_cleanup_planes+0x58/0xd8
 drm_atomic_helper_commit_tail+0x90/0xa0
 commit_tail+0x15c/0x188
 commit_work+0x14/0x20

Fix this by running end_fb_access immediately after updating all planes
in drm_atomic_helper_commit_planes(). The existing clean-up helper
drm_atomic_helper_cleanup_planes() now only handles cleanup_fb.

For aborted commits, roll back from drm_atomic_helper_prepare_planes()
in the new helper drm_atomic_helper_unprepare_planes(). This case is
different from regular cleanup, as we have to release the new state;
regular cleanup releases the old state. The new helper also invokes
cleanup_fb for all planes.

The changes mostly involve DRM's atomic helpers. Only two drivers, i915
and nouveau, implement their own commit function. Update them to invoke
drm_atomic_helper_unprepare_planes(). Drivers with custom commit_tail
function do not require changes.

v4:
	* fix documentation (kernel test robot)
v3:
	* add drm_atomic_helper_unprepare_planes() for rolling back
	* use correct state for end_fb_access
v2:
	* fix test in drm_atomic_helper_cleanup_planes()

Reported-by: Alyssa Ross <hi@alyssa.is>
Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@alyssa.is/
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Fixes: 94d879eaf7 ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers")
Tested-by: Alyssa Ross <hi@alyssa.is>
Reviewed-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: <stable@vger.kernel.org> # v6.2+
Link: https://patchwork.freedesktop.org/patch/msgid/20231204083247.22006-1-tzimmermann@suse.de
2023-12-06 10:51:27 +01:00
Kelly Kane
7037d95a04 r8152: add vendor/device ID pair for ASUS USB-C2500
The ASUS USB-C2500 is an RTL8156 based 2.5G Ethernet controller.

Add the vendor and product ID values to the driver. This makes Ethernet
work with the adapter.

Signed-off-by: Kelly Kane <kelly@hawknetworks.com>
Link: https://lore.kernel.org/r/20231203011712.6314-1-kelly@hawknetworks.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06 10:38:38 +01:00
Paolo Abeni
58d3aade20 tcp: fix mid stream window clamp.
After the blamed commit below, if the user-space application performs
window clamping when tp->rcv_wnd is 0, the TCP socket will never be
able to announce a non 0 receive window, even after completely emptying
the receive buffer and re-setting the window clamp to higher values.

Refactor tcp_set_window_clamp() to address the issue: when the user
decreases the current clamp value, set rcv_ssthresh according to the
same logic used at buffer initialization, but ensuring reserved mem
provisioning.

To avoid code duplication factor-out the relevant bits from
tcp_adjust_rcv_ssthresh() in a new helper and reuse it in the above
scenario.

When increasing the clamp value, give the rcv_ssthresh a chance to grow
according to previously implemented heuristic.

Fixes: 3aa7857fe1 ("tcp: enable mid stream window clamp")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/705dad54e6e6e9a010e571bf58e0b35a8ae70503.1701706073.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05 20:07:02 -08:00
Leon Romanovsky
c2bf84f1d1 net/mlx5e: Tidy up IPsec NAT-T SA discovery
IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones.
In case they arrive to RX, the SPI and ESP are located in inner header,
while the check was performed on outer header instead.

That wrong check caused to the situation where received rekeying request
was missed and caused to rekey timeout, which "compensated" this failure
by completing rekeying.

Fixes: d659549349 ("net/mlx5e: Support IPsec NAT-T functionality")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04 22:11:52 -08:00
Leon Romanovsky
a5e400a985 net/mlx5e: Honor user choice of IPsec replay window size
Users can configure IPsec replay window size, but mlx5 driver didn't
honor their choice and set always 32bits. Fix assignment logic to
configure right size from the beginning.

Fixes: 7db21ef456 ("net/mlx5e: Set IPsec replay sequence numbers")
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04 22:11:51 -08:00
Jianheng Zhang
37e4b8df27 net: stmmac: fix FPE events losing
The status bits of register MAC_FPE_CTRL_STS are clear on read. Using
32-bit read for MAC_FPE_CTRL_STS in dwmac5_fpe_configure() and
dwmac5_fpe_send_mpacket() clear the status bits. Then the stmmac interrupt
handler missing FPE event status and leads to FPE handshaking failure and
retries.
To avoid clear status bits of MAC_FPE_CTRL_STS in dwmac5_fpe_configure()
and dwmac5_fpe_send_mpacket(), add fpe_csr to stmmac_fpe_cfg structure to
cache the control bits of MAC_FPE_CTRL_STS and to avoid reading
MAC_FPE_CTRL_STS in those methods.

Fixes: 5a5586112b ("net: stmmac: support FPE link partner hand-shaking procedure")
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jianheng Zhang <Jianheng.Zhang@synopsys.com>
Link: https://lore.kernel.org/r/CY5PR12MB637225A7CF529D5BE0FBE59CBF81A@CY5PR12MB6372.namprd12.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04 18:35:19 -08:00
Mike Marciniszyn
4fbc3a52cd RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz
64k pages introduce the situation in this diagram when the HCA 4k page
size is being used:

 +-------------------------------------------+ <--- 64k aligned VA
 |                                           |
 |              HCA 4k page                  |
 |                                           |
 +-------------------------------------------+
 |                   o                       |
 |                                           |
 |                   o                       |
 |                                           |
 |                   o                       |
 +-------------------------------------------+
 |                                           |
 |              HCA 4k page                  |
 |                                           |
 +-------------------------------------------+ <--- Live HCA page
 |OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO| <--- offset
 |                                           | <--- VA
 |                MR data                    |
 +-------------------------------------------+
 |                                           |
 |              HCA 4k page                  |
 |                                           |
 +-------------------------------------------+
 |                   o                       |
 |                                           |
 |                   o                       |
 |                                           |
 |                   o                       |
 +-------------------------------------------+
 |                                           |
 |              HCA 4k page                  |
 |                                           |
 +-------------------------------------------+

The VA addresses are coming from rdma-core in this diagram can be
arbitrary, but for 64k pages, the VA may be offset by some number of HCA
4k pages and followed by some number of HCA 4k pages.

The current iterator doesn't account for either the preceding 4k pages or
the following 4k pages.

Fix the issue by extending the ib_block_iter to contain the number of DMA
pages like comment [1] says and by using __sg_advance to start the
iterator at the first live HCA page.

The changes are contained in a parallel set of iterator start and next
functions that are umem aware and specific to umem since there is one user
of the rdma_for_each_block() without umem.

These two fixes prevents the extra pages before and after the user MR
data.

Fix the preceding pages by using the __sq_advance field to start at the
first 4k page containing MR data.

Fix the following pages by saving the number of pgsz blocks in the
iterator state and downcounting on each next.

This fix allows for the elimination of the small page crutch noted in the
Fixes.

Fixes: 10c75ccb54 ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()")
Link: https://lore.kernel.org/r/20231129202143.1434-2-shiraz.saleem@intel.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-12-04 20:02:41 -04:00
Tyler Fanelli
c55e0a55b1 fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
Although DIRECT_IO_RELAX's initial usage is to allow shared mmap, its
description indicates a purpose of reducing memory footprint. This
may imply that it could be further used to relax other DIRECT_IO
operations in the future.

Replace it with a flag DIRECT_IO_ALLOW_MMAP which does only one thing,
allow shared mmap of DIRECT_IO files while still bypassing the cache
on regular reads and writes.

[Miklos] Also Keep DIRECT_IO_RELAX definition for backward compatibility.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
Fixes: e78662e818 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Cc: <stable@vger.kernel.org> # v6.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04 10:14:39 +01:00
Linus Torvalds
17b17be28d Merge tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio
Pull vfio fixes from Alex Williamson:

 - Fix the lifecycle of a mutex in the pds variant driver such that a
   reset prior to opening the device won't find it uninitialized.
   Implement the release path to symmetrically destroy the mutex. Also
   switch a different lock from spinlock to mutex as the code path has
   the potential to sleep and doesn't need the spinlock context
   otherwise (Brett Creeley)

 - Fix an issue detected via randconfig where KVM tries to symbol_get an
   undeclared function. The symbol is temporarily declared
   unconditionally here, which resolves the problem and avoids churn
   relative to a series pending for the next merge window which resolves
   some of this symbol ugliness, but also fixes Kconfig dependencies
   (Sean Christopherson)

* tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio:
  vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart
  vfio/pds: Fix possible sleep while in atomic context
  vfio/pds: Fix mutex lock->magic != lock warning
2023-12-03 08:37:39 +09:00
Linus Torvalds
669fc83452 Merge tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:

 - objpool: Fix objpool overrun case on memory/cache access delay
   especially on the big.LITTLE SoC. The objpool uses a copy of object
   slot index internal loop, but the slot index can be changed on
   another processor in parallel. In that case, the difference of 'head'
   local copy and the 'slot->last' index will be bigger than local slot
   size. In that case, we need to re-read the slot::head to update it.

 - kretprobe: Fix to use appropriate rcu API for kretprobe holder. Since
   kretprobe_holder::rp is RCU managed, it should use
   rcu_assign_pointer() and rcu_dereference_check() correctly. Also
   adding __rcu tag for finding wrong usage by sparse.

 - rethook: Fix to use appropriate rcu API for rethook::handler. The
   same as kretprobe, rethook::handler is RCU managed and it should use
   rcu_assign_pointer() and rcu_dereference_check(). This also adds
   __rcu tag for finding wrong usage by sparse.

* tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rethook: Use __rcu pointer for rethook::handler
  kprobes: consistent rcu api usage for kretprobe holder
  lib: objpool: fix head overrun on RK3588 SBC
2023-12-03 08:02:49 +09:00
Linus Torvalds
815fb87b75 Merge tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix issues in two cpufreq drivers, in the AMD P-state driver and
  in the power-capping DTPM framework.

  Specifics:

   - Fix the AMD P-state driver's EPP sysfs interface in the cases when
     the performance governor is in use (Ayush Jain)

   - Make the ->fast_switch() callback in the AMD P-state driver return
     the target frequency as expected (Gautham R. Shenoy)

   - Allow user space to control the range of frequencies to use via
     scaling_min_freq and scaling_max_freq when AMD P-state driver is in
     use (Wyes Karny)

   - Prevent power domains needed for wakeup signaling from being turned
     off during system suspend on Qualcomm systems and prevent
     performance states votes from runtime-suspended devices from being
     lost across a system suspend-resume cycle in qcom-cpufreq-nvmem
     (Stephan Gerhold)

   - Fix disabling the 792 Mhz OPP in the imx6q cpufreq driver for the
     i.MX6ULL types that can run at that frequency (Christoph
     Niedermaier)

   - Eliminate unnecessary and harmful conversions to uW from the DTPM
     (dynamic thermal and power management) framework (Lukasz Luba)"

* tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate: Only print supported EPP values for performance governor
  cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update
  powercap: DTPM: Fix unneeded conversions to micro-Watts
  cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch()
  pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP
  cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
  cpufreq: qcom-nvmem: Enable virtual power domain devices
  cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
2023-12-02 09:01:00 +09:00