Commit Graph

1045033 Commits

Author SHA1 Message Date
Aharon Landau
f1a090f09f RDMA/core: Require the driver to set the IOVA correctly during rereg_mr
If the driver returns a new MR during rereg it has to fill it with the
IOVA from the proper source. If IB_MR_REREG_TRANS is set then the IOVA is
cmd.hca_va, otherwise the IOVA comes from the old MR. mlx5 for example has
two calls inside rereg_mr:

		return create_real_mr(new_pd, umem, mr->ibmr.iova,
				      new_access_flags);
and
		return create_real_mr(new_pd, new_umem, iova, new_access_flags);

Unconditionally overwriting the iova in the newly allocated MR will
corrupt the iova if the first path is used.

Remove the redundant initializations from ib_uverbs_rereg_mr().

Fixes: 6e0954b11c ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr")
Link: https://lore.kernel.org/r/4b0a31bbc372842613286a10d7a8cbb0ee6069c7.1635400472.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-03 09:37:52 -03:00
Kamal Heib
dd83f482d2 RDMA/bnxt_re: Remove unsupported bnxt_re_modify_ah callback
There is no need to return always zero for function which is not
supported, especially since 0 is the wrong return code.

Link: https://lore.kernel.org/r/20211102073054.410838-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-03 09:06:36 -03:00
Jason Gunthorpe
6a463bc9d9 Merge branch 'for-rc' into rdma.git for-next
Patches held over for a possible rc8.

* for-rc:
  RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
  RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility
  RDMA/hns: Fix initial arm_st of CQ

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01 15:08:19 -03:00
Jason Gunthorpe
a2a2a69d14 Merge tag 'v5.15' into rdma.git for-next
Pull in the accepted for-rc patches as the next merge needs a newer base.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01 14:49:20 -03:00
Zhu Yanjun
9ed8110c9b RDMA/irdma: optimize rx path by removing unnecessary copy
In the function irdma_post_recv, the function irdma_copy_sg_list is
not needed since the struct irdma_sge and ib_sge have the similar
member variables. The struct irdma_sge can be replaced with the
struct ib_sge totally.

This can increase the rx performance of irdma.

Link: https://lore.kernel.org/r/20211030104226.253346-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01 14:39:33 -03:00
Kamal Heib
4e446714fb RDMA/qed: Use helper function to set GUIDs
Use addrconf_addr_eui48() helper function to set the GUIDs and remove the
driver specific version.

Link: https://lore.kernel.org/r/20211031170743.81755-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01 09:17:10 -03:00
Linus Torvalds
8bb7eca972 Linux 5.15 v5.15 2021-10-31 13:53:10 -07:00
Linus Torvalds
75fcbd3860 Merge tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix compilation of callchain related code on powerpc with gcc11+

 - Fix PERF_SAMPLE_WEIGHT_STRUCT support in 'perf script'

 - Check session->header.env.arch before using it, fixing a segmentation
   fault

 - Suppress 'rm dlfilter' build messages

* tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT support
  perf callchain: Fix compilation on powerpc with gcc11+
  perf script: Check session->header.env.arch before using it
  perf build: Suppress 'rm dlfilter' build message
2021-10-31 11:24:06 -07:00
Linus Torvalds
ca5e83eddc Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:

 - Fixes for s390 interrupt delivery

 - Fixes for Xen emulator bugs showing up as debug kernel WARNs

 - Fix another issue with SEV/ES string I/O VMGEXITs

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Take srcu lock in post_kvm_run_save()
  KVM: SEV-ES: fix another issue with string I/O VMGEXITs
  KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()
  KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock
  KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu
  KVM: s390: clear kicked_mask before sleeping again
2021-10-31 11:19:02 -07:00
Kan Liang
27730c8cd6 perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT support
-F weight in perf script is broken.

  # ./perf mem record
  # ./perf script -F weight
  Samples for 'dummy:HG' event do not have WEIGHT attribute set. Cannot
print 'weight' field.

The sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the
PERF_SAMPLE_WEIGHT sample type. They share the same space, weight. The
lower 32 bits are exactly the same for both sample type. The higher 32
bits may be different for different architecture. For a new kernel on
x86, the PERF_SAMPLE_WEIGHT_STRUCT is used. For an old kernel or other
ARCHs, the PERF_SAMPLE_WEIGHT is used.

With -F weight, current perf script will only check the input string
"weight" with the PERF_SAMPLE_WEIGHT sample type. Because the commit
ea8d0ed6ea ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT") didn't
update the PERF_SAMPLE_WEIGHT_STRUCT sample type for perf script. For a
new kernel on x86, the check fails.

Use PERF_SAMPLE_WEIGHT_TYPE, which supports both sample types, to
replace PERF_SAMPLE_WEIGHT

Fixes: ea8d0ed6ea ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT")
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1632929894-102778-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31 12:51:41 -03:00
Jiri Olsa
89ac61ff05 perf callchain: Fix compilation on powerpc with gcc11+
Got following build fail on powerpc:

    CC      arch/powerpc/util/skip-callchain-idx.o
  In function ‘check_return_reg’,
      inlined from ‘check_return_addr’ at arch/powerpc/util/skip-callchain-idx.c:213:7,
      inlined from ‘arch_skip_callchain_idx’ at arch/powerpc/util/skip-callchain-idx.c:265:7:
  arch/powerpc/util/skip-callchain-idx.c:54:18: error: ‘dwarf_frame_register’ accessing 96 bytes \
  in a region of size 64 [-Werror=stringop-overflow=]
     54 |         result = dwarf_frame_register(frame, ra_regno, ops_mem, &ops, &nops);
        |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  arch/powerpc/util/skip-callchain-idx.c: In function ‘arch_skip_callchain_idx’:
  arch/powerpc/util/skip-callchain-idx.c:54:18: note: referencing argument 3 of type ‘Dwarf_Op *’
  In file included from /usr/include/elfutils/libdwfl.h:32,
                   from arch/powerpc/util/skip-callchain-idx.c:10:
  /usr/include/elfutils/libdw.h:1069:12: note: in a call to function ‘dwarf_frame_register’
   1069 | extern int dwarf_frame_register (Dwarf_Frame *frame, int regno,
        |            ^~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

The dwarf_frame_register args changed with [1],
Updating ops_mem accordingly.

[1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=5621fe5443da23112170235dd5cac161e5c75e65

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Mark Wieelard <mjw@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20210928195253.1267023-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31 12:51:41 -03:00
Song Liu
29c77550ee perf script: Check session->header.env.arch before using it
When perf.data is not written cleanly, we would like to process existing
data as much as possible (please see f_header.data.size == 0 condition
in perf_session__read_header). However, perf.data with partial data may
crash perf. Specifically, we see crash in 'perf script' for NULL
session->header.env.arch.

Fix this by checking session->header.env.arch before using it to determine
native_arch. Also split the if condition so it is easier to read.

Committer notes:

If it is a pipe, we already assume is a native arch, so no need to check
session->header.env.arch.

Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211004053238.514936-1-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31 12:51:41 -03:00
Adrian Hunter
095729484e perf build: Suppress 'rm dlfilter' build message
The following build message:

	rm dlfilters/dlfilter-test-api-v0.o

is unwanted.

The object file is being treated as an intermediate file and being
automatically removed. Mark the object file as .SECONDARY to prevent
removal and hence the message.

Requested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210930062849.110416-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31 12:51:41 -03:00
Linus Torvalds
180eca540a Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Three small fixes, all in drivers, and one sizeable update to the UFS
  driver to remove the HPB 2.0 feature that has been objected to by Jens
  and Christoph.

  Although the UFS patch is large and last minute, it's essentially the
  least intrusive way of resolving the objections in time for the 5.15
  release"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: ufshpb: Remove HPB2.0 flows
  scsi: mpt3sas: Fix reference tag handling for WRITE_INSERT
  scsi: ufs: ufs-exynos: Correct timeout value setting registers
  scsi: ibmvfc: Fix up duplicate response detection
2021-10-30 15:56:38 -07:00
Linus Torvalds
3a4347d82e Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
 "One fix for the composite clk that broke when we changed this clk type
  to use the determine_rate instead of round_rate clk op by default.
  This caused lots of problems on Rockchip SoCs because they heavily use
  the composite clk code to model the clk tree"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: composite: Also consider .determine_rate for rate + mux composites
2021-10-30 09:55:46 -07:00
Linus Torvalds
bf85ba018f Merge tag 'riscv-for-linus-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
 "These are pretty late, but they do fix concrete issues.

   - ensure the trap vector's address is aligned.

   - avoid re-populating the KASAN shadow memory.

   - allow kasan to build without warnings, which have recently become
     errors"

* tag 'riscv-for-linus-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix asan-stack clang build
  riscv: Do not re-populate shadow memory with kasan_populate_early_shadow
  riscv: fix misalgned trap vector base address
2021-10-30 09:28:24 -07:00
Avri Altman
09d9e4d041 scsi: ufs: ufshpb: Remove HPB2.0 flows
The Host Performance Buffer feature allows UFS read commands to carry the
physical media addresses along with the LBAs, thus allowing less internal
L2P-table switches in the device.  HPB1.0 allowed a single LBA, while
HPB2.0 increases this capacity up to 255 blocks.

Carrying more than a single record, the read operation is no longer purely
of type "read" but a "hybrid" command: Writing the physical address to the
device in one operation and reading back the required payload in another.

The JEDEC HPB spec defines two commands for this operation:
HPB-WRITE-BUFFER (0x2) to write the physical addresses to device, and
HPB-READ to read the payload.

With the current HPB design the UFS driver has no alternative but to divide
the READ request into 2 separate commands: HPB-WRITE-BUFFER and HPB-READ.
This causes a great deal of aggravation to the block layer guys who
demanded that we completely revert the entire HPB driver regardless of the
huge amount of corporate effort already invested in it.

As a compromise, remove only the pieces that implement the 2.0
specification. This is done as a matter of urgency for the final 5.15
release.

Link: https://lore.kernel.org/r/20211030062301.248-1-avri.altman@wdc.com
Tested-by: Avri Altman <avri.altman@wdc.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-30 10:01:01 -04:00
Linus Torvalds
119c85055d Merge tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "Three commits fixing some issues introduced with the recent IOMMU
  changes we merged.

  Thanks to Alexey Kardashevskiy"

* tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
  powerpc/pseries/iommu: Check if the default window in use before removing it
  powerpc/pseries/iommu: Use correct vfree for it_map
2021-10-29 17:35:56 -07:00
Linus Torvalds
db2398a56a Merge tag 'gpio-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:

 - fix the return value check when parsing the ngpios property in
   gpio-xgs-iproc

 - check the return value of bgpio_init() in gpio-mlxbf2

* tag 'gpio-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: mlxbf2.c: Add check for bgpio_init failure
  gpio: xgs-iproc: fix parsing of ngpios property
2021-10-29 17:04:38 -07:00
Linus Torvalds
a379fbbcb8 Merge tag 'block-5.15-2021-10-29' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - NVMe pull request:
      - fix nvmet-tcp header digest verification (Amit Engel)
      - fix a memory leak in nvmet-tcp when releasing a queue (Maurizio
        Lombardi)
      - fix nvme-tcp H2CData PDU send accounting again (Sagi Grimberg)
      - fix digest pointer calculation in nvme-tcp and nvmet-tcp (Varun
        Prakash)
      - fix possible nvme-tcp req->offset corruption (Varun Prakash)

 - Queue drain ordering fix (Ming)

 - Partition check regression for zoned devices (Shin'ichiro)

 - Zone queue restart fix (Naohiro)

* tag 'block-5.15-2021-10-29' of git://git.kernel.dk/linux-block:
  block: Fix partition check for host-aware zoned block devices
  nvmet-tcp: fix header digest verification
  nvmet-tcp: fix data digest pointer calculation
  nvme-tcp: fix data digest pointer calculation
  nvme-tcp: fix possible req->offset corruption
  block: schedule queue restart after BLK_STS_ZONE_RESOURCE
  block: drain queue after disk is removed from sysfs
  nvme-tcp: fix H2CData PDU send accounting (again)
  nvmet-tcp: fix a memory leak when releasing a queue
2021-10-29 11:10:29 -07:00
Martin K. Petersen
61a9f252c1 scsi: mpt3sas: Fix reference tag handling for WRITE_INSERT
Testing revealed a problem with how the reference tag was handled for
a WRITE_INSERT operation. The SCSI_PROT_REF_CHECK flag is not set when
the controller is asked to generate the protection information
(i.e. not DIX). And as a result the initial reference tag would not be
set in the WRITE_INSERT case.

Separate handling of the REF_CHECK and REF_INCREMENT flags to align
with both the DIX spec and the MPI implementation.

Link: https://lore.kernel.org/r/20211028034202.24225-1-martin.petersen@oracle.com
Fixes: b3e2c72af1 ("scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-29 14:03:58 -04:00
Linus Torvalds
17d50f8941 Merge tag 'mmc-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:

 - tmio: Re-enable card irqs after a reset

 - mtk-sd: Fixup probing of cqhci for crypto

 - cqhci: Fix support for suspend/resume

 - vub300: Fix control-message timeouts

 - dw_mmc-exynos: Fix support for tuning

 - winbond: Silences build errors on M68K

 - sdhci-esdhc-imx: Fix support for tuning

 - sdhci-pci: Read card detect from ACPI for Intel Merrifield

 - sdhci: Fix eMMC support for Thundercomm TurboX CM2290

* tag 'mmc-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: tmio: reenable card irqs after the reset callback
  mmc: mediatek: Move cqhci init behind ungate clock
  mmc: cqhci: clear HALT state after CQE enable
  mmc: vub300: fix control-message timeouts
  mmc: dw_mmc: exynos: fix the finding clock sample value
  mmc: winbond: don't build on M68K
  mmc: sdhci-esdhc-imx: clear the buffer_read_ready to reset standard tuning circuit
  mmc: sdhci-pci: Read card detect from ACPI for Intel Merrifield
  mmc: sdhci: Map more voltage level to SDHCI_POWER_330
2021-10-29 10:54:44 -07:00
Linus Torvalds
fd919bbd33 Merge tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
 "Last minute fixes for crash on 32bit architectures when compression is
  in use. It's a regression introduced in 5.15-rc and I'd really like
  not let this into the final release, fixes via stable trees would add
  unnecessary delay.

  The problem is on 32bit architectures with highmem enabled, the pages
  for compression may need to be kmapped, while the patches removed that
  as we don't use GFP_HIGHMEM allocations anymore. The pages that don't
  come from local allocation still may be from highmem. Despite being on
  32bit there's enough such ARM machines in use so it's not a marginal
  issue.

  I did full reverts of the patches one by one instead of a huge one.
  There's one exception for the "lzo" revert as there was an
  intermediate patch touching the same code to make it compatible with
  subpage. I can't revert that one too, so the revert in lzo.c is
  manual. Qu Wenruo has worked on that with me and verified the changes"

* tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: compression: drop kmap/kunmap from lzo"
  Revert "btrfs: compression: drop kmap/kunmap from zlib"
  Revert "btrfs: compression: drop kmap/kunmap from zstd"
  Revert "btrfs: compression: drop kmap/kunmap from generic helpers"
2021-10-29 10:46:59 -07:00
Linus Torvalds
6f11521267 Merge tag 'trace-v5.15-rc6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing comment fixes from Steven Rostedt:

 - Some bots have informed me that some of the ftrace functions
   kernel-doc has formatting issues.

 - Also, fix my snake instinct.

* tag 'trace-v5.15-rc6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix misspelling of "missing"
  ftrace: Fix kernel-doc formatting issues
2021-10-29 10:41:07 -07:00
Linus Torvalds
75c7a6c1ca Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "Fix a build-time warning in x86/sm4"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/sm4 - Fix invalid section entry size
2021-10-29 10:17:08 -07:00
Chengchang Tang
6d202d9f70 RDMA/hns: Use the core code to manage the fixed mmap entries
Add a new implementation for mmap by using the new mmap entry API. This
makes way for further use of the dynamic mmap allocator in this driver.

Link: https://lore.kernel.org/r/20211028105640.1056-1-liangwenpeng@huawei.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 14:07:31 -03:00
Linus Torvalds
2c04d67ec1 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "11 patches.

  Subsystems affected by this patch series: mm (memcg, memory-failure,
  oom-kill, secretmem, vmalloc, hugetlb, damon, and tools), and ocfs2"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer
  mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'
  mm: khugepaged: skip huge page collapse for special files
  mm, thp: bail out early in collapse_file for writeback page
  mm/vmalloc: fix numa spreading for large hash tables
  mm/secretmem: avoid letting secretmem_users drop to zero
  ocfs2: fix race between searching chunks and release journal_head from buffer_head
  mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap
  mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  mm: hwpoison: remove the unnecessary THP check
  memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT
2021-10-29 10:03:07 -07:00
Scott Breyer
4892298c3a IB/opa_vnic: Rebranding of OPA VNIC driver to Cornelis Networks
Changes instances of Intel to Cornelis in identifying strings

Link: https://lore.kernel.org/r/20211028124611.26694.71239.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:30:43 -03:00
Scott Breyer
840f4ed2d4 IB/qib: Rebranding of qib driver to Cornelis Networks
Changes instances of Intel to Cornelis in identifying strings

Link: https://lore.kernel.org/r/20211028124606.26694.71567.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:30:43 -03:00
Scott Breyer
ddf65f28dd IB/hfi1: Rebranding of hfi1 driver to Cornelis Networks
Changes instances of Intel to Cornelis in identifying strings

Link: https://lore.kernel.org/r/20211028124601.26694.35662.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:30:43 -03:00
Kamal Heib
493620b1c9 RDMA/bnxt_re: Use helper function to set GUIDs
Use addrconf_addr_eui48() helper function to set the GUIDs and remove the
driver specific version.

Link: https://lore.kernel.org/r/20211028094359.160407-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:24:09 -03:00
Alexandre Ghiti
54c5639d8f riscv: Fix asan-stack clang build
Nathan reported that because KASAN_SHADOW_OFFSET was not defined in
Kconfig, it prevents asan-stack from getting disabled with clang even
when CONFIG_KASAN_STACK is disabled: fix this by defining the
corresponding config.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-29 08:54:50 -07:00
Alexandre Ghiti
cf11d01135 riscv: Do not re-populate shadow memory with kasan_populate_early_shadow
When calling this function, all the shadow memory is already populated
with kasan_early_shadow_pte which has PAGE_KERNEL protection.
kasan_populate_early_shadow write-protects the mapping of the range
of addresses passed in argument in zero_pte_populate, which actually
write-protects all the shadow memory mapping since kasan_early_shadow_pte
is used for all the shadow memory at this point. And then when using
memblock API to populate the shadow memory, the first write access to the
kernel stack triggers a trap. This becomes visible with the next commit
that contains a fix for asan-stack.

We already manually populate all the shadow memory in kasan_early_init
and we write-protect kasan_early_shadow_pte at the end of kasan_init
which makes the calls to kasan_populate_early_shadow superfluous so
we can remove them.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: e178d670f2 ("riscv/kasan: add KASAN_VMALLOC support")
Fixes: 8ad8b72721 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-29 08:53:42 -07:00
Kamal Heib
04567caf96 RDMA/bnxt_re: Fix kernel panic when trying to access bnxt_re_stat_descs
For some reason when introducing the fixed commit the "active_pds" and
"active_ahs" descriptors got dropped, which lead to the following panic
when trying to access the first entry in the descriptors.

 bnxt_re: Broadcom NetXtreme-C/E RoCE Driver
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 CPU: 2 PID: 594 Comm: kworker/u32:1 Not tainted 5.15.0-rc6+ #2
 Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.12.1 12/07/2020
 Workqueue: bnxt_re bnxt_re_task [bnxt_re]
 RIP: 0010:strlen+0x0/0x20
 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 31
 RSP: 0018:ffffb25fc47dfbb0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000008100
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 00000000fffffff4 R09: 0000000000000000
 R10: ffff8a05c71fc028 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff8a05c3dee800
 FS:  0000000000000000(0000) GS:ffff8a092fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000048d3da001 CR4: 00000000001706e0
 Call Trace:
  kernfs_name_hash+0x12/0x80
  kernfs_find_ns+0x35/0xd0
  kernfs_remove_by_name_ns+0x32/0x90
  remove_files+0x2b/0x60
  create_files+0x1d3/0x1f0
  internal_create_group+0x17b/0x1f0
  internal_create_groups.part.0+0x3d/0xa0
  setup_port+0x180/0x3b0 [ib_core]
  ? __cond_resched+0x16/0x40
  ? kmem_cache_alloc_trace+0x278/0x3d0
  ib_setup_port_attrs+0x99/0x240 [ib_core]
  ib_register_device+0xcc/0x160 [ib_core]
  bnxt_re_task+0xba/0x170 [bnxt_re]
  process_one_work+0x1eb/0x390
  worker_thread+0x53/0x3d0
  ? process_one_work+0x390/0x390
  kthread+0x10f/0x130
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x22/0x30

Fixes: 13f30b0fa0 ("RDMA/counter: Add a descriptor in struct rdma_hw_stats")
Link: https://lore.kernel.org/r/20211027205448.127821-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 12:10:33 -03:00
Alok Prasad
4f960393a0 RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
This patch fixes a crash caused by querying the QP via netlink, and
corrects the state of GSI qp. GSI qp's have a NULL qed_qp.

The call trace is generated by:
 $ rdma res show

 BUG: kernel NULL pointer dereference, address: 0000000000000034
 Hardware name: Dell Inc. PowerEdge R720/0M1GCR, BIOS 1.2.6 05/10/2012
 RIP: 0010:qed_rdma_query_qp+0x33/0x1a0 [qed]
 RSP: 0018:ffffba560a08f580 EFLAGS: 00010206
 RAX: 0000000200000000 RBX: ffffba560a08f5b8 RCX: 0000000000000000
 RDX: ffffba560a08f5b8 RSI: 0000000000000000 RDI: ffff9807ee458090
 RBP: ffffba560a08f5a0 R08: 0000000000000000 R09: ffff9807890e7048
 R10: ffffba560a08f658 R11: 0000000000000000 R12: 0000000000000000
 R13: ffff9807ee458090 R14: ffff9807f0afb000 R15: ffffba560a08f7ec
 FS:  00007fbbf8bfe740(0000) GS:ffff980aafa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000034 CR3: 00000001720ba001 CR4: 00000000000606f0
 Call Trace:
  qedr_query_qp+0x82/0x360 [qedr]
  ib_query_qp+0x34/0x40 [ib_core]
  ? ib_query_qp+0x34/0x40 [ib_core]
  fill_res_qp_entry_query.isra.26+0x47/0x1d0 [ib_core]
  ? __nla_put+0x20/0x30
  ? nla_put+0x33/0x40
  fill_res_qp_entry+0xe3/0x120 [ib_core]
  res_get_common_dumpit+0x3f8/0x5d0 [ib_core]
  ? fill_res_cm_id_entry+0x1f0/0x1f0 [ib_core]
  nldev_res_get_qp_dumpit+0x1a/0x20 [ib_core]
  netlink_dump+0x156/0x2f0
  __netlink_dump_start+0x1ab/0x260
  rdma_nl_rcv+0x1de/0x330 [ib_core]
  ? nldev_res_get_cm_id_dumpit+0x20/0x20 [ib_core]
  netlink_unicast+0x1b8/0x270
  netlink_sendmsg+0x33e/0x470
  sock_sendmsg+0x63/0x70
  __sys_sendto+0x13f/0x180
  ? setup_sgl.isra.12+0x70/0xc0
  __x64_sys_sendto+0x28/0x30
  do_syscall_64+0x3a/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: stable@vger.kernel.org
Fixes: cecbcddf64 ("qedr: Add support for QP verbs")
Link: https://lore.kernel.org/r/20211027184329.18454-1-palok@marvell.com
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 11:54:16 -03:00
Yixing Liu
0e60778efb RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility
The upper limit of MAX_LP_MSG_LEN on HIP08 is 64K, and the upper limit on
HIP09 is 16K. Regardless of whether it is HIP08 or HIP09, only 16K will be
used. In order to ensure compatibility, it is unified to 16K.

Setting MAX_LP_MSG_LEN to 16K will not cause performance loss on HIP08.

Fixes: fbed9d2be2 ("RDMA/hns: Fix configuration of ack_req_freq in QPC")
Link: https://lore.kernel.org/r/20211029100537.27299-1-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 11:51:47 -03:00
Haoyue Xu
571fb4fb78 RDMA/hns: Fix initial arm_st of CQ
We set the init CQ status to ARMED before. As a result, an unexpected CEQE
would be reported. Therefore, the init CQ status should be set to no_armed
rather than REG_NXT_CEQE.

Fixes: a5073d6054 ("RDMA/hns: Add eq support of hip08")
Link: https://lore.kernel.org/r/20211029095846.26732-1-liangwenpeng@huawei.com
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 11:51:34 -03:00
Steven Rostedt (VMware)
ddcf906fe5 tracing: Fix misspelling of "missing"
My snake instinct was on and I wrote "misssing" instead of "missing".

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29 09:54:14 -04:00
Steven Rostedt (VMware)
6130722f11 ftrace: Fix kernel-doc formatting issues
Some functions had kernel-doc that used a comma instead of a hash to
separate the function name from the one line description.

Also, the "ftrace_is_dead()" had an incomplete description.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29 09:52:23 -04:00
David Sterba
ccaa66c8dd Revert "btrfs: compression: drop kmap/kunmap from lzo"
This reverts commit 8c945d32e6.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

The revert does not apply cleanly due to changes in a6e66e6f8c
("btrfs: rework lzo_decompress_bio() to make it subpage compatible")
that reworked the page iteration so the revert is done to be equivalent
to the original code.

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Tested-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 13:25:43 +02:00
David Sterba
55276e14df Revert "btrfs: compression: drop kmap/kunmap from zlib"
This reverts commit 696ab562e6.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 13:03:05 +02:00
David Sterba
56ee254d23 Revert "btrfs: compression: drop kmap/kunmap from zstd"
This reverts commit bbaf9715f3.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

Example stacktrace with ZSTD on a 32bit ARM machine:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = c4159ed3
  [00000000] *pgd=00000000
  Internal error: Oops: 5 [#1] PREEMPT SMP ARM
  Modules linked in:
  CPU: 0 PID: 210 Comm: kworker/u2:3 Not tainted 5.14.0-rc79+ #12
  Hardware name: Allwinner sun4i/sun5i Families
  Workqueue: btrfs-delalloc btrfs_work_helper
  PC is at mmiocpy+0x48/0x330
  LR is at ZSTD_compressStream_generic+0x15c/0x28c

  (mmiocpy) from [<c0629648>] (ZSTD_compressStream_generic+0x15c/0x28c)
  (ZSTD_compressStream_generic) from [<c06297dc>] (ZSTD_compressStream+0x64/0xa0)
  (ZSTD_compressStream) from [<c049444c>] (zstd_compress_pages+0x170/0x488)
  (zstd_compress_pages) from [<c0496798>] (btrfs_compress_pages+0x124/0x12c)
  (btrfs_compress_pages) from [<c043c068>] (compress_file_range+0x3c0/0x834)
  (compress_file_range) from [<c043c4ec>] (async_cow_start+0x10/0x28)
  (async_cow_start) from [<c0475c3c>] (btrfs_work_helper+0x100/0x230)
  (btrfs_work_helper) from [<c014ef68>] (process_one_work+0x1b4/0x418)
  (process_one_work) from [<c014f210>] (worker_thread+0x44/0x524)
  (worker_thread) from [<c0156aa4>] (kthread+0x180/0x1b0)
  (kthread) from [<c0100150>]

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 13:02:50 +02:00
David Yang
9c7516d669 tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer
The coccinelle check report:

  ./tools/testing/selftests/vm/split_huge_page_test.c:344:36-42:
  ERROR: application of sizeof to pointer

Use "strlen" to fix it.

Link: https://lkml.kernel.org/r/20211012030116.184027-1-davidcomponentone@gmail.com
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
SeongJae Park
2e014660b3 mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'
Kunit test cases for 'damon_split_regions_of()' expects the number of
regions after calling the function will be same to their request
('nr_sub').  However, the requested number is just an upper-limit,
because the function randomly decides the size of each sub-region.

This fixes the wrong expectation.

Link: https://lkml.kernel.org/r/20211028090628.14948-1-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
Yang Shi
a4aeaa06d4 mm: khugepaged: skip huge page collapse for special files
The read-only THP for filesystems will collapse THP for files opened
readonly and mapped with VM_EXEC.  The intended usecase is to avoid TLB
misses for large text segments.  But it doesn't restrict the file types
so a THP could be collapsed for a non-regular file, for example, block
device, if it is opened readonly and mapped with EXEC permission.  This
may cause bugs, like [1] and [2].

This is definitely not the intended usecase, so just collapse THP for
regular files in order to close the attack surface.

[shy828301@gmail.com: fix vm_file check [3]]

Link: https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-mm/000000000000c6a82505ce284e4c@google.com/ [2]
Link: https://lkml.kernel.org/r/CAHbLzkqTW9U3VvTu1Ki5v_cLRC9gHW+znBukg_ycergE0JWj-A@mail.gmail.com [3]
Link: https://lkml.kernel.org/r/20211027195221.3825-1-shy828301@gmail.com
Fixes: 99cb0dbd47 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reported-by: syzbot+aae069be1de40fb11825@syzkaller.appspotmail.com
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
Rongwei Wang
74c42e1baa mm, thp: bail out early in collapse_file for writeback page
Currently collapse_file does not explicitly check PG_writeback, instead,
page_has_private and try_to_release_page are used to filter writeback
pages.  This does not work for xfs with blocksize equal to or larger
than pagesize, because in such case xfs has no page->private.

This makes collapse_file bail out early for writeback page.  Otherwise,
xfs end_page_writeback will panic as follows.

  page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:ffff0003f88c86a8 index:0x0 pfn:0x84ef32
  aops:xfs_address_space_operations [xfs] ino:30000b7 dentry name:"libtest.so"
  flags: 0x57fffe0000008027(locked|referenced|uptodate|active|writeback)
  raw: 57fffe0000008027 ffff80001b48bc28 ffff80001b48bc28 ffff0003f88c86a8
  raw: 0000000000000000 0000000000000000 00000000ffffffff ffff0000c3e9a000
  page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u))
  page->mem_cgroup:ffff0000c3e9a000
  ------------[ cut here ]------------
  kernel BUG at include/linux/mm.h:1212!
  Internal error: Oops - BUG: 0 [#1] SMP
  Modules linked in:
  BUG: Bad page state in process khugepaged  pfn:84ef32
   xfs(E)
  page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:0 index:0x0 pfn:0x84ef32
   libcrc32c(E) rfkill(E) aes_ce_blk(E) crypto_simd(E) ...
  CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Tainted: ...
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
  Call trace:
    end_page_writeback+0x1c0/0x214
    iomap_finish_page_writeback+0x13c/0x204
    iomap_finish_ioend+0xe8/0x19c
    iomap_writepage_end_bio+0x38/0x50
    bio_endio+0x168/0x1ec
    blk_update_request+0x278/0x3f0
    blk_mq_end_request+0x34/0x15c
    virtblk_request_done+0x38/0x74 [virtio_blk]
    blk_done_softirq+0xc4/0x110
    __do_softirq+0x128/0x38c
    __irq_exit_rcu+0x118/0x150
    irq_exit+0x1c/0x30
    __handle_domain_irq+0x8c/0xf0
    gic_handle_irq+0x84/0x108
    el1_irq+0xcc/0x180
    arch_cpu_idle+0x18/0x40
    default_idle_call+0x4c/0x1a0
    cpuidle_idle_call+0x168/0x1e0
    do_idle+0xb4/0x104
    cpu_startup_entry+0x30/0x9c
    secondary_start_kernel+0x104/0x180
  Code: d4210000 b0006161 910c8021 94013f4d (d4210000)
  ---[ end trace 4a88c6a074082f8c ]---
  Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt

Link: https://lkml.kernel.org/r/20211022023052.33114-1-rongwei.wang@linux.alibaba.com
Fixes: 99cb0dbd47 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Suggested-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Song Liu <song@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
Chen Wandun
ffb29b1c25 mm/vmalloc: fix numa spreading for large hash tables
Eric Dumazet reported a strange numa spreading info in [1], and found
commit 121e6f3258 ("mm/vmalloc: hugepage vmalloc mappings") introduced
this issue [2].

Dig into the difference before and after this patch, page allocation has
some difference:

before:
  alloc_large_system_hash
    __vmalloc
      __vmalloc_node(..., NUMA_NO_NODE, ...)
        __vmalloc_node_range
          __vmalloc_area_node
            alloc_page /* because NUMA_NO_NODE, so choose alloc_page branch */
              alloc_pages_current
                alloc_page_interleave /* can be proved by print policy mode */

after:
  alloc_large_system_hash
    __vmalloc
      __vmalloc_node(..., NUMA_NO_NODE, ...)
        __vmalloc_node_range
          __vmalloc_area_node
            alloc_pages_node /* choose nid by nuam_mem_id() */
              __alloc_pages_node(nid, ....)

So after commit 121e6f3258 ("mm/vmalloc: hugepage vmalloc mappings"),
it will allocate memory in current node instead of interleaving allocate
memory.

Link: https://lore.kernel.org/linux-mm/CANn89iL6AAyWhfxdHO+jaT075iOa3XcYn9k6JJc7JR2XYn6k_Q@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-mm/CANn89iLofTR=AK-QOZY87RdUZENCZUT4O6a0hvhu3_EwRMerOg@mail.gmail.com/ [2]
Link: https://lkml.kernel.org/r/20211021080744.874701-2-chenwandun@huawei.com
Fixes: 121e6f3258 ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
Kees Cook
855d44434f mm/secretmem: avoid letting secretmem_users drop to zero
Quoting Dmitry:
 "refcount_inc() needs to be done before fd_install(). After
  fd_install() finishes, the fd can be used by userspace and
  we can have secret data in memory before the refcount_inc().

  A straightforward misuse where a user will predict the returned
  fd in another thread before the syscall returns and will use it
  to store secret data is somewhat dubious because such a user just
  shoots themself in the foot.

  But a more interesting misuse would be to close the predicted fd
  and decrement the refcount before the corresponding refcount_inc,
  this way one can briefly drop the refcount to zero while there are
  other users of secretmem."

Move fd_install() after refcount_inc().

Link: https://lkml.kernel.org/r/20211021154046.880251-1-keescook@chromium.org
Link: https://lore.kernel.org/lkml/CACT4Y+b1sW6-Hkn8HQYw_SsT7X3tp-CJNh2ci0wG3ZnQz9jjig@mail.gmail.com
Fixes: 9a436f8ff6 ("PM: hibernate: disable when there are active secretmem users")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jordy Zomer <jordy@pwning.systems>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
Gautham Ananthakrishna
6f1b228529 ocfs2: fix race between searching chunks and release journal_head from buffer_head
Encountered a race between ocfs2_test_bg_bit_allocatable() and
jbd2_journal_put_journal_head() resulting in the below vmcore.

  PID: 106879  TASK: ffff880244ba9c00  CPU: 2   COMMAND: "loop3"
  Call trace:
    panic
    oops_end
    no_context
    __bad_area_nosemaphore
    bad_area_nosemaphore
    __do_page_fault
    do_page_fault
    page_fault
      [exception RIP: ocfs2_block_group_find_clear_bits+316]
    ocfs2_block_group_find_clear_bits [ocfs2]
    ocfs2_cluster_group_search [ocfs2]
    ocfs2_search_chain [ocfs2]
    ocfs2_claim_suballoc_bits [ocfs2]
    __ocfs2_claim_clusters [ocfs2]
    ocfs2_claim_clusters [ocfs2]
    ocfs2_local_alloc_slide_window [ocfs2]
    ocfs2_reserve_local_alloc_bits [ocfs2]
    ocfs2_reserve_clusters_with_limit [ocfs2]
    ocfs2_reserve_clusters [ocfs2]
    ocfs2_lock_refcount_allocators [ocfs2]
    ocfs2_make_clusters_writable [ocfs2]
    ocfs2_replace_cow [ocfs2]
    ocfs2_refcount_cow [ocfs2]
    ocfs2_file_write_iter [ocfs2]
    lo_rw_aio
    loop_queue_work
    kthread_worker_fn
    kthread
    ret_from_fork

When ocfs2_test_bg_bit_allocatable() called bh2jh(bg_bh), the
bg_bh->b_private NULL as jbd2_journal_put_journal_head() raced and
released the jounal head from the buffer head.  Needed to take bit lock
for the bit 'BH_JournalHead' to fix this race.

Link: https://lkml.kernel.org/r/1634820718-6043-1-git-send-email-gautham.ananthakrishna@oracle.com
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: <rajesh.sivaramasubramaniom@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
Suren Baghdasaryan
337546e83f mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap
Race between process_mrelease and exit_mmap, where free_pgtables is
called while __oom_reap_task_mm is in progress, leads to kernel crash
during pte_offset_map_lock call.  oom-reaper avoids this race by setting
MMF_OOM_VICTIM flag and causing exit_mmap to take and release
mmap_write_lock, blocking it until oom-reaper releases mmap_read_lock.

Reusing MMF_OOM_VICTIM for process_mrelease would be the simplest way to
fix this race, however that would be considered a hack.  Fix this race
by elevating mm->mm_users and preventing exit_mmap from executing until
process_mrelease is finished.  Patch slightly refactors the code to
adapt for a possible mmget_not_zero failure.

This fix has considerable negative impact on process_mrelease
performance and will likely need later optimization.

Link: https://lkml.kernel.org/r/20211022014658.263508-1-surenb@google.com
Fixes: 884a7e5964 ("mm: introduce process_mrelease system call")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christian Brauner <christian@brauner.io>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00