Commit Graph

124779 Commits

Author SHA1 Message Date
Jakub Kicinski
594850ca43 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Update debugging in IPVS tcp protocol handler to make it easier
   to understand, from longguang.yue

2) Update TCP tracker to deal with keepalive packet after
   re-registration, from Franceso Ruggeri.

3) Missing IP6SKB_FRAGMENTED from netfilter fragment reassembly,
   from Georg Kohmann.

4) Fix bogus packet drop in ebtables nat extensions, from
   Thimothee Cocault.

5) Fix typo in flowtable documentation.

6) Reset skb timestamp in nft_fwd_netdev.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-22 12:00:37 -07:00
Linus Torvalds
96485e4462 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "The siginificant new ext4 feature this time around is Harshad's new
  fast_commit mode.

  In addition, thanks to Mauricio for fixing a race where mmap'ed pages
  that are being changed in parallel with a data=journal transaction
  commit could result in bad checksums in the failure that could cause
  journal replays to fail.

  Also notable is Ritesh's buffered write optimization which can result
  in significant improvements on parallel write workloads. (The kernel
  test robot reported a 330.6% improvement on fio.write_iops on a 96
  core system using DAX)

  Besides that, we have the usual miscellaneous cleanups and bug fixes"

Link: https://lore.kernel.org/r/20200925071217.GO28663@shao2-debian

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
  ext4: fix invalid inode checksum
  ext4: add fast commit stats in procfs
  ext4: add a mount opt to forcefully turn fast commits on
  ext4: fast commit recovery path
  jbd2: fast commit recovery path
  ext4: main fast-commit commit path
  jbd2: add fast commit machinery
  ext4 / jbd2: add fast commit initialization
  ext4: add fast_commit feature and handling for extended mount options
  doc: update ext4 and journalling docs to include fast commit feature
  ext4: Detect already used quota file early
  jbd2: avoid transaction reuse after reformatting
  ext4: use the normal helper to get the actual inode
  ext4: fix bs < ps issue reported with dioread_nolock mount opt
  ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
  ext4: data=journal: fixes for ext4_page_mkwrite()
  jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers()
  jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()
  ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable()
  ext4: use ext4_sb_bread() instead of sb_bread()
  ...
2020-10-22 10:31:08 -07:00
Linus Torvalds
f56e65dff6 Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull initial set_fs() removal from Al Viro:
 "Christoph's set_fs base series + fixups"

* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Allow a NULL pos pointer to __kernel_read
  fs: Allow a NULL pos pointer to __kernel_write
  powerpc: remove address space overrides using set_fs()
  powerpc: use non-set_fs based maccess routines
  x86: remove address space overrides using set_fs()
  x86: make TASK_SIZE_MAX usable from assembly code
  x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
  lkdtm: remove set_fs-based tests
  test_bitmap: remove user bitmap tests
  uaccess: add infrastructure for kernel builds with set_fs()
  fs: don't allow splice read/write without explicit ops
  fs: don't allow kernel reads and writes without iter ops
  sysctl: Convert to iter interfaces
  proc: add a read_iter method to proc proc_ops
  proc: cleanup the compat vs no compat file ops
  proc: remove a level of indentation in proc_get_inode
2020-10-22 09:59:21 -07:00
Jakub Kicinski
d2775984d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-10-22

1) Fix enforcing NULL check in verifier for new helper return types of
   RET_PTR_TO_{BTF_ID,MEM_OR_BTF_ID}_OR_NULL, from Martin KaFai Lau.

2) Fix bpf_redirect_neigh() helper API before it becomes frozen by adding
   nexthop information as argument, from Toke Høiland-Jørgensen.

3) Guard & fix compilation of bpf_tail_call_static() when __bpf__ arch is
   not defined by compiler or clang too old, from Daniel Borkmann.

4) Remove misplaced break after return in attach_type_to_prog_type(), from
   Tom Rix.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-22 09:51:41 -07:00
Linus Torvalds
24717cfbbb Merge tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "The one new feature this time, from Anna Schumaker, is READ_PLUS,
  which has the same arguments as READ but allows the server to return
  an array of data and hole extents.

  Otherwise it's a lot of cleanup and bugfixes"

* tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits)
  NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
  SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()
  sunrpc: raise kernel RPC channel buffer size
  svcrdma: fix bounce buffers for unaligned offsets and multiple pages
  nfsd: remove unneeded break
  net/sunrpc: Fix return value for sysctl sunrpc.transports
  NFSD: Encode a full READ_PLUS reply
  NFSD: Return both a hole and a data segment
  NFSD: Add READ_PLUS hole segment encoding
  NFSD: Add READ_PLUS data support
  NFSD: Hoist status code encoding into XDR encoder functions
  NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
  NFSD: Remove the RETURN_STATUS() macro
  NFSD: Call NFSv2 encoders on error returns
  NFSD: Fix .pc_release method for NFSv2
  NFSD: Remove vestigial typedefs
  NFSD: Refactor nfsd_dispatch() error paths
  NFSD: Clean up nfsd_dispatch() variables
  NFSD: Clean up stale comments in nfsd_dispatch()
  NFSD: Clean up switch statement in nfsd_dispatch()
  ...
2020-10-22 09:44:27 -07:00
Harshad Shirwadkar
8016e29f43 ext4: fast commit recovery path
This patch adds fast commit recovery path support for Ext4 file
system. We add several helper functions that are similar in spirit to
e2fsprogs journal recovery path handlers. Example of such functions
include - a simple block allocator, idempotent block bitmap update
function etc. Using these routines and the fast commit log in the fast
commit area, the recovery path (ext4_fc_replay()) performs fast commit
log recovery.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-8-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:38 -04:00
Harshad Shirwadkar
5b849b5f96 jbd2: fast commit recovery path
This patch adds fast commit recovery support in JBD2.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-7-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:37 -04:00
Harshad Shirwadkar
aa75f4d3da ext4: main fast-commit commit path
This patch adds main fast commit commit path handlers. The overall
patch can be divided into two inter-related parts:

(A) Metadata updates tracking

    This part consists of helper functions to track changes that need
    to be committed during a commit operation. These updates are
    maintained by Ext4 in different in-memory queues. Following are
    the APIs and their short description that are implemented in this
    patch:

    - ext4_fc_track_link/unlink/creat() - Track unlink. link and creat
      operations
    - ext4_fc_track_range() - Track changed logical block offsets
      inodes
    - ext4_fc_track_inode() - Track inodes
    - ext4_fc_mark_ineligible() - Mark file system fast commit
      ineligible()
    - ext4_fc_start_update() / ext4_fc_stop_update() /
      ext4_fc_start_ineligible() / ext4_fc_stop_ineligible() These
      functions are useful for co-ordinating inode updates with
      commits.

(B) Main commit Path

    This part consists of functions to convert updates tracked in
    in-memory data structures into on-disk commits. Function
    ext4_fc_commit() is the main entry point to commit path.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-6-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:37 -04:00
Harshad Shirwadkar
ff780b91ef jbd2: add fast commit machinery
This functions adds necessary APIs needed in JBD2 layer for fast
commits.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-5-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:37 -04:00
Harshad Shirwadkar
6866d7b3f2 ext4 / jbd2: add fast commit initialization
This patch adds fast commit area trackers in the journal_t
structure. These are initialized via the jbd2_fc_init() routine that
this patch adds. This patch also adds ext4/fast_commit.c and
ext4/fast_commit.h files for fast commit code that will be added in
subsequent patches in this series.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-4-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:26 -04:00
Harshad Shirwadkar
995a3ed67f ext4: add fast_commit feature and handling for extended mount options
We are running out of mount option bits. Add handling for using
s_mount_opt2. Add ext4 and jbd2 fast commit feature flag and also add
ability to turn off the fast commit feature in Ext4.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-3-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21 23:22:26 -04:00
Di Zhu
ebfe3c5183 rtnetlink: fix data overflow in rtnl_calcit()
"ip addr show" command execute error when we have a physical
network card with a large number of VFs

The return value of if_nlmsg_size() in rtnl_calcit() will exceed
range of u16 data type when any network cards has a larger number of
VFs. rtnl_vfinfo_size() will significant increase needed dump size when
the value of num_vfs is larger.

Eventually we get a wrong value of min_ifinfo_dump_size because of overflow
which decides the memory size needed by netlink dump and netlink_dump()
will return -EMSGSIZE because of not enough memory was allocated.

So fix it by promoting  min_dump_alloc data type to u32 to
avoid whole netlink message size overflow and it's also align
with the data type of struct netlink_callback{}.min_dump_alloc
which is assigned by return value of rtnl_calcit()

Signed-off-by: Di Zhu <zhudi21@huawei.com>
Link: https://lore.kernel.org/r/20201021020053.1401-1-zhudi21@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21 18:24:08 -07:00
Toke Høiland-Jørgensen
ba452c9e99 bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
Based on the discussion in [0], update the bpf_redirect_neigh() helper to
accept an optional parameter specifying the nexthop information. This makes
it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
incurring a duplicate FIB lookup - since the FIB lookup helper will return
the nexthop information even if no neighbour is present, this can simply
be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before helper API is frozen.

  [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/160322915615.32199.1187570224032024535.stgit@toke.dk
2020-10-22 01:28:54 +02:00
Peter Xu
9e9eb226b9 KVM: Cache as_id in kvm_memory_slot
Cache the address space ID just like the slot ID.  It will be used in
order to fill in the dirty ring entries.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201014182700.2888246-7-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:17:01 -04:00
Oliver Upton
66570e966d kvm: x86: only provide PV features if enabled in guest's CPUID
KVM unconditionally provides PV features to the guest, regardless of the
configured CPUID. An unwitting guest that doesn't check
KVM_CPUID_FEATURES before use could access paravirt features that
userspace did not intend to provide. Fix this by checking the guest's
CPUID before performing any paravirtual operations.

Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the
aforementioned enforcement. Migrating a VM from a host w/o this patch to
a host with this patch could silently change the ABI exposed to the
guest, warranting that we default to the old behavior and opt-in for
the new one.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98
Message-Id: <20200818152429.1923996-4-oupton@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:32 -04:00
Linus Torvalds
b7769c45b8 Merge tag 'rtc-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
 "A new driver this cycle is making the bulk of the changes and the
  rx8010 driver has been rework to use the modern APIs.

  Summary:

  Subsystem:
   - new generic DT properties: aux-voltage-chargeable,
     trickle-voltage-millivolt

  New driver:
   - Microcrystal RV-3032

  Drivers:
   - ds1307: use aux-voltage-chargeable
   - r9701, rx8010: modernization of the driver
   - rv3028: fix clock output, trickle resistor values, RAM
     configuration registers"

* tag 'rtc-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
  rtc: r9701: set range
  rtc: r9701: convert to devm_rtc_allocate_device
  rtc: r9701: stop setting RWKCNT
  rtc: r9701: remove useless memset
  rtc: r9701: stop setting a default time
  rtc: r9701: remove leftover comment
  rtc: rv3032: Add a driver for Microcrystal RV-3032
  dt-bindings: rtc: rv3032: add RV-3032 bindings
  dt-bindings: rtc: add trickle-voltage-millivolt
  rtc: rv3028: ensure ram configuration registers are saved
  rtc: rv3028: factorize EERD bit handling
  rtc: rv3028: fix trickle resistor values
  rtc: rv3028: fix clock output support
  rtc: mt6397: Remove unused member dev
  rtc: rv8803: simplify the return expression of rv8803_nvram_write
  rtc: meson: simplify the return expression of meson_vrtc_probe
  rtc: rx8010: rename rx8010_init_client() to rx8010_init()
  rtc: ds1307: enable rx8130's backup battery, make it chargeable optionally
  rtc: ds1307: consider aux-voltage-chargeable
  rtc: ds1307: store previous charge default per chip
  ...
2020-10-21 11:22:08 -07:00
Linus Torvalds
b5df4b5c28 Merge branch 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:

 - if a host can be a client, too, the I2C core can now use it to
   emulate SMBus HostNotify support (STM32 and R-Car added this so far)

 - also for client mode, a testunit has been added. It can create rare
   situations on the bus, so host controllers can be tested

 - a binding has been added to mark the bus as "single-master". This
   allows for better timeout detections

 - new driver for Mellanox Bluefield

 - massive refactoring of the Tegra driver

 - EEPROMs recognized by the at24 driver can now have custom names

 - rest is driver updates

* 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (80 commits)
  Documentation: i2c: add testunit docs to index
  i2c: tegra: Improve driver module description
  i2c: tegra: Clean up whitespaces, newlines and indentation
  i2c: tegra: Clean up and improve comments
  i2c: tegra: Clean up printk messages
  i2c: tegra: Clean up variable names
  i2c: tegra: Improve formatting of variables
  i2c: tegra: Check errors for both positive and negative values
  i2c: tegra: Factor out hardware initialization into separate function
  i2c: tegra: Factor out register polling into separate function
  i2c: tegra: Factor out packet header setup from tegra_i2c_xfer_msg()
  i2c: tegra: Factor out error recovery from tegra_i2c_xfer_msg()
  i2c: tegra: Rename wait/poll functions
  i2c: tegra: Remove "dma" variable from tegra_i2c_xfer_msg()
  i2c: tegra: Remove redundant check in tegra_i2c_issue_bus_clear()
  i2c: tegra: Remove likely/unlikely from the code
  i2c: tegra: Remove outdated barrier()
  i2c: tegra: Clean up variable types
  i2c: tegra: Reorder location of functions in the code
  i2c: tegra: Clean up probe function
  ...
2020-10-21 10:54:05 -07:00
Linus Torvalds
ed7cfefe44 Merge tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:

 - a patch that removes crush_workspace_mutex (myself). CRUSH
   computations are no longer serialized and can run in parallel.

 - a couple new filesystem client metrics for "ceph fs top" command
   (Xiubo Li)

 - a fix for a very old messenger bug that affected the filesystem,
   marked for stable (myself)

 - assorted fixups and cleanups throughout the codebase from Jeff and
   others.

* tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client: (27 commits)
  libceph: clear con->out_msg on Policy::stateful_server faults
  libceph: format ceph_entity_addr nonces as unsigned
  libceph: fix ENTITY_NAME format suggestion
  libceph: move a dout in queue_con_delay()
  ceph: comment cleanups and clarifications
  ceph: break up send_cap_msg
  ceph: drop separate mdsc argument from __send_cap
  ceph: promote to unsigned long long before shifting
  ceph: don't SetPageError on readpage errors
  ceph: mark ceph_fmt_xattr() as printf-like for better type checking
  ceph: fold ceph_update_writeable_page into ceph_write_begin
  ceph: fold ceph_sync_writepages into writepage_nounlock
  ceph: fold ceph_sync_readpages into ceph_readpage
  ceph: don't call ceph_update_writeable_page from page_mkwrite
  ceph: break out writeback of incompatible snap context to separate function
  ceph: add a note explaining session reject error string
  libceph: switch to the new "osd blocklist add" command
  libceph, rbd, ceph: "blacklist" -> "blocklist"
  ceph: have ceph_writepages_start call pagevec_lookup_range_tag
  ceph: use kill_anon_super helper
  ...
2020-10-21 10:34:10 -07:00
Bjorn Helgaas
924bb1f9b0 Merge branch 'remotes/lorenzo/pci/dwc'
- Fix designware-ep Header Type check (Hou Zhiqiang)

- Use DBI accessors instead of own config accessors (Rob Herring)

- Allow overriding bridge pci_ops (Rob Herring)

- Allow root and child buses to have different pci_ops (Rob Herring)

- Add default dwc pci_ops.map_bus (Rob Herring)

- Use pci_ops for root config space accessors in al, exynos, histb,
  keystone, kirin, meson, tegra (Rob Herring)

- Remove dwc own/other config accessor ops (Rob Herring)

- Use generic config accessors in dwc (Rob Herring)

- Also call .add_bus() callback for root bus (Rob Herring)

- Convert keystone .scan_bus() callback to use pci_ops.add_bus (Rob
  Herring)

- Convert dwc to use pci_host_probe() (Rob Herring)

- Remove dwc root_bus pointer (Rob Herring)

- Remove storing of PCI resources in dwc-specific structs (Rob Herring)

- Simplify config space handling (Rob Herring)

- Drop keystone duplicated DT num-viewport handling (Rob Herring)

- Check CONFIG_PCI_MSI in dw_pcie_msi_init() instead of duplicating it in
  all the drivers (Rob Herring)

- Remove imx6 duplicate PCIE_LINK_WIDTH_SPEED_CONTROL definition (Rob
  Herring)

- Add dwc num_lanes for use when it's lacking from DT (Rob Herring)

- Ensure "Fast Link Mode" simulation environment setting is cleared (Rob
  Herring)

- Drop meson duplicate number of lanes setup (Rob Herring)

- Drop meson unnecessary RC config space init (Rob Herring)

- Rework meson config and dwc port logic register accesses (Rob Herring)

- Use common PCI register definitions in imx6 and qcom (Rob Herring)

- Search for DesignWare PCIe Capability instead of hard-coding its location
  (Rob Herring)

- Use common DesignWare register definitions in tegra (Rob Herring)

- Drop keystone unused DBI2 code (Rob Herring)

- Make dwc ATU accessors private (Rob Herring)

- Centralize link gen setting in dwc (Rob Herring)

- Set PORT_LINK_DLL_LINK_EN in common dwc setup code (Rob Herring)

- Drop intel-gw unnecessary DT 'device_type' checking (Rob Herring)

- Move intel-gw PCI_CAP_ID_EXP discovery to the single place it's used (Rob
  Herring)

- Drop intel-gw unused max_width (Rob Herring)

- Move N_FTS (fast training sequence) setup to common dwc setup (Rob
  Herring)

- Convert spear13xx, tegra194 to use DBI accessors (Rob Herring)

- Add multiple PFs support for DWC (Xiaowei Bao)

- Add MSI-X doorbell mode for endpoint mode (Xiaowei Bao)

- Update MSI/MSI-X capability management for endpoints (Xiaowei Bao)

- Add layerscape ls1088a and ls2088a compatible strings (Xiaowei Bao)

- Update layerscape MSI/MSI-X management (Xiaowei Bao)

- Use doorbell to support MSI-X on layerscape (Xiaowei Bao)

- Add layerscape endpoint mode support for ls1088a and ls2088a (Xiaowei
  Bao)

- Add layerscape ls1088a node to DT (Xiaowei Bao)

- Add Freescale/Layerscape ls1088a to endpoint test (Xiaowei Bao)

- Add endpoint test driver data for Layerscape PCIe controllers (Hou
  Zhiqiang)

- Fix 'cast truncates bits from constant value' warning (Gustavo Pimentel)

- Add uniphier iATU register description (Kunihiko Hayashi)

- Add common iATU register support (Kunihiko Hayashi)

- Remove keystone iATU register mapping in favor of generic dwc support
  (Kunihiko Hayashi)

- Skip PCIE_MSI_INTR0* programming if MSI is disabled (Jisheng Zhang)

- Fix MSI page leakage in suspend/resume (Jisheng Zhang)

- Check whether link is up before attempting config access (best-effort fix
  even though it's racy) (Hou Zhiqiang)

* remotes/lorenzo/pci/dwc:
  PCI: dwc: Add link up check in dw_child_pcie_ops.map_bus()
  PCI: dwc: Fix MSI page leakage in suspend/resume
  PCI: dwc: Skip PCIE_MSI_INTR0* programming if MSI is disabled
  PCI: keystone: Remove iATU register mapping
  PCI: dwc: Add common iATU register support
  dt-bindings: PCI: uniphier-ep: Add iATU register description
  dt-bindings: PCI: uniphier: Add iATU register description
  PCI: dwc: Fix 'cast truncates bits from constant value'
  misc: pci_endpoint_test: Add driver data for Layerscape PCIe controllers
  misc: pci_endpoint_test: Add LS1088a in pci_device_id table
  PCI: layerscape: Add EP mode support for ls1088a and ls2088a
  PCI: layerscape: Modify the MSIX to the doorbell mode
  PCI: layerscape: Modify the way of getting capability with different PEX
  PCI: layerscape: Fix some format issue of the code
  dt-bindings: pci: layerscape-pci: Add compatible strings for ls1088a and ls2088a
  PCI: designware-ep: Modify MSI and MSIX CAP way of finding
  PCI: designware-ep: Move the function of getting MSI capability forward
  PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
  PCI: designware-ep: Add multiple PFs support for DWC
  PCI: dwc: Use DBI accessors
  PCI: dwc: Move N_FTS setup to common setup
  PCI: dwc/intel-gw: Drop unused max_width
  PCI: dwc/intel-gw: Move getting PCI_CAP_ID_EXP offset to intel_pcie_link_setup()
  PCI: dwc/intel-gw: Drop unnecessary checking of DT 'device_type' property
  PCI: dwc: Set PORT_LINK_DLL_LINK_EN in common setup code
  PCI: dwc: Centralize link gen setting
  PCI: dwc: Make ATU accessors private
  PCI: dwc: Remove read_dbi2 code
  PCI: dwc/tegra: Use common Designware port logic register definitions
  PCI: dwc: Remove hardcoded PCI_CAP_ID_EXP offset
  PCI: dwc/qcom: Use common PCI register definitions
  PCI: dwc/imx6: Use common PCI register definitions
  PCI: dwc/meson: Rework PCI config and DW port logic register accesses
  PCI: dwc/meson: Drop unnecessary RC config space initialization
  PCI: dwc/meson: Drop the duplicate number of lanes setup
  PCI: dwc: Ensure FAST_LINK_MODE is cleared
  PCI: dwc: Add a 'num_lanes' field to struct dw_pcie
  PCI: dwc/imx6: Remove duplicate define PCIE_LINK_WIDTH_SPEED_CONTROL
  PCI: dwc: Check CONFIG_PCI_MSI inside dw_pcie_msi_init()
  PCI: dwc/keystone: Drop duplicated 'num-viewport'
  PCI: dwc: Simplify config space handling
  PCI: dwc: Remove storing of PCI resources
  PCI: dwc: Remove root_bus pointer
  PCI: dwc: Convert to use pci_host_probe()
  PCI: dwc: keystone: Convert .scan_bus() callback to use add_bus
  PCI: Also call .add_bus() callback for root bus
  PCI: dwc: Use generic config accessors
  PCI: dwc: Remove dwc specific config accessor ops
  PCI: dwc: histb: Use pci_ops for root config space accessors
  PCI: dwc: exynos: Use pci_ops for root config space accessors
  PCI: dwc: kirin: Use pci_ops for root config space accessors
  PCI: dwc: meson: Use pci_ops for root config space accessors
  PCI: dwc: tegra: Use pci_ops for root config space accessors
  PCI: dwc: keystone: Use pci_ops for config space accessors
  PCI: dwc: al: Use pci_ops for child config space accessors
  PCI: dwc: Add a default pci_ops.map_bus for root port
  PCI: dwc: Allow overriding bridge pci_ops
  PCI: dwc: Use DBI accessors instead of own config accessors
  PCI: Allow root and child buses to have different pci_ops
  PCI: designware-ep: Fix the Header Type check
2020-10-21 09:58:39 -05:00
Bjorn Helgaas
299af12a72 Merge branch 'remotes/lorenzo/pci/pci-iomap'
- Remove useless __KERNEL__ preprocessor guard in sparc io_32.h (Lorenzo
  Pieralisi)

- Move ioremap/iounmap declaration so it's visible in asm-generic/io.h
  (Lorenzo Pieralisi)

- Fix memory leak in generic !CONFIG_GENERIC_IOMAP pci_iounmap()
  implementation (Lorenzo Pieralisi)

* remotes/lorenzo/pci/pci-iomap:
  asm-generic/io.h: Fix !CONFIG_GENERIC_IOMAP pci_iounmap() implementation
  sparc32: Move ioremap/iounmap declaration before asm-generic/io.h include
  sparc32: Remove useless io_32.h __KERNEL__ preprocessor guard
2020-10-21 09:58:37 -05:00
Bjorn Helgaas
03b482e243 Merge branch 'remotes/lorenzo/pci/apei'
- Add ACPI APEI notifier chain for unknown (vendor) CPER records (Shiju
  Jose)

- Add handling of HiSilicon HIP PCIe controller errors (Yicong Yang)

* remotes/lorenzo/pci/apei:
  PCI: hip: Add handling of HiSilicon HIP PCIe controller errors
  ACPI / APEI: Add a notifier chain for unknown (vendor) CPER records
2020-10-21 09:58:36 -05:00
Bjorn Helgaas
8b28a3f346 Merge branch 'pci/misc'
- Remove unnecessary #includes (Gustavo Pimentel)

- Fix intel_mid_pci.c build error when !CONFIG_ACPI (Randy Dunlap)

- Use scnprintf(), not snprintf(), in sysfs "show" functions (Krzysztof
  Wilczyński)

- Simplify pci-pf-stub by using module_pci_driver() (Liu Shixin)

- Print IRQ used by Link Bandwidth Notification (Dongdong Liu)

- Update sysfs mmap-related #ifdef comments (Clint Sbisa)

- Simplify pci_dev_reset_slot_function() (Lukas Wunner)

- Use "NULL" instead of "0" to fix sparse warnings (Gustavo Pimentel)

- Simplify bool comparisons (Krzysztof Wilczyński)

- Drop double zeroing for P2PDMA sg_init_table() (Julia Lawall)

* pci/misc:
  PCI: v3-semi: Remove unneeded break
  PCI/P2PDMA: Drop double zeroing for sg_init_table()
  PCI: Simplify bool comparisons
  PCI: endpoint: Use "NULL" instead of "0" as a NULL pointer
  PCI: Simplify pci_dev_reset_slot_function()
  PCI: Update mmap-related #ifdef comments
  PCI/LINK: Print IRQ number used by port
  PCI/IOV: Simplify pci-pf-stub with module_pci_driver()
  PCI: Use scnprintf(), not snprintf(), in sysfs "show" functions
  x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
  PCI: Remove unnecessary header includes
2020-10-21 09:58:36 -05:00
Bjorn Helgaas
0d2493ab08 Merge branch 'pci/pm'
- Remove unused pcibios_pm_ops (Vaibhav Gupta)

- Rename pci_dev.d3_delay to d3hot_delay (Krzysztof Wilczyński)

- Apply D2 transition delay as microseconds, not milliseconds (Bjorn
  Helgaas)

* pci/pm:
  PCI/PM: Revert "PCI/PM: Apply D2 delay as milliseconds, not microseconds"
  PCI/PM: Remove unused PCI_PM_BUS_WAIT
  PCI/PM: Rename pci_dev.d3_delay to d3hot_delay
  PCI/PM: Remove unused pcibios_pm_ops
2020-10-21 09:58:35 -05:00
Bjorn Helgaas
28a18aec59 Merge branch 'pci/enumeration'
- Tone down message about missing optional MCFG (Jeremy Linton)

- Add schedule point in pci_read_config() (Jiang Biao)

- Add Ampere Altra SOC MCFG quirk (Tuan Phan)

- Add Kconfig options for MPS/MRRS strategy (Jim Quinlan)

* pci/enumeration:
  PCI: Add Kconfig options for MPS/MRRS strategy
  PCI/ACPI: Add Ampere Altra SOC MCFG quirk
  PCI: Add schedule point in pci_read_config()
  PCI/ACPI: Tone down missing MCFG message
2020-10-21 09:58:34 -05:00
Pierre Morel
0afa15e1a5 virtio: let arch advertise guest's memory access restrictions
An architecture may restrict host access to guest memory,
e.g. IBM s390 Secure Execution or AMD SEV.

Provide a new Kconfig entry the architecture can select,
CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, when it provides
the arch_has_restricted_virtio_memory_access callback to advertise
to VIRTIO common code when the architecture restricts memory access
from the host.

The common code can then fail the probe for any device where
VIRTIO_F_ACCESS_PLATFORM is required, but not set.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/1599728030-17085-2-git-send-email-pmorel@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-10-21 10:34:12 -04:00
Dai Ngo
0cfcd405e7 NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
NFS_FS=y as dependency of CONFIG_NFSD_V4_2_INTER_SSC still have
build errors and some configs with NFSD=m to get NFS4ERR_STALE
error when doing inter server copy.

Added ops table in nfs_common for knfsd to access NFS client modules.

Fixes: 3ac3711adb ("NFSD: Fix NFS server build errors")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-10-21 10:31:20 -04:00
Linus Torvalds
c4d6fe7311 Merge tag 'xarray-5.9' of git://git.infradead.org/users/willy/xarray
Pull XArray updates from Matthew Wilcox:

 - Fix the test suite after introduction of the local_lock

 - Fix a bug in the IDA spotted by Coverity

 - Change the API that allows the workingset code to delete a node

 - Fix xas_reload() when dealing with entries that occupy multiple
   indices

 - Add a few more tests to the test suite

 - Fix an unsigned int being shifted into an unsigned long

* tag 'xarray-5.9' of git://git.infradead.org/users/willy/xarray:
  XArray: Fix xas_create_range for ranges above 4 billion
  radix-tree: fix the comment of radix_tree_next_slot()
  XArray: Fix xas_reload for multi-index entries
  XArray: Add private interface for workingset node deletion
  XArray: Fix xas_for_each_conflict documentation
  XArray: Test marked multiorder iterations
  XArray: Test two more things about xa_cmpxchg
  ida: Free allocated bitmap in error path
  radix tree test suite: Fix compilation
2020-10-20 14:39:37 -07:00
Linus Torvalds
59f0e7eb2f Merge tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "Stable Fixes:
   - Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+
   - Fix nfs_path in case of a rename retry
   - Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag

  New features and improvements:
   - Replace dprintk() calls with tracepoints
   - Make cache consistency bitmap dynamic
   - Added support for the NFS v4.2 READ_PLUS operation
   - Improvements to net namespace uniquifier

  Other bugfixes and cleanups:
   - Remove redundant clnt pointer
   - Don't update timeout values on connection resets
   - Remove redundant tracepoints
   - Various cleanups to comments
   - Fix oops when trying to use copy_file_range with v4.0 source server
   - Improvements to flexfiles mirrors
   - Add missing 'local_lock=posix' mount option"

* tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (55 commits)
  NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag
  NFSv4: Fix up RCU annotations for struct nfs_netns_client
  NFS: Only reference user namespace from nfs4idmap struct instead of cred
  nfs: add missing "posix" local_lock constant table definition
  NFSv4: Use the net namespace uniquifier if it is set
  NFSv4: Clean up initialisation of uniquified client id strings
  NFS: Decode a full READ_PLUS reply
  SUNRPC: Add an xdr_align_data() function
  NFS: Add READ_PLUS hole segment decoding
  SUNRPC: Add the ability to expand holes in data pages
  SUNRPC: Split out _shift_data_right_tail()
  SUNRPC: Split out xdr_realign_pages() from xdr_align_pages()
  NFS: Add READ_PLUS data segment support
  NFS: Use xdr_page_pos() in NFSv4 decode_getacl()
  SUNRPC: Implement a xdr_page_pos() function
  SUNRPC: Split out a function for setting current page
  NFS: fix nfs_path in case of a rename retry
  fs: nfs: return per memcg count for xattr shrinkers
  NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE
  nfs: remove incorrect fallthrough label
  ...
2020-10-20 13:26:30 -07:00
Linus Torvalds
4962a85696 Merge tag 'io_uring-5.10-2020-10-20' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
 "A mix of fixes and a few stragglers. In detail:

   - Revert the bogus __read_mostly that we discussed for the initial
     pull request.

   - Fix a merge window regression with fixed file registration error
     path handling.

   - Fix io-wq numa node affinities.

   - Series abstracting out an io_identity struct, making it both easier
     to see what the personality items are, and also easier to to adopt
     more. Use this to cover audit logging.

   - Fix for read-ahead disabled block condition in async buffered
     reads, and using single page read-ahead to unify what
     generic_file_buffer_read() path is used.

   - Series for REQ_F_COMP_LOCKED fix and removal of it (Pavel)

   - Poll fix (Pavel)"

* tag 'io_uring-5.10-2020-10-20' of git://git.kernel.dk/linux-block: (21 commits)
  io_uring: use blk_queue_nowait() to check if NOWAIT supported
  mm: use limited read-ahead to satisfy read
  mm: mark async iocb read as NOWAIT once some data has been copied
  io_uring: fix double poll mask init
  io-wq: inherit audit loginuid and sessionid
  io_uring: use percpu counters to track inflight requests
  io_uring: assign new io_identity for task if members have changed
  io_uring: store io_identity in io_uring_task
  io_uring: COW io_identity on mismatch
  io_uring: move io identity items into separate struct
  io_uring: rely solely on work flags to determine personality.
  io_uring: pass required context in as flags
  io-wq: assign NUMA node locality if appropriate
  io_uring: fix error path cleanup in io_sqe_files_register()
  Revert "io_uring: mark io_uring_fops/io_op_defs as __read_mostly"
  io_uring: fix REQ_F_COMP_LOCKED by killing it
  io_uring: dig out COMP_LOCK from deep call chain
  io_uring: don't put a poll req under spinlock
  io_uring: don't unnecessarily clear F_LINK_TIMEOUT
  io_uring: don't set COMP_LOCKED if won't put
  ...
2020-10-20 13:19:30 -07:00
Stephen Boyd
5f56888fad Merge branches 'clk-ingenic', 'clk-at91', 'clk-kconfig', 'clk-imx', 'clk-qcom', 'clk-prima2' and 'clk-bcm' into clk-next
- Support qcom SM8150/SM8250 video and display clks
 - Change how qcom's display port clks work

* clk-ingenic:
  clk: ingenic: Respect CLK_SET_RATE_PARENT in .round_rate
  clk: ingenic: Don't tag custom clocks with CLK_SET_RATE_PARENT
  clk: ingenic: Don't use CLK_SET_RATE_GATE for PLL
  clk: ingenic: Use readl_poll_timeout instead of custom loop
  clk: ingenic: Use to_clk_info() macro for all clocks

* clk-at91:
  clk: at91: sam9x60: support only two programmable clocks
  clk: at91: clk-sam9x60-pll: remove unused variable
  clk: at91: clk-main: update key before writing AT91_CKGR_MOR
  clk: at91: remove the checking of parent_name

* clk-kconfig:
  clk: Restrict CLK_HSDK to ARC_SOC_HSDK

* clk-imx:
  clk: imx8mq: Fix usdhc parents order
  clk: imx: imx21: Remove clock driver
  clk: imx: gate2: Fix a few typos
  clk: imx: Fix and update kerneldoc
  clk: imx: fix i.MX7D peripheral clk mux flags
  clk: imx: fix composite peripheral flags
  clk: imx: Correct the memrepair clock on imx8mp
  clk: imx: Correct the root clk of media ldb on imx8mp
  clk: imx: vf610: Add CRC clock
  clk: imx: Explicitly include bits.h
  clk: imx8qxp: Support building i.MX8QXP clock driver as module
  clk: imx8m: Support module build
  clk: imx: Add clock configuration for ARMv7 platforms
  clk: imx: Support building i.MX common clock driver as module
  clk: composite: Export clk_hw_register_composite()
  clk: imx6sl: Use BIT(x) to avoid shifting signed 32-bit value by 31 bits

* clk-qcom:
  clk: qcom: gdsc: Keep RETAIN_FF bit set if gdsc is already on
  clk: qcom: Add display clock controller driver for SM8150 and SM8250
  dt-bindings: clock: add QCOM SM8150 and SM8250 display clock bindings
  clk: qcom: add video clock controller driver for SM8250
  clk: qcom: add video clock controller driver for SM8150
  dt-bindings: clock: add SM8250 QCOM video clock bindings
  dt-bindings: clock: add SM8150 QCOM video clock bindings
  dt-bindings: clock: combine qcom,sdm845-videocc and qcom,sc7180-videocc
  clk: qcom: gcc-msm8994: Add missing clocks, resets and GDSCs
  clk/qcom: fix spelling typo
  clk: qcom: gcc-sdm660: Fix wrong parent_map
  clk: qcom: dispcc: Update DP clk ops for phy design
  clk: qcom: gcc-msm8939: remove defined but not used variables
  clk: qcom: ipq8074: make pcie0_rchng_clk_src static

* clk-prima2:
  clk: clk-prima2: fix return value check in prima2_clk_init()

* clk-bcm:
  clk: bcm2835: add missing release if devm_clk_hw_register fails
  clk: bcm: rpi: Add register to control pixel bvb clk
2020-10-20 11:47:07 -07:00
Stephen Boyd
3ab9a54f76 Merge branches 'clk-simplify', 'clk-ti', 'clk-tegra', 'clk-rockchip' and 'clk-mediatek' into clk-next
- Small non-critical fixes for TI clk driver
 - Support Mediatek MT8167 clks

* clk-simplify:
  clk: mediatek: fix platform_no_drv_owner.cocci warnings
  clk: mediatek: mt7629: simplify the return expression of mtk_infrasys_init
  clk: mediatek: mt6797: simplify the return expression of mtk_infrasys_init

* clk-ti:
  clk: ti: dra7: add missing clkctrl register for SHA2 instance
  clk: ti: clockdomain: fix static checker warning
  clk: ti: autoidle: add checks against NULL pointer reference
  clk: keystone: sci-clk: add 10% slack to set_rate
  clk: keystone: sci-clk: cache results of last query rate operation
  clk: keystone: sci-clk: fix parsing assigned-clock data during probe

* clk-tegra:
  clk: tegra: Drop !provider check in tegra210_clk_emc_set_rate()

* clk-rockchip:
  clk: rockchip: Initialize hw to error to avoid undefined behavior
  clk: rockchip: rk3399: Support module build
  clk: rockchip: fix the clk config to support module build
  clk: rockchip: Export some clock common APIs for module drivers
  clk: rockchip: Export rockchip_register_softrst()
  clk: rockchip: Export rockchip_clk_register_ddrclk()
  clk: rockchip: Use clk_hw_register_composite instead of clk_register_composite calls
  clk: rockchip: rk3308: drop unused mux_timer_src_p

* clk-mediatek:
  clk: mediatek: Add MT8167 clock support
  dt-bindings: clock: mediatek: add bindings for MT8167 clocks
  clk: mediatek: add UART0 clock support
2020-10-20 11:46:47 -07:00
Stephen Boyd
9d3261628a Merge branches 'clk-renesas', 'clk-amlogic', 'clk-allwinner', 'clk-samsung', 'clk-doc' and 'clk-unused' into clk-next
- Remove various unused variables in clk drivers

* clk-renesas:
  clk: renesas: rcar-gen3: Update description for RZ/G2
  clk: renesas: cpg-mssr: Add support for R-Car V3U
  clk: renesas: cpg-mssr: Add register pointers into struct cpg_mssr_priv
  clk: renesas: cpg-mssr: Use enum clk_reg_layout instead of a boolean flag
  dt-bindings: clock: renesas,cpg-mssr: Document r8a779a0
  dt-bindings: clock: Add r8a779a0 CPG Core Clock Definitions
  dt-bindings: power: Add r8a779a0 SYSC power domain definitions
  clk: renesas: rcar-gen2: Rename vsp1-(sy|rt) clocks to vsp(s|r)
  clk: renesas: r8a7742: Add clk entry for VSPR

* clk-amlogic:
  clk: meson: make shipped controller configurable
  clk: meson: g12a: mark fclk_div2 as critical
  clk: meson: axg-audio: fix g12a tdmout sclk inverter
  clk: meson: axg-audio: separate axg and g12a regmap tables
  clk: meson: add sclk-ws driver

* clk-allwinner:
  clk: sunxi-ng: sun8i: r40: Use sigma delta modulation for audio PLL
  clk: sunxi-ng: add support for the Allwinner A100 CCU
  dt-bindings: clk: sunxi-ccu: add compatible string for A100 CCU and R-CCU

* clk-samsung:
  clk: s2mps11: initialize driver via module_platform_driver
  clk: samsung: Use cached clk_hws instead of __clk_lookup() calls
  clk: samsung: exynos5420/5250: Add IDs to the CPU parent clk definitions
  clk: samsung: Add clk ID definitions for the CPU parent clocks
  clk: samsung: exynos5420: Avoid __clk_lookup() calls when enabling clocks
  clk: samsung: exynos5420: Add definition of clock ID for mout_sw_aclk_g3d
  clk: samsung: Keep top BPLL mux on Exynos542x enabled

* clk-doc:
  clk: davinci: add missing kerneldoc
  clk: fixed: add missing kerneldoc

* clk-unused:
  clk: socfpga: agilex: Remove unused variable 'cntr_mux'
  clk: si5341: drop unused 'err' variable
  clk: mmp: pxa1928: drop unused 'clk' variable
  clk: at91: drop unused at91sam9g45_pcr_layout
2020-10-20 11:46:34 -07:00
Linus Torvalds
38525c6919 Merge tag 'for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
 "Power-supply core:
   - add wireless type
   - properly document current direction

  Battery/charger driver changes:
   - new fuel-gauge/charger driver for RN5T618/RN5T619
   - new charger driver for BQ25980, BQ25975 and BQ25960
   - bq27xxx-battery: add support for TI bq34z100
   - gpio-charger: convert to GPIO descriptors
   - gpio-charger: add optional support for charge current limiting
   - max17040: add support for max17041, max17043, max17044
   - max17040: add support for max17048, max17049, max17058, max17059
   - smb347-charger: add DT support
   - smb247-charger: add SMB345 and SMB358 support
   - simple-battery: add temperature properties
   - lots of minor fixes, cleanups and DT binding YAML conversions

  Reset drivers:
   - ocelot: Add support for Sparx5"

* tag 'for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (81 commits)
  power: reset: POWER_RESET_OCELOT_RESET should depend on Ocelot or Sparx5
  power: supply: bq25980: Fix uninitialized wd_reg_val and overrun
  power: supply: ltc2941: Fix ptr to enum cast
  power: supply: test-power: revise parameter printing to use sprintf
  power: supply: charger-manager: fix incorrect check on charging_duration_ms
  power: supply: max17040: Fix ptr to enum cast
  power: supply: bq25980: Fix uninitialized wd_reg_val
  power: supply: bq25980: remove redundant zero check on ret
  power: reset: ocelot: Add support for Sparx5
  dt-bindings: reset: ocelot: Add Sparx5 support
  power: supply: sbs-battery: keep error code when get_property() fails
  power: supply: bq25980: Add support for the BQ259xx family
  dt-binding: bq25980: Add the bq25980 flash charger
  power: supply: fix spelling mistake "unprecise" -> "imprecise"
  power: supply: test_power: add missing newlines when printing parameters by sysfs
  power: supply: pm2301: drop duplicated i2c_device_id
  power: supply: charger-manager: drop unused charger assignment
  power: supply: rt9455: skip 'struct acpi_device_id' when !CONFIG_ACPI
  power: supply: goldfish: skip 'struct acpi_device_id' when !CONFIG_ACPI
  power: supply: bq25890: skip 'struct acpi_device_id' when !CONFIG_ACPI
  ...
2020-10-20 10:56:34 -07:00
Bean Huo
aa9c9b3f3f PM: runtime: Fix typo in pm_runtime_set_active() helper comment
This patch is to fix typo in the comment of helper pm_runtime_set_active().

Signed-off-by: Bean Huo <beanhuo@micron.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-20 19:45:14 +02:00
Linus Torvalds
4a5bb973fa Merge tag 'for-linus-5.10b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:

 - A single patch to fix the Xen security issue XSA-331 (malicious
   guests can DoS dom0 by triggering NULL-pointer dereferences or access
   to stale data).

 - A larger series to fix the Xen security issue XSA-332 (malicious
   guests can DoS dom0 by sending events at high frequency leading to
   dom0's vcpus being busy in IRQ handling for elongated times).

* tag 'for-linus-5.10b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: block rogue events for some time
  xen/events: defer eoi in case of excessive number of events
  xen/events: use a common cpu hotplug hook for event channels
  xen/events: switch user event channels to lateeoi model
  xen/pciback: use lateeoi irq binding
  xen/pvcallsback: use lateeoi irq binding
  xen/scsiback: use lateeoi irq binding
  xen/netback: use lateeoi irq binding
  xen/blkback: use lateeoi irq binding
  xen/events: add a new "late EOI" evtchn framework
  xen/events: fix race in evtchn_fifo_unmask()
  xen/events: add a proper barrier to 2-level uevent unmasking
  xen/events: avoid removing an event channel while handling it
2020-10-20 09:24:01 -07:00
Yufen Yu
cb3a92da23 block: remove unused members for io_context
After removing blk-sq code, there is no user of nr_batch_requests
and last_waited in kernel.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-20 07:10:14 -06:00
Paolo Bonzini
1b21c8db0e Merge tag 'kvmarm-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.10

- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigation
2020-10-20 08:14:25 -04:00
Saeed Mirzamohammadi
31cc578ae2 netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create
This patch fixes the issue due to:

BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2
net/netfilter/nf_tables_offload.c:40
Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds.

This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue.

Add nft_expr_more() and use it to fix this problem.

Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:54 +02:00
Christoph Hellwig
695cebe58d dma-mapping: move more functions to dma-map-ops.h
Due to a mismerge a bunch of prototypes that should have moved to
dma-map-ops.h are still in dma-mapping.h, fix that up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-20 10:41:07 +02:00
Juergen Gross
54c9de8989 xen/events: add a new "late EOI" evtchn framework
In order to avoid tight event channel related IRQ loops add a new
framework of "late EOI" handling: the IRQ the event channel is bound
to will be masked until the event has been handled and the related
driver is capable to handle another event. The driver is responsible
for unmasking the event channel via the new function xen_irq_lateeoi().

This is similar to binding an event channel to a threaded IRQ, but
without having to structure the driver accordingly.

In order to support a future special handling in case a rogue guest
is sending lots of unsolicited events, add a flag to xen_irq_lateeoi()
which can be set by the caller to indicate the event was a spurious
one.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:21:59 +02:00
Linus Torvalds
694565356c Merge tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:

 - Support directly accessing host page cache from virtiofs. This can
   improve I/O performance for various workloads, as well as reducing
   the memory requirement by eliminating double caching. Thanks to Vivek
   Goyal for doing most of the work on this.

 - Allow automatic submounting inside virtiofs. This allows unique
   st_dev/ st_ino values to be assigned inside the guest to files
   residing on different filesystems on the host. Thanks to Max Reitz
   for the patches.

 - Fix an old use after free bug found by Pradeep P V K.

* tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
  virtiofs: calculate number of scatter-gather elements accurately
  fuse: connection remove fix
  fuse: implement crossmounts
  fuse: Allow fuse_fill_super_common() for submounts
  fuse: split fuse_mount off of fuse_conn
  fuse: drop fuse_conn parameter where possible
  fuse: store fuse_conn in fuse_req
  fuse: add submount support to <uapi/linux/fuse.h>
  fuse: fix page dereference after free
  virtiofs: add logic to free up a memory range
  virtiofs: maintain a list of busy elements
  virtiofs: serialize truncate/punch_hole and dax fault path
  virtiofs: define dax address space operations
  virtiofs: add DAX mmap support
  virtiofs: implement dax read/write operations
  virtiofs: introduce setupmapping/removemapping commands
  virtiofs: implement FUSE_INIT map_alignment field
  virtiofs: keep a list of free dax memory ranges
  virtiofs: add a mount option to enable dax
  virtiofs: set up virtio_fs dax_device
  ...
2020-10-19 14:28:30 -07:00
Al Grant
f3d301c1f2 perf: correct SNOOPX field offset
perf_event.h has macros that define the field offsets in the
data_src bitmask in perf records. The SNOOPX and REMOTE offsets
were both 37. These are distinct fields, and the bitfield layout
in perf_mem_data_src confirms that SNOOPX should be at offset 38.

Fixes: 52839e653b ("perf tools: Add support for printing new mem_info encodings")
Signed-off-by: Al Grant <al.grant@foss.arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/4ac9f5cc-4388-b34a-9999-418a4099415d@foss.arm.com
2020-10-19 19:39:22 +02:00
Linus Torvalds
7cf726a594 Merge tag 'linux-kselftest-kunit-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more Kunit updates from Shuah Khan:

 - add Kunit to kernel_init() and remove KUnit from init calls entirely.

   This addresses the concern that Kunit would not work correctly during
   late init phase.

 - add a linker section where KUnit can put references to its test
   suites.

   This is the first step in transitioning to dispatching all KUnit
   tests from a centralized executor rather than having each as its own
   separate late_initcall.

 - add a centralized executor to dispatch tests rather than relying on
   late_initcall to schedule each test suite separately. Centralized
   execution is for built-in tests only; modules will execute tests when
   loaded.

 - convert bitfield test to use KUnit framework

 - Documentation updates for naming guidelines and how
   kunit_test_suite() works.

 - add test plan to KUnit TAP format

* tag 'linux-kselftest-kunit-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  lib: kunit: Fix compilation test when using TEST_BIT_FIELD_COMPILE
  lib: kunit: add bitfield test conversion to KUnit
  Documentation: kunit: add a brief blurb about kunit_test_suite
  kunit: test: add test plan to KUnit TAP format
  init: main: add KUnit to kernel init
  kunit: test: create a single centralized executor for all tests
  vmlinux.lds.h: add linker section for KUnit test suites
  Documentation: kunit: Add naming guidelines
2020-10-18 14:45:59 -07:00
Linus Torvalds
41eea65e2a Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:

 - Debugging for smp_call_function()

 - RT raw/non-raw lock ordering fixes

 - Strict grace periods for KASAN

 - New smp_call_function() torture test

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

[ This doesn't actually pull the tag - I've dropped the last merge from
  the RCU branch due to questions about the series.   - Linus ]

* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  smp: Make symbol 'csd_bug_count' static
  kernel/smp: Provide CSD lock timeout diagnostics
  smp: Add source and destination CPUs to __call_single_data
  rcu: Shrink each possible cpu krcp
  rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  torture: Add gdb support
  rcutorture: Allow pointer leaks to test diagnostic code
  rcutorture: Hoist OOM registry up one level
  refperf: Avoid null pointer dereference when buf fails to allocate
  rcutorture: Properly synchronize with OOM notifier
  rcutorture: Properly set rcu_fwds for OOM handling
  torture: Add kvm.sh --help and update help message
  rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
  torture: Update initrd documentation
  rcutorture: Replace HTTP links with HTTPS ones
  locktorture: Make function torture_percpu_rwsem_init() static
  torture: document --allcpus argument added to the kvm.sh script
  rcutorture: Output number of elapsed grace periods
  rcutorture: Remove KCSAN stubs
  rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
  ...
2020-10-18 14:34:50 -07:00
Christoph Hellwig
301fa9f2dd mm: remove alloc_vm_area
All users are gone now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lkml.kernel.org/r/20201002122204.1534411-12-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
Christoph Hellwig
3e9a9e256b mm: add a vmap_pfn function
Add a proper helper to remap PFNs into kernel virtual space so that
drivers don't have to abuse alloc_vm_area and open coded PTE manipulation
for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lkml.kernel.org/r/20201002122204.1534411-4-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
Christoph Hellwig
b944afc9d6 mm: add a VM_MAP_PUT_PAGES flag for vmap
Add a flag so that vmap takes ownership of the passed in page array.  When
vfree is called on such an allocation it will put one reference on each
page, and free the page array itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lkml.kernel.org/r/20201002122204.1534411-3-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
Minchan Kim
ecb8ac8b1f mm/madvise: introduce process_madvise() syscall: an external memory hinting API
There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.

The information required to make the reclaim decision is not known to the
app.  Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.

To solve the issue, this patch introduces a new syscall
process_madvise(2).  It uses pidfd of an external process to give the
hint.  It also supports vector address range because Android app has
thousands of vmas due to zygote so it's totally waste of CPU and power if
we should call the syscall one by one for each vma.(With testing 2000-vma
syscall vs 1-vector syscall, it showed 15% performance improvement.  I
think it would be bigger in real practice because the testing ran very
cache friendly environment).

Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations.  In
future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment.  With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.

ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
UID) gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the API.

I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky.  Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone.  Thus,
I just limited hints as MADV_[COLD|PAGEOUT] in this patch.

If someone want to add other hints, we could hear the usecase and review
it for each hint.  It's safer for maintenance rather than introducing a
buggy syscall but hard to fix it later.

So finally, the API is as follows,

      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
      The process_madvise() system call is used to give advice or directions
      to the kernel about the address ranges from external process as well as
      local process. It provides the advice to address ranges of process
      described by iovec and vlen. The goal of such advice is to improve
      system or application performance.

      The pidfd selects the process referred to by the PID file descriptor
      specified in pidfd. (See pidofd_open(2) for further information)

      The pointer iovec points to an array of iovec structures, defined in
      <sys/uio.h> as:

        struct iovec {
            void *iov_base;         /* starting address */
            size_t iov_len;         /* number of bytes to be advised */
        };

      The iovec describes address ranges beginning at address(iov_base)
      and with size length of bytes(iov_len).

      The vlen represents the number of elements in iovec.

      The advice is indicated in the advice argument, which is one of the
      following at this moment if the target process specified by pidfd is
      external.

        MADV_COLD
        MADV_PAGEOUT

      Permission to provide a hint to external process is governed by a
      ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

      The process_madvise supports every advice madvise(2) has if target
      process is in same thread group with calling process so user could
      use process_madvise(2) to extend existing madvise(2) to support
      vector address ranges.

    RETURN VALUE
      On success, process_madvise() returns the number of bytes advised.
      This return value may be less than the total number of requested
      bytes, if an error occurred. The caller should check return value
      to determine whether a partial advice occurred.

FAQ:

Q.1 - Why does any external entity have better knowledge?

Quote from Sandeep

"For Android, every application (including the special SystemServer)
are forked from Zygote.  The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.

After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.

In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.

So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.

Besides, we can never rely on applications to clean things up
themselves.  We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.

So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.

- ssp

Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?

process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called.  If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect.  It's the
responsibility of the process calling process_madvise to close this
race condition.  For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called.  Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process.  Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm.  The suggested API itself does not provide synchronization.  It
also apply other APIs like move_pages, process_vm_write.

The race isn't really a problem though.  Why is it so wrong to require
that callers do their own synchronization in some manner?  Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something.  Think about mmap.  It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before.  That's where we need synchronization by using other API or
design from userside.  It shouldn't be part of API itself.  If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3].  Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.

To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.

Q.3 - Why doesn't ptrace work?

Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA.  Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill.  It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.

[1] https://developer.android.com/topic/performance/memory"

[2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

[3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

[minchan@kernel.org: fix process_madvise build break for arm64]
  Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
  Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
  Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
  Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
  Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
  Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
  Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
  Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
Minchan Kim
1aa92cd31c pid: move pidfd_get_pid() to pid.c
process_madvise syscall needs pidfd_get_pid function to translate pidfd to
pid so this patch move the function to kernel/pid.c.

Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-5-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200622192900.22757-3-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-3-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
Minchan Kim
0726b01e70 mm/madvise: pass mm to do_madvise
Patch series "introduce memory hinting API for external process", v9.

Now, we have MADV_PAGEOUT and MADV_COLD as madvise hinting API.  With
that, application could give hints to kernel what memory range are
preferred to be reclaimed.  However, in some platform(e.g., Android), the
information required to make the hinting decision is not known to the app.
Instead, it is known to a centralized userspace daemon(e.g.,
ActivityManagerService), and that daemon must be able to initiate reclaim
on its own without any app involvement.

To solve the concern, this patch introduces new syscall -
process_madvise(2).  Bascially, it's same with madvise(2) syscall but it
has some differences.

1. It needs pidfd of target process to provide the hint

2. It supports only MADV_{COLD|PAGEOUT|MERGEABLE|UNMEREABLE} at this
   moment.  Other hints in madvise will be opened when there are explicit
   requests from community to prevent unexpected bugs we couldn't support.

3. Only privileged processes can do something for other process's
   address space.

For more detail of the new API, please see "mm: introduce external memory
hinting API" description in this patchset.

This patch (of 3):

In upcoming patches, do_madvise will be called from external process
context so we shouldn't asssume "current" is always hinted process's
task_struct.

Furthermore, we must not access mm_struct via task->mm, but obtain it via
access_mm() once (in the following patch) and only use that pointer [1],
so pass it to do_madvise() as well.  Note the vma->vm_mm pointers are
safe, so we can use them further down the call stack.

And let's pass current->mm as arguments of do_madvise so it shouldn't
change existing behavior but prepare next patch to make review easy.

[vbabka@suse.cz: changelog tweak]
[minchan@kernel.org: use current->mm for io_uring]
  Link: http://lkml.kernel.org/r/20200423145215.72666-1-minchan@kernel.org
[akpm@linux-foundation.org: fix it for upstream changes]
[akpm@linux-foundation.org: whoops]
[rdunlap@infradead.org: add missing includes]

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: John Dias <joaodias@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200901000633.1920247-1-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200622192900.22757-1-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200302193630.68771-2-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200622192900.22757-2-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-2-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:09 -07:00