Commit Graph

114298 Commits

Author SHA1 Message Date
David S. Miller
5b7fe93db0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-10-27

The following pull-request contains BPF updates for your *net-next* tree.

We've added 52 non-merge commits during the last 11 day(s) which contain
a total of 65 files changed, 2604 insertions(+), 1100 deletions(-).

The main changes are:

 1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF
    assembly code. The work here teaches BPF verifier to recognize
    kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints
    such that verifier allows direct use of bpf_skb_event_output() helper
    used in tc BPF et al (w/o probing memory access) that dumps skb data
    into perf ring buffer. Also add direct loads to probe memory in order
    to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov.

 2) Big batch of changes to improve libbpf and BPF kselftests. Besides
    others: generalization of libbpf's CO-RE relocation support to now
    also include field existence relocations, revamp the BPF kselftest
    Makefile to add test runner concept allowing to exercise various
    ways to build BPF programs, and teach bpf_object__open() and friends
    to automatically derive BPF program type/expected attach type from
    section names to ease their use, from Andrii Nakryiko.

 3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu.

 4) Allow to read BTF as raw data from bpftool. Most notable use case
    is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa.

 5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which
    manages to improve "rx_drop" performance by ~4%., from Björn Töpel.

 6) Fix to restore the flow dissector after reattach BPF test and also
    fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki.

 7) Improve verifier's BTF ctx access for use outside of raw_tp, from
    Martin KaFai Lau.

 8) Improve documentation for AF_XDP with new sections and to reflect
    latest features, from Magnus Karlsson.

 9) Add back 'version' section parsing to libbpf for old kernels, from
    John Fastabend.

10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(),
    from KP Singh.

11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order
    to improve insn coverage for built BPF progs, from Yonghong Song.

12) Misc minor cleanups and fixes, from various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 22:57:27 -07:00
David S. Miller
4b1f5ddaff Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
more specifically:

* Updates for ipset:

1) Coding style fix for ipset comment extension, from Jeremy Sowden.

2) De-inline many functions in ipset, from Jeremy Sowden.

3) Move ipset function definition from header to source file.

4) Move ip_set_put_flags() to source, export it as a symbol, remove
   inline.

5) Move range_to_mask() to the source file where this is used.

6) Move ip_set_get_ip_port() to the source file where this is used.

* IPVS selftests and netns improvements:

7) Two patches to speedup ipvs netns dismantle, from Haishuang Yan.

8) Three patches to add selftest script for ipvs, also from
   Haishuang Yan.

* Conntrack updates and new nf_hook_slow_list() function:

9) Document ct ecache extension, from Florian Westphal.

10) Skip ct extensions from ctnetlink dump, from Florian.

11) Free ct extension immediately, from Florian.

12) Skip access to ecache extension from nf_ct_deliver_cached_events()
    this is not correct as reported by Syzbot.

13) Add and use nf_hook_slow_list(), from Florian.

* Flowtable infrastructure updates:

14) Move priority to nf_flowtable definition.

15) Dynamic allocation of per-device hooks in flowtables.

16) Allow to include netdevice only once in flowtable definitions.

17) Rise maximum number of devices per flowtable.

* Netfilter hardware offload infrastructure updates:

18) Add nft_flow_block_chain() helper function.

19) Pass callback list to nft_setup_cb_call().

20) Add nft_flow_cls_offload_setup() helper function.

21) Remove rules for the unregistered device via netdevice event.

22) Support for multiple devices in a basechain definition at the
    ingress hook.

22) Add nft_chain_offload_cmd() helper function.

23) Add nft_flow_block_offload_init() helper function.

24) Rewind in case of failing to bind multiple devices to hook.

25) Typo in IPv6 tproxy module description, from Norman Rasmussen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 11:35:43 -07:00
Jason Baron
480274787d tcp: add TCP_INFO status for failed client TFO
The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
or not data-in-SYN was ack'd on both the client and server side. We'd like
to gather more information on the client-side in the failure case in order
to indicate the reason for the failure. This can be useful for not only
debugging TFO, but also for creating TFO socket policies. For example, if
a middle box removes the TFO option or drops a data-in-SYN, we can
can detect this case, and turn off TFO for these connections saving the
extra retransmits.

The newly added tcpi_fastopen_client_fail status is 2 bits and has the
following 4 states:

1) TFO_STATUS_UNSPEC

Catch-all state which includes when TFO is disabled via black hole
detection, which is indicated via LINUX_MIB_TCPFASTOPENBLACKHOLE.

2) TFO_COOKIE_UNAVAILABLE

If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
is available in the cache.

3) TFO_DATA_NOT_ACKED

Data was sent with SYN, we received a SYN/ACK but it did not cover the data
portion. Cookie is not accepted by server because the cookie may be invalid
or the server may be overloaded.

4) TFO_SYN_RETRANSMITTED

Data was sent with SYN, we received a SYN/ACK which did not cover the data
after at least 1 additional SYN was sent (without data). It may be the case
that a middle-box is dropping data-in-SYN packets. Thus, it would be more
efficient to not use TFO on this connection to avoid extra retransmits
during connection establishment.

These new fields do not cover all the cases where TFO may fail, but other
failures, such as SYN/ACK + data being dropped, will result in the
connection not becoming established. And a connection blackhole after
session establishment shows up as a stalled connection.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 19:25:37 -07:00
Martin KaFai Lau
3820729160 bpf: Prepare btf_ctx_access for non raw_tp use case
This patch makes a few changes to btf_ctx_access() to prepare
it for non raw_tp use case where the attach_btf_id is not
necessary a BTF_KIND_TYPEDEF.

It moves the "btf_trace_" prefix check and typedef-follow logic to a new
function "check_attach_btf_id()" which is called only once during
bpf_check().  btf_ctx_access() only operates on a BTF_KIND_FUNC_PROTO
type now. That should also be more efficient since it is done only
one instead of every-time check_ctx_access() is called.

"check_attach_btf_id()" needs to find the func_proto type from
the attach_btf_id.  It needs to store the result into the
newly added prog->aux->attach_func_proto.  func_proto
btf type has no name, so a proper name should be stored into
"attach_func_name" also.

v2:
- Move the "btf_trace_" check to an earlier verifier phase (Alexei)

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191025001811.1718491-1-kafai@fb.com
2019-10-24 18:41:08 -07:00
Tao Ren
b9bcb95315 net: phy: broadcom: add 1000Base-X support for BCM54616S
The BCM54616S PHY cannot work properly in RGMII->1000Base-X mode, mainly
because genphy functions are designed for copper links, and 1000Base-X
(clause 37) auto negotiation needs to be handled differently.

This patch enables 1000Base-X support for BCM54616S by customizing 3
driver callbacks, and it's verified to be working on Facebook CMM BMC
platform (RGMII->1000Base-KX):

  - probe: probe callback detects PHY's operation mode based on
    INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX
    Control register.

  - config_aneg: calls genphy_c37_config_aneg when the PHY is running in
    1000Base-X mode; otherwise, genphy_config_aneg will be called.

  - read_status: calls genphy_c37_read_status when the PHY is running in
    1000Base-X mode; otherwise, genphy_read_status will be called.

Note: BCM54616S PHY can also be configured in RGMII->100Base-FX mode, and
100Base-FX support is not available as of now.

Signed-off-by: Tao Ren <taoren@fb.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-23 20:42:52 -07:00
Heiner Kallweit
fa6e98cee5 net: phy: add support for clause 37 auto-negotiation
This patch adds support for clause 37 1000Base-X auto-negotiation.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Tao Ren <taoren@fb.com>
Tested-by: René van Dorst <opensource@vdorst.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-23 20:42:52 -07:00
Pablo Neira Ayuso
d54725cd11 netfilter: nf_tables: support for multiple devices per netdev hook
This patch allows you to register one netdev basechain to multiple
devices. This adds a new NFTA_HOOK_DEVS netlink attribute to specify
the list of netdevices. Basechains store a list of hooks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-23 13:01:34 +02:00
Pablo Neira Ayuso
cb662ac671 netfilter: nf_tables: increase maximum devices number per flowtable
Rise the maximum limit of devices per flowtable up to 256. Rename
NFT_FLOWTABLE_DEVICE_MAX to NFT_NETDEVICE_MAX in preparation to reuse
the netdev hook parser for ingress basechain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-23 13:01:26 +02:00
Pablo Neira Ayuso
3f0465a9ef netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables
Use a list of hooks per device instead an array.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-23 13:01:22 +02:00
Pablo Neira Ayuso
71a8a63b9d netfilter: nf_flow_table: move priority to struct nf_flowtable
Hardware offload needs access to the priority field, store this field in
the nf_flowtable object.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-23 13:01:03 +02:00
Ben Dooks (Codethink)
5e5b03d163 xdp: Fix type of string pointer in __XDP_ACT_SYM_TAB
The table entry in __XDP_ACT_SYM_TAB for the last item is set
to { -1, 0 } where it should be { -1, NULL } as the second item
is a pointer to a string.

Fixes the following sparse warnings:

./include/trace/events/xdp.h:28:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:53:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:82:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:140:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:155:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:190:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:225:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:260:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:318:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:356:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:390:1: warning: Using plain integer as NULL pointer

Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191022125925.10508-1-ben.dooks@codethink.co.uk
2019-10-22 21:39:54 +02:00
Vivien Didelot
7e99e34701 net: dsa: remove dsa_switch_alloc helper
Now that ports are dynamically listed in the fabric, there is no need
to provide a special helper to allocate the dsa_switch structure. This
will give more flexibility to drivers to embed this structure as they
wish in their private structure.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 12:37:07 -07:00
Vivien Didelot
05f294a852 net: dsa: allocate ports on touch
Allocate the struct dsa_port the first time it is accessed with
dsa_port_touch, and remove the static dsa_port array from the
dsa_switch structure.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 12:37:07 -07:00
Vivien Didelot
da4561cda2 net: dsa: use ports list to setup default CPU port
Use the new ports list instead of iterating over switches and their
ports when setting up the default CPU port. Unassign it on teardown.

Now that we can iterate over multiple CPU ports, remove dst->cpu_dp.

At the same time, provide a better error message for CPU-less tree.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 12:37:07 -07:00
Vivien Didelot
fb35c60cba net: dsa: use ports list to setup switches
Use the new ports list instead of iterating over switches and their
ports when setting up the switches and their ports.

At the same time, provide setup states and messages for ports and
switches as it is done for the trees.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 12:37:06 -07:00
Vivien Didelot
b96ddf254b net: dsa: use ports list in dsa_to_port
Use the new ports list instead of accessing the dsa_switch array
of ports in the dsa_to_port helper.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 12:37:06 -07:00
Vivien Didelot
ab8ccae122 net: dsa: add ports list in the switch fabric
Add a list of switch ports within the switch fabric. This will help the
lookup of a port inside the whole fabric, and it is the first step
towards supporting multiple CPU ports, before deprecating the usage of
the unique dst->cpu_dp pointer.

In preparation for a future allocation of the dsa_port structures,
return -ENOMEM in case no structure is returned, even though this
error cannot be reached yet.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 12:37:06 -07:00
Vivien Didelot
68bb8ea8ad net: dsa: use dsa_to_port helper everywhere
Do not let the drivers access the ds->ports static array directly
while there is a dsa_to_port helper for this purpose.

At the same time, un-const this helper since the SJA1105 driver
assigns the priv member of the returned dsa_port structure.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 12:37:06 -07:00
David S. Miller
2f184393e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Several cases of overlapping changes which were for the most
part trivially resolvable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-20 10:43:00 -07:00
Linus Torvalds
531e93d114 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "I was battling a cold after some recent trips, so quite a bit piled up
  meanwhile, sorry about that.

  Highlights:

   1) Fix fd leak in various bpf selftests, from Brian Vazquez.

   2) Fix crash in xsk when device doesn't support some methods, from
      Magnus Karlsson.

   3) Fix various leaks and use-after-free in rxrpc, from David Howells.

   4) Fix several SKB leaks due to confusion of who owns an SKB and who
      should release it in the llc code. From Eric Biggers.

   5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.

   6) Jumbo packets don't work after resume on r8169, as the BIOS resets
      the chip into non-jumbo mode during suspend. From Heiner Kallweit.

   7) Corrupt L2 header during MPLS push, from Davide Caratti.

   8) Prevent possible infinite loop in tc_ctl_action, from Eric
      Dumazet.

   9) Get register bits right in bcmgenet driver, based upon chip
      version. From Florian Fainelli.

  10) Fix mutex problems in microchip DSA driver, from Marek Vasut.

  11) Cure race between route lookup and invalidation in ipv4, from Wei
      Wang.

  12) Fix performance regression due to false sharing in 'net'
      structure, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
  net: reorder 'struct net' fields to avoid false sharing
  net: dsa: fix switch tree list
  net: ethernet: dwmac-sun8i: show message only when switching to promisc
  net: aquantia: add an error handling in aq_nic_set_multicast_list
  net: netem: correct the parent's backlog when corrupted packet was dropped
  net: netem: fix error path for corrupted GSO frames
  macb: propagate errors when getting optional clocks
  xen/netback: fix error path of xenvif_connect_data()
  net: hns3: fix mis-counting IRQ vector numbers issue
  net: usb: lan78xx: Connect PHY before registering MAC
  vsock/virtio: discard packets if credit is not respected
  vsock/virtio: send a credit update when buffer size is changed
  mlxsw: spectrum_trap: Push Ethernet header before reporting trap
  net: ensure correct skb->tstamp in various fragmenters
  net: bcmgenet: reset 40nm EPHY on energy detect
  net: bcmgenet: soft reset 40nm EPHYs before MAC init
  net: phy: bcm7xxx: define soft_reset for 40nm EPHY
  net: bcmgenet: don't set phydev->link from MAC
  net: Update address for MediaTek ethernet driver in MAINTAINERS
  ipv4: fix race condition between route lookup and invalidation
  ...
2019-10-19 17:09:11 -04:00
Eric Dumazet
2a06b8982f net: reorder 'struct net' fields to avoid false sharing
Intel test robot reported a ~7% regression on TCP_CRR tests
that they bisected to the cited commit.

Indeed, every time a new TCP socket is created or deleted,
the atomic counter net->count is touched (via get_net(net)
and put_net(net) calls)

So cpus might have to reload a contended cache line in
net_hash_mix(net) calls.

We need to reorder 'struct net' fields to move @hash_mix
in a read mostly cache line.

We move in the first cache line fields that can be
dirtied often.

We probably will have to address in a followup patch
the __randomize_layout that was added in linux-4.13,
since this might break our placement choices.

Fixes: 355b985537 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:21:53 -07:00
Linus Torvalds
5f93393a15 Merge tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "Just a few small fixes for the usual suspect, HD- and USB-audio:
  enablement of runtime PM for Nvidia due to the recent PCI changes, a
  fix for potential hangs with recent HD-audio platforms, and the rest
  device-specific quirks"

* tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Force runtime PM on Nvidia HDMI codecs
  ALSA: hda/realtek - Enable headset mic on Asus MJ401TA
  ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers
  ALSA: hdac: clear link output stream mapping
  ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360
2019-10-18 09:21:13 -07:00
Linus Torvalds
0e2adab6cf Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "The main thing here is a long-awaited workaround for a CPU erratum on
  ThunderX2 which we have developed in conjunction with engineers from
  Cavium/Marvell.

  At the moment, the workaround is unconditionally enabled for affected
  CPUs at runtime but we may add a command-line option to disable it in
  future if performance numbers show up indicating a significant cost
  for real workloads.

  Summary:

   - Work around Cavium/Marvell ThunderX2 erratum #219

   - Fix regression in mlock() ABI caused by sign-extension of TTBR1 addresses

   - More fixes to the spurious kernel fault detection logic

   - Fix pathological preemption race when enabling some CPU features at boot

   - Drop broken kcore macros in favour of generic implementations

   - Fix userspace view of ID_AA64ZFR0_EL1 when SVE is disabled

   - Avoid NULL dereference on allocation failure during hibernation"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: tags: Preserve tags for addresses translated via TTBR1
  arm64: mm: fix inverted PAR_EL1.F check
  arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F
  arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
  arm64: hibernate: check pgd table allocation
  arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled
  arm64: Fix kcore macros after 52-bit virtual addressing fallout
  arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
  arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
  arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
  arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
2019-10-17 17:00:14 -07:00
Marek Vasut
1d951ba3da net: phy: micrel: Update KSZ87xx PHY name
The KSZ8795 PHY ID is in fact used by KSZ8794/KSZ8795/KSZ8765 switches.
Update the PHY ID and name to reflect that, as this family of switches
is commonly refered to as KSZ87xx

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-17 16:31:52 -07:00
Will Deacon
777d062e5b Merge branch 'errata/tx2-219' into for-next/fixes
Workaround for Cavium/Marvell ThunderX2 erratum #219.

* errata/tx2-219:
  arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
  arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
  arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
  arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
2019-10-17 13:42:42 -07:00
Linus Torvalds
7801158f83 Merge tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
 "The fixes pertain to a problem with initializing the Intel GPIO
  irqchips when adding gpiochips.

  Andy fixed it up elegantly by adding a hardware initialization
  callback to the struct gpio_irq_chip so let's use this. Tested and
  verified on the target hardware"

* tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: lynxpoint: set default handler to be handle_bad_irq()
  gpio: merrifield: Move hardware initialization to callback
  gpio: lynxpoint: Move hardware initialization to callback
  gpio: intel-mid: Move hardware initialization to callback
  gpiolib: Initialize the hardware with a callback
  gpio: merrifield: Restore use of irq_base
2019-10-17 08:08:20 -07:00
Alexei Starovoitov
a7658e1a41 bpf: Check types of arguments passed into helpers
Introduce new helper that reuses existing skb perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct sk_buff *' as tracepoint argument or
can walk other kernel data structures to skb pointer.

In order to do that teach verifier to resolve true C types
of bpf helpers into in-kernel BTF ids.
The type of kernel pointer passed by raw tracepoint into bpf
program will be tracked by the verifier all the way until
it's passed into helper function.
For example:
kfree_skb() kernel function calls trace_kfree_skb(skb, loc);
bpf programs receives that skb pointer and may eventually
pass it into bpf_skb_output() bpf helper which in-kernel is
implemented via bpf_skb_event_output() kernel function.
Its first argument in the kernel is 'struct sk_buff *'.
The verifier makes sure that types match all the way.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-11-ast@kernel.org
2019-10-17 16:44:36 +02:00
Alexei Starovoitov
3dec541b2e bpf: Add support for BTF pointers to x86 JIT
Pointer to BTF object is a pointer to kernel object or NULL.
Such pointers can only be used by BPF_LDX instructions.
The verifier changed their opcode from LDX|MEM|size
to LDX|PROBE_MEM|size to make JITing easier.
The number of entries in extable is the number of BPF_LDX insns
that access kernel memory via "pointer to BTF type".
Only these load instructions can fault.
Since x86 extable is relative it has to be allocated in the same
memory region as JITed code.
Allocate it prior to last pass of JITing and let the last pass populate it.
Pointer to extable in bpf_prog_aux is necessary to make page fault
handling fast.
Page fault handling is done in two steps:
1. bpf_prog_kallsyms_find() finds BPF program that page faulted.
   It's done by walking rb tree.
2. then extable for given bpf program is binary searched.
This process is similar to how page faulting is done for kernel modules.
The exception handler skips over faulting x86 instruction and
initializes destination register with zero. This mimics exact
behavior of bpf_probe_read (when probe_kernel_read faults dest is zeroed).

JITs for other architectures can add support in similar way.
Until then they will reject unknown opcode and fallback to interpreter.

Since extable should be aligned and placed near JITed code
make bpf_jit_binary_alloc() return 4 byte aligned image offset,
so that extable aligning formula in bpf_int_jit_compile() doesn't need
to rely on internal implementation of bpf_jit_binary_alloc().
On x86 gcc defaults to 16-byte alignment for regular kernel functions
due to better performance. JITed code may be aligned to 16 in the future,
but it will use 4 in the meantime.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-10-ast@kernel.org
2019-10-17 16:44:36 +02:00
Alexei Starovoitov
2a02759ef5 bpf: Add support for BTF pointers to interpreter
Pointer to BTF object is a pointer to kernel object or NULL.
The memory access in the interpreter has to be done via probe_kernel_read
to avoid page faults.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-9-ast@kernel.org
2019-10-17 16:44:36 +02:00
Alexei Starovoitov
9e15db6613 bpf: Implement accurate raw_tp context access via BTF
libbpf analyzes bpf C program, searches in-kernel BTF for given type name
and stores it into expected_attach_type.
The kernel verifier expects this btf_id to point to something like:
typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
which represents signature of raw_tracepoint "kfree_skb".

Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
and 'void *' in second case.

Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
Like PTR_TO_SOCKET points to 'struct bpf_sock',
PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
PTR_TO_BTF_ID points to in-kernel structs.
If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.

When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32)
the btf_struct_access() checks which field of 'struct sk_buff' is
at offset 32. Checks that size of access matches type definition
of the field and continues to track the dereferenced type.
If that field was a pointer to 'struct net_device' the r2's type
will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device'
in vmlinux's BTF.

Such verifier analysis prevents "cheating" in BPF C program.
The program cannot cast arbitrary pointer to 'struct sk_buff *'
and access it. C compiler would allow type cast, of course,
but the verifier will notice type mismatch based on BPF assembly
and in-kernel BTF.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-7-ast@kernel.org
2019-10-17 16:44:35 +02:00
Alexei Starovoitov
ccfe29eb29 bpf: Add attach_btf_id attribute to program load
Add attach_btf_id attribute to prog_load command.
It's similar to existing expected_attach_type attribute which is
used in several cgroup based program types.
Unfortunately expected_attach_type is ignored for
tracing programs and cannot be reused for new purpose.
Hence introduce attach_btf_id to verify bpf programs against
given in-kernel BTF type id at load time.
It is strictly checked to be valid for raw_tp programs only.
In a later patches it will become:
btf_id == 0 semantics of existing raw_tp progs.
btd_id > 0 raw_tp with BTF and additional type safety.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-5-ast@kernel.org
2019-10-17 16:44:35 +02:00
Alexei Starovoitov
8580ac9404 bpf: Process in-kernel BTF
If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux'
for further use by the verifier.
In-kernel BTF is trusted just like kallsyms and other build artifacts
embedded into vmlinux.
Yet run this BTF image through BTF verifier to make sure
that it is valid and it wasn't mangled during the build.
If in-kernel BTF is incorrect it means either gcc or pahole or kernel
are buggy. In such case disallow loading BPF programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-4-ast@kernel.org
2019-10-17 16:44:35 +02:00
Alexei Starovoitov
7c6a469e34 bpf: Add typecast to bpf helpers to help BTF generation
When pahole converts dwarf to btf it emits only used types.
Wrap existing bpf helper functions into typedef and use it in
typecast to make gcc emits this type into dwarf.
Then pahole will convert it to btf.
The "btf_#name_of_helper" types will be used to figure out
types of arguments of bpf helpers.
The generated code before and after is the same.
Only dwarf and btf sections are different.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-3-ast@kernel.org
2019-10-17 16:44:35 +02:00
Alexei Starovoitov
e8c423fb31 bpf: Add typecast to raw_tracepoints to help BTF generation
When pahole converts dwarf to btf it emits only used types.
Wrap existing __bpf_trace_##template() function into
btf_trace_##template typedef and use it in type cast to
make gcc emits this type into dwarf. Then pahole will convert it to btf.
The "btf_trace_" prefix will be used to identify BTF enabled raw tracepoints.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-2-ast@kernel.org
2019-10-17 16:44:35 +02:00
Florian Westphal
ca58fbe06c netfilter: add and use nf_hook_slow_list()
At this time, NF_HOOK_LIST() macro will iterate the list and then calls
nf_hook() for each individual skb.

This makes it so the entire list is passed into the netfilter core.
The advantage is that we only need to fetch the rule blob once per list
instead of per-skb.

NF_HOOK_LIST now only works for ipv4 and ipv6, as those are the only
callers.

v2: use skb_list_del_init() instead of list_del (Edward Cree)

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17 12:20:48 +02:00
Florian Westphal
2ad9d7747c netfilter: conntrack: free extension area immediately
Instead of waiting for rcu grace period just free it directly.

This is safe because conntrack lookup doesn't consider extensions.

Other accesses happen while ct->ext can't be free'd, either because
a ct refcount was taken or because the conntrack hash bucket lock or
the dying list spinlock have been taken.

This allows to remove __krealloc in a followup patch, netfilter was the
only user.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17 11:47:02 +02:00
Russell King
2203cbf2c8 net: sfp: move fwnode parsing into sfp-bus layer
Rather than parsing the sfp firmware node in phylink, parse it in the
sfp-bus code, so we can re-use this code for PHYs without having to
duplicate the parsing.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-16 14:31:59 -04:00
Julien Thierry
19c95f261c arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.

However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.

* Task 1:

PAN == 0 ---|                          |---------------
            |                          |<- return from IRQ, PSTATE.PAN = 0
            | <- IRQ                   |
            +--------+ <- preempt()  +--
                                     ^
                                     |
                                     reschedule Task 1, PSTATE.PAN == 1
* Init:
        --------------------+------------------------
                            ^
                            |
                            enable_cpu_features
                            set PSTATE.PAN on all CPUs

Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).

Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.

This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early, the
exception is BRK for WARN_ON(), but as this uses do_debug_exception()
which keeps IRQs masked, it shouldn't call schedule().

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
 expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-10-16 09:51:43 -07:00
Russell King
554032cdfb net: phylink: use more linkmode_*
Use more linkmode_* helpers rather than open-coding the bitmap
operations.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15 20:40:06 -07:00
Davide Caratti
fa4e0f8855 net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions
the following script:

 # tc qdisc add dev eth0 clsact
 # tc filter add dev eth0 egress protocol ip matchall \
 > action mpls push protocol mpls_uc label 0x355aa bos 1

causes corruption of all IP packets transmitted by eth0. On TC egress, we
can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
operation will result in an overwrite of the first 4 octets in the packet
L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
error pattern is present also in the MPLS 'pop' operation. Fix this error
in act_mpls data plane, computing 'mac_len' as the difference between the
network header and the mac header (when not at TC ingress), and use it in
MPLS 'push'/'pop' core functions.

v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
    in skb_mpls_pop(), reported by kbuild test robot

CC: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: 2a2ea50870 ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15 17:14:48 -07:00
Jiri Pirko
14af7fd1d4 ethtool: Add support for 400Gbps (50Gbps per lane) link modes
Add support for 400Gbps speed, link modes of 50Gbps per lane

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15 15:02:30 -07:00
Linus Torvalds
8625732e77 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS
  (qla2xxx) and two in the core.

  The last two are mostly about removing incorrect messages from the
  kernel log: the resid message is definitely wrong and the sync cache
  on protected drive problem is arguably wrong"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: MAINTAINERS: Update qla2xxx driver
  scsi: zfcp: fix reaction on bit error threshold notification
  scsi: core: save/restore command resid for error handling
  scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry()
  scsi: sd: Ignore a failure to sync cache due to lack of authorization
2019-10-15 12:19:08 -07:00
Andy Shevchenko
9411e3aaa6 gpiolib: Initialize the hardware with a callback
After changing the drivers to use GPIO core to add an IRQ chip
it appears that some of them requires a hardware initialization
before adding the IRQ chip.

Add an optional callback ->init_hw() to allow that drivers
to initialize hardware if needed.

This change is a part of the fix NULL pointer dereference
brought to the several drivers recently.

Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15 01:18:46 +02:00
Randy Dunlap
13bea898cd xarray.h: fix kernel-doc warning
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:

  include/linux/xarray.h:232: WARNING: Unexpected indentation.

Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
Fixes: a3e4d3f97e ("XArray: Redesign xa_alloc API")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Randy Dunlap
2a7e582f42 bitmap.h: fix kernel-doc warning and typo
Fix kernel-doc warning in <linux/bitmap.h>:

  include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'

Also fix small typo (bitnaps).

Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org
Fixes: b9fa6442f7 ("cpumask: Implement cpumask_or_equal()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Vlastimil Babka
fdf3bf8091 mm, page_owner: rename flag indicating that page is allocated
Commit 37389167a2 ("mm, page_owner: keep owner info when freeing the
page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page
is tracked as being allocated.  Kirril suggested naming it
PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat
loaded term for a page".

Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:00 -07:00
Vlastimil Babka
5556cfe8d9 mm, page_owner: fix off-by-one error in __set_page_owner_handle()
Patch series "followups to debug_pagealloc improvements through
page_owner", v3.

These are followups to [1] which made it to Linus meanwhile.  Patches 1
and 3 are based on Kirill's review, patch 2 on KASAN request [2].  It
would be nice if all of this made it to 5.4 with [1] already there (or
at least Patch 1).

This patch (of 3):

As noted by Kirill, commit 7e2f2a0cd1 ("mm, page_owner: record page
owner for each subpage") has introduced an off-by-one error in
__set_page_owner_handle() when looking up page_ext for subpages.  As a
result, the head page page_owner info is set twice, while for the last
tail page, it's not set at all.

Fix this and also make the code more efficient by advancing the page_ext
pointer we already have, instead of calling lookup_page_ext() for each
subpage.  Since the full size of struct page_ext is not known at compile
time, we can't use a simple page_ext++ statement, so introduce a
page_ext_next() inline function for that.

Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
Fixes: 7e2f2a0cd1 ("mm, page_owner: record page owner for each subpage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Reported-by: Miles Chen <miles.chen@mediatek.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:00 -07:00
David S. Miller
a98d62c3ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-10-14

The following pull-request contains BPF updates for your *net-next* tree.

12 days of development and
85 files changed, 1889 insertions(+), 1020 deletions(-)

The main changes are:

1) auto-generation of bpf_helper_defs.h, from Andrii.

2) split of bpf_helpers.h into bpf_{helpers, helper_defs, endian, tracing}.h
   and move into libbpf, from Andrii.

3) Track contents of read-only maps as scalars in the verifier, from Andrii.

4) small x86 JIT optimization, from Daniel.

5) cross compilation support, from Ivan.

6) bpf flow_dissector enhancements, from Jakub and Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 12:17:21 -07:00
David S. Miller
7e0d15ee0d Merge tag 'mac80211-next-for-net-next-2019-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
A few more small things, nothing really stands out:
 * minstrel improvements from Felix
 * a TX aggregation simplification
 * some additional capabilities for hwsim
 * minor cleanups & docs updates
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-13 11:29:07 -07:00
Eric Dumazet
ab4e846a82 tcp: annotate sk->sk_wmem_queued lockless reads
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_wmem_queued while this field can change from IRQ or other cpu.

We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.

sk_wmem_queued_add() helper is added so that we can in
the future convert to ADD_ONCE() or equivalent if/when
available.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-13 10:13:08 -07:00