Commit Graph

967767 Commits

Author SHA1 Message Date
Jakub Kicinski
07cbce2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-11-14

1) Add BTF generation for kernel modules and extend BTF infra in kernel
   e.g. support for split BTF loading and validation, from Andrii Nakryiko.

2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
   on inlined branch conditions, from Alexei Starovoitov.

3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.

4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
   infra, from Martin KaFai Lau.

5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
   XDP_REDIRECT path, from Lorenzo Bianconi.

6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
   context, from Song Liu.

7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
   for different target archs on same source tree, from Jean-Philippe Brucker.

8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.

9) Move functionality from test_tcpbpf_user into the test_progs framework so it
   can run in BPF CI, from Alexander Duyck.

10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.

Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.

  [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/
      https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
  net: mlx5: Add xdp tx return bulking support
  net: mvpp2: Add xdp tx return bulking support
  net: mvneta: Add xdp tx return bulking support
  net: page_pool: Add bulk support for ptr_ring
  net: xdp: Introduce bulking for xdp tx return path
  bpf: Expose bpf_d_path helper to sleepable LSM hooks
  bpf: Augment the set of sleepable LSM hooks
  bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Rename some functions in bpf_sk_storage
  bpf: Folding omem_charge() into sk_storage_charge()
  selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
  selftests/bpf: Add skb_pkt_end test
  bpf: Support for pointers beyond pkt_end.
  tools/bpf: Always run the *-clean recipes
  tools/bpf: Add bootstrap/ to .gitignore
  bpf: Fix NULL dereference in bpf_task_storage
  tools/bpftool: Fix build slowdown
  tools/runqslower: Build bpftool using HOSTCC
  tools/runqslower: Enable out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 09:13:41 -08:00
Steen Hegelund
774626fa44 net: phy: mscc: Add PTP support for 2 more VSC PHYs
Add VSC8572 and VSC8574 in the PTP configuration
as they also support PTP.

The relevant datasheets can be found here:
  - VSC8572: https://www.microchip.com/wwwproducts/en/VSC8572
  - VSC8574: https://www.microchip.com/wwwproducts/en/VSC8574

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Link: https://lore.kernel.org/r/20201112092250.914079-1-steen.hegelund@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 18:25:34 -08:00
Daniel Borkmann
c14d61fca0 Merge branch 'xdp-redirect-bulk'
Lorenzo Bianconi says:

====================
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually
run inside the driver NAPI tx completion loop.

Convert mvneta, mvpp2 and mlx5 drivers to xdp_return_frame_bulk APIs.

More details on benchmarks run on mlx5 can be found here:
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/xdp_bulk_return01.org

Changes since v5:
- do not keep looping over ptr_ring if the cache is full but release leftover
  pages running page_pool_return_page

Changes since v4:
- fix comments
- introduce xdp_frame_bulk_init utility routine
- compiler annotations for I-cache code layout
- move rcu_read_lock outside fast-path
- mlx5 xdp bulking code optimization

Changes since v3:
- align DEV_MAP_BULK_SIZE to XDP_BULK_QUEUE_SIZE
- refactor page_pool_put_page_bulk to avoid code duplication

Changes since v2:
- move mvneta changes in a dedicated patch

Changes since v1:
- improve comments
- rework xdp_return_frame_bulk routine logic
- move count and xa fields at the beginning of xdp_frame_bulk struct
- invert logic in page_pool_put_page_bulk for loop
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
2020-11-14 02:30:03 +01:00
Lorenzo Bianconi
b87c57ae12 net: mlx5: Add xdp tx return bulking support
Convert mlx5 driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 8.9Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 10.2Mpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/250460319fd868b7b5668fc1deca74dd42813a90.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi
dbef19ccde net: mvpp2: Add xdp tx return bulking support
Convert mvpp2 driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 1.79Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 1.93Mpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Matteo Croce <mcroce@microsoft.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/0b38c295e58e8ce251ef6b4e2187a2f457f9f7a3.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi
2f9d09394d net: mvneta: Add xdp tx return bulking support
Convert mvneta driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 275Kpps
XDP_REDIRECT (upstream codepath + bulking APIs): 284Kpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/9af8014006d022fc0fec78cdaa71beb56999750d.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi
7886244736 net: page_pool: Add bulk support for ptr_ring
Introduce the capability to batch page_pool ptr_ring refill since it is
usually run inside the driver NAPI tx completion loop.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi
8965398713 net: xdp: Introduce bulking for xdp tx return path
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually run
inside the driver NAPI tx completion loop.
The bulk queue size is set to 16 to be aligned to how
XDP_REDIRECT bulking works. The bulk is flushed when
it is full or when mem.id changes.
xdp_frame_bulk is usually stored/allocated on the function
call-stack to avoid locking penalties.
Current implementation considers only page_pool memory model.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
2020-11-14 02:28:59 +01:00
Jisheng Zhang
bb3222f71b net: stmmac: platform: use optional clk/reset get APIs
Use the devm_reset_control_get_optional() and devm_clk_get_optional()
rather than open coding them.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Link: https://lore.kernel.org/r/20201112092606.5173aa6f@xhacker.debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 16:31:52 -08:00
Heiner Kallweit
ca1ab89cd2 r8169: improve rtl_tx
We can simplify the for() condition and eliminate variable tx_left.
The change also considers that tp->cur_tx may be incremented by a
racing rtl8169_start_xmit().
In addition replace the write to tp->dirty_tx and the following
smp_mb() with an equivalent call to smp_store_mb(). This implicitly
adds a WRITE_ONCE() to the write.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/c2e19e5e-3d3f-d663-af32-13c3374f5def@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 16:29:07 -08:00
Heiner Kallweit
95f3c5458d r8169: use READ_ONCE in rtl_tx_slots_avail
tp->dirty_tx and tp->cur_tx may be changed by a racing rtl_tx() or
rtl8169_start_xmit(). Use READ_ONCE() to annotate the races and ensure
that the compiler doesn't use cached values.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/5676fee3-f6b4-84f2-eba5-c64949a371ad@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 16:28:59 -08:00
Jakub Kicinski
2caf08e757 Merge branch 'net-ipa-two-fixes'
Alex Elder says:

====================
net: ipa: two fixes

This small series makes two fixes to the IPA code:
  - While reviewing something else I found that one of the resource
    limits on the SDM845 used the wrong value.  The first patch
    fixes this.  The correct value allocates more resources of this
    type for IPA to use, and otherwise does not change behavior.
  - When the IPA-resident microcontroller starts up it generates an
    event, which triggers an AP interrupt.  The event merely
    provides some information for logging, which we don't support.
    We already ignore the event, and that's harmless.  So this
    patch explicitly ignores it rather than issuing a warning when
    it occurs.
====================

Link: https://lore.kernel.org/r/20201112121157.19784-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:37:10 -08:00
Alex Elder
0a5096ec2a net: ipa: ignore the microcontroller log event
The IPA-resident microcontroller has the ability to log various
activity in an area of IPA shared memory.  When the microcontroller
starts it generates an event to the AP to provide information about
the log.

We don't support reading this log, and we can safely ignore the
event.  So do that rather than treating the log info event we
receive as "unsupported."

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:37:07 -08:00
Alex Elder
3ce6da1b2e net: ipa: fix source packet contexts limit
I have discovered that the maximum number of source packet contexts
configured for SDM845 is incorrect.  Fix this error.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:37:07 -08:00
Jakub Kicinski
992c75ae2f Merge branch 'sfc-further-ef100-encap-tso-features'
Edward Cree says:

====================
sfc: further EF100 encap TSO features

This series adds support for GRE and GRE_CSUM TSO on EF100 NICs, as
 well as improving the handling of UDP tunnel TSO.
====================

Link: https://lore.kernel.org/r/eda2de73-edf2-8b92-edb9-099ebda09ebc@solarflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:33:37 -08:00
Edward Cree
c5122cf584 sfc: support GRE TSO on EF100
We can treat SKB_GSO_GRE almost exactly the same as UDP tunnels, except
 that we don't want to edit the outer UDP len (as there isn't one).
For SKB_GSO_GRE_CSUM, we have to use GSO_PARTIAL as the device doesn't
 support offload of non-UDP outer L4 checksums.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
2020-11-13 15:33:30 -08:00
Edward Cree
42bfd69a9f sfc: correctly support non-partial GSO_UDP_TUNNEL_CSUM on EF100
By asking the HW for the correct edits, we can make UDP tunnel TSO
 work without needing GSO_PARTIAL.  So don't specify it in our
 netdev->gso_partial_features.
However, retain GSO_PARTIAL support, as this will be used for other
 protocols later.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
2020-11-13 15:33:27 -08:00
Edward Cree
dc8d2512e6 sfc: extend bitfield macros to 19 fields
Our TSO descriptors got even more fussy.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
2020-11-13 15:33:03 -08:00
Jakub Kicinski
72ac50b206 Merge branch 'net-ipa-gsi-register-consolidation'
Alex Elder says:

====================
net: ipa: GSI register consolidation

This series rearranges and consolidates some GSI register
definitions.  Its general aim is to make things more
consistent, by:
  - Using enumerated types to define the values held in GSI register
    fields
  - Defining field values in "gsi_reg.h", together with the
    definition of the register (and field) that holds them
  - Format enumerated type members consistently, with hexidecimal
    numeric values, and assignments aligned on the same column

There is one checkpatch "CHECK" warning requesting a blank line; I
ignored that because my intention was to group certain definitions.
====================

Link: https://lore.kernel.org/r/20201110215922.23514-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:13:46 -08:00
Alex Elder
4730ab1c1d net: ipa: use enumerated types for GSI field values
Replace constants defined with an "_FVAL" suffix with values defined
in enumerated types, to be consistent with other usage in the driver.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:13:41 -08:00
Alex Elder
cec2076e43 net: ipa: move GSI command opcode values into "gsi_reg.h"
The gsi_ch_cmd_opcode, gsi_evt_cmd_opcode, and gsi_generic_cmd_opcode
enumerated types are values that fields in the GSI command registers
can take on.  Move their definitions out of "gsi.c" and into "gsi_reg.h",
alongside the definition of registers they are associated with.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:13:41 -08:00
Alex Elder
7b0ac8f651 net: ipa: move GSI error values into "gsi_reg.h"
The gsi_err_code and gsi_err_type enumerated types are values that
fields in the GSI ERROR_LOG register can take on.  Move their
definitions out of "gsi.c" and into "gsi_reg.h", alongside the
definition of the ERROR_LOG register offset and field symbols.

Drop the "_ERR" suffix in the names of the gsi_err_code members.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:13:41 -08:00
Alex Elder
9ed8c2a92d net: ipa: move channel type values into "gsi_reg.h"
The gsi_channel_type enumerated type define values used for the
channel type/protocol for event rings and channels.  Move its
definition out of "gsi.c" and into "gsi_reg.h", alongside the
definition of the CH_C_CNTXT_0 register offset and its fields.
Add a comment near the definition of the EV_CH_E_CNTXT_0 register
indicating this type is used for its EV_CHTYPE field.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:13:41 -08:00
Alex Elder
46dda53ef7 net: ipa: use common value for channel type and protocol
The numeric values that represent the event ring channel type are
identical to the values that represent the matching protocol used
for a channel.  Use a new gsi_channel_type enumerated type to
represent the values programmed for both cases, using "CHANNEL_TYPE"
in member names in place of "EVT_CHTYPE" and "CHANNEL_PROTOCOL".

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:13:41 -08:00
Alex Elder
6c6358cca6 net: ipa: define GSI interrupt types with enums
Define the GSI global interrupt types with an enumerated type whose
values are the bit positions representing the global interrupt types.

Similarly, define the GSI general interrupt types with an enumerated
type whose values are the bit positions of general interrupt types.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 15:13:40 -08:00
Wenlin Kang
2f51e5758d tipc: fix -Wstringop-truncation warnings
Replace strncpy() with strscpy(), fixes the following warning:

In function 'bearer_name_validate',
    inlined from 'tipc_enable_bearer' at net/tipc/bearer.c:246:7:
net/tipc/bearer.c:141:2: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
  strncpy(name_copy, name, TIPC_MAX_BEARER_NAME);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Link: https://lore.kernel.org/r/20201112093442.8132-1-wenlin.kang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 14:17:49 -08:00
Jakub Kicinski
f8fd36b95e Merge tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
Some updates:
 * injection/radiotap updates for new test capabilities
 * remove WDS support - even years ago when we turned
   it off by default it was already basically unusable
 * support for HE (802.11ax) rates for beacons
 * support for some vendor-specific HE rates
 * many other small features/cleanups

* tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (21 commits)
  nl80211: fix kernel-doc warning in the new SAE attribute
  cfg80211: remove WDS code
  mac80211: remove WDS-related code
  rt2x00: remove WDS code
  b43legacy: remove WDS code
  b43: remove WDS code
  carl9170: remove WDS code
  ath9k: remove WDS code
  wireless: remove CONFIG_WIRELESS_WDS
  mac80211: assure that certain drivers adhere to DONT_REORDER flag
  mac80211: don't overwrite QoS TID of injected frames
  mac80211: adhere to Tx control flag that prevents frame reordering
  mac80211: add radiotap flag to assure frames are not reordered
  mac80211: save HE oper info in BSS config for mesh
  cfg80211: add support to configure HE MCS for beacon rate
  nl80211: fix beacon tx rate mask validation
  nl80211/cfg80211: fix potential infinite loop
  cfg80211: Add support to calculate and report 4096-QAM HE rates
  cfg80211: Add support to configure SAE PWE value to drivers
  ieee80211: Add definition for WFA DPP
  ...
====================

Link: https://lore.kernel.org/r/20201113101148.25268-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 12:03:22 -08:00
KP Singh
6f100640ca bpf: Expose bpf_d_path helper to sleepable LSM hooks
Sleepable hooks are never called from an NMI/interrupt context, so it
is safe to use the bpf_d_path helper in LSM programs attaching to these
hooks.

The helper is not restricted to sleepable programs and merely uses the
list of sleepable hooks as the initial subset of LSM hooks where it can
be used.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201113005930.541956-3-kpsingh@chromium.org
2020-11-13 15:47:21 +01:00
KP Singh
423f16108c bpf: Augment the set of sleepable LSM hooks
Update the set of sleepable hooks with the ones that do not trigger
a warning with might_fault() when exercised with the correct kernel
config options enabled, i.e.

	DEBUG_ATOMIC_SLEEP=y
	LOCKDEP=y
	PROVE_LOCKING=y

This means that a sleepable LSM eBPF program can be attached to these
LSM hooks. A new helper method bpf_lsm_is_sleepable_hook is added and
the set is maintained locally in bpf_lsm.c

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201113005930.541956-2-kpsingh@chromium.org
2020-11-13 15:45:54 +01:00
Alexei Starovoitov
904709f63b Merge branch 'bpf: Enable bpf_sk_storage for FENTRY/FEXIT/RAW_TP'
Martin KaFai says:

====================

This set is to allow the FENTRY/FEXIT/RAW_TP tracing program to use
bpf_sk_storage.  The first two patches are a cleanup.  The last patch is
tests.  Patch 3 has the required kernel changes to
enable bpf_sk_storage for FENTRY/FEXIT/RAW_TP.

Please see individual patch for details.

v2:
- Rename some of the function prefix from sk_storage to bpf_sk_storage
- Use prefix check instead of substr check
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-11-12 18:39:57 -08:00
Martin KaFai Lau
53632e1119 bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
This patch tests storing the task's related info into the
bpf_sk_storage by fentry/fexit tracing at listen, accept,
and connect.  It also tests the raw_tp at inet_sock_set_state.

A negative test is done by tracing the bpf_sk_storage_free()
and using bpf_sk_storage_get() at the same time.  It ensures
this bpf program cannot load.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201112211320.2587537-1-kafai@fb.com
2020-11-12 18:39:28 -08:00
Martin KaFai Lau
8e4597c627 bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
This patch enables the FENTRY/FEXIT/RAW_TP tracing program to use
the bpf_sk_storage_(get|delete) helper, so those tracing programs
can access the sk's bpf_local_storage and the later selftest
will show some examples.

The bpf_sk_storage is currently used in bpf-tcp-cc, tc,
cg sockops...etc which is running either in softirq or
task context.

This patch adds bpf_sk_storage_get_tracing_proto and
bpf_sk_storage_delete_tracing_proto.  They will check
in runtime that the helpers can only be called when serving
softirq or running in a task context.  That should enable
most common tracing use cases on sk.

During the load time, the new tracing_allowed() function
will ensure the tracing prog using the bpf_sk_storage_(get|delete)
helper is not tracing any bpf_sk_storage*() function itself.
The sk is passed as "void *" when calling into bpf_local_storage.

This patch only allows tracing a kernel function.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201112211313.2587383-1-kafai@fb.com
2020-11-12 18:39:28 -08:00
Martin KaFai Lau
e794bfddb8 bpf: Rename some functions in bpf_sk_storage
Rename some of the functions currently prefixed with sk_storage
to bpf_sk_storage.  That will make the next patch have fewer
prefix check and also bring the bpf_sk_storage.c to a more
consistent function naming.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211307.2587021-1-kafai@fb.com
2020-11-12 18:39:27 -08:00
Martin KaFai Lau
9e838b02b0 bpf: Folding omem_charge() into sk_storage_charge()
sk_storage_charge() is the only user of omem_charge().
This patch simplifies it by folding omem_charge() into
sk_storage_charge().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211301.2586255-1-kafai@fb.com
2020-11-12 18:39:27 -08:00
Jakub Kicinski
e1d9d7b913 Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 16:54:48 -08:00
Daniel Borkmann
0a58a65cc0 Merge branch 'bpf-ptrs-beyond-pkt-end'
Alexei Starovoitov says:

====================
v1->v2:
- removed set-but-unused variable.
- added Jiri's Tested-by.

In some cases LLVM uses the knowledge that branch is taken to optimze the code
which causes the verifier to reject valid programs.
Teach the verifier to recognize that
r1 = skb->data;
r1 += 10;
r2 = skb->data_end;
if (r1 > r2) {
  here r1 points beyond packet_end and subsequent
  if (r1 > r2) // always evaluates to "true".
}
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-11-13 01:42:57 +01:00
Alexei Starovoitov
cb62d34019 selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
Add few assembly tests for packet comparison.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-4-alexei.starovoitov@gmail.com
2020-11-13 01:42:11 +01:00
Alexei Starovoitov
9cc873e858 selftests/bpf: Add skb_pkt_end test
Add a test that currently makes LLVM generate assembly code:

$ llvm-objdump -S skb_pkt_end.o
0000000000000000 <main_prog>:
; 	if (skb_shorter(skb, ETH_IPV4_TCP_SIZE))
       0:	61 12 50 00 00 00 00 00	r2 = *(u32 *)(r1 + 80)
       1:	61 14 4c 00 00 00 00 00	r4 = *(u32 *)(r1 + 76)
       2:	bf 43 00 00 00 00 00 00	r3 = r4
       3:	07 03 00 00 36 00 00 00	r3 += 54
       4:	b7 01 00 00 00 00 00 00	r1 = 0
       5:	2d 23 02 00 00 00 00 00	if r3 > r2 goto +2 <LBB0_2>
       6:	07 04 00 00 0e 00 00 00	r4 += 14
; 	if (skb_shorter(skb, ETH_IPV4_TCP_SIZE))
       7:	bf 41 00 00 00 00 00 00	r1 = r4
0000000000000040 <LBB0_2>:
       8:	b4 00 00 00 ff ff ff ff	w0 = -1
; 	if (!(ip = get_iphdr(skb)))
       9:	2d 23 05 00 00 00 00 00	if r3 > r2 goto +5 <LBB0_6>
; 	proto = ip->protocol;
      10:	71 12 09 00 00 00 00 00	r2 = *(u8 *)(r1 + 9)
; 	if (proto != IPPROTO_TCP)
      11:	56 02 03 00 06 00 00 00	if w2 != 6 goto +3 <LBB0_6>
; 	if (tcp->dest != 0)
      12:	69 12 16 00 00 00 00 00	r2 = *(u16 *)(r1 + 22)
      13:	56 02 01 00 00 00 00 00	if w2 != 0 goto +1 <LBB0_6>
; 	return tcp->urg_ptr;
      14:	69 10 26 00 00 00 00 00	r0 = *(u16 *)(r1 + 38)
0000000000000078 <LBB0_6>:
; }
      15:	95 00 00 00 00 00 00 00	exit

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-3-alexei.starovoitov@gmail.com
2020-11-13 01:42:11 +01:00
Alexei Starovoitov
6d94e741a8 bpf: Support for pointers beyond pkt_end.
This patch adds the verifier support to recognize inlined branch conditions.
The LLVM knows that the branch evaluates to the same value, but the verifier
couldn't track it. Hence causing valid programs to be rejected.
The potential LLVM workaround: https://reviews.llvm.org/D87428
can have undesired side effects, since LLVM doesn't know that
skb->data/data_end are being compared. LLVM has to introduce extra boolean
variable and use inline_asm trick to force easier for the verifier assembly.

Instead teach the verifier to recognize that
r1 = skb->data;
r1 += 10;
r2 = skb->data_end;
if (r1 > r2) {
  here r1 points beyond packet_end and
  subsequent
  if (r1 > r2) // always evaluates to "true".
}

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-2-alexei.starovoitov@gmail.com
2020-11-13 01:42:11 +01:00
Guillaume Nault
e865802357 selftests: set conf.all.rp_filter=0 in bareudp.sh
When working on the rp_filter problem, I didn't realise that disabling
it on the network devices didn't cover all cases: rp_filter could also
be enabled globally in the namespace, in which case it would drop
packets, even if the net device has rp_filter=0.

Fixes: 1ccd58331f ("selftests: disable rp_filter when testing bareudp")
Fixes: bbbc7aa45e ("selftests: add test script for bareudp tunnels")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/f2d459346471f163b239aa9d63ce3e2ba9c62895.1605107012.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 16:14:38 -08:00
Jakub Kicinski
e7086213f7 Merge branch 'mlxsw-spectrum-prepare-for-xm-implementation-prefix-insertion-and-removal'
Ido Schimmel says:

====================
mlxsw: spectrum: Prepare for XM implementation - prefix insertion and removal

Jiri says:

This is a preparation patchset for follow-up support of boards with
extended mezzanine (XM), which is going to allow extended (scale-wise)
router offload.

XM requires a separate PRM register named XMDR to be used instead of
RALUE to insert/update/remove FIB entries. Therefore, this patchset
extends the previously introduces low-level ops to be able to have
XM-specific FIB entry config implementation.

Currently the existing original RALUE implementation is moved to "basic"
low-level ops.

Unlike legacy router, insertion/update/removal of FIB entries into XM
could be done in bulks up to 4 items in a single PRM register write.
That is why this patchset implements "an op context", that allows the
future XM ops implementation to squash multiple FIB events to single
register write. For that, the way in which the FIB events are processed
by the work queue has to be changed.

The conversion from 1:1 FIB event - work callback call to event queue is
implemented in patch #3.

Patch #4 introduces "an op context" that will allow in future to squash
multiple FIB events into one XMDR register write. Patch #12 converts it
from stack to be allocated per instance.

Existing RALUE manipulations are pushed to ops in patch #10.

Patch #13 is introducing a possibility for low-level implementation to
have per FIB entry private memory.

The rest of the patches are either cosmetics or smaller preparations.
====================

Link: https://lore.kernel.org/r/20201110094900.1920158-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:26 -08:00
Jiri Pirko
173f14cda3 mlxsw: spectrum_router: Introduce FIB entry update op
Follow-up patchset introducing XMDR implementation is going to need
to distinguish write and update ops. Therefore introduce "update op"
and call "write op" only when new FIB entry is inserted.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:22 -08:00
Jiri Pirko
a005a7fe2f mlxsw: spectrum_router: Track FIB entry committed state and skip uncommitted on delete
In case bulking is used, the entry that was previously added may not
be yet committed to the HW as it waits in the queue for bulk send. For
such entries, skip the deletion.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:22 -08:00
Jiri Pirko
ae9ce81aa7 mlxsw: spectrum_router: Introduce fib_entry priv for low-level ops
Prepare for the low-level ops that need to store some data alongside
the fib_entry and introduce a per-fib_entry priv for ll ops.
The priv is reference counted as in the follow-up patch it is going
to be saved in pack() function and used later on in commit() even in
case the related fib_entry gets freed in the middle.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:21 -08:00
Jiri Pirko
91d20d71b2 mlxsw: spectrum_router: Have FIB entry op context allocated for the instance
Get the max size needed for FIB entry op context and allocate it once
for the instance. Use it repeatedly from the scheduled work.
By this, allow to extend the context to hold more data than it is wise
to do when it was on the stack. Make sure to signalize that the context
needs to be initialized in case families of subsequent FIB entries differ.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:21 -08:00
Jiri Pirko
505cd65c66 mlxsw: spectrum_router: Prepare work context for possible bulking
For XMDR register it is possible to carry multiple FIB entry
operations in a single write. However the FW does not restrict mixing
the types of operations, make the code easier and indicate the bulking
is ok only in case the bulk contains FIB operations of the same family
and event.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:21 -08:00
Jiri Pirko
7f5c4090e4 mlxsw: spectrum: Push RALUE packing and writing into low-level router ops
With follow-up introduction of XM implementation, XMDR register is
going to be optionally used instead of RALUE register. Push the RALUE
packing helpers and write call into low-level router ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:21 -08:00
Jiri Pirko
1a9c21d5f7 mlxsw: spectrum_router: Use RALUE pack helper from abort function
Unify the RALUE register payload packing and use the
__mlxsw_sp_fib_entry_ralue_pack() helper from
__mlxsw_sp_router_set_abort_trap().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:21 -08:00
Jiri Pirko
1a7fcdf75d mlxsw: reg: Allow to pass NULL pointer to mlxsw_reg_ralue_pack4/6()
In preparation for the change that is going to be done in the next
patch, allow to pass NULL pointer to mlxsw_reg_ralue_pack4() and
mlxsw_reg_ralue_pack6() helpers.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:20 -08:00
Jiri Pirko
0c1d6b2694 mlxsw: spectrum_router: Pass destination IP as a pointer to mlxsw_reg_ralue_pack4()
Instead of passing destination IP as a u32 value, pass it as pointer to
u32. Avoid using local variable for the pointer store.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 15:55:20 -08:00