Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-10-31
This series contains updates to i40e, i40evf and net/sched.
Arnd Bergmann cleans up the power management code to resolve a build
warning.
Shannon Nelson fixes i40e to only redistribute our vectors when we did
not get the full count that we requested.
Alex reverts a previous commit because it potentially causes a memory leak
when combined with the current page recycling scheme.
Amritha enables configuring cloud filters in i40e using the tc-flower
classifier. The classification function of the filter is to match a
packet to a traffic class. cls_flower is extended to offload classid to
hardware. Hardware traffic classes are identified using classid values
reserved in the range :ffe0 - :ffef.
The cloud filters are added for a VSI and are cleaned up when the VSI is
deleted. The filters that match on L4 ports needs enhanced admin queue
functions with big buffer support for extended fields in cloud filter
commands.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
tdc.py reads a bunch of test cases in json files. When a json file
cannot be parsed, tdc just exits and does not run any tests.
This patch will cause tdc to print a message with the file name and
line number, then that file will be ignored and the rest of the tests
will be processed.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check if tcase[k] is an instance of a list (is or is derived from list)
instead of checking if it is a list.
This will be useful if the data structures change to be something
that implements list, instead of being an actual list. In that
case, this code will not have to change.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the config customization into a site-local file
tdc_config_local.py, so that updates of the tdc test
software does not require hand-editing of the config.
This patch includes a template for the site-local
customization file.
In addition, this makes it easy to revert to a stock
tdc environment for testing the test framework and/or
the core tests.
Also it makes it harder for any custom config to be
submitted back to the kernel tdc.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ignore .pyc files, "python compiled" files, that get created
when a python script is run. They should never be committed.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of documentation, supply some very simple test cases
to illustrate how test cases work. One test case shows
commands in the setup, command, verify and teardown stages.
Other test cases show how to have a working test case that
does not have commands in the setup, verify and/or teardown
stages.
Specifically, the command lists for setup and teardown can
be empty. And the verify command must have a command, but
it can be /bin/true. The regex must have a string, we
recommend a single space, and the count of matches must be
zero if you do not want to use the match feature of verify.
Verify will always look for a return code of success (0)
so we give /bin/true when we do not want to make a check
there.
Also, update the documentation for testcases to be more
specific in the cases of:
- accepting non-success return codes in setup and
teardown stages
- how to write the test when no setup, teardown
and/or verify are desired.
To run the example test cases:
$ sudo -E ./tdc.py -f creating-testcases/example.json -l
1f: (example) simple test to test framework
2f: (example) simple test, no need for verify
3f: (example) simple test, no need for setup or teardown (or verify)
$ sudo -E ./tdc.py -f creating-testcases/example.json
Test 1f: simple test to test framework
Test 2f: simple test, no need for verify
Test 3f: simple test, no need for setup or teardown (or verify)
All test results:
1..3
ok 1 1f simple test to test framework
ok 2 2f simple test, no need for verify
ok 3 3f simple test, no need for setup or teardown (or verify)
$
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault says:
====================
l2tp: remove unused code
Patch #1 removes the ref/deref mechanism that was originally used to
prevent ppp pseudowires from dropping their sockets. This mechanism
was error prone and isn't used anymore.
Patch #2 removes some module specific refcount debugging.
Patches #3 and #4 take care of some dead code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With conversion to refcount_t, such manual debugging code doesn't make
sense anymore.
The tunnel part was already dropped by
54652eb12c ("l2tp: hold tunnel while looking up sessions in l2tp_netlink").
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ->ref() and ->deref() callbacks are unused since PPP stopped using
them in ee40fb2e1e ("l2tp: protect sock pointer of struct pppol2tp_session with RCU").
We can thus remove them from struct l2tp_session and drop the do_ref
parameter of l2tp_session_get*().
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables tc-flower based hardware offloads. tc flower
filter provided by the kernel is configured as driver specific
cloud filter. The patch implements functions and admin queue
commands needed to support cloud filters in the driver and
adds cloud filters to configure these tc-flower filters.
The classification function of the filter is to direct matched
packets to a traffic class. The hardware traffic class is set
based on the the classid reserved in the range :ffe0 - :ffef.
Match Dst MAC and route to TC0:
prio 1 flower dst_mac 3c:fd:fe:a0:d6:70 skip_sw\
hw_tc 1
Match Dst IPv4,Dst Port and route to TC1:
prio 2 flower dst_ip 192.168.3.5/32\
ip_proto udp dst_port 25 skip_sw\
hw_tc 2
Match Dst IPv6,Dst Port and route to TC1:
prio 3 flower dst_ip fe8::200:1\
ip_proto udp dst_port 66 skip_sw\
hw_tc 2
Delete tc flower filter:
Example:
Flow Director Sideband is disabled while configuring cloud filters
via tc-flower and until any cloud filter exists.
Unsupported matches when cloud filters are added using enhanced
big buffer cloud filter mode of underlying switch include:
1. source port and source IP
2. Combined MAC address and IP fields.
3. Not specifying L4 port
These filter matches can however be used to redirect traffic to
the main VSI (tc 0) which does not require the enhanced big buffer
cloud filter support.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch offloads the classid to hardware and uses the classid
reserved in the range :ffe0 - :ffef to identify hardware traffic
classes reported via dev->num_tc.
tcf_result structure contains the class ID of the class to which
the packet belongs and is offloaded to hardware via flower filter.
A new helper function is introduced to represent HW traffic
classes 0 through 15 using the reserved classid values :ffe0 - :ffef.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This reverts commit 11f29003d6.
I am reverting this as I am fairly certain this can result in a memory leak
when combined with the current page recycling scheme. Specifically we end
up attempting to allocate fewer buffers than we recycled and this results
in us rewinding the next to alloc pointer which leads to leaks when we
overwrite the rx_buffer_info when processing the next frame.
Fixes: 11f29003d6 ("i40e/i40evf: bump tail only in multiples of 8")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Whether or not there are vectors_left, we only need to redistribute
our vectors if we didn't get as many as we requested. With the current
check, the code will try to redistribute even if we did in fact get all
the vectors we requested - this can happen when we have more CPUs than
we do vectors. This restores an earlier check to be sure we only
redistribute if we didn't get the full count we requested.
Fixes: 4ce20abc64 (i40e: fix MSI-X vector redistribution if hw limit is reached)
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A cleanup of the PM code left an incorrect #ifdef in place, leading
to a harmless build warning:
drivers/net/ethernet/intel/i40e/i40e_main.c:12223:12: error: 'i40e_resume' defined but not used [-Werror=unused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:12185:12: error: 'i40e_suspend' defined but not used [-Werror=unused-function]
It's easier to use __maybe_unused attributes here, since you
can't pick the wrong one.
Fixes: 0e5d3da400 ("i40e: use newer generic PM support instead of legacy PM callbacks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
bpf_getsockopt bpf call sets the ret variable to zero and
never changes it. What's worse in case CONFIG_INET is
not selected the variable is completely unused generating
a warning.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several conflicts here.
NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
nfp_fl_output() needed some adjustments because the code block is in
an else block now.
Parallel additions to net/pkt_cls.h and net/sch_generic.h
A bug fix in __tcp_retransmit_skb() conflicted with some of
the rbtree changes in net-next.
The tc action RCU callback fixes in 'net' had some overlap with some
of the recent tcf_block reworking.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix route leak in xfrm_bundle_create().
2) In mac80211, validate user rate mask before configuring it. From
Johannes Berg.
3) Properly enforce memory limits in fair queueing code, from Toke
Hoiland-Jorgensen.
4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.
5) Fix TSO header allocation and management in mvpp2 driver, from Yan
Markman.
6) Don't take socket lock in BH handler in strparser code, from Tom
Herbert.
7) Don't show sockets from other namespaces in AF_UNIX code, from
Andrei Vagin.
8) Fix double free in error path of tap_open(), from Girish Moodalbail.
9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
and Alexander Duyck.
10) Fix DCB mode programming in stmmac driver, from Jose Abreu.
11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
Long.
12) Properly align SKB head before building SKB in tuntap, from Jason
Wang.
13) Avoid matching qdiscs with a zero handle during lookups, from Cong
Wang.
14) Fix various endianness bugs in sctp, from Xin Long.
15) Fix tc filter callback races and add selftests which trigger the
problem, from Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
selftests: Introduce a new test case to tc testsuite
selftests: Introduce a new script to generate tc batch file
net_sched: fix call_rcu() race on act_sample module removal
net_sched: add rtnl assertion to tcf_exts_destroy()
net_sched: use tcf_queue_work() in tcindex filter
net_sched: use tcf_queue_work() in rsvp filter
net_sched: use tcf_queue_work() in route filter
net_sched: use tcf_queue_work() in u32 filter
net_sched: use tcf_queue_work() in matchall filter
net_sched: use tcf_queue_work() in fw filter
net_sched: use tcf_queue_work() in flower filter
net_sched: use tcf_queue_work() in flow filter
net_sched: use tcf_queue_work() in cgroup filter
net_sched: use tcf_queue_work() in bpf filter
net_sched: use tcf_queue_work() in basic filter
net_sched: introduce a workqueue for RCU callbacks of tc filter
sctp: fix some type cast warnings introduced since very beginning
sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
sctp: fix some type cast warnings introduced by transport rhashtable
sctp: fix some type cast warnings introduced by stream reconf
...
Cong Wang says:
====================
net_sched: fix races with RCU callbacks
Recently, the RCU callbacks used in TC filters and TC actions keep
drawing my attention, they introduce at least 4 race condition bugs:
1. A simple one fixed by Daniel:
commit c78e1746d3
Author: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed May 20 17:13:33 2015 +0200
net: sched: fix call_rcu() race on classifier module unloads
2. A very nasty one fixed by me:
commit 1697c4bb52
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon Sep 11 16:33:32 2017 -0700
net_sched: carefully handle tcf_block_put()
3. Two more bugs found by Chris:
https://patchwork.ozlabs.org/patch/826696/https://patchwork.ozlabs.org/patch/826695/
Usually RCU callbacks are simple, however for TC filters and actions,
they are complex because at least TC actions could be destroyed
together with the TC filter in one callback. And RCU callbacks are
invoked in BH context, without locking they are parallel too. All of
these contribute to the cause of these nasty bugs.
Alternatively, we could also:
a) Introduce a spinlock to serialize these RCU callbacks. But as I
said in commit 1697c4bb52 ("net_sched: carefully handle
tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
Potentially we need to do a lot of work to make it possible (if not
impossible).
b) Just get rid of these RCU callbacks, because they are not
necessary at all, callers of these call_rcu() are all on slow paths
and holding RTNL lock, so blocking is allowed in their contexts.
However, David and Eric dislike adding synchronize_rcu() here.
As suggested by Paul, we could defer the work to a workqueue and
gain the permission of holding RTNL again without any performance
impact, however, in tcf_block_put() we could have a deadlock when
flushing workqueue while hodling RTNL lock, the trick here is to
defer the work itself in workqueue and make it queued after all
other works so that we keep the same ordering to avoid any
use-after-free. Please see the first patch for details.
Patch 1 introduces the infrastructure, patch 2~12 move each
tc filter to the new tc filter workqueue, patch 13 adds
an assertion to catch potential bugs like this, patch 14
closes another rcu callback race, patch 15 and patch 16 add
new test cases.
====================
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this patchset, we fixed a tc bug. This patch adds the test case
that reproduces the bug. To run this test case, user should specify
an existing NIC device:
# sudo ./tdc.py -d enp4s0f0
This test case belongs to category "flower". If user doesn't specify
a NIC device, the test cases belong to "flower" will not be run.
In this test case, we create 1M filters and all filters share the same
action. When destroying all filters, kernel should not panic. It takes
about 18s to run it.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# ./tdc_batch.py -h
usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file
TC batch file generator
positional arguments:
device device name
file batch file name
optional arguments:
-h, --help show this help message and exit
-n NUMBER, --number NUMBER
how many lines in batch file
-o, --skip_sw skip_sw (offload), by default skip_hw
-s, --share_action all filters share the same action
-p, --prio all filters have different prio
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a dedicated workqueue for tc filters
so that each tc filter's RCU callback could defer their
action destroy work to this workqueue. The helper
tcf_queue_work() is introduced for them to use.
Because we hold RTNL lock when calling tcf_block_put(), we
can not simply flush works inside it, therefore we have to
defer it again to this workqueue and make sure all flying RCU
callbacks have already queued their work before this one, in
other words, to ensure this is the last one to execute to
prevent any use-after-free.
On the other hand, this makes tcf_block_put() ugly and
harder to understand. Since David and Eric strongly dislike
adding synchronize_rcu(), this is probably the only
solution that could make everyone happy.
Please also see the code comments below.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar says:
====================
add 'private' and 'vepa' attributes to ipvlan modes
IPvlan has always been operating in bridge-mode for its supported modes i.e.
if the packets are destined to the adjacent neighbor dev, then IPvlan driver
will switch the packet internally without needing the packets to hit the
wire or get routed. However, there are situations where this bridge-mode is
not needed. e.g. two private processes running inside two namespaces which
are having one IPvlan slave each for its namespace but sharing the master. These
processes should reach the outside world through the master device but at
the same time the bridge function should not work. Currently that's not
possible hence the private attribute for the selected mode comes in play.
VEPA or 802.1Qbg on the other hand has limited appeal with IPvlan since IPvlan
uses the mac-address of the lower device. So packets that are destined to
the adjacent neighbor slave-dev will have same src and dest mac. When these
packets reach the external switch/router, they will send you the redirect
message which the host will have to deal with. Having said that this attribute
will have appeal in debugging as IPvlan will not switch / short-circuit
packets internally. e.g. using VEPA mode with lower-device in loopback mode
will avoid some complicated set-ups that use non-local-bind with some route
jugglery.
This patch-set implements these attributes for the existing modes that
IPvlan has. Please see individual patches for their detailed implementation.
A subsequent ip-utils patch is needed and will be sent soon.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is very similar to the Macvlan VEPA mode, however, there is some
difference. IPvlan uses the mac-address of the lower device, so the VEPA
mode has implications of ICMP-redirects for packets destined for its
immediate neighbors sharing same master since the packets will have same
source and dest mac. The external switch/router will send redirect msg.
Having said that, this will be useful tool in terms of debugging
since IPvlan will not switch packets within its slaves and rely completely
on the external entity as intended in 802.1Qbg.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPvlan has always operated in bridge mode. However there are scenarios
where each slave should be able to talk through the master device but
not necessarily across each other. Think of an environment where each
of a namespace is a private and independant customer. In this scenario
the machine which is hosting these namespaces neither want to tell who
their neighbor is nor the individual namespaces care to talk to neighbor
on short-circuited network path.
This patch implements the mode that is very similar to the 'private' mode
in macvlan where individual slaves can send and receive traffic through
the master device, just that they can not talk among slave devices.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a completion file for bash. The completion function runs bpftool
when needed, making it smart enough to help users complete ids or tags
for eBPF programs and maps currently on the system.
Update Makefile to install completion file to
/usr/share/bash-completion/completions when running `make install`.
Emacs file mode and (at the end) Vim modeline have been added, to keep
the style in use for most existing bash completion files. In this, it
differs from tools/perf/perf-completion.sh, which seems to be the only
other completion file among the kernel sources repository. This is also
valid for indent style: 4-space indents, as in other completion files.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: a bunch of fixes for some sparse warnings
As Eric noticed, when running 'make C=2 M=net/sctp/', a plenty of
warnings or errors checked by sparse appear. They are all problems
about Endian and type cast.
Most of them are just warnings by which no issues could be caused
while some might be bugs.
This patchset fixes them with four patches basically according to
how they are introduced.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These warnings were found by running 'make C=2 M=net/sctp/'.
They are there since very beginning.
Note after this patch, there still one warning left in
sctp_outq_flush():
sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)
Since it has been moved to sctp_stream_outq_migrate on net-next,
to avoid the extra job when merging net-next to net, I will post
the fix for it after the merging is done.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These warnings were found by running 'make C=2 M=net/sctp/'.
Commit d4d6fb5787 ("sctp: Try not to change a_rwnd when faking a
SACK from SHUTDOWN.") expected to use the peers old rwnd and add
our flight size to the a_rwnd. But with the wrong Endian, it may
not work as well as expected.
So fix it by converting to the right value.
Fixes: d4d6fb5787 ("sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These warnings were found by running 'make C=2 M=net/sctp/'.
They are introduced by not aware of Endian for the port when
coding transport rhashtable patches.
Fixes: 7fda702f93 ("sctp: use new rhlist interface on sctp transport rhashtable")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>