Pull drm fixes from Dave Airlie:
"Last week of fixes, just amdgpu and i915 collections. We had an i915
regression reported by HJ Lu this morning, and this contains
a fix for it that he has tested.
There are a fair few other fixes, but they are spread across the two
drivers, and all fairly self contained.
amdgpu:
- Fan fix for CI asics
- Fix a warning in possible_crtcs
- Build fix for when debugfs is disabled
- Display overflow fix
- Display watermark fixes for Renoir
- SDMA 5.2 fix
- Stolen vga memory regression fix
- Power profile fixes
- Fix a regression from removal of GEM and PRIME callbacks
amdkfd:
- Fix a memory leak in dmabuf import
i915:
- rc7 regression fix for modesetting
- vdsc/dp slice fixes
- gen9 mocs entries fix
- preemption timeout fix
- unsigned compare against 0 fix
- selftest fix
- submission error propagation fix
- request flow suspend fix"
* tag 'drm-fixes-2020-12-11' of git://anongit.freedesktop.org/drm/drm:
drm/i915/display: Go softly softly on initial modeset failure
drm/amd/pm: typo fix (CUSTOM -> COMPUTE)
drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs
drm/amdgpu: fix size calculation with stolen vga memory
drm/amd/pm: update smu10.h WORKLOAD_PPLIB setting for raven
drm/amdkfd: Fix leak in dmabuf import
drm/amdgpu: fix sdma instance fw version and feature version init
drm/amd/display: Add wm table for Renoir
drm/amd/display: Prevent bandwidth overflow
drm/amdgpu: fix debugfs creation/removal, again
drm/amdgpu/disply: set num_crtc earlier
drm/amdgpu/powerplay: parse fan table for CI asics
drm/i915/gt: Declare gen9 has 64 mocs entries!
drm/i915/display/dp: Compute the correct slice count for VDSC on DP
drm/i915: fix size_t greater or equal to zero comparison
drm/i915/gt: Cancel the preemption timeout on responding to it
drm/i915/gt: Ignore repeated attempts to suspend request flow across reset
drm/i915/gem: Propagate error from cancelled submit due to context closure
drm/i915/gem: Check the correct variable in selftest
Pull ktest fix from Steven Rostedt:
"Fix issues with grub2bls in ktest.pl
ktest.pl did not know about grub2bls that was introduced in Fedora 30,
and now it does"
* tag 'ktest-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Fix incorrect reboot for grub2bls
Pull powerpc fix from Michael Ellerman:
"One commit to implement copy_from_kernel_nofault_allowed(), otherwise
copy_from_kernel_nofault() can trigger warnings when accessing bad
addresses in some configurations.
Thanks to Christophe Leroy and Qian Cai"
* tag 'powerpc-5.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed()
Pull namespaced fscaps fix from James Morris:
"Fix namespaced fscaps when !CONFIG_SECURITY (Serge Hallyn)"
* tag 'fixes-v5.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
[SECURITY] fix namespaced fscaps when !CONFIG_SECURITY
Fixes for VDSC/DP, selftests, shmem_utils, preemption, submission, and gt reset:
- Check the correct variable in selftest (Dan)
- Propagate error from canceled submit due to context closure (Chris)
- Ignore repeated attempts to suspend request flow across reset (Chris)
- Cancel the preemption timeout on responding to it (Chris)
- Fix unsigned compared against 0 (Colin)
- Compute the correct slice count for VDSC on DP (Manasi)
- Declare gen9 has 64 mocs entries (Chris)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201209235010.GA10554@intel.com
amd-drm-fixes-5.10-2020-12-09:
amdgpu:
- Fan fix for CI asics
- Fix a warning in possible_crtcs
- Build fix for when debugfs is disabled
- Display overflow fix
- Display watermark fixes for Renoir
- SDMA 5.2 fix
- Stolen vga memory regression fix
- Power profile fixes
- Fix a regression from removal of GEM and PRIME callbacks
amdkfd:
- Fix a memory leak in dmabuf import
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201210034848.18108-1-alexander.deucher@amd.com
Pull NFS client fixes from Anna Schumaker:
"Here are a handful more bugfixes for 5.10.
Unfortunately, we found some problems with the new READ_PLUS operation
that aren't easy to fix. We've decided to disable this codepath
through a Kconfig option for now, but a series of patches going into
5.11 will clean up the code and fix the issues at the same time. This
seemed like the best way to go about it.
Summary:
- Fix array overflow when flexfiles mirroring is enabled
- Fix rpcrdma_inline_fixup() crash with new LISTXATTRS
- Fix 5 second delay when doing inter-server copy
- Disable READ_PLUS by default"
* tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Disable READ_PLUS by default
NFSv4.2: Fix 5 seconds delay when doing inter server copy
NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation
pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled
Pull networking fixes from David Miller:
1) IPsec compat fixes, from Dmitry Safonov.
2) Fix memory leak in xfrm_user_policy(). Fix from Yu Kuai.
3) Fix polling in xsk sockets by using sk_poll_wait() instead of
datagram_poll() which keys off of sk_wmem_alloc and such which xsk
sockets do not update. From Xuan Zhuo.
4) Missing init of rekey_data in cfg80211, from Sara Sharon.
5) Fix destroy of timer before init, from Davide Caratti.
6) Missing CRYPTO_CRC32 selects in ethernet driver Kconfigs, from Arnd
Bergmann.
7) Missing error return in rtm_to_fib_config() switch case, from Zhang
Changzhong.
8) Fix some src/dest address handling in vrf and add a testcase. From
Stephen Suryaputra.
9) Fix multicast handling in Seville switches driven by mscc-ocelot
driver. From Vladimir Oltean.
10) Fix proto value passed to skb delivery demux in udp, from Xin Long.
11) HW pkt counters not reported correctly in enetc driver, from Claudiu
Manoil.
12) Fix deadlock in bridge, from Joseph Huang.
13) Missing of_node_put() in dpaa2 driver, from Christophe JAILLET.
14) Fix pid fetching in bpftool when there are a lot of results, from
Andrii Nakryiko.
15) Fix long timeouts in nft_dynset, from Pablo Neira Ayuso.
16) Various stmmac fixes, from Fugang Duan.
17) Fix null deref in tipc, from Cengiz Can.
18) When mss is big, choose more reasonable rcvq_space in tcp, from Eric
Dumazet.
19) Revert a geneve change that likely isn't necessary, from Jakub
Kicinski.
20) Avoid premature rx buffer reuse in various Intel drivers, from Björn
Töpel.
21) Retain ECT bits during TOS reflection in tcp, from Wei Wang.
22) Fix TSO deferral wrt. cwnd limiting in tcp, from Neal Cardwell.
23) MPLS_OPT_LSE_LABEL attribute is 32, not 8 bits, from Guillaume Nault.
24) Fix propagation of 32-bit signed bounds in bpf verifier and add test
cases, from Alexei Starovoitov.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
selftests: fix poll error in udpgro.sh
selftests/bpf: Fix "dubious pointer arithmetic" test
selftests/bpf: Fix array access with signed variable test
selftests/bpf: Add test for signed 32-bit bound check bug
bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds.
MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver
net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower
net/mlx4_en: Handle TX error CQE
net/mlx4_en: Avoid scheduling restart task if it is already running
tcp: fix cwnd-limited bug for TSO deferral where we send nothing
net: flow_offload: Fix memory leak for indirect flow block
tcp: Retain ECT bits for tos reflection
ethtool: fix stack overflow in ethnl_parse_bitset()
e1000e: fix S0ix flow to allow S0i3.2 subset entry
ice: avoid premature Rx buffer reuse
ixgbe: avoid premature Rx buffer reuse
i40e: avoid premature Rx buffer reuse
igb: avoid transmit queue timeout in xdp path
igb: use xdp_do_flush
igb: skb add metasize for xdp
...
Alexei Starovoitov says:
====================
pull-request: bpf 2020-12-10
The following pull-request contains BPF updates for your *net* tree.
We've added 21 non-merge commits during the last 12 day(s) which contain
a total of 21 files changed, 163 insertions(+), 88 deletions(-).
The main changes are:
1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei.
2) Fix ring_buffer__poll() return value, from Andrii.
3) Fix race in lwt_bpf, from Cong.
4) Fix test_offload, from Toke.
5) Various xsk fixes.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Thanks a lot!
Also thanks to reporters, reviewers and testers of commits in this pull-request:
Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John
Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We've been seeing failures with xfstests generic/091 and generic/263
when using READ_PLUS. I've made some progress on these issues, and the
tests fail later on but still don't pass. Let's disable READ_PLUS by
default until we can work out what is going on.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Since commit b4868b44c5 ("NFSv4: Wait for stateid updates after
CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5
seconds delay regardless of the size of the copy. The delay is from
nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential
fails because the seqid in both nfs4_state and nfs4_stateid are 0.
Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state,
until after the call to update_open_stateid, to indicate this is the 1st
open. This fix is part of a 2-patch series; the other patch is the fix in the
source server to return the stateid for COPY_NOTIFY request with seqid 1
instead of 0.
Fixes: ce0887ac96 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
By switching to an XFS-backed export, I am able to reproduce the
ibcomp worker crash on my client with xfstests generic/013.
For the failing LISTXATTRS operation, xdr_inline_pages() is called
with page_len=12 and buflen=128.
- When ->send_request() is called, rpcrdma_marshal_req() does not
set up a Reply chunk because buflen is smaller than the inline
threshold. Thus rpcrdma_convert_iovs() does not get invoked at
all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked
on the receive buffer.
- During reply processing, rpcrdma_inline_fixup() tries to copy
received data into rq_rcv_buf->pages because page_len is positive.
But there are no receive pages because rpcrdma_marshal_req() never
allocated them.
The result is that the ibcomp worker faults and dies. Sometimes that
causes a visible crash, and sometimes it results in a transport hang
without other symptoms.
RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
should eventually be fixed or replaced. However, my preference is
that upper-layer operations should explicitly allocate their receive
buffers (using GFP_KERNEL) when possible, rather than relying on
XDRBUF_SPARSE_PAGES.
Reported-by: Olga kornievskaia <kolga@netapp.com>
Suggested-by: Olga kornievskaia <kolga@netapp.com>
Fixes: c10a75145f ("NFSv4.2: add the extended attribute proc functions.")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga kornievskaia <kolga@netapp.com>
Reviewed-by: Frank van der Linden <fllinden@amazon.com>
Tested-by: Olga kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The test program udpgso_bench_rx always invokes the poll()
syscall with a timeout of 10ms. If a larger timeout is specified
via the command line, udpgso_bench_rx is supposed to do multiple
poll() calls till the timeout is expired or an event is received.
Currently the poll() loop errors out after the first invocation with
no events, and may cause self-test failures like:
failed
GRO with custom segment size ./udpgso_bench_rx: poll: 0x0 expected 0x1
This change addresses the issue allowing the poll() loop to consume
all the configured timeout.
Fixes: ada641ff6e ("selftests: fixes for UDP GRO")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
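The intended behaviour of the poll() loop described above can be sketched
as follows. This is a minimal userspace illustration, not the code from
udpgso_bench_rx: the function name poll_until() and its parameters are
hypothetical. It keeps issuing short poll() calls until either an event
arrives or the whole configured timeout has been consumed, instead of
giving up after the first call that returns no events.

```c
#include <assert.h>
#include <poll.h>

/*
 * Hypothetical helper: poll in slices of slice_ms until an event is
 * reported or timeout_ms has elapsed (measured in whole slices).
 * Returns > 0 if events are ready, 0 on timeout, -1 on poll() error.
 */
static int poll_until(struct pollfd *fds, nfds_t nfds,
		      int timeout_ms, int slice_ms)
{
	int waited_ms = 0;

	assert(slice_ms > 0 && timeout_ms >= 0);

	do {
		int ret = poll(fds, nfds, slice_ms);

		if (ret != 0)
			return ret;	/* event(s) ready (> 0) or error (-1) */
		waited_ms += slice_ms;	/* no event yet: keep waiting */
	} while (waited_ms < timeout_ms);

	return 0;			/* full timeout consumed, no events */
}
```

A caller wanting a 1 second timeout in 10ms slices would pass
timeout_ms=1000 and slice_ms=10; a return value of 0 then means the full
second passed without an event.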
The verifier trace changed following a bugfix. After checking the 64-bit
sign, only the upper bit mask is known, not bit 31. Update the test
accordingly.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test fails because of a recent fix to the verifier, even though this
program is valid. In detail, what happens is:
7: (61) r1 = *(u32 *)(r0 +0)
Load a 32-bit value, with signed bounds [S32_MIN, S32_MAX]. The bounds
of the 64-bit value are [0, U32_MAX]...
8: (65) if r1 s> 0xffffffff goto pc+1
... therefore this is always true (the operand is sign-extended).
10: (b4) w2 = 11
11: (6d) if r2 s> r1 goto pc+1
When true, the 64-bit bounds become [0, 10]. The 32-bit bounds are still
[S32_MIN, 10].
13: (64) w1 <<= 2
Because this is a 32-bit operation, the verifier propagates the new
32-bit bounds to the 64-bit ones, and the knowledge gained from insn 11
is lost.
14: (0f) r0 += r1
15: (7a) *(u64 *)(r0 +0) = 4
Then the verifier considers r0 unbounded here, rejecting the test. To
make the test work, change insn 8 to check the sign of the 32-bit value.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
After a 32-bit load followed by a branch, the verifier would reduce the
maximum bound of the register to 0x7fffffff, allowing a user to bypass
bound checks. Ensure such a program is rejected.
In the second test, the 64-bit compare should not be sufficient to
determine whether the signed 32-bit lower bound is 0, so the verifier
should reject the second branch.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The 64-bit signed bounds should not affect 32-bit signed bounds unless the
verifier knows that upper 32-bits are either all 1s or all 0s. For example the
register with smin_value==1 doesn't mean that s32_min_value is also equal to 1,
since smax_value could be larger than 32-bit subregister can hold.
The verifier refines the smax/s32_max return value from certain helpers in
do_refine_retval_range(). Teach the verifier to recognize that smin/s32_min
value is also bounded. When both smin and smax bounds fit into 32-bit
subregister the verifier can propagate those bounds.
Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
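The propagation rule from the bpf verifier fix above can be illustrated
with a standalone sketch. It is not the verifier's code: struct bounds,
fits_s32() and propagate_s32_bounds() are hypothetical names, and the
register state is reduced to the four signed bounds that matter here. The
point it shows is that the 64-bit signed bounds may only refine the
32-bit ones when both smin and smax fit into the 32-bit subregister.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified register state. */
struct bounds {
	int64_t smin, smax;		/* signed 64-bit bounds */
	int32_t s32_min, s32_max;	/* signed 32-bit subregister bounds */
};

/* True if v is unchanged by truncation to a signed 32-bit value. */
static bool fits_s32(int64_t v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}

/*
 * Refine the 32-bit bounds from the 64-bit ones only when BOTH ends of
 * the 64-bit range fit in 32 bits. Otherwise the low 32 bits can take
 * any value, so the existing 32-bit bounds are left untouched.
 * Returns true if the bounds were propagated.
 */
static bool propagate_s32_bounds(struct bounds *b)
{
	assert(b->smin <= b->smax);

	if (!fits_s32(b->smin) || !fits_s32(b->smax))
		return false;

	if ((int32_t)b->smin > b->s32_min)
		b->s32_min = (int32_t)b->smin;
	if ((int32_t)b->smax < b->s32_max)
		b->s32_max = (int32_t)b->smax;
	return true;
}
```

With smin == 1 and smax == INT64_MAX the function refuses to propagate,
which matches the example in the commit message: smin_value == 1 alone
says nothing about s32_min_value.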
Pull rdma fixes from Jason Gunthorpe:
"Two user triggerable crashers and some EFA related regressions:
- Syzkaller found a bug in CM
- Restore access to the GID table and fix modify_qp for EFA
- Crasher in qedr"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewait
RDMA/core: Fix empty gid table for non IB/RoCE devices
RDMA/efa: Use the correct current and new states in modify QP
RDMA/qedr: iWARP invalid(zero) doorbell address fix
Pull media fixes from Mauro Carvalho Chehab:
"A couple of fixes:
- videobuf2: fix a DMABUF bug that prevented it from properly handling
cache sync/flush
- vidtv: a use after free and a few sparse/smatch warning fixes
- pulse8-cec: a duplicate free and a bug related to new firmware
usage
- mtk-cir: fix a regression on a clock setting"
* tag 'media/v5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vidtv: fix some warnings
media: vidtv: fix kernel-doc markups
media: [next] media: vidtv: fix a read from an object after it has been freed
media: vb2: set cache sync hints when init buffers
media: pulse8-cec: add support for FW v10 and up
media: pulse8-cec: fix duplicate free at disconnect or probe error
media: mtk-cir: fix calculation of chk period
Add maintainers info for new Marvell Prestera Ethernet switch driver.
Signed-off-by: Mickey Rachamim <mickeyr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Switch to RCU in x_tables to fix possible NULL pointer dereference,
from Subash Abhinov Kasiviswanathan.
2) Fix netlink dump of dynset timeouts later than 23 days.
3) Add comment for the indirect serialization of the nft commit mutex
with rtnl_mutex.
4) Remove bogus check for confirmed conntrack when matching on the
conntrack ID, from Brett Mastbergen.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-12-09
This series contains updates to igb, ixgbe, i40e, and ice drivers.
Sven Auhagen fixes issues with igb XDP: return correct error value in XDP
xmit back, increase header padding to include space for double VLAN, add
an extack error when Rx buffer is too small for frame size, set metasize if
it is set in xdp, change xdp_do_flush_map to xdp_do_flush, and update
trans_start to avoid possible Tx timeout.
Björn fixes an issue where an Rx buffer can be reused prematurely with
XDP redirect for ixgbe, i40e, and ice drivers.
The following are changes since commit 323a391a22:
can: isotp: isotp_setsockopt(): block setsockopt on bound sockets
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan says:
====================
mlx4_en fixes
This patchset by Moshe contains fixes to the mlx4 Eth driver,
addressing issues in restart flow.
Patch 1 protects the restart task from being rescheduled while active.
Please queue for -stable >= v2.6.
Patch 2 reconstructs SQs stuck in error state, and adds prints for improved
debuggability.
Please queue for -stable >= v3.12.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In case error CQE was found while polling TX CQ, the QP is in error
state and all posted WQEs will generate error CQEs without any data
transmitted. Fix it by reopening the channels, via same method used for
TX timeout handling.
In addition add some more info on error CQE and WQE for debug.
Fixes: bd2f631d7c ("net/mlx4_en: Notify user when TX ring in error state")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add restarting state flag to avoid scheduling another restart task while
such task is already running. Change task name from watchdog_task to
restart_task to better fit the task role.
Fixes: 1e338db56e ("mlx4_en: Fix a race at restart task")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
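The "restarting" state flag described above can be sketched in userspace
with C11 atomics. This is an illustration under stated assumptions, not
the mlx4_en code: struct restart_ctx and the three functions are
hypothetical, and the scheduled counter merely stands in for queueing
work. The atomic test-and-set ensures that only the first caller queues
the task until the running task clears the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context; "scheduled" stands in for queueing real work. */
struct restart_ctx {
	atomic_flag restarting;	/* set while a restart is queued or running */
	int scheduled;		/* how many times the task was queued */
};

static void restart_ctx_init(struct restart_ctx *ctx)
{
	assert(ctx != NULL);
	atomic_flag_clear(&ctx->restarting);
	ctx->scheduled = 0;
}

/* Queue the restart task unless one is already pending or running. */
static bool schedule_restart(struct restart_ctx *ctx)
{
	if (atomic_flag_test_and_set(&ctx->restarting))
		return false;	/* already restarting: do not queue again */
	ctx->scheduled++;
	return true;
}

/* Called by the restart task when it has finished. */
static void restart_done(struct restart_ctx *ctx)
{
	atomic_flag_clear(&ctx->restarting);
}
```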
When cwnd is not a multiple of the TSO skb size of N*MSS, we can get
into persistent scenarios where we have the following sequence:
(1) ACK for full-sized skb of N*MSS arrives
-> tcp_write_xmit() transmit full-sized skb with N*MSS
-> move pacing release time forward
-> exit tcp_write_xmit() because pacing time is in the future
(2) TSQ callback or TCP internal pacing timer fires
-> try to transmit next skb, but TSO deferral finds remainder of
available cwnd is not big enough to trigger an immediate send
now, so we defer sending until the next ACK.
(3) repeat...
So we can get into a case where we never mark ourselves as
cwnd-limited for many seconds at a time, even with
bulk/infinite-backlog senders, because:
o In case (1) above, every time in tcp_write_xmit() we have enough
cwnd to send a full-sized skb, we are not fully using the cwnd
(because cwnd is not a multiple of the TSO skb size). So every time we
send data, we are not cwnd limited, and so in the cwnd-limited
tracking code in tcp_cwnd_validate() we mark ourselves as not
cwnd-limited.
o In case (2) above, every time in tcp_write_xmit() that we try to
transmit the "remainder" of the cwnd but defer, we set the local
variable is_cwnd_limited to true, but we do not send any packets, so
sent_pkts is zero, so we don't call the cwnd-limited logic to update
tp->is_cwnd_limited.
Fixes: ca8a226343 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Reported-by: Ingemar Johansson <ingemar.s.johansson@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201209035759.1225145-1-ncardwell.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The offending commit introduces a cleanup callback that is invoked
when the driver module is removed to clean up the tunnel device
flow block. But it returns on the first iteration of the for loop.
The remaining indirect flow blocks will never be freed.
Fixes: 1fac52da59 ("net: flow_offload: consolidate indirect flow_block infrastructure")
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
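The leak pattern described above, a cleanup loop that returns on its first
matching entry, can be shown with a small sketch. The types and function
names here are hypothetical and use a plain array rather than the kernel's
list of indirect flow blocks; the sketch only contrasts the early return
with a loop that visits every matching entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an indirect flow block owned by a driver. */
struct flow_block {
	int owner;
	bool freed;
};

/* Buggy variant: releases the first matching block, then returns. */
static size_t cleanup_buggy(struct flow_block *blocks, size_t n, int owner)
{
	assert(blocks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (blocks[i].owner == owner && !blocks[i].freed) {
			blocks[i].freed = true;
			return 1;	/* remaining matches are leaked */
		}
	}
	return 0;
}

/* Fixed variant: keeps iterating so every matching block is released. */
static size_t cleanup_fixed(struct flow_block *blocks, size_t n, int owner)
{
	size_t released = 0;

	assert(blocks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (blocks[i].owner == owner && !blocks[i].freed) {
			blocks[i].freed = true;
			released++;
		}
	}
	return released;
}
```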
For DCTCP, we have to retain the ECT bits set by the congestion control
algorithm on the socket when reflecting syn TOS in syn-ack, in order to
make ECN work properly.
Fixes: ac8f1710c1 ("tcp: reflect tos value received in SYN to the socket")
Reported-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The page recycle code incorrectly relied on the assumption that a page
fragment could not be freed inside xdp_do_redirect(). As a result, page
fragments that are still used by the stack/XDP redirect can be
reused and overwritten.
To avoid this, store the page count prior to invoking xdp_do_redirect().
Fixes: efc2214b60 ("ice: Add support for XDP")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The page recycle code incorrectly relied on the assumption that a page
fragment could not be freed inside xdp_do_redirect(). As a result, page
fragments that are still used by the stack/XDP redirect can be
reused and overwritten.
To avoid this, store the page count prior to invoking xdp_do_redirect().
Fixes: 6453073987 ("ixgbe: add initial support for xdp redirect")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The page recycle code incorrectly relied on the assumption that a page
fragment could not be freed inside xdp_do_redirect(). As a result, page
fragments that are still used by the stack/XDP redirect can be
reused and overwritten.
To avoid this, store the page count prior to invoking xdp_do_redirect().
Longer explanation:
Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.
t0: Page is allocated, and put on the Rx ring
+---------------
used by NIC ->| upper buffer
(rx_buffer) +---------------
| lower buffer
+---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX
t1: Buffer is received, and passed to the stack (e.g.)
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 1
t2: Buffer is received, and redirected
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
Now, prior to calling xdp_do_redirect():
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 2
This means that buffer *cannot* be flipped/reused, because the skb is
still using it.
The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
page count == USHRT_MAX - 1
rx_buffer->pagecnt_bias == USHRT_MAX - 2
From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!
To work around this, the page count is stored prior to calling
xdp_do_redirect().
Note that this is not optimal, since the NIC could actually reuse the
"lower buffer" again. However, then we need to track whether
XDP_REDIRECT consumed the buffer or not.
Fixes: d9314c474d ("i40e: add support for XDP_REDIRECT")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
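The counting argument in the longer explanation above can be replayed with
a small model. This is a sketch, not driver code: struct rx_buffer here is
a hypothetical two-field stand-in, redirect_and_free() merely models
xdp_do_redirect() dropping a page reference, and the reuse rule (at most
one reference beyond the driver's bias) is an assumption chosen to match
the numbers given in the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the driver's view of one Rx page. */
struct rx_buffer {
	unsigned int page_count;	/* stands in for the page refcount */
	unsigned int pagecnt_bias;	/* references still held by the driver */
};

/* Assumed rule: reusable if at most one reference is held elsewhere. */
static bool can_reuse(unsigned int page_count, const struct rx_buffer *buf)
{
	assert(page_count >= buf->pagecnt_bias);
	return page_count - buf->pagecnt_bias <= 1;
}

/* Models xdp_do_redirect() consuming and freeing the fragment. */
static void redirect_and_free(struct rx_buffer *buf)
{
	buf->page_count--;
}

/* Buggy flow: the page count is read after the redirect. */
static bool reuse_after_redirect_buggy(struct rx_buffer *buf)
{
	redirect_and_free(buf);
	return can_reuse(buf->page_count, buf);
}

/* Fixed flow: the page count is stored before the redirect. */
static bool reuse_after_redirect_fixed(struct rx_buffer *buf)
{
	unsigned int pgcnt = buf->page_count;

	redirect_and_free(buf);
	return can_reuse(pgcnt, buf);
}
```

Starting from the t2 state (page count USHRT_MAX, bias USHRT_MAX - 2),
the buggy flow reports the buffer as reusable while the skb still owns
half of the page, and the fixed flow does not.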
Since we share the transmit queue with the network stack,
it is possible that we run into a transmit queue timeout.
This will reset the queue.
This happens under high load when XDP is using the
transmit queue pretty much exclusively.
netdev_start_xmit() sets the trans_start variable of the
transmit queue to jiffies which is later utilized by dev_watchdog(),
so to avoid timeout, let stack know that XDP xmit happened by
bumping the trans_start within XDP Tx routines to jiffies.
Fixes: 9cbc948b5a ("igb: add XDP support")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull ARM SoC fixes from Arnd Bergmann:
"There are a few more PHY mode changes for Allwinner SoC based boards
with a Realtek PHY after the driver changed its behavior; I assume
there will be more of these in the future. Also for Allwinner, the
Banana Pi M2 board had a regression that led to some devices not
working because of a slightly incorrect voltage being applied.
By popular demand, I picked up a change from Krzysztof Kozlowski to
actually list the SoC tree in the MAINTAINERS file. We don't want to
get Cc'd on normal patches that are picked up by platform maintainers,
but the lack of an entry has led to confusion in the past.
All the other changes are fairly benign, fixing boot-time or
compile-time warning messages in various places:
- A dtc warning on the OLPC XO-1.75
- A boot-time warning on i.MX6 wandboard
- A harmless compile-time warning
- A regression causing one of the i.MX6 SoCs to be identified as
another
- Missing SoC identification of Allwinner V3 and S3"
* tag 'arm-soc-fixes-v5.10-4b' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
firmware: xilinx: Mark pm_api_features_map with static keyword
ARM: dts: mmp2-olpc-xo-1-75: clear the warnings when make dtbs
MAINTAINERS: add a limited ARM and ARM64 SoC entry
MAINTAINERS: correct SoC Git address (formerly: arm-soc)
ARM: keystone: remove SECTION_SIZE_BITS/MAX_PHYSMEM_BITS
arm64: dts: allwinner: H5: NanoPi Neo Plus2: phy-mode rgmii-id
arm64: dts: allwinner: A64 Sopine: phy-mode rgmii-id
ARM: dts: imx6qdl-kontron-samx6i: fix I2C_PM scl pin
ARM: dts: imx6qdl-wandboard-revd1: Remove PAD_GPIO_6 from enetgrp
ARM: imx: Use correct SRC base address
ARM: dts: sun7i: pcduino3-nano: enable RGMII RX/TX delay on PHY
ARM: dts: sun8i: v3s: fix GIC node memory range
ARM: dts: sun8i: v40: bananapi-m2-berry: Fix ethernet node
ARM: dts: sun8i: r40: bananapi-m2-berry: Fix dcdc1 regulator
ARM: dts: sun7i: bananapi: Enable RGMII RX/TX delay on Ethernet PHY
ARM: dts: s3: pinecube: align compatible property to other S3 boards
ARM: sunxi: Add machine match for the Allwinner V3 SoC
arm64: dts: allwinner: h6: orangepi-one-plus: Fix ethernet
The check_spi_bus_bridge() in scripts/dtc/checks.c requires that a node
with the "spi-slave" property must also have "#address-cells = <0>" and
"#size-cells = <0>". But currently both "#address-cells" and "#size-cells"
properties are deleted, so the corresponding default values of 2 and 1
apply. As a result, the check fails and the warnings below are displayed.
arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): \
/soc/apb@d4000000/spi@d4037000: incorrect #address-cells for SPI bus
also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): \
/soc/apb@d4000000/spi@d4037000: incorrect #size-cells for SPI bus
also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
arch/arm/boot/dts/mmp2-olpc-xo-1-75.dtb: Warning (spi_bus_reg): \
Failed prerequisite 'spi_bus_bridge'
Because the value of "#size-cells" is already defined as zero in the node
"ssp3: spi@d4037000" in arch/arm/boot/dts/mmp2.dtsi, we only need to
explicitly add "#address-cells = <0>" and leave "#size-cells" unchanged.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20201207084752.1665-2-thunder.leizhen@huawei.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull iommu fix from Will Deacon:
"Fix interrupt table length definition for AMD IOMMU.
It's actually a fix for a fix, where the size of the interrupt
remapping table was increased but a related constant for the
size of the interrupt table was forgotten"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
iommu/amd: Set DTE[IntTabLen] to represent 512 IRTEs
Toke Høiland-Jørgensen says:
====================
This series restores the test_offload.py selftest to working order. It seems a
number of subtle behavioural changes have crept into various subsystems which
broke test_offload.py in a number of ways. Most of these are fairly benign
changes where small adjustments to the test script seem to be the best fix,
but one is an actual kernel bug that I've observed in the wild caused by a bad
interaction between xdp_attachment_flags_ok() and the rework of XDP program
handling in the core netdev code.
Patch 1 fixes the bug by removing xdp_attachment_flags_ok(), and the remainder of
the patches are adjustments to test_offload.py, including a new feature for
netdevsim to force a BPF verification fail. Please see the individual patches
for details.
Changelog:
v4:
- Accidentally truncated the Fixes: hashes in patches 3/4 to 11 chars
v3:
- Add Fixes: tags
v2:
- Replace xdp_attachment_flags_ok() with a check in dev_xdp_attach()
- Better packing of struct nsim_dev
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>