If userspace does not include the trailing end of batch message, then
nfnetlink aborts the transaction. This allows to check that ruleset
updates trigger no errors.
After this patch, invoking this command from the prerouting chain:
# nft -c add rule x y fib saddr . oif type local
fails since oif is not supported there.
This patch fixes the lack of rule validation from the abort/check path
to catch configuration errors such as the one above.
Fixes: a654de8fdc ("netfilter: nf_tables: fix chain dependency validation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If netfilter changes the packet mark when mangling, the packet is
rerouted using the route_me_harder set of functions. Prior to this
commit, there's one big difference between route_me_harder and the
ordinary initial routing functions, described in the comment above
__ip_queue_xmit():
/* Note: skb->sk can be different from sk, in case of tunnels */
int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
That function goes on to correctly make use of sk->sk_bound_dev_if,
rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
It will make some transformations to that packet, and then it will send
the encapsulated packet out of a *new* socket. That new socket will
basically always have a different sk_bound_dev_if (otherwise there'd be
a routing loop). So for the purposes of routing the encapsulated packet,
the routing information as it pertains to the socket should come from
that socket's sk, rather than the packet's original skb->sk. For that
reason __ip_queue_xmit() and related functions all do the right thing.
One might argue that all tunnels should just call skb_orphan(skb) before
transmitting the encapsulated packet into the new socket. But tunnels do
*not* do this -- and this is wisely avoided in skb_scrub_packet() too --
because features like TSQ rely on skb->destructor() being called when
that buffer space is truely available again. Calling skb_orphan(skb) too
early would result in buffers filling up unnecessarily and accounting
info being all wrong. Instead, additional routing must take into account
the new sk, just as __ip_queue_xmit() notes.
So, this commit addresses the problem by fishing the correct sk out of
state->sk -- it's already set properly in the call to nf_hook() in
__ip_local_out(), which receives the sk as part of its normal
functionality. So we make sure to plumb state->sk through the various
route_me_harder functions, and then make correct use of it following the
example of __ip_queue_xmit().
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If netfilter changes the packet mark, the packet is rerouted. The
ip_route_me_harder family of functions fails to use the right sk, opting
to instead use skb->sk, resulting in a routing loop when used with
tunnels. With the next change fixing this issue in netfilter, test for
the relevant condition inside our test suite, since wireguard was where
the bug was discovered.
Reported-by: Chen Minqiang <ptpt52@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The netlink report should be sent regardless the available listeners.
Fixes: 84d7fce693 ("netfilter: nf_tables: export rule-set generation ID")
Fixes: 3b49e2e94e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Peter writes:
4 bug fixes for Cadence 3 driver.
The biggest fix is changing endpoint configuration method to
avoid FIFO overflow at multiple endpoints situation.
* tag 'usb-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb:
usb: cdns3: gadget: own the lock wrongly at the suspend routine
usb: cdns3: Fix on-chip memory overflow issue
usb: cdns3: gadget: suspicious implicit sign extension
usb: cdns3: Variable 'length' set but not used
When (for example) an IBSS station is pre-moved to AUTHORIZED
before it's inserted, and then the insertion fails, we don't
clean up the fast RX/TX states that might already have been
created, since we don't go through all the state transitions
again on the way down.
Do that, if it hasn't been done already, when the station is
freed. I considered only freeing the fast TX/RX state there,
but we might add more state so it's more robust to wind down
the state properly.
Note that we warn if the station was ever inserted, it should
have been properly cleaned up in that case, and the driver
will probably not like things happening out of order.
Reported-by: syzbot+2e293dbd67de2836ba42@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20201009141710.7223b322a955.I95bd08b9ad0e039c034927cce0b75beea38e059b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's a race condition in the netdev registration in that
NETDEV_REGISTER actually happens after the netdev is available,
and so if we initialize things only there, we might get called
with an uninitialized wdev through nl80211 - not using a wdev
but using a netdev interface index.
I found this while looking into a syzbot report, but it doesn't
really seem to be related, and unfortunately there's no repro
for it (yet). I can't (yet) explain how it managed to get into
cfg80211_release_pmsr() from nl80211_netlink_notify() without
the wdev having been initialized, as the latter only iterates
the wdevs that are linked into the rdev, which even without the
change here happened after init.
However, looking at this, it seems fairly clear that the init
needs to be done earlier, otherwise we might even re-init on a
netns move, when data might still be pending.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20201009135821.fdcbba3aad65.Ie9201d91dbcb7da32318812effdc1561aeaf4cdc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When ieee80211_skb_resize() is called from ieee80211_build_hdr()
the skb has no 802.11 header yet, in fact it consist only of the
payload as the ethernet frame is removed. As such, we're using
the payload data for ieee80211_is_mgmt(), which is of course
completely wrong. This didn't really hurt us because these are
always data frames, so we could only have added more tailroom
than we needed if we determined it was a management frame and
sdata->crypto_tx_tailroom_needed_cnt was false.
However, syzbot found that of course there need not be any payload,
so we're using at best uninitialized memory for the check.
Fix this to pass explicitly the kind of frame that we have instead
of checking there, by replacing the "bool may_encrypt" argument
with an argument that can carry the three possible states - it's
not going to be encrypted, it's a management frame, or it's a data
frame (and then we check sdata->crypto_tx_tailroom_needed_cnt).
Reported-by: syzbot+32fd1a1bfe355e93f1e2@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20201009132538.e1fd7f802947.I799b288466ea2815f9d4c84349fae697dca2f189@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When sending EAPOL frames via NL80211 they are treated as injected
frames in mac80211. Due to commit 1df2bdba52 ("mac80211: never drop
injected frames even if normally not allowed") these injected frames
were not assigned a sta context in the function ieee80211_tx_dequeue,
causing certain wireless network cards to always send EAPOL frames in
plaintext. This may cause compatibility issues with some clients or
APs, which for instance can cause the group key handshake to fail and
in turn would cause the station to get disconnected.
This commit fixes this regression by assigning a sta context in
ieee80211_tx_dequeue to injected frames as well.
Note that sending EAPOL frames in plaintext is not a security issue
since they contain their own encryption and authentication protection.
Cc: stable@vger.kernel.org
Fixes: 1df2bdba52 ("mac80211: never drop injected frames even if normally not allowed")
Reported-by: Thomas Deutschmann <whissi@gentoo.org>
Tested-by: Christian Hesse <list@eworm.de>
Tested-by: Thomas Deutschmann <whissi@gentoo.org>
Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20201019160113.350912-1-Mathy.Vanhoef@kuleuven.be
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We finalize caps before initializing kvm hyp code, and any use of
cpus_have_const_cap() in kvm hyp code generates redundant and
potentially unsound code to read the cpu_hwcaps array.
A number of helper functions used in both hyp context and regular kernel
context use cpus_have_const_cap(), as some regular kernel code runs
before the capabilities are finalized. It's tedious and error-prone to
write separate copies of these for hyp and non-hyp code.
So that we can avoid the redundant code, let's automatically upgrade
cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context.
With this change, there's never a reason to access to cpu_hwcaps array
from hyp code, and we don't need to create an NVHE alias for this.
This should have no effect on non-hyp code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-4-mark.rutland@arm.com
Currently has_vhe() detects whether it is being compiled for VHE/NVHE
hyp code based on preprocessor definitions, and uses this knowledge to
avoid redundant runtime checks.
There are other cases where we'd like to use this knowledge, so let's
factor the preprocessor checks out into separate helpers.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-2-mark.rutland@arm.com
The call to rcu_cpu_starting() in secondary_start_kernel() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:
WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
Call trace:
dump_backtrace+0x0/0x3c8
show_stack+0x14/0x60
dump_stack+0x14c/0x1c4
lockdep_rcu_suspicious+0x134/0x14c
__lock_acquire+0x1c30/0x2600
lock_acquire+0x274/0xc48
_raw_spin_lock+0xc8/0x140
vprintk_emit+0x90/0x3d0
vprintk_default+0x34/0x40
vprintk_func+0x378/0x590
printk+0xa8/0xd4
__cpuinfo_store_cpu+0x71c/0x868
cpuinfo_store_cpu+0x2c/0xc8
secondary_start_kernel+0x244/0x318
This is avoided by moving the call to rcu_cpu_starting up near the
beginning of the secondary_start_kernel() function.
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Link: https://lore.kernel.org/r/20201028182614.13655-1-cai@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
vdpa_sim generates a ramdom MAC address but it is never used by upper
layers because the VIRTIO_NET_F_MAC bit is not set in the features list.
Because of that, virtio-net always regenerates a random MAC address each
time it is loaded whereas the address should only change on vdpa_sim
load/unload.
Fix that by adding VIRTIO_NET_F_MAC in the features list of vdpa_sim.
Fixes: 2c53d0f64c ("vdpasim: vDPA device simulator")
Cc: jasowang@redhat.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://lore.kernel.org/r/20201029122050.776445-2-lvivier@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
LKP considered variable 'ret' in vhost_vdpa_setup_vq_irq() as
a unused variable, so suggest we remove it. Actually it stores
return value of irq_bypass_register_producer(), but we did not
check it, we should handle the failure case.
This commit will print a message if irq bypass register producer
fail, in this case, vqs still remain functional.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20201023104046.404794-1-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This reverts commit 7ed9e3d97c.
The patch creates a DoS risk since it can result in a high order memory
allocation.
Fixes: 7ed9e3d97c ("vhost-vdpa: fix page pinning leakage in error path")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the system goes to suspend, if the controller is at device mode with
cable connecting to host, the call stack is: cdns3_suspend->
cdns3_gadget_suspend -> cdns3_disconnect_gadget, after cdns3_disconnect_gadget
is called, it owns lock wrongly, it causes the system being deadlock after
resume due to at cdns3_device_thread_irq_handler, it tries to get the lock,
but can't get it forever.
To fix it, we delete the unlock-lock operations at cdns3_disconnect_gadget,
and do it at the caller.
Fixes: b1234e3b3b ("usb: cdns3: add runtime PM support")
Acked-by: Pawel Laszczak <pawell@cadence.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Patch fixes issue caused setting On-chip memory overflow bit in usb_sts
register. The issue occurred because EP_CFG register was set twice
before USB_STS.CFGSTS was set. Every write operation on EP_CFG.BUFFERING
causes that controller increases internal counter holding the number
of reserved on-chip buffers. First time this register was updated in
function cdns3_ep_config before delegating SET_CONFIGURATION request
to class driver and again it was updated when class wanted to enable
endpoint. This patch fixes this issue by configuring endpoints
enabled by class driver in cdns3_gadget_ep_enable and others just
before status stage.
Cc: stable@vger.kernel.org#v5.8+
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Reported-and-tested-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
While I thought I had this correct (since it actually did reject modes
like I expected during testing), Ville Syrjala from Intel pointed out
that the logic here isn't correct. max_clock refers to the max data rate
supported by the DP encoder. So, limiting it to the output of ds_clock (which
refers to the maximum dotclock of the downstream DP device) doesn't make any
sense. Additionally, since we're using the connector's bpc as the canonical BPC
we should use this in mode_valid until we support dynamically setting the bpp
based on bandwidth constraints.
https://lists.freedesktop.org/archives/dri-devel/2020-September/280276.html
For more info.
So, let's rewrite this using Ville's advice.
v2:
* Ville pointed out I mixed up the dotclock and the link rate. So fix that...
* ...and also rename all the variables in this function to be more appropriately
labeled so I stop mixing them up.
* Reuse the bpp from the connector for now until we have dynamic bpp selection.
* Use use DIV_ROUND_UP for calculating the mode rate like i915 does, which we
should also have been doing from the start
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 409d38139b ("drm/nouveau/kms/nv50-: Use downstream DP clock limits for mode validation")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ville also pointed out that I got a lot of the logic here wrong as well, whoops.
While I don't think anyone's likely using 3D output with nouveau, the next patch
will make nouveau_conn_mode_valid() make a lot less sense. So, let's just get
rid of it and open-code it like before, while taking care to move the 3D frame
packing calculations on the dot clock into the right place.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: d6a9efece7 ("drm/nouveau/kms/nv50-: Share DP SST mode_valid() handling with MST")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
With this we try to detect if the endianess switch works and assume LE if
not. Suggested by Ben.
Fixes: 51c05340e4 ("drm/nouveau/device: detect if changing endianness failed")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
we can't use nouveau_bo_ref here as no ttm object was allocated and
nouveau_bo_ref mainly deals with that. Simply deallocate the object.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Not entirely sure why this never came up when I originally tested this
(maybe some BIOSes already have this setup?) but the ->caps_init vfunc
appears to cause the display engine to throw an exception on driver
init, at least on my ThinkPad P72:
nouveau 0000:01:00.0: disp: chid 0 mthd 008c data 00000000 0000508c 0000102b
This is magic nvidia speak for "You need to have the DMA notifier offset
programmed before you can call NV507D_GET_CAPABILITIES." So, let's fix
this by doing that, and also perform an update afterwards to prevent
racing with the GPU when reading capabilities.
v2:
* Don't just program the DMA notifier offset, make sure to actually
perform an update
v3:
* Don't call UPDATE()
* Actually read the correct notifier fields, as apparently the
CAPABILITIES_DONE field lives in a different location than the main
NV_DISP_CORE_NOTIFIER_1 field. As well, 907d+ use a different
CAPABILITIES_DONE field then pre-907d cards.
v4:
* Don't forget to check the return value of core507d_read_caps()
v5:
* Get rid of NV50_DISP_CAPS_NTFY[14], use NV50_DISP_CORE_NTFY
* Disable notifier after calling GetCapabilities()
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 4a2cb4181b ("drm/nouveau/kms/nv50-: Probe SOR and PIOR caps for DP interlacing support")
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The user level OpenCL code shouldn't have to align start and end
addresses to a page boundary. That is better handled in the nouveau
driver. The npages field is also redundant since it can be computed
from the start and end addresses.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>