The BTF conflicts were simple overlapping changes.
The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2c4541e24c ("mm: use vma_init() to initialize VMAs on stack and
data segments") tried to initialize various left-over ad-hoc vma's
"properly", but actually made things worse for the temporary vma's used
for TLB flushing.
vma_init() doesn't actually initialize all of the vma, just a few
fields, so doing something like
- struct vm_area_struct vma = { .vm_mm = tlb->mm, };
+ struct vm_area_struct vma;
+
+ vma_init(&vma, tlb->mm);
was actually very bad: instead of having a nicely initialized vma with
every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma
with only a couple of fields initialized. And they weren't even fields
that the code in question mostly cared about.
The flush_tlb_range() function takes a "struct vma" rather than a
"struct mm_struct", because a few architectures actually care about what
kind of range it is - being able to only do an ITLB flush if it's a
range that doesn't have data accesses enabled, for example. And all the
normal users already have the vma for doing the range invalidation.
But a few people want to call flush_tlb_range() with a range they just
made up, so they also end up using a made-up vma. x86 just has a
special "flush_tlb_mm_range()" function for this, but other
architectures (arm and ia64) do the "use fake vma" thing instead, and
thus got caught up in the vma_init() changes.
At the same time, the TLB flushing code really doesn't care about most
other fields in the vma, so vma_init() is just unnecessary and
pointless.
This fixes things by having an explicit "this is just an initializer for
the TLB flush" initializer macro, which is used by the arm/arm64/ia64
people who mis-use this interface with just a dummy vma.
Fixes: 2c4541e24c ("mm: use vma_init() to initialize VMAs on stack and data segments")
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new TCP stats to record the number of reordering events seen
and expose it in both tcp_info (TCP_INFO) and opt_stats
(SOF_TIMESTAMPING_OPT_STATS).
Application can use this stats to track the frequency of the reordering
events in addition to the existing reordering stats which tracks the
magnitude of the latest reordering event.
Note: this new stats tracks reordering events triggered by ACKs, which
could often be fewer than the actual number of packets being delivered
out-of-order.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new TCP stat to record the number of DSACK blocks received
(RFC4989 tcpEStatsStackDSACKDups) and expose it in both tcp_info
(TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new TCP stat to record the number of bytes retransmitted
(RFC4898 tcpEStatsPerfOctetsRetrans) and expose it in both tcp_info
(TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new TCP stat to record the number of bytes sent
(RFC4898 tcpEStatsPerfHCDataOctetsOut) and expose it in both tcp_info
(TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We never use RCU protection for it, just a lot of cargo-cult
rcu_deference_protects calls.
Note that we do keep the kfree_rcu call for it, as the references through
struct sock are RCU protected and thus might require a grace period before
freeing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently drivers have to check if they already have a umem
installed for a given queue and return an error if so. Make
better use of XDP_QUERY_XSK_UMEM and move this functionality
to the core.
We need to keep rtnl across the calls now.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to depend on real_num_rx_queues as a upper bound for sanity
checks. For AF_XDP socket validation it's useful if the check behaves
the same regardless of CONFIG_SYSFS setting.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull locking fixes from Ingo Molnar:
"A paravirt UP-patching fix, and an I2C MUX driver lockdep warning fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/pvqspinlock/x86: Use LOCK_PREFIX in __pv_queued_spin_unlock() assembly code
i2c/mux, locking/core: Annotate the nested rt_mutex usage
locking/rtmutex: Allow specifying a subclass for nested locking
Saeed Mahameed says:
====================
mlx5e-updates-2018-07-27 (Vxlan updates)
This series from Gal and Saeed provides updates to mlx5 vxlan implementation.
Gal, started with three cleanups to reflect the actual hardware vxlan state
- reflect 4789 UDP port default addition to software database
- check maximum number of vxlan UDP ports
- cleanup an unused member in vxlan work
Then Gal provides performance optimization by replacing the
vxlan radix tree with a hash table.
Measuring mlx5e_vxlan_lookup_port execution time:
Radix Tree Hash Table
--------------- ------------ ------------
Single Stream 161 ns 79 ns (51% improvement)
Multi Stream 259 ns 136 ns (47% improvement)
Measuring UDP stream packet rate, single fully utilized TX core:
Radix Tree: 498,300 PPS
Hash Table: 555,468 PPS (11% improvement)
Next, from Saeed, vxlan refactoring to allow sharing the vxlan table
between different mlx5 netdevice instances like PF and VF representors,
this is done by making mlx5 vxlan interface more generic and decoupling
it from PF netdevice structures and logic, then moving it into mlx5 core
as a low level interface so it can be used by VF representors, which is
illustrated in the last patch of the serious.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If an invalid MTU value is set through rtnetlink return extra error
information instead of putting message in kernel log. For other cases
where there is no visible API, keep the error report in the log.
Example:
# ip li set dev enp12s0 mtu 10000
Error: mtu greater than device maximum.
# ifconfig enp12s0 mtu 10000
SIOCSIFMTU: Invalid argument
# dmesg | tail -1
[ 2047.795467] enp12s0: mtu greater than device maximum
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the feature described in rfc1812#section-5.3.5.2
and rfc2644. It allows the router to forward directed broadcast when
sysctl bc_forwarding is enabled.
Note that this feature could be done by iptables -j TEE, but it would
cause some problems:
- target TEE's gateway param has to be set with a specific address,
and it's not flexible especially when the route wants forward all
directed broadcasts.
- this duplicates the directed broadcasts so this may cause side
effects to applications.
Besides, to keep consistent with other os router like BSD, it's also
necessary to implement it in the route rx path.
Note that route cache needs to be flushed when bc_forwarding is
changed.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move vxlan logic and objects to mlx5 core dirver.
Since it going to be used from different mlx5 interfaces.
e.g. mlx5e PF NIC netdev and mlx5e E-Switch representors.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
The NIC has a limited number of offloaded VXLAN UDP ports (usually 4).
Instead of letting the firmware fail when trying to add more ports than
it can handle, let the driver check it on its own.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull block fixes from Jens Axboe:
"Bigger than usual at this time, mostly due to the O_DIRECT corruption
issue and the fact that I was on vacation last week. This contains:
- NVMe pull request with two fixes for the FC code, and two target
fixes (Christoph)
- a DIF bio reset iteration fix (Greg Edwards)
- two nbd reply and requeue fixes (Josef)
- SCSI timeout fixup (Keith)
- a small series that fixes an issue with bio_iov_iter_get_pages(),
which ended up causing corruption for larger sized O_DIRECT writes
that ended up racing with buffered writes (Martin Wilck)"
* tag 'for-linus-20180727' of git://git.kernel.dk/linux-block:
block: reset bi_iter.bi_done after splitting bio
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
blkdev: __blkdev_direct_IO_simple: fix leak in error case
block: bio_iov_iter_get_pages: fix size of last iovec
nvmet: only check for filebacking on -ENOTBLK
nvmet: fixup crash on NULL device path
scsi: set timed out out mq requests to complete
blk-mq: export setting request completion state
nvme: if_ready checks to fail io to deleting controller
nvmet-fc: fix target sgl list on large transfers
nbd: handle unexpected replies better
nbd: don't requeue the same request twice.
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kvm, mm: account shadow page tables to kmemcg
zswap: re-check zswap_is_full() after do zswap_shrink()
include/linux/eventfd.h: include linux/errno.h
mm: fix vma_is_anonymous() false-positives
mm: use vma_init() to initialize VMAs on stack and data segments
mm: introduce vma_init()
mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
ipc/sem.c: prevent queue.status tearing in semop
mm: disallow mappings that conflict for devm_memremap_pages()
kasan: only select SLUB_DEBUG with SYSFS=y
delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
Pull tracing fixes from Steven Rostedt:
"Various fixes to the tracing infrastructure:
- Fix double free when the reg() call fails in
event_trigger_callback()
- Fix anomoly of snapshot causing tracing_on flag to change
- Add selftest to test snapshot and tracing_on affecting each other
- Fix setting of tracepoint flag on error that prevents probes from
being deleted.
- Fix another possible double free that is similar to
event_trigger_callback()
- Quiet a gcc warning of a false positive unused variable
- Fix crash of partial exposed task->comm to trace events"
* tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kthread, tracing: Don't expose half-written comm when creating kthreads
tracing: Quiet gcc warning about maybe unused link variable
tracing: Fix possible double free in event_enable_trigger_func()
tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
selftests/ftrace: Add snapshot and tracing_on test case
ring_buffer: tracing: Inherit the tracing setting to next ring buffer
tracing: Fix double free of event_trigger_data
The existing SocketCAN implementation provides alloc_candev() to
allocate a CAN device using a single Tx and Rx queue. This can lead to
priority inversion in case the single Tx queue is already full with low
priority messages and a high priority message needs to be sent while the
bus is fully loaded with medium priority messages.
This problem can be solved by using the existing multi-queue support of
the network subsytem. The commit makes it possible to use multi-queue in
the CAN subsystem in the same way it is used in the Ethernet subsystem
by adding an alloc_candev_mqs() call and accompanying macros. With this
support a CAN device can use multi-queue qdisc (e.g. mqprio) to avoid
the aforementioned priority inversion.
The exisiting functionality of alloc_candev() is the same as before.
CAN devices need to have prioritized multiple hardware queues or are
able to abort waiting for arbitration to make sensible use of
multi-queues.
Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
Reviewed-by: Heiko Schocher <hs@denx.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The new gasket staging driver ran into a randconfig build failure when
CONFIG_EVENTFD is disabled:
In file included from drivers/staging/gasket/gasket_interrupt.h:11,
from drivers/staging/gasket/gasket_interrupt.c:4:
include/linux/eventfd.h: In function 'eventfd_ctx_fdget':
include/linux/eventfd.h:51:9: error: implicit declaration of function 'ERR_PTR' [-Werror=implicit-function-declaration]
I can't see anything wrong with including eventfd.h before err.h, so the
easiest fix is to make it possible to do this by including the file
where it is needed.
Link: http://lkml.kernel.org/r/20180724110737.3985088-1-arnd@arndb.de
Fixes: 9a69f5087c ("drivers/staging: Gasket driver framework + Apex driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.
Commit c96f5471ce ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operated on @p->delays. If
%current succeeded init while @p didn't, it leads to the following
crash.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: __delayacct_blkio_end+0xc/0x40
PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
RIP: 0010:__delayacct_blkio_end+0xc/0x40
Call Trace:
try_to_wake_up+0x2c0/0x600
autoremove_wake_function+0xe/0x30
__wake_up_common+0x74/0x120
wake_up_page_bit+0x9c/0xe0
mpage_end_io+0x27/0x70
blk_update_request+0x78/0x2c0
scsi_end_request+0x2c/0x1e0
scsi_io_completion+0x20b/0x5f0
blk_mq_complete_request+0xa2/0x100
ata_scsi_qc_complete+0x79/0x400
ata_qc_complete_multiple+0x86/0xd0
ahci_handle_port_interrupt+0xc9/0x5c0
ahci_handle_port_intr+0x54/0xb0
ahci_single_level_irq_intr+0x3b/0x60
__handle_irq_event_percpu+0x43/0x190
handle_irq_event_percpu+0x20/0x50
handle_irq_event+0x2a/0x50
handle_edge_irq+0x80/0x1c0
handle_irq+0xaf/0x120
do_IRQ+0x41/0xc0
common_interrupt+0xf/0xf
Fix it by updating delayacct_blkio_end() check @p->delays instead.
Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.com
Fixes: c96f5471ce ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <dsj@fb.com>
Debugged-by: Dave Jones <dsj@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Snyder <joshs@netflix.com>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a helper for checking whether polling is used to detect PHY status
changes.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.
Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat tracing_on
0
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
0
We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.
Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Cc: stable@vger.kernel.org
Fixes: debdd57f51 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Vince reported the perf_fuzzer giving various unwinder warnings and
Josh reported:
> Deja vu. Most of these are related to perf PEBS, similar to the
> following issue:
>
> b8000586c9 ("perf/x86/intel: Cure bogus unwind from PEBS entries")
>
> This is basically the ORC version of that. setup_pebs_sample_data() is
> assembling a franken-pt_regs which ORC isn't happy about. RIP is
> inconsistent with some of the other registers (like RSP and RBP).
And where the previous unwinder only needed BP,SP ORC also requires
IP. But we cannot spoof IP because then the sample will get displaced,
entirely negating the point of PEBS.
So cure the whole thing differently by doing the unwind early; this
does however require a means to communicate we did the unwind early.
We (ab)use an unused sample_type bit for this, which we set on events
that fill out the data->callchain before the normal
perf_prepare_sample().
Debugged-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull networking fixes from David Miller:
1) Handle stations tied to AP_VLANs properly during mac80211 hw
reconfig. From Manikanta Pubbisetty.
2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.
3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran
Ben Elisha.
4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.
5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.
6) Fix cached netdev name leak in nf_tables, from Florian Westphal.
7) Fix memory leaks on chain rename, also from Florian Westphal.
8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk
Cheng.
9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.
10) Fix link local address handling with VRF, from David Ahern.
11) Don't clobber 'err' on a successful call to __skb_linearize() in
skb_segment(). From Eric Dumazet.
12) Fix vxlan fdb notification races, from Roopa Prabhu.
13) Hash UDP fragments consistently, from Paolo Abeni.
14) If TCP receives lots of out of order tiny packets, we do really
silly stuff. Make the out-of-order queue ending more robust to this
kind of behavior, from Eric Dumazet.
15) Don't leak netlink dump state in nf_tables, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net: axienet: Fix double deregister of mdio
qmi_wwan: fix interface number for DW5821e production firmware
ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
bnx2x: Fix invalid memory access in rss hash config path.
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
r8169: restore previous behavior to accept BIOS WoL settings
cfg80211: never ignore user regulatory hint
sock: fix sg page frag coalescing in sk_alloc_sg
netfilter: nf_tables: move dumper state allocation into ->start
tcp: add tcp_ooo_try_coalesce() helper
tcp: call tcp_drop() from tcp_data_queue_ofo()
tcp: detect malicious patterns in tcp_collapse_ofo_queue()
tcp: avoid collapses in tcp_prune_queue() if possible
tcp: free batches of packets in tcp_prune_ofo_queue()
ip: hash fragments consistently
ipv6: use fib6_info_hold_safe() when necessary
can: xilinx_can: fix power management handling
can: xilinx_can: fix incorrect clear of non-processed interrupts
can: xilinx_can: fix RX overflow interrupt not being enabled
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
...
This is preparing for drivers that want to directly alter the state of
their requests. No functional change here.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
->start() is called once when dump is being initialized, there is no
need to store it in netlink_cb.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-drivers-next patches for 4.19
The first set of patches for 4.19. Only smaller features and bug
fixes, not really anything major. Also included are changes to
include/linux/bitfield.h, we agreed with Johannes that it makes sense
to apply them via wireless-drivers-next.
Major changes:
ath10k
* support channel 173
* fix spectral scan for QCA9984 and QCA9888 chipsets
ath6kl
* add support for Dell Wireless 1537
ti wlcore
* add support for runtime PM
* enable runtime PM autosuspend support
qtnfmac
* support changing MAC address
* enable source MAC address randomization support
libertas
* fix suspend and resume for SDIO cards
mt76
* add software DFS radar pattern detector for mt76x2 based devices
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The tracer has one event, event 0x26, with two subtypes:
- Subtype 0: Ownership change
- Subtype 1: Traces available
An ownership change occurs in the following cases:
1- Owner releases his ownership, in this case, an event will be
sent to inform others to reattempt acquire ownership.
2- Ownership was taken by a higher priority tool, in this case
the owner should understand that it lost ownership, and go through
tear down flow.
The second subtype indicates that there are traces in the trace buffer,
in this case, the driver polls the tracer buffer for new traces, parse
them and prepares the messages for printing.
The HW starts tracing from the first address in the tracer buffer.
Driver receives an event notifying that new trace block exists.
HW posts a timestamp event at the last 8B of every 256B block.
Comparing the timestamp to the last handled timestamp would indicate
that this is a new trace block. Once the new timestamp is detected,
the entire block is considered valid.
Block validation and parsing, should be done after copying the current
block to a different location, in order to avoid block overwritten
during processing.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Implement FW tracer logic and registers access, initialization and
cleanup flows.
Initializing the tracer will be part of load one flow, as multiple
PFs will try to acquire ownership but only one will succeed and will
be the tracer owner.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 core infrastructure updates and fixes.
From Eran:
- Add MPEGC (Management PCIe General Configuration) registers and btis
- Fix tristate and description for MLX5 module
rom Feras:
- Add hardware structures for the firmware tracer
From Jainbo:
- Core support for double vlan push/pop steering action
From Max:
- Add XRQ commands definitions
From Noa:
- Add missing SET_DRIVER_VERSION command translation
From Roi:
- Use ERR_CAST() instead of coding it
From Tariq:
- Better return types for CQE API
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull vfs fixes from Al Viro:
"Fix several places that screw up cleanups after failures halfway
through opening a file (one open-coding filp_clone_open() and getting
it wrong, two misusing alloc_file()). That part is -stable fodder from
the 'work.open' branch.
And Christoph's regression fix for uapi breakage in aio series;
include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel
definition of sigset_t, the reason for doing so in the first place had
been bogus - there's no need to expose struct __aio_sigset in
aio_abi.h at all"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: don't expose __aio_sigset in uapi
ocxlflash_getfile(): fix double-iput() on alloc_file() failures
cxl_getfile(): fix double-iput() on alloc_file() failures
drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()
kernel_wait4() expects a userland address for status - it's only
rusage that goes as a kernel one (and needs a copyout afterwards)
[ Also, fix the prototype of kernel_wait4() to have that __user
annotation - Linus ]
Fixes: 92ebce5ac5 ("osf_wait4: switch to kernel_wait4()")
Cc: stable@kernel.org # v4.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following warning:
net/ipv4/bpfilter/sockopt.c:28:5: error: symbol 'bpfilter_ip_set_sockopt' redeclared with different type
net/ipv4/bpfilter/sockopt.c:34:5: error: symbol 'bpfilter_ip_get_sockopt' redeclared with different type
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.
The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.
We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.
Right now those functions are literally just wrappers around the
kmem_cache_*() calls. This is a purely mechanical conversion:
# new vma:
kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()
# copy old vma
kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)
# free vma
kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)
to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-07-18
The following series provides fixes to mlx5 core and net device driver.
Please pull and let me know if there's any problem.
For -stable v4.7
net/mlx5e: Don't allow aRFS for encapsulated packets
net/mlx5e: Fix quota counting in aRFS expire flow
For -stable v4.15
net/mlx5e: Only allow offloading decap egress (egdev) flows
net/mlx5e: Refine ets validation function
net/mlx5: Adjust clock overflow work period
For -stable v4.17
net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-20
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add sharing of BPF objects within one ASIC: this allows for reuse of
the same program on multiple ports of a device, and therefore gains
better code store utilization. On top of that, this now also enables
sharing of maps between programs attached to different ports of a
device, from Jakub.
2) Cleanup in libbpf and bpftool's Makefile to reduce unneeded feature
detections and unused variable exports, also from Jakub.
3) First batch of RCU annotation fixes in prog array handling, i.e.
there are several __rcu markers which are not correct as well as
some of the RCU handling, from Roman.
4) Two fixes in BPF sample files related to checking of the prog_cnt
upper limit from sample loader, from Dan.
5) Minor cleanup in sockmap to remove a set but not used variable,
from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Plumb in get_ownership() callback for devices belonging to a class so that
they can be created with uid/gid different from global root. This will
allow network devices in a container to belong to container's root and not
global root.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally kobjects and their sysfs representation belong to global root,
however it is not necessarily the case for objects in separate namespaces.
For example, objects in separate network namespace logically belong to the
container's root and not global root.
This change lays groundwork for allowing network namespace objects
ownership to be transferred to container's root user by defining
get_ownership() callback in ktype structure and using it in sysfs code to
retrieve desired uid/gid when creating sysfs objects for given kobject.
Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change allows creating kernfs files and directories with arbitrary
uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
and always create objects belonging to the global root.
When creating symlinks ownership (uid/gid) is taken from the target kernfs
object.
Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree:
1) No need to set ttl from reject action for the bridge family, from
Taehee Yoo.
2) Use a fixed timeout for flow that are passed up from the flowtable
to conntrack, from Florian Westphal.
3) More preparation patches for tproxy support for nf_tables, from Mate
Eckl.
4) Remove unnecessary indirection in core IPv6 checksum function, from
Florian Westphal.
5) Use nf_ct_get_tuplepr() from openvswitch, instead of opencoding it.
From Florian Westphal.
6) socket match now selects socket infrastructure, instead of depending
on it. From Mate Eckl.
7) Patch series to simplify conntrack tuple building/parsing from packet
path and ctnetlink, from Florian Westphal.
8) Fetch timeout policy from protocol helpers, instead of doing it from
core, from Florian Westphal.
9) Merge IPv4 and IPv6 protocol trackers into conntrack core, from
Florian Westphal.
10) Depend on CONFIG_NF_TABLES_IPV6 and CONFIG_IP6_NF_IPTABLES
respectively, instead of IPV6. Patch from Mate Eckl.
11) Add specific function for garbage collection in conncount,
from Yi-Hung Wei.
12) Catch number of elements in the connlimit list, from Yi-Hung Wei.
13) Move locking to nf_conncount, from Yi-Hung Wei.
14) Series of patches to add lockless tree traversal in nf_conncount,
from Yi-Hung Wei.
15) Resolve clash in matching conntracks when race happens, from
Martynas Pumputis.
16) If connection entry times out, remove template entry from the
ip_vs_conn_tab table to improve behaviour under flood, from
Julian Anastasov.
17) Remove useless parameter from nf_ct_helper_ext_add(), from Gao feng.
18) Call abort from 2-phase commit protocol before requesting modules,
make sure this is done under the mutex, from Florian Westphal.
19) Grab module reference when starting transaction, also from Florian.
20) Dynamically allocate expression info array for pre-parsing, from
Florian.
21) Add per netns mutex for nf_tables, from Florian Westphal.
22) A couple of patches to simplify and refactor nf_osf code to prepare
for nft_osf support.
23) Break evaluation on missing socket, from Mate Eckl.
24) Allow to match socket mark from nft_socket, from Mate Eckl.
25) Remove dependency on nf_defrag_ipv6, now that IPv6 tracker is
built-in into nf_conntrack. From Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IOMMU fix from Joerg Roedel:
"Only one revert, for an an Intel VT-d patch that caused issues with
the i915 GPU driver"
* tag 'iommu-fixes-v4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
Revert "iommu/vt-d: Clean up pasid quirk for pre-production devices"