The generic VDSO implementation uses the Y2038 safe clock_gettime64() and
clock_getres_time64() syscalls as fallback for 32bit VDSO. This breaks
seccomp setups because these syscalls might be not (yet) allowed.
Implement the 32bit variants which use the legacy syscalls and select the
variant in the core library.
The 64bit time variants are not removed because they are required for the
time64 based vdso accessors.
Fixes: 00b26474c2 ("lib/vdso: Provide generic VDSO implementation")
Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20190728131648.971361611@linutronix.de
The generic VDSO implementation uses the Y2038 safe clock_gettime64() and
clock_getres_time64() syscalls as fallback for 32bit VDSO. This breaks
seccomp setups because these syscalls might be not (yet) allowed.
Implement the 32bit variants which use the legacy syscalls and select the
variant in the core library.
The 64bit time variants are not removed because they are required for the
time64 based vdso accessors.
Fixes: 7ac8707479 ("x86/vdso: Switch to generic vDSO implementation")
Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20190728131648.879156507@linutronix.de
To address the regression which causes seccomp to deny applications the
access to clock_gettime64() and clock_getres64() syscalls because they
are not enabled in the existing filters.
That trips over the fact that 32bit VDSOs use the new clock_gettime64() and
clock_getres64() syscalls in the fallback path.
Add a conditional to invoke the 32bit legacy fallback syscalls instead of
the new 64bit variants. The conditional can go away once all architectures
are converted.
Fixes: 00b26474c2 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907301134470.1738@nanos.tec.linutronix.de
To allow syscall fallbacks using the legacy 32bit syscall for 32bit VDSO
builds, move the fallback invocation out into the callers.
Split the common code out of __cvdso_clock_gettime/getres() and invoke the
syscall fallback in the 64bit and 32bit variants.
Preparatory work for using legacy syscalls in 32bit VDSO. No functional
change.
Fixes: 00b26474c2 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20190728131648.695579736@linutronix.de
The 32bit variants of vdso_clock_gettime()/getres() have a NULL pointer
check for the timespec pointer. That's inconsistent vs. 64bit.
But the vdso implementation will never be consistent versus the syscall
because the only case which it can handle is NULL. Any other invalid
pointer will cause a segfault. So special casing NULL is not really useful.
Remove it along with the superflouos syscall fallback invocation as that
will return -EFAULT anyway. That also gets rid of the dubious typecast
which only works because the pointer is NULL.
Fixes: 00b26474c2 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20190728131648.587523358@linutronix.de
Stefano Garzarella says:
====================
vsock/virtio: optimizations to increase the throughput
This series tries to increase the throughput of virtio-vsock with slight
changes.
While I was testing the v2 of this series I discovered an huge use of memory,
so I added patch 1 to mitigate this issue. I put it in this series in order
to better track the performance trends.
v5:
- rebased all patches on net-next
- added Stefan's R-b and Michael's A-b
v4: https://patchwork.kernel.org/cover/11047717
v3: https://patchwork.kernel.org/cover/10970145
v2: https://patchwork.kernel.org/cover/10938743
v1: https://patchwork.kernel.org/cover/10885431
Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
support. As Michael suggested in the v1, I booted host and guest with 'nosmap'.
A brief description of patches:
- Patches 1: limit the memory usage with an extra copy for small packets
- Patches 2+3: reduce the number of credit update messages sent to the
transmitter
- Patches 4+5: allow the host to split packets on multiple buffers and use
VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
host -> guest [Gbps]
pkt_size before opt p 1 p 2+3 p 4+5
32 0.032 0.030 0.048 0.051
64 0.061 0.059 0.108 0.117
128 0.122 0.112 0.227 0.234
256 0.244 0.241 0.418 0.415
512 0.459 0.466 0.847 0.865
1K 0.927 0.919 1.657 1.641
2K 1.884 1.813 3.262 3.269
4K 3.378 3.326 6.044 6.195
8K 5.637 5.676 10.141 11.287
16K 8.250 8.402 15.976 16.736
32K 13.327 13.204 19.013 20.515
64K 21.241 21.341 20.973 21.879
128K 21.851 22.354 21.816 23.203
256K 21.408 21.693 21.846 24.088
512K 21.600 21.899 21.921 24.106
guest -> host [Gbps]
pkt_size before opt p 1 p 2+3 p 4+5
32 0.045 0.046 0.057 0.057
64 0.089 0.091 0.103 0.104
128 0.170 0.179 0.192 0.200
256 0.364 0.351 0.361 0.379
512 0.709 0.699 0.731 0.790
1K 1.399 1.407 1.395 1.427
2K 2.670 2.684 2.745 2.835
4K 5.171 5.199 5.305 5.451
8K 8.442 8.500 10.083 9.941
16K 12.305 12.259 13.519 15.385
32K 11.418 11.150 11.988 24.680
64K 10.778 10.659 11.589 35.273
128K 10.421 10.339 10.939 40.338
256K 10.300 9.719 10.508 36.562
512K 9.833 9.808 10.612 35.979
As Stefan suggested in the v1, I measured also the efficiency in this way:
efficiency = Mbps / (%CPU_Host + %CPU_Guest)
The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
but it's provided for free from iperf3 and could be an indication.
host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt p 1 p 2+3 p 4+5
32 0.35 0.45 0.79 1.02
64 0.56 0.80 1.41 1.54
128 1.11 1.52 3.03 3.12
256 2.20 2.16 5.44 5.58
512 4.17 4.18 10.96 11.46
1K 8.30 8.26 20.99 20.89
2K 16.82 16.31 39.76 39.73
4K 30.89 30.79 74.07 75.73
8K 53.74 54.49 124.24 148.91
16K 80.68 83.63 200.21 232.79
32K 132.27 132.52 260.81 357.07
64K 229.82 230.40 300.19 444.18
128K 332.60 329.78 331.51 492.28
256K 331.06 337.22 339.59 511.59
512K 335.58 328.50 331.56 504.56
guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt p 1 p 2+3 p 4+5
32 0.43 0.43 0.53 0.56
64 0.85 0.86 1.04 1.10
128 1.63 1.71 2.07 2.13
256 3.48 3.35 4.02 4.22
512 6.80 6.67 7.97 8.63
1K 13.32 13.31 15.72 15.94
2K 25.79 25.92 30.84 30.98
4K 50.37 50.48 58.79 59.69
8K 95.90 96.15 107.04 110.33
16K 145.80 145.43 143.97 174.70
32K 147.06 144.74 146.02 282.48
64K 145.25 143.99 141.62 406.40
128K 149.34 146.96 147.49 489.34
256K 156.35 149.81 152.21 536.37
512K 151.65 150.74 151.52 519.93
[1] https://github.com/stefano-garzarella/iperf/
====================
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now we are able to split packets, we can avoid limiting
their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
packet size.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the packets to sent to the guest are bigger than the buffer
available, we can split them, using multiple buffers and fixing
the length in the packet header.
This is safe since virtio-vsock supports only stream sockets.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock also if we are in the TX path.
Move also buf_alloc under the same lock.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring, are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).
The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per-socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 of 4 KB
buffers, using up to 1 GB of memory per-socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.
This patch mitigates this issue copying the payload of small
packets (< 128 bytes) into the buffer of last packet queued, in
order to avoid wasting memory.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The livepatching self-tests tweak the dynamic debug config to verify
the kernel log during the tests. Enhance set_dynamic_debug() so that
the config changes are restored when the script exits.
Note this functionality needs to keep in sync with:
- dynamic_debug input/output formatting
- functions affected by set_dynamic_debug()
For example, push_dynamic_debug() transforms:
kernel/livepatch/transition.c:530 [livepatch]klp_init_transition =_ "'%s': initializing %s transition\012"
to the following:
file kernel/livepatch/transition.c line 530 =_
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Previously, using "%m" in a ksft_* format string can result in strange
output because the errno value wasn't saved before calling other libc
functions. The solution is to simply save and restore the errno before
we format the user-supplied format string.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
linux-2.5.69 along with hundreds of other commands, but was always broken
sincen only the structure is compatible, but the command number is not,
due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
sockaddr_pppox)), which is different on 64-bit architectures.
Guillaume Nault adds:
And the implementation was broken until 2016 (see 29e73269aa ("pppoe:
fix reference counting in PPPoE proxy")), and nobody ever noticed. I
should probably have removed this ioctl entirely instead of fixing it.
Clearly, it has never been used.
Fix it by adding a compat_ioctl handler for all pppoe variants that
translates the command number and then calls the regular ioctl function.
All other ioctl commands handled by pppoe are compatible between 32-bit
and 64-bit, and require compat_ptr() conversion.
This should apply to all stable kernels.
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need dev_err() messages when platform_get_irq() fails now that
platform_get_irq() prints an error message itself when something goes
wrong. Let's remove these prints with a simple semantic patch.
// <smpl>
@@
expression ret;
struct platform_device *E;
@@
ret =
(
platform_get_irq(E, ...)
|
platform_get_irq_byname(E, ...)
);
if ( \( ret < 0 \| ret <= 0 \) )
{
(
-if (ret != -EPROBE_DEFER)
-{ ...
-dev_err(...);
-... }
|
...
-dev_err(...);
)
...
}
// </smpl>
While we're here, remove braces on if statements that only have one
statement (manually).
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Lemon says:
====================
Finish conversion of skb_frag_t to bio_vec
The recent conversion of skb_frag_t to bio_vec did not include
skb_frag's page_offset. Add accessor functions for this field,
utilize them, and remove the union, restoring the original structure.
v2:
- rename accessors
- follow kdoc conventions
====================
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that page_offset is referenced through accessors, remove
the union, and use bv_offset.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add skb_frag_off(), skb_frag_off_add(), skb_frag_off_set(),
and skb_frag_off_copy() accessors for page_offset.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: clean up __sctp_connect function
This patchset is to factor out some common code for
sctp_sendmsg_new_asoc() and __sctp_connect() into 2
new functioins.
v1->v2:
- add the patch 1/5 to avoid a slab-out-of-bounds warning.
- add some code comment for the check change in patch 2/5.
- remove unused 'addrcnt' as Marcelo noticed in patch 3/5.
====================
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this function factored out from sctp_sendmsg_new_asoc() and
__sctp_connect(), it adds a peer with the other addr into the
asoc after this asoc is created with the 1st addr.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this function factored out from sctp_sendmsg_new_asoc() and
__sctp_connect(), it creates the asoc and adds a peer with the
1st addr.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__sctp_connect is doing quit similar things as sctp_sendmsg_new_asoc.
To factor out common functions, this patch is to clean up their code
to make them look more similar:
1. create the asoc and add a peer with the 1st addr.
2. add peers with the other addrs into this asoc one by one.
while at it, also remove the unused 'addrcnt'.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now __sctp_connect() is called by __sctp_setsockopt_connectx() and
sctp_inet_connect(), the latter has done addr_size check with size
of sa_family_t.
In the next patch to clean up __sctp_connect(), we will remove
addr_size check with size of sa_family_t from __sctp_connect()
for the 1st address.
So before doing that, __sctp_setsockopt_connectx() should do
this check first, as sctp_inet_connect() does.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'addr' passed to sctp_transport_init is not always a whole size
of union sctp_addr, like the path:
sctp_sendmsg() ->
sctp_sendmsg_new_asoc() ->
sctp_assoc_add_peer() ->
sctp_transport_new() -> sctp_transport_init()
In the next patches, we will also pass the address length of data
only to sctp_assoc_add_peer().
So sctp_transport_init() should copy the only available data from
addr to peer->ipaddr, instead of 'peer->ipaddr = *addr' which may
cause slab-out-of-bounds.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull pidfd fixes from Christian Brauner:
"This makes setting the exit_state in exit_notify() consistent after
fixing the pidfd polling race pre-rc1. Related to the race fix, this
adds a WARN_ON() to do_notify_pidfd() to catch any future exit_state
races.
Last, this removes an obsolete comment from the pidfd tests"
* tag 'for-linus-20190730' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
exit: make setting exit_state consistent
pidfd: Add warning if exit_state is 0 during notification
pidfd: remove obsolete comments from test
Pull f2fs fixes from Jaegeuk Kim:
"This set of patches adjust to follow recent setflags changes and fix
two regressions"
* tag 'f2fs-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: use EINVAL for superblock with invalid magic
f2fs: fix to read source block before invalidating it
f2fs: remove redundant check from f2fs_setflags_common()
f2fs: use generic checking function for FS_IOC_FSSETXATTR
f2fs: use generic checking and prep function for FS_IOC_SETFLAGS
Pull kselftest fixes from Shuah Khan:
"Minor fixes to tests and one major fix to livepatch test to add skip
handling to avoid false fail reports when livepatch is disabled"
* tag 'linux-kselftest-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/livepatch: add test skip handling
selftests: mlxsw: Fix typo in qos_mc_aware.sh
selftests/x86: fix spelling mistake "FAILT" -> "FAIL"
selftests: kmod: Fix typo in kmod.sh
Pull rdma fixes from Jason Gunthorpe:
"A few regression and bug fixes for the patches merged in the last
cycle:
- hns fixes a subtle crash from the ib core SGL rework
- hfi1 fixes various error handling, oops and protocol errors
- bnxt_re fixes a regression where nvmeof doesn't work on some
configurations
- mlx5 fixes a serious 'use after free' bug in how MR caching is
handled
- some edge case crashers in the new statistic core code
- more siw static checker fixups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification
IB/counters: Always initialize the port counter object
IB/core: Fix querying total rdma stats
IB/mlx5: Prevent concurrent MR updates during invalidation
IB/mlx5: Fix clean_mr() to work in the expected order
IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache
IB/mlx5: Use direct mkey destroy command upon UMR unreg failure
IB/mlx5: Fix unreg_umr to ignore the mkey state
RDMA/siw: Remove set but not used variables 'rv'
IB/mlx5: Replace kfree with kvfree
RDMA/bnxt_re: Honor vlan_id in GID entry comparison
IB/hfi1: Drop all TID RDMA READ RESP packets after r_next_psn
IB/hfi1: Field not zero-ed when allocating TID flow memory
IB/hfi1: Unreserve a flushed OPFN request
IB/hfi1: Check for error on call to alloc_rsm_map_table
RDMA/hns: Fix sg offset non-zero issue
RDMA/siw: Fix error return code in siw_init_module()
Pull HMM fixes from Jason Gunthorpe:
"Fix the locking around nouveau's use of the hmm_range_* APIs. It works
correctly in the success case, but many of the the edge cases have
missing unlocks or double unlocks.
The diffstat is a bit big as Christoph did a comprehensive job to move
the obsolete API from the core header and into the driver before
fixing its flow, but the risk of regression from this code motion is
low"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
nouveau: unlock mmap_sem on all errors from nouveau_range_fault
nouveau: remove the block parameter to nouveau_range_fault
mm/hmm: move hmm_vma_range_done and hmm_vma_fault to nouveau
mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
Commit 33ec3e53e7 ("loop: Don't change loop device under exclusive
opener") made LOOP_SET_FD ioctl acquire exclusive block device reference
while it updates loop device binding. However this can make perfectly
valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding
temporarily the exclusive bdev reference in cases like this:
for i in {a..z}{a..z}; do
dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024
mkfs.ext2 $i.image
mkdir mnt$i
done
echo "Run"
for i in {a..z}{a..z}; do
mount -o loop -t ext2 $i.image mnt$i &
done
Fix the problem by not getting full exclusive bdev reference in
LOOP_SET_FD but instead just mark the bdev as being claimed while we
update the binding information. This just blocks new exclusive openers
instead of failing them with EBUSY thus fixing the problem.
Fixes: 33ec3e53e7 ("loop: Don't change loop device under exclusive opener")
Cc: stable@vger.kernel.org
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 837158b847 ("dt-bindings: Check the examples against the
schemas") started generating YAML encoded DT files to validate the
examples against the schema. When running 'make dt_binding_check' in
tree after the 1st time, the generated example .dt.yaml files are
mistakenly added to the list of schema files. Exclude *.example.dt.yaml
files from the search for schema files.
Fixes: 837158b847 ("dt-bindings: Check the examples against the schemas")
Reported-by: Guido Günther <agx@sigxcpu.org>
Tested-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Rob Herring <robh@kernel.org>
In xchk_da_btree_block_check_sibling(), there is an if statement on
line 274 to check whether ds->state->altpath.blk[level].bp is NULL:
if (ds->state->altpath.blk[level].bp)
When ds->state->altpath.blk[level].bp is NULL, it is used on line 281:
xfs_trans_brelse(..., ds->state->altpath.blk[level].bp);
struct xfs_buf_log_item *bip = bp->b_log_item;
ASSERT(bp->b_transp == tp);
Thus, possible null-pointer dereferences may occur.
To fix these bugs, ds->state->altpath.blk[level].bp is checked before
being used.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Since commit b191d6491b ("pidfd: fix a poll race when setting exit_state")
we unconditionally set exit_state to EXIT_ZOMBIE before calling into
do_notify_parent(). This was done to eliminate a race when querying
exit_state in do_notify_pidfd().
Back then we decided to do the absolute minimal thing to fix this and
not touch the rest of the exit_notify() function where exit_state is
set.
Since this fix has not caused any issues change the setting of
exit_state to EXIT_DEAD in the autoreap case to account for the fact hat
exit_state is set to EXIT_ZOMBIE unconditionally. This fix was planned
but also explicitly requested in [1] and makes the whole code more
consistent.
/* References */
[1]: https://lore.kernel.org/lkml/CAHk-=wigcxGFR2szue4wavJtH5cYTTeNES=toUBVGsmX0rzX+g@mail.gmail.com
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
This patch corrects the SPDX License Identifier style
in header files related to Drivers for Intel(R) Trace Hub
controller.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rxkad sometimes triggers a warning about oversized stack frames when
building with clang for a 32-bit architecture:
net/rxrpc/rxkad.c:243:12: error: stack frame size of 1088 bytes in function 'rxkad_secure_packet' [-Werror,-Wframe-larger-than=]
net/rxrpc/rxkad.c:501:12: error: stack frame size of 1088 bytes in function 'rxkad_verify_packet' [-Werror,-Wframe-larger-than=]
The problem is the combination of SYNC_SKCIPHER_REQUEST_ON_STACK() in
rxkad_verify_packet()/rxkad_secure_packet() with the relatively large
scatterlist in rxkad_verify_packet_1()/rxkad_secure_packet_encrypt().
The warning does not show up when using gcc, which does not inline the
functions as aggressively, but the problem is still the same.
Allocate the cipher buffers from the slab instead, caching the allocated
packet crypto request memory used for DATA packet crypto in the rxrpc_call
struct.
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells says:
====================
Here are a couple of fixes for rxrpc:
(1) Fix a potential deadlock in the peer keepalive dispatcher.
(2) Fix a missing notification when a UDP sendmsg error occurs in rxrpc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If PHYLIB is not set, build enetc will fails:
drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_open':
enetc.c: undefined reference to `phy_disconnect'
enetc.c: undefined reference to `phy_start'
drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_close':
enetc.c: undefined reference to `phy_stop'
enetc.c: undefined reference to `phy_disconnect'
drivers/net/ethernet/freescale/enetc/enetc_ethtool.o: undefined reference to `phy_ethtool_get_link_ksettings'
drivers/net/ethernet/freescale/enetc/enetc_ethtool.o: undefined reference to `phy_ethtool_set_link_ksettings'
drivers/net/ethernet/freescale/enetc/enetc_mdio.o: In function `enetc_mdio_probe':
enetc_mdio.c: undefined reference to `mdiobus_alloc_size'
enetc_mdio.c: undefined reference to `mdiobus_free'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent changes that introduced support for Page Pool in stmmac, Jon
reported that NFS boot was no longer working on an ARM64 based platform
that had the IP behind an IOMMU.
As Page Pool API does not guarantee DMA syncing because of the use of
DMA_ATTR_SKIP_CPU_SYNC flag, we have to explicit sync the whole buffer upon
re-allocation because we are always re-using same pages.
In fact, ARM64 code invalidates the DMA area upon two situations [1]:
- sync_single_for_cpu(): Invalidates if direction != DMA_TO_DEVICE
- sync_single_for_device(): Invalidates if direction == DMA_FROM_DEVICE
So, as we must invalidate both the current RX buffer and the newly allocated
buffer we propose this fix.
[1] arch/arm64/mm/cache.S
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 2af6106ae9 ("net: stmmac: Introducing support for Page Pool")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Tested-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently are duplicated checks on orig_egr_types which are
redundant, I believe this is a typo and should actually be
orig_ing_types || orig_egr_types instead of the expression
orig_egr_types || orig_egr_types. Fix these.
Addresses-Coverity: ("Same on both sides")
Fixes: c6b36bdd04 ("mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabled")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is perfectly ok to not have an gpio attached to the fixed-link node. So
the driver should not throw an error message when the gpio is missing.
Fixes: 5468e82f70 ("net: phy: fixed-phy: Drop GPIO from fixed_phy_add()")
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In qla2x00_alloc_fcport(), fcport is assigned to NULL in the error
handling code on line 4880:
fcport = NULL;
Then fcport is used on lines 4883-4886:
INIT_WORK(&fcport->del_work, qla24xx_delete_sess_fn);
INIT_WORK(&fcport->reg_work, qla_register_fcport_fn);
INIT_LIST_HEAD(&fcport->gnl_entry);
INIT_LIST_HEAD(&fcport->list);
Thus, possible null-pointer dereferences may occur.
To fix these bugs, qla2x00_alloc_fcport() directly returns NULL
in the error handling code.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as
per hardware design, if DMA-able range contains all 64-bits
set (0xFFFFFFFF-FFFFFFFF) then it results in a firmware fault.
E.g. SGE's start address is 0xFFFFFFFF-FFFF000 and data length is 0x1000
bytes. when HBA tries to DMA the data at 0xFFFFFFFF-FFFFFFFF location then
HBA will fault the firmware.
Driver will set 63-bit DMA mask to ensure the above address will not be
used.
Cc: <stable@vger.kernel.org> # 5.1.20+
Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a race condition between removing glue directory and adding a new
device under the glue dir. It can be reproduced in following test:
CPU1: CPU2:
device_add()
get_device_parent()
class_dir_create_and_add()
kobject_add_internal()
create_dir() // create glue_dir
device_add()
get_device_parent()
kobject_get() // get glue_dir
device_del()
cleanup_glue_dir()
kobject_del(glue_dir)
kobject_add()
kobject_add_internal()
create_dir() // in glue_dir
sysfs_create_dir_ns()
kernfs_create_dir_ns(sd)
sysfs_remove_dir() // glue_dir->sd=NULL
sysfs_put() // free glue_dir->sd
// sd is freed
kernfs_new_node(sd)
kernfs_get(glue_dir)
kernfs_add_one()
kernfs_put()
Before CPU1 remove last child device under glue dir, if CPU2 add a new
device under glue dir, the glue_dir kobject reference count will be
increase to 2 via kobject_get() in get_device_parent(). And CPU2 has
been called kernfs_create_dir_ns(), but not call kernfs_new_node().
Meanwhile, CPU1 call sysfs_remove_dir() and sysfs_put(). This result in
glue_dir->sd is freed and it's reference count will be 0. Then CPU2 call
kernfs_get(glue_dir) will trigger a warning in kernfs_get() and increase
it's reference count to 1. Because glue_dir->sd is freed by CPU1, the next
call kernfs_add_one() by CPU2 will fail(This is also use-after-free)
and call kernfs_put() to decrease reference count. Because the reference
count is decremented to 0, it will also call kmem_cache_free() to free
the glue_dir->sd again. This will result in double free.
In order to avoid this happening, we also should make sure that kernfs_node
for glue_dir is released in CPU1 only when refcount for glue_dir kobj is
1 to fix this race.
The following calltrace is captured in kernel 4.14 with the following patch
applied:
commit 726e410979 ("drivers: core: Remove glue dirs from sysfs earlier")
--------------------------------------------------------------------------
[ 3.633703] WARNING: CPU: 4 PID: 513 at .../fs/kernfs/dir.c:494
Here is WARN_ON(!atomic_read(&kn->count) in kernfs_get().
....
[ 3.633986] Call trace:
[ 3.633991] kernfs_create_dir_ns+0xa8/0xb0
[ 3.633994] sysfs_create_dir_ns+0x54/0xe8
[ 3.634001] kobject_add_internal+0x22c/0x3f0
[ 3.634005] kobject_add+0xe4/0x118
[ 3.634011] device_add+0x200/0x870
[ 3.634017] _request_firmware+0x958/0xc38
[ 3.634020] request_firmware_into_buf+0x4c/0x70
....
[ 3.634064] kernel BUG at .../mm/slub.c:294!
Here is BUG_ON(object == fp) in set_freepointer().
....
[ 3.634346] Call trace:
[ 3.634351] kmem_cache_free+0x504/0x6b8
[ 3.634355] kernfs_put+0x14c/0x1d8
[ 3.634359] kernfs_create_dir_ns+0x88/0xb0
[ 3.634362] sysfs_create_dir_ns+0x54/0xe8
[ 3.634366] kobject_add_internal+0x22c/0x3f0
[ 3.634370] kobject_add+0xe4/0x118
[ 3.634374] device_add+0x200/0x870
[ 3.634378] _request_firmware+0x958/0xc38
[ 3.634381] request_firmware_into_buf+0x4c/0x70
--------------------------------------------------------------------------
Fixes: 726e410979 ("drivers: core: Remove glue dirs from sysfs earlier")
Signed-off-by: Muchun Song <smuchun@gmail.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Link: https://lore.kernel.org/r/20190727032122.24639-1-smuchun@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>