This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
extend xdp_redirect_map for broadcast support.
With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.
When getting the devices in dev hash map via dev_map_hash_get_next_key(),
there is a possibility that we fall back to the first key when a device
was removed. This will duplicate packets on some interfaces. So just walk
the whole buckets to avoid this issue. For dev array map, we also walk the
whole map to find valid interfaces.
Function bpf_clear_redirect_map() was removed in
commit ee75aef23a ("bpf, xdp: Restructure redirect actions").
Add it back as we need to use ri->map again.
With test topology:
+-------------------+ +-------------------+
| Host A (i40e 10G) | ---------- | eno1(i40e 10G) |
+-------------------+ | |
| Host B |
+-------------------+ | |
| Host C (i40e 10G) | ---------- | eno2(i40e 10G) |
+-------------------+ | |
| +------+ |
| veth0 -- | Peer | |
| veth1 -- | | |
| veth2 -- | NS | |
| +------+ |
+-------------------+
On Host A:
# pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64
On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
All the veth peers in the NS have a XDP_DROP program loaded. The
forward_map max_entries in xdp_redirect_map_multi is modify to 4.
Testing the performance impact on the regular xdp_redirect path with and
without patch (to check impact of additional check for broadcast mode):
5.12 rc4 | redirect_map i40e->i40e | 2.0M | 9.7M
5.12 rc4 | redirect_map i40e->veth | 1.7M | 11.8M
5.12 rc4 + patch | redirect_map i40e->i40e | 2.0M | 9.6M
5.12 rc4 + patch | redirect_map i40e->veth | 1.7M | 11.7M
Testing the performance when cloning packets with the redirect_map_multi
test, using a redirect map size of 4, filled with 1-3 devices:
5.12 rc4 + patch | redirect_map multi i40e->veth (x1) | 1.7M | 11.4M
5.12 rc4 + patch | redirect_map multi i40e->veth (x2) | 1.1M | 4.3M
5.12 rc4 + patch | redirect_map multi i40e->veth (x3) | 0.8M | 2.6M
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
Since commit 9ec37efb87 ("PCI/MSI: Make pci_host_common_probe() declare
its reliance on MSI domains"), platforms that rely on the "msi-map"
device-tree property don't get MSIs anymore.
On the Arm Fast Model for example [1], the host bridge doesn't have a
"msi-parent" property since it doesn't itself generate MSIs, and so doesn't
get a MSI domain. It has an "msi-map" property instead to describe MSI
controllers of child devices. As a result, due to the new msi_domain check
in pci_register_host_bridge(), the whole bus gets PCI_BUS_FLAGS_NO_MSI.
Check whether the root complex has an "msi-map" property before giving
up on MSIs.
[1] arch/arm64/boot/dts/arm/fvp-base-revc.dts
Fixes: 9ec37efb87 ("PCI/MSI: Make pci_host_common_probe() declare its reliance on MSI domains")
Link: https://lore.kernel.org/r/20210510173129.750496-1-jean-philippe@linaro.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Currently the source's export is mounted and unmounted on every
inter-server copy operation. This patch is an enhancement to delay
the unmount of the source export for a certain period of time to
eliminate the mount and unmount overhead on subsequent copy operations.
After a copy operation completes, a work entry is added to the
delayed unmount list with an expiration time. This list is serviced
by the laundromat thread to unmount the export of the expired entries.
Each time the export is being used again, its expiration time is
extended and the entry is re-inserted to the tail of the list.
The unmount task and the mount operation of the copy request are
synced to make sure the export is not unmounted while it's being
used.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This function can be used by drivers that use damage clips and have
CMA GEM objects backed by non-coherent memory. Calling this function
in a plane's .atomic_update ensures that all the data in the backing
memory have been written to RAM.
v3: - Only sync data if using GEM objects backed by non-coherent memory.
- Use a drm_device pointer instead of device pointer in prototype
v5: - Rename to drm_fb_cma_sync_non_coherent
- Invert loops for better cache locality
- Only sync BOs that have the non-coherent flag
- Move to drm_fb_cma_helper.c to avoid circular dependency
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210523170415.90410-3-paul@crapouillou.net
Having GEM buffers backed by non-coherent memory is interesting in the
particular case where it is faster to render to a non-coherent buffer
then sync the data cache, than to render to a write-combine buffer, and
(by extension) much faster than using a shadow buffer. This is true for
instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
are about three times faster using this method.
Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
can be set by the drivers when they create the dumb buffer.
Since this really only applies to software rendering, disable this flag
as soon as the CMA objects are exported via PRIME.
v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
the objects are mapped, and use the new dma_mmap_pages function.
v4: Make sure map_noncoherent is always disabled when creating GEM
objects meant to be used with dma-buf.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210523170415.90410-2-paul@crapouillou.net
ASoC: Fixes for v5.13
A collection of fixes that have come in since the merge window, mainly
device specific things. The fixes to the generic cards from
Morimoto-san are handling regressions that were introduced in the merge
window on at least the Kontron sl28-var3-ads2.
Although the power state check is performed in various places (e.g. at
the entrance of quite a few ioctls), there can be still some pending
tasks that already went into the ioctl handler or other ops, and those
may access the hardware even after the power state check. For
example, kcontrol access ioctl paths that call info/get/put callbacks
may update the hardware registers. If a system wants to assure the
free from such hw access (like the case of PCI rescan feature we're
going to implement in future), this situation must be avoided, and we
have to sync such in-flight tasks finishing beforehand.
For that purpose, this patch introduces a few new things in core code:
- A refcount, power_ref, and a wait queue, power_ref_sleep, to the
card object
- A few new helpers, snd_power_ref(), snd_power_unref(),
snd_power_ref_and_wait(), and snd_power_sync_ref()
In the code paths that call kctl info/read/write/tlv ops, we check the
power state with the newly introduced snd_power_ref_and_wait(). This
function also takes the card.power_ref refcount for tracking this
in-flight task. Once after the access finishes, snd_power_unref() is
called to released the refcount in return. So the driver can sync via
snd_power_sync_ref() assuring that all in-flight tasks have been
finished.
As of this patch, snd_power_sync_ref() is called only at
snd_card_disconnect(), but it'll be used in other places in future.
Note that atomic_t is used for power_ref intentionally instead of
refcount_t. It's because of the design of refcount_t type; refcount_t
cannot be zero-based, and it cannot do dec_and_test() call for
multiple times, hence it's not suitable for our purpose.
Also, this patch changes snd_power_wait() to accept only
SNDRV_CTL_POWER_D0, which is the only value that makes sense.
In later patch, the snd_power_wait() calls will be cleaned up.
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20210523090920.15345-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fix some spelling mistakes in comments:
aother ==> another
Netiher ==> Neither
desribe ==> describe
intializing ==> initializing
funciton ==> function
wont ==> won't and move the word 'the' at the end to the next line
accross ==> across
pathes ==> paths
triggerred ==> triggered
excute ==> execute
ether ==> either
conervative ==> conservative
convetion ==> convention
markes ==> marks
interpeter ==> interpreter
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210525025659.8898-2-thunder.leizhen@huawei.com
Pull cgroup fixes from Tejun Heo:
- "cgroup_disable=" boot param was being applied too late confusing
some subsystems. Fix it by moving application to __setup() time.
- Comment spelling fixes. Included here to lower the chance of trivial
future merge conflicts.
* 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix spelling mistakes
cgroup: disable controllers at parse time
Pull spi fixes from Mark Brown:
"There's some device specific fixes here but also an unusually large
number of fixes for the core, including both fixes for breakage
introduced on ACPI systems while fixing the long standing confusion
about the polarity of GPIO chip selects specified through DT, and
fixes for ordering issues on unregistration which have been exposed
through the wider usage of devm_."
* tag 'spi-fix-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: sc18is602: implement .max_{transfer,message}_size() for the controller
spi: sc18is602: don't consider the chip select byte in sc18is602_check_transfer
MAINTAINERS: Add Alain Volmat as STM32 SPI maintainer
dt-bindings: spi: spi-mux: rename flash node
spi: Don't have controller clean up spi device before driver unbind
spi: Assume GPIO CS active high in ACPI case
spi: sprd: Add missing MODULE_DEVICE_TABLE
spi: Switch to signed types for *_native_cs SPI controller fields
spi: take the SPI IO-mutex in the spi_set_cs_timing method
spi: spi-fsl-dspi: Fix a resource leak in an error handling path
spi: spi-zynq-qspi: Fix stack violation bug
spi: spi-zynq-qspi: Fix kernel-doc warning
spi: altera: Make SPI_ALTERA_CORE invisible
spi: Fix spi device unregister flow
Fix some spelling mistakes in comments:
hierarhcy ==> hierarchy
automtically ==> automatically
overriden ==> overridden
In absense of .. or ==> In absence of .. and
assocaited ==> associated
taget ==> target
initate ==> initiate
succeded ==> succeeded
curremt ==> current
udpated ==> updated
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The dormant flag need to be updated from the preparation phase,
otherwise, two consecutive requests to dorm a table in the same batch
might try to remove the same hooks twice, resulting in the following
warning:
hook not found, pf 3 num 0
WARNING: CPU: 0 PID: 334 at net/netfilter/core.c:480 __nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480
Modules linked in:
CPU: 0 PID: 334 Comm: kworker/u4:5 Not tainted 5.12.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:__nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480
This patch is a partial revert of 0ce7cf4127 ("netfilter: nftables:
update table flags from the commit phase") to restore the previous
behaviour.
However, there is still another problem: A batch containing a series of
dorm-wakeup-dorm table and vice-versa also trigger the warning above
since hook unregistration happens from the preparation phase, while hook
registration occurs from the commit phase.
To fix this problem, this patch adds two internal flags to annotate the
original dormant flag status which are __NFT_TABLE_F_WAS_DORMANT and
__NFT_TABLE_F_WAS_AWAKEN, to restore it from the abort path.
The __NFT_TABLE_F_UPDATE bitmask allows to handle the dormant flag update
with one single transaction.
Reported-by: syzbot+7ad5cd1615f2d89c6e7e@syzkaller.appspotmail.com
Fixes: 0ce7cf4127 ("netfilter: nftables: update table flags from the commit phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The tags used for an IO scheduler are currently per hctx.
As such, when q->nr_hw_queues grows, so does the request queue total IO
scheduler tag depth.
This may cause problems for SCSI MQ HBAs whose total driver depth is
fixed.
Ming and Yanhui report higher CPU usage and lower throughput in scenarios
where the fixed total driver tag depth is appreciably lower than the total
scheduler tag depth:
https://lore.kernel.org/linux-block/440dfcfc-1a2c-bd98-1161-cec4d78c6dfc@huawei.com/T/#mc0d6d4f95275a2743d1c8c3e4dc9ff6c9aa3a76b
In that scenario, since the scheduler tag is got first, much contention
is introduced since a driver tag may not be available after we have got
the sched tag.
Improve this scenario by introducing request queue-wide tags for when
a tagset-wide sbitmap is used. The static sched requests are still
allocated per hctx, as requests are initialised per hctx, as in
blk_mq_init_request(..., hctx_idx, ...) ->
set->ops->init_request(.., hctx_idx, ...).
For simplicity of resizing the request queue sbitmap when updating the
request queue depth, just init at the max possible size, so we don't need
to deal with the possibly with swapping out a new sbitmap for old if
we need to grow.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1620907258-30910-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have already delete block_dump feature in mark_inode_dirty() because
it can be replaced by tracepoints, now we also remove the part in
submit_bio() for the same reason. The part of block dump feature in
submit_bio() dump the write process, write region and sectors on the
target disk into kernel message. it can be replaced by
block_bio_queue tracepoint in submit_bio_checks(), so we do not need
block_dump anymore, remove the whole block_dump feature.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210313030146.2882027-3-yi.zhang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Export irq_set_affinity() for cleaning up drivers/perf
Pull export of irq_set_affinity() from Thomas Gleixner, so we can convert
all new and exiting Arm PMU drivers to the new interface.
* tag 'irq-export-set-affinity' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Export affinity setter for modules
Current .n_voltages settings do not cover the latest 2 valid selectors,
so it fails to set voltage for the hightest voltage support.
The latest linear range has step_uV = 0, so it does not matter if we
count the .n_voltages to maximum selector + 1 or the first selector of
latest linear range + 1.
To simplify calculating the n_voltages, let's just set the
.n_voltages to maximum selector + 1.
Fixes: 522498f8cb ("regulator: bd71828: Basic support for ROHM bd71828 PMIC regulators")
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Reviewed-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Link: https://lore.kernel.org/r/20210523071045.2168904-2-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Until now, the MPEG-2 V4L2 API was not exported as a public API,
and only defined in a private media header (media/mpeg2-ctrls.h).
After reviewing the MPEG-2 specification in detail, and reworking
the controls so they match the MPEG-2 semantics properly,
we can consider it ready.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Tested-by: Jernej Skrabec <jernej.skrabec@siol.net>
Reviewed-by: Jernej Skrabec <jernej.skrabec@siol.net>
Tested-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The Hantro and Cedrus drivers work in frame-mode,
meaning they expect all the slices in a picture (either frame
or field structure) to be passed in each OUTPUT buffer.
These two are the only V4L2 MPEG-2 stateless decoders currently
supported. Given the VA-API drivers also work per-frame,
coalescing all the MPEG-2 slices in a buffer before the decoding
operation, it makes sense to not expect slice-mode drivers and
therefore remove V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS.
This is done to avoid carrying an unused interface. If needed,
this control can be added without breaking backwards compatibility.
Note that this would mean introducing a enumerator control to
specify the decoding mode (see V4L2_CID_STATELESS_H264_DECODE_MODE).
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Co-developed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Tested-by: Jernej Skrabec <jernej.skrabec@siol.net>
Reviewed-by: Jernej Skrabec <jernej.skrabec@siol.net>
Tested-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Typically, bitstreams are composed of a sequence header,
followed by a number of picture header and picture coding extension
headers. Each picture can be composed of a number of slices.
Let's split the MPEG-2 uAPI to follow these semantics more closely,
allowing more usage flexibility. Having these controls split up
allows applications to set a sequence control at the beginning
of a sequence, and then set a picture control for each frame.
While here add padding fields where needed, and document
the uAPI header thoroughly.
Note that the V4L2_CTRL_TYPE_{} defines had to be moved because
it clashes with existing ones. This is not really an issue
since they will be re-defined when the controls are moved
out of staging.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Tested-by: Jonas Karlman <jonas@kwiboo.se>
Tested-by: Jernej Skrabec <jernej.skrabec@siol.net>
Reviewed-by: Jernej Skrabec <jernej.skrabec@siol.net>
Tested-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
As stated in the MPEG-2 specification, section 6.3.7 "Quant matrix
extension":
Each quantisation matrix has a default set of values. When a
sequence_header_code is decoded all matrices shall be reset to
their default values. User defined matrices may be downloaded
and this can occur in a sequence_header() or in a
quant_matrix_extension().
The load_intra_quantiser_matrix syntax elements are transmitted
in the bitstream headers, signalling that a quantisation matrix
needs to be loaded and used for pictures transmitted afterwards
(until the matrices are reset).
This "load" semantics are implemented in the V4L2 interface
without the need of any "load" flags: passing the control
is effectively a load.
Therefore, rework the V4L2_CID_MPEG_VIDEO_MPEG2_QUANTISATION
semantics to match the MPEG-2 semantics. Quantisation matrices
values are now initialized by the V4L2 control core to their
reset default value, and applications are expected to reset
their values as specified.
The quantisation control is therefore optional, and used to
load bitstream-defined values in the quantisation matrices.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Co-developed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Tested-by: Jernej Skrabec <jernej.skrabec@siol.net>
Reviewed-by: Jernej Skrabec <jernej.skrabec@siol.net>
Tested-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Pull block fixes from Jens Axboe:
- Fix BLKRRPART and deletion race (Gulam, Christoph)
- NVMe pull request (Christoph):
- nvme-tcp corruption and timeout fixes (Sagi Grimberg, Keith
Busch)
- nvme-fc teardown fix (James Smart)
- nvmet/nvme-loop memory leak fixes (Wu Bo)"
* tag 'block-5.13-2021-05-22' of git://git.kernel.dk/linux-block:
block: fix a race between del_gendisk and BLKRRPART
block: prevent block device lookups at the beginning of del_gendisk
nvme-fc: clear q_live at beginning of association teardown
nvme-tcp: rerun io_work if req_list is not empty
nvme-tcp: fix possible use-after-completion
nvme-loop: fix memory leak in nvme_loop_create_ctrl()
nvmet: fix memory leak in nvmet_alloc_ctrl()
SG BOs such as dmabuf imports and userptr BOs do not consume system
resources directly. Instead they point to resources owned elsewhere.
They typically get evicted by DMABuf move notifiers of MMU notifiers.
If those notifiers don't need to wait for hardware fences (i.e. the SG
BOs are used in a preemptible context), then we don't need to limit
them to the GTT size and we don't need TTM to evict them.
Create a new placement for such preemptible SG BOs that does not impose
artificial size limits and TTM evictions.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When device_link_free() drops references to the supplier and
consumer devices of the device link going away and the reference
being dropped turns out to be the last one for any of those
device objects, its ->release callback will be invoked and it
may sleep which goes against the SRCU callback execution
requirements.
To address this issue, make the device link removal code carry out
the device_link_free() actions preceded by SRCU synchronization from
a separate work item (the "long" workqueue is used for that, because
it does not matter when the device link memory is released and it may
take time to get to that point) instead of using SRCU callbacks.
While at it, make the code work analogously when SRCU is not enabled
to reduce the differences between the SRCU and non-SRCU cases.
Fixes: 843e600b8a ("driver core: Fix sleeping in invalid context during device link deletion")
Cc: stable <stable@vger.kernel.org>
Reported-by: chenxiang (M) <chenxiang66@hisilicon.com>
Tested-by: chenxiang (M) <chenxiang66@hisilicon.com>
Reviewed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/5722787.lOV4Wx5bFT@kreacher
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>