Introduce ib_umem_dmabuf_get_pinned() which allows the driver to get a
dmabuf umem which is pinned and does not require move_notify callback
implementation.
The returned umem is pinned and DMA mapped like standard cpu umems, and is
released through ib_umem_release() (incl. unpinning and unmapping).
Link: https://lore.kernel.org/r/20211012120903.96933-3-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
All tcp_remove_empty_skb() callers now use tcp_write_queue_tail()
for the skb argument, so we can factorize the code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hyper-V provides a GHCB protocol to write Synthetic Interrupt
Controller MSR registers in an Isolation VM with AMD SEV-SNP,
and these registers are emulated by the hypervisor directly.
Hyper-V requires the SINTx MSR registers to be written twice:
first via the GHCB page to communicate with the hypervisor,
and then with the wrmsr instruction to talk to the paravisor,
which runs in VMPL0. The Guest OS ID MSR also needs to be set
via the GHCB page.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-7-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Add flag returned by FUSE_OPEN and FUSE_CREATE requests to avoid flushing
data cache on close.
Different filesystems implement ->flush() in different ways:
- Most disk filesystems do not implement ->flush() at all
- Some network filesystems (e.g. nfs) flush the local write cache of
  FMODE_WRITE files and send a "flush" command to the server
- Some network filesystems (e.g. cifs) flush the local write cache of
  FMODE_WRITE files without sending an additional command to the server
FUSE flushes local write cache of ANY file, even non FMODE_WRITE
and sends a "flush" command to server (if server implements it).
The FUSE implementation of ->flush() seems overly aggressive and
arbitrary, and does not make a lot of sense when writeback caching is
disabled.
Instead of deciding on another arbitrary implementation that makes
sense, leave the choice of per-file flush behavior in the hands of
the server.
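The per-file decision described above can be sketched in plain C as
follows. This is only an illustration: the flag name, its bit position,
and the helper are hypothetical (the text above does not name the flag),
and honouring the flag only when writeback caching is disabled is an
assumption drawn from the reasoning above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the commit text does not name the flag. */
#define EXAMPLE_FOPEN_NOFLUSH (1u << 5)

/* Decide whether close() should flush the local data cache. */
bool example_should_flush_on_close(uint32_t open_flags, bool writeback_cache)
{
	/*
	 * Assumption: the server's opt-out is honoured only when there is
	 * no writeback cache that would still need to be drained.
	 */
	if ((open_flags & EXAMPLE_FOPEN_NOFLUSH) && !writeback_cache)
		return false;
	return true;
}
```

A server that sets the flag in its open or create reply gets no flush on
close for that file; every other file keeps the existing behavior.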
Link: https://lore.kernel.org/linux-fsdevel/CAJfpegspE8e6aKd47uZtSYX8Y-1e1FWS0VL0DH2Skb9gQP5RJQ@mail.gmail.com/
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Add FourCCs for 10- and 12-bit red formats with padding to 16 bits.
They correspond to the V4L2 10- and 12-bit greyscale (V4L2_PIX_FMT_Y10
and V4L2_PIX_FMT_Y12) formats, as well as the Bayer formats with the
same bit depth (V4L2_PIX_FMT_SBGGR{10,12} and all other Bayer pattern
permutations).
These formats are not used by any kernel driver at this point, but need
to be exposed to applications by libcamera, which uses DRM FourCCs for
pixel formats.
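As a rough illustration of what such a format code and sample layout look
like, the sketch below packs four characters into a 32-bit code and masks
a padded 16-bit word down to its sample. The helper names are
hypothetical, and placing the sample in the low bits with the padding in
the high bits is an assumption, not something stated above.

```c
#include <stdint.h>

/* Pack four characters into a 32-bit code, least significant byte first. */
uint32_t example_fourcc(char a, char b, char c, char d)
{
	return (uint32_t)(uint8_t)a | ((uint32_t)(uint8_t)b << 8) |
	       ((uint32_t)(uint8_t)c << 16) | ((uint32_t)(uint8_t)d << 24);
}

/* Assumption: the sample sits in the low bits, the padding in the high bits. */
uint16_t example_unpack_sample(uint16_t word, unsigned int bits)
{
	return word & (uint16_t)((1u << bits) - 1u);
}
```

For instance, example_fourcc('R', '1', '0', ' ') gives one 32-bit value
that an application can compare against, and
example_unpack_sample(word, 10) drops the six padding bits.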
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027233140.12268-1-laurent.pinchart@ideasonboard.com
In command DSP models, each meter value consists of 4 bytes holding an
IEEE 754 floating point number (binary32). In the previous patch, it is
exported to userspace as 32 bit storage, since the storage is also
handled by the ALSA firewire-motu driver in kernel space, where floating
point arithmetic is not preferable. However, the ALSA firewire-motu
driver doesn't perform any floating point calculation. The driver just
gathers meter information from isochronous packets and fills structure
fields for userspace.
In the 'header' target of Kbuild, UAPI headers are processed before being
installed. At that point, #ifdef blocks conditional on __KERNEL__ are
removed. This mechanism is useful here, since the 32 bit storage can
be accessed as u32 type in kernel space and as float type in user space.
The same usage can be seen in 'struct acct_v3' in
'include/uapi/linux/acct.h'.
This commit implements the above idea. Additionally, due to the message
protocol, meter information is filled with 0xffffffff at the end of the
period, but 0xffffffff is invalid as binary32. To avoid confusion in
userspace applications, the last two elements are left without any
assignment.
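A minimal userspace sketch of the idea, with hypothetical names and a
made-up array size: the same 32 bits are declared as an integer when
__KERNEL__ is defined and as a float otherwise, and 0xffffffff decodes to
a NaN, which is why it cannot serve as a meter level.

```c
#include <stdint.h>
#include <string.h>

#define EXAMPLE_METER_COUNT 4	/* hypothetical size, not the driver's */

struct example_meter {
#ifdef __KERNEL__
	uint32_t data[EXAMPLE_METER_COUNT];	/* kernel view: raw storage */
#else
	float data[EXAMPLE_METER_COUNT];	/* userspace view: binary32 */
#endif
};

/* Reinterpret 32 bits from the wire as binary32 without any arithmetic. */
float example_bits_to_float(uint32_t bits)
{
	float value;

	memcpy(&value, &bits, sizeof(value));
	return value;
}

/* A NaN never compares equal to itself; 0xffffffff decodes to a NaN. */
int example_is_valid_level(uint32_t bits)
{
	float value = example_bits_to_float(bits);

	return value == value;
}
```

Both views have the same size and layout, so no conversion is needed when
the structure crosses the kernel/userspace boundary.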
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20211027125529.54295-4-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
After further investigation, I realized that the total number of elements
in the array is not enough to store all of the related messages from the
device. This commit refines the meter array and the message parser.
In terms of channel identifiers, register DSP models are classified into
two categories:
1. the target of output is selectable
828mk2, 896hd, and Traveler are in the category. They transfer messages
with channel identifier between 0x00 and 0x13 for input meters,
therefore 20 elements are needed to store.
On the other hand, they transfer messages with channel identifier for one
pair of output meters. The selection is done by asynchronous write
transaction to offset 0x'ffff'f000'0b2c. The table for relationship
between written value and available identifiers is below:
============= ===============
written value identifier pair
============= ===============
0x00000b00    0x80/0x81
0x00000b01    0x82/0x83
...           ...
0x00000b0b    0x96/0x97
...           ...
0x00000b10    0xa0/0xa1
...           ...
0x00000b3f    0xfe/0xff
...           ...
greater       0xfe/0xff
============= ===============
Actually, in the above three models, the 0x96/0x97 pair is the maximum.
Thus the number of available output meters is 24.
2. all of output is available
8 pre, Ultralite, Audio Express, and 4 pre are in the category. They
transfer messages for output meters without any selection. The table for
available identifier for each direction is below:
============== ========= ==========
model          input     output
============== ========= ==========
8 pre          0x00-0x0f 0x82-0x8d
Ultralite      0x00-0x09 0x82-0x8f
Audio Express  0x00-0x09 0x80-0x8d
4 pre          0x00-0x09 0x80-0x8d
============== ========= ==========
Some of available identifiers might not be used for actual output meters.
Anyway, 24 plus 24 elements accommodate the input/output meters.
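The identifier ranges above suggest a simple mapping into a 48-element
array, sketched below. The function names and the slot layout (inputs
first, outputs after) are assumptions for illustration. The linear rule
for the written value is only inferred from the listed table rows; the
elided rows are not confirmed here.

```c
#include <stdint.h>

#define EXAMPLE_METER_INPUT_COUNT  24
#define EXAMPLE_METER_OUTPUT_COUNT 24

/* Map a channel identifier to a slot in a 48-element array, or -1. */
int example_meter_slot(uint8_t id)
{
	if (id < EXAMPLE_METER_INPUT_COUNT)
		return id;	/* inputs: 0x00..0x17 */
	if (id >= 0x80 && id < 0x80 + EXAMPLE_METER_OUTPUT_COUNT)
		return EXAMPLE_METER_INPUT_COUNT + (id - 0x80);	/* 0x80..0x97 */
	return -1;		/* outside the stored range */
}

/* First identifier of the output pair selected by the written value. */
int example_output_pair_base(uint32_t written)
{
	uint32_t n;

	if (written < 0x00000b00u)
		return -1;	/* not covered by the table */
	n = written - 0x00000b00u;
	if (n > 0x3f)
		n = 0x3f;	/* greater values behave like 0x00000b3f */
	return 0x80 + 2 * (int)n;
}
```

With this layout, identifier 0x13 lands in slot 19 and identifier 0x96 in
slot 46, so both directions fit in the 24 plus 24 elements.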
I note that isochronous packets from V3HD/V4HD deliver no messages.
Notification by asynchronous transaction to a registered address seems to
be used for the purpose, as well as for changes of mixer parameters.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20211027125529.54295-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
v4.17 commit 86b87cde0b ("scsi: core: host template attribute groups")
introduced explicit sysfs_create_groups() in scsi_sysfs_add_sdev()
and sysfs_remove_groups() in __scsi_remove_device(), both for sdev_gendev,
based on a new field const struct attribute_group **sdev_groups
of struct scsi_host_template.
Commit 92c4b58b15 ("scsi: core: Register sysfs attributes earlier")
removed above explicit (de)registration of scsi_device attribute groups.
It also converted all scsi_device attributes and attribute_groups to
end up in a new field const struct attribute_group *gendev_attr_groups[6]
of struct scsi_device. However, that new field was not used anywhere.
Surprisingly, this only caused missing LLDD specific scsi_device sysfs
attributes, whereas scsi core attributes from scsi_sdev_attr_groups
continued to exist because of scsi_dev_type.groups.
We separate scsi core attributes from LLDD specific attributes.
Hence, we keep the initializing assignment scsi_dev_type =
{ .groups = scsi_sdev_attr_groups, } as this takes care of core
attributes. Without the separation, it would cause attribute double
registration due to scsi_dev_type.groups and sdev_gendev.groups.
Julian suggested to assign the sdev_groups pointer of the
scsi_host_template directly to the groups pointer of sdev_gendev.
This way we can delete the container scsi_device.gendev_attr_groups
and the loop copying each entry from hostt->sdev_groups to
sdev->gendev_attr_groups.
Alternative approaches ruled out:
Assigning gendev_attr_groups to sdev_dev has no visible effect.
Assigning sdev->gendev_attr_groups to scsi_dev_type.groups
caused scsi_device of all scsi host types to get LLDD specific
attributes of the LLDD for which the last sdev alloc happened to occur,
as that overwrote scsi_dev_type.groups,
e.g. scsi_debug had zfcp-specific scsi_device attributes.
Link: https://lore.kernel.org/r/20211026014240.4098365-1-maier@linux.ibm.com
Fixes: 92c4b58b15 ("scsi: core: Register sysfs attributes earlier")
Suggested-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A later patch will change the MPTCP memory accounting schema
in such a way that MPTCP sockets will encode the total amount of
forward allocated memory in two separate fields (one for tx and
one for rx).
MPTCP sockets will use their own helper to provide the accurate
amount of fwd allocated memory.
To allow the above, this patch adds a new, optional, sk method to
fetch the fwd memory, wraps the call in a new helper and uses it
where appropriate.
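The optional-method pattern described above can be sketched in standalone
C as below. All struct, field, and function names are hypothetical
stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

struct example_sock;

struct example_proto {
	/* Optional: NULL means "use the generic field". */
	int (*forward_alloc_get)(const struct example_sock *sk);
};

struct example_sock {
	const struct example_proto *prot;
	int forward_alloc;	/* generic single counter */
	int tx_alloc;		/* hypothetical split counters */
	int rx_alloc;
};

/* Helper that callers use instead of reading the field directly. */
int example_forward_alloc_get(const struct example_sock *sk)
{
	if (!sk->prot->forward_alloc_get)
		return sk->forward_alloc;
	return sk->prot->forward_alloc_get(sk);
}

/* Protocol-specific method that sums the two split counters. */
int example_split_forward_alloc(const struct example_sock *sk)
{
	return sk->tx_alloc + sk->rx_alloc;
}
```

Protocols that leave the method unset keep the old behavior, while a
protocol with split accounting plugs in its own getter.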
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A following patch is going to implement a similar reclaim schema
for the MPTCP protocol, with different locking.
Let's define a couple of macros for the used thresholds, so
that the latter code will be more easily maintainable.
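A minimal sketch of what named thresholds buy, with hypothetical macro
names and assumed values (the text above gives neither):

```c
#define EXAMPLE_RECLAIM_THRESHOLD (1 << 21)	/* assumed value */
#define EXAMPLE_RECLAIM_CHUNK     (1 << 20)	/* assumed value */

/* Return how many bytes to give back for the current reclaimable amount. */
int example_reclaim_amount(int reclaimable)
{
	if (reclaimable >= EXAMPLE_RECLAIM_THRESHOLD)
		return EXAMPLE_RECLAIM_CHUNK;
	return 0;
}
```

A second protocol can then apply the same policy under its own locking
without repeating the magic numbers.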
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported data-races in inet_getname() multiple times,
it is time we fix this instead of pretending applications
should not trigger them.
getsockname() and getpeername() are not really considered fast path.
v2: added the missing BPF_CGROUP_RUN_SA_PROG() declaration
needed when CONFIG_CGROUP_BPF=n, as reported by
kernel test robot <lkp@intel.com>
syzbot typical report:
BUG: KCSAN: data-race in __inet_hash_connect / inet_getname
write to 0xffff888136d66cf8 of 2 bytes by task 14374 on cpu 1:
__inet_hash_connect+0x7ec/0x950 net/ipv4/inet_hashtables.c:831
inet_hash_connect+0x85/0x90 net/ipv4/inet_hashtables.c:853
tcp_v4_connect+0x782/0xbb0 net/ipv4/tcp_ipv4.c:275
__inet_stream_connect+0x156/0x6e0 net/ipv4/af_inet.c:664
inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:728
__sys_connect_file net/socket.c:1896 [inline]
__sys_connect+0x254/0x290 net/socket.c:1913
__do_sys_connect net/socket.c:1923 [inline]
__se_sys_connect net/socket.c:1920 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:1920
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888136d66cf8 of 2 bytes by task 14408 on cpu 0:
inet_getname+0x11f/0x170 net/ipv4/af_inet.c:790
__sys_getsockname+0x11d/0x1b0 net/socket.c:1946
__do_sys_getsockname net/socket.c:1961 [inline]
__se_sys_getsockname net/socket.c:1958 [inline]
__x64_sys_getsockname+0x3e/0x50 net/socket.c:1958
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000 -> 0xdee0
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 14408 Comm: syz-executor.3 Not tainted 5.15.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211026213014.3026708-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently rcu_barrier() is used to ensure that no readers of the
inactive mini_Qdisc buffer remain before it is reused. This waits for
any pending RCU callbacks to complete, when all that is actually
required is to wait for one RCU grace period to elapse after the buffer
was made inactive. This means that using rcu_barrier() may result in
unnecessary waits.
To improve this, store the current RCU state when a buffer is made
inactive and use poll_state_synchronize_rcu() to check whether a full
grace period has elapsed before reusing it. If a full grace period has
not elapsed, wait for a grace period to elapse, and in the non-RT case
use synchronize_rcu_expedited() to hasten it.
Since this approach eliminates the RCU callback it is no longer
necessary to synchronize_rcu() in the tp_head==NULL case. However, the
RCU state should still be saved for the previously active buffer.
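The save-then-poll flow can be illustrated with a toy model in userspace
C. This is not the RCU API: the grace-period counter, the cookie rule,
and every name below are simplified stand-ins invented for the sketch.

```c
#include <stdbool.h>

/* Toy stand-in for RCU: a counter of completed grace periods. */
unsigned long example_gp_completed;

/* Cookie: the grace period that must complete before reuse is safe. */
unsigned long example_get_state(void)
{
	return example_gp_completed + 1;
}

/* True once the grace period named by the cookie has completed. */
bool example_poll_state(unsigned long cookie)
{
	return example_gp_completed >= cookie;
}

/* Stand-in for waiting: here a grace period simply "elapses". */
void example_wait_for_grace_period(void)
{
	example_gp_completed++;
}

struct example_buffer {
	unsigned long rcu_state;	/* saved when the buffer went inactive */
};

void example_make_inactive(struct example_buffer *buf)
{
	buf->rcu_state = example_get_state();
}

/* Returns true if a wait was needed before the buffer could be reused. */
bool example_prepare_reuse(struct example_buffer *buf)
{
	if (example_poll_state(buf->rcu_state))
		return false;
	example_wait_for_grace_period();
	return true;
}
```

The point of the pattern is the early return: when enough time has already
passed since the buffer went inactive, reuse costs only a comparison.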
Before this change I would typically see mini_qdisc_pair_swap() take
tens of milliseconds to complete. After this change it typically
finishes in less than 1 ms, and often it takes just a few microseconds.
Thanks to Paul for walking me through the options for improving this.
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Link: https://lore.kernel.org/r/20211026130700.121189-1-seth@forshee.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the dedicated wake IRQ is level triggered and uses the
device's low-power status as the wakeup source, the wake IRQ will
be triggered, if enabled, whenever the device is not in a low-power
state. In this case, the wake IRQ needs to be enabled after running
the device's ->runtime_suspend(), which makes it enter the low-power
state.
e.g.
Assume the wake IRQ is a low level trigger type, and the wakeup
signal comes from the low-power status of the device.
The wakeup signal is low level at running time (0), and becomes
high level when the device enters the low-power state (runtime_suspend
(1) is called). A wakeup event at (2) makes the device exit the
low-power state, and then the wakeup signal also becomes low level.
                ------------------
               |           ^     ^|
----------------           |     | --------------
 |<---(0)--->|<--(1)--|   (3)   (2)    (4)
If the wake IRQ is enabled before running runtime_suspend, during (0),
a wake IRQ will arise and cause an immediate resume.
It works if the wake IRQ is enabled (e.g. at (3) or (4)) after running
->runtime_suspend().
This patch introduces a new status WAKE_IRQ_DEDICATED_REVERSE to
optionally support enabling wake IRQ after running ->runtime_suspend().
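The ordering problem can be shown with a small simulation. The device
model and all names below are hypothetical; the only thing taken from the
description is that a low level triggered wake IRQ is asserted whenever
the device is not in its low-power state.

```c
#include <stdbool.h>

struct example_dev {
	bool low_power;		/* true once runtime_suspend has run */
	bool wake_irq_enabled;
};

/* Low level triggered: the line is low while the device is running. */
bool example_wake_irq_fires(const struct example_dev *dev)
{
	return dev->wake_irq_enabled && !dev->low_power;
}

/* Returns true if the suspend is undone by an immediate wake IRQ. */
bool example_suspend(struct example_dev *dev, bool enable_after_suspend)
{
	bool spurious = false;

	if (!enable_after_suspend) {
		dev->wake_irq_enabled = true;	/* enabled during (0) */
		spurious = example_wake_irq_fires(dev);
	}
	dev->low_power = true;			/* runtime_suspend, (1) */
	if (enable_after_suspend) {
		dev->wake_irq_enabled = true;	/* enabled at (3) */
		spurious = example_wake_irq_fires(dev);
	}
	return spurious;
}
```

Enabling first reports a spurious wakeup; enabling after the suspend
callback does not.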
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
apei_hest_parse() is only used in hest.c, so mark it static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ rjw: Minor subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 316580b69d ("u64_stats: provide u64_stats_t type")
fixed possible load/store tearing on 64bit arches.
For instance the following C code
stats->nsecs += sched_clock() - start;
could legitimately be implemented like this by a compiler,
confusing concurrent readers a lot:
stats->nsecs += sched_clock();
// arbitrary delay
stats->nsecs -= start;
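A userspace analogue of the fix, using C11 atomics as a stand-in for the
kernel's u64_stats_t helpers (the struct and function names are
hypothetical): the delta is computed in a private variable and published
with a single atomic add, so a reader can never observe the intermediate
value.

```c
#include <stdatomic.h>
#include <stdint.h>

struct example_stats {
	_Atomic uint64_t nsecs;
};

/* Publish the elapsed time as one indivisible update. */
void example_stats_add_elapsed(struct example_stats *stats,
			       uint64_t start, uint64_t now)
{
	uint64_t delta = now - start;	/* computed privately first */

	atomic_fetch_add_explicit(&stats->nsecs, delta, memory_order_relaxed);
}

/* Readers load the counter in one access as well. */
uint64_t example_stats_read(struct example_stats *stats)
{
	return atomic_load_explicit(&stats->nsecs, memory_order_relaxed);
}
```

Relaxed ordering is enough here because only the atomicity of each access
matters, not its ordering against other memory operations.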
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026214133.3114279-4-eric.dumazet@gmail.com
Vinod writes:
phy-for-5.16
- New support:
- Kirin 970 PCIe PHY driver
- Qualcomm QCM2290 USB2 and USB3 support
- Updates:
- Qualcomm Synopsys phy driver updates
- sc8180x PCIe update
- cadence-torrent driver updates for output reference clock
- stm32 phy tuning support
* tag 'phy-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (28 commits)
phy: Sparx5 Eth SerDes: Fix return value check in sparx5_serdes_probe()
phy: qcom-snps: Correct the FSEL_MASK
phy: hisilicon: Add of_node_put() in phy-hisi-inno-usb2
phy: qcom-qmp: another fix for the sc8180x PCIe definition
phy: cadence-torrent: Add support to output received reference clock
phy: cadence-torrent: Model reference clock driver as a clock to enable derived refclk
dt-bindings: phy: cadence-torrent: Add clock IDs for derived and received refclk
phy: cadence-torrent: Migrate to clk_hw based registration and OF APIs
phy: ti: gmii-sel: check of_get_address() for failure
dt-bindings: phy: qcom,qmp: IPQ6018 and IPQ8074 PCIe PHY require no supply
phy: stm32: add phy tuning support
dt-bindings: phy: phy-stm32-usbphyc: add optional phy tuning properties
phy: stm32: restore utmi switch on resume
dt-bindings: phy: rockchip: remove usb-phy fallback string for rk3066a/rk3188
phy: qcom-qusb2: Fix a memory leak on probe
phy: qcom-qmp: Add QCM2290 USB3 PHY support
dt-bindings: phy: qcom,qmp: Add QCM2290 USB3 PHY
phy: qcom-qusb2: Add missing vdd supply
dt-bindings: phy: qcom,qusb2: Add missing vdd-supply
phy: rockchip-inno-usb2: Make use of the helper function devm_add_action_or_reset()
...
To reduce code churn, the same patch makes multiple changes, since they
all touch the same lines:
1. The implementations for these two are identical, just with different
function pointers. Reduce duplications and name the function pointers
"mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument.
2. Drop the "const" attribute from "orig_dev". If the driver needs to
check whether orig_dev belongs to itself and then
call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it
can't, because call_switchdev_notifiers takes a non-const struct
net_device *.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FAN_FS_ERROR allows events without inodes - i.e. for file system-wide
errors. Even though fsnotify_handle_inode_event is not currently used
by fanotify, this patch protects other backends from cases where neither
inode nor dir are provided. Also document the constraints of the
interface (inode and dir cannot both be NULL).
Link: https://lore.kernel.org/r/20211025192746.66445-12-krisman@collabora.com
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Some file system events (e.g. FS_ERROR) might not be associated with an
inode or directory. For these, we can retrieve the super block from the
data field. But, since the super_block is available in the data field
on every event type, simplify the code to always retrieve it from there,
through a new helper.
Link: https://lore.kernel.org/r/20211025192746.66445-11-krisman@collabora.com
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Define a new data type to pass for events: FSNOTIFY_EVENT_DENTRY.
Use it to pass the dentry instead of its ->d_inode where available.
This is needed in preparation for the refactor to retrieve the super
block from the data field. In some cases (e.g. mkdir in kernfs), the
data inode comes from a negative dentry, such that no super block
information would be available. By receiving the dentry itself, instead
of the inode, fsnotify can derive the super block even in these cases.
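Why the dentry helps can be sketched with mock types. Everything below is
a hypothetical stand-in for the kernel structures: a negative dentry has
no inode, but it still points at its super block.

```c
#include <stddef.h>

struct example_super_block { int id; };
struct example_inode { struct example_super_block *i_sb; };
struct example_dentry {
	struct example_inode *d_inode;		/* NULL for a negative dentry */
	struct example_super_block *d_sb;	/* set even when d_inode is NULL */
};

enum example_data_type {
	EXAMPLE_EVENT_NONE,
	EXAMPLE_EVENT_INODE,
	EXAMPLE_EVENT_DENTRY,
};

/* Derive the super block from the event data, whatever its type. */
struct example_super_block *example_data_sb(const void *data,
					    enum example_data_type type)
{
	switch (type) {
	case EXAMPLE_EVENT_INODE:
		return ((const struct example_inode *)data)->i_sb;
	case EXAMPLE_EVENT_DENTRY:
		return ((const struct example_dentry *)data)->d_sb;
	default:
		return NULL;
	}
}
```

Passing the inode of a negative dentry would hand the helper a NULL
pointer, whereas passing the dentry always yields a super block.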
Link: https://lore.kernel.org/r/20211025192746.66445-3-krisman@collabora.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
[Expand explanation in commit message]
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>