On little-endian systems, doing subtraction after htons()
leads to interesting results:
Given:
MAGIC_BYTES = 123 = 0x007B aka. in big endian: 0x7B00 = 31488
sizeof(struct iphdr) = 20
Before this patch:
__bpf_constant_htons(MAGIC_BYTES) - sizeof(struct iphdr) = 0x7AEC
0x7AEC = htons(0xEC7A) = htons(60538)
So these were outer IP packets with a total length of 123 bytes,
containing an inner IP packet with a total length of 60538 bytes.
After this patch:
__bpf_constant_htons(MAGIC_BYTES - sizeof(struct iphdr)) = htons(103)
Now these packets are outer IP packets with a total length of 123 bytes,
containing an inner IP packet with a total length of 103 bytes.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20240808075906.1849564-1-ast@fiberby.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Alan Maguire says:
====================
As previously discussed here [1], long-lived sockets can miss
a chance to set additional callbacks if a sock ops program
was not attached early in their lifetime. Adding support
to bpf_setsockopt() to set callback flags (and bpf_getsockopt()
to retrieve them) provides other opportunities to enable callbacks,
either directly via a cgroup/setsockopt intercepted setsockopt()
or via a socket iterator.
Patch 1 adds bpf_[get|set]sockopt() support; patch 2 adds testing
for it via a sockops programs, along with verification via a
cgroup/getsockopt program.
Changes since v1 [2]:
- Removed unneeded READ_ONCE() (Martin, patch 1)
- Reworked sockopt test to leave existing tests undisturbed while adding
test_nonstandard_opt() test to cover the TCP_BPF_SOCK_OPS_CB_FLAGS
case; test verifies that value set via bpf_setsockopt() is what we
expect via a call to getsockopt() which is caught by a
cgroup/getsockopt program to provide the flags value (Martin, patch 2)
- Removed unneeded iterator test (Martin)
[1] https://lore.kernel.org/bpf/f42f157b-6e52-dd4d-3d97-9b86c84c0b00@oracle.com/
[2] https://lore.kernel.org/bpf/20240802152929.2695863-1-alan.maguire@oracle.com/
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Currently the only opportunity to set sock ops flags dictating
which callbacks fire for a socket is from within a TCP-BPF sockops
program. This is problematic if the connection is already set up
as there is no further chance to specify callbacks for that socket.
Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_setsockopt() and bpf_getsockopt()
to allow users to specify callbacks later, either via an iterator
over sockets or via a socket-specific program triggered by a
setsockopt() on the socket.
Previous discussion on this here [1].
[1] https://lore.kernel.org/bpf/f42f157b-6e52-dd4d-3d97-9b86c84c0b00@oracle.com/
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20240808150558.1035626-2-alan.maguire@oracle.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Increase size of queue_name buffer from 30 to 31 to accommodate
the largest string written to it. This avoids truncation in
the possibly unlikely case where the string is name is the
maximum size.
Flagged by gcc-14:
.../mvpp2_main.c: In function 'mvpp2_probe':
.../mvpp2_main.c:7636:32: warning: 'snprintf' output may be truncated before the last format character [-Wformat-truncation=]
7636 | "stats-wq-%s%s", netdev_name(priv->port_list[0]->dev),
| ^
.../mvpp2_main.c:7635:9: note: 'snprintf' output between 10 and 31 bytes into a destination of size 30
7635 | snprintf(priv->queue_name, sizeof(priv->queue_name),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7636 | "stats-wq-%s%s", netdev_name(priv->port_list[0]->dev),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7637 | priv->port_count > 1 ? "+" : "");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduced by commit 118d6298f6 ("net: mvpp2: add ethtool GOP statistics").
I am not flagging this as a bug as I am not aware that it is one.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcin Wojtas <marcin.s.wojtas@gmail.com>
Link: https://patch.msgid.link/20240806-mvpp2-namelen-v1-1-6dc773653f2f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add netkit support to rt_link.yaml. Only forward(PASS) and
blackhole(DROP) policies are allowed to be set by user-space so I've
added only them to the yaml to avoid confusion.
Example:
$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/rt_link.yaml \
--do getlink --json '{"ifname": "netkit0"}' --output-json | jq
...
"linkinfo": {
"kind": "netkit",
"data": {
"primary": 1,
"policy": "blackhole",
"mode": "l2",
"peer-policy": "forward"
}
},
...
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20240806104531.3296718-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recently I noticed that both gcc-14 and clang-18 report that passing
a non-string literal as the format argument of alloc_ordered_workqueue
is potentially insecure.
F.e. clang-18 says:
.../bond_main.c:6384:37: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
6384 | bond->wq = alloc_ordered_workqueue(bond_dev->name, WQ_MEM_RECLAIM);
| ^~~~~~~~~~~~~~
.../workqueue.h:524:18: note: expanded from macro 'alloc_ordered_workqueue'
524 | alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)
| ^~~
.../bond_main.c:6384:37: note: treat the string as an argument to avoid this
6384 | bond->wq = alloc_ordered_workqueue(bond_dev->name, WQ_MEM_RECLAIM);
| ^
| "%s",
..../workqueue.h:524:18: note: expanded from macro 'alloc_ordered_workqueue'
524 | alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)
| ^
Perhaps it is always the case where the contents of bond_dev->name is
safe to pass as the format argument. That is, in my understanding, it
never contains any format escape sequences.
But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240806-bonding-fmt-v1-1-e75027e45775@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace SET_RUNTIME_PM_OPS()/SET SYSTEM_SLEEP_PM_OPS() with their modern
RUNTIME_PM_OPS() and SYSTEM_SLEEP_PM_OPS() alternatives.
The combined usage of pm_ptr() and RUNTIME_PM_OPS/SYSTEM_SLEEP_PM_OPS()
allows the compiler to evaluate if the runtime suspend/resume() functions
are used at build time or are simply dead code.
This allows removing the __maybe_unused notation from the runtime
suspend/resume() functions.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Link: https://patch.msgid.link/20240806021628.2524089-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the `DEFINE_RAW_FLEX()` helper for an on-stack definition of
a flexible structure where the size of the flexible-array member
is known at compile-time, and refactor the rest of the code,
accordingly.
So, with these changes, fix the following warning:
drivers/net/ethernet/fungible/funcore/fun_dev.c:550:43: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/ZrDwEugW7DR/FlP5@cute
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing says:
===================
tcp: completely support active reset
This time the patch series finally covers all the cases in the active
reset logic. After this, we can know the related exact reason(s).
v4
Link:
1. revise the changelog to avoid future confusion in patch [5/7] (Eric)
2. revise the changelog of patch [6/7] like above.
3. add reviewed-by tags (Eric)
v3
Link: https://lore.kernel.org/all/20240731120955.23542-1-kerneljasonxing@gmail.com/
1. introduce TCP_DISCONNECT_WITH_DATA reason (Eric)
2. use a better name 'TCP_KEEPALIVE_TIMEOUT' (Eric)
3. add three reviewed-by tags (Eric)
v2
Link: https://lore.kernel.org/all/20240730133513.99986-1-kerneljasonxing@gmail.com/
1. use RFC 9293 in the comment and changelog instead of old RFC 793
2. correct the comment and changelog in patch 5
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When user tries to disconnect a socket and there are more data written
into tcp write queue, we should tell users about this reset reason.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introducing a new type TCP_STATE to handle some reset conditions
appearing in RFC 793 due to its socket state. Actually, we can look
into RFC 9293 which has no discrepancy about this part.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introducing a new type TCP_ABORT_ON_MEMORY for tcp reset reason to handle
out of memory case.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introducing a new type TCP_ABORT_ON_LINGER for tcp reset reason to handle
negative linger value case.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introducing a new type TCP_ABORT_ON_CLOSE for tcp reset reason to handle
the case where more data is unread in closing phase.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can-next 2024-08-06
The first patch is by Frank Li and adds the can-transceiver property
to the flexcan device-tree bindings.
Haibo Chen contributes 2 patches for the flexcan driver to add wakeup
support for the imx95.
The 2 patches by Stefan Mätje for the esd_402_pci driver clean up the
driver and add support for the one-shot mode.
The last 15 patches are by Jimmy Assarsson and add hardware timestamp
support for all devices covered by the kvaser_usb driver.
* tag 'linux-can-next-for-6.12-20240806' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: kvaser_usb: Rename kvaser_usb_{ethtool,netdev}_ops_hwts to kvaser_usb_{ethtool,netdev}_ops
can: kvaser_usb: Remove struct variables kvaser_usb_{ethtool,netdev}_ops
can: kvaser_usb: Remove KVASER_USB_QUIRK_HAS_HARDWARE_TIMESTAMP
can: kvaser_usb: leaf: Add hardware timestamp support to usbcan devices
can: kvaser_usb: leaf: Store MSB of timestamp
can: kvaser_usb: leaf: Add structs for Tx ACK and clock overflow commands
can: kvaser_usb: leaf: Add hardware timestamp support to leaf based devices
can: kvaser_usb: leaf: kvaser_usb_leaf_tx_acknowledge: Rename local variable
can: kvaser_usb: leaf: Replace kvaser_usb_leaf_m32c_dev_cfg with kvaser_usb_leaf_m32c_dev_cfg_{16,24,32}mhz
can: kvaser_usb: leaf: Assign correct timestamp_freq for kvaser_usb_leaf_imx_dev_cfg_{16,24,32}mhz
can: kvaser_usb: leaf: Add struct for Tx ACK commands
can: kvaser_usb: hydra: Set hardware timestamp on transmitted packets
can: kvaser_usb: hydra: Add struct for Tx ACK commands
can: kvaser_usb: hydra: kvaser_usb_hydra_ktime_from_rx_cmd: Drop {rx_} in function name
can: kvaser_usb: Add helper functions to convert device timestamp into ktime
can: esd_402_pci: Add support for one-shot mode
can: esd_402_pci: Rename esdACC CTRL register macros
can: flexcan: add wakeup support for imx95
dt-bindings: can: fsl,flexcan: move fsl,imx95-flexcan standalone
dt-bindings: can: fsl,flexcan: add common 'can-transceiver' for fsl,flexcan
====================
Link: https://patch.msgid.link/20240806074731.1905378-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Compiling libcxgb_ppm.c results in several sparse warnings:
libcxgb_ppm.c:368:15: warning: incorrect type in assignment (different address spaces)
libcxgb_ppm.c:368:15: expected struct cxgbi_ppm_pool *pools
libcxgb_ppm.c:368:15: got void [noderef] __percpu *_res
libcxgb_ppm.c:374:48: warning: incorrect type in initializer (different address spaces)
libcxgb_ppm.c:374:48: expected void const [noderef] __percpu *__vpp_verify
libcxgb_ppm.c:374:48: got struct cxgbi_ppm_pool *
libcxgb_ppm.c:484:19: warning: incorrect type in assignment (different address spaces)
libcxgb_ppm.c:484:19: expected struct cxgbi_ppm_pool [noderef] __percpu *pool
libcxgb_ppm.c:484:19: got struct cxgbi_ppm_pool *[assigned] pool
libcxgb_ppm.c:511:21: warning: incorrect type in argument 1 (different address spaces)
libcxgb_ppm.c:511:21: expected void [noderef] __percpu *__pdata
libcxgb_ppm.c:511:21: got struct cxgbi_ppm_pool *[assigned] pool
Add __percpu annotation to *pools and *pool percpu pointers and to
ppm_alloc_cpu_pool() function that returns percpu pointer to fix
these warnings.
Compile tested only, but there is no difference in the resulting object file.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240804154635.4249-1-ubizjak@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the PHY of the mido bus is enabled with Wake-on-LAN (WOL),
we cannot suspend the PHY. Although the WOL status has been
checked in phy_suspend(), returning -EBUSY(-16) would cause
the Power Management (PM) to fail to suspend. Since
phy_suspend() is an exported symbol (EXPORT_SYMBOL),
timely error reporting is needed. Therefore, an additional
check is performed here. If the PHY of the mido bus is enabled
with WOL, we skip calling phy_suspend() to avoid PM failure.
From the following logs, it has been observed that the phydev->attached_dev
is NULL, phydev is "stmmac-0:01", it not attached, but it will affect suspend
and resume.The actually attached "stmmac-0:00" will not dpm_run_callback():
mdio_bus_phy_suspend().
init log:
[ 5.932502] YT8521 Gigabit Ethernet stmmac-0:00: attached PHY driver
(mii_bus:phy_addr=stmmac-0:00, irq=POLL)
[ 5.932512] YT8521 Gigabit Ethernet stmmac-0:01: attached PHY driver
(mii_bus:phy_addr=stmmac-0:01, irq=POLL)
[ 24.566289] YT8521 Gigabit Ethernet stmmac-0:00: yt8521_read_status,
link down, media: UTP
suspend log:
[ 322.631362] OOM killer disabled.
[ 322.631364] Freezing remaining freezable tasks
[ 322.632536] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 322.632540] printk: Suspending console(s) (use no_console_suspend to debug)
[ 322.633052] YT8521 Gigabit Ethernet stmmac-0:01:
PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x110 [libphy] returns -16
[ 322.633071] YT8521 Gigabit Ethernet stmmac-0:01:
PM: failed to suspend: error -16
[ 322.669699] PM: Some devices failed to suspend, or early wake event detected
[ 322.669949] OOM killer enabled.
[ 322.669951] Restarting tasks ... done.
[ 322.671008] random: crng reseeded on system resumption
[ 322.671014] PM: suspend exit
Add a function that phylib can inquire of the driver whether WoL
has been enabled at the PHY.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Youwan Wang <youwan@nfschina.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240731091537.771391-1-youwan@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current implementation of netpoll in veth devices leads to
suboptimal behavior, as it triggers warnings due to the invocation of
__netif_rx() within a softirq context. This is not compliant with
expected practices, as __netif_rx() has the following statement:
lockdep_assert_once(hardirq_count() | softirq_count());
Given that veth devices typically do not benefit from the
functionalities provided by netpoll, Disable netpoll for veth
interfaces.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240805094012.1843247-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan says:
====================
mlx5 PTM cross timestamping support
This patchset by Rahul and Carolina adds PTM (Precision Time Measurement)
support to the mlx5 driver.
PTM is a PCI extended capability introduced by PCI-SIG for providing an
accurate read of the device clock offset without being impacted by
asymmetric bus transfer rates.
The performance of PTM on ConnectX-7 was evaluated using both real-time
(RTC) and free-running (FRC) clocks under traffic and no traffic
conditions. Tests with phc2sys measured the maximum offset values at a 50Hz
rate, with and without PTM.
Results:
1. No traffic
+-----+--------+--------+
| | No-PTM | PTM |
+-----+--------+--------+
| FRC | 125 ns | <29 ns |
+-----+--------+--------+
| RTC | 248 ns | <34 ns |
+-----+--------+--------+
2. With traffic
+-----+--------+--------+
| | No-PTM | PTM |
+-----+--------+--------+
| FRC | 254 ns | <40 ns |
+-----+--------+--------+
| RTC | 255 ns | <45 ns |
+-----+--------+--------+
v2: https://lore.kernel.org/d1dba3e1-2ecc-4fdf-a23b-7696c4bccf45@gmail.com
====================
Link: https://patch.msgid.link/20240730134055.1835261-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add hardware timestamp support for all usbcan based devices (M16C).
The usbcan firmware is slightly different compared to the other Kvaser USB
interfaces:
- The timestamp is provided by a 32-bit counter, with 10us resolution.
Hence, the hardware timestamp will wrap after less than 12 hours.
- Each Rx CAN or Tx ACK command only contains the 16-bits LSB of the
timestamp counter.
- The 16-bits MSB are sent in an asynchronous event (command), if any
change occurred in the MSB since the last event.
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20240701154936.92633-13-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>