Add an ioctl for getting and setting epoll_params. User programs can use
this ioctl to get and set the busy poll usec time, packet budget, and
prefer busy poll params for a specific epoll context.
Parameters are limited:
- busy_poll_usecs is limited to <= s32_max
- busy_poll_budget is limited to <= NAPI_POLL_WEIGHT by unprivileged
users (!capable(CAP_NET_ADMIN))
- prefer_busy_poll must be 0 or 1
- __pad must be 0
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using epoll-based busy poll, the prefer_busy_poll option is hardcoded
to false. Users may want to enable prefer_busy_poll to be used in
conjunction with gro_flush_timeout and defer_hard_irqs_count to keep device
IRQs masked.
Other busy poll methods allow enabling or disabling prefer busy poll via
SO_PREFER_BUSY_POLL, but epoll-based busy polling uses a hardcoded value.
Fix this edge case by adding support for a per-epoll context
prefer_busy_poll option. The default is false, as it was hardcoded before
this change.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using epoll-based busy poll, the packet budget is hardcoded to
BUSY_POLL_BUDGET (8). Users may desire larger busy poll budgets, which
can potentially increase throughput when busy polling under high network
load.
Other busy poll methods allow setting the busy poll budget via
SO_BUSY_POLL_BUDGET, but epoll-based busy polling uses a hardcoded
value.
Fix this edge case by adding support for a per-epoll context busy poll
packet budget. If not specified, the default value (BUSY_POLL_BUDGET) is
used.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow busy polling on a per-epoll context basis. The per-epoll context
usec timeout value is preferred, but the pre-existing system wide sysctl
value is still supported if it specified.
busy_poll_usecs is a u32, but in a follow up patch the ioctl provided to
the user only allows setting a value from 0 to S32_MAX.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in initializing an ndo to NULL, therefore the
assignment is redundant and can be removed.
Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet ingress and egress MAC/serdes channel numbers are configurable
on CN10K series of silicons. These channel numbers inturn used while
installing MCAM rules to match ingress/egress port. Fetch these channel
numbers from firmware at driver init time.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-02-12 (ice)
This series contains updates to ice driver only.
Grzegorz adds support for E825-C devices.
Wojciech reworks devlink reload to fulfill expected conditions (remove
and readd).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
linux-can-next-for-6.9-20240213
this is a pull request of 23 patches for net-next/master.
The first patch is by Nicolas Maier and targets the CAN Broadcast
Manager (bcm), it adds message flags to distinguish between own local
and remote traffic.
Oliver Hartkopp contributes a patch for the CAN ISOTP protocol that
adds dynamic flow control parameters.
Stefan Mätje's patch series add support for the esd PCIe/402 CAN
interface family.
Markus Schneider-Pargmann contributes 14 patches for the m_can to
optimize for the SPI attached tcan4x5x controller.
A patch by Vincent Mailhol replaces Wolfgang Grandegger by Vincent
Mailhol as the CAN drivers Co-Maintainer.
Jimmy Assarsson's patch add support for the Kvaser M.2 PCIe 4xCAN
adapter.
A patch by Daniil Dulov removed a redundant NULL check in the softing
driver.
Oliver Hartkopp contributes a patch to add CANXL virtual CAN network
identifier support.
A patch by myself removes Naga Sureshkumar Relli as the maintainer of
the xilinx_can driver, as their email bounces.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi says:
====================
add multi-buff support for xdp running in generic mode
Introduce multi-buffer support for xdp running in generic mode not always
linearizing the skb in netif_receive_generic_xdp routine.
Introduce generic percpu page_pools allocator.
====================
Link: https://lore.kernel.org/r/cover.1707729884.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet says:
====================
net: adopt netdev_lockdep_set_classes()
Instead of waiting for syzbot to discover lockdep false positives,
make sure we use netdev_lockdep_set_classes() a bit more.
====================
Link: https://lore.kernel.org/r/20240212140700.2795436-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PMD Global Transmit Disable bit should be cleared for normal operation.
This should be HW default, however I found that on Asus RT-AX89X that uses
AQR113C PHY and firmware 5.4 this bit is set by default.
With this bit set the AQR cannot achieve a link with its link-partner and
it took me multiple hours of digging through the vendor GPL source to find
this out, so lets always clear this bit during .config_init() to avoid a
situation like this in the future.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240211181732.646311-1-robimarko@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
CAN XL data frames contain an 8-bit virtual CAN network identifier (VCID).
A VCID value of zero represents an 'untagged' CAN XL frame.
To receive and send these optional VCIDs via CAN_RAW sockets a new socket
option CAN_RAW_XL_VCID_OPTS is introduced to define/access VCID content:
- tx: set the outgoing VCID value by the kernel (one fixed 8-bit value)
- tx: pass through VCID values from the user space (e.g. for traffic replay)
- rx: apply VCID receive filter (value/mask) to be passed to the user space
With the 'tx pass through' option CAN_RAW_XL_VCID_TX_PASS all valid VCID
values can be sent, e.g. to replay full qualified CAN XL traffic.
The VCID value provided for the CAN_RAW_XL_VCID_TX_SET option will
override the VCID value in the struct canxl_frame.prio defined for
CAN_RAW_XL_VCID_TX_PASS when both flags are set.
With a rx_vcid_mask of zero all possible VCID values (0x00 - 0xFF) are
passed to the user space when the CAN_RAW_XL_VCID_RX_FILTER flag is set.
Without this flag only untagged CAN XL frames (VCID = 0x00) are delivered
to the user space (default).
The 8-bit VCID is stored inside the CAN XL prio element (only in CAN XL
frames!) to not interfere with other CAN content or the CAN filters
provided by the CAN_RAW sockets and kernel infrastruture.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20240212213550.18516-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Set scope automatically in ip_route_output_ports() (using the socket
SOCK_LOCALROUTE flag). This way, callers don't have to overload the
tos with the RTO_ONLINK flag, like RT_CONN_FLAGS() does.
For callers that don't pass a struct sock, this doesn't change anything
as the scope is still set to RT_SCOPE_UNIVERSE when sk is NULL.
Callers that passed a struct sock and used RT_CONN_FLAGS(sk) or
RT_CONN_FLAGS_TOS(sk, tos) for the tos are modified to use
ip_sock_tos(sk) and RT_TOS(tos) respectively, as overloading tos with
the RTO_ONLINK flag now becomes unnecessary.
In drivers/net/amt.c, all ip_route_output_ports() calls use a 0 tos
parameter, ignoring the SOCK_LOCALROUTE flag of the socket. But the sk
parameter is a kernel socket, which doesn't have any configuration path
for setting SOCK_LOCALROUTE anyway. Therefore, ip_route_output_ports()
will continue to initialise scope with RT_SCOPE_UNIVERSE and amt.c
doesn't need to be modified.
Also, remove RT_CONN_FLAGS() and RT_CONN_FLAGS_TOS() from route.h as
these macros are now unused.
The objective is to eventually remove RTO_ONLINK entirely to allow
converting ->flowi4_tos to dscp_t. This will ensure proper isolation
between the DSCP and ECN bits, thus minimising the risk of introducing
bugs where TOS values interfere with ECN.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/dacfd2ab40685e20959ab7b53c427595ba229e7d.1707496938.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During devlink reload it is needed to remove debugfs entries
correlated with only one PF. ice_debugfs_exit() removes all
entries created by ice driver so we can't use it.
Introduce ice_debugfs_pf_deinit() in order to release PF's
debugfs entries. Move ice_debugfs_exit() call to ice_module_exit(),
it makes more sense since ice_debugfs_init() is called in
ice_module_init() and not in ice_probe().
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Recent changes to the devlink reload (commit 9b2348e2d6
("devlink: warn about existing entities during reload-reinit"))
force the drivers to destroy devlink ports during reinit.
Adjust ice driver to this requirement, unregister netdvice, destroy
devlink port. ice_init_eth() was removed and all the common code
between probe and reload was moved to ice_load().
During devlink reload we can't take devl_lock (it's already taken)
and in ice_probe() we have to lock it. Use devl_* variant of the API
which does not acquire and release devl_lock. Guard ice_load()
with devl_lock only in case of probe.
Suggested-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
E825C devices shall support the new signing type of RSA 3K for new DDP
section (SEGMENT_SIGN_TYPE_RSA3K_E825 (5) - already in the code).
The driver is responsible to verify the presence of correct signing type.
Add 3k signinig support for E825C devices based on mac_type:
ICE_MAC_GENERIC_3K_E825;
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
E800 series devices have a couple of quirks:
1. Sideband control queues are not supported
2. The registers that the driver needs to program for the "Precision
Time Protocol (PTP)" feature are different for E800 series devices
compared to other devices supported by this driver.
Both these require conditional logic based on the underlying device we
are dealing with.
The function ice_is_sbq_supported added by commit 8f5ee3c477
("ice: add support for sideband messages") addresses (1).
The same function can be used to address (2) as well
but this just looks weird readability wise in cases that have nothing
to do with sideband control queues:
if (ice_is_sbq_supported(hw))
/* program register A */
else
/* program register B */
For these cases, the function ice_is_generic_mac introduced by this
patch communicates the idea/intention better. Also rework
ice_is_sbq_supported to use this new function.
As side-band queue is supported for E825C devices, it's mac_type is
considered as generic mac_type.
Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Introduce new Intel Ethernet E825C family devices.
Add new PCI device IDs which are going to be supported by the
driver:
- 579C: Intel(R) Ethernet Connection E825-C for backplane
- 579D: Intel(R) Ethernet Connection E825-C for QSFP
- 579E: Intel(R) Ethernet Connection E825-C for SFP
- 579F: Intel(R) Ethernet Connection E825-C for SGMII
Add helper function ice_is_e825c() to verify if the running device
belongs to E825C family.
Co-developed-by: Jan Glaza <jan.glaza@intel.com>
Signed-off-by: Jan Glaza <jan.glaza@intel.com>
Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Markus Schneider-Pargmann <msp@baylibre.com> says:
The series implements many small and bigger throughput improvements and
adds rx/tx coalescing at the end.
Changes in v7:
- Rebased to v6.8-rc1
- Fixed NULL pointer dereference in m_can_clean() on am62 that happened
when doing ip link up, ip link down, ip link up
- Fixed a racecondition on am62 observed with high throughput tests.
netdev_completed_queue() was called before netdev_sent_queue() as the
interrupt was processed so fast. netdev_sent_queue() is now reported
before the actual sent is done.
- Fixed an initializing issue on am62 where active interrupts are
getting lost between runs. Fixed by resetting cdev->active_interrupts
in m_can_disable_all_interrupts()
- Removed m_can_start_fast_xmit() because of a reordering of operations
due to above mentioned race condition
Changes in v6:
- Rebased to v6.6-rc2
- Added two small changes for the newly integrated polling feature
- Reuse the polling hrtimer for coalescing as the timer used for
coalescing has a similar purpose as the one for polling. Also polling
and coalescing will never be active at the same time.
Changes in v5:
- Add back parenthesis in m_can_set_coalesce(). This will make
checkpatch unhappy but gcc happy.
- Remove unused fifo_header variable in m_can_tx_handler().
- Rebased to v6.5-rc1
Changes in v4:
- Create and use struct m_can_fifo_element in m_can_tx_handler
- Fix memcpy_and_pad to copy the full buffer
- Fixed a few checkpatch warnings
- Change putidx to be unsigned
- Print hard_xmit error only once when TX FIFO is full
Changes in v3:
- Remove parenthesis in error messages
- Use memcpy_and_pad for buffer copy in 'can: m_can: Write transmit
header and data in one transaction'.
- Replace spin_lock with spin_lock_irqsave. I got a report of a
interrupt that was calling start_xmit just after the netqueue was
woken up before the locked region was exited. spin_lock_irqsave should
fix this. I attached the full stack at the end of the mail if someone
wants to know.
- Rebased to v6.3-rc1.
- Removed tcan4x5x patches from this series.
Changes in v2:
- Rebased on v6.2-rc5
- Fixed missing/broken accounting for non peripheral m_can devices.
previous versions:
v1 - https://lore.kernel.org/lkml/20221221152537.751564-1-msp@baylibre.com
v2 - https://lore.kernel.org/lkml/20230125195059.630377-1-msp@baylibre.com
v3 - https://lore.kernel.org/lkml/20230315110546.2518305-1-msp@baylibre.com
v4 - https://lore.kernel.org/lkml/20230621092350.3130866-1-msp@baylibre.com
v5 - https://lore.kernel.org/lkml/20230718075708.958094-1-msp@baylibre.com
v6 - https://lore.kernel.org/lkml/20230929141304.3934380-1-msp@baylibre.com
Link: https://lore.kernel.org/all/20240207093220.2681425-1-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
m_can supports submitting multiple transmits with one register write.
This is an interesting option to reduce the number of SPI transfers for
peripheral chips.
The m_can_tx_op is extended with a bool that signals if it is the last
transmission and the submit should be executed immediately.
The worker then writes the skb to the FIFO and submits it only if the
submit bool is set. If it isn't set, the worker will write the next skb
which is waiting in the workqueue to the FIFO, etc.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://lore.kernel.org/all/20240207093220.2681425-15-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>