PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 675ad47375 ("e1000: Use netdev_<level>, pr_<level> and dev_<level>")
declared but never implemented e1000_get_hw_dev_name().
Commit 1532ecea1d ("e1000: drop dead pcie code from e1000")
removed e1000_check_mng_mode()/e1000_blink_led_start() but not the declarations.
Commit c46b59b241 ("e1000: Remove unused function e1000_mta_set.")
removed e1000_mta_set() but not its declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 39de828179 ("RDS: Main header file") declared but never implemented
rds_trans_init() and rds_trans_exit(), remove it.
Commit d37c935905 ("RDS: Move loop-only function to loop.c") removed the
implementation rds_message_inc_free() but not the declaration.
Since commit 55b7ed0b58 ("RDS: Common RDMA transport code")
rds_rdma_conn_connect() is never implemented and used.
rds_tcp_map_seq() is never implemented and used since
commit 70041088e3 ("RDS: Add TCP transport to RDS").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_diag_put_flags(), netlink_setsockopt(), netlink_getsockopt()
and others use nlk->flags without correct locking.
Use set_bit(), clear_bit(), test_bit(), assign_bit() to remove
data-races.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong says:
====================
net: tcp: support probing OOM
In this series, we make some small changes to make the tcp
retransmission become zero-window probes if the receiver drops the skb
because of memory pressure.
In the 1st patch, we reply a zero-window ACK if the skb is dropped
because out of memory, instead of dropping the skb silently.
In the 2nd patch, we allow a zero-window ACK to update the window.
In the 3rd patch, fix unexcepted socket die when snd_wnd is 0 in
tcp_retransmit_timer().
In the 4th patch, we refactor the debug message in
tcp_retransmit_timer() to make it more correct.
After these changes, the tcp can probe the OOM of the receiver forever.
Changes since v3:
- make the timeout "2 * TCP_RTO_MAX" in the 3rd patch
- tp->retrans_stamp is not based on jiffies and can't be compared with
icsk->icsk_timeout in the 3rd patch. Fix it.
- introduce the 4th patch
Changes since v2:
- refactor the code to avoid code duplication in the 1st patch
- use after() instead of max() in tcp_rtx_probe0_timed_out()
Changes since v1:
- send 0 rwin ACK for the receive queue empty case when necessary in the
1st patch
- send the ACK immediately by using the ICSK_ACK_NOW flag in the 1st
patch
- consider the case of the connection restart from idle, as Neal comment,
in the 3rd patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The debug message in tcp_retransmit_timer() is slightly wrong, because
they could be printed even if we did not receive a new ACK packet from
the remote peer.
Change it to probing zero-window, as it is a expected case now. The
description may be not correct.
Adding the duration since the last ACK we received, and the duration of
the retransmission, which are useful for debugging.
And the message now like this:
Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 209ms ago, lasting 209ms
Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 404ms ago, lasting 408ms
Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 812ms ago, lasting 1224ms
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tcp_retransmit_timer(), a window shrunk connection will be regarded
as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not
right all the time.
The retransmits will become zero-window probes in tcp_retransmit_timer()
if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to
TCP_RTO_MAX sooner or later.
However, the timer can be delayed and be triggered after 122877ms, not
TCP_RTO_MAX, as I tested.
Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true
once the RTO come up to TCP_RTO_MAX, and the socket will die.
Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout',
which is exact the timestamp of the timeout.
However, "tp->rcv_tstamp" can restart from idle, then tp->rcv_tstamp
could already be a long time (minutes or hours) in the past even on the
first RTO. So we double check the timeout with the duration of the
retransmission.
Meanwhile, making "2 * TCP_RTO_MAX" as the timeout to avoid the socket
dying too soon.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAMM096pg@mail.gmail.com/
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fow now, an ACK can update the window in following case, according to
the tcp_may_update_window():
1. the ACK acknowledged new data
2. the ACK has new data
3. the ACK expand the window and the seq of it is valid
Now, we allow the ACK update the window if the window is 0, and the
seq/ack of it is valid. This is for the case that the receiver replies
an zero-window ACK when it is under memory stress and can't queue the new
data.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For now, skb will be dropped when no memory, which makes client keep
retrans util timeout and it's not friendly to the users.
In this patch, we reply an ACK with zero-window in this case to update
the snd_wnd of the sender to 0. Therefore, the sender won't timeout the
connection and will probe the zero-window with the retransmits.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
i40e: Replace one-element arrays with flexible-array members
Replace one-element arrays with flexible-array members in multiple
structures.
This results in no differences in binary output.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: Replace one-element array with flex-array member in struct i40e_profile_aq_section
i40e: Replace one-element array with flex-array member in struct i40e_section_table
i40e: Replace one-element array with flex-array member in struct i40e_profile_segment
i40e: Replace one-element array with flex-array member in struct i40e_package_header
====================
Link: https://lore.kernel.org/r/20230810175302.1964182-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré says:
====================
net: dsa: rzn1-a5psw: add support for vlan and .port_bridge_flags
this series enables vlan support in Renesas RZN1 internal ethernet switch,
and is a follow up to the work initiated by Clement Leger a few months ago,
who handed me over the topic.
This new revision aims to iron the last few points raised by Vladimir to
ensure that the driver is in line with switch drivers expectations, and is
based on the lengthy discussion in [1] (thanks Vladimir for the valuable
explanations)
[1] https://lore.kernel.org/netdev/20230314163651.242259-1-clement.leger@bootlin.com/
----
V5:
- ensure that flooding can be enabled only if port belongs to a bridge
- enable learning in a5psw_port_stp_state_set() only if port has learning
enabled
- toggle vlan tagging on vlan filtering in
- removed reviewed-by on second patch since its modified
RESEND V4:
- Resent due to net-next being closed
V4:
- Fix missing CPU port bit in a5psw->bridged_ports
- Use unsigned int for vlan_res_id parameters
- Rename a5psw_get_vlan_res_entry() to a5psw_new_vlan_res_entry()
- In a5psw_port_vlan_add(), return -ENOSPC when no VLAN entry is found
- In a5psw_port_vlan_filtering(), compute "val" from "mask"
V3:
- Target net-next tree and correct version...
V2:
- Fixed a few formatting errors
- Add .port_bridge_flags implementation
====================
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for vlan operation (add, del, filtering) on the RZN1
driver. The a5psw switch supports up to 32 VLAN IDs with filtering,
tagged/untagged VLANs and PVID for each ports.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running vlan test (bridge_vlan_aware/unaware.sh), there were some
failure due to the lack .port_bridge_flag function to disable port
flooding. Implement this operation for BR_LEARNING, BR_FLOOD,
BR_MCAST_FLOOD and BR_BCAST_FLOOD.
Since .port_bridge_flags affects the bits disabling learning for a port,
ensure that any other modification on the same register done by
a5psw_port_stp_state_set is in sync by using the port learning state to
enable/disable learning on the port.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.port_bridge_flags will be added and allows to modify the flood mask
independently for each port. Keeping the existing bridged_ports write
in a5psw_flooding_set_resolution() would potentially messed up this.
Use a read-modify-write to set that value and move bridged_ports
handling in bridge_port_join/leave.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation does not allow the user to enable both
hw-tc-offload and ntuple features on the interface. These checks
are added as TC flower offload and ntuple features internally configures
the same hardware resource MCAM. But TC HTB offload configures the
transmit scheduler which can be safely enabled on the interface with
ntuple feature.
This patch adds the same and ensures only TC flower offload and ntuple
features are mutually exclusive.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhengchao Shao says:
====================
bonding: do some cleanups in bond driver
Do some cleanups in bond driver.
---
v2: use IS_ERR instead of NULL check in patch 2/5, update commit
information in patch 3/5, remove inline modifier in patch 4/5
====================
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The free_percpu function also could check whether "rr_tx_counter"
parameter is NULL. Therefore, remove NULL check in bond_destructor.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bond_reset_slave_arr(), values are assigned and memory is released only
when the variables "usable" and "all" are not NULL. But even if the
"usable" and "all" variables are NULL, they can still work, because value
will be checked in kfree_rcu. Therefore, use bond_set_slave_arr() and set
the input parameters "usable_slaves" and "all_slaves" to NULL to simplify
the code in bond_reset_slave_arr(). And the same to bond_uninit().
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because debugfs_create_dir returns ERR_PTR, so bonding_debug_root will
never be NULL. Remove redundant NULL check for bonding_debug_root in
debugfs function. The later debugfs_create_dir/debugfs_remove_recursive
/debugfs_remove_recursive functions will check the dentry with IS_ERR().
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because debugfs_create_dir returns ERR_PTR, so IS_ERR should be used to
check whether the directory is successfully created.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some functions are only used in initialization and exit functions, so add
the __init/__net_init and __net_exit modifiers to these functions.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'type' is an enum, thus cast of pointer on 64-bit compile test with
W=1 causes:
mvmdio.c:272:9: error: cast to smaller integer type 'enum orion_mdio_bus_type' from 'const void *' [-Werror,-Wvoid-pointer-to-enum-cast]
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
'enet_id' is an enum, thus cast of pointer on 64-bit compile test with
W=1 causes:
xgene_enet_main.c:2044:20: error: cast to smaller integer type 'enum xgene_enet_id' from 'const void *' [-Werror,-Wvoid-pointer-to-enum-cast]
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
Extended performance counter stats in 'ethtool -S <interface>'
for MANA VF to include GDMA tx LSO packets and bytes count.
Tested-on: Ubuntu22
Testcases:
1. LISA testcase:
PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic
2. LISA testcase:
PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV
3. Validated the GDMA stat packets and byte counters
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement control plane mailbox versions for host and firmware.
Versions are published in info area of control mailbox bar4
memory structure.Firmware will publish minimum and maximum
supported versions.Control plane mailbox apis will check for
firmware version before sending any control commands to firmware.
Notifications from firmware will similarly be checked for host
version compatibility.
Signed-off-by: Sathesh Edara <sedara@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Accept TC offload classifier rule only if SPI field
can be extracted by HW.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If netdev_mc_count() is not zero and not IFF_ALLMULTI, filter
incoming multicast packets. The chip has a Multicast Address Hash Table
for allowed multicast addresses, so we fill it.
Implement .ndo_set_rx_mode and recalculate multicast hash table. Also
observe change of IFF_PROMISC and IFF_ALLMULTI netdev flags.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When gso.hdr_len is zero and a packet is transmitted via write() or
writev(), all payload is treated as header which requires a contiguous
memory allocation. This allocation request is harder to satisfy, and may
even fail if there is enough fragmentation.
Note that sendmsg() code path limits the linear copy length, so this change
makes write()/writev() and sendmsg() paths more consistent.
Signed-off-by: Tahsin Erdogan <trdgn@amazon.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230809164753.2247594-1-trdgn@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross merge x86 fixes to fix clang linking errors:
ld.lld: error: ./arch/x86/kernel/vmlinux.lds:221: at least one side of the expression must be absolute
These will hopefully be downstream by the time we ship
the next batch of fixes.
* 'x86/bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Move gds_ucode_mitigated() declaration to header
x86/speculation: Add cpu_show_gds() prototype
driver core: cpu: Make cpu_show_not_affected() static
x86/srso: Fix build breakage with the LLVM linker
Documentation/srso: Document IBPB aspect and fix formatting
driver core: cpu: Unify redundant silly stubs
Documentation/hw-vuln: Unify filename specification in index
Link: https://lore.kernel.org/all/CAHk-=wj_b+FGTnevQSBAtCWuhCk=0oQ_THvthBW2hzqpOTLFmg@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrew Lunn says:
====================
Support offload LED blinking to PHY.
Allow offloading of the LED trigger netdev to PHY drivers and
implement it for the Marvell PHY driver. Additionally, correct the
handling of when the initial state of the LED cannot be represented by
the trigger, and so an error is returned. As with ledtrig-timer,
disable offload when the trigger is deactivate, or replaced by another
trigger.
====================
Link: https://lore.kernel.org/r/20230808210436.838995-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linux LEDs can be requested to perform hardware accelerated blinking
to indicate link, RX, TX etc. Pass the rules for blinking to the PHY
driver, if it implements the ops needed to determine if a given
pattern can be offloaded, to offload it, and what the current offload
is. Additionally implement the op needed to get what device the LED is
for.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/20230808210436.838995-3-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the netdev trigger is activates, it tries to determine what
device the LED blinks for, and what the current blink mode is.
The documentation for hw_control_get() says:
* Return 0 on success, a negative error number on failing parsing the
* initial mode. Error from this function is NOT FATAL as the device
* may be in a not supported initial state by the attached LED trigger.
*/
For the Marvell PHY and the Armada 370-rd board, the initial LED blink
mode is not supported by the trigger, so it returns an error. This
resulted in not getting the device the LED is blinking for. As a
result, the device is unknown and offloaded is never performed.
Change to condition to always get the device if offloading is
supported, and reduce the scope of testing for an error from
hw_control_get() to skip setting trigger internal state if there is an
error.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/20230808210436.838995-2-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When using a fixed-link setup, certain devices like the SJA1105 require a
small pause in the TXC clock line to enable their internal tunable
delay line (TDL).
To satisfy this requirement, this patch temporarily disables the TX clock,
and restarts it after a required period. This provides the required
silent interval on the clock line for SJA1105 to complete the frequency
transition and enable the internal TDLs. This action occurs before the link
is built up, so it does not impact a normal device too. There is no
need to identify if the connected device is an SJA1105 alike or not during
the implementation.
So far we have only enabled this feature on the i.MX93 platform.
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reviewed-by: Frank Li <frank.li@nxp.com>
Link: https://lore.kernel.org/r/20230807160716.259072-3-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin KaFai Lau says:
====================
pull-request: bpf-next 2023-08-09
We've added 19 non-merge commits during the last 6 day(s) which contain
a total of 25 files changed, 369 insertions(+), 141 deletions(-).
The main changes are:
1) Fix array-index-out-of-bounds access when detaching from an
already empty mprog entry from Daniel Borkmann.
2) Adjust bpf selftest because of a recent llvm change
related to the cpu-v4 ISA from Eduard Zingerman.
3) Add uprobe support for the bpf_get_func_ip helper from Jiri Olsa.
4) Fix a KASAN splat due to the kernel incorrectly accepted
an invalid program using the recent cpu-v4 instruction from
Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf: btf: Remove two unused function declarations
bpf: lru: Remove unused declaration bpf_lru_promote()
selftests/bpf: relax expected log messages to allow emitting BPF_ST
selftests/bpf: remove duplicated functions
bpf, docs: Fix small typo and define semantics of sign extension
selftests/bpf: Add bpf_get_func_ip test for uprobe inside function
selftests/bpf: Add bpf_get_func_ip tests for uprobe on function entry
bpf: Add support for bpf_get_func_ip helper for uprobe program
selftests/bpf: Add a movsx selftest for sign-extension of R10
bpf: Fix an incorrect verification success with movsx insn
bpf, docs: Formalize type notation and function semantics in ISA standard
bpf: change bpf_alu_sign_string and bpf_movsx_string to static
libbpf: Use local includes inside the library
bpf: fix bpf_dynptr_slice() to stop return an ERR_PTR.
bpf: fix inconsistent return types of bpf_xdp_copy_buf().
selftests/bpf: fix the incorrect verification of port numbers.
selftests/bpf: Add test for detachment on empty mprog entry
bpf: Fix mprog detachment for empty mprog entry
bpf: bpf_struct_ops: Remove unnecessary initial values of variables
====================
Link: https://lore.kernel.org/r/20230810055123.109578-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>