Enabling KASAN and running some iperf tests raises some memory issues with
vmm_table:
BUG: KASAN: slab-out-of-bounds in wilc_wlan_handle_txq+0x6ac/0xdb4
Write of size 4 at addr c3a61540 by task wlan0-tx/95
KASAN detects that we are writing data beyond range allocated to vmm_table.
There is indeed a mismatch between the size passed to allocator in
wilc_wlan_init, and the range of possible indexes used later: allocation
size is missing a multiplication by sizeof(u32)
Fixes: 40b717bfce ("wifi: wilc1000: fix DMA on stack objects")
Cc: stable@vger.kernel.org
Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231017-wilc1000_tx_oops-v3-1-b2155f1f7bee@bootlin.com
DCFO tracking compensate the CFO (carrier frequency offset) by digital
hardware that provides fine CFO estimation. Although the avg_cfo which
is a coarse information becomes zero, still we need DCFO tracking to
compensate the residual CFO. However, the original flow skips the case
when avg_cfo is zero, so we fix it to have expected performance.
Signed-off-by: Cheng-Chieh Hsieh <cj.hsieh@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016065115.751662-5-pkshih@realtek.com
Change naming to disambiguate the functions because their names are common
and not clear about the purpose. Not change logic at all.
These functions are to control baseband AGC while BT coexists with WiFi.
Among these functions, ctrl_btg_bt_rx is used to control AGC related
settings, which is affected by BT RX, while BT shares the same path
with wifi; ctrl_nbtg_bt_tx is used to control AGC settings under
non-shared path condition, which is affected by BT TX.
Signed-off-by: Chung-Hsuan Hung <hsuan8331@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016065115.751662-2-pkshih@realtek.com
Sometimes firmware may enter strange state or infinite
loop due to unknown bug, and then it will lead critical
function fail, such as sending H2C command or changing
power mode. In these abnormal states, we add more debug
information, including hardware register status, to help
further investigation.
Signed-off-by: Chin-Yen Lee <timlee@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016053554.744180-3-pkshih@realtek.com
This is a kconfig warning in a randconfig when CONFIG_PCI is not set:
WARNING: unmet direct dependencies detected for SSB_EMBEDDED
Depends on [n]: SSB [=y] && SSB_DRIVER_MIPS [=y] && SSB_PCICORE_HOSTMODE [=n]
Selected by [y]:
- BCM47XX_SSB [=y] && BCM47XX [=y]
This is caused by arch/mips/bcm47xx/Kconfig's symbol BCM47XX_SSB
selecting SSB_EMBEDDED when CONFIG_PCI is not set.
This warning can be prevented by altering SSB_EMBEDDED to allow for
PCI=n or the former SSB_PCICORE_HOSTMODE.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Michael Büsch <m@bues.ch>
Cc: linux-wireless@vger.kernel.org
Cc: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231012220856.23260-1-rdunlap@infradead.org
Saeed Mahameed says:
====================
mlx5-updates-2023-10-10
1) Adham Faris, Increase max supported channels number to 256
2) Leon Romanovsky, Allow IPsec soft/hard limits in bytes
3) Shay Drory, Replace global mlx5_intf_lock with
HCA devcom component lock
4) Wei Zhang, Optimize SF creation flow
During SF creation, HCA state gets changed from INVALID to
IN_USE step by step. Accordingly, FW sends vhca event to
driver to inform about this state change asynchronously.
Each vhca event is critical because all related SW/FW
operations are triggered by it.
Currently there is only a single mlx5 general event handler
which not only handles vhca event but many other events.
This incurs huge bottleneck because all events are forced
to be handled in serial manner.
Moreover, all SFs share same table_lock which inevitably
impacts each other when they are created in parallel.
This series will solve this issue by:
1. A dedicated vhca event handler is introduced to eliminate
the mutual impact with other mlx5 events.
2. Max FW threads work queues are employed in the vhca event
handler to fully utilize FW capability.
3. Redesign SF active work logic to completely remove
table_lock.
With above optimization, SF creation time is reduced by 25%,
i.e. from 80s to 60s when creating 100 SFs.
Patches summary:
Patch 1 - implement dedicated vhca event handler with max FW
cmd threads of work queues.
Patch 2 - remove table_lock by redesigning SF active work
logic.
* tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Allow IPsec soft/hard limits in bytes
net/mlx5e: Increase max supported channels number to 256
net/mlx5e: Preparations for supporting larger number of channels
net/mlx5e: Refactor mlx5e_rss_init() and mlx5e_rss_free() API's
net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh()
net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs
net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code
net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code
net/mlx5: fix config name in Kconfig parameter documentation
net/mlx5: Remove unused declaration
net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock
net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom
net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
net/mlx5: Redesign SF active work to remove table_lock
net/mlx5: Parallelize vhca event handling
====================
Link: https://lore.kernel.org/r/20231014171908.290428-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
We expect both hi.data.modename and hi.data.drivername to be
NUL-terminated based on its usage with sprintf:
| sprintf(hi.data.modename, "%sclk,%smodem,fclk=%d,bps=%d%s",
| bc->cfg.intclk ? "int" : "ext",
| bc->cfg.extmodem ? "ext" : "int", bc->cfg.fclk, bc->cfg.bps,
| bc->cfg.loopback ? ",loopback" : "");
Note that this data is copied out to userspace with:
| if (copy_to_user(data, &hi, sizeof(hi)))
... however, the data was also copied FROM the user here:
| if (copy_from_user(&hi, data, sizeof(hi)))
Considering the above, a suitable replacement is strscpy_pad() as it
guarantees NUL-termination on the destination buffer while also
NUL-padding (which is good+wanted behavior when copying data to
userspace).
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231016-strncpy-drivers-net-hamradio-baycom_epp-c-v2-1-39f72a72de30@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I recently cleaned up specs to not specify enum-as-flags
when target enum is already defined as flags.
YNL Python library did not convert flags, unfortunately,
so this caused breakage for Stan and Willem.
Note that the nlspec.py abstraction already hides the differences
between flags and enums (value vs user_value), so the changes
are pretty trivial.
Fixes: 0629f22ec1 ("ynl: netdev: drop unnecessary enum-as-flags")
Reported-and-tested-by: Willem de Bruijn <willemb@google.com>
Reported-and-tested-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/all/ZS10NtQgd_BJZ3RU@google.com/
Link: https://lore.kernel.org/r/20231016213937.1820386-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: remove last of the phylink validate methods and clean up
This four patch series removes the last of the phylink MAC .validate
methods which can be found in the Freescale fman driver. fman has a
requirement that half duplex may not be supported in RGMII mode,
which is currently handled in its .validate method.
In order to keep this functionality when removing the .validate method,
we need to replace that with equivalent functionality, for which I
propose the optional .mac_get_caps method in the first patch.
The advantage of this approach over the .validate callback is that MAC
drivers only have to deal with the MAC_* capabilities, and don't need
to call back into phylink functions to do the masking of the ethtool
linkmodes etc - which then becomes internal to phylink. This can be
seen in the fourth patch where we make a load of these methods static.
====================
Link: https://lore.kernel.org/r/ZS1Z5DDfHyjMryYu@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a new method, mac_get_caps() to get the MAC capabilities for
the specified interface mode. This is for MACs which have special
requirements, such as not supporting half-duplex in certain interface
modes, and will replace the validate() method.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qsPk5-009wiX-G5@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent FW interface update bumped the size of struct hwrm_func_cfg_input
above 128B which is the max some devices support.
Probe on Stratus (BCM957452) with FW 20.8.3.11 fails with:
bnxt_en ...: Unable to reserve tx rings
bnxt_en ...: 2nd rings reservation failed.
bnxt_en ...: Not enough rings available.
Once probe is fixed other errors pop up:
bnxt_en ...: Failed to set async event completion ring.
This is because __hwrm_send() rejects requests larger than
bp->hwrm_max_ext_req_len with -E2BIG. Since the driver doesn't
actually access any of the new fields, yet, trim the length.
It should be safe.
Similar workaround exists for backing_store_cfg_input.
Although that one mins() to a constant of 256, not 128
we'll effectively use here. Michael explains: "the backing
store cfg command is supported by relatively newer firmware
that will accept 256 bytes at least."
To make debugging easier in the future add a warning
for oversized requests.
Fixes: 754fbf604f ("bnxt_en: Update firmware interface to 1.10.2.171")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231016171640.1481493-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The previous patch added accounting and a limit for the number of
dynamically learned FDB entries per bridge. However it did not provide
means to actually configure those bounds or read back the count. This
patch does that.
Two new netlink attributes are added for the accounting and limit of
dynamically learned FDB entries:
- IFLA_BR_FDB_N_LEARNED (RO) for the number of entries accounted for
a single bridge.
- IFLA_BR_FDB_MAX_LEARNED (RW) for the configured limit of entries for
the bridge.
The new attributes are used like this:
# ip link add name br up type bridge fdb_max_learned 256
# ip link add name v1 up master br type veth peer v2
# ip link set up dev v2
# mausezahn -a rand -c 1024 v2
0.01 seconds (90877 packets per second
# bridge fdb | grep -v permanent | wc -l
256
# ip -d link show dev br
13: br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 [...]
[...] fdb_n_learned 256 fdb_max_learned 256
Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-3-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A malicious actor behind one bridge port may spam the kernel with packets
with a random source MAC address, each of which will create an FDB entry,
each of which is a dynamic allocation in the kernel.
There are roughly 2^48 different MAC addresses, further limited by the
rhashtable they are stored in to 2^31. Each entry is of the type struct
net_bridge_fdb_entry, which is currently 128 bytes big. This means the
maximum amount of memory allocated for FDB entries is 2^31 * 128B =
256GiB, which is too much for most computers.
Mitigate this by maintaining a per bridge count of those automatically
generated entries in fdb_n_learned, and a limit in fdb_max_learned. If
the limit is hit new entries are not learned anymore.
For backwards compatibility the default setting of 0 disables the limit.
User-added entries by netlink or from bridge or bridge port addresses
are never blocked and do not count towards that limit.
Introduce a new fdb entry flag BR_FDB_DYNAMIC_LEARNED to keep track of
whether an FDB entry is included in the count. The flag is enabled for
dynamically learned entries, and disabled for all other entries. This
should be equivalent to BR_FDB_ADDED_BY_USER and BR_FDB_LOCAL being unset,
but contrary to the two flags it can be toggled atomically.
Atomicity is required here, as there are multiple callers that modify the
flags, but are not under a common lock (br_fdb_update is the exception
for br->hash_lock, br_fdb_external_learn_add for RTNL).
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-2-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless-next patches for v6.7
The second pull request for v6.7, with only driver changes this time.
We have now support for mt7925 PCIe and USB variants, few new features
and of course some fixes.
Major changes:
mt76
- mt7925 support
ath12k
- read board data variant name from SMBIOS
wfx
- Remain-On-Channel (ROC) support
* tag 'wireless-next-2023-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (109 commits)
wifi: rtw89: mac: do bf_monitor only if WiFi 6 chips
wifi: rtw89: mac: set bf_assoc capabilities according to chip gen
wifi: rtw89: mac: set bfee_ctrl() according to chip gen
wifi: rtw89: mac: add registers of MU-EDCA parameters for WiFi 7 chips
wifi: rtw89: mac: generalize register of MU-EDCA switch according to chip gen
wifi: rtw89: mac: update RTS threshold according to chip gen
wifi: rtlwifi: simplify TX command fill callbacks
wifi: hostap: remove unused ioctl function
wifi: atmel: remove unused ioctl function
wifi: rtw89: coex: add annotation __counted_by() to struct rtw89_btc_btf_set_mon_reg
wifi: rtw89: coex: add annotation __counted_by() for struct rtw89_btc_btf_set_slot_table
wifi: rtw89: add EHT radiotap in monitor mode
wifi: rtw89: show EHT rate in debugfs
wifi: rtw89: parse TX EHT rate selected by firmware from RA C2H report
wifi: rtw89: Add EHT rate mask as parameters of RA H2C command
wifi: rtw89: parse EHT information from RX descriptor and PPDU status packet
wifi: radiotap: add bandwidth definition of EHT U-SIG
wifi: rtlwifi: use convenient list_count_nodes()
wifi: p54: Annotate struct p54_cal_database with __counted_by
wifi: brcmfmac: fweh: Add __counted_by for struct brcmf_fweh_queue_item and use struct_size()
...
====================
Link: https://lore.kernel.org/r/20231016143822.880D8C433C8@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matt Johnston says:
====================
I3C MCTP net driver
This series adds an I3C transport for the kernel's MCTP network
protocol. MCTP is a communication protocol between system components
(BMCs, drives, NICs etc), with higher level protocols such as NVMe-MI or
PLDM built on top of it (in userspace). It runs over various transports
such as I2C, PCIe, or I3C.
The mctp-i3c driver follows a similar approach to the kernel's existing
mctp-i2c driver, creating a "mctpi3cX" network interface for each
numbered I3C bus. Busses opt in to support by adding a "mctp-controller"
property to the devicetree:
&i3c0 {
mctp-controller;
}
The driver will bind to MCTP class devices (DCR 0xCC) that are on a
supported I3C bus. Each bus is represented by a `struct mctp_i3c_bus`
that keeps state for the network device. An individual I3C device
(struct mctp_i3c_device) performs operations using the "parent"
mctp_i3c_bus object. The I3C notify/enumeration patch is needed so that
the mctp-i3c driver can handle creating/removing mctp_i3c_bus objects as
required.
The mctp-i3c driver is using the Provisioned ID as an identifier for
target I3C devices (the neighbour address), as that will be more stable
than the I3C dynamic address. The driver internally translates that to a
dynamic address for bus operations.
The driver has been tested using an AST2600 platform. A remote endpoint
has been tested against QEMU, as well as using the target mode support
in Aspeed's vendor tree.
I3C maintainers have acked merging this through net-next tree.
====================
Link: https://lore.kernel.org/r/20231013040628.354323-1-matt@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Provides MCTP network transport over an I3C bus, as specified in
DMTF DSP0233.
Each I3C bus (with "mctp-controller" devicetree property) gets an
"mctpi3cX" net device created. I3C devices are reachable as remote
endpoints through that net device. Link layer addressing uses the
I3C PID as a fixed hardware address for neighbour table entries.
The driver matches I3C devices that have the MIPI assigned DCR 0xCC for
MCTP.
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
consume_skb() doesn't walk the segment list, so segments other than
the first are leaked.
Move this skb_consume call into the loop.
Cc: Willem de Bruijn <willemb@google.com>
Fixes: b3098d32ed ("net: add skb_segment kunit test")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-10-16
We've added 90 non-merge commits during the last 25 day(s) which contain
a total of 120 files changed, 3519 insertions(+), 895 deletions(-).
The main changes are:
1) Add missed stats for kprobes to retrieve the number of missed kprobe
executions and subsequent executions of BPF programs, from Jiri Olsa.
2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is
for systemd to reimplement the LogNamespace feature which allows
running multiple instances of systemd-journald to process the logs
of different services, from Daan De Meyer.
3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich.
4) Improve BPF verifier log output for scalar registers to better
disambiguate their internal state wrt defaults vs min/max values
matching, from Andrii Nakryiko.
5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving
the source IP address with a new BPF_FIB_LOOKUP_SRC flag,
from Martynas Pumputis.
6) Add support for open-coded task_vma iterator to help with symbolization
for BPF-collected user stacks, from Dave Marchevsky.
7) Add libbpf getters for accessing individual BPF ring buffers which
is useful for polling them individually, for example, from Martin Kelly.
8) Extend AF_XDP selftests to validate the SHARED_UMEM feature,
from Tushar Vyavahare.
9) Improve BPF selftests cross-building support for riscv arch,
from Björn Töpel.
10) Add the ability to pin a BPF timer to the same calling CPU,
from David Vernet.
11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic
implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments,
from Alexandre Ghiti.
12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen.
13) Fix bpftool's skeleton code generation to guarantee that ELF data
is 8 byte aligned, from Ian Rogers.
14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4
security mitigations in BPF verifier, from Yafang Shao.
15) Annotate struct bpf_stack_map with __counted_by attribute to prepare
BPF side for upcoming __counted_by compiler support, from Kees Cook.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits)
bpf: Ensure proper register state printing for cond jumps
bpf: Disambiguate SCALAR register state output in verifier logs
selftests/bpf: Make align selftests more robust
selftests/bpf: Improve missed_kprobe_recursion test robustness
selftests/bpf: Improve percpu_alloc test robustness
selftests/bpf: Add tests for open-coded task_vma iter
bpf: Introduce task_vma open-coded iterator kfuncs
selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
bpf: Don't explicitly emit BTF for struct btf_iter_num
bpf: Change syscall_nr type to int in struct syscall_tp_t
net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set
bpf: Avoid unnecessary audit log for CPU security mitigations
selftests/bpf: Add tests for cgroup unix socket address hooks
selftests/bpf: Make sure mount directory exists
documentation/bpf: Document cgroup unix socket address hooks
bpftool: Add support for cgroup unix socket address hooks
libbpf: Add support for cgroup unix socket address hooks
bpf: Implement cgroup sockaddr hooks for unix sockets
bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf
bpf: Propagate modified uaddrlen from cgroup sockaddr programs
...
====================
Link: https://lore.kernel.org/r/20231016204803.30153-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A few networking drivers including bnx2x, bnxt, qede, and idpf call
tcp_gro_complete as part of offloading TCP GRO. The function is only
defined if CONFIG_INET is true, since its TCP specific and is meaningless
if the kernel lacks IP networking support.
The combination of trying to use the complex network drivers with
CONFIG_NET but not CONFIG_INET is rather unlikely in practice: most use
cases are going to need IP networking.
The tcp_gro_complete function just sets some data in the socket buffer for
use in processing the TCP packet in the event that the GRO was offloaded to
the device. If the kernel lacks TCP support, such setup will simply go
unused.
The bnx2x, bnxt, and qede drivers wrap their TCP offload support in
CONFIG_INET checks and skip handling on such kernels.
The idpf driver did not check CONFIG_INET and thus fails to link if the
kernel is configured with CONFIG_NET=y, CONFIG_IDPF=(m|y), and
CONFIG_INET=n.
While checking CONFIG_INET does allow the driver to bypass significantly
more instructions in the event that we know TCP networking isn't supported,
the configuration is unlikely to be used widely.
Rather than require driver authors to care about this, stub the
tcp_gro_complete function when CONFIG_INET=n. This allows drivers to be
left as-is. It does mean the idpf driver will perform slightly more work
than strictly necessary when CONFIG_INET=n, since it will still execute
some of the skb setup in idpf_rx_rsc. However, that work would be performed
in the case where CONFIG_INET=y anyways.
I did not change the existing drivers, since they appear to wrap a
significant portion of code when CONFIG_INET=n. There is little benefit in
trashing these drivers just to unwrap and remove the CONFIG_INET check.
Using a stub for tcp_gro_complete is still beneficial, as it means future
drivers no longer need to worry about this case of CONFIG_NET=y and
CONFIG_INET=n, which should reduce noise from buildbots that check such a
configuration.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20231013185502.1473541-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When modifying netclassid, the command("echo 0x100001 > net_cls.classid")
will take more time on many threads of one process, because the process
create many fds.
for example, one process exists 28000 fds and 60000 threads, echo command
will task 45 seconds.
Now, we only consider the main process when exec "iterate_fd", and the
time is about 52 milliseconds.
Signed-off-by: Liansen Zhai <zhailiansen@kuaishou.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231012090330.29636-1-zhailiansen@kuaishou.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TCP pingpong threshold is 1 by default. But some applications, like SQL DB
may prefer a higher pingpong threshold to activate delayed acks in quick
ack mode for better performance.
The pingpong threshold and related code were changed to 3 in the year
2019 in:
commit 4a41f453be ("tcp: change pingpong threshold to 3")
And reverted to 1 in the year 2022 in:
commit 4d8f24eeed ("Revert "tcp: change pingpong threshold to 3"")
There is no single value that fits all applications.
Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
optimal performance based on the application needs.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/1697056244-21888-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>