linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-13 10:10:24 -04:00

Author	SHA1	Message	Date
Wen Gong	68c35cc39b	wifi: ath12k: trigger station disconnect on hardware restart Currently after the hardware restart triggered from the driver, the station interface connection remains intact, since a disconnect trigger is not sent to userspace. This can lead to a problem in targets where the wifi mac sequence is added by the firmware. After the target restart, its wifi mac sequence number gets reset to zero. Hence the AP to which our device is connected will receive frames with a wifi mac sequence number that jumps to the past, thereby resulting in the AP dropping all these frames, until a frame arrives with the wifi mac sequence number which the AP was expecting. To avoid such frame drops, it is better to trigger a station disconnect upon target hardware restart, which can be done with the API ieee80211_reconfig_disconnect exposed to mac80211. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230714092555.2018-1-quic_wgong@quicinc.com	2023-08-02 20:00:25 +03:00
Baochen Qiang	7ee027abd4	wifi: ath12k: Use pdev_id rather than mac_id to get pdev We are seeing kernel crash in below test scenario: 1. make DUT connect to an WPA3 encrypted 11ax AP in Ch44 HE80 2. use "wpa_cli -i <inf> disconnect" to disconnect 3. wait for DUT to automatically reconnect Kernel crashes while waiting, below shows the crash stack: [ 755.120868] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 755.120871] #PF: supervisor read access in kernel mode [ 755.120872] #PF: error_code(0x0000) - not-present page [ 755.120873] PGD 0 P4D 0 [ 755.120875] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 755.120876] CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 5.19.0-rc1+ #3 [ 755.120878] Hardware name: Intel(R) Client Systems NUC11PHi7/NUC11PHBi7, BIOS PHTGL579.0063.2021.0707.1057 07/07/2021 [ 755.120879] RIP: 0010:ath12k_dp_process_rx_err+0x2b6/0x14a0 [ath12k] [ 755.120890] Code: 01 c0 48 c1 e0 05 48 8b 9c 07 b8 b2 00 00 48 c7 c0 61 ff 0e c1 48 85 db 53 48 0f 44 c6 48 c7 c6 80 9d 0f c1 50 e8 1a 25 00 00 <4c> 8b 3b 4d 8b 76 14 41 59 41 5a 41 8b 87 78 43 01 00 4d 85 f6 89 [ 755.120891] RSP: 0018:ffff9a93402c8d10 EFLAGS: 00010282 [ 755.120892] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000303 [ 755.120893] RDX: 0000000000000000 RSI: ffffffff93b7cbe9 RDI: 00000000ffffffff [ 755.120894] RBP: ffff9a93402c8e50 R08: ffffffff93e65360 R09: ffffffff942e044d [ 755.120894] R10: 0000000000000000 R11: 0000000000000063 R12: ffff8dbec5420000 [ 755.120895] R13: ffff8dbec5420000 R14: ffff8dbdefe9a0a0 R15: ffff8dbec5420000 [ 755.120896] FS: 0000000000000000(0000) GS:ffff8dc2705c0000(0000) knlGS:0000000000000000 [ 755.120897] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 755.120898] CR2: 0000000000000000 CR3: 0000000107be4005 CR4: 0000000000770ee0 [ 755.120898] PKRU: 55555554 [ 755.120899] Call Trace: [ 755.120900] <IRQ> [ 755.120903] ? 
ath12k_pci_write32+0x2e/0x80 [ath12k] [ 755.120910] ath12k_dp_service_srng+0x214/0x2e0 [ath12k] [ 755.120917] ath12k_pci_ext_grp_napi_poll+0x26/0x80 [ath12k] [ 755.120923] __napi_poll+0x2b/0x1c0 [ 755.120925] net_rx_action+0x2a1/0x2f0 [ 755.120927] __do_softirq+0xfa/0x2e9 [ 755.120929] irq_exit_rcu+0xb9/0xd0 [ 755.120932] common_interrupt+0xc1/0xe0 [ 755.120934] </IRQ> [ 755.120934] <TASK> [ 755.120935] asm_common_interrupt+0x2c/0x40 [ 755.120936] RIP: 0010:cpuidle_enter_state+0xdd/0x3a0 [ 755.120938] Code: 00 31 ff e8 45 e2 74 ff 80 7d d7 00 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 a0 02 00 00 31 ff e8 69 79 7b ff fb 0f 1f 44 00 00 <45> 85 ff 0f 88 6d 01 00 00 49 63 d7 4c 2b 6d c8 48 8d 04 52 48 8d [ 755.120939] RSP: 0018:ffff9a934018be50 EFLAGS: 00000246 [ 755.120940] RAX: ffff8dc2705c0000 RBX: 0000000000000002 RCX: 000000000000001f [ 755.120941] RDX: 000000afd0b532d3 RSI: ffffffff93b7cbe9 RDI: ffffffff93b8b66e [ 755.120942] RBP: ffff9a934018be88 R08: 0000000000000002 R09: 0000000000030500 [ 755.120942] R10: ffff9a934018be18 R11: 0000000000000741 R12: ffffba933fdc0600 [ 755.120943] R13: 000000afd0b532d3 R14: ffffffff93fcbc60 R15: 0000000000000002 [ 755.120945] cpuidle_enter+0x2e/0x40 [ 755.120946] call_cpuidle+0x23/0x40 [ 755.120948] do_idle+0x1ff/0x260 [ 755.120950] cpu_startup_entry+0x1d/0x20 [ 755.120951] start_secondary+0x10d/0x130 [ 755.120953] secondary_startup_64_no_verify+0xd3/0xdb [ 755.120956] </TASK> [ 755.120956] Modules linked in: michael_mic rfcomm cmac algif_hash algif_skcipher af_alg bnep qrtr_mhi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio kvm_intel qrtr snd_hda_codec_hdmi kvm irqbypass ath12k snd_hda_intel snd_seq_midi crct10dif_pclmul mhi ghash_clmulni_intel snd_intel_dspcfg snd_seq_midi_event aesni_intel qmi_helpers i915 snd_rawmidi crypto_simd snd_hda_codec cryptd cec intel_cstate snd_hda_core mac80211 rc_core nouveau snd_seq snd_hwdep btusb drm_buddy 
drm_ttm_helper nls_iso8859_1 snd_pcm ttm btrtl snd_seq_device wmi_bmof mxm_wmi input_leds cfg80211 joydev btbcm drm_display_helper snd_timer btintel mei_me libarc4 drm_kms_helper bluetooth i2c_algo_bit snd fb_sys_fops syscopyarea mei sysfillrect ecdh_generic soundcore sysimgblt ecc acpi_pad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr parport_pc ppdev lp ramoops parport reed_solomon drm efi_pstore ip_tables x_tables autofs4 [ 755.120992] hid_generic usbhid hid ax88179_178a usbnet mii nvme nvme_core rtsx_pci_sdmmc crc32_pclmul i2c_i801 intel_lpss_pci i2c_smbus intel_lpss rtsx_pci idma64 virt_dma vmd wmi video [ 755.121002] CR2: 0000000000000000 The crash is because, for WCN7850, only ab->pdev[0] is initialized, while mac_id here is misused to retrieve pdev and it is not zero, leading to a NULL pointer access. Fix this issue by getting pdev_id first and then use it to retrieve pdev. Also fix some other code snippets which have the same issue. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230714080658.3140-1-quic_bqiang@quicinc.com	2023-08-02 19:59:04 +03:00
Wen Gong	1e9b1363e2	wifi: ath12k: avoid array overflow of hw mode for preferred_hw_mode Currently ath12k defines WMI_HOST_HW_MODE_DBS_OR_SBS=5 as the max hw mode for enum wmi_host_hw_mode_config_type, and the same holds for the array ath12k_hw_mode_pri_map. When tested with a new version of firmware/board data which supports the new hw mode eMLSR with hw mode value 8, this leads to an out-of-bounds access of the array ath12k_hw_mode_pri_map in function ath12k_wmi_hw_mode_caps(), which then changes preferred_hw_mode to 8, and finally function ath12k_pull_mac_phy_cap_svc_ready_ext() selects the capability of hw mode 8. But the capability of eMLSR mode reported by firmware does not support the 2.4 GHz band for WCN7850, so the 2.4 GHz band ends up disabled. Skipping any hw mode which exceeds WMI_HOST_HW_MODE_MAX in function ath12k_wmi_hw_mode_caps() avoids the array overflow, so the 2.4 GHz band will not be disabled. This keeps compatibility with newer firmware/board data files. The change is still needed after ath12k adds eMLSR hw mode 8 to the array ath12k_hw_mode_pri_map and enum wmi_host_hw_mode_config_type, because more hw modes may be added in the next firmware/board data version, e.g. hw mode 9, which would again lead to an array overflow without this change. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230714072405.28705-1-quic_wgong@quicinc.com	2023-08-02 19:57:56 +03:00
Arnd Bergmann	603cf6c2fc	wifi: ath12k: fix memcpy array overflow in ath12k_peer_assoc_h_he() Two memory copies in this function copy from a short array into a longer one, using the wrong size, which leads to an out-of-bounds access: include/linux/fortify-string.h:592:4: error: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror,-Wattribute-warning] __read_overflow2_field(q_size_field, size); ^ include/linux/fortify-string.h:592:4: error: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror,-Wattribute-warning] 2 errors generated. Fixes: `d889913205` ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230703123737.3420464-1-arnd@kernel.org	2023-08-02 19:55:53 +03:00
Wen Gong	15c8441dc1	wifi: ath12k: correct the data_type from QMI_OPT_FLAG to QMI_UNSIGNED_1_BYTE for mlo_capable Currently, the encoding rule for field mlo_capable in struct qmi_wlanfw_host_cap_req_msg_v01 defined in array qmi_wlanfw_host_cap_req_msg_v01_ei uses type QMI_OPT_FLAG. Unfortunately, all ath12k firmware actually expects this field to be of a non-QMI_OPT_FLAG type such as QMI_UNSIGNED_1_BYTE/QMI_UNSIGNED_8_BYTE... As a result, firmware is unable to correctly decode the mlo_capable field. Change the ath12k definition to QMI_UNSIGNED_1_BYTE to match the firmware definition so that firmware can correctly parse the mlo_capable info from message QMI_WLANFW_HOST_CAP_REQ_V01 at wlan load time. This was just an accidental typo; both WCN7850 and QCN9274 firmwares use QMI_UNSIGNED_1_BYTE. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230726093857.3610-1-quic_wgong@quicinc.com	2023-08-02 19:54:57 +03:00
Wen Gong	8ad314da54	wifi: ath12k: Fix a NULL pointer dereference in ath12k_mac_op_hw_scan() In ath12k_mac_op_hw_scan(), the return value of kzalloc() is directly used in memcpy(), which may lead to a NULL pointer dereference on failure of kzalloc(). Fix this bug by adding a check of arg.extraie.ptr. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230726092625.3350-1-quic_wgong@quicinc.com	2023-08-02 19:54:18 +03:00
Seevalamuthu Mariappan	13329d0cb7	wifi: ath11k: Remove cal_done check during probe In some race conditions, calibration done QMI message is received even before host wait starts for calibration to be done. Due to this, resetting firmware was not performed after calibration. Hence, remove cal_done check in ath11k_qmi_fwreset_from_cold_boot() as this is called only from probe. Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1 Signed-off-by: Seevalamuthu Mariappan <quic_seevalam@quicinc.com> Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230726141032.3061-4-quic_rajkbhag@quicinc.com	2023-08-02 19:49:39 +03:00
Anilkumar Kolli	bdfc967bf5	wifi: ath11k: Add coldboot calibration support for QCN9074 QCN9074 supports 6 GHz, which has increased number of channels compared to 5 GHz/2 GHz. So, to support coldboot calibration in QCN9074 ATH11K_COLD_BOOT_FW_RESET_DELAY extended to 60 seconds. To avoid code redundancy, fwreset_from_cold_boot moved to QMI and made common for both ahb and pci. Coldboot calibration is enabled only in FTM mode for QCN9074. QCN9074 requires firmware restart after coldboot, hence enable cbcal_restart_fw in hw_params. This support can be enabled/disabled using hw params for different hardware. Currently it is not enabled for QCA6390. Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1 Signed-off-by: Anilkumar Kolli <quic_akolli@quicinc.com> Signed-off-by: Seevalamuthu Mariappan <quic_seevalam@quicinc.com> Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230726141032.3061-3-quic_rajkbhag@quicinc.com	2023-08-02 19:49:39 +03:00
Seevalamuthu Mariappan	011e5a3052	wifi: ath11k: Split coldboot calibration hw_param QCN9074 enables coldboot calibration only in Factory Test Mode (FTM). Hence, split cold_boot_calib to two hw_params for mission and FTM mode. Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1 Signed-off-by: Seevalamuthu Mariappan <quic_seevalam@quicinc.com> Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230726141032.3061-2-quic_rajkbhag@quicinc.com	2023-08-02 19:49:38 +03:00
Dmitry Antipov	6f092c98dc	wifi: ath11k: simplify ath11k_mac_validate_vht_he_fixed_rate_settings() In ath11k_mac_validate_vht_he_fixed_rate_settings() ar->ab->peers list is not altered so list_for_each_entry() should be safe. Compile tested only. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230726092113.78794-1-dmantipov@yandex.ru	2023-08-02 14:52:15 +03:00
Aditya Kumar Singh	72c8caf904	wifi: ath11k: fix band selection for ppdu received in channel 177 of 5 GHz 5 GHz band channel 177 support was added with the commit `e5e94d10c8` ("wifi: ath11k: add channel 177 into 5 GHz channel list"). However, during processing for the received ppdu in ath11k_dp_rx_h_ppdu(), channel number is checked only till 173. This leads to driver code checking for channel and then fetching the band from it which is extra effort since firmware has already given the channel number in the metadata. Fix this issue by checking the channel number till 177 since we support it now. Found via code review. Compile tested only. Fixes: `e5e94d10c8` ("wifi: ath11k: add channel 177 into 5 GHz channel list") Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230726044624.20507-1-quic_adisi@quicinc.com	2023-08-02 14:48:40 +03:00
Dmitry Antipov	1ad8237e97	wifi: wil6210: fix fortify warnings When compiling with gcc 13.1 and CONFIG_FORTIFY_SOURCE=y, I've noticed the following: In function ‘fortify_memcpy_chk’, inlined from ‘wil_rx_crypto_check_edma’ at drivers/net/wireless/ath/wil6210/txrx_edma.c:566:2: ./include/linux/fortify-string.h:529:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 529 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ where the compiler complains on: const u8 *pn; ... pn = (u8 *)&st->ext.pn_15_0; ... memcpy(cc->pn, pn, IEEE80211_GCMP_PN_LEN); and: In function ‘fortify_memcpy_chk’, inlined from ‘wil_rx_crypto_check’ at drivers/net/wireless/ath/wil6210/txrx.c:684:2: ./include/linux/fortify-string.h:529:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 529 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ where the compiler complains on: const u8 *pn = (u8 *)&d->mac.pn_15_0; ... memcpy(cc->pn, pn, IEEE80211_GCMP_PN_LEN); In both cases, the fortification logic interprets 'memcpy()' as 6-byte overread of 2-byte field 'pn_15_0' of 'struct wil_rx_status_extension' and 'pn_15_0' of 'struct vring_rx_mac', respectively. To silence these warnings, last two fields of the aforementioned structures are grouped using 'struct_group_attr(pn, __packed' quirk. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230621093711.80118-1-dmantipov@yandex.ru	2023-07-27 19:12:49 +03:00
Dongliang Mu	061115fbfb	wifi: ath9k: fix printk specifier Smatch reports: ath_pci_probe() warn: argument 4 to %lx specifier is cast from pointer ath_ahb_probe() warn: argument 4 to %lx specifier is cast from pointer Fix it by modifying %lx to %p in the printk format string. Note that with this change, the pointer address will be printed as a hashed value by default. This is appropriate because the kernel should not leak kernel pointers to user space in an informational message. If someone wants to see the real address for debugging purposes, this can be achieved with the no_hash_pointers kernel option. Signed-off-by: Dongliang Mu <dzm91@hust.edu.cn> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230723040403.296723-1-dzm91@hust.edu.cn	2023-07-25 17:31:37 +03:00
Wang Ming	1301783c8d	wifi: ath6kl: Remove error checking for debugfs_create_dir() It is expected that most callers should _ignore_ the errors returned by debugfs_create_dir() in ath6kl_debug_init_fs(). Signed-off-by: Wang Ming <machel@vivo.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230714014358.514-1-machel@vivo.com	2023-07-25 17:30:54 +03:00
Minjie Du	f7eb8315b2	wifi: ath5k: remove phydir check from ath5k_debug_init_device() 'phydir' returned from debugfs_create_dir() is checked against NULL. As the debugfs API returns an error pointer, the returned value can never be NULL. Therefore, as the documentation suggests that the check is unnecessary and other debugfs calls have no operation in error cases, it is advisable to completely eliminate the check. Signed-off-by: Minjie Du <duminjie@vivo.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230714081619.2032-1-duminjie@vivo.com	2023-07-25 17:30:03 +03:00
Dmitry Antipov	810e41cebb	wifi: ath9k: fix fortify warnings When compiling with gcc 13.1 and CONFIG_FORTIFY_SOURCE=y, I've noticed the following: In function ‘fortify_memcpy_chk’, inlined from ‘ath_tx_complete_aggr’ at drivers/net/wireless/ath/ath9k/xmit.c:556:4, inlined from ‘ath_tx_process_buffer’ at drivers/net/wireless/ath/ath9k/xmit.c:773:3: ./include/linux/fortify-string.h:529:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 529 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function ‘fortify_memcpy_chk’, inlined from ‘ath_tx_count_frames’ at drivers/net/wireless/ath/ath9k/xmit.c:473:3, inlined from ‘ath_tx_complete_aggr’ at drivers/net/wireless/ath/ath9k/xmit.c:572:2, inlined from ‘ath_tx_process_buffer’ at drivers/net/wireless/ath/ath9k/xmit.c:773:3: ./include/linux/fortify-string.h:529:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 529 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In both cases, the compiler complains on: memcpy(ba, &ts->ba_low, WME_BA_BMP_SIZE >> 3); which is the legal way to copy both 'ba_low' and following 'ba_high' members of 'struct ath_tx_status' at once (that is, issue one 8-byte 'memcpy()' for two 4-byte fields). Since the fortification logic seems to interpret this trick as an attempt to overread 4-byte 'ba_low', silence relevant warnings by using the convenient 'struct_group()' quirk. Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230620080855.396851-2-dmantipov@yandex.ru	2023-07-25 17:28:03 +03:00
Dmitry Antipov	90f2ba4896	wifi: ath9k: avoid using uninitialized array In 'ath_tx_count_frames()', 'ba' array may be used uninitialized, so add 'memset()' call similar to one used in 'ath_tx_complete_aggr()'. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230620080855.396851-1-dmantipov@yandex.ru	2023-07-25 17:28:02 +03:00
Eric Dumazet	f5f80e32de	ipv6: remove hard coded limitation on ipv6_pinfo IPv6 inet sockets are supposed to have a "struct ipv6_pinfo" field at the end of their definition, so that inet6_sk_generic() can derive from socket size the offset of the "struct ipv6_pinfo". This is very fragile, and prevents adding bigger alignment in sockets, because inet6_sk_generic() does not work if the compiler adds padding after the ipv6_pinfo component. We are currently working on a patch series to reorganize TCP structures for better data locality and found issues similar to the one fixed in commit `f5d547676c` ("tcp: fix tcp_inet6_sk() for 32bit kernels") Alternative would be to force an alignment on "struct ipv6_pinfo", greater or equal to __alignof__(any ipv6 sock) to ensure there is no padding. This does not look great. v2: fix typo in mptcp_proto_v6_init() (Paolo) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Chao Wu <wwchao@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-24 09:39:31 +01:00
Patrick Rohr	1671bcfd76	net: add sysctl accept_ra_min_rtr_lft This change adds a new sysctl accept_ra_min_rtr_lft to specify the minimum acceptable router lifetime in an RA. If the received RA router lifetime is less than the configured value (and not 0), the RA is ignored. This is useful for mobile devices, whose battery life can be impacted by networks that configure RAs with a short lifetime. On such networks, the device should never gain IPv6 provisioning and should attempt to drop RAs via hardware offload, if available. Signed-off-by: Patrick Rohr <prohr@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:51:24 +01:00
justinstitt@google.com	5c9f7b04aa	net: dsa: remove deprecated strncpy `strncpy` is deprecated for use on NUL-terminated destination strings [1]. Even call sites utilizing length-bounded destination buffers should switch over to using `strtomem` or `strtomem_pad`. In this case, however, the compiler is unable to determine the size of the `data` buffer which renders `strtomem` unusable. Due to this, `strscpy` should be used. It should be noted that most call sites already zero-initialize the destination buffer. However, I've opted to use `strscpy_pad` to maintain the same exact behavior that `strncpy` produced (zero-padded tail up to `len`). Also see [3]. [1]: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [2]: elixir.bootlin.com/linux/v6.3/source/net/ethtool/ioctl.c#L1944 [3]: manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html Link: https://github.com/KSPP/linux/issues/90 Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:45:46 +01:00
David S. Miller	2e60314c28	Merge branch 'process-connector-bug-fixes-and-enhancements' Anjali Kulkarni says: ==================== Process connector bug fixes & enhancements Oracle DB is trying to solve a performance overhead problem it has been facing for the past 10 years and using this patch series, we can fix this issue. Oracle DB runs on a large scale with 100000s of short lived processes, starting up and exiting quickly. A process monitoring DB daemon which tracks and cleans up after processes that have died without a proper exit needs notifications only when a process died with a non-zero exit code (which should be rare). Due to the pmon architecture, which is distributed, each process is independent and has minimal interaction with pmon. Hence fd based solutions to track a process's spawning and exit cannot be used. Pmon needs to detect the abnormal death of a process so it can cleanup after. Currently it resorts to checking /proc every few seconds. Other methods we tried like using system call to reduce the above overhead were not accepted upstream. With this change, we add event based filtering to proc connector module so that DB can only listen to the events it is interested in. A new event type PROC_EVENT_NONZERO_EXIT is added, which is only sent by kernel to a listening application when any process exiting has a non-zero exit status. This change will give Oracle DB substantial performance savings - it takes 50ms to scan about 8K PIDs in /proc, about 500ms for 100K PIDs. DB does this check every 3 secs, so over an hour we save 10secs for 100K PIDs. With this, a client can register to listen for only exit or fork or a mix or all of the events. This greatly enhances performance - currently, we need to listen to all events, and there are 9 different types of events. For eg. 
handling 3 types of events - 8K-forks + 8K-exits + 8K-execs takes 200ms, whereas handling 2 types - 8K-forks + 8K-exits takes about 150ms, and handling just one type - 8K exits takes about 70ms. Measuring the time using pidfds for monitoring 8K process exits took 4 times longer - 200ms, as compared to 70ms using only exit notifications of proc connector. Hence, we cannot use pidfd for our use case. This kind of a new event could also be useful to other applications like Google's lmkd daemon, which needs a killed process's exit notification. This patch series is organized as follows - Patch 1 : Needed for patch 3 to work. Patch 2 : Needed for patch 3 to work. Patch 3 : Fixes some bugs in proc connector, details in the patch. Patch 4 : Adds event based filtering for performance enhancements. Patch 5 : Allow non-root users access to proc connector events. Patch 6 : Selftest code for proc connector. v9->v10 changes: - Rebased to net-next, re-compiled and re-tested. v8->v9 changes: - Added sha1 ("title") of reversed patch as suggested by Eric Dumazet. v7->v8 changes: - Fixed an issue pointed by Liam Howlett in v7. v6->v7 changes: - Incorporated Liam Howlett's comments on v6 - Incorporated Kalesh Anakkur Purayil's comments v5->v6 changes: - Incorporated Liam Howlett's comments - Removed FILTER define from proc_filter.c and added a "-f" run-time option to run new filter code. - Made proc_filter.c a selftest in tools/testing/selftests/connector v4->v5 changes: - Change the cover letter - Fix a small issue in proc_filter.c v3->v4 changes: - Fix comments by Jakub Kicinski to incorporate root access changes within bind call of connector v2->v3 changes: - Fix comments by Jakub Kicinski to separate netlink (patch 2) (after layering) from connector fixes (patch 3). - Minor fixes suggested by Jakub. - Add new multicast group level permissions check at netlink layer. 
Split this into netlink & connector layers (patches 6 & 7) v1->v2 changes: - Fix comments by Jakub Kicinski to keep layering within netlink and update kdocs. - Move non-root users access patch last in series so remaining patches can go in first. v->v1 changes: - Changed commit log in patch 4 as suggested by Christian Brauner - Changed patch 4 to make more fine grained access to non-root users - Fixed warning in cn_proc.c, Reported-by: kernel test robot <lkp@intel.com> - Fixed some existing warnings in cn_proc.c ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:34:22 +01:00
Anjali Kulkarni	73a29531f4	connector/cn_proc: Selftest for proc connector Run as ./proc_filter -f to run new filter code. Run without "-f" to run usual proc connector code without the new filtering code. Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:34:22 +01:00
Anjali Kulkarni	bfdfdc2f3b	connector/cn_proc: Allow non-root users access There were a couple of reasons for not allowing non-root users access initially. One is that at some point there was no proper receive buffer management in place for netlink multicast. But that should be long fixed. See link below for more context. The second is that some of the messages may contain data that is root only. But this should be handled with a finer granularity, which is being done at the protocol layer. The only problematic protocols are nf_queue and the firewall netlink. Hence, this restriction for non-root access was relaxed for NETLINK_ROUTE initially: https://lore.kernel.org/all/20020612013101.A22399@wotan.suse.de/ This restriction has also been removed for the following protocols: NETLINK_KOBJECT_UEVENT, NETLINK_AUDIT, NETLINK_SOCK_DIAG, NETLINK_GENERIC, NETLINK_SELINUX. Since process connector messages are not sensitive (process fork, exit notifications etc.), and anyone can read /proc data, we can allow non-root access here. However, since process event notification is not the only consumer of NETLINK_CONNECTOR, we can make this change even more fine grained than the protocol level, by checking for multicast group within the protocol. Allow non-root access for NETLINK_CONNECTOR via NL_CFG_F_NONROOT_RECV but add new bind function cn_bind(), which allows non-root access only for CN_IDX_PROC multicast group. Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:34:22 +01:00
Anjali Kulkarni	743acf351b	connector/cn_proc: Performance improvements This patch adds the capability to filter messages sent by the proc connector on the event type supplied in the message from the client to the connector. The client can register to listen for an event type given in struct proc_input. This event based filtering will greatly enhance performance - handling 8K exits takes about 70ms, whereas 8K-forks + 8K-exits takes about 150ms & handling 8K-forks + 8K-exits + 8K-execs takes 200ms. There are currently 9 different types of events, and we need to listen to all of them. Also, measuring the time using pidfds for monitoring 8K process exits took much longer - 200ms, as compared to 70ms using only exit notifications of proc connector. We also add a new event type - PROC_EVENT_NONZERO_EXIT, which is only sent by kernel to a listening application when any process exiting has a non-zero exit status. This will help clients like Oracle DB, where a monitoring process wants notifications for non-zero process exits so it can clean up after them. This kind of a new event could also be useful to other applications like Google's lmkd daemon, which needs a killed process's exit notification. The patch takes care that existing clients using the old mechanism of not sending the event type work without any changes. The cn_filter function checks to see if the event type being notified via proc connector matches the event type requested by the client, before sending (matches) or dropping (does not match) a packet. Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:34:22 +01:00
Anjali Kulkarni	2aa1f7a1f4	connector/cn_proc: Add filtering to fix some bugs The current proc connector code has the following bugs: if there is more than one listener for the proc connector messages, and one of them deregisters for listening using PROC_CN_MCAST_IGNORE, they will still get all proc connector messages, as long as there is another listener. Another issue is if one client calls PROC_CN_MCAST_LISTEN, and another one calls PROC_CN_MCAST_IGNORE, then both will end up not getting any messages. This patch adds filtering and drops the packet if the client has sent PROC_CN_MCAST_IGNORE. This data is stored in the client socket's sk_user_data. In addition, we only increment or decrement proc_event_num_listeners once per client. This fixes the above issues. cn_release is the release function added for NETLINK_CONNECTOR. It uses the netlink_release function newly added to netlink_sock. It will free sk_user_data. Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:34:22 +01:00
Anjali Kulkarni	a4c9a56e6a	netlink: Add new netlink_release function A new function netlink_release is added in netlink_sock to store the protocol's release function. This is called when the socket is deleted. This can be supplied by the protocol via the release function in netlink_kernel_cfg. This is being added for the NETLINK_CONNECTOR protocol, so it can free its data when socket is deleted. Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:34:22 +01:00
Anjali Kulkarni	a3377386b5	netlink: Reverse the patch which removed filtering To use filtering at the connector & cn_proc layers, we need to enable filtering in the netlink layer. This reverses the patch which removed netlink filtering - commit ID for that patch: `549017aa1b` (netlink: remove netlink_broadcast_filtered). Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:34:22 +01:00
Jakub Kicinski	6bfef2ec01	Merge branch 'net-page_pool-remove-page_pool_release_page' Jakub Kicinski says: ==================== net: page_pool: remove page_pool_release_page() page_pool_return_page() is a historic artefact from before recycling of pages attached to skbs was supported. Theoretical uses for it may be thought up but in practice all existing users can be converted to use skb_mark_for_recycle() instead. This code was previously posted as part of the memory provider RFC. https://lore.kernel.org/all/20230707183935.997267-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230720010409.1967072-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-21 18:50:39 -07:00
Jakub Kicinski	07e0c7d317	net: page_pool: merge page_pool_release_page() with page_pool_return_page() Now that page_pool_release_page() is not exported we can merge it with page_pool_return_page(). I believe that the "Do not replace this with page_pool_return_page()" comment was there in case page_pool_return_page() was not inlined, to avoid two function calls. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Link: https://lore.kernel.org/r/20230720010409.1967072-5-kuba@kernel.org Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-21 18:50:24 -07:00
Jakub Kicinski	535b9c61bd	net: page_pool: hide page_pool_release_page() There seems to be no user calling page_pool_release_page() for legit reasons, all the users simply haven't been converted to skb-based recycling, yet. Previous changes converted them. Update the docs, and unexport the function. Link: https://lore.kernel.org/r/20230720010409.1967072-4-kuba@kernel.org Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-21 18:50:18 -07:00
Jakub Kicinski	98e2727c79	eth: stmmac: let page recycling happen with skbs stmmac removes pages from the page pool after attaching them to skbs. Use page recycling instead. skb heads are always copied, and pages are always from page pool in this driver. We could as well mark all allocated skbs for recycling. Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Link: https://lore.kernel.org/r/20230720010409.1967072-3-kuba@kernel.org Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-21 18:50:18 -07:00
Jakub Kicinski	b03f68ba26	eth: tsnep: let page recycling happen with skbs tsnep builds an skb with napi_build_skb() and then calls page_pool_release_page() for the page in which that skb's head sits. Use recycling instead, recycling of heads works just fine. Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Link: https://lore.kernel.org/r/20230720010409.1967072-2-kuba@kernel.org Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-21 18:50:18 -07:00
Jiri Pirko	5766946ea5	genetlink: add explicit ordering break check for split ops Currently, if cmd in the split ops array is of lower value than the previous one, genl_validate_ops() continues to do the checks as if the values are equal. This may result in non-obvious WARN_ON() hit in these checks. Instead, check the incorrect ordering explicitly and put a WARN_ON() in case it is broken. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230720111354.562242-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-21 18:49:12 -07:00
Marc Kleine-Budde	070e8bd31b	MAINTAINERS: net: fix sort order Linus seems to like the MAINTAINERS file sorted, see `c192ac7357` ("MAINTAINERS 2: Electric Boogaloo"). Since this is currently not the case, restore the sort order. Fixes: `3abf3d15ff` ("MAINTAINERS: ASP 2.0 Ethernet driver maintainers") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20230720151107.679668-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-21 18:48:54 -07:00
David S. Miller	2da6a80416	Merge branch 'octeontx2-pf-round-robin-sched' Hariprasad Kelam says: ==================== octeontx2-pf: support Round Robin scheduling octeontx2 and CN10K silicons support Round Robin scheduling. When multiple traffic flows reach transmit level with the same priority, with Round Robin scheduling traffic flow with the highest quantum value is picked. With this support, the user can add multiple classes with the same priority and different quantum in htb offload. This series of patches adds support for the same. Patch1: implement transmit schedular allocation algorithm as preparation for support round robin scheduling. Patch2: Allow quantum parameter in HTB offload mode. Patch3: extends octeontx2 htb offload support for Round Robin scheduling Patch4: extend QOS documentation for Round Robin scheduling Hariprasad Kelam (1): docs: octeontx2: extend documentation for Round Robin scheduling Naveen Mamindlapalli (3): octeontx2-pf: implement transmit schedular allocation algorithm sch_htb: Allow HTB quantum parameter in offload mode octeontx2-pf: htb offload support for Round Robin scheduling --- v4 * update classid values in documentation. v3 * 1. update QOS documentation for round robin scheduling 2. added out of bound checks for quantum parameter v2 * change data type of otx2_index_used to reduce size of structure otx2_qos_cfg ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 09:55:54 +01:00
Hariprasad Kelam	6f71051ffb	docs: octeontx2: extend documentation for Round Robin scheduling Add example tc-htb commands for Round robin scheduling Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 09:55:54 +01:00
Naveen Mamindlapalli	47a9656f16	octeontx2-pf: htb offload support for Round Robin scheduling When multiple traffic flows reach Transmit level with the same priority, with Round robin scheduling traffic flow with the highest quantum value is picked. With this support, the user can add multiple classes with the same priority and different quantum. This patch does necessary changes to support the same. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 09:55:53 +01:00
Naveen Mamindlapalli	9fe63d5f1d	sch_htb: Allow HTB quantum parameter in offload mode The current implementation of HTB offload returns the EINVAL error for quantum parameter. This patch removes the error returning checks for 'quantum' parameter and populates its value to tc_htb_qopt_offload structure such that driver can use the same. Add quantum parameter check in mlx5 driver, as mlx5 devices are not capable of supporting the quantum parameter when htb offload is used. Report error if quantum parameter is set to a non-default value. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 09:55:53 +01:00
Naveen Mamindlapalli	f78dca6912	octeontx2-pf: implement transmit schedular allocation algorithm unlike strict priority, where number of classes are limited to max 8, there is no restriction on the number of dwrr child nodes unless the count increases the max number of child nodes supported. Hardware expects strict priority transmit schedular indexes mapped to their priority. This patch adds defines transmit schedular allocation algorithm such that the above requirement is honored. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 09:55:53 +01:00
David S. Miller	c6514f3627	Merge branch 'mlxsw-enslavement' Petr Machata says: ==================== mlxsw: Permit enslavement to netdevices with uppers The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. As an example, take a front panel port, configure an IP address: it gets a RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the port from the bridge again, but the RIF never comes back. There is a number of similar situations, where changing the configuration there and back utterly breaks the offload. Similarly, detaching a front panel port from a configured topology means unoffloading of this whole topology -- VLAN uppers, next hops, etc. Attaching the port back is then not permitted at all. If it were, it would not result in a working configuration, because much of mlxsw is written to react to changes in immediate configuration. There is nothing that would go visit netdevices in the attached-to topology and offload existing routes and VLAN memberships, for example. In this patchset, introduce a number of replays to be invoked so that this sort of post-hoc offload is supported. Then remove the vetoes that disallowed enslavement of front panel ports to other netdevices with uppers. The patchset progresses as follows: - In patch #1, fix an issue in the bridge driver. To my knowledge, the issue could not have resulted in a buggy behavior previously, and thus is packaged with this patchset instead of being sent separately to net. - In patch #2, add a new helper to the switchdev code. 
- In patch #3, drop mlxsw selftests that will not be relevant after this patchset anymore. - Patches #4, #5, #6, #7 and #8 prepare the codebase for smoother introduction of the rest of the code. - Patches #9, #10, #11, #12, #13 and #14 replay various aspects of upper configuration when a front panel port is introduced into a topology. Individual patches take care of bridge and LAG RIF memberships, switchdev replay, nexthop and neighbors replay, and MACVLAN offload. - Patches #15 and #16 introduce RIFs for newly-relevant netdevices when a front panel port is enslaved (in which case all uppers are newly relevant), or, respectively, deslaved (in which case the newly-relevant netdevice is the one being deslaved). - Up until this point, the introduced scaffolding was not really used, because mlxsw still forbids enslavement of mlxsw netdevices to uppers with uppers. In patch #17, this condition is finally relaxed. A sizable selftest suite is available to test all this new code. That will be sent in a separate patchset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:06 +01:00
Petr Machata	2c5ffe8d72	mlxsw: spectrum: Permit enslavement to netdevices with uppers Enslaving of front panel ports (and their uppers) to netdevices that already have uppers is currently forbidden. In the previous patches, a number of replays have been added. Those ensure that various bits of state, such as next hops or switchdev objects, are offloaded when they become relevant due to a mlxsw lower being introduced into the topology. However the act of actually, for example, enslaving a front-panel port to a bridge with uppers, has been vetoed so far. In this patch, remove the vetoes and permit the operation. mlxsw currently validates creation of "interesting" uppers. Thus creating VLAN netdevices on top of 802.1ad bridges is forbidden if the bridge has an mlxsw lower, but permitted in general. This validation code never gets run when a port is introduced as a lower of an existing netdevice structure. Thus when enslaving an mlxsw netdevice to netdevices with uppers, invoke the PRECHANGEUPPER event handler for each netdevice above the one that the front panel port is being enslaved to. This way the tower of netdevices above the attachment point is validated. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:06 +01:00
Petr Machata	4560cf408e	mlxsw: spectrum_router: Replay IP NETDEV_UP on device deslavement When a netdevice is removed from a bridge or a LAG, and it has an IP address, it should join the router and gain a RIF. Do that by replaying address addition event on the netdevice. When handling deslavement of LAG or its upper from a bridge device, the replay should be done after all the lowers of the LAG have left the bridge. Thus these scenarios are handled by passing replay_deslavement of false, and by invoking, after the lowers have been processed, a new helper, mlxsw_sp_netdevice_post_lag_event(), which does the per-LAG / -upper handling, and in particular invokes the replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:06 +01:00
Petr Machata	31618b22f2	mlxsw: spectrum_router: Replay IP NETDEV_UP on device enslavement Enslaving of front panel ports (and their uppers) to netdevices that already have uppers is currently forbidden. When this is permitted, any uppers with IP addresses need to have the NETDEV_UP inetaddr event replayed, so that any RIFs are created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:05 +01:00
Petr Machata	8fdb09a767	mlxsw: spectrum_router: Replay neighbours when RIF is made As neighbours are created, mlxsw is involved through the netevent notifications. When at the time there is no RIF for a given neighbour, the notification is not acted upon. When the RIF is later created, these outstanding neighbours are left unoffloaded and cause traffic to go through the SW datapath. In order to fix this issue, as a RIF is created, walk the ARP and ND tables and find neighbours for the netdevice that represents the RIF. Then schedule neighbour work for them, allowing them to be offloaded. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:05 +01:00
Petr Machata	49c3a615d3	mlxsw: spectrum_router: Replay MACVLANs when RIF is made If IP address is added to a MACVLAN netdevice, the effect is of configuring VRRP on the RIF for the netdevice linked to the MACVLAN. Because the MACVLAN offload is tied to existence of a RIF at the linked netdevice, adding a MACVLAN is currently not allowed until a RIF is present. If this requirement stays, it will never be possible to attach a first port into a topology that involves a MACVLAN. Thus topologies would need to be built in a certain order, which is impractical. Additionally, IP address removal, which leads to disappearance of the RIF that the MACVLAN depends on, cannot be vetoed. Thus even as things stand now it is possible to get to a state where a MACVLAN netdevice exists without a RIF, despite having mlxsw lowers. And once the MACVLAN is un-offloaded due to RIF getting destroyed, recreating the RIF does not bring it back. In this patch, accept that MACVLAN can be created out of order and support that use case. One option would seem to be to simply recognize MACVLAN netdevices as "interesting", and let the existing replay mechanisms take care of the offload. However, that does not address the necessity to reoffload MACVLAN once a RIF is created. Thus add a new replay hook, symmetrical to mlxsw_sp_rif_macvlan_flush(), called mlxsw_sp_rif_macvlan_replay(), which instead of unwinding the existing offloads, applies the configuration as if the netdevice were created just now. Additionally, remove all vetoes and warning messages that checked for presence of a RIF at the linked device. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:05 +01:00
Petr Machata	cfc01a92ea	mlxsw: spectrum_router: Offload ethernet nexthops when RIF is made As RIF is created, refresh each nexthop group tracked at the CRIF for which the RIF was created. Note that nothing needs to be done for IPIP nexthops. The RIF for these is either available from the get-go, or will never be available, so no after the fact offloading needs to be done. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:05 +01:00
Petr Machata	ef59713c26	mlxsw: spectrum_router: Join RIFs of LAG upper VLANs In the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join not only RIF for the immediate LAG, as is currently the case, but also RIFs for VLAN netdevices upper to the LAG. In this patch, extend mlxsw_sp_netdevice_router_join_lag() to walk the uppers of a LAG being joined, and also join any VLAN ones. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:05 +01:00
Petr Machata	ec4643ca3d	mlxsw: spectrum_switchdev: Replay switchdev objects on port join Currently it never happens that a netdevice that is already a bridge slave would suddenly become mlxsw upper. The only case where this might be possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG has any upper (e.g. is enslaved), enslaving mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and a mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. At that point there are no bridge objects (such as port VLANs) to replay. Those are added afterwards, and notified as they are created. This holds even for the PVID. However in the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to replay the existing bridge objects. Without this replay, e.g. the mlxsw bridge_port_vlan objects are not instantiated, which causes issues later, as a lot of code relies on their presence. To that end, add a new notifier block whose sole role is to filter out events related to the one relevant upper, and forward those to the existing switchdev notifier block. Pass the new notifier block to switchdev_bridge_port_offload() when the bridge port is created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:05 +01:00
Petr Machata	987c7782f0	mlxsw: spectrum: On port enslavement to a LAG, join upper's bridges Currently it never happens that a netdevice that is already a bridge slave would suddenly become mlxsw upper. The only case where this might be possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG already has an upper, enslaving mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and a mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. However in the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join bridges of LAG uppers. Without this replay, the mlxsw bridge_port objects are not instantiated, which causes issues later, as a lot of code relies on their presence. Therefore in this patch, when the first mlxsw physical netdevice is enslaved to a LAG, consider bridges upper to the LAG (both the direct master, if any, and any bridge masters of VLAN uppers), and have the relevant netdevices join their bridges. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:04 +01:00
Petr Machata	1c47e65b8c	mlxsw: spectrum: Add a replay_deslavement argument to event handlers When handling deslavement of LAG or its upper from a bridge device, when the deslaved netdevice has an IP address, it should join the router. This should be done after all the lowers of the LAG have left the bridge. The replay intended to cause the device to join the router therefore cannot be invoked unconditionally in the event handlers themselves. It can be done right away if the handler is invoked for a sole device, but when it is invoked repeatedly for each LAG lower, the replay needs to be postponed until after this processing is done. To that end, add a boolean parameter, replay_deslavement, to mlxsw_sp_netdevice_port_upper_event(), mlxsw_sp_netdevice_port_vlan_event() and one helper on the call path. Have the invocations that are done for sole netdevices pass true, and those done for LAG lowers pass false. Nothing depends on this flag at this point, but it removes some noise from the patch that introduces the replay itself. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:04 +01:00

1200653 Commits