linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-09 12:33:18 -04:00

Author	SHA1	Message	Date
David S. Miller	2619835e31	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-08-27 This series contains updates to ice driver only. Jake corrects the iterator used for looping Tx timestamp and removes dead code related to pin configuration. He also adds locking around flushing of the Tx tracker and restarts the periodic clock following time changes. Brett corrects the locking around updating netdev dev_addr. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-28 12:31:41 +01:00
Vladimir Oltean	0d55649d2a	net: phy: marvell10g: fix broken PHY interrupts for anyone after us in the driver probe list Enabling interrupts via device tree for the internal PHYs on the mv88e6390 DSA switch does not work. The driver insists to use poll mode. Stage one debugging shows that the fwnode_mdiobus_phy_device_register function calls fwnode_irq_get properly, and phy->irq is set to a valid interrupt line initially. But it is then cleared. Stage two debugging shows that it is cleared here: phy_probe: /* Disable the interrupt if the PHY doesn't support it * but the interrupt is still a valid one / if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev)) phydev->irq = PHY_POLL; Okay, so does the "Marvell 88E6390 Family" PHY driver not have the .config_intr and .handle_interrupt function pointers? Yes it does. Stage three debugging shows that the PHY device does not attempt a probe against the "Marvell 88E6390 Family" driver, but against the "mv88x3310" driver. Okay, so why does the "mv88x3310" driver match on a mv88x6390 internal PHY? The PHY IDs (MARVELL_PHY_ID_88E6390_FAMILY vs MARVELL_PHY_ID_88X3310) are way different. Stage four debugging has us looking through: phy_device_register -> device_add -> bus_probe_device -> device_initial_probe -> __device_attach -> bus_for_each_drv -> driver_match_device -> drv->bus->match -> phy_bus_match Okay, so as we said, the MII_PHYSID1 of mv88e6390 does not match the mv88x3310 driver's PHY mask & ID, so why would phy_bus_match return... Ahh, phy_bus_match calls a shortcircuit method, phydrv->match_phy_device, and does not even bother to compare the PHY ID if that is implemented. So of course, we go inside the marvell10g.c driver and sure enough, it implements .match_phy_device and does not bother to check the PHY ID. What's interesting though is that at the end of the device_add() from phy_device_register(), the driver for the internal PHYs _is_ the proper "Marvell 88E6390 Family". This is because "mv88x3310" ends up failing to probe after all, and __device_attach_driver(), to quote: / * Ignore errors returned by ->probe so that the next driver can try * its luck. */ The next (and only other) driver that matches is the 6390 driver. For this one, phy_probe doesn't fail, and everything expects to work as normal, EXCEPT phydev->irq has already been cleared by the previous unsuccessful probe of a driver which did not implement PHY interrupts, and therefore cleared that IRQ. Okay, so it is not just Marvell 6390 that has PHY interrupts broken. Stuff like Atheros, Aquantia, Broadcom, Qualcomm work because they are lexicographically before Marvell, and stuff like NXP, Realtek, Vitesse are broken. This goes to show how fragile it is to reset phydev->irq = PHY_POLL from the actual beginning of phy_probe itself. That seems like an actual bug of its own too, since phy_probe has side effects which are not restored on probe failure, but the line of thought probably was, the same driver will attempt probe again, so it doesn't matter. Well, looks like it does. Maybe it would make more sense to move the phydev->irq clearing after the actual device_add() in phy_device_register() completes, and the bound driver is the actual final one. (also, a bit frightening that drivers are permitted to bypass the MDIO bus matching in such a trivial way and perform PHY reads and writes from the .match_phy_device method, on devices that do not even belong to them. In the general case it might not be guaranteed that the MDIO accesses one driver needs to make to figure out whether to match on a device is safe for all other PHY devices) Fixes: `a5de4be0aa` ("net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Marek Behún <kabel@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210827132541.28953-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-27 17:27:42 -07:00
Brett Creeley	b357d9717b	ice: Only lock to update netdev dev_addr commit `3ba7f53f8b` ("ice: don't remove netdev->dev_addr from uc sync list") introduced calls to netif_addr_lock_bh() and netif_addr_unlock_bh() in the driver's ndo_set_mac() callback. This is fine since the driver is updated the netdev's dev_addr, but since this is a spinlock, the driver cannot sleep when the lock is held. Unfortunately the functions to add/delete MAC filters depend on a mutex. This was causing a trace with the lock debug kernel config options enabled when changing the mac address via iproute. [ 203.273059] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 203.273065] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6698, name: ip [ 203.273068] Preemption disabled at: [ 203.273068] [<ffffffffc04aaeab>] ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273097] CPU: 31 PID: 6698 Comm: ip Tainted: G S W I 5.14.0-rc4 #2 [ 203.273100] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ 203.273102] Call Trace: [ 203.273107] dump_stack_lvl+0x33/0x42 [ 203.273113] ? ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273124] ___might_sleep.cold.150+0xda/0xea [ 203.273131] mutex_lock+0x1c/0x40 [ 203.273136] ice_remove_mac+0xe3/0x180 [ice] [ 203.273155] ? ice_fltr_add_mac_list+0x20/0x20 [ice] [ 203.273175] ice_fltr_prepare_mac+0x43/0xa0 [ice] [ 203.273194] ice_set_mac_address+0xab/0x1c0 [ice] [ 203.273206] dev_set_mac_address+0xb8/0x120 [ 203.273210] dev_set_mac_address_user+0x2c/0x50 [ 203.273212] do_setlink+0x1dd/0x10e0 [ 203.273217] ? __nla_validate_parse+0x12d/0x1a0 [ 203.273221] __rtnl_newlink+0x530/0x910 [ 203.273224] ? __kmalloc_node_track_caller+0x17f/0x380 [ 203.273230] ? preempt_count_add+0x68/0xa0 [ 203.273236] ? _raw_spin_lock_irqsave+0x1f/0x30 [ 203.273241] ? kmem_cache_alloc_trace+0x4d/0x440 [ 203.273244] rtnl_newlink+0x43/0x60 [ 203.273245] rtnetlink_rcv_msg+0x13a/0x380 [ 203.273248] ? rtnl_calcit.isra.40+0x130/0x130 [ 203.273250] netlink_rcv_skb+0x4e/0x100 [ 203.273256] netlink_unicast+0x1a2/0x280 [ 203.273258] netlink_sendmsg+0x242/0x490 [ 203.273260] sock_sendmsg+0x58/0x60 [ 203.273263] ____sys_sendmsg+0x1ef/0x260 [ 203.273265] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273268] ? ____sys_recvmsg+0xe6/0x170 [ 203.273270] ___sys_sendmsg+0x7c/0xc0 [ 203.273272] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273274] ? ___sys_recvmsg+0x89/0xc0 [ 203.273276] ? __netlink_sendskb+0x50/0x50 [ 203.273278] ? mod_objcg_state+0xee/0x310 [ 203.273282] ? __dentry_kill+0x114/0x170 [ 203.273286] ? get_max_files+0x10/0x10 [ 203.273288] __sys_sendmsg+0x57/0xa0 [ 203.273290] do_syscall_64+0x37/0x80 [ 203.273295] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 203.273296] RIP: 0033:0x7f8edf96e278 [ 203.273298] Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 63 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 [ 203.273300] RSP: 002b:00007ffcb8bdac08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 203.273303] RAX: ffffffffffffffda RBX: 000000006115e0ae RCX: 00007f8edf96e278 [ 203.273304] RDX: 0000000000000000 RSI: 00007ffcb8bdac70 RDI: 0000000000000003 [ 203.273305] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007ffcb8bda5b0 [ 203.273306] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 203.273306] R13: 0000555e10092020 R14: 0000000000000000 R15: 0000000000000005 Fix this by only locking when changing the netdev->dev_addr. Also, make sure to restore the old netdev->dev_addr on any failures. Fixes: `3ba7f53f8b` ("ice: don't remove netdev->dev_addr from uc sync list") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 13:15:55 -07:00
Jacob Keller	9ee313433c	ice: restart periodic outputs around time changes When we enabled auxiliary input/output support for the E810 device, we forgot to add logic to restart the output when we change time. This is important as the periodic output will be incorrect after a time change otherwise. This unfortunately includes the adjust time function, even though it uses an atomic hardware interface. The atomic adjustment can still cause the pin output to stall permanently, so we need to stop and restart it. Introduce wrapper functions to temporarily disable and then re-enable the clock outputs. Fixes: `172db5f91d` ("ice: add support for auxiliary input/output pins") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha D Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 13:14:41 -07:00
Jacob Keller	4dd0d5c33c	ice: add lock around Tx timestamp tracker flush The driver didn't take the lock while flushing the Tx tracker, which could cause a race where one thread is trying to read timestamps out while another thread is trying to read the tracker to check the timestamps. Avoid this by ensuring that flushing is locked against read accesses. Fixes: `ea9b847cda` ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 09:14:49 -07:00
Jacob Keller	1f0cbb3e89	ice: remove dead code for allocating pin_config We have code in the ice driver which allocates the pin_config structure if n_pins is > 0, but we never set n_pins to be greater than zero. There's no reason to keep this code until we actually have pin_config support. Remove this. We can re-add it properly when we implement support for pin_config for E810-T devices. Fixes: `172db5f91d` ("ice: add support for auxiliary input/output pins") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 09:11:39 -07:00
Jacob Keller	84c5fb8c42	ice: fix Tx queue iteration for Tx timestamp enablement The driver accidentally copied the ice_for_each_rxq iterator when implementing enablement of the ptp_tx bit for the Tx rings. We still load the Tx rings and set the ptp_tx field, but we iterate over the count of the num_rxq. If the number of Tx and Rx queues differ, this could either cause a buffer overrun when accessing the tx_rings list if num_txq is greater than num_rxq, or it could cause us to fail to enable Tx timestamps for some rings. This was not noticed originally as we generally have the same number of Tx and Rx queues. Fixes: `ea9b847cda` ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-08-27 08:38:55 -07:00
David S. Miller	5fe2a6b434	Merge tag 'mlx5-fixes-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-08-26 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-27 09:42:21 +01:00
Peter Collingbourne	d0efb16294	net: don't unconditionally copy_from_user a struct ifreq for socket ioctls A common implementation of isatty(3) involves calling a ioctl passing a dummy struct argument and checking whether the syscall failed -- bionic and glibc use TCGETS (passing a struct termios), and musl uses TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will copy sizeof(struct ifreq) bytes of data from the argument and return -EFAULT if that fails. The result is that the isatty implementations may return a non-POSIX-compliant value in errno in the case where part of the dummy struct argument is inaccessible, as both struct termios and struct winsize are smaller than struct ifreq (at least on arm64). Although there is usually enough stack space following the argument on the stack that this did not present a practical problem up to now, with MTE stack instrumentation it's more likely for the copy to fail, as the memory following the struct may have a different tag. Fix the problem by adding an early check for whether the ioctl is a valid socket ioctl, and return -ENOTTY if it isn't. Fixes: `44c02a2c3d` ("dev_ioctl(): move copyin/copyout to callers") Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d Signed-off-by: Peter Collingbourne <pcc@google.com> Cc: <stable@vger.kernel.org> # 4.19 Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-27 09:40:25 +01:00
Wentao_Liang	6cc64770fb	net/mlx5: DR, fix a potential use-after-free bug In line 849 (#1), "mlx5dr_htbl_put(cur_htbl);" drops the reference to cur_htbl and may cause cur_htbl to be freed. However, cur_htbl is subsequently used in the next line, which may result in an use-after-free bug. Fix this by calling mlx5dr_err() before the cur_htbl is put. Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:15:42 -07:00
Dmytro Linkin	f9d196bd63	net/mlx5e: Use correct eswitch for stack devices with lag If link aggregation is used within stack devices driver rejects encap rules if PF of the VF tunnel device is down. This happens because route resolved for other PF and its eswitch instance is used to determine correct vport. To fix that use devcom feature to retrieve other eswitch instance if failed to find vport for the 1st eswitch and LAG is active. Fixes: `10742efc20` ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:15:42 -07:00
Maor Dickman	ca6891f9b2	net/mlx5: E-Switch, Set vhca id valid flag when creating indir fwd group When indirect forward group is created, flow is added with vhca id but without setting vhca id valid flag which violates the PRM. Fix by setting the missing flag, vhca id valid. Fixes: `34ca65352d` ("net/mlx5: E-Switch, Indirect table infrastructure") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:15:42 -07:00
Roi Dayan	9a5f9cc794	net/mlx5e: Fix possible use-after-free deleting fdb rule After neigh-update-add failure we are still with a slow path rule but the driver always assume the rule is an fdb rule. Fix neigh-update-del by checking slow path tc flag on the flow. Also fix neigh-update-add for when neigh-update-del fails the same. Fixes: `5dbe906ff1` ("net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:15:42 -07:00
Leon Romanovsky	8e7e2e8ed0	net/mlx5: Remove all auxiliary devices at the unregister event The call to mlx5_unregister_device() means that mlx5_core driver is removed. In such scenario, we need to disregard all other flags like attach/detach and forcibly remove all auxiliary devices. Fixes: `a5ae8fc905` ("net/mlx5e: Don't create devices during unload flow") Tested-and-Reported-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:15:41 -07:00
Dima Chumak	2f8b6161cc	net/mlx5: Lag, fix multipath lag activation When handling FIB_EVENT_ENTRY_REPLACE event for a new multipath route, lag activation can be missed if a stale (struct lag_mp)->mfi pointer exists, which was associated with an older multipath route that had been removed. Normally, when a route is removed, it triggers mlx5_lag_fib_event(), which handles FIB_EVENT_ENTRY_DEL and clears mfi pointer. But, if mlx5_lag_check_prereq() condition isn't met, for example when eswitch is in legacy mode, the fib event is skipped and mfi pointer becomes stale. Fix by resetting mfi pointer to NULL in mlx5_deactivate_lag(). Fixes: `8a66e45859` ("net/mlx5: Change ownership model for lag") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-26 15:15:41 -07:00
Linus Torvalds	73367f05b2	Merge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux Pull nfsd fix from Bruce Fields: "This is a one-liner fix for a serious bug that can cause the server to become unresponsive to a client, so I think it's worth the last-minute inclusion for 5.14" * tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux: SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...	2021-08-26 13:26:40 -07:00
Linus Torvalds	8a2cb8bd06	Merge tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from can and bpf. Closing three hw-dependent regressions. Any fixes of note are in the 'old code' category. Nothing blocking release from our perspective. Current release - regressions: - stmmac: revert "stmmac: align RX buffers" - usb: asix: ax88772: move embedded PHY detection as early as possible - usb: asix: do not call phy_disconnect() for ax88178 - Revert "net: really fix the build...", from Kalle to fix QCA6390 Current release - new code bugs: - phy: mediatek: add the missing suspend/resume callbacks Previous releases - regressions: - qrtr: fix another OOB Read in qrtr_endpoint_post - stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings Previous releases - always broken: - inet: use siphash in exception handling - ip_gre: add validation for csum_start - bpf: fix ringbuf helper function compatibility - rtnetlink: return correct error on changing device netns - e1000e: do not try to recover the NVM checksum on Tiger Lake" * tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) Revert "net: really fix the build..." net: hns3: fix get wrong pfc_en when query PFC configuration net: hns3: fix GRO configuration error after reset net: hns3: change the method of getting cmd index in debugfs net: hns3: fix duplicate node in VLAN list net: hns3: fix speed unknown issue in bond 4 net: hns3: add waiting time before cmdq memory is released net: hns3: clear hardware resource when loading driver net: fix NULL pointer reference in cipso_v4_doi_free rtnetlink: Return correct error on changing device netns net: dsa: hellcreek: Adjust schedule look ahead window net: dsa: hellcreek: Fix incorrect setting of GCL cxgb4: dont touch blocked freelist bitmap after free ipv4: use siphash instead of Jenkins in fnhe_hashfun() ipv6: use siphash in rt6_exception_hash() can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters net: usb: asix: ax88772: fix boolconv.cocci warnings net/sched: ets: fix crash when flipping from 'strict' to 'quantum' qede: Fix memset corruption net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp ...	2021-08-26 13:20:22 -07:00
Linus Torvalds	1a6d80ff24	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "We received a report this week that the generic version of pfn_valid(), which we switched to this merge window in `16c9afc776` ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts badly with dma_map_resource() due to the following check: /* Don't allow RAM to be mapped / if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr)))) return DMA_MAPPING_ERROR; Since the ongoing saga to determine the semantics of pfn_valid() is unlikely to be resolved this week (does it indicate valid memory, or just the presence of a struct page, or whether that struct page has been initialised?), just revert back to our old version of pfn_valid() for 5.14. Summary: - Fix dma_map_resource() by reverting back to old pfn_valid() code" tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"	2021-08-26 11:26:00 -07:00
Linus Torvalds	97d8cc2008	Merge tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "Two memory management fixes for the filesystem" * tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client: ceph: fix possible null-pointer dereference in ceph_mdsmap_decode() ceph: correctly handle releasing an embedded cap flush	2021-08-26 11:18:30 -07:00
Kalle Valo	9ebc2758d0	Revert "net: really fix the build..." This reverts commit `ce78ffa3ef`. Wren and Nicolas reported that ath11k was failing to initialise QCA6390 Wi-Fi 6 device with error: qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22 Commit `ce78ffa3ef` ("net: really fix the build..."), introduced in v5.14-rc5, caused this regression in qrtr. Most likely all ath11k devices are broken, but I only tested QCA6390. Let's revert the broken commit so that ath11k works again. Reported-by: Wren Turkal <wt@penguintechs.org> Reported-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 11:08:32 -07:00
Linus Torvalds	9b49ceb854	Merge tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more fix that I think qualifies for a late merge. It's a revert of a one-liner fix that meanwhile got backported to stable kernels and we got reports from users. The broken fix prevents creating compressed inline extents, which could be noticeable on space consumption. Technically it's a regression as the patch was merged in 5.14-rc1 but got propagated to several stable kernels and has higher exposure than a 'typical' development cycle bug" * tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Revert "btrfs: compression: don't try to compress if we don't have enough pages"	2021-08-26 11:05:11 -07:00
Jakub Kicinski	75da63b7a1	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== bpf 2021-08-26 We've added 1 non-merge commit during the last 1 day(s): 1) Fix ringbuf helper function compatibility, from Daniel. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix ringbuf helper function compatibility ==================== Link: https://lore.kernel.org/r/20210826153720.19083-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 08:44:38 -07:00
Jakub Kicinski	57f8178292	Merge branch 'net-hns3-add-some-fixes-for-net' Guangbin Huang says: ==================== net: hns3: add some fixes for -net This series adds some fixes for the HNS3 ethernet driver. ==================== Link: https://lore.kernel.org/r/1629976921-43438-1-git-send-email-huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 07:24:21 -07:00
Guangbin Huang	8c1671e0d1	net: hns3: fix get wrong pfc_en when query PFC configuration Currently, when query PFC configuration by dcbtool, driver will return PFC enable status based on TC. As all priorities are mapped to TC0 by default, if TC0 is enabled, then all priorities mapped to TC0 will be shown as enabled status when query PFC setting, even though some priorities have never been set. for example: $ dcb pfc show dev eth0 pfc-cap 4 macsec-bypass off delay 0 prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off $ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on $ dcb pfc show dev eth0 pfc-cap 4 macsec-bypass off delay 0 prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on To fix this problem, just returns user's PFC config parameter saved in driver. Fixes: `cacde272dd` ("net: hns3: Add hclge_dcb module for the support of DCB feature") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 07:24:17 -07:00
Yufeng Mo	3462207d2d	net: hns3: fix GRO configuration error after reset The GRO configuration is enabled by default after reset. This is incorrect and should be restored to the user-configured value. So this restoration is added during reset initialization. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 07:24:17 -07:00
Yufeng Mo	55649d5654	net: hns3: change the method of getting cmd index in debugfs Currently, the cmd index is obtained in debugfs by comparing file names. However, this method may cause errors when processing more complex file names. So, change this method by saving cmd in private data and comparing it when getting cmd index in debugfs for optimization. Fixes: `5e69ea7ee2` ("net: hns3: refactor the debugfs process") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 07:24:17 -07:00
Guojia Liao	94391fae82	net: hns3: fix duplicate node in VLAN list VLAN list should not be added duplicate VLAN node, otherwise it would cause "add failed" when restore VLAN from VLAN list, so this patch adds VLAN ID check before adding node into VLAN list. Fixes: `c6075b1934` ("net: hns3: Record VF vlan tables") Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 07:24:17 -07:00
Yonglong Liu	b15c072a9f	net: hns3: fix speed unknown issue in bond 4 In bond 4, when the link goes down and up repeatedly, the bond may get an unknown speed, and then this port can not work. The driver notify netif_carrier_on() before update the link state, when the bond receive carrier on, will query the speed of the port, if the query operation happens before updating the link state, will get an unknown speed. So need to notify netif_carrier_on() after update the link state. Fixes: `46a3df9f97` ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Fixes: `e2cb1dec97` ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 07:24:16 -07:00
Yufeng Mo	a96d9330b0	net: hns3: add waiting time before cmdq memory is released After the cmdq registers are cleared, the firmware may take time to clear out possible left over commands in the cmdq. Driver must release cmdq memory only after firmware has completed processing of left over commands. Fixes: `232d0d55fc` ("net: hns3: uninitialize command queue while unloading PF driver") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 07:24:16 -07:00
Yufeng Mo	1a6d281946	net: hns3: clear hardware resource when loading driver If a PF is bonded to a virtual machine and the virtual machine exits unexpectedly, some hardware resource cannot be cleared. In this case, loading driver may cause exceptions. Therefore, the hardware resource needs to be cleared when the driver is loaded. Fixes: `46a3df9f97` ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 07:24:16 -07:00
王贇	733c99ee8b	net: fix NULL pointer reference in cipso_v4_doi_free In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc failed, we sometime observe panic: BUG: kernel NULL pointer dereference, address: ... RIP: 0010:cipso_v4_doi_free+0x3a/0x80 ... Call Trace: netlbl_cipsov4_add_std+0xf4/0x8c0 netlbl_cipsov4_add+0x13f/0x1b0 genl_family_rcv_msg_doit.isra.15+0x132/0x170 genl_rcv_msg+0x125/0x240 This is because in cipso_v4_doi_free() there is no check on 'doi_def->map.std' when 'doi_def->type' equal 1, which is possibe, since netlbl_cipsov4_add_std() haven't initialize it before alloc 'doi_def->map.std'. This patch just add the check to prevent panic happen for similar cases. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 12:20:47 +01:00
Andrey Ignatov	96a6b93b69	rtnetlink: Return correct error on changing device netns Currently when device is moved between network namespaces using RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID, IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and target namespace already has device with same name, userspace will get EINVAL what is confusing and makes debugging harder. Fix it so that userspace gets more appropriate EEXIST instead what makes debugging much easier. Before: # ./ifname.sh + ip netns add ns0 + ip netns exec ns0 ip link add l0 type dummy + ip netns exec ns0 ip link show l0 8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff + ip link add l0 type dummy + ip link show l0 10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff + ip link set l0 netns ns0 RTNETLINK answers: Invalid argument After: # ./ifname.sh + ip netns add ns0 + ip netns exec ns0 ip link add l0 type dummy + ip netns exec ns0 ip link show l0 8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff + ip link add l0 type dummy + ip link show l0 10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff + ip link set l0 netns ns0 RTNETLINK answers: File exists The problem is that do_setlink() passes its `char ifname` argument, that it gets from a caller, to __dev_change_net_namespace() as is (as `const char pat`), but semantics of ifname and pat can be different. For example, __rtnl_newlink() does this: net/core/rtnetlink.c 3270 char ifname[IFNAMSIZ]; ... 3286 if (tb[IFLA_IFNAME]) 3287 nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ); 3288 else 3289 ifname[0] = '\0'; ... 3364 if (dev) { ... 3394 return do_setlink(skb, dev, ifm, extack, tb, ifname, status); 3395 } , i.e. do_setlink() gets ifname pointer that is always valid no matter if user specified IFLA_IFNAME or not and then do_setlink() passes this ifname pointer as is to __dev_change_net_namespace() as pat argument. But the pat (pattern) in __dev_change_net_namespace() is used as: net/core/dev.c 11198 err = -EEXIST; 11199 if (__dev_get_by_name(net, dev->name)) { 11200 /* We get here if we can't use the current device name */ 11201 if (!pat) 11202 goto out; 11203 err = dev_get_valid_name(net, dev, pat); 11204 if (err < 0) 11205 goto out; 11206 } As the result the `goto out` path on line 11202 is neven taken and instead of returning EEXIST defined on line 11198, __dev_change_net_namespace() returns an error from dev_get_valid_name() and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier. Fixes: `d8a5ec6727` ("[NET]: netlink support for moving devices between network namespaces.") Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 12:08:08 +01:00
David S. Miller	a423cbe0f2	Merge branch 'dsa-hellcreek-fixes' Kurt Kanzenbach says: ==================== net: dsa: hellcreek: 802.1Qbv Fixes while using TAPRIO offloading on the Hirschmann hellcreek switch, I've noticed two issues in the current implementation: 1. The gate control list is incorrectly programmed 2. The admin base time is not set properly Fix it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 10:26:06 +01:00
Kurt Kanzenbach	b7658ed35a	net: dsa: hellcreek: Adjust schedule look ahead window Traffic schedules can only be started up to eight seconds within the future. Therefore, the driver periodically checks every two seconds whether the admin base time provided by the user is inside that window. If so the schedule is started. Otherwise the check is deferred. However, according to the programming manual the look ahead window size should be four - not eight - seconds. By using the proposed value of four seconds starting a schedule at a specified admin base time actually works as expected. Fixes: `24dfc6eb39` ("net: dsa: hellcreek: Add TAPRIO offloading support") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 10:26:06 +01:00
Kurt Kanzenbach	a7db5ed863	net: dsa: hellcreek: Fix incorrect setting of GCL Currently the gate control list which is programmed into the hardware is incorrect resulting in wrong traffic schedules. The problem is the loop variables are incremented before they are referenced. Therefore, move the increment to the end of the loop. Fixes: `24dfc6eb39` ("net: dsa: hellcreek: Add TAPRIO offloading support") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 10:26:05 +01:00
Rahul Lakkireddy	43fed4d48d	cxgb4: dont touch blocked freelist bitmap after free When adapter init fails, the blocked freelist bitmap is already freed up and should not be touched. So, move the bitmap zeroing closer to where it was successfully allocated. Also handle adapter init failure unwind path immediately and avoid setting up RDMA memory windows. Fixes: `5b377d114f` ("cxgb4: Add debugfs facility to inject FL starvation") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 10:23:24 +01:00
David S. Miller	38d57551dd	Merge branch 'inet-siphash' Eric Dumazet says: ==================== inet: use siphash in exception handling A group of security researchers brought to our attention the weakness of hash functions used in rt6_exception_hash() and fnhe_hashfun() I made two distinct patches to help backports, since IPv6 part was added in 4.15 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 10:20:34 +01:00
Eric Dumazet	6457378fe7	ipv4: use siphash instead of Jenkins in fnhe_hashfun() A group of security researchers brought to our attention the weakness of hash function used in fnhe_hashfun(). Lets use siphash instead of Jenkins Hash, to considerably reduce security risks. Also remove the inline keyword, this really is distracting. Fixes: `d546c62154` ("ipv4: harden fnhe_hashfun()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 10:20:34 +01:00
Eric Dumazet	4785305c05	ipv6: use siphash in rt6_exception_hash() A group of security researchers brought to our attention the weakness of hash function used in rt6_exception_hash() Lets use siphash instead of Jenkins Hash, to considerably reduce security risks. Following patch deals with IPv4. Fixes: `35732d01fe` ("ipv6: introduce a hash table to store dst cache") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Wei Wang <weiwan@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 10:20:34 +01:00
David S. Miller	92ea47fe09	Merge tag 'linux-can-fixes-for-5.14-20210826' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine says: ==================== pull-request: can 2021-08-26 this is a pull request of a single patch for net/master. Stefan Mätje's patch fixes the interchange of RX and TX error counters inthe esd_usb2 CAN driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 09:37:40 +01:00
Stefan Mätje	044012b520	can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters This patch fixes the interchanged fetch of the CAN RX and TX error counters from the ESD_EV_CAN_ERROR_EXT message. The RX error counter is really in struct rx_msg::data[2] and the TX error counter is in struct rx_msg::data[3]. Fixes: `96d8e90382` ("can: Add driver for esd CAN-USB/2 device") Link: https://lore.kernel.org/r/20210825215227.4947-2-stefan.maetje@esd.eu Cc: stable@vger.kernel.org Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-08-26 08:37:13 +02:00
kernel test robot	ec92e524ee	net: usb: asix: ax88772: fix boolconv.cocci warnings drivers/net/usb/asix_devices.c:757:60-65: WARNING: conversion to bool not needed here Remove unneeded conversion to bool Semantic patch information: Relational and logical operators evaluate to bool, explicit conversion is overly verbose and unneeded. Generated by: scripts/coccinelle/misc/boolconv.cocci Fixes: `7a141e64cf` ("net: usb: asix: ax88772: move embedded PHY detection as early as possible") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20210825183538.13070-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-25 16:35:51 -07:00
Trond Myklebust	062b829c52	SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()... If the attempt to reserve a slot fails, we currently leak the XPT_BUSY flag on the socket. Among other things, this make it impossible to close the socket. Fixes: `82011c80b3` ("SUNRPC: Move svc_xprt_received() call sites") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-08-25 16:58:09 -04:00
Linus Torvalds	73f3af7b46	Merge branch 'akpm' (patches from Andrew) Merge fixes from Andrew Morton: "2 patches. Subsystems affected by this patch series: mm/memory-hotplug and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: exfat: update my email address mm/memory_hotplug: fix potential permanent lru cache disable	2021-08-25 12:45:31 -07:00
Namjae Jeon	a34cc13add	MAINTAINERS: exfat: update my email address My email address in exfat entry will be not available in a few days. Update it to my own kernel.org address. Link: https://lkml.kernel.org/r/20210825044833.16806-1-namjae.jeon@samsung.com Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-25 12:25:12 -07:00
Miaohe Lin	946746d1ad	mm/memory_hotplug: fix potential permanent lru cache disable If offline_pages failed after lru_cache_disable(), it forgot to do lru_cache_enable() in error path. So we would have lru cache disabled permanently in this case. Link: https://lkml.kernel.org/r/20210821094246.10149-3-linmiaohe@huawei.com Fixes: `d479960e44` ("mm: disable LRU pagevec during the migration temporarily") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-25 12:25:12 -07:00
Linus Torvalds	fe67f4dd8d	pipe: do FASYNC notifications for every pipe IO, not just state changes It turns out that the SIGIO/FASYNC situation is almost exactly the same as the EPOLLET case was: user space really wants to be notified after every operation. Now, in a perfect world it should be sufficient to only notify user space on "state transitions" when the IO state changes (ie when a pipe goes from unreadable to readable, or from unwritable to writable). User space should then do as much as possible - fully emptying the buffer or what not - and we'll notify it again the next time the state changes. But as with EPOLLET, we have at least one case (stress-ng) where the kernel sent SIGIO due to the pipe being marked for asynchronous notification, but the user space signal handler then didn't actually necessarily read it all before returning (it read more than what was written, but since there could be multiple writes, it could leave data pending). The user space code then expected to get another SIGIO for subsequent writes - even though the pipe had been readable the whole time - and would only then read more. This is arguably a user space bug - and Colin King already fixed the stress-ng code in question - but the kernel regression rules are clear: it doesn't matter if kernel people think that user space did something silly and wrong. What matters is that it used to work. So if user space depends on specific historical kernel behavior, it's a regression when that behavior changes. It's on us: we were silly to have that non-optimal historical behavior, and our old kernel behavior was what user space was tested against. Because of how the FASYNC notification was tied to wakeup behavior, this was first broken by commits `f467a6a664` and `1b6b26ae70` ("pipe: fix and clarify pipe read/write wakeup logic"), but at the time it seems nobody noticed. Probably because the stress-ng problem case ends up being timing-dependent too. It was then unwittingly fixed by commit `3a34b13a88` ("pipe: make pipe writes always wake up readers") only to be broken again when by commit `3b844826b6` ("pipe: avoid unnecessary EPOLLET wakeups under normal loads"). And at that point the kernel test robot noticed the performance refression in the stress-ng.sigio.ops_per_sec case. So the "Fixes" tag below is somewhat ad hoc, but it matches when the issue was noticed. Fix it for good (knock wood) by simply making the kill_fasync() case separate from the wakeup case. FASYNC is quite rare, and we clearly shouldn't even try to use the "avoid unnecessary wakeups" logic for it. Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/ Fixes: `3b844826b6` ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-25 10:27:16 -07:00
Linus Torvalds	62add98208	Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucount fixes from Eric Biederman: "This branch fixes a regression that made it impossible to increase rlimits that had been converted to the ucount infrastructure, and also fixes a reference counting bug where the reference was not incremented soon enough. The fixes are trivial and the bugs have been encountered in the wild, and the fixes have been tested" * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Increase ucounts reference counter before the security hook ucounts: Fix regression preventing increasing of rlimits in init_user_ns	2021-08-25 09:56:10 -07:00
Tuo Li	a9e6ffbc5b	ceph: fix possible null-pointer dereference in ceph_mdsmap_decode() kcalloc() is called to allocate memory for m->m_info, and if it fails, ceph_mdsmap_destroy() behind the label out_err will be called: ceph_mdsmap_destroy(m); In ceph_mdsmap_destroy(), m->m_info is dereferenced through: kfree(m->m_info[i].export_targets); To fix this possible null-pointer dereference, check m->m_info before the for loop to free m->m_info[i].export_targets. [ jlayton: fix up whitespace damage only kfree(m->m_info) if it's non-NULL ] Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2021-08-25 16:34:11 +02:00
Xiubo Li	b2f9fa1f3b	ceph: correctly handle releasing an embedded cap flush The ceph_cap_flush structures are usually dynamically allocated, but the ceph_cap_snap has an embedded one. When force umounting, the client will try to remove all the session caps. During this, it will free them, but that should not be done with the ones embedded in a capsnap. Fix this by adding a new boolean that indicates that the cap flush is embedded in a capsnap, and skip freeing it if that's set. At the same time, switch to using list_del_init() when detaching the i_list and g_list heads. It's possible for a forced umount to remove these objects but then handle_cap_flushsnap_ack() races in and does the list_del_init() again, corrupting memory. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52283 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2021-08-25 16:34:11 +02:00

1 2 3 4 5 ...

1031387 Commits