Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2017-08-30
This series contains some misc fixes to the mlx5 driver.
Please pull and let me know if there's any problem.
For -stable:
Kernels >= 4.12
net/mlx5e: Fix CQ moderation mode not set properly
net/mlx5e: Don't override user RSS upon set channels
Kernels >= 4.11
net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address
Kernels >= 4.10
net/mlx5e: Fix DCB_CAP_ATTR_DCBX capability for DCBNL getcap
net/mlx5e: Check for qos capability in dcbnl_initialize
Kernels >= 4.9
net/mlx5e: Fix dangling page pointer on DMA mapping error
Kernels >= 4.8
net/mlx5e: Fix inline header size for small packets
net/mlx5: E-Switch, Unload the representors in the correct order
net/mlx5: Fix arm SRQ command for ISSI version 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
syzkaller had no problem to trigger a deadlock, attaching a KCM socket
to another one (or itself). (original syzkaller report was a very
confusing lockdep splat during a sendmsg())
It seems KCM claims to only support TCP, but no enforcement is done,
so we might need to add additional checks.
Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov says:
====================
net/sched: init failure fixes
I went over all qdiscs' init, destroy and reset callbacks and found the
issues fixed in each patch. Mostly they are null pointer dereferences due
to uninitialized timer (qdisc watchdog) or double frees due to ->destroy
cleaning up a second time. There's more information in each patch.
I've tested these by either sending wrong attributes from user-spaces, no
attributes or by simulating memory alloc failure where applicable. Also
tried all of the qdiscs as a default qdisc.
Most of these bugs were present before commit 87b60cfacf, I've tried to
include proper fixes tags in each patch.
I haven't included individual patch acks in the set, I'd appreciate it if
you take another look and resend them.
====================
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently only a memory allocation failure can lead to this, so let's
initialize the timer first.
Fixes: 6529eaba33 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is very unlikely to happen but the backlogs memory allocation
could fail and will free q->flows, but then ->destroy() will free
q->flows too. For correctness remove the first free and let ->destroy
clean up.
Fixes: 87b60cfacf ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending on where ->init fails we can get a null pointer deref due to
uninitialized hires timer (watchdog) or a double free of the qdisc hash
because it is already freed by ->destroy().
Fixes: 8d55373875 ("net/sched/hfsc: allocate tcf block for hfsc root class")
Fixes: 87b60cfacf ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cq_period_mode assignment was mistakenly removed so it was always set to "0",
which is EQE based moderation, regardless of the device CAPs and
requested value in ethtool.
Fixes: 6a9764efb2 ("net/mlx5e: Isolate open_channels from priv->params")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix inline header size, make sure it is not greater than skb len.
This bug effects small packets, for example L2 packets with size < 18.
Fixes: ae76715d15 ("net/mlx5e: Check the minimum inline header mode before xmit")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When changing from switchdev to legacy mode, all the representor port
devices (uplink nic and reps) are cleaned up. Part of this cleaning
process is removing the neigh entries and the hash table containing them.
However, a representor neigh entry might be linked to the uplink port
hash table and if the uplink nic is cleaned first the cleaning of the
representor will end up in null deref.
Fix that by unloading the representors in the opposite order of load.
Fixes: cb67b83292 ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently if vxlan tunnel ipv6 src isn't supplied the driver fails to
resolve it as part of the route lookup. The resulting encap header
is left with a zeroed out ipv6 src address so the packets are sent
with this src ip.
Use an appropriate route lookup API that also resolves the source
ipv6 address if it's not supplied.
Fixes: ce99f6b97f ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, increasing the number of combined channels is changing
the RSS spread to use the new created channels.
Prevent the RSS spread change in case the user explicitly declare it,
to avoid overriding user configuration.
Tested:
when RSS default:
# ethtool -L ens8 combined 4
RSS spread will change and point to 4 channels.
# ethtool -X ens8 equal 4
# ethtool -L ens8 combined 6
RSS will not change after increasing the number of the channels.
Fixes: 8bf3686204 ('ethtool: ensure channel counts are within bounds during SCHANNELS')
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Function mlx5e_dealloc_rx_wqe is using page pointer value as an
indication to valid DMA mapping. In case that the mapping failed, we
released the page but kept the dangling pointer. Store the page pointer
only after the DMA mapping passed to avoid invalid page DMA unmap.
Fixes: bc77b240b3 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.
The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.
Fixes: 5fc7197d3a ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Support for ISSI version 0 was recently broken as the arm_srq_cmd
command, which is used only for ISSI version 0, was given the opcode
for ISSI version 1 instead of ISSI version 0.
Change arm_srq_cmd to use the correct command opcode for ISSI version
0.
Fixes: af1ba291c5 ('{net, IB}/mlx5: Refactor internal SRQ API')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Current code doesn't report DCB_CAP_DCBX_HOST capability when query
through getcap. User space lldptool expects capability to have HOST mode
set when it wants to configure DCBX CEE mode. In absence of HOST mode
capability, lldptool fails to switch to CEE mode.
This fix returns DCB_CAP_DCBX_HOST capability when port's DCBX
controlled mode is under software control.
Fixes: 3a6a931dfb ("net/mlx5e: Support DCBX CEE API")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
qos capability is the master capability bit that determines
if the DCBX is supported for the PCI function. If this bit is off,
driver cannot run any dcbx code.
Fixes: e207b7e991 ("net/mlx5e: ConnectX-4 firmware support for DCBX")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
It is quite common for ti_cm_get_macid() to fail on some of the
platforms it is invoked on. They include any platform where
mac address is not part of SoC register space.
On these platforms, mac address is read and populated in
device-tree by bootloader. An example is TI DA850.
Downgrade the severity of message to "information", so it does
not spam logs when 'quiet' boot is desired.
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The phy is connected at early stage of probe but not properly
disconnected if error occurs. This patch fixes the issue.
Also changing the return type of xgene_enet_check_phy_handle(),
since this function always returns success.
Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both the nfp_net_pf_app_start() and the nfp_net_pci_probe() functions
call nfp_net_pf_app_stop_ctrl(pf) so there is a double free. The free
should be done from the probe function because it's allocated there so
I have removed the call from nfp_net_pf_app_start().
Fixes: 02082701b9 ("nfp: create control vNICs and wire up rx/tx")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Belous says:
====================
net:ethernet:aquantia: Atlantic driver Update 2017-08-23
This series contains updates for aQuantia Atlantic driver.
It has bugfixes and some improvements.
Changes in v2:
- "MCP state change" fix removed (will be sent as
a separate fix after further investigation.)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We should inform user about wrong firmware version
by printing message in dmesg.
Fixes: 3d2ff7eebe26 ("net: ethernet: aquantia: Atlantic hardware abstraction layer")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the HW supports up to 32 multicast filters we should
track count of multicast filters to avoid overflow.
If we attempt to add >32 multicast filter - just set NETIF_ALLMULTI flag
instead.
Fixes: 94f6c9e4cdf6 ("net: ethernet: aquantia: Support for NIC-specific code")
Signed-off-by: Igor Russkikh <Igor.Russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver choose the optimal interrupt throttling settings depends
of current link speed.
Due this bug link_status field from aq_hw is never updated and as result
always used same interrupt throttling values.
Fixes: 3d2ff7eebe26 ("net: ethernet: aquantia: Atlantic hardware abstraction layer")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware has the HW Checksum Offload bug when small
TCP patckets (with length <= 60 bytes) has wrong "checksum valid" bit.
The solution is - ignore checksum valid bit for small packets
(with length <= 60 bytes) and mark this as CHECKSUM_NONE to allow
network stack recalculate checksum itself.
Fixes: ccf9a5ed14be ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of RSS queues should be not more than numbers of CPU.
Its does not make sense to increase perfomance, and also cause problems on
some motherboards.
Fixes: 94f6c9e4cdf6 ("net: ethernet: aquantia: Support for NIC-specific code")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes datapath spinlocks which does not perform any
useful work.
Fixes: 6e70637f9f1e ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For a bond slave device as a tipc bearer, the dev represents the bond
interface and orig_dev represents the slave in tipc_l2_rcv_msg().
Since we decode the tipc_ptr from bonding device (dev), we fail to
find the bearer and thus tipc links are not established.
In this commit, we register the tipc protocol callback per device and
look for tipc bearer from both the devices.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ChunYu found a kernel warn_on during syzkaller fuzzing:
[40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
[40226.144849] Call Trace:
[40226.147590] <IRQ>
[40226.149859] dump_stack+0xe2/0x186
[40226.176546] __warn+0x1a4/0x1e0
[40226.180066] warn_slowpath_null+0x31/0x40
[40226.184555] inet_sock_destruct+0x78d/0x9a0
[40226.246355] __sk_destruct+0xfa/0x8c0
[40226.290612] rcu_process_callbacks+0xaa0/0x18a0
[40226.336816] __do_softirq+0x241/0x75e
[40226.367758] irq_exit+0x1f6/0x220
[40226.371458] smp_apic_timer_interrupt+0x7b/0xa0
[40226.376507] apic_timer_interrupt+0x93/0xa0
The warn_on happned when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct.
As after commit f970bd9e3a ("udp: implement memory accounting helpers"),
udp has changed to use udp_destruct_sock as sk_destruct where it would
udp_rmem_release all rmem.
But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after
changing family to PF_INET. If rmem is not 0 at that time, and there is
no place to release rmem before calling inet_sock_destruct, the warn_on
will be triggered.
This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt
any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has
already set it's sk_destruct with inet_sock_destruct and UDP has set with
udp_destruct_sock since they're created.
Fixes: f970bd9e3a ("udp: implement memory accounting helpers")
Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2017-08-29
1) Fix dst_entry refcount imbalance when using socket policies.
From Lorenzo Colitti.
2) Fix locking when adding the ESP trailers.
3) Fix tailroom calculation for the ESP trailer by using
skb_tailroom instead of skb_availroom.
4) Fix some info leaks in xfrm_user.
From Mathias Krause.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now it doesn't check for the cached route expiration in ipv6's
dst_ops->check(), because it trusts dst_gc that would clean the
cached route up when it's expired.
The problem is in dst_gc, it would clean the cached route only
when it's refcount is 1. If some other module (like xfrm) keeps
holding it and the module only release it when dst_ops->check()
fails.
But without checking for the cached route expiration, .check()
may always return true. Meanwhile, without releasing the cached
route, dst_gc couldn't del it. It will cause this cached route
never to expire.
This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
in .check.
Note that this is even needed when ipv6 dst_gc timer is removed
one day. It would set dst.obsolete in .redirect and .update_pmtu
instead, and check for cached route expiration when getting it,
just like what ipv4 route does.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c5cff8561d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
net/ipv6/route.c:1394:30: error: incompatible types in comparison
expression (different address spaces)
./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
expression (different address spaces)
This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.
Fixes: c5cff8561d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passing commands for logging to t4_record_mbox() with size
MBOX_LEN, when the actual command size is actually smaller,
causes out-of-bounds stack accesses in t4_record_mbox() while
copying command words here:
for (i = 0; i < size / 8; i++)
entry->cmd[i] = be64_to_cpu(cmd[i]);
Up to 48 bytes from the stack are then leaked to debugfs.
This happens whenever we send (and log) commands described by
structs fw_sched_cmd (32 bytes leaked), fw_vi_rxmode_cmd (48),
fw_hello_cmd (48), fw_bye_cmd (48), fw_initialize_cmd (48),
fw_reset_cmd (48), fw_pfvf_cmd (32), fw_eq_eth_cmd (16),
fw_eq_ctrl_cmd (32), fw_eq_ofld_cmd (32), fw_acl_mac_cmd(16),
fw_rss_glb_config_cmd(32), fw_rss_vi_config_cmd(32),
fw_devlog_cmd(32), fw_vi_enable_cmd(48), fw_port_cmd(32),
fw_sched_cmd(32), fw_devlog_cmd(32).
The cxgb4vf driver got this right instead.
When we call t4_record_mbox() to log a command reply, a MBOX_LEN
size can be used though, as get_mbox_rpl() will fill cmd_rpl up
completely.
Fixes: 7f080c3f2f ("cxgb4: Add support to enable logging of firmware mailbox commands")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the bindings have been controversial, and we follow the DT stable ABI
rule, we shouldn't let a driver with a DT binding that might change slip
through in a stable release.
Remove the compatibles to make sure the driver will not probe and no-one
will start using the binding currently implemented. This commit will
obviously need to be reverted in due time.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren says:
====================
nfp: fix layer calculation and flow dissector use
Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. Formerly flow dissectors were referenced
without first checking that they are in use and correctly populated by TC.
Additionally this patch set fixes the incorrect use of mask field for vlan
matching.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. This patch checks that the TTL and
TOS fields are masked out before offloading. Additionally this patch
checks that MPLS packets are correctly handled, by not offloading them.
Fixes: af9d842c13 ("nfp: extend flower add flow offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously flow dissectors were referenced without first checking that
they are in use and correctly populated by TC. This patch fixes this by
checking each flow dissector key before referencing them.
Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault says:
====================
l2tp: fix some l2tp_tunnel_find() issues in l2tp_netlink
Since l2tp_tunnel_find() doesn't take a reference on the tunnel it
returns, its users are almost guaranteed to be racy.
This series defines l2tp_tunnel_get() which can be used as a safe
replacement, and converts some of l2tp_tunnel_find() users in the
l2tp_netlink module.
Other users often combine this issue with other more or less subtle
races. They will be fixed incrementally in followup series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use l2tp_tunnel_get() to retrieve tunnel, so that it can't go away on
us. Otherwise l2tp_tunnel_destruct() might release the last reference
count concurrently, thus freeing the tunnel while we're using it.
Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>