Commit Graph

109162 Commits

Author SHA1 Message Date
Florian Westphal
223fd0adfa bridge: broute: make broute a real ebtables table
This makes broute a normal ebtables table, hooking at PREROUTING.
The broute hook is removed.

It uses skb->cb to signal to bridge rx handler that the skb should be
routed instead of being bridged.

This change is backwards compatible with ebtables as no userspace visible
parts are changed.

This means we can also remove the !ops test in ebt_register_table,
it was only there for broute table sake.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12 01:47:50 +02:00
Florian Westphal
971502d77f bridge: netfilter: unroll NF_HOOK helper in bridge input path
Replace NF_HOOK() based invocation of the netfilter hooks with a private
copy of nf_hook_slow().

This copy has one difference: it can return the rx handler value expected
by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS.

This is needed by the next patch to invoke the ebtables
"broute" table via the standard netfilter hooks rather than the custom
"br_should_route_hook" indirection that is used now.

When the skb is to be "brouted", we must return RX_HANDLER_PASS from the
bridge rx input handler, but there is no way to indicate this via
NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the
netfilter core or a percpu flag.

  text    data     bss     dec   filename
  3369      56       0    3425   net/bridge/br_input.o.before
  3458      40       0    3498   net/bridge/br_input.o.after

This allows removal of the "br_should_route_hook" in the next patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12 01:47:39 +02:00
Florian Westphal
610a43149c netfilter: nf_nat_masquerade: unify ipv4/6 notifier registration
Only reason for having two different register functions was because of
ipt_MASQUERADE and ip6t_MASQUERADE being two different modules.

Previous patch merged those into xt_MASQUERADE, so we can merge this too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-11 20:59:34 +02:00
Florian Westphal
bf8981a2aa netfilter: nf_nat: merge ip/ip6 masquerade headers
Both are now implemented by nf_nat_masquerade.c, so no need to keep
different headers.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-11 20:59:21 +02:00
Florian Westphal
3b0a081db1 netfilter: make two functions static
They have no external callers anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 23:28:33 +02:00
Fernando Fernandez Mancera
22c7652cda netfilter: nft_osf: Add version option support
Add version option support to the nftables "osf" expression.

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 23:27:12 +02:00
Florian Westphal
4806e97572 netfilter: replace NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)
NF_NAT_NEEDED is true whenever nat support for either ipv4 or ipv6 is
enabled.  Now that the af-specific nat configuration switches have been
removed, IS_ENABLED(CONFIG_NF_NAT) has the same effect.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 23:02:52 +02:00
Florian Westphal
c1deb065cf netfilter: nf_tables: merge route type into core
very little code, so it really doesn't make sense to have extra
modules or even a kconfig knob for this.

Merge them and make functionality available unconditionally.
The merge makes inet family route support trivial, so add it
as well here.

Before:
   text	   data	    bss	    dec	    hex	filename
    835	    832	      0	   1667	    683 nft_chain_route_ipv4.ko
    870	    832	      0	   1702	    6a6	nft_chain_route_ipv6.ko
 111568	   2556	    529	 114653	  1bfdd	nf_tables.ko

After:
   text	   data	    bss	    dec	    hex	filename
 113133	   2556	    529	 116218	  1c5fa	nf_tables.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 23:01:42 +02:00
Florian Westphal
d164385ec5 netfilter: nat: add inet family nat support
We need minimal support from the nat core for this, as we do not
want to register additional base hooks.

When an inet hook is registered, interally register ipv4 and ipv6
hooks for them and unregister those when inet hooks are removed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 23:01:39 +02:00
Li RongQing
01902f8c85 netfilter: optimize nf_inet_addr_cmp
optimize nf_inet_addr_cmp by 64bit xor computation
similar to ipv6_addr_equal()

Signed-off-by: Yuan Linsi <yuanlinsi01@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 22:58:16 +02:00
Jacky Hu
84c0d5e96f ipvs: allow tunneling with gue encapsulation
ipip packets are blocked in some public cloud environments, this patch
allows gue encapsulation with the tunneling method, which would make
tunneling working in those environments.

Signed-off-by: Jacky Hu <hengqing.hu@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 22:57:59 +02:00
Li RongQing
3b15d09f7e time: Introduce jiffies64_to_msecs()
there is a similar helper in net/netfilter/nf_tables_api.c,
this maybe become a common request someday, so move it to
time.c

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 22:56:14 +02:00
Paolo Abeni
fd69c399c7 datagram: remove rendundant 'peeked' argument
After commit a297569fe0 ("net/udp: do not touch skb->peeked unless
really needed") the 'peeked' argument of __skb_try_recv_datagram()
and friends is always equal to !!'flags & MSG_PEEK'.

Since such argument is really a boolean info, and the callers have
already 'flags & MSG_PEEK' handy, we can remove it and clean-up the
code a bit.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-08 09:51:54 -07:00
NeilBrown
149212f078 rhashtable: add lockdep tracking to bucket bit-spin-locks.
Native bit_spin_locks are not tracked by lockdep.

The bit_spin_locks used for rhashtable buckets are local
to the rhashtable implementation, so there is little opportunity
for the sort of misuse that lockdep might detect.
However locks are held while a hash function or compare
function is called, and if one of these took a lock,
a misbehaviour is possible.

As it is quite easy to add lockdep support this unlikely
possibility seems to be enough justification.

So create a lockdep class for bucket bit_spin_lock and attach
through a lockdep_map in each bucket_table.

Without the 'nested' annotation in rhashtable_rehash_one(), lockdep
correctly reports a possible problem as this lock is taken
while another bucket lock (in another table) is held.  This
confirms that the added support works.
With the correct nested annotation in place, lockdep reports
no problems.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-07 19:12:12 -07:00
NeilBrown
8f0db01800 rhashtable: use bit_spin_locks to protect hash bucket.
This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
bucket pointer to lock the hash chain for that bucket.

The benefits of a bit spin_lock are:
 - no need to allocate a separate array of locks.
 - no need to have a configuration option to guide the
   choice of the size of this array
 - locking cost is often a single test-and-set in a cache line
   that will have to be loaded anyway.  When inserting at, or removing
   from, the head of the chain, the unlock is free - writing the new
   address in the bucket head implicitly clears the lock bit.
   For __rhashtable_insert_fast() we ensure this always happens
   when adding a new key.
 - even when lockings costs 2 updates (lock and unlock), they are
   in a cacheline that needs to be read anyway.

The cost of using a bit spin_lock is a little bit of code complexity,
which I think is quite manageable.

Bit spin_locks are sometimes inappropriate because they are not fair -
if multiple CPUs repeatedly contend of the same lock, one CPU can
easily be starved.  This is not a credible situation with rhashtable.
Multiple CPUs may want to repeatedly add or remove objects, but they
will typically do so at different buckets, so they will attempt to
acquire different locks.

As we have more bit-locks than we previously had spinlocks (by at
least a factor of two) we can expect slightly less contention to
go with the slightly better cache behavior and reduced memory
consumption.

To enhance type checking, a new struct is introduced to represent the
  pointer plus lock-bit
that is stored in the bucket-table.  This is "struct rhash_lock_head"
and is empty.  A pointer to this needs to be cast to either an
unsigned lock, or a "struct rhash_head *" to be useful.
Variables of this type are most often called "bkt".

Previously "pprev" would sometimes point to a bucket, and sometimes a
->next pointer in an rhash_head.  As these are now different types,
pprev is NULL when it would have pointed to the bucket. In that case,
'blk' is used, together with correct locking protocol.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-07 19:12:12 -07:00
NeilBrown
ff302db965 rhashtable: allow rht_bucket_var to return NULL.
Rather than returning a pointer to a static nulls, rht_bucket_var()
now returns NULL if the bucket doesn't exist.
This will make the next patch, which stores a bitlock in the
bucket pointer, somewhat cleaner.

This change involves introducing __rht_bucket_nested() which is
like rht_bucket_nested(), but doesn't provide the static nulls,
and changing rht_bucket_nested() to call this and possible
provide a static nulls - as is still needed for the non-var case.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-07 19:12:12 -07:00
David S. Miller
f83f715195 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment merge conflict in mlx5.

Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05 14:14:19 -07:00
Linus Torvalds
0548740e53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Several hash table refcount fixes in batman-adv, from Sven
    Eckelmann.

 2) Use after free in bpf_evict_inode(), from Daniel Borkmann.

 3) Fix mdio bus registration in ixgbe, from Ivan Vecera.

 4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.

 5) ila rhashtable corruption fix from Herbert Xu.

 6) Don't allow upper-devices to be added to vrf devices, from Sabrina
    Dubroca.

 7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.

 8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
    from Alexander Lobakin.

 9) Missing IDR checks in mlx5 driver, from Aditya Pakki.

10) Fix false connection termination in ktls, from Jakub Kicinski.

11) Work around some ASPM issues with r8169 by disabling rx interrupt
    coalescing on certain chips. From Heiner Kallweit.

12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
    Abeni.

13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.

14) Various BPF flow dissector fixes from Stanislav Fomichev.

15) Divide by zero in act_sample, from Davide Caratti.

16) Fix bridging multicast regression introduced by rhashtable
    conversion, from Nikolay Aleksandrov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  ibmvnic: Fix completion structure initialization
  ipv6: sit: reset ip header pointer in ipip6_rcv
  net: bridge: always clear mcast matching struct on reports and leaves
  libcxgb: fix incorrect ppmax calculation
  vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
  sch_cake: Make sure we can write the IP header before changing DSCP bits
  sch_cake: Use tc_skb_protocol() helper for getting packet protocol
  tcp: Ensure DCTCP reacts to losses
  net/sched: act_sample: fix divide by zero in the traffic path
  net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
  net: hns: Fix sparse: some warnings in HNS drivers
  net: hns: Fix WARNING when remove HNS driver with SMMU enabled
  net: hns: fix ICMP6 neighbor solicitation messages discard problem
  net: hns: Fix probabilistic memory overwrite when HNS driver initialized
  net: hns: Use NAPI_POLL_WEIGHT for hns driver
  net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
  flow_dissector: rst'ify documentation
  ipv6: Fix dangling pointer when ipv6 fragment
  net-gro: Fix GRO flush when receiving a GSO packet.
  flow_dissector: document BPF flow dissector environment
  ...
2019-04-04 18:07:12 -10:00
Jiri Pirko
7e1146e8c1 net: devlink: introduce devlink_compat_switch_id_get() helper
Introduce devlink_compat_switch_id_get() helper which fills up switch_id
according to passed netdev pointer. Call it directly from
dev_get_port_parent_id() as a fallback when ndo_get_port_parent_id
is not defined for given netdev.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 17:42:36 -07:00
Jiri Pirko
bec5267cde net: devlink: extend port attrs for switch ID
Extend devlink_port_attrs_set() to pass switch ID for ports which are
part of switch and store it in port attrs. For other ports, this is
NULL.

Note that this allows the driver to group devlink ports into one or more
switches according to the actual topology.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 17:42:36 -07:00
Jiri Pirko
407dd706fb net: devlink: convert devlink_port_attrs bools to bits
In order to save space in the struct, convert bools to bits.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 17:42:36 -07:00
Florian Westphal
28b05b9288 net: use correct this_cpu primitive in dev_recursion_level
syzbot reports:
BUG: using __this_cpu_read() in preemptible code:
caller is dev_recursion_level include/linux/netdevice.h:3052 [inline]
 __this_cpu_preempt_check+0x246/0x270 lib/smp_processor_id.c:47
 dev_recursion_level include/linux/netdevice.h:3052 [inline]
 ip6_skb_dst_mtu include/net/ip6_route.h:245 [inline]

I erronously downgraded a this_cpu_read to __this_cpu_read when
moving dev_recursion_level() around.

Reported-by: syzbot+51471b4aae195285a4a3@syzkaller.appspotmail.com
Fixes: 97cdcf37b5 ("net: place xmit recursion in softnet data")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 17:34:58 -07:00
David Ahern
c0a720770c ipv6: Flip to fib_nexthop_info
Export fib_nexthop_info and fib_add_nexthop for use by IPv6 code.
Remove rt6_nexthop_info and rt6_add_nexthop in favor of the IPv4
versions. Update fib_nexthop_info for IPv6 linkdown check and
RTA_GATEWAY for AF_INET6.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-03 21:50:20 -07:00
David Ahern
eba618abac ipv4: Add fib_nh_common to fib_result
Most of the ipv4 code only needs data from fib_nh_common. Add
fib_nh_common selection to fib_result and update users to use it.

Right now, fib_nh_common in fib_result will point to a fib_nh struct
that is embedded within a fib_info:

        fib_info  --> fib_nh
                      fib_nh
                      ...
                      fib_nh
                        ^
    fib_result->nhc ----+

Later, nhc can point to a fib_nh within a nexthop struct:

        fib_info --> nexthop --> fib_nh
                                   ^
    fib_result->nhc ---------------+

or for a nexthop group:

        fib_info --> nexthop --> nexthop --> fib_nh
                                 nexthop --> fib_nh
                                 ...
                                 nexthop --> fib_nh
                                               ^
    fib_result->nhc ---------------------------+

In all cases nhsel within fib_result will point to which leg in the
multipath route is used.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-03 21:50:20 -07:00
David Ahern
0af7e7c128 ipv4: Update fib_table_lookup tracepoint to take common nexthop
Update fib_table_lookup tracepoint to take a fib_nh_common struct and
dump the v6 gateway address if the nexthop uses it.

Over the years saddr has not proven useful and the output of the
tracepoint produces very long lines. Since saddr is not part of
fib_nh_common, drop it. If it needs to be added later, fib_nh which
contains saddr can be obtained from a fib_nh_common via container_of.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-03 21:50:20 -07:00
Heiner Kallweit
4950c2ba49 net: phy: fix autoneg mismatch case in genphy_read_status
The original patch didn't consider the case that autoneg process
finishes successfully but both link partners have no mode in common.
In this case there's no link, nevertheless we may be interested in
what the link partner advertised.

Like phydev->link we set phydev->autoneg_complete in
genphy_update_link() and use the stored value in genphy_read_status().
This way we don't have to read register BMSR again.

Fixes: b6163f194c ("net: phy: improve genphy_read_status")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-03 21:47:54 -07:00
Heiner Kallweit
045925e3fe net: phy: add genphy_read_abilities
Similar to genphy_c45_pma_read_abilities() add a function to dynamically
detect the abilities of a Clause 22 PHY. This is mainly copied from
genphy_config_init().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-02 13:09:55 -07:00
Florian Westphal
4f296edeb9 drivers: net: aurora: use netdev_xmit_more helper
This is the last driver using always-0 skb->xmit_more.
Switch it to netdev_xmit_more and remove the now unused xmit_more flag
from sk_buff.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01 18:35:02 -07:00
Florian Westphal
6b16f9ee89 net: move skb->xmit_more hint to softnet data
There are two reasons for this.

First, the xmit_more flag conceptually doesn't fit into the skb, as
xmit_more is not a property related to the skb.
Its only a hint to the driver that the stack is about to transmit another
packet immediately.

Second, it was only done this way to not have to pass another argument
to ndo_start_xmit().

We can place xmit_more in the softnet data, next to the device recursion.
The recursion counter is already written to on each transmit. The "more"
indicator is placed right next to it.

Drivers can use the netdev_xmit_more() helper instead of skb->xmit_more
to check the "more packets coming" hint.

skb->xmit_more is retained (but always 0) to not cause build breakage.

This change takes care of the simple s/skb->xmit_more/netdev_xmit_more()/
conversions.  Remaining drivers are converted in the next patches.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01 18:35:02 -07:00
Florian Westphal
97cdcf37b5 net: place xmit recursion in softnet data
This fills a hole in softnet data, so no change in structure size.

Also prepares for xmit_more placement in the same spot;
skb->xmit_more will be removed in followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01 18:35:02 -07:00
Xiaofei Shen
a2c7023f70 net: dsa: read mac address from DT for slave device
Before creating a slave netdevice, get the mac address from DTS and
apply in case it is valid.

Signed-off-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01 14:57:32 -07:00
Paolo Abeni
e5f0e8f8e4 net: sched: introduce and use qdisc tree flush/purge helpers
The same code to flush qdisc tree and purge the qdisc queue
is duplicated in many places and in most cases it does not
respect NOLOCK qdisc: the global backlog len is used and the
per CPU values are ignored.

This change addresses the above, factoring-out the relevant
code and using the helpers introduced by the previous patch
to fetch the correct backlog len.

Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01 14:50:13 -07:00
Paolo Abeni
5dd431b6b9 net: sched: introduce and use qstats read helpers
Classful qdiscs can't access directly the child qdiscs backlog
length: if such qdisc is NOLOCK, per CPU values should be
accounted instead.

Most qdiscs no not respect the above. As a result, qstats fetching
for most classful qdisc is currently incorrect: if the child qdisc is
NOLOCK, it always reports 0 len backlog.

This change introduces a pair of helpers to safely fetch
both backlog and qlen and use them in stats class dumping
functions, fixing the above issue and cleaning a bit the code.

DRR needs also to access the child qdisc queue length, so it
needs custom handling.

Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01 14:50:13 -07:00
Stephen Suryaputra
8c83f2df9c vrf: check accept_source_route on the original netdevice
Configuration check to accept source route IP options should be made on
the incoming netdevice when the skb->dev is an l3mdev master. The route
lookup for the source route next hop also needs the incoming netdev.

v2->v3:
- Simplify by passing the original netdevice down the stack (per David
  Ahern).

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01 10:44:58 -07:00
Jiri Pirko
a4e76ba6b4 mlxsw: spectrum_acl: Rename rehash_dis trace
The name of the trace is no longer correct, since there is no disable of
rehash done. So name it "rehash_rollback_failed".

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-31 11:01:23 -07:00
Linus Torvalds
63fc9c2348 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "A collection of x86 and ARM bugfixes, and some improvements to
  documentation.

  On top of this, a cleanup of kvm_para.h headers, which were exported
  by some architectures even though they not support KVM at all. This is
  responsible for all the Kbuild changes in the diffstat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
  KVM: doc: Document the life cycle of a VM and its resources
  KVM: selftests: complete IO before migrating guest state
  KVM: selftests: disable stack protector for all KVM tests
  KVM: selftests: explicitly disable PIE for tests
  KVM: selftests: assert on exit reason in CR4/cpuid sync test
  KVM: x86: update %rip after emulating IO
  x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
  kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
  KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
  kvm: don't redefine flags as something else
  kvm: mmu: Used range based flushing in slot_handle_level_range
  KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
  KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
  kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
  KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
  KVM: Reject device ioctls from processes other than the VM's creator
  KVM: doc: Fix incorrect word ordering regarding supported use of APIs
  KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
  KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
  ...
2019-03-31 08:55:59 -07:00
Linus Torvalds
f78b5be2a5 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:
 "A small set of core updates:

   - Make the watchdog respect the selected CPU mask again. That was
     broken by the rework of the watchdog thread management and caused
     inconsistent state and NMI watchdog being unstoppable.

   - Ensure that the objtool build can find the libelf location.

   - Remove dead kcore stub code"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  watchdog: Respect watchdog cpumask on CPU hotplug
  objtool: Query pkg-config for libelf location
  proc/kcore: Remove unused kclist_add_remap()
2019-03-31 07:47:21 -07:00
Linus Torvalds
3af9a5256f Merge tag 'gpio-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
 "As you can see [in the git history] I was away on leave and Bartosz
  kindly stepped in and collected a slew of fixes, I pulled them into my
  tree in two sets and merged some two more fixes (fixing my own caused
  bugs) on top.

  Summary:

   - Revert the extended use of gpio_set_config() and think about how we
     can do this properly.

   - Fix up the SPI CS GPIO handling so it now works properly on the SPI
     bus children, as intended.

   - Error paths and driver fixes"

* tag 'gpio-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: mockup: use simple_read_from_buffer() in debugfs read callback
  gpio: of: Fix of_gpiochip_add() error path
  gpio: of: Check for "spi-cs-high" in child instead of parent node
  gpio: of: Check propname before applying "cs-gpios" quirks
  gpio: mockup: fix debugfs read
  Revert "gpio: use new gpio_set_config() helper in more places"
  gpio: aspeed: fix a potential NULL pointer dereference
  gpio: amd-fch: Fix bogus SPDX identifier
  gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input
  gpio: exar: add a check for the return value of ida_simple_get fails
2019-03-30 11:33:34 -07:00
Linus Torvalds
782492a7a4 Merge tag 'acpi-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
 "This corrects a previous attempt to make Linux use its own set of ACPI
  debug flags different from the upstream ACPICA's default (Erik
  Schmauss)"

* tag 'acpi-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: use different default debug value than ACPICA
2019-03-30 10:09:11 -07:00
Linus Torvalds
922c010cf2 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "22 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
  fs: fs_parser: fix printk format warning
  checkpatch: add %pt as a valid vsprintf extension
  mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate
  drivers/block/zram/zram_drv.c: fix idle/writeback string compare
  mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate()
  mm/memory_hotplug.c: fix notification in offline error path
  ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
  fs/proc/kcore.c: make kcore_modules static
  include/linux/list.h: fix list_is_first() kernel-doc
  mm/debug.c: fix __dump_page when mapping->host is not set
  mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  include/linux/hugetlb.h: convert to use vm_fault_t
  iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging
  mm: add support for kmem caches in DMA32 zone
  ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock
  mm/hotplug: fix offline undo_isolate_page_range()
  fs/open.c: allow opening only regular files during execve()
  mailmap: add Changbin Du
  mm/debug.c: add a cast to u64 for atomic64_read()
  ...
2019-03-29 16:02:28 -07:00
David S. Miller
19c847444d Merge tag 'mlx5-fixes-2019-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-03-29

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v4.11
('net/mlx5: Decrease default mr cache size')

For -stable v4.12
('net/mlx5e: Add a lock on tir list')

For -stable v4.13
('net/mlx5e: Fix error handling when refreshing TIRs')

For -stable v4.18
('net/mlx5e: Update xon formula')

For -stable v4.19
('net: mlx5: Add a missing check on idr_find, free buf')
('net/mlx5e: Update xoff formula')

net-next merge Note:
When merged with net-next the following simple conflict will appear,

drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c

++<<<<<<< HEAD (net)
 + *   max_mtu: netdev's max_mtu
++=======
+  *    @mtu: device's MTU
++>>>>>>> net-next

To resolve: just replace the line in net-next
*    @mtu: device's MTU
to
*    @max_mtu: netdev's max_mtu
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 15:23:16 -07:00
Linus Torvalds
eed4897dfe Merge tag 'driver-core-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
 "Here is a single driver core patch for 5.1-rc3.

  After 5.1-rc1, all of the users of BUS_ATTR() are finally removed, so
  we can now drop this macro from include/linux/device.h so that no more
  new users will be created.

  This patch has been in linux-next for a while, with no reported
  issues"

* tag 'driver-core-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: remove BUS_ATTR()
2019-03-29 15:07:29 -07:00
Linus Torvalds
6f51092301 Merge tag 'char-misc-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are some binder, habanalabs, and vboxguest driver fixes for
  5.1-rc3.

  The Binder fixes resolve some reported issues found by testing, first
  by the selinux developers, and then earlier today by syzbot.

  The habanalabs fixes are all minor, resolving a number of tiny things.

  The vboxguest patches are a bit larger. They resolve the fact that
  virtual box decided to change their api in their latest release in a
  way that broke the existing kernel code, despite saying that they were
  never going to do that. So this is a bit of a "new feature", but is
  good to get merged so that 5.1 will work with the latest release. The
  changes are not large and of course virtual box "swears" they will not
  break this again, but no one is holding their breath here.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x
  binder: fix race between munmap() and direct reclaim
  binder: fix BUG_ON found by selinux-testsuite
  habanalabs: cast to expected type
  habanalabs: prevent host crash during suspend/resume
  habanalabs: perform accounting for active CS
  habanalabs: fix mapping with page size bigger than 4KB
  habanalabs: complete user context cleanup before hard reset
  habanalabs: fix bug when mapping very large memory area
  habanalabs: fix MMU number of pages calculation
2019-03-29 15:03:30 -07:00
wenxu
18b6f71748 openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode
There is currently no support for the multicast/broadcast aspects
of VXLAN in ovs. In the datapath flow the tun_dst must specific.
But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst can not be specific.
And the packet can forward through the fdb table of vxlan devcice. In
this mode the broadcast/multicast packet can be sent through the
following ways in ovs.

ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
        options:key=1000 options:remote_ip=flow
ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff, \
        action=output:vxlan

bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
        src_vni 1000 vni 1000 self
bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
        src_vni 1000 vni 1000 self

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 13:29:15 -07:00
Yuval Avnery
80a2a9026b net/mlx5e: Add a lock on tir list
Refresh tirs is looping over a global list of tirs while netdevs are
adding and removing tirs from that list. That is why a lock is
required.

Fixes: 724b2aa151 ("net/mlx5e: TIRs management refactoring")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-29 12:24:41 -07:00
David Ahern
3616d08bcb ipv6: Move ipv6 stubs to a separate header file
The number of stubs is growing and has nothing to do with addrconf.
Move the definition of the stubs to a separate header file and update
users. In the move, drop the vxlan specific comment before ipv6_stub.

Code move only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 10:53:45 -07:00
David Ahern
979e276ebe net: Use common nexthop init and release helpers
With fib_nh_common in place, move common initialization and release
code into helpers used by both ipv4 and ipv6. For the moment, the init
is just the lwt encap and the release is both the netdev reference and
the the lwt state reference. More will be added later.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 10:48:04 -07:00
David Ahern
f1741730dd net: Add fib_nh_common and update fib_nh and fib6_nh
Add fib_nh_common struct with common nexthop attributes. Convert
fib_nh and fib6_nh to use it. Use macros to move existing
fib_nh_* references to the new nh_common.nhc_*.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 10:48:04 -07:00
David Ahern
ad1601ae02 ipv6: Rename fib6_nh entries
Rename fib6_nh entries that will be moved to a fib_nh_common struct.
Specifically, the device, gateway, flags, and lwtstate are common
with all nexthop definitions. In some places new temporary variables
are declared or local variables renamed to maintain line lengths.

Rename only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 10:48:04 -07:00
David Ahern
b75ed8b1aa ipv4: Rename fib_nh entries
Rename fib_nh entries that will be moved to a fib_nh_common struct.
Specifically, the device, oif, gateway, flags, scope, lwtstate,
nh_weight and nh_upper_bound are common with all nexthop definitions.
In the process shorten fib_nh_lwtstate to fib_nh_lws to avoid really
long lines.

Rename only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 10:48:04 -07:00