Commit Graph

101101 Commits

Author SHA1 Message Date
David S. Miller
6f6e434aa2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.

TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.

The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.

Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21 16:01:54 -04:00
Linus Torvalds
5aef268ace Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix refcounting bug for connections in on-packet scheduling mode of
    IPVS, from Julian Anastasov.

 2) Set network header properly in AF_PACKET's packet_snd, from Willem
    de Bruijn.

 3) Fix regressions in 3c59x by converting to generic DMA API. It was
    relying upon the hack that the PCI DMA interfaces would accept NULL
    for EISA devices. From Christoph Hellwig.

 4) Remove RDMA devices before unregistering netdev in QEDE driver, from
    Michal Kalderon.

 5) Use after free in TUN driver ptr_ring usage, from Jason Wang.

 6) Properly check for missing netlink attributes in SMC_PNETID
    requests, from Eric Biggers.

 7) Set DMA mask before performaing any DMA operations in vmxnet3
    driver, from Regis Duchesne.

 8) Fix mlx5 build with SMP=n, from Saeed Mahameed.

 9) Classifier fixes in bcm_sf2 driver from Florian Fainelli.

10) Tuntap use after free during release, from Jason Wang.

11) Don't use stack memory in scatterlists in tls code, from Matt
    Mullins.

12) Not fully initialized flow key object in ipv4 routing code, from
    David Ahern.

13) Various packet headroom bug fixes in ip6_gre driver, from Petr
    Machata.

14) Remove queues from XPS maps using correct index, from Amritha
    Nambiar.

15) Fix use after free in sock_diag, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
  net: ip6_gre: fix tunnel metadata device sharing.
  cxgb4: fix offset in collecting TX rate limit info
  net: sched: red: avoid hashing NULL child
  sock_diag: fix use-after-free read in __sk_free
  sh_eth: Change platform check to CONFIG_ARCH_RENESAS
  net: dsa: Do not register devlink for unused ports
  net: Fix a bug in removing queues from XPS map
  bpf: fix truncated jump targets on heavy expansions
  bpf: parse and verdict prog attach may race with bpf map update
  bpf: sockmap update rollback on error can incorrectly dec prog refcnt
  net: test tailroom before appending to linear skb
  net: ip6_gre: Fix ip6erspan hlen calculation
  net: ip6_gre: Split up ip6gre_changelink()
  net: ip6_gre: Split up ip6gre_newlink()
  net: ip6_gre: Split up ip6gre_tnl_change()
  net: ip6_gre: Split up ip6gre_tnl_link_config()
  net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()
  net: ip6_gre: Request headroom in __gre6_xmit()
  selftests/bpf: check return value of fopen in test_verifier.c
  erspan: fix invalid erspan version.
  ...
2018-05-21 08:37:48 -07:00
Florian Fainelli
bf3c592b9d net: dsa: b53: Extend platform data to include DSA ports
The b53 driver already defines and internally uses platform data to let the
glue drivers specify parameters such as the chip id.  What we were missing was
a way to tell the core DSA layer about the ports and their type.

Place a dsa_chip_data structure at the beginning of b53_platform_data for
dsa_register_switch() to access it. This does not require modifications to
b53_common.c which will pass platform_data trough.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-20 18:59:34 -04:00
Andrew Lunn
00baabe528 net: dsa: mv88e6xxx: Add support for EEPROM via platform data
Add the size of the EEPROM to the platform data, so it can also be
instantiated by a platform device.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-20 18:58:27 -04:00
Andrew Lunn
877b7cb0b6 net: dsa: mv88e6xxx: Add minimal platform_data support
Not all the world uses device tree. Some parts of the world still use
platform devices and platform data. Add basic support for probing a
Marvell switch via platform data.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-20 18:58:27 -04:00
William Tu
d48f1958ab erspan: set bso bit based on mirrored packet's len
Before the patch, the erspan BSO bit (Bad/Short/Oversized) is not
handled.  BSO has 4 possible values:
  00 --> Good frame with no error, or unknown integrity
  11 --> Payload is a Bad Frame with CRC or Alignment Error
  01 --> Payload is a Short Frame
  10 --> Payload is an Oversized Frame

Based the short/oversized definitions in RFC1757, the patch sets
the bso bit based on the mirrored packet's size.

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-20 18:31:42 -04:00
David S. Miller
2c71ab4bb4 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2018-05-18

Here's the first bluetooth-next pull request for the 4.18 kernel:

 - Refactoring of the btbcm driver
 - New USB IDs for QCA_ROME and LiteOn controllers
 - Buffer overflow fix if the controller sends invalid advertising data length
 - Various cleanups & fixes for Qualcomm controllers

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-20 18:24:22 -04:00
Linus Torvalds
4b65f455c7 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
 "Two fixes to address shortcomings of the rwsem/percpu-rwsem lock
  debugging code which emits false positive warnings when the rwsem is
  anonymously locked and unlocked"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN
  locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag
2018-05-20 10:43:27 -07:00
Linus Torvalds
056ad121c2 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Thomas Gleixner:

 - Use explicitely sized type for the romimage pointer in the 32bit EFI
   protocol struct so a 64bit kernel does not expand it to 64bit. Ditto
   for the 64bit struct to avoid the reverse issue on 32bit kernels.

 - Handle randomized tex offset correctly in the ARM64 EFI stub to avoid
   unaligned data resulting in stack corruption and other hard to
   diagnose wreckage.

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub/arm64: Handle randomized TEXT_OFFSET
  efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
2018-05-20 10:36:52 -07:00
Jiri Pirko
08474c1a9d devlink: introduce a helper to generate physical port names
Each driver implements physical port name generation by itself. However
as devlink has all needed info, it can easily do the job for all its
users. So implement this helper in devlink.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19 16:30:39 -04:00
Jiri Pirko
5ec1380a21 devlink: extend attrs_set for setting port flavours
Devlink ports can have specific flavour according to the purpose of use.
This patch extend attrs_set so the driver can say which flavour port
has. Initial flavours are:
physical, cpu, dsa
User can query this to see right away what is the purpose of each port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19 16:30:39 -04:00
Jiri Pirko
b9ffcbaf56 devlink: introduce devlink_port_attrs_set
Change existing setter for split port information into more generic
attrs setter. Alongside with that, allow to set port number and subport
number for split ports.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19 16:30:39 -04:00
Souptick Joarder
d97baf9470 include/linux/mm.h: add new inline function vmf_error()
Many places in drivers/ file systems, error was handled in a common way
like below:

	ret = (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;

vmf_error() will replace this and return vm_fault_t type err.

A lot of drivers and filesystems currently have a rather complex mapping
of errno-to-VM_FAULT code.  We have been able to eliminate a lot of it
by just returning VM_FAULT codes directly from functions which are
called exclusively from the fault handling path.

Some functions can be called both from the fault handler and other
context which are expecting an errno, so they have to continue to return
an errno.  Some users still need to choose different behaviour for
different errnos, but vmf_error() captures the essential error
translation that's common to all users, and those that need to handle
additional errors can handle them first.

Link: http://lkml.kernel.org/r/20180510174826.GA14268@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18 17:17:12 -07:00
David S. Miller
3888ea4e2f Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patche to use kvfree where needed.

From: Or Gerlitz <ogerlitz@mellanox.com>

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 13:00:08 -04:00
Linus Torvalds
163ced613c Merge tag 'mtd/fixes-for-4.17-rc6' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
 "NAND fixes:
   - Fix read path of the Marvell NAND driver
   - Make sure we don't pass a u64 to ndelay()

  CFI fix:
   - Fix the map_word_andequal() implementation"

* tag 'mtd/fixes-for-4.17-rc6' of git://git.infradead.org/linux-mtd:
  mtd: rawnand: Fix return type of __DIVIDE() when called with 32-bit
  mtd: rawnand: marvell: Fix read logic for layouts with ->nchunks > 2
  mtd: Fix comparison in map_word_andequal()
2018-05-18 09:58:29 -07:00
Eric Dumazet
9c21d2fc41 tcp: add tcp_comp_sack_nr sysctl
This per netns sysctl allows for TCP SACK compression fine-tuning.

This limits number of SACK that can be compressed.
Using 0 disables SACK compression.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
6d82aa2420 tcp: add tcp_comp_sack_delay_ns sysctl
This per netns sysctl allows for TCP SACK compression fine-tuning.

Its default value is 1,000,000, or 1 ms to meet TSO autosizing period.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
200d95f457 tcp: add TCPAckCompressed SNMP counter
This counter tracks number of ACK packets that the host has not sent,
thanks to ACK compression.

Sample output :

$ nstat -n;sleep 1;nstat|egrep "IpInReceives|IpOutRequests|TcpInSegs|TcpOutSegs|TcpExtTCPAckCompressed"
IpInReceives                    123250             0.0
IpOutRequests                   3684               0.0
TcpInSegs                       123251             0.0
TcpOutSegs                      3684               0.0
TcpExtTCPAckCompressed          119252             0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
5d9f4262b7 tcp: add SACK compression
When TCP receives an out-of-order packet, it immediately sends
a SACK packet, generating network load but also forcing the
receiver to send 1-MSS pathological packets, increasing its
RTX queue length/depth, and thus processing time.

Wifi networks suffer from this aggressive behavior, but generally
speaking, all these SACK packets add fuel to the fire when networks
are under congestion.

This patch adds a high resolution timer and tp->compressed_ack counter.

Instead of sending a SACK, we program this timer with a small delay,
based on RTT and capped to 1 ms :

	delay = min ( 5 % of RTT, 1 ms)

If subsequent SACKs need to be sent while the timer has not yet
expired, we simply increment tp->compressed_ack.

When timer expires, a SACK is sent with the latest information.
Whenever an ACK is sent (if data is sent, or if in-order
data is received) timer is canceled.

Note that tcp_sack_new_ofo_skb() is able to force a SACK to be sent
if the sack blocks need to be shuffled, even if the timer has not
expired.

A new SNMP counter is added in the following patch.

Two other patches add sysctls to allow changing the 1,000,000 and 44
values that this commit hard-coded.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
cf0dd20372 tcp: use __sock_put() instead of sock_put() in tcp_clear_xmit_timers()
Socket can not disappear under us.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Loic Poulain
d6ee6ad774 Bluetooth: Add __hci_cmd_send function
This function allows to send a HCI command without expecting any
controller event/response in return. This is allowed for vendor-
specific commands only.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-05-18 06:37:52 +02:00
Shahar Klein
3e99df8772 net/mlx5: Add source e-switch owner
The source e-switch owner allows a vport on one e-switch port be associated
with a rule defined on the second port e-switch.

The role of the source eswitch owner valid bit in the flow group is to
allow the firmware fail driver attempts to wild card the source eswitch
match field. If this bit is not set, the firmware ignores the source
eswitch owner field totally.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Shahar Klein
b17f7fc10f net/mlx5: Add destination e-switch owner
The destination e-switch owner allows a rule in namespace of one e-switch
owner to point to a vport that is natively associated with another
e-switch owner.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Roi Dayan
a6d0456912 net/mlx5: Add merged e-switch cap
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.
Under merged eswitch, both the matching on source vport and setting
destination vport can have a 2nd attribute which is the vhca id of the
eswitch owner.

For example:
esw0: {match: <src vport=1 owner=0> action: fwd to <dst vport=7, owner=1>}
is a flow set on eswitch0 matching on source vport=1 from his eswitch
and the action being fwd to dest vport=7 of eswitch1.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz Klein <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Antoine Tenart
5490b8725d phy: add 2.5G SGMII mode to the phy_mode enum
This patch adds one more generic PHY mode to the phy_mode enum, to allow
configuring generic PHYs to the 2.5G SGMII mode by using the set_mode
callback.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:11:39 -04:00
David S. Miller
a564b659bb Merge tag 'wireless-drivers-next-for-davem-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:

====================
wireless-drivers-next patches for 4.18

The first pull request for 4.18. As usual new features and bug fixes
but nothing really special.

I also merged wireless-drivers due to an iwlwifi patch dependency.

Major changes:

iwlwifi

* implement Traffic Condition Monitor and use it for scan, BT coex and
  to detect when the AP doesn't support UAPSD properly

* some more work for the 22000 family of devices;

* introduce AMSDU rate control offload

qtnfmac

* DFS offload support

rsi

* roaming enhancements

* increase max supported aggregation subframes

* don't advertise 5 GHz support if the device doesn't support it

brcmfmac

* add support for BCM4366E chipset

* add support for bcm43364 wireless chipset

ath10k

* enable temperature reads for QCA6174 and QCA9377

* add firmware memory dump support for QCA9984

* continue adding WCN3990 support via SNOC bus
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:00:40 -04:00
Yuchung Cheng
b8fef65a8a tcp: new helper tcp_rack_skb_timeout
Create and export a new helper tcp_rack_skb_timeout and move tcp_is_rack
to prepare the final RTO change.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:29 -04:00
Yuchung Cheng
d716bfdb10 tcp: account lost retransmit after timeout
The previous approach for the lost and retransmit bits was to
wipe the slate clean: zero all the lost and retransmit bits,
correspondingly zero the lost_out and retrans_out counters, and
then add back the lost bits (and correspondingly increment lost_out).

The new approach is to treat this very much like marking packets
lost in fast recovery. We don’t wipe the slate clean. We just say
that for all packets that were not yet marked sacked or lost, we now
mark them as lost in exactly the same way we do for fast recovery.

This fixes the lost retransmit accounting at RTO time and greatly
simplifies the RTO code by sharing much of the logic with Fast
Recovery.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:29 -04:00
Yuchung Cheng
6ac06ecd3a tcp: simpler NewReno implementation
This is a rewrite of NewReno loss recovery implementation that is
simpler and standalone for readability and better performance by
using less states.

Note that NewReno refers to RFC6582 as a modification to the fast
recovery algorithm. It is used only if the connection does not
support SACK in Linux. It should not to be confused with the Reno
(AIMD) congestion control.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:28 -04:00
Yuchung Cheng
20b654dfe1 tcp: support DUPACK threshold in RACK
This patch adds support for the classic DUPACK threshold rule
(#DupThresh) in RACK.

When the number of packets SACKed is greater or equal to the
threshold, RACK sets the reordering window to zero which would
immediately mark all the unsacked packets below the highest SACKed
sequence lost. Since this approach is known to not work well with
reordering, RACK only uses it if no reordering has been observed.

The DUPACK threshold rule is a particularly useful extension to the
fast recoveries triggered by RACK reordering timer. For example
data-center transfers where the RTT is much smaller than a timer
tick, or high RTT path where the default RTT/4 may take too long.

Note that this patch differs slightly from RFC6675. RFC6675
considers a packet lost when at least #DupThresh higher-sequence
packets are SACKed.

With RACK, for connections that have seen reordering, RACK
continues to use a dynamically-adaptive time-based reordering
window to detect losses. But for connections on which we have not
yet seen reordering, this patch considers a packet lost when at
least one higher sequence packet is SACKed and the total number
of SACKed packets is at least DupThresh. For example, suppose a
connection has not seen reordering, and sends 10 packets, and
packets 3, 5, 7 are SACKed. RFC6675 considers packets 1 and 2
lost. RACK considers packets 1, 2, 4, 6 lost.

There is some small risk of spurious retransmits here due to
reordering. However, this is mostly limited to the first flight of
a connection on which the sender receives SACKs from reordering.
And RFC 6675 and FACK loss detection have a similar risk on the
first flight with reordering (it's just that the risk of spurious
retransmits from reordering was slightly narrower for those older
algorithms due to the margin of 3*MSS).

Also the minimum reordering window is reduced from 1 msec to 0
to recover quicker on short RTT transfers. Therefore RACK is more
aggressive in marking packets lost during recovery to reduce the
reordering window timeouts.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:28 -04:00
Matt Mullins
8ab6ffba14 tls: don't use stack memory in a scatterlist
scatterlist code expects virt_to_page() to work, which fails with
CONFIG_VMAP_STACK=y.

Fixes: c46234ebb4 ("tls: RX path for ktls")
Signed-off-by: Matt Mullins <mmullins@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 14:49:38 -04:00
Linus Torvalds
58ddfe6c3a Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:

 - ARM/ARM64 locking fixes

 - x86 fixes: PCID, UMIP, locking

 - improved support for recent Windows version that have a 2048 Hz APIC
   timer

 - rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME

 - better behaved selftests

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
  KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
  KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
  KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity
  KVM: arm/arm64: Properly protect VGIC locks from IRQs
  KVM: X86: Lower the default timer frequency limit to 200us
  KVM: vmx: update sec exec controls for UMIP iff emulating UMIP
  kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled
  KVM: selftests: exit with 0 status code when tests cannot be run
  KVM: hyperv: idr_find needs RCU protection
  x86: Delay skip of emulated hypercall instruction
  KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
2018-05-17 10:23:36 -07:00
Paolo Abeni
021a17ed79 pfifo_fast: drop unneeded additional lock on dequeue
After the previous patch, for NOLOCK qdiscs, q->seqlock is
always held when the dequeue() is invoked, we can drop
any additional locking to protect such operation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 12:46:54 -04:00
Paolo Abeni
96009c7d50 sched: replace __QDISC_STATE_RUNNING bit with a spin lock
So that we can use lockdep on it.
The newly introduced sequence lock has the same scope of busylock,
so it shares the same lockdep annotation, but it's only used for
NOLOCK qdiscs.

With this changeset we acquire such lock in the control path around
flushing operation (qdisc reset), to allow more NOLOCK qdisc perf
improvement in the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 12:46:54 -04:00
Willy Tarreau
7f7ccc2ccc proc: do not access cmdline nor environ from file-backed areas
proc_pid_cmdline_read() and environ_read() directly access the target
process' VM to retrieve the command line and environment. If this
process remaps these areas onto a file via mmap(), the requesting
process may experience various issues such as extra delays if the
underlying device is slow to respond.

Let's simply refuse to access file-backed areas in these functions.
For this we add a new FOLL_ANON gup flag that is passed to all calls
to access_remote_vm(). The code already takes care of such failures
(including unmapped areas). Accesses via /proc/pid/mem were not
changed though.

This was assigned CVE-2018-1120.

Note for stable backports: the patch may apply to kernels prior to 4.11
but silently miss one location; it must be checked that no call to
access_remote_vm() keeps zero as the last argument.

Reported-by: Qualys Security Advisory <qsa@qualys.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-17 09:27:47 -07:00
David S. Miller
b9f672af14 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-05-17

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Provide a new BPF helper for doing a FIB and neighbor lookup
   in the kernel tables from an XDP or tc BPF program. The helper
   provides a fast-path for forwarding packets. The API supports
   IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are
   implemented in this initial work, from David (Ahern).

2) Just a tiny diff but huge feature enabled for nfp driver by
   extending the BPF offload beyond a pure host processing offload.
   Offloaded XDP programs are allowed to set the RX queue index and
   thus opening the door for defining a fully programmable RSS/n-tuple
   filter replacement. Once BPF decided on a queue already, the device
   data-path will skip the conventional RSS processing completely,
   from Jakub.

3) The original sockmap implementation was array based similar to
   devmap. However unlike devmap where an ifindex has a 1:1 mapping
   into the map there are use cases with sockets that need to be
   referenced using longer keys. Hence, sockhash map is added reusing
   as much of the sockmap code as possible, from John.

4) Introduce BTF ID. The ID is allocatd through an IDR similar as
   with BPF maps and progs. It also makes BTF accessible to user
   space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
   through BPF_OBJ_GET_INFO_BY_FD, from Martin.

5) Enable BPF stackmap with build_id also in NMI context. Due to the
   up_read() of current->mm->mmap_sem build_id cannot be parsed.
   This work defers the up_read() via a per-cpu irq_work so that
   at least limited support can be enabled, from Song.

6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
   JIT conversion as well as implementation of an optimized 32/64 bit
   immediate load in the arm64 JIT that allows to reduce the number of
   emitted instructions; in case of tested real-world programs they
   were shrinking by three percent, from Daniel.

7) Add ifindex parameter to the libbpf loader in order to enable
   BPF offload support. Right now only iproute2 can load offloaded
   BPF and this will also enable libbpf for direct integration into
   other applications, from David (Beckett).

8) Convert the plain text documentation under Documentation/bpf/ into
   RST format since this is the appropriate standard the kernel is
   moving to for all documentation. Also add an overview README.rst,
   from Jesper.

9) Add __printf verification attribute to the bpf_verifier_vlog()
   helper. Though it uses va_list we can still allow gcc to check
   the format string, from Mathieu.

10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
    is a bash 4.0+ feature which is not guaranteed to be available
    when calling out to shell, therefore use a more portable variant,
    from Joe.

11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
    instead of relying on the gcc built-in, from Björn.

12) Fix a sock hashmap kmalloc warning reported by syzbot when an
    overly large key size is used in hashmap then causing overflows
    in htab->elem_size. Reject bogus attr->key_size early in the
    sock_hash_alloc(), from Yonghong.

13) Ensure in BPF selftests when urandom_read is being linked that
    --build-id is always enabled so that test_stacktrace_build_id[_nmi]
    won't be failing, from Alexei.

14) Add bitsperlong.h as well as errno.h uapi headers into the tools
    header infrastructure which point to one of the arch specific
    uapi headers. This was needed in order to fix a build error on
    some systems for the BPF selftests, from Sirio.

15) Allow for short options to be used in the xdp_monitor BPF sample
    code. And also a bpf.h tools uapi header sync in order to fix a
    selftest build failure. Both from Prashant.

16) More formally clarify the meaning of ID in the direct packet access
    section of the BPF documentation, from Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16 22:47:11 -04:00
Linus Torvalds
e6506eb241 Merge tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
 "Some of the ftrace internal events use a zero for a data size of a
  field event. This is increasingly important for the histogram trigger
  work that is being extended.

  While auditing trace events, I found that a couple of the xen events
  were used as just marking that a function was called, by creating a
  static array of size zero. This can play havoc with the tracing
  features if these events are used, because a zero size of a static
  array is denoted as a special nul terminated dynamic array (this is
  what the trace_marker code uses). But since the xen events have no
  size, they are not nul terminated, and unexpected results may occur.

  As trace events were never intended on being a marker to denote that a
  function was hit or not, especially since function tracing and kprobes
  can trivially do the same, the best course of action is to simply
  remove these events"

* tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
2018-05-16 16:45:23 -07:00
Mathieu Malaterre
be2d04d11f bpf: add __printf verification to bpf_verifier_vlog
__printf is useful to verify format and arguments. ‘bpf_verifier_vlog’
function is used twice in verifier.c in both cases the caller function
already uses the __printf gcc attribute.

Remove the following warning, triggered with W=1:

  kernel/bpf/verifier.c:176:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-17 00:49:17 +02:00
Paolo Abeni
32f7b44d0f sched: manipulate __QDISC_STATE_RUNNING in qdisc_run_* helpers
Currently NOLOCK qdiscs pay a measurable overhead to atomically
manipulate the __QDISC_STATE_RUNNING. Such bit is flipped twice per
packet in the uncontended scenario with packet rate below the
line rate: on packed dequeue and on the next, failing dequeue attempt.

This changeset moves the bit manipulation into the qdisc_run_{begin,end}
helpers, so that the bit is now flipped only once per packet, with
measurable performance improvement in the uncontended scenario.

This also allows simplifying the qdisc teardown code path - since
qdisc_is_running() is now effective for each qdisc type - and avoid a
possible race between qdisc_run() and dev_deactivate_many(), as now
the some_qdisc_is_busy() can properly detect NOLOCK qdiscs being busy
dequeuing packets.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16 12:26:25 -04:00
Saeed Mahameed
e3ca348806 net/mlx5: Fix build break when CONFIG_SMP=n
Avoid using the kernel's irq_descriptor and return IRQ vector affinity
directly from the driver.

This fixes the following build break when CONFIG_SMP=n

include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
include/linux/mlx5/driver.h:1299:13: error:
        ‘struct irq_desc’ has no member named ‘affinity_hint’

Fixes: 6082d9c9c9 ("net/mlx5: Fix mlx5_get_vector_affinity function")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CC: Randy Dunlap <rdunlap@infradead.org>
CC: Guenter Roeck <linux@roeck-us.net>
CC: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16 12:16:45 -04:00
Debabrata Banerjee
e79c105574 bonding: allow use of tx hashing in balance-alb
The rx load balancing provided by balance-alb is not mutually
exclusive with using hashing for tx selection, and should provide a decent
speed increase because this eliminates spinlocks and cache contention.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16 12:15:11 -04:00
Waiman Long
5a817641f6 locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN
The filesystem freezing code needs to transfer ownership of a rwsem
embedded in a percpu-rwsem from the task that does the freezing to
another one that does the thawing by calling percpu_rwsem_release()
after freezing and percpu_rwsem_acquire() before thawing.

However, the new rwsem debug code runs afoul with this scheme by warning
that the task that releases the rwsem isn't the one that acquires it,
as reported by Amir Goldstein:

  DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
  WARNING: CPU: 1 PID: 1401 at /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79

  Call Trace:
   percpu_up_write+0x1f/0x28
   thaw_super_locked+0xdf/0x120
   do_vfs_ioctl+0x270/0x5f1
   ksys_ioctl+0x52/0x71
   __x64_sys_ioctl+0x16/0x19
   do_syscall_64+0x5d/0x167
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

To work properly with the rwsem debug code, we need to annotate that the
rwsem ownership is unknown during the tranfer period until a brave soul
comes forward to acquire the ownership. During that period, optimistic
spinning will be disabled.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Theodore Y. Ts'o <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-fsdevel@vger.kernel.org
Link: http://lkml.kernel.org/r/1526420991-21213-3-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-16 11:45:16 +02:00
John Fastabend
8111038444 bpf: sockmap, add hash map support
Sockmap is currently backed by an array and enforces keys to be
four bytes. This works well for many use cases and was originally
modeled after devmap which also uses four bytes keys. However,
this has become limiting in larger use cases where a hash would
be more appropriate. For example users may want to use the 5-tuple
of the socket as the lookup key.

To support this add hash support.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-15 20:41:03 +02:00
Linus Torvalds
21b9f1c7e3 Merge tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
 "Here's a set of patches that fix a number of bugs in the in-kernel AFS
  client, including:

   - Fix directory locking to not use individual page locks for
     directory reading/scanning but rather to use a semaphore on the
     afs_vnode struct as the directory contents must be read in a single
     blob and data from different reads must not be mixed as the entire
     contents may be shuffled about between reads.

   - Fix address list parsing to handle port specifiers correctly.

   - Only give up callback records on a server if we actually talked to
     that server (we might not be able to access a server).

   - Fix some callback handling bugs, including refcounting,
     whole-volume callbacks and when callbacks actually get broken in
     response to a CB.CallBack op.

   - Fix some server/address rotation bugs, including giving up if we
     can't probe a server; giving up if a server says it doesn't have a
     volume, but there are more servers to try.

   - Fix the decoding of fetched statuses to be OpenAFS compatible.

   - Fix the handling of server lookups in Cache Manager ops (such as
     CB.InitCallBackState3) to use a UUID if possible and to handle no
     server being found.

   - Fix a bug in server lookup where not all addresses are compared.

   - Fix the non-encryption of calls that prevents some servers from
     being accessed (this also requires an AF_RXRPC patch that has
     already gone in through the net tree).

  There's also a patch that adds tracepoints to log Cache Manager ops
  that don't find a matching server, either by UUID or by address"

* tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix the non-encryption of calls
  afs: Fix CB.CallBack handling
  afs: Fix whole-volume callback handling
  afs: Fix afs_find_server search loop
  afs: Fix the handling of an unfound server in CM operations
  afs: Add a tracepoint to record callbacks from unlisted servers
  afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
  afs: Fix VNOVOL handling in address rotation
  afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
  afs: Fix server rotation's handling of fileserver probe failure
  afs: Fix refcounting in callback registration
  afs: Fix giving up callbacks on server destruction
  afs: Fix address list parsing
  afs: Fix directory page locking
2018-05-15 10:48:36 -07:00
John Fastabend
e5cd3abcb3 bpf: sockmap, refactor sockmap routines to work with hashmap
This patch only refactors the existing sockmap code. This will allow
much of the psock initialization code path and bpf helper codes to
work for both sockmap bpf map types that are backed by an array, the
currently supported type, and the new hash backed bpf map type
sockhash.

Most the fallout comes from three changes,

  - Pushing bpf programs into an independent structure so we
    can use it from the htab struct in the next patch.
  - Generalizing helpers to use void *key instead of the hardcoded
    u32.
  - Instead of passing map/key through the metadata we now do
    the lookup inline. This avoids storing the key in the metadata
    which will be useful when keys can be longer than 4 bytes. We
    rename the sk pointers to sk_redir at this point as well to
    avoid any confusion between the current sk pointer and the
    redirect pointer sk_redir.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-15 17:19:59 +02:00
Geert Uytterhoeven
9f825e74d7 mtd: rawnand: Fix return type of __DIVIDE() when called with 32-bit
The __DIVIDE() macro checks whether it is called with a 32-bit or 64-bit
dividend, to select the appropriate divide-and-round-up routine.
As the check uses the ternary operator, the result will always be
promoted to a type that can hold both results, i.e. unsigned long long.

When using this result in a division on a 32-bit system, this may lead
to link errors like:

    ERROR: "__udivdi3" [drivers/mtd/nand/raw/nand.ko] undefined!

Fix this by casting the result of the division to the type of the
dividend.

Fixes: 8878b126df ("mtd: nand: add ->exec_op() implementation")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-05-15 14:40:34 +02:00
Steven Rostedt (VMware)
45dd9b0666 tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
Doing an audit of trace events, I discovered two trace events in the xen
subsystem that use a hack to create zero data size trace events. This is not
what trace events are for. Trace events add memory footprint overhead, and
if all you need to do is see if a function is hit or not, simply make that
function noinline and use function tracer filtering.

Worse yet, the hack used was:

 __array(char, x, 0)

Which creates a static string of zero in length. There's assumptions about
such constructs in ftrace that this is a dynamic string that is nul
terminated. This is not the case with these tracepoints and can cause
problems in various parts of ftrace.

Nuke the trace events!

Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 95a7d76897 ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-14 17:02:30 -04:00
Marcelo Ricardo Leitner
81c7288b17 sched: cls: enable verbose logging
Currently, when the rule is not to be exclusively executed by the
hardware, extack is not passed along and offloading failures don't
get logged. The idea was that hardware failures are okay because the
rule will get executed in software then and this way it doesn't confuse
unware users.

But this is not helpful in case one needs to understand why a certain
rule failed to get offloaded. Considering it may have been a temporary
failure, like resources exceeded or so, reproducing it later and knowing
that it is triggering the same reason may be challenging.

The ultimate goal is to improve Open vSwitch debuggability when using
flower offloading.

This patch adds a new flag to enable verbose logging. With the flag set,
extack will be passed to the driver, which will be able to log the
error. As the operation itself probably won't fail (not because of this,
at least), current iproute will already log it as a Warning.

The flag is generic, so it can be reused later. No need to restrict it
just for HW offloading. The command line will follow the syntax that
tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning.

For example:
# ./tc qdisc add dev p7p1 ingress
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
	flower verbose \
	src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
Warning: TC offload is disabled on net device.
# echo $?
0
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
	flower \
	src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
# echo $?
0

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14 16:18:27 -04:00
David S. Miller
23548dab56 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-05-14

This series contains updates to virtchnl, i40e and i40evf.

Bruce cleans up whitespace and unnecessary parentheses in virtchnl.

Jake does a number of stat cleanups in the i40e driver, including
cleanup of code indentation, whitespace issues, remove duplicate stats,
fix grammar in code comment and general spring cleaning of the
statistics code.

Patryk fixes an issue where we recalculate vectors left and vectors
wanted but do not take into account the reduced number of queue pairs
per VSI.

Harshitha adds tx_busy stat to ethtool stats to track the number of
times we return NETDEV_TX_BUSY to the stack during transmit.

Paweł fixes a potential system crash when unloading the VF driver after
a hardware reset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14 14:28:59 -04:00
Rahul Lakkireddy
2724273e8f vmcore: add API to collect hardware dump in second kernel
The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/vmcore are as follows:

1. During probe (before hardware is initialized), device drivers
register to the vmcore module (via vmcore_add_device_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.

2. vmcore module allocates the buffer with requested size. It adds
an Elf note and invokes the device driver's registered callback
function.

3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to vmcore module.

Ensure that the device dump buffer size is always aligned to page size
so that it can be mmaped.

Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more
generic and reserve NT_VMCOREDD note type to indicate vmcore device
dump.

Suggested-by: Eric Biederman <ebiederm@xmission.com>.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14 13:46:04 -04:00