Commit Graph

80928 Commits

Author SHA1 Message Date
Thomas Gleixner
54d751d4ad perf: Allow storage of PMU private data in event
For PMUs which are not per CPU, but e.g. per package/socket, we want to be
able to store a reference to the underlying per package/socket facility in the
event at init time so we can avoid magic storage constructs in the PMU driver.

This allows us to get rid of the per CPU dance in the intel uncore and RAPL
drivers and avoids a lookup of the per package data in the perf hotpath.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.364140369@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:16 +01:00
Linus Torvalds
1b9540ce03 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A rather largish series of 12 patches addressing a maze of race
  conditions in the perf core code from Peter Zijlstra"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Robustify task_function_call()
  perf: Fix scaling vs. perf_install_in_context()
  perf: Fix scaling vs. perf_event_enable()
  perf: Fix scaling vs. perf_event_enable_on_exec()
  perf: Fix ctx time tracking by introducing EVENT_TIME
  perf: Cure event->pending_disable race
  perf: Fix race between event install and jump_labels
  perf: Fix cloning
  perf: Only update context time when active
  perf: Allow perf_release() with !event->ctx
  perf: Do not double free
  perf: Close install vs. exit race
2016-02-28 07:52:00 -08:00
Linus Torvalds
691429e13d Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dax: move writeback calls into the filesystems
  dax: give DAX clearing code correct bdev
  ext4: online defrag not supported with DAX
  ext2, ext4: only set S_DAX for regular inodes
  block: disable block device DAX by default
  ocfs2: unlock inode if deleting inode from orphan fails
  mm: ASLR: use get_random_long()
  drivers: char: random: add get_random_long()
  mm: numa: quickly fail allocations for NUMA balancing on full nodes
  mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-27 12:46:16 -08:00
Linus Torvalds
a9f8094aae Merge tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
 "Enumeration:
    Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)

  Marvell MVEBU host bridge driver:
    Restrict build to 32-bit ARM (Thierry Reding)"

* tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: mvebu: Restrict build to 32-bit ARM
  Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
  Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
  Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"
2016-02-27 12:33:42 -08:00
Ross Zwisler
7f6d5b529b dax: move writeback calls into the filesystems
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev.  This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.

Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device.  This also fixes DAX code to properly flush caches in response
to sync(2).

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
20a90f5899 dax: give DAX clearing code correct bdev
dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases.  This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.

Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block.  This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Daniel Cashman
ec9ee4acd9 drivers: char: random: add get_random_long()
Commit d07e22597d ("mm: mmap: add new /proc tunable for mmap_base
ASLR") added the ability to choose from a range of values to use for
entropy count in generating the random offset to the mmap_base address.

The maximum value on this range was set to 32 bits for 64-bit x86
systems, but this value could be increased further, requiring more than
the 32 bits of randomness provided by get_random_int(), as is already
possible for arm64.  Add a new function: get_random_long() which more
naturally fits with the mmap usage of get_random_int() but operates
exactly the same as get_random_int().

Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
that values greater than 31 bits generate an appropriate mask without
overflow.  This is especially important on x86, as its shift instruction
uses a 5-bit mask for the shift operand, which meant that any value for
mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
randomization.

Finally, replace calls to get_random_int() with get_random_long() where
appropriate.

This patch (of 2):

Add get_random_long().

Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Linus Torvalds
c213341e49 Merge tag 'sound-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "Things got calmed down for rc6, as it seems, and we have only a few
  HD-audio fixes at this time: a fix for Skylake codec probe errors, a
  fix for missing interrupt handling, and a few Dell and HP quirks"

* tag 'sound-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Loop interrupt handling until really cleared
  ALSA: hda - Fix headset support and noise on HP EliteBook 755 G2
  ALSA: hda - Fixup speaker pass-through control for nid 0x14 on ALC225
  ALSA: hda - Fixing background noise on Dell Inspiron 3162
  ALSA: hda - Apply clock gate workaround to Skylake, too
2016-02-26 09:27:21 -08:00
Takashi Iwai
473f414564 ALSA: hda - Loop interrupt handling until really cleared
Currently the interrupt handler of HD-audio driver assumes that no irq
update is needed while processing the irq.  But in reality, it has
been confirmed that the HW irq is issued even during the irq
handling.  Since we clear the irq status at the beginning, process the
interrupt, then exits from the handler, the lately issued interrupt is
left untouched without being properly processed.

This patch changes the interrupt handler code to loop over the
check-and-process.  The handler tries repeatedly as long as the IRQ
status are turned on, and either stream or CORB/RIRB is handled.

For checking the stream handling, snd_hdac_bus_handle_stream_irq()
returns a value indicating the stream indices bits.  Other than that,
the change is only in the irq handler itself.

Reported-by: Libin Yang <libin.yang@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-26 08:50:31 +01:00
Linus Torvalds
3d7b365490 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:

 - Two fixes for compatibility with the ACPI 6.1 specification.

   Without these fixes multi-interface DIMMs will fail to be probed, and
   address range scrub commands to find memory errors will give results
   that the kernel will mis-interpret.  For multi-interface DIMMs Linux
   will accept either the original 6.0 implementation or 6.1.

   For address range scrub we'll only support 6.1 since ACPI formalized
   this DSM differently than the original example [1] implemented in
   v4.2.  The expectation is that production systems will only ever ship
   the ACPI 6.1 address range scrub command definition.

 - The wider async address range scrub work targeting 4.6 discovered
   that the original synchronous implementation in 4.5 is not sizing its
   return buffer correctly.

 - Arnd caught that my recent fix to the size of the pfn_t flags missed
   updating the flags variable used in the pmem driver.

 - Toshi found that we mishandle the memremap() return value in
   devm_memremap().

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm: use 'u64' for pfn flags
  devm_memremap: Fix error value when memremap failed
  nfit: update address range scrub commands to the acpi 6.1 format
  libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing
  nfit: fix multi-interface dimm handling, acpi6.1 compatibility
2016-02-25 18:54:53 -08:00
Linus Torvalds
1ebe3839e6 Merge tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
 "Add a regression fix for changed sysfs path of bq27xxx_battery and
  update MAINTAINERS file"

* tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: bq27xxx_battery: Restore device name
  MAINTAINERS: update bq27xxx driver
2016-02-25 18:42:08 -08:00
Peter Zijlstra
9107c89e26 perf: Fix race between event install and jump_labels
perf_install_in_context() relies upon the context switch hooks to have
scheduled in events when the IPI misses its target -- after all, if
the task has moved from the CPU (or wasn't running at all), it will
have to context switch to run elsewhere.

This however doesn't appear to be happening.

It is possible for the IPI to not happen (task wasn't running) only to
later observe the task running with an inactive context.

The only possible explanation is that the context switch hooks are not
called. Therefore put in a sync_sched() after toggling the jump_label
to guarantee all CPUs will have them enabled before we install an
event.

A simple if (0->1) sync_sched() will not in fact work, because any
further increment can race and complete before the sync_sched().
Therefore we must jump through some hoops.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.980211985@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-25 08:42:34 +01:00
Peter Zijlstra
a69b0ca4ac perf: Fix cloning
Alexander reported that when the 'original' context gets destroyed, no
new clones happen.

This can happen irrespective of the ctx switch optimization, any task
can die, even the parent, and we want to continue monitoring the task
hierarchy until we either close the event or no tasks are left in the
hierarchy.

perf_event_init_context() will attempt to pin the 'parent' context
during clone(). At that point current is the parent, and since current
cannot have exited while executing clone(), its context cannot have
passed through perf_event_exit_task_context(). Therefore
perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE.

However, since inherit_event() does:

	if (parent_event->parent)
		parent_event = parent_event->parent;

it looks at the 'original' event when it does: is_orphaned_event().
This can return true if the context that contains the this event has
passed through perf_event_exit_task_context(). And thus we'll fail to
clone the perf context.

Fix this by adding a new state: STATE_DEAD, which is set by
perf_release() to indicate that the filedesc (or kernel reference) is
dead and there are no observers for our data left.

Only for STATE_DEAD will is_orphaned_event() be true and inhibit
cloning.

STATE_EXIT is otherwise preserved such that is_event_hup() remains
functional and will report when the observed task hierarchy becomes
empty.

Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Fixes: c6e5b73242 ("perf: Synchronously clean up child events")
Link: http://lkml.kernel.org/r/20160224174947.919845295@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-25 08:42:33 +01:00
Dan Williams
4577b06655 nfit: update address range scrub commands to the acpi 6.1 format
The original format of these commands from the "NVDIMM DSM Interface
Example" [1] are superseded by the ACPI 6.1 definition of the "NVDIMM Root
Device _DSMs" [2].

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
     "9.20.7 NVDIMM Root Device _DSMs"

Changes include:
1/ New 'restart' fields in ars_status, unfortunately these are
   implemented in the middle of the existing definition so this change
   is not backwards compatible.  The expectation is that shipping
   platforms will only ever support the ACPI 6.1 definition.

2/ New status values for ars_start ('busy') and ars_status ('overflow').

Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-02-23 17:17:20 -08:00
Linus Torvalds
420eb6d7ef Merge tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Stable bugfixes:
   - Fix nfs_size_to_loff_t
   - NFSv4: Fix a dentry leak on alias use

  Other bugfixes:
   - Don't schedule a layoutreturn if the layout segment can be freed
     immediately.
   - Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
   - rpcrdma_bc_receive_call() should init rq_private_buf.len
   - fix stateid handling for the NFS v4.2 operations
   - pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
   - fix panic in gss_pipe_downcall() in fips mode
   - Fix a race between layoutget and pnfs_destroy_layout
   - Fix a race between layoutget and bulk recalls"

* tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls
  NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout
  auth_gss: fix panic in gss_pipe_downcall() in fips mode
  pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
  nfs4: fix stateid handling for the NFS v4.2 operations
  NFSv4: Fix a dentry leak on alias use
  xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.len
  pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
  pNFS: Fix pnfs_mark_matching_lsegs_return()
  nfs: fix nfs_size_to_loff_t
2016-02-23 16:39:21 -08:00
Linus Torvalds
dea08e6044 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Looks like a lot, but mostly driver fixes scattered all over as usual.

  Of note:

   1) Add conditional sched in nf conntrack in cleanup to avoid NMI
      watchdogs.  From Florian Westphal.

   2) Fix deadlock in nfnetlink cttimeout, also from Floarian.

   3) Fix handling of slaves in bonding ARP monitor validation, from Jay
      Vosburgh.

   4) Callers of ip_cmsg_send() are responsible for freeing IP options,
      some were not doing so.  Fix from Eric Dumazet.

   5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT.

   6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot.

   7) bcm7xxx PHY driver bug fixes from Florian Fainelli.

   8) Avoid unaligned accesses to protocol headers wrt.  GRE, from
      Alexander Duyck.

   9) SKB leaks and other problems in arc_emac driver, from Alexander
      Kochetkov.

  10) tcp_v4_inbound_md5_hash() releases listener socket instead of
      request socket on error path, oops.  Fix from Eric Dumazet.

  11) Missing socket release in pppoe_rcv_core() that seems to have
      existed basically forever.  From Guillaume Nault.

  12) Missing slave_dev unregister in dsa_slave_create() error path,
      from Florian Fainelli.

  13) crypto_alloc_hash() never returns NULL, fix return value check in
      __tcp_alloc_md5sig_pool.  From Insu Yun.

  14) Properly expire exception route entries in ipv4, from Xin Long.

  15) Fix races in tcp/dccp listener socket dismantle, from Eric
      Dumazet.

  16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not
      legal.  These drivers modify the SKB on transmit.  From Jiri Benc.

  17) Fix regression in the initialziation of netdev->tx_queue_len.
      From Phil Sutter.

  18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun.

  19) SCTP port hash sizing does not properly ensure that table is a
      power of two in size.  From Neil Horman.

  20) Fix initializing of software copy of MAC address in fmvj18x_cs
      driver, from Ken Kawasaki"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits)
  bnx2x: Fix 84833 phy command handler
  bnx2x: Fix led setting for 84858 phy.
  bnx2x: Correct 84858 PHY fw version
  bnx2x: Fix 84833 RX CRC
  bnx2x: Fix link-forcing for KR2
  net: ethernet: davicom: fix devicetree irq resource
  fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address
  Driver: Vmxnet3: Update Rx ring 2 max size
  net: netcp: rework the code for get/set sw_data in dma desc
  soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data
  net: ti: netcp: restore get/set_pad_info() functionality
  MAINTAINERS: Drop myself as xen netback maintainer
  sctp: Fix port hash table size computation
  can: ems_usb: Fix possible tx overflow
  Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb
  net: bcmgenet: Fix internal PHY link state
  af_unix: Don't use continue to re-execute unix_stream_read_generic loop
  unix_diag: fix incorrect sign extension in unix_lookup_by_ino
  bnxt_en: Failure to update PHY is not fatal condition.
  bnxt_en: Remove unnecessary call to update PHY settings.
  ...
2016-02-22 12:18:07 -08:00
Karicheri, Muralidharan
b1cb86ae0e soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data
Rename the pad to sw_data as per description of this field in the hardware
spec(refer sprugr9 from www.ti.com). Latest version of the document is
at http://www.ti.com/lit/ug/sprugr9h/sprugr9h.pdf and section 3.1
Host Packet Descriptor describes this field.

Define and use a constant for the size of sw_data field similar to
other fields in the struct for desc and document the sw_data field
in the header. As the sw_data is not touched by hw, it's type can be
changed to u32.

Rename the helpers to match with the updated dma desc field sw_data.

Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:03:15 -05:00
Ivaylo Dimitrov
9aafabc7fe power: bq27xxx_battery: Restore device name
Patch <703df6c09795> ("power: bq27xxx_battery: Reorganize I2C
into a module") has removed the device name numbering from
bq27xxx_battery_i2c_probe. Fix that by restoring the code.

Fixes: 703df6c097
Signed-off-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
2016-02-21 20:49:34 +01:00
Linus Torvalds
0389075ecf Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This is unusually large, partly due to the EFI fixes that prevent
  accidental deletion of EFI variables through efivarfs that may brick
  machines.  These fixes are somewhat involved to maintain compatibility
  with existing install methods and other usage modes, while trying to
  turn off the 'rm -rf' bricking vector.

  Other fixes are for large page ioremap()s and for non-temporal
  user-memcpy()s"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix vmalloc_fault() to handle large pages properly
  hpet: Drop stale URLs
  x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
  x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
  lib/ucs2_string: Correct ucs2 -> utf8 conversion
  efi: Add pstore variables to the deletion whitelist
  efi: Make efivarfs entries immutable by default
  efi: Make our variable validation list include the guid
  efi: Do variable name validation tests in utf8
  efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
  lib/ucs2_string: Add ucs2 -> utf8 helper functions
2016-02-20 09:32:40 -08:00
Linus Torvalds
e6a1c1e9dd Merge tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 - Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
 - Fix dedotify for binutils >= 2.26 from Andreas Schwab
 - Don't trace hcalls on offline CPUs from Denis Kirjanov
 - eeh: Fix stale cached primary bus from Gavin Shan
 - eeh: Fix stale PE primary bus from Gavin Shan
 - mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
 - ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy

* tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/ioda: Set "read" permission when "write" is set
  powerpc/mm: Fix Multi hit ERAT cause by recent THP update
  powerpc/powernv: Fix stale PE primary bus
  powerpc/eeh: Fix stale cached primary bus
  powerpc/pseries: Don't trace hcalls on offline CPUs
  powerpc: Fix dedotify for binutils >= 2.26
  powerpc/book3s_32: Fix build error with checkpoint restart
2016-02-20 09:22:11 -08:00
Linus Torvalds
37aa4dac99 Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk driver fixes from Stephen Boyd:
 "An assortment of vendor specific clk drivers fixes, most notably
  fallout from adding Tegra210 and rockchip rk3036/rk3368 drivers this
  cycle.

  There's also the random smattering of sparse/checker fixes, a build
  "fix" to get the Tango clk driver to compile because the Kconfig
  symbol was renamed after the fact, and a clk gpio fix for a patch
  mismerge"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (28 commits)
  clk: gpio: Really allow an optional clock= DT property
  Revert "clk: qcom: Specify LE device endianness"
  clk: versatile: mask VCO bits before writing
  clk: tegra: super: Fix sparse warnings for functions not declared as static
  clk: tegra: Fix sparse warnings for functions not declared as static
  clk: tegra: Fix sparse warning for pll_m
  clk: tegra: Use definition for pll_u override bit
  clk: tegra: Fix warning caused by pll_u failing to lock
  clk: tegra: Fix clock sources for Tegra210 EMC
  clk: tegra: Add the APB2APE audio clock on Tegra210
  clk: tegra: Add missing of_node_put()
  clk: tegra: Fix PLLE SS coefficients
  clk: tegra: Fix typos around clearing PLLE bits during enable
  clk: tegra: Do not disable PLLE when under hardware control
  clk: tegra: Fix pllx dyn step calculation
  clk: tegra: pll: Fix potential sleeping-while-atomic
  clk: tegra: Fix the misnaming of nvenc from msenc
  clk: tegra: Fix naming of MISC registers
  clk: tango4: rename ARCH_TANGOX to ARCH_TANGO
  clk: scpi: Fix checking return value of platform_device_register_simple()
  ...
2016-02-20 09:16:51 -08:00
Linus Torvalds
a703f42d6c Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull more drm fixes from Dave Airlie:
 "Some more fixes trickled in:

  A bunch of VC4 ones since it's a pretty new driver not much chance of
  regressions, and it fixes GPU resets.

  Also one atomic fix, one set of fixes for a common bug in TTM cleanup,
  and one i915 hotplug fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/nouveau: use post-decrement in error handling
  drm/atomic: Allow for holes in connector state, v2.
  drm/i915: Fix hpd live status bits for g4x
  drm/vc4: Use runtime PM to power cycle the device when the GPU hangs.
  drm/vc4: Enable runtime PM.
  drm/vc4: Fix spurious GPU resets due to BO reuse.
  drm/vc4: Drop error message on seqno wait timeouts.
  drm/vc4: Fix -ERESTARTSYS error return from BO waits.
  drm/vc4: Return an ERR_PTR from BO creation instead of NULL.
  drm/vc4: Fix the clear color for the first tile rendered.
  drm/vc4: Validate that WAIT_BO padding is cleared.
  drm/radeon: use post-decrement in error handling
  drm/amdgpu: use post-decrement in error handling
2016-02-20 09:13:18 -08:00
Dan Williams
747ffe11b4 libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing
Use the output length specified in the command to size the receive
buffer rather than the arbitrary 4K limit.

This bug was hiding the fact that the ndctl implementation of
ndctl_bus_cmd_new_ars_status() was not specifying an output buffer size.

Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-02-19 15:21:52 -08:00
Linus Torvalds
87d9ac712b Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: slab: free kmem_cache_node after destroy sysfs file
  ipc/shm: handle removed segments gracefully in shm_mmap()
  MAINTAINERS: update Kselftest Framework mailing list
  devm_memremap_release(): fix memremap'd addr handling
  mm/hugetlb.c: fix incorrect proc nr_hugepages value
  mm, x86: fix pte_page() crash in gup_pte_range()
  fsnotify: turn fsnotify reaper thread into a workqueue job
  Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
  mm: fix regression in remap_file_pages() emulation
  thp, dax: do not try to withdraw pgtable from non-anon VMA
2016-02-19 13:36:00 -08:00
Nikolay Aleksandrov
cfdd28beb3 net: make netdev_for_each_lower_dev safe for device removal
When I used netdev_for_each_lower_dev in commit bad5316232 ("vrf:
remove slave queue and private slave struct") I thought that it acts
like netdev_for_each_lower_private and can be used to remove the current
device from the list while walking, but unfortunately it acts more like
netdev_for_each_lower_private_rcu and doesn't allow it. The difference
is where the "iter" points to, right now it points to the current element
and that makes it impossible to remove it. Change the logic to be
similar to netdev_for_each_lower_private and make it point to the "next"
element so we can safely delete the current one. VRF is the only such
user right now, there's no change for the read-only users.

Here's what can happen now:
[98423.249858] general protection fault: 0000 [#1] SMP
[98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng
sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw
gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon
parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button
9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg
virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd
ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy
virtio_ring virtio scsi_mod [last unloaded: bridge]
[98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G           O
4.5.0-rc2+ #81
[98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.1-20150318_183358- 04/01/2014
[98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti:
ffff88003428c000
[98423.256123] RIP: 0010:[<ffffffff81514f3e>]  [<ffffffff81514f3e>]
netdev_lower_get_next+0x1e/0x30
[98423.256534] RSP: 0018:ffff88003428f940  EFLAGS: 00010207
[98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX:
0000000000000000
[98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI:
ffff880054ff90c0
[98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09:
0000000000000000
[98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12:
ffff88003428f9e0
[98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15:
0000000000000001
[98423.258055] FS:  00007f3d76881700(0000) GS:ffff88005d000000(0000)
knlGS:0000000000000000
[98423.258418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4:
00000000000406e0
[98423.258902] Stack:
[98423.259075]  ffff88003428f960 ffffffffa0442636 0002000100000004
ffff880054ff9000
[98423.259647]  ffff88003428f9b0 ffffffff81518205 ffff880054ff9000
ffff88003428f978
[98423.260208]  ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0
ffff880035b35f00
[98423.260739] Call Trace:
[98423.260920]  [<ffffffffa0442636>] vrf_dev_uninit+0x76/0xa0 [vrf]
[98423.261156]  [<ffffffff81518205>]
rollback_registered_many+0x205/0x390
[98423.261401]  [<ffffffff815183ec>] unregister_netdevice_many+0x1c/0x70
[98423.261641]  [<ffffffff8153223c>] rtnl_delete_link+0x3c/0x50
[98423.271557]  [<ffffffff815335bb>] rtnl_dellink+0xcb/0x1d0
[98423.271800]  [<ffffffff811cd7da>] ? __inc_zone_state+0x4a/0x90
[98423.272049]  [<ffffffff815337b4>] rtnetlink_rcv_msg+0x84/0x200
[98423.272279]  [<ffffffff810cfe7d>] ? trace_hardirqs_on+0xd/0x10
[98423.272513]  [<ffffffff8153370b>] ? rtnetlink_rcv+0x1b/0x40
[98423.272755]  [<ffffffff81533730>] ? rtnetlink_rcv+0x40/0x40
[98423.272983]  [<ffffffff8155d6e7>] netlink_rcv_skb+0x97/0xb0
[98423.273209]  [<ffffffff8153371a>] rtnetlink_rcv+0x2a/0x40
[98423.273476]  [<ffffffff8155ce8b>] netlink_unicast+0x11b/0x1a0
[98423.273710]  [<ffffffff8155d2f1>] netlink_sendmsg+0x3e1/0x610
[98423.273947]  [<ffffffff814fbc98>] sock_sendmsg+0x38/0x70
[98423.274175]  [<ffffffff814fc253>] ___sys_sendmsg+0x2e3/0x2f0
[98423.274416]  [<ffffffff810d841e>] ? do_raw_spin_unlock+0xbe/0x140
[98423.274658]  [<ffffffff811e1bec>] ? handle_mm_fault+0x26c/0x2210
[98423.274894]  [<ffffffff811e19cd>] ? handle_mm_fault+0x4d/0x2210
[98423.275130]  [<ffffffff81269611>] ? __fget_light+0x91/0xb0
[98423.275365]  [<ffffffff814fcd42>] __sys_sendmsg+0x42/0x80
[98423.275595]  [<ffffffff814fcd92>] SyS_sendmsg+0x12/0x20
[98423.275827]  [<ffffffff81611bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66
90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48
89 06 <48> 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66
[98423.279639] RIP  [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30
[98423.279920]  RSP <ffff88003428f940>

CC: David Ahern <dsa@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Vlad Yasevich <vyasevic@redhat.com>
Fixes: bad5316232 ("vrf: remove slave queue and private slave struct")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19 15:29:26 -05:00
Maarten Lankhorst
5fff80bbdb drm/atomic: Allow for holes in connector state, v2.
Because we record connector_mask using 1 << drm_connector_index now
the connector_mask should stay the same even when other connectors
are removed. This was not the case with MST, in that case when removing
a connector all other connectors may change their index.

This is fixed by waiting until the first get_connector_state to allocate
connector_state, and force reallocation when state is too small.

As a side effect connector arrays no longer have to be preallocated,
and can be allocated on first use which means a less allocations in
the page flip only path.

Changes since v1:
- Whitespace. (Ville)
- Call ida_remove when destroying the connector. (Ville)
- u32 alloc -> int. (Ville)

Fixes: 14de6c44d1 ("drm/atomic: Remove drm_atomic_connectors_for_crtc.")
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Lyude <cpaul@redhat.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-19 13:24:03 +10:00
Linus Torvalds
705d43dbe1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:

 - regression (from 4.4) fix for ordering issue, introduced by an
   earlier ftrace change, that broke live patching of modules.

   The fix replaces the ftrace module notifier by direct call in order
   to make the ordering guaranteed and well-defined.  The patch, from
   Jessica Yu, has been acked both by Steven and Rusty

 - error message fix from Miroslav Benes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  ftrace/module: remove ftrace module notifier
  livepatch: change the error message in asm/livepatch.h header files
2016-02-18 16:34:15 -08:00
Jeff Layton
13d34ac6e5 Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
This reverts commit c510eff6be ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").

Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.

The above commit changed the code to use call_srcu to clean up the
marks.  While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.

There's also another potential problem with using call_srcu here.  While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.

It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys.  While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.

This patch reverts the above patch.  A later patch will take a different
approach to eliminated the dedicated thread here.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Cc: Jan Kara <jack@suse.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Eric Dumazet
7716682cc5 tcp/dccp: fix another race at listener dismantle
Ilya reported following lockdep splat:

kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
sk_clone_lock+0x19b/0x440
kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0

To properly fix this issue, inet_csk_reqsk_queue_add() needs
to return to its callers if the child as been queued
into accept queue.

We also need to make sure listener is still there before
calling sk->sk_data_ready(), by holding a reference on it,
since the reference carried by the child can disappear as
soon as the child is put on accept queue.

Reported-by: Ilya Dryomov <idryomov@gmail.com>
Fixes: ebb516af60 ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 11:35:51 -05:00
Xin Long
deed49df73 route: check and remove route cache when we get route
Since the gc of ipv4 route was removed, the route cached would has
no chance to be removed, and even it has been timeout, it still could
be used, cause no code to check it's expires.

Fix this issue by checking  and removing route cache when we get route.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 11:31:36 -05:00
Bjorn Helgaas
67b4eab91c Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
Revert 811a4e6fce ("PCI: Add helpers to manage pci_dev->irq and
pci_dev->irq_managed").

This is part of reverting 991de2e590 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
introduced.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
2016-02-17 17:23:36 -06:00
Jessica Yu
7dcd182bec ftrace/module: remove ftrace module notifier
Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.

This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.

Fixes: 5156dca34a ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-02-17 22:14:06 +01:00
Linus Torvalds
2850713576 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes from the past few weeks that should go into 4.5.
  This contains:

   - Overflow fix for sysfs discard show function from Alan.

   - A stacking limit init fix for max_dev_sectors, so we don't end up
     artificially capping some use cases.  From Keith.

   - Have blk-mq proper end unstarted requests on a dying queue, instead
     of pushing that to the driver.  From Keith.

   - NVMe:
        - Update to Kconfig description for NVME_SCSI, since it was
          vague and having it on is important for some SUSE distros.
          From Christoph.
        - Set of fixes from Keith, around surprise removal. Also kills
          the no-merge flag, so it supports merging.

   - Set of fixes for lightnvm from Matias, Javier, and Wenwei.

   - Fix null_blk oops when asked for lightnvm, but not available.  From
     Matias.

   - Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails
     if interrupted by a signal.

   - Two floppy fixes from Jiri, fixing signal handling and blocking
     open.

   - A use-after-free fix for O_DIRECT, from Mike Krinkin.

   - A block module ref count fix from Roman Pen.

   - An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini.

   - Smaller reallo fix for xen-blkfront from Bob Liu.

   - Removal of an unused struct member in the deadline IO scheduler,
     from Tahsin.

   - Also from Tahsin, properly initialize inode struct members
     associated with cgroup writeback, if enabled.

   - From Tejun, ensure that we keep the superblock pinned during cgroup
     writeback"

* 'for-linus' of git://git.kernel.dk/linux-block: (25 commits)
  blk: fix overflow in queue_discard_max_hw_show
  writeback: initialize inode members that track writeback history
  writeback: keep superblock pinned during cgroup writeback association switches
  bio: return EINTR if copying to user space got interrupted
  NVMe: Rate limit nvme IO warnings
  NVMe: Poll device while still active during remove
  NVMe: Requeue requests on suspended queues
  NVMe: Allow request merges
  NVMe: Fix io incapable return values
  blk-mq: End unstarted requests on dying queue
  block: Initialize max_dev_sectors to 0
  null_blk: oops when initializing without lightnvm
  block: fix module reference leak on put_disk() call for cgroups throttle
  nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
  kernel/fs: fix I/O wait not accounted for RW O_DSYNC
  floppy: refactor open() flags handling
  lightnvm: allow to force mm initialization
  lightnvm: check overflow and correct mlc pairs
  lightnvm: fix request intersection locking in rrpc
  lightnvm: warn if irqs are disabled in lock laddr
  ...
2016-02-17 11:59:23 -08:00
Linus Torvalds
a9f70bd4e7 Merge tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "This includes two fixes.

  The first is something that has come up a few times and has been
  worked out individually, but it's come up now enough that the problem
  should be generic.  Tracepoints are protected by RCU sched.  There are
  several tracepoints within core infrastructure like kfree().  If a
  tracepoint is called when the CPU is going down, or when it's coming
  up but has yet to be recognized by RCU, a RCU warning is triggered.

  This is a true bug as that tracepoint is not protected by RCU.
  Usually, this is taken care of by testing for cpu online as a
  tracepoint condition.  But as this is happening more often, moving it
  from a individual tracepoint to a check in the tracepoint
  infrastructure is more robust.

  Note, there is now a duplicate of a cpu online test, because this
  update does not remove the individual checks.  But the overhead is
  small enough that the removal can be done in another release.

  The second change is strange linker breakage due to the branch
  tracer's builtin_constant_p() check failing, and treating the
  condition as a variable instead of a constant.  Arnd Bergmann found
  that this can be fixed by testing !!(cond) instead of just (cond)"

* tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix freak link error caused by branch tracer
  tracepoints: Do not trace when cpu is offline
2016-02-17 11:35:41 -08:00
Kinglong Mee
c89757061a pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
unreferenced object 0xffffc90000abf000 (size 16900):
  comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280
    [<ffffffff811b9c2a>] __vmalloc+0x4a/0x50
    [<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver]
    [<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver]
    [<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4]
    [<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4]
    [<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4]
    [<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0
    [<ffffffff81228f1d>] do_fsync+0x3d/0x70
    [<ffffffff812291d0>] SyS_fsync+0x10/0x20
    [<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76
    [<ffffffffffffffff>] 0xffffffffffffffff

v2, add missing include header

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17 11:44:45 -05:00
Huy Nguyen
85743f1eb3 net/mlx4_core: Set UAR page size to 4KB regardless of system page size
problem description:

The current code sets UAR page size equal to system page size.
The ConnectX-3 and ConnectX-3 Pro HWs require minimum 128 UAR pages.
The mlx4 kernel drivers are not loaded if there is less than 128 UAR pages.

solution:

Always set UAR page to 4KB. This allows more UAR pages if the OS
has PAGE_SIZE larger than 4KB. For example, PowerPC kernel use 64KB
system page size, with 4MB uar region, there are 4MB/2/64KB = 32
uars (half for uar, half for blueflame). This does not meet minimum 128
UAR pages requirement. With 4KB UAR page, there are 4MB/2/4KB = 512 uars
which meet the minimum requirement.

Note that only codes in mlx4_core that deal with firmware know that uar
page size is 4KB. Codes that deal with usr page in cq and qp context
(mlx4_ib, mlx4_en and part of mlx4_core) still have the same assumption
that uar page size equals to system page size.

Note that with this implementation, on 64KB system page size kernel, there
are 16 uars per system page but only one uars is used. The other 15
uars are ignored because of the above assumption.

Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
firmware.

Regarding backward compatibility in SR-IOV, if hypervisor has this new code,
the virtual OS must be updated. If hypervisor has old code, and the virtual
OS has this new code, the new code will be backward compatible with the
old code. If the uar size is big enough, this new code in VF continues to
work with 64 KB uar page size (on PowerPc kernel). If the uar size does not
meet 128 uars requirement, this new code not loaded in VF and print the same
error message as the old code in Hypervisor.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:27 -05:00
Matan Barak
b4ff3a36d3 net/mlx5: Use offset based reserved field names in the IFC header file
mlx5_ifc.h is a header file representing the API and ABI between
the driver to the firmware and hardware. This file is used from
both the mlx5_ib and mlx5_core drivers.

Previously, this file used incrementing counter to indicate
reserved fields, for example:

struct mlx5_ifc_odp_per_transport_service_cap_bits {
        u8         send[0x1];
        u8         receive[0x1];
        u8         write[0x1];
        u8         read[0x1];
        u8         reserved_0[0x1];
        u8         srq_receive[0x1];
        u8         reserved_1[0x1a];
};

If one developer implements through net-next feature A that uses
reserved_0, they replace it with featureA and renames reserved_1 to
reserved_0. In the same kernel cycle, a 2nd developer could implement
feature B through the rdma tree, that uses reserved_1 and split it to
featureB and a smaller reserved_1 field. This will cause a conflict
when the two trees are merged.

The source of this conflict is that the 1st developer changed *all*
reserved fields.

As Linus suggested, we change the layout of structs to:

struct mlx5_ifc_odp_per_transport_service_cap_bits {
	u8         send[0x1];
	u8         receive[0x1];
	u8         write[0x1];
	u8         read[0x1];
	u8         reserved_at_4[0x1];
	u8         srq_receive[0x1];
	u8         reserved_at_6[0x1a];
};

This makes the conflicts much more rare and preserves the locality of
changes.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 15:21:47 -05:00
Linus Torvalds
87bbcfdecc Merge tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu
Pull IOMMU SVM fixes from David Woodhouse:
 "Minor register size and interrupt acknowledgement fixes which only
  showed up in testing on newer hardware, but mostly a fix to the MM
  refcount handling to prevent a recursive refcount issue when mmap() is
  used on the file descriptor associated with a bound PASID"

* tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu:
  iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
  iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
  iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users
2016-02-16 08:04:06 -08:00
Ingo Molnar
4682c211a8 Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:

 * Prevent accidental deletion of EFI variables through efivarfs that
   may brick machines. We use a whitelist of known-safe variables to
   allow things like installing distributions to work out of the box, and
   instead restrict vendor-specific variable deletion by making
   non-whitelist variables immutable (Peter Jones)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 13:14:57 +01:00
Arnd Bergmann
b33c8ff443 tracing: Fix freak link error caused by branch tracer
In my randconfig tests, I came across a bug that involves several
components:

* gcc-4.9 through at least 5.3
* CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
* CONFIG_PROFILE_ALL_BRANCHES overriding every if()
* The optimized implementation of do_div() that tries to
  replace a library call with an division by multiplication
* code in drivers/media/dvb-frontends/zl10353.c doing

        u32 adc_clock = 450560; /* 45.056 MHz */
        if (state->config.adc_clock)
                adc_clock = state->config.adc_clock;
        do_div(value, adc_clock);

In this case, gcc fails to determine whether the divisor
in do_div() is __builtin_constant_p(). In particular, it
concludes that __builtin_constant_p(adc_clock) is false, while
__builtin_constant_p(!!adc_clock) is true.

That in turn throws off the logic in do_div() that also uses
__builtin_constant_p(), and instead of picking either the
constant- optimized division, and the code in ilog2() that uses
__builtin_constant_p() to figure out whether it knows the answer at
compile time. The result is a link error from failing to find
multiple symbols that should never have been called based on
the __builtin_constant_p():

dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!

This patch avoids the problem by changing __trace_if() to check
whether the condition is known at compile-time to be nonzero, rather
than checking whether it is actually a constant.

I see this one link error in roughly one out of 1600 randconfig builds
on ARM, and the patch fixes all known instances.

Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de

Acked-by: Nicolas Pitre <nico@linaro.org>
Fixes: ab3c9c686e ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-02-15 13:06:00 -05:00
Steven Rostedt (Red Hat)
f37755490f tracepoints: Do not trace when cpu is offline
The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.

This can probuce the following warning:

 [ INFO: suspicious RCU usage. ]
 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
 -------------------------------
 include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 RCU used illegally from offline CPU!  rcu_scheduler_active = 1, debug_locks = 1
 no locks held by swapper/8/0.

 stack backtrace:
  CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S              4.4.0-00006-g0fe53e8-dirty #34
  Call Trace:
  [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
  [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
  [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
  [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
  [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
  [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
  [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
  [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
  [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
  [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
  [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
  [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14

This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.

Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.

Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.

Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org

Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Fixes: 97e1c18e8d ("tracing: Kernel Tracepoints")
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-02-15 13:04:46 -05:00
David Woodhouse
4692400827 iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
According to the VT-d specification we need to clear the PPR bit in
the Page Request Status register when handling page requests, or the
hardware won't generate any more interrupts.

This wasn't actually necessary on SKL/KBL (which may well be the
subject of a hardware erratum, although it's harmless enough). But
other implementations do appear to get it right, and we only ever get
one interrupt unless we clear the PPR bit.

Reported-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
2016-02-15 12:42:38 +00:00
Aneesh Kumar K.V
c777e2a8b6 powerpc/mm: Fix Multi hit ERAT cause by recent THP update
With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.

Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.

Consider following race:

		CPU0				CPU1
shrink_page_list()
  add_to_swap()
    split_huge_page_to_list()
      __split_huge_pmd_locked()
        pmdp_huge_clear_flush_notify()
	// pmd_none() == true
					exit_mmap()
					  unmap_vmas()
					    zap_pmd_range()
					      // no action on pmd since pmd_none() == true
	pmd_populate()

As result the THP will not be freed. The leak is detected by check_mm():

	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512

The above required us to not mark pmd none during a pmd split.

The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)

Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.

Fixes: eef1b3ba05 ("thp: implement split_huge_pmd()")
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-15 21:10:04 +11:00
Linus Torvalds
779ee19da7 Merge tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
 "Here are a number of small tty and serial driver fixes for 4.5-rc4
  that resolve some reported issues.

  One of them got reverted as it wasn't correct based on testing, and
  all have been in linux-next for a while"

* tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "8250: uniphier: allow modular build with 8250 console"
  pty: make sure super_block is still valid in final /dev/tty close
  pty: fix possible use after free of tty->driver_data
  tty: Add support for PCIe WCH382 2S multi-IO card
  serial/omap: mark wait_for_xmitr as __maybe_unused
  serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
  8250: uniphier: allow modular build with 8250 console
  tty: Drop krefs for interrupted tty lock
2016-02-14 12:29:59 -08:00
Linus Torvalds
6a810945ed Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single fix preventing a 32bit overflow in timespec/val to cputime
  conversions on 32bit machines"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()
2016-02-14 11:57:24 -08:00
Linus Torvalds
2f2e9f2dd1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This includes the long awaited series to address a set of bugs around
  active I/O remote-port LUN_RESET, as well as properly handling this
  same case with concurrent fabric driver session disconnect ->
  reconnect.

  Note this set of LUN_RESET bug-fixes has been surviving extended
  testing on both v4.5-rc1 and v3.14.y code over the last weeks, and is
  CC'ed for stable as it's something folks using multiple ESX connected
  hosts with slow backends can certainly trigger.

  The highlights also include:

   - Fix WRITE_SAME/DISCARD emulation 4k sector conversion in
     target/iblock (Mike Christie)

   - Fix TMR abort interaction and AIO type TMR response in qla2xxx
     target (Quinn Tran + Swapnil Nagle)

   - Fix >= v3.17 stale descriptor pointer regression in qla2xxx target
     (Quinn Tran)

   - Fix >= v4.5-rc1 return regression with unmap_zeros_data_store new
     configfs store handler (nab)

   - Add CPU affinity flag + convert qla2xxx to use bit (Quinn + HCH +
     Bart)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  qla2xxx: use TARGET_SCF_USE_CPUID flag to indiate CPU Affinity
  target/transport: add flag to indicate CPU Affinity is observed
  target: Fix incorrect unmap_zeroes_data_store return
  qla2xxx: Use ATIO type to send correct tmr response
  qla2xxx: Fix stale pointer access.
  target/user: Fix cast from pointer to phys_addr_t
  target: Drop legacy se_cmd->task_stop_comp + REQUEST_STOP usage
  target: Fix race with SCF_SEND_DELAYED_TAS handling
  target: Fix remote-port TMR ABORT + se_cmd fabric stop
  target: Fix TAS handling for multi-session se_node_acls
  target: Fix LUN_RESET active TMR descriptor handling
  target: Fix LUN_RESET active I/O handling for ACK_KREF
  qla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM
  qla2xxx: Fix warning reported by static checker
  target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors
2016-02-13 16:39:27 -08:00
Linus Torvalds
29f1bf34ed Merge branch 'akpm'(patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

The lockdep hlist conversion is in the locking tree too, waiting for the
next merge window.  Andrew thought it should go in now.  I'll take it,
since it fixes a real problem and looks trivially correct (famous last
words).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI
  mm: fix pfn_t vs highmem
  kernel/locking/lockdep.c: convert hash tables to hlists
  mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
  mm,thp: khugepaged: call pte flush at the time of collapse
  mm/backing-dev.c: fix error path in wb_init()
  mm, dax: check for pmd_none() after split_huge_pmd()
  vsprintf: kptr_restrict is okay in IRQ when 2
  mm: fix filemap.c kernel doc warning
  ubsan: cosmetic fix to Kconfig text
2016-02-12 13:12:27 -08:00
Linus Torvalds
4c05121e25 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "A set of seven fixes:

  Two regressions in the new hisi_sas arm driver, a blacklist entry for
  the marvell console which was causing a reset cascade without it, a
  race fix in the WRITE_SAME/DISCARD routines, a retry fix for the rdac
  driver, without which, it would prematurely return EIO and a couple of
  fixes for the hyper-v storvsc driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  SCSI: Add Marvell Console to VPD blacklist
  scsi_dh_rdac: always retry MODE SELECT on command lock violation
  storvsc: Use the specified target ID in device lookup
  storvsc: Install the storvsc specific timeout handler for FC devices
  hisi_sas: fix v1 hw check for slot error
  hisi_sas: add dependency for HAS_IOMEM
2016-02-12 09:32:37 -08:00
Dan Williams
db78c22230 mm: fix pfn_t vs highmem
The pfn_t type uses an unsigned long to store a pfn + flags value.  On a
64-bit platform the upper 12 bits of an unsigned long are never used for
storing the value of a pfn.  However, this is not true on highmem
platforms, all 32-bits of a pfn value are used to address a 44-bit
physical address space.  A pfn_t needs to store a 64-bit value.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211
Fixes: 01c8f1c44b ("mm, dax, gpu: convert vm_insert_mixed to pfn_t")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Stuart Foster <smf.linux@ntlworld.com>
Reported-by: Julian Margetson <runaway@candw.ms>
Tested-by: Julian Margetson <runaway@candw.ms>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11 18:35:48 -08:00
Andrew Morton
4a389810bc kernel/locking/lockdep.c: convert hash tables to hlists
Mike said:

: CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.  e
: kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error
: message.
:
: The problem is that ubsan callbacks use spinlocks and might be called
: before lockdep is initialized.  Particularly this line in the
: reserve_ebda_region function causes problem:
:
: lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
:
: If i put lockdep_init() before reserve_ebda_region call in
: x86_64_start_reservations kernel loads well.

Fix this ordering issue permanently: change lockdep so that it uses
hlists for the hash tables.  Unlike a list_head, an hlist_head is in its
initialized state when it is all-zeroes, so lockdep is ready for
operation immediately upon boot - lockdep_init() need not have run.

The patch will also save some memory.

lockdep_init() and lockdep_initialized can be done away with now - a 4.6
patch has been prepared to do this.

Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11 18:35:48 -08:00