Commit Graph

693430 Commits

Author SHA1 Message Date
Suzuki K Poulose
79d29bb93b coresight replicator: Expose replicator management registers
Expose the idfilter* registers of the programmable replicator.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Suzuki K Poulose
2b4553399b coresight tmc: Expose DBA and AXICTL
Expose DBALO,DBAHI and AXICTL registers

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Suzuki K Poulose
6f6ab4fce5 coresight tmc: Add helpers for accessing 64bit registers
Coresight TMC splits 64bit registers into a pair of 32bit registers
(e.g DBA, RRP, RWP). Provide helpers to read/write to these registers.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Suzuki K Poulose
47675f6a46 coresight: Use the new helper for defining registers
Use the new helpers for exposing coresight component registers,
choosing the 64bit variants for appropriate registers.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Suzuki K Poulose
b4523c87c0 coresight: Add support for reading 64bit registers
Add support for reading a lower and upper 32bits of a register
as a single 64bit register. Also add simplified macros for
direct register accesses.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Suzuki K Poulose
1c8859848d coresight replicator: Cleanup programmable replicator naming
The Linux coresight drivers define the programmable ATB replicator as
Qualcomm replicator, while this is designed by ARM. This can cause
confusion to a user selecting the driver. Cleanup all references to
make it explicitly clear. This patch :

 1) Replace the compatible string for the replicator :
      qcom,coresight-replicator1x => arm,coresight-dynamic-replicator
 2) Changes the Kconfig symbol (since this is not part of any defconfigs)
     CORESIGHT_QCOM_REPLICATOR => CORESIGHT_DYNAMIC_REPLICATOR
 3) Improves the help message in the Kconfig.
 4) Changes the name of the driver and the file :
      coresight-replicator-qcom => coresight-dynamic-replicator

Cc: Pratik Patel <pratikp@codeaurora.org>
Cc: Ivan T. Ivanov <ivan.ivanov@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: devicetree@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Mike Leach
27b8f6673a coresight: etm4x: Adds trace return stack option programming for ETMv4.
Adds handling to program the return stack option into ETMv4 hardware if
specified in the perf command line.

If option is not supported by the hardware then it will be ignored.
This allows capture to move between core/ETM combinations that have the
hardware support to those that do not.

Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Mike Leach
557587bede coresight: ptm: Adds trace return stack option programming for PTM.
Adds handling to program the return stack option into PTM hardware if
specified in the perf command line.

If option is not supported by the hardware then it will be ignored.
This allows capture to move between core/ETM combinations that have the
hardware support to those that do not.

Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Mike Leach
b97971bee5 coresight: pmu: Adds return stack option to perf coresight pmu
Return stack is a programmable option on some ETM and PTM hardware.
Adds the option flags to enable this from the perf event command line.

Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:48 +02:00
Arvind Yadav
89f00a1ae5 hwtracing: coresight: constify attribute_group structures.
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
  text	   data	    bss	    dec	    hex	filename
   2573	    288	    296	   3157	    c55	coresight-etm-perf.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   2613	    224	    296	   3133	    c3d	coresight-etm-perf.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:47 +02:00
Mathieu Poirier
af36103e48 coresight: etm3x: Set synchronisation frequencty to TRM default
Register ETMSYNCFR holds the number of by that need to be generated before
periodic synchronisation packets are inserted in the trace stream.  By
zeroing out the config structure, the current code effectively disable
periodic synchronization.

This patch simply initialise the recommended value for this register as
specified in the technical reference manual.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:47 +02:00
Mathieu Poirier
1655a3d6f3 coresight: etb10: Move etb_disable_hw() outside of lock
Function etb_disable_hw() is already taking care of unlocking and locking
the coresight access register and as such doesn't need to be placed
within the unlock/lock of function etb_update_buffer().

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:47 +02:00
Mathieu Poirier
0c3fc4d5fa coresight: Add barrier packet for synchronisation
When a buffer overflow happens the synchronisation patckets usually
present at the beginning of the buffer are lost, a situation that
prevents the decoder from knowing the context of the traces being
decoded.

This patch adds a barrier packet to be used by sink IPs when a buffer
overflow condition is detected.  These barrier packets are then used
by the decoding library as markers to force re-synchronisation.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:47 +02:00
Mathieu Poirier
4f871a9f0f coresight: etb10: Remove useless conversion to LE
Internal CoreSight components are rendering trace data in little-endian
format.  As such there is no need to convert the data once more, hence
removing the extra step.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:47 +02:00
Mathieu Poirier
cfd9f6306f coresight: Correct buffer lost increment
Many conditions may cause synchronisation to be lost when updating
the perf ring buffer but the end result is still the same: synchronisation
is lost.  As such there is no need to increment the lost count for each
condition, just once will suffice.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:05:47 +02:00
Kiran Gunda
2fb4f2581d spmi: pmic-arb: Move the ownership check to irq_chip callback
Check the irq ownership in the irq_request_resources callback
instead of checking it during the irq mapping. This can prevent
installing the flow handler for the interrupt that is not owned by the EE
and allow the irq translation during the gpio driver probe.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Tested-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:52:22 +02:00
Rob Herring
e55fe64a19 spmi: Convert to using %pOF instead of full_name
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:20 +02:00
Fenglin Wu
53d296b594 spmi: pmic-arb: Remove checking opc value not less than 0
The opc parameter in pmic_arb_write_cmd() function is defined with type
u8 and it's always greater than or equal to 0. Checking that it's not
less than 0 is redundant and it can cause a forbidden warning during
compilation. Remove the check.

Signed-off-by: Fenglin Wu <fenglinw@codeaurora.org>
Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
David Collins
40f318f0ed spmi: pmic-arb: add support for HW version 5
Add support for version 5 of the SPMI PMIC arbiter.  It utilizes
different offsets for registers than those found on version 3.
Also, the procedure to determine if writing and IRQ access is
allowed for a given PPID changes for version 5.

Signed-off-by: David Collins <collinsd@codeaurora.org>
Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
000e1a43d3 spmi: pmic-arb: fix a possible null pointer dereference
If "core" memory resource is not specified, then the driver could
end up dereferencing a null pointer. Fix this issue.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
e95d073c8c spmi: pmic-arb: return __iomem pointer instead of offset
Modify the pmic_arb version ops to return an __iomem pointer
to the address instead of an offset. That way we do not need to
care about the base address changes in the new HW version.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
cdeef07a8d spmi: pmic-arb: use irq_chip callback to set spmi irq wakeup capability
Currently the driver sets the pmic arbiter core interrupt as wakeup capable
irrespective of the child irqs which causes the system to wakeup
unnecessarily. To fix this, set the core interrupt as wakeup capable
only if any of the child irqs request for it. Do this by marking it as
wakeup capable in the irq_set_wake callback.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
ff615ed91b spmi: pmic-arb: return the value instead of passing by pointer
Returning the output value from a function, when it is possible, is the
better and cleaner way than passing it by the pointer. Hence, modify
the ppid_to_apid mapping function to return apid instead of passing
it by a pointer. While at it, pass the ppid as function parameter to
ppid_to_apid mapping function instead of passing the sid and addr.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
9f7a9a448d spmi: pmic-arb: replace the writel_relaxed with __raw_writel
Replace the writel_relaxed with __raw_writel to avoid byte swapping
in pmic_arb_write_data() function. That way the code is independent
of the CPU endianness.

Fixes: 111a10bf3e ("spmi: pmic-arb: rename spmi_pmic_arb_dev to
spmi_pmic_arb")
Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
4788e613a6 spmi: pmic-arb: fix memory allocation for mapping_table
Allocate the correct memory size (max_pmic_peripherals) for the
mapping_table that holds the apid to ppid mapping. Also use a local
variable for mapping_table for better alignment of the code.

Fixes: 987a9f128b ("spmi: pmic-arb: Support more than 128 peripherals")
Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
325255bc95 spmi: pmic-arb: optimize qpnpint_irq_set_type function
Optimize the qpnpint_irq_set_type() by using a local variable
to hold the handler type. Also clean up other variable usage.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
f2f3156472 spmi: pmic-arb: clean up pmic_arb_find_apid function
Clean up the pmic_arb_find_apid() by using the local
variables to improve the code readability.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
02abec3616 spmi: pmic-arb: rename pa_xx to pmic_arb_xx and other cleanup
This patch cleans up the following.

- Rename the "pa" to "pmic_arb".
- Rename the spmi_pmic_arb *dev to spmi_pmic_arb *pmic_arb.
- Rename the pa_{read,write}_data() functions to
  pmic_arb_{read,write}_data().
- Rename channel to APID.
- Rename the HWIRQ_*() macros to hwirq_to_*().

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Kiran Gunda
b319b5922d spmi: pmic-arb: remove the read/write access checks
The access mode checks for peripheral ownership for read/write
permissions should not be required. Every peripheral enabled for
this master is expected to have a read/write permissions. If there
is any such invalid access due to wrong configuration in boot loader
or device tree files, then it should be fixed in those locations.
Hence, remove the access mode checks from the driver.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 13:51:19 +02:00
Bhumika Goyal
b8d01b7fba mei: make device_type const
Make this const as it is only stored in the type field of a device
structure, which is const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 11:59:39 +02:00
Jisheng Zhang
e8d2ed7db7 Revert "staging: Fix build issues with new binder API"
This reverts commit d0bdff0db8 ("staging: Fix build issues with new
binder API"), because commit e38361d032 ("ARM: 8091/2: add get_user()
support for 8 byte types") has added the 64bit __get_user_asm_*
implementation.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 10:44:15 +02:00
Greg Kroah-Hartman
9749c37275 Merge 4.13-rc7 into char-misc-next
We want the binder fix in here as well for testing and merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 10:19:01 +02:00
Linus Torvalds
cc4a41fe55 Linux 4.13-rc7 v4.13-rc7 2017-08-27 17:20:40 -07:00
Linus Torvalds
2c25833c42 Merge tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
 "Another fix, this time in common IOMMU sysfs code.

  In the conversion from the old iommu sysfs-code to the
  iommu_device_register interface, I missed to update the release path
  for the struct device associated with an IOMMU. It freed the 'struct
  device', which was a pointer before, but is now embedded in another
  struct.

  Freeing from the middle of allocated memory had all kinds of nasty
  side effects when an IOMMU was unplugged. Unfortunatly nobody
  unplugged and IOMMU until now, so this was not discovered earlier. The
  fix is to make the 'struct device' a pointer again"

* tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Fix wrong freeing of iommu_device->dev
2017-08-27 17:10:34 -07:00
Linus Torvalds
80f73b2da0 Merge tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
 "Here is a single misc driver fix for 4.13-rc7. It resolves a reported
  problem in the Android binder driver due to previous patches in
  4.13-rc.

  It's been in linux-next with no reported issues"

* tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  ANDROID: binder: fix proc->tsk check.
2017-08-27 17:08:37 -07:00
Linus Torvalds
c3c162635f Merge tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/iio fixes from Greg KH:
 "Here are few small staging driver fixes, and some more IIO driver
  fixes for 4.13-rc7. Nothing major, just resolutions for some reported
  problems.

  All of these have been in linux-next with no reported problems"

* tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: magnetometer: st_magn: remove ihl property for LSM303AGR
  iio: magnetometer: st_magn: fix status register address for LSM303AGR
  iio: hid-sensor-trigger: Fix the race with user space powering up sensors
  iio: trigger: stm32-timer: fix get trigger mode
  iio: imu: adis16480: Fix acceleration scale factor for adis16480
  PATCH] iio: Fix some documentation warnings
  staging: rtl8188eu: add RNX-N150NUB support
  Revert "staging: fsl-mc: be consistent when checking strcmp() return"
  iio: adc: stm32: fix common clock rate
  iio: adc: ina219: Avoid underflow for sleeping time
  iio: trigger: stm32-timer: add enable attribute
  iio: trigger: stm32-timer: fix get/set down count direction
  iio: trigger: stm32-timer: fix write_raw return value
  iio: trigger: stm32-timer: fix quadrature mode get routine
  iio: bmp280: properly initialize device for humidity reading
2017-08-27 17:03:33 -07:00
Linus Torvalds
fff4e7a0e6 Merge tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb
Pull NTB fixes from Jon Mason:
 "NTB bug fixes to address an incorrect ntb_mw_count reference in the
  NTB transport, improperly bringing down the link if SPADs are
  corrupted, and an out-of-order issue regarding link negotiation and
  data passing"

* tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb:
  ntb: ntb_test: ensure the link is up before trying to configure the mws
  ntb: transport shouldn't disable link due to bogus values in SPADs
  ntb: use correct mw_count function in ntb_tool and ntb_transport
2017-08-27 17:01:54 -07:00
Linus Torvalds
a8b169afbf Avoid page waitqueue race leaving possible page locker waiting
The "lock_page_killable()" function waits for exclusive access to the
page lock bit using the WQ_FLAG_EXCLUSIVE bit in the waitqueue entry
set.

That means that if it gets woken up, other waiters may have been
skipped.

That, in turn, means that if it sees the page being unlocked, it *must*
take that lock and return success, even if a lethal signal is also
pending.

So instead of checking for lethal signals first, we need to check for
them after we've checked the actual bit that we were waiting for.  Even
if that might then delay the killing of the process.

This matches the order of the old "wait_on_bit_lock()" infrastructure
that the page locking used to use (and is still used in a few other
areas).

Note that if we still return an error after having unsuccessfully tried
to acquire the page lock, that is ok: that means that some other thread
was able to get ahead of us and lock the page, and when that other
thread then unlocks the page, the wakeup event will be repeated.  So any
other pending waiters will now get properly woken up.

Fixes: 6290602709 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-27 16:25:09 -07:00
Linus Torvalds
3510ca20ec Minor page waitqueue cleanups
Tim Chen and Kan Liang have been battling a customer load that shows
extremely long page wakeup lists.  The cause seems to be constant NUMA
migration of a hot page that is shared across a lot of threads, but the
actual root cause for the exact behavior has not been found.

Tim has a patch that batches the wait list traversal at wakeup time, so
that we at least don't get long uninterruptible cases where we traverse
and wake up thousands of processes and get nasty latency spikes.  That
is likely 4.14 material, but we're still discussing the page waitqueue
specific parts of it.

In the meantime, I've tried to look at making the page wait queues less
expensive, and failing miserably.  If you have thousands of threads
waiting for the same page, it will be painful.  We'll need to try to
figure out the NUMA balancing issue some day, in addition to avoiding
the excessive spinlock hold times.

That said, having tried to rewrite the page wait queues, I can at least
fix up some of the braindamage in the current situation. In particular:

 (a) we don't want to continue walking the page wait list if the bit
     we're waiting for already got set again (which seems to be one of
     the patterns of the bad load).  That makes no progress and just
     causes pointless cache pollution chasing the pointers.

 (b) we don't want to put the non-locking waiters always on the front of
     the queue, and the locking waiters always on the back.  Not only is
     that unfair, it means that we wake up thousands of reading threads
     that will just end up being blocked by the writer later anyway.

Also add a comment about the layout of 'struct wait_page_key' - there is
an external user of it in the cachefiles code that means that it has to
match the layout of 'struct wait_bit_key' in the two first members.  It
so happens to match, because 'struct page *' and 'unsigned long *' end
up having the same values simply because the page flags are the first
member in struct page.

Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Christopher Lameter <cl@linux.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-27 13:55:12 -07:00
Linus Torvalds
0cc3b0ec23 Clarify (and fix) MAX_LFS_FILESIZE macros
We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
filesystems (and other IO targets) that know they are 64-bit clean and
don't have any 32-bit limits in their IO path.

It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
the VM layer is limited by the page cache to only 32-bit index values,
but our logic for that was confusing and actually wrong.  We used to
define that value to

	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)

which is actually odd in several ways: it limits the index to 31 bits,
and then it limits files so that they can't have data in that last byte
of a page that has the highest 31-bit index (ie page index 0x7fffffff).

Neither of those limitations make sense.  The index is actually the full
32 bit unsigned value, and we can use that whole full page.  So the
maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".

However, we do wan tto avoid the maximum index, because we have code
that iterates over the page indexes, and we don't want that code to
overflow.  So the maximum size of a file on a 32-bit host should
actually be one page less than the full 32-bit index.

So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
not actually be using the page of that last index (ULONG_MAX), but we
can grow a file up to that limit.

The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
byte less), but the actual true VM limit is one page less than 16TiB.

This was invisible until commit c2a9737f45 ("vfs,mm: fix a dead loop
in truncate_inode_pages_range()"), which started applying that
MAX_LFS_FILESIZE limit to block devices too.

NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
actually just the offset type itself (loff_t), which is signed.  But for
clarity, on 64-bit, just use the maximum signed value, and don't make
people have to count the number of 'f' characters in the hex constant.

So just use LLONG_MAX for the 64-bit case.  That was what the value had
been before too, just written out as a hex constant.

Fixes: c2a9737f45 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Reported-and-tested-by: Doug Nazar <nazard@nazar.ca>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-27 12:12:25 -07:00
Linus Torvalds
bab9752480 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:

 - a tweak to the IBM Trackpoint driver that helps recognizing
   trackpoints on never Lenovo Carbons

 - a fix to the ALPS driver solving scroll issues on some Dells

 - yet another ACPI ID has been added to Elan I2C toucpad driver

 - quieted diagnostic message in soc_button_array driver

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad
  Input: soc_button_array - silence -ENOENT error on Dell XPS13 9365
  Input: trackpoint - add new trackpoint firmware ID
  Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
2017-08-26 12:48:29 -07:00
Linus Torvalds
9716bdb23e Merge tag 'pci-v4.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
 "Remove needlessly alarming MSI affinity warning (this is not actually
  a bug fix, but the warning prompts unnecessary bug reports)"

* tag 'pci-v4.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/MSI: Don't warn when irq_create_affinity_masks() returns NULL
2017-08-26 12:46:14 -07:00
Linus Torvalds
c153e62105 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two fixes: one for an ldt_struct handling bug and a cherry-picked
  objtool fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix use-after-free of ldt_struct
  objtool: Fix '-mtune=atom' decoding support in objtool 2.0
2017-08-26 09:06:28 -07:00
Linus Torvalds
0adb8f3d31 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "Fix a timer granularity handling race+bug, which would manifest itself
  by spuriously increasing timeouts of some timers (from 1 jiffy to ~500
  jiffies in the worst case measured) in certain nohz states"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix excessive granularity of new timers after a nohz idle
2017-08-26 09:02:18 -07:00
Linus Torvalds
53ede64de3 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
 "A single fix to not allow nonsensical event groups that result in
  kernel warnings"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix group {cpu,task} validation
2017-08-26 08:59:50 -07:00
Linus Torvalds
b3242dba9f Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "6 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/memblock.c: reversed logic in memblock_discard()
  fork: fix incorrect fput of ->exe_file causing use-after-free
  mm/madvise.c: fix freeing of locked page with MADV_FREE
  dax: fix deadlock due to misaligned PMD faults
  mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled
  PM/hibernate: touch NMI watchdog when creating snapshot
2017-08-25 18:02:27 -07:00
Linus Torvalds
67a3b5cb33 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull Paolo Bonzini:
 "Bugfixes for x86, PPC and s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
  KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state
  KVM: x86: simplify handling of PKRU
  KVM: x86: block guest protection keys unless the host has them enabled
  KVM: PPC: Book3S HV: Add missing barriers to XIVE code and document them
  KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss
  KVM: PPC: Book3S HV: Use msgsync with hypervisor doorbells on POWER9
  KVM: s390: sthyi: fix specification exception detection
  KVM: s390: sthyi: fix sthyi inline assembly
2017-08-25 17:46:23 -07:00
Linus Torvalds
17e34c4fd0 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
 "Fixes two obvious bugs in virtio pci"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_pci: fix cpu affinity support
  virtio_blk: fix incorrect message when disk is resized
2017-08-25 17:40:03 -07:00
Linus Torvalds
42e6d5e5ee Merge tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
 "Just one fix, to add a barrier in the switch_mm() code to make sure
  the mm cpumask update is ordered vs the MMU starting to load
  translations. As far as we know no one's actually hit the bug, but
  that's just luck.

  Thanks to Benjamin Herrenschmidt, Nicholas Piggin"

* tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Ensure cpumask update is ordered
2017-08-25 17:32:35 -07:00
Linus Torvalds
105065c3f7 Merge tag 'nfsd-4.13-2' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Two nfsd bugfixes, neither 4.13 regressions, but both potentially
  serious"

* tag 'nfsd-4.13-2' of git://linux-nfs.org/~bfields/linux:
  net: sunrpc: svcsock: fix NULL-pointer exception
  nfsd: Limit end of page list when decoding NFSv4 WRITE
2017-08-25 17:27:26 -07:00