Commit 662bbcb274 ("mm, sched: Allow uaccess in atomic with
pagefault_disable()") removed might_sleep() checks for all user access
code (that uses might_fault()).
The reason was to disable wrong "sleep in atomic" warnings in the
following scenario:
pagefault_disable()
rc = copy_to_user(...)
pagefault_enable()
Which is valid, as pagefault_disable() increments the preempt counter
and therefore disables the pagefault handler. copy_to_user() will not
sleep and return an error code if a page is not available.
However, as all might_sleep() checks are removed,
CONFIG_DEBUG_ATOMIC_SLEEP would no longer detect the following scenario:
spin_lock(&lock);
rc = copy_to_user(...)
spin_unlock(&lock)
If the kernel is compiled with preemption turned on, preempt_disable()
will make in_atomic() detect disabled preemption. The fault handler would
correctly never sleep on user access.
However, with preemption turned off, preempt_disable() is usually a NOP
(with !CONFIG_PREEMPT_COUNT), therefore in_atomic() will not be able to
detect disabled preemption nor disabled pagefaults. The fault handler
could sleep.
We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for user access
functions again, otherwise we can end up with horrible deadlocks.
Root of all evil is that pagefault_disable() acts almost as
preempt_disable(), depending on preemption being turned on/off.
As we now have pagefault_disabled(), we can use it to distinguish
whether user acces functions might sleep.
Convert might_fault() into a makro that calls __might_fault(), to
allow proper file + line messages in case of a might_sleep() warning.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-3-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
preempt_mask.h defines all the preempt_count semantics and related
symbols: preempt, softirq, hardirq, nmi, preempt active, need resched,
etc...
preempt.h defines the accessors and mutators of preempt_count.
But there is a messy dependency game around those two header files:
* preempt_mask.h includes preempt.h in order to access preempt_count()
* preempt_mask.h defines all preempt_count semantic and symbols
except PREEMPT_NEED_RESCHED that is needed by asm/preempt.h
Thus we need to define it from preempt.h, right before including
asm/preempt.h, instead of defining it to preempt_mask.h with the
other preempt_count symbols. Therefore the preempt_count semantics
happen to be spread out.
* We plan to introduce preempt_active_[enter,exit]() to consolidate
preempt_schedule*() code. But we'll need to access both preempt_count
mutators (preempt_count_add()) and preempt_count symbols
(PREEMPT_ACTIVE, PREEMPT_OFFSET). The usual place to define preempt
operations is in preempt.h but then we'll need symbols in
preempt_mask.h which already includes preempt.h. So we end up with
a ressource circle dependency.
Lets merge preempt_mask.h into preempt.h to solve these dependency issues.
This way we gather semantic symbols and operation definition of
preempt_count in a single file.
This is a dumb copy-paste merge. Further merge re-arrangments are
performed in a subsequent patch to ease review.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431441711-29753-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull tty/serial fixes from Greg KH:
"Here's some TTY and serial driver fixes for reported issues.
All of these have been in linux-next successfully"
* tag 'tty-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
pty: Fix input race when closing
tty/n_gsm.c: fix a memory leak when gsmtty is removed
Revert "serial/amba-pl011: Leave the TX IRQ alone when the UART is not open"
serial: omap: Fix error handling in probe
earlycon: Revert log warnings
Pull scheduler fixes from Ingo Molnar:
"Two fixes: a suspend/resume related regression fix, and an RT priority
boosting fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix regression in cpuset_cpu_inactive() for suspend
sched: Handle priority boosted tasks proper in setscheduler()
Pull drm fixes from Dave Airlie:
"Radeon:
one oops fix, one bug fix, one pci id addition patch
i915:
one suspend/resume regression fix.
All seems quiet enough."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: don't do mst probing if MST isn't enabled.
drm/radeon: add new bonaire pci id
drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling
drm/i915: Avoid GPU hang when coming out of s3 or s4
radeon minor fixes, and pci id addition.
* 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: don't do mst probing if MST isn't enabled.
drm/radeon: add new bonaire pci id
drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling
{u,g}id_valid call {u,g}id_eq, which calls __k{u,g}id_val on both
arguments and compares. With !CONFIG_MULTIUSER, __k{u,g}id_val return a
constant 0, which makes {u,g}id_valid always return false. Change
{u,g}id_valid to compare their argument against -1 instead. That produces
identical results in the normal CONFIG_MULTIUSER=y case, but with
!CONFIG_MULTIUSER will make {u,g}id_valid constant-fold into "return
true;" rather than "return false;".
This fixes uses of devpts without CONFIG_MULTIUSER.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>,
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not all kmem allocations should be accounted to memcg. The following
patch gives an example when accounting of a certain type of allocations to
memcg can effectively result in a memory leak. This patch adds the
__GFP_NOACCOUNT flag which if passed to kmalloc and friends will force the
allocation to go through the root cgroup. It will be used by the next
patch.
Note, since in case of kmemleak enabled each kmalloc implies yet another
allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to
gfp_kmemleak_mask.
Alternatively, we could introduce a per kmem cache flag disabling
accounting for all allocations of a particular kind, but (a) we would not
be able to bypass accounting for kmalloc then and (b) a kmem cache with
this flag set could not be merged with a kmem cache without this flag,
which would increase the number of global caches and therefore
fragmentation even if the memory cgroup controller is not used.
Despite its generic name, currently __GFP_NOACCOUNT disables accounting
only for kmem allocations while user page allocations are always charged.
To catch abusing of this flag, a warning is issued on an attempt of
passing it to mem_cgroup_try_charge.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Handle max TX power properly wrt VIFs and the MAC in iwlwifi, from
Avri Altman.
2) Use the correct FW API for scan completions in iwlwifi, from Avraham
Stern.
3) FW monitor in iwlwifi accidently uses unmapped memory, fix from Liad
Kaufman.
4) rhashtable conversion of mac80211 station table was buggy, the
virtual interface was not taken into account. Fix from Johannes
Berg.
5) Fix deadlock in rtlwifi by not using a zero timeout for
usb_control_msg(), from Larry Finger.
6) Update reordering state before calculating loss detection, from
Yuchung Cheng.
7) Fix off by one in bluetooth firmward parsing, from Dan Carpenter.
8) Fix extended frame handling in xiling_can driver, from Jeppe
Ledet-Pedersen.
9) Fix CODEL packet scheduler behavior in the presence of TSO packets,
from Eric Dumazet.
10) Fix NAPI budget testing in fm10k driver, from Alexander Duyck.
11) macvlan needs to propagate promisc settings down the the lower
device, from Vlad Yasevich.
12) igb driver can oops when changing number of rings, from Toshiaki
Makita.
13) Source specific default routes not handled properly in ipv6, from
Markus Stenberg.
14) Use after free in tc_ctl_tfilter(), from WANG Cong.
15) Use softirq spinlocking in netxen driver, from Tony Camuso.
16) Two ARM bpf JIT fixes from Nicolas Schichan.
17) Handle MSG_DONTWAIT properly in ring based AF_PACKET sends, from
Mathias Kretschmer.
18) Fix x86 bpf JIT implementation of FROM_{BE16,LE16,LE32}, from Alexei
Starovoitov.
19) ll_temac driver DMA maps TX packet header with incorrect length, fix
from Michal Simek.
20) We removed pm_qos bits from netdevice.h, but some indirect
references remained. Kill them. From David Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
net: Remove remaining remnants of pm_qos from netdevice.h
e1000e: Add pm_qos header
net: phy: micrel: Fix regression in kszphy_probe
net: ll_temac: Fix DMA map size bug
x86: bpf_jit: fix FROM_BE16 and FROM_LE16/32 instructions
netns: return RTM_NEWNSID instead of RTM_GETNSID on a get
Update be2net maintainers' email addresses
net_sched: gred: use correct backlog value in WRED mode
pppoe: drop pppoe device in pppoe_unbind_sock_work
net: qca_spi: Fix possible race during probe
net: mdio-gpio: Allow for unspecified bus id
af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT).
bnx2x: limit fw delay in kdump to 5s after boot
ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits.
ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K intruction.
mpls: Change reserved label names to be consistent with netbsd
usbnet: avoid integer overflow in start_xmit
netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2)
net: xgene_enet: Set hardware dependency
net: amd-xgbe: Add hardware dependency
...
Commit e2c6544829 removed pm_qos from struct net_device but left the
comment and header file. Remove those.
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull libata fixes from Tejun Heo:
"Rather big for fixes pull.
- SCC controllers never lived to see the light of the day. Both
libata and ide drivers removed.
- In some configurations, link power management policy changes
sometimes cause delayed spurious PHY events which can develop into
noticeable failures. This has been reported several times over the
years. Gabriele's patches suppress PHY events for a while after
LPM policy changes which should help most of these failures without
causing too much problem for hotplug use cases.
- A few controller specific fixes"
[ Hmm. I don't think removing SSC support is really a "fix", but hey, it
removes a lot of lines of code. Which I like. So ... good riddance ]
* 'for-4.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: avoton port-disable reset-quirk
ata: select DW_DMAC in case of SATA_DWC
libata: Blacklist queued TRIM on all Samsung 800-series
libata: Ignore spurious PHY event on LPM policy change
libata: Add helper to determine when PHY events should be ignored
ata: ahci_st: fixup layering violations / drvdata errors
Remove celleb-only SCC PATA drivers
A read() from a pty master may mistakenly indicate EOF (errno == -EIO)
after the pty slave has closed, even though input data remains to be read.
For example,
pty slave | input worker | pty master
| |
| | n_tty_read()
pty_write() | | input avail? no
add data | | sleep
schedule worker --->| | .
|---> flush_to_ldisc() | .
pty_close() | fill read buffer | .
wait for worker | wakeup reader --->| .
| read buffer full? |---> input avail ? yes
|<--- yes - exit worker | copy 4096 bytes to user
TTY_OTHER_CLOSED <---| |<--- kick worker
| |
**** New read() before worker starts ****
| | n_tty_read()
| | input avail? no
| | TTY_OTHER_CLOSED? yes
| | return -EIO
Several conditions are required to trigger this race:
1. the ldisc read buffer must become full so the input worker exits
2. the read() count parameter must be >= 4096 so the ldisc read buffer
is empty
3. the subsequent read() occurs before the kicked worker has processed
more input
However, the underlying cause of the race is that data is pipelined, while
tty state is not; ie., data already written by the pty slave end is not
yet visible to the pty master end, but state changes by the pty slave end
are visible to the pty master end immediately.
Pipeline the TTY_OTHER_CLOSED state through input worker to the reader.
1. Introduce TTY_OTHER_DONE which is set by the input worker when
TTY_OTHER_CLOSED is set and either the input buffers are flushed or
input processing has completed. Readers/polls are woken when
TTY_OTHER_DONE is set.
2. Reader/poll checks TTY_OTHER_DONE instead of TTY_OTHER_CLOSED.
3. A new input worker is started from pty_close() after setting
TTY_OTHER_CLOSED, which ensures the TTY_OTHER_DONE state will be
set if the last input worker is already finished (or just about to
exit).
Remove tty_flush_to_ldisc(); no in-tree callers.
Fixes: 52bce7f8d4 ("pty, n_tty: Simplify input processing on final close")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96311
BugLink: http://bugs.launchpad.net/bugs/1429756
Cc: <stable@vger.kernel.org> # 3.19+
Reported-by: Andy Whitcroft <apw@canonical.com>
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since these are now visible to userspace it is nice to be consistent
with BSD (sys/netmpls/mpls.h in netBSD).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull irq updates from Thomas Gleixner:
"Two patches from the irq departement:
- a simple fix to make dummy_irq_chip usable for wakeup scenarios
- removal of the gic arch_extn hackery. Now that all users are
converted we really want to get rid of the interface so people wont
come up with new use cases"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: gic: Drop support for gic_arch_extn
genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
Pull block fixes from Jens Axboe:
"A collection of fixes since the merge window;
- fix for a double elevator module release, from Chao Yu. Ancient bug.
- the splice() MORE flag fix from Christophe Leroy.
- a fix for NVMe, fixing a patch that went in in the merge window.
From Keith.
- two fixes for blk-mq CPU hotplug handling, from Ming Lei.
- bdi vs blockdev lifetime fix from Neil Brown, fixing and oops in md.
- two blk-mq fixes from Shaohua, fixing a race on queue stop and a
bad merge issue with FUA writes.
- division-by-zero fix for writeback from Tejun.
- a block bounce page accounting fix, making sure we inc/dec after
bouncing so that pre/post IO pages match up. From Wang YanQing"
* 'for-linus' of git://git.kernel.dk/linux-block:
splice: sendfile() at once fails for big files
blk-mq: don't lose requests if a stopped queue restarts
blk-mq: fix FUA request hang
block: destroy bdi before blockdev is unregistered.
block:bounce: fix call inc_|dec_zone_page_state on different pages confuse value of NR_BOUNCE
elevator: fix double release of elevator module
writeback: use |1 instead of +1 to protect against div by zero
blk-mq: fix CPU hotplug handling
blk-mq: fix race between timeout and CPU hotplug
NVMe: Fix VPD B0 max sectors translation
Pull tracing fix from Steven Rostedt:
"The newly added ftrace_print_array_seq() function had a bug in it.
Luckily, the only user of it didn't make the 4.1 merge window.
But the helper function should be fixed before 4.2 when the users
start coming in"
* tag 'trace-fixes-v4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Make ftrace_print_array_seq compute buf_len
This is useful for locking primitives that can effect multiple
wakeups per operation and want to avoid lock internal lock contention
by delaying the wakeups until we've released the lock internal locks.
Alternatively it can be used to avoid issuing multiple wakeups, and
thus save a few cycles, in packet processing. Queue all target tasks
and wakeup once you've processed all packets. That way you avoid
waking the target task multiple times if there were multiple packets
for the same task.
Properties of a wake_q are:
- Lockless, as queue head must reside on the stack.
- Being a queue, maintains wakeup order passed by the callers. This can
be important for otherwise, in scenarios where highly contended locks
could affect any reliance on lock fairness.
- A queued task cannot be added again until it is woken up.
This patch adds the needed infrastructure into the scheduler code
and uses the new wake_list to delay the futex wakeups until
after we've released the hash bucket locks.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[tweaks, adjustments, comments, etc.]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Mason <clm@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: George Spelvin <linux@horizon.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1430494072-30283-2-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While running a database workload, we found a scalability issue with itimers.
Much of the problem was caused by the thread_group_cputimer spinlock.
Each time we account for group system/user time, we need to obtain a
thread_group_cputimer's spinlock to update the timers. On larger systems
(such as a 16 socket machine), this caused more than 30% of total time
spent trying to obtain this kernel lock to update these group timer stats.
This patch converts the timers to 64-bit atomic variables and use
atomic add to update them without a lock. With this patch, the percent
of total time spent updating thread group cputimer timers was reduced
from 30% down to less than 1%.
Note: On 32-bit systems using the generic 64-bit atomics, this causes
sample_group_cputimer() to take locks 3 times instead of just 1 time.
However, we tested this patch on a 32-bit system ARM system using the
generic atomics and did not find the overhead to be much of an issue.
An explanation for why this isn't an issue is that 32-bit systems usually
have small numbers of CPUs, and cacheline contention from extra spinlocks
called periodically is not really apparent on smaller systems.
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1430251224-5764-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I could not find the loadavg code.. turns out it was hidden in a file
called proc.c. It further got mingled up with the cruft per rq load
indexes (which we really want to get rid of).
Move the per rq load indexes into the fair.c load-balance code (that's
the only thing that uses them) and rename proc.c to loadavg.c so we
can find it again.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Did minor cleanups to the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ronny reported that the following scenario is not handled correctly:
T1 (prio = 10)
lock(rtmutex);
T2 (prio = 20)
lock(rtmutex)
boost T1
T1 (prio = 20)
sys_set_scheduler(prio = 30)
T1 prio = 30
....
sys_set_scheduler(prio = 10)
T1 prio = 30
The last step is wrong as T1 should now be back at prio 20.
Commit c365c292d0 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.
Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.
Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Fixes: c365c292d0 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull infiniband updates from Doug Ledford:
"Minor updates for 4.1-rc
Most of the changes are fairly small and well confined. The iWARP
address reporting changes are the only ones that are a medium size. I
had these queued up prior to rc1, but due to the shuffle in
maintainers, they did not get submitted when I expected. My apologies
for that. I feel comfortable with them however due to the testing
they've received, so I left them in this submission"
* tag 'for-linus' of git://github.com/dledford/linux:
MAINTAINERS: Update InfiniBand subsystem maintainer
MAINTAINERS: add include/rdma/ to InfiniBand subsystem
IPoIB/CM: Fix indentation level
iw_cxgb4: Remove negative advice dmesg warnings
IB/core: Fix unaligned accesses
IB/core: change rdma_gid2ip into void function as it always return zero
IB/qib: use arch_phys_wc_add()
IB/qib: add acounting for MTRR
IB/core: dma unmap optimizations
IB/core: dma map/unmap locking optimizations
RDMA/cxgb4: Report the actual address of the remote connecting peer
RDMA/nes: Report the actual address of the remote connecting peer
RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients
iw_cxgb4: enforce qp/cq id requirements
iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs
iw_cxgb4: 32b platform fixes
iw_cxgb4: Cleanup register defines/MACROS
RDMA/CMA: Canonize IPv4 on IPV6 sockets properly
Pull xen bug fixes from David Vrabel:
- fix blkback regression if using persistent grants
- fix various event channel related suspend/resume bugs
- fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS
- SWIOTLB on ARM now uses frames <4 GiB (if available) so device only
capable of 32-bit DMA work.
* tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests
xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq()
xen/console: Update console event channel on resume
xen/xenbus: Update xenbus event channel on resume
xen/events: Clear cpu_evtchn_mask before resuming
xen-pciback: Add name prefix to global 'permissive' variable
xen: Suspend ticks on all CPUs during suspend
xen/grant: introduce func gnttab_unmap_refs_sync()
xen/blkback: safely unmap purge persistent grants
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
PMU driver hardware-enablement addition"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Fix segfault if passed with ''.
perf report: Fix -T/--threads option to work again
perf bench numa: Fix immediate meeting of convergence condition
perf bench numa: Fixes of --quiet argument
perf bench futex: Fix hung wakeup tasks after requeueing
perf probe: Fix bug with global variables handling
perf top: Fix a segfault when kernel map is restricted.
tools lib traceevent: Fix build failure on 32-bit arch
perf kmem: Fix compiles on RHEL6/OL6
tools lib api: Undefine _FORTIFY_SOURCE before setting it
perf kmem: Consistently use PRIu64 for printing u64 values
perf trace: Disable events and drain events when forked workload ends
perf trace: Enable events when doing system wide tracing and starting a workload
perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
The range check for b-tree level parameter in nilfs_btree_root_broken()
is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
though the level is limited to values in the range of 0 to
(NILFS_BTREE_LEVEL_MAX - 1).
Since the level parameter is read from storage device and used to index
nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
can cause memory overrun during btree operations if the boundary value
is set to the level parameter on device.
This fixes the broken sanity check and adds a comment to clarify that
the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using the new find_closest() macro can result in the following sparse
warnings.
drivers/hwmon/lm85.c:194:16: warning:
incorrect type in initializer (different modifiers)
drivers/hwmon/lm85.c:194:16: expected int *__fc_a
drivers/hwmon/lm85.c:194:16: got int static const [toplevel] *<noident>
drivers/hwmon/lm85.c:210:16: warning:
incorrect type in initializer (different modifiers)
drivers/hwmon/lm85.c:210:16: expected int *__fc_a
drivers/hwmon/lm85.c:210:16: got int const *map
This is because the array passed to find_closest() will typically be
declared as array of constants, but the macro declares a non-constant
pointer to it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move to include/uapi/linux/mpls.h to be externally visibile.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Addresses the following kernel logs seen during boot of sparc systems:
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull crypto fixes from Herbert Xu:
"This fixes a build problem with bcm63xx and yet another fix to the
memzero_explicit function to ensure that the memset is not elided"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: bcm63xx - Fix driver compilation
lib: make memzero_explicit more robust against dead store elimination