hrtimers callbacks are always done from hardirq context, either the
jiffy tick interrupt or the hrtimer device interrupt.
[ there is currently one exception that can still call a hrtimer
callback from softirq, but even in that case this will still
work correctly. ]
Reported-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yury Polyanskiy <ypolyans@princeton.edu>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <1265120401.24455.306.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently the omap serial clocks are autoidled after 5 seconds.
However, this causes lost characters on the serial ports. As this
is considered non-standard behaviour for Linux, disable the timeout.
Note that this will also cause blocking of any deeper omap sleep
states.
To enable the autoidling of the serial ports, do something like
this for each serial port:
# echo 5 > /sys/devices/platform/serial8250.0/sleep_timeout
# echo 5 > /sys/devices/platform/serial8250.1/sleep_timeout
...
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
I have found an access to already released memory in
clk_debugfs_register_one() function.
Signed-off-by: Marek Skuczynski <mareksk7@gmail.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
David Binderman ran the sourceforge tool cppcheck over the source code of the
new Linux kernel 2.6.33-rc6:
[./arm/mach-omap2/mux.c:492]: (error) Buffer access out-of-bounds
13 characters + 1 digit + 1 zero byte is more than 14 characters.
Also add a comment on mode0 name length in case new omaps
start using longer names.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
3630 has more mux signals than 34xx. The additional pins
exist in omap36xx_cbp_subset, but are not initialized
as the superset is missing these offsets. This causes
the following errors during the boot:
mux: Unknown entry offset 0x236
mux: Unknown entry offset 0x22e
mux: Unknown entry offset 0x1ec
mux: Unknown entry offset 0x1ee
mux: Unknown entry offset 0x1f4
mux: Unknown entry offset 0x1f6
mux: Unknown entry offset 0x1f8
mux: Unknown entry offset 0x1fa
mux: Unknown entry offset 0x1fc
mux: Unknown entry offset 0x22a
mux: Unknown entry offset 0x226
mux: Unknown entry offset 0x230
mux: Unknown entry offset 0x22c
mux: Unknown entry offset 0x228
Fix this by adding the missing offsets to omap3 superset.
Note that additionally the uninitialized pins need to be
skipped on 34xx.
Based on an earlier patch by Allen Pais <allen.pais@ti.com>.
Reported-by: Allen Pais <allen.pais@ti.com>
Signed-off-by: Allen Pais <allen.pais@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Ensure valid clock pointer during GPMC init. Fixes compiler
warning about potential use of uninitialized variable.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Ensure valid base address during IRQ init. Fixes compiler warning
about potential use of uninitialized variable.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
OMAP platforms(like OMAP3530) include DSP or other co-processors
for media acceleration. when carving out memory for the
accelerators we can end up creating a hole in the memory map
of sort:
<kernel memory><hole(memory for accelerator)><kernel memory>
To handle such a memory configuration ARCH_HAS_HOLES_MEMORYMODEL
has to be enabled. For further information refer discussion at:
http://www.mail-archive.com/linux-omap@vger.kernel.org/msg15262.html.
Signed-off-by: Sriramakrishnan <srk@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The WARN_ON in lookup_pi_state which complains about a mismatch
between pi_state->owner->pid and the pid which we retrieved from the
user space futex is completely bogus.
The code just emits the warning and then continues despite the fact
that it detected an inconsistent state of the futex. A conveniant way
for user space to spam the syslog.
Replace the WARN_ON by a consistency check. If the values do not match
return -EINVAL and let user space deal with the mess it created.
This also fixes the missing task_pid_vnr() when we compare the
pi_state->owner pid with the futex value.
Reported-by: Jermome Marchand <jmarchan@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
If the owner of a PI futex dies we fix up the pi_state and set
pi_state->owner to NULL. When a malicious or just sloppy programmed
user space application sets the futex value to 0 e.g. by calling
pthread_mutex_init(), then the futex can be acquired again. A new
waiter manages to enqueue itself on the pi_state w/o damage, but on
unlock the kernel dereferences pi_state->owner and oopses.
Prevent this by checking pi_state->owner in the unlock path. If
pi_state->owner is not current we know that user space manipulated the
futex value. Ignore the mess and return -EINVAL.
This catches the above case and also the case where a task hijacks the
futex by setting the tid value and then tries to unlock it.
Reported-by: Jermome Marchand <jmarchan@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
This fixes a futex key reference count bug in futex_lock_pi(),
where a key's reference count is incremented twice but decremented
only once, causing the backing object to not be released.
If the futex is created in a temporary file in an ext3 file system,
this bug causes the file's inode to become an "undead" orphan,
which causes an oops from a BUG_ON() in ext3_put_super() when the
file system is unmounted. glibc's test suite is known to trigger this,
see <http://bugzilla.kernel.org/show_bug.cgi?id=14256>.
The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's
38d47c1b70 "[PATCH] futex: rely on
get_user_pages() for shared futexes". That commit made get_futex_key()
also increment the reference count of the futex key, and updated its
callers to decrement the key's reference count before returning.
Unfortunately the normal exit path in futex_lock_pi() wasn't corrected:
the reference count is incremented by get_futex_key() and queue_lock(),
but the normal exit path only decrements once, via unqueue_me_pi().
The fix is to put_futex_key() after unqueue_me_pi(), since 2.6.31
this is easily done by 'goto out_put_key' rather than 'goto out'.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
If the NFS_ATTR_FATTR_TYPE field isn't set in fattr->valid, then we should
not set the S_IFMT part of inode->i_mode.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we unregister the bdi before kill_anon_super() calls
ida_remove() on our device name.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail.
Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs.
Since the NFS code assumes that the page stays mapped for as long as the
writeback is active, we can end up Oopsing (among other things).
The only safe fix here is to convert nfs_wait_on_request(), so as to make
it uninterruptible (as is already the case with wait_on_page_writeback()).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Interrupts must be disabled while an interrupt state restore
(prep for interrupt return) is in progress.
Code to do this was lost in the port to the mainline kernel.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original set of waiting
code to wait for them.
This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch adds a wait on umount between the point at which we
dispose of all glocks and the point at which we unmount the
lock protocol. This ensures that we've received all the replies
to our unlock requests before we stop the locking.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
During upconvert, if the master were to send a BAST, dlmglue will detect the
upconversion in process and send a cancel convert to the master. Upon receiving
the AST for the cancel convert, it will re-process the lock resource to determine
whether it needs downconverting. Say, the up was from PR to EX and the BAST was
for EX. After the cancel convert, it will need to downconvert to NL.
However, if the node was originally upconverting from NL to EX, then there would
be no reason to downconvert (assuming the same message sequence).
This patch makes dlmglue consider the possibility that the current lock level
is already compatible and that downconverting is not required.
Joel Becker <joel.becker@oracle.com> assisted in fixing this issue.
Fixes ossbz#1178
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178
Reported-by: Coly Li <coly.li@suse.de>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
There is possibility of a livelock in __ocfs2_cluster_lock(). If a node were
to get an ast for an upconvert request, followed immediately by a bast,
there is a small window where the fs may downconvert the lock before the
process requesting the upconvert is able to take the lock.
This patch adds a new flag to indicate that the upconvert is still in
progress and that the dc thread should not downconvert it right now.
Wengang Wang <wen.gang.wang@oracle.com> and Joel Becker
<joel.becker@oracle.com> contributed heavily to this patch.
Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
This patch adds 2 fields to the ibm_architecture_vec array.
The first of these fields indicates the number of cores which Linux can
boot. It does not account for SMT, so it may result in cpus assigned to
Linux which cannot be booted. A second patch follows that dynamically
updates this for SMT.
The second field just indicates that our OS is Linux, and not another
OS. The system may or may not use this hint to performance tune
settings for Linux.
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We can free memory allocated with lmb_alloc() by removing it from the
list of reserved LMBs. Rework lmb_remove() to allow that possibility
and add lmb_free() which exploits it.
BenH: Removed some useless parenthesis
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With dynamic irq descriptors the overhead of a large NR_IRQS is much lower
than it used to be. With more MSI-X capable adapters and drivers exploiting
multiple vectors we may as well allow the user to increase it beyond the
current maximum of 512.
32768 seems large enough that we'd never have to bump it again (although I bet
my prediction is horribly wrong). It boot tests OK and the vmlinux footprint
increase is only around 500kB due to:
struct irq_map_entry irq_map[NR_IRQS];
We format /proc/interrupts correctly with the previous changes:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
286: 0 0 0 0 0 0
516: 0 0 0 0 0 0
16689: 1833 0 0 0 0 0
17157: 0 0 0 0 0 0
17158: 319 0 0 0 0 0
25092: 0 0 0 0 0 0
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some code that is in ams_exit() (the module exit code) should instead
be called when the device (not module) is removed. It probably doesn't
make much of a difference in the PMU case, but in the I2C case it does
matter.
I make no guarantee that my fix isn't racy, I'm not familiar enough
with the ams driver code to tell for sure.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stelian Pop <stelian@popies.net>
Cc: Michael Hanselmann <linux-kernel@hansmi.ch>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Looking at drivers/macintosh/therm_adt746x.c, the sysfs files are
created in thermostat_init() and removed in thermostat_exit(), which
are the driver's init and exit functions. These files are backed-up by
a per-device structure, so it looks like the wrong thing to do: the
sysfs files have a lifetime longer than the data structure that is
backing it up.
I think that sysfs files creation should be moved to the end of
probe_thermostat() and sysfs files removal should be moved to the
beginning of remove_thermostat().
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Colin Leroy <colin@colino.net>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Virtio consoles can be hotplugged, so hvc_alloc gets called from
multiple sites: from the initial probe() routine as well as later on
from workqueue handlers which aren't __devinit code.
So, drop the __devinit annotation for hvc_alloc.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recent U-Boot commit 5ccd29c3679b3669b0bde5c501c1aa0f325a7acb caused
the "cpu-release-addr" device tree property to contain the physical RAM
location that secondary cores were spinning at. Previously, the
"cpu-release-addr" property contained a value referencing the boot page
translation address range of 0xfffffxxx, which then indirectly accessed
RAM.
The "cpu-release-addr" is currently ioremapped and the secondary cores
kicked. However, due to the recent change in "cpu-release-addr", it
sometimes points to a memory location in low memory that cannot be
ioremapped. For example on a P2020-based board with 512MB of RAM the
following error occurs on bootup:
<...>
mpic: requesting IPIs ...
__ioremap(): phys addr 0x1ffff000 is RAM lr c05df9a0
Unable to handle kernel paging request for data at address 0x00000014
Faulting instruction address: 0xc05df9b0
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 P2020 RDB
Modules linked in:
<... eventual kernel panic>
Adding logic to conditionally ioremap or access memory directly resolves
the issue.
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Nate Case <ncase@xes-inc.com>
Reported-by: Dipen Dudhat <B09055@freescale.com>
Tested-by: Dipen Dudhat <B09055@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Using perf to trace L1 dcache misses and dumping data addresses I found a few
variables taking a lot of misses. Since they are almost never written, they
should go into the __read_mostly section.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The cputime code has a few places that do per_cpu(, smp_processor_id()).
Replace them with __get_cpu_var().
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Although we use u64 to pass userspace pointers to the kernel
to avoid compat_ioctl, it doesn't work in some ppc platform.
So wrap them with compat_ptr and add compat_ioctl.
The detailed discussion about compat_ptr can be found in thread
http://lkml.org/lkml/2009/10/27/423.
We indeed met with a bug when testing on ppc(-EFAULT is returned
when using old_path). This patch try to fix this.
I have tested in ppc64(with 32 bit reflink) and x86_64(with i686
reflink), both works.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Mainline commit aad1b15310 made the
dlm_begin_reco_handler() return -EAGAIN instead of EAGAIN.
As this error is transmitted over the wire, we want the receiver,
dlm_send_begin_reco_message(), to understand both the older EAGAIN and
the newer -EAGAIN, to allow rolling upgrade of the cluster nodes.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In CoW, we have to make sure that the page is already written
out to the disk. So we have a BUG_ON(PageDirty(page)).
In ppc platform we have pagesize=64K, so if the cs=4K, if the
file have fragmented clusters, we will map the page many times.
See this file as an example.
Tree Depth: 0 Count: 19 Next Free Rec: 14
## Offset Clusters Block# Flags
0 0 4 2164864 0x2 Refcounted
1 4 2 9302792 0x2 Refcounted
...
We have to replace the extent recs one by one, so the page with index 0
will be mapped and dirtied twice.
I'd like to leave the BUG_ON there while adding a check so that in
case we meet with an error in other platforms, we can find it easily.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In ocfs2_duplicate_clusters_by_page, we calculate map_end
by shifting page_index. But actually in case we meet with
a large offset(say in a i686 box, poff_t is only 32 bits
and page_index=2056240), we will overflow. So change the
type of page_index to loff_t.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
connector: Delete buggy notification code.
be2net: use eq-id to calculate cev-isr reg offset
Bluetooth: Use the control channel for raw HID reports
Bluetooth: Add DFU driver for Atheros Bluetooth chipset AR3011
Bluetooth: Redo checks in IRQ handler for shared IRQ support
Bluetooth: Fix memory leak in L2CAP
Bluetooth: Remove double free of SKB pointer in L2CAP
cdc_ether: Partially revert "usbnet: Set link down initially ..."
be2net: Fix memset() arg ordering.
bonding: bond_open error return value
ixgbe: if ixgbe_copy_dcb_cfg is going to fail learn about it early
ixgbe: set the correct DCB bit for pg tx settings
igbvf: fix issue w/ mapped_as_page being left set after unmap
drivers/net: ks8851_mll ethernet network driver
be2net: Bug fix to support newer generation of BE ASIC
starfire: clean up properly if firmware loading fails
mac80211: fix NULL pointer dereference when ftrace is enabled
netfilter: ctnetlink: fix expectation mask dump
ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure
ath9k: fix eeprom INI values override for 2GHz-only cards
...
This is the counterpart to cba767175b
("pktcdvd: remove broken dev_t export of class devices"). Device is not
registered using dev_t, so it should not be destroyed using device_destroy
which looks up the device by dev_t. This will fail and adding the device
again will fail with the "duplicate name" error. This is fixed using
device_unregister instead of device_destroy.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>