Kill the declarations in the header file and mark them as static.
Reshuffle a few functions to ensure that everything is properly
declared before being used.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Do the conversion/copy up front instead of passing in a compat flag
to the ioctl handler and subsequently to the exec_drive_taskfile()
function.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
virtio-blk: use ida to allocate disk index
hpsa: add small delay when using PCI Power Management to reset for kump
cciss: add small delay when using PCI Power Management to reset for kump
xen/blkback: Fix two races in the handling of barrier requests.
xen/blkback: Check for proper operation.
xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
xen/blkback: Report VBD_WSECT (wr_sect) properly.
xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
xen-blkfront: plug device number leak in xlblk_init() error path
xen-blkfront: If no barrier or flush is supported, use invalid operation.
xen-blkback: use kzalloc() in favor of kmalloc()+memset()
xen-blkback: fixed indentation and comments
xen-blkfront: fix a deadlock while handling discard response
xen-blkfront: Handle discard requests.
xen-blkback: Implement discard requests ('feature-discard')
xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
drivers/block/loop.c: emit uevent on auto release
drivers/block/cpqarray.c: use pci_dev->revision
loop: always allow userspace partitions and optionally support automatic scanning
...
Fic up trivial header file includsion conflict in drivers/block/loop.c
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
block: don't call blk_drain_queue() if elevator is not up
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blk-throttle: Free up policy node associated with deleted rule
block: warn if tag is greater than real_max_depth.
block: make gendisk hold a reference to its queue
blk-flush: move the queue kick into
blk-flush: fix invalid BUG_ON in blk_insert_flush
block: Remove the control of complete cpu from bio.
block: fix a typo in the blk-cgroup.h file
block: initialize the bounce pool if high memory may be added later
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
block: drop @tsk from attempt_plug_merge() and explain sync rules
block: make get_request[_wait]() fail if queue is dead
block: reorganize throtl_get_tg() and blk_throtl_bio()
block: reorganize queue draining
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
block: move blk_throtl prototypes to block/blk.h
block: fix genhd refcounting in blkio_policy_parse_and_set()
...
Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
- drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
- drivers/staging/zram/zram_drv.c
The doorbell stride allows devices to spread out their doorbells instead
of packing them tightly. This feature was added as part of ECN 003.
This patch also enables support for more than 512 queues :-)
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
ECN 001 documented that namespace 0 is not valid. Sending an Identify
with CNS of 0 and Namespace of 0 is an undefined command.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The existing calculation underestimated the number of pages required
as it did not take into account the pointer at the end of each page.
The replacement calculation may overestimate the number of pages required
if the last page in the PRP List is entirely full. By using ->npages
as a counter as we fill in the pages, we ensure that we don't try to
free a page that was never allocated.
Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Instead of open-coding calls to nvme_submit_admin_cmd, these
small wrappers are simpler to use (the patch removes 14 lines from
nvme_dev_add() for example).
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
dma_unmap_sg() must be called with the same 'nents' passed to
dma_map_sg(), not the number returned from dma_map_sg().
Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Our SG list was constructed to always fill the entire first page, even
if that was more than the length of the I/O. This is probably harmless,
but some IOMMUs might do something bad.
Correcting the first call to sg_set_page() made it look a lot closer to
the sg_set_page() in the loop, so fold the first call to sg_set_page()
into the loop.
Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE
and ACTIVATE_FIRMWARE commands. Replace them with a generic ADMIN_CMD
ioctl that can submit any admin command.
Add a new ID ioctl that returns the namespace ID of the queried device.
It corresponds to the SCSI Idlun ioctl.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
If the I/O was not completed by a single NVMe command, we add the
bio to the congestion list and wake up the kthread to resubmit it.
But the kthread calls remove_wait_queue() unconditionally, which
will oops if it's not on the wait queue. So add the kthread to
the wait queue before waking it up.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
nvme_setup_io_queues() was assuming that a NULL return from
nvme_create_queue() was an out-of-memory error. That's not necessarily
true; the adapter might return -EIO, for example. Change the calling
convention to return an ERR_PTR on failure instead of NULL.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
For the benefit of reviewers, add comments to a few functions describing
their calling context
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
If any of the memory allocations in nvme_setup_prps fail, handle it by
modifying the passed-in data length to reflect the number of bytes we are
actually able to send. Also allow the caller to specify the GFP flags
they need; for user-initiated commands, we can use GFP_KERNEL allocations.
The various callers are updated to handle this possibility; the main
I/O path is already prepared for this possibility (as it may happen
due to nvme_map_bio being unable to map all the segments of the I/O).
The other callers return -ENOMEM instead of doing partial I/Os.
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The current approach of using the namespace ID as the minor number
doesn't work when there are multiple adapters in the machine. Rather
than statically partitioning the number of namespaces between adapters,
dynamically allocate minor numbers to namespaces as they are detected.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The trailing '_data' on the end was annoying and inconsistent. Also, make
it actually return the data since this is needed for timing out commands.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
When an I/O completed with an error, we would call bio_endio twice
(once with -EIO and once with 0). Found by inspection.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
THe device reports (in its capability register) how long it will take
to initialise. If that time elapses before the ready bit becomes set,
conclude the device is broken and refuse to initialise it. Log a nice
error message so the user knows why we did nothing.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The arbitration field was extended by one bit, shifting the shutdown
notification bits by one. Also, the SQ/CQ entry size was made
configurable for future extensions.
Reported-by: Paul Luse <paul.e.luse@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The read and write commands don't define a 'result', so there's no need
to copy it back to userspace.
Remove the ability of the ioctl to submit commands to a different
namespace; it's just asking for trouble, and the use case I have in mind
will be addressed througha different ioctl in the future. That removes
the need for both the block_shift and nsid arguments.
Check that the opcode is one of 'read' or 'write'. Future opcodes may
be added in the future, but we will need a different structure definition
for them.
The nblocks field is redefined to be 0-based. This allows the user to
request the full 65536 blocks.
Don't byteswap the reftag, apptag and appmask. Martin Petersen tells
me these are calculated in big-endian and are transmitted to the device
in big-endian.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Make ioctls work for 32-bit applications on 64-bit kernels. The structures
are defined to be the same for both 32- and 64-bit applications, so
we can use the same handler for both.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Fill in all the num_possible_cpus() entries with duplicate pointers.
This reduces the complexity of the frequently-called get_nvmeq(), as
well as avoiding a bug in it when there are fewer queues than CPUs.
Reported-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Once there are no more bios on the congestion list, we can stop waking
up the nvme kthread every time a completion happens.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
If the last element in the PRP list fits on the end of the page, there's
no need to allocate an extra page to put that single element in. It can
fit on the end of the page.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The spec says this is a 0s based value. We don't need to handle the
maximal value because it's reserved to mean "every namespace".
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The head can never overrun the tail since we won't allocate enough command
IDs to let that happen. The status codes are in sync with the spec.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The spec says we're not allowed to completely fill the submission queue.
Solve this by reducing the number of allocatable cmdids by 1.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
When we submit subsequent portions of the I/O, we need to access the
updated block, not start reading again from the original position.
This was showing up as miscompares in the XFS randholes testcase.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
NVMe scatterlists must be virtually contiguous, like almost all I/Os.
However, when the filesystem lays out files with a hole, it can be that
adjacent LBAs map to non-adjacent virtual addresses. Handle this by
submitting one NVMe command at a time for each virtually discontiguous
range.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Linux implements Flush as a bit in the bio. That means there may also be
data associated with the flush; if so the flush should be sent before the
data. To avoid completing the bio twice, I add CMD_CTX_FLUSH to indicate
the completion routine should do nothing.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The value written to the doorbell needs to be the first free index in
the queue, not the most recently used index in the queue.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
If interrupts are misconfigured, the kthread will be needed to process
admin queue completions.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>