Commit Graph

402786 Commits

Author SHA1 Message Date
Florian Schmaus
9b4e9f5abb bcache: do not assign in if condition in bcache_device_init()
Fixes an error condition reported by checkpatch.pl which is caused by
assigning a variable in an if condition.

Signed-off-by: Florian Schmaus <flo@geekplace.eu>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27 09:15:46 -06:00
Florian Schmaus
16c1fdf4cf bcache: do not assign in if condition in bcache_init()
Fixes an error condition reported by checkpatch.pl which is caused by
assigning a variable in an if condition.

Signed-off-by: Florian Schmaus <flo@geekplace.eu>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27 09:15:46 -06:00
Shenghui Wang
6268dc2c47 bcache: free heap cache_set->flush_btree in bch_journal_free
Free the cache_set->flush_bree heap memory on journal free.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27 09:15:46 -06:00
Florian Schmaus
a56489d4b3 bcache: do not assign in if condition register_bcache()
Fixes an error condition reported by checkpatch.pl which is caused by
assigning a variable in an if condition.

Signed-off-by: Florian Schmaus <flo@geekplace.eu>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27 09:15:46 -06:00
Tang Junhui
94f71c1606 bcache: fix I/O significant decline while backend devices registering
I attached several backend devices in the same cache set, and produced lots
of dirty data by running small rand I/O writes in a long time, then I
continue run I/O in the others cached devices, and stopped a cached device,
after a mean while, I register the stopped device again, I see the running
I/O in the others cached devices dropped significantly, sometimes even
jumps to zero.

In currently code, bcache would traverse each keys and btree node to count
the dirty data under read locker, and the writes threads can not get the
btree write locker, and when there is a lot of keys and btree node in the
registering device, it would last several seconds, so the write I/Os in
others cached device are blocked and declined significantly.

In this patch, when a device registering to a ache set, which exist others
cached devices with running I/Os, we get the amount of dirty data of the
device in an incremental way, and do not block other cached devices all the
time.

Patch v2: Rename some variables and macros name as Coly suggested.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27 09:15:46 -06:00
Tang Junhui
7f4a59de28 bcache: calculate the number of incremental GC nodes according to the total of btree nodes
This patch base on "[PATCH] bcache: finish incremental GC".

Since incremental GC would stop 100ms when front side I/O comes, so when
there are many btree nodes, if GC only processes constant (100) nodes each
time, GC would last a long time, and the front I/Os would run out of the
buckets (since no new bucket can be allocated during GC), and I/Os be
blocked again.

So GC should not process constant nodes, but varied nodes according to the
number of btree nodes. In this patch, GC is divided into constant (100)
times, so when there are many btree nodes, GC can process more nodes each
time, otherwise GC will process less nodes each time (but no less than
MIN_GC_NODES).

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27 09:15:46 -06:00
Tang Junhui
5c25c4fc74 bcache: finish incremental GC
In GC thread, we record the latest GC key in gc_done, which is expected
to be used for incremental GC, but in currently code, we didn't realize
it. When GC runs, front side IO would be blocked until the GC over, it
would be a long time if there is a lot of btree nodes.

This patch realizes incremental GC, the main ideal is that, when there
are front side I/Os, after GC some nodes (100), we stop GC, release locker
of the btree node, and go to process the front side I/Os for some times
(100 ms), then go back to GC again.

By this patch, when we doing GC, I/Os are not blocked all the time, and
there is no obvious I/Os zero jump problem any more.

Patch v2: Rename some variables and macros name as Coly suggested.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27 09:15:46 -06:00
Tang Junhui
99a27d59bd bcache: simplify the calculation of the total amount of flash dirty data
Currently we calculate the total amount of flash only devices dirty data
by adding the dirty data of each flash only device under registering
locker. It is very inefficient.

In this patch, we add a member flash_dev_dirty_sectors in struct cache_set
to record the total amount of flash only devices dirty data in real time,
so we didn't need to calculate the total amount of dirty data any more.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27 09:15:46 -06:00
Greg Edwards
cdcdcaae84 scsi: virtio_scsi: fix pi_bytes{out,in} on 4 KiB block size devices
When the underlying device is a 4 KiB logical block size device with a
protection interval exponent of 0, i.e. 4096 bytes data + 8 bytes PI, the
driver miscalculates the pi_bytes{out,in} by a factor of 8x (64 bytes).

This leads to errors on all reads and writes on 4 KiB logical block size
devices when CONFIG_BLK_DEV_INTEGRITY is enabled and the
VIRTIO_SCSI_F_T10_PI feature bit has been negotiated.

Fixes: e6dc783a38 ("virtio-scsi: Enable DIF/DIX modes in SCSI host LLD")
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-26 15:49:43 -06:00
Juergen Gross
d3df0ac096 xen/blkfront: remove unused macros
Remove some macros not used anywhere.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-25 08:49:24 -06:00
Jens Axboe
eca53cb63f Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block
Pull NVMe updates from Christoph:

"Highlights:

 - massively improved tracepoints (Keith Busch)
 - support for larger inline data in the RDMA host and target
   (Steve Wise)
 - RDMA setup/teardown path fixes and refactor (Sagi Grimberg)
 - Command Supported and Effects log support for the NVMe target
   (Chaitanya Kulkarni)
 - buffered I/O support for the NVMe target (Chaitanya Kulkarni)

 plus the usual set of cleanups and small enhancements."

* 'nvme-4.19' of git://git.infradead.org/nvme:
  nvmet: don't use uuid_le type
  nvmet: check fileio lba range access boundaries
  nvmet: fix file discard return status
  nvme-rdma: centralize admin/io queue teardown sequence
  nvme-rdma: centralize controller setup sequence
  nvme-rdma: unquiesce queues when deleting the controller
  nvme-rdma: mark expected switch fall-through
  nvme: add disk name to trace events
  nvme: add controller name to trace events
  nvme: use hw qid in trace events
  nvme: cache struct nvme_ctrl reference to struct nvme_request
  nvmet-rdma: add an error flow for post_recv failures
  nvmet-rdma: add unlikely check in the fast path
  nvmet-rdma: support max(16KB, PAGE_SIZE) inline data
  nvme-rdma: support up to 4 segments of inline data
  nvmet: add buffered I/O support for file backed ns
  nvmet: add commands supported and effects log page
  nvme: move init of keep_alive work item to controller initialization
  nvme.h: resync with nvme-cli
2018-07-25 08:43:30 -06:00
Christoph Hellwig
3ed122e68b md: remove a bogus comment
The function name mentioned doesn't exist, and the code next to it
doesn't match the description either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-24 14:43:24 -06:00
Christoph Hellwig
c8b27acc77 bcache: don't clone bio in bch_data_verify
We immediately overwrite the biovec array, so instead just allocate
a new bio and copy over the disk, setor and size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-24 14:43:19 -06:00
Bart Van Assche
76f17d8ba1 block: Rename the null_blk_mod kernel module back into null_blk
Commit ca4b2a0119 ("null_blk: add zone support") breaks several
blktests scripts because it renamed the null_blk kernel module into
null_blk_mod. Hence rename null_blk_mod back into null_blk.

Fixes: ca4b2a0119 ("null_blk: add zone support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Matias Bjorling <matias.bjorling@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-24 09:54:36 -06:00
Andy Shevchenko
1b0d274523 nvmet: don't use uuid_le type
Don't use sizeof(uuid_le) where none of the parameters is type of uuid_le.
Since both arguments are u8 [16], use size of destination there.

Moreover, uuid_le is a deprecated type, and nvmet is using uuid_t
already.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:51 +02:00
Sagi Grimberg
9c891c1398 nvmet: check fileio lba range access boundaries
Fail out-of-bounds with a proper status code.

Fixes: d5eff33ee6 ("nvmet: add simple file backed ns support")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:51 +02:00
Sagi Grimberg
1b72b71fac nvmet: fix file discard return status
If nvmet_copy_from_sgl failed, we falsly return successful
completion status.

Fixes: d5eff33ee6 ("nvmet: add simple file backed ns support")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:50 +02:00
Sagi Grimberg
75862c7232 nvme-rdma: centralize admin/io queue teardown sequence
We follow the same queue teardown sequence in delete, reset and error
recovery. Centralize the logic.  This patch does not change any
functionality.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:50 +02:00
Sagi Grimberg
c66e2998c8 nvme-rdma: centralize controller setup sequence
Centralize controller sequence to a single routine that correctly cleans
up after failures instead of having multiple apperances in several flows
(create, reset, reconnect).

One thing that we also gain here are the sanity/boundary checks also
when connecting back to a dynamic controller.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:49 +02:00
Sagi Grimberg
90140624e8 nvme-rdma: unquiesce queues when deleting the controller
If the controller is going away, we need to unquiesce the IO queues so
that all pending request can fail gracefully before moving forward with
controller deletion. Do that before we destroy the IO queues so
blk_cleanup_queue won't block in freeze.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:49 +02:00
Gustavo A. R. Silva
249090f901 nvme-rdma: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:48 +02:00
Keith Busch
6268953e89 nvme: add disk name to trace events
This will print the disk name to the nvme event trace for io requests so
a user can better distinguish traffic to different disks. This can be used
to  create disk based filters. For example, to see only nvme0n2 traffic:

  echo "disk == \"nvme0n2\"" > /sys/kernel/debug/tracing/events/nvme/filter

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
[hch: turned __assign_disk_name into an inline function]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:48 +02:00
Keith Busch
b80a55e246 nvme: add controller name to trace events
This appends the controller instance to the nvme trace buffer to
distinguish which controller is dispatching and completing a command.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:45 +02:00
Keith Busch
5d87eb94d9 nvme: use hw qid in trace events
We can not match a command to its completion based on the command
id alone. We need the submitting queue identifier to pair with the
completion, so this patch adds that to the trace buffer.

This patch is also collapsing the admin and IO submission traces into a
single one so we don't need to duplicate this and creating unnecessary
code branches: we know if the command is an admin vs IO based on the qid.

And since we're here, the patch fixes code formatting in the area.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
[hch: move the qid helper to nvme.h and made it an inline function]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:19 +02:00
Sagi Grimberg
59e29ce66b nvme: cache struct nvme_ctrl reference to struct nvme_request
We will need to reference the controller in the setup and completion
time for tracing and future traffic based keep alive support.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:18 +02:00
Max Gurtovoy
202093848c nvmet-rdma: add an error flow for post_recv failures
Posting receive buffer operation can fail, thus we should make
sure to have an error flow during initialization phase. While
we're here, add a debug print in case of a failure.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:17 +02:00
Max Gurtovoy
2fc464e216 nvmet-rdma: add unlikely check in the fast path
ib_post_send operation should succeed unless something unusual
happened to the ib device.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:16 +02:00
Steve Wise
0d5ee2b2ab nvmet-rdma: support max(16KB, PAGE_SIZE) inline data
The patch enables inline data sizes using up to 4 recv sges, and capping
the size at 16KB or at least 1 page size.  So on a 4K page system, up to
16KB is supported, and for a 64K page system 1 page of 64KB is supported.

We avoid > 0 order page allocations for the inline buffers by using
multiple recv sges, one for each page.  If the device cannot support
the configured inline data size due to lack of enough recv sges, then
log a warning and reduce the inline size.

Add a new configfs port attribute, called param_inline_data_size,
to allow configuring the size of inline data for a given nvmf port.
The maximum size allowed is still enforced by nvmet-rdma with
NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE).
And the default size, if not specified via configfs, is still PAGE_SIZE.
This preserves the existing behavior, but allows larger inline sizes
for small page systems.  If the configured inline data size exceeds
NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is
reduced.  If param_inline_data_size is set to 0, then inline data is
disabled for that nvmf port.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:16 +02:00
Steve Wise
64a741c1ea nvme-rdma: support up to 4 segments of inline data
Allow up to 4 segments of inline data for NVMF WRITE operations. This
reduces latency for small WRITEs by removing the need for the target to
issue a READ WR for IB, or a REG_MR + READ WR chain for iWarp.

Also cap the inline segments used based on the limitations of the
device.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:15 +02:00
Chaitanya Kulkarni
55eb942eda nvmet: add buffered I/O support for file backed ns
Add a new "buffered_io" attribute, which disabled direct I/O and thus
enables page cache based caching when enabled.   The attribute can only
be changed when the namespace is disabled as the file has to be reopend
for the change to take effect.

The possibly blocking read/write are deferred to a newly introduced
global workqueue.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:14 +02:00
Chaitanya Kulkarni
0866bf0c37 nvmet: add commands supported and effects log page
This patch adds support for Commands Supported and Effects log page
(Log Identifier 05h) for NVMeOF. This also makes it easier to find
which commands are supported, e.g. :-

subnqn    : testnqn1
Admin Command Set
ACS2     [Get Log Page                    ] 00000001
ACS6     [Identify                        ] 00000001
ACS8     [Abort                           ] 00000001
ACS9     [Set Features                    ] 00000001
ACS10    [Get Features                    ] 00000001
ACS12    [Asynchronous Event Request      ] 00000001
ACS24    [Keep Alive                      ] 00000001

NVM Command Set
IOCS0    [Flush                           ] 00000001
IOCS1    [Write                           ] 00000001
IOCS2    [Read                            ] 00000001
IOCS8    [Write Zeroes                    ] 00000001
IOCS9    [Dataset Management              ] 00000001

This partticular functionality can be used from the host side to examine
the NVMeOF ctrl commands supported.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:13 +02:00
James Smart
230f1f9e04 nvme: move init of keep_alive work item to controller initialization
Currently, the code initializes the keep alive work item whenever
nvme_start_keep_alive() is called. However, this routine is called
several times while reconnecting, etc. Although it's hoped that keep
alive is always disabled and not scheduled when start is called,
re-initing if it were scheduled or completing can have very bad
side effects. There's no need for re-initialization.

Move the keep_alive work item and cmd struct initialization to
controller init.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:13 +02:00
Michael Callahan
ddcf35d397 block: Add and use op_stat_group() for indexing disk_stat fields.
Add and use a new op_stat_group() function for indexing partition stat
fields rather than indexing them by rq_data_dir() or bio_data_dir().
This function works similarly to op_is_sync() in that it takes the
request::cmd_flags or bio::bi_opf flags and determines which stats
should et updated.

In addition, the second parameter to generic_start_io_acct() and
generic_end_io_acct() is now a REQ_OP rather than simply a read or
write bit and it uses op_stat_group() on the parameter to determine
the stat group.

Note that the partition in_flight counts are not part of the per-cpu
statistics and as such are not indexed via this function.  It's now
indexed by op_is_write().

tj: Refreshed on top of v4.17.  Updated to pass around REQ_OP.

Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-18 08:44:20 -06:00
Michael Callahan
59767fbd49 block: Add part_stat_read_accum to read across field entries.
Add a part_stat_read_accum macro to genhd.h to read and sum across
field entries.  For example to sum up the number read and write
sectors completed.  In addition to being ar reasonable cleanup by
itself this will make it easier to add new stat fields in the future.

tj: Refreshed on top of v4.17.

Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-18 08:44:16 -06:00
Tejun Heo
3f289dcb4b block: make bdev_ops->rw_page() take a REQ_OP instead of bool
c11f0c0b5b ("block/mm: make bdev_ops->rw_page() take a bool for
read/write") replaced @op with boolean @is_write, which limited the
amount of information going into ->rw_page() and more importantly
page_endio(), which removed the need to expose block internals to mm.

Unfortunately, we want to track discards separately and @is_write
isn't enough information.  This patch updates bdev_ops->rw_page() to
take REQ_OP instead but leaves page_endio() to take bool @is_write.
This allows the block part of operations to have enough information
while not leaking it to mm.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-18 08:44:14 -06:00
RAGHU Halharvi
ada94973f1 pktcdvd: remove assignment in if condition
* Remove checkpatch errors caused due to assignment operation in if
condition

Signed-off-by: RAGHU Halharvi <raghuhack78@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-17 16:17:04 -06:00
Hans Holmberg
f6352103d2 lightnvm: pblk: assume that chunks are closed on 1.2 devices
We can't know if a block is closed or not on 1.2 devices, so assume
closed state to make sure that blocks are erased before writing.

Fixes: 32ef9412c1 ("lightnvm: pblk: implement get log report chunk")
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:49 -06:00
Heiner Litz
11f6ad699a lightnvm: pblk: add asynchronous partial read
In the read path, partial reads are currently performed synchronously
which affects performance for workloads that generate many partial
reads. This patch adds an asynchronous partial read path as well as
the required partial read ctx.

Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
Reviewed-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:47 -06:00
Gustavo A. R. Silva
884b031b28 lightnvm: pblk: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:45 -06:00
Matias Bjørling
4e495a46b1 lightnvm: pblk: expose generic disk name on pr_* msgs
The error messages in pblk does not say which pblk instance that
a message occurred from. Update each error message to reflect the
instance it belongs to, and also prefix it with pblk, so we know
the message comes from the pblk module.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:43 -06:00
Matias Bjørling
59a8f43b63 lightnvm: limit get chunk meta request size
For devices that does not specify a limit on its transfer size, the
get_chk_meta command may send down a single I/O retrieving the full
chunk metadata table. Resulting in large 2-4MB I/O requests. Instead,
split up the I/Os to a maximum of 256KB and issue them separately to
reduce memory requirements.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:41 -06:00
Matias Bjørling
921aebfac0 lightnvm: pblk: fix read_bitmap for 32bit archs
If using pblk on a 32bit architecture, and there is a need to
perform a partial read, the partial read bitmap will only have
allocated 32 entries, where as 64 are needed.

Make sure that the read_bitmap is initialized to 64bits on 32bit
architectures as well.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:39 -06:00
Bart Van Assche
242e461fb6 lightnvm: Remove redundant rq->__data_len initialization
Since both blk_old_get_request() and blk_mq_alloc_request() initialize
rq->__data_len to zero, it is not necessary to initialize that member
in nvme_nvm_alloc_request(). Hence remove the rq->__data_len
initialization from nvme_nvm_alloc_request().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:36 -06:00
Matias Bjørling
99b8dad1b6 lightnvm: pblk: enable line minor version detection
When recovering a line, an extra check was added when debugging was
active, such that minor version where also checked. Unfortunately,
this used the ifdef NVM_DEBUG, which is not correct.

Instead use the proper DEBUG def, and now that it compiles, also fix
the variable.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Fixes: d0ab0b1ab9 ("lightnvm: pblk: check data lines version on recovery")
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:32 -06:00
Matias Bjørling
880eda5440 lightnvm: move NVM_DEBUG to pblk
There is no users of CONFIG_NVM_DEBUG in the LightNVM subsystem. All
users are in pblk. Rename NVM_DEBUG to NVM_PBLK_DEBUG and enable
only for pblk.

Also fix up the CONFIG_NVM_PBLK entry to follow the code style for
Kconfig files.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:31 -06:00
Marcin Dziegielewski
ffc03fb7a5 lightnvm: pblk: handle case when mw_cunits equals to 0
Some devices can expose mw_cunits equal to 0, it can cause the
creation of too small write buffer and cause performance to drop
on write workloads.

Additionally, write buffer size must cover write data requirements,
such as WS_MIN and MW_CUNITS - it must be greater than or equal to
the larger one multiplied by the number of PUs. However, for
performance reasons, use the WS_OPT value to calculation instead of
WS_MIN.

Because the place where buffer size is calculated was changed, this
patch also removes pgs_in_buffer filed in pblk structure.

Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13 08:14:29 -06:00
Helge Deller
ea870bb2ae block: skd: Use %pad printk format for dma_addr_t values
Use the existing %pad printk format to print dma_addr_t values.
This avoids the following warnings when compiling on the parisc64 platform:

drivers/block/skd_main.c: In function 'skd_preop_sg_list':
drivers/block/skd_main.c:660:4: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-12 15:01:28 -06:00
Randy Dunlap
3993e501bf block/DAC960.c: fix defined but not used build warnings
Fix build warnings in DAC960.c when CONFIG_PROC_FS is not enabled
by marking the unused functions as __maybe_unused.

../drivers/block/DAC960.c:6429:12: warning: 'dac960_proc_show' defined but not used [-Wunused-function]
../drivers/block/DAC960.c:6449:12: warning: 'dac960_initial_status_proc_show' defined but not used [-Wunused-function]
../drivers/block/DAC960.c:6456:12: warning: 'dac960_current_status_proc_show' defined but not used [-Wunused-function]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-09 09:07:55 -06:00
Matias Bjørling
ca4b2a0119 null_blk: add zone support
Adds support for exposing a null_blk device through the zone device
interface.

The interface is managed with the parameters zoned and zone_size.
If zoned is set, the null_blk instance registers as a zoned block
device. The zone_size parameter defines how big each zone will be.

Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-09 09:07:55 -06:00
Matias Bjørling
6dad38d38f null_blk: move shared definitions to header file
Split the null_blk device driver, such that it can prepare for
zoned block interface support.

Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-09 09:07:54 -06:00