As the LU cache grows it can consume large enough chunks of
memory that ends up preventing buffers for other objects,
such as the OIs, from being cached and severely impacting
the performance for FID lookups. Limit the lu_object cache
to a maximum of lu_cache_nr objects.
NOTES:
* In order to be able to quickly determine the number of objects in
the hash table the CFS_HASH_COUNTER flag is added. This adds an
atomic_inc/dec to the hash insert/remove paths but is not expected
to have any measurable impact of performance.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5164
Reviewed-on: http://review.whamcloud.com/10237
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A few changes are made in this patch for unstable pages tracking:
1. Remove kernel NFS unstable pages tracking because it killed
performance
2. Track unstable pages as part of LRU cache. Otherwise Lustre
can use much more memory than max_cached_mb
3. Remove obd_unstable_pages tracking to avoid using global
atomic counter
4. Make unstable pages track optional. Tracking unstable pages is
turned off by default, and can be controlled by
llite.*.unstable_stats.
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4841
Reviewed-on: http://review.whamcloud.com/10003
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With DNE every object can have two locks in different namespaces:
lookup lock in space of MDT storing direntry and update/open lock
in space of MDT storing inode. In lmv_find_cbdata/lmv_lock_lock,
it should try the MDT that the FID maps to first, since this can
be easily found, and only try others if that fails.
In the error handler of lmv_add_targets, it should check whether
ld_tgt_count is being increased before ld_tgt_count is being -1.
Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4098
Reviewed-on: http://review.whamcloud.com/8019
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the layout LFSCK repairs orphan OST-object, if the parent
MDT-object was lost, then it will re-create the MDT-object and
regenerate the LOV EA and fill the target LOV EA slot with the
orphan information, and fill other slots with zero (LOV hole);
if related LOV EA slot is invalid or hole, then it will refill
the target LOV EA slot; if the target slot exceeds current LOV
EA tail, then extend the LOV EA, and fill the gaps as zero.
Some of the LOV EA holes may cannot be re-filled finally because
of lost some OST-objects. And even if they can be re-filled, but
there are still some possible race accessings from client before
the re-filling. If the client access the LOV EA with hole(s), it
may cause some strange behaviour, such as trigger LBUG()/LASSERT()
on the client.
So we will make the client to be aware of the LOV EA is incomplete.
We introduce a new LOV EA pattern flag LOV_PATTERN_F_HOLE for that:
any time when the LFSCK repairs the LOV EA with hole(s), the LOV EA
will be marked as LOV_PATTERN_F_HOLE; when all the holes in the LOV
EA are refilled, the LOV_PATTERN_F_HOLE will be dropped.
For a new client, it recongizes the pattern flag LOV_PATTERN_F_HOLE,
then it can permit/forbid some opertions on the file with LOV holes:
1) Normal read/write the file with LOV EA hole is permitted, but the
application will get EIO error when read data from the dummy slot
or write data to the dummy slot.
2) The users can dump the recovered data via some common read tools,
such as "dd conv=sync,noerror".
3) Append data to the file which has LOV EA hole will get EIO failure.
4) Other operations will skip the LOV EA hole(s), and will not get
failures, such as {s,g}etattr, {s,g}getxattr, stat, chown/chgrp,
chmod, touch, unlink, and so on.
For an old client, since it will not recognize the new pattern flag
LOV_PATTERN_F_HOLE. So the LOV EA with hole will be dicarded with
failure, but it will not cause the client to be crashed.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4675
Reviewed-on: http://review.whamcloud.com/10042
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not permitted in C++ to have a static declaration inside
of an anonymous union. The g++ compiler will complaine with an
error like this:
error: struct ost_id::<anonymous union>::ostid invalid; an
anonymous union can only have non-static data members [-fpermissive]
This patch changes the code to use an unnamed struct in place of
"struct ostid" inside of the anonymous union. That name declaration
was completely unnecessary anyway, since it was not used anywhere else.
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4987
Reviewed-on: http://review.whamcloud.com/10176
Reviewed-by: Robert Read <robert.read@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Root squash exhibits inconsistent behaviour on a client when
enabled. If a file is not cached on the client, then root will get
a permission denied error when accessing the file. When
the file has recently been accessed by a regular user and is
still in cache, root will be able to access the file without error
because the permission check is only done by the client that
isn't aware of root squash.
While the only real security benefit from root squash is to deny
clients access to files owned by root itself, it also makes sense
to treat file access on the client in a consistent manner
regardless of whether the file is in cache or not.
This patch adds root squash settings to llite so that client is able
to apply root squashing when it is relevant.
Configuration of MDT root squash settings will automatically be
applied to llite config log as well.
Update cfs_str2num_check() routine by removing any modification
of the specified string parameter. Since string can come from ls_str
field of a lstr structure, this avoids inconsistent ls_len field.
Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1778
Reviewed-on: http://review.whamcloud.com/5700
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Separate master stripe with master object, so
1. stripeEA only exists on master object.
2. sub-stripe object will be inserted into master object
as sub-directory, and it can get the master object by "..".
By this, it will remove those specilities for stripe0 in
LMV and LOD. And also simplify LFSCK, i.e. consistency check
would be easier.
When then master object becomes an orphan, we should
mark all of its sub-stripes as dead object as well,
otherwise client might still be able to create files
under these stripes.
A few fixes for striped directory layout lock:
1. stripe 0 should be locked as EX, same as other stripes.
2. Acquire the layout for directory, when it is being unliked.
Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4690
Reviewed-on: http://review.whamcloud.com/9511
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When determining whether an early reply can be sent the server will
calculate the new deadline based on an offset from the request
arrival time. However, when actually setting the new deadline
the server offsets the current time. This can result in deadlines
being extended more than at_max seconds past the request arrival
time. Instead, the server should offset the arrival time when updating
its request timeout.
When a client receives an early reply it doesn't know the server side
arrival time so we use the original sent time as an approximation.
Signed-off-by: Chris Horn <hornc@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4578
Reviewed-on: http://review.whamcloud.com/9100
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Christopher J. Morrone <chris.morrone.llnl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the log file name contains the current time in seconds, dumping
the logs more than once per second causes EEXIST errors to be emitted.
Add a static variable to libcfs_debug_dumplog_internal that records
the time of the last Lustre log dump. If the current time in seconds
is equal to the last time, do not dump logs again.
Note that this is not thread-safe. However, in the rare case that two
threads try to access last_dump_time simultaneously, the worst thing
that could happen is that one of the threads will get an EEXIST error
when trying to write the log file. This is no worse than the current
situation, and it is not likely to happen.
Signed-off-by: Ryan Haasken <haasken@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4129
Reviewed-on: http://review.whamcloud.com/8964
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was no protection when inc/dec lu_device_type::ldt_device_nr,
which may caused the ldt_device_nr to be wrong and trigger assert.
This patch redefine lu_device_type::ldt_device_nr as atomic type.
There was no protection when add/del lu_device_type::ldt_linkage
into/from the global lu_device_types list, which may caused bad
address accessing. This patch uses the existing obd_types_lock
to protect related operations.
We do NOT need lu_types_stop() any longer. Such function scans
the global lu_device_types list, and for each type item on it
which has zerod lu_device_type::ldt_device_nr, call its stop()
method. In fact, the lu_device_type::ldt_device_nr only will be
zero when the last lu_device_fini() is called, and at that time,
inside the lu_device_fini(), its stop() method will be called.
So it is unnecessary to call the stop() again via lu_types_stop().
Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4604
Reviewed-on: http://review.whamcloud.com/8694
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>