git.proxmox.com Git - mirror_ubuntu-zesty-kernel.git/log

Yan, Zheng [Fri, 13 May 2016 09:29:51 +0000 (17:29 +0800)]

ceph: handle interrupted ceph_writepage()

writepage() can be interrupted when it is called by the direct memory
reclaimer (that is, when the direct memory reclaimer is killed). To
avoid losing data, we redirty the page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Fri, 13 May 2016 03:30:24 +0000 (11:30 +0800)]

ceph: make ceph_update_writeable_page() uninterruptible

ceph_update_writeable_page() is used by ceph_write_begin(). It breaks
the atomicity of the write operation if it's interruptible.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Fri, 13 May 2016 03:04:33 +0000 (11:04 +0800)]

libceph: make ceph_osdc_wait_request() uninterruptible

ceph_osdc_wait_request() is used when cephfs issues sync IO. In most
cases, the sync IO should be uninterruptible. The fix is to use a
killable wait function in ceph_osdc_wait_request().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Tue, 10 May 2016 11:09:06 +0000 (19:09 +0800)]

ceph: handle -EAGAIN returned by ceph_update_writeable_page()

When ceph_update_writeable_page() returns -EAGAIN, the caller should
lock the page and call ceph_update_writeable_page() again.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
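
The retry rule in this message can be modelled in user space as a small
loop. This is only a sketch: lock_page and update_writeable_page are
hypothetical callables standing in for the kernel helpers, and the
assumption that the page must be re-locked on every attempt is taken
from the message itself, not from the patch.

```python
import errno


def write_begin_with_retry(lock_page, update_writeable_page):
    """Lock a page and try to make it writeable, retrying on -EAGAIN.

    Both arguments are hypothetical stand-ins for the kernel helpers:
    lock_page() returns a locked page object, and
    update_writeable_page(page) returns 0 or a negative errno value.
    """
    while True:
        # Take the page lock again on every attempt.
        page = lock_page()
        ret = update_writeable_page(page)
        if ret != -errno.EAGAIN:
            # Success or a real error: hand the result to the caller.
            return ret
```

Any result other than -EAGAIN ends the loop, so real errors are still
propagated to the caller unchanged.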

Yan, Zheng [Tue, 10 May 2016 10:59:13 +0000 (18:59 +0800)]

ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Tue, 10 May 2016 10:40:28 +0000 (18:40 +0800)]

ceph: block non-fatal signals for fault/page_mkwrite

Fault and page_mkwrite are supposed to be uninterruptible, but they
call ceph functions that are interruptible. So they should block
signals before calling functions that are interruptible.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Zhang Zhuoyu [Fri, 25 Mar 2016 09:18:39 +0000 (05:18 -0400)]

ceph: make logical calculation functions return bool

This patch makes several logical calculation functions return bool to
improve readability, since these particular functions only use 0/1
as their return value.

No functional change.

Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>

Yan, Zheng [Thu, 5 May 2016 08:40:17 +0000 (16:40 +0800)]

ceph: tolerate bad i_size for symlink inode

An MDS bug can cause a symlink's size to be truncated to zero.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Wed, 4 May 2016 03:40:30 +0000 (11:40 +0800)]

ceph: improve fragtree change detection

Check whether the number of splits in i_fragtree is equal to the number
of splits in the MDS reply.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Wed, 4 May 2016 03:05:10 +0000 (11:05 +0800)]

ceph: keep leaf frag when updating fragtree

Nodes in i_fragtree are sorted according to ceph_frag_compare().
This means a frag node in i_fragtree always follows its direct parent
node. To check whether a leaf node is valid, we just need to check
whether it is a child of the previous split node.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Tue, 3 May 2016 14:33:20 +0000 (22:33 +0800)]

ceph: fix dir_auth check in ceph_fill_dirfrag()

-1 is CDIR_AUTH_PARENT; it means the dir's auth MDS is the same as the
inode's auth MDS.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Tue, 3 May 2016 12:55:50 +0000 (20:55 +0800)]

ceph: don't assume frag tree splits in mds reply are sorted

The algorithm that updates i_fragtree relies on the frag tree splits
in the MDS reply being in the same order as i_fragtree. This is not
true, because the current MDS encodes frag tree splits in ascending
order of (unsigned)frag_t, while nodes in i_fragtree are sorted
according to ceph_frag_compare().

The fix is to sort the frag tree splits first, then update i_fragtree.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
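
To see why the two orders can disagree, here is a small user-space
sketch. It assumes the usual frag_t layout (number of split bits in the
top 8 bits, value in the low 24 bits) and assumes that
ceph_frag_compare() orders by value first and by bits second; neither
detail is spelled out in the message above, and the helper names below
are made up for the illustration.

```python
def frag_bits(frag):
    """Number of significant bits (assumed to live in the top 8 bits)."""
    return frag >> 24


def frag_value(frag):
    """Frag value (assumed to live in the low 24 bits)."""
    return frag & 0xFFFFFF


def frag_make(bits, value):
    """Build a frag_t-like integer, keeping only the top `bits` of value."""
    mask = (0xFFFFFF << (24 - bits)) & 0xFFFFFF
    return (bits << 24) | (value & mask)


def frag_sort_key(frag):
    """Sort key mirroring the assumed ceph_frag_compare() ordering."""
    return (frag_value(frag), frag_bits(frag))


splits = [
    frag_make(0, 0x000000),  # root frag
    frag_make(1, 0x000000),  # left half
    frag_make(1, 0x800000),  # right half
    frag_make(2, 0x400000),  # second quarter, inside the left half
]

# Order used on the wire, per the message: ascending unsigned frag_t.
wire_order = sorted(splits)
# Order assumed for i_fragtree: by value, then by bits.
tree_order = sorted(splits, key=frag_sort_key)
```

With these four frags the second quarter sorts last on the wire but
before the right half in the tree order, which is the kind of mismatch
the commit describes.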

Yan, Zheng [Fri, 29 Apr 2016 15:40:23 +0000 (23:40 +0800)]

ceph: fix inode reference leak

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Fri, 29 Apr 2016 03:27:30 +0000 (11:27 +0800)]

ceph: using hash value to compose dentry offset

If the MDS sorts dentries in a dirfrag in hash order, we use the hash
value to compose the dentry offset. The dentry offset is:

(0xff << 52) | ((24 bits hash) << 28) |
(the nth entry with the same hash)

This offset is stable across directory fragmentation. This also means
there is no need to reset the readdir offset if the directory gets
fragmented in the middle of a readdir.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
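
The layout above can be checked with a few lines of user-space code.
This is a sketch of the bit packing only; the constant and function
names are invented for the example, and the 28-bit width of the low
field is inferred from the shift in the formula.

```python
OFFSET_BITS = 28                          # low field: nth entry with this hash
OFFSET_MASK = (1 << OFFSET_BITS) - 1
HASH_BITS = 24                            # middle field: 24-bit hash
HASH_MASK = (1 << HASH_BITS) - 1
HASH_ORDER = 0xFF << (OFFSET_BITS + HASH_BITS)  # same as 0xff << 52


def make_hash_offset(hash24, nth):
    """Pack a 24-bit hash and a collision index into one offset."""
    if not 0 <= hash24 <= HASH_MASK:
        raise ValueError("hash does not fit in 24 bits")
    if not 0 <= nth <= OFFSET_MASK:
        raise ValueError("collision index does not fit in 28 bits")
    return HASH_ORDER | (hash24 << OFFSET_BITS) | nth


def split_hash_offset(pos):
    """Return (hash24, nth) from an offset built by make_hash_offset()."""
    return (pos >> OFFSET_BITS) & HASH_MASK, pos & OFFSET_MASK


def is_hash_order(pos):
    """True if the 0xff marker in bits 52..59 is set."""
    return (pos & HASH_ORDER) == HASH_ORDER
```

Because the offset depends only on the hash and the collision index,
not on which frag currently holds the entry, it does not change when a
directory is split or merged.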

Yan, Zheng [Thu, 28 Apr 2016 14:56:44 +0000 (22:56 +0800)]

ceph: don't forbid marking directory complete after forward seek

A forward seek within the same frag does not update fi->last_name, so
it will not affect the contents of later readdir replies. There is no
need to forbid marking the directory complete.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Thu, 28 Apr 2016 07:17:40 +0000 (15:17 +0800)]

ceph: record 'offset' for each entry of readdir result

This is in preparation for using the hash value as the dentry 'offset'.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Wed, 27 Apr 2016 09:48:30 +0000 (17:48 +0800)]

ceph: define 'end/complete' in readdir reply as bit flags

Set a flag in the readdir request to indicate that the client
interprets 'end/complete' as bit flags, so that the MDS can send
additional flags in the readdir reply.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
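
As an illustration of what treating 'end/complete' as bit flags buys,
the sketch below decodes a single flags word. The flag names and bit
positions are assumptions chosen for the example and are not quoted
from the patch.

```python
# Hypothetical bit assignments, for illustration only.
FRAG_END = 1 << 0       # last reply for this frag
FRAG_COMPLETE = 1 << 8  # reply covers the whole frag
HASH_ORDER = 1 << 9     # example of an additional flag


def parse_readdir_flags(flags):
    """Decode a readdir reply flags word into named booleans."""
    return {
        "end": bool(flags & FRAG_END),
        "complete": bool(flags & FRAG_COMPLETE),
        "hash_order": bool(flags & HASH_ORDER),
    }
```

With a single flags word, a new flag only needs a free bit rather than
a new field in the reply.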

Yan, Zheng [Thu, 28 Apr 2016 01:37:39 +0000 (09:37 +0800)]

ceph: define struct for dir entry in readdir reply

This avoids defining multiple arrays for entries in the readdir reply.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Wed, 27 Apr 2016 09:32:34 +0000 (17:32 +0800)]

ceph: simplify 'offset in frag'

Don't distinguish the leftmost frag from other frags. Always use 2 as
the first entry's offset.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Fri, 29 Apr 2016 07:58:32 +0000 (15:58 +0800)]

ceph: remove unnecessary checks in __dcache_readdir

We never add the snapdir or the hidden .ceph dir to the readdir cache.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Thu, 28 Apr 2016 09:43:35 +0000 (17:43 +0800)]

ceph: search cache position for dcache readdir

Use binary search to find the cache index that corresponds to the
readdir position.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
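
A minimal sketch of such a lookup, assuming the cached entries are kept
in ascending order of their readdir offset. The function name and the
plain list of offsets are stand-ins for the example; the kernel code
works on cached dentries rather than a Python list.

```python
def find_cache_index(offsets, pos):
    """Return the index of the first cached entry with offset >= pos.

    `offsets` must be sorted in ascending order. If every cached
    offset is smaller than pos, len(offsets) is returned.
    """
    lo, hi = 0, len(offsets)
    while lo < hi:
        mid = (lo + hi) // 2
        if offsets[mid] < pos:
            lo = mid + 1  # answer lies strictly to the right of mid
        else:
            hi = mid      # mid is a candidate; keep searching left
    return lo
```

The search takes O(log n) comparisons instead of walking the cache
from the start for every seek.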

Yan, Zheng [Thu, 21 Apr 2016 04:11:54 +0000 (12:11 +0800)]

ceph: use CEPH_MDS_OP_RMXATTR request to remove xattr

Setxattr with a NULL value and the XATTR_REPLACE flag should be
equivalent to removexattr. But the current MDS does not support
deleting vxattrs through an MDS_OP_SETXATTR request. The workaround is
to send an MDS_OP_RMXATTR request if setxattr actually removes the
xattr.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
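
The decision described above can be sketched as a small function. The
XATTR_* values match the Linux setxattr(2) flags; the function name and
the use of strings for the request type are simplifications for the
example, and the real client code handles more cases than this.

```python
XATTR_CREATE = 0x1   # fail if the attribute already exists
XATTR_REPLACE = 0x2  # fail if the attribute does not exist


def choose_mds_op(value, flags):
    """Pick the MDS request type for a setxattr call.

    `value` is None when the caller passed a NULL value.
    """
    if value is None and flags & XATTR_REPLACE:
        # Replacing an existing xattr with nothing is a removal.
        return "CEPH_MDS_OP_RMXATTR"
    return "CEPH_MDS_OP_SETXATTR"
```

Every other combination of value and flags still goes out as an
ordinary setxattr request.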

Yan, Zheng [Thu, 21 Apr 2016 03:09:55 +0000 (11:09 +0800)]

ceph: report mount root in session metadata

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Mon, 18 Apr 2016 08:51:37 +0000 (16:51 +0800)]

ceph: don't show symlink target in debugfs/mdsc

The symlink target is useless for debugging and can be very long. It's
annoying to show it in debugfs/mdsc.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Fri, 15 Apr 2016 05:56:12 +0000 (13:56 +0800)]

ceph: don't call truncate_pagecache in ceph_writepages_start

truncate_pagecache() may decrease the inode's reference count. This can
cause a deadlock if the inode's last reference is dropped and
iput_final() wants to evict the inode. (evict() calls
inode_wait_for_writeback(), which waits for ceph_writepages_start() to
return.)

The fix is to use a work thread to truncate dirty pages. Also add a
'forced umount' check to ceph_update_writeable_page(), which prevents
new pages from getting dirty.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Fri, 8 Apr 2016 07:27:16 +0000 (15:27 +0800)]

ceph: renew caps for read/write if mds session got killed.

When an MDS session gets killed, a read/write operation may hang. The
client waits for Frw caps, but the MDS does not know what caps the
client wants. To recover from this, the client sends an open request to
the MDS. The request tells the MDS what caps the client wants.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Thu, 31 Mar 2016 07:53:01 +0000 (15:53 +0800)]

ceph: CEPH_FEATURE_MDSENC support

Signed-off-by: Yan, Zheng <zyan@redhat.com>

Yan, Zheng [Wed, 30 Mar 2016 09:18:34 +0000 (17:18 +0800)]

ceph: multiple filesystem support

To access a non-default filesystem, we just need to subscribe to
mdsmap.<MDS_NAMESPACE_ID> and add a new mount option for the mds
namespace id.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: switch to a new libceph API]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Ilya Dryomov [Wed, 25 May 2016 22:05:01 +0000 (00:05 +0200)]

libceph: support for subscribing to "mdsmap.<id>" maps

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Ilya Dryomov [Thu, 28 Apr 2016 14:07:28 +0000 (16:07 +0200)]

libceph: replace ceph_monc_request_next_osdmap()

... with a wrapper around maybe_request_map() - no need for two
osdmap-specific functions.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
