.. _rbd-exclusive-locks:

====================
RBD Exclusive Locks
====================

.. index:: Ceph Block Device; RBD exclusive locks; exclusive-lock

Exclusive locks are mechanisms designed to prevent multiple processes from
accessing the same RADOS Block Device (RBD) in an uncoordinated fashion.
Exclusive locks are used heavily in virtualization (where they prevent VMs from
clobbering each other's writes) and in `RBD mirroring`_ (where they are a
prerequisite for journaling in journal-based mirroring and fast generation of
incremental diffs in snapshot-based mirroring).

The ``exclusive-lock`` feature is enabled on newly created images. This default
can be overridden via the ``rbd_default_features`` configuration option or the
``--image-feature`` and ``--image-shared`` options of the ``rbd create``
command.

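For example, assuming a pool named ``mypool`` (the pool and image names here
are placeholders), the feature set can be controlled at creation time:

```shell
# Create an image with the default feature set (includes exclusive-lock).
rbd create --size 10G mypool/img1

# Create an image intended for concurrent use by multiple clients;
# --image-shared disables exclusive-lock and the features that depend on it.
rbd create --size 10G --image-shared mypool/img2

# Create an image with an explicit feature list.
rbd create --size 10G --image-feature layering,exclusive-lock mypool/img3

# Inspect which features ended up enabled.
rbd info mypool/img1
```

These commands require a running Ceph cluster and appropriate client credentials.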
.. note::
   Many image features, including ``object-map`` and ``fast-diff``, depend upon
   exclusive locking. Disabling the ``exclusive-lock`` feature will negatively
   affect the performance of some operations.

To maintain multi-client access, the ``exclusive-lock`` feature implements
automatic cooperative lock transitions between clients. It ensures that only
a single client can write to an RBD image at any given time and thus protects
internal image structures such as the object map, the journal or the `PWL
cache`_ from concurrent modification.

Exclusive locking is mostly transparent to the user:

* Whenever a client (a ``librbd`` process or, in the case of a ``krbd``
  client, a client node's kernel) needs to handle a write to an RBD image on
  which exclusive locking has been enabled, it first acquires an exclusive
  lock on the image. If the lock is already held by some other client, that
  client is requested to release it.

* Whenever a client that holds an exclusive lock on an RBD image gets
  a request to release the lock, it stops handling writes, flushes its caches
  and releases the lock.

* Whenever a client that holds an exclusive lock on an RBD image terminates
  gracefully, the lock is also released gracefully.

* A graceful release of an exclusive lock on an RBD image (whether by request
  or due to client termination) enables another, subsequent, client to acquire
  the lock and start handling writes.

.. warning::
   By default, the ``exclusive-lock`` feature does not prevent two or more
   concurrently running clients from opening the same RBD image and writing to
   it in turns (whether on the same node or not). In effect, their writes just
   get linearized as the lock is automatically transitioned back and forth in
   a cooperative fashion.

.. note::
   To disable automatic lock transitions between clients, the
   ``RBD_LOCK_MODE_EXCLUSIVE`` flag may be specified when acquiring the
   exclusive lock. This is exposed by the ``--exclusive`` option of the ``rbd
   device map`` command.

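For instance (the pool and image names are placeholders):

```shell
# Map the image and hold the exclusive lock non-cooperatively: other
# clients cannot take the lock away while the device remains mapped.
rbd device map --exclusive mypool/myimage
```

This requires a running Ceph cluster and a node with ``krbd`` support.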
Blocklisting
============

Sometimes a client that previously held an exclusive lock on an RBD image does
not terminate gracefully, but dies abruptly. This may be because the client
process received a ``KILL`` or ``ABRT`` signal, or because the client node
underwent a hard reboot or suffered a power failure. In cases like this, the
lock is never gracefully released. This means that any new client that comes up
and attempts to write to the image must break the previously held exclusive
lock.

However, a process (or kernel thread) may hang or merely lose network
connectivity to the Ceph cluster for some amount of time. In that case,
breaking the lock would be potentially catastrophic: the hung process or
connectivity issue could resolve itself and the original process might then
compete with one that started in the interim, thus accessing RBD data in an
uncoordinated and destructive manner.

In the event that a lock cannot be acquired in the standard graceful manner,
the overtaking process not only breaks the lock but also blocklists the
previous lock holder. This is negotiated between the new client process and the
Ceph Monitor:

* Upon receiving the blocklist request, the monitor instructs the relevant
  OSDs to no longer serve requests from the old client process;
* after the associated OSD map update is complete, the new client can break
  the previously held lock;
* after the new client has acquired the lock, it can commence writing
  to the image.

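Blocklist entries can be inspected, and manually removed if necessary, with
the ``ceph`` CLI (the address below is a placeholder):

```shell
# List currently blocklisted client addresses.
ceph osd blocklist ls

# Remove an entry once the stale client is known to be gone for good.
ceph osd blocklist rm 192.168.0.10:0/3891234123
```

On releases prior to Pacific, these commands were spelled ``ceph osd
blacklist``.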
Blocklisting is thus a form of storage-level resource `fencing`_.

.. note::
   In order for blocklisting to work, the client must have the ``osd
   blocklist`` capability. This capability is included in the ``profile
   rbd`` capability profile, which should be set generally on all Ceph
   :ref:`client identities <user-management>` using RBD.

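A client identity with the required capabilities can be created as follows
(the user and pool names are placeholders):

```shell
# Create a CephX user whose capabilities include ``profile rbd``,
# which in turn grants the ``osd blocklist`` capability.
ceph auth get-or-create client.myrbduser mon 'profile rbd' osd 'profile rbd pool=mypool'
```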
.. _RBD mirroring: ../rbd-mirroring
.. _PWL cache: ../rbd-persistent-write-log-cache
.. _fencing: https://en.wikipedia.org/wiki/Fencing_(computing)