.. _rbd-exclusive-locks:

====================
 RBD Exclusive Locks
====================

.. index:: Ceph Block Device; RBD exclusive locks; exclusive-lock

Exclusive locks are a mechanism designed to prevent multiple processes
from accessing the same RADOS Block Device (RBD) in an uncoordinated
fashion. Exclusive locks are heavily used in virtualization (where
they prevent VMs from clobbering each other's writes) and in RBD
mirroring (where they are a prerequisite for journaling).

Exclusive locks are enabled on newly created images by default, unless
overridden via the ``rbd_default_features`` configuration option or
the ``--image-feature`` flag for ``rbd create``.
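
If needed, the feature set can be chosen explicitly at creation time
and toggled later on an existing image. A brief sketch, using the
placeholder pool and image names ``mypool/myimage``:

.. prompt:: bash $

   rbd create --size 1024 --image-feature layering,exclusive-lock mypool/myimage
   rbd feature disable mypool/myimage exclusive-lock
   rbd feature enable mypool/myimage exclusive-lock
   rbd info mypool/myimage

Note that the ``object-map``, ``fast-diff`` and ``journaling`` features
depend on ``exclusive-lock``, so they must be disabled first if
``exclusive-lock`` is to be turned off.
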

In order to ensure proper exclusive locking operations, any client
using an RBD image whose ``exclusive-lock`` feature is enabled should
use a CephX identity whose capabilities include ``profile rbd``.
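
Such an identity can be created with ``ceph auth get-or-create``. The
client name ``client.qemu`` and pool name ``vms`` below are only
examples:

.. prompt:: bash $

   ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms'

Omitting the ``pool=`` qualifier grants the OSD capability for all
pools.
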

Exclusive locking is mostly transparent to the user:

#. Whenever any ``librbd`` client process or kernel RBD client
   starts using an RBD image on which exclusive locking has been
   enabled, it obtains an exclusive lock on the image before the first
   write.

#. Whenever any such client process gracefully terminates, it
   automatically relinquishes the lock.

#. This subsequently enables another process to acquire the lock and
   write to the image.
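
While a client holds the lock, the current holder can be inspected
from the command line: the automatically managed lock is typically
visible in the image's lock list, and ``rbd status`` shows which
clients have the image open (its watchers). ``mypool/myimage`` is
again a placeholder:

.. prompt:: bash $

   rbd lock ls mypool/myimage
   rbd status mypool/myimage
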

Note that it is perfectly possible for two or more concurrently
running processes to merely open the image and to read from it. A
client acquires the exclusive lock only when attempting to write to
the image. To disable transparent lock transitions between multiple
clients, a client must acquire the lock explicitly with
``RBD_LOCK_MODE_EXCLUSIVE``.
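
For the kernel RBD client, the ``exclusive`` map option serves this
purpose by disabling automatic lock transitions away from the mapping
client (the image name is again a placeholder):

.. prompt:: bash $

   rbd device map -o exclusive mypool/myimage

``librbd`` applications achieve the same effect by acquiring the lock
through the API with ``RBD_LOCK_MODE_EXCLUSIVE`` before writing.
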


Blocklisting
============

Sometimes, a client process (or, in the case of a krbd client, a
client node's kernel thread) that previously held an exclusive lock on
an image does not terminate gracefully, but dies abruptly. This may
happen, for example, because the process received a ``KILL`` or
``ABRT`` signal, or because the client node was hard-rebooted or lost
power. In that case, the exclusive lock is never gracefully released.
Thus, when a new process starts and attempts to use the device, it
needs a way to break the previously held exclusive lock.

However, a process (or kernel thread) may also hang, or merely lose
network connectivity to the Ceph cluster for some amount of time. In
that case, simply breaking the lock would be potentially catastrophic:
the hung process or connectivity issue may resolve itself, and the old
process may then compete with one that has started in the interim,
accessing RBD data in an uncoordinated and destructive manner.

Thus, in the event that a lock cannot be acquired in the standard
graceful manner, the overtaking process not only breaks the lock, but
also blocklists the previous lock holder. This is negotiated between
the new client process and the Ceph Mon: upon receiving the blocklist
request,

* the Mon instructs the relevant OSDs to no longer serve requests from
  the old client process;
* once the associated OSD map update is complete, the Mon grants the
  lock to the new client;
* once the new client has acquired the lock, it can commence writing
  to the image.

Blocklisting is thus a form of storage-level resource `fencing`_.
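
Blocklist entries are normally added and expired automatically. They
can be inspected with:

.. prompt:: bash $

   ceph osd blocklist ls

If necessary, a stale entry can be removed with
``ceph osd blocklist rm``, but only once it is certain that the old
client can no longer issue writes.
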

In order for blocklisting to work, the client must have the
``osd blocklist`` capability. This capability is included in the
``profile rbd`` capability profile, which should generally be set on
all Ceph :ref:`client identities <user-management>` using RBD.

.. _fencing: https://en.wikipedia.org/wiki/Fencing_(computing)