- :term:`Ceph Monitor`
- :term:`Ceph OSD Daemon`
-.. ditaa:: +---------------+ +---------------+
+.. ditaa::
+
+ +---------------+ +---------------+
| OSDs | | Monitors |
+---------------+ +---------------+
The Ceph Storage Cluster receives data from :term:`Ceph Clients`--whether it
comes through a :term:`Ceph Block Device`, :term:`Ceph Object Storage`, the
-:term:`Ceph Filesystem` or a custom implementation you create using
+:term:`Ceph File System` or a custom implementation you create using
``librados``--and it stores the data as objects. Each object corresponds to a
file in a filesystem, which is stored on an :term:`Object Storage Device`. Ceph
OSD Daemons handle the read/write operations on the storage disks.
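To make this concrete, here is a minimal sketch of storing one object through
``librados`` using its Python binding. The pool name ``data`` and the config
file path are assumptions for illustration, not requirements:

.. code-block:: python

    import rados

    # Connect to the cluster (config path and pool name are assumptions).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('data')

    # Store and read back one object; Ceph handles placement and replication.
    ioctx.write_full('hello-object', b'Hello, Ceph!')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()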
-.. ditaa:: /-----\ +-----+ +-----+
+.. ditaa::
+
+ /-----\ +-----+ +-----+
| obj |------>| {d} |------>| {s} |
\-----/ +-----+ +-----+
forth.
-.. ditaa:: /------+------------------------------+----------------\
+.. ditaa::
+
+ /------+------------------------------+----------------\
| ID | Binary Data | Metadata |
+------+------------------------------+----------------+
| 1234 | 0101010101010100110101010010 | name1 = value1 |
.. note:: The ``client.admin`` user must provide the user ID and
secret key to the user in a secure manner.
-.. ditaa:: +---------+ +---------+
+.. ditaa::
+
+ +---------+ +---------+
| Client | | Monitor |
+---------+ +---------+
| request to |
ticket and uses it to sign requests to OSDs and metadata servers throughout the
cluster.
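In practice, a client chooses its user and keyring when it connects, and the
monitor returns the session key and tickets described above. A minimal sketch
with the Python binding follows; the user name and keyring path are
assumptions:

.. code-block:: python

    import rados

    # Authenticate as a specific cephx user. The keyring holds the secret
    # key, which is never sent over the wire; the monitor returns a ticket.
    cluster = rados.Rados(
        conffile='/etc/ceph/ceph.conf',
        rados_id='admin',    # i.e. the client.admin user
        conf=dict(keyring='/etc/ceph/ceph.client.admin.keyring'))
    cluster.connect()
    print(cluster.get_fsid())   # succeeds only if authentication worked
    cluster.shutdown()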
-.. ditaa:: +---------+ +---------+
+.. ditaa::
+
+ +---------+ +---------+
| Client | | Monitor |
+---------+ +---------+
| authenticate |
subsequent to the initial authentication, is signed using a ticket that the
monitors, OSDs and metadata servers can verify with their shared secret.
-.. ditaa:: +---------+ +---------+ +-------+ +-------+
+.. ditaa::
+
+ +---------+ +---------+ +-------+ +-------+
| Client | | Monitor | | MDS | | OSD |
+---------+ +---------+ +-------+ +-------+
| request to | | |
and tertiary OSDs (as many OSDs as additional replicas), and responds to the
client once it has confirmed the object was stored successfully.
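The key property is that the primary acknowledges the write only after every
replica has confirmed it. A toy sketch of that control flow (plain Python,
not Ceph code):

.. code-block:: python

    class FakeOSD:
        """Toy stand-in for an OSD that stores objects in a dict."""
        def __init__(self):
            self.objects = {}

        def write(self, name, data):
            self.objects[name] = data
            return True                  # ack back to the primary

    def client_write(primary, replicas, name, data):
        # The primary writes locally, fans the object out to the secondary
        # and tertiary OSDs, and acks the client once all replicas confirm.
        acks = [primary.write(name, data)]
        acks += [osd.write(name, data) for osd in replicas]
        return all(acks)

    osds = [FakeOSD() for _ in range(3)]   # size = 3: primary + two replicas
    assert client_write(osds[0], osds[1:], 'obj1', b'payload')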
-.. ditaa::
+.. ditaa::
+
+----------+
| Client |
| |
pools. The pool's ``size`` (its number of replicas), its CRUSH rule, and its
number of placement groups determine how Ceph will place the data.
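These pool parameters can also be inspected programmatically. A sketch using
the Python binding's ``mon_command``, the JSON equivalent of ``ceph osd pool
get``; the pool name ``data`` is an assumption:

.. code-block:: python

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Ask the monitors for the pool's replica count.
    cmd = json.dumps({'prefix': 'osd pool get', 'pool': 'data',
                      'var': 'size', 'format': 'json'})
    ret, outbuf, outs = cluster.mon_command(cmd, b'')
    print(json.loads(outbuf))    # e.g. {"pool": "data", "size": 3}
    cluster.shutdown()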
-.. ditaa::
+.. ditaa::
+
+--------+ Retrieves +---------------+
| Client |------------>| Cluster Map |
+--------+ +---------------+
come online. The following diagram depicts how CRUSH maps objects to placement
groups, and placement groups to OSDs.
-.. ditaa::
+.. ditaa::
+
/-----\ /-----\ /-----\ /-----\ /-----\
| obj | | obj | | obj | | obj | | obj |
\-----/ \-----/ \-----/ \-----/ \-----/
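A drastically simplified sketch of this two-step mapping follows. Ceph
actually uses the rjenkins hash, a "stable mod", and the CRUSH algorithm,
not ``md5`` or a seeded shuffle, but the shape of the computation is the same:

.. code-block:: python

    import hashlib
    import random

    def object_to_pg(pool_id, object_name, pg_num):
        # Step 1: hash the object name to pick a placement group in the pool.
        h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
        return '{}.{:x}'.format(pool_id, h % pg_num)

    def pg_to_osds(pg, osd_ids, size):
        # Step 2: CRUSH deterministically, pseudo-randomly maps the PG to an
        # ordered list of OSDs (primary first). A seeded shuffle stands in
        # for the real CRUSH computation here.
        rng = random.Random(pg)
        return rng.sample(osd_ids, size)

    pg = object_to_pg(pool_id=1, object_name='foo', pg_num=128)
    print(pg, '->', pg_to_osds(pg, osd_ids=list(range(10)), size=3))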
new OSD after rebalancing is complete.
-.. ditaa::
+.. ditaa::
+
+--------+ +--------+
Before | OSD 1 | | OSD 2 |
+--------+ +--------+
.. ditaa::
+
+-------------------+
name | NYAN |
+-------------------+
account.
.. ditaa::
+
+-------------------+
name | NYAN |
+-------------------+
authoritative version of the placement group logs.
In the following diagram, an erasure-coded placement group has been created with
-``K = 2 + M = 1`` and is supported by three OSDs, two for ``K`` and one for
+``K = 2, M = 1`` and is supported by three OSDs, two for ``K`` and one for
``M``. The acting set of the placement group is made of **OSD 1**, **OSD 2** and
**OSD 3**. An object has been encoded and stored in the OSDs: the chunk
``D1v1`` (i.e. Data chunk number 1, version 1) is on **OSD 1**, ``D2v1`` on
.. ditaa::
+
Primary OSD
+-------------+
on **OSD 3**.
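For ``M = 1`` the coding chunk reduces to simple parity, so the idea fits in a
few lines. This is an illustration only; Ceph's erasure-code plugins implement
real Reed-Solomon style codes:

.. code-block:: python

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    content = b'NYANNYAN'                # object content, padded evenly
    d1, d2 = content[:4], content[4:]    # K = 2 data chunks (OSD 1, OSD 2)
    c1 = xor_bytes(d1, d2)               # M = 1 coding chunk (OSD 3)

    # OSD 2 goes down: rebuild D2 from the surviving chunks.
    assert xor_bytes(d1, c1) == d2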
.. ditaa::
+
Primary OSD
+-------------+
will be the head of the new authoritative log.
.. ditaa::
+
+-------------+
| OSD 1 |
| (down) |
.. ditaa::
+
Primary OSD
+-------------+
to Ceph clients.
-.. ditaa::
+.. ditaa::
+
+-------------+
| Ceph Client |
+------+------+
you can create your own custom Ceph Clients. The following diagram depicts the
basic architecture.
-.. ditaa::
+.. ditaa::
+
+---------------------------------+
| Ceph Storage Cluster Protocol |
| (librados) |
synchronization/communication channel.
-.. ditaa:: +----------+ +----------+ +----------+ +---------------+
+.. ditaa::
+
+ +----------+ +----------+ +----------+ +---------------+
| Client 1 | | Client 2 | | Client 3 | | OSD:Object ID |
+----------+ +----------+ +----------+ +---------------+
| | | |
volume'. Ceph's striping offers the throughput of RAID 0 striping, the
reliability of n-way RAID mirroring and faster recovery.
-Ceph provides three types of clients: Ceph Block Device, Ceph Filesystem, and
+Ceph provides three types of clients: Ceph Block Device, Ceph File System, and
Ceph Object Storage. A Ceph Client converts its data from the representation
format it provides to its users (a block device image, RESTful objects, CephFS
filesystem directories) into objects for storage in the Ceph Storage Cluster.
.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped.
- Ceph Object Storage, Ceph Block Device, and the Ceph Filesystem stripe their
+ Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their
data over multiple Ceph Storage Cluster objects. Ceph Clients that write
directly to the Ceph Storage Cluster via ``librados`` must perform the
striping (and parallel I/O) for themselves to obtain these benefits.
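For a client doing its own striping, the bookkeeping is plain arithmetic. A
sketch of mapping a byte offset to an object number and an in-object offset;
the function and parameter names are illustrative, not a ``librados`` API:

.. code-block:: python

    def chunk_location(offset, stripe_unit, stripe_count, object_size):
        # One object set holds `stripe_count` objects; data is written one
        # stripe unit at a time, round-robin across the objects in the set.
        set_size = stripe_count * object_size
        object_set = offset // set_size
        stripe_row = (offset % set_size) // (stripe_unit * stripe_count)
        index_in_set = (offset // stripe_unit) % stripe_count
        obj = object_set * stripe_count + index_in_set
        return obj, stripe_row * stripe_unit + offset % stripe_unit

    # Stripe unit 16 (the 17th unit), with 4 objects of 4 units each, lands
    # at offset 0 of object 4, the first object of the second object set.
    print(chunk_location(16 * 65536, 65536, 4, 4 * 65536))   # -> (4, 0)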
groups, and consequently doesn't improve performance very much. The following
diagram depicts the simplest form of striping:
-.. ditaa::
+.. ditaa::
+
+---------------+
| Client Data |
| Format |
stripe (``stripe unit 16``) in the first object in the new object set (``object
4`` in the diagram below).
-.. ditaa::
+.. ditaa::
+
+---------------+
| Client Data |
| Format |
provides RESTful APIs with interfaces that are compatible with Amazon S3
and OpenStack Swift.
-- **Filesystem**: The :term:`Ceph Filesystem` (CephFS) service provides
+- **Filesystem**: The :term:`Ceph File System` (CephFS) service provides
a POSIX-compliant filesystem usable with ``mount`` or as
a filesystem in user space (FUSE).
architecture.
.. ditaa::
+
+--------------+ +----------------+ +-------------+
| Block Device | | Object Storage | | CephFS |
+--------------+ +----------------+ +-------------+
Device kernel object(s). This is done with the command-line tool ``rbd``.
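Outside the kernel path, the same can be done programmatically through
``librbd``. A sketch with the Python binding; the pool and image names are
assumptions:

.. code-block:: python

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    # Create a 1 GiB image, then write to it; librbd stripes the image
    # over many Ceph Storage Cluster objects behind the scenes.
    rbd.RBD().create(ioctx, 'myimage', 1024 ** 3)
    with rbd.Image(ioctx, 'myimage') as image:
        image.write(b'boot sector bytes', 0)

    ioctx.close()
    cluster.shutdown()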
-.. index:: CephFS; Ceph Filesystem; libcephfs; MDS; metadata server; ceph-mds
+.. index:: CephFS; Ceph File System; libcephfs; MDS; metadata server; ceph-mds
.. _arch-cephfs:
-Ceph Filesystem
----------------
+Ceph File System
+----------------
-The Ceph Filesystem (CephFS) provides a POSIX-compliant filesystem as a
+The Ceph File System (CephFS) provides a POSIX-compliant filesystem as a
service that is layered on top of the object-based Ceph Storage Cluster.
CephFS files get mapped to objects that Ceph stores in the Ceph Storage
Cluster. Ceph Clients mount a CephFS filesystem as a kernel object or as
a Filesystem in User Space (FUSE).
.. ditaa::
+
+-----------------------+ +------------------------+
| CephFS Kernel Object | | CephFS FUSE |
+-----------------------+ +------------------------+
+---------------+ +---------------+ +---------------+
-The Ceph Filesystem service includes the Ceph Metadata Server (MDS) deployed
+The Ceph File System service includes the Ceph Metadata Server (MDS) deployed
with the Ceph Storage Cluster. The purpose of the MDS is to store all the
filesystem metadata (directories, file ownership, access modes, etc.) in
high-availability Ceph Metadata Servers where the metadata resides in memory.
The reason for the MDS (a daemon called ``ceph-mds``) is that simple filesystem
operations like listing a directory or changing a directory (``ls``, ``cd``)
would tax the Ceph OSD Daemons unnecessarily. So separating the metadata from
-the data means that the Ceph Filesystem can provide high performance services
+the data means that the Ceph File System can provide high-performance services
without taxing the Ceph Storage Cluster.
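File operations go through ``libcephfs``, which consults the MDS for metadata
and the OSDs for data. A sketch with the Python binding; the paths and the
string-style open flags are assumptions:

.. code-block:: python

    import cephfs

    fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
    fs.mount()                      # contacts the MDS for the namespace

    # Metadata operations (mkdir, open) go to the MDS; file data to OSDs.
    fs.mkdir('/mydir', 0o755)
    fd = fs.open('/mydir/hello.txt', 'w', 0o644)
    fs.write(fd, b'hello', 0)
    fs.close(fd)

    fs.shutdown()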
CephFS separates the metadata from the data, storing the metadata in the MDS,