======================
Adding/Removing OSDs
======================

When you have a cluster up and running, you may add OSDs or remove OSDs
from the cluster at runtime.

Adding OSDs
===========

When you want to expand a cluster, you may add an OSD at runtime. With Ceph, an
OSD is generally one Ceph ``ceph-osd`` daemon for one storage drive within a
host machine. If your host has multiple storage drives, you may map one
``ceph-osd`` daemon for each drive.

Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. As your cluster reaches its ``near
full`` ratio, you should add one or more OSDs to expand your cluster's capacity.

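A quick way to check where the cluster stands relative to its ``near full``
ratio is the ``ceph df`` family of commands. The sketch below is a dry run
that only *prints* the commands, since they assume a live cluster; remove the
``run`` helper to execute them for real.

```shell
#!/bin/sh
# Dry-run sketch: commands for checking cluster capacity before adding OSDs.
run() { echo "$*"; }   # print each command instead of executing it

run ceph df            # cluster-wide usage and per-pool statistics
run ceph osd df        # per-OSD utilization and weight
```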
.. warning:: Do not let your cluster reach its ``full ratio`` before
   adding an OSD. OSD failures that occur after the cluster reaches
   its ``near full`` ratio may cause the cluster to exceed its
   ``full ratio``.

Deploy your Hardware
--------------------

If you are adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, first make sure you have an up-to-date version
of Linux installed, and that you have made some initial preparations for your
storage drives. See `Filesystem Recommendations`_ for details.

Add your OSD host to a rack in your cluster, connect it to the network
and ensure that it has network connectivity. See the `Network Configuration
Reference`_ for details.

.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref

Install the Required Software
-----------------------------

For manually deployed clusters, you must install Ceph packages
manually. See `Installing Ceph (Manual)`_ for details.
You should configure SSH for a user with password-less authentication
and root permissions.

.. _Installing Ceph (Manual): ../../../install


Adding an OSD (Manual)
----------------------

This procedure sets up a ``ceph-osd`` daemon, configures it to use one drive,
and configures the cluster to distribute data to the OSD. If your host has
multiple drives, you may add an OSD for each drive by repeating this procedure.

To add an OSD, create a data directory for it, mount a drive to that directory,
add the OSD to the cluster, and then add it to the CRUSH map.

When you add the OSD to the CRUSH map, consider the weight you give to the new
OSD. Because hard drive capacity tends to grow (historically on the order of
40% per year), newer OSD hosts may have larger hard drives than older hosts in
the cluster (i.e., they may have greater weight).

.. tip:: Ceph prefers uniform hardware across pools. If you are adding drives
   of dissimilar size, you can adjust their weights. However, for best
   performance, consider a CRUSH hierarchy with drives of the same type/size.

#. Create the OSD. If no UUID is given, it will be set automatically when the
   OSD starts up. The following command will output the OSD number, which you
   will need for subsequent steps. ::

     ceph osd create [{uuid} [{id}]]

   If the optional parameter {id} is given, it will be used as the OSD id.
   Note that in this case the command may fail if that number is already in use.

   .. warning:: In general, explicitly specifying {id} is not recommended.
      IDs are allocated as an array, and skipping entries consumes some extra
      memory. This can become significant if there are large gaps and/or
      clusters are large. If {id} is not specified, the smallest available is
      used.

#. Create the default directory on your new OSD. ::

     ssh {new-osd-host}
     sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}

#. If the OSD is for a drive other than the OS drive, prepare it
   for use with Ceph, and mount it to the directory you just created. ::

     ssh {new-osd-host}
     sudo mkfs -t {fstype} /dev/{drive}
     sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}

#. Initialize the OSD data directory. ::

     ssh {new-osd-host}
     ceph-osd -i {osd-num} --mkfs --mkkey

   The directory must be empty before you can run ``ceph-osd``.

#. Register the OSD authentication key. The ``ceph`` portion of the
   ``ceph-{osd-num}`` path is the ``$cluster-$id``. If your cluster name
   differs from ``ceph``, use your cluster name instead. ::

     ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring

#. Add the OSD to the CRUSH map so that the OSD can begin receiving data. The
   ``ceph osd crush add`` command allows you to add OSDs to the CRUSH hierarchy
   wherever you wish. If you specify at least one bucket, the command
   will place the OSD into the most specific bucket you specify, *and* it will
   move that bucket underneath any other buckets you specify. **Important:** If
   you specify only the root bucket, the command will attach the OSD directly
   to the root, but CRUSH rules expect OSDs to be inside of hosts.

   For Argonaut (v0.48), execute the following::

     ceph osd crush add {id} {name} {weight} [{bucket-type}={bucket-name} ...]

   For Bobtail (v0.56) and later releases, execute the following::

     ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

   You may also decompile the CRUSH map, add the OSD to the device list, add the
   host as a bucket (if it's not already in the CRUSH map), add the device as an
   item in the host, assign it a weight, recompile the map, and set it. See
   `Add/Move an OSD`_ for details.
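
Taken together, the numbered steps above can be sketched as a single script.
This is a minimal sketch under assumed values, not a definitive procedure: the
OSD id (``3``), device (``/dev/sdb``), filesystem (``xfs``), host bucket
(``osd-host1``), and weight (``1.0``) are all hypothetical, and each command is
only *printed* for review (via the ``run`` helper) rather than executed against
a live cluster.

```shell
#!/bin/bash
# Dry-run sketch of the manual OSD-addition steps above.
# All values are hypothetical; adjust them for your cluster.
osd_id=3                 # normally captured from the output of `ceph osd create`
drive=/dev/sdb
fstype=xfs
host=osd-host1
weight=1.0
data_dir=/var/lib/ceph/osd/ceph-$osd_id

run() { echo "$*"; }     # print each command instead of executing it

run ceph osd create
run mkdir -p "$data_dir"
run mkfs -t "$fstype" "$drive"
run mount -o user_xattr "$drive" "$data_dir"
run ceph-osd -i "$osd_id" --mkfs --mkkey
run ceph auth add "osd.$osd_id" osd 'allow *' mon 'allow rwx' \
    -i "$data_dir/keyring"
run ceph osd crush add "osd.$osd_id" "$weight" host="$host"
```

To execute for real, drop the ``run`` prefix (or change the helper to run its
arguments) and replace each hypothetical value with one from your cluster.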


.. topic:: Argonaut (v0.48) Best Practices

   To limit impact on user I/O performance, add an OSD to the CRUSH map
   with an initial weight of ``0``. Then, ramp up the CRUSH weight a
   little bit at a time. For example, to ramp by increments of ``0.2``,
   start with::

     ceph osd crush reweight {osd-id} .2

   and allow migration to complete before reweighting to ``0.4``,
   ``0.6``, and so on until the desired CRUSH weight is reached.

   To limit the impact of OSD failures, you can set::

     mon osd down out interval = 0

   which prevents down OSDs from automatically being marked out, and then
   ramp them down manually with::

     ceph osd reweight {osd-num} .8

   Again, wait for the cluster to finish migrating data, and then adjust
   the weight further until you reach a weight of ``0``. Note that this
   approach prevents the cluster from automatically re-replicating data
   after a failure, so please ensure that sufficient monitoring is in
   place for an administrator to intervene promptly.

   Note that this practice is no longer necessary in Bobtail and
   subsequent releases.

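The incremental ramp-up described in the best practices above can be sketched
as a small script. This is a dry run under assumed values: the OSD id (``3``)
and the step schedule are hypothetical, and the reweight commands are only
collected and printed, not executed; in real use you would run each one and
wait for data migration to settle before issuing the next.

```shell
#!/bin/bash
# Dry-run sketch: generate the incremental CRUSH reweight commands for
# ramping a new OSD (hypothetical id 3) from 0 to a full weight of 1.0
# in steps of 0.2.
osd_id=3
cmds=()
for w in 0.2 0.4 0.6 0.8 1.0; do
    cmds+=("ceph osd crush reweight osd.$osd_id $w")
done
printf '%s\n' "${cmds[@]}"
```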

Starting the OSD
----------------

After you add an OSD to Ceph, the OSD is in your configuration. However,
it is not yet running. The OSD is ``down`` and ``in``. You must start
your new OSD before it can begin receiving data. You may use
``service ceph`` from your admin host or start the OSD from its host
machine.

For Ubuntu Trusty, use Upstart. ::

  sudo start ceph-osd id={osd-num}

For all other distros, use systemd. ::

  sudo systemctl start ceph-osd@{osd-num}


Once you start your OSD, it is ``up`` and ``in``.

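Assuming a live cluster, you can confirm the new OSD's state with the standard
status commands. As with the earlier sketches, the snippet below only prints
the commands so it can be reviewed anywhere.

```shell
#!/bin/sh
# Dry-run sketch: verify a newly started OSD.
run() { echo "$*"; }   # print each command instead of executing it

run ceph osd stat      # summary: how many OSDs are up/in
run ceph osd tree      # per-OSD up/down status and CRUSH position
```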

Observe the Data Migration
--------------------------

Once you have added your new OSD to the CRUSH map, Ceph will begin rebalancing
the cluster by migrating placement groups to your new OSD. You can observe this
process with the `ceph`_ tool. ::

  ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)


.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring



Removing OSDs (Manual)
======================

When you want to reduce the size of a cluster or replace hardware, you may
remove an OSD at runtime. With Ceph, an OSD is generally one Ceph ``ceph-osd``
daemon for one storage drive within a host machine. If your host has multiple
storage drives, you may need to remove one ``ceph-osd`` daemon for each drive.
Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. Ensure that when you remove an OSD,
your cluster is not at its ``near full`` ratio.

.. warning:: Do not let your cluster reach its ``full ratio`` when
   removing an OSD. Removing OSDs could cause the cluster to reach
   or exceed its ``full ratio``.


Take the OSD out of the Cluster
-------------------------------

Before you remove an OSD, it is usually ``up`` and ``in``. You need to take it
out of the cluster so that Ceph can begin rebalancing and copying its data to
other OSDs. ::

  ceph osd out {osd-num}


Observe the Data Migration
--------------------------

Once you have taken your OSD ``out`` of the cluster, Ceph will begin
rebalancing the cluster by migrating placement groups out of the OSD you
removed. You can observe this process with the `ceph`_ tool. ::

  ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)

.. note:: Sometimes, typically in a "small" cluster with few hosts (for
   instance with a small testing cluster), taking an OSD ``out`` can
   trigger a CRUSH corner case in which some PGs remain stuck in the
   ``active+remapped`` state. If this happens, mark the OSD ``in``
   again with:

   ``ceph osd in {osd-num}``

   to return to the initial state. Then, instead of marking the OSD
   ``out``, set its weight to ``0`` with:

   ``ceph osd crush reweight osd.{osd-num} 0``

   After that, you can observe the data migration, which should run to
   completion. The difference between marking the OSD ``out`` and
   reweighting it to ``0`` is that marking it ``out`` leaves the weight of
   the bucket containing the OSD unchanged, whereas reweighting the OSD to
   ``0`` updates the bucket's weight (decreasing it by the OSD's weight).
   The reweight command may therefore be preferable on a "small" cluster.



Stopping the OSD
----------------

After you take an OSD out of the cluster, it may still be running.
That is, the OSD may be ``up`` and ``out``. You must stop
your OSD before you remove it from the configuration. ::

  ssh {osd-host}
  sudo systemctl stop ceph-osd@{osd-num}

Once you stop your OSD, it is ``down``.


Removing the OSD
----------------

This procedure removes an OSD from the cluster map, removes its authentication
key, removes the OSD from the OSD map, and removes the OSD from the
``ceph.conf`` file. If your host has multiple drives, you may need to remove an
OSD for each drive by repeating this procedure.

#. Remove the OSD from the CRUSH map so that it no longer receives data. You may
   also decompile the CRUSH map, remove the OSD from the device list, remove the
   device as an item in the host bucket or remove the host bucket (if it's in the
   CRUSH map and you intend to remove the host), recompile the map, and set it.
   See `Remove an OSD`_ for details. ::

     ceph osd crush remove {name}

#. Remove the OSD authentication key. ::

     ceph auth del osd.{osd-num}

   The ``ceph`` portion of the ``ceph-{osd-num}`` path is the ``$cluster-$id``.
   If your cluster name differs from ``ceph``, use your cluster name instead.

#. Remove the OSD. ::

     ceph osd rm {osd-num}
     # for example
     ceph osd rm 1

#. Navigate to the host where you keep the master copy of the cluster's
   ``ceph.conf`` file. ::

     ssh {admin-host}
     cd /etc/ceph
     vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if it exists). ::

     [osd.1]
     host = {hostname}

#. From the host where you keep the master copy of the cluster's ``ceph.conf``
   file, copy the updated ``ceph.conf`` file to the ``/etc/ceph`` directory of
   the other hosts in your cluster.

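The removal steps above can likewise be sketched as one dry-run script. The
OSD id (``1``) and host name (``osd-host1``) are hypothetical, and the
commands are printed rather than executed; in real use you would wait for
``ceph -w`` to report ``active+clean`` after the ``out`` step, and editing
``ceph.conf`` and redistributing it remain manual steps.

```shell
#!/bin/sh
# Dry-run sketch of the manual OSD-removal steps above.
# All values are hypothetical; adjust them for your cluster.
osd_id=1
osd_host=osd-host1

run() { echo "$*"; }   # print each command instead of executing it

run ceph osd out "$osd_id"
# ...wait for `ceph -w` to report active+clean before continuing...
run ssh "$osd_host" sudo systemctl stop "ceph-osd@$osd_id"
run ceph osd crush remove "osd.$osd_id"
run ceph auth del "osd.$osd_id"
run ceph osd rm "$osd_id"
# Remaining manual steps: remove the [osd.1] entry from ceph.conf on the
# admin host and copy the updated file to the other hosts.
```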


.. _Remove an OSD: ../crush-map#removeosd