================
Cloud Transition
================

This feature enables data transition to a remote cloud service as part of `Lifecycle Configuration <https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html>`__ via :ref:`storage_classes`. The transition is unidirectional; data cannot be transitioned back from the remote zone. The goal of this feature is to enable data transition to multiple cloud providers. The currently supported cloud providers are those that are compatible with AWS (S3).

A special storage class of tier type ``cloud-s3`` is used to configure the remote cloud S3 object store service to which the data is transitioned. These storage classes are defined in terms of zonegroup placement targets and, unlike regular storage classes, do not need a data pool.

User credentials for the remote cloud object store service need to be configured. Note that source ACLs will not
be preserved. It is possible to map permissions of specific source users to specific destination users.


Cloud Storage Class Configuration
---------------------------------

::

  {
    "access_key": <access>,
    "secret": <secret>,
    "endpoint": <endpoint>,
    "region": <region>,
    "host_style": <path | virtual>,
    "acls": [ { "type": <id | email | uri>,
                "source_id": <source_id>,
                "dest_id": <dest_id> } ... ],
    "target_path": <target_path>,
    "target_storage_class": <target-storage-class>,
    "multipart_sync_threshold": {object_size},
    "multipart_min_part_size": {part_size},
    "retain_head_object": <true | false>
  }

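
A populated configuration might look as follows. This is a sketch only; the
endpoint, credentials, region and target path shown are illustrative
placeholders, not defaults:

::

  {
    "access_key": "ACCESSKEYID",
    "secret": "SECRETKEY",
    "endpoint": "https://s3.example.com",
    "region": "us-east-1",
    "host_style": "path",
    "acls": [],
    "target_path": "rgwx-archive",
    "target_storage_class": "STANDARD",
    "multipart_sync_threshold": 33554432,
    "multipart_min_part_size": 33554432,
    "retain_head_object": true
  }
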

Cloud Transition Specific Configurables:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* ``access_key`` (string)

  The remote cloud S3 access key that will be used for a specific connection.

* ``secret`` (string)

  The secret key for the remote cloud S3 service.

* ``endpoint`` (string)

  URL of the remote cloud S3 service endpoint.

* ``region`` (string)

  The remote cloud S3 service region name.

* ``host_style`` (path | virtual)

  Type of host style to be used when accessing the remote cloud S3 endpoint (default: ``path``).

* ``acls`` (array)

  Contains a list of ``acl_mappings``.

* ``acl_mapping`` (container)

  Each ``acl_mapping`` structure contains ``type``, ``source_id``, and ``dest_id``. These
  define the ACL mutation to be done on each object. An ACL mutation allows converting a source
  user id to a destination id. An example mapping is shown after this list.

* ``type`` (id | email | uri)

  ACL type: ``id`` defines the user id, ``email`` defines the user by email, and ``uri`` defines the user by ``uri`` (group).

* ``source_id`` (string)

  ID of the user in the source zone.

* ``dest_id`` (string)

  ID of the user in the destination.

* ``target_path`` (string)

  A string that defines how the target path is created. The target path specifies a prefix to which
  the source 'bucket-name/object-name' is appended. If not specified, the target_path created is ``rgwx-${zonegroup}-${storage-class}-cloud-bucket``.

  For example: ``target_path = rgwx-archive-${zonegroup}/``

* ``target_storage_class`` (string)

  A string that defines the target storage class to which the object transitions. If not specified, the object is transitioned to the ``STANDARD`` storage class.

* ``retain_head_object`` (true | false)

  If true, the metadata of the object transitioned to the cloud is retained. If false (default), the object is deleted after the transition.
  This option is ignored for current versioned objects. For more details, refer to the "Versioned Objects" section below.

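
As an example of ACL mapping, the following sketch (all user IDs and email
addresses are illustrative) maps one source user to a destination user by id,
and another by email:

::

  "acls": [
    { "type": "id",    "source_id": "testid",           "dest_id": "archiver" },
    { "type": "email", "source_id": "user@example.com", "dest_id": "admin@example.com" }
  ]
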
S3 Specific Configurables:
~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently, cloud transition will only work with backends that are compatible with AWS S3. There are
a few configurables that can be used to tweak its behavior when accessing these cloud services:

::

  {
    "multipart_sync_threshold": {object_size},
    "multipart_min_part_size": {part_size}
  }

* ``multipart_sync_threshold`` (integer)

  Objects of this size or larger will be transitioned to the cloud using multipart upload.

* ``multipart_min_part_size`` (integer)

  Minimum part size to use when transitioning objects using multipart upload.

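
Both values can be adjusted through the ``--tier-config`` mechanism described in
the next section. For example (a sketch only; the zonegroup, placement-id and
values are illustrative):

::

  # radosgw-admin zonegroup placement modify --rgw-zonegroup default \
                                             --placement-id default-placement \
                                             --storage-class CLOUDTIER \
                                             --tier-config=multipart_sync_threshold=67108864,multipart_min_part_size=67108864
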

How to Configure
~~~~~~~~~~~~~~~~

See :ref:`adding_a_storage_class` for how to configure a storage class for a zonegroup. Cloud transition requires the creation of a special storage class with tier type defined as ``cloud-s3``.

.. note:: If you have not done any previous `Multisite Configuration`_,
          a ``default`` zone and zonegroup are created for you, and changes
          to the zone/zonegroup will not take effect until the Ceph Object
          Gateways are restarted. If you have created a realm for multisite,
          the zone/zonegroup changes will take effect once the changes are
          committed with ``radosgw-admin period update --commit``.

::

  # radosgw-admin zonegroup placement add --rgw-zonegroup={zone-group-name} \
                                          --placement-id={placement-id} \
                                          --storage-class={storage-class-name} \
                                          --tier-type=cloud-s3

For example:

::

  # radosgw-admin zonegroup placement add --rgw-zonegroup=default \
                                          --placement-id=default-placement \
                                          --storage-class=CLOUDTIER --tier-type=cloud-s3
  [
      {
          "key": "default-placement",
          "val": {
              "name": "default-placement",
              "tags": [],
              "storage_classes": [
                  "CLOUDTIER",
                  "STANDARD"
              ],
              "tier_targets": [
                  {
                      "key": "CLOUDTIER",
                      "val": {
                          "tier_type": "cloud-s3",
                          "storage_class": "CLOUDTIER",
                          "retain_head_object": "false",
                          "s3": {
                              "endpoint": "",
                              "access_key": "",
                              "secret": "",
                              "host_style": "path",
                              "target_storage_class": "",
                              "target_path": "",
                              "acl_mappings": [],
                              "multipart_sync_threshold": 33554432,
                              "multipart_min_part_size": 33554432
                          }
                      }
                  }
              ]
          }
      }
  ]


.. note:: Once a storage class is created with ``--tier-type=cloud-s3``, it cannot later be modified to any other storage class type.

The tier configuration can then be done using the following command:

::

  # radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
                                             --placement-id={placement-id} \
                                             --storage-class={storage-class-name} \
                                             --tier-config={key}={val}[,{key}={val}]

The ``key`` in the configuration specifies the config variable that needs to be updated, and
the ``val`` specifies its new value.

For example:

::

  # radosgw-admin zonegroup placement modify --rgw-zonegroup default \
                                             --placement-id default-placement \
                                             --storage-class CLOUDTIER \
                                             --tier-config=endpoint=http://XX.XX.XX.XX:YY,access_key=<access_key>,secret=<secret>,multipart_sync_threshold=44432,multipart_min_part_size=44432,retain_head_object=true

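
To review the resulting tier configuration, the zonegroup placement targets can be
listed again; the ``tier_targets`` entry for the storage class should then reflect
the values set above. A usage sketch, assuming the ``default`` zonegroup:

::

  # radosgw-admin zonegroup placement list --rgw-zonegroup default
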
Nested values can be accessed using a period. For example:

::

  # radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
                                             --placement-id={placement-id} \
                                             --storage-class={storage-class-name} \
                                             --tier-config=acls.source_id=${source-id},acls.dest_id=${dest-id}

Configuration array entries can be accessed by enclosing the specific entry to be
referenced in square brackets; a new array entry can be added by using ``[]``.
For example, creating a new acl array entry:

::

  # radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
                                             --placement-id={placement-id} \
                                             --storage-class={storage-class-name} \
                                             --tier-config=acls[].source_id=${source-id},acls[${source-id}].dest_id=${dest-id},acls[${source-id}].type=email

An entry can be removed by using ``--tier-config-rm={key}``.

For example:

::

  # radosgw-admin zonegroup placement modify --rgw-zonegroup default \
                                             --placement-id default-placement \
                                             --storage-class CLOUDTIER \
                                             --tier-config-rm=acls.source_id=testid

  # radosgw-admin zonegroup placement modify --rgw-zonegroup default \
                                             --placement-id default-placement \
                                             --storage-class CLOUDTIER \
                                             --tier-config-rm=target_path

The storage class can be removed using the following command:

::

  # radosgw-admin zonegroup placement rm --rgw-zonegroup={zone-group-name} \
                                         --placement-id={placement-id} \
                                         --storage-class={storage-class-name}

For example:

::

  # radosgw-admin zonegroup placement rm --rgw-zonegroup default \
                                         --placement-id default-placement \
                                         --storage-class CLOUDTIER
  [
      {
          "key": "default-placement",
          "val": {
              "name": "default-placement",
              "tags": [],
              "storage_classes": [
                  "STANDARD"
              ]
          }
      }
  ]

Object modification & Limitations
---------------------------------

The cloud storage class, once configured, can be used like any other storage class in bucket lifecycle rules. For example:

::

  <Transition>
    <StorageClass>CLOUDTIER</StorageClass>
    ....
    ....
  </Transition>

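
A complete lifecycle rule using this storage class could look like the sketch
below; the rule ID, empty prefix and 30-day threshold are illustrative:

::

  <LifecycleConfiguration>
    <Rule>
      <ID>Archive to cloud</ID>
      <Filter>
        <Prefix></Prefix>
      </Filter>
      <Status>Enabled</Status>
      <Transition>
        <Days>30</Days>
        <StorageClass>CLOUDTIER</StorageClass>
      </Transition>
    </Rule>
  </LifecycleConfiguration>

Such a rule could be applied, for example, with ``s3cmd setlifecycle lifecycle.xml s3://bucket``.
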

Since the transition is unidirectional, while configuring S3 lifecycle rules, the cloud storage class should be specified last among all the storage classes the object transitions to. Subsequent rules (if any) do not apply after the transition to the cloud.

Due to API limitations, the original object modification time and ETag cannot be preserved, but they are stored as metadata attributes on the destination objects, as shown below:

::

  x-amz-meta-rgwx-source: rgw
  x-amz-meta-rgwx-source-etag: ed076287532e86365e841e92bfc50d8c
  x-amz-meta-rgwx-source-key: lc.txt
  x-amz-meta-rgwx-source-mtime: 1608546349.757100363
  x-amz-meta-rgwx-versioned-epoch: 0

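
These attributes can be inspected on the destination object in the remote cloud,
for example with the AWS CLI. A usage sketch only; the target bucket, key and
endpoint shown are illustrative:

::

  # aws s3api head-object --bucket rgwx-default-CLOUDTIER-cloud-bucket \
                          --key bucket/lc.txt \
                          --endpoint-url https://s3.example.com
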
By default, the source object is deleted after the transition. But it is possible to retain its metadata, with updated values (such as storage class and object size), by setting the config option ``retain_head_object`` to true. However, GET on those objects will still fail with an ``InvalidObjectState`` error.

For example:

::

  # s3cmd info s3://bucket/lc.txt
  s3://bucket/lc.txt (object):
     File size: 12
     Last mod:  Mon, 21 Dec 2020 10:25:56 GMT
     MIME type: text/plain
     Storage:   CLOUDTIER
     MD5 sum:   ed076287532e86365e841e92bfc50d8c
     SSE:       none
     Policy:    none
     CORS:      none
     ACL:       M. Tester: FULL_CONTROL
     x-amz-meta-s3cmd-attrs: atime:1608466266/ctime:1597606156/gid:0/gname:root/md5:ed076287532e86365e841e92bfc50d8c/mode:33188/mtime:1597605793/uid:0/uname:root

  # s3cmd get s3://bucket/lc.txt lc_restore.txt
  download: 's3://bucket/lc.txt' -> 'lc_restore.txt'  [1 of 1]
  ERROR: S3 error: 403 (InvalidObjectState)

To avoid object name collisions across buckets, the source bucket name is prepended to the target object name. If the object is versioned, the object version id is appended to the end.

The target object name format is shown below:

::

  s3://<target_path>/<source_bucket_name>/<source_object_name>(-<source_object_version_id>)

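
For instance, with the default ``target_path``, the ``default`` zonegroup and the
``CLOUDTIER`` storage class, the object ``lc.txt`` from the bucket ``bucket`` would
be written to a path like the following (a sketch derived from the format above):

::

  s3://rgwx-default-CLOUDTIER-cloud-bucket/bucket/lc.txt
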
Versioned Objects
~~~~~~~~~~~~~~~~~

For versioned and locked objects, semantics similar to those of LifecycleExpiration are applied, as stated below.

* If the object is current, then after transitioning to the cloud, it is made noncurrent and a delete marker is created.

* If the object is noncurrent and is locked, its transition is skipped.


Future Work
-----------

* Send a presigned redirect or read through the objects transitioned to the cloud.

* Support the s3:RestoreObject operation on cloud-transitioned objects.

* Federation between RGW and cloud services.

* Support transition to other cloud providers (such as Azure).

.. _`Multisite Configuration`: ../multisite