===========================
 Rados Gateway Data Layout
===========================

Although the source code is the ultimate guide, this document helps
new developers to get up to speed with the implementation details.

Introduction
------------

Swift offers something called a *container*, which we use interchangeably with
the term *bucket*, so we say that RGW's buckets implement Swift containers.

This document does not consider how RGW operates on these structures,
e.g. the use of ``encode()`` and ``decode()`` methods for serialization and so on.

Conceptual View
---------------

Although RADOS only knows about pools and objects with their xattrs and
omap[1], conceptually RGW organizes its data into three different kinds:
metadata, bucket index, and data.

Metadata
^^^^^^^^

There are three 'sections' of metadata: 'user', 'bucket', and 'bucket.instance'.
You can use the following commands to introspect metadata entries: ::

    $ radosgw-admin metadata list
    $ radosgw-admin metadata list bucket
    $ radosgw-admin metadata list bucket.instance
    $ radosgw-admin metadata list user

    $ radosgw-admin metadata get bucket:<bucket>
    $ radosgw-admin metadata get bucket.instance:<bucket>:<bucket_id>
    $ radosgw-admin metadata get user:<user>   # 'metadata put' sets an entry

The metadata sections used in the commands above are:

- user: Holds user information
- bucket: Holds a mapping between bucket name and bucket instance id
- bucket.instance: Holds bucket instance information[2]
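
The arguments to ``radosgw-admin metadata get`` above follow a ``<section>:<key>`` pattern. The helpers below are a hypothetical illustration of that pattern. The function names and example values are invented, and the helpers only build strings. They do not talk to a cluster.

.. code-block:: python

    # Hypothetical helpers building "<section>:<key>" metadata keys,
    # following the radosgw-admin command formats shown above.

    def bucket_key(bucket: str) -> str:
        # 'bucket' section: maps a bucket name to its bucket instance
        return f"bucket:{bucket}"


    def bucket_instance_key(bucket: str, bucket_id: str) -> str:
        # 'bucket.instance' section: keyed by bucket name and bucket ID
        return f"bucket.instance:{bucket}:{bucket_id}"


    def user_key(user: str) -> str:
        # 'user' section: one entry per user
        return f"user:{user}"


    if __name__ == "__main__":
        # Hypothetical bucket, bucket ID, and user
        print(bucket_key("photos"))                              # → bucket:photos
        print(bucket_instance_key("photos", "default.7593.4"))   # → bucket.instance:photos:default.7593.4
        print(user_key("foo"))                                   # → user:foo

Each resulting string would be the argument to ``radosgw-admin metadata get``.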

Every metadata entry is kept on a single RADOS object. See below for implementation details.

Note that the metadata is not indexed. When listing a metadata section we do a
RADOS ``pgls`` operation on the containing pool.

Bucket Index
^^^^^^^^^^^^

The bucket index is a different kind of metadata, and it is kept separately. It holds
a key-value map in RADOS objects. By default it is a single RADOS object per
bucket, but since Hammer it has been possible to shard that map over multiple RADOS
objects. The map itself is kept in omap, associated with each RADOS object.
Each omap key is the name of an object, and the value holds some basic
metadata of that object, namely the metadata that shows up when listing the bucket.
Also, each omap holds a header, and we keep some bucket accounting metadata
in that header (number of objects, total size, etc.).
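
The following hypothetical in-memory model of a single, unsharded bucket index object shows how the omap and header relate. It is only a sketch. The class and field names are invented, real entries are encoded C++ structures, and the real header holds more accounting than the two counters kept here.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class Entry:
        # Illustrative subset of the per-object metadata shown in listings
        size: int
        etag: str


    @dataclass
    class BucketIndexObject:
        # omap: object name -> Entry
        omap: dict = field(default_factory=dict)
        # header: bucket accounting kept alongside the omap
        num_objects: int = 0
        total_size: int = 0

        def put(self, name: str, size: int, etag: str) -> None:
            old = self.omap.get(name)
            if old is None:
                self.num_objects += 1        # new key: one more object
            else:
                self.total_size -= old.size  # overwrite: drop the old size first
            self.omap[name] = Entry(size, etag)
            self.total_size += size

        def remove(self, name: str) -> None:
            old = self.omap.pop(name, None)
            if old is not None:
                self.num_objects -= 1
                self.total_size -= old.size

        def listing(self, prefix: str = "") -> list:
            # A bucket listing reads omap keys, which are kept in sorted order
            return sorted(k for k in self.omap if k.startswith(prefix))


    if __name__ == "__main__":
        idx = BucketIndexObject()
        idx.put("image.png", 100, "etag-1")
        idx.put("docs/a.txt", 10, "etag-2")
        idx.put("image.png", 50, "etag-3")
        print(idx.listing(), idx.num_objects, idx.total_size)
        # → ['docs/a.txt', 'image.png'] 2 60

Keeping the counters in the header means bucket statistics can be read without iterating over every omap key.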

Note that we also keep other information in the bucket index, in
other key namespaces. We can hold the bucket index log there, and for versioned
objects there is more information that we keep on other keys.

Data
^^^^

Object data is kept in one or more RADOS objects for each RGW object.

Object Lookup Path
------------------

When accessing objects, REST APIs come to RGW with three parameters:
account information (access key in S3 or account name in Swift),
bucket or container name, and object name (or key). At present, RGW only
uses account information to find out the user ID and for access control.
Only the bucket name and object key are used to address the object in a pool.

The user ID in RGW is a string, typically the actual user name from the user
credentials and not a hashed or mapped identifier.

When accessing a user's data, the user record is loaded from an object
"<user_id>" in pool "default.rgw.meta" with namespace "users.uid".

Bucket names are represented in the pool "default.rgw.meta" with namespace
"root". The bucket record is loaded in order to obtain the so-called marker,
which serves as a bucket ID.

The object is located in pool "default.rgw.buckets.data".
The object name is "<marker>_<key>",
for example "default.7593.4_image.png", where the marker is "default.7593.4"
and the key is "image.png". Since these concatenated names are not parsed,
only passed down to RADOS, the choice of the separator is not important and
causes no ambiguity. For the same reason, slashes are permitted in object
names (keys).
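
The lookup path above can be sketched as a sequence of (pool, namespace, object) locations. This is a hypothetical model of an S3 request. The dictionaries stand in for decoded metadata records, and the access-key step uses the ``users.keys`` namespace listed in the appendix. The data object is assumed to live in the default (empty) namespace.

.. code-block:: python

    from dataclasses import dataclass

    META_POOL = "default.rgw.meta"          # pool names as given in this document
    DATA_POOL = "default.rgw.buckets.data"


    @dataclass(frozen=True)
    class Location:
        pool: str
        namespace: str   # "" stands for the default (empty) namespace (assumption)
        oid: str


    def head_object(marker: str, key: str) -> Location:
        # Plain concatenation; the result is never split apart again, so a "_"
        # or "/" inside the key causes no ambiguity.
        return Location(DATA_POOL, "", f"{marker}_{key}")


    def resolve(access_key: str, bucket: str, key: str,
                keys: dict, markers: dict) -> list:
        """Return, in order, the RADOS objects consulted to reach an object's head.

        `keys` (access key -> user ID) and `markers` (bucket name -> marker) are
        hypothetical stand-ins for the decoded contents of the metadata records.
        """
        steps = [Location(META_POOL, "users.keys", access_key)]
        user_id = keys[access_key]                      # account info -> user ID
        steps.append(Location(META_POOL, "users.uid", user_id))
        steps.append(Location(META_POOL, "root", bucket))
        marker = markers[bucket]                        # bucket record -> marker
        steps.append(head_object(marker, key))
        return steps


    if __name__ == "__main__":
        for loc in resolve("47UA98JSTJZ9YAN3OS3O", "photos", "image.png",
                           {"47UA98JSTJZ9YAN3OS3O": "foo"},
                           {"photos": "default.7593.4"}):
            print(loc)

Only the last step depends on the object key. The user and bucket records are the same for every object in the bucket.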

It is also possible to create multiple data pools and arrange for
different users' buckets to be created in different RADOS pools by default,
thus providing the necessary scaling. The layout and naming of these pools
is controlled by a 'policy' setting.[3]

An RGW object may consist of several RADOS objects, the first of which
is the head that contains the metadata, such as manifest, ACLs, content type,
ETag, and user-defined metadata. The metadata is stored in xattrs.
The head may also contain up to :confval:`rgw_max_chunk_size` of object data, for efficiency
and atomicity. The manifest describes how each object is laid out in RADOS
objects.
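
The following simplified, hypothetical sketch shows how an object's data might be split between the head and tail RADOS objects. It assumes the head takes up to ``max_chunk`` bytes (standing in for :confval:`rgw_max_chunk_size`) and that the remainder is cut into tail pieces of at most ``stripe`` bytes. Both values are parameters chosen for illustration. Multipart uploads are not modeled. Neither are the actual tail object names that the manifest records, such as the ``__shadow_`` example in the appendix.

.. code-block:: python

    def split_object(size: int, max_chunk: int, stripe: int) -> list:
        """Return (part, length) pairs: the head first, then tail parts.

        Simplified model: the head holds up to `max_chunk` bytes of data and
        the rest is cut into tail pieces of at most `stripe` bytes.
        """
        head = min(size, max_chunk)
        parts = [("head", head)]
        remaining = size - head
        n = 1
        while remaining > 0:
            length = min(remaining, stripe)
            parts.append((f"tail_{n}", length))
            remaining -= length
            n += 1
        return parts


    if __name__ == "__main__":
        MiB = 1 << 20
        # Illustrative sizes only: a 10 MiB object with 4 MiB limits
        print(split_object(10 * MiB, 4 * MiB, 4 * MiB))
        # → [('head', 4194304), ('tail_1', 4194304), ('tail_2', 2097152)]

Small objects fit entirely in the head, so a single RADOS object holds both their metadata and their data.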

Bucket and Object Listing
-------------------------

Buckets that belong to a given user are listed in an omap of an object named
"<user_id>.buckets" (for example, "foo.buckets") in pool "default.rgw.meta"
with namespace "users.uid".
These objects are accessed when listing buckets, when updating bucket
contents, and when updating and retrieving bucket statistics (e.g. for quota).
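
The naming scheme of the ``users.uid`` namespace can be summarized with two small helpers. This is a sketch under assumptions. The function names are hypothetical, and the ``<tenant>$<user>`` form for tenanted users is inferred from the examples in the appendix (``prodtx$prodt``, ``prodtx$prodt.buckets``).

.. code-block:: python

    def user_oid(user: str, tenant: str = "") -> str:
        # The user record lives in an object named after the user;
        # a non-empty tenant is prefixed and joined with "$" (inferred).
        return f"{tenant}${user}" if tenant else user


    def user_buckets_oid(user: str, tenant: str = "") -> str:
        # The omap of this object lists the user's buckets.
        return user_oid(user, tenant) + ".buckets"


    if __name__ == "__main__":
        print(user_buckets_oid("foo"))              # → foo.buckets
        print(user_buckets_oid("prodt", "prodtx"))  # → prodtx$prodt.buckets

Both objects share the same namespace, so the ``.buckets`` suffix is what distinguishes a bucket list from a user record.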

See the user-visible, encoded class 'cls_user_bucket_entry' and its
nested class 'cls_user_bucket' for the values of these omap entries.

These listings are kept consistent with the bucket records (namespace "root"
of pool "default.rgw.meta"; pool ".rgw" in older setups).

Objects that belong to a given bucket are listed in a bucket index,
as discussed in sub-section 'Bucket Index' above. The default naming
for index objects is ".dir.<marker>" in pool "default.rgw.buckets.index".
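
The following hypothetical sketch shows how index object names could be derived, including the sharded case described in the appendix. It makes two assumptions for illustration. The first is a ``.`` separator between the marker and the shard number. The second is CRC-32 (from Python's ``zlib``) as the hash that maps a key to a shard. RGW uses its own hash function, so real shard assignments will differ.

.. code-block:: python

    import zlib
    from typing import Optional


    def index_oid(marker: str, shard: Optional[int] = None) -> str:
        # Unsharded index: a single ".dir.<marker>" object per bucket
        if shard is None:
            return f".dir.{marker}"
        # Sharded index: the shard number follows the marker ("." is assumed)
        return f".dir.{marker}.{shard}"


    def shard_for_key(key: str, num_shards: int) -> int:
        # Stand-in hash (CRC-32, not RGW's): each key maps to exactly one
        # shard, so each object's entry lives in one index object's omap
        return zlib.crc32(key.encode("utf-8")) % num_shards


    def index_oid_for_key(marker: str, key: str, num_shards: int) -> str:
        if num_shards <= 1:
            return index_oid(marker)
        return index_oid(marker, shard_for_key(key, num_shards))


    if __name__ == "__main__":
        print(index_oid("default.7593.4"))  # → .dir.default.7593.4
        print(index_oid_for_key("default.7593.4", "image.png", 8))

Because a key always hashes to the same shard, updating one object touches a single index object. A full bucket listing, however, has to merge the sorted keys of all shards.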

Footnotes
---------

[1] Omap is a key-value store, associated with an object, in a way similar
to how Extended Attributes associate with a POSIX file. An object's omap
is not physically located in the object's storage, but its precise
implementation is invisible and immaterial to RADOS Gateway.
In Hammer, LevelDB is used to store omap data within each OSD; later releases
default to RocksDB but can be configured to use LevelDB.

[2] Before the Dumpling release, the 'bucket.instance' metadata did not
exist and the 'bucket' metadata contained its information. It is possible
to encounter such buckets in old installations.

[3] Pool names changed with the Infernalis release.
If you are looking at an older setup, some details may be different. In
particular there was a different pool for each of the namespaces that are
now being used inside the ``default.rgw.meta`` pool.

Appendix: Compendium
--------------------

Known pools:

.rgw.root
  Unspecified region, zone, and global information records, one per object.

<zone>.rgw.control
  notify.<N>

<zone>.rgw.meta
  Multiple namespaces with different kinds of metadata:

  namespace: root
    <bucket>
    .bucket.meta.<bucket>:<marker>   # see put_bucket_instance_info()

    The tenant is used to disambiguate buckets, but not bucket instances.
    Example::

      .bucket.meta.prodtx:test%25star:default.84099.6
      .bucket.meta.testcont:default.4126.1
      .bucket.meta.prodtx:testcont:default.84099.4
      prodtx/testcont
      prodtx/test%25star
      testcont

  namespace: users.uid
    Contains *both* per-user information (RGWUserInfo) in "<user>" objects
    and per-user lists of buckets in omaps of "<user>.buckets" objects.
    The "<user>" may contain the tenant if non-empty, for example::

      prodtx$prodt
      test2.buckets
      prodtx$prodt.buckets
      test2

  namespace: users.email
    Index of users by email address; not important for this discussion.

  namespace: users.keys
    47UA98JSTJZ9YAN3OS3O

    This allows ``radosgw`` to look up users by their access keys during authentication.

  namespace: users.swift
    test:tester

<zone>.rgw.buckets.index
  Objects are named ".dir.<marker>", each containing a bucket index.
  If the index is sharded, each shard appends the shard index after
  the marker.

<zone>.rgw.buckets.data
  default.7593.4__shadow_.488urDFerTYXavx4yAd-Op8mxehnvTI_1
  <marker>_<key>

An example of a marker would be "default.16004.1" or "default.7593.4".
The current format is "<zone>.<instance_id>.<bucket_id>", but once
generated, a marker is not parsed again, so its format may change
freely in the future.
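
The example names in the ``root`` namespace above can be reproduced with two hypothetical helpers. The patterns are inferred from those examples only: ``<tenant>/<bucket>`` for bucket records and ``.bucket.meta.[<tenant>:]<bucket>:<marker>`` for bucket instances. Because markers are never parsed, the helpers only build names and never split them apart.

.. code-block:: python

    def entrypoint_oid(bucket: str, tenant: str = "") -> str:
        # Bucket records: "<tenant>/<bucket>", or just "<bucket>" without a tenant
        return f"{tenant}/{bucket}" if tenant else bucket


    def instance_oid(bucket: str, marker: str, tenant: str = "") -> str:
        # Bucket instance records: ".bucket.meta.<tenant>:<bucket>:<marker>",
        # with the "<tenant>:" part omitted when the tenant is empty
        prefix = f"{tenant}:" if tenant else ""
        return f".bucket.meta.{prefix}{bucket}:{marker}"


    if __name__ == "__main__":
        print(entrypoint_oid("testcont", "prodtx"))
        # → prodtx/testcont
        print(instance_oid("testcont", "default.84099.4", "prodtx"))
        # → .bucket.meta.prodtx:testcont:default.84099.4

Treating the marker as an opaque string keeps these helpers valid even if the marker format changes.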