=================
 Troubleshooting
=================


The Gateway Won't Start
=======================

If you cannot start the gateway (i.e., there is no existing ``pid``),
check to see if there is an existing ``.asok`` file from another
user. If an ``.asok`` file from another user exists and there is no
running ``pid``, remove the ``.asok`` file and try to start the
process again. This can occur when you start the process as ``root``
while the startup script is trying to start it as the ``www-data`` or
``apache`` user, and an existing ``.asok`` file is preventing the
script from starting the daemon.
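
For example, a stale socket can be found and removed by hand. The socket
name below is illustrative; the actual path and name depend on your
configuration (by default, admin sockets live under ``/var/run/ceph``)::

  ls -l /var/run/ceph/*.asok
  rm /var/run/ceph/ceph-client.rgw.asok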

The radosgw init script (``/etc/init.d/radosgw``) also has a verbose argument
that can provide some insight as to what the issue could be::

  /etc/init.d/radosgw start -v

or ::

  /etc/init.d/radosgw start --verbose

HTTP Request Errors
===================

Examining the access and error logs for the web server itself is
probably the first step in identifying what is going on. If there is
a 500 error, that usually indicates a problem communicating with the
``radosgw`` daemon. Ensure the daemon is running, its socket path is
configured, and that the web server is looking for it in the proper
location.
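
As a quick first pass, something like the following can help (the log path
is an assumption; it varies by web server and distribution)::

  ps aux | grep radosgw                   # is the daemon running?
  tail /var/log/apache2/error.log         # recent web server errors
  ls -l /var/run/ceph/                    # is the expected socket present?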


Crashed ``radosgw`` process
===========================

If the ``radosgw`` process dies, you will normally see a 500 error
from the web server (Apache, nginx, etc.). In that situation, simply
restarting ``radosgw`` will restore service.
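
The restart command depends on your init system; for example (the systemd
unit name here is an assumption and varies by deployment)::

  systemctl restart ceph-radosgw@rgw.gateway-node1
  # or, on sysvinit systems:
  /etc/init.d/radosgw restart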

To diagnose the cause of the crash, check the log in ``/var/log/ceph``
and/or the core file (if one was generated).
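
For example (the log file name is illustrative, and ``coredumpctl`` is only
available on systemd-based hosts with ``systemd-coredump`` in use)::

  grep -iE 'segv|abort|assert' /var/log/ceph/client.rgw.log
  coredumpctl list radosgw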


Blocked ``radosgw`` Requests
============================

If some (or all) radosgw requests appear to be blocked, you can get
some insight into the internal state of the ``radosgw`` daemon via
its admin socket. By default, there will be a socket configured to
reside in ``/var/run/ceph``, and the daemon can be queried with::

  ceph daemon /var/run/ceph/client.rgw help

  help                list available commands
  objecter_requests   show in-progress osd requests
  perfcounters_dump   dump perfcounters value
  perfcounters_schema dump perfcounters schema
  version             get protocol version

Of particular interest::

  ceph daemon /var/run/ceph/client.rgw objecter_requests
  ...

will dump information about current in-progress requests with the
RADOS cluster. This allows one to identify if any requests are blocked
by a non-responsive OSD. For example, one might see::

  { "ops": [
        { "tid": 1858,
          "pg": "2.d2041a48",
          "osd": 1,
          "last_sent": "2012-03-08 14:56:37.949872",
          "attempts": 1,
          "object_id": "fatty_25647_object1857",
          "object_locator": "@2",
          "snapid": "head",
          "snap_context": "0=[]",
          "mtime": "2012-03-08 14:56:37.949813",
          "osd_ops": [
                "write 0~4096"]},
        { "tid": 1873,
          "pg": "2.695e9f8e",
          "osd": 1,
          "last_sent": "2012-03-08 14:56:37.970615",
          "attempts": 1,
          "object_id": "fatty_25647_object1872",
          "object_locator": "@2",
          "snapid": "head",
          "snap_context": "0=[]",
          "mtime": "2012-03-08 14:56:37.970555",
          "osd_ops": [
                "write 0~4096"]}],
    "linger_ops": [],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": []}

In this dump, two requests are in progress. The ``last_sent`` field is
the time the RADOS request was sent. If this was a while ago, it suggests
that the OSD is not responding. For example, for request 1858, you could
check the OSD status with::

  ceph pg map 2.d2041a48

  osdmap e9 pg 2.d2041a48 (2.0) -> up [1,0] acting [1,0]

This tells us to look at ``osd.1``, the primary copy for this PG::

  ceph daemon osd.1 ops
  { "num_ops": 651,
    "ops": [
        { "description": "osd_op(client.4124.0:1858 fatty_25647_object1857 [write 0~4096] 2.d2041a48)",
          "received_at": "1331247573.344650",
          "age": "25.606449",
          "flag_point": "waiting for sub ops",
          "client_info": { "client": "client.4124",
              "tid": 1858}},
  ...

The ``flag_point`` field indicates that the OSD is currently waiting
for replicas to respond, in this case ``osd.0``.
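
A plausible next step (the OSD id here matches the example above) is to
locate that replica and inspect its in-flight ops in the same way::

  ceph osd find 0        # report the host where osd.0 is running
  ceph daemon osd.0 ops  # run this on that host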


Java S3 API Troubleshooting
===========================


Peer Not Authenticated
----------------------

You may receive an error that looks like this::

  [java] INFO: Unable to execute HTTP request: peer not authenticated

The Java SDK for S3 requires a valid certificate from a recognized certificate
authority, because it uses HTTPS by default. If you are just testing the Ceph
Object Storage services, you can resolve this problem in a few ways:

#. Prepend the IP address or hostname with ``http://``. For example, change this::

     conn.setEndpoint("myserver");

   To::

     conn.setEndpoint("http://myserver");

#. After setting your credentials, add a client configuration and set the
   protocol to ``Protocol.HTTP``. ::

     AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);

     ClientConfiguration clientConfig = new ClientConfiguration();
     clientConfig.setProtocol(Protocol.HTTP);

     AmazonS3 conn = new AmazonS3Client(credentials, clientConfig);


405 MethodNotAllowed
--------------------

You may receive an error that looks like this::

  [java] Exception in thread "main" Status Code: 405, AWS Service: Amazon S3, AWS Request ID: null, AWS Error Code: MethodNotAllowed, AWS Error Message: null, S3 Extended Request ID: null

If you receive a 405 error, check to see if you have the S3 subdomain set up
correctly. You will need a wildcard entry in your DNS record for subdomain
functionality to work properly. Also, check to ensure that the default site
is disabled.
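
With a wildcard record in place, any bucket-style subdomain should resolve
to the gateway. The hostnames and address below are illustrative, and
``rgw_dns_name`` must be set to the matching domain::

  dig +short anybucket.objects.example.com
  192.0.2.10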


Numerous objects in default.rgw.meta pool
=========================================

Clusters created prior to *jewel* have a metadata archival feature enabled by
default, using the ``default.rgw.meta`` pool. This archive keeps all old
versions of user and bucket metadata, resulting in large numbers of objects
in the ``default.rgw.meta`` pool.

Disabling the Metadata Heap
---------------------------

Users who want to disable this feature going forward should set the
``metadata_heap`` field to an empty string ``""``::

  $ radosgw-admin zone get --rgw-zone=default > zone.json
  [edit zone.json, setting "metadata_heap": ""]
  $ radosgw-admin zone set --rgw-zone=default --infile=zone.json
  $ radosgw-admin period update --commit

This will stop new metadata from being written to the ``default.rgw.meta``
pool, but it does not remove any existing objects or the pool itself.
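
To confirm the change took effect, the zone configuration can be re-checked;
the field should now be empty (the exact output formatting may differ)::

  $ radosgw-admin zone get --rgw-zone=default | grep metadata_heap
      "metadata_heap": "",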

Cleaning the Metadata Heap Pool
-------------------------------

Clusters created prior to *jewel* normally use ``default.rgw.meta`` only for
the metadata archival feature.

However, from *luminous* onwards, radosgw uses :ref:`Pool Namespaces
<radosgw-pool-namespaces>` within ``default.rgw.meta`` for an entirely
different purpose: to store ``user_keys`` and other critical metadata.

Users should check the zone configuration before proceeding with any cleanup::

  $ radosgw-admin zone get --rgw-zone=default | grep default.rgw.meta
  [should not match any strings]

Having confirmed that the pool is not used for any purpose, you may safely
delete all objects in the ``default.rgw.meta`` pool, or optionally delete
the entire pool itself.
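
For example (destructive; run only after the check above confirms the pool
is unused, and note that deleting a pool requires ``mon_allow_pool_delete``
to be enabled)::

  # remove every object in the pool
  rados purge default.rgw.meta --yes-i-really-really-mean-it

  # or remove the pool entirely
  ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it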