=================
 Troubleshooting
=================


The Gateway Won't Start
=======================

If you cannot start the gateway (i.e., there is no existing ``pid``),
check to see if there is a stale ``.asok`` file left behind by another
user. If a ``.asok`` file from another user exists and there is no
running ``pid``, remove the ``.asok`` file and try to start the process
again. This can happen when you start the process as the ``root`` user,
but the startup script tries to start it as the ``www-data`` or
``apache`` user, and the existing ``.asok`` file prevents the script
from starting the daemon.

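For example, you can check for a leftover admin socket before restarting.
The path below is the default admin socket directory; the exact file name
depends on your configuration::

 # list any leftover admin sockets
 ls -l /var/run/ceph/*.asok

 # after confirming no radosgw process is running, remove the stale file
 rm /var/run/ceph/<stale-socket-name>.asok
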
The radosgw init script (``/etc/init.d/radosgw``) also has a verbose argument
that can provide some insight as to what the issue could be::

 /etc/init.d/radosgw start -v

or ::

 /etc/init.d/radosgw start --verbose

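On systemd-based deployments there is no init script; instead, you can
inspect the unit and its journal. This is a sketch assuming the common
``ceph-radosgw@`` unit template with an instance named ``rgw.gateway-node1``;
substitute your own instance name::

 systemctl status ceph-radosgw@rgw.gateway-node1
 journalctl -u ceph-radosgw@rgw.gateway-node1
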
HTTP Request Errors
===================

Examining the access and error logs for the web server itself is
probably the first step in identifying what is going on. If there is
a 500 error, that usually indicates a problem communicating with the
``radosgw`` daemon. Ensure the daemon is running, its socket path is
configured, and that the web server is looking for it in the proper
location.

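For example, you can watch the web server's error log while reproducing the
failing request. The path below assumes Apache on a Debian-style system;
adjust for your distribution and web server::

 tail -f /var/log/apache2/error.log
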
Crashed ``radosgw`` process
===========================

If the ``radosgw`` process dies, you will normally see a 500 error
from the web server (Apache, nginx, etc.). In that situation, simply
restarting ``radosgw`` will restore service.

To diagnose the cause of the crash, check the log in ``/var/log/ceph``
and/or the core file (if one was generated).

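A minimal recovery sketch, again assuming a systemd deployment with an
instance named ``rgw.gateway-node1`` (substitute your own instance name and
log file)::

 # restart the gateway to restore service
 systemctl restart ceph-radosgw@rgw.gateway-node1

 # then look for the cause of the crash in the daemon's log
 less /var/log/ceph/ceph-client.rgw.gateway-node1.log
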
Blocked ``radosgw`` Requests
============================

If some (or all) radosgw requests appear to be blocked, you can get
some insight into the internal state of the ``radosgw`` daemon via
its admin socket. By default, there will be a socket configured to
reside in ``/var/run/ceph``, and the daemon can be queried with::

 ceph daemon /var/run/ceph/client.rgw help

 help                 list available commands
 objecter_requests    show in-progress osd requests
 perfcounters_dump    dump perfcounters value
 perfcounters_schema  dump perfcounters schema
 version              get protocol version

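The socket name varies with the daemon's name; if ``client.rgw`` does not
match your setup, list the sockets in the default directory and use the
matching path::

 ls /var/run/ceph/*.asok
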
Of particular interest::

 ceph daemon /var/run/ceph/client.rgw objecter_requests
 ...

will dump information about current in-progress requests with the
RADOS cluster. This allows one to identify if any requests are blocked
by a non-responsive OSD. For example, one might see::

 { "ops": [
       { "tid": 1858,
         "pg": "2.d2041a48",
         "osd": 1,
         "last_sent": "2012-03-08 14:56:37.949872",
         "attempts": 1,
         "object_id": "fatty_25647_object1857",
         "object_locator": "@2",
         "snapid": "head",
         "snap_context": "0=[]",
         "mtime": "2012-03-08 14:56:37.949813",
         "osd_ops": [
               "write 0~4096"]},
       { "tid": 1873,
         "pg": "2.695e9f8e",
         "osd": 1,
         "last_sent": "2012-03-08 14:56:37.970615",
         "attempts": 1,
         "object_id": "fatty_25647_object1872",
         "object_locator": "@2",
         "snapid": "head",
         "snap_context": "0=[]",
         "mtime": "2012-03-08 14:56:37.970555",
         "osd_ops": [
               "write 0~4096"]}],
   "linger_ops": [],
   "pool_ops": [],
   "pool_stat_ops": [],
   "statfs_ops": []}

In this dump, two requests are in progress. The ``last_sent`` field is
the time the RADOS request was sent. If this was a while ago, it suggests
that the OSD is not responding. For example, for request 1858, you could
check the OSD status with::

 ceph pg map 2.d2041a48

 osdmap e9 pg 2.d2041a48 (2.0) -> up [1,0] acting [1,0]

This tells us to look at ``osd.1``, the primary copy for this PG::

 ceph daemon osd.1 ops
 { "num_ops": 651,
   "ops": [
       { "description": "osd_op(client.4124.0:1858 fatty_25647_object1857 [write 0~4096] 2.d2041a48)",
         "received_at": "1331247573.344650",
         "age": "25.606449",
         "flag_point": "waiting for sub ops",
         "client_info": { "client": "client.4124",
             "tid": 1858}},
 ...

The ``flag_point`` field indicates that the OSD is currently waiting
for replicas to respond, in this case ``osd.0``.

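From there, a natural follow-up is to check the state of the replica the
primary is waiting on (``osd.0`` in this example), e.g. whether it is up
and whether the cluster reports it as slow or down::

 ceph osd tree
 ceph health detail
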
Java S3 API Troubleshooting
===========================


Peer Not Authenticated
----------------------

You may receive an error that looks like this::

 [java] INFO: Unable to execute HTTP request: peer not authenticated

The Java SDK for S3 requires a valid certificate from a recognized certificate
authority, because it uses HTTPS by default. If you are just testing the Ceph
Object Storage services, you can resolve this problem in a few ways:

#. Prepend the IP address or hostname with ``http://``. For example, change this::

      conn.setEndpoint("myserver");

   to::

      conn.setEndpoint("http://myserver");

#. After setting your credentials, add a client configuration and set the
   protocol to ``Protocol.HTTP``. ::

      AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);

      ClientConfiguration clientConfig = new ClientConfiguration();
      clientConfig.setProtocol(Protocol.HTTP);

      AmazonS3 conn = new AmazonS3Client(credentials, clientConfig);


405 MethodNotAllowed
--------------------

You may receive an error that looks like this::

 [java] Exception in thread "main" Status Code: 405, AWS Service: Amazon S3, AWS Request ID: null, AWS Error Code: MethodNotAllowed, AWS Error Message: null, S3 Extended Request ID: null

If you receive a 405 error, check to see if you have the S3 subdomain set up
correctly. You will need a wildcard entry in your DNS record for subdomain
functionality to work properly.

Also, check to ensure that the default site is disabled.
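
As an illustration of the two pieces involved, assuming a gateway reachable
at ``myserver.example.com`` (the hostname, section name, and zone-file syntax
here are examples only), the gateway must know its subdomain root::

 # ceph.conf -- the section name depends on how your gateway instance is named
 [client.rgw]
 rgw_dns_name = myserver.example.com

and the DNS zone needs a wildcard record pointing at the gateway::

 *.myserver.example.com. IN CNAME myserver.example.com.
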
Numerous objects in default.rgw.meta pool
=========================================

Clusters created prior to *jewel* have a metadata archival feature enabled by default, using the ``default.rgw.meta`` pool.
This archive keeps all old versions of user and bucket metadata, resulting in large numbers of objects in the ``default.rgw.meta`` pool.

Disabling the Metadata Heap
---------------------------

Users who want to disable this feature going forward should set the ``metadata_heap`` field to an empty string ``""``::

 $ radosgw-admin zone get --rgw-zone=default > zone.json
 [edit zone.json, setting "metadata_heap": ""]
 $ radosgw-admin zone set --rgw-zone=default --infile=zone.json
 $ radosgw-admin period update --commit

This will stop new metadata from being written to the ``default.rgw.meta`` pool, but it does not remove any existing objects or the pool itself.

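The manual edit step can also be scripted; a sketch assuming ``jq`` is
installed::

 $ jq '.metadata_heap = ""' zone.json > zone.edited.json
 $ radosgw-admin zone set --rgw-zone=default --infile=zone.edited.json
 $ radosgw-admin period update --commit
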
Cleaning the Metadata Heap Pool
-------------------------------

Clusters created prior to *jewel* normally use ``default.rgw.meta`` only for the metadata archival feature.

However, from *luminous* onwards, radosgw uses :ref:`Pool Namespaces <radosgw-pool-namespaces>` within ``default.rgw.meta`` for an entirely different purpose, namely to store ``user_keys`` and other critical metadata.

Users should check the zone configuration before proceeding with any cleanup procedures::

 $ radosgw-admin zone get --rgw-zone=default | grep default.rgw.meta
 [should not match any strings]

Having confirmed that the pool is not used for any purpose, users may safely delete all objects in the ``default.rgw.meta`` pool, or optionally delete the entire pool itself.
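
A sketch of the final cleanup, assuming the check above matched nothing.
Pool deletion is irreversible, and it additionally requires
``mon_allow_pool_delete`` to be set to ``true`` on the monitors::

 $ ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it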