=================
 Troubleshooting
=================


The Gateway Won't Start
=======================

If you cannot start the gateway (i.e., there is no existing ``pid``),
check whether an ``.asok`` file owned by another user already exists.
If it does and there is no running ``pid``, remove the ``.asok`` file
and try to start the process again. This can happen when you start the
process as the ``root`` user while the startup script tries to start it
as the ``www-data`` or ``apache`` user: the existing ``.asok`` file
prevents the script from starting the daemon.

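The check for a leftover ``.asok`` file can be scripted. The following is a
minimal sketch, not part of the Ceph tooling: it assumes the admin sockets live
in ``/var/run/ceph`` and treats a socket file as stale when nothing accepts
connections on it. The function name ``find_stale_sockets`` is hypothetical.

```python
import glob
import os
import socket


def find_stale_sockets(run_dir="/var/run/ceph"):
    """Return paths of .asok files that no process is listening on."""
    stale = []
    for path in sorted(glob.glob(os.path.join(run_dir, "*.asok"))):
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(path)
        except ConnectionRefusedError:
            # The file exists but nothing accepts connections on it.
            stale.append(path)
        except OSError:
            # Permission problems or other errors: leave the file alone.
            pass
        finally:
            probe.close()
    return stale


if __name__ == "__main__":
    for path in find_stale_sockets():
        owner_uid = os.stat(path).st_uid
        print(f"stale socket: {path} (owner uid {owner_uid})")
```

The script only reports candidates. Confirm that no ``radosgw`` process is
running before removing any file it lists.
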
The ``radosgw`` init script (``/etc/init.d/radosgw``) also accepts a verbose
argument that can provide some insight into what the issue could be::

  /etc/init.d/radosgw start -v

or ::

  /etc/init.d/radosgw start --verbose

HTTP Request Errors
===================

Examining the web server's access and error logs is usually the first
step in identifying what is going on. A 500 error usually indicates a
problem communicating with the ``radosgw`` daemon. Ensure that the daemon
is running, that its socket path is configured, and that the web server
is looking for it in the proper location.


Crashed ``radosgw`` process
===========================

If the ``radosgw`` process dies, you will normally see a 500 error
from the web server (apache, nginx, etc.). In that situation, simply
restarting ``radosgw`` will restore service.

To diagnose the cause of the crash, check the log in ``/var/log/ceph``
and/or the core file (if one was generated).


Blocked ``radosgw`` Requests
============================

If some (or all) ``radosgw`` requests appear to be blocked, you can get
some insight into the internal state of the ``radosgw`` daemon via
its admin socket. By default, there will be a socket configured to
reside in ``/var/run/ceph``, and the daemon can be queried with::

  ceph daemon /var/run/ceph/client.rgw help

  help                list available commands
  objecter_requests   show in-progress osd requests
  perfcounters_dump   dump perfcounters value
  perfcounters_schema dump perfcounters schema
  version             get protocol version

Of particular interest::

  ceph daemon /var/run/ceph/client.rgw objecter_requests
  ...

will dump information about current in-progress requests with the
RADOS cluster. This allows one to identify if any requests are blocked
by a non-responsive OSD. For example, one might see::

  { "ops": [
        { "tid": 1858,
          "pg": "2.d2041a48",
          "osd": 1,
          "last_sent": "2012-03-08 14:56:37.949872",
          "attempts": 1,
          "object_id": "fatty_25647_object1857",
          "object_locator": "@2",
          "snapid": "head",
          "snap_context": "0=[]",
          "mtime": "2012-03-08 14:56:37.949813",
          "osd_ops": [
                "write 0~4096"]},
        { "tid": 1873,
          "pg": "2.695e9f8e",
          "osd": 1,
          "last_sent": "2012-03-08 14:56:37.970615",
          "attempts": 1,
          "object_id": "fatty_25647_object1872",
          "object_locator": "@2",
          "snapid": "head",
          "snap_context": "0=[]",
          "mtime": "2012-03-08 14:56:37.970555",
          "osd_ops": [
                "write 0~4096"]}],
    "linger_ops": [],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": []}

In this dump, two requests are in progress. The ``last_sent`` field is
the time the RADOS request was sent. If that time is well in the past, it
suggests that the OSD is not responding. For example, for request 1858,
you could check the OSD status with::

  ceph pg map 2.d2041a48

  osdmap e9 pg 2.d2041a48 (2.0) -> up [1,0] acting [1,0]

This tells us to look at ``osd.1``, the primary copy for this PG::

  ceph daemon osd.1 ops
  { "num_ops": 651,
    "ops": [
          { "description": "osd_op(client.4124.0:1858 fatty_25647_object1857 [write 0~4096] 2.d2041a48)",
            "received_at": "1331247573.344650",
            "age": "25.606449",
            "flag_point": "waiting for sub ops",
            "client_info": { "client": "client.4124",
                "tid": 1858}},
  ...

The ``flag_point`` field indicates that the OSD is currently waiting
for replicas to respond, in this case ``osd.0``.


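Spotting old ``last_sent`` values by eye gets tedious when many requests are in
flight. The sketch below filters an ``objecter_requests`` dump for requests that
have been waiting longer than a threshold. It assumes the JSON layout and
timestamp format shown in the example dump above (other Ceph releases may format
timestamps differently). The function name ``slow_requests`` and the 30-second
default are illustrative choices, not Ceph defaults.

```python
import json
import sys
from datetime import datetime, timedelta

# Timestamp layout used by "last_sent" in the example dump (an assumption).
LAST_SENT_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def slow_requests(dump, now, max_age=timedelta(seconds=30)):
    """Return (tid, osd, pg, age_in_seconds) for requests older than max_age."""
    slow = []
    for op in dump.get("ops", []):
        sent = datetime.strptime(op["last_sent"], LAST_SENT_FORMAT)
        age = now - sent
        if age > max_age:
            slow.append((op["tid"], op["osd"], op["pg"], age.total_seconds()))
    return slow


if __name__ == "__main__" and not sys.stdin.isatty():
    # Reads a dump piped in on standard input.
    text = sys.stdin.read()
    if text.strip():
        for tid, osd, pg, age in slow_requests(json.loads(text), datetime.now()):
            print(f"tid {tid} on osd.{osd} (pg {pg}) has waited {age:.1f}s")
```

Piping the output of the ``objecter_requests`` command into this script prints
one line per slow request. The reported ``pg`` can then be passed to
``ceph pg map`` as described above.
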
Java S3 API Troubleshooting
===========================


Peer Not Authenticated
----------------------

You may receive an error that looks like this::

  [java] INFO: Unable to execute HTTP request: peer not authenticated

The Java SDK for S3 requires a valid certificate from a recognized certificate
authority, because it uses HTTPS by default. If you are just testing the Ceph
Object Storage services, you can resolve this problem in either of two ways:

#. Prepend the IP address or hostname with ``http://``. For example, change this::

      conn.setEndpoint("myserver");

   To::

      conn.setEndpoint("http://myserver");

#. After setting your credentials, add a client configuration and set the
   protocol to ``Protocol.HTTP``. ::

      AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);

      ClientConfiguration clientConfig = new ClientConfiguration();
      clientConfig.setProtocol(Protocol.HTTP);

      AmazonS3 conn = new AmazonS3Client(credentials, clientConfig);



405 MethodNotAllowed
--------------------

If you receive a 405 error, check whether the S3 subdomain is set up correctly.
Subdomain functionality requires a wildcard setting in your DNS record.

Also, check that the default site is disabled. ::

  [java] Exception in thread "main" Status Code: 405, AWS Service: Amazon S3, AWS Request ID: null, AWS Error Code: MethodNotAllowed, AWS Error Message: null, S3 Extended Request ID: null



Numerous objects in default.rgw.meta pool
=========================================

Clusters created prior to *jewel* have a metadata archival feature enabled by default, using the ``default.rgw.meta`` pool.
This archive keeps all old versions of user and bucket metadata, resulting in large numbers of objects in the ``default.rgw.meta`` pool.

Disabling the Metadata Heap
---------------------------

Users who want to disable this feature going forward should set the ``metadata_heap`` field to an empty string ``""``::

  $ radosgw-admin zone get --rgw-zone=default > zone.json
  [edit zone.json, setting "metadata_heap": ""]
  $ radosgw-admin zone set --rgw-zone=default --infile=zone.json
  $ radosgw-admin period update --commit

This will stop new metadata from being written to the ``default.rgw.meta`` pool, but it does not remove any existing objects or the pool itself.

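The manual edit of ``zone.json`` can be scripted if it has to be repeated on
several clusters. This is a minimal sketch that assumes ``zone.json`` is the
file written by ``radosgw-admin zone get`` and that ``metadata_heap`` is a
top-level key in it. The helper names are hypothetical.

```python
import json


def disable_metadata_heap(zone):
    """Return a copy of the zone configuration with metadata_heap emptied."""
    updated = dict(zone)
    updated["metadata_heap"] = ""
    return updated


def rewrite_zone_file(path="zone.json"):
    """Load the zone file, empty its metadata_heap field, and save it."""
    with open(path) as handle:
        zone = json.load(handle)
    with open(path, "w") as handle:
        json.dump(disable_metadata_heap(zone), handle, indent=4)
```

After calling ``rewrite_zone_file()``, the ``zone set`` and ``period update``
commands shown above still have to be run to apply the change.
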
Cleaning the Metadata Heap Pool
-------------------------------

Clusters created prior to *jewel* normally use ``default.rgw.meta`` only for the metadata archival feature.

However, from *luminous* onwards, radosgw uses :ref:`Pool Namespaces <radosgw-pool-namespaces>` within ``default.rgw.meta`` for an entirely different purpose, that is, to store ``user_keys`` and other critical metadata.

Users should check the zone configuration before proceeding with any cleanup procedures::

  $ radosgw-admin zone get --rgw-zone=default | grep default.rgw.meta
  [should not match any strings]

Having confirmed that the pool is not used for any purpose, users may safely delete all objects in the ``default.rgw.meta`` pool, or optionally, delete the entire pool itself.
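
The ``grep`` check can also be expressed as a small function that reports
which fields of the zone configuration still mention the pool. This is a sketch
under assumptions: it takes the parsed JSON output of ``radosgw-admin zone get``
and, like ``grep``, matches the pool name as a substring, so namespaced values
such as ``default.rgw.meta:users.keys`` are caught as well. The function name
and the field names in the usage example are illustrative.

```python
def find_pool_references(config, pool="default.rgw.meta", path=""):
    """Return the key paths of every string value that mentions the pool."""
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else key
            hits += find_pool_references(value, pool, child)
    elif isinstance(config, list):
        for index, value in enumerate(config):
            hits += find_pool_references(value, pool, f"{path}[{index}]")
    elif isinstance(config, str) and pool in config:
        hits.append(path)
    return hits


if __name__ == "__main__":
    # Illustrative zone fragment, not real cluster output.
    zone = {"metadata_heap": "", "user_keys_pool": "default.rgw.meta:users.keys"}
    print(find_pool_references(zone))
```

An empty result corresponds to ``grep`` matching nothing. Any reported path
means the pool is still in use and must not be cleaned.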