================
cephadm Exporter
================

There are a number of long-running tasks that the cephadm 'binary' performs, each of which can take several
seconds to complete. This latency represents a scalability challenge to the Ceph orchestrator management plane.

To address this, cephadm needs to be able to run some of these longer running tasks asynchronously - this
frees up processing on the mgr by offloading tasks to each host, reduces latency and improves scalability.

This document describes the implementation requirements and design for an 'exporter' feature.


Requirements
============
The exporter should address these functional and non-functional requirements:

* run as a normal systemd unit
* utilise the same filesystem schema as other services deployed with cephadm
* require only python3 standard library modules (no external dependencies)
* use encryption to protect the data flowing from a host to the Ceph mgr
* execute data gathering tasks as background threads
* be easily extended to include more data gathering tasks
* monitor itself for the health of the data gathering threads
* cache metadata to respond to queries quickly
* respond to a metadata query in <30ms to support large Ceph clusters (1000's of nodes)
* provide CLI interaction to enable the exporter to be deployed either at bootstrap time, or once the
  cluster has been deployed
* be deployed as a normal orchestrator service (similar to the node-exporter)

High Level Design
=================

This section will focus on the exporter logic **only**.

.. code::

    Establish a metadata cache object (tasks will be represented by separate attributes)
    Create a thread for each data gathering task: host, ceph-volume and list_daemons
        each thread updates its own attribute within the cache object
    Start a server instance passing requests to a specific request handler
        the request handler only interacts with the cache object
        the request handler passes metadata back to the caller
    Main Loop
        Leave the loop if a 'stop' request is received
        check thread health
            if a thread that was active is now inactive
                update the cache marking the task as inactive
                update the cache with an error message for that task
        wait for n secs


In the initial implementation, the exporter exposes this metadata through a RESTful API.
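
The main loop above can be sketched in python3 (the exporter's own language). ``CephadmCache`` here is a
heavily simplified stand-in for the real cache object; the task names follow the pseudocode, while the
attribute layout and function names are purely illustrative.

```python
import threading


class CephadmCache:
    """Simplified cache: one attribute per data gathering task, plus health."""
    def __init__(self):
        self.lock = threading.Lock()
        self.host = {}
        self.disks = {}
        self.daemons = {}
        self.health = {}   # task name -> {'active': bool, 'error': str}

    def update(self, attr, data):
        # each gathering thread updates only its own attribute
        with self.lock:
            setattr(self, attr, data)


def check_thread_health(cache, workers):
    """Mark any task whose thread has died as inactive, recording an error."""
    for task, thread in workers.items():
        state = cache.health.get(task, {})
        if state.get('active') and not thread.is_alive():
            with cache.lock:
                cache.health[task] = {
                    'active': False,
                    'error': f"thread for '{task}' is no longer running",
                }
```

The real main loop would call a check like this every n seconds until a 'stop' request is received.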


Security
========

The cephadm 'binary' only supports standard python3 features, which has meant the RESTful API has been
developed using the http module, which itself is not intended for production use. However, the implementation
is not complex (based only on HTTPServer and BaseHTTPRequestHandler) and only supports the GET method - so the
security risk is perceived as low.

Current mgr to host interactions occur within an ssh connection, so the goal of the exporter is to adopt a similar
security model.

The initial REST API is implemented with the following features:

* generic self-signed, or user provided SSL crt/key to encrypt traffic between the mgr and the host
* 'token' based authentication of the request

All exporter instances will use the **same** crt/key to secure the link from the mgr to the host(s), in the same way
that the ssh access uses the same public key and port for each host connection.

.. note:: Since the same SSL configuration is used on every exporter, when you supply your own settings you must
   ensure that the CN or SAN components of the distinguished name are either **not** used or created using wildcard naming.

The crt, key and token files are all defined with restrictive permissions (600), to help mitigate the risk of exposure
to any other user on the Ceph cluster node(s).
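
The 'token' based authentication described above can be illustrated with a short python3 sketch; the
function name and header handling here are illustrative, not the actual cephadm code.

```python
import hmac


def authorised(headers: dict, expected_token: str) -> bool:
    """Return True only when the request carries the expected bearer token."""
    auth = headers.get('Authorization', '')
    if not auth.startswith('Bearer '):
        return False
    supplied = auth[len('Bearer '):]
    # compare_digest performs a constant-time comparison, avoiding a
    # timing side channel on the token value
    return hmac.compare_digest(supplied, expected_token)
```

A request handler would run a check like this on every request, answering 401 Unauthorized whenever it
returns False.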

Administrator Interaction
=========================
Several new commands are required to configure the exporter, and additional parameters should be added to the bootstrap
process to allow the exporter to be deployed automatically for new clusters.


Enhancements to the 'bootstrap' process
---------------------------------------
bootstrap should support additional parameters to automatically configure exporter daemons across hosts.

``--with-exporter``

By using this flag, you're telling the bootstrap process to include the cephadm-exporter service within the
cluster. If you do not provide a specific configuration (SSL, token, port) to use, defaults will be applied.

``--exporter-config``

With the --exporter-config option, you may pass your own SSL, token and port information. The file must be in
JSON format and contain the following fields: crt, key, token and port. The JSON content should be validated, and any
errors detected reported back to the user during the argument parsing phase (before any changes are made).
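
A sketch of the kind of validation this implies, in python3; the required field names come from the text
above, while the function and the specific checks are illustrative.

```python
import json

# required fields per the --exporter-config description above
REQUIRED_FIELDS = {'crt', 'key', 'token', 'port'}


def validate_exporter_config(raw: str) -> list:
    """Return a list of problems found in an --exporter-config document."""
    try:
        cfg = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    errors = []
    missing = REQUIRED_FIELDS - set(cfg)
    if missing:
        errors.append("missing fields: " + ", ".join(sorted(missing)))
    if 'port' in cfg and not isinstance(cfg['port'], int):
        errors.append("port must be an integer")
    return errors
```

An empty list means the configuration is acceptable; anything else would be reported during argument parsing.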


Additional ceph commands
------------------------
::

   # ceph cephadm generate-exporter-config

This command will generate a default configuration consisting of: a self-signed certificate, a randomly generated
32 character token and the default port of 9443 for the REST API.
::

   # ceph cephadm set-exporter-config -i <config.json>

Use a JSON file to define the crt, key, token and port for the REST API. The crt, key and token are validated by
the mgr/cephadm module prior to storing the values in the KV store. Invalid or missing entries should be reported to the
user.
::

   # ceph cephadm clear-exporter-config

Clear the current configuration (removes the associated keys from the KV store)
::

   # ceph cephadm get-exporter-config

Show the current exporter configuration, in JSON format


.. note:: If the service is already deployed, any attempt to change or clear the configuration will
   be denied. In order to change settings you must remove the service, apply the required configuration
   and re-apply (``ceph orch apply cephadm-exporter``)



New Ceph Configuration Keys
===========================
The exporter configuration is persisted to the monitor's KV store, with the following keys:

| mgr/cephadm/exporter_config
| mgr/cephadm/exporter_enabled



RESTful API
===========
The primary goal of the exporter is the provision of metadata from the host to the mgr. This interaction takes
place over a simple GET interface. Although only the GET method is supported, the API provides multiple URLs to
provide different views on the metadata that has been gathered.

.. csv-table:: Supported URL endpoints
   :header: "URL", "Purpose"

   "/v1/metadata", "show all metadata including health of all threads"
   "/v1/metadata/health", "only report on the health of the data gathering threads"
   "/v1/metadata/disks", "show the disk output (ceph-volume inventory data)"
   "/v1/metadata/host", "show host related metadata from the gather-facts command"
   "/v1/metadata/daemons", "show the status of all ceph cluster related daemons on the host"

Return Codes
------------
The following HTTP return codes are generated by the API:

.. csv-table:: Supported HTTP Responses
   :header: "Status Code", "Meaning"

   "200", "OK"
   "204", "the thread associated with this request is no longer active, no data is returned"
   "206", "some threads have stopped, so some content is missing"
   "401", "request is not authorised - check your token is correct"
   "404", "URL is malformed, not found"
   "500", "all threads have stopped - unable to provide any metadata for the host"
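
From the caller's side, the table above suggests a simple triage, sketched here in python3 (the helper
name and the action strings are illustrative):

```python
def classify_response(status: int) -> str:
    """Translate an exporter HTTP status into a caller-side action."""
    if status == 200:
        return "ok"
    if status in (204, 206):
        # some or all of the data for this view is missing
        return "partial"
    if status == 401:
        return "check token"
    if status == 404:
        return "check URL"
    if status == 500:
        return "exporter unhealthy"
    return "unexpected status"
```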


Deployment
==========
During the initial phases of the exporter implementation, deployment is regarded as optional but is available
to new clusters and existing clusters that have the feature (Pacific and above).

* new clusters : use the ``--with-exporter`` option
* existing clusters : you'll need to set the configuration and deploy the service manually

.. code::

   # ceph cephadm generate-exporter-config
   # ceph orch apply cephadm-exporter

If you choose to remove the cephadm-exporter service, you may simply run

.. code::

   # ceph orch rm cephadm-exporter

This will remove the daemons, and the exporter related settings stored in the KV store.


Management
==========
Once the exporter is deployed, you can use the following snippet to extract the host's metadata.

.. code-block:: python

    import ssl
    import json
    import sys
    import tempfile
    import time
    from urllib.request import Request, urlopen

    # CHANGE THIS V
    hostname = "rh8-1.storage.lab"

    print("Reading config.json")
    try:
        with open('./config.json', 'r') as f:
            raw = f.read()
    except FileNotFoundError:
        print("You must first create a config.json file using the cephadm get-exporter-config command")
        sys.exit(1)

    cfg = json.loads(raw)
    with tempfile.NamedTemporaryFile(buffering=0) as t:
        print("creating a temporary local crt file from the json")
        t.write(cfg['crt'].encode('utf-8'))

        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.load_verify_locations(t.name)
        hdrs = {"Authorization": f"Bearer {cfg['token']}"}
        print("Issuing call to gather metadata")
        req = Request(f"https://{hostname}:9443/v1/metadata", headers=hdrs)
        s_time = time.time()
        r = urlopen(req, context=ctx)
        print(r.status)
        print("call complete")
        if r.status in [200, 206]:
            raw = r.read()  # bytes
            js = json.loads(raw.decode())
            print(json.dumps(js, indent=2))
        elapsed = time.time() - s_time
        print(f"Elapsed secs : {elapsed}")


.. note:: the above example uses python3, and assumes that you've extracted the config using the ``get-exporter-config`` command.


Implementation Specific Details
===============================

In the same way as a typical container based deployment, the exporter is deployed to a directory under ``/var/lib/ceph/<fsid>``. The
cephadm binary is stored in this cluster folder, and the daemon's configuration and systemd settings are stored
under ``/var/lib/ceph/<fsid>/cephadm-exporter.<id>/``.

.. code::

   [root@rh8-1 cephadm-exporter.rh8-1]# pwd
   /var/lib/ceph/cb576f70-2f72-11eb-b141-525400da3eb7/cephadm-exporter.rh8-1
   [root@rh8-1 cephadm-exporter.rh8-1]# ls -al
   total 24
   drwx------. 2 root root  100 Nov 25 18:10 .
   drwx------. 8 root root  160 Nov 25 23:19 ..
   -rw-------. 1 root root 1046 Nov 25 18:10 crt
   -rw-------. 1 root root 1704 Nov 25 18:10 key
   -rw-------. 1 root root   64 Nov 25 18:10 token
   -rw-------. 1 root root   38 Nov 25 18:10 unit.configured
   -rw-------. 1 root root   48 Nov 25 18:10 unit.created
   -rw-r--r--. 1 root root  157 Nov 25 18:10 unit.run


In order to respond to requests quickly, the CephadmDaemon uses a cache object (CephadmCache) to hold the results
of the cephadm commands.

The exporter doesn't introduce any new data gathering capability - instead it merely calls the existing cephadm commands.

The CephadmDaemon class creates a local HTTP server (using ThreadingMixIn), secured with TLS, and uses the CephadmDaemonHandler
to handle the requests. The request handler inspects the request header and looks for a valid Bearer token - if this is invalid
or missing the caller receives a 401 Unauthorized error.

The 'run' method of the CephadmDaemon class places the scrape_* methods into different threads, with each thread supporting
a different refresh interval. Each thread then periodically issues its cephadm command, and places the output
in the cache object.

In addition to the command output, each thread also maintains its own timestamp record in the cache so the caller can
very easily determine the age of the data it's received.

If the underlying cephadm command execution hits an exception, the thread passes control to a _handle_thread_exception method.
Here the exception is logged to the daemon's log file and the exception details are added to the cache, providing visibility
of the problem to the caller.
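
The scrape-thread pattern described in the last three paragraphs can be sketched as follows; the cache
layout, the ``scrape_loop`` name and the intervals are illustrative rather than the actual CephadmDaemon code.

```python
import threading
import time


def scrape_loop(cache: dict, task: str, gather, interval: float,
                stop: threading.Event):
    """Periodically run gather(), recording its output, a timestamp, or the error."""
    while not stop.is_set():
        entry = {'scrape_timestamp': time.time()}
        try:
            entry['data'] = gather()
        except Exception as exc:
            # analogous to _handle_thread_exception: make the failure
            # visible to callers instead of silently going stale
            entry['exception'] = repr(exc)
        cache[task] = entry
        stop.wait(interval)
```

The daemon's run method would start one such thread per scrape_* method, each with its own refresh interval.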

Although each thread is effectively given its own URL endpoint (host, disks, daemons), the recommended way to gather data from
the host is to simply use the ``/v1/metadata`` endpoint. This will provide all of the data, and indicate whether any of the
threads have failed.

The run method uses "signal" to establish a reload hook, but in the initial implementation this doesn't take any action and simply
logs that a reload was received.


Future Work
===========

#. Consider the potential of adding a restart policy for threads
#. Once the exporter is fully integrated into mgr/cephadm, the goal would be to make the exporter the
   default means of data gathering. However, until then the exporter will remain as an opt-in 'feature
   preview'.