================
cephadm Exporter
================

There are a number of long running tasks that the cephadm 'binary' performs, some of which can take
several seconds to complete. This latency represents a scalability challenge to the Ceph orchestrator
management plane.

To address this, cephadm needs to be able to run some of these longer running tasks asynchronously. This
frees up processing on the mgr by offloading tasks to each host, reduces latency and improves scalability.

This document describes the implementation requirements and design for an 'exporter' feature.


Requirements
============
The exporter should address these functional and non-functional requirements:

* run as a normal systemd unit
* utilise the same filesystem schema as other services deployed with cephadm
* require only python3 standard library modules (no external dependencies)
* use encryption to protect the data flowing from a host to the Ceph mgr
* execute data gathering tasks as background threads
* be easily extended to include more data gathering tasks
* monitor itself for the health of the data gathering threads
* cache metadata to respond to queries quickly
* respond to a metadata query in <30ms to support large Ceph clusters (1000s of nodes)
* provide CLI interaction to enable the exporter to be deployed either at bootstrap time, or once the
  cluster has been deployed
* be deployed as a normal orchestrator service (similar to the node-exporter)

High Level Design
=================

This section will focus on the exporter logic **only**.

.. code::

    Establish a metadata cache object (tasks will be represented by separate attributes)
    Create a thread for each data gathering task; host, ceph-volume and list_daemons
        each thread updates its own attribute within the cache object
    Start a server instance passing requests to a specific request handler
        the request handler only interacts with the cache object
        the request handler passes metadata back to the caller
    Main Loop
        Leave the loop if a 'stop' request is received
        check thread health
            if a thread that was active is now inactive
                update the cache marking the task as inactive
                update the cache with an error message for that task
        wait for n secs


In the initial implementation, the exporter exposes this metadata through a RESTful API.

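The sketch below illustrates this control flow using only python3 standard library modules. It is
a minimal illustration of the design, not the actual cephadm code; the class, task and function
names here are hypothetical, and the gather functions are placeholders for the real cephadm commands.

.. code-block:: python

    import threading
    import time


    class MetadataCache:
        """Each data gathering task is represented by a separate attribute."""
        def __init__(self):
            self._lock = threading.Lock()
            self.host = {}
            self.disks = {}
            self.daemons = {}
            self.health = {}

        def update(self, task, content):
            with self._lock:
                setattr(self, task, content)


    def scrape(cache, task, gather, interval):
        """Periodically run a gather function, placing its output in the cache."""
        while True:
            cache.update(task, gather())
            time.sleep(interval)


    cache = MetadataCache()
    tasks = {
        "host": lambda: {},     # placeholder for 'cephadm gather-facts'
        "disks": lambda: {},    # placeholder for 'ceph-volume inventory'
        "daemons": lambda: {},  # placeholder for 'cephadm ls'
    }
    threads = {}
    for name, gather in tasks.items():
        threads[name] = threading.Thread(target=scrape, args=(cache, name, gather, 30), daemon=True)
        threads[name].start()

    stop_event = threading.Event()      # a 'stop' request would set this event
    while not stop_event.wait(10):      # main loop: leave when 'stop' is received
        for name, t in threads.items():
            if not t.is_alive():        # a thread that was active is now inactive
                cache.update("health", {name: "inactive - no longer gathering data"})
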
Security
========

The cephadm 'binary' is restricted to standard python3 features, which means the RESTful API has been
developed using the http module - a module which is not intended for production use. However, the implementation
is not complex (based only on HTTPServer and BaseHTTPRequestHandler) and only supports the GET method, so the
security risk is perceived as low.

Current mgr to host interactions occur within an ssh connection, so the goal of the exporter is to adopt a similar
security model.

The initial REST API is implemented with the following features:

* a generic self-signed, or user provided, SSL crt/key to encrypt traffic between the mgr and the host
* 'token' based authentication of the request

All exporter instances will use the **same** crt/key to secure the link from the mgr to the host(s), in the same way
that the ssh access uses the same public key and port for each host connection.

.. note:: Since the same SSL configuration is used on every exporter, when you supply your own settings you must
   ensure that the CN or SAN components of the distinguished name are either **not** used or created using wildcard naming.

The crt, key and token files are all created with restrictive permissions (600), to help mitigate against the risk of exposure
to any other user on the Ceph cluster node(s).

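As an illustration of the token check, a BaseHTTPRequestHandler subclass can reject any request
that lacks the expected Bearer token before doing any other work. This is a hypothetical sketch,
not the actual cephadm handler:

.. code-block:: python

    from http.server import BaseHTTPRequestHandler

    TOKEN = "s3cr3t"  # stand-in for the shared token read from the token file

    class TokenCheckedHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.headers.get("Authorization") != f"Bearer {TOKEN}":
                self.send_error(401, "Unauthorized")   # invalid or missing token
                return
            self.send_response(200)
            self.send_header("Content-type", "application/json")
            self.end_headers()
            self.wfile.write(b"{}")  # metadata from the cache would be returned here
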
Administrator Interaction
=========================
Several new commands are required to configure the exporter, and additional parameters should be added to the bootstrap
process to allow the exporter to be deployed automatically for new clusters.


Enhancements to the 'bootstrap' process
---------------------------------------
bootstrap should support additional parameters to automatically configure exporter daemons across hosts.

``--with-exporter``

By using this flag, you're telling the bootstrap process to include the cephadm-exporter service within the
cluster. If you do not provide a specific configuration (SSL, token, port) to use, defaults will be applied.

``--exporter-config``

With the ``--exporter-config`` option, you may pass your own SSL, token and port information. The file must be in
JSON format and contain the following fields: crt, key, token and port. The JSON content should be validated, and any
errors detected passed back to the user during the argument parsing phase (before any changes are done).
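
For example, a config file passed with ``--exporter-config`` might look like this (the values
shown are placeholders; crt and key hold the full PEM text):

.. code-block:: json

    {
        "crt": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n",
        "key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "token": "<random 32 character string>",
        "port": 9443
    }
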
Additional ceph commands
------------------------
::

    # ceph cephadm generate-exporter-config

This command will generate a default configuration consisting of: a self-signed certificate, a randomly generated
32 character token and the default port of 9443 for the REST API.
::

    # ceph cephadm set-exporter-config -i <config.json>

Use a JSON file to define the crt, key, token and port for the REST API. The crt, key and token are validated by
the mgr/cephadm module prior to storing the values in the KV store. Invalid or missing entries should be reported to the
user.
::

    # ceph cephadm clear-exporter-config

Clear the current configuration (removes the associated keys from the KV store).
::

    # ceph cephadm get-exporter-config

Show the current exporter configuration, in JSON format.


.. note:: If the service is already deployed, any attempt to change or clear the configuration will
   be denied. In order to change settings you must remove the service, apply the required configuration
   and re-apply (``ceph orch apply cephadm-exporter``).



New Ceph Configuration Keys
===========================
The exporter configuration is persisted to the monitor's KV store, with the following keys:

| mgr/cephadm/exporter_config
| mgr/cephadm/exporter_enabled



RESTful API
===========
The primary goal of the exporter is the provision of metadata from the host to the mgr. This interaction takes
place over a simple GET interface. Although only the GET method is supported, the API provides multiple URLs to
provide different views on the metadata that has been gathered.

.. csv-table:: Supported URL endpoints
   :header: "URL", "Purpose"

   "/v1/metadata", "show all metadata, including the health of all threads"
   "/v1/metadata/health", "only report on the health of the data gathering threads"
   "/v1/metadata/disks", "show the disk output (ceph-volume inventory data)"
   "/v1/metadata/host", "show host related metadata from the gather-facts command"
   "/v1/metadata/daemons", "show the status of all ceph cluster related daemons on the host"

Return Codes
------------
The following HTTP return codes are generated by the API:

.. csv-table:: Supported HTTP Responses
   :header: "Status Code", "Meaning"

   "200", "OK"
   "204", "the thread associated with this request is no longer active, no data is returned"
   "206", "some threads have stopped, so some content is missing"
   "401", "request is not authorised - check that your token is correct"
   "404", "URL is malformed, not found"
   "500", "all threads have stopped - unable to provide any metadata for the host"


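Note that python's urllib raises ``HTTPError`` for the 4xx and 5xx codes, so a standard library
caller needs to handle both paths. The sketch below shows one way to translate these codes; the
function name and error messages are illustrative, not part of the exporter itself:

.. code-block:: python

    import json
    from urllib.error import HTTPError
    from urllib.request import Request, urlopen

    def fetch_metadata(url, token, ctx):
        """Issue a GET and translate the exporter's status codes (sketch)."""
        req = Request(url, headers={"Authorization": f"Bearer {token}"})
        try:
            r = urlopen(req, context=ctx)
        except HTTPError as e:
            if e.code == 401:
                raise RuntimeError("invalid or missing token") from e
            if e.code == 500:
                raise RuntimeError("all gather threads have stopped") from e
            raise
        if r.status == 204:
            return None              # thread no longer active, no data returned
        if r.status == 206:
            print("warning: partial content - some gather threads have stopped")
        return json.loads(r.read().decode())

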
Deployment
==========
During the initial phases of the exporter implementation, deployment is regarded as optional, but it is available
to new clusters and existing clusters that have the feature (Pacific and above).

* new clusters: use the ``--with-exporter`` option
* existing clusters: you'll need to set the configuration and deploy the service manually

.. code::

    # ceph cephadm generate-exporter-config
    # ceph orch apply cephadm-exporter

If you choose to remove the cephadm-exporter service, you may simply run

.. code::

    # ceph orch rm cephadm-exporter

This will remove the daemons, and the exporter related settings stored in the KV store.


Management
==========
Once the exporter is deployed, you can use the following snippet to extract the host's metadata.

.. code-block:: python

    import ssl
    import json
    import sys
    import tempfile
    import time
    from urllib.request import Request, urlopen

    # CHANGE THIS V
    hostname = "rh8-1.storage.lab"

    print("Reading config.json")
    try:
        with open('./config.json', 'r') as f:
            raw = f.read()
    except FileNotFoundError:
        print("You must first create a config.json file using the cephadm get-exporter-config command")
        sys.exit(1)

    cfg = json.loads(raw)
    with tempfile.NamedTemporaryFile(buffering=0) as t:
        print("creating a temporary local crt file from the json")
        t.write(cfg['crt'].encode('utf-8'))

        # verify the server against the cluster's shared certificate, but skip
        # hostname checking since every exporter shares the same crt/key
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.load_verify_locations(t.name)
        hdrs = {"Authorization": f"Bearer {cfg['token']}"}
        print("Issuing call to gather metadata")
        req = Request(f"https://{hostname}:9443/v1/metadata", headers=hdrs)
        s_time = time.time()
        r = urlopen(req, context=ctx)
        print(r.status)
        print("call complete")
        if r.status in [200, 206]:
            raw = r.read()  # bytes
            js = json.loads(raw.decode())
            print(json.dumps(js, indent=2))
        elapsed = time.time() - s_time
        print(f"Elapsed secs : {elapsed}")


.. note:: The above example uses python3, and assumes that you've extracted the config using the ``get-exporter-config`` command.


Implementation Specific Details
===============================

In the same way as a typical container-based deployment, the exporter is deployed to a directory under ``/var/lib/ceph/<fsid>``. The
cephadm binary is stored in this cluster folder, and the daemon's configuration and systemd settings are stored
under ``/var/lib/ceph/<fsid>/cephadm-exporter.<id>/``.

.. code::

    [root@rh8-1 cephadm-exporter.rh8-1]# pwd
    /var/lib/ceph/cb576f70-2f72-11eb-b141-525400da3eb7/cephadm-exporter.rh8-1
    [root@rh8-1 cephadm-exporter.rh8-1]# ls -al
    total 24
    drwx------. 2 root root  100 Nov 25 18:10 .
    drwx------. 8 root root  160 Nov 25 23:19 ..
    -rw-------. 1 root root 1046 Nov 25 18:10 crt
    -rw-------. 1 root root 1704 Nov 25 18:10 key
    -rw-------. 1 root root   64 Nov 25 18:10 token
    -rw-------. 1 root root   38 Nov 25 18:10 unit.configured
    -rw-------. 1 root root   48 Nov 25 18:10 unit.created
    -rw-r--r--. 1 root root  157 Nov 25 18:10 unit.run


In order to respond to requests quickly, the CephadmDaemon uses a cache object (CephadmCache) to hold the results
of the cephadm commands.

The exporter doesn't introduce any new data gathering capability - instead it merely calls the existing cephadm commands.

The CephadmDaemon class creates a local HTTP server (using ThreadingMixIn), secured with TLS, and uses the CephadmDaemonHandler
to handle the requests. The request handler inspects the request header and looks for a valid Bearer token - if this is invalid
or missing, the caller receives a 401 Unauthorized error.

The 'run' method of the CephadmDaemon class places the scrape_* methods into different threads, with each thread supporting
a different refresh interval. Each thread then periodically issues its cephadm command, and places the output
in the cache object.

In addition to the command output, each thread also maintains its own timestamp record in the cache so the caller can
very easily determine the age of the data it's received.

If the underlying cephadm command execution hits an exception, the thread passes control to a ``_handle_thread_exception`` method.
Here the exception is logged to the daemon's log file and the exception details are added to the cache, providing visibility
of the problem to the caller.

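Conceptually, such a handler only needs to log the failure and record it where the API can see it.
A minimal sketch, assuming the health record is a plain dict (the real cephadm method differs in
detail):

.. code-block:: python

    import logging
    import traceback

    logger = logging.getLogger(__name__)

    def _handle_thread_exception(exc, thread_name, health):
        """Log the exception and surface the details to API callers (sketch)."""
        logger.error("data gathering thread '%s' failed", thread_name, exc_info=exc)
        health[thread_name] = {
            "active": False,
            "error": repr(exc),
            "traceback": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
        }
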
Although each thread is effectively given its own URL endpoint (host, disks, daemons), the recommended way to gather data from
the host is to simply use the ``/v1/metadata`` endpoint. This will provide all of the data, and indicate whether any of the
threads have failed.

The run method uses "signal" to establish a reload hook, but in the initial implementation this doesn't take any action and simply
logs that a reload was received.


Future Work
===========

#. Consider the potential of adding a restart policy for threads
#. Once the exporter is fully integrated into mgr/cephadm, the goal would be to make the exporter the
   default means of data gathering. However, until then the exporter will remain as an opt-in 'feature
   preview'.