<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~   http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

> NOTE: For those deploying this database: Postgres does not use UTF-8 by
> default, but UTF-8 is [required for the jsonb][pg-jsonb] format used in
> some columns to always work. This [Stack Overflow post][so-utf8] describes
> how to enable it on Amazon RDS, and this [section of the docs][pg-charset]
> describes how to do it in general, i.e. `initdb -E UTF8`.

# Benchmark database

This directory contains files related to the benchmark database.

- 'ddl/\*.sql' contains the database definition.
- 'examples/' contains code to test the database and demonstrate its use.
- 'Dockerfile' and 'docker-compose.yml' are for developing benchmarks
  against a testing database.
- An auto-generated summary of the views is in the [Data model](./data_model.rst).

## Setup

To create a 'machine.json' file that will uniquely identify a computer for
benchmark submission, run the provided shell script and fill in the prompts
to identify the GPU.

> NOTE: this does not work on VMs or Windows.

```shell
./make_machine_json.sh
```

Submit the machine details via HTTP using the command below.

> NOTE: This will only work if GraphQL has been selected as a client
> and is running in production, or if during development you have run
> `docker-compose up` to create and run both the database Docker
> container and the GraphQL client Docker container.

```shell
./graphql_submit.sh machine machine.json localhost:5000/graphql
```

or submit it after starting the psql client from this directory, using

```
\set content `cat machine.json`
SELECT ingest_machine_view(:'content'::jsonb);
```

> NOTE: If you don't have a "machine.json" file generated,
> use the example file "examples/machine.json" instead.

## Local testing

There is a file named [.env](./.env) in this directory that is used by
`docker-compose` to set up the postgres user and password for the
local containers. Currently the user name and password are both
`benchmark`. This will be the password for the psql client as well.

The Postgres Alpine image runs any added '\*.sql' and '\*.sh' scripts placed
in '/docker-entrypoint-initdb.d/' during its startup, so the local
database will be set up automatically once the container is running.

To start the containers, be sure to have [Docker installed][docker],
and then run the following from this directory (arrow/dev/benchmarking).

```
docker-compose up
```

This will start a process that will show logs from both the running
Postgres container and the running GraphQL container.
To stop the running containers gracefully, background the process
and run

```
docker-compose down
fg # To re-foreground the backgrounded process while it exits
```

You will still have the container images "benchmarking_pg",
"graphile/postgraphile", and "postgres:11-alpine" on your
computer. You should keep them if you want to run this again.
If you don't, then remove them with the command:

```
docker rmi benchmarking_pg postgres:11-alpine graphile/postgraphile
```

### Postgres client

The `psql` shell client is bundled with the PostgreSQL core distribution
available from the [Postgres download page][postgres-downloads].
Using the `PG_USER` defined in the `.env` file (currently "benchmark"),
the command to connect to the container is:

```shell
psql -h localhost -p 5432 -U benchmark
```

There is an example script in [examples/example.sql](examples/example.sql) that
runs some queries against the database. To run it in the psql client, type
the following in the psql command-line interface:

```
\i examples/example.sql
```

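You can also run ad-hoc queries against the ingestion views from the same
prompt. A minimal sketch, using the `benchmark_run_view` view that also
appears in the CSV ingestion example below (the rows returned depend on what
has already been loaded):

```
-- Peek at a few rows of the benchmark run ingestion view.
SELECT * FROM benchmark_run_view LIMIT 5;
```
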
125 #### Bulk ingestion using CSV
126
127 An example CSV file for bulk ingestion is in
128 [examples/benchmark_run_example.csv](examples/benchmark_run_example.csv).
129 The columns are listed in the same order as they are defined, to avoid having
130 to explicitly name every column in ingestion. The "id" column is left empty
131 and will be automatically assigned on insert.
132
133 To ingest the example CSV file from the command line,
134 use the command below:
135
136 ```shell
137 CSV='examples/benchmark_run_example.csv' && \
138 psql -U benchmark -h localhost -p 5432 \
139 -c "\copy benchmark_run_view FROM '${CSV}' WITH (FORMAT csv, HEADER);"
140 ```
141
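To confirm that the rows landed, you can count them afterwards from the psql
client; a minimal sketch (the exact count depends on what you have already
ingested):

```
-- Count the rows visible through the benchmark run ingestion view.
SELECT count(*) FROM benchmark_run_view;
```
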
#### Bulk ingestion using JSON

To ingest the example JSON file using the psql client, use the command below.

```
\set content `cat examples/benchmark_example.json`
SELECT ingest_benchmark_view(:'content'::jsonb);
```

### HTTP client

This section requires the GraphQL HTTP interface to be running, either
against the production database or via the local testing setup.
(See the [local testing section](#local-testing) for how to set it up.)

The 'graphile/postgraphile' container provides an HTTP interface
to the database via two URL routes:

- A GraphiQL page ([localhost:5000/graphiql][graphiql])
  to aid visual exploration of the data model.
  (This corresponds to the `--watch` flag on the command line and is not
  recommended for production.)
- An endpoint that receives POST requests only (localhost:5000/graphql).

#### Ingestion

The script [graphql_submit.sh](./graphql_submit.sh) simplifies submission
to the database via curl. Examples:

```shell
./graphql_submit.sh benchmarks examples/benchmark_example.json
./graphql_submit.sh runs examples/benchmark_run_example.json
```

#### Querying

Query output is a JSON object, which can be hard to read on the command line.
Here is an example query in the shell:

```shell
curl -X POST \
  -H "Content-Type: application/json" \
  --data '{"query": "{projectDetails{ projectName }}"}' \
  localhost:5000/graphql
```

which (if you have previously run the "examples/example.sql" script) yields

```
{"data":{"projectDetails":{"projectName":"Apache Arrow"}}}
```

Here is an example query using Python:

```python
import json
import requests

uri = "http://localhost:5000/graphql"
query = json.load(open("examples/graphql_query_environment_view.json"))
response = requests.post(uri, json=query)
message = "{benchmarkLanguage}: {languageImplementationVersion}, {dependencies}"

for row in response.json()['data']['allEnvironmentViews']['edges']:
    print(message.format(**row['node']))

# result:
#
# Python: CPython 2.7, {"six":"","numpy":"1.14","other_lib":"1.0"}
# Python: CPython 2.7, {"six":"","numpy":"1.15","other_lib":"1.0"}
# Python: CPython 3.6, {"boost":"1.42","numpy":"1.15"}
```

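For comparison, roughly the same data can be pulled straight from the database
with the psql client. This is a minimal sketch that assumes the GraphQL
`allEnvironmentViews` field follows PostGraphile's usual camelCase mapping onto
a view named `environment_view`; check the generated
[Data model](./data_model.rst) for the exact names:

```
-- Same fields as the GraphQL query above, fetched directly via SQL.
SELECT benchmark_language, language_implementation_version, dependencies
FROM environment_view;
```
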
## Deployment

(Work in progress.)

> NOTE: For those deploying this database: Postgres does not use UTF-8 by
> default, but UTF-8 is [required for the jsonb][pg-jsonb] format used in
> some columns to always work. This [Stack Overflow post][so-utf8] describes
> how to enable it on Amazon RDS, and this [section of the docs][pg-charset]
> describes how to do it in general, i.e. `initdb -E UTF8`.

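To check whether an existing server was initialized with the required
encoding, you can ask Postgres directly from the psql client; a minimal
sketch:

```
-- Reports the server-side character set; it should be UTF8.
SHOW server_encoding;
```
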
## Quick reference

- String variables `'have single quotes'`
- Arrays `'{"have", "curly", "braces"}'::text[]` or `'{1, 2, 3}'::integer[]`
- JSONb `'{"has":"this", "format":42}'::jsonb`
- Elements inserted using JSON-formatted strings can use standard
  JSON-formatted arrays (`[1, 2, 3]`) and do not have to use the above
  string formats.
- When comparing nullable values, use `x IS NOT DISTINCT FROM y` rather than
  `x = y` (see the example below).
- An auto-generated summary of the views is in the [Data model](./data_model.rst).

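All of the literal formats above can be tried out in the psql client without
touching any tables; a minimal sketch:

```
-- Literal formats from the quick reference, evaluated without any table.
SELECT
  'have single quotes'                  AS string_example,
  '{"have", "curly", "braces"}'::text[] AS text_array_example,
  '{1, 2, 3}'::integer[]                AS integer_array_example,
  '{"has":"this", "format":42}'::jsonb  AS jsonb_example,
  NULL IS NOT DISTINCT FROM NULL        AS null_comparison;  -- true, whereas NULL = NULL yields NULL
```
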
## Data model documentation

To recreate the data model documentation,
(1) install the [psql client][postgres-downloads]
(sorry, you need to download the whole distribution),
(2) start the docker container using `docker-compose up`,
and (3) run these scripts:

```
./make_dotfile.sh
./make_data_model_rst.sh
```

[pg-jsonb]: https://www.postgresql.org/docs/11/datatype-json.html#id-1.5.7.22.3
[so-utf8]: https://stackoverflow.com/a/33557023
[pg-charset]: https://www.postgresql.org/docs/9.3/multibyte.html#AEN34424
[docker]: https://www.docker.com/get-started
[citext-limitations]: https://www.postgresql.org/docs/11/citext.html#id-1.11.7.17.7
[postgres-downloads]: https://www.postgresql.org/download/
[graphiql]: http://localhost:5000/graphiql
[postgraphile-lambda]: https://github.com/graphile/postgraphile-lambda-example
[postgraphile-cli]: https://www.graphile.org/postgraphile/usage-cli/