<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
-->

> NOTE: For those deploying this database: Postgres does not use UTF-8 by
> default, but UTF-8 is [required for the `jsonb`][pg-jsonb] type used in
> some columns to always work. This [Stack Overflow post][so-utf8] describes
> how to enable it on Amazon RDS, and this [section of the docs][pg-charset]
> describes how to do it in general, i.e. `initdb -E UTF8`.

# Benchmark database

This directory contains files related to the benchmark database.

- `ddl/*.sql` contains the database definition.
- `examples/` contains code to test the database and demonstrate its use.
- `Dockerfile` and `docker-compose.yml` are for developing benchmarks
  against a testing database.
- An auto-generated summary of the views is in the
  [data model documentation](./data_model.rst).

## Setup

To create a `machine.json` file that will uniquely identify a computer for
benchmark submission, run the provided shell script and fill in the prompts
to identify the GPU.

> NOTE: This does not work on VMs or Windows.

```shell
./make_machine_json.sh
```

Submit the machine details over HTTP with the command below.

> NOTE: This will only work if GraphQL has been selected as a client
> and is running in production, or if during development you have run
> `docker-compose up` to create and run both a database Docker
> container and a GraphQL client Docker container.

```shell
./graphql_submit.sh machine machine.json localhost:5000/graphql
```

Or, after starting the psql client from this directory, submit with:

```
\set content `cat machine.json`
SELECT ingest_machine_view(:'content'::jsonb);
```

> NOTE: If you don't have a `machine.json` file generated,
> use the example file `examples/machine.json` instead.

## Local testing

The file [.env](.env) in this directory is used by `docker-compose`
to set up the Postgres user and password for the local containers.
Currently both are `benchmark`; this is also the password for the
psql client.
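
As a rough sketch, that file just assigns shell-style variables. Only
`PG_USER` is confirmed elsewhere in this README; the password variable
name below is a guess, so check the actual `.env` file:

```
PG_USER=benchmark
PG_PASS=benchmark  # variable name is illustrative; see the real .env file
```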

The Postgres Alpine image runs any added `*.sql` and `*.sh` scripts placed
in `/docker-entrypoint-initdb.d/` during its startup script, so the local
database will be set up automatically once the container is running.

To start the containers, be sure to have [Docker installed][docker],
and then run the following from this directory (`arrow/dev/benchmarking`):

```
docker-compose up
```

This will start a process that shows logs from both the running
Postgres container and the running GraphQL container.
To stop the running containers gracefully, background the process
and run

```
docker-compose down
fg  # to re-foreground the backgrounded process while it exits
```

You will still have the container images "benchmarking_pg",
"graphile/postgraphile", and "postgres:11-alpine" on your
computer. You should keep them if you want to run this again.
If you don't, remove them with:

```
docker rmi benchmarking_pg postgres:11-alpine graphile/postgraphile
```

### Postgres client

The `psql` shell client is bundled with the PostgreSQL core distribution
available from the [Postgres download page][postgres-downloads].
Using the `PG_USER` defined in the `.env` file (currently "benchmark"),
the command to connect to the container is:

```shell
psql -h localhost -p 5432 -U benchmark
```

There is an example script in [examples/example.sql](examples/example.sql)
that runs some queries against the database. To run it, type the following
in the psql command-line interface:

```
\i examples/example.sql
```

#### Bulk ingestion using CSV

An example CSV file for bulk ingestion is in
[examples/benchmark_run_example.csv](examples/benchmark_run_example.csv).
The columns are listed in the same order as they are defined, to avoid
having to explicitly name every column during ingestion. The "id" column
is left empty and will be assigned automatically on insert.

To ingest the example CSV file from the command line, use:

```shell
CSV='examples/benchmark_run_example.csv' && \
psql -U benchmark -h localhost -p 5432 \
  -c "\copy benchmark_run_view FROM '${CSV}' WITH (FORMAT csv, HEADER);"
```
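
The same layout can also be produced programmatically. A minimal Python
sketch (the column names other than "id" are made up for illustration;
the real header is in `examples/benchmark_run_example.csv`):

```python
import csv
import io

# Hypothetical columns for illustration; only "id" is described in this
# README. See examples/benchmark_run_example.csv for the real header.
fieldnames = ["id", "benchmark_name", "value"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
# Leave "id" empty so Postgres assigns it automatically on insert.
writer.writerow({"id": "", "benchmark_name": "float-sum", "value": 1.23})

print(buf.getvalue())
```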

#### Bulk ingestion using JSON

To ingest the example JSON file using the psql client, use the command below:

```
\set content `cat examples/benchmark_example.json`
SELECT ingest_benchmark_view(:'content'::jsonb);
```

### HTTP client

This section requires the GraphQL HTTP service to be running, either
in production or via the testing setup.
(See the [local testing section](#local-testing) for how to set it up.)

The 'graphile/postgraphile' container provides an HTTP interface
to the database via two URL routes:

- A GraphiQL page ([localhost:5000/graphiql][graphiql])
  to aid visual exploration of the data model.
  (Enabled by the `--watch` flag on the command line; not recommended
  for production.)
- An endpoint that receives POST requests only (localhost:5000/graphql).

#### Ingestion

The script [graphql_submit.sh](./graphql_submit.sh) simplifies submission
to the database via curl. Examples:

```shell
./graphql_submit.sh benchmarks examples/benchmark_example.json
./graphql_submit.sh runs examples/benchmark_run_example.json
```

#### Querying

The output of a query is a JSON object that is hard to read on the command
line. Here is an example query in the shell:

```shell
curl -X POST \
  -H "Content-Type: application/json" \
  --data '{"query": "{projectDetails{ projectName }}"}' \
  localhost:5000/graphql
```

which (if you have previously run the `examples/example.sql` script) yields

```
{"data":{"projectDetails":{"projectName":"Apache Arrow"}}}
```

Here is an example query using Python:

```python
import json
import requests

uri = "http://localhost:5000/graphql"
query = json.load(open("examples/graphql_query_environment_view.json"))
response = requests.post(uri, json=query)
message = "{benchmarkLanguage}: {languageImplementationVersion}, {dependencies}"

for row in response.json()['data']['allEnvironmentViews']['edges']:
    print(message.format(**row['node']))

# result:
#
# Python: CPython 2.7, {"six":"","numpy":"1.14","other_lib":"1.0"}
# Python: CPython 2.7, {"six":"","numpy":"1.15","other_lib":"1.0"}
# Python: CPython 3.6, {"boost":"1.42","numpy":"1.15"}
```

## Deployment

(Work in progress.)
## Quick reference

- String variables `'have single quotes'`.
- Arrays `'{"have", "curly", "braces"}'::text[]` or `'{1, 2, 3}'::integer[]`.
- JSONB `'{"has":"this", "format":42}'::jsonb`.
- Elements inserted using JSON-formatted strings can use standard
  JSON-formatted arrays (`[1, 2, 3]`) and do not have to use the above
  string formats.
- When comparing nullable values, use `x IS NOT DISTINCT FROM y` rather
  than `x = y`.
- An auto-generated summary of the data model is in
  [data_model.rst](./data_model.rst).
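
The `IS NOT DISTINCT FROM` advice follows from SQL's three-valued logic:
`NULL = NULL` evaluates to `NULL`, not true, so rows with NULLs never match
a plain `=` comparison. A rough Python sketch of the two behaviors, using
`None` to stand in for SQL `NULL` (an illustration, not Postgres itself):

```python
def sql_eq(x, y):
    """SQL '=': returns None (i.e. SQL NULL) if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y

def is_not_distinct_from(x, y):
    """SQL 'IS NOT DISTINCT FROM': treats two NULLs as equal."""
    if x is None or y is None:
        return x is None and y is None
    return x == y

print(sql_eq(None, None))                # None: the row would NOT match
print(is_not_distinct_from(None, None))  # True: the row WOULD match
```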

## Data model documentation

To recreate the data model documentation:
(1) install the [psql client][postgres-downloads]
(sorry, you need to download the whole distribution),
(2) start the Docker containers using `docker-compose up`,
(3) and then run these scripts:

```
./make_dotfile.sh
./make_data_model_rst.sh
```

[pg-jsonb]: https://www.postgresql.org/docs/11/datatype-json.html#id-1.5.7.22.3
[so-utf8]: https://stackoverflow.com/a/33557023
[pg-charset]: https://www.postgresql.org/docs/9.3/multibyte.html#AEN34424
[docker]: https://www.docker.com/get-started
[citext-limitations]: https://www.postgresql.org/docs/11/citext.html#id-1.11.7.17.7
[postgres-downloads]: https://www.postgresql.org/download/
[graphiql]: http://localhost:5000/graphiql
[postgraphile-lambda]: https://github.com/graphile/postgraphile-lambda-example
[postgraphile-cli]: https://www.graphile.org/postgraphile/usage-cli/