2 ~ Licensed to the Apache Software Foundation (ASF) under one
3 ~ or more contributor license agreements. See the NOTICE file
4 ~ distributed with this work for additional information
5 ~ regarding copyright ownership. The ASF licenses this file
6 ~ to you under the Apache License, Version 2.0 (the
7 ~ "License"); you may not use this file except in compliance
8 ~ with the License. You may obtain a copy of the License at
10 ~ http://www.apache.org/licenses/LICENSE-2.0
12 ~ Unless required by applicable law or agreed to in writing,
13 ~ software distributed under the License is distributed on an
14 ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 ~ KIND, either express or implied. See the License for the
16 ~ specific language governing permissions and limitations
21 > NOTE: For those deploying this database, Postgres does not by default use
22 > UTF-8, however it is [required for the jsonb][pg-jsonb] format used in
23 > some columns to always work. This [stackoverflow post][so-utf8] describes
24 > how to do it for Amazon RDS. This [section of the docs][pg-charset]
25 > states how to do it in general, i.e.: `initdb -E UTF8`.
29 This directory contains files related to the benchmark database.
31 - 'ddl/\*.sql' contains the database definition.
32 - 'examples/' contain code to test the database and demonstrate its use.
33 - 'Dockerfile' and 'docker-compose.yml' are for developing benchmarks
34 against a testing database.
35 - An auto-generated summary of views in the [Data model][./data_model.rst].
39 To create a 'machine.json' file that will uniquely identify a computer for
40 benchmark submission, run the provided shell script and fill in the prompts
43 > NOTE: this does not work on VMs or Windows.
46 ./make_machine_json.sh
49 Submit the machine details via http using the command
51 > NOTE: This will only work if we have selected graphql as a client
52 > and have it running in production or if during development
53 > you have run `docker-compose up` to create and run both a
54 > database Docker container and graphql client Docker container.
57 ./graphql_submit.sh machine machine.json localhost:5000/graphql
60 or submit after starting up the psql client from this directory, using
63 \set content `cat machine.json`
64 SELECT ingest_machine_view(:'content'::jsonb);
67 > NOTE: If you don't have a "machine.json" file generated,
68 > use the example file "examples/machine.json" instead.
72 There is a file named "[.env][.env]" in this directory that is used by
73 `docker-compose` to set up the postgres user and password for the
74 local containers. Currently the name and password are both
75 `benchmark`. This will be the password for the psql client as well.
77 The Postgres Alpine image runs any added '\*.sql' and '\*.sh' scripts placed
78 in '/docker-entrypoint-initdb.d/' during its startup script, so the local
79 database will be set up automatically once the container is running.
81 To start the containers, be sure to have [Docker installed][docker],
82 and then run the following from this directory (arrow/dev/benchmarking).
89 This will start a process that will show logs from both the running
90 Postgres container and the running GraphQL container.
91 To stop the running containers gracefully, background the process
96 fg # To re-foreground the backgrounded process while it exits
99 You will still have the container images "benchmarking_pg",
100 "graphile/postgraphile", and "postgres:11-alpine" on your
101 computer. You should keep them if you want to run this again.
102 If you don't, then remove them with the command:
105 docker rmi benchmarking_pg postgres:11-alpine graphile/postgraphile
110 The `psql` shell client is bundled with the PostgreSQL core distribution
111 available from the [Postgres download page][postgres-downloads].
112 Using the `PG_USER` defined in the `.env` file (currently "benchmark"),
113 the command to connect to the container is:
115 psql -h localhost -p 5432 -U benchmark
117 There is an example script in [examples/example.sql](examples/example.sql) that
118 runs some queries against the database. To run it in the psql client, type
119 the following in the psql command-line interface:
122 \i examples/example.sql
125 #### Bulk ingestion using CSV
127 An example CSV file for bulk ingestion is in
128 [examples/benchmark_run_example.csv](examples/benchmark_run_example.csv).
129 The columns are listed in the same order as they are defined, to avoid having
130 to explicitly name every column in ingestion. The "id" column is left empty
131 and will be automatically assigned on insert.
133 To ingest the example CSV file from the command line,
134 use the command below:
137 CSV='examples/benchmark_run_example.csv' && \
138 psql -U benchmark -h localhost -p 5432 \
139 -c "\copy benchmark_run_view FROM '${CSV}' WITH (FORMAT csv, HEADER);"
142 #### Bulk ingestion using JSON
144 To ingest the example JSON file using the psql client, use the command below.
147 \set content `cat examples/benchmark_example.json`
148 SELECT ingest_benchmark_view(:'content'::jsonb);
153 This section requires an actual HTTP client to be up, either
154 for the production database or via the testing setup.
155 (See the [local testing section](#local-testing) for how to set it up).
157 The 'graphile/postgraphile' container provides an HTTP interface
158 to the database via two url routes:
160 - A GraphiQL page ([localhost:5000/graphiql][graphiql])
161 to aid visual exploration of the data model.
162 (The `--watch` flag on the command line. Not recommended for production.)
163 - An endpoint that receives POST requests only (localhost:5000/graphql).
167 The script [graphql_submit.sh](./graphql_submit.sh) simplifies submission
168 to the database via curl. Examples:
171 ./graphql_submit.sh benchmarks examples/benchmark_example.json
172 ./graphql_submit.sh runs examples/benchmark_run_example.json
177 The output of the query is a JSON object that is hard to read on the command line.
178 Here is an example query in the shell:
181 -H "Content-Type: application/json" \
182 --data '{"query": "{projectDetails{ projectName }}"}' \
183 localhost:5000/graphql
186 which (if you have previously run the "examples.sql" command) yields
189 {"data":{"projectDetails":{"projectName":"Apache Arrow"}}}
192 Here is an example query using Python:
197 uri = "http://localhost:5000/graphql"
198 query = json.load(open("examples/graphql_query_environment_view.json"))
199 response = requests.post(uri, json=query)
200 message = "{benchmarkLanguage}: {languageImplementationVersion}, {dependencies}"
202 for row in response.json()['data']['allEnvironmentViews']['edges']:
203 print(message.format(**row['node']))
207 # Python: CPython 2.7, {"six":"","numpy":"1.14","other_lib":"1.0"}
208 # Python: CPython 2.7, {"six":"","numpy":"1.15","other_lib":"1.0"}
209 # Python: CPython 3.6, {"boost":"1.42","numpy":"1.15"}
216 > NOTE: For those deploying this database, Postgres does not by default use
217 > UTF-8, however it is [required for the jsonb][pg-jsonb] format used in
218 > some columns to always work. This [stackoverflow post][so-utf8] describes
219 > how to do it for Amazon RDS. This [section of the docs][pg-charset]
220 > states how to do it in general, i.e.: `initdb -E UTF8`.
225 - String variables `'have single quotes'`
226 - Arrays `'{"have", "curly", "braces"}'::text[]` or `'{1, 2, 3}'::integer[]`
227 - JSONb `'{"has":"this", "format":42}'::jsonb`
228 - Elements inserted using JSON-formatted strings can use standard
229 JSON-formatted arrays (`[1, 2, 3]`) and do not have to use the above
231 - When comparing nullable values use `x IS NOT DISTINCT FROM y` rather than `x = y`
232 - An auto-generated summary of the [Data model][./data_model.rst].
234 ## Data model documentation
236 To recreate the data model documentation,
237 (1) install the [psql client][postgres-downloads]
238 (sorry you need to download the whole thing),
239 (2) start the docker container using `docker-compose up`,
240 (3) and then run these scripts:
244 ./make_data_model_rst.sh
247 [pg-jsonb]: https://www.postgresql.org/docs/11/datatype-json.html#id-1.5.7.22.3
248 [so-utf8]: https://stackoverflow.com/a/33557023
249 [pg-charset]: https://www.postgresql.org/docs/9.3/multibyte.html#AEN34424
250 [docker]: https://www.docker.com/get-started
251 [citext-limitations]: https://www.postgresql.org/docs/11/citext.html#id-1.11.7.17.7
252 [postgres-downloads]: https://www.postgresql.org/download/
253 [graphiql]: http://localhost:5000/graphiql
254 [postgraphile-lambda]: https://github.com/graphile/postgraphile-lambda-example
255 [postgraphile-cli]: https://www.graphile.org/postgraphile/usage-cli/