.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

======================
Development Guidelines
======================

This section provides information for developers who wish to contribute to the
C++ codebase.

.. note::

   Since most of the project's developers work on Linux or macOS, not all
   features or developer tools are uniformly supported on Windows. If you are
   on Windows, have a look at :ref:`developers-cpp-windows`.

Compiler warning levels
=======================

The ``BUILD_WARNING_LEVEL`` CMake option switches between sets of predetermined
compiler warning levels that we use for code tidiness. For release builds, the
default warning level is ``PRODUCTION``, while for debug builds the default is
``CHECKIN``.

When using ``CHECKIN`` for debug builds, ``-Werror`` is added when using gcc
and clang, causing build failures for any warning, and ``/WX`` is set with MSVC
having the same effect.

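As an illustrative sketch, the warning level can also be selected explicitly
at configure time rather than relying on the default for the build type. The
build directory name used here is an assumption made for the example, not a
project convention:

.. code-block:: shell

   # "build-debug" is a hypothetical directory name
   mkdir -p $ARROW_ROOT/cpp/build-debug
   cd $ARROW_ROOT/cpp/build-debug
   # Debug build with the strict warning set, so that warnings fail the build
   cmake -DCMAKE_BUILD_TYPE=Debug -DBUILD_WARNING_LEVEL=CHECKIN ..
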
Running unit tests
==================

The ``-DARROW_BUILD_TESTS=ON`` CMake option enables building of unit test
executables. You can then either run them individually, by launching the
desired executable, or run them all at once by launching the ``ctest``
executable (which is part of the CMake suite).

A possible invocation is something like::

   $ ctest -j16 --output-on-failure

where the ``-j16`` option runs up to 16 tests in parallel, taking advantage
of multiple CPU cores and hardware threads.

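When only part of the suite is of interest, ``ctest`` can also list or filter
the registered tests. The following is a small sketch to be run from the build
directory; the pattern ``array`` is only an example and may not match the
tests in your build:

.. code-block:: shell

   # List the registered tests without running them
   ctest -N
   # Run only the tests whose name matches a regular expression
   # ("array" is an example pattern)
   ctest -R array --output-on-failure
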
Running benchmarks
==================

The ``-DARROW_BUILD_BENCHMARKS=ON`` CMake option enables building of benchmark
executables. You can then run benchmarks individually by launching the
corresponding executable from the command line, e.g.::

   $ ./build/release/arrow-builder-benchmark

.. note::
   For meaningful benchmark numbers, it is very strongly recommended to build
   in ``Release`` mode, so as to enable compiler optimizations.

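A possible way to follow this recommendation is sketched below. It assumes
that the benchmark executables accept the standard Google Benchmark
command-line flags; the filter expression ``Int64`` is a made-up example:

.. code-block:: shell

   # Configure an optimized build that includes the benchmarks
   cmake -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_BENCHMARKS=ON ..
   make
   # Run only the benchmarks whose name matches a regular expression
   # (assumes Google Benchmark flags; "Int64" is an example pattern)
   ./build/release/arrow-builder-benchmark --benchmark_filter=Int64
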
Code Style, Linting, and CI
===========================

This project follows `Google's C++ Style Guide
<https://google.github.io/styleguide/cppguide.html>`_ with minor exceptions:

* We relax the line length restriction to 90 characters.
* In header files, we use the ``NULLPTR`` macro (defined in
  ``src/arrow/util/macros.h``) instead of ``nullptr``, to support building
  C++/CLI (ARROW-1134).
* We relax the guide's rules regarding structs. For public headers we should
  use struct only for objects that are principally simple data containers where
  it is OK to expose all the internal members and any methods are primarily
  conveniences. For private headers the rules are relaxed further and structs
  can be used where convenient for types that do not need access control even
  though they may not be simple data containers.

Our continuous integration builds on GitHub Actions run the unit test
suites on a variety of platforms and configurations, including using
Address Sanitizer and Undefined Behavior Sanitizer to check for various
patterns of misbehaviour such as memory leaks. In addition, the
codebase is subjected to a number of code style and code cleanliness checks.

In order to have a passing CI build, your modified git branch must pass the
following checks:

* C++ builds with the project's active version of ``clang`` without
  compiler warnings with ``-DBUILD_WARNING_LEVEL=CHECKIN``. Note that
  there are classes of warnings (such as ``-Wdocumentation``, see more
  on this below) that are not caught by ``gcc``.
* CMake files pass style checks. Violations can be fixed by running
  ``archery lint --cmake-format --fix``. This requires Python
  3 and `cmake_format <https://github.com/cheshirekow/cmake_format>`_ (note:
  this currently does not work on Windows).
* The code passes various C++ (and other) style checks, checked with the
  ``lint`` subcommand to :ref:`Archery <archery>`. Violations can also be
  fixed locally by running ``archery lint --cpplint --fix``.

In order to account for variations in the behavior of ``clang-format`` between
major versions of LLVM, we pin the version of ``clang-format`` used (currently
LLVM 8).

Depending on how you installed clang-format, the build system may not be able
to find it. You can provide an explicit path to your LLVM installation (or the
root path for the clang tools) with the environment variable
``$CLANG_TOOLS_PATH`` or by passing ``-DClangTools_PATH=$PATH_TO_CLANG_TOOLS`` when
invoking CMake.

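The two ways of pointing the build system at the clang tools might look as
follows. The prefix ``/usr/lib/llvm-8`` is a hypothetical install location
used only for illustration; substitute the path on your own system:

.. code-block:: shell

   # Option 1: environment variable (hypothetical install prefix)
   export CLANG_TOOLS_PATH=/usr/lib/llvm-8
   cmake ..

   # Option 2: pass the location directly to CMake
   cmake -DClangTools_PATH=/usr/lib/llvm-8 ..
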
To make linting more reproducible for everyone, we provide a ``docker-compose``
target that is executable from the root of the repository:

.. code-block:: shell

   docker-compose run ubuntu-lint

Cleaning includes with include-what-you-use (IWYU)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We occasionally use Google's `include-what-you-use
<https://github.com/include-what-you-use/include-what-you-use>`_ tool, also
known as IWYU, to remove unnecessary includes.

To begin using IWYU, you must first build it by following the instructions in
the project's documentation. Once the ``include-what-you-use`` executable is in
your ``$PATH``, you must run CMake with ``-DCMAKE_EXPORT_COMPILE_COMMANDS=ON``
in a new out-of-source CMake build directory like so:

.. code-block:: shell

   mkdir -p $ARROW_ROOT/cpp/iwyu
   cd $ARROW_ROOT/cpp/iwyu
   cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
     -DARROW_PYTHON=ON \
     -DARROW_PARQUET=ON \
     -DARROW_FLIGHT=ON \
     -DARROW_PLASMA=ON \
     -DARROW_GANDIVA=ON \
     -DARROW_BUILD_BENCHMARKS=ON \
     -DARROW_BUILD_BENCHMARKS_REFERENCE=ON \
     -DARROW_BUILD_TESTS=ON \
     -DARROW_BUILD_UTILITIES=ON \
     -DARROW_S3=ON \
     -DARROW_WITH_BROTLI=ON \
     -DARROW_WITH_BZ2=ON \
     -DARROW_WITH_LZ4=ON \
     -DARROW_WITH_SNAPPY=ON \
     -DARROW_WITH_ZLIB=ON \
     -DARROW_WITH_ZSTD=ON ..

In order for IWYU to run on the desired component in the codebase, that
component must be enabled by the CMake configuration flags. Once this is done,
you can run IWYU on the whole codebase by running the helper ``iwyu.sh``
script:

.. code-block:: shell

   # $ARROW_ROOT is an absolute path, so the script is invoked without "./"
   IWYU_SH=$ARROW_ROOT/cpp/build-support/iwyu/iwyu.sh
   $IWYU_SH

Since this is very time consuming, you can check a subset of files matching
some string pattern with the special "match" option:

.. code-block:: shell

   $IWYU_SH match $PATTERN

For example, if you wanted to do IWYU checks on all files in
``src/arrow/array``, you could run:

.. code-block:: shell

   $IWYU_SH match arrow/array

Checking for ABI and API stability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To build ABI compliance reports, you need to install the two tools
``abi-dumper`` and ``abi-compliance-checker``.

Build Arrow C++ in Debug mode. Alternatively, you could use ``-Og``, which also
builds with the necessary symbols but includes a bit of code optimization.
Once the build has finished, you can generate ABI reports using:

.. code-block:: shell

   abi-dumper -lver 9 debug/libarrow.so -o ABI-9.dump

The above version number is freely selectable. As we want to compare versions,
you should now ``git checkout`` the version you want to compare it to and re-run
the above command using a different version number. Once both reports are
generated, you can build a comparison report using:

.. code-block:: shell

   abi-compliance-checker -l libarrow -d1 ABI-9.dump -d2 ABI-10.dump

The report is then generated in ``compat_reports/libarrow`` as an HTML file.

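Put together, the whole comparison could be sketched as follows.
``$OTHER_REVISION`` is a placeholder for whichever revision you want to
compare against, and the plain ``make`` stands in for however you normally
rebuild in Debug mode; both are assumptions made for this example:

.. code-block:: shell

   # Dump the ABI of the current Debug build, labelled as version 9
   abi-dumper -lver 9 debug/libarrow.so -o ABI-9.dump

   # Switch to the other revision (placeholder), rebuild, and dump again
   # under a different version label
   git checkout $OTHER_REVISION
   make
   abi-dumper -lver 10 debug/libarrow.so -o ABI-10.dump

   # Compare the two dumps
   abi-compliance-checker -l libarrow -d1 ABI-9.dump -d2 ABI-10.dump
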
API Documentation
=================

We use Doxygen style comments (``///``) in header files for comments
that we wish to show up in API documentation for classes and
functions.

When using ``clang`` and building with
``-DBUILD_WARNING_LEVEL=CHECKIN``, the ``-Wdocumentation`` flag is
used which checks for some common documentation inconsistencies, like
documenting some, but not all function parameters with ``\param``. See
the `LLVM documentation warnings section
<https://releases.llvm.org/7.0.1/tools/clang/docs/DiagnosticsReference.html#wdocumentation>`_
for more about this.

While we publish the API documentation as part of the main Sphinx-based
documentation site, you can also build the C++ API documentation anytime using
Doxygen. Run the following command from the ``cpp/apidoc`` directory:

.. code-block:: shell

   doxygen Doxyfile

This requires `Doxygen <https://www.doxygen.org>`_ to be installed.

Apache Parquet Development
==========================

To build the C++ libraries for Apache Parquet, add the flag
``-DARROW_PARQUET=ON`` when invoking CMake.
To build Apache Parquet with encryption support, add the flag
``-DPARQUET_REQUIRE_ENCRYPTION=ON`` when invoking CMake. The Parquet libraries
and unit tests can be built with the ``parquet`` make target:

.. code-block:: shell

   make parquet

On Linux and macOS, if you do not have Apache Thrift installed on your system,
or you are building with ``-DThrift_SOURCE=BUNDLED``, you must install the
``bison`` and ``flex`` packages. On Windows we handle these build dependencies
automatically when building Thrift from source.

Running ``ctest -L unittest`` will run all built C++ unit tests, while ``ctest -L
parquet`` will run only the Parquet unit tests. The unit tests depend on an
environment variable ``PARQUET_TEST_DATA``, which must point to test data
provided through a git submodule of the repository
https://github.com/apache/parquet-testing:

.. code-block:: shell

   git submodule update --init
   export PARQUET_TEST_DATA=$ARROW_ROOT/cpp/submodules/parquet-testing/data

Here ``$ARROW_ROOT`` is the absolute path to the Arrow codebase.

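The steps above can be combined into a single sequence, sketched here under
the assumption of a fresh out-of-source build. The directory name
``build-parquet`` is hypothetical:

.. code-block:: shell

   # Fetch the test data and tell the tests where to find it
   cd $ARROW_ROOT
   git submodule update --init
   export PARQUET_TEST_DATA=$ARROW_ROOT/cpp/submodules/parquet-testing/data

   # Configure and build Parquet with its unit tests
   # ("build-parquet" is a hypothetical directory name)
   mkdir -p $ARROW_ROOT/cpp/build-parquet
   cd $ARROW_ROOT/cpp/build-parquet
   cmake -DARROW_PARQUET=ON -DARROW_BUILD_TESTS=ON ..
   make parquet

   # Run only the Parquet unit tests
   ctest -L parquet --output-on-failure
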
Arrow Flight RPC
================

In addition to the Arrow dependencies, Flight requires:

* gRPC (>= 1.14, roughly)
* Protobuf (>= 3.6, earlier versions may work)
* c-ares (used by gRPC)

By default, Arrow will try to download and build these dependencies
when building Flight.

The optional ``flight`` libraries and tests can be built by passing
``-DARROW_FLIGHT=ON``.

.. code-block:: shell

   cmake .. -DARROW_FLIGHT=ON -DARROW_BUILD_TESTS=ON
   make

You can also use existing installations of the extra dependencies.
When building, set the environment variables ``gRPC_ROOT`` and/or
``Protobuf_ROOT`` and/or ``c-ares_ROOT``.

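A minimal sketch of building against existing installations is shown below.
The prefix ``$HOME/local`` is a hypothetical location chosen for illustration,
and only two of the three variables are set here:

.. code-block:: shell

   # Hypothetical install prefix containing gRPC and Protobuf
   export gRPC_ROOT=$HOME/local
   export Protobuf_ROOT=$HOME/local
   cmake .. -DARROW_FLIGHT=ON -DARROW_BUILD_TESTS=ON
   make
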
We are developing against recent versions of gRPC. The
``grpc-cpp`` package available from https://conda-forge.org/ is one reliable
way to obtain gRPC in a cross-platform way. You may try using system libraries
for gRPC and Protobuf, but these are likely to be too old. On macOS, you can
try `Homebrew <https://brew.sh/>`_:

.. code-block:: shell

   brew install grpc