.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. default-domain:: cpp
.. highlight:: cpp

High-Level Overview
===================

The Arrow C++ library is composed of several parts, each of which serves
a specific purpose.

The physical layer
------------------

**Memory management** abstractions provide a uniform API over memory that
may be allocated through various means, such as heap allocation, the memory
mapping of a file, or a static memory area. In particular, the **buffer**
abstraction represents a contiguous area of physical data.

The one-dimensional layer
-------------------------

**Data types** govern the *logical* interpretation of *physical* data.
Many operations in Arrow are parameterized, at compile time or at runtime,
by a data type.

**Arrays** combine one or several buffers with a data type, so that the
buffers can be viewed as a logical contiguous sequence of values (possibly
nested).

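To make the relationship between buffers, data types and arrays concrete,
here is a minimal standalone sketch. It deliberately does not use the Arrow
API: ``ToyBuffer``, ``ToyInt32Array`` and ``MakeToyArray`` are hypothetical
names invented for this illustration. It assumes a nullable 32-bit integer
array stored as two buffers, a validity bitmap (one bit per slot, least
significant bit first, 1 meaning "valid") and a buffer of raw value bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stand-in for a buffer: a contiguous run of untyped bytes.
using ToyBuffer = std::vector<std::uint8_t>;

// Toy stand-in for a nullable int32 array (hypothetical, not an Arrow class).
// The "data type" is fixed to int32 here; it decides how bytes are read.
struct ToyInt32Array {
  std::int64_t length;
  ToyBuffer validity;  // one bit per slot, LSB first; 1 = valid, 0 = null
  ToyBuffer values;    // length * 4 bytes, native byte order

  bool IsValid(std::int64_t i) const {
    const std::size_t idx = static_cast<std::size_t>(i);
    return ((validity[idx / 8] >> (idx % 8)) & 1) != 0;
  }

  // Reinterprets four raw bytes as one int32 value.
  std::int32_t Value(std::int64_t i) const {
    std::int32_t v;
    std::memcpy(&v, values.data() + static_cast<std::size_t>(i) * sizeof(v),
                sizeof(v));
    return v;
  }
};

// Packs values and validity flags (same size assumed) into the two buffers.
ToyInt32Array MakeToyArray(const std::vector<std::int32_t>& in,
                           const std::vector<bool>& valid) {
  ToyInt32Array out;
  out.length = static_cast<std::int64_t>(in.size());
  out.validity.assign((in.size() + 7) / 8, 0);
  out.values.assign(in.size() * sizeof(std::int32_t), 0);
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (!valid[i]) continue;  // null slot: bit and value bytes stay zero
    out.validity[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    std::memcpy(out.values.data() + i * sizeof(std::int32_t), &in[i],
                sizeof(std::int32_t));
  }
  return out;
}
```

For example, ``MakeToyArray({1, 0, 3}, {true, false, true})`` yields an array
of length 3 whose validity buffer is the single byte ``0x05`` and whose
values buffer holds 12 bytes. The same bytes would mean something else under
a different data type, which is the sense in which the type governs the
logical interpretation.
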
**Chunked arrays** are a generalization of arrays, combining several arrays
of the same type into a longer logical sequence of values.

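The idea of a longer logical sequence spread over several arrays can be
sketched as follows. This is a toy model rather than Arrow code:
``ToyChunkedArray`` is a hypothetical type, chunks are plain vectors of
``int32`` values, and the linear walk over chunks is simply the most direct
way to resolve an index, not a claim about how the library does it.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy stand-in for a chunked array of int32 values (hypothetical type).
struct ToyChunkedArray {
  std::vector<std::vector<std::int32_t>> chunks;  // each chunk is contiguous

  // Logical length: the sum of all chunk lengths.
  std::int64_t length() const {
    std::int64_t n = 0;
    for (const auto& c : chunks) n += static_cast<std::int64_t>(c.size());
    return n;
  }

  // Resolves a logical index by walking the chunks in order and
  // subtracting each chunk's length until the index falls inside one.
  std::int32_t Get(std::int64_t i) const {
    if (i < 0) throw std::out_of_range("negative index");
    for (const auto& c : chunks) {
      const std::int64_t n = static_cast<std::int64_t>(c.size());
      if (i < n) return c[static_cast<std::size_t>(i)];
      i -= n;
    }
    throw std::out_of_range("index past the end");
  }
};
```

With chunks ``{1, 2}``, ``{}`` and ``{3, 4, 5}``, the logical sequence has
five values and ``Get(2)`` returns ``3``, the first value of the third chunk.
No values are copied to form the longer sequence.
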
The two-dimensional layer
-------------------------

**Schemas** describe a logical collection of several pieces of data,
each with a distinct name and type, and optional metadata.

**Tables** are collections of chunked arrays that conform to a schema. They
are the most capable dataset-providing abstraction in Arrow.

**Record batches** are collections of contiguous arrays, described
by a schema. They allow incremental construction or serialization of tables.

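One way to picture how record batches allow a table to be built
incrementally is the toy model below. It is a sketch under simplifying
assumptions, not the Arrow API: all names (``ToySchema``, ``ToyRecordBatch``,
``ToyTable``) are hypothetical, every field is assumed to be ``int32``, and
each appended batch simply becomes one more chunk in every column.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using ToyColumn = std::vector<std::int32_t>;  // one contiguous array

// Toy schema: field names only; every field is assumed to be int32.
using ToySchema = std::vector<std::string>;

// Toy record batch: one contiguous array per schema field.
// All columns of a batch are assumed to have the same length.
struct ToyRecordBatch {
  std::vector<ToyColumn> columns;
};

// Toy table: for each schema field, a list of chunks (a "chunked array").
struct ToyTable {
  ToySchema schema;
  std::vector<std::vector<ToyColumn>> columns;

  explicit ToyTable(ToySchema s)
      : schema(std::move(s)), columns(schema.size()) {}

  // Appends a batch by adding one chunk to every column.
  void Append(const ToyRecordBatch& batch) {
    if (batch.columns.size() != schema.size()) {
      throw std::invalid_argument("batch does not match the schema");
    }
    for (std::size_t f = 0; f < schema.size(); ++f) {
      columns[f].push_back(batch.columns[f]);
    }
  }

  // Row count, taken from the chunks of the first column.
  std::size_t num_rows() const {
    std::size_t n = 0;
    if (!columns.empty()) {
      for (const auto& chunk : columns[0]) n += chunk.size();
    }
    return n;
  }
};
```

Appending a two-row batch and then a one-row batch to a table with fields
``a`` and ``b`` gives a three-row table in which each column consists of two
chunks. Serialization can proceed the same way in reverse, one batch at a
time.
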
The compute layer
-----------------

**Datums** are flexible dataset references, able to hold, for example, an
array or table reference.

**Kernels** are specialized computation functions that run in a loop over a
given set of datums representing the functions' input and output parameters.

The IO layer
------------

**Streams** allow untyped sequential or seekable access over external data
of various kinds (for example, compressed or memory-mapped).

The Inter-Process Communication (IPC) layer
-------------------------------------------

A **messaging format** allows interchange of Arrow data between processes, using
as few copies as possible.

The file formats layer
----------------------

Arrow data can be read from and written to various file formats, for
example **Parquet**, **CSV**, **ORC**, or the Arrow-specific **Feather** format.

The devices layer
-----------------

Basic **CUDA** integration is provided, making it possible to describe Arrow
data backed by GPU-allocated memory.

The filesystem layer
--------------------

A filesystem abstraction allows reading and writing data from different storage
backends, such as the local filesystem or an S3 bucket.