.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. default-domain:: cpp
.. highlight:: cpp
.. cpp:namespace:: arrow::io

==============================
Input / output and filesystems
==============================

Arrow provides a range of C++ interfaces abstracting the concrete details
of input / output operations.  They operate on streams of untyped binary data.
Those abstractions are used for various purposes such as reading CSV or
Parquet data, transmitting IPC streams, and more.

.. seealso::
   :doc:`API reference for input/output facilities <api/io>`.

Reading binary data
===================

Interfaces for reading binary data come in two flavours:

* Sequential reading: the :class:`InputStream` interface provides
  ``Read`` methods; it is recommended to ``Read`` to a ``Buffer`` as it
  may in some cases avoid a memory copy.

* Random access reading: the :class:`RandomAccessFile` interface
  provides additional facilities for positioning and, most importantly,
  the ``ReadAt`` methods which allow parallel reading from multiple threads.

Concrete implementations are available for :class:`in-memory reads <BufferReader>`,
:class:`unbuffered file reads <ReadableFile>`,
:class:`memory-mapped file reads <MemoryMappedFile>`,
:class:`buffered reads <BufferedInputStream>` and
:class:`compressed reads <CompressedInputStream>`.

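The two reading styles described above can be sketched as follows.  This is
a minimal illustration rather than a canonical recipe: the helper names
``ReadSequential`` and ``ReadRange`` are hypothetical, and the sketch assumes
an Arrow release in which ``Read`` and ``ReadAt`` return
``arrow::Result<std::shared_ptr<arrow::Buffer>>``.

.. code-block:: cpp

   #include <cstdint>
   #include <memory>
   #include <string>

   #include <arrow/buffer.h>
   #include <arrow/io/memory.h>
   #include <arrow/result.h>

   // Hypothetical helper: read up to `nbytes` from the stream's current
   // position.  Reading to a Buffer (rather than into caller-owned memory)
   // lets some sources hand back a slice without copying.
   arrow::Result<std::string> ReadSequential(arrow::io::InputStream* stream,
                                             int64_t nbytes) {
     ARROW_ASSIGN_OR_RAISE(auto chunk, stream->Read(nbytes));
     return chunk->ToString();
   }

   // Hypothetical helper: read up to `nbytes` starting at an explicit
   // offset.  The offset is passed with the call, so the result does not
   // depend on the file's current position.
   arrow::Result<std::string> ReadRange(arrow::io::RandomAccessFile* file,
                                        int64_t offset, int64_t nbytes) {
     ARROW_ASSIGN_OR_RAISE(auto chunk, file->ReadAt(offset, nbytes));
     return chunk->ToString();
   }

For instance, given
``arrow::io::BufferReader reader(arrow::Buffer::FromString("hello world"))``,
``ReadSequential(&reader, 5)`` should yield ``hello`` and
``ReadRange(&reader, 6, 5)`` should yield ``world``.
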
Writing binary data
===================

Writing binary data is mostly done through the :class:`OutputStream`
interface.

Concrete implementations are available for :class:`in-memory writes <BufferOutputStream>`,
:class:`unbuffered file writes <FileOutputStream>`,
:class:`memory-mapped file writes <MemoryMappedFile>`,
:class:`buffered writes <BufferedOutputStream>` and
:class:`compressed writes <CompressedOutputStream>`.

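As a rough sketch of writing through the ``OutputStream`` interface, the
following accumulates bytes in memory with ``BufferOutputStream``.  The helper
names ``WriteText`` and ``ConcatInMemory`` are hypothetical, and the sketch
assumes a release that provides ``BufferOutputStream::Create`` returning an
``arrow::Result``.

.. code-block:: cpp

   #include <cstdint>
   #include <memory>
   #include <string>

   #include <arrow/buffer.h>
   #include <arrow/io/memory.h>
   #include <arrow/result.h>
   #include <arrow/status.h>

   // Hypothetical helper: write a string to any OutputStream implementation.
   arrow::Status WriteText(arrow::io::OutputStream* out, const std::string& text) {
     return out->Write(text.data(), static_cast<int64_t>(text.size()));
   }

   // Hypothetical helper: write two strings to an in-memory stream and
   // return everything that was written.
   arrow::Result<std::string> ConcatInMemory(const std::string& a,
                                             const std::string& b) {
     ARROW_ASSIGN_OR_RAISE(auto out, arrow::io::BufferOutputStream::Create());
     ARROW_RETURN_NOT_OK(WriteText(out.get(), a));
     ARROW_RETURN_NOT_OK(WriteText(out.get(), b));
     // Finish() closes the stream and returns the accumulated bytes.
     ARROW_ASSIGN_OR_RAISE(auto result, out->Finish());
     return result->ToString();
   }

Because ``WriteText`` only depends on ``OutputStream``, the same helper
should work unchanged with the file-backed, buffered or compressed
implementations listed above.
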
.. cpp:namespace:: arrow::fs

.. _cpp-filesystems:

Filesystems
===========

The :class:`filesystem interface <FileSystem>` allows abstracted access over
various data storage backends such as the local filesystem or an S3 bucket.
It provides input and output streams as well as directory operations.

The filesystem interface exposes a simplified view of the underlying data
storage.  Data paths are represented as *abstract paths*, which are
``/``-separated, even on Windows, and shouldn't include special path
components such as ``.`` and ``..``.  Symbolic links, if supported by the
underlying storage, are automatically dereferenced.  Only basic
:class:`metadata <FileInfo>` about file entries, such as the file size
and modification time, is made available.

Concrete implementations are available for
:class:`local filesystem access <LocalFileSystem>`,
:class:`HDFS <HadoopFileSystem>` and
:class:`Amazon S3-compatible storage <S3FileSystem>`.
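
To illustrate how streams, directory operations and file metadata fit
together, here is a minimal sketch written against the generic ``FileSystem``
interface.  The helper name ``WriteThenReadBack`` and the file name
``example.bin`` are hypothetical, and the sketch assumes a release in which
file metadata is returned by ``GetFileInfo`` as an ``arrow::fs::FileInfo``.

.. code-block:: cpp

   #include <cstdint>
   #include <memory>
   #include <string>

   #include <arrow/buffer.h>
   #include <arrow/filesystem/api.h>
   #include <arrow/io/interfaces.h>
   #include <arrow/result.h>
   #include <arrow/status.h>

   // Hypothetical helper: create `dir`, write `text` to a file inside it,
   // then read the file back using the size reported by the filesystem.
   arrow::Result<std::string> WriteThenReadBack(arrow::fs::FileSystem* fs,
                                                const std::string& dir,
                                                const std::string& text) {
     // Abstract paths are '/'-separated, whatever the platform.
     const std::string path = dir + "/example.bin";

     ARROW_RETURN_NOT_OK(fs->CreateDir(dir, /*recursive=*/true));
     ARROW_ASSIGN_OR_RAISE(auto out, fs->OpenOutputStream(path));
     ARROW_RETURN_NOT_OK(out->Write(text.data(), static_cast<int64_t>(text.size())));
     ARROW_RETURN_NOT_OK(out->Close());

     // Basic metadata only: here, the file size.
     ARROW_ASSIGN_OR_RAISE(arrow::fs::FileInfo info, fs->GetFileInfo(path));

     ARROW_ASSIGN_OR_RAISE(auto in, fs->OpenInputStream(path));
     // Simplification: a single Read is assumed to return the whole file;
     // robust code would loop until `info.size()` bytes have been read.
     ARROW_ASSIGN_OR_RAISE(auto contents, in->Read(info.size()));
     return contents->ToString();
   }

Passing a ``LocalFileSystem`` instance and a writable directory should return
the text that was written; since the helper only uses ``FileSystem`` methods,
another backend could in principle be substituted without changing it.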