.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. currentmodule:: pyarrow

.. _feather:

Feather File Format
===================

Feather is a portable file format for storing Arrow tables or data frames (from
languages like Python or R) that uses the :ref:`Arrow IPC format <ipc>`
internally. Feather was created early in the Arrow project as a proof of
concept for fast, language-agnostic data frame storage for Python (pandas) and
R. There are two file format versions for Feather:

* Version 2 (V2), the default version, which is exactly represented as the
  Arrow IPC file format on disk. V2 files support storing all Arrow data types
  as well as compression with LZ4 or ZSTD. V2 was first made available in
  Apache Arrow 0.17.0.
* Version 1 (V1), a legacy version available starting in 2016 and since
  replaced by V2. V1 files are distinct from Arrow IPC files and lack many
  features, such as the ability to store all Arrow data types. V1 files also
  lack compression support. We intend to maintain read support for V1 for the
  foreseeable future.

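Because a V2 file is an Arrow IPC file on disk, the generic IPC file reader
should be able to open it. The following is a minimal sketch of that check.
The table contents and the temporary file name are made up for illustration,
and compression is disabled only so that the example does not depend on an
optional codec:

.. code-block:: python

   import os
   import tempfile

   import pyarrow as pa
   import pyarrow.feather as feather
   import pyarrow.ipc as ipc

   # Illustrative data; any pyarrow.Table would do.
   table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

   with tempfile.TemporaryDirectory() as tmpdir:
       path = os.path.join(tmpdir, "example.feather")

       # Write a Feather V2 file (the default version).
       feather.write_feather(table, path, compression="uncompressed")

       # Open the same file with the generic Arrow IPC file reader.
       with pa.OSFile(path, "rb") as source:
           ipc_table = ipc.open_file(source).read_all()

   # Compare the IPC result with the table that was written.
   print(ipc_table.equals(table))
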
The ``pyarrow.feather`` module contains the read and write functions for the
format. :func:`~pyarrow.feather.write_feather` accepts either a
:class:`~pyarrow.Table` or ``pandas.DataFrame`` object:

.. code-block:: python

   import pyarrow.feather as feather
   feather.write_feather(df, '/path/to/file')

:func:`~pyarrow.feather.read_feather` reads a Feather file as a
``pandas.DataFrame``. :func:`~pyarrow.feather.read_table` reads a Feather file
as a :class:`~pyarrow.Table`. Internally, :func:`~pyarrow.feather.read_feather`
simply calls :func:`~pyarrow.feather.read_table` and converts the result to
pandas:

.. code-block:: python

   # Result is pandas.DataFrame
   read_df = feather.read_feather('/path/to/file')

   # Result is pyarrow.Table
   read_arrow = feather.read_table('/path/to/file')

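As a self-contained illustration of the write and read functions, here is a
small round trip that uses only :class:`~pyarrow.Table` objects, so pandas is
not required. The data and the temporary file name are invented for the
example. The ``columns`` and ``memory_map`` arguments of ``read_table`` are not
discussed in this section; they are used here to read a subset of columns and
to avoid keeping the temporary file mapped:

.. code-block:: python

   import os
   import tempfile

   import pyarrow as pa
   import pyarrow.feather as feather

   # Illustrative data only.
   table = pa.table({
       "x": [1, 2, 3],
       "y": [0.5, 1.5, 2.5],
       "label": ["a", "b", "c"],
   })

   with tempfile.TemporaryDirectory() as tmpdir:
       path = os.path.join(tmpdir, "roundtrip.feather")
       feather.write_feather(table, path)

       # Read the whole file back as a pyarrow.Table.
       full = feather.read_table(path, memory_map=False)

       # Read only two of the three columns.
       subset = feather.read_table(
           path, columns=["x", "label"], memory_map=False
       )

   print(full.equals(table))
   print(subset.column_names)
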
These functions can read and write with file-paths or file-like objects. For
example:

.. code-block:: python

   with open('/path/to/file', 'wb') as f:
       feather.write_feather(df, f)

   with open('/path/to/file', 'rb') as f:
       read_df = feather.read_feather(f)

A file input to ``read_feather`` must support seeking.

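An in-memory ``io.BytesIO`` buffer is one example of a seekable file-like
object. The sketch below writes to such a buffer and reads it back. It uses
``read_table`` rather than ``read_feather`` only to avoid a pandas dependency,
and the table contents are made up:

.. code-block:: python

   import io

   import pyarrow as pa
   import pyarrow.feather as feather

   # Illustrative data only.
   table = pa.table({"n": [1, 2, 3]})

   buf = io.BytesIO()
   feather.write_feather(table, buf)

   # Rewind to the start of the buffer before reading from it.
   buf.seek(0)
   result = feather.read_table(buf)

   print(result.equals(table))
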
Using Compression
-----------------

As of Apache Arrow version 0.17.0, Feather V2 files (the default version)
support two fast compression libraries, LZ4 (using the frame format) and
ZSTD. LZ4 is used by default if it is available (which it should be if you
obtained pyarrow through a normal package manager):

.. code-block:: python

   # Uses LZ4 by default
   feather.write_feather(df, file_path)

   # Use LZ4 explicitly
   feather.write_feather(df, file_path, compression='lz4')

   # Use ZSTD
   feather.write_feather(df, file_path, compression='zstd')

   # Do not compress
   feather.write_feather(df, file_path, compression='uncompressed')

Note that the default LZ4 compression generally yields much smaller files
without sacrificing much read or write performance. In some instances,
LZ4-compressed files may be faster to read and write than uncompressed files
due to reduced disk IO requirements.

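One rough way to see the effect on file size is to write the same table with
each setting and compare the number of bytes produced. The sketch below does
this in memory. The data is deliberately repetitive and invented for the
example, so the ratios are not representative of real datasets, and
``pa.Codec.is_available`` is used only to skip codecs missing from a given
build:

.. code-block:: python

   import io

   import pyarrow as pa
   import pyarrow.feather as feather

   # Highly repetitive, made-up data that compresses very well.
   table = pa.table({"value": [0] * 100_000})

   sizes = {}
   for codec in ("uncompressed", "lz4", "zstd"):
       # Skip codecs that this pyarrow build was compiled without.
       if codec != "uncompressed" and not pa.Codec.is_available(codec):
           continue
       buf = io.BytesIO()
       feather.write_feather(table, buf, compression=codec)
       sizes[codec] = len(buf.getvalue())

   for codec, size in sizes.items():
       print(f"{codec}: {size} bytes")
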
Writing Version 1 (V1) Files
----------------------------

For compatibility with libraries without support for Version 2 files, you can
write the version 1 format by passing ``version=1`` to ``write_feather``. We
intend to maintain read support for V1 for the foreseeable future.
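
The following sketch writes the same table in both versions and reads the V1
file back. The table contents and file names are illustrative. The check on
the leading bytes relies on V1 files starting with ``FEA1`` and Arrow IPC
files starting with ``ARROW1``, which is a detail of the on-disk formats
rather than something this section documents:

.. code-block:: python

   import os
   import tempfile

   import pyarrow as pa
   import pyarrow.feather as feather

   # Illustrative data using types that V1 can store.
   table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

   with tempfile.TemporaryDirectory() as tmpdir:
       v1_path = os.path.join(tmpdir, "legacy.feather")
       v2_path = os.path.join(tmpdir, "current.feather")

       # V1 has no compression support, so no compression argument is passed.
       feather.write_feather(table, v1_path, version=1)
       feather.write_feather(table, v2_path, compression="uncompressed")

       # Inspect the leading magic bytes of each file.
       with open(v1_path, "rb") as f:
           v1_magic = f.read(4)
       with open(v2_path, "rb") as f:
           v2_magic = f.read(6)

       # The same read function handles both versions.
       v1_table = feather.read_table(v1_path, memory_map=False)

   print(v1_magic, v2_magic)
   print(v1_table.to_pydict() == table.to_pydict())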