--- /dev/null
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. default-domain:: cpp
+.. highlight:: cpp
+
+.. cpp:namespace:: arrow::csv
+
+=============================
+Reading and Writing CSV files
+=============================
+
+Arrow provides a fast CSV reader that ingests external data as Arrow
+tables, and a CSV writer that serializes Arrow data back to CSV.
+
+.. seealso::
+ :ref:`CSV reader/writer API reference <cpp-api-csv>`.
+
+Basic usage
+===========
+
+A CSV file is read from a :class:`~arrow::io::InputStream`.
+
+.. code-block:: cpp
+
+   #include "arrow/csv/api.h"
+
+   {
+     // ...
+     arrow::io::IOContext io_context = arrow::io::default_io_context();
+     std::shared_ptr<arrow::io::InputStream> input = ...;
+
+     auto read_options = arrow::csv::ReadOptions::Defaults();
+     auto parse_options = arrow::csv::ParseOptions::Defaults();
+     auto convert_options = arrow::csv::ConvertOptions::Defaults();
+
+     // Instantiate TableReader from input stream and options
+     auto maybe_reader =
+         arrow::csv::TableReader::Make(io_context,
+                                       input,
+                                       read_options,
+                                       parse_options,
+                                       convert_options);
+     if (!maybe_reader.ok()) {
+       // Handle TableReader instantiation error...
+     }
+     std::shared_ptr<arrow::csv::TableReader> reader = *maybe_reader;
+
+     // Read table from CSV file
+     auto maybe_table = reader->Read();
+     if (!maybe_table.ok()) {
+       // Handle CSV read error
+       // (for example a CSV syntax error or failed type conversion)
+     }
+     std::shared_ptr<arrow::Table> table = *maybe_table;
+   }
+
+A CSV file is written to a :class:`~arrow::io::OutputStream`.
+
+.. code-block:: cpp
+
+   #include "arrow/csv/api.h"
+
+   {
+     // One-shot write
+     // ...
+     std::shared_ptr<arrow::io::OutputStream> output = ...;
+     auto write_options = arrow::csv::WriteOptions::Defaults();
+     // WriteCSV takes the table by const reference and returns a Status
+     if (!arrow::csv::WriteCSV(*table, write_options, output.get()).ok()) {
+       // Handle writer error...
+     }
+   }
+   {
+     // Write incrementally
+     // ...
+     std::shared_ptr<arrow::io::OutputStream> output = ...;
+     auto write_options = arrow::csv::WriteOptions::Defaults();
+     auto maybe_writer = arrow::csv::MakeCSVWriter(output, schema, write_options);
+     if (!maybe_writer.ok()) {
+       // Handle writer instantiation error...
+     }
+     std::shared_ptr<arrow::ipc::RecordBatchWriter> writer = *maybe_writer;
+
+     // Write batches...
+     if (!writer->WriteRecordBatch(*batch).ok()) {
+       // Handle write error...
+     }
+
+     if (!writer->Close().ok()) {
+       // Handle close error...
+     }
+     if (!output->Close().ok()) {
+       // Handle file close error...
+     }
+   }
+
+.. note:: The writer does not yet support all Arrow types.
+
+Column names
+============
+
+There are three ways to determine column names when reading a CSV file:
+
+* By default, the column names are read from the first row in the CSV file
+* If :member:`ReadOptions::column_names` is set, it forces the column
+ names in the table to these values (the first row in the CSV file is
+ read as data)
+* If :member:`ReadOptions::autogenerate_column_names` is true, column names
+ will be autogenerated with the pattern "f0", "f1"... (the first row in the
+ CSV file is read as data)
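+
+The three settings can be compared side by side with a minimal sketch such
+as the one below. It assumes the CSV text is already in memory; the helper
+names (``ReadCsvString``, ``ReadWithHeaderRow``, ``ReadWithExplicitNames``,
+``ReadWithAutogeneratedNames``) are hypothetical and not part of the Arrow API.
+
+.. code-block:: cpp
+
+   #include <memory>
+   #include <string>
+   #include <utility>
+   #include <vector>
+
+   #include "arrow/api.h"
+   #include "arrow/csv/api.h"
+   #include "arrow/io/api.h"
+
+   // Hypothetical helper: parse in-memory CSV text with the given read options.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadCsvString(
+       const std::string& csv, const arrow::csv::ReadOptions& read_options) {
+     auto input =
+         std::make_shared<arrow::io::BufferReader>(arrow::Buffer::FromString(csv));
+     ARROW_ASSIGN_OR_RAISE(
+         auto reader,
+         arrow::csv::TableReader::Make(arrow::io::default_io_context(), input,
+                                       read_options,
+                                       arrow::csv::ParseOptions::Defaults(),
+                                       arrow::csv::ConvertOptions::Defaults()));
+     return reader->Read();
+   }
+
+   // Default: column names are taken from the first row.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadWithHeaderRow(
+       const std::string& csv) {
+     return ReadCsvString(csv, arrow::csv::ReadOptions::Defaults());
+   }
+
+   // Names forced by the caller; the first row is read as data.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadWithExplicitNames(
+       const std::string& csv, std::vector<std::string> names) {
+     auto read_options = arrow::csv::ReadOptions::Defaults();
+     read_options.column_names = std::move(names);
+     return ReadCsvString(csv, read_options);
+   }
+
+   // Names autogenerated as "f0", "f1", ...; the first row is read as data.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadWithAutogeneratedNames(
+       const std::string& csv) {
+     auto read_options = arrow::csv::ReadOptions::Defaults();
+     read_options.autogenerate_column_names = true;
+     return ReadCsvString(csv, read_options);
+   }
+
+With a headerless two-line input, the default setting consumes the first
+line as the header, while the other two settings keep both lines as data.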
+
+Column selection
+================
+
+By default, Arrow reads all columns in the CSV file. You can narrow the
+selection of columns with the :member:`ConvertOptions::include_columns`
+option. If some columns in :member:`ConvertOptions::include_columns`
+are missing from the CSV file, an error will be emitted unless
+:member:`ConvertOptions::include_missing_columns` is true, in which case
+the missing columns are assumed to contain all-null values.
+
+Interaction with column names
+-----------------------------
+
+If both :member:`ReadOptions::column_names` and
+:member:`ConvertOptions::include_columns` are specified,
+the :member:`ReadOptions::column_names` are assumed to map to CSV columns,
+and :member:`ConvertOptions::include_columns` is a subset of those column
+names that will be part of the Arrow Table.
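+
+As a hedged illustration of how these options combine, the sketch below
+names three CSV columns explicitly and keeps only a caller-chosen subset.
+The function name ``ReadSelectedColumns`` and the column names ``a``, ``b``
+and ``c`` are assumptions made for the example.
+
+.. code-block:: cpp
+
+   #include <memory>
+   #include <string>
+   #include <utility>
+   #include <vector>
+
+   #include "arrow/api.h"
+   #include "arrow/csv/api.h"
+   #include "arrow/io/api.h"
+
+   // Hypothetical helper: the input has three columns, which the caller names
+   // "a", "b" and "c"; only the columns listed in `selected` reach the table.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadSelectedColumns(
+       const std::string& csv, std::vector<std::string> selected,
+       bool allow_missing) {
+     auto read_options = arrow::csv::ReadOptions::Defaults();
+     read_options.column_names = {"a", "b", "c"};  // maps to the CSV columns
+
+     auto convert_options = arrow::csv::ConvertOptions::Defaults();
+     convert_options.include_columns = std::move(selected);
+     // When true, selected names absent from the file become all-null columns
+     // instead of causing an error.
+     convert_options.include_missing_columns = allow_missing;
+
+     auto input =
+         std::make_shared<arrow::io::BufferReader>(arrow::Buffer::FromString(csv));
+     ARROW_ASSIGN_OR_RAISE(
+         auto reader,
+         arrow::csv::TableReader::Make(arrow::io::default_io_context(), input,
+                                       read_options,
+                                       arrow::csv::ParseOptions::Defaults(),
+                                       convert_options));
+     return reader->Read();
+   }
+
+Requesting a name that is not in the file, for example ``zz``, fails when
+``allow_missing`` is false and yields an all-null column when it is true.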
+
+Data types
+==========
+
+By default, the CSV reader infers the most appropriate data type for each
+column. Type inference considers the following data types, in order:
+
+* Null
+* Int64
+* Boolean
+* Date32
+* Time32 (with seconds unit)
+* Timestamp (with seconds unit)
+* Timestamp (with nanoseconds unit)
+* Float64
+* Dictionary<String> (if :member:`ConvertOptions::auto_dict_encode` is true)
+* Dictionary<Binary> (if :member:`ConvertOptions::auto_dict_encode` is true)
+* String
+* Binary
+
+It is possible to override type inference for select columns by setting
+the :member:`ConvertOptions::column_types` option. Explicit data types
+can be chosen from the following list:
+
+* Null
+* All Integer types
+* Float32 and Float64
+* Decimal128
+* Boolean
+* Date32 and Date64
+* Time32 and Time64
+* Timestamp
+* Binary and Large Binary
+* String and Large String (with optional UTF-8 input validation)
+* Fixed-Size Binary
+* Dictionary with index type Int32 and value type one of the following:
+ Binary, String, LargeBinary, LargeString, Int32, UInt32, Int64, UInt64,
+ Float32, Float64, Decimal128
+
+Other data types do not support conversion from CSV values; requesting one
+of them results in an error.
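+
+A minimal sketch of overriding inference for selected columns follows. The
+function name ``ReadWithExplicitTypes`` and the column names ``id`` and
+``code`` are hypothetical; columns that are not listed in
+``column_types`` are still inferred.
+
+.. code-block:: cpp
+
+   #include <memory>
+   #include <string>
+
+   #include "arrow/api.h"
+   #include "arrow/csv/api.h"
+   #include "arrow/io/api.h"
+
+   // Hypothetical helper: force "id" to Int32 and keep "code" as String so
+   // that values such as "007" are not converted to integers.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadWithExplicitTypes(
+       const std::string& csv) {
+     auto convert_options = arrow::csv::ConvertOptions::Defaults();
+     convert_options.column_types["id"] = arrow::int32();
+     convert_options.column_types["code"] = arrow::utf8();
+
+     auto input =
+         std::make_shared<arrow::io::BufferReader>(arrow::Buffer::FromString(csv));
+     ARROW_ASSIGN_OR_RAISE(
+         auto reader,
+         arrow::csv::TableReader::Make(arrow::io::default_io_context(), input,
+                                       arrow::csv::ReadOptions::Defaults(),
+                                       arrow::csv::ParseOptions::Defaults(),
+                                       convert_options));
+     return reader->Read();
+   }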
+
+Dictionary inference
+--------------------
+
+If type inference is enabled and :member:`ConvertOptions::auto_dict_encode`
+is true, the CSV reader first tries to convert string-like columns to a
+dictionary-encoded string-like array. It switches to a plain string-like
+array when the threshold in :member:`ConvertOptions::auto_dict_max_cardinality`
+is reached.
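+
+The effect of the threshold can be observed with a small sketch such as the
+following, where ``ReadWithDictionaryInference`` is a hypothetical helper
+that enables dictionary inference with a caller-supplied maximum cardinality.
+
+.. code-block:: cpp
+
+   #include <cstdint>
+   #include <memory>
+   #include <string>
+
+   #include "arrow/api.h"
+   #include "arrow/csv/api.h"
+   #include "arrow/io/api.h"
+
+   // Hypothetical helper: infer types with dictionary encoding enabled.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadWithDictionaryInference(
+       const std::string& csv, int32_t max_cardinality) {
+     auto convert_options = arrow::csv::ConvertOptions::Defaults();
+     convert_options.auto_dict_encode = true;
+     convert_options.auto_dict_max_cardinality = max_cardinality;
+
+     auto input =
+         std::make_shared<arrow::io::BufferReader>(arrow::Buffer::FromString(csv));
+     ARROW_ASSIGN_OR_RAISE(
+         auto reader,
+         arrow::csv::TableReader::Make(arrow::io::default_io_context(), input,
+                                       arrow::csv::ReadOptions::Defaults(),
+                                       arrow::csv::ParseOptions::Defaults(),
+                                       convert_options));
+     return reader->Read();
+   }
+
+For a string column with three distinct values, a generous threshold should
+produce a dictionary-typed column, while a threshold of 1 should cause the
+reader to fall back to a plain string-like column.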
+
+Nulls
+-----
+
+Null values are recognized from the spellings stored in
+:member:`ConvertOptions::null_values`. The :func:`ConvertOptions::Defaults`
+factory method will initialize a number of conventional null spellings such
+as ``N/A``.
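+
+Additional spellings can be appended to the defaults, as in this sketch.
+``ReadWithExtraNullSpelling`` is a hypothetical helper, and the spelling
+passed to it (such as ``unknown``) is chosen by the caller.
+
+.. code-block:: cpp
+
+   #include <memory>
+   #include <string>
+
+   #include "arrow/api.h"
+   #include "arrow/csv/api.h"
+   #include "arrow/io/api.h"
+
+   // Hypothetical helper: keep the conventional null spellings and add one more.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadWithExtraNullSpelling(
+       const std::string& csv, const std::string& spelling) {
+     auto convert_options = arrow::csv::ConvertOptions::Defaults();
+     convert_options.null_values.push_back(spelling);
+
+     auto input =
+         std::make_shared<arrow::io::BufferReader>(arrow::Buffer::FromString(csv));
+     ARROW_ASSIGN_OR_RAISE(
+         auto reader,
+         arrow::csv::TableReader::Make(arrow::io::default_io_context(), input,
+                                       arrow::csv::ReadOptions::Defaults(),
+                                       arrow::csv::ParseOptions::Defaults(),
+                                       convert_options));
+     return reader->Read();
+   }
+
+Because both the default spelling ``N/A`` and the added one are treated as
+null, a column mixing them with integers can still be inferred as Int64.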
+
+Character encoding
+------------------
+
+CSV files are expected to be encoded in UTF-8. However, non-UTF-8 data
+is accepted for Binary columns.
+
+Write Options
+=============
+
+The format of written CSV files can be customized via :class:`~arrow::csv::WriteOptions`.
+Currently few options are available; more will be added in future releases.
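+
+As one example, the sketch below toggles ``WriteOptions::include_header``
+and writes a small table to an in-memory buffer. The helpers
+``MakeSampleTable`` and ``TableToCsv`` are hypothetical; consult the API
+reference for the options available in your Arrow version.
+
+.. code-block:: cpp
+
+   #include <cstdint>
+   #include <memory>
+   #include <string>
+   #include <vector>
+
+   #include "arrow/api.h"
+   #include "arrow/csv/api.h"
+   #include "arrow/io/api.h"
+
+   // Hypothetical helper: build a one-column Int64 table named "a".
+   arrow::Result<std::shared_ptr<arrow::Table>> MakeSampleTable() {
+     arrow::Int64Builder builder;
+     ARROW_RETURN_NOT_OK(builder.AppendValues(std::vector<int64_t>{1, 2}));
+     ARROW_ASSIGN_OR_RAISE(auto values, builder.Finish());
+     auto schema = arrow::schema({arrow::field("a", arrow::int64())});
+     return arrow::Table::Make(schema,
+                               std::vector<std::shared_ptr<arrow::Array>>{values});
+   }
+
+   // Hypothetical helper: serialize a table to CSV text, with or without the
+   // header row.
+   arrow::Result<std::string> TableToCsv(const arrow::Table& table,
+                                         bool include_header) {
+     auto write_options = arrow::csv::WriteOptions::Defaults();
+     write_options.include_header = include_header;
+
+     ARROW_ASSIGN_OR_RAISE(auto output, arrow::io::BufferOutputStream::Create());
+     ARROW_RETURN_NOT_OK(arrow::csv::WriteCSV(table, write_options, output.get()));
+     ARROW_ASSIGN_OR_RAISE(auto buffer, output->Finish());
+     return buffer->ToString();
+   }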
+
+Performance
+===========
+
+By default, the CSV reader parallelizes reads in order to exploit all
+CPU cores on your machine. You can disable this by setting
+:member:`ReadOptions::use_threads` to false. A reasonable expectation is at
+least 100 MB/s per core on a performant desktop or laptop computer (measured
+in source CSV bytes, not target Arrow data bytes).
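+
+A minimal sketch of switching between parallel and single-threaded reads is
+shown below. The helper name ``ReadCsvString`` is hypothetical; the resulting
+table is expected to be the same either way, only the use of threads differs.
+
+.. code-block:: cpp
+
+   #include <memory>
+   #include <string>
+
+   #include "arrow/api.h"
+   #include "arrow/csv/api.h"
+   #include "arrow/io/api.h"
+
+   // Hypothetical helper: read in-memory CSV text with or without threads.
+   arrow::Result<std::shared_ptr<arrow::Table>> ReadCsvString(
+       const std::string& csv, bool use_threads) {
+     auto read_options = arrow::csv::ReadOptions::Defaults();
+     read_options.use_threads = use_threads;  // true by default
+
+     auto input =
+         std::make_shared<arrow::io::BufferReader>(arrow::Buffer::FromString(csv));
+     ARROW_ASSIGN_OR_RAISE(
+         auto reader,
+         arrow::csv::TableReader::Make(arrow::io::default_io_context(), input,
+                                       read_options,
+                                       arrow::csv::ParseOptions::Defaults(),
+                                       arrow::csv::ConvertOptions::Defaults()));
+     return reader->Read();
+   }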