Commit | Line | Data |
---|---|---|
1d09f67e TL |
1 | % Generated by roxygen2: do not edit by hand |
2 | % Please edit documentation in R/dataset.R, R/dataset-factory.R | |
3 | \name{Dataset} | |
4 | \alias{Dataset} | |
5 | \alias{FileSystemDataset} | |
6 | \alias{UnionDataset} | |
7 | \alias{InMemoryDataset} | |
8 | \alias{DatasetFactory} | |
9 | \alias{FileSystemDatasetFactory} | |
10 | \title{Multi-file datasets} | |
11 | \description{ | |
12 | Arrow Datasets allow you to query against data that has been split across | |
13 | multiple files. This sharding of data may indicate partitioning, which | |
14 | can accelerate queries that only touch some partitions (files). | |
15 | ||
16 | A \code{Dataset} contains one or more \code{Fragments}, such as files, of potentially | |
17 | differing type and partitioning. | |
18 | ||
19 | For \code{Dataset$create()}, see \code{\link[=open_dataset]{open_dataset()}}, which is an alias for it. | |
20 | ||
21 | \code{DatasetFactory} is used to provide finer control over the creation of \code{Dataset}s. | |
22 | } | |
23 | \section{Factory}{ | |
24 | ||
25 | \code{DatasetFactory} is used to create a \code{Dataset}, inspect the \link{Schema} of the | |
26 | fragments contained in it, and declare a partitioning. | |
27 | \code{FileSystemDatasetFactory} is a subclass of \code{DatasetFactory} for | |
28 | discovering files in the local file system, the only currently supported | |
29 | file system. | |
30 | ||
31 | For the \code{DatasetFactory$create()} factory method, see \code{\link[=dataset_factory]{dataset_factory()}}, an | |
32 | alias for it. A \code{DatasetFactory} has: | |
33 | \itemize{ | |
34 | \item \verb{$Inspect(unify_schemas)}: If \code{unify_schemas} is \code{TRUE}, all fragments | |
35 | will be scanned and a unified \link{Schema} will be created from them; if \code{FALSE} | |
36 | (default), only the first fragment will be inspected for its schema. Use this | |
37 | fast path when you know and trust that all fragments have an identical schema. | |
38 | \item \verb{$Finish(schema, unify_schemas)}: Returns a \code{Dataset}. If \code{schema} is provided, | |
39 | it will be used for the \code{Dataset}; if omitted, a \code{Schema} will be created from | |
40 | inspecting the fragments (files) in the dataset, following \code{unify_schemas} | |
41 | as described above. | |
42 | } | |
43 | ||
44 | \code{FileSystemDatasetFactory$create()} is a lower-level factory method and | |
45 | takes the following arguments: | |
46 | \itemize{ | |
47 | \item \code{filesystem}: A \link{FileSystem} | |
48 | \item \code{selector}: Either a \link{FileSelector} or \code{NULL} | |
49 | \item \code{paths}: Either a character vector of file paths or \code{NULL} | |
50 | \item \code{format}: A \link{FileFormat} | |
51 | \item \code{partitioning}: Either \code{Partitioning}, \code{PartitioningFactory}, or \code{NULL} | |
52 | } | |
53 | } | |
54 | ||
55 | \section{Methods}{ | |
56 | ||
57 | ||
58 | A \code{Dataset} has the following methods: | |
59 | \itemize{ | |
60 | \item \verb{$NewScan()}: Returns a \link{ScannerBuilder} for building a query | |
61 | \item \verb{$schema}: Active binding that returns the \link{Schema} of the Dataset; you | |
62 | may also replace the dataset's schema by using \code{ds$schema <- new_schema}. | |
63 | Replacing the schema currently supports only adding, removing, or reordering | |
64 | fields in the schema: you cannot alter or cast the field types. | |
65 | } | |
66 | ||
67 | \code{FileSystemDataset} has the following methods: | |
68 | \itemize{ | |
69 | \item \verb{$files}: Active binding, returns the files of the \code{FileSystemDataset} | |
70 | \item \verb{$format}: Active binding, returns the \link{FileFormat} of the \code{FileSystemDataset} | |
71 | } | |
72 | ||
73 | \code{UnionDataset} has the following methods: | |
74 | \itemize{ | |
75 | \item \verb{$children}: Active binding, returns all child \code{Dataset}s. | |
76 | } | |
77 | } | |
78 | ||
79 | \seealso{ | |
80 | \code{\link[=open_dataset]{open_dataset()}} for a simple interface for creating a \code{Dataset} | |
81 | } |
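
The factory and method descriptions above can be exercised with a short R session. This is a minimal sketch, not part of the generated documentation: it assumes the `arrow` package is installed with dataset support, and the temporary directory, the built-in `mtcars` data, and the `cyl` partition column are illustrative choices only.

```r
library(arrow)

# Assumption: a throwaway directory and the built-in mtcars data stand in
# for a real multi-file dataset. Partitioning on `cyl` yields one
# subdirectory per distinct value (4, 6, 8).
tmp <- tempfile()
dir.create(tmp)
write_dataset(mtcars, tmp, format = "parquet", partitioning = "cyl")

# Factory route: discover the files, inspect their schema, build the Dataset.
factory <- dataset_factory(tmp, format = "parquet", partitioning = hive_partition())

# unify_schemas = TRUE scans every fragment; FALSE inspects only the first.
inspected <- factory$Inspect(unify_schemas = TRUE)

# With no schema supplied, Finish() derives one from the fragments.
ds <- factory$Finish()

# FileSystemDataset active bindings.
files <- ds$files
fmt <- ds$format

# Dataset methods: read the schema and build a scan over all fragments.
sch <- ds$schema
tab <- ds$NewScan()$Finish()$ToTable()
```

For everyday use, `open_dataset(tmp)` would normally replace the explicit factory steps; the factory route is mainly useful when the schema needs to be inspected or supplied before the `Dataset` is created.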