---
title: "Connecting to Flight RPC Servers"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Connecting to Flight RPC Servers}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

[**Flight**](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/)
is a general-purpose client-server framework for high-performance
transport of large datasets over network interfaces, built as part of the
[Apache Arrow](https://arrow.apache.org) project.

Flight allows for highly efficient data transfer because it:

* removes the need for deserialization during data transfer
* allows for parallel data streaming
* is highly optimized to take advantage of Arrow's columnar format

The arrow package provides methods for connecting to Flight RPC servers
to send and receive data.

## Getting Started

The Flight functions in the arrow package use [reticulate](https://rstudio.github.io/reticulate/) to call methods in the
[pyarrow](https://arrow.apache.org/docs/python/api/flight.html) Python package.

Before using them for the first time,
make sure you have reticulate and pyarrow installed:

```r
install.packages("reticulate")
arrow::install_pyarrow()
```

See `vignette("python", package = "arrow")` for more details on setting up
`pyarrow`.
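
As an optional check, you can ask reticulate whether the Python environment it is bound to can import pyarrow and its Flight module. This is a minimal sketch built on `reticulate::py_module_available()`; the variable names are illustrative, and the possibility that Flight is missing mainly concerns pyarrow builds compiled without it.

```r
library(reticulate)

# TRUE if the Python environment reticulate uses can import pyarrow
has_pyarrow <- py_module_available("pyarrow")

# Flight is a separate pyarrow component, so check the submodule too
has_flight <- has_pyarrow && py_module_available("pyarrow.flight")

if (!has_flight) {
  message("pyarrow.flight is not importable; try arrow::install_pyarrow()")
}
```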

## Example

The package includes methods for starting a Python-based Flight server, as well
as methods for connecting to a Flight server running elsewhere.
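
Connecting to a server on another machine uses the same `flight_connect()` call with a `host` argument. A minimal sketch: the hostname `flight.example.com` is a placeholder, and `"grpc+tcp"` is the default `scheme` of `flight_connect()`, spelled out here only for clarity.

```r
library(arrow)

# Hypothetical remote host: replace with your server's address and port
remote_client <- flight_connect(
  host = "flight.example.com",
  port = 8089,
  scheme = "grpc+tcp"
)
```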

To illustrate both sides, let's start a demo server in one R process:

```r
library(arrow)
demo_server <- load_flight_server("demo_flight_server")
server <- demo_server$DemoFlightServer(port = 8089)
server$serve()
```

`server$serve()` blocks until the server shuts down, so leave that process running.
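
If you would rather drive the whole example from a single R session, one option is to run the demo server in a background R process. This sketch assumes the callr package is installed (callr is not otherwise used in this vignette); `callr::r_bg()` runs the given function in a separate R process, so the blocking `serve()` call ties up only that process. The two-second wait is a guess, not a documented startup time.

```r
library(callr)

# Start the demo Flight server in a background R process.
# The function runs in a fresh session, so it refers to arrow with `arrow::`.
server_proc <- r_bg(function(port) {
  demo_server <- arrow::load_flight_server("demo_flight_server")
  server <- demo_server$DemoFlightServer(port = port)
  server$serve() # blocks the background process, not this one
}, args = list(port = 8089))

# Give the server a moment to start listening
Sys.sleep(2)
server_proc$is_alive()

# ... connect with flight_connect(port = 8089) as in the steps that follow ...

# Stop the background server when you are done
server_proc$kill()
```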

In a different R process, let's connect to it and put some data in it.

```r
library(arrow)
client <- flight_connect(port = 8089)
# Upload some data to our server so there's something to demo
flight_put(client, iris, path = "test_data/iris")
```
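
To confirm that the upload worked, you can query the server from the same process. `list_flights()` and `flight_path_exists()` are exported by arrow alongside `flight_put()`; this sketch assumes the demo server is still running on port 8089 with the iris data uploaded, and it opens a fresh client, which is harmless if `client` already exists.

```r
library(arrow)

client <- flight_connect(port = 8089)

# Character vector of the paths the server currently holds
available <- list_flights(client)
print(available)

# TRUE if the server has data stored under this path
uploaded <- flight_path_exists(client, "test_data/iris")
```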

Now, in a new R process, let's connect to the server and pull the data we
put there:

```r
library(arrow)
library(dplyr)
client <- flight_connect(port = 8089)
client %>%
  flight_get("test_data/iris") %>%
  group_by(Species) %>%
  summarize(max_petal = max(Petal.Length)) %>%
  collect() # pull the result into R as a tibble

## # A tibble: 3 x 2
##   Species    max_petal
##   <fct>          <dbl>
## 1 setosa           1.9
## 2 versicolor       5.1
## 3 virginica        6.9
```

Because `flight_get()` returns an Arrow Table, you can pipe its result directly
into a [dplyr](https://dplyr.tidyverse.org/) workflow.
See `vignette("dataset", package = "arrow")` for more information on working with Arrow objects via a dplyr interface.
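
To see what `flight_get()` hands back before any dplyr verbs are applied, you can inspect it directly. A minimal sketch, assuming the demo server is still running on port 8089 with the iris data uploaded: the result is an Arrow `Table`, which stays in Arrow memory until you convert it with `as.data.frame()` (or `collect()` at the end of a dplyr pipeline).

```r
library(arrow)

client <- flight_connect(port = 8089)
iris_tbl <- flight_get(client, "test_data/iris")

# An Arrow Table, not an R data.frame
class(iris_tbl)
iris_tbl$num_rows

# Materialize it as an R data frame when you need base R functions
iris_df <- as.data.frame(iris_tbl)
```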