---
title: "Connecting to Flight RPC Servers"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Connecting to Flight RPC Servers}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

[**Flight**](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/)
is a general-purpose client-server framework for high-performance
transport of large datasets over network interfaces, built as part of the
[Apache Arrow](https://arrow.apache.org) project.

Flight allows for highly efficient data transfer as it:

* removes the need for deserialization during data transfer
* allows for parallel data streaming
* is highly optimized to take advantage of Arrow's columnar format.

The arrow package provides methods for connecting to Flight RPC servers
to send and receive data.

## Getting Started

The `flight` functions in the package use [reticulate](https://rstudio.github.io/reticulate/) to call methods in the
[pyarrow](https://arrow.apache.org/docs/python/api/flight.html) Python package.

Before using them for the first time,
make sure you have both reticulate and pyarrow installed:

```r
install.packages("reticulate")
arrow::install_pyarrow()
```

See `vignette("python", package = "arrow")` for more details on setting up
`pyarrow`.

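To confirm the setup before calling any of the `flight` functions, a quick check along the following lines may help. This is a minimal sketch: `flight_deps_ready()` is a hypothetical helper name, not part of the arrow package, and it relies only on `requireNamespace()` and `reticulate::py_module_available()`.

```r
# Hypothetical helper (not part of arrow): returns TRUE only if reticulate
# is installed and the Python environment it finds can import pyarrow.
flight_deps_ready <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  reticulate::py_module_available("pyarrow")
}

flight_deps_ready()
```

If this returns `FALSE`, rerunning the installation steps above is a reasonable first thing to try.
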
## Example

The package includes methods for starting a Python-based Flight server, as well
as methods for connecting to a Flight server running elsewhere.

To illustrate both sides, in one process let's start a demo server:

```r
library(arrow)
demo_server <- load_flight_server("demo_flight_server")
server <- demo_server$DemoFlightServer(port = 8089)
server$serve()
```

We'll leave that one running.

In a different R process, let's connect to it and put some data in it.

```r
library(arrow)
client <- flight_connect(port = 8089)
# Upload some data to our server so there's something to demo
flight_put(client, iris, path = "test_data/iris")
```

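If the upload step might run more than once, it can be useful to check whether a path is already populated first. The sketch below assumes the arrow functions `flight_path_exists()`, `flight_put()`, and `list_flights()`. `put_if_missing()` is a hypothetical wrapper introduced here for illustration only.

```r
# Hypothetical helper (not part of arrow): upload `data` to `path` only when
# the server does not already hold something at that path.
put_if_missing <- function(client, data, path) {
  if (arrow::flight_path_exists(client, path)) {
    message("Skipping upload, path already exists: ", path)
    return(invisible(FALSE))
  }
  arrow::flight_put(client, data, path = path)
  invisible(TRUE)
}

# Usage, assuming the demo server above is listening on port 8089:
# client <- arrow::flight_connect(port = 8089)
# put_if_missing(client, iris, "test_data/iris")
# arrow::list_flights(client)
```

The usage lines are commented out because they need a running server. `list_flights()` is included as a way to inspect which paths the server currently holds.
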
Now, in a new R process, let's connect to the server and pull the data we
put there:

```r
library(arrow)
library(dplyr)
client <- flight_connect(port = 8089)
client %>%
  flight_get("test_data/iris") %>%
  group_by(Species) %>%
  summarize(max_petal = max(Petal.Length))

## # A tibble: 3 x 2
##   Species    max_petal
##   <fct>          <dbl>
## 1 setosa           1.9
## 2 versicolor       5.1
## 3 virginica        6.9
```

Because `flight_get()` returns an Arrow data structure, you can directly pipe
its result into a [dplyr](https://dplyr.tidyverse.org/) workflow.
See `vignette("dataset", package = "arrow")` for more information on working with Arrow objects via a dplyr interface.
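
The same dplyr pattern can be tried without a running server by building an Arrow Table locally. This is a sketch under the assumption that `Table$create(iris)` is a reasonable stand-in for what `flight_get()` returns. The explicit `collect()` is an addition here: depending on the arrow version, it may be needed to pull the result into an R data frame.

```r
library(arrow)
library(dplyr)

# Stand-in for the result of flight_get(): an Arrow Table built locally
tbl <- Table$create(iris)

result <- tbl %>%
  group_by(Species) %>%
  summarize(max_petal = max(Petal.Length)) %>%
  collect() %>%        # materialize the result as an R data frame
  arrange(Species)     # fix the row order for display

result
```
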