<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->

## Using Arrow's HDFS (Apache Hadoop Distributed File System) interface

### Build requirements

To build the integration, pass the following option to CMake:

```shell
-DARROW_HDFS=on
```

For convenience, we have bundled `hdfs.h` for libhdfs from Apache Hadoop in
Arrow's thirdparty directory. If you wish to build against the `hdfs.h` in your
installed Hadoop distribution, set the `$HADOOP_HOME` environment variable.

### Runtime requirements

By default, the HDFS client C++ class in `libarrow_io` uses the libhdfs JNI
interface to the Java Hadoop client. This library is loaded **at runtime**
(rather than at link / library load time, since the library may not be in your
`LD_LIBRARY_PATH`), and relies on the following environment variables:

* `HADOOP_HOME`: the root of your installed Hadoop distribution. It often
  contains `lib/native/libhdfs.so`.
* `JAVA_HOME`: the location of your Java SDK installation.
* `CLASSPATH`: must contain the Hadoop jars. You can set it using:

```shell
export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
```

* `ARROW_LIBHDFS_DIR` (optional): explicit location of `libhdfs.so` if it is
  installed somewhere other than `$HADOOP_HOME/lib/native`.

To accommodate distribution-specific nuances, the `JAVA_HOME` variable may be
set to the root path for the Java SDK, the JRE path itself, or to the directory
containing the `libjvm` library.

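As a quick pre-flight check, the variables above can be verified from the shell before starting an Arrow program. The sketch below is not part of Arrow or Hadoop: `find_libhdfs` and `find_libjvm` are hypothetical helper names, the order in which the two directories are checked is an assumption made for illustration rather than a description of Arrow's own lookup, and the Linux-style `libhdfs.so` file name is assumed.

```shell
# Hypothetical helpers for illustration; not part of Arrow or Hadoop.

# Print the directory that contains libhdfs.so, or return 1 if none is found.
# Assumption: look in ARROW_LIBHDFS_DIR first, then in $HADOOP_HOME/lib/native.
find_libhdfs() {
  if [ -n "${ARROW_LIBHDFS_DIR:-}" ] && [ -f "$ARROW_LIBHDFS_DIR/libhdfs.so" ]; then
    printf '%s\n' "$ARROW_LIBHDFS_DIR"
  elif [ -n "${HADOOP_HOME:-}" ] && [ -f "$HADOOP_HOME/lib/native/libhdfs.so" ]; then
    printf '%s\n' "$HADOOP_HOME/lib/native"
  else
    return 1
  fi
}

# Print the path of the first libjvm library found under JAVA_HOME,
# or return 1 if JAVA_HOME is unset or contains no such file.
find_libjvm() {
  [ -n "${JAVA_HOME:-}" ] || return 1
  found=$(find -L "$JAVA_HOME" -type f -name 'libjvm.*' 2>/dev/null | head -n 1)
  [ -n "$found" ] && printf '%s\n' "$found"
}
```

For example, `find_libhdfs || echo "libhdfs.so not found" >&2` reports a missing library before any HDFS call is attempted.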
### Mac specifics

The installed location of Java on OS X can vary; the following snippet sets
`JAVA_HOME` automatically for you:

```shell
export JAVA_HOME=$(/usr/libexec/java_home)
```

Homebrew's Hadoop does not include the native libraries. Apache does not build
them, so you must build Hadoop yourself to get them. See this Stack Overflow
answer for details:

http://stackoverflow.com/a/40051353/478288

Be sure to include the path to the native libraries in `JAVA_LIBRARY_PATH`:

```shell
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
```

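When `JAVA_LIBRARY_PATH` is empty or unset, the export above leaves a trailing `:` in the value. A minimal sketch that avoids this with the shell's `${VAR:+...}` expansion follows; `prepend_java_library_path` is a hypothetical helper name, not something provided by Arrow or Hadoop:

```shell
# Hypothetical helper: prepend a directory to JAVA_LIBRARY_PATH without
# leaving a trailing ':' when the variable is empty or unset.
prepend_java_library_path() {
  JAVA_LIBRARY_PATH="$1${JAVA_LIBRARY_PATH:+:$JAVA_LIBRARY_PATH}"
  export JAVA_LIBRARY_PATH
}

# Usage: prepend_java_library_path "$HADOOP_HOME/lib/native"
```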
If you get an error about needing to install Java 6, then add *BundledApp* and
*JNI* to the `JVMCapabilities` in `$JAVA_HOME/../Info.plist`. See:

https://oliverdowling.com.au/2015/10/09/oracles-jre-8-on-mac-os-x-el-capitan/

https://derflounder.wordpress.com/2015/08/08/modifying-oracles-java-sdk-to-run-java-applications-on-os-x/