.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

================
VectorSchemaRoot
================

A :class:`VectorSchemaRoot` is a container that holds batches of data; batches flow through a
:class:`VectorSchemaRoot` as part of a pipeline. Note that this differs from other implementations:
in C++ and Python, for example, a :class:`RecordBatch` is a collection of equal-length vector
instances, and a new one is created for each batch.

The recommended usage is to create a single :class:`VectorSchemaRoot` based on a known schema and to
populate data into that same instance over and over, as a stream of batches, rather than creating a new
:class:`VectorSchemaRoot` instance for each batch
(see `Flight <https://github.com/apache/arrow/tree/master/java/flight/src/main/java/org/apache/arrow/flight>`_ or
``ArrowFileWriter`` for examples). Thus, at any point in time a :class:`VectorSchemaRoot` may or may not
hold data (for example, because its data was transferred downstream or it has not been populated yet).

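As a rough sketch of this reuse pattern, the following example writes several batches through one
:class:`VectorSchemaRoot` with ``ArrowStreamWriter`` and reads them back with ``ArrowStreamReader``,
which likewise refills a single root on each ``loadNextBatch()`` call. The class name ``ReuseRootSketch``,
its methods, and the ``id`` column are illustrative assumptions and not part of the library.

.. code-block:: Java

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.channels.Channels;
    import java.util.Collections;

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;
    import org.apache.arrow.vector.VectorSchemaRoot;
    import org.apache.arrow.vector.ipc.ArrowStreamReader;
    import org.apache.arrow.vector.ipc.ArrowStreamWriter;
    import org.apache.arrow.vector.types.pojo.ArrowType;
    import org.apache.arrow.vector.types.pojo.Field;
    import org.apache.arrow.vector.types.pojo.Schema;

    // Hypothetical helper class, for illustration only.
    public class ReuseRootSketch {

      /** Writes numBatches batches through one root and returns the serialized stream. */
      static byte[] writeBatches(BufferAllocator allocator, int numBatches, int rowsPerBatch)
          throws IOException {
        Schema schema = new Schema(Collections.singletonList(
            Field.nullable("id", new ArrowType.Int(32, true))));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
             ArrowStreamWriter writer =
                 new ArrowStreamWriter(root, /*provider=*/ null, Channels.newChannel(out))) {
          IntVector ids = (IntVector) root.getVector("id");
          ids.allocateNew(rowsPerBatch);
          writer.start();
          for (int batch = 0; batch < numBatches; batch++) {
            // Clear the previous batch but keep the same root and vector.
            ids.reset();
            for (int row = 0; row < rowsPerBatch; row++) {
              ids.setSafe(row, batch * rowsPerBatch + row);
            }
            root.setRowCount(rowsPerBatch);
            // Serialize whatever the root currently holds as one batch.
            writer.writeBatch();
          }
          writer.end();
        }
        return out.toByteArray();
      }

      /** Reads the stream back and returns the total number of rows seen. */
      static int countRows(BufferAllocator allocator, byte[] stream) throws IOException {
        int total = 0;
        try (ArrowStreamReader reader =
                 new ArrowStreamReader(new ByteArrayInputStream(stream), allocator)) {
          // The reader exposes one root and refills it for every batch.
          VectorSchemaRoot root = reader.getVectorSchemaRoot();
          while (reader.loadNextBatch()) {
            total += root.getRowCount();
          }
        }
        return total;
      }

      /** Writes and then reads numBatches batches of rowsPerBatch rows each. */
      public static int roundTrip(int numBatches, int rowsPerBatch) throws IOException {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
          return countRows(allocator, writeBatches(allocator, numBatches, rowsPerBatch));
        }
      }
    }

Calling ``ReuseRootSketch.roundTrip(3, 4)`` writes three batches of four rows each through a single root,
so the reader sees 12 rows in total. Between batches the root is emptied with ``reset()`` and refilled,
which is the "may have data or may have no data" state described above.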

The following example builds a :class:`VectorSchemaRoot` from two vectors:

.. code-block:: Java

    // "allocator" is assumed to be an existing BufferAllocator, e.g. a RootAllocator.
    BitVector bitVector = new BitVector("boolean", allocator);
    VarCharVector varCharVector = new VarCharVector("varchar", allocator);
    bitVector.allocateNew();
    varCharVector.allocateNew();
    // Populate both vectors with ten rows.
    for (int i = 0; i < 10; i++) {
      bitVector.setSafe(i, i % 2 == 0 ? 0 : 1);
      varCharVector.setSafe(i, ("test" + i).getBytes(StandardCharsets.UTF_8));
    }
    bitVector.setValueCount(10);
    varCharVector.setValueCount(10);

    // Wrap the vectors and their fields in a VectorSchemaRoot.
    List<Field> fields = Arrays.asList(bitVector.getField(), varCharVector.getField());
    List<FieldVector> vectors = Arrays.asList(bitVector, varCharVector);
    VectorSchemaRoot vectorSchemaRoot = new VectorSchemaRoot(fields, vectors);

The vectors within a :class:`VectorSchemaRoot` can be loaded and unloaded via :class:`VectorLoader` and
:class:`VectorUnloader`. These classes handle converting between a :class:`VectorSchemaRoot` and an
:class:`ArrowRecordBatch` (the representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message).
For example:

.. code-block:: Java

    // Create a VectorSchemaRoot root1 and unload its data into a record batch.
    VectorSchemaRoot root1 = new VectorSchemaRoot(fields, vectors);
    VectorUnloader unloader = new VectorUnloader(root1);
    ArrowRecordBatch recordBatch = unloader.getRecordBatch();

    // Create an empty VectorSchemaRoot root2 with the same schema and load the record batch into it.
    VectorSchemaRoot root2 = VectorSchemaRoot.create(root1.getSchema(), allocator);
    VectorLoader loader = new VectorLoader(root2);
    loader.load(recordBatch);

    // The record batch holds its own buffer references; release them once it has been loaded.
    recordBatch.close();

A new :class:`VectorSchemaRoot` can be sliced from an existing instance with zero-copy:

.. code-block:: Java

    // 0 is the start index (inclusive) and 5 is the length, i.e. the number of rows in the slice.
    VectorSchemaRoot newRoot = vectorSchemaRoot.slice(0, 5);

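To make the index and length arguments concrete, here is a small sketch that slices rows 2 through 4
out of a ten-row root. The class ``SliceSketch`` and the ``id`` column are hypothetical names chosen for
illustration. The sketch also assumes that the sliced root holds its own buffer references and therefore
has to be closed separately from the original root.

.. code-block:: Java

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;
    import org.apache.arrow.vector.VectorSchemaRoot;

    // Hypothetical helper class, for illustration only.
    public class SliceSketch {

      /** Returns {row count of the slice, first value in the slice}. */
      public static int[] sliceSummary() {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
          // Build a single-column root holding the values 0..9.
          IntVector ids = new IntVector("id", allocator);
          ids.allocateNew(10);
          for (int i = 0; i < 10; i++) {
            ids.set(i, i);
          }
          ids.setValueCount(10);

          // Closing the root also closes the vector it wraps; the slice is closed on its own.
          try (VectorSchemaRoot root = VectorSchemaRoot.of(ids);
               VectorSchemaRoot slice = root.slice(2, 3)) {
            IntVector sliced = (IntVector) slice.getVector(0);
            return new int[] {slice.getRowCount(), sliced.get(0)};
          }
        }
      }
    }

Here ``slice(2, 3)`` starts at row 2 and takes three rows, so the slice has a row count of 3 and its
first value is 2.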