===============
 Ceph s3 select
===============

.. contents::

Overview
--------

 | The purpose of the **s3 select** engine is to create an efficient pipe between the user client and the storage node (the engine should be as close as possible to the storage).
 | It enables the user to define the exact portion of the data that should be received on the client side.
 | It also enables higher-level analytic applications (such as SPARK-SQL) to use this feature in order to improve their latency and throughput.

 | For example, given an s3-object of several GB (a CSV file), a user may need to extract a single column filtered by another column,
 | as in the following query:
 | ``select customer-id from s3Object where age>30 and age<65;``

 | Currently the whole s3-object must be retrieved from the OSD via RGW before filtering and extracting the data.
 | By "pushing down" the query into the OSD, it is possible to save a lot of network and CPU (serialization / deserialization).

 | **The bigger the object, and the more accurate the query, the better the performance**.

Basic workflow
--------------

 | An s3-select query is sent to RGW via `AWS-CLI <https://docs.aws.amazon.com/cli/latest/reference/s3api/select-object-content.html>`_.

 | It passes the authentication and permission process as an incoming message (POST).
 | **RGWSelectObj_ObjStore_S3::send_response_data** is the “entry point”; it handles each fetched chunk according to the input object-key.
 | **send_response_data** first handles the input query: it extracts the query and the other CLI parameters.

 | Per each newly fetched chunk (~4MB), it runs the s3-select query on that chunk.
 | The current implementation supports CSV objects, and since chunk boundaries randomly “cut” CSV rows in the middle, those broken lines (the first or last per chunk) are skipped while processing the query.
 | Those “broken” lines are stored and later merged with the next broken line (belonging to the next chunk), and finally processed.

 | Per each processed chunk an output message is formatted according to the AWS specification and sent back to the client.
 | For aggregation queries the last chunk must be identified as the end of input; following that, the s3-select engine initiates its end-of-process and produces the aggregated result.

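The broken-line handling described above can be sketched as follows; ``split_complete_rows`` is a hypothetical helper, not the engine's actual API. It splits a chunk into complete rows and returns the trailing partial row, so the caller can prepend it to the next chunk:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch (not the engine's actual API): split a fetched chunk
// into complete CSV rows. The previous chunk's broken tail is merged in
// front, and the new broken tail is returned for the next chunk.
std::pair<std::vector<std::string>, std::string>
split_complete_rows(const std::string& carry, const std::string& chunk)
{
  std::string data = carry + chunk;      // merge the previous broken line
  std::vector<std::string> rows;
  size_t start = 0, pos;
  while ((pos = data.find('\n', start)) != std::string::npos) {
    rows.push_back(data.substr(start, pos - start));
    start = pos + 1;
  }
  return {rows, data.substr(start)};     // remainder == new broken line
}
```

In this scheme the per-chunk state is a single string, which matches the stream-oriented processing described later in this document.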
Design Concepts
---------------

AST - Abstract Syntax Tree
~~~~~~~~~~~~~~~~~~~~~~~~~~
 | The s3-select main flow starts with parsing the input string (i.e. the user query), and follows
 | with building an AST (abstract syntax tree) as a result.
 | The execution phase is built upon the AST.

 | ``base_statement`` is the base class for all the object-nodes participating in the execution phase; it declares the ``eval()`` method, which returns a ``value`` object.

 | The ``value`` object handles the known basic types such as int, string, float and timestamp.
 | It is able to perform comparisons and basic arithmetic operations on the mentioned types.

 | The execution flow actually calls the ``eval()`` method on the root node (per each projection); the call goes all the way down and returns the actual result (a ``value`` object) from the bottom nodes back up to the root node.

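A minimal sketch of this pattern (names simplified; the real ``value`` type also covers strings and timestamps, so a plain double stands in for it here):

```cpp
#include <memory>
#include <utility>

// Simplified sketch of the execution-tree pattern described above.
// Every node derives from a common base and implements eval(); the
// root's eval() recurses to the leaves and carries results back up.
struct base_statement {
  virtual double eval() const = 0;
  virtual ~base_statement() = default;
};

struct constant : base_statement {
  double v;
  explicit constant(double x) : v(x) {}
  double eval() const override { return v; }
};

struct mult : base_statement {
  std::unique_ptr<base_statement> l, r;
  mult(std::unique_ptr<base_statement> a, std::unique_ptr<base_statement> b)
    : l(std::move(a)), r(std::move(b)) {}
  // the result travels bottom-up: leaves first, then the operator node
  double eval() const override { return l->eval() * r->eval(); }
};
```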
 | The **alias** programming construct is an essential part of the s3-select language; it enables much better programming, especially with objects containing many columns or in the case of complex queries.

 | Upon parsing a statement containing an alias construct, the alias is replaced with a reference to the correct AST node; at runtime that node is simply evaluated like any other node.

 | There is a risk that a self (or cyclic) reference may occur, causing a stack overflow (endless loop); for that reason, upon evaluating an alias, it is validated against cyclic references.

 | An alias also maintains a result cache, meaning that when the same alias is used more than once, the referenced node is not evaluated again (it would return the same result); instead, the result is taken from the cache.

 | Of course, per each new row the cache is invalidated.

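The caching and cycle-guard behaviour can be sketched like this (``alias_node``, ``new_row`` and the function-pointer target are illustrative names, not the engine's actual API):

```cpp
#include <optional>
#include <stdexcept>

int call_count = 0;                       // counts target evaluations
int sample_target() { ++call_count; return 42; }

// Illustrative sketch of alias evaluation: the first eval() per row
// computes and caches the referenced node's result, repeated uses hit
// the cache, and new_row() invalidates it. A flag guards against a
// cyclic alias reference re-entering eval().
struct alias_node {
  int (*target)();                        // the AST node the alias refers to
  std::optional<int> cache;
  bool evaluating = false;

  int eval() {
    if (evaluating)
      throw std::runtime_error("cyclic alias reference");
    if (!cache) {
      evaluating = true;
      cache = target();                   // evaluate the referenced node once
      evaluating = false;
    }
    return *cache;                        // repeated uses reuse the result
  }
  void new_row() { cache.reset(); }       // cache is invalidated per new row
};
```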
S3 select parser definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~
 | The implementation of s3-select uses `boost::spirit <https://www.boost.org/doc/libs/1_71_0/libs/spirit/classic/doc/grammar.html>`_; the definition of the s3-select command follows AWS.

 | When parsing is initiated on the input text and a specific rule is identified, the action bound to that rule is executed.
 | Those actions build the AST; each action is unique (as is its rule), and at the end of the process they form a tree-like structure.

 | As mentioned, running ``eval()`` on the root node executes the s3-select statement (per projection).
 | The input stream is accessible to the execution tree through the scratch-area object, which is constantly updated per each new row.

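boost::spirit itself is too large to reproduce here, but the rule-fires-action pattern it provides can be illustrated with a hand-rolled scanner. Everything below is a toy stand-in, not the real grammar: a matched "number" rule and a matched "+" rule each fire an action, and the actions together build a postfix list, standing in for AST construction:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy stand-in for the rule/action pattern: scanning "1+22+333" fires a
// "number" action per integer and counts each '+' rule match, and the
// actions build a postfix representation (our stand-in for the AST).
std::vector<std::string> parse_additions(const std::string& in)
{
  std::vector<std::string> built;        // what the actions construct
  std::string num;
  int ops = 0;
  auto number_action = [&] {             // fired when a number rule ends
    if (!num.empty()) { built.push_back(num); num.clear(); }
  };
  for (char c : in) {
    if (std::isdigit(static_cast<unsigned char>(c)))
      num += c;                          // still inside the number rule
    else if (c == '+') { number_action(); ++ops; }
  }
  number_action();
  for (int i = 0; i < ops; ++i)          // attach the operator nodes
    built.push_back("+");
  return built;
}
```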
Basic functionalities
~~~~~~~~~~~~~~~~~~~~~

 | **S3select** has a definite set of functionalities that should be implemented (if we wish to stay compliant with AWS); currently only a portion of it is implemented.

 | The implemented software architecture supports basic arithmetic expressions, logical and compare expressions, including nested function calls and casting operators; that alone gives the user reasonable flexibility.
 | Review the feature-table_ below.


Memory handling
~~~~~~~~~~~~~~~

 | S3select structures and objects are lockless and thread-safe; placement-new is used in order to reduce the intensive alloc/dealloc cycles, which may impact the main process hosting s3-select.

 | Once the AST is built, there is no need to allocate memory for the execution itself; the AST is “static” for the query-execution life cycle.

 | The execution itself is stream-oriented, meaning there is no pre-allocation before execution; object size has no impact on memory consumption.

 | It processes chunk after chunk, row after row; all the memory needed for processing resides in the AST.

 | The AST is similar to a stack in its behaviour, in that it consumes already-allocated memory and “releases” it upon completing its task.

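The placement-new technique mentioned above can be sketched as follows (``value_slot`` and ``row_value`` are illustrative names): one buffer is allocated once, and the per-row object is constructed into it again and again, so the hot path performs no heap allocation.

```cpp
#include <new>

struct row_value {                        // illustrative per-row object
  long v;
  explicit row_value(long x) : v(x) {}
};

// A slot of pre-allocated storage reused across rows via placement-new:
// the previous occupant is destroyed explicitly, then a fresh object is
// constructed in the same bytes -- no malloc/free on the hot path.
struct value_slot {
  alignas(row_value) unsigned char buf[sizeof(row_value)];
  row_value* cur = nullptr;

  row_value* set(long x) {
    if (cur) cur->~row_value();           // destroy the previous occupant
    cur = new (buf) row_value(x);         // construct in-place
    return cur;
  }
};
```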
S3 Object different types
~~~~~~~~~~~~~~~~~~~~~~~~~

 | The processing of the input stream is decoupled from the s3-select engine, meaning that each input type should have its own parser, converting the s3-object into columns.

 | The current implementation includes only a CSV reader; its parsing definitions are according to AWS.
 | The parser is implemented using a `boost::state-machine <https://www.boost.org/doc/libs/1_64_0/libs/msm/doc/HTML/index.html>`_.

 | The CSV parser handles NULL, quote and escape rules, the field delimiter and the row delimiter, and users may define all of those dynamically (via AWS CLI).

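The real parser is a boost::msm state machine; the sketch below is only a two-state loop illustrating the same rules (and it strips quotes, which is a simplification): a quote toggles a state in which the field delimiter loses its meaning, and the escape character keeps the following character literal.

```cpp
#include <string>
#include <vector>

// Two-state illustration of the CSV rules described above (simplified,
// not the engine's boost::msm implementation): inside a quoted section
// the field delimiter is ordinary text, and an escape character keeps
// the next character literal. Empty fields come out as empty strings
// (the engine's NULL tokens).
std::vector<std::string> tokenize_row(const std::string& row,
                                      char delim = ',', char quote = '"',
                                      char esc = '\\')
{
  std::vector<std::string> fields{""};
  bool quoted = false;
  for (size_t i = 0; i < row.size(); ++i) {
    char c = row[i];
    if (c == esc && i + 1 < row.size())
      fields.back() += row[++i];          // escape: next char is literal
    else if (c == quote)
      quoted = !quoted;                   // enter/leave the quoted state
    else if (c == delim && !quoted)
      fields.emplace_back();              // delimiter splits only unquoted
    else
      fields.back() += c;
  }
  return fields;
}
```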
Error Handling
~~~~~~~~~~~~~~
 | An s3-select statement may be syntactically correct but semantically wrong; one example is ``select a * b from …``, where a is a number and b is a string.
 | The current implementation is for CSV file types; CSV has no schema, so column types can only be evaluated at runtime.
 | This means that semantically wrong statements may fail only at runtime.

 | As for a syntax error such as ``select x frm stdin;``, the built-in parser fails on the first mismatch with the language definition and produces an error message back to the client (AWS-CLI).
 | The error message points at the location of the mismatch.

 | Fatal severity (attached to the exception) ends execution immediately; errors of other severities are counted, and upon reaching 100 of them, execution ends with an error message.


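That policy can be sketched as follows (``error_tracker`` and its ``report`` method are illustrative names, not the engine's actual API):

```cpp
// Illustrative sketch of the severity policy described above: a fatal
// error ends execution immediately, while non-fatal errors are counted
// and execution stops once 100 of them have accumulated.
struct error_tracker {
  static constexpr int limit = 100;
  int non_fatal = 0;

  // returns true while execution may continue
  bool report(bool fatal) {
    if (fatal)
      return false;                      // fatal: end immediately
    return ++non_fatal < limit;          // stop at the 100th error
  }
};
```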
AST demonstration
~~~~~~~~~~~~~~~~~
.. ditaa::

                               +---------------------+
                               |       select        |
              +----------------+---------------------+----------------+
              |                |                                      |
              |                |                                      |
              |                V                                      |
              |      +--------------------+                           |
              |      |      s3object      |                           |
              |      +--------------------+                           |
              |                                                       |
              V                                                       V
    +---------------------+                                    +-------------+
    |     projections     |                                    |    where    |
    +---------------------+                                    +-------------+
        |            |                                                |
        |            |                                                |
        V            V                                                V
    +-----------+  +-----------+                               +-------------+
    | multiply  |  |   date    |                               |     and     |
    +-----------+  +-----------+                               +-------------+
      |       |                                                 |          |
      |       |                                                 |          |
      V       V                                                 V          V
    +-------+  +-------+                             +-------+            +-------+
    |payment|  |  0.3  |                             |  EQ   |            |  LT   |
    +-------+  +-------+                             +---+---+            +---+---+
                                                       |    |              |    |
                                                       V    V              V    V
                                                +-------+  +----+     +-----+  +-----+
                                                | region|  |east|     | age |  | 30  |
                                                +-------+  +----+     +-----+  +-----+

Features Support
----------------

.. _feature-table:

The following table describes the current support for s3-select functionalities:

+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Feature                         | Detailed        | Example                                                               |
+=================================+=================+=======================================================================+
| Arithmetic operators            | ^ * / + - ( )   | select (int(_1)+int(_2))*int(_9) from stdin;                          |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
|                                 |                 | select ((1+2)*3.14) ^ 2 from stdin;                                   |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Compare operators               | > < >= <= == != | select _1,_2 from stdin where (int(_1)+int(_3))>int(_5);              |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Logical operators               | AND OR          | select count(*) from stdin where int(_1)>123 and int(_5)<200;         |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Casting operators               | int(expression) | select int(_1),int( 1.2 + 3.4) from stdin;                            |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
|                                 |float(expression)|                                                                       |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
|                                 | timestamp(...)  | select timestamp("1999:10:10-12:23:44") from stdin;                   |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Aggregation functions           | sum             | select sum(int(_1)) from stdin;                                       |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Aggregation functions           | min             | select min( int(_1) * int(_5) ) from stdin;                           |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Aggregation functions           | max             | select max(float(_1)),min(int(_5)) from stdin;                        |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Aggregation functions           | count           | select count(*) from stdin where (int(_1)+int(_3))>int(_5);           |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Timestamp functions             | extract         | select count(*) from stdin where                                      |
|                                 |                 | extract("year",timestamp(_2)) > 1950                                  |
|                                 |                 | and extract("year",timestamp(_1)) < 1960;                             |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Timestamp functions             | dateadd         | select count(0) from stdin where                                      |
|                                 |                 | datediff("year",timestamp(_1),dateadd("day",366,timestamp(_1))) == 1; |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Timestamp functions             | datediff        | select count(0) from stdin where                                      |
|                                 |                 | datediff("month",timestamp(_1),timestamp(_2)) == 2;                   |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Timestamp functions             | utcnow          | select count(0) from stdin where                                      |
|                                 |                 | datediff("hours",utcnow(),dateadd("day",1,utcnow())) == 24;           |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| String functions                | substr          | select count(0) from stdin where                                      |
|                                 |                 | int(substr(_1,1,4))>1950 and int(substr(_1,1,4))<1960;                |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Alias support                   |                 | select int(_1) as a1, int(_2) as a2 , (a1+a2) as a3                   |
|                                 |                 | from stdin where a3>100 and a3<300;                                   |
+---------------------------------+-----------------+-----------------------------------------------------------------------+

Sending Query to RGW
--------------------

Syntax
~~~~~~
The CSV default definitions for the field-delimiter, row-delimiter, quote-char and escape-char are: { , \\n " \\ }

::

 aws --endpoint-url http://localhost:8000 s3api select-object-content
  --bucket {BUCKET-NAME}
  --expression-type 'SQL'
  --input-serialization
  '{"CSV": {"FieldDelimiter": "," , "QuoteCharacter": "\"" , "RecordDelimiter" : "\n" , "QuoteEscapeCharacter" : "\\" , "FileHeaderInfo": "USE" }, "CompressionType": "NONE"}'
  --output-serialization '{"CSV": {}}'
  --key {OBJECT-NAME}
  --expression "select count(0) from stdin where int(_1)<10;" output.csv

CSV parsing behavior
--------------------

+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Feature                         | Description     | input ==> tokens                                                      |
+=================================+=================+=======================================================================+
| NULL                            | successive      | ,,1,,2, ==> {null}{null}{1}{null}{2}{null}                            |
|                                 | field delimiter |                                                                       |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| QUOTE                           | quote character | 11,22,"a,b,c,d",last ==> {11}{22}{"a,b,c,d"}{last}                    |
|                                 | overrides       |                                                                       |
|                                 | field delimiter |                                                                       |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| Escape                          | escape char     | 11,22,str=\\"abcd\\"\\,str2=\\"123\\",last                            |
|                                 | overrides       | ==> {11}{22}{str="abcd",str2="123"}{last}                             |
|                                 | meta-character. |                                                                       |
|                                 | escape removed  |                                                                       |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| row delimiter                   | no close quote, | 11,22,a="str,44,55,66                                                 |
|                                 | row delimiter is| ==> {11}{22}{a="str,44,55,66}                                         |
|                                 | closing line    |                                                                       |
+---------------------------------+-----------------+-----------------------------------------------------------------------+
| csv header info                 | FileHeaderInfo  | "**USE**" value means each token on the first line is a column-name,  |
|                                 | tag             | "**IGNORE**" value means to skip the first line                       |
+---------------------------------+-----------------+-----------------------------------------------------------------------+