Commit | Line | Data |
---|---|---|
1 | HTTP Parser | |
2 | =========== | |
3 | ||
4 | [![Build Status](https://api.travis-ci.org/nodejs/http-parser.svg?branch=master)](https://travis-ci.org/nodejs/http-parser) | |
5 | ||
6 | This is a parser for HTTP messages written in C. It parses both requests and | |
7 | responses. The parser is designed to be used in high-performance HTTP | |
8 | applications. It makes no syscalls or allocations, does not buffer data, | |
9 | and can be interrupted at any time. Depending on your | |
10 | architecture, it only requires about 40 bytes of data per message | |
11 | stream (in a web server that is per connection). | |
12 | ||
13 | Features: | |
14 | ||
15 | * No dependencies | |
16 | * Handles persistent streams (keep-alive). | |
17 | * Decodes chunked encoding. | |
18 | * Upgrade support | |
19 | * Defends against buffer overflow attacks. | |
20 | ||
21 | The parser extracts the following information from HTTP messages: | |
22 | ||
23 | * Header fields and values | |
24 | * Content-Length | |
25 | * Request method | |
26 | * Response status code | |
27 | * Transfer-Encoding | |
28 | * HTTP version | |
29 | * Request URL | |
30 | * Message body | |
31 | ||
32 | ||
33 | Usage | |
34 | ----- | |
35 | ||
36 | One `http_parser` object is used per TCP connection. Initialize the struct | |
37 | using `http_parser_init()` and set the callbacks. That might look something | |
38 | like this for a request parser: | |
39 | ```c | |
40 | http_parser_settings settings = {0}; /* callbacks left unset must be NULL */ | |
41 | settings.on_url = my_url_callback; | |
42 | settings.on_header_field = my_header_field_callback; | |
43 | /* ... */ | |
44 | ||
45 | http_parser *parser = malloc(sizeof(http_parser)); | |
46 | http_parser_init(parser, HTTP_REQUEST); | |
47 | parser->data = my_socket; | |
48 | ``` | |
49 | ||
50 | When data is received on the socket, execute the parser and check for errors. | |
51 | ||
52 | ```c | |
53 | size_t len = 80*1024, nparsed; | |
54 | char buf[len]; | |
55 | ssize_t recved; | |
56 | ||
57 | recved = recv(fd, buf, len, 0); | |
58 | ||
59 | if (recved < 0) { | |
60 | /* Handle error. */ | |
61 | } | |
62 | ||
63 | /* Start up / continue the parser. | |
64 | * Note we pass recved==0 to signal that EOF has been received. | |
65 | */ | |
66 | nparsed = http_parser_execute(parser, &settings, buf, recved); | |
67 | ||
68 | if (parser->upgrade) { | |
69 | /* handle new protocol */ | |
70 | } else if (nparsed != recved) { | |
71 | /* Handle error. Usually just close the connection. */ | |
72 | } | |
73 | ``` | |
74 | ||
75 | The parser needs to know where the end of the stream is. For example, sometimes | |
76 | servers send responses without Content-Length and expect the client to | |
77 | consume input (for the body) until EOF. To tell http_parser about EOF, pass | |
78 | `0` as the fourth parameter to `http_parser_execute()`. Callbacks and errors | |
79 | can still be encountered during an EOF, so one must still be prepared | |
80 | to receive them. | |
81 | ||
82 | Scalar-valued message information such as `status_code`, `method`, and the | |
83 | HTTP version is stored in the parser structure. This data is only | |
84 | temporarily stored in `http_parser` and gets reset on each new message. If | |
85 | this information is needed later, copy it out of the structure during the | |
86 | `on_headers_complete` callback. | |
87 | ||
88 | The parser decodes the Transfer-Encoding for both requests and responses | |
89 | transparently. That is, a chunked encoding is decoded before being sent to | |
90 | the `on_body` callback. | |
91 | ||
92 | ||
93 | The Special Problem of Upgrade | |
94 | ------------------------------ | |
95 | ||
96 | HTTP supports upgrading the connection to a different protocol. An | |
97 | increasingly common example of this is the WebSocket protocol which sends | |
98 | a request like | |
99 | ||
100 | GET /demo HTTP/1.1 | |
101 | Upgrade: WebSocket | |
102 | Connection: Upgrade | |
103 | Host: example.com | |
104 | Origin: http://example.com | |
105 | WebSocket-Protocol: sample | |
106 | ||
107 | followed by non-HTTP data. | |
108 | ||
109 | (See [RFC6455](https://tools.ietf.org/html/rfc6455) for more information on the | |
110 | WebSocket protocol.) | |
111 | ||
112 | To support this, the parser will treat this as a normal HTTP message without a | |
113 | body, issuing both `on_headers_complete` and `on_message_complete` callbacks. However, | |
114 | `http_parser_execute()` will stop parsing at the end of the headers and return. | |
115 | ||
116 | The user is expected to check if `parser->upgrade` has been set to 1 after | |
117 | `http_parser_execute()` returns. Non-HTTP data begins in the supplied buffer | |
118 | at the offset given by the return value of `http_parser_execute()`. | |
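
The offset arithmetic described above can be sketched as a small helper. This is only an illustration: `leftover_t` and `upgrade_leftover()` are hypothetical names that are not part of http-parser, and the sketch assumes `nparsed` is the value returned by `http_parser_execute()` for the same buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of http-parser: describes the bytes
 * that follow the HTTP headers in a buffer that was handed to the parser. */
typedef struct {
  const char *data; /* first byte of non-HTTP data */
  size_t len;       /* number of non-HTTP bytes left in the buffer */
} leftover_t;

/* buf, received: the buffer passed to http_parser_execute() and its length.
 * nparsed: the value http_parser_execute() returned for that buffer. */
static leftover_t upgrade_leftover(const char *buf, size_t received,
                                   size_t nparsed) {
  leftover_t rest;
  assert(buf != NULL && nparsed <= received);
  rest.data = buf + nparsed;     /* non-HTTP data starts at this offset */
  rest.len = received - nparsed; /* everything after the headers */
  return rest;
}
```

When `parser->upgrade` is set, the result would be handed to whatever code implements the new protocol. Bytes received afterwards no longer go through `http_parser_execute()`.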
119 | ||
120 | ||
121 | Callbacks | |
122 | --------- | |
123 | ||
124 | During the `http_parser_execute()` call, the callbacks set in | |
125 | `http_parser_settings` will be executed. The parser maintains state and | |
126 | never looks behind, so buffering the data is not necessary. If you need to | |
127 | save certain data for later usage, you can do that from the callbacks. | |
128 | ||
129 | There are two types of callbacks: | |
130 | ||
131 | * notification `typedef int (*http_cb) (http_parser*);` | |
132 | Callbacks: on_message_begin, on_headers_complete, on_message_complete. | |
133 | * data `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);` | |
134 | Callbacks: (requests only) on_url, | |
135 | (common) on_header_field, on_header_value, on_body. | |
136 | ||
137 | Callbacks must return 0 on success. Returning a non-zero value signals an | |
138 | error to the parser, which then stops parsing immediately. | |
139 | ||
140 | For cases where it is necessary to pass local information to/from a callback, | |
141 | the `http_parser` object's `data` field can be used. | |
142 | An example of such a case is when using threads to handle a socket connection, | |
143 | parse a request, and then give a response over that socket. Each thread | |
144 | allocates its own struct containing the relevant data (e.g. accepted socket, | |
145 | allocated memory for callbacks to write into, etc.) and stores it in `data`. | |
146 | The parser's callbacks can then exchange data with the owning thread without | |
147 | sharing state between threads. This allows http-parser to be used in | |
148 | multi-threaded contexts. | |
149 | ||
150 | Example: | |
151 | ```c | |
152 | typedef struct { | |
153 | socket_t sock; | |
154 | void* buffer; | |
155 | int buf_len; | |
156 | } custom_data_t; | |
157 | ||
158 | ||
159 | int my_url_callback(http_parser* parser, const char *at, size_t length) { | |
160 | /* Access the thread-local custom_data_t struct. | |
161 | Use it to save parsed data into the thread-local buffer for later | |
162 | use, or to communicate over the socket. | |
163 | */ | |
164 | custom_data_t *my_data = parser->data; | |
165 | ... | |
166 | return 0; | |
167 | } | |
168 | ||
169 | ... | |
170 | ||
171 | void http_parser_thread(socket_t sock) { | |
172 | size_t nparsed = 0; | |
173 | /* allocate memory for user data */ | |
174 | custom_data_t *my_data = malloc(sizeof(custom_data_t)); | |
175 | ||
176 | /* some information for use by callbacks. | |
177 | * achieves thread -> callback information flow */ | |
178 | my_data->sock = sock; | |
179 | ||
180 | /* instantiate a thread-local parser */ | |
181 | http_parser *parser = malloc(sizeof(http_parser)); | |
182 | http_parser_init(parser, HTTP_REQUEST); /* initialise parser */ | |
183 | /* this custom data reference is accessible through the reference to the | |
184 | parser supplied to callback functions */ | |
185 | parser->data = my_data; | |
186 | ||
187 | http_parser_settings settings = {0}; /* set up callbacks */ | |
188 | settings.on_url = my_url_callback; | |
189 | ||
190 | /* execute parser */ | |
191 | nparsed = http_parser_execute(parser, &settings, buf, recved); | |
192 | ||
193 | ... | |
194 | /* parsed information copied from callback. | |
195 | can now perform action on data copied into thread-local memory from callbacks. | |
196 | achieves callback -> thread information flow */ | |
197 | my_data->buffer; | |
198 | ... | |
199 | } | |
200 | ||
201 | ``` | |
202 | ||
203 | If you parse an HTTP message in chunks (e.g. `read()` the request line | |
204 | from the socket, parse, read half of the headers, parse, etc.), your data callbacks | |
205 | may be called more than once. http-parser guarantees that the data pointer is | |
206 | valid only for the lifetime of the callback. You can also `read()` into a heap-allocated | |
207 | buffer to avoid copying memory around if this fits your application. | |
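
Because a data callback may fire several times for one logical value, each fragment has to be copied somewhere that outlives the callback. The following is a minimal sketch of one way to do that for the URL. It assumes a per-connection struct reachable through `parser->data`; `request_state`, `url_append()` and the `URL_MAX` cap are hypothetical names chosen for this illustration, not part of http-parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define URL_MAX 2048 /* arbitrary cap chosen for this sketch */

/* Hypothetical per-connection state, reachable through parser->data. */
typedef struct {
  char url[URL_MAX + 1]; /* kept NUL-terminated after every append */
  size_t url_len;        /* bytes collected so far */
} request_state;

/* Append one fragment delivered to a data callback.
 * Returns 0 on success, or -1 if the URL would exceed URL_MAX. */
static int url_append(request_state *st, const char *at, size_t length) {
  assert(st != NULL && st->url_len <= URL_MAX);
  if (length > URL_MAX - st->url_len)
    return -1; /* too long: a callback can return this to stop the parser */
  if (length > 0)
    memcpy(st->url + st->url_len, at, length); /* copy: `at` dies after the callback */
  st->url_len += length;
  st->url[st->url_len] = '\0';
  return 0;
}
```

An `on_url` callback could then consist of `return url_append(parser->data, at, length);`, with `url_len` reset to 0 in `on_message_begin` so that each message on a keep-alive connection starts with an empty URL.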
208 | ||
209 | Reading headers can be tricky if you read/parse headers partially. | |
210 | You need to remember whether the last header callback was a field or a value | |
211 | and apply the following logic: | |
212 | ||
213 | (on_header_field and on_header_value shortened to on_h_*) | |
214 | ------------------------ ------------ -------------------------------------------- | |
215 | | State (prev. callback) | Callback | Description/action | | |
216 | ------------------------ ------------ -------------------------------------------- | |
217 | | nothing (first call) | on_h_field | Allocate new buffer and copy callback data | | |
218 | | | | into it | | |
219 | ------------------------ ------------ -------------------------------------------- | |
220 | | value | on_h_field | New header started. | | |
221 | | | | Copy current name,value buffers to headers | | |
222 | | | | list and allocate new buffer for new name | | |
223 | ------------------------ ------------ -------------------------------------------- | |
224 | | field | on_h_field | Previous name continues. Reallocate name | | |
225 | | | | buffer and append callback data to it | | |
226 | ------------------------ ------------ -------------------------------------------- | |
227 | | field | on_h_value | Value for current header started. Allocate | | |
228 | | | | new buffer and copy callback data to it | | |
229 | ------------------------ ------------ -------------------------------------------- | |
230 | | value | on_h_value | Value continues. Reallocate value buffer | | |
231 | | | | and append callback data to it | | |
232 | ------------------------ ------------ -------------------------------------------- | |
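
The table above can be sketched in C roughly as follows. This is a simplified illustration rather than the library's own code: `header_list`, `headers_on_field()`, `headers_on_value()` and the `MAX_HEADERS` limit are hypothetical names. Instead of copying finished buffers into a separate list, it grows each name and value in place with `realloc()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HEADERS 32 /* arbitrary limit for this sketch */

typedef enum { LAST_NONE, LAST_FIELD, LAST_VALUE } last_cb;

/* Hypothetical container; zero-initialize it before the first callback. */
typedef struct {
  char *name[MAX_HEADERS];
  char *value[MAX_HEADERS];
  size_t count; /* number of headers started so far */
  last_cb last; /* which header callback ran most recently */
} header_list;

/* Grow *dst (which may be NULL) by `length` bytes, keeping it NUL-terminated. */
static int append_bytes(char **dst, const char *at, size_t length) {
  size_t old = *dst ? strlen(*dst) : 0;
  char *p = realloc(*dst, old + length + 1);
  if (p == NULL)
    return -1;
  if (length > 0)
    memcpy(p + old, at, length);
  p[old + length] = '\0';
  *dst = p;
  return 0;
}

/* Intended to be called from on_header_field. */
static int headers_on_field(header_list *h, const char *at, size_t length) {
  assert(h != NULL);
  if (h->last != LAST_FIELD) { /* first call, or previous callback was a value */
    if (h->count == MAX_HEADERS)
      return -1;
    h->name[h->count] = NULL; /* a new header starts here */
    h->value[h->count] = NULL;
    h->count++;
  }
  h->last = LAST_FIELD; /* otherwise the previous name simply continues */
  return append_bytes(&h->name[h->count - 1], at, length);
}

/* Intended to be called from on_header_value. */
static int headers_on_value(header_list *h, const char *at, size_t length) {
  assert(h != NULL && h->count > 0);
  h->last = LAST_VALUE; /* value starts or continues; both cases append */
  return append_bytes(&h->value[h->count - 1], at, length);
}
```

The strings are heap-allocated, so the owner of the `header_list` should `free()` each `name` and `value` once the message has been handled.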
233 | ||
234 | ||
235 | Parsing URLs | |
236 | ------------ | |
237 | ||
238 | A simplistic zero-copy URL parser is provided as `http_parser_parse_url()`. | |
239 | Users of this library may wish to use it to parse URLs constructed from | |
240 | consecutive `on_url` callbacks. | |
241 | ||
242 | See examples of reading in headers: | |
243 | ||
244 | * [partial example](http://gist.github.com/155877) in C | |
245 | * [from http-parser tests](http://github.com/joyent/http-parser/blob/37a0ff8/test.c#L403) in C | |
246 | * [from Node library](http://github.com/joyent/node/blob/842eaf4/src/http.js#L284) in JavaScript