Commit | Line | Data |
---|---|---|
13cf67c4 XL |
1 | ## Building a Single-Threaded Web Server |
2 | ||
3 | We’ll start by getting a single-threaded web server working. Before we begin, | |
4 | let’s look at a quick overview of the protocols involved in building web | |
5 | servers. The details of these protocols are beyond the scope of this book, but | |
6 | a brief overview will give you the information you need. | |
7 | ||
8 | The two main protocols involved in web servers are the *Hypertext Transfer | |
9 | Protocol* *(HTTP)* and the *Transmission Control Protocol* *(TCP)*. Both | |
10 | protocols are *request-response* protocols, meaning a *client* initiates | |
11 | requests and a *server* listens to the requests and provides a response to the | |
12 | client. The contents of those requests and responses are defined by the | |
13 | protocols. | |
14 | ||
15 | TCP is the lower-level protocol that describes the details of how information | |
16 | gets from one server to another but doesn’t specify what that information is. | |
17 | HTTP builds on top of TCP by defining the contents of the requests and | |
18 | responses. It’s technically possible to use HTTP with other protocols, but in | |
19 | the vast majority of cases, HTTP sends its data over TCP. We’ll work with the | |
20 | raw bytes of TCP and HTTP requests and responses. | |
21 | ||
22 | ### Listening to the TCP Connection | |
23 | ||
24 | Our web server needs to listen to a TCP connection, so that’s the first part | |
25 | we’ll work on. The standard library offers a `std::net` module that lets us do | |
26 | this. Let’s make a new project in the usual fashion: | |
27 | ||
f035d41b | 28 | ```console |
13cf67c4 XL |
29 | $ cargo new hello |
30 | Created binary (application) `hello` project | |
31 | $ cd hello | |
32 | ``` | |
33 | ||
34 | Now enter the code in Listing 20-1 in *src/main.rs* to start. This code will | |
35 | listen at the address `127.0.0.1:7878` for incoming TCP streams. When it gets | |
36 | an incoming stream, it will print `Connection established!`. | |
37 | ||
38 | <span class="filename">Filename: src/main.rs</span> | |
39 | ||
40 | ```rust,no_run | |
74b04a01 | 41 | {{#rustdoc_include ../listings/ch20-web-server/listing-20-01/src/main.rs}} |
13cf67c4 XL |
42 | ``` |
43 | ||
44 | <span class="caption">Listing 20-1: Listening for incoming streams and printing | |
45 | a message when we receive a stream</span> | |
46 | ||
47 | Using `TcpListener`, we can listen for TCP connections at the address | |
48 | `127.0.0.1:7878`. In the address, the section before the colon is an IP address | |
49 | representing your computer (this is the same on every computer and doesn’t | |
50 | represent the authors’ computer specifically), and `7878` is the port. We’ve | |
51 | chosen this port for two reasons: HTTP isn’t normally accepted on this port | |
52 | (its default is port 80), and 7878 is *rust* typed on a telephone. | |
53 | ||
54 | The `bind` function in this scenario works like the `new` function in that it | |
55 | will return a new `TcpListener` instance. The reason the function is called | |
56 | `bind` is that in networking, connecting to a port to listen to is known as | |
57 | “binding to a port.” | |
58 | ||
59 | The `bind` function returns a `Result<T, E>`, which indicates that binding | |
60 | might fail. For example, binding to port 80 requires administrator | |
fc512014 | 61 | privileges (nonadministrators can listen only on ports higher than 1023), so if |
13cf67c4 XL |
62 | we tried to bind to port 80 without being an administrator, binding wouldn’t |
63 | work. As another example, binding wouldn’t work if we ran two instances of our | |
64 | program and so had two programs listening to the same port. Because we’re | |
65 | writing a basic server just for learning purposes, we won’t worry about | |
66 | handling these kinds of errors; instead, we use `unwrap` to stop the program if | |
67 | errors happen. | |
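As a rough sketch of what handling these errors could look like instead of calling `unwrap`, the example below matches on the `io::ErrorKind` of a failed bind. The helper name `explain_bind_error` and its messages are hypothetical and are not part of the chapter’s listings.

```rust
use std::io;
use std::net::TcpListener;

/// Turn the two bind failures described above into a readable hint.
/// (Hypothetical helper; the chapter's listing calls `unwrap` instead.)
fn explain_bind_error(kind: io::ErrorKind) -> &'static str {
    match kind {
        io::ErrorKind::PermissionDenied => {
            "permission denied: low-numbered ports need administrator privileges"
        }
        io::ErrorKind::AddrInUse => {
            "address in use: another program is already listening on this port"
        }
        _ => "unexpected error while binding",
    }
}

fn main() {
    match TcpListener::bind("127.0.0.1:7878") {
        Ok(_listener) => println!("Listening on 127.0.0.1:7878"),
        Err(e) => eprintln!("Could not bind: {} ({})", explain_bind_error(e.kind()), e),
    }
}
```

Running two copies of a program like this at the same time is one way to see the second case in practice.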
68 | ||
69 | The `incoming` method on `TcpListener` returns an iterator that gives us a | |
70 | sequence of streams (more specifically, streams of type `TcpStream`). A single | |
71 | *stream* represents an open connection between the client and the server. A | |
72 | *connection* is the name for the full request and response process in which a | |
73 | client connects to the server, the server generates a response, and the server | |
74 | closes the connection. As such, `TcpStream` will read from itself to see what | |
75 | the client sent and then allow us to write our response to the stream. Overall, | |
76 | this `for` loop will process each connection in turn and produce a series of | |
77 | streams for us to handle. | |
78 | ||
79 | For now, our handling of the stream consists of calling `unwrap` to terminate | |
80 | our program if the stream has any errors; if there aren’t any errors, the | |
81 | program prints a message. We’ll add more functionality for the success case in | |
82 | the next listing. The reason we might receive errors from the `incoming` method | |
83 | when a client connects to the server is that we’re not actually iterating over | |
84 | connections. Instead, we’re iterating over *connection attempts*. The | |
85 | connection might not be successful for a number of reasons, many of them | |
86 | operating system specific. For example, many operating systems have a limit to | |
87 | the number of simultaneous open connections they can support; new connection | |
88 | attempts beyond that number will produce an error until some of the open | |
89 | connections are closed. | |
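Because each item from `incoming` is a connection attempt, a server that should keep running could skip failed attempts rather than stop. The following is a minimal sketch under that assumption; `accept_some` is a hypothetical name, and binding to port 0 (which asks the operating system for any free port) is used only so the demo can connect to itself.

```rust
use std::net::{TcpListener, TcpStream};

/// Process up to `limit` connection attempts and return how many succeeded.
/// Failed attempts are reported and skipped instead of ending the program.
fn accept_some(listener: &TcpListener, limit: usize) -> usize {
    let mut established = 0;
    for stream in listener.incoming().take(limit) {
        match stream {
            Ok(_stream) => {
                established += 1;
                println!("Connection established!");
            }
            Err(e) => eprintln!("Connection attempt failed: {}", e),
        }
    }
    established
}

fn main() {
    // Port 0 lets the operating system pick a free port for this demo.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    // Connect to ourselves so the loop has one attempt to process.
    let _client = TcpStream::connect(addr).unwrap();
    println!("{} connection(s) accepted", accept_some(&listener, 1));
}
```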
90 | ||
91 | Let’s try running this code! Invoke `cargo run` in the terminal and then load | |
92 | *127.0.0.1:7878* in a web browser. The browser should show an error message | |
93 | like “Connection reset,” because the server isn’t currently sending back any | |
94 | data. But when you look at your terminal, you should see several messages that | |
95 | were printed when the browser connected to the server! | |
96 | ||
97 | ```text | |
98 | Running `target/debug/hello` | |
99 | Connection established! | |
100 | Connection established! | |
101 | Connection established! | |
102 | ``` | |
103 | ||
104 | Sometimes, you’ll see multiple messages printed for one browser request; the | |
105 | reason might be that the browser is making a request for the page as well as a | |
106 | request for other resources, like the *favicon.ico* icon that appears in the | |
107 | browser tab. | |
108 | ||
109 | It could also be that the browser is trying to connect to the server multiple | |
110 | times because the server isn’t responding with any data. When `stream` goes out | |
111 | of scope and is dropped at the end of the loop, the connection is closed as | |
112 | part of the `drop` implementation. Browsers sometimes deal with closed | |
113 | connections by retrying, because the problem might be temporary. The important | |
114 | factor is that we’ve successfully gotten a handle to a TCP connection! | |
115 | ||
116 | Remember to stop the program by pressing <span class="keystroke">ctrl-c</span> | |
117 | when you’re done running a particular version of the code. Then restart `cargo | |
118 | run` after you’ve made each set of code changes to make sure you’re running the | |
119 | newest code. | |
120 | ||
121 | ### Reading the Request | |
122 | ||
123 | Let’s implement the functionality to read the request from the browser! To | |
124 | separate the concerns of first getting a connection and then taking some action | |
125 | with the connection, we’ll start a new function for processing connections. In | |
126 | this new `handle_connection` function, we’ll read data from the TCP stream and | |
127 | print it so we can see the data being sent from the browser. Change the code to | |
128 | look like Listing 20-2. | |
129 | ||
130 | <span class="filename">Filename: src/main.rs</span> | |
131 | ||
132 | ```rust,no_run | |
74b04a01 | 133 | {{#rustdoc_include ../listings/ch20-web-server/listing-20-02/src/main.rs}} |
13cf67c4 XL |
134 | ``` |
135 | ||
136 | <span class="caption">Listing 20-2: Reading from the `TcpStream` and printing | |
137 | the data</span> | |
138 | ||
139 | We bring `std::io::prelude` into scope to get access to certain traits that let | |
140 | us read from and write to the stream. In the `for` loop in the `main` function, | |
141 | instead of printing a message that says we made a connection, we now call the | |
142 | new `handle_connection` function and pass the `stream` to it. | |
143 | ||
144 | In the `handle_connection` function, we’ve made the `stream` parameter mutable. | |
145 | The reason is that the `TcpStream` instance keeps track of what data it returns | |
146 | to us internally. It might read more data than we asked for and save that data | |
147 | for the next time we ask for data. It therefore needs to be `mut` because its | |
148 | internal state might change; usually, we think of “reading” as not needing | |
149 | mutation, but in this case we need the `mut` keyword. | |
150 | ||
f9f354fc XL |
151 | Next, we need to actually read from the stream. We do this in two steps: |
152 | first, we declare a `buffer` on the stack to hold the data that is read in. | |
153 | We’ve made the buffer 1024 bytes in size, which is big enough to hold the | |
154 | data of a basic request and sufficient for our purposes in this chapter. If | |
155 | we wanted to handle requests of an arbitrary size, buffer management would | |
156 | need to be more complicated; we’ll keep it simple for now. We pass the buffer | |
157 | to `stream.read`, which will read bytes from the `TcpStream` and put them in | |
158 | the buffer. | |
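To make the reading step concrete without a live connection, here is a small sketch that reads from anything implementing the `Read` trait. The function name `read_once` is an assumption for illustration; it also uses the byte count returned by `read`, which the description above does not rely on.

```rust
use std::io::{self, Read};

/// Read once into a 1024-byte stack buffer and return only the bytes
/// that `read` reported as filled in.
fn read_once<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut buffer = [0u8; 1024];
    let bytes_read = reader.read(&mut buffer)?;
    Ok(buffer[..bytes_read].to_vec())
}

fn main() -> io::Result<()> {
    // A byte slice implements `Read`, so it can stand in for a `TcpStream`.
    let mut fake_stream: &[u8] = b"GET / HTTP/1.1\r\n\r\n";
    let data = read_once(&mut fake_stream)?;
    println!("read {} bytes", data.len());
    Ok(())
}
```

Note that `read` takes `&mut self`, which is the concrete reason the reader has to be mutable.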
13cf67c4 XL |
159 | |
160 | Second, we convert the bytes in the buffer to a string and print that string. | |
161 | The `String::from_utf8_lossy` function takes a `&[u8]` and produces a string | |
162 | (a `Cow<str>`) from it. The “lossy” part of the name indicates the behavior of this function | |
163 | when it sees an invalid UTF-8 sequence: it will replace the invalid sequence | |
164 | with `�`, the `U+FFFD REPLACEMENT CHARACTER`. You might see replacement | |
165 | characters for characters in the buffer that aren’t filled by request data. | |
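The lossy conversion can be tried on its own with a few bytes. This is a sketch only: `decode_request` is a hypothetical helper, and slicing the buffer to the number of bytes read is an assumption added here so that unused buffer space is not printed.

```rust
/// Decode only the first `bytes_read` bytes of `buffer`, replacing any
/// invalid UTF-8 with U+FFFD. Panics if `bytes_read` exceeds the buffer.
fn decode_request(buffer: &[u8], bytes_read: usize) -> String {
    String::from_utf8_lossy(&buffer[..bytes_read]).into_owned()
}

fn main() {
    // The byte 0xFF never appears in valid UTF-8, so it is replaced.
    let invalid = [b'H', b'i', 0xFF];
    println!("{}", decode_request(&invalid, invalid.len()));

    // Only the first 5 bytes are request data; the rest is unused buffer.
    let padded = b"GET /\0\0\0";
    println!("{:?}", decode_request(padded, 5));
}
```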
166 | ||
167 | Let’s try this code! Start the program and make a request in a web browser | |
168 | again. Note that we’ll still get an error page in the browser, but our | |
169 | program’s output in the terminal will now look similar to this: | |
170 | ||
f035d41b | 171 | ```console |
13cf67c4 XL |
172 | $ cargo run |
173 | Compiling hello v0.1.0 (file:///projects/hello) | |
74b04a01 | 174 | Finished dev [unoptimized + debuginfo] target(s) in 0.42s |
13cf67c4 XL |
175 | Running `target/debug/hello` |
176 | Request: GET / HTTP/1.1 | |
177 | Host: 127.0.0.1:7878 | |
178 | User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 | |
179 | Firefox/52.0 | |
180 | Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 | |
181 | Accept-Language: en-US,en;q=0.5 | |
182 | Accept-Encoding: gzip, deflate | |
183 | Connection: keep-alive | |
184 | Upgrade-Insecure-Requests: 1 | |
185 | ������������������������������������ | |
186 | ``` | |
187 | ||
188 | Depending on your browser, you might get slightly different output. Now that | |
189 | we’re printing the request data, we can see why we get multiple connections | |
190 | from one browser request by looking at the path after `Request: GET`. If the | |
191 | repeated connections are all requesting */*, we know the browser is trying to | |
192 | fetch */* repeatedly because it’s not getting a response from our program. | |
193 | ||
194 | Let’s break down this request data to understand what the browser is asking of | |
195 | our program. | |
196 | ||
197 | ### A Closer Look at an HTTP Request | |
198 | ||
199 | HTTP is a text-based protocol, and a request takes this format: | |
200 | ||
201 | ```text | |
202 | Method Request-URI HTTP-Version CRLF | |
203 | headers CRLF | |
204 | message-body | |
205 | ``` | |
206 | ||
207 | The first line is the *request line* that holds information about what the | |
208 | client is requesting. The first part of the request line indicates the *method* | |
209 | being used, such as `GET` or `POST`, which describes how the client is making | |
210 | this request. Our client used a `GET` request. | |
211 | ||
212 | The next part of the request line is */*, which indicates the *Uniform Resource | |
213 | Identifier* *(URI)* the client is requesting: a URI is almost, but not quite, | |
214 | the same as a *Uniform Resource Locator* *(URL)*. The difference between URIs | |
215 | and URLs isn’t important for our purposes in this chapter, but the HTTP spec | |
216 | uses the term URI, so we can just mentally substitute URL for URI here. | |
217 | ||
218 | The last part is the HTTP version the client uses, and then the request line | |
219 | ends in a *CRLF sequence*. (CRLF stands for *carriage return* and *line feed*, | |
220 | which are terms from the typewriter days!) The CRLF sequence can also be | |
221 | written as `\r\n`, where `\r` is a carriage return and `\n` is a line feed. The | |
222 | CRLF sequence separates the request line from the rest of the request data. | |
223 | Note that when the CRLF is printed, we see a new line start rather than `\r\n`. | |
224 | ||
225 | Looking at the request line data we received from running our program so far, | |
226 | we see that `GET` is the method, */* is the request URI, and `HTTP/1.1` is the | |
227 | version. | |
228 | ||
229 | After the request line, the remaining lines starting from `Host:` onward are | |
230 | headers. `GET` requests have no body. | |
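The three parts of the request line can be pulled apart with ordinary string methods. The sketch below assumes the request has already been decoded to a string; `parse_request_line` is a hypothetical helper and is not used by the chapter’s listings.

```rust
/// Split the first line of an HTTP request into (method, URI, version).
/// Returns `None` if the line does not have exactly three parts.
fn parse_request_line(request: &str) -> Option<(&str, &str, &str)> {
    // Everything before the first CRLF is the request line.
    let line = request.split("\r\n").next()?;
    let mut parts = line.split(' ');
    let method = parts.next()?;
    let uri = parts.next()?;
    let version = parts.next()?;
    if parts.next().is_some() {
        return None;
    }
    Some((method, uri, version))
}

fn main() {
    let request = "GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\n";
    if let Some((method, uri, version)) = parse_request_line(request) {
        println!("method={} uri={} version={}", method, uri, version);
    }
}
```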
231 | ||
232 | Try making a request from a different browser or asking for a different | |
233 | address, such as *127.0.0.1:7878/test*, to see how the request data changes. | |
234 | ||
235 | Now that we know what the browser is asking for, let’s send back some data! | |
236 | ||
237 | ### Writing a Response | |
238 | ||
239 | Now we’ll implement sending data in response to a client request. Responses | |
240 | have the following format: | |
241 | ||
242 | ```text | |
243 | HTTP-Version Status-Code Reason-Phrase CRLF | |
244 | headers CRLF | |
245 | message-body | |
246 | ``` | |
247 | ||
248 | The first line is a *status line* that contains the HTTP version used in the | |
249 | response, a numeric status code that summarizes the result of the request, and | |
250 | a reason phrase that provides a text description of the status code. After the | |
251 | CRLF sequence are any headers, another CRLF sequence, and the body of the | |
252 | response. | |
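One way to see how these pieces fit together is to assemble them in code. This is an illustrative sketch only; `build_response` and its parameters are hypothetical, and the HTTP version is fixed to 1.1 for brevity.

```rust
/// Assemble a response: status line, headers, a blank line, then the body.
fn build_response(
    status_code: u16,
    reason: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> String {
    let mut response = format!("HTTP/1.1 {} {}\r\n", status_code, reason);
    for (name, value) in headers {
        response.push_str(&format!("{}: {}\r\n", name, value));
    }
    // The empty line marks the end of the headers.
    response.push_str("\r\n");
    response.push_str(body);
    response
}

fn main() {
    // Debug formatting shows the CRLF sequences as `\r\n`.
    println!("{:?}", build_response(200, "OK", &[], ""));
}
```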
253 | ||
254 | Here is an example response that uses HTTP version 1.1, has a status code of | |
255 | 200, an OK reason phrase, no headers, and no body: | |
256 | ||
257 | ```text | |
258 | HTTP/1.1 200 OK\r\n\r\n | |
259 | ``` | |
260 | ||
261 | The status code 200 is the standard success response. The text is a tiny | |
262 | successful HTTP response. Let’s write this to the stream as our response to a | |
263 | successful request! From the `handle_connection` function, remove the | |
264 | `println!` that was printing the request data and replace it with the code in | |
265 | Listing 20-3. | |
266 | ||
267 | <span class="filename">Filename: src/main.rs</span> | |
268 | ||
74b04a01 XL |
269 | ```rust,no_run |
270 | {{#rustdoc_include ../listings/ch20-web-server/listing-20-03/src/main.rs:here}} | |
13cf67c4 XL |
271 | ``` |
272 | ||
273 | <span class="caption">Listing 20-3: Writing a tiny successful HTTP response to | |
274 | the stream</span> | |
275 | ||
276 | The first new line defines the `response` variable that holds the success | |
277 | message’s data. Then we call `as_bytes` on our `response` to convert the string | |
278 | data to bytes. The `write` method on `stream` takes a `&[u8]` and sends those | |
279 | bytes directly down the connection. | |
280 | ||
281 | Because the `write` operation could fail, we use `unwrap` on any error result | |
282 | as before. Again, in a real application you would add error handling here. | |
283 | Finally, `flush` will wait and prevent the program from continuing until all | |
284 | the bytes are written to the connection; `TcpStream` contains an internal | |
285 | buffer to minimize calls to the underlying operating system. | |
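The same write-then-flush sequence works on any type that implements `Write`, which makes it easy to try without a network connection. In this sketch, `send_response` is a hypothetical helper, and it swaps in `write_all` for `write` because a single `write` call is allowed to send only part of the bytes.

```rust
use std::io::{self, Write};

/// Write the whole response, then flush, returning any error to the caller
/// instead of calling `unwrap`.
fn send_response<W: Write>(stream: &mut W, response: &str) -> io::Result<()> {
    stream.write_all(response.as_bytes())?;
    stream.flush()
}

fn main() -> io::Result<()> {
    // A `Vec<u8>` implements `Write`, so it can stand in for a `TcpStream`.
    let mut fake_stream: Vec<u8> = Vec::new();
    send_response(&mut fake_stream, "HTTP/1.1 200 OK\r\n\r\n")?;
    println!("{:?}", String::from_utf8_lossy(&fake_stream));
    Ok(())
}
```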
286 | ||
287 | With these changes, let’s run our code and make a request. We’re no longer | |
288 | printing any data to the terminal, so we won’t see any output other than the | |
289 | output from Cargo. When you load *127.0.0.1:7878* in a web browser, you should | |
290 | get a blank page instead of an error. You’ve just hand-coded an HTTP request | |
291 | and response! | |
292 | ||
293 | ### Returning Real HTML | |
294 | ||
295 | Let’s implement the functionality for returning more than a blank page. Create | |
296 | a new file, *hello.html*, in the root of your project directory, not in the | |
297 | *src* directory. You can input any HTML you want; Listing 20-4 shows one | |
298 | possibility. | |
299 | ||
300 | <span class="filename">Filename: hello.html</span> | |
301 | ||
302 | ```html | |
74b04a01 | 303 | {{#include ../listings/ch20-web-server/listing-20-04/hello.html}} |
13cf67c4 XL |
304 | ``` |
305 | ||
306 | <span class="caption">Listing 20-4: A sample HTML file to return in a | |
307 | response</span> | |
308 | ||
309 | This is a minimal HTML5 document with a heading and some text. To return this | |
310 | from the server when a request is received, we’ll modify `handle_connection` as | |
311 | shown in Listing 20-5 to read the HTML file, add it to the response as a body, | |
312 | and send it. | |
313 | ||
314 | <span class="filename">Filename: src/main.rs</span> | |
315 | ||
74b04a01 XL |
316 | ```rust,no_run |
317 | {{#rustdoc_include ../listings/ch20-web-server/listing-20-05/src/main.rs:here}} | |
13cf67c4 XL |
318 | ``` |
319 | ||
320 | <span class="caption">Listing 20-5: Sending the contents of *hello.html* as the | |
321 | body of the response</span> | |
322 | ||
9fa01778 XL |
323 | We’ve added a line at the top to bring the standard library’s filesystem module |
324 | into scope. The code for reading the contents of a file to a string should look | |
13cf67c4 XL |
325 | familiar; we used it in Chapter 12 when we read the contents of a file for our |
326 | I/O project in Listing 12-4. | |
327 | ||
328 | Next, we use `format!` to add the file’s contents as the body of the success | |
f9f354fc XL |
329 | response. To ensure a valid HTTP response, we add the `Content-Length` header, |
330 | which is set to the size in bytes of our response body, in this case the size of `hello.html`. | |
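A minimal sketch of that formatting step, assuming the body is already in memory as a string (the hypothetical `response_with_body` takes it as a parameter rather than reading *hello.html*):

```rust
/// Build a 200 response whose `Content-Length` matches the body.
/// `str::len` counts bytes, which is the unit `Content-Length` uses.
fn response_with_body(body: &str) -> String {
    format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    let body = "<h1>Hello!</h1>";
    println!("{}", response_with_body(body));
}
```

Because the length is counted in bytes, a body containing non-ASCII characters gets a `Content-Length` larger than its character count.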
13cf67c4 XL |
331 | |
332 | Run this code with `cargo run` and load *127.0.0.1:7878* in your browser; you | |
333 | should see your HTML rendered! | |
334 | ||
335 | Currently, we’re ignoring the request data in `buffer` and just sending back | |
336 | the contents of the HTML file unconditionally. That means if you try requesting | |
337 | *127.0.0.1:7878/something-else* in your browser, you’ll still get back this | |
338 | same HTML response. Our server is very limited and doesn’t do what most web servers | |
339 | do. We want to customize our responses depending on the request and only send | |
340 | back the HTML file for a well-formed request to */*. | |
341 | ||
342 | ### Validating the Request and Selectively Responding | |
343 | ||
344 | Right now, our web server will return the HTML in the file no matter what the | |
345 | client requested. Let’s add functionality to check that the browser is | |
346 | requesting */* before returning the HTML file and return an error if the | |
347 | browser requests anything else. For this we need to modify `handle_connection`, | |
348 | as shown in Listing 20-6. This new code checks the content of the request | |
349 | received against what we know a request for */* looks like and adds `if` and | |
350 | `else` blocks to treat requests differently. | |
351 | ||
352 | <span class="filename">Filename: src/main.rs</span> | |
353 | ||
74b04a01 XL |
354 | ```rust,no_run |
355 | {{#rustdoc_include ../listings/ch20-web-server/listing-20-06/src/main.rs:here}} | |
13cf67c4 XL |
356 | ``` |
357 | ||
358 | <span class="caption">Listing 20-6: Matching the request and handling requests | |
532ac7d7 | 359 | to */* differently from other requests</span> |
13cf67c4 XL |
360 | |
361 | First, we hardcode the data corresponding to the */* request into the `get` | |
362 | variable. Because we’re reading raw bytes into the buffer, we transform `get` | |
363 | into a byte string by adding the `b""` byte string syntax at the start of the | |
364 | content data. Then we check whether `buffer` starts with the bytes in `get`. If | |
365 | it does, it means we’ve received a well-formed request to */*, which is the | |
366 | success case we’ll handle in the `if` block that returns the contents of our | |
367 | HTML file. | |
368 | ||
369 | If `buffer` does *not* start with the bytes in `get`, it means we’ve received | |
370 | some other request. We’ll add code to the `else` block in a moment to respond | |
371 | to all other requests. | |
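The check itself is small enough to try in isolation. The sketch below assumes the hardcoded request line is `GET / HTTP/1.1` followed by a CRLF; `is_root_get` is a hypothetical wrapper around the comparison, not a function from the listing.

```rust
/// Return true if the buffer begins with a request line asking for `/`.
fn is_root_get(buffer: &[u8]) -> bool {
    // The `b` prefix makes this a byte string, so it can be compared
    // with the raw bytes in the buffer.
    let get = b"GET / HTTP/1.1\r\n";
    buffer.starts_with(get)
}

fn main() {
    println!("{}", is_root_get(b"GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\n"));
    println!("{}", is_root_get(b"GET /something-else HTTP/1.1\r\n\r\n"));
}
```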
372 | ||
373 | Run this code now and request *127.0.0.1:7878*; you should get the HTML in | |
374 | *hello.html*. If you make any other request, such as | |
375 | *127.0.0.1:7878/something-else*, you’ll get a connection error like those you | |
376 | saw when running the code in Listing 20-1 and Listing 20-2. | |
377 | ||
378 | Now let’s add the code in Listing 20-7 to the `else` block to return a response | |
379 | with the status code 404, which signals that the content for the request was | |
380 | not found. We’ll also return some HTML for a page to render in the browser | |
381 | indicating the response to the end user. | |
382 | ||
383 | <span class="filename">Filename: src/main.rs</span> | |
384 | ||
74b04a01 XL |
385 | ```rust,no_run |
386 | {{#rustdoc_include ../listings/ch20-web-server/listing-20-07/src/main.rs:here}} | |
13cf67c4 XL |
387 | ``` |
388 | ||
389 | <span class="caption">Listing 20-7: Responding with status code 404 and an | |
390 | error page if anything other than */* was requested</span> | |
391 | ||
392 | Here, our response has a status line with status code 404 and the reason | |
393 | phrase `NOT FOUND`. We’re still not returning headers, and the body of the | |
394 | response will be the HTML in the file *404.html*. You’ll need to create a | |
395 | *404.html* file next to *hello.html* for the error page; again feel free to use | |
396 | any HTML you want or use the example HTML in Listing 20-8. | |
397 | ||
398 | <span class="filename">Filename: 404.html</span> | |
399 | ||
400 | ```html | |
74b04a01 | 401 | {{#include ../listings/ch20-web-server/listing-20-08/404.html}} |
13cf67c4 XL |
402 | ``` |
403 | ||
404 | <span class="caption">Listing 20-8: Sample content for the page to send back | |
405 | with any 404 response</span> | |
406 | ||
407 | With these changes, run your server again. Requesting *127.0.0.1:7878* | |
408 | should return the contents of *hello.html*, and any other request, like | |
409 | *127.0.0.1:7878/foo*, should return the error HTML from *404.html*. | |
410 | ||
411 | ### A Touch of Refactoring | |
412 | ||
413 | At the moment the `if` and `else` blocks have a lot of repetition: they’re both | |
414 | reading files and writing the contents of the files to the stream. The only | |
415 | differences are the status line and the filename. Let’s make the code more | |
416 | concise by pulling out those differences into separate `if` and `else` lines | |
417 | that will assign the values of the status line and the filename to variables; | |
418 | we can then use those variables unconditionally in the code to read the file | |
419 | and write the response. Listing 20-9 shows the resulting code after replacing | |
420 | the large `if` and `else` blocks. | |
421 | ||
422 | <span class="filename">Filename: src/main.rs</span> | |
423 | ||
74b04a01 XL |
424 | ```rust,no_run |
425 | {{#rustdoc_include ../listings/ch20-web-server/listing-20-09/src/main.rs:here}} | |
13cf67c4 XL |
426 | ``` |
427 | ||
428 | <span class="caption">Listing 20-9: Refactoring the `if` and `else` blocks to | |
429 | contain only the code that differs between the two cases</span> | |
430 | ||
431 | Now the `if` and `else` blocks only return the appropriate values for the | |
432 | status line and filename in a tuple; we then use destructuring to assign these | |
433 | two values to `status_line` and `filename` using a pattern in the `let` | |
434 | statement, as discussed in Chapter 18. | |
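The tuple-and-destructuring pattern can be sketched on its own, separate from the file reading and stream writing. Here `choose_response` is a hypothetical function used for illustration; the status lines and filenames are the ones described in this section.

```rust
/// Pick the status line and the file to serve based on the request bytes.
fn choose_response(buffer: &[u8]) -> (&'static str, &'static str) {
    let get = b"GET / HTTP/1.1\r\n";
    if buffer.starts_with(get) {
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    }
}

fn main() {
    // Destructure the tuple into two variables with a `let` pattern.
    let (status_line, filename) = choose_response(b"GET /foo HTTP/1.1\r\n\r\n");
    println!("{} -> {}", status_line, filename);
}
```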
435 | ||
436 | The previously duplicated code is now outside the `if` and `else` blocks and | |
437 | uses the `status_line` and `filename` variables. This makes it easier to see | |
438 | the difference between the two cases, and it means we have only one place to | |
439 | update the code if we want to change how the file reading and response writing | |
440 | work. The behavior of the code in Listing 20-9 will be the same as that in | |
441 | Listing 20-7. | |
442 | ||
443 | Awesome! We now have a simple web server in approximately 40 lines of Rust code | |
444 | that responds to one request with a page of content and responds to all other | |
445 | requests with a 404 response. | |
446 | ||
447 | Currently, our server runs in a single thread, meaning it can only serve one | |
448 | request at a time. Let’s examine how that can be a problem by simulating some | |
449 | slow requests. Then we’ll fix it so our server can handle multiple requests at | |
450 | once. |