## Building a Single-Threaded Web Server

We’ll start by getting a single-threaded web server working. Before we begin,
let’s look at a quick overview of the protocols involved in building web
servers. The details of these protocols are beyond the scope of this book, but
a brief overview will give you the information you need.

The two main protocols involved in web servers are the *Hypertext Transfer
Protocol* *(HTTP)* and the *Transmission Control Protocol* *(TCP)*. Both
protocols are *request-response* protocols, meaning a *client* initiates
requests and a *server* listens to the requests and provides a response to the
client. The contents of those requests and responses are defined by the
protocols.

TCP is the lower-level protocol that describes the details of how information
gets from one server to another but doesn’t specify what that information is.
HTTP builds on top of TCP by defining the contents of the requests and
responses. It’s technically possible to use HTTP with other protocols, but in
the vast majority of cases, HTTP sends its data over TCP. We’ll work with the
raw bytes of TCP and HTTP requests and responses.

### Listening to the TCP Connection

Our web server needs to listen to a TCP connection, so that’s the first part
we’ll work on. The standard library offers a `std::net` module that lets us do
this. Let’s make a new project in the usual fashion:

```text
$ cargo new hello
     Created binary (application) `hello` project
$ cd hello
```

Now enter the code in Listing 20-1 in *src/main.rs* to start. This code will
listen at the address `127.0.0.1:7878` for incoming TCP streams. When it gets
an incoming stream, it will print `Connection established!`.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-01/src/main.rs}}
```

<span class="caption">Listing 20-1: Listening for incoming streams and printing
a message when we receive a stream</span>

Using `TcpListener`, we can listen for TCP connections at the address
`127.0.0.1:7878`. In the address, the section before the colon is an IP address
representing your computer (this is the same on every computer and doesn’t
represent the authors’ computer specifically), and `7878` is the port. We’ve
chosen this port for two reasons: HTTP isn’t normally accepted on this port, and
7878 is *rust* typed on a telephone.

The `bind` function in this scenario works like the `new` function in that it
will return a new `TcpListener` instance. The reason the function is called
`bind` is that in networking, connecting to a port to listen to is known as
“binding to a port.”

The `bind` function returns a `Result<T, E>`, which indicates that binding
might fail. For example, binding to port 80 requires administrator
privileges (nonadministrators can listen only on ports higher than 1023), so if
we tried to bind to port 80 without being an administrator, binding wouldn’t
work. As another example, binding wouldn’t work if we ran two instances of our
program and so had two programs listening to the same port. Because we’re
writing a basic server just for learning purposes, we won’t worry about
handling these kinds of errors; instead, we use `unwrap` to stop the program if
errors happen.

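To make the failure case concrete, here is a minimal sketch of what inspecting the `Result` from `bind` could look like if we did not call `unwrap`. The helper name `try_bind` and its message format are our own inventions for illustration, not part of the chapter’s listing. Binding a second listener to an address that is already in use should take the error path.

```rust
use std::io;
use std::net::TcpListener;

/// Hypothetical helper: tries to bind to `addr` and turns a failure into a
/// readable message instead of panicking the way `unwrap` would.
fn try_bind(addr: &str) -> Result<TcpListener, String> {
    TcpListener::bind(addr).map_err(|e: io::Error| {
        // `kind` gives a coarse category such as `AddrInUse` or
        // `PermissionDenied`; the exact kind depends on the operating system.
        format!("could not bind to {}: {} ({:?})", addr, e, e.kind())
    })
}
```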
The `incoming` method on `TcpListener` returns an iterator that gives us a
sequence of streams (more specifically, streams of type `TcpStream`). A single
*stream* represents an open connection between the client and the server. A
*connection* is the name for the full request and response process in which a
client connects to the server, the server generates a response, and the server
closes the connection. As such, we will read from the `TcpStream` to see what
the client sent and then write our response to the stream. Overall, this `for`
loop will process each connection in turn and produce a series of streams for
us to handle.

For now, our handling of the stream consists of calling `unwrap` to terminate
our program if the stream has any errors; if there aren’t any errors, the
program prints a message. We’ll add more functionality for the success case in
the next listing. The reason we might receive errors from the `incoming` method
when a client connects to the server is that we’re not actually iterating over
connections. Instead, we’re iterating over *connection attempts*. The
connection might not be successful for a number of reasons, many of them
operating system specific. For example, many operating systems have a limit to
the number of simultaneous open connections they can support; new connection
attempts beyond that number will produce an error until some of the open
connections are closed.

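Because `incoming` yields connection *attempts*, each item is a `Result`. The sketch below shows one way to handle a failed attempt without stopping the program, which is a departure from the chapter’s approach of calling `unwrap`. The function name `accept_connections` and the `limit` parameter are hypothetical; the limit exists only so the loop ends, whereas the loop in the chapter runs until the program is stopped.

```rust
use std::net::TcpListener;

/// Hypothetical helper: handles up to `limit` connection attempts and returns
/// how many of them produced a usable stream. Failed attempts are logged and
/// skipped instead of stopping the program.
fn accept_connections(listener: &TcpListener, limit: usize) -> usize {
    let mut established = 0;

    for attempt in listener.incoming().take(limit) {
        match attempt {
            Ok(_stream) => {
                println!("Connection established!");
                established += 1;
                // `_stream` is dropped here, which closes the connection.
            }
            Err(e) => eprintln!("Connection attempt failed: {}", e),
        }
    }

    established
}
```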
Let’s try running this code! Invoke `cargo run` in the terminal and then load
*127.0.0.1:7878* in a web browser. The browser should show an error message
like “Connection reset,” because the server isn’t currently sending back any
data. But when you look at your terminal, you should see several messages that
were printed when the browser connected to the server!

```text
     Running `target/debug/hello`
Connection established!
Connection established!
Connection established!
```

Sometimes, you’ll see multiple messages printed for one browser request; the
reason might be that the browser is making a request for the page as well as a
request for other resources, like the *favicon.ico* icon that appears in the
browser tab.

It could also be that the browser is trying to connect to the server multiple
times because the server isn’t responding with any data. When `stream` goes out
of scope and is dropped at the end of the loop, the connection is closed as
part of the `drop` implementation. Browsers sometimes deal with closed
connections by retrying, because the problem might be temporary. The important
factor is that we’ve successfully gotten a handle to a TCP connection!

Remember to stop the program by pressing <span class="keystroke">ctrl-c</span>
when you’re done running a particular version of the code. Then restart `cargo
run` after you’ve made each set of code changes to make sure you’re running the
newest code.

### Reading the Request

Let’s implement the functionality to read the request from the browser! To
separate the concerns of first getting a connection and then taking some action
with the connection, we’ll start a new function for processing connections. In
this new `handle_connection` function, we’ll read data from the TCP stream and
print it so we can see the data being sent from the browser. Change the code to
look like Listing 20-2.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-02/src/main.rs}}
```

<span class="caption">Listing 20-2: Reading from the `TcpStream` and printing
the data</span>

We bring `std::io::prelude` into scope to get access to certain traits that let
us read from and write to the stream. In the `for` loop in the `main` function,
instead of printing a message that says we made a connection, we now call the
new `handle_connection` function and pass the `stream` to it.

In the `handle_connection` function, we’ve made the `stream` parameter mutable.
The reason is that the `TcpStream` instance keeps track of what data it returns
to us internally. It might read more data than we asked for and save that data
for the next time we ask for data. It therefore needs to be `mut` because its
internal state might change; usually, we think of “reading” as not needing
mutation, but in this case we need the `mut` keyword.

Next, we need to actually read from the stream. We do this in two steps:
first, we declare a `buffer` on the stack to hold the data that is read in.
We’ve made the buffer 1024 bytes in size, which is big enough to hold the
data of a basic request and sufficient for our purposes in this chapter. If
we wanted to handle requests of an arbitrary size, buffer management would
need to be more complicated; we’ll keep it simple for now. We pass the buffer
to `stream.read`, which will read bytes from the `TcpStream` and put them in
the buffer.

Second, we convert the bytes in the buffer to a string and print that string.
The `String::from_utf8_lossy` function takes a `&[u8]` and produces a string
from it (strictly, a `Cow<str>`, which prints just like a `String`). The
“lossy” part of the name indicates the behavior of this function when it sees
an invalid UTF-8 sequence: it will replace the invalid sequence with `�`, the
`U+FFFD REPLACEMENT CHARACTER`. You might see replacement characters for
characters in the buffer that aren’t filled by request data.

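The two steps can be sketched as a single function. This is an illustration under our own assumptions rather than the code in Listing 20-2: the name `read_request` is hypothetical, the function is generic over `Read` so it can be tried with in-memory bytes as well as a `TcpStream`, and it converts only the bytes that `read` reported as filled instead of the whole buffer.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads once into a 1024-byte stack buffer and returns
/// the received bytes as text. Generic over `Read`, so it accepts a
/// `TcpStream` or in-memory bytes.
fn read_request<R: Read>(stream: &mut R) -> io::Result<String> {
    let mut buffer = [0; 1024];

    // `read` reports how many bytes it actually placed in the buffer.
    let bytes_read = stream.read(&mut buffer)?;

    // Invalid UTF-8 becomes U+FFFD rather than an error. Slicing to
    // `bytes_read` leaves out the unfilled tail of the buffer.
    Ok(String::from_utf8_lossy(&buffer[..bytes_read]).into_owned())
}
```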
Let’s try this code! Start the program and make a request in a web browser
again. Note that we’ll still get an error page in the browser, but our
program’s output in the terminal will now look similar to this:

```text
$ cargo run
   Compiling hello v0.1.0 (file:///projects/hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.42s
     Running `target/debug/hello`
Request: GET / HTTP/1.1
Host: 127.0.0.1:7878
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
Firefox/52.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
������������������������������������
```

Depending on your browser, you might get slightly different output. Now that
we’re printing the request data, we can see why we get multiple connections
from one browser request by looking at the path after `Request: GET`. If the
repeated connections are all requesting */*, we know the browser is trying to
fetch */* repeatedly because it’s not getting a response from our program.

Let’s break down this request data to understand what the browser is asking of
our program.

### A Closer Look at an HTTP Request

HTTP is a text-based protocol, and a request takes this format:

```text
Method Request-URI HTTP-Version CRLF
headers CRLF
message-body
```

The first line is the *request line* that holds information about what the
client is requesting. The first part of the request line indicates the *method*
being used, such as `GET` or `POST`, which describes how the client is making
this request. Our client used a `GET` request.

The next part of the request line is */*, which indicates the *Uniform Resource
Identifier* *(URI)* the client is requesting: a URI is almost, but not quite,
the same as a *Uniform Resource Locator* *(URL)*. The difference between URIs
and URLs isn’t important for our purposes in this chapter, but the HTTP spec
uses the term URI, so we can just mentally substitute URL for URI here.

The last part is the HTTP version the client uses, and then the request line
ends in a *CRLF sequence*. (CRLF stands for *carriage return* and *line feed*,
which are terms from the typewriter days!) The CRLF sequence can also be
written as `\r\n`, where `\r` is a carriage return and `\n` is a line feed. The
CRLF sequence separates the request line from the rest of the request data.
Note that when the CRLF is printed, we see a new line start rather than `\r\n`.

Looking at the request line data we received from running our program so far,
we see that `GET` is the method, */* is the request URI, and `HTTP/1.1` is the
version.

After the request line, the remaining lines starting from `Host:` onward are
headers. `GET` requests have no body.

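To see the three parts of the request line in code, here is a small sketch that splits them apart. The chapter itself never parses the request this way, and `parse_request_line` is a hypothetical name. The sketch is also more lenient than the HTTP specification, because it splits on any whitespace rather than on single spaces.

```rust
/// Hypothetical helper: splits the first line of a request into
/// (method, request URI, HTTP version). Returns `None` if a part is missing.
fn parse_request_line(request: &str) -> Option<(&str, &str, &str)> {
    // `lines` splits on `\n` and also strips the `\r` of a CRLF sequence.
    let request_line = request.lines().next()?;
    let mut parts = request_line.split_whitespace();

    let method = parts.next()?;
    let uri = parts.next()?;
    let version = parts.next()?;

    Some((method, uri, version))
}
```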
Try making a request from a different browser or asking for a different
address, such as *127.0.0.1:7878/test*, to see how the request data changes.

Now that we know what the browser is asking for, let’s send back some data!

### Writing a Response

Now we’ll implement sending data in response to a client request. Responses
have the following format:

```text
HTTP-Version Status-Code Reason-Phrase CRLF
headers CRLF
message-body
```

The first line is a *status line* that contains the HTTP version used in the
response, a numeric status code that summarizes the result of the request, and
a reason phrase that provides a text description of the status code. After the
CRLF sequence are any headers, another CRLF sequence, and the body of the
response.

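As a rough illustration of this layout, the following sketch assembles a response string from its parts. The function `build_response` and its parameters are hypothetical and do not appear in the chapter’s listings, which write the response text out by hand; the HTTP version is fixed at 1.1 to keep the sketch short.

```rust
/// Hypothetical helper: assembles a response from its parts using the layout
/// described above. Each header is a (name, value) pair.
fn build_response(
    status_code: u16,
    reason: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> String {
    // Status line, terminated by CRLF.
    let mut response = format!("HTTP/1.1 {} {}\r\n", status_code, reason);

    // Each header sits on its own CRLF-terminated line.
    for (name, value) in headers {
        response.push_str(&format!("{}: {}\r\n", name, value));
    }

    // An empty line separates the headers from the message body.
    response.push_str("\r\n");
    response.push_str(body);
    response
}
```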
Here is an example response that uses HTTP version 1.1, has a status code of
200, an OK reason phrase, no headers, and no body:

```text
HTTP/1.1 200 OK\r\n\r\n
```

The status code 200 is the standard success response. The text is a tiny
successful HTTP response. Let’s write this to the stream as our response to a
successful request! From the `handle_connection` function, remove the
`println!` that was printing the request data and replace it with the code in
Listing 20-3.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-03/src/main.rs:here}}
```

<span class="caption">Listing 20-3: Writing a tiny successful HTTP response to
the stream</span>

The first new line defines the `response` variable that holds the success
message’s data. Then we call `as_bytes` on our `response` to convert the string
data to bytes. The `write` method on `stream` takes a `&[u8]` and sends those
bytes directly down the connection.

Because the `write` operation could fail, we use `unwrap` on any error result
as before. Again, in a real application you would add error handling here.
Finally, `flush` will wait and prevent the program from continuing until all
the bytes are written to the connection; `TcpStream` contains an internal
buffer to minimize calls to the underlying operating system.

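For comparison, here is a sketch of the same write-then-flush sequence with the errors passed back to the caller instead of unwrapped. It differs from the chapter’s code in two labeled ways: the hypothetical `send_ok` function is generic over `Write`, so it can be tried against a `Vec<u8>`, and it uses `write_all` rather than `write`, because a single `write` call is allowed to send only part of the data.

```rust
use std::io::{self, Write};

/// Hypothetical helper: sends the tiny success response and flushes it.
/// Generic over `Write`, so it works with a `TcpStream` or a `Vec<u8>`.
fn send_ok<W: Write>(stream: &mut W) -> io::Result<()> {
    let response = "HTTP/1.1 200 OK\r\n\r\n";

    // `write_all` keeps writing until every byte has been accepted.
    stream.write_all(response.as_bytes())?;
    stream.flush()
}
```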
With these changes, let’s run our code and make a request. We’re no longer
printing any data to the terminal, so we won’t see any output other than the
output from Cargo. When you load *127.0.0.1:7878* in a web browser, you should
get a blank page instead of an error. You’ve just hand-coded receiving an HTTP
request and sending a response!

### Returning Real HTML

Let’s implement the functionality for returning more than a blank page. Create
a new file, *hello.html*, in the root of your project directory, not in the
*src* directory. You can input any HTML you want; Listing 20-4 shows one
possibility.

<span class="filename">Filename: hello.html</span>

```html
{{#include ../listings/ch20-web-server/listing-20-04/hello.html}}
```

<span class="caption">Listing 20-4: A sample HTML file to return in a
response</span>

This is a minimal HTML5 document with a heading and some text. To return this
from the server when a request is received, we’ll modify `handle_connection` as
shown in Listing 20-5 to read the HTML file, add it to the response as a body,
and send it.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-05/src/main.rs:here}}
```

<span class="caption">Listing 20-5: Sending the contents of *hello.html* as the
body of the response</span>

We’ve added a line at the top to bring the standard library’s filesystem module
into scope. The code for reading the contents of a file to a string should look
familiar; we used it in Chapter 12 when we read the contents of a file for our
I/O project in Listing 12-4.

Next, we use `format!` to add the file’s contents as the body of the success
response. To ensure a valid HTTP response, we add the `Content-Length` header,
which is set to the size of our response body, in this case the size of
`hello.html`.

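A minimal sketch of that `format!` call, pulled into a hypothetical function named `ok_with_body` so it can be tried without a file on disk, might look like this. Note that `len` on a string counts bytes rather than characters, which is the unit `Content-Length` expects.

```rust
/// Hypothetical helper: wraps `contents` in a success response that carries
/// a `Content-Length` header.
fn ok_with_body(contents: &str) -> String {
    format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
        contents.len(), // length in bytes, not characters
        contents
    )
}
```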
Run this code with `cargo run` and load *127.0.0.1:7878* in your browser; you
should see your HTML rendered!

Currently, we’re ignoring the request data in `buffer` and just sending back
the contents of the HTML file unconditionally. That means if you try requesting
*127.0.0.1:7878/something-else* in your browser, you’ll still get back this
same HTML response. Our server is very limited and doesn’t do what most web
servers do. We want to customize our responses depending on the request and
only send back the HTML file for a well-formed request to */*.

### Validating the Request and Selectively Responding

Right now, our web server will return the HTML in the file no matter what the
client requested. Let’s add functionality to check that the browser is
requesting */* before returning the HTML file and return an error if the
browser requests anything else. For this we need to modify `handle_connection`,
as shown in Listing 20-6. This new code checks the content of the request
received against what we know a request for */* looks like and adds `if` and
`else` blocks to treat requests differently.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-06/src/main.rs:here}}
```

<span class="caption">Listing 20-6: Matching the request and handling requests
to */* differently from other requests</span>

First, we hardcode the data corresponding to the */* request into the `get`
variable. Because we’re reading raw bytes into the buffer, we transform `get`
into a byte string by adding the `b""` byte string syntax at the start of the
content data. Then we check whether `buffer` starts with the bytes in `get`. If
it does, it means we’ve received a well-formed request to */*, which is the
success case we’ll handle in the `if` block that returns the contents of our
HTML file.

If `buffer` does *not* start with the bytes in `get`, it means we’ve received
some other request. We’ll add code to the `else` block in a moment to respond
to all other requests.

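The check itself can be sketched in isolation. The function name `is_root_request` is hypothetical, and the exact bytes stored in `get` are our assumption, based on the request line we printed earlier; the included listing is the authority on what the chapter actually uses.

```rust
/// Hypothetical helper: reports whether the raw request bytes begin with the
/// request line for `/`.
fn is_root_request(buffer: &[u8]) -> bool {
    // The `b` prefix makes this a byte string, so it can be compared with
    // the raw bytes in the buffer. (Assumed contents, see the text above.)
    let get = b"GET / HTTP/1.1\r\n";

    buffer.starts_with(get)
}
```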
Run this code now and request *127.0.0.1:7878*; you should get the HTML in
*hello.html*. If you make any other request, such as
*127.0.0.1:7878/something-else*, you’ll get a connection error like those you
saw when running the code in Listing 20-1 and Listing 20-2.

Now let’s add the code in Listing 20-7 to the `else` block to return a response
with the status code 404, which signals that the content for the request was
not found. We’ll also return some HTML for a page to render in the browser
indicating the response to the end user.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-07/src/main.rs:here}}
```

<span class="caption">Listing 20-7: Responding with status code 404 and an
error page if anything other than */* was requested</span>

Here, our response has a status line with status code 404 and the reason
phrase `NOT FOUND`. We’re still not returning headers, and the body of the
response will be the HTML in the file *404.html*. You’ll need to create a
*404.html* file next to *hello.html* for the error page; again feel free to use
any HTML you want or use the example HTML in Listing 20-8.

<span class="filename">Filename: 404.html</span>

```html
{{#include ../listings/ch20-web-server/listing-20-08/404.html}}
```

<span class="caption">Listing 20-8: Sample content for the page to send back
with any 404 response</span>

With these changes, run your server again. Requesting *127.0.0.1:7878*
should return the contents of *hello.html*, and any other request, like
*127.0.0.1:7878/foo*, should return the error HTML from *404.html*.

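The behavior at this point can be summarized in a sketch that takes the two page bodies as arguments instead of reading them from disk. The function `respond` and its parameters are hypothetical and are not the code in Listing 20-7; the request bytes it compares against are assumed from the request line printed earlier. Both branches end up doing nearly the same work, which is worth noticing.

```rust
/// Hypothetical helper: builds the response for a request, given the bodies
/// that the chapter reads from *hello.html* and *404.html*.
fn respond(buffer: &[u8], hello_html: &str, not_found_html: &str) -> String {
    let get = b"GET / HTTP/1.1\r\n";

    if buffer.starts_with(get) {
        format!(
            "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
            hello_html.len(),
            hello_html
        )
    } else {
        // No headers here, matching the description of the 404 response.
        let status_line = "HTTP/1.1 404 NOT FOUND\r\n\r\n";
        format!("{}{}", status_line, not_found_html)
    }
}
```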
### A Touch of Refactoring

At the moment the `if` and `else` blocks have a lot of repetition: they’re both
reading files and writing the contents of the files to the stream. The only
differences are the status line and the filename. Let’s make the code more
concise by pulling out those differences into separate `if` and `else` lines
that will assign the values of the status line and the filename to variables;
we can then use those variables unconditionally in the code to read the file
and write the response. Listing 20-9 shows the resulting code after replacing
the large `if` and `else` blocks.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-09/src/main.rs:here}}
```

<span class="caption">Listing 20-9: Refactoring the `if` and `else` blocks to
contain only the code that differs between the two cases</span>

Now the `if` and `else` blocks only return the appropriate values for the
status line and filename in a tuple; we then use destructuring to assign these
two values to `status_line` and `filename` using a pattern in the `let`
statement, as discussed in Chapter 18.

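The tuple-returning `if` and `else` can be sketched as follows. The function name `choose_response` is hypothetical, and the exact status line strings are our assumption (the listing may include the trailing CRLF sequences in them); the point is only the shape of the expression and the destructuring at the call site.

```rust
/// Hypothetical helper: picks the status line and the file to serve.
fn choose_response(buffer: &[u8]) -> (&'static str, &'static str) {
    // Each branch evaluates to a (status line, filename) tuple.
    if buffer.starts_with(b"GET / HTTP/1.1\r\n") {
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    }
}
```

A caller would then write `let (status_line, filename) = choose_response(&buffer);` and use both variables in the shared code that follows.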
The previously duplicated code is now outside the `if` and `else` blocks and
uses the `status_line` and `filename` variables. This makes it easier to see
the difference between the two cases, and it means we have only one place to
update the code if we want to change how the file reading and response writing
work. The behavior of the code in Listing 20-9 will be the same as that in
Listing 20-7.

Awesome! We now have a simple web server in approximately 40 lines of Rust code
that responds to one request with a page of content and responds to all other
requests with a 404 response.

Currently, our server runs in a single thread, meaning it can only serve one
request at a time. Let’s examine how that can be a problem by simulating some
slow requests. Then we’ll fix it so our server can handle multiple requests at
once.