## Building a Single-Threaded Web Server

We’ll start by getting a single-threaded web server working. Before we begin,
let’s look at a quick overview of the protocols involved in building web
servers. The details of these protocols are beyond the scope of this book, but
a brief overview will give you the information you need.

The two main protocols involved in web servers are the *Hypertext Transfer
Protocol* *(HTTP)* and the *Transmission Control Protocol* *(TCP)*. Both
protocols are *request-response* protocols, meaning a *client* initiates
requests and a *server* listens to the requests and provides a response to the
client. The contents of those requests and responses are defined by the
protocols.

TCP is the lower-level protocol that describes the details of how information
gets from one server to another but doesn’t specify what that information is.
HTTP builds on top of TCP by defining the contents of the requests and
responses. It’s technically possible to use HTTP with other protocols, but in
the vast majority of cases, HTTP sends its data over TCP. We’ll work with the
raw bytes of TCP and HTTP requests and responses.

### Listening to the TCP Connection

Our web server needs to listen to a TCP connection, so that’s the first part
we’ll work on. The standard library offers a `std::net` module that lets us do
this. Let’s make a new project in the usual fashion:

```console
$ cargo new hello
     Created binary (application) `hello` project
$ cd hello
```

Now enter the code in Listing 20-1 in *src/main.rs* to start. This code will
listen at the address `127.0.0.1:7878` for incoming TCP streams. When it gets
an incoming stream, it will print `Connection established!`.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-01/src/main.rs}}
```

<span class="caption">Listing 20-1: Listening for incoming streams and printing
a message when we receive a stream</span>

Using `TcpListener`, we can listen for TCP connections at the address
`127.0.0.1:7878`. In the address, the section before the colon is an IP address
representing your computer (this is the same on every computer and doesn’t
represent the authors’ computer specifically), and `7878` is the port. We’ve
chosen this port for two reasons: HTTP isn’t normally accepted on this port, and
7878 is *rust* typed on a telephone.

The `bind` function in this scenario works like the `new` function in that it
will return a new `TcpListener` instance. The reason the function is called
`bind` is that in networking, connecting to a port to listen to is known as
“binding to a port.”

The `bind` function returns a `Result<T, E>`, which indicates that binding
might fail. For example, binding to port 80 typically requires administrator
privileges (on many systems, nonadministrators can listen only on ports higher
than 1023), so if we tried to bind to port 80 without being an administrator,
binding wouldn’t work. As another example, binding wouldn’t work if we ran two
instances of our program and so had two programs listening to the same port.
Because we’re writing a basic server just for learning purposes, we won’t worry
about handling these kinds of errors; instead, we use `unwrap` to stop the
program if errors happen.

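To make the failure case concrete, here is a minimal sketch that inspects the `Result` from `bind` instead of calling `unwrap`. It is not part of the chapter’s listings: the helper name `try_bind` is hypothetical, and the sketch binds to port 0 (which asks the operating system for any free port) so it doesn’t collide with a server already running on 7878. On typical systems, the second bind to the same address is expected to fail.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

// Hypothetical helper: bind to `addr`, reporting a failure instead of panicking.
fn try_bind(addr: &str) -> io::Result<TcpListener> {
    match TcpListener::bind(addr) {
        Ok(listener) => Ok(listener),
        Err(e) => {
            eprintln!("could not bind to {}: {}", addr, e);
            Err(e)
        }
    }
}

fn main() -> io::Result<()> {
    // Port 0 lets the operating system choose a free port for this demo.
    let first = try_bind("127.0.0.1:0")?;
    let addr: SocketAddr = first.local_addr()?;
    println!("listening on {}", addr);

    // While `first` is still listening, a second bind to the same address
    // normally fails with an "address in use" error.
    let second = try_bind(&addr.to_string());
    println!("second bind succeeded: {}", second.is_ok());
    Ok(())
}
```

A real server would likely log the error and exit with a helpful message rather than panic, but `unwrap` keeps the chapter’s listings short.
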
The `incoming` method on `TcpListener` returns an iterator that gives us a
sequence of streams (more specifically, streams of type `TcpStream`). A single
*stream* represents an open connection between the client and the server. A
*connection* is the name for the full request and response process in which a
client connects to the server, the server generates a response, and the server
closes the connection. As such, we will read from the `TcpStream` to see what
the client sent and then write our response to the stream. Overall, this `for`
loop will process each connection in turn and produce a series of streams for
us to handle.

For now, our handling of the stream consists of calling `unwrap` to terminate
our program if the stream has any errors; if there aren’t any errors, the
program prints a message. We’ll add more functionality for the success case in
the next listing. The reason we might receive errors from the `incoming` method
when a client connects to the server is that we’re not actually iterating over
connections. Instead, we’re iterating over *connection attempts*. The
connection might not be successful for a number of reasons, many of them
operating system specific. For example, many operating systems have a limit to
the number of simultaneous open connections they can support; new connection
attempts beyond that number will produce an error until some of the open
connections are closed.

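The idea of iterating over connection attempts can be sketched as follows. This is an illustration rather than the code in Listing 20-1: the helper `accept_some` and its `limit` parameter are hypothetical, a `match` stands in for `unwrap`, and the program acts as its own client so it can run without a browser.

```rust
use std::net::{TcpListener, TcpStream};

// Hypothetical helper: handle at most `limit` connection attempts and
// return how many of them succeeded.
fn accept_some(listener: &TcpListener, limit: usize) -> usize {
    let mut established = 0;
    for attempt in listener.incoming().take(limit) {
        match attempt {
            Ok(_stream) => {
                println!("Connection established!");
                established += 1;
                // `_stream` is dropped here, which closes the connection.
            }
            Err(e) => eprintln!("connection attempt failed: {}", e),
        }
    }
    established
}

fn main() {
    // Port 0 asks the operating system for any free port.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    // Play the role of the browser: open two connections to our listener.
    let _a = TcpStream::connect(addr).unwrap();
    let _b = TcpStream::connect(addr).unwrap();

    println!("{} connections handled", accept_some(&listener, 2));
}
```

Each item from `incoming` is an `io::Result<TcpStream>`, which is why every attempt has to be checked before the stream can be used.
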
Let’s try running this code! Invoke `cargo run` in the terminal and then load
*127.0.0.1:7878* in a web browser. The browser should show an error message
like “Connection reset,” because the server isn’t currently sending back any
data. But when you look at your terminal, you should see several messages that
were printed when the browser connected to the server!

```text
     Running `target/debug/hello`
Connection established!
Connection established!
Connection established!
```

Sometimes, you’ll see multiple messages printed for one browser request; the
reason might be that the browser is making a request for the page as well as a
request for other resources, like the *favicon.ico* icon that appears in the
browser tab.

It could also be that the browser is trying to connect to the server multiple
times because the server isn’t responding with any data. When `stream` goes out
of scope and is dropped at the end of the loop, the connection is closed as
part of the `drop` implementation. Browsers sometimes deal with closed
connections by retrying, because the problem might be temporary. The important
factor is that we’ve successfully gotten a handle to a TCP connection!

Remember to stop the program by pressing <span class="keystroke">ctrl-c</span>
when you’re done running a particular version of the code. Then restart `cargo
run` after you’ve made each set of code changes to make sure you’re running the
newest code.

### Reading the Request

Let’s implement the functionality to read the request from the browser! To
separate the concerns of first getting a connection and then taking some action
with the connection, we’ll start a new function for processing connections. In
this new `handle_connection` function, we’ll read data from the TCP stream and
print it so we can see the data being sent from the browser. Change the code to
look like Listing 20-2.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-02/src/main.rs}}
```

<span class="caption">Listing 20-2: Reading from the `TcpStream` and printing
the data</span>

We bring `std::io::prelude` into scope to get access to certain traits that let
us read from and write to the stream. In the `for` loop in the `main` function,
instead of printing a message that says we made a connection, we now call the
new `handle_connection` function and pass the `stream` to it.

In the `handle_connection` function, we’ve made the `stream` parameter mutable.
The reason is that the `read` method takes `&mut self`: each call consumes data
from the connection and so changes what the next call will return. The `stream`
binding therefore needs to be `mut`; usually, we think of “reading” as not
needing mutation, but in this case we need the `mut` keyword.

Next, we need to actually read from the stream. We do this in two steps:
first, we declare a `buffer` on the stack to hold the data that is read in.
We’ve made the buffer 1024 bytes in size, which is big enough to hold the
data of a basic request and sufficient for our purposes in this chapter. If
we wanted to handle requests of an arbitrary size, buffer management would
need to be more complicated; we’ll keep it simple for now. We pass the buffer
to `stream.read`, which will read bytes from the `TcpStream` and put them in
the buffer.

Second, we convert the bytes in the buffer to a string and print that string.
The `String::from_utf8_lossy` function takes a `&[u8]` and produces a `String`
from it. The “lossy” part of the name indicates the behavior of this function
when it sees an invalid UTF-8 sequence: it will replace the invalid sequence
with `�`, the `U+FFFD REPLACEMENT CHARACTER`. Bytes in the buffer that aren’t
filled by request data stay zero, so the printed string may end in a run of
blank or placeholder characters, depending on how your terminal renders them.

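The two steps can be sketched in isolation, under a few assumptions that are not in the listing: the helper `read_request` is hypothetical, it accepts any `Read` implementor so that a byte slice can stand in for the `TcpStream`, and it converts only the bytes that `read` reported rather than the whole buffer.

```rust
use std::io::Read;

// Hypothetical helper: read one chunk (up to 1024 bytes) from any reader
// and convert only the bytes that were actually read.
fn read_request<R: Read>(mut stream: R) -> String {
    let mut buffer = [0u8; 1024];
    // `read` returns how many bytes it placed in the buffer.
    let n = stream.read(&mut buffer).unwrap();
    String::from_utf8_lossy(&buffer[..n]).into_owned()
}

fn main() {
    // A byte slice implements `Read`, so it can stand in for a `TcpStream`.
    let fake_stream: &[u8] = b"GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\n";
    println!("Request: {}", read_request(fake_stream));

    // 0xFF can never appear in valid UTF-8, so it becomes U+FFFD.
    let bytes = [b'H', b'i', 0xFF];
    let lossy = String::from_utf8_lossy(&bytes);
    println!("{}", lossy);
}
```

Slicing with `&buffer[..n]` avoids printing the unused tail of the buffer. A single `read` is still not guaranteed to return the whole request, which is one reason real servers need more careful buffer management.
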
Let’s try this code! Start the program and make a request in a web browser
again. Note that we’ll still get an error page in the browser, but our
program’s output in the terminal will now look similar to this:

```console
$ cargo run
   Compiling hello v0.1.0 (file:///projects/hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.42s
     Running `target/debug/hello`
Request: GET / HTTP/1.1
Host: 127.0.0.1:7878
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
Firefox/52.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
������������������������������������
```

Depending on your browser, you might get slightly different output. Now that
we’re printing the request data, we can see why we get multiple connections
from one browser request by looking at the path after `Request: GET`. If the
repeated connections are all requesting */*, we know the browser is trying to
fetch */* repeatedly because it’s not getting a response from our program.

Let’s break down this request data to understand what the browser is asking of
our program.

### A Closer Look at an HTTP Request

HTTP is a text-based protocol, and a request takes this format:

```text
Method Request-URI HTTP-Version CRLF
headers CRLF
message-body
```

The first line is the *request line* that holds information about what the
client is requesting. The first part of the request line indicates the *method*
being used, such as `GET` or `POST`, which describes how the client is making
this request. Our client used a `GET` request.

The next part of the request line is */*, which indicates the *Uniform Resource
Identifier* *(URI)* the client is requesting: a URI is almost, but not quite,
the same as a *Uniform Resource Locator* *(URL)*. The difference between URIs
and URLs isn’t important for our purposes in this chapter, but the HTTP spec
uses the term URI, so we can just mentally substitute URL for URI here.

The last part is the HTTP version the client uses, and then the request line
ends in a *CRLF sequence*. (CRLF stands for *carriage return* and *line feed*,
which are terms from the typewriter days!) The CRLF sequence can also be
written as `\r\n`, where `\r` is a carriage return and `\n` is a line feed. The
CRLF sequence separates the request line from the rest of the request data.
Note that when the CRLF is printed, we see a new line start rather than `\r\n`.

Looking at the request line data we received from running our program so far,
we see that `GET` is the method, */* is the request URI, and `HTTP/1.1` is the
version.

After the request line, the remaining lines starting from `Host:` onward are
headers. `GET` requests have no body.

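As an illustration of that structure, the request line can be split into its three parts with ordinary string methods. This is only a sketch and not something the chapter’s server does: `parse_request_line` is a hypothetical helper, and it splits on whitespace, which is looser than the exact grammar in the HTTP spec.

```rust
// Hypothetical helper: split a request line into (method, URI, version).
// Returns `None` if the line doesn't have exactly three parts.
fn parse_request_line(line: &str) -> Option<(&str, &str, &str)> {
    let mut parts = line.split_whitespace();
    let method = parts.next()?;
    let uri = parts.next()?;
    let version = parts.next()?;
    if parts.next().is_some() {
        return None;
    }
    Some((method, uri, version))
}

fn main() {
    let request = "GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\n";
    // The request line is everything before the first CRLF.
    let request_line = request.split("\r\n").next().unwrap_or("");
    match parse_request_line(request_line) {
        Some((method, uri, version)) => {
            println!("method={} uri={} version={}", method, uri, version)
        }
        None => println!("malformed request line"),
    }
}
```
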
Try making a request from a different browser or asking for a different
address, such as *127.0.0.1:7878/test*, to see how the request data changes.

Now that we know what the browser is asking for, let’s send back some data!

### Writing a Response

Now we’ll implement sending data in response to a client request. Responses
have the following format:

```text
HTTP-Version Status-Code Reason-Phrase CRLF
headers CRLF
message-body
```

The first line is a *status line* that contains the HTTP version used in the
response, a numeric status code that summarizes the result of the request, and
a reason phrase that provides a text description of the status code. After the
CRLF sequence are any headers, another CRLF sequence, and the body of the
response.

Here is an example response that uses HTTP version 1.1, has a status code of
200, an OK reason phrase, no headers, and no body:

```text
HTTP/1.1 200 OK\r\n\r\n
```

The status code 200 is the standard success response. The text is a tiny
successful HTTP response. Let’s write this to the stream as our response to a
successful request! From the `handle_connection` function, remove the
`println!` that was printing the request data and replace it with the code in
Listing 20-3.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-03/src/main.rs:here}}
```

<span class="caption">Listing 20-3: Writing a tiny successful HTTP response to
the stream</span>

The first new line defines the `response` variable that holds the success
message’s data. Then we call `as_bytes` on our `response` to convert the string
data to bytes. The `write` method on `stream` takes a `&[u8]` and sends those
bytes directly down the connection.

Because the `write` operation could fail, we use `unwrap` on any error result
as before. Again, in a real application you would add error handling here.
Finally, `flush` will wait and prevent the program from continuing until any
buffered bytes are written to the connection. A bare `TcpStream` hands writes
straight to the operating system, so `flush` has little to do here, but the
call becomes important if the stream is wrapped in a buffered writer.

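The write-then-flush sequence can be tried without a network connection. The sketch below makes some assumptions of its own: `write_ok` is a hypothetical helper, it accepts any `Write` implementor so that a `Vec<u8>` can stand in for the `TcpStream`, and it swaps `write` for `write_all`, because a single `write` call is allowed to send only part of the buffer.

```rust
use std::io::{self, Write};

// Hypothetical helper: send a minimal "200 OK" response to any writer.
fn write_ok<W: Write>(stream: &mut W) -> io::Result<()> {
    let response = "HTTP/1.1 200 OK\r\n\r\n";
    // `write_all` keeps writing until every byte has been accepted.
    stream.write_all(response.as_bytes())?;
    stream.flush()
}

fn main() -> io::Result<()> {
    // A `Vec<u8>` implements `Write`, so it can stand in for a `TcpStream`.
    let mut fake_stream: Vec<u8> = Vec::new();
    write_ok(&mut fake_stream)?;
    println!("{:?}", String::from_utf8_lossy(&fake_stream));
    Ok(())
}
```

Returning `io::Result<()>` instead of calling `unwrap` leaves the decision about how to handle a failed write to the caller.
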
With these changes, let’s run our code and make a request. We’re no longer
printing any data to the terminal, so we won’t see any output other than the
output from Cargo. When you load *127.0.0.1:7878* in a web browser, you should
get a blank page instead of an error. You’ve just hand-coded receiving an HTTP
request and sending a response!

### Returning Real HTML

Let’s implement the functionality for returning more than a blank page. Create
a new file, *hello.html*, in the root of your project directory, not in the
*src* directory. You can input any HTML you want; Listing 20-4 shows one
possibility.

<span class="filename">Filename: hello.html</span>

```html
{{#include ../listings/ch20-web-server/listing-20-04/hello.html}}
```

<span class="caption">Listing 20-4: A sample HTML file to return in a
response</span>

This is a minimal HTML5 document with a heading and some text. To return this
from the server when a request is received, we’ll modify `handle_connection` as
shown in Listing 20-5 to read the HTML file, add it to the response as a body,
and send it.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-05/src/main.rs:here}}
```

<span class="caption">Listing 20-5: Sending the contents of *hello.html* as the
body of the response</span>

We’ve added a line at the top to bring the standard library’s filesystem module
into scope. The code for reading the contents of a file to a string should look
familiar; we used it in Chapter 12 when we read the contents of a file for our
I/O project in Listing 12-4.

Next, we use `format!` to add the file’s contents as the body of the success
response. To ensure a valid HTTP response, we add the `Content-Length` header,
which is set to the size of our response body in bytes, in this case the size
of *hello.html*.

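A minimal sketch of how such a response string might be assembled is shown below. The helper `build_response` is hypothetical and not necessarily how Listing 20-5 is written, and the body is an inline string rather than the contents of *hello.html* so that the example runs on its own.

```rust
// Hypothetical helper: assemble a response with a `Content-Length` header.
fn build_response(status_line: &str, body: &str) -> String {
    format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        body.len(), // length in bytes, not characters
        body
    )
}

fn main() {
    let body = "<h1>Hello!</h1>";
    let response = build_response("HTTP/1.1 200 OK", body);
    println!("{:?}", response);
}
```

Because `str::len` counts bytes, the header stays correct even when the body contains non-ASCII text.
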
Run this code with `cargo run` and load *127.0.0.1:7878* in your browser; you
should see your HTML rendered!

Currently, we’re ignoring the request data in `buffer` and just sending back
the contents of the HTML file unconditionally. That means if you try requesting
*127.0.0.1:7878/something-else* in your browser, you’ll still get back this
same HTML response. At the moment, our server is very limited and doesn’t do
what most web servers do. We want to customize our responses depending on the
request and only send back the HTML file for a well-formed request to */*.

### Validating the Request and Selectively Responding

Right now, our web server will return the HTML in the file no matter what the
client requested. Let’s add functionality to check that the browser is
requesting */* before returning the HTML file and return an error if the
browser requests anything else. For this we need to modify `handle_connection`,
as shown in Listing 20-6. This new code checks the content of the request
received against what we know a request for */* looks like and adds `if` and
`else` blocks to treat requests differently.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-06/src/main.rs:here}}
```

<span class="caption">Listing 20-6: Matching the request and handling requests
to */* differently from other requests</span>

First, we hardcode the data corresponding to the */* request into the `get`
variable. Because we’re reading raw bytes into the buffer, we transform `get`
into a byte string by adding the `b""` byte string syntax at the start of the
content data. Then we check whether `buffer` starts with the bytes in `get`. If
it does, it means we’ve received a well-formed request to */*, which is the
success case we’ll handle in the `if` block that returns the contents of our
HTML file.

If `buffer` does *not* start with the bytes in `get`, it means we’ve received
some other request. We’ll add code to the `else` block in a moment to respond
to all other requests.

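The prefix check on its own can be sketched like this. The helper name `is_root_request` is hypothetical, and the buffer is filled by hand instead of being read from a `TcpStream`.

```rust
// Hypothetical helper: does the buffer begin with a request line for `/`?
fn is_root_request(buffer: &[u8]) -> bool {
    // The `b` prefix makes this a byte string, so it can be compared
    // against the raw bytes in the buffer.
    let get = b"GET / HTTP/1.1\r\n";
    buffer.starts_with(get)
}

fn main() {
    // Simulate the 1024-byte buffer after a request has been read into it.
    let mut buffer = [0u8; 1024];
    let request = b"GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\n";
    buffer[..request.len()].copy_from_slice(request);
    println!("root request: {}", is_root_request(&buffer));

    let other = b"GET /something-else HTTP/1.1\r\n";
    println!("other request: {}", is_root_request(other));
}
```

Including the trailing `\r\n` in `get` means the whole request line has to match, so a request for */something-else* or for a different HTTP version is not mistaken for a request to */*.
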
Run this code now and request *127.0.0.1:7878*; you should get the HTML in
*hello.html*. If you make any other request, such as
*127.0.0.1:7878/something-else*, you’ll get a connection error like those you
saw when running the code in Listing 20-1 and Listing 20-2.

Now let’s add the code in Listing 20-7 to the `else` block to return a response
with the status code 404, which signals that the content for the request was
not found. We’ll also return some HTML for a page to render in the browser
indicating the response to the end user.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-07/src/main.rs:here}}
```

<span class="caption">Listing 20-7: Responding with status code 404 and an
error page if anything other than */* was requested</span>

Here, our response has a status line with status code 404 and the reason
phrase `NOT FOUND`. The body of the response will be the HTML in the file
*404.html*. You’ll need to create a *404.html* file next to *hello.html* for
the error page; again feel free to use any HTML you want or use the example
HTML in Listing 20-8.

<span class="filename">Filename: 404.html</span>

```html
{{#include ../listings/ch20-web-server/listing-20-08/404.html}}
```

<span class="caption">Listing 20-8: Sample content for the page to send back
with any 404 response</span>

With these changes, run your server again. Requesting *127.0.0.1:7878*
should return the contents of *hello.html*, and any other request, like
*127.0.0.1:7878/foo*, should return the error HTML from *404.html*.

### A Touch of Refactoring

At the moment the `if` and `else` blocks have a lot of repetition: they’re both
reading files and writing the contents of the files to the stream. The only
differences are the status line and the filename. Let’s make the code more
concise by pulling out those differences into separate `if` and `else` lines
that will assign the values of the status line and the filename to variables;
we can then use those variables unconditionally in the code to read the file
and write the response. Listing 20-9 shows the resulting code after replacing
the large `if` and `else` blocks.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-09/src/main.rs:here}}
```

<span class="caption">Listing 20-9: Refactoring the `if` and `else` blocks to
contain only the code that differs between the two cases</span>

Now the `if` and `else` blocks only return the appropriate values for the
status line and filename in a tuple; we then use destructuring to assign these
two values to `status_line` and `filename` using a pattern in the `let`
statement, as discussed in Chapter 18.

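The tuple-returning `if`/`else` and the destructuring `let` can be sketched as a small standalone function. The name `route` is hypothetical; in the chapter the same logic sits inline in `handle_connection`.

```rust
// Hypothetical helper: pick the status line and file to serve for a request.
fn route(buffer: &[u8]) -> (&'static str, &'static str) {
    let get = b"GET / HTTP/1.1\r\n";
    if buffer.starts_with(get) {
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    }
}

fn main() {
    // Destructure the returned tuple into two variables with a `let` pattern.
    let (status_line, filename) = route(b"GET / HTTP/1.1\r\n\r\n");
    println!("{} -> {}", status_line, filename);

    let (status_line, filename) = route(b"GET /foo HTTP/1.1\r\n\r\n");
    println!("{} -> {}", status_line, filename);
}
```
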
The previously duplicated code is now outside the `if` and `else` blocks and
uses the `status_line` and `filename` variables. This makes it easier to see
the difference between the two cases, and it means we have only one place to
update the code if we want to change how the file reading and response writing
work. The behavior of the code in Listing 20-9 will be the same as that in
Listing 20-7.

Awesome! We now have a simple web server in approximately 40 lines of Rust code
that responds to one request with a page of content and responds to all other
requests with a 404 response.

Currently, our server runs in a single thread, meaning it can only serve one
request at a time. Let’s examine how that can be a problem by simulating some
slow requests. Then we’ll fix it so our server can handle multiple requests at
once.