## Building a Single-Threaded Web Server

We’ll start by getting a single-threaded web server working. Before we begin,
let’s look at a quick overview of the protocols involved in building web
servers. The details of these protocols are beyond the scope of this book, but
a brief overview will give you the information you need.

The two main protocols involved in web servers are the *Hypertext Transfer
Protocol* *(HTTP)* and the *Transmission Control Protocol* *(TCP)*. Both are
client-server protocols: a *client* initiates communication, and a *server*
listens for it and responds. HTTP in particular is a *request-response*
protocol, meaning the client sends requests and the server provides a response
to each one.

TCP is the lower-level protocol that describes the details of how information
gets from one computer to another but doesn’t specify what that information is.
HTTP builds on top of TCP by defining the contents of the requests and
responses. It’s technically possible to use HTTP with other protocols, but in
the vast majority of cases, HTTP sends its data over TCP. We’ll work with the
raw bytes of TCP and HTTP requests and responses.

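To make the layering concrete, here’s a minimal sketch, assuming a loopback connection inside a single process: a client thread writes an HTTP request as raw bytes over TCP, and the listening side reads exactly those bytes back. TCP delivers the bytes without knowing they happen to be HTTP. The helper `round_trip` is hypothetical, not part of the chapter’s project.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Hypothetical helper: sends `request` over a loopback TCP connection and
/// returns the bytes the listening side received.
fn round_trip(request: &'static [u8]) -> io::Result<Vec<u8>> {
    // Port 0 asks the operating system to pick any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // The client: TCP just moves these bytes; it doesn't care that they
    // are formatted as an HTTP request.
    let client = thread::spawn(move || -> io::Result<()> {
        let mut stream = TcpStream::connect(addr)?;
        stream.write_all(request)?;
        Ok(()) // dropping `stream` here closes the connection
    });

    // The server: accept one connection and read until the client closes it.
    let (mut stream, _) = listener.accept()?;
    let mut received = Vec::new();
    stream.read_to_end(&mut received)?;

    client.join().expect("client thread panicked")?;
    Ok(received)
}

fn main() -> io::Result<()> {
    let request = b"GET / HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n";
    let received = round_trip(request)?;
    assert_eq!(received, request.to_vec());
    println!("{}", String::from_utf8_lossy(&received));
    Ok(())
}
```

Binding to port `0` asks the operating system for any free port, so the sketch won’t collide with a server already running on 7878.
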
### Listening to the TCP Connection

Our web server needs to listen to a TCP connection, so that’s the first part
we’ll work on. The standard library offers a `std::net` module that lets us do
this. Let’s make a new project in the usual fashion:

```text
$ cargo new hello
     Created binary (application) `hello` project
$ cd hello
```

Now enter the code in Listing 20-1 in *src/main.rs* to start. This code will
listen at the address `127.0.0.1:7878` for incoming TCP streams. When it gets
an incoming stream, it will print `Connection established!`.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-01/src/main.rs}}
```

<span class="caption">Listing 20-1: Listening for incoming streams and printing
a message when we receive a stream</span>

Using `TcpListener`, we can listen for TCP connections at the address
`127.0.0.1:7878`. In the address, the section before the colon is an IP address
representing your computer (this is the same on every computer and doesn’t
represent the authors’ computer specifically), and `7878` is the port. We’ve
chosen this port for two reasons: HTTP isn’t normally accepted on this port, so
our server is unlikely to conflict with any other web server you might have
running on your machine, and 7878 is *rust* typed on a telephone.

The `bind` function in this scenario works like the `new` function in that it
will return a new `TcpListener` instance. The reason the function is called
`bind` is that in networking, connecting to a port to listen to is known as
“binding to a port.”

The `bind` function returns a `Result<T, E>`, which indicates that binding
might fail. For example, connecting to port 80 requires administrator
privileges (nonadministrators can listen only on ports higher than 1023), so if
we tried to connect to port 80 without being an administrator, binding wouldn’t
work. As another example, binding wouldn’t work if we ran two instances of our
program and so had two programs listening to the same port. Because we’re
writing a basic server just for learning purposes, we won’t worry about
handling these kinds of errors; instead, we use `unwrap` to stop the program if
errors happen.

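Instead of `unwrap`, a server can match on the `Result` that `bind` returns. The sketch below is a minimal illustration under a couple of assumptions: the hypothetical helper `try_bind` only reports the error, and the demo provokes the “two programs on one port” failure by binding the same address twice within one process. On common platforms the second bind fails with `ErrorKind::AddrInUse`, but the exact error is operating system specific.

```rust
use std::io;
use std::net::TcpListener;

/// Hypothetical helper: tries to bind and reports a failure instead of
/// panicking (the chapter itself just calls `unwrap`).
fn try_bind(addr: &str) -> io::Result<TcpListener> {
    match TcpListener::bind(addr) {
        Ok(listener) => Ok(listener),
        Err(e) => {
            eprintln!("could not bind to {addr}: {e} ({:?})", e.kind());
            Err(e)
        }
    }
}

fn main() -> io::Result<()> {
    // Port 0 lets the OS choose a free port, so this first bind should succeed.
    let first = try_bind("127.0.0.1:0")?;
    let addr = first.local_addr()?.to_string();

    // A second listener on the same address and port is expected to fail
    // while `first` is still alive.
    match try_bind(&addr) {
        Ok(_) => println!("unexpectedly bound {addr} twice"),
        Err(_) => println!("second bind to {addr} failed, as expected"),
    }
    Ok(())
}
```
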
The `incoming` method on `TcpListener` returns an iterator that gives us a
sequence of streams (more specifically, streams of type `TcpStream`). A single
*stream* represents an open connection between the client and the server. A
*connection* is the name for the full request and response process in which a
client connects to the server, the server generates a response, and the server
closes the connection. As such, we’ll read from the `TcpStream` to see what the
client sent and then write our response to the stream. Overall, this `for` loop
will process each connection in turn and produce a series of streams for us to
handle.

For now, our handling of the stream consists of calling `unwrap` to terminate
our program if the stream has any errors; if there aren’t any errors, the
program prints a message. We’ll add more functionality for the success case in
the next listing. The reason we might receive errors from the `incoming` method
when a client connects to the server is that we’re not actually iterating over
connections. Instead, we’re iterating over *connection attempts*. The
connection might not be successful for a number of reasons, many of them
operating system specific. For example, many operating systems have a limit to
the number of simultaneous open connections they can support; new connection
attempts beyond that number will produce an error until some of the open
connections are closed.

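Because `incoming` yields connection attempts, each item is an `io::Result<TcpStream>`. This minimal sketch handles the `Err` case with `match` instead of `unwrap`, assuming that logging a failed attempt and moving on is acceptable. To keep it self-contained and finite, it binds to an OS-chosen port, connects a few clients from another thread, and stops after a fixed number of attempts with `take`; `count_connections` is a hypothetical name.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Hypothetical helper: accepts `n` connection attempts and returns how many
/// of them succeeded.
fn count_connections(n: usize) -> io::Result<usize> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Simulated clients: each one connects and immediately disconnects.
    let clients = thread::spawn(move || {
        for _ in 0..n {
            // Client-side errors are ignored; the server side is the point here.
            let _ = TcpStream::connect(addr);
        }
    });

    let mut established = 0;
    // `take(n)` stops the otherwise endless `incoming` iterator.
    for stream in listener.incoming().take(n) {
        match stream {
            Ok(_stream) => {
                established += 1;
                println!("Connection established!");
            } // `_stream` is dropped here, closing the connection
            Err(e) => eprintln!("connection attempt failed: {e}"),
        }
    }

    clients.join().expect("client thread panicked");
    Ok(established)
}

fn main() -> io::Result<()> {
    let ok = count_connections(3)?;
    println!("{ok} of 3 attempts succeeded");
    Ok(())
}
```
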
Let’s try running this code! Invoke `cargo run` in the terminal and then load
*127.0.0.1:7878* in a web browser. The browser should show an error message
like “Connection reset,” because the server isn’t currently sending back any
data. But when you look at your terminal, you should see several messages that
were printed when the browser connected to the server!

```text
     Running `target/debug/hello`
Connection established!
Connection established!
Connection established!
```

Sometimes, you’ll see multiple messages printed for one browser request; the
reason might be that the browser is making a request for the page as well as a
request for other resources, like the *favicon.ico* icon that appears in the
browser tab.

It could also be that the browser is trying to connect to the server multiple
times because the server isn’t responding with any data. When `stream` goes out
of scope and is dropped at the end of the loop, the connection is closed as
part of the `drop` implementation. Browsers sometimes deal with closed
connections by retrying, because the problem might be temporary. The important
factor is that we’ve successfully gotten a handle to a TCP connection!

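We can watch the `drop` behavior from the client’s side. In this minimal sketch, assuming a loopback connection, the server accepts a connection and drops the `TcpStream` without writing anything; the client’s next `read` then returns `Ok(0)`, which is how `Read` reports the end of a stream. The helper name `bytes_after_server_drop` is hypothetical.

```rust
use std::io::{self, Read};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Hypothetical helper: returns how many bytes the client reads after the
/// server has dropped its end of the connection.
fn bytes_after_server_drop() -> io::Result<usize> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let server = thread::spawn(move || -> io::Result<()> {
        let (stream, _) = listener.accept()?;
        // Nothing is written. Dropping `stream` closes the connection, just as
        // going out of scope at the end of a loop iteration would.
        drop(stream);
        Ok(())
    });

    let mut client = TcpStream::connect(addr)?;
    server.join().expect("server thread panicked")?;

    let mut buf = [0u8; 16];
    // A closed connection with no pending data reads as 0 bytes (end of stream).
    client.read(&mut buf)
}

fn main() -> io::Result<()> {
    let n = bytes_after_server_drop()?;
    println!("client read {n} bytes after the server closed the connection");
    Ok(())
}
```
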
Remember to stop the program by pressing <span class="keystroke">ctrl-c</span>
when you’re done running a particular version of the code. Then restart
`cargo run` after you’ve made each set of code changes to make sure you’re
running the newest code.

### Reading the Request

Let’s implement the functionality to read the request from the browser! To
separate the concerns of first getting a connection and then taking some action
with the connection, we’ll start a new function for processing connections. In
this new `handle_connection` function, we’ll read data from the TCP stream and
print it so we can see the data being sent from the browser. Change the code to
look like Listing 20-2.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-02/src/main.rs}}
```

<span class="caption">Listing 20-2: Reading from the `TcpStream` and printing
the data</span>

We bring `std::io::prelude` into scope to get access to certain traits that let
us read from and write to the stream. In the `for` loop in the `main` function,
instead of printing a message that says we made a connection, we now call the
new `handle_connection` function and pass the `stream` to it.

In the `handle_connection` function, we’ve made the `stream` parameter mutable.
The reason is that reading from a stream changes its state: data that has
already been returned to us won’t be returned again, and any data that arrived
beyond what we asked for is kept for the next time we ask. Accordingly, the
`read` method of the `Read` trait takes `&mut self`. Usually, we think of
“reading” as not needing mutation, but in this case we need the `mut` keyword.

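A byte slice makes a handy analogy here (it isn’t `TcpStream`, but it implements the same `Read` trait): reading from a `&[u8]` advances it past the bytes already handed out, so successive calls return successive chunks. That change of position is exactly why the reader has to be mutable. A minimal sketch with a hypothetical helper, `read_in_chunks`:

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` in chunks of at most `chunk` bytes.
/// `reader` must be `mut` because every `read` call changes its position.
fn read_in_chunks<R: Read>(mut reader: R, chunk: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut chunks = Vec::new();
    let mut buf = vec![0u8; chunk];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // 0 bytes means there's nothing left to read
        }
        chunks.push(buf[..n].to_vec());
    }
    Ok(chunks)
}

fn main() -> io::Result<()> {
    // `&[u8]` implements `Read`; each read consumes bytes from the front.
    let data: &[u8] = b"GET / HTTP/1.1";
    for chunk in read_in_chunks(data, 4)? {
        println!("{:?}", String::from_utf8_lossy(&chunk));
    }
    Ok(())
}
```
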
Next, we need to actually read from the stream. We do this in two steps:
first, we declare a `buffer` on the stack to hold the data that is read in.
We’ve made the buffer 1024 bytes in size, which is big enough to hold the
data of a basic request and sufficient for our purposes in this chapter. If
we wanted to handle requests of an arbitrary size, buffer management would
need to be more complicated; we’ll keep it simple for now. We pass the buffer
to `stream.read`, which will read bytes from the `TcpStream`, put them in the
buffer, and return how many bytes it read.

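Because `read` returns the number of bytes it placed in the buffer, slicing with that count keeps the unused, zero-filled tail out of the data. Here’s a minimal sketch that is generic over any `Read` implementor, so it can run without a network. `read_request` is a hypothetical name, and like the chapter it reads only once into a fixed 1024-byte buffer, so a longer request would be truncated.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads once into a 1024-byte stack buffer and returns
/// only the bytes that were actually filled.
fn read_request<R: Read>(mut stream: R) -> io::Result<Vec<u8>> {
    let mut buffer = [0; 1024];
    // `n` is the number of bytes actually written into `buffer`.
    let n = stream.read(&mut buffer)?;
    Ok(buffer[..n].to_vec())
}

fn main() -> io::Result<()> {
    let request: &[u8] = b"GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\n";
    let data = read_request(request)?;
    println!("read {} bytes", data.len());
    println!("Request: {}", String::from_utf8_lossy(&data));
    Ok(())
}
```

Converting only the filled bytes also keeps the unused part of the buffer out of the printed request.
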
Second, we convert the bytes in the buffer to a string and print that string.
The `String::from_utf8_lossy` function takes a `&[u8]` and produces a string
from it (a `Cow<str>`, which we can print just like a `String`). The “lossy”
part of the name indicates the behavior of this function when it sees an
invalid UTF-8 sequence: it will replace the invalid sequence with `�`, the
`U+FFFD REPLACEMENT CHARACTER`. Because we print the entire 1024-byte buffer,
you might also see odd characters at the end of the output for the part of the
buffer that isn’t filled by request data.

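Here is the lossy conversion on its own, as a minimal sketch with a hypothetical `decode` helper. An invalid byte such as `0xFF` becomes `U+FFFD`, while zero bytes, like those in the unfilled part of our buffer, are valid UTF-8 and come through as NUL characters (`'\0'`), which terminals display in various ways. The helper tells whether a replacement happened by checking which `Cow` variant came back.

```rust
use std::borrow::Cow;

/// Hypothetical helper: converts raw bytes with `String::from_utf8_lossy`
/// and reports whether any invalid bytes had to be replaced.
fn decode(bytes: &[u8]) -> (Cow<'_, str>, bool) {
    let text = String::from_utf8_lossy(bytes);
    // Fully valid input is borrowed as-is; any replacement allocates a new String.
    let replaced = matches!(text, Cow::Owned(_));
    (text, replaced)
}

fn main() {
    // 0xFF never appears in valid UTF-8, so it becomes U+FFFD.
    let bad = [b'G', b'E', b'T', 0xFF, b'/'];
    println!("{:?}", decode(&bad));

    // Zero bytes (like an unfilled buffer) are valid UTF-8: NUL characters.
    let padded = [b'o', b'k', 0, 0];
    println!("{:?}", decode(&padded));
}
```
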
Let’s try this code! Start the program and make a request in a web browser
again. Note that we’ll still get an error page in the browser, but our
program’s output in the terminal will now look similar to this:

```text
$ cargo run
   Compiling hello v0.1.0 (file:///projects/hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.42s
     Running `target/debug/hello`
Request: GET / HTTP/1.1
Host: 127.0.0.1:7878
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
������������������������������������
```

Depending on your browser, you might get slightly different output. Now that
we’re printing the request data, we can see why we get multiple connections
from one browser request by looking at the path after `Request: GET`. If the
repeated connections are all requesting */*, we know the browser is trying to
fetch */* repeatedly because it’s not getting a response from our program.

Let’s break down this request data to understand what the browser is asking of
our program.

### A Closer Look at an HTTP Request

HTTP is a text-based protocol, and a request takes this format:

```text
Method Request-URI HTTP-Version CRLF
headers CRLF
message-body
```

The first line is the *request line* that holds information about what the
client is requesting. The first part of the request line indicates the *method*
being used, such as `GET` or `POST`, which describes how the client is making
this request. Our client used a `GET` request.

The next part of the request line is */*, which indicates the *Uniform Resource
Identifier* *(URI)* the client is requesting: a URI is almost, but not quite,
the same as a *Uniform Resource Locator* *(URL)*. The difference between URIs
and URLs isn’t important for our purposes in this chapter, but the HTTP spec
uses the term URI, so we can just mentally substitute URL for URI here.

The last part is the HTTP version the client uses, and then the request line
ends in a *CRLF sequence*. (CRLF stands for *carriage return* and *line feed*,
which are terms from the typewriter days!) The CRLF sequence can also be
written as `\r\n`, where `\r` is a carriage return and `\n` is a line feed. The
CRLF sequence separates the request line from the rest of the request data.
Note that when the CRLF is printed, we see a new line start rather than `\r\n`.

Looking at the request line data we received from running our program so far,
we see that `GET` is the method, */* is the request URI, and `HTTP/1.1` is the
version.

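Splitting the request line on spaces recovers these three parts. This is a minimal sketch assuming a well-formed line with single spaces between the parts; the `RequestLine` struct and `parse_request_line` function are hypothetical, not taken from the chapter’s listings. It treats everything before the first CRLF as the request line.

```rust
/// The three parts of an HTTP request line (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct RequestLine<'a> {
    method: &'a str,
    uri: &'a str,
    version: &'a str,
}

/// Hypothetical helper: parses the first line of `request`, which ends at the
/// first CRLF.
fn parse_request_line(request: &str) -> Option<RequestLine<'_>> {
    let line = request.split("\r\n").next()?;
    let mut parts = line.split(' ');
    let method = parts.next()?;
    let uri = parts.next()?;
    let version = parts.next()?;
    // Reject lines with extra parts or a version that isn't HTTP.
    if parts.next().is_some() || !version.starts_with("HTTP/") {
        return None;
    }
    Some(RequestLine { method, uri, version })
}

fn main() {
    let request = "GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\n";
    println!("{:?}", parse_request_line(request));
}
```
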
After the request line, the remaining lines starting from `Host:` onward are
headers. `GET` requests typically have no body.

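Each header is a `Name: value` line ending in CRLF, and an empty line (so two CRLF sequences in a row) marks where the headers end and any body would begin. A minimal sketch, assuming the request is already available as a `&str`; `parse_headers` is a hypothetical helper that ignores details such as repeated headers or case-insensitive names.

```rust
/// Hypothetical helper: splits a request into (headers, body) at the first
/// blank line and parses each header line into a (name, value) pair.
fn parse_headers(request: &str) -> (Vec<(&str, &str)>, &str) {
    // "\r\n\r\n" separates the head (request line + headers) from the body.
    let (head, body) = request.split_once("\r\n\r\n").unwrap_or((request, ""));
    let headers = head
        .split("\r\n")
        .skip(1) // the first line is the request line, not a header
        .filter_map(|line| line.split_once(':')) // split at the FIRST colon only
        .map(|(name, value)| (name, value.trim()))
        .collect();
    (headers, body)
}

fn main() {
    let request = "GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\nConnection: keep-alive\r\n\r\n";
    let (headers, body) = parse_headers(request);
    for (name, value) in &headers {
        println!("{name} => {value}");
    }
    println!("body is {} bytes", body.len());
}
```

Splitting each line at the first colon matters: the `Host` value `127.0.0.1:7878` contains a colon of its own.
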
Try making a request from a different browser or asking for a different
address, such as *127.0.0.1:7878/test*, to see how the request data changes.

Now that we know what the browser is asking for, let’s send back some data!

### Writing a Response

Now we’ll implement sending data in response to a client request. Responses
have the following format:

```text
HTTP-Version Status-Code Reason-Phrase CRLF
headers CRLF
message-body
```

The first line is a *status line* that contains the HTTP version used in the
response, a numeric status code that summarizes the result of the request, and
a reason phrase that provides a text description of the status code. After the
CRLF sequence are any headers, another CRLF sequence, and the body of the
response.

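The response format maps almost directly onto string building. A minimal sketch; `build_response` and its parameters are hypothetical, and it doesn’t validate header names or values.

```rust
/// Hypothetical helper: assembles an HTTP response from its parts:
/// status line, headers, blank line, body.
fn build_response(
    version: &str,
    status_code: u16,
    reason: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> String {
    // Status line, terminated by CRLF.
    let mut response = format!("{version} {status_code} {reason}\r\n");
    // One `Name: value` line per header, each terminated by CRLF.
    for (name, value) in headers {
        response.push_str(&format!("{name}: {value}\r\n"));
    }
    // The extra CRLF ends the headers; the body follows.
    response.push_str("\r\n");
    response.push_str(body);
    response
}

fn main() {
    let tiny = build_response("HTTP/1.1", 200, "OK", &[], "");
    println!("{tiny:?}");
}
```

Called with version `HTTP/1.1`, code `200`, reason `OK`, no headers, and an empty body, it produces the tiny response `HTTP/1.1 200 OK\r\n\r\n`.
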
Here is an example response that uses HTTP version 1.1, has a status code of
200, an OK reason phrase, no headers, and no body:

```text
HTTP/1.1 200 OK\r\n\r\n
```

The status code 200 is the standard success response. The text is a tiny
successful HTTP response. Let’s write this to the stream as our response to a
successful request! From the `handle_connection` function, remove the
`println!` that was printing the request data and replace it with the code in
Listing 20-3.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-03/src/main.rs:here}}
```

<span class="caption">Listing 20-3: Writing a tiny successful HTTP response to
the stream</span>

The first new line defines the `response` variable that holds the success
message’s data. Then we call `as_bytes` on our `response` to convert the string
data to bytes. The `write` method on `stream` takes a `&[u8]`, tries to send
those bytes down the connection, and returns how many bytes it actually wrote;
for a small response like this one, that is normally all of them. (The
`write_all` method keeps writing until the entire buffer has been sent.)

Because the `write` operation could fail, we use `unwrap` on any error result
as before. Again, in a real application you would add error handling here.
Finally, `flush` waits and prevents the program from continuing until any
buffered bytes have been written to the connection. A plain `TcpStream` doesn’t
buffer writes itself, so here the call is effectively a no-op, but it’s
harmless and would matter if the stream were wrapped in a buffered writer such
as `std::io::BufWriter`.

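Here’s the whole exchange in one place, as a minimal sketch over a loopback connection. It assumes a single small request and uses `write_all` rather than `write`, so the entire response is guaranteed to be sent; `serve_once` and `fetch` are hypothetical helpers, and the client simply reads until the server closes the connection.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Hypothetical helper: accepts one connection, reads the request, and
/// replies with a tiny 200 OK response.
fn serve_once(listener: TcpListener) -> io::Result<()> {
    let (mut stream, _) = listener.accept()?;
    let mut buffer = [0; 1024];
    stream.read(&mut buffer)?;

    let response = "HTTP/1.1 200 OK\r\n\r\n";
    // `write_all` keeps calling `write` until every byte has been sent.
    stream.write_all(response.as_bytes())?;
    // A no-op for a bare `TcpStream`, but correct if it were wrapped in a buffer.
    stream.flush()?;
    Ok(()) // `stream` is dropped here, closing the connection
}

/// Hypothetical helper: sends a GET request and returns the server's reply.
fn fetch() -> io::Result<String> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || serve_once(listener));

    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut reply = String::new();
    client.read_to_string(&mut reply)?; // returns once the server closes
    server.join().expect("server thread panicked")?;
    Ok(reply)
}

fn main() -> io::Result<()> {
    println!("{:?}", fetch()?);
    Ok(())
}
```
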
With these changes, let’s run our code and make a request. We’re no longer
printing any data to the terminal, so we won’t see any output other than the
output from Cargo. When you load *127.0.0.1:7878* in a web browser, you should
get a blank page instead of an error. You’ve just hand-coded receiving an HTTP
request and sending a response!

### Returning Real HTML

Let’s implement the functionality for returning more than a blank page. Create
a new file, *hello.html*, in the root of your project directory, not in the
*src* directory. You can input any HTML you want; Listing 20-4 shows one
possibility.

<span class="filename">Filename: hello.html</span>

```html
{{#include ../listings/ch20-web-server/listing-20-04/hello.html}}
```

<span class="caption">Listing 20-4: A sample HTML file to return in a
response</span>

This is a minimal HTML5 document with a heading and some text. To return this
from the server when a request is received, we’ll modify `handle_connection` as
shown in Listing 20-5 to read the HTML file, add it to the response as a body,
and send it.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-05/src/main.rs:here}}
```

<span class="caption">Listing 20-5: Sending the contents of *hello.html* as the
body of the response</span>

We’ve added a line at the top to bring the standard library’s filesystem module
into scope. The code for reading the contents of a file to a string should look
familiar; we used it in Chapter 12 when we read the contents of a file for our
I/O project in Listing 12-4.

Next, we use `format!` to add the file’s contents as the body of the success
response. To ensure a valid HTTP response, we add a `Content-Length` header set
to the size of our response body, in this case the size of *hello.html*.

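`Content-Length` counts bytes, and `String::len` also returns a length in bytes rather than characters, which is why `contents.len()` is the right value even for HTML containing non-ASCII text. A minimal sketch, assuming the contents are passed in directly instead of read from *hello.html*; `response_with_body` is a hypothetical helper.

```rust
/// Hypothetical helper: builds a response whose `Content-Length` matches the
/// body's size in bytes.
fn response_with_body(status_line: &str, contents: &str) -> String {
    let length = contents.len(); // bytes, not chars
    format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}")
}

fn main() {
    let contents = "<h1>H\u{e9}llo!</h1>";
    // U+00E9 takes two bytes in UTF-8, so the byte count exceeds the char count.
    println!("{} chars, {} bytes", contents.chars().count(), contents.len());
    print!("{}", response_with_body("HTTP/1.1 200 OK", contents));
}
```
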
Run this code with `cargo run` and load *127.0.0.1:7878* in your browser; you
should see your HTML rendered!

Currently, we’re ignoring the request data in `buffer` and just sending back
the contents of the HTML file unconditionally. That means if you try requesting
*127.0.0.1:7878/something-else* in your browser, you’ll still get back this
same HTML response. At the moment, our server is very limited and doesn’t do
what most web servers do. We want to customize our responses depending on the
request and only send back the HTML file for a well-formed request to */*.

### Validating the Request and Selectively Responding

Right now, our web server will return the HTML in the file no matter what the
client requested. Let’s add functionality to check that the browser is
requesting */* before returning the HTML file and return an error if the
browser requests anything else. For this we need to modify `handle_connection`,
as shown in Listing 20-6. This new code checks the content of the request
received against what we know a request for */* looks like and adds `if` and
`else` blocks to treat requests differently.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-06/src/main.rs:here}}
```

<span class="caption">Listing 20-6: Matching the request and handling requests
to */* differently from other requests</span>

First, we hardcode the data corresponding to the */* request into the `get`
variable. Because we’re reading raw bytes into the buffer, we transform `get`
into a byte string by adding the `b""` byte string syntax at the start of the
content data. Then we check whether `buffer` starts with the bytes in `get`. If
it does, it means we’ve received a well-formed request to */*, which is the
success case we’ll handle in the `if` block that returns the contents of our
HTML file.

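Here is the byte-string check in isolation. A minimal sketch, assuming a prefix of `b"GET / HTTP/1.1\r\n"` for `get`; `is_root_request` is a hypothetical helper. Because the space after `/` is part of the prefix, a request for */test* doesn’t match, even though its bytes also begin with `GET /`.

```rust
/// Hypothetical helper: true if the raw request bytes are a GET for `/`
/// over HTTP/1.1.
fn is_root_request(buffer: &[u8]) -> bool {
    // Assumed prefix; the `b` makes this a byte string (`&[u8; N]`), not a `&str`.
    let get = b"GET / HTTP/1.1\r\n";
    buffer.starts_with(get)
}

fn main() {
    let mut buffer = [0u8; 1024];
    let request = b"GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\n";
    // Simulate a read: copy the request into the front of the zeroed buffer.
    buffer[..request.len()].copy_from_slice(request);
    println!("{}", is_root_request(&buffer));
}
```
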
If `buffer` does *not* start with the bytes in `get`, it means we’ve received
some other request. We’ll add code to the `else` block in a moment to respond
to all other requests.

Run this code now and request *127.0.0.1:7878*; you should get the HTML in
*hello.html*. If you make any other request, such as
*127.0.0.1:7878/something-else*, you’ll get a connection error like those you
saw when running the code in Listing 20-1 and Listing 20-2.

Now let’s add the code in Listing 20-7 to the `else` block to return a response
with the status code 404, which signals that the content for the request was
not found. We’ll also return some HTML for a page to render in the browser
indicating the response to the end user.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-07/src/main.rs:here}}
```

<span class="caption">Listing 20-7: Responding with status code 404 and an
error page if anything other than */* was requested</span>

Here, our response has a status line with status code 404 and the reason
phrase `NOT FOUND`. The body of the response will be the HTML in the file
*404.html*. You’ll need to create a *404.html* file next to *hello.html* for
the error page; again, feel free to use any HTML you want or use the example
HTML in Listing 20-8.

<span class="filename">Filename: 404.html</span>

```html
{{#include ../listings/ch20-web-server/listing-20-08/404.html}}
```

<span class="caption">Listing 20-8: Sample content for the page to send back
with any 404 response</span>

With these changes, run your server again. Requesting *127.0.0.1:7878*
should return the contents of *hello.html*, and any other request, like
*127.0.0.1:7878/foo*, should return the error HTML from *404.html*.

### A Touch of Refactoring

At the moment, the `if` and `else` blocks have a lot of repetition: they’re both
reading files and writing the contents of the files to the stream. The only
differences are the status line and the filename. Let’s make the code more
concise by pulling out those differences into separate `if` and `else` lines
that will assign the values of the status line and the filename to variables;
we can then use those variables unconditionally in the code to read the file
and write the response. Listing 20-9 shows the resulting code after replacing
the large `if` and `else` blocks.

<span class="filename">Filename: src/main.rs</span>

```rust,no_run
{{#rustdoc_include ../listings/ch20-web-server/listing-20-09/src/main.rs:here}}
```

<span class="caption">Listing 20-9: Refactoring the `if` and `else` blocks to
contain only the code that differs between the two cases</span>

Now the `if` and `else` blocks only return the appropriate values for the
status line and filename in a tuple; we then use destructuring to assign these
two values to `status_line` and `filename` using a pattern in the `let`
statement, as discussed in Chapter 18.

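The same refactoring pattern in isolation: an `if`/`else` expression that evaluates to a tuple, which a `let` pattern then destructures. A minimal sketch; `route` is a hypothetical function, with status lines and filenames mirroring the ones described above.

```rust
/// Hypothetical helper: chooses the status line and the file to send
/// for a raw request.
fn route(buffer: &[u8]) -> (&'static str, &'static str) {
    let get = b"GET / HTTP/1.1\r\n";
    // The whole `if`/`else` is an expression whose value is a tuple.
    if buffer.starts_with(get) {
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    }
}

fn main() {
    // The tuple is destructured into two variables with a `let` pattern.
    let (status_line, filename) = route(b"GET /foo HTTP/1.1\r\n\r\n");
    println!("{status_line} -> {filename}");
}
```
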
The previously duplicated code is now outside the `if` and `else` blocks and
uses the `status_line` and `filename` variables. This makes it easier to see
the difference between the two cases, and it means we have only one place to
update the code if we want to change how the file reading and response writing
work. The behavior of the code in Listing 20-9 will be the same as that in
Listing 20-7.

Awesome! We now have a simple web server in approximately 40 lines of Rust code
that responds to one request with a page of content and responds to all other
requests with a 404 response.

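One way to exercise the full request-to-response path without a browser is to make `handle_connection` generic over `Read + Write`, so an in-memory test double can stand in for `TcpStream`. This is a hedged variant of the chapter’s code, not Listing 20-9 itself: `MockStream` and `respond_to` are hypothetical, the two pages are passed in as strings rather than read from *hello.html* and *404.html*, and it uses `write_all` in place of `write`.

```rust
use std::io::{self, Read, Write};

/// Hypothetical test double for `TcpStream`: reads from `input` and records
/// everything written into `output`.
struct MockStream {
    input: io::Cursor<Vec<u8>>,
    output: Vec<u8>,
}

impl Read for MockStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.input.read(buf)
    }
}

impl Write for MockStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.output.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Request handling in the style of this chapter, generic over the stream
/// type, with page contents passed in instead of read from files (assumption).
fn handle_connection<S: Read + Write>(stream: &mut S, home: &str, not_found: &str) -> io::Result<()> {
    let mut buffer = [0; 1024];
    stream.read(&mut buffer)?;

    let get = b"GET / HTTP/1.1\r\n";
    let (status_line, contents) = if buffer.starts_with(get) {
        ("HTTP/1.1 200 OK", home)
    } else {
        ("HTTP/1.1 404 NOT FOUND", not_found)
    };

    let response = format!(
        "{status_line}\r\nContent-Length: {}\r\n\r\n{contents}",
        contents.len()
    );
    stream.write_all(response.as_bytes())?;
    stream.flush()
}

/// Hypothetical helper: runs one request through `handle_connection` and
/// returns the response text.
fn respond_to(request: &str) -> io::Result<String> {
    let mut stream = MockStream {
        input: io::Cursor::new(request.as_bytes().to_vec()),
        output: Vec::new(),
    };
    handle_connection(&mut stream, "<h1>Hello!</h1>", "<h1>Oops!</h1>")?;
    Ok(String::from_utf8_lossy(&stream.output).into_owned())
}

fn main() -> io::Result<()> {
    println!("{}", respond_to("GET / HTTP/1.1\r\n\r\n")?);
    println!("{}", respond_to("GET /foo HTTP/1.1\r\n\r\n")?);
    Ok(())
}
```

The design choice is the generic bound: a real `TcpStream` also implements `Read + Write`, so the same function could be called with the streams that `incoming` produces.
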
Currently, our server runs in a single thread, meaning it can only serve one
request at a time. Let’s examine how that can be a problem by simulating some
slow requests. Then we’ll fix it so our server can handle multiple requests at
once.