# Shared token bucket

## Intro

The classical token bucket has two parameters -- rate and limit. The rate
is the number of tokens that are put into the bucket per some period of time
(say, a second); the limit is the maximum number of tokens that can
accumulate in the bucket. The process of regenerating tokens this way is
called "replenishing" below. When a request needs to be served, it should
try to grab a certain number of tokens from the bucket.
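
For illustration, a minimal single-threaded version of this classical model
might look like the sketch below (the type and its members are made up for
this example and are not part of seastar):

    #include <algorithm>
    #include <cstdint>

    // A classical (non-shared) token bucket: single-threaded sketch.
    struct classic_token_bucket {
        uint64_t rate;       // tokens added per replenish period
        uint64_t limit;      // maximum tokens the bucket can hold
        uint64_t tokens = 0;

        // Called once per period: add `rate` tokens, capped at `limit`.
        void replenish() {
            tokens = std::min(tokens + rate, limit);
        }

        // Try to take n tokens; fails if not enough have accumulated.
        bool try_grab(uint64_t n) {
            if (tokens < n) {
                return false;
            }
            tokens -= n;
            return true;
        }
    };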

The shared token bucket implements the above model for the seastar sharded
architecture. The implementation doesn't use locks and is built on atomic
arithmetic.

## Theory

### Rovers

The bucket is implemented as a pair of ever-increasing counters -- the tail
and head rovers. A consumer of tokens advances the tail rover; to replenish
tokens into the bucket, the head rover is advanced.

    +--------------------------------------------------------------------->
          ^                                             ^
          tail                                          head

    grab N tokens:

    +--------------------------------------------------------------------->
          .---> ^                                       ^
                tail                                    head

    replenish N tokens:

    +--------------------------------------------------------------------->
          ^                                             .---> ^
          tail                                                head

It's possible that after a grab the tail overruns the head and ends up in
front of it. This means that there are not enough tokens in the bucket and
that some number of tokens were claimed from it in advance.

    grab a lot of tokens:

    +--------------------------------------------------------------------->
          .------------ ^ --> ^
                        head  tail

To check whether the tokens were grabbed, the caller needs to check if the
head is (still) in front of the tail. This approach lets consumers line up
in a queue when they all try to grab tokens from a contended bucket. The
"ticket lock" model works the same way.
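
A minimal sketch of this rover arithmetic may look as follows. It is
illustrative only, not the actual seastar implementation (in particular,
the real code also caps the head so that no more than the configured limit
of tokens can accumulate, which is omitted here):

    #include <atomic>
    #include <cstdint>

    // Two ever-growing rovers; available tokens = head - tail.
    class rover_bucket {
        std::atomic<uint64_t> _tail{0}; // advanced by consumers
        std::atomic<uint64_t> _head{0}; // advanced by the replenisher

    public:
        // Claim n tokens and return the resulting tail value.
        uint64_t grab(uint64_t n) noexcept {
            return _tail.fetch_add(n, std::memory_order_relaxed) + n;
        }

        // Make n more tokens available to consumers.
        void replenish(uint64_t n) noexcept {
            _head.fetch_add(n, std::memory_order_relaxed);
        }

        // How many of the tokens claimed up to `tail` are still
        // missing; zero means the grab fully succeeded.
        uint64_t deficiency(uint64_t tail) const noexcept {
            uint64_t head = _head.load(std::memory_order_relaxed);
            return tail > head ? tail - head : 0;
        }
    };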

### Capped release

The implementation additionally supports a so-called "capped release". In
this mode tokens are not replenished out of nowhere, but leak into the main
bucket from a second bucket into which the caller must explicitly put them.
This mode can be useful when the token bucket guards the entrance into some
location that can temporarily (or constantly, but in that case it would
denote a bucket misconfiguration) slow down and stop handling tokens at the
given rate. To prevent the token bucket from over-subscribing the guardee at
those times, the second bucket can be refilled with the completions coming
from the guardee.

In terms of rovers, this is implemented with the help of a third rover
called the ceiling (or ceil in the code). The ceil rover defines the upper
limit up to which the head rover may advance. Accordingly, putting tokens
into the second bucket (called "releasing" below) means advancing the ceil.
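
Continuing the sketch from the Rovers section, the capped mode could add a
third counter along the following lines. Again, this is an illustration
rather than the seastar code, and it assumes that only one thread (the
replenisher) ever advances the head:

    std::atomic<uint64_t> _ceil{0}; // advanced by release()

    // Put n tokens into the second bucket.
    void release(uint64_t n) noexcept {
        _ceil.fetch_add(n, std::memory_order_relaxed);
    }

    // Replenish up to n tokens, but never advance head past ceil.
    void replenish_capped(uint64_t n) noexcept {
        uint64_t ceil = _ceil.load(std::memory_order_relaxed);
        uint64_t head = _head.load(std::memory_order_relaxed);
        uint64_t room = ceil > head ? ceil - head : 0;
        _head.fetch_add(n < room ? n : room, std::memory_order_relaxed);
    }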

## Practice

### API

To work with the token bucket there are 4 calls:

* `grab(N)` -- grabs a certain number of tokens from the bucket and returns
  the resulting "tail" rover value. The value is useless per se and is only
  needed to call the `deficiency()` method

* `replenish(time)` -- tries to replenish tokens into the bucket. The number
  of replenished tokens is how many have accumulated since the last
  replenish up to the `time` parameter

* `release(N)` -- releases the given number of tokens, making them available
  for replenishment. Only works if capped release is turned on by the
  template parameter; otherwise it asserts

* `deficiency(tail)` -- returns the number of tokens that were claimed from
  the bucket but are not yet there. A non-zero number means that the bucket
  is contended and request dispatching should be delayed

### Example

For example, a simple dispatch loop may look like this:

    while (true) {
        request = get_next_request()
        tail = token_bucket.grab(request.cost())
        while (token_bucket.deficiency(tail)) {
            yield
        }
        request.dispatch()
    }

And in the background a timer should run, calling `token_bucket.replenish(now())`.
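
As a sketch of what the time-based part of such a replenish may boil down
to, here is a hypothetical helper that converts the time elapsed since the
previous replenish into a number of tokens (the names and the struct itself
are made up for this example):

    #include <chrono>
    #include <cstdint>

    using clock_type = std::chrono::steady_clock;

    // Turns elapsed wall-clock time into tokens at the configured rate.
    struct rate_source {
        double rate;                  // tokens per second
        clock_type::time_point last = clock_type::now();

        uint64_t tokens_since(clock_type::time_point now) {
            std::chrono::duration<double> elapsed = now - last;
            last = now;
            // A real implementation would carry the fractional
            // remainder over instead of truncating it.
            return static_cast<uint64_t>(elapsed.count() * rate);
        }
    };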

Additionally, if there's a need to cap the token bucket with the real
request serving rate, one should call `token_bucket.release(request.cost())`
upon request completion.