vendor/chalk-engine/src/README.md

   1 # The on-demand SLG solver
   2
   3 ## Description of how it works
   4
   5 The basis of the solver is the `Forest` type. A *forest* stores a
   6 collection of *tables* as well as a *stack*. Each *table* represents
   7 the stored results of a particular query that is being performed, as
   8 well as the various *strands*, which are basically suspended
   9 computations that may be used to find more answers. Tables are
  10 interdependent: solving one query may require solving others.
  11
  12 ### Walkthrough
  13
  14 Perhaps the easiest way to explain how the solver works is to walk
  15 through an example. Let's imagine that we have the following program:
  16
  17 ```rust
  18 trait Debug { }
  19
  20 struct u32 { }
  21 impl Debug for u32 { }
  22
  23 struct Rc<T> { }
  24 impl<T: Debug> Debug for Rc<T> { }
  25
  26 struct Vec<T> { }
  27 impl<T: Debug> Debug for Vec<T> { }
  28 ```
  29
  30 Now imagine that we want to find answers for the query `exists<T> {
  31 Rc<T>: Debug }`. The first step would be to u-canonicalize this query; this
  32 is the act of giving canonical names to all the unbound inference variables based on the
  33 order of their left-most appearance, as well as canonicalizing the universes of any
  34 universally bound names (e.g., the `T` in `forall<T> { ... }`). In this case, there are no
  35 universally bound names, but the canonical form Q of the query might look something like:
  36
  37     Rc<?0>: Debug
  38
  39 where `?0` is a variable in the root universe U0. We would then go and
  40 look for a table with this as the key: since the forest is empty, this
  41 lookup will fail, and we will create a new table T0, corresponding to
  42 the u-canonical goal Q.
  43
  44 **Ignoring negative reasoning and regions.** To start, we'll ignore
  45 the possibility of negative goals like `not { Foo }`. We'll phase them
  46 in later, as they bring several complications.
  47
  48 **Creating a table.** When we first create a table, we also initialize
  49 it with a set of *initial strands*. A "strand" is kind of like a
  50 "thread" for the solver: it contains a particular way to produce an
  51 answer. The initial set of strands for a goal like `Rc<?0>: Debug`
  52 (i.e., a "domain goal") is determined by looking for *clauses* in the
  53 environment. In Rust, these clauses derive from impls, but also from
  54 where-clauses that are in scope. In the case of our example, there
  55 would be three clauses, each coming from the program. Using a
  56 Prolog-like notation, these look like:
  57
  58 ```
  59 (u32: Debug).
  60 (Rc<T>: Debug) :- (T: Debug).
  61 (Vec<T>: Debug) :- (T: Debug).
  62 ```
  63
  64 To create our initial strands, then, we will try to apply each of
  65 these clauses to our goal of `Rc<?0>: Debug`. The first and third
  66 clauses are inapplicable because `u32` and `Vec<?0>` cannot be unified
  67 with `Rc<?0>`. The second clause, however, will work.
  68
  69 **What is a strand?** Let's talk a bit more about what a strand *is*. In the code, a strand
  70 is the combination of an inference table, an X-clause, and (possibly)
  71 a selected subgoal from that X-clause. But what is an X-clause
  72 (`ExClause`, in the code)? An X-clause pulls together a few things:
  73
  74 - The current state of the goal we are trying to prove;
  75 - A set of subgoals that have yet to be proven;
  76 - There are also a few things we're ignoring for now:
  77   - delayed literals, region constraints
  78
  79 The general form of an X-clause is written much like a Prolog clause,
  80 but with somewhat different semantics. Since we're ignoring delayed
  81 literals and region constraints, an X-clause just looks like this:
  82
  83     G :- L
  84
  85 where G is a goal and L is a set of subgoals that must be proven.
  86 (The L stands for *literal* -- when we address negative reasoning, a
  87 literal will be either a positive or negative subgoal.) The idea is
  88 that if we are able to prove L then the goal G can be considered true.
  89
  90 In the case of our example, we would wind up creating one strand, with
  91 an X-clause like so:
  92
  93     (Rc<?T>: Debug) :- (?T: Debug)
  94
  95 Here, the `?T` refers to one of the inference variables created in the
  96 inference table that accompanies the strand. (I'll use named variables
  97 to refer to inference variables, and numbered variables like `?0` to
  98 refer to variables in a canonicalized goal; in the code, however, they
  99 are both represented with an index.)
 100
 101 For each strand, we also optionally store a *selected subgoal*. This
 102 is the subgoal after the turnstile (`:-`) that we are currently trying
 103 to prove in this strand. Initally, when a strand is first created,
 104 there is no selected subgoal.
 105
 106 **Activating a strand.** Now that we have created the table T0 and
 107 initialized it with strands, we have to actually try and produce an
 108 answer. We do this by invoking the `ensure_answer` operation on the
 109 table: specifically, we say `ensure_answer(T0, A0)`, meaning "ensure
 110 that there is a 0th answer".
 111
 112 Remember that tables store not only strands, but also a vector of
 113 cached answers. The first thing that `ensure_answer` does is to check
 114 whether answer 0 is in this vector. If so, we can just return
 115 immediately.  In this case, the vector will be empty, and hence that
 116 does not apply (this becomes important for cyclic checks later on).
 117
 118 When there is no cached answer, `ensure_answer` will try to produce
 119 one.  It does this by selecting a strand from the set of active
 120 strands -- the strands are stored in a `VecDeque` and hence processed
 121 in a round-robin fashion. Right now, we have only one strand, storing
 122 the following X-clause with no selected subgoal:
 123
 124     (Rc<?T>: Debug) :- (?T: Debug)
 125
 126 When we activate the strand, we see that we have no selected subgoal,
 127 and so we first pick one of the subgoals to process. Here, there is only
 128 one (`?T: Debug`), so that becomes the selected subgoal, changing
 129 the state of the strand to:
 130
 131     (Rc<?T>: Debug) :- selected(?T: Debug, A0)
 132
 133 Here, we write `selected(L, An)` to indicate that (a) the literal `L`
 134 is the selected subgoal and (b) which answer `An` we are looking for. We
 135 start out looking for `A0`.
 136
 137 **Processing the selected subgoal.** Next, we have to try and find an
 138 answer to this selected goal. To do that, we will u-canonicalize it
 139 and try to find an associated table. In this case, the u-canonical
 140 form of the subgoal is `?0: Debug`: we don't have a table yet for
 141 that, so we can create a new one, T1. As before, we'll initialize T1
 142 with strands. In this case, there will be three strands, because all
 143 the program clauses are potentially applicable. Those three strands
 144 will be:
 145
 146 - `(u32: Debug) :-`, derived from the program clause `(u32: Debug).`.
 147   - Note: This strand has no subgoals.
 148 - `(Vec<?U>: Debug) :- (?U: Debug)`, derived from the `Vec` impl.
 149 - `(Rc<?U>: Debug) :- (?U: Debug)`, derived from the `Rc` impl.
 150
 151 We can thus summarize the state of the whole forest at this point as
 152 follows:
 153
 154 ```
 155 Table T0 [Rc<?0>: Debug]
 156   Strands:
 157     (Rc<?T>: Debug) :- selected(?T: Debug, A0)
 158
 159 Table T1 [?0: Debug]
 160   Strands:
 161     (u32: Debug) :-
 162     (Vec<?U>: Debug) :- (?U: Debug)
 163     (Rc<?V>: Debug) :- (?V: Debug)
 164 ```
 165
 166 **Delegation between tables.** Now that the active strand from T0 has
 167 created the table T1, it can try to extract an answer. It does this
 168 via that same `ensure_answer` operation we saw before. In this case,
 169 the strand would invoke `ensure_answer(T1, A0)`, since we will start
 170 with the first answer. This will cause T1 to activate its first
 171 strand, `u32: Debug :-`.
 172
 173 This strand is somewhat special: it has no subgoals at all. This means
 174 that the goal is proven. We can therefore add `u32: Debug` to the set
 175 of *answers* for our table, calling it answer A0 (it is the first
 176 answer). The strand is then removed from the list of strands.
 177
 178 The state of table T1 is therefore:
 179
 180 ```
 181 Table T1 [?0: Debug]
 182   Answers:
 183     A0 = [?0 = u32]
 184   Strand:
 185     (Vec<?U>: Debug) :- (?U: Debug)
 186     (Rc<?V>: Debug) :- (?V: Debug)
 187 ```
 188
 189 Note that I am writing out the answer A0 as a substitution that can be
 190 applied to the table goal; actually, in the code, the goals for each
 191 X-clause are also represented as substitutions, but in this exposition
 192 I've chosen to write them as full goals, following NFTD.
 193
 194 Since we now have an answer, `ensure_answer(T1, A0)` will return `Ok`
 195 to the table T0, indicating that answer A0 is available. T0 now has
 196 the job of incorporating that result into its active strand. It does
 197 this in two ways. First, it creates a new strand that is looking for
 198 the next possible answer of T1. Next, it incorpoates the answer from
 199 A0 and removes the subgoal. The resulting state of table T0 is:
 200
 201 ```
 202 Table T0 [Rc<?0>: Debug]
 203   Strands:
 204     (Rc<?T>: Debug) :- selected(?T: Debug, A1)
 205     (Rc<u32>: Debug) :-
 206 ```
 207
 208 We then immediately activate the strand that incorporated the answer
 209 (the `Rc<u32>: Debug` one). In this case, that strand has no further
 210 subgoals, so it becomes an answer to the table T0. This answer can
 211 then be returned up to our caller, and the whole forest goes quiescent
 212 at this point (remember, we only do enough work to generate *one*
 213 answer). The ending state of the forest at this point will be:
 214
 215 ```
 216 Table T0 [Rc<?0>: Debug]
 217   Answer:
 218     A0 = [?0 = u32]
 219   Strands:
 220     (Rc<?T>: Debug) :- selected(?T: Debug, A1)
 221
 222 Table T1 [?0: Debug]
 223   Answers:
 224     A0 = [?0 = u32]
 225   Strand:
 226     (Vec<?U>: Debug) :- (?U: Debug)
 227     (Rc<?V>: Debug) :- (?V: Debug)
 228 ```
 229
 230 Here you can see how the forest captures both the answers we have
 231 created thus far *and* the strands that will let us try to produce
 232 more answers later on.
 233
 234 ## Heritage and acroynms
 235
 236 This solver implements the SLG solving technique, though extended to
 237 accommodate hereditary harrop (HH) predicates, as well as the needs of
 238 lazy normalization.
 239
 240 Its design is kind of a fusion of [MiniKanren] and the following
 241 papers, which I will refer to as EWFS and NTFD respectively:
 242
 243 > Efficient Top-Down Computation of Queries Under the Well-formed Semantics
 244 > (Chen, Swift, and Warren; Journal of Logic Programming '95)
 245
 246 > A New Formulation of Tabled resolution With Delay
 247 > (Swift; EPIA '99)
 248
 249 [MiniKanren]: http://minikanren.org/
 250
 251 In addition, I incorporated extensions from the following papers,
 252 which I will refer to as SA and RR respectively, that describes how to
 253 do introduce approximation when processing subgoals and so forth:
 254
 255 > Terminating Evaluation of Logic Programs with Finite Three-Valued Models
 256 > Riguzzi and Swift; ACM Transactions on Computational Logic 2013
 257 > (Introduces "subgoal abstraction", hence the name SA)
 258 >
 259 > Radial Restraint
 260 > Grosof and Swift; 2013
 261
 262 Another useful paper that gives a kind of high-level overview of
 263 concepts at play is the following:
 264
 265 > XSB: Extending Prolog with Tabled Logic Programming
 266 > (Swift and Warren; Theory and Practice of Logic Programming '10)
 267
 268 There are a places where I intentionally diverged from the semantics
 269 as described in the papers -- e.g. by more aggressively approximating
 270 -- which I marked them with a comment DIVERGENCE. Those places may
 271 want to be evaluated in the future.
 272
 273 A few other acronyms that I use:
 274
 275 - WAM: Warren abstract machine, an efficient way to evaluate Prolog programs.
 276   See <http://wambook.sourceforge.net/>.
 277 - HH: Hereditary harrop predicates. What Chalk deals in.
 278   Popularized by Lambda Prolog.
 279
 280