I. About the distribution tables

The table used for "synthesizing" the distribution is essentially a scaled,
translated inverse of the cumulative distribution function.

Here's how to think about it: Let F() be the cumulative distribution
function for a probability distribution X. We'll assume we've scaled
things so that X has mean 0 and standard deviation 1, though that's not
so important here. Then:

        F(x) = P(X <= x) = \int_{-inf}^x f

where f is the probability density function.

F is monotonically increasing, so has an inverse function G, with domain
0 to 1. Here, G(t) = the x such that P(X <= x) = t. (In general, G may
have singularities if X has point masses, i.e., points x such that
P(X = x) > 0.)

Now we create a tabular representation of G as follows: Choose some table
size N, and for the ith entry, put in G(i/N). Let's call this table T.

The claim now is, I can create a (discrete) random variable Y whose
distribution has the same approximate "shape" as X, simply by letting
Y = T(U), where U is a discrete uniform random variable with range 1 to N.
To see this, it's enough to show that Y's cumulative distribution function
(let's call it H) is a discrete approximation to F. But

        H(x) = P(Y <= x)
             = (# of entries in T <= x) / N   -- as Y is chosen uniformly from T
             = i/N, where i is the largest integer such that G(i/N) <= x
             = i/N, where i is the largest integer such that i/N <= F(x)
                  -- since G and F are inverse functions (and F is increasing)
             = floor(N*F(x))/N

as desired.
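
To see the construction in action, here's a minimal sketch (illustrative
Python, not NIST Net code), using the standard normal for X, so that G is
just the normal inverse CDF:

```python
import random
from statistics import NormalDist, mean

N = 4096                   # table size
G = NormalDist().inv_cdf   # inverse CDF of the mean-0, stddev-1 normal

# T[i] = G(i/N); offset by half a step so we never evaluate G at 0 or 1,
# where the normal's inverse CDF is infinite.
T = [G((i + 0.5) / N) for i in range(N)]

# Y = T(U): drawing a uniform index and looking it up reproduces the
# approximate "shape" of X.
samples = [random.choice(T) for _ in range(100_000)]

# Sanity check: Y's sample mean is near 0, like X's.
assert abs(mean(samples)) < 0.05
```

The half-step offset is a detail of this sketch; any convention that keeps
the endpoints out of G's singular points works.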

II. How to create distribution tables (in theory)

How can we create this table in practice? In some cases, F may have a
simple expression which allows evaluating its inverse directly. The
Pareto distribution is one example of this. In other cases, and
especially for matching an experimentally observed distribution, it's
easiest simply to create a table for F and "invert" it. Here, we give
a concrete example, namely how the new "experimental" distribution was
created.
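
As a quick illustration of the "direct inverse" case (a sketch, not NIST
Net code; the shape parameter alpha here is an arbitrary example value):
a Pareto distribution with minimum 1 has F(x) = 1 - x^-alpha, which
inverts in closed form, so no table-from-data step is needed:

```python
def pareto_cdf(x, alpha=2.0):
    # F(x) = P(X <= x) for a Pareto with minimum 1 and shape alpha
    return 1.0 - x ** -alpha

def pareto_inv(u, alpha=2.0):
    # Solve 1 - x**-alpha = u for x: the closed-form inverse G(u)
    return (1.0 - u) ** (-1.0 / alpha)

# G really inverts F: F(G(u)) == u
u = 0.75
assert abs(pareto_cdf(pareto_inv(u)) - u) < 1e-12
```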

1. Collect enough data points to characterize the distribution. Here, I
collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov).
That's far more data than is really necessary, but it was fairly painless to
collect it, so...

2. Normalize the data so that it has mean 0 and standard deviation 1.

3. Determine the cumulative distribution. The code I wrote creates a table
covering the range -10 to +10, with granularity .00005. Obviously, this
is absurdly over-precise, but since it's a one-time only computation, I
figured it hardly mattered.

4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE
(here, 4096) entry be x*TABLEFACTOR (here, 8192). This creates a table
for the ("normalized") inverse of size TABLESIZE, covering its domain 0
to 1 with granularity 1/TABLESIZE. Note that even with the granularity
used in creating the table for F, it's possible not all the entries in
the table for G will be filled in. So, make a pass through the
inverse's table, filling in any missing entries by linear interpolation.
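
Steps 2-4 can be sketched like this (a simplified Python rendition, not
the actual maketable code; it builds the inverse table straight from the
sorted sample's ranks rather than via an intermediate table for F, which
is equivalent for the empirical distribution):

```python
import math

TABLESIZE = 4096  # same size as the original tables

def make_inverse_table(data, size=TABLESIZE):
    # Step 2: normalize to mean 0, standard deviation 1.
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    pts = sorted((x - mu) / sigma for x in data)

    # Steps 3-4 combined: the point of rank r in the sorted sample is the
    # empirical G(r/n), so it lands in slot floor(r/n * size) of the
    # inverse table.  Collisions just overwrite; gaps stay None for now.
    table = [None] * size
    for r, x in enumerate(pts):
        table[r * size // n] = x

    # Fill missing entries by linear interpolation between filled
    # neighbors (copying at the ends, where only one neighbor exists).
    filled = [i for i, v in enumerate(table) if v is not None]
    for lo, hi in zip(filled, filled[1:]):
        for i in range(lo + 1, hi):
            t = (i - lo) / (hi - lo)
            table[i] = table[lo] * (1 - t) + table[hi] * t
    for i in range(filled[0]):
        table[i] = table[filled[0]]
    for i in range(filled[-1] + 1, size):
        table[i] = table[filled[-1]]
    return table
```

The real tool also scales entries by TABLEFACTOR to get integer values;
that step is omitted here for clarity.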

III. How to create distribution tables (in practice)

If you want to do all this yourself, I've provided several tools to help:

1. maketable performs steps 2-4 above, and then generates the appropriate
header file. So if you have your own time distribution, you can generate
the header simply by:

        maketable < time.values > header.h

2. As explained in the other README file, the somewhat sleazy way I have
of generating correlated values needs correction. You can generate your
own correction tables by compiling makesigtable and makemutable with
your header file. Check the Makefile to see how this is done.

3. Warning: maketable, makesigtable and especially makemutable do
enormous amounts of floating point arithmetic. Don't try running
these on an old 486. (NIST Net itself will run fine on such a
system, since in operation it just needs to do a few simple integer
calculations. But getting there takes some work.)

4. The tables produced are all normalized for mean 0 and standard
deviation 1. How do you know what values to use for real? Here, I've
provided a simple "stats" utility. Give it a series of floating point
values, and it will return their mean (mu), standard deviation (sigma),
and correlation coefficient (rho). You can then plug these values
directly into NIST Net.
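
The arithmetic involved can be sketched like this (an illustrative Python
version, not the stats source; in particular, computing rho as the lag-1
autocorrelation, i.e. the correlation between successive values, is an
assumption about what the utility reports):

```python
import math

def series_stats(values):
    n = len(values)
    mu = sum(values) / n                                       # mean
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)  # std deviation
    # rho: correlation between successive values (lag-1 autocorrelation),
    # assumed here to be the correlation NIST Net consumes.
    cov = sum((a - mu) * (b - mu)
              for a, b in zip(values, values[1:])) / (n - 1)
    rho = cov / (sigma * sigma)
    return mu, sigma, rho

mu, sigma, rho = series_stats([1.0, 2.0, 3.0, 4.0])
```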