X-Git-Url: https://git.proxmox.com/?a=blobdiff_plain;f=README.distribution;h=4e2e8eabd04748b9fd50252bda0c38ef43d621aa;hb=aea41afcfd6d6547dd2e80ddde8df7e3b2800482;hp=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391;hpb=1ff891ab238f5b2387d0fb18c5800bbf4facdaed;p=mirror_iproute2.git diff --git a/README.distribution b/README.distribution index e69de29b..4e2e8eab 100644 --- a/README.distribution +++ b/README.distribution @@ -0,0 +1,95 @@ +I. About the distribution tables + +The table used for "synthesizing" the distribution is essentially a scaled, +translated, inverse to the cumulative distribution function. + +Here's how to think about it: Let F() be the cumulative distribution +function for a probability distribution X. We'll assume we've scaled +things so that X has mean 0 and standard deviation 1, though that's not +so important here. Then: + + F(x) = P(X <= x) = \int_{-inf}^x f + +where f is the probability density function. + +F is monotonically increasing, so has an inverse function G, with range +0 to 1. Here, G(t) = the x such that P(X <= x) = t. (In general, G may +have singularities if X has point masses, i.e., points x such that +P(X = x) > 0.) + +Now we create a tabular representation of G as follows: Choose some table +size N, and for the ith entry, put in G(i/N). Let's call this table T. + +The claim now is, I can create a (discrete) random variable Y whose +distribution has the same approximate "shape" as X, simply by letting +Y = T(U), where U is a discrete uniform random variable with range 1 to N. +To see this, it's enough to show that Y's cumulative distribution function, +(let's call it H), is a discrete approximation to F. But + + H(x) = P(Y <= x) + = (# of entries in T <= x) / N -- as Y chosen uniformly from T + = i/N, where i is the largest integer such that G(i/N) <= x + = i/N, where i is the largest integer such that i/N <= F(x) + -- since G and F are inverse functions (and F is + increasing) + = floor(N*F(x))/N + +as desired. + +II. How to create distribution tables (in theory) + +How can we create this table in practice? In some cases, F may have a +simple expression which allows evaluating its inverse directly. The +Pareto distribution is one example of this. In other cases, and +especially for matching an experimentally observed distribution, it's +easiest simply to create a table for F and "invert" it. Here, we give +a concrete example, namely how the new "experimental" distribution was +created. + +1. Collect enough data points to characterize the distribution. Here, I +collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov). +That's far more data than is really necessary, but it was fairly painless to +collect it, so... + +2. Normalize the data so that it has mean 0 and standard deviation 1. + +3. Determine the cumulative distribution. The code I wrote creates a table +covering the range -10 to +10, with granularity .00005. Obviously, this +is absurdly over-precise, but since it's a one-time only computation, I +figured it hardly mattered. + +4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE +(here, 4096) entry be x*TABLEFACTOR (here, 8192). This creates a table +for the ("normalized") inverse of size TABLESIZE, covering its domain 0 +to 1 with granularity 1/TABLESIZE. Note that even with the granularity +used in creating the table for F, it's possible not all the entries in +the table for G will be filled in. So, make a pass through the +inverse's table, filling in any missing entries by linear interpolation. + +III. How to create distribution tables (in practice) + +If you want to do all this yourself, I've provided several tools to help: + +1. maketable does the steps 2-4 above, and then generates the appropriate +header file. So if you have your own time distribution, you can generate +the header simply by: + + maketable < time.values > header.h + +2. As explained in the other README file, the somewhat sleazy way I have +of generating correlated values needs correction. You can generate your +own correction tables by compiling makesigtable and makemutable with +your header file. Check the Makefile to see how this is done. + +3. Warning: maketable, makesigtable and especially makemutable do +enormous amounts of floating point arithmetic. Don't try running +these on an old 486. (NIST Net itself will run fine on such a +system, since in operation, it just needs to do a few simple integral +calculations. But getting there takes some work.) + +4. The tables produced are all normalized for mean 0 and standard +deviation 1. How do you know what values to use for real? Here, I've +provided a simple "stats" utility. Give it a series of floating point +values, and it will return their mean (mu), standard deviation (sigma), +and correlation coefficient (rho). You can then plug these values +directly into NIST Net.