From e7f02a522fb0910b0e3f1db485ec059661438b5d Mon Sep 17 00:00:00 2001
From: "osdl.net!shemminger" <osdl.net!shemminger>
Date: Mon, 23 Aug 2004 20:21:21 +0000
Subject: [PATCH] Rename: tc/README.distribution -> README.distribution

(Logical change 1.71)
---
 README.distribution | 95 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/README.distribution b/README.distribution
index e69de29b..fe78fb4a 100644
--- a/README.distribution
+++ b/README.distribution
@@ -0,0 +1,95 @@
+I. About the distribution tables
+
+The table used for "synthesizing" the distribution is essentially a scaled,
+translated, inverse to the cumulative distribution function.
+
+Here's how to think about it: Let F() be the cumulative distribution
+function for a probability distribution X.  We'll assume we've scaled
+things so that X has mean 0 and standard deviation 1, though that's not
+so important here.  Then:
+
+	F(x) = P(X <= x) = \int_{-inf}^x f
+
+where f is the probability density function.
+
+F is monotonically increasing, so has an inverse function G, with range
+0 to 1.  Here, G(t) = the x such that P(X <= x) = t.  (In general, G may
+have singularities if X has point masses, i.e., points x such that
+P(X = x) > 0.)
+
+Now we create a tabular representation of G as follows:  Choose some table
+size N, and for the ith entry, put in G(i/N).  Let's call this table T.
+
+The claim now is, I can create a (discrete) random variable Y whose
+distribution has the same approximate "shape" as X, simply by letting
+Y = T(U), where U is a discrete uniform random variable with range 1 to N.
+To see this, it's enough to show that Y's cumulative distribution function,
+(let's call it H), is a discrete approximation to F.  But
+
+	H(x) = P(Y <= x)
+	     = (# of entries in T <= x) / N   -- as Y chosen uniformly from T
+	     = i/N, where i is the largest integer such that G(i/N) <= x
+	     = i/N, where i is the largest integer such that i/N <= F(x)
+	     		-- since G and F are inverse functions (and F is
+	     		   increasing)
+	     = floor(N*F(x))/N
+
+as desired.
+
+II. How to create distribution tables (in theory)
+
+How can we create this table in practice? In some cases, F may have a
+simple expression which allows evaluating its inverse directly.  The
+pareto distribution is one example of this.  In other cases, and
+especially for matching an experimentally observed distribution, it's
+easiest simply to create a table for F and "invert" it.  Here, we give
+a concrete example, namely how the new "experimental" distribution was
+created.
+
+1. Collect enough data points to characterize the distribution.  Here, I
+collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov).
+That's far more data than is really necessary, but it was fairly painless to
+collect it, so...
+
+2. Normalize the data so that it has mean 0 and standard deviation 1.
+
+3. Determine the cumulative distribution.  The code I wrote creates a table
+covering the range -10 to +10, with granularity .00005.  Obviously, this
+is absurdly over-precise, but since it's a one-time only computation, I
+figured it hardly mattered.
+
+4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE
+(here, 4096) entry be x*TABLEFACTOR (here, 8192).  This creates a table
+for the ("normalized") inverse of size TABLESIZE, covering its domain 0
+to 1 with granularity 1/TABLESIZE.  Note that even with the granularity
+used in creating the table for F, it's possible not all the entries in
+the table for G will be filled in.  So, make a pass through the
+inverse's table, filling in any missing entries by linear interpolation.
+
+III. How to create distribution tables (in practice)
+
+If you want to do all this yourself, I've provided several tools to help:
+
+1. maketable does the steps 2-4 above, and then generates the appropriate
+header file.  So if you have your own time distribution, you can generate
+the header simply by:
+
+	maketable < time.values > header.h
+
+2. As explained in the other README file, the somewhat sleazy way I have
+of generating correlated values needs correction.  You can generate your
+own correction tables by compiling makesigtable and makemutable with
+your header file.  Check the Makefile to see how this is done.
+
+3. Warning: maketable, makesigtable and especially makemutable do
+enormous amounts of floating point arithmetic.  Don't try running
+these on an old 486.  (NIST Net itself will run fine on such a
+system, since in operation, it just needs to do a few simple integral
+calculations.  But getting there takes some work.)
+
+4. The tables produced are all normalized for mean 0 and standard
+deviation 1.  How do you know what values to use for real?  Here, I've
+provided a simple "stats" utility.  Give it a series of floating point
+values, and it will return their mean (mu), standard deviation (sigma),
+and correlation coefficient (rho).  You can then plug these values
+directly into NIST Net.
-- 
2.39.2