 |
Content
Home page
Products
Support
Download
About us
Contact
|
 |
 |
Diehard battery of tests
General
Prof. Georges Marsaglia from the Florida State University has devised a set
of powerfull statistical tests for testing randomness of sequences of numbers,
called the Diehard battery of randomness tests.
The Diehard program
written by B. Narasimhan can be found on the Web, but unfortunately seems
not to have been maintained for the last few years.
A renewed version of the Diehard program, including the graphical interface,
is included in the PowerTest.
The Diehard test is important because it seems to be the most powerfull
general test of randomness. Many software and hardware generators that we have
tested, and which claim "perfect randomness" actually fail one or more sections
of Diehard. There are whole classes of frequently used software pseudo-random
generators which are known to fail Diehard test, such as Linear Congruental and
Lagged Fibonacci generators.
In testing of hardware random number generators some other tests may be even
more sensitive (efficient) to the hardware-specific problems. In hardware
generators the main problems are the unavoidable bias (the probability of 1's is
not the same as the probability of 0's) and bits correlation (a bit depends
(statistically) on previous bits). Testing longer and longer sequences of random
bits, the Diehard test will eventually detect these defects too, but specialized
frequency and autocorrelation tests will do it faster thus saving the time. That
is the reason why the PowerTest
contains many other interesting tests besides the Diehard.
In conclusion, a generator which pass Diehard test should be considered good.
Description
The Diehard battery of tests consisits of 18 different, independent
statistical tests. Results of tests are so called "p-values". In the PowerTest
version of Diehard, these values are of Kolmogorov-Smirnov type, which means
their values are real, between 0 and 1. For any given test, smaller p-value
means better test result. An individual test is considered to be failed if p
value approaches 1 closely, for example p>0.9999. See the PowerTest description for further details.
What follows is the original description of the tests as found in the
original Diehard program by B. Narasimhan.
THE BIRTHDAY SPACINGS TEST
Choose m birthdays in a year of n days. List the spacings
between the birthdays. If j is the number of values that
occur more than once in that list, then j is asymptotically
Poisson distributed with mean m^3/(4n). Experience shows n
must be quite large, say n>=2^18, for comparing the results
to the Poisson distribution with that mean. This test uses
n=2^24 and m=2^9, so that the underlying distribution for j
is taken to be Poisson with lambda=2^27/(2^26)=2. A sample
of 500 j's is taken, and a chi-square goodness of fit test
provides a p value. The first test uses bits 1-24 (counting
from the left) from integers in the specified file.
Then the file is closed and reopened. Next, bits 2-25 are
used to provide birthdays, then 3-26 and so on to bits 9-32.
Each set of bits provides a p-value, and the nine p-values
provide a sample for a KSTEST.
THE OVERLAPPING 5-PERMUTATION TEST
This is the OPERM5 test. It looks at a sequence of one mill-
ion 32-bit random integers. Each set of five consecutive
integers can be in one of 120 states, for the 5! possible or-
derings of five numbers. Thus the 5th, 6th, 7th,...numbers
each provide a state. As many thousands of state transitions
are observed, cumulative counts are made of the number of
occurences of each state. Then the quadratic form in the
weak inverse of the 120x120 covariance matrix yields a test
equivalent to the likelihood ratio test that the 120 cell
counts came from the specified (asymptotically) normal dis-
tribution with the specified 120x120 covariance matrix (with
rank 99). This version uses 1,000,000 integers, twice.
THE BINARY RANK TEST
for 31x31 matrices. The leftmost
31 bits of 31 random integers from the test sequence are used
to form a 31x31 binary matrix over the field {0,1}. The rank
is determined. That rank can be from 0 to 31, but ranks< 28
are rare, and their counts are pooled with those for rank 28.
Ranks are found for 40,000 such random matrices and a chisqua-
re test is performed on counts for ranks 31,30,29 and <=28.
THE BINARY RANK TEST
for 32x32 matrices. A random 32x32
binary matrix is formed, each row a 32-bit random integer.
The rank is determined. That rank can be from 0 to 32, ranks
less than 29 are rare, and their counts are pooled with those
for rank 29. Ranks are found for 40,000 such random matrices
and a chisquare test is performed on counts for ranks 32,31,
30 and <=29.
THE BINARY RANK TEST
for 6x8 matrices. From each of
six random 32-bit integers from the generator under test, a
specified byte is chosen, and the resulting six bytes form a
6x8 binary matrix whose rank is determined. That rank can be
from 0 to 6, but ranks 0,1,2,3 are rare; their counts are
pooled with those for rank 4. Ranks are found for 100,000
random matrices, and a chi-square test is performed on
counts for ranks 6,5 and <=4.
THE BITSTREAM TEST
The file under test is viewed as a stream of bits. Call them
b1,b2,... . Consider an alphabet with two "letters", 0 and 1
and think of the stream of bits as a succession of 20-letter
"words", overlapping. Thus the first word is b1b2...b20, the
second is b2b3...b21, and so on. The bitstream test counts
the number of missing 20-letter (20-bit) words in a string of
2^21 overlapping 20-letter words. There are 2^20 possible 20
letter words. For a truly random string of 2^21+19 bits, the
number of missing words j should be (very close to) normally
distributed with mean 141,909 and sigma 428. Thus
(j-141909)/428 should be a standard normal variate (z score)
that leads to a uniform [0,1) p value. The test is repeated
twenty times.
THE OPSO TEST
OPSO means Overlapping-Pairs-Sparse-Occupancy
The OPSO test considers 2-letter words from an alphabet of
1024 letters. Each letter is determined by a specified ten
bits from a 32-bit integer in the sequence to be tested. OPSO
generates 2^21 (overlapping) 2-letter words (from 2^21+1
"keystrokes") and counts the number of missing words---that
is 2-letter words which do not appear in the entire sequence.
That count should be very close to normally distributed with
mean 141,909, sigma 290. Thus (missingwrds-141909)/290 should
be a standard normal variable. The OPSO test takes 32 bits at
a time from the test file and uses a designated set of ten
consecutive bits. It then restarts the file for the next de-
signated 10 bits, and so on.
THE OQSO TEST
OQSO means Overlapping-Quadruples-Sparse-Occupancy
The test OQSO is similar, except that it considers 4-letter
words from an alphabet of 32 letters, each letter determined
by a designated string of 5 consecutive bits from the test
file, elements of which are assumed 32-bit random integers.
The mean number of missing words in a sequence of 2^21 four-
letter words, (2^21+3 "keystrokes"), is again 141909, with
sigma = 295. The mean is based on theory; sigma comes from
extensive simulation.
THE DNA TEST
The DNA test considers an alphabet of 4 letters:: C,G,A,T,
determined by two designated bits in the sequence of random
integers being tested. It considers 10-letter words, so that
as in OPSO and OQSO, there are 2^20 possible words, and the
mean number of missing words from a string of 2^21 (over-
lapping) 10-letter words (2^21+9 "keystrokes") is 141909.
The standard deviation sigma=339 was determined as for OQSO
by simulation. (Sigma for OPSO, 290, is the true value (to
three places), not determined by simulation.
The COUNT-THE-1's TEST on a stream of bytes.
Consider the file under test as a stream of bytes (four per
32 bit integer). Each byte can contain from 0 to 8 1's,
with probabilities 1,8,28,56,70,56,28,8,1 over 256. Now let
the stream of bytes provide a string of overlapping 5-letter
words, each "letter" taking values A,B,C,D,E. The letters are
determined by the number of 1's in a byte:: 0,1,or 2 yield A,
3 yields B, 4 yields C, 5 yields D and 6,7 or 8 yield E. Thus
we have a monkey at a typewriter hitting five keys with vari-
ous probabilities (37,56,70,56,37 over 256). There are 5^5
possible 5-letter words, and from a string of 256,000 (over-
lapping) 5-letter words, counts are made on the frequencies
for each word. The quadratic form in the weak inverse of
the covariance matrix of the cell counts provides a chisquare
test:: Q5-Q4, the difference of the naive Pearson sums of
(OBS-EXP)^2/EXP on counts for 5- and 4-letter cell counts.
The COUNT-THE-1's TEST for specific bytes.
Consider the file under test as a stream of 32-bit integers.
From each integer, a specific byte is chosen , say the left-
most:: bits 1 to 8. Each byte can contain from 0 to 8 1's,
with probabilitie 1,8,28,56,70,56,28,8,1 over 256. Now let
the specified bytes from successive integers provide a string
of (overlapping) 5-letter words, each "letter" taking values
A,B,C,D,E. The letters are determined by the number of 1's,
in that byte:: 0,1,or 2 ---> A, 3 ---> B, 4 ---> C, 5 ---> D,
and 6,7 or 8 ---> E. Thus we have a monkey at a typewriter
hitting five keys with with various probabilities:: 37,56,70,
56,37 over 256. There are 5^5 possible 5-letter words, and
from a string of 256,000 (overlapping) 5-letter words, counts
are made on the frequencies for each word. The quadratic form
in the weak inverse of the covariance matrix of the cell
counts provides a chisquare test:: Q5-Q4, the difference of
the naive Pearson sums of (OBS-EXP)^2/EXP on counts for 5-
and 4-letter cell counts.
THE PARKING LOT TEST
In a square of side 100, randomly "park" a car---a circle of
radius 1. Then try to park a 2nd, a 3rd, and so on, each
time parking "by ear". That is, if an attempt to park a car
causes a crash with one already parked, try again at a new
random location. (To avoid path problems, consider parking
helicopters rather than cars.) Each attempt leads to either
a crash or a success, the latter followed by an increment to
the list of cars already parked. If we plot n: the number of
attempts, versus k:: the number successfully parked, we get a
curve that should be similar to those provided by a perfect
random number generator. Theory for the behavior of such a
random curve seems beyond reach, and as graphics displays are
not available for this battery of tests, a simple characteriz
ation of the random experiment is used: k, the number of cars
successfully parked after n=12,000 attempts. Simulation shows
that k should average 3523 with sigma 21.9 and is very close
to normally distributed. Thus (k-3523)/21.9 should be a st-
andard normal variable, which, converted to a uniform varia-
ble, provides input to a KSTEST based on a sample of 10.
THE MINIMUM DISTANCE TEST
It does this 100 times:: choose n=8000 random points in a
square of side 10000. Find d, the minimum distance between
the (n^2-n)/2 pairs of points. If the points are truly inde-
pendent uniform, then d^2, the square of the minimum distance
should be (very close to) exponentially distributed with mean
.995 . Thus 1-exp(-d^2/.995) should be uniform on [0,1) and
a KSTEST on the resulting 100 values serves as a test of uni-
formity for random points in the square. Test numbers=0 mod 5
are printed but the KSTEST is based on the full set of 100
random choices of 8000 points in the 10000x10000 square.
THE 3DSPHERES TEST
Choose 4000 random points in a cube of edge 1000. At each
point, center a sphere large enough to reach the next closest
point. Then the volume of the smallest such sphere is (very
close to) exponentially distributed with mean 120pi/3. Thus
the radius cubed is exponential with mean 30. (The mean is
obtained by extensive simulation). The 3DSPHERES test gener-
ates 4000 such spheres 20 times. Each min radius cubed leads
to a uniform variable by means of 1-exp(-r^3/30.), then a
KSTEST is done on the 20 p-values.
THE SQEEZE TEST
Random integers are floated to get uniforms on [0,1). Start-
ing with k=2^31=2147483647, the test finds j, the number of
iterations necessary to reduce k to 1, using the reduction
k=ceiling(k*U), with U provided by floating integers from
the file being tested. Such j's are found 100,000 times,
then counts for the number of times j was <=6,7,...,47,>=48
are used to provide a chi-square test for cell frequencies.
THE OVERLAPPING SUMS TEST
Integers are floated to get a sequence U(1),U(2),... of uni-
form [0,1) variables. Then overlapping sums,
S(1)=U(1)+...+U(100), S2=U(2)+...+U(101),... are formed.
The S's are virtually normal with a certain covariance mat-
rix. A linear transformation of the S's converts them to a
sequence of independent standard normals, which are converted
to uniform variables for a KSTEST. The p-values from ten
KSTESTs are given still another KSTEST.
THE RUNS TEST
It counts runs up, and runs down,
in a sequence of uniform [0,1) variables, obtained by float-
ing the 32-bit integers in the specified file. This example
shows how runs are counted: .123,.357,.789,.425,.224,.416,.95
contains an up-run of length 3, a down-run of length 2 and an
up-run of (at least) 2, depending on the next values. The
covariance matrices for the runs-up and runs-down are well
known, leading to chisquare tests for quadratic forms in the
weak inverses of the covariance matrices. Runs are counted
for sequences of length 10,000. This is done ten times. Then
repeated.
THE CRAPS TEST
It plays 200,000 games of craps, finds
the number of wins and the number of throws necessary to end
each game. The number of wins should be (very close to) a
normal with mean 200000p and variance 200000p(1-p), with
p=244/495. Throws necessary to complete the game can vary
from 1 to infinity, but counts for all>21 are lumped with 21.
A chi-square test is made on the no.-of-throws cell counts.
Each 32-bit integer from the test file provides the value for
the throw of a die, by floating to [0,1), multiplying by 6
and taking 1 plus the integer part of the result.
NOTE:
Most of the tests in DIEHARD return a p-value, which should be uniform on
[0,1) if the input file contains truly independent random bits. Those p-values
are obtained by p=F(X), where F is the assumed distribution of the sample random
variable X---often normal. But that assumed F is just an asymptotic
approximation, for which the fit will be worst in the tails. Thus you should not
be surprised with occasional p-values near 0 or 1, such as .0012 or .9983. When
a bit stream really FAILS BIG, you will get p's of 0 or 1 to six or more places.
By all means, do not, as a Statistician might, think that a p < .025 or p>
.975 means that the RNG has "failed the test at the .05 level". Such p's happen
among the hundreds that DIEHARD produces, even with good RNG's. So keep in mind
that " p happens".
|