![]() |
||||||||||||||||||||||||||||||||||||||||
Content |
Universal Statistical Test for testing randomness of sequence of bits
The testProf. Ueli Maurer has devised a randomness test which is "universal" in the sense that it is sensitive to several "standard" statistical defects that can be otherwise tested with specialized algorithms. It seems to be especially sensitive to defects typical for non-deterministic (hardware) random generators. Though it is generally slower than specialized algorithms in the sense that it takes a longer sequence to detect a given statistical defect (for example it takes 29 times longer sequence to detect bias than with ordinary frequency test), it pays off to use it because you run the program only once, instead of running many separate tests. The program I give here is particularly fast, although it can be made even faster by pre-calculating of the logarithms. It also incorporates several options that allow easy and time-saving use of the test in its full power. You can download the program (both source in C, in QB2C BASIC and executable for i486 Linux) from below. As we know from our physics lessons, most macroscopic physical systems are near deterministic. The deterministic system is such that given the initial state of the system (positions and velocities of all bodies in the system), the future of the system is uniquely determined. Since it is all so nicely and uniquely determined, it is also possible to backtrace the events. In another words, system remembers its past. If it wouldn't be for the chaos and quantum effects, all physical systems would be predictable and would have perfect memory of the past. Luckily it is not the case. Thus, we are able to construct physical non-deterministic (hardware) random bit generators (such as HG 324) which still, being physical bodies, may and generally do have some "memory". The basic structure of the Universal Statistical Test is such that you suppose the maximum memory length in bits (L) and you run the test with this value of L. Possible values in the program presented here are L = 4, 8 and 16. Specifying smaller value of L, the test is more sensitive to defects associated with a small memory (for example low-range correlations), but you are more likely to miss large memory effects. The other side of the medal is that with increasing L, the minimum length of the sequence needed to run the test increases dramatically. For most applications, L = 8 and file length between 2 and 10 Megabytes is the best choice.
The result of the test is the "mean value" which depends only on L. When the test is performed on a sequence of bits, the "measured" mean value should be close to the theoretical value, if bits are random. Main work of prof. Maurer and Coron et. al. was to calculate the otheoretical mean (Expectance) and its spread (Variance) as a function of L. Measure of the randomness of a given sequence of bits is then given as the distance of the measured mean to the theoretical value, expressed in theoretical standard deviations. D = (value - Expectance(L)) / sqrt(Variance(L)) Smaller the distance, better the randomness mark. Of course, the distance D
itself is a random variable. For perfect random bits sequences it is nearly
Gaussian distributed, with the mean of 0 and variance of 1.
You can download program files from here (11k).
An example. Suppose you have generated a binary file 'tmp.bits' containing a
sequence of random bits which is 1 000 000 bytes long (that is 8 million of
bits). A command like:
utest -r 3 -b 300000 tmp.bits
The result is:
Using L = 8 and 299520 bytes from the file: tmp
uses the first 900 000 bytes (3 x 300 000) in three consecutive slices and
calculates the statistics for each slice independently.
The left colon is the measured mean value of the test, and the left colon is
the distance of that value to the expectance in standard deviations.
These were good random bits.
|
|||||||||||||||||||||||||||||||||||||||
Send mail to webmaster with questions or comments about this web site.
|
||||||||||||||||||||||||||||||||||||||||