

     Note: Visually, this ASCII documentation file is only a crude
approximation  to the postscript documentation file. The postscript file is
much easier to read, and it is more complete in that it has various formulas
that have been omitted from this documentation because they do not format
well in ASCII. Thus, if you find that BIVAR may be useful, I suggest
that you print out the postscript file.



             BIVAR:  A  Program  for Generating
              Correlated  Random  Numbers
                              Version  1.1

                                  Jeff Miller
                      Department of Psychology
                           University of Otago
                         Dunedin, New Zealand

                                5 August, 1997

Copyright 1997 Jeff Miller.

This program and documentation may be duplicated and used without
charge for any educational or noncommercial purposes. Use BIVAR at
your own risk. I believe it to be correct, but cannot guarantee the
accuracy of the calculations. If you do use this program, please send
me an acknowledgement letter, once a year or so, saying how you use
it (see sample at end of this documentation). If I get enough such
letters, I'll put this piece of software on my vita and maybe get some
credit for it with the university. Besides, both my kids collect stamps.

For commercial use, please contact the author.

Contents
1     Introduction
2     Overview of Bivariate Probability Distributions
3     Available Bivariate Structures
     3.1   Normal
     3.2   Mixture
     3.3   Bands
     3.4   RhoController
     3.5   How RhoController is Adjusted
4     Starting and Stopping the Program
5     Modes of Program Operation
6     Preparing the Input Control File
7     Program Output
8     Available Marginal Distributions
     8.1   Continuous Distributions
     8.2   Discrete Distributions
     8.3   Derived Distributions
     8.4   Distributions Arising in Connection with Signal Detection Theory
9    Technical Notes
     9.1   Installation
     9.2   Interfacing BIVAR to a Simulation Program
     9.3   Optimizing Speed
     9.4   Numerical Approximations
10   For Further Information
11   Release History
12   Sample Acknowledgement Letter

1        Introduction
BIVAR was designed to generate pairs of random numbers,  (X,Y),
from any of a wide variety of bivariate probability distributions.  The
user chooses the marginal distributions of X and Y  (which need not be
the same; see section 8 for the possible marginals), and chooses one of
three different types of bivariate structures (described in section 3) to
induce positive or negative XY  correlations. The random numbers are
written to an ASCII output file, which can then be used as input to a
user-written simulation program needing correlated random numbers.
BIVAR  was  designed  for  use  in  batch  files,  so  all  program  control
information is read from an ASCII file of input parameters that the
user creates with an editor prior to starting BIVAR.
     BIVAR was written using the same routines for implementing probability
distributions as used in the program CUPID, which is also available by
FTP from many archive sites.  The CUPID documentation provides
more detailed information on both the available marginal distributions
and on the univariate computational algorithms, so the descriptions
in this documentation will be brief and will focus on bivariate issues.

2       Overview of Bivariate Probability Distributions
This section provides a brief overview of bivariate probability distributions,
and it can probably be skipped by readers already familiar with this topic.
The next section describes the three bivariate structures available within
this program.  If you get lost reading that section, it might be helpful to
come back and read this one.
     A bivariate probability distribution can be thought of as a set of
possible (X,Y) pairs with an associated probability of each pair.  The
bivariate probability distribution uniquely determines both the marginal
distributions of X and Y  (i.e., the univariate distribution of each one
ignoring the other) and the correlation of X and Y . The reverse is not
necessarily true, however; given a pair of desired marginal distributions
for X and Y  and a desired value of the XY  correlation, you can't always
solve for the underlying bivariate probability distribution.  There may be
infinitely many bivariate distributions consistent with those marginals and
that correlation.
     This leads to a problem: Suppose you want to run some simulations with
given marginal distributions and  see how the model behaves when the random
variables are correlated (say at a correlation of -0.2, which seems
intuitively plausible to you).  The problem is that (for most marginal
distributions) you have not yet uniquely specified what you want to do,
because there are infinitely many bivariate distributions consistent with your
constraints.
     This program allows you to choose among three different bivariate
structures,  described  in  the  next  section.   Each  one  gives  you  the
marginals you request, but uses a different technique to give you the
correlation you want.  With luck, one of the available structures may seem to
capture, at least approximately, the sources of correlation that you imagine
are operating in your situation.

3        Available  Bivariate  Structures
This program implements three different bivariate structures.  I would
be happy to have suggestions for other general structures to add.
     Let Fx and Fy be the cumulative marginal distributions of X and
Y, respectively.

3.1      Normal
One bivariate structure is a transformation of the bivariate normal
distribution.   With  this  structure,  each  (X,Y)  pair  is  generated  as
follows:
    1. A pair (Zx, Zy) is generated randomly from a bivariate normal
       distribution with marginal means of zero, marginal standard deviations
       of one, and a prespecified correlation rho.
    2. The pair (Px, Py) is computed, where Px and Py are the cumulative
       probabilities in the standard normal distribution associated with
       Zx and Zy, respectively.
    3. X is taken as Fx^-1(Px), and Y is taken as Fy^-1(Py), where Fx^-1
       and Fy^-1 are the inverses of the cumulative marginal distributions.

     It can be seen that this method does indeed produce the desired
marginals for X and Y; Px and Py are both uniformly distributed
between zero and one by construction, so the different values of X
and Y occur with the desired marginal probabilities.  Moreover, the
correlation of X and Y will have the same sign as rho, although it will
not necessarily have the same magnitude.  The value of rho can be
adjusted to obtain the desired numerical value for the correlation
between X and Y, as described in section 3.5.
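
The three steps above can be sketched in Python as follows.  This is only
an illustration under stated assumptions, not BIVAR's own code: the helper
names (clamp01, exponential_inv_cdf, normal_structure_pair) are invented
here, and the marginals in the usage lines (Exponential with rate 1 for X,
Normal(100,15) for Y) are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def clamp01(p, eps=1e-12):
    """Keep a probability strictly inside (0, 1) so inverse CDFs stay finite."""
    return min(max(p, eps), 1.0 - eps)


def exponential_inv_cdf(p, rate):
    """Inverse CDF of an exponential distribution with the given rate."""
    return -math.log(1.0 - clamp01(p)) / rate


def normal_structure_pair(rho, inv_fx, inv_fy, rng):
    """Return one (X, Y) pair generated with the Normal structure."""
    # Step 1: correlated standard normals (Zx, Zy) with correlation rho.
    zx = rng.gauss(0.0, 1.0)
    zy = rho * zx + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Step 2: cumulative probabilities in the standard normal distribution.
    px = clamp01(STD_NORMAL.cdf(zx))
    py = clamp01(STD_NORMAL.cdf(zy))
    # Step 3: map the probabilities through the inverse marginal CDFs.
    return inv_fx(px), inv_fy(py)


# Hypothetical usage: X ~ Exponential(rate 1), Y ~ Normal(100, 15).
rng = random.Random(1997)
pairs = [
    normal_structure_pair(
        0.8,
        lambda p: exponential_inv_cdf(p, 1.0),
        NormalDist(100.0, 15.0).inv_cdf,
        rng,
    )
    for _ in range(5)
]
print(pairs)
```

The sample correlation of pairs produced this way will generally be
somewhat smaller in magnitude than the rho passed in, which is why rho is
treated as an adjustable control value rather than as the XY correlation.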

3.2      Mixture
A second bivariate structure uses a mixture of maximally correlated
pairs and uncorrelated pairs [1].  With this structure, X is randomly
generated from its marginal distribution.  Then, with a preselected
probability p, Y is taken as Fy^-1(Fx(X)) (or, if a negative correlation
is desired, as Fy^-1(1 - Fx(X))).  With probability 1 - p, however, Y is
chosen from its marginal distribution independently of X.
 __________________________________
   [1] This structure was suggested to me by Dr. Ellen Hertz of the
  National Highway Traffic Safety Administration, United States
  Department of Transportation.

     Again,  the method produces the correct marginals.  X  is chosen
directly from its desired marginal distribution, and Y is simply a mixture
of two different cases, each of which has the desired marginal distribution
for Y .  The strength of the correlation between X and Y  is controlled
by the value of the mixture probability p, which may be adjusted to
obtain the desired correlation as described in section 3.5.
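
A minimal Python sketch of the mixture structure is given below, assuming
exponential marginals for both X and Y.  The function names are
hypothetical and the code is not taken from BIVAR; it only mirrors the
verbal description above.

```python
import math
import random


def exponential_cdf(x, rate):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)


def exponential_inv_cdf(p, rate):
    """Inverse CDF of an exponential distribution with the given rate."""
    p = min(max(p, 0.0), 1.0 - 1e-12)  # avoid log(0) at p = 1
    return -math.log(1.0 - p) / rate


def mixture_structure_pair(p, fx, inv_fx, inv_fy, rng, negative=False):
    """Return one (X, Y) pair generated with the Mixture structure."""
    # X always comes straight from its own marginal distribution.
    x = inv_fx(rng.random())
    if rng.random() < p:
        # Maximally correlated case: Y sits at the same percentile as X
        # (or at the complementary percentile for a negative correlation).
        u = fx(x)
        y = inv_fy(1.0 - u if negative else u)
    else:
        # Uncorrelated case: Y is drawn independently of X.
        y = inv_fy(rng.random())
    return x, y


# Hypothetical usage: X ~ Exponential(rate 1), Y ~ Exponential(rate 0.5).
rng = random.Random(1997)
pairs = [
    mixture_structure_pair(
        0.6,
        lambda x: exponential_cdf(x, 1.0),
        lambda u: exponential_inv_cdf(u, 1.0),
        lambda u: exponential_inv_cdf(u, 0.5),
        rng,
    )
    for _ in range(5)
]
print(pairs)
```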

3.3      Bands
The third bivariate structure divides each marginal probability distribution
into percentile regions or "bands." With four bands, for example, each
distribution is divided into bands from 0-25%, 25-50%, 50-75%, and
75-100%.   X  is  generated  randomly  from  its  marginal  distribution,
and the program determines which band X  came from.  Then Y  is
generated  randomly  from  the  same  band  if  a  positive  correlation  is
desired, and from the complementary band for a negative correlation
(i.e., if one band goes from a to b%, then the complementary band
goes from (100 - b) to (100 - a)%).  This structure also yields the
desired marginals. X is generated directly from its marginal, and Y  is
generated from one of a set of equally likely bands within its marginal.
The strength of the correlation between X  and Y  increases with the
number of bands, which can be adjusted to give the desired correlation
as described in section 3.5.
     A limitation of the band structure is that with an integral number
of bands it may not be possible to produce a desired correlation. With
normal and exponential marginals, for example, the correlation is 0.555
with two bands and 0.697 with three bands, so an intermediate correlation
could not be produced with an integral number of bands. To overcome
this limitation, the band technique was generalized to use a mixture
of different numbers of bands to generate different (X,Y) pairs.  For
example, if two bands are used with probability 0.6 and three bands
are used with probability 0.4, then the overall correlation is 0.612.  In
fact, any intermediate correlation can be produced by adjusting this
probability.  By convention, then, the program allows the number of
bands to be a real number rather than an integer, and it uses the fractional
part to determine the probabilities of the smaller versus larger number
of bands. The real number can be thought of as the expected number of
bands: 4.3 bands, for example, means there are 4 bands with probability
0.7 and 5 bands with probability 0.3.
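
The band structure, including a non-integer number of bands, can be
sketched in Python as shown below.  This is an illustration only; the
function name and the uniform(0,1) marginals in the usage lines are
assumptions made here, and BIVAR's internal method may differ in detail.

```python
import math
import random


def bands_structure_pair(n_bands, inv_fx, inv_fy, rng, negative=False):
    """Return one (X, Y) pair generated with the Bands structure.

    n_bands may be a real number >= 1; 4.3 means 4 bands with
    probability 0.7 and 5 bands with probability 0.3.
    """
    base = math.floor(n_bands)
    frac = n_bands - base
    k = base + 1 if rng.random() < frac else base
    # Draw X through its percentile and note which band it fell in.
    ux = rng.random()
    band = min(int(ux * k), k - 1)
    if negative:
        band = k - 1 - band  # complementary band
    # Draw Y at a random percentile inside the chosen band.
    uy = (band + rng.random()) / k
    return inv_fx(ux), inv_fy(uy)


# Hypothetical usage with uniform(0,1) marginals, whose inverse CDF is
# the identity function.
identity = lambda u: u
rng = random.Random(1997)
pairs = [bands_structure_pair(4.3, identity, identity, rng) for _ in range(5)]
print(pairs)
```

With uniform marginals and an integer number of bands k, the correlation
produced by this scheme works out to 1 - 1/k^2, which gives a convenient
check on any implementation.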

3.4      RhoController
In the rest of this documentation, it is convenient to have a generic
name for the parameter controlling strength of the correlation, and I
will use the term "RhoController" for this purpose.  The meaning of
RhoController depends on the bivariate structure, as follows:

  Structure    Meaning of RhoController
  ---------    ----------------------------------------------------------
  Normal       Correlation rho between underlying normal random variables.
  Mixture      Probability p of generating a perfectly correlated pair.
  Bands        The number of bands.

Note that the value of RhoController is monotonically related to the
XY  correlation, but is not usually equal to it.

3.5  How RhoController is Adjusted

If requested to do so, Bivar will adjust the value of RhoController to try to
produce a given desired correlation in the underlying bivariate distribution.
For example, the user may request two underlying exponential marginals with a
bivariate mixture structure, and may instruct Bivar to adjust the mixture
probability to attain a true correlation of (say) 0.6 in the underlying
bivariate distribution.  In that case, Bivar will adjust the value of
RhoController using a simple numerical search algorithm, trying to find a
value of the mixture probability that produces the desired bivariate
correlation of 0.6.

During the numerical search, Bivar computes the true correlation produced by
each candidate value of RhoController using an N X N grid approximation of the
bivariate distribution, where N is the user-specified number of steps used to
approximate each distribution.  If the user specifies that each distribution
should be approximated by 100 points, for example, Bivar uses 100
equally-spaced percentile points of the X distribution (i.e., at percentiles
of 0.5%, 1.5%, ..., 99.5%).  For each of these points, it computes 100
equally-spaced percentile points of the Y distribution conditional on that
value of X.  In total, then, it computes 100 X 100 (X,Y) pairs, and
computes the numerical value of the correlation as if these were all the
possible pairs in the bivariate distribution and as if they were all equally
likely.  This is usually quite a good estimate of the true bivariate
correlation obtained with that value of RhoController, as long as the grid
approximation is OK.
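
As an illustration of the grid approximation and the search, here is a
Python sketch for the Normal structure.  Several things in it are
assumptions rather than documented behaviour: the function names, the use
of bisection as the "simple numerical search algorithm", and the search
limits of -0.999 and 0.999.  The conditional percentile points rely on the
fact that, given Zx, Zy is normal with mean rho*Zx and standard deviation
sqrt(1 - rho^2).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def grid_correlation_normal(rho, n, inv_fx, inv_fy):
    """N x N grid approximation of corr(X, Y) for the Normal structure."""
    qs = [(i + 0.5) / n for i in range(n)]  # percentile midpoints
    zs = [STD_NORMAL.inv_cdf(q) for q in qs]
    spread = math.sqrt(1.0 - rho * rho)
    sx = sy = sxx = syy = sxy = 0.0
    for q, zx in zip(qs, zs):
        x = inv_fx(q)
        for zq in zs:
            # Percentile point of Y conditional on this value of X.
            py = STD_NORMAL.cdf(rho * zx + spread * zq)
            py = min(max(py, 1e-12), 1.0 - 1e-12)
            y = inv_fy(py)
            sx += x
            sy += y
            sxx += x * x
            syy += y * y
            sxy += x * y
    count = n * n
    mean_x, mean_y = sx / count, sy / count
    cov = sxy / count - mean_x * mean_y
    var_x = sxx / count - mean_x * mean_x
    var_y = syy / count - mean_y * mean_y
    return cov / math.sqrt(var_x * var_y)


def search_rho(target, n, inv_fx, inv_fy, tol=0.001, lo=-0.999, hi=0.999):
    """Bisection search for a rho whose grid correlation is near target."""
    mid, r = lo, None
    for _ in range(60):
        mid = (lo + hi) / 2.0
        r = grid_correlation_normal(mid, n, inv_fx, inv_fy)
        if abs(r - target) <= tol:
            break
        if r < target:
            lo = mid  # correlation increases monotonically with rho
        else:
            hi = mid
    return mid, r


# Hypothetical usage: two Exponential(rate 1) marginals, desired r = 0.6.
expo_inv = lambda p: -math.log(1.0 - p)
print(search_rho(0.6, 50, expo_inv, expo_inv))
```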


4  Starting and Stopping the Program

The command to run Bivar is simply "Bivar RootFileName", where RootFileName
can be up to eight characters long.  The parameters needed for program
operation are specified in a file "RootFileName.In", which must be prepared
in advance with an ASCII text editor.  The random numbers are written to a
file called "RootFileName.Dat", and some other output is written to a file
called "RootFileName.Biv".

Bivar is sometimes pretty slow.  You should be able to abort it with
control-break, if you get impatient.


5  Modes of Program Operation

BIVAR has five different modes of operation,
corresponding to different actions that you might want it to perform.

Generate: This is the most common mode.   In this mode, BIVAR generates a set
          of random numbers   corresponding to the indicated marginals,
          bivariate structure,   and value of RhoController.
Search:   In this mode, BIVAR will search for a value of RhoController
          yielding a desired correlation with the specified marginals
          and bivariate structure.  This mode is useful when you are not sure
          what value of RhoController will give you the correlation you want.
          Note that no actual random numbers are generated in this mode ---
          you only get a value of RhoController that will give you the
          correlation you want.
SGen:     Search and then generate; combination of above two modes.
          This mode is useful when you want BIVAR to do the search and then
          generate the random numbers without any pause for operator intervention.
LimitChk: In this mode, BIVAR will simply compute the largest and smallest
          correlations possible with the specified marginals. These limits
          are sometimes called the Frechet bounds, and they are found by
          letting Fy(Y)=Fx(X) (largest positive correlation) or by letting
          Fy(Y)=1-Fx(X) (largest negative correlation).
ComputeRho: This mode is useful when you want to check the correlation that
          will be obtained with a certain value of RhoController.  You simply
          specify RhoController, and BIVAR computes the correlation that will
          result.
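
The limits reported in LimitChk mode can be approximated with a short
Python sketch like the one below.  It assumes a grid of equally spaced
percentile midpoints, in the spirit of section 3.5; the function names are
hypothetical, and BIVAR's own computation may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_limits(n, inv_fx, inv_fy):
    """Approximate the smallest and largest correlations for two marginals."""
    qs = [(i + 0.5) / n for i in range(n)]
    xs = [inv_fx(q) for q in qs]
    ys = [inv_fy(q) for q in qs]
    largest = pearson(xs, ys)         # Fy(Y) = Fx(X)
    smallest = pearson(xs, ys[::-1])  # Fy(Y) = 1 - Fx(X)
    return smallest, largest


# Hypothetical usage: Exponential(rate 1) for X, Uniform(0,1) for Y.
expo_inv = lambda p: -math.log(1.0 - p)
unif_inv = lambda p: p
print(correlation_limits(200, expo_inv, unif_inv))
```

Reversing the list of Y percentile points pairs each percentile q of X
with percentile 1 - q of Y, which is the second condition stated above.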

6        Preparing  the  Input  Control  File
The following block of six lines shows a sample input file.   The six
components of this input file are explained below.
                  * This is just a sample input control file.
                  Lognormal(5,1)
                  Gamma(3,.01)
                  Normal
                  Generate
                  0.8 10000 3 3 3 3
    1. The input file begins with any number of notepad lines, defined
       as lines that begin with an asterisk as the first character.  These
       lines are ignored by the program, and are simply for your use in
       reminding yourself what is in the file and how you used it.

    2. The first line after the notepad lines specifies the marginal
       distribution of X. The distribution names and parameters correspond to
       those used in the program CUPID. They are listed in section 8; more
       complete explanations are given in the documentation accompanying
       CUPID.

    3. The next line gives the marginal distribution of Y .

    4. The next line gives the bivariate structure.  The alternatives are
       described in detail in Section 3. In brief, they are:
       Normal:     Based on bivariate normal.
       Mixture:     Mixture of uncorrelated & maximally correlated.
       Bands:    X and Y from same bands, e.g., first 10% of distribution.

    5. The next line specifies the program "mode" for this run.  It tells
       the program what you want it to do. The options are:

       Generate:     Generate a set of random numbers.

       Search:       Search for a value of RhoController yielding a desired
                     correlation.

       SGen:         Search and then generate; combination of above two modes.

       LimitChk:     Compute the largest and smallest correlations possible
                     with the specified marginals.

       ComputeRho:   Compute the correlation produced by a certain
                     value of RhoController.

    6. The last line is a set of numeric parameter values, and the program
       mode determines which parameter values need to be specified.
       The following table summarizes the parameters that are needed.

     Mode             Parameters on line
     ----------       -----------------------------------------------------
     Generate         RhoController.
                      Number of random (X,Y) pairs to generate.
                      Number of places to left of decimal point when writing X.
                      Number of places to right of decimal point when writing X.
                      Number of places to left of decimal point when writing Y .
                      Number of places to right of decimal point when writing Y .
     Search           Desired correlation.
                      Number of steps to use in approximating each distribution
                        (see section 3.5).
                      Error tolerance for search process (i.e., stop search
                        when correlation is within tolerance of desired).
     SGen             All parameters of Search followed by all parameters
                        of Generate except RhoController, on a single line.
     LimitChk         Number of steps to use in approximating each distribution.
     ComputeRho       RhoController.
                      Number of steps to use in approximating each distribution.

     Important note:  Lines in the input control file should have NO
leading or trailing white space.
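
When BIVAR is driven from a batch job, the control file can be produced by
a small script.  The Python sketch below assembles the six components
described above and strips stray white space from each line.  The function
name build_control_file is hypothetical, and the sketch only checks the
structure and mode names against the lists in this section; it does not
check distribution names or the number of numeric parameters.

```python
STRUCTURES = ("Normal", "Mixture", "Bands")
MODES = ("Generate", "Search", "SGen", "LimitChk", "ComputeRho")


def build_control_file(x_dist, y_dist, structure, mode, params, notes=()):
    """Return the text of a BIVAR input control file."""
    if structure not in STRUCTURES:
        raise ValueError("unknown bivariate structure: " + structure)
    if mode not in MODES:
        raise ValueError("unknown program mode: " + mode)
    # Notepad lines must have an asterisk as their first character.
    lines = ["* " + note.strip() for note in notes]
    lines.append(x_dist.strip())   # marginal distribution of X
    lines.append(y_dist.strip())   # marginal distribution of Y
    lines.append(structure)
    lines.append(mode)
    # Numeric parameters go on one line, separated by single spaces.
    lines.append(" ".join(str(p) for p in params))
    return "\n".join(lines) + "\n"


# Reproduces the sample control file shown at the start of this section.
text = build_control_file(
    "Lognormal(5,1)",
    "Gamma(3,.01)",
    "Normal",
    "Generate",
    (0.8, 10000, 3, 3, 3, 3),
    notes=["This is just a sample input control file."],
)
print(text)
```

The returned text would then be written to RootFileName.In before BIVAR
is started.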


7  Program Output

Bivar writes two output files, called RootFileName.Biv and RootFileName.Dat,
where "RootFileName" is specified on the command line.  RootFileName.Dat
simply contains the two columns of generated random numbers, Xi and Yi, for
i=1 to the requested sample size.  RootFileName.Biv contains other information
about the run that may be useful (this information is also written to the
screen).  Here is an example:

    Maximum correlation, ApproxSteps =  1.0000 200
    Bands method: NBands, ResultRho, ApproxSteps =  2.4996 0.8196 200
    Bands method: NBands, ResultRho, ApproxSteps =  1.7832 0.5874 200
    Bands method: NBands, ResultRho, ApproxSteps =  1.8114 0.6085 200
    Bands method: NBands, ResultRho, ApproxSteps =  1.7999 0.5999 200
    Search result: RhoController, ComputedRho =    1.79988  0.5999
    ! Summary of random numbers generated:
    ! SampleSize = 10000
    ! X: Mu   = 100; Sigma = 15.01
    ! X: Mean = 100.1; SD =    14.98
    ! Y: Mu   = 100; Sigma = 15.01
    ! Y: Mean = 100.1; SD =    15.08
    ! r(XY) =  0.5961
    ! ChiSqr test for fits to desired marginal distributions, df = 100:
    !  X: ChiSqr =    88.240, p = 0.794
    !  Y: ChiSqr =    93.580, p = 0.662

All lines down to the one beginning "Search result" are generated during the
search phase of the program, as the value of RhoController is being adjusted.
The only one of these lines you would ever be likely to need is the search
result line itself, which gives the final value of RhoController and the value
of Rho that it produces.  Subsequent lines, all starting with an exclamation
point, mainly summarize the random numbers that were generated in the current
run.  The Mu and Sigma values are the true values from the underlying
marginals, and these should not vary from run to run.  The Mean and SD values
are properties of the random sample generated on that run; in other
words, these are subject to sampling error, but they should presumably be
close to true Mu and Sigma.  The r(XY) line gives the observed correlation in
the generated sample, and it should be within sampling error of the desired
value if the program is working correctly.

The last three lines report the results of Chi-square tests carried out to see
whether the generated values deviate significantly from the desired marginal
distributions.  Each chi-square test is computed as follows:
  1. The variable's range is divided into 100 bins, each containing 1% of the
     probability in the true underlying marginal distribution.
  2. The number of random numbers in each bin is counted.
  3. The observed chi-square is the sum, across the 100 bins, of
     (fe - fo)^2/fe, where fe and fo are the expected and observed
     numbers in each bin.
If this chi-square test is significant (e.g., p<.05), then the random numbers
deviate from the specified marginal, indicating that the program is not
working correctly or that a chance deviation has occurred.  (It should be
noted that the p values associated with this test are not very accurate
unless you generated at least 500 pairs of random numbers.)
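
The chi-square statistic described in steps 1 to 3 can be recomputed from
a set of generated values with a sketch like the following.  The function
name is hypothetical, the marginal is supplied as its cumulative
distribution function, and no p value is computed here.

```python
def chi_square_marginal(values, cdf, n_bins=100):
    """Chi-square statistic for fit to a marginal, using equiprobable bins."""
    counts = [0] * n_bins
    for v in values:
        # Step 1: bin i covers cumulative probability i/n_bins to (i+1)/n_bins.
        i = min(int(cdf(v) * n_bins), n_bins - 1)
        # Step 2: count the values falling in each bin.
        counts[i] += 1
    fe = len(values) / n_bins  # expected number per bin
    # Step 3: sum (fe - fo)^2 / fe across the bins.
    return sum((fe - fo) ** 2 / fe for fo in counts)


# Hypothetical usage: 100 values spread evenly over a Uniform(0,1)
# marginal, so that exactly one value falls in each bin.
even = [(i + 0.5) / 100 for i in range(100)]
print(chi_square_marginal(even, lambda v: v))
```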


8        Available  Marginal  Distributions

Here is a list of the available marginal probability distributions, together
with a brief explanation of their parameters.  For further information,
see the CUPID documentation.

8.1      Continuous Distributions

Beta(A, B)     The Beta distribution is defined over the interval from zero
       to one, and its shape is determined by its two parameters A and
       B. The mean is A/(A+B), and the variance is
       AB / [(A+B)^2 (A+B+1)].

Cauchy(L, S)      This  distribution  is  defined  in  terms  of  location  and
       scale parameters L and S > 0, respectively.

ChiSquare(df)        This is the distribution of the sum of df independent
       squared standard normals.  Its parameter is df, a positive
       integer.

ExGaussian(Mu,Sigma,Rate)               This is the distribution of the sum
       of independent Normal and Exponential random variables.  Its
       parameters are the Mu and Sigma of the Normal, and the Rate
       of the Exponential.

Exponential(Rate)         The mean is 1/Rate.

ExpoNo(Mu,Sigma)            This is the distribution of
                                             e^X
                                     Y  = ________
                                          1 + e^X
       where X has a normal distribution with mean Mu and standard
       deviation Sigma.

ExtremeVal(Alpha,Beta)             Extreme-value Type I distribution (a.k.a.
       Fisher-Tippett distribution, Gumbel distribution, sometimes also
       called the double exponential distribution, not to be confused with
       the Laplace distribution).

F(df1,df2)     This is Fisher's distribution.  The two integer parameters
       are the degrees of freedom of the numerator and denominator,
       respectively.

Gamma       This is the distribution of the sum of N exponentials, each
       with rate alpha. In this distribution, N must be a positive integer.
       In the RNGamma distribution (see below), N is any positive real.

Geary(SampleSize)

Laplace(Location,Scale)

Lilliefors(SampleSize)

Logistic(Mu,Beta)

LogNormal(Mu,Sigma)             The distribution of X such that ln(X) is
       normally distributed.  The parameters are the mu and sigma of the
       normal.

Normal(Mu,Sigma)

Rayleigh(Scale)

RNGamma(RN,Rate)               Like "Gamma", except in this version, the
       first parameter is a real number rather than an integer.

rPearson(SampleSize)           Sampling distribution of Pearson's r (correlation
       coefficient) under the null hypothesis that the true correlation is
       zero (and assuming the usual bivariate normality).

t(df )  Student's t-distribution, with parameter df .

Triangular(Low,High)           The density function has the shape of a triangle
       with the peak in the middle of the range from Low to High.

TriangularG(Low,Peak,High)              The density function has the shape
       of a triangle across the range, but the peak need not be halfway
       between low and high.

Uniform(Low,High)           All values are equally likely within some range.

Weibull(Scale,Power,Origin)

8.2      Discrete Distributions

Binomial(N,p)

Constant(Value)         This is a degenerate distribution that always takes
       on the same value. Its parameter is that value.

Poisson(U)

8.3      Derived Distributions
It is also possible to specify marginal distributions that are derived from
one or more of the above primitive or "basis" distributions.

Convolution       This is the distribution of a sum of independent random
       variables.  To define a convolution, the user types something of
       the form:
         Convolution(BasisDist1(Parms),. . .,BasisDistK(Parms))
       There are K random variables summed together, and the distributions
       of these summed variables are simply listed, separated by commas.
       For example, Convolution(Normal(0,1),Uniform(0,1)) defines
       the  distribution  that  is  the  sum  of  a  standard  normal  and  a
       uniform(0,1).

ConvolutionIID         This is just an easier way to specify a convolution
       when all the summed random variables have the same distribution.
                        ConvolutionIID(3,Uniform(0,1))
       is the same as
         Convolution(Uniform(0,1),Uniform(0,1),Uniform(0,1))

Mixture      Mixtures are distributions formed by randomly selecting one
       of a number of random variables. For example,
         Mixture(0.5,Normal(0,1),0.5,Uniform(0,1))
       defines a random variable that comes from a standard normal half
       the time and a standard uniform the other half of the time.  In
       general, the format of this distribution is:
       Mixture(p1,BasisDist1(Parms),p2,BasisDist2(Parms),. . .,pk,BasisDistK(Parms))
       and the probabilities p1 to pk must sum to one (it is also legal to
       omit pk).

Truncated  A truncated distribution is a conditional distribution, conditioning
       on the random variable falling within the interval from Min to
       Max.   For  example,  Truncated(Normal(0,1),-1,1)  defines  a
       random variable that is always between -1 and 1, and which within
       that interval has relative probabilities defined by the PDF of the
       standard normal. In general, the format of this distribution is:
               Truncated(BasisDistribution(Parms),Min,Max)
       It is sometimes convenient to specify the truncation boundaries in
       terms of the probabilities you want to cut off rather than the scores
       themselves.  For example, you might want to look at the middle
       90% of a normal distribution but might not immediately know
       which scores cut off the top and bottom 5%.  For this reason,
       there is a variant of the command that takes probabilities instead
       of values for min and max, like this:
             TruncatedP(BasisDistribution(Parms),0.05,0.95)
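
       The relationship between the two forms can be illustrated with a
       short Python sketch that builds the inverse CDF of a truncated
       distribution from the CDF and inverse CDF of its basis
       distribution.  The function names are hypothetical and are not
       part of BIVAR or CUPID.

```python
from statistics import NormalDist


def truncated_p_inv_cdf(inv_cdf, p_min, p_max):
    """Inverse CDF of a basis distribution truncated at two probabilities."""
    # A percentile u of the truncated distribution corresponds to the
    # percentile p_min + u * (p_max - p_min) of the basis distribution.
    return lambda u: inv_cdf(p_min + u * (p_max - p_min))


def truncated_inv_cdf(cdf, inv_cdf, x_min, x_max):
    """Inverse CDF of a basis distribution truncated at two scores."""
    # Convert the score boundaries to probabilities, then reuse the
    # probability form.
    return truncated_p_inv_cdf(inv_cdf, cdf(x_min), cdf(x_max))


std = NormalDist()
# Analogue of Truncated(Normal(0,1),-1,1):
by_score = truncated_inv_cdf(std.cdf, std.inv_cdf, -1.0, 1.0)
# Analogue of TruncatedP(Normal(0,1),0.05,0.95):
by_prob = truncated_p_inv_cdf(std.inv_cdf, 0.05, 0.95)
print(by_score(0.5), by_prob(0.5))
```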

Order     The distribution of this order statistic is the distribution of the
       k'th largest observation in a sample of n independent observations.
       For example, Order(2,Normal(0,1),Uniform(0,1),Exponential(1))
       defines a random variable that is the median (2nd largest) in a
       sample containing one score from the standard normal, one from
       the uniform from 0-1, and one from the exponential with rate 1.
       In general, the format of this distribution is:
           Order(k,BasisDist1(Parms),. . .,BasisDistN(Parms))
       In the special case where the basis distributions are all identical,
       it is more convenient to use the OrderIID distribution, described
       next.

OrderIID      This is the special case of the order distribution in which
       the basis distributions are identical as well as independent.  In
       general, the format of this distribution is:
                       OrderIID(k,N ,BasisDist(Parms))
       It is only necessary to specify the basis distribution once, since
       all are identical; instead, you have to specify how many there are
       (N ).

Transformations        One can form a new random variable (Y ) by taking
       a mathematical transformation of an existing one (X). The following
       table lists the transformations that are recognized, illustrating the
       syntax for each.  Also listed are the constraints on the values of
       X.

Transformation          Example of Syntax                 Constraints on X
----------------------  -----------------------           ------------------
Exponential (Y = e^X)   ExpTrans(Uniform(.5,1))           Not too far from 0.
Inverse (Y = 1/X)       InverseTrans(Uniform(.5,1))       Not too close to 0.
Linear (Y = A X + B)    LinearTrans(Uniform(.5,1),2,10)
Natural Log (Y = ln[X]) LnTrans(Uniform(0.5,1))           X > 0
Power (Y = X^p)         PowerTrans(Uniform(.5,1),2)       X > 0

     Because distributions are constructed recursively, it is legal to construct
weird marginal distributions by any combination of the above.   For
example, this would be legal:
Truncated(Mixture(.5,Normal(0,1),.5,OrderIID(4,5,Normal(0,1))),-1,1)
and it indicates a truncated mixture of a normal distribution and an
order statistic.

8.4      Distributions Arising in Connection with Signal Detection Theory
The  distributions  described  in  this  section  arise  in  connection  with
signal detection theory experiments,  and will be of interest to some
psychophysicists and perhaps engineers. If you don't know what signal
detection theory is, then it is unlikely that you will care about these.
Note: These are all discrete distributions, as each reflects the outcome
of one or two binomial-type conditions with a finite number of trials.

ZfromP(SampleSize,TrueP,Adjust)   This is the discrete distribution
       of Z, which is derived from the binomial distribution as follows:
          1. For any sample from a Binomial(N,P), convert the number
             of successes k to the probability of success, p = k/N.  If
             p = 0, set p = Adjust/N; if p = 1, set p = 1 - Adjust/N.
          2. Find Z such that p = Pr(z <= Z), where z is a random
             variable having the standard normal distribution.
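
       The two steps can be sketched in Python as shown below.  The
       function names are hypothetical, and the sketch assumes that
       Adjust lies strictly between 0 and N so that the adjusted
       probability stays inside the interval from zero to one.

```python
import math
from statistics import NormalDist


def z_from_count(k, n, adjust):
    """Convert k successes in n trials to a standard normal score Z."""
    # Step 1: proportion of successes, adjusted at the two extremes.
    p = k / n
    if k == 0:
        p = adjust / n
    elif k == n:
        p = 1.0 - adjust / n
    # Step 2: Z such that p = Pr(z <= Z) for a standard normal z.
    return NormalDist().inv_cdf(p)


def zfromp_distribution(n, true_p, adjust):
    """List the possible Z values with their binomial probabilities."""
    dist = []
    for k in range(n + 1):
        prob = math.comb(n, k) * true_p ** k * (1.0 - true_p) ** (n - k)
        dist.append((z_from_count(k, n, adjust), prob))
    return dist


# Hypothetical usage: 10 trials, true probability 0.7, Adjust = 0.5.
for z, prob in zfromp_distribution(10, 0.7, 0.5):
    print(round(z, 3), round(prob, 4))
```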

APrime(NSignalTrials,PrHit,NNoiseTrials,PrFA)   This is the distribution
       of the sample A' computed from an experiment with NSignalTrials
       signal trials each having the specified true probability of a hit, and
       NNoiseTrials noise trials each having the specified true probability
       of a false alarm.  Specifically, A' is the distribution-free estimate
       of the area under the ROC curve computed using Equations 2
       and 9 of Aaronson and Watts (1987, Psychological Bulletin).

APrimeSym(NTrials,PrCorrect)                This is a shortcut for the previous
       distribution that can be used when there are equal numbers of
       signal  and  noise  trials  and  when  the  probability  of  a  correct
       response (hit or correct rejection) is the same for both signal and
       noise trials.

YNdPrime(NSignalTrials,PrHit,NNoiseTrials,PrFA,Adjust)           This
       is the distribution of the sample estimate of d' computed from an
       experiment with NSignalTrials signal trials each having the specified
       true probability of a hit, NNoiseTrials noise trials each having the
       specified true probability of a false alarm, and using the Adjust
       factor to correct cases with 0% or 100% hits or false alarms (e.g.,
       replace 0 hits with Adjust hits, and replace NSignalTrials hits
       with [NSignalTrials - Adjust] hits).

YNdPrimeSym(NTrials,TrueDprime,Adjust)      This is the special
       case of YNdPrime in which NSignalTrials = NNoiseTrials and
       Pr(Hit) = 1 - Pr(FA). Note that the second parameter is the true
       d' rather than the hit probability.

9      Technical  Notes

9.1      Installation
No special installation is required. You can simply run BIVAR in the directory
to which you unzipped it, or copy BIVAR.EXE to any directory in your
path.
     Nonetheless, there is one installation issue to consider. BIVAR uses
a file called CUPID.RND (also used by CUPID) to control the seeding
of the random number generator.  When it is executed, BIVAR checks
the  current  directory  for  CUPID.RND;  if  this  file  is  found,  BIVAR
retrieves the state of the random number generator from it. When
it is done,  BIVAR writes out the final state of the random number
generator to CUPID.RND, so that the next time it runs the random
number generator will continue on from where it left off.
     Note that you can make BIVAR restart the random number generator
from  the  same  spot  if  you  want  to:  Either  move  the  CUPID.RND
file out of the directory to hide it, or else keep your own copy of
CUPID.RND in another directory and restore this file each time BIVAR
overwrites it.
     Since it may be annoying to keep moving CUPID.RND to whatever
directory you are working in,  another option is provided.  If no file
CUPID.RND is found in the current directory, then BIVAR checks for
an environment variable called "CUPID". If it is defined, its value tells
BIVAR which directory to look in for CUPID.RND. The advantage of
this is that you can then run BIVAR within any directory and have it
go find the seed file in a standard location, rather than having a copy
of CUPID.RND in each directory where you want to run the program.
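
     The lookup order just described can be sketched in Python as
follows. This is an illustration of the rule, not BIVAR's own code; the
function name is hypothetical, and the sketch simply returns None when
neither location holds the file, since what BIVAR does in that case is not
described here.

```python
from pathlib import Path
from typing import Mapping, Optional


def find_seed_file(cwd: Path, env: Mapping[str, str]) -> Optional[Path]:
    """Locate CUPID.RND: current directory first, then the CUPID directory."""
    local = cwd / "CUPID.RND"
    if local.is_file():
        return local
    seed_dir = env.get("CUPID")  # environment variable naming a directory
    if seed_dir:
        candidate = Path(seed_dir) / "CUPID.RND"
        if candidate.is_file():
            return candidate
    return None
```

     A call such as find_seed_file(Path.cwd(), os.environ), after
importing os, applies the rule to the real environment.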
     To set the environment variable, put a line like
 set  CUPID=C:\AMAZING\CUPID
in your autoexec.bat file, and then store CUPID.RND in the indicated
directory. (Don't forget you have to reboot for this to take effect.)

9.2      Interfacing BIVAR to a Simulation Program
The main problem in using BIVAR to generate random numbers for
one of your own simulation programs is to give your program access
to the random numbers. I know of only two solutions to this problem;
neither is particularly fast in terms of computer time, but they may
save the all-important programmer time, compared to coding your own
random-number generators from scratch.
     The simpler solution is to use BIVAR to generate a large disk file
of random numbers from the desired distribution, and then start up
your program and let it read the random numbers from the file.  This
solution works well enough provided that (a) the desired distributions
can be specified before you start running your simulation program, and
(b)  your  program  needs  few  enough  random  numbers  that  you  can
conveniently store them all in a disk file.
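
     As a minimal sketch of the simpler solution, a simulation program
written in Python could read the stored numbers as shown below. The file
layout is an assumption made for illustration only (one X value and one Y
value per line, separated by white space, with no header lines); check it
against the actual output file before relying on it. The function name is
hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def read_pairs(path: Path) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) pairs from a text file of whitespace-separated numbers."""
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or incomplete lines
            yield float(fields[0]), float(fields[1])
```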
     The more complex solution is not subject to either of these constraints,
but requires a little more programming effort. To use this solution, your
program must invoke BIVAR as an external program.  In Pascal, for
example, the syntax for this would be something like:
           Exec('BIVAR.EXE','RootFileName');
This  would  start  BIVAR  with  the  command-line  parameter  in  the
second string. (All the programming languages I know of allow you to
start an external program from within your own program, so I assume
this is generally possible.) Of course your program would have to have
already written the control file called RootFileName.In, with the desired
specifications;  this should not be too difficult,  however,  since it is a
pretty straightforward ASCII file.  Using this approach, the idea is to
generate an appropriate batch of random numbers to a file RootFileName.Dat,
read the random numbers from that file, generate a new batch, and so on.
Since you can write the input file within your simulation program, you can
generate the random numbers from whatever distribution the program
selects at run-time, and you can use different distributions
on successive calls to BIVAR. You can also generate random
numbers in batches of any convenient size.  When using this
strategy, of course, your program has to keep track of how many numbers it
has read from the Dat file, so that it will know when to generate another batch.
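
     The batch strategy can be sketched in Python as follows. This is a
hedged illustration, not a tested interface to BIVAR: the function names
are hypothetical, the control file contents are passed in as ready-made
text rather than constructed here, and the Dat file is assumed to hold one
whitespace-separated X Y pair per line. The sketch deletes the old Dat file
before each run, because BIVAR otherwise asks for confirmation before
overwriting an existing file.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterator, Tuple


def run_bivar(root: Path) -> None:
    """Start BIVAR with the root file name as its command-line parameter."""
    subprocess.run(["BIVAR.EXE", root.name], cwd=root.parent)


def batched_pairs(root: Path, control_text: str,
                  runner: Callable[[Path], None] = run_bivar
                  ) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) pairs indefinitely, generating a new batch when needed."""
    in_file = Path(str(root) + ".In")
    dat_file = Path(str(root) + ".Dat")
    while True:
        in_file.write_text(control_text)   # control file for this batch
        dat_file.unlink(missing_ok=True)   # avoid the overwrite prompt
        runner(root)
        batch = [line.split() for line in dat_file.read_text().splitlines()]
        batch = [(float(f[0]), float(f[1])) for f in batch if len(f) >= 2]
        if not batch:
            raise RuntimeError("no numbers found in " + str(dat_file))
        yield from batch                   # next batch starts when exhausted
```

     Because the generator only runs BIVAR again when the current batch is
used up, the calling program does not need its own counter. To change
distributions between batches, the control text could be recomputed inside
the loop instead of being fixed.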
     A useful trick is available if you use the more complex solution. By
default, BIVAR always asks for user confirmation before writing over
an existing file.  That means that your program would have to delete
the old output files before BIVAR will generate a new batch of random
numbers, or else you would have to babysit the simulations and provide
user confirmation each time a new batch was generated. The trick is to
use an exclamation point as the first character of the output file name.
If you do that, BIVAR will (a) delete the exclamation point from the
output file names, and (b) write the output to a file with the
indicated name, automatically overwriting the file if it
already exists.

9.3      Optimizing Speed
If you are going to generate a lot of random numbers, it is probably
worth your while to try switching the two distributions you call the
marginals  for  X  and  Y .   The  two  distributions  are  used  differently
within  the  program  (see  descriptions  of  computational  methods  for
the three bivariate structures), and so sometimes the speed of random
number generation varies depending on which variable you call X and
which you call Y .  For example, when generating random uniform numbers
and random values of Pearson's r using the mixture structure, it is
faster to let X be the Pearson and Y be the uniform than the reverse.

Another speed issue arises in connection with the search process described in
section 3.5.  It is easy to see that the time needed for searching increases
with the square of the value of the "number of steps" approximation
parameter.  In practice, I have usually found that 100 steps is adequate to
give quite a good approximation, but nonetheless I usually use 200 or even 400
steps to err on the side of slowness and accuracy.

If you are interested in search speed, you should play around a bit with the
"number of steps" parameter.  For example, do a first run with 50 steps,
then others with 100, 200, and so on.  For each run, note the value of
RhoController that the search process converges on, holding fixed the target
correlation, of course.  As you increase the number of steps, RhoController
should stabilize to a relatively constant value; you can use the minimum
number of steps yielding this constant asymptotic value.  My general
impression is that you need a lot of steps only with distributions that have
really extreme tails, but I have not looked closely at this question.
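
Once the RhoController values from such a series of runs have been noted
down, picking the minimum adequate number of steps can be automated. The
Python sketch below is one way to do it, under two assumptions that are not
part of BIVAR: the run with the most steps is treated as the asymptotic
value, and "relatively constant" means agreeing with it to within a
tolerance you choose. The function name is hypothetical.

```python
def minimum_stable_steps(runs, tolerance):
    """Smallest step count from which RhoController stays near its final value.

    runs is a list of (number_of_steps, rho_controller) pairs.
    """
    ordered = sorted(runs)
    asymptote = ordered[-1][1]      # value from the run with the most steps
    best = ordered[-1][0]
    # Walk down from the largest run until a value leaves the tolerance band.
    for n_steps, rho in reversed(ordered):
        if abs(rho - asymptote) > tolerance:
            break
        best = n_steps
    return best
```

With the made-up values [(50, 0.61), (100, 0.642), (200, 0.6501),
(400, 0.65)] and a tolerance of 0.001, the function returns 200.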

9.4      Numerical Approximations
One key point is that all distributions are represented numerically, with
finite limits. BIVAR's version of the standard normal distribution, for
example, goes from about -5.6 to 5.6, not from -infinity to +infinity.  Similarly,
there are numerical bounds for all distributions (you can find out what
bounds BIVAR is using by running CUPID and using the functions
minimum and maximum).  In addition, BIVAR sometimes has to change
bounds of naturally bounded distributions in order to avoid numerical
errors.  Gamma distributions, for example, start at 0.00001 instead of
0.0, because the Gamma PDF cannot be evaluated at 0.0.
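
     To get a feel for how little is lost by such bounds, the probability
that a true standard normal variable falls outside a symmetric pair of
bounds can be computed directly. The Python sketch below uses only the
standard library; the function name is hypothetical, and 5.6 is the
approximate bound quoted above.

```python
from statistics import NormalDist


def mass_outside(bound):
    """Probability that a standard normal value lies outside +/- bound."""
    return 2 * NormalDist().cdf(-bound)
```

     For a bound of 5.6 the excluded probability is on the order of 1e-8.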
     Although it is very general,  BIVAR is not always very accurate.
Many values are obtained through numerical integration, and the results
can be substantially off in some pathological cases, due to the vagaries
of numerical approximations with finite-precision math.  The moral of
the story is that you should check the values that you care most about.
One good check is to make very minor changes in parameter values and
make sure that the results change only slightly.
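
     This check can be sketched in Python as follows. It is a generic
helper, not something provided by BIVAR: compute stands for whatever
procedure you use to obtain the value of interest from a parameter, and the
default step and threshold are arbitrary illustrative choices.

```python
def changes_only_slightly(compute, value, rel_step=1e-4, max_rel_change=1e-2):
    """True if nudging the parameter up and down barely changes the result."""
    base = compute(value)
    scale = max(abs(base), 1e-12)   # guard against division-like blowups
    for nudged in (value * (1 - rel_step), value * (1 + rel_step)):
        if abs(compute(nudged) - base) > max_rel_change * scale:
            return False
    return True
```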

10     For  Further  Information
                Jeff Miller
                Dept. of Psychology
                University of Otago
                Dunedin, New Zealand
                email: miller@otago.ac.nz
                FAX: (64-3)-479-8335

11     Release  History
Version 1.0 was released on a limited basis in April 1997.
Version 1.1 was released in August 1997, with improved documentation, a
revised interface, and some bug fixes.

12     Sample  Acknowledgement  Letter
I don't want much, just some feedback on who is using BIVAR and what
they are using it for.  Something like the letter shown below would be
fabulous.  But it has to be a real signed letter on paper to do me any
good.  Please don't use e-mail!  Of course I would also welcome bug
reports and suggestions for improvement, although I can't promise
any fast action on those. Don't forget what you paid for this!
           Prof Jeff Miller
           Department of Psychology
           Univ of Otago
           Dunedin, New Zealand
           Dear Prof Miller,
           This is to acknowledge that I have used the computer program
           BIVAR for generating random numbers during the past year.
           Include all of the following that apply, and any other uses that I
           haven't thought of:   I have used it for simulation and modeling in
           my research in the field of Cognitive Psychology, and also in the
           construction or analysis of data to be used as classroom examples
           in teaching research methods and statistics. Approximately 5 of
           my (Masters, PhD, post-doctoral) students have also used this
           program in their work, and approximately 30 of my students used
           it for doing their assignments in statistics classes.
           Sincerely,
           etc
