Input

(The MQLS_input file is a modified version of the input.txt file from the CC-QLS software package.)

1.  The marker data file (e.g. 'markid')

    This file contains the marker data and binary phenotype information.
    It uses the standard linkage format:
   

       1   1   7   6   1   1   1   2   1   2
       1   2   7   6   2   2   1   1   1   2
       1   3   7   6   1   2   3   1   1   2
       1   7   0   0   1   1   1   1   1   1
       1   6   0   0   2   0   2   3   2   2
       2   1   8   9   2   2   1   3   1   1
       2   2   8   9   1   1   2   3   1   1
       2   8   0   0   1   0   1   2   1   2
       2   9   5   6   2   1   3   1   1   1
       2   5   0   0   1   0   3   2   1   2
       2   6   0   0   2   0   1   1   1   2
      (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) ....
  
     (1) family ID 
     (2) individual ID 
     (3) father's ID (0=unknown)
     (4) mother's ID (0=unknown)
     (5) sex (1=male, 2=female) 
     (6) affection status (0=unknown, 1=unaffected, 2=affected) 
     (7-8), (9-10)... marker genotype (0 for missing alleles)

    Families should be numbered from 1 to F, without gaps. There is
    no limit on the number of individuals, but the number of families
    must be smaller than 500. To increase this limit, change the value
    of MAXFAM in the MQLStest.c source file and recompile the program.
    Each individual should be entered only once.

    The number of columns should be the same for every individual.
    Use 0 for missing information (missing genotypes or unknown
    phenotype).

    The number of markers to be analyzed is determined using the number 
    of columns on the first line of this file. There is a limit on the 
    number of markers (set to 500). To increase the limit, just change the
    value of MAXMARK in the MQLS.c source file and recompile the program.

    Alleles at each marker locus should be numbered from 1 to M without gaps.
    There is no limit on the number of alleles at each marker locus.
  
    All the individuals who have either (1) known affection status or
    (2) at least some genotype information should also be listed in the 
    kinship coefficient file (see below).
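
    The formatting rules above can be checked before running the
    program. The following Python sketch is not part of the MQLS
    package; the function name check_marker_file is hypothetical, and
    it assumes whitespace-separated columns and integer family IDs.

```python
def check_marker_file(lines):
    """Return a list of problems found in linkage-format marker data lines."""
    rows = [ln.split() for ln in lines if ln.strip()]
    if not rows:
        return ["file is empty"]
    # The first line fixes the column count: 6 leading columns,
    # then two allele columns per marker.
    ncols = len(rows[0])
    if ncols < 8 or (ncols - 6) % 2 != 0:
        return [f"first line has {ncols} columns; expected 6 + 2 * markers"]
    problems = []
    seen = set()
    families = set()
    for n, row in enumerate(rows, start=1):
        if len(row) != ncols:
            problems.append(f"line {n}: {len(row)} columns, expected {ncols}")
            continue
        key = (row[0], row[1])  # (family ID, individual ID)
        if key in seen:
            problems.append(
                f"line {n}: individual {row[1]} of family {row[0]} entered twice"
            )
        seen.add(key)
        families.add(int(row[0]))
    # Families must be numbered 1..F without gaps.
    if families != set(range(1, len(families) + 1)):
        problems.append("family IDs are not numbered 1 to F without gaps")
    return problems
```

    Calling check_marker_file(open('markid')) returns an empty list
    when none of these problems is found. It does not check allele
    numbering or the MAXFAM and MAXMARK limits.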
 
2.  The kinship coefficient file (e.g. 'kinshipcoef')

    This file contains the kinship and inbreeding coefficients for all 
    possible pairs of individuals, within each family, who have either 
    (1) known affection status or (2) non-missing genotype for at least 
    one marker.  (E.g. an individual with unknown phenotype should still
    be included if he or she has any non-missing genotype information.)
    If an individual has an inbreeding coefficient equal to 0 or if a 
    pair of individuals within a family has a kinship
    coefficient equal to 0, they must still be in the file or the 
    individuals will not be included in the analysis.

    All pairs should be considered, regardless of phenotype
    (affected/affected, affected/unaffected, affected/unknown,
    unaffected/unaffected, unaffected/unknown and unknown/unknown).
   
    It has the following format :
   
    1   1   1   0.0
    1   1   2   0.25
    1   1   3   0.25
    1   1   7   0.25 
    1   1   6   0.25
    1   2   2   0.0
    1   2   3   0.25
    1   2   7   0.25
    1   2   6   0.25
    1   3   3   0.0
    1   3   7   0.25
    1   3   6   0.25
    1   7   7   0
    1   7   6   0
    1   6   6   0
    2   1   1   0.01251 
    2   1   2   0.26124
    .   .   .   .
    .   .   .   .
   (1) (2) (3) (4)

   (1) family ID 
   (2) individual 1 ID (Id1)
   (3) individual 2 ID (Id2)
   (4) kinship coefficient between Id1 and Id2 if Id1 is different from Id2
       inbreeding coefficient of Id1 if Id1 equals Id2


    The family ID and individual IDs should match exactly the IDs in the
    marker data file.
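
    As a rough illustration of the coverage requirement, the Python
    sketch below lists the pairs that appear to be missing from a
    kinship coefficient file. It is not part of the MQLS package. The
    function names are hypothetical, rows are assumed to be already
    split into lists of strings, and a pair is assumed to count as
    listed whichever of the two IDs comes first.

```python
from itertools import combinations_with_replacement


def required_individuals(marker_rows):
    """Map family ID -> individual IDs (in file order) that need kinship entries."""
    required = {}
    for row in marker_rows:
        fam, ind, status, alleles = row[0], row[1], row[5], row[6:]
        # Required if affection status is known or any allele is non-missing.
        if status != "0" or any(a != "0" for a in alleles):
            required.setdefault(fam, []).append(ind)
    return required


def missing_pairs(marker_rows, kinship_rows):
    """Return (family, id1, id2) triples absent from the kinship rows."""
    # Assumption: pairs are treated as unordered.
    listed = {(r[0], frozenset((r[1], r[2]))) for r in kinship_rows}
    missing = []
    for fam, ids in required_individuals(marker_rows).items():
        # Includes the (i, i) pairs that carry inbreeding coefficients.
        for a, b in combinations_with_replacement(ids, 2):
            if (fam, frozenset((a, b))) not in listed:
                missing.append((fam, a, b))
    return missing
```

    An empty result means every required pair, including each
    individual paired with itself, has an entry.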

    The program runs faster when the coefficients are ordered as follows.
    Within each family, the order of the pairs follows the order of the
    individuals in the marker data file. For example, for a family
    numbered 4 with 3 individuals 47, 48 and 49, listed in this order in
    the marker data file, the order in the kinship coefficient file
    would be:

              4 47 47  H_47
              4 47 48  phi_(47,48)
              4 47 49  phi_(47,49)
              4 48 48  H_48
              4 48 49  phi_(48,49)
              4 49 49  H_49

    where H_i is the inbreeding coefficient of individual i and
    phi_(i,j) is the kinship coefficient between individuals i and j.
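
    The ordering in this example can be generated mechanically. A
    minimal Python sketch follows; it is not part of the MQLS package
    and the function name ordered_pairs is hypothetical. It assumes
    the individuals are passed in the order they appear in the marker
    data file.

```python
from itertools import combinations_with_replacement


def ordered_pairs(family_id, individuals):
    """List (family, id1, id2) triples in marker-file order, self-pairs included."""
    return [
        (family_id, a, b)
        for a, b in combinations_with_replacement(individuals, 2)
    ]


for fam, id1, id2 in ordered_pairs(4, [47, 48, 49]):
    print(fam, id1, id2)
```

    For family 4 this prints the six pairs in the same order as the
    example above; the coefficient in the fourth column still has to
    be supplied from another source.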

Two software programs that can be used to obtain kinship and inbreeding coefficients are

(1) The KinInbcoef software.  The output file of the KinInbcoef program has the exact format required for the MQLS software.
  
    The KinInbcoef program can be found at http://www.stat.uchicago.edu/~mcpeek/software/KinInbcoef/index.html

and

(2) The idcoefs 2.0 software which can be found at http://home.uchicago.edu/~abney/abney_web/Software.html

The idcoefs 2.0 software computes identity coefficients for pairs of
individuals.  Kinship and inbreeding coefficients can then be computed
from these identity coefficients.
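
The conversion can be sketched in Python as below, using the standard
relations for the nine condensed identity coefficients in Jacquard's
ordering (Delta_1 to Delta_9). This is not part of either software
package; the function names are hypothetical, and the assumption that
the coefficients are supplied in this order should be checked against
the idcoefs documentation.

```python
def kinship_from_identity(delta):
    """Kinship coefficient of a pair from its 9 condensed identity coefficients."""
    d1, d2, d3, d4, d5, d6, d7, d8, d9 = delta
    return d1 + 0.5 * (d3 + d5 + d7) + 0.25 * d8


def inbreeding_from_identity(delta):
    """Inbreeding coefficients (first individual, second individual) of a pair."""
    d1, d2, d3, d4, d5, d6, d7, d8, d9 = delta
    return d1 + d2 + d3 + d4, d1 + d2 + d5 + d6


# Non-inbred full siblings: Delta_7 = 0.25, Delta_8 = 0.5, Delta_9 = 0.25
full_sibs = (0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25)
print(kinship_from_identity(full_sibs))     # 0.25
print(inbreeding_from_identity(full_sibs))  # (0, 0)
```

For a row of the kinship coefficient file in which Id1 equals Id2, the
value to write is the inbreeding coefficient of that individual, not
the kinship of the individual with itself.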
 
  
3.  The prevalence file (e.g. 'prevalence')

    This file contains the estimate of the prevalence of the binary
    trait in the general population.  This prevalence value is used in the
    calculation of the MQLS statistic.


4.  The MQLS software gives the user TWO OPTIONS for how to handle the
individuals of unknown phenotype in the analysis.  The user specifies the 
option, either 1 or 2, at the command line for running the executable 
program 'MQLStest'.  See the 'MQLS_README' file for running the executable 
program with the two options.  Below are details of the two options:

	OPTION 1:  This should be considered the default for the MQLS
test.  Under this option, the MQLS test is performed with 3 
different phenotype categories allowed: affected, unaffected, and unknown. 
This option is useful in that it allows individuals who are genotyped but 
not phenotyped to be included in the analysis.  Furthermore, phenotyped 
individuals with missing genotype data are also allowed to contribute to 
the MQLS test with this option (if they have genotyped relatives in the 
sample).  The WQLS and corrected chi-squared statistics are computed with 
the cases taken to be the affecteds and the controls taken to be the 
unknown and unaffected individuals combined.  They do not make use of 
individuals with missing genotype data at the tested marker.

	OPTION 2:  This option is provided for backward compatibility 
with the CC-QLS software for calculating WQLS and corrected chi-squared.
In this option, individuals with unknown phenotype are excluded from 
all tests, and individuals with missing genotype data at a given marker
are excluded from the test at that marker.  If this option is run, 
results for WQLS and corrected chi-squared will be consistent with the 
output of the CC-QLS software (provided that there are no MZ twin pairs 
in the sample --- MZ twins are allowed when using the MQLS software but 
not in the CC-QLS software).  Under option 2, the MQLS test will also be 
performed with these individuals removed from the analysis, which could 
reduce its power.