Input

DHSMAP requires two input files, which are described below.

1. "pedfile" file

This file contains the haplotypes or genotypes from the affected and control subjects. Data for a given subject are listed together on one or more consecutive rows. There is no limit on the number of subjects. Subjects may be listed in any order; there is no need to separate affecteds from controls in this file.

The marker data for each individual consist of a series of pairs of allele labels, the genotypes at the marker loci. The loci may be entered in any order, but this ordering must be the same for all individuals. If haplotypes are available, the first allele in each pair lies on one haplotype and the second in each pair lies on the other haplotype. When only genotypes are available, the order in which the alleles appear within each pair is ignored.

The data for one subject may require more than one line, e.g., if there are a large number of typed loci. The software allows for this and begins reading data for the next subject after encountering the expected numbers of skipped entries (see below) and marker genotypes. The same number of entries must appear for each subject; all entries must be separated by spaces or tabs.

The marker genotypes may be preceded by several (numeric or non-numeric) entries, e.g., subject IDs, M/F indicator, etc. These entries, the number of which is specified in the "datafile" file, are ignored by the software. Alleles should be labeled with positive integers; non-typed alleles are denoted by "0". A genotype at a marker may consist of one typed and one non-typed allele, i.e., 1 0 and 0 1 are allowed.
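The reading rule described above (entries separated by spaces or tabs, line breaks not significant, a fixed number of skipped entries followed by one allele pair per locus) can be sketched in Python. This is only an illustration of the file layout, not DHSMAP's own reader; the function name parse_pedfile_text and its arguments n_loci and n_skip are hypothetical and simply mirror the "number of loci listed in pedfile" and "number of columns to skip" settings of the datafile.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]
Subject = Tuple[List[str], List[Genotype]]


def parse_pedfile_text(text: str, n_loci: int, n_skip: int) -> List[Subject]:
    """Split pedfile text into (skipped entries, genotypes) per subject.

    Hypothetical helper: line breaks are ignored, so a subject ends after
    n_skip + 2 * n_loci whitespace-separated entries.
    """
    tokens = text.split()  # splits on spaces, tabs and newlines alike
    per_subject = n_skip + 2 * n_loci
    if len(tokens) % per_subject != 0:
        raise ValueError("every subject must have the same number of entries")

    subjects: List[Subject] = []
    for start in range(0, len(tokens), per_subject):
        record = tokens[start:start + per_subject]
        skipped = record[:n_skip]  # kept as strings; may be non-numeric
        alleles = [int(a) for a in record[n_skip:]]
        if any(a < 0 for a in alleles):
            raise ValueError("alleles must be positive integers, or 0 if untyped")
        # Consecutive alleles form the genotype at one locus.
        genotypes = list(zip(alleles[0::2], alleles[1::2]))
        subjects.append((skipped, genotypes))
    return subjects


if __name__ == "__main__":
    # First two subjects of pedfile_ex; the second is deliberately
    # wrapped over two lines to show that line breaks do not matter.
    sample = "1 0 2 1 1 1 1 1 1 1 2 2 1 1\n2 0 2 2 1 1 1 1\n1 1 2 2 2 2\n"
    for skipped, genotypes in parse_pedfile_text(sample, n_loci=6, n_skip=2):
        print(skipped, genotypes)
```

Keeping the skipped entries as strings reflects the statement that they may be numeric or non-numeric; whether DHSMAP itself validates entry counts in this way is not stated in this document.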
Below is a sample pedfile, "pedfile_ex", composed of simulated data:

    1 0 2 1 1 1 1 1 1 1 2 2 1 1
    2 0 2 2 1 1 1 1 1 1 2 2 2 2
    3 0 2 2 1 1 1 1 1 1 2 2 1 1
    4 0 2 2 1 1 1 1 1 2 2 2 1 1
    5 0 2 2 2 2 2 1 1 1 2 2 1 1
    6 0 2 2 1 1 1 1 1 1 2 2 1 1
    7 0 2 2 1 1 1 1 1 1 1 1 1 1
    8 0 2 2 1 1 1 1 1 1 2 1 1 1
    9 0 1 2 1 1 2 1 1 1 1 2 1 1
    10 0 2 2 1 1 1 1 1 1 2 2 1 1
    11 0 2 2 1 1 1 1 1 1 1 2 1 1
    12 0 2 1 1 1 1 2 1 1 2 2 1 1
    13 0 2 1 1 1 1 2 1 1 2 2 2 1
    14 0 1 2 1 1 1 1 1 1 2 2 1 1
    15 0 1 2 1 1 1 1 1 1 2 2 1 1
    16 0 2 2 2 2 2 1 2 2 2 1 2 1
    17 0 2 1 2 2 2 2 1 2 2 1 2 1
    18 0 2 2 2 2 1 1 1 2 2 2 1 2
    19 0 1 2 1 1 1 1 1 2 1 2 2 2
    20 0 2 2 1 2 1 2 1 2 1 2 2 1
    21 0 2 1 1 2 2 2 2 1 2 2 2 1
    22 0 2 1 2 1 2 2 2 2 2 2 2 1
    23 0 2 1 2 1 2 1 2 1 2 2 2 1
    24 0 2 1 2 1 2 1 1 1 2 2 2 1
    25 0 2 2 1 2 2 1 1 2 2 1 2 1
    26 0 2 2 2 2 1 1 1 2 1 1 1 2
    27 0 1 1 2 1 2 2 2 2 2 1 1 1
    28 0 2 2 2 2 2 2 2 1 1 2 1 2
    29 0 1 2 2 1 1 1 1 1 2 2 2 2
    30 0 1 2 1 1 1 1 2 2 1 1 1 2
    31 0 1 1 1 1 1 1 1 2 2 1 1 1
    32 0 2 2 2 1 1 2 1 2 2 1 1 2
    33 0 1 1 1 2 1 2 1 2 1 1 1 2
    34 0 1 1 1 1 2 2 1 2 2 2 2 1
    35 0 1 1 1 1 2 2 2 1 1 1 1 1

    col. 1 2 3 4 5 6 ...

    (1)-(2) columns to be skipped
    (3)-(4) marker genotype for 1st marker listed
    (5)-(6) marker genotype for 2nd marker listed...

2. "datafile" file

This file contains settings for DHSMAP, as well as lists of included affected and control individuals and the marker map. In the lists of individuals included in the analysis, subjects are identified by the order in which they appear in the pedfile, i.e., the first subject listed is "1", the second is "2", etc. Similarly, the loci are labeled according to the order of appearance in pedfile. When a parameter must be given for each locus, this information may be entered in map order, i.e., the order in which the markers appear on the map, or pedfile order, i.e., the order in which they appear in the pedfile.

Below is a sample datafile, "datafile_ex", with comments interspersed. Note:

1. The symbol "#" precedes all comments in this file--the software ignores the rest of the line when it encounters this character.
2. The datafile is generally robust to placement of spaces and lines; exceptions are noted below.

    1                    #program code; set to 1 when fine-mapping
    1                    #data type for affecteds;
                         #1, if haplotypes available
                         #2, if only genotypes available
    6                    #number of loci listed in pedfile
    2                    #number of columns (i.e. fields) to skip in pedfile before
                         #marker genotypes begin
    6                    #number of markers from pedfile included in analysis
    2 1 3 4 5 6          #markers, IN MAP ORDER, where loci in
                         #pedfile are assumed to be given in order
                         #1:n
    1                    #map type
                         #1, if inter-locus distances in map order (1 fewer than incl. loci)
                         #2, if location on numberline, loci in map order
                         #3, if location on numberline, loci in pedfile order
                         #(for types 2&3, locus farthest left should be assigned
                         #position 0.0)
    0.2 0.2 0.2 0.2 0.2  #map, corresponding to map type 1 (all distances
                         #given in cM; see "Tips" for more info.)
                         #for map type=2, map is 0.0 0.2 0.4 0.6 0.8 1.0
                         #for map type=3, map is 0.2 0.0 0.4 0.6 0.8 1.0
    20                   #no. of affecteds given in pedfile
    20                   #no. affected individuals included in analysis
    1 2 3 4 5            #Included affected individuals; subjects
    6 7 8 9 10           #are labeled in the order in which they are
    11 12 13 14 15       #given in pedfile
    16 17 18 19 20
    1                    #controls type
                         #1, if haplotypes available for controls
                         #2, if genotypes available for controls
                         #3, if only allele freqs. avail. for controls, in pedfile order
                         #4, if only allele freqs. avail. for controls, in map order
                         #(see "Tips" for more info. on controls types 3/4)
    1                    #order of Markov Chain modelling background LD (1 or 2)
    1                    #Bayesian adjustment to estimated hap. freqs? (1=yes,0=no)
    15                   #number of controls given in pedfile (only if cont. type is 1/2)
    15                   #number of controls used in analysis (only if cont. type is 1/2)
    21 22 23 24 25       #Included controls (only if cont. type is 1/2)
    26 27 28 29 30
    31 32 33 34 35
    #if controls type is 3 or 4, i.e., only allele frequencies available,
    #the following is the format for reading in the allele frequencies,
    #in place of the above (See "Tips" for more info. on this option):
    # #3                 #controls type
    # #2 1 2 0.5 0.5
    # #2 1 2 0.5 0.5
    # #2 1 2 0.5 0.5
    # #2 1 2 0.5 0.5
    # #2 1 2 0.5 0.5
    # #2 1 2 0.5 0.5
    # #Each line corresponds to a marker. The first entry gives the
    # #number of alleles for that marker. This is followed by the list
    # #of alleles and a list of the corresponding frequencies (Note
    # #that the freqs. must sum to 1.0 for each marker)
    1e-5 1e-5 1e-5 1e-5 1e-5  #per meiosis per locus mutation rates
    1e-5                 #(given in pedfile order)
    1                    #estimate heterogeneity parameter p?
                         #1, to estimate p
                         #0, to fix p at initial value
    0.25                 #initial value for p (between 0.0 and 1.0) used for E-M alg.
    #Names given to results files; each filename must be given on
    #a new line (for descriptions, see "Output")
    resout
    ancout
    maxout
    oneout
    0                    #E_int; interval from the midpoint of which to grow
                         #ancestral hap; 0, if unknown (See "Search Procedures"
                         #for description and "Tips" for advice)
    0.1                  #max_res (See "Search Procedures" for description and
                         #"Tips" for advice)
    20                   #map_res (See "Search Procedures" for description and
                         #"Tips" for advice)
    20                   #max_cand (See "Search Procedures" for description and
                         #"Tips" for advice)
    0                    # anc_hap_known
                         #0, ancestral haplotypes will be estimated
                         #1, if set of ancestral haplotypes over which to search
                         #is specified
    #if anc_hap_known=1, it is followed by the number of haplotypes
    #in the set and then the haplotypes themselves, e.g.,
    # 2                  #number of candidate ancestral haplotypes to search over
    # 1 2 1 1 2 1
    # 1 2 2 2 2 2
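Two conventions of the datafile lend themselves to a short Python sketch: the "#" comment rule, and the relationship between the three map types. The helper names below (datafile_tokens, positions_map_order, positions_pedfile_order) are hypothetical and are not part of DHSMAP. The sketch assumes that every locus in the pedfile is included in the analysis, as in datafile_ex, and it discards line boundaries, so it does not model the settings that must appear on their own lines (such as the results filenames).

```python
from typing import Dict, List, Sequence


def datafile_tokens(text: str) -> List[str]:
    """Drop everything from '#' to the end of each line, then split on whitespace."""
    tokens: List[str] = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    return tokens


def positions_map_order(distances: Sequence[float]) -> List[float]:
    """Map type 1 -> map type 2: cumulative positions, leftmost locus at 0.0."""
    positions = [0.0]
    for d in distances:
        positions.append(positions[-1] + d)
    return positions


def positions_pedfile_order(markers_in_map_order: Sequence[int],
                            positions: Sequence[float]) -> List[float]:
    """Map type 2 -> map type 3: list the same positions by pedfile locus number."""
    by_locus: Dict[int, float] = dict(zip(markers_in_map_order, positions))
    return [by_locus[locus] for locus in sorted(by_locus)]


if __name__ == "__main__":
    print(datafile_tokens("6 #number of loci\n2 1 3 4 5 6 #markers, IN MAP ORDER\n"))

    markers = [2, 1, 3, 4, 5, 6]           # markers in map order, from datafile_ex
    type2 = positions_map_order([0.2] * 5)  # map type 1 distances, in cM
    type3 = positions_pedfile_order(markers, type2)
    # Rounded only for display, to hide floating-point noise in the sums.
    print([round(p, 6) for p in type2])
    print([round(p, 6) for p in type3])
```

With the datafile_ex values, the rounded results agree with the positions quoted in the sample's comments for map types 2 and 3: locus 2 is leftmost on the map, so in pedfile order locus 1 sits at 0.2 and locus 2 at 0.0.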