phastCons data files are compressed files containing the
conservation scores that underlie the Conservation annotation
track and the phastCons table. For a detailed description
of the algorithm used to produce the scores, see the
Genome Browser description page associated with the
Conservation track.
File Format (assemblies released Nov. 2004 and later)
When uncompressed, the file contains a declaration line and
one column of data in wiggle table fixed-step format:
fixedStep chrom=scaffold_1 start=3462 step=1
0.0978
0.1588
0.1919
0.1948
0.1684
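Because the files are distributed in compressed form, it can help to look at the first few lines without decompressing to disk first. Below is a minimal Python sketch that assumes gzip compression. The helper name `read_first_lines` and the `example.data.gz` file name are hypothetical, so check the suffix of the file you actually downloaded.

```python
import gzip
import os
import tempfile


def read_first_lines(path, n=3):
    """Return the first n lines of a gzip-compressed text file, without newlines."""
    lines = []
    # "rt" opens the compressed file in text mode and decompresses on the fly.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if len(lines) >= n:
                break
            lines.append(line.rstrip("\n"))
    return lines


if __name__ == "__main__":
    # Build a small stand-in file (hypothetical name) using the example above.
    sample = "fixedStep chrom=scaffold_1 start=3462 step=1\n0.0978\n0.1588\n"
    path = os.path.join(tempfile.mkdtemp(), "example.data.gz")
    with gzip.open(path, "wt") as handle:
        handle.write(sample)
    print(read_first_lines(path))
```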
1. Declaration line: The declaration line specifies the
starting point of the data in
the assembly. It consists of the following fields:
- fixedStep -- keyword indicating the wiggle track
  format used to write the data. In fixed-step format, the
  data is single-column with a fixed interval between values.
- chrom -- chromosome or scaffold on which the first
  value is located.
- start -- position of the first value on the chromosome
  or scaffold specified by chrom. NOTE: Unlike most Genome
  Browser coordinates, these are one-based.
- step -- size of the interval (in bases) between values.
A new declaration line is inserted in the file when the
chrom value changes, when a gap is encountered
(requiring a new start value), or when the
step interval changes.
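The declaration fields can be pulled apart with simple string splitting. The Python sketch below assumes that every field after the fixedStep keyword is a single whitespace-separated key=value token, as in the example. The names `parse_declaration` and `zero_based_start` are hypothetical helpers. Because start is one-based, subtracting 1 gives the zero-based coordinate that most Genome Browser coordinates use.

```python
def parse_declaration(line):
    """Parse a fixedStep declaration line into a dict of its fields.

    Assumes whitespace-separated key=value tokens after the keyword, e.g.
    'fixedStep chrom=scaffold_1 start=3462 step=1'.
    """
    fields = line.split()
    if not fields or fields[0] != "fixedStep":
        raise ValueError(f"not a fixedStep declaration line: {line!r}")
    decl = {}
    for token in fields[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {token!r}")
        decl[key] = value
    # start (one-based) and step (in bases) are integers.
    decl["start"] = int(decl["start"])
    decl["step"] = int(decl["step"])
    return decl


def zero_based_start(decl):
    """Convert the one-based start field to a zero-based coordinate."""
    return decl["start"] - 1


if __name__ == "__main__":
    d = parse_declaration("fixedStep chrom=scaffold_1 start=3462 step=1")
    print(d, zero_based_start(d))
```

For the example declaration this yields chrom `scaffold_1`, start 3462, step 1, and a zero-based start of 3461.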
2. Data lines: The first data value below the declaration line
is the score for the position given by start. Each subsequent
value applies to the position step bases further along the
assembly; in the example above, step=1, so consecutive values
cover consecutive bases. The score is the posterior probability
that phastCons's phylogenetic hidden Markov model (HMM) is in its
most-conserved state at that base position.
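Put together, the value on the i-th data line after a declaration (counting from 0) sits at position `start + i*step`. Below is a minimal Python sketch of a reader that applies this rule and resets its state at each new declaration line. It assumes whitespace-separated key=value fields, and `iter_fixed_step` is a hypothetical name rather than part of any Genome Browser tool.

```python
import io


def iter_fixed_step(handle):
    """Yield (chrom, position, score) for each data line of a fixedStep stream.

    Positions are one-based, like the start field. Each declaration line
    resets chrom, position, and step.
    """
    chrom = position = step = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            fields = dict(tok.split("=", 1) for tok in line.split()[1:])
            chrom = fields["chrom"]
            position = int(fields["start"])
            step = int(fields["step"])
            continue
        if chrom is None:
            raise ValueError("data line appears before any fixedStep declaration")
        yield chrom, position, float(line)
        position += step  # the next value is step bases further along


if __name__ == "__main__":
    sample = io.StringIO(
        "fixedStep chrom=scaffold_1 start=3462 step=1\n0.0978\n0.1588\n0.1919\n"
    )
    for row in iter_fixed_step(sample):
        print(row)
```

With the example data, the three values land at positions 3462, 3463, and 3464 on scaffold_1.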
File Format (assemblies prior to Nov. 2004)
When uncompressed, the data file contains two columns:
294 0.0953
295 0.0948
296 0.0943
297 0.0936
298 0.0929
299 0.0921
Column #1 contains a one-based position coordinate.
Column #2 contains a score showing the posterior probability
that phastCons's phylogenetic hidden Markov model (HMM) is in
its most-conserved state at that base position.
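The older two-column layout needs no state between lines. Below is a minimal Python sketch with the hypothetical helper name `iter_legacy`. The two columns do not record a chromosome, so the sketch assumes the caller already knows which sequence a given file covers.

```python
import io


def iter_legacy(handle):
    """Yield (position, score) pairs from a two-column phastCons file.

    Position is one-based; score is the posterior probability described above.
    """
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        pos_text, score_text = line.split()
        yield int(pos_text), float(score_text)


if __name__ == "__main__":
    sample = io.StringIO("294 0.0953\n295 0.0948\n296 0.0943\n")
    print(list(iter_legacy(sample)))
```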
References for phastCons
Siepel, A. and Haussler, D. (2005). Phylogenetic hidden Markov
models. In R. Nielsen, ed., Statistical Methods in Molecular
Evolution, pp. 325-351. Springer, New York.
Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M.,
Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W.,
Richards, S., Weinstock, G.M., Wilson, R.K., Gibbs, R.A., Kent,
W.J., Miller, W., and Haussler, D. (2005). Evolutionarily
conserved elements in vertebrate, insect, worm, and yeast
genomes. Genome Res. 15, 1034-1050.
For a discussion of the methods used to calculate the
phastCons scores, see the description page for the hg17
Conservation track in the Genome Browser.