Directory structure

The complete work directory can be downloaded for each cluster. Its top level holds the results of the analysis carried out across all vertebrate sequences and all families, together with the structure chosen to represent the cluster.

The directory contains a subdirectory for each member of the cluster. Each of these holds two sets of results: one using mammalian sequences only, and the other using all vertebrates. Note that these results also include specialization scoring (obtained by comparison with the remaining members of the cluster), but the alignments are restricted to the member the subdirectory is named after.

Finally, note that the results of the mutual-best-hit search using Ensembl genomes (including the log file) can be found in both the "mammals" and "all_vertebrates" directories. To avoid duplicating the information available in the "mammals" directory, the search log in "all_vertebrates" refers only to vertebrates other than mammals.

Fig. 1. Directory structure of downloadable work directories.
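As a rough illustration of the layout described above, the following Python sketch lists the files found in the "mammals" and "all_vertebrates" directories of each member. It assumes that the member subdirectories sit directly under the downloaded work directory and that each one contains the two result directories under exactly those names. The function name list_member_results is hypothetical and is not part of the database or its tools.

```python
from pathlib import Path

# Directory names as quoted in the text; their exact placement inside each
# member subdirectory is an assumption of this sketch.
SEQUENCE_SETS = ("mammals", "all_vertebrates")


def list_member_results(work_dir):
    """Map (member, sequence set) to the sorted file names found there."""
    work_dir = Path(work_dir)
    results = {}
    # Assumption: every subdirectory at the top level is a cluster member.
    for member_dir in sorted(p for p in work_dir.iterdir() if p.is_dir()):
        for set_name in SEQUENCE_SETS:
            set_dir = member_dir / set_name
            if set_dir.is_dir():
                results[(member_dir.name, set_name)] = sorted(
                    f.name for f in set_dir.iterdir() if f.is_file()
                )
    return results
```

Calling list_member_results on the unpacked work directory returns a dictionary that can be printed or filtered by file extension (msf, afa, patchlog, xls). Members that lack one of the two directories are simply skipped.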

The "raw output" files include the alignment files (msf). These should perhaps be called the raw input files, but they *are* the output of something (Mafft, in this case). The rest are the raw output of the code called "hypercube" (can be found here), the engine behind the database. These files are all human-readable. They consist of the "patched" alignment in the aligned fasta (afa) format, the list of species used in patching (patchlog), and the table of scores, which is also available in color-coded form as a spreadsheet (xls file). The patching is needed because the Ensembl sequences often contain X's, which we "patch" by taking them to be the same as in the most complete sequence from some other species.
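Since the afa files are plain aligned fasta, they can be inspected with a few lines of code. The Python sketch below reads such an alignment from text and counts the X's in each sequence, which is one way to see how many positions are still unknown. It assumes the usual fasta layout (a ">" header line followed by one or more sequence lines). The function names read_afa and count_unknown are hypothetical and unrelated to hypercube itself.

```python
def read_afa(text):
    """Parse aligned fasta text into a dict of {sequence name: sequence}."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def count_unknown(sequences):
    """Count the X characters (unknown residues) in each sequence."""
    return {n: seq.upper().count("X") for n, seq in sequences.items()}


# Made-up example alignment, for illustration only.
example = ">human\nMKT-AXL\nQQ\n>mouse\nMKTXAXL\nQQ\n"
alignment = read_afa(example)
unknown = count_unknown(alignment)
```

For a real file, the text can be obtained with open(path).read() before passing it to read_afa.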