Cube-DB publications

Zong Hong Zhang, Kavitha Bharatham, Sharon M. Q. Chee, and Ivana Mihalek, Cube-DB: detection of functional divergence in human protein families, Nucleic Acids Res. 2012 January; 40(D1): D490–D494. Freely accessible on the publisher's site.

Bharatham K, Zhang ZH, Mihalek I (2011) Determinants, Discriminants, Conserved Residues - A Heuristic Approach to Detection of Functional Divergence in Protein Families. PLoS ONE 6(9): e24382. Freely accessible on the publisher's site.

Scores calculated by Cube-DB

The publication describing the specialization scoring method used in Cube-DB can be found here (please cite it if you find the results useful). In particular, the score is described in the supplementary document downloadable here. The specialization scores used in Cube-DB are linear combinations of the conservation in the protein group of interest and the overlap (or lack thereof) in amino acid choice with the other groups in the family.

The Cube-DB server reports two types of specialization scores. One score highlights discriminants: residues that are conserved within each of the protein subfamilies but differ between them (or, in practical terms, between as many of the groups as possible). The other score is geared toward finding determinants of one of the subfamilies: positions that are conserved in that group, irrespective of how they behave in the rest of the family.
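The difference between the two score types can be illustrated on a single alignment column. The sketch below is a toy only and is not the published Cube-DB formula: the function names, the use of the most-common-residue frequency as "conservation", the shared-frequency "overlap", and the equal weights in the linear combination are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequencies(residues):
    """Relative frequency of each residue type in one group's column."""
    counts = Counter(residues)
    total = len(residues)
    return {aa: n / total for aa, n in counts.items()}


def conservation(residues):
    """Toy conservation: frequency of the most common residue type."""
    return max(frequencies(residues).values())


def overlap(residues_a, residues_b):
    """Toy overlap: frequency mass shared by the two groups (0 to 1)."""
    fa, fb = frequencies(residues_a), frequencies(residues_b)
    return sum(min(fa[aa], fb[aa]) for aa in fa.keys() & fb.keys())


def determinant_like(groups, target):
    """Conservation in the target group minus its mean overlap with the rest."""
    others = [g for name, g in groups.items() if name != target]
    mean_overlap = sum(overlap(groups[target], g) for g in others) / len(others)
    return conservation(groups[target]) - mean_overlap


def discriminant_like(groups):
    """Mean conservation over all groups minus the mean pairwise overlap."""
    names = list(groups)
    mean_cons = sum(conservation(groups[n]) for n in names) / len(names)
    pairs = list(combinations(names, 2))
    mean_overlap = sum(overlap(groups[a], groups[b]) for a, b in pairs) / len(pairs)
    return mean_cons - mean_overlap


# Hypothetical columns: one string of residues per group.
conserved_but_different = {"group_1": "KKKK", "group_2": "EEEE", "group_3": "DDDD"}
specific_to_group_1 = {"group_1": "KKKK", "group_2": "AGST", "group_3": "LLVI"}

print(discriminant_like(conserved_but_different))
print(determinant_like(specific_to_group_1, "group_1"))
print(discriminant_like(specific_to_group_1))
```

In this toy, the first column scores highly as a discriminant, while the second scores highly as a determinant of group_1 but less well as a discriminant, because the other two groups are not conserved.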

The conservation scoring method used in Cube-DB is fully described here. Briefly, the score, termed "real valued trace", takes as the conservation score the sum of the average (information) entropies at each subdivision of the sequence similarity tree.
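As a rough sketch of the "sum of average entropies" idea, the following computes the Shannon entropy of a column within each group and sums the group averages over a list of subdivisions. The subdivisions are supplied by hand here; deriving them from an actual similarity tree, and any normalization or offset used by the published method, are outside this sketch. The function names and the toy data are assumptions.

```python
import math
from collections import Counter


def column_entropy(residues):
    """Shannon entropy (in bits) of the residue types in a column."""
    counts = Counter(residues)
    total = len(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def trace_like_score(column, subdivisions):
    """Sum, over subdivisions, of the mean entropy of the groups.

    column:        one residue per sequence, e.g. "AAGG"
    subdivisions:  list of partitions; each partition is a list of groups,
                   and each group is a list of sequence indices
    A lower value means a more conserved column in this sketch.
    """
    score = 0.0
    for groups in subdivisions:
        entropies = [column_entropy([column[i] for i in g]) for g in groups]
        score += sum(entropies) / len(entropies)
    return score


# Hypothetical tree over four sequences: the root, then a split into two pairs.
subdivisions = [
    [[0, 1, 2, 3]],
    [[0, 1], [2, 3]],
]

print(trace_like_score("AAAA", subdivisions))  # invariant column
print(trace_like_score("AAGG", subdivisions))  # varies only between the two pairs
print(trace_like_score("AGAG", subdivisions))  # varies within each pair as well
```

A column that varies only between branches of the tree accumulates less entropy than one that varies within every branch, which is what distinguishes this kind of score from a plain whole-column entropy.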

Each score is converted to a ranking, and the results are expressed in terms of the top fraction that a residue belongs to.
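One way to picture the conversion from raw scores to a "top fraction" is shown below. This is a minimal sketch: the handling of ties (tied positions share the same fraction) and the `higher_is_better` switch are assumptions, not a description of the server's exact procedure.

```python
def top_fraction(scores, higher_is_better=True):
    """For each position, the fraction of positions that score at least as well.

    A value of 0.25 means the position belongs to the top 25% of the ranking.
    """
    n = len(scores)
    fractions = []
    for s in scores:
        if higher_is_better:
            at_least_as_good = sum(1 for t in scores if t >= s)
        else:
            at_least_as_good = sum(1 for t in scores if t <= s)
        fractions.append(at_least_as_good / n)
    return fractions


raw = [0.9, 0.1, 0.5, 0.9]
print(top_fraction(raw))  # [0.5, 1.0, 0.75, 0.5]
```

Expressing scores this way makes columns from different score types comparable, since each value states only where the residue falls in the ranking.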

Orthologue selection in Cube-DB

In Cube-DB, orthologous sequences are selected from the full-genome vertebrate sequences deposited in the Ensembl database, using the mutual-best-hit (a.k.a. bidirectional best hit, BBH) strategy. For a recent review of BBH, see here.
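The mutual-best-hit criterion itself can be sketched in a few lines. The example assumes that pairwise similarity scores between the sequences of two genomes are already available (for instance from a sequence search run in both directions); the sequence names and score values are made up, and this is not the pipeline used to build Cube-DB.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) to a similarity score
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(forward, reverse):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical scores: human queries against mouse, and mouse against human.
human_vs_mouse = {
    ("humanA", "mouseA"): 950, ("humanA", "mouseB"): 400,
    ("humanB", "mouseB"): 700, ("humanC", "mouseA"): 300,
}
mouse_vs_human = {
    ("mouseA", "humanA"): 940, ("mouseA", "humanC"): 310,
    ("mouseB", "humanB"): 690, ("mouseB", "humanA"): 390,
}

print(mutual_best_hits(human_vs_mouse, mouse_vs_human))
# [('humanA', 'mouseA'), ('humanB', 'mouseB')]
```

Here humanC is left without an orthologue: its best hit is mouseA, but the best hit of mouseA is humanA, so the relationship is not mutual.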

Columns in the table and in the downloadable spreadsheet

* almt:   position in the overall alignment (can be downloaded from the right-side menu)
* gaps:   fraction of gaps in the column of the overall alignment
* overall:   conservation across all groups in the alignment, according to real valued trace
* group_1:   conservation in group named "group_1" using information entropy
* group_2:   ... the same for all groups in the cluster
* discr:   discriminants: the positions that are "conserved-but-different" across all groups
* dets_in_group_1:   determinants of the specificity of the group named "group_1" (more).
* dets_in_group_2:   ...
* for each group, the corresponding residue type and the sequential numbering in the human variant of the protein
* pdb_id: (when applicable) the numbering in the structure file
* pdb_aa: residue type in the structure file

Downloadable files

* score table in xls format:   see above for the column listing
* the overall alignment file:   alignment of all sequences in the cluster, in the aligned fasta (afa) format
* score file:   conservation and specialization scores in the plain text format
* pymol session for conservation and discriminants:   scores that refer to the group as a whole (conservation and discriminants) mapped on the structure, in the format readable by the PyMOL molecular viewer
* pymol session file for determinants in group_1:   determinants of specificity in the group named "group_1"
* chimera session file for conservation:   conservation scores mapped on the structure, in the format readable by the Chimera molecular viewer
* chimera session file for specialization
* complete work directory:   all of the above, as a zipped package
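For readers who want to work with the downloaded alignment directly, the following minimal sketch parses aligned FASTA text and recomputes the per-column gap fraction (the quantity reported in the "gaps" column). The sequence names and residues in the example are invented, and treating both "-" and "." as gap characters is an assumption.

```python
def read_afa(lines):
    """Parse aligned FASTA text lines into a dict of name -> aligned sequence."""
    sequences = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line)
    return {n: "".join(parts) for n, parts in sequences.items()}


def gap_fractions(alignment, gap_chars="-."):
    """Fraction of gap characters in each column of the alignment."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have the same length")
    return [sum(s[i] in gap_chars for s in seqs) / len(seqs) for i in range(length)]


# Made-up example; with a real file, pass the open file object instead:
#   with open("alignment.afa") as handle: alignment = read_afa(handle)
example = """\
>seq1
MK-L
V
>seq2
M--L
V
>seq3
MKAL
-
""".splitlines()

alignment = read_afa(example)
print(alignment)
print(gap_fractions(alignment))
```

The alignment position (the "almt" column) corresponds to the column index here, counted from 1 rather than from 0 if the table follows the usual convention; check this against the downloaded table before relying on it.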

Further tutorial-style help in understanding the available files can be found here.