Evolution of protein structure and function [EPSF]

Understanding and using the output of hypercube
(conservation and specialization scoring code)

This tutorial starts by contrasting orthologues from different groups of species. Bear with us; we'll get to paralogues.

1. Conservation of p53 residues across placental mammalian sequences.

The following refers to some sample files that can be found here.

In the attached tarball you can find the selection of sequences from placental mammals. I was surprised to see how much variation there is.

The sequences come in four files, which differ as follows:
placental_mammals.fasta --> just the sequences, in FASTA format
placental_mammals.afa --> the aligned sequences, in aligned FASTA format (you can view them with Seaview or your favorite alignment viewer)
placental_mammals.2ahiA.msf --> the aligned sequences plus the sequence contained in 2ahiA.pdb; this is the input for the conservation scoring program
placentals.patched.afa --> one of the outputs of the program. You will notice that some sequences contain Xs, which mark positions missing from the sequencing. At those positions the program tries to guess the residue from the nearest species; past experience shows that the error this introduces is smaller than the error from keeping the Xs. *.patched.afa shows you what the guesses look like. (If you, or some other user, come to believe that the guesses are too wild, the guessing can be scaled down or dropped entirely.)
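
If you want to play with the idea, here is a minimal Python sketch of such patching. It is not hypercube's actual code: it assumes the aligned sequences are given as a dict of equal-length strings, that "nearest species" means the sequence with the highest percent identity (ignoring gaps and Xs), and that each X is simply replaced by the residue found in that column of the nearest sequence that has a known residue there. The function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap or X."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-X" and y not in "-X"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def patch_unknowns(alignment: dict[str, str]) -> dict[str, str]:
    """Replace each X with the residue of the most similar sequence that has
    a known residue (not a gap, not an X) in the same column."""
    patched = {}
    for name, seq in alignment.items():
        # Rank the other sequences by identity to this one, most similar first.
        others = sorted(
            (other for other in alignment if other != name),
            key=lambda other: identity(seq, alignment[other]),
            reverse=True,
        )
        residues = list(seq)
        for col, res in enumerate(residues):
            if res != "X":
                continue
            # Take the guess from the nearest neighbour that actually has a residue here.
            for other in others:
                candidate = alignment[other][col]
                if candidate not in "-X":
                    residues[col] = candidate
                    break
        patched[name] = "".join(residues)
    return patched
```

Note that a position stays X only if no other sequence has a residue in that column; "scaling down" the guessing could mean, for example, patching only when the nearest neighbour is above some identity threshold.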

So these are the sequences on which we base the conservation score of p53 in placental mammals. The "species" file should give you an idea which species were used: each name in the alignment file consists of the first three characters of the first and of the second part of the species' scientific name (the genus and the species epithet).
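
The naming rule is easy to reproduce, for example to match alignment labels against the "species" file. A small sketch, assuming the labels are lower case (check against your own files; the function name is hypothetical):

```python
def species_code(scientific_name: str) -> str:
    """Alignment label for a binomial name: the first three letters of the genus
    followed by the first three letters of the species epithet (lower case assumed)."""
    genus, epithet = scientific_name.split()[:2]
    return (genus[:3] + epithet[:3]).lower()


print(species_code("Homo sapiens"))  # homsap
```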

OK so far?

The conservation score is shown in the spreadsheet, in columns C and D; the colorbar (white-black-red) is shown in column M. Why two columns? Column C expresses conservation as a top fraction (multiply by 100 to get percentages) of the whole alignment, while column D is computed with respect to human p53. (That is, "0.34" means "belonging to the top 34% most conserved positions in the whole alignment.") If this is confusing, just delete column C :)
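
To make the "top fraction" idea concrete, here is a small Python sketch. It assumes a raw per-column score where higher means more conserved (hypercube's own raw score and its exact tie handling may differ), and it assumes that "with respect to human p53" means the fraction is recomputed over only those columns where the human sequence has a residue. Watch how ties behave: every completely conserved column gets the same value, equal to the fraction of the alignment that is completely conserved.

```python
from bisect import bisect_left


def coverage(scores: list[float]) -> list[float]:
    """For each position, the fraction of positions at least as conserved as it.

    Assumes higher score = more conserved; 0.34 reads as 'in the top 34%'.
    """
    ranked = sorted(scores)
    n = len(ranked)
    # n - bisect_left(ranked, s) counts the positions whose score is >= s.
    return [(n - bisect_left(ranked, s)) / n for s in scores]


def coverage_on_reference(scores: list[float], reference: str) -> list[float]:
    """Same fraction, recomputed over the columns where the reference sequence
    (e.g. human p53, given as a gapped aligned string) has a residue."""
    kept = [s for s, res in zip(scores, reference) if res != "-"]
    return coverage(kept)
```

For example, with scores [1.0, 1.0, 0.5, 0.2, 1.0, 0.0] half of the columns are completely conserved, so none of them can do better than 0.5: the same effect that caps our placental alignment at 0.34.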

Columns G and H show "specificity." We need another group (in addition to placentals) for the question of specificity to start making sense, so we will tackle it in the next installment.

How come no position is better than top 34% conserved? Because 34% of the whole alignment is completely conserved, so we cannot say that any one of these positions is more favored than the others. (Not until we start taking the similarity between amino acid types into account, but that's another story, too.)

Columns I and J contain the residue types and numbering from the PDB file; nothing mysterious there, I hope.

Column K is a bit more interesting. It is not produced by the conservation scoring program; I put it there by hand. It shows the "contacts" that each residue makes with DNA and with the other monomer in the structure (is that a crystallographic artifact?). If it says AB_if, the contacts are between chains A and B in 2ahi; otherwise they are with DNA. noc, bb, and min_dist are, respectively: the total number of contacts (atoms within 5 Å) on the other chain, the number of backbone contacts among them, and the minimal distance (in Å) at which the other chain can be found. All tight contacts fall within our top 34% conserved residues. (Tight = many contacts, and not through the backbone.)
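
If you want to compute this kind of annotation yourself, here is a sketch in plain Python. The Atom class and the backbone atom names are assumptions (protein backbone as N, CA, C, O; DNA backbone as the sugar-phosphate atoms), and noc is counted here as the number of distinct atoms on the partner chain(s) within 5 Å of any atom of the residue, which may not be exactly how column K was made. Parsing the PDB file is left out; the coordinates in any test are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    chain: str
    resnum: int
    name: str
    xyz: tuple[float, float, float]


# Assumed backbone definitions: protein main chain plus DNA sugar-phosphate atoms.
PROTEIN_BACKBONE = {"N", "CA", "C", "O"}
DNA_BACKBONE = {"P", "OP1", "OP2", "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "C1'"}
BACKBONE = PROTEIN_BACKBONE | DNA_BACKBONE


def residue_contacts(atoms, chain, resnum, partner_chains, cutoff=5.0):
    """Return (noc, bb, min_dist) for one residue against the partner chain(s)."""
    mine = [a for a in atoms if a.chain == chain and a.resnum == resnum]
    if not mine:
        raise ValueError(f"no atoms for residue {chain}{resnum}")
    partners = [a for a in atoms if a.chain in partner_chains]
    contacting = set()
    min_dist = math.inf
    for p in partners:
        # Distance from this partner atom to the closest atom of our residue.
        d = min(math.dist(m.xyz, p.xyz) for m in mine)
        min_dist = min(min_dist, d)
        if d <= cutoff:
            contacting.add(p)
    noc = len(contacting)
    bb = sum(1 for p in contacting if p.name in BACKBONE)
    return noc, bb, min_dist
```

Where exactly to draw the line for "tight" (how many contacts, how few of them through the backbone) is a judgment call left to you.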

Column K is an example of annotation. Can you add some more "per-residue" annotation of this kind?

And finally, there is the PyMOL session "placentals.pse", showing the whole conservation business mapped onto the structure, using the same coloring scheme as the spreadsheet. Since you know how to use PyMOL, you can add your annotation there too.
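
One way to get your own annotation into the session is to turn it into a small PyMOL script. This Python sketch writes one named selection per annotation label and shows it as sticks, so it stands out against the conservation coloring; the label names and residue numbers are made up for illustration, and chain A is assumed. Save the output as, say, annotation.pml and run it from the PyMOL command line with @annotation.pml.

```python
def annotation_pml(annotations: dict[str, list[int]], chain: str = "A") -> str:
    """Turn per-residue annotations into PyMOL commands: one named selection
    per label (it shows up in the object panel), displayed as sticks."""
    lines = []
    for label, residues in annotations.items():
        resi = "+".join(str(r) for r in sorted(residues))
        lines.append(f"select {label}, chain {chain} and resi {resi}")
        lines.append(f"show sticks, {label}")
    lines.append("deselect")  # clear the pink selection markers
    return "\n".join(lines) + "\n"


# Hypothetical label and residue numbers, just to show the shape of the output.
with open("annotation.pml", "w") as fh:
    fh.write(annotation_pml({"dna_tight": [273, 248]}))
```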

2. Conservation of p53 residues across all mammals. Positions specific for placental mammals.

Let's check this out: I have added some non-placental mammals to the alignment.

If you open the PyMOL session (mammals.pse) from our tarball du jour, you will see that some blush has appeared in our visualization of conservation across species. That is because, for the first time, we have added significant variability: now only about 20% of positions are completely conserved, so we can distinguish between the top 20% most conserved positions and everybody else. Before, we couldn't: with placental mammals only, we could only distinguish between the top 34% and everybody else. (To compare conservation across all mammals with conservation in placentals only, click the "conservation" button to turn it off, and then click "cns_placentals". Then go back to "conservation" to continue following the story below.)

There are no surprises, though: the most conserved residues are still the ones contacting DNA. We can even begin to see that residues whose side chains reach into the DNA grooves are more conserved than the ones contacting the DNA backbone.

The new thing here (although not yet very reliable; see below) is that we can try to see which positions distinguish placental mammals from marsupials and monotremes. If you turn off the "conservation" button and then click on "dts_placentals", you will see our first attempt at finding positions that distinguish (or "determine") the placental sequences. They are colored orange: the more intense the orange, the more certain we are that the position is specific for placentals, as opposed to the monotremes and marsupials here.

The opposite of specific is, well, non-specific. Note, though, that non-specific positions can be either conserved across all mammals or variable in all of them. In either case we cannot say that placentals have a unique preference at that position. A cluster or two of such positions can be seen on the side of the dimer(s) opposite the DNA-binding side.

However, at the moment we cannot be terribly sure about the reliability of these findings, because the number of non-placental sequences is really small (only 3). In the next installment we'll try contrasting mammals with other vertebrates; statistically, that should be a healthier situation.

Finally, note that there is also a button called "spcificity" in the session you have opened. These are special determinants: they are conserved not only in placentals but in the other mammals too; however, in monotremes and marsupials we see a different residue type than in placentals.
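
To make these categories concrete, here is a toy Python classifier for a single alignment column. It is not what hypercube computes (its specificity score is graded, hence the varying intensity of the orange); it just applies the definitions above literally to the residue types seen in placentals and in the other mammals, ignoring gaps. The category names are hypothetical.

```python
GAP = "-"


def classify_column(group: list[str], others: list[str]) -> str:
    """Toy classification of one alignment column.

    group:  residues of the group of interest (e.g. placentals) in this column
    others: residues of the remaining sequences (e.g. marsupials, monotremes)
    """
    g = set(group) - {GAP}
    o = set(others) - {GAP}
    if not g or not o:
        return "undetermined"  # one side is all gaps
    if len(g) == 1 and len(o) == 1:
        # Conserved within each side: same type everywhere, or a switch of type.
        return "conserved in all" if g == o else "specificity determinant"
    if len(g) == 1 and g.isdisjoint(o):
        # Conserved in the group, and that type never shows up outside it.
        return "group-specific"
    # Variable inside the group, or its residue type also occurs outside it.
    return "non-specific"
```

With only 3 non-placental sequences, a column can look conserved outside placentals purely by chance, which is exactly the reliability problem mentioned above.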

How was this?