PLOS ONE | DOI:10.1371/journal.pone.0151961 | March 22 | Evolutionary Dynamics of Sequence, Structure, and Phosphorylation in the p53, p63, and p73 Paralogs

Methods

Sequence retrieval

Three datasets were constructed: (i) the p53 protein family at the whole protein level in vertebrates, (ii) the p53 protein family at the nucleotide level in vertebrates, and (iii) the p53 protein family at the DNA-binding domain (DBD) level in a representative set of vertebrate and non-vertebrate sequences. For (i), NCBI BLAST [57] was performed using the blastp algorithm with the human p53 protein sequence (NCBI reference sequence: NP_000537.3) against vertebrates in the RefSeq database [58]. To minimize redundancy, only the longest sequence from the same gene was chosen as the representative. Partial or much longer proteins were removed to maintain a high-quality multiple sequence alignment. In some instances, sequences from key species missing in the RefSeq database were instead identified by BLAST against the nr database. For (ii), the nucleotide sequences corresponding to the amino acid sequences in (i) were retrieved from NCBI. For the final dataset (iii), NCBI BLAST was performed using the blastp algorithm with the human p53 protein DNA-binding domain against the RefSeq database, excluding vertebrates, to obtain non-vertebrate sequences. Partial proteins with an incomplete p53 DBD were removed to maintain a high-quality multiple sequence alignment. To minimize redundancy and reduce the size of the dataset, only a selection of sequences was used. For each major vertebrate taxonomic group, a representative organism with sequence information for all three paralogs in the p53 protein family was selected from (i). Vertebrate organisms included in (iii) were: Homo sapiens, Bos taurus, Gallus gallus, Anolis carolinensis, Xenopus tropicalis, Latimeria chalumnae, Takifugu rubripes, Danio rerio, and Callorhinchus milii.
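The redundancy step described above (keeping only the longest sequence per gene) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `ProteinRecord` type, the `longest_per_gene` function, the placeholder accessions, and the toy sequences are all hypothetical. The additional removal of partial or much longer proteins is not implemented, because the text does not state the criteria used.

```python
from typing import NamedTuple


class ProteinRecord(NamedTuple):
    """Hypothetical container for one BLAST hit (not an NCBI data type)."""
    accession: str  # placeholder identifier
    gene: str       # gene shared by all isoforms of that gene
    species: str
    sequence: str


def longest_per_gene(records):
    """Keep one representative per (species, gene): the longest sequence."""
    best = {}
    for rec in records:
        key = (rec.species, rec.gene)
        current = best.get(key)
        # A strictly longer sequence wins; ties keep the first record seen.
        if current is None or len(rec.sequence) > len(current.sequence):
            best[key] = rec
    return list(best.values())


# Toy input: made-up accessions and sequences, for illustration only.
records = [
    ProteinRecord("iso_a", "TP53", "Homo sapiens", "MEEPQSDPSV"),
    ProteinRecord("iso_b", "TP53", "Homo sapiens", "MEEPQS"),
    ProteinRecord("iso_c", "TP63", "Homo sapiens", "MNFETSRCAT"),
]
representatives = longest_per_gene(records)
print([r.accession for r in representatives])  # ['iso_a', 'iso_c']
```

Grouping on the (species, gene) pair rather than on the gene name alone keeps orthologs from different species apart, which is what a per-gene representative requires in a multi-species dataset.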
Sequence identifiers for all vertebrate sequences are given in S1 Table, and protein identifiers are included in the phylogenetic trees that show sequence names.

Phylogenetic reconstruction

Sequences for datasets (i) and (iii) were aligned with MAFFT v7.123 [59] using the L-INS-i algorithm for a maximum of 1000 iterations. Sequences in dataset (ii) were aligned using TranslatorX [60] to map corresponding codons to the amino acid alignment from (i). Phylogenetic trees for all datasets were constructed using MrBayes v3.2.2 [61]. For the protein-based phylogenies [(i) and (iii)], Bayesian MCMC analysis was performed using a mixed amino acid model with gamma-distributed rate variation among sites. The nucleotide-based phylogeny (ii) was estimated with Bayesian MCMC analysis using a GTR model with gamma-distributed rate variation among sites. For all trees, MrBayes ran two simultaneous analyses (each with four chains: three heated and one cold) for 15 million generations with a sampling frequency of 100 generations. For dataset (i) the best tree was constructed with TBR branch swaps enabled, while for (ii) and (iii) the best trees were constructed with TBR branch swaps disabled. The final average standard deviations of the split frequencies were 0.0060 (max. s.d. 0.051) for dataset (i), 0.0053 (max. s.d. 0.092) for dataset (ii), and 0.0023 (max. s.d. 0.016) for dataset (iii). Consensus trees were built with the default burn-in phase (discarding the first 25% of trees) using the 50% majority rule. The tree from the third dataset was rooted on a branch containing Monosiga brevicollis and Salp.
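To make the MCMC bookkeeping in the phylogenetic reconstruction concrete, the following Python sketch works through the sample counts implied by the stated settings and shows, on toy data, how a burn-in fraction, the average standard deviation of split frequencies, and a 50% majority-rule selection relate to one another. It is an illustration under stated assumptions, not MrBayes' own implementation: all function names are hypothetical, trees are reduced to sets of clades, the sample standard deviation across runs is assumed, and any filtering of rare splits that MrBayes may apply is ignored. The sample count also ignores the starting state of each run.

```python
from collections import Counter
from statistics import mean, stdev

# Bookkeeping from the settings quoted in the text.
samples_per_run = 15_000_000 // 100                          # 150000 sampled trees
kept_per_run = samples_per_run - int(samples_per_run * 0.25)  # 112500 after burn-in


def discard_burn_in(samples, burn_in_fraction=0.25):
    """Drop the first `burn_in_fraction` of the sampled trees."""
    return samples[int(len(samples) * burn_in_fraction):]


def split_frequencies(trees):
    """Fraction of trees containing each split (each tree is a set of clades)."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: n / len(trees) for split, n in counts.items()}


def average_sd_of_split_frequencies(runs):
    """Mean over splits of the across-run standard deviation (assumed estimator)."""
    freqs = [split_frequencies(run) for run in runs]
    all_splits = set().union(*freqs)
    return mean(stdev([f.get(s, 0.0) for f in freqs]) for s in all_splits)


def majority_rule_splits(runs, threshold=0.5):
    """Splits present in more than `threshold` of the pooled post-burn-in trees."""
    pooled = [tree for run in runs for tree in run]
    return {s for s, f in split_frequencies(pooled).items() if f > threshold}


# Toy data: two runs of eight sampled trees, each tree holding a single clade.
AB, AC = frozenset("AB"), frozenset("AC")
run1 = [{AC}, {AC}, {AB}, {AB}, {AB}, {AC}, {AB}, {AB}]
run2 = [{AC}, {AB}, {AB}, {AB}, {AC}, {AB}, {AC}, {AB}]
runs = [discard_burn_in(run1), discard_burn_in(run2)]

asdsf = average_sd_of_split_frequencies(runs)
consensus = majority_rule_splits(runs)
print(samples_per_run, kept_per_run, round(asdsf, 4), consensus == {AB})
```

In the toy data the two runs disagree noticeably, so the diagnostic is large (about 0.118); the values reported in the text (0.0023 to 0.0060) indicate that the two independent runs sampled nearly identical split frequencies.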


Author: HMTase- hmtase