

Sequence alignment was initiated with a pairwise similarity measure (MacVector 4.14; Needleman and Wunsch, 1970) and was improved by individual discretion (see below). Sequences were fitted to a map of secondary structure to identify complementary positions (e.g., Kjer, 1995). In so doing, discrepancies between opposite strands resulting from “compressions” (i.e., bases missing on one strand but not the other) were discovered and resolved (Fig. …). The mapping of sequences onto structural models also served to monitor the possible existence of nuclear pseudogenes of mtDNA sequences (Fukuda et al., 1985). The hypothesis of an endosymbiont origin of mitochondria predicts the existence of nuclear copies of mitochondrial genes because the mitochondrial genomes themselves are depauperate in housekeeping genes (Gray et al., 1984). Inasmuch as nuclear pseudogenes are released from selective constraints, loss of conserved binding motifs and stem complementarity would be conspicuous in nuclear copies of mitochondrial rDNA. We detected no such instances.

FIGURE 5.5. Autoradiograph of sequencing gel showing a common sequencing artifact in mitochondrial 12S rDNA, domain III, stem 32 (Eurypyga helias shown). Right: Double loading of H strand, reverse complemented; arrows indicate G and C bases not evident on opposite strands.

Further improvement of alignment was made according to the principle of interactive phylogenetic weighting (Feng and Doolittle, 1987; Hein, 1990; Konings et al., 1987; Lake, 1991; Mindell, 1991; Thorne and Kishino, 1992). Regions of variable length were subjected to successive bouts of phylogeny reconstruction, separately from the complete data set, using maximum parsimony following alterations of alignment (Section II,A,6). We wanted to determine how “badly” (i.e., counter to available phylogenetic information) alignments could be contrived before traditionally recognized monophyletic families (i.e., Gruidae, Rallidae, and Heliornithidae) no longer associated with themselves in phylogeny reconstruction. Some effect of variable alignment was observed, but most often alignment had little or no consequence on phylogeny reconstruction in this study.

Schnapp, in Comprehensive Medicinal Chemistry II, 2007
3.19.3.2 Sequence Alignments and Analysis

Sequence alignments of any protein of interest with any related proteins with a known structure can help to predict secondary structure elements: hydrophobic and hydrophilic parts of the protein surface or stabilizing disulfide bonds. These items of information are necessary for plotting length and mutation planning. Strongly hydrophilic areas on the protein surface should be avoided, as well as the destruction of intramolecular contacts in α-helices or β-sheets caused by choosing cloning borders incorrectly. Inserting point mutations can help to increase solubility.

Determination of where in the protein sequence solubility patches and orthologs of increased solubility are to be found may improve expression success.

For structural studies on membrane proteins and multidomain complexes, concentration on one or two domains and extramembranal areas is useful and facilitates crystallization.

Yun Zheng, in Computational Non-coding RNA Biology, 2019
1.3.10 SAMTools and BCFTools

The Sequence Alignment/Map (SAM) format is a generic format for storing large nucleotide sequence alignments. The SAM format has become the de facto standard format for storing large alignment results because it has several advantages: it is easy to understand, flexible enough to store various types of alignment information, and compact in size.

SAMTools is a toolbox of programs for manipulating alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format. Make sure that samtools has been installed and added to the PATH environment variable in your Linux environment.

BCFTools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart, the Binary Call Format (BCF). Make sure that bcftools has been installed and added to the PATH environment variable in your Linux environment.
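One way to confirm that samtools and bcftools are reachable through the PATH variable, as the excerpt asks, is sketched below in Python. This is only an illustrative check, not a procedure from the source: the helper name `find_tools` is hypothetical, and the sketch assumes the executables are named `samtools` and `bcftools`.

```python
import shutil


def find_tools(names=("samtools", "bcftools")):
    """Return a dict mapping each tool name to its resolved path, or None.

    shutil.which searches the directories listed in the PATH environment
    variable, which is the same lookup the shell performs.
    `find_tools` is a hypothetical helper name used for illustration.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        # A value of None means the executable was not found on PATH.
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If either tool is reported as not found, its installation directory probably still needs to be appended to PATH (for example in the shell startup file) before the commands can be used.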

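To illustrate why the SAM format is described as easy to understand, the sketch below splits one alignment line into its eleven mandatory tab-separated fields. The record shown is made up for illustration, and `SamRecord`, `parse_sam_line`, and `is_unmapped` are hypothetical names; in practice a dedicated library or samtools itself would normally be used.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM alignment fields plus optional tags."""
    qname: str   # query (read) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)
    tags: Tuple[str, ...]  # optional TAG:TYPE:VALUE fields, left unparsed


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not an '@' header line) of a SAM file."""
    if line.startswith("@"):
        raise ValueError("header lines are not alignment records")
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(f)}")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], tuple(f[11:]))


def is_unmapped(record: SamRecord) -> bool:
    """Bit 0x4 of FLAG marks a read that is unmapped."""
    return bool(record.flag & 0x4)


if __name__ == "__main__":
    # Made-up example record, not taken from any real data set.
    example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0"
    rec = parse_sam_line(example)
    print(rec.rname, rec.pos, rec.cigar, is_unmapped(rec))
```

Keeping the optional tags as raw strings is a deliberate simplification; typed decoding of tags is left out of this sketch.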