Using gene map science to evaluate the genetic map and eliminate disease

Genetic News

Eukaryotic genomes are rich in transcription units encoding "long noncoding RNAs" (lncRNAs). The purpose of all this transcription is unclear since most lncRNAs are quickly targeted for destruction during synthesis or shortly thereafter. As debates continue over the functional significance of many specific lncRNAs, support grows for the notion that the act of transcription rather than the RNA product itself is functionally important in many cases. Indeed, this alternative mechanism might better explain how low-abundance lncRNAs transcribed from noncoding DNA function in organisms. Here, we highlight some of the recently emerging features that distinguish coding from noncoding transcription and discuss how these differences might have important implications for the functional consequences of noncoding transcription.

A fundamental tenet of inheritance in sexually reproducing organisms such as humans and laboratory mice is that gametes combine randomly at fertilization, thereby ensuring a balanced and statistically predictable representation of inherited variants in each generation. This principle is encapsulated in Mendel’s First Law. But exceptions are known. With transmission ratio distortion, particular alleles are preferentially transmitted to offspring. Preferential transmission usually occurs in one sex but not both, and is not known to require interactions between gametes at fertilization. A reanalysis of our published work in mice and of data in other published reports revealed instances in which any of 12 mutant genes biases fertilization, yielding either too many or too few heterozygotes and homozygotes, depending on the mutant gene and on dietary conditions. Although such deviations are usually attributed to embryonic lethality of the underrepresented genotypes, the evidence is more consistent with genetically determined preferences for specific combinations of egg and sperm at fertilization that result in genotype bias without embryo loss. This unexpected discovery of genetically biased fertilization could yield insights into the molecular and cellular interactions between sperm and egg at fertilization, with implications for our understanding of inheritance, reproduction, population genetics, and medical genetics.
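
Deviations of the kind described above are typically detected by comparing genotype counts from a heterozygous intercross with the 1:2:1 ratio expected under Mendel's First Law. The sketch below is a generic chi-square goodness-of-fit test, not the reanalysis method used in the study; the function name and any counts passed to it are hypothetical. It relies on the fact that the chi-square distribution with 2 degrees of freedom has survival function exp(-x/2), so no statistics library is needed.

```python
import math


def mendelian_chi_square(n_wt: int, n_het: int, n_hom: int) -> tuple:
    """Chi-square goodness-of-fit test of genotype counts from a heterozygous
    intercross against the 1:2:1 Mendelian expectation. Returns (statistic, p_value)."""
    observed = (n_wt, n_het, n_hom)
    total = sum(observed)
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes give 2 degrees of freedom; for df = 2 the
    # chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value
```

For example, hypothetical counts of 40, 40, and 20 give a statistic of 12.0 and p ≈ 0.0025. A small p-value only shows that the ratio departs from 1:2:1; as argued above, it does not by itself distinguish embryonic lethality from biased fertilization.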

Analysis of gene function in complex organisms relies extensively on tools to detect the cellular and subcellular localization of gene products, especially proteins. Typically, immunostaining with antibodies provides these data. However, because of cost, time, and labor limitations, generating specific antibodies against all proteins of a complex organism is not feasible. Furthermore, antibodies do not enable live imaging studies of protein dynamics. Hence, tagging genes with standardized immunoepitopes or with fluorescent tags that permit live imaging has become popular. Importantly, tagging genes within large genomic clones or at their endogenous loci often faithfully reports the expression, subcellular localization, and dynamics of the encoded protein. Moreover, these tagging approaches enable elegant protein-removal strategies and standardized visualization protocols, and they permit protein interaction studies using mass spectrometry. Here, we summarize available genomic resources and techniques to tag genes and discuss relevant applications that are rarely, if at all, possible with antibodies.

Lipid and carbohydrate metabolism are highly conserved processes that affect nearly all aspects of organismal biology. Caenorhabditis elegans eats bacteria, which consist of lipids, carbohydrates, and proteins that are broken down during digestion into fatty acids, simple sugars, and amino acid precursors. With these nutrients, C. elegans synthesizes a wide range of metabolites that are required for development and behavior. In this review, we outline lipid and carbohydrate structures as well as biosynthesis and breakdown pathways that have been characterized in C. elegans. We bring attention to functional studies using mutant strains that reveal physiological roles for specific lipids and carbohydrates during development, aging, and adaptation to changing environmental conditions.

Mutants remain a powerful means for dissecting gene function in model organisms such as Caenorhabditis elegans. Massively parallel sequencing has simplified the detection of variants after mutagenesis, but determining precisely which change is responsible for a phenotypic perturbation remains a key step. Genetic mapping paradigms in C. elegans rely on bulk segregant populations produced by crosses with the problematic Hawaiian wild isolate, and on an excess of redundant information from whole-genome sequencing (WGS). To increase the repertoire of available mutants and to simplify identification of the causal change, we performed WGS on 173 temperature-sensitive (TS) lethal mutants and devised a novel mapping method. The mapping method uses molecular inversion probes (MIP-MAP) in a targeted sequencing approach to genetic mapping, and replaces the Hawaiian strain with a Million Mutation Project strain with high genomic and phenotypic similarity to the laboratory wild-type strain N2. We validated MIP-MAP on a subset of the TS mutants using a competitive selection approach to produce TS candidate mapping intervals with a mean size < 3 Mb. MIP-MAP successfully uses a non-Hawaiian mapping strain, and its multiplexed libraries are sequenced at a fraction of the cost of WGS mapping approaches. Our mapping results suggest that the collection of TS mutants contains a diverse library of TS alleles for genes essential to development and reproduction. MIP-MAP is a robust method to genetically map mutations in both viable and essential genes and should be adaptable to other organisms. It may also simplify tracking of individual genotypes within population mixtures.
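
In bulk segregant mapping, variants from the mapping strain are depleted near the causal mutation, because selected homozygous mutants carry only the mutagenized parental background there. A simplified sketch of locating that depleted interval from per-site mapping-strain allele frequencies (such as those obtained by targeted probe sequencing) follows. It is a generic illustration, not the published MIP-MAP pipeline; the function name and the 0.1 threshold are assumptions, and the interval is bounded by the innermost depleted probes rather than the flanking undepleted ones.

```python
from typing import Optional, Sequence, Tuple


def candidate_interval(
    positions: Sequence[int],
    mapping_freq: Sequence[float],
    threshold: float = 0.1,
) -> Optional[Tuple[int, int]]:
    """Return (start_bp, end_bp) of the widest run of consecutive probe sites whose
    mapping-strain allele frequency is <= threshold, or None if no site qualifies.

    positions must be sorted and aligned with mapping_freq (one entry per probe site).
    """
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    for i, freq in enumerate(mapping_freq):
        if freq <= threshold:
            if run_start is None:
                run_start = i  # a new depleted run begins here
            span = (positions[run_start], positions[i])
            if best is None or span[1] - span[0] > best[1] - best[0]:
                best = span
        else:
            run_start = None  # run broken by a site where the mapping allele is present
    return best
```

In practice, the threshold would need to reflect sequencing depth and how strongly selection purged the mapping-strain alleles.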

One difficulty when identifying alternative splicing (AS) events in plants is distinguishing functional AS from splicing noise. One way to add confidence to the validity of a splice isoform is to observe that it is conserved across evolutionarily related species. We use a high-throughput method to identify junction-based conserved AS events from RNA-Seq data across nine plant species, including five grass monocots (maize, sorghum, rice, Brachypodium, and foxtail millet), plus two nongrass monocots (banana and African oil palm), the eudicot Arabidopsis, and the basal angiosperm Amborella. In total, 9804 AS events were found to be conserved between two or more species studied. In grasses, which share large regions of conserved synteny, the frequency of conserved AS events in genes within conserved synteny blocks is twice that observed for genes outside these blocks. In the plant-specific RS and RS2Z subfamilies of the serine/arginine (SR) splice-factor proteins, we observe both conservation and divergence of AS events after the whole genome duplication in maize. In addition, plant-specific RS and RS2Z splice-factor subfamilies are highly connected with R2R3-MYB in STRING functional protein association networks built using genes exhibiting conserved AS. Furthermore, we discovered that functional protein association networks constructed around genes harboring conserved AS events are enriched for phosphatases, kinases, and ubiquitylation genes, which suggests that AS may participate in regulating signaling pathways. These data lay the foundation for identifying and studying conserved AS events in the monocots, particularly across grass species, and this conserved AS resource identifies an additional layer between genotype and phenotype that may impact future crop improvement efforts.
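
Junction-based detection of alternative splicing starts from the splice junctions (intron coordinates) supported by RNA-Seq reads. As a minimal sketch, assuming plus-strand coordinates in which an intron's start is the donor and its end the acceptor, alternative donor and acceptor events can be found by grouping junctions that share one splice site but differ at the other. The function is hypothetical, covers only these two event types (not exon skipping or intron retention), and does not perform the cross-species conservation step described above.

```python
from collections import defaultdict


def alternative_splice_sites(junctions):
    """junctions: iterable of (donor, acceptor) intron coordinates, plus strand assumed.

    Returns two dicts:
      alt_donor:    acceptor -> sorted donors, for acceptors used with more than one donor
      alt_acceptor: donor -> sorted acceptors, for donors used with more than one acceptor
    """
    by_acceptor = defaultdict(set)
    by_donor = defaultdict(set)
    for donor, acceptor in junctions:
        by_acceptor[acceptor].add(donor)
        by_donor[donor].add(acceptor)
    alt_donor = {acc: sorted(d) for acc, d in by_acceptor.items() if len(d) > 1}
    alt_acceptor = {don: sorted(a) for don, a in by_donor.items() if len(a) > 1}
    return alt_donor, alt_acceptor
```

Conservation could then be assessed by asking whether orthologous genes in other species show an event of the same type, which requires an ortholog mapping not shown here.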

Mendelian randomization is the use of genetic variants as instrumental variables to estimate causal effects of risk factors on outcomes. The total causal effect of a risk factor is the change in the outcome resulting from intervening on the risk factor. This total causal effect may potentially encompass multiple mediating mechanisms. For a proposed mediator, the direct effect of the risk factor is the change in the outcome resulting from a change in the risk factor, keeping the mediator constant. A difference between the total effect and the direct effect indicates that the causal pathway from the risk factor to the outcome acts at least in part via the mediator (an indirect effect). Here, we show that Mendelian randomization estimates of total and direct effects can be obtained using summarized data on genetic associations with the risk factor, mediator, and outcome, potentially from different data sources. We perform simulations to test the validity of this approach when there is unmeasured confounding and/or bidirectional effects between the risk factor and mediator. We illustrate this method using the relationship between age at menarche and risk of breast cancer, with body mass index (BMI) as a potential mediator. We show an inverse direct causal effect of age at menarche on risk of breast cancer (independent of BMI), and a positive indirect effect via BMI. In conclusion, multivariable Mendelian randomization using summarized genetic data provides a rapid and accessible analytic strategy that can be undertaken using publicly available data to better understand causal mechanisms.
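
With summarized data, a common way to obtain these estimates is inverse-variance weighted (IVW) regression of the variants' associations with the outcome on their associations with the risk factor; for the direct effect, a multivariable version adds the variants' associations with the mediator as a second regressor. The sketch below assumes that estimator with first-order weights (1/se² of the outcome associations). The function names are illustrative, and the sketch omits standard errors and pleiotropy-robust variants.

```python
import numpy as np


def ivw_total(beta_x, beta_y, se_y):
    """IVW estimate of the total causal effect of risk factor X on outcome Y:
    weighted regression of beta_y on beta_x through the origin, weights 1/se_y^2."""
    bx, by = np.asarray(beta_x, float), np.asarray(beta_y, float)
    w = 1.0 / np.asarray(se_y, float) ** 2
    return float(np.sum(w * bx * by) / np.sum(w * bx ** 2))


def mvmr_direct(beta_x, beta_m, beta_y, se_y):
    """Multivariable IVW: regress beta_y jointly on beta_x and beta_m (no intercept).
    Returns [direct effect of X on Y, effect of mediator M on Y]."""
    X = np.column_stack([beta_x, beta_m]).astype(float)
    sw = 1.0 / np.asarray(se_y, float)  # square roots of the IVW weights 1/se^2
    coef, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(beta_y, float) * sw, rcond=None)
    return coef
```

The indirect effect via the mediator can then be estimated as the total effect minus the direct effect, that is, `ivw_total(...) - mvmr_direct(...)[0]`.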

Today, genomic prediction (GP) is an established technology in plant and animal breeding programs. Current standard methods are based purely on statistical considerations and do not make use of the abundant biological knowledge that is readily available from public databases. Major questions that must be answered before biological prior information can be used routinely in GP approaches are which types of information can be used, and at which points they can be incorporated into prediction methods. In this study, we propose a novel strategy to incorporate gene annotation into GP of complex phenotypes by defining haploblocks according to gene positions. Haplotype effects are then modeled as categorical or as numerical allele dosage variables. The underlying concept of this approach is to build the statistical model on variables representing the biologically functional units. We evaluate the new methods with data from a heterogeneous stock mouse population, the Drosophila Genetic Reference Panel (DGRP), and a rice breeding population from the Rice Diversity Panel. Our results show that using gene annotation to define haploblocks often leads to predictive ability comparable to, and for some traits higher than, that of SNP-based models or of haplotype models that do not use gene annotation information. Modeling gene interaction effects can further improve predictive ability. We also illustrate that additionally using markers not mapped to any gene, in a second, separate relatedness matrix, in many cases does not lead to a relevant further increase in predictive ability when the first matrix is based on haploblocks defined with gene annotation data, suggesting that intergenic markers provide only redundant information in the data sets considered. Therefore, gene annotation information seems appropriate for capturing the importance of DNA segments. Finally, we discuss the effects of gene annotation quality, marker density, and linkage disequilibrium on the performance of the new methods. To our knowledge, this is the first work that incorporates epistatic interaction or gene annotation into haplotype-based prediction approaches.
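
A minimal sketch of defining haploblocks from gene positions and encoding haplotype alleles categorically (as per-individual counts of each distinct haplotype within a gene) is shown below. It assumes phased 0/1 haplotypes and gene intervals given as (start, end) coordinates; the function names are hypothetical, and the column-centered, diagonal-normalized relationship matrix is one common choice, not necessarily the one used in the study. Markers outside all gene intervals are simply ignored here.

```python
import numpy as np


def haploblock_dosages(hap1, hap2, snp_pos, genes):
    """Categorical haplotype encoding with gene-defined haploblocks.

    hap1, hap2: (n_individuals, n_snps) numpy arrays of phased 0/1 alleles.
    snp_pos:    (n_snps,) numpy array of marker positions.
    genes:      list of (start, end) gene intervals; each defines one haploblock.
    Returns an (n_individuals, n_haplotype_alleles) matrix counting how many
    copies (0, 1 or 2) of each distinct within-gene haplotype an individual carries.
    """
    columns = []
    for start, end in genes:
        idx = np.where((snp_pos >= start) & (snp_pos <= end))[0]
        if idx.size == 0:
            continue  # no markers fall inside this gene
        keys1 = [tuple(row) for row in hap1[:, idx]]
        keys2 = [tuple(row) for row in hap2[:, idx]]
        for allele in sorted(set(keys1) | set(keys2)):  # each distinct haplotype is one category
            columns.append([int(a == allele) + int(b == allele) for a, b in zip(keys1, keys2)])
    return np.array(columns, dtype=float).T


def relationship_matrix(Z):
    """Linear-kernel relationship matrix from column-centered dosages,
    scaled so that its diagonal averages 1."""
    Zc = Z - Z.mean(axis=0)
    K = Zc @ Zc.T
    return K / np.mean(np.diag(K))
```

Because each gene contributes exactly two haplotype copies per diploid individual, every row of the dosage matrix sums to twice the number of genes that contain markers.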

Different methods are available to calculate multi-population genomic relationship matrices. Because these matrices differ in their base population, the method used to calculate genomic relationships is expected to affect the estimates of genetic variances, covariances, and correlations. The aim of this article is to define a multi-population genomic relationship matrix that estimates current genetic variances within populations and genetic correlations between populations. A genomic relationship matrix containing two populations consists of four blocks: one for population 1, one for population 2, and two for relationships between the populations. It is known from the literature that using current allele frequencies to calculate genomic relationships within a population yields estimates of current genetic variances. In this article, we theoretically derived the properties the genomic relationship matrix needs in order to estimate genetic correlations between populations, and validated them using simulations. When the scaling factor of across-population genomic relationships is equal to the product of the square roots of the scaling factors for within-population genomic relationships, the genetic correlation is estimated without bias, even though the estimated genetic variances do not necessarily refer to the current population. When this property is not met, the correlation based on estimated variances should be multiplied by a correction factor based on the scaling factors. In this study, we present a genomic relationship matrix that directly estimates current genetic variances as well as genetic correlations between populations.
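
The key property can be written as s12 = sqrt(s1) · sqrt(s2), where s1 and s2 scale the within-population blocks and s12 scales the across-population blocks. The sketch below assumes a VanRaden-style construction: each population's genotypes (coded 0/1/2) are centered on that population's own current allele frequencies, and the within-population scaling factors are s_k = 2 Σ p_k (1 - p_k). These construction details are assumptions for illustration; the article's exact definitions may differ.

```python
import numpy as np


def multi_population_grm(M1, M2):
    """Two-population genomic relationship matrix.

    M1: (n1, m) and M2: (n2, m) genotype matrices coded 0/1/2 on the same m SNPs.
    Each population is centered on its own current allele frequencies.
    """
    p1 = M1.mean(axis=0) / 2
    p2 = M2.mean(axis=0) / 2
    Z1 = M1 - 2 * p1
    Z2 = M2 - 2 * p2
    s1 = 2 * np.sum(p1 * (1 - p1))  # within-population scaling factor, population 1
    s2 = 2 * np.sum(p2 * (1 - p2))  # within-population scaling factor, population 2
    s12 = np.sqrt(s1) * np.sqrt(s2)  # across-population factor: product of square roots
    G11 = Z1 @ Z1.T / s1
    G22 = Z2 @ Z2.T / s2
    G12 = Z1 @ Z2.T / s12
    return np.block([[G11, G12], [G12.T, G22]])
```

The two off-diagonal blocks are transposes of each other, so the full matrix is symmetric and can be used directly as a covariance structure in a bivariate model.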

A key unresolved issue in molecular evolution is how paralogs diverge after gene duplication. For multifunctional genes, duplication is often followed by subfunctionalization. Subsequently, new or optimized molecular properties may evolve once the protein is no longer constrained to achieve multiple functions. A potential example of this process is the evolution of the yeast heterochromatin protein Sir3, which arose by duplication from the conserved DNA replication protein Orc1. We previously found that Sir3 subfunctionalized after duplication. In this study, we investigated whether Sir3 evolved new or optimized properties after subfunctionalization. This possibility is supported by our observation that nonduplicated Orc1/Sir3 proteins from three species were unable to complement a sir3 mutation in Saccharomyces cerevisiae. To identify regions of Sir3 that may have evolved new properties, we created chimeric proteins of ScSir3 and nonduplicated Orc1 from Kluyveromyces lactis. We identified the AAA+ base subdomain of KlOrc1 as insufficient for heterochromatin formation in S. cerevisiae. In Orc1, this subdomain is intimately associated with other ORC subunits, enabling ATP hydrolysis. In Sir3, this subdomain binds Sir4 and perhaps nucleosomes. Our data are inconsistent with the insufficiency of KlOrc1 resulting from its ATPase activity or an inability to bind ScSir4. Thus, once Sir3 was no longer constrained to assemble into the ORC complex, its heterochromatin-forming potential evolved through changes in the AAA+ base subdomain.

Repetitive DNA sequences are subject to gene silencing in various animal species. Under specific circumstances repetitive DNA sequences can escape such silencing. For example, exogenously added, extrachromosomal DNA sequences that are stably inherited in multicopy repetitive arrays in the nematode Caenorhabditis elegans are frequently silenced in the germline, whereas such silencing often does not occur in the soma. This indicates that somatic cells might utilize factors that prevent repetitive DNA silencing. Indeed, such "antisilencing" factors have been revealed through genetic screens that identified mutant loci in which repetitive transgenic arrays are aberrantly silenced in the soma. We describe here a novel locus, pals-22 (for protein containing ALS2CR12 signature), required to prevent silencing of repetitive transgenes in neurons and other somatic tissue types. pals-22 deficiency also severely impacts animal vigor and confers phenotypes reminiscent of accelerated aging. We find that pals-22 is a member of a large family of divergent genes (39 members), defined by homology to the ALS2CR12 protein family. While gene family members are highly divergent, they show striking patterns of chromosomal clustering. The family expansion appears C. elegans-specific and has not occurred to the same extent in other nematode species for which genome sequences are available. The transgene-silencing phenotype observed upon loss of PALS-22 protein depends on the biogenesis of small RNAs. We speculate that the pals gene family may be part of a species-specific cellular defense mechanism.

Eukaryotic chromosome segregation requires a protein complex known as the kinetochore that mediates attachment between mitotic spindle microtubules and centromere-specific nucleosomes composed of the widely conserved histone variant CENP-A. Mutations in kinetochore proteins of the fission yeast Schizosaccharomyces pombe lead to chromosome missegregation such that daughter cells emerge from mitosis with unequal DNA content. We find that multiple copies of Msc1, a fission yeast homolog of the KDM5 family of proteins, suppress the temperature-sensitive growth defect of several kinetochore mutants, including mis16 and mis18, as well as mis6, mis15, and mis17, components of the Constitutive Centromere-Associated Network (CCAN). Conversely, deletion of msc1 exacerbates both the growth defect and the chromosome missegregation phenotype of each of these mutants. The C-terminal PHD domains of Msc1, previously shown to associate with a histone deacetylase activity, are necessary for Msc1 function when kinetochore function is compromised. We also demonstrate that, in the absence of Msc1, the frequency with which Mis16 and Mis15 localize to the kinetochore differs from that in wild-type cells. As we show here for msc1, others have shown that elevating cnp1 levels acts similarly to promote survival of the CCAN mutants. The rescue of mis15 and mis17 by cnp1 is, however, independent of msc1. Thus, Msc1 appears to contribute to the chromatin environment at the centromere: the absence of Msc1 sensitizes cells to perturbations in kinetochore function, while elevating Msc1 overcomes loss of function of critical components of the kinetochore and centromere.

Stress-induced sleep (SIS) in Caenorhabditis elegans is important for restoration of cellular homeostasis and is a useful model to study the function and regulation of sleep. SIS is triggered when epidermal growth factor (EGF) activates the ALA neuron, which then releases neuropeptides to promote sleep. To further understand this behavior, we established a new model of SIS using irradiation by ultraviolet C (UVC) light. UVC-induced sleep requires ALA signaling and resembles the sleep state induced by heat and other stressors, but UVC irradiation does not induce the proteostatic stress seen with heat exposure. Based on the known genotoxic effects of UVC irradiation, we tested two genes, atl-1 and cep-1, which encode proteins that act in the DNA damage response pathway. Loss-of-function mutants of atl-1 had no defect in UVC-induced SIS, but a partial loss-of-function mutant of cep-1, gk138, showed decreased movement quiescence following UVC irradiation. Germline ablation experiments and tissue-specific RNA interference experiments showed that cep-1 is required somatically in neurons for its effect on SIS. The cep-1(gk138) mutant suppressed body movement quiescence controlled by EGF, indicating that CEP-1 acts downstream of or in parallel to ALA activation to promote quiescence in response to ultraviolet light.

The evolution of complex body plans in land plants has been paralleled by gene duplication and divergence within nuclear auxin-signaling networks. A deep mechanistic understanding of auxin signaling proteins therefore may allow rational engineering of novel plant architectures. Toward that end, we analyzed natural variation in the auxin receptor F-box family of wild accessions of the reference plant Arabidopsis thaliana and used this information to populate a structure/function map. We employed a synthetic assay to identify natural hypermorphic F-box variants and then assayed auxin-associated phenotypes in accessions expressing these variants. To more directly measure the impact of the strongest variant in our synthetic assay on auxin sensitivity, we generated transgenic plants expressing this allele. Together, our findings link evolved sequence variation to altered molecular performance and auxin sensitivity. This approach demonstrates the potential for combining synthetic biology approaches with quantitative phenotypes to harness the wealth of available sequence information and guide future engineering efforts of diverse signaling pathways.

Circadian clocks organize the metabolism, physiology, and behavior of organisms throughout the day–night cycle by controlling daily rhythms in gene expression at the transcriptional and post-transcriptional levels. While many transcription factors underlying circadian oscillations are known, the splicing factors that modulate these rhythms remain largely unexplored. A genome-wide assessment of the alterations of gene expression in a null mutant of the alternative splicing regulator SR-related matrix protein of 160 kDa (SRm160) revealed the extent to which alternative splicing affects behavior-related genes. We show that SRm160 affects gene expression in pacemaker neurons of the Drosophila brain to ensure proper oscillations of the molecular clock. A reduced level of SRm160 in adult pacemaker neurons impairs circadian rhythms in locomotor behavior, and this phenotype is caused, at least in part, by a marked reduction in period (per) levels. Moreover, rhythmic accumulation of the neuropeptide PIGMENT DISPERSING FACTOR in the dorsal projections of these neurons is abolished after SRm160 depletion. The lack of rhythmicity in SRm160-downregulated flies is reversed by a fully spliced per construct, but not by an extra copy of the endogenous locus, showing that SRm160 positively regulates per levels in a splicing-dependent manner. Our findings highlight the significant effect of alternative splicing on the nervous system and particularly on brain function in an in vivo model.

Large-scale forward genetic screens have been instrumental for identifying genes that regulate development, homeostasis, and regeneration, as well as the mechanisms of disease. The zebrafish, Danio rerio, is an established genetic and developmental model used in genetic screens to uncover genes necessary for early development. However, the regulation of postembryonic development has received less attention, because such screens are more labor-intensive and require extensive resources. The lack of systematic interrogation of late development leaves large aspects of the genetic regulation of adult form and physiology unresolved. To understand the genetic control of postembryonic development, we performed a dominant screen for phenotypes affecting the adult zebrafish. In our screen, we identified 72 adult viable mutants showing changes in the shape of the skeleton as well as defects in pigmentation. For efficient mapping of these mutants and mutation identification, we devised a new mapping strategy based on identification of mutant-specific haplotypes. Using this method in combination with a candidate gene approach, we were able to identify linked mutations for 22 out of 25 mutants analyzed. Broadly, our mutational analysis suggests that there are key genes and pathways associated with late development. Many of these pathways are shared with humans and are affected in various disease conditions, suggesting constraint in the genetic pathways that can lead to change in adult form. Taken together, these results show that dominant screens are a feasible and productive means to identify mutations that can further our understanding of gene function during postembryonic development and in disease.
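
The idea of mapping by mutant-specific haplotypes can be illustrated with a simplified sketch: variants carried by every fish with the mutant phenotype but by none of their wild-type siblings should cluster around the induced mutation, and the genomic window containing the most such variants points to the linked region. Inputs are sets of (chromosome, position) variant calls per fish; the function names and the 1 Mb window are hypothetical choices, not the published strategy.

```python
from collections import Counter


def mutant_specific_variants(mutants, siblings):
    """mutants, siblings: lists of sets of (chromosome, position) variant calls, one set per fish.
    Returns variants carried by every mutant and by no wild-type sibling."""
    shared = set.intersection(*mutants)  # requires at least one mutant
    in_siblings = set().union(*siblings)
    return shared - in_siblings


def densest_window(variants, window=1_000_000):
    """Return (chromosome, start, end, count) for the fixed-size window holding
    the most mutant-specific variants (variants must be non-empty)."""
    counts = Counter((chrom, pos // window) for chrom, pos in variants)
    (chrom, bin_index), n = counts.most_common(1)[0]
    return chrom, bin_index * window, (bin_index + 1) * window, n
```

Candidate genes within the returned window could then be prioritized, in the spirit of the candidate gene approach mentioned above.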

A lipid- and glycoprotein-rich apical extracellular matrix (aECM), or glycocalyx, lines exposed membranes in the body and is particularly important for protecting the integrity of narrow tubes. Lipocalins ("fat cups") are small, secreted, cup-shaped proteins that bind and transport lipophilic cargo and are often found in luminal or aECM compartments such as mammalian plasma, urine, or tear film. Although some lipocalins can bind known aECM lipids and/or matrix metalloproteinases, it is not known whether and how lipocalins affect aECM structure, owing to challenges in visualizing the aECM in most systems. Here we show that two Caenorhabditis elegans lipocalins, LPR-1 and LPR-3, have distinct functions in the precuticular glycocalyx of developing external epithelia. LPR-1 moves freely through luminal compartments, while LPR-3 stably localizes to a central layer of the membrane-anchored glycocalyx, adjacent to the transient zona pellucida domain protein LET-653. Like LET-653 and other C. elegans glycocalyx components, these lipocalins are required to maintain the patency of the narrow excretory duct tube, and they also affect multiple aspects of later cuticle organization. lpr-1 mutants cannot maintain a continuous excretory duct apical domain and have misshapen cuticle ridges (alae) and abnormal patterns of cuticular surface lipid staining. lpr-3 mutants cannot maintain a passable excretory duct lumen, properly degrade the eggshell, or shed old cuticle during molting, and they lack cuticle barrier function. Based on these phenotypes, we infer that both LPR-1 and LPR-3 are required to build a properly organized aECM, while LPR-3 is additionally needed for aECM clearance and remodeling. The C. elegans glycocalyx provides a powerful system, amenable to both genetic analysis and live imaging, for investigating how lipocalins and lipids affect aECM structure.

Fast genome sequencing offers invaluable opportunities for building updated and improved models of protein sequence evolution. Here we show that Single Nucleotide Polymorphisms (SNPs) can be used to build a model capable of predicting the probability of substitution between amino acids in variants of the same protein in different species. The model is based on a substitution matrix inferred from the frequency of codon interchanges observed in a suitably selected subset of human SNPs, and it predicts the substitution probabilities observed in alignments between Homo sapiens and related species at 85–100% sequence identity better than any other approach we are aware of. The model gradually loses its predictive power at lower sequence identity. Our results suggest that SNPs can be employed, together with multiple sequence alignment data, to model protein sequence evolution. The SNP-based substitution matrix developed in this work can be exploited to better align protein sequences of related organisms and to refine estimates of the evolutionary distance between protein variants from related species in phylogenetic trees, and in the future it might become a useful tool for population analysis.
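
A simplified sketch of turning codon interchanges into amino acid substitution probabilities follows: each coding SNP is given as a (reference codon, alternative codon) pair, translated with the standard genetic code, counted symmetrically (the direction of change is treated as unknown), and normalized by row. The SNP selection, the treatment of synonymous changes, and the normalization used in the study are not reproduced here, so all of these are assumptions, and the names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_probabilities(codon_pairs):
    """codon_pairs: iterable of (ref_codon, alt_codon) strings from coding SNPs.
    Returns probs[a][b]: estimated probability that amino acid a is replaced by b,
    from symmetrized interchange counts (synonymous changes kept on the diagonal)."""
    counts = defaultdict(lambda: defaultdict(float))
    for ref, alt in codon_pairs:
        a, b = CODON_TABLE[ref.upper()], CODON_TABLE[alt.upper()]
        if "*" in (a, b):
            continue  # ignore changes that create or remove a stop codon
        counts[a][b] += 1
        counts[b][a] += 1  # symmetrize: SNP direction treated as unknown
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: n / total for b, n in row.items()}
    return probs
```

A log-odds scoring matrix for alignment could then be derived from such probabilities and background amino acid frequencies, as is done for other substitution matrices.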

Frequency-independent selection is generally considered a force that acts to reduce the genetic variation in evolving populations, yet rigorous arguments for this idea are scarce. When selection fluctuates in time, it is unclear whether frequency-independent selection may maintain genetic polymorphism without invoking additional mechanisms. We show that constant frequency-independent selection with arbitrary epistasis on a well-mixed haploid population eliminates genetic variation if we assume linkage equilibrium between alleles. To this end, we introduce the notion of frequency-independent selection at the level of alleles, which is sufficient to prove our claim and includes frequency-independent selection on haploids as a special case. When selection and recombination are weak but of the same order, there may be strong linkage disequilibrium; numerical calculations show that stable equilibria are highly unlikely in this case. Using the example of a diallelic two-locus model, we then demonstrate that frequency-independent selection that fluctuates in time can maintain stable polymorphism if linkage disequilibrium changes its sign periodically. We put our findings in the context of results from the existing literature and point out those scenarios in which the possible role of frequency-independent selection in maintaining genetic variation remains unclear.
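
The simplest case of the claim, a single diallelic haploid locus under constant frequency-independent selection, can be checked with the standard recursion p' = p w_A / (p w_A + (1 - p) w_a). Each generation multiplies the allele odds p/(1 - p) by w_A/w_a, so the fitter allele fixes and variation is lost. The sketch below (hypothetical function names) iterates this recursion; it does not model the two-locus, linkage-disequilibrium mechanism described above.

```python
import math


def haploid_selection(p0, fitness_schedule, generations):
    """Deterministic frequency-independent selection at one diallelic haploid locus.

    fitness_schedule(t) returns (w_A, w_a), the fitnesses of alleles A and a in
    generation t. Returns the trajectory of the frequency p of allele A.
    """
    p = p0
    trajectory = [p]
    for t in range(generations):
        w_A, w_a = fitness_schedule(t)
        mean_fitness = p * w_A + (1 - p) * w_a
        p = p * w_A / mean_fitness  # p' = p w_A / w_bar
        trajectory.append(p)
    return trajectory


def log_odds(p):
    """log(p / (1 - p)); each generation adds log(w_A / w_a) to this quantity."""
    return math.log(p / (1 - p))
```

With fitnesses alternating between (1.1, 1.0) and (1.0, 1.1), the odds return to their starting value every two generations, so this single-locus model can at best retain a polymorphism that is neutrally, not asymptotically, stable.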

Recent theory predicts that the fitness of pioneer populations can decline when species expand their range, because high rates of genetic drift on wave fronts make selection less efficient at purging deleterious variants. To test these predictions, we studied the fate of mutator bacteria expanding their range for 1650 generations on agar plates. In agreement with theory, we find that the growth ability of strains with a high mutation rate (HMR lines) decreased significantly over time, unlike that of strains with a lower mutation rate (LMR lines), which carry three to four times fewer mutations. Estimation of the distribution of fitness effects under a spatially explicit model reveals a mean negative effect for new mutations (–0.38%), but it suggests that both advantageous and deleterious mutations accumulated during the experiment. Furthermore, the fitness of HMR lines measured in different environments decreased relative to the ancestral strain, whereas that of LMR lines remained unchanged. By contrast, HMR strains evolving in a well-mixed environment accumulated fewer mutations than agar-evolved strains and showed increased fitness relative to the ancestor. Our results suggest that spatially expanding species are affected by deleterious mutations, leading to a drastic impairment of their evolutionary potential.

X and Y chromosomes differ in effective population size (Ne), rates of recombination, and exposure to natural selection, all of which can affect patterns of genetic diversity. On Y chromosomes with suppressed recombination, natural selection is expected to eliminate linked neutral variation, and lower the Ne of Y compared to X chromosomes or autosomes. However, female-biased sex ratios and high variance in male reproductive success can also reduce Y-linked Ne, making it difficult to infer the causes of low Y-diversity. Here, we investigate the factors affecting levels of polymorphism during sex chromosome evolution in the dioecious plant Rumex hastatulus (Polygonaceae). Strikingly, we find that neutral diversity for genes on the Y chromosome is, on average, 2.1% of the value for their X-linked homologs, corresponding to a chromosome-wide reduction of 93% compared to the standard neutral expectation. We demonstrate that the magnitude of this diversity loss is inconsistent with reduced male Ne caused by neutral processes. Instead, using forward simulations and estimates of the distribution of deleterious fitness effects, we show that Y chromosome diversity loss can be explained by purifying selection acting in aggregate over a large number of genetically linked sites. Simulations also suggest that our observed level of Y-diversity is consistent with the joint action of purifying and positive selection, but only for models in which there were fewer constrained sites than we empirically estimated. Given the relatively recent origin of R. hastatulus sex chromosomes, our results imply that Y-chromosome degeneration in the early stages may be largely driven by selective interference rather than by neutral genetic drift of silenced Y-linked genes.
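
The neutral expectation referred to above can be made concrete. Taking Wright's effective size for X-linked genes, 9 N_m N_f / (4 N_m + 2 N_f), and N_m / 2 for the Y, both on the same scale as the autosomal effective size 4 N_m N_f / (N_m + N_f), the expected ratio of Y to X diversity is (2 N_m + N_f) / (9 N_f): 1/3 for an equal sex ratio, falling toward 1/9 only under extreme female bias. The sketch assumes this textbook model, which ignores variance in male reproductive success; the function name is illustrative.

```python
def expected_y_to_x_diversity(n_males, n_females):
    """Neutral expectation for pi_Y / pi_X under Wright's effective sizes.

    Ne_X = 9 Nm Nf / (4 Nm + 2 Nf) and Ne_Y = Nm / 2, both on the scale where the
    autosomal effective size is 4 Nm Nf / (Nm + Nf). Ignores variance in male
    reproductive success and any selection.
    """
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    ne_y = n_males / 2
    return ne_y / ne_x
```

Relative to 1/3, the rounded 2.1% figure corresponds to a reduction of about 94%, in line with the 93% reported; even the 1/9 limit lies far above 2.1%.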

A long-standing evolutionary puzzle is that all eukaryotic genomes contain large amounts of tandemly repeated DNA whose sequence motifs and abundance vary greatly even among closely related species. To elucidate the evolutionary forces governing tandem repeat dynamics, it is necessary to quantify the rates and patterns of mutations in repeat copy number and to test their selective neutrality. Here, we used whole-genome sequences of 28 mutation accumulation (MA) lines of Daphnia pulex, in addition to six isolates from a non-MA population originating from the same progenitor, both to estimate mutation rates for the abundance of repeat sequences and to evaluate the selective regime acting on them. We found that mutation rates of individual repeats were both high and highly variable, ranging from additions/deletions of 0.29–105 copies per generation (reflecting changes of 0.12–0.80% per generation). Our results also provide evidence that new repeat sequences are often formed from existing ones. The non-MA population isolates showed a signal of either purifying or stabilizing selection, with on average 33% lower variation in repeat copy number than the MA lines, although the level of selective constraint was not evenly distributed across all repeats. Copy-number changes in many pairs of repeats were correlated, and the pattern of correlations was significantly different between the MA lines and the non-MA population. Our study demonstrates that tandem repeats can experience extremely rapid evolution in copy number, which can lead to high levels of divergence in genome-wide repeat composition between closely related species.
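
One simple way to express copy-number mutation rates from MA data is the mean absolute change in copy number per generation across lines, optionally scaled by the ancestral copy number to give a percentage. The sketch below assumes that estimator and a known ancestral copy number; both are illustrative assumptions (MA studies often use variance-based estimators instead), and the function name is hypothetical.

```python
from statistics import mean


def copy_number_change_rate(ancestral_copies, line_copies, generations):
    """Mean absolute change in repeat copy number per generation across MA lines.

    ancestral_copies: copy number in the progenitor (assumed known).
    line_copies:      final copy number in each MA line.
    generations:      number of generations each line was propagated.
    Returns (copies per generation, percent of the ancestral copy number per generation).
    """
    per_line = [abs(c - ancestral_copies) / g for c, g in zip(line_copies, generations)]
    rate = mean(per_line)
    return rate, 100 * rate / ancestral_copies
```

Applying the same calculation per repeat family would give a distribution of rates comparable in form to the ranges quoted above.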

Evolutionary transitions between male and female heterogamety are common in both vertebrates and invertebrates. Theoretical studies of these transitions have found that, when all genotypes are equally fit, continuous paths of intermediate equilibria link the two sex chromosome systems. This observation has led to a belief that neutral evolution along these paths can drive transitions, and that arbitrarily small fitness differences among sex chromosome genotypes can determine the system to which evolution leads. Here, we study stochastic evolutionary dynamics along these equilibrium paths. We find non-neutrality, both in transitions retaining the ancestral pair of sex chromosomes, and in those creating a new pair. In fact, substitution rates are biased in favor of dominant sex determining chromosomes, which fix with higher probabilities than mutations of no effect. Using diffusion approximations, we show that this non-neutrality is a result of "drift-induced selection" operating at every point along the equilibrium paths: stochastic jumps off the paths return with, on average, a directional bias in favor of the dominant segregating sex chromosome. Our results offer a novel explanation for the observed preponderance of dominant sex determining genes, and hint that drift-induced selection may be a common force in standard population genetic systems.
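
The benchmark in this argument is the neutral case. A new mutation of no effect in a Wright-Fisher population of N diploids fixes with probability equal to its starting frequency, 1/(2N). The Monte Carlo sketch below estimates that baseline. It models a single autosomal locus only and does not implement the sex-chromosome equilibrium paths or the diffusion analysis described above. The population size, trial count and seed are arbitrary choices.

```python
import random


def fixation_probability(n_diploid, trials, seed=0):
    """Monte Carlo estimate of the fixation probability of one new neutral copy
    in a Wright-Fisher population of n_diploid individuals (2N gene copies)."""
    rng = random.Random(seed)
    copies = 2 * n_diploid
    fixed = 0
    for _ in range(trials):
        count = 1  # a single new mutant copy
        while 0 < count < copies:
            p = count / copies
            # Binomial resampling: each copy in the next generation is drawn
            # independently from the current allele frequency.
            count = sum(rng.random() < p for _ in range(copies))
        if count == copies:
            fixed += 1
    return fixed / trials


estimate = fixation_probability(n_diploid=10, trials=20_000)
print(estimate, 1 / 20)  # simulated vs. neutral theory 1/(2N)
```

A dominant sex-determining mutation that fixes more often than this baseline is, by definition, not behaving neutrally. That comparison is what the abstract reports.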

Y chromosome function, structure and evolution are poorly understood in many species, including the Anopheles genus of mosquitoes, an emerging model system for studying speciation that also represents the major vectors of malaria. While the Anopheline Y had previously been implicated in male mating behavior, recent data from the Anopheles gambiae complex suggest that, apart from the putative primary sex-determiner, no other genes are conserved on the Y. Studying the functional basis of the evolutionary divergence of the Y chromosome in the gambiae complex is complicated by complete F1 male hybrid sterility. Here, we used an F1 × F0 crossing scheme that overcame a severe bottleneck of male hybrid incompatibilities and enabled us to experimentally purify a genetically labeled A. gambiae Y chromosome in an A. arabiensis background. Whole genome sequencing (WGS) confirmed that the A. gambiae Y retained its original sequence content in the A. arabiensis genomic background. In contrast to comparable experiments in Drosophila, we find that the presence of a heterospecific Y chromosome has no significant effect on the expression of A. arabiensis genes. Transcriptional differences can be explained almost exclusively as a direct consequence of transcripts arising from sequence elements present on the A. gambiae Y chromosome itself. We find that Y hybrids show no obvious fertility defects and no substantial reduction in male competitiveness. Our results demonstrate that, despite their radically different structure, the Y chromosomes of these two species of the gambiae complex, which diverged an estimated 1.85 MYA, function interchangeably. This indicates that the Y chromosome does not harbor loci contributing to hybrid incompatibility. Therefore, Y chromosome gene flow between members of the gambiae complex is possible even at their current level of divergence. Importantly, this also suggests that malaria control interventions based on sex-distorting Y drive would be transferable, whether intentionally or inadvertently, between the major malaria vector species.

The organization of functional regions within genomes has important implications for evolutionary potential. Considerable research effort has gone toward identifying the genomic basis of phenotypic traits of interest through quantitative trait loci (QTL) analyses. Less research has assessed the arrangement of QTL in the genome within and across species. To investigate the distribution, extent of colocalization, and the synteny of QTL for ecologically relevant traits, we used a comparative genomic mapping approach within and across a range of salmonid species. We compiled 943 QTL from all available species [lake whitefish (Coregonus clupeaformis), coho salmon (Oncorhynchus kisutch), rainbow trout (O. mykiss), Chinook salmon (O. tshawytscha), Atlantic salmon (Salmo salar), and Arctic charr (Salvelinus alpinus)]. We developed a novel analytical framework for mapping and testing the distribution of these QTL. We found no correlation between QTL density and gene density at the chromosome level, but did find one at the fine scale. Two chromosomes were significantly enriched for QTL. We found multiple synteny blocks for morphological, life history, and physiological traits across species, but only morphology and physiology had significantly more synteny blocks than expected. Two or three pairs of traits were significantly colocalized in three species (lake whitefish, coho salmon, and rainbow trout). Colocalization and fine-scale synteny suggest genetic linkage between traits within species and a conserved genetic basis across species. However, this pattern was weak overall, with colocalization and synteny being relatively rare. These findings advance our understanding of the role of genomic organization in the renowned ecological and phenotypic variability of salmonid fishes.
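
A simple way to ask whether a chromosome is enriched for QTL is a one-sided binomial test. The observed count is compared with what would be expected if QTL landed on chromosomes in proportion to each chromosome's share of the genome. This is a generic sketch, not the novel framework developed in the study. The QTL total, genome share and observed count in the usage lines are hypothetical. When many chromosomes are tested, the p-values would also need a multiple-testing correction.

```python
from math import exp, lgamma, log, log1p


def binom_log_pmf(i, n, p):
    """Log of the Binomial(n, p) probability mass at i, for 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def enrichment_p_value(observed, total, share):
    """One-sided P(X >= observed) for X ~ Binomial(total, share): the chance of
    at least `observed` QTL on a chromosome holding `share` of the genome."""
    return sum(exp(binom_log_pmf(i, total, share))
               for i in range(observed, total + 1))


total_qtl = 200   # hypothetical QTL count mapped in one species
share = 0.03      # hypothetical share of the genome on this chromosome
observed = 15     # hypothetical QTL count on this chromosome
expected = total_qtl * share
p = enrichment_p_value(observed, total_qtl, share)
print(f"expected {expected:.1f}, observed {observed}, one-sided p = {p:.2g}")
```

Working in log space with `lgamma` avoids overflow in the binomial coefficients for large QTL totals.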

Selection during evolution, whether natural or artificial, acts through the phenotype. For multifaceted phenotypes such as plant and inflorescence architecture, the underlying genetic architecture consists of a complex network of interacting genes rather than single genes that act independently to determine the trait. As such, selection acts on entire gene networks. Here, we begin to define the genetic regulatory network to which the maize domestication gene, teosinte branched1 (tb1), belongs. Using a combination of molecular methods to uncover either direct or indirect regulatory interactions, we identified a set of genes that lie downstream of tb1 in a gene network regulating both plant and inflorescence architecture. Additional genes, known from the literature, also act in this network. We observed that tb1 regulates both core cell cycle genes and another maize domestication gene, teosinte glume architecture1 (tga1). We show that several members of the MADS-box gene family are either directly or indirectly regulated by tb1 and/or tga1, and that tb1 sits atop a cascade of transcriptional regulators controlling both plant and inflorescence architecture. Multiple members of the tb1 network appear to have been the targets of selection during maize domestication. Knowledge of the regulatory hierarchies controlling traits is central to understanding how new morphologies evolve.
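
The distinction between direct and indirect targets downstream of tb1 can be pictured as reachability in a directed graph, found with a breadth-first search. The edges below are a hypothetical simplification of the relationships named in the abstract. Routing regulation of MADS-box genes only through tga1 is an assumption, since the abstract allows direct or indirect control by either gene. Gene families are collapsed into single illustrative nodes.

```python
from collections import deque

# Hypothetical, simplified edges based on the relationships named in the
# abstract; gene families are collapsed into single illustrative nodes.
REGULATES = {
    "tb1": ["tga1", "core_cell_cycle_genes"],
    "tga1": ["MADS_box_genes"],
}


def downstream(network, root):
    """Breadth-first search from root. Returns each reachable gene mapped to
    the fewest regulatory steps from root (1 = direct target)."""
    steps = {root: 0}
    queue = deque([root])
    while queue:
        gene = queue.popleft()
        for target in network.get(gene, []):
            if target not in steps:
                steps[target] = steps[gene] + 1
                queue.append(target)
    del steps[root]
    return steps


targets = downstream(REGULATES, "tb1")
print(targets)
# {'tga1': 1, 'core_cell_cycle_genes': 1, 'MADS_box_genes': 2}
```

A gene at distance 1 is a candidate direct target. Anything farther sits lower in the cascade that tb1 heads.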

Recombination is a complex biological process that results from a cascade of multiple events during meiosis. Understanding the genetic basis of recombination can help reveal whether and how these events interact. To tackle this question, we studied patterns of recombination in sheep, using multiple approaches and data sets. We constructed fine-scale male recombination maps in a dairy breed from the south of France (the Lacaune breed). These maps combine meiotic recombination rates from a large pedigree genotyped with a 50K SNP array and historical recombination rates from a sample of unrelated individuals genotyped with a 600K SNP array. This analysis revealed recombination patterns in sheep similar to those of other mammals, but also genomic regions that have likely been affected by directional and diversifying selection. We estimated the average recombination rate of Lacaune sheep at 1.5 cM/Mb, identified ~50,000 crossover hotspots in the genome, and found a high correlation between historical and meiotic recombination rate estimates. A genome-wide association study revealed two major loci affecting interindividual variation in recombination rate in Lacaune, harboring the RNF212 and HEI10 genes. It also revealed possibly two other loci of smaller effect, harboring the KCNJ15 and FSHR genes. Comparing these new results with those obtained previously in a distantly related population of domestic sheep (the Soay) revealed that Soay and Lacaune males have a very similar distribution of recombination along the genome. The two data sets were thus combined to create more precise male meiotic recombination maps in sheep. However, despite their similar recombination maps, Soay and Lacaune males were found to exhibit different heritabilities and QTL effects for interindividual variation in genome-wide recombination rates. This highlights the robustness of recombination patterns to underlying variation in their genetic basis.
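
Recombination rate in cM/Mb is genetic map length divided by physical length. It can be computed genome-wide or between adjacent markers. The sketch below uses an invented marker map, chosen so the overall average comes out at 1.5 cM/Mb. It flags candidate hotspots with a hypothetical rule: any interval whose local rate is at least five times the average. Real hotspot detection, such as the ~50,000 hotspots reported here, relies on much denser maps and formal statistical criteria.

```python
# Hypothetical marker map: (physical position in Mb, genetic position in cM).
MARKERS = [(0.0, 0.0), (1.0, 1.2), (2.0, 2.6), (2.1, 3.6), (3.0, 4.4), (4.0, 6.0)]


def interval_rates(markers):
    """Local recombination rate (cM/Mb) between each pair of adjacent markers."""
    return [
        (mb1, mb2, (cm2 - cm1) / (mb2 - mb1))
        for (mb1, cm1), (mb2, cm2) in zip(markers, markers[1:])
    ]


def average_rate(markers):
    """Overall rate: total map length divided by total physical length."""
    (mb_start, cm_start), (mb_end, cm_end) = markers[0], markers[-1]
    return (cm_end - cm_start) / (mb_end - mb_start)


avg = average_rate(MARKERS)
# Hypothetical hotspot rule: local rate at least 5x the overall average.
hotspots = [(a, b) for a, b, rate in interval_rates(MARKERS) if rate >= 5 * avg]
print(avg, hotspots)  # 1.5 [(2.0, 2.1)]
```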

Selection experiments and experimental evolution provide unique opportunities to study the genetics of adaptation because the target and intensity of selection are known relatively precisely. In contrast to natural selection, where populations are never strictly "replicated," experimental evolution routinely includes replicate lines so that selection signatures—genomic regions showing excessive differentiation between treatments—can be separated from possible founder effects, genetic drift, and multiple adaptive solutions. We developed a mouse model with four lines within a high running (HR) selection treatment and four nonselected controls (C). At generation 61, we sampled 10 mice of each line and used the Mega Mouse Universal Genotyping Array to obtain single nucleotide polymorphism (SNP) data for 25,318 SNPs for each individual. Using an advanced mixed model procedure developed in this study, we identified 152 markers that were significantly different in frequency between the two selection treatments. They occurred on all chromosomes except 1, 2, 8, 13, and 19, and showed a variety of patterns in terms of fixation (or the lack thereof) in the four HR and four C lines. Importantly, none were fixed for alternative alleles between the two selection treatments. The current state-of-the-art regularized F test applied after pooling DNA samples for each line failed to detect any markers. We conclude that when SNP or sequence data are available from individuals, the mixed model methodology is recommended for selection signature detection. As sequencing at the individual level becomes increasingly feasible, the new methodology may be routinely applied for detection of selection.
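
Replicate lines matter because lines, not individual mice, are the units to which treatments were applied. A line-level permutation test makes this concrete: with four HR and four C lines there are only C(8, 4) = 70 ways to assign lines to treatments. This is a deliberately simplified stand-in, not the mixed model developed in the study, which uses genotypes of the 10 individual mice per line. The allele frequencies below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def line_permutation_p(hr_freqs, c_freqs):
    """Exact two-sided permutation p-value for a difference in mean allele
    frequency between treatments, treating each line as the replicate unit."""
    freqs = list(hr_freqs) + list(c_freqs)
    n_hr = len(hr_freqs)
    observed = abs(mean(hr_freqs) - mean(c_freqs))
    as_extreme = total = 0
    for idx in combinations(range(len(freqs)), n_hr):
        chosen = set(idx)
        group_a = [f for i, f in enumerate(freqs) if i in chosen]
        group_b = [f for i, f in enumerate(freqs) if i not in chosen]
        total += 1
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


# Hypothetical per-line allele frequencies at one SNP (four HR, four C lines).
p = line_permutation_p([0.9, 0.85, 0.95, 0.8], [0.1, 0.2, 0.15, 0.05])
print(p)  # 2/70, the smallest p-value any 4-vs-4 line comparison can give
```

The observed split and its mirror image always tie, so the smallest attainable two-sided p-value is 2/70 ≈ 0.029. That is far above a Bonferroni threshold of 0.05/25,318 ≈ 2 × 10⁻⁶ for this SNP panel. It illustrates how little a comparison of line means alone can resolve with eight lines.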

Mucus hypersecretion is a hallmark feature of asthma and other muco-obstructive airway diseases. The mucin proteins MUC5AC and MUC5B are the major glycoprotein components of mucus and have critical roles in airway defense. Despite the biomedical importance of these two proteins, the loci that regulate them in the context of natural genetic variation have not been studied. To identify genes that underlie variation in airway mucin levels, we performed genetic analyses in founder strains and incipient lines of the Collaborative Cross (CC) in a house dust mite mouse model of asthma. CC founder strains exhibited significant differences in MUC5AC and MUC5B, providing evidence of heritability. Analysis of gene and protein expression of Muc5ac and Muc5b in incipient CC lines (n = 154) suggested that post-transcriptional events were important regulators of mucin protein content in the airways. Quantitative trait locus (QTL) mapping identified distinct, trans protein QTL for MUC5AC (chromosome 13) and MUC5B (chromosome 2). These two QTL explained 18% and 20% of phenotypic variance, respectively. Examination of the MUC5B QTL allele effects and subsequent phylogenetic analysis allowed us to narrow the MUC5B QTL and identify Bpifb1 as a candidate gene. Bpifb1 mRNA and protein expression were upregulated in parallel with MUC5B after allergen challenge, and Bpifb1 knockout mice exhibited higher MUC5B expression. Thus, BPIFB1 is a novel regulator of MUC5B.
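
The "percentage of phenotypic variance explained" by a QTL can be illustrated with a one-way ANOVA R². This is the share of total variance accounted for by the means of the genotype classes. The sketch uses hypothetical phenotypes and allele labels. QTL mapping in Collaborative Cross lines typically regresses phenotypes on the probabilities of the eight founder haplotypes, so the study's 18% and 20% need not come from this exact calculation.

```python
from statistics import mean


def variance_explained(groups):
    """One-way ANOVA R^2: share of total phenotypic variance explained by
    genotype class, computed as 1 - within-class SS / total SS."""
    values = [v for phenotypes in groups.values() for v in phenotypes]
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = 0.0
    for phenotypes in groups.values():
        class_mean = mean(phenotypes)
        ss_within += sum((v - class_mean) ** 2 for v in phenotypes)
    return 1 - ss_within / ss_total


# Hypothetical phenotypes (e.g. mucin protein levels) grouped by QTL allele.
toy = {"allele_A": [1.0, 2.0, 3.0], "allele_B": [4.0, 5.0, 6.0]}
r2 = variance_explained(toy)
print(f"{r2:.1%} of phenotypic variance explained")  # 77.1% ...
```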

Retrotransposons (RTs) can rapidly increase in copy number due to periodic bursts of transposition. Such bursts are mutagenic and thus potentially deleterious. However, certain transposition-induced gain-of-function or regulatory mutations may be of selective advantage. How an optimal balance between these opposing effects arises is not well characterized. Here, we studied transposition bursts of a heat-activated retrotransposon family in Arabidopsis. We recorded high inter- and intraplant variation in the number and chromosomal positions of new insertions. These insertions usually did not affect plant fertility and were equally well transmitted through male and female gametes, even though 90% of them were within active genes. We found that the highly heterogeneous distribution of these new retroelement copies results from a combination of two mechanisms. The first prevents multiple transposition bursts in a given somatic cell lineage that later contributes to differentiation of gametes. The second restricts the regulatory influence of new insertions toward neighboring chromosomal DNA. As a whole, such regulatory characteristics of this family of RTs ensure its rapid but stepwise accumulation in plant populations experiencing transposition bursts, accompanied by high diversity of chromosomal sites harboring new RT insertions.
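
Equal transmission through male and female gametes can be checked against the Mendelian expectation. A parent heterozygous for an insertion, crossed to a plant lacking it, should pass the insertion to half of its progeny. Crosses with the carrier as the female parent and as the male parent are tested separately. The sketch below uses a 1-df chi-square goodness-of-fit test. The abstract does not say which test was used, and the progeny counts are hypothetical.

```python
from math import erfc, sqrt


def mendelian_1to1_test(carriers, non_carriers):
    """Chi-square goodness-of-fit test (1 df) of the 1:1 ratio expected when a
    parent heterozygous for an insertion is crossed to a plant lacking it."""
    total = carriers + non_carriers
    expected = total / 2
    chi2 = ((carriers - expected) ** 2 + (non_carriers - expected) ** 2) / expected
    # Upper tail of the chi-square distribution with 1 df: P = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical progeny counts from reciprocal crosses.
via_female = mendelian_1to1_test(carriers=52, non_carriers=48)
via_male = mendelian_1to1_test(carriers=49, non_carriers=51)
print(via_female, via_male)
```

Large p-values in both directions of the reciprocal cross are what "equally well transmitted" would look like under this test.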



Genetic Markers

You know how an interstate map can guide you from one city to another. A genetic map is like that, and it guides researchers toward their target gene. Just as there are landmarks in interstate maps, there also are landmarks in genetic maps known as genetic markers...


Genetic Linkage Map

A chromosome map showing the relative positions of the known genes on the chromosomes of a given species.
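
A linkage map is built from recombination fractions between markers, which a mapping function converts into additive map distances. The sketch below assumes the markers are already ordered. It uses Haldane's mapping function, which assumes no crossover interference; Kosambi's function, which allows for interference, is a common alternative. The marker names and recombination fractions are hypothetical.

```python
from math import log


def haldane_cm(r):
    """Haldane map distance in cM for a recombination fraction 0 <= r < 0.5
    (assumes no crossover interference)."""
    return -50.0 * log(1.0 - 2.0 * r)


def linkage_map(markers, adjacent_r):
    """Cumulative map positions (cM) for markers in a known order, given the
    recombination fraction between each adjacent pair."""
    positions = [0.0]
    for r in adjacent_r:
        positions.append(positions[-1] + haldane_cm(r))
    return dict(zip(markers, positions))


positions = linkage_map(["m1", "m2", "m3"], [0.05, 0.10])
print(positions)  # m2 at ~5.27 cM, m3 at ~16.43 cM
```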