Using gene map science to evaluate the genetic map and eliminate disease

Genetic News

The genetics and evolution of complex traits, including quantitative traits and disease, have been hotly debated ever since Darwin. A century ago, a paper from R.A. Fisher reconciled Mendelian and biometrical genetics in a landmark contribution that is now accepted as the main foundation stone of the field of quantitative genetics. Here, we give our perspective on Fisher’s 1918 paper in the context of how and why it is relevant in today’s genome era. We mostly focus on human trait variation, in part because Fisher did so too, but the conclusions are general and extend to other natural populations, and to populations undergoing artificial selection.

In this Review, we focus on the similarity of the concepts underlying prediction of estimated breeding values (EBVs) in livestock and polygenic risk scores (PRS) in humans. Our research spans both fields and so we recognize factors that are very obvious for those in one field, but less so for those in the other. Differences in family size between species are the wedge that drives the different viewpoints and approaches. The large family size achievable in nonhuman species, accompanied by selection, generates a smaller effective population size, increased linkage disequilibrium and a higher average genetic relationship between individuals within a population. In human genetic analyses, we select individuals unrelated in the classical sense (coefficient of relationship <0.05) to estimate heritability captured by common SNPs. In livestock data, all animals within a breed are to some extent "related," and so it is not possible to select unrelated individuals and retain a data set of sufficient size to analyze. These differences directly or indirectly impact the way data analyses are undertaken. In livestock, genetic segregation variance exposed through samplings of parental genomes within families is directly observable and taken for granted. In humans, this genomic variation is under-recognized for its contribution to variation in polygenic risk of common disease, in both those with and without family history of disease. We explore the equation that predicts the expected proportion of variance explained using PRS, and quantify how GWAS sample size is the key factor for maximizing accuracy of prediction in both humans and livestock. Last, we bring together the concepts discussed to address some frequently asked questions.
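
The dependence of prediction accuracy on GWAS sample size can be illustrated with a short sketch. The abstract does not state the equation, so the form used here is an assumption: a commonly cited approximation, R² = h² / (1 + M / (N·h²)), where h² is the SNP heritability, N the discovery sample size and M the effective number of independent markers. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
def expected_prs_r2(h2_snp: float, n_samples: int, m_effective: float) -> float:
    """Approximate expected variance explained by a polygenic score.

    Assumed form (not taken from the abstract):
        R^2 = h2 / (1 + M / (N * h2))
    """
    if not 0.0 < h2_snp <= 1.0:
        raise ValueError("h2_snp must be in (0, 1]")
    if n_samples <= 0 or m_effective <= 0:
        raise ValueError("n_samples and m_effective must be positive")
    return h2_snp / (1.0 + m_effective / (n_samples * h2_snp))


# Illustrative values only: h2 = 0.5 and 60,000 effective markers.
for n in (10_000, 100_000, 1_000_000):
    print(n, round(expected_prs_r2(0.5, n, 60_000), 3))
```

Under this approximation the expected R² rises with N and approaches, but never exceeds, the SNP heritability, which is one way to see why sample size dominates prediction accuracy.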

CRISPR-based genome-editing methods in model organisms are evolving at an extraordinary speed. Whereas the generation of deletion or missense mutants is quite straightforward, the production of endogenous fluorescent reporters is more challenging. We have developed Nested CRISPR, a cloning-free ribonucleoprotein-driven method that robustly produces endogenous fluorescent reporters with EGFP, mCherry or wrmScarlet in Caenorhabditis elegans. This method is based on the division of the fluorescent protein (FP) sequence in three fragments. In the first step, single-stranded DNA (ssDNA) donors (≤200 bp) are used to insert the 5' and 3' fragments of the FP in the locus of interest. In the second step, these sequences act as homology regions for homology-directed repair using a double-stranded DNA (dsDNA) donor (PCR product) containing the middle fragment, thus completing the FP sequence. In Nested CRISPR, the first step involving ssDNA donors is a well-established method that yields high editing efficiencies, and the second step is reliable because it uses universal CRISPR RNAs (crRNAs) and PCR products. We have also used Nested CRISPR in a nonessential gene to produce a deletion mutant in the first step and a transcriptional reporter in the second step. In the search for modifications to optimize the method, we tested synthetic single guide RNAs (sgRNAs), but did not observe a significant increase in efficiency. To streamline the approach, we combined all step 1 and step 2 reagents in a single injection and were successful in three of five loci tested with editing efficiencies of up to 20%. Finally, we discuss the prospects of this method in the future.

To understand gene function, the cre/loxP conditional system is the most powerful available for temporal and spatial control of expression in mouse. However, the research community requires more cre recombinase expressing transgenic mouse strains (cre-drivers) that restrict expression to specific cell types. To address these problems, a high-throughput method for large-scale production that produces high-quality results is necessary. Further, endogenous promoters need to be chosen that drive cell type specific expression, or we need to further focus the expression by manipulating the promoter. Here we test the suitability of using knock-ins at the docking site 5' of Hprt for rapid development of numerous cre-driver strains focused on expression in adulthood, using an improved cre tamoxifen inducible allele (icre/ERT2), and testing a novel inducible-first, constitutive-ready allele (icre/f3/ERT2/f3). In addition, we test two types of promoters either to capture an endogenous expression pattern (MaxiPromoters), or to restrict expression further using minimal promoter element(s) designed for expression in restricted cell types (MiniPromoters). We provide new cre-driver mouse strains with applicability for brain and eye research. In addition, we demonstrate the feasibility and applicability of using the locus 5' of Hprt for the rapid generation of substantial numbers of cre-driver strains. We also provide a new inducible-first constitutive-ready allele to further speed cre-driver generation. Finally, all these strains are available to the research community through The Jackson Laboratory.

High-throughput measurements of molecular phenotypes provide an unprecedented opportunity to model cellular processes and their impact on disease. These highly structured datasets are usually strongly confounded, creating false positives and reducing power. This has motivated many approaches based on principal components analysis (PCA) to estimate and correct for confounders, which have become indispensable elements of association tests between molecular phenotypes and both genetic and nongenetic factors. Here, we show that these correction approaches induce a bias, and that it persists for large sample sizes and replicates out-of-sample. We prove this theoretically for PCA by deriving an analytic, deterministic, and intuitive bias approximation. We assess other methods with realistic simulations, which show that perturbing any of several basic parameters can cause false positive rate (FPR) inflation. Our experiments show the bias depends on covariate and confounder sparsity, effect sizes, and their correlation. Surprisingly, when the covariate and confounder have $\rho^{2} \approx 10\%$, standard two-step methods all have $>10$-fold FPR inflation. Our analysis informs best practices for confounder correction in genomic studies, and suggests many false discoveries have been made and replicated in some differential expression analyses.

Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number ($\beta_{1}$) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature with a natural connection to coalescent models, and present novel arguments relating $\beta_{1}$ to population genetic models. Using simulations, we show that the new feature and $\beta_{1}$ are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE’s efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.

Enhancers and promoters both regulate gene expression by recruiting transcription factors (TFs); however, the degree to which enhancer vs. promoter activity is due to differences in their sequences or to genomic context is the subject of ongoing debate. We examined this question by analyzing the sequences of thousands of transcribed enhancers and promoters from hundreds of cellular contexts previously identified by cap analysis of gene expression. Support vector machine classifiers trained on counts of all possible 6-bp-long sequences (6-mers) were able to accurately distinguish promoters from enhancers and distinguish their breadth of activity across tissues. Classifiers trained to predict enhancer activity also performed well when applied to promoter prediction tasks, but promoter-trained classifiers performed poorly on enhancers. This suggests that the learned sequence patterns predictive of enhancer activity generalize to promoters, but not vice versa. Our classifiers also indicate that there are functionally relevant differences in enhancer and promoter GC content beyond the influence of CpG islands. Furthermore, sequences characteristic of broad promoter or broad enhancer activity matched different TFs, with predicted ETS- and RFX-binding sites indicative of promoters, and AP-1 sites indicative of enhancers. Finally, we evaluated the ability of our models to distinguish enhancers and promoters defined by histone modifications. Separating these classes was substantially more difficult, and this difference may contribute to ongoing debates about the similarity of enhancers and promoters. In summary, our results suggest that high-confidence transcribed enhancers and promoters can largely be distinguished based on biologically relevant sequence properties.
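
The 6-mer count features described above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the function names are hypothetical, windows containing ambiguous bases are simply skipped, and reverse complements are not merged, all of which are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int = 6) -> Counter:
    """Count every overlapping k-mer made only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[kmer] += 1
    return counts


def feature_vector(seq: str, k: int = 6) -> list:
    """Return counts for all 4**k possible k-mers in a fixed, sorted order."""
    counts = kmer_counts(seq, k)
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


vec = feature_vector("ACGTACGTAC")
print(len(vec), sum(vec))
```

Each sequence becomes a fixed-length vector (4096 entries for k = 6), which is the kind of input a support vector machine or any other standard classifier can consume.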

CAF-1 is an evolutionarily conserved H3/H4 histone chaperone that plays a key role in replication-coupled chromatin assembly and is targeted to the replication fork via interactions with PCNA, which, if disrupted, leads to epigenetic defects. In Saccharomyces cerevisiae, when the silent mating-type locus HMR contains point mutations within the E silencer, Sir protein association and silencing is lost. However, mutation of CDC7, encoding an S-phase-specific kinase, or subunits of the H4 K16-specific acetyltransferase complex SAS-I, restore silencing to this crippled HMR, HMRae**. Here, we observed that loss of Cac1p, the largest subunit of CAF-1, also restores silencing at HMRae**, and silencing in both cac1 and cdc7 mutants is suppressed by overexpression of SAS2. We demonstrate Cdc7p and Cac1p interact in vivo in S phase, but not in G1, consistent with observed cell cycle-dependent phosphorylation of Cac1p, and hypoacetylation of chromatin at H4 K16 in both cdc7 and cac1 mutants. Moreover, silencing at HMRae** is restored in cells expressing cac1p mutants lacking Cdc7p phosphorylation sites. We also discovered that cac1 and cdc7-90 synthetically interact negatively in the presence of DNA damage, but that Cdc7p phosphorylation sites on Cac1p are not required for responses to DNA damage. Combined, our results support a model in which Cdc7p regulates replication-coupled histone modification via a CAC1-dependent mechanism involving H4 K16ac deposition, and thereby silencing, while CAF-1-dependent replication- and repair-coupled chromatin assembly per se are functional in the absence of phosphorylation of Cdc7p consensus sites on CAF-1.

Conserved noncoding elements (CNEs) have a significant regulatory influence on their neighboring genes. Loss of proximity to CNEs through genomic rearrangements can, therefore, impact the transcriptional states of the cognate genes. Yet, the evolutionary implications of such chromosomal alterations have not been studied. Through genome-wide analysis of CNEs and the cognate genes of representative species from five different mammalian orders, we observed a significant loss of genes’ linear proximity to CNEs in the rat lineage. The CNEs and the genes losing proximity had a significant association with fetal, but not postnatal, brain development as assessed through ontology terms, developmental gene expression, chromatin marks, and genetic mutations. The loss of proximity to CNEs correlated with the independent evolutionary loss of fetus-specific upregulation of nearby genes in the rat brain. DNA breakpoints implicated in brain abnormalities of germline origin had significant representation between a CNE and the gene that exhibited loss of proximity, signifying the underlying developmental tolerance of genomic rearrangements that allowed the evolutionary splits of CNEs and the cognate genes in the rodent lineage. Our observations highlighted a nontrivial impact of chromosomal rearrangements in shaping the evolutionary dynamics of mammalian brain development and might explain the loss of brain traits, like cerebral folding of the cortex, in the rodent lineage.

Cells rarely exist alone, which drives the evolution of diverse mechanisms for identifying and responding appropriately to the presence of other nearby cells. Filamentous fungi depend on somatic cell-to-cell communication and fusion for the development and maintenance of a multicellular, interconnected colony that is characteristic of this group of organisms. The filamentous fungus Neurospora crassa is a model for investigating the mechanisms of somatic cell-to-cell communication and fusion. N. crassa cells chemotropically grow toward genetically similar cells, which ultimately make physical contact and undergo cell fusion. Here, we describe the development of a Pprm1-luciferase reporter system that differentiates whether genes function upstream or downstream of a conserved MAP kinase (MAPK) signaling complex, by using a set of mutants required for communication and cell fusion. The vast majority of these mutants are deficient for self-fusion and for fusion when paired with wild-type cells. However, the ham-11 mutant is unique in that it fails to undergo self-fusion, but chemotropic interactions and cell fusion are restored in ham-11 + wild-type interactions. In genetically dissimilar cells, chemotropic interactions are regulated by genetic differences at doc-1 and doc-2, which regulate prefusion non-self recognition; cells with dissimilar doc-1 and doc-2 alleles show greatly reduced cell-fusion frequencies. Here, we show that HAM-11 functions in parallel with the DOC-1 and DOC-2 proteins to regulate the activity of the MAPK signaling complex. Together, our data support a model of integrated self and non-self recognition processes that modulate somatic cell-to-cell communication in N. crassa.

Inner nuclear membrane (INM) protein composition regulates nuclear function, affecting processes such as gene expression, chromosome organization, nuclear shape, and stability. Mechanisms that drive changes in the INM proteome are poorly understood, in part because it is difficult to definitively assay INM composition rigorously and systematically. Using a split-GFP complementation system to detect INM access, we examined the distribution of all C-terminally tagged Saccharomyces cerevisiae membrane proteins in wild-type cells and in mutants affecting protein quality control pathways, such as INM-associated degradation (INMAD), ER-associated degradation, and vacuolar proteolysis. Deletion of the E3 ligase Asi1 had the most specific effect on the INM compared to mutants in vacuolar or ER-associated degradation pathways, consistent with a role for Asi1 in the INMAD pathway. Our data suggest that Asi1 not only removes mistargeted proteins at the INM, but also controls the levels and distribution of native INM components, such as the membrane nucleoporin Pom33. Interestingly, loss of Asi1 does not affect Pom33 protein levels but instead alters Pom33 distribution in the nuclear envelope through Pom33 ubiquitination, which drives INM redistribution. Taken together, our data demonstrate that the Asi1 E3 ligase has a novel function in INM protein regulation in addition to protein turnover.

Purine homeostasis is ensured through a metabolic network widely conserved from prokaryotes to humans. Purines can either be synthesized de novo, reused, or produced by interconversion of extant metabolites using the so-called recycling pathway. Although thoroughly characterized in microorganisms, such as yeast or bacteria, little is known about regulation of the purine biosynthesis network in metazoans. In humans, several diseases are linked to purine metabolism through as yet poorly understood etiologies. Particularly, the deficiency in adenylosuccinate lyase (ADSL)—an enzyme involved both in the purine de novo and recycling pathways—causes severe muscular and neuronal symptoms. In order to address the mechanisms underlying this deficiency, we established Caenorhabditis elegans as a metazoan model organism to study purine metabolism, while focusing on ADSL. We show that the purine biosynthesis network is functionally conserved in C. elegans. Moreover, adsl-1 (the gene encoding ADSL in C. elegans) is required for developmental timing, germline stem cell maintenance and muscle integrity. Importantly, these traits are not affected when solely the de novo pathway is abolished, and we present evidence that germline maintenance is linked specifically to ADSL activity in the recycling pathway. Hence, our results allow developmental and tissue specific phenotypes to be ascribed to separable steps of the purine metabolic network in an animal model.

Genetic screens in the nematode Caenorhabditis elegans identified the EGF/Ras and Notch pathways as central for vulval precursor cell fate patterning. Schematically, the anchor cell secretes EGF, inducing the P6.p cell to a primary (1°) vulval fate; P6.p in turn induces its neighbors to a secondary (2°) fate through Delta-Notch signaling and represses Ras signaling. In the nematode Oscheius tipulae, the anchor cell successively induces 2° then 1° vulval fates. Here, we report on the molecular identification of mutations affecting vulval induction in O. tipulae. A single Induction Vulvaless mutation was found, which we identify as a cis-regulatory deletion in a tissue-specific enhancer of the O. tipulae lin-3 homolog, confirmed by clustered regularly interspaced short palindromic repeats/Cas9 mutation. In contrast to this predictable Vulvaless mutation, mutations resulting in an excess of 2° fates unexpectedly correspond to the plexin/semaphorin pathway. Hyperinduction of P4.p and P8.p in these mutants likely results from mispositioning of these cells due to a lack of contact inhibition. The third signaling pathway found by forward genetics in O. tipulae is the Wnt pathway; a decrease in Wnt pathway activity results in loss of vulval precursor competence and induction, and 1° fate miscentering on P5.p. Our results suggest that the EGF and Wnt pathways have qualitatively similar activities in vulval induction in C. elegans and O. tipulae, albeit with quantitative differences in the effects of mutation. Thus, the derived induction process in C. elegans with an early induction of the 1° fate appeared during evolution, after the recruitment of the EGF pathway for vulval induction.

The central nervous system of most animals is bilaterally symmetrical. Closer observation often reveals some functional or anatomical left–right asymmetries. In the nematode Caenorhabditis elegans, the most obvious asymmetry in the nervous system is found in the ventral nerve cord (VNC), where most axons are in the right axon tract. The asymmetry is established when axons entering the VNC from the brain switch from the left to the right side at the anterior end of the VNC. In genetic screens we identified several mutations compromising VNC asymmetry. This includes alleles of col-99 (encoding a transmembrane collagen), unc-52/perlecan and unc-34 (encoding the actin modulator Enabled/Vasodilator-stimulated phosphoproteins). In addition, we evaluated mutants in known axon guidance pathways for asymmetry defects and used genetic interaction studies to place the genes into genetic pathways. In total we identified four different pathways contributing to the establishment of VNC asymmetry, represented by UNC-6/netrin, SAX-3/Robo, COL-99, and EPI-1/laminin. The combined inactivation of these pathways in triple and quadruple mutants leads to highly penetrant VNC asymmetry defects, suggesting these pathways are important contributors to the establishment of VNC asymmetry in C. elegans.

To detect a direction to evolution, without the pitfalls of reconstructing ancestral states, we need to compare "more evolved" to "less evolved" entities. But because all extant species have the same common ancestor, none are chronologically more evolved than any other. However, different gene families were born at different times, allowing us to compare young protein-coding genes to those that are older and hence have been evolving for longer. To be retained during evolution, a protein must not only have a function, but must also avoid toxic dysfunction such as protein aggregation. There is conflict between the two requirements: hydrophobic amino acids form the cores of protein folds, but also promote aggregation. Young genes avoid strongly hydrophobic amino acids, which is presumably the simplest solution to the aggregation problem. Here we show that young genes’ few hydrophobic residues are clustered near one another along the primary sequence, presumably to assist folding. The higher aggregation risk created by the higher hydrophobicity of older genes is counteracted by more subtle effects in the ordering of the amino acids, including a reduction in the clustering of hydrophobic residues until they eventually become more interspersed than if distributed randomly. This interspersion has previously been reported to be a general property of proteins, but here we find that it is restricted to old genes. Quantitatively, the index of dispersion delineates a gradual trend, i.e., a decrease in the clustering of hydrophobic amino acids over billions of years.
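
A rough sketch of how an index of dispersion can quantify clustering of hydrophobic residues follows. The abstract does not give the exact definition or normalization, so several things here are assumptions: the hydrophobic residue set, the block length of 6, and the use of the raw variance-to-mean ratio of per-block counts. Note that for independently placed residues with frequency p, this raw ratio is expected to be about 1 - p rather than 1, so it would need normalizing before comparing proteins of different composition.

```python
from statistics import mean, pvariance

HYDROPHOBIC = set("LIVFMW")  # assumed residue set, for illustration only


def dispersion_index(protein: str, block: int = 6) -> float:
    """Variance-to-mean ratio of hydrophobic counts in consecutive blocks."""
    flags = [aa in HYDROPHOBIC for aa in protein.upper()]
    n_blocks = len(flags) // block  # any trailing partial block is ignored
    counts = [sum(flags[i * block:(i + 1) * block]) for i in range(n_blocks)]
    if n_blocks < 2 or mean(counts) == 0:
        raise ValueError("need at least two blocks and one hydrophobic residue")
    return pvariance(counts) / mean(counts)


# Two toy sequences with the same composition (6 L, 18 G) but different order.
clustered = "LLLLLL" + "GGGGGG" * 3
interspersed = "LGGLGG" + "LGGGGG" + "LGGLGG" + "LGGGGG"
print(dispersion_index(clustered), dispersion_index(interspersed))
```

Higher values indicate that hydrophobic residues pile up in a few blocks; lower values indicate that they are spread evenly along the sequence.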

Linked beneficial and deleterious mutations are known to decrease the fixation probability of a favorable mutation in large asexual populations. While the hindering effect of strongly deleterious mutations on adaptive evolution has been well studied, how weakly deleterious mutations, either in isolation or with superior beneficial mutations, influence the rate of adaptation has not been fully explored. When the selection against the deleterious mutations is weak, the beneficial mutant can fix in many genetic backgrounds, besides the one it arose on. Here, taking this factor into account, I obtain an accurate analytical expression for the fixation probability of a beneficial mutant in an asexual population at mutation-selection balance. I then exploit this result along with clonal interference theory to investigate the joint effect of linked beneficial and deleterious mutations on the rate of adaptation, and identify parameter regions where it is reduced due to interference by either beneficial or deleterious or both types of mutations. I also study the evolution of mutation rates in adapting asexual populations, and find that linked beneficial mutations have a stronger influence than the deleterious mutations on mutator fixation.

Neutral models for quantitative trait evolution are useful for identifying phenotypes under selection. These models often assume normally distributed phenotypes. This assumption may be violated when a trait is affected by relatively few variants or when the effects of those variants arise from skewed or heavy tailed distributions. Molecular phenotypes such as gene expression levels may have these properties. To accommodate deviations from normality, models making fewer assumptions about the underlying genetics and patterns of variation are needed. Here, we develop a general neutral model for quantitative trait variation using a coalescent approach. This model allows interpretation of trait distributions in terms of familiar population genetic parameters because it is based on the coalescent. We show how the normal distribution resulting from the infinitesimal limit, where the number of loci grows large as the effect size per mutation becomes small, depends only on expected pairwise coalescent times. We then demonstrate how deviations from normality depend on demography through the distribution of coalescence times as well as through genetic parameters. In particular, population growth events exacerbate deviations while bottlenecks reduce them. We demonstrate the practical applications of this model by showing how to sample from the neutral distribution of $Q_{ST}$, the ratio of the variance between subpopulations to that in the overall population. We further show it is likely impossible to distinguish sparsity from skewed or heavy tailed mutational effects using only sampled trait values. The model analyzed here greatly expands the parameter space for neutral trait models.
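
The variance ratio described above can be sketched directly from trait values grouped by subpopulation. This follows the abstract's verbal definition only; it works on raw values rather than estimated genetic variance components, and the function name is hypothetical, so it should be read as an illustration of the ratio rather than a full estimator.

```python
from statistics import fmean, pvariance


def between_to_total_variance(groups) -> float:
    """Variance of subpopulation means (size-weighted) over total variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    total = pvariance(all_values, mu=grand_mean)
    if total == 0:
        raise ValueError("trait has no variance in the overall population")
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    between /= len(all_values)
    return between / total


print(between_to_total_variance([[1, 2, 3], [4, 5, 6]]))
```

The ratio is 0 when all subpopulations share the same mean and approaches 1 when nearly all trait variation lies between subpopulations.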

In humans, most genome-wide association studies have been conducted using data from Caucasians and many of the reported findings have not replicated in other populations. This lack of replication may be due to statistical issues (small sample sizes or confounding) or perhaps more fundamentally to differences in the genetic architecture of traits between ethnically diverse subpopulations. What aspects of the genetic architecture of traits vary between subpopulations and how can this be quantified? We consider studying effect heterogeneity using Bayesian random effect interaction models. The proposed methodology can be applied using shrinkage and variable selection methods, and produces useful information about effect heterogeneity in the form of whole-genome summaries (e.g., the proportions of variance of a complex trait explained by a set of SNPs and the average correlation of effects) as well as SNP-specific attributes. Using simulations, we show that the proposed methodology yields (nearly) unbiased estimates when the sample size is not too small relative to the number of SNPs used. Subsequently, we used the methodology for the analyses of four complex human traits (standing height, high-density lipoprotein, low-density lipoprotein, and serum urate levels) in European-Americans (EAs) and African-Americans (AAs). The estimated correlations of effects between the two subpopulations were well below unity for all the traits, ranging from 0.50 to 0.73. The extent of effect heterogeneity varied between traits and SNP sets. Height showed fewer differences in SNP effects between AAs and EAs, whereas HDL, a trait highly influenced by lifestyle, exhibited a greater extent of effect heterogeneity. For all the traits, we observed substantial variability in effect heterogeneity across SNPs, suggesting that effect heterogeneity varies between regions of the genome.

Pesticide resistance arises rapidly in arthropod herbivores, as can host plant adaptation, and both are significant problems in agriculture. These traits have been challenging to study as both are often polygenic and many arthropods are genetically intractable. Here, we examined the genetic architecture of pesticide resistance and host plant adaptation in the two-spotted spider mite, Tetranychus urticae, a global agricultural pest. We show that the short generation time and high fecundity of T. urticae can be readily exploited in experimental evolution designs for high-resolution mapping of quantitative traits. As revealed by selection with spirodiclofen, an acetyl-CoA carboxylase inhibitor, in populations from a cross between a spirodiclofen-resistant and a spirodiclofen-susceptible strain, and which also differed in performance on tomato, we found that a limited number of loci could explain quantitative resistance to this compound. These were resolved to narrow genomic intervals, suggesting specific candidate genes, including acetyl-CoA carboxylase itself, clustered and copy variable cytochrome P450 genes, and NADPH cytochrome P450 reductase, which encodes a redox partner for cytochrome P450s. For performance on tomato, candidate genomic regions for response to selection were distinct from those responding to the synthetic compound and were consistent with a more polygenic architecture. In accomplishing this work, we exploited the continuous nature of allele frequency changes across experimental populations to resolve the existing fragmented T. urticae draft genome to pseudochromosomes. This improved assembly was indispensable for our analyses, as it will be for future research with this model herbivore that is exceptionally amenable to genetic studies.

Due to the complexity of genotype–phenotype relationships, simultaneous analyses of genomic associations with multiple traits will be more powerful and informative than a series of univariate analyses. However, in most cases, studies of genotype–phenotype relationships have been analyzed only one trait at a time. Here, we report the results of a fully integrated multivariate genome-wide association analysis of the shape of the Drosophila melanogaster wing in the Drosophila Genetic Reference Panel. Genotypic effects on wing shape were highly correlated between two different laboratories. We found 2396 significant SNPs using a 5% false discovery rate cutoff in the multivariate analyses, but just four significant SNPs in univariate analyses of scores on the first 20 principal component axes. One quarter of these initially significant SNPs retain their effects in regularized models that take into account population structure and linkage disequilibrium. A key advantage of multivariate analysis is that the direction of the estimated phenotypic effect is much more informative than a univariate one. We exploit this fact to show that the effects of knockdowns of genes implicated in the initial screen were on average more similar than expected under a null model. A subset of SNP effects were replicable in an unrelated panel of inbred lines. Association studies that take a phenomic approach, considering many traits simultaneously, are an important complement to the power of genomics.
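
To make the "5% false discovery rate cutoff" concrete, here is a minimal sketch of one standard way to apply such a cutoff, the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR method was used, so this choice is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, fdr: float = 0.05):
    """Return a list of booleans marking tests significant at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank whose p-value falls under its step-up threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= fdr * rank / m:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


print(benjamini_hochberg([0.02, 0.02, 0.03, 0.9]))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be declared significant when a larger p-value further down the ranking passes.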

We leverage two complementary Drosophila melanogaster mapping panels to genetically dissect starvation resistance—an important fitness trait. Using >1600 genotypes from the multiparental Drosophila Synthetic Population Resource (DSPR), we map numerous starvation stress QTL that collectively explain a substantial fraction of trait heritability. Mapped QTL effects allowed us to estimate DSPR founder phenotypes, predictions that were correlated with the actual phenotypes of these lines. We observe a modest phenotypic correlation between starvation resistance and triglyceride level, traits that have been linked in previous studies. However, overlap among QTL identified for each trait is low. Since we also show that DSPR strains with extreme starvation phenotypes differ in desiccation resistance and activity level, our data imply multiple physiological mechanisms contribute to starvation variability. We additionally exploited the Drosophila Genetic Reference Panel (DGRP) to identify sequence variants associated with starvation resistance. Consistent with prior work these sites rarely fall within QTL intervals mapped in the DSPR. We were offered a unique opportunity to directly compare association mapping results across laboratories since two other groups previously measured starvation resistance in the DGRP. We found strong phenotypic correlations among studies, but extremely low overlap in the sets of genomewide significant sites. Despite this, our analyses revealed that the most highly associated variants from each study typically showed the same additive effect sign in independent studies, in contrast to otherwise equivalent sets of random variants. This consistency provides evidence for reproducible trait-associated sites in a widely used mapping panel, and highlights the polygenic nature of starvation resistance.

Cryptic genetic variation may be an important contributor to heritable traits, but its extent and regulation are not fully understood. Here, we investigate the cryptic genetic variation underlying a Saccharomyces cerevisiae colony phenotype that is typically suppressed in a cross of the laboratory strain BY4716 (BY) and a derivative of the clinical isolate 322134S (3S). To do this, we comprehensively dissect the trait’s genetic basis in the BYx3S cross in the presence of three different genetic perturbations that enable its expression. This allows us to detect and compare the specific loci that interact with each perturbation to produce the trait. In total, we identify 21 loci, all but one of which interact with just a subset of the perturbations. Beyond impacting which loci contribute to the trait, the genetic perturbations also alter the extent of additivity, epistasis, and genotype–environment interaction among the detected loci. Additionally, we show that the single locus interacting with all three perturbations corresponds to the coding region of the cell surface gene FLO11. While nearly all of the other remaining loci influence FLO11 transcription in cis or trans, the perturbations tend to interact with loci in different pathways and subpathways. Our work shows how layers of cryptic genetic variation can influence complex traits. Here, these layers mainly represent different regulatory inputs into the transcription of a single key gene.



Genetic Markers

You know how an interstate map can guide you from one city to another. A genetic map is like that, and it guides researchers toward their target gene. Just as there are landmarks in interstate maps, there also are landmarks in genetic maps known as genetic markers...



A process by which genes undergo a structural change.