Using gene map science to evaluate the genetic map and eliminate disease

Genetic News

The Elizabeth W. Jones Award for Excellence in Education recognizes an individual or group that has had significant, sustained impact on genetics education at any level, from K-12 through graduate school and beyond. Bruce Weir (University of Washington) is the 2019 recipient in recognition of his work training thousands of researchers in the rigorous use of statistical analysis methods for genetic and genomic data. His contributions fall into three categories: the acclaimed Summer Institute in Statistical Genetics, which has been held continuously for 23 years and has trained > 10,000 researchers worldwide; the popular graduate-level textbook Genetic Data Analysis; and the training of a growing number of forensic geneticists during the rise of DNA evidence in courts around the world.

The power of any genetic model organism is derived, in part, from the ease with which gene expression can be manipulated. The short generation time and invariant developmental lineage have made Caenorhabditis elegans very useful for understanding, e.g., developmental programs, basic cell biology, neurobiology, and aging. Over the last decade, the C. elegans transgenic toolbox has expanded considerably, with the addition of a variety of methods to control expression and modify genes with unprecedented resolution. Here, we provide a comprehensive overview of transgenic methods in C. elegans, with an emphasis on recent advances in transposon-mediated transgenesis, CRISPR/Cas9 gene editing, conditional gene and protein inactivation, and bipartite systems for temporal and spatial control of expression.

In plant breeding, heritability is often calculated (i) as a measure of precision of trials and/or (ii) to compute the response to selection. It is usually estimated on an entry-mean basis, since the phenotype is usually an aggregated value, as genotypes are replicated in trials, which stands in contrast with animal breeding and human genetics. When this was first proposed, assumptions such as balanced data and independent genotypic effects were made that are often violated in modern plant breeding trials/analyses. Because of this, multiple alternative methods have been proposed, aiming to generalize heritability on an entry-mean basis. In this study, we propose an extension of the concept of heritability from an entry-mean to an entry-difference basis, which allows for more detailed insight and is more meaningful in the context of selection in plant breeding, because the correlation among entry means can be accounted for. We show that under certain circumstances our method reduces to other popular generalized methods for heritability estimation on an entry-mean basis. The approach is exemplified via four examples that show different levels of complexity, where we compare six methods for heritability estimation on an entry-mean basis to our approach (example codes are provided). Results suggest that heritability on an entry-difference basis is a well-suited alternative for obtaining an overall heritability estimate, and in addition provides one heritability per genotype as well as one per difference between genotypes.
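
For orientation, the classical entry-mean heritability that the abstract generalizes can be sketched as follows. This is the textbook balanced-data formula, H2 = var_g / (var_g + var_e / r), together with the breeder's equation R = H2 * S; it is not the entry-difference method proposed in the study. The function and parameter names are illustrative assumptions.

```python
def entry_mean_heritability(var_genotype, var_error, n_reps):
    """Classical entry-mean heritability for a balanced single-environment trial.

    Assumes independent genotypic effects and r replicates per genotype:
        H2 = var_g / (var_g + var_e / r)
    """
    if n_reps <= 0:
        raise ValueError("n_reps must be positive")
    if var_genotype < 0 or var_error < 0:
        raise ValueError("variance components must be non-negative")
    # Variance of an entry mean: genotypic variance plus averaged error variance.
    var_entry_mean = var_genotype + var_error / n_reps
    if var_entry_mean == 0:
        raise ValueError("total variance of entry means is zero")
    return var_genotype / var_entry_mean


def response_to_selection(heritability, selection_differential):
    """Breeder's equation: expected response R = H2 * S."""
    return heritability * selection_differential


# Hypothetical variance components, chosen only for illustration.
h2 = entry_mean_heritability(var_genotype=2.0, var_error=4.0, n_reps=4)
print(round(h2, 3), round(response_to_selection(h2, 3.0), 3))
```

More replicates shrink the error term and raise the entry-mean heritability, which is why the quantity is also read as a measure of trial precision.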

We introduce a simple and computationally efficient method for fitting the admixture model of genetic population structure, called ALStructure. The strategy of ALStructure is to first estimate the low-dimensional linear subspace of the population admixture components, and then search for a model within this subspace that is consistent with the admixture model’s natural probabilistic constraints. Central to this strategy is the observation that all models belonging to this constrained space of solutions are risk-minimizing and have equal likelihood, rendering any additional optimization unnecessary. The low-dimensional linear subspace is estimated through a recently introduced principal components analysis method that is appropriate for genotype data, thereby providing a solution that has both principal components and probabilistic admixture interpretations. Our approach differs fundamentally from other existing methods for estimating admixture, which aim to fit the admixture model directly by searching for parameters that maximize the likelihood function or the posterior probability. We observe that ALStructure typically outperforms existing methods both in accuracy and computational speed under a wide array of simulated and real human genotype datasets. Throughout this work, we emphasize that the admixture model is a special case of a much broader class of models for which algorithms similar to ALStructure may be successfully employed.

Construction of genetic linkage maps has become a routine step for mapping quantitative trait loci (QTL), particularly in animal and plant breeding populations. Many multiparental populations have recently been produced to increase genetic diversity and QTL mapping resolution. However, few software packages are available for map construction in these populations. In this paper, we build a general framework for the construction of genetic linkage maps from genotypic data in diploid populations, including bi- and multiparental populations, cross-pollinated (CP) populations, and breeding pedigrees. The framework is implemented as an automatic pipeline called magicMap, where the maximum multilocus likelihood approach utilizes genotypic information efficiently. We evaluate magicMap by extensive simulations and eight real datasets: one biparental, one CP, four multiparent advanced generation intercross (MAGIC), and two nested association mapping (NAM) populations, the number of markers ranging from a few hundred to tens of thousands. Not only is magicMap the only software capable of accommodating all of these designs, it is more accurate and robust to missing genotypes and genotyping errors than commonly used packages.

The concept of haplotype blocks has been shown to be useful in genetics. Fields of application range from the detection of regions under positive selection to statistical methods that make use of dimension reduction. We propose a novel approach ("HaploBlocker") for defining and inferring haplotype blocks that focuses on linkage instead of the commonly used population-wide measures of linkage disequilibrium. We define a haplotype block as a sequence of genetic markers that has a predefined minimum frequency in the population, and only haplotypes with a similar sequence of markers are considered to carry that block, effectively screening a dataset for group-wise identity-by-descent. From these haplotype blocks, we construct a haplotype library that represents a large proportion of genetic variability with a limited number of blocks. Our method is implemented in the associated R-package HaploBlocker, and provides flexibility not only to optimize the structure of the obtained haplotype library for subsequent analyses, but also to handle datasets of different marker density and genetic diversity. By using haplotype blocks instead of single nucleotide polymorphisms (SNPs), local epistatic interactions can be naturally modeled, and the reduced number of parameters enables a wide variety of new methods for further genomic analyses such as genomic prediction and the detection of selection signatures. We illustrate our methodology with a dataset comprising 501 doubled haploid lines in a European maize landrace genotyped at 501,124 SNPs. With the suggested approach, we identified 2991 haplotype blocks with an average length of 2685 SNPs that together represent 94% of the dataset.
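
The minimum-frequency idea in the block definition above can be illustrated with a toy sketch. It counts exact allele sequences in one fixed marker window and keeps those carried by at least a chosen number of haplotypes. This is a deliberate simplification: HaploBlocker itself groups haplotypes with similar (not only identical) sequences, and the function below is a hypothetical helper, not part of the R package.

```python
from collections import Counter


def frequent_haplotype_windows(haplotypes, start, length, min_count):
    """Return allele sequences in markers [start, start + length) that are
    carried by at least min_count haplotypes (exact matching only)."""
    window_counts = Counter(
        tuple(hap[start:start + length]) for hap in haplotypes
    )
    return {seq: n for seq, n in window_counts.items() if n >= min_count}


# Four toy haplotypes over five biallelic markers (0/1 coding).
haplotypes = [
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 0, 1, 1],
]

blocks = frequent_haplotype_windows(haplotypes, start=0, length=3, min_count=3)
print(blocks)  # the sequence (0, 1, 1) is shared by three haplotypes
```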

We develop a flexible and computationally efficient approach for analyzing high-throughput chemical genetic screens. In such screens, a library of genetic mutants is phenotyped in a large number of stresses. Typically, interactions between genes and stresses are detected by grouping the mutants and stresses into categories, and performing modified t-tests for each combination. This approach does not have a natural extension if mutants or stresses have quantitative or nonoverlapping annotations (e.g., if conditions have doses or a mutant falls into more than one category simultaneously). We develop a matrix linear model (MLM) framework that allows us to model relationships between mutants and conditions in a simple, yet flexible, multivariate framework. It encodes both categorical and continuous relationships to enhance detection of associations. We develop a fast estimation algorithm that takes advantage of the structure of MLMs. We evaluate our method’s performance in simulations and in an Escherichia coli chemical genetic screen, comparing it with an existing univariate approach based on modified t-tests. We show that MLMs perform slightly better than the univariate approach when mutants and conditions are classified in nonoverlapping categories, and substantially better when conditions can be ordered in dosage categories. Therefore, the MLM framework is an attractive alternative to current methods, and provides a computationally scalable framework for larger and complex chemical genetic screens. A Julia language implementation of MLMs and the code used for this paper are available.
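
The abstract does not state the model equation. As a sketch under assumed notation, a bilinear model of this kind can be written with Y the n x m matrix of phenotypes (one row per condition, one column per mutant), X an n x p matrix of condition covariates, Z an m x q matrix of mutant covariates, B the p x q matrix of interaction coefficients, and E the error matrix. The symbols and the orientation of Y are assumptions made for illustration. If X and Z have full column rank, the ordinary least-squares estimate of B has a closed form that needs only two small matrix inversions, which is one way the matrix structure can be exploited:

```latex
Y = X B Z^{\top} + E,
\qquad
\hat{B} = (X^{\top} X)^{-1} X^{\top} \, Y \, Z \, (Z^{\top} Z)^{-1}
```

Categorical annotations enter X or Z as indicator columns and doses as numeric columns, so both kinds of relationship fit in the same framework.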

For years, animal selection in livestock species has been performed by selecting animals based on genetic inheritance. However, evolutionary studies have reported that nongenetic information that drives natural selection can also be inherited across generations (epigenetic, microbiota, environmental inheritance). In response to this finding, the concept of inclusive heritability, which combines all sources of information inherited across generations, was developed. To better predict the transmissible potential of each animal by taking into account these diverse sources of inheritance and improve selection in livestock species, we propose the "transmissibility model." Similarly to the animal model, this model uses pedigree and phenotypic information to estimate variance components and predict the transmissible potential of an individual, but differs by estimating the path coefficients of inherited information from parent to offspring instead of using a set value of 0.5 for both the sire and the dam (additive genetic relationship matrix). We demonstrated the structural identifiability of the transmissibility model, and performed a practical identifiability and power study of the model. We also performed simulations to compare the performances of the animal and transmissibility models for estimating the covariances between relatives and predicting the transmissible potential under different combinations of sources of inheritance. The transmissibility model provided similar results to the animal model when inheritance was of genetic origin only, but outperformed the animal model for estimating the covariances between relatives and predicting the transmissible potential when the proportion of inheritance of nongenetic origin was high or when the sire and dam path coefficients were very different.

Meiotic recombination shuffles genetic variation and promotes correct segregation of chromosomes. Rates of recombination vary on several scales, both within genomes and between individuals, and this variation is affected by both genetic and environmental factors. Social insects have extremely high rates of recombination, although the evolutionary causes of this are not known. Here, we estimate rates of crossovers and gene conversions in 22 colonies of the honeybee, Apis mellifera, and 9 colonies of the bumblebee, Bombus terrestris, using direct sequencing of 299 haploid drone offspring. We confirm that both species have extremely elevated crossover rates, with higher rates measured in the highly eusocial honeybee than the primitively social bumblebee. There are also significant differences in recombination rate between subspecies of honeybee. There is substantial variation in genome-wide recombination rate between individuals of both A. mellifera and B. terrestris, and the distributions of these rates overlap between species. A large proportion of interindividual variation in recombination rate is heritable, which indicates the presence of variation in trans-acting factors that influence recombination genome-wide. We infer that levels of crossover interference are significantly lower in honeybees than in bumblebees, which may be one mechanism that contributes to higher recombination rates in honeybees. We also find a significant increase in recombination rate with distance from the centromere, mirrored by methylation differences. We detect a strong transmission bias due to GC-biased gene conversion associated with noncrossover gene conversions. Our results shed light on the mechanistic causes of extreme rates of recombination in social insects and the genetic architecture of recombination rate variation.

The diploid budding yeast Candida albicans harbors unique CENPA-rich 3- to 5-kb regions that form the centromere (CEN) core on each of its eight chromosomes. The epigenetic nature of these CENs does not permit the stabilization of a functional kinetochore on an exogenously introduced CEN plasmid. The flexible nature of such centromeric chromatin is exemplified by the reversible silencing of a transgene upon its integration into the CENPA-bound region. The lack of a conventional heterochromatin machinery and the absence of defined boundaries of CENPA chromatin make the process of CEN specification in this organism elusive. Additionally, upon native CEN deletion, C. albicans can efficiently activate neocentromeres proximal to the native CEN locus, hinting at the importance of CEN-proximal regions. In this study, we examine this CEN-proximity effect and identify factors for CEN specification in C. albicans. We exploit a counterselection assay to isolate cells that can silence a transgene when integrated into the CEN-flanking regions. We show that the frequency of reversible silencing of the transgene decreases from the central core of CEN7 to its peripheral regions. Using publicly available C. albicans high-throughput chromosome conformation capture data, we identify a 25-kb region centering on the CENPA-bound core that acts as CEN-flanking compact chromatin (CFCC). Cis- and trans-chromosomal interactions associated with the CFCC spatially segregate it from bulk chromatin. We further show that neocentromere activation on chromosome 7 occurs within this specified region. Hence, this study identifies a specialized CEN-proximal domain that specifies and restricts the centromeric activity to a unique region.

Recombination between divergent DNA sequences is actively prevented by heteroduplex rejection mechanisms. In baker’s yeast, such antirecombination mechanisms can be initiated by the recognition of DNA mismatches in heteroduplex DNA by MSH proteins, followed by recruitment of the Sgs1-Top3-Rmi1 helicase–topoisomerase complex to unwind the recombination intermediate. We previously showed that the repair/rejection decision during single-strand annealing recombination is temporally regulated by MSH (MutS homolog) protein levels and by factors that excise nonhomologous single-stranded tails. These observations, coupled with recent studies indicating that mismatch repair (MMR) factors interact with components of the histone chaperone machinery, encouraged us to explore roles for epigenetic factors and chromatin conformation in regulating the decision to reject vs. repair recombination between divergent DNA substrates. This work involved the use of an inverted repeat recombination assay thought to measure sister chromatid repair during DNA replication. Our observations are consistent with the histone chaperones CAF-1 and Rtt106, and the histone deacetylase Sir2, acting to suppress heteroduplex rejection and the Rpd3, Hst3, and Hst4 deacetylases acting to promote heteroduplex rejection. These observations, and double-mutant analysis, have led to a model in which nucleosomes located at DNA lesions stabilize recombination intermediates and compete with MMR factors that mediate heteroduplex rejection.

Malassezia encompasses a monophyletic group of basidiomycetous yeasts naturally found on the skin of humans and other animals. Malassezia species have lost genes for lipid biosynthesis, and are therefore lipid-dependent and difficult to manipulate under laboratory conditions. In this study, we applied a recently-developed Agrobacterium tumefaciens-mediated transformation protocol to perform transfer (T)-DNA random insertional mutagenesis in Malassezia furfur. A total of 767 transformants were screened for sensitivity to 10 different stresses, and 19 mutants that exhibited a phenotype different from the wild type were further characterized. The majority of these strains had single T-DNA insertions, which were identified within open reading frames of genes, untranslated regions, and intergenic regions. Some T-DNA insertions generated chromosomal rearrangements while others could not be characterized. To validate the findings of our forward genetic screen, a novel clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system was developed to generate targeted deletion mutants for two genes identified in the screen: CDC55 and PDR10. This system is based on cotransformation of M. furfur mediated by A. tumefaciens, to deliver both a CAS9-gRNA construct that induces double-strand DNA breaks and a gene replacement allele that serves as a homology-directed repair template. Targeted deletion mutants for both CDC55 and PDR10 were readily generated with this method. This study demonstrates the feasibility and reliability of A. tumefaciens-mediated transformation to aid in the identification of gene functions in M. furfur, through both insertional mutagenesis and CRISPR/Cas9-mediated targeted gene deletion.

Activation of the Saccharomyces cerevisiae HO promoter is highly regulated, requiring the ordered recruitment of activators and coactivators and allowing production of only a few transcripts in mother cells within a short cell cycle window. We conducted genetic screens to identify the negative regulators of HO expression necessary to limit HO transcription. Known repressors of HO (Ash1 and Rpd3) were identified, as well as several additional chromatin-associated factors including the Hda1 histone deacetylase, the Isw2 chromatin remodeler, and the corepressor Tup1. We also identified clusters of HO promoter mutations that suggested roles for the Dot6/Tod6 (PAC site) and Ume6 repression pathways. We used ChIP assays with synchronized cells to validate the involvement of these factors and map the association of Ash1, Dot6, and Ume6 with the HO promoter to a brief window in the cell cycle between binding of the initial activating transcription factor and initiation of transcription. We found that Ash1 and Ume6 each recruit the Rpd3 histone deacetylase to HO, and their effects are additive. In contrast, Rpd3 was not recruited significantly to the PAC site, suggesting this site has a distinct mechanism for repression. Increases in HO expression and SWI/SNF recruitment were all additive upon loss of Ash1, Ume6, and PAC site factors, indicating the convergence of independent pathways for repression. Our results demonstrate that multiple protein complexes are important for limiting the spread of SWI/SNF-mediated nucleosome eviction across the HO promoter, suggesting that regulation requires a delicate balance of activities that promote and repress transcription.

Saccharomyces cerevisiae lives in boom and bust nutritional environments. Sophisticated regulatory systems have evolved to rapidly cope with these changes while preserving intracellular homeostasis. Target of Rapamycin Complex 1 (TorC1) is a serine/threonine kinase complex and a principal nitrogen-responsive regulator. TorC1 is activated by excess nitrogen and downregulated by limiting nitrogen. Two of TorC1’s many downstream targets are Gln3 and Gat1 (GATA-family transcription activators), whose localization and function are Nitrogen Catabolite Repression- (NCR-) sensitive. In nitrogen-replete environments, TorC1 is activated, thereby inhibiting the PTap42-Sit4 and PTap42-PP2A (Pph21/Pph22-Tpd3, Pph21,22-Rts1/Cdc55) phosphatase complexes. Gln3 is phosphorylated and sequestered in the cytoplasm, and NCR-sensitive transcription is repressed. In nitrogen-limiting conditions, TorC1 is downregulated and PTap42-Sit4 and PTap42-PP2A are active. They dephosphorylate Gln3, which dissociates from Ure2, relocates to the nucleus, and activates transcription. A paradoxical observation, however, led us to suspect that Gln3 control was more complex than appreciated, i.e., Sit4 dephosphorylates Gln3 more in excess than in limiting nitrogen conditions. This paradox motivated us to reinvestigate the roles of these phosphatases in Gln3 regulation. We discovered that: (i) Sit4 and PP2A actively function both in conditions where TorC1 is activated and in those where it is downregulated; (ii) nuclear Gln3 is more highly phosphorylated than when it is sequestered in the cytoplasm; (iii) in nitrogen-replete conditions, Gln3 relocates from the nucleus to the cytoplasm, where it is dephosphorylated by Sit4 and PP2A; and (iv) in nitrogen excess and limiting conditions, Sit4, PP2A, and Ure2 are all required to maintain cytoplasmic Gln3 in its dephosphorylated form.

Colorectal cancer is a complex disease driven by well-established mutations such as APC and other yet to be identified pathways. The GTPase Rab11 regulates endosomal protein trafficking, and previously we showed that loss of Rab11 caused intestinal inflammation and hyperplasia in mice and flies. To test the idea that loss of Rab11 may promote cancer progression, we have analyzed archival human patient tissues and observed that 51 out of 70 colon cancer tissues had lower Rab11 protein staining. By using the Drosophila midgut model, we have found that loss of Rab11 can lead to three changes that may relate to cancer progression. First is the disruption of enterocyte polarity based on staining of the FERM domain protein Coracle. Second is an increased proliferation due to an increased expression of the JAK-STAT pathway ligand Upd3. Third is an increased expression of ImpL2, which is an IGFBP7 homolog and can suppress metabolism. Furthermore, loss of Rab11 can act synergistically with the oncoprotein RasV12 to regulate these cancer-related phenotypes.

The target of rapamycin (TOR) pathway is an evolutionarily conserved signal transduction system that governs a plethora of eukaryotic biological processes, but its role in Cryptococcus neoformans remains elusive. In this study, we investigated the TOR pathway by functionally characterizing two Tor-like kinases, Tor1 and Tlk1, in C. neoformans. We successfully deleted TLK1, but not TOR1. TLK1 deletion did not result in any evident in vitro phenotypes, suggesting that Tlk1 is dispensable for the growth of C. neoformans. We demonstrated that Tor1, but not Tlk1, is essential and the target of rapamycin by constructing and analyzing conditionally regulated strains and sporulation analysis of heterozygous mutants in the diploid strain background. To further analyze the Tor1 function, we constructed constitutive TOR1 overexpression strains. Tor1 negatively regulated thermotolerance and the DNA damage response, which are two important virulence factors of C. neoformans. TOR1 overexpression reduced Mpk1 phosphorylation, which is required for cell wall integrity and thermoresistance, and Rad53 phosphorylation, which governs the DNA damage response pathway. Tor1 is localized to the cytoplasm, but enriched in the vacuole membrane. Phosphoproteomics and transcriptomics revealed that Tor1 regulates a variety of biological processes, including metabolic processes, cytoskeleton organization, ribosome biogenesis, and stress response. TOR inhibition by rapamycin caused actin depolarization in a Tor1-dependent manner. Finally, screening rapamycin-sensitive and -resistant kinase and transcription factor mutants revealed that the TOR pathway may crosstalk with a number of stress signaling pathways. In conclusion, our study demonstrates that a single Tor1 kinase plays pleiotropic roles in C. neoformans.

Innate immune responses protect organisms against various insults, but may lead to tissue damage when aberrantly activated. In higher organisms, cytoplasmic DNA can trigger inflammatory responses that can lead to tissue degeneration. Simpler metazoan models could shed new mechanistic light on how inflammatory responses to cytoplasmic DNA lead to pathologies. Here, we show that in a DNase II-defective Caenorhabditis elegans strain, persistent cytoplasmic DNA leads to systemic tissue degeneration and loss of tissue functionality due to impaired proteostasis. These pathological outcomes can be therapeutically alleviated by restoring protein homeostasis, either via ectopic induction of the ER unfolded protein response or N-acetylglucosamine treatment. Our results establish C. elegans as an ancestral metazoan model for studying the outcomes of inflammation-like conditions caused by persistent cytoplasmic DNA and provide insight into potential therapies for human conditions involving chronic inflammation.

The actomyosin network is involved in crucial cellular processes including morphogenesis, cell adhesion, apoptosis, proliferation, differentiation, and collective cell migration in Drosophila, Caenorhabditis elegans, and mammals. Here, we demonstrate that Drosophila larval blood stem-like progenitors require actomyosin activity for their maintenance. Genetic loss of the actomyosin network from progenitors caused a decline in their number. Likewise, the progenitor population increased upon sustained actomyosin activation via phosphorylation by Rho-associated kinase. We show that actomyosin positively regulates larval blood progenitors by controlling the maintenance factor Cubitus interruptus (Ci). Overexpression of the maintenance signal via a constitutively activated construct (ci.HA) failed to sustain Ci-155 in the absence of actomyosin components like Zipper (zip) and Squash (sqh), thus favoring protein kinase A (PKA)-independent regulation of Ci activity. Furthermore, we demonstrate that a change in cortical actomyosin assembly mediated by DE-cadherin modulates Ci activity, thereby determining progenitor status. Thus, loss of cell adhesion and downstream actomyosin activity results in desensitization of the progenitors to Hh signaling, leading to their differentiation. Our data reveal how cell adhesion and the actomyosin network cooperate to influence patterning, morphogenesis, and maintenance of the hematopoietic stem-like progenitor pool in the developing Drosophila hematopoietic organ.

Fibroblast growth factor (Fgf) signaling regulates many processes during development. In most cases, one tissue layer secretes an Fgf ligand that binds and activates an Fgf receptor (Fgfr) expressed by a neighboring tissue. Although studies have identified the roles of specific Fgf ligands during development, less is known about the requirements for the receptors. We have generated null mutations in each of the five fgfr genes in zebrafish. Considering the diverse requirements for Fgf signaling throughout development, and that null mutations in the mouse Fgfr1 and Fgfr2 genes are embryonic lethal, it was surprising that all zebrafish homozygous mutants are viable and fertile, with no discernable embryonic defect. Instead, we find that multiple receptors are involved in coordinating most Fgf-dependent developmental processes. For example, mutations in the ligand fgf8a cause loss of the midbrain-hindbrain boundary, whereas, in the fgfr mutants, this phenotype is seen only in embryos that are triple mutant for fgfr1a;fgfr1b;fgfr2, but not in any single or double mutant combinations. We show that this apparent fgfr redundancy is also seen during the development of several other tissues, including posterior mesoderm, pectoral fins, viscerocranium, and neurocranium. These data are an essential step toward defining the specific Fgfrs that function with particular Fgf ligands to regulate important developmental processes in zebrafish.

As multi-individual population-scale data become available, more complex modeling strategies are needed to quantify genome-wide patterns of nucleotide usage and associated mechanisms of evolution. Recently, the multivariate neutral Moran model was proposed. However, it was shown insufficient to explain the distribution of alleles in great apes. Here, we propose a new model that includes allelic selection. Our theoretical results constitute the basis of a new Bayesian framework to estimate mutation rates and selection coefficients from population data. We apply the new framework to a great ape dataset, where we found patterns of allelic selection that match those of genome-wide GC-biased gene conversion (gBGC). In particular, we show that great apes have patterns of allelic selection that vary in intensity—a feature that we correlated with great apes’ distinct demographies. We also demonstrate that the AT/GC toggling effect decreases the probability of a substitution, promoting more polymorphisms in the base composition of great ape genomes. We further assess the impact of GC-bias in molecular analysis, and find that mutation rates and genetic distances are estimated under bias when gBGC is not properly accounted for. Our results contribute to the discussion on the tempo and mode of gBGC evolution, while stressing the need for gBGC-aware models in population genetics and phylogenetics.
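
To see how selection changes substitution probabilities in a Moran process, the sketch below uses the standard closed-form fixation probability of a single new allele in a two-allele Moran model without mutation. This is a much simpler setting than the multivariate model with mutation and allelic selection described above. The function name and the relative-fitness parameterization are assumptions made for illustration.

```python
def moran_fixation_probability(relative_fitness, population_size):
    """Fixation probability of one mutant copy in a two-allele Moran process.

    relative_fitness is the mutant's fitness relative to the resident (r);
    the standard result is (1 - 1/r) / (1 - 1/r**N), and 1/N when r == 1.
    """
    if population_size < 1:
        raise ValueError("population_size must be at least 1")
    if relative_fitness <= 0:
        raise ValueError("relative_fitness must be positive")
    if relative_fitness == 1.0:
        # Neutral case: every copy is equally likely to be the ancestor.
        return 1.0 / population_size
    inverse = 1.0 / relative_fitness
    return (1.0 - inverse) / (1.0 - inverse ** population_size)


# A favoured allele fixes more often than a neutral one; a disfavoured one less.
for r in (0.9, 1.0, 1.1):
    print(r, moran_fixation_probability(r, 100))
```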

Understanding the relatedness of individuals within or between populations is a common goal in biology. Increasingly, relatedness features in genetic epidemiology studies of pathogens. These studies are relatively new compared to those in humans and other organisms, but are important for designing interventions and understanding pathogen transmission. Only recently have researchers begun to routinely apply relatedness to apicomplexan eukaryotic malaria parasites, and to date have used a range of different approaches on an ad hoc basis. Therefore, it remains unclear how to compare different studies and which measures to use. Here, we systematically compare measures based on identity-by-state (IBS) and identity-by-descent (IBD) using a globally diverse data set of malaria parasites, Plasmodium falciparum and P. vivax, and provide marker requirements for estimates based on IBD. We formally show that the informativeness of polyallelic markers for relatedness inference is maximized when alleles are equifrequent. Estimates based on IBS are sensitive to allele frequencies, which vary across populations and by experimental design. For portability across studies, we thus recommend estimates based on IBD. To generate estimates with errors below an arbitrary threshold of 0.1, we recommend ~100 polyallelic or 200 biallelic markers. Marker requirements are immediately applicable to haploid malaria parasites and other haploid eukaryotes. C.I.s facilitate comparison when different marker sets are used. This is the first attempt to provide rigorous analysis of the reliability of, and requirements for, relatedness inference in malaria genetic epidemiology. We hope it will provide a basis for statistically informed prospective study design and surveillance strategies.
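
The sensitivity of IBS to allele frequencies can be shown with a small sketch for haploid samples. It assumes a simple textbook relation: two samples match at a marker either because they are identical by descent (probability r) or by chance (sum of squared allele frequencies). That chance term is smallest when alleles are equifrequent, which gives an intuition for the informativeness result, although it is not the formal argument of the paper. All function names are illustrative.

```python
def ibs_fraction(sample_a, sample_b):
    """Fraction of markers at which two haploid genotypes carry the same allele."""
    if len(sample_a) != len(sample_b) or len(sample_a) == 0:
        raise ValueError("samples must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(sample_a, sample_b))
    return matches / len(sample_a)


def chance_match_probability(allele_freqs):
    """Probability that two unrelated haploid samples match by state: sum of p_i^2."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in allele_freqs)


def expected_ibs(relatedness, allele_freqs):
    """Expected IBS at one marker: r + (1 - r) * sum of p_i^2."""
    return relatedness + (1.0 - relatedness) * chance_match_probability(allele_freqs)


print(ibs_fraction("ACGT", "ACGA"))
# Same relatedness, different allele frequencies, different expected IBS.
print(expected_ibs(0.5, [0.25, 0.25, 0.25, 0.25]))
print(expected_ibs(0.5, [0.7, 0.1, 0.1, 0.1]))
```

Because the expected IBS shifts with the frequency spectrum even at fixed relatedness, IBS values are hard to compare across populations or marker panels, in line with the recommendation above to report IBD-based estimates.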

Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests for instance that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.

The outcome of selection on genetic variation depends on the geographic organization of individuals and populations as well as the organization of loci within the genome. Spatially variable selection between marine and freshwater habitats has had a significant and heterogeneous impact on patterns of genetic variation across the genome of threespine stickleback fish. When marine stickleback invade freshwater habitats, more than a quarter of the genome can respond to divergent selection, even in as little as 50 years. This process largely uses standing genetic variation that can be found ubiquitously at low frequency in marine populations, can be millions of years old, and is likely maintained by significant bidirectional gene flow. Here, we combine population genomic data of marine and freshwater stickleback from Cook Inlet, Alaska, with genetic maps of stickleback fish derived from those same populations to examine how linkage to loci under selection affects genetic variation across the stickleback genome. Divergent selection has had opposing effects on linked genetic variation on chromosomes from marine and freshwater stickleback populations: near loci under selection, marine chromosomes are depauperate of variation, while these same regions among freshwater genomes are the most genetically diverse. Forward genetic simulations recapitulate this pattern when different selective environments also differ in population structure. Lastly, dense genetic maps demonstrate that the interaction between selection and population structure may impact large stretches of the stickleback genome. These findings advance our understanding of how the structuring of populations across geography influences the outcomes of selection, and how the recombination landscape broadens the genomic reach of selection.

Divergent selection works when an allele establishes in the subpopulations in which it is adaptive, but not in the ones in which it is deleterious. While such a locally adaptive allele is maintained, the target locus of selection works as a genetic barrier to gene flow or a barrier locus. The genetic divergence (or FST) around the barrier locus can be maintained, while in other regions of the genome, genetic variation can be mixed by gene flow or migration. In this work, we consider theoretically the evolutionary process of a barrier locus, from its birth to stable preservation. Under a simple two-population model, we use a diffusion approach to obtain analytical expressions for the probability of initial establishment of a locally adaptive allele, the reduction of genetic variation due to the spread of the adaptive allele, and the process to the development of a sharp peak of divergence (genomic island of divergence). Our results will be useful to understanding how genomes evolve through local adaptation and divergent selection.
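
For readers unfamiliar with FST, the sketch below computes it at a single biallelic locus for two equally sized subpopulations, using the standard heterozygosity-based definition FST = (HT - HS) / HT. The function name and the allele frequencies are hypothetical and chosen only to contrast a barrier locus with a freely mixing region; the diffusion-based results described above are not reproduced here.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """F_ST at a biallelic locus for two equally sized subpopulations.

    p1 and p2 are the frequencies of the same allele in each subpopulation.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # locus is monomorphic overall; no divergence to measure
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


# Hypothetical frequencies: a locally adaptive allele near fixation in one
# subpopulation and rare in the other, versus a region homogenized by gene flow.
barrier_locus = fst_two_populations(0.95, 0.05)
mixed_region = fst_two_populations(0.52, 0.48)
print(barrier_locus, mixed_region)
```

Under these illustrative frequencies the barrier locus gives an FST near 0.81, while the mixed region gives a value close to zero, which is the contrast that produces a sharp peak of divergence.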

Maternally transmitted Wolbachia infect about half of insect species, yet the predominant mode(s) of Wolbachia acquisition remains uncertain. Species-specific associations could be old, with Wolbachia and hosts codiversifying (i.e., cladogenic acquisition), or relatively young and acquired by horizontal transfer or introgression. The three Drosophila yakuba-clade hosts [(D. santomea, D. yakuba) D. teissieri] diverged ~3 MYA and currently hybridize on the West African islands Bioko and São Tomé. Each species is polymorphic for nearly identical Wolbachia that cause weak cytoplasmic incompatibility (CI), i.e., reduced egg hatch when uninfected females mate with infected males. D. yakuba-clade Wolbachia are closely related to wMel, globally polymorphic in D. melanogaster. We use draft Wolbachia and mitochondrial genomes to demonstrate that D. yakuba-clade phylogenies for Wolbachia and mitochondria tend to follow host nuclear phylogenies. However, roughly half of D. santomea individuals, sampled both inside and outside of the São Tomé hybrid zone, have introgressed D. yakuba mitochondria. Both mitochondria and Wolbachia possess far more recent common ancestors than the bulk of the host nuclear genomes, precluding cladogenic Wolbachia acquisition. General concordance of Wolbachia and mitochondrial phylogenies suggests that horizontal transmission is rare, but varying relative rates of molecular divergence complicate chronogram-based statistical tests. Loci that cause CI in wMel are disrupted in D. yakuba-clade Wolbachia, but a second set of loci predicted to cause CI is located in the same WO prophage region. These alternative CI loci seem to have been acquired horizontally from distantly related Wolbachia, with transfer mediated by flanking Wolbachia-specific ISWpi1 transposons.

Present-day humans outside Africa descend mainly from a single expansion out ~50,000–70,000 years ago, but many details of this expansion remain unclear, including the history of the male-specific Y chromosome at this time. Here, we reinvestigate a rare deep-rooting African Y-chromosomal lineage by sequencing the whole genomes of three Nigerian men described in 2003 as carrying haplogroup DE* Y chromosomes, and analyzing them in the context of a calibrated worldwide Y-chromosomal phylogeny. We confirm that these three chromosomes do represent a deep-rooting DE lineage, branching close to the DE bifurcation, but place them on the D branch as an outgroup to all other known D chromosomes, and designate the new lineage D0. We consider three models for the expansion of Y lineages out of Africa ~50,000–100,000 years ago, incorporating migration back to Africa where necessary to explain present-day Y-lineage distributions. Considering both the Y-chromosomal phylogenetic structure incorporating the D0 lineage, and published evidence for modern humans outside Africa, the most favored model involves an origin of the DE lineage within Africa with D0 and E remaining there, and migration out of the three lineages (C, D, and FT) that now form the vast majority of non-African Y chromosomes. The exit took place 50,300–81,000 years ago (latest date for FT lineage expansion outside Africa – earliest date for the D/D0 lineage split inside Africa), and most likely 50,300–59,400 years ago (considering Neanderthal admixture). This work resolves a long-running debate about Y-chromosomal out-of-Africa/back-to-Africa migrations, and provides insights into the out-of-Africa expansion more generally.

Mitochondrial DNA (mtDNA) mutations cause severe congenital diseases but may also be associated with healthy aging. mtDNA is stochastically replicated and degraded, and exists within organelles which undergo dynamic fusion and fission. The role of the resulting mitochondrial networks in the time evolution of the cellular proportion of mutated mtDNA molecules (heteroplasmy), and cell-to-cell variability in heteroplasmy (heteroplasmy variance), remains incompletely understood. Heteroplasmy variance is particularly important since it modulates the number of pathological cells in a tissue. Here, we provide the first wide-reaching theoretical framework which bridges mitochondrial network and genetic states. We show that, under a range of conditions, the (genetic) rate of increase in heteroplasmy variance and de novo mutation are proportionally modulated by the (physical) fraction of unfused mitochondria, independently of the absolute fission–fusion rate. In the context of selective fusion, we show that intermediate fusion:fission ratios are optimal for the clearance of mtDNA mutants. Our findings imply that modulating network state, mitophagy rate, and copy number to slow down heteroplasmy dynamics when mean heteroplasmy is low could have therapeutic advantages for mitochondrial disease and healthy aging.
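
To make the two quantities concrete, the sketch below computes per-cell heteroplasmy, its mean, and its variance across cells for two hypothetical tissues with the same mean heteroplasmy. The copy numbers and the pathology threshold of 0.5 are assumptions made purely for illustration; they are not values from the study, and the network model itself is not implemented here.

```python
from statistics import mean, pvariance
from typing import List, Tuple


def heteroplasmy(wild_type: int, mutant: int) -> float:
    """Proportion of mutated mtDNA molecules in a single cell."""
    total = wild_type + mutant
    if total == 0:
        raise ValueError("cell contains no mtDNA molecules")
    return mutant / total


def summarize(cells: List[Tuple[int, int]], threshold: float):
    """Return (mean heteroplasmy, heteroplasmy variance, cells above threshold)."""
    h = [heteroplasmy(w, m) for w, m in cells]
    above = sum(1 for value in h if value > threshold)
    return mean(h), pvariance(h), above


# Hypothetical (wild-type, mutant) copy numbers per cell.
uniform_tissue = [(85, 15), (85, 15), (85, 15), (85, 15)]
variable_tissue = [(100, 0), (100, 0), (100, 0), (40, 60)]

HYPOTHETICAL_THRESHOLD = 0.5  # assumed pathology threshold, illustration only

uniform_summary = summarize(uniform_tissue, HYPOTHETICAL_THRESHOLD)
variable_summary = summarize(variable_tissue, HYPOTHETICAL_THRESHOLD)
print(uniform_summary)
print(variable_summary)
```

Both tissues have a mean heteroplasmy of 0.15, but only the high-variance tissue contains a cell above the assumed threshold, which is the sense in which heteroplasmy variance modulates the number of pathological cells.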

Domestic animals are adapted to conditions vastly different from those of their wild ancestors, and this is particularly true for their diets. The most numerous of all domestic species, the chicken, originated from the Red Junglefowl (RJF), a native of subtropical forests in Southeast Asia. Surprisingly, however, in domestic chicken breeds, a common haplotype of the β-carotene oxygenase 2 (BCO2) gene, which is involved in carotenoid metabolism, is introgressed from a related species, the Gray Junglefowl, and has been under strong selective pressure during domestication. This suggests that a hybridization event may have conferred a fitness advantage on chickens carrying the derived allele. To investigate the possible biological function of the introgressed BCO2 allele in chicken, we introgressed the ancestral BCO2 allele into domestic White Leghorn chickens. We measured gene expression as well as carotenoid accumulation in skin and eggs of chickens carrying either the ancestral or the derived BCO2 allele. The derived haplotype was associated with down-regulation of BCO2 in skin, muscle, and adipose tissue, but not in liver or duodenum, indicating that carotenoid accumulation occurred in the tissues with reduced gene expression. Most importantly, we found that hens with the derived BCO2 genotype were capable of allocating stored carotenoids to their eggs, suggesting a functional benefit through buffering any shortage in the diet during egg production. It is also of interest that loss-of-function mutations in the BCO2 gene are prevalent in other domesticates, including cows, rabbits, and sheep. Given the importance of carotenoids in development, reproduction, and immunity, derived BCO2 alleles may provide a general mechanism by which multiple domestic species cope with a higher demand for carotenoids when the diet is short of them.

Bleomycin is a powerful chemotherapeutic drug used to treat a variety of cancers. However, individual patients vary in their responses to bleomycin. The identification of genetic differences that underlie this response variation could improve treatment outcomes by tailoring bleomycin dosages to each patient. We used the model organism Caenorhabditis elegans to identify genetic determinants of bleomycin-response differences by performing linkage mapping on recombinants derived from a cross between the laboratory strain (N2) and a wild strain (CB4856). This approach identified a small genomic region on chromosome V that underlies bleomycin-response variation. Using near-isogenic lines, and strains with CRISPR-Cas9 mediated deletions and allele replacements, we discovered that a novel nematode-specific gene (scb-1) is required for bleomycin resistance. Although the mechanism by which this gene causes variation in bleomycin responses is unknown, we suggest that a rare variant present in the CB4856 strain might cause differences in the potential stress-response function of scb-1 between the N2 and CB4856 strains, thereby leading to differences in bleomycin resistance.



Genetic Markers

You know how an interstate map can guide you from one city to another. A genetic map is like that, and it guides researchers toward their target gene. Just as there are landmarks in interstate maps, there also are landmarks in genetic maps known as genetic markers...



All of the genes carried by a single gamete; the DNA content of an individual, which in humans includes all 44 autosomes, 2 sex chromosomes, and the mitochondrial DNA.