Using gene map science to evaluate the genetic map and eliminate disease

Genetic News

The Genetics Society of America (GSA) Medal recognizes researchers who have made outstanding contributions to the field of genetics in the past 15 years. The 2018 GSA Medal has been awarded to Mariana Wolfner of Cornell University for her work on reproductive processes that occur around the time of fertilization. This includes characterization of seminal proteins in Drosophila melanogaster, which has uncovered a wealth of information about sexual conflict in evolution.

A long-standing question in biology concerns the genetic mechanisms by which two sexes can evolve (botanists call this the dioecious condition and zoologists call it gonochory) from a functionally ancestral hermaphroditic state (without separate sexes). In 1932, H. J. Muller, one of the great 20th century geneticists but also a fine evolutionary biologist, pointed out that two mutations were necessary. It was therefore puzzling that sex determination often involves a single genetic locus. Muller believed that the evolution of a single-gene system was possible, because maize geneticists had synthesized a single-gene system with separate sexes. However, this system is highly artificial, requiring geneticists to actively eliminate the wild-type allele at one of the two genes involved. This genetic system cannot therefore explain the natural evolution of dioecy. In 1958, Westergaard reviewed studies from a diversity of flowering plants, and showed that the genetics of natural sex determination in plants does not support the maize system. Instead, the genetic results pointed to a model involving two separate factors, with close linkage creating a single genetic locus. Moreover, Westergaard also pointed out that a two-gene model offers a natural explanation for the evolution of suppressed recombination between sex chromosome pairs. Studying plants allowed genetic analyses of the early steps in the evolution of dioecy, using dioecious species that evolved recently from species without separate sexes, whereas Muller failed to fully understand such evolutionary changes because he focused on animals, where later changes have often happened and obscured the early stages.

RNA viruses are diverse, abundant, and rapidly evolving. Genetic data have been generated from virus populations since the late 1970s and used to understand their evolution, emergence, and spread, culminating in the generation and analysis of many thousands of viral genome sequences. Despite this wealth of data, evolutionary genetics has played a surprisingly small role in our understanding of virus evolution. Instead, studies of RNA virus evolution have been dominated by two very different perspectives, the experimental and the comparative, that have largely been conducted independently and sometimes antagonistically. Here, we review the insights that these two approaches have provided over the last 40 years. We show that experimental approaches using in vitro and in vivo laboratory models are largely focused on short-term intrahost evolutionary mechanisms, and may not always be relevant to natural systems. In contrast, the comparative approach relies on the phylogenetic analysis of natural virus populations, usually considering data collected over multiple cycles of virus–host transmission, but is divorced from the causative evolutionary processes. To truly understand RNA virus evolution it is necessary to meld experimental and comparative approaches within a single evolutionary genetic framework, and to link viral evolution at the intrahost scale with that which occurs over both epidemiological and geological timescales. We suggest that the impetus for this new synthesis may come from methodological advances in next-generation sequencing and metagenomics.

Triacylglycerol (TAG) is the most important caloric source with respect to energy homeostasis in animals. In addition to its evolutionarily conserved importance as an energy source, TAG turnover is crucial to the metabolism of structural and signaling lipids. These neutral lipids are also key players in development and disease. Here, we review the metabolism of TAG in the Drosophila model system. Recently, the fruit fly has attracted renewed attention in research due to the unique experimental approaches it affords in studying the tissue-autonomous and interorgan regulation of lipid metabolism in vivo. Following an overview of the systemic control of fly body fat stores, we will cover lipid anabolic, enzymatic, and regulatory processes, which begin with the dietary lipid breakdown and de novo lipogenesis that results in lipid droplet storage. Next, we focus on lipolytic processes, which mobilize storage TAG to make it metabolically accessible as either an energy source or as a building block for biosynthesis of other lipid classes. Since the buildup and breakdown of fat involves various organs, we highlight avenues of lipid transport, which are at the heart of functional integration of organismic lipid metabolism. Finally, we draw attention to some "missing links" in basic neutral lipid metabolism and conclude with a perspective on how fly research can be exploited to study functional metabolic roles of diverse lipids.

Thousands of maize landraces are stored in seed banks worldwide. Doubled-haploid libraries (DHL) produced from landraces harness their rich genetic diversity for future breeding. We investigated the prospects of genomic prediction (GP) for line per se performance in DHL from six European landraces and 53 elite flint (EF) lines by comparing four scenarios: GP within a single library (sL); GP between pairs of libraries (LwL); and GP among combined libraries, either including (cLi) or excluding (cLe) lines from the training set (TS) that belong to the same DHL as the prediction set. For scenario sL, with N = 50 lines in the TS, the prediction accuracy among seven agronomic traits varied from –0.53 to 0.57 for the DHL and reached up to 0.74 for the EF lines. For LwL, prediction accuracy was close to zero for all DHL and traits. Whereas scenario cLi showed improved prediction accuracy compared to sL, accuracy for cLe remained at the low level observed for LwL. Forecasting prediction accuracy with deterministic equations yielded inflated values compared to empirical estimates for the DHL, but conserved the ranking. In conclusion, GP is promising within DHL, but large TS sizes (N > 100) are needed to achieve decent prediction accuracy because LD between QTL and markers is the primary source of information that can be exploited by GP. Since production of DHL from landraces is expensive, we recommend GP only for very large DHL produced from a few highly preselected landraces.
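As a rough illustration of how a prediction accuracy like those quoted above might be computed, the sketch below assumes that accuracy is measured as the Pearson correlation between predicted and observed line performance in the prediction set. That is a common convention in genomic prediction, but the abstract does not state the exact definition used, so treat it as an assumption. The function name and the toy values are hypothetical.

```python
from math import sqrt

def pearson(predicted, observed):
    """Pearson correlation between two equal-length lists of numbers.

    Assumption: prediction accuracy is taken to be this correlation,
    computed over the lines in the prediction set.
    """
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_p * var_o)

# Hypothetical toy data: predicted vs. observed trait values for five lines.
predicted = [4.1, 3.8, 5.0, 4.4, 3.5]
observed = [4.0, 3.9, 4.7, 4.6, 3.2]
accuracy = pearson(predicted, observed)
print(round(accuracy, 2))
```

A negative value, such as the lower bound reported for scenario sL, would mean that the predictions rank lines in roughly the opposite order to their observed performance.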

The number of chromosomes carried by an individual species is one of its defining characteristics. Some species, however, can also carry supernumerary chromosomes referred to as B chromosomes. B chromosomes were recently identified in a laboratory stock of Drosophila melanogaster, an established model organism with a wealth of genetic and genomic resources, enabling us to subject them to extensive molecular analysis. We isolated the B chromosomes by pulsed-field gel electrophoresis and determined their composition through next-generation sequencing. Although these B chromosomes carry no known euchromatic sequence, they are rich in transposable elements and long arrays of short nucleotide repeats, the most abundant being the uncharacterized AAGAT satellite repeat. Fluorescent in situ hybridization on metaphase chromosome spreads revealed this repeat is located on chromosome 4, strongly suggesting the origin of the B chromosomes is chromosome 4. Cytological and quantitative comparisons of signal intensity between chromosome 4 and the B chromosomes support the hypothesis that the structure of the B chromosome is an isochromosome. We also report the identification of a new B chromosome variant in a related laboratory stock. This B chromosome has a repeat signature similar to that of the original but is smaller and much less prevalent. We examined additional stocks with similar genotypes and did not find B chromosomes, but did find these stocks lacked the AAGAT satellite repeat. Our molecular characterization of D. melanogaster B chromosomes is the first step toward understanding how supernumerary chromosomes arise from essential chromosomes and what may be necessary for their stable inheritance.

Meiotic recombination is a major driver of genome evolution by creating new genetic combinations. To probe the factors driving variability of meiotic recombination, we used a high-throughput method to measure recombination rates in hybrids between SK1 and a total of 26 Saccharomyces cerevisiae strains from different geographic origins and habitats. Fourteen intervals were monitored for each strain, covering chromosomes VI and XI entirely, and part of chromosome I. We found an average number of crossovers per chromosome ranging between 1.0 and 9.5 across strains ("domesticated" or not), which is higher than the average between 0.5 and 1.5 found in most organisms. In the different intervals analyzed, recombination showed up to ninefold variation across strains but global recombination landscapes along chromosomes varied less. We also built an incomplete diallel experiment to measure recombination rates in one region of chromosome XI in 10 different crosses involving five parental strains. Our overall results indicate that recombination rate is increasingly positively correlated with sequence similarity between homologs (i) in DNA double-strand-break-rich regions within intervals, (ii) in entire intervals, and (iii) at the whole genome scale. Therefore, these correlations cannot be explained by cis effects only. We also estimated that cis and trans effects explained 38 and 17%, respectively, of the variance of recombination rate. In addition, by using a quantitative genetics analysis, we identified an inbreeding effect that reduces recombination rate in homozygous genotypes, while other interaction effects (specific combining ability) or additive effects (general combining ability) are found to be weak. Finally, we measured significant crossover interference in some strains, and interference intensity was positively correlated with crossover number.

Cells expend a large amount of energy to maintain their DNA sequence. DNA repair pathways, cell cycle checkpoint activation, proofreading polymerases, and chromatin structure are ways in which the cell minimizes changes to the genome. During replication, the DNA-damage tolerance pathway allows the replication forks to bypass damage on the template strand. This avoids prolonged replication fork stalling, which can contribute to genome instability. The DNA-damage tolerance pathway includes two subpathways: translesion synthesis and template switch. Post-translational modification of PCNA and the histone tails, cell cycle phase, and local DNA structure have all been shown to influence subpathway choice. Chromatin architecture contributes to maintaining genome stability by providing physical protection of the DNA and by regulating DNA-processing pathways. As such, chromatin-binding factors have been implicated in maintaining genome stability. Using Saccharomyces cerevisiae, we examined the role of Spn1 (Suppresses postrecruitment gene number 1), a chromatin-binding and transcription elongation factor, in DNA-damage tolerance. Expression of a mutant allele of SPN1 results in increased resistance to the DNA-damaging agent methyl methanesulfonate, lower spontaneous and damage-induced mutation rates, along with increased chronological life span. We attribute these effects to an increased usage of the template switch branch of the DNA-damage tolerance pathway in the spn1 strain. This provides evidence for a role of wild-type Spn1 in promoting genome instability, as well as having ties to overcoming replication stress and contributing to chronological aging.

Pathological mutations involving noncoding microsatellite repeats are typically located near promoters in CpG islands and are coupled with extensive repeat instability when sufficiently long. What causes these regions to be prone to repeat instability is not fully understood. There is a general consensus that instability results from the induction of unusual structures in the DNA by the repeats as a consequence of mispairing between complementary strands. In addition, there is some evidence that repeat instability is mediated by RNA transcription through the formation of three-stranded nucleic structures composed of persistent DNA:RNA hybrids, concomitant with single-strand DNA displacements (R-loops). Using human embryonic stem cells with wild-type and repeat expanded alleles in the FMR1 (CGGs) and C9orf72 (GGGGCCs) genes, we show that these loci constitute preferential sites (hotspots) for DNA unpairing. When R-loops are formed, DNA unpairing is more extensive, and is coupled with the interruptions of double-strand structures by the nontranscribing (G-rich) DNA strand. These interruptions are likely to reflect unusual structures in the DNA that drive repeat instability when the G-rich repeats considerably expand. Further, we demonstrate that when the CGGs in FMR1 are hyper-methylated and transcriptionally inactive, local DNA unpairing is abolished. Our study thus takes one more step toward the identification of dynamic, unconventional DNA structures across the G-rich repeats at FMR1 and C9orf72 disease-associated loci.

Laboratory baker’s yeast strains bearing an incompatible combination of MLH1 and PMS1 mismatch repair alleles are mutators that can adapt more rapidly to stress, but do so at the cost of long-term fitness. We identified 18 baker’s yeast isolates from 1011 surveyed that contain the incompatible MLH1-PMS1 genotype in a heterozygous state. Surprisingly, the incompatible combination from two human clinical heterozygous diploid isolates, YJS5845 and YJS5885, contains the exact MLH1 (S288c-derived) and PMS1 (SK1-derived) open reading frames originally shown to confer incompatibility. While these isolates were nonmutators, their meiotic spore clone progeny displayed mutation rates in a DNA slippage assay that varied over a 340-fold range. This range was 30-fold higher than that observed between compatible and incompatible combinations of laboratory strains. Genotyping analysis indicated that MLH1-PMS1 incompatibility was the major driver of mutation rate in the isolates. The variation in the mutation rate of incompatible spore clones could be due to background suppressors and enhancers, as well as aneuploidy seen in the spore clones. Our data are consistent with the observed variance in mutation rate contributing to adaptation to stress conditions (e.g., in a human host) through the acquisition of beneficial mutations, with high mutation rates leading to long-term fitness costs that are buffered by mating or eliminated through natural selection.

Splicing of precursor messenger RNAs (pre-mRNAs) is an essential step in the expression of most eukaryotic genes. Both constitutive splicing and alternative splicing, which produces multiple messenger RNA (mRNA) isoforms from a single primary transcript, are modulated by reversible protein phosphorylation. Although the plant splicing machinery is known to be a target for phosphorylation, the protein kinases involved remain to be fully defined. We report here the identification of pre-mRNA processing 4 (PRP4) KINASE A (PRP4KA) in a forward genetic screen based on an alternatively spliced GFP reporter gene in Arabidopsis thaliana (Arabidopsis). Prp4 kinase is the first spliceosome-associated kinase shown to regulate splicing in fungi and mammals but it has not yet been studied in plants. In the same screen we identified mutants defective in SAC3A, a putative mRNA export factor that is highly coexpressed with PRP4KA in Arabidopsis. Whereas the sac3a mutants appear normal, the prp4ka mutants display a pleiotropic phenotype featuring atypical rosettes, late flowering, tall final stature, reduced branching, and lowered seed set. Analysis of RNA-sequencing data from prp4ka and sac3a mutants identified widespread and partially overlapping perturbations in alternative splicing in the two mutants. Quantitative phosphoproteomic profiling of a prp4ka mutant detected phosphorylation changes in several serine/arginine-rich proteins, which regulate constitutive and alternative splicing, and other splicing-related factors. Tests of PRP4KB, the paralog of PRP4KA, indicated that the two genes are not functionally redundant. The results demonstrate the importance of PRP4KA for alternative splicing and plant phenotype, and suggest that PRP4KA may influence alternative splicing patterns by phosphorylating a subset of splicing regulators.

Transgenerational epigenetic inheritance (TEI) is the inheritance of epigenetic information for two or more generations. In most cases, TEI is limited to a small number of generations (two to three). The short-term nature of TEI could be set by innate biochemical limitations to TEI or by genetically encoded systems that actively limit TEI. In Caenorhabditis elegans, double-stranded RNA (dsRNA)-mediated gene silencing [RNAi (RNA interference)] can be inherited (termed RNAi inheritance or RNA-directed TEI). To identify systems that might actively limit RNA-directed TEI, we conducted a forward genetic screen for factors whose mutation enhanced RNAi inheritance. This screen identified the gene heritable enhancer of RNAi (heri-1), whose mutation causes RNAi inheritance to last longer (> 20 generations) than normal. heri-1 encodes a protein with a chromodomain, and a kinase homology domain that is expressed in germ cells and localizes to nuclei. In C. elegans, a nuclear branch of the RNAi pathway [termed the nuclear RNAi or NRDE (nuclear RNA defective) pathway] promotes RNAi inheritance. We find that heri-1(–) animals have defects in spermatogenesis that are suppressible by mutations in the nuclear RNAi Argonaute (Ago) HRDE-1, suggesting that HERI-1 might normally act in sperm progenitor cells to limit nuclear RNAi and/or RNAi inheritance. Consistent with this idea, we find that the NRDE nuclear RNAi pathway is hyperresponsive to experimental RNAi treatments in heri-1 mutant animals. Interestingly, HERI-1 binds to genes targeted by RNAi, suggesting that HERI-1 may have a direct role in limiting nuclear RNAi and, therefore, RNAi inheritance. Finally, the recruitment of HERI-1 to chromatin depends upon the same factors that drive cotranscriptional gene silencing, suggesting that the generational perdurance of RNAi inheritance in C. elegans may be set by competing pro- and antisilencing outputs of the nuclear RNAi machinery.

Protein isoprenylation targets a subset of COOH-terminal Cxxx tetrapeptide sequences that has been operationally defined as a CaaX motif. The specificity of the farnesyl transferase toward each of the possible 8000 combinations of Cxxx sequences, however, remains largely unresolved. In part, it has been difficult to consolidate results stemming from in vitro and in silico approaches that yield a wider array of prenylatable sequences relative to those known in vivo. We have investigated whether this disconnect results from the multistep complexity of post-translational modification that occurs in vivo to CaaX proteins. For example, the Ras GTPases undergo isoprenylation followed by additional proteolysis and carboxymethylation events at the COOH-terminus. By contrast, Saccharomyces cerevisiae Hsp40 Ydj1p is isoprenylated but not subject to additional modification. In fact, additional modifications are detrimental to Ydj1p activity in vivo. We have taken advantage of the properties of Ydj1p and a Ydj1p-dependent growth assay to identify sequences that permit Ydj1p isoprenylation in vivo while simultaneously selecting against nonprenylatable and more extensively modified sequences. The recovered sequences are largely nonoverlapping with those previously identified using an in vivo Ras-based yeast reporter. Moreover, most of the sequences are not readily predicted as isoprenylation targets by existing prediction algorithms. Our results reveal that the yeast CaaX-type prenyltransferases can utilize a range of sequence combinations that extend beyond the traditional constraints for CaaX proteins, which implies that more proteins may be isoprenylated than previously considered.
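The figure of 8000 possible Cxxx sequences follows from simple counting: a fixed cysteine followed by three positions, each of which can hold any of the 20 standard amino acids, gives 20 × 20 × 20 = 8000 tetrapeptides. The sketch below enumerates them; the function and variable names are illustrative and do not come from the study.

```python
from itertools import product

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def cxxx_sequences():
    """Yield every COOH-terminal tetrapeptide of the form C-x-x-x."""
    for tail in product(AMINO_ACIDS, repeat=3):
        yield "C" + "".join(tail)

all_cxxx = list(cxxx_sequences())
print(len(all_cxxx))  # 8000, i.e. 20 ** 3
```

A screen such as the one described would, in effect, partition this set into sequences that support isoprenylation in vivo and those that do not.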

Dealing with physiological stress is a necessity for all organisms, and the pathways charged with this task are highly conserved in Metazoa. Accumulating evidence highlights cell-nonautonomous activation as an important mode of integrating stress responses at the organism level. Work in Caenorhabditis elegans highlighted the importance of such regulation for the unfolded protein response (UPR) and for gene expression downstream of the longevity-associated transcription factor DAF-16. Here we describe a role for the JNK homolog KGB-1 in cell-nonautonomous regulation of these two response modules. KGB-1 protects developing larvae from heavy metals and from protein folding stress (which we found to be independent of canonical UPR pathways), but sensitizes adults to the same stress, further shortening life span under normal conditions. This switch is associated with age-dependent antagonistic regulation of DAF-16. Using transgenic tissue-specific KGB-1 expression or tissue-specific KGB-1 activation we examined the contributions of KGB-1 to gene regulation, stress resistance, and life span. While cell-autonomous contributions were observed, particularly in the epidermis, cell-nonautonomous contributions of neuronal KGB-1 (and also in muscle) were effective in driving intestinal gene induction, age-dependent regulation of intestinal DAF-16, and stress resistance, and did not require KGB-1 expression in the target tissue. Additional genetic analyses revealed a requirement for UNC-13 in mediating neuronal contributions, indicating involvement of neurotransmission. Our results expand the role of KGB-1 in stress responses from providing local cellular protection to integrating stress responses at the level of the whole organism.

Animals have evolved critical mechanisms to maintain cellular and organismal proteostasis during development, disease, and exposure to environmental stressors. The Unfolded Protein Response (UPR) is a conserved pathway that senses and responds to the accumulation of misfolded proteins in the endoplasmic reticulum (ER) lumen. We have previously demonstrated that the IRE-1-XBP-1 branch of the UPR is required to maintain Caenorhabditis elegans ER homeostasis during larval development in the presence of pathogenic Pseudomonas aeruginosa. In this study, we identify loss-of-function mutations in four conserved transcriptional regulators that suppress the larval lethality of xbp-1 mutant animals caused by immune activation in response to infection by pathogenic bacteria: FKH-9, a forkhead family transcription factor; ARID-1, an ARID/Bright domain-containing transcription factor; HCF-1, a transcriptional regulator that associates with histone modifying enzymes; and SIN-3, a subunit of a histone deacetylase complex. Further characterization of FKH-9 suggests that loss of FKH-9 enhances resistance to the ER toxin tunicamycin and results in enhanced ER-associated degradation (ERAD). Increased ERAD activity of fkh-9 loss-of-function mutants is accompanied by a diminished capacity to degrade cytosolic proteasomal substrates and a corresponding increased sensitivity to the proteasomal inhibitor bortezomib. Our data underscore how the balance between ER and cytosolic proteostasis can be influenced by compensatory activation of ERAD during the physiological ER stress of infection and immune activation.

The notoriety of the small GTPase Ras as the most mutated oncoprotein has led to a well-characterized signaling network largely conserved across metazoans. Yet the role of its close relative Rap1 (Ras Proximal), which shares 100% identity between their core effector binding sequences, remains unclear. A long-standing controversy in the field is whether Rap1 also functions to activate the canonical Ras effector, the S/T kinase Raf. We used the developmentally simpler Caenorhabditis elegans, which lacks the extensive paralog redundancy of vertebrates, to examine the role of RAP-1 in two distinct LET-60/Ras-dependent cell fate patterning events: induction of 1° vulval precursor cell (VPC) fate and of the excretory duct cell. Fluorescence-tagged endogenous RAP-1 is localized to plasma membranes and is expressed ubiquitously, with even expression levels across the VPCs. RAP-1 and its activating GEF PXF-1 function cell autonomously and are necessary for maximal induction of 1° VPCs. Critically, mutationally activated endogenous RAP-1 is sufficient both to induce ectopic 1°s and duplicate excretory duct cells. Like endogenous RAP-1, before induction GFP expression from the pxf-1 promoter is uniform across VPCs. However, unlike endogenous RAP-1, after induction GFP expression is increased in presumptive 1°s and decreased in presumptive 2°s. We conclude that RAP-1 is a positive regulator that promotes Ras-dependent inductive fate decisions. We hypothesize that PXF-1 activation of RAP-1 serves as a minor parallel input into the major LET-60/Ras signal through LIN-45/Raf.

Body size is a tightly regulated phenotype in metazoans that depends on both intrinsic and extrinsic factors. While signaling pathways are known to control organ and body size, the downstream effectors that mediate their effects remain poorly understood. In the nematode Caenorhabditis elegans, a Bone Morphogenetic Protein (BMP)-related signaling pathway is the major regulator of growth and body size. We investigated the transcriptional network through which the BMP pathway regulates body size and identified cuticle collagen genes as major effectors of growth control. We demonstrate that cuticle collagens can act as positive regulators (col-41), negative regulators (col-141), or dose-sensitive regulators (rol-6) of body size. Moreover, we find a requirement of BMP signaling for stage-specific expression of cuticle collagen genes. We show that the Smad signal transducers directly bind conserved Smad-binding elements in regulatory regions of col-141 and col-142, but not of col-41. Hence, cuticle collagen genes may be directly and indirectly regulated via the BMP pathway. Our work thus connects a conserved signaling pathway with its critical downstream effectors, advancing insight into how body size is specified. Since collagen mutations and misregulation are implicated in numerous human genetic disorders and injury sequelae, understanding how collagen gene expression is regulated has broad implications.

An essential characteristic of sleep is heightened arousal threshold, with decreased behavioral response to external stimuli. The molecular and cellular mechanisms underlying arousal threshold changes during sleep are not fully understood. We report that loss of UNC-7 or UNC-9 innexin function dramatically reduced sleep and decreased arousal threshold during developmentally timed sleep in Caenorhabditis elegans. UNC-7 function was required in premotor interneurons and UNC-9 function was required in motor neurons in this paradigm. Simultaneous transient overexpression of UNC-7 and UNC-9 was sufficient to induce anachronistic sleep in adult animals. Moreover, loss of UNC-7 or UNC-9 suppressed the increased sleep of EGL-4 gain-of-function animals, which have increased cyclic-GMP–dependent protein kinase activity. These results suggest C. elegans gap junctions may act downstream of previously identified sleep regulators. In other paradigms, the NCA cation channels act upstream of gap junctions. Consistent with this, diminished NCA channel activity in C. elegans robustly increased arousal thresholds during sleep bouts in L4-to-adult developmentally timed sleep. Total time in sleep bouts was only modestly increased in animals lacking NCA channel auxiliary subunit UNC-79, whereas increased channel activity dramatically decreased sleep. Loss of EGL-4 or innexin proteins suppressed UNC-79 loss-of-function sleep and arousal defects. In Drosophila, the ion channel narrow abdomen, an ortholog of the C. elegans NCA channels, drives pigment dispersing factor (PDF) neuropeptide release, regulating circadian behavior. However, in C. elegans, we found that loss of the PDF receptor PDFR-1 did not suppress gain-of-function sleep defects, suggesting an alternative downstream pathway. This study emphasizes the conservation and importance of neuronal activity modulation during sleep, and unequivocally demonstrates that gap junction function is critical for normal sleep.

The plant circadian clock allows the synchronization of internal physiological responses to match the predicted environment. HSP90.2 is a molecular chaperone that has been previously described as required for the proper functioning of the Arabidopsis oscillator under both ambient and warm temperatures. Here, we have characterized the circadian phenotype of the hsp90.2-3 mutant. As previously reported using pharmacological or RNA interference inhibitors of HSP90 function, we found that hsp90.2-3 lengthens the circadian period and that the observed period lengthening was more exaggerated in warm–cold-entrained seedlings. However, we observed no role for the previously identified interactors of HSP90.2, GIGANTEA and ZEITLUPPE, in HSP90-mediated period lengthening. We constructed phase-response curves (PRCs) in response to warmth pulses to identify the entry point of HSP90.2 to the oscillator. These PRCs revealed that hsp90.2-3 has a circadian defect within the morning. Analysis of the cca1, lhy, prr9, and prr7 mutants revealed a role for CCA1, LHY, and PRR7, but not PRR9, in HSP90.2 action to the circadian oscillator. Overall, we define a potential pathway for how HSP90.2 can entrain the Arabidopsis circadian oscillator.

Cadherins are cell adhesion molecules that regulate numerous adhesive interactions during embryonic development and adult life. Consistent with these functions, when their expression goes astray cells lose their normal adhesive properties resulting in defective morphogenesis, disease, and even metastatic cancer. In general, classical cadherins exert their effect by homophilic interactions via their five characteristic extracellular (EC) repeats. The EC1 repeat provides the mechanism for cadherins to dimerize with each other whereas the EC2 repeat may facilitate dimerization. Less is known about the other EC repeats. Here, we show that a zebrafish missense mutation in the EC5 repeat of N-cadherin is a dominant gain-of-function mutation and demonstrate that this mutation alters cell adhesion almost to the same degree as a zebrafish missense mutation in the EC1 repeat of N-cadherin. We also show that zebrafish E- and N-cadherin dominant gain-of-function missense mutations genetically interact. Perturbation of cell adhesion in embryos that are heterozygous mutant at both loci is similar to that observed in single homozygous mutants. Introducing an E-cadherin EC5 missense allele into the homozygous N-cadherin EC1 missense mutant more radically affects morphogenesis, causing synergistic phenotypes consistent with interdependent functions being disrupted. Our studies indicate that a functional EC5 repeat is critical for cadherin-mediated cell affinity, suggesting that its role may be more important than previously thought. These results also suggest the possibility that E- and N-cadherin have heterophilic interactions during early morphogenesis of the embryo; interactions that might help balance the variety of cell affinities needed during embryonic development.

We study how a block of genome with a large number of weakly selected loci introgresses under directional selection into a genetically homogeneous population. We derive exact expressions for the expected rate of growth of any fragment of the introduced block during the initial phase of introgression, and show that the growth rate of a single-locus variant is largely insensitive to its own additive effect, but depends instead on the combined effect of all loci within a characteristic linkage scale. The expected growth rate of a fragment is highly correlated with its long-term introgression probability in populations of moderate size, and can hence identify variants that are likely to introgress across replicate populations. We clarify how the introgression probability of an individual variant is determined by the interplay between hitchhiking with relatively large fragments during the early phase of introgression and selection on fine-scale variation within these, which at longer times results in differential introgression probabilities for beneficial and deleterious loci within successful fragments. By simulating individuals, we also investigate how introgression probabilities at individual loci depend on the variance of fitness effects, the net fitness of the introduced block, and the size of the recipient population, and how this shapes the net advance under selection. Our work suggests that even highly replicable substitutions may be associated with a range of selective effects, which makes it challenging to fine map the causal loci that underlie polygenic adaptation.

Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
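The pooled-homozygosity statistics named above can be illustrated with a minimal Python sketch. It assumes that each individual's unphased multilocus genotype in a genomic window has already been encoded as a string (the encoding, the function names, and the toy sample are hypothetical, not taken from the study), and it follows the published definitions in which the most frequent classes are pooled before squaring.

```python
from collections import Counter


def genotype_frequencies(genotypes):
    """Frequencies of distinct multilocus genotypes, most common first.

    `genotypes` holds one hashable value per sampled individual, e.g. a
    string of alternate-allele counts per SNP such as "0120" (an assumed
    encoding chosen for this illustration).
    """
    counts = Counter(genotypes)
    n = len(genotypes)
    return sorted((c / n for c in counts.values()), reverse=True)


def pooled_homozygosity(freqs, k):
    """Pool the k most frequent classes, then sum squared frequencies."""
    return sum(freqs[:k]) ** 2 + sum(q * q for q in freqs[k:])


def sweep_statistics(genotypes):
    """Return G1, G12, G123 and G2/G1 for one genomic window."""
    q = genotype_frequencies(genotypes)
    g1 = pooled_homozygosity(q, 1)   # plain expected homozygosity
    g2 = g1 - q[0] ** 2              # G1 without the most frequent class
    return {
        "G1": g1,
        "G12": pooled_homozygosity(q, 2),
        "G123": pooled_homozygosity(q, 3),
        "G2/G1": g2 / g1,
    }


# Toy window: ten individuals, three distinct multilocus genotypes.
window = ["0000"] * 6 + ["0120"] * 2 + ["1111"] * 2
stats = sweep_statistics(window)
```

In this sketch a window dominated by one genotype gives a high G12 with a low G2/G1, which is the hard-sweep pattern; several common genotypes raise G2/G1, which is the soft-sweep pattern.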

Hybrid sterility is a common form of reproductive isolation between nascent species. Although hybrid sterility is routinely documented and genetically dissected in speciation studies, its developmental basis is rarely examined, especially in generations beyond the F1 generation. To identify phenotypic and genetic determinants of hybrid male sterility from a developmental perspective, we characterized testis histology in 312 F2 hybrids generated by intercrossing inbred strains of Mus musculus domesticus and M. m. musculus, two subspecies of house mice. Hybrids display a range of histologic abnormalities that indicate defective spermatogenesis. Among these abnormalities, we quantified decreased testis size, reductions in spermatocyte and spermatid number, increased apoptosis of meiosis I spermatocytes, and more multinucleated syncytia. Collectively, our phenotypic data point to defects in meiosis I as a primary barrier to reproduction. We identified seven quantitative trait loci (QTL) controlling five histologic traits. A region of chromosome 17 that contains Prdm9, a gene known to confer F1 hybrid male sterility, affects multinucleated syncytia and round spermatids, potentially extending the phenotypic outcomes of this incompatibility. The X chromosome also plays a key role, with loci affecting multinucleated syncytia, apoptosis of round spermatids, and round spermatid numbers. We detected an epistatic interaction between QTL on chromosomes 17 and X for multinucleated syncytia. Our results refine the developmental basis of a key reproductive barrier in a classic model system for speciation genetics.

Parentage analysis is an important method that is used widely in zoological and ecological studies. Current mathematical models of parentage analyses usually assume that a population has a uniform genetic structure and that mating is panmictic. In a natural population, the geographic or social structure of a population, and/or nonrandom mating, usually leads to a genetic structure and results in genotypic frequencies deviating from those expected under the Hardy-Weinberg equilibrium (HWE). In addition, in the presence of null alleles, an observed genotype represents one of several possible true genotypes. The true father of a given offspring may thus be erroneously excluded in parentage analyses, or may have a low or negative LOD score. Here, we present a new mathematical model to estimate parentage that simultaneously includes the effects of inbreeding, null alleles, and negative amplification. The influences of these three factors on previous models are evaluated by Monte-Carlo simulations and empirical data, and the performance of our new model is compared under controlled conditions. We found that, for both simulated and empirical data, our new model outperformed other methods in many situations. We make our methods available in a new, free software package entitled parentage. This can be downloaded via
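To make the deviation from HWE concrete, here is a minimal Python sketch using the standard textbook parameterization with an inbreeding coefficient F. It is not the model implemented in the authors' software; the function names, the allele label "null", and the example frequencies are assumptions made for illustration only.

```python
def genotype_frequencies(allele_freqs, f=0.0):
    """Expected genotype frequencies given allele frequencies and inbreeding.

    Textbook parameterization (assumed here, not taken from the paper):
      homozygote   A/A: p_A^2 + F * p_A * (1 - p_A)
      heterozygote A/B: 2 * p_A * p_B * (1 - F)
    With f = 0 this reduces to Hardy-Weinberg proportions.
    """
    alleles = sorted(allele_freqs)
    freqs = {}
    for i, a in enumerate(alleles):
        pa = allele_freqs[a]
        freqs[(a, a)] = pa * pa + f * pa * (1 - pa)
        for b in alleles[i + 1:]:
            freqs[(a, b)] = 2 * pa * allele_freqs[b] * (1 - f)
    return freqs


def observed_homozygote_probability(allele, allele_freqs, null="null", f=0.0):
    """Probability of scoring an apparent homozygote for `allele`.

    With a null (non-amplifying) allele, an observed A/A may be a true A/A
    or a true A/null, so the two true-genotype frequencies are summed.
    """
    g = genotype_frequencies(allele_freqs, f)
    het_key = tuple(sorted((allele, null)))
    return g[(allele, allele)] + g[het_key]


# Hypothetical locus with two visible alleles and one null allele.
locus = {"A": 0.5, "B": 0.3, "null": 0.2}
p_panmictic = observed_homozygote_probability("A", locus, f=0.0)
p_inbred = observed_homozygote_probability("A", locus, f=0.1)
```

The comparison of `p_panmictic` and `p_inbred` shows how both factors shift the expected frequency of an observed genotype, which is the quantity a likelihood-based parentage score depends on.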

Early prediction of complex disorders (e.g., autism and other neurodevelopmental disorders) is one of the fundamental goals of precision medicine and personalized genomics. An early prediction of complex disorders can improve the prognosis, increase the effectiveness of interventions and treatments, and enhance the quality of life of affected patients. Considering the genetic heritability of neurodevelopmental disorders, we propose a novel framework for utilizing rare coding variation for early prediction of these disorders in a subset of affected samples. We provide a combinatorial framework for addressing this problem, denoted as Odin (Oracle for DIsorder predictioN), to make a prediction for a small, yet significant, subset of affected cases while having a very low false positive rate (FPR) for unaffected samples. Odin also takes advantage of the available functional information (e.g., pairwise coexpression of genes during brain development) to increase the prediction power beyond genes with recurrent variants. Application of our method accurately recovers an additional 8% of autism cases without any severe variant in known recurrently mutated genes with a <1% FPR. Furthermore, Odin predicted a set of 391 genes in which severe variants can cause autism or other developmental delay disorders. Approaches such as the one presented in this paper are needed to translate the biomedical discoveries into actionable items by clinicians. Odin is publicly available at

Carrots are among the richest sources of provitamin A carotenes in the human diet, but genetic variation in the carotenoid pathway does not fully explain the high levels of carotenoids in carrot roots. Using a diverse collection of modern and historic domesticated varieties, and wild carrot accessions, an association analysis for orange pigmentation revealed a significant genomic region that contains the Or gene, advancing it as a candidate for carotenoid presence in carrot. Analysis of sequence variation at the Or locus revealed a nonsynonymous mutation cosegregating with carotenoid content. This mutation was absent in all wild carrot samples and nearly fixed in all orange domesticated samples. Or has been found to control carotenoid presence in other crops but has not previously been described in carrot. Our analysis also allowed us to more completely characterize the genetic structure of carrot, showing that the Western domesticated carrot largely forms one genetic group, despite dramatic phenotypic differences among market classes. Eastern domesticated and wild accessions form a second group, which reflects the recent cultivation history of carrots in Central Asia. Other wild accessions form distinct geographic groups, particularly on the Iberian peninsula and in Northern Africa. Using genome-wide Fst, nucleotide diversity, and the cross-population composite likelihood ratio, we analyzed the genome for regions putatively under selection during domestication and identified 12 regions that were significant for all three methods of detection, one of which includes the Or gene. The Or domestication allele appears to have been selected after the initial domestication of yellow carrots in the East, near the proposed center of domestication in Central Asia. The rapid fixation of the Or domestication allele in almost all orange and nonorange carrots in the West may explain why it has not been found with less genetically diverse mapping populations.
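Two of the summary statistics mentioned above, nucleotide diversity and Fst, can be sketched in a few lines of Python. The abstract does not say which Fst estimator was used, so Hudson's estimator (combined across SNPs as a ratio of sums) is assumed here purely for illustration; the function names and the toy inputs are hypothetical.

```python
def nucleotide_diversity(alt_counts, n, n_sites):
    """Mean pairwise differences per site for biallelic SNPs.

    alt_counts: alternate-allele count at each variable site
    n:          number of sampled chromosomes (assumed equal at every site)
    n_sites:    total sites in the window, including invariant ones
    """
    pairs = n * (n - 1) / 2
    # At one site, k * (n - k) of the chromosome pairs differ.
    diffs = sum(k * (n - k) for k in alt_counts)
    return diffs / pairs / n_sites


def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst from per-SNP allele frequencies in two populations.

    p1, p2: alternate-allele frequencies per SNP in populations 1 and 2
    n1, n2: chromosomes sampled per population (assumed constant per SNP)
    Numerators and denominators are summed over SNPs before dividing.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Toy window: one SNP at 50% frequency among 10 chromosomes, 2 sites total.
pi = nucleotide_diversity([5], n=10, n_sites=2)
# One SNP fixed for different alleles in the two populations.
fst = hudson_fst([1.0], [0.0], n1=10, n2=10)
```

A region that is a candidate for selection during domestication would typically show low diversity within the domesticated group together with high Fst against wild samples, which is why the abstract combines several such scans.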

Phenotypic complexity is caused by the contributions of environmental factors and multiple genetic loci, interacting or acting independently. Studies of yeast and Arabidopsis often find that the majority of natural variation across phenotypes is attributable to independent additive quantitative trait loci (QTL). Detected loci in these organisms explain most of the estimated heritable variation. By contrast, many heritable components underlying phenotypic variation in metazoan models remain undetected. Before the relative impacts of additive and interactive variance components on metazoan phenotypic variation can be dissected, high replication and precise phenotypic measurements are required to obtain sufficient statistical power to detect loci contributing to this missing heritability. Here, we used a panel of 296 recombinant inbred advanced intercross lines of Caenorhabditis elegans and a high-throughput fitness assay to detect loci underlying responses to 16 different toxins, including heavy metals, chemotherapeutic drugs, pesticides, and neuropharmaceuticals. Using linkage mapping, we identified 82 QTL that underlie variation in responses to these toxins, and predicted the relative contributions of additive loci and genetic interactions across various growth parameters. Additionally, we identified three genomic regions that impact responses to multiple classes of toxins. These QTL hotspots could represent common factors impacting toxin responses. We went further to generate near-isogenic lines and chromosome substitution strains, and then experimentally validated these QTL hotspots, implicating additive and interactive loci that underlie toxin-response variation.

To identify novel disease genes for type 2 diabetes (T2D) we generated two backcross populations of obese and diabetes-susceptible New Zealand Obese (NZO/HI) mice with the two lean mouse strains 129P2/OlaHsd and C3HeB/FeJ. Subsequent whole-genome linkage scans revealed 30 novel quantitative trait loci (QTL) for T2D-associated traits. The strongest association with blood glucose [12 cM, logarithm of the odds (LOD) 13.3] and plasma insulin (17 cM, LOD 4.8) was detected on proximal chromosome 7 (designated Nbg7p, NZO blood glucose on proximal chromosome 7) exclusively in the NZOxC3H crossbreeding, suggesting that the causal gene is contributed by the C3H genome. Introgression of the critical C3H fragment into the genetic NZO background by generating recombinant congenic strains and metabolic phenotyping validated the phenotype. For the detection of candidate genes in the critical region (30–46 Mb), we used a combined approach of haplotype and gene expression analysis to search for C3H-specific gene variants in the pancreatic islets, which appeared to be the most likely target tissue for the QTL. Two genes, Atp4a and Pop4, fulfilled the criteria from our candidate gene approaches. The knockdown of both genes in MIN6 cells led to decreased glucose-stimulated insulin secretion, indicating a regulatory role of both genes in insulin secretion, thereby possibly contributing to the phenotype linked to Nbg7p. In conclusion, our combined- and comparative-cross analysis approach has successfully led to the identification of two novel diabetes susceptibility candidate genes, and thus has been proven to be a valuable tool for the discovery of novel disease genes.

In the budding yeast Saccharomyces cerevisiae, ribosomal RNA genes are encoded in a highly repetitive tandem array referred to as the ribosomal DNA (rDNA) locus. The yeast rDNA is the site of a diverse set of DNA-dependent processes, including transcription of ribosomal RNAs by RNA polymerases I and III, transcription of noncoding RNAs by RNA polymerase II, DNA replication initiation, replication fork blocking, and recombination-mediated regulation of rDNA repeat copy number. All of this takes place in the context of chromatin, but little is known about the roles played by ATP-dependent chromatin remodeling factors at the yeast rDNA. In this work, we report that the Isw2 and Ino80 chromatin remodeling factors are targeted to this highly repetitive locus. We characterize for the first time their function in modifying local chromatin structure, finding that loss of these factors decreases the fraction of actively transcribed 35S ribosomal RNA genes and the positioning of nucleosomes flanking the ribosomal origin of replication. In addition, we report that Isw2 and Ino80 promote efficient firing of the ribosomal origin of replication and facilitate the regulated increase of rDNA repeat copy number. This work significantly expands our understanding of the importance of ATP-dependent chromatin remodeling for rDNA biology.



Genetic Ethics

The advances in genetic mapping have made very real what seemed so improbable twenty years ago. ... Genetic mapping is a powerful tool ... but it is also vulnerable to abuse. Many ethical, legal and societal issues are beginning to emerge...


Gene Map

The linear arrangement of mutable sites on a chromosome as deduced from genetic recombination experiments.
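As a rough illustration of how a linear arrangement can be deduced from recombination data, the Python sketch below converts recombination fractions to map distances with Haldane's mapping function, which assumes no crossover interference, and picks the middle locus of three from pairwise recombination fractions. The function names and the example values are hypothetical.

```python
import math


def haldane_cm(r):
    """Map distance in centimorgans from a recombination fraction r < 0.5.

    Haldane's function: d = -50 * ln(1 - 2r), assuming no interference.
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Inverse mapping: recombination fraction from a distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def middle_locus(r_ab, r_bc, r_ac):
    """Return which of loci 'a', 'b', 'c' lies between the other two.

    The pair with the largest recombination fraction is taken as the
    outer pair, so the remaining locus is placed in the middle.
    """
    largest = max((r_ab, "c"), (r_bc, "a"), (r_ac, "b"))
    return largest[1]


# Hypothetical three-point data: a-b 10%, b-c 15%, a-c 23% recombinants.
order_middle = middle_locus(0.10, 0.15, 0.23)
d_ab = haldane_cm(0.10)
d_bc = haldane_cm(0.15)
```

Under Haldane's assumption the two adjacent distances add up to the distance between the outer loci, which is what allows loci to be placed on a single line.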