Using gene map science to evaluate the genetic map and eliminate disease

## Genetic News

Genetic variants disrupting DNA methylation at CpG dinucleotides (CpG-SNP) provide a set of known causal variants to serve as models to test fine-mapping methodology. We use 1716 CpG-SNPs to test three fine-mapping approaches (Bayesian imputation-based association mapping, Bayesian sparse linear mixed model, and the J-test), assessing the impact of imputation errors and the choice of reference panel by using both whole-genome sequence (WGS), and genotype array data on the same individuals (n = 1166). The choice of imputation reference panel had a strong effect on imputation accuracy, with the 1000 Genomes Project Phase 3 (1000G) reference panel (n = 2504 from 26 populations) giving a mean nonreference discordance rate between imputed and sequenced genotypes of 3.2% compared to 1.6% when using the Haplotype Reference Consortium (HRC) reference panel (n = 32,470 Europeans). These imputation errors had an impact on whether the CpG-SNP was included in the 95% credible set, with a difference of ~23% and ~7% between the WGS and the 1000G and HRC imputed datasets, respectively. All of the fine-mapping methods failed to reach the expected 95% coverage of the CpG-SNP. This is attributed to secondary cis genetic effects that are unable to be statistically separated from the CpG-SNP, and through a masking mechanism where the effect of the methylation disrupting allele at the CpG-SNP is hidden by the effect of a nearby SNP that has strong linkage disequilibrium with the CpG-SNP. The reduced accuracy in fine-mapping a known causal variant in a low-level biological trait with imputed genetic data has implications for the study of higher-order complex traits and disease.
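
The nonreference discordance rate quoted above can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) and uses the common definition in which sites called homozygous reference in both datasets are excluded from the denominator. The function name and the toy genotypes are hypothetical and are not taken from the study.

```python
import numpy as np

def nonreference_discordance(sequenced, imputed):
    """Fraction of discordant calls among sites that are not hom-ref in both sets.

    Genotypes are assumed to be alternate-allele counts: 0, 1, or 2.
    """
    sequenced = np.asarray(sequenced)
    imputed = np.asarray(imputed)
    if sequenced.shape != imputed.shape:
        raise ValueError("genotype arrays must have the same shape")
    # Drop sites where both calls are homozygous reference.
    considered = ~((sequenced == 0) & (imputed == 0))
    n_considered = considered.sum()
    if n_considered == 0:
        return float("nan")
    discordant = (sequenced != imputed) & considered
    return discordant.sum() / n_considered

# Toy example (hypothetical genotypes for one individual at eight sites).
wgs_calls = [0, 0, 1, 2, 1, 0, 2, 1]
imputed_calls = [0, 1, 1, 2, 0, 0, 2, 1]
nrd = nonreference_discordance(wgs_calls, imputed_calls)
print(f"NRD = {nrd:.3f}")
```

Excluding the doubly homozygous-reference sites keeps the metric from being dominated by the many sites where agreement is trivial, which is why it is stricter than overall concordance.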

Both the total amount and the distribution of heterozygous sites within individual genomes are informative about the genetic diversity of the population they belong to. Detecting true heterozygous sites in ancient genomes is complicated by the generally limited coverage achieved and the presence of post-mortem damage inflating sequencing errors. Additionally, large runs of homozygosity found in the genomes of particularly inbred individuals and of domestic animals can skew estimates of genome-wide heterozygosity rates. Current computational tools aimed at estimating runs of homozygosity and genome-wide heterozygosity levels are generally sensitive to such limitations. Here, we introduce ROHan, a probabilistic method which substantially improves the estimate of heterozygosity rates both genome-wide and for genomic local windows. It combines a local Bayesian model and a Hidden Markov Model at the genome-wide level and can work both on modern and ancient samples. We show that our algorithm outperforms currently available methods for predicting heterozygosity rates for ancient samples. Specifically, ROHan can delineate large runs of homozygosity (at megabase scales) and produce a reliable confidence interval for the genome-wide rate of heterozygosity outside of such regions from modern genomes with a depth of coverage as low as 5–6x and down to 7–8x for ancient samples showing moderate DNA damage. We apply ROHan to a series of modern and ancient genomes previously published and revise available estimates of heterozygosity for humans, chimpanzees and horses.

Double-strand DNA breaks are repaired by one of several mechanisms that rejoin two broken ends. However, cells are challenged when asked to repair a single broken end and respond by: (1) inducing programmed cell death; (2) healing the broken end by constructing a new telomere; (3) adapting to the broken end and resuming the mitotic cycle without repair; or (4) using information from the sister chromatid or homologous chromosome to restore a normal chromosome terminus. During one form of homolog-dependent repair in yeast, termed break-induced replication (BIR), a template chromosome can be copied for hundreds of kilobases. BIR efficiency depends on Pif1 helicase and Pol32, a nonessential subunit of DNA polymerase δ. To date, there is little evidence that BIR can be used for extensive chromosome repair in higher eukaryotes. We report that a dicentric chromosome broken in mitosis in the male germline of Drosophila melanogaster is usually repaired by healing, but can also be repaired in a homolog-dependent fashion, restoring at least 1.3 Mb of terminal sequence information. This mode of repair is significantly reduced in pif1 and pol32 mutants. Formally, the repaired chromosomes are recombinants. However, the absence of reciprocal recombinants and the dependence on Pif1 and Pol32 strongly support the hypothesis that BIR is the mechanism for restoration of the chromosome terminus. In contrast to yeast, pif1 mutants in Drosophila exhibit a reduced rate of chromosome healing, likely owing to fundamental differences in telomeres between these organisms.

Fumarase is a well-characterized TCA cycle enzyme that catalyzes the reversible conversion of fumarate to malate. In mammals, fumarase acts as a tumor suppressor, and loss-of-function mutations in the FH gene in hereditary leiomyomatosis and renal cell cancer result in the accumulation of intracellular fumarate—an inhibitor of α-ketoglutarate-dependent dioxygenases. Fumarase promotes DNA repair by nonhomologous end joining in mammalian cells through interaction with the histone variant H2A.Z, and inhibition of KDM2B, a H3 K36-specific histone demethylase. Here, we report that Saccharomyces cerevisiae fumarase, Fum1p, acts as a response factor during DNA replication stress, and fumarate enhances survival of yeast lacking Htz1p (H2A.Z in mammals). We observed that exposure to DNA replication stress led to upregulation as well as nuclear enrichment of Fum1p, and raising levels of fumarate in cells via deletion of FUM1 or addition of exogenous fumarate suppressed the sensitivity to DNA replication stress of htz1 mutants. This suppression was independent of modulating nucleotide pool levels. Rather, our results are consistent with fumarate conferring resistance to DNA replication stress in htz1 mutants by inhibiting the H3 K4-specific histone demethylase Jhd2p, and increasing H3 K4 methylation. Although the timing of checkpoint activation and deactivation remained largely unaffected by fumarate, sensors and mediators of the DNA replication checkpoint were required for fumarate-dependent resistance to replication stress in the htz1 mutants. Together, our findings imply metabolic enzymes and metabolites aid in processing replicative intermediates by affecting chromatin modification states, thereby promoting genome integrity.

Microsatellite sequences have an enhanced susceptibility to mutation, and can act as sentinels indicating elevated mutation rates and increased risk of cancer. The probability of mutant fixation within the intestinal epithelium is dictated by a combination of stem cell dynamics and mutation rate. Here, we exploit this relationship to infer microsatellite mutation rates. First, a sensitive, multiplexed, and quantitative method for detecting somatic changes in microsatellite length was developed that allowed the parallel detection of mutant [CA]n sequences from hundreds of low-input tissue samples at up to 14 loci. The method was applied to colonic crypts in Mus musculus, and enabled detection of mutant subclones down to 20% of the cellularity of the crypt (~50 of 250 cells). By quantifying age-related increases in clone frequencies for multiple loci, microsatellite mutation rates in wild-type and Msh2-deficient epithelium were established. An average 388-fold increase in the mutation rate per mitosis was observed in Msh2-deficient epithelium (2.4 × 10⁻²) compared to wild-type epithelium (6.2 × 10⁻⁵).
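
As a quick consistency check on the reported fold change, the ratio of the two rounded per-mitosis rates can be worked out directly:

$$
\frac{2.4 \times 10^{-2}}{6.2 \times 10^{-5}} \approx 387
$$

This agrees closely with the reported average of 388-fold. The small gap is plausibly due to rounding of the published rates or to averaging across loci; that explanation is an assumption here, not something the abstract states.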

Signaling pathways can regulate biological responses by the transcriptional regulation of target genes. In yeast, multiple signaling pathways control filamentous growth, a morphogenetic response that occurs in many species including fungal pathogens. Here, we examine the role of signaling pathways that control filamentous growth in regulating adhesion-dependent surface responses, including mat formation and colony patterning. Expression profiling and mutant phenotype analysis showed that the major pathways that regulate filamentous growth [filamentous growth MAPK (fMAPK), RAS, retrograde (RTG), RIM101, RPD3, ELP, SNF1, and PHO85] also regulated mat formation and colony patterning. The chromatin remodeling complex, SAGA, also regulated these responses. We also show that the RAS and RTG pathways coregulated a common set of target genes, and that SAGA regulated target genes known to be controlled by the fMAPK, RAS, and RTG pathways. Analysis of surface growth-specific targets identified genes that respond to low oxygen, high temperature, and desiccation stresses. We also explore the question of why cells make adhesive contacts in colonies. Cell adhesion contacts mediated by the coregulated target and adhesion molecule, Flo11p, deterred entry into colonies by macroscopic predators and impacted colony temperature regulation. The identification of new regulators (e.g., SAGA), and targets of surface growth in yeast may provide insights into fungal pathogenesis in settings where surface growth and adhesion contributes to virulence.

Neurospora crassa is an established reference organism to investigate carotene biosynthesis and light regulation. However, there is little evidence of its capacity to produce secondary metabolites. Here, we report the role of the fungal-specific regulatory velvet complexes in development and secondary metabolism (SM) in N. crassa. Three velvet proteins VE-1, VE-2, VOS-1, and a putative methyltransferase LAE-1 show light-independent nucleocytoplasmic localization. Two distinct velvet complexes, a heterotrimeric VE-1/VE-2/LAE-1 and a heterodimeric VE-2/VOS-1 are found in vivo. The heterotrimer-complex, which positively regulates sexual development and represses asexual sporulation, suppresses siderophore coprogen production under iron starvation conditions. The VE-1/VE-2 heterodimer controls carotene production. VE-1 regulates the expression of >15% of the whole genome, comprising mainly regulatory and developmental features. We also studied intergenera functions of the velvet complex through complementation of Aspergillus nidulans veA, velB, laeA, vosA mutants with their N. crassa orthologs ve-1, ve-2, lae-1, and vos-1, respectively. Expression of VE-1 and VE-2 in A. nidulans successfully substitutes the developmental and SM functions of VeA and VelB by forming two functional chimeric velvet complexes in vivo, VelB/VE-1/LaeA and VE-2/VeA/LaeA, respectively. Reciprocally, expression of veA restores the phenotypes of the N. crassa ve-1 mutant. All N. crassa velvet proteins heterologously expressed in A. nidulans are localized to the nuclear fraction independent of light. These data highlight the conservation of the complex formation in N. crassa and A. nidulans. However, they also underline the intergenera similarities and differences of velvet roles according to different life styles, niches and ontogenetic processes.

The Polymerase Associated Factor 1 complex (Paf1C) is a multifunctional regulator of eukaryotic gene expression important for the coordination of transcription with chromatin modification and post-transcriptional processes. In this study, we investigated the extent to which the functions of Paf1C combine to regulate the Saccharomyces cerevisiae transcriptome. While previous studies focused on the roles of Paf1C in controlling mRNA levels, here, we took advantage of a genetic background that enriches for unstable transcripts, and demonstrate that deletion of PAF1 affects all classes of Pol II transcripts including multiple classes of noncoding RNAs (ncRNAs). By conducting a de novo differential expression analysis independent of gene annotations, we found that Paf1 positively and negatively regulates antisense transcription at multiple loci. Comparisons with nascent transcript data revealed that many, but not all, changes in RNA levels detected by our analysis are due to changes in transcription instead of post-transcriptional events. To investigate the mechanisms by which Paf1 regulates protein-coding genes, we focused on genes involved in iron and phosphate homeostasis, which were differentially affected by PAF1 deletion. Our results indicate that Paf1 stimulates phosphate gene expression through a mechanism that is independent of any individual Paf1C-dependent histone modification. In contrast, the inhibition of iron gene expression by Paf1 correlates with a defect in H3 K36 trimethylation. Finally, we showed that one iron regulon gene, FET4, is coordinately controlled by Paf1 and transcription of upstream noncoding DNA. Together, these data identify roles for Paf1C in controlling both coding and noncoding regions of the yeast genome.

Condensins are evolutionarily conserved protein complexes that are required for chromosome segregation during cell division and genome organization during interphase. In Caenorhabditis elegans, a specialized condensin, which forms the core of the dosage compensation complex (DCC), binds to and represses X chromosome transcription. Here, we analyzed DCC localization and the effect of DCC depletion on histone modifications, transcription factor binding, and gene expression using chromatin immunoprecipitation sequencing and mRNA sequencing. Across the X, the DCC accumulates at accessible gene regulatory sites in active chromatin and not heterochromatin. The DCC is required for reducing the levels of activating histone modifications, including H3K4me3 and H3K27ac, but not repressive modification H3K9me3. In X-to-autosome fusion chromosomes, DCC spreading into the autosomal sequences locally reduces gene expression, thus establishing a direct link between DCC binding and repression. Together, our results indicate that DCC-mediated transcription repression is associated with a reduction in the activity of X chromosomal gene regulatory elements.

Alp/Enigma family members have a unique PDZ domain followed by zero to four LIM domains, and are essential for myofibril assembly across all species analyzed so far. Drosophila melanogaster has three Alp/Enigma family members, Zasp52, Zasp66, and Zasp67. Ortholog search and phylogenetic tree analysis suggest that Zasp genes have a common ancestor, and that Zasp66 and Zasp67 arose by duplication in insects. While Zasp66 has a conserved domain structure across orthologs, Zasp67 domains and lengths are highly variable. In flies, Zasp67 appears to be expressed only in indirect flight muscles, where it colocalizes with Zasp52 at Z-discs. We generated a CRISPR null mutant of Zasp67, which is viable but flightless. We can rescue all phenotypes by re-expressing a Zasp67 transgene at endogenous levels. Zasp67 mutants show extended and broken Z-discs in adult flies, indicating that the protein helps stabilize the highly regular myofibrils of indirect flight muscles. In contrast, a Zasp66 CRISPR null mutant has limited viability, but only mild indirect flight muscle defects illustrating the diverging evolutionary paths these two paralogous genes have taken since they arose by duplication.

Self-perpetuating transmissible protein aggregates, termed prions, are implicated in mammalian diseases and control phenotypically detectable traits in Saccharomyces cerevisiae. Yeast stress-inducible chaperone proteins, including Hsp104 and Hsp70-Ssa that counteract cytotoxic protein aggregation, also control prion propagation. Stress-damaged proteins that are not disaggregated by chaperones are cleared from daughter cells via mother-specific asymmetric segregation in cell divisions following heat shock. Short-term mild heat stress destabilizes [PSI+], a prion isoform of the yeast translation termination factor Sup35. This destabilization is linked to the induction of the Hsp104 chaperone. Here, we show that the region of Hsp104 known to be required for curing by artificially overproduced Hsp104 is also required for heat-shock-mediated [PSI+] destabilization. Moreover, deletion of the SIR2 gene, coding for a deacetylase crucial for asymmetric segregation of heat-damaged proteins, also counteracts heat-shock-mediated destabilization of [PSI+], and Sup35 aggregates are colocalized with aggregates of heat-damaged proteins marked by Hsp104-GFP. These results support the role of asymmetric segregation in prion destabilization. Finally, we show that depletion of the heat-shock noninducible ribosome-associated chaperone Hsp70-Ssb decreases heat-shock-mediated destabilization of [PSI+], while disruption of a cochaperone complex mediating the binding of Hsp70-Ssb to the ribosome increases prion loss. Our data indicate that Hsp70-Ssb relocates from the ribosome to the cytosol during heat stress. Cytosolic Hsp70-Ssb has been shown to antagonize the function of Hsp70-Ssa in prion propagation, which explains the Hsp70-Ssb effect on prion destabilization by heat shock. This result uncovers the stress-related role of a stress noninducible chaperone.

Cell diversity in multicellular organisms relies on coordination between cell proliferation and the acquisition of cell identity. The equilibrium between these two processes is essential to assure the correct number of determined cells at a given time at a given place. Using genetic approaches and correlative microscopy, we show that Tramtrack-69 (Ttk69, a Broad-complex, Tramtrack and Bric-à-brac - Zinc Finger (BTB-ZF) transcription factor ortholog of the human promyelocytic leukemia zinc finger factor) plays an essential role in controlling this balance. In the Drosophila bristle cell lineage, which produces the external sensory organs composed by a neuron and accessory cells, we show that ttk69 loss-of-function leads to supplementary neural-type cells at the expense of accessory cells. Our data indicate that Ttk69 (1) promotes cell cycle exit of newborn terminal cells by downregulating CycE, the principal cyclin involved in S-phase entry, and (2) regulates cell-fate acquisition and terminal differentiation, by downregulating the expression of hamlet and upregulating that of Suppressor of Hairless, two transcription factors involved in neural-fate acquisition and accessory cell differentiation, respectively. Thus, Ttk69 plays a central role in shaping neural cell lineages by integrating molecular mechanisms that regulate progenitor cell cycle exit and cell-fate commitment.

In many species, sperm can remain viable in the reproductive tract of a female well beyond the typical interval to remating. This creates an opportunity for sperm from different males to compete for oocyte fertilization inside the female’s reproductive tract. In Drosophila melanogaster, sperm characteristics and seminal fluid content affect male success in sperm competition. On the other hand, although genome-wide association studies (GWAS) have demonstrated that female genotype plays a role in sperm competition outcome as well, the biochemical, sensory, and physiological processes by which females detect and selectively use sperm from different males remain elusive. Here, we functionally tested 26 candidate genes implicated via a GWAS for their contribution to the female’s role in sperm competition, measured as changes in the relative success of the first male to mate (P1). Of these 26 candidates, we identified eight genes that affect P1 when knocked down in females, and showed that five of them do so when knocked down in the female nervous system. In particular, Rim knockdown in sensory pickpocket (ppk)+ neurons lowered P1, confirming previously published results, and a novel candidate, caup, lowered P1 when knocked down in octopaminergic Tdc2+ neurons. These results demonstrate that specific neurons in the female’s nervous system play a functional role in sperm competition and expand our understanding of the genetic, neuronal, and mechanistic basis of female responses to multiple matings. We propose that these neurons in females are used to sense, and integrate, signals from courtship or ejaculates, to modulate sperm competition outcome accordingly.

Hybrid male progeny from interspecies crosses are more prone to sterility or inviability than hybrid female progeny, and the male sterility and inviability often demonstrate parent-of-origin asymmetry. However, the underlying genetic mechanism of asymmetric sterility or inviability remains elusive. We previously established a genome-wide hybrid incompatibility (HI) landscape between Caenorhabditis briggsae and C. nigoni by phenotyping a large collection of C. nigoni strains each carrying a C. briggsae introgression. In this study, we systematically dissect the genetic mechanism of asymmetric sterility and inviability in both hybrid male and female progeny between the two species. Specifically, we performed reciprocal crosses between C. briggsae and different C. nigoni strains that each carry a GFP-labeled C. briggsae genomic fragment, referred to as an introgression, and scored the HI phenotypes in the F1 progeny. The aggregated introgressions cover 94.6% of the C. briggsae genome, including 100% of the X chromosome. Surprisingly, we observed that two C. briggsae X fragments that produce C. nigoni male sterility as an introgression rescued hybrid F1 sterility in males fathered by C. briggsae. Subsequent backcrossing analyses indicated that a specific interaction between the X-linked introgression and one autosomal introgression is required to rescue the hybrid male sterility. In addition, we identified another two C. briggsae genomic intervals on chromosomes II and IV that can rescue the inviability, but not the sterility, of hybrid F1 males fathered by C. nigoni, suggesting the involvement of differential epistatic interactions in the asymmetric hybrid male fertility and inviability. Importantly, backcrossing of the rescued sterile males with C. nigoni led to the isolation of a 1.1-Mb genomic interval that specifically interacts with an X-linked introgression, which is essential for hybrid male fertility. We further identified three C. briggsae genomic intervals on chromosomes I, II, and III that produced inviability in all F1 progeny, dependent on or independent of the parent-of-origin. Taken together, we identified multiple independent interacting loci that are responsible for asymmetric hybrid male and female sterility, and inviability, which lays a foundation for their molecular characterization.

Suppressed recombination allows divergence between homologous sex chromosomes and the functionality of their genes. Here, we reveal patterns of the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua on the basis of cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis. The genome assembly contained 34,105 expressed genes, of which 10,076 were assigned to linkage groups. Genetic mapping and exome resequencing of individuals across the species range both identified the largest linkage group, LG1, as the sex chromosome. Although the sex chromosomes of M. annua are karyotypically homomorphic, we estimate that about one-third of the Y chromosome, containing 568 transcripts and spanning 22.3 cM in the corresponding female map, has ceased recombining. Nevertheless, we found limited evidence for Y-chromosome degeneration in terms of gene loss and pseudogenization, and most X- and Y-linked genes appear to have diverged in the period subsequent to speciation between M. annua and its sister species M. huetii, which shares the same sex-determining region. Taken together, our results suggest that the M. annua Y chromosome has at least two evolutionary strata: a small old stratum shared with M. huetii, and a more recent larger stratum that is probably unique to M. annua and that stopped recombining ~1 MYA. Patterns of gene expression within the nonrecombining region are consistent with the idea that sexually antagonistic selection may have played a role in favoring suppressed recombination.

Experimental investigations into the rates and fitness effects of spontaneous mutations are fundamental to our understanding of the evolutionary process. To gain insights into the molecular and fitness consequences of spontaneous mutations, we conducted a mutation accumulation (MA) experiment at varying population sizes in the nematode Caenorhabditis elegans, evolving 35 lines in parallel for 409 generations at three population sizes (N = 1, 10, and 100 individuals). Here, we focus on nuclear SNPs and small insertion/deletions (indels) under minimal influence of selection, as well as their accrual rates in larger populations under greater selection efficacy. The spontaneous rates of base substitutions and small indels are 1.84 (95% C.I. ± 0.14) × 10⁻⁹ substitutions and 6.84 (95% C.I. ± 0.97) × 10⁻¹⁰ changes/site/generation, respectively. Small indels exhibit a deletion bias with deletions exceeding insertions by threefold. Notably, there was no correlation between population size and the frequency of base substitutions, nonsynonymous substitutions, or small indels. These results contrast with our previous analysis of mitochondrial DNA mutations and nuclear copy-number changes in these MA lines, and suggest that nuclear base substitutions and small indels are under less stringent purifying selection compared to the former mutational classes. A transition bias was observed in exons as was a near universal base substitution bias toward A/T. Strongly context-dependent base substitutions, where 5'–Ts and 3'–As increase the frequency of A/T → T/A transversions, especially at the boundaries of A or T homopolymeric runs, manifest as higher mutation rates in (i) introns and intergenic regions relative to exons, (ii) chromosomal cores vs. arms and tips, and (iii) germline-expressed genes.

Pedigrees provide the genealogical relationships among individuals at a fine resolution and serve an important function in many areas of genetic studies. One such use of pedigree information is in the estimation of the short-term effective population size ($N_e$), which is of great relevance in fields such as conservation genetics. Despite the usefulness of pedigrees, however, they are often unknown and must be inferred from genetic data. In this study, we present a Bayesian method to jointly estimate pedigrees and $N_e$ from genetic markers using Markov Chain Monte Carlo. Our method supports analysis of a large number of markers and individuals within a single generation with the use of a composite likelihood, which significantly increases computational efficiency. We show, on simulated data, that our method is able to jointly estimate relationships up to first cousins and $N_e$ with high accuracy. We also apply the method on a real dataset of house sparrows to reconstruct their previously unreported pedigree.

We present an algorithm for inferring ancestry segments and characterizing admixture events, which involve an arbitrary number of genetically differentiated groups coming together. This allows inference of the demographic history of the species, properties of admixing groups, identification of signatures of natural selection, and may aid disease gene mapping. The algorithm employs nested hidden Markov models to obtain local ancestry estimation along the genome for each admixed individual. In a range of simulations, the accuracy of these estimates equals or exceeds leading existing methods. Moreover, and unlike these approaches, we do not require any prior knowledge of the relationship between subgroups of donor reference haplotypes and the unseen mixing ancestral populations. Our approach infers these in terms of conditional "copying probabilities." In application to the Human Genome Diversity Project, we corroborate many previously inferred admixture events (e.g., an ancient admixture event in the Kalash). We further identify novel events such as complex four-way admixture in San-Khomani individuals, and show that Eastern European populations possess 1–3% ancestry from a group resembling modern-day central Asians. We also identify evidence of recent natural selection favoring sub-Saharan ancestry at the human leukocyte antigen (HLA) region, across North African individuals. We make available an R and C++ software library, which we term MOSAIC (which stands for MOSAIC Organizes Segments of Ancestry In Chromosomes).

Thousands of genes responsible for many diseases and other common traits in humans have been detected by Genome Wide Association Studies (GWAS) in the last decade. However, candidate causal variants found so far usually explain only a small fraction of the heritability estimated by family data. The most common explanation for this observation is that the missing heritability corresponds to variants, either rare or common, with very small effect, which pass undetected due to a lack of statistical power. We carried out a meta-analysis using data from the NHGRI-EBI GWAS Catalog in order to explore the observed distribution of locus effects for a set of 42 complex traits and to quantify their contribution to narrow-sense heritability. With the data at hand, we were able to predict the expected distribution of locus effects for 16 traits and diseases, their expected contribution to heritability, and the missing number of loci yet to be discovered to fully explain the familial heritability estimates. Our results indicate that, for 6 out of the 16 traits, the additive contribution of a great number of loci is unable to explain the familial (broad-sense) heritability, suggesting that the gap between GWAS and familial estimates of heritability may not ever be closed for these traits. In contrast, for the other 10 traits, the additive contribution of hundreds or thousands of loci yet to be found could potentially explain the familial heritability estimates, if this were the case. Computer simulations are used to illustrate the possible contribution from nonadditive genetic effects to the gap between GWAS and familial estimates of heritability.
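
The notion of a locus contributing to narrow-sense heritability can be illustrated with the standard additive expression 2p(1 − p)β², where p is the allele frequency and β the per-allele effect in phenotypic standard deviation units (assuming Hardy–Weinberg equilibrium and independent loci). The sketch below is not the meta-analysis pipeline described above, and the loci and the family-based estimate are hypothetical values chosen only for illustration.

```python
def locus_variance_explained(allele_freq, beta):
    """Additive variance from one biallelic locus, as a fraction of phenotypic variance.

    Assumes Hardy-Weinberg equilibrium and a phenotype standardised to unit variance.
    """
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta ** 2

# Hypothetical loci: (allele frequency, per-allele effect in SD units).
loci = [(0.30, 0.10), (0.10, 0.20), (0.45, 0.05)]

# Summing across loci assumes they are independent (no linkage disequilibrium).
explained = sum(locus_variance_explained(p, b) for p, b in loci)

family_h2 = 0.50  # hypothetical family-based heritability estimate
missing = family_h2 - explained

print(f"explained by known loci: {explained:.4f}")
print(f"gap to family estimate:  {missing:.4f}")
```

Because each term scales with β², loci of small effect add very little individually, which is why hundreds or thousands of them may be needed to close the gap.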

Expression QTL (eQTL) detection has emerged as an important tool for unraveling the relationship between genetic risk factors and disease or clinical phenotypes. Most studies are predicated on the assumption that only a single causal variant explains the association signal in each interval. This greatly simplifies the statistical modeling, but is liable to biases in scenarios where multiple local causal variants are responsible. Here, our primary goal was to address the prevalence of secondary cis-eQTL signals regulating peripheral blood gene expression locally, utilizing two large human cohort studies, each >2500 samples with accompanying whole genome genotypes. The CAGE (Consortium for the Architecture of Gene Expression) dataset is a compendium of Illumina microarray studies, and the Framingham Heart Study is a two-generation Affymetrix dataset. We also describe Bayesian colocalization analysis of the extent of sharing of cis-eQTL detected in both studies as well as with the BIOS RNAseq dataset. Stepwise conditional modeling demonstrates that multiple eQTL signals are present for ~40% of over 3500 eGenes in both microarray datasets, and that the number of loci with additional signals reduces by approximately two-thirds with each conditioning step. Although <20% of the peak signals across platforms fine map to the same credible interval, the colocalization analysis finds that as many as 50–60% of the primary eQTL are actually shared. Subsequently, colocalization of eQTL signals with GWAS hits detected 1349 genes whose expression in peripheral blood is associated with 591 human phenotype traits or diseases, including enrichment for genes with regulatory functions. At least 10%, and possibly as many as 40%, of eQTL-trait colocalized signals are due to nonprimary cis-eQTL peaks, but just one-quarter of these colocalization signals replicated across the gene expression datasets. Our results are provided as a web-based resource for visualization of multi-site regulation of gene expression and its association with human complex traits and disease states.
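The stepwise conditional modeling mentioned above can be sketched roughly as follows. This is a simplified forward-selection illustration in plain Python, not the pipeline used in the study: at each step it picks the SNP most correlated with the remaining expression signal, then regresses that SNP out of both the expression values and the other genotypes before looking for a secondary signal. The simulated data, the SNP names, the `min_r2` stopping threshold, and all function names are assumptions made for the example; a real analysis would use significance testing with multiple-testing control rather than a fixed r² cutoff.

```python
import random
from statistics import mean


def center(values):
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(v, on):
    """Remove from v its least-squares projection onto `on` (both centered)."""
    denom = dot(on, on)
    if denom < 1e-9:
        return list(v)
    slope = dot(v, on) / denom
    return [x - slope * z for x, z in zip(v, on)]


def r_squared(y, g):
    """Squared correlation of two centered vectors (0 if either is near-constant)."""
    yy, gg = dot(y, y), dot(g, g)
    if yy < 1e-9 or gg < 1e-9:
        return 0.0
    return dot(y, g) ** 2 / (yy * gg)


def stepwise_conditional(expression, genotypes, min_r2=0.05, max_steps=5):
    """Return SNP names selected as independent signals, in order of discovery."""
    y = center(expression)
    geno = {name: center(g) for name, g in genotypes.items()}
    selected = []
    for _ in range(max_steps):
        if not geno:
            break
        best = max(geno, key=lambda s: r_squared(y, geno[s]))
        if r_squared(y, geno[best]) < min_r2:
            break
        lead = geno.pop(best)
        selected.append(best)
        # Condition on the lead SNP: remove it from expression and other SNPs.
        y = residualize(y, lead)
        geno = {s: residualize(g, lead) for s, g in geno.items()}
    return selected


# Simulated example: two causal SNPs (snp_a, snp_b) and one null SNP (snp_c).
random.seed(0)
n = 1000
genotypes = {
    name: [(random.random() < 0.5) + (random.random() < 0.5) for _ in range(n)]
    for name in ("snp_a", "snp_b", "snp_c")
}
expression = [
    1.0 * a + 0.6 * b + random.gauss(0.0, 1.0)
    for a, b in zip(genotypes["snp_a"], genotypes["snp_b"])
]
selected = stepwise_conditional(expression, genotypes)
print(selected)
```

Residualizing the remaining genotypes as well as the expression values is what makes each later step a conditional test; skipping that would let a SNP in linkage disequilibrium with the lead variant show up as a spurious secondary signal.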

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript–trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative "reference" traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
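For readers unfamiliar with canonical correlation analysis, the following is a minimal sketch of how the highest canonical correlation between two trait sets can be computed. To stay within the Python standard library it is restricted to exactly two traits per set, where the squared canonical correlations are the eigenvalues of a 2×2 matrix; this restriction, the function names, and the toy trait values are all assumptions for illustration and do not reproduce the reference trait analysis itself, which would use a general multivariate implementation.

```python
from math import sqrt
from statistics import mean


def covariance_matrix(a, b):
    """2x2 sample cross-covariance between row lists a and b (two columns each)."""
    n = len(a)
    a_means = [mean(row[i] for row in a) for i in range(2)]
    b_means = [mean(row[i] for row in b) for i in range(2)]
    return [
        [
            sum((ra[i] - a_means[i]) * (rb[j] - b_means[j]) for ra, rb in zip(a, b))
            / (n - 1)
            for j in range(2)
        ]
        for i in range(2)
    ]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matmul_2x2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def top_canonical_correlation(x, y):
    """Largest canonical correlation between two-trait sets x and y."""
    sxx = covariance_matrix(x, x)
    syy = covariance_matrix(y, y)
    sxy = covariance_matrix(x, y)
    syx = [[sxy[j][i] for j in range(2)] for i in range(2)]
    # Squared canonical correlations are the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx.
    m = matmul_2x2(
        matmul_2x2(inverse_2x2(sxx), sxy),
        matmul_2x2(inverse_2x2(syy), syx),
    )
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = max(trace * trace - 4.0 * det, 0.0)
    largest = (trace + sqrt(disc)) / 2.0
    # Clamp to [0, 1] to absorb floating-point rounding.
    return sqrt(min(max(largest, 0.0), 1.0))


# Toy "reference" traits and "traits of interest" for five individuals.
reference = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 4.0)]
interest = [(2.1, 0.5), (1.0, 1.9), (4.2, 1.1), (3.9, 2.8), (4.0, 4.1)]
rho = top_canonical_correlation(reference, interest)
print(f"highest canonical correlation: {rho:.3f}")
```

With only five individuals the estimate is strongly inflated by overfitting; the point is the mechanics of finding the pair of linear combinations with maximal correlation, which is the quantity the reference trait approach exploits.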

MicroRNAs (miRNAs) are known to modulate gene expression, but their activity at the tissue-specific level remains largely uncharacterized. To study their contribution to tissue-specific gene expression, we developed novel tools to profile putative miRNA targets in the Caenorhabditis elegans intestine and body muscle. We validated many previously described interactions and identified ~3500 novel targets. Many of the curated candidate miRNA targets are known to modulate the functions of their respective tissues. Within our data sets we observed a disparity in the use of miRNA-based gene regulation between the intestine and body muscle. The intestine contained significantly more putative miRNA targets than the body muscle, highlighting its transcriptional complexity. We detected an unexpected enrichment of RNA-binding proteins targeted by miRNA in both tissues, with a notable abundance of RNA splicing factors. We developed in vivo genetic tools to validate and further study three RNA splicing factors identified as putative miRNA targets in our study (asd-2, hrp-2, and smu-2), and show that these factors indeed contain functional miRNA regulatory elements in their 3'UTRs that are able to repress their expression in the intestine. In addition, the alternative splicing pattern of their respective downstream targets (unc-60, unc-52, lin-10, and ret-1) is dysregulated when the miRNA pathway is disrupted. A reannotation of the transcriptome data in C. elegans strains that are deficient in the miRNA pathway from past studies supports and expands on our results. This study highlights an unexpected role for miRNAs in modulating tissue-specific gene isoforms, where post-transcriptional regulation of RNA splicing factors associates with tissue-specific alternative splicing.

### Genetic Ethics

The advances in genetic mapping have made very real what seemed so improbable twenty years ago. ... Genetic mapping is a powerful tool ... but it is also vulnerable to abuse. Many ethical, legal and societal issues are beginning to emerge...