Evolution of the LS2-Regulated Splicing Network
Abstract
Splicing factors bind pre-mRNA transcripts and change the resulting mRNA by favoring one splice site over another. However, they cannot function without an established splicing network. Splicing networks minimally require: 1) a functional splicing factor to affect expression of target genes, 2) tissue-specific expression of that functional splicing factor, 3) the presence of genes to be regulated with targeted exons, and 4) the sensitivity of targets to the regulatory molecule, in many cases achieved with a cis-acting binding motif.
LS2 is a retroduplicated paralog of the highly conserved, ubiquitously expressed splicing factor U2AF50 that has evolved a distinct function in Drosophila. While it is clear that LS2 evolved from U2AF50, how LS2 evolved its distinct function is unknown. To address this question, I will systematically reconstruct the evolutionary history of the LS2-regulated splicing network.
Preliminary in silico analysis suggests that the functionality of LS2 arose in Scaptodrosophila, a sister lineage of Drosophila. To confirm this, I will test functionality using molecular biology techniques. Furthermore, I am using genomic data to confirm orthologous features of LS2 target genes. While preliminary analyses of RNA-seq reads aligned with Bowtie suggest that LS2 has maintained testis-specific expression throughout its existence, qPCR analyses will be used in Scaptodrosophila, the proposed progenitor lineage, to test this assumption directly. By studying each splicing network requirement independently, I will elucidate how LS2 evolved a distinct function. The evolutionary history of LS2 will shed light on how lineages evolve novel genes and proteins with distinct functions.
Key Words: [Gene Evolution, Splicing, Evolution, Splicing regulation, Drosophila]
Summary
LS2 is a retroduplicated paralog of the highly conserved, ubiquitously expressed splicing factor U2AF50. While it has been shown to have a distinct role in Drosophila, its evolutionary history and ancestral state are currently unknown. I am using in silico and molecular techniques to reconstruct the evolutionary history of the LS2-regulated splicing network.
Background
The diversity of metazoan body structures does not correlate directly with total gene count, which implies that alternative splicing plays an integral role in metazoan gene regulation. A good illustration is the fruit fly, Drosophila melanogaster, which produces over 300,000 transcripts from 17,564 genes [1]. Alternative splicing, the production of different mRNA transcripts with distinct fates from a single pre-mRNA transcript, has been observed in many animal adaptations and novel phenotypes [2-4]. For example, metazoa may splice localization sequences in or out of a single protein that is required in multiple cellular compartments [5], or downregulate translation of a particular protein by splicing in an exon containing a premature in-frame stop codon. Splicing in one of these appropriately named ‘poison exons’ results in a truncated and subsequently degraded protein [6]. Alternative splicing is regulated by several auxiliary proteins that influence pre-mRNA splicing. Serine/arginine (SR) splicing factors are auxiliary proteins that influence splicing by binding cis elements in pre-mRNA transcripts. The subsequent recruitment or blockage of spliceosome assembly at a splice junction depends on two properties of these splicing factors: one or more N-terminal RNA-recognition motifs (RRMs) that interact with specific cis elements in pre-mRNA transcripts, and a C-terminal arginine/serine-rich (RS) domain that interacts with the spliceosome [7-9]. U2AF65 in humans, the homolog of U2AF50 in Drosophila, contains three RRMs and one RS domain [10]. Other properties and motifs of individual splicing factors give each its distinct role in the cell. While some splicing factors have been shown only to repress splicing at major splice sites, most can enhance or inhibit the alternative splicing reaction depending on the genomic context [8, 11-14].
The alternative splicing reaction requires a network of four key components to function: 1) a functional splicing factor with distinct targets, 2) specific tissue and spatial expression, 3) regulatory targets, that is, the genes and exons targeted by the splicing factor, and 4) sensitivity of the targets to regulation by the splicing factor. Sensitivity in this case requires a unique cis-acting binding motif [15]. Taking a systematic approach to each requirement independently, Irimia et al. (2011) reconstructed the evolutionary history of a splicing network regulated by Nova, a vertebrate CNS splicing factor. They found that while the target genes were ancestral, the target exons were newly created. And while Nova is primarily expressed in the vertebrate CNS, this pattern was only conserved in tunicates, the closest living relatives of vertebrates. Beyond chordates the expression pattern of Nova is highly divergent: Nova is expressed in various tissues and germ layers across invertebrates, including the gut (endoderm), mesoderm, pharynx, and salivary glands (ectoderm). Although this work explored the history of the Nova-regulated splicing network, the story was incomplete: without the ancestral state of the network, it provided little insight into the mechanisms by which a new gene acquires its distinct functional role. To address this gap, I will explore the evolutionary history of a retrogene.
To enhance our understanding of how new genes acquire distinct functional roles, I am studying LS2 (Large Subunit 2), a retroduplicate of the ancestral U2AF50 splicing factor in Drosophila. Retroduplicates arise when a gene is transcribed to intronless mRNA, reverse transcribed, and re-inserted into the genome. Because the original function is maintained by the progenitor gene, the retrogene is free to acquire a distinct function and give rise to a new gene [16-18]. I will study the origins of this gene and elucidate the ancestral state of the splicing network to determine how the distinct function of LS2 arose.
LS2 is a retroduplicated paralog of the highly conserved, ubiquitously expressed splicing factor U2AF50. U2AF50 is the large subunit of the U2AF heterodimer, an auxiliary splicing protein. It is conserved across eukaryotic species and promotes splicing in every tissue by interacting with a polypyrimidine tract 4-40 nucleotides upstream of the 3’ splice site. U2AF50 has been shown to preferentially activate splicing at targeted splice sites in Drosophila [14]. According to Taliaferro et al. (2011), U2AF50 underwent a duplication event between 60 and 250 mya that produced LS2. Since its arrival, LS2 has evolved sufficiently to have its own target specificity, function, and spatial expression [14]: while U2AF50 is ubiquitously expressed with its own set of target genes and exons, LS2 is preferentially expressed in the testis and targets a distinct set of genes and exons. Because U2AF50 promotes the inclusion of its target exons (splicing promoter) while LS2 acts as a splicing repressor, LS2 produces a transcript distinct from that of U2AF50 in co-targeted genes. Thus, when target genes overlap, the effect is combinatorial or competitive but not redundant, highlighting the distinction of LS2 as a highly evolved, independent protein.
Preliminary data suggest that LS2 arose in the ancestor of Drosophila and Scaptodrosophila and has maintained testis specificity in eight sequenced Drosophila species. Furthermore, all 38 previously targeted genes are conserved in Dipteran and Hymenopteran lineages. From Taliaferro et al. (2011), we know that the LS2-regulated splicing network produces distinct transcripts in the testis of Drosophila melanogaster, yet the steps by which this fast-evolving splicing network was assembled are currently unknown. Uncovering these evolutionary steps will contribute to understanding the general pattern of novel gene acquisition and burgeoning transcriptome complexity. Here, I use a multi-disciplinary approach to reconstruct the evolutionary history and origin of the splicing network regulated by the SR protein splicing factor LS2.
Overall Aim
I will use comparative biology to determine when each splicing network requirement was met and how LS2 evolved a distinct function.
Objectives
Obj. 1: To determine when LS2 acquired its function as a splicing factor.
Obj. 2: To determine when LS2 acquired testis-specific expression.
Obj. 3: To determine when the LS2 targets (genes and exons) arose.
Obj. 4: To determine when the LS2 targets acquired LS2 regulation sensitivity.
Experimental Design
Objective 1. To determine when LS2 acquired its FUNCTION as a splicing factor.
LS2 affects the expression of target genes in Drosophila by acting as a splicing factor that regulates the alternative splicing of target pre-mRNA transcripts. I hypothesize that Scaptodrosophila, a clade within the order Diptera at the base of the Drosophila lineage, is the progenitor lineage of LS2. To test for splicing activity of Scaptodrosophila LS2 (Scapto_LS2), I will clone Scapto_LS2 into an animal expression vector and transfect the recombinant vector into Drosophila S2 cells. Because LS2 is testis-specific, introducing it into S2 cells, a cell line that does not normally express LS2, may change splicing patterns through the distinct splicing activity of LS2.
Results: Because a change in splicing patterns will result in a change in the mRNAs produced, I will verify LS2’s splicing activity by sequencing the RNAs of wild-type S2 cells and of S2 cells transfected with Dm_LS2 or Scapto_LS2. Dm_LS2 will serve as a positive control for technical errors: I will compare the splicing patterns of wild-type S2 cells with those of Dm_LS2-transfected cells. Comparing the splicing patterns of S2 cells containing a Scapto_LS2 expression vector to those of wild-type S2 cells will tell me whether Scapto_LS2 functions as a distinct splicing factor. If Scapto_LS2 can act as a distinct splicing factor, I will observe a change in the cells’ splicing patterns. If there is no change in the cells’ splicing patterns, Scapto_LS2 could be acting as a redundant U2AF50. In that case, Scapto_LS2 would not have diverged sufficiently from its progenitor to have a distinct role in the Scaptodrosophila lineage, which would indicate that the distinct function was gained in the Drosophila lineage.
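One common way to quantify a change in splicing patterns from RNA-seq data is percent spliced in (PSI): the fraction of junction reads that support inclusion of an exon, compared between two samples. The Python sketch below is only an illustration of that comparison, not the analysis pipeline of this proposal. The event names, read counts, the 20-read coverage minimum, and the 0.1 change threshold are all hypothetical.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in: fraction of junction reads supporting exon inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # no coverage, so PSI is undefined
    return inclusion_reads / total


def delta_psi(control, treated, min_reads=20):
    """Return {event: PSI_treated - PSI_control} for events covered in both samples.

    control / treated map an event name to (inclusion_reads, exclusion_reads).
    min_reads is an assumed coverage cutoff, not a value from the proposal.
    """
    changes = {}
    for event, (inc_c, exc_c) in control.items():
        if event not in treated:
            continue
        inc_t, exc_t = treated[event]
        if inc_c + exc_c < min_reads or inc_t + exc_t < min_reads:
            continue  # too few reads to estimate PSI reliably
        changes[event] = psi(inc_t, exc_t) - psi(inc_c, exc_c)
    return changes


# Hypothetical junction read counts: event -> (inclusion, exclusion)
wt_s2 = {"geneA_exon3": (90, 10), "geneB_exon2": (50, 50), "geneC_exon5": (3, 2)}
scapto_ls2 = {"geneA_exon3": (40, 60), "geneB_exon2": (52, 48), "geneC_exon5": (1, 4)}

changes = delta_psi(wt_s2, scapto_ls2)
# Keep only events whose PSI shifts by at least 0.1 (assumed threshold)
shifted = {event: d for event, d in changes.items() if abs(d) >= 0.1}
print(shifted)
```

In this toy data only geneA_exon3 passes both filters, with inclusion dropping in the transfected cells, which is the direction expected for a splicing repressor. A real analysis would also need replicates and a statistical test.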
Objective 2. To determine when LS2 acquired testis specific EXPRESSION.
The functioning of the LS2 splicing network depends on the specific spatial expression of LS2. Unlike the ubiquitously expressed U2AF50, LS2 is differentially expressed in the testis of D. melanogaster. I want to know whether this expression pattern is conserved in Scaptodrosophila. To this end, I will perform qPCR on enriched Scaptodrosophila tissue samples: testis, brain, and whole fly minus testis and brain. If the pattern is conserved in Scaptodrosophila, it is inferred to be ancestral.
Results: qPCR will reveal the levels of LS2 expression in each tissue. I will assess the results for two factors: 1) differential spatial expression and 2) tissue specificity. Is Scapto_LS2 globally or differentially expressed? And, if Scapto_LS2 is differentially expressed, is the tissue of expression conserved?
Many alternative splicing events are known to be tissue specific. Dm_LS2, which has a known role in testis function and male gamete production, achieves its specificity by differential expression in the testis [14]. Given the functional constraints of LS2, I hypothesize that Scapto_LS2 will exhibit specific spatial expression in the testis, mirroring the observed pattern of Dm_LS2. Differential expression in a tissue other than testis may indicate a different functional role for Scapto_LS2 targets, warranting a commensurate expression pattern. The absence of a clear pattern of differential expression would indicate that Scapto_LS2 is not distinct enough from its progenitor, U2AF50, to require specific spatial expression. Either of the latter two outcomes would indicate that testis-specific expression is not ancestral and was gained in the Drosophila lineage.
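The tissue comparison described for this objective can be illustrated with the standard 2^-ddCt calculation for relative qPCR quantification. The sketch below makes several assumptions that are not stated in the proposal: a reference (housekeeping) gene is measured alongside LS2, amplification efficiency is close to 100%, and the whole-fly-minus-testis-and-brain sample (called "carcass" here) is used as the calibrator. All Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Mean target Ct minus mean reference-gene Ct for one tissue."""
    return mean(ct_target) - mean(ct_reference)


def fold_change(sample, calibrator):
    """Relative expression by 2^-ddCt (assumes ~100% amplification efficiency).

    sample and calibrator are (target_cts, reference_cts) tuples.
    """
    dd_ct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2 ** (-dd_ct)


# Hypothetical Ct replicates: tissue -> (LS2 Cts, reference-gene Cts)
ct_values = {
    "testis": ([22.0, 22.0, 22.0], [18.0, 18.0, 18.0]),
    "brain": ([27.5, 27.0, 26.5], [18.0, 18.0, 18.0]),
    "carcass": ([27.0, 27.0, 27.0], [18.0, 18.0, 18.0]),
}

calibrator = ct_values["carcass"]  # assumed calibrator tissue
relative = {tissue: fold_change(cts, calibrator) for tissue, cts in ct_values.items()}

for tissue, fc in relative.items():
    print(f"{tissue}: {fc:.1f}-fold relative to carcass")
```

With these made-up numbers, testis shows a large fold enrichment while brain matches the calibrator, which is the pattern the testis-specific hypothesis predicts. A flat profile across tissues would instead point to global expression.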
Objective 3. To determine when LS2 TARGETS arose: target genes and exons.
Dm_LS2 is known to affect 311 intron-exon splice junctions in 168 target genes in Drosophila melanogaster. To determine the evolutionary history of LS2 targets, I will look for the presence or absence of LS2 targets in an outgroup and an ingroup. Using the honey bee, Apis mellifera, as an outgroup and the mosquito, Anopheles gambiae, as a dipteran sister group to Drosophila, I will determine the evolutionary history of Drosophila LS2 targets at two levels of conservation: gene and exon. To determine gene conservation, I will use BLAST to locate orthologous LS2 target genes in the outgroup and sister group. If the gene exists, I will look for an orthologous target exon within that gene and confirm it using ClustalW2 and SeaView [19, 20].
Results: Drosophila genes and exons with an ortholog in the mosquito or honey bee are considered ancestral genes and exons. If the gene is present in the mosquito and honey bee without the orthologous target exon, I will conclude that the target exon is new and may have arisen in Drosophila for the LS2 splicing network.
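The decision rule above can be written down explicitly once the BLAST and alignment results have been reduced to presence/absence calls. The sketch below is a minimal illustration of that rule only. The target names and the calls are hypothetical, and the category labels are wording chosen for this example.

```python
from collections import Counter


def classify_target(gene_found, exon_found):
    """Classify one LS2 target from presence/absence calls.

    gene_found and exon_found map a species name to True/False.
    An exon call only counts in a species where the gene itself was found.
    """
    species_with_gene = [sp for sp, found in gene_found.items() if found]
    if not species_with_gene:
        return "gene not detected outside Drosophila"
    if any(exon_found.get(sp, False) for sp in species_with_gene):
        return "ancestral gene, ancestral exon"
    return "ancestral gene, new exon"


# Hypothetical calls: target -> (gene presence, exon presence)
targets = {
    "target_1": (
        {"Anopheles gambiae": True, "Apis mellifera": True},
        {"Anopheles gambiae": True, "Apis mellifera": False},
    ),
    "target_2": (
        {"Anopheles gambiae": True, "Apis mellifera": True},
        {"Anopheles gambiae": False, "Apis mellifera": False},
    ),
    "target_3": (
        {"Anopheles gambiae": False, "Apis mellifera": False},
        {"Anopheles gambiae": False, "Apis mellifera": False},
    ),
}

calls = {name: classify_target(genes, exons) for name, (genes, exons) in targets.items()}
summary = Counter(calls.values())
print(summary)
```

Tallying the categories across all target junctions would show whether LS2 mostly recruited pre-existing exons or whether its target exons are largely new in Drosophila.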
Objective 4. To determine when LS2 TARGET SENSITIVITY evolved.
Binding sites make the splice sites, exons, and genes visible to LS2. LS2 recognizes and binds to a guanine-rich binding site (poly-G). Using the approach from Objective 3, I will look for binding sites upstream of target exons in orthologous genes in sequenced dipteran sister groups and hymenopteran outgroups.
Results: Binding site presence and absence screens will be conducted using BLAST, ClustalW2, and novel Perl scripts to search for likely binding site candidates near those reported in D. melanogaster. If the binding sites are present in the ingroup and outgroups, I will conclude that the poly-G binding motif was ancestral and was co-opted by LS2 in Drosophila; if they are absent in the test groups, I will conclude that the LS2 binding site is novel, created in the Drosophila lineage. If the available data are inadequate, I will use a SELEX binding assay to determine the optimal binding site of Scapto_LS2.
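A binding site screen of this kind reduces to scanning a window upstream of each target exon for runs of guanine. The proposal plans Perl scripts for this step; the sketch below uses Python purely for illustration. The minimum run length of four Gs, the 100-nucleotide window, and the example sequence are assumptions made for this example, not parameters of the LS2 motif reported in the literature.

```python
import re


def upstream_window(gene_seq, exon_start, window=100):
    """Return up to `window` nucleotides immediately upstream of the exon start."""
    return gene_seq[max(0, exon_start - window):exon_start]


def find_poly_g(seq, min_run=4):
    """Return (start, end, run) for every run of at least min_run consecutive Gs."""
    pattern = re.compile(r"G{%d,}" % min_run)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence: an intron fragment followed by the start of a target exon
intron = "ATC" + "GGGGG" + "TTACAGGTTTCCCTA" + "GGGG" + "ATTTCAG"
exon = "ATGGCCAAGT"
gene_seq = intron + exon
exon_start = len(intron)

hits = find_poly_g(upstream_window(gene_seq, exon_start))
for start, end, run in hits:
    # Distance from the end of the G run to the first exon nucleotide
    print(f"{run} at {start}-{end}, {exon_start - end} nt upstream of the exon")
```

Running the same scan on the orthologous region in each comparison species, and recording whether any hit is found, gives the presence/absence table that the conclusion about ancestral versus novel binding sites depends on.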
Conclusions/Significance
With this systematic approach, Irimia et al. (2011) demonstrated the orderly sequence in which the splicing network regulated by Nova, a vertebrate CNS splicing factor, was assembled. Using the same approach, I will reconstruct the evolutionary history of the LS2-regulated splicing network to further understand how new genes evolve distinct functions. This work will enrich our knowledge of gene evolution.
Literature Cited
1. Brown, J.B., et al., Diversity and dynamics of the Drosophila transcriptome. Nature, 2014. 512(7515): p. 393-9.
2. Nilsen, T.W. and B.R. Graveley, Expansion of the eukaryotic proteome by alternative splicing. Nature, 2010. 463(7280): p. 457-63.
3. Lee, Y. and D.C. Rio, Mechanisms and Regulation of Alternative Pre-mRNA Splicing. Annu Rev Biochem, 2015. 84: p. 291-323.
4. Wang, E.T., et al., Alternative isoform regulation in human tissue transcriptomes. Nature, 2008. 456(7221): p. 470-6.
5. Ast, J., et al., Dual targeting of peroxisomal proteins. Front Physiol, 2013. 4: p. 297.
6. Lareau, L.F., et al., Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature, 2007. 446(7138): p. 926-9.
7. Schaal, T.D. and T. Maniatis, Selection and Characterization of Pre-mRNA Splicing Enhancers: Identification of Novel SR Protein-Specific Enhancer Sequences. Mol Cell Biol, 1999. 19(3): p. 1705-1719.
8. Bradley, T., M.E. Cook, and M. Blanchette, SR proteins control a complex network of RNA-processing events. RNA, 2015. 21(1): p. 75-92.
9. Schaal, T.D. and T. Maniatis, Multiple Distinct Splicing Enhancers in the Protein-Coding Sequences of a Constitutively Spliced Pre-mRNA. Mol Cell Biol, 1999. 19(1): p. 261-273.
10. Long, J.C. and J.F. Caceres, The SR protein family of splicing factors: master regulators of gene expression. Biochem J, 2009. 417(1): p. 15-27.
11. Fairbrother, W.G., R.-F. Yeh, P.A. Sharp, and C.B. Burge, Predictive Identification of Exonic Splicing Enhancers in Human Genes. Mol Cell, 2000. 20(18): p. 6816-6825.
12. Burd, C.G. and G. Dreyfuss, Conserved Structures and Diversity of Functions of RNA-Binding Proteins. Science, 1994.
13. Padgett, R.A., P.J. Grabowski, et al., Splicing of Messenger RNA Precursors. Annu Rev Biochem, 1986.
14. Taliaferro, J.M., et al., Evolution of a tissue-specific splicing network. Genes Dev, 2011. 25(6): p. 608-20.
15. Irimia, M., et al., Stepwise assembly of the Nova-regulated alternative splicing network in the vertebrate brain. Proc Natl Acad Sci U S A, 2011.
16. Marques, A.C., et al., Emergence of young human genes after a burst of retroposition in primates. PLoS Biol, 2005. 3(11): p. e357.
17. Ohno, S., Evolution By Gene Duplication. 1970.
18. Betran, E., Retroposed New Genes Out of the X in Drosophila. Genome Res, 2002.
19. Gouy, M., S. Guindon, and O. Gascuel, SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol, 2010. 27(2): p. 221-4.
20. Larkin, M.A., et al., Clustal W and Clustal X version 2.0. Bioinformatics, 2007. 23(21): p. 2947-8.