This is an essay I wrote for my class: The Ancient and Modern RNA Worlds, taught by Dr. Laura Landweber at Columbia University.
This paper traces the history of RNA splicing from its discovery in 1977 to the recent atomic structures of the spliceosome. It highlights the key experiments that revealed the RNA origins and functions of the spliceosome, as well as the challenges and questions that remain unresolved. The paper focuses on the role of U6 RNA as the main catalytic agent in the spliceosome, and discusses how the complex interactions of proteins and RNAs enable the precise and flexible regulation of splicing in higher eukaryotes.
The early years of genetic studies focused mainly on bacteria. Genes were thought to be continuous segments of long, double-stranded DNA molecules that carry hereditary information. However, this view of gene structure changed dramatically in 1977 when Richard J. Roberts and Phillip A. Sharp independently discovered that genes could be split into several segments separated by non-coding regions within the DNA, known today as introns. This finding of split genes raised new questions about the evolutionary and functional significance of this phenomenon and the mechanism behind it. Early links were made to the self-splicing introns commonly observed in bacteria, while the complex attributes of the splicing machinery continue to be studied in detail today.
Successful splicing depends on the proper assembly and function of the spliceosome, a large ribonucleoprotein (RNP) complex composed of five small nuclear ribonucleoproteins (snRNPs) and associated proteins, discussed in more detail in the following section. Precursor messenger RNA (pre-mRNA) splicing involves two transesterification reactions, relatively simple chemical reactions that rely on functional groups from three reactive regions in the pre-mRNA. These reactions involve the 5’ splice site (SS), branch point sequence (BPS), and 3’ SS, defined by short consensus sequence elements (mainly GT-AG in the gene, GU-AG in the transcript) that are overall poorly conserved in metazoans. Although chemically simple, splicing reactions require hydrolysis of a large quantity of ATP and extensive regulation by proteins and RNA. This added complexity may help ensure accurate splicing choices (Matera and Wang 2014). The spliceosome carries out splicing in a dynamic and flexible way. It recognizes specific sequences at the splice sites and forms a series of complexes with the pre-mRNA. It can recognize different splice sites in the pre-mRNA and choose among them during alternative splicing, which allows a single gene to yield diverse protein products. This is unlike most RNPs, whose targets are typically predefined. The spliceosome changes its composition and structure during splicing, depending on the substrate and the stage of the reaction (Wahl, Will, and Lührmann 2009). During this process, the spliceosome undergoes several conformational changes that bring the splice sites closer together to activate the catalytic center, and later destabilize and release the spliced-out lariat.
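As a rough illustration of the two transesterification steps described above, the following Python sketch treats a pre-mRNA as a string and removes one intron. The function name `splice_once`, the coordinate convention, and the toy sequence (a yeast-like UACUAAC branch sequence inside a GU...AG intron) are assumptions made for illustration. This is not a model of the real spliceosome, which also needs ATP, snRNPs, and many proteins.

```python
def splice_once(pre_mrna: str, five_ss: int, branch: int, three_ss: int):
    """Toy model of one splicing event on an RNA string.

    five_ss  -- index of the first intron nucleotide (the G of GU)
    branch   -- index of the branch-point adenosine inside the intron
    three_ss -- index just past the last intron nucleotide (after AG)
    Returns (mRNA, excised intron, whether the intron has GU...AG ends).
    """
    if not 0 < five_ss < branch < three_ss < len(pre_mrna):
        raise ValueError("expected 0 < five_ss < branch < three_ss < len(pre_mrna)")
    if pre_mrna[branch] != "A":
        raise ValueError("branch point is expected to be an adenosine")

    # Step 1: the branch-point 2'-OH attacks the 5' splice site, freeing
    # the 5' exon and leaving a lariat intron still attached to the 3' exon.
    five_exon = pre_mrna[:five_ss]
    lariat_intermediate = pre_mrna[five_ss:]

    # Step 2: the 3'-OH of the 5' exon attacks the 3' splice site, joining
    # the exons and releasing the intron lariat.
    intron_len = three_ss - five_ss
    lariat = lariat_intermediate[:intron_len]
    three_exon = lariat_intermediate[intron_len:]

    canonical = lariat.startswith("GU") and lariat.endswith("AG")
    return five_exon + three_exon, lariat, canonical


# Hypothetical toy substrate: exon 1, a short GU...AG intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUACUAACUUUUCAG", "GGAUAA"
pre = exon1 + intron + exon2
branch_a = pre.index("UACUAAC") + 5  # the last A of UACUAAC

mrna, lariat, canonical = splice_once(
    pre, len(exon1), branch_a, len(exon1) + len(intron)
)
print(mrna, lariat, canonical)
```

The string model keeps only the bookkeeping of which phosphodiester bonds are broken and formed; the 2’-5’ linkage that closes the lariat is described in the comments rather than represented in the data.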
The spliceosome is highly conserved from yeast to humans, indicating its fundamental role in eukaryotic gene expression. However, the spliceosome is also very flexible, as it can coordinate splicing of short and long introns with varied sequences. Some insights into the origin and evolution of the spliceosome came from self-splicing introns, which catalyze their own removal without the help of proteins. Group II introns in particular are thought to be ancestral to the spliceosome, as they share several key structural and functional features with it (Koonin 2006). Importantly, unlike group II introns, which can self-splice and form an active site with their own three-dimensional fold (Toor et al. 2008), nuclear pre-mRNA introns need many trans-acting factors in the spliceosome to fold and splice properly. The spliceosome continues to be an active area of research, as many aspects of its structure, function and regulation remain to be elucidated.
Small nuclear RNAs (snRNAs) range from 90 to 220 nucleotides and exist as RNA-protein complexes in small nuclear ribonucleoproteins (snRNPs). These are abundant, non-coding, non-polyadenylated transcripts that function in the nucleoplasm. Five snRNAs are known to be involved in the major spliceosome, U1, U2, U4, U5, and U6, named for their high uridine content (Karijolich and Yu 2010). U1, U2, U4, and U5 make up the Sm class, while U6 alone makes up the Sm-like class (Matera and Wang 2014). Newly transcribed Sm-class snRNAs are exported to the cytoplasm, where they are bound by Sm proteins before being imported back into the nucleus (budding yeast and trypanosomes are exceptions), suggesting an intricate regulatory mechanism involving many proteins. Sm proteins assemble into a heptameric ring around each snRNA, establishing the core of the U1, U2, U4 and U5 snRNPs. Over forty years ago, Joan Steitz first proposed that snRNPs and snRNAs were themselves important for the chemical reactions of splicing (Lerner et al. 1980). The proposal grew out of the observation that the 5’ end of U1 snRNA is complementary to nucleotides across splice junctions, suggesting that snRNAs directly influence splicing by base-pairing with RNA substrates, ensuring the proper recognition and excision of introns from precursor RNA molecules. Early results from the Guthrie lab also suggested that U2 could recognize the branch point through base pairing, based on genetic studies in yeast (Parker, Siliciano, and Guthrie 1987). Further studies have since shown that snRNAs are essential for splice site recognition, spliceosome assembly, and the catalytic activity of splicing itself. Although short, snRNAs are conserved from yeast to humans, suggesting an integral role in the early days of this complex.
This raises the question: why are there only five snRNAs? Or are there more (hint: see the minor spliceosome)? The answer to the first question is beyond the scope of this discussion but presents an additional avenue for exploring the RNA origins of splicing.
One of the major debates in molecular evolution is whether introns were present in the earliest forms of life or emerged later in eukaryotes. The ‘introns early’ hypothesis suggests that introns were abundant in the ancestral genome and that they contributed to the origin and diversification of proteins by allowing the recombination of modular exons. According to this view, prokaryotes lost most of their introns due to selective pressure for genome compactness. The ‘introns late’ hypothesis, on the other hand, proposes that introns are a relatively recent invention of eukaryotes and that they have been added into genes during eukaryotic evolution. This perspective implies that prokaryotes never had introns or spliceosomes in their history. One way to test these hypotheses is to compare the introns and spliceosomes of different organisms. For example, it has been proposed that modern introns (spliced by the major spliceosome) evolved from group II self-splicing introns, introns found in some bacteria and organelles. Group II introns are ribozymes (RNA enzymes) that are able to catalyze their splicing without proteins. They have a similar secondary structure and splicing mechanism to spliceosomal introns, but they differ in sequence and size. Some evidence supports the idea that group II introns were originally transferred from the ancient bacterial ancestor of mitochondria to the eukaryotic nucleus, where they gave rise to spliceosomal introns. Another way to study the evolution of introns and spliceosomes is to examine their diversity and complexity among eukaryotes. For instance, it has been observed that intron size correlates with organism complexity, with larger introns found in higher eukaryotes than in lower ones. This could be due to different selective pressures on genome size and gene expression efficiency. Alternatively, it could reflect different rates of intron insertion and deletion over evolutionary time. 
Moreover, alternative splicing is more prevalent and diverse in higher eukaryotes than in lower ones, suggesting that it evolved as an additional pathway to increase phenotypic complexity and functional specialization, a development known to have occurred more recently in evolutionary time. It is likely that the intron-rich environment allowed alternative splicing to arise, another hypothesis that remains to be fully tested today.
One convincing hypothesis is that splicing of pre-mRNA evolved from self-splicing group II introns into the complex and dynamic modern-day spliceosome. This would support the notion that RNA molecules lie at the heart of the catalytic reactions of splicing. Group II introns use metal ions as cofactors while catalyzing their own splicing (Shukla and Padgett 2002). However, most eukaryotic introns have lost this splicing capability and rely on a large macromolecular machine, the spliceosome, to remove them. The spliceosome is made up of five snRNAs and hundreds of proteins that assemble and disassemble on each intron. The evolutionary argument for this transition is that it allowed more accurate recognition of splice sites and more flexible regulation of alternative splicing, which is essential for generating protein diversity and complexity in eukaryotes. However, the core mechanism of splicing is believed to remain conserved between group II self-splicing introns and the modern spliceosome. Fica et al. (2013) showed that the U6 snRNA, one of the snRNAs in the spliceosome, acts as a metalloenzyme that coordinates the catalytic magnesium ions within the catalytic site. To do this, they used a chemical approach to replace specific oxygen atoms in the phosphate groups of the U6 snRNA with sulfur atoms. Sulfur binds magnesium poorly but binds thiophilic metals such as manganese and cadmium well, so a substitution at a metal-contacting position should impair splicing in magnesium and be rescued by a thiophilic metal. Of the 20 positions tested in the U6 snRNA, five impaired splicing in the presence of magnesium ions, but not when manganese or cadmium ions were added. These results suggest that these positions are involved in coordinating metal ions that are essential for splicing. The authors also found that the phosphate groups that coordinate the metal ions within the group II introns are conserved in the U6 snRNA and the pre-mRNA substrate.
Moreover, they observed that the geometry and stereochemistry of the metal-binding sites are similar between the two systems. These findings provided strong evidence for the chemical equivalence of the spliceosome and the group II intron, and shed light on the mechanism of RNA splicing. The work offered concrete evidence that pre-mRNA splicing is fundamentally an RNA-catalyzed reaction, and suggested that the U6 snRNA may be a direct descendant of group II introns.
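The logic of this metal-rescue experiment can be written down as a simple filter: a position is a candidate metal ligand if the sulfur substitution impairs splicing in magnesium and a thiophilic metal restores it. The sketch below is only an illustration. The function name, the 0.5 thresholds, and the three example sites with their efficiency values are invented placeholders, not data from Fica et al. (2013).

```python
def metal_rescue_candidates(efficiency, impaired_below=0.5, rescued_above=0.5):
    """Return positions whose sulfur substitution is impaired in Mg2+ but rescued.

    efficiency maps a position label to a pair of splicing efficiencies,
    each relative to the unsubstituted control:
    (in Mg2+ alone, with a thiophilic metal such as Mn2+ or Cd2+ added).
    """
    return sorted(
        pos
        for pos, (in_mg, with_thiophilic) in efficiency.items()
        if in_mg < impaired_below and with_thiophilic >= rescued_above
    )


# Invented placeholder values for three hypothetical substituted sites.
toy = {
    "site_a": (0.95, 0.97),  # substitution tolerated: no evidence either way
    "site_b": (0.10, 0.80),  # impaired, then rescued: candidate metal ligand
    "site_c": (0.05, 0.08),  # impaired, not rescued: defect of another kind
}
print(metal_rescue_candidates(toy))
```

The third case matters for interpretation: a substitution that impairs splicing without rescue may simply disturb RNA structure, which is why rescue by a thiophilic metal, rather than impairment alone, is taken as the signature of a metal contact.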
Cryo-EM has revolutionized the field of structural biology in recent years by enabling the visualization of large macromolecular complexes in their native states, including the spliceosome. Prior to the development of cryo-EM, it was challenging to obtain high-resolution structural information on the spliceosome due to its large size and dynamic nature. Importantly, several structures of the human spliceosomal C* complex, which forms just before the exon ligation step, were published in 2017. The first pseudo-atomic structure (Bertram et al. 2017) captured the stalled complex immediately after PRP16 RNA helicase activity, at an overall resolution of 5.9 Å. While the structure of the catalytic U2-U6 RNA-protein core closely resembles that of the yeast C complex (pre-Prp16 activity), the branched intron region in human C* was shown to be separated from the catalytic center while remaining in proximity to U6 through interactions with the highly conserved PRP8 RNase H domain. The structure also revealed that the PRP22 RNA helicase sits at the periphery of the catalytic center and helps remove spliced mRNA after the second catalytic step from afar. The second cryo-EM structure published in 2017 was the first atomic structure of the same complex, with an average resolution of 3.6 Å (Zhang et al. 2017). The authors found that the PRP17 splicing factor helped keep the active site conformation stable, while RBM22 contained a positively charged central channel through which the intron lariat could pass. Importantly, they found that the overall structural organization and conformation of the human C* complex closely resembled the 5.9-Å structure, while the improved resolution helped identify six additional proteins related to exon ligation. Furthermore, the presence of metals and their coordination with specific phosphates from U6 snRNA were verified by the structure.
Specifically, M1 was shown to be coordinated by the G72 and U74 phosphates of U6 snRNA and the 3’-OH at the 3’ end of the 5’ exon, while M2 helped stabilize the exon ligation reaction (through A53, G54, and U74 of U6 snRNA).
While not discussed in detail herein, several parallels have been drawn between the spliceosome and the ribosome. Earlier work showed that Prp8, the large and conserved protein at the center of the debate over the spliceosome’s catalytic unit, interacts with Snu114, a homolog of the ribosomal translocase EF-2, whose domain IV mimics the anticodon stem loop of a tRNA and mediates its translocation function (Staley and Guthrie 1998). Jon Staley and Christine Guthrie also speculated that the central stem loop of U5 snRNA resembles a tRNA anticodon stem loop (Staley and Guthrie 1998).
A rare category of introns with non-canonical consensus sequences was discovered across metazoan genes during the 1990s. These introns were shown to be spliced by a minor-class spliceosome made up of four snRNPs (U11, U12, U4atac and U6atac) that are functionally analogous to the U1, U2, U4 and U6 snRNPs of the major-class spliceosome, respectively. The only snRNP shared between the two spliceosomes is U5 (Patel and Steitz 2003). The Prp8 protein, which is highly conserved and crucial for the spliceosomal catalytic center, was shown to co-precipitate with both minor- and major-class spliceosomes (Luo et al. 1999), suggesting a similar and evolutionarily conserved structural organization. Further, the group II intron domain 5 was shown to be able to replace U6atac snRNA and catalyze the first splicing step in vitro and in vivo, which also supports the idea that the spliceosome (minor or major) originated from ancient RNA (Shukla and Padgett 2002). The terminal dinucleotides do not define the class: a few major-class introns have AT-AC ends, while the majority of minor-class introns have the canonical GT-AG at their ends (Sharp and Burge 1997). Key experiments showed that changing AT-AC to GT-AG at the intron ends did not affect U12-dependent splicing. Instead, U12-dependent splicing was suggested to depend on the longer and more specific consensus sequences at the 5′ splice site and branch site of the target introns, in addition to the lack of a polypyrimidine tract upstream of the 3′ splice site. Hence, the term ‘U12-type’ was adopted for this rare class of introns. Interestingly, U12-type introns are in most cases found alongside U2-type introns in the same gene, and show no preference for location within the gene. Surprisingly, human U6atac is more divergent from human U6 snRNA than human and yeast U6 are from each other.
This may suggest that U6 evolved in an independent lineage from U6atac throughout evolution.
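The point that U12-type introns are defined by their 5′ splice site and branch site rather than by their terminal dinucleotides can be sketched in a few lines of Python. The consensus strings used here (a 5′ splice site beginning RTATCCTT and a branch site TCCTTAAC near the 3′ end) are the commonly cited U12-type motifs and are not taken from this essay. The function name, the exact-match rule, the 40-nucleotide window, and the example sequences are simplifying assumptions; real classifiers score position weight matrices rather than exact matches.

```python
import re

# Assumed U12-type consensus motifs (DNA alphabet), used as exact matches.
U12_FIVE_SS = re.compile(r"^[AG]TATCCTT")  # intron start: RTATCCTT
U12_BRANCH = "TCCTTAAC"                    # branch site near the 3' end


def classify_intron(intron: str, branch_window: int = 40) -> str:
    """Label an intron 'U12-type' or 'U2-type' from its internal motifs.

    The terminal dinucleotides (GT-AG vs AT-AC) are deliberately ignored.
    """
    seq = intron.upper().replace("U", "T")
    has_five_ss = bool(U12_FIVE_SS.match(seq))
    has_branch = U12_BRANCH in seq[-branch_window:]
    return "U12-type" if has_five_ss and has_branch else "U2-type"


# Hypothetical example introns, written 5' to 3'.
at_ac = "ATATCCTTTTGGCAGTCATTTCCTTAACTGTTTAC"
gt_ag = "G" + at_ac[1:-2] + "AG"            # same intron with GT-AG ends
u2_like = "GTAAGTTTTTACTAACTTTTTTTTCTCCAG"  # GT-AG intron without U12 motifs

for name, seq in [("at_ac", at_ac), ("gt_ag", gt_ag), ("u2_like", u2_like)]:
    print(name, classify_intron(seq))
```

Swapping the ends of the first example from AT-AC to GT-AG leaves its label unchanged, which mirrors the mutational result described above.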
Overall, the main differences between minor and major class splicing are likely to occur during intron recognition rather than catalysis. Another possible difference in recognizing U12-type rather than U2-type introns is the requirement for 5′ exon sequences when establishing correct U6atac–5′-splice-site base pairing by the minor-class spliceosome. It was shown that very short 5′ exons are able to support the first step of major-class splicing. Meanwhile, the recognition of the 5′ splice site within the major-class spliceosome occurs twice, by U1 followed by U6. This suggests a higher reliance on the 5′ sequence and a potential mechanism for improving upon splicing specificity, when compared to the fidelity of the minor spliceosome. The splicing of U12-type introns was also shown to be slower than that of U2-type introns, affecting downstream gene expression (Patel, McCarthy, and Steitz 2002). It is possible that U12-type introns have been converted to U2-type over time to improve efficiency in modern organisms. The splice sites of major and minor classes are distinct and do not form hybrid introns, but they enable different patterns of alternative splicing choices. One example is the prospero gene of Drosophila melanogaster, which has an intron-within-an-intron structure with minor-class splice sites enclosing an internal major-class intron (Otake et al. 2002). In this case, only one of the two spliceosome pathways is used for splicing, which affects the protein-coding sequence and can result in distinct protein isoforms.
The field of RNA splicing research remains full of outstanding questions that need to be addressed. For example, why do metazoan cells maintain two seemingly redundant systems for intron removal? Further, how well can deep learning models predict splicing outcomes from pre-mRNA sequences? These models often capture the main splicing motifs, such as the 5’ and 3’ splice sites and the branch point, but do they also recognize minor spliceosome motifs that have been overlooked in previous studies?
Furthermore, what is the splicing code that determines how different factors regulate splicing in different contexts? Beyond the splice site motifs, what other sequence elements or structural features influence splicing decisions? How do these elements interact with the various proteins involved in the spliceosome assembly and catalysis?
Finally, how can we obtain more detailed structural information about the spliceosome and its interactions with pre-mRNAs? Cryo-electron microscopy has revealed remarkable insights into the architecture and dynamics of the spliceosome, but there are still many gaps in our understanding of how splice sites are recognized and selected by different splicing factors across all stages of this dynamic process, and how mutations in these proteins could alter splicing outcomes and gene expression. Structural analyses of spliceosomes formed on alternatively spliced pre-mRNAs may also reveal the complex mechanisms whereby splice site choices are regulated.
Several of the ideas discussed herein were inspired by Dr. Christine Guthrie’s essay titled “From the Ribosome to the Spliceosome and Back Again” (Guthrie 2010).
Bertram, Karl, Dmitry E. Agafonov, Wen-Ti Liu, Olexandr Dybkov, Cindy L. Will, Klaus Hartmuth, Henning Urlaub, Berthold Kastner, Holger Stark, and Reinhard Lührmann. 2017. “Cryo-EM Structure of a Human Spliceosome Activated for Step 2 of Splicing.” Nature 542 (7641): 318–23.
Fica, Sebastian M., Nicole Tuttle, Thaddeus Novak, Nan-Sheng Li, Jun Lu, Prakash Koodathingal, Qing Dai, Jonathan P. Staley, and Joseph A. Piccirilli. 2013. “RNA Catalyses Nuclear Pre-mRNA Splicing.” Nature 503 (7475): 229–34.
Guthrie, Christine. 2010. “From the Ribosome to the Spliceosome and Back Again.” The Journal of Biological Chemistry 285 (1): 1–12.
Karijolich, John, and Yi-Tao Yu. 2010. “Spliceosomal snRNA Modifications and Their Function.” RNA Biology 7 (2): 192–204.
Koonin, Eugene V. 2006. “The Origin of Introns and Their Role in Eukaryogenesis: A Compromise Solution to the Introns-Early versus Introns-Late Debate?” Biology Direct 1 (August): 22.
Lerner, M. R., J. A. Boyle, S. M. Mount, S. L. Wolin, and J. A. Steitz. 1980. “Are snRNPs Involved in Splicing?” Nature 283 (5743): 220–24.
Luo, H. R., G. A. Moreau, N. Levin, and M. J. Moore. 1999. “The Human Prp8 Protein Is a Component of Both U2- and U12-Dependent Spliceosomes.” RNA 5 (7): 893–908.
Matera, A. Gregory, and Zefeng Wang. 2014. “A Day in the Life of the Spliceosome.” Nature Reviews. Molecular Cell Biology 15 (2): 108–21.
Otake, Leo R., Petra Scamborova, Carl Hashimoto, and Joan A. Steitz. 2002. “The Divergent U12-Type Spliceosome Is Required for Pre-mRNA Splicing and Is Essential for Development in Drosophila.” Molecular Cell 9 (2): 439–46.
Parker, R., P. G. Siliciano, and C. Guthrie. 1987. “Recognition of the TACTAAC Box during mRNA Splicing in Yeast Involves Base Pairing to the U2-like snRNA.” Cell 49 (2): 229–39.
Patel, Abhijit A., Matthew McCarthy, and Joan A. Steitz. 2002. “The Splicing of U12-Type Introns Can Be a Rate-Limiting Step in Gene Expression.” The EMBO Journal 21 (14): 3804–15.
Sharp, P. A., and C. B. Burge. 1997. “Classification of Introns: U2-Type or U12-Type.” Cell 91 (7): 875–79.
Shukla, Girish C., and Richard A. Padgett. 2002. “A Catalytically Active Group II Intron Domain 5 Can Function in the U12-Dependent Spliceosome.” Molecular Cell 9 (5): 1145–50.
Staley, J. P., and C. Guthrie. 1998. “Mechanical Devices of the Spliceosome: Motors, Clocks, Springs, and Things.” Cell 92 (3): 315–26.
Toor, Navtej, Kevin S. Keating, Sean D. Taylor, and Anna Marie Pyle. 2008. “Crystal Structure of a Self-Spliced Group II Intron.” Science 320 (5872): 77–82.
Wahl, Markus C., Cindy L. Will, and Reinhard Lührmann. 2009. “The Spliceosome: Design Principles of a Dynamic RNP Machine.” Cell 136 (4): 701–18.
Zhang, Xiaofeng, Chuangye Yan, Jing Hang, Lorenzo I. Finci, Jianlin Lei, and Yigong Shi. 2017. “An Atomic Structure of the Human Spliceosome.” Cell 169 (5): 918–29.e14.