- GBE. Transcription, mRNA Export and Immune Evasion Shape the Codon Usage of Viruses. Mordstein, Christine, Cano, Laura, Castillo Morales, Atahualpa, Young, Bethan, Ho, Alexander T, Rice, Alan M, Liss, Michael, Hurst, Laurence D, and Kudla, Grzegorz
The nucleotide composition, dinucleotide composition, and codon usage of many viruses differs from their hosts. These differences arise because viruses are subject to unique mutation and selection pressures that do not apply to host genomes; however, the molecular mechanisms that underlie these evolutionary forces are unclear. Here, we analysed the patterns of codon usage in 1,520 vertebrate-infecting viruses, focusing on parameters known to be under selection and associated with gene regulation. We find that GC content, dinucleotide content, and splicing and m6A modification-related sequence motifs are associated with the type of genetic material (DNA or RNA), strandedness, and replication compartment of viruses. In an experimental follow-up, we find that the effects of GC content on gene expression depend on whether the genetic material is delivered to the cell as DNA or mRNA, whether it is transcribed by endogenous or exogenous RNA polymerase, and whether transcription takes place in the nucleus or cytoplasm. Our results suggest that viral codon usage cannot be explained by a simple adaptation to the codon usage of the host – instead, it reflects the combination of multiple selective and mutational pressures, including the need for efficient transcription, export, and immune evasion.
- GBE. Causes and Consequences of Purifying Selection on SARS-CoV-2. Castillo Morales, Atahualpa, Rice, Alan M, Ho, Alexander T, Mordstein, Christine, Mühlhausen, Stefanie, Watson, Samir, Cano, Laura, Young, Bethan, Kudla, Grzegorz, and Hurst, Laurence D
Owing to a lag between a deleterious mutation’s appearance and its selective removal, gold-standard methods for mutation rate estimation assume no meaningful loss of mutations between parents and offspring. Indeed, from analysis of closely related lineages, in SARS-CoV-2, the Ka/Ks ratio was previously estimated as 1.008, suggesting no within-host selection. By contrast, we find a higher number of observed SNPs at 4-fold degenerate sites than elsewhere and, allowing for the virus’s complex mutational and compositional biases, estimate that the mutation rate is at least 49–67% higher than would be estimated based on the rate of appearance of variants in sampled genomes. Given the high Ka/Ks one might assume that the majority of such intrahost selection is the purging of nonsense mutations. However, we estimate that selection against nonsense mutations accounts for only ∼10% of all the “missing” mutations. Instead, classical protein-level selective filters (against chemically disparate amino acids and those predicted to disrupt protein functionality) account for many missing mutations. It is less obvious why for an intracellular parasite, amino acid cost parameters, notably amino acid decay rate, is also significant. Perhaps most surprisingly, we also find evidence for real-time selection against synonymous mutations that move codon usage away from that of humans. We conclude that there is common intrahost selection on SARS-CoV-2 that acts on nonsense, missense, and possibly synonymous mutations. This has implications for methods of mutation rate estimation, for determining times to common ancestry and the potential for intrahost evolution including vaccine escape.
- MBE. Evidence for Strong Mutation Bias toward, and Selection against, U Content in SARS-CoV-2: Implications for Vaccine Design. Rice, Alan M, Castillo Morales, Atahualpa, Ho, Alexander T, Mordstein, Christine, Mühlhausen, Stefanie, Watson, Samir, Cano, Laura, Young, Bethan, Kudla, Grzegorz, and Hurst, Laurence D
Large-scale re-engineering of synonymous sites is a promising strategy to generate vaccines either through synthesis of attenuated viruses or via codon-optimized genes in DNA vaccines. Attenuation typically relies on deoptimization of codon pairs and maximization of CpG dinucleotide frequencies. So as to formulate evolutionarily informed attenuation strategies that aim to force nucleotide usage against the direction favored by selection, here, we examine available whole-genome sequences of SARS-CoV-2 to infer patterns of mutation and selection on synonymous sites. Analysis of mutational profiles indicates a strong mutation bias toward U. In turn, analysis of observed synonymous site composition implicates selection against U. Accounting for dinucleotide effects reinforces this conclusion, observed UU content being a quarter of that expected under neutrality. Possible mechanisms of selection against U mutations include selection for higher expression, for high mRNA stability or lower immunogenicity of viral genes. Consistent with gene-specific selection against CpG dinucleotides, we observe systematic differences of CpG content between SARS-CoV-2 genes. We propose an evolutionarily informed approach to attenuation that, unusually, seeks to increase usage of the already most common synonymous codons. Comparable analysis of H1N1 and Ebola finds that GC3 deviated from neutral equilibrium is not a universal feature, cautioning against generalization of results.
- MOL CELL. A Family of Vertebrate-Specific Polycombs Encoded by the LCOR/LCORL Genes Balance PRC2 Subtype Activities. Conway, Eric, Jerman, Emilia, Healy, Evan, Ito, Shinsuke, Holoch, Daniel, Oliviero, Giorgio, Deevy, Orla, Glancy, Eleanor, Fitzpatrick, Darren J., Mucha, Marlena, Watson, Ariane, Rice, Alan M., Chammas, Paul, Huang, Christine, Pratt-Kelly, Indigo, Koseki, Yoko, Nakayama, Manabu, Ishikura, Tomoyuki, Streubel, Gundula, Wynne, Kieran, Hokamp, Karsten, McLysaght, Aoife, Ciferri, Claudio, Di Croce, Luciano, Cagney, Gerard, Margueron, Raphaël, Koseki, Haruhiko, and Bracken, Adrian P.
The polycomb repressive complex 2 (PRC2) consists of core subunits SUZ12, EED, RBBP4/7, and EZH1/2 and is responsible for mono-, di-, and tri-methylation of lysine 27 on histone H3. Whereas two distinct forms exist, PRC2.1 (containing one polycomb-like protein) and PRC2.2 (containing AEBP2 and JARID2), little is known about their differential functions. Here, we report the discovery of a family of vertebrate-specific PRC2.1 proteins, “PRC2 associated LCOR isoform 1” (PALI1) and PALI2, encoded by the LCOR and LCORL gene loci, respectively. PALI1 promotes PRC2 methyltransferase activity in vitro and in vivo and is essential for mouse development. Pali1 and Aebp2 define mutually exclusive, antagonistic PRC2 subtypes that exhibit divergent H3K27-tri-methylation activities. The balance of these PRC2.1/PRC2.2 activities is required for the appropriate regulation of polycomb target genes during differentiation. PALI1/2 potentially link polycombs with transcriptional co-repressors in the regulation of cellular identity during development and in cancer.
- NAT COMMUN. Dosage Sensitivity Is a Major Determinant of Human Copy Number Variant Pathogenicity. Rice, Alan M., and McLysaght, Aoife
Human copy number variants (CNVs) account for genome variation an order of magnitude larger than single-nucleotide polymorphisms. Although much of this variation has no phenotypic consequences, some variants have been associated with disease, in particular neurodevelopmental disorders. Pathogenic CNVs are typically very large and contain multiple genes, and understanding the cause of the pathogenicity remains a major challenge. Here we show that pathogenic CNVs are significantly enriched for genes involved in development and genes that have greater evolutionary copy number conservation across mammals, indicative of functional constraints. Conversely, genes found in benign CNV regions have more variable copy number. These evolutionary constraints are characteristic of genes in pathogenic CNVs and can only be explained by dosage sensitivity of those genes. These results implicate dosage sensitivity of individual genes as a common cause of CNV pathogenicity. These evolutionary metrics suggest a path to identifying disease genes in pathogenic CNVs.
- BMC BIOL. Dosage-Sensitive Genes in Evolution and Disease. Rice, Alan M., and McLysaght, Aoife
For a subset of genes in our genome a change in gene dosage, by duplication or deletion, causes a phenotypic effect. These dosage-sensitive genes may confer an advantage upon copy number change, but more typically they are associated with disease, including heart disease, cancers and neuropsychiatric disorders. This gene copy number sensitivity creates characteristic evolutionary constraints that can serve as a diagnostic to identify dosage-sensitive genes. Though the link between copy number change and disease is well-established, the mechanism of pathogenicity is usually opaque. We propose that gene expression level may provide a common basis for the pathogenic effects of many copy number variants.
- GENE DEV. A Chromatin-Independent Role of Polycomb-like 1 to Stabilize P53 and Promote Cellular Quiescence. Brien, Gerard L., Healy, Evan, Jerman, Emilia, Conway, Eric, Fadda, Elisa, O’Donovan, Darragh, Krivtsov, Andrei V., Rice, Alan M., Kearney, Conor J., Flaus, Andrew, McDade, Simon S., Martin, Seamus J., McLysaght, Aoife, O’Connell, David J., Armstrong, Scott A., and Bracken, Adrian P.
Polycomb-like proteins 1-3 (PCL1-3) are substoichiometric components of the Polycomb-repressive complex 2 (PRC2) that are essential for association of the complex with chromatin. However, it remains unclear why three proteins with such apparent functional redundancy exist in mammals. Here we characterize their divergent roles in both positively and negatively regulating cellular proliferation. We show that while PCL2 and PCL3 are E2F-regulated genes expressed in proliferating cells, PCL1 is a p53 target gene predominantly expressed in quiescent cells. Ectopic expression of any PCL protein recruits PRC2 to repress the INK4A gene; however, only PCL2 and PCL3 confer an INK4A-dependent proliferative advantage. Remarkably, PCL1 has evolved a PRC2- and chromatin-independent function to negatively regulate proliferation. We show that PCL1 binds to and stabilizes p53 to induce cellular quiescence. Moreover, depletion of PCL1 phenocopies the defects in maintaining cellular quiescence associated with p53 loss. This newly evolved function is achieved by the binding of the PCL1 N-terminal PHD domain to the C-terminal domain of p53 through two unique serine residues, which were acquired during recent vertebrate evolution. This study illustrates the functional bifurcation of PCL proteins, which act in both a chromatin-dependent and a chromatin-independent manner to regulate the INK4A and p53 pathways.