Introduction
RNA-binding proteins (RBPs) are critical modulators of RNA metabolism and have been linked to aberrant gene regulation in a wide range of disease conditions [1]. The human genome encodes more than 3500 RBPs [2]. Their emerging importance is underscored by genome-wide studies showing that hundreds of RBPs are significantly dysregulated across a variety of cancer types [3], and some have even been identified as potential cancer drivers [4]. In addition, RBPs are implicated in numerous somatic and Mendelian genetic diseases affecting multiple organ systems, including metabolic, neurodegenerative, musculoskeletal and connective tissue diseases [2]. Altered expression or function of RBPs results in aberrant control of target RNAs, and hence of gene expression, ultimately driving pathological phenotypes [5]. RBP-RNA interactions are mediated by RNA-binding domains (RBDs) and are often dysregulated in human cancers [5]. Importantly, many RBPs bind the same subset of target RNAs, suggesting synergistic or competitive regulation [6]. However, the molecular mechanisms by which RBPs achieve their specificity, including the selective use of their constituent RBDs to target specific RNA types, remain elusive. RBPs commonly interact with RNA through two conserved RNA-binding motifs within the RBD, RNP1 and RNP2, which provide some of the underlying principles of how an RBP recognizes specific RNA species [7]. These RNP motifs are evolutionarily conserved among many RBPs and are embedded in a β1-α1-β2-β3-α2-β4 structural arrangement [8]. Within the four-stranded β-sheet of this fold, the two central strands contact RNA: β3 carries the octameric RNP1 motif, with the conserved sequence (R/K)-G-(F/Y)-(G/A)-(F/Y)-V-X-(F/Y), and β1 carries the hexameric RNP2 motif, with the conserved sequence (L/I)-(F/Y)-(V/I)-X-(N/G)-L [9].
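To illustrate how the RNP1 and RNP2 consensus sequences can be read, the minimal Python sketch below translates each consensus into a regular expression (X becomes any residue, (A/B) becomes a character class) and scans a one-letter protein sequence for strict matches. The function name `find_rnp_motifs` and the test sequence are hypothetical. Real RBDs often deviate from the strict consensus, so a scan like this can miss degenerate motifs and is no substitute for structure-based domain annotation.

```python
import re

# Consensus sequences written as regular expressions:
# "X" (any residue) becomes ".", and "(A/B)" becomes the character class [AB].
RNP_PATTERNS = {
    "RNP1": re.compile(r"[RK]G[FY][GA][FY]V.[FY]"),  # octamer
    "RNP2": re.compile(r"[LI][FY][VI].[NG]L"),        # hexamer
}


def find_rnp_motifs(protein_seq: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched residues) for every strict
    consensus match in a one-letter protein sequence (hypothetical helper)."""
    seq = protein_seq.upper()
    hits = []
    for name, pattern in RNP_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group()))
    # Report motifs in the order they occur along the sequence.
    return sorted(hits, key=lambda hit: hit[1])


# Synthetic sequence containing one RNP1 and one RNP2 match.
print(find_rnp_motifs("AAKGFGFVTFAAALFVKNLA"))
# [('RNP1', 2, 'KGFGFVTF'), ('RNP2', 13, 'LFVKNL')]
```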
RNP-RNA interactions are predominantly hydrophobic, and aromatic residues are especially important, mediating contacts through van der Waals forces, π-π stacking with nucleotide bases and π-sugar ring interactions [10]. In addition, basic residues in these conserved motifs form salt bridges with phosphate groups that enhance binding stability [10]. Previous studies have established the roles of aromatic and basic residues in the interactions of hnRNP A1 [11,12] and Lin28 [13] with RNA.
Nucleolin (NCL), a multifunctional RBP, is overexpressed in many cancers and other disease conditions [14]. NCL is involved in a myriad of cellular processes that are ultimately tied to its RNA/DNA-binding functions, through which it regulates genes controlling cell survival, growth and/or death. These roles include stress sensing [15], ribosome biogenesis [16], chromatin remodeling [17], DNA replication, transcription, messenger RNA (mRNA) turnover [18], induction [19] and inhibition [20] of translation, and microRNA (miRNA) biogenesis [21]. The NCL protein is organized into distinct functional domains: (a) the highly acidic N-terminal domain with basic stretches, which contains the nuclear localization signal, is heavily phosphorylated during the cell cycle by stage-specific kinases, and drives the histone chaperone activity of NCL [17]; (b) the glycine- and arginine-rich (RGG/GAR) C-terminal domain, which plays a critical role in protein-protein interactions, such as those with ribosomal proteins [22] and the tumor suppressor p53 [23], and is also implicated in non-specific interactions with RNA; and (c) the central region, which contains two to four distinct RBDs and is critical for interactions with different RNA species [24]. In most eukaryotic species, including plants, NCL contains only two RBDs, and each individual RBD is better conserved across orthologs than the RBDs are with one another within the same protein. Interestingly, NCL from Dictyostelium discoideum uniquely possesses an odd number of RBDs (three), suggesting a distinct RNA-binding profile in this organism (Singh Lab, unpublished). In vertebrates, including humans, NCL has evolved to contain four RBDs, of which RBDs 3 and 4 are unique to these organisms [24]. It is also well established that RBDs 1 and 2 are sufficient for certain NCL-RNA interactions, specifically binding to mRNA [25,26] and rRNA [16].
The emergence of RBDs 3 and 4 suggests that NCL acquired evolutionarily novel functions in these higher organisms. However, in contrast to RBDs 1 and 2, RBDs 3 and 4 have been largely overlooked and remain understudied.
NCL regulates gene expression by binding both coding (mRNA) and non-coding RNA species (rRNA, miRNA and lncRNA). It is well established that NCL preferentially interacts with RNA through stem-loop structures, including apical or hairpin loops [20,26], and through AU-rich [18,27] and G-rich [28,29] elements, which serve as signature structural and sequence motifs for NCL-RNA affinity. For example, a G-rich stem-loop structure called the nucleolin recognition element (NRE), found in pre-ribosomal RNA, underlies a primary role of NCL in rRNA processing. Similarly, NCL shows high affinity for the 11-nt single-stranded evolutionarily conserved motif (ECM) located 5 nt downstream of the pre-rRNA processing site [30]. Additional RNA recognition motifs bound by NCL include AU-rich elements (AREs), G-quadruplex structures in the mRNA of the tumor suppressor TP53 [25], and a stem-loop-forming GCCCGG motif in GADD45α mRNA during the DNA damage response [29]. NCL-mRNA interactions mediated by its RBDs influence mRNA turnover or translation [25,26], while NCL-lncRNA binding also has implications for RNA localization [31,32]. Overall, NCL-RNA interactions clearly have a profound influence on many cellular processes that control growth, proliferation and survival.
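For the purely sequence-defined elements named above, a simple scan can locate candidate sites. The minimal Python sketch below searches an RNA sequence for the GCCCGG motif and, as a stand-in for AU-rich elements, the AUUUA pentamer commonly described as the ARE core; the use of AUUUA is an assumption, not a criterion from this study. The motif table `NCL_SEQ_MOTIFS` and the function `scan_rna` are hypothetical. Because the scan inspects primary sequence only, it cannot tell whether a site actually folds into a stem-loop or G-quadruplex, which would require RNA secondary-structure prediction.

```python
import re

# Hypothetical, illustrative motif table (sequence-only).
# GCCCGG: the GADD45alpha stem-loop motif; AUUUA: classic ARE core pentamer,
# used here only as a simple proxy for AU-rich elements (assumption).
NCL_SEQ_MOTIFS = {
    "GCCCGG": "GCCCGG",
    "ARE_core": "AUUUA",
}


def scan_rna(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif, overlaps included."""
    rna = seq.upper().replace("T", "U")  # also accept DNA-alphabet input
    hits = {}
    for name, motif in NCL_SEQ_MOTIFS.items():
        # A zero-width lookahead reports overlapping occurrences (e.g. AUUUAUUUA).
        pattern = re.compile(f"(?={re.escape(motif)})")
        hits[name] = [m.start() for m in pattern.finditer(rna)]
    return hits


print(scan_rna("AUUUAUUUAGCCCGG"))
# {'GCCCGG': [9], 'ARE_core': [0, 4]}
```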
MicroRNAs (miRNAs), a class of short non-coding RNAs, are often dysregulated in cancers, where aberrant miRNA processing is linked to tumorigenesis [33]. In animals, primary miRNA (pri-miRNA) is processed to precursor miRNA (pre-miRNA) by the microprocessor complex (MPC) in the nucleus. The pre-miRNA is then transported to the cytoplasm by shuttle proteins and subsequently processed into its mature form by Dicer [34,35]. In plants, by contrast, both pri- and pre-miRNA are processed solely in the nucleus by Dicer-like 1 (DCL1) and a few helper proteins [36,37]. A similar mechanism exists in the slime mold Dictyostelium discoideum, where the double-stranded RNA (dsRNA)-binding protein RbdB processes pre-miRNA molecules [38]. NCL is known to interact with Drosha and DGCR8, the active components of the microprocessor complex [21]. We therefore propose that the emergence of RBDs 3 and 4 in higher organisms coincided, during evolution, with the acquisition of a role for NCL in miRNA processing, and that RBDs 3 and 4 possess sequence/structural determinants that specifically recognize miRNA precursor molecules.
The focus of this study is to elucidate the selective preference of specific NCL RBDs for miRNA recognition using an in silico approach. Structural information for NCL RBDs and miRNA molecules is either unavailable or limited to partial structures. To fill these structural gaps, we generated 3D models of the human NCL central region containing all four RBDs, of various tandem RBD pairs, and of selected miRNAs. Our data include much-needed structural models of NCL RBDs and miRNAs, together with NCL-miRNA interaction scenarios predicted by RNA-protein docking algorithms. Our study suggests a predominant role of NCL RBDs 3 and 4 in miRNA target specificity and identifies key motifs/residues at the NCL-substrate interface responsible for specific NCL-miRNA interactions. Structural modeling and in silico analysis tools provide valuable information to fill these knowledge gaps and offer a cost-effective, rational entry point for experimental design. Ultimately, the insights from this study may guide future work to identify new drug targets for regulating NCL functions in gene expression during tumorigenesis.
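Interface residues in a docked protein-RNA model are commonly defined by an interatomic distance cutoff. The Python sketch below illustrates that idea on plain coordinate lists and then groups the contacts into the aromatic and basic residue classes discussed earlier. It is not the pipeline used in this study: the `Atom` record, the 4.0 Å default cutoff and the residue class sets are assumptions, and the helper names are hypothetical. The brute-force pairwise search suits small examples; a spatial index such as a k-d tree would be preferable for full complexes.

```python
import math
from typing import NamedTuple

# Hypothetical residue classes reflecting the aromatic and basic residues
# highlighted for RNP-RNA contacts.
AROMATIC = {"PHE", "TYR", "TRP"}
BASIC = {"ARG", "LYS", "HIS"}


class Atom(NamedTuple):
    res_id: int                      # residue or nucleotide number
    res_name: str                    # e.g. "PHE" for protein, "G" for RNA
    xyz: tuple[float, float, float]  # Cartesian coordinates in angstroms


def interface_residues(protein: list[Atom], rna: list[Atom],
                       cutoff: float = 4.0) -> set[tuple[int, str]]:
    """Protein residues with at least one atom within `cutoff` of any RNA atom.

    Brute-force O(N*M) search over all protein/RNA atom pairs.
    """
    contacts = set()
    for p in protein:
        if any(math.dist(p.xyz, r.xyz) <= cutoff for r in rna):
            contacts.add((p.res_id, p.res_name))
    return contacts


def classify(residues: set[tuple[int, str]]) -> dict[str, list[int]]:
    """Group interface residue numbers into aromatic, basic and other."""
    groups = {"aromatic": [], "basic": [], "other": []}
    for res_id, res_name in sorted(residues):
        if res_name in AROMATIC:
            groups["aromatic"].append(res_id)
        elif res_name in BASIC:
            groups["basic"].append(res_id)
        else:
            groups["other"].append(res_id)
    return groups


# Toy coordinates: PHE 10 is 3 Å from the RNA atom, LYS 20 is 7 Å away.
protein = [Atom(10, "PHE", (0.0, 0.0, 0.0)), Atom(20, "LYS", (10.0, 0.0, 0.0))]
rna = [Atom(1, "G", (3.0, 0.0, 0.0))]
print(classify(interface_residues(protein, rna)))
# {'aromatic': [10], 'basic': [], 'other': []}
```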