Sequence analysis and generation of 3D models of NCL-RBDs
Because only partial structural information is available for the NCL RBDs,
3D models of all four NCL RBDs (RBD1-4) and of RBD3-4 were built. For human
NCL, structural information is available for RBD1-2 in tandem (PDB ID: 2KRR)
[39] and for individual RBDs (PDB IDs: 1FJ7 and 1FJC) [42]. The human NCL
sequence from the NCBI database [58] was analyzed for its domain
architecture using SMART [59], Pfam [60], Uniprot [61], and Interpro [62]
to confirm the boundaries of the individual RBDs and to identify any
potentially relevant sequence motifs. The NCL RBDs were aligned with the
hnRNPA1 RBDs using the multiple sequence alignment tool Clustal Omega [63]
to identify conserved residues, and the alignment was visualized with
Espript3 [64].
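As an illustration of what identifying conserved residues from a multiple sequence alignment involves, the following minimal Python sketch reports the alignment columns in which every sequence carries the same non-gap residue. It is not the procedure used here (conservation was assessed from the Clustal Omega alignment); the function name `conserved_columns` and the toy sequences are hypothetical and are not NCL or hnRNPA1 data.

```python
def conserved_columns(aligned_seqs):
    """Return 0-based indices of columns where all sequences share one non-gap residue."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for i in range(length):
        # Collect the distinct characters observed in this column.
        column = {seq[i].upper() for seq in aligned_seqs}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Hypothetical toy alignment (made-up sequences, gaps written as '-').
toy_alignment = ["KGFGFV-F", "RGFAFVEF", "KGYGFVEY"]
print(conserved_columns(toy_alignment))
```

This strict identity criterion is the simplest possible choice; a similarity-based criterion (for example, grouping residues by physicochemical class) would flag more columns.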
Delineated tandem domain pairs were modeled using both template-based
methods (Swissmodel [65], Intfold [66], and Phyre2 [67]) and ab initio
modeling approaches (Robetta [68], QUARK [69], and I-TASSER [70]) to
generate structural models. To identify high
quality models, the constructed models were evaluated with model
verification programs including Verify3D [71], VoroMQA [72], Prosa-web
[73], and ProQ3 [74] (Supporting Tables S1 and S2) and by comparing
their biophysical and structural properties with experimental
observations. Top models were refined using ModRefiner
[75] and SCWRL4 [76] and then re-evaluated. ModRefiner first
modifies the protein side-chain packing by adding atoms and then improves
the structural quality of the reconstructed models by energy minimization.
SCWRL4 focuses on side-chain refinement to improve the models. Top-scoring
models were chosen for further analysis (Supporting Tables S1 and S2).
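Because the verification programs report scores on different scales, one simple way to combine them when choosing top-scoring models is to average per-metric ranks. The sketch below illustrates that idea only; it is an assumption for illustration, not the selection rule used in this work. The function `consensus_rank`, the metric names `quality_pct` and `energy_z`, and all numeric values are hypothetical.

```python
def consensus_rank(scores, higher_is_better):
    """Order models by mean rank across metrics (best first).

    scores: {model_name: {metric_name: value}}
    higher_is_better: {metric_name: bool}, the direction of each metric
    Ties within a metric are not treated specially in this sketch.
    """
    models = list(scores)
    rank_sum = {m: 0 for m in models}
    for metric, hib in higher_is_better.items():
        # Best model first: descending if higher is better, else ascending.
        ordered = sorted(models, key=lambda m: scores[m][metric], reverse=hib)
        for rank, model in enumerate(ordered, start=1):
            rank_sum[model] += rank
    n_metrics = len(higher_is_better)
    return sorted(models, key=lambda m: rank_sum[m] / n_metrics)


# Hypothetical scores for three candidate models (made-up values).
toy_scores = {
    "model_A": {"quality_pct": 85.0, "energy_z": -6.1},
    "model_B": {"quality_pct": 78.0, "energy_z": -5.2},
    "model_C": {"quality_pct": 90.0, "energy_z": -4.0},
}
directions = {"quality_pct": True, "energy_z": False}
print(consensus_rank(toy_scores, directions))
```

Rank averaging avoids having to normalize scores with different units, at the cost of ignoring how large the differences between models are.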