Other identification tools

IntSplice (http://www.med.nagoya-u.ac.jp/neurogenetics/IntSplice) is a web server that identifies SNVs affecting intronic cis-elements. The precise spatiotemporal regulation of splicing is mediated by splicing cis-elements on the pre-mRNA. IntSplice uses a support vector machine (SVM) model built from an analysis of the effect size of each intronic nucleotide on annotated alternative splicing. It predicts the splicing consequences of SNVs at intronic positions -50 to -3 in the human genome. The IntSplice model was applied to distinguish pathogenic SNVs in the Human Gene Mutation Database from normal SNVs in the dbSNP database and was reported to perform well [79].
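As a rough illustration of the position window described above, the following Python sketch checks whether an SNV falls within intronic positions -50 to -3. It assumes that positions are counted backwards from the 3' end of the intron (so -1 is the last intronic nucleotide before the downstream exon) and that coordinates are 1-based genomic positions. The function names `intronic_offset` and `in_intsplice_window` are hypothetical and are not part of IntSplice; an actual prediction still requires the web server.

```python
def intronic_offset(exon_first_base, snv_pos, strand):
    """Offset of an intronic SNV relative to the downstream exon.

    exon_first_base: genomic coordinate of the first exonic base in
        transcript order (assumption: 1-based coordinates).
    Returns -1 for the last intronic base, -2 for the one before it, etc.
    """
    if strand == "+":
        return snv_pos - exon_first_base
    if strand == "-":
        # On the reverse strand the intron lies at higher coordinates.
        return exon_first_base - snv_pos
    raise ValueError("strand must be '+' or '-'")


def in_intsplice_window(offset, start=-50, end=-3):
    """True if the offset lies in the window covered by the model."""
    return start <= offset <= end


# Hypothetical example: exon starts at 1000 on the forward strand.
offset = intronic_offset(1000, 990, "+")
print(offset, in_intsplice_window(offset))
```

SNVs outside this window, such as those at the -2 and -1 positions of the splice acceptor site, fall outside the range stated for the model and would need a different tool.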
The Natural Language Processing based non-synonymous Single Nucleotide Polymorphism Predictor (NLP-SNPPred, http://www.nlp-snppred.cbrlab.org) is a web server that distinguishes pathogenic from neutral protein-coding variations using state-of-the-art Natural Language Processing (NLP), a branch of Artificial Intelligence (AI). After feature extraction, a multi-class classifier (CLF1) is built, followed by a binary-class classifier (CLF2). NLP-SNPPred uses the NLP approach to read biological literature in order to separate pathogenic from neutral variants. It outperforms state-of-the-art functional prediction methods and can be used to predict the functional effects of protein-coding mutations. Further features, such as homology, epigenomics, and evolutionary information, are expected to be added to NLP-SNPPred [80].

The impact of genetic variation on macromolecular structure in diseases

Although GWAS has identified some genetic variants, the mechanisms linking genetic variation to disease remain unclear. Genetic variations can alter the structure of RNA or proteins, leading to abnormalities in biological function or even disease. Some studies have found that the molecular mechanisms of several typical diseases are closely related to the structural effects of genetic variation, including hypertension [81,82], retinoblastoma [83,84], and β-thalassemia [85-90].
Hereditary hyperferritinemia-cataract syndrome
Hereditary hyperferritinemia-cataract syndrome (HHCS) is a rare disease characterized by high serum ferritin levels and congenital bilateral cataracts. The ferritin light chain (FTL) mRNA contains an iron-responsive element (IRE) in its 5'-UTR, which plays a major regulatory role in mRNA translation. U22G and U22G-G14C are two SNPs in this 5'-UTR. Some studies [91] found that these two SNPs can affect RNA structure and subsequent gene function. U22G disrupts the structure of the IRE, leading to abnormal regulation of the FTL gene. In contrast, U22G-G14C restores the mutated IRE to the wild-type structure [92]. rs886037623 (T22G) changes the folded structure of the mRNA because the original U is replaced by G [93]. Under normal circumstances, in a low-iron environment, iron regulatory proteins (IRPs) bind to the IRE in correctly folded mRNA to form a complex that represses protein synthesis, so ferritin synthesis is inhibited. After mutation, the structurally altered mRNA can no longer bind IRPs, translational regulation is lost, and ferritin is produced in large amounts, resulting in hyperferritinemia. At the same time, excess ferritin precipitates in the lens, resulting in cataracts [94]. The effects of genetic variation on HHCS are shown in Figure 5A.
Sicklemia
Sicklemia is the most serious of the hemoglobinopathies; its clinical manifestations are chronic hemolytic anemia, susceptibility to infection, and chronic ischemia leading to organ and tissue damage [95]. Its pathogenesis is complex and strongly related to genetic factors. At rs334, in the gene encoding the hemoglobin β-chain, T is replaced by A; after transcription and translation, glutamic acid is replaced by valine at the sixth position of the β-chain amino acid sequence, producing abnormal hemoglobin [96]. When the oxygen partial pressure decreases, the abnormal hemoglobin molecules interact with each other to form helical polymers, which distort red blood cells into sickle cells and finally lead to anemia [97]. The effects of genetic variation on Sicklemia are shown in Figure 5B.
Figure 5. Effects of genetic variation on HHCS and Sicklemia. (A) HHCS: the rs886037623 mutation (T>G) replaces the corresponding U in the transcribed mRNA with G, which changes the mRNA structure so that IRP cannot bind to it, finally resulting in the overexpression of ferritin. (B) Sicklemia: through the subsequent transcription and translation process, the rs334 mutation (T>A) changes Glu (E) at position 6 of the protein amino acid sequence to Val (V), which changes the structure of the hemoglobin molecule and finally leads to Sicklemia.
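The rs334 substitution described above can be traced at the codon level with a short Python sketch. It assumes that codon 6 of the β-globin coding sequence is GAG and that the gene lies on the reverse genomic strand, so that the T>A change reported on the forward strand appears as A>T (GAG to GTG) on the coding strand. The helper names and the four-entry codon table are illustrative only.

```python
# Small subset of the standard genetic code (coding-strand DNA codons).
CODON_TO_AA = {"GAG": "Glu", "GAA": "Glu", "GTG": "Val", "GTA": "Val"}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def apply_snv(seq, index, ref, alt):
    """Replace the base at `index`, checking that it matches `ref`."""
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at index {index}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


wild_codon = "GAG"                                # assumed codon 6, coding strand
forward = reverse_complement(wild_codon)          # same site on the forward strand
mutant_forward = apply_snv(forward, 1, "T", "A")  # rs334: T>A
mutant_codon = reverse_complement(mutant_forward)

print(wild_codon, CODON_TO_AA[wild_codon], "->",
      mutant_codon, CODON_TO_AA[mutant_codon])
```

Working through the strand conversion explicitly shows why the same variant may be written as T>A in a variant database and as A>T in the coding sequence.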
Tumor and cancer
In cancer research, the influence of genetic variation on macromolecular structure cannot be ignored. For example, retinoblastoma (RB) is a malignant tumor arising from photoreceptor precursor cells; it is common in children under 3 years old and shows familial genetic susceptibility [98]. Some mutations have been shown to be closely related to RB [84]. Cowell et al. [83] identified a novel mutation (G→C) within a core motif of the specificity protein 1 (SP1) transcription factor in a family with mild RB, and a band shift attributable to an unidentified protein was observed with the mutant oligomer. This protein may affect the expression of the RB1 gene and eventually lead to RB.
In addition, the influence of genetic variation on lncRNAs has been extensively explored in some cancers. LINC00673 is a potential tumor suppressor in pancreatic cancer. rs11655237 is an SNP in an exon of LINC00673 that gives LINC00673 a new binding target, thus weakening its tumor-suppressive role and increasing the risk of pancreatic cancer [99]. The A713G and T714C mutations in the lncRNA GAS8-AS1 accelerate the growth of cancer cells and increase the risk of thyroid cancer [100]. Somatic copy-number and expression abnormalities of focally amplified lncRNA on chromosome 1 (FAL1) can inhibit p21 and lead to ovarian cancer [11].