Edge features
A GNN defines a graph as \(G=\left(V,E\right)\), where \(V\) and \(E\) represent the node set and the edge set respectively. Nodes are divided into word nodes and document nodes. The weight of an edge between two word nodes is defined by Point-wise Mutual Information (PMI)\(^{4}\), and the weight of an edge between a word node and a document node is defined by Term Frequency-Inverse Document Frequency (TF-IDF)\(^{25}\). The weight of the edge between two nodes \(i\) and \(j\) is defined as:

\[
A_{i,j}=
\begin{cases}
\mathrm{PMI}\left(i,j\right), & i,j \text{ are words and } \mathrm{PMI}\left(i,j\right)>0 \\
\mathrm{TF\text{-}IDF}_{i,j}, & i \text{ is a document and } j \text{ is a word} \\
1, & i=j \\
0, & \text{otherwise}
\end{cases}
\]
where \(A_{i,j}\) is the adjacency matrix, term frequency is the number of times a word appears in a document, and inverse document frequency is the logarithm of the total number of documents divided by the number of documents containing the word. PMI is a popular word-association measure: it collects co-occurrence statistics over all documents in the corpus with a fixed-size sliding window and yields the weight between two word nodes. The PMI value for words \(i\) and \(j\) is calculated as:

\[
\mathrm{PMI}\left(i,j\right)=\log\frac{p\left(i,j\right)}{p\left(i\right)\,p\left(j\right)},\qquad
p\left(i,j\right)=\frac{\#W\left(i,j\right)}{\#W},\qquad
p\left(i\right)=\frac{\#W\left(i\right)}{\#W}
\]
where \(\#W\left(i\right)\) is the number of sliding windows in the corpus that contain word \(i\), \(\#W\left(i,j\right)\) is the number of windows that contain both words \(i\) and \(j\), and \(\#W\) is the total number of sliding windows in the corpus. To extract multidimensional edge features, the \(N\times N\) adjacency matrix \(A_{i,j}\) is expanded into an \(N\times N\times P\) tensor \({\hat{E}}_{ijp}\), where \(P\) is the dimensionality of the edge features. The specific process is shown in Figure 3.
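The sliding-window PMI computation described above can be sketched as follows (a minimal illustration; the function name and window handling are assumptions, not the paper's code):

```python
import math
from collections import Counter

def pmi_edges(tokens, window=3):
    """Compute PMI weights between word pairs using a fixed-size
    sliding window over the corpus token stream.

    tokens: flat list of corpus tokens.
    Returns {(word_i, word_j): pmi} for pairs with positive PMI,
    matching the word-word case of the adjacency definition.
    """
    windows = [tokens[i:i + window]
               for i in range(max(1, len(tokens) - window + 1))]
    n_w = len(windows)                 # #W: total number of windows
    w_i = Counter()                    # #W(i): windows containing word i
    w_ij = Counter()                   # #W(i, j): windows containing i and j
    for w in windows:
        uniq = set(w)
        w_i.update(uniq)
        for a in uniq:
            for b in uniq:
                if a < b:
                    w_ij[(a, b)] += 1
    edges = {}
    for (a, b), n_ab in w_ij.items():
        pmi = math.log((n_ab / n_w) / ((w_i[a] / n_w) * (w_i[b] / n_w)))
        if pmi > 0:                    # keep only positive associations
            edges[(a, b)] = edges[(b, a)] = pmi
    return edges
```

Only positive PMI values are kept, since a non-positive PMI indicates little or no semantic correlation between the two words.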
Figure 3. Dimensional expansion of the adjacency matrix \(A_{i,j}\) into a tensor.
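The expansion shown in Figure 3 can be sketched in NumPy as follows (an illustrative sketch; placing the original weights in channel 0 and initializing the learned channels to zero are assumptions):

```python
import numpy as np

def expand_edges(A, P):
    """Lift an N x N adjacency matrix to an N x N x P edge-feature
    tensor. Channel 0 keeps the original PMI/TF-IDF weights; the
    remaining P-1 channels start at zero here and are updated as
    the network trains."""
    N = A.shape[0]
    E_hat = np.zeros((N, N, P))
    E_hat[:, :, 0] = A
    return E_hat
```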
The tensor is updated as the network trains, and the extra dimensions hold the newly learned weights. After this dimensional expansion, the adjacency matrix represents edge features as continuous multidimensional values, which exploits edge features more fully than a traditional GNN. After network training is completed, \({\hat{E}}_{ijp}\) is normalized as follows:

\[
E_{ij}=\mathop{\Big\Vert}_{p=1}^{P} D_{p}^{-1/2}\,{\hat{E}}_{:,:,p}\,D_{p}^{-1/2},\qquad
\left(D_{p}\right)_{ii}=\sum_{j}{\hat{E}}_{ijp}
\]
The \(\Vert\) operator denotes concatenation. Herein, the initial node feature matrix of the graph, \(X\), is defined as the identity matrix; that is, each word or document is represented as a one-hot vector. After two layers of GCN, \(X\) is sent to Softmax for classification:

\[
Z=\mathrm{softmax}\left(E\,\mathrm{ReLU}\left(EXW_{0}\right)W_{1}\right)
\]
where \(E=D^{-1/2}E_{ij}D^{-1/2}\) denotes the normalized edge-feature matrix, and \(W_{0}\) and \(W_{1}\) are the trainable weight matrices of the first and second layers respectively. The output of the GCN layer is taken as the final representation of the document, which is then fed to the Softmax layer for classification.
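The symmetric normalization and the two-layer forward pass can be sketched as follows (a minimal NumPy sketch assuming a single edge channel; all function names are illustrative, not the paper's implementation):

```python
import numpy as np

def normalize(E):
    """Symmetric normalization D^{-1/2} E D^{-1/2}, where D is the
    diagonal degree matrix D_ii = sum_j E_ij."""
    d = E.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0                          # guard against isolated nodes
    d_inv_sqrt[nz] = d[nz] ** -0.5
    return d_inv_sqrt[:, None] * E * d_inv_sqrt[None, :]

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(E, X, W0, W1):
    """Two-layer GCN: Z = softmax(E_hat ReLU(E_hat X W0) W1)."""
    E_hat = normalize(E)
    H = np.maximum(E_hat @ X @ W0, 0.0)  # first layer + ReLU
    return softmax(E_hat @ H @ W1)       # second layer + Softmax
```

With \(X\) set to the identity matrix, the first layer effectively learns an embedding row per node, and each row of the output sums to one, giving class probabilities per node.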