Shauna is a speaker for ODSC East 2020 this April 13-17 in Boston. Be sure to check out her talk, “Graph Neural Networks and their Applications,” there!
Graph neural networks have revolutionized the performance of neural networks on graph data. Companies such as Pinterest, Google, and Uber have implemented graph neural network algorithms to dramatically improve the performance of large-scale data-driven tasks.
Introduction to Graphs
A graph is any dataset that contains nodes and edges. Nodes are entities (e.g. people, organizations), and edges represent the connections between nodes. For example, a social network is a graph in which people in the network are considered nodes. Edges exist when two people are connected in some way (e.g. friends, sharing one’s posts).
In retail applications, customers and products can be viewed as nodes. An edge shows the relationship between the customer and a purchased product. A graph can be used to represent spending habits for each customer. Additionally, nodes can also have features or attributes. People have attributes such as age and height, and products have attributes such as price and size. Pinterest has used graph neural networks in this fashion to improve the performance of its recommendation system by 150%.
The advent of Graph Neural Networks
Until the development of graph neural networks, deep learning methods could not be applied to edges to extract knowledge and make predictions, and instead could only operate based on the node features. Applying deep learning to graph data allows us to approach tasks link prediction, community detection, and generating recommendations.
Deep learning can also be applied to node classification, or predicting the label of an unlabelled node. This takes place in a semi-supervised setting, where the labels of some nodes are known, but others are unknown. A survey of deep learning node classification methods shows a history of advances in state-of-the-art performance while illustrating the range of use cases and applications.
Methods discussed in this blog post were evaluated on the benchmark CoRA dataset. CoRA consists of journal publications in deep learning. Each publication is a node, and edges exist between nodes if they cite or are cited by another paper in the dataset. The dataset is comprised of 2708 publications with 5429 edges.
DeepWalk, released in 2014, was the first significant deep learning-based method to approach node classification. DeepWalk’s approach was similar to that taken in natural language processing (NLP) to derive embeddings. An embedding is a vector-representation of an object such a word in NLP or a node in a graph. To create its embeddings, DeepWalk takes truncated random walks from graph data to learn latent representations of nodes. On the CoRA dataset, DeepWalk achieved 67.2% accuracy on the benchmark node classification experiment. At the time, this was 10% better than competing methods with 60% less training data.
To demonstrate how the embeddings generated by DeepWalk contain information about the graph structure, below is a figure that shows how DeepWalk works on Zachary’s Karate Network. The Karate Network is a small network of connections of members of a karate club, where edges are made between two members if they interact outside of karate. The node coloring is based on the different sub-communities within the club. The left image is the input graph of the social network, and the right is the two-dimensional output generated by the DeepWalk algorithm operating on the graph data.
Graph Convolutional Networks
In 2016, Thomas N. Kipf and Max Welling introduced graph convolutional networks (GCNs), which improved the state-of-the-art CoRA benchmark to 81.5%. A GCN is a network that consists of stacked linear layers with an activation function. Kipf and Welling introduced a new propagation function that operates layer-wise and works directly on graph data.
Source:  The t-SNE visualization of the two-layer GCN trained on the CoRA dataset using 5% of labels. The colors represent document class.
The number of linear layers in a GCN determines the size of the target node neighborhood to consider when making the classification prediction. For example, one hidden layer would imply that the graph network only examines immediate neighbors when making a classification decision.
The input to a graph convolutional network is an adjacency matrix, which is the representation of the graph itself. It also takes the feature vectors of each node as input. This can be as simple as a one-hot encoding of each node’s attributes, while more complex versions can be used to represent the complex features of a node.
Applying GCN to Real-World Data
Our research team was interested in implementing a GCN on real-world data in order to speed up analyst tasking. We implemented a GCN architecture written in PyTorch to perform node classification on article data. The graph used in our dataset was derived from article data grouped together in “playlists” by a user who determined that they were relevant. The nodes were individual articles and playlists, and edges existed between an article and a playlist if a specific article was included in the playlist. Instead of manually poring through the corpus to determine additional relevant articles, we used the GCN to recommend other potentially relevant documents. After running our corpus of about 100,000 articles and 7 different playlists through a 2-layer GCN, our network performed 5x better than random.
GCN’s were the leading architecture for years, and many variations of them were subsequently released. Then, in January 2020 Graph-BERT removed the dependency on links and reformatted the way that graph networks are usually represented. This is important for scalability, while also showing improved accuracy and efficiency relative to other types of graph neural networks. We are currently exploring how Graph-BERT will impact the use cases we have been addressing with graph neural networks.
Graph neural networks are an evolving field in the study of neural networks. Their ability to use graph data has made difficult problems such as node classification more tractable.
For a deeper discussion on graph neural networks and the problems that they can help solve, attend my talk at ODSC East, “Graph Neural Networks and their Applications.”
 R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, “Graph convolutional neural networks for web-scale recommender systems,” in Proc. of KDD. ACM, 2018, pp. 974–983.
 Sanchez-Lengeling , Benjamin, et al. “Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules.” ArXiv, no. 1910.10685, 2019. Stat.ML, arxiv.org/abs/1910.10685.
 Uber Engineering. “Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations”, eng.uber.com/uber-eats-graph-learning.
 Perozzi, Bryan, Al-Rfou, Rami and Skiena, Steven. “DeepWalk: Online Learning of Social Representations.” CoRR abs/1403.6652 (2014).
 “Node Classification on CoRA.” paperswithcode.com/sota/node-classification-on-coraom.
 Kipf, Thomas N. and Welling, Max. “Semi-Supervised Classification with Graph Convolutional Networks.” Paper presented at the meeting of the Proceedings of the 5th International Conference on Learning Representations, 2017.
 Zhang, Jiawei, Zhang, Haopeng, Xia, Congying, and Sun, Li. “Graph-Bert: Only Attention is Needed for Learning Graph Representations.” arxiv.org/abs/2001.05140 (2020).
About the speaker/author: Shauna Revay is an applied machine learning researcher for Novetta’s Machine Learning Center of Excellence. She specializes in rapid prototyping of ML solutions spanning fields such as NLP, audio analysis and ASR, and the implementation of graph neural networks. Shauna earned her PhD in mathematics at George Mason University where she studied harmonic and Fourier analysis. She pursues research in interdisciplinary topics and has published papers in mathematics, biology, and electrical engineering journals and conference proceedings.