As data becomes increasingly interconnected and systems increasingly sophisticated, it’s essential to make use of the rich and evolving relationships within our data. Graphs are uniquely suited to this task because they are, very simply, a mathematical representation of a network. The objects that make up graphs are called nodes (or vertices) and the links between them are called relationships (or edges).
A property graph model consists of entities, often called nodes, and links between them, often called relationships. Nodes and relationships can also contain properties and attributes.
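To make the property graph model concrete, here is a minimal Python sketch of it. The class and method names (`Node`, `Relationship`, `PropertyGraph`, `add_node`, `relate`, `outgoing`) are hypothetical and are not Neo4j’s API. Giving nodes labels and giving each relationship a type mirrors how Neo4j describes its model, and the account/order sample data is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A node (vertex): an id, zero or more labels, and key-value properties."""
    node_id: str
    labels: set[str] = field(default_factory=set)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed relationship (edge) that can carry its own properties."""
    start: str
    end: str
    rel_type: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""
    nodes: dict[str, Node] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add_node(self, node_id: str, *labels: str, **props: Any) -> Node:
        node = Node(node_id, set(labels), dict(props))
        self.nodes[node_id] = node
        return node

    def relate(self, start: str, end: str, rel_type: str, **props: Any) -> Relationship:
        # Refuse dangling relationships: both endpoints must already exist.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError(f"unknown endpoint in {start!r} -> {end!r}")
        rel = Relationship(start, end, rel_type, dict(props))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id: str) -> list[Relationship]:
        """All relationships that start at node_id (a linear scan, fine for a sketch)."""
        return [r for r in self.relationships if r.start == node_id]


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_node("alice", "Account", country="US")
    g.add_node("order1", "Order", total=42.0)
    g.relate("alice", "order1", "PLACED", channel="web")
    for rel in g.outgoing("alice"):
        print(rel.start, rel.rel_type, rel.end, rel.properties)
```

Keeping relationships as first-class records with their own properties is what separates this from a bare adjacency list: a traversal can read or filter on edge data (here, the order channel) as it walks the graph.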
Graph algorithms are built to operate on relationships and are exceptionally capable of finding structures and revealing patterns in connected data. This is important because real-world networks tend to form highly dense, structured groups with “lumpy” distributions. We see this behavior in everything from IT and social networks to economic and transportation systems. Traditional statistical approaches don’t fully utilize the topology of the data itself and often “average out” distributions. Graph analytics differ from conventional analysis by calculating metrics based on the relationships between things.
Graph algorithms are used when we need to understand structures and relationships to answer questions about the pathways that things might take, how they flow, who influences that flow, and how groups interact. This is essential for tasks like forecasting behavior, understanding dynamic groups, or finding predictive components and patterns.
There are many types of graph algorithms, but the three classic categories consider the overall nature of the graph: pathfinding, centrality, and community detection. Other graph algorithms, such as similarity and link prediction, instead consider and compare specific nodes.
- Pathfinding algorithms are fundamental to graph analytics and explore routes between nodes.
- Centrality algorithms help us understand the impact of individual nodes on the overall network. They identify the most influential nodes and help us understand group dynamics.
- Community algorithms find communities where members have more relationships within the group than outside it. This helps infer similar behavior or preferences, estimate resiliency, and prepare data for other analyses.
- Similarity algorithms look at how alike individual nodes are by comparing the properties and attributes of nodes.
- Link prediction algorithms consider the proximity of nodes as well as structural elements, such as potential triangles, to estimate the formation of new relationships or the existence of undocumented connections.
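Several of these categories can be illustrated on a toy graph. The following Python sketch uses a hypothetical five-node undirected graph and deliberately simple, textbook representatives chosen for brevity: breadth-first search for pathfinding, degree centrality, Jaccard similarity, and a common-neighbors score for link prediction. Community detection is left out here. Note that the Jaccard score below compares neighbor sets; the same formula could compare sets of node properties instead. Production graph libraries offer more sophisticated variants of each.

```python
from collections import deque
from typing import Optional

# Hypothetical toy graph: undirected, stored as node -> set of neighbors.
GRAPH: dict[str, set[str]] = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}


def shortest_path(graph: dict[str, set[str]], source: str, target: str) -> Optional[list[str]]:
    """Pathfinding: breadth-first search returns one fewest-hops route, or None."""
    previous: dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source, then reverse.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for a deterministic result
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """Centrality: fraction of the other nodes each node touches directly."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def jaccard_similarity(graph: dict[str, set[str]], u: str, v: str) -> float:
    """Similarity: shared neighbors divided by all neighbors of either node."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def common_neighbors(graph: dict[str, set[str]], u: str, v: str) -> int:
    """Link prediction: more shared neighbors suggests a likelier future link."""
    return len(graph[u] & graph[v])


if __name__ == "__main__":
    print(shortest_path(GRAPH, "A", "E"))
    print(degree_centrality(GRAPH))
    print(jaccard_similarity(GRAPH, "A", "D"), common_neighbors(GRAPH, "A", "D"))
```

Here A and D are not linked, yet they share neighbors B and C, so adding an A–D relationship would close two triangles; that shared structure is exactly what a common-neighbors score rewards.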
Example: Combating Fraud
Let’s say we’re trying to combat fraud in online orders. We likely already have profile information or behavioral indicators that would flag fraudulent behavior. However, it can be difficult to differentiate between behaviors that indicate a minor offense, unusual activity, and a fraud ring. This can lead us into a lose-lose choice: chase all suspicious orders, which is costly and slows business, or let most suspicious activity go by. Moreover, as criminal activity evolves, we could be blind to new patterns.
Graph algorithms such as Louvain Modularity can be used for more advanced community detection to find groups interacting at different levels. For example, in a fraud scenario, we may want to correlate tightly knit groups of accounts with a certain threshold of returned products. Or perhaps we want to use the PageRank algorithm to identify which accounts in each group have the most overall incoming transactions, including indirect paths.
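The PageRank idea from the fraud example can be sketched in a few lines of Python. This is a plain power-iteration version run on a hypothetical directed list of (payer, payee) transactions. The damping factor of 0.85 is the conventional default, and nodes with no outgoing transactions have their score spread evenly over all nodes. Graph libraries, Neo4j’s included, may handle such details differently, so treat this as an illustration of the technique rather than a reference implementation.

```python
def pagerank(
    edges: list[tuple[str, str]],
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-10,
) -> dict[str, float]:
    """Power-iteration PageRank over a directed edge list.

    A node scores highly when it receives links from nodes that themselves
    score highly, so indirect (multi-hop) inflows count, not just direct ones.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links: dict[str, list[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by "dangling" nodes (no outgoing edges) is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Each node passes a damped share of its rank along every outgoing edge.
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        converged = sum(abs(new_rank[node] - rank[node]) for node in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical (payer, payee) transactions.
    transactions = [
        ("acct1", "mule"),
        ("acct2", "mule"),
        ("acct3", "mule"),
        ("acct4", "acct1"),
        ("mule", "boss"),
    ]
    scores = pagerank(transactions)
    for account, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{account:6} {score:.3f}")
```

In this invented data, `boss` receives only one transaction, yet it ends up with the highest score because that transaction comes from `mule`, which itself collects money from several accounts. A simple count of incoming transactions would miss this indirect-path effect.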
To illustrate these algorithms, below is a screenshot of Louvain and PageRank run on season two of Game of Thrones using our experimental tool, the Graph Algorithms Playground, to find community groups and the most influential characters. Notice how Jon is influential within a weakly connected community but not overall, and that the Daenerys group is isolated. Interestingly, it’s been noted that highly connected “islands” of communities can signal fraud in certain financial networks.
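Louvain repeatedly moves individual nodes into neighboring communities and then aggregates each community into a single node, keeping the changes that raise a score called modularity. The Python sketch below only computes that score for a partition you supply; it does not perform Louvain’s search. It assumes an undirected, unweighted edge list, and the two-triangle graph is a hypothetical example of two tightly connected “islands” joined by a single bridge.

```python
from collections import defaultdict


def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Newman modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m - (d_c / (2 * m)) ** 2), where
    m is the number of edges, L_c the number of edges inside c, and d_c the
    sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal: dict[int, int] = defaultdict(int)  # L_c
    degree: dict[int, int] = defaultdict(int)    # d_c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Hypothetical graph: two triangles joined by a single bridge edge (c-d).
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    lumped = {node: 0 for node in split}
    print(round(modularity(edges, split), 3))   # 0.357 (= 5/14)
    print(round(modularity(edges, lumped), 3))  # 0.0
```

Splitting the graph along the bridge scores well above zero, while lumping every node into one community scores exactly zero. A community-detection algorithm like Louvain is effectively searching for the split that scores highest.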
We’ve quickly overviewed what graphs are and how graph algorithms are uniquely suited to today’s connected data; however, we’ve only scratched the surface of what’s possible. If you’re interested in diving deeper, consider attending our training, “Reveal Predictive Patterns with Neo4j Graph Algorithms,” at ODSC West 2019 on Wednesday, October 30th.
We also recommend downloading a free copy of the O’Reilly book, “Graph Algorithms: Practical Examples in Apache Spark and Neo4j” while it’s still available. This book walks through hands-on examples of how to use graph algorithms in Apache Spark and Neo4j, including a chapter dedicated to machine learning.
Jennifer is a Developer Relations Engineer at Neo4j, conference speaker, blogger, and an avid developer and problem-solver. She has worked with a variety of commercial and open source tools and enjoys learning new technologies, sometimes on a daily basis! Her passion is finding ways to organize chaos and deliver software more effectively.
Amy E. Hodler
Amy is a network science devotee and a program director for AI and graph analytics at Neo4j. Amy is the co-author of Graph Algorithms: Practical Examples in Apache Spark and Neo4j. She tweets @amyhodler
Originally posted on OpenDataScience.com