ChiSeL: Graph Similarity Search using Chi-Squared Statistics in Large Probabilistic Graphs

Published in Proceedings of the International Conference on Very Large Data Bases (VLDB), 2020,, 2020

Subgraph querying is one of the most important primitives in many applications. Although the field is well studied for deterministic graphs, in many situations, the graphs are probabilistic in nature. In this paper, we address the problem of subgraph querying in large probabilistic labeled graphs. We employ a novel algorithmic frame work, called ChiSeL, that uses the idea of statistical significance for approximate subgraph matching on uncertain graphs that have uncertainty in edges. For each candidate matching vertex in the target graph that matches a query vertex, we compute its statistical significance using the chi-squared statistic. The search algorithm then proceeds in a greedy manner by exploring the vertex neighbors having the largest chi-square score. In addition to edge uncertainty, we also show how ChiSeL can handle uncertainty in labels and/or vertices. Experiments on large real-life graphs show the efficiency and effectiveness of our algorithm.

Download paper here

Recommended citation:
ChiSeL: Graph Similarity Search using Chi-Squared Statistics in Large Probabilistic Graphs, Shubhangi Agarwal, Sourav Dutta and Arnab Bhattacharya, Proceedings of the International Conference on Very Large Data Bases (VLDB), 2020, pages 1654-1668, Japan.