VerSaChI: Finding Statistically Significant Subgraph Matches using Chebyshev’s Inequality
Published in CIKM 2021: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, 2021
Approximate subgraph matching, an important primitive for many applications like question answering, community detection, and motif discovery, often involves large labeled graphs such as knowledge graphs, social networks, and protein sequences. Effective methods for extracting subgraphs that match a query in terms of label and structural similarity should be accurate, computationally efficient, and robust to noise. In this paper, we propose VerSaChI for finding the top-k most similar subgraphs based on 2-hop label and structural overlap similarity with the query. The similarity is characterized using Chebyshev’s inequality to compute the chi-square statistical significance, which measures the degree of matching of the subgraphs. Experiments on real-life graph datasets show significant improvements in accuracy over state-of-the-art methods, as well as robustness to noise.
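The two statistical ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's procedure: the function names, the two-category bucketing, and the toy counts are all assumptions made for illustration. It shows only the textbook forms of Chebyshev's inequality, P(|X - mean| >= k * std) <= 1/k^2, and of the Pearson chi-square statistic, the sum over categories of (observed - expected)^2 / expected.

```python
def chebyshev_tail_bound(k: float) -> float:
    """Upper bound on P(|X - mean| >= k * std), valid for any
    distribution with finite variance (Chebyshev's inequality)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # The bound 1/k^2 is only informative for k > 1; cap it at 1.
    return min(1.0, 1.0 / (k * k))


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical toy data: 10 vertex pairs, bucketed into
# "low similarity" and "high similarity" categories.
observed_counts = [2, 8]   # what a candidate subgraph shows
expected_counts = [5, 5]   # what chance alone would predict (assumed)

statistic = chi_square(observed_counts, expected_counts)
bound = chebyshev_tail_bound(2.0)
```

A larger chi-square value indicates that the observed counts deviate more strongly from the expected ones, which is the sense in which a candidate match would be called statistically significant. How VerSaChI derives the expected counts from Chebyshev's inequality is described in the paper itself and is not reproduced here.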
Recommended citation:
“VerSaChI: Finding Statistically Significant Subgraph Matches using Chebyshev’s Inequality”, Shubhangi Agarwal, Sourav Dutta and Arnab Bhattacharya, Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2021, pages 2812-2816, Australia.