graph_tool.topology.vertex_similarity#
- graph_tool.topology.vertex_similarity(g, sim_type='jaccard', vertex_pairs=None, eweight=None, sim_map=None)[source]#
Return the similarity between pairs of vertices.
- Parameters:
- g
Graph
The graph to be used.
- sim_type
str
(optional, default:"jaccard"
) Type of similarity to use. This must be one of
"dice"
,"salton"
,"hub-promoted"
,"hub-suppressed"
,"jaccard"
,"inv-log-weight"
,"resource-allocation"
or"leicht-holme-newman"
.- vertex_pairsiterable of pairs of integers (optional, default:
None
) Pairs of vertices to compute the similarity. If omitted, all pairs will be considered.
- eweight
EdgePropertyMap
(optional, default:None
) Edge weights.
- sim_map
VertexPropertyMap
(optional, default:None
) If provided, and
vertex_pairs is None
, the vertex similarities will be stored in this vector-valued property. Otherwise, a new one will be created.
- g
- Returns:
- similarities
numpy.ndarray
orVertexPropertyMap
If
vertex_pairs
was supplied, this will be anumpy.ndarray
with the corresponding similarities, otherwise it will be a vector-valued vertexVertexPropertyMap
, with the similarities to all other vertices.
- similarities
Notes
According to
sim_type
, this function computes one of the following similarities:sim_type == "dice"
The Sørensen–Dice similarity [sorensen-dice] of vertices \(u\) and \(v\) is defined as
\[\frac{2|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)|+|\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "salton"
The Salton (or cosine) similarity [salton] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\sqrt{|\Gamma(u)||\Gamma(v)|}},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "hub-promoted"
The “hub promoted” similarity [ravasz_hierarchical_2002] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\max(|\Gamma(u)|,|\Gamma(v)|)},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "hub-suppressed"
The “hub suppressed” similarity of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\min(|\Gamma(u)|,|\Gamma(v)|)},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "jaccard"
The Jaccard similarity [jaccard] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)\cup\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "inv-log-weight"
The inverse log weighted similarity [adamic-friends-2003] of vertices \(u\) and \(v\) is defined as
\[\sum_{w \in \Gamma(u)\cap\Gamma(v)}\frac{1}{\log |\Gamma(w)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "resource-allocation"
The resource allocation similarity [zhou-predicting-2009] of vertices \(u\) and \(v\) is defined as
\[\sum_{w \in \Gamma(u)\cap\Gamma(v)}\frac{1}{|\Gamma(w)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "leicht-holme-newman"
The Leicht-Holme-Newman similarity [leicht_vertex_2006] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)||\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
For directed graphs, only out-neighbors are considered in the above algorthms (for “inv-log-weight”, the in-degrees are used to compute the weights). To use the in-neighbors instead, a
GraphView
should be used to reverse the graph, e.g.vertex_similarity(GraphView(g, reversed=True))
.For weighted or multigraphs, in the above equations it is assumed the following:
\[\begin{split}|\Gamma(u)\cap\Gamma(v)| &= \sum_w \min(A_{wv}, A_{wu}),\\ |\Gamma(u)\cup\Gamma(v)| &= \sum_w \max(A_{wv}, A_{wu}),\\ |\Gamma(u)| &= \sum_w A_{wu},\end{split}\]where \(A_{wu}\) is the weighted adjacency matrix.
See [liben-nowell-link-prediction-2007] for a review of the above.
The algorithm runs with complexity \(O(\left<k\right>N^2)\) if
vertex_pairs is None
, otherwise with \(O(\left<k\right>P)\) where \(P\) is the length ofvertex_pairs
.If enabled during compilation, this algorithm runs in parallel.
References
[salton]G. Salton, M. J. McGill, “Introduction to Modern Informa-tion Retrieval”, (MuGraw-Hill, Auckland, 1983).
[ravasz_hierarchical_2002]Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., & Barabási, A. L., “Hierarchical organization of modularity in metabolic networks”, Science, 297(5586), 1551-1555, (2002). DOI: 10.1126/science.1073374 [sci-hub, @tor]
[leicht_vertex_2006]E. A. Leicht, Petter Holme, and M. E. J. Newman, “Vertex similarity in networks”, Phys. Rev. E 73, 026120 (2006), DOI: 10.1103/PhysRevE.73.026120 [sci-hub, @tor], arXiv: physics/0510143
[adamic-friends-2003]Lada A. Adamic and Eytan Adar, “Friends and neighbors on the Web”, Social Networks Volume 25, Issue 3, Pages 211–230 (2003) DOI: 10.1016/S0378-8733(03)00009-1 [sci-hub, @tor]
[liben-nowell-link-prediction-2007]David Liben-Nowell and Jon Kleinberg, “The link-prediction problem for social networks”, Journal of the American Society for Information Science and Technology, Volume 58, Issue 7, pages 1019–1031 (2007), DOI: 10.1002/asi.20591 [sci-hub, @tor]
[zhou-predicting-2009]Zhou, Tao, Linyuan Lü, and Yi-Cheng Zhang, “Predicting missing links via local information”, The European Physical Journal B 71, no. 4: 623-630 (2009), DOI: 10.1140/epjb/e2009-00335-8 [sci-hub, @tor], arXiv: 0901.0553
Examples
>>> g = gt.collection.data["polbooks"] >>> s = gt.vertex_similarity(g, "jaccard") >>> color = g.new_vp("double") >>> color.a = s[0].a >>> gt.graph_draw(g, pos=g.vp.pos, vertex_text=g.vertex_index, ... vertex_color=color, vertex_fill_color=color, ... vcmap=matplotlib.cm.inferno, ... output="polbooks-jaccard.svg") <...>
Jaccard similarities to vertex
0
in a political books network.#