graph_tool.generation.generate_knn#
- graph_tool.generation.generate_knn(points, k, dist=None, pairs=False, exact=False, local=True, r=0.5, max_rk=None, epsilon=0.001, directed=False, verbose=False)[source]#
Generate a graph of k-nearest neighbors (or pairs) from a set of multidimensional points.
- Parameters:
- pointsiterable of lists (or
numpy.ndarray) of dimension \(N\times D\) orint Points of dimension \(D\) to be considered. If the parameter dist is passed, this should be just an int containing the number of points.
- k
int Number of nearest neighbors per node (or number of pairs if
pairs is True).- distfunction (optional, default:
None) If given, this should be a function that returns the distance between two points. The arguments of this function should just be two integers, corresponding to the vertex index. In this case the value of
pointsshould just be the total number of points. Ifdist is None, then the L2-norm (Euclidean distance) is used.- pairs
bool(optional, default:False) If
True, thekclosest pairs of nodes will be returned, otherwise theknearest neighbors for every edge is returned.- exact
bool(optional, default:False) If
False, an fast approximation will be used, otherwise an exact but slow algorithm will be used.- local
bool(optional, default:True) If
True, the local version of the algorithm will be used.- r
float(optional, default:.5) If
exact is False, this specifies the fraction of randomly chosen neighbors that are used for the search.- max_rk
int(optional, default:None) If provided and
exact is Falseandlocal is false, this specifies the maximum number of randomly chosen out neighbors to consider during each iteration. A value ofNoneimplies that all out neighbors are considered.- epsilon
float(optional, default:.001) If
exact is False, this determines the convergence criterion used by the algorithm. When the fraction of updated neighbors drops below this value, the algorithm stops.- directed
bool(optional, default:False) If
Truea directed version of the graph will be returned, otherwise the graph is undirected.
- pointsiterable of lists (or
- Returns:
- g
Graph The k-nearest neighbors graph.
- w
EdgePropertyMap Edge property map with the computed distances.
- g
Notes
The approximate version of this algorithm is based on [dong-efficient-2011], and has an (empirical) run-time of \(O(N^{1.14})\). The exact version has a complexity of \(O(N^2)\).
If
pairs is True, the \(k\) closest pairs are found from the nearest neighbors problem as described in [lenhof-k-closest].If enabled during compilation, this algorithm runs in parallel.
References
[dong-efficient-2011]Wei Dong, Charikar Moses, and Kai Li, “Efficient k-nearest neighbor graph construction for generic similarity measures”, In Proceedings of the 20th international conference on World wide web (WWW ‘11). Association for Computing Machinery, New York, NY, USA, 577–586, (2011) DOI: https://doi.org/10.1145/1963405.1963487 [sci-hub, @tor]
[lenhof-k-closest]HP Lenhof, M Smid, “The k closest pairs problem”, https://people.scs.carleton.ca/~michiel/k-closestnote.pdf
Examples
>>> points = np.random.random((1000, 10)) >>> g, w = gt.generate_knn(points, k=5)