Find nearest neighbors

kd_nearest_neighbors(x, v, n, ...)

# S3 method for arrayvec
kd_nearest_neighbors(x, v, n, p = 2, a = 0, ...)

# S3 method for matrix
kd_nearest_neighbors(x, v, n, cols = NULL, p = 2, a = 0, ...)

# S3 method for data.frame
kd_nearest_neighbors(x, v, n, cols = NULL, w = NULL, p = 2, a = 0, ...)

kd_nn_indices(x, v, n, ...)

# S3 method for arrayvec
kd_nn_indices(x, v, n, distances = FALSE, p = 2, a = 0, ...)

# S3 method for matrix
kd_nn_indices(
  x,
  v,
  n,
  cols = NULL,
  distances = FALSE,
  p = 2,
  a = 0,
  validate = TRUE,
  ...
)

# S3 method for data.frame
kd_nn_indices(
  x,
  v,
  n,
  cols = NULL,
  w = NULL,
  distances = FALSE,
  p = 2,
  a = 0,
  validate = TRUE,
  ...
)

kd_nearest_neighbor(x, v)

# S3 method for arrayvec
kd_nearest_neighbor(x, v)

# S3 method for matrix
kd_nearest_neighbor(x, v)

Arguments

x

an object sorted by kd_sort

v

a vector giving the query point (search key)

n

the number of neighbors to return

...

ignored

p

exponent of p-norm (Minkowski) distance

a

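return approximate neighbors whose distance is within a factor of (1 + a) of the true nearest-neighbor distance; a = 0 gives an exact search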

cols

an integer vector, character vector, or formula indicating the columns to use

w

distance weights, one per dimension (\(w_k\) in Details)

distances

if TRUE, return distances as an attribute of the result

validate

if FALSE, no input validation is performed

Value

kd_nearest_neighbors

one or more rows from the sorted input

kd_nn_indices

a vector of row indices indicating the result

kd_nearest_neighbor

the row index of the neighbor

Details

Distance is calculated as $$D_{ij} = \left[\sum_k w_k \, G(x_{ik}, x_{jk})^p\right]^{1/p}$$ where \(i\) and \(j\) are records, \(k\) indexes the fields or tuple elements, and \(w_k\) is the weight in the \(k\)th dimension. Here, \(G\) depends on the type. For reals, \(G(a, b) = |a - b|\). For logicals and integers, \(G(a, b)\) is zero if \(a = b\) and one otherwise. For strings, \(G(a, b)\) is the Levenshtein (edit) distance. Convert strings to factors unless edit distance makes sense for your application.
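As a rough illustration of the formula for purely real-valued records, the following base R sketch computes the weighted distance directly. The helper name minkowski_dist is hypothetical and is not part of the package; it only covers numeric vectors, where \(G(a, b) = |a - b|\).

```r
# Hypothetical helper, not part of the package: weighted Minkowski distance
# between two numeric records, with G(a, b) = |a - b| for every dimension.
minkowski_dist <- function(xi, xj, w = rep(1, length(xi)), p = 2) {
  stopifnot(length(xi) == length(xj), length(w) == length(xi), p > 0)
  # Weights multiply the p-th power of each per-dimension difference
  sum(w * abs(xi - xj)^p)^(1 / p)
}

xi <- c(0, 0)
xj <- c(3, 4)
minkowski_dist(xi, xj)               # p = 2, unit weights: Euclidean distance, 5
minkowski_dist(xi, xj, p = 1)        # p = 1: Manhattan distance, 7
minkowski_dist(xi, xj, w = c(1, 0))  # zero weight drops the second dimension, 3
```

Setting a weight to zero removes that dimension from the distance, which is one way to read the role of w in the formula.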

When the cols argument is used, the search key v is handled specially. If the length of v equals the number of columns of x, v is assumed to be given in the same order as the columns of x and is mapped through cols, which may change both the order and the length of the key. This allows a row of x to be used as the key while still respecting cols. Otherwise, or if validate is FALSE, v is passed unchanged and must already match cols in length and order. The same applies to the w parameter.
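The key handling described above can be sketched in base R for the simple case of an integer cols vector. The helper map_key and its ncol_x parameter are hypothetical names used only for illustration; this is not the package's implementation and it does not cover character or formula cols.

```r
# Hypothetical helper, not the package's code: mimic the described handling
# of the search key when `cols` is an integer vector.
map_key <- function(v, ncol_x, cols, validate = TRUE) {
  if (validate && length(v) == ncol_x) {
    v[cols]  # key is in the column order of x: reorder and subset through cols
  } else {
    v        # key is assumed to already match cols in length and order
  }
}

cols <- c(3L, 1L)                          # search on columns 3 and 1, in that order
map_key(c(10, 20, 30), ncol_x = 3, cols)   # full-row key is mapped to c(30, 10)
map_key(c(30, 10), ncol_x = 3, cols)       # short key is passed through unchanged
map_key(c(10, 20, 30), ncol_x = 3, cols, validate = FALSE)  # passed through unchanged
```

With validate = FALSE the full-length key is not remapped, so in that case the caller is responsible for supplying a key that already matches cols.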

Examples

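if (has_cxx17()) {
# 100 random points in the unit square, one point per row
x <- matrix(runif(200), 100)
# Convert the rows to tuples and sort them in place with kd_sort
y <- matrix_to_tuples(x)
kd_sort(y, inplace = TRUE)
# Row of the single nearest neighbor of the point (1/2, 1/2)
y[kd_nearest_neighbor(y, c(1/2, 1/2)),]
# The 3 nearest rows, returned directly
kd_nearest_neighbors(y, c(1/2, 1/2), 3)
# The 5 nearest rows, looked up by index (only this last value is printed)
y[kd_nn_indices(y, c(1/2, 1/2), 5),]
}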
#>           [,1]      [,2]
#> [1,] 0.5262575 0.4970160
#> [2,] 0.4668478 0.4783650
#> [3,] 0.5236941 0.5392107
#> [4,] 0.4728633 0.4533170
#> [5,] 0.4516867 0.5455092