Skip to content

peterthehan/group-similar

Repository files navigation

Group Similar

Discord Twitter Follow

Group similar items together.

Runtime complexity is O(N^2 * (M + α(N))), where N is the number of elements in items, M is the runtime complexity of the similarityFunction, and α(N) is the inverse Ackermann function (amortized constant time for all practical purposes).

Space complexity is O(N).

Getting started

npm i group-similar

Examples

Group similar strings
import { groupSimilar } from "group-similar";
import { distance } from "fastest-levenshtein";

function levenshteinSimilarityFunction(a: string, b: string): number {
  return a.length === 0 && b.length === 0
    ? 1
    : 1 - distance(a, b) / Math.max(a.length, b.length);
}

groupSimilar({
  items: ["cat", "bat", "kitten", "dog", "sitting"],
  mapper: (i) => i,
  similarityFunction: levenshteinSimilarityFunction,
  similarityThreshold: 0.5,
});

// [ [ 'cat', 'bat' ], [ 'kitten', 'sitting' ], [ 'dog' ] ]
Group similar numbers
import { groupSimilar } from "group-similar";

function evenOddSimilarityFunction(a: number, b: number): number {
  return Number(a % 2 === b % 2);
}

groupSimilar({
  items: [1, 5, 10, 0, 2, 123],
  mapper: (i) => i,
  similarityFunction: evenOddSimilarityFunction,
  similarityThreshold: 1,
});

// [ [ 1, 5, 123 ], [ 10, 0, 2 ] ]
Group similar objects
import { groupSimilar } from "group-similar";
import { distance } from "fastest-levenshtein";

function nestedMapper(object: { a: { b: { value: string } } }): string {
  return object.a.b.value;
}

function levenshteinSimilarityFunction(a: string, b: string): number {
  return a.length === 0 && b.length === 0
    ? 1
    : 1 - distance(a, b) / Math.max(a.length, b.length);
}

groupSimilar({
  items: [
    { a: { b: { value: "sitting" } } },
    { a: { b: { value: "dog" } } },
    { a: { b: { value: "kitten" } } },
    { a: { b: { value: "bat" } } },
    { a: { b: { value: "cat" } } },
  ],
  mapper: nestedMapper,
  similarityFunction: levenshteinSimilarityFunction,
  similarityThreshold: 0.5,
});

// [
//   [{ a: { b: { value: "sitting" } } }, { a: { b: { value: "kitten" } } }],
//   [{ a: { b: { value: "dog" } } }],
//   [{ a: { b: { value: "bat" } } }, { a: { b: { value: "cat" } } }],
// ]

Syntax

groupSimilar(options);
groupSimilar({ items, mapper, similarityFunction, similarityThreshold });

Parameters

Parameter Type Required Default Description
options Object Yes none Arguments to pass to the function

Options

Property Type Required Default Description
items T[] Yes none Array of items to group
mapper (t: T) => K Yes none Function to apply to each element in items prior to measuring similarity
similarityFunction (a: K, b: K) => number Yes none Function to measure similarity between mapped items
similarityThreshold number Yes none Threshold at which items whose similarity value is greater than or equal to it are grouped together

Return value

The return value is a new nested array of type T[][] containing elements of items grouped by similarity. If there are no elements in items, an empty array will be returned.

Benchmark

Benchmark test results where N is the number of items being grouped, higher ops/sec is better.

Library N=16 N=32 N=64 N=128 N=256 N=512 N=1024 N=2048
group-similar 86867 17538 6067 1594 444 171 75 27
set-clustering 28506 6258 1831 455 121 30 6 1

Benchmark configuration details can be found here.