Integrate with Annoy
Comet integrates with Annoy.
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings for searching for points in space that are close to a given query point. It also creates large, read-only, file-based data structures that are [mmapped](https://en.wikipedia.org/wiki/Mmap) into memory, so that many processes can share the same data.
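Annoy's job is easiest to see next to the exact search it approximates. The sketch below is a pure-Python illustration, not Annoy's implementation: it ranks every stored vector by Annoy's documented "angular" distance, sqrt(2 * (1 - cos(u, v))), and returns the ids of the k closest. Annoy avoids this full scan by building a forest of trees from random hyperplane splits. The helpers `angular_distance` and `exact_nearest` are hypothetical names introduced here for illustration.

```python
import math


def angular_distance(u, v):
    """Annoy-style "angular" distance: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    cos = max(-1.0, min(1.0, cos))
    return math.sqrt(2.0 * (1.0 - cos))


def exact_nearest(items, query, k):
    """Hypothetical helper: brute-force k-NN over a dict of id -> vector (O(n) per query)."""
    ranked = sorted(items, key=lambda item_id: angular_distance(items[item_id], query))
    return ranked[:k]


items = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
print(exact_nearest(items, [1.0, 0.1], k=2))  # [0, 2]
```

The exact scan is fine for a few thousand vectors; Annoy trades a small amount of accuracy for much faster queries on large indexes.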
Configure Comet for Annoy
When you use Comet with Annoy, Comet does not log any data automatically. Log the index parameters and the index file explicitly, for example with `log_parameters()` and `log_asset()`.
End-to-end example
```python
import random

import comet_ml
from annoy import AnnoyIndex

comet_ml.login()
experiment = comet_ml.start(project_name="comet-example-annoy-doc")

# Annoy hyperparameters
f = 40  # Length of the item vectors that will be indexed
metric = "angular"
seed = 42
output_file = "test.ann"

# Create the Annoy index and fill it with 1,000 random vectors
t = AnnoyIndex(f, metric)
t.set_seed(seed)
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)
t.build(10)  # 10 trees
t.save(output_file)

# Comet logging
index_metadata = {
    "f": f,
    "metric": metric,  # the key must be the string "metric", not the variable
    "n_items": t.get_n_items(),
    "n_trees": t.get_n_trees(),
    "seed": seed,
}
experiment.log_parameters(index_metadata, prefix="annoy_index_1")
experiment.log_asset(output_file, metadata=index_metadata)

# Flush pending uploads and close the experiment
experiment.end()
```

This example logs the following hyperparameters:

- `annoy_index_1_f`: The length of the item vectors that are indexed
- `annoy_index_1_metric`: The distance metric used to build the Annoy index
- `annoy_index_1_n_items`: The number of items in the index
- `annoy_index_1_n_trees`: The number of trees in the index
- `annoy_index_1_seed`: The seed for Annoy's random number generator
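The parameter names above follow a prefix-plus-key pattern. Assuming Comet joins the `prefix` argument and each dictionary key with an underscore (as the listed names suggest), the sketch below predicts the names offline, which can help catch a mistyped key before anything is logged. `expected_parameter_names` is a hypothetical helper, not part of `comet_ml`.

```python
def expected_parameter_names(parameters, prefix):
    """Hypothetical helper: mirror the assumed `<prefix>_<key>` naming."""
    return [f"{prefix}_{key}" for key in parameters]


# Values the end-to-end example would produce (1,000 items, 10 trees).
index_metadata = {"f": 40, "metric": "angular", "n_items": 1000, "n_trees": 10, "seed": 42}
print(expected_parameter_names(index_metadata, prefix="annoy_index_1"))

# Using the bare variable `metric` as a key uses its value ("angular") as the key,
# so the parameter would be named after the value instead of "metric".
metric = "angular"
print(expected_parameter_names({metric: metric}, prefix="annoy_index_1"))
```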
An asset named `test.ann` is logged to the Experiment; it contains the saved index file. All of the hyperparameters are also attached to that asset as JSON metadata.
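Because the asset metadata is stored as JSON, the dictionary passed as `metadata` has to be JSON-serializable. The standard-library sketch below shows the JSON document you would expect for the example's values and checks that it round-trips; the exact formatting Comet uses when storing it is an assumption.

```python
import json

# Values the end-to-end example would produce (1,000 items, 10 trees).
index_metadata = {
    "f": 40,
    "metric": "angular",
    "n_items": 1000,
    "n_trees": 10,
    "seed": 42,
}

# json.dumps raises TypeError for values such as sets, so running it locally
# catches unserializable metadata before the upload.
metadata_json = json.dumps(index_metadata, sort_keys=True)
print(metadata_json)
# {"f": 40, "metric": "angular", "n_items": 1000, "n_trees": 10, "seed": 42}

# Parsing the JSON back yields the same dictionary.
assert json.loads(metadata_json) == index_metadata
```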
Jan. 17, 2025