k-NN Benchmarks Part I – Wikipedia

This article is the first in a series comparing available methods for accelerating large-scale k-Nearest Neighbor searches on high-dimensional vectors (i.e., 100 components or more). The emphasis here is on practicality over novelty: we’re focusing on solutions that are readily available and can be used in production applications Read more…

Concept Search on Wikipedia

I recently created a project on GitHub called wiki-sim-search, where I used gensim to perform concept searches on English Wikipedia. gensim includes a script, make_wikicorpus.py, which converts all of Wikipedia into vectors. The gensim project also has a nice tutorial on using it. I started from this gensim script and modified it heavily to comment and Read more…