12-05, 18:30–19:00 (UTC), LLM Track
Vector databases are everywhere, powering LLM applications. But indexing embeddings in bulk, especially multi-vector embeddings like ColPali and ColBERT, is memory-intensive. Vector streaming solves this problem by parallelizing parsing, chunking, and embedding generation, and by indexing continuously, chunk by chunk, instead of in one bulk pass. This not only increases speed but also makes the whole task more memory-efficient.
The library supports many vector databases, including Pinecone, Weaviate, and Elastic.
Embedding creation is mostly done synchronously, so a lot of time is wasted while the chunks are being created, even though chunking is not a compute-heavy operation. Passing chunks to the embedding model as they are produced would be far more efficient. The problem intensifies with late-interaction embeddings like ColBERT or ColPali.
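For contrast, here is a minimal sketch of that synchronous flow; the chunking and embedding functions are placeholders, not the library's actual code:

```rust
// Sketch of the usual synchronous flow: every chunk is produced first, then
// everything is embedded in one bulk pass, so all chunks and all embeddings
// sit in memory at the same time.
fn embed_all(chunks: &[String]) -> Vec<Vec<f32>> {
    chunks.iter().map(|c| vec![c.len() as f32]).collect() // placeholder model call
}

fn main() {
    // 1) Parse and chunk the whole corpus up front (memory grows with corpus size).
    let chunks: Vec<String> = (0..1000).map(|i| format!("chunk {}", i)).collect();
    // 2) Only now does the expensive embedding step start; chunking and embedding
    //    never overlap, and nothing reaches the vector database until the whole
    //    batch is done.
    let embeddings = embed_all(&chunks);
    println!("indexing {} embeddings in one bulk upload", embeddings.len());
}
```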
The solution is to make chunking and embedding an asynchronous, pipelined task. Rust's concurrency primitives and thread safety make it straightforward to spawn threads for this. The implementation uses Rust's MPSC (multi-producer, single-consumer) channels to pass messages between threads: chunks are streamed into the embedding thread and collected in a buffer. Once the buffer is full, the chunks are embedded and the embeddings are sent back to the main thread, which forwards them to the vector database. No time is wasted waiting on a single operation, and there are no bottlenecks. Moreover, only the chunks and embeddings currently in the buffer are held in system memory; they are dropped once they have been moved to the vector database.
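The pattern itself can be sketched with Rust's standard std::sync::mpsc channels. The buffer size, chunking loop, and embed_batch function below are illustrative placeholders, not EmbedAnything's actual implementation:

```rust
use std::sync::mpsc;
use std::thread;

const BUFFER_SIZE: usize = 32; // hypothetical buffer size

fn embed_batch(chunks: &[String]) -> Vec<Vec<f32>> {
    // Placeholder: call the embedding model on a batch of chunks.
    chunks.iter().map(|c| vec![c.len() as f32]).collect()
}

fn main() {
    let (chunk_tx, chunk_rx) = mpsc::channel::<String>();
    let (embed_tx, embed_rx) = mpsc::channel::<Vec<Vec<f32>>>();

    // Producer thread: parse and chunk the documents, streaming each chunk
    // to the embedding thread as soon as it is ready.
    thread::spawn(move || {
        for i in 0..1000 {
            let chunk = format!("chunk {}", i); // stand-in for real parsing/chunking
            chunk_tx.send(chunk).expect("embedding thread hung up");
        }
        // Dropping chunk_tx closes the channel and ends the consumer loop.
    });

    // Embedding thread: buffer incoming chunks, embed a full buffer at a time,
    // and forward the embeddings to the main thread.
    thread::spawn(move || {
        let mut buffer = Vec::with_capacity(BUFFER_SIZE);
        for chunk in chunk_rx {
            buffer.push(chunk);
            if buffer.len() == BUFFER_SIZE {
                embed_tx.send(embed_batch(&buffer)).unwrap();
                buffer.clear(); // buffered chunks are released after embedding
            }
        }
        if !buffer.is_empty() {
            embed_tx.send(embed_batch(&buffer)).unwrap();
        }
    });

    // Main thread: receive embeddings as they arrive and index them into the
    // vector database (placeholder print here).
    for batch in embed_rx {
        println!("indexing {} embeddings", batch.len());
    }
}
```

Only one buffer's worth of chunks and embeddings is ever alive at once; everything else has either not been parsed yet or has already been handed off for indexing.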
All of this is then bound into Python using pyo3 and maturin, so it is easily accessible from Python while the core pipeline still runs concurrently in Rust.
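A minimal sketch of what such a binding looks like with pyo3 (assuming pyo3 0.21+ with the Bound module API); the module and function names here are hypothetical, not the library's real API:

```rust
use pyo3::prelude::*;

// Hypothetical function exposed to Python: it would kick off the Rust-side
// chunking/embedding pipeline for a file and return, for example, how many
// chunks were indexed.
#[pyfunction]
fn embed_and_stream(path: String) -> PyResult<usize> {
    // Placeholder body standing in for the streaming pipeline.
    Ok(path.len())
}

// The extension module Python imports; maturin builds and installs it as a wheel.
#[pymodule]
fn vector_streaming_demo(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(embed_and_stream, m)?)?;
    Ok(())
}
```

Built with maturin develop, the extension becomes importable from Python as an ordinary module, while all the threading stays on the Rust side.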
No previous knowledge expected
Sonam is the creator of the open-source library EmbedAnything, which creates local and multimodal embeddings and streams them to vector databases. It is built in Rust, which makes it greener and more efficient. She works as a Generative AI Evangelist at Articul8, an Intel spin-off that provides vertical GenAI services to enterprises.
I develop AI applications in Python powered by Rust. I am currently doing my master's in AI and Engineering Systems at the Technical University of Eindhoven. I maintain an open-source project, EmbedAnything, which has 250 stars and over 40,000 downloads.