
Operator Log #0002

Index Sharing, TypeAhead Beta, EPUB Support, Multilingual Support


Happy Canada Day, folks! This past week has been all over the place, yet incredibly productive overall. Let’s get into the updates!

Index Sharing

It’s one thing to curate a collection of objects (e.g., texts, PDFs, images), but it’s another thing entirely to be able to share that knowledge with your friends, family, co-workers, or even a bigger audience (perhaps your Twitter followers)!

Our web UI now features a (purposely) prominent sharing button, which opens a modal and allows you to configure a new search interface for a collection. When you’re finished configuring the UI, you’ll be redirected to a new page (with a unique URL) that you can share with anyone.

It’s important to note that the interface is specific to the collection it was created for, and will only operate on data within that collection (and any sub-collections). If you add data to or remove data from this collection, the change is reflected in the UI: anything you add becomes searchable there (keep that in mind if you’re indexing sensitive data).

For security and peace of mind, you can keep track of and revoke any existing UIs in the settings page of the dashboard (Settings → API Keys).


iMessage Demo

This past weekend, we released our first open-source demo of the Operand API in action: indexing incoming iMessage and SMS messages. Using an open-source tool we wrote, we intercepted all incoming messages on a macOS machine and uploaded them as text objects to Operand. In addition to messages, the demo also automatically indexes photos and PDFs.

The code for this demo project is located here, and was released alongside a blog post.

TypeAhead Beta

We’ve written in detail about the difference between Search and Discovery (see: Search is Solved, Discovery is Not). One of the fundamental differences is that discovery happens when the user isn’t searching. More specifically, it involves surfacing something the user wasn't actively looking for, or perhaps didn't know existed.

One of the best interfaces for discovery, in our opinion, is an inline writer interface, something that we’re calling TypeAhead. We're releasing this today in early beta.


TypeAhead combines the power of large language models with your Operand index, so completions aren’t generic; they’re unique and personalized to each individual user.

You can create your own TypeAhead interface today through the sharing menu of the Operand dashboard, or via the API.

EPUB Support

It’s now possible to index entire EPUB files with Operand, directly on the dashboard or via the API. Whether you’re indexing a textbook or your favourite light read, we’ve got you covered.

Diving deeper into the “why” behind this feature: books suffer from low information density. Often, a good book is filled with small gems that are crowded out or hidden by noise and fluff. Indexing a book, or even your entire library, with Operand gives us the chance to surface those hidden gems at just the right time, when you need them the most.


Multilingual Embeddings

Embeddings are fundamental to how we operate our search product: they allow us to compare the “semantic similarity” of two different pieces of text. By computing the embedding of the query and comparing it to the embeddings of a corpus of indexed text, we can find the most similar pieces of content for any given query. At its core, this is essentially how our semantic search engine works.
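To make the idea concrete, here is a minimal sketch of ranking a corpus against a query by embedding similarity. It assumes cosine similarity as the comparison metric (a common choice, but not necessarily what Operand uses), and the three-dimensional vectors are made-up toy values rather than output from a real embedding model. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, corpus, top_k=3):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy corpus: (text, embedding) pairs with hand-picked vectors.
corpus = [
    ("How to bake sourdough bread", [0.9, 0.1, 0.0]),
    ("Training a neural network", [0.0, 0.2, 0.9]),
    ("Best flour for pizza dough", [0.8, 0.3, 0.1]),
]

# Pretend this is the embedding of a baking-related query.
query_embedding = [1.0, 0.2, 0.0]

results = rank_by_similarity(query_embedding, corpus, top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In this toy example the two baking-related texts rank above the unrelated one. A production system would typically replace the linear scan with an approximate nearest-neighbour index.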

Embedding quality has a huge impact on search performance in a semantic search engine, and the characteristics of how embedding models are trained (for instance, what languages the models are exposed to) are really important.

As of this week, we’re now offering two core embedding models as part of our service:

  • General-purpose embeddings, optimized for high-quality semantic searches over the top 29 languages in the CCNet dataset. Not only can these embeddings perform single-language queries (e.g., searching French documents in French), but they can also be used to perform cross-language queries (e.g., searching Spanish documents with an English query).
  • English-optimized embeddings, which are optimized for English queries and documents. This model performs slightly better than our general-purpose embeddings on English texts.

Currently, the default embedding model for our customers is English-optimized, since the majority of our index and queries are in English. New and existing enterprise-level customers can opt in to our general-purpose embeddings, which are best for multilingual workloads.

Bug Fixes / Improvements

  • Fixed a bug where indexing HTML documents would fail if there were <li /> tags present in the document. Specifically, list items weren’t handled very well by our parser.
  • Fixed a bug where objects wouldn’t get indexed properly in the event of a server shutdown or failure. As of now, if a document is dequeued but can’t be indexed, it is re-queued (and can then likely be picked up by another machine).
  • Fixed a bug where objects would be marked as “indexing” indefinitely if an error occurred during indexing. We now properly mark these documents as errored, show this on our dashboard UI, and report the errors to our engineering team.
  • Fixed a typo in our TypeScript SDK, where the searchRelated function returned the wrong type (specifically, the response type for the searchContents function).
  • Fixed a bug where dragging and dropping a document in our dashboard UI would fail due to a CORS error. We’ve since updated our storage server implementation, which should resolve this issue for good. Please contact us if the issue persists.
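
For the curious, the two indexing-queue fixes above can be sketched roughly as follows. This is an illustrative sketch, not our actual implementation: the exception classes, the status strings, the in-memory queue, and the max_attempts cap are all assumptions made for the example.

```python
from collections import deque


class WorkerUnavailable(Exception):
    """Hypothetical transient failure: the worker shut down mid-job."""


class IndexingError(Exception):
    """Hypothetical permanent failure: the document cannot be indexed."""


def process_queue(queue, index_fn, statuses, max_attempts=3):
    """Drain the queue, re-queueing transient failures and marking real errors.

    max_attempts is an assumed safety cap so a document cannot loop forever.
    """
    attempts = {}
    while queue:
        doc_id = queue.popleft()
        attempts[doc_id] = attempts.get(doc_id, 0) + 1
        statuses[doc_id] = "indexing"
        try:
            index_fn(doc_id)
        except WorkerUnavailable:
            if attempts[doc_id] < max_attempts:
                # Put the document back so another worker can pick it up.
                statuses[doc_id] = "queued"
                queue.append(doc_id)
            else:
                statuses[doc_id] = "errored"
        except IndexingError:
            # Never leave a failed document stuck in "indexing".
            statuses[doc_id] = "errored"
        else:
            statuses[doc_id] = "indexed"


# Simulated indexer: doc-b fails once (transient), doc-c always fails.
calls = {}


def flaky_index(doc_id):
    calls[doc_id] = calls.get(doc_id, 0) + 1
    if doc_id == "doc-b" and calls[doc_id] == 1:
        raise WorkerUnavailable("simulated shutdown")
    if doc_id == "doc-c":
        raise IndexingError("simulated parse failure")


statuses = {}
process_queue(deque(["doc-a", "doc-b", "doc-c"]), flaky_index, statuses)
print(statuses)
```

Here doc-b is re-queued after the simulated shutdown and succeeds on its second attempt, while doc-c ends up marked as errored instead of staying in the “indexing” state.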