Operator Log #0006
Object-Based Search, Performance Improvements, Bug Fixes
Oh my, it’s almost the end of July! Time goes by quick when you’re having fun 😊
Object-Based Search
We have a new endpoint in our API for object-based search.
One of the most difficult parts of building search engines, or software products in general, is the UI/UX side: how do you present information in a way that’s easily digestible for end users? On the search side, at least, users are generally used to document-based results, e.g. a list of webpages, each with an included snippet of text.
As of today, we support these types of interfaces natively in our API: a query returns a set of objects, each with a small snippet of text sourced from that object.
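To make the idea of "objects plus a snippet" concrete, here is a minimal sketch of one way chunk-level search hits could be collapsed into object-level results. This is not Operand's actual implementation or API; `ChunkHit`, `ObjectResult`, and `group_hits_by_object` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChunkHit:
    object_id: str  # parent object (e.g. a webpage) this chunk belongs to
    text: str       # the chunk's text, reused as the snippet
    score: float    # relevance score, higher is better


@dataclass
class ObjectResult:
    object_id: str
    snippet: str
    score: float


def group_hits_by_object(hits, limit=5):
    """Collapse chunk hits into one result per object, keeping the best chunk as the snippet."""
    best = {}
    for hit in hits:
        current = best.get(hit.object_id)
        if current is None or hit.score > current.score:
            best[hit.object_id] = ObjectResult(hit.object_id, hit.text, hit.score)
    # Rank objects by the score of their best-matching chunk.
    ranked = sorted(best.values(), key=lambda r: r.score, reverse=True)
    return ranked[:limit]
```

Each object shows up once in the output, no matter how many of its chunks matched, which is what keeps a document-style results list readable.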
Little insider tip: including a title property in the metadata (where applicable) or within the properties of an object will improve results here, as the title is partially weighted in the search results. If you’re indexing HTML and/or Markdown documents and don’t specify a title, we’ll do our best to extract it automatically.
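As a rough illustration of the fallback described above, the sketch below prefers an explicit title from the metadata and otherwise tries to pull one out of the content. The extraction rules here (first HTML `<title>` element, then first Markdown heading) are assumptions made for the example, and `resolve_title` is a hypothetical helper, not part of our API.

```python
import re
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element, if there is one."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def resolve_title(content, metadata=None):
    """Return the explicit title if given, else a best-effort extracted one, else None."""
    explicit = (metadata or {}).get("title")
    if explicit:
        return explicit.strip()
    parser = _TitleParser()
    parser.feed(content)
    parser.close()
    if parser.title and parser.title.strip():
        return parser.title.strip()
    # Fall back to the first Markdown ATX heading, e.g. "# My Notes".
    match = re.search(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", content, flags=re.MULTILINE)
    return match.group(1).strip() if match else None
```

An explicit title always wins, so setting one yourself is still the most reliable option.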
We’ve also gone ahead and made a new pre-built search interface for object-based search. If you want to search over the objects within a collection, simply press CMD+K → Create public interface and select “object-search” in the Operand dashboard.
Here’s an example of the endpoint in action:
If you want to search over Paul Graham’s Essays yourself, here’s the shareable link 🙂
Search My Notion
We mentioned searchmynotion.com in the last Operator Log™️, but we officially shipped the site this week along with the object-based search endpoint. Thanks to everyone who checked it out and helped us find lots of bugs. It is a great example of using object-based search, and all of the code is public on our GitHub.
Performance Improvements
We spent a bit of time this week optimizing some of our endpoints to handle larger collections of objects or, more specifically, large objects in general. We’ve made the following key performance improvements:
- Indexing large objects (e.g. Notion workspaces, RSS feeds) should now be considerably faster. For example, indexing an RSS feed of Paul Graham’s essays is now roughly 2.6x faster (~13 minutes → ~5 minutes). We expect this to get even faster in the coming weeks.
- Deletions of large objects or collections should now also be significantly faster, up to 10x. Behind the scenes, we’ve offloaded a lot of this work onto background workers and avoided blocking the caller and/or timing out the HTTP request.
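The deletion change follows a common pattern: accept the request, hand the slow work to a background worker, and return immediately. Here is a minimal sketch of that pattern using Python's standard `queue` and `threading` modules. The `DeletionWorker` class and the in-memory `store` dict are stand-ins invented for this example and do not reflect our real infrastructure.

```python
import queue
import threading


class DeletionWorker:
    def __init__(self, store):
        # store: dict mapping object id -> indexed data (stand-in for a real index)
        self._store = store
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def request_delete(self, object_id):
        """Enqueue the deletion and return right away, much like an HTTP 202 response."""
        self._jobs.put(object_id)
        return {"status": "accepted", "object_id": object_id}

    def _run(self):
        while True:
            object_id = self._jobs.get()
            try:
                # The slow work happens here, off the request path.
                self._store.pop(object_id, None)
            finally:
                self._jobs.task_done()

    def wait_until_idle(self):
        """Block until every queued deletion has been processed (handy in tests)."""
        self._jobs.join()
```

Because the caller only waits for the enqueue, the request time no longer depends on how large the object is.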
Bug Fixes / Improvements
- Fixed some concurrency issues within our engines; there was a pretty nasty race condition hiding in their depths, involving our filtering logic.
- Fixed an issue with RSS feed indexing which prevented an RSS feed from being properly indexed if one of the feed items wasn’t able to be indexed. We now take a “best effort” approach here, meaning we “do our best” to index all the feed items we can.
- Fixed an issue with our indexing pipeline where Notion page titles wouldn’t be properly indexed within object-based search.
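For the curious, the “best effort” approach to RSS feed indexing mentioned above boils down to catching failures per item instead of per feed. A minimal sketch, assuming a hypothetical `index_item` callable that raises an exception when an item can't be indexed:

```python
def index_feed_best_effort(items, index_item):
    """Index every item we can; record failures instead of aborting the whole feed."""
    indexed, failures = [], []
    for item in items:
        try:
            index_item(item)
        except Exception as exc:  # one bad item should not sink the rest of the feed
            failures.append((item, str(exc)))
            continue
        indexed.append(item)
    return indexed, failures
```

Returning the failures alongside the successes keeps the skipped items visible, so they can be logged or retried later.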