Object Types
All of the "things" you can index with Operand, for example an HTML webpage, a PDF, an EPUB, or even a podcast.
Operand supports a number of different "object types", each describing a kind of content you can index.
This article outlines each type and how to use it.
If you haven't already, follow our getting started guide to get set up!
Text
The most basic type of object you can index is a plain text document.
const object = await operand.upsert({
  type: Object.ObjectType.TEXT,
  metadata: {
    value: {
      case: "text",
      value: {
        text: "This is some cool text.",
      },
    },
  },
});
HTML
HTML content, or the content of a webpage.
const object = await operand.upsert({
  type: Object.ObjectType.HTML,
  metadata: {
    value: {
      case: "html",
      value: {
        html: "<p>Hello World</p>",
      },
    },
  },
});
If you want to index a webpage, you can use the _url property to let us fetch the content for you. In this case, you don't need to provide the html directly.
const object = await operand.upsert({
  type: Object.ObjectType.HTML,
  metadata: {
    value: {
      case: "html",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "https://en.wikipedia.org/wiki/Virus",
        },
      },
    },
  },
});
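The same nested _url block recurs in most of the examples in this article, so it may be convenient to build it with a small helper. The sketch below is only an illustration: urlProperties and the UrlProperties type are hypothetical names, not part of the Operand SDK, and the shape simply mirrors the example above.

```typescript
// Hypothetical helper (not part of the Operand SDK): builds the nested
// `properties` object used in the examples for a non-indexed `_url` property.
type UrlProperties = {
  properties: {
    _url: {
      indexed: boolean;
      value: { case: "text"; value: string };
    };
  };
};

function urlProperties(url: string): UrlProperties {
  return {
    properties: {
      _url: {
        // Mirrors the examples in this article, which set indexed to false.
        indexed: false,
        value: { case: "text", value: url },
      },
    },
  };
}

// Print the structure so it can be compared with the example above.
console.log(JSON.stringify(urlProperties("https://en.wikipedia.org/wiki/Virus"), null, 2));
```

Assuming the request shape shown above, the result could then be passed as the properties field of an upsert call in place of the hand-written block.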
Websites
By parsing the sitemap.xml files that websites publish, we can automatically index the content of an entire website. We also keep the indexed content up to date automatically as the website changes.
const object = await operand.upsert({
  type: Object.ObjectType.SITEMAP,
  metadata: {
    value: {
      case: "sitemap",
      value: {
        // The backslash is doubled so that the regex receives an escaped dot;
        // forward slashes need no escaping in a string pattern.
        urlRegex: "https://jzhao\\.xyz/thoughts/.+",
      },
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "https://jzhao.xyz/sitemap.xml",
        },
      },
    },
  },
});
Every URL listed in the sitemap (whose location is passed in via the _url property) will be indexed. If urlRegex is specified, only URLs matching the regex will be indexed. We poll the sitemap every 15 minutes to check for new or updated content.
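Before upserting a sitemap object, it can help to try a urlRegex pattern against a few URLs locally. The following is a minimal sketch, assuming unanchored matching with JavaScript's RegExp; the regex engine and anchoring rules Operand applies on the server are not stated here and may differ. matchingUrls and the sample URLs are hypothetical.

```typescript
// Local check of a urlRegex pattern before upserting a sitemap object.
// Assumption: unanchored matching via JavaScript's RegExp; the server-side
// behavior may differ.
function matchingUrls(urls: string[], urlRegex: string): string[] {
  const re = new RegExp(urlRegex);
  return urls.filter((u) => re.test(u));
}

// Hypothetical sample URLs, standing in for entries of a sitemap.
const sampleUrls = [
  "https://jzhao.xyz/thoughts/digital-gardening",
  "https://jzhao.xyz/posts/hello",
  "https://jzhao.xyz/thoughts/",
];

console.log(matchingUrls(sampleUrls, "https://jzhao\\.xyz/thoughts/.+"));
```

With this pattern only the first sample URL is kept: the second is outside /thoughts/, and the third has nothing after the trailing slash for .+ to match.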
Collections
Collections are a special type of object, which allow you to group other objects together.
You can think of a collection like a folder inside the index. Collections can be nested.
To create a collection:
const collection = await operand.upsert({
  type: Object.ObjectType.COLLECTION,
  metadata: {
    value: {
      case: "collection",
      value: {},
    },
  },
});
Any object can be indexed inside a collection by adding the parentId parameter.
For example, to index a text document inside a collection:
const object = await operand.upsert({
  parentId: collection.object.id, // Put this object inside the collection.
  type: Object.ObjectType.TEXT,
  metadata: {
    value: {
      case: "text",
      value: {
        text: "I'm inside the collection!",
      },
    },
  },
});
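The two steps above (create a collection, then upsert objects with its ID as parentId) can be combined into one routine. This is a rough sketch rather than SDK code: indexTextsInCollection, UpsertClient, and UpsertRequest are hypothetical, and the local ObjectType constant is only a stand-in for the SDK's Object.ObjectType so that the snippet is self-contained.

```typescript
// Stand-in for the SDK's Object.ObjectType enum (assumption for this sketch).
const ObjectType = { COLLECTION: "COLLECTION", TEXT: "TEXT" } as const;

// Simplified request shape, limited to the fields used in this article.
interface UpsertRequest {
  parentId?: string;
  type: string;
  metadata: { value: { case: string; value: Record<string, unknown> } };
}

// Minimal shape of the client that this sketch relies on.
interface UpsertClient {
  upsert(req: UpsertRequest): Promise<{ object: { id: string } }>;
}

// Hypothetical helper: creates a collection, then indexes each text inside it.
async function indexTextsInCollection(
  client: UpsertClient,
  texts: string[]
): Promise<{ collectionId: string; objectIds: string[] }> {
  const collection = await client.upsert({
    type: ObjectType.COLLECTION,
    metadata: { value: { case: "collection", value: {} } },
  });
  const collectionId = collection.object.id;

  const objectIds: string[] = [];
  for (const text of texts) {
    const res = await client.upsert({
      parentId: collectionId, // Put this object inside the collection.
      type: ObjectType.TEXT,
      metadata: { value: { case: "text", value: { text } } },
    });
    objectIds.push(res.object.id);
  }
  return { collectionId, objectIds };
}
```

The upserts run one after another to keep the sketch simple; whether issuing them in parallel is appropriate depends on limits that this article does not describe.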
RSS
Index an entire RSS feed. We'll automatically poll the feed and index new content every 15 minutes.
const object = await operand.upsert({
  type: Object.ObjectType.RSS,
  metadata: {
    value: {
      case: "rss",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "http://www.aaronsw.com/2002/feeds/pgessays.rss",
        },
      },
    },
  },
});
Audio
We can auto-transcribe audio files, in the following formats:
mp3
ogg
flac
wav
You can pass the file extension and the raw bytes directly in the metadata; however, we recommend using the _url property so that we can fetch the content and detect the file type for you.
const object = await operand.upsert({
  type: Object.ObjectType.AUDIO,
  metadata: {
    value: {
      case: "audio",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value:
            "https://upload.wikimedia.org/wikipedia/commons/f/f7/David_Nutt_in_The_Life_Scientific_b01mqp1c.flac",
        },
      },
    },
  },
});
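If you want to catch obviously unsupported files before upserting, a naive client-side check against the format list above might look like the sketch below. It only inspects the URL's file extension, which is an assumption of this example: Operand detects the file type itself, and a URL without a recognizable extension may still point at valid audio. audioExtension and isSupportedAudioUrl are hypothetical helpers.

```typescript
// Formats listed above as supported for auto-transcription.
const SUPPORTED_AUDIO_EXTENSIONS = ["mp3", "ogg", "flac", "wav"];

// Hypothetical helper: returns the lowercase extension of the last segment of
// a URL, or null if that segment has no extension.
function audioExtension(url: string): string | null {
  const path = url.replace(/[?#].*$/, ""); // drop query string and fragment
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  if (dot <= 0) return null; // no dot, or a leading dot only
  return lastSegment.slice(dot + 1).toLowerCase();
}

// Naive check based purely on the extension found in the URL.
function isSupportedAudioUrl(url: string): boolean {
  const ext = audioExtension(url);
  return ext !== null && SUPPORTED_AUDIO_EXTENSIONS.indexOf(ext) !== -1;
}

console.log(
  isSupportedAudioUrl(
    "https://upload.wikimedia.org/wikipedia/commons/f/f7/David_Nutt_in_The_Life_Scientific_b01mqp1c.flac"
  )
);
```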
Podcasts
You can use Operand to index entire podcasts with fewer than 20 lines of code. We'll automatically index the entire back catalog of episodes, as well as any new episodes as they are released. As part of this, we'll also auto-transcribe the audio for all episodes.
Under the hood, we use the Listen Notes API to fetch metadata about podcasts.
To find podcasts that you can index, you can query our API:
const suggestions = await operand.suggestions({
  query: "joe rogan",
  type: Object.ObjectType.PODCAST,
});
console.log(
  suggestions.upserts
    .map((s) => `(${s.upsert.metadata.toJsonString()}) ${s.description}`)
    .flat()
);
This will print:
[
  '({"podcast":{"listennotesId":"21159b4568244d9a88ab676929f3b9b8"}}) The Joe Rogan Experience Experience',
  '({"podcast":{"listennotesId":"b97e9dfe9c924e2ebb25c4112b0215fe"}}) Joe Rogan Experience Review podcast',
  '({"podcast":{"listennotesId":"07a5007596164f1ca7f33fd854e8ca61"}}) Investigate Joe Rogan',
  ...
]
To index a particular podcast, pass its listennotesId in the metadata:
const object = await operand.upsert({
  type: Object.ObjectType.PODCAST,
  metadata: {
    value: {
      case: "podcast",
      value: {
        listennotesId: "21159b4568244d9a88ab676929f3b9b8", // Joe Rogan Experience
      },
    },
  },
});
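The listennotesId used above appears inside the metadata JSON printed by the suggestions query. If you only have that JSON string, a small parser along the following lines could pull the ID out. This is a sketch under the assumption that the JSON has exactly the shape shown in the printed output; listennotesIdFromJson is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: extracts the Listen Notes ID from a metadata JSON
// string shaped like {"podcast":{"listennotesId":"..."}}.
// Returns null for invalid JSON or any other shape.
function listennotesIdFromJson(metadataJson: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(metadataJson);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const podcast = (parsed as { podcast?: unknown }).podcast;
  if (typeof podcast !== "object" || podcast === null) return null;
  const id = (podcast as { listennotesId?: unknown }).listennotesId;
  return typeof id === "string" ? id : null;
}

console.log(
  listennotesIdFromJson('{"podcast":{"listennotesId":"21159b4568244d9a88ab676929f3b9b8"}}')
);
```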
PDF
We can automatically index the text content of PDF files.
const object = await operand.upsert({
  type: Object.ObjectType.PDF,
  metadata: {
    value: {
      case: "pdf",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "https://www.cs.virginia.edu/~robins/YouAndYourResearch.pdf",
        },
      },
    },
  },
});
EPUB
The same goes for EPUB files, i.e. ebooks.
const object = await operand.upsert({
  type: Object.ObjectType.EPUB,
  metadata: {
    value: {
      case: "epub",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "your epub url",
        },
      },
    },
  },
});
Conclusion
Reached the end and it seems like we're missing something? Let us know by sending us an email, and we'd be happy to add any additional object types you may need.