
Object Types

All of the "things" you can index with Operand, such as an HTML webpage, a PDF, an EPUB, or even a podcast.


Operand supports a number of different "object types", i.e. things you can index.

This article outlines each of the different types and shows how to use them.

If you haven't already, follow our getting started guide to get set up!

Text

The most basic type of object you can index is a plain text document.

const object = await operand.upsert({
  type: Object.ObjectType.TEXT,
  metadata: {
    value: {
      case: "text",
      value: {
        text: "This is some cool text.",
      },
    },
  },
});

HTML

You can index raw HTML content, such as the content of a webpage.

const object = await operand.upsert({
  type: Object.ObjectType.HTML,
  metadata: {
    value: {
      case: "html",
      value: {
        html: "<p>Hello World</p>",
      },
    },
  },
});

If you want to index a webpage, you can set the _url property and we'll fetch the content for you. In this case, you don't need to provide the html directly.

const object = await operand.upsert({
  type: Object.ObjectType.HTML,
  metadata: {
    value: {
      case: "html",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "https://en.wikipedia.org/wiki/Virus",
        },
      },
    },
  },
});
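
The same _url property shape can be built by a small helper instead of being written out by hand each time. The following is a minimal sketch: urlProperties and the UrlProperties type are hypothetical names of our own, not part of the Operand SDK, and the helper only builds the same plain object shown above.

```typescript
// Hypothetical helper (not part of the Operand SDK): builds the `properties`
// payload from the example above so the `_url` boilerplate is written once.
type UrlProperties = {
  properties: {
    _url: {
      indexed: boolean;
      value: { case: "text"; value: string };
    };
  };
};

function urlProperties(url: string): UrlProperties {
  return {
    properties: {
      _url: {
        indexed: false, // same setting as in the example above
        value: { case: "text", value: url },
      },
    },
  };
}
```

Under these assumptions, the upsert above could pass properties: urlProperties("https://en.wikipedia.org/wiki/Virus") in place of the literal object.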

Websites

By parsing the sitemap.xml file that a website publishes, we can automatically index the content of the website. We'll also automatically keep the content up to date as the website changes.

const object = await operand.upsert({
  type: Object.ObjectType.SITEMAP,
  metadata: {
    value: {
      case: "sitemap",
      value: {
        // String.raw preserves the backslash so that "\." matches a literal dot.
        urlRegex: String.raw`https://jzhao\.xyz/thoughts/.+`,
      },
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "https://jzhao.xyz/sitemap.xml",
        },
      },
    },
  },
});

All of the URLs listed in the sitemap (which is supplied via the _url property) will be indexed. If urlRegex is specified, only URLs matching the regex will be indexed. We poll the sitemap every 15 minutes to check for new or updated content.
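
To get a feel for which pages a given urlRegex would select, you can try the pattern locally before creating the object. The sketch below is only an approximation: filterSitemapUrls is a hypothetical helper, the URL list is made up for illustration, and it assumes a URL counts as a match when a JavaScript RegExp matches anywhere in it, which may differ from how the pattern is evaluated on our side.

```typescript
// Hypothetical local check (not an Operand API): which sitemap URLs would a
// pattern select? Assumes unanchored JavaScript RegExp semantics.
function filterSitemapUrls(urls: string[], urlRegex: string): string[] {
  const pattern = new RegExp(urlRegex);
  return urls.filter((url) => pattern.test(url));
}

// Made-up URLs for illustration.
const sitemapUrls = [
  "https://jzhao.xyz/",
  "https://jzhao.xyz/thoughts/",
  "https://jzhao.xyz/thoughts/some-post",
  "https://jzhao.xyz/posts/another-post",
];

// String.raw keeps the backslash, so "\." matches a literal dot.
const selected = filterSitemapUrls(
  sitemapUrls,
  String.raw`https://jzhao\.xyz/thoughts/.+`
);
// selected contains only "https://jzhao.xyz/thoughts/some-post"
```

Note that the bare "https://jzhao.xyz/thoughts/" URL is not selected, because the trailing .+ requires at least one character after the final slash.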

Collections

Collections are a special type of object that lets you group other objects together.

You can think of a collection like a folder inside the index. Collections can be nested.

To create a collection:

const collection = await operand.upsert({
  type: Object.ObjectType.COLLECTION,
  metadata: {
    value: {
      case: "collection",
      value: {},
    },
  },
});

Any object can be indexed inside a collection by adding the parentId parameter.

For example, to index a text document inside a collection:

const object = await operand.upsert({
  parentId: collection.object.id, // Put this object inside the collection.
  type: Object.ObjectType.TEXT,
  metadata: {
    value: {
      case: "text",
      value: {
        text: "I'm inside the collection!",
      },
    },
  },
});

RSS

Index an entire RSS feed. We'll automatically poll the feed and index new content every 15 minutes.

const object = await operand.upsert({
  type: Object.ObjectType.RSS,
  metadata: {
    value: {
      case: "rss",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "http://www.aaronsw.com/2002/feeds/pgessays.rss",
        },
      },
    },
  },
});

Audio

We can auto-transcribe audio files in the following formats:

  • mp3
  • ogg
  • flac
  • wav

You can pass the file extension and the raw bytes directly in the metadata. However, we recommend using the _url property so that we can fetch the content and detect the file type for you.

const object = await operand.upsert({
  type: Object.ObjectType.AUDIO,
  metadata: {
    value: {
      case: "audio",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value:
            "https://upload.wikimedia.org/wikipedia/commons/f/f7/David_Nutt_in_The_Life_Scientific_b01mqp1c.flac",
        },
      },
    },
  },
});
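
If the URLs come from user input, it can help to check the extension against the formats listed above before upserting. This is a minimal sketch: isSupportedAudioUrl and SUPPORTED_AUDIO_EXTENSIONS are hypothetical names, not part of the Operand SDK, and the check only looks at the URL path, whereas the file type is detected from the fetched content when you use _url.

```typescript
// Formats listed above as supported for auto-transcription.
const SUPPORTED_AUDIO_EXTENSIONS = ["mp3", "ogg", "flac", "wav"];

// Hypothetical pre-flight check (not part of the Operand SDK).
function isSupportedAudioUrl(url: string): boolean {
  // Drop any query string or fragment, then look at the final path segment.
  const path = url.split(/[?#]/)[0] ?? "";
  const lastSegment = path.split("/").pop() ?? "";
  const dot = lastSegment.lastIndexOf(".");
  if (dot === -1) return false; // no extension at all
  const extension = lastSegment.slice(dot + 1).toLowerCase();
  return SUPPORTED_AUDIO_EXTENSIONS.indexOf(extension) !== -1;
}
```

A URL that fails this check may still point at valid audio (for example, when the server omits the extension), so treat the result as a hint rather than a guarantee.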

Podcasts

You can use Operand to index entire podcasts in fewer than 20 lines of code. We'll automatically index the entire back catalog of episodes, as well as any new episodes as they appear. As part of this, we'll also auto-transcribe the audio for all episodes.

Under the hood, we use the Listen Notes API to fetch metadata about podcasts.

To find podcasts that you can index, you can query our API:

const suggestions = await operand.suggestions({
  query: "joe rogan",
  type: Object.ObjectType.PODCAST,
});
console.log(
  suggestions.upserts.map(
    (s) => `(${s.upsert.metadata.toJsonString()}) ${s.description}`
  )
);

This will print:

[
  '({"podcast":{"listennotesId":"21159b4568244d9a88ab676929f3b9b8"}}) The Joe Rogan Experience Experience',
  '({"podcast":{"listennotesId":"b97e9dfe9c924e2ebb25c4112b0215fe"}}) Joe Rogan Experience Review podcast',
  '({"podcast":{"listennotesId":"07a5007596164f1ca7f33fd854e8ca61"}}) Investigate Joe Rogan',
  ...
]
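
Each entry in the output above starts with a JSON string that contains a listennotesId. As a minimal sketch, the ID can be pulled out of that string with JSON.parse; extractListennotesId is a hypothetical helper, not part of the Operand SDK, and it assumes the JSON has exactly the shape printed above.

```typescript
// Hypothetical helper (not part of the Operand SDK): reads the listennotesId
// out of a metadata JSON string shaped like the output above.
// Returns undefined if the expected fields are missing.
function extractListennotesId(metadataJson: string): string | undefined {
  const parsed: unknown = JSON.parse(metadataJson);
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const podcast = (parsed as { podcast?: unknown }).podcast;
  if (typeof podcast !== "object" || podcast === null) return undefined;
  const id = (podcast as { listennotesId?: unknown }).listennotesId;
  return typeof id === "string" ? id : undefined;
}

const firstId = extractListennotesId(
  '{"podcast":{"listennotesId":"21159b4568244d9a88ab676929f3b9b8"}}'
);
// firstId is "21159b4568244d9a88ab676929f3b9b8"
```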

To index a particular podcast, pass its listennotesId in the metadata:

const object = await operand.upsert({
  type: Object.ObjectType.PODCAST,
  metadata: {
    value: {
      case: "podcast",
      value: {
        listennotesId: "21159b4568244d9a88ab676929f3b9b8", // Joe Rogan Experience
      },
    },
  },
});

PDF

We can automatically index the text content of PDF files.

const object = await operand.upsert({
  type: Object.ObjectType.PDF,
  metadata: {
    value: {
      case: "pdf",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "https://www.cs.virginia.edu/~robins/YouAndYourResearch.pdf",
        },
      },
    },
  },
});

EPUB

The same goes for EPUB files, i.e. ebooks: we can automatically index their text content.

const object = await operand.upsert({
  type: Object.ObjectType.EPUB,
  metadata: {
    value: {
      case: "epub",
      value: {},
    },
  },
  properties: {
    properties: {
      _url: {
        indexed: false,
        value: {
          case: "text",
          value: "your epub url",
        },
      },
    },
  },
});

Conclusion

Reached the end and it seems like we're missing something? Let us know by sending us an email, and we'd be happy to add any additional object types you may need.