What, exactly, is Sibyl?

The Problem

The software abstractions (e.g., NumPy) that power traditional DataFrame operations fail when applied to unstructured data. A filter over a structured column (e.g., df[df["age"] > 15]) can be implemented with one line of code, but no simple abstraction exists that could implement a semantic filter over unstructured data (e.g., df[df["image"].contains("person")]).
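To make the contrast concrete, here is a minimal sketch in plain pandas. It is not Sibyl's API: the contains helper and the FAKE_LABELS lookup are hypothetical stand-ins for a real vision model, and the file names are made up. The point is only that the structured filter is a single comparison, while the semantic filter needs a model call for every row.

```python
import pandas as pd

# Toy table: one structured column and one column of image references.
# File names and labels are invented for illustration.
df = pd.DataFrame({
    "age": [12, 18, 40],
    "image": ["beach.jpg", "portrait.jpg", "street.jpg"],
})

# Structured filter: a single vectorized comparison.
adults = df[df["age"] > 15]

# Hypothetical stand-in for a vision model. A real system would run
# inference on the image bytes; here a lookup table fakes the labels.
FAKE_LABELS = {
    "beach.jpg": {"sand", "sea"},
    "portrait.jpg": {"person"},
    "street.jpg": {"person", "car"},
}

def contains(image_ref: str, concept: str) -> bool:
    """Return True if the (fake) model reports `concept` in the image."""
    return concept in FAKE_LABELS.get(image_ref, set())

# Semantic filter: the predicate is a model call per row, not a comparison.
with_person = df[df["image"].map(lambda ref: contains(ref, "person"))]
```

In practice the per-row model call is where the cost and complexity live (batching, caching, hardware), which is why a one-line abstraction is hard to provide with array libraries alone.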

The Solution

So we’re building Sibyl, a Pandas-like DataFrame for multimodal data that seamlessly stores rich objects (e.g., videos, images, web pages, audio) and their tensor representations (e.g., embeddings, logits) alongside structured data (e.g., strings, numbers, dates). Enterprises keep using their own computing resources and use our interface to apply foundation-model-backed transforms to the unstructured data in their data lakes in real time. Under the hood, we’re backed by a delta lake (enabling zero-copy operations and eliminating data egress costs), and we store embeddings in a separate vector database, which makes it easy to explore data and build semantic applications. Sibyl makes data engineering and application building simple.
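The split between structured rows and a separate embedding store can be sketched as follows. This is an illustration under stated assumptions, not Sibyl's implementation: rows, vector_store, cosine, and semantic_search are hypothetical names, the in-memory dict stands in for a real vector database, and the three-dimensional embeddings are invented.

```python
import math

# Structured rows hold a key into a separate embedding store, mirroring
# the table / vector database split. All values are made up.
rows = [
    {"id": "a", "caption": "dog on a beach", "year": 2021},
    {"id": "b", "caption": "city skyline", "year": 2019},
    {"id": "c", "caption": "puppy in a park", "year": 2022},
]

# Hypothetical in-memory stand-in for a vector database: id -> embedding.
vector_store = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.0, 0.2, 0.9],
    "c": [0.8, 0.3, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def semantic_search(query_vec, k=2, min_year=None):
    """Rank rows by embedding similarity, optionally filtering on a structured column first."""
    candidates = [r for r in rows if min_year is None or r["year"] >= min_year]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(query_vec, vector_store[r["id"]]),
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]

query = [1.0, 0.2, 0.0]  # pretend embedding of the query "dog"
top = semantic_search(query)                    # similarity only
recent = semantic_search(query, min_year=2022)  # structured filter + similarity
```

Keeping the embeddings keyed by row id lets a structured predicate and a similarity ranking compose in one query without copying the rich objects themselves.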