Core Concepts
LOTUS’ implements the semantic operator programming model. Semantic operators are declarative transformations over one or more datasets, parameterized by a natural langauge expression (langex) that can be implemnted by a variety of AI-based algorithms. Semantic operators seamlessly extend the relational model, operating over datasets that may contain traditional structured data as well as unstructured fields, such as free-form text or images. Because semantic operators are composable, modular and declarative, they allow you to write AI-based piplines with intuitive, high-level logic, leaving the rest of the work to the query engine! Each operator can be implmented and optimized in multiple ways, opening a rich space for execution plans, similar to relational operators. Here is a quick example of semantic operators in action:
langex = "The {abstract} suggests that LLMs efficeintly utilize long context"
filtered_df = papers_df.sem_filter(langex)
With LOTUS, applications can be built by chaining togethor different semantic operators. Much like relational operators, semantic operators represent transformations over the dataset, and can be implemented and optimized under the hood. Each semantic operator is parameterized by a natural language expression. Here are some key semantic operators:
Operator |
Description |
|---|---|
sem_map |
Map each record using a natural language projection |
sem_extract |
Extract one or more attributes from each row |
sem_filter |
Keep records that match the natural language predicate |
sem_agg |
Aggregate across all records (e.g. for summarization) |
sem_topk |
Order records by the natural langauge ranking criteria |
sem_join |
Join two datasets based on a natural language predicate |
sem_sim_join |
Join two DataFrames based on semantic similarity |
sem_search |
Perform semantic search the over a text column |