Core Concepts

LOTUS implements the semantic operator programming model. Semantic operators are declarative transformations over one or more datasets, parameterized by a natural language expression (langex) that can be implemented by a variety of AI-based algorithms. Semantic operators seamlessly extend the relational model, operating over datasets that may contain traditional structured data as well as unstructured fields, such as free-form text or images. Because semantic operators are composable, modular, and declarative, they allow you to write AI-based pipelines with intuitive, high-level logic, leaving the rest of the work to the query engine! As with relational operators, each semantic operator can be implemented and optimized in multiple ways, opening a rich space of execution plans. Here is a quick example of semantic operators in action:

langex = "The {abstract} suggests that LLMs efficiently utilize long context"
filtered_df = papers_df.sem_filter(langex)
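
To make the role of the langex concrete, here is a minimal sketch of how a filter parameterized by a langex could be evaluated. It is not LOTUS's implementation: the helpers `render_langex`, `toy_sem_filter`, and `keyword_judge` are hypothetical names introduced for illustration, and a simple keyword check stands in for the language model so that the example runs without one. The sketch assumes that `{abstract}` refers to a column of the DataFrame and is filled in once per row.

```python
import re
from typing import Callable

import pandas as pd


def render_langex(langex: str, row: pd.Series) -> str:
    """Fill each {column} placeholder in the langex with the row's value."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), langex)


def toy_sem_filter(
    df: pd.DataFrame, langex: str, judge: Callable[[str], bool]
) -> pd.DataFrame:
    """Keep the rows whose rendered langex the judge accepts (hypothetical helper)."""
    decisions = [judge(render_langex(langex, row)) for _, row in df.iterrows()]
    mask = pd.Series(decisions, index=df.index, dtype=bool)
    return df[mask]


def keyword_judge(statement: str) -> bool:
    """Stand-in for a model call (assumption: a real system would query an LLM).

    The langex itself mentions "long context" once, so a second mention
    must come from the row's abstract.
    """
    return statement.lower().count("long context") > 1


papers_df = pd.DataFrame(
    {
        "title": ["Paper A", "Paper B"],
        "abstract": [
            "Our model makes efficient use of long context windows.",
            "We propose a new image segmentation method.",
        ],
    }
)

langex = "The {abstract} suggests that LLMs efficiently utilize long context"
filtered_df = toy_sem_filter(papers_df, langex, keyword_judge)
print(filtered_df["title"].tolist())
```

Because the judge is passed in as a parameter, the same declarative langex could be served by different implementations, which is the property that leaves room for optimization under the hood.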

With LOTUS, applications can be built by chaining together different semantic operators. Much like relational operators, semantic operators represent transformations over the dataset, and can be implemented and optimized under the hood. Each semantic operator is parameterized by a natural language expression. Here are some key semantic operators:

Operator	Description
sem_map	Map each record using a natural language projection
sem_extract	Extract one or more attributes from each row
sem_filter	Keep records that match the natural language predicate
sem_agg	Aggregate across all records (e.g. for summarization)
sem_topk	Order records by the natural language ranking criteria
sem_join	Join two datasets based on a natural language predicate
sem_sim_join	Join two DataFrames based on semantic similarity
sem_search	Perform semantic search over a text column
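
The chaining described above can be sketched in plain pandas. The example below is only an illustration of the composition pattern, under the assumption that each operator takes a DataFrame and returns a DataFrame; `toy_filter` and `toy_topk` are hypothetical stand-ins, not LOTUS operators, and keyword checks replace the model calls that a real filter or ranking would make.

```python
import pandas as pd


def toy_filter(df: pd.DataFrame, predicate) -> pd.DataFrame:
    """Keep rows for which the predicate returns True (stand-in for a semantic filter)."""
    decisions = [bool(predicate(row)) for _, row in df.iterrows()]
    return df[pd.Series(decisions, index=df.index, dtype=bool)]


def toy_topk(df: pd.DataFrame, score, k: int) -> pd.DataFrame:
    """Return the k highest-scoring rows (stand-in for a semantic top-k)."""
    scores = [score(row) for _, row in df.iterrows()]
    ranked = df.assign(_score=scores).sort_values(
        "_score", ascending=False, kind="stable"
    )
    return ranked.head(k).drop(columns="_score")


papers_df = pd.DataFrame(
    {
        "title": ["P1", "P2", "P3", "P4"],
        "abstract": [
            "We evaluate long context models and report evaluation results on retrieval.",
            "A study of image segmentation with evaluation on three datasets.",
            "Long context training recipes for language models.",
            "We evaluate how models use long context.",
        ],
    }
)

# Each step consumes and produces a DataFrame, so the steps compose freely.
result = (
    papers_df
    # Toy predicate: "the abstract is about long context"
    .pipe(toy_filter, lambda row: "long context" in row["abstract"].lower())
    # Toy ranking: "most focused on evaluation", approximated by a keyword count
    .pipe(toy_topk, lambda row: row["abstract"].lower().count("evaluat"), k=2)
)
print(result["title"].tolist())
```

Since every step has the same DataFrame-in, DataFrame-out shape, steps can be reordered or swapped for different implementations without changing the surrounding pipeline.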