sem_agg
Overview
This operator performs an aggregation over the input relation, with a langex signature that provides a commutative and associative aggregation function
Motivation
Semantic aggregations are useful for tasks, such as summarization and reasoning across multiple rows of the dataset.
Examples
import pandas as pd
import lotus
from lotus.models import LM
lm = LM(model="gpt-4o-mini")
lotus.settings.configure(lm=lm)
data = {
"ArticleTitle": [
"Advancements in Quantum Computing",
"Climate Change and Renewable Energy",
"The Rise of Artificial Intelligence",
"A Journey into Deep Space Exploration"
],
"ArticleContent": [
"""Quantum computing harnesses the properties of quantum mechanics
to perform computations at speeds unimaginable with classical machines.
As research and development progress, emerging quantum algorithms show
great promise in solving previously intractable problems.""",
"""Global temperatures continue to rise, and societies worldwide
are turning to renewable resources like solar and wind power to mitigate
climate change. The shift to green technology is expected to reshape
economies and significantly reduce carbon footprints.""",
"""Artificial Intelligence (AI) has grown rapidly, integrating
into various industries. Machine learning models now enable systems to
learn from massive datasets, improving efficiency and uncovering hidden
patterns. However, ethical concerns about privacy and bias must be addressed.""",
"""Deep space exploration aims to understand the cosmos beyond
our solar system. Recent missions focus on distant exoplanets, black holes,
and interstellar objects. Advancements in propulsion and life support systems
may one day enable human travel to far-off celestial bodies."""
]
}
df = pd.DataFrame(data)
df = df.sem_agg("Provide a concise summary of all {ArticleContent} in a single paragraph, highlighting the key technological progress and its implications for the future.")
print(df._output[0])
Output:
"Recent technological advancements are reshaping various fields and have significant implications for the future.
Quantum computing is emerging as a powerful tool capable of solving complex problems at unprecedented speeds, while the
global shift towards renewable energy sources like solar and wind power aims to combat climate change and transform economies.
In the realm of Artificial Intelligence, rapid growth and integration into industries are enhancing efficiency and revealing
hidden data patterns, though ethical concerns regarding privacy and bias persist. Additionally, deep space exploration is
advancing with missions targeting exoplanets and black holes, potentially paving the way for human travel beyond our solar
system through improved propulsion and life support technologies."
Example with group-by
import pandas as pd
import lotus
from lotus.models import LM
lm = LM(model="gpt-4o-mini")
lotus.settings.configure(lm=lm)
# Example DataFrame
data = {
"Category": ["Tech", "Env", "Tech", "Env"],
"ArticleContent": [
"Quantum computing shows promise in solving complex problems.",
"Renewable energy helps mitigate climate change.",
"AI improves efficiency but raises ethical concerns.",
"New holes in the ozone layer have been found."
]
}
df = pd.DataFrame(data)
# Perform semantic aggregation with groupby
df = df.sem_agg(
"Summarize the {ArticleContent} for each {Category}.",
group_by=["Category"]
)
print(df._output)
Output:
0 The "Env" category features two key points: re...
0 In the Tech category, two key developments are...
Required Parameters
user_instructions : Prompt to pass into LM
Optional Parameters
all_cols : Whether to use all columns in the dataframe.
suffix : The suffix for the new column
group_by : The columns to group by before aggregation. Each group will be aggregated separately.