sem_topk
Overview
LOTUS supports a semantic top-k, which takes the langex ranking criteria. Programmers can optionally specify a group-by parameter to indicate a subset of columns to group over during ranking. The groupings are defined using standard equality matches over the group-by columns
Motivation
This operator is useful for re-ordering records based on complex, arbitrary natural language comparators.
Example
import pandas as pd
import lotus
from lotus.models import LM
lm = LM(model="gpt-4o-mini")
lotus.settings.configure(lm=lm)
data = {
"Course Name": [
"Probability and Random Processes",
"Optimization Methods in Engineering",
"Digital Design and Integrated Circuits",
"Computer Security",
]
}
df = pd.DataFrame(data)
for method in ["quick", "heap", "naive"]:
sorted_df, stats = df.sem_topk(
"Which {Course Name} requires the least math?",
K=2,
method=method,
return_stats=True,
)
print(sorted_df)
print(stats)
Output:
Course Name |
|
0 |
Computer Security |
1 |
Digital Design and Integrated Circuits |
Required Parameters
user_instruction : The user instruction for sorting.
K: The number of rows to return.
Optional Paramaters
method : The method to use for sorting. Options are “quick”, “heap”, “naive”, “quick-sem”.
group_by : The columns to group by before sorting. Each group will be sorted separately.
cascade_threshold: The confidence threshold for cascading to a larger model.
return_stats : Whether to return stats.