sem_map
Overview
This operator performs a semantic mapping over input data using natural language instructions. It applies a user-defined instruction to each row of data, transforming the content based on the specified criteria. The operator supports both DataFrame operations and direct function calls on multimodal data.
Motivation
The sem_map operator is useful for performing row-wise transformations over data using natural language instructions. It enables users to apply complex mappings, transformations, or analyses without writing custom code, making it ideal for tasks like content summarization, sentiment analysis, format conversion, and data enrichment.
Basic Example
import pandas as pd
import lotus
from lotus.models import LM
lotus.settings.configure(lm=LM(model="gpt-4o-mini"))
courses = pd.DataFrame({
    "Course Name": [
        "Probability and Random Processes",
        "Optimization Methods in Engineering",
        "Digital Design and Integrated Circuits",
        "Computer Security",
    ]
})
mapped = courses.sem_map(
    "What is a similar course to {Course Name}? Be concise.",
    suffix="_similar_course",
)
print(mapped)
Output:
| | Course Name | _similar_course |
|---|---|---|
| 0 | Probability and Random Processes | Stochastic Processes |
| 1 | Optimization Methods in Engineering | Convex Optimization |
| 2 | Digital Design and Integrated Circuits | Computer Architecture |
| 3 | Computer Security | Cybersecurity |
Few-Shot Examples
Use examples when you want to show the model the desired style or output
format. The examples DataFrame should include the referenced input columns and
an Answer column.
# `issues` is assumed to be an existing DataFrame with an issue_title column.
examples = pd.DataFrame({
    "issue_title": ["Fix typo in README"],
    "Answer": ["Correct a typo in the README file."],
})
mapped = issues.sem_map(
    "Rewrite {issue_title} as a concise contributor task.",
    examples=examples,
    suffix="_task",
)
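Because the examples DataFrame must carry every referenced column plus Answer, it can help to check its columns before calling the operator. The following is a minimal sketch in plain Python. Both helper names are hypothetical and not part of LOTUS, and the regular expression assumes placeholders are simple `{column}` references without nested braces.

```python
import re

# Hypothetical helper (not a LOTUS function): list the {column} placeholders
# that appear in an instruction string, in order of appearance.
def referenced_columns(instruction: str) -> list[str]:
    return re.findall(r"\{([^{}]+)\}", instruction)

# Hypothetical helper: report which required columns an examples table lacks.
# Required columns are the referenced inputs plus the "Answer" column.
def missing_example_columns(instruction: str, example_columns: list[str]) -> list[str]:
    required = referenced_columns(instruction) + ["Answer"]
    return [col for col in required if col not in example_columns]

instruction = "Rewrite {issue_title} as a concise contributor task."
print(missing_example_columns(instruction, ["issue_title", "Answer"]))
print(missing_example_columns(instruction, ["issue_title"]))
```

With a pandas DataFrame, `list(examples.columns)` would supply the second argument. An empty result means the examples table has everything the instruction needs.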
Reasoning and Explanations
Reasoning strategies ask the model to reason before producing the final answer. Use them when the mapping requires judgment, such as classifying an issue into a category or deciding whether text implies a risk.
from lotus.types import ReasoningStrategy
mapped = issues.sem_map(
    "Classify {issue_title} as docs, frontend, security, or infrastructure.",
    strategy=ReasoningStrategy.ZS_COT,
    return_explanations=True,
    suffix="_category",
)
return_explanations=True adds an explanation_category column (the prefix
explanation followed by the suffix). This is useful while developing prompts,
but it costs extra output tokens and is usually not needed in production
pipelines.
Raw Outputs and Postprocessing
LOTUS normally stores the parsed model output in the suffix column. Set
return_raw_outputs=True when you also want the unparsed text returned by
the model.
mapped = issues.sem_map(
    "Return a priority for {issue_title}: low, medium, or high.",
    return_raw_outputs=True,
    suffix="_priority",
)
Use a custom postprocessor when the model output needs custom parsing.
The postprocessor receives the raw model outputs and returns parsed outputs,
raw outputs, and optional explanations.
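As an illustration of that contract, here is a minimal sketch of a parser for the priority prompt above. It is written in plain Python and does not import LOTUS. The `ParsedOutputs` container and the `parse_priorities` name are hypothetical, and the exact signature and return type that LOTUS expects from a postprocessor are not shown in this page, so check them against the library before wiring a function like this into sem_map.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical container for illustration only; LOTUS defines its own
# return type for postprocessors.
class ParsedOutputs(NamedTuple):
    outputs: list[str]
    raw_outputs: list[str]
    explanations: list[Optional[str]]

# Matches the first standalone priority label in a lowercased string.
PRIORITY_PATTERN = re.compile(r"\b(low|medium|high)\b")

def parse_priorities(raw_outputs: list[str]) -> ParsedOutputs:
    outputs = []
    for raw in raw_outputs:
        match = PRIORITY_PATTERN.search(raw.lower())
        # Fall back to "unknown" when the model did not name a valid label.
        outputs.append(match.group(1) if match else "unknown")
    # No explanations are extracted in this sketch, so each entry is None.
    return ParsedOutputs(outputs, raw_outputs, [None] * len(raw_outputs))

result = parse_priorities(["Priority: High.", " medium\n", "not sure"])
print(result.outputs)
```

Normalizing to a fixed label set in the postprocessor keeps the output column clean even when the model adds punctuation or extra words around the answer.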
Required Parameters
user_instruction: Natural language instruction for the row-wise transformation. Reference columns with {column_name}.
Optional Parameters
system_prompt: Custom system prompt for the LM.
postprocessor: Function that parses raw model outputs.
return_explanations: Add an explanation{suffix} column when reasoning is available.
return_raw_outputs: Add a raw_output{suffix} column with the raw model text.
suffix: Name of the main output column. Defaults to "_map".
examples: Few-shot examples with the referenced columns and an Answer column.
strategy: Optional reasoning strategy, such as ReasoningStrategy.ZS_COT.
safe_mode: Estimate cost before execution.
progress_bar_desc: Progress bar label.
**model_kwargs: Extra keyword arguments passed to the configured LM.
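The naming rules for the output columns can be summarized in a short sketch. The `output_columns` helper below is hypothetical and not part of LOTUS. It only mirrors the naming described in the parameter list, and the order of the returned names is an assumption. It also ignores the condition that explanations are added only when reasoning is available.

```python
# Hypothetical helper (not a LOTUS function): predict the column names that
# a sem_map call would add, based on the naming rules in the parameter list.
def output_columns(
    suffix: str = "_map",
    return_explanations: bool = False,
    return_raw_outputs: bool = False,
) -> list[str]:
    columns = [suffix]  # main output column is named by the suffix itself
    if return_explanations:
        columns.append(f"explanation{suffix}")
    if return_raw_outputs:
        columns.append(f"raw_output{suffix}")
    return columns

print(output_columns())
print(output_columns("_category", return_explanations=True))
print(output_columns("_priority", return_raw_outputs=True))
```

Choosing a distinct suffix per call avoids name collisions when several sem_map calls are chained on the same DataFrame.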