Agentic Map-Reduce Examples
Two worked examples. Both supply only a task: the planner derives the map and reduce
steps, and tool use (the Python REPL) is handled transparently. Runnable versions live
in examples/agentic_map_reduce/.
Example 1: Totaling expense reports
A batch of expense reports, each with line items. The pipeline computes each report’s total in parallel (the map), then reduces those totals to a grand total and the highest-spending category. Because the map agents and the reducer have the REPL, the arithmetic is computed, not estimated.
import lotus
from lotus.models import LM
from lotus.tools import PythonREPLTool

lotus.settings.configure(lm=LM(model="gpt-4o-mini"))

reports = [
    "Q1 travel: flights 420.50, hotel 610.00, meals 133.25.",
    "Q1 software: licenses 1200.00, cloud 348.75, monitoring 99.00.",
    "Q1 office: desks 890.00, chairs 445.50, supplies 76.20.",
    "Q1 marketing: ads 2300.00, design 500.00, swag 212.40.",
]
corpus = lotus.Corpus.from_documents(reports)

result = corpus.agentic_map_reduce(
    task=(
        "Each document is an expense report with line items. Compute the exact total "
        "for the report and report its category and total. Then produce one overall "
        "summary with the grand total and the highest-spending category."
    ),
    tools=[PythonREPLTool()],
)
print(result.output)
What happens:
Plan: the planner turns the task into a per-report map instruction (“compute the total for this report”) and a reduce instruction (“combine into a grand total and top category”).
Map: four agents run in parallel, one per report, each using the REPL to sum its line items. The per-report totals are 1163.75, 1647.75, 1411.70, and 3012.40.
Reduce: the reducer sums the per-report totals with the REPL.
Output (abridged):
The grand total is $7,235.60. The highest-spending category is "Q1 marketing"
($3,012.40).
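The totals quoted above can be checked without a model. The following is a minimal plain-Python sketch of the same fan-out and combine shape, not LOTUS code: `map_report` and `reduce_totals` are hypothetical helper names, and a thread pool plus a regular expression stand in for the agents and the REPL.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from decimal import Decimal

reports = [
    "Q1 travel: flights 420.50, hotel 610.00, meals 133.25.",
    "Q1 software: licenses 1200.00, cloud 348.75, monitoring 99.00.",
    "Q1 office: desks 890.00, chairs 445.50, supplies 76.20.",
    "Q1 marketing: ads 2300.00, design 500.00, swag 212.40.",
]

# Amounts in these reports always carry two decimal places.
AMOUNT = re.compile(r"\d+\.\d{2}")


def map_report(report):
    """Map step (hypothetical helper): return (category, exact total) for one report."""
    category, _, items = report.partition(":")
    total = sum((Decimal(a) for a in AMOUNT.findall(items)), Decimal("0"))
    return category, total


def reduce_totals(findings):
    """Reduce step (hypothetical helper): grand total and highest-spending category."""
    grand_total = sum((total for _, total in findings), Decimal("0"))
    top_category, top_total = max(findings, key=lambda f: f[1])
    return grand_total, top_category, top_total


# Fan out one task per report, then combine the per-report results.
with ThreadPoolExecutor() as pool:
    findings = list(pool.map(map_report, reports))

grand_total, top_category, top_total = reduce_totals(findings)
print([str(total) for _, total in findings])
print(f"grand total {grand_total}, top category {top_category} ({top_total})")
```

`Decimal` is used instead of `float` so that cents add up exactly. The script reproduces the four per-report totals, the grand total of 7235.60, and Q1 marketing as the top category.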
You can inspect the intermediate results:
result.findings # ['... 1163.75', '... 1647.75', '... 1411.70', '... 3012.40']
result.plan # map_instruction / reduce_instruction / shard_size / parallelism
result.usage # token totals
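Because the findings are plain strings, one way to audit the reducer is to pull the trailing amount out of each finding and re-add the amounts. This sketch assumes each finding ends with its total, as the comment on result.findings suggests. The sample strings and the `last_amount` helper are hypothetical, and real model output may need a different pattern.

```python
import re
from decimal import Decimal

# Hypothetical stand-ins for result.findings; real findings are free-form model text.
findings = [
    "Q1 travel total: 1163.75",
    "Q1 software total: 1647.75",
    "Q1 office total: 1411.70",
    "Q1 marketing total: 3,012.40",
]


def last_amount(text):
    """Return the last two-decimal amount in text as a Decimal, or None if absent."""
    matches = re.findall(r"\d[\d,]*\.\d{2}", text)
    if not matches:
        return None
    return Decimal(matches[-1].replace(",", ""))


totals = [last_amount(f) for f in findings]
recomputed = sum((t for t in totals if t is not None), Decimal("0"))
print(recomputed)  # compare against the grand total in result.output
```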
Example 2: Sweeping a codebase
Load source files as a corpus (one file per unit), analyze each in parallel, and reduce the per-file analyses into a single architecture overview — a fan-out-then-synthesize pattern over a codebase.
import lotus
from lotus.models import LM
from lotus.tools import PythonREPLTool

lotus.settings.configure(lm=LM(model="gpt-4o-mini"))

corpus = lotus.Corpus.from_files("lotus/agentic/*.py")
print(f"Loaded {len(corpus)} files")

result = corpus.agentic_map_reduce(
    task=(
        "You are analyzing a Python codebase. For each file, summarize its purpose "
        "and list the key functions/classes it defines, each with a one-line "
        "description. Then produce a single architecture overview explaining how the "
        "files fit together."
    ),
    tools=[PythonREPLTool()],
)

for path, finding in zip([u.id for u in corpus.units], result.findings):
    print(f"\n--- {path} ---\n{finding}")

print("\n=== ARCHITECTURE OVERVIEW ===")
print(result.output)
What happens:
Shard: each file becomes its own shard, so agents analyze files independently with focused context.
Map: one agent per file summarizes that file's purpose and key definitions.
Reduce: the reducer synthesizes the per-file summaries into one architecture overview describing how the pieces fit together.
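For intuition about the shard, map, and reduce steps, here is a deterministic stand-in that uses the standard-library `ast` module in place of agents. It is an illustrative sketch only and says nothing about how LOTUS implements the sweep. `outline_source`, `outline_file`, and `sweep` are hypothetical names.

```python
import ast
import glob
from concurrent.futures import ThreadPoolExecutor


def outline_source(source):
    """List the top-level functions and classes defined in Python source text."""
    names = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.append(f"def {node.name}")
        elif isinstance(node, ast.ClassDef):
            names.append(f"class {node.name}")
    return names


def outline_file(path):
    """Map step: one file is one shard, outlined independently of the others."""
    with open(path, encoding="utf-8") as handle:
        return path, outline_source(handle.read())


def sweep(pattern):
    """Fan out over the files matching pattern, then gather into one mapping."""
    paths = sorted(glob.glob(pattern))
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(outline_file, paths))


if __name__ == "__main__":
    # Prints nothing if the glob matches no files.
    for path, names in sweep("lotus/agentic/*.py").items():
        print(path, names)
```

An agent adds what this sketch cannot: a prose summary of each file's purpose and a synthesized account of how the files relate.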
Pass a different glob to sweep another codebase:
python examples/agentic_map_reduce/codebase_sweep.py "lotus/sem_ops/*.py"
Running the examples
Set OPENAI_API_KEY and run from the repository root:
python examples/agentic_map_reduce/expense_reports.py
python examples/agentic_map_reduce/codebase_sweep.py
See Agentic Map-Reduce for the full API and the Corpus loaders.