Agentic Map-Reduce Examples

Two worked examples. Both use only a task — the planner derives the map and reduce steps, and tool usage (the Python REPL) is handled transparently. Runnable versions live in examples/agentic_map_reduce/.

Example 1: Totaling expense reports

A batch of expense reports, each with line items. The pipeline computes each report’s total in parallel (the map), then reduces to a grand total and highest-spending category. Because the reducer has the REPL, the arithmetic is computed, not estimated.

import lotus
from lotus.models import LM
from lotus.tools import PythonREPLTool

lotus.settings.configure(lm=LM(model="gpt-4o-mini"))

reports = [
    "Q1 travel: flights 420.50, hotel 610.00, meals 133.25.",
    "Q1 software: licenses 1200.00, cloud 348.75, monitoring 99.00.",
    "Q1 office: desks 890.00, chairs 445.50, supplies 76.20.",
    "Q1 marketing: ads 2300.00, design 500.00, swag 212.40.",
]
corpus = lotus.Corpus.from_documents(reports)

result = corpus.agentic_map_reduce(
    task=(
        "Each document is an expense report with line items. Compute the exact total "
        "for the report and report its category and total. Then produce one overall "
        "summary with the grand total and the highest-spending category."
    ),
    tools=[PythonREPLTool()],
)

print(result.output)

What happens:

  • Plan — the planner turns the task into a per-report map instruction (“compute the total for this report”) and a reduce instruction (“combine into a grand total and top category”).

  • Map — four agents run in parallel, one per report, each using the REPL to sum its line items: 1163.75, 1647.75, 1411.70, 3012.40.

  • Reduce — the reducer sums the per-report totals with the REPL.

Output (abridged):

The grand total is $7,235.60. The highest-spending category is "Q1 marketing"
($3,012.40).

You can inspect the intermediate results:

result.findings   # ['... 1163.75', '... 1647.75', '... 1411.70', '... 3012.40']
result.plan       # map_instruction / reduce_instruction / shard_size / parallelism
result.usage      # token totals

Example 2: Sweeping a codebase

Load source files as a corpus (one file per unit), analyze each in parallel, and reduce the per-file analyses into a single architecture overview — a fan-out-then-synthesize pattern over a codebase.

import lotus
from lotus.models import LM
from lotus.tools import PythonREPLTool

lotus.settings.configure(lm=LM(model="gpt-4o-mini"))

corpus = lotus.Corpus.from_files("lotus/agentic/*.py")
print(f"Loaded {len(corpus)} files")

result = corpus.agentic_map_reduce(
    task=(
        "You are analyzing a Python codebase. For each file, summarize its purpose "
        "and list the key functions/classes it defines, each with a one-line "
        "description. Then produce a single architecture overview explaining how the "
        "files fit together."
    ),
    tools=[PythonREPLTool()],
)

for path, finding in zip([u.id for u in corpus.units], result.findings):
    print(f"\n--- {path} ---\n{finding}")

print("\n=== ARCHITECTURE OVERVIEW ===")
print(result.output)

What happens:

  • Shard — each file becomes its own shard, so agents analyze files independently with focused context.

  • Map — one agent per file summarizes its purpose and key definitions.

  • Reduce — the reducer synthesizes the per-file summaries into one architecture overview describing how the pieces fit together.

Pass a different glob to sweep another codebase:

python examples/agentic_map_reduce/codebase_sweep.py "lotus/sem_ops/*.py"

Running the examples

Set OPENAI_API_KEY and run from the repository root:

python examples/agentic_map_reduce/expense_reports.py
python examples/agentic_map_reduce/codebase_sweep.py

See Agentic Map-Reduce for the full API and the Corpus loaders.