LOTUS

Getting Started

  • Installation
  • Core Concepts
  • Examples

Semantic Operators

  • sem_map
  • sem_extract
  • sem_filter
  • sem_agg
  • sem_topk
  • sem_join
  • sem_search
  • sem_sim_join
  • sem_cluster_by

Utility Operators

  • sem_partition_by
  • sem_index
  • sem_dedup
  • web_search
    • Overview
    • Arxiv Example
    • Google Example
    • You.com Example
    • Bing Example
    • Tavily Example
    • Required Parameters
    • Optional Parameters

Models

  • LLM
  • Retrieval Models
  • ReRanker Models
  • Multimodal Models
  • Vector Stores
  • Tracking LM Usage

Advanced Usage

  • Optimized Processing with Approximations
  • Prompt Strategies
  • Setting Configurations
  • Reasoning Models
  • LLM-based Evaluation Suite
  • LLM-as-Judge
  • Pairwise Judge
  • Advanced Features
  • API Reference
  • Best Practices

Data Loading and DB Connectors

  • Database Connectors
  • File Loading with DirectoryReader
LOTUS
  • web_search
  • View page source

web_search

Overview

The web_search function allows you to load documents from the web, then process that data with LOTUS.

Different search engines are supported, including Google, Google Scholar, Arxiv, You.com, Bing and Tavily.

Arxiv Example

To get started, you will need to install the lotus submodule as follows: .. code-block:: shell

pip install lotus[arxiv]

Then you can run your lotus program:

import lotus
from lotus import WebSearchCorpus, web_search
from lotus.models import LM

lm = LM(model="gpt-4o-mini")

lotus.settings.configure(lm=lm)

df = web_search(WebSearchCorpus.ARXIV, "deep learning", 5)[["title", "abstract"]]
print(f"Results from Arxiv\n{df}\n\n")

most_interesting_articles = df.sem_topk("Which {abstract} is most exciting?", K=1)
print(f"Most interesting article: \n{most_interesting_articles.iloc[0]}")

Google Example

Before running the following example, you need to set the SERPAPI_API_KEY environment variable. You will also need to install the lotus submodule as follows: .. code-block:: shell

pip install lotus[serpapi]

Then you can run your lotus program:

import lotus
from lotus import WebSearchCorpus, web_search
from lotus.models import LM

lm = LM(model="gpt-4o-mini")

lotus.settings.configure(lm=lm)

df = web_search(WebSearchCorpus.GOOGLE, "deep learning research", 5)[["title", "snippet"]]
print(f"Results from Google\n{df}")
most_interesting_articles = df.sem_topk("Which {snippet} is the most exciting?", K=1)
print(f"Most interesting articles\n{most_interesting_articles}")

You.com Example

Before running the following example, you need to set the YOU_API_KEY environment variable. You will also need to install the lotus submodule as follows: .. code-block:: shell

pip install lotus[you]

Then you can run your lotus program:

import lotus
from lotus import WebSearchCorpus, web_search
from lotus.models import LM

lm = LM(model="gpt-4o-mini")

lotus.settings.configure(lm=lm)

df = web_search(WebSearchCorpus.YOU, "latest AI breakthroughs", 10)[["title", "snippet"]]
print(f"Results from You.com:\n{df}\n")
top_you_articles = df.sem_topk("Which {snippet} is the most groundbreaking?", K=3)
print(f"Top 3 most interesting articles from You.com:\n{top_you_articles}")

Bing Example

Before running the following example, you need to set the BING_API_KEY environment variable. You will also need to install the lotus submodule as follows: .. code-block:: shell

pip install lotus[bing]

Then you can run your lotus program:

import lotus
from lotus import WebSearchCorpus, web_search
from lotus.models import LM

lm = LM(model="gpt-4o-mini")

lotus.settings.configure(lm=lm)

df = web_search(WebSearchCorpus.BING, "state-of-the-art AI models", 10)[["title", "snippet"]]
print(f"Results from Bing:\n{df}\n")
top_bing_articles = df.sem_topk("Which {snippet} provides the best insight into AI models?", K=3)
print(f"Top 3 most insightful articles from Bing:\n{top_bing_articles}")

Tavily Example

Before running the following example, you need to set the TAVILY_API_KEY environment variable. You will also need to install the lotus submodule as follows: .. code-block:: shell

pip install lotus[tavily]

Then you can run your lotus program:

import lotus
from lotus import WebSearchCorpus, web_search
from lotus.models import LM

lm = LM(model="gpt-4o-mini")

lotus.settings.configure(lm=lm)

df = web_search(WebSearchCorpus.TAVILY, "AI ethics in 2025", 10)[["title", "summary"]]
print(f"Results from Tavily:\n{df}\n")
top_tavily_articles = df.sem_topk("Which {summary} best explains ethical concerns in AI?", K=3)
print(f"Top 3 articles from Tavily on AI ethics:\n{top_tavily_articles}")

Required Parameters

  • corpus : The search corpus to use. Available options: - WebSearchCorpus.ARXIV: Search academic papers on arxiv.org - WebSearchCorpus.GOOGLE: Search the web using Google Search - WebSearchCorpus.GOOGLE_SCHOLAR: Search academic papers using Google Scholar - WebSearchCorpus.YOU: Search the web using You.com - WebSearchCorpus.BING: Search the web using Bing - WebSearchCorpus.TAVILY: Search the web using Tavily

  • query : The query to search for

  • K : The number of results to return

Optional Parameters

  • cols : The columns to take from the API search results. Default values should be sufficient for most use cases. To see available columns, enable logging:

    import logging
    logging.basicConfig(level=logging.INFO)
    
Previous Next

© Copyright 2024, Liana Patel, Siddharth Jha, Carlos Guestrin, Matei Zaharia.

Built with Sphinx using a theme provided by Read the Docs.