web_search
Overview
The web_search function allows you to load documents from the web, then process that data with LOTUS.
Several search engines are supported, including Google, Google Scholar, arXiv, You.com, Bing, and Tavily.
Arxiv Example
To get started, install the lotus submodule as follows:

.. code-block:: shell

    pip install lotus[arxiv]

Then you can run your lotus program:
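.. code-block:: python

    import lotus
    from lotus import WebSearchCorpus, web_search
    from lotus.models import LM

    # Configure the language model that LOTUS uses.
    lm = LM(model="gpt-4o-mini")
    lotus.settings.configure(lm=lm)

    # Fetch 5 arXiv results and keep the title and abstract columns.
    df = web_search(WebSearchCorpus.ARXIV, "deep learning", 5)[["title", "abstract"]]
    print(f"Results from Arxiv\n{df}\n\n")

    # Rank the abstracts against a natural-language criterion; keep the top one.
    most_interesting_articles = df.sem_topk("Which {abstract} is most exciting?", K=1)
    print(f"Most interesting article: \n{most_interesting_articles.iloc[0]}")
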
Google Example
Before running the following example, set the SERPAPI_API_KEY environment variable. You will also need to install the lotus submodule as follows:

.. code-block:: shell

    pip install lotus[serpapi]

Then you can run your lotus program:
.. code-block:: python

    import lotus
    from lotus import WebSearchCorpus, web_search
    from lotus.models import LM

    # Configure the language model that LOTUS uses.
    lm = LM(model="gpt-4o-mini")
    lotus.settings.configure(lm=lm)

    # Fetch 5 Google results and keep the title and snippet columns.
    df = web_search(WebSearchCorpus.GOOGLE, "deep learning research", 5)[["title", "snippet"]]
    print(f"Results from Google\n{df}")

    # Rank the snippets against a natural-language criterion; keep the top one.
    most_interesting_articles = df.sem_topk("Which {snippet} is the most exciting?", K=1)
    print(f"Most interesting articles\n{most_interesting_articles}")

You.com Example
Before running the following example, set the YOU_API_KEY environment variable. You will also need to install the lotus submodule as follows:

.. code-block:: shell

    pip install lotus[you]

Then you can run your lotus program:
.. code-block:: python

    import lotus
    from lotus import WebSearchCorpus, web_search
    from lotus.models import LM

    # Configure the language model that LOTUS uses.
    lm = LM(model="gpt-4o-mini")
    lotus.settings.configure(lm=lm)

    # Fetch 10 You.com results and keep the title and snippet columns.
    df = web_search(WebSearchCorpus.YOU, "latest AI breakthroughs", 10)[["title", "snippet"]]
    print(f"Results from You.com:\n{df}\n")

    # Rank the snippets against a natural-language criterion; keep the top three.
    top_you_articles = df.sem_topk("Which {snippet} is the most groundbreaking?", K=3)
    print(f"Top 3 most interesting articles from You.com:\n{top_you_articles}")

Bing Example
Before running the following example, set the BING_API_KEY environment variable. You will also need to install the lotus submodule as follows:

.. code-block:: shell

    pip install lotus[bing]

Then you can run your lotus program:
.. code-block:: python

    import lotus
    from lotus import WebSearchCorpus, web_search
    from lotus.models import LM

    # Configure the language model that LOTUS uses.
    lm = LM(model="gpt-4o-mini")
    lotus.settings.configure(lm=lm)

    # Fetch 10 Bing results and keep the title and snippet columns.
    df = web_search(WebSearchCorpus.BING, "state-of-the-art AI models", 10)[["title", "snippet"]]
    print(f"Results from Bing:\n{df}\n")

    # Rank the snippets against a natural-language criterion; keep the top three.
    top_bing_articles = df.sem_topk("Which {snippet} provides the best insight into AI models?", K=3)
    print(f"Top 3 most insightful articles from Bing:\n{top_bing_articles}")

Tavily Example
Before running the following example, set the TAVILY_API_KEY environment variable. You will also need to install the lotus submodule as follows:

.. code-block:: shell

    pip install lotus[tavily]

Then you can run your lotus program:
.. code-block:: python

    import lotus
    from lotus import WebSearchCorpus, web_search
    from lotus.models import LM

    # Configure the language model that LOTUS uses.
    lm = LM(model="gpt-4o-mini")
    lotus.settings.configure(lm=lm)

    # Fetch 10 Tavily results and keep the title and summary columns.
    df = web_search(WebSearchCorpus.TAVILY, "AI ethics in 2025", 10)[["title", "summary"]]
    print(f"Results from Tavily:\n{df}\n")

    # Rank the summaries against a natural-language criterion; keep the top three.
    top_tavily_articles = df.sem_topk("Which {summary} best explains ethical concerns in AI?", K=3)
    print(f"Top 3 articles from Tavily on AI ethics:\n{top_tavily_articles}")

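
The examples above each name an install extra and, for most engines, an API key variable. A small pre-flight check can catch a missing key before any search is attempted. The sketch below is not part of LOTUS: the ``REQUIREMENTS`` table and the ``missing_setup`` helper are hypothetical names, and the table only restates what the examples on this page say. Google Scholar is left out because its requirements are not stated here.

```python
import os

# Requirements as stated in the examples above: (pip extra, API key variable).
# Hypothetical helper table, not a LOTUS API. Google Scholar is omitted
# because this page does not state its requirements.
REQUIREMENTS = {
    "ARXIV": ("arxiv", None),
    "GOOGLE": ("serpapi", "SERPAPI_API_KEY"),
    "YOU": ("you", "YOU_API_KEY"),
    "BING": ("bing", "BING_API_KEY"),
    "TAVILY": ("tavily", "TAVILY_API_KEY"),
}


def missing_setup(corpus_name, environ=None):
    """Return a setup hint if the corpus needs an API key that is unset, else None."""
    environ = os.environ if environ is None else environ
    extra, key = REQUIREMENTS[corpus_name]
    if key is not None and not environ.get(key):
        return f"Set {key} and run: pip install lotus[{extra}]"
    return None


if __name__ == "__main__":
    # Report any engine whose API key is not set in the current environment.
    for name in REQUIREMENTS:
        hint = missing_setup(name)
        if hint:
            print(f"{name}: {hint}")
```

Calling ``missing_setup("GOOGLE")`` before ``web_search(WebSearchCorpus.GOOGLE, ...)`` gives an early, readable message instead of a failure inside the search call. The check only inspects the environment; it does not verify that the key is valid or that the extra is installed.
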
Required Parameters
corpus : The search corpus to use. Available options:

- WebSearchCorpus.ARXIV: Search academic papers on arxiv.org
- WebSearchCorpus.GOOGLE: Search the web using Google Search
- WebSearchCorpus.GOOGLE_SCHOLAR: Search academic papers using Google Scholar
- WebSearchCorpus.YOU: Search the web using You.com
- WebSearchCorpus.BING: Search the web using Bing
- WebSearchCorpus.TAVILY: Search the web using Tavily

query : The query to search for

K : The number of results to return
Optional Parameters
cols : The columns to take from the API search results. Default values should be sufficient for most use cases. To see available columns, enable logging:

.. code-block:: python

    import logging
    logging.basicConfig(level=logging.INFO)

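
Why the logging setup above matters: by default, Python's root logger only emits messages at WARNING level and above, so INFO messages are dropped until the level is lowered. The sketch below demonstrates that behaviour with the standard library alone. The logger name ``demo.web_search`` and the message text are hypothetical stand-ins; the logger name and message format that LOTUS actually uses are not specified on this page.

```python
import io
import logging

# Hypothetical logger standing in for whichever logger the library uses.
logger = logging.getLogger("demo.web_search")


def captured_info(level):
    """Configure root logging at `level`, emit one INFO message, return what was written."""
    stream = io.StringIO()
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, stream=stream, force=True)
    logger.info("available columns: %s", ["title", "snippet"])
    return stream.getvalue()


if __name__ == "__main__":
    print(repr(captured_info(logging.WARNING)))  # INFO message is suppressed
    print(repr(captured_info(logging.INFO)))     # INFO message is emitted
```
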