sem_index

Overview

The sem_index operator in LOTUS creates a semantic index over the specified column of a dataset. This index enables efficient retrieval and ranking of records based on semantic similarity. The index is generated with the configured retrieval model and stored locally in the specified directory.
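To make the idea concrete, the following is a minimal conceptual sketch of what a semantic index does: embed each record, persist the vectors to a directory, reload them, and rank records by similarity to a query. It is not how LOTUS implements sem_index. The word-count embed function is a toy stand-in for a real retrieval model, and the helper names (build_index, load_index, search) and the file name vectors.json are hypothetical, chosen only for illustration.

```python
import json
import math
import os
import re
import tempfile
from collections import Counter


def embed(text):
    """Toy stand-in for a retrieval model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(texts, index_dir):
    """Embed every text and save the vectors under index_dir (hypothetical layout)."""
    os.makedirs(index_dir, exist_ok=True)
    payload = {"texts": texts, "vectors": [dict(embed(t)) for t in texts]}
    with open(os.path.join(index_dir, "vectors.json"), "w") as f:
        json.dump(payload, f)


def load_index(index_dir):
    """Reload a previously saved index without re-embedding the texts."""
    with open(os.path.join(index_dir, "vectors.json")) as f:
        return json.load(f)


def search(index, query, k=3):
    """Return the k texts most similar to the query."""
    q = embed(query)
    scored = [
        (cosine(q, Counter(vec)), text)
        for text, vec in zip(index["texts"], index["vectors"])
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


courses = [
    "Probability and Random Processes",
    "Computer Security",
    "Introduction to Machine Learning",
    "Introduction to Computer Vision",
]

with tempfile.TemporaryDirectory() as index_dir:
    build_index(courses, index_dir)      # analogous to creating the index
    index = load_index(index_dir)        # analogous to reloading it later
    top = search(index, "machine learning", k=1)

print(top)
```

A real retrieval model replaces the word counts with dense embeddings, so records can match a query even when they share no words with it. The build, store, and reload pattern stays the same.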

Example

import pandas as pd

import lotus
from lotus.models import LM, SentenceTransformersRM
from lotus.vector_store import FaissVS

lm = LM(model="gpt-4o-mini")
rm = SentenceTransformersRM(model="intfloat/e5-base-v2")
vs = FaissVS()

# Register the models and vector store that sem_index will use
lotus.settings.configure(lm=lm, rm=rm, vs=vs)
data = {
    "Course Name": [
        "Probability and Random Processes",
        "Optimization Methods in Engineering",
        "Digital Design and Integrated Circuits",
        "Computer Security",
        "Introduction to Computer Science",
        "Introduction to Data Science",
        "Introduction to Machine Learning",
        "Introduction to Artificial Intelligence",
        "Introduction to Robotics",
        "Introduction to Computer Vision",
        "Introduction to Natural Language Processing",
        "Introduction to Reinforcement Learning",
        "Introduction to Deep Learning",
        "Introduction to Computer Networks",
    ]
}
df = pd.DataFrame(data)

df = df.sem_index("Course Name", "index_dir")
print(df)

# Later, reload the saved index from disk instead of rebuilding it
df = df.load_sem_index("Course Name", "index_dir")

Output:

    Course Name
0   Probability and Random Processes
1   Optimization Methods in Engineering
2   Digital Design and Integrated Circuits
3   Computer Security
4   Introduction to Computer Science
5   Introduction to Data Science
6   Introduction to Machine Learning
7   Introduction to Artificial Intelligence
8   Introduction to Robotics
9   Introduction to Computer Vision
10  Introduction to Natural Language Processing
11  Introduction to Reinforcement Learning
12  Introduction to Deep Learning
13  Introduction to Computer Networks

Required Parameters

  • col_name : The column name to index.

  • index_dir : The directory to save the index.