Multimodal Models

Overview

Multimodal models combine textual and visual data to perform tasks such as image captioning and visual question answering. The ImageArray class enables handling of image data within a pandas DataFrame. It currently supports the following image formats: PIL images, NumPy arrays, base64 strings, and image URLs.

Initializing ImageArray

The ImageArray class is a pandas extension array that stores images as a column data type. You can initialize an ImageArray with a list of images in any of the supported formats.

from PIL import Image
import numpy as np
from lotus.utils import ImageArray

# Example image inputs
image1 = Image.open("path_to_image1.jpg")
image2 = np.random.randint(0, 256, (100, 100, 3), dtype="uint8")  # upper bound is exclusive

# Create an ImageArray; the None entry stands for a missing image
images = ImageArray([image1, image2, None])

Loading ImageArray

ImageArray supports the following input formats for loading images:

  • PIL Images : Directly pass a PIL image object

  • NumPy Arrays : Convert NumPy arrays to PIL images automatically

  • Base64 Strings : Decode base64 strings into images

  • URLs : Fetch images from HTTP/HTTPS URLs

  • File Paths : Load images from local or remote file paths

  • S3 URLs : Fetch images stored in S3 buckets
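
As a rough illustration of how the string-based formats above can be told apart, the sketch below classifies a source string by its prefix or URL scheme. The helper name classify_image_source and the S3 bucket name are hypothetical, and this is not how lotus itself is implemented; it only uses the Python standard library.

```python
from urllib.parse import urlparse


def classify_image_source(source: str) -> str:
    """Label the kind of image source a string refers to.

    Hypothetical helper for illustration only; it is not part of lotus.
    """
    # Base64 images are assumed to arrive as data URIs, e.g. "data:image/png;base64,...".
    if source.startswith("data:image"):
        return "base64"
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if scheme == "s3":
        return "s3"
    # Anything without a recognized scheme is treated as a file path.
    return "file_path"


examples = [
    "path_to_image.jpg",
    "https://example.com/image.png",
    "s3://my-bucket/image.png",  # hypothetical bucket name
    "data:image/png;base64,AAAA",
]
for src in examples:
    print(src, "->", classify_image_source(src))
```

Checking the data URI prefix before parsing the scheme keeps base64 inputs from being mistaken for URLs, since "data" is itself a valid URI scheme.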

Example using fetch_image:

from lotus.utils import fetch_image

image_path = "path_to_image.jpg"
image_url = "https://example.com/image.png"
base64_image = "data:image/png;base64,..."

# Load images
pil_image = fetch_image(image_path)
url_image = fetch_image(image_url)
base64_image_obj = fetch_image(base64_image)
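
The base64 input in the example above has the form of a data URI. A minimal sketch of how such a string can be built from raw image bytes, and decoded back, is shown below. The helpers to_data_uri and from_data_uri are hypothetical names introduced for illustration and are not part of lotus; the 8-byte PNG signature stands in for the contents of a real image file.

```python
import base64


def to_data_uri(raw: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_uri(data_uri: str) -> bytes:
    """Decode a base64 data URI back into raw bytes (hypothetical helper)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


# Placeholder bytes: the PNG file signature, not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"

uri = to_data_uri(png_signature)
print(uri)

# Decoding the URI recovers the original bytes.
assert from_data_uri(uri) == png_signature
```

In practice, raw would be the full contents of an image file (for example, read with open(path, "rb")), so that the resulting string decodes to a valid image.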