Multimodal Models

Overview

Multimodal models combine textual and visual data to perform advanced tasks such as image captioning, visual questions, and more. The ImageArray class enables handling of image data within a pandas DataFrame. Currently supports these image formats: PIL images, numpy arrays, base64 strings, and image URLs

Initializing ImageArray

The ImageArray class is an extension array designed to handle images as data types in pandas. You can initilize an ImageArray with a list of supported image formats

from PIL import Image
import numpy as np
from lotus.utils import ImageArray

# Example image inputs
image1 = Image.open("path_to_image1.jpg")
image2 = np.random.randint(0, 255, (100, 100, 3), dtype="uint8")

# Create an ImageArray
images = ImageArray([image1, image2, None])

Loading ImageArray

The ImageArray supports multiple input formats for loading images.

PIL Images : Directly pass a PIL image object
Numpy Arrays : Convert numpy arrays to PIL Images automatically
Base64 Strings : Decode base 64 strings into images
URLs : Fetch images from HTTP/HTTPS URLs
File Paths : Load images from local or remote file Paths
S3 URLs : Fetch images stored in S3 buckets

Example:

from lotus.utils import fetch_image
from PIL import Image

image_path = "path_to_image.jpg"
image_url = "https://example.com/image.png"
base64_image = "data:image/png;base64,..."

# Load images
pil_image = fetch_image(image_path)
url_image = fetch_image(image_url)
base64_image_obj = fetch_image(base64_image)