Multimodal Models
Overview
Multimodal models combine textual and visual data to perform tasks such as image captioning and visual question answering. The ImageArray class enables handling of image data within a pandas DataFrame. It currently supports PIL images, numpy arrays, base64 strings, image URLs, file paths, and S3 URLs.
Initializing ImageArray
The ImageArray class is a pandas extension array designed to handle images as a data type. You can initialize an ImageArray with a list of images in any supported format:
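import numpy as np
import pandas as pd
from PIL import Image
from lotus.utils import ImageArray
# Example image inputs: a PIL image loaded from disk and a random 100x100 RGB numpy array
image1 = Image.open("path_to_image1.jpg")
image2 = np.random.randint(0, 256, (100, 100, 3), dtype="uint8")  # pixel values 0-255
# Create an ImageArray; None marks a missing image
images = ImageArray([image1, image2, None])
# Store the images as a column of a pandas DataFrame
df = pd.DataFrame({"image": images})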
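The example above passes a (100, 100, 3) uint8 array, but this page does not say which array shapes are accepted. The sketch below assumes the numpy-to-PIL conversion behaves like Pillow's `Image.fromarray`, which reads a 2-D uint8 array as grayscale ("L") and a 3-D uint8 array with 3 or 4 channels as "RGB" or "RGBA". `infer_pil_mode` is a hypothetical helper for illustration, not part of LOTUS, and it covers only these common uint8 cases.

```python
def infer_pil_mode(shape, dtype):
    """Guess the Pillow mode for a uint8 image array (hypothetical helper, uint8 only)."""
    if dtype != "uint8":
        raise ValueError(f"only uint8 arrays are handled in this sketch, got {dtype}")
    if len(shape) == 2:
        # height x width: single-channel grayscale
        return "L"
    if len(shape) == 3 and shape[2] in (3, 4):
        # height x width x channels: 3 channels -> RGB, 4 channels -> RGBA
        return "RGB" if shape[2] == 3 else "RGBA"
    raise ValueError(f"cannot map shape {shape} to an image mode")
```

With a numpy array such as `image2`, the call would be `infer_pil_mode(image2.shape, str(image2.dtype))`, which gives "RGB".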
Loading ImageArray
ImageArray accepts the following input formats:
PIL images: used directly as PIL image objects
Numpy arrays: converted to PIL images automatically
Base64 strings: decoded into images
URLs: fetched over HTTP/HTTPS
File paths: loaded from local or remote file paths
S3 URLs: fetched from S3 buckets
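The list above describes which inputs are accepted but not how a loader might tell them apart. The following is a minimal, stdlib-only sketch of one way to dispatch on the input type. `classify_image_source` and `split_s3_url` are hypothetical helpers, not part of LOTUS, and the duck-typing checks (`shape`/`dtype` for numpy arrays, `mode`/`size` for PIL images) are assumptions about how such a loader could work.

```python
from urllib.parse import urlparse


def classify_image_source(value):
    """Label the kind of image input (hypothetical helper, for illustration only)."""
    if value is None:
        return "missing"
    if isinstance(value, str):
        # data URIs carry the image bytes inline as base64
        if value.startswith("data:image/") and ";base64," in value:
            return "base64"
        scheme = urlparse(value).scheme.lower()
        if scheme in ("http", "https"):
            return "url"
        if scheme == "s3":
            return "s3"
        # anything else is treated as a file path
        return "path"
    # numpy arrays expose shape and dtype; check them before PIL-like objects
    if hasattr(value, "shape") and hasattr(value, "dtype"):
        return "ndarray"
    # PIL images expose mode and size
    if hasattr(value, "mode") and hasattr(value, "size"):
        return "pil"
    raise TypeError(f"unsupported image input: {type(value).__name__}")


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")
```

Checking array-like objects before PIL-like ones matters because numpy arrays also have a `size` attribute.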
Example:
from lotus.utils import fetch_image
# Placeholder inputs: replace these with a real file path, URL, and base64 data URI
image_path = "path_to_image.jpg"
image_url = "https://example.com/image.png"
base64_image = "data:image/png;base64,..."
# fetch_image resolves each input and loads it as an image
pil_image = fetch_image(image_path)
url_image = fetch_image(image_url)
base64_image_obj = fetch_image(base64_image)
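The base64 input in the example is a data URI: a `data:<mime type>;base64,` header followed by the encoded image bytes. The sketch below shows, with the standard library only, how such a string splits into its MIME type and raw bytes. `decode_data_uri` is a hypothetical helper, not the LOTUS implementation, and the 8-byte PNG signature stands in for a real image payload.

```python
import base64
import binascii

# The first 8 bytes of every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_data_uri(uri):
    """Split 'data:<mime>;base64,<payload>' into (mime_type, raw_bytes) (hypothetical helper)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a 'data:<mime>;base64,<payload>' string")
    mime_type = header[len("data:"):-len(";base64")]
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 payload: {exc}") from exc
    return mime_type, raw


# Build a data URI around the PNG signature and decode it again
uri = "data:image/png;base64," + base64.b64encode(PNG_SIGNATURE).decode("ascii")
mime_type, raw = decode_data_uri(uri)
```

The literal placeholder `"data:image/png;base64,..."` is rejected by this sketch, since `...` is not valid base64. With a real payload, the decoded bytes can be opened with Pillow via `Image.open(io.BytesIO(raw))`.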