Multimodal Models
===================

Overview
---------
Multimodal models combine textual and visual data to perform tasks such as
image captioning and visual question answering. The ImageArray class enables handling of
image data within a pandas DataFrame. It currently supports the following image formats:
PIL images, NumPy arrays, base64 strings, and image URLs.

Initializing ImageArray
-----------------------
The ImageArray class is a pandas extension array designed to handle images as a data type.
You can initialize an ImageArray with a list of images in any of the supported formats:

.. code-block:: python

    from PIL import Image
    import numpy as np
    from lotus.utils import ImageArray

    # Example image inputs
    image1 = Image.open("path_to_image1.jpg")
    # randint's upper bound is exclusive, so 256 covers the full uint8 range
    image2 = np.random.randint(0, 256, (100, 100, 3), dtype="uint8")

    # Create an ImageArray
    images = ImageArray([image1, image2, None])


Loading ImageArray
-------------------

ImageArray supports multiple input formats for loading images:

- **PIL Images**: Pass a PIL image object directly.
- **NumPy Arrays**: NumPy arrays are converted to PIL images automatically.
- **Base64 Strings**: Base64 strings are decoded into images.
- **URLs**: Images are fetched from HTTP/HTTPS URLs.
- **File Paths**: Images are loaded from local or remote file paths.
- **S3 URLs**: Images stored in S3 buckets are fetched.
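
Several of these inputs arrive as plain strings, so it can help to see how they differ.
The sketch below labels a string by its form using only the standard library. The helper
name ``classify_image_source``, the labels it returns, and the bucket name are hypothetical
and chosen for illustration; this is not necessarily how lotus dispatches internally.

.. code-block:: python

    from urllib.parse import urlparse


    def classify_image_source(source: str) -> str:
        """Label a string image reference (hypothetical helper, not part of lotus)."""
        # Data URIs carry the image bytes inline, so check for them first
        if source.startswith("data:image"):
            return "base64"
        scheme = urlparse(source).scheme.lower()
        if scheme in ("http", "https"):
            return "url"
        if scheme == "s3":
            return "s3"
        # Anything else is treated as a file path in this sketch
        return "file_path"


    sources = [
        "path_to_image.jpg",
        "https://example.com/image.png",
        "s3://my-bucket/image.png",  # made-up bucket name
        "data:image/png;base64,...",
    ]
    for src in sources:
        print(src, "->", classify_image_source(src))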

The following example loads images from several of these sources with fetch_image:

.. code-block:: python

    from lotus.utils import fetch_image
    from PIL import Image

    image_path = "path_to_image.jpg"
    image_url = "https://example.com/image.png"
    base64_image = "data:image/png;base64,..." 

    # Load images
    pil_image = fetch_image(image_path)
    url_image = fetch_image(image_url)
    base64_image_obj = fetch_image(base64_image)
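
The base64 input above is a data URI of the form ``data:image/png;base64,...``. As a
minimal sketch of how such a string can be built from raw image bytes and decoded back,
the following uses only the standard library. The helpers ``to_data_uri`` and
``from_data_uri`` are hypothetical names introduced here and are not part of lotus; the
8-byte PNG signature stands in for the contents of a real image file.

.. code-block:: python

    import base64


    def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
        """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
        encoded = base64.b64encode(image_bytes).decode("ascii")
        return f"data:{mime_type};base64,{encoded}"


    def from_data_uri(data_uri: str) -> bytes:
        """Recover the raw bytes from a base64 data URI (hypothetical helper)."""
        # Everything after the first comma is the base64 payload
        _header, _, payload = data_uri.partition(",")
        return base64.b64decode(payload)


    # Placeholder bytes; in practice these would come from reading an image file
    raw = b"\x89PNG\r\n\x1a\n"
    uri = to_data_uri(raw)
    print(uri)
    print(from_data_uri(uri) == raw)

A string produced this way has the same shape as ``base64_image`` in the example above.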