🎯 Patch-ioner Trace Captioning Demo

This demo showcases the Patch-ioner model, which generates image captions from user-drawn traces or bounding boxes. Patch-ioner is a unified zero-shot captioning framework for describing arbitrary image regions. More details about the framework can be found on the official project webpage.

Instructions:

Choose between Trace or BBox mode
Upload an image or use one of the provided examples
Use the appropriate tool to mark areas of interest in the image
Click "Generate Caption" to get AI-generated descriptions

Tip: Use the Layer tool to generate multiple captions for different traces.

Model Status: ✅ Default model loaded: https://huggingface.co/Ruggero1912/Patch-ioner_talk2dino_decap_COCO_Captions on cuda

📷 Select from example images or upload your own:

📋 Captioning Mode

Choose between trace-based or bounding box-based captioning

trace bbox

🖼️ Image Editor

Upload image and draw traces

🖼️ Annotated Image

Generated caption will appear here...

💡 Tips:

Mode Selection: Switch between trace and bounding box modes based on your needs
Trace Mode: Draw continuous lines over areas you want to describe
BBox Mode: Draw rectangular bounding boxes around objects of interest
Multiple Areas: Switch to a new layer for each trace or box to get a separate caption for each object

🔧 Technical Details:

Trace Mode: Converts drawings to normalized (x, y) coordinates
BBox Mode: Uses bounding box coordinates for region-specific captioning
Processing: Each trace/bbox is processed separately to generate corresponding captions. Aggregated region representations also attend to the global image context.
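
As a rough illustration of the coordinate handling described above, the sketch below normalizes pixel coordinates by the image size. It is not the demo's actual code: the function names normalize_trace and normalize_bbox, the (x_min, y_min, x_max, y_max) box layout, and the clamping to [0, 1] are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def normalize_trace(points_px: List[Point], width: int, height: int) -> List[Point]:
    """Map pixel (x, y) points to [0, 1] by dividing by the image size.

    Hypothetical helper, not part of the Patch-ioner package.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    normalized = []
    for x, y in points_px:
        # Clamp so strokes that run slightly past the border stay in range.
        nx = min(max(x / width, 0.0), 1.0)
        ny = min(max(y / height, 0.0), 1.0)
        normalized.append((nx, ny))
    return normalized


def normalize_bbox(bbox_px: Box, width: int, height: int) -> Box:
    """Normalize a pixel box given as (x_min, y_min, x_max, y_max).

    The box layout is an assumption; other tools use (x, y, w, h).
    """
    x0, y0, x1, y1 = bbox_px
    (nx0, ny0), (nx1, ny1) = normalize_trace([(x0, y0), (x1, y1)], width, height)
    # Reorder corners in case the box was drawn from bottom-right to top-left.
    return (min(nx0, nx1), min(ny0, ny1), max(nx0, nx1), max(ny0, ny1))


# Example on a 640x480 image: one trace and one box drawn "backwards".
trace = normalize_trace([(64, 48), (320, 240), (700, 480)], width=640, height=480)
box = normalize_bbox((480, 360, 160, 120), width=640, height=480)
print(trace)
print(box)
```

Each normalized trace or box would then be handed to the captioner on its own, so that every marked region gets its own caption.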

Use the Patch-ioner framework for your projects

Install the Patch-ioner package with: pip install git+https://github.com/Ruggero1912/Patch-ioner
Check the official project webpage and the GitHub repository for more details