🎯 Patchioner Trace Captioning Demo

This demo showcases the Patch-ioner model, which generates captions for image regions marked with user-drawn traces or bounding boxes. Patch-ioner is a unified zero-shot captioning framework for describing arbitrary image regions. More details about the framework can be found on the official project webpage.

Instructions:

  1. Choose between Trace or BBox mode
  2. Upload an image or use one of the provided examples
  3. Use the appropriate tool to mark areas of interest in the image
  4. Click "Generate Caption" to get AI-generated descriptions

Tip: Use the Layer tool to generate multiple captions for different traces.

Model Status: ✅ Default model loaded: https://huggingface.co/Ruggero1912/Patch-ioner_talk2dino_decap_COCO_Captions on cuda

📷 Select from example images or upload your own:

📋 Captioning Mode

Choose between trace-based or bounding box-based captioning

๐Ÿ–ผ๏ธ Image Editor

๐Ÿ–ผ๏ธ Annotated Image

💡 Tips:

  • Mode Selection: Switch between trace and bounding box modes based on your needs
  • Trace Mode: Draw continuous lines over areas you want to describe
  • BBox Mode: Draw rectangular bounding boxes around objects of interest
  • Multiple Areas: Switch to a new Layer for each trace/box you draw on a different object; each one gets its own caption

🔧 Technical Details:

  • Trace Mode: Converts drawings to normalized (x, y) coordinates
  • BBox Mode: Uses bounding box coordinates for region-specific captioning
  • Processing: Each trace/bbox is processed separately to generate corresponding captions. Aggregated region representations also attend to the global image context.
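
The coordinate handling described above can be illustrated with a minimal Python sketch. It is not the demo's actual code: the function names (normalize_trace, normalize_bbox, trace_to_patches), the (x, y, w, h) box layout, and the 16x16 patch grid are all assumptions made for illustration. It only shows one plausible way to normalize pixel coordinates and to find which patches of a regular grid a trace touches.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_trace(points_px: List[Point], width: int, height: int) -> List[Point]:
    """Map pixel (x, y) points into [0, 1] x [0, 1], clamping strays to the border."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    return [
        (min(max(x / width, 0.0), 1.0), min(max(y / height, 0.0), 1.0))
        for x, y in points_px
    ]


def normalize_bbox(x0: float, y0: float, x1: float, y1: float,
                   width: int, height: int) -> Tuple[float, float, float, float]:
    """Return a normalized (x, y, w, h) box; the two corners may come in any order.

    The (x, y, w, h) layout is an assumption, not necessarily what the model expects.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    return (left / width, top / height,
            (right - left) / width, (bottom - top) / height)


def trace_to_patches(trace: List[Point], grid: int = 16) -> List[int]:
    """Row-major indices of the grid cells hit by a normalized trace, in first-visit order.

    The grid size is hypothetical; a real model's patch grid depends on its backbone.
    """
    visited: List[int] = []
    for x, y in trace:
        col = min(int(x * grid), grid - 1)  # x == 1.0 falls in the last column
        row = min(int(y * grid), grid - 1)  # y == 1.0 falls in the last row
        index = row * grid + col
        if index not in visited:
            visited.append(index)
    return visited


if __name__ == "__main__":
    # A three-point trace drawn on a 640x480 image.
    trace = normalize_trace([(0, 0), (320, 240), (640, 480)], width=640, height=480)
    print(trace)
    print(trace_to_patches(trace, grid=4))
    print(normalize_bbox(480, 360, 160, 120, width=640, height=480))
```

Because each trace or box is handled separately, a loop over the drawn layers would call these helpers once per layer and request one caption for each result.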

Use the Patch-ioner framework for your projects