Multimodal

This section covers datasets that combine modalities and data produced with the help of large models. These tasks share the challenge of aligning and jointly labelling more than one signal.

Shared across multimodal data

  • Aligning modalities (image–text, audio–text)
  • Joint annotation guidelines across modalities
  • Quality control spanning every modality
  • Metadata & combined-format handling
  • Licensing across combined sources
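
As a rough illustration of how the concerns above can meet in a single record, here is a minimal Python sketch. The `ImageTextRecord` schema, its field names, and the `check_record` helper are all hypothetical and not taken from any particular dataset or tool; the checks stand in for whatever quality-control rules a real project defines.

```python
from dataclasses import dataclass, field


@dataclass
class ImageTextRecord:
    """One aligned image-caption pair (hypothetical schema, for illustration only)."""
    record_id: str
    image_path: str
    caption: str
    image_license: str   # licence of the image source
    text_license: str    # licence of the caption source
    metadata: dict = field(default_factory=dict)


def check_record(record: ImageTextRecord) -> list[str]:
    """Return a list of quality-control problems; an empty list means the record passed."""
    problems = []
    if not record.image_path.strip():
        problems.append("missing image")
    if not record.caption.strip():
        problems.append("missing caption")
    if not record.image_license or not record.text_license:
        problems.append("missing license for one modality")
    return problems


# Made-up example records: the second one lacks a caption and a text licence.
records = [
    ImageTextRecord("r1", "images/r1.jpg", "A market stall with fruit.", "CC-BY-4.0", "CC-BY-4.0"),
    ImageTextRecord("r2", "images/r2.jpg", "", "CC-BY-4.0", ""),
]
report = {r.record_id: check_record(r) for r in records}
```

Keeping both modalities and both licences in one record means a single pass can flag pairs that are incomplete on either side, rather than checking images and text separately.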

Tasks

  • Image–text (visual question answering, image captioning)
  • LLM-assisted & synthetic data (generation, augmentation, distillation)

Contributor
@abumafrim
