Show HN: We collected detailed annotations for text-to-image generation
Recently, the most popular modality for text-to-image annotations has been preference data, where annotators choose between two images to indicate their favorite. While this works for fine-tuning models, it lacks additional information about what might be wrong with the images, e.g., which part of the image is misaligned relative to the prompt. Google Research proposed a modality for more information-rich annotations (https://arxiv.org/abs/2312.10240). Based on this, we produced a dataset of ~13k images. In total, we collected ~1.5 million annotations from 150k annotators using our annotation API. If you are interested, you can learn more about the API at https://docs.rapidata.ai/
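If you want to poke at the data, here is a minimal sketch of loading it with the Hugging Face `datasets` library. The dataset identifier below is a placeholder, not the actual repo name; check our Hugging Face page for the real one.

    from datasets import load_dataset

    # Placeholder repo id -- substitute the actual dataset name from huggingface.co
    ds = load_dataset("Rapidata/text-to-image-rich-annotations", split="train")

    # Inspect one example to see which fields (prompt, image, annotations) are available
    example = ds[0]
    print(example.keys())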
Let me know if you have any questions about the dataset or Rapidata in general!