vAIsual Inc, the company behind the largest visual dataset collection in the world, today launched the first of its Asian diaspora non-biometric datasets, consisting of thousands of Asian people and scenes.
The Asian People in Context dataset, with over 20,000 images, will play a crucial role in training AI models to recognize, classify, and analyze Asian scenes and characters. The resulting trained models can contribute to applications for environment detection, generative AI and human identification.
The dataset is the first of many delivered through a partnership with Vietnam-based stock agency Dragonimages.
Machine learning researchers and data scientists can use the datasets for a variety of purposes. All the images are legally cleared with model releases and trademark compliance.
The images feature mostly Asian people of various ages and genders in a range of contexts, including streets, cafes, workplaces and retail settings.
A dataset like Asian People in Context allows AI to train for a range of purposes. The dataset is available to non-European customers only, because its model releases are not GDPR-compliant.
The dataset is specially prepared to meet the needs of ML teams, with detailed and consistent metatags, high-resolution images and, most importantly, legal clearances.
Self-service access to the dataset is via the Dataset Shop, established in 2022 by clean data specialists vAIsual Inc, and specifically catering for research and engineering teams training AI for a range of applications.
According to vAIsual CEO Michael Osterrieder, diversity is king in AI training, and the company's customers have been anticipating access to datasets with Asian identities.
“We are excited to launch this first Asian People in Context dataset that focuses on the Asian diaspora. Using our proprietary dataset building technology, we can now assemble datasets consisting of tens of thousands of images of a particular theme or subject.
Being able to collate and package these datasets saves engineers hundreds of hours of preparing material for AI training,” says Osterrieder.
While reducing time is a core benefit, Osterrieder also emphasizes the importance of having full legal clearance.
“We are starting to see dataset disclosure requirements emerging in some jurisdictions, which will mean any AI model trained on scraped data will risk being blocked,” says Osterrieder.
The availability of legally clean datasets that also remunerate the original content creators is an important step toward ensuring that companies building AI technology are doing it ethically and responsibly.
“Offering custom-prepared datasets containing premium visual content, with the consent of the original copyright owners (or their legal representatives), is essential for the AI industry to mature into a truly commercial and viable industry,” says Osterrieder.
In the coming weeks, additional datasets will be added to datasetshop.com. The datasets are specially prepared for engineers to add to their workflow for AI training and are commercially available in a variety of resolutions.
About Dataset Shop
First launched in 2022 by the “clean data guys”, vAIsual Inc’s Dataset Shop is a marketplace for visual media designed specifically for AI training purposes.
The online store initially sold the largest biometrically released human dataset, consisting of over 600,000 high-quality images custom-shot for AI training.
The Dataset Shop is rapidly growing its collection of datasets through partnerships with stock agencies seeking to address the widespread scraping of datasets obtained without the consent of copyright owners.
About Dragonimages
Dragonimages is a cutting-edge production studio based in Vietnam that creates catchy visual content with an Asian flavor.
In 2012, they started as a contributor to stock photo agencies. Since then, they have contributed 50,000 images that have been used for commercial purposes all over the world. They can be found on magazine spreads and street billboards.
The Dragonimages production studio is part of the international holding Everypixel Group.