The PeopleMaker, built by vAIsual, provides a compelling case study for the benefits of technically superior datasets.
The generative AI app, planned for release on the hugely successful Canva design platform, allows users to include a human model in their designs. The benefit for designers is not only access to a unique individual, customized through prompt inputs, but also the peace of mind that there are no legal risks in using the images commercially.
To explain how the app was designed and built, I interviewed Michael Osterrieder, our CEO and one of the technical wizards here at vAIsual. In this conversation we learn about the importance of biometric annotations, the ways to address diversity, and how high resolutions reduce training times and deliver better results.
Hello Michael, thanks so much for your time today to help us understand more about the Complete People Dataset and how it was used to train the PeopleMaker. Can you provide an overview of how you created the dataset and then used it to build a generative AI model for creating synthetic humans?
Sure, I’d be happy to. Firstly, the dataset itself was produced in our photographic studio in Buenos Aires, using real-life models. Each person is paid for their time to come into the studio and model for the images. They are thoroughly briefed on how the images will be used, then required to sign a biometric model release that we’ve had crafted by several IP and privacy lawyers to make sure it is watertight.
The images are captured in full resolution with professional camera equipment. We ask each person to stand in various postures and angles, with different expressions and emotions. In the end we take between 200 and 300 images of each person.
Our team then manually documents all the relevant descriptions for each person. These include things such as hairstyle, height, body shape, age and gender.
We then transfer these images to our machine learning team, who prepare the text-to-image pairs and resize the images to streamline the training.
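Michael doesn’t go into the tooling his team uses, but the pairing-and-resizing step he describes can be sketched roughly as follows. All file names, the directory layout, and the 512-pixel target are assumptions for illustration, not vAIsual’s actual pipeline:

```python
import tempfile
from pathlib import Path

def pair_captions(image_dir):
    """Match each image with a caption .txt file of the same name."""
    pairs = []
    for img in sorted(Path(image_dir).glob("*.jpg")):
        caption_file = img.with_suffix(".txt")
        if caption_file.exists():
            pairs.append((img.name, caption_file.read_text().strip()))
    return pairs

def resize_target(width, height, short_side=512):
    """Scale so the shorter edge equals `short_side`, preserving aspect ratio."""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)

# Demo with placeholder files (real photographs in production).
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "model_001.jpg").write_bytes(b"")
    (root / "model_001.txt").write_text("woman, 30s, curly brown hair, smiling")
    print(pair_captions(root))        # [('model_001.jpg', 'woman, 30s, curly brown hair, smiling')]
    print(resize_target(6000, 4000))  # (768, 512)
```

Standardizing on one short-side resolution like this is a common way to keep batch shapes uniform during diffusion-model training.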
After that, we start training our customized AI model, which uses various diffusion algorithms.
There are often many rounds of review and refinement until we are happy with the results. Once we are, the API is published live, allowing our partners, such as Canva, to access our PeopleMaker model.
I’d love to dig into how the Complete People Dataset compares to other training materials you have used. Were there any differences in how the training was done, or the quality of the outputs?
During the process of customizing our own AI model, we utilized several different datasets to see how the results compared.
We found that datasets containing random shots of people at varying image resolutions needed much, much more training material to get usable results. More training data means more GPU time, which makes the process more expensive and time-consuming than with our Complete People Dataset.
When we used our high-resolution, standardized image set, the AI models were able to learn more efficiently and produce far better output.
Can you explain why you chose to undertake the huge job of capturing images of thousands of people, versus scraping the internet for training materials?
As a professional photographer in my previous career, I knew that model releases and rights clearances underpin the commercial industry. I saw very little chance that wholesale scraping was going to work, as it’s a huge breach of copyright and privacy law. As we expected, new AI acts are gradually coming into force that will regulate the industry, so we wanted to get ahead of that.
With the biometric model releases we provide as part of the Complete People Dataset, we can guarantee the safe usage of these materials for training AI models.
In fact, our early customers are the kind of companies who don’t take these matters lightly. From the multinational Thales Group to startups like Bending Spoons out of Italy, our customers are preparing for commercial AI solutions, and their legal teams in particular are taking no chances.
Can you describe the data preparation work that makes the Complete People Dataset more useful for AI/ML engineers and data scientists to work with?
One of the main ways we help to improve the workflow of our customers is by providing comprehensive text-to-image pairs. The text files we provide contain biometric annotations, with all the keyword descriptions needed to explain what each image contains. As far as I know, this is a unique offering in the market.
We also do some custom work that includes bounding boxes, resizing and augmenting the datasets.
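Michael mentions bounding boxes and augmentation but not how they interact; one detail worth noting is that geometric augmentations must transform the bounding boxes along with the pixels. A minimal sketch of that bookkeeping for a horizontal flip (the coordinate convention and function name are my own, not vAIsual’s):

```python
def flip_bbox_horizontal(bbox, image_width):
    """Mirror a bounding box for a left-right image flip.

    bbox is (x_min, y_min, x_max, y_max) in pixel coordinates; the vertical
    extent is unchanged, while the horizontal edges swap and mirror.
    """
    x_min, y_min, x_max, y_max = bbox
    return (image_width - x_max, y_min, image_width - x_min, y_max)

# A face box on the left of a 1024px-wide frame moves to the right after a flip.
print(flip_bbox_horizontal((100, 50, 300, 400), 1024))  # (724, 50, 924, 400)
```

Augmentation libraries apply the same principle to crops, rotations and scaling, so annotations stay aligned with the augmented images.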
I know diversity is a hot topic in the Responsible AI discussions. Can you explain how vAIsual is tackling this issue with the Complete People Dataset?
While we work to ensure our product is ethical and responsible, it must be noted that no dataset can be fully “representative” or unbiased unless it includes images of every person alive.
Having said that, we are implementing a range of strategies to ensure our dataset provides the most comprehensive level of diversity to our customers. We are constantly expanding the existing dataset, and we provide detailed demographic breakdowns so our customers know exactly who is represented and in what proportions.
We are adding more identities from Asia, the Middle East and Eastern Europe, and where we see content gaps we are reaching out to partners to fill them.
There are also ways for our customers to weight the dataset to equalize the inputs so the outputs are fairer.
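Michael doesn’t specify the weighting scheme, but a common approach to equalizing inputs is inverse-frequency sample weighting, where each demographic group contributes equally in expectation. A minimal sketch with hypothetical group labels:

```python
from collections import Counter

def balanced_weights(labels):
    """Weight each sample inversely to its group's frequency.

    Weights sum to 1, and every group's total weight is 1 / n_groups,
    so under-represented groups are not drowned out during training.
    """
    counts = Counter(labels)
    n_groups = len(counts)
    return [1.0 / (n_groups * counts[label]) for label in labels]

# Three samples from one group, one from another: each group totals 0.5.
labels = ["asia", "asia", "asia", "europe"]
print(balanced_weights(labels))  # [1/6, 1/6, 1/6, 1/2]
```

Weights like these can feed a weighted sampler (for example, PyTorch's WeightedRandomSampler) so each training batch draws evenly across groups.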
And finally, can you explain how people can access the Complete People Dataset and use it for their own machine learning needs?
Certainly. We offer the dataset on our Dataset Shop marketplace. There are several different resolution options, which adjust the pricing considerably. Customers can also choose between a standard license or an extended license (which allows for generative purposes).
Once the purchase is made directly through the shop, we provide download URLs that make transferring these large files more streamlined. There is a customer portal providing easy access to the files.