Introduction to Vana
In February 2024, Reddit struck a $60 million deal with Google to let the search giant use data on the platform to train its artificial intelligence models. Notably absent from the discussions were Reddit users, whose data were being sold. This deal reflected the reality of the modern internet: Big tech companies own virtually all our online data and get to decide what to do with that data.
The Problem with Big Tech
Unsurprisingly, many platforms monetize their data, and the fastest-growing way to accomplish that today is to sell it to AI companies, who are themselves massive tech companies using the data to train ever more powerful models. The decentralized platform Vana, which started as a class project at MIT, is on a mission to give power back to the users.
How Vana Works
The company has created a fully user-owned network that allows individuals to upload their data and govern how they are used. AI developers can pitch users on ideas for new models, and if the users agree to contribute their data for training, they get proportional ownership in the models. The idea is to give everyone a stake in the AI systems that will increasingly shape our society while also unlocking new pools of data to advance the technology.
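The proportional-ownership idea can be illustrated with a little arithmetic. The sketch below is not Vana's actual accounting: the function name `ownership_shares`, the user names, and the choice to measure a contribution as a count of records are all assumptions made for illustration. It only shows what "proportional" means if each contributor's share equals their contribution divided by the pool total.

```python
from fractions import Fraction


def ownership_shares(contributions: dict[str, int]) -> dict[str, Fraction]:
    """Split ownership of a model in proportion to each user's contribution.

    `contributions` maps a (hypothetical) user name to the amount of data
    that user agreed to contribute, measured here as a record count.
    """
    if any(amount < 0 for amount in contributions.values()):
        raise ValueError("contributions cannot be negative")
    total = sum(contributions.values())
    if total == 0:
        raise ValueError("at least one user must contribute data")
    # Exact fractions avoid rounding drift, so the shares always sum to 1.
    return {user: Fraction(amount, total) for user, amount in contributions.items()}


# Hypothetical pool: three users contribute different amounts of data.
contributions = {"alice": 600, "bob": 300, "carol": 100}
shares = ownership_shares(contributions)

for user, share in shares.items():
    print(f"{user}: {float(share):.0%}")
# alice: 60%
# bob: 30%
# carol: 10%
```

A real system would also have to decide how to weigh data quality against quantity. This sketch sidesteps that question by treating every record as equal.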
The Founders’ Vision
Vana co-founder Anna Kazlauskas says, “This data is needed to create better AI systems. We’ve created a decentralized system to get better data — which sits inside big tech companies today — while still letting users retain ultimate ownership.” Kazlauskas came to MIT sure she’d become an economist, but she ended up being one of five students to join the MIT Bitcoin club in 2015, and that experience led her into the world of blockchains and cryptocurrency.
From Economics to Blockchain
Kazlauskas met Art Abal, who was then attending Harvard University, in the former Media Lab class Emergent Ventures, and the pair decided to work on new ways to obtain data to train AI systems. Their approach evolved over the years and was informed by Kazlauskas’ experience working at the financial blockchain company Celo after graduation. But Kazlauskas credits her time at MIT with helping her think about these problems, and the instructor for Emergent Ventures, Ramesh Raskar, still helps Vana think about AI research questions today.
The Power of Decentralization
Today Vana takes advantage of a little-known law that allows users of most big tech platforms to export their data directly. Users can upload that information into encrypted digital wallets in Vana and disburse it to train models as they see fit. AI engineers can suggest ideas for new open-source models, and people can pool their data to help train those models. In the blockchain world, these data pools are called data DAOs, where DAO stands for decentralized autonomous organization.
Crowdsourced, User-Owned AI
Last year, a machine-learning engineer proposed using Vana user data to train an AI model that could generate Reddit posts. More than 140,000 Vana users contributed their Reddit data, which contained posts, comments, messages, and more. Users decided on the terms under which the model could be used, and they maintained ownership of the model after it was created. Vana has enabled similar initiatives with user-contributed data from the social media platform X, sleep data from sources like Oura rings, and more.
The Future of AI
Vana has over 1 million users and over 20 live data DAOs. More than 300 additional data pools have been proposed by users on Vana’s system, and Kazlauskas says many will go into production this year. “I think there’s a lot of promise in generalized AI models, personalized medicine, and new consumer applications, because it’s tough to combine all that data or get access to it in the first place,” Kazlauskas says.
Conclusion
Vana is revolutionizing the way we think about data and AI. By giving users control over their own data and allowing them to contribute to the development of new AI models, Vana is creating a more decentralized and equitable system. As Kazlauskas says, “It’s a win-win: Users get to benefit from the rise of AI because they own the models. Then you don’t end up in a scenario where you have a single company controlling an all-powerful AI model. You get better technology, but everyone benefits.”
FAQs
- Q: What is Vana?
A: Vana is a decentralized platform that allows users to upload their data and govern how it is used to train AI models.
- Q: How does Vana work?
A: Vana allows users to upload their data into encrypted digital wallets and disburse it to train models as they see fit. AI engineers can suggest ideas for new open-source models, and people can pool their data to help train the model.
- Q: What are data DAOs?
A: Data DAOs, or decentralized autonomous organizations, are data pools that are created when users contribute their data to train a new AI model.
- Q: How many users does Vana have?
A: Vana has over 1 million users and over 20 live data DAOs.
- Q: What is the goal of Vana?
A: The goal of Vana is to give users control over their own data and allow them to contribute to the development of new AI models, creating a more decentralized and equitable system.