What if biology had its own version of the internet—and AI could learn from it to design new medicines?
That’s the ambition behind Basecamp Research. In a recent conversation with Andreas, co-founder Oliver Vince described the company in simple terms: “we’re trying to build ChatGPT, but for DNA.”
While most AI models in biology rely on a narrow slice of existing data, Basecamp is doing something fundamentally different. Instead of scraping what’s already known, the company is going out into the real world, mapping life on Earth and generating entirely new biological data at scale.
The Data Problem Holding Back Biology
One of the most striking insights from the conversation is just how limited today’s biological data really is.
Roughly 68% of all biological data comes from just a handful of species—humans, mice, and a few others. That means most AI models in biology are trained on an extremely narrow view of life.
Basecamp’s thesis is that this is the bottleneck.
Language AI improved by training on the entire internet. By the same logic, AI in biology will reach its full potential only once it can learn from the full diversity of life. That’s what led to the launch of the Trillion Gene Atlas: an effort to map biology at planetary scale and unlock entirely new training data for models.
From Expeditions to Foundation Models
What makes Basecamp unique is how it gathers that data.
The company started by running expeditions to some of the most remote environments on Earth—sequencing DNA in places like ice caps and volcanic ecosystems. In one early experiment, nearly 80% of the genetic data they collected was completely unknown to science.
That moment became the foundation of the company.
Today, Basecamp operates across more than 30 countries, working with local scientists and partners to build a global pipeline for genomic data collection. The goal isn’t just scale—it’s diversity. The harder-to-reach ecosystems are exactly where the most novel biology lives.
From Data to “Prompt-to-Medicine”
The second major breakthrough is what happens once that data is fed into models.
Basecamp’s EDEN models represent a shift from prediction to generation. Instead of only analyzing biological sequences, they can begin to design therapeutics directly.
As Oliver explained, the goal is a system where you can input disease biology and get a candidate treatment in return. Early results are already promising, with models generating functional outputs across different therapeutic areas.
It’s still early—but the direction is clear.
Why This Matters Now
A key theme in the conversation is timing.
Advances in sequencing, compute, and AI are all compounding at once. The cost of generating biological data is dropping rapidly, while model capabilities are improving just as quickly. Basecamp is positioning itself at the intersection of these trends—building the data layer that makes everything else possible.
And importantly, this isn’t just about bigger models.
As Oliver noted, synthetic data works in domains where you can simulate reality—but biology isn’t one of them. To truly understand life, you need real-world data at massive scale. That’s the gap Basecamp is filling.
The Long-Term Vision
The end state is ambitious: a world where biology becomes programmable.
In that world, researchers could generate therapies directly from data instead of spending years on trial and error, and AI systems could reason across biology the way language models reason across text.
We’re not fully there yet, but progress is accelerating.
For Basecamp, the launch of the Trillion Gene Atlas isn’t the finish line. As Oliver put it, it’s the starting gun for the next phase—scaling the data, models, and infrastructure needed to make that future real.
And if they succeed, it could redefine not just drug discovery but how we understand life itself.








