Guy Salvesen and Giovanni Paternostro interviewed Talmo Pereira, a Fellow and Principal Investigator at the Salk Institute. His lab builds computational tools that leverage deep learning and computer vision to study complex biological systems, and he has developed widely used software for tracking animal movements in studies of behavior (1).
Dear Talmo,
What could be achieved if there were a public or nonprofit AI effort with the same scale and level of funding as the current large private efforts? What would be the benefits for society?
Talmo:
That's a really great question, and frankly, it's one that needs to be addressed at multiple levels. I think that Europe is developing some interesting initiatives to centralize around clusters of high-performance computing that enable AI. When we talk about the kind of resources that are necessary to do AI, really what we're talking about is GPUs. This specialized hardware is very expensive, because NVIDIA has a monopoly on it, and because the supply chain is now deadlocked in geopolitics, with TSMC and the other manufacturers.
The CHIPS Act has helped a little bit, but at the moment all the software is designed around NVIDIA's chips, and it is really hard to break out of that unless you're at Google and using Google's TPUs, which are the only other feasible option. These massive pools of GPUs are needed to enable training of what we call foundation models.
These are large-scale models, of the same scale and capacity as ChatGPT. In science we've seen some of these already begin to emerge, certainly in protein folding, and also in multi-omics and omics in general. There has been a series of papers recently on training these foundation models on large-scale genetic data. That's an obvious application: the data are sequential and easy to encode in a way that makes sense. In the same way that ChatGPT has all these emergent properties and capabilities just from digesting a lot of text, you can imagine a lot of properties and capabilities emerging from doing the same with genetic information.
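To make "sequential and easy to encode" concrete, here is a minimal sketch of one common encoding scheme for genomic sequence models: overlapping k-mer tokens mapped to integer ids for an embedding layer, much as text is tokenized for an LLM. The scheme, the choice of k, and the toy sequence are illustrative assumptions, not a description of any specific foundation model.

```python
# A minimal, illustrative sketch: encode DNA as overlapping k-mer tokens,
# analogous to tokenizing text for a language model. Real genomic foundation
# models use a variety of tokenization schemes; this is just one common choice.

def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct k-mer to an integer id, ready for an embedding layer."""
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

seq = "ATGGCGTACGTTAGCCTA"           # toy sequence
tokens = kmer_tokenize(seq)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
print(tokens[:3], token_ids[:3])     # first few k-mers and their integer ids
```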
The most reductionist description I've heard is that you spend a hundred million dollars to do RNA-seq on every single cell in the brain, in any other part of the body, or even in cancer, and then you throw it all into a UMAP plot. That is a very simplistic picture of what AI models could enable. You could fine-tune these representations, steering the kinds of information captured in omics data, to do a lot more: everything from open reading frame prediction to augmenting protein folding and predicting binding sites.
If you include regulatory data and expression data, then the model can begin to infer the structure of regulatory networks. And if you have a cell state that you want to get to, you can do what's called a reverse perturbation.
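As a rough sketch of what a reverse perturbation could look like computationally, one can ask a trained model to predict the expression state after each candidate perturbation and rank the candidates by how close they land to the desired cell state. The model stub below (predict_expression) is a hypothetical placeholder, not an actual trained network; the ranking logic is the point.

```python
import numpy as np

# Hypothetical stand-in for a trained omics foundation model: given a baseline
# expression profile and a gene to knock out, predict the post-perturbation state.
def predict_expression(baseline: np.ndarray, knockout_gene: int) -> np.ndarray:
    perturbed = baseline.copy()
    perturbed[knockout_gene] = 0.0      # crude knockout
    return perturbed * 0.95             # placeholder for learned dynamics

def reverse_perturbation(baseline: np.ndarray, target: np.ndarray):
    """Rank single-gene knockouts by how close they move the cell to the target state."""
    scores = []
    for gene in range(len(baseline)):
        predicted = predict_expression(baseline, gene)
        distance = float(np.linalg.norm(predicted - target))   # smaller = closer
        scores.append((gene, distance))
    return sorted(scores, key=lambda s: s[1])                   # best candidates first

rng = np.random.default_rng(0)
baseline = rng.random(100)      # toy 100-gene expression profile
target = rng.random(100)        # desired cell state
ranking = reverse_perturbation(baseline, target)
print(ranking[:5])              # top five candidate perturbations to try at the bench
```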
Essentially, the big idea with these models is that they enable in silico experimentation. Virtual biology: not computational biology, not bioinformatics, but virtual biology. While something like bioinformatics seeks to process the data better, and computational biology seeks to model it better, virtual biology seeks to emulate the process of doing bench science with a sufficiently capable simulacrum of the biological system.
If you have a sufficient amount of data in a sufficiently capable model, it can reproduce some part of that biological system. And you can design it in such a way that it is not just a description of the data, not just a hypothesized model for it, but really a direct analog. You could point to a specific layer in your neural network and say, this corresponds to this gene, this corresponds to this neuron. Then the experiments that fall out of this type of modeling are directly testable.
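One way to picture "this layer corresponds to this gene" is a model whose parameters are explicitly indexed by named biological entities, so that every learned weight reads as a testable hypothesis about a regulatory interaction. The sketch below is a deliberately minimal, hypothetical PyTorch example; the gene names and the single linear layer are illustrative, not an architecture described in the interview.

```python
import torch
import torch.nn as nn

GENES = ["Sox2", "Pax6", "Olig2", "Gfap"]   # illustrative gene names

class InterpretableGRNLayer(nn.Module):
    """A linear layer whose weight[i, j] reads as 'regulator j -> target i'."""

    def __init__(self, genes):
        super().__init__()
        self.genes = genes
        self.weight = nn.Parameter(torch.zeros(len(genes), len(genes)))

    def forward(self, expression):
        # expression: (batch, n_genes) -> predicted change in expression
        return expression @ self.weight.T

    def named_interactions(self, threshold: float = 0.1):
        # Every entry above threshold is a directly testable prediction:
        # perturb the regulator, measure the target.
        hits = []
        for i, target in enumerate(self.genes):
            for j, regulator in enumerate(self.genes):
                w = self.weight[i, j].item()
                if abs(w) > threshold:
                    hits.append((regulator, target, round(w, 2)))
        return hits

layer = InterpretableGRNLayer(GENES)
with torch.no_grad():
    layer.weight[0, 1] = 0.8             # pretend training inferred Pax6 -> Sox2
print(layer.named_interactions())        # [('Pax6', 'Sox2', 0.8)]
```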
Guy:
Who will do the experiments needed to test these predictions?
Talmo:
We find that many experimental biologists would very gladly use an AI simulation of the work they're doing at the bench, one that can tell them how a new result, integrated with all the previous results, points to the next experiment. And maybe not just what happens if they do a specific experiment, but what happens if they do a whole class of experiments: for example, ablating every type of neuron or trying every type of stimulation pattern. The model will give you a ranked list of interventions that achieve the phenotype you want, and then you can go and do it in the lab. Scientists already go to NCBI and look at the genome browsers before doing a perturbation and designing CRISPR probes. These kinds of tools are already part of the workflow.
The challenge is to make AI models more accessible, in terms of both the technical barriers and the cost, such that they can become a run-of-the-mill tool. When we get there, you will not need folks like me anymore to run these things and tell you what the hypotheses are; my experimental collaborators will do it directly. That direct contact will help ground the algorithmic process in the types of questions that experimentalists want to ask.
There is one ongoing NSF initiative, NAIRR (the National Artificial Intelligence Research Resource), that is attempting to move us in that direction, but it is still at an early stage.
Another challenge I want to mention is infrastructural. I worked at Google during part of my PhD, and that made me appreciate how fundamentally important it is to have specialized support. It is not enough just to have the GPUs; you need to be able to harness them.
High-performance computing is a new form of computing, especially when it comes to GPUs. At Google they had all these systems to help you scale from 1 GPU to 1,000 or 10,000 GPUs, and many software engineers were dedicated to keeping all those GPUs up and running and to monitoring their progress. This support is invaluable.
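As an illustration of the kind of scaffolding he is describing, here is a minimal sketch of multi-GPU data-parallel training using PyTorch's DistributedDataParallel. The framework choice and the toy model are assumptions made for the sake of example (Google's internal stack is different); the interview's point is that the hard part is everything around the script.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process (one per GPU).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 10).cuda(local_rank)   # toy model
    model = DDP(model, device_ids=[local_rank])         # gradients synced across GPUs
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(100):
        x = torch.randn(32, 512, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()      # the all-reduce across GPUs happens here, transparently
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()   # launch with: torchrun --nproc_per_node=<num_gpus> train.py
```

The same script runs on one GPU or on thousands; what changes is the cluster, the interconnect, and the job scheduling around it, which is where the dedicated support becomes invaluable.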
Guy:
What can we accomplish in the public domain that large private players like Google DeepMind can't or won't do?
Talmo:
If we look at some fields in science and engineering, we can see clear examples of what that route might look like. We might look at space research. Initially, we wanted to compete with the Russians, so we poured billions upon billions of dollars into that basic research, and it was really spread out. A lot of it was done at NASA, but they also funded technology development everywhere. And then the knock-on effects were incredible. There are so many positive externalities of doing that. What ends up being the big advantage is that we are a little bit looser and less mission-oriented with public funding. You never know where the next great idea is going to come from. In the private sector you can see how Google is now decreasing the resources dedicated to open-ended science, which was done by DeepMind, and putting more resources into LLMs to face the competition from OpenAI.
We are encouraging researchers at different career stages to share ideas about complex science problems that could benefit from a large-scale AI effort. We have found that motivation and recognition could be provided if you and other well-known scientists were willing to talk to the people who suggest the best ideas. You would be the judge and decide whether any idea deserves your attention. A selected scientist might receive advice but could also become a collaborator. Many ideas will be produced, and society will take notice. Would you be willing to talk to any of these scientists?
Talmo:
Yes, of course. I have encountered competition, like any other scientist, but it does not stop me from pursuing a philosophy of openness, sharing, and collaboration. This is an approach to science that has served us quite well.
I think that, as the current generation of junior scientists grows up in science, there's going to be a rapid culture shift toward understanding how team science needs to evolve, and how recognition and credit assignment need to evolve. It's already happening. I sit on search committees now, and we do not just look at who is the first versus the co-first author; we try to understand exactly what their contributions were.
REFERENCES
1. Marx, V. 20 years of Nature Methods: how some papers shaped science and careers. Nat Methods 21, 1786–1791 (2024). https://doi.org/10.1038/s41592-024-02452-x