Can an AI Predict the Language of Viral Mutation?

Viruses lead a rather repetitive existence. They enter a cell, hijack its machinery to turn it into a viral copy machine, and the copies head off to other cells armed with instructions to do the same. So it goes, over and over again. But quite often, amid all this copying, things get garbled. Mutations arise in the copies. Sometimes a mutation means an amino acid isn't produced and a vital protein doesn't fold, so that viral version is consigned to the rubbish bin of evolutionary history. Sometimes the mutation does nothing at all, because different sequences can encode the same protein, absorbing the error. But from time to time, a mutation works out perfectly. The change doesn't impair the virus's ability to exist; instead, it produces a useful difference, such as making the virus unrecognizable to a person's immune defenses. When this allows the virus to evade antibodies generated by a previous infection or a vaccine, that mutant variant of the virus is said to have "escaped."

Scientists are always looking for signs of potential escape. That is true for SARS-CoV-2, as new strains appear and researchers investigate what the genetic changes could mean for a long-lasting vaccine. (So far, things are looking good.) It's also what bedevils researchers who study flu and HIV, viruses that routinely evade our immune defenses. So, in an effort to see what may be coming, researchers create hypothetical mutants in the lab and test whether they can evade antibodies drawn from recent patients or vaccine recipients. But the genetic code offers too many possibilities to test every evolutionary branch the virus might take over time. It's a struggle to keep up.

Last winter, Brian Hie, a computational biologist at MIT and an admirer of John Donne's poetry, was thinking about this problem when he hit upon an analogy: What if we thought about viral sequences the way we think about written language? Each viral sequence, he reasoned, has a kind of grammar, a set of rules a sequence must follow to be that particular virus. When mutations violate that grammar, the virus hits an evolutionary dead end. In virological terms, it lacks "fitness." Like language, too, a sequence could be said to have a kind of semantics, from the perspective of the immune system. There are some sequences the immune system can interpret, and thus stop the virus with antibodies and other defenses, and some that it cannot. Viral escape, then, could be seen as a change that preserves a sequence's grammar but changes its meaning.
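One way to make the analogy concrete is as a scoring rule: a mutant is a candidate for escape when it stays "grammatical" (probable under a model of viable sequences) while shifting its "semantics" (its learned representation). The sketch below is a toy illustration of that idea only, not Hie's actual model; the bigram counts standing in for a language model, and every function name, are invented for this example.

```python
# Toy sketch of escape-as-language: rank mutants by
# grammaticality (fitness proxy) x semantic change (representation shift).
# A bigram count model stands in for a real neural language model.
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino-acid letters

def bigrams(seq):
    """All adjacent letter pairs in a sequence."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]

def train(sequences):
    """Count bigram frequencies across known-viable sequences."""
    counts = Counter()
    for s in sequences:
        counts.update(bigrams(s))
    return counts

def grammaticality(seq, counts):
    """Fitness proxy: average smoothed frequency of the sequence's bigrams."""
    total = sum(counts.values())
    pairs = bigrams(seq)
    return sum((counts[b] + 1) / (total + len(AMINO_ACIDS) ** 2)
               for b in pairs) / len(pairs)

def embedding(seq):
    """Crude 'semantic' representation: the sequence's own bigram profile."""
    c = Counter(bigrams(seq))
    return [c[a + b] for a, b in product(AMINO_ACIDS, repeat=2)]

def semantic_change(mutant, wild_type):
    """How far the mutant's representation moves from the wild type's."""
    return sum(abs(x - y) for x, y in zip(embedding(mutant), embedding(wild_type)))

def escape_score(mutant, wild_type, counts):
    """High score: still 'grammatical' (fit), yet 'means' something new."""
    return grammaticality(mutant, counts) * semantic_change(mutant, wild_type)
```

Given a handful of viable training sequences, a mutant identical to the wild type scores zero (no semantic change), while a point mutant that remains plausible but shifts the bigram profile scores higher; a real system would replace both proxies with a trained neural language model.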

The analogy had a simple elegance, almost too simple. But to Hie it was also practical. In recent years, artificial intelligence systems have become very good at modeling principles of grammar and semantics in human language. They do this by training a model on data sets of billions of words, arranged in sentences and paragraphs, from which the model distills patterns. In this way, without being told any specific rules, the model learns where the commas should go and how to structure a clause. It can also be said to intuit the meaning of certain sequences, words and phrases, based on the many contexts in which they appear across the data set. It's patterns, all the way down. This is how the most advanced language models, such as OpenAI's GPT-3, can learn to produce perfectly grammatical prose that manages to stay reasonably on topic.

One advantage of this idea is that it's generalizable. To a machine learning model, a sequence is a sequence, whether it's arranged as sonnets or amino acids. According to Jeremy Howard, an AI researcher at the University of San Francisco and an expert on language models, applying such models to biological sequences can be fruitful. With enough data, for example, from genetic sequences of viruses known to be infectious, the model will implicitly learn something about how infectious viruses are structured. "That model will have a lot of sophisticated and complex knowledge," he says. Hie knew this could be the case. His graduate advisor, the computer scientist Bonnie Berger, had previously done similar work with another member of her lab, using AI to predict protein folding patterns.
