Reverse Engineering Source Code of the Biontech Pfizer Vaccine: Part 2

I'm a little surprised there's not a complete toolchain for DNA by now, including assemblers, compilers and linkers, given that the Human Genome Project was completed decades ago.

You still need living things for the runtime, but hasn't AlphaFold basically solved that problem too?

Programming in DNA is like programming in assembly language, but a 7.5 KB assembly language program is well within the reach of a lot of people. Has anyone tried to write a 7.5 KB living thing or DNA-based tool from scratch? It doesn't necessarily even need to reproduce to be revolutionary: tiny genetically engineered transistors, structural fibers, or chemical reactors might all be super useful and have super simple DNA programs of only a few dozen lines of code. What makes this so hard?

mattalex · 5 years ago

The issue is the overall complexity and untestability of such a system: the (human) organism is very, very complex, and we don't have a model that describes it accurately in its entirety, or even well enough to do any kind of complex testing. This means that we instead look at the parts of the organism that are useful and that we can reasonably model.

Systems like AlphaFold2 only solve a tiny, tiny part of the problem (i.e. the question "given this amino acid sequence, what does this protein look like in 3D?"), but they don't address any of the other problems (like "how does structure X behave in environment Y?", "how do these N protein structures interact?", "how do proteins form complexes?", "how do proteins interact with RNA?", "how do you determine the location of amino acid side chains?" and, most importantly, "what does X do?").

I liken this to the problems we have in automatic theorem proving: we have formal logics and type theories that allow for automatic theorem proving, but we still need computer scientists or mathematicians even for sometimes fairly trivial tasks, because the scale of the problems is way too big to handle automatically. My AI professor framed it like this: "If you have a problem that would take you 10 minutes to solve, a theorem prover will solve it in 50 ms. If you have a problem that takes you 20 minutes, the theorem prover won't solve it." We are in a similar, if not worse, position in biology: we don't even know the system we're working in yet.

Comparing the protein sequence between the virus and vaccine, it looks like codons 986 and 987 code for different amino acids. I looked it up and it seems like a very important bioengineered constraint. Membrane fusion can be blocked by mutating S residues 986 and 987 to prolines, producing an S antigen stabilised in the prefusion conformation. The introduction of this two-proline substitution yields soluble prefusion coronavirus S ectodomains (overcoming a major hurdle in subunit vaccine dev).
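For anyone who wants to repeat the comparison, here is a minimal sketch of a residue-by-residue diff. The function name, the `offset` parameter, and the two 10-residue strings are my own illustration, not the full spike sequences: the window is a stand-in numbered to start at position 982, with K986 and V987 replaced by prolines as described above. It assumes the two sequences are already aligned and of equal length.

```python
def differing_residues(reference: str, variant: str, offset: int = 1):
    """List (position, reference_residue, variant_residue) for every mismatch.

    `offset` is the residue number of the first character, so a window cut
    out of a longer protein can keep its original numbering.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + offset, ref_aa, var_aa)
        for index, (ref_aa, var_aa) in enumerate(zip(reference, variant))
        if ref_aa != var_aa
    ]


# Illustrative stand-in window (assumption), numbered from residue 982.
virus_window = "SRLDKVEAEV"
vaccine_window = "SRLDPPEAEV"

for position, ref_aa, var_aa in differing_residues(
    virus_window, vaccine_window, offset=982
):
    print(f"{ref_aa}{position}{var_aa}")
```

On real data you would translate both nucleotide sequences first and pass the full protein strings with the default `offset=1`.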

wrikl · 5 years ago

The author talked about this in part 1 (https://news.ycombinator.com/item?id=25538820).

flobosg · 5 years ago

subroutine · 5 years ago

also...

This paper gives background on designing vaccines to maximize rapid antibody responses post-infection, ideally before the virus enters cells:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6936610/

This paper talks about how this is achieved specifically for SARS-CoV-2:

https://www.biorxiv.org/content/10.1101/2020.06.15.152835v1....

bitexploder · 5 years ago

Simplified statement, please?

flobosg · 5 years ago

Spike proteins in coronaviruses exist in two structurally distinct conformations: prefusion and postfusion. The conformational switch between them drives the fusion of the virus with the host cell.

To maximize the effectiveness of the vaccine, its spike protein should stay in the prefusion state no matter what. The two mutations in the engineered spike proteins allow exactly that.

dang · 5 years ago

Previous instalment: https://news.ycombinator.com/item?id=25538820

userbinator · 5 years ago

> This means you can typically change every codon into one of two others, and still code for the same amino acid.

I wonder if the authors have also steganographically encoded something in there...
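To make the redundancy concrete, here is a rough sketch that lists the synonymous alternatives for a codon using the standard genetic code, written with DNA letters (T instead of U). The helper names `synonyms` and `capacity_bits` are made up for this example, and the capacity figure is only a crude upper bound on how much could be hidden in synonymous choices; it ignores every real-world constraint on codon usage.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}


def synonyms(codon: str) -> list[str]:
    """Return the other codons that encode the same amino acid, sorted."""
    amino_acid = CODON_TABLE[codon]
    return sorted(
        other
        for other, aa in CODON_TABLE.items()
        if aa == amino_acid and other != codon
    )


def capacity_bits(sequence: str) -> float:
    """Upper bound on bits encodable by picking among synonymous codons."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    return sum(math.log2(len(synonyms(codon)) + 1) for codon in codons)


print(synonyms("GTT"))          # alternatives for valine
print(synonyms("ATG"))          # methionine has no alternatives
print(capacity_bits("GTTATG"))  # bits available in a two-codon toy sequence
```

The number of alternatives varies by amino acid (none for methionine and tryptophan, up to five for leucine, serine and arginine), so "one of two others" is a rule of thumb rather than a fixed count.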

oh_sigh · 5 years ago

Let's hope it's not the initials of the researcher's alma mater: https://www.wired.com/2003/01/he-loves-uk-her-uterus-doesnt/

> You can submit RNA sequences to this server of the Institute for Theoretical Chemistry at the University of Vienna and it will fold RNA for you. This is a very advanced server that does meticulous calculations.

ViennaRNA is also available as a standalone package[1] if you prefer to run the secondary structure predictions locally.

[1]: https://www.tbi.univie.ac.at/RNA/

csense · 5 years ago

I didn’t know of DNA Chisel (https://edinburgh-genome-foundry.github.io/DnaChisel/) until now. Thanks for the heads-up!

person_of_color · 5 years ago

Is it possible to make one at home (assuming you had Sam Zeloof-tier skills)?

jpeg_hero · 5 years ago

Sounds like the DNA printing and the transcription to RNA could be done, but not the “secret sauce” of the lipid bubble delivery?

mikeyouse · 5 years ago

Yeah, even Pfizer had to “outsource” the lipid nanoparticle delivery system to a speciality firm. Synthesizing the RNA is within the realm of possibility for a sufficiently advanced lab; combining that with a delivery system is another level of specialization.

daemonk · 5 years ago

Here is my take on this. This was a fun project: https://gist.github.com/damiankao/81b6ebd123b9ccf98e0e47f1dd...

This is a naive probabilistic method that calculates the probability of a base change conditioned on the base's position in the codon, the base itself, and the amino acid the codon encodes. It then applies those probabilities to the viral sequence to generate a vaccine sequence.

The sequence generation is not deterministic. Most of the time it appears to generate a sequence that matches the vaccine at ~87%.
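For readers who don't want to open the gist, the idea can be sketched roughly as follows. This is a reconstruction from the description above, not the gist's actual code: the function names, the way unseen cases are handled (the base is kept), and the toy training pair are all assumptions. Note that sampling each base independently does not guarantee the amino acid is preserved.

```python
import random
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code with DNA letters, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}


def split_codons(sequence):
    """Cut a sequence into complete codons, dropping any trailing bases."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def train(virus, vaccine):
    """Count which vaccine base follows each (amino acid, position, virus base)."""
    counts = defaultdict(Counter)
    for virus_codon, vaccine_codon in zip(split_codons(virus), split_codons(vaccine)):
        amino_acid = CODON_TABLE[virus_codon]
        for position, (old, new) in enumerate(zip(virus_codon, vaccine_codon)):
            counts[(amino_acid, position, old)][new] += 1
    return counts


def generate(virus, counts, rng):
    """Sample a new base per position from the learned counts."""
    result = []
    for codon in split_codons(virus):
        amino_acid = CODON_TABLE[codon]
        new_codon = ""
        for position, old in enumerate(codon):
            seen = counts.get((amino_acid, position, old))
            if not seen:
                new_codon += old  # assumption: keep bases never seen in training
                continue
            bases, weights = zip(*sorted(seen.items()))
            new_codon += rng.choices(bases, weights=weights)[0]
        result.append(new_codon)
    return "".join(result)


# Toy training pair (hypothetical): both valine codons end up as GTG.
model = train("GTTGTA", "GTGGTG")
print(generate("GTTGTA", model, random.Random(0)))
```

Passing an explicit `random.Random` makes a run repeatable, which helps when comparing match percentages across runs.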

*edit

I just realized the blog post was calculating the percentage of matching codons, not matching bases. By that measure, my method matches ~66%.
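The gap between the two numbers comes from the unit of comparison: a single wrong base spoils the whole codon. A small sketch of both metrics, with a made-up function name and toy sequences chosen only to show the effect:

```python
def percent_match(first: str, second: str, unit: int = 1) -> float:
    """Percentage of equal chunks of size `unit` (1 = bases, 3 = codons)."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    chunks_first = [first[i:i + unit] for i in range(0, len(first), unit)]
    chunks_second = [second[i:i + unit] for i in range(0, len(second), unit)]
    equal = sum(a == b for a, b in zip(chunks_first, chunks_second))
    return 100.0 * equal / len(chunks_first)


# Toy sequences (hypothetical): two of the nine bases differ.
generated = "GTTGCAAAA"
target = "GTGGCCAAA"

print(f"bases:  {percent_match(generated, target, unit=1):.1f}%")
print(f"codons: {percent_match(generated, target, unit=3):.1f}%")
```

Here 7 of 9 bases match but only 1 of 3 codons does, which is the same kind of drop as going from ~87% to ~66%.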