Dear all,
I am currently working on a deep dive into $DNA (Ginkgo Bioworks). I believe it has the potential to become the next AWS-like platform. Since $DNA is all about genetic engineering, I am first sending out a brief write-up to help you understand the key technological advances in the genomics and proteomics space. This write-up will not only help you understand $DNA better, but will also help prepare you to invest in a field that is likely to transform our economy.
I will be releasing the $DNA deep dive before the end of this month.
Best wishes to all.
Genomics
Genomics is essentially the study of the genome: the information that guides the development and functioning of all known living creatures. The human genome, for instance, is simply a set of instructions that determines how your entire body is built and how it functions. The genome is composed of deoxyribonucleic acid, otherwise known as DNA. In mammals, DNA is found within the cells, mostly within the nucleus. The cells of mammals are eukaryotic cells: cells that contain a nucleus and membrane-bound organelles (small “machines” within the cell that perform different functions).
One of these machines is the ribosome. The ribosome is in charge of synthesizing proteins, which are the building blocks of our bodies and of other living creatures. How do ribosomes create proteins, you may wonder? First, the relevant stretch of DNA in the nucleus gets transcribed into what is known as ribonucleic acid, or RNA, a closely related molecule that acts as a working copy of the gene. Ribosomes then read the RNA three letters at a time and translate it into a chain of amino acids. We can map DNA to amino acids, meaning that we know which amino acid sequence a given DNA sequence will ultimately yield.
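To make the DNA-to-protein mapping concrete, here is a minimal Python sketch. It assumes the standard genetic code, in which RNA is read in three-letter groups called codons, and it assumes the input is the coding strand of the DNA, so that transcription amounts to swapping T for U. The function names and the example sequence are illustrative and are not taken from any particular library.

```python
from itertools import product

# Standard genetic code: codons enumerated in U, C, A, G order,
# paired with one-letter amino acid codes ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA coding strand -> messenger RNA (assumption: T simply becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the RNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGTGCATCTGACTCCTGAGGAGAAGTAA"  # short illustrative sequence
mrna = transcribe(dna)
print(mrna)             # AUGGUGCAUCUGACUCCUGAGGAGAAGUAA
print(translate(mrna))  # MVHLTPEEK
```

Each letter in the output stands for one amino acid, so the ten codons above yield a chain of nine amino acids followed by a stop signal.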
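Amino acids are organic molecules with both an amino group and a carboxyl group. The R side chain is what differs from one amino acid to the next.
There are 20 standard amino acids, each with its own R-group. Each has a specific name and can be represented by a single letter (often, but not always, the first letter of its name), as such:
A (alanine), R (arginine), N (asparagine), D (aspartic acid), C (cysteine), Q (glutamine), E (glutamic acid), G (glycine), H (histidine), I (isoleucine), L (leucine), K (lysine), M (methionine), F (phenylalanine), P (proline), S (serine), T (threonine), W (tryptophan), Y (tyrosine), V (valine).
Essentially, the above letters constitute the alphabet of a programming language for building proteins. These letters can be combined in many ways to create many proteins, as you will find out in the next few paragraphs.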
The key thing about amino acids is that they link up by joining the carboxyl group of one amino acid to the amino group of the next, forming long chains, through a chemical process known as dehydration synthesis (so called because each link releases a water molecule). The long chains then fold into compact three-dimensional structures, otherwise known as proteins. You may find it fascinating that folding is driven by weak interactions between different parts of the chain: hydrogen bonds, Van der Waals forces, electrostatic attraction between charged groups, and the tendency of water-repelling side chains to bury themselves inside the protein. These interactions are weak individually, but together they form the basis of life. For instance, hydrogen bonds between water molecules are also what enable water to make its way up a plant.
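The water-releasing step can be checked with simple atom bookkeeping. The Python sketch below adds up the atoms of the free amino acids in a chain and subtracts one water molecule (H2O) for every link formed. Only four amino acids are included to keep it short, and the function and variable names are hypothetical.

```python
from collections import Counter

# Atom counts of a few free amino acids, keyed by one-letter code.
FREE_AMINO_ACIDS = {
    "G": Counter(C=2, H=5, N=1, O=2),   # glycine
    "A": Counter(C=3, H=7, N=1, O=2),   # alanine
    "V": Counter(C=5, H=11, N=1, O=2),  # valine
    "E": Counter(C=5, H=9, N=1, O=4),   # glutamic acid
}
WATER = Counter(H=2, O=1)


def peptide_formula(sequence):
    """Atom counts of a chain: sum of the free amino acids minus one water per link."""
    total = Counter()
    for residue in sequence:
        total += FREE_AMINO_ACIDS[residue]
    for _ in range(len(sequence) - 1):  # n amino acids form n - 1 links
        total -= WATER
    return dict(total)


print(peptide_formula("GA"))  # {'C': 5, 'H': 10, 'N': 2, 'O': 3}
```

Glycine and alanine together contain 12 hydrogen and 4 oxygen atoms, while the linked pair contains 10 and 3: exactly one water molecule has left.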
Once proteins have folded, it is their shape that determines their function. As such, by passing a specific RNA sequence through a ribosome, we can create a protein with the shape, and thus the function, that we desire. This is how the mRNA COVID vaccines work: a specific RNA code is packaged inside a tiny fat bubble (a lipid nanoparticle), which, once injected into your body, delivers the code into your cells, where it is read by ribosomes, which produce the S (spike) protein of the virus, which ultimately triggers the immune response.
What is the result of the immune response? Among other things, a protein with a shape that enables it to latch onto the S protein (the protein that the COVID-19 virus uses to dock with your cells and release its malign information) and ultimately render it functionless. This protein is otherwise known as an antibody.
Proteomics
The function of proteins leads us on to proteomics, which is the study of how proteins perform different functions according to their shape (in my own words). Consider the protein hemoglobin, for instance, which is in charge of transporting oxygen to tissue. When it has been adequately formed (as a result of the correct amino acid sequence coming out of the ribosome), it acquires a shape that produces no attraction between the hemoglobin proteins in the blood, so they do not bind to each other. When the 6th amino acid of one of its chains (the beta chain), normally glutamic acid, is erroneously synthesized as valine, the protein takes on a slightly different shape. This shape creates a sticky patch that attracts other hemoglobin proteins and brings them together into fibers, which inhibits their function.
So a slight error in the original DNA code can lead to a slight error in the amino acid synthesis, which in turn causes a wide systemic failure: oxygen doesn't move around the body properly. This is the origin of sickle cell disease.
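The chain of events above, from one wrong DNA letter to one wrong amino acid, can be sketched in a few lines of Python. The sketch assumes the standard genetic code and uses the opening codons of the beta-globin gene as commonly quoted; treat the sequence as illustrative rather than as a clinical reference. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code written for DNA codons ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate a DNA coding strand codon by codon, stopping at a stop codon."""
    protein = ""
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


def substitute(dna, position, new_base):
    """Return a copy of the DNA with a single letter changed (zero-based position)."""
    return dna[:position] + new_base + dna[position + 1:]


healthy_dna = "ATGGTGCATCTGACTCCTGAGGAGAAG"
sickle_dna = substitute(healthy_dna, 19, "T")  # GAG becomes GTG

healthy = translate(healthy_dna)
sickle = translate(sickle_dna)
print(healthy)  # MVHLTPEEK
print(sickle)   # MVHLTPVEK

# List the positions where the two proteins differ.
changes = [(i, a, b) for i, (a, b) in enumerate(zip(healthy, sickle)) if a != b]
print(changes)  # [(6, 'E', 'V')]
```

The leading M is trimmed off the finished protein, which is why the changed letter is counted as the 6th amino acid: a single A-to-T swap turns glutamic acid (E) into valine (V).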
The hemoglobin example illustrates the power and promise of proteomics. By understanding the shape of proteins and their resulting functions, we can go back to the genomic level and understand the specific origin of many diseases. We can also aim to fix these diseases through gene editing, with technologies such as CRISPR/Cas9.
CRISPR/Cas9
CRISPR/Cas9 is a novel technology that enables us to cut pieces of DNA, just as we would with scissors, and to stick pieces of DNA together, just as we would with glue. Essentially, we can synthesize bits of RNA that bind, through base pairing, to specific and predictable sections of DNA, and the Cas9 protein makes a cut wherever the RNA molecule binds to the DNA. After the cut, the cell naturally seeks to repair the DNA with available genetic material, which, in the case of editing, we supply as donor material.
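The targeting logic can be sketched in Python as a simple text search. The sketch adds two details not covered above, so treat them as assumptions: the commonly used Cas9 only cuts when the matched site is immediately followed by a short "NGG" motif (any letter followed by two Gs), and the cut falls three letters before that motif. It searches one strand only, requires a perfect match, and models repair as a plain insertion of donor text, all of which are heavy simplifications. The function names and sequences are hypothetical.

```python
def find_cut_sites(dna, guide_rna):
    """Return the positions where Cas9 would cut, searching one strand only."""
    dna = dna.upper()
    target = guide_rna.upper().replace("U", "T")  # DNA spelling of the guide
    sites = []
    start = dna.find(target)
    while start != -1:
        motif_start = start + len(target)
        motif = dna[motif_start:motif_start + 3]
        # Assumed rule: a cut needs an "NGG" motif right after the match.
        if len(motif) == 3 and motif[1:] == "GG":
            sites.append(motif_start - 3)  # assumed cut point: 3 letters before the motif
        start = dna.find(target, start + 1)
    return sites


def insert_donor(dna, cut_position, donor):
    """Simplified repair: paste the donor material in at the cut."""
    return dna[:cut_position] + donor + dna[cut_position:]


guide = "GACGUUACCGGAUCUUAGCA"  # 20-letter guide RNA (made up)
dna = "CCATA" + "GACGTTACCGGATCTTAGCA" + "TGG" + "TTCA"

cuts = find_cut_sites(dna, guide)
print(cuts)  # [22]
print(insert_donor(dna, cuts[0], "AAA"))
```

Changing a single letter of the guide makes the search come back empty, which is the sense in which the RNA makes the cut specific and predictable.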
This is just one of the technologies available for gene editing, but the point is that now we can edit genes. Since we increasingly know more about what genetic information leads to proteins of what shape (and hence of what function), we are now able to modify the genome to cure illnesses.
The (Bioinformatics) Platform: AI, Genomics and Proteomics
The genome is a very large dataset, and the organisms that emerge from it even more so. The body is a very complex mechanism. For instance, 30% of mammalian proteins do not fully acquire the shape that makes them functional until they interact with another target molecule or protein. This by itself adds tremendous complexity to understanding function.
However, we now also have the capacity to store large amounts of data and perform large amounts of computation. This is the basis for AI, which in turn allows us to process large datasets, extract insights and find associations. A human cannot map a very large number of genomic sequences to a very large number of illnesses or protein structures, for instance, but a sufficiently advanced collection of neural networks can. This is the future we are heading towards. In around 20 years, I believe we will know which parts of the genome to tweak to cure specific illnesses that today seem incurable, thanks to the confluence of AI, genomics and proteomics. For instance, DeepMind trained a system known as AlphaFold that can predict the shape of a protein from its amino acid sequence with very high accuracy.
Investing
I will dedicate another post or series of posts to specific stocks. Here, I will outline the mental framework that I believe enables one to invest in this platform, and it's actually very simple. If FAANG's market cap is above 3 trillion USD, then one can only imagine what the FAANGs that result from this platform may be worth if they cure things like cancer and other illnesses that haunt us today.
The platform starts with sequencing, because that is how we get the data, and ends with editing, because that is how we deliver the value. I have discussed the data value chain in this post. It is an entire topic in itself and worth visiting extensively, since I believe it will be the underlying current for most if not all industries in the future, if it is not already.
In terms of investing, unless one has very deep knowledge of the domain, I would not venture beyond this framework (as of today). This is because the ramifications of this platform are pretty much unpredictable. We do not know which companies will cure which illnesses and when, but it is fair to say that whatever cures emerge will do so from this platform.
The analogy is the smartphone. When that platform was getting ready to go, it would have been hard to predict the ramifications, but you could kind of tell that innovation was going to happen on it and that, ultimately, value was going to be delivered in all sorts of ways. The same applies here. A diversified approach is highly recommendable, and could also be highly lucrative, per the above FAANG comparison.
Beyond an investing point of view, I am also thrilled to see this platform really make us all better off through time.
If you enjoyed this article, remember to subscribe for free to my newsletter for more! Also, please share the post with friends that you think may find it useful. Thank you so much in advance for your support!
You can also reach me at:
Twitter: @alc2022
LinkedIn: antoniolinaresc