Angika Basant

The Origins of COVID-19: the truth may not excite you

Biomedical science lacks the sheen and celebrity of space research, wouldn’t you say? Space researchers in movies go on fantastical, thrilling adventures of discovery, but a biologist often goes rogue and makes unstable mutants that pose a threat to our existence. Colossal investment in looking for life outside our planet garners so much fanfare. But investigations of life on the only inhabitable place we know of can become shrouded in controversy and invite wrath. In the past weeks, debates over the origins of SARS-CoV-2 have been (re)appearing everywhere I look - in mainstream print media, news podcasts, scientific journals, and family messaging groups. Most virologists say this virus appeared naturally in humans as a zoonotic transmission, directly or indirectly from bats, but some worry it could be the result of scientific experiments and/or could have accidentally leaked from a lab. Angry people ask why such dangerous research is allowed and why we don’t know the origins of this pandemic.


An article by Nicholas Wade contains many of the arguments behind the lab-origin theory. It is a long write-up that cites many sources, and ostensibly appears thoughtful and well-researched. It has been reproduced on media platforms of repute such as TheWire.in in India and used by Shekhar Gupta (ThePrint) to script a 34-minute episode (where Gupta also equates natural origins of a virus to an act of God). Even when news outlets report cautiously, they conclude by stating that there is not enough evidence for either a lab leak or natural origin of this virus. This is misleading. The points raised by Nicholas Wade have been refuted, but the tenets of the arguments are complex. I try to explain here why the notion that SARS-CoV-2 came from a lab is presently speculative.



In a natural origin scenario, a precursor bat coronavirus infected an animal wherein it evolved to look like SARS-CoV-2. This animal (technically called an intermediate host) could be a pangolin (discussed briefly later) or another unidentified animal that passed on the virus either directly to humans or via yet another animal. Alternatively, this initial animal recipient could have been a human. Like other animals, a human host could allow the virus to gradually morph into what we all recognise as the pathogen behind the COVID-19 pandemic. The lab leak theory suggests that this virus was created in a lab either by repeatedly growing viruses and essentially evolving them in a dish, or by cutting and pasting genome sequences to create a new virus. I address why viral sequences do not indicate that SARS-CoV-2 arose in these ways. A final scenario is that a coronavirus sample that researchers collected from the wild infected a lab worker and evolved outside the lab. Unfortunately, this could be virtually indistinguishable from the natural origin pathways described above.


One of six seemingly ominous questions (grey circles in the diagram below) typically pops up when someone suggests that a lab-leak may have occurred. In green I briefly summarise plausible explanations for each. If substantial evidence to the contrary appears, the scientific community would want it thoroughly investigated. But for now, let’s unpack what we know. The corresponding red numbers in the main text guide you to more on each question. If you’d like to dive deeper, I found it informative to read the WHO report on Phase One of its investigations and interviews (here and here) with members of the team looking into origins of this virus.




Why did this pandemic start at the “doorstep” of the Wuhan Institute of Virology (WIV)?


1. Coronaviruses were first discovered in the 1960s to cause the common cold but until SARS1 broke out in southern China in 2002, these viruses were considered benign. As is often the case, emerging pathogens become the focus of research at institutes in their home country. The WIV has been a leading centre for coronavirus research in the last couple of decades and Shi Zheng-Li is a well-known expert here. Her work includes sampling coronavirus variants from bats in various parts of China, studying their genomes to catalogue and understand them, to better prepare for or prevent a future pandemic. She was trained in France and has collaborated with teams around the world, published and shared her work in reputed journals and conferences over the years. She and her lab have become central figures in the origin story of the virus. Did they make or accidentally release this deadly pathogen?


In 2012 a group of miners in Mojiang county, Yunnan province fell ill with severe pneumonia. The samples collected from the mining cave by Zheng-Li’s lab have received a lot of attention, particularly following a report by the Wall Street Journal. These samples are touted as the only connection between bat caves in southern China and Wuhan, where COVID-19 broke out. Significantly, however, Wuhan is also a key centre for a booming wildlife trade for food and traditional medicine, which is a ~$70 billion industry in China and employs 14 million people. The hazards in this industry are highlighted well by an LA Times reporter who says “[z]oonotic diseases are a risk in all animal farming, including that of poultry and livestock, such as chickens and cows. But wildlife poses a higher danger because humans are largely unaware of the diseases wild animals could carry. China’s lax regulation results in wildlife farmers often mixing many types of animals together, increasing virus mutation and transmission.” It may also be important that an African swine fever outbreak in 2018-2019 had led to large-scale culling of livestock in China. This may have placed a greater demand on wildlife trade for food supply at the time (as reported here based on analyses in this preprint study).


2. Another shadow was cast on the WIV because three members of its staff reportedly fell ill in November 2019 and sought hospital care. This story emerged from a United States intelligence report in 2020 during the Trump administration. But the evidence behind this claim has not come to light in the last year. The WHO has invited anyone with more information to contact them, even anonymously, but no leads have appeared. The WIV reported to the WHO that serological test results of their staff were negative for COVID. Without more evidence, and given that this time of year was influenza season in China, it is difficult to infer more. But let’s say they contracted an illness from a virus present in WIV. What viruses could they have encountered?


What does the genetic fingerprint of the virus tell us?


In February 2020, Zheng-Li’s group published the genome sequence of a bat coronavirus sample named RaTG13, which was collected from the Mojiang cave in Yunnan, and compared it with the SARS-CoV-2 genome sequence. These genomes were reported to be 96.2% identical, making RaTG13 the closest known relative of SARS-CoV-2 (other coronaviruses such as SARS-CoV and MERS are only about 76% and 50% identical to SARS-CoV-2 at the whole genome level respectively). Though the figure 96.2% may sound like the two viruses are nearly the same, this is actually a big difference as far as genomes go. Humans share about 98% of their genomes with chimps and bonobos. For the size of the coronavirus genome, a difference of 3.8% implies that more than 1000 “alphabets” in the genome are different between RaTG13 and SARS-CoV-2. This is a considerable dissimilarity.
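To put a number on that 3.8%, here is a back-of-the-envelope sketch. It assumes a genome length of 29,903 letters (the length of the SARS-CoV-2 reference sequence, a figure not quoted in this post) and treats identity as a simple per-letter percentage, ignoring gaps and other alignment details.

```python
# Rough estimate of how many genome "letters" differ between RaTG13 and SARS-CoV-2.
# ASSUMPTION: genome length of 29,903 nucleotides (SARS-CoV-2 reference sequence).
# The 96.2% identity figure is the one quoted in the text.
GENOME_LENGTH = 29_903
IDENTITY = 0.962


def differing_letters(genome_length: int, identity: float) -> int:
    """Return the approximate number of positions that differ."""
    return round(genome_length * (1.0 - identity))


n_diff = differing_letters(GENOME_LENGTH, IDENTITY)
print(f"About {n_diff} of {GENOME_LENGTH} letters differ")
```

Under these assumptions the count comes out above 1,100, which is where "more than 1000" comes from.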


One proposal is that the virus could have been repeatedly grown in the lab and mutated in a dish to change its sequence. Shi Zheng-Li has said that they were never able to culture the RaTG13 virus and only managed to sequence it. Even if they had grown it in the lab, given the mutational rate of coronaviruses, it has been calculated that it could not have changed 3.8% of its genome in cell culture since its discovery in 2013. This would also hold for RaTG13 accidentally escaping from the lab; it could not have evolved in humans to become SARS-CoV-2 in such a short time either.
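The shape of that argument can be sketched with simple arithmetic. The substitution rate below (about one change per 1,000 letters per year) is an assumption chosen for illustration, in the range often quoted for SARS-CoV-2 circulating in people. It is not the figure used in the calculation referred to above, and rates in cell culture could differ. The model is also deliberately crude: it assumes changes pile up at a steady rate along a single lineage.

```python
# How much change would a steady mutation rate produce between 2013 and 2019,
# and how long would a 3.8% difference take to build up?
# ASSUMPTION: ~1e-3 substitutions per site per year (illustrative only).
ASSUMED_RATE_PER_SITE_PER_YEAR = 1e-3
OBSERVED_DIVERGENCE = 0.038    # 100% - 96.2% identity, from the text
YEARS_AVAILABLE = 2019 - 2013  # sample collected in 2013, outbreak in late 2019


def expected_divergence(rate: float, years: float) -> float:
    """Fraction of the genome expected to change over the given time."""
    return rate * years


def years_needed(divergence: float, rate: float) -> float:
    """Time needed to accumulate the given divergence at a steady rate."""
    return divergence / rate


accumulated = expected_divergence(ASSUMED_RATE_PER_SITE_PER_YEAR, YEARS_AVAILABLE)
needed = years_needed(OBSERVED_DIVERGENCE, ASSUMED_RATE_PER_SITE_PER_YEAR)

print(f"Expected change in {YEARS_AVAILABLE} years: {accumulated:.1%}")
print(f"Years needed for {OBSERVED_DIVERGENCE:.1%}: about {needed:.0f}")
```

With this assumed rate, six years accounts for well under 1% of the genome, and a 3.8% gap would take decades rather than years.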


Lab-leak proponents also argue that there is more hidden away in WIV that is unpublished. Shi Zheng-Li presented her lab’s work to the WHO team when they visited and has also responded to a list of pointed questions from the journal Science to emphasise that this is the only conclusive and relevant data they had. Given that she has been an active member of the community that studies coronaviruses, I would like to assume that she has shared the entirety of what she knows. But unless an audit is conducted, this idea cannot be put to rest. And if such an audit yielded nothing, it could still be said that something was covered up. Another theory is that the RaTG13 virus could have been used as a parent virus to deliberately insert different sequences and make SARS-CoV-2 or its ancestor. Genomes stitched together typically leave signs, such as “cut/paste” sequences of enzymes used for the design. But these have not been found. If an unknown parent virus or a previously undescribed method to make viruses was used, then that would also lead us to a lab audit conundrum.


3. The idea that SARS-CoV-2 has “peculiar man-made bits” inserted in its genome has arisen from scrutiny of the spike protein sequence. The string of amino acids that make up the SARS-CoV-2 spike protein differs from RaTG13 in two main ways. These features have been flagged as highly concerning and indicative of possible manipulation of the virus in a lab. Firstly, in a patch of the spike protein that binds the human receptor ACE2, 6 contact points differ between SARS-CoV-2 and RaTG13. It has been argued that this could have been engineered. However, computationally, these changes are not predicted to improve binding to human cells. Furthermore, these changes in contact points have recently been shown to occur naturally. Viruses isolated from Malayan pangolins match SARS-CoV-2 in 5 out of 6 of them, as described in studies published after the pandemic began.


The second feature of the SARS-CoV-2 protein that has gotten much coverage is something called a furin cleavage site (FCS). This is a short stretch of amino acids in the spike protein that can be snipped by a protein from the host that acts as a molecular scissor (this is an enzyme that cuts a protein when it sees a particular pattern of amino acids in it). The snipped spike protein helps the virus and host membranes pull closer together and ultimately fuse, so that the virus enters and initiates infection. Alright, so what’s so fishy about the FCS in SARS-CoV-2 spike?


One, Nicholas Wade and others have argued that this type of FCS is not found in SARS-like coronaviruses and therefore must be engineered. This is patently false. For example, the spike proteins of MERS and HKU1 (a coronavirus that causes mild symptoms in humans) carry this FCS sequence. This was discussed in a peer-reviewed letter to Nature by a group of virologists in early 2020. In fact, the FCS in SARS-CoV-2 spike is sub-optimal, and recent SARS-CoV-2 variants such as the “Delta” variant show an improved version of the site, which may explain their greater infectivity. But why would two coronaviruses like MERS and SARS-CoV-2, which are otherwise rather dissimilar when you consider all the alphabets of their genomes, have a common feature like the FCS? An article in the journal Nature rationalises that “Because viruses containing the site [FCS] are scattered across the coronavirus family tree, rather than confined to a group of closely related viruses, Stephen Goldstein, a virologist at the University of Utah in Salt Lake City, says the site probably evolved multiple times because it provides an evolutionary advantage. Convergent evolution — the process by which organisms that aren’t closely related independently evolve similar traits as a result of adapting to similar environments — is incredibly common.” Let me give you another example of convergent evolution. Both bats and whales find their food and prey by a method called echolocation. The two animals are very different from each other, but they converged on a common trait because their needs demanded it. Bats need to find food in the dark and whales need to do so in the depths of the ocean. It is not surprising then if two distant viruses choose a common mechanism to infect their hosts.


A second aspect of the FCS that has generated waves in the media has to do with the genetic code that serves as the template for its protein sequence. As you may know, “alphabets” of a genetic sequence in living organisms get “translated” into amino acids which when beaded together make a protein. This is done using three-letter codes called codons. Codons are “read” by the machinery in a cell to put in the corresponding amino acid in the protein being made. The number of possible codons outnumbers that of the corresponding amino acids (of the 64 possible codons, 61 code for 20 amino acids and 3 signal “stop”), so there is some redundancy built into this code. Most amino acids have more than one codon designated to them, but not all codons for a given amino acid are used with the same frequency. Every organism has preferred codons it likes to put in its genes. It has been pointed out by some (including Nobel laureate David Baltimore) that the codons in the SARS-CoV-2 FCS are suspicious and rare for a coronavirus, and therefore they must have been put there by someone. This has been debunked. The codons, though uncommon, do exist naturally in coronaviruses 2-7% of the time. Remarkably, these FCS codons have not changed in >99% of human COVID samples sequenced so far over the course of the pandemic, suggesting that the virus is not mutating away from an “unnatural” genetic sequence. David Baltimore has told Amy Maxmen (Nature) that he agrees with this assessment, as also reported in the Guardian, but he thinks there are other possibilities too.
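The redundancy in the genetic code is easy to see by writing the code out. The sketch below builds the standard genetic code table (using DNA letters) and counts how many codons map to each amino acid. The variable names are only illustrative. Arginine (R) is printed as an example of an amino acid with several interchangeable codons.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, with DNA letters in the conventional T, C, A, G order.
# Each character below is the one-letter amino acid for the matching codon;
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every three-letter codon (TTT, TTC, ...) to its amino acid.
codon_table = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode.
codons_for = defaultdict(list)
for codon, aa in codon_table.items():
    codons_for[aa].append(codon)

# Leave out the stop signal to count only real amino acids.
sense = {aa: codons for aa, codons in codons_for.items() if aa != "*"}
single_codon = sorted(aa for aa, codons in sense.items() if len(codons) == 1)

print(len(codon_table), "codons,", len(sense), "amino acids")
print("Arginine (R):", codons_for["R"])
print("Amino acids with only one codon:", single_codon)
```

Running it shows that most amino acids have several codons, while methionine (M) and tryptophan (W) have just one. Which of the interchangeable codons an organism prefers is a separate question of usage frequency that this table does not capture.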


It is very tempting to think that we are in this horrific pandemic because of a defined event, planned or accidental. The messy course of evolutionary changes in viruses that may have led us here is hard to fathom. What we know about the diversity of bat coronaviruses barely scratches the surface. But the puzzle may be pieced together eventually. Studies are now being published of newly sequenced viruses from bats captured many years ago in Cambodia, Thailand and Japan. These possess many of the spike protein features that were first seen in SARS-CoV-2. None of these sequences were determined until after the COVID-19 pandemic began. These viruses may not be direct relatives of SARS-CoV-2 but as they appear to have some of its traits, these new data are important when discussing its ancestry. For example, the comparisons made in Indian biologist P. Balaram’s opinion piece are incomplete and misleading without these new viruses to fill in the SARS-CoV-2 family tree.


Why is this taking so long and what leads do we have?


4. As far as zoonotic spill-overs go, SARS1 and MERS were exceptional in a way. The civets and camels that served as their respective intermediate hosts were found a few months after each disease broke out. This gives people the impression, as Nicholas Wade states, that “copious traces” can easily be found in the environment when such an event occurs. But we haven’t always gotten answers easily in the past. Measles virus comes from the rinderpest virus that infects cattle, but it took about 1500 years to figure that out. Ebola first appeared in 1976 and we have had outbreaks as recently as 2013, 2017 and 2018. But it is uncertain where the virus came from. The best guess so far is that there is a bat virus that served as a precursor.


In the case of the MERS outbreak, it helped that camels carry a high viral load of this virus, making it easier to detect, and that there were fewer animals to screen. The identification of civets as an intermediate host during SARS1 was somewhat fortunate because while most markets in Guangdong were cleaned out as a precautionary measure, one market had not been. The civets at the market also had a high viral load and the virus transmits very quickly in them. But the farms from where they were sourced did not test positive. This 2020 study shows how the geographical origin of the SARS1 intermediate host is still unclear, and suggests that the virus may not have spilled over in civets; we may still be missing the direct link with bats. It also took 14 years to zero in on which bat SARS1 came from (studies published by Shi Zheng-Li here and here). Interestingly the cave dwelling of these bats was 1000 km away from the location of the SARS1 outbreak. Shi Zheng-Li and her team warned of future virus spill-over from bats in these publications and in this review.


One may expect that for the natural origin theory of this pandemic to be true, there will need to be an animal or group of them that are spewing the SARS-CoV-2 progenitor virus. This undiscovered animal may have undetectable levels of virus or may have cleared its infection. Or further still, there may not be an intermediate host at all. A 2018 study shows that people living in proximity to bat colonies have detectable antibodies to bat coronaviruses, indicating that some bat viruses directly infect humans. And this is something to bear in mind when analysing sequences of the virus from the earliest cases we know. The first human to be infected may have gotten the virus from a bat and it could have been moving through human hosts asymptomatically for quite some time until it was first detected.


5. Questions have been raised about why the virus was so “well-adapted” to humans from the “start” of this pandemic. Alina Chan’s bioRxiv paper and a Boston magazine piece argue that it is unlikely that “the virus had been circulating undetected in humans for months, working out the kinks, and nobody had noticed” because “China’s health officials would not have missed it, and even if they had, they’d be able to go back now through stored samples to find the trail of earlier versions. And they weren’t coming up with anything.” In fact, it is not clear that enough sampling for early asymptomatic carriers has been done. However, over the course of Phase One of WHO’s investigations, the Chinese government has agreed to do exactly that to facilitate the next part of the WHO study.


We routinely hear about worrying B variants of the virus in the news from different geographical locations: B.1.1.7 in the UK, B.1.351 in South Africa and B.1.617.2 in India. It is now established through the WHO’s work that an A strain, which differs from the B strain in two letters, was also isolated from early patients in Wuhan. This strain was linked to different wet markets in the city. This suggests that what we consider the “start” of this pandemic at Huanan seafood market, where only B strain viruses were detected, may in fact have been a super-spreader event and a later stage in the evolution of this virus. Calculations based on the mutation rates of coronavirus genomes and the earliest lineages sequenced suggest that the SARS-CoV-2 progenitor appeared in late October 2019. The WHO is now working with blood banks inside and outside Wuhan to obtain patient samples from November 2019 and test them for any traces left by the virus.


Another line of investigation is to track back to wildlife farms that supplied Wuhan markets and test people who had handled animals for any signs of infection. These approaches are time sensitive as antibodies and other markers may not last indefinitely. Additionally, as the WHO team notes, supermarket supply chains are very complex in even the most developed countries. So, this is going to be a very tall order and the road ahead is not straightforward. Following COVID-19, trading in wildlife has been completely shut down in China and many of the centres are doing other work such as making coat hangers. While this seems prudent, it could also risk an increase in illegal wildlife trade and make tracing of pathogens in the future even harder.


Is lab research on viruses pointless and dangerous?

6. It is true that lab leaks are not unheard of. Viruses have escaped research facilities in the past. Researchers can forget the dangers of the molecules and organisms that they are working with, and safety precautions and oversight can be inadequate. This is the case in many parts of the world. To give you some examples: in 2014, long-forgotten tubes of the smallpox virus were found in a cardboard box near a research centre in Washington. In 2015, the US military accidentally shipped live anthrax to some labs across the world. There are rumours that the biosafety facilities in the institutes at Wuhan were lax, but the basis of these concerns is unclear. Lab leak or not, every lab in the world could afford to be more careful about the materials they work with. The WHO team has described in their interviews that they asked the WIV pointed questions about the possibilities of a lab-leak, and they did not feel any information was being hidden from them. However, their remit was not a full laboratory audit. On 27th May, 2021, in response to much speculation in the media regarding how this pandemic came to be, US President Joe Biden gave his intelligence agencies a 90-day deadline to report back on what they can find out. There are concerns that any hostile demands are likely to backfire, causing the Chinese government to be less inclined to share information. International cooperation is required in the long run, especially from China, which is home to so many bat species. Which brings us to the vital importance of bat coronavirus research.

Bats have an unusual immune system. They can tolerate high concentrations of viruses without mounting an incapacitating immune response to them. They shed these viruses in their faeces and saliva, which may pose a risk to nearby human dwellings. The 2018 Nipah virus outbreak in Kerala is thought to be linked to consumption of raw date palm sap (toddy) that was contaminated with virus from fruit-eating bats. However, the scale of this risk has been underappreciated.


Shi Zheng-Li’s lab has been accused of doing dangerous “gain-of-function” research and making “chimeric” viruses that could have been exceptionally dangerous. In a 2015 Nature Medicine publication her lab, along with collaborators, showed that by grafting a spike gene from a horseshoe bat coronavirus into the SARS1 virus genome, human airway cells could be infected in culture. This is an important discovery because it indicated for the first time that some bat coronavirus proteins had the capacity to directly infect human populations. Previously an intermediate animal host was thought to be essential. It would be hard to understand and prepare for emerging viral threats if we do not have good data on what these viruses can do and what they look like. The Zheng-Li lab publication of the RaTG13 virus in February 2020 showed that the ACE2 human receptor is involved in SARS-CoV-2 infection. This was not obvious from the start; the MERS virus uses a different receptor to get into host cells. This allowed targeting of the ACE2 protein for therapeutics.


Zheng-Li’s lab has been collaborating with EcoHealth Alliance, a non-profit for pandemic prevention, in her investigations of bat coronaviruses. The negative attention received by the work at the WIV led to the National Institutes of Health cancelling their funding in April 2020. This is a worrisome precedent. Science that will benefit humanity requires a lot of resources and long-term investment. We have all submitted samples for PCR in this past year. The method would not have existed if it wasn’t for someone studying bacteria that no one cared about. Understanding of the spike protein came from investigating an obscure mouse virus. The unbelievable feat of generating multiple COVID vaccines in under one year is not a miracle. Rich countries threw pots of money into the effort and the science behind it has been decades in the making. For more on COVID-related discoveries, do have a look at this wonderful website.


Until we meet again...


Viruses have been interacting with us for millions of years. It is estimated that 8% of the human genome has come from viruses. It is mind boggling. Some endogenous retroviruses confer protection against other infections. Some of them have even changed the course of evolution, you could say. Mammalian internal pregnancy is possible only because a viral gene (syncytin) was integrated into ancient mammalian genomes at least six independent times. This gene confers the property of cell fusion which allowed the placenta to form that nourishes a growing foetus inside a mother’s body.


Zoonotic viruses are also not new to us. Maybe some seem like fringe events such as Australia’s 1994 Hendra virus infections, in which the contagion jumped from horses to humans, and Malaysia’s 1998 Nipah virus outbreak, in which it moved from pigs to people. In both cases pathogens originated in fruit-eating bats. Horses and pigs were merely the intermediate hosts. But there are other zoonotic diseases we’ve lived with for so long that they are almost cute and endearing like chickenpox and measles. And the rabies virus that so easily infects so many animals including us? 60,000 people die of rabies each year, but we never think about it unless an unfriendly dog comes snapping.


What we know about viruses is perhaps as little as what space scientists know about the universe. There are galaxies upon galaxies to be traversed yet. Researchers studying coronaviruses are acutely aware of the fact that there are large gaps in knowledge to be filled and many unknown risks linked to them. I’ll leave you with some stunning graphics of viruses and their variety, and a quote from Shi Zheng-Li, “we need to find the viruses before they find us”.
