TL;DR: Venter and his collaborators originally set out to design a stripped-down genome based on what scientists knew about biology.
...
With the right tools finally in hand, the researchers designed a set of genetic blueprints for their minimal cell and then tried to build them. Yet “not one design worked,”
...
So the team took a different and more labor-intensive tack, replacing the design approach with trial and error. They disrupted M. mycoides’ genes, determining which were essential for the bacteria to survive.
...
Venter is careful to avoid calling syn3.0 a universal minimal cell. If he had done the same set of experiments with a different microbe, he points out, he would have ended up with a different set of genes.
...
In fact, there’s no single set of genes that all living things need in order to exist.
...
They found that not a single gene is shared across all of life. “There are different ways to have a core set of instructions,”
...
Venter’s minimal cell is a product not just of its environment, but of the entirety of the history of life on Earth.
...
He and others are trying to make more basic life-forms that are representative of these earlier stages of evolution.
...
Some scientists say that this type of bottom-up approach is necessary in order to truly understand life’s essence. “If we are ever to understand even the simplest living organism, we have to be able to design and synthesize one from scratch,”
...
“We are still far from this goal.”
As @sixQuarks has already written, finding the minimal number of genes when there are 175 unknown ones and you don't know anything about their dependencies and relationships seems to be pretty much impossible.
> In fact, there’s no single set of genes that all living things need in order to exist. ... They found that not a single gene is shared across all of life.
That's the most interesting point to me, I deeply believed that organisms share the same basic set of genes.
I worked in the synthetic biology lab (on a different project) while the early stages of Syn3.0 were being done, and have also paid several visits to chat with them in the meantime. [Proof: first author on http://dx.doi.org/10.3390/ijms16012020] I was a fly on the wall for most of the group meetings and even contributed some unpublished results: there were transposons causing problems by shuffling the DNA in the yeast, and I hypothesized the orientation that causes the issue and discovered the yeast genes responsible for this process.
Firstly what constitutes "minimal" depends on what you feed the organism. There's a set of bacteria called phytoplasma which are plant parasites that are missing genes to make nucleotides (even mycoplasmas have those) and they've adapted by sucking nucleotides from their hosts.
Some insider information: Most of the mystery essential genes are vague cell wall proteins. Probably what is going on is that if you knock out too many of these genes, you lose cell wall turgidity and the cell becomes nonviable. So what is important is not so much which of these genes you have, but how many of them you have.
Second insider information, for fun: The "hypothetical minimal genome" (syn2.0 is referred to as HMG in the paper) was actually a backronym because we called it the "hail mary genome" but decided that was inappropriate for publication.
The "hypothetical minimal genome"...was actually a backronym because we called it the "hail mary genome" but decided that was inappropriate for publication.
One in-house ORM I worked with was called TFP, because it was entirely written on a plane ride to Houston, and they named it TFP because the authors liked to say they wouldn't touch it again with a ten foot pole. Their company later thought about trying to sell it, with the name "Technique for Persistence."
> Most of the mystery essential genes are vague cell wall proteins. Probably what is going on is that if you knock out too many of these genes, you lose cell wall turgidity and the cell becomes nonviable. So what is important is not so much which of these genes you have, but how many of them you have.
Sounds like dependency hell. Gene A only works in the presence of Gene B which only works when there's Gene C which needs Gene A. Knock any one of them out and everything breaks.
In this simplified system, the number of these cell wall genes in the genome is proportional to the quantity of the cell wall proteins produced. It's more of a quantity issue, rather than a dependency issue. For example, having more developers usually results in more lines of code written in a given period of time, especially in enterprise. The quality of code might be questionable but the quantity is usually there.
>> Firstly what constitutes "minimal" depends on what you feed the organism.
So maybe the environment needs to be changed to support an organism with a smaller genome. The world today is certainly different than when life began. It's probably difficult to evolve the environment, so they'd need to understand what goes wrong with gene deletions and figure out how a different environment could help.
When dnautics says "different environment" he's not talking about the global environment, he's talking about the environment around the cell. A minimal viable cell should be able to survive on water, some dissolved gasses, and an energy source like glucose. Bacteria that have even smaller genomes are doing that by stealing components from other lifeforms.
I'm not exactly sure what went wrong, but I think you're really misunderstanding something dnautics said.
> A minimal viable cell should be able to survive on water, some dissolved gasses, and an energy source like glucose.
M. mycoides certainly does not survive on that. Mycoplasmas need cholesterol and lipids because they lack the HMGCoA reductase pathway and FAS, so all the media you grow them on are enriched for that (started as FBS, but we found horse serum worked great and was cheaper). In general, I think you need a few more things than just that for most basic life (nitrogen, phosphorus, metals)... Is using glucose cheating? Because you could use light for energy and CO2 for carbon, but that adds a ton ton ton of genes...
So this becomes a nitpicky definitional conundrum.
>> Bacteria that have even smaller genomes are doing that by stealing components from other lifeforms.
There are no bacteria with smaller genomes today, that was the point of making this one. I was not talking about the global environment either, but the number of options for what substances and concentrations may need to be present in the environment around a cell is certainly very large - making trial and error difficult.
> They found that not a single gene is shared across all of life.
It is definitely a question of function vs. identity. Many mutations of a single gene can perform the same function; there is a lot of redundancy in the genetic code.
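To make that degeneracy concrete, here is a toy Python sketch using a small excerpt of the standard codon table (the sequences are made up for illustration): two genes that differ at several positions encode the identical protein.

    # Toy excerpt of the standard codon table: many codons per amino acid.
    CODON_TABLE = {
        "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine: six codons
        "ATG": "M",                                                              # methionine / start
        "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",                          # alanine: four codons
    }

    def translate(dna):
        """Translate a DNA coding sequence into a one-letter protein string."""
        return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

    # Different DNA, same protein:
    print(translate("ATGTTACTGGCT"))  # MLLA
    print(translate("ATGCTACTAGCG"))  # MLLA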
>If he had done the same set of experiments with a different microbe, he points out, he would have ended up with a different set of genes.
This is very interesting. I expect also that if they had proceeded with their knockouts in a different order, or with different sets at one time, they would end up with a different set of genes. Maybe they have some massively parallel way to knock out genes, but I doubt they have explored the entire set of knockouts from the original functioning natural organism. They could very likely be at a local minimum of genes required (e.g. no single gene can be knocked out, but nowhere near the absolute minimum).
Years ago I read an interesting paper comparing the types of subsystems found in the Linux kernel with subsystems found in biology (or what we understand of them currently).
There is a lot of redundancy in biology, which allows for biological systems to be pretty robust, adaptable and fault tolerant. Of course this also means it's really hard to fix that biological process when things go horribly wrong, such as in many forms of cancer.
Paper: “Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks.” By Koon-Kiu Yan, Gang Fang, Nitin Bhardwaj, Roger Alexander, Mark Gerstein. Proceedings of the National Academy of Sciences, Vol. 107 No. 18, May 4, 2010.
This means that we first need to decide on a common environment that minimal synthetics will live in. Then, with this environment fixed, we could try to isolate a minimal set of genes for multiple different organisms that are originally viable in this environment.
Such an environment should also be 'minimal', i.e. require the lowest number of resources to construct it and the lowest amounts of those resources.
> They found that not a single gene is shared across all of life
I think this is a pop-science misunderstanding of what the authors were trying to say. They couldn't find a single unique "minimal set" of genes, but there are genes that we know all organisms share, like the rRNA genes and homologous core ribosomal proteins. All organisms will also require some kind of tRNAs and their genes from some source, even if the tRNA genes themselves are not highly homologously conserved.
Exactly. The rRNA peptidyl transferase loop is conserved throughout all life, for example. Like you say, the researchers obviously understand this, but the author of the article likely did not.
>In fact, there’s no single set of genes that all living things need in order to exist. ... They found that not a single gene is shared across all of life.
Isn't the current theory that all life on Earth has a common ancestor? Wouldn't this point put that into question and introduce the possibility that life has developed multiple times independently, or would this lack of shared genes eventually appear through divergent evolution? It seems like the former option would greatly increase the f_l term in the Drake Equation, meaning alien life is even more likely than previously thought.
"Isn't the current theory that all life on Earth has a common ancestor? Wouldn't this point put that into question"
Not really. For one thing there's too much commonality across life in terms of things like the genetic code, i.e. what a sequence of genetic letters "means". The genetic code is arbitrary but essentially universal with minor variants.
For another thing, the organisms now present can be separated from each other by as much as 8 billion years of evolution (4 billion on each branch) meaning there's no particular reason to expect some shared gene to be present everywhere. New genes arise as mutations from old ones, old genes become obsolete and get removed, etc.
It's vaguely like how, if you fork a coding project, and the two projects are allowed to continue indefinitely, eventually there might be no code in common, even without a complete rewrite ever happening.
Not when your browser fork is re-purposed for something else almost entirely, like an inline image or documentation viewer. As I understand it, organisms often have novel uses for old tools.
I would guess that the earliest life didn't accidentally hit upon optimal or even reasonably good ways to do whatever is needed to reproduce.
If that's the case, it isn't that unlikely that every evolutionary branch managed to improve (or discard) every part of the original machinery in some way.
Depends on what kind of life you consider. It's true for mammals. It's true for animals. It's probably even true further up our ancestry. But apparently it doesn't mean that "all things that have genes" share a subset of genes from the same common ancestor. As I understand it that's because they share a common ancestor that didn't have genes (but at that point I'm not sure whether "ancestor" is even the right word).
Only if all life converges on having dna. If there are multiple origins of life that survive there would probably be multiple different self replicating molecules.
There is a great book written 20 years ago by Steve Grand called "Creation: Life and How to Make It"[1]. It basically uses the game Creatures[2] as the testing environment for creating artificial life. There are also some great thoughts in that book about how science needs more cross-disciplinary experts to connect the dots between insights in one field and apply them to others.
> I deeply believed that organisms share the same basic set of genes
They do. What you're responding to says there's no one gene shared by all organisms. The genetic intersection of any two organisms is large; the genetic intersection over all organisms is empty.
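To see how that is possible, here's a toy Python sketch with made-up gene sets: every pair of "organisms" overlaps heavily, yet no single gene is present in all of them.

    from itertools import combinations

    # Hypothetical gene sets for four organisms (names are illustrative only).
    a = {"rpoB", "gyrA", "ftsZ", "dnaA"}
    b = {"rpoB", "gyrA", "ftsZ", "secY"}
    c = {"rpoB", "gyrA", "dnaA", "secY"}
    d = {"ftsZ", "dnaA", "secY", "gyrB"}

    sets = [a, b, c, d]
    for x, y in combinations(sets, 2):
        assert len(x & y) >= 2              # any two organisms share plenty
    print(set.intersection(*sets))          # set() -- nothing shared by all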
If you consider the genome to be information, life is effectively the process that information supports by encoding a method to accomplish the process. Replicating and using energy in a coordinated manner are two processes of life, but their information-based encoding can differ significantly based on the method used.
My understanding is that all known organisms share at least the ribosome RNA genes. Unless you're counting viruses, which is not exactly fair since they hijack the host's ribosomes.
But there are RPL1 through RPL41, and the same for RPS, and then MRPL and MRPS, so despite the similarities, one can argue different organisms may have different rRNA genes.
RPL1 and such are genes for ribosomal proteins. I'm talking about the genes that encode the ribosomal RNA subunits. These genes are used to infer the evolutionary relationships of the tree of life, so presumably they are shared across all known organisms, or else it would not be possible to infer a phylogenetic tree.
Is there a single set of instructions needed for Turing Completeness? It's possible to have Turing Completeness in a random access machine with just one instruction. I should think you really need to satisfy certain capabilities, and there would be many ways to satisfy those.
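For the curious, here is a minimal sketch of such a machine: SUBLEQ ("subtract and branch if less than or equal to zero"), a classic one-instruction set computer that is Turing complete given unbounded memory. The sample program is a hypothetical illustration.

    def subleq(mem):
        """Run a SUBLEQ machine until it jumps to a negative address."""
        pc = 0
        while pc >= 0:
            a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
            mem[b] -= mem[a]                     # the single instruction
            pc = c if mem[b] <= 0 else pc + 3    # branch if result <= 0
        return mem

    # Program: add mem[9] into mem[10], using mem[11] as a zeroed scratch cell.
    mem = [9, 11, 3,     # Z -= A   (Z = -7; result <= 0 jumps to the next instruction)
           11, 10, 6,    # B -= Z   (B = 5 + 7)
           11, 11, -1,   # Z -= Z; 0 <= 0, so jump to -1 and halt
           7, 5, 0]      # data: A = 7, B = 5, Z = 0
    print(subleq(mem)[10])  # 12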
> That's the most interesting point to me, I deeply believed that organisms share the same basic set of genes.
It should seem obvious in retrospect. Turing machines have many incarnations as different instruction sets in modern CPUs. I don't see why cells wouldn't have analogous multiple expressions, unless your deep belief was that there was only one way for protein chains to replicate.
> As @sixQuarks has already written, finding the minimal number of genes when there are 175 unknown ones and you don't know anything about their dependencies and relationships seems to be pretty much impossible.
Is there any way at all of simulating the tests? I'd like to apply metaheuristics to search for which combinations work.
This would require a quantum chemistry simulation many orders of magnitude larger and longer than what we are currently capable of simulating. Though I suppose you could in principle use heuristics to greatly reduce the computational complexity.
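As a toy sketch of what a heuristic search over gene subsets could look like in Python: the viability test below is a made-up stand-in for whatever simulation or wet-lab assay decides whether a design lives (the expensive part), with redundancy modeled as several genes covering the same function.

    import random

    # Hypothetical model: the cell lives iff every function has >= 1 provider.
    FUNCTIONS = {"replication": {"g1", "g2"},     # g1 and g2 are redundant
                 "membrane":    {"g3"},           # g3 is strictly essential
                 "metabolism":  {"g4", "g5", "g6"}}

    def viable(genes):
        return all(genes & providers for providers in FUNCTIONS.values())

    def greedy_minimize(genome, restarts=20, seed=0):
        """Randomized greedy knockout: delete genes one at a time, keeping any
        deletion that leaves the cell viable. Finds a local minimum only --
        different deletion orders can yield different 'minimal' gene sets."""
        rng = random.Random(seed)
        best = set(genome)
        for _ in range(restarts):
            genes, order = set(genome), list(genome)
            rng.shuffle(order)
            for g in order:
                if viable(genes - {g}):
                    genes.discard(g)
            if len(genes) < len(best):
                best = genes
        return best

    print(greedy_minimize(["g1", "g2", "g3", "g4", "g5", "g6"]))
    # e.g. {'g2', 'g3', 'g5'} -- which minimum you hit depends on deletion order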
The definition of life is a tricky question. The most obvious example is red blood cells, which dump their DNA. So they can't replicate, and viruses can't target them. However, insect drones also can't reproduce and are considered alive, so it's a little more open to debate than you might think.
As to life without DNA: the methodology is somewhat in question for this, but here is a more in-depth link: http://science.sciencemag.org/content/332/6034/1163.full Now, this is really 1:1 with DNA, just using slightly different chemistry, but it's hard to call something DNA when you swap out one of the building blocks.
Other examples are hard to locate, in large part because you can't directly detect this stuff.
Anyway, my point was that while a virus is 'stuck' interfacing with DNA, as it needs to infect things that use DNA, it can use RNA internally.
PS: For a really out there example, prions seem really close to life.
This does not mean that arsenate does not get into the bacteria, he points out. “It just shows that this bacterium has evolved to extract phosphate under almost all circumstances.”
Yes, it really wants phosphorus. But, a strong preference does not create phosphorus when there is none to be found.
Not that single cells really want or try etc, but you get the idea.
I'm not aware of any life which lacks DNA. This is kind of "definitional" - really depends on your definition of life, but I don't include viruses, because they don't have a regulated metabolism.
It's obviously incorrect. It's not even really a logic error, it's a category error.
At face value the statement "if not all living things share a common ancestor, that makes them appear more like they were intentionally created" (although that line of reasoning, i.e. the watchmaker argument, has been debunked time and time again) seems reasonable.
However the statement seems far more grandiose than it actually is.
If two very "primitive" organisms (i.e. operating at a much lower complexity with much smaller and fewer moving parts than even, say, an earthworm) don't share their sets of DNA that may be unexpected (because whatever step there was from "not having DNA" to "having DNA" had to happen twice, separately, successfully).
If we found out that, say, all higher lifeforms shared no common DNA with other lifeforms outside their group and those groups closely aligned to the Christian "kinds" (which don't really map to biologically distinct groups in meaningful or consistent ways) THAT would be astounding.
But even that extreme case wouldn't make the "creation hypothesis" (if you even want to call it that) more plausible because it presumes the existence of an unexplained organism that is far more complex than anything it supposedly created. It doesn't explain anything -- it just shuts up questions about the "origin" by giving an answer that is impossible to analyse further.
There may have been a creator. But it's impossible to make a consistent logical argument for it. And it's entirely impossible to make an argument that that creator would in any way resemble something modern mainstream religions are worshipping.
Interesting work, but it reminds me of the famous "Could a biologist fix a radio?" paper [1]. The paper imagines biologists trying to determine what parts of a radio are essential using a similar trial and error technique. As you can imagine, this technique would lead to many erroneous conclusions, especially when paired with the "publish or perish" and "gold rush" mentalities so prevalent in academia. It makes you wonder if we are trying to understand biological systems with a fundamentally wrong approach.
This criticism of the way biologists/geneticists/biochemists approach problems is, I think, quite misguided; it misunderstands both the limitations of testing things you can't manipulate with your bare hands, and how subtractive (take stuff out, see what happens) methods fit into the larger process of biological inquiry.
First, when you can't manipulate things directly, it becomes more difficult to tweak systems slightly to learn how they work. This is especially true if you start with something made from things you don't understand, and for which you may not even have a complete catalog. The radio paper raises some issues and acknowledges these challenges, but then sort of glosses over them, saying, "it's complicated, but that's just because we don't understand and don't use the right language."
Second, biologists don't just use subtractive methods, they try to use additive approaches to complement and test the hypotheses that come out of "it's broken when I take this bit out." Biochemists also try to reproduce complex behaviors by mixing only the purified components of the system in question. Biochemists and biophysicists also often measure very precisely physical and chemical properties to mathematically define the behavior of biological components.
The point is biology is hard, and we barely know anything about it. We have to start with simple experiments that point us to what's important and then progressively do better defined experiments to figure out how it works in detail. The next step here is: now that we know what proteins are important, let's figure out how they work exactly. We couldn't have gotten there before trying to knock out as many parts as possible.
TL;DR: this isn't the wrong approach, it's the right one, and given the limitations of the system, realistically the only one we have at this stage. It's also only the first step, now we get to try other approaches on what we've learned.
The last couple pages of the aforementioned paper give some thoughts on that. The author suggests an approach more like engineering that is focused on formalized mathematical descriptions of biological systems.
The submitted article talks about attempts to design a minimal set of genes, but they kept failing. The difficulty with your comment is it sounds like you're dismissing the work; I'm sure they are aware of that paper and its lessons. They tried working in that direction and failed.
Interesting, thanks for the link. I wonder about their trial and error approach as well. Scientists have conveniently broken down genetic material into individual genes which are easy to think about, but biology doesn't care about being easy to think about. It seems like to get anywhere they would have to knock down genes in all sorts of different combinations, not just one at a time.
I'm afraid it's not the individual genes themselves that are important to life, but a specific combination of those genes. Further complicating the problem is that we have no idea how many genes comprise an "essential group". Is it a combination of 2 genes? 3, 5, 10?
When you're talking about 175 unknown genes, the number of possible combinations is enormous. It's like finding a needle in a haystack the size of the solar system.
I don't think this brute force approach is going to work, we need a different way to figure this out, but I'm confident that once figured out, it will seem simple looking back on it.
Also, isn't there an issue of "gene" being something of an unnatural category? Nature doesn't provide a clean labeling of what part of the sequence is "a" gene, and the term historically existed (AIUI) to denote regions that the researcher found to have phenotypic relevance. So I'm not sure it's helpful to phrase a minimum in terms of "n genes". See:
> the term historically existed (AIUI) to denote regions that the researcher found to have phenotypic relevance
That's Mendelian genetics. Geneticists focused on DNA as a physical molecule rather than a conceptual unit of inheritance use the same word to refer to a region of DNA that produces a protein.
They're both important concepts, but they're not the same concept.
Okay, I got the terms wrong, but I think the core point stands: a gene is typically defined with respect to some research objective and area of interest, and is highly variable depending on what you're looking at. It doesn't (yet, at least) have a natural boundary. People looking at one phenomenon might see one boundary for a gene, while others see another.
In fact, I think that term is great for comparison: "phenomenon". Asking for the "minimum genes for viable life" seems like asking for the "minimum number of natural phenomena to put a man on the moon". Are those two phenomena really just different manifestations of "gravity" or are they two things in their own right? "It depends."
Arguably, the production of a protein is nearly the smallest element of the observable characteristic of an organism produced by the interaction of its genotype with the environment, and thus sort of the fundamental unit of phenotype, so, from that perspective, they are exactly the same thing.
They are not the same thing from any useful perspective. You can remove a (protein-coding) gene, or replace it, without altering the phenotype at all. You can make changes outside protein-coding regions and get large phenotypic changes.
I am thinking of George Boole (of Boolean algebra and logic), who wrote in the 19th century that a symbol (an individual gene in this case) is not meaningful in itself, but that its meaning depends on how it combines with other symbols.
Quote:
"...the validity of the processes of analysis does not depend upon the interpretation of the symbols which are employed, but solely on their laws of combination. Every system of interpretation which does not affect the truth of the relations supposed, is equally admissible, and it is true that the same process may, under one scheme of interpretation, represent the solution of a question on the properties of numbers, under another, that of a geometrical problem, and under a third, that of a problem of dynamics or optics." (Boole 1847)
> I don't think this brute force approach is going to work, we need a different way to figure this out, but I'm confident that once figured out, it will seem simple looking back on it.
Why not? Couldn't we use machines to automate this and analyze the results?
They got it down to 473 genes, but let's say that minimal life is some 20 of those genes. A brute force approach would require trying 10^34 combinations[1]. And that's already with the simplifying assumption that 20 genes is the minimal set rather than 19 or 106 or some other subset.
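For scale, that figure is easy to sanity-check in Python:

    import math

    # Subsets of exactly 20 genes out of 473:
    print(math.comb(473, 20))                          # ~8.6e34
    # Without the size-20 assumption, the space is all subsets:
    print(sum(math.comb(473, k) for k in range(474)))  # 2**473, ~2.4e142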
I wonder how they removed the genes? I am not an expert in microbiology. I work with mainly eukaryotes, but I am guessing intergenic distances are not a huge factor here? What about spatial arrangement of the genes? Did they generally leave that alone?
Genes are not independent functional modules. Their placement and arrangement on the genome matters. Did they only mess with coding features (genes)? Or did they also mess with other genomic features? Or do such things just not matter with bacteria?
With a handful of exceptions (for example TER units, and DnaA, which seems to need to be close to the replication origin), for most microbes, placement and arrangement don't matter.
I believe they wanted to modularize the bacteria by placing cassettes of common functionality (e.g. all the tRNAs together) in parts of the genome. I don't know if they are still working on that, or how it went.
In yeasts, the tRNAs have to be in the same orientation as the replication bubble, but there didn't seem to be that restriction in mycoplasmas.
Ah yes, that's true. I was thinking of operons, not genes. But you can shuffle operons without too many broader considerations of genomic structure. Some prokaryotic organisms seem to have all of their operons pointing in line with DNA replication (pointing origin -> ter), but IIRC switching the direction of those doesn't seem to make much of a difference either.
The article says they tried the additive/synthesis approach first, which didn't work. And then they tried the subtractive approach, starting with M. mycoides. And in that case they were left with a lot of "unexplained" genes.
But he reminds us that the subtractive process entirely depends on the starting cell.
I would like to hear more about the difference between additive and subtractive methods -- it wasn't entirely clear from the article.
Additive method: Synthesize genomes containing the genes you expect to be necessary. It's pretty much an optimization problem on the number of genes, where you continually try to remove genes that were present in previous iterations.
Subtractive method: After reading a bit of the paper, they utilized Tn5, a transposon that causes a DNA sequence to be inserted into the genome at random locations. This randomly disrupts genes, likely causing them not to function correctly.
Here's the logic:
If all genes were necessary we would expect no cells to live that had mutations.
If no genes were necessary we would expect all cells to have mutations at about an equal rate relative to the space they occupy. (Assuming no bias by the Tn5, but that's a nuance)
What they instead found was that some cells grew, but certain genes were never mutated in the survivors, meaning those genes are likely necessary.
They also classified a few things, like studying the growth of the cell (slow-growing but still viable was classified as "quasi-essential") and implanting their minimal genome in another species, which failed, giving evidence that a set of genes sufficient for survival in one cell is not necessarily sufficient in all cells.
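A toy simulation of that inference logic (the genome, gene names, and survival rule below are all made up) shows why genes that never tolerate an insertion among survivors get flagged as essential:

    import random
    from collections import Counter

    GENES = ["dnaA", "ftsZ", "rpoB", "cruft1", "cruft2"]
    ESSENTIAL = {"dnaA", "ftsZ", "rpoB"}           # ground truth, unknown to us

    def survives(disrupted_gene):
        return disrupted_gene not in ESSENTIAL     # stand-in for growing the mutant

    hits = Counter()
    for _ in range(10_000):                        # 10,000 random insertions
        gene = random.choice(GENES)                # Tn5 lands in a random gene
        if survives(gene):
            hits[gene] += 1                        # only survivors are observed

    print([g for g in GENES if hits[g] == 0])      # ['dnaA', 'ftsZ', 'rpoB']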
As far as I can tell, they synthesized the molecule.
A line from the paper: "We used whole-genome design and complete chemical synthesis to minimize..."
I did fret for a second over which meaning of "remove" daemonk was using (from the design? from the instantiation of the genome?) but decided I at least wasn't posting misinformation.
This reads to me as: they took a working system and took parts away until the system failed (or did something "odd" tending towards failure). Then they looked at the pieces they had taken away and guessed what functionality those pieces have.
Shouldn't they have replaced them with a gene of the same length, filled with only stop codons?
Otherwise, it would be like cutting out instructions from an assembly language file, without knowing whether one of the instructions does something like adding a constant integer value to the program counter. You might think the code that was removed was critical to the operation of the program, but you could have instead replaced all the instructions with an equivalent number of no-ops, and it would still work as expected.
That's like forking someone's program, but not understanding two thirds of the code. (What's the big deal? It happens!)
If you take someone's 4000-5000 line program and whittle it down to 473 lines which are still somehow useful, "newly created" doesn't apply in full honesty, let alone if you don't know what a third of those lines do.
This is great. This is the proper engineering approach to understanding life. Whittle it down to the minimal reproducing test-case, and poke it with a stick until you understand what all the parts do.
This is just excellent science. It seems like it should be very easy to get these unknown genes to reveal their function now. Very exciting times.
I find it fascinating we still don't really know how life works. I have a hunch we are going to find the large scale 3D structure of the chromosome is a big deal, and these genes regulate it. There aren't many good tools to study chromosome structure and it's quite possible there's a whole layer of information we've missed so far.
Actually there are new approaches to study exactly that, and they are developing rapidly, cf: https://en.wikipedia.org/wiki/Chromosome_conformation_captur...
Basically, this problem, along with a long list of other applications is being attacked by deep sequencing. The way it works is that you apply a treatment to DNA that 'glues' 3D contacts in place and covers the glued segment, apply restriction enzymes to cut out what's not covered by glue, get rid of the glue, sequence all that's left, and then map the remaining sequenced fragments to the reference genome. The output is the relative tendency of different regions to come into contact with each other.
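A minimal sketch of that final binning step in Python, assuming the read pairs have already been mapped (all coordinates below are made up):

    # Each ligation product maps to two genomic coordinates that were in 3D
    # contact; binning those pairs yields a matrix whose hot spots are
    # regions that tend to touch.
    GENOME_LEN, BIN = 580_000, 10_000
    n_bins = GENOME_LEN // BIN
    contacts = [[0] * n_bins for _ in range(n_bins)]

    contact_pairs = [(12_345, 250_000), (12_800, 252_000), (400_000, 405_000)]
    for p1, p2 in contact_pairs:
        i, j = p1 // BIN, p2 // BIN
        contacts[i][j] += 1
        contacts[j][i] += 1          # contacts are symmetric

    print(contacts[1][25])           # 2: bins 1 and 25 came into contact twice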
There is a significant branch of microbiology that studies the 3D structure of chromosomes. Epigenetics is the study of how gene expression (rather than composition) affects a phenotype. There is a strong influence of structure, telomeres, centromeres, and expression modifiers that work at a physical level. It will certainly be a major component of functional life. It took decades after genetics were established for the scientific community to accept epigenetics as a valid influence and avenue of study (due to both a lack of methods to accurately study these aspects of an organism and scientific inertia).
The whole story of molecular biology has been discovering layers of control and information: DNA, genes, gene regulation, RNA, RNA splicing, protein structure and regulation, protein interactions, micro-RNAs... and it's usually been limited by the tools that scientists could bring to bear. Possibly there's another control layer in charge of the long range structure of the chromosome. Being wet, squishy and fragile, it's hard to study.
This was my thought as well. Likely the remaining genes represent some kind of meta-process, affecting and/or controlling how the genes we understand are expressed.
Forgive my complete ignorance, but how do they even count genes? Given a very long string of base pairs, how is it possible to know where one gene ends and the next one begins? Do genes overlap?
I think this is a great question. I was vague on this as well but I found this site after a quick search [1]
A gene is a code for making a specific protein. Each gene is a sequence of three-letter "words" called "codons". There are special "start" and "stop" codons which mark the beginning and end of a gene.
From [1], "Mycoplasma genitalium: it has only about 480 genes in its genome of 580 070 nucleotide pairs." So in average, each gene is 1200 nucleotide pairs.
A nucleotide is one of [A,T,C,G]. So that means it encodes 2 bits of information.
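As a rough sketch of how a first-pass count can be made, here is a naive open-reading-frame scanner in Python (real gene callers use far more evidence than this, and also scan the reverse strand):

    STOPS = {"TAA", "TAG", "TGA"}

    def candidate_orfs(dna, min_codons=100):
        """Return (frame, start, end) for each ATG...stop stretch of at
        least min_codons codons: a crude first-pass proxy for 'a gene'."""
        orfs = []
        for frame in range(3):
            codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
            start = None
            for idx, codon in enumerate(codons):
                if codon == "ATG" and start is None:
                    start = idx                      # open a candidate ORF
                elif codon in STOPS and start is not None:
                    if idx - start >= min_codons:    # long enough to look real
                        orfs.append((frame, start * 3 + frame, (idx + 1) * 3 + frame))
                    start = None                     # close it either way
        return orfs

    print(candidate_orfs("ATGAAATTTTAA", min_codons=3))  # [(0, 0, 12)]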
That's a great start at least! Posting something that is wildly wrong is the best way to get someone who knows more to interject with corrected information, so hopefully someone with expertise can confirm or deny.
The cellular genome size of RGD3.0, the organism with a 473-gene genome, is 531 kbp according to the paper. This means the genome encompasses ~531,000 nucleotides (base pairs). For reference, the human genome encompasses approximately 3.3 billion base pairs.
Thanks! By the way, when you say "you can download the human genome", presumably this is the genome of a specific individual whose DNA was sequenced? What person is that?
No, it is a consensus sequence determined by sequencing the genomes of several individuals. I do not know how many individuals were sequenced for hg38, but hg37 was derived from 13 individuals according to [1].
Genomic regions with higher variability between individuals are annotated separately and are not represented in the main chromosomal sequence, but can be aligned to it.
However, the human "reference" genome (the updated main product of the Human Genome Project) is a patchwork of about 20 donors' genomes. Many (most?) of the donors are anonymous volunteers from Buffalo, NY, USA.
I encourage people to read the one-page summary and at least the section headings in the full article.
To answer a few questions that have come up:
How did they synthesize the genome? By creating short sequences of DNA and joining them together step by step to make larger sequences. They chemically synthesized short DNA sequences and assembled them into 1.4 thousand base pair (kbp) fragments. Five fragments were assembled into 7 kbp cassettes, which were assembled in yeast to generate 1/8 chunks of the genome, and these chunks were assembled in yeast to create the full genome. (See figure 2.) To try out deletions, they could replace a 1/8 chunk, rather than synthesizing the whole genome.
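The arithmetic of that hierarchy, for anyone who wants to sanity-check it (taking the ~531 kbp genome size quoted elsewhere in this thread):

    fragment = 1.4              # kbp, assembled from synthesized oligos
    cassette = 5 * fragment     # 7.0 kbp: five fragments per cassette
    chunk = 531 / 8             # ~66 kbp per eighth-genome chunk
    print(cassette, round(chunk), round(chunk / cassette))
    # 7.0 66 9 -> roughly nine or ten cassettes per chunk, eight chunks per genome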
Why can't they delete all the non-essential genes? Bacteria have a lot of redundant genes, where two different genes provide the same essential function. Just like redundant disks, you can remove one, but not all of them. And you'll get different minimal genomes depending on which gene you keep.
One interesting thing is that because the growth medium provides almost all the necessary nutrients, they could remove a lot of the metabolic genes, but needed to keep a lot of genes to transport molecules across the cell membrane. You can imagine a minimal cell constructed the opposite way.
Has anyone else here read Nick Lane's new book, The Vital Question? I just finished it, and this article really seems like a continuation of some of the theories he talks about.
I found out about the book from HN, and have since bought every other book by him. I'm almost done with Life Ascending now, which is also amazing.
Yes. I read The Vital Question after hearing about it on HN a few weeks ago and found it very interesting (although less so nearer the end, as it seemed to get more speculative). To oversimplify, mitochondria are the key to understanding life and eukaryotes. (I should study electron transport more closely.)
If I were a biology/biochem/genetics/etc type of student today this is exactly what I would love to work on. Perhaps someday we will actually understand how life works. That's both exciting and incredibly scary.
I don't know. The definition of "life" seems so arbitrary and nebulous. It seems really more like we're trying to attribute properties based on a wishful interpretation of emergent behaviour that ultimately boils down to extremely convoluted chemical reactions.
Trying to produce genuinely novel and distinct biological lifeforms sounds cool enough on its own though.
Yes, the boundary between life and non-life is not sharp, but to claim it follows that life is hard to define would be the continuum fallacy (https://en.wikipedia.org/wiki/Continuum_fallacy)
Manufacturing a living cell from purely chemical ingredients is a concrete enough goal.
And almost certainly it lies far, far off in the future.
It does involve loving a lot of really mundane and slow work, though. Bioscience is a lot more fun to read about than to do, because our tools still really, really suck (but grad students are plentiful).
"The function of 79 genes is a complete mystery.'We don't know what they provide or why they are essential for life — maybe they are doing something more subtle, something obviously not appreciated yet in biology' "
It is the activation key God puts on each living being... Ain't gonna work without it. =P
My take-away: work like this is somewhat akin to attempting to determine the specification of INTERCAL through reverse engineering, given a working program and a compiler.