TL;DR: Venter and his collaborators originally set out to design a stripped-down genome based on what scientists knew about biology.
...
With the right tools finally in hand, the researchers designed a set of genetic blueprints for their minimal cell and then tried to build them. Yet “not one design worked,”
...
So the team took a different and more labor-intensive tack, replacing the design approach with trial and error. They disrupted M. mycoides’ genes, determining which were essential for the bacteria to survive.
...
Venter is careful to avoid calling syn3.0 a universal minimal cell. If he had done the same set of experiments with a different microbe, he points out, he would have ended up with a different set of genes.
...
In fact, there’s no single set of genes that all living things need in order to exist.
...
They found that not a single gene is shared across all of life. “There are different ways to have a core set of instructions,”
...
Venter’s minimal cell is a product not just of its environment, but of the entirety of the history of life on Earth.
...
He and others are trying to make more basic life-forms that are representative of these earlier stages of evolution.
...
Some scientists say that this type of bottom-up approach is necessary in order to truly understand life’s essence. “If we are ever to understand even the simplest living organism, we have to be able to design and synthesize one from scratch,”
...
“We are still far from this goal.”
As @sixQuarks has already written, finding the minimal number of genes when there are 175 unknown ones and you don't know anything about their dependencies and relationships seems to be pretty much impossible.
> In fact, there’s no single set of genes that all living things need in order to exist. ... They found that not a single gene is shared across all of life.
That's the most interesting point to me, I deeply believed that organisms share the same basic set of genes.
I worked in the synthetic biology lab (on a different project) while the early stages of Syn3.0 were being done, and have also paid several visits to chat with them in the meantime. [Proof: first author on http://dx.doi.org/10.3390/ijms16012020] I was a fly on the wall for most of the group meetings and even contributed some unpublished results: there were transposons causing problems by shuffling the DNA in the yeast, and I hypothesized the orientation that causes the issue and discovered the yeast genes responsible for this process.
Firstly what constitutes "minimal" depends on what you feed the organism. There's a set of bacteria called phytoplasma which are plant parasites that are missing genes to make nucleotides (even mycoplasmas have those) and they've adapted by sucking nucleotides from their hosts.
Some insider information: Most of the mystery essential genes are vague cell wall proteins. Probably what is going on is that if you knock out too many of these genes, you lose cell wall turgidity and the cell becomes nonviable. So what is important is not so much which of these genes you have, but how many of them you have.
Second insider information, for fun: The "hypothetical minimal genome" (syn2.0 is referred to as HMG in the paper) was actually a backronym because we called it the "hail mary genome" but decided that was inappropriate for publication.
The "hypothetical minimal genome"...was actually a backronym because we called it the "hail mary genome" but decided that was inappropriate for publication.
One in-house ORM I worked with was called TFP, because it was entirely written on a plane ride to Houston, and they named it TFP because the authors liked to say they wouldn't touch it again with a ten foot pole. Their company later thought about trying to sell it, with the name "Technique for Persistence."
> Most of the mystery essential genes are vague cell wall proteins. Probably what is going on is that if you knock out too many of these genes, you lose cell wall turgidity and the cell becomes nonviable. So what is important is not so much which of these genes you have, but how many of them you have.
Sounds like dependency hell. Gene A only works in the presence of Gene B which only works when there's Gene C which needs Gene A. Knock any one of them out and everything breaks.
In this simplified system, the number of these cell wall genes in the genome is proportional to the quantity of the cell wall proteins produced. It's more of a quantity issue, rather than a dependency issue. For example, having more developers usually results in more lines of code written in a given period of time, especially in enterprise. The quality of code might be questionable but the quantity is usually there.
>> Firstly what constitutes "minimal" depends on what you feed the organism.
So maybe the environment needs to be changed to support an organism with a smaller genome. The world today is certainly different than when life began. It's probably difficult to evolve the environment, so they'd need to understand what goes wrong with gene deletions and figure out how a different environment could help.
When dnautics says "different environment" he's not talking about the global environment, he's talking about the environment around the cell. A minimal viable cell should be able to survive on water, some dissolved gasses, and an energy source like glucose. Bacteria that have even smaller genomes are doing that by stealing components from other lifeforms.
I'm not exactly sure what went wrong, but I think you're really misunderstanding something dnautics said.
> A minimal viable cell should be able to survive on water, some dissolved gasses, and an energy source like glucose.
M. mycoides certainly does not survive on that. Mycoplasmas need cholesterol and lipids because they lack the HMGCoA reductase pathway and FAS, so all the media you grow them on are enriched for that (started as FBS, but we found horse serum worked great and was cheaper). In general, I think you need a few more things than just that for most basic life (nitrogen, phosphorus, metals)... Is using glucose cheating? Because you could use light for energy and CO2 for carbon, but that adds a ton ton ton of genes...
So this becomes a nitpicky definitional conundrum.
>> Bacteria that have even smaller genomes are doing that by stealing components from other lifeforms.
There are no bacteria with smaller genomes today, that was the point of making this one. I was not talking about the global environment either, but the number of options for what substances and concentrations may need to be present in the environment around a cell is certainly very large - making trial and error difficult.
> They found that not a single gene is shared across all of life.
It is definitely a question of function vs. identity. Many mutations of a single gene can perform the same function; there is a lot of redundancy in the genetic code.
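To make that degeneracy concrete, here is a toy Python sketch using a small excerpt of the standard codon table (the sequences are made up for illustration): two genes that differ at several positions encode the identical protein.

    # Toy excerpt of the standard codon table: many codons per amino acid.
    CODON_TABLE = {
        "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine: six codons
        "ATG": "M",                                                              # methionine / start
        "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",                          # alanine: four codons
    }

    def translate(dna):
        """Translate a DNA coding sequence into a one-letter protein string."""
        return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

    # Different DNA, same protein:
    print(translate("ATGTTACTGGCT"))  # MLLA
    print(translate("ATGCTACTAGCG"))  # MLLA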
>If he had done the same set of experiments with a different microbe, he points out, he would have ended up with a different set of genes.
This is very interesting. I expect also that if they had proceeded with their knockouts in a different order, or with different sets at one time, they would end up with a different set of genes. Maybe they have some massively parallel way to knock out genes, but I doubt they have explored the entire set of knockouts from the original functioning natural organism. They could very likely be at a local minimum of genes required (e.g. no single gene can be knocked out, but nowhere near the absolute minimum).
Years ago I read an interesting paper comparing the types of subsystems found in the Linux kernel with subsystems found in biology (or what we understand of them currently).
There is a lot of redundancy in biology, which allows for biological systems to be pretty robust, adaptable and fault tolerant. Of course this also means it's really hard to fix that biological process when things go horribly wrong, such as in many forms of cancer.
Paper: “Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks.” By Koon-Kiu Yan, Gang Fang, Nitin Bhardwaj, Roger Alexander, Mark Gerstein. Proceedings of the National Academy of Sciences, Vol. 107 No. 18, May 4, 2010.
This means that we first need to decide on a common environment that minimal synthetics will live in. Then, with this environment fixed, we could try to isolate a minimal set of genes for multiple different organisms that are originally viable in this environment.
Such an environment should also be 'minimal', i.e. require the lowest number of resources to construct it and the lowest amounts of those resources.
> They found that not a single gene is shared across all of life
I think this is a pop-science misunderstanding of what the authors were trying to say. They couldn't find a single unique "minimal set" of genes, but there are genes that we know all organisms share, like the rRNA genes and homologous core ribosomal proteins. All organisms will also require some kind of tRNAs and their genes from some source, even if the tRNA genes themselves are not highly homologously conserved.
Exactly. The rRNA peptidyl transferase loop is conserved throughout all life, for example. Like you say, the researchers obviously understand this, but the author of the article likely did not.
>In fact, there’s no single set of genes that all living things need in order to exist. ... They found that not a single gene is shared across all of life.
Isn't the current theory that all life on Earth has a common ancestor? Wouldn't this point put that into question and introduce the possibility that life has developed multiple times independently, or would this lack of shared genes eventually appear through divergent evolution? It seems like the former option would greatly increase the f_l term in the Drake Equation, meaning alien life is even more likely than previously thought.
"Isn't the current theory that all life on Earth has a common ancestor? Wouldn't this point put that into question"
Not really. For one thing there's too much commonality across life in terms of things like the genetic code, i.e. what a sequence of genetic letters "means". The genetic code is arbitrary but essentially universal with minor variants.
For another thing, the organisms now present can be separated from each other by as much as 8 billion years of evolution (4 billion on each branch) meaning there's no particular reason to expect some shared gene to be present everywhere. New genes arise as mutations from old ones, old genes become obsolete and get removed, etc.
It's vaguely like how, if you fork a coding project, and the two projects are allowed to continue indefinitely, eventually there might be no code in common, even without a complete rewrite ever happening.
Not when your browser fork is re-purposed for something else almost entirely, like an inline image or documentation viewer. As I understand it, organisms often have novel uses for old tools.
I would guess that the earliest life didn't accidentally hit upon optimal or even reasonably good ways to do whatever is needed to reproduce.
If that's the case, it isn't that unlikely that every evolutionary branch managed to improve (or discard) every part of the original machinery in some way.
Depends on what kind of life you consider. It's true for mammals. It's true for animals. It's probably even true further up our ancestry. But apparently it doesn't mean that "all things that have genes" share a subset of genes from the same common ancestor. As I understand it that's because they share a common ancestor that didn't have genes (but at that point I'm not sure whether "ancestor" is even the right word).
Only if all life converges on having dna. If there are multiple origins of life that survive there would probably be multiple different self replicating molecules.
There is a great book written 20 years ago by Steve Grand called "Creation: Life and How to Make It"[1]. It basically uses the game Creatures[2] as the testing environment for creating artificial life. There are also some great thoughts in that book about how science needs more cross-disciplinary experts to connect the dots between insights in one field and apply them to others.
> I deeply believed that organisms share the same basic set of genes
They do. What you're responding to says there's no one gene shared by all organisms. The genetic intersection of any two organisms is large; the genetic intersection over all organisms is empty.
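To see how that is possible, here's a toy Python sketch with made-up gene sets: every pair of "organisms" overlaps heavily, yet no single gene is present in all of them.

    from itertools import combinations

    # Hypothetical gene sets for four organisms (names are illustrative only).
    a = {"rpoB", "gyrA", "ftsZ", "dnaA"}
    b = {"rpoB", "gyrA", "ftsZ", "secY"}
    c = {"rpoB", "gyrA", "dnaA", "secY"}
    d = {"ftsZ", "dnaA", "secY", "gyrB"}

    sets = [a, b, c, d]
    for x, y in combinations(sets, 2):
        assert len(x & y) >= 2              # any two organisms share plenty
    print(set.intersection(*sets))          # set() -- nothing shared by all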
If you consider the genome to be information, life is effectively the process that information supports by encoding a method to accomplish the process. Replicating and using energy in a coordinated manner are two processes of life, but their information-based encoding can differ significantly based on the method used.
My understanding is that all known organisms share at least the ribosome RNA genes. Unless you're counting viruses, which is not exactly fair since they hijack the host's ribosomes.
But there are RPL1 through RPL41, and the same for RPS, and then MRPL and MRPS, so despite the similarities, one can argue different organisms may have different rRNA genes.
RPL1 and such are genes for ribosomal proteins. I'm talking about the genes that encode the ribosomal RNA subunits. These genes are used to infer the evolutionary relationships of the tree of life, so presumably they are shared across all known organisms, or else it would not be possible to infer a phylogenetic tree.
Is there a single set of instructions needed for Turing Completeness? It's possible to have Turing Completeness in a random access machine with just one instruction. I should think you really need to satisfy certain capabilities, and there would be many ways to satisfy those.
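For the curious, here is a minimal sketch of such a machine: SUBLEQ ("subtract and branch if less than or equal to zero"), a classic one-instruction set computer that is Turing complete given unbounded memory. The sample program is a hypothetical illustration.

    def subleq(mem):
        """Run a SUBLEQ machine until it jumps to a negative address."""
        pc = 0
        while pc >= 0:
            a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
            mem[b] -= mem[a]                     # the single instruction
            pc = c if mem[b] <= 0 else pc + 3    # branch if result <= 0
        return mem

    # Program: add mem[9] into mem[10], using mem[11] as a zeroed scratch cell.
    mem = [9, 11, 3,     # Z -= A   (Z = -7; result <= 0 jumps to the next instruction)
           11, 10, 6,    # B -= Z   (B = 5 + 7)
           11, 11, -1,   # Z -= Z; 0 <= 0, so jump to -1 and halt
           7, 5, 0]      # data: A = 7, B = 5, Z = 0
    print(subleq(mem)[10])  # 12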
> That's the most interesting point to me, I deeply believed that organisms share the same basic set of genes.
It should seem obvious in retrospect. Turing machines have many incarnations as different instruction sets in modern CPUs. I don't see why cells wouldn't have analogous multiple expressions, unless your deep belief was that there was only one way for protein chains to replicate.
> As @sixQuarks has already written, finding the minimal number of genes when there are 175 unknown ones and you don't know anything about their dependencies and relationships seems to be pretty much impossible.
Is there any way at all of simulating the tests? I'd like to apply metaheuristics to search for which combinations work.
This would require a quantum chemistry simulation many orders of magnitude larger and longer than what we are currently capable of simulating. Though I suppose you could in principle use heuristics to greatly reduce the computational complexity.
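As a toy sketch of what a heuristic search over gene subsets could look like in Python: the viability test below is a made-up stand-in for whatever simulation or wet-lab assay decides whether a design lives (the expensive part), with redundancy modeled as several genes covering the same function.

    import random

    # Hypothetical model: the cell lives iff every function has >= 1 provider.
    FUNCTIONS = {"replication": {"g1", "g2"},     # g1 and g2 are redundant
                 "membrane":    {"g3"},           # g3 is strictly essential
                 "metabolism":  {"g4", "g5", "g6"}}

    def viable(genes):
        return all(genes & providers for providers in FUNCTIONS.values())

    def greedy_minimize(genome, restarts=20, seed=0):
        """Randomized greedy knockout: delete genes one at a time, keeping any
        deletion that leaves the cell viable. Finds a local minimum only --
        different deletion orders can yield different 'minimal' gene sets."""
        rng = random.Random(seed)
        best = set(genome)
        for _ in range(restarts):
            genes, order = set(genome), list(genome)
            rng.shuffle(order)
            for g in order:
                if viable(genes - {g}):
                    genes.discard(g)
            if len(genes) < len(best):
                best = genes
        return best

    print(greedy_minimize(["g1", "g2", "g3", "g4", "g5", "g6"]))
    # e.g. {'g2', 'g3', 'g5'} -- which minimum you hit depends on deletion order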
The definition of life is a tricky question. The most obvious example is red blood cells, which dump their DNA. So they can't replicate, and viruses can't target them. However, insect drones also can't reproduce and are considered alive, so it's a little more open to debate than you might think.
As to life without DNA: the methodology is somewhat in question for this, but here is a more in-depth link: http://science.sciencemag.org/content/332/6034/1163.full Now, this is really 1:1 with DNA, just using slightly different chemistry, but it's hard to call something DNA when you swap out one of the building blocks.
Other examples are hard to locate, in large part because you can't directly detect this stuff.
Anyway, my point was that while a virus is 'stuck' interfacing with DNA, as it needs to infect things that use DNA, it can use RNA internally.
PS: For a really out there example, prions seem really close to life.
This does not mean that arsenate does not get into the bacteria, he points out. “It just shows that this bacterium has evolved to extract phosphate under almost all circumstances.”
Yes, it really wants phosphorus. But, a strong preference does not create phosphorus when there is none to be found.
Not that single cells really want or try etc, but you get the idea.
I'm not aware of any life which lacks DNA. This is kind of "definitional" - really depends on your definition of life, but I don't include viruses, because they don't have a regulated metabolism.
It's obviously incorrect. It's not even really a logic error, it's a category error.
At face value the statement "if not all living things share a common ancestor, that makes them appear more like they were intentionally created" (although that line of reasoning, i.e. the watchmaker argument, has been debunked time and time again) seems reasonable.
However the statement seems far more grandiose than it actually is.
If two very "primitive" organisms (i.e. operating at a much lower complexity with much smaller and fewer moving parts than even, say, an earthworm) don't share their sets of DNA that may be unexpected (because whatever step there was from "not having DNA" to "having DNA" had to happen twice, separately, successfully).
If we found out that, say, all higher lifeforms shared no common DNA with other lifeforms outside their group and those groups closely aligned to the Christian "kinds" (which don't really map to biologically distinct groups in meaningful or consistent ways) THAT would be astounding.
But even that extreme case wouldn't make the "creation hypothesis" (if you even want to call it that) more plausible because it presumes the existence of an unexplained organism that is far more complex than anything it supposedly created. It doesn't explain anything -- it just shuts up questions about the "origin" by giving an answer that is impossible to analyse further.
There may have been a creator. But it's impossible to make a consistent logical argument for it. And it's entirely impossible to make an argument that that creator would in any way resemble something modern mainstream religions are worshipping.
Interesting work, but it reminds me of the famous "Could a biologist fix a radio?" paper [1]. The paper imagines biologists trying to determine what parts of a radio are essential using a similar trial and error technique. As you can imagine, this technique would lead to many erroneous conclusions, especially when paired with the "publish or perish" and "gold rush" mentalities so prevalent in academia. It makes you wonder if we are trying to understand biological systems with a fundamentally wrong approach.
This criticism of the way biologists/geneticists/biochemists approach problems is, I think, quite misguided; it misunderstands both the limitations of testing things you can't manipulate with your bare hands, and how subtractive (take stuff out, see what happens) methods fit into the larger process of biological inquiry.
First, when you can't manipulate things directly, it becomes more difficult to tweak systems slightly to learn how they work. This is especially true if you start with something made from things you don't understand, and for which you may not even have a complete catalog. The radio paper raises some issues and acknowledges these challenges, but then sort of glosses over them, saying, "it's complicated, but that's just because we don't understand and don't use the right language."
Second, biologists don't just use subtractive methods, they try to use additive approaches to complement and test the hypotheses that come out of "it's broken when I take this bit out." Biochemists also try to reproduce complex behaviors by mixing only the purified components of the system in question. Biochemists and biophysicists also often measure very precisely physical and chemical properties to mathematically define the behavior of biological components.
The point is biology is hard, and we barely know anything about it. We have to start with simple experiments that point us to what's important and then progressively do better defined experiments to figure out how it works in detail. The next step here is: now that we know what proteins are important, let's figure out how they work exactly. We couldn't have gotten there before trying to knock out as many parts as possible.
TL;DR: this isn't the wrong approach, it's the right one, and given the limitations of the system, realistically the only one we have at this stage. It's also only the first step, now we get to try other approaches on what we've learned.
The last couple pages of the aforementioned paper give some thoughts on that. The author suggests an approach more like engineering that is focused on formalized mathematical descriptions of biological systems.
The submitted article talks about attempts to design a minimal set of genes, but they kept failing. The difficulty with your comment is it sounds like you're dismissing the work; I'm sure they are aware of that paper and its lessons. They tried working in that direction and failed.
Interesting, thanks for the link. I wonder about their trial and error approach as well. Scientists have conveniently broken down genetic material into individual genes which are easy to think about, but biology doesn't care about being easy to think about. It seems like to get anywhere they would have to knock down genes in all sorts of different combinations, not just one at a time.
I'm afraid it's not the individual genes themselves that are important to life, but a specific combination of those genes. Further complicating the problem is that we have no idea how many genes comprise an "essential group". Is it a combination of 2 genes? 3, 5, 10?
When you're talking about 175 unknown genes, the number of possible combinations is enormous. It's like finding a needle in a haystack the size of the solar system.
I don't think this brute force approach is going to work, we need a different way to figure this out, but I'm confident that once figured out, it will seem simple looking back on it.
Also, isn't there an issue of "gene" being something of an unnatural category? Nature doesn't provide a clean labeling of what part of the sequence is "a" gene, and the term historically existed (AIUI) to denote regions that the researcher found to have phenotypic relevance. So I'm not sure it's helpful to phrase a minimum in terms of "n genes". See:
> the term historically existed (AIUI) to denote regions that the researcher found to have phenotypic relevance
That's Mendelian genetics. Geneticists focused on DNA as a physical molecule rather than a conceptual unit of inheritance use the same word to refer to a region of DNA that produces a protein.
They're both important concepts, but they're not the same concept.
Okay, I got the terms wrong, but I think the core point stands: a gene is typically defined with respect to some research objective and area of interest, and is highly variable depending on what you're looking at. It doesn't (yet, at least) have a natural boundary. People looking at one phenomenon might see one boundary for a gene, while others see another.
In fact, I think that term is great for comparison: "phenomenon". Asking for the "minimum genes for viable life" seems like asking for the "minimum number of natural phenomena to put a man on the moon". Are those two phenomena really just different manifestations of "gravity" or are they two things in their own right? "It depends."
Arguably, the production of a protein is nearly the smallest element of the observable characteristic of an organism produced by the interaction of its genotype with the environment, and thus sort of the fundamental unit of phenotype, so, from that perspective, they are exactly the same thing.
They are not the same thing from any useful perspective. You can remove a (protein-coding) gene, or replace it, without altering the phenotype at all. You can make changes outside protein-coding regions and get large phenotypic changes.
I am thinking of George Boole (of Boolean algebra and logic), who wrote in the 19th century that a symbol (an individual gene in this case) is not meaningful in itself, but that its meaning depends on how it combines with other symbols.
Quote:
"...the validity of the processes of analysis does not depend upon the interpretation of the symbols which are employed, but solely on their laws of combination. Every system of interpretation which does not affect the truth of the relations supposed, is equally admissible, and it is true that the same process may, under one scheme of interpretation, represent the solution of a question on the properties of numbers, under another, that of a geometrical problem, and under a third, that of a problem of dynamics or optics." (Boole 1847)
> I don't think this brute force approach is going to work, we need a different way to figure this out, but I'm confident that once figured out, it will seem simple looking back on it.
Why not? Couldn't we use machines to automate this and analyze the results?
They got it down to 473 genes, but let's say that minimal life is some 20 of those genes. A brute force approach would require trying 10^34 combinations[1]. And that's already with the simplifying assumption that 20 genes is the minimal set rather than 19 or 106 or some other subset.
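For scale, that figure is easy to sanity-check in Python:

    import math

    # Subsets of exactly 20 genes out of 473:
    print(math.comb(473, 20))                          # ~8.6e34
    # Without the size-20 assumption, the space is all subsets:
    print(sum(math.comb(473, k) for k in range(474)))  # 2**473, ~2.4e142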
I wonder how they removed the genes? I am not an expert in microbiology. I work with mainly eukaryotes, but I am guessing intergenic distances are not a huge factor here? What about spatial arrangement of the genes? Did they generally leave that alone?
Genes are not independent functional modules. Their placement and arrangement on the genome matters. Did they only mess with coding features (genes)? Or did they also mess with other genomic features? Or do such things just not matter with bacteria?
With a handful of exceptions (for example TER units, and DnaA, which seems to need to be close to the replication origin), for most microbes, placement and arrangement don't matter.
I believe they wanted to modularize the bacteria by placing cassettes of common functionality (e.g. all the tRNAs together) in parts of the genome. I don't know if they are still working on that, or how it went.
In yeasts, the tRNAs have to be in the same orientation as the replication bubble, but there didn't seem to be that restriction in mycoplasmas.
Ah yes, that's true. I was thinking of operons, not genes. But you can shuffle operons without too many broader considerations of genomic structure. Some prokaryotic organisms seem to have all of their operons pointing in line with DNA replication (pointing origin -> ter), but IIRC switching the direction of those doesn't seem to make much of a difference either.
The article says they tried the additive/synthesis approach first, which didn't work. And then they tried the subtractive approach, starting with M. mycoides. And in that case they were left with a lot of "unexplained" genes.
But he reminds us that the subtractive process entirely depends on the starting cell.
I would like to hear more about the difference between additive and subtractive methods -- it wasn't entirely clear from the article.
Additive method: Synthesize genomes containing the genes you expect to be necessary. It's pretty much an optimization problem on the number of genes, where you continually try to remove genes that were present in previous iterations.
Subtractive method: After reading a bit of the paper, they utilized Tn5, a transposon that causes a DNA sequence to be inserted into the genome at random locations. This randomly disrupts genes, likely causing them not to function correctly.
Here's the logic:
If all genes were necessary we would expect no cells to live that had mutations.
If no genes were necessary we would expect all cells to have mutations at about an equal rate relative to the space they occupy. (Assuming no bias by the Tn5, but that's a nuance)
What they instead found was that some cells grew, but certain genes were never mutated in the survivors, meaning those genes are likely necessary.
They also classified a few things, like studying the growth of the cell (slow-growing but still viable was classified as "quasi-essential") and implanting their minimal genome in another species, which failed, giving evidence that a set of genes sufficient for survival in one cell is not necessarily sufficient in all cells.
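A toy simulation of that inference logic (the genome, gene names, and survival rule below are all made up) shows why genes that never tolerate an insertion among survivors get flagged as essential:

    import random
    from collections import Counter

    GENES = ["dnaA", "ftsZ", "rpoB", "cruft1", "cruft2"]
    ESSENTIAL = {"dnaA", "ftsZ", "rpoB"}           # ground truth, unknown to us

    def survives(disrupted_gene):
        return disrupted_gene not in ESSENTIAL     # stand-in for growing the mutant

    hits = Counter()
    for _ in range(10_000):                        # 10,000 random insertions
        gene = random.choice(GENES)                # Tn5 lands in a random gene
        if survives(gene):
            hits[gene] += 1                        # only survivors are observed

    print([g for g in GENES if hits[g] == 0])      # ['dnaA', 'ftsZ', 'rpoB']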
As far as I can tell, they synthesized the molecule.
A line from the paper: "We used whole-genome design and complete chemical synthesis to minimize..."
I did fret for a second over which meaning of "remove" daemonk was using (from the design? from the instantiation of the genome?) but decided I at least wasn't posting misinformation.
This reads to me as: they took a working system and took parts away until the system failed (or did something "odd" tending towards failure). Then they looked at the pieces they had taken away and guessed what functionality those pieces have.
Shouldn't they have replaced them with a gene of the same length, filled with only stop codons?
Otherwise, it would be like cutting out instructions from an assembly language file, without knowing whether one of the instructions does something like adding a constant integer value to the program counter. You might think the code that was removed was critical to the operation of the program, but you could have instead replaced all the instructions with an equivalent number of no-ops, and it would still work as expected.
That's like forking someone's program, but not understanding two thirds of the code. (What's the big deal? It happens!)
If you take someone's 4000-5000 line program and whittle it down to 473 lines which are still somehow useful, "newly created" doesn't apply in full honesty, let alone if you don't know what a third of those lines do.
This is great. This is the proper engineering approach to understanding life. Whittle it down to the minimal reproducing test-case, and poke it with a stick until you understand what all the parts do.
This is just excellent science. It seems like it should be very easy to get these unknown genes to reveal their function now. Very exciting times.
I find it fascinating we still don't really know how life works. I have a hunch we are going to find the large scale 3D structure of the chromosome is a big deal, and these genes regulate it. There aren't many good tools to study chromosome structure and it's quite possible there's a whole layer of information we've missed so far.
Actually there are new approaches to study exactly that, and they are developing rapidly, cf: https://en.wikipedia.org/wiki/Chromosome_conformation_captur...
Basically, this problem, along with a long list of other applications is being attacked by deep sequencing. The way it works is that you apply a treatment to DNA that 'glues' 3D contacts in place and covers the glued segment, apply restriction enzymes to cut out what's not covered by glue, get rid of the glue, sequence all that's left, and then map the remaining sequenced fragments to the reference genome. The output is the relative tendency of different regions to come into contact with each other.
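A minimal sketch of that final binning step in Python, assuming the read pairs have already been mapped (all coordinates below are made up):

    # Each ligation product maps to two genomic coordinates that were in 3D
    # contact; binning those pairs yields a matrix whose hot spots are
    # regions that tend to touch.
    GENOME_LEN, BIN = 580_000, 10_000
    n_bins = GENOME_LEN // BIN
    contacts = [[0] * n_bins for _ in range(n_bins)]

    contact_pairs = [(12_345, 250_000), (12_800, 252_000), (400_000, 405_000)]
    for p1, p2 in contact_pairs:
        i, j = p1 // BIN, p2 // BIN
        contacts[i][j] += 1
        contacts[j][i] += 1          # contacts are symmetric

    print(contacts[1][25])           # 2: bins 1 and 25 came into contact twice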
There is a significant branch of microbiology that studies the 3D structure of chromosomes. Epigenetics is the study of how gene expression (rather than composition) affects a phenotype. There is a strong influence of structure, telomeres, centromeres, and expression modifiers that work at a physical level. It will certainly be a major component of functional life. It took decades after genetics were established for the scientific community to accept epigenetics as a valid influence and avenue of study (due to both a lack of methods to accurately study these aspects of an organism and scientific inertia).
The whole story of molecular biology has been discovering layers of control and information: DNA, genes, gene regulation, RNA, RNA splicing, protein structure and regulation, protein interactions, micro-RNAs... and it's usually been limited by the tools that scientists could bring to bear. Possibly there's another control layer in charge of the long range structure of the chromosome. Being wet, squishy and fragile, it's hard to study.
This was my thought as well. Likely the remaining genes represent some kind of meta-process, affecting and/or controlling how the genes we understand are expressed.
Forgive my complete ignorance, but how do they even count genes? Given a very long string of base pairs, how is it possible to know where one gene ends and the next one begins? Do genes overlap?
I think this is a great question. I was vague on this as well but I found this site after a quick search [1]
A gene is a code for making a specific protein. Each gene is a sequence of three-letter "words" called "codons". There are special "start" and "stop" codons which mark the beginning and end of a gene.
From [1], "Mycoplasma genitalium: it has only about 480 genes in its genome of 580 070 nucleotide pairs." So in average, each gene is 1200 nucleotide pairs.
A nucleotide is one of [A,T,C,G]. So that means it encodes 2 bits of information.
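As a rough sketch of how a first-pass count can be made, here is a naive open-reading-frame scanner in Python (real gene callers use far more evidence than this, and also scan the reverse strand):

    STOPS = {"TAA", "TAG", "TGA"}

    def candidate_orfs(dna, min_codons=100):
        """Return (frame, start, end) for each ATG...stop stretch of at
        least min_codons codons: a crude first-pass proxy for 'a gene'."""
        orfs = []
        for frame in range(3):
            codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
            start = None
            for idx, codon in enumerate(codons):
                if codon == "ATG" and start is None:
                    start = idx                      # open a candidate ORF
                elif codon in STOPS and start is not None:
                    if idx - start >= min_codons:    # long enough to look real
                        orfs.append((frame, start * 3 + frame, (idx + 1) * 3 + frame))
                    start = None                     # close it either way
        return orfs

    print(candidate_orfs("ATGAAATTTTAA", min_codons=3))  # [(0, 0, 12)]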
That's a great start at least! Posting something that is wildly wrong is the best way to get someone who knows more to interject with corrected information, so hopefully someone with expertise can confirm or deny.
The cellular genome size of RGD3.0, the organism with a 473-gene genome, is 531 kbp according to the paper. This means the genome encompasses ~531,000 nucleotides (base pairs). For reference, the human genome encompasses approximately 3.3 billion base pairs.
Thanks! By the way, when you say "you can download the human genome", presumably this is the genome of a specific individual whose DNA was sequenced? What person is that?
No, it is a consensus sequence determined by sequencing the genomes of several individuals. I do not know how many individuals were sequenced for hg38, but hg37 was derived from 13 individuals according to [1].
Genomic regions with higher variability between individuals are annotated separately and are not represented in the main chromosomal sequence, but can be aligned to it.
However, the human "reference" genome (the updated main product of the Human Genome Project) is a patchwork of about 20 donors' genomes. Many (most?) of the donors are anonymous volunteers from Buffalo, NY, USA.
I encourage people to read the one-page summary and at least the section headings in the full article.
To answer a few questions that have come up:
How did they synthesize the genome? By creating short sequences of DNA and joining them together step by step to make larger sequences. They chemically synthesized short DNA sequences and assembled them into 1.4 thousand base pair (kbp) fragments. Five fragments were assembled into 7 kbp cassettes, which were assembled in yeast to generate 1/8 chunks of the genome, and these chunks were assembled in yeast to create the full genome. (See figure 2.) To try out deletions, they could replace a 1/8 chunk, rather than synthesizing the whole genome.
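The arithmetic of that hierarchy, for anyone who wants to sanity-check it (taking the ~531 kbp genome size quoted elsewhere in this thread):

    fragment = 1.4              # kbp, assembled from synthesized oligos
    cassette = 5 * fragment     # 7.0 kbp: five fragments per cassette
    chunk = 531 / 8             # ~66 kbp per eighth-genome chunk
    print(cassette, round(chunk), round(chunk / cassette))
    # 7.0 66 9 -> roughly nine or ten cassettes per chunk, eight chunks per genome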
Why can't they delete all the non-essential genes? Bacteria have a lot of redundant genes, where two different genes provide the same essential function. Just like redundant disks, you can remove one, but not all of them. And you'll get different minimal genomes depending on which gene you keep.
One interesting thing is that because the growth medium provides almost all the necessary nutrients, they could remove a lot of the metabolic genes, but needed to keep a lot of genes to transport molecules across the cell membrane. You can imagine a minimal cell constructed the opposite way.
Has anyone else here read Nick Lane's new book, The Vital Question? I just finished it, and this article really seems like a continuation of some of the theories he talks about.
I found out about the book from HN, and have since bought every other book by him. I'm almost done with Life Ascending now, which is also amazing.
Yes. I read The Vital Question after hearing about it on HN a few weeks ago and found it very interesting (although less so nearer the end, as it seemed to get more speculative). To oversimplify, mitochondria are the key to understanding life and eukaryotes. (I should study electron transport more closely.)
If I were a biology/biochem/genetics/etc type of student today this is exactly what I would love to work on. Perhaps someday we will actually understand how life works. That's both exciting and incredibly scary.
I don't know. The definition of "life" seems so arbitrary and nebulous. It seems really more like we're trying to attribute properties based on a wishful interpretation of emergent behaviour that ultimately boils down to extremely convoluted chemical reactions.
Trying to produce genuinely novel and distinct biological lifeforms sounds cool enough on its own though.
Yes, the boundary between life and non-life is not sharp, but to claim it follows that life is hard to define would be the continuum fallacy (https://en.wikipedia.org/wiki/Continuum_fallacy)
Manufacturing a living cell from purely chemical ingredients is a concrete enough goal.
And almost certainly it lies far, far off in the future.
It does involve loving a lot of really mundane and slow work, though. Bioscience is a lot more fun to read about than to do, because our tools still really, really suck (but grad students are plentiful).
"The function of 79 genes is a complete mystery.'We don't know what they provide or why they are essential for life — maybe they are doing something more subtle, something obviously not appreciated yet in biology' "
It is the activation key God puts on each living being... Ain't gonna work without it. =P
My take-away: work like this is somewhat akin to attempting to determine the specification of INTERCAL through reverse engineering, given a working program and a compiler.