
In practice it doesn't matter. What DeepMind has done far outstrips previous results. Now that we know neural networks are not a dead end for protein structure prediction, the accuracy will only improve with time.


Yes, we will just prepare more training data and train better networks. More and more training data... until we have mapped out every known protein and there is nothing left for the networks to predict...


The 2020 AlphaFold team mapped ORF8, a protein of SARS-CoV-2, the virus that causes COVID-19. It seems unlikely that mapping out everything we currently know of would be the end of their research. COVID-19 is probably not the last deadly pathogen we will encounter.


This is exactly how I feel; protein folding is almost chaotic in the sense that tiny differences in specific atomic positions end up having huge functional impacts. That's completely unlike, say, neural machine translation, where a slightly garbled translation is still mostly intelligible. I don't quite see how this approach to protein folding helps if you can't actually be sure about a predicted structure's function without doing the expensive experimental verification anyway.


Well, the other way to think about the problem is that protein structure can be robust to sequence changes. Among natural proteins, structures that look nearly the same can be built from sequences sharing as little as roughly 30% identity (evolution had a hand in making sure the structure stayed folded as the sequence diverged from its ancestor). As long as certain critical residues are preserved, the protein folds to roughly the same structure. AlphaFold takes advantage of this: part of the algorithm searches for sequence alignments with known proteins.
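
To make "sequence identity" concrete, here is a toy sketch (the sequences are made up, and this is not AlphaFold's actual pipeline, which builds multiple sequence alignments with tools like JackHMMER/HHblits):

    # Percent identity between two already-aligned sequences (illustrative only).
    def percent_identity(aligned_a: str, aligned_b: str) -> float:
        """Percentage of aligned (non-gap) positions where the residues match."""
        pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
        matches = sum(a == b for a, b in pairs)
        return 100.0 * matches / len(pairs)

    # Two hypothetical aligned fragments: ~70% identity, yet homologs at this level
    # often share essentially the same fold.
    print(percent_identity("MKT-AYIAKQR", "MKSLAYLAKHR"))  # 70.0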


Huh, I had no clue that was the case. Do you have examples of specific proteins that differ substantially in sequence but are similar in function?


It speeds up the experimental phase by a lot. It basically caches previous knowledge, so you don't need to repeat experiments for every possible configuration.

Isn't it amazing that the same model (the transformer) is now SOTA in both language and protein structure prediction? Seems like the real story here is the benefit the transformer could bring to many different fields, not just NLP.
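
As a rough illustration of "same model, different domain" (a hedged sketch with off-the-shelf PyTorch, not AlphaFold's actual Evoformer, which uses specialized attention over MSAs and residue pairs):

    import torch
    import torch.nn as nn

    # One vanilla transformer encoder; only the embedding/vocabulary changes.
    d_model, nhead, num_layers = 128, 4, 2
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    # "Language": tokens drawn from a (hypothetical) 10,000-word vocabulary.
    word_embed = nn.Embedding(10_000, d_model)
    sentence = torch.randint(0, 10_000, (1, 12))     # batch of 1, 12 word tokens
    print(encoder(word_embed(sentence)).shape)       # torch.Size([1, 12, 128])

    # "Proteins": tokens drawn from the 20 standard amino acids.
    aa_embed = nn.Embedding(20, d_model)
    protein = torch.randint(0, 20, (1, 50))          # batch of 1, 50 residues
    print(encoder(aa_embed(protein)).shape)          # torch.Size([1, 50, 128])

The architecture doesn't care what the tokens mean; the domain knowledge lives in the data, the tokenization, and the loss.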


I think part of the answer here is that it's easier to verify a structure than to determine one from scratch. Are x, y, and z where we expect them? Yes? Looks good.


Except you can’t, right? Figuring out where we expect them to be involves finding out where they are in the first place.


This isn't right.

You can conduct cheap and easy experiments to verify the results, as opposed to imaging, which doesn't always work anyway.


I don't quite understand this; what's an example of an experiment that verifies whether or not your predicted structure is accurate without determining the actual structure?


I mean, use the ML model to predict where they will be and then do an experiment to confirm it. My understanding from what I've read is that it's easier to make sense of experimental data (and likely cheaper) when you have a good idea what you're looking for.
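
Concretely, "confirm it" often boils down to superimposing the predicted coordinates onto whatever experimental coordinates you do manage to get and measuring the deviation. A minimal sketch of that comparison step (the coordinate arrays here are placeholders, not real data):

    import numpy as np

    def kabsch_rmsd(P, Q):
        """RMSD between two (N, 3) coordinate arrays after optimal superposition (Kabsch)."""
        P = P - P.mean(axis=0)                  # center both point clouds
        Q = Q - Q.mean(axis=0)
        U, S, Vt = np.linalg.svd(P.T @ Q)       # optimal rotation via SVD
        d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        return float(np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P)))

    # Hypothetical alpha-carbon coordinates: predicted vs. experimentally determined.
    predicted = np.random.rand(100, 3) * 10
    experimental = predicted + np.random.normal(scale=0.5, size=(100, 3))
    print(kabsch_rmsd(predicted, experimental))  # low RMSD = good agreement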


The more useful outcome is CAD for proteins. Studying existing proteins is just a small fraction of the possible uses.


> until we have mapped out all proteins known and there is nothing left to predict for the networks...

I would imagine there are also benefits to studying _unknown_ proteins, and even to working backwards from desired characteristics to discover possible new ones.



