Readit News
sapsan commented on Ex-Meta scientists debut gigantic AI protein design model   nature.com/articles/d4158... · Posted by u/gmays
throwaway24124 · a year ago
Are there any good resources for understanding models like this? Specifically a "protein language model". I have a basic grasp of how LLMs tokenize and encode natural language, but what does a protein language actually look like? An LLM can produce results that look correct but are actually incorrect, so how are proteins produced by this model validated? Are the outputs run through some other software to determine whether the proteins are valid?
sapsan · a year ago
I recently saw this about AlphaFold: https://elanapearl.github.io/blog/2024/the-illustrated-alpha.... I don't think it's going to answer all your questions, but it might still help!
sapsan commented on Goodreads has no incentive to be good   countercraft.substack.com... · Posted by u/lawgimenez
sapsan · 2 years ago
I have been enjoying https://oku.club/ recently in case anyone is searching for a nicely functioning alternative.
sapsan commented on I should have loved biology   jsomers.net/i-should-have... · Posted by u/h2odragon
civilized · 3 years ago
Here's a story to illustrate. Recently there was a headline about some project at MIT that used CRISPR to figure out the function of every protein in a human cell (or something like that, I'm sure I misinterpreted it in some way). I told a friend who is an actual biologist, and he said of course they didn't literally do that, that would be impossible. So I guess what they really did was.... something-something with CRISPR that gave information about a wide range of proteins in the cell, or something. They added a lot of facts to the library. But they marketed it as if they had made a huge stride towards understanding how the whole machine works. That gets people like me more excited. We'd like to know how the machine works and then use that to make it work better.
sapsan · 3 years ago
I believe the parent refers to this study [1,2,3]. Indeed, it targeted many (11,923) genes with Perturb-seq (a CRISPR screen with a single-cell RNA-sequencing readout). Two human cell lines were used in the study (K562 and RPE1). For functional annotation, the authors focused on the 1,973 targeted genes that had a strong transcriptional phenotype after perturbation. As there's some correlation structure, that's what they studied, annotating clusters of individual perturbations using public databases (like STRING [4]) and the literature. Seems like a lot of great work has been done here, though stating that we now know all the functions of all the genes might be a bit of a stretch indeed.

[1]: https://news.mit.edu/2022/crispr-based-map-ties-every-human-...
[2]: https://www.cell.com/cell/fulltext/S0092-8674(22)00597-9
[3]: https://gwps.wi.mit.edu/
[4]: https://string-db.org/

sapsan commented on The human genome is, at long last, complete   rockefeller.edu/news/3208... · Posted by u/marc__1
Eduard · 3 years ago
What's a cell line, and do we know anything about who CHM13 is?
sapsan · 3 years ago
Seems to be an immortalized (telomerase*-transformed) cell line from a female fetus with near-complete homozygosity (https://sites.google.com/ucsc.edu/t2tworkinggroup/chm13-cell...).

* Telomerase is a reverse transcriptase that allows cells to achieve replicative immortality (https://academic.oup.com/hmg/article/9/3/403/715108).

u/sapsan

Karma: 77 · Cake day: January 31, 2015