sapsan (u/sapsan) - Readit News

sapsan commented on Ex-Meta scientists debut gigantic AI protein design model nature.com/articles/d4158... · Posted by u/gmays

throwaway24124 · a year ago

Are there any good resources for understanding models like this? Specifically a "protein language model". I have a basic grasp of how LLMs tokenize and encode natural language, but what does a protein language actually look like? An LLM can produce results that look correct but are actually incorrect, so how are proteins produced by this model validated? Are the outputs run through some other software to determine whether the proteins are valid?

sapsan · a year ago

I recently saw this about AlphaFold: https://elanapearl.github.io/blog/2024/the-illustrated-alpha.... I don't think it's going to answer all your questions, but it might still help!

sapsan commented on SynJax: Jax library for efficient probabilistic modeling of structured objects github.com/deepmind/synja... · Posted by u/sapsan

sapsan · 2 years ago

https://twitter.com/milosstanojevic/status/16888965587905208...

sapsan commented on Goodreads has no incentive to be good countercraft.substack.com... · Posted by u/lawgimenez

sapsan · 2 years ago

I have been enjoying https://oku.club/ recently in case anyone is searching for a nicely functioning alternative.

sapsan commented on I should have loved biology jsomers.net/i-should-have... · Posted by u/h2odragon

civilized · 3 years ago

Here's a story to illustrate. Recently there was a headline about some project at MIT that used CRISPR to figure out the function of every protein in a human cell (or something like that, I'm sure I misinterpreted it in some way). I told a friend who is an actual biologist, and he said of course they didn't literally do that, that would be impossible. So I guess what they really did was.... something-something with CRISPR that gave information about a wide range of proteins in the cell, or something. They added a lot of facts to the library. But they marketed it as if they had made a huge stride towards understanding how the whole machine works. That gets people like me more excited. We'd like to know how the machine works and then use that to make it work better.

sapsan · 3 years ago

I believe the parent refers to this [1,2,3] study. Indeed, this was about targeting many (11,923) genes with Perturb-seq (a CRISPR screen with a single-cell RNA-sequencing readout). Two human cell lines were used in the study (K562 and RPE1). For functional annotation, the authors focused on the 1,973 targeted genes that had a strong transcriptional phenotype after the perturbation. As there's some correlation structure, that's what they studied, annotating clusters of individual perturbations using public databases (like STRING [4]) and the literature. Seems like a lot of great work has been done here, though stating that we now know all the functions of all the genes might indeed be a bit of a stretch.

[1]: https://news.mit.edu/2022/crispr-based-map-ties-every-human-...
[2]: https://www.cell.com/cell/fulltext/S0092-8674(22)00597-9
[3]: https://gwps.wi.mit.edu/
[4]: https://string-db.org/

sapsan commented on The human genome is, at long last, complete rockefeller.edu/news/3208... · Posted by u/marc__1

Eduard · 3 years ago

What's a cell line, and do we know anything about who CHM13 is?

sapsan · 3 years ago

Seems to be an immortalized (telomerase*-transformed) cell line from a female fetus with near-complete homozygosity (https://sites.google.com/ucsc.edu/t2tworkinggroup/chm13-cell...).

* Telomerase is a reverse transcriptase that allows cells to achieve replicative immortality (https://academic.oup.com/hmg/article/9/3/403/715108).