That was going to be my suggestion for how to get around the anti-robot responses.
This solution uses a generous 87s delay between Amazon page retrievals. There are 328 films listed as "great movies" on rogerebert.com. As such, the script, named "1.sh", needs about 8h to complete, e.g., the time while you are at work or sleeping. No cookies, no state, no problems.
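A quick sanity check on that estimate, assuming one request every 87 seconds for all 328 titles:

```shell
# 328 titles x 87s between requests: total runtime in hours and minutes
total=$((328 * 87))                        # 28536 seconds
echo "$((total / 3600))h $((total % 3600 / 60))m"   # -> 7h 55m
```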
Usage: sh 1.sh > 1.html
Open 1.html in a browser and it shows whether each "great movie" is available as Prime Video or whether it is only available in some other format, such as Blu-ray, DVD, Multi-format or Hardcover. A link to the item on Amazon is provided.

#!/bin/sh
# Pull all 16 "great movies" index pages, extract the review slugs,
# then query Amazon for each title with an 87s delay between requests,
# and finally rewrite the output as minimal HTML.
curl -HUser-Agent: -H'Accept: application/json' --compressed 'https://www.rogerebert.com/great-movies/page/[1-16]?utf8=%E2%9C%93&filters%5Btitle%5D=&sort%5Border%5D=newest&filters%5Byears%5D%5B%5D=1914&filters%5Byears%5D%5B%5D=2020&filters%5Bstar_rating%5D%5B%5D=0.0&filters%5Bstar_rating%5D%5B%5D=4.0&filters%5Bno_stars%5D=1' |
grep -o "/reviews/great-movie-[^\\]*" |
sed 's/.reviews.great-movie-//' | sort | uniq |
while read x; do
  y=$(echo $x | sed 's/-/+/g')
  echo $x
  curl -s --compressed -HUser-Agent: https://www.amazon.com/s/?k=$y 2>/dev/null |
  grep -m1 -C4 a-link-normal.a-text-bold
  sleep 87
done |
sed '/^[^< ]/s/.*/@&,/;1s|.*|<base href=https://www.amazon.com />&|;s/ *//;/^$/d;/^[@<]/!s|$|</a>|;1s/@//;s/@/<br>/'
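For anyone reading the pipeline, the slug-to-query step is the `sed 's/-/+/g'` inside the loop. Spelled out on a hypothetical slug:

```shell
# What the slug-to-query step in the script does, on a hypothetical slug:
x="citizen-kane"                       # a slug as extracted from the review URL
y=$(echo "$x" | sed 's/-/+/g')         # dashes become plus signs for the query
echo "https://www.amazon.com/s/?k=$y"  # -> https://www.amazon.com/s/?k=citizen+kane
```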
> And now, we the architects of the modern web — web designers, UX designers, developers, creative directors, social media managers, data scientists, product managers, start-up people, strategists — are destroying it.
The interests of tech companies, investors and web professionals have not always aligned with the best interests of end users, and so there has been a gradual erosion of the freedoms embedded in the foundations of the web itself.
My favourite Star Trek moment is Captain Pike's statement "We are always in a fight for the future". Given the current state of the web, this feels truer than ever. Unlike the author, however, I don't think the answer is better web pages. Any answer that gives us a chance of winning the fight for user freedoms must be bigger and bolder than that.
There has been an entire generation of entrepreneurs and investors who have thought and planned strategically how to shape the web to work in their best interests. A meaningful counter has to be equally intentional and coordinated to stand a chance at shaping the course technology takes. We are in a fight for the future and we need to think bigger to stand a chance of winning that fight.
Absolute favourite episode, hands down: The Cage. According to Shatner's autobiography, NBC called the pilot "too cerebral" and "too intellectual".
"There has always been a place for commerce and marketing on the web."
Not really true as I remember it. The web opened up to the public in 1993. There was no commerce and marketing in the beginning. Even by 1996, while commerce and marketing may have existed (e.g., Amazon, founded in 1995), their place was in the background.

As I remember the early web, the foreground, the "starting point" or "portal", was something like Yahoo! You had to pick a topic (direction) that you wanted to go in. For example, if you were after music, you might end up browsing the Internet Underground Music Archive. The "front page" of the portal was predominantly non-commercial, mostly generic headings for topics. If you wanted to search out something commercial, no doubt you could, but the initial starting point was intellectual curiosity.

This is IMO what has been lost over time with regard to web use: intellectual curiosity and the ability to actually satisfy it. (A fun tangent here is the collections of inane queries that people type into Google. These are simultaneously hilarious and disturbing.)
As an experiment have a look at the Yahoo! page today. It is full of low quality mainstream "news". There is zero attention to intellectual curiosity. Nothing to see here, folks, but here is the latest news. For part 2 of the experiment, run a Google search for the term "music". The results are dominated by YouTube. Every result is directly or indirectly commercial (either selling something or conducting surveillance and serving ads), except one: Wikipedia. The chances of someone new to the web not following a link to YouTube or some other Google-controlled domain would seem almost nil.
The "onboarding" process for new web users is very different today than it was in the early 1990s. Perhaps it is still possible to approach the web with a sense of awe and wonder, pondering "What is out there?" However, a new web user today is unlikely to end up on a non-commercial website besides Wikipedia. What is out there? Surveillance, ads and an endless supply of soon-to-be-obsolete JavaScript du jour.
This article doesn’t mention a really, really straightforward factor for why AI hasn’t invaded these domains despite billions of dollars being dumped into them.
An automated process only has to be wrong once to compel human operators to double or triple check every other result it gives. This immediately destroys the upside as now you’re 1) doing the process manually anyway and 2) fighting the automated system in order to do so.
99% isn’t good enough for truly critical applications, especially when you don’t know for sure that it’s actually 99%; there’s no way to detect which 1% might be wrong; there’s no real path to 100%; and critically: there’s no one to hold responsible for getting it wrong.
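As a back-of-envelope illustration of why 99% isn't good enough: at a 99% per-item success rate, the chance of at least one error across 100 independent items (an illustrative count, not a figure from the article) is already well over half:

```shell
# P(at least one error in 100 items) = 1 - 0.99^100
awk 'BEGIN { printf "%.3f\n", 1 - 0.99^100 }'   # prints 0.634
```

And since you cannot tell which items fall in that 1%, every one of the 100 must be re-checked by hand.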
"... and critically: there's no one to hold responsible for getting it wrong."
Could this be part of "AI"'s appeal? A dream of absolving businesses and individuals from accountability.[2]
1. "What's more, artificial research teams lack an awareness of the specific business processes and tasks that could be automated in the first place. Researchers would need to develop an intuition of the business processes involved. We haven't seen this happen in too many areas."
2. Including the ones who designed the "AI" system.
I think the level of control programmers have over their domain naturally gives rise to that sort of overconfidence. You need to remember that computer systems are built on human made abstractions to human standards and follow human defined logic. DNA is not code, it's just a molecule that reacts with stuff, as are all the other molecules. They exist as they are and are their own system that needs to be understood, we did not create that system. Chemistry and probability and time did.
This sentence exemplifies the perspective to which I referred.
#!/bin/sh
test -s max-PMID||echo 32446294 > max-PMID;read x < max-PMID;x=$((x-1));h=pubmed.ncbi.nlm.nih.gov;
test ${#x} -eq 8||exec echo weird max-PMID;sed -i "/test/s/echo [0-9]\{8\} /echo $x /" $0;
case $1 in update) mkfifo 1.fifo 2>/dev/null;test -p 1.fifo||exec echo need 1.fifo;
(grep "<title>PMID .* is not available" < 1.fifo|sed 1q|sed 's/<title>PMID //;s/ *//;s/ .*//;' >max-PMID)&
y=$((x+10000));seq $x $y|sed '$!s|.*|GET /&/ HTTP/1.1\r\nHost: '"$h"'\r\nConnection: keep-alive\r\n\r\n|;
$s|.*|GET /&/ HTTP/1.1\r\nHost: '"$h"'\r\nConnection: close\r\n\r\n|'|socat - ssl:$h:443 >1.fifo 2>/dev/null;
;;"")awk -v min=1 -v max=$x 'BEGIN{srand();printf "https://'$h'/" int(min+rand()*(max-min+1)) "/\n"}';esac
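For anyone puzzling over the seq|sed stage in the update branch: it emits raw pipelined HTTP/1.1 requests, keep-alive on every ID except the last, which closes the connection. The same framing, spelled out with printf for three hypothetical IDs:

```shell
# Shape of the pipelined requests fed to socat: keep-alive on all but the
# last ID, then Connection: close so the server ends the session.
h=pubmed.ncbi.nlm.nih.gov
for id in 100 101; do
  printf 'GET /%s/ HTTP/1.1\r\nHost: %s\r\nConnection: keep-alive\r\n\r\n' "$id" "$h"
done
printf 'GET /102/ HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$h"
```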
#!/bin/sh
test -s max-PMID||echo 32449615 > max-PMID;read x < max-PMID;h=pubmed.ncbi.nlm.nih.gov;
test ${#x} -eq 8||rm max-PMID;sed -i "s/[0-9]\{8\}/$x/" $0;
case $1 in update) mkfifo 1.fifo 2>/dev/null;test -p 1.fifo||exec echo need 1.fifo;
(grep "<title>PMID .* is not available" < 1.fifo|sed 1q|sed -n 's/<title>PMID //;s/ *//;s/ .*//;wmax-PMID')&
y=$((x+10000));seq $x $y|sed '$!s|.*|GET /&/ HTTP/1.1\r\nHost: '"$h"'\r\nConnection: keep-alive\r\n\r\n|;
$s|.*|GET /&/ HTTP/1.1\r\nHost: '"$h"'\r\nConnection: close\r\n\r\n|'|socat - ssl:$h:443 2>/dev/null|grep -o '<title>[^<]*' >1.fifo;
;;"")awk -v min=1 -v max=$((x-1)) 'BEGIN{srand();printf "https://'$h'/" int(min+rand()*(max-min+1)) "/\n"}';esac
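The non-update branch in isolation, with a small fixed upper bound so it is easy to test (the real script uses the stored max PMID as max):

```shell
# Pick a random ID between min and max and print its PubMed URL.
# srand() seeds from the clock, so each run gives a different ID.
h=pubmed.ncbi.nlm.nih.gov
awk -v min=1 -v max=1000 'BEGIN{srand();printf "https://'"$h"'/%d/\n", int(min+rand()*(max-min+1))}'
```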
If you want to really grok genetics and be able to understand and interpret news and discussion about the field, it is worth the effort, especially considering how important the field is in our day-to-day lives, both with the virus and with biotech/medicine in general.
You mean like this: https://ds9a.nl/amazing-dna
Having worked in both industries, I prefer working with wet science people. For some reason they generally have a much healthier perspective on life. Their work is humbling because it is, and will forever be, full of unsolved mysteries, not simply because it is challenging. The other folks, whether they call themselves "scientists" or "engineers" or "developers" or "coders" or whatever, are working with something that as far as I can see has no inherent connection to the natural world, other than being a product of the human mind. Perhaps that affects the perspective many of them have on life. For example, how common among them is the belief that all things, not simply computers, can be thoroughly understood and mastered? Note this is pure opinion, not fact, and I am generalising; there are exceptions to every generalisation.
https://letterboxd.com/dvideostor/list/roger-eberts-great-mo...
You can look at each movie to see what streaming service it's on one at a time for free.
If you have a pro paid account, you can even do:
https://letterboxd.com/dvideostor/list/roger-eberts-great-mo...
Which shows that there are 39 movies in Amazon Prime US from Ebert's "Great Movies," not 21 like this guy's spreadsheet says.
To be fair, the exercise was to scrape the reference sources... so it might just need some refinement.
I would need to double-check whether both lists are correct, though; I have only confirmed the number totals.
Full disclosure: That letterboxd list is not mine, I just found it
I could be wrong, I am not a Prime Video user, but the result I got was that there are 217 movies in Prime Video from Ebert's great movies.
Instructions on how to generate 1.html are here: https://news.ycombinator.com/item?id=23508182