Readit News
m-i-l commented on Ask HN: Is building for the web even worth it now?    · Posted by u/spaceman_2020
onion2k · a month ago
> I just find it hard to engage with anything AI-made, no matter how good

I don't think this is true for many people.

The best example is the movie industry. Hollywood was using AI (in the form of convolutional neural networks mostly) a decade ago to produce CGI effects for film. The younger versions of the actors in Captain America: Civil War (2016) were basically done with AI. No one outside of movie effects and CGI nerds really cared. They just enjoyed the film because the AI was done well.

When AI is done really well you can't tell. It's similar to good design. If something is designed well you don't notice. You only ever see bad design. Same for AI, you only see it when it's bad.

(Someone will now reply to say they thought the effects in Captain America were terrible, obviously. :) )

m-i-l · a month ago
I think this is missing the point - it is a bit like saying "you only ever notice bad fraud, if the fraud is well done you never notice it" - the point is what it is, not whether you notice it or not. With AI in films at the moment there are still people behind, and reviewing, the AI output, so it is just another creative tool, which is fine. However, if someone were to generate an entire 90 minute film and put it online without even having the decency to spend 90 minutes of their own time watching it themselves first, that would not be fine. But that is happening with AI slop on the internet now. Whether it is any good or not is not the point - the point is that it is disrespectful of people's time and attention.
m-i-l commented on Search My Site – open-source search engine for personal and independent websites   searchmysite.net... · Posted by u/OuterVale
1dom · 9 months ago
I like this, thank you! I just lost an hour of time to the exact sort of random but considered personal websites that I think made the Web great in the first place.
m-i-l · 9 months ago
Thanks for the great feedback:-) This is what searchmysite.net is attempting to do - help make "surfing the web" a fun leisure activity once more. It is good to see more people seem to get that point now. When it was on HN nearly 3 years ago[0], many people saw a search box and thought it must be a Google replacement, but were disappointed to find it wasn't. And I guess now more than ever it is useful to have a way of finding content on the web which has been made by humans rather than AI.

[0] https://news.ycombinator.com/item?id=31395231

m-i-l commented on Search My Site – open-source search engine for personal and independent websites   searchmysite.net... · Posted by u/OuterVale
1dom · 9 months ago
Best of both worlds:

> No results found for "digiatl". Did you mean to search for "digital" instead?

m-i-l · 9 months ago
At a big corporate, we had an Apache Solr based search with some reasonably clever lemmatization, stats analysis and spell check config to suggest alternative searches if not many results were found for the original query. One day someone reported an unfortunate edge case which caused a bit of a panic: if you searched "annual report" it returned "did you mean anal report?" (we were in the finance sector rather than the medical sector, but there were a lot more documents in the corpus containing words like analysts, analysis, analytics etc). Anyway, the point is yes, it is great to have that sort of functionality, but it does come at a cost, and a small project like this might prefer to keep it simple.
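
To make the trade-off concrete, here is a minimal sketch of a "did you mean" suggester in Python. It is not Solr's spell check component and not the config described above. It is a toy that assumes a hypothetical `doc_freq` table (term to number of documents containing it) and ranks candidates by edit distance, breaking ties by document frequency.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(term, doc_freq, min_hits=1, max_distance=2):
    """Return an alternative term, or None if the query already has enough hits."""
    if doc_freq.get(term, 0) >= min_hits:
        return None
    candidates = [
        (edit_distance(term, word), -freq, word)
        for word, freq in doc_freq.items()
        if word != term
    ]
    candidates = [c for c in candidates if c[0] <= max_distance]
    if not candidates:
        return None
    # Closest term wins; among equally close terms, the most frequent wins.
    return min(candidates)[2]


# Hypothetical index statistics, for illustration only.
doc_freq = Counter({"digital": 120, "digit": 15, "vital": 8})
print(suggest("digiatl", doc_freq))  # digital
```

Here "digital" and "digit" are both two edits away from "digiatl", and frequency decides between them. A frequency tie-break like this is one way a skewed corpus can push an unexpected word to the top, which is why this kind of feature needs tuning and monitoring.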
m-i-l commented on Search My Site – open-source search engine for personal and independent websites   searchmysite.net... · Posted by u/OuterVale
Sophira · 9 months ago
According to the site, the funding comes from its "Search as a Service" feature[0], where anybody can pay them in order to have a search service focused on their site (which does not have to be in the public index and thus doesn't have to be personal/independent).

So, in the sense that the funding comes (or aims to come) from larger companies, you are correct. It's not VC, but it does seem like it could end up relying on payments from large companies, making it potentially vulnerable.

[0] https://searchmysite.net/pages/about/#search-as-a-service

m-i-l · 9 months ago
That's right. Most search engines are funded by advertising, where there is a clear conflict of interest[0], not to mention incentive for spam etc. Alternative models include a subscription fee (which I don't think would work for a small niche search like this) and donations (which may or may not be sustainable). Looking through some of the support forums for the big search engines, I'm pretty sure that enough site owners would pay a fee for support to pay the running costs for a large search engine, although for a smaller search engine like this there needs to be something more than just support, hence the search as a service features.

[0] "Advertising funded search engines will be inherently biased towards the advertisers and away from the needs of consumers", to quote Sergey Brin and Lawrence Page in their "The Anatomy of a Large-Scale Hypertextual Web Search Engine" paper from 1998.

m-i-l commented on Search My Site – open-source search engine for personal and independent websites   searchmysite.net... · Posted by u/OuterVale
ThinkBeat · 9 months ago
I am a bit confused. Solr is the search engine.

An LLM model is loaded. What does the LLM model add to the solution?

m-i-l · 9 months ago
The LLM was for an experiment in retrieval augmented generation, i.e. "a chat with your website" style interface, using Apache Solr as the vector store. Results (on a small self-hosted LLM to keep costs manageable) weren't good enough for the functionality to be fully rolled out, so the LLM has been disabled and is likely to be fully removed.
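
For readers unfamiliar with the pattern, the retrieval step of retrieval augmented generation can be sketched in a few lines of Python. This is an illustration only and does not use Solr or any real LLM: the in-memory `store`, the hand-written three-dimensional vectors and the prompt wording are all assumptions standing in for a real vector store, embedding model and prompt.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, store, k=2):
    """Return the k stored passages most similar to the query vector."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(f"- {p['text']}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy "vector store": real embeddings would have hundreds of dimensions.
store = [
    {"text": "About page: this site is a personal blog.", "vector": [1.0, 0.0, 0.0]},
    {"text": "Post: notes on self-hosting a search engine.", "vector": [0.0, 1.0, 0.0]},
    {"text": "Post: a recipe for sourdough bread.", "vector": [0.0, 0.0, 1.0]},
]

query_vec = [0.1, 0.9, 0.0]  # pretend embedding of the user's question
top = retrieve(query_vec, store, k=1)
prompt = build_prompt("How is the search engine hosted?", top)
```

The quality of the final answer depends on both the retrieved passages and the model that reads them, which fits the observation that a small self-hosted model may not give good enough results.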
m-i-l commented on Search My Site – open-source search engine for personal and independent websites   searchmysite.net... · Posted by u/OuterVale
kreelman · 9 months ago
Thanks for putting this together. I wonder, is Postgres a bit of a large DB if it's just a personal website search tool? I'll have to give it a go. We need more tools like this.
m-i-l · 9 months ago
Postgres is just used for the site admin, i.e. keeping track of submissions, review status, subscriptions etc. The actual search index is in Apache Solr. In theory you could use Solr to store all the admin data, but it is generally not recommended to use a Solr style document store to master data. I guess something more lightweight like SQLite could be used, but it is intended to be deployed on servers and Postgres isn't too resource intensive.
m-i-l commented on In Praise of Print: Reading Is Essential in an Era of Epistemological Collapse   lithub.com/in-praise-of-p... · Posted by u/bertman
m-i-l · a year ago
A couple of references to the Nazis, but no reference to the Nazi book burnings, an incredibly symbolic physical manifestation of knowledge and information destruction, which I'd have thought would be very relevant in this context, i.e. in the praise of physical books? Perhaps it wasn't mentioned because it doesn't quite fit in with the narrative of digital being all bad, given digital knowledge can be more resistant to suppression and physical destruction.

Also some great quotes from 30 years ago, e.g. Carl Sagan's "when awesome technological powers are in the hands of the very few" the nation would "slide, almost without noticing, back into superstition and darkness". But did it actually have to end up this way? And is it still possible (with enough collective will power) to push Big Tech profiteering back enough to deliver some of the society enhancing changes originally envisioned in the mid-1990s? Just as it took decades for the full positive implications of the invention of the printing press to come to fruition, perhaps we still need more time before we decry the internet as a net negative?

m-i-l commented on Chi-fi tuning – Why it sounds piercing to Western ears (2020)   audioreviews.org/chi-fi-t... · Posted by u/userbinator
matthewmorgan · a year ago
I once saw a short YouTube clip of some kind of Chinese street music / singing performed by old men. It was ear piercing and weird and also strangely fascinating. I'll never be able to find it again.
m-i-l · a year ago
My children were given a soft toy a few years back from a relative who had bought it from a Chinese street market while on holiday in China. When it was switched on it jumped about frantically and sang a very loud and shrill song. Not 100% sure which language it was, but it is entirely possible it was some form of Chinese street music, and certainly fits the article's description of "Mainland Chinese recordings" as "shouty, harsh and ear-piercing". Normally my children love things that adults find annoying, but even they were afraid of this one.
m-i-l commented on Ask HN: Website with 6^16 subpages and 80k+ daily bots    · Posted by u/damir
superkuh · a year ago
I did a $ find . -type f | wc -l in my ~/www I've been adding to for 24 years and I have somewhere around 8,476,585 files (not counting the ~250 million 30kb png tiles I have for 24/7/365 radio spectrogram zoomable maps since 2014). I get about 2-3k bot hits per day.

Today's named bots: GPTBot => 726, Googlebot => 659, drive.google.com => 340, baidu => 208, Custom-AsyncHttpClient => 131, MJ12bot => 126, bingbot => 88, YandexBot => 86, ClaudeBot => 43, Applebot => 23, Apache-HttpClient => 22, semantic-visions.com crawler => 16, SeznamBot => 16, DotBot => 16, Sogou => 12, YandexImages => 11, SemrushBot => 10, meta-externalagent => 10, AhrefsBot => 9, GoogleOther => 9, Go-http-client => 6, 360Spider => 4, SemanticScholarBot => 2, DataForSeoBot => 2, Bytespider => 2, DuckDuckBot => 1, SurdotlyBot => 1, AcademicBotRTU => 1, Amazonbot => 1, Mediatoolkitbot => 1,
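
A per-bot tally like the one above can be produced by matching known bot tokens against the User-Agent field of each request. The following Python sketch shows the idea. The `KNOWN_BOTS` list is a small subset of the names above, and the sample User-Agent strings are illustrative rather than taken from a real log.

```python
from collections import Counter

# A few of the self-declared bot tokens from the list above.
KNOWN_BOTS = ["GPTBot", "Googlebot", "bingbot", "ClaudeBot"]


def tally_bots(user_agents, known=KNOWN_BOTS):
    """Count requests per named bot, using case-insensitive substring matching."""
    counts = Counter()
    for ua in user_agents:
        lowered = ua.lower()
        for name in known:
            if name.lower() in lowered:
                counts[name] += 1
                break
        else:
            # No known token found: a browser, or a bot not declaring itself.
            counts["unidentified"] += 1
    return counts


# Illustrative User-Agent values, one per request.
sample_agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
]
counts = tally_bots(sample_agents)
```

This only counts clients that announce themselves. Anything sending a browser-like User-Agent lands in the "unidentified" bucket.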

m-i-l · a year ago
Those are the good bots, which say who they are, probably respect robots.txt, and appear on various known bot lists. They are easy to deal with if you really want. But in my experience it is the bad bots you're more likely to want to deal with, and those can be very difficult, e.g. pretending to be browsers, coming from residential IP proxy farms, mutating their fingerprint too fast to appear on any known bot lists, etc.
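
As an illustration of why the self-identifying bots are the easy case, here is a small Python sketch using the standard library's `urllib.robotparser` to check what a robots.txt file permits. The robots.txt content and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and keep everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("GPTBot", "https://example.com/page"))          # False
print(allowed("Googlebot", "https://example.com/page"))       # True
print(allowed("Googlebot", "https://example.com/private/x"))  # False
```

These rules are advisory: they only affect crawlers that choose to read and honour them. A bot pretending to be a browser never consults the file, which is why those need other defences.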

u/m-i-l

Karma: 3941
Cake day: July 30, 2014
About
Personal website: https://michael-lewis.com/

Side project: https://searchmysite.net/
