Posted by u/subhrm 6 years ago
Ask HN: Can we create a new internet where search engines are irrelevant?
If we were to design a brand new internet for today's world, can we develop it in such a way that:

1- Finding information is trivial

2- You don't need services indexing billions of pages to find any relevant document

On our current internet, we need a big brother like Google or Bing to find any relevant information effectively, in exchange for sharing our search history, browsing habits, etc. with them. Can we design a hypothetical alternate internet where search engines are not required?

adrianmonk · 6 years ago
I think it would be helpful to distinguish between two separate search engine concepts here: indexing and ranking.

Indexing isn't the source of problems. You can index in an objective manner. A new architecture for the web doesn't need to eliminate indexing.

Ranking is where it gets controversial. When you rank, you pick winners and losers. Hopefully based on some useful metric, but the devil is in the details on that.

The thing is, I don't think you can eliminate ranking. Whatever kind of site(s) you're seeking, you are starting with some information that identifies the set of sites that might be what you're looking for. That set might contain 10,000 sites, so you need a way to push the "best" ones to the top of the list.

Even if you go with a different model than keywords, you still need ranking. Suppose you create a browsable hierarchy of categories instead. Within each category, there are still going to be multiple sites.

So it seems to me the key issue isn't ranking and indexing, it's who controls the ranking and how it's defined. Any improved system is going to need an answer for how to do it.

jtolmar · 6 years ago
Some thoughts on the problem, not intended as a complete proposal or argument:

* Indexing is expensive. If there's a shared public index, that'd make it a lot easier for people to try new ranking algorithms. Maybe the index can be built into the way the new internet works, like DNS or routing, so the cost is shared.

* How fast a ranking algorithm is depends on how the indexing is done. Is there some common set of features we could agree on that we'd want to build the shared index on? Any ranking that wants something not in the public index would need either a private index or a slow sequential crawl. Sometimes you could do a rough search using the public index and then re-rank by crawling the top N, so maybe the public index just needs to be good enough that some ranker can get the best result within the top 1000 (see the sketch after this list).

* Maybe the indexing servers execute the ranking algorithm? (An equation or SQL-like thing, not something written in a Turing Complete language). Then they might be able to examine the query to figure out where else in the network to look, or where to give up because the score will be too low.

* Maybe the way things are organized and indexed is influenced by the ranking algorithms used. If indexing servers are constantly receiving queries that split a certain way, they can cache / index / shard on that. This might make deciding what goes into a shared index easier.
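
To make the re-ranking bullet concrete, here is a minimal sketch, assuming a hypothetical coarse public index (term counts only) and a local crawl step; the data, index format, and scoring are all invented for illustration:

```python
# Sketch: coarse retrieval from a shared public index, then a private re-rank.
# `public_index` and `fetch_page` are stand-ins for whatever the shared
# infrastructure and a local crawler would actually provide.

from collections import defaultdict

# Hypothetical shared index: term -> {url: term_frequency}
public_index = {
    "penguin": {"https://zoo.example/penguins": 12, "https://blog.example/linux": 3},
    "exhibit": {"https://zoo.example/penguins": 5, "https://museum.example/art": 9},
}

def coarse_candidates(query, top_n=1000):
    """Cheap ranking using only what the public index stores (term counts)."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, tf in public_index.get(term, {}).items():
            scores[url] += tf
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def fetch_page(url):
    # Placeholder for a real crawl of the candidate page.
    return {"https://zoo.example/penguins": "penguin exhibit hours and tickets",
            "https://blog.example/linux": "tux the penguin mascot",
            "https://museum.example/art": "modern art exhibit"}.get(url, "")

def rerank(query, candidates):
    """Private, slower ranking that may use signals not in the public index."""
    terms = set(query.lower().split())
    def score(url):
        text = fetch_page(url).lower()
        return sum(text.count(t) for t in terms)
    return sorted(candidates, key=score, reverse=True)

print(rerank("penguin exhibit", coarse_candidates("penguin exhibit")))
```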

ergothus · 6 years ago
> Indexing is expensive. If there's a shared public index, that'd make it a lot easier for people to try new ranking algorithms. Maybe the index can be built into the way the new internet works, like DNS or routing, so the cost is shared.

But what are you storing in your index? The content that is considered in your ranking will vary wildly by your ranking methods. (Example: early indexes cared only about the presence of words. Then we started to care about the count of words, then the relationships between words and their context. Then about figuring out if the site was scammy, or slow.)

The only way to store an index of all content (to cover all the options) is to...store the internet.

I'm not trying to be negative - I feel very poorly served by the rankings that are out there, as I feel on 99% of issues I'm on the longtail rather than what they target. But I can't see how a "shared index" would be practical for all the kinds of ranking algorithms both present and future.

jppope · 6 years ago
this is a pretty killer idea
mavsman · 6 years ago
How about open sourcing the ranking and then allowing people to customize it? I should be able to rank my own search results how I want without much technical knowledge.

I want to rank my results by what is most popular to my friends (Facebook or otherwise) so I just look for a search engine extension that allows me to do that. This could get complex but can also be simple if novices just use the most popular ranking algorithms.

josephjrobison · 6 years ago
I think Facebook really missed the boat on building their own "network influenced" search engine. They made some progress in allowing you to search based on friends' postings and recommendations to some degree, but it seems to have flatlined in the last few years and is very constricting.

One thing I haven't seen much on these recent threads on search is the ability to create your own Google Custom Search Engine based on domains you trust - https://cse.google.com/cse/all

Also, not many people have mentioned the use of search operators, which allow you to control the results returned, such as "Paul Graham inurl:interview -site:ycombinator.com -site:techcrunch.com"

aleppe7766 · 6 years ago
That would lead to an even bigger filter bubble issue: more precisely, to a techno élite that is capable, willing, and knowledgeable enough to go through the hassle, with everyone else navigating such an indexing mess that it would pave the way for all sorts of new gatekeepers, belonging to the aforementioned tech élite. It’s not a simple issue to tackle; perhaps public scrutiny of the ranking algorithms would be a good first step.
greglindahl · 6 years ago
blekko and bing both implemented ranking by popularity with your Facebook friends, and the data was too sparse to be useful.
smitop · 6 years ago
If the details of a ranking algorithm are open source, it would be easy to manipulate them.
dex011 · 6 years ago
Open sourcing the ranking... YES!!!
MadWombat · 6 years ago
I wonder if indexing and ranking could be decentralized. Let's say we design some data formats and protocols to exchange indexing and ranking information. Then, instead of getting a single Google, we could have a hierarchical system of indexers and rankers and some sort of consensus and trust algorithm to aggregate the information between them. Maybe offload indexing to the content providers altogether, i.e. if you want your website found, you need to maintain your own index. Maybe build a market on aggregator trust: if you don't like a particular result, the corresponding aggregator loses a bit of trust and its rankings become a bit less prominent.
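
A hedged sketch of the trust-market part of this, with made-up aggregator names and a simple multiplicative trust penalty; a real system would also need consensus and Sybil resistance, which this ignores:

```python
# Sketch: merge rankings from several aggregators, weighted by trust,
# and decay an aggregator's trust when a user flags one of its results.

trust = {"agg-a": 1.0, "agg-b": 1.0, "agg-c": 1.0}  # hypothetical aggregators

def merged_ranking(rankings):
    """rankings: {aggregator: [urls best-first]} -> combined list best-first."""
    scores = {}
    for agg, urls in rankings.items():
        for pos, url in enumerate(urls):
            # Higher positions contribute more; trusted aggregators count more.
            scores[url] = scores.get(url, 0.0) + trust[agg] / (pos + 1)
    return sorted(scores, key=scores.get, reverse=True)

def flag_bad_result(url, rankings, penalty=0.9):
    """User disliked `url`: every aggregator that promoted it loses some trust."""
    for agg, urls in rankings.items():
        if url in urls:
            trust[agg] *= penalty

rankings = {
    "agg-a": ["https://spam.example", "https://good.example"],
    "agg-b": ["https://good.example", "https://ok.example"],
    "agg-c": ["https://good.example"],
}
flag_bad_result("https://spam.example", rankings)
print(merged_ranking(rankings))  # good.example now comfortably on top
```
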
allworknoplay · 6 years ago
Spitballing here, but what if instead of a monolithic page rank algorithm, you could combine individually maintained, open set rankings?

===Edit=== I mean to say that you as the user would gain control over the ranking sources; the company operating this search service would perform the aggregation and effectively operate a marketplace of ranking providers. ===end edit===

For example, one could be an index of "canonical" sites for a given search term, such that it would return an extremely high ranking for the result "news.ycombinator.com" if someone searches the term "hacker news". Layer on a "fraud" ranking built off lists of sites and pages known for fraud, a basic old-school page rank (simply order by link credit), and some other filters. You could compose the global ranking dynamically based off weighted averages of the different ranked sets, and drill down to see what individual ones recommended.
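
A rough sketch of that composition, assuming each ranking source publishes per-URL scores; the source names, scores, and weights are invented, and the breakdown is the "drill down" mentioned above:

```python
# Sketch: combine independently maintained ranking sources with user weights.
# Each source maps url -> score in [0, 1]; sources and values are illustrative.

sources = {
    "canonical": {"https://news.ycombinator.com": 1.0},
    "fraud":     {"https://scam.example": 1.0},           # high = known fraud
    "linkrank":  {"https://news.ycombinator.com": 0.8,
                  "https://blog.example": 0.4,
                  "https://scam.example": 0.6},
}

# User-controlled weights; the negative weight turns the fraud list into a filter.
weights = {"canonical": 2.0, "fraud": -5.0, "linkrank": 1.0}

def composite(url):
    """Weighted sum of every source's opinion, plus a drill-down breakdown."""
    parts = {name: weights[name] * scores.get(url, 0.0)
             for name, scores in sources.items()}
    return sum(parts.values()), parts

urls = {u for scores in sources.values() for u in scores}
for url in sorted(urls, key=lambda u: composite(u)[0], reverse=True):
    total, parts = composite(url)
    print(f"{total:+.2f}  {url}  {parts}")
```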

Seems hard to crunch in real time, but not sure. It'd certainly be nicer to have different orgs competing to maintain focused lists, rather than a gargantuan behemoth that doesn't have to respond to anyone.

Maybe you could even channel ad or subscription revenue from the aggregator to the ranking agencies based off which results the user appeared to think were the best.

TaylorAlexander · 6 years ago
Well, I suppose Google has some way of customizing search for different people. The big issue for me is that Google tracks me to do this. Maybe there could be a way to deliver customized search where we securely held the details of our customization. Or we could be pooled with similar users. I suppose if a ranking algorithm had all the possible parameters as variables, we could deliver our profile request on demand at the time of search. That would be nice. You could search as a Linux geek or as a music nut, or see the results different political groups get.
dublin · 6 years ago
Building something like this becomes much easier with Xanadu-style bidirectional links. Of course, building those is hard, but eliminating the gatekeeper-censors may finally be the incentive required to get bidi links built. It's also worth noting that such a system will have to have some metrics for trust by multiple communities (e.g. Joe may think say, mercola.com is a good and reliable source of health info, while Jane thinks he's stuck in the past - People should be able to choose whether they value Joe's or Jane's opinion more, affecting the weights they'll see). In addition (and this is hard, too), those metrics should not be substantially game-able by those seeking to either promote or demote sites for their own ends. This requires a very distributed trust network.
bobajeff · 6 years ago
I like the idea of a local personalized search ranking that evolves based on an on-device neural network. I'm not sure how that would work, though.
sogen · 6 years ago
Sounds like ad-Blocker repos, nice!
asdff · 6 years ago
Not to mention all the people who will carefully study whatever new system, looking for their angle to game the ranking.
daveloyall · 6 years ago
> When you rank, you pick winners and losers.

...To which people responded with various schemes for fair ranking systems.

...To which people observed that someone will always try to game the ranking systems.

Yep! So long as somebody stands to benefit (profit) from artificially high rankings, they'll aim for that, and try to break the system. Those with more resources will be better able to game the system, and gain more resources... ad nauseam. We'd end up right where we are.

The only way to break that [feedback loop](https://duckduckgo.com/?q=thinking+in+systems+meadows) is to disassociate profit from rank.

Say it with me: we need a global, non-commercial network of networks--an internet, if you will. (Insert Al Gore reference here.)

(Note: I don't have time to read all the comments on this page before my `noprocrast` times out, so please pardon me if somebody already said this.)

zeruch · 6 years ago
This is a bang on distillation of the problem (or at least one way to view the problem, per "who controls the ranking and how it's defined").
aleppe7766 · 6 years ago
That’s a very useful distinction, which brings me to a question: are we sure that automating ranking in 2019, on the basis of publicly scrutinized algorithms, would bring us back to pre-Google accuracy? Also, ranking on the basis of the query alone instead of the individual would lead to much more neutral results.
tracker1 · 6 years ago
Absolutely spot on... I've been using DDG as my default search engine for a couple of months, but Google has a huge profile on me, and I find myself falling back to Google a few times a day when searching for technical terms/issues.
Retra · 6 years ago
Couldn't you just randomize result ordering?
penagwin · 6 years ago
You know how google search results can get really useless just a few pages in? And it says it found something crazy like 880,000 results? Imagine randomizing that.

---

Unrelated: I searched for "Penguin exhibits in Michigan", of which we have several. It reports 880,000 results, but I can only go to page 12 (after telling it to show omitted results). Interesting...

https://www.google.com/search?q=penguin+exhibits+in+michigan

ZeroBugBounce · 6 years ago
Sure, but then whoever gets to populate the index chooses the winners and losers, because you could just stuff it with different versions of the content or links you wanted to win, and the random ranking would show those more often, because they appear in the pool of possible results more often.
cortesoft · 6 years ago
That would make it waaaay less useful to searchers and wayyy easier to game by stuffing results with thousands of your own results
onion2k · 6 years ago
I suspect just randomizing the first 20 or so results would fix most problems. The real issue is people putting effort in to hitting the first page, so if you took the benefit out of doing that people would look for other ways to spend their energy.
z3t4 · 6 years ago
If you find nothing useful, just refresh for a new set. It would also help discovery.
iblaine · 6 years ago
Yes, it was called Yahoo and it did a good job of cataloging the internet when hundreds of sites were added per week: https://web.archive.org/web/19961227005023/http://www2.yahoo...

I'm old enough to remember sorting sites by new to see what new URLs were being created, and getting to the bottom of that list within a few minutes. Google and search were a natural response to that problem as the number of sites added to the internet grew exponentially... meaning we need search.

kickscondor · 6 years ago
Directories are still useful - Archive of Our Own (https://archiveofourown.org/) is a large example for fan fiction, Wikipedia has a full directory (https://en.wikipedia.org/wiki/Category:Main_topic_classifica...), Reddit wikis perform this function, Awesome directories (https://github.com/sindresorhus/awesome) or personal directories like mine at href.cool.

The Web is too big for a single large directory - but a network of small directories seems promising. (Supported by link-sharing sites like Pinboard and HN.)

brokensegue · 6 years ago
Ao3 isn't really a directory since they do the actual hosting
adrianmonk · 6 years ago
I used Yahoo back in those days, and it literally proved the point that hand-cataloging the internet wasn't tractable, at least not the way Yahoo tried to do it. There was just too much volume.

It was wonderful to have things so carefully organized, but it took months for them to add sites. Their backlog was enormous.

Their failure to keep up is basically what pushed people to an automated approach, i.e. the search engine.

bitwize · 6 years ago
I found myself briefly wondering if it were possible to have a decentralized open source repository of curated sites that anyone could fork, add to, or modify. Then I remembered dmoz, which wasn't really decentralized -- and realized that "awesome lists" on GitHub may be a critical step in the direction I had envisioned.
stakhanov · 6 years ago
You don't have to go all the way back into Yahoo-era when it comes to manually curated directories: DMOZ was actively maintained until quite recently, but ultimately given up for what seems like good reasons.
iblaine · 6 years ago
This is true, and DMOZ was used heavily by Google's earlier search algorithms to rank sites. Early moderators of DMOZ had god-like powers to influence search results.
gerbilly · 6 years ago
Earlier than that there was a list of ftp sites giving a summary of what was available on each.
alangibson · 6 years ago
I wonder if you could build a Yahoo/Google hybrid where you start with many trusted catalogs run by special interest groups, then index only those sites for search. It doesn't fully solve the centralization problem, but it's interesting nonetheless.


ovi256 · 6 years ago
Everyone has missed the most important aspect of search engines, from the point of view of their core function of information retrieval: they're the internet equivalent of a library index.

Either you find a way to make information findable in a library without an index (how?!?) or you find a novel way to make a neutral search engine - one that provides as much value as Google but whose costs are paid in a different way, so that it does not have Google's incentives.

davemp · 6 years ago
The problem is that current search engines are indexing what is essentially a stack of random books thrown together by anonymous library goers. Before being able to guide readers to books, librarians have to perform the following non-trivial tasks over the entire collection:

- identify the book's theme

- measure the quality of the information

- determine authenticity / malicious content

- remember the position of the book in the colossal stacks

Then the librarian can start to refer people to books. This problem was actually present in libraries before the revolutionary Dewey Decimal System [1]. Libraries found that the disorganization caused too much reliance on librarians and made it hard to train replacements if anything happened.

The Internet just solved the problem by building a better librarian rather than building a better library. Personally I welcome any attempts to build a more organized internet. I don't think the communal book pile approach is scaling very well.

[1]: https://en.wikipedia.org/wiki/Dewey_Decimal_Classification

jasode · 6 years ago
>I welcome any attempts to build a more organized internet. I don't think the communal book pile approach is scaling very well.

Let me know if I misunderstand your comment but to me, this has already been tried.

Yahoo's founders originally tried to "organize" the internet like a good librarian. Yahoo in 1994 was originally called, "Jerry and David's Guide to the World Wide Web"[0] with hierarchical directories to curated links.

However, Jerry & David noticed that Google's search results were more useful to web surfers and that Yahoo was losing traffic. Therefore, in 2000 they licensed Google's search engine. Google's approach was more scalable than Yahoo's.

I often see suggestions that the alternative to Google is curated directories, but I can't tell whether people are unaware of the early internet's history and don't know that such an idea was already tried and how it ultimately failed.

[0] http://static3.businessinsider.com/image/57977a3188e4a714088...

PeterisP · 6 years ago
The current search engines are also indexing books maliciously inserted in the library in a way to maximize their exposure e.g. a million "different" pamphlets advertising Bob's Bible Auto Repair Service inserted in the Bible category.

A "better library" can't be permissionless and unfiltered; Dewey Decimal System relies on the metadata being truthful, and the internet is anything but.

You can't rely on information provided by content creators; manual curation is an option but doesn't scale (see the other answer re: early Yahoo and Google).

zaphar · 6 years ago
The really hard part of this to scale is the quality metric. Google was the first to really scale quality measurement by outsourcing it to the web content creators themselves.

Any attempt to create a decentralized index will need to tackle the quality metric problem.

agumonkey · 6 years ago
Also, there's a massive economic market on top of what is on the closest shelves. Libraries are less sensitive to these forces.
basch · 6 years ago
They are also a spam filter. It's not just an index of what's relevant, but removal of what maliciously appears to be relevant at first glance.
izendejas · 6 years ago
This. Everyone's missing the point of a search engine.

We're talking about billions of pages, and if they're not ranked (authority is a good heuristic), filtered (de-ranked), etc., then good luck finding valuable information, because everyone is gaming the system to improve their ranking.

I think this is part of the reason you get a lot of fake news on social media. It's a constant stream of information (a new dimension, time, has basically been added to the ranking) that needs to be ranked, and with humans in the loop there's no way to do this very easily without filtering for noise and outright malicious content.

IanSanders · 6 years ago
I think heavy reliance on human language (and its ambiguity) is one of the main problems.

Maybe personal whitelist/blacklist for domains and authors could improve things. Sort of "Web of trust" but done properly.

Not completely without search engines, but for example, if every website was responsible for maintaining its own index, we could effectively run our own search engines after initialising "base" trusted website lists. Let's say I'm new to this "new internet", and I ask around for some good websites for information I'm interested in. My friend tells me Wikipedia is good for general information, WebMD for health queries, Stack Overflow for programming questions, and so on. I add wikipedia.org/searchindex, webmd.com/searchindex and stackoverflow.com/searchindex to my personal search engine instance, and every time I search something, these three are queried. This could be improved with a local cache, synonyms, etc. As you carry on using it, you expand your "library". Of course it would increase the workload of individual resources, but it has the potential to give that web 1.0 feel once again.
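
A sketch of what the client side of that personal search engine might look like; the /searchindex endpoints and their JSON response shape are hypothetical, not an existing API:

```python
# Sketch: a personal "search engine" that fans a query out to the search
# endpoints of sites you trust and merges the answers. The /searchindex
# URLs are hypothetical; real sites would need to agree on such an API.

import json
from urllib.request import urlopen
from urllib.parse import quote

TRUSTED_INDEXES = [
    "https://en.wikipedia.org/searchindex",   # hypothetical endpoint
    "https://www.webmd.com/searchindex",      # hypothetical endpoint
    "https://stackoverflow.com/searchindex",  # hypothetical endpoint
]

def personal_search(query, timeout=5):
    results = []
    for endpoint in TRUSTED_INDEXES:
        try:
            with urlopen(f"{endpoint}?q={quote(query)}", timeout=timeout) as resp:
                # Assume each site returns JSON: [{"url": ..., "score": ...}, ...]
                results.extend(json.load(resp))
        except (OSError, ValueError):
            continue  # site down, endpoint missing, or bad JSON; skip it
    # Merge purely by each site's self-reported score; a real client would
    # normalise, cache, and add its own per-site trust weighting.
    return sorted(results, key=lambda r: r.get("score", 0), reverse=True)

# print(personal_search("treatment for migraine"))
```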

dsparkman · 6 years ago
This was devised by Amazon in 2005. They called it OpenSearch (http://www.opensearch.org/). Basically it was a standard way to expose your own search engine on your site. It made it easy to programmatically search a bunch of individual sites.
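
For a concrete feel, a tiny sketch of using an OpenSearch-style URL template from a client; the example.com template is made up, though {searchTerms} is the placeholder the OpenSearch spec defines:

```python
# Sketch: query a site that exposes an OpenSearch URL template.
# The template below is illustrative; real sites publish theirs in an
# OpenSearch description document.

from urllib.parse import quote

template = "https://example.com/search?q={searchTerms}&format=rss"

def build_search_url(template, query):
    # Substitute the user's query into the OpenSearch template parameter.
    return template.replace("{searchTerms}", quote(query))

print(build_search_url(template, "decentralized web index"))
# https://example.com/search?q=decentralized%20web%20index&format=rss
```
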
TheOtherHobbes · 6 years ago
This would be ludicrously easy to game. Crowdsourcing would also be ludicrously easy to game.

The problem isn't solvable without a good AI content scraper.

The scraper/indexer either has to be centralised - an international resource run independently of countries, corporations, and paid interest groups - or it has be an impossible-to-game distributed resource.

The former is hugely challenging politically, because the org would effectively have editorial control over online content, and there would be huge fights over neutrality and censorship.

(This is more or less where we are now with Google. Ironically, given the cognitive distortions built into corporate capitalism, users today are more likely to trust a giant corporation with an agenda than a not-for-profit trying to run independently and operate as objectively as possible.)

Distributed content analysis and indexing - let's call it a kind of auto-DNS-for-content - is even harder, because you have to create an un-hackable un-gameable network protocol to handle it.

If it isn't un-gameable it becomes a battle of cycles, with interests with access to more cycles being able to out-index those with fewer - which will be another way to editorialise and control the results.

Short answer - yes, it's possible, but probably not with current technology, and certainly not with current politics.

ehnto · 6 years ago
So long as there is a mechanism for categorizing information and ranking the results, people will try to game the mechanism to get the top spot regardless of your own incentives.

Despite their incentives to make money, Google has actually been trying for years to stop people from gaming the system. It's impressive how far they've been able to come, but their efforts are thwarted at every turn thanks to the big budgets employed to get traffic to commercial websites.

Nasrudith · 6 years ago
The only assured way to have a "neutral" search engine is to run your own spiders and indexers which you understand completely.

Neutral in that sense is only "not serving the agenda or judgement of another", at the obvious cost of labor - and not just as a one-off thing, since the searched content often attempts to optimize for views. It isn't like a library of passive books to sort through but a Harry Potter wizard portrait gallery full of jealous media vying for attention.

And pedantically, it isn't true neutral - but serves your agenda to the best of your ability. A "true neutral" would serve all to the best of their ability.

Besides, neutrality in a search engine is, on a literal level, oxymoronic and self-defeating - its whole function is to prioritize content in the first place.

narag · 6 years ago
A few years ago there was that blogs thing, with RSS... all things that favoured federation, independent content generation, etc. Now it's all about platforms. I understand that "regular people" are more comfortable with Facebook but, other than that, why are blogs and forums less popular now?
JaumeGreen · 6 years ago
The problem with forums is that you end up visiting 5~10 different forums, each with their own login, and some of them might be restricted at work (not that you should visit them often).

So it's easier to have 2~4 aggregators where all the information you desire resides, even if each of them contains different forums.

A unified entry point helps adoption.

r3bl · 6 years ago
I'd argue that forums and blogs require more effort.

Read a cool blog post? Nobody around you will ever give a shit, because in order to do so, they'd have to read it too. Shared a photo from a vacation? It might start a conversation or two with people around you, while you receive dozens or hundreds of affirmations (in the form of likes).

I don't like to use social networks, but that's what I fall back on when I have a few minutes to spare. I rarely look at my list of articles I've saved for later — who has time for that?

arpa · 6 years ago
The problem is actually multiple: a) most internet-connected devices these days favor content consumption over content creation (blogs vs Instagram),

b) mainstream culture > closely-knit communities (facebook > forums)

c) big-player takeovers (facebook for groups, google for search) over previously somewhat niche areas and, actually, internet infrastructure

d) if you're not a big player, you don't exist... and back to c)

z3t4 · 6 years ago
A search engine is more like putting the books in a paper shredder and writing the book title on every piece, then ordering the pieces by whatever words you can find on them - putting all the pieces that have the word "hacker" on them in the same box. The problem then becomes how you sort the pieces. Want to find a book about "hacking"? This box has all the shreds with the word "hacker" on them; you can find the book title on the back of each piece. The second problem becomes how relevant the word is to the book.
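
The analogy is describing an inverted index; here is a minimal sketch of one, with toy documents and crude term-count relevance:

```python
# Sketch: the "shredder" as an inverted index. Every word points back to the
# documents ("book titles") it appeared in, with a count for crude relevance.

from collections import defaultdict

docs = {
    "hacking-howto": "a hacker guide to hacker culture and tools",
    "cooking-101":   "a guide to cooking pasta",
}

index = defaultdict(dict)          # word -> {doc_id: count}
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word][doc_id] = index[word].get(doc_id, 0) + 1

def search(word):
    """Return doc ids sorted by how often the word appears (the 'box' for it)."""
    postings = index.get(word.lower(), {})
    return sorted(postings, key=postings.get, reverse=True)

print(search("hacker"))   # ['hacking-howto']
```
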
greglindahl · 6 years ago
The library index only indexes the information that fits on a card catalog card. That's extremely unlike a web search engine.

If you'd like to see an experimental discovery interface for a library that goes deeper into book contents, check out https://books.archivelab.org/dateviz/ -- sorry, not very mobile friendly.

Not surprisingly, this book thingie is a big centralized service, like a web search engine.

arpa · 6 years ago
maybe crowdsourcing would be a solution - something similar to the "@home" projects, only for web indexes/caches - maybe even leverage browsers via a plugin for web scraping. It already kind of works for getpocket.
tracker1 · 6 years ago
I don't think it would be an issue if Google wasn't creating "special" rules for specific winners and losers (overall). Hell, I really wish they'd make it easy to individually exclude certain domains from results.

The canonical example to me of something to exclude would be the expertsexchange site. After Stack Overflow, EE was more than useless, and even before that it was just annoying. There are lots of sites with paywalls and other obfuscations of content, and imho these sites are the ones that should be dropped/low-ranked.

But then there's the fact that there's no autocomplete for "Hillary Clinton is|has" (though "Donald Trump is" is also filtered). Yes, it's been heavily gamed. It's also had active meddling. And their control over YouTube seems to be even worse, with disclosed documents/video that indicate they're willing to go so far as outright election manipulation, with all indications that Facebook, Pinterest and others are going the same route.

ScottFree · 6 years ago
> or you find a novel way to make a neutral search engine

Just because nobody's said it in this thread yet: blockchain? I never bought into the whole bitcoin buzz, but using a blockchain as an internet index could be interesting.


KirinDave · 6 years ago
How would Merkle DAGs be relevant?
arpa · 6 years ago
even better, have something like git for the web - effectively working as an archive.
neoteo · 6 years ago
I think Apple's current approach, where all the smarts (Machine Learning, Differential Privacy, Secure Enclave, etc.) reside on your device, not in the cloud, is the most promising. As imagined in so much sci-fi (eg. the Hosaka in Neuromancer) you build a relationship with your device which gets to know you, your habits and, most importantly in regard to search, what you mean when you search for something and what results are most likely to be relevant to you. An on-device search agent could potentially be the best solution because this very personal and, crucially, private device will know much more about you than you are (or should be) willing to forfeit to the cloud providers whose business is, ultimately, to make money off your data.
jasode · 6 years ago
>, where all the smarts [...] reside on your device, not in the cloud, is the most promising. [...] An on-device search agent could potentially be the best solution [...]

Maybe I misunderstand your proposal but to me, this is not technically possible. We can think of a modern search engine as a process that reduces a raw dataset of exabytes[0] into a comprehensible result of ~5000 bytes (i.e. ~5k being the 1st page of search result rendered as HTML.)

Yes, one can take a version of the movies & tv data on IMDB.com and put it on the phone (e.g. like copying the old Microsoft Cinemania CDs to the smartphone storage and having a locally installed app search it) but that's not possible for a generalized dataset representing the gigantic internet.

If you don't intend for the exabytes of the search index to be stored on your smartphone, what exactly is the "on-device search agent" doing? How is it iterating through the vast dataset over a slow cellular connection?

[0] https://www.google.com/search?q="trillion"+web+pages+exabyte...

ken · 6 years ago
The smarts living on-device is not necessarily the same as the smarts executing on-device.

We already have the means to execute arbitrary code (JS) or specific database queries (SQL) on remote hosts. It's not inconceivable, to me, that my device "knowing me" could consist of building up a local database of the types of things that I want to see, and when I ask it to do a new search, it can assemble a small program which it sends to a distributed system (which hosts the actual index), runs a sophisticated and customized query program there, securely and anonymously (I hope), and then sends back the results.

Google's index isn't architected to be used that way, but I would love it if someone did build such a system.
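
A sketch of what "assembling a small program" from a local profile might look like; the payload format, field names, and the idea of POSTing it to an index host are all hypothetical, not any real search API:

```python
# Sketch: the device keeps its profile locally and compiles it into a
# declarative scoring spec that a (hypothetical) remote index would evaluate.
# Nothing here matches any real search API.

import json

local_profile = {
    "boost_domains": {"lwn.net": 2.0, "wikipedia.org": 1.5},
    "avoid_domains": {"content-farm.example": -5.0},
    "prefer_language": "en",
}

def compile_query(query, profile):
    """Turn query + local preferences into a serialisable 'query program'."""
    return json.dumps({
        "terms": query.lower().split(),
        "score": [
            {"signal": "domain", "weights": {**profile["boost_domains"],
                                             **profile["avoid_domains"]}},
            {"signal": "language", "prefer": profile["prefer_language"], "weight": 1.0},
        ],
        "limit": 20,
    })

payload = compile_query("rust async runtime comparison", local_profile)
print(payload)  # this is what would be sent, anonymously, to the index host
```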

packet_nerd · 6 years ago
Or even an online search engine that was configurable, where you could customize the search and assign custom weights to different aspects.

I'd love to be able to configure rules like:

+2 weight for clean HTML sites with minimal Javascript

+5 weight for .edu sites

-10 weight for documents longer than 2 pages

-5 weight for wordy documents

I'd also like to increase the weight for hits on a list of known high quality sites. Either a list I maintain myself, or one from an independent 3rd party.

Once upon a time I tried to use Google's custom search engine builder with only hand-curated high quality sites as my main search engine. It was too much trouble to be practical, but I think that could change with an actual tool.
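
A sketch of how rules like the ones listed above might be written down and applied, assuming hypothetical per-page features (domain, word counts, a minimal-JS flag) that an index would have to expose:

```python
# Sketch: user-configurable ranking as a list of (predicate, weight) rules
# over per-page features. The feature names are assumptions about what an
# index would expose, not an existing API.

RULES = [
    (lambda p: p["minimal_js"],                      +2),   # clean HTML, little JS
    (lambda p: p["domain"].endswith(".edu"),         +5),
    (lambda p: p["pages"] > 2,                      -10),   # long documents
    (lambda p: p["words"] / p["pages"] > 1000,       -5),   # wordy documents
    (lambda p: p["domain"] in {"en.wikipedia.org"},  +3),   # my trusted list
]

def score(page):
    return sum(weight for predicate, weight in RULES if predicate(page))

results = [
    {"url": "https://cs.example.edu/notes", "domain": "cs.example.edu",
     "minimal_js": True, "pages": 1, "words": 800},
    {"url": "https://seo-farm.example/post", "domain": "seo-farm.example",
     "minimal_js": False, "pages": 5, "words": 9000},
]
for page in sorted(results, key=score, reverse=True):
    print(score(page), page["url"])
```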

ntnlabs · 6 years ago
I think this is not what the original question was about. A device that knows you still needs an indexing service to find data for you. IMHO.
bogomipz · 6 years ago
I remember hearing something about Differential Privacy from a WWDC keynote a few years back, but I haven't heard much lately. Can you say how and where Apple is currently using Differential Privacy?
esmi · 6 years ago
https://www.apple.com/privacy/docs/Differential_Privacy_Over...

Apple uses local differential privacy to help protect the privacy of user activity in a given time period, while still gaining insight that improves the intelligence and usability of such features as:
• QuickType suggestions
• Emoji suggestions
• Lookup Hints
• Safari Energy Draining Domains
• Safari Autoplay Intent Detection (macOS High Sierra)
• Safari Crashing Domains (iOS 11)
• Health Type Usage (iOS 10.2)

Found via Google...
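
For intuition about what "local differential privacy" buys, here is the textbook randomized-response sketch: each device perturbs its own answer before sending it, yet the aggregate rate can still be estimated. This is an illustration only, not the mechanism Apple actually ships:

```python
# Sketch: randomized response, the textbook local-differential-privacy trick.
# A simplified illustration, not Apple's actual encoding.

import random

def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    if random.random() < p_truth:
        return truth
    return random.random() < 0.5

def estimate_rate(reports, p_truth=0.75):
    """Invert the noise: observed = p*true + (1-p)*0.5, solve for true."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

population = [random.random() < 0.3 for _ in range(100_000)]  # 30% "yes"
reports = [randomized_response(x) for x in population]
print(round(estimate_rate(reports), 3))  # close to 0.30, no individual exposed
```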

alfanick · 6 years ago
I see a lot of good comments here, I got inspired to write this:

What if this new Internet, instead of using URIs based on ownership (domains that belong to someone), relied on topic?

In examples:

netv2://speakers/reviews/BW
netv2://news/anti-trump
netv2://news/pro-trump
netv2://computer/engineering/react/i-like-it
netv2://computer/engineering/electron/i-dont-like-it

A publisher of a webpage (same HTML/HTTP) would push their content to these new domains (?) and people could easily access a list of resources (pub/sub like). Advertisements are driving the Internet nowadays, so to keep everyone happy, what if netv2 were neutral but web browsers were not (which is the case now anyway)? You can imagine that some browsers would prioritise some entries in a given topic, while some would be neutral but make it harder to retrieve the data that you want.
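
A toy sketch of the pub/sub reading of this, with topic paths as keys; the netv2:// scheme and everything else here is imaginary:

```python
# Sketch: topic-addressed publishing instead of owner-addressed URLs.
# The netv2:// paths are imaginary; this just shows the pub/sub shape.

from collections import defaultdict

topics = defaultdict(list)  # topic path -> list of published entries

def publish(topic, title, url):
    topics[topic].append({"title": title, "url": url})

def browse(topic):
    return topics.get(topic, [])

publish("netv2://speakers/reviews/BW", "B&W 606 review", "https://example.org/bw606")
publish("netv2://computer/engineering/react/i-like-it", "Why I like React",
        "https://example.org/react")
print(browse("netv2://speakers/reviews/BW"))
```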

Second thought: Guess what, I'm reinventing NNTP :)

decasteve · 6 years ago
Inventing/extending a new NNTP is a nice idea too.

The Internet has become synonymous with the web/HTTP protocol. The web alternatives to NNTP won instead of newer versions of Usenet. New versions of IRC, UUCP, S/FTP, SMTP, etc., instead of webifying everything, would be nice. But those services are still there and fill an important niche for those not interested in seeing everything eternal-Septembered.

bogomipz · 6 years ago
I believe there is/was an extension to NNTP for full-text search, or at least a draft proposal, no?
alfanick · 6 years ago
Another inspiration: DNS for searching.

What if we implemented a DNS-like protocol for searching? Think of recursive DNS. Do you have "articles about pistachio coloured usb-c chargers"? The home router says nope, the ISP says nope, Cloudflare says nope, let's scan A to Z. Eventually someone gives an answer. This of course can (must?) be cached, just like DNS. And just like DNS, it can be influenced by your not-so-neutral browser or ISP.
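
A sketch of that recursive, cached resolution chain; the resolver names and their data are invented, and real DNS details (TTLs, referrals) are only hinted at:

```python
# Sketch: DNS-style recursive resolution for search queries with caching.
# Resolver names and their data are invented for illustration.

import time

CACHE_TTL = 300  # seconds
cache = {}       # query -> (answer, expiry)

RESOLVERS = [
    ("home-router", {}),                                       # knows nothing
    ("isp",    {"pistachio usb-c charger": ["https://isp.example/a"]}),
    ("public", {"pistachio usb-c charger": ["https://pub.example/b"]}),
]

def resolve(query):
    hit = cache.get(query)
    if hit and hit[1] > time.time():
        return hit[0]                       # answered from cache, like DNS
    for name, known in RESOLVERS:           # walk up the chain until someone answers
        answer = known.get(query)
        if answer:
            cache[query] = (answer, time.time() + CACHE_TTL)
            return answer
    return []                               # nobody knew; would trigger a full scan

print(resolve("pistachio usb-c charger"))   # answered by the ISP resolver
print(resolve("pistachio usb-c charger"))   # second call comes from the cache
```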

quickthrower2 · 6 years ago
The proliferation of Black hat SEOs would render this useless.
PeterisP · 6 years ago
How would topic validity get enforced?

For example, if a publisher has a particular pro-Trump article, they would likely want (for obvious financial reasons) to push it to both netv2://news/anti-trump and netv2://news/pro-trump. What would prevent them from doing that?

Also, a publisher of "GET RICH QUICK NOW!!!" article would want to push it to both netv2://news/anti-trump and netv2://computer/engineering/electron/i-dont-like-it topics.

You can't simply have topics; you can have communities like news/pro-trump that are willing to spend the labor required for moderation, i.e. something like Reddit. But not all content has such communities willing and able to do so well.

swalsh · 6 years ago
I like this idea of people dreaming about a new internet :D

The idea of moving to a pub-sub-like system is a good one. It makes a lot of sense for what the internet has become. It's more than simple document retrieval today.

leadingthenet · 6 years ago
To me it seems that you’ve just recreated Reddit.
WhompingWindows · 6 years ago
You want to silo information and create built-in information echo chambers? That seems so bad for polarization.
volkk · 6 years ago
I'm starting to think echo chambers are just something that will forever be prevalent, and it's up to the users to try to view alternate viewpoints.
bouk · 6 years ago
If netv2 were neutral, I would just stuff all of the topics with my own content millions of times, so everyone could only see my content.


dymk · 6 years ago
Who maintains, audits, and does validation for content submitted to these global lists of topics?
codeulike · 6 years ago
That was what the early internet was like (I was there). People built indexes by hand: lists of pages on certain topics. There was the Gopher protocol, which was supposed to help with finding things. But this was all top-down stuff; the first indexing/crawling search engines were bottom-up, and it worked so much better. And for a while we had an ecosystem of different search engines, until Google came along, was genuinely miles better than everything else, and wiped everything else out. Really, search isn't the problem; it's the way that search has become tied to advertising and tracking that's the problem. But then DuckDuckGo is there if you want to avoid all that.
m-i-l · 6 years ago
In the very early days, you didn't need a search engine because there weren't that many web sites and you knew most of the main ones anyway (or later on had them in your own hotlists in Mosaic). Nowadays you need a search because there is so much content.

The problem is that the amount of content and the size of the potential user base are so large that it is impossible to offer search as a free service, i.e. it has to be funded in some way. Perhaps instead of free advertising-driven search, there would be space for a subscription-based model? Subscription-based (and advert-free) models seem to be working in other areas, e.g. TV/films and music.

Another problem though is that more and more content seems to be becoming unsearchable, e.g. behind walled gardens or inside apps.

vpEfljFL · 6 years ago
Exactly my thought. But it definitely wouldn't get mass adoption, which is good, because mass-market content websites are questionable in terms of user experience (they also need to cover content creation costs with popups/ads/pushes). One thing, though: ad-based search engines lift ad-based websites, because they can sell ads on the second end, too.

Maybe we'll see the advent of specialised paid search engine SaaSes with authentic and independent content authors, like professional blogs.

supernovae · 6 years ago
Search is the problem. If you don't rank in Google, you don't exist on the internet. There is an entire economy built on manipulating search that is pay-to-play, in addition to Google continually focusing on paid search over natural SERPs. Controlling search right now is controlling the internet.
bduerst · 6 years ago
>If you don’t rank in google you don’t exist on the internet.

Maybe in 2009. Today there are businesses that exist solely on Instagram, Facebook, Amazon, etc.

codeulike · 6 years ago
Whatever you replace Search with would be gamed in the same way.
Fjolsvith · 6 years ago
If your target audience isn't on Google, then you don't have to rank there.

Almost all of my customers find me through classified advertising websites. Organic and paid search visitors to my site tend to be window shoppers.

davidy123 · 6 years ago
I think in one sense the answer is that it always depends on who or what you are asking for your answers.

The early Web wrestled with this; early on it was going to be directories and meta keywords, but that quickly broke down (information isn't hierarchical, and meta keywords can be gamed). Google rose up because they use a sort of reputation-system-based index. In between, there was a company called RealNames that tried to replace domains and search with their authoritative naming of things, but that is obviously too centralized.

But back to Google: they now promote using schema.org descriptions of pages, over page text, as do other major search engines. This has tremendous implications for precise content definition (a page that is "not about fish" won't show up in a search result for fish). Google layers it with their reputation system, but these schemas are an important, open feature available to anyone to more accurately map the web. Schema.org is based on Linked Data, its principle being that each piece of data can be precisely "followed." Each schema definition is crafted with participation from industry and interest groups to generally reflect its domain. This open-world model is much more suitable to the Web than the closed world of a particular database (but some companies, like Amazon and Facebook, don't adhere to it, since apparently they would rather their worlds have control; witness Facebook's Open Graph degenerating into something that is purely self-serving).
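
As a small illustration of what such a schema.org description looks like in practice, here is a minimal JSON-LD object built in Python; the article details are invented:

```python
# Sketch: emitting a schema.org description as JSON-LD, the form search
# engines read alongside page text. The article itself is made up.

import json

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Keeping reef aquariums (not about fish markets)",
    "about": {"@type": "Thing", "name": "Reef aquarium"},
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2019-06-01",
}

# This JSON would be embedded in the page inside a
# <script type="application/ld+json"> tag.
print(json.dumps(article, indent=2))
```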

_nalply · 6 years ago
The deeper problem is advertising. It is sort of a prisoner's dilemma: all commercial entities have a shouting contest to attract customer attention. It's expensive for everybody.

If we could kill advertising permanently, we could have an internet as described in the question. It would almost be an emergent feature of the internet.

worldsayshi · 6 years ago
We could supercharge word of mouth. I've been thinking about an alternative upvote model where content is ranked not primarily on aggregate voting but by:

- ranking content from users you have upvoted higher

- ranking content from users with similar upvote behaviour higher

While there is a risk of upvote bubbles, it should potentially make it easier for niche content to spread to interested people and make it possible for products and services to spread through peer trust rather than cold shouting.
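
A hedged sketch of the "similar upvote behaviour" part using Jaccard overlap between upvote sets; the users and votes are invented, and a real system would still have to manage the bubble risk mentioned above:

```python
# Sketch: rank items higher when they were upvoted by users whose upvote
# history overlaps with mine. Usernames and votes are invented.

upvotes = {                      # user -> set of item ids they upvoted
    "me":    {"a", "b", "c"},
    "alice": {"a", "b", "d"},    # similar to me -> her votes count more
    "bob":   {"x", "y"},         # dissimilar -> his votes count little
}

def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

def personalized_score(item, me="me"):
    score = 0.0
    for user, votes in upvotes.items():
        if user != me and item in votes:
            score += jaccard(upvotes[me], votes)   # peer trust as the weight
    return score

candidates = ["d", "x"]
print(sorted(candidates, key=personalized_score, reverse=True))  # ['d', 'x']
```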

thekyle · 6 years ago
> ranking content from users with similar upvote behaviour higher

This is what Reddit originally tried to do before they pivoted.

https://www.reddit.com/r/self/comments/11fiab/are_memes_maki...

endymi0n · 6 years ago
As long as there are big companies making money off their products, you can be sure they'll find a way to advertise them to you.
eterps · 6 years ago
I've had similar ideas recently. Especially for niche content (or shared research), it would probably be notoriously hard (w.r.t. false positives) for machine learning to decide whether it is relevant to you; people with similar interests know that much better.

I was also wondering what would be good options to store votes/upvotes in a decentralized way.

scrollaway · 6 years ago
Not to echo an R&M quote on purpose, but that just sounds like targeted advertising with extra steps.
fifnir · 6 years ago
> ranking content from users with similar upvote behaviour higher

That's how you make echo chambers.

loxs · 6 years ago
So, basically Facebook?
Fjolsvith · 6 years ago
This sounds so much like Facebook.
vfinn · 6 years ago
Maybe if IPFS (~web 3.0) succeeds in the future, you could solve the advertising problem by inventing a meta network, where all the sites involved would agree to follow certain standardized criteria of site purity. You'd tag the nodes (or sites), and then have an option to search only sites from the pure network. Just a thought. Edit: Maybe this would lead to a growing interest in site purity, and as the network's popularity grew, you could monetize the difference to its advantage.
ativzzz · 6 years ago
Be careful what you wish for, as you might get AMP or some proprietary Facebook format as a standard instead.
olegious · 6 years ago
If we kill advertising, you can say goodbye to the vast majority of content on the internet. The better approach is to make advertising a better experience and to create incentives for advertisers to spend ad dollars on quality content.
rglullis · 6 years ago
There will always be bottom-feeders as long as there is a market where people are not forced to choose with their wallets. Killing the "vast majority of content on the internet" seems like a good thing to me, honestly.
_nalply · 6 years ago
Advertising just should not be the central means of income for content producers. I really hope this point of view gets killed together with advertising.
asark · 6 years ago
1) Not to most of the best content, 2) other business models may have an actual chance when not competing with "free", 3) actually-free, community-driven sites and services (and standards and protocols—those used to be nice) will have a larger audience and larger creator interest when not competing with "free" (and well-bankrolled).
fifnir · 6 years ago
The vast majority of content is absolute shit though, so speaking strictly for me, I'm willing to try
amelius · 6 years ago
The question was about search engines, not about content.

But I think the combination of advertising+search engines is particularly bad, so paying for search would be a great first step.

arpa · 6 years ago
maybe it's worth saying goodbye to the "8 reasons why the current internet sucks" pieces that drive spammy copywriters mad. The whole more-clicks-more-revenue approach did not do good things for online content.
marknadal · 6 years ago
I wrote up a proposal on this, changing the economics to adapt to and account for post-scarce resources like information:

https://hackernoon.com/wealth-a-new-era-of-economics-ce8acd7...

wolco · 6 years ago
To kill advertising would mean the web would live behind many walled gardens where each site requires a membership.

For the remaining free sites you would see advertising in different forms (the self-promotion blog, the upsell, t-shirt stores on every site, spam bait).

Advertising saved the internet.

Now tracking, for advertising or other purposes, is the real problem.


BjoernKW · 6 years ago
Other than a completely new approach to producing value, such as the 'Freeism' one described in the article suggested in this comment https://news.ycombinator.com/item?id=20282851 (which I haven't had time to read yet, and hence am neither in favour of nor against), this simply boils down to the questions of who will pay for relevant content and what the business model will be.

By and large, people don't seem to be willing to pay for content on the web. Hence, advertising became the dominant business model for content on the web.

Find another way for someone to pay for relevant content and you can do away with advertising. It's as simple as that.

TeMPOraL · 6 years ago
> By and large, people don't seem to be willing to pay for content on the web. Hence, advertising became the dominant business model for content on the web.

I don't think the causality is right here. People might not be willing to pay for content on the web because advertising enables competitors to offer content for free. If you removed that option, if people had no choice but to pay, it might just turn out that people would pay.

Fjolsvith · 6 years ago
> Find another way for someone to pay for relevant content and you can do away with advertising. It's as simple as that.

Not so simple. What is relevant for me may be irrelevant for you.

jppope · 6 years ago
Promotion is a need, and a very important one for ideas to spread. We all know that the concept of "if you build it, they will come" doesn't work. Google's adaptation for this was to make advertising relevant... which is actually a considerable improvement over historical media models...

There's a saying in sales: "people hate to be sold, but they love to buy"... which is akin to what you are saying here. Advertising isn't the problem... the problem is that the reasons why people are promoting aren't novel enough... (rent seeking... which creates noise)

bduerst · 6 years ago
The only way to kill advertising is to have perfectly efficient markets.

Until then, you're going to have demand for ferrying information between sellers and buyers, and vice versa, because of information asymmetry. You may disagree with some of the mediums currently used, finding them annoying, but advertising is always evolving to solve this problem, as is evident in the last three decades.