Oh wow, what a wild ride of an article! As someone who has led the building of recommender systems (as a PM) for the last 5 years, this rubs me the wrong way.
> I defend myself from arbitrary data collection that fuels the algorithms using PiHole, the tracker-blocking Disconnect plugin, and Firefox, plus a few other tricks.
The author clearly has no idea what he/she is talking about. Sure, everything they mention blocks the collection of tracking data, but recommender systems mainly track your behaviour inside the experience itself (clicks, page views, time on page).
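To make that concrete -- a minimal, hypothetical sketch, not any real platform's schema -- the signals a recommender feeds on are logged first-party, inside the product, where no tracker blocker ever sees them:

    # Hypothetical first-party engagement event, logged by the platform
    # itself. Nothing crosses a third-party domain, so PiHole, Disconnect,
    # etc. never see it.
    from dataclasses import dataclass, asdict
    import json, time

    @dataclass
    class EngagementEvent:
        user_id: str          # the logged-in account; no ad cookies needed
        item_id: str          # the video/article being viewed
        event_type: str       # "impression", "click", "dwell", ...
        dwell_seconds: float  # time on page, a strong implicit signal

    def log_event(event: EngagementEvent) -> None:
        # In production this would land in a queue feeding the training
        # pipeline; here we just print the JSON payload.
        print(json.dumps({**asdict(event), "ts": time.time()}))

    log_event(EngagementEvent("u123", "vid456", "dwell", dwell_seconds=42.0))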
> That leaves me and other privacy conscious folk with just one lever to pull - disliking a video
Again this author has it all wrong. If you’re serious about privacy the last thing you want to be doing is providing more input into the models we create!
And finally, the comment that made me realise this author has no idea what he's talking about:
> The solution to both issues is obvious, technically easy, and yet commercially a nearly impossible proposition: open up all recommendation algorithms. Make them completely transparent, and, for the individual being targeted by the recommendation engine, completely programmable.
Are you joking? So he/she thinks the easy answer is for every user to "program" their own algorithm. That's like saying every Tesla owner should be able to program their own car! Let's open the source code of Tesla to every owner!

Idiotic article.
a) I agree with you that the author lacks a basic understanding, and that they do not solve the problem.
Yet:
b) I think there actually might be a positive outcome for several reasons, including innovation, marketing and ethics, if one were to allow trained models to be queried arbitrarily, at least in the general case. (Think of what is happening with Stable Diffusion and LLMs/LMMs.) The recommendation systems of, say, YouTube might be more useful to people making videos -- essentially, one could create a pre-query system to consult before finishing a video (see the sketch after this comment).
c) As a couple others have noted, the Tesla example is not a good one. I understand it was formed in haste and I think we all can understand your reasoning. My thought though is:
There is a) the right to repair -- yes, still being fought for -- and b) full self-driving capability, which might actually demand open-sourcing code and models.
How else do we envision the courts and Congress formulating liability and empowering the related law-enforcement and rights-group investigations?
Recall that the aviation industry achieved its version of "FSD" by building the infrastructure and extreme reliability (space and aviation pioneered byzantine-fault-tolerant systems), and definitely not with black boxes.
P.S. As a personal note: I will never allow something to drive me if I carry any liability, especially if it is a limited "black box" that I cannot even arbitrarily query.
Right now we are "cruising" along because liability sits with the driver. Tesla and the auto industry will need to swallow that pill if they ever achieve their goal. Personally, I think they will.
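As a sketch of the pre-query idea in (b) above -- purely hypothetical; the endpoint and every field name below are invented, since no platform exposes its recommender this way -- querying an open model with draft metadata might look like:

    # Hypothetical sketch of "pre-querying" an openly queryable recommender
    # with draft video metadata before publishing. The endpoint and its
    # request/response shape are invented for illustration.
    import requests

    draft = {
        "title": "Building a PiHole from scratch",
        "tags": ["diy", "networking", "privacy"],
        "duration_seconds": 640,
    }

    # Ask the (hypothetical) open model how it would score this draft for
    # a few audience segments, without uploading anything.
    resp = requests.post("https://recsys.example.com/v1/query", json={
        "item_features": draft,
        "audience_segments": ["diy-electronics", "privacy-enthusiasts"],
    })
    print(resp.json())  # e.g. a predicted ranking score per segment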
> That’s like saying every Tesla owner should be able to program their own car! Let’s open the source code of Tesla to every owner!
Other critiques aside, these things are not very similar. You can't run someone over with a buggy recommender system. You bork your own newsfeed, who cares?
My point was that it's not easy at all to "open up" a machine learning model for a user to "program". It's complex enough for an ML engineer to develop!
In theory, maybe it is possible, but it would be incredibly expensive to produce such a system, let alone maintain it. But even then, the ability to control an ML system is such a niche ask that it would never happen.
I think you should be able to program your own car and that should be the default. And I think that can be made safe too.
It does not mean that any change should be allowed on the road. But there could be a system where you pay a fee and send your code to be audited and approved for use on public roads. Otherwise you can only run that code on a test track.
And also not all parts of a car's software are safety critical.
Also working in the field, and I don't understand why this article is even on the front page. The author clearly has no idea what they're talking about; it almost reads as satire.
I have a feeling pooping on recommendation systems is in vogue right now. Social commentators love to point out how "evil" ML algorithms are while conversely complaining that X product is terrible at recommendations.
"Completely programmable" is obviously stupid, but having more control over the way items are ranked in your personal feed isn't, particularly when weights are chosen by hand.
I feel like "open the recommendation algorithms!" is easy to _say_, and it's less easy (and maybe impossible) to do effectively.
Should we have access (by law) to the code that makes recommendations on content? That seems like a governmental overreach. If that were the case, then we should also have direct access to the brain of every editor/layout person at every newspaper. Why was one story above the fold vs below the fold vs on page 4? I don't see a whole lot of distinction between those two.
Once we get access to that, how do we keep people from abusing these algorithms on the supply side? Thought SEO sites were bad before? Guess what happens when those sites know what they're _actually targeting_, as opposed to blindly guessing!
As a question, how well do you understand the recommender system that you've created? For a given piece of content, would you be able to guess at the penetration into the platform? As an outsider, I sometimes feel like the recommender systems I interact with have grown out of the platform holder's control (see: Nazi/ISIS/ISIL propaganda on YouTube, etc.). I think these are the points the article is reacting to.
So if the article is saying "I would really rather not see Nazi/ISIS/ISIL propaganda," then, ok, that's good. That's a good want. But we're going to need an algorithm to determine what that propaganda _is_, and then downrank that for that user.
So, is the conclusion of the article... misguided? Sure! Is the desire of the author valid?
I don't know. I don't have time to read these articles.
I hope the /s at the end of the last comment was loud enough to not be stated.
But I read the article. Yeah, it was dumb. Specifically:
> If recommendation algos aren’t shared then we need - by legislation, if necessary - a switch that turns the recommendation engine off.
This is... baffling. On YouTube - the case in point of the article - you already have access to a non-algorithmic feed of videos: namely, your Subscriptions feed.
And if you don't want a recommendation algorithm, good luck finding new content that you like.
And if you "turn off" the recommendation algorithm, you want... what, exactly? The latest videos uploaded to the platform? Still a recommendation algorithm. Videos with words in titles similar to the last video you watched? Still a recommendation algorithm. Videos that the creators of your favorite channel watched? STILL A RECOMMENDATION ALGORITHM.
I was wrong. The desire of the author is far dumber than I originally gave credit for.
The distinction is easy to make: a recommendation algorithm is personal, automated, scalable to a gigantic amount of content, and entirely opaque to the average user.
The social implications are much more dangerous than those of a newspaper editor and journalist following their paper's political bias.
Getting tired of this, because the algorithms themselves are worthless. Instead, demand websites that aren't just a recommendation feed. Demand the ability to categorize and rank content explicitly.
The problem is that these websites are creating a feed, and then training themselves based on how you engage with...the feed. It's a toxic feedback loop, with almost no levers for you to interact with. If you get stuck in a rabbit hole, it is difficult to teach it how to get you out. That's independent of the "algorithm" being used, it's a problem with all of them.
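A deliberately minimal caricature of that loop (toy code, not any real training pipeline):

    # Minimal caricature of the engagement feedback loop: the model
    # retrains on clicks it induced itself, so early beliefs compound.
    import random
    random.seed(0)

    interest = {"cooking": 0.5, "rabbit_hole": 0.5}  # model's belief about the user

    for _ in range(50):
        shown = max(interest, key=interest.get)  # feed shows the current favorite
        # The user clicks what is shown more often than not, simply
        # because it is what is on screen (exposure bias).
        if random.random() < 0.6:
            interest[shown] += 0.1  # "training" on the induced engagement

    print(interest)  # one topic has run away with the feed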
Not you personally, but users in general. See: Reddit. I never "accidentally" get into a conspiracy theory rabbit hole, that's something I can consciously join or unfollow. And because I have an explicit list of subreddits I'm a member of, the platform can much more accurately recommend me content when I'm in the mood to explore.
No need to make every person categorize/rank their own content. But let them choose which of the 3rd party ranking algorithms they'd like to have serving them their feed today.
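Mechanically (a sketch of the idea only -- every name here is hypothetical, no platform offers this interface), that could look like the platform supplying candidates and the user choosing the ranker:

    # Sketch: the platform supplies the candidate pool, the user picks
    # which third-party ranker orders it. All names are hypothetical.
    from typing import Protocol

    class Ranker(Protocol):
        def rank(self, candidates: list[dict]) -> list[dict]: ...

    class ChronologicalRanker:
        def rank(self, candidates):
            return sorted(candidates, key=lambda c: c["uploaded"], reverse=True)

    class SubscriptionsOnlyRanker:
        def __init__(self, subscribed: set[str]):
            self.subscribed = subscribed
        def rank(self, candidates):
            return [c for c in candidates if c["channel"] in self.subscribed]

    def build_feed(candidates: list[dict], ranker: Ranker) -> list[dict]:
        return ranker.rank(candidates)

    pool = [
        {"id": "v1", "channel": "chA", "uploaded": 2},
        {"id": "v2", "channel": "chB", "uploaded": 1},
    ]
    # Which ranker serves the feed today is the user's call, not the platform's.
    print(build_feed(pool, SubscriptionsOnlyRanker({"chB"})))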
Who has time for this? What about discovering interesting and new content amidst millions of options?

As for new stuff, this is how I generally do it:

Interact with friends and others near me. We trade all the time, recommend all the time. This is my long-standing favorite.

Community discussion. Here, for one. This is a strong second.

Search, and now it appears a GPT entity may well serve me up stuff I am going to like and need.

See what "the feed" has in it.

I certainly am not going to take the time to do that. But at the same time, I also completely ignore programmatic recommendations, because I find them terrible.
What I do to discover interesting new stuff is to listen to the recommendations of friends, and to check out references or recommendations made from the stuff I already enjoy.
Basically, I do what everyone used to do to find new music, but for all other media as well.
> For example, Twitter releasing the algorithm showed us that certain specific people were getting artificially boosted. We also saw that the topic of the war in Ukraine was getting special treatment. Yes, we all guessed that was happening, but without the code it was just supposition that Twitter could deny.
It did? Did I miss something here? I don't remember seeing that. There was certainly code that tallied certain specific people and topics, but I don't see an indication of what was done with that tally.
And second, Twitter claims that whole area was for telemetry.
This comment does a massive disservice & deeply injures the cause for real transparency. It does it by making a very cheap shot at a very dumb situation. To say that this makes not having the models OK or interesting or useful? I highly protest. This is a sad, distracting sideshow, & letting oneself be lured & baited into thinking this little morsel is filling or informative is... ugh, I just can't. No, that's just too low a position.
We can't run the algorithm without the weights, but we can understand it -- or rather, we can understand it as much as it's possible for anyone to understand it. Having the weights will not add understanding. Having the loss function used during training will.
Yes, it will, because with weights you can run it against constructed data, and that kind of experiment with the actual algorithm (including weights) is a much better source of practical understanding than abstract analysis of the algorithm without weights.
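That experiment is cheap once the weights are public (a generic sketch with a stand-in linear model, not Twitter's actual code):

    # Generic sketch of probing a released model: construct two inputs
    # differing in one feature and compare scores. The "model" here is a
    # stand-in linear scorer, not any platform's real one.
    import numpy as np

    rng = np.random.default_rng(42)
    W = rng.normal(size=5)  # pretend these are the released weights

    def score(features: np.ndarray) -> float:
        return float(features @ W)

    base = np.array([1.0, 0.0, 0.3, 0.5, 0.2])  # constructed input
    probe = base.copy()
    probe[1] = 1.0  # flip one feature, e.g. "author is on a boost list"

    # The delta is what that feature is worth in practice -- something no
    # amount of reading the loss function alone tells you.
    print(score(probe) - score(base))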
Your perspective seems to be that I'm not asking for nearly enough, and I 100% agree & apologize for underrepresenting the threat to society & the opacity of the new enemy of democratic social good. Thank you & yes, yes, yes. We need the training data too.
The weights aren't going to help understand "why we see what we see", which is what non-tech people (and this article) think they'll get from companies releasing "the algorithm"
Yes, plus when you call a support desk, if the "person" on the phone is not a real person, the first words said should be "You are talking with an AI, not a person."
I believe the only reason Commercial Orgs are looking at ChatGPT is to eventually replace all support people with some kind of AI.

You want to make the call even longer than necessary to provide me with information I can do nothing about. Excellent.
I care a lot more about outcomes than whether the person I'm talking to is "real" or not. If my bill gets corrected by an AI but not a person? I'll take the AI.
Think of the classic "is 0.002 cents the same as 0.002 dollars?" call - https://verizonmath.blogspot.com/ - I just tested, and ChatGPT seems to understand the difference - so maybe this would be a case where the AI is better.
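The arithmetic at stake in that call, for reference (figures as I recall them from the linked blog, so treat the exact numbers as approximate):

    # 0.002 cents/KB (quoted) vs. 0.002 dollars/KB (billed) -- a plain
    # factor-of-100 disagreement. Usage figure as recalled from the blog.
    usage_kb = 35893

    quoted_dollars = usage_kb * 0.002 / 100  # 0.002 CENTS per KB
    billed_dollars = usage_kb * 0.002        # 0.002 DOLLARS per KB

    print(f"quoted: ${quoted_dollars:.2f}")  # ~$0.72
    print(f"billed: ${billed_dollars:.2f}")  # ~$71.79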
edit: to be clear I'm fine with labeling if that's desired, it's the "I want to talk to a real person" that I think is silly (and oddly similar to the "I want to talk to an American" that people also say).

So anything that you don't value is silly?
And most people don't care how many carbs are in a candy bar but we mandate labelling because it creates transparency and accountability and empowers the public. We should be doing the same with technology, only far more aggressively and comprehensively.
If they remove the human element, then all the rules become hard and fast, because we lose the ability to reason with someone and get exemptions in special cases, right?
There are many dystopian movies that showcase this and the resulting oppression.
In a world where support people are often 10.5 or 11.5 timezones away, how will we tell?
Seriously. A couple of years ago I hit a truck tire tread and roached my radiator 30 miles east of Denver on I25. I have State Farm auto insurance, I probably pay more than I should. Their after hours support was terrible, as if they'd never heard of someone needing a tow on an interstate highway outside of city limits and street/number/zip code addressing. "Westbound I25, east of Denver at mile marker 316" is a good location, but I had to talk myself hoarse because the idea of "Interstate Highway 25" was incomprehensible. Several reps wanted a street address, which just wasn't possible.
The towing company also basically held my car for ransom; I guess that's an experience an AI can't provide. Yet.
Well, the problem with an AI-based system here might well have been that I-25 goes north and south through the heart of Denver rather than east and west.
The location "30 miles east of Denver on I-25" literally does not exist.
Perhaps you meant I-70?
And perhaps neither support humans far from Colorado nor an AI with a map would know to make that correction.
If talking to an AI is the cost of not being put on hold for 20 minutes for the simplest request, so be it. Honestly, I welcome AI customer service. Most of the time it's just something super simple I want to do.
I have never seen nor heard of any "AI" system for customer service that remotely approaches a barely competent human, and those are frustrating and annoying enough.
ChatGPT4-level systems may be able to get much closer if they are provided the training set, and if they can, then great.
BUT, they absolutely need to be able to figure out when they are at the limit of their knowledge/ability, and then hand off to a human. ChatGPT4 has been spectacularly unable to do anything even close to this; it actually just starts fabricating bullcrap in a highly confident voice.
Corporate managers are already no good, horrible, awful, terrible at actually providing service, as it seems their only goal is to provide a cursory appearance of service and reduce their human costs. (Incidentally, this also massively misses the opportunity to gather excellent data on where their company could improve its product/service and gain market share.) "AI" will only exacerbate the trend until, several generations in the future, it is actually good enough, and it may become a competitive advantage to provide better service.
Until "Artificial Intelligence" actually exists, we should use a different word. I understand the distaste for "bot", but something like "system" or "computer" or "network" would be fine.
We really need to stop personifying every project that has the categorical goal of AI, but not the result.
> I believe the only reason for Commercial Orgs are looking at ChatGPT is to eventually replace all support people with some kind of AI.
Support work is far from the only kind of human labor commercial orgs would like to automate if possible, and far from the only kind that AI could automate.
Add to that the algorithms used to ascertain whether email will be treated as spam. I'm tired of explaining to people that Google and Outlook email are NOT deterministic, and that they could be missing email that might never even make it into their spam folder, and they'd never know, nor would they ever know why.
People say that spammers would just do the minimum they need to get spam delivered, but every person can have their own personal threshold for their own spam filters.
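A sketch of what a personal threshold could mean in practice (toy code; real filters score across far more dimensions):

    # Toy sketch: the filter's spam score is global, but where each user
    # draws the line is personal.
    def classify(spam_score: float, user_threshold: float) -> str:
        # spam_score: model's estimated spam probability in [0, 1]
        return "spam" if spam_score > user_threshold else "inbox"

    # The same borderline message lands differently for a cautious user
    # and a permissive one.
    print(classify(0.62, user_threshold=0.50))  # spam
    print(classify(0.62, user_threshold=0.90))  # inbox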
> People say that spammers would just do the minimum they need to get spam delivered, but every person can have their own personal threshold for their own spam filters.
OK, so then rather than doing the minimum to get spam delivered, spammers would be more motivated to improve their spam to get above higher thresholds, so that people would have to set their thresholds higher and higher until all the legitimate mail they get is classified as spam.
Here's one possible approach: you can have a secret algorithm OR platform immunity, but not both.
News outlets don't have to publicize their internal editorial decisions, but they are responsible as publishers for the content they publish and can be sued if, for example, they publish something false and defamatory.

Aside from sticking a camera in the newsroom and livestreaming all day, what would this entail?
I've never understood the "an imperfect solution is no better than no solution and should be thrown out on principle" view.
Door locks aren't perfect; our justice system isn't perfect; software isn't perfect. But they're all pretty useful and I just can't get behind a movement to abolish any of them because they fail to solve every individual case.
If some bad actors have access to inside information, that's a limited and solvable problem and doesn't mean that we should give up and give that information to everyone.
Hard though, because the company often doesn't know itself what its "algorithm" is. It is often spread over lots of different teams and departments, and may comprise hundreds of thousands of little design decisions made by people all over the company who are not talking to each other.