Oh wow, what a wild ride of an article! As someone who has led the building of recommender systems (as a PM) for the last 5 years, this rubs me the wrong way.
> I defend myself from arbitrary data collection that fuels the algorithms using PiHole, the tracker-blocking Disconnect plugin, and Firefox, plus a few other tricks.
The author clearly has no idea what he/she is talking about. Sure, everything they mention blocks the collection of tracking data, but recommender systems mainly track your behaviour inside the experience itself (clicks, page views, time on page).
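To make that concrete -- a minimal, hypothetical sketch, not any real platform's schema -- the signals a recommender feeds on are logged first-party, inside the product, where no tracker blocker ever sees them:

    # Hypothetical first-party engagement event, logged by the platform
    # itself. Nothing crosses a third-party domain, so PiHole, Disconnect,
    # etc. never see it.
    from dataclasses import dataclass, asdict
    import json, time

    @dataclass
    class EngagementEvent:
        user_id: str          # the logged-in account; no ad cookies needed
        item_id: str          # the video/article being viewed
        event_type: str       # "impression", "click", "dwell", ...
        dwell_seconds: float  # time on page, a strong implicit signal

    def log_event(event: EngagementEvent) -> None:
        # In production this would land in a queue feeding the training
        # pipeline; here we just print the JSON payload.
        print(json.dumps({**asdict(event), "ts": time.time()}))

    log_event(EngagementEvent("u123", "vid456", "dwell", dwell_seconds=42.0))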
> That leaves me and other privacy conscious folk with just one lever to pull - disliking a video
Again this author has it all wrong. If you’re serious about privacy the last thing you want to be doing is providing more input into the models we create!
And finally, the comment that made me realise this author has no idea what he's talking about:
> The solution to both issues is obvious, technically easy, and yet commercially a nearly impossible proposition: open up all recommendation algorithms. Make them completely transparent, and, for the individual being targeted by the recommendation engine, completely programmable.
Are you joking? So he/she thinks the easy answer is for every user to "program" their own algorithm. That's like saying every Tesla owner should be able to program their own car! Let's open the source code of Tesla to every owner!

Idiotic article.
a) I agree with you that the author lacks a basic understanding, and that they do not solve the problem.
Yet:
b) I think there actually might be a positive outcome for several reasons, including innovation, marketing and ethics, if one were to allow trained models to be queried arbitrarily, at least in the general case. (Think of what is happening with Stable Diffusion and LLMs/LMMs.) The recommendation systems of, say, YouTube might be more useful to people making videos -- essentially, one could create a pre-query system to consult before finishing a video (see the sketch after this comment).
c) As a couple others have noted, the Tesla example is not a good one. I understand it was formed in haste and I think we all can understand your reasoning. My thought though is:
There is a) the right to repair -- yes, still being fought for -- and b) full self-driving capability, which might actually demand open-sourcing code and models.
How else do we envision the courts and Congress formulating liability and empowering the related law-enforcement and rights-group investigations?
Recall that the aviation industry achieved its version of "FSD" by building the infrastructure and extreme reliability (space and aviation pioneered byzantine-fault-tolerant systems), and definitely not with black boxes.
P.S. As a personal note: I will never allow something to drive me if I carry any liability, especially if it is a limited "black box" that I cannot even arbitrarily query.
Right now we are "cruising" along because liability sits with the driver. Tesla and the auto industry will need to swallow that pill if they ever achieve their goal. Personally, I think they will.
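As a sketch of the pre-query idea in (b) above -- purely hypothetical; the endpoint and every field name below are invented, since no platform exposes its recommender this way -- querying an open model with draft metadata might look like:

    # Hypothetical sketch of "pre-querying" an openly queryable recommender
    # with draft video metadata before publishing. The endpoint and its
    # request/response shape are invented for illustration.
    import requests

    draft = {
        "title": "Building a PiHole from scratch",
        "tags": ["diy", "networking", "privacy"],
        "duration_seconds": 640,
    }

    # Ask the (hypothetical) open model how it would score this draft for
    # a few audience segments, without uploading anything.
    resp = requests.post("https://recsys.example.com/v1/query", json={
        "item_features": draft,
        "audience_segments": ["diy-electronics", "privacy-enthusiasts"],
    })
    print(resp.json())  # e.g. a predicted ranking score per segment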
> That’s like saying every Tesla owner should be able to program their own car! Let’s open the source code of Tesla to every owner!
Other critiques aside, these things are not very similar. You can't run someone over with a buggy recommender system. You bork your own newsfeed, who cares?
My point was that it's not easy at all to "open up" a machine learning model for a user to "program". It's complex enough for an ML engineer to develop!
In theory, maybe it is possible, but it would be incredibly expensive to produce such a system, let alone maintain it. But even then, the ability to control an ML system is such a niche ask that it would never happen.
I think you should be able to program your own car and that should be the default. And I think that can be made safe too.
It does not mean that any change should be allowed on the road. But there could be a system where you pay a fee and send your code to be audited and approved for use on public roads. Otherwise you can only run that code on a test track.
And also not all parts of a car's software are safety critical.
Also working in the field, and I don't understand why this article is even on the front page. The author clearly has no idea what they're talking about; it almost reads as satire.
I have a feeling pooping on recommendation systems is in vogue right now. Social commentators love to point out how "evil" ML algorithms are while conversely complaining that X product is terrible at recommendations.
"Completely programmable" is obviously stupid, but having more control over the way items are ranked in your personal feed isn't, particularly when weights are chosen by hand.
I feel like "open the recommendation algorithms!" is easy to _say_, and it's less easy (and maybe impossible) to do effectively.
Should we have access (by law) to the code that makes recommendations on content? That seems like a governmental overreach. If that were the case, then we should also have direct access to the brain of every editor/layout person at every newspaper. Why was one story above the fold vs below the fold vs on page 4? I don't see a whole lot of distinction between those two.
Once we get access to that, how do we keep people from abusing these algorithms on the supply side? Thought SEO sites were bad before? Guess what happens when those sites know what they're _actually targeting_, as opposed to blindly guessing!
As a question, how well do you understand the recommender system that you've created? For a given piece of content, would you be able to guess at the penetration into the platform? As an outsider, I sometimes feel like the recommender systems I interact with have grown out of the platform holder's control (see: Nazi/ISIS/ISIL propaganda on YouTube, etc.). I think these are the points the article is reacting to.
So if the article is saying "I would really rather not see Nazi/ISIS/ISIL propaganda," then, ok, that's good. That's a good want. But we're going to need an algorithm to determine what that propaganda _is_, and then downrank that for that user.
So, is the conclusion of the article... misguided? Sure! Is the desire of the author valid?
I don't know. I don't have time to read these articles.
I hope the /s at the end of the last comment was loud enough to not be stated.
But I read the article. Yeah, it was dumb. Specifically:
> If recommendation algos aren’t shared then we need - by legislation, if necessary - a switch that turns the recommendation engine off.
This is... baffling. On YouTube - the case in point of the article - you already have access to a non-algorithmic feed of videos: namely, your Subscriptions feed.
And if you don't want a recommendation algorithm, good luck finding new content that you like.
And if you "turn off" the recommendation algorithm, you want... what, exactly? The latest videos uploaded to the platform? Still a recommendation algorithm. Videos with words in titles similar to the last video you watched? Still a recommendation algorithm. Videos that the creators of your favorite channel watched? STILL A RECOMMENDATION ALGORITHM.
I was wrong. The desire of the author is far dumber than I originally gave credit for.
The distinction is easy to make: a recommendation algorithm is personal, automated, scalable to a gigantic amount of content, and entirely opaque to the average user.
The social implications are much more dangerous than those of a newspaper editor and journalist following their paper's political bias.
Getting tired of this, because the algorithms themselves are worthless. Instead, demand websites that aren't just a recommendation feed. Demand the ability to categorize and rank content explicitly.
The problem is that these websites are creating a feed, and then training themselves based on how you engage with...the feed. It's a toxic feedback loop, with almost no levers for you to interact with. If you get stuck in a rabbit hole, it is difficult to teach it how to get you out. That's independent of the "algorithm" being used, it's a problem with all of them.
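A deliberately minimal caricature of that loop (toy code, not any real training pipeline):

    # Minimal caricature of the engagement feedback loop: the model
    # retrains on clicks it induced itself, so early beliefs compound.
    import random
    random.seed(0)

    interest = {"cooking": 0.5, "rabbit_hole": 0.5}  # model's belief about the user

    for _ in range(50):
        shown = max(interest, key=interest.get)  # feed shows the current favorite
        # The user clicks what is shown more often than not, simply
        # because it is what is on screen (exposure bias).
        if random.random() < 0.6:
            interest[shown] += 0.1  # "training" on the induced engagement

    print(interest)  # one topic has run away with the feed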
Not you personally, but users in general. See: Reddit. I never "accidentally" get into a conspiracy theory rabbit hole, that's something I can consciously join or unfollow. And because I have an explicit list of subreddits I'm a member of, the platform can much more accurately recommend me content when I'm in the mood to explore.
No need to make every person categorize/rank their own content. But let them choose which of the 3rd party ranking algorithms they'd like to have serving them their feed today.
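Mechanically (a sketch of the idea only -- every name here is hypothetical, no platform offers this interface), that could look like the platform supplying candidates and the user choosing the ranker:

    # Sketch: the platform supplies the candidate pool, the user picks
    # which third-party ranker orders it. All names are hypothetical.
    from typing import Protocol

    class Ranker(Protocol):
        def rank(self, candidates: list[dict]) -> list[dict]: ...

    class ChronologicalRanker:
        def rank(self, candidates):
            return sorted(candidates, key=lambda c: c["uploaded"], reverse=True)

    class SubscriptionsOnlyRanker:
        def __init__(self, subscribed: set[str]):
            self.subscribed = subscribed
        def rank(self, candidates):
            return [c for c in candidates if c["channel"] in self.subscribed]

    def build_feed(candidates: list[dict], ranker: Ranker) -> list[dict]:
        return ranker.rank(candidates)

    pool = [
        {"id": "v1", "channel": "chA", "uploaded": 2},
        {"id": "v2", "channel": "chB", "uploaded": 1},
    ]
    # Which ranker serves the feed today is the user's call, not the platform's.
    print(build_feed(pool, SubscriptionsOnlyRanker({"chB"})))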
Who has time for this? What about discovering interesting and new content amidst millions of options?

As for new stuff, this is how I generally do it:

Interact with friends and others near me. We trade all the time, recommend all the time. This is my long-standing favorite.

Community discussion. Here, for one. This is a strong second.

Search, and now it appears a GPT entity may well serve me up stuff I am going to like and need.

See what "the feed" has in it.

I certainly am not going to take the time to do that. But at the same time, I also completely ignore programmatic recommendations, because I find them terrible.
What I do to discover interesting new stuff is to listen to the recommendations of friends, and to check out references or recommendations made from the stuff I already enjoy.
Basically, I do what everyone used to do to find new music, but for all other media as well.
> For example, Twitter releasing the algorithm showed us that certain specific people were getting artificially boosted. We also saw that the topic of the war in Ukraine was getting special treatment. Yes, we all guessed that was happening, but without the code it was just supposition that Twitter could deny.
It did? Did I miss something here? I don't remember seeing that. There was certainly code that tallied certain specific people and topics, but I don't see an indication of what was done with that tally.
And second, Twitter claims that whole area was for telemetry.
This comment does a massive disservice & deeply injures the cause for real transparency. It does it by making a very cheap shot at a very dumb situation. To say that this makes not having the models OK or interesting or useful? I highly protest. This is a sad, distracting sideshow, & letting oneself be lured & baited into thinking this little morsel is filling or informative is... ugh, I just can't. No, that's just too low a position.
We can't run the algorithm without the weights, but we can understand it -- or rather, we can understand it as much as it's possible for anyone to understand it. Having the weights will not add understanding. Having the loss function used during training will.
Yes, it will, because with weights you can run it against constructed data, and that kind of experiment with the actual algorithm (including weights) is a much better source of practical understanding than abstract analysis of the algorithm without weights.
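That experiment is cheap once the weights are public (a generic sketch with a stand-in linear model, not Twitter's actual code):

    # Generic sketch of probing a released model: construct two inputs
    # differing in one feature and compare scores. The "model" here is a
    # stand-in linear scorer, not any platform's real one.
    import numpy as np

    rng = np.random.default_rng(42)
    W = rng.normal(size=5)  # pretend these are the released weights

    def score(features: np.ndarray) -> float:
        return float(features @ W)

    base = np.array([1.0, 0.0, 0.3, 0.5, 0.2])  # constructed input
    probe = base.copy()
    probe[1] = 1.0  # flip one feature, e.g. "author is on a boost list"

    # The delta is what that feature is worth in practice -- something no
    # amount of reading the loss function alone tells you.
    print(score(probe) - score(base))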
Your perspective seems to be that I'm not asking for nearly enough, and I 100% agree & apologize for underrepresenting the threat to society & the opacity of the new enemy of democratic social good. Thank you & yes, yes, yes. We need the training data too.
The weights aren't going to help understand "why we see what we see", which is what non-tech people (and this article) think they'll get from companies releasing "the algorithm"
Yes, plus when you call a support desk, if the "person" on the phone is not a real person, the first words said should be "You are talking with an AI, not a person."
I believe the only reason Commercial Orgs are looking at ChatGPT is to eventually replace all support people with some kind of AI.

You want to make the call even longer than necessary to provide me with information I can do nothing about. Excellent.
I care a lot more about outcomes than whether the person I'm talking to is "real" or not. If my bill gets corrected by an AI but not a person? I'll take the AI.
Think of the classic "is 0.002 cents the same as 0.002 dollars?" call - https://verizonmath.blogspot.com/ - I just tested, and ChatGPT seems to understand the difference - so maybe this would be a case where the AI is better.
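The arithmetic at stake in that call, for reference (figures as I recall them from the linked blog, so treat the exact numbers as approximate):

    # 0.002 cents/KB (quoted) vs. 0.002 dollars/KB (billed) -- a plain
    # factor-of-100 disagreement. Usage figure as recalled from the blog.
    usage_kb = 35893

    quoted_dollars = usage_kb * 0.002 / 100  # 0.002 CENTS per KB
    billed_dollars = usage_kb * 0.002        # 0.002 DOLLARS per KB

    print(f"quoted: ${quoted_dollars:.2f}")  # ~$0.72
    print(f"billed: ${billed_dollars:.2f}")  # ~$71.79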
edit: to be clear I'm fine with labeling if that's desired, it's the "I want to talk to a real person" that I think is silly (and oddly similar to the "I want to talk to an American" that people also say).

So anything that you don't value is silly?
And most people don't care how many carbs are in a candy bar but we mandate labelling because it creates transparency and accountability and empowers the public. We should be doing the same with technology, only far more aggressively and comprehensively.
If they remove the human element, then all the rules become hard and fast, because we lose the ability to reason with someone and get exemptions in special cases, right?
There are many dystopian movies that showcase this and the resulting oppression.
In a world where support people are often 10.5 or 11.5 timezones away, how will we tell?
Seriously. A couple of years ago I hit a truck tire tread and roached my radiator 30 miles east of Denver on I25. I have State Farm auto insurance, I probably pay more than I should. Their after hours support was terrible, as if they'd never heard of someone needing a tow on an interstate highway outside of city limits and street/number/zip code addressing. "Westbound I25, east of Denver at mile marker 316" is a good location, but I had to talk myself hoarse because the idea of "Interstate Highway 25" was incomprehensible. Several reps wanted a street address, which just wasn't possible.
The towing company also basically held my car for ransom; I guess that's an experience an AI can't provide. Yet.
Well, the problem with an AI-based system here might well have been that I-25 goes north and south through the heart of Denver rather than east and west.
The location "30 miles east of Denver on I-25" literally does not exist.
Perhaps you meant I-70?
And perhaps neither support humans far from Colorado nor an AI with a map would know to make that correction.
If talking to an AI is the cost of not being put on hold for 20 minutes for the simplest request, so be it. Honestly, I welcome AI customer service. Most of the time it's just something super simple I want to do.
I have never seen nor heard of any "AI" system for customer service that remotely approaches a barely competent human, and those are frustrating and annoying enough.
ChatGPT4-level systems may be able to get much closer if they are provided the training set, and if they can, then great.
BUT, they absolutely need to be able to figure out when they are at the limit of their knowledge/ability, and then hand off to a human. ChatGPT4 has been spectacularly unable to do anything even close to this; it actually just starts fabricating bullcrap in a highly confident voice.
Corporate managers are already no good, horrible, awful, terrible at actually providing service, as it seems their only goal is to provide a cursory appearance of service and reduce their human costs. (Incidentally, this also massively misses the opportunity to gather excellent data on where their company could improve its product/service and gain market share.) "AI" will only exacerbate the trend until, several generations in the future, it is actually good enough, and it may become a competitive advantage to provide better service.
Until "Artificial Intelligence" actually exists, we should use a different word. I understand the distaste for "bot", but something like "system" or "computer" or "network" would be fine.
We really need to stop personifying every project that has the categorical goal of AI, but not the result.
> I believe the only reason for Commercial Orgs are looking at ChatGPT is to eventually replace all support people with some kind of AI.
Support work is far from the only kind of human labor commercial orgs would like to automate if possible, and far from the only kind that AI could automate.
Add to that the algorithms used to ascertain whether email will be treated as spam. I'm tired of explaining to people that Google and Outlook email are NOT deterministic, and that they could be missing email that might never even make it into their spam folder, and they'd never know, nor would they ever know why.
People say that spammers would just do the minimum they need to get spam delivered, but every person can have their own personal threshold for their own spam filters.
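A sketch of what a personal threshold could mean in practice (toy code; real filters score across far more dimensions):

    # Toy sketch: the filter's spam score is global, but where each user
    # draws the line is personal.
    def classify(spam_score: float, user_threshold: float) -> str:
        # spam_score: model's estimated spam probability in [0, 1]
        return "spam" if spam_score > user_threshold else "inbox"

    # The same borderline message lands differently for a cautious user
    # and a permissive one.
    print(classify(0.62, user_threshold=0.50))  # spam
    print(classify(0.62, user_threshold=0.90))  # inbox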
> People say that spammers would just do the minimum they need to get spam delivered, but every person can have their own personal threshold for their own spam filters.
OK, so then rather than doing the minimum to get spam delivered, spammers would be more motivated to improve their spam to get above higher thresholds, so that people would have to set their thresholds higher and higher until all the legitimate mail they get is classified as spam.
Here's one possible approach: you can have a secret algorithm OR platform immunity, but not both.
News outlets don't have to publicize their internal editorial decisions, but they are responsible as publishers for the content they publish and can be sued if, for example, they publish something false and defamatory.

Aside from sticking a camera in the newsroom and livestreaming all day, what would this entail?
I've never understood the "an imperfect solution is no better than no solution and should be thrown out on principle" view.
Door locks aren't perfect; our justice system isn't perfect; software isn't perfect. But they're all pretty useful and I just can't get behind a movement to abolish any of them because they fail to solve every individual case.
If some bad actors have access to inside information, that's a limited and solvable problem and doesn't mean that we should give up and give that information to everyone.
Hard though, because the company often doesn't know itself what its "algorithm" is. It is often spread over lots of different teams and departments, and may comprise hundreds of thousands of little design decisions made by people all over the company who are not talking to each other.