Readit News
markhahn · 2 years ago
Data autonomy. How do we get across to people that there's real value in owning your data - controlling it, hosting it, not just being someone else's product.

Why should we not take the broadest possible view? You own your likes, your comments, your amazon order history, your dental xrays, your histology reports, everything. One way to incentivize data consumers and processors would be to make them liable for mishandling, make the data so radioactive that they don't even want to hold onto it.

idle_zealot · 2 years ago
I don't think the approach of getting consumers to care can work. Most people really just don't care about taking precautions when the negative impacts of not doing so are so diffuse and time-delayed; it's an unfortunate aspect of human nature. We usually overcome these individual failings by organizing into groups better suited for long-term planning. In this case the solution that comes to mind would be to make personal data legally onerous to hold and process for companies, to the extent that they would go out of their way to design their products and services to never touch the stuff, and if they do need it to operate then they would be incentivized to store it locally on users' devices and only synchronize it in a completely encrypted form such that they never have to deal with the legal implications of having access to it.
plagiarist · 2 years ago
Yeah, leaking financial data for millions of people should ruin the company and have fallout that hits the members of the board and C-suite. Instead it's actually just an opportunity to sell an identity theft "protection" subscription.

I have no faith that people will get clued in and make it happen. Everyone is merrily lining up to use the third-party face scanner at the airports.

CuriouslyC · 2 years ago
AI is going to push data autonomy hard. Users are going to want to subscribe to different models for different things, and plug those models into whatever they're using. Everyone is going to support it, and to make it work there needs to be data exchange. Companies that don't support it to try and keep the data walled are going to hemorrhage customers.
randunel · 2 years ago
So... GDPR? You can download your own data, force the data controller which acquired it to delete it and they're liable for mishandling it. Companies don't usually want to hold on to EU citizens' data because GDPR makes it quite radioactive, for them and whomever they sell it to.

cyanydeez · 2 years ago
This isn't being done for privacy; these orgs want to keep their LLM gold chests to themselves.
nradov · 2 years ago
Legally speaking you already own your healthcare data such as dental X-rays. In many cases you can even download your data from provider and payer organizations in industry standard formats (although dentistry specifically is way behind in this area). But for most patients this data is worth zero. Legal and privacy concerns aside, there's just not much use for it. Several start-ups have tried to de-identify and aggregate such data for sale to researchers, but consistency and quality issues make it tough to use in real studies.
paulryanrogers · 2 years ago
Once data gets out it's impossible to completely take back. Regulations could help corral law-abiding actors, so I agree with the idea.

Though I can also see how the incentives of social platforms encourage clamping down further and further on 3P access.

r3trohack3r · 2 years ago
> Once data gets out it's impossible to completely take back

As a society, we understand that rights !== access. Just because you share your data with Facebook, giving them access to it, in the normal course of interacting with their servers does not grant them rights to that data.

When Netflix delivers a video to your device, society understands you can’t make a copy of that video and share it with your neighbor. That’s called “The Pirate Bay.”

Data on the internet is lacking equity: an ownership interest in property. As you generate data online, platforms accumulate equity in your data giving them control over that valuable property.

If, instead, you accumulated equity in your data the entire data broker market would be “The Pirate Bay.”

godelski · 2 years ago
This. Honestly I'd love a government to take up the privacy mantle and protect its citizens. Sure, you lose the ability to spy on your citizens but so does every other country in the world. Seems like that's a net win.

The USPS was created to ensure that all Americans can communicate, and that was even made explicit in the Constitution, so I don't know why this wouldn't also apply to cell phones and the internet. Are they not modern evolutions? Put the code on the gov's GitHub along with the rest. Other players can exist, but it sets a baseline standard. But any country can do this, doesn't need to be the US.

happytiger · 2 years ago
Pay them to hold on to their data and manage it? That’s about the only way for end-users to experience the actual value of data.
j1elo · 2 years ago
I was about to open a Mastodon account, but because I don't really publish anything very often anyway, I left it for later.

A couple of months passed, and the instance I had in mind closed for good.

I've been told before on HN that Mastodon's solution is non-existent for these situations. Has the landscape changed, or are you still f*d if you choose the wrong one? (aka. an incentive for centralization, or for always choosing only among the most popular instances)

ldjb · 2 years ago
The solution is to run your own Mastodon instance. Unfortunately, I find doing so can be quite a hassle. Obviously it's not free, and not only do you need to configure it properly, but you need to handle backups, updates and so on. Even for me, with a fair bit of technical experience, it can be challenging. I think there are significant barriers to doing so for people with no or little technical knowledge.
CaptainOfCoit · 2 years ago
> The solution is to run your own Mastodon instance

The problem is that Mastodon isn't really great for single-user instances (hassle to upgrade, slurps resources for breakfast, etc.), but there is plenty of other ActivityPub-compatible software that is great for single-user usage, like Pleroma, Micro.blog, Akkoma and more.

Helmut10001 · 2 years ago
I did it with a free VM and wrote a guide here [1]. Zero administration work with docker automatic updates.

[1]: https://du.nkel.dev/blog/2023-12-12_mastodon-docker-rootless...

ceejayoz · 2 years ago
I’m paying a coffee a month for masto.host to do it for me. Great little service.
huimang · 2 years ago
You're incentivized to either choose a stable, robust instance, or self-host. A lot of instances are pet projects that people kill off once they get bored of mastodon and don't want to pay for the upkeep anymore.

Self-hosting is the only way to really guarantee that it'll still be up years later. But you might get randomly banned from certain instances that blanket-ban single-user instances.

michaelt · 2 years ago
I've got to say, that doesn't sound like a very good solution.

How are people who aren't already part of the community supposed to know what a stable, robust instance is?

And self-hosting is all very well, but if I wanted to join a community comprised exclusively of greybeard unix sysadmins, I'd use IRC :)

praisewhitey · 2 years ago
Isn't self-hosting more likely to end up being one of those pet projects that gets killed off?

Tijdreiziger · 2 years ago
It’s basically analogous to e-mail. To have an e-mail account, you have to choose a host first. Most people choose a popular host (Gmail, Outlook), but there are various reasons one could prefer a different one.
bachmeier · 2 years ago
> I've been told before in HN that Mastodon's solution is non-existent for these situations.

What are you worried about losing? You can download everything if it's a concern. Honestly though, the concern you're raising is non-existent on every platform, because the others only have a single instance.

proactivesvcs · 2 years ago
You're equally messed up if your account on a centralised service is banned, in which case you chose the wrong one. You could choose a server that's signed up to the Mastodon Covenant to provide some more peace of mind, and you should always take a backup of your account from time to time.
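To make the "take a backup" advice a bit more concrete, here is a minimal sketch of checking what an exported archive actually contains. It assumes the export includes an ActivityPub-style outbox (an `OrderedCollection` with an `orderedItems` list, typically saved as `outbox.json`); the function name `summarize_outbox` and the sample data are hypothetical and only for illustration.

```python
import json
from collections import Counter


def summarize_outbox(outbox: dict) -> dict:
    """Count activities by type and collect the ids of authored posts.

    Assumes an ActivityPub-style OrderedCollection; boosts ("Announce")
    usually carry the boosted post as a plain URL string, not an object.
    """
    items = outbox.get("orderedItems", [])
    by_type = Counter(item.get("type", "Unknown") for item in items)
    post_ids = [
        item["object"]["id"]
        for item in items
        if item.get("type") == "Create"
        and isinstance(item.get("object"), dict)
        and "id" in item["object"]
    ]
    return {"total": len(items), "by_type": dict(by_type), "post_ids": post_ids}


# Hypothetical sample standing in for json.load(open("outbox.json")).
sample_outbox = json.loads("""
{
  "type": "OrderedCollection",
  "orderedItems": [
    {"type": "Create", "object": {"id": "https://example.social/posts/1", "content": "hello"}},
    {"type": "Announce", "object": "https://other.example/posts/99"},
    {"type": "Create", "object": {"id": "https://example.social/posts/2", "content": "world"}}
  ]
}
""")

summary = summarize_outbox(sample_outbox)
print(summary)
```

A check like this only tells you the archive is readable and non-empty; it does not restore followers or replies on a new server.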
prmoustache · 2 years ago
I chose a Mastodon instance for which the financing is quite clear. I donate once a year. The admin is responsive, keeps up with updates, and also uses the instance daily.

If I somehow find out that the admin has stopped posting updates or updating the instance, or that the donations are not meeting the goal, I assume I would have time to start up my own instance and/or move somewhere else.

stevenicr · 2 years ago
I really feel there is a need and future growth in distributed backup as a service -

A few of us should get together and make a 'backup your stuff service' that can pull from mastodon and any other service, and make 2 backups in two different places around the world.

Offer add-ons for storing black-and-white copies of pics maybe, add-ons for other services.

You should be able to log in, add a link to your thing, walk through authorizing whatever is needed, and get an email or DM every week or so saying that the backup succeeded.
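A rough sketch of the core of that idea, assuming the export has already been pulled down to a local file: copy it to two separate destinations and confirm each copy by checksum before reporting success. The function names and the two destination directories are hypothetical; a real service would use two remote storage providers in different regions instead of local folders.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_to_two_places(source: Path, dest_a: Path, dest_b: Path) -> dict:
    """Copy `source` into both destination directories and verify each copy.

    Returns a mapping of copy path -> True if its checksum matches the source.
    """
    expected = sha256_of(source)
    results = {}
    for dest_dir in (dest_a, dest_b):
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy_path = dest_dir / source.name
        shutil.copy2(source, copy_path)
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results


def backup_succeeded(results: dict) -> bool:
    """The weekly 'backup succeeded' message should only go out if both copies verify."""
    return len(results) == 2 and all(results.values())
```

The checksum step is the part that justifies sending a "backup succeeded" notification: a copy that was written but not verified should not count.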

keep320909 · 2 years ago
Run your own blog, on your own domain with GitHub. It takes like 2 hours to set up and costs $20/year. Lately, it really feels like the whole "social media" decade was a dead end.
quaintdev · 2 years ago
What $20? I am running it for free, except for the domain name cost.
axegon_ · 2 years ago
I was having this exact thought around 2015, not long after spaCy became open source. When I first tested it out, I was blown away by how well it performed in every possible task: it was light years ahead of nltk and gensim, which were the big and well established players in that space. Even back then I was certain that in the not-so-distant future, data would cost a fortune: considerably more than it already did. And I won't lie, it did cross my mind to start harvesting data online on a massive scale and capitalize on it when the day came. And now I really regret not doing it. Reddit closed itself off, so did Stack Overflow, Twitter is a no go, Facebook made it nearly impossible. Cloudflare makes traditional scraping nearly impossible, if scraping wasn't already a nightmare with the modern web stacks: the doors are nearly shut and it will only get worse.

It really hurts me to know that I expected this to happen and didn't do anything about it. Oh well... One of many missed opportunities in my life I suppose (WAY more than I'd like to admit).

seanwbren · 2 years ago
Farcaster is a decentralized social network, much like Twitter but with Channels (which are similar to subreddits). All the data is open.

A cryptographic signing key is attached to each account, and they have been experiencing very fast growth over the past month.

A dashboard is possible to share because the data is open: https://dune.com/pixelhack/farcaster

INTPenis · 2 years ago
What's the role of Ethereum in farcaster? I noticed it in the design Overview but I don't really have the time or motivation to get deeper.
pa7x1 · 2 years ago
Your identity is on Ethereum (a rollup on top of it, to be a bit more precise). So stuff like this https://techcrunch.com/2023/07/26/twitter-now-x-took-over-th... cannot happen.
r721 · 2 years ago
>Minimizing bot activity: Farcaster requires that new users pay a $5 sign-up fee, aimed at preventing the creation of spam accounts, and limits users to a restricted number of “casts” tied to paid “storage units.”

https://decrypt.co/resources/farcaster-explained-the-blockch...

notsure357 · 2 years ago
Does anyone really believe that an AI based on Reddit or Twitter/X data would somehow be superior to other AIs? Or that, combined with other data, it would somehow provide a snarky competitive advantage? I don't see it.
Cheer2171 · 2 years ago
Doesn't matter. Execs, MBA types, and VCs only hear "data is the new oil" and think it is just as fungible.
delfinom · 2 years ago
Unfortunately, data from after the launch of ChatGPT is now worthless, as it's contaminated by the very same bots.
notyourwork · 2 years ago
Superior may depend on your goal. If it’s mass disinformation campaigns produced by generative AI, those mentioned data sets may be ripe for the cause.
causal · 2 years ago
Yup. They're exactly what you need if you want to imitate a redditor or Twitter user.

Also not useless for just learning language.

Moldoteck · 2 years ago
I don't use it that much, but the Farcaster protocol as a backbone for a social app is super hot for devs right now. Most of the stuff is open source, and they even changed some login steps so that it would be closer to a web2 experience than the usual 'web3' one. Idk for how long it'll keep being this friendly for devs, but that's the state for now.
Eisenstein · 2 years ago
This is hilarious because Sam Altman was on the board of reddit until recently. I don't believe that reddit closed off API access due to AI data scraping. They did it to force people to use their shitty app so they can get more ad impressions before the IPO.
mvkel · 2 years ago
Data is oil (including synthetic), and consumer data companies (social networks) are sensing that their data is soon going to be the only defensible IP they have. Gotta hoard every little bit of it to maximize market cap