Show HN: I scrape Steam data every month and it's yours to download for free

Hi, I'm interested in scraping Steam too. Is your scraper code available as open source, or is there one you'd recommend?

lolinder · a year ago

Have you looked over the data that OP is providing here and determined that it doesn't meet your needs?

Generally it's polite to avoid scraping if you can help it, so I'd start by considering whether OP is already providing what you are looking for.

DrammBA · a year ago

On the other hand you need to be a paid member to download the raw scraped data, so it isn't unreasonable to want to learn how to scrape it instead.

netruk44 · a year ago

I wrote a simple scraper for a 'steam game semantic search' app I built a while ago.

It definitely won't fetch all the data that this person does though. It only fetches the current list of games on Steam, their store page information and some reviews for the game.

The code quality probably isn't amazing, but it might give you an idea of how to get started with your own scraper.

https://github.com/Netruk44/steam-embedding-search/blob/main...
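For anyone wondering what "fetch the current list of games" could look like, here is a minimal sketch. It is not the code from the linked repo. It assumes Steam's public app-list endpoint (the URL below) and its `applist` / `apps` response shape, both of which may have changed; `fetch_json` and `parse_app_list` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: public app-list endpoint; Valve may change or retire it.
APP_LIST_URL = "https://api.steampowered.com/ISteamApps/GetAppList/v2/"


def fetch_json(url, timeout=30):
    """Download a URL and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def parse_app_list(payload):
    """Map appid -> name, skipping entries with an empty or missing name.

    Assumed payload shape:
    {"applist": {"apps": [{"appid": 10, "name": "..."}, ...]}}
    """
    apps = payload.get("applist", {}).get("apps", [])
    return {app["appid"]: app["name"] for app in apps if app.get("name")}
```

Calling `parse_app_list(fetch_json(APP_LIST_URL))` would give a dictionary to iterate over when fetching store pages or reviews. Add a delay between requests to stay polite.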

ddxv · a year ago

Thanks! That's perfect, just want somewhere to get started.

DrammBA · a year ago

https://steamdb.info/faq/#how-are-we-getting-this-informatio...

I found this explanation from steamdb that points to the various projects and libraries they use to gather all the data they have. It's not a how-to, but it has very useful info.

I had to refresh before posting, because I wanted to see if someone else beat me to being that HN commenter but...

From the Terms of Service (emphasis mine):

6. Restrictions on Use

You agree not to:

    Use the Service for any unlawful purpose.

    Attempt to reverse-engineer, modify, or *create derivative works of the Service.*

    Share, resell, or distribute downloadable data provided by the Service without explicit written permission.

Do you intend to delineate the data provided by the service from "the Service" itself? It seems most fair that data received via Fair Use remains in that arena, pun fully intended.

That aside, it's an intriguing dataset nonetheless, but I'd prefer to see a sample of the data before signing up.

csmets · a year ago

Thank you for highlighting this. I've updated the terms to align with the values of this service.

akudha · a year ago

Steamdb.info displays graphs etc. Is that considered a “derivative work”?

I am not sure what is considered a derivative work and what isn’t.

z3c0 · a year ago

IANAL but I am someone who deals heavily in 1) scraping and 2) data and the analysis, enrichment & brokerage thereof. As such, I like to consult this for anything regarding US Copyright law: https://www.copyright.gov/circs

Circular 14 addresses derivative works, including those based on data: https://www.copyright.gov/circs/circ14.pdf

Steamdb.info is a derivative work, yes. And scraping is usually accepted as Fair Use, so both services are presumably within their rights, but they have no claim to the underlying data, only their process of enrichment. If someone were to build a new service based on the data presented on either site, there's not much they could do to stop them... short of getting them to agree not to do so via their ToS.

OpenAI is a great example of a company that built a derivative work on scraped data available under Fair Use, and then subsequently gated their data via their ToS. With such a popular precedent at play, I'd rather not use any services doing anything similar, especially when steamdb.info doesn't even have a ToS.

JadoJodo · a year ago

At a glance, it appears the product is the “chat with the data” feature; the CSV is free.

z3c0 · a year ago

I might be inclined to seek the raw data, should it be more cost effective than scraping Steam myself.

Being a user, free, paid, or anonymous, can still be under the thumb of their ToS, especially so if they force a dialog in front of you to agree to the ToS while signing up. I'm merely pointing out hurdles to the OP that may obstruct some of the people they are trying to reach.

DrammBA · a year ago

What I don't understand is the difference between 'Download all CSV data' in the free tier and 'Download CSV data' / 'Download raw data' in the paid member tier. It seems that the free CSV data is likely an extract or digest of the raw data offered as a sample.

Apreche · a year ago

Do you have data that https://steamdb.info/ doesn’t have?

noirscape · a year ago

Steamdb lacks an API for one, and the devs officially have a policy that they'll never make one, saying you should just scrape Steam directly instead of bugging them about it[0].

It means that steamdb, while extraordinarily useful for casual prodding at what's stored on Valve's servers, isn't very good if you want to run data analysis or something like that on the metadata of Steam games at scale.

Not sure if it's legal to charge for the raw scrape when OP doesn't seem to be affiliated with Valve, but that's not up to me to figure out.

[0]: https://steamdb.info/faq/

seanw444 · a year ago

This whole time I was under the impression that SteamDB was owned by Valve. Huh.

joseda-hg · a year ago

That seems pretty reasonable, it's their data, they just make useful visualizations

m00dy · a year ago

If you need to be a paid member to download csv file, then it is not free :) lol

nickthegreek · a year ago

free tier allows you to download the csv.

xerox13ster · a year ago

If you need to make an account and give this guy personal information (a digital commodity like oil) to see the data it's not free lmao

stronglikedan · a year ago

> If you need to make an account and give this guy personal information

In this case, you don't. That's just to weed out people who can't figure out temporary emails. I just used one to create an account without turning over any PI.

kmfrk · a year ago

In some instances I got answers that weren't specifically about my questions. As someone who's just trying out the free demo, it's not a big deal, but maybe you could provide a way for users to flag such answers and redeem their credits? It would probably increase retention and help people chase down bugs.

ghfhghg · a year ago

I guess the main differentiator over steamdb is getting the data in CSV?

Might be good to clarify in the FAQ because the people I know who would pay for this are not the most techy types.

stared · a year ago

Regarding Steam data, I am curious about how games are being played (hours spent) and, even more, about their co-occurrence (i.e., player X spent time on both game A and game B). I would love to make a visualization like https://p.migdal.pl/tagoverflow/?site=gaming&size=32, but for Steam data.

Also, for deeper insight than sales volumes (e.g., game design, general trends, demographics, types of players), such things would be crucial.
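To make the co-occurrence idea concrete, here is a minimal sketch that counts how many players have time on each pair of games. The input format (a mapping from player to hours per game) and the sample values are assumptions for illustration only; no such per-player data is confirmed to be in the dataset discussed here.

```python
from collections import Counter
from itertools import combinations


def game_cooccurrence(playtime_by_player, min_hours=0.0):
    """Count players who spent more than min_hours on both games of a pair.

    playtime_by_player: {player_id: {game_name: hours_played}}
    Returns a Counter keyed by alphabetically ordered (game, game) tuples.
    """
    counts = Counter()
    for games in playtime_by_player.values():
        played = sorted(g for g, hours in games.items() if hours > min_hours)
        for pair in combinations(played, 2):
            counts[pair] += 1
    return counts


# Made-up sample data, purely illustrative.
sample = {
    "player_x": {"Game A": 12.5, "Game B": 3.0},
    "player_y": {"Game A": 40.0, "Game B": 1.5, "Game C": 0.0},
    "player_z": {"Game B": 7.0, "Game C": 2.0},
}
pairs = game_cooccurrence(sample)
```

The resulting pair counts can be used as edge weights in a graph visualization like the one linked above.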


Ksudijaan · a year ago

It is not that difficult to scrape Steam data using SteamKit, right? I built a website around Steam data a couple of years ago, with a small scraper app that scraped new updates hourly (using SteamKit) and put them into my database.

The biggest advantage SteamDB has is its ton of historical data. That is not retrievable from the Steam Network, so the only way to have historical data is to have started early.

My website has now been defunct for a year, but I've kept the scraper running. I now have 7 years of historical data in my database.
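Since history can only be accumulated going forward, the storage side might look roughly like the sketch below. It is not the SteamKit-based scraper described above: it is a generic append-only snapshot table in SQLite, and the table and function names are hypothetical. Fetching the payload is left out entirely.

```python
import sqlite3
import time


def open_history(path=":memory:"):
    """Open (or create) a database holding timestamped app snapshots."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS app_snapshots ("
        " appid INTEGER NOT NULL,"
        " scraped_at INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (appid, scraped_at))"
    )
    return conn


def record_snapshot(conn, appid, payload, scraped_at=None):
    """Append a snapshot unless it matches the latest stored one.

    Returns True if a new row was written, False if nothing changed.
    """
    ts = int(time.time()) if scraped_at is None else scraped_at
    row = conn.execute(
        "SELECT payload FROM app_snapshots"
        " WHERE appid = ? ORDER BY scraped_at DESC LIMIT 1",
        (appid,),
    ).fetchone()
    if row is not None and row[0] == payload:
        return False
    conn.execute(
        "INSERT INTO app_snapshots (appid, scraped_at, payload) VALUES (?, ?, ?)",
        (appid, ts, payload),
    )
    conn.commit()
    return True
```

Running something like this from an hourly job stores a row only when an app's data changes, so the table grows into a change history instead of a pile of identical copies.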