Yeah, there's AI, but I added it because I found it easier to find answers I'm looking for. For the data scientists, you can download the CSV and go crazy. Would love to know what discoveries or learnings can be found from it.
To download the raw scraped data you need to become a paid member, but you don't really need it unless you want to shape a table of data for a particular need. The cost is mostly just an incentive to help me pay the bills for running the website.
The available CSV files contain a large amount of data, covering everything from tags and genres to pricing, wishlists, and estimated revenue. It's what the AI is reading from.
Hope you find it useful :-)
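For anyone who does grab the CSVs, here is a minimal sketch of the kind of grouping you could start with. The column names (`genre`, `price_usd`, `estimated_revenue_usd`) and the sample rows are made-up placeholders, not the real schema, so adjust them to whatever the downloaded files actually contain.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up placeholder rows; the real CSV's column names and values will differ.
SAMPLE_CSV = """name,genre,price_usd,estimated_revenue_usd
Game A,Puzzle,4.99,12000
Game B,Puzzle,9.99,30000
Game C,Strategy,19.99,250000
"""


def median_by_genre(csv_file, value_column):
    """Group rows by the (assumed) 'genre' column and return the median of value_column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        raw = (row.get(value_column) or "").strip()
        if not raw:
            continue  # skip rows where the value is missing
        groups[row["genre"]].append(float(raw))
    return {genre: median(values) for genre, values in groups.items()}


# With a real download you would pass open("some_file.csv", newline="") instead.
result = median_by_genre(io.StringIO(SAMPLE_CSV), "price_usd")
print(result)
```

This only uses the standard library; for the larger files something like pandas would probably be more comfortable.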
It means that steamdb, while extraordinarily useful for casual prodding at what's stored on Valve's servers, isn't very good if you want to run data analysis or something like that on the metadata of Steam games at scale.
Not sure if it's legal to charge for the raw scrape when OP doesn't seem to be affiliated with Valve, but that's not up to me to figure out.
[0]: https://steamdb.info/faq/
In this case, you don't. That's just to weed out people who can't figure out temporary emails. I just used one to create an account without turning over any PI.
Might be good to clarify in the FAQ because the people I know who would pay for this are not the most techy types.
Generally it's polite to avoid scraping if you can help it, so I'd start by considering whether OP is already providing what you are looking for.
It definitely won't fetch all the data that this person does though. It only fetches the current list of games on Steam, their store page information and some reviews for the game.
The code quality probably isn't amazing, but it might give you an idea of how to get started with your own scraper.
https://github.com/Netruk44/steam-embedding-search/blob/main...
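To give a rough idea of the shape such a scraper can take, here is a minimal sketch. It is not the code from the linked repository. It assumes the commonly used public endpoints `ISteamApps/GetAppList/v2` and the store's `appdetails`, which may change, be deprecated, or be rate limited; the function names and the delay value are my own choices for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed endpoints; check current Steam documentation before relying on them.
APP_LIST_URL = "https://api.steampowered.com/ISteamApps/GetAppList/v2/"
APP_DETAILS_URL = "https://store.steampowered.com/api/appdetails"


def parse_app_list(payload):
    """Return {appid: name} from an app-list response body."""
    return {app["appid"]: app["name"] for app in payload["applist"]["apps"]}


def parse_app_details(payload, appid):
    """Return the 'data' dict for appid, or None if the store reported failure."""
    entry = payload.get(str(appid), {})
    return entry.get("data") if entry.get("success") else None


def fetch_json(url, params=None, delay_seconds=1.5):
    """GET a URL, decode the JSON body, then sleep to keep the request rate low."""
    if params:
        url = f"{url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    time.sleep(delay_seconds)
    return data


def scrape(limit=5):
    """Yield (appid, name, is_free) for the first `limit` apps in the list."""
    apps = parse_app_list(fetch_json(APP_LIST_URL))
    for appid in list(apps)[:limit]:
        payload = fetch_json(APP_DETAILS_URL, {"appids": appid})
        details = parse_app_details(payload, appid)
        if details is not None:
            yield appid, details.get("name"), details.get("is_free")

# Example (performs real network requests):
#     for row in scrape(limit=3):
#         print(row)
```

Keeping the parsing separate from the fetching makes it easy to test against saved responses without hitting Steam at all.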
I found this explanation from steamdb that points to the various projects and libraries they use to gather all the data they have. It's not a how-to, but it has very useful info.
From the Terms of Service (emphasis mine):
6. Restrictions on Use
You agree not to:
Do you intend to delineate the data provided by the service from "the Service" itself? It seems most fair that data received via Fair Use remains in that arena, pun fully intended.
That aside, it's an intriguing dataset nonetheless, but I'd prefer to see a sample of the data before signing up.
I am not sure what is considered a derivative work and what isn't.
Circular 14 addresses derivative works, including those based on data: https://www.copyright.gov/circs/circ14.pdf
Steamdb.info is a derivative work, yes. And scraping is usually accepted as Fair Use, so both services are presumably within their rights, but they have no claim to the underlying data, only their process of enrichment. If someone were to build a new service based on the data presented on either site, there's not much either site could do to stop them... short of getting them to agree not to do so via their ToS.
OpenAI is a great example of a company that built a derivative work on scraped data available under Fair Use, and then subsequently gated their data via their ToS. With such a popular precedent at play, I'd rather not use any services doing anything similar, especially when steamdb.info doesn't even have a ToS.
A user, whether free, paid, or anonymous, can still be under the thumb of their ToS, especially if they force a dialog in front of you to agree to the ToS while signing up. I'm merely pointing out hurdles to the OP that may obstruct some of the people they are trying to reach.
Also, for deeper insight than sales volumes (e.g., game design, general trends, demographics, types of players), such things would be crucial.
and
The biggest advantage that SteamDB has is its ton of historical data. That is not retrievable from the Steam Network, so the only way to have gotten historical data was to start early.
My website has been defunct for a year now, but I've kept the scraper running. I now have 7 years of historical data in my database.
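For anyone wondering what "start early" looks like in practice, here is a minimal sketch of append-only snapshots, where every scraper run adds timestamped rows instead of overwriting the previous values. This is not my actual schema; the table name, the columns (`price_cents`, `review_count`), and the example values are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open (or create) a database with an append-only snapshots table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshots (
               appid        INTEGER NOT NULL,
               fetched_at   TEXT    NOT NULL,  -- ISO 8601 UTC timestamp
               price_cents  INTEGER,
               review_count INTEGER,
               PRIMARY KEY (appid, fetched_at)
           )"""
    )
    return conn


def record_snapshot(conn, appid, price_cents, review_count, fetched_at=None):
    """Insert one observation; earlier rows are never updated or deleted."""
    fetched_at = fetched_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
        (appid, fetched_at, price_cents, review_count),
    )
    conn.commit()


def history(conn, appid):
    """Return all observations for one app, oldest first."""
    return conn.execute(
        "SELECT fetched_at, price_cents, review_count FROM snapshots "
        "WHERE appid = ? ORDER BY fetched_at",
        (appid,),
    ).fetchall()


# Hypothetical values, only to show the shape of the data.
conn = open_db()
record_snapshot(conn, 10, 999, 120, fetched_at="2023-01-01T00:00:00+00:00")
record_snapshot(conn, 10, 499, 150, fetched_at="2023-06-01T00:00:00+00:00")
print(history(conn, 10))
```

The point of the design is that the history only exists because each run was stored; nothing here can reconstruct values from before the first snapshot.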