Interestingly, Alexa was founded by Brewster Kahle, who founded the Internet Archive contemporaneously (both in 1996). Interesting flow of ideas between the two projects - one to figure out what is getting traffic (commercial) and one to figure out how to preserve it (non-profit).
It seems that the flow has reduced somewhat in recent months: Alexa hasn't been providing crawl data to the Internet Archive since January 2021. (You can see this by looking at the "Items" graph on the Alexa Crawls collection [0].)
I was digging through their entries, and this is the latest scrape that I could find. I haven't successfully unpacked anything yet, but it seems to be legit [1].
Alexa is my go-to place to get a first impression on how much traffic a website gets.
I have never looked into how they get their data; I always assumed they get it from internet providers?
Is there an alternative? SimilarWeb publishes their data with a huge delay, so it is not of much use for me. From my experience, it is also less reliable.
I started a news aggregator that at the peak was ranked in the top ten Alexa sites in a country (was doing 100M page views / month). Alexa was the standard that people would use to judge how successful a site was.... Until people figured out how to massively game it.
For a pretty small sum, you could approach illegal download or porn sites to embed an iframe of your site or do a pop-under ad, and instantly be in the top 10. People aren't actually visiting your site, you're just getting "shadow traffic". Then you'd go tell advertisers "look, we are one of the biggest sites in the country".
We didn't play that game but our competitors did and it was really frustrating.
Anyway, the point is their rankings weren't very reliable (this was 8+ years ago, maybe they've gotten better at detecting traffic fraud).
apple.com being so high is (to me) a little counter-intuitive. microsoft.com too for the same reason. They don't seem like the sites people would be using a lot in their day-to-day lives. I guess it's counting dns lookups from devices and not necessarily "human" web page/app requests?
> Alexa is my go-to place [...] I have never looked into how they get their data
Right, this is a problem with all sorts of data sources that provide numbers (and use lots of SEO) but don't talk much about their methodology. CelebrityNetWorth is another example of this.
I've come to assume that celebrity net worth sites are mostly just made up numbers. Sometimes you can look up the payout for some specific jobs, but not all, and most of the time, they seem to just ballpark a guess based on that.
> Alexa is my go-to place to get a first impression on how much traffic a website gets.
Alexa hasn't been a reliable source of traffic data for many years. It's gotten worse as mobile devices, private browsers, VPNs, and tight-fisted companies (like Facebook) have become more widespread.
If you own a high-traffic site and check Alexa, it's not even close. One of my sites wasn't even within an order of magnitude.
The Tranco list [1] is considered the most accurate source for relative site ranking by traffic. It is the result of triangulating data from several sources (one of them is Alexa), and it is what we use at Kagi Search for domain information [2].
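To make "triangulation" a bit more concrete, here is a minimal sketch of one common way to merge several ranked lists: the Dowdall rule, where a site at position r in a list earns 1/r points and the points are summed across lists. This is an illustration under assumptions, not a statement of exactly how Tranco builds its list; the function name `dowdall_merge` and the three toy source lists are made up.

```python
from collections import defaultdict


def dowdall_merge(ranked_lists):
    """Merge ranked lists: a domain at position r in a list earns 1/r points."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for position, domain in enumerate(ranking, start=1):
            scores[domain] += 1.0 / position
    # Highest total first; ties are broken alphabetically so the result is stable.
    return sorted(scores, key=lambda domain: (-scores[domain], domain))


# Hypothetical toy rankings standing in for three independent data sources.
source_a = ["google.com", "youtube.com", "example.org"]
source_b = ["youtube.com", "google.com", "wikipedia.org"]
source_c = ["google.com", "wikipedia.org", "youtube.com"]

combined = dowdall_merge([source_a, source_b, source_c])
print(combined)
```

A site that only one source ranks highly gets pulled down by its absence from the others, which is roughly why a combined list is harder to game than any single one.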
Handwavey "proprietary methodology" using data from "millions of Internet users using one of many different browser extensions" and "direct sources in the form of sites that have chosen to install the Alexa script."[0]
Google Analytics would only work on sites that run Google Analytics, which would exclude sites run by the other big tech companies. Alexa worked because their toolbar addon would record every site their users went to, regardless of what was running on the site.
I thought they had a browser toolbar/extension which they use to collect data from a very very small subset of internet users, which is probably incredibly biased to a certain audience. (e.g. boomers who don't know how to not install random toolbars when downloading stuff).
It's sort of wild to think that Amazon purchased Alexa.com in 1999 for ~$250M in stock, and that stock would be worth more than $7B today, if my math is correct.
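For anyone who wants to sanity-check that math, a rough sketch: it only tests whether the two figures quoted above are consistent with each other, and does not look up any actual share prices. The 22-year holding period (1999 to 2021) is an assumption.

```python
# Figures as quoted in the comment above; none of this is verified price data.
purchase_value = 250e6   # ~$250M in stock at the 1999 acquisition
claimed_value = 7e9      # "more than $7B today"
years = 22               # assumed holding period, 1999 to 2021

# Growth multiple implied by the two quoted figures.
multiple = claimed_value / purchase_value

# Compound annual growth rate that would produce that multiple.
annualized = multiple ** (1 / years) - 1

print(f"implied multiple: {multiple:.0f}x")
print(f"implied annualized growth: {annualized:.1%}")
```

So the claim amounts to the stock growing 28x, or a bit over 16% a year compounded.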
Bought a website ranking company, used the company's name as the name of a consumer electronics assistant, and then shuttered the company 22 years later to (presumably) be able to use the domain name for more consumer electronics.
In essence, what the original Alexa did seems to have been internalized into Amazon ads (or their b2b analytics division), which is coincidentally the fastest growing part of Amazon.
Wonder why they used the Alexa name for their home assistant.
This just struck me, and I don't know if it's true, but a-lex could be a play on Greek and Latin for "not" and "written." Which is a pretty good name for a voice-based input system.
>Early on, the team realized they needed a "wake word" that would make the device start listening. The word would need to have three syllables, a "distinct combination of phonemes" so as not to unnecessarily rouse the device, and an easily marketable name, like Apple's Siri.
According to my former manager who worked on the Alexa team at one point, it was a name with a high true-positive rate for their voice recognition system.
Did they just have extreme foresight about wanting the domain alexa.com or were they genuinely interested in alexa.com as a product? They kept it around for 22 years after all but I can't see how what alexa.com used to do has anything to do with what amazon is doing.
They bought it for data gathering and market/competitor research. I was with Amazon at the time. Was great for keeping track of what products were drawing most traction on eBay (Longaberger baskets and Beanie babies, says my vague memory)
Other companies do that, too - I think it's due to trademark law and how it's much harder to establish a new name than reuse an existing one. Cortana comes to mind.
A huge piece of the internet is going away. Back in the early 2000s I would use Alexa a lot; it was a gold mine of information, to the point where some people bragged about their rank on Alexa. That was hilarious, like the modern "twitter followers" / "github stars".
I worked for amzn for over a decade until 5 years ago... nobody ever talked about Alexa.com (I'm sure there was a department who did, but overall, it just never came up).
https://en.wikipedia.org/wiki/Brewster_Kahle
[0] https://archive.org/details/alexacrawls?tab=about
[1] https://archive.org/download/alexa20200802-00
It is crazy how valuable internet properties get closed down when they are owned by giant companies. According to Alexa, Alexa is still a top-5000 site:
https://www.alexa.com/siteinfo/alexa.com
It seems insane to just close it down.
https://domain.glass/ycombinator.com#dns_rank
Not the prettiest, but I use it a fair amount myself for researching domains.
[1] https://tranco-list.eu/
[2] https://kagi.com
After that, they paid other browser add-ons to include their script, and some websites voluntarily gave them their data by adding their tracker.
It says so right on the about page (warning: unusable on mobile): https://www.alexa.com/about
I think they moved on to broader data sources, possibly purchasing data from ISPs.
Translation: trust us
[0] https://www.alexa.com/about
https://www.msn.com/en-in/money/news/here-are-the-other-name...
https://en.wikipedia.org/wiki/OpenSearch
https://en.wikipedia.org/wiki/OpenSearch_(software)
curl http://s3.amazonaws.com/alexa-static/top-1m.csv.zip --output ~/Downloads/alexa.zip
Today it contains the top 630779 records.
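If you want to do more than count lines, here is a minimal sketch for reading the archive in Python. It assumes the zip holds a single headerless file named top-1m.csv with rows of the form rank,domain; the helper names `load_ranks` and `make_sample_zip` are made up, and the sample data is a tiny stand-in so the snippet runs without downloading anything.

```python
import csv
import io
import zipfile


def load_ranks(zip_bytes, member="top-1m.csv"):
    """Return {domain: rank} from a zipped CSV of headerless 'rank,domain' rows.

    The member name is an assumption about the archive layout.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Skip blank or malformed rows instead of failing on them.
            return {row[1]: int(row[0]) for row in csv.reader(text) if len(row) == 2}


def make_sample_zip():
    """Build a tiny in-memory archive with made-up rows, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("top-1m.csv", "1,google.com\n2,youtube.com\n3,example.org\n")
    return buffer.getvalue()


ranks = load_ranks(make_sample_zip())
print(len(ranks), "records; youtube.com is ranked", ranks.get("youtube.com"))
```

For the real file, read the bytes from wherever curl saved it (for example ~/Downloads/alexa.zip) and pass them to `load_ranks`; `len(ranks)` then gives the record count.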
Thanks!