This data's definition of "famous" or "notable" is in the "Measuring notability" section of the linked paper:
we build a synthetic notability index using five dimensions to figure out a ranking for this broader set of individuals. These dimensions are:
1. the number of Wikipedia editions of each individual; [i.e. number of languages in which this person has a Wikipedia article]
2. the length, i.e total number of words found in all available biographies. […]
3. the average number of biography views (hits) for each individual between 2015 and 2018 in all available language editions […]
4. the number of non-missing items retrieved from Wikipedia or Wikidata for birth date, gender and domain of influence. The intuition here is that the more notable the individual, the more documented his/her biographies will be; [!]
5. the total number of external links (sources, references, etc.) from Wikidata.
We then determine the quantile values from each dimension and add them all to define our notability measure
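The quoted recipe (quantile per dimension, then sum) can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function names, the tie handling (fraction of values less than or equal to one's own), and the sample numbers are all assumptions.

```python
from bisect import bisect_right

def quantile_ranks(values):
    """Map each value to the fraction of values <= it (an assumed tie rule)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]

def notability_index(people):
    """people: dict of name -> tuple of raw values, one per dimension.

    Returns a dict of name -> sum of per-dimension quantile ranks.
    """
    names = list(people)
    n_dims = len(next(iter(people.values())))
    scores = {name: 0.0 for name in names}
    for d in range(n_dims):
        column = [people[name][d] for name in names]
        for name, q in zip(names, quantile_ranks(column)):
            scores[name] += q
    return scores

# Made-up values for: editions, total words, average views,
# non-missing items, external links.
sample = {
    "A": (10, 1000, 500, 3, 20),
    "B": (5, 500, 900, 3, 10),
    "C": (1, 100, 100, 1, 5),
}
scores = notability_index(sample)
```

Because page views are one of only five equally weighted dimensions and everything is reduced to ranks, a heavily viewed modern biography can land close to, or above, a historical figure, which is consistent with the odd orderings people are pointing out.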
Thanks so much for digging this out! Very useful to know. So, the notability methodology fails massively, as many have noted. Jesus and Muhammad trailing Britney Spears by a factor of 4 or so is my favorite so far ... LOL. But the question becomes: how can the notability measure be improved? Of course "AI" is probably the answer here, in the same way it is becoming the answer to so many questions/problems. (Just as the answer to every legal question is "it depends".) Two elements come to mind: (i) accessing more sources outside of the Wikipedia/Wikidata database; (ii) within the Wiki world, making associations like Jesus ~ Christian ~ bible ~ best selling books.
> Jesus and Muhammad trailing Britney Spears by a factor of 4 or so is my favorite so far
To play devil's advocate (edit: pun entirely not intended!), I'd bet that way more people today could correctly identify a photo of Britney Spears than an accurate rendition of either Jesus's or Muhammad's faces. Obviously this map isn't supposed to be most "recognizable" people, but I think there's something to be said about whether the person itself is different from the mythos around them (which may or may not accurately describe their life).
1) Apply a discount on notability for people based on how near their birthdate is to now. If Hammurabi and Jordan Peterson have the same score, Hammurabi should win by far.
2) Use an additional book corpus. Someone mentioned in books from 1500, 1800 and 2022 should score higher than someone popular in only one era.
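One possible shape for the first suggestion, as a rough sketch: multiply the score by a weight that grows with the years since birth. The exponential form, the `half_life` parameter, and the function name are assumptions made for illustration; the birth years below are approximate.

```python
def recency_discount(score, birth_year, current_year=2022, half_life=100.0):
    """Scale a notability score by how long ago the person was born.

    The weight is 0 for someone born this year and approaches 1 for
    ancient figures. half_life (in years) is a hypothetical tuning knob.
    """
    age = max(current_year - birth_year, 0)
    weight = 1.0 - 0.5 ** (age / half_life)
    return score * weight

# Same raw score, very different birth years (negative = BC, approximate).
hammurabi = recency_discount(100.0, birth_year=-1810)
peterson = recency_discount(100.0, birth_year=1962)
```

With equal raw scores, Hammurabi keeps nearly all of his while someone born in 1962 keeps roughly a third, so Hammurabi wins by a wide margin as the comment asks.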
Does Pablo Escobar belong in that conversation? Or am I like many others who have just been exposed to the show Narcos which makes up the entirety of our Colombian experience?
Crazy how few women there are, it's like for our entire recorded history we have been ignoring 50% of our potential. Let's hope it gets a lot more mixed!
They weren’t ignored. For most of recorded history the basic unit was the family.
The men were in charge of public affairs of the family, while women were in charge of private and domestic affairs.
It was only recently the basic unit has been further subdivided into individuals, which required many to rely on institutional support on matters that used to be within the family, eg education, pensions, restaurants, clothing shops, apartment complexes, birth control.
The truly ignored throughout history were the peasants and serfs. Most men of significance were from aristocratic or upper class upbringings.
The divide is not between men and women, but haves vs have nots.
Fame is such a bizarre and frankly perplexing concept. It does not
equate to achievement, to competence or success per se. It says
nothing of the goodness or value of a person, the wealth they created,
the families they raised, the hearts they broke, and very little of
the suffering and joy they experienced as actual people. It's an
ever-fading trace left in the (mostly) written records of
institutions, where the narrow spotlight of social consciousness shone
at some time.
What I find most interesting as I explore history and civilisation is
the marginal web that supports what is "notable". Almost every
breakthrough has a "revisionist" version of someone else who made a
simultaneous advance. Or allegedly had the limelight stolen from them.
Every Crick and Watson have their Rosalind Franklin. For every Charles
Babbage and Alan Turing there's an Ada Lovelace or Mavis Batey. And
yet those are at least "noted". Who and what lies behind those figures
in the third and fourth rows of history's group photograph, "fame"?
You say this as if it were a linear progression, but you're only paying attention to certain written histories and ignoring a lot of anthropological/archaeological research on cases (large-scale societies after the discovery of agriculture) where it was otherwise.
I went looking to see who the entry was for the nearest town to where I live expecting it to be Mary Somerville and was rather disappointed to find it was some chap I'd never heard of.
>it's like for our entire recorded history we have been ignoring 50% of our potential. Let's hope it gets a lot more mixed!
I'm so sick of shit like this. It's so intellectually offensive, I can't be polite any longer.
It's so incredibly rude to dismiss so many great women just because you didn't hear about them, as if being famous is the ultimate test of potential. As if being a famous author or famous SOMETHING is the ultimate goal in this life.
I'll use my mother as an example. She's a truly great woman. She'll never be famous to you (she has no such vain desires anyway), but she's a great human being, much greater than you'll ever be, for she rejects DEMOGRAPHIC quotas, she's honest, and compassionate, and pious, and loving, and fun, and courageous, and every day she lives up to her potential and more, and she inspires her family and friends to do the same. She does what she does and she loves doing it and she does it well.
And how willfully ignorant it is to ignore the different powers and motivations unique to men and to women.
If you think there's a problem with so few famous women, then that's a personal problem, that's a you problem. You are the problem, because you are imposing your own personal beliefs and personal standards onto women.
If your criterion is "Wikipedia notability", we have been ignoring more like 98 per cent of our potential since antiquity. By far the most people who lived and died were subsistence farmers, most of them not even personally free (either serfs or slaves), and good luck making it to Wikipedia as a serf boy from Upper Nowhere, rural Campania of 635 AD.
Sometimes I wonder whether the entire contemporary American obsession with race and gender has been deliberately and cynically manufactured or at least blown up beyond all proportion to keep everyone's eyes away from class, the most formidable societal barrier almost everywhere, including societies that are ethnically fairly homogenous.
Current estimates are that around 100 billion people have ever lived. So that's a lot more than 50% that have been "ignored".
It turns out that if you look for notability or exceptional attributes you will get mostly men. This is due to biology and essentially the whole reason males and sexual reproduction exist.
This doesn't mean that being male will give you a better chance of being exceptional or notable, though. Quite the opposite, in fact. The bar is lower for women because simply being a woman is considered notable precisely because there are so few notable women.
There is an option in the top left that allows you to show city names. Unfortunately, you cannot see the city names and the people names at the same time.
And sisters and daughters... But somehow they never had the same opportunities to get on this "Famous People" list. Last week I told my daughter she can be a knight (although granted she usually wants to be a princess), and I felt weird and then I felt extra weird.
Or maybe men don't have high expectations of women, or because they benefit from women having a subservient position, aren't very inclined to change society.
I suppose this is actually representing the most famous people in the -western world's lens- rather than the most famous people to each country respectively. For example, Haruki Murakami is a Japanese author, very famous in the west because their books have been translated into English. But would they be the most famous person from Kyoto to people in Japan?
That's something that's always fascinated me about the internet: it's essentially delineated by language and not country. If you google things in Spanish, you get the Spanish web. If you google things in Japanese, you get the Japanese web. For a subtle example of this, there's very little crossover between Japanese memes and English memes; it's a whole different web. Japanese web design is also famously different to western web design: it has formed its own set of UX expectations and principles.
There's a lot of discussion here of the 'western lens' as you bring up, but I'm not sure that's fair criticism. The creator(s) aggregated data and built something very interesting. To complain that the data they used isn't universal doesn't seem fair. I think Wikipedia is a reasonable starting place, but yes, Wikipedia skews geographically.
All datasets have bias. It's okay to acknowledge that and still find insights in the data.
Honestly curious: what highly accessible dataset that allows for the simple creations of 'fame metrics' would be better? I'm not aware of any.
It wasn't a criticism; of course something like this is limited by the data available, and that's not the fault of the author. I was just musing on what might be a side effect of using what's available.
There wouldn't be a 'total complete and true set' of data for this task, since not all countries use wikis to the same extent, and languages don't actually map onto countries (e.g. Spanish Wikipedia is not exclusively the view of people from Spain, nor is English Wikipedia exclusively the view of people from England).
Yeah, similarly Dorothea Jordan, 18th-century actress, is ranked above John O'Shea, Premiership footballer, or Thomas Francis Meagher, originator of the Irish flag, leader of the 1848 Young Irelander rebellion against British rule, and later US general in the American Civil War.
If you're going by contemporary sources, I'd expect O'Shea to be on top, if we're including historical sources, I'd expect Meagher to be on top, unless Dorothea has some significant fame elsewhere than her city of birth
The word 'racist' didn't appear until your post. It's a shame we can't discuss obvious language bias without you becoming defensive about imagined slights.
As soon as I saw Leonardo DaVinci and Picasso for Italy and France, I knew this was going to be the western lens, haha. Would be interesting to select the country as a point of reference.
DaVinci, Picasso & other western artists also score highest on the list of most expensive paintings. Would you consider an economic view to be a more balanced measure of fame? Many of the buyers are Middle Eastern or Asian too.
I had no idea who this was but I guessed cricketer. I guessed right. Seems like Westerners have better taste in Mumbai-ites than Indians do. I mean--noted author vs. guy who excels at weird ball 'n stick game.
On the language internet point, it's pretty amazing, yeah. For example, all the English youtube niches have Spanish language equivalents, and watchers of one are totally unaware that they are sitting right next to watchers of another. Like some sort of shadowverse.
A pity that so much language-agnostic material we'll never see because search engines and algorithms are so effective at this segregation. Translation is good enough for browsing in completely unknown languages for internet exploration fun, but only one at a time -- still waiting for a practical multilingual search engine.
> For example, in countries bordering Russia, science nobel laureates are missing, but racist pseudoscientists and UFO theorists are listed.
Maybe those pseudoscientists and UFO theorists are more "well known" than the Nobel laureates? Also, these are not opposites: there are various examples of Nobel laureates that later became pseudoscientists (see Luc Montagnier [1]).
I didn't specify a gender at all, not out of some intentionality, that was just the most natural way for me to write that sentence. So I'm not sure what you're talking about frankly.
Answering a question I had looking at this amazing work, the data set has a heavy English influence, but they are aware of it and also worked toward mitigating the effect. From the source:
> This strategy results in a cross-verified database of 2.29 million unique individuals (an elite of 1/43,000 of human being having ever lived) among which 30% come from the 6 non-English editions of Wikipedia, a significant improvement over earlier works that have only focused on English Wikipedia only.
The difference between the EU and US is wild. The EU is mostly historical figures: Picasso, Da Vinci, Erasmus, Van Gogh, and of course Adolf. But the US, even with some old presidents, is mostly pop & movie stars.
It's a cool map, and now I really want to play with it. If I could color code the names by birthdate, it would be possible to get great new insight into regional relevance over time. Switching between current residence and birthplace would also be very interesting, as would color coding the distance between birthplace and current residence to see which places are attractive, or how much of a role one's birthplace plays in becoming famous.
Very cool project, and also reveals buggy data to fix.
One note if the creator is here: it looks like deprecated locations are included. https://www.wikidata.org/wiki/Q596717 includes both Indiana (deprecated) and Linton, Indiana, and he shows up on the map near the center of Indiana apparently as its most notable person, which is clearly not the case.
They also have a table of what this metric throws up as the most "notable" from each time period: https://www.nature.com/articles/s41597-022-01369-4/tables/3 and how the "domain" varies over time: https://www.nature.com/articles/s41597-022-01369-4/figures/2 (note Nobility and Religious in 500–1000, to Sports and Culture post 1950).
Natalie Portman - notability rank 221
Jesus - notability rank 204.5
Go a bit further north to Haifa, and you'll find Gene Simmons with a notability rank of 2136.
I think Simon Bolivar or Shakira or Gabriel Garcia Marquez or many others have a better claim to the title
Especially since Jean-Paul Sartre was born in Paris
What's weird is that wikidata has the correct info https://www.wikidata.org/wiki/Q9364
Changed back to Paris on 22 December 2018.
On 17 March 2019 2a01:e35:8ab4:ac00:75c3:3673:f22b:4a45 changed to Tokyo.
On 30 September 2019 201.187.105.154 changed to Chile.
On 16 January 2020 changed to Efflamm.
On 16 January 2020 changed to Paris, where it's been ever since.
This signature tells us the dataset for the paper was extracted in November or December of 2018.
Various other bits of high-schooler sabotage:
30 September 2019 201.187.105.154 changed place of death to Easter Island.
29 November 2018 190.247.191.178 changed place of burial to Bikini Bottom.
7 March 2019 201.164.233.103 changed cause of death (P509) to cocaine.
I understand that everyone consumes different things :P
https://en.wikipedia.org/wiki/Mary_Somerville
Worth noting:
"In 1834 she became the first person to be described in print as a 'scientist'"
https://ideas.ted.com/you-can-help-fix-wikipedias-gender-imb...
https://www.wikiloveswomen.org/
There must be other initiatives if others have links to share in this thread.
https://en.wikipedia.org/wiki/Gender_bias_on_Wikipedia
But definitely not 50%.
Well, most all had a mom...
Eg. Phil Collins isn't shown in favour of a cricketer from the early 20th century?
Sometimes things are imperfect, not racist.
https://en.wikipedia.org/wiki/List_of_most_expensive_paintin...
For example, in countries bordering Russia, science nobel laureates are missing, but racist pseudoscientists and UFO theorists are listed.
[1]: https://en.wikipedia.org/wiki/Luc_Montagnier#Controversies
I came back here to write pretty much this comment.
https://www.nature.com/articles/s41597-022-01369-4
Go over the Levant and you start seeing Paul the Apostle, Diogenes, Ptolemy, etc. which makes Voltaire look like a modern political commentator.
And that it hasn't done so with philosophers, artists, scientists or dictators. But mostly with entertainers.
And Keanu Reeves! I had no idea.
> We document an Anglo-Saxon bias present in the English edition of Wikipedia, and document when it matters and when not.
Regardless of these biases, Europe has much more historical background than the US.
Finally, this data is based upon Wikipedia and Wikidata. I gather datasets from India or China would provide much different results.
Interesting project nonetheless!
[1] https://www.nature.com/articles/s41597-022-01369-4