"Cybergibbons" from PentestPartners has done a teardown of W3W in 2021 on twitter. It was very spicy and the takeaway was not that it is just ugly implementation as some here suggest, but flawed by design:
I can understand overlooking homophones, but plurals was just stupid.
Someone came up with a clever idea, and clearly they had some math/cipher skills. But w3w clearly had no linguists on staff. Even Wordle did a better job filtering their word list.
Though with some of the words being so lengthy and linguistically complex, they would have been better off moving to a four-word system and ensuring simpler words that were better filtered.
Interesting that the researchers didn’t analyze W3W directly because it would violate their ToS, but instead constructed their own word list. I would think that this sort of criticism of a technology used by the government (for emergency services) would be acceptable, similar to “fair use” in the US.
Edit: it looks like the list itself is not even available, they would have had to reverse-engineer the app, so using a synthetic list makes more sense.
There's a reverse engineering of What3Words from a few years ago floating around out there called "What Free Words". It includes what claims to be their word list. I imagine the researcher is familiar with this but has chosen not to include it in their research.
(W3W is very aggressive about pursuing possibly spurious IP claims to remove any trace of What Free Words. In the past, even mentioning it in a comment like this is enough to attract a DMCA complaint.)
It's interesting because I'm fairly sure that work has already been done. Even without criticising the geocoding itself, the fact that there are singular/plural pairs in the word list along with easily confusable homophones says a lot about the product.
It's worth noting that some huge limitations/risks of W3W are there because of the flaws in their word list choice, so if you're analyzing a different wordlist, you explicitly aren't looking at a big source of problems.
I volunteered in SAR for five years, and the topic of W3W would occasionally come up.
For me, it’s just already damn hard to make sure you can hear numbers correctly over the radio. In marginal conditions, it’s a hell of a lot easier to use numbers (and requires less time due to not having to repeat or ask for clarification).
Do I really want to be trying to say “arrows.midst.senses” over a handheld radio?
Especially since "arrows.midst.senses" and "arrow.midst.senses" are two different locations, and if you mishear it as "arrows.midst.sends", that's yet another different valid location.
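A rough way to check a word list for this kind of trap is to flag every pair of words within a small edit distance of each other. This is only a sketch: the six-word list is made up for illustration (it is not W3W's list), and spelling distance is a crude stand-in for how similar two words sound over a radio.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def confusable_pairs(words, max_distance=1):
    """Return every pair of distinct words within max_distance edits."""
    unique = sorted(set(words))
    return [(a, b) for a, b in combinations(unique, 2)
            if edit_distance(a, b) <= max_distance]


# Made-up sample list, NOT the real W3W vocabulary.
SAMPLE_WORDS = ["arrow", "arrows", "midst", "senses", "sends", "delight"]

if __name__ == "__main__":
    for pair in confusable_pairs(SAMPLE_WORDS, max_distance=2):
        print(pair)
```

A list built for voice use would presumably want this check on pronunciations rather than spellings, which needs a pronunciation dictionary and is out of scope here.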
Really, there's all the research done (since the invention of early radio?) on what makes a phonetic alphabet or phonetic code usable in noisy conditions, and instead W3W effectively plants a field of rakes in front of blindfolded people.
Very similar experience, decade in volunteer SAR with a lot of police and fire overlap.
I hold exactly the same opinion as you; however, what is very telling is that the police and fire services love it. I think that's because of the accessibility of the system rather than the underlying algorithm or communication.
I am also coming to see w3w locations written more and more as a more accessible alternative to coordinates, especially in land sales and forestry situations.
I do particularly hate that government services and agencies are becoming reliant on a commercially licensed pattern. It does not tick the sustainability tickbox for me.
poor implementation aside, the format has its advantages over numerical coordinates
besides the greater public awareness of w3w amongst certain demographics compared to coordinates, they may be harder to hear, but it's much easier to read out three words correctly than it is 16-18 numbers
the thing with words is that they're easy to hear in context, but take them out of context and they're often indecipherable
perhaps a system that gives each square a coherent-sounding sentence could be tried, although I'm sure that would have its difficulties too
Why are you not sending text messages to each other if it is so hard to understand? You can have plenty of error correction to make sure the message always is correctly received.
If they are in a situation in which they can send W3W coordinates by text message, then they are in a situation in which they can send ordinary numerical coordinates by text message.
It really makes no sense to use it for emergency calls. 999/911 can already access your location by GPS.
I think the issue is that other, non-emergency numbers can't (like 101). That's why they tell you to hang up and call 999 rather than transferring you.
But I don't know why they don't just add more numbers to the "allowed to geolocate you" list. Or even build in a "send my location" button in phone calls. You wouldn't even need to modify the phone system - it could send DTMF tones.
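To make the DTMF idea concrete, here is a minimal sketch of turning a position into keypad symbols and their tone pairs. The frequency table is the standard DTMF keypad layout; the message framing (8-digit fixed-point offsets, `*` as separator, `#` as terminator) and the function names are invented for illustration and are not any existing protocol.

```python
# Standard DTMF grid: each key is one row tone plus one column tone (Hz).
DTMF_ROWS_HZ = (697, 770, 852, 941)
DTMF_COLS_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

DTMF = {
    key: (DTMF_ROWS_HZ[r], DTMF_COLS_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def location_to_keys(lat: float, lon: float) -> str:
    """Hypothetical framing: offset to non-negative, keep 5 decimals,
    zero-pad to 8 digits each, '*' between fields, '#' at the end."""
    lat_fixed = round((lat + 90) * 1e5)    # 0 .. 18_000_000
    lon_fixed = round((lon + 180) * 1e5)   # 0 .. 36_000_000
    return f"{lat_fixed:08d}*{lon_fixed:08d}#"


def keys_to_tones(keys: str):
    """Map each keypad symbol to its (low, high) frequency pair."""
    return [DTMF[key] for key in keys]


if __name__ == "__main__":
    keys = location_to_keys(51.5, -0.12)
    print(keys)
    print(keys_to_tones(keys)[:3])
```

Eighteen symbols is a second or two of tones at typical signalling rates, though a real design would also want a checksum digit.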
Really GPS on the device, or just tower triangulation data? Is there information available on how this works?
In Austria emergency services can not access GPS directly afaik.
Note that GPS can often be unreliable or not precise enough. For example, it would be pretty useful to specify the correct side of a river/road/railway. Arriving at the wrong side may mean a very long detour.
In French (and potentially other languages), I can add another criticism on top of the paper: some of the words used in the dictionary are so obscure that a native speaker would not know them. They probably generated them randomly from a dictionary instead of asking native speakers.
The whole project then has no real purpose if you have to spell unknown words.
It also doesn't work well with agglutinative languages like Korean. The wordlist contains a lot of word sets only differentiated by postpositions (e.g. 기쁜 delightful vs. 기쁨 delight).
Plus Codes are way better because they are open source. S2 is kind of analogous (also from Google), but 64-bit integers. I wonder if someone will ever come up with an Emoji encoding for GPS, regional flags might facilitate everything. It's just a random idea.
I don't like how Plus Codes uses city names for relative ones. The result is longer than the full code. I also don't like how it uses the plus; it is hard to represent larger areas. I would have put the "+" at the beginning and used dots as separators, although I guess it could look like a phone number.
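For anyone who hasn't looked at how a full Plus Code is put together, here is a simplified sketch of the 10-digit pair encoding as I understand the Open Location Code scheme. It is not Google's reference implementation: it skips shortening against a city name, padding for larger areas, and the extra refinement digits, and `encode_plus_code` is just an illustrative name.

```python
import math

# The 20 symbols used as Plus Code digits (no vowels, no easily confused characters).
ALPHABET = "23456789CFGHJMPQRVWX"


def encode_plus_code(lat: float, lng: float) -> str:
    """Encode a point as a 10-digit code with '+' after the eighth digit."""
    if not (-90 <= lat < 90 and -180 <= lng < 180):
        raise ValueError("sketch only handles -90 <= lat < 90, -180 <= lng < 180")
    # Shift to non-negative values and count in units of 1/8000 degree,
    # the cell size after five base-20 digits per axis.
    lat_units = math.floor((lat + 90) * 8000)
    lng_units = math.floor((lng + 180) * 8000)
    pairs = []
    for _ in range(5):
        lat_units, lat_digit = divmod(lat_units, 20)
        lng_units, lng_digit = divmod(lng_units, 20)
        # Each pair is one latitude digit followed by one longitude digit.
        pairs.append(ALPHABET[lat_digit] + ALPHABET[lng_digit])
    digits = "".join(reversed(pairs))  # most significant pair first
    return digits[:8] + "+" + digits[8:]


if __name__ == "__main__":
    print(encode_plus_code(1.28685, 103.85455))
```

Because the digits go from coarse to fine, nearby points share a prefix, which is what makes the "drop the first four characters and name a city instead" shortening possible.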
It has certainly picked up popularity, or at least use, in certain outdoor communities. I've noticed running events are increasingly noting the W3W address of their registration and start locations for instance, and associations related to emergency services have allowed themselves to be linked to the app (https://www.airambulancesuk.org/news-knowexactlyewhere/ for instance).
I thought I'd seen adverts from official emergency bodies mentioning W3W, but a quick search finds nothing about call-centres having the facility available that isn't directly from W3W, a straight copy of one of their press releases, or a discussion of the system's deficiencies (e.g. https://www.bbc.co.uk/news/technology-56901363), so that might be me not having been cynical enough to identify an astroturf campaign!
We used it at a remote forestry work camp a year or so ago.
It was great for the supervisor to just tell everyone 3 words to remember instead of UTM or lat/long coordinates, which I for one probably would not have remembered in an emergency.
You can kind of shop around for an easy set to remember, too: if you get a weird set of words, you can most likely just take the grid box next to you.
I’m sure it’s not perfect but it was very convenient for our particular use case
Seems like a great opportunity for a new alternative system.
Preferably with an error correcting code built in. Homophones shouldn't really matter, if you mess it up, the ECC could just fix it.
Go with 4 words instead of 3.
Take Diceware, treat the words as digits of a base-7776 number (the Diceware list has 6^5 = 7776 words), convert to binary, and you get about 51 bits.
You could do some fancy reshuffling: reorder the words by similarity (start with the first, then repeatedly pick the one with the smallest edit distance), and do the conversion to an integer such that any one word swapped for a close neighbor only produces a small change.
My math knowledge is too limited to do this myself unless there's something to copy and paste or a brute-force hack, but couldn't you do that by treating each word as a digit in a 7776-ary Gray code?
Any one word being off to a nearby word would then just give you an error in the low-order bits of the latitude.
Then you protect those bits with a Hamming code, and have an overall 2- or 3-bit checksum for everything, so at least some of the easy-to-make mistakes will be corrected, and most will probably be detected.
Reliability could probably be better than reading off coordinates, still faster, with only one extra word.
Plus the words themselves already have redundancy in them, if you make a typo, autocorrect can fix it, and if it gets the wrong one, the correction will likely catch it or flag the issue.
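The four-word arithmetic can be sketched as plain positional notation. Assumptions: the standard Diceware list has 6^5 = 7776 entries, and the `w0000`-style vocabulary below is a placeholder standing in for a real, curated list. No similarity ordering, Gray code or error correction is attempted here.

```python
BASE = 7776            # size of the standard Diceware list (6**5)
WORDS_PER_CODE = 4

# Placeholder vocabulary, NOT a real word list.
WORDLIST = [f"w{i:04d}" for i in range(BASE)]
INDEX = {word: i for i, word in enumerate(WORDLIST)}

# Largest whole number of bits that always fits in four base-7776 digits.
USABLE_BITS = (BASE ** WORDS_PER_CODE).bit_length() - 1


def words_to_int(words):
    """Read the words as digits of a base-7776 number, most significant first."""
    if len(words) != WORDS_PER_CODE:
        raise ValueError("expected exactly four words")
    value = 0
    for word in words:
        value = value * BASE + INDEX[word]
    return value


def int_to_words(value):
    """Inverse of words_to_int."""
    if not 0 <= value < BASE ** WORDS_PER_CODE:
        raise ValueError("value does not fit in four words")
    words = []
    for _ in range(WORDS_PER_CODE):
        value, digit = divmod(value, BASE)
        words.append(WORDLIST[digit])
    return words[::-1]


if __name__ == "__main__":
    print(USABLE_BITS)
    print(int_to_words(2 ** USABLE_BITS - 1))
```

With this plain encoding, a slip in the last word moves the integer by 1, while a slip in the first word moves it by 7776^3, so which bits each word position carries matters a lot for how errors degrade.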
I built a location to words system for fun a little while ago. https://Wherewords.id
I concluded something similar to you and went to 4 words. I spent the most time on the wordlist. I used the Google s2 library to split the world into a hierarchy of points.
I even did an emoji checksum too although these days I'd probably do that differently. I really like the idea of an ECC actually, I might have a look at doing that, although it would remove the hierarchical nature that I like with the current system. Because it's hierarchical, people who know the context can skip initial words. Like two people in the 'decorate' region https://wherewords.id/decorate could just use the last three of the four words, or to indicate a number of locations near each other, you could just provide one location and then the other positions with just one or two words.
I think ECC probably wouldn't get you much unless you reordered words by similarity and did the Gray code thing; otherwise, most words would change a lot of bits at once if you mixed them up.
But with Gray code reordering you might not need any ECC, because single words changing to similar words might just put you off by a few meters, degrading gracefully in a way that doesn't matter most of the time. Or at the very least you might only need to protect the last few bits.
Maybe there is an ordering that puts words likely to be mixed up next to each other, but still feels like it has variety, enough to where you'd notice the difference between adjacent locations most of the time?
Or maybe word confusion isn't actually the type of error that's most relevant? Accidentally using data meant for a different region might be a bigger problem.
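For reference, a reflected n-ary Gray code is short to write down. This sketch treats each word position as one digit in base 7776 (the Diceware list size, assumed here) and uses hypothetical function names; it shows what the code does and does not guarantee rather than a finished design.

```python
def to_gray_digits(value: int, base: int, width: int):
    """Reflected base-`base` Gray code of value, most significant digit first."""
    plain = []
    for _ in range(width):
        value, digit = divmod(value, base)
        plain.append(digit)
    plain.reverse()
    gray, prefix = [], 0
    for digit in plain:
        # Reflect this digit whenever the number formed by the
        # more significant plain digits is odd.
        gray.append(digit if prefix % 2 == 0 else base - 1 - digit)
        prefix = prefix * base + digit
    return gray


def from_gray_digits(gray, base: int) -> int:
    """Inverse of to_gray_digits."""
    value = 0
    for digit in gray:
        plain = digit if value % 2 == 0 else base - 1 - digit
        value = value * base + plain
    return value


if __name__ == "__main__":
    BASE = 7776
    for k in (7775, 7776, 7777):
        print(k, to_gray_digits(k, BASE, 4))
    # Neighbouring leading digit, same trailing digits, very different value:
    print(from_gray_digits([1, 0, 0, 0], BASE) - from_gray_digits([0, 0, 0, 0], BASE))
```

The guarantee only runs one way: consecutive integers always differ in exactly one digit, by one step, but swapping the leading word for its neighbour can still move the integer by up to about 2 x 7776^3. So the graceful-degradation idea seems to depend on which word gets confused, and the high-order words would probably still need protecting.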
What3Words – The Algorithm https://cybergibbons.com/security-2/what3words-the-algorithm...
Why What3Words is not suitable for safety critical applications https://cybergibbons.com/security-2/why-what3words-is-not-su...
https://cybergibbons.com/security-2/why-what3words-is-not-su...
https://techcrunch.com/2021/04/30/what3words-legal-threat-wh...
Edit: initially I said “sued”, which was inaccurate. And the repo included the proprietary word list, not just code.
I hate it as much as I love it.
"arrows.midst.senses"
Will become "Alpha, romeo, romeo, oscar, whiskey, sierra. Mike, india, delta, sierra, tango. Sierra, echo, november, sierra, echo, sierra".
Ten numbers doesn’t sound much more difficult than three words to me.
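The spelled-out version is easy to reproduce, which also makes it easy to count how many code words a W3W address turns into. A small sketch: the table uses the common English spellings "Alpha" and "Juliet" to match the comment above (the official ICAO spellings are "Alfa" and "Juliett"), and `spell` is an illustrative name.

```python
import string

CODE_WORDS = (
    "Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet Kilo "
    "Lima Mike November Oscar Papa Quebec Romeo Sierra Tango Uniform "
    "Victor Whiskey X-ray Yankee Zulu"
).split()
NATO = dict(zip(string.ascii_lowercase, CODE_WORDS))


def spell(address: str) -> str:
    """Spell a dot-separated word address letter by letter."""
    return ". ".join(
        ", ".join(NATO[letter] for letter in word)
        for word in address.lower().split(".")
    )


def code_word_count(address: str) -> int:
    """Number of code words needed to spell the address."""
    return sum(len(word) for word in address.split("."))


if __name__ == "__main__":
    print(spell("arrows.midst.senses"))
    print(code_word_count("arrows.midst.senses"))
```

For "arrows.midst.senses" that is 17 code words to say three.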
It has? It is?
I've known about it for many years and even seen billboards by them, yet never seen it used in practice.
The hierarchical thing is really cool.