Readit News
Posted by u/coolspot 3 years ago
Tell HN: Whole Yandex Git repository leaked
Someone just published 40GB+ of a leaked Yandex Git repository. I won’t provide the magnet link here, but it is the top Google result for “yandex leak” when filtered by the last 24h.

Affected services:

aapi.tar.bz2 admins.tar.bz2 ads.tar.bz2 alice.tar.bz2 analytics.tar.bz2 antiadblock.tar.bz2 antirobot.tar.bz2 autocheck.tar.bz2 balancer.tar.bz2 billing.tar.bz2 bindings.tar.bz2 captcha.tar.bz2 cdn.tar.bz2 certs.tar.bz2 ci.tar.bz2 classifieds.tar.bz2 client_analytics.tar.bz2 client_method.tar.bz2 cloud.tar.bz2 commerce.tar.bz2 connect.tar.bz2 crm.tar.bz2 crypta.tar.bz2 customer_service.tar.bz2 datacloud.tar.bz2 delivery.tar.bz2 direct.tar.bz2 disk.tar.bz2 docs.tar.bz2 drive.tar.bz2 extsearch.tar.bz2 fuzzing.tar.bz2 gencfg.tar.bz2 groups.tar.bz2 helpdesk.tar.bz2 infra.tar.bz2 intranet.tar.bz2 investors.tar.bz2 it-office.tar.bz2 jupytercloud.tar.bz2 kernel.tar.bz2 library.tar.bz2 load.tar.bz2 mail.tar.bz2 maps.tar.bz2 maps_2.tar.bz2 maps_adv.tar.bz2 market.tar.bz2 metrika.tar.bz2 mobile-WARNING-notfull.tar.bz2 nginx.tar.bz2 noc.tar.bz2 partner.tar.bz2 passport.tar.bz2 pay.tar.bz2 payplatform.tar.bz2 paysys.tar.bz2 portal.tar.bz2 robot.tar.bz2 rt-research.tar.bz2 saas.tar.bz2 sandbox.tar.bz2 search.tar.bz2 security.tar.bz2 skynet.tar.bz2 smart_devices.tar.bz2 smarttv.tar.bz2 solomon.tar.bz2 stocks.tar.bz2 tasklet.tar.bz2 taxi.tar.bz2 tools.tar.bz2 travel.tar.bz2 wmconsole.tar.bz2 yandex_io.tar.bz2 yandex360.tar.bz2 yaphone.tar.bz2 yawe.tar.bz2 frontend.tar.bz2

SXX · 3 years ago
If you want to know what's inside the archives without downloading them, I'm slowly working on a blog post about this breach. I will try to write a bit about the affected Yandex services for those who have never been interested in the Russian internet segment.

Also uploaded file lists from most of archives:

https://arseniyshestakov.com/2023/01/26/yandex-services-sour...

rad_gruchalski · 3 years ago
I have seen your article. This bit caught my attention:

> All files are dated back to 24 February 2022.

If a coincidence, pretty interesting.

capableweb · 3 years ago
In case people aren't aware, that day Russia made a major escalation of the Ukrainian invasion that began back in 2014, starting with the announcement from Putin about a "special military operation" beginning in Ukraine by Russian forces.

It's unlikely that was the day of download; it's common practice to mask last-modified/last-accessed/created-at timestamps in dumps by setting them to some significant date or just the initial Unix timestamp.
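
To illustrate the kind of timestamp masking described above, here is a minimal Python sketch that packs files into a `.tar.bz2` with every member's modification time forced to one fixed value. It says nothing about how the leaked archives were actually produced; the date constant and the helper names (`normalize_mtime`, `pack_with_fixed_mtime`, `member_mtimes`) are made up for the example.

```python
from __future__ import annotations

import io
import tarfile
from datetime import datetime, timezone

# Assumed value for illustration: any fixed date works the same way,
# including 0 (the Unix epoch).
FIXED_MTIME = int(datetime(2022, 2, 24, tzinfo=timezone.utc).timestamp())


def normalize_mtime(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Overwrite the stored modification time of one archive member."""
    info.mtime = FIXED_MTIME
    return info


def pack_with_fixed_mtime(files: dict[str, bytes]) -> bytes:
    """Build an in-memory .tar.bz2 whose members all carry FIXED_MTIME."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(normalize_mtime(info), io.BytesIO(data))
    return buf.getvalue()


def member_mtimes(archive: bytes) -> set[int]:
    """Collect the distinct modification times stored in an archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:bz2") as tar:
        return {int(member.mtime) for member in tar.getmembers()}
```

An archive built this way reports a single date for every file, so a uniform date in a dump tells you little about when the data was actually copied.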

Zvez · 3 years ago
Not a coincidence. Yandex has its own VCS (called Arcadia). But not all services used it; some used GitHub, public and private. After the war started, they had to migrate everything to the internal VCS for obvious reasons.

So it makes sense they stopped committing to other repos somewhat around that date.

I don't have any inside knowledge now, but my guess would be that the leak is from 'on-prem' github.

FelixDeSouza · 3 years ago
It's not coincidence :)
imglorp · 3 years ago
Yandex reverse image search is very good. I use it more than tineye, bing, or goog. It either gives you the exact matches if it can find them, or else it can infer what is desired and show many similar matches.
rcfox · 3 years ago
I wonder if that's a function of their technology or their database. Google's image search used to be good, but regulations have forced them to cripple it in some ways, and they might have chosen not to surface results in some contexts.
ogurechny · 3 years ago
I don't seem to recall any big story about regulation at the time. It seemed that they carelessly had their internal facial recognition tools (probably also used in Picasa, etc. for grouping by person) run on all public content, and the results were fed into public image search. Then people started to notice that Google knows about ALL of their photos posted on the Web, and that probably negatively affected their opinion on the prospects of sharing all their photos and personal data with corporations.

Also, such data collection abilities are generally limited to governments, so it was clear that many of them (US first and foremost) would ask for exclusion of certain individuals, and so forth, and so on, so the public tools were crippled.

Yandex image search does have some facial recognition, but it also seems bit-starved and/or mixed with text search (there's a bigger chance to match if the name and surname is present).

Also, Google is pretty Victorian about porn these days. It's almost like it has a whitelist of “acceptable” porn sites to suit the tastes of potentially angry old ladies.

ASalazarMX · 3 years ago
Not only regulations, Chrome's builtin image search uses Google Lens to find products to sell you instead of simple reverse searches. It has become useless.
unosama · 3 years ago
It's not regulation. Google has just chosen to cripple it, the same way Twitter and Facebook mostly chose to censor certain viewpoints. All of these groups think they are doing good.
throwaway0x7E6 · 3 years ago
>but regulations have forced them to cripple it in some ways

the regulations don't apply to Microsoft apparently, because even goddamn Bing has been better at it than Google for years now.

AlexanderTheGr8 · 3 years ago
I never realized that reverse image search was so bad because of regulations. Can you give me some more context about which regulations affect google reverse image search and why?
SanjayMehta · 3 years ago
Yandex regular search is very good too, especially at surfacing results which appear to be censored by Google. The only problem is the interface language keeps flipping back to Russian.
kilroy123 · 3 years ago
I agree. I use it when I'm searching for "iso" torrents.
account42 · 3 years ago
Is there a meta search engine that highlights results from Yandex that are missing from or ranked much lower on Google?

spaceman_2020 · 3 years ago
Yandex translation and maps were also substantially better than Google in Russia when I traveled there in 2019
Finesse · 3 years ago
They are better not only in Russia. Many Google services feel outdated, sloppy and overcomplicated after using Yandex for a while.
actualwitch · 3 years ago
It is not hard being better at translation than google at this point. Probably the best option right now is deepl.
wolfgx · 3 years ago
I like that I can just drag & drop or paste any image into the Yandex Translate website and it translates anything. It also has better language detection than Google Translate, so you don't have to waste time looking for the language you want to translate from. So far I haven't found any alternative, other than the Bing Translate desktop app...
unkulunkulu · 3 years ago
Our cat once demolished a nice chair in a rental apartment. The chair belonged to the owner. At first we could not find where to get a replacement, but yandex reverse image search helped to my surprise!
bakugo · 3 years ago
It's also unexpectedly good at finding the original full images from cropped versions.
Strom · 3 years ago
Not just cropped, the perspective and colors can be wrong. That is, you can take a photo of a poster on a wall from an angle, and it will give you the original image. It's impressive.
vintermann · 3 years ago
I think they also made it dumber after Bellingcat used it to identify a secret service officer. Interesting to see if we can find any trace of that in the git history.
computerfriend · 3 years ago
Allegedly it's snapshot data, not the actual Git repositories.
ncpa-cpl · 3 years ago
Their Street View product is also very good, however their coverage is mostly for ex Soviet Union countries
Scoundreller · 3 years ago
heh, I remember looking up some satellite maps of French prisons on Google maps. They were blurred!!!!

https://www.google.ca/maps/place/Centre+P%C3%A9nitentiaire+d...

A quick trip over to Yandex, and there they were in their full glory:

https://yandex.com/maps/10502/paris/?l=sat&ll=2.340173%2C48....

holografix · 3 years ago
Had no idea this existed. Fascinating
comfypotato · 3 years ago
My understanding is that it has reverse facial recognition as well. Very cool. Also, it’s an interesting foray into the ethics of AI in that this could easily be used for nefarious reasons. (You can snap a photo of anyone who’s been indexed and find them online very easily.)
wolfgx · 3 years ago
Agree. Here are two comparisons of some searches - [https://i.imgur.com/RmMWSG4.png] and [https://i.imgur.com/sIOhx6y.png] In 2022 even Bing beats Google..
SV_BubbleTime · 3 years ago
Hmm, didn't know they had one, but I'm let down by Google's 90% of the time so I'll try it next time.
NicoJuicy · 3 years ago
I actually doubt that a lot of that magic will be shown, as models ( because of ml ops) won't be in code.
29athrowaway · 3 years ago
It also sucks for privacy.
fIREpOK · 3 years ago
It's good for privacy if you don't live in Russia... I don't think that they share the data with the US government?
digianarchist · 3 years ago
metadat · 3 years ago
It wasn't google-able per the OP's instructions, but this did the trick. Thanks digianarchist!

Archive links:

https://archive.today/h5XJs

https://web.archive.org/web/20230125224316/https://breached....

wolfgx · 3 years ago
It can be found on Yandex too!

https://i.imgur.com/rxYINhF.png

guilhas · 3 years ago
How is this site allowed to be live?

The Yandex source is cool. But there are a lot of leaks there with people's private data.

The US authorities move mountains for TornadoCash, Z-Library, etc... why leave this one?

dewey · 3 years ago
"Private people" don't have a powerful lobby group.
d0mine · 3 years ago
"Since this is leak only contain contents of git repositories there is no personal data." https://arseniyshestakov.com/2023/01/26/yandex-services-sour...
stonepresto · 3 years ago
This is basically a clone of RaidForums, which was taken down.

https://raidforums.com/

It's a cat-and-mouse game that will likely never end.

aaaaaA4 · 3 years ago
>The US authorities move mountains for TornadoCash, Z-Library, etc... why leave this one?

Nobody is moving mountains for those, what makes you think that?

yandythrowaway · 3 years ago
Anyone else's download stuck? I haven't used torrents in forever, not sure if this is normal or if there's a workaround.
SXX · 3 years ago
Most likely your ISP throttles torrent bandwidth. Use a VPN.
thanatos519 · 3 years ago
Mine is stuck and my ISP has never throttled anything.
rmac · 3 years ago

  % bzgrep "BEGIN PRIVATE KEY" *.bz2
  disk.tar.bz2:Binary file (standard input) matches
  drive.tar.bz2:Binary file (standard input) matches
  extsearch.tar.bz2:Binary file (standard input) matches
  ...
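
The output above only says that each archive matches somewhere, not which file inside it. A rough Python sketch of how one might list the matching member files follows; the function name `members_with_marker` is invented for this example, and it reads each member fully into memory, so it is only meant to show the idea.

```python
import io
import tarfile

# Same literal string as in the bzgrep command above.
MARKER = b"BEGIN PRIVATE KEY"


def members_with_marker(fileobj, marker=MARKER):
    """Return names of regular files in a .tar.bz2 whose bytes contain marker.

    fileobj is a binary file object positioned at the start of the archive.
    """
    hits = []
    with tarfile.open(fileobj=fileobj, mode="r:bz2") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, devices
            extracted = tar.extractfile(member)
            if extracted is not None and marker in extracted.read():
                hits.append(member.name)
    return hits
```

Usage would look like `with open("disk.tar.bz2", "rb") as f: print(members_with_marker(f))`. Note that this literal string does not match PEM headers such as `BEGIN RSA PRIVATE KEY`, and a hit by itself does not tell you whether the key is a real secret or a test fixture.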

gnulinux · 3 years ago
People check in fake private keys to git repos all the time for testing. My own tests have private keys too. They're just sample, unused, publicly advertised private keys I found online. They're useful to make sure your code is working end to end with some private key.

EDIT: For example, here: https://ospkibook.sourceforge.net/docs/OSPKI-2.4.7/OSPKI-htm...

or here: https://docs.vmware.com/en/VMware-NSX-Data-Center-for-vSpher...

or: https://www.ietf.org/archive/id/draft-bre-openpgp-samples-01...

jesprenj · 3 years ago
It could also be a script that imports a private key and searches for the string BEGIN PRIVATE KEY.

Likewise if someone searched HN for this string he'd find your comment (:

corytheboyd · 3 years ago
Checked in private keys are fine if they're just used in tests, local development, etc.
inhumantsar · 3 years ago
Technically fine yes but from a habits and practice standpoint it's safest to stick to a "not ever" rule and work around the limitations.
pshirshov · 3 years ago
Nothing surprising. The development culture was shit back there.

Though I would expect these keys to be just some stub config values which allowed engineers to quickly run the shit locally.

coolspot · 3 years ago
Notably Yandex.Search, Yandex.Maps, Yandex.Disk (like Google Drive), Yandex.Mail, Yandex.Alice (voice assistant), Yandex.Taxi (like Uber), Yandex.Delivery (like Postmates), Yandex.Drive (like ZipCar), Yandex.Robot (idk, prob advanced AGI from the future), Yandex.Pay (like Google Pay), Yandex.Docs (like Google Docs/Sheets) all source codes leaked.
llIIllIIllIIl · 3 years ago
There you go, really working and scalable solution for running Mail, Storage, Voice assistant and Documents on your infrastructure and not depend on 3rd parties. End of Joke.
exebook · 3 years ago
From quick googling(yandexing?) Yandex.Robot appears to be a self driving briefcase.
mickelsen · 3 years ago
Now let's ping r/selfhosted to get this working locally and we are all set.
LudwigNagasena · 3 years ago
Yandex.Robot is probably a food delivery robot.
l11r · 3 years ago
Probably it's just a search bot like Googlebot.
tpmx · 3 years ago
I was super impressed when I interacted with Yandex ML engineering people like a decade ago on a conf call - they were trying to sell us their services. Very smart and very clearly outward-looking. Haven't followed the company in detail since. Wonder what happened to them. [removed inaccurate details]
type0 · 3 years ago
> One wonders if this is revenge from the regime.

It's not revenge by the regime. A decade ago it was a different company; now they are all completely under the cap of the FSB, it's basically a Lubyanka branch. It's serfdom all over again, so it's not that strange it got leaked by some unhappy employee.

tpmx · 3 years ago
Reading up on the current situation with the company - yeah, I think you're right. :/
mahoro · 3 years ago
When you're judging them, think about the space of possibilities as well. Is any example of a Russian business of such volume not tied to the government?
ergonaught · 3 years ago
A lot of really great technology and a lot of really great people.

There are shitheads everywhere.

throwaway290 · 3 years ago
Is this well known? Seems likely the leak was done by an anti-Putin employee or ex employee in revenge/protest against the war if yes.
throwaway0766 · 3 years ago
Yandex was probably one of the best and most innovative Russian companies, and attracted the best workforce. But it cooperated with Putin's regime long before the war. It censored opposition narratives in its search results and promoted pro-Putin content [1]. It continues to promote Russian propaganda even today (try any Ukraine-related search query in Russian). Now it's under the oversight of one of Putin's men, Aleksei Kudrin [2].

Sadly, Yandex is not a neutral company and is just another weapon in Putin's hands.

[1]: https://misinforeview.hks.harvard.edu/article/a-story-of-non...

[2]: https://t.me/AlekseiKudrin/48

varispeed · 3 years ago
Don't assume that just because people are highly educated and intelligent, they will be on the "good" side.
aizen89 · 3 years ago
How do you think a company that has all its business in Russia should act?
SeanAnderson · 3 years ago
What are the short-term and long-term implications of this?

I assume a drastically increased attack surface and potentially a boon for open-source development? Anything else?

tazjin · 3 years ago
Yandex might be the most popular search engine in Russia, but also they're by far the most popular logistics company for private people here. They do deliveries (food, groceries, other goods), taxis, moving vans and so on - all strongly Yandex branded. If you walk around the streets of Russia's big cities, their brand is absolutely omnipresent (and also in some former Soviet countries, like Armenia).

Eating into their business would require much more than source code, but of course an analysis of the code could lead to finding more security issues.

> potentially a boon for open-source development?

That'd be an absolute copyright/licensing nightmare, just because the code was stolen and published doesn't mean it is now "open-source".

teetertater · 3 years ago
In Russia the copyright rules are more relaxed though. As long as you stay small nobody will enforce it
A4ET8a8uTh0 · 3 years ago
Part of me is seeing someone with GPT prompt asking for a 'rewrite'. We truly live in interesting times.

For the record, I don't disagree with you on the licensing/copyright front.

pshirshov · 3 years ago
In fact the damage would be mostly reputational.

I expect the code to be mostly worthless. There is just too much of it, it's poorly documented and, often, just badly designed and badly written.

And the actual important data (index shards, voice models, all that crap) is not in these dumps.

capableweb · 3 years ago
Probably Yandex will start watching their infrastructure very closely, and go through all known attack vectors that haven't been prioritized before to fix them ASAP.

Won't be a boon for OSS, any author would be idiotic to read stolen source code and then decide to create a OSS library/project based on what they learn from it.

fencepost · 3 years ago
> any author would be idiotic to read stolen source code and then decide to create a OSS library/project based on what they learn from it.

Ah, but will it make its way into Copilot? That could be interesting.

azangru · 3 years ago
> any author would be idiotic to read stolen source code and then decide to create a OSS library/project based on what they learn from it.

What? Why? Isn't this what software developers do — they read a lot of code; they find ideas they like; they mix them together with their own ideas while building something. Isn't this how learning works in general?

xwdv · 3 years ago
> Won't be a boon for OSS, any author would be idiotic to read stolen source code and then decide to create a OSS library/project based on what they learn from it.

This is naive. This generation seems very sensitive to the prospect of computer crime.

The stolen source code will almost certainly be read, and if deemed novel enough will be turned into open source projects. It may be tough to figure out those projects are derivatives of stolen code, but most likely they will be passed around in black market repos.

I looked through some of my telegram channels to see if anything has been posted yet. Lo and behold, the stolen files are in fact available… from a server in Ukraine.

EMIRELADERO · 3 years ago
> Won't be a boon for OSS, any author would be idiotic to read stolen source code and then decide to create a OSS library/project based on what they learn from it.

With the current geopolitical situation going on, is this really true? (From a western developer's perspective)

from · 3 years ago
They are still the most popular search engine in Russia, they have the best brand, etc. Intangible stuff like that is hard to copy. Running a search engine on the scale of Yandex is very expensive so I don't think that they are going to be replaced by some startup that copies their code and adds a few features. Probably a bunch of bugs will be found but that's not the end of the world. Apparently they handle images pretty well so parts of that may be (illegally) copied in some places.
SOLAR_FIELDS · 3 years ago
Knowing how stuff like anti-spam algorithms and ranking algorithms work in order to abuse them is probably the much higher value here.
geraldwhen · 3 years ago
I use Yandex when DDG and bing censor real results.
Aleksdev · 3 years ago
I can still see many companies trying. If the opportunity is there to make money off of this leak then someone will attempt it.
rcfox · 3 years ago
Using stolen code in an open-source project seems like a bad idea.
EMIRELADERO · 3 years ago
Someone could still get the uncopyrightable ideas out of it though.
SXX · 3 years ago
Then we can train FancyCodeGeneratorModel on it to bypass this unfortunate licensing problem.
metadat · 3 years ago
Does the USA care to enforce Russian copyright law anymore? For the near to medium future it seems unlikely to be a priority.

Still not a solid foundation to start with, though.

Solution: Take kernel of the idea and implement it yourself.

winddude · 3 years ago
All of those, assuming there's nothing malicious in the code. Google could check whether any of their code had been stolen, not that they can do much about it right now.

quazar · 3 years ago
magnet:?xt=urn:btih:7e0ac90b489baee8a823381792ec67d465488fef&dn=yandexarc&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969&tr=udp%3A%2F%2Fbt1.archive.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fbt2.archive.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce