throwaway7767 commented on The hell that is filename encoding (2016)   beets.io/blog/paths.html... · Posted by u/kristjansson
wazoox · 7 years ago
Ah yes, had the same fun problem at a customer's facility last week. Moving 350 TB of data from an old DDP storage server to a Linux one. Mounting with CIFS (no other option available), and copying using "cp -a".

The file names look OK after the copy on the Linux machine. However, when exporting the directory through Samba, the Macs' Finder doesn't display files with accents in the names (though they appear correctly with "ls", weird...).

So the user copies the files again, using the Finder. Now I have files with exactly the same name (uhhhhh???):

  # ls -l Mmo-1.
  -rw-rw-rw- 1 root root 8417218 6 sept. 2013 Mémo-1.aif
  -rwxr--r-- 1 test test 8417218 6 sept. 2013 Mémo-1.aif
  -rw-rw-rw- 1 root root  363175 6 sept. 2013 Mémo-1.m4a
  -rwxr--r-- 1 test test  363175 6 sept. 2013 Mémo-1.m4a

Yes, it looks like two files have exactly the same name, but actually they're different: one has "é" encoded as 0xCC81, and the other one (the "good one") as 0xC3A9. Why is that? Why does one work with the Finder, and the other doesn't? Who knows.
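
A minimal sketch of what those two byte sequences correspond to, using only Python's standard library. The two string literals are reconstructions of the names in the listing, not copied from the actual files, and it is an assumption that the 0xCC81 bytes follow a plain "e" (0xCC81 is the UTF-8 encoding of U+0301, the combining acute accent):

```python
import unicodedata

# Precomposed form: "é" is the single code point U+00E9.
composed = "M\u00e9mo-1.aif"
# Decomposed form: plain "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "Me\u0301mo-1.aif"

# Both render the same on screen but differ as byte strings.
composed_bytes = composed.encode("utf-8")      # contains C3 A9
decomposed_bytes = decomposed.encode("utf-8")  # contains 65 CC 81

print(composed_bytes.hex(" "))
print(decomposed_bytes.hex(" "))

# A byte-wise or code-point-wise comparison sees two different names...
names_equal = composed == decomposed

# ...until both are brought to the same Unicode normalization form.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
equal_after_nfd = unicodedata.normalize("NFD", composed) == decomposed
```

A filesystem that compares names byte by byte will happily store both spellings side by side, which would explain the duplicate-looking entries.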

throwaway7767 · 7 years ago
Most likely it's different normalization. I've seen this before with Mac systems.

Renaming the files to use NFKC normalization fixed it. In python, you could loop through the files and do something like:

  os.rename(originalfilename, unicodedata.normalize('NFKC', originalfilename.decode('utf8')))
EDIT: You'll probably need to do this on a non-Mac system; Linux, for example, should work.
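
The one-liner above is Python 2 style. A rough Python 3 version of the same loop might look like the sketch below. The function name `normalize_names` and its `form` parameter are made up for illustration; the default is NFKC to match the comment, though NFC is a less aggressive choice because it only recomposes characters and does not apply compatibility mappings. It assumes a filesystem that stores names byte for byte (typical on Linux) and skips any rename that would overwrite an existing file:

```python
import os
import unicodedata


def normalize_names(directory, form="NFKC"):
    """Rename entries in `directory` whose names are not already in `form`.

    Returns a list of (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    for name in os.listdir(directory):
        target = unicodedata.normalize(form, name)
        if target == name:
            continue  # already in the requested form
        src = os.path.join(directory, name)
        dst = os.path.join(directory, target)
        if os.path.lexists(dst):
            # Both spellings exist side by side; never overwrite one of them.
            print(f"skipping {name!r}: {target!r} already exists")
            continue
        os.rename(src, dst)
        renamed.append((name, target))
    return renamed
```

Trying it on a copy of the directory first is probably wise, since the renames are not undone automatically.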

throwaway7767 commented on The hell that is filename encoding (2016)   beets.io/blog/paths.html... · Posted by u/kristjansson
blattimwind · 7 years ago
I don't think any backup software actually can do the right thing(tm). Some might preserve (or attempt to, anyway) the binary representation, others attempt to preserve the Unicode codepoint-space representation...

... most do neither, but rather do ${complex thing emerging from combination of implementation details of runtime and backup tool, impossible to reproduce in any other runtime, likely platform- and environment dependent; the same backup likely restores in different ways on different machines, and the same source files create different backups on different machines; creating a backup on one machine and restoring it on another does not generally result in the same files; and I have not yet mentioned what might happen if you mount the same source file system from different platforms, because results might vary a lot; also, we are only talking about paths here, not any of the other plethora of things that can and will be different between any element in OSxFSxEnv}.

throwaway7767 · 7 years ago
> I don't think any backup software actually can do the right thing(tm).

Sure it can. In this case, I'd say treating the filename as a bag of bytes is the correct way to go, as that's the way the OS treats them. Translating filenames between character sets should not be part of a backup system's job.

There are valid setups where different software on the same machine might be running with different character sets for legacy reasons. In that case there is no correct way to handle the filenames as text. But treating it as a bag-of-bytes will always work consistently.

Also, the one purpose of a backup system is to back up the files on the filesystem. If it can't back up some files that the OS considers valid, it's the backup software that failed.
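
As a rough illustration of the bag-of-bytes approach (not how any particular backup tool works), Python's `os` functions return `bytes` names when given a `bytes` path, so a copy can be made without ever decoding a filename. `copy_tree_bytes` is a hypothetical helper name, and the sketch assumes a POSIX system:

```python
import os
import shutil


def copy_tree_bytes(src, dst):
    """Copy a directory tree, handling every name as raw bytes."""
    src = os.fsencode(src)
    dst = os.fsencode(dst)
    os.makedirs(dst, exist_ok=True)
    # A bytes path in means bytes names out: nothing is decoded or normalized.
    with os.scandir(src) as entries:
        for entry in entries:
            target = os.path.join(dst, entry.name)
            if entry.is_dir(follow_symlinks=False):
                copy_tree_bytes(entry.path, target)
            else:
                # copy2 also tries to preserve timestamps and permissions.
                shutil.copy2(entry.path, target, follow_symlinks=False)
```

Because names are never interpreted as text, files whose names are not valid in the current locale's encoding are copied like any other.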

throwaway7767 commented on The hell that is filename encoding (2016)   beets.io/blog/paths.html... · Posted by u/kristjansson
throwaway7767 · 7 years ago
IBM's backup software TSM/Spectrum Protect messes this up as well.

If the machine has a UTF-8 encoding (like, say, every modern system), it will try to treat filenames as valid UTF-8 strings and fail to back up files which don't fulfill that assumption. The "solution" is to run the TSM software with a single-byte locale like en_US.

I've seen a number of shops that were silently missing files from backup from old systems because of this problem.
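
To check whether a system has files that would trip up a tool making that assumption, something like the following sketch could help. It has nothing to do with TSM itself; `non_utf8_paths` is a hypothetical helper that just walks a tree using byte paths and reports names that do not decode as UTF-8:

```python
import os


def non_utf8_paths(root):
    """Yield paths under `root` (as bytes) whose final component is not valid UTF-8."""
    # Walking with a bytes root makes os.walk return bytes names untouched.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)
```

Printing the results with `repr()` keeps the offending bytes visible instead of mangling them in the terminal.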

throwaway7767 commented on GPG and me (2015)   moxie.org/blog/gpg-and-me... · Posted by u/tosh
throwaway7767 · 8 years ago
> When I receive a GPG encrypted email from a stranger, though, I immediately get the feeling that I don’t want to read it. Sometimes I actually contemplate creating a filter for them so that they bypass my inbox entirely, but for now I sigh, unlock my key, start reading, and with a faint glimmer of hope am typically disappointed.

I wonder what proportion of his plaintext email from strangers is interesting. For me it's close to 0%, mostly spam or people demanding I do free work to fix issues in open source code. I really doubt this has much to do with GPG mails specifically.

throwaway7767 commented on Israel Hacked Kaspersky, Then Tipped NSA Its Tools Had Been Breached   washingtonpost.com/world/... · Posted by u/tptacek
apexalpha · 8 years ago
Our company uses the enterprise version of Kaspersky. But if we drop this over surveillance issues, then it would be pretty hypocritical to switch to AV software from the USA, since they are proven to do the exact thing that Kaspersky is now suspected of doing.

So, fellow Europeans, what now? Avast? Any other options?

EDIT: Ok so I found a pretty useful Wiki list[1] with European-made AV products. I haven't used them so I can't judge their effectiveness, especially the enterprise versions. But here are some alternatives to US / RU antivirus suites.

Czech Republic: AVAST, AVG, TrustPort

Finland: F-Secure

Germany: Avira, G-Data

Iceland: FRISK (F-PROT)

Romania: Bitdefender

Slovakia: ESET

Spain: PANDA security

[1] https://en.wikipedia.org/wiki/Comparison_of_antivirus_softwa...

throwaway7767 · 8 years ago
> Iceland: FRISK (F-PROT)

FRISK was bought by Israeli company Commtouch several years ago. They wound down operations in Iceland to the point that I doubt any real technical work goes on there.

throwaway7767 commented on Show HN: Pocket Stream Archive – A personal Way-Back Machine   github.com/pirate/pocket-... · Posted by u/nikisweeting
rcarmo · 8 years ago
I've been thinking along those very same lines for a long time (this project makes me wish for more free time).

I have half a mind to fork this and add something like https://github.com/internetarchive/warcprox, or at the very least walk through the generated HTML and brute-force inline all assets as a first pass :)

throwaway7767 · 8 years ago
I've been thinking I'd love to have a WARC archive of all my browsing. So many times sites I remember seeing have gone offline, and didn't get archived by the big services. Ideally this has to happen with browser cooperation, so it can save resources from complex dynamic pages, including responses to user action.

This must happen either in the browser or in a proxy like the linked warcprox, in order to catch everything. But the proxy solution is getting less practical every day with key pinning and HSTS.

Maybe a future firefox will have an option to export everything to WARC?

throwaway7767 commented on Milk that lasts for months   bbc.com/future/story/2017... · Posted by u/akandiah
Retric · 8 years ago
Raw milk has a meaningful risk of death. Unless it's addictive levels of goodness I would avoid it.
throwaway7767 · 8 years ago
I'm sure it does, but every farmer I've ever met drank raw milk, so I'm not convinced it's so terribly unsafe.

For the record, I don't live in the US, and here they don't pump cattle full of antibacterials. I buy my raw milk at the grocery store; it's legal and vetted by the health authorities.

Also, yes, it is addictive levels of goodness. :)

throwaway7767 commented on Milk that lasts for months   bbc.com/future/story/2017... · Posted by u/akandiah
webignition · 8 years ago
Your preference might not change but your ability to accept the different taste of UHT may change.

I lived the first 25 years of my life in the UK and drank pasteurised milk almost exclusively and found the taste of UHT to be lesser.

I then lived in Poland for two years and drank UHT only.

At first I disliked the taste of UHT but over time I grew accustomed to it.

I am still able to tell the difference between pasteurised and UHT, I just no longer mind.

I find that for many things in life where there are similar but not exact variants, one tends to prefer the variant one first tried despite both variants being potentially equivalent.

Pasteurised milk vs UHT is one example. Margarine vs butter is another (I was raised on margarine and disliked butter for years). Windows vs Linux desktops (again, I was raised on the former and initially disliked the latter despite both being generally logically equivalent).

I have since learned to be more accepting of the differences in variants of food and tech. This seems to make life easier for me.

throwaway7767 · 8 years ago
> I find that for many things in life where there are similar but not exact variants, one tends to prefer the variant one first tried despite both variants being potentially equivalent.

I grew up with pasteurized milk. As an adult, I tried unpasteurized milk. The taste is different and was a bit weird at first. After a couple of glasses, I much preferred it over the kind I grew up with (it's just so much better), and it's all I drink now. Normal pasteurized milk tastes like low-fat now (i.e., it tastes like drinking water from a glass that had a little milk in it already).

So I don't think it's quite so simple that everyone always prefers what they grew up with. But you're right that we often need to get used to new kinds of food before we accept them.

throwaway7767 commented on Iranian woman visiting family on tourist visa detained in Oregon jail   theguardian.com/us-news/2... · Posted by u/nafizh
dr_ick · 8 years ago
I fail to see how the parent story is relevant to tech at all.

A person with a federal warrant was arrested entering the USA. Why is this hacker news?

If you want to discuss "what might happen" when entering a country, and the entry laws of those countries, I'm game. But again, I don't see how it is tech related.

throwaway7767 · 8 years ago
The part that you're missing is that HN is not just for tech news, and never has been.
throwaway7767 commented on Airbnb “Bribes” Host with Cash Under NDA After Partiers Destroy Apartment   observer.com/2017/03/airb... · Posted by u/moonka
rdl · 8 years ago
I don't understand how people could do that much damage and it would only come to $8k. Just the cleanup and repairs to the complex outside of his property should be nearly that much; losing his lease, other damages, and damage to his property should be a lot more than that.
throwaway7767 · 8 years ago
It's quite possible that exterior damage was covered by the insurance of the homeowners/building association. I'd love more information on this case to see if that's true.

If that's the case, it's another example of negative externalities from AirBNB being dumped on the neighbors of a host.

u/throwaway7767

Karma: 2104 · Cake day: March 20, 2014
About
Please delete this account, HN admins