... most do neither, but rather do ${complex thing emerging from combination of implementation details of runtime and backup tool, impossible to reproduce in any other runtime, likely platform- and environment-dependent; the same backup likely restores in different ways on different machines, and the same source files create different backups on different machines; creating a backup on one machine and restoring it on another does not generally result in the same files; and I have not yet mentioned what might happen if you mount the same source file system from different platforms, because results might vary a lot; also, we are only talking about paths here, not any of the other plethora of things that can and will be different between any element in OSxFSxEnv}.
Sure it can. In this case, I'd say treating filenames as bags of bytes is the correct way to go, as that's the way the OS treats them. Translating filenames between character sets should not be part of a backup system's job.
There are valid setups where different software on the same machine might be running with different character sets for legacy reasons. In that case there is no correct way to handle the filenames as text. But treating them as bags of bytes will always work consistently.
Also, the one purpose of a backup system is to back up the files on the filesystem. If it can't back up some files that the OS considers valid, it's the backup software that failed.
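To illustrate the bag-of-bytes approach, here is a minimal Python sketch, assuming a POSIX system where names are arbitrary byte strings. The helper name `list_raw_names` is made up for illustration; the only real API relied on is that `os.listdir` returns `bytes` names when it is given a `bytes` path, so no character set is ever assumed.

```python
import os


def list_raw_names(directory):
    """Return the names in `directory` as undecoded bytes (hypothetical helper)."""
    # os.fsencode turns a str path into bytes; a bytes path makes
    # os.listdir return bytes names, so nothing is decoded or translated.
    return sorted(os.listdir(os.fsencode(directory)))
```

Printing `repr(name)` for each returned name shows the exact bytes, so two names that render identically on screen remain distinguishable.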
The file names look OK after the copy on the Linux machine. However, when exporting the directory through Samba, the Mac's Finder doesn't display files with accents in their names (though they appear correctly with "ls", weird...).
So the user copies the files again, using the Finder. Now I have files with exactly the same name (uhhhhh???):
# ls -l Mmo-1.
-rw-rw-rw- 1 root root 8417218 6 sept. 2013 Mémo-1.aif
-rwxr--r-- 1 test test 8417218 6 sept. 2013 Mémo-1.aif
-rw-rw-rw- 1 root root  363175 6 sept. 2013 Mémo-1.m4a
-rwxr--r-- 1 test test  363175 6 sept. 2013 Mémo-1.m4a
Yes, it looks like two files have exactly the same name, but actually they're different: one has "é" stored as a plain "e" followed by a combining acute accent (bytes 0x65 0xCC81, the decomposed form), and the other one (the "good one") as the single precomposed character 0xC3A9. Why is that? Why does one work with the Finder, and the other doesn't? Who knows.
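The two byte sequences can be reproduced with Python's standard `unicodedata` module. A small sketch (the helper name `utf8_hex` is made up for illustration):

```python
import unicodedata


def utf8_hex(text, form):
    """Normalize `text` to `form`, then return its UTF-8 bytes as hex."""
    return unicodedata.normalize(form, text).encode("utf-8").hex()


composed = utf8_hex("\u00e9", "NFC")    # 'c3a9': one precomposed code point
decomposed = utf8_hex("\u00e9", "NFD")  # '65cc81': 'e' plus combining accent
```

Both strings display as "é", which is why the directory listing appears to contain duplicate names.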
Renaming the files to use NFKC normalization fixed it. In Python, you could loop through the files and do something like:
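A minimal sketch of such a loop, assuming the names are valid UTF-8 and the script runs on a file system that does not normalize names itself. The function name `normalize_names` and its `dry_run` parameter are hypothetical. The default form is NFKC as mentioned above; note that NFKC also folds compatibility characters (for example, ligatures), so passing `form="NFC"` is the more conservative choice if only the accents need fixing.

```python
import os
import unicodedata


def normalize_names(directory, form="NFKC", dry_run=True):
    """Rename entries of `directory` to Unicode normalization `form`.

    Returns (renamed, skipped): lists of (old_name, new_name) pairs.
    With dry_run=True nothing is renamed.
    """
    renamed, skipped = [], []
    for name in os.listdir(directory):
        new_name = unicodedata.normalize(form, name)
        if new_name == name:
            continue  # already in the requested form
        target = os.path.join(directory, new_name)
        if os.path.lexists(target):
            # A normalized twin already exists (as in the listing above);
            # never overwrite it, leave the decision to a human.
            skipped.append((name, new_name))
            continue
        if not dry_run:
            os.rename(os.path.join(directory, name), target)
        renamed.append((name, new_name))
    return renamed, skipped
```

Calling `normalize_names("/path/to/share")` first and checking both lists before rerunning with `dry_run=False` avoids surprises; the path here is only a placeholder.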
EDIT: You'll probably need to do this on a non-Mac system; Linux, for example, should work.