I worked on the original (long since superseded) implementation of the metadata store for Google Drive, i.e. the system which was responsible for tracking file / folder relationships. The requirement to allow an item to appear in multiple locations was a huge complication, in part because of the way it interacted with permissions being inherited from a folder to the items in that folder. I imagine this change may be motivated by a desire to move away from that complex model, and whatever team owns that system now may be very happy to see it going away.
(IIRC, the requirement stemmed from the need to support the various applications that were being folded into / integrated with Google Drive, such as Photos which of course allows a photo to appear in multiple albums.)
This was my understanding as well. The original Drive was built effectively as a directed graph (with cycles allowed). Any file or folder could be stored in multiple locations. Permissions were set on a per-file basis, so two people viewing the same folder might see different sets of files.
And permissions were definitely a hard part of it, because if you applied new permissions to a folder and all its children, the system had to walk the entire graph to update them.
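To make the graph walk concrete, here is a minimal sketch of what "apply to a folder and all children" implies once items can have several parents and cycles are allowed. This is not Drive's actual implementation; the `children` mapping, the `acl` dictionary, and the function name are all hypothetical, made up for illustration.

```python
from collections import deque

def propagate_permission(children, start, user, acl):
    """Grant `user` access to `start` and everything reachable from it.

    `children` maps an item id to the ids directly inside it. Because an
    item may have several parents (and cycles are possible), a visited set
    is needed so each item is processed once and the walk terminates.
    """
    visited = {start}
    queue = deque([start])
    while queue:
        item = queue.popleft()
        acl.setdefault(item, set()).add(user)
        for child in children.get(item, ()):
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return visited

# Hypothetical layout: "report" sits in both "team" and "archive",
# and "archive" links back to "root", forming a cycle.
children = {
    "root": ["team", "archive"],
    "team": ["report"],
    "archive": ["report", "root"],
}
acl = {}
touched = propagate_permission(children, "team", "alice", acl)
```

Note the side effect: after sharing "team", alice can also see "report" when it shows up inside "archive", even though she has no access to "archive" itself. That is the "different people see different sets of files in the same folder" situation described above.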
This is the advantage of the Team Drive style structure that the Drive team put out. It follows the classic filesystem design of a tree, which allows for easier permissions modeling, among other things. It's also why all "hard links" are now becoming shortcuts / soft links.
Forgive my ignorance (and there's a lot of it), but navigating a graph and setting permissions doesn't seem like a terribly difficult problem, especially for Google - the king of graphs, in a word. I would think maybe the issue has to do with how permissions of a file/folder apply to different users, but then isn't that just exactly the raison d'etre of permission systems?
This isn't an area I'm qualified to have a technical opinion on but if you'd care to elaborate I'd be interested to learn more.
Can you explain why team drives (now called shared drives) have a limit of 400,000 files, while files in "My Drive" have no such limit and can be shared without restriction?
I worked on another sync client's representation of filesystem structure, and came to the same conclusion. Hard links enable some cool behavior, but in retrospect added more complexity than anyone expected. Migrating to shortcuts / soft links seems very reasonable - I wish I had started there.
> various applications that were being folded into / integrated with Google Drive
The Photos/Drive integration was removed a long time ago. What other integrations were behind the original requirement? I'm curious to know if the extra complication was worth it in the long run and how long the integrations that needed this feature hung around for.
I don't specifically remember whether there were other motivations for that requirement. Possibly a desire to support "tag" style interfaces, where an item can have multiple tags applied. But it may have mostly been the Photos integration.
And no, definitely the complication was not worth it in the long run. :-) That project got way more complicated than anyone wanted (not that there's anything unusual about that).
Shouldn't de-duping be an implementation detail of the virtual file system, completely hidden from the user?
Isn't that how file sharing/syncing services have worked since the MegaUpload days?
If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.
> If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.
If that's what you were doing, you won't get affected by this. This is about files/folders that were "hardlinked", which was difficult to do by accident. I think you had to hold Ctrl while dragging the file into another folder, or something like that. (The key to notice is that they're talking about one file being in multiple directories, not multiple files with identical contents.)
> If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.
That's precisely the behavior you'll get. You were allowed in the previous implementation to upload a file to one location and then put it in 2 locations, such that you would have changes in either location reflected the same way. This wasn't 'copying' a file, it was multiparenting it.
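A tiny sketch of the difference, using plain Python objects rather than anything Drive-specific (the `Item` class and the folder lists are hypothetical stand-ins): multiparenting puts the same object in two folders, while copying creates an independent one.

```python
import copy

class Item:
    """Hypothetical stand-in for a stored file."""
    def __init__(self, name, content):
        self.name = name
        self.content = content

folder_a, folder_b, folder_c = [], [], []

doc = Item("notes.txt", "v1")
folder_a.append(doc)              # original location
folder_b.append(doc)              # multiparented: the very same item
folder_c.append(copy.copy(doc))   # copied: a separate item with the same content

# Edit the file through folder_a.
folder_a[0].content = "v2"
```

After the edit, `folder_b` shows the new content because it holds the same item, while the copy in `folder_c` is untouched. Only the first situation is affected by the announced change.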
This is a concern of presentation, not the backing implementation: How to present (approximately) hardlinked files to users, with the possible complications of varying support for hard and soft links or other mechanisms across the various client implementations of Google Drive.
There are advantages and disadvantages to using either.
I see. I don't use Drive via the web/app interfaces much but essentially, hard links are a thing already and users must've inadvertently created them anyway when doing actions via web/app.
Any idea if these hard links can be created with Drive clients on Mac/Win?
Precisely. I can understand if they want to change the wording on their interfaces going forward to encourage folks to use "shortcuts", but it sounds awful if they're going to do this to existing files/directories.
If someone has a Drive desktop client installed, has two source-code directories with some identical files in them, and modifies one of the identical files in just one of the directories, I can imagine they'd be very surprised when the other copy in the untouched directory also changes.
I'm on Linux where there's no official Drive client, so this won't happen to me. (I use Syncthing instead.)
What you are describing sounds like two distinct files with the same content. The change only affects the same file that has been "hard linked" into two separate folders. Copies of files are unaffected.
Was this hard-linking only possible when performing an action via the web/app interfaces? Because the wording makes it extremely confusing. Even I thought this was going to affect copies of the same file in multiple places. I don't notice any points in their support doc which tells me otherwise.
I see. I think your "hard linked" term is the difference between the "Add shortcut to Drive" and "Make a copy" options when right-clicking a file in the web UI. If this announcement affects only files created with "Add shortcut to Drive," and uploaded files that happen to have the same content as another file aren't automatically turned into shortcuts, then I'm less alarmed by the change.
Google Drive changes causing issues like this are why I moved to Syncthing. In Google Drive, every so often I would have to de-duplicate a bunch of files with (1) appended to their names.
I think this is a good move; the cloning UX was a nightmare. I've moved many shared files to Team Drives because the language is easier for most to understand.
I imagine this was a tough call for a PM, with a lot of cases to consider and account for given this is so embedded in the Drive product DNA.
"The process will replace all but one location of files and folders that are currently in multiple locations. The files and folders will be replaced with shortcuts."
Is there any way for the user to specify that they want a full copy of a file?
What happens if another user makes a copy of the file and alters it? Are both copies changed?
"The replacement decision will be based on original file and folder ownership, and will also consider access and activity on all other folders to ensure the least possible disruption for collaboration."
"You can’t opt-out of the replacement."
This might be a deal-breaker for some users. Why not just ask the user if they want a replacement versus a full copy?
> This might be a deal-breaker for some users. Why not just ask the user if they want a replacement versus a full copy?
A shortcut preserves semantics: working on the original file or working on a shortcut to the original file will both modify the same document. A full copy (creating a new document with the same contents as the original at a point in time) would introduce new semantics.
It would also have quota implications - if I had a 5GB file multiparented in 3 locations, I wouldn't want to suddenly be over quota because Google decided I really wanted it copied to 3 locations.
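The arithmetic, as a rough sketch. It assumes a shortcut takes essentially no quota and a full copy is charged at the full file size; both are assumptions for illustration, and the function name and `strategy` values are made up.

```python
def quota_used_gb(file_size_gb, locations, strategy):
    """Storage charged for one file that appears in `locations` folders.

    Assumption: shortcuts cost nothing, full copies cost the whole file.
    """
    if strategy == "shortcut":
        return file_size_gb               # one real file, the rest are pointers
    if strategy == "copy":
        return file_size_gb * locations   # every location holds its own copy
    raise ValueError(f"unknown strategy: {strategy}")

with_shortcuts = quota_used_gb(5, 3, "shortcut")
with_copies = quota_used_gb(5, 3, "copy")
```

Under those assumptions, the 5GB file in 3 locations stays at 5GB with shortcuts but would be charged as 15GB if it were silently converted to copies.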
One folder contains the original file, and one folder contains a shortcut to the file. There's no concept of a file existing in multiple folders with this change, only 1 source file and shortcuts linking to it. If you delete the original file, I assume the shortcuts are deleted too.
I couldn't figure this out from reading the article but perhaps someone here knows. Say I want to create a new edited version of a document without changing the original, so I first duplicate that doc and then edit the new copy of it. Does this new Drive behavior mean that the original document I copied from will be changed as well?
Google have been working up to this change for quite a while now. Rclone supports shortcuts from version v1.54.0 released on 2021-02-02. I've been impressed with the communication from Google and the care they've taken to keep things working in what must be a difficult transition.
I hope that this change can finally unlock the API to be able to return all the children of a given node recursively. Multiple parents make this much harder.
The Drive API doesn't have that at the moment and it makes traversing deep directory trees really painful.
An API search term to find the objects which have a given ID as an ancestor at any depth would be fantastic.
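For anyone wondering why it's painful, here is a rough sketch of what a client has to do today: list the children of one folder at a time and keep going breadth-first. The `list_children` callable is a hypothetical stand-in for "one remote listing request per folder" (not a real client library function), and the in-memory `TREE` exists only so the sketch runs on its own.

```python
from collections import deque

def list_descendants(root_id, list_children):
    """Collect every item below `root_id`, one listing request per folder.

    `list_children(parent_id)` must return (child_id, is_folder) pairs.
    The `seen` set keeps an item with several parents from being
    reported (or descended into) more than once.
    """
    seen = set()
    order = []
    requests = 0
    queue = deque([root_id])
    while queue:
        parent = queue.popleft()
        requests += 1                      # one round trip per folder visited
        for child_id, is_folder in list_children(parent):
            if child_id in seen:
                continue
            seen.add(child_id)
            order.append(child_id)
            if is_folder:
                queue.append(child_id)
    return order, requests

# Hypothetical in-memory stand-in for the remote listing call.
TREE = {
    "root": [("docs", True), ("a.txt", False)],
    "docs": [("b.txt", False), ("old", True)],
    "old": [("c.txt", False)],
}

def fake_list_children(parent_id):
    return TREE.get(parent_id, [])

ids, requests = list_descendants("root", fake_list_children)
```

The request count grows with the number of folders, which is what makes deep trees slow. A server-side "ancestor at any depth" query would collapse this into a single search.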
I'm on Linux with Syncdocs for syncing Google Drive so will wait and see how it handles things.
Most people I know prefer symlinks for most uses, so this feels like better UX.