The way Gluster FS handles renaming a file

Imagine that you have a user on a distributed file system who loves renaming files. Let's say he has never heard of version control systems and is using renames as a way to version his work. Or maybe some piece of software does it to one of its "dot" files and nobody has noticed. Imagine also that the file is 5 GB in size. How do you think a distributed file system should handle this?

Gluster FS uses a hash of the file name to determine which server (data node) should store the file. Once the file is renamed, it may belong on a different node, so the file system would need to copy it over. Then the file is renamed again, and it is moved again. Now every time the user issues a harmless-looking "mv" command, 5 GB of data goes flying over the wire. That would be terrible.
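To make the problem concrete, here is a rough sketch of name-based placement and what a naive rename would cost. It is not Gluster's code: the node names are made up, and md5 modulo the node count is only a stand-in for whatever hash and range layout the real DHT uses.

```python
import hashlib

# Hypothetical node names, purely for illustration.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def node_for(name: str) -> str:
    """Pick a node from the file name alone (md5 is a stand-in hash)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]


def naive_rename_cost(old: str, new: str, size_bytes: int) -> int:
    """Bytes a naive system would copy if the new name hashes to another node."""
    return 0 if node_for(old) == node_for(new) else size_bytes


if __name__ == "__main__":
    size = 5 * 1024**3  # the 5 GB file from the story
    names = [f"thesis-v{i}.doc" for i in range(1, 6)]
    for old, new in zip(names, names[1:]):
        cost = naive_rename_cost(old, new, size)
        print(f"mv {old} {new}: {node_for(old)} -> {node_for(new)}, copies {cost} bytes")
```

The point of the sketch is only that placement depends on nothing but the name, so a rename can change where the file "should" live even though its contents did not change.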

But fortunately Gluster FS is smarter than this. Instead of copying the file over, it leaves the data where it was and creates a special "placeholder" file on the new server that records where the file actually is. Now the harmless-looking mv command only creates one small file and is indeed quite harmless.
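Here is a toy model of the placeholder idea, just to show the bookkeeping. Everything in it is my own invention for illustration: the `ToyCluster` class, the dict-per-node storage, and the md5 stand-in hash are assumptions, not how Gluster is actually implemented.

```python
import hashlib


class ToyCluster:
    """Toy model: each node maps name -> ("data", bytes) or ("link", node_name)."""

    def __init__(self, node_names):
        self._order = list(node_names)
        self.nodes = {n: {} for n in self._order}

    def hashed_node(self, name):
        """Node where lookups for `name` start (md5 is a stand-in hash)."""
        digest = hashlib.md5(name.encode("utf-8")).digest()
        return self._order[int.from_bytes(digest[:4], "big") % len(self._order)]

    def create(self, name, data):
        self.nodes[self.hashed_node(name)][name] = ("data", data)

    def locate(self, name):
        """Return the node that actually holds the data for `name`."""
        start = self.hashed_node(name)
        kind, value = self.nodes[start][name]
        return value if kind == "link" else start

    def read(self, name):
        _kind, data = self.nodes[self.locate(name)][name]
        return data

    def rename(self, old, new):
        """Rename without moving the data between nodes."""
        data_node = self.locate(old)
        old_start = self.hashed_node(old)
        new_start = self.hashed_node(new)
        # Rename the data in place on the node that already holds it.
        self.nodes[data_node][new] = self.nodes[data_node].pop(old)
        # Remove the stale placeholder for the old name, if one existed.
        if old_start != data_node:
            self.nodes[old_start].pop(old, None)
        # If lookups for the new name start elsewhere, leave a placeholder there.
        if new_start != data_node:
            self.nodes[new_start][new] = ("link", data_node)


if __name__ == "__main__":
    cluster = ToyCluster(["node-a", "node-b", "node-c"])
    cluster.create("thesis-v1.doc", b"pretend this is 5 GB")
    cluster.rename("thesis-v1.doc", "thesis-v2.doc")
    print("data lives on:", cluster.locate("thesis-v2.doc"))
    print("lookups start on:", cluster.hashed_node("thesis-v2.doc"))
```

However many times the file is renamed in this model, the payload never leaves its original node. The only thing that changes hands is a tiny pointer entry, at the price of one extra hop when a lookup lands on the placeholder first.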

I think it is brilliant.

If you want to know more about Gluster FS DHT internals, read this: http://joejulian.name/blog/dht-misses-are-expensive/. If you are using Gluster, you should read the whole blog. But I think you already know about its existence.

Lifeboat blog

A random blog about software
