Awesome Conferences

ProTip: make rsync fail more reliably

A co-worker of mine recently noticed that I tend to use rsync in a way he hadn't seen before:

rsync -avP --inplace $FILE_LIST desthost:/path/to/dest/.

Why the "slash dot" at the end of the destination?

I do this because I want predictable behavior and the best way to achieve that is to make sure the destination is a directory that already exists. I can't be assured that /path/to/dest/ exists, but I know that if it exists then "." will exist. If the destination path doesn't exist, rsync makes a guess about what I intended, and I don't write code that relies on "guesses". I would rather the script fail in a way I can detect (shell variable $?) rather than have it "guess what I meant"; which is difficult to detect.

What? rsync makes a guess? Yes. rsync changes its behavior depending on a number of factors:

  • is there one source file or multiple source files?
  • is the destination a directory, a file, or doesn't exist?

There are many permutations there. You can eliminate most of them by having a destination directory end with "slash dot".

For example:

  • Example A: rsync -avP file1 host:/tmp/file
  • Example B: rsync -avP file1 file2 host:/tmp/file

Assume that host:/tmp/file exists. In that case, Example A copies the file and renames it in the process. Example B will fail because rsync's author (and I think this is the right decision) decided that it would be stupid to copy file1 to /tmp/file and then copy file2 over it. This is the same behavior as the Unix cp command: If there are multiple files being copied then the last name on the command line has to be a directory otherwise it is an error. The behavior changes based on the destination.

Let's look at those two examples if the destination name doesn't exist:

  • Example C: rsync -avP file1 host:/tmp/santa
  • Example D: rsync -avP file1 file2 host:/tmp/santa

In these examples assume that /tmp/santa doesn't exist. Example C is similar to Example A: rsync copies the file to /tmp/santa i.e. it renames it as it copies. Example B, however, rsync will assume you want it to create the directory so that both files have some place to go. The behavior changes due to the number of source files.

Remember that debugging, by definition, is more difficult than writing code. Therefore, if you write code that relies on the maximum of your knowledge, you have, by definition, written code that is beyond your ability to debug.

Therefore, if you are a sneaky little programmer and use your expertise in the arcane semantics and heuristics of rsync, congrats. However, if one day you modify the script to copy multiple files instead of one, or if the destination directory doesn't exist (or unexpectedly does exist), you will have a hard time debugging the program.

How might a change like this happen?

  • Your source file is a variable $SOURCE_FILES and occasionally there is only one source file. Or the variable represents one file but suddenly it represents multiple.
  • The script you've been using for years gets updated to copy two files instead of one.
  • Over time the list of files that need to be copied shrinks and shrinks and suddenly is just single file that needs to be copied.
  • Your destination directory goes away. In the example that my coworker noticed, the destination was /tmp. Well, everyone knows that /tmp always exists, right? I've seen it disappear due to typos, human errors, and broken install scripts. If /tmp disappeared I would want my script to fail.

It is good rsync hygiene to end destinations with "/." if you intend it to be a directory that exists. That way it fails loudly if the destination doesn't exist since rsync doesn't create the intervening subdirectories. I do this in scripts and on the command line. It's just a good habit to get into.

Tom

P.S. One last note. Much of the semantics described about change if you add the "-R". They don't get more consistent, they just become different. If you use this option make sure you do a lot of testing to be sure you cover all these edge cases.

Posted by Tom Limoncelli in Technical Tips

No TrackBacks

TrackBack URL: https://everythingsysadmin.com/cgi-bin/mt-tb.cgi/1453

2 Comments | Leave a comment

GSCopy Pro v6.0 (RoboCopy Alternative) with Open File Agent
GSCopyPro is a single command-line tool (CLI) that can copy, replicate and move files from one folder to another. This folder can be on the same machine/ server or another server elsewhere. What makes GSCopyPro stand out from other competitors is the fact it works on 32-bit as well as 64-bit systems and has no restrictions. It can easily be scheduled to run as a scheduled task and fully automated. GSCopyPro also comes with an open file agent which can copy files that are locked/ opened by other processes. This feature is supported in all windows versions from widows XP/ 2003 and later.
Go To:>> http://www.gurusquad.com/GSCOPYPRO

I'm glad to see someone else does this too.
FYI: I also do it with "mv" as well (for the same reasons).

However, it is dangerous (as I recently discovered - after MANY years of use of a trailing slash dot).
If using shell variables, if you make a typo.

My script had something like:
LOGDIR=/some/path
REMLOGS=/another/path
REMHOST=somehostname
rsync -av $REMHOST::$REMLOGS/. $LOGDIR/.

but I fat-fingered the rsync destination variable to $LOBDIR which mostly looks the same, but when the script ran ... as root ...
I'm sure you can determine the outcome.

I'd preferred to have run this not as root, but there were preserving owner/group requirements, so I moved the /. into the shell variable, triple checked my script, and as this was on a VM did a snapshot of the host before I retried my test.

If I had not used the slash dot, disaster would not have occurred, but overall I still believe it's better to use as you stated in your blog.

Leave a comment

Credits