Posts Tagged ‘rsync’

(Sysadmin) Software Design Decisions

Wednesday, October 3rd, 2012

When approaching a task with an inkling to automate, sometimes you find an open source project that fits the bill. But every creator works within constraints and expresses an opinion about which problems are worth ‘solving’ and therefore prioritizing: a deployment tool is not necessarily a patch management tool is not necessarily a configuration management tool, and so on. One of the things I’ve dealt with is trying to gauge the intent of a developer and decide whether they’re interested in further discussion/support/development of a given project. Knowing why one decision was made over another can be helpful in those situations. So, in the category of things I wish someone else had written so I could read them, here are the design decisions behind the sonOfBackupRestoreScripts project I’ve been toying with as an add-on to DeployStudio (hereafter DS). After reading the following, you can hopefully understand why I am not releasing it as an official, supportable tool in its current bash form.
I’ve adapted some of the things Google used in their outline for Simian as a model, to give this some structure.

Project Objective:

To move user home folders and local authentication/cached credentials between workstations in a customizable and optimized manner, preserving the integrity of the data/user records as much as possible.

Overview:

For speed and data integrity, rsync is used to move selections of the user’s home folder (minus caches, trash, and the common exclusions Time Machine makes). To increase portability and preserve Mac-specific attributes, a disk image is generated to enclose the data. The user account information is copied separately, and helpful status information is displayed at the critical points as the process moves from one stage to another and during the backup itself.

Requirements: DeployStudio Server / NetBoot

DS, as a service, provides an infrastructure to run the script in and automounts a repository to interact with over the network. The script is meant to work with or without a NetBoot environment, but one architecture assumption made during development/testing is wired ethernet, using USB/Thunderbolt adapters if the clients are MacBook Airs. Even an old Mac mini can function fine as the server, assuming the repo sits on a volume with enough free space to accept the uncompressed backups.

Implementation Details: Major Components / Underlying Programs

- source/destination variables

Parameters can be passed to the script to change the source/destination of backups/restores with the -s (source) and -d (destination) switches, each followed by a path that is reachable by the NetBooted system.
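As an illustration only, assuming the DS repository automounts at its usual /tmp/DSNetworkRepository path and using hypothetical script names (the actual names in the project may differ), an invocation would look something like:

    # back up the local Users folder to the mounted DS repository (hypothetical names/paths)
    ./backup.sh -s "/Volumes/Macintosh HD/Users" -d /tmp/DSNetworkRepository/Backups
    # restore that backup onto another machine's freshly imaged drive
    ./restore.sh -s /tmp/DSNetworkRepository/Backups -d "/Volumes/Macintosh HD"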

- hdiutil

A simple sparse disk image that can expand up to 100GB is created with the built-in hdiutil binary. The file system format of that container is JHFS+, and a bunch of other best practices, cobbled together from Bombich’s Carbon Copy Cloner (hereafter CCC) and InstaDMG, are employed.
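A minimal sketch of that hdiutil step, assuming the 100GB cap, a Journaled HFS+ file system, and placeholder volume/path names:

    # create a sparse image that can grow to 100GB, formatted as Journaled HFS+
    hdiutil create -size 100g -type SPARSE -fs 'Journaled HFS+' \
        -volname UserBackup /tmp/DSNetworkRepository/Backups/examplemac.sparseimage
    # attach it so the cp and rsync steps have a mount point to write into
    hdiutil attach /tmp/DSNetworkRepository/Backups/examplemac.sparseimage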

- cp

The cp binary simply copies the user records from the directory service node the data resides on to the root of the sparseimage, and the admin group’s record is copied into a ‘group’ folder. If hashes exist in /var/db/shadow/hash, which is how passwords were stored prior to 10.7, those are moved to a ‘hashes’ folder.
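Assuming a 10.5+ local directory node (the dslocal plist store), a sparseimage mounted at /Volumes/UserBackup, and a placeholder user named jdoe, those copies boil down to roughly:

    # copy the local user record to the root of the mounted sparseimage
    cp /var/db/dslocal/nodes/Default/users/jdoe.plist /Volumes/UserBackup/
    # copy the admin group record into a 'group' folder
    mkdir -p /Volumes/UserBackup/group
    cp /var/db/dslocal/nodes/Default/groups/admin.plist /Volumes/UserBackup/group/
    # pre-10.7 only: capture any legacy shadow hashes into a 'hashes' folder
    mkdir -p /Volumes/UserBackup/hashes
    cp /var/db/shadow/hash/* /Volumes/UserBackup/hashes/ 2>/dev/null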

- rsync

A custom, even more current build of rsync could be generated by following the instructions listed here. Ideally, a battle-tested version like the one bundled with CCC’s ccc_helper.app (/Applications/Carbon\ Copy\ Cloner.app/Contents/MacOS/ccc_helper.app/Contents/MacOS/rsync, which is actually a heavily customized rsync version 3.0.6) could be used, but its output isn’t easy to adapt for an overview of progress during a CLI transfer. Regardless, the recommended switches are employed in hopes of getting a passing grade on the backupBouncer test. The 3.0.7 version bundled with DS itself (/Applications/Utilities/DeployStudio\ Admin.app/Contents/Frameworks/DSCore.framework/Versions/A/Resources/Tools/rsync, which for whatever reason is excluded when the assistant creates NetBoot sets) was used during development/testing.
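A rough sketch of what such an invocation looks like follows; the exact switch list lives in the script, and flags like -N (--crtimes), --fileflags, and --protect-decmpfs only exist in the Mac-patched 3.0.x builds, so treat the specifics as assumptions rather than the project's actual command line:

    # metadata-preserving copy using the rsync bundled with DS
    RSYNC="/Applications/Utilities/DeployStudio Admin.app/Contents/Frameworks/DSCore.framework/Versions/A/Resources/Tools/rsync"
    "$RSYNC" -aHAXN --fileflags --protect-decmpfs \
        --exclude-from=Excludes.txt --stats --progress \
        "/Volumes/Macintosh HD/Users/" "/Volumes/UserBackup/Users/"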

- Exclusions

The Users folder on the workstation being backed up is what’s targeted directly, so any deleted users or unwanted subfolders can be removed with the exclusions file fed to the rsync command. Without catch-all, asterisk (*) ‘file globbing’, you’d need to be specific about certain types of files you want to exclude if they’re only in certain directories. For example, to skip backing up any mp3 files, no matter where they are in the user folders, you’d add - *.mp3. Additional catch-all excludes can be used, as detailed in the script, which specifically excludes ipsw files (iOS firmware/OS installers) like this: --exclude='*.ipsw'
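As a sketch of what such an exclusions file might contain (the real Excludes.txt in the project is more extensive; these patterns are illustrative), fed to rsync with --exclude-from=Excludes.txt:

    # Excludes.txt -- one pattern per line; blank lines and lines starting with # are ignored
    .Trash
    Library/Caches
    # skip a deleted user's orphaned folder by name (hypothetical)
    formeremployee
    # catch-all patterns by extension, wherever they appear under the transfer root
    *.mp3
    *.ipsw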

- Restore

Pretty much everything done via rsync and cp is done in reverse, utilizing the source/destination options, so a backup taken from one machine can easily be chosen and restored to another.

Security Considerations:

Very little security is applied during storage. Files are transferred over password-protected AFP, so a separate server and repo could be used to minimize potential access by whoever can reach the main DS service. Nothing encrypts the files inside the sparseimages, and if present, the older password format is a hash that could potentially be cracked given a great length of time. The home folder ACLs and ownership/permissions are preserved, so in that respect the data is only as secure as the local file systems on the server and client, and whoever has access to them.

Excluded/Missing Features:
(Don’t You Wish Every Project Said That?)

Hopefully this won’t sound like a soul-baring confession, but here goes:
No checks are in place for whether there is enough space on the destination, nor for whether a folder to back up is larger than the currently hard-coded 100GB sparseimage cap (after exclusions). Minimal redirection of logs is performed, so the main DS log can quickly hit its 2MB cap and stop updating the DS NetBoot log window/GUI if a boatload of progress is echoed to stdout. Restoring a user’s admin group membership (or any other group on the original source) is not performed, although the group’s admin.plist can be queried after the fact. Nor is there any reporting on deleted users’ orphaned home folders if they do actually need to be preserved; by default they’re just part of the things rsync excludes. All restrictions live in the Excludes.txt file fed to rsync, so they cannot be passed as parameters to the script.
And the biggest possible unpleasantness is also the #1 reason I’m not considering continuing development in bash: UID collisions. If you restore a 501 user to an image with a pre-existing 501 user that was the only admin… bad things will happen. (We’ve changed our default admin user’s UID as a result.) If you get lucky, you can change one user’s UID or the other and chown to fix things as admin before all heck breaks loose. Beyond that: if this isn’t a clean image, there’s no checking for duplicate users with newer data; there’s no FileVault 1 or 2 handling; no prioritization that would fit what home folders it can and warn about the one(s) that wouldn’t fit; no version checking on the binaries in case different NetBoot sets are used; no fixing of ByHostPrefs (although DS’s finalize script should handle that); and no die-function checks if the restore destination doesn’t have enough space, since the common case is restoring to the same HD or a newer, presumably larger computer. Phew!
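For what it’s worth, the kind of guard the UID problem needs is easy to sketch with dscl and chown (the user name and UIDs here are placeholders); it just never made it into the bash version:

    # see which UID the restored user record claims
    dscl . -read /Users/restoreduser UniqueID
    # if it collides with an existing local admin, move one of them and fix ownership to match
    dscl . -change /Users/restoreduser UniqueID 501 510
    chown -R 510 /Users/restoreduser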

Wrapup:

The moral of the story is that the data structures available in most other scripting languages are better suited to these checks and to taking evasive action as necessary. Bash does really ungainly approximations of tuples/dictionaries/hash tables, which forced the previous version of this project to perform all necessary checks and actions in a single loop per user just to keep things functional without growing exponentially longer and more complex.

Let’s look forward to the distant future when this makes its way into Python for the next installment in this project. Of course I’ve already got the name of the successor to SonOfBackupRestoreScripts: BrideOfBackupRestoreScripts!

File Replication

Thursday, February 19th, 2009

Performing replication between physical locations is always an interesting task. Perhaps you’re only using your second location as a hot/cold site, or maybe it’s a full-blown branch office. In many cases, file replication can be achieved with no scripting, using off-the-shelf products such as Retrospect or even Carbon Copy Cloner. Other times, the needs are more granular and you may choose to script a solution, as is often done using rsync.

However, a number of customers have found these solutions to leave something to be desired. Enter File Replication Pro. File Replication Pro allows administrators to replicate data between two locations in a variety of fashions and across a variety of operating systems in a highly configurable manner. Furthermore, File Replication Pro provides delta synchronization rather than full file copies, which means that you’re only pushing changes to files and not the full file over your replication medium, greatly reducing required bandwidth. File Replication Pro is also multi-platform (built on Java), allowing administrators to synchronize Sun, Windows, Mac OS X, etc.

If you struggle with File Replication issues, then we can help. Whatever the medium may be, give us a call and we can help you to determine the best solution for your needs!

Backing Up With Carbon Copy Cloner

Wednesday, April 2nd, 2008

The newest version of Carbon Copy Cloner, now at version 3.1, has a number of features that move it closer to being a viable automated backup system.

Carbon Copy Cloner is now a wrapper application that runs a series of terminal commands to accomplish its goal, but it does them very well.

Compatibility: 10.4 or higher. Universal Binary

Usage:

Cloning: As its name suggests, the first feature of this software is to clone one drive to another. This is how the program started, and it was one of the few good third-party applications for drive cloning on the Mac.

The software interface is simple. Choose a source volume and choose a destination volume. If you are cloning, you by default want to overwrite the destination drive.

New Feature: There is now a built-in feature that tests the “bootability” of the target drive after the clone, letting you know whether the target drive can be used as a boot volume.

Local Backup: Instead of copying all data from the local drive to the target drive, you can now choose to do incremental backups of selected files. The source file system tree is displayed, and you can check the boxes for the items you wish to back up. This model is good because you can choose a user’s directory to back up but then deselect the Music folder within it. Any new files or folders in the user directory will get backed up, but any files or folders in the Music folder will not.

Destination in subdirectory & pre/post script runs: To copy data into a subdirectory of the target drive, pull down the application menu (between the Apple menu and the File menu) and choose Advanced Settings. This gives you a field to enter a pathname specifying a subdirectory to receive the copied files. You will also see fields to specify scripts to run either before or after the copy. Classically these are used to stop and then restart a database, or to execute a database export for backup. I have also seen commands that gzip a directory structure and then decompress it after the copy.
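For instance, a pre-copy script might dump a database to a flat file so the backup captures a consistent snapshot; this is only a sketch, and the database name, dump path, and the presence of mysqldump on the machine are all assumptions:

    #!/bin/sh
    # hypothetical pre-copy script: export a database before CCC copies the disk
    mysqldump --single-transaction exampledb > /Users/Shared/exampledb.sql
    # compress the dump so it takes less room on the destination
    gzip -f /Users/Shared/exampledb.sql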

Incremental Backups: When you choose your destination, you can choose whether to do a full copy or an incremental copy. In addition, you are presented with options for whether files are deleted if they are not on the source, and whether to preserve files that are deleted or overwritten. The latter option creates a directory at the destination point named _CCC_Year_Month_Time, indicating that the files inside are the ones that would have been overwritten by the incremental backup. As of now there is no way to automatically remove these files without further scripting or user intervention. If you are at a client that makes use of CCC and the destination drives are reaching capacity, these are the files to remove to conserve space.
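If you want to reclaim that space by hand (or with a small scheduled job), something along these lines would clear archived copies older than 30 days; the volume path and the 30-day window are placeholders, and the folder-name pattern assumes the _CCC_ prefix described above:

    # prune CCC archive folders older than 30 days from a (hypothetical) backup volume
    find /Volumes/BackupDrive -maxdepth 1 -type d -name '_CCC_*' -mtime +30 -exec rm -rf {} +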

Filtering: This version of CCC has filtering. The gearbox next to the source drive selector is available if the source drive is local. These filters show what you have chosen not to include. In addition, you can add filter exceptions by file extension or by pathname. The latter works the same way as the exclusions in rsync: if you add an entry to this list, any pathname that matches that string will be ignored.

For example: if you back up the /Users/ directory but place “iTunes” in the advanced filter, it will back up all the user folders but ignore all of the iTunes folders inside them.

Disk Images as destinations: This allows you to create a sparse image file, with encryption should you choose it, to be the destination of the backups. The image file needs to be local. You could use other scripts to move these files around.

Remote Backup: A recent update to this feature makes it a more viable solution for cost-effective backup. In the interface you can choose the source to be a remote Mac or the destination to be a remote Mac, but not both. If you choose the source to be a remote Mac, you cannot apply the file filters. In most circumstances I prefer to set this up on the client computer being backed up and then choose the remote computer to be the server that will receive the data. In either case, for a remote computer to be the source or the destination, you have to generate an authorization package installer.

This creates an SSH key that is installed into /var/root/.ssh, which allows the rsync process to run over an SSH tunnel without username:password authorization. This package needs to be installed on both the source and destination computers. These installers will now play nice with each other and concatenate their keys, so multiple sources can write to the same computer.

Note: Computers set as destinations must have SSH enabled, normally done by enabling “Remote Login” in the Sharing pane of System Preferences.
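On a headless server, Remote Login can also be switched on from the command line with the standard systemsetup verbs (run as root):

    # equivalent to ticking "Remote Login" in the Sharing preference pane
    sudo systemsetup -setremotelogin on
    # confirm it took
    sudo systemsetup -getremotelogin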

Scheduling Backups: Once you have the specifics of the copy process set, you can choose to save the task. This opens a new window called “Backup Task Scheduler”, in which you will see a list of scheduled tasks. These tasks correspond to entries in /Library/LaunchDaemons, and each one runs as a daemon process called ccc_helper.
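You can confirm a scheduled task landed by looking for its launchd job; the exact label Bombich uses for these plists is an assumption here, so match on whatever you actually find:

    # list the plists CCC dropped into LaunchDaemons
    ls /Library/LaunchDaemons/ | grep -i ccc
    # check whether the corresponding job is loaded
    sudo launchctl list | grep -i ccc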

You can schedule operations on an hourly, daily, weekly, or monthly basis, or whenever the drive is connected. That last option is only viable for a backup that writes to a local drive.

The settings tab allows you to specify whether the backup destination will be determined by pathname only or by the unique UUID of each drive.

You can access existing schedules by going to the Application menu again and choosing “Scheduled Tasks…”

NOTE: If the destination drives at the client rotate onsite and offsite, there are two things to consider: the scheduled backups should NOT use the unique UUID, and both drives should have the same name so that they can receive remote backups properly. The good news is that the ccc_helper daemon is smart enough not to write into the /Volumes directory if there is no drive there matching the destination name.

The description field is by default populated with common language describing the specifics of the backup task. This can be edited to be anything that you like.

Cancelling a copy in progress: If you can see the window for the ccc_helper app, you can press the cancel button. If you do so, you are given two options: skip this execution, which will relaunch at the next scheduled time, or defer. If you choose to defer, you can have the newly selected time become the execution time from then on. This is probably the only drawback to having the backup run on a client computer: the user can cancel the process on their own.

Conclusions: All in all, you get a lot with this simple product, and it can be of great use even in limited applications. If your client is mostly Mac and does not want to invest in an expensive backup solution, it can go a long way toward backing them up.

Pros: It is donationware, meaning it is freeware that will bug you for a donation now and again. It uses existing technology on your system, namely rsync and SSH. It is HFS+ metadata aware: the ccc_helper does the work and will copy the HFS+ metadata over SSH. It writes out its own CCC log file.

Cons: It does not handle failure gracefully: if it cannot perform its actions, it will bring up an on-screen alert that stays until dismissed. Using incremental backup on a very large file list can be memory intensive; this is more pronounced in local copies, since with a remote destination it seems to break the rsync operations down on a folder-by-folder basis. Filtering is only available if the source is local. Mac only, with no support for any other operating system.