Copying files to Google (Google Drive)
With UCSB moving to Google for e-mail, you also have access to Google Drive for storing and sharing files. While it's certainly not as easy to use as a local filesystem, it's free, and a great place to park data that you aren't currently working with. There are several ways to get files there from the cluster.
'gdrive' - this is good for moving a few large files; we've heard it doesn't work as well for a lot of files (see the rclone info below for that). In both cases, however, you'll do better in terms of reliability and speed if you create a few big files rather than trying to keep a directory structure with many small files. You can either tar or zip these directories and then transfer the single larger file. We're assuming you're using Google as a place to park data that you aren't actively working on but will need again at some point.
tar cf - MicroscopeData2017 > ./MicroscopeData2017-dir.tar
(this will create a single 'tar' file of your directory). You can then also run 'gzip MicroscopeData2017-dir.tar' to make it a smaller file if you like.
and, of course,
zip -1 -r MicroscopeData.zip MicroscopeData2017
(this will create a single 'zip' file of that directory; the -1 makes it do a faster, but still pretty good, compression).
The first time you use rclone, you need to initialize it so it can log in to your Google account.
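Before uploading, it can be worth confirming that the archive really contains everything. The following is a minimal sketch, not a site-provided script: it builds a small throwaway directory (the name `demo-data` and its files are made up for illustration), creates a compressed tar file in one step with `tar czf`, and compares the number of files on disk with the number of file entries in the archive.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for a real data directory such as MicroscopeData2017.
mkdir -p demo-data/run1
echo "image one" > demo-data/run1/a.txt
echo "image two" > demo-data/run1/b.txt
echo "notes" > demo-data/notes.txt

# Create a gzip-compressed tar file in a single step (tar and gzip combined).
tar czf demo-data-dir.tar.gz demo-data

# Count regular files on disk, and archive entries that are not directories
# (directory entries in a tar listing end with a slash).
files_on_disk=$(find demo-data -type f | wc -l | tr -d ' ')
files_in_tar=$(tar tzf demo-data-dir.tar.gz | grep -c -v '/$')

if [ "$files_on_disk" -eq "$files_in_tar" ]; then
    echo "archive OK: $files_in_tar files"
else
    echo "archive mismatch: $files_on_disk on disk, $files_in_tar in tar" >&2
fi
```

For a real directory, only the `tar czf`, `find`, and `tar tzf` lines are needed, with `demo-data` replaced by your own directory name.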
You can do this by typing
You can just answer the questions with the defaults (although on the first one, make sure to select 'google drive', not 'google cloud storage'). When it asks for Autoconfig, say yes if you have a graphical interface open (e.g. X2Go); otherwise say no. If you say yes, it will automatically open a browser window for you to log in to your Google account (NetID@ucsb.edu). If you say no, it will give you a URL to cut and paste into any browser on your desktop/laptop computer. Then return to your command line window and finish answering the questions (just hit return if you're not sure what to answer).
If you've used the default, you'll now have a Google Drive connection named 'Google'. Think of it like a drive letter in Windows, e.g. D:. From that you can do various things, e.g.
rclone lsd Google:
to list your directories.
or to copy files up to it
rclone copy somefile Google:
and to get it back,
rclone copy Google:somefile .
You can set it up to use multiple accounts, AWS, etc. To see everything you have configured, just do a
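For instance, rclone's `listremotes` subcommand prints the name of each configured remote, one per line, with a trailing colon. The wrapper below is only an illustrative sketch; the function name `list_remotes` is made up here and is not part of rclone.

```shell
# Print the configured rclone remotes, or explain why that is not possible.
list_remotes() {
    if ! command -v rclone >/dev/null 2>&1; then
        echo "rclone not found on PATH" >&2
        return 1
    fi
    # Each remote is printed on its own line, e.g. "Google:".
    rclone listremotes
}

# Example call:
# list_remotes
```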
If, say, you want to push a whole directory up because you need to see individual files (i.e. you don't want a giant tar file up there), you can use 'sync'. Be sure to also use 'check' to make sure everything made it there too! In the example below I'm copying a directory named old-project to Google Drive. My rclone endpoint is named Google-paul, and I like to have a subdirectory 'research-backups' so files end up in there rather than at the top level of Google Drive. Also, remember to include the directory name on the Google side: otherwise it will put everything directly in 'research-backups', when what we want is 'research-backups/old-project'. Because 'sync' makes the destination match the source, it can also delete files already at the destination that are not in the source. Try it with some small directories to test first!
$ rclone mkdir Google-paul:research-backups
$ rclone sync old-project Google-paul:research-backups/old-project
Assume this runs overnight, or over a few days (if it's over a TB, it will take a few days). You'll want to make sure it actually copied everything up before you remove it locally! Use 'rclone check' for that and look for errors. Here's a check that fails (1 files missing, 1 differences found) because I removed a file from Google Drive to simulate a file that didn't make it, followed by a resync and a second check that is good (0 differences found).
$ rclone check old-project Google-paul:research-backups/old-project
2020/04/16 14:46:21 ERROR : tmp-new-dir/somefile.txt: File not in Google drive root 'old-project'
2020/04/16 14:47:04 NOTICE: Google drive root 'old-project': 1 files missing
2020/04/16 14:47:04 NOTICE: Google drive root 'old-project': 1 differences found
2020/04/16 14:47:04 NOTICE: Google drive root 'old-project': 381 matching files
2020/04/16 14:47:04 Failed to check with 2 errors: last error was: 1 differences found
$ rclone sync old-project Google-paul:research-backups/old-project
$ rclone check old-project Google-paul:research-backups/old-project
2020/04/16 14:48:17 NOTICE: Google drive root 'research-backups': 0 differences found
2020/04/16 14:48:17 NOTICE: Google drive root 'research-backups': 382 matching files
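The sync, check, resync cycle shown above can be wrapped in a small loop, since 'rclone check' exits with a non-zero status when files are missing or differ. This is a minimal sketch under that assumption; the function name `sync_and_verify` and the default of three attempts are made up for illustration.

```shell
# Sync a local directory to a remote path, then verify it with 'rclone check'.
# Retries the sync a limited number of times if the check reports differences.
# Usage: sync_and_verify LOCAL_DIR REMOTE_PATH [MAX_TRIES]
sync_and_verify() {
    src=$1
    dest=$2
    max_tries=${3:-3}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        echo "attempt $try: syncing $src to $dest"
        # Only report success when both the sync and the check succeed.
        if rclone sync "$src" "$dest" && rclone check "$src" "$dest"; then
            echo "verified: $dest matches $src"
            return 0
        fi
        try=$((try + 1))
    done
    echo "giving up after $max_tries attempts; do NOT delete $src" >&2
    return 1
}

# Example call, using the names from the text above:
# sync_and_verify old-project Google-paul:research-backups/old-project
```

Only remove the local copy after the function returns success, and it is still worth reading the check output yourself for anything unexpected.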
To get more speed (i.e. if you are going to move a lot) there are a number of other settings to adjust, for a good description, see
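As one illustration of such settings: `--transfers` sets how many files are copied in parallel, `--checkers` sets how many files are compared in parallel, and `--drive-chunk-size` sets the upload chunk size for Google Drive (larger chunks use more memory per transfer). These are standard rclone flags, but the values below are placeholders to tune for your own data, not recommendations, and the function name `fast_sync` is made up for this sketch.

```shell
# Sketch only: the flag values are illustrative guesses, not tuned settings.
# Usage: fast_sync LOCAL_DIR REMOTE_PATH
fast_sync() {
    rclone sync "$1" "$2" \
        --transfers 8 \
        --checkers 16 \
        --drive-chunk-size 64M \
        --progress
}

# Example call, using the names from the text above:
# fast_sync old-project Google-paul:research-backups/old-project
```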
The first time you use 'gdrive' you'll need to authenticate to your Google account, so it knows you are allowed to upload files. You just need to cut and paste a URL into a browser. To get that URL, type 'gdrive about' and you'll see something like this:
gdrive about
Authentication needed
Go to the following url in your browser:
https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=3b3a15a0-bdb7-4cb0-abb4-3cb390278020.apps.googleusercontent.com&redirect_uri=d8a4cd5d-9700-482c-9c7e-9e91d4591e78&response_type=code
Enter verification code:
Now take the line beginning with https and put it in a browser. You'll log in to your Google account with NetID@ucsb.edu and it will give you another code back, which you cut and paste into your command line on the cluster where it's asking for your verification code. That's it!
Once you've set it up the first time, you can use it directly, e.g.
gdrive upload somefile.tar
(which will upload the file 'somefile.tar' to your Google Drive).
You can upload entire directories with
gdrive upload --recursive some-directory-name
you can get more info on other commands with
Note that gdrive works well with larger files, so if you have a directory with a lot of data that you want to back up, you're probably better off making it into one large tar file and transferring that. You won't have access to the individual files unless you download and untar it, but presumably you're putting stuff in Google that you aren't actively working on!
e.g. if you have a lot of files in a data directory named 'MicroscopeData2017'
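That bundle-then-upload workflow could be scripted roughly as follows. This is a sketch rather than a supported tool: the function name `bundle_and_upload` and the `-dir.tar.gz` naming are assumptions for illustration, and it relies on the same 'gdrive upload' command shown above.

```shell
# Bundle a directory into one compressed tar file and upload it with gdrive.
# Run it from the parent of the directory you want to bundle.
# Usage: bundle_and_upload DIRECTORY
bundle_and_upload() {
    dir=${1%/}                      # strip a trailing slash, if any
    archive="$dir-dir.tar.gz"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Create the archive first; only upload if tar succeeded.
    tar czf "$archive" "$dir" || return 1
    gdrive upload "$archive"
}

# Example call, using the directory name from the text above:
# bundle_and_upload MicroscopeData2017
```

The local archive is left in place after the upload, so you can confirm the file arrived in Google Drive before deleting anything.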