Syncing your online storage with the HURCS file system is made easy with the help of a small utility called rclone.
Rclone is an rsync-like command-line program for syncing files with cloud storage services. As such, it can be used to sync or upload files from your HURCS account to Google Drive. All HUJI members are entitled to unlimited space in their HUJI Google account, so it can easily be used to back up data and free up space.
Bear in mind that sync operations may take a long time to run, and should never be performed from gateway nodes. See our “non-compute work” page for more details.
The rclone program is available on all Moriah cluster nodes, but needs to be loaded first using the module command (see ‘software modules’). If you are using a Linux workstation in your lab or are otherwise connected to a different CS gateway than Moriah, you will also have to load the hurcs module:
rclone on its own does not work, as the module isn't loaded.
module load rclone also fails, since the ‘dore’ machine is not part of the HURCS environment.
module load hurcs rclone will work on any CS machine.
Once you can run the rclone program from the CLI, you will need to configure it to work with your Google Drive account.
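The loading steps above can be sketched as a small shell function. This is only a minimal sketch, assuming the module names hurcs and rclone given above; the function name ensure_rclone is hypothetical and not part of the HURCS tooling.

```shell
#!/bin/bash
# Hypothetical helper: make sure rclone can be run in the current shell.
ensure_rclone() {
    # Nothing to do if rclone is already available.
    if command -v rclone >/dev/null 2>&1; then
        echo "rclone already available"
        return 0
    fi
    # 'module' only exists where environment modules are set up.
    if ! type module >/dev/null 2>&1; then
        echo "module command not found" >&2
        return 1
    fi
    # Loading hurcs alongside rclone is what works on non-Moriah CS machines.
    module load hurcs rclone || return 1
    # Confirm that the load actually provided rclone.
    command -v rclone >/dev/null 2>&1
}
```

Running ensure_rclone && rclone version at the start of a session should then confirm that the program is ready to use.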
Pay Attention! The configuration program will assume that you can open a web browser.
If you have connected to HURCS with a GUI-enabled session (like MobaXTerm or ssh -X, see Login to Cluster for more details) you can glide through the config process more easily. Otherwise, pay close attention to the questions in the config script.
To enter the configuration menu, type rclone config on your command line.
In the configuration menu, type n to configure a new remote service. The first two questions in the configuration process are critical:
name> - The first thing you need to input is the name of the service. This guide uses gdrive as the name of our remote service.
Type of storage to configure - The second question in the configuration process is the type of storage you want to connect to. Choose the entry for Google Drive (15 in the log below; the number may differ between rclone versions).
The next two values (client_id and client_secret) can be left blank.
When asked for the scope of access that rclone will get, choose 1 for full read/write access, followed by a few default responses:
root_folder_id> - hit enter for default
service_account_file> - hit enter for default
Edit advanced config? - hit enter for default
Make sure that you hit “no” for auto-config if you are not connected to the CS network in a GUI-supporting way!
scope> 1
root_folder_id> ""
service_account_file> ""
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> n
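Which answer to give to the auto-config question depends on whether the session can open a browser. A rough way to check is sketched below. It assumes that a non-empty DISPLAY variable indicates a GUI-enabled session (as with ssh -X), which is only a hint and not a guarantee; the function name autoconfig_answer is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: suggest an answer to the "Use auto config?" question.
# Assumption: a non-empty DISPLAY means the session can open a browser window.
autoconfig_answer() {
    if [ -n "${DISPLAY:-}" ]; then
        echo y
    else
        echo n
    fi
}

echo "Suggested answer for 'Use auto config?': $(autoconfig_answer)"
```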
Once done, the config script will print out a URL for authenticating your google account.
Verification code
Go to this URL, authenticate then paste the code here.
https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=………………-UvvvdYQ
And once authenticated, you will be presented with some final questions that can also be left blank.
A complete log of a configuration process can be seen below:
<15|0>zivben@dore:~% rclone config
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> gdrive
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
Storage> 15
Google Application Client Id
Setting your own is recommended.
See https://rclone.org/drive/#making-your-own-client-id for how to create your own.
If you leave this blank, it will use an internal key which is low performance.
Enter a string value. Press Enter for the default ("").
client_id>
OAuth Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret>
Scope that rclone should use when requesting access from drive.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
scope> 1
ID of the root folder
Leave blank normally.
Fill in to access "Computers" folders (see docs), or for rclone to use
a non root folder as its starting point.
Enter a string value. Press Enter for the default ("").
root_folder_id>
Service Account Credentials JSON file path
Leave blank normally.
Needed only if you want use SA instead of interactive login.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a string value. Press Enter for the default ("").
service_account_file>
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> n
Verification code
Go to this URL, authenticate then paste the code here.
https://accounts.google.com/o.....8ov2-UvvvdYQ
Enter a string value. Press Enter for the default ("").
config_verification_code> 4/1AX4XfWj694………………….3bRTE5g
Configure this as a Shared Drive (Team Drive)?
y) Yes
n) No (default)
y/n> y
--------------------
[gdrive]
type = drive
scope = drive
token = {"access_token":"ya29.A0ARr…………gGxreaOlBSBUjWiH1E",
"token_type":"Bearer",
"refresh_token":"1//09j8IDyPCnpajCg…eRBH9hnzQKvc",
"expiry":"2022-04-03T17:49:05.375187779+03:00"}
team_drive =
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:
Name Type
==== ====
gdrive drive
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
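After quitting the config menu, it can be worth confirming that the new remote is really there before starting a long sync. The sketch below assumes the remote was named gdrive as in the log above; has_remote is a hypothetical helper that scans the output of rclone listremotes, which prints one remote per line with a trailing colon.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the remote named $1 appears in the
# output of `rclone listremotes`, read from standard input.
has_remote() {
    # -x matches whole lines, -F treats the name as a fixed string.
    grep -qxF -- "$1:"
}

# Only query rclone when it is actually available in this shell.
if command -v rclone >/dev/null 2>&1; then
    if rclone listremotes | has_remote gdrive; then
        echo "gdrive remote is configured"
    else
        echo "gdrive remote not found; run rclone config" >&2
    fi
fi
```

If the remote is found, rclone lsd gdrive: lists the top-level directories of the drive, which also confirms that the stored token works.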
Uploading and syncing can be done by using rclone sync.
Several arguments should be taken into consideration:
--cache-rps 50 & --tpslimit 10 - limit the number of requests sent to the server. Google Drive’s API limits the number of requests per second to avoid abuse; these settings help ensure that you will not get rejected during syncing.
--dry-run - can be used to show what will be uploaded without executing
--copy-links - follow symbolic links
--drive-use-trash=$X (true/false) - deleted files can be either permanently deleted (false) or moved to Google Drive’s trash (true - default)
A final sync command may look like this:
rclone sync \
--cache-rps 50 --tpslimit 10 \
$source_dir $dest_dir
Where $source_dir and $dest_dir should be formatted depending on whether you are syncing from or to your drive location.
In general, both directory paths are written in standard Linux style (with forward slashes), but the remote path needs to be preceded by “gdrive:” (or the name of any other remote you have set up). A typical setup of these variables can look like this:
# source directory or directory to SYNC:
# "A regular file path"
source_dir=/sci/nosnap/yaronw/zivben/HURCS
# remote location to sync to:
# "$NAME_OF_REMOTE:$PATH_AT_DRIVE"
dest_dir=gdrive:HURCS
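Putting the variables and the arguments together, a cautious first run could use --dry-run so that nothing is changed on the drive. The following is a sketch rather than a recommended script: the paths are the example values from above, and RUN_SYNC is a hypothetical switch introduced here so that the command is only printed unless it is explicitly enabled.

```shell
#!/bin/bash
# Paths copied from the example above.
source_dir=/sci/nosnap/yaronw/zivben/HURCS
dest_dir=gdrive:HURCS

# Keep the options in an array so that --dry-run is easy to drop later.
sync_args=(sync --cache-rps 50 --tpslimit 10 --dry-run)

# Always print the command for review.
echo "rclone ${sync_args[*]} $source_dir $dest_dir"

# RUN_SYNC is a hypothetical switch: set it to 1 to actually call rclone.
if [ "${RUN_SYNC:-0}" = 1 ]; then
    rclone "${sync_args[@]}" "$source_dir" "$dest_dir"
fi
```

Once the dry run lists the expected files, removing --dry-run from the array performs the real sync.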
Warning!
Regardless of whether you'll be syncing from or to your drive, the first path argument (i.e. $source_dir) is always the one rclone will be copying from. So make sure not to run the command with an empty folder as your source dir, or your destination dir will be erased!
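One way to guard against the mistake described in this warning is to check the source before calling rclone. The sketch below applies to a local source directory only; the function name source_is_safe is hypothetical.

```shell
#!/bin/bash
# Hypothetical guard: succeed only if $1 is an existing, non-empty directory.
source_is_safe() {
    local dir=$1
    if [ ! -d "$dir" ]; then
        echo "source '$dir' is not a directory" >&2
        return 1
    fi
    # ls -A lists every entry except . and ..; empty output means an empty directory.
    if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "source '$dir' is empty; refusing to sync" >&2
        return 1
    fi
    return 0
}
```

It could be used as source_is_safe "$source_dir" && rclone sync "$source_dir" "$dest_dir", so that the sync is skipped when the check fails.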
Since rclone jobs can take a long time, as well as lots of memory, large jobs should be executed as a Slurm job, as advised on our “non-compute work” page.
An sbatch syncing script may look something like the following. This script can be used to sync the entire $USER home folder with the GDrive directory Research (gdrive:Research/). It is executed using sbatch (e.g. sbatch rclone_sync.sh) every time you want to sync to the GDrive, unlike the desktop client, which syncs continuously.
#!/bin/bash
#SBATCH --mem=8000
#SBATCH --time=96:00:00
# the sync_logs directory must already exist when the job is submitted
#SBATCH -e sync_logs/latest.err
#SBATCH -o sync_logs/latest.out
# report rclone's exit status, not tee's, as the status of the pipeline
set -o pipefail
# ensure that the rclone module is loaded:
module load hurcs rclone
# source directory or directory to SYNC
source_dir=/sci/home/$USER/
# drive location to sync to
rclone_dest=gdrive:Research/
# log path:
log_path=sync_logs/rclone_$(date +%Y_%m_%d_%H:%M:%S).log
# drive sync
# add --dry-run for a test run
rclone sync \
--cache-rps 50 -v --tpslimit 10 \
--checkers 8 --transfers 4 \
"$source_dir" "$rclone_dest" 2>&1 | tee "$log_path"
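Because the script writes its output and logs under sync_logs, that directory has to exist before the job starts; Slurm does not create missing output directories. A possible submission wrapper is sketched below. The function name submit_sync is hypothetical, and the script name rclone_sync.sh is the example name used above.

```shell
#!/bin/bash
# Hypothetical wrapper: prepare the log directory, then submit the sync script.
submit_sync() {
    local script=${1:-rclone_sync.sh}
    # Create the directory used by the #SBATCH -e/-o lines and the rclone log.
    mkdir -p sync_logs || return 1
    if command -v sbatch >/dev/null 2>&1; then
        sbatch "$script"
    else
        # Outside the cluster, only show what would have been submitted.
        echo "sbatch not found; would run: sbatch $script"
    fi
}
```

Run submit_sync from the directory that holds rclone_sync.sh, since the log paths in the script are relative.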