Machine Learning in Emacs - Copy Files from Remote Server to Local Machine
31 Jul 2022
dired-rsync is a great addition to my machine learning workflow in Emacs
Table of Contents
- File Manager GUI
- Rsync Tool in CLI
- Emacs’ Way: dired-rsync
- Setup for dired-rsync
- Enhance dired-rsync with compilation mode
For machine learning projects, I tweaked my workflow so that interaction with the remote server is kept to a minimum. I prefer to do everything locally on my laptop (M1 Pro), where I have all the tools for data analysis, visualisation, debugging etc., and where I can work without lag or Wi-Fi.
The only use I have for servers is running computationally intensive tasks like recursive feature selection, hyperparameter tuning etc. For that I ssh to the server, start tmux, git pull to update the codebase, and run a bash script that I prepared locally to fire off hundreds of experiments. All done in Emacs of course, thanks to Lukas Fürmetz’s vterm.
The only thing left is getting the experiment results back to my laptop. I used to rely on two approaches for copying the data to my local machine: a file manager GUI and the rsync tool in the CLI.
Recently I discovered dired-rsync, which works like a charm - it combines the two approaches above, providing an interactive way of running the rsync tool in Emacs. What’s more, it integrates seamlessly into my current workflow.
They all have their own use cases. In this post, I briefly describe those three approaches for copying files, with a focus on dired-rsync: how to use it, how to set it up, and my thoughts on how to enhance it.
Note that RL stands for remote location, i.e. a folder on the remote server, and LL stands for local location, the RL’s counterpart. The action under discussion is how to efficiently copy files from RL to LL.
File Manager GUI
This is the simplest approach and requires few technical skills. The RL is mounted in the file manager, which acts as an access point, so it can be used just like a local folder.
I usually have two tabs open side by side, one for RL and one for LL, compare the differences, and then copy whatever is useful and exists in RL but not in LL.
I used this approach on my Windows work laptop, where rsync is not available, so I have to copy files manually.
Rsync Tool in CLI
The rsync tool is similar to cp and scp but it is much more powerful:
- It copies files incrementally so it can be stopped at any time without losing progress
- The output shows which files have been copied, which remain, the copying speed, overall progress etc
- Files and folders can be included/excluded by specifying patterns
I have a bash function in the project’s script folder as a shorthand.
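A minimal sketch of what such a function could look like. It is not my exact function: the host alias debian, the remote project root /home/me/project and the RSYNC_BIN override are placeholders for illustration, and it assumes the local project mirrors the directory layout on the server.

```shell
# Placeholder values: replace with your own SSH host alias and remote project root.
REMOTE_HOST="${REMOTE_HOST:-debian}"
REMOTE_ROOT="${REMOTE_ROOT:-/home/me/project}"
# Set RSYNC_BIN=echo to print the rsync arguments instead of copying anything.
RSYNC_BIN="${RSYNC_BIN:-rsync}"

# Copy one sub-directory of the project from the server (RL) into the same
# relative path under the current local directory (LL).
copy_from_debian_to_laptop() {
    if [ "$#" -ne 1 ]; then
        echo "usage: copy_from_debian_to_laptop <relative-dir>" >&2
        return 1
    fi
    local dir="${1%/}"          # drop the trailing slash that TAB completion adds
    mkdir -p "./${dir}"         # make sure the local target exists
    # -a archive mode, -z compress, -h human-readable sizes, --progress per-file progress
    "${RSYNC_BIN}" -azh --progress \
        "${REMOTE_HOST}:${REMOTE_ROOT}/${dir}/" "./${dir}/"
}
```

The trailing slash on the source tells rsync to copy the contents of the directory rather than nesting the directory inside the destination.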
To use it, I first cd (change directory) to the project directory in the terminal, call the copy_from_debian_to_laptop function, and use TAB completion to quickly get the directory I want to copy.
I call this function more often from an org-mode file where I keep track of all the experiments.
Emacs’ Way: dired-rsync
This approach is a blend of the previous two, letting the user enjoy the benefits of a GUI for exploring and the power of rsync.
What’s more, it integrates well into my current workflow: I simply switch from calling dired-do-copy to calling dired-rsync, or press the r key instead of the C key, using the configuration in this post.
For those who are not familiar with copying files using dired in Emacs, here is the step-by-step process:
- Open two dired buffers, one at RL and one at LL, either manually or using bookmarks
- Mark the files/folders to copy in the RL dired buffer
- Press the r key to invoke dired-rsync
- It asks where to copy to. The default destination is LL, so press Enter to confirm.
After that, a unique process buffer, named *rsync with a timestamp suffix, is created to show the rsync output. I can stop the copying by killing the process buffer.
Setup for dired-rsync
The dired-rsync-options variable controls the output shown in the process buffer. It defaults to “-az --info=progress2”, which shows the overall progress on one line, clean and neat (not on macOS though, see Issue 36). Sometimes I prefer “-azh --progress” so I can see exactly which files are copied.
There are other options for showing progress in the modeline (dired-rsync-modeline-status), and hooks for sending notifications on failure/success (dired-rsync-failed-hook and dired-rsync-success-hook).
Overall the library is well designed, and the default options work for me, so I can have a bare-minimum configuration as below (borrowed from ispinfx):
There are two more things to do on the system side:
- On macOS, the default rsync is a 2010 version. It does not work with the latest rsync I have on the Debian server, so I upgraded it using brew install rsync.
- There is no way of typing a password, which is a limitation of using a process buffer, so I have to ensure I can rsync without the remote server asking for one. It sounds complicated, but fortunately it only takes a few steps, as described in Setup Rsync Between Two Servers Without Password.
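For the second point, the usual route is SSH key authentication. The sketch below is one way to check that it works before trying dired-rsync; user@server is a placeholder, and the SSH_BIN override and function name are my own additions for illustration, not part of any tool.

```shell
# One-time setup, run interactively (user@server is a placeholder):
#   ssh-keygen -t ed25519      # create a key pair if you do not have one yet
#   ssh-copy-id user@server    # install the public key on the server

# Overridable so the check itself can be exercised without a real server.
SSH_BIN="${SSH_BIN:-ssh}"

# Succeeds only if ssh can log in to host "$1" without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_login_without_password() {
    "${SSH_BIN}" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

If can_login_without_password user@server returns success, rsync over SSH to the same host should not stop to ask for a password either.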
Enhance dired-rsync with compilation mode
It’s such a great library and it makes my life much easier. It could be improved further to provide an even better user experience, for example by keeping the process buffer alive as a log after the copying has finished, because the user might want to have a look later.
At the moment, there’s no easy way of changing the arguments sent to rsync. I might want to test a dry-run (adding the -n argument) so I can see exactly which files are going to be copied before running, or I might need to exclude certain files/folders, or rerun the copying if there are new files generated on RL.
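In the meantime, a dry run with exclusions can be tried from a shell. This is a sketch only: the function name, the RSYNC_BIN override, and the host, paths and patterns in the usage comment are placeholders.

```shell
# Set RSYNC_BIN=echo to print the rsync arguments instead of running rsync.
RSYNC_BIN="${RSYNC_BIN:-rsync}"

# List what rsync would copy from SRC to DEST without copying anything.
# Any further arguments are treated as rsync exclude patterns.
rsync_dry_run() {
    local src="$1" dest="$2" pattern
    shift 2
    # Turn each remaining argument into --exclude=PATTERN, in order.
    for pattern in "$@"; do
        set -- "$@" "--exclude=${pattern}"
        shift
    done
    # --dry-run is the long form of -n; --itemize-changes lists each file affected.
    "${RSYNC_BIN}" -azh --dry-run --itemize-changes "$@" "${src}" "${dest}"
}

# Example (placeholder host, paths and patterns):
#   rsync_dry_run debian:/home/me/project/results/ ./results/ '*.ckpt' 'tmp/'
```

Dropping --dry-run from the command turns the preview into the real copy, and running it again only transfers files that are new or changed on RL.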
If you have used a compilation buffer before, you know where I am going. That’s right, I am thinking of turning the rsync process buffer into compilation mode, so that it would inherit these two features:
- Press g to rerun the rsync command when I know there are new files generated on the RL
- Press C-u g (g with prefix) to change the rsync arguments before running it for dry-run, inclusion or exclusion
I don’t have much experience in elisp, but I had a quick look at the source code and it seems there’s no easy way of implementing this idea, so it is something to add to my ever-growing Emacs wish-list.
In fact, the limitation comes from using lower level elisp functions. The Emacs Lisp manual on Process Buffers states that
Many applications of processes also use the buffer for editing input to be sent to the process, but this is not built into Emacs Lisp.
What a pity. For now I enjoy using it and look for opportunities to use it.