Yi Tang Data Science and Emacs

Machine Learning in Emacs - Copy Files from Remote Server to Local Machine

dired-rsync is a great addition to my Machine Learning workflow in Emacs


For machine learning projects, I tweaked my workflow so that interaction with the remote server is kept to a minimum. I prefer to do everything locally on my laptop (M1 Pro), where I have all the tools for data analysis, visualisation, debugging etc., and where I can work without lag or Wi-Fi.

The only use of servers is running computation-intensive tasks like recursive feature selection, hyperparameter tuning etc. For that I ssh to the server, start tmux, git pull to update the codebase, and run a bash script that I prepared locally to fire off hundreds of experiments. All done in Emacs, of course, thanks to Lukas Fürmetz’s vterm.

The only thing left is getting the experiment results back to my laptop. I used two approaches for copying the data to my local machine: a file manager GUI and the rsync tool in the CLI.

Recently I discovered dired-rsync, which works like a charm: it combines the two approaches above, providing an interactive way of running rsync in Emacs. What’s more, it integrates seamlessly into my current workflow.

Each has its own use case. In this post, I briefly describe those three approaches for copying files, with a focus on dired-rsync: how to use it, how to set it up, and my thoughts on how to enhance it.

Note that RL stands for remote location, i.e. a folder on a remote server, and LL stands for local location, the RL’s counterpart. The action under discussion is how to efficiently copy files from RL to LL.

File Manager GUI

This is the simplest approach and requires little technical skill. The RL is mounted in the file manager, which acts as an access point, so it can be used just like a local folder.

I usually have two tabs open side by side, one for RL and one for LL, compare the differences, and then copy what is useful and exists in RL but not in LL.

I use this approach on my Windows work laptop, where rsync is not available, so I have to copy files manually.

Rsync Tool in CLI

The rsync tool is similar to cp and scp but much more powerful:

  1. It copies files incrementally, so it can be stopped at any time without losing progress
  2. The output shows which files have been copied, which remain, the copying speed, overall progress etc.
  3. Files and folders can be included/excluded by specifying patterns
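
As a rough illustration of the third point, the sketch below previews which files would be copied when only CSV files are wanted. The function name preview_csv_sync, the RSYNC_BIN override and the *.csv pattern are assumptions made up for this example and are not part of the workflow described in this post; --dry-run, --include and --exclude are standard rsync options.

```shell
# Hypothetical helper: list the CSV files that would be copied from SRC to
# DEST, without copying anything (--dry-run).
# RSYNC_BIN is an assumed override hook, e.g. RSYNC_BIN=echo prints the
# arguments instead of running rsync.
preview_csv_sync () {
    local src dest
    src=$1
    dest=$2
    # Filter rules are applied in order:
    #   --include='*/'    descend into every directory
    #   --include='*.csv' keep CSV files
    #   --exclude='*'     skip everything else
    "${RSYNC_BIN:-rsync}" -avh --dry-run \
        --include='*/' --include='*.csv' --exclude='*' \
        "${src}/" "${dest}"
}
```

Dropping --dry-run from the command turns the preview into a real copy.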

I have a bash function in the project’s script folder as a shorthand, like this:

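copy_from_debian_to_laptop () {
    # keep the variables local to the function
    local folder_to_sync remote_project_dir local_project_dir
    # first argument to this function
    folder_to_sync=$1
    # define where the RL is
    remote_project_dir=debian:~/Projects/2022-May
    # define where the LL is
    local_project_dir=~/Projects/2022-May
    # the trailing slash on the source copies the folder's contents
    # rather than the folder itself
    rsync -avh --progress \
        "${remote_project_dir}/${folder_to_sync}/" \
        "${local_project_dir}/${folder_to_sync}"
}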

To use it, I first cd (change directory) to the project directory in the terminal, call the copy_from_debian_to_laptop function, and use TAB completion to quickly get the directory I want to copy, for example:

copy_from_debian_to_laptop experiment/2022-07-17-FE

More often, I call this function from an Org-mode file where I keep track of all the experiments.

Emacs’ Way: dired-rsync

This approach is a blend of the previous two, letting the user enjoy the benefits of a GUI for exploring together with the power of rsync.

What’s more, it integrates well into my current workflow: I simply switch from calling dired-do-copy to calling dired-rsync, or, with the configuration in this post, press the r key instead of the C key.

For those who are not familiar with copying files using dired in Emacs, here is the step-by-step process:

  1. Open two dired buffers, one at RL and one at LL, either manually or using bookmarks
  2. Mark the files/folders to copy in the RL dired buffer
  3. Press the r key to invoke dired-rsync
  4. It asks where to copy to. The default destination is LL, so press Enter to confirm.

After that, a unique process buffer, named *rsync with a timestamp suffix, is created to show the rsync output. I can stop the copying by killing the process buffer.

Setup for dired-rsync

The dired-rsync-options variable holds the options passed to rsync and therefore controls the output shown in the process buffer. It defaults to “-az --info=progress2”, which shows the overall progress in one line, clean and neat (not on macOS though, see Issue 36). Sometimes I prefer “-azh --progress” so I can see exactly which files are copied.

There is also an option for showing progress in the modeline (dired-rsync-modeline-status), as well as hooks for sending notifications on failure or success (dired-rsync-failed-hook and dired-rsync-success-hook).

Overall the library is well designed, and the default options work for me, so I can have a bare-minimum configuration as below (borrowed from ispinfx):

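(use-package dired-rsync
  :demand t
  :after dired
  ;; press r in a dired buffer to copy the marked files with rsync
  :bind (:map dired-mode-map ("r" . dired-rsync))
  ;; append the rsync status to the mode line
  :config (add-to-list 'mode-line-misc-info '(:eval dired-rsync-modeline-status) 'append))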

There are two more things to do on the system side:

  1. On macOS, the default rsync is a 2010 version. It does not work with the latest rsync I have on the Debian server, so I upgraded it using brew install rsync.

  2. There is no way of typing a password, which is a limitation of using a process buffer, so I have to ensure I can rsync without the remote server asking for a password. It sounds complicated, but fortunately it takes only a few steps, as described in Setup Rsync Between Two Servers Without Password.
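
Both points can be checked from a terminal before trying dired-rsync. The following is a minimal sketch, not a definitive recipe: the three function names are made up for this example, the version parsing assumes that rsync --version prints a "version X.Y.Z" string, and the 3.1 threshold reflects my understanding that --info=progress2 first appeared in rsync 3.1.0.

```shell
# Print the version number of the rsync found first on PATH.
# Assumes the output contains "version X.Y.Z".
local_rsync_version () {
    rsync --version 2>/dev/null |
        grep -oE 'version [0-9]+\.[0-9]+(\.[0-9]+)?' |
        head -n 1 | cut -d' ' -f2
}

# Succeed if the version string $1 (e.g. "3.2.7") is 3.1 or newer.
version_at_least_3_1 () {
    local major minor rest
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 1 ]; }
}

# Succeed if HOST accepts a login without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_login_without_password () {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, version_at_least_3_1 "$(local_rsync_version)" && can_login_without_password debian succeeds only when both conditions hold; debian here is the ssh host alias used earlier in this post.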

Enhance dired-rsync with compilation mode

It’s such a great library and it makes my life much easier. It could be improved further to provide a better user experience, for example by keeping the process buffer alive as a log after the copying has finished, because the user might want to look at it later.

At the moment, there’s no easy way of changing the arguments sent to rsync. I might want to do a dry run (adding the -n argument) so I can see exactly which files are going to be copied before running, or exclude certain files/folders, or rerun the copying if new files have been generated on RL.
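
In the meantime, the command line offers a workaround. The sketch below is a hypothetical wrapper (rsync_logged, RSYNC_BIN and RSYNC_LOG_DIR are names made up for this example) that passes any extra arguments straight to rsync and keeps the output in a timestamped log file.

```shell
# Hypothetical wrapper: rsync_logged SRC DEST [extra rsync options...]
# The output is shown on screen and also saved to a timestamped log file.
# RSYNC_BIN and RSYNC_LOG_DIR are assumed override hooks for this sketch.
rsync_logged () {
    local src dest log
    src=$1
    dest=$2
    shift 2   # whatever remains in "$@" goes to rsync unchanged
    log="${RSYNC_LOG_DIR:-.}/rsync-$(date +%Y%m%d-%H%M%S).log"
    # Note: the exit status of this pipeline is tee's, not rsync's.
    "${RSYNC_BIN:-rsync}" -avh "$@" "${src}/" "${dest}" 2>&1 | tee "$log"
}
```

For example, rsync_logged debian:~/Projects/2022-May/experiment ~/Projects/2022-May/experiment -n --exclude='*.pkl' previews the copy while skipping an assumed *.pkl pattern; recalling the command from the shell history and dropping -n performs the real copy.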

If you have used a compilation buffer before, you know where I am going. That’s right, I am thinking of turning the rsync process buffer into compilation mode, so that it would inherit these two features:

  1. Press g to rerun the rsync command when I know there are new files generated on the RL
  2. Press C-u g (g with prefix) to change the rsync arguments before running it, for a dry run, inclusion or exclusion

I don’t have much experience in elisp, but I had a quick look at the source code and it seems there’s no easy way of implementing this idea, so it is something to add to my ever-growing Emacs wish-list.

In fact, the limitation comes from using lower level elisp functions. The Emacs Lisp manual on Process Buffers states that

Many applications of processes also use the buffer for editing input to be sent to the process, but this is not built into Emacs Lisp.

What a pity. For now I enjoy using it and look for opportunities to use it.
