I switched to macOS last year for editing home gym videos. I was, and still am, amazed by how fast the M1 chip exports 4K video. macOS has also enriched my Emacs experience, which deserves a blog post of its own.
So I have been slowly adapting my Emacs configuration and workflow to macOS. One of the changes is the Emacs server.
The goal is to have fully loaded Emacs instances running all the time so I can use them at any time, anywhere, from the Terminal or Spotlight. They are started upon login, and if Emacs crashes (rare, but more often than I would like) or I have to stop them because I messed up the configuration, they restart automatically.
My setup is an extension of Emacs Plus' plist file. I made a few changes to run two Emacs servers: one for work (data science, research) and one for personal use (GTD, books). Taking the "work" server as an example, the important attributes of the plist configuration file are listed below (a sketch of the full file follows the list):
- Line 5: the unique service name for launchctl.
- Line 8: the full path to the Emacs program; in my case, /opt/homebrew/opt/emacs-plus@29/bin/emacs.
- Line 9: the --fg-daemon option sets the Emacs server name to "work". Later I can connect to this server by passing the -s work option to emacsclient.
- Line 13: KeepAlive is set to true so launchd keeps trying to restart the server in case of failures.
- Lines 16 and 18: the locations of the standard output and error files. They are used for debugging; occasionally I have to check them to see why the Emacs servers stopped working, usually because I introduced bugs in my .emacs.d.
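For reference, here is a sketch of what such a plist can look like, laid out so that the line numbers above match; the log file locations are illustrative and the real Emacs Plus template differs in the details:
#+begin_src xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>emacs_work</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/opt/emacs-plus@29/bin/emacs</string>
    <string>--fg-daemon=work</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/emacs_work_stdout.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/emacs_work_stderr.log</string>
</dict>
</plist>
#+end_src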
With the updated plist files in place, I start the Emacs servers with launchctl.
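Assuming the plist files are saved under ~/Library/LaunchAgents, the standard per-user location launchd scans (the file names here are my guess), the commands look roughly like this:
#+begin_src sh
# load (and start) the two services; -w also marks them to run at login
launchctl load -w ~/Library/LaunchAgents/emacs_work.plist
launchctl load -w ~/Library/LaunchAgents/emacs_org.plist

# check that both services are up
launchctl list | grep -i emacs
#+end_src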
The launchctl list | grep -i emacs is a handy snippet that lists the status of the services whose names include "emacs". The output I have right now is:
| PID   | Exit Code | Server ID  |
|-------+-----------+------------|
|  1757 |         0 | emacs_org  |
| 56696 |         0 | emacs_work |
It shows both Emacs servers are running fine with exit code 0.
Launch Emacs GUI in Terminal
I can now open an Emacs GUI connected to the "work" Emacs server by running emacsclient -c -s work &. The -c option creates a new graphical frame, and -s selects which server socket to connect to.
Launch Emacs GUI in Spotlight
In macOS, I find it natural to open applications using Spotlight: type ⌘ + space to invoke Spotlight, put "work" in the search bar, which narrows the search down to the "emacs_work" application, and hit return to launch it. It achieves the same thing as the command above but can be used anywhere.
I uploaded a demo video on YouTube to show it in action. You might want
to watch it at 0.5x speed because I typed so fast...
To implement this shortcut, open the "Automator" application, start a new "Application", select "Run Shell Script", and paste the following bash code:
/opt/homebrew/opt/emacs-plus@29/bin/emacsclient \
    --no-wait \
    --quiet \
    --suppress-output \
    --create-frame \
    -s work \
    "$@"
and save it as emacsclient_work in the ~/Applications folder.
Essentially, the bash script above is wrapped up as a macOS application named emacsclient_work, and Spotlight searches the Applications folder by default.
I’m working on replicating the (Re-)Imag(in)ing Price Trends paper. The idea is to train a Convolutional Neural Network (CNN) "trader" to predict stock returns. What makes this paper interesting is that the model uses images of the pricing data rather than the traditional time-series format. It takes financial charts like the one below and tries to mimic traders' behaviour of buying and selling stocks to optimise future returns.
Alphabet 5-Day Bar Chart Showing OHLC Price and Volume Data
To train the model, the price and volume data are transformed into black-and-white images, which are just 2D matrices of 0s and 1s. For only around 100 stocks' pricing history, there are around 1.2 million images in total.
I used an on-the-fly imaging process during training: in each batch, it loads the pricing data for a given stock, samples one day in the history, slices a chunk of pricing data, and then converts it to an image. It takes about 2 milliseconds (ms) to do all that, so it takes roughly 40 minutes to loop through all 1.2 million images.
1.92 ms ± 26.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
To train 10 epochs, that's more than six hours spent loading data. To train one epoch on the full dataset with 5,000 stocks, that's over 30 hours in loading data alone!
PyTorch utilises multiple processes to load data on the CPU while training on the GPU, so there the problem is less severe. But I'm using needle, the deep learning framework we developed during the course, and it doesn't have this functionality yet.
During training with needle, the GPU utilisation is only around 50%. Now that all the components of the end-to-end pipeline are almost complete, it is time to train with more data, go deeper (larger/more complicated models), try hyper-parameter tuning, etc.
But before moving to the next stage, I need to improve the IO.
Scipy Sparse Matrix
In the image above, there are a lot of black pixels, i.e. zeros in the data matrix. In general, only 5%-10% of the pixels are white in this dataset.
So my first attempt was to use scipy's sparse matrix instead of numpy's dense matrix: I save the sparse matrix once, then load it and convert it back to a dense matrix for training the CNN model.
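A minimal sketch of this round trip (the file name and image shape are made up for illustration):
#+begin_src python
import numpy as np
from scipy import sparse

# a dummy 0/1 black-and-white chart image, roughly as sparse as the real ones
image = (np.random.rand(60, 25) < 0.07).astype(np.uint8)

# save once as a compressed sparse matrix
sparse.save_npz("img_0001.npz", sparse.csr_matrix(image))

# at training time: load and convert back to a dense matrix
dense = sparse.load_npz("img_0001.npz").toarray()
#+end_src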
967 µs ± 4.99 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
It reduces the IO time to about 1 ms, roughly half of the original. Not bad, but I was expecting a lot more given how sparse the data is.
Numpy Bites
Then I realised the data behind the images is just 0s and 1s, in fact mostly 0s with only a few 1s, so storing a full number for every pixel is wasteful: one bit per pixel is enough to reconstruct the image exactly.
It is so simple that numpy has functions for this type of data processing already. The numpy.packbits function packs the 0/1 image matrix into a compact array of bytes, eight pixels per byte. Then numpy.unpackbits does the inverse: it unpacks the bytes back into a flat 0/1 array, which can be reshaped into the image matrix.
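A sketch of the save/load round trip (again, the file name and image shape are just for illustration):
#+begin_src python
import numpy as np

# a dummy 0/1 image standing in for a real chart
image = (np.random.rand(60, 25) < 0.07).astype(np.uint8)

# pack 8 pixels per byte and dump the raw bytes to disk
np.packbits(image.flatten()).tofile("img_0001.bin")

# load: read the bytes back, unpack to bits, and restore the shape
packed = np.fromfile("img_0001.bin", dtype=np.uint8)
restored = np.unpackbits(packed)[: image.size].reshape(image.shape)

assert (restored == image).all()
#+end_src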
This reduces the time of loading one image to about 0.2 milliseconds, 10 times faster than the on-the-fly method, with only a few lines of code.
194 µs ± 3.95 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Another benefit is that the file size is much smaller: 188 bytes per image compared to 1104 bytes for the sparse matrix. So it takes only about 226MB of disk space to save all 1.2 million images!
It took only a couple of minutes to generate the 1.2 million files on my Debian machine. But then I realised this approach is not scalable without modification, because a filesystem can only hold a limited number of files; the technical term is inodes. According to this StackExchange question, once the filesystem is created, the inode limit cannot be increased (yes, I have been there).
Without going down the database route, one quick workaround is to bundle images together, for example 256 images per file. Later, during training, I load 256 images in one go and then split them into chunks. I just need to ensure the number of images per bundle is a multiple of the batch size used in training so I don't have to deal with unequal batch sizes. Since the images inside a bundle are always trained together, bundling reduces the randomness of SGD, so I won't bundle too many images together; 256 sounds about right.
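A sketch of the bundling, building on the packed representation above (the function names are made up):
#+begin_src python
import numpy as np

def save_bundle(images, path):
    """Pack a list of 0/1 images of the same shape and save them as one .npy file."""
    packed = np.stack([np.packbits(img.flatten()) for img in images])
    np.save(path, packed)

def load_bundle(path, shape):
    """Load one bundle and restore the individual images."""
    packed = np.load(path)
    n_pixels = shape[0] * shape[1]
    return np.unpackbits(packed, axis=1)[:, :n_pixels].reshape(-1, *shape)
#+end_src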
LSP and other tools can cause problems when they monitor folders with a large number of files. Moving the generated files out of the project folder is the way to go, so Emacs won't complain or freeze.
I have been working on the Deep Learning Systems course. It is the hardest course I have studied since university. I would never have thought that I'd need CI for a personal study project; it just shows how complex this course is.
Here is the setup: the goal is to develop a PyTorch-like DL library that supports ndarray ops and autograd, and to implement DL models, LSTMs for example, from scratch. That's the exciting maths part. The tricky part is that it supports both CPU devices via C++11 and GPU devices via CUDA. On the user-facing side, the interface is written in Python. I work on my M1 laptop most of the time and switch to my Debian desktop for the CUDA implementation.
It was a fine Saturday afternoon. I had made a breakthrough in implementing the gradient of the convolution op in Python after a couple of hours of tinkering in a local coffee shop. I rushed home and booted up Debian to test the CUDA backend, only to be greeted by an "illegal memory access" error!
It took me a few cycles of rolling back to previous changes in git to find where the problems were. It made me think about the need for CI. In the ideal scenario, I would have a CI that automatically runs the tests on both CPU and CUDA devices, to ensure a bug fix on the CPU side doesn't introduce new bugs on the CUDA side, and vice versa. But I don't have this setup at home.
Two Components of PoorMan CI
So I implemented what I call PoorMan CI. It is a semi-automated process that gives me some of the benefits of a full CI. I tried hard to refrain from doing anything fancy because I don't have time: the final homework is due in a few days. The outcome is simple yet powerful.
The PoorMan CI consists of two parts:
- a bunch of bash functions that I can call to run the tests, capture the output, save it in a file, and version control it. For example, the snippet below gets wrapped in a single function:
pytest -l-v-k"not training and cuda"\> test_results/2022_12_11_12_48_44__fce5edb__fast_and_cuda.log
git add test_results/2022_12_11_12_48_44__fce5edb__fast_and_cuda.log
- a log file where I keep track of the code changes and whether each new change fixes anything or breaks anything.
In the example below, I have a bullet point for each change committed to git, with a short summary and a link to the test results. The fce5edb and f43d7ab are git commit hashes.
- fix grid setup, from (M, N) to (P, M)!
[[file:test_results/2022_12_11_12_48_44__fce5edb__fast_and_cuda.log]]
- ensure all data/parameters are in the right device. cpu and cuda, all pass! milestone.
[[file:test_results/2022_12_11_13_51_22__f43d7ab__fast_and_cuda.log]]
As you can see, it is very simple!
Benefits
It changed my development cycle a bit: each time before I claim something is done or fixed, I run this process, which takes about 2 minutes for the two fast runs. I use this time to reflect on what I've done so far, write a short summary of what got fixed and what broke, check the test results into git, update the test log file, etc.
It sounds tedious, but I found myself enjoying it: it gives me confidence and reassurance about the progress I'm making. The time spent reflecting also gives my brain a break and provides clarity on where to go next.
During my few hours of using it, it amazed me how easy it is to introduce new issues while fixing existing ones.
Implement in Org-mode
I don't have to use Org-mode for this, but I don't want to leave Emacs :) Plus, Org-mode shines at literate programming, where code and documentation live together.
This is actually how I implemented it in the first place. This section is dedicated to showing how to do it in Org-mode. I'm sure I will come back to this shortly, so it also serves as documentation for myself.
Here is what I did: I have a file called poorman_ci.org; a full example can be found in this gist. An extract is shown below.
I group all the tests logically into "fast and cpu", "fast and cuda", "slow and cpu", and "slow and cuda". I have a top-level header named grouped tests, and each group has its own 2nd-level header.
The top header has a property drawer where I specify the shell session within which the tests are run, so that:
* grouped tests
:PROPERTIES:
:CREATED: [2022-12-10 Sat 11:32]
:header-args:sh: :session *hw4_test_runner* :async :results output :eval no
:END:
- it is persistent: I can switch to the shell buffer named hw4_test_runner and poke around if needed
- it runs asynchronously in the background
All the shell code blocks under the grouped tests header inherit those attributes.
The first code block defines the variables used to create a run id, based on the timestamp and the git commit hash. The run id is then used by all the other code blocks.
#+begin_src sh :eval no
wd="./test_results/"
ts=$(date +"%Y_%m_%d_%H_%M_%S")
git_hash=$(git rev-parse --verify --short HEAD)
echo "run id: "${ts}__${git_hash}
#+end_src
To run a code block, move the cursor inside it and hit C-c C-c (Control-c Control-c).
Then I define the code block that runs all the tests on the CPU except the language model training. I name this batch of tests "fast and cpu".
#+begin_src sh :var fname="fast_and_cpu.log"
fname_full=${wd}/${ts}__${git_hash}__${fname}
pytest -l -v -k "not language_training and cpu" \
    2>&1 | tee ${fname_full}
#+end_src
- It creates the full path of the test results file. The fname variable is set in the code block header, which is a nice feature of Org-mode.
- pytest provides an intuitive interface for filtering tests; here I use "not language_training and cpu".
- The tee program shows the output and errors and at the same time saves them to a file.
Similarly, I define code blocks for "fast and cuda", "slow and cpu",
"slow and cuda".
So at the end of the development cycle, I open the poorman_ci.org
file, run the code blocks sequentially, and manually update the change
log. That's all.
For machine learning projects, I have tweaked my workflow so that interaction with the remote server is kept to a minimum. I prefer to do everything locally on my laptop (M1 Pro), where I have all the tools for data analysis, visualisation, debugging, etc., and I can do all of that without lag or a Wi-Fi connection.
The only use of servers is running compute-intensive tasks like recursive feature selection, hyperparameter tuning, etc. For that I ssh into the server, start tmux, git pull to update the codebase, and run a bash script that I prepared locally to fire off hundreds of experiments. All done in Emacs of course, thanks to Lukas Fürmetz’s vterm.
The only thing left is getting the experiment results back to my laptop. I have used two approaches for copying the data to my local machine: a file manager GUI and the rsync tool in the CLI.
Recently I discovered dired-rsync, which works like a charm: it combines the two approaches above, providing an interactive way of running rsync inside Emacs. What’s more, it integrates seamlessly into my current workflow.
They all have their own use cases. In this post, I briefly describe these three approaches for copying files, with a focus on dired-rsync: how to use it, how to set it up, and my thoughts on how to enhance it.
Note that RL stands for remote location, i.e. a folder on a remote server, and LL stands for local location, the RL’s counterpart. The action in question is how to efficiently copy files from RL to LL.
File Manager GUI
This is the simplest approach and requires little technical skill. The RL is mounted in the file manager, which acts as an access point, so it can be used just like a local folder.
I usually have two tabs open side by side, one for the RL and one for the LL, compare the differences, and then copy whatever is useful and exists in the RL but not in the LL.
I use this approach on my Windows work laptop, where rsync is not available, so I have to copy files manually.
Rsync Tool in CLI
The rsync tool is similar to cp and scp but much more powerful:
- It copies files incrementally, so it can be stopped at any time without losing progress
- The output shows which files have been copied, what remains, the copying speed, the overall progress, etc.
- Files and folders can be included/excluded by specifying patterns
I have a bash function in the project’s scripts folder as a shorthand, like this:
copy_from_debian_to_laptop () {
    # first argument to this function
    folder_to_sync=$1
    # define where the RL is
    remote_project_dir=debian:~/Projects/2022-May
    # define where the LL is
    local_project_dir=~/Projects/2022-May

    rsync -avh --progress \
        ${remote_project_dir}/${folder_to_sync}/ \
        ${local_project_dir}/${folder_to_sync}
}
To use it, I first cd (change directory) to the project directory in the terminal, call the copy_from_debian_to_laptop function, and use TAB completion to quickly get the directory I want to copy, for example:
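Something along these lines (the script file and folder names here are made up):
#+begin_src sh
cd ~/Projects/2022-May
source scripts/sync_functions.sh              # hypothetical file holding the function above
copy_from_debian_to_laptop experiment_results # the folder name is just an example
#+end_src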
This function is called more often from an org-mode file where I keep track of all the experiments.
Emacs’ Way: dired-rsync
This approach is a blend of the previous two, enabling the user to enjoy the benefits of a GUI for exploring and the power of rsync.
What’s more, it integrates well into the current workflow: simply call dired-rsync instead of dired-do-copy, or press the r key instead of the C key, using the configuration in this post.
To those who are not familiar with copying files using dired in Emacs, here is the step-by-step process:
- Open two dired buffers, one at the RL and one at the LL, either manually or using bookmarks
- Mark the files/folders to copy in the RL dired buffer
- Press the r key to invoke dired-rsync
- It asks where to copy to; the default destination is the LL, so press Enter to confirm
After that, a unique process buffer, named *rsync with a timestamp
suffix, is created to show the rsync output. I can stop the copying by
killing the process buffer.
Setup for dired-rsync
The dired-rsync-options variable controls the output shown in the process buffer. It defaults to "-az --info=progress2", which shows the overall progress in one line, clean and neat (not on macOS though, see Issue 36). Sometimes I prefer "-azh --progress" so I can see exactly which files are copied.
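Switching between the two is a one-line customisation (a sketch):
#+begin_src emacs-lisp
;; show per-file progress instead of the one-line summary
(setq dired-rsync-options "-azh --progress")
#+end_src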
There are other options too: showing progress in the modeline (dired-rsync-modeline-status), and hooks for sending notifications on failure/success (dired-rsync-failed-hook and dired-rsync-success-hook).
Overall the library is well designed, and the default options work for
me, so I can have a bare-minimal configuration as below (borrowed from
ispinfx):
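A minimal setup along these lines, loading dired-rsync and binding r in Dired buffers as described above, should be close to it (my sketch rather than the exact snippet):
#+begin_src emacs-lisp
(use-package dired-rsync
  :ensure t
  :config
  ;; press r in a Dired buffer to copy the marked files with rsync
  (bind-key "r" 'dired-rsync dired-mode-map))
#+end_src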
There are two more things to do on the system side:
- On macOS, the default rsync is a 2010 version. It does not work with the newer rsync I have on the Debian server, so I upgraded it using brew install rsync.
- There is no way to type a password in the process buffer, so I have to make sure rsync can reach the remote server without being asked for one. It sounds complicated, but fortunately it only takes a few steps, as described in Setup Rsync Between Two Servers Without Password.
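The usual recipe is key-based ssh authentication, roughly like this (the host alias debian matches the rsync function above):
#+begin_src sh
# generate a key pair if you don't already have one
ssh-keygen -t ed25519
# copy the public key to the remote server
ssh-copy-id debian
# rsync/ssh should now connect without prompting for a password
ssh debian echo ok
#+end_src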
Enhance dired-rsync with compilation mode
It’s such a great library and it makes my life much easier. It could be improved further to provide an even better user experience, for example by keeping the process buffer alive as a log after the copying has finished, because the user might want to have a look later.
At the moment, there’s no easy way of changing the arguments sent to rsync. I might want to do a dry run (adding the -n argument) so I can see exactly which files are going to be copied before running it, or I might need to exclude certain files/folders, or rerun the copying if new files have been generated on the RL.
If you have used the compilation buffer before, you know where I am going. That’s right, I am thinking of turning the rsync process buffer into compilation mode, so it would inherit these two features:
- Press g to rerun the rsync command when I know there are new files generated on the RL
- Press C-u g (g with a prefix) to change the rsync arguments before running it, for a dry run, inclusion, or exclusion
I don’t have much experience with elisp, but I had a quick look at the source code and it seems there’s no easy way of implementing this idea, so it goes onto my ever-growing Emacs wish-list.
The good thing about Emacs is that you can always tweak it to suit
your needs. For years I’ve been doing it for productivity reasons. Now
for the first time, I’m doing it for health reasons.
Life can be sht sometimes. When I was in my mid-20s and reshaping every aspect of my life for the better, the optician told me my vision could only get worse. I wasn’t paying much attention, busy with my first job and learning.
Last month, I was told my right eye’s vision got a whole point worse, whatever that means. Now I’m wearing a new pair of glasses, seeing the world in 4K with both eyes and noticing so many more details. It makes the world vibrant and exciting. It comes with a price though: my eyes get tired quickly, and it has become easy to get annoyed by little things.
One of those little things is switching windows in Emacs. Even though I am still adjusting to the new glasses, I decided to take some action.
Ace-Window
Depending on the complexity of the task, I usually have about 4-8 windows laid out on my 32-inch monitor. If that’s not enough, I add another frame with a similar window layout, doubling the number of windows to 8-16.
So I found myself switching between windows all the time. The action
itself is straightforward with ace-window.
The process can be broken down into five steps:
- Invoke the ace-window command by pressing the F2 key
- The Emacs buffers fade out (dim)
- A red number pops up in the top-left corner of each window
- I press the number key to switch to the window it is associated with
- After that, the content of each Emacs buffer is brought back
This approach depends on visual feedback - I have to look at the
corner of the window to see the number. Also, the screen flashes
twice during the process.
I tried removing the background dimming, increasing the font size of the numbers to make them easier to see, and a bunch of other tweaks.
In the end, my eyes were not satisfied.
Windmove
So I started looking for alternative approaches and found windmove
which is built-in.
The idea is simple: keep moving to the adjacent window, left, right, up, or down, until I arrive at the window I want.
So it uses the relative location between windows instead of assigning
each window a unique number and then using the number for switching.
Is it really better? Well, with this approach I use my eyes a lot less, as I do not have to look for the number. Plus, it feels more natural, as I do not need to work out the directions; somehow I just know I need to move right twice, or whatever, to get to the destination.
The only issue I have had so far is a conflict with org-mode’s calendar. I like the keybindings in org-mode, so I disabled windmove in org-mode’s calendar with the help of this stackoverflow question.
The following five lines of code are all I need to use windmove.
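Roughly, it looks like this (a sketch assuming the stock Shift+arrow bindings; the calendar-specific tweak from the stackoverflow answer would sit on top of it):
#+begin_src emacs-lisp
;; Shift+arrow moves point to the window in that direction
(windmove-default-keybindings)
;; let org-mode keep its own Shift+arrow commands, falling back to windmove elsewhere
(add-hook 'org-shiftup-final-hook 'windmove-up)
(add-hook 'org-shiftdown-final-hook 'windmove-down)
(add-hook 'org-shiftleft-final-hook 'windmove-left)
(add-hook 'org-shiftright-final-hook 'windmove-right)
#+end_src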
I created a git branch for switching from ace-window to windmove. I will try it for a month before merging it into the master branch.
Back to where it started
After using it for a few days, I realised this is the very package I used for switching windows back in 2014 when I started learning Emacs. I later switched to ace-window because it looked pretty cool.
Life is changing, my perspectives are changing, and so is my Emacs configuration. This time, it goes back to where I started eight years ago.