I am having a hard time keeping my git repositories clean: there are
just too many of them (I counted 31 in total), and I work on them
across 5 computers.
The consequence is that I sometimes get surprised to see a lot of
seemingly useful changes that are not committed to the git repo. I have
to stop whatever I am doing just to think about what to do with those
changes. It breaks the flow!
There are other occasions where I thought I had fixed some bugs, but I
don’t have the patches on my laptop. It turns out I didn’t push them
to the cloud, so I have to log back in to the right server to run a couple
of git commands, or if I don’t have access to the server, I have
to fix the bugs from scratch again. It is inefficient!
It can happen a lot in active projects where I work on multiple
systems and multiple git repos, or when I travel. I plan to revisit my
filesystem (which is inspired by Stephen Wolfram [1]) and tech
setup to reduce the number of repos by merging them and keeping only
1 laptop, 1 workstation and 1 server. This is something for the summer; it
can reduce the severity of the problem but cannot eliminate it.
For the moment, I just have to become more disciplined in managing
files, e.g. by building an atomic habit of checking my git repos regularly,
or at least once at the end of the day, or as part of the
shutdown ritual after finishing a task [2].
Emacs Lisp Helper
The 3rd Law of Behavior Change is make it easy.
James Clear, Atomic Habits
To help form this habit, I implemented a utility
function in Lisp that lists the dirty git repos and provides a clickable
link to the magit-status buffer of each repo. With one click on
the hyperlink, I can start running git commands via the mighty magit
package. I bind this action to the keystroke F9-G.
The workhorse is the git status --porcelain command: if the git repo
is clean, it returns nothing; otherwise, it outputs the names of the files
whose changes are not checked in. In the example below, the first file is
modified (M) and the second file is untracked (??).
M config/Dev-R.el
?? snippets/org-mode/metric
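To make the idea concrete outside Emacs, here is a minimal shell sketch of the same check. It is not the Lisp helper described in this post; the function names summarize_porcelain and list_dirty_repos are made up for illustration, and it assumes the short porcelain format, where each line is a two-character status, a space, and then the path.

```shell
# Hypothetical helpers, not the author's Lisp function.

# Read `git status --porcelain` output on stdin and print one
# readable line per changed file.
summarize_porcelain() {
    while IFS= read -r line; do
        case "$line" in
            '')    ;;  # ignore empty lines
            '??'*) printf 'untracked: %s\n' "${line#???}" ;;
            *)     printf 'changed: %s\n' "${line#???}" ;;
        esac
    done
}

# Print every repository given as an argument whose working tree is dirty,
# followed by an indented summary of its changes.
list_dirty_repos() {
    for repo in "$@"; do
        changes=$(git -C "$repo" status --porcelain 2>/dev/null)
        if [ -n "$changes" ]; then
            printf '%s\n' "$repo"
            printf '%s\n' "$changes" | summarize_porcelain | sed 's/^/  /'
        fi
    done
}
```

It would be called with the repositories to check, for example list_dirty_repos ~/foo ~/bar (both paths are placeholders). A clean repo prints nothing, which mirrors the behaviour of the porcelain command itself.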
The rest of the code parses the output and turns it
into a user-friendly format in Org-mode. What’s interesting is that
Org-mode provides a kind of hyperlink that evaluates Lisp
expressions. Take the link
[[elisp:(magit-status "/foo")][Git Status of Repo /foo]] as an example.
The description of the hyperlink is “Git Status of Repo /foo”; after
I click it, it runs the expression (magit-status "/foo"), which shows
the git status of the /foo repo in a dedicated buffer.
Before executing, it asks for confirmation. This can be a bit
annoying and inconvenient at first, which naturally leads to the
temptation of removing this behaviour by setting
org-link-elisp-confirm-function to nil. I discourage you from doing
so in case someone embeds nasty code (for example rm -rf ~/) in
a hyperlink, so make sure to check that variable’s documentation
before changing it [3]!
Practise
It was fun to write the Lisp functions. I learnt how to use the
optional function argument and interactive so that the function can
be used both interactively and programmatically. I would very much like to
spend more time coding to enhance it with some ideas I got from
reading Xu Chunyang’s osx-dictionary package [4].
However, the effectiveness of those functions has little to do with
the extra features I had in mind; it really depends on how I use
them. Solving the problem requires deliberate practice and changing
my behaviour so that cleaning git repos becomes a habit of mine, which
is always the hardest part.
One key indicator for this habit [5] can be the number of check-ins:
I will see whether there is a substantial increase from today.
Continuing from my last post: EPA provides a seamless interface when
working with GPG files in Emacs. But there are situations where I have
to work with GPG files using other programs (mostly Python), where EPA
cannot help.
For those cases, I have to decrypt the GPG files first before using
them (for example, calling pandas.read_csv).
Obviously, there’s no point in encrypting a file if there is a
decrypted version next to it. So I also need a function to delete all
the decrypted files.
Emacs Lisp Implementation
Since I run Python inside Emacs, I wrote Lisp functions to
decrypt GPG files and delete all the decrypted files.
A bit of explanation:
directory-files-recursively: searches for files with a
pattern. Here, it returns all the files ending with .gpg under the
given root-dir,
dolist: loops over the GPG files to process them one by one,
epa-decrypt-file: decrypts a GPG file into a new file.
delete-file: deletes a given filename.
It seems the epa-decrypt-file function does not like the new
filename with the directory in its path, so I have to set the default
directory (working directory) and use the base filename after removing
the directory as a workaround.
Bash Implementation
It would be useful to have the same functionality outside of Emacs,
so I implemented counterparts in Bash.
The interface is the same: given a root directory, it decrypts all the
GPG files or deletes the decrypted files.
A little bit of Bash:
$1: refers to the first function argument, $2 refers to the
second function argument and so on. This is the Bash way. When the
function is called, $1 will be replaced with the actual argument,
here it means the root directory.
$(find …): is a list of files returned by the find program. In
this context, it stands for all the files whose filename ends with
.gpg.
It can be achieved using the ls program but it will be a lot
slower [1] and requires some configuration in macOS [2].
${fn%.*}: removes the last file extension of the variable $fn,
for example, foo.tar.gz.gpg becomes foo.tar.gz.
Another approach is using $(basename $fn .gpg) to remove the
.gpg extension explicitly; note that basename also drops the directory
part of the path.
for, do, done: loops through each file.
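The difference between the two ways of stripping the extension is easy to miss, so here is a small sketch that puts them side by side. The path data/foo.tar.gz.gpg is only an example value.

```shell
fn="data/foo.tar.gz.gpg"

# ${fn%.*} removes the shortest suffix matching ".*", i.e. only the
# last extension; the directory part is kept.
stripped="${fn%.*}"
echo "$stripped"    # data/foo.tar.gz

# basename removes the directory part as well as the given suffix.
base="$(basename "$fn" .gpg)"
echo "$base"        # foo.tar.gz

# ${fn%.gpg} is a stricter variant that only strips a literal .gpg suffix.
strict="${fn%.gpg}"
echo "$strict"      # data/foo.tar.gz
```

Because basename drops the directory, the first or third form is the safer choice when the decrypted file should land next to the encrypted one.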
The Bash functions have the advantage of being easily incorporated
into the system, for example, call the remove_decrypted_files
function automatically prior to shutting down or after login.
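Putting the pieces above together, the two functions might look roughly like the sketch below. Only the name remove_decrypted_files appears in this post; decrypt_gpg_files is a made-up name, and both bodies are my assumption of the behaviour described, not the original code. The sketch pipes find into a while loop instead of looping over $(find …) so that file names with spaces survive.

```shell
# Sketch only: function bodies are assumptions based on the description.

# Decrypt every *.gpg file under the root directory $1, writing the
# plain file next to it (foo.csv.gpg -> foo.csv).
decrypt_gpg_files() {
    find "$1" -type f -name '*.gpg' | while IFS= read -r fn; do
        # --yes overwrites an existing decrypted copy without asking;
        # stdin is redirected so gpg cannot swallow the list of files.
        gpg --yes --output "${fn%.*}" --decrypt "$fn" < /dev/null
    done
}

# Delete the decrypted counterpart of every *.gpg file under $1.
# The encrypted files themselves are left untouched.
remove_decrypted_files() {
    find "$1" -type f -name '*.gpg' | while IFS= read -r fn; do
        rm -f -- "${fn%.*}"
    done
}
```

Both take the root directory as their only argument, for example decrypt_gpg_files ~/finance followed later by remove_decrypted_files ~/finance (the directory is a placeholder).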
I have growing concerns about data security. It is not that I have
something to hide; it’s that I don’t like how my data is being
harvested by the big corporations for their own benefit,
which mostly means trying to sell me stuff that I don’t need or have
already purchased. Seeing advertisements specifically targeting me
motivates me to do something.
Setting up my own personal cloud seems a bit too extreme, and I don’t have
the time for it anyway. So I did a little “off-the-grid” experiment
in which I exclusively used an offline Debian laptop for
data-sensitive work (password management, personal finance, diary
etc). It is about as secure as it gets, but the problem is
accessibility: I can only work when I have access to the physical
hardware.
It becomes infeasible when I travel, and it gives me some headaches to
maintain one more system. Also, the laptop’s screen is only 720p; I
can literally see the pixels when I write, and it feels criminal not to
use the MBP’s Retina display. Lastly, it cannot be off the grid
completely; at some point, I have to back it up to the cloud.
So I spent some time researching and learning. I just need a data
protection layer so that I don’t have to worry about leaking private
data accidentally by myself, or the cloud storage provider getting hacked.
The benefits include not only having peace of mind but also
encouraging myself to work on those types of projects with greater
convenience.
GNU Privacy Guard (GPG)
GPG is the tool I settled on. It is 24-year-old software that enables
encrypting/decrypting files, emails or online communication in
general. It is part of the GNU project, which carries a lot of weight with me.
There are two methods in GPG:
Symmetric method: The same password is used to both encrypt and decrypt
the file, thus the symmetric in its name.
Asymmetric method: It requires a public key to encrypt, and a
separate private key to decrypt.
There seems to be no clear winner as to which method is better [1]. I chose
the asymmetric method simply for its ease of use. The symmetric method
requires typing the password twice whenever I save/encrypt the file,
which seems too much.
The GPG command line interface is simple. Take the below snippet as an
example,
The first line encrypts the foo.org file using the public key identified as
“Bob”. It results in a file named foo.org.gpg.
The second line decrypts the foo.org.gpg file to foo2.org, which will
be identical to foo.org.
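For reference, the two commands described above could be written as in the sketch below. The recipient “Bob” and the file names come from the text; the wrapper function names encrypt_for and decrypt_to are made up, and the wrappers exist only to make the argument order explicit.

```shell
# Hypothetical wrappers around the two gpg calls described in the text.

# Encrypt FILE ($2) with the public key identified by RECIPIENT ($1).
# By default gpg writes the result to FILE.gpg.
encrypt_for() {
    gpg --encrypt --recipient "$1" "$2"
}

# Decrypt ENCRYPTED ($1) into OUTPUT ($2); needs the matching private key.
decrypt_to() {
    gpg --output "$2" --decrypt "$1"
}
```

With these, encrypt_for Bob foo.org produces foo.org.gpg, and decrypt_to foo.org.gpg foo2.org writes the decrypted copy; running cmp foo.org foo2.org afterwards is a quick way to confirm the round trip.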
EPA - Emacs Interface to GPG
Emacs provides a better interface to GPG: Its EPA package enables me
to encrypt/decrypt files in place. So I don’t have to keep jumping
between the decrypted file (foo.org) and the encrypted file
(foo.org.gpg) while working on it.
Below is the simple configuration that works well for me and its
explanation.
epa-file-enable: is called to add hooks to find-file so that
decrypting starts after opening a file in Emacs. I believe it also ensures
that encrypting starts when saving a GPG file.
To stop this behaviour, call the (epa-file-disable) function.
epa-file-encrypt-to: chooses the default key for
encryption.
This variable can be file specific, for example, to use the key
belonging to foo2@bar.com, put the following at the top of the file
;; -*- epa-file-encrypt-to: ("foo2@bar.com") -*-
epg-pinentry-mode: should be set to loopback so that GPG reads
the password from Emacs’ minibuffer, otherwise, an external program
(pinentry if installed) is used.
Org-Agenda and Dired
There are more benefits Emacs offers in working with GPG files. Once I
have EPA configured, the org-agenda command works pretty well
with encrypted files with no extra effort.
In the simplified example below, I have two GPG files as
org-agenda-files. When org-agenda is called, Emacs first tries
to decrypt the foo.org.gpg file. It requires me to type the password
in the minibuffer.
The password will be cached by the GPG Agent and will be used to
decrypt the bar.org.gpg assuming the same key is used for both files. So I
only need to type the passphrase once.
After that, org-agenda works as if these GPG files are normal
unencrypted files; I can extract TODO lists, view the clock summary
report, search text and check schedules/deadlines etc.
Dired provides functions to encrypt (shortcut “:e”) and decrypt
(shortcut “:d”) multiple marked files in a dired buffer. Under the
hood, they call the epa-encrypt-file and epa-decrypt-file
functions.
Lisp to Close all GPG Files
It seems that once a GPG file is opened in Emacs, its buffer stays
decrypted for as long as it is open, even though the file is encrypted
again on saving. So I need a utility
function to close all the GPG buffers in Emacs to avoid leakage.
Before I share my screens or start working in a coffee shop, I would
call this function to ensure I close all buffers with sensitive data.
My notes show I have had this issue for 9 months. I made another
attempt, but still could not find a solution!
Solution
I then switched to tidying up my Emacs configuration, and the variable
org-html-prefer-user-labels caught my eye.
Its documentation says
By default, Org generates its own internal ID values during HTML
export.
When non-nil use user-defined names and ID over internal ones.
So “#org0238b9f” is generated by org-mode. They are randomly
generated; they change if I update the export file. It means every
time I update a blog post, it breaks the URLs. This was a problem I
wasn’t aware of.
Anyway, what’s important is that, in the end, it says
Independently of this variable, however, CUSTOM_ID are always
used as a reference.
That’s it, I just need to set CUSTOM_ID. That’s the solution to my
problem. It is hidden in the documentation of some variables…
Implementation
So I need a function to loop through each node, and set the CUSTOM_ID
property to its headline. The org-mode API provides three helpful
functions for working with org files:
org-entry-get: to get a textual property of a node. The headline
title is referenced as “ITEM”,
org-entry-put: to set a property of a node,
org-map-entries: to apply a function to each node.
I changed the final function a bit so it is used as an export hook
(org-export-before-processing-functions) as an experiment. With this
setup, it runs automatically whenever I export a blog post in org-mode
to Markdown. Also, it works on the exported file so it leaves the
original org file unchanged.
The code is listed below. It can also be found at my .emacs.d git repo
which includes many other useful Emacs configurations for Jekyll.
I’m the type of writer who writes first and comes up with the title
later. The title in the end is usually rather different to what I
started with. To change the title is straightforward - update the
title and date fields in the front matter.
However, doing so leads to discrepancies between the title and date
fields in front matter and the filename. In Jekyll, the filename
consists of the original date and title when the post is first
created.
This can be confusing sometimes in finding the file when I want to
update a post. I have to rely on grep/ack to find the right files. A
little bit of inefficiency is fine.
Recently, I realised that readers sometimes can be confused as well
because the URL apparently also depends on the filename.
For example, I have my previous post in a file named
2022-12-08-trx-3970x.md. It indicates that I started writing it on 08
Dec with the initial title “trx 3970x”. A couple of days later on 13
Dec, I published the post with the title “How Much Does Threadripper
3970x Help in Training LightGBM Models?”.
The URL is however yitang.uk/2022/12/13/trx-3970x. It has the correct
updated publish date, but the title is still the old one. This is just
how Jekyll works.
From that point, I decided to write a bit of Emacs Lisp code to help the
readers.
Emacs Lisp Time
The core functionality is updating the filename and front matter to
have the same publish date and title. It can be broken down into three
parts:
When called, it prompts for a new title. The publish date is set to
the time the function is called.
It renames the current blog post file with the new date and title.
It also updates the title and date fields in the front matter
accordingly.
It deletes the old file, closes the related buffer, and opens the
new file so I can continue to work on it.
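My implementation is in Emacs Lisp, but purely to make the three steps concrete, here is a rough shell sketch of the renaming and front matter update. Everything in it is an assumption for illustration: the function names, the slug rule (lowercase, non-alphanumerics collapsed to hyphens), the plain YYYY-MM-DD date format, and the .md extension. It takes the title as an argument instead of prompting, and it does not deal with Emacs buffers at all.

```shell
# Hypothetical shell sketch, not the Emacs Lisp code from this post.

# Turn a title into a filename-friendly slug.
slugify() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' |
        sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

# Print FILE ($1) with the title and date fields replaced, but only
# inside the front matter (between the first two --- lines).
update_front_matter() {
    awk -v title="$2" -v date="$3" '
        /^---$/ { fences++; print; next }
        fences == 1 && /^title:/ { print "title: " title; next }
        fences == 1 && /^date:/  { print "date: " date; next }
        { print }
    ' "$1"
}

# rename_post OLD_FILE NEW_TITLE [DATE]: write the updated post under
# the new name, remove the old file, and print the new path.
rename_post() {
    date_str=${3:-$(date +%Y-%m-%d)}
    new_file="$(dirname "$1")/$date_str-$(slugify "$2").md"
    update_front_matter "$1" "$2" "$date_str" > "$new_file.tmp" || return 1
    rm -- "$1"
    mv -- "$new_file.tmp" "$new_file"
    printf '%s\n' "$new_file"
}
```

The updated content is written to a temporary file before the old file is removed, so the sketch still works when the new name happens to equal the old one.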
My Emacs Lisp coding skill is rusty but I managed to get it working in
less than 2 hours. I won’t say it looks beautiful, but it does the
job!
I spent a bit of time debugging. It turns out that (org-show-all)
needs to be called first to unfold the org file; otherwise, editing
with some parts of the content hidden can lead to unexpected results.
I have always found working with filenames/directories in vanilla Emacs
Lisp cumbersome. I wonder if there is any modern Lisp library with a
better API, something like Python’s pathlib module?
Code
Here are the main functions in case someone needs something similar.
They are extracted from my Emacs configuration.