Yi Tang Data Science and Emacs

Rediscovering the Ancient Info Documentation System in the Age of LLMs


Table of Contents

  1. Travel with Info
  2. Why Info is not Popular?
  3. dir the Index File
  4. Setup Info in MacOS

Travel with Info

The best time to learn a difficult thing is while travelling. During my last two-week trip to Singapore and Malaysia, I was casually reading about Ledger CLI. Without much effort, it clicked: it suddenly started to make sense to me.

The more I learned, the more I wanted to learn. I could not wait for the next opportunity to open Emacs and dive into Ledger’s brilliant documentation. This is me with my Emacs in Changi Airport, next to the Jewel.


Emacsing next to the Rain Vortex in Changi Airport

I was able to apply what I learned and came up with the project rule to keep my data clean (I will blog about it next). The positive feedback energised me. The flight to London was ready for boarding, but I didn’t want to stop exploring during the 14-hour flight without Wi-Fi.

That’s where I rediscovered the Info documentation system. I used it to read the ledger.el library between sleep sessions 8000 feet above the ground. Reading plain text inside Emacs has great benefits: no distractions and friction-free note taking. It was a breeze.

Then I moved on to learning the Info documentation system itself: how to navigate, search the text and the index, and all that. I was able to pick it up quickly because the concepts and shortcuts feel native to me as an experienced Emacs user.

I envisioned using it to read all documentation, e.g. that of Python’s Pandas library. That would be ideal, I told myself. However, I soon realised that the Info documentation system is a niche tool: it is mostly used in GNU projects and Emacs libraries.

Why is it not popular? I wondered. I decided to have a go myself. Well, just getting started was already full of hiccups. This is a typical theme in learning legacy systems, and it could put many people off.

So here are a few notes that helped me learn Info. Hopefully they can bring more new users to the Info system.

dir the Index File

The first and most important thing I realised is that, in the context of Info, dir is not a directory but a plain text file. I simply call it the index file, and then the rest becomes so much clearer.

The dir/index file is the entry point of the Info program. It lists the available Info manuals with their names, Info file locations, and descriptions.

Setup Info in MacOS

Then there is a bug in emacs-plus: during the installation of Emacs, the dir file somehow gets deleted in the cleaning process. So the manuals for the default libraries that come with Emacs are not available. In my case, I only had a few Info manuals from the packages I installed afterwards, such as orderless and org-roam.

I took a slightly different approach to fix this problem: I kept the system-level tools separate from Emacs’s libraries, so I have two dir files.

 
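# manuals of system level programs
cd /opt/homebrew/share/info
# skip the dir file itself and anything that is not a regular file
for file in *; do [ -f "$file" ] && [ "$file" != dir ] && install-info "$file" dir; done

# manuals of Emacs and Emacs libraries
cd /opt/homebrew/share/info/emacs
for file in *; do [ -f "$file" ] && [ "$file" != dir ] && install-info "$file" dir; done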

Then tell Emacs the locations of those dir files as below.

 
(setq Info-directory-list
      (list "/opt/homebrew/share/info"
            "/opt/homebrew/share/info/emacs"))

Note that the convention is that each directory in the list contains a dir file. So in Emacs we specify the file by its directory, and the file happens to be called dir. I feel the naming could be improved to avoid such confusion!

After restarting Emacs, Info shows more than 500 manuals available, e.g. for the find tool, the mu4e library, and Ledger3.

Lastly, a quick note on install-info. As shown above, it is used to install Info manuals. Taking ledger3.info as an example, installing it requires:

 
install-info ledger3.info /opt/homebrew/share/info/dir

After that, the following line is added to the /opt/homebrew/share/info/dir file.

 
* Ledger3: (ledger3).           Command-Line Accounting

A bit of explanation:

  • *: marks the start of the entry.
  • Ledger3: is the name of the manual as it appears in the menu.
  • (ledger3): the name in parentheses is the Info file name without its extension.
  • Command-Line Accounting: is the description of the manual.
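
To make the anatomy of an entry concrete, here is a minimal sketch that splits the sample line into its three parts with sed. The helper name parse_dir_entry and the regular expression are assumptions made for illustration; they only cover the simple "* Name: (file). Description" layout and are not part of Info or install-info.

```shell
# A sample dir entry, copied from the line above.
entry='* Ledger3: (ledger3).           Command-Line Accounting'

# parse_dir_entry is a hypothetical helper, not part of Info or texinfo.
# It prints "name|file|description" for a simple
# "* Name: (file). Description" line and nothing for other lines.
parse_dir_entry() {
    printf '%s\n' "$1" |
        sed -n 's/^\* \([^:]*\): (\([^)]*\))\. *\(.*\)$/\1|\2|\3/p'
}

parse_dir_entry "$entry"
```

Running the same helper over every line of a dir file (for example in a while-read loop) would give a quick overview of which manuals are registered.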

Filter Ledger Transactions using Tags

I have been trying out Ledger-Cli to track my expenses, and so far I have found the tagging system useful. In my ledger journal, each transaction is associated with a project. For example, the transaction below is assigned to the project “2024 Monitor Stand”:

2024-12-08 Screwfix
    ; project: 2024 Monitor Stand
    Expenses:HomeImprovement:Tools            £ 4.99 
    Expenses:HomeImprovement:PPE             £ 19.98 
    Expenses:HomeImprovement:PPE             £ 14.99 
    ; :refund:
    Assets:Amex

This one-project-per-transaction constraint helps me avoid meaningless spending on shiny new tools. Operationally, imposing this limitation on my books provides flexible ways of querying the data.
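
A quick aside on the journal entry above before getting to the queries: the Assets:Amex posting has no amount, because ledger allows one posting per transaction to be left blank and fills it in so that the transaction balances. The sketch below reproduces that arithmetic with awk instead of ledger, purely as an illustration; the variable names are made up.

```shell
# The sample transaction from above, stored in a shell variable.
txn='2024-12-08 Screwfix
    ; project: 2024 Monitor Stand
    Expenses:HomeImprovement:Tools            £ 4.99
    Expenses:HomeImprovement:PPE             £ 19.98
    Expenses:HomeImprovement:PPE             £ 14.99
    ; :refund:
    Assets:Amex'

# Sum the expense postings (account, currency symbol, amount) and negate
# the total to get the amount implied for the posting left blank.
implied=$(printf '%s\n' "$txn" |
    awk '$1 ~ /^Expenses:/ && NF == 3 { sum += $3 } END { printf "%.2f", -sum }')

echo "Assets:Amex  £ $implied"
```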

One such query brings up the transactions that have no project assigned:

 
ledger reg exp and "expr" "not has_meta('project')" \
       --format "| %(date) | %P | %(amount) | %(note) |\n"

Table 1: posts without project

| Date       | Payee | Amount   | Note                                  |
|------------+-------+----------+---------------------------------------|
| 2024/12/22 | Selco | £ 30.570 | ; CaberFloor p5 T&G 2400x600x18mm x 2 |

There is only one posting where I forgot to add the project tag, so that is pretty good.

A bit of explanation of the ledger-cli query syntax:

  • exp: matches only accounts whose names contain ‘exp’; with my account names, that is all the expense accounts, i.e. Expenses:*
  • expr: invokes filters using expressions
  • has_meta('project'): checks whether the transaction has the metadata key ‘project’
  • and, not: logical operators
  • --format: specifies the output formatting
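
To make the has_meta filter less abstract, here is a rough stand-in written with awk rather than ledger. It only illustrates the idea (keep the transactions that carry no "; project:" line) and is not how ledger evaluates expressions. The two-transaction journal is a cut-down sample, and the Materials account name is an assumption.

```shell
# A tiny sample journal: the first transaction has a project, the second does not.
journal='2024-12-08 Screwfix
    ; project: 2024 Monitor Stand
    Expenses:HomeImprovement:Tools            £ 4.99
    Assets:Amex

2024-12-22 Selco
    Expenses:HomeImprovement:Materials       £ 30.57
    Assets:Amex'

# Paragraph mode (RS="") treats each blank-line-separated transaction as one
# record; FS="\n" makes $1 the header line with the date and payee.
untagged=$(printf '%s\n' "$journal" |
    awk 'BEGIN { RS = ""; FS = "\n" } $0 !~ /;[ ]*project:/ { print $1 }')

echo "$untagged"
```

For real books I would still rely on the ledger query, since it understands the journal format properly; the awk version is only useful as a sanity check of what the filter means.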

Another use case is counting the number of transactions per project. I use the number of purchased items as a proxy to gauge the project size.

 
ledger reg exp and "expr" "has_meta('project')" \
       --format "%(meta('project'))\n"  \
       | sort |  uniq -c | sort -bgr

Table 2: Number of items purchased for each project

| No. Items | Project                |
|-----------+------------------------|
|        39 | 2024 Loft Lights       |
|        34 | 2024 Loft Insulation   |
|        32 | 2025 Garage Conversion |
|         8 | 2024 Monitor Stand     |
|         2 | General                |
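
The counting itself is done by the sort | uniq -c | sort pipeline rather than by ledger. Here is a small sketch of that part in isolation, fed with made-up project names instead of real ledger output. The extra sed at the end is my addition; it only strips the padding that uniq -c puts in front of each count.

```shell
# One project name per line, as the --format "%(meta('project'))\n" option
# would print them; these sample lines are made up for illustration.
projects='2024 Loft Lights
2024 Monitor Stand
2024 Loft Lights
General
2024 Loft Lights
2024 Monitor Stand'

# sort groups identical lines, uniq -c prefixes each group with its count,
# and the final sort orders by that count, largest first.
# sed strips the platform-dependent padding that uniq -c adds.
counts=$(printf '%s\n' "$projects" | sort | uniq -c | sort -bgr | sed 's/^ *//')

echo "$counts"
```

The first sort matters: uniq only collapses adjacent duplicate lines, so unsorted input would produce several rows for the same project.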

The data shows that “2024 Loft Lights” is the largest project. It was a simple project in itself. However, since it was my first electrical project, I had to purchase a lot of stuff: 1.5mm cables, clamps, grommets, connectors, switches, sockets, etc.

Finally, I have the “refund” tag so I can flag items and remind myself to check whether I have received the refund in full.

 
ledger reg "expr" "has_tag('refund')" \
        --format "| %(date) | %P | %(amount) | %(note) |\n"

| Date       | Payee    | Amount   | Note                       |
|------------+----------+----------+----------------------------|
| 2024/12/08 | Screwfix | £ 14.990 | Site Optimus Gel Knee Pads |

So far I have enjoyed plain-text accounting with ledger-cli. The format and syntax are simple, yet I can do complicated queries.

Setup ssh-agent Systemd Service for Emacs

Problem Statement

My personal desktop is not booting (the motherboard is probably dead), so I have been setting up my server so that I can work while sorting things out.

I got stuck getting Magit to work in emacsclient: I thought I could run ssh-add inside Emacs and that would allow Magit to access my git repos over ssh, but apparently that is not the case.

After some digging, I learnt that the problem I have to solve is to run a single ssh-agent in the background and then make Emacs/Magit or any other program hook onto it. Then once I run ssh-add and type the passphrase for the first time, either inside Emacs or in a bash terminal, everything works.

Implementation

Save the following unit file as ~/.config/systemd/user/ssh-agent.service.

[Unit]
Description=SSH key agent

[Service]
Type=simple
Environment=SSH_AUTH_SOCK=%t/ssh-agent.socket
ExecStart=/usr/bin/ssh-agent -D -a $SSH_AUTH_SOCK

[Install]
WantedBy=default.target

The important things are

  1. The environment variable SSH_AUTH_SOCK is specified. It can be anywhere as long as this environment variable in other programs points to the same location.
  2. ssh-agent is invoked with the -a option to provide an address specified in the above step.

The %t is a specifier [1] in systemd; it is equivalent to the $XDG_RUNTIME_DIR variable in Debian. It points to the runtime temporary directory, which apparently is safer [2] than the /tmp directory. The runtime directory is cleaned up after the ssh-agent stops, so it is non-persistent.
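
Since everything hinges on all programs agreeing on the same socket path, a small check like the following can help when debugging. It is only a sketch: expected_sock is a hypothetical helper, and the fallback to /run/user/<uid> assumes the usual systemd layout when XDG_RUNTIME_DIR is not set.

```shell
# expected_sock is a hypothetical helper: it prints the socket path that the
# unit file's %t/ssh-agent.socket should resolve to for the current user.
expected_sock() {
    # Fall back to /run/user/<uid>, the usual location under systemd,
    # when XDG_RUNTIME_DIR is not set in this shell.
    printf '%s/ssh-agent.socket\n' "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
}

# Compare with what the current shell actually has.
if [ "${SSH_AUTH_SOCK:-}" = "$(expected_sock)" ]; then
    echo "SSH_AUTH_SOCK matches the systemd unit"
else
    echo "SSH_AUTH_SOCK is '${SSH_AUTH_SOCK:-unset}', expected '$(expected_sock)'"
fi
```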

To start the ssh-agent service:

 
systemctl enable --user ssh-agent
systemctl start --user ssh-agent

After that, update the unit file of Emacs to include this line (see my blog post Managing Emacs Server as Systemd Service for the full setup).

Environment=SSH_AUTH_SOCK=%t/ssh-agent.socket

To make it work for the bash shell and all other programs called from a bash terminal, add this line to ~/.bashrc.

 
export SSH_AUTH_SOCK="$XDG_RUNTIME_DIR/ssh-agent.socket"
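
To check from a terminal whether the agent is reachable, the exit status of ssh-add -l is handy: according to the ssh-add man page it is 0 on success, 1 if the command fails (which includes having no keys loaded) and 2 if the agent cannot be contacted. The sketch below wraps that in a hypothetical helper called describe_agent_status; the messages are mine.

```shell
# describe_agent_status is a hypothetical helper that turns the exit status
# of `ssh-add -l` into a short message.
describe_agent_status() {
    case "$1" in
        0) echo "agent reachable, keys loaded" ;;
        1) echo "agent reachable, no keys loaded yet (run ssh-add)" ;;
        2) echo "cannot contact the agent (check SSH_AUTH_SOCK)" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Only query the agent when ssh-add is installed.
if command -v ssh-add >/dev/null 2>&1; then
    status=0
    ssh-add -l >/dev/null 2>&1 || status=$?
    describe_agent_status "$status"
fi
```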

Alternatives

There are programs developed to solve this specific problem (see Debian wiki). While using such a program seems like a simpler alternative (e.g. keychain), I prefer to use systemd as the unified approach for managing background services. I have been using it for emacsclient, and I’m adding ssh-agent to it.

What is your preference? How do you solve this problem?

Footnotes

1 All the specifiers are listed here.

2 I am not a security expert but the StackExchange comments seem to make sense.

Retiring Raspberry Pi 4 as Home Server and NAS

Table of Contents

  1. Good Start for Self-Hosting
  2. Lack of NAS Capacity
  3. Looking for a Successor
  4. Unexpected
  5. Setting up z170a
  6. Power Consumption

Good Start for Self-Hosting

The little Raspberry Pi 4 (RP4) has served me well over the last two years. I used it to host Nextcloud/Syncthing for syncing files between devices, to scrape financial data from Yahoo Finance, and to run Time Machine backups for macOS.

The latest addition to the service stack is paperless-ngx. It allows my Canon printer/scanner to send digital copies of documents directly to the RP4 or Gmail.

The RP4 handles all the demands without showing any signs of struggle. It draws as little as 6W, while the Xbox One S draws 11W while sleeping. Thanks to the energy crisis in the UK, I started to appreciate the energy efficiency of the RP4. The ARM chip in it really impressed me.

Lack of NAS Capacity

A 3TB portable hard drive (WD My Passport) was attached to the RP4 to store media data. The USB 3.0 connection is surprisingly stable and fast. With both ends connected by ethernet cables, the file transfer speed can reach up to 100 MB/s. When my MacBook Pro uses Wi-Fi, the speed drops to about 40-50 MB/s, but it is still great because of the convenience.

Later I started using it as a NAS to store the Final Cut Pro library. The 4K home gym videos I shot using my iPhone 12 Pro are numerous [1]! The hard drive keeps getting filled up.

I could get another portable hard drive, but then it would get filled up again, say in less than a month? So it occurred to me that I need a proper home server with full NAS capacity.

Looking for a Successor

I did a bit of research but was not able to find a good product. I suspect the reason is that NAS builds are a niche area, while the PC industry is gaming-centric and focuses on faster, bigger, and fancier hardware with unnecessary RGB lights. That is where the profits are, I presume.

I came across some innovative products from China on AliExpress, such as the TopTon N5105 board. It is more powerful, consumes slightly more electricity, and has 6 SATA ports! It would be a perfect successor to my RP4.

But I am not comfortable ordering electronics from AliExpress: returning them or sending them back for repair would be a nightmare.

PS: The company is growing fast. It has continued to innovate, and the product line has been extended to the Intel N100 with an additional NVMe drive and a USB-C port. Their website and marketing materials have stepped up quite a notch. I kind of regret not taking the risk back then.

Unexpected

The other day, I was re-organising (again) my home office, so I had to move a bookshelf. I started moving it without taking everything off, and a motherboard fell off. It was the z170a with an i5-6600k and a heat sink attached to it. The motherboard was in my first desktop that I purchased 10 years ago when I started participating in Kaggle competitions in 2014.

After a quick inspection, I saw some pins were bent. I felt ashamed and sorry that I had not taken care of the motherboard. So I made a promise: if it survived the fall, I would use it for my NAS.

Well, it did, so I found my NAS.

Setting up z170a

While putting it together, one SATA port snapped and came off, but the rest are still fine. Apart from that, everything else went smoothly. Debian 12 has become much easier to install thanks to the isohybrid technology, and the non-free firmware is now part of the installation image itself.

The server setup scripts and configuration are saved in a selfhosted-services git repository, so restoring the services took little effort.

I had one little trick: I assigned the IP address of the RP4 to the new z170a server so that on the client side I didn’t have to change anything. This was achieved rather easily: a few clicks in the ASUS router web UI and then a reboot.

While setting it up, I noticed the z170a system is much more responsive, thanks to the 3.5 GHz i5-6600k CPU and an SSD that is much faster than the SD card. I was able to run multiple processes at the same time.

The longest part was copying files from the 3TB portable hard drive to the z170a’s internal HDD, which took about 20 hours.

It is very extensible: there are 3 free SATA ports for HDDs and two PCIe slots.

Power Consumption

The only downside is that it consumes a lot more electricity. When tested barebone, it drew only 10W. After putting everything together with additional HDDs, fans, and an ethernet cable, the power meter jumped to 45W. I removed the hard drives one by one to see which of them draws the most.

  • No HDD: 27W.
  • IronWolf alone: 32W, a 5W increase.
  • IronWolf + Seagate: 37W, another 5W increase.
  • IronWolf + Seagate + Toshiba: 45W, an 8W increase.

So I kept only the IronWolf, which is a 3TB NAS-grade HDD.

I also tried tweaking the BIOS and Linux kernel to get to C-states but I felt it was over-engineering so I am happily settled down with 27W.
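
To put the wattage figures into perspective, here is a rough sketch that converts a constant draw in watts into kWh per year. It assumes the machine runs 24 hours a day, 365 days a year at that draw, which is a simplification; kwh_per_year is a hypothetical helper, and electricity prices are deliberately left out.

```shell
# kwh_per_year is a hypothetical helper: constant draw in watts -> kWh per year,
# assuming the machine runs 24 hours a day, 365 days a year.
kwh_per_year() {
    awk -v watts="$1" 'BEGIN { printf "%.1f\n", watts * 24 * 365 / 1000 }'
}

# The barebone, no-HDD and fully loaded readings from the measurements above.
for watts in 10 27 45; do
    echo "${watts}W is about $(kwh_per_year "$watts") kWh per year"
done
```

Multiplying the result by the unit price on an electricity bill gives the yearly running cost for each configuration.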

Footnotes

1 I record weightlifting to correct and improve my techniques.

Use Ledger-Cli to Track DIY Project Expenses

Table of Contents

  1. Personal Technical Challenge
  2. Baby Steps
  3. Why? - Effort Estimation

Personal Technical Challenge

I used ledger-cli [1] before and it was a painful experience. The problem was not rooted in the tool but in how I intended to use it: I wanted to track all my expenses, from buying a cup of coffee to booking a holiday package. When I started this journey, it was a massive jump from knowing next to nothing about personal finance to doing double-entry accounting in plain text.

Though I gave up, it introduced me to the idea of owning my bank transaction data in text files on my personal computer. So over the years, I manually curated about 8 years of historical transaction data.

If you haven’t done so, I strongly recommend you go to your banks’ websites and download the transaction data manually, going as far back as you can. You will notice that the banks only give access to 3-5 years of data [2]. It’s a shame that banks use outdated technologies, but it is better than having nothing.

Since I had the data, I did some analysis and charts in Python/R. But I kept wondering what ledger-cli can offer. I occasionally saw blog posts on ledger-cli in the Emacs communities, so there must be something out there.

It has also become a personal challenge. I decided not to give up but to put it aside and tackle it again when I was older.

Baby Steps

Hopefully, I have become smarter as well. This time, to ensure I can successfully adopt the tool, I am going to reduce the scope to tracking DIY project expenses only.

I love DIY and I wish I had more days for DIY projects. It is usually labour-intensive, and I feel hyped and extremely confident after a couple of DIY projects. Pairing it with learning ledger-cli, a cognitively intensive activity, would make them a nice bundle [3].

Though the usage is simple, the question it can answer is important. I want to know, during or after a DIY project, exactly how much it costs. I could use a much simpler tool, like a spreadsheet or a pen and notebook, but I want it to be a stepping stone to learning ledger-cli properly in the future.

Why? - Effort Estimation

I need an accurate answer to the actual costs so that I can use the data to train myself in cost estimation. This is a very important skill to have as a homeowner: it would put me in a much better position when negotiating with tradesmen. A lot of people in the UK complain that they or their relatives got ripped off by tradesmen. [4]

In general, house repairs and improvements are getting much more expensive every year, due to the shortage of labourers, inflation and Brexit etc. To give an example using my last two quotes, adding an electrical socket costs £240 and replacing a small section of water pipes costs £500.

I have a good habit of using org-mode to track time, and my goal is to add ledger-cli to my system to track the expenses. After that, I would know whether it is really worth doing the DIY myself or finding a proper tradesman. The total cost is not the only metric that matters, but it is a very essential one to have.

Footnotes

1 https://ledger-cli.org/

2 Why don’t banks give access to all your transaction activity?

3 I might have picked it up from Atomic Habits

4 How many of you have been ripped off by builders / tradesmen? (or know someone closely that has)
