Once upon a time, I didn’t: within a few years of university, I had managed to blow through all my savings, loans, grants, parental support, and overdraft. There were times when I couldn’t withdraw money from a cash machine because I had less than £5 to my name (well, less than -£1995, if you exclude the overdraft). I had to plan carefully around the monthly income my parents gave me, and sometimes had to delay rent payments. And then I decided to do a Ph.D, which I didn’t get funding for and would have to pay for myself.
The situation was untenable; things had to change. I started tracking all my spending. Then, once I knew where the money was going, I started to budget and to rein my spending in.
I’ve now been tracking my finances, down to the penny, with entirely manual data entry, since 2016. And I couldn’t imagine doing it any other way.
I’ve essentially adopted the You Need A Budget (YNAB) principles, though I don’t use the actual YNAB software (more on that later). I’ve also drawn inspiration from the /r/ukpersonalfinance flowchart.
My personal finance principles are:
Track everything, by hand.
Every transaction I make, I note down in my journal. Having the data lets me analyse it and set goals, and doing it by hand gives me a greater appreciation of where my money is going (as well as adding a little extra friction to each transaction, which is sometimes enough to avoid an impulse purchase).
Use envelope budgeting for short-term money.
I consider my bank account “empty”, in a sense: there’s no money in the account itself; all the money is in subaccounts allocated to specific purposes, like rent or food or web servers. And those purposes are specific: there’s no “savings” category, for instance. My bank doesn’t actually provide these subaccounts; they’re just something in my tracking, though some banks (like Monzo and Starling) do provide them.
I’ve specified “short-term” here because I don’t envelope-budget my long-term investments: I’m not going to touch those for 5 to 10 years, or longer, so there’s no way I can predict how I’ll want to use them.
Budget everything monthly.
Like most people in the UK, I get paid once a month. So it’s easy to budget for monthly expenses like rent or utilities. There’s a consistent amount spent every month, so I can just allocate that much of my monthly income to pay for it. Somewhat harder are the expenses which occur less frequently: it’s easy to forget about these, not budget for them in advance, and then rush to find the money to pay for them.
But these infrequent expenses can be treated as a monthly expense, by dividing the cost by 12 (or by however many months are between payments on average) and budgeting that much every month. Then when the expense comes around, the money has been put aside.
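To make the arithmetic concrete, here’s a small Python sketch. The expense names, amounts, and intervals are invented for illustration, not taken from my real budget:

```python
# Sinking funds: treat an infrequent expense as a monthly one by dividing
# its cost by the number of months between payments, and putting that
# much aside every month.  All figures here are illustrative.
expenses = {
    # name: (cost, months between payments)
    "car insurance": (480.00, 12),
    "passport renewal": (94.00, 120),  # roughly every ten years
}

monthly_contribution = {
    name: round(cost / months, 2) for name, (cost, months) in expenses.items()
}
print(monthly_contribution)
```

By the time the bill arrives, the envelope already holds the full amount.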
A model is only useful if it reflects reality.
If I’m consistently over- or under-spending in a budget category, the budget needs to change. For example, there is absolutely no use in budgeting £200 for food every month if I always overshoot that by getting a bunch of takeaways and have to make up the difference elsewhere: much better to budget the actual amount, and then work to reduce how many takeaways I get until I’m consistently spending under my target.
The purpose of tracking everything and of making budgets is so that I can make predictions about the future. But those predictions are worthless if the data used to produce them is unrealistically optimistic or just downright wrong.
Save more than you spend.
Every month, I should save (as cash or by investing) more than I spend. If I spend all my income, I’m living paycheque to paycheque, and that means any disruption or reduction in my income could be dangerous; and if I spend all my income and then some, I’m gradually running out of money.
These principles have lifted me out of financial ruin (or close to it), and set me on the path to wealth. Sure, I also have a high-paying job, which helps a lot; but if I had the same spending habits now that I did in my university days, I would have almost no savings and would be living in fear of how I would survive if I were to lose my job.
Lots of people use YNAB, or Excel, or Google Sheets to track their spending. I use plain-text accounting; specifically, hledger. The tool you use doesn’t actually matter, so long as it works for you. Nor do you need to track down to the penny, as I do. Some people round all their spending to the nearest pound, or even larger amounts, and track that. Some people don’t track cash at all, and just mark any money they withdraw as “spent”.
It’s more important that you do enough tracking to help you meet your goals. Don’t let the perfect be the enemy of the good.
While the most straightforward metric of financial health is the monthly change to my assets (positive is good: I’m saving more than I spend; negative is bad: I’m spending more than I save; zero is kind of bad: I’m spending everything), there are a few other metrics I look at.
Firstly, there are two metrics which I don’t generally see mentioned online, but they’re valuable to me because I can use them to compute other, more directly useful, metrics:
Average daily expense:
This is the total money spent over the period (excluding anything taken before income hits my bank account: like income tax or my student loan), divided by the number of days in the period.
As a Prometheus time series, this is:
```
(
    sum(hledger_balance{account="expenses"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_balance{account="expenses"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_balance{account="expenses:gross"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  + on(target_currency)
    sum(hledger_balance{account="expenses:gross"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
) / $agg_window
```
Let’s break that down:
- `hledger_balance` is a time series of end-of-day account balances. It has labels `account` and `currency`.
- `hledger_fx_rate` is a time series of daily currency exchange rates, which I collect at 9PM UK time. It has labels `currency` (so it can be combined easily with `hledger_balance`) and `target_currency`.
- `{account="expenses"}` is the parent account of all other expense accounts, so it contains their balances too. All expense accounts are strictly positive: money moves into `expenses` from other accounts.
- `{account="expenses:gross"}` is the account I use to track deductions from my gross pay.
- `$currency` is a Grafana dashboard variable defining the currency I want to see the result in; usually that’s `GBP`.
- `$agg_window` is another dashboard variable, defining the number of days to average over; usually 365.

So this is saying: “take the expenses (excluding pay deductions) now and `$agg_window` days ago, subtract them to work out how much I’ve spent over that entire time, and divide by `$agg_window` to work out the average daily spend.”
Average daily income:
This is the total income over the period (excluding gifts), divided by the number of days in the period.
The Prometheus expression is pretty similar to the average daily expense:
```
(
    sum(hledger_balance{account="income"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_balance{account="income"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_balance{account="income:gift"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  + on(target_currency)
    sum(hledger_balance{account="income:gift"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
) / $agg_window * -1
```
The `* -1` at the end is because all the `income` accounts are strictly negative (money moves out of `income` into other accounts).
Now we can compute some more interesting metrics.
Net worth:
If I paid off all my debts right now, how much money would I have left?
```
  sum(hledger_balance{account="assets"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
+ on(target_currency)
  sum(hledger_balance{account="liabilities"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
```
For the same reason as `income`, `liabilities` here is strictly negative.
You could exclude things that aren’t “real” debts here, like a student loan, if you wanted. But I include it.
Savings rate:
For every calendar month (since I get paid monthly) divide the saved income by the net income, then take the average of all those values.
```
# saved income
(
    sum(hledger_monthly_decrease{account="income"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_monthly_increase{account="expenses"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
)
/ on(target_currency)
# net income
(
    sum(hledger_monthly_decrease{account="income"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_monthly_increase{account="expenses:gross"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
)
```
There’s no explicit averaging in this expression because Grafana does that for me.
This uses a couple of new metrics:
- `hledger_monthly_decrease` is the amount of money moved out of the account (as a non-negative number) in that calendar month.
- `hledger_monthly_increase` is the amount of money moved into the account (as a non-negative number) in that calendar month.
Unfortunately I can’t just use `hledger_balance` for this, because Prometheus doesn’t allow aggregating data by calendar month, and months are not all the same length. But even if it did, I think this approach would still be more straightforward. Before migrating to Prometheus, I used a significantly more complicated InfluxDB-based dashboard which did attempt to work out the savings rate from the balances. It was pretty complex, and it also wrongly counted receiving a loan (a liability) as income.
So this is saying that my saved income is the amount `income` has gone down by (remember: money moves from `income` into other accounts), minus the amount `expenses` has gone up by. Whereas my net income is the amount `income` has gone down by (i.e., gross income), minus the amount `expenses:gross` (pay deductions) has gone up by.
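As a sanity check on those definitions, here’s the same arithmetic in a few lines of Python. The figures are invented for illustration:

```python
# Savings rate for one month: saved income divided by net income.
# Sign conventions from the journal: income accounts are negative, so
# the metrics use the *decrease* in income and the *increase* in the
# expense accounts.  All figures are illustrative.
gross_income   = 5000.00  # decrease in income this month
deductions     = 1500.00  # increase in expenses:gross (tax, NI, pension, ...)
total_expenses = 3100.00  # increase in expenses (this includes the deductions)

saved_income = gross_income - total_expenses  # what wasn't spent anywhere
net_income   = gross_income - deductions      # take-home pay
savings_rate = saved_income / net_income
print(f"savings rate: {savings_rate:.0%}")
```

Note that the deductions cancel out of the numerator entirely: they appear in `total_expenses`, so they reduce saved income by exactly the amount they were paid.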
An alternative formulation which excludes pension contributions would be:
```
# saved income
(
    sum(hledger_monthly_decrease{account="income"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_monthly_increase{account="expenses"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  # ignore pension contributions (assumes pensions only go up - include
  # 'decrease' as well to handle January roll-over)
  - on(target_currency)
    (
        sum(hledger_monthly_increase{account="assets:pension"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
      - sum(hledger_monthly_decrease{account="assets:pension"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    )
)
/ on(target_currency)
# net income
(
    sum(hledger_monthly_decrease{account="income"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_monthly_increase{account="expenses:gross"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  # as above
  - on(target_currency)
    (
        sum(hledger_monthly_increase{account="assets:pension"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
      - sum(hledger_monthly_decrease{account="assets:pension"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    )
)
```
Runway:
I’m sure there’s a better name for this, but this is the metric which tells me how many days I could survive with my current assets with no income. So, if I lost my job today with no severance pay, how long would I have to find a new one, assuming I keep my spending habits the same?
This comes in two forms: a “short runway” and a “long runway”.
The short runway only considers cash (whether physical cash or a bank account) and an emergency fund (if you have one of those):
```
# total available cash and emergency fund
(
    sum(hledger_balance{account="assets:cash"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  + on(target_currency)
    sum(hledger_balance{account="assets:investments:nsi:premium_bonds:emergency"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
)
/ on(target_currency)
# average daily expense
(
  (
      sum(hledger_balance{account="expenses"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    - on(target_currency)
      sum(hledger_balance{account="expenses"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    - on(target_currency)
      sum(hledger_balance{account="expenses:gross"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    + on(target_currency)
      sum(hledger_balance{account="expenses:gross"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  ) / $agg_window
)
```
The long runway considers all assets, including investments, as if they were sold today:
```
  sum(hledger_balance{account="assets"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
/ on(target_currency)
# average daily expense
(
  (
      sum(hledger_balance{account="expenses"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    - on(target_currency)
      sum(hledger_balance{account="expenses"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    - on(target_currency)
      sum(hledger_balance{account="expenses:gross"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    + on(target_currency)
      sum(hledger_balance{account="expenses:gross"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  ) / $agg_window
)
```
In practice, if I did suddenly lose my job, I’d change my spending habits. So these are pessimistic estimates. In general though I prefer financial estimates to be pessimistic, and not optimistic.
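The runway arithmetic itself is simple division; here’s a Python sketch with made-up figures:

```python
# Runway: how many days of spending my assets could cover with no income.
# All figures are illustrative.
cash              = 4000.00   # bank accounts and physical cash
emergency_fund    = 5000.00
total_assets      = 60000.00  # everything, as if sold today
avg_daily_expense = 55.00     # from the "average daily expense" metric

short_runway_days = (cash + emergency_fund) / avg_daily_expense
long_runway_days  = total_assets / avg_daily_expense
print(round(short_runway_days), round(long_runway_days))
```

With these numbers the short runway is about five months and the long runway about three years.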
FIRE number:
Financial Independence, Retire Early (FIRE) is a movement with the goal of aggressively saving and investing enough money that you can live off the returns indefinitely, meaning you no longer need to work (though some choose to). It’s something that appeals to me: I like my job and my lifestyle, but I would like having the same lifestyle without a job significantly more.
The rule of thumb is that if you have 25 years worth of expenses invested, you can withdraw one year’s expenses (4%) every year without the value of your investments decreasing, assuming an annual 7% growth and 3% inflation.
```
# average daily expense
(
    sum(hledger_balance{account="expenses"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_balance{account="expenses"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  - on(target_currency)
    sum(hledger_balance{account="expenses:gross"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  + on(target_currency)
    sum(hledger_balance{account="expenses:gross"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
) / $agg_window
# $fire_annual_factor years worth
* 365 * $fire_annual_factor
```
Here `$fire_annual_factor` is 25; I just made it a variable so I could give it a clear name.
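Stripped of the PromQL machinery, the calculation is just multiplication. A Python sketch, with an invented daily spend:

```python
# FIRE number: 25 years' worth of spending, so that a 4% annual
# withdrawal covers one year's expenses.  Figures are illustrative.
avg_daily_expense  = 55.00
fire_annual_factor = 25

fire_number       = avg_daily_expense * 365 * fire_annual_factor
annual_withdrawal = fire_number * 0.04  # one year's expenses at 4%
print(fire_number, annual_withdrawal)
```

The 4% withdrawal is, by construction, exactly one year’s expenses (`avg_daily_expense * 365`).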
AAW / PAW thresholds:
The Millionaire Next Door, a study of wealthy Americans, proposed a metric for wealth: a person aged `N` with an annual income of `D` (excluding any inheritance) should have a net worth of `D * N / 10`.
Someone with under half that is an “under-accumulator of wealth” (UAW), someone around that level is an “average accumulator of wealth” (AAW), and someone with more than double it is a “prodigious accumulator of wealth” (PAW).
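A quick Python sketch of the formula, with an invented age and income:

```python
# The Millionaire Next Door wealth formula: someone aged N with annual
# income D should have a net worth of D * N / 10.  Below half that,
# you're a UAW; above double it, a PAW.  Figures are illustrative.
age           = 30
annual_income = 50000.00

expected_net_worth = annual_income * age / 10
aaw_threshold = expected_net_worth / 2  # below this: under-accumulator
paw_threshold = expected_net_worth * 2  # above this: prodigious accumulator
print(aaw_threshold, paw_threshold)
```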
So, the AAW threshold is the amount of money at which you are no longer a UAW:
```
(
  # average daily income
  (
      sum(hledger_balance{account="income"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    - on(target_currency)
      sum(hledger_balance{account="income"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    - on(target_currency)
      sum(hledger_balance{account="income:gift"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    + on(target_currency)
      sum(hledger_balance{account="income:gift"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  ) / $agg_window * -1
  # $age/10 years worth
  * 365 * $age / 10
) / 2
# ignore gifted income
- on(target_currency)
  sum(hledger_balance{account="income:gift"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
# add (subtract) liabilities, other than student loan
- on(target_currency)
  sum(hledger_balance{account="liabilities"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
+ on(target_currency)
  sum(hledger_balance{account="liabilities:loan:slc"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
```
And the PAW threshold is the amount of money at which you are no longer an AAW:
```
(
  # average daily income
  (
      sum(hledger_balance{account="income"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    - on(target_currency)
      sum(hledger_balance{account="income"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    - on(target_currency)
      sum(hledger_balance{account="income:gift"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
    + on(target_currency)
      sum(hledger_balance{account="income:gift"} offset ${agg_window}d * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
  ) / $agg_window * -1
  # $age/10 years worth
  * 365 * $age / 10
) * 2
# ignore gifted income
- on(target_currency)
  sum(hledger_balance{account="income:gift"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
# add (subtract) liabilities, other than student loan
- on(target_currency)
  sum(hledger_balance{account="liabilities"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
+ on(target_currency)
  sum(hledger_balance{account="liabilities:loan:slc"} * on(currency) hledger_fx_rate{target_currency="$currency"}) by (target_currency)
```
It turns out that the PAW threshold is below the FIRE number. This makes some degree of intuitive sense: becoming financially independent requires a prodigious amount of money! But, since the FIRE number does not exclude gifts, it’s possible to become financially independent by receiving a large inheritance which you then invest, but you may still be a UAW if you’re otherwise not very good at saving.
Once you’re tracking your finances and have some metrics of interest, even if it’s only something very simple like “total value of assets” or “net worth”, you can start to set goals. The best goals are SMART goals: specific, measurable, achievable, relevant, and time-bound.
In the past I’ve set goals like:
And so on. I currently have a long-term goal of saving for a house deposit, but that will take a few more years to complete.
I also have targets for all the key metrics I track, and display those on a dashboard.
I wrote a script which imports the data every evening. Sometimes I’ll also make some plans for the future, mock up some data, and import that into the dashboard, so I can try out different savings plans, or think about how to allocate an expected payrise or bonus.
I have one main bank account, and everything is driven by activity in there. This bank account is where I do the bulk of my envelope budgeting, it’s where my income arrives, and it’s where standing orders transfer money from into other accounts and investments.
This is my Nationwide FlexDirect. I picked it because it had a good introductory interest rate.
It has four “types” of envelope, and many envelopes of each of these types:
Discretionary envelopes are for money I can spend however I want, and I usually spend the entire contents of these within a month, rather than building up savings.
Goal envelopes are for specific future expenses, like renewing my passport or visiting Japan.
Pending envelopes are to hold cash which is due to be sent elsewhere in the near future.
Saved envelopes are my regular budget categories: things like food, rent, travel, and so on.
I also have a few other accounts:
Both my ISA and LISA are invested in a low-cost Vanguard index fund.
On payday, money arrives in the Nationwide account. It’s split up between discretionary envelopes, goal envelopes, pending envelopes, and saved envelopes as appropriate. The allocations are fairly static, usually only changing when I intentionally change something in my process.
However, my income doesn’t necessarily exactly match my budget. There’s usually some excess income, which I allocate to two special accounts:
I add up to £125 to a “saved goals” envelope, if I don’t have any specific goal envelopes right now, to be used towards future goals.
I add any remainder to a “saved invest” envelope, which I’ll manually invest in my ISA when it reaches a reasonable amount (say, £100 or more).
As I spend or move money, I note down the transactions. I check all my statements once a week, on Saturday mornings, to reconcile and fix any inconsistencies. This only takes a few minutes.
I’m now very good at not having inconsistencies.
I have an American Express credit card, because it gives me some cashback. There is a £25 annual fee, but the cashback more than covers it.
When I buy something with the card, I note that down as two transactions: one spending money from the card, and one transferring money from the relevant discretionary / saved / goal envelope into a “pending amex” envelope. The card is paid off in full, from that envelope, by a direct debit around the start of each month.
It’s fairly straightforward. I think it’s important to keep things simple if you plan to stick to it. Most of the complexity in my personal finance system comes from manually entering all the data, and using the right envelopes.
As I said earlier, I use hledger, which is a plain-text accounting tool. This section covers how I use hledger: it’s pretty conventional, but there are lots of examples.
I log all my financial transactions for the current year in a file called `current.journal`. There are `$YEAR.journal` files for historic data. I also have two files which all my journals include:

- `commodities`, a list of all commodities (currencies, cryptocurrencies, funds) I deal with.
- `prices`, end-of-day exchange rates for all of my commodities.

Finally, I have a `combined.journal` file, which includes the journal files from 2020 onwards (as the start of 2020 marked the last big change to how I tracked things), along with appropriate closing transactions for each year so that they fit together. This file is used as the data source for the dashboard.
This is the template for a new journal file:
```
include commodities
include prices

* Starting balances

YYYY-01-01 ! Opening balances
    ...

* Ledger

** January
** February
** March
** April
** May
** June
** July
** August
** September
** October
** November
** December
```
For each month, I fill in expected expenses, transfers, and income allocation based on previous months and my budget:
```
YYYY-MM-30 Expenses
    # expected expenses (from previous months) go here
    expenses:virtual

YYYY-MM-30 Transfers
    # expected transfers (from budget) go here

YYYY-MM-30 Job
    # expected income allocations (from budget) go here
    income:job
```
I then comment out these expected transactions. They’re there to uncomment if I want to forecast, but for everyday use they’re hidden away.
The set of accounts I use is fairly stable: sometimes I’ll add one, or one will cease to be useful, but that’s a rare event. Here are all the regular accounts, which are mostly self-explanatory:
- `assets`
  - `cash`
    - `paypal`
    - `petty`
      - `hand` —physical cash, in my wallet
        - `budgeted` —…which was withdrawn from my bank account
        - `unbudgeted` —…which was a gift
      - `home` —physical cash, not in my wallet
    - `marcus`
      - `savings`
    - `nationwide`
      - `flexdirect`
        - `discretionary`
          - `other`
          - `social`
          - `tea`
        - `goal`
        - `pending`
        - `saved`
          - `food`
          - `gift`
          - `goals` —money to allocate to future goals, if I don’t have any right now
          - `graze` —monthly Graze subscription
          - `health`
          - `household`
          - `invest` —money to invest, outside of my regular scheduled investments
          - `phone`
          - `rent`
          - `travel`
          - `utilities`
    - `starling`
      - `saved`
        - `patreon` —monthly Patreon subscriptions (charged in USD)
        - `protonmail` —annual ProtonMail fee (charged in EUR)
        - `web` —AWS, domain names, and hosting (all charged in foreign currencies)
  - `investments`
    - `ajbell`
    - `fidelity`
    - `nsi`
  - `receivable`
    - `deposit` —the deposit on my flat
- `equity` —used for special transactions (see below)
- `expenses`
- `income`
- `liabilities`
  - `creditcard`
    - `amex`
  - `owed`
An account name is the path to it through the tree, separated by colons: for example, `assets:cash` or `expenses:utilities:electricity`.
Money (and other commodities) is only stored in leaf accounts.
hledger allows transactions to be marked with a `!` or a `*`. The traditional meanings of these are “pending” and “cleared”.

I use `!` slightly differently: it indicates a transaction which is just an artefact of the way I track my finances, and which doesn’t involve any balance change to a real-world account.
For example, putting aside money to pay off credit card expenses:
```
2021-12-02 ! Bookkeeping
    assets:cash:nationwide:flexdirect:saved:food        -£29.35
    assets:cash:nationwide:flexdirect:saved:household    -£8.63
    assets:cash:nationwide:flexdirect:saved:health       -£5.20
    assets:cash:nationwide:flexdirect:saved:gift       -£119.00
    assets:cash:nationwide:flexdirect:saved:gift        -£22.43
    assets:cash:nationwide:flexdirect:pending:amex
```
Income is recorded as the pre-tax amount coming from `income:$source`, and is split across `assets:*` and `expenses:gross:*`. All amounts are included.
```
2021-11-30 * Cabinet Office
    assets:cash:nationwide:flexdirect:float                         £32.46
    assets:cash:nationwide:flexdirect:discretionary:other            £2.66
    assets:cash:nationwide:flexdirect:discretionary:social          £30.00
    assets:cash:nationwide:flexdirect:discretionary:tea             £30.00
    assets:cash:nationwide:flexdirect:goal:clothes                  £25.00
    assets:cash:nationwide:flexdirect:goal:monitor                  £25.00
    assets:cash:nationwide:flexdirect:goal:phone                   £250.00
    assets:cash:nationwide:flexdirect:goal:upgrades                 £25.00
    assets:cash:nationwide:flexdirect:pending:ajbell               £400.00
    assets:cash:nationwide:flexdirect:pending:fidelity             £500.00
    assets:cash:nationwide:flexdirect:pending:premium_bonds        £150.00
    assets:cash:nationwide:flexdirect:pending:starling:patreon       £8.00
    assets:cash:nationwide:flexdirect:pending:starling:protonmail    £5.00
    assets:cash:nationwide:flexdirect:pending:starling:roll20       £10.00
    assets:cash:nationwide:flexdirect:pending:starling:web          £80.00
    assets:cash:nationwide:flexdirect:saved:food                   £200.00
    assets:cash:nationwide:flexdirect:saved:gift                     £0.00
    assets:cash:nationwide:flexdirect:saved:graze                   £18.95
    assets:cash:nationwide:flexdirect:saved:health                   £0.94
    assets:cash:nationwide:flexdirect:saved:household               £70.99
    assets:cash:nationwide:flexdirect:saved:phone                   £13.92
    assets:cash:nationwide:flexdirect:saved:rent                  £1406.21
    assets:cash:nationwide:flexdirect:saved:travel                   £0.00
    assets:cash:nationwide:flexdirect:saved:utilities              £237.67
    expenses:gross:tax:income                                     £1145.27
    expenses:gross:tax:ni                                          £439.80
    expenses:gross:liabilities:loan:slc                            £375.00
    expenses:gross:pension                                         £345.09
    income:job                                                   -£5826.96
    expenses:gross:pension                                        £1309.93
    income:job                                                   -£1309.93

2021-11-30 ! Student Loan
    expenses:gross:liabilities:loan:slc  -£375.00 = £0.00
    liabilities:loan:slc
```
All the postings in an income transaction should be for `assets`, `expenses:gross`, or `income`, so that my net income can be easily calculated as “decrease in `income` minus increase in `expenses:gross`”, as in the metrics above. This means student loan repayments are handled slightly awkwardly, but the ease of calculation is worth it.
Some income transactions may not have anything to do with expenses or liabilities:
```
2021-12-01 * Starling
    assets:cash:starling:saved:web  £0.05
    income:interest
```
I use the `@@` form to specify the overall price exactly:
```
2021-08-10 * AJ Bell
    assets:investments:ajbell:lisa  1.6458 VANEA @@ £473.51
    expenses:fees                   £1.50
    assets:investments:ajbell:lisa
```
Transferring the cash to the investment account and then investing it may be two separate steps:
```
2021-08-02 * Fidelity
    assets:investments:fidelity:isa  £500.00
    assets:cash:nationwide:flexdirect:pending:fidelity

2021-08-09 * Fidelity
    assets:investments:fidelity:isa  1.75 VANEA @@ £500.00
    assets:investments:fidelity:isa
```
If there isn’t enough cash in the account to pay for any fees, some other asset will be sold. That’s bad, so I always make sure there’s some cash.
Expenses from a bank account or debit card are straightforward:
```
2021-09-06 * Three Rivers District Council
    expenses:tax:council  £128.00
    assets:cash:nationwide:flexdirect:saved:rent
```
Foreign currency expenses are recorded like so:
```
2021-01-04 * Hetzner
    expenses:web  38.16 EUR @@ £34.52
    assets:cash:starling:saved:web
```
When I withdraw physical cash, I take it from the relevant budget category directly:
```
2019-01-25 * Withdraw
    assets:cash:petty:hand:budgeted  £10.00
    assets:cash:nationwide:discretionary:other

2019-01-25 * Post Office
    expenses:other  £1.01
    assets:cash:petty:hand:budgeted
```
Foreign currency cash withdrawals are treated exactly the same as investment transactions, using `@@` to note down the exact exchange rate I got.

When I pay for something on my credit card, I add a transaction from `liabilities` to track the debt, and also remove the money from the budget category:
```
2021-12-06 * Tesco
    expenses:food  £24.73
    liabilities:creditcard:amex

2021-12-06 ! Bookkeeping
    assets:cash:nationwide:flexdirect:saved:food  -£24.73
    assets:cash:nationwide:flexdirect:pending:amex
```
I pay off my credit card in full every month automatically via direct debit:
```
2021-12-03 * American Express
    liabilities:creditcard:amex  £520.97
    assets:cash:nationwide:flexdirect:pending:amex
```
Every year, I get cashback. As the cashback goes to the balance on the card, rather than being paid into my bank account, I treat it as a pair of an income transaction and an allocation transaction:
```
2021-08-14 * American Express | cashback
    liabilities:creditcard:amex  £105.64
    income:amex

2021-08-14 ! Allocation | cashback
    assets:cash:nationwide:flexdirect:goal:amex_membership   £25.00
    assets:cash:nationwide:flexdirect:saved:health           £80.64
    assets:cash:nationwide:flexdirect:pending:amex          -£105.64
```
These are kind of like credit card transactions: I incur an expense, but the money isn’t actually taken for a while. In this case, “a while” could be months.
I put aside the money immediately:
```
2022-02-04 ! Kickstarter | Knock! Issue Three
    assets:cash:nationwide:flexdirect:pending:preorder  £38.00
    assets:cash:nationwide:flexdirect:discretionary:other
```
And then note down the expense when it happens. Sometimes the amount I put aside won’t be quite right (e.g. if it’s a transaction in another currency, and I estimated the initial amount based on then-current exchange rates), so I’ll need to add or remove some money:
```
2022-03-06 * Kickstarter | Knock! Issue Three
    expenses:ttrpg  £32.43
    liabilities:creditcard:amex

2022-03-06 ! Bookkeeping | Kickstarter | Knock! Issue Three
    assets:cash:nationwide:flexdirect:pending:amex          £32.43
    assets:cash:nationwide:flexdirect:discretionary:other    £5.57
    assets:cash:nationwide:flexdirect:pending:preorder     -£38.00
```
I used to track this sort of thing by putting the money in pending:amex immediately, without going via pending:preorder. But that only works if I pre-order everything with my credit card, and it also means that pending:amex almost never matches liabilities:creditcard:amex, which makes it easier to lose track of things. So I introduced this new account to make everything more explicit.
Every Saturday I check my financial statements and reconcile transactions in the journal:
If any of the account balances are incorrect, and I can’t find the mistake, I give up and fix it with a transaction to/from equity:adjustment. For example:
2020-12-04 ! Adjustment
    liabilities:creditcard:amex    -£76.15 = -£220.26
    equity:adjustment
I don’t like making these adjustment transactions, and I’m pretty good at avoiding them now.
At the end of December, I finish up the journal to start the new year: I rename current.journal to $YEAR.journal and start a fresh current.journal; I write off any outstanding debts that aren’t worth carrying forward; and I use hledger close to generate closing and opening balance transactions. A combined.journal includes every year’s journal, so reports can still span years.
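For context, hledger supports an include directive, so a combined journal is little more than a list of includes. A sketch (the exact file list is just whatever years exist):

```
; combined.journal (sketch)
include 2016.journal
include 2017.journal
; ...one line per year...
include current.journal
```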
Here’s an example of a write-off transaction:
2020-12-31 ! Write-off
    assets:receivable:adam    -£11.95 = £0.00
    liabilities:owed:jake      £10.94 = £0.00
    equity:writeoff
Here’s an example of a closing balances transaction:
2021-01-01 ! Closing balances
    assets:cash:nationwide:flexdirect:pending:amex                 £-598.01 = £0.00
    assets:cash:nationwide:flexdirect:pending:cavendish            £-200.00 = £0.00
    assets:cash:nationwide:flexdirect:pending:starling:patreon       £-8.00 = £0.00
    assets:cash:nationwide:flexdirect:pending:starling:protonmail    £-5.00 = £0.00
    assets:cash:nationwide:flexdirect:pending:starling:roll20        £-5.00 = £0.00
    assets:cash:nationwide:flexdirect:pending:starling:web          £-55.00 = £0.00
    assets:cash:nationwide:flexdirect:saved:food                   £-200.57 = £0.00
    assets:cash:nationwide:flexdirect:saved:graze                   £-50.00 = £0.00
    assets:cash:nationwide:flexdirect:saved:health                 £-500.00 = £0.00
    assets:cash:nationwide:flexdirect:saved:household              £-300.00 = £0.00
    assets:cash:nationwide:flexdirect:saved:phone                  £-100.00 = £0.00
    assets:cash:nationwide:flexdirect:saved:rent                  £-2475.86 = £0.00
    assets:cash:nationwide:flexdirect:saved:travel                 £-523.69 = £0.00
    assets:cash:nationwide:flexdirect:saved:utilities              £-800.00 = £0.00
    assets:cash:petty:hand:budgeted                                 £-19.05 = £0.00
    assets:cash:petty:hand:unbudgeted                                £-2.00 = £0.00
    assets:cash:petty:home                                        -3.35 EUR = 0.00 EUR
    assets:cash:petty:home                                     -1853.00 JPY = 0.00 JPY
    assets:cash:starling:saved:patreon                              £-21.79 = £0.00
    assets:cash:starling:saved:protonmail                           £-42.43 = £0.00
    assets:cash:starling:saved:roll20                               £-37.33 = £0.00
    assets:cash:starling:saved:web                                 £-283.38 = £0.00
    assets:investments:cavendish                               -19.66 VANEA = 0.00 VANEA
    assets:investments:cavendish                                    £-36.08 = £0.00
    assets:investments:coinbase                                  -10.00 EUR = 0.00 EUR
    assets:investments:fundingcircle                                 £-0.03 = £0.00
    assets:investments:nsi:premium_bonds:emergency                £-4475.00 = £0.00
    assets:investments:nsi:premium_bonds:move                     £-3775.00 = £0.00
    assets:receivable:deposit                                     £-1384.62 = £0.00
    assets:receivable:refund                                       £-161.00 = £0.00
    assets:pension:alpha                                       -2653.00 £/yr = 0.00 £/yr
    liabilities:creditcard:amex                                     £598.01 = £0.00
    liabilities:loan:slc                                          £20468.52 = £0.00
    equity:opening/closing
And here’s an example of an opening balances transaction:
2021-01-01 ! Opening balances
    assets:cash:nationwide:flexdirect:pending:amex                  £598.01
    assets:cash:nationwide:flexdirect:pending:fidelity              £200.00
    assets:cash:nationwide:flexdirect:pending:starling:patreon        £8.00
    assets:cash:nationwide:flexdirect:pending:starling:protonmail     £5.00
    assets:cash:nationwide:flexdirect:pending:starling:roll20         £5.00
    assets:cash:nationwide:flexdirect:pending:starling:web           £55.00
    assets:cash:nationwide:flexdirect:saved:food                    £200.57
    assets:cash:nationwide:flexdirect:saved:graze                    £50.00
    assets:cash:nationwide:flexdirect:saved:health                  £500.00
    assets:cash:nationwide:flexdirect:saved:household               £300.00
    assets:cash:nationwide:flexdirect:saved:phone                   £100.00
    assets:cash:nationwide:flexdirect:saved:rent                   £2475.86
    assets:cash:nationwide:flexdirect:saved:travel                  £523.69
    assets:cash:nationwide:flexdirect:saved:utilities               £800.00
    ;
    assets:cash:petty:hand:budgeted                                  £19.05
    assets:cash:petty:hand:unbudgeted                                 £2.00
    assets:cash:petty:home                                         3.35 EUR
    assets:cash:petty:home                                      1853.00 JPY
    ;
    assets:cash:starling:saved:patreon                               £21.79
    assets:cash:starling:saved:protonmail                            £42.43
    assets:cash:starling:saved:roll20                                £37.33
    assets:cash:starling:saved:web                                  £283.38
    ;
    assets:investments:fidelity                                 19.66 VANEA
    assets:investments:fidelity                                      £36.08
    assets:investments:coinbase                                   10.00 EUR
    assets:investments:fundingcircle                                  £0.03
    assets:investments:nsi:premium_bonds:emergency                 £4475.00
    assets:investments:nsi:premium_bonds:move                      £3775.00
    ;
    assets:receivable:deposit                                      £1384.62
    assets:receivable:refund                                        £161.00
    ;
    assets:pension:alpha                                        2653.00 £/yr
    ;
    liabilities:creditcard:amex                                    -£598.01
    liabilities:loan:slc                                         -£20468.52
    ;
    equity:opening/closing
You might be wondering why I do the currency conversion one at a time for each account, rather than once at the end. This is because FOO + on(FIELD) BAR (or any binary operator) will discard those entries of FOO for which there isn’t a corresponding entry of BAR: it won’t assume BAR to be 0 in those cases. This means that binary operators in PromQL are lossy! To get around that, I convert all the series to the same currency before doing arithmetic on them.↩︎
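To illustrate the lossiness (with made-up series names; these aren’t my actual metrics):

```promql
# Suppose balance has entries {currency="GBP"} and {currency="EUR"},
# but exchange_rate only has {currency="EUR"}.
balance * on(currency) exchange_rate
# Result: only the {currency="EUR"} sample survives.  The GBP entry
# is silently dropped from the result, not passed through unchanged.
```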
I’ve got rid of my dedicated emergency fund, since I have both a credit card and a few months’ regular expenses saved up. But I did have one in the past, so it’s taken into account in the short runway calculations, which keeps the historic data correct.↩︎
Plus a surge-protected 8-way mains extension lead and a pile of spare cat6 cables, for guest use.
azathoth is my desktop machine, and is running a NixOS / Windows 10 dual boot. I mostly use NixOS for programming and for work (which is, mostly, programming); and Windows 10 for everything else.
nyarlathotep is my general-purpose server and also a NAS, and is running NixOS. As you can see in the photo above, nyarlathotep sits atop the network cabinet; I’ll probably upgrade to a rack and a suitable chassis at some point.
The pi-hole is providing DNS, and is running Raspbian. I have plans to put NixOS on this too and rename it to yog-sothoth.
Noise is a concern, as everything is set up in my living room, which is where guests staying overnight sleep. So to keep everything quiet at night (and at all times) I’m using Noctua fans running at as low an RPM as I can get them.
I’m using Ubiquiti’s managed UniFi networking equipment, which is overkill for the small network I have, but it’s all very nice.
350Mbit WAN comes in through my Virgin Media router (running in modem mode), into my UniFi Dream Machine Pro’s WAN port, which is connected to a UniFi Switch 24 (non-PoE) and a UniFi FlexHD Access Point.
I have three VLANs with some firewall rules set up between them:
Name | VLAN ID | IP Range | Firewall rules |
---|---|---|---|
Wired | 1 | 10.0.0.0/24 | Can talk to hosts in VLANs 1 and 10 |
Wireless | 10 | 10.0.10.0/24 | Can talk to hosts in VLANs 1 and 10 |
Untrusted Wireless | 20 | 10.0.20.0/24 | Can send DNS traffic to the pi-hole and HTTP traffic to nyarlathotep |
The untrusted wireless is for phones and smart devices which don’t make it easy to see what they’re doing. And my work laptop. Normal computers (eg, a guest’s laptop) go straight on the trusted wireless network.
I’ve got a few custom DNS records set up for various static IP addresses:
address=/router.lan/10.0.0.1
address=/pi.hole/10.0.0.2
address=/nyarlathotep/10.0.0.3

# for https://github.com/alphagov/govuk-docker
address=/dev.gov.uk/127.0.0.1

# for general use
address=/localhost/127.0.0.1

# these should be CNAMEs but windows doesn't resolve them
address=/help.lan/10.0.0.3
address=/nas.lan/10.0.0.3

# firefox in windows has started redirecting http://nyarlathotep to http://www.nyarlathotep.com ???
# so add in a domain with a dot, which it seems happier with
address=/nyarlathotep.lan/10.0.0.3
The help.lan
and nas.lan
rules are for guests. Visiting http://help.lan
tells you what VLAN you’re on, gives a summary of the firewall rules, and gives guest credentials for the NAS (if not on VLAN 20). http://help.lan
is served by nyarlathotep, so to restrict access to the other domains it’s serving, it 302-redirects to http://help.lan
if the user is on VLAN 20.
Services and configuration are covered in my NixOS config.
nyarlathotep uses a 250GB SSD as the system volume (connected via PCI-e), with a ZFS partition and a vfat partition (the UEFI system volume).
The ZFS partition consists of one zpool with volumes:
local/volatile/root: mounted at /
local/persistent/home: mounted at /home
local/persistent/nix: mounted at /nix
local/persistent/persist: mounted at /persist
local/persistent/var-log: mounted at /var/log
The local/volatile/root dataset is configured in the “erase your darlings” style: everything is deleted by rolling back to an empty snapshot at boot. Any state which needs to be persisted lives in /persist, and is managed through configuration and symlinks.
The local/persistent dataset has automatic snapshots configured.
nyarlathotep uses 8 hot-swap SATA bays configured as a zpool of mirrored pairs for NAS:
Mirror | Device | Mirror | Device |
---|---|---|---|
0 | A | 0 | B |
1 | A | 1 | B |
2 | A | 2 | B |
- | A | - | B |
The “A” volume of each pair is connected to the motherboard SATA controller and the “B” volume of each pair to a PCI-e SATA controller.
The HDD serial numbers are:
ata-ST10000VN0004-1ZD101_ZA206882
ata-ST10000VN0004-1ZD101_ZA27G6C6
ata-ST10000VN0004-1ZD101_ZA22461Y
ata-ST10000VN0004-1ZD101_ZA27BW6R
ata-ST10000VN0008-2PJ103_ZLW0398A
ata-ST10000VN0008-2PJ103_ZLW032KE
The zpool currently has a single dataset:
data/nas: mounted at /mnt/nas

The data dataset has automatic snapshots configured.
I’ve got a few thoughts on future projects and expansions for this set-up, but given how much I spent on the last upgrade these are all likely to be a few years off at least.
Currently I have a network cabinet, and a non-rackmount server chassis. I could instead get a larger rack, an appropriate server chassis, and use that for everything.
The main downsides to this are cost (just by virtue of being rack compatible it seems everything gets more expensive) and noise (with less space in the chassis fans have to work harder). Ease of transport is also a consideration, as I’m only renting my current flat.
So this is probably something I’d only do after finding a place I intend to stay at long-term; ideally where I can have a dedicated computer room and run ethernet cables through the walls.
Currently I rely on just the one ISP for internet. They’re usually pretty good, but sometimes issues do occur. My UDM Pro supports a second WAN source, so I could get a 4G / 5G modem and set up automatic failover if the primary goes down.
Currently I have a Raspberry Pi and a UniFi Access Point powered by regular power cables. Both of these devices are capable of being powered by a switch with PoE (with some extra hardware for the Pi), which would reduce other cables.
However, PoE switches are significantly more expensive. So I could either get a small PoE switch for the limited number of devices I have, or save this upgrade for when I have use for more PoE-connected devices. For example, if I get a house and need to set up multiple access points.
Totally overkill, but it could be cool to get a switch which supports 10Gbit connections, and also 10Gbit NICs for azathoth and nyarlathotep.
I think I would need to rebuild nyarlathotep before doing this, as it doesn’t have a free PCI-e port.
People have designed 3D-printable rack mounting gear for Raspberry Pis, and since reading an article about a Raspberry Pi cluster I’ve been tempted. I wasn’t very keen on Kubernetes when I last tried it; at work we use Cloud Foundry, and it’s pretty easy to deploy things to, so I’d probably look into running that first.
I could move some of the services off nyarlathotep onto this Pi cluster, though I’d probably still want to use nyarlathotep as backing storage.
It’s the modern day, and you are all paranormal investigation / conspiracy theorist youtubers. You uncover the truth that the man doesn’t want people to know! Sure, many of your “ghost” tips turn out to be teens doing drugs in an abandoned building at night, or similar mundane explanations, but a little video editing can solve that problem. And sometimes—sometimes—you stumble upon something real.
This campaign is about the real ones.
Player buy-in: The player characters are a bunch of obnoxious youtubers who’ll willingly put their lives and sanity at risk just to increase that view count. No playing it safe here. I also strongly encourage you to really get into it, to narrate what you’re doing as if you’re presenting a video to a skeptical audience, and to have over-the-top reactions to things. The campaign will be a very episodic mystery-of-the-week style game with little in the way of an overall plot.
Your good friend Jackson Elias, an author who infiltrates and writes about cults, contacts you out of the blue in a panic. He says he’s found signs of something big, some sort of conspiracy, but he won’t go into details over a channel which could be tapped. He can’t do this alone and needs a reliable team to help him. He sets a meeting: New York, January 15th, 1925.
You arrive, but are too late. Jackson is dead, freshly murdered. A strange symbol carved into his forehead, a symbol seen in a series of other murders. There’s more here than meets the eye, you don’t know what’s going on, but you do know that Jackson found something. And now he’s dead.
Player buy-in: The player characters have to be motivated to solve the mystery their friend left behind, even at risk of their own lives and sanity. This is a nonlinear campaign which has several major parts with clues pointing between them, and which can be visited in any order: but it’s not static, your actions in any one part of the campaign will have ramifications in other parts. This is a long and detailed campaign, you will have to keep careful notes of clues and discoveries, or you just won’t know what’s going on, and you can expect it to last somewhere between 1 and 2 years.
The year is 1933. In South America, the Chaco War between Bolivia and Paraguay is in full swing. You’ve been employed by a humanitarian charity, the Caduceus Foundation, to deliver medical aid to civilians caught up in the war. Caduceus has flown you to Asuncion in Paraguay, from where you travelled across country, escorting doctors, nurses, and medical supplies to an aid camp deep in the jungle of Gran Chaco.
The truth, which the player characters don’t know at the start, but which their team leader will fill them in on, is that the Caduceus Foundation is a front for an organisation which battles the Cthulhu Mythos. For now, the player characters are just heroes who have volunteered to help a humanitarian charity, but soon they will be pulled into combatting the mythos.
Player buy-in: The player characters are a heroic bunch who volunteered to help civilians caught in a war, but don’t bat an eye when it turns out the organisation they joined is actually fighting more than just humans.
Dolmenwood is a mystical fairy-tale forest setting inspired by British folklore and stories like The King of Elfland’s Daughter. It’s a large dense forest with threads of civilisation, roads and towns and inns, running through it.
But if you move off those threads, the forest becomes dark and magical: fearsome frost elves prowl the forest glades, looking for ways to open the doors to Fairyland and begin their dominion once more; pagan cults venerate the many standing stones, and jealously guard their secrets; and the corrupted forces of the Nag Lord, chaos godling from the north, expand their reach bit by bit.
Player buy-in: The player characters are Dolmenwood natives, who have decided to leave behind the safe towns and fields that we know to head off in search of adventure. They’re typical loot-motivated adventurers. You’ll start off with an incomplete and inaccurate map of things people generally know of, but will need to explore to get concrete information. There’ll be plenty of rumours and strange happenings to give you ideas.
It’s the year 1125. Emperor Strephon, once a fit and handsome man, now pale and thin, consumed by stress, sits in a chair, resting his head in his hands as he studies the latest report from the front. This report, now over half a year old, confirms one thing: the Aslan are winning the war. Pax Rulin, the fortress world of the Trojan Reach, has fallen, the Reach itself is all but lost, and who knows how much further they’ll advance?
The other sector fleets have been displaced to assist, leaving skeleton defences elsewhere, and the Imperium’s other neighbours have been quick to take advantage. Incursions by the Zhodani Consulate, the Solomani Confederation, the Sword Worlds Confederation, and the Vargr, have resulted in the losses of dozens of worlds. Terra once more belongs to the Solomani.
Strephon reflects on how all this began, two years ago. It was supposed to be a great day, the highest ranking Aslan ever to make the trip to Capital: a representative not just for his clan, but for all the clans. One who could speak with the voice of the Tlaukhu. But there was a bomb. The ambassador immediately, bravely, honourably, selflessly, threw himself between Strephon and the explosion. The ambassador died in the blast, Strephon survived with only minor injuries. When the news made it back to the Hierate, they accused the Imperium of some sort of human trickery. It was the spark that ignited the powder keg, the fragile peace was shattered, and tens of thousands of territory-hungry ihatei poured over the border.
The independent worlds of the Trojan Reach were taken in a few months. The reborn Kingdom of Drinax put up a good fight, but was also lost. The imperial border was overrun, and it’s just been worse news since then. Eventually the Aslan will tire of war, and settle down to rule the worlds they’ve conquered, but how many more will they take before then?
Strephon sighs, and leans back with his eyes closed, massaging his creased forehead.
Then the door bursts open, a naval intelligence officer rushes in, glances around to make sure there’s nobody else in the room, and says “your majesty, it’s Project Longbow, it’s picked something up!”
Strephon’s mind turns to Project Longbow. A highly classified, beyond top secret, project to use a series of detectors spread across the whole Imperium, and all pointing at the same patch of sky, to make a single satellite dish dozens of parsecs wide. All to investigate some anomaly deep in space, close to the galactic core.
“Your Majesty!” the officer snaps, drawing Strephon out of his memories, “It’s a ship. A ship unlike any design we’ve ever seen before, coming from the core. And sir, it’s travelling at a speed of jump-10.”
Player buy-in: There are at least two campaign ideas here, both set against a backdrop of the collapse of the Third Imperium. We should decide what we want and flesh it out some more. Option 1 is a naval campaign where you’re fighting against one of the hostile forces, either the Aslan themselves, or one of the other factions taking advantage of the chaos. Option 2 is a scientific campaign where you’re part of a research crew sent by Imperial Naval Intelligence to investigate this mysterious high-tech ship coming from deep space. I guess there are some more options too, eg you could play a crew of regular people trying to get by in this societal collapse; or you could even play members of the invading forces, like a ship of Aslan seeking to make their names.
Drinax, once a mighty interstellar kingdom but now a bombed-out radioactive husk of a world, lies along two major trade routes and is close to two of the most significant interstellar empires in all of Charted Space: the Aslan Hierate and the Third Imperium.
The current king has a daring plan to recover some of Drinax’s lost glory: he needs a band of privateers and agents, who can curry favour with worlds once part of the kingdom, and also cause trouble along the trade routes between the Hierate and the Imperium. No empire is willing to allow another to establish a large permanent military force close to their borders, but they may be willing to delegate the job of policing the space lanes to a newly-reformed kingdom which they could easily squash if it starts trouble.
The king’s plan is to bring the old worlds of the kingdom back under the banner of Drinax, stir up piracy in the region enough that the empires agree something needs to be done, and then use that to get their blessing for a new interstellar state. Of course, if they learn too soon that Drinax is behind the upsurge in piracy, it’ll all fall apart…
Player buy-in: The player characters have two very different roles: negotiators and space pirates. Both of those have to sound fun. You will be able to (and are encouraged to!) play the faction game to some degree, making allies (or enemies) of powerful NPCs and groups, and so could delegate one side of the campaign entirely to some trusted aides, but it will take a while to get the resources to do that, so you’ll be doing both in the beginning. Furthermore, this is a sandbox-style campaign: there are several key adventures and missions, but player characters are expected to go above and beyond just those, and will need to think about how to curry favour with factions and cause trouble on the trade routes themselves.
The Ziru Sirka, the great interstellar empire of the Vilani, lasted for two thousand years. But inability to manage such a large expanse of territory caused a slow decline, which ultimately led to its conquest by the Terrans, and the establishment of their great interstellar empire: the Rule of Man.
But the Rule of Man lasted a scant few centuries before it, too, collapsed under its own weight. This time, nothing replaced it. The galaxy fell into anarchy, worlds were cut off from one another, technologies were lost, and many civilisations simply failed to survive: this is the Long Night, and this is when our game is set.
In the Long Night many small pocket empires rose, and fell. You are all scouts, ex-military, and similar sorts, employed by the government of the Sylean Federation, one of the larger pocket empires of the Long Night. It’s survived for 650 years now, but Sylea, too, is feeling the strain of administration at interstellar distances, and is also currently at war with two other pocket empires: the Interstellar Confederacy and the Chanestin Kingdom. Can Sylea solve its problems, or will it perish like so many others?
Player buy-in: The player characters are all explorers working government contracts to reach out to worlds not heard from in centuries (or millennia), to establish peaceful relations where possible, or maybe just to plunder them if the world is dead—after all, who’s to say you handed in everything you found to the government inspectors? It’s post-apocalypse in space. There will be space dungeons. But it’s also about the rise of the Sylean Federation into something even greater, so there’s plenty of scope for this to turn into a more political game if you want.
It’s the year 2090AD, the planets of the solar system have long-since been settled and exploited, and are governed by the Planetary Consortium based on Earth, which is essentially a puppet government run by the megacorps.
Everything changed a few years ago, with the invention of the Jump Drive, allowing ships to travel an entire parsec in a mere week. The Planetary Consortium saw this as the solution to the ongoing population crisis and lack of resources, and started constructing and loaning out jump-capable starships to competent crews, funding exploration, prospecting, and research.
Now, there are a few permanent colonies in the closest suitable systems. Further out, there are exploration hubs staffed by semi-permanent crews of outcasts, consortium officials, and explorers. Exploration is dangerous and proceeds slowly, at the cost of many lives. Lucky crews have found bizarre alien creatures, they’ve found resource-rich worlds which one day will be exploited, they’ve found suitable sites for new colonies, and some have even found artefacts and ruins from ancient civilisations. Nobody has found a living intelligent alien species yet, but First Contact is the dream of every explorer.
Player buy-in: The player characters are brave (or perhaps just foolhardy) space adventurers. They risk death every time they jump into uncharted space, and are motivated by the thrill of the unknown. Whatever you do and wherever you go, chances are you’re the first humans to do so.
A sandbox campaign set in the standard Traveller setting. This will probably be in a border region like the Spinward Marches or the Trojan Reach just because I think being set entirely within one of the vast space empires reduces the scope for small-scale interstellar politics: it’s either big-scale dukes-plotting-against-the-emperor kind of thing or power-tripping planetary governments, with nothing in between.
You will have a ship and a mortgage to pay off, or maybe a free ship from someone you owe favours to, and will need to do whatever it takes to meet those obligations. Carrying freight and passengers is easy but kind of dull, seeking out patrons to undertake missions for pay can be better but also risky. You’ll occasionally unintentionally stumble into trouble or adventure. You can follow rumours to find wealth, or maybe just disaster.
Player buy-in: This is essentially a job- or planet-of-the-week / “do what you want” campaign: I’ll prepare interesting locations, people, rumours, and opportunities, and you engage with whatever seems fun. This campaign can easily transform into a more focussed one if we want.
DNS maps names, like memo.barrucadu.co.uk, to numbers, like 116.203.34.201. It’s federated, with trusted entities (you may have heard of the “DNS root servers”) delegating control of segments of the DNS namespace to others. It holds hundreds of millions of records, and updates to this database are typically visible in minutes to hours.
And the protocol behind it is not massively different to when it was standardised in the 1980s.
In this memo I’ll cover:
How memo.barrucadu.co.uk is resolved to an IP address

If you want to get it straight from the horse’s mouth, RFC 1034: Domain Names - Concepts and Facilities and RFC 1035: Domain Names - Implementation and Specification are the standards I’m drawing on. They’re very approachable, and I encourage you to read them.
You can also look at resolved, the DNS server I wrote, which acts as both a recursive (or forwarding) and authoritative nameserver, and is suitable for home networks. Well, my home network. I can’t promise anything about yours.
Let’s start with an example:2
$ dig memo.barrucadu.co.uk +noedns

; <<>> DiG 9.16.25 <<>> memo.barrucadu.co.uk +noedns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37169
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;memo.barrucadu.co.uk.          IN      A

;; ANSWER SECTION:
memo.barrucadu.co.uk.   292     IN      CNAME   barrucadu.co.uk.
barrucadu.co.uk.        292     IN      A       116.203.34.201

;; AUTHORITY SECTION:
barrucadu.co.uk.        2975    IN      NS      ns-98.awsdns-12.com.
barrucadu.co.uk.        2975    IN      NS      ns-1520.awsdns-62.org.
barrucadu.co.uk.        2975    IN      NS      ns-1828.awsdns-36.co.uk.
barrucadu.co.uk.        2975    IN      NS      ns-763.awsdns-31.net.

;; Query time: 0 msec
;; SERVER: 185.12.64.2#53(185.12.64.2)
;; WHEN: Tue Mar 22 16:42:02 GMT 2022
;; MSG SIZE  rcvd: 202
I’ve used dig a lot so I’m fairly used to reading this output, but I’ve since realised I wasn’t really reading it.

What does flags: qr rd ra mean?
The QUESTION SECTION and ANSWER SECTION make sense, but what’s the point of the AUTHORITY SECTION? Do all queries have an AUTHORITY SECTION?
$ dig www.google.com +noedns

; <<>> DiG 9.16.25 <<>> www.google.com +noedns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46676
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.                IN      A

;; ANSWER SECTION:
www.google.com.         102     IN      A       142.250.185.100

;; Query time: 0 msec
;; SERVER: 185.12.64.2#53(185.12.64.2)
;; WHEN: Tue Mar 22 16:49:36 GMT 2022
;; MSG SIZE  rcvd: 48
…no AUTHORITY SECTION there. Is it unimportant? Or optional?
Also, all the domain names there have a trailing dot. What’s that about?3
Time to dig into the protocol. RFC 1035 is our guide here.
DNS has two types of messages, queries and responses, and uses port 53. It prefers UDP but, if a message is too long to send in a single UDP datagram, it falls back to TCP.
A DNS message has five parts. These are:
A header, which specifies what sort of message this is and how many entries are in the other parts. This also has those flags we saw in the dig output.
The “question section”, which specifies what sort of records the client is interested in. Did you know that you can ask multiple questions with a single DNS query? I didn’t.
The “answer section”, a collection of records directly answering the questions.
The “authority section”, a series of NS records pointing to an authoritative source which can answer the questions.

The “additional section”, a series of records which may be useful when using records from the answer and authority sections. For example, the A records for any nameservers given in the authority section.
The answer, authority, and additional sections won’t be present in a query. But the question section will be present in a response: it’s copied over from the query.
The header is 12 bytes long and has a few different fields packed in there. RFC 1035 has some nice ASCII art illustrations:
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Where,
ID is a 16-bit random identifier set by the client and copied into the response by the server. Since UDP is connectionless, this is essential for the client to know which response goes with which query.4
QR indicates whether this is a query (0) or a response (1).
OPCODE is a four-bit field, set by the client and copied into the response by the server, indicating what type of query this message is. The most common opcode is 0, which is a “standard query”.
AA (“Authoritative Answer”) is set by the server and means that this response is authoritative. More on authority in zones.
TC (“Truncation”) is set by the server and means that the full response couldn’t fit in a single UDP datagram, and so the client should try again using TCP.5
RD (“Recursion Desired”) is set by the client, and copied into the response by the server, and means that the client would like the server to answer the question recursively, if it can.
More on recursive and non-recursive resolution in how resolution happens.
RA (“Recursion Available”) is set by the server and means that it can perform recursive resolution, if requested.
Z is reserved for future use, and so should be set to zero if you don’t implement those future standards.
RCODE is a four-bit field, set by the server, indicating what type of response this message is. There are a few common ones: 0 (“no error”), 1 (“format error”), 2 (“server failure”), 3 (“name error”: the domain doesn’t exist), 4 (“not implemented”), and 5 (“refused”).
QDCOUNT, ANCOUNT, NSCOUNT, and ARCOUNT are unsigned 16-bit (big endian) integers specifying the number of entries in the question, answer, authority, and additional sections (respectively) of the message.
Since all the multi-byte fields in a DNS message are unsigned and big endian, I’ll not mention it from now on.
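To make the bit layout concrete, here’s a small Python sketch of packing and unpacking the header with struct (purely illustrative; this isn’t code from resolved):

```python
import struct

def pack_header(ident, qr, opcode, aa, tc, rd, ra, rcode,
                qdcount, ancount, nscount, arcount):
    # Pack the flag bits into the second 16-bit word, per the
    # diagram above (the Z bits are left as zero).
    flags = ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
             | (rd << 8) | (ra << 7) | rcode)
    # ">HHHHHH": six unsigned 16-bit big-endian integers.
    return struct.pack(">HHHHHH", ident, flags,
                       qdcount, ancount, nscount, arcount)

def parse_header(data):
    ident, flags, qd, an, ns, ar = struct.unpack(">HHHHHH", data[:12])
    return {
        "id": ident,
        "qr": (flags >> 15) & 1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,
        "tc": (flags >> 9) & 1,
        "rd": (flags >> 8) & 1,
        "ra": (flags >> 7) & 1,
        "rcode": flags & 0xF,
        "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar,
    }
```

Packing the response header from the dig example earlier (id 37169, flags qr rd ra, one question, two answers, four authority records, no additional records) and parsing it back recovers exactly those fields.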
Before diving into the other sections, let’s have a look at how domain names are encoded. They show up a lot, after all.
Let’s take the domain name memo.barrucadu.co.uk. and separate it by dots. This gives us a sequence of labels:
memo
barrucadu
co
uk
How you actually interpret those labels is a bit confusing, unfortunately.
RFC 1035 says that they are sequences of arbitrary octets and that you can’t assume any particular character encoding… but it also says that labels are to be compared case-insensitively.
RFC 4343 clarifies that this means octets in the range 0x41 to 0x5a (the upper case ASCII letters) are considered equal to the corresponding octets in the range 0x61 to 0x7a (the lower case ASCII letters), and vice versa, but that still doesn’t mean that labels are ASCII, as they can also contain arbitrary non-ASCII octets.
But there’s also RFC 3492, which defines the punycode standard for encoding internationalised, i.e. unicode, domain names into ASCII. So maybe domain names are ASCII after all?
There may well be a later RFC which resolves this ambiguity and says that labels are definitely ASCII, but I haven’t seen it yet.
Anyway, back to the topic of encoding.
A label is encoded as a one-octet length field followed by the octets of the label. And an encoded domain name is a sequence of encoded labels. This means that a domain name ends with 0x00, the length of the empty label.6
So memo.barrucadu.co.uk is encoded as:

```
0x04 m e m o 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
```
There are two restrictions on the validity of domain names:
A single label may be no more than 63 octets long (not including the length octet)
An entire encoded domain name may be no more than 255 octets long (including the label length octets)
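As a sketch, encoding a name with both of those restrictions enforced might look like this. `encode_name` is a hypothetical helper, not resolved’s actual API:

```rust
/// Encode a domain name as a sequence of length-prefixed labels,
/// terminated by the zero-length root label. Returns None if either
/// validity restriction is violated. Hypothetical helper, for
/// illustration only.
fn encode_name(labels: &[&[u8]]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for label in labels {
        // a single label may be no more than 63 octets
        if label.len() > 63 {
            return None;
        }
        out.push(label.len() as u8);
        out.extend_from_slice(label);
    }
    out.push(0x00); // the empty root label terminates the name
    // the entire encoded name may be no more than 255 octets
    if out.len() > 255 {
        None
    } else {
        Some(out)
    }
}
```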
Unfortunately, that’s not all.
Domain names get repeated a lot in DNS messages, and the 512 bytes of a UDP datagram can start to feel pretty limiting. So DNS also has a compression mechanism, where some suffix of a domain name can be replaced with a pointer to an earlier occurrence of that name.
So if the name memo.barrucadu.co.uk. appears in a message twice, the second occurrence could be represented as:
memo.barrucadu.co.uk.
memo.barrucadu.co.[pointer to uk.]
memo.barrucadu.[pointer to co.uk.]
memo.[pointer to barrucadu.co.uk.]
[pointer to memo.barrucadu.co.uk.]
But how do you distinguish between a regular label and a pointer? Well, remember that a label can’t be longer than 63 octets. And what’s 63 as an 8-bit binary number? It’s 00111111.
There are two whole bits at the front which are completely wasted!
So pointers are encoded as the two-octet sequence 11[14-bit index into message].
Pretty clever.
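A sketch of spotting and decoding such a pointer (again, a hypothetical helper, not resolved’s actual code):

```rust
/// If the top two bits of the first octet are 11, this is a
/// compression pointer, and the remaining 14 bits are an offset into
/// the message. A plain label length (at most 63) never has those
/// bits set, so the two cases can't be confused.
fn parse_pointer(octets: &[u8]) -> Option<u16> {
    if octets.len() >= 2 && (octets[0] & 0b1100_0000) == 0b1100_0000 {
        Some((((octets[0] & 0b0011_1111) as u16) << 8) | octets[1] as u16)
    } else {
        None
    }
}
```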
```
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                     QNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QTYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QCLASS                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
```
Where,
QNAME
is the domain name, which can be any length (so long as it’s properly encoded); it’s not padded to any specific size.
QTYPE
is a 16-bit integer specifying the type of records the client is interested in. This will usually be a record type (see the next subsection) or 255, meaning “all records”. There are a few other QTYPEs but those are less common.
QCLASS
is a 16-bit integer specifying which network class the client is interested in. These days this will always be 1, or IN, for “internet”.7
We can now understand the question section of our dig example!

```
;; QUESTION SECTION:
;memo.barrucadu.co.uk.          IN      A
```
This means it’s looking for an internet address record for memo.barrucadu.co.uk. (yes, it shows the type and class the other way around). That question is encoded as:

```
0x04 m e m o 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00  ; qname: memo.barrucadu.co.uk.
0x00 0x01                                                   ; qtype: A
0x00 0x01                                                   ; qclass: IN
```
The answer, authority, and additional sections are all a sequence of resource records:
```
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                                               /
/                      NAME                     /
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     CLASS                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TTL                      |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   RDLENGTH                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
/                     RDATA                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
```
Where,
NAME
is the domain name, which is variable-length like the QNAME of a question.
TYPE
is a 16-bit integer specifying what sort of record this is. There are a fair few of these, but some common ones are:

- the A record
- the NS record
- the CNAME record
- the AAAA record (from RFC 3596)

CLASS
is a 16-bit integer specifying what network class this record applies to. Like the QCLASS, these days this will always be 1. Unless you’re specifically running some sort of old non-IP-based network for fun.
TTL
is a 32-bit integer specifying the number of seconds that this record is valid for. This is important for caching purposes. Zero has a special meaning: it means that you can use the record to do whatever it is you’re doing right now, but that you can’t cache it at all.
RDLENGTH
is a 16-bit integer specifying the length of the RDATA section.
RDATA
is the record data, which is type- and class-specific. For example:
- IN A records hold an IPv4 address, as a 32-bit number
- IN NS and IN CNAME records hold a domain name
- IN AAAA records hold an IPv6 address, as a 128-bit number

Returning to our dig example, we had a few different resource records in the response. Let’s just look at the answer section:
```
;; ANSWER SECTION:
memo.barrucadu.co.uk.   292     IN      CNAME   barrucadu.co.uk.
barrucadu.co.uk.        292     IN      A       116.203.34.201
```
We have one IN CNAME record for memo.barrucadu.co.uk. and one IN A record for barrucadu.co.uk. This is because, upon encountering a CNAME, resolution starts again with whatever name the CNAME refers to.8
Leaving out the name compression for simplicity, those records are encoded as:
```
0x04 m e m o 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00  ; name: memo.barrucadu.co.uk.
0x00 0x05                                                   ; type: CNAME
0x00 0x01                                                   ; class: IN
0x00 0x00 0x01 0x24                                         ; ttl: 292
0x00 0x11                                                   ; rdlength: 17
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00               ; rdata: barrucadu.co.uk.

0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00               ; name: barrucadu.co.uk.
0x00 0x01                                                   ; type: A
0x00 0x01                                                   ; class: IN
0x00 0x00 0x01 0x24                                         ; ttl: 292
0x00 0x04                                                   ; rdlength: 4
0x74 0xcb 0x22 0xc9                                         ; rdata: 116.203.34.201
```
Returning to our dig memo.barrucadu.co.uk +noedns example from the beginning, we can now see the whole encoded query and response. I’ve included comments and linebreaks to make it clear what’s what.
Here’s the query:
```
;; header
0xb6 0x54            ; ID: 46676
0x01 0x00            ; flags: RD
0x00 0x01            ; QDCOUNT: 1
0x00 0x00            ; ANCOUNT: 0
0x00 0x00            ; NSCOUNT: 0
0x00 0x00            ; ARCOUNT: 0

;; question section
; memo.barrucadu.co.uk. A IN
0x04 m e m o 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
0x00 0x01
0x00 0x01
```
And here’s the response (omitting compression):
```
;; header
0xb6 0x54            ; ID: 46676
0x81 0x80            ; flags: QR, RD, RA
0x00 0x01            ; QDCOUNT: 1
0x00 0x02            ; ANCOUNT: 2
0x00 0x04            ; NSCOUNT: 4
0x00 0x00            ; ARCOUNT: 0

;; question section
; memo.barrucadu.co.uk. A IN
0x04 m e m o 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
0x00 0x01
0x00 0x01

;; answer section
; memo.barrucadu.co.uk. CNAME IN 292 barrucadu.co.uk.
0x04 m e m o 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
0x00 0x05
0x00 0x01
0x00 0x00 0x01 0x24
0x00 0x11
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
; barrucadu.co.uk. A IN 292 116.203.34.201
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
0x00 0x01
0x00 0x01
0x00 0x00 0x01 0x24
0x00 0x04
0x74 0xcb 0x22 0xc9

;; authority section
; barrucadu.co.uk. NS IN 2975 ns-98.awsdns-12.com.
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
0x00 0x02
0x00 0x01
0x00 0x00 0x0b 0x9f
0x00 0x15
0x05 n s - 9 8 0x09 a w s d n s - 1 2 0x03 c o m 0x00
; barrucadu.co.uk. NS IN 2975 ns-1520.awsdns-62.org.
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
0x00 0x02
0x00 0x01
0x00 0x00 0x0b 0x9f
0x00 0x17
0x07 n s - 1 5 2 0 0x09 a w s d n s - 6 2 0x03 o r g 0x00
; barrucadu.co.uk. NS IN 2975 ns-1828.awsdns-36.co.uk.
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
0x00 0x02
0x00 0x01
0x00 0x00 0x0b 0x9f
0x00 0x19
0x07 n s - 1 8 2 8 0x09 a w s d n s - 3 6 0x02 c o 0x02 u k 0x00
; barrucadu.co.uk. NS IN 2975 ns-763.awsdns-31.net.
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
0x00 0x02
0x00 0x01
0x00 0x00 0x0b 0x9f
0x00 0x16
0x06 n s - 7 6 3 0x09 a w s d n s - 3 1 0x03 n e t 0x00
```
And that’s that!
The DNS protocol isn’t very complicated. But it is somewhat fiddly, what with each record type having its own RDATA format, and the domain name compression. One big thing I learned implementing resolved is to always fuzz test your serialisation and deserialisation logic.
When we ran dig memo.barrucadu.co.uk +noedns in the previous section, we got an answer. We found the IP address which memo.barrucadu.co.uk. refers to.
But how?
Well, dig tells us that it talked to some server at 185.12.64.2. But how did that server know? Does it have a copy of the entire DNS? Unlikely, since there are hundreds of millions of records in use.
The answer is that the server followed a process called recursive resolution. This is described in section 5.3.3 of RFC 1034, and goes roughly like this:

1. See if the answer is in local information (e.g. a cache), and if so return it.
2. Figure out the best nameservers to ask.
3. Send them queries until one returns a response.
4. Analyse the response:
   a. if the response answers the question, or contains a name error, cache the data and return it to the client.
   b. if the response contains a better delegation to other nameservers, cache that delegation information, and go back to step 2.
   c. if the response shows a CNAME, and that is not the answer itself, cache the CNAME and restart the query at the canonical name.
   d. if the response shows a server failure or other bizarre contents, stop using that server and go back to step 3.
On the face of it this looks pretty straightforward… but on closer inspection step 2 is doing a lot of work: how exactly do we “figure out the best nameservers to ask”?9
Well, step 4.b gives us a clue here: if the response gives some better nameservers to use, cache them and go back to step 2. So we don’t need to pick the correct nameservers at the very beginning: we only need to know about a nameserver which can point us to a nameserver which knows the answer (or is closer to knowing it).
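The “go back to step 2” loop is the heart of it. Here’s a toy model of just that control flow, with entirely made-up data and names; real resolution also has to deal with caching, CNAMEs, timeouts, and failures:

```rust
use std::collections::HashMap;

// Each zone's nameserver either knows the final answer or delegates
// to a more specific zone. Purely illustrative.
#[derive(Copy, Clone)]
enum Response {
    Answer(&'static str),     // step 4.a: we're done
    Delegation(&'static str), // step 4.b: a better nameserver to ask
}

fn resolve(start: &'static str, servers: &HashMap<&'static str, Response>) -> Option<&'static str> {
    let mut current = start;
    loop {
        match *servers.get(current)? {
            Response::Answer(ip) => return Some(ip),
            Response::Delegation(zone) => current = zone, // back to step 2
        }
    }
}
```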
There are thirteen nameservers which, transitively, know about every domain name. These are the root nameservers, and they’re where recursive resolution starts.
You can find them at a.root-servers.net. through m.root-servers.net. So you just point your recursive resolver at, say, j.root-servers.net. and… oh wait, we have a chicken-and-egg problem: ultimately, you need to know their IP addresses. IANA, the Internet Assigned Numbers Authority, provides the “root hints” file, which has the IPv4 and IPv6 addresses of these root nameservers.
How do you download that file if you don’t have DNS working to resolve www.iana.org.? Look, you just need IP addresses to get DNS and DNS to get IP addresses. Use 1.1.1.1 or something while you get your fancy recursive resolver working.
Alright, let’s resolve memo.barrucadu.co.uk. recursively! Starting with:
```
$ dig memo.barrucadu.co.uk @j.root-servers.net

; <<>> DiG 9.16.27 <<>> memo.barrucadu.co.uk @j.root-servers.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48477
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 8, ADDITIONAL: 17
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;memo.barrucadu.co.uk.          IN      A

;; AUTHORITY SECTION:
uk.                     172800  IN      NS      dns1.nic.uk.
uk.                     172800  IN      NS      dns4.nic.uk.
uk.                     172800  IN      NS      nsa.nic.uk.
uk.                     172800  IN      NS      nsd.nic.uk.
uk.                     172800  IN      NS      nsc.nic.uk.
uk.                     172800  IN      NS      nsb.nic.uk.
uk.                     172800  IN      NS      dns3.nic.uk.
uk.                     172800  IN      NS      dns2.nic.uk.

;; ADDITIONAL SECTION:
dns1.nic.uk.            172800  IN      A       213.248.216.1
dns1.nic.uk.            172800  IN      AAAA    2a01:618:400::1
dns4.nic.uk.            172800  IN      A       43.230.48.1
dns4.nic.uk.            172800  IN      AAAA    2401:fd80:404::1
nsa.nic.uk.             172800  IN      A       156.154.100.3
nsa.nic.uk.             172800  IN      AAAA    2001:502:ad09::3
nsd.nic.uk.             172800  IN      A       156.154.103.3
nsd.nic.uk.             172800  IN      AAAA    2610:a1:1010::3
nsc.nic.uk.             172800  IN      A       156.154.102.3
nsc.nic.uk.             172800  IN      AAAA    2610:a1:1009::3
nsb.nic.uk.             172800  IN      A       156.154.101.3
nsb.nic.uk.             172800  IN      AAAA    2001:502:2eda::3
dns3.nic.uk.            172800  IN      A       213.248.220.1
dns3.nic.uk.            172800  IN      AAAA    2a01:618:404::1
dns2.nic.uk.            172800  IN      A       103.49.80.1
dns2.nic.uk.            172800  IN      AAAA    2401:fd80:400::1

;; Query time: 4 msec
;; SERVER: 2001:503:c27::2:30#53(2001:503:c27::2:30)
;; WHEN: Sat Apr 02 23:20:04 BST 2022
;; MSG SIZE  rcvd: 553
```
Alright, we now know the names and IP addresses of the uk. nameservers. Thanks, additional section!
On we go:
```
$ dig memo.barrucadu.co.uk @213.248.216.1

; <<>> DiG 9.16.27 <<>> memo.barrucadu.co.uk @213.248.216.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43056
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;memo.barrucadu.co.uk.          IN      A

;; AUTHORITY SECTION:
barrucadu.co.uk.        172800  IN      NS      ns-98.awsdns-12.com.
barrucadu.co.uk.        172800  IN      NS      ns-763.awsdns-31.net.
barrucadu.co.uk.        172800  IN      NS      ns-1520.awsdns-62.org.
barrucadu.co.uk.        172800  IN      NS      ns-1828.awsdns-36.co.uk.

;; Query time: 14 msec
;; SERVER: 213.248.216.1#53(213.248.216.1)
;; WHEN: Sat Apr 02 23:21:28 BST 2022
;; MSG SIZE  rcvd: 183
```
No additional section here, so we’ll need to resolve one of those nameservers. Back to the root!
```
$ dig ns-98.awsdns-12.com @j.root-servers.net

; <<>> DiG 9.16.27 <<>> ns-98.awsdns-12.com @j.root-servers.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8418
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;ns-98.awsdns-12.com.           IN      A

;; AUTHORITY SECTION:
com.                    172800  IN      NS      a.gtld-servers.net.
com.                    172800  IN      NS      b.gtld-servers.net.
com.                    172800  IN      NS      c.gtld-servers.net.
com.                    172800  IN      NS      d.gtld-servers.net.
com.                    172800  IN      NS      e.gtld-servers.net.
com.                    172800  IN      NS      f.gtld-servers.net.
com.                    172800  IN      NS      g.gtld-servers.net.
com.                    172800  IN      NS      h.gtld-servers.net.
com.                    172800  IN      NS      i.gtld-servers.net.
com.                    172800  IN      NS      j.gtld-servers.net.
com.                    172800  IN      NS      k.gtld-servers.net.
com.                    172800  IN      NS      l.gtld-servers.net.
com.                    172800  IN      NS      m.gtld-servers.net.

;; ADDITIONAL SECTION:
a.gtld-servers.net.     172800  IN      A       192.5.6.30
b.gtld-servers.net.     172800  IN      A       192.33.14.30
c.gtld-servers.net.     172800  IN      A       192.26.92.30
d.gtld-servers.net.     172800  IN      A       192.31.80.30
e.gtld-servers.net.     172800  IN      A       192.12.94.30
f.gtld-servers.net.     172800  IN      A       192.35.51.30
g.gtld-servers.net.     172800  IN      A       192.42.93.30
h.gtld-servers.net.     172800  IN      A       192.54.112.30
i.gtld-servers.net.     172800  IN      A       192.43.172.30
j.gtld-servers.net.     172800  IN      A       192.48.79.30
k.gtld-servers.net.     172800  IN      A       192.52.178.30
l.gtld-servers.net.     172800  IN      A       192.41.162.30
m.gtld-servers.net.     172800  IN      A       192.55.83.30
a.gtld-servers.net.     172800  IN      AAAA    2001:503:a83e::2:30
b.gtld-servers.net.     172800  IN      AAAA    2001:503:231d::2:30
c.gtld-servers.net.     172800  IN      AAAA    2001:503:83eb::30
d.gtld-servers.net.     172800  IN      AAAA    2001:500:856e::30
e.gtld-servers.net.     172800  IN      AAAA    2001:502:1ca1::30
f.gtld-servers.net.     172800  IN      AAAA    2001:503:d414::30
g.gtld-servers.net.     172800  IN      AAAA    2001:503:eea3::30
h.gtld-servers.net.     172800  IN      AAAA    2001:502:8cc::30
i.gtld-servers.net.     172800  IN      AAAA    2001:503:39c1::30
j.gtld-servers.net.     172800  IN      AAAA    2001:502:7094::30
k.gtld-servers.net.     172800  IN      AAAA    2001:503:d2d::30
l.gtld-servers.net.     172800  IN      AAAA    2001:500:d937::30
m.gtld-servers.net.     172800  IN      AAAA    2001:501:b1f9::30

;; Query time: 3 msec
;; SERVER: 2001:503:c27::2:30#53(2001:503:c27::2:30)
;; WHEN: Sat Apr 02 23:22:36 BST 2022
;; MSG SIZE  rcvd: 844
```
We’ve got the com. nameservers. Next!10
```
$ dig ns-98.awsdns-12.com @192.5.6.30

; <<>> DiG 9.16.27 <<>> ns-98.awsdns-12.com @192.5.6.30
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59687
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ns-98.awsdns-12.com.           IN      A

;; AUTHORITY SECTION:
awsdns-12.com.          172800  IN      NS      g-ns-13.awsdns-12.com.
awsdns-12.com.          172800  IN      NS      g-ns-588.awsdns-12.com.
awsdns-12.com.          172800  IN      NS      g-ns-1164.awsdns-12.com.
awsdns-12.com.          172800  IN      NS      g-ns-1740.awsdns-12.com.

;; ADDITIONAL SECTION:
g-ns-13.awsdns-12.com.  172800  IN      A       205.251.192.13
g-ns-13.awsdns-12.com.  172800  IN      AAAA    2600:9000:5300:d00::1
g-ns-588.awsdns-12.com. 172800  IN      A       205.251.194.76
g-ns-588.awsdns-12.com. 172800  IN      AAAA    2600:9000:5302:4c00::1
g-ns-1164.awsdns-12.com. 172800 IN      A       205.251.196.140
g-ns-1164.awsdns-12.com. 172800 IN      AAAA    2600:9000:5304:8c00::1
g-ns-1740.awsdns-12.com. 172800 IN      A       205.251.198.204
g-ns-1740.awsdns-12.com. 172800 IN      AAAA    2600:9000:5306:cc00::1

;; Query time: 23 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
;; WHEN: Sat Apr 02 23:24:01 BST 2022
;; MSG SIZE  rcvd: 317
```
Nearly there… each query gets us a step or two closer.
```
$ dig ns-98.awsdns-12.com @205.251.192.13

; <<>> DiG 9.16.27 <<>> ns-98.awsdns-12.com @205.251.192.13
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43579
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ns-98.awsdns-12.com.           IN      A

;; ANSWER SECTION:
ns-98.awsdns-12.com.    172800  IN      A       205.251.192.98

;; AUTHORITY SECTION:
awsdns-12.com.          172800  IN      NS      g-ns-1164.awsdns-12.com.
awsdns-12.com.          172800  IN      NS      g-ns-13.awsdns-12.com.
awsdns-12.com.          172800  IN      NS      g-ns-1740.awsdns-12.com.
awsdns-12.com.          172800  IN      NS      g-ns-588.awsdns-12.com.

;; ADDITIONAL SECTION:
g-ns-1164.awsdns-12.com. 172800 IN      A       205.251.196.140
g-ns-1164.awsdns-12.com. 172800 IN      AAAA    2600:9000:5304:8c00::1
g-ns-13.awsdns-12.com.  172800  IN      A       205.251.192.13
g-ns-13.awsdns-12.com.  172800  IN      AAAA    2600:9000:5300:d00::1
g-ns-1740.awsdns-12.com. 172800 IN      A       205.251.198.204
g-ns-1740.awsdns-12.com. 172800 IN      AAAA    2600:9000:5306:cc00::1
g-ns-588.awsdns-12.com. 172800  IN      A       205.251.194.76
g-ns-588.awsdns-12.com. 172800  IN      AAAA    2600:9000:5302:4c00::1

;; Query time: 13 msec
;; SERVER: 205.251.192.13#53(205.251.192.13)
;; WHEN: Sat Apr 02 23:24:41 BST 2022
;; MSG SIZE  rcvd: 333
```
We’ve got an IP address for ns-98.awsdns-12.com.! Now we can answer our original question:
```
$ dig memo.barrucadu.co.uk @205.251.192.98

; <<>> DiG 9.16.27 <<>> memo.barrucadu.co.uk @205.251.192.98
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26684
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;memo.barrucadu.co.uk.          IN      A

;; ANSWER SECTION:
memo.barrucadu.co.uk.   300     IN      CNAME   barrucadu.co.uk.
barrucadu.co.uk.        300     IN      A       116.203.34.201

;; AUTHORITY SECTION:
barrucadu.co.uk.        172800  IN      NS      ns-1520.awsdns-62.org.
barrucadu.co.uk.        172800  IN      NS      ns-1828.awsdns-36.co.uk.
barrucadu.co.uk.        172800  IN      NS      ns-763.awsdns-31.net.
barrucadu.co.uk.        172800  IN      NS      ns-98.awsdns-12.com.

;; Query time: 13 msec
;; SERVER: 205.251.192.98#53(205.251.192.98)
;; WHEN: Sat Apr 02 23:25:37 BST 2022
;; MSG SIZE  rcvd: 213
```
And we’re done, after six requests to other nameservers. In a real nameserver implementation, we’d check before each of those requests whether we already had the answer cached, so some of them (eg, the request to find the com. nameservers) likely wouldn’t have been needed.
In the previous section, it looked very much like the DNS was broken up into subtrees (or “zones”, if you will) based on the label structure:

- the . nameservers knew about the com. and uk. nameservers, but couldn’t answer queries about subdomains of those directly
- the uk. nameservers knew about the nameservers for barrucadu.co.uk., but not any of its other records
- and similarly for the com. nameservers and awsdns-12.com.
This makes sense. Imagine if the root nameservers knew every DNS record! Their databases would be huge! It would be infeasible to run a handful of servers which know hundreds of millions of records and which the whole world uses.
So . is a zone. And uk. is a zone. And barrucadu.co.uk. is a zone. All of the TLDs are zones, and every domain you can buy creates a new zone. A zone can be bigger than a single label, e.g. foo.bar.baz.barrucadu.co.uk. is in the barrucadu.co.uk. zone unless I delegate it to someone else, by creating some NS records for, say, baz.barrucadu.co.uk.
That’s exactly how registering a domain name works, by the way. The registrars have privileged access to the TLD nameservers, and you pay them some money to send a message to the nameservers saying “please delegate barrucadu to these other nameservers”.
Zones are traditionally represented in a textual format defined in RFC 1035.11 You’ve seen this format before: it’s the format dig gives its responses in, and it’s the format of the root hints file (and the root zone file, also provided by IANA).
Here’s the zone file I use for my LAN DNS:
```
$ORIGIN lan.

@              300 IN SOA   @ @ 4 300 300 300 300

router         300 IN A     10.0.0.1
nyarlathotep   300 IN A     10.0.0.3
*.nyarlathotep 300 IN CNAME nyarlathotep
help           300 IN CNAME nyarlathotep
*.help         300 IN CNAME help
nas            300 IN CNAME nyarlathotep
```
It’s a list of records, but note that they all use relative domain names (no dot at the end). I could write them as absolute domain names, but that would be repetitive, and who doesn’t want to golf their zone files? The $ORIGIN line at the top is used to complete any relative names, and the @ is an alias for the origin, so this zone file could also be written as:
```
lan.                300 IN SOA   lan. lan. 4 300 300 300 300

router.lan.         300 IN A     10.0.0.1
nyarlathotep.lan.   300 IN A     10.0.0.3
*.nyarlathotep.lan. 300 IN CNAME nyarlathotep.lan.
help.lan.           300 IN CNAME nyarlathotep.lan.
*.help.lan.         300 IN CNAME help.lan.
nas.lan.            300 IN CNAME nyarlathotep.lan.
```
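The origin-completion rule itself is small. Here’s a sketch of it; `absolute` is a hypothetical helper, not how resolved’s zone parser is actually written:

```rust
/// Complete a zone-file name against $ORIGIN: "@" means the origin
/// itself, a name ending in a dot is already absolute, and anything
/// else gets the origin appended. Hypothetical helper, for
/// illustration only.
fn absolute(name: &str, origin: &str) -> String {
    if name == "@" {
        origin.to_string()
    } else if name.ends_with('.') {
        name.to_string()
    } else {
        format!("{name}.{origin}")
    }
}
```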
Zones come in two types: authoritative (also just called a zone, or a master zone) and non-authoritative (also called hints). An authoritative zone has a SOA record, and causes the nameserver to give authoritative responses to questions which fall into that zone.12 Non-authoritative zones don’t, and are primarily useful as a sort of permanent cache. Take the root hints file, for example: all recursive resolvers need to know the NS records for ., but they should not act as if they’re authoritative for .; they just know a little bit about it.
Since any nameserver could claim to be authoritative for any zone it wants, and I’m sure malicious nameservers often do try to claim ownership of big sites like google.com., how does the DNS work?
It works on trust.
You trust that the root nameservers will give you the correct nameservers for all the TLDs. You then, in turn, trust that the TLD nameservers will give the correct nameservers for the domains registered under those TLDs. And so on, all the way down to the domain you actually want to resolve.
Not every nameserver operator will be equally trustworthy or competent, so that trust does erode somewhat as you move further and further away from the root, but if you do some basic validation of DNS responses (e.g. rejecting a response with NS records for a domain which is not a subdomain of the zone which you know this nameserver to be authoritative for), you can do pretty well.
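That kind of validation check (sometimes called a “bailiwick” check) can be sketched like this, with dotted-string names for simplicity; `in_bailiwick` is a hypothetical helper, not resolved’s actual code:

```rust
/// Is `name` inside `zone`? Only accept a record from a nameserver
/// if its name falls inside the zone that server is known to be
/// authoritative for. Hypothetical helper, for illustration only.
fn in_bailiwick(name: &str, zone: &str) -> bool {
    if zone == "." {
        // every absolute name is a subdomain of the root
        name.ends_with('.')
    } else {
        name == zone || name.ends_with(&format!(".{zone}"))
    }
}
```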
There are, broadly speaking, three sorts of nameservers you see:
Authoritative nameservers are the source of truth for records about a given zone. Typically, these refuse to answer questions for other zones. These set the AA flag for queries falling into their zones, and return a “name error” response if a name they are authoritative for doesn’t exist.13 In resolved this is implemented by the dns_resolver::nonrecursive module.
Recursive nameservers (or recursive resolvers) perform recursive resolution for anyone who wants it. For example: 1.1.1.1, 8.8.8.8, and the nameserver your ISP operates. Typically, these are not authoritative for any zones. Recursive nameservers are convenient because the client doesn’t need to implement the recursive algorithm themselves: they can just fire off a query and get the response.14 In resolved this is implemented by the dns_resolver::recursive module.
Forwarding nameservers (or forwarding resolvers) forward all queries to a recursive resolver, rather than do the recursive resolution themselves. Typically, these are not authoritative for any zones. Forwarding nameservers are simpler than recursive nameservers, and they’re useful for the same reason any other sort of proxy is: they can increase cache hit rate (by having many clients go through the forwarding resolver), and selectively falsify or block records.15 In resolved this is implemented by the dns_resolver::forwarding module.
Of course, there’s no reason a single nameserver can’t do all of those things at the same time!
Consider bind, the big-name nameserver. Check out its configuration documentation: it says any zone can be authoritative, forwarded, or hints, and the allow-recursion option configures whether recursive queries for zones the server doesn’t know about are allowed.
My resolved server by default supports authoritative zones and recursive resolution. It may not appear to support bind-style zone-specific forwarding, but you could implement that with a hints file containing NS records for the zone you want to forward, and there is a command-line flag to forward all recursive queries to some other server.
The reason you’d want to make a nameserver do only one sort of resolution is to keep operation simple. In particular, it’s good practice for internet-facing authoritative nameservers to only perform non-recursive resolution: answering or rejecting queries based only on local data gives them much more predictable performance.
When I first got into all this web development stuff, the common wisdom was that DNS changes took 24 to 48 hours to propagate. But having seen some details of the DNS protocol and how recursive resolution works, does that really make sense? Shouldn’t changes be visible as soon as the TTL of the old record expires? And shouldn’t new records be visible immediately? Why do changes need to propagate? Where do they propagate to?
Propagation implies a push model, where you make your changes and then they get sent to the resolvers which need them. But that’s not what happens at all: instead, caches expire.
Ok, there are two cases in which DNS does propagate:
My hunch is that this 24 to 48 hour window came from:
Ah, ISP DNS. Almost the first thing any self-respecting nerd changes when setting up a new home network. They often do nefarious things like redirect misspelled domain names to ad-covered search pages, trying to profit off your typos. And, as it turns out, a lot of them ignore record TTLs, and will cache something for a long period if they feel like it.
How long? Well, I’ve seen reports of 24 hours…
Well, no matter what the cause of the occasional slow DNS update is—though I can’t say I’ve experienced slow DNS updates in a very long time, and updates are evidently fast enough for changing an A record to be considered a viable failover mechanism for big sites—“propagation” is the wrong mental model.
DNS is pull, not push.
I’ve been running resolved for my LAN for about two and a half weeks now. And it’s working pretty well! Ok, I have implemented a few more RFCs:
So there are a few things. But what I’ve covered in this memo is, more or less, enough to implement a working nameserver. You’d need to look up the formats of a few more common record types in RFC 1035, and also the full algorithm for non-recursive resolution in RFC 1034 (which I glossed over in a single sentence), but the point is that DNS is not very complicated, even today.
There have been new record types; there have been security extensions; there have been clarifications; some zones have been given special meaning. But all of that is optional.
Certainly for a home network, RFCs 1034 and 1035 are enough.
You could even call it a “NoSQL” database, if you really must.↩︎
The +noedns flag turns off some extensions to the basic DNS protocol, which I’m not covering for simplicity.↩︎
Ok, I’ve actually known this one for a while, because I’m the sort of person to pedantically bring that up.↩︎
Well, this plus source port matching. There are also some other security mechanisms DNS clients sometimes use to prevent spoofed responses, like randomly capitalising letters in the question names (since DNS is case-insensitive), and checking that the response from the server uses the same random capitalisation.↩︎
Fun fact, Alpine Linux doesn’t support DNS over TCP, so it can break if a truncated response doesn’t include enough complete records for it to make progress.↩︎
And also makes encoded domain names work as null-terminated strings in C in the (very common) case where none of the labels contain a null byte. What a fortuitous coincidence!↩︎
It feels kind of wasteful that we effectively throw away 16 whole bits for each question and record on this historical artefact. UDP messages are short, so we compress domain names to squeeze out a little extra space, but then we waste a bunch like this! Even worse, there never were very many network classes: RFC 1035 only defines four. Did the IETF really expect there to be so many non-internet networks in the future?↩︎
Unless the query was for, say, IN CNAME memo.barrucadu.co.uk. More on this in how resolution happens.↩︎
That step 1 is also doing a surprising amount of work if your nameserver supports authoritative zones (see next section). For the full gory details, see section 4.3.2 of RFC 1034.↩︎
I know it’s a necessary consequence of how DNS works, but I still find it pretty cool that there are servers which know about literally every com. (or uk., or net., etc) domain name.↩︎
Like the DNS protocol, this format appears to be straightforward but is annoyingly fiddly when you come to implement it. It’s almost (but not quite!) line-oriented, just about every field is optional, and there are two fields which can be written in either order. Just why?↩︎
See the next section for more on authoritative nameservers.↩︎
Note that there’s a difference between a domain not existing and a domain existing but having no records at all (or just no records matching the current query). An authoritative nameserver should only return a name error if the domain actually doesn’t exist.↩︎
In fact, the resolver your operating system uses is probably what’s called a “stub resolver”, rather than a recursive resolver. Try configuring your DNS resolver in /etc/resolv.conf to be one of the root nameservers, rather than a recursive resolver: it won’t work.↩︎
The Pi-hole is a forwarding resolver which blocks advertising domains by returning a fake A record pointing to some unusable IP address, like 0.0.0.0.↩︎
I used to read this as “A-A-A-A” but, having now typed and said it a bunch, I’ve switched to the less tongue-twistery “quad-A”. I wonder what actual networking people say.↩︎
This memo is about how I ended up implementing the caching layer. You don’t need to know much about DNS to follow this memo, just some basics: DNS is essentially a key-value store, where the keys are domain names (eg, www.barrucadu.co.uk) and the values are resource records (RRs for short).

Let’s make this a bit more concrete:
```rust
// we'll need these later
use priority_queue::PriorityQueue;
use std::cmp::Reverse;
use std::collections::HashMap;
use std::net::Ipv4Addr;
use std::time::{Duration, Instant};

/// A resource record, or RR, is something we receive from another
/// nameserver, or which we send in answer to a client's query.
#[derive(Debug)]
pub struct ResourceRecord {
    pub name: DomainName,
    pub rtype: RecordTypeWithData,
    pub rclass: RecordClass,
    pub ttl: Duration,
}

/// A domain name is a sequence of "labels", eg, `www.barrucadu.co.uk`
/// is made up of the labels `["www", "barrucadu", "co", "uk", ""]`.
/// The final empty label is the root domain, which we normally don't
/// bother writing, but is meaningful in some contexts.
///
/// Incidentally, the final empty label means that in the DNS wire
/// format, names are null-terminated. I'm sure this isn't a
/// coincidence.
///
/// Labels are ASCII and case-insensitive, so make sure to construct
/// them correctly!
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DomainName {
    pub labels: Vec<Vec<u8>>,
}

/// Record data depends on its type, so this enum has one variant for
/// each type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RecordTypeWithData {
    A { address: Ipv4Addr },
    CNAME { cname: DomainName },
    // many more omitted
}

/// We'll also need a notion of record type *without* data.
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub enum RecordType {
    A,
    CNAME,
    // many more omitted
}

impl RecordTypeWithData {
    pub fn rtype(&self) -> RecordType {
        match self {
            RecordTypeWithData::A { .. } => RecordType::A,
            RecordTypeWithData::CNAME { .. } => RecordType::CNAME,
            // many more omitted
        }
    }
}

/// The record class identifies which sort of network the record is
/// for. For the purposes of this memo, let's only consider the
/// internet.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum RecordClass {
    IN,
}
```
Before we go any further, there's one final prerequisite. When you ask a DNS server for some records, you don't say, "Give me all records of such-and-such record type and record class for `www.barrucadu.co.uk`." You instead ask in terms of a query type and query class.
In this memo, you can think of those as just the record types and classes we’ve just defined, plus a wildcard to mean “match anything”:
```rust
#[derive(Debug, Copy, Clone)]
pub enum QueryType {
    Record(RecordType),
    Wildcard,
}

// does a record match a query, or a query match a record?  this is
// the way 'round I went for, but the other choice would make just as
// much sense.
impl RecordType {
    pub fn matches(&self, qtype: &QueryType) -> bool {
        match qtype {
            QueryType::Wildcard => true,
            QueryType::Record(rtype) => rtype == self,
        }
    }
}

#[derive(Debug, Copy, Clone)]
pub enum QueryClass {
    Record(RecordClass),
    Wildcard,
}

impl RecordClass {
    pub fn matches(&self, qclass: &QueryClass) -> bool {
        match qclass {
            QueryClass::Wildcard => true,
            QueryClass::Record(rclass) => rclass == self,
        }
    }
}
```
There are a few more in reality, but they’re not important for our purposes.
So we'll use the `Record*` types to put values into the cache and the `Query*` types to get values from the cache.
Right, what’s the simplest possible cache we could implement?
Perhaps something like this:[1]
```rust
pub struct SimpleCache {
    entries: HashMap<DomainName, Vec<(RecordTypeWithData, RecordClass, Instant)>>,
}

impl SimpleCache {
    pub fn new() -> Self {
        Self {
            entries: HashMap::new(),
        }
    }
```
To put something in the cache, you just add it to the appropriate `Vec`:
```rust
    pub fn insert(&mut self, name: &DomainName, rr: ResourceRecord) {
        let entry = (rr.rtype, rr.rclass, Instant::now() + rr.ttl);
        if let Some(entries) = self.entries.get_mut(name) {
            entries.push(entry);
        } else {
            self.entries.insert(name.clone(), vec![entry]);
        }
    }
```
What if the user inserts the same record twice?
Well, what about it? This is a proof-of-concept! The DNS resolver will return duplicate records I guess! Moving swiftly on…
To get something from the cache, just iterate over the appropriate `Vec`, pulling out all the records with the right type and class:
```rust
    pub fn get(
        &self,
        name: &DomainName,
        qtype: QueryType,
        qclass: QueryClass,
    ) -> Vec<ResourceRecord> {
        let now = Instant::now();
        if let Some(entries) = self.entries.get(name) {
            let mut rrs = Vec::with_capacity(entries.len());
            for (rtype_with_data, rclass, expires) in entries {
                if rtype_with_data.rtype().matches(&qtype) && rclass.matches(&qclass) {
                    rrs.push(ResourceRecord {
                        name: name.clone(),
                        rtype: rtype_with_data.clone(),
                        rclass: *rclass,
                        ttl: expires.saturating_duration_since(now),
                    });
                }
            }
            rrs
        } else {
            Vec::new()
        }
    }
}
```
What if a record has expired?
Proof-of-concept! The caller can deal with that by checking expiration times or something!
So, this was the caching implementation I started with. It works, but it has some problems:
- Inserting the same record twice results in duplicate entries.
- Expired records are never removed, so the cache grows without bound.
- It treats domain names as opaque keys in the `HashMap`, totally ignoring their hierarchical label structure.

But it's better than no cache!
Ok, how do we do better? The most egregious problems with the simple cache are the duplicate entries and the unbounded growth.
Using something that takes the hierarchical structure of domain names into account, like a trie, would also be nice, but I’m not dealing with enough live cache entries for that to be a concern yet.
So, how do we remove entries?
Well, we could periodically iterate over the entire cache, removing all expired entries. But if entries have a long expiration time, or just get accessed frequently enough, they won’t expire. So relying on expiration isn’t enough, we also need to occasionally remove live entries.
This sounds like a job for an LRU[2] cache: a size-bounded LRU cache with expiring entries for my DNS server!
Before jumping straight to the `struct` definition, let's think about how to model this:
To solve the problem of iterating through records of unrelated types, we’ll need to subdivide the entries by type as well as domain name.
We’ll need to keep track of the most recent time each record has been accessed, so when the cache is full of unexpired records we can work out which one to evict first.
But the cache may be big. There could be hundreds or thousands of domains in there, each likely with multiple records. Iterating through the whole thing to find records to evict is a bad choice. We need a more efficient data structure to map from eviction priority to domain name.
For similar reasons, we don’t want to have to iterate through the entire cache to work out how big it is.
My usual mantra for designing data structures is to “make illegal states unrepresentable”, but I don’t think that will work here. To make this cache efficient, we’ll need to denormalise the data, and make our code ensure the correct invariants hold. Testing helps with this (and indeed testing did find some bugs in my implementation).
So I decided to use a pair of priority queues[3] to efficiently track (1) which domain is next to have an expiring record, and (2) which domain has been least recently used. I also decided to keep track of sizes and times throughout the data structure, rather than just in the records.
Here’s the new cache data structure:
```rust
#[derive(Debug)]
pub struct BetterCache {
    /// Cached records, indexed by domain name.
    entries: HashMap<DomainName, CachedDomainRecords>,

    /// Priority queue of domain names ordered by access times.
    ///
    /// When the cache is full and there are no expired records to
    /// prune, domains will instead be pruned in LRU order.
    ///
    /// INVARIANT: the domains in here are exactly the domains in
    /// `entries`.
    access_priority: PriorityQueue<DomainName, Reverse<Instant>>,

    /// Priority queue of domain names ordered by expiry time.
    ///
    /// When the cache is pruned, expired records are removed first.
    ///
    /// INVARIANT: the domains in here are exactly the domains in
    /// `entries`.
    expiry_priority: PriorityQueue<DomainName, Reverse<Instant>>,

    /// The number of records in the cache.
    ///
    /// INVARIANT: this is the sum of the `size` fields of the
    /// entries.
    current_size: usize,

    /// The desired maximum number of records in the cache.
    desired_size: usize,
}

#[derive(Debug)]
struct CachedDomainRecords {
    /// The time this record was last read at.
    last_read: Instant,

    /// When the next RR expires.
    ///
    /// INVARIANT: this is the minimum of the expiry times of the RRs.
    next_expiry: Instant,

    /// How many records there are.
    ///
    /// INVARIANT: this is the sum of the vector lengths in `records`.
    size: usize,

    /// The records, further divided by record type.
    ///
    /// INVARIANT: the `RecordType` and `RecordTypeWithData` match.
    records: HashMap<RecordType, Vec<(RecordTypeWithData, RecordClass, Instant)>>,
}

impl BetterCache {
    pub fn new() -> Self {
        Self::with_desired_size(512)
    }

    pub fn with_desired_size(desired_size: usize) -> Self {
        if desired_size == 0 {
            panic!("cannot create a zero-size cache");
        }

        Self {
            // `desired_size / 2` is a compromise: most domains will
            // have more than one record, so `desired_size` would be
            // too big for the `entries`.
            entries: HashMap::with_capacity(desired_size / 2),
            access_priority: PriorityQueue::with_capacity(desired_size),
            expiry_priority: PriorityQueue::with_capacity(desired_size),
            current_size: 0,
            desired_size,
        }
    }
```
There are some invariants there in the comments. I’d prefer not to have those, but I don’t think there’s any getting around it given that we want better than linear time eviction.
This is substantially more complex than the `SimpleCache`, and the operations we're about to define on it are too. Make sure this all makes sense before continuing. In particular, you might notice that I've opted to have the LRU eviction expire entire domain names, rather than individual records within them.
Let’s go through the new operations in order of complexity: querying, eviction, and insertion.
This isn’t too bad:
```rust
    /// Get an entry from the cache.
    ///
    /// The TTL in the returned `ResourceRecord` is relative to the
    /// current time - not when the record was inserted into the
    /// cache.
    ///
    /// This entry may have expired: if so, the TTL will be 0.
    /// Consumers MUST check this before using the record!
    pub fn get(
        &mut self,
        name: &DomainName,
        qtype: &QueryType,
        qclass: &QueryClass,
    ) -> Vec<ResourceRecord> {
        if let Some(entry) = self.entries.get_mut(name) {
            let now = Instant::now();
            let mut rrs = Vec::new();
            match qtype {
                QueryType::Wildcard => {
                    for tuples in entry.records.values() {
                        to_rrs(name, qclass, now, tuples, &mut rrs);
                    }
                }
                QueryType::Record(rtype) => {
                    if let Some(tuples) = entry.records.get(rtype) {
                        to_rrs(name, qclass, now, tuples, &mut rrs);
                    }
                }
            }
            if !rrs.is_empty() {
                entry.last_read = now;
                self.access_priority
                    .change_priority(name, Reverse(entry.last_read));
            }
            rrs
        } else {
            Vec::new()
        }
    }
}
```
This is quite similar to what we had before. Sure, the extra layer of indirection adds a tad more complication, and there's now a write operation in here (updating `last_read` and `access_priority`, which takes log time), but other than that nothing complex.
The `to_rrs` function just exists to prevent some code duplication:
```rust
/// Helper for `get_without_checking_expiration`: converts the cached
/// record tuples into RRs.
fn to_rrs(
    name: &DomainName,
    qclass: &QueryClass,
    now: Instant,
    tuples: &[(RecordTypeWithData, RecordClass, Instant)],
    rrs: &mut Vec<ResourceRecord>,
) {
    for (rtype, rclass, expires) in tuples {
        if rclass.matches(qclass) {
            rrs.push(ResourceRecord {
                name: name.clone(),
                rtype: rtype.clone(),
                rclass: *rclass,
                ttl: expires.saturating_duration_since(now),
            });
        }
    }
}
```
If you're following along at home, put that definition outside the `impl BetterCache` block.
Here are the simplest three functions in the entire `impl`:
```rust
    /// Delete all expired records, and then enough
    /// least-recently-used records to reduce the cache to the desired
    /// size.
    ///
    /// Returns the number of records deleted.
    pub fn prune(&mut self) -> usize {
        if self.current_size <= self.desired_size {
            return 0;
        }

        let mut pruned = self.remove_expired();
        while self.current_size > self.desired_size {
            pruned += self.remove_least_recently_used();
        }
        pruned
    }

    /// Helper for `prune`: deletes all records associated with the
    /// least recently used domain.
    ///
    /// Returns the number of records removed.
    fn remove_least_recently_used(&mut self) -> usize {
        if let Some((name, _)) = self.access_priority.pop() {
            self.expiry_priority.remove(&name);
            if let Some(entry) = self.entries.remove(&name) {
                let pruned = entry.size;
                self.current_size -= pruned;
                pruned
            } else {
                0
            }
        } else {
            0
        }
    }

    /// Delete all expired records.
    ///
    /// Returns the number of records deleted.
    pub fn remove_expired(&mut self) -> usize {
        let mut pruned = 0;
        loop {
            let before = pruned;
            pruned += self.remove_expired_step();
            if before == pruned {
                break;
            }
        }
        pruned
    }
```
So simple! So straightforward! If only all my code could be like this.
`prune` shrinks the cache to the desired size by removing the expired entries and then removing enough domains (in LRU order) to get below the target.

`remove_least_recently_used` pops an entry from the `access_priority` queue, removes it from the `expiry_priority` queue (which takes log time), and deletes it from the top-level `entries` map. It also updates the `current_size`, and returns the number of records it just deleted.

`remove_expired` is deceptively simple. It looks easy at first glance, but it's calling this `remove_expired_step` function in a loop, until no more get removed.
Removing an entire domain is easy, but removing individual records from a domain is harder:
- The `size` of the domain will change.
- The `next_expiry` of the domain may change.
- We need to update the top-level `current_size` and `expiry_priority` fields.

Additionally, the queue gives us the domain name, and there may be one or more expiring records in it (or even zero, but that would be a bug).
With all that said, here’s the implementation:
```rust
    /// Helper for `remove_expired`: looks at the next-to-expire
    /// domain and cleans up expired records from it.  This may delete
    /// more than one record, and may even delete the whole domain.
    ///
    /// Returns the number of records removed.
    fn remove_expired_step(&mut self) -> usize {
        if let Some((name, Reverse(expiry))) = self.expiry_priority.pop() {
            let now = Instant::now();

            if expiry > now {
                self.expiry_priority.push(name, Reverse(expiry));
                return 0;
            }

            if let Some(entry) = self.entries.get_mut(&name) {
                let mut pruned = 0;
                let rtypes = entry.records.keys().cloned().collect::<Vec<RecordType>>();
                let mut next_expiry = None;
                for rtype in rtypes {
                    if let Some(tuples) = entry.records.get_mut(&rtype) {
                        let len = tuples.len();
                        tuples.retain(|(_, _, expiry)| expiry > &now);
                        pruned += len - tuples.len();
                        for (_, _, expiry) in tuples {
                            match next_expiry {
                                None => next_expiry = Some(*expiry),
                                Some(t) if *expiry < t => next_expiry = Some(*expiry),
                                _ => (),
                            }
                        }
                    }
                }

                entry.size -= pruned;

                if let Some(ne) = next_expiry {
                    entry.next_expiry = ne;
                    self.expiry_priority.push(name, Reverse(ne));
                } else {
                    self.entries.remove(&name);
                    self.access_priority.remove(&name);
                }

                self.current_size -= pruned;
                pruned
            } else {
                self.access_priority.remove(&name);
                0
            }
        } else {
            0
        }
    }
```
It's pretty complex. We could describe it in pseudocode like so:

1. Pop the next-to-expire domain from `expiry_priority`.
2. If its next expiry is actually in the future, push it back and stop: nothing has expired.
3. Otherwise, drop all expired records from that domain, keeping track of the new soonest expiry time.
4. If the domain still has records, update its `next_expiry` and push it back onto `expiry_priority`; otherwise, remove the domain from `entries` and `access_priority` entirely.
5. Update the sizes, and return the number of records removed.
In outline, fairly simple. In implementation, not fairly simple. Maybe someone better at Rust would be able to write this in a clearer way, but this is what I’ve got.
Incidentally, one of the bugs found by testing (by inserting randomly generated entries, pruning the expired ones, and checking the invariants) was that I had that `entry.size -= pruned;` inside the `for rtype in rtypes` loop, which meant that if a domain had multiple records of different types expire at the same time, the size would be wrong.
Unfortunately, this is the most complex part. Adding a new entry to our cache involves a lot of work to maintain those invariants, especially if we also want to handle duplicate entries.
So before getting to the code, let’s think about what the behaviour should be.
If the domain isn't in the cache at all:

- Create a new `CachedDomainRecords` containing just our new record.
- Add the domain to the `access_priority` queue.
- Add the domain to the `expiry_priority` queue.

If the domain is in the cache, and the record is new:

- Update the `size` and `last_read`.
- Update the `access_priority` queue.
- Update the `next_expiry` and the `expiry_priority` queue, if this new record expires sooner than the current soonest.

And if the record is a duplicate of one already in the cache:

- Replace the old copy, decrementing the `size` and the `current_size`.
- Update the `next_expiry` and the `expiry_priority` queue if the duplicate would have been the soonest record to expire.

Additionally, in all cases, we need to increment the `current_size`.
Got all that? Here’s the code:
```rust
    /// Insert an entry into the cache.
    pub fn insert(&mut self, record: &ResourceRecord) {
        let now = Instant::now();
        let rtype = record.rtype.rtype();
        let expiry = Instant::now() + record.ttl;
        let tuple = (record.rtype.clone(), record.rclass, expiry);

        if let Some(entry) = self.entries.get_mut(&record.name) {
            if let Some(tuples) = entry.records.get_mut(&rtype) {
                let mut duplicate_expires_at = None;
                for i in 0..tuples.len() {
                    let t = &tuples[i];
                    if t.0 == tuple.0 && t.1 == tuple.1 {
                        duplicate_expires_at = Some(t.2);
                        tuples.swap_remove(i);
                        break;
                    }
                }
                tuples.push(tuple);

                if let Some(dup_expiry) = duplicate_expires_at {
                    entry.size -= 1;
                    self.current_size -= 1;
                    if dup_expiry == entry.next_expiry {
                        let mut new_next_expiry = expiry;
                        for (_, _, e) in tuples {
                            if *e < new_next_expiry {
                                new_next_expiry = *e;
                            }
                        }
                        entry.next_expiry = new_next_expiry;
                        self.expiry_priority
                            .change_priority(&record.name, Reverse(entry.next_expiry));
                    }
                }
            } else {
                entry.records.insert(rtype, vec![tuple]);
            }

            entry.last_read = now;
            entry.size += 1;
            self.access_priority
                .change_priority(&record.name, Reverse(entry.last_read));

            if expiry < entry.next_expiry {
                entry.next_expiry = expiry;
                self.expiry_priority
                    .change_priority(&record.name, Reverse(entry.next_expiry));
            }
        } else {
            let mut records = HashMap::new();
            records.insert(rtype, vec![tuple]);
            let entry = CachedDomainRecords {
                last_read: now,
                next_expiry: expiry,
                size: 1,
                records,
            };
            self.access_priority
                .push(record.name.clone(), Reverse(entry.last_read));
            self.expiry_priority
                .push(record.name.clone(), Reverse(entry.next_expiry));
            self.entries.insert(record.name.clone(), entry);
        }

        self.current_size += 1;
    }
```
I didn't write this all in one go and get it right the first time. I first implemented this without the duplicate handling and then, when it was working, I made it prevent duplicate records.
If you allow duplicates, the `if let Some(tuples)` block becomes much simpler:
```rust
            if let Some(tuples) = entry.records.get_mut(&rtype) {
                tuples.push(tuple);
            } else {
                entry.records.insert(rtype, vec![tuple]);
            }
```
We’ve made it—the end of the operations!
This code is pretty involved, and I’ve already said that I made at least one mistake when first writing it. So how do I know it’s correct?
Tests.
Tests, tests, tests.
I’m not going to go into the actual test code (see the source if you want that), but I will outline the cases.
The most important thing is to have a good way to generate inputs: you want distinct domains, overlapping domains, distinct types, overlapping types, overlapping but unequal records… the whole shebang. I’m generating random records, rather than trying to enumerate all the useful cases. I’m a big fan of random inputs in testing in general.
Some say “oh, but if my test is randomised it’ll be flaky: it might pass some times and fail other times!” In which case… good? If your test fails, you’ve found a bug: fix it!
Anyway, here are my test cases:
- Get with `QueryType::Record(_)` and `QueryClass::Record(_)`.
- Get with `QueryType::Wildcard` and `QueryClass::Record(_)`.
- Get with `QueryType::Record(_)` and `QueryClass::Wildcard`.
- Get with `QueryType::Wildcard` and `QueryClass::Wildcard`.
- Insert a record, and check that the `current_size` only goes up by 1, and that the invariants hold.
- Insert more records than fit into a cache with a `desired_size` of 25, call `prune`, and check that 25 records remain and that the invariants hold.
- Insert 100 records, 49 of which have a TTL of 0, call `remove_expired`, and check that 51 remain and that the invariants hold.
- Insert 100 records into a cache with a `desired_size` of 99, 49 of which have a TTL of 0, call `prune`, and check that 51 remain and that the invariants hold.

In most of those tests I check that the data structure invariants hold, where I:

- Check the `current_size` is equal to the total number of records.
- Check the `entries` and the `access_priority` are the same size.
- Check the `entries` and the `expiry_priority` are the same size.
- Check the `next_expiry` for each domain is equal to the minimum of its records' expiry times.
- Recompute the `access_priority` from the domains and check it's the same as the stored one.
- Recompute the `expiry_priority` from the domains and check it's the same as the stored one.

I feel pretty confident that my tests cover a variety of different cases and sequences of operations, and that I would have found any significant bugs. There could always be subtle bugs lurking, but that's true of all code.
I’ve opted to prune the cache in two places.
Firstly, in my actual code, this cache is inside an `Arc<Mutex<_>>`, so it can be shared across threads. There's not much point in having an unshared cache, after all. Anyway, this wrapper has some helper methods to get and insert entries, and the get helper calls `remove_expired` if it fetches any expired records:
```rust
impl SharedCache {
    pub fn get(
        &self,
        name: &DomainName,
        qtype: &QueryType,
        qclass: &QueryClass,
    ) -> Vec<ResourceRecord> {
        let mut rrs = self.get_without_checking_expiration(name, qtype, qclass);
        let len = rrs.len();
        rrs.retain(|rr| rr.ttl > Duration::ZERO);
        if rrs.len() != len {
            self.remove_expired();
        }
        rrs
    }

    // ... more omitted
}
```
Secondly, I spawn a tokio task to periodically remove expired entries, and then do additional pruning if need be:
```rust
async fn prune_cache_task(cache: SharedCache) {
    loop {
        sleep(Duration::from_secs(60 * 5)).await;

        let expired = cache.remove_expired();
        let pruned = cache.prune();

        println!(
            "[CACHE] expired {:?} and pruned {:?} entries",
            expired, pruned
        );
    }
}
```
It was very satisfying when I added this and first saw that `[CACHE]` output with non-zero expired and pruned records.
This cache works, and it works well. I get nice and fast responses from my DNS server for queries which are wholly or partially cached, and the benchmarks I’ve written look promising:
```
insert/unique/1
  time:  [1.0965 us 1.1001 us 1.1044 us]
  thrpt: [905.51 Kelem/s 909.00 Kelem/s 912.01 Kelem/s]
insert/unique/100
  time:  [115.72 us 115.96 us 116.24 us]
  thrpt: [860.27 Kelem/s 862.39 Kelem/s 864.15 Kelem/s]
insert/unique/1000
  time:  [1.1769 ms 1.1787 ms 1.1807 ms]
  thrpt: [846.96 Kelem/s 848.36 Kelem/s 849.67 Kelem/s]
insert/duplicate/1
  time:  [1.1927 us 1.1964 us 1.2003 us]
  thrpt: [833.13 Kelem/s 835.86 Kelem/s 838.44 Kelem/s]
insert/duplicate/100
  time:  [56.880 us 57.047 us 57.221 us]
  thrpt: [1.7476 Melem/s 1.7529 Melem/s 1.7581 Melem/s]
insert/duplicate/1000
  time:  [541.33 us 542.10 us 542.93 us]
  thrpt: [1.8419 Melem/s 1.8447 Melem/s 1.8473 Melem/s]
get_without_checking_expiration/hit/1
  time:  [1.4057 us 1.4249 us 1.4425 us]
  thrpt: [693.22 Kelem/s 701.81 Kelem/s 711.40 Kelem/s]
get_without_checking_expiration/hit/100
  time:  [84.651 us 84.999 us 85.322 us]
  thrpt: [1.1720 Melem/s 1.1765 Melem/s 1.1813 Melem/s]
get_without_checking_expiration/hit/1000
  time:  [991.64 us 997.89 us 1.0030 ms]
  thrpt: [996.98 Kelem/s 1.0021 Melem/s 1.0084 Melem/s]
get_without_checking_expiration/miss/1
  time:  [948.17 ns 961.92 ns 974.39 ns]
  thrpt: [1.0263 Melem/s 1.0396 Melem/s 1.0547 Melem/s]
get_without_checking_expiration/miss/100
  time:  [45.399 us 46.116 us 46.671 us]
  thrpt: [2.1426 Melem/s 2.1684 Melem/s 2.2027 Melem/s]
get_without_checking_expiration/miss/1000
  time:  [570.42 us 577.92 us 583.75 us]
  thrpt: [1.7131 Melem/s 1.7303 Melem/s 1.7531 Melem/s]
remove_expired/1
  time:  [1.2796 us 1.2983 us 1.3151 us]
  thrpt: [760.38 Kelem/s 770.26 Kelem/s 781.52 Kelem/s]
remove_expired/100
  time:  [55.622 us 56.761 us 57.895 us]
  thrpt: [1.7273 Melem/s 1.7618 Melem/s 1.7978 Melem/s]
remove_expired/1000
  time:  [786.47 us 794.30 us 800.89 us]
  thrpt: [1.2486 Melem/s 1.2590 Melem/s 1.2715 Melem/s]
prune/1
  time:  [1.3455 us 1.3539 us 1.3617 us]
  thrpt: [734.36 Kelem/s 738.63 Kelem/s 743.24 Kelem/s]
prune/100
  time:  [41.584 us 41.676 us 41.774 us]
  thrpt: [2.3938 Melem/s 2.3995 Melem/s 2.4048 Melem/s]
prune/1000
  time:  [613.73 us 617.63 us 620.87 us]
  thrpt: [1.6106 Melem/s 1.6191 Melem/s 1.6294 Melem/s]
```
But could it be better?
The only optimisation that really comes to mind is using a trie instead of the `HashMap` for domains. Another possibility is turning it into a more generic size-bounded-LRU-cache-with-expiration data structure with type parameters, and so making the DNS usage just a specialisation of that; perhaps genericising the code would make it easier to see improvements.
But nothing needs to be done, it works pretty well as it is. When I start using my DNS server for my LAN, and it starts to get much more traffic than my test instance, I’m sure performance problems will start to crop up, but hopefully they won’t be with this cache.
[1] Not just "perhaps": this is more-or-less copied straight from my original code.

[2] Least Recently Used.

[3] From the priority-queue crate. I started out trying to build something on top of `std::collections::BinaryHeap` directly, but didn't get very far.
I've developed a system to organise myself and make sure I get around to doing the things I need to do. Like my personal finance system, this system has evolved over the years based on what's made noticeable improvements to my life, and this memo describes what I currently put into practice, and not some aspirational system I can only hope to achieve.
I’ve worked at a few different programming jobs now, and I’ve noticed programmers on small agile teams tend to adopt a process like this:
There’s a Trello board with various lists: some are lists for work not yet started, some are for work in progress, and one is for work which is done.
We regularly review the lists: to make sure work is progressing, to make sure we don’t have too much in progress at once, and to decide what to pick up next.
Once every week or two, there’s a more in-depth review in which new work gets prioritised, and old work might be changed or removed.
I think this works really well. So I use this system to manage my life as well.
Everything I need to do, which isn’t captured elsewhere (like in my email inbox, or in a GitHub issue), goes on the Trello board. And if it’s a particularly important email or GitHub issue I want to make sure I don’t forget about, and I can’t deal with it straight away, I’ll make a card for it anyway.
I used to use org-mode for this, but I’ve found I personally am more productive with Trello. org-mode is very powerful, and so I felt I needed to tinker with it to come up with a perfect system, which in practice just meant I spent more time fiddling with how I tracked things than actually doing things. Trello is much more limited.
It’s also nice to have the board visualisation, so I can see the state of everything at a glance.
The rest of my process is now more complex than it was when I first started, but even just having “To Do”, “Doing”, “Waiting / Blocked”, and “Done” lists was a game-changer. I would have got my Ph.D corrections done without Trello (because I had to), but it would have been much more difficult.
Each task is in one of these states:
Routines—regular, time-based tasks.
Has Prerequisites—things I can’t do until I do something else.
Nice To Have—things which would be nice to do, but aren’t particularly important or urgent.
Priority—important, but non-urgent, things.
Near Future—tasks regularly taken from Nice To Have and Priority which I’ve decided to get done soon.
Doing—things I’m actively doing at the moment (this list is very small).
Waiting / Blocked—things where I’ve done my part, and need to wait on something or someone else.
Done—things I’ve done.
Each of these is a list on my Trello board.
I used to pick up tasks from Nice To Have and Priority as I felt like it. But one day I realised that some of the Priority tasks had been there for over a year. They were important, but I was putting them off, and still feeling good about myself because I was getting through lots of unimportant tasks instead.
This is standard procrastination behaviour.
So I decided to have a fortnightly “sprint planning” session where I would prioritise a small number of tasks, ensuring that nothing got neglected too much. I moved these into a This Sprint list, which I aimed to get through in that fortnight.
But, actually, I still found myself slipping a lot. The separate list of short-term priorities definitely helped out, but some things hung around in there for a while, or got moved back into a different list.
So I renamed it to Near Future. It does the same thing as This Sprint did, but it’s more honest.
I track routines as recurring events in Google Calendar, another integral component of my self-organisation system.
Those routines which have multiple things to do (like my chores) have a corresponding card in the Routines list on my Trello board. When the time to do the routine arrives I move a copy of the template card to Doing and work through its checklist. Those which just have a single task, like “Near Future prioritisation”, just have a calendar entry.
Routines which I’ve got cards for are:
Weekly chores—household maintenance (cleaning, laundry, etc) and checking my ledger is up to date.
Prepare game—preparation for my regular RPG sessions (currently Ars Magica and Traveller).
Monthly chores—more intense household maintenance (cleaning the oven, emptying the hoover, etc), updating computers, and reviewing the lists.
Quarterly chores—updating my CV and website, and checking my credit report.
Annual chores—reviewing my habits and preparing for the next year.
Routines on my calendar which I don’t have cards for are:
Near Future prioritisation—every Sunday, review and move cards into and out of the Near Future list.
Write up session notes—after my RPG sessions, write up notes from the game I prepared and ran.
Weeknotes—every Sunday evening, write my weeknotes.
I find having the Trello cards, rather than just putting the steps to be done in the calendar event, helps. I like being able to look at the calendar and see when everything needs to be done; but I prefer the Trello card interface for text and checkboxes. It’s nice having everything in its place: temporal data on a calendar, procedural data in a card.
The final component of my self-organisation system is a whiteboard and some pens.
I primarily use this to note down my shopping list, as it’s much easier to grab a pen and scribble something down than to open Trello and add a comment to a card.
I also use the whiteboard for meal planning. I have a simple 5-week calendar on the board, with rows labelled “Monday” to “Sunday” and columns labelled “1” to “5”, and the first cell of each column labelled with the day number. I tend to cook large portions of meals so I can freeze some of it for later, and I note down meals for future days as my freezer fills up. This has all but eliminated getting to meal time and realising I have nothing in, which has cut down on takeaways and helped to optimise my food budget.
The gist of it is that this snippet of code:
```haskell
mask $ \restore -> do
  putMVar var x
  ...
```
behaves differently to this snippet of code:
```haskell
mask $ \restore -> do
  restore $ putMVar var x
  ...
```
in the presence of asynchronous exceptions. The post goes on to explain what the different behaviours are and why they crop up; but thinking about concurrency is too much like effort, let’s turn to dejafu!
In this test case, I want to see:

- whether the `putMVar var x` is interrupted by an asynchronous exception; and
- whether the `...` bit of code gets executed.

So the actual test case is a bit more complex than just the snippet above. We're going to need three threads:
```haskell
thread1 = mask $ \restore ->
  catch (putMVar var "hello world" >> putMVar success True)
        (\(_ :: SomeException) -> putMVar success False)

thread2 = putMVar var "interrupted!"

thread3 = killThread thread1
```
Putting it together into an actual test case, we get:
```haskell
import Control.Concurrent.Classy
import Control.Exception (SomeException)

example1 :: MonadConc m => m (String, Bool)
example1 = do
  var <- newEmptyMVar
  success <- newEmptyMVar
  interruptMe <- newEmptyMVar

  tid <- fork $ mask $ \_ -> do
    putMVar interruptMe ()
    catch (putMVar var "hello world" >> putMVar success True)
          (\(_ :: SomeException) -> putMVar success False)

  -- wait for the thread to be inside the `mask`, then fork a thread
  -- to race on the `putMVar` and also throw an async exception.
  takeMVar interruptMe
  _ <- fork $ putMVar var "interrupted!"
  killThread tid

  (,) <$> readMVar var <*> readMVar success
```
There's a little extra ceremony involved in making sure that the race happens after the `mask` (we need a new `interruptMe` `MVar`), but other than that it's fairly straightforward.
dejafu finds two behaviours for this example, and gives abbreviated execution traces:
```
> autocheck example1
[pass] Successful
[fail] Deterministic
    ("hello world",True) S0-----S1--------S0------
    ("interrupted!",False) S0-----S1---P0---S2--S1-S0---S1---S0--
False
```
Here’s our new test case:
```haskell
import Control.Concurrent.Classy
import Control.Exception (SomeException)

example2 :: MonadConc m => m (String, Bool)
example2 = do
  interruptMe <- newEmptyMVar
  var <- newEmptyMVar
  success <- newEmptyMVar

  tid <- fork $ mask $ \restore -> do
    putMVar interruptMe ()
    catch (restore (putMVar var "hello world") >> putMVar success True)
          (\(_ :: SomeException) -> putMVar success False)

  -- wait for the thread to be inside the `mask`, then fork a thread
  -- to race on the `putMVar` and also throw an async exception.
  takeMVar interruptMe
  _ <- fork $ putMVar var "interrupted!"
  killThread tid

  (,) <$> readMVar var <*> readMVar success
```
Lo and behold, dejafu finds a third behaviour:
```
> autocheck example2
[pass] Successful
[fail] Deterministic
    ("hello world",True) S0-----S1-----------S0------
    ("hello world",False) S0-----S1-----P0-----S1---S0--
    ("interrupted!",False) S0-----S1----P0----S1---S2--S0---
False
```
So it seems that we can now end up in the situation where the `putMVar var "hello world"` does happen, but after writing to the `MVar` the asynchronous exception is delivered and so we hit the `putMVar success False` case.
Weird, right?
We can get the actual execution trace for the new case with a lower-level function in dejafu, `runSCT`. Digging through it, we can find the pre-emption of thread 1 (the first thread forked) by thread 0 (the main thread):
```
(SwitchTo main, [(1, WillResetMasking True MaskedInterruptible)], TakeMVar 1 [])
```
This says that we switched to the main thread, which performed a `takeMVar` operation; and furthermore, that thread 1 will next reset the masking state back to `MaskedInterruptible`.
Now the issue becomes clear. The problematic snippet:
```haskell
mask $ \restore -> do
  restore $ putMVar var x
  ...
```
Actually means to perform these steps:

1. Set the masking state to `MaskedInterruptible` (entering the `mask`).
2. Set the masking state to `Unmasked` (entering the `restore`).
3. Perform the `putMVar var x`.
4. Set the masking state back to `MaskedInterruptible` (leaving the `restore`).
5. Continue with the `...`.
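These masking-state transitions can be observed directly with `getMaskingState`. Here's a small standalone sketch (my addition, not from the original post) which records the masking state at each step when started from an unmasked thread:

```haskell
import Control.Exception (MaskingState (..), getMaskingState, mask)

-- Record the masking state inside the `mask`, inside the `restore`,
-- and again after the `restore` returns.
maskingStates :: IO [MaskingState]
maskingStates = mask $ \restore -> do
  before <- getMaskingState          -- inside `mask`
  during <- restore getMaskingState  -- inside `restore`
  after  <- getMaskingState          -- back inside `mask`
  pure [before, during, after]

main :: IO ()
main = maskingStates >>= print
-- prints [MaskedInterruptible,Unmasked,MaskedInterruptible]
```

The middle `Unmasked` is exactly the window in which the asynchronous exception can arrive.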
The issue is that completing the `putMVar var x` call and resetting the masking state are two operations. That's not atomic, so there is a chance that an exception can be delivered between them.
And that’s the issue explained in It’s not a no-op to unmask an interruptible operation, replicated with dejafu.
I’m a big fan of configuration-as-code, and when I was exposed to Concourse CI at work, which does everything through configuration files and environment variables, I decided to replace my Jenkins set-up and migrate some of my Travis projects as a learning experience.
Eventually I ended up with Concourse doing continuous deployment, and Travis solely for continuous integration. This worked well, until the future of the free-for-open-source Travis became uncertain, and I decided to move away.
As luck would have it, we were discussing using GitHub Actions for CI at work at the time. I decided to switch to Actions as another learning experience.
Now I have GitHub Actions for CI on pull requests (PRs), and Concourse for CD of master branches. It works pretty well.
This memo talks through my practices, using this blog and dejafu as running examples. I’ll also cover how I run Concourse on NixOS, other related tools I use, and what my plans for future work are.
GitHub Actions is GitHub’s hosted CI/CD tool. It’s got good support for both official and community-maintained Actions (which are Docker images conforming to a simple specification), is as well-integrated into the rest of GitHub as you’d expect, and has a config file syntax not entirely unlike Travis.
Currently I’m inconsistent across my repos about whether I require Actions to pass before a commit can make it into master. I tend to require it for my Haskell packages, because master gets deployed to Hackage, but allow pushing straight to master for other things.
This is fairly typical of my Python projects: I have two jobs, which show up as two separate checks with their own logs in a PR. One checks for linting errors, and the other checks that the dependencies all install.
I’ve found that pip doesn’t have the most robust dependency solver, and can sometimes get confused and install mutually incompatible versions of packages. So for any PR which upgrades the dependencies, I like to ensure that the freeze file has a consistent set of versions.
If I wrote tests they would solve this problem too. But I don’t.
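The workflow file itself isn't reproduced here, but a minimal sketch of such a two-job workflow might look like this (the job names, Python version, and freeze-file name are my assumptions, not the actual configuration):

```yaml
# Hypothetical sketch, not the actual workflow file.
name: Run checks

on: pull_request

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - run: pip install flake8
      - run: flake8 .

  dependencies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - run: pip install -r requirements-freeze.txt
      # `pip check` fails if the installed set contains incompatible versions.
      - run: pip check
```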
This is rather more complicated. I want to build the code and run the tests against all the supported versions of GHC, but for linting and doctests I just want to use the latest version. And I want the linting, doctests, and each of the main tests to run as separate jobs. This makes them run in parallel, and means that a failure in one doesn’t prevent the rest from running.
Like Travis, GitHub Actions supports matrix builds. The `strategy` part of the configuration means "run this job with each of these options, and don't kill the rest if one fails":
```yaml
strategy:
  fail-fast: false
  matrix:
    resolver:
      - lts-9.0  # ghc-8.0
      - lts-10.0 # ghc-8.2
      - lts-12.0 # ghc-8.4
      - lts-13.3 # ghc-8.6
      - lts-15.0 # ghc-8.8
      - lts-17.0 # ghc-8.10
```
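Each `resolver` value then feeds into the job's steps. The actual build steps aren't shown in this post, but a hedged sketch of how the matrix value might be consumed with Stack:

```yaml
# Hypothetical sketch: the real build steps aren't shown in this post.
steps:
  - uses: actions/checkout@v2
  - name: Build and test
    run: |
      stack --resolver=${{ matrix.resolver }} build
      stack --resolver=${{ matrix.resolver }} test
```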
Another nice feature of GitHub Actions is that the documentation is well-written and easy to follow. Just about every option has a short example.
Concourse CI is an opinionated “continuous thing-doer”. Everything is containerised and pure. No state is shared between jobs without you explicitly managing it, in the form of a “resource” (like a git remote, or an S3 bucket).
This was a big change when I came from Jenkins, which is just about as impure as you can get, but I’ve become a big fan of it. It makes jobs (potentially) reproducible, as they only depend on their inputs and on the pipeline configuration. You can have nondeterminism in your configuration, but you can’t get into trouble because of a previous build leaving things in a weird state.
I currently have 16 Concourse pipelines deploying a variety of things:
This is another fairly typical pipeline, all of my static websites look largely like this. The one unusual feature is that it builds a Docker image: I need a few dependencies to deploy this site, like pandoc, so rather than install them on every deploy I build an image.
The deploy uses a custom `rsync-resource` that I took from somewhere and slightly tweaked. It also uses `((secrets))` in a few places.
The configuration is rather more verbose than GitHub Actions. It is doing more, but it also requires more to be spelled out. This can make large pipelines a bit difficult to read.
This is significantly more complicated. dejafu is a monorepo containing four Haskell packages and one set of tests, so this pipeline has jobs for testing & releasing each of those packages, as well as a job to run a nightly build when Stackage updates.
I use YAML anchors to reduce the repetition, which helps a bit, but it’s still a pretty long file.
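As an illustration of the anchor pattern (names made up, this is not the real pipeline), an anchor defines a task config once and `<<:` merges it in wherever it's needed:

```yaml
# Illustration of YAML anchors only; not the real dejafu pipeline.
task-config: &task-build-and-test
  platform: linux
  image_resource:
    type: docker-image
    source: {repository: haskell}

jobs:
  - name: test
    plan:
      - task: build-and-test
        config:
          <<: *task-build-and-test
```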
This pipeline shows off Concourse’s task dependencies. All builds are triggered by a “resource” changing, but a job can specify that it should only be called for resources which passed a previous job.
For example, the `release-concurrency` job will be triggered by changes to the `concurrency-cabal-git` resource, but only after they pass the `test-concurrency` job:
```yaml
- name: test-concurrency
  plan:
    - get: concurrency-cabal-git
      trigger: true
    - task: build-and-test
      input_mapping:
        source-git: concurrency-cabal-git
      config:
        <<: *task-build-and-test

- name: release-concurrency
  plan:
    - get: concurrency-cabal-git
      trigger: true
      passed:
        - test-concurrency
    - task: prerelease-check
      params:
        PACKAGE: concurrency
      input_mapping:
        source-git: concurrency-cabal-git
      config:
        <<: *task-prerelease-check
    - task: release
      params:
        PACKAGE: concurrency
      input_mapping:
        source-git: concurrency-cabal-git
      config:
        <<: *task-release
```
These dependencies are what make up the visualisation in the screenshot above.
Dependabot is a handy little tool for automatically checking if you have any outdated dependencies, for a variety of ecosystems, and opening a PR to update them. It’s another tool we use at work (spotting a pattern?), but I didn’t pick this up to learn anything: it’s so simple there’s nothing really to learn, and its utility far outweighs the small configuration file you might want to write.
This is one of my more complex Dependabot config files, which should hopefully convince you of how straightforward it is. It specifies that I want PRs to update any official or community Actions, Dockerfile base images, or pip dependencies that I'm using, and that I want it to check daily (at 5AM UTC by default).
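The file itself isn't reproduced here, but a `dependabot.yml` covering those three ecosystems looks something like this (a sketch, not my exact file):

```yaml
# Sketch of a dependabot.yml for the three ecosystems mentioned;
# not the exact file.
version: 2
updates:
  - package-ecosystem: github-actions
    directory: "/"
    schedule:
      interval: daily
  - package-ecosystem: docker
    directory: "/"
    schedule:
      interval: daily
  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: daily
```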
That’s it!
Unlike the other cases, this time dejafu has a simpler configuration than the blog. Dependabot doesn’t support Haskell, so all it’s doing is ensuring any Actions I’m using are kept up to date.
Since my Haskell packages are on Stackage, the Stackage maintainers let me know if I need to update a dependency.
I don’t make a practice of needing secrets to build or run code in my public repos, so I don’t need to give GitHub Actions any secrets. It’s supported though, you can have both organisation-level and repository-level secrets.
My Concourse pipelines, however, do regularly need secrets. The password for my private Docker registry; the password to upload Haskell packages to Hackage; the SSH key to deploy this blog; and more!
Concourse has support for a few secret stores. I’m using the AWS SSM integration, mostly because it’s incredibly cheap, and means I don’t have to host and secure anything myself. It works well, I just need to set some environment variables giving Concourse an AWS access key hooked up to an IP-restricted policy granting SSM and KMS permissions. Almost no effort at all to set up if you already have an AWS account.
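Concretely, the web node reads its SSM configuration from environment variables along these lines (names from Concourse's AWS SSM credential manager; the values here are placeholders, not real credentials):

```shell
# Placeholder values; these are the env vars `concourse web` reads
# for its AWS SSM integration.
export CONCOURSE_AWS_SSM_REGION=eu-west-1
export CONCOURSE_AWS_SSM_ACCESS_KEY=placeholder-access-key
export CONCOURSE_AWS_SSM_SECRET_KEY=placeholder-secret-key
```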
NixOS is my Linux distribution of choice and, while it has packages for many things, it does not have one for Concourse. However, there is an official Docker image for Concourse.
I’ve got a systemd unit running Concourse in docker-compose:
```nix
systemd.services.concourse =
  let
    yaml = import ./concourse.docker-compose.nix {
      httpPort = concourseHttpPort;
      githubClientId = fileContents /etc/nixos/secrets/concourse-clientid.txt;
      githubClientSecret = fileContents /etc/nixos/secrets/concourse-clientsecret.txt;
      enableSSM = true;
      ssmAccessKey = fileContents /etc/nixos/secrets/concourse-ssm-access-key.txt;
      ssmSecretKey = fileContents /etc/nixos/secrets/concourse-ssm-secret-key.txt;
    };
    dockerComposeFile = pkgs.writeText "docker-compose.yml" yaml;
  in
  {
    enable = true;
    wantedBy = [ "multi-user.target" ];
    requires = [ "docker.service" ];
    environment = { COMPOSE_PROJECT_NAME = "concourse"; };
    serviceConfig = {
      ExecStart = "${pkgs.docker_compose}/bin/docker-compose -f '${dockerComposeFile}' up";
      ExecStop = "${pkgs.docker_compose}/bin/docker-compose -f '${dockerComposeFile}' stop";
      Restart = "always";
    };
  };
```
Where the concourse.docker-compose.nix file is just some templated YAML. I’ve heard that you shouldn’t use systemd units to run Docker containers, for some reason, but it works and I run a few different services on a bunch of servers like this. Running Concourse in Docker also makes it easy to upgrade to a newer version, without needing to wait for an official package to be updated.
I’m pretty happy with how things are working right now. Until recently I didn’t have Concourse secrets set up, and I was handling secrets by doing variable interpolation in my pipeline deployment script, and also I’d written everything in jsonnet for some reason. Setting up secrets, just using YAML, and removing the deployment script simplified things a lot.
I see GitHub advertising code scanning to me in all of my repositories, so maybe I’ll look into that next. I’m a big fan of static analysis, so having something which automatically scans my code for issues is very attractive.
The main thing I don’t have continuous deployment for is my NixOS configuration. I SSH into servers and run `git pull && sudo nixos-rebuild switch` like some sort of caveman! But automatically deploying that makes me a bit nervous: what if it goes wrong? Still, I switched to automatic updates recently, and nothing has broken yet, so maybe automatic configuration deployments are fine too.
Lockdowns have come and gone, restrictions have changed frequently and unexpectedly, and so I’ve lived the last 12 months as a hermit. Since that final day in the office, one year ago today, I’ve only left my flat once or twice a week, and that only to go shopping.
There is a vaccine now but, judging from the timeline, I’ll still be at home for a few more months.
It feels a bit selfish to type this, but frankly I’ve been having a great time:
My sleep has improved. The lack of commute means I get an extra hour or so to lie in bed.
I’ve saved money. Partly due to the lack of commute, but also due to not going out to buy lunch. Even one or two lunches a week add up.
I’ve been reading more. I now have more energy in the evenings after the work day ends, so I’ve got back into the habit of reading before bed. And over 2020 I read 99 books.
I can cook whenever I want. I used to get hungry in the afternoons almost every day. One day a thought hit me: if I’m at home all the time now, I can cook a proper meal for lunch! And so I switched to having my main meal of the day for lunch, and a smaller meal in the evening.
I’ve started a second RPG group. I did get a bit bored after a couple of months, and so I reached out to some online friends to see if anyone wanted to play games. I’ve now got a group which has been going strong since May, and I’ve deepened those friendships.
I’m not in an open office any more. I don’t like open office layouts. I always feel like someone is peering over my shoulder and watching my screen. It’s not an issue with just my current job, it’s been an issue everywhere. At home, I know there is nobody watching me, and I feel much more relaxed, even when I’m not slacking off.
Of course, there have been a handful of downsides too:
I’ve not seen any friends. I’ve got a small group of friends who meet up a couple of times a year, and we’ve missed a few of those meetings. We’ve made do with Zoom calls, but it’s not the same.
I’ve not seen any family. I normally only visit home at Christmas, and Christmas got cancelled.
I came down with shingles. Not very fun, possibly caused by stress. I’ve got a few small scars on my forehead which, now that it’s been nearly 6 months, will likely not heal. However, other than that one week of illness, my health has been great.
But the upsides definitely outweigh these. I was already only physically meeting friends and family three or four times a year, so missing one year isn’t a huge change. It’s not like I’ve gone from hanging out with people at the pub every week to never seeing anyone.
The weirdest part of the past year, by far, has been the discovery that a significant number of people just cannot cope with being alone, and break down after spending even a fortnight by themselves.
I was regularly spending weeks by myself even before covid!
It makes some sense though. I fill my time with reading, programming, playing RPGs, and socialising with online friends. Most people don’t do any of those to any significant degree (or at all). If everything you do for fun requires the physical presence of other people, the past year will have been tough.
I also have appropriate desk space, and don’t have noisy children or housemates. Being a loner with a nice flat during lockdown is life in easy mode.
I’m sure I will have to return to the office at some point, but I’ll fully enjoy being at home until then.