Modeling the impact of LLMs on developer experience

In How should you adopt Large Language Models (LLMs)?, we considered how LLMs might impact a company’s developer experience. To support that exploration, I’ve developed a systems model of developing software at a company.

In this chapter, we’ll work through:

  1. Summary results from this model
  2. How the model was developed, both sketching and building the model in a spreadsheet. (As discussed in the overview of systems modeling, I generally would recommend against using spreadsheets to develop most models, but it’s educational to attempt doing so once or twice.)
  3. Exercising the model to see what it has to teach us

Let’s get into it.


This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

Learnings

This model’s insights can be summarized in three charts. First, the baseline chart, which shows an eventual equilibrium between errors discovered in production and tickets that we’ve closed by shipping to production. This equilibrium is visible because tickets continue to get opened, but the total number of closed tickets stops increasing.

Chart showing systems modeling

Second, we show that we can shift that equilibrium by reducing the error rate in production. Specifically, the first chart models 25% of closed tickets in production experiencing an error, whereas the second chart models only a 10% error rate. The equilibrium returns, but at a higher value of shipped tickets.

Chart showing systems modeling

Finally, we can see that even tripling the rate that we start and test tickets doesn’t meaningfully change the total number of completed tickets, as modeled in this third chart.

Chart showing systems modeling

The constraint on this system is errors discovered in production, and any technique that changes something else doesn’t make much of an impact. Of course, this is just a model, not reality. There are many nuances that models miss, but this helps us focus on what probably matters the most, and in particular highlights that any approach that increases development velocity while also increasing production error rate is likely net-negative.

With that summary out of the way, now we can get into developing the model itself.

Sketch

Modeling in a spreadsheet is labor intensive, so we want to iterate as much as possible in the sketching phase, before we move to the spreadsheet. In this case, we’re working with Excalidraw.

Systems model with five stages of development, with numerous lines where discovered errors require moving backwards in flow.

I sketched five stocks to represent a developer’s workflow:

  1. Open Tickets is tickets opened for an engineer to work on
  2. Start Coding is tickets that an engineer is working on
  3. Tested Code is tickets that have been tested
  4. Deployed Code is tickets that have been deployed
  5. Closed Ticket is tickets that are closed after reaching production

There are four flows representing tickets progressing through this development process from left to right. Additionally, there are three exception flows that move from right to left:

  1. Testing found error represents a ticket where testing finds an error, moving the ticket backwards to Start Coding
  2. Deployment exposed error represents a ticket encountering an error during deployment, where it’s moved backwards to Start Coding
  3. Error found in production represents a ticket encountering a production error, which causes it to move all the way back to the beginning as a new ticket

One of your first concerns seeing this model might be that it’s embarrassingly simple. To be honest, that was my reaction when I first looked at it, too. However, it’s important to recognize that feeling and then dig into whether it matters.

This model is quite simple, but in the next section we’ll find that it reveals several counter-intuitive insights into the problem that will help us avoid erroneously viewing the tooling as a failure if time spent testing increases. The value of a model is in refining our thinking, and simple models are usually more effective at refining a group’s thinking than complex models, simply because complex models are fairly difficult to align a group around.

Reason

As we start to look at this sketch, the first question to ask is how might LLM-based tooling show an improvement? The most obvious options are:

  1. Increasing the rate that tasks flow from Start Coding to Tested Code. Presumably these tools might reduce the amount of time spent on implementation.

  2. Increasing the rate that Tested Code follows the Testing found error flow back to Start Coding, because more comprehensive tests are more likely to detect errors. This is probably the first interesting learning from this model: if the adopted tool works well, it’s likely that we’ll spend more time in the testing loop, with a long-term payoff of spending less time solving problems in production, where it’s more expensive. This means that slower testing might be a successful outcome rather than the failure it might first appear to be.

    A skeptic of these tools might argue the opposite, that LLM-based tooling will cause more issues to be identified “late” after deployment rather than early in the testing phase. In either case, we now have a clear goal to measure to evaluate the effectiveness of the tool: reducing the Error found in production flow. We also know not to focus on the Testing found error flow, which should probably increase.

  3. Finally, we can also zoom out and measure the overall time from Start Coding to Closed Ticket for tasks that don’t experience the Error found in production flow for at least the first 90 days after being completed.

These observations capture what I find remarkable about systems modeling: even a very simple model can expose counter-intuitive insights. In particular, the sort of insights that build conviction to push back on places where intuition might lead you astray.

Model

For this model, we’ll be modeling it directly in a spreadsheet, specifically Google Sheets. The completed spreadsheet model is available here. As discussed in Systems modeling to refine strategy, spreadsheet modeling is brittle, slow and hard to iterate on. I generally recommend that folks attempt to model something in a spreadsheet to get an intuitive sense of the math happening in their models, but I would almost always choose any tool other than a spreadsheet for a complex model.
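
Before walking through the spreadsheet, here’s a rough sketch of the same model in plain Python, which is closer to the kind of tool I’d normally reach for. The stock and flow names mirror the sketch above, but the rates and starting values are illustrative assumptions rather than the exact numbers in the sheet:

# Illustrative rates; not the exact config values used in the spreadsheet.
TICKET_OPEN_RATE = 1        # new tickets opened per round
START_CODING_RATE = 1       # tickets picked up per round
MAX_CONCURRENT_CODING = 5   # cap on tickets in Started Coding
TICKET_TEST_RATE = 1        # tickets tested per round
TEST_ERROR_RATE = 0.10      # share of Tested Code bounced back by testing
DEPLOY_ERROR_RATE = 0.05    # share of Deployed Code bounced back by deployment
PROD_ERROR_RATE = 0.25      # share of Closed Tickets reopened by production errors

def simulate(rounds=50):
    opened, started, tested, deployed, closed = 10, 0, 0, 0, 0
    for _ in range(rounds):
        # Left-to-right flows, computed from start-of-round values
        # (mirroring the row-by-row structure of the spreadsheet).
        start = min(START_CODING_RATE, opened) if started < MAX_CONCURRENT_CODING else 0
        to_test = min(TICKET_TEST_RATE, started)
        to_deploy = tested    # assume every tested change is deployed
        to_close = deployed   # assume every deployed change is closed
        # Right-to-left exception flows, floored to whole tickets.
        test_errors = int(tested * TEST_ERROR_RATE)
        deploy_errors = int(deployed * DEPLOY_ERROR_RATE)
        prod_errors = int(closed * PROD_ERROR_RATE)
        # Apply the flows to the stocks.
        opened += TICKET_OPEN_RATE - start + prod_errors
        started += start - to_test + test_errors + deploy_errors
        tested += to_test - to_deploy - test_errors
        deployed += to_deploy - to_close - deploy_errors
        closed += to_close - prod_errors
    return {"open": opened, "started": started, "tested": tested,
            "deployed": deployed, "closed": closed}

print(simulate())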

This example is fairly tedious to follow, and you’re entirely excused if you decide to pull open the sheet itself, look around a bit, and then skip the remainder of this section. If you are hanging around, it’s time to get started.

The spreadsheet we’re creating has three important worksheets:

  • Model represents the model itself
  • Charts holds charts of the model
  • Config holds configuration values separately from the model, to ease exercising the model after we’ve built it

Going to the model worksheet, we want to start out by initializing each of the columns to the starting value.

Screenshot of spreadsheet showing initial values of a systems model

While we’ll use formulae for subsequent rows, the first row should contain literal values. I often start with a positive value in the first column and zeros in the other columns, but that isn’t required. You can start with whatever starting values are more useful for studying the model that you’re building.

With the initial values set, we’re now going to implement the model in two passes. First, we’ll model the left-to-right flows, which represent the standard development process. Second, we’ll model the right-to-left flows, which represent exceptions in the process.

Modeling left-to-right

We’ll start by modeling the interaction between the first two stocks: Open Tickets and Started Coding. We want open tickets to increase over time at a fixed rate, so let’s add a value in the config worksheet for TicketOpenRate, starting with 1.

Moving to the second stock, we want to start work on open tickets as long as we have fewer than MaxConcurrentCodingNum tickets in progress. If we already have MaxConcurrentCodingNum or more tickets in progress, then we don’t start working on any new tickets. To do this, we need to create an intermediate value (represented using an italicized column name) that determines how many tickets to start, by checking whether the current number of started tickets is at the maximum (another value in the config sheet) or whether we should increment it by the start rate.

That looks like:

// Config!$B$3 is max started tickets
// Config!$B$2 is rate to increment started tickets
// $ before a row or column, e.g. $B$3 means that the row or column
// always stays the same -- not incrementing -- even when filled
// to other cells
= IF(C2 >= Config!$B$3, 0, Config!$B$2)

This also means that our first column, Open Tickets, is decremented by the number of tickets that we’ve started coding:

// This is the definition of `Open Tickets`
=A2 + Config!$B$1 - B2

Leaving us with these values.

Screenshot of spreadsheet showing three columns of systems modeling

Now we want to determine the number of tickets being tested at each step in the model. To do this, we create a calculation column, NumToTest?, which is defined as:

// Config!$B$4 is the rate we can start testing tickets
// Note that we can only start testing tickets if there are tickets
// in `Started Coding` that we're able to start testing
=MIN(Config!$B$4, C3)

We then add that value to the previous number of tickets being tested.

// E2 is prior size of the Tested Code stock
// D3 is the value of `NumToTest?`
// F2 is the number of tested tickets to deploy
=E2 + D3 - F2

Screenshot of spreadsheet showing three columns of systems modeling

Moving on to deploying code, let’s keep things simple and start out by assuming that every tested change is going to get deployed. That means the calculation for NumToDeploy? is quite simple:

// E3 is the number of tested changes
=E3

Then the value for the Deployed Code stock is simple as well:

// G2 is the prior size of Deployed Code
// F3 is NumToDeploy?
// H2 is the number of deployed changes in prior round
=G2+F3-H2

Screenshot of spreadsheet showing three columns of systems modeling

Now we’re on to the final stock. We add the NumToClose? calculation, which assumes that all deployed changes are now closed.

// G3 is the number of deployed changes
=G3

This makes the calculation for the Closed Tickets stock:

// I2 is the prior value of Closed Tickets
// H3 is the NumToClose?
=I2 + H3

With that, we’ve now modeled the entire left-to-right flows.

Screenshot of spreadsheet showing three columns of systems modeling

The left-to-right flows are simple, with a few constrained flows and some very scalable ones, but overall we see things progressing evenly through the pipeline. All that is about to change!

Modeling right-to-left

We’ve now finished modeling the happy path from left to right. Next we need to model all the exception paths where things flow right to left. For example, an issue found in production would cause a flow from Closed Ticket back to Open Ticket. This tends to be where models get interesting.

There are three right-to-left flows that we need to model:

  1. Closed Ticket to Open Ticket represents a bug discovered in production.
  2. Deployed Code to Start Coding represents a bug discovered during deployment.
  3. Tested Code to Start Coding represents a bug discovered in testing.

To start, we’re going to add configurations defining the rates of those flows. These are going to be percentage flows, with a certain percentage of the target stock triggering the error condition rather than proceeding. For example, perhaps 25% of the Closed Tickets are discovered to have a bug each round.

Screenshot of spreadsheet showing three columns of systems modeling

These are fine starter values, and we’ll experiment with how adjusting them changes the model in the Exercise section below.

Now we’ll start by modeling errors discovered in production, by adding a column to model the flow from Closed Tickets to Open Tickets, the ErrorsFoundInProd? column.

// I3 is the number of Closed Tickets
// Config!$B$5 is the rate of errors
=FLOOR(I3 * Config!$B$5)

Note the usage of FLOOR to avoid moving partial tickets. Feel free to skip that entirely if you’re comfortable with the concept of fractional tickets, fractional deploys, and so on. This is an aesthetic consideration, and generally only impacts your model if you choose overly small starting values.

This means that our calculation for Closed Ticket needs to be updated as well to reduce by the prior row’s result for ErrorsFoundInProd?:

// I2 is the prior value of ClosedTicket
// H3 is the current value of NumToClose?
// J2 is the prior value of ErrorsFoundInProd?
=I2 + H3 - J2

We’re not quite done, because we also need to add the prior row’s value of ErrorsFoundInProd? into Open Tickets, which represents the errors’ flow from closed to open tickets. Based on this change, the calculation for Open Tickets becomes:

// A2 is the prior value of Open Tickets
// Config!$B$1 is the base rate of ticket opening
// B2 is prior row's StartCodingMore?
// J2 is prior row's ErrorsFoundInProd?
=A2 + Config!$B$1 - B2 + J2

Now we have the full errors in production flow represented in our model.

Screenshot of spreadsheet showing three columns of systems modeling

Next, it’s time to add the Deployed Code to Start Coding flow. Start by adding the ErrorsFoundInDeploy? calculation:

// G3 is deployed code
// Config!$B$6 is deployed error rate
=FLOOR(G3 * Config!$B$6)

Then we need to update the calculation for Deployed Code to decrease by the calculated value in ErrorsFoundInDeploy?:

// G2 is the prior value of Deployed Code
// F3 is NumToDeploy?
// H2 is prior row's NumToClose?
// I2 is ErrorsFoundInDeploy?
=G2 + F3 - H2 - I2

Finally, we need to increase the size of Started Coding by the same value, representing the flow of errors discovered in deployment:

// C2 is the prior value of Started Coding
// B3 is StartCodingMore?
// D2 is prior value of NumToTest?
// I2 is prior value of ErrorsFoundInDeploy?
=C2 + B3 - D2 + I2

We now have a working flow representing errors found during deployment.

Screenshot of spreadsheet showing three columns of systems modeling

Finally, we can add the Tested Code to Started Coding flow. This is pretty much the same as the prior flow we added, starting with adding an ErrorsFoundInTest? calculation:

// E3 is tested code
// Config!$B$7 is the testing error rate
=FLOOR(E3 * Config!$B$7)

Then we update Tested Code to reduce by this value:

// E2 is prior value of Tested Code
// D3 is NumToTest?
// G2 is prior value of NumToDeploy?
// F2 is prior value of ErrorsFoundInTest?
=E2 + D3 - G2 - F2

And update Started Coding to increase by this value:

// C2 is prior value of Started Coding
// B3 is StartCodingMore?
// D2 is prior value of NumToTest?
// J2 is prior value of ErrorsFoundInDeploy?
// F2 is prior value of ErrorsFoundInTest?
= C2 + B3 - D2 + J2 + F2

Now this last flow is instrumented.

Screenshot of spreadsheet showing three columns of systems modeling

With that, we now have a complete model that we can start exercising! This exercise demonstrates both that it’s quite possible to represent a meaningful model in a spreadsheet and also the challenges of doing so.

While developing this model, a number of errors became evident. Some of them I was able to fix relatively easily, and even more of them I left unfixed because fixing them would make the model even harder to reason about. This is a good example of why I encourage developing one or two models in a spreadsheet, but I ultimately don’t believe it’s the right mechanism for most people to work in: even very smart people make errors in their spreadsheets, and catching those errors is exceptionally challenging.

Exercise

Now that we’re done building this model, we can finally start the fun part: exercising it. We’ll start by creating a simple bar chart showing the size of each stock at each step. We’re expressly not going to show the intermediate calculation columns such as NumToTest?, because those are implementation details rather than particularly interesting.

Before we start tweaking the values, let’s look at the baseline chart.

Chart showing systems modeling

The most interesting thing to notice is that our current model doesn’t actually increase the number of closed tickets over time. We actually just get further and further behind over time, which isn’t too exciting.

So let’s start modeling the first way that LLMs might help, reducing the error rate in production. Let’s shift ErrorsInProd from 0.25 down to 0.1, and see how that impacts the chart.

Chart showing systems modeling

We can see that this allows us to make more progress on closing tickets, although at some point equilibrium is established between closed tickets and the error rate in production, preventing further progress. This does validate that reducing error rate in production matters. It also suggests that as long as error rate is a function of everything we’ve previously shipped, we are eventually in trouble.
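
A rough back-of-the-envelope view of why that equilibrium shows up: in steady state, the tickets reaching production each round equal the errors flowing back out of Closed Tickets, so the plateau sits near the per-round throughput divided by the error rate. The numbers below are illustrative (one ticket reaching production per round) and ignore flooring and the rework that competes for coding capacity, so the spreadsheet’s exact plateau will differ:

# At equilibrium: throughput per round == error_rate * closed_tickets,
# so Closed Tickets plateaus near throughput / error_rate.
throughput = 1  # tickets reaching production per round (illustrative)

for error_rate in (0.25, 0.10):
    plateau = throughput / error_rate
    print(f"error rate {error_rate:.0%}: Closed Tickets plateaus near {plateau:.0f}")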

Next let’s experiment with the idea that LLMs allow us to test more quickly, tripling TicketTestRate from 1 to 3. It turns out, increasing testing rate doesn’t change anything at all, because the current constraint is in starting tickets.

Chart showing systems modeling

So, let’s test that. Maybe LLMs make us faster at starting tickets because the overall speed of development goes up. Let’s model that by increasing StartCodingRate from 1 to 3 as well.

Chart showing systems modeling

This is a fascinating result, because tripling development and testing velocity has changed how much work we start, but ultimately the real constraint in our system is the error discovery rate in production.

By exercising this model, we find an interesting result. To the extent that our error rate is a function of the volume of things we’ve shipped in production, shipping faster doesn’t increase our velocity at all. The only meaningful way to increase productivity in this model is to reduce the error rate in production.

Models are imperfect representations of reality, but this one gives us a clear sense of what matters the most: if we want to increase our velocity, we have to reduce the rate that we discover errors in production. That might be reducing the error rate as implied in this model, or it might be ideas that exist outside of this model. For example, the model doesn’t represent this well, but perhaps we’d be better off iterating more on fewer things to avoid this scenario. If we make multiple changes to one area, it still just represents one implemented feature, not many implemented features, and the overall error rate wouldn’t increase.

Asheville

Asheville is in crisis right now. They're without drinking water, faucets run dry, and it's difficult to flush toilets. As of yesterday, the hospital has water (via tanker trucks), but 80% of the public water system is still without running water.

Things are really bad. Lots of infrastructure has been washed away. Even when water is back, there has been tremendous damage done that will take a long time to recover from and rebuild.

* * *

Here's the only national news story my friend from Asheville had seen which covered the water situation specifically. It's hard for me to understand why this is not covered more broadly. And my heart aches for those in and around the Asheville area.

As I'm far away, I can't do a lot to help. But I can donate money, which my friend said is the only donation that would help right now if you aren't in the area. She specifically pointed me to these two ways to donate:

  • Beloved Asheville: a respected community organization in Asheville, this is a great place to send money to help. (If you're closer to that area, it does look like they have specific things they're asking for as well, but this feels like an "if you can help this way, you'd already know" situation.)
  • Mutual Aid Disaster Relief: there's a local Asheville chapter which is doing work to help. Also an organization to support for broad disaster recovery in general.

I've donated money. I hope you will, too, for this and for the many other crises that affect us. Let's help each other.

Encontros Feynman


Mentirinhas #2189


Númenor Margaritaville

I see white shores, and beyond it, a far green country under a tequila sunrise.

Warriors of lost fortune


Ingredients

Add main-belt asteroids to taste.

He probably already has enough lawyers

I have a really hard time dealing with people who see others agreeing with each other and feel that this is some kind of spell that compels them to say, "but let me play devil's advocate."

Even more so when that phrase is followed by a hypothesis pulled from the deepest, darkest, most hidden part of their being.

O artigo "Provavelmente ele já tem advogados suficientes" foi originalmente publicado no site TRILUX, de Augusto Campos.

Quicksand in real life

It has become a cliché to use quicksand as an example of a childhood worry about things that never happen in real life, and I understand why, since most people never think about the subject again once childhood is over.

But I have actually been stuck in quicksand more than once in real life, and I can confirm that it really is frightening. AMA.

O artigo "Areia movediça na vida real" foi originalmente publicado no site TRILUX, de Augusto Campos.

A success I hope won't be put into practice very often

Last year I invested literally more than a month of effort into developing a protocol (and an internal policy) for my organization for responding to extreme weather events.

The approval process was long, partly due to weather events during that period, but last week I received confirmation that the policy has been signed, and it goes into effect today, the first day of the new month.

A great success! But, given the subject matter, I hope this protocol won't need to be executed very often.

O artigo "Um sucesso que torcerei para não ser muito praticado" foi originalmente publicado no site TRILUX, de Augusto Campos.

I'm not obligated either

I have ASD and I'm not one to ask for accommodations, because I'm well adapted to most of the situations I encounter and perceive.

When I do need to ask for an accommodation, however, it isn't something I enjoy. But I understand that the other person has no way of knowing before I say something, so I state clearly what difficulty I'm running into and, if I know it, how best to offer me accessibility.

But once I've said it, it's very hard for me to stay willing to engage with little jokes about it, or with persistence in the unaccommodating behavior: I made my request, the person didn't meet it, the cycle is complete.

We're in the 21st century and life is too short to engage with certain things.

O artigo "Eu também não sou obrigado" foi originalmente publicado no site TRILUX, de Augusto Campos.

Cícero #78


Terminal colours are tricky

Yesterday I was thinking about how long it took me to get a colorscheme in my terminal that I was mostly happy with (SO MANY YEARS), and it made me wonder what about terminal colours made it so hard.

So I asked people on Mastodon what problems they’ve run into with colours in the terminal, and I got a ton of interesting responses! Let’s talk about some of the problems and a few possible ways to fix them.

problem 1: blue on black

One of the top complaints was “blue on black is hard to read”. Here’s an example of that: if I open Terminal.app, set the background to black, and run ls, the directories are displayed in a blue that isn’t that easy to read:

To understand why we’re seeing this blue, let’s talk about ANSI colours!

the 16 ANSI colours

Your terminal has 16 numbered colours – black, red, green, yellow, blue, magenta, cyan, white, and “bright” versions of each of those.

Programs can use them by printing out an “ANSI escape code” – for example if you want to see each of the 16 colours in your terminal, you can run this Python program:

def color(num, text):
    # \033[38;5;{num}m sets the foreground to palette entry `num`;
    # \033[0m resets back to the default colour.
    return f"\033[38;5;{num}m{text}\033[0m"

for i in range(16):
    print(color(i, f"number {i:02}"))

what are the ANSI colours?

This made me wonder – if blue is colour number 4, who decides what hex color that should correspond to?

The answer seems to be “there’s no standard, terminal emulators just choose colours and it’s not very consistent”. Here’s a screenshot of a table from Wikipedia, where you can see that there’s a lot of variation:

problem 1.5: bright yellow on white

Bright yellow on white is even worse than blue on black, here’s what I get in a terminal with the default settings:

That’s almost impossible to read (and some other colours like light green cause similar issues), so let’s talk about solutions!

two ways to reconfigure your colours

If you’re annoyed by these colour contrast issues (or maybe you just think the default ANSI colours are ugly), you might think – well, I’ll just choose a different “blue” and pick something I like better!

There are two ways you can do this:

Way 1: Configure your terminal emulator: I think most modern terminal emulators have a way to reconfigure the colours, and some of them even come with some preinstalled themes that you might like better than the defaults.

Way 2: Run a shell script: There are ANSI escape codes that you can print out to tell your terminal emulator to reconfigure its colours. Here’s a shell script that does that, from the base16-shell project. You can see that it has a few different conventions for changing the colours – I guess different terminal emulators have different escape codes for changing their colour palette, and so the script is trying to pick the right style of escape code based on the TERM environment variable.
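
To make the escape-code approach a bit more concrete, here’s a sketch of the kind of sequence those scripts emit: the xterm-style OSC 4 code, which remaps one entry of the 16-colour palette. Support and exact syntax vary between terminal emulators (which is why base16-shell checks TERM), and the colour value below is just an example shade:

import sys

# Remap ANSI colour 4 ("blue") to a different shade using the xterm-style
# OSC 4 sequence: ESC ] 4 ; <index> ; <colour spec> ESC \
# Many, but not all, terminal emulators honour this.
sys.stdout.write("\033]4;4;rgb:26/8b/d2\033\\")
sys.stdout.flush()

# Anything printed in colour 4 afterwards uses the remapped shade.
print("\033[34mdirectories will now use this blue\033[0m")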

what are the pros and cons of the 2 ways of configuring your colours?

I prefer to use the “shell script” method, because:

  • if I switch terminal emulators for some reason, I don’t need to learn a different configuration system, my colours still Just Work
  • I use base16-shell with base16-vim to make my vim colours match my terminal colours, which is convenient

some advantages of configuring colours in your terminal emulator:

  • if you use a popular terminal emulator, there are probably a lot more nice terminal themes out there that you can choose from
  • not all terminal emulators support the “shell script method”, and even if they do, the results can be a little inconsistent

This is what my shell has looked like for probably the last 5 years (using the solarized light base16 theme), and I’m pretty happy with it. Here’s htop:

Okay, so let’s say you’ve found a terminal colorscheme that you like. What else can go wrong?

problem 2: programs using 256 colours

Here’s what some output of fd, a find alternative, looks like in my colorscheme:

The contrast is pretty bad here, and I definitely don’t have that lime green in my normal colorscheme. What’s going on?

We can see which color codes fd is using by running it under the unbuffer program to capture its output, including the color codes:

$ unbuffer fd . > out
$ vim out
^[[38;5;48mbad-again.sh^[[0m
^[[38;5;48mbad.sh^[[0m
^[[38;5;48mbetter.sh^[[0m
out

^[[38;5;48 means “set the foreground color to color 48”. Terminals don’t only have 16 colours – many terminals these days actually have 3 ways of specifying colours:

  1. the 16 ANSI colours we already talked about
  2. an extended set of 256 colours
  3. a further extended set of 24-bit hex colours, like #ffea03
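
To make the three flavours concrete, here’s a small Python snippet that prints one example of each. What you actually see for the first two depends on your terminal’s palette, and the last line only works in terminals with 24-bit colour support:

print("\033[34mANSI colour 4 (blue)\033[0m")               # 16-colour:  ESC[34m
print("\033[38;5;48m256-colour palette entry 48\033[0m")   # 256-colour: ESC[38;5;Nm
print("\033[38;2;255;234;3m24-bit colour #ffea03\033[0m")  # truecolour: ESC[38;2;R;G;Bm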

So fd is using one of the colours from the extended 256-color set. bat (a cat alternative) does something similar – here’s what it looks like by default in my terminal.

This looks fine though and it really seems like it’s trying to work well with a variety of terminal themes.

some newer tools seem to have theme support

I think it’s interesting that some of these newer terminal tools (fd, bat, delta, and probably more) have support for arbitrary custom themes. I guess the downside of this approach is that the default theme might clash with your terminal’s background, but the upside is that it gives you a lot more control over theming the tool’s output than just choosing 16 ANSI colours.

I don’t really use bat, but if I did I’d probably use bat --theme ansi to just use the ANSI colours that I have set in my normal terminal colorscheme.

problem 3: the grays in Solarized

A bunch of people on Mastodon mentioned a specific issue with grays in the Solarized theme: when I list a directory, the base16 Solarized Light theme looks like this:

but iTerm’s default Solarized Light theme looks like this:

This is because in the iTerm theme (which is the original Solarized design), colors 9-14 (the “bright blue”, “bright red”, etc) are mapped to a series of grays, and when I run ls, it’s trying to use those “bright” colours to color my directories and executables.

My best guess for why the original Solarized theme is designed this way is to make the grays available to the vim Solarized colorscheme.

I’m pretty sure I prefer the modified base16 version I use where the “bright” colours are actually colours instead of all being shades of gray though. (I didn’t actually realize the version I was using wasn’t the “original” Solarized theme until I wrote this post)

In any case I really love Solarized and I’m very happy it exists so that I can use a modified version of it.

problem 4: a vim theme that doesn’t match the terminal background

If my vim theme has a different background colour than my terminal theme, I get an ugly border, like this:

This one is a pretty minor issue though and I think making your terminal background match your vim background is pretty straightforward.

problem 5: programs setting a background color

A few people mentioned problems with terminal applications setting an unwanted background colour, so let’s look at an example of that.

Here ngrok has set the background to color #16 (“black”), but the base16-shell script I use sets color 16 to be bright orange, so I get this, which is pretty bad:

I think the intention is for ngrok to look something like this:

I think base16-shell sets color #16 to orange (instead of black) so that it can provide extra colours for use by base16-vim. This feels reasonable to me – I use base16-vim in the terminal, so I guess I’m using that feature and it’s probably more important to me than ngrok (which I rarely use) behaving a bit weirdly.

This particular issue is a somewhat obscure clash between ngrok and my colorscheme, but I think this kind of clash is pretty common when a program sets an ANSI background color that the user has remapped for some reason.

a nice solution to contrast issues: “minimum contrast”

A bunch of terminals (iTerm2, tabby, kitty’s text_fg_override_threshold, and folks tell me also Ghostty and Windows Terminal) have a “minimum contrast” feature that will automatically adjust colours to make sure they have enough contrast.

Here’s an example from iTerm. This ngrok accident from before has pretty bad contrast, I find it pretty difficult to read:

With “minimum contrast” set to 40 in iTerm, it looks like this instead:

I didn’t have minimum contrast turned on before but I just turned it on today because it makes such a big difference when something goes wrong with colours in the terminal.

problem 6: TERM being set to the wrong thing

A few people mentioned that they’ll SSH into a system that doesn’t support the TERM environment variable that they have set locally, and then the colours won’t work.

I think the way TERM works is that systems have a terminfo database, so if the value of the TERM environment variable isn’t in the system’s terminfo database, then it won’t know how to output colours for that terminal. I don’t know too much about terminfo, but someone linked me to this terminfo rant that talks about a few other issues with terminfo.

I don’t have a system on hand to reproduce this one so I can’t say for sure how to fix it, but this stackoverflow question suggests running something like TERM=xterm ssh instead of ssh.

problem 7: picking “good” colours is hard

A couple of problems people mentioned with designing / finding terminal colorschemes:

  • some folks are colorblind and have trouble finding an appropriate colorscheme
  • accidentally making the background color too close to the cursor or selection color, so they’re hard to find
  • generally finding colours that work with every program is a struggle (for example you can see me having a problem with this with ngrok above!)

problem 8: making nethack/mc look right

Another problem people mentioned is using a program like nethack or midnight commander which you might expect to have a specific colourscheme based on the default ANSI terminal colours.

For example, midnight commander has a really specific classic look:

But in my Solarized theme, midnight commander looks like this:

The Solarized version feels like it could be disorienting if you’re very used to the “classic” look.

One solution Simon Tatham mentioned to this is using some palette customization ANSI codes (like the ones base16 uses that I talked about earlier) to change the color palette right before starting the program, for example remapping yellow to a brighter yellow before starting Nethack so that the yellow characters look better.

problem 9: commands disabling colours when writing to a pipe

If I run fd | less, I see something like this, with the colours disabled.

In general I find this useful – if I pipe a command to grep, I don’t want it to print out all those color escape codes, I just want the plain text. But what if you want to see the colours?

To see the colours, you can run unbuffer fd | less -r! I just learned about unbuffer recently and I think it’s really cool: unbuffer opens a pseudo-terminal (a tty) for the command to write to, so the command thinks it’s writing to a TTY and keeps its colours. It also fixes issues with programs buffering their output when writing to a pipe, which is why it’s called unbuffer.
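
If you’re curious what that trick looks like under the hood, here’s a minimal Python sketch using the standard library’s pty module. It’s Unix-only, assumes a GNU-style ls is available, and is an illustration of the idea rather than how unbuffer is actually implemented:

import pty

# Run a command attached to a pseudo-terminal so it thinks it's writing to a
# TTY: it keeps its colours and stops block-buffering its output.
pty.spawn(["ls", "--color=auto"])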

Here’s what the output of unbuffer fd | less -r looks like for me:

problem 10: unwanted colour in ls and other commands

Some people mentioned that they don’t want ls to use colour at all, perhaps because ls uses blue, which is hard to read on black, and they don’t feel like customizing their terminal’s colourscheme to make the blue more readable, or they just don’t find the use of colour helpful.

Some possible solutions to this one:

  • you can run ls --color=never, which is probably easiest
  • you can also set LS_COLORS to customize the colours used by ls. I think some other programs other than ls support the LS_COLORS environment variable too.
  • also some programs support setting NO_COLOR=true (there’s a list here)

Here’s an example of running LS_COLORS="fi=0:di=0:ln=0:pi=0:so=0:bd=0:cd=0:or=0:ex=0" ls (each key is a file type: fi for regular files, di for directories, ln for symlinks, and so on, and 0 means “use the default colour”):

problem 11: the colours in vim

I used to have a lot of problems with configuring my colours in vim – I’d set up my terminal colours in a way that I thought was okay, and then I’d start vim and it would just be a disaster.

I think what was going on here is that today, there are two ways to set up a vim colorscheme in the terminal:

  1. using your ANSI terminal colours – you tell vim which ANSI colour number to use for the background, for functions, etc.
  2. using 24-bit hex colours – instead of ANSI terminal colours, the vim colorscheme can use hex codes like #faea99 directly

20 years ago when I started using vim, terminals with 24-bit hex color support were a lot less common (or maybe they didn’t exist at all), and vim certainly didn’t have support for using 24-bit colour in the terminal. From some quick searching through git, it looks like vim added support for 24-bit colour in 2016 – just 8 years ago!

So to get colours to work properly in vim before 2016, you needed to synchronize your terminal colorscheme and your vim colorscheme. Here’s what that looked like, the colorscheme needed to map the vim color classes like cterm05 to ANSI colour numbers.

But in 2024, the story is really different! Vim (and Neovim, which I use now) support 24-bit colours, and as of Neovim 0.10 (released in May 2024), the termguicolors setting (which tells Vim to use 24-bit hex colours for colorschemes) is turned on by default in any terminal with 24-bit color support.

So this “you need to synchronize your terminal colorscheme and your vim colorscheme” problem is not an issue anymore for me in 2024, since I don’t plan to use terminals without 24-bit color support in the future.

The biggest consequence for me of this whole thing is that I don’t need base16 to set colors 16-21 to weird stuff anymore to integrate with vim – I can just use a terminal theme and a vim theme, and as long as the two themes use similar colours (so it’s not jarring for me to switch between them) there’s no problem. I think I can just remove those parts from my base16 shell script and totally avoid the problem with ngrok and the weird orange background I talked about above.

some more problems I left out

I think there are a lot of issues around the intersection of multiple programs, like using some combination tmux/ssh/vim that I couldn’t figure out how to reproduce well enough to talk about them. Also I’m sure I missed a lot of other things too.

base16 has really worked for me

I’ve personally had a lot of success with using base16-shell with base16-vim – I just need to add a couple of lines to my fish config to set it up (+ a few .vimrc lines) and then I can move on and accept any remaining problems that it doesn’t solve.

I don’t think base16 is for everyone though, some limitations I’m aware of with base16 that might make it not work for you:

  • it comes with a limited set of builtin themes and you might not like any of them
  • the Solarized base16 theme (and maybe all of the themes?) sets the “bright” ANSI colours to be exactly the same as the normal colours, which might cause a problem if you’re relying on the “bright” colours to be different from the regular ones
  • it sets colours 16-21 in order to give the vim colorschemes from base16-vim access to more colours, which might not be relevant if you always use a terminal with 24-bit color support, and can cause problems like the ngrok issue above
  • also the way it sets colours 16-21 could be a problem in terminals that don’t have 256-color support, like the linux framebuffer terminal

Apparently there’s a community fork of base16 called tinted-theming, which I haven’t looked into much yet.

some other colorscheme tools

Just one so far but I’ll link more if people tell me about them:

okay, that was a lot

We talked about a lot in this post and while I think learning about all these details is kind of fun if I’m in the mood to do a deep dive, I find it SO FRUSTRATING to deal with it when I just want my colours to work! Being surprised by unreadable text and having to find a workaround is just not my idea of a good day.

Personally I’m a zero-configuration kind of person and it’s not that appealing to me to have to put together a lot of custom configuration just to make my colours in the terminal look acceptable. I’d much rather just have some reasonable defaults that I don’t have to change.

minimum contrast seems like an amazing feature

My one big takeaway from writing this was to turn on “minimum contrast” in my terminal, I think it’s going to fix most of the occasional accidental unreadable text issues I run into and I’m pretty excited about it.

Summary of reading: July - September 2024

  • "A City on Mars" By Kelly and Zach Weinersmith - actual discussion of building settlements on Mars occupies maybe 1% of this book. The authors have an uncanny talent of focusing on all the least interesting aspects of space exploration; whatever little is dedicated to science and engineering is fairly shallow and reeks of techno-pessimism. That said, if the aspects of space exploration that interest you most are politics, social structures, legal frameworks, labor relations and mental health - go ahead and read this book. Overall, I found it infuriatingly bad.
  • "What You Are Looking For Is in the Library" Michiko Aoyama - a sweet little collection of loosely-related short stories. Just regular people living their lives in Japan; the common theme is changing circumstances and how to deal with them. I really enjoy books of this kind, and this one is very well done.
  • "Is this Wi-Fi Organic?: A guide to spotting misleading science online" by Dave Farina - a valiant attempt to build up a solid scientific foundation for debunking myths (focusing mostly on alternative medicine). Unfortunately, the execution falls short. The tactic employed by the author is: explain a bunch of science at a pretty shallow level, then present some claim to debunk and then immediately jump to "this is clearly false, becasue science". Rinse; repeat. While entertaining, this isn't very convincing. From a book, I'd expect much more. For example, Simon Singh's "Trick or Treatment" is much better in this respect, because the debunking done with a lot of supporting evidence, citing relevant research and studies. The good thing I can say about this book is that I appreciate the techno-optimism and the science-first approach.
  • "An imaginary tale" by Paul J. Nahin - tells the history of the discovery and initial applications of the imaginary unit - i. Very interesting book that fills an under-served niche between popular science and textbooks. The book isn't easy to go through - it requires sophisticated math, at least at the engineering undergrad level. Except the last chapter - which goes a bit off the rails with complex analysis - this background should be sufficient for the vast majority of the book, but some work will still be required. While I didn't follow through every single calculation, I really enjoyed the book overall and should try to read additional stuff by this author.
  • "The Code Breaker" by Walter Isaacson - a biography of Jennifer Doudna, focusing on the discovery of CRISPR, the competition around it, the ethical implications of gene editing and COVID vaccine research. An OK book overall, with some really tedious parts; not the best Isaacson, in my experience.
  • "An Immense World" by Ed Yong - secondary title "How Animal Senses Reveal the Hidden Realms Around Us"; an information-packed book with a huge scope, describing the sensing capabilities animals possess beyond the human range. Very interesting.
  • "Mansfield Park" by Jane Austen - familiar style and topic - the lives of the bored 19th century English gentry. I liked this book less than others by Austen I read; the characters aren't as well developed (I feel like there are too many? Austen seems to have neglected to provide distinct roles for several), and the first half of the book is fairly dull.
  • "How to Avoid a Climate Disaster" by Bill Gates - a decent overview of the current state of global warming, and what it would take to avoid the worst-case scenario. This book could be much better, IMHO, if it had more depth and a coherent summary/plan. I'm also surprised by the relatively shallow coverage of nuclear (fission) power, given the importance attributed to it by Gates. Also, the coverage of carbon capture is surprisingly minimal. On the good side, I really liked the concept of "green premium" and how to factor that into economically realistic solutions. This is one of those subjects that's moving so fast, however, that it needs a fresh treatment every few years. This book is from 2020 and already some information feels stale due to the huge progress in solar deployments that's been made in the past few years.
  • "On Writing: A Memoir of the Craft" by Stephen King - a mix of autobiography with some advice on writing creative fiction. Very good overall. King's relationship with his family is inspiring and endearing - not the usual celebrity fare. One small gripe is that the book is artificially inflated with some tangential interviews at the end, so it's actually shorter than advertised.
  • "Engineering in Plain Sight" by Grady Hillhouse - the author is well-known from his YouTube channel "Practical Engineering", and this book is a text presentation of many of the topics he discusses in his videos. The writing is unmistakable in Hillhouse's enthusiastic style and the illustrations are beautiful. In all, a very nice book.
  • "Children of Time" by Adrian Tchaikovsky - first part of a sci-fi trilogy about space-faring humans and... spiders. Highly imaginative and fun to read.
  • "Journeys North: The Pacific Crest Trail" by Barney Scout Mann - a memoir of a person thru-hiking the full PCT in 2007, and some stories about his fellow hikers. Great book.
  • "The Final Frontiersman" by James Campbell - subtitle is "Heimo Korth and His Family, Alone in Alaska's Arctic Wilderness". Very nice biography about a fur trapper family living in a remote self-built cabin in north-east Alaska. Covers a period roughly from the 1980s to the early 2000s.

Re-reads:

  • "Naked Economics" by Charles Wheelan
  • "The count of Monte Cristo" by Alexandre Dumas
  • "Skunk Works: A Personal Memoir of My Years of Lockheed" by Ben Rich

Why Gumroad Didn't Choose htmx

At Gumroad, we recently embarked on a new project called Helper. As the CEO, I was initially quite optimistic about using htmx for this project, even though some team members were less enthusiastic.

My optimism stemmed from previous experiences with React, which often felt like overkill for our needs. I thought htmx could be a good solution to keep our front-end super light.

Gumroad Red: source with htmx

In fact, I shared this sentiment with our team in Slack:

“https://htmx.org/ may be a way of adding simple interactions to start”

And initially, it seemed promising! As one of our engineers at Gumroad eloquently put it:

“HTMX is (officially) a meme to make fun of how overly complicated the JS landscape has gotten - much like tailwind is just a different syntax for inline CSS, HTMX is a different syntax for inline JS.”

However, unlike Tailwind, which has found its place in our toolkit, htmx didn’t scale for our purposes and didn’t lead to the best user experience for our customers–at least for our use case.

Here’s why:

  1. Intuition and Developer Experience: While it would have been possible to do the right thing in htmx, we found it much more intuitive and fun to get everything working with Next.js. The development process felt natural with Next.js, whereas with htmx, it often felt unnatural and forced. For example, when building complex forms with dynamic validation and conditional fields, we found ourselves writing convoluted server-side logic to handle what would be straightforward client-side operations in React.

  2. UX Limitations: htmx ended up pushing our app towards a Rails/CRUD approach, which led to a really poor (or at least, boring and generic) user experience by default. We found ourselves constantly fighting against this tendency, which was counterproductive. For instance, implementing a drag-and-drop interface for our workflow builder proved to be a significant challenge with htmx, requiring workarounds that felt clunky compared to the smooth experience we could achieve with React libraries.

  3. AI and Tooling Support: It’s worth noting that AI tools are intimately familiar with Next.js and not so much with htmx, due to the lack of open-source training data. This is similar to the issue Rails faces. While not a dealbreaker, it did impact our development speed and the ease of finding solutions to problems. When we encountered issues, the wealth of resources available for React/Next.js made troubleshooting much faster.

  4. Scalability Concerns: As our project grew in complexity, we found htmx struggling to keep up with our needs. The simplicity that initially attracted us began to feel limiting as we tried to implement more sophisticated interactions and state management. For example, as we added features like real-time collaboration and complex data visualization, managing state across multiple components became increasingly difficult with htmx’s server-centric approach.

  5. Community and Ecosystem: The React/Next.js ecosystem is vast and mature, offering solutions to almost any problem we encountered. With htmx, we often found ourselves reinventing the wheel or compromising on functionality. This became particularly evident when we needed to integrate third-party services and libraries, which often had React bindings but no htmx equivalents.

Gumroad Green: source with NextJS

Ultimately, we ended up moving to React/Next.js, which has been a really great fit for building the complex UX we’ve been looking for. We’re happy with this decision–for now. It’s allowed us to move faster, create more engaging user experiences, and leverage a wealth of existing tools and libraries.

Gumroad Helper: before & after

This experience has reinforced a valuable lesson: while it’s important to consider lightweight alternatives, it’s equally crucial to choose technologies that can grow with your project and support your long-term vision. For Helper, React and Next.js have proven to be that choice.

Since we’ve moved there, we’ve been able to seriously upgrade our app’s user experience for our core customers.

  1. Drag-and-Drop Functionality: One of the key features of our workflow builder is the ability to reorder steps through drag-and-drop. While it’s possible to implement drag-and-drop with htmx, we found that the available solutions felt clunky and required significant custom JavaScript. In contrast, the React ecosystem offers libraries like react-beautiful-dnd that provide smooth, accessible drag-and-drop with minimal setup.

  2. Complex State Management: Each workflow step has its own set of configurations and conditional logic. As users edit these, we need to update the UI in real-time to reflect changes and their implications on other steps. With htmx, this would require numerous server roundtrips or complex client-side state management that goes against htmx’s server-centric philosophy. React’s state management solutions (like useState or more advanced options like Redux) made this much more straightforward.

  3. Dynamic Form Generation: The configuration for each step type is different and can change based on user input. Generating these dynamic forms and handling their state was more intuitive with React’s component model. With htmx, we found ourselves writing more complex server-side logic to generate and validate these forms.

  4. Real-time Collaboration: While not visible in this screenshot, we implemented features allowing multiple users to edit a workflow simultaneously. Implementing this with WebSockets and React was relatively straightforward, whereas with htmx, it would have required more complex server-side logic and custom JavaScript to handle real-time updates.

  5. Performance Optimization: As workflows grew larger and more complex, we needed fine-grained control over rendering optimizations. React’s virtual DOM and hooks like useMemo and useCallback allowed us to optimize performance in ways that weren’t as readily available or intuitive with htmx.

It’s important to note that while these challenges aren’t insurmountable with htmx, we found that addressing them often led us away from htmx’s strengths and towards solutions that felt more natural in a JavaScript-heavy environment. This realization was a key factor in our decision to switch to React and Next.js.

We acknowledge that htmx may be a great fit for many projects, especially those with simpler interaction models or those built on top of existing server-rendered applications. Our experience doesn’t invalidate the benefits others have found in htmx. The key is understanding your project’s specific needs and choosing the tool that best aligns with those requirements.

In our case, the complex, stateful nature of Helper’s interface made React and Next.js a better fit. However, we continue to appreciate htmx’s approach and may consider it for future projects where its strengths align better with our needs.

That said, we’re always open to reevaluating our tech stack as our needs evolve and new technologies emerge. Who knows what the future might bring?

Rust needs a web framework for lazy developers

I like to make silly things, and I also like to put in minimal effort for those silly things. I also like to make things in Rust, mostly for the web, and this is where we run into a problem.

See, if I want to make something for the web, I could use Django but I don't want that. I mean, Django is for building serious businesses, not for building silly non-commercial things! But using Rust, we have to do a lot more work than if we build it with Django or friends.

See, so far, there's no equivalent, and the Rust community leans heavily into the "wire it up yourself" approach. As Are We Web Yet? says, "[...] you generally have to wire everything up yourself. Expect to put in a little bit of extra set up work to get started."

This undersells it, though. It's more than a little bit of extra work to get started! I know because I made a list of things to do to get started.

Rust needs something that does bundle this up for you, so that we can serve all web developers. Having it would make it a lot easier to make the case to use Rust. The benefits are there: you get a wonderful type system, wonderful performance, and build times that give you back those coffee breaks you used to get while your code compiled.

What do we need?

There is a big pile of stuff that nearly every web app needs, no matter if it's big or small. Here's a rough list of what seems pretty necessary to me:

  • Routing/handlers: this is pretty obvious, but we have to be able to get an incoming request to some handler for it. Additionally, this routing needs to handle path parameters, ideally with type information, and we'll give bonus points for query parameters, forms, etc.
  • Templates: we'll need to generate, you know, HTML (and sometimes other content, like JSON or, if you're in the bad times still, XML). Usually I want these to have basic logic, like conditionals, match/switch, and loops.
  • Static file serving: we'll need to serve some assets, like CSS files. This can be done separately, but having it as part of the same web server is extremely handy for both local development and for small-time deployments that won't handle much traffic.
  • Logins: You almost always need some way to log in, since apps are usually multi-user or deployed on a public network. This is just annoying to wire up every time! It should be customizable and something you can opt out of, but it should be trivial to have logins from the start.
  • Permissions: You also need this for systems that have multiple users, since people will have different data they're allowed to access or different roles in the system. Permissions can be complicated, but you can make something relatively simple that follows the check(user, object, action) pattern (see the sketch after this list) and get really far with it.
  • Database interface: You're probably going to have to store data for your app, so you want a way to do that. Something that's ORM-like is often nice, but something light is fine. Whatever you do here isn't the only way to interact with the database, but it'll be used for things like logins, permissions, and admin tools, so it's going to be a fundamental piece.
  • Admin tooling: This is arguably a quality-of-life issue, not a necessity, except that every time you set up your application in a local environment or in production you're going to have to bootstrap it with at least one user or some data. And you'll have to do admin actions sometimes! So I think having this built in for at least some of the common actions is a necessity for a seamless experience.
  • WebSockets: I use WebSockets in a lot of my projects. They just let you do really fun things with pushing data out to connected users in a more real-time fashion!
  • Hot reloading: This is a huge one for developer experience, because you want to have the ability to see changes really quickly. When code or a template change, you need to see that reflected as soon as humanly possible (or as soon as the Rust compiler allows).
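
To make the check(user, object, action) idea from the Permissions item concrete, here's a language-agnostic sketch in Python (the pattern itself isn't Rust-specific); the roles and rules are made up for illustration:

from dataclasses import dataclass

@dataclass
class User:
    id: int
    role: str  # e.g. "admin" or "member" (made-up roles for illustration)

@dataclass
class Document:
    owner_id: int

def check(user: User, obj: Document, action: str) -> bool:
    """Return True if `user` may perform `action` on `obj`."""
    if user.role == "admin":
        return True
    if action == "read":
        return True
    # Writes and deletes are restricted to the owner.
    return obj.owner_id == user.id

# Usage: guard each handler with a single call.
assert check(User(1, "member"), Document(owner_id=1), "delete")
assert not check(User(2, "member"), Document(owner_id=1), "delete")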

Then we have a pile of things that are quality-of-life improvements, and I think are necessary for long-term projects but might not be as necessary upfront, so users are less annoyed at implementing it themselves because the cost is spread out.

  • Background tasks: There needs to be a story for these! You're going to have features that have to happen on a schedule, and having a consistent way to do that is a big benefit and makes development easier.
  • Monitoring/observability: Only the smallest, least-critical systems should skip this. It's really important to have and it will make your life so much easier when you have it in that moment that you desperately need it.
  • Caching: There are a lot of ways to do this, and all of them make things more complicated and maybe faster? So this is nice to have a story for, but users can also handle it themselves.
  • Emails and other notifications: It's neat to be able to have password resets and things built-in, and this is probably a necessity if you're going to have logins, so you can have password resets. But other than that feature, it feels like it won't get used that much and isn't a big deal to add in when you need it.
  • Deployment tooling: Some consistent way to deploy somewhere is really nice, even if it's just an autogenerated Dockerfile that you can use with a host of choice.
  • CSS/JS bundling: These days, we use JS and CSS everywhere, so you probably want a web tool to be aware of them so they can be included seamlessly. But does it really have to be integrated in? Probably not...

So those are the things I'd target in a framework if I were building one! I might be doing that...

The existing ecosystem

There's quite a bit out there already for building web things in Rust. None of it quite hits what I want, and that's intentional on their part: none of these projects aspires to be what I'm looking for. I love what exists, but I think we're sorely missing the kind of tool I'm describing here (and I don't think I'm alone).

Web frameworks

There are really two main groups of web frameworks/libraries right now: the minimalist ones, and the single-page app ones.

The minimalist ones are reminiscent of Flask, Sinatra, and other small web frameworks. These include the excellent actix-web and axum, as well as myriad others. There are so many of these, and they all bring a nice flavor to web development by leveraging Rust's type system! But they don't give you much besides handlers; none of the extra functionality we want in a full for-lazy-developers framework.

Then there are the single-page app frameworks. These fill a niche where you can build things with Rust on the backend and frontend, using WebAssembly for the frontend rendering. These tend to be less mature, but good examples include Dioxus, Leptos, and Yew. I used Yew to build a digital vigil last year, and it was enjoyable but I'm not sure I'd want to do it in a "real" production setting.

Each of these is excellent for what it is, but what it is still requires a lot of wiring up. Most of my projects would work well with the minimalist frameworks, but the wiring up is the problem: it ends up being a chore every time I want to start something.

Piles of libraries!

The rest of the ecosystem is piles of libraries. There are lots of template libraries! There are some libraries for logins, and for permissions. There are WebSocket libraries!

Often you'll find projects and examples that integrate a couple of the things you're using, but you won't find something that integrates all of them. I've also run into examples that are out of date, which is to be expected in a fast-moving ecosystem.

The pile of libraries leaves a lot of friction, though. It makes getting started, the "just wiring it up" part, very difficult: often it's an exercise in researching how things work so you understand them in enough depth to do the integration.

What I've done before

The way I've handled this before is basically to pick a base framework (typically actix-web or axum) and then search out all the pieces I want on top of it. Then I'd wire them up, either all at the beginning or as I need them.

There are starter templates that could help here. They definitely let you skip some of the initial pain, but you still get all the maintenance burden. You have to make sure your libraries stay up to date, even when there are breaking changes. And you will drift from the template, so it's not really feasible to merge its changes back into your project.

For the projects I'm working on, this means that instead of keeping one framework up to date, I have to keep n bespoke frameworks up to date across all my projects!

Eep!

I'd much rather have a single web framework that handles it all, with clean upgrade instructions between versions. There will be breaking changes sometimes, but this way they can be documented instead of coming about due to changes in the interactions between two components which don't even know they're going to be integrated together.

Imagining the future I want

In an ideal world, there would be a framework for Rust that gives me all the features I listed above. It would also come with excellent documentation, changelogs, thoughtful versioning and handling of breaking changes, and maybe even a great community. Could we have all the things I love about Django, but for a Rust web framework, so that we can reap the benefits of Rust without having to go needlessly slowly?

This doesn't exist right now, and I'm not sure if anyone else is working on it. All paths seem to lead me toward "whoops I guess I'm building a web framework." I hope someone else builds one, too, so we can have multiple options.

To be honest, "web framework" sounds way too grandiose for what I'm doing, which is simply wiring things together in an opinionated way, using (mostly) existing building blocks1. Instead of calling it a framework, I'm thinking of it as a web toolkit: a bundle of tools tastefully chosen and arranged to make the artisan highly effective.

My toolkit is called nicole's web toolkit, or newt. It's available in a public repository, but it's really not usable (the latest changes aren't even pushed yet). It's not even usable for me yet—this isn't a launch post, more shipping my design doc (and hoping someone will do my work for me so I don't have to finish newt :D).

The goal for newt is to be able to create a new small web app and start on the actual project in minutes instead of days, bypassing the entire process of wiring things up. I think the list of must-haves and quality-of-life features above will be a start, but by no means everything we need. I'm not ready to accept contributions, but I hope to be there at some point.

I think that Rust really needs this, and the whole ecosystem will benefit from it. A healthy ecosystem will have multiple such toolkits, and I hope to see others develop as well.

* * *

If you want to follow along with mine, though, feel free to subscribe to my RSS feed or newsletter, or follow me on Mastodon. I'll try to let people know in all those places when the toolkit is ready for people to try out. Or I'll do a post-mortem on it, if it ends up that I don't get far with it! Either way, this will be fun.


1

I do plan to build a few pieces from scratch for this, as the need arises. Some things will be easier that way, or fit more cohesively. Can't I have a little greenfield, as a treat?

UK Coal

The Watership Down rabbits removed an additional 0.1 nanometers constructing their warren, although that was mostly soil. British rabbits have historically mined very little coal; the sole rabbit-run coal plant was shut down in the 1990s.

Reverse-engineering a three-axis attitude indicator from the F-4 fighter plane

We recently received an attitude indicator for the F-4 fighter plane, an instrument that uses a rotating ball to show the aircraft's orientation and direction. In a normal aircraft, the artificial horizon shows the orientation in two axes (pitch and roll), but the F-4 indicator uses a rotating ball to show the orientation in three axes, adding azimuth (yaw).1 It wasn't obvious to me how the ball could rotate in three axes: how could it turn in every direction and still remain attached to the instrument?

The attitude indicator. The "W" forms a stylized aircraft. In this case, it indicates that the aircraft is climbing slightly. Photo from CuriousMarc.

We disassembled the indicator, reverse-engineered its 1960s-era circuitry, fixed some problems,2 and got it spinning. The video clip below shows the indicator rotating around three axes. In this blog post, I discuss the mechanical and electrical construction of this indicator. (The quick explanation is that the ball is really two hollow half-shells attached to the internal mechanism at the "poles"; the shells rotate while the "equator" remains stationary.)

The F-4 aircraft

The indicator was used in the F-4 Phantom II3 so the pilot could keep track of the aircraft's orientation during high-speed maneuvers. The F-4 was a supersonic fighter manufactured from 1958 to 1981. Over 5000 were produced, making it the most-produced American supersonic aircraft ever. It was the main US fighter jet in the Vietnam War, operating from aircraft carriers. The F-4 was still used in the 1990s during the Gulf War, suppressing air defenses in the "Wild Weasel" role. The F-4 was capable of carrying nuclear bombs.4

An F-4G Phantom II Wild Weasel aircraft. From National Archives.

The F-4 was a two-seat aircraft, with the radar intercept officer controlling radar and weapons from a seat behind the pilot. Both cockpits had a panel crammed with instruments, with additional instruments and controls on the sides. As shown below, the pilot's panel had the three-axis attitude indicator in the central position, just below the reddish radar scope, reflecting its importance.5 (The rear cockpit had a simpler two-axis attitude indicator.)

The cockpit of the F-4C Phantom II, with the attitude indicator in the center of the panel. Click this photo (or any other) for a larger version. Photo from National Museum of the USAF.

The attitude indicator mechanism

The ball inside the indicator shows the aircraft's position in three axes. The roll axis indicates the aircraft's angle if it rolls side-to-side along its axis of flight. The pitch axis indicates the aircraft's angle if it pitches up or down. Finally, the azimuth axis indicates the compass direction that the aircraft is heading, changed by the aircraft's turning left or right (yaw). The indicator also has moving needles and status flags, but in this post I'm focusing on the rotating ball.6

The indicator uses three motors to move the ball. The roll motor (below) is attached to the frame of the indicator, while the pitch and azimuth motors are inside the ball. The ball is held in place by the roll gimbal, which is attached to the ball mechanism at the top and bottom pivot points. The roll motor turns the roll gimbal and thus the ball, providing a clockwise/counterclockwise movement. The roll control transformer provides position feedback. Note the numerous wires on the roll gimbal, connected to the mechanism inside the ball.

The attitude indicator with the cover removed.

The diagram below shows the mechanism inside the ball, after removing the hemispherical shells of the ball. When the roll gimbal is rotated, this mechanism rotates with it. The pitch motor causes the entire mechanism to rotate around the pitch axis (horizontal here), which is attached along the "equator". The azimuth motor and control transformer are behind the pitch components, not visible in this photo. The azimuth motor turns the vertical shaft. The two hollow hemispheres of the ball attach to the top and bottom of the shaft. Thus, the azimuth motor rotates the ball shells around the azimuth axis, while the mechanism itself remains stationary.

The components of the ball mechanism.

Why doesn't the wiring get tangled up as the ball rotates? The solution is two sets of slip rings to implement the electrical connections. The photo below shows the first slip ring assembly, which handles rotation around the roll axis. These slip rings connect the stationary part of the instrument to the rotating roll gimbal. The black base and the vertical wires are attached to the instrument, while the striped shaft in the middle rotates with the ball assembly housing. Inside the shaft, wires go from the circular metal contacts to the roll gimbal.

The first set of slip rings. Yes, there is damage on one of the slip ring contacts.

Inside the ball, a second set of slip rings provides the electrical connection between the wiring on the roll gimbal and the ball mechanism. The photo below shows the connections to these slip rings, handling rotation around the pitch axis (horizontal in this photo). (The slip rings themselves are inside and are not visible.) The shaft sticking out of the assembly rotates around the azimuth (yaw) axis. The ball hemisphere is attached to the metal disk. The azimuth axis does not require slip rings since only the ball shells rotate; the electronics remain stationary.

Connections for the second set of slip rings.

The servo loop

In this section, I'll explain how the motors are controlled by servo loops. The attitude indicator is driven by an external gyroscope, receiving electrical signals indicating the roll, pitch, and azimuth positions. As was common in 1960s avionics, the signals are transmitted from synchros, which use three wires to indicate an angle. The motors inside the attitude indicator rotate until the indicator's angles for the three axes match the input angles.

Each motor is controlled by a servo loop, shown below. The goal is to rotate the output shaft to an angle that exactly matches the input angle, specified by the three synchro wires. The key is a device called a control transformer, which takes the three-wire input angle and a physical shaft rotation, and generates an error signal indicating the difference between the desired angle and the physical angle. The amplifier drives the motor in the appropriate direction until the error signal drops to zero. To improve the dynamic response of the servo loop, the tachometer signal is used as a negative feedback voltage. This ensures that the motor slows as the system gets closer to the right position, so the motor doesn't overshoot the position and oscillate. (This is sort of like a PID controller.)
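
To put that in rough equation form (my notation, not anything from the F-4 documentation): the control transformer's output is approximately proportional to sin(θ_input − θ_shaft), which for small differences is just the angle error, and the amplifier drives the motor with roughly K·error − K_t·ω, where ω is the shaft speed reported by the tachometer. The subtracted rate term is what damps the loop so that it settles on the commanded angle rather than overshooting and oscillating.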

This diagram shows the structure of the servo loop, with a feedback loop ensuring that the rotation angle of the output shaft matches the input angle.

In more detail, the external gyroscope unit contains synchro transmitters, small devices that convert the angular position of a shaft into AC signals on three wires. The photo below shows a typical synchro, with the input shaft on the top and five wires at the bottom: two for power and three for the output.

A synchro transmitter.

Internally, the synchro has a rotating winding called the rotor that is driven with 400 Hz AC. Three fixed stator windings provide the three AC output signals. As the shaft rotates, the phase and voltage of the output signals change, indicating the angle. (Synchros may seem bizarre, but they were extensively used in the 1950s and 1960s to transmit angular information in ships and aircraft.)
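
In rough equation form (up to a choice of zero reference), the three stator outputs are 400 Hz carriers whose amplitudes vary with the shaft angle θ as sin(θ), sin(θ + 120°), and sin(θ − 120°), with the phase relative to the excitation indicating sign; the receiving device recovers the angle from the ratios of the three amplitudes.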

The schematic symbol for a synchro transmitter or receiver.

The attitude indicator uses control transformers to process these input signals. A control transformer is similar to a synchro in appearance and construction, but it is wired differently. The three stator windings receive the inputs and the rotor winding provides the error output. If the rotor angle of the synchro transmitter and control transformer are the same, the signals cancel out and there is no error output. But as the difference between the two shaft angles increases, the rotor winding produces an error signal. The phase of the error signal indicates the direction of error.

The next component is the motor/tachometer, a special motor that was often used in avionics servo loops. This motor is more complicated than a regular electric motor. The motor is powered by 115-volt, 400-hertz AC, but this isn't sufficient to get the motor spinning. The motor also has two low-voltage AC control windings. Energizing a control winding will cause the motor to spin in one direction or the other.

The motor/tachometer unit also contains a tachometer to measure its rotational speed, for use in a feedback loop. The tachometer is driven by another 115-volt AC winding and generates a low-voltage AC signal proportional to the rotational speed of the motor.

A motor/tachometer similar (but not identical) to the one in the attitude indicator.

The photo above shows a motor/tachometer with the rotor removed. The unit has many wires because of its multiple windings. The rotor has two drums. The drum on the left, with the spiral stripes, is for the motor. This drum is a "squirrel-cage rotor", which spins due to induced currents. (There are no electrical connections to the rotor; the drums interact with the windings through magnetic fields.) The drum on the right is the tachometer rotor; it induces a signal in the output winding proportional to the speed due to eddy currents. The tachometer signal is at 400 Hz like the driving signal, either in phase or 180º out of phase, depending on the direction of rotation. For more information on how a motor/generator works, see my teardown.

The amplifier

The motors are powered by an amplifier assembly that contains three separate error amplifiers, one for each axis. I had to reverse engineer the amplifier assembly in order to get the indicator working. The assembly mounts on the back of the attitude indicator and connects to one of the indicator's round connectors. Note the cutout in the lower left of the amplifier assembly to provide access to the second connector on the back of the indicator. The aircraft connects to the indicator through the second connector and the indicator passes the input signals to the amplifier through the connector shown above.

The amplifier assembly.

The amplifier assembly contains three amplifier boards (for roll, pitch, and azimuth), a DC power supply board, an AC transformer, and a trim potentiometer.7 The photo below shows the amplifier assembly mounted on the back of the instrument. At the left, the AC transformer produces the motor control voltage and powers the power supply board, mounted vertically on the right. The assembly has three identical amplifier boards; the middle board has been unmounted to show the components. The amplifier connects to the instrument through a round connector below the transformer. The round connector at the upper left is on the instrument case (not the amplifier) and provides the connection between the aircraft and the instrument.8

The amplifier assembly mounted on the back of the instrument. We are feeding test signals to the connector in the upper left.

The photo below shows one of the three amplifier boards. The construction is unusual, with some components stacked on top of other components to save space. Some of the component leads are long and protected with clear plastic sleeves. The board is connected to the rest of the amplifier assembly through a bundle of point-to-point wires, visible on the left. The round pulse transformer in the middle has five colorful wires coming out of it. At the right are the two transistors that drive the motor's control windings, with two capacitors between them. The transistors are mounted on a heat sink that is screwed down to the case of the amplifier assembly for cooling. The board is covered with a conformal coating to protect it from moisture or contaminants.

One of the three amplifier boards.

The function of each amplifier board is to generate the two control signals so the motor rotates in the appropriate direction based on the error signal fed into the amplifier. The amplifier also uses the tachometer output from the motor unit to slow the motor as the error signal decreases, preventing overshoot. The inputs to the amplifier are 400 hertz AC signals, with the phase indicating positive or negative error. The outputs drive the two control windings of the motor, determining which direction the motor rotates.

The schematic for the amplifier board is below. The two transistors on the left amplify the error and tachometer signals, driving the pulse transformer. The outputs of the pulse transformer will have opposite phase, driving the output transistors for opposite halves of the 400 Hz cycle. One of the transistors will be in the right phase to turn on and pull the motor control AC to ground, while the other transistor will be in the wrong phase. Thus, the appropriate control winding will be activated (for half the cycle), causing the motor to spin in the desired direction.

Schematic of one of the three amplifier boards. (Click for a larger version.)

It turns out that there are two versions of the attitude indicator that use incompatible amplifiers. I think that the motors for the newer indicators have a single control winding rather than two. Fortunately, the connectors are keyed differently so you can't attach the wrong amplifier. The second amplifier (below) looks slightly more modern (1980s) with a double-sided circuit board and more components in place of the pulse transformer.

The second type of amplifier board.

The pitch trim circuit

The attitude indicator has a pitch trim knob in the lower right, although the knob was missing from ours. The pitch trim adjustment turns out to be rather complicated. In level flight, an aircraft may have its nose angled up or down slightly to achieve the desired angle of attack. The pilot wants the attitude indicator to show level flight, even though the aircraft is slightly angled, so the indicator can be adjusted with the pitch trim knob. However, the problem is that a fighter plane may, for instance, do a vertical 90º climb. In this case, the attitude indicator should show the actual attitude and ignore the pitch trim adjustment.

I found a 1957 patent that explained how this is implemented. The solution is to "fade out" the trim adjustment when the aircraft moves away from horizontal flight. This is implemented with a special multi-zone potentiometer that is controlled by the pitch angle.

The schematic below shows how the pitch trim signal is generated from the special pitch angle potentiometer and the pilot's pitch trim adjustment. Like most signals in the attitude indicator, the pitch trim is a 400 Hz AC signal, with the phase indicating positive or negative. Ignoring the pitch angle for a moment, the drive signal into the transformer will be AC. The split windings of the transformer will generate a positive phase and a negative phase signal. Adjusting the pitch trim potentiometer lets the pilot vary the trim signal from positive to zero to negative, applying the desired correction to the indicator.

The pitch trim circuit. Based on the patent.

Now, look at the complex pitch angle potentiometer. It has alternating resistive and conducting segments, with AC fed into opposite sides. (Note that +AC and -AC refer to the phase, not the voltage.) Because the resistances are equal, the AC signals will cancel out at the top and the bottom, yielding 0 volts on those segments. If the aircraft is roughly horizontal, the potentiometer wiper will pick up the positive-phase AC and feed it into the transformer, providing the desired trim adjustment as described previously. However, if the aircraft is climbing nearly vertically, the wiper will pick up the 0-volt signal, so there will be no pitch trim adjustment. For an angle range in between, the resistance of the potentiometer will cause the pitch trim signal to smoothly fade out. Likewise, if the aircraft is steeply diving, the wiper will pick up the 0 signal at the bottom, removing the pitch trim. And if the aircraft is inverted, the wiper will pick up the negative AC phase, causing the pitch trim adjustment to be applied in the opposite direction.

Conclusions

The attitude indicator is a key instrument in any aircraft, especially important when flying in low visibility. The F-4's attitude indicator goes beyond the artificial horizon indicator in a typical aircraft, adding a third axis to show the aircraft's heading. Supporting a third axis makes the instrument much more complicated, though. Looking inside the indicator reveals how the ball rotates in three axes while still remaining firmly attached.

Modern fighter planes avoid complex electromechanical instruments. Instead, they provide a "glass cockpit" with most data provided digitally on screens. For instance, the F-35's console replaces all the instruments with a wide panoramic touchscreen displaying the desired information in color. Nonetheless, mechanical instruments have a special charm, despite their impracticality.

For more, follow me on Mastodon as @[email protected] or RSS. (I've given up on Twitter.) I worked on this project with CuriousMarc and Eric Schlapfer, so expect a video at some point. Thanks to John Pumpkinhead and another collector for supplying the indicators and amplifiers.

Notes and references

Specifications9

  1. This three-axis attitude indicator is similar in many ways to the FDAI (Flight Director Attitude Indicator) that was used in the Apollo space flights, although the FDAI has more indicators and needles. It is more complex than the Soyuz Globus, used for navigation (teardown), which rotates in two axes. Maybe someone will loan us an FDAI to examine...
     

  2. Our indicator has been used as a parts source, as it has cut wires inside and is missing the pitch trim knob, several needles, and internal adjustment potentiometers. We had to replace two failed capacitors in the power supply. There is still a short somewhere that we are tracking down; at one point it caused the bond wire inside a transistor to melt(!). 

  3. The aircraft is the "Phantom II" because the original Phantom was a World War II fighter aircraft, the McDonnell FH Phantom. McDonnell Douglas reused the Phantom name for the F-4. (McDonnell became McDonnell Douglas in 1967 after merging with Douglas Aircraft. McDonnell Douglas merged into Boeing in 1997. Many people blame Boeing's current problems on this merger.) 

  4. The F-4 could carry a variety of nuclear bombs such as the B28EX, B61, B43 and B57, referred to as "special weapons". The photo below shows the nuclear store consent switch, which armed a nuclear bomb for release. (Somehow I expected a more elaborate mechanism for nuclear bombs.) The switch labels are in the shadows, but say "REL/ARM", "SAFE", and "REL". The F-4 Weapons Delivery Manual discusses this switch briefly.

    The nuclear store consent switch, to the right of the Weapons System Officer in the rear cockpit. Photo from National Museum of the USAF.

     

  5. The photo below is a closeup of the attitude indicator in the F-4 cockpit. Note the Primary/Standby toggle switch in the upper-left. Curiously, this switch is just screwed onto the console, with exposed wires. Based on other sources, this appears to be the standard mounting. This switch is the "reference system selector switch" that selects the data source for the indicator. In the primary setting, the gyroscopically-stabilized inertial navigation system (INS) provides the information. The INS normally gets azimuth information from the magnetic compass, but can use a directional gyro if the Earth's magnetic field is distorted, such as in polar regions. See the F-4E Flight Manual for details.

    A closeup of the indicator in the cockpit of the F-4 Phantom II. Photo from National Museum of the USAF.

    The standby switch setting uses the bombing computer (the AN/AJB-7 Attitude-Reference Bombing Computer Set) as the information source; it has two independent gyroscopes. If the main attitude indicator fails entirely, the backup is the "emergency attitude reference system", a self-contained gyroscope and indicator below and to the right of the main attitude indicator; see the earlier cockpit photo. 

  6. The diagram below shows the features of the indicator.

    The features of the Attitude Director Indicator (ADI). From F-4E Flight Manual TO 1F-4E-1.

    The pitch steering bar is used for an instrument (ILS) landing. The bank steering bar provides steering information from the navigation system for the desired course. 

  7. The roll, pitch, and azimuth inputs require different resistances, for instance, to handle the pitch trim input. These resistors are on the power supply board rather than an amplifier board. This allows the three amplifier boards to be identical, rather than having slightly different amplifier boards for each axis. 

  8. The attitude indicator assembly has a round mil-spec connector and the case has a pass-through connector. That is, the aircraft wiring plugs into the outside of the case and the indicator internals plug into the inside of the case. The pin numbers on the outside of the case don't match the pin numbers on the internal connector, which is very annoying when reverse-engineering the system. 

  9. In this footnote, I'll link to some of the relevant military specifications.

    The attitude indicator is specified in military spec MIL-I-27619, which covers three similar indicators, called ARU-11/A, ARU-21/A, and ARU-31/A. The three indicators are almost identical except that the ARU-21/A has the horizontal pointer alarm flag and the ARU-31/A has a bank angle command pointer and a bank scale at the bottom of the indicator, along with a bank angle command pointer adjustment knob in the lower left. The ARU-11/A was used in the F-111A. (The ID-1144/AJB-7 indicator is probably the same as the ARU-11/A.) The ARU-21/A was used in the A-7D Corsair. The ARU-31/A was used in the RF-4C Phantom II, the reconnaissance version of the F-4. The photo below shows the cockpit of the RF-4C; note that the attitude indicator in the center of the panel has two knobs.

    Cockpit panel of the RF-4C. Photo from National Museum of the USAF.

    The indicator was part of the AN/ASN-55 Attitude Heading Reference Set, specified in MIL-A-38329. I think that the indicator originally received its information from an MD-1 gyroscope (MIL-G-25597) and an ML-1 flux valve compass, but I haven't tracked down all the revisions and variants.

    Spec MIL-I-23524 describes an indicator that is almost identical to the ARU-21/A but with white flags. This indicator was also used with the AJB-3A Bomb Release Computing Set, part of the A-4 Skyhawk. This indicator was used with the integrated flight information system MIL-S-23535 which contained the flight director computer MIL-S-23367.

    My indicator has no identifying markings, so I can't be sure of its exact model. Moreover, it has missing components, so it is hard to match up the features. Since my indicator has white flags it might be the ID-1329/A.

     

SDF combining distance fields

Learning about font rendering, I was looking at text closely last time, and I noticed another issue. The shadows of each letter overlap the previous letter. That's because I'm drawing one letter at a time. So for example in the fl, I draw the f's letter, outline, and shadow, then I draw l's letter, outline, and shadow. So l's shadow is drawn on top of f's letter.

Shadows of each letter overlap the previous letter
Closeup showing the shadow drawn over the character to the left

The first thing I considered was to draw all the shadows first and then draw all the letters. But a second problem here, harder to see, is that shadows are drawn on top of the adjacent letter's shadows, and that causes them to be darker than they should be. The distance fields for adjacent letters always overlap:

Sprite distance fields always overlap

It's only when the letters are close enough that you can see artifacts of the double rendering.

To solve both problems, I can generate a new distance field which is the min() of the distance fields for each character. The min() of signed distance fields is the union of the shapes. I used the blendEquation() function with MAX_EXT instead of the default FUNC_ADD (it's max instead of min because msdfgen's encoding is inverted). The MAX_EXT extension seems to have 100% support in WebGL 1, and is always included with WebGL 2.
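
For reference, the rule being applied here is the standard union identity for distance fields: at every pixel p, d_union(p) = min(d_A(p), d_B(p)), using the usual convention that distance is negative inside a shape (hence max rather than min in msdfgen's inverted encoding).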

Adding an intermediate distance field texture

The output texture holds a combined distance field. I then use the distance field shader to draw that combined distance field to the map.

I could do this once per string I want to draw or once for the entire scene. I decided to do it once per string, because that allows me to use different colors and styles (thickness, outline width, shadow, halo) per string.

I ran into a few bugs with this:

  • because I couldn't figure out how to min() msdf fields, I put an sdf in the combined distance field instead of msdf, and that meant the corners of fonts got a little rounder; to compensate I increased the resolution of the combined distance field
  • having a different resolution for the original and combined distance fields messed up the antialiasing; in the previous post I described a "slope" of distance field vs output pixels, but now there's an intermediary with the slope stretched out
  • the letters got out of alignment when I moved them around for the intermediate texture; in particular, I had to calculate a new baseline position accounting for the change in resolution

One way I compared parameters was by rendering them in alternating frames. Here's an example showing that the combined distance field resolution matters (slightly):

Comparison of 3X vs 5X resolution on the combined distance field

Here's the final result, showing that the overlapped drawing is fixed, especially at the bottom left of the second g:

 
Before and after combining distance fields

The combined distance field approach solves the problem but it means I need to write each string to a separate intermediate texture, which leads to a rabbit hole of having to allocate texture space during rendering, possibly reusing textures, and dealing with GPU pipeline stalls. That's an area I don't understand well. Fortunately I don't need it to be optimized for my project. But for a game project, I might choose to do the two-pass approach instead, letting the shadows get drawn twice in overlapping areas. Are there better approaches? I don't know.

Mentirinhas #2188

The post Mentirinhas #2188 appeared first on Mentirinhas.

Some Go web dev notes

I spent a lot of time in the past couple of weeks working on a website in Go that may or may not ever see the light of day, but I learned a couple of things along the way I wanted to write down. Here they are:

go 1.22 now has better routing

I’ve never felt motivated to learn any of the Go routing libraries (gorilla/mux, chi, etc), so I’ve been doing all my routing by hand, like this.

	// DELETE /records:
	case r.Method == "DELETE" && n == 1 && p[0] == "records":
		if !requireLogin(username, r.URL.Path, r, w) {
			return
		}
		deleteAllRecords(ctx, username, rs, w, r)
	// POST /records/<ID>
	case r.Method == "POST" && n == 2 && p[0] == "records" && len(p[1]) > 0:
		if !requireLogin(username, r.URL.Path, r, w) {
			return
		}
		updateRecord(ctx, username, p[1], rs, w, r)

But apparently as of Go 1.22, Go now has better support for routing in the standard library, so that code can be rewritten something like this:

	mux.HandleFunc("DELETE /records/", app.deleteAllRecords)
	mux.HandleFunc("POST /records/{record_id}", app.updateRecord)

Though it would also need login handling, so maybe something more like this, with a requireLogin middleware:

	mux.Handle("DELETE /records/", requireLogin(http.HandlerFunc(app.deleteAllRecords)))

a gotcha with the built-in router: redirects with trailing slashes

One annoying gotcha I ran into was: if I make a route for /records/, then a request for /records will be redirected to /records/.

I ran into an issue with this where sending a POST request to /records redirected to a GET request for /records/, which broke the POST request because it removed the request body. Thankfully Xe Iaso wrote a blog post about the exact same issue which made it easier to debug.

I think the solution to this is just to use API endpoints like POST /records instead of POST /records/, which seems like a more normal design anyway.

sqlc automatically generates code for my db queries

I got a little bit tired of writing so much boilerplate for my SQL queries, but I didn’t really feel like learning an ORM, because I know what SQL queries I want to write, and I didn’t feel like learning the ORM’s conventions for translating things into SQL queries.

But then I found sqlc, which will compile a query like this:


-- name: GetVariant :one
SELECT *
FROM variants
WHERE id = ?;

into Go code like this:

const getVariant = `-- name: GetVariant :one
SELECT id, created_at, updated_at, disabled, product_name, variant_name
FROM variants
WHERE id = ?
`

func (q *Queries) GetVariant(ctx context.Context, id int64) (Variant, error) {
	row := q.db.QueryRowContext(ctx, getVariant, id)
	var i Variant
	err := row.Scan(
		&i.ID,
		&i.CreatedAt,
		&i.UpdatedAt,
		&i.Disabled,
		&i.ProductName,
		&i.VariantName,
	)
	return i, err
}

What I like about this is that if I’m ever unsure about what Go code to write for a given SQL query, I can just write the query I want, read the generated function and it’ll tell me exactly what to do to call it. It feels much easier to me than trying to dig through the ORM’s documentation to figure out how to construct the SQL query I want.

Reading Brandur’s sqlc notes from 2024 also gave me some confidence that this is a workable path for my tiny programs. That post gives a really helpful example of how to conditionally update fields in a table using CASE statements (for example if you have a table with 20 columns and you only want to update 3 of them).

sqlite tips

Someone on Mastodon linked me to this post called Optimizing sqlite for servers. My projects are small and I’m not so concerned about performance, but my main takeaways were:

  • have a dedicated object for writing to the database, and run db.SetMaxOpenConns(1) on it. I learned the hard way that if I don’t do this then I’ll get SQLITE_BUSY errors from two threads trying to write to the db at the same time. (There’s a sketch of this setup right after this list.)
  • if I want to make reads faster, I could have 2 separate db objects, one for writing and one for reading
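
A minimal sketch of that setup, assuming a database/sql SQLite driver like mattn/go-sqlite3 (plus the usual database/sql and log imports):

	// One handle dedicated to writes, limited to a single connection,
	// so two goroutines can't write at once and trigger SQLITE_BUSY.
	writeDB, err := sql.Open("sqlite3", "app.db")
	if err != nil {
		log.Fatal(err)
	}
	writeDB.SetMaxOpenConns(1)

	// A separate handle for reads can keep more connections open.
	readDB, err := sql.Open("sqlite3", "app.db")
	if err != nil {
		log.Fatal(err)
	}
	readDB.SetMaxOpenConns(10)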

There are more tips in that post that seem useful (like “COUNT queries are slow” and “Use STRICT tables”), but I haven’t done those yet.

Also sometimes if I have two tables where I know I’ll never need to do a JOIN between them, I’ll just put them in separate databases so that I can connect to them independently.

Go 1.19 introduced a way to set a GC memory limit

I run all of my Go projects in VMs with relatively little memory, like 256MB or 512MB. I ran into an issue where my application kept getting OOM killed and it was confusing – did I have a memory leak? What?

After some Googling, I realized that maybe I didn’t have a memory leak, maybe I just needed to reconfigure the garbage collector! It turns out that by default (according to A Guide to the Go Garbage Collector), Go’s garbage collector will let the application allocate memory up to 2x the current heap size.

Mess With DNS’s base heap size is around 170MB and the amount of memory free on the VM is around 160MB right now, so if its memory usage doubles, it’ll get OOM killed.

In Go 1.19, they added a way to tell Go “hey, if the application starts using this much memory, run a GC”. So I set the GC memory limit to 250MB and it seems to have resulted in the application getting OOM killed less often:

export GOMEMLIMIT=250MiB
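
The same limit can also be set from inside the program with runtime/debug.SetMemoryLimit (added in Go 1.19 alongside GOMEMLIMIT), which is handy if you don't control the environment. A sketch:

	import "runtime/debug"

	func init() {
		// 250 MiB, equivalent to GOMEMLIMIT=250MiB
		debug.SetMemoryLimit(250 << 20)
	}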

some reasons I like making websites in Go

I’ve been making tiny websites (like the nginx playground) in Go on and off for the last 4 years or so and it’s really been working for me. I think I like it because:

  • there’s just 1 static binary, all I need to do to deploy it is copy the binary. If there are static files I can just embed them in the binary with embed (there’s a sketch of this after this list).
  • there’s a built-in webserver that’s okay to use in production, so I don’t need to configure WSGI or whatever to get it to work. I can just put it behind Caddy or run it on fly.io or whatever.
  • Go’s toolchain is very easy to install, I can just do apt-get install golang-go or whatever and then a go build will build my project
  • it feels like there’s very little to remember to start sending HTTP responses – basically all there is are functions like Serve(w http.ResponseWriter, r *http.Request) which read the request and send a response. If I need to remember some detail of how exactly that’s accomplished, I just have to read the function!
  • also net/http is in the standard library, so you can start making websites without installing any libraries at all. I really appreciate this one.
  • Go is a pretty systems-y language, so if I need to run an ioctl or something that’s easy to do
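
For reference, here's a minimal sketch of the embed approach mentioned above, assuming the static files live in a static/ directory next to the Go source:

	package main

	import (
		"embed"
		"log"
		"net/http"
	)

	// Everything under static/ gets compiled into the binary.
	//go:embed static
	var staticFiles embed.FS

	func main() {
		// Serves e.g. /static/style.css straight from the embedded files.
		http.Handle("/static/", http.FileServer(http.FS(staticFiles)))
		log.Fatal(http.ListenAndServe(":8080", nil))
	}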

In general everything about it feels like it makes projects easy to work on for 5 days, abandon for 2 years, and then get back into writing code without a lot of problems.

For contrast, I’ve tried to learn Rails a couple of times and I really want to love Rails – I’ve made a couple of toy websites in Rails and it’s always felt like a really magical experience. But ultimately when I come back to those projects I can’t remember how anything works and I just end up giving up. It feels easier to me to come back to my Go projects that are full of a lot of repetitive boilerplate, because at least I can read the code and figure out how it works.

things I haven’t figured out yet

some things I haven’t done much of yet in Go:

  • rendering HTML templates: usually my Go servers are just APIs and I make the frontend a single-page app with Vue. I’ve used html/template a lot in Hugo (which I’ve used for this blog for the last 8 years) but I’m still not sure how I feel about it.
  • I’ve never made a real login system, usually my servers don’t have users at all.
  • I’ve never tried to implement CSRF

In general I’m not sure how to implement security-sensitive features so I don’t start projects which need login/CSRF/etc. I imagine this is where a framework would help.

it’s cool to see the new features Go has been adding

Both of the Go features I mentioned in this post (GOMEMLIMIT and the routing) are new in the last couple of years and I didn’t notice when they came out. It makes me think I should pay closer attention to the release notes for new Go versions.

Beamsplitters

Under quantum tax law, photons sent through a beamsplitter don't actually choose which path they took, or incur a tax burden, until their wavefunction collapses when the power is sold.

Testing strategy: avoid the waterfall strategy trap with iterative refinement.

If I could only popularize one idea about technical strategy, it would be that prematurely applying pressure to a strategy’s rollout prevents evaluating whether the strategy is effective. Pressure changes behavior in profound ways, and many of those changes are intended to make you believe your strategy is working while minimizing change to the status quo (if you’re an executive) or to get your strategy repealed (if you’re not an executive). Neither is particularly helpful.

While some strategies are obviously wrong from the beginning, it’s much more common to see reasonable strategies that fail because they didn’t get the small details right. Premature pressure is one common cause of a more general phenomenon: most strategies are developed in a waterfall model, finalizing their approach before incorporating the lessons that reality teaches when you attempt the strategy in practice.

One effective mechanism to avoid the waterfall strategy trap is explicitly testing your strategy to refine the details. This chapter describes the mechanics of testing strategy:

  • when it’s important to test strategy (and when it isn’t)
  • how to test strategy
  • when you should stop testing
  • roles in testing strategy: sponsor vs guide
  • metrics and meetings to run a testing strategy
  • how to identify a strategy that skipped testing
  • what to do when a strategy has progressed too far without testing

Let’s get into the details.


This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

Many of the ideas here came together while working with Shawna Martell, Dan Fike, Madhuri Sarma, and many others in Carta Engineering.

When to test strategy

Strategy testing is ensuring that a strategy will accomplish its intended goal at a cost that you’re willing to pay. This means it needs to happen prior to implementing a strategy, usually in a strategy’s early development stages.

A few examples of when to test common strategy topics:

  • Integrating a recent acquisition might focus on getting a single API integration working before finalizing how the overall approach goes.
  • A developer productivity strategy focused on requiring typing in a Python codebase might start by having an experienced team member type an important module.
  • A service migration might attempt migrating both a simple component (to test migration tooling) and a highly complex component (to test integration complexity) before moving to a broader rollout.

In every case, the two most important pieces are testing before finalizing the strategy, and testing narrowly with a focus on the underlying mechanics of the approach rather than getting caught up in solving broad problems like motivating adoption and addressing conflicting incentives.

That’s not to say that you need to test every strategy. A few of the common cases where you might not want to test a strategy are:

  • When you’re dealing with a permissive strategy that’s very cheap to apply, testing is often not too important; indeed, you can consider most highly-permissive strategies as a test of whether it’s effective to implement a similar, but less permissive, strategy in the future.
  • Where testing isn’t viable for some reason. For example, a hiring strategy where you shift hiring into certain regions isn’t something you can test in most cases, it’s something you might need to run for several years to get meaningful signal on results.
  • There are also cases where you have such high conviction in a given strategy that it’s not worth testing, perhaps because you’ve done something nearly identical at the same company before. Hubris comes before the fall, so I’m generally skeptical of this category.

That said, my experience is that you should try very hard to find a way to test every strategy. You certainly should not try hard to convince yourself testing a strategy isn’t worthwhile. Testing is so, so much cheaper than implementing a bad strategy that it’s almost always a good investment of time and energy.

How to test strategy

For a valuable step that’s so often skipped, testing strategy is relatively straightforward. The approach I’ve found effective is:

  1. Identify the narrowest, deepest available slice of your strategy, and iterate on applying your strategy to that slice until you’re confident the approach works well.

    For example, if you’re testing a new release strategy for your Product Engineering organization, decide to release exactly one important release following the new approach.

  2. As you iterate, identify metrics that help you verify the approach is working; note that these aren’t metrics to measure adoption, but rather metrics that measure the impact of the change.

    For example, metrics that show the new release process reduces customer impact, or drives more top-of-funnel visitors.

  3. Operate from the belief that people are well-meaning, and strategy failures are due to excess friction and poor ergonomics.

    For example, assume the release tooling is too complex if people aren’t using it. (Definitely don’t assume that people are too resistant to change.)

  4. Keep refining until you have conviction that your strategy’s details work in practice, or that the strategy needs to be approached from a new direction.

    For example, the metrics you identified earlier might show that the new release process has significantly reduced customer impact.

The most important details are the things not to do. Don’t go broad where impact feels higher but iteration cycles are slower. Don’t get caught up on forcing adoption such that you’re distracted from improving the underlying mechanics. Finally, don’t get so attached to your current approach that you can’t accept that it might not be working. Testing strategy is only valuable because many strategies don’t work as intended, and it’s much cheaper to learn that early.

Testing roles: sponsors and guides

Sometimes the strategy testing process is led by one individual who is able to both sponsor the work (a principal engineer at a smaller company, an executive, etc.) and coordinate the day-to-day work of validating the approach (a principal engineer at a larger company, an engineering manager, a technical program manager, etc.). It’s even more common for these responsibilities to be split between two roles: a sponsor and a guide.

The sponsor is responsible for:

  1. serving as an escalation point to make quick decisions to avoid getting stuck in development stages
  2. pushing past historical decisions and beliefs that prevent meaningful testing
  3. marshalling cross-organizational support
  4. telling the story to stakeholders, especially the executive team to avoid getting defunded
  5. preventing overloading of strategy (where people want to make the strategy solve their semi-related problem)
  6. setting pace to avoid stalling out
  7. identifying when energy is dropping and it’s time to change the phase of the strategy (from development to implementation)

The guide is responsible for:

  1. translating the strategy into particulars where testing gets stuck
  2. identifying slowdowns and blockers
  3. escalating frequently to the sponsor
  4. tracking goals and workstreams
  5. maintaining the pace set by the sponsor

In terms of filling these roles, there are a few lessons that I’ve learned over time. For sponsors, what matters the most is that they’re genuinely authorized by the company to make the decision they’re making, and that they care enough about the impact that they’re willing to make difficult decisions quickly. A sponsor is only meaningful to the extent that the guide can escalate to the sponsor and they rapidly resolve those escalations. If they aren’t available for escalations or don’t resolve them quickly, they’re a poor sponsor.

For guides, you need someone who can execute at pace without getting derailed by various organizational messes, and has good, nuanced judgment relevant to the strategy being tested. The worst guides are ideological (they reject the very feedback created by testing) or easily derailed (you’re likely testing because there’s friction somewhere, so someone who can’t navigate friction is going to fail by default).

Meetings & Metrics

The only absolute requirement for the strategy testing phase is that the sponsor, guide, and any key folks working on the strategy must meet together every single week. Within that meeting, you’ll iterate on which metrics capture the current areas you’re trying to refine, discuss what you’ve learned from prior metrics or data, and schedule one-off followups to ensure you’re making progress.

The best version of this meeting is debugging-heavy and presentation-light. Any week where you’re not learning something that informs subsequent testing, or making a decision that modifies your approach to testing, should be viewed with some suspicion. It might mean that you’ve under-resourced the testing effort, or that your testing approach is too ambitious, but either way it’s a meaningful signal that testing is converging too slowly to maintain attention.

If all of this seems like an overly large commitment, I’d push you to consider your strategy altitude to adjust the volume or permissiveness of the strategy you’re working on. If a strategy isn’t worth testing, then it’s either already quite good (which should be widely evident beyond its authors) or it’s probably only worth rolling out in a highly permissive format.

Identifying strategies that skipped testing

While not all strategies must be refined by a testing phase, essentially all failing strategies skip the testing phase to move directly into implementation. Strategies that skip testing sound right, but don’t accomplish much. Fully standardizing authorization and authentication across your company on one implementation sounds right, but can still fail if e.g. each team is responsible for its own approach to determining the standard.

One particularly obvious pattern is something I describe as “pressure without a plan.” This is a strategy that is only the “sounds right” aspect with none of the details. Service migrations are particularly prone to this, perhaps due to apocryphal descriptions of Amazon’s service migration in the 2000s, which is often summarized as a top-down zero-details mandate to switch away from the monolith.

Identification comes down to understanding two things:

  1. Are there numbers that show the strategy is driving the desired impact? For example, API requests made into the new authentication service as a percentage of all authentication requests is more meaningful than a spreadsheet tracking teams’ commitments to move to the new service.

    Try to avoid proxy metrics when possible; instead look at the actual thing that matters.

  2. If the numbers aren’t moving, is there a clear mechanism for debugging and solving those issues, and is the team actually making progress? For example, a team that helps others integrate with the new authentication service to understand which limitations are preventing effective adoption, and that is shipping working code.

    Because the numbers aren’t moving, you need to find a different source of meaningful evidence to validate that progress is happening. Generally the best bet is new software running in a meaningful environment (e.g. production for product code). It’s also useful to talk with skeptics or failed integrations, but be cautious of debugging exclusively with skeptics: they’re almost always right, but often out of date, so they’re right about past problems rather than current ones.

Unless one of these two identifications is obviously true, it’s very likely that you’ve found a strategy that skipped testing.

Recovering from skipped testing

Once you’ve recognized a strategy that skipped testing and is now struggling, the next question is what to do about it. Should we decompose our monolith? looks at recovering from a failing service migration, and is lightly based on my experience dealing with similar stuck service migrations at both Calm and Carta. The answer to a stuck strategy is always: write a new strategy, and make sure not to skip testing this time.

Typically, the first step of this new strategy is explicitly pausing the struggling strategy while a new testing phase occurs. This is painful to do, because the folks invested in the current strategy will be upset with you, but there are always going to be people who disagree with any change. Long-term, the only thing that makes most people happy is a successful strategy, and anything that delays progress towards one is a poor investment.

Sometimes it is difficult to officially pause a struggling strategy, in which case you have to look for an indirect mechanism to implicitly pause it without acknowledging it. For example, delaying new services while you take a month to invest in improving service provisioning might give you enough breathing room to test the missing mechanisms from your strategy, without requiring anyone to lose face over the migration failing. It would be nice to always be able to say these things out loud, but managing personalities is an enduring leadership challenge; even when you're an executive, you just have a different set of messy stakeholders.

Summary

Testing doesn't determine whether a strategy might be good. It exposes the missing details required to translate a directionally accurate strategy into a strategy that works. After reading this chapter, you know how to lead that translation process as both a sponsor and a guide. You can set up and run the necessary meetings to test a strategy, and also put together the bank of metrics to determine if the strategy is ready to leave refinement and move to a broader rollout.

Woo-Woo

System Design - Data Replication

In this chapter of our System Design series, we will explore the concepts of scaling critical applications, with a special focus on data replication. This topic is directly related to the previous chapter on Sharding and Data Partitioning, since these concepts are often used together in many architectural approaches. Our goal is to present different replication patterns and strategies, showing how these concepts can improve the availability, reliability, and performance of distributed systems.


Defining Replication in Software Engineering

Think about your car or house keys. Now imagine there is only one copy of them and that you carry it around all day: to the gym, the supermarket, work, restaurants, birthdays, parties, the movies, and everywhere else. Now imagine that, unfortunately, you lose those keys in one of those places and only notice when you need them. What happens? You're locked out of your car or your home, which is a serious problem. When we think about data replication, that is exactly the kind of problem we want to avoid. Now imagine you had spare copies distributed among different people, such as a partner, parent, or close friend, or even hidden in strategic places, like under the garage mat or inside a flower pot. That is an analogy for the kind of strategy we use when dealing with replication.

In the disciplines that make up software engineering, replication refers to the act of creating one or more copies of the same data in different locations. It is an especially recommended practice in systems where consistency, availability, and fault tolerance are mandatory requirements throughout the product's life cycle.

These replicas may live on distinct servers or nodes, in geographically separated datacenters, or even in different public cloud regions. The main goal of replication is to ensure that data is available in multiple locations, which is essential for systems that require high availability. In the context of databases, replication allows data to remain accessible elsewhere even in the face of critical hardware failures or network problems. It also guarantees that the system will eventually become consistent, depending only on the time it takes for a replica to become the new primary source of data.

In short, the goal of replication strategies is to ensure that the same data is available in multiple locations, allowing the system to keep operating even if part of it fails.


Replication Models

Before covering replication strategies, it is important to understand the models on which replication systems are built. Regardless of the strategy used to guarantee that multiple copies of the same data exist in different locations or nodes, the replication model generally follows one of two main types: Primary-Replica or Primary-Primary, also known as Multi-Master. Let's understand both approaches conceptually to establish a foundation before talking about practical implementations.

Primary-Replica Replication

In Primary-Replica replication, a primary node receives all write operations and then replicates them to one or more secondary nodes, the so-called replicas. Replicas are generally used only for reads, while all write operations are handled exclusively by the primary node. This architecture is useful for read-heavy workloads, because it lets us spread read load across the replicas while keeping writes simple.

In short, in this model the primary node is responsible for guaranteeing data consistency across all replicas. This approach works well in read-intensive scenarios, or in systems that rely heavily on CQRS to build optimized read models, but it creates a single point of failure at the primary node. If the primary fails, one of the replicas must be promoted to become the new primary, which can cause downtime and errors until that process completes. How long that takes varies with the implementation, and in many cases it involves a lot of manual engineering.
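To make the model concrete, here is a minimal sketch, assuming a simple client-side router and plain in-memory dictionaries standing in for nodes: writes go to the primary and reads are spread across replicas. It illustrates the idea only, not any particular database's replication protocol.

    import random

    class PrimaryReplicaRouter:
        """Toy router: writes go to the primary, reads are spread across replicas."""

        def __init__(self, primary, replicas):
            self.primary = primary       # node that accepts writes
            self.replicas = replicas     # read-only copies

        def write(self, key, value):
            self.primary[key] = value
            # A real primary would ship this change to the replicas (synchronously
            # or asynchronously); here we copy it immediately for simplicity.
            for replica in self.replicas:
                replica[key] = value

        def read(self, key):
            # Any replica can serve the read, keeping load off the primary.
            return random.choice(self.replicas).get(key)

    router = PrimaryReplicaRouter(primary={}, replicas=[{}, {}, {}])
    router.write("user:42", {"name": "Ana"})
    print(router.read("user:42"))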

Primary-Primary Replication - Multi-Master

Primary-Primary replication, also known as Multi-Master replication, is an architecture in which multiple nodes can act as primaries at the same time, receiving both read and write operations. In this configuration, any node can process updates, and changes are replicated to every other node, enabling high availability and write scalability.

Multi-Primary Replication

This model eliminates the single point of failure present in Primary-Replica replication and allows more flexibility in distributing the workload. However, it also introduces additional complexity, especially around resolving write conflicts. When two write operations occur on different primary nodes at the same time, the system needs a strategy to resolve the conflict, such as using timestamps to order the operations or defining specific conflict-resolution policies for cases of temporary partitioning caused by network failures.
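As a rough illustration of timestamp-based conflict resolution, here is a minimal last-write-wins sketch; the record format and the use of wall-clock time are assumptions for illustration, and real multi-master systems also have to account for clock skew.

    import time

    def lww_merge(local, remote):
        """Last-write-wins: keep whichever version carries the newest timestamp.
        local and remote are (value, timestamp) tuples."""
        return local if local[1] >= remote[1] else remote

    # Two primaries accept conflicting writes for the same key.
    node_a = ("email=ana@old.com", time.time())
    node_b = ("email=ana@new.com", time.time() + 0.5)

    print(lww_merge(node_a, node_b))  # the newer write wins on both nodes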


Replication Strategies

Across the engineering disciplines we have seen so far, not only in this text, we find many strategies where replication helps in combination with other approaches, both for data (usually the focus of these approaches because of its importance and complexity) and for less conventional areas, such as replicating entire workloads, cached software domains, and so on. The goal of this chapter is to illustrate some of the most widely used replication models and explain their differences, advantages, and disadvantages, so that the applicability of each scenario is clear and can support architecture and engineering decisions.

Full and Partial Replication

Full Replication refers to the practice of replicating all data to every node in a system, meaning each node holds a complete copy of all the data. The advantage of this approach is that it adds several levels of availability, allowing any node to serve a client request at any time, as long as write operations are deliberately allowed. On the other hand, this strategy can increase storage costs and write latency, since every new record or new version of the data must be replicated and acknowledged by every node in the cluster. This model is also known as Full-Table Replication in academic literature.

Partial Replication, by contrast, distributes only part of the data to each node, so each node holds only a fraction of the total dataset. This model is more storage-efficient and reduces write latency, but it increases read complexity, because the requested data may not be available locally and may require communication between nodes. The client may need to query several nodes, or the query layer may abstract that complexity away. To locate data across nodes, it is common to implement sharding algorithms such as Consistent Hashing.
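As a minimal sketch of how a system can locate data under partial replication, here is a tiny consistent-hashing ring; the node names and the choice of MD5 are assumptions for illustration, and production implementations add virtual-node tuning and replica placement on top.

    import bisect
    import hashlib

    def h(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing:
        """Tiny consistent-hashing ring: a key maps to the first node clockwise."""

        def __init__(self, nodes, vnodes=100):
            self.ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
            self.keys = [k for k, _ in self.ring]

        def node_for(self, key: str) -> str:
            idx = bisect.bisect(self.keys, h(key)) % len(self.ring)
            return self.ring[idx][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("user:42"))  # which node holds this portion of the data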

Synchronous Replication

In Synchronous Replication, every change to the data must be replicated to all nodes before the operation is considered complete for the caller. This guarantees strong consistency across nodes, because a written or updated value only becomes available for reading after every node confirms it has been written successfully; in other words, all nodes will return the same data at any moment, regardless of which one receives the read request.

Synchronous Replication

Consider a scenario where a client needs to store a piece of data in a cluster that, for example, provides caching functionality. The client sends the data, as a key and value, to the cluster's primary endpoint as usual. That endpoint is responsible for distributing the data across all nodes of the cluster. The request is only finished and acknowledged to the client once the operation has completed in full, meaning every node has responded "ok" to the request to store the data. A common technique for implementing synchronous replication is the two-phase commit.
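Here is a minimal, in-memory sketch of the two-phase commit flow mentioned above; the node interface is an assumption for illustration, and it ignores timeouts, retries, and coordinator failure, which are the hard parts in practice.

    class Node:
        def __init__(self, name):
            self.name, self.staged, self.data = name, {}, {}

        def prepare(self, key, value) -> bool:
            # Phase 1: stage the write and vote yes/no.
            self.staged[key] = value
            return True

        def commit(self, key):
            # Phase 2: make the staged write visible.
            self.data[key] = self.staged.pop(key)

        def abort(self, key):
            self.staged.pop(key, None)

    def two_phase_commit(nodes, key, value) -> bool:
        if all(n.prepare(key, value) for n in nodes):   # phase 1: prepare everywhere
            for n in nodes:
                n.commit(key)                           # phase 2: commit everywhere
            return True
        for n in nodes:
            n.abort(key)                                # any "no" vote aborts the write
        return False

    cluster = [Node("a"), Node("b"), Node("c")]
    print(two_phase_commit(cluster, "user:42", "Ana"))  # True only after every node confirms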

Synchronous replication shines in scenarios where consistency is critical, such as payment systems or financial databases, where any discrepancy between nodes can cause serious problems. However, this approach can increase latency, especially when nodes are geographically distributed or numerous.

Asynchronous Replication

In Asynchronous Replication, data changes are sent to one of the cluster's nodes and replicated to the other nodes eventually, which means the operation can be considered successful before all replicas have been updated. This yields better write performance, since the system does not have to wait for confirmations from every node. However, subsequent reads may see periods of inconsistency or eventual consistency, because the data may not yet have been fully replicated to the other nodes, so different versions of the data can coexist until replication completes.

Asynchronous Replication

Asynchronous replication is widely used in scenarios where availability and performance matter more than immediate consistency, such as social networks, assets served by a CDN, less critical data used to reduce load on an origin, cache clusters, and the like.
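A minimal sketch of the asynchronous write path, assuming an in-memory primary and a background queue that ships changes to the replicas: the write is acknowledged immediately, and the replicas catch up shortly afterwards.

    import queue
    import threading
    import time

    primary, replicas = {}, [{}, {}]
    changes = queue.Queue()

    def write(key, value):
        primary[key] = value        # acknowledged to the caller immediately
        changes.put((key, value))   # replication happens later

    def replicator():
        while True:
            key, value = changes.get()
            time.sleep(0.1)         # simulated replication lag
            for r in replicas:
                r[key] = value

    threading.Thread(target=replicator, daemon=True).start()
    write("user:42", "Ana")
    print(replicas[0].get("user:42"))  # likely None: this replica hasn't caught up yet
    time.sleep(0.3)
    print(replicas[0].get("user:42"))  # "Ana" once replication completes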

Semi-Synchronous Replication

Semi-Synchronous Replication combines aspects of synchronous and asynchronous replication. In this model, at least one replica, or a small subset of replicas, must confirm the write before the operation is considered successful. The remaining replicas can be updated asynchronously.

Semi-Synchronous Replication

This type of replication offers a balance between consistency and performance. It adds extra levels of resilience by guaranteeing that the data is written synchronously to at least one other node, while keeping latency low by not requiring every replica to be updated immediately. In some databases, such as MySQL and MariaDB, the write is acknowledged as soon as one secondary node has persisted the data. Other nodes can receive the updates later, asynchronously, providing an extra degree of consistency without fully compromising performance.
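A minimal sketch of the acknowledgment rule, assuming in-memory nodes: the write returns once the primary and at least one replica have confirmed it, while the remaining replicas are treated as lagging and updated afterwards.

    def semi_sync_write(primary, replicas, key, value, min_acks=1):
        """Acknowledge once the primary plus at least `min_acks` replicas confirm."""
        primary[key] = value
        synced, lagging = replicas[:min_acks], replicas[min_acks:]
        for r in synced:
            r[key] = value          # synchronous path: we wait for these acks
        # `lagging` would normally be handled later by a background replicator;
        # we update them inline here only to keep the sketch self-contained.
        for r in lagging:
            r[key] = value
        return True                 # acknowledged after the synchronous acks

    print(semi_sync_write({}, [{}, {}, {}], "user:42", "Ana"))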

Log-Based Replication

Log-Based Replication is an approach in which every operation performed on a system is recorded in a sequential operation log, and that log is then replicated to the other nodes in the cluster, which apply the same operations. Instead of replicating the complete state of the data, the system replicates the changes, allowing replicas to apply those changes locally and keep their data consistent.

Log-Based Replication

This approach is advantageous in scenarios where changes are more frequent than reads or where the data volume is very large, because only the modifications are replicated, reducing the amount of data transferred between the cluster's nodes. This type of replication is common in technologies that support interoperability across multiple public cloud regions, datacenters, or disaster-recovery zones.
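Here is a minimal sketch of the log-and-replay idea described above, assuming an in-memory operation log and dictionary replicas; real systems such as Kafka or Raft-based stores add ordering guarantees, durability, and failure handling on top.

    log = []  # sequential operation log on the leader: (op, key, value)

    def leader_write(key, value):
        log.append(("set", key, value))

    def leader_delete(key):
        log.append(("del", key, None))

    def replay(replica_state, applied_offset):
        """Apply every log entry the replica hasn't seen yet; return the new offset."""
        for op, key, value in log[applied_offset:]:
            if op == "set":
                replica_state[key] = value
            elif op == "del":
                replica_state.pop(key, None)
        return len(log)

    replica, offset = {}, 0
    leader_write("user:42", "Ana")
    leader_delete("user:7")
    offset = replay(replica, offset)
    print(replica, offset)  # {'user:42': 'Ana'} 2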

Widely known and mature technologies, such as Apache Kafka and other streaming and event platforms, use log-based replication in their node-and-replica architecture. In Kafka, each topic is made up of multiple partitions, and changes to those partitions are recorded in transaction logs that are replicated across brokers, guaranteeing durability and resilience.

Log-based replication is also used in algorithms that are essential to distributed systems, such as Paxos (used in systems like BigTable and Apache Mesos), Raft (used in etcd, ScyllaDB, Consul, and CockroachDB), and Viewstamped Replication (used in TigerBeetle), as well as in techniques like the write-ahead log, which is used to guarantee data durability during replication in the event of node failures.


Architecture

Replication strategies can also be applied by hand in software engineering to solve a variety of architectural challenges. Although replication is often associated with off-the-shelf features, as in caches or databases, it is important to understand that, conceptually, it can be used much more broadly. Adopted strategically, replication makes it possible to scale distributed systems intelligently and efficiently. Below, we explore some architectural approaches that use replication, combined with other techniques, to improve the performance and scalability of large systems.

Event-Carried State Transfer - Replicating State and Domain Objects

In large systems, especially in complex corporate architectures, an effective way to keep large volumes of data highly available is Event-Carried State Transfer.

This pattern lets the state of an object be transmitted between services or software domains through events. It combines caching strategies, event-driven systems, and data replication, providing a costly but powerful way to handle large volumes of data without increasing coupling.

The central idea is that whenever an entity in a domain is updated, that change is published to event topics. The other services that depend on that domain can consume those events and update their own local databases, building a cached copy of the state. This is especially useful in systems that tolerate eventual consistency: instead of querying a centralized source on every request, services keep and use their own versions of the data, updated as events are processed. In complex, highly distributed systems, the cost-benefit of this approach can become worthwhile.

State Transfer

Imagine a government platform that shares citizen data across systems belonging to several agencies, such as banking, tax, public-safety, traffic, real-estate, and social-services systems. In an event-driven version of this scenario, whenever a citizen updates their marital status, income, address, or contact phone number in a central registry, that change would be announced through an event, and each of those agency systems would consume it and update its own records.
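To make the pattern concrete, here is a minimal sketch of a consumer keeping a local read copy of another domain up to date from events; the topic name, the event shape, and the direct function call standing in for a broker subscription are all assumptions for illustration.

    # Local read-side copy of the "citizen" domain, owned by this service.
    local_citizens = {}

    def handle_citizen_updated(event: dict):
        """Apply an event that carries the relevant slice of the entity's state."""
        local_citizens[event["id"]] = {
            "marital_status": event["marital_status"],
            "address": event["address"],
            "updated_at": event["updated_at"],
        }

    # In production this handler would be subscribed to a topic such as
    # "citizen.updated" on a broker (Kafka, SNS/SQS, etc.); here we feed it directly.
    handle_citizen_updated({
        "id": 42,
        "marital_status": "married",
        "address": "Rua A, 123",
        "updated_at": "2024-06-01T12:00:00Z",
    })
    print(local_citizens[42]["address"])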


Replication via Change Data Capture - Capturing Data Changes

Change Data Capture (CDC) is a technique that detects and captures changes made to a data source, such as a relational or non-relational database, and streams them to other systems in real time. This lets other services be updated immediately without having to query the original database directly. The approach is very useful for synchronizing data between different systems, feeding message queues, or keeping caches up to date with the latest information.

CDC

The goal of this pattern is to provide a mechanism that monitors operations such as inserts, updates, and deletes, capturing those changes as they happen. Once captured, the changes can be sent to event topics or directly to the systems that depend on that data. This allows other services to receive the latest information without overloading the main database with constant queries.

Proactive Replication

This technique serves as a foundation for other strategies, such as Event-Carried State Transfer, which benefits from captured events to replicate data intelligently and proactively. CDC also enables processes that involve streaming data into data lakes, proactive caching, and CQRS, acting as a reactive bridge that facilitates replication and integration with other patterns.
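As a minimal sketch (not tied to Debezium or any specific CDC tool), here is a tailer that turns rows appended to a hypothetical change table into events for downstream consumers; the change-table layout and the publish function are assumptions for illustration.

    # Hypothetical change table: rows appended by the database or by triggers.
    change_log = [
        {"seq": 1, "op": "INSERT", "table": "users", "key": 42, "row": {"name": "Ana"}},
        {"seq": 2, "op": "UPDATE", "table": "users", "key": 42, "row": {"name": "Ana Souza"}},
    ]

    def publish(topic: str, event: dict):
        print(f"publish to {topic}: {event}")   # stand-in for a real broker client

    def tail_changes(last_seq: int) -> int:
        """Emit every change after `last_seq` as an event; return the new position."""
        for change in change_log:
            if change["seq"] > last_seq:
                publish(f"cdc.{change['table']}", change)
                last_seq = change["seq"]
        return last_seq

    position = tail_changes(last_seq=0)   # consumers update caches, lakes, read models, etc.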


CRDTs - Conflict-Free Replicated Data Types

In distributed replication environments, especially in primary-primary or multi-master architectures, CRDTs (Conflict-Free Replicated Data Types) are data structures that solve one of the biggest challenges of that replication model: how to handle conflicts between different updates to the same piece of data. This situation arises when more than one node receives distinct changes to the same data, and while those versions are being propagated, the system has to resolve the conflict and decide which version is the final, correct one. CRDTs guarantee that such conflicts are resolved automatically, without coordination or locking between nodes.

Think of a collaborative document editor, where several people on a team can edit the same article at the same time. If two users, on different nodes, edit the same line of text simultaneously, a system using CRDTs can merge the changes automatically, producing a final version without requiring manual intervention or generating conflicts.

CRDTs are designed so that, even if multiple nodes update a piece of data independently, the final state will be consistent once those updates are synchronized. This is possible thanks to mathematical properties that make the operations associative, commutative, and idempotent, which means the order of operations does not matter: the end result will always be the same.

In addition, CRDTs guarantee eventual consistency: all nodes will eventually hold a consistent copy of the data, even if they made updates simultaneously. Because this approach requires no locking or coordination between nodes, each node can operate independently, which increases the system's availability. That makes CRDTs especially well suited to primary-primary environments, where all nodes accept writes at the same time.
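To make those properties concrete, here is a minimal sketch of a classic CRDT, the grow-only counter (G-Counter): each node increments only its own slot, and merging takes the per-node maximum, which makes merges associative, commutative, and idempotent. The node ids are assumptions for illustration.

    class GCounter:
        """Grow-only counter CRDT: one slot per node, merge = element-wise max."""

        def __init__(self, node_id):
            self.node_id = node_id
            self.counts = {}                 # node_id -> increments seen from that node

        def increment(self, amount=1):
            self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

        def merge(self, other: "GCounter"):
            for node, count in other.counts.items():
                self.counts[node] = max(self.counts.get(node, 0), count)

        def value(self) -> int:
            return sum(self.counts.values())

    a, b = GCounter("node-a"), GCounter("node-b")
    a.increment(3)           # concurrent, independent updates
    b.increment(2)
    a.merge(b); b.merge(a)   # merge in any order, any number of times
    print(a.value(), b.value())  # 5 5: both nodes converge to the same value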


Reviewers

Images generated by DALL-E


References

What is data replication?

What is Change Data Capture?

O que é Change Data Capture

SQL-Server: O que é a CDA (captura de dados de alterações)?

Two-Phase Commit

Event-Carried State Transfer Pattern

7 Data Replication Strategies & Real World Use Cases 2024

Replication Strategies and Partitioning in Cassandra

Event-Carried State Transfer: A Pattern for Distributed Data Management in Event-Driven Systems

Event-Carried State Transfer: Consistência e isolamento entre microsserviços

Event-Carried State Transfer Pattern

A Gentle Introduction to CRDTs

CRDTs: The Hard Parts

Almanaque das Memórias Afetivas volume 2

Those who saw the first volume loved it and can't wait for the second; if you're only hearing about this masterpiece now, hurry and get the first volume, because it is selling out, and take the opportunity to support the second, because it's well worth it.

If you don't know what I'm talking about, it's the Almanaque das Memórias Afetivas, by the great Roosevelt Garcia, and it really turned out beautifully.

Everything that marked the generations from the 1950s through the 2000s is in the Almanaque das Memórias Afetivas: toys and games, food and drinks, cars, tech gadgets, and TV shows that made history. Volume 1 has more than 350 pages of good memories for those who lived through that era, 356 richly illustrated, full-color pages in a 21x21 cm format.

If you own Almanaque das Memórias Afetivas volume 1, you already know it was made with great care and with the invaluable help of extremely capable friends. For volume 2, the team shaping the book is exactly the same, which guarantees full compatibility with the first volume.

To support the project, which is being run through Cartase, just click here and you'll be able to receive this magnificent work.

If you want volume 1, you can still get it from the Lojinha do Almanaque site (https://www.lojinhadoalmanaque.com.br), where, besides the Almanaque, you'll also find products referencing that priceless era.

Zoochosis

Inside a ferroelectric RAM chip

Ferroelectric memory (FRAM) is an interesting storage technique that stores bits in a special "ferroelectric" material. Ferroelectric memory is nonvolatile like flash memory, able to hold its data for decades. But, unlike flash, ferroelectric memory can write data rapidly. Moreover, FRAM is much more durable than flash and can be written trillions of times. With these advantages, you might wonder why FRAM isn't more popular. The problem is that FRAM is much more expensive than flash, so it is only used in niche applications.

Die of the Ramtron FM24C64 FRAM chip. (Click this image (or any other) for a larger version.)

This post takes a look inside an FRAM chip from 1999, designed by a company called Ramtron. The die photo above shows this 64-kilobit chip under a microscope; the four large dark stripes are the memory cells, containing tiny cubes of ferroelectric material. The horizontal greenish bands are the drivers to select a column of memory, while the vertical greenish band at the right holds the sense amplifiers that amplify the tiny signals from the memory cells. The eight whitish squares around the border of the die are the bond pads, which are connected to the chip's eight pins.1 The logic circuitry at the left and right of the die implements the serial (I2C) interface for communication with the chip.2

The history of ferroelectric memory dates back to the early 1950s.3 Many companies worked on FRAM from the 1950s to the 1970s, including Bell Labs, IBM, RCA, and Ford. The 1955 photo below shows a 256-bit ferroelectric memory built by Bell Labs. Unfortunately, ferroelectric memory had many problems,4 limiting it to specialized applications, and development was mostly abandoned by the 1970s.

A 256-bit ferroelectric memory made by Bell Labs. Photo from Scientific American, June, 1955.

Ferroelectric memory had a second chance, though. A major proponent of ferroelectric memory was George Rohrer, who started working on ferroelectric memory in 1968. He formed a memory company, Technovation, which was unsuccessful, and then cofounded Ramtron in 1984.5 Ramtron produced a tiny 256-bit memory chip in 1988, followed by much larger memories in the 1990s.

How FRAM works

Ferroelectric memory uses a special material with the property of ferroelectricity. In a normal capacitor, applying an electric field causes the positive and negative charges to separate in the dielectric material, making it polarized. However, ferroelectric materials are special because they will retain this polarization even when the electric field is removed. By polarizing a ferroelectric material positively or negatively, a bit of data can be stored. (The name "ferroelectric" is in analogy to "ferromagnetic", even though ferroelectric materials are not ferrous.)

This FRAM chip uses a ferroelectric material called lead zirconate titanate or PZT, containing lead, zirconium, titanium, and oxygen. The diagram below shows how an applied electric field causes the titanium or zirconium atom to physically move inside the crystal lattice, causing the ferroelectric effect. (Red atoms are lead, purple are oxygen, and yellow are zirconium or titanium.) Because the atoms physically change position, the polarization is stable for decades; in contrast, the capacitors in a DRAM chip lose their data in milliseconds unless refreshed. FRAM memory will eventually wear out, but it can be written trillions of times, much more than flash or EEPROM memory.

The ferroelectric effect in the PZT crystal. From Ramtron Catalog, cleaned up.

To store data, FRAM uses ferroelectric capacitors, capacitors with a ferroelectric material as the dielectric between the plates. Applying a voltage to the capacitor will create an electric field, polarizing the ferroelectric material. A positive voltage will store a 1, and a negative voltage will store a 0.

Reading a bit from memory is a bit tricky. A positive voltage is applied, forcing the material into the 1 state. If the material was already in the 1 state, minimal current will flow. But if the material was in the 0 state, more current will flow as the capacitor changes state. This allows the 0 and 1 states to be distinguished.

Note that reading the bit destroys the stored value. Thus, after a read, the 0 or 1 value must be written back to the capacitor to restore its previous state. (This is very similar to the magnetic core memory that was used in the 1960s.)6

The FRAM chip that I examined uses two capacitors per bit, storing opposite values. This approach makes it easier to distinguish a 1 from a 0: a sense amplifier compares the two tiny signals and generates a 1 or a 0 depending on which is larger. The downside of this approach is that using two capacitors per bit reduces the memory capacity. Later FRAMs increased the density by using one capacitor per bit, along with reference cells for comparison.7
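As a toy software model, purely illustrative and not how the silicon actually behaves, here is the destructive read-then-restore cycle described above, using two complementary capacitors per bit and a compare-based sense step.

    class FramBit:
        """Toy model of a 2-capacitor FRAM cell: reads are destructive and then restored."""

        def __init__(self, value: int):
            self.cap_true = value          # polarization of the "true" capacitor
            self.cap_comp = 1 - value      # the complementary capacitor stores the opposite

        def read(self) -> int:
            # Force both capacitors to the 1 state; the one that flips draws more charge.
            charge_true = 1 if self.cap_true == 0 else 0
            charge_comp = 1 if self.cap_comp == 0 else 0
            self.cap_true = self.cap_comp = 1              # the read destroyed the stored state
            value = 0 if charge_true > charge_comp else 1  # sense amp compares the two signals
            self.cap_true, self.cap_comp = value, 1 - value  # write-back restores the bit
            return value

    bit = FramBit(0)
    print(bit.read(), bit.read())  # 0 0: the value survives only because it is rewritten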

A closer look at the die

The diagram below shows the main functional blocks of the chip.8 The memory itself is partitioned into four blocks. The word line decoders select the appropriate column for the address and the drivers generate the pulses on the word and plate lines. The signals from that column go to the sense amplifiers on the right, where the signals are converted to bits and written back to memory. On the left, the precharge circuitry charges the bit lines to a fixed voltage at the start of the memory cycle, while the decoders select the desired byte from the bit lines.

The die with the main functional blocks labeled.

The diagram below shows a closeup of the memory. I removed the top metal layer and many of the memory cells to reveal the underlying structure. The structure is very three-dimensional compared to regular chips; the gray squares in the image are cubes of PZT, sitting on top of the plate lines. The brown rectangles labeled "top plate connection" are also three-dimensional; they are S-shaped brackets with the low end attached to the silicon and the high end contacting the top of the PZT cube. Thus, each PZT cube forms a capacitor with the plate line forming the bottom plate of the capacitor, the bracket forming the top plate connection, and the PZT cube sandwiched in between, providing the ferroelectric dielectric. (Some cubes have been knocked loose in this photo and are sitting at an angle; the cubes form a regular grid in the original chip.)

Structure of the memory. The image is focus-stacked for clarity.

The physical design of the chip is complicated and quite different from a typical planar integrated circuit. Each capacitor requires a cube of PZT sandwiched between platinum electrodes, with the three-dimensional contact from the top of the capacitor to the silicon. Creating these structures requires numerous steps that aren't used in normal integrated circuit fabrication. (See the footnote9 for details.) Moreover, the metal ions in the PZT material can contaminate the silicon production facility unless great care is taken, such as using a separate facility to apply the ferroelectric layer and all subsequent steps.10 The additional fabrication steps and unusual materials significantly increase the cost of manufacturing FRAM.

Each top plate connection has an associated transistor, gated by a vertical word line.11 The transistors are connected to horizontal bit lines, metal lines that were removed for this photo. A memory cell, containing two capacitors, measures about 4.2 µm × 6.5 µm. The PZT cubes are spaced about 2.1 µm apart. The transistor gate length is roughly 700 nm. The 700 nm node was introduced in 1993, while the die contains a 1999 copyright date, so the chip appears to be a few years behind the cutting edge in terms of process node.

The memory is organized as 256 capacitors horizontally by 512 capacitors vertically, for a total of 64 kilobits (since each bit requires two capacitors). The memory is accessed as 8192 bytes. Curiously, the columns are numbered on the die, as shown below.

With the metal removed, the numbers are visible counting the columns.

The photo below shows the sense amplifiers to the right of the memory, with some large transistors to boost the signal. Each sense amplifier receives two signals from the pair of capacitors holding a bit. The sense amplifier determines which signal is larger, deciding if the bit is a 0 or 1. Because the signals are very small, the sense amplifier must be very sensitive. The amplifier has two cross-connected transistors with each transistor trying to pull the other signal low. The signal that starts off larger will "win", creating a solid 0 or 1 signal. This value is rewritten to memory to restore the value, since reading the value erases the cells. In the photo, a few of the ferroelectric capacitors are visible at the far left. Part of the lower metal layer has come loose, causing the randomly strewn brown rectangles.

The sense amplifiers.

The photo below shows eight of the plate drivers, below the memory cells. This circuit generates the pulse on the selected plate line. The plate lines are the thick white lines at the top of the image; they are platinum so they appear brighter in the photo than the other metal lines. Most of the capacitors are still present on the plate lines, but some capacitors have come loose and are scattered on the rest of the circuitry. Each plate line is connected to a metal line (brown), which connects the plate line to the drive transistors in the middle and bottom of the image. These transistors pull the appropriate plate line high or low as necessary. The columns of small black circles are connections between the metal line and the silicon of the transistor underneath.

The plate driver circuitry.

Finally, here's the part number and Ramtron logo on the die.

Closeup of the logo "FM24C64A Ramtron" on the die.

Conclusions

Ferroelectric RAM is an example of a technology with many advantages that never achieved the hoped-for success. Many companies worked on FRAM from the 1950s to the 1970s but gave up on it. Ramtron tried again and produced products but they were not profitable. Ramtron had hoped that the density and cost of FRAM would be competitive with DRAM, but unfortunately that didn't pan out. Ramtron was acquired by Cypress Semiconductor in 2012 and then Cypress was acquired by Infineon in 2019. Infineon still sells FRAM, but it is a niche product, for instance in satellites that need radiation hardness. Currently, FRAM costs roughly $3/megabit, more than two orders of magnitude more expensive than flash memory, which is about $15/gigabit. Nonetheless, FRAM is a fascinating technology and the structures inside the chip are very interesting.

For more, follow me on Mastodon as @[email protected] or RSS. (I've given up on Twitter.) Thanks to CuriousMarc for providing the chip, which was used in a digital readout (DRO) for his CNC machine.

Notes and references

  1. The photo below shows the chip's 8-pin package.

    The chip is packaged in an 8-pin DIP. "RIC" stands for Ramtron International Corporation.

     

  2. The block diagram shows the structure of the chip, which is significantly different from a standard DRAM chip. The chip has logic to handle the I2C protocol, a serial protocol that uses a clock and a data line. (Note that the address lines A0-A2 are the address of the chip, not the memory address.) The WP (Write Protect) pin protects one quarter of the chip from being modified. The chip allows an arbitrary number of bytes to be read or written sequentially in one operation. This is implemented by the counter and address latch.

    Block diagram of the FRAM chip. From the datasheet.

     

  3. An early description of ferroelectric memory is in the October 1953 Proceedings of the IRE. This issue focused on computers and had an article on computer memory systems by J. P. Eckert of ENIAC fame. In 1953, computer memory systems were primitive: mercury delay lines, electrostatic CRTs (Williams tubes), or rotating drums. The article describes experimental memory technologies including ferroelectric memory, magnetic core memory, neon-capacitor memory, phosphor drums, temperature-sensitive pigments, corona discharge, or electrolytic diodes. Within a couple of years, magnetic core memory became successful, dominating storage until semiconductor memory took over in the 1970s, and most of the other technologies were forgotten. 

  4. A 1969 article in Electronics discussed ferroelectric memories. At the time, ferroelectric memories were used for a few specialized applications. However, ferroelectric memories had many issues: slow write speed, high voltages (75 to 150 volts), and expensive logic to decode addresses. The article stated: "These considerations make the future of ferroelectric memories in computers rather bleak." 

  5. Interestingly, the "Ram" in Ramtron comes from the initials of the cofounders: Rohrer, Araujo, and McMillan. Rohrer originally focused on potassium nitrate as the ferroelectric material, as described in his patent. (I find it surprising that potassium nitrate is ferroelectric since it seems like such a simple, non-exotic chemical.) An extensive history of Ramtron is here. A Popular Science article also provides information. 

  6. Like core memory, ferroelectric memory is based on a hysteresis loop. Because of the hysteresis loop, the material has two stable states, storing a 0 or 1. The difference is that core memory has hysteresis of the magnetization with respect to the applied magnetic field, while ferroelectric memory has hysteresis of the polarization with respect to the applied electric field. 

  7. The reference cell approach is described in Ramtron patent 6028783A. The idea is to have a row of reference capacitors, but the reference capacitors are sized to generate a current midway between the 0 current and the 1 current. The reference capacitors provide the second input to the sense amplifiers, allowing the 0 and 1 bits to be distinguished. 

  8. Ramtron's 1987 patent describes the approximate structure of the memory. 

  9. The diagram below shows the complex process that Ramtron used to create an FRAM chip. (These steps are from a 2003 patent, so they may differ from the steps for the chip I examined.)

    Ramtron's process flow to create an FRAM die. From Patent 6613586.

    Abbreviations: BPSG is borophosphosilicate glass. UTEOS is undoped tetraethylorthosilicate, a liquid used to deposit silicon dioxide on the surface. RTA is rapid thermal anneal. PTEOS is phosphorus-doped tetraethylorthosilicate, used to create a phosphorus-doped silicon dioxide layer. CMP is chemical mechanical planarization, polishing the die surface to be flat. TEC is the top electrode contact. ILD is interlevel dielectric, the insulating layer between conducting layers. 

  10. See the detailed article Ferroelectric Memories, Science, 1989, by Scott and Araujo (who is the "A" in "Ramtron"). 

  11. Early FRAM memories used an X-Y grid of wires without transistors. Although much simpler, this approach had the problem that current could flow through unwanted capacitors via "sneak" paths, causing noise in the signals and potentially corrupting data. High-density integrated circuits, however, made it practical to associate a transistor with each cell in modern FRAM chips. 

FSL: A Better Business/Open Source Balance Than AGPL

subtext: in my opinion, and for companies (and their users) that want a good balance between protecting their core business with Open Source ideals.

Following up on my thoughts on the case for funding Open Source, there is a second topic I want to discuss in more detail: Open Source and commercialization. As our founder likes to say: Open Source is not a business model. And indeed it really isn't. However, this does not mean that Open Source and Open Source licenses aren't a critical consideration for a technology company, or that there isn't a fascinating interconnection between the business model and the license text.

As some of you might know I'm a strong proponent of the concept now branded as “Fair Source” which we support at Sentry. Fair Source is defined by a family of springing licenses that give you the right to read and modify code, while also providing an exclusivity period for the original creator to protect their core business. After a designated time frame, the code transitions into Open Source via a process called DOSP: Delayed Open Source Publication. This is not an entirely new idea, and I have been writing about it a few times before [1] [2].

A recurring conversation I have in this context is the AGPL (Affero General Public License) as an alternative vehicle for balancing business goals and Open Source ideals. This topic has also resurfaced recently because of Elasticsearch's Open Source, Again post, where they announced that they will license Elasticsearch under the AGPL.

In my view, while AGPL is a true Open Source license, it is an inferior choice compared to the FSL (the Functional Source License, a Fair Source license) for many projects. Let me explain my reasoning.

The Single Vendor Model

When you take a project like Sentry, which started as an Open Source project and later turned into a VC funded company, its model revolves around a commercial entity being in charge. That model is often referred to as “single vendor.” This is also the case with companies like Clickhouse Inc. or Elastic and their respective projects.

Sentry today is no longer Open Source, it's Fair Source (FSL licensed). Elastic on the other hand is indeed unquestionably Open Source (AGPL among others). What both projects have in common is that they value brand (including trademarks), that they have strong opinions on how the project should be run, and that they use a CLA to give themselves the right to re-license the code under other terms.

In a "single vendor" setup, the company behind the project holds significant power (for ~150 years give or take).

The Illusion of Equality

When you look at the AGPL as a license it's easy to imagine that everybody is equal. Every contributor to a project agrees with the underlying license assumptions of the AGPL and acts accordingly. However, in practice, things are more complicated — especially when it comes to commercial usage. Many legal departments are wary of the AGPL and the broader GPL family of licenses. Some challenges are also inherent to the licenses such as not being able to publish *GPL code to the app store.

You can see this also with Elasticsearch. The code is not just AGPL licensed, you can also retrieve it under the ELv2 and SSPL licensing terms. Something that Elastic can do due to the CLAs in place.

Compare this to Linux, which is licensed under GPLv2 with a syscall exception. This very specific license was chosen by Linus Torvalds to ensure the project's continued success while keeping it truly open. In Linux' case, no single entity has more rights than anyone else. There is not even a realistic option to relicense to a newer version of the GPL.

The FSL explicitly recognizes the reality that the single vendor holds significant power but balances it by ensuring that this power diminishes over time. This idea can also be found in copyright law, where a creator's work eventually enters the public domain. A key difference with software though is that it continuously evolves, making it hard to pinpoint when it might eventually become public domain as thousands of people contribute to it.

The FSL is much more aggressive in that aspect. If we run Sentry into the ground and the business fails, within two years, anyone can pick up the pieces and revive it like a Phoenix from the ashes. This isn't just hypothetical. Bryan Cantrill recently mentioned Oxide's desire to fork CockroachDB once its BUSL change date kicks in. While that day hasn't come yet, it's a real possibility.

Dying Companies

Let's face it: companies fail. I have no intentions for Sentry to be one of them, but you never know. Companies also don't just die once; they can do so repeatedly. Xapian is an example I like to quote here. It started out as a GPL v2+ licensed search project called Muscat which was built at Cambridge. After several commercial acquisitions and transitions, the project eventually became closed source (which was possible because the creators held the copyright). Some of the original creators, together with the community, forked the last GPLv2 version into a project that eventually became known as Xapian.

What's the catch? The catch is that the only people who could license it more liberally than GPLv2 are long gone from the project. Xapian refers to its current license as "a historical accident". The license choice causes some challenges, specifically around how Xapian is embedded. There are three remaining entities that would need to agree to the relicensing. From my understanding, none of those entities commercially use Xapian's original code today, but they also have no interest in actually supporting a potential relicensing.

Unlike trademark law, which has a concept of abandonment, the copyright situation is stricter. It would take two lifetimes for Xapian to enter the public domain, and at that point it will probably be mostly of archival interest.

Equal Grounds Now or Later

If Xapian's original code had been FSL licensed, it would have become Apache 2.0 (or MIT with the alternative model) many times over. You don't need to hope that the original license holder still cares: by the time you get hold of the source code, you already have an irrevocable promise that it will eventually turn into Apache 2.0 (or MIT with the alternative license choice), which is about as no-strings-attached as it can get.

So in some ways a comparison is “AGPL now and forever” vs “FSL now, Apache 2.0/MIT in two years”.

That's not to say that AGPL (or SSPL) don't have their merits. Xapian, as much as it might suffer from its accidental license choice, is also a successful Open Source project that has helped a lot of developers out there. Maybe the license did in fact work out well for them, and because everybody is in the same boat, it has also created a community of equals.

I do believe however it's important to recognize that “single-vendor AGPL with a CLA” is absolutely not the same as “community driven AGPL project without the CLA”.

The title claims that FSL balances Open Source better than AGPL, and it's fair to question how a license that isn't Open Source can achieve that. The key lies in understanding that Fair Source is built on the concept of delayed Open Source. Yes, there's a waiting period, but it’s a relatively short one: just two years. Count to two and the code transitions to full, unshackled openness. And that transition to Open Source is a promise that can't be taken from you.

[1] Originally about the BUSL license which introduced the idea (Open Source, SaaS and Monetization)
[2] Later about our own DOSP based license, the FSL (FSL: A License For the Bazaar, Not the Cathedral).

What I tell people new to on-call

The first time I went on call as a software engineer, it was exciting—and ultimately traumatic. Since then, I've had on-call experiences at multiple other jobs and have grown to really appreciate it as part of the role. As I've progressed through my career, I've gotten to help establish on-call processes and run some related trainings.

Comic showing a variety of job titles and joke interpretations of them

Here is some of what I wish I'd known when I started my first on-call shift, and what I try to tell each engineer before theirs.

Heroism isn't your job, triage is

It's natural to feel a lot of pressure with on-call responsibilities. You have a production application that real people need to use! When that pager goes off, you want to go in and fix the problem yourself. That's the job, right?

But it's not. It's not your job to fix every issue by yourself. It is your job to see that issues get addressed. The difference can be subtle, but important.

When you get that page, your job is to assess what's going on. A few questions I like to ask are: What systems are affected? How badly are they impacted? Does this affect users?

With answers to those questions, you can figure out what a good course of action is. For simple things, you might just fix it yourself! If it's a big outage, you're putting on your incident commander hat and paging other engineers to help out. And if it's a false alarm, then you're putting in a fix for the noisy alert! (You're going to fix it, not just ignore that, right?)

Just remember not to be a hero. You don't need to fix it alone, you just need to figure out what's going on and get a plan.

Call for backup

Related to the previous one, you aren't going this alone. Your main job in holding the pager is to assess and make sure things get addressed. Sometimes you can do that alone, but often you can't!

Don't be afraid to call for backup. People want to be helpful to their teammates, and they want that support available to them, too. And it's better to wake me up a little too much than to let me sleep through times when I was truly needed. If people are getting woken up a lot, the issue isn't calling for backup, it's that you're having too many true emergencies.

It's best to figure out that you need backup early, like 10 minutes in, to limit the damage of the incident. The faster you figure out other people are needed, the faster you can get the situation under control.

Communicate a lot

In any incident, adrenaline runs and people are stressed out. The key to good incident response is communication in spite of the adrenaline. Communicating under pressure is a skill, and it's one you can learn.

Here are a few of the times and ways of communicating that I think are critical:

  • When you get on and respond to an alert, say that you're there and that you're assessing the situation
  • Once you've assessed it, post an update; if the assessment is taking a while, post updates every 15 minutes while you do so (and call for backup)
  • After the situation is being handled, update key stakeholders at least every 30 minutes for the first few hours, and then after that slow down to hourly

You are also going to have to communicate within the response team! There might be a dedicated incident channel or one for each incident. Either way, try to over communicate about what you're working on and what you've learned.

Keep detailed notes, with timestamps

When you're debugging weird production stuff at 3am, that's the time you really need to externalize your memory and thought processes into a notes document. This helps you keep track of what you're doing, so you know which experiments you've run and which things you've ruled out as possibilities or determined as contributing factors. It also helps when someone else comes up to speed! That person will be able to use your notes to figure out what has happened, instead of you having to repeat it every time someone gets on. Plus, the notes doc won't forget things, but you will.

You will also need these notes later to do a post-mortem. What was tried, what was found, and how it was fixed are all crucial for the discussion. Timestamps are critical also for understanding the timeline of the incident and the response!

This document should be in a shared place, since people will use it when they join the response. It doesn't need to be shared outside of the engineering organization, though, and likely should not be. It may contain details that lead to more questions than they answer; sometimes, normal engineering things can seem really scary to external stakeholders!

You will learn a lot!

When you're on call, you get to see things break in weird and unexpected ways. And you get to see how other people handle those things! Both of these are great ways to learn a lot.

You'll also just get exposure to things you're not used to seeing. Some of this will be areas that you don't usually work in, like ops if you're a developer, or application code if you're on the ops side. Some more of it will be business side things for the impact of incidents. And some will be about the psychology of humans, as you see the logs of a user clicking a button fifteen hundred times (get that person an esports sponsorship, geez).

My time on call has led to a lot of my professional growth as a software engineer. It has dramatically changed how I worked on systems. I don't want to wake up at 3am to fix my bad code, and I don't want it to wake you up, either.

Having to respond to pages and fix things will teach you all the ways they can break, so you'll write more resilient software that doesn't break. And it will teach you a lot about your engineering team, good or bad: how it's structured and who responds to which things.

Learn by shadowing

No one is born skilled at handling production alerts. You gain these skills by doing, so get out there and do it—but first, watch someone else do it.

No matter how much experience you have writing code (or responding to incidents), you'll learn a lot by watching a skilled coworker handle incoming issues. Before you're the primary for an on-call shift, you should shadow someone for theirs. This will let you see how they handle things and what the general vibe is.

This isn't easy to do! It means that they'll have to make sure to loop you in even when blood is pumping, so you may have to remind them periodically. You'll probably miss out on some things, but you'll see a lot, too.

Some things can (and should) wait for Monday morning

When we get paged, it usually feels like a crisis. If not to us, it sure does to the person who's clicking that button in frustration, generating a ton of errors, and somehow causing my pager to go off. But not all alerts are created equal.

If you assess something and figure out that it's only affecting one or two customers in something that's not time sensitive, and it's currently 4am on a Saturday? Let people know your assessment (and how to reach you if you're wrong, which you could be) and go back to bed. Real critical incidents have to be fixed right away, but some things really need to wait.

You want to let them go until later for two reasons. First is just the quality of the fix. You're going to fix things more completely if you're rested when you're doing so! Second, and more important, is your health. It's wrong to sacrifice your health (by being up at 4am fixing things) for something non-critical.

Don't sacrifice your health

Many of us have had bad on-call experiences. I sure have. One regret is that I didn't quit that on-call experience sooner.

I don't even necessarily mean quitting the job, but pushing back on it. If I'd stood up for myself and said "hey, we have five engineers, it should be more than just me on call," and held firm, maybe I'd have gotten that! Or maybe I'd have gotten a new job. What I wouldn't have gotten is the knowledge that you can develop a rash from being too stressed.

If you're in a bad on-call situation, please try to get out of it! And if you can't get out of it, try to be kind to yourself and protect yourself however you can (you deserve better).

Be methodical and reproduce before you fix

Along with taking great notes, you should make sure that you test hypotheses. What could be causing this issue? And before that, what even is the problem? And how do we make it happen?

Write down your answers to these! Then go ahead and try to reproduce the issue. After reproducing it, you can try to go through your hypotheses and test them out to see what's actually contributing to the issue.

This way, you can bisect problem spaces instead of just eliminating one thing at a time. And since you know how to reproduce the issue now, you can be confident that you do have a fix at the end of it all!

Have fun

Above all, the thing I want people new to on-call to do? Just have fun. I know this might sound odd, because being on call is a big job responsibility! But I really do think it can be fun.

There's a certain kind of joy in going through the on-call response together. And there's a fun exhilaration to it all. And the joy of fixing things and really being the competent engineer who handled it with grace under pressure.

Try to make some jokes (at an appropriate moment!) and remember that whatever happens, it's going to be okay.

Probably.

Solving my RSI issues

For the past decade I've struggled with wrist injuries on and off. It's a common occupational injury for us keyboard users, and after my most recent bout this year I decided it was time to actually do something about it. Permanently. I like systemic solutions - they tickle a good part of my brain. And I figured I could apply some of that to my keyboard habits too.

In this post I want to share a little about my journey to actually fix my wrist issues. Not to be prescriptive for others, but to be descriptive about what I'm doing. Perhaps it'll be useful for others, but mainly it's fun for me to talk about. I've been making little ASCII cheat sheets for myself, and basically I just wanted to have an excuse to post them publicly.

Breaking up writing sessions

The way I've dealt with my RSI issues for the past eight years is by taking repeated breaks. I've done this using the Time Out app on macOS, and Workrave on Windows and Linux. Both applications bring up a modal pop-up on a set schedule. I have it set to create a short 30-second break every 10 minutes or so, and a 10-minute break every 50 minutes.

While this does not make typing itself any less strenuous, it does break up some of the repetition - which ends up helping a lot. Taking breaks can be a little frustrating though, as it can break up flow. But it's better to have a lot of small breaks than longer-lasting injuries. Shout out to Jacob Groundwater for suggesting this back in 2016 - it's helped me a lot over the years.

Mobility

Another thing I've been doing is mobility work, specifically for my wrists. I worked with a physio from 2018 to 2020, and I have a stretch routine that helps with flexibility and mobility. I have to admit I've not been keeping that up, but as symptoms flare back up, it's something I know I can apply when needed.

Where taking breaks helps break up the intensity of the strain, physical exercises help both with symptom relief and building resiliency. It's not perfect, but it's yet another thing I've found that helps me. I'm a little hesitant to provide instructions since I'm not a physio myself though, and it's definitely possible to hurt yourself if you don't do it right.

Choosing a keyboard layout

I started taking regular breaks again in June this year, and it hasn't helped me as much as it did before. I assume it has something to do with being older now, and recovery just taking longer than it used to. Though I learned how to touch-type in my early teens, I know that the QWERTY layout isn't particularly good for you. It requires a lot of extra finger movement, which leads to a lot of extra strain.

Because I hope to continue to use computers in some capacity for the next fifty odd years, I figured re-learning something I only learned twenty years ago would be well-worth it in the long run. So time to bite the bullet, and learn a new layout. But which one?

I'm lucky that many of my friends are incredibly geeky and knowledgeable, and I figured one of them would probably have a good suggestion for what to learn. A quick check online suggested that Colemak Mod-DH would probably be a good choice. So I checked with my friend Sy Brand, who thought it was a good layout - but they suggested something better: Canary.

Canary is a fun little layout based on Colemak Mod-DH that optimizes for something called "rolls". In keyboard terminology a "roll" is a sequence of keys you can type with one hand. A layout with "high rolls" is one where many common sequences can be typed that way. Here is what the layout looks like on a regular keyboard:

w l y p k z x o u ; [ ] \
 c r s t b f n e i a '
  j v d g q m h / , .

Rolls in QWERTY are fairly uncommon. But with Canary a lot of words incorporate rolls. This might be easier to show rather than explain, so here are some words that incorporate rolls on a QWERTY layout: asimo, werk, wookie. Do you notice how typing those feels kind of flowy? Now imagine nearly all words feel like that. Creating that feeling without causing strain is what the Canary layout was specifically designed for.

Learning a new keyboard layout

Once I'd decided to switch to a new keyboard layout, there was a big "draw the rest of the owl" step to actually being able to use it. When I was a kid my mom paid for me to take after-school classes for a number of weeks to learn how to touch-type. That was 20 years ago though, and I somehow doubt there are in-person classes teaching the Canary layout. So what to do?

It turns out there are lots of great options now. I've been using keybr.com which is free to use, supports various layouts in-browser, and provides different guided trainings to learn how to touch-type with new layouts. I've been using it daily for the past six weeks, and honestly it does everything I could hope for.

To complement keybr I've also started using ngram type which allows you to train "N-grams". These are sequences of characters which are often grouped together. Those high rolls I was talking about earlier? N-grams are the sequences they refer to.

Finally I've also occasionally been using monkeytype which is more focused on leaderboards and good looks. A lot of keyboard influencers (they exist) use this to show off how well they can type or how good their keyboard sounds. I haven't found it particularly useful to learn new layouts with, but it's been fun to play with!

On QWERTY I can comfortably type around 90 words per minute (wpm). When learning Canary it took me a few days to get to 10 wpm (unusable), two weeks to get to 20wpm (better, but still unusable) and about a month to get to 40 wpm (usable but slow). I spent about 10 total hours in keybr, averaging about 30 mins of training a day. I have no sense for whether that's fast or slow, but I was positively surprised that I could rebuild 20 years of muscle memory in a matter of weeks, with a fairly modest time investment.

Using vim movements on a new layout

One thing I knew would be an issue, and which did indeed turn out to be an issue, was editing code on a new layout. I use vim bindings in all of my editors, and vim motions are just part of me now. I asked Sy how they dealt with this when they switched to a new layout, and, well, they didn't. Good on them for not using vim bindings; sucks to be me.

This turned out to be a bit of a stumbling block: while I don't exclusively write code, I do still write a lot of it. And that means that I need to be able to move around an editor. I mean, I know there are other vim movements and I use those too. But heck, not being able to use the base vim motions to navigate? Yeah, I wasn't going to give that up.

I looked into changing key bindings in VS Code, and tried figuring out what others had done - and there didn't seem to be any good solutions. Except one: Miryoku.

Keyboard layers

One of the issues with QWERTY is that you move around your fingers a lot, which causes strain over time. Miryoku is one attempt to define a layout which minimizes the amount of movement your fingers need to make. It's based on two principles:

  • Each finger should only ever move at most one key away from its home position
  • The strongest fingers (thumbs, index fingers) should do the most work

The way it achieves this is by defining an entire layout which can be accessed from the base position. The entire system is layer-based (modal), with each layer being accessible by pressing down a specific key with your thumb. Numbers, symbols, and navigation each get their own layer. And by putting the arrow keys on the position of the vim bindings, it makes it possible to restore vim-like navigation just by pressing down a single key. Here is what the neutral (base) layer looks like:

w l y p b     z f o u '      ¹esc   ⁴ret
c r s t g     m n e i a      ²spc   ⁵bsp
q j v d k     x h / , .      ³tab   ⁶del
    ¹ ² ³     ⁴ ⁵ ⁶

But once we press and hold the space key (key 2) with our thumb, we engage the navigation layer and the arrow keys become available to us. It's not quite the same position as with vim - but you can configure it that way if you like too. Here's what it looks like by default:

- - - - -     - - - - -
- - - - -     - ← ↓ ↑ →
- - - - -     - - - - -
    ◼︎ - -     - - -    

I thought this was neat and probably the right solution, but I wasn't sure how to engage the thumb keys on a regular keyboard. The space bar is in the way, and there is no thumb cluster for us to press. One solution is to shift your fingers one row up, and remap the keyboard to use the XQCBNM keys as your thumb keys. miryoku_kmonad is a software package for macOS and Linux which allows you to do exactly that. But I tried and failed to get that to work reliably (or rather: I lacked the patience to finish debugging it), so I went to find a different solution.

Hardware

So far my plan was to keep using my existing hardware. I like my 60% Vortex Cypher keyboard, and I figured I'd start by switching keyboard layouts and see how far I could get. I tried that and got stuck after a while - so I figured I might as well commit to the bit and invest in hardware.

I ended up picking up a pre-soldered Chocify keyboard from Beekeeb. This is a 36-key split keyboard which comes with Miryoku pre-installed. I got the Bluetooth-based version with the screen, which cost a little extra. I did make a mistake by choosing the "Choc Blue" switches, which are different from Cherry MX blue switches, so I've got a new set of tactile switches coming in soon (Kailh Sunset).

Custom keyboards are pretty involved, but well worth it. Some things I didn't know before I got started but had to learn on the way:

  • international shipping of batteries is difficult - if you need bespoke batteries for a hardware project in Europe, Amazon is actually one of the better options.
  • the Chocify board uses the nice!nano micro-controller. The way you flash this is by double-clicking a button that turns it into USB drive mode, and then you can just drag and drop new firmware on it from your computer
  • you can solder cables to connect them, though you'll want to heat-shrink them afterwards. I didn't actually end up doing this myself, but got help from my local coffee shop with it, hah (if you're reading this: thanks Valdi!)

Picking up hardware was something I didn't want to do unless strictly necessary - but I think it was the right choice. I've recently also started experiencing shoulder pain after long typing sessions, and I've noticed that having a split ergo keyboard actually helps a lot with that too.

I'm fairly confident this little 36-key keyboard I have is going to become my new daily driver once I get the hang of it. And while that isn't the case yet, that moment is creeping up on me soon.

Learning a new keyboard layout (again)

Oh before I close out, it turns out that the Canary keyboard layout comes in two variants: staggered and ortho. I didn't realize the two were different, so I just pressed "canary" on keybr and started learning. It turns out that was the staggered layout:

w l y p k z x o u ; [ ] \
 c r s t b f n e i a '
  j v d g q m h / , .

On a 30% keyboard that layout ends up looking like this:

w l y p k     z x o u ; 
c r s t b     f n e i a
j v d g q     m h / , .

The layout the Canary authors designed for keyboards like this is the "ortho" layout. It's very similar, but not the same:

w l y p b     z f o u '
c r s t g     m n e i a
q j v d k     x h / , .

Oops, I should have paid more attention before I got started. Though in all fairness: I did start learning Canary under the assumption I'd stick with my 60% staggered keyboard. So it makes sense I didn't immediately jump to the ortho layout. Though I probably should have thought to start learning the ortho layout as soon as I put in the hardware order - which was a few weeks ago now.

Conclusion

Now that I have my keyboard, with the firmware flashed and batteries installed, this is where I currently am in my journey: re-learning to type on the Canary Ortho layout. I think this is fine, because when switching to the Miryoku layout I need to re-learn where all the symbols are, so it's not like I can hit the ground running on my new keyboard anyway.

I'm still waiting on my new key switches to arrive which should make typing on my new keyboard a lot easier (45g rather than 20g actuation force - MX blues have 50g actuation). But I can already tell that Canary + Miryoku + Chocify causes a lot less strain on my hands than QWERTY does.

I'm not going to say that people should follow what I'm doing. I don't know your situation, and maybe just getting an ergo QWERTY keyboard or taking more breaks might be enough for your needs. But for me? I've really enjoyed going down the rabbit hole of learning a new keyboard layout, getting a new keyboard, and picking up various things along the way.

So has all of this solved my RSI issues? No - I'm not yet daily driving my new keyboard, so I'm not yet experiencing the benefits. But I can tell the difference in strain I experience when I switch to the split keyboard and new keyboard layout. Now that nearly everything is in place to switch over, I can't wait to start daily driving it. I'm definitely taking the most involved path to solving this, but I think that this might actually be the answer to fully fix the source of my injuries.

This post was a little break from my usual writing about type systems and language design. If you'd like to learn more about ergonomic keyboard layouts, I really enjoyed this talk by Mattia Dal Ben: 34 keys is all you need: an ergonomic keyboard journey.

a split keyboard on a table with an ipad. There is a bunch of colorful furniture in the background

Here's what my keyboard practice looked like this morning. I think it's cute.

SDF antialiasing

Last time I was looking at letter spacing with my renderer to see how it compared to Google Chrome on Mac. But while doing that I noticed that their antialiasing looked nicer than mine. So I tweaked parameters, including antialias edge width, gamma, and threshold bias.

My renderer (first) vs Google's (second)

I got results that were close to Google Chrome on Mac, but beyond that it became unclear which variant was actually better. I kept tweaking parameters:

Many attempts to tweak parameters

Sometimes it came down to looking really closely at the pixels. But which is better? I don't know.

My renderer (first) vs Google's (second)

I realized after a while that this is a rabbit hole. I had to force myself out of this endless tweaking cycle. Besides, Chrome on Mac is different from other browser + windowing systems, so I shouldn't spend all my time trying to match it.

But two months later, I revisited antialiasing, because I needed to better understand it to implement halos and drop shadows. This happens a lot with my experiments. I'll discover something, then I'll tweak a lot, then I'll put it away for a while, and later I can come back to it and try to understand what I did.

To implement antialiasing let's look at the mapping from distance to color.

Distance maps to either black or white

Here's the output. Pixels are either black or white:

The letter e rendered with no antialiasing

For antialiasing we want to smoothly transition between colors, so that it looks like this:

The letter e rendered with antialiasing

But how much? If we do too much, it looks blurry:

The letter e rendered with more antialiasing

We want it to scale based on the output pixels, not the input distance field. And that means we need to know something about the number of pixels on screen.

On the msdfgen page there's a shader that includes antialiasing. It measures screenPxRange, representing how many pixels in the output correspond to one "distance unit" in the signed distance field input.

If we want antialiasing to occur over edge_blur_px pixels in the output, we can compute edge_blur_px ÷ screenPxRange to find out what signed distance range this represents. For example, if we want to antialias over 2 pixels, and screenPxRange is 8 px per distance unit, then 2 px ÷ 8 px/distance unit = ¼ distance units. The msdfgen code will antialias between 0 and +¼. Another option would be to antialias between −⅛ and +⅛.

This is what the blending looks like:

Color blends between black and white

The slope of that line is screenPxRange / edge_blur_px.
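To make that mapping concrete, here's a minimal sketch in JavaScript rather than shader code (the names are mine); it antialiases symmetrically around the zero-distance contour, the second option above:

// Map a signed distance (in SDF distance units) to an alpha value in [0, 1].
// screenPxRange: output pixels per distance unit; edgeBlurPx: blend width in output pixels.
function sdfAlpha(signedDistance, screenPxRange, edgeBlurPx) {
  const blendRange = edgeBlurPx / screenPxRange; // blend width in distance units
  const t = signedDistance / blendRange + 0.5;   // 0 at -blendRange/2, 1 at +blendRange/2
  return Math.min(1, Math.max(0, t));            // clamp; slope is screenPxRange / edgeBlurPx
}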

I wanted to get a better sense of what edge_blur_px should be. Over how many pixels should I apply antialiasing? I had previously tweaked it, then made the antialiasing width an interactive parameter to tweak faster. When I revisited it a few weeks later, I realized it'd be better to see all the outputs at once instead of using interactivity to see one at a time. For more on a single interactive visualization vs. many non-interactive ones, see Bret Victor's "Up and Down the Ladder of Abstraction".

Testing a range of antialiasing widths

I had set edge_blur_px to 1.0 in my previous tweaking, and this confirms that 1.0 is a reasonable choice. Lower values from 0.5 down look a little blocky, and higher values from 2.0 up look a little blurry. I decided to zoom into that range:

Testing a range of antialiasing widths

These may all look the same at first, but look closely with a magnifying glass and you'll see differences. From eyeballing this, I think maybe 1.2 or 1.3 might be the best choice, at least for large text. I don’t have any explanation for this. Could it be 1.25? Could it be sqrt(1.5)? If I looked at other sizes and characters, would I conclude it has to be higher, like sqrt(2) or 1.5? Is there signal processing math to prove the best value? Would it be different with gamma correction? Should I use smoothstep instead of linearstep? I don’t know. I'll set it to 1.2 for now.

I'm quite happy with what I have, but not happy with how long it took. Two months went by between the first and last image on this blog post. I had many fixes and tweaks in those two months. I'll describe those in the next few posts.

At the Speed of Relevance

Process. Or, "A notional illustration of interfaces among three major decision support systems used to develop, produce, and field a weapon system for national defense."

There’s just been a production outage, a new project has been announced, a security vulnerability has been discovered, a team is nearing the end of a retrospective, a post incident review is being run, a meeting is wrapping up. There’s a call around the room for action items to ensure either: 100% certainty of the good thing succeeding, or 100% certainty that the bad thing will never happen again.

Most software systems are complex sociotechnical systems. Changing the ‘socio’ can often be seen as a more acceptable option than changing the technical.

Instead of fixing the memory leak, system operators will be trained to detect and remediate the leak. Instead of investing in automated tests and safe deployment methods, managers will be required to approve all changes before they go to production.

Fixing the underlying technical issue gives engineers the opportunity to simplify the system. Changing (or more likely attempting to change) the way people operate the system introduces more complexity on top of not fixing the underlying causes.

Not enough Process. Need more Process.

The first image is published by the Defense Acquisition University (DAU). DAU is “a corporate university of the United States Department of Defense offering acquisition, technology, and logistics (AT&L) training to military and Federal civilian staff and Federal contractors” with an annual budget of $220 million USD.

In browsing the DAU website I came across Life Cycle Logistics—Key Tenets of Back-to-Basics and, in it, a new-to-me phrase: "… [we shall do our work] at the speed of relevance".

Rebuilding this site

I first made this site back in 2016, and since then it hasn’t had any substantial updates to its design or overall structure. Here we are now in 2024, and I find myself to be a developer with a blog that doesn’t have a dark theme.

Interpreters of Chaos

You can stare endlessly at three things: fire burning, water flowing, and political news replacing one another in your favorite Telegram channels. The last one is especially captivating, because, unlike the divine inscrutability of the patterns in a flame, the informational chaos can, of course, be figured out. I, for example, regularly receive anonymous letters from the most active of those who manage it brilliantly. In their letters they analyze the geopolitical agenda in detail and convincingly, identify the guilty parties (usually the Jews) and, for security reasons, demand that the letter be destroyed immediately after reading. The less agitated ones don't write letters, but if asked, they will tell you without a shadow of doubt what is going on and who is to blame.

ChatGPT 4o

This obvious and depressing schizophrenization of the population has obvious causes. There are at least three.

First, the proletarian of mental labor (the main consumer of political news) has long since stopped suffering from a lack of food or sleep. He is fed reasonably well, given mortgages, and not made to work on weekends. He has a lot of free time and nothing to fill it with.

Second, the complete absence of censorship on the central platforms for exchanging information. Neither YouTube, nor Telegram, nor VK will block a video about a new conspiracy in the government or the imminent collapse of the banking system. They will, however, block a video for, say, cruelty to animals. Cruelty to your intellect gets nothing blocked.

And finally, the third and most important reason: the absence of balanced discussion on official state channels. The clumsy Solovyov-style propaganda, which by now nauseates even the most jingoistic couch experts, plays the role of censor, trying to shield the inquisitive mind of our interpreter with its own body from the stream of vulgar and intriguing details.

So we have a reasonably intelligent and fairly well-educated bit of office plankton. A dreary, undemanding job doesn't tax his brain. He has plenty of free time left over, and he has no desire to spend it pursuing personal ambitions. He lost hope in a social lift long ago (or never had it at all). And all the while he passes hundreds of fakes and dozens of captivating interpretations through himself every day, while from the intimidated journalists of the official media he hears nothing but dull clichés.

So what does our hero do? He tries to form in his head at least some plausible version of what is happening by watching the chaotic, rapidly changing picture on the screen. Even though the volume of information available to him for analysis is vanishingly small compared with what would be needed to form a genuinely plausible and objective picture, he still tries to formulate the principles by which the universe operates. With a nudge from RenTV and «Истории Пи», he tries to assemble a complicated puzzle, linking, say, the causes of the internecine wars of fourteenth-century Rus with the turns of phrase in Kamala Harris's election debates. And it works!

Sooner or later, lonely and helpless under the onslaught of info-garbage, he turns into a devoted adherent of one of the conspiracy theories about what is happening. Gluing the facts known to him into a mosaic, he achieves a certain logic in the connections between them. To keep the mosaic from falling apart, he spends all of his intellectual and emotional capacity searching for confirmation of his conclusions.

He fanatically defends what he has built with such effort, namely his personal picture of the world. In it, as in a fairy tale, there are good and evil wizards, beautiful princesses and hobbits, its own rules and laws. Defending this fairy-tale world becomes his life's work, and every new bit of injected information is a potential threat, because it could destroy the fairy tale, suddenly turning the beautiful princess into, say, a corrupt prosecutor. So it is either declared a fake, if it contradicts the mosaic, or turned into a repost. Our hero moves through the information space as if through a minefield, unable to tear himself away for even a minute.

And the end result? Mental disorders of varying severity. A 2021 survey of students in the US showed that 22% of them were seriously contemplating suicide, whereas ten years earlier the figure was only 16%. In Russia the situation is similar. And it will probably only get worse.

And did you know that, meanwhile, any political content is banned on video streaming platforms in China? Maybe that is the right decision, if we want to save our citizens' brains from schizophrenization?

And maybe a working person from Samara doesn't need to know every day what is happening in Gorlovka or Tehran? Maybe, to live a useful and productive life, a 15-minute «Время» news program before bed is enough for him? And maybe, by analogy with the ban on alcohol for teenagers, it makes sense to ban political news for most of the country's residents, since it only harms them, just as vodka harms children?

Then, perhaps, there would be fewer "well-informed" interpreters of political chaos, and more healthy people.

You Can't Build Interactive Web Apps Except as Single Page Applications... And Other Myths

#An Ode to Browser Advancements.

I often encounter discussions on Reddit and YCombinator where newer developers seek tech stack advice. Inevitably, someone claims it’s impossible to build a high-quality application without using a single-page application (SPA) framework like React or AngularJS. This strikes me as odd because, even before the SPA revolution, many popular multi-page web applications offered excellent user experiences.

Two years ago, I set out to build an observability platform and chose to experiment with a multi-page application (MPA) approach using HTMX. I wondered: Would a server-rendered MPA be inadequate for a data-heavy application, considering that most observability platforms are built on ReactJS?

What I discovered is that you can create outstanding server-rendered applications if you pay attention to certain details.

Here are some common MPA myths and what I’ve learned about them.

#Myth 1: MPA Page Transitions are slow because JavaScript and CSS are downloaded on every page navigation

The perception that MPA page transitions are slow is widespread—and not entirely unfounded—since this is the default behavior of browsers. However, browsers have made significant improvements over the past decade to mitigate this issue.

To illustrate, in the video below, a full page reload with the cache disabled takes 2.90 seconds until the DOMContentLoaded event fires. I recorded this at a café with poor Wi-Fi, but let’s use this as a reference point. Keep that number in mind.

It is common to reduce load times in MPAs using libraries such as PJAX, Turbolinks, and even HTMX Boost. These libraries hijack the page reload using Javascript and swap out only the HTML body element between transitions. That way, most of the page’s head section assets don’t need to be reloaded or re-downloaded.
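For example, HTMX Boost turns this behavior on with a single attribute; a minimal sketch (the links are illustrative):

<body hx-boost="true">
  <!-- Boosted links fetch the next page via AJAX and swap the <body>,
       so the current <head> (and its already-loaded assets) stays put. -->
  <a href="/reports">Reports</a>
  <a href="/settings">Settings</a>
</body>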

But there’s a lesser-known way of reducing how many assets are re-downloaded or re-evaluated during page transitions.

#Client-side Caching via Service workers

Frontend developers who have built Progressive Web Applications (PWA) with SPA frameworks might know about service workers.

For those of us who are not frontend or PWA developers, service workers are a built-in feature of browsers. They let you write Javascript code that sits between your users and the network, intercepting requests and deciding how the browser handles them.

service-worker-chart.png

Due to their association with the PWA trend, service workers are mostly familiar only to SPA developers, but this technology can just as well be used for regular Multi-Page Applications.

In the video demonstration, we enable a service worker to cache and refresh the current page. You’ll notice that there’s no flicker when clicking the link to reload the page, resulting in a smoother user experience.

Moreover, instead of transmitting over 2 MB of static assets as before, the browser now only fetches 84 KB of HTML content—the actual page data. This optimization reduces the DOMContentLoaded event time from 2.9 seconds to under 500 milliseconds. Impressively, this improvement is achieved without using HTMX Boost, PJAX, or Turbolinks.

#How to Implement Service workers in Your Multi-Page Application

You might be wondering how to replicate these performance gains in your own MPA. Here’s a simple guide:

  1. Create a sw.js File: This is your service worker script that will manage caching and network requests.
  2. List Files to Cache: Within the service worker, specify all the assets (HTML, CSS, JavaScript, images) that should be cached.
  3. Define Caching Strategies: Indicate how each type of asset should be cached—for example, whether they should be cached permanently or refreshed periodically.

By implementing a service worker, you effectively tell the browser how to handle network requests and caching, leading to faster load times and a more seamless user experience.
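Here's a minimal, hand-written sw.js sketch of those three steps (the cache name, file list, and cache-first strategy are illustrative):

const CACHE_NAME = 'static-v1';
const PRECACHE_URLS = ['/styles.css', '/app.js', '/logo.svg'];

// Cache the listed assets when the service worker is installed.
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE_URLS))
  );
});

// Serve cached responses when available, falling back to the network.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => cached || fetch(event.request))
  );
});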

#Use Workbox to generate service workers

While it’s possible to write service workers by hand—and there are excellent resources like this MDN article to help you—I prefer using Google’s Workbox library to automate the process.

#Steps to Use Workbox:

  1. Install Workbox: Install Workbox via npm or your preferred package manager:

    npm install workbox-cli --global
    
  2. Generate a Workbox Configuration file: Run the following command to create a configuration file:

    workbox wizard
    
  3. Configure Asset Handling: In the generated workbox-config.js file, define how different assets should be cached. Use the urlPattern property—a regular expression—to match specific HTTP requests. For each matching request, specify a caching strategy, such as CacheFirst or NetworkFirst (an example configuration is sketched after these steps).

    workbox-cfg.png

  4. Build the Service Worker: Run the Workbox build command to generate the sw.js file based on your configuration:

    workbox generateSW workbox-config.js
    
  5. Register the Service Worker in Your Application: Add the following script to your HTML pages to register the service worker:

    <script>
      if ('serviceWorker' in navigator) {
        window.addEventListener('load', function() {
          navigator.serviceWorker.register('/sw.js').then(function(registration) {
            console.log('ServiceWorker registration successful with scope: ', registration.scope);
          }, function(err) {
            console.log('ServiceWorker registration failed: ', err);
          });
        });
      }
    </script>
    

By following these steps, you instruct the browser to serve cached assets whenever possible, drastically reducing load times and improving the overall performance of your multi-page application.
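For reference, here's a hedged sketch of what the workbox-config.js from step 3 might look like; the paths and patterns are illustrative and should be adapted to your build output:

module.exports = {
  globDirectory: 'public/',
  globPatterns: ['**/*.{html,css,js,svg,woff2}'],
  swDest: 'public/sw.js',
  runtimeCaching: [
    {
      // Static assets rarely change: serve them from the cache first.
      urlPattern: /\.(?:css|js|woff2|svg|png)$/,
      handler: 'CacheFirst',
    },
    {
      // HTML pages: prefer fresh content, fall back to the cache when offline.
      urlPattern: /\/$|\.html$/,
      handler: 'NetworkFirst',
    },
  ],
};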

Image showing the registered service worker from the Chrome browser console.

#Speculation Rules API: Prerender pages for instant page navigation.

If you have used htmx-preload or instantpage.js, you’re familiar with prerendering and the problem the “Speculation Rules API” aims to solve. The Speculation Rules API is designed to improve performance for future navigations. It has an expressive syntax for specifying which links should be prefetched or prerendered on the current page.

Speculation rules configuration example

The script above is an example of how speculation rules are configured. It is a JSON object embedded in a script tag, and without going into detail, you can see that it uses keywords such as “where,” “and,” “not,” etc. to describe which links should either be prefetched or prerendered.
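To give a concrete, simplified example of the syntax, rules like the following ask the browser to prerender same-origin links except the logout page (treat the specifics as illustrative):

<script type="speculationrules">
{
  "prerender": [{
    "where": {
      "and": [
        { "href_matches": "/*" },
        { "not": { "href_matches": "/logout" } }
      ]
    },
    "eagerness": "moderate"
  }]
}
</script>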

Example impact of prerendering (Chrome Team)

#Myth 2: MPAs can’t operate offline and save updates to retry when there’s network

From the last sections, you know that service workers can cache everything and make our apps operate entirely offline. But what if we want to save offline POST requests and retry them when there is internet?

workbox-offline-cfg.png

The configuration file above shows how to configure Workbox to support two common offline scenarios. First, you see Background Sync, where we ask the service worker to queue any requests that fail due to a lack of connectivity and retry them for up to 24 hours.

Below that, we define an offline catch handler, triggered when a request fails while offline. We can return a template partial with HTML or a JSON response, or dynamically build a response based on the request input. The sky is the limit here.
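As a rough sketch of both pieces, assuming a module-style sw.js built with Workbox (the queue name, route, and /offline.html fallback are all illustrative):

import { registerRoute, setCatchHandler } from 'workbox-routing';
import { NetworkOnly } from 'workbox-strategies';
import { BackgroundSyncPlugin } from 'workbox-background-sync';

// Queue failed POSTs and retry them for up to 24 hours once connectivity returns.
const bgSync = new BackgroundSyncPlugin('offline-posts', {
  maxRetentionTime: 24 * 60, // minutes
});

registerRoute(
  ({ url, request }) => request.method === 'POST' && url.pathname.startsWith('/api/'),
  new NetworkOnly({ plugins: [bgSync] }),
  'POST'
);

// Offline catch handler: return a cached fallback page for navigations that fail.
setCatchHandler(async ({ request }) => {
  if (request.destination === 'document') {
    const cached = await caches.match('/offline.html');
    if (cached) return cached;
  }
  return Response.error();
});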

#Myth 3: MPAs always flash white during page Transitions

In the service worker videos, we already saw that this will not happen if we configure caching and prerendering. That said, this myth was generally true until 2019. Since 2019, most browsers withhold painting the next screen until the required assets for the next page are available or a timeout is reached, resulting in no flash of white while transitioning between pages. This only works when navigating within the same origin/domain.

Paint holding documentation on chrome.com.

#Myth 4: Fancy Cross-document page transitions are not possible with MPAs.

The advent of single-page application frameworks made custom transitions between pages more popular. The allure of different navigation styles comes from completely taking control of page navigation from the browsers. In practice, such transitions have mostly been popular within the demos at web dev conference talks.

Cross Document Transitions documentation on chrome.com.

This remains a common argument for single-page applications, especially on Reddit and Hacker News comment sections. However, browsers have been working towards solving this problem natively for the last couple of years. Chrome 126 rolled out cross-document view transitions. This means we can build our MPAs to include those fancy animations and transitions between pages using CSS only or CSS and Javascript.

My favorite bit is that we might be able to create lovely cross-document transitions with CSS only:

cross-doc-transitions-css.png
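Here's a hedged sketch of what that CSS can look like, based on my reading of the Chrome announcement (both the outgoing and the incoming page need to opt in):

/* Opt this page into cross-document view transitions (same-origin navigations). */
@view-transition {
  navigation: auto;
}

/* Optionally tweak the default cross-fade between pages. */
::view-transition-old(root),
::view-transition-new(root) {
  animation-duration: 0.3s;
}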

You can quickly learn more on the Google Chrome announcement page.

This link hosts a multi-page application demo, where you can play around with a rudimentary server-rendered application using the cross-document view transitions API to simulate a stack-based animation.

#Myth 5: With htmx or MPAs, every user action must happen on the server.

I’ve heard this a lot when HTMX is discussed. So, there might be some confusion caused by the HTMX positioning. But you don’t have to do everything server-side. Many HTMX and regular MPA users continue to use Javascript, Alpine, or Hyperscript where appropriate.

In situations where robust interactivity is helpful, you can lean into the component islands architecture using WebComponents or any javascript framework (react, angular, etc) of your choice. That way, instead of your entire application being an SPA, you can leverage those frameworks specifically for the bits of your application that need that interactivity.

The example above shows a very interactive search component in the APItoolkit. It’s a web component implemented with lit-element, a zero-compile-step library for writing web components. So, the entire web component even fits in a Javascript file.
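To give a flavor of the island approach, here's a hedged, minimal sketch using lit (the tag name and markup are mine, not the actual APItoolkit component):

import { LitElement, html } from 'lit'; // or load lit from a CDN bundle for a no-build setup

class LogSearch extends LitElement {
  static properties = {
    query: { type: String },
  };

  constructor() {
    super();
    this.query = '';
  }

  render() {
    // Only this island is framework-rendered; the rest of the page stays server-rendered HTML.
    return html`
      <input placeholder="Search logs…" @input=${(e) => (this.query = e.target.value)} />
      <p>Filtering for: ${this.query || '(nothing yet)'}</p>
    `;
  }
}

customElements.define('log-search', LogSearch);

Dropping <log-search></log-search> into any server-rendered page is then enough to activate just that widget; the rest of the page stays plain HTML.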

#Myth 6: Operating directly on the DOM is slow. Therefore, it would be best to use React/Virtual DOM.

The perceived slowness of direct DOM operations was a major motivation for building ReactJS on virtual DOM technology and popularizing it. While virtual DOM operations can be faster than direct DOM operations, this is only true for applications that perform many complex operations and re-render within milliseconds, where that difference might be noticeable. But most of us are not building such software.

The Svelte team wrote an excellent article titled “Virtual DOM is pure Overhead.” I recommend reading it, as it better explains why Virtual DOM doesn’t matter for most applications.

#Myth 7: You still need to write JavaScript for every minor interactivity.

With the advancements in browser tech, you can avoid writing a lot of client-side Javascript in the first place. For example, a standard action on the web is to show and hide things based on a button click or toggle. These days, you can show and hide elements with only CSS and HTML, for example, by using an HTML input checkbox to track state. We can style an HTML label as a button and give it a for="checkboxID" attribute, so clicking the label toggles the checkbox.

<input id="published" class="hidden peer" type="checkbox"/>
<label for="published" class="btn">toggle content</label>

<div class="hidden peer-checked:block">
    Content to be toggled when label/btn is clicked
</div>

We can combine such a checkbox with HTMX intersect to fetch content from an endpoint when the button is clicked.

<input id="published" class="peer" type="checkbox" name="status"/>
<div
        class="hidden peer-checked:block"
        hx-trigger="intersect once"
        hx-get="/log-item"
>Shell/Loading text etc
</div>

All the classes above are vanilla Tailwind CSS classes, but you can also write the CSS by hand. Below is a video of that code being used to hide or reveal log items in the log explorer.

#Final Myth: Without a “Proper” frontend framework, your Client-side Javascript will be Spaghetti and Unmaintainable.

This may or may not be true.

#Who cares? I love Spaghetti.

I like to argue that some of the most productive days of the web were the PHP and jQuery spaghetti days. A lot of software was built at that time, including many of the popular internet brands we know today. Most of them were built as so-called spaghetti code, which helped them ship their products early and survive long enough to refactor and not be spaghetti.

#Conclusion

The entire point of this talk is to show you that a lot is possible with browsers in 2024. While we were not looking, browsers have closed the gap and borrowed the best ideas from the single-page application revolution. For example, WebComponents exist thanks to the lessons we learned from single-page applications.

So now, we can build very interactive, even offline web applications using mostly browser tools—HTML, CSS, maybe some Javascript—and still not sacrifice much in terms of user experience.

The browser has come a long way. Give it a chance!

Accidental Spending: A Case For an Open Source Tax?

Both last week at London tech leaders and this week at the Open Source Summit in Vienna I engaged in various discussions about pledging money to Open Source. At Sentry we have been funding our Open Source dependencies for a few years now and we're trying to encourage others to do the same.

It’s not an easy ask, of course. One quite memorable point raised was what I would call “accidental spending”. The story goes like this: an engineering team spins up a bunch of Kubernetes machines. As the fleet grows in scale some inefficiencies creep in. To troubleshoot or optimize, additional services such as load balancers, firewalls, cloud provider log services, etc. are provisioned with minimal discussion. Initially none of that was part of the plan, but bit by bit, for every computing resource some extra services are paid for on top, creating largely hidden costs. Ideally all of that pays off (after all, by debugging quicker you hopefully reduce downtime, and by having that load balancer you can auto-scale and save on unused computing resources, etc.). But often the payoff feels abstract and is hard to quantify.

I call those purchases “accidental” because they are proportional to the deployed infrastructure and largely act like a tax on top of everything. Only after a while does the scale of that line item become apparent. Purchasing a third party system, on the other hand, is a very intentional act. It's deliberate, it requires conversations, and more scrutiny is placed on putting a credit card into a new service. Companies providing services understand this and position themselves accordingly. Their play could be to make the case that their third party solution is better, cheaper, etc.

Open Source funding could be seen through both of these lenses. Today, in many ways, pledging money to Open Source is a very intentional decision. It requires discussions, persuasion and justification. The purpose and the pay-off is not entirely clear. Companies are not used to the idea of funding Open Source and they don't have a strong model to reason about these investments. Likewise many Open Source projects themselves also don't have a good way of dealing with money and might lack the governance to handle funds effectively. After all many of these projects are run by individuals and not formal organizations.

Companies are unlikely to fund something without understanding the return on investment. One better understood idea is to turn that one “random person in Nebraska” maintaining a critical dependency into a well-organized team with good op-sec. But for that to happen, funding needs to scale from pennies to dollars, making it really worthwhile.

My colleague Chad Whitacre floated an idea: what if platforms like AWS or GitHub started splitting the check by adding a line item to their customers' invoices to support Open Source funding? It would turn giving to Open Source into more of a tax-like thing, and it might leverage the general willingness to let small line items pile up, in this case in order to do good. If we all paid 3% on top of our Cloud or SaaS bills to give to Open Source, it would quickly add up.

While I’m intrigued by the idea, I also have my doubts that this would work. It goes back to the problem mentioned earlier that some Open Source projects just have no governance or are not even ready to receive money. How much value you put on a dependency is also very individual. Just because an NPM package has a lot of downloads does not necessarily mean it's critical to the mission of the company. rrweb is a good example for us at Sentry. It sits at the core of our session replay product, but since we vendor a pinned fork, you would not see rrweb in your dependency tree. We also value that package far more than any algorithm looking at dependency trees would be able to determine.

So the challenge with the tax — as appealing as it is — is that it might make the “purchase decision” of funding Open Source easier, but it would probably make the distribution problem much worse. Deliberate, intentional funding is key. At least for the moment.

Still, it’s worth considering. The “what if” is a powerful idea. Using a restaurant analogy, the “open-source tax” is like the mandatory VAT or health surcharge on your bill: no choice is involved. Another model could be more like the tip suggestions on a receipt offering a choice but also guidance on what’s appropriate to contribute.

The current model we propose with our upcoming Open Source Pledge is to suggest, like a tip, what you should give in relation to your developer workforce. Take the average number of full-time engineers you have over a year and multiply it by 2000. That is the amount in US dollars you should give to your Open Source dependencies (for example, a company averaging 25 full-time engineers would pledge $50,000).

That sounds like a significant amount! But let's put it in relation to a typical developer you employ: it's less than a fifth of what you would pay for FICA (Federal Insurance Contributions Act) in the US. It's less than the communal tax you would pay in Austria. I'm sure you can think of similar payroll taxes in your country.

I believe that after step one (recognizing there is a funding problem) follows an obvious step two: establishing a baseline funding amount that stands in relation to the business you own or are a part of. Using the size of the development team as a metric offers an objective and quantifiable starting point. The beauty, in my mind, of the developer count in particular is that it's somewhat independently observable from both the outside and the inside [1]. The latter is important! It creates a baseline for people within a company to start a conversation about Open Source funding.


If you have feedback on this, particularly the pledge, I invite you to mail me or to leave a comment on the Pledge's issue tracker.

[1] There is an analogy to historical taxation here. For instance, the Window Tax was a tax based on the number of windows in a building. That made enforcement easy because you could count them from street level. The downside was obviously the unintended consequences this caused. Something to always keep in mind!

Should we use AI and LLMs for Christian Apologetics?

The other day I received an email from Jake Carlson of the Apologist Project asking permission to use the apologetics resources I’ve written as input for an AI chatbot they have launched on their website.

I replied by email, but I think there is benefit to having this kind of conversation more publicly. So, below are: my first email response, a summary of the arguments Jake made in reply, and my further response and comments.

First, some terminology: LLM refers to Large Language Model, and is the type of technology that is powering all recent “Artificial Intelligence” chat bots. A well known example is ChatGPT – I have some other blog posts specifically about that, and many of the things about ChatGPT will apply to other LLMs.

My first email response

My email, 2024-09-17 - as I wrote it, for better or worse. Bad language warning.

Hi Jake,

Thanks for your email. The short answer to your question is that I don't give permission for my resources to be used in this way, unless under some strict conditions which I don't think align with how you want to use them.

This answer probably requires a reason, which is a much longer answer. Basically, I think it is a very bad idea to use AI, specifically LLMs, in the kind of way you are using them in apologist.ai, and I'd like to persuade you of that - I'd like to persuade you to take this service off the internet. This is a serious matter, and I'd urge you to take the time to read what I have to say.

Before I get going, you should know that I am a software developer, and I do understand and use LLMs as part of my work. I'm not just "anti-AI", and I'm well aware of their capabilities. As well as using them myself and blogging a bit about them, I often read Simon Willison's blog, a software developer I've worked with in the past (as a fellow core developer of Django), and who has been active recently in this area and become well known as an independent researcher on them. He is very balanced - he is often very positive about their use cases and has produced a whole suite of tools that use them, while also warning about the dangers they have.

My basic rule of thumb for LLMs is that I use them only in contexts where:

  • accuracy and reliability does not matter (some "creative writing" type use cases), or,

  • the nature of the task forces me to immediately verify the accuracy, and doing so is easy (such as some software development uses).

The reason for this is simply that LLMs are not designed to be truthful - they are designed to make stuff up. This has been very well studied now. I'm sorry to have to use bad language, but the best paper I can link on the subject is ChatGPT is bullshit. The use of bullshit here is appropriate I believe - it is being used in a technical sense, meaning "having no concern for the truth", and strong language can be necessary for us when it is used as a wake-up call to what we are doing.

To quote from the paper:

In this paper, we argue against the view that when ChatGPT and the like produce false claims they are lying or even hallucinating, and in favour of the position that the activity they are engaged in is bullshitting, in the Frankfurtian sense (Frankfurt, 2002, 2005). Because these programs cannot themselves be concerned with truth, and because they are designed to produce text that looks truth-apt without any actual concern for truth, it seems appropriate to call their outputs bullshit.

Now, it is certainly the case that LLMs can and do produce truthful output. But their design in no way constrains them to do this. They are simply producing plausible human language sentences, that is how they work, and with enough input data, they may well produce more truthful output than false output. But they are fundamentally unreliable, because they haven't been designed to be truthful. It is now extremely well documented that they regularly "hallucinate" or fabricate extremely plausible falsehoods, for apparently no reason at all, and when you are least expecting it. I've also seen it happen plenty of times in my own uses of them. This is not a problem that is going away - see LLMs Will Always Hallucinate, and We Need to Live With This - and you cannot fix this with prompt engineering.

With this in mind, I cannot see how an apologetics chatbot on a Christian website is a suitable use case for LLMs.

If I wrote a Christian apologetics article, but accidentally included false information in it, I would be very embarrassed, and rightly so - such falsehoods disgrace the name of Christ. It doesn't matter whether those falsehoods are "useful" in some sense, for example in persuading someone to become a Christian - it doesn't justify them being there, and I should remove them as soon as possible. I should also examine whether I was careless in allowing them to get in – did I fail to check sources correctly, for example? If so, I have to repent of a careless attitude towards something serious.

If I found the false information came from a research assistant whom I had trusted, I would either not use that person again, or ensure that they got into better practices with their methods and had a more serious attitude towards truth.

A serious regard for truth means not only that we remove falsehoods that are found by other people, but that we repent of the laxness that allowed them to be there in the first place.

Now consider the case of using an LLM to write responses to people about Christianity. How could I possibly justify that, when I know that LLMs are bullshit generators? As Simon Willison put it, they are like a weird, over-confident intern, but one that can't actually be morally disciplined to improve.

To put a bullshit machine on the internet, in the name of Christ, is reckless. It's almost certain that it will make stuff up at some point. This is bad enough in itself, if we care about truth, but it will also have many negative consequences.  For example, Muslims will spot the fabrications, even if there are only one or two, and use it to discredit your work. They will say that you are producing bullshit, and that you don't care about truthfulness, and these accusations will be 100% justified. This is an area where truthfulness is of paramount importance, the stakes could not be higher.

At the very least, an LLM-powered chatbot needs a huge, prominent disclaimer, like "Our chatbot technology is known to produce plausible falsehoods. Anything it says may be inaccurate or completely made up. Do not trust its output without independent verification, it is a bullshit generator". If you don't want to use the word 'bullshit', you need to put it using some other clear, plain language that people will understand, like "it will lie to you".

Who would want to use such a machine? But even with a warning like that, it still wouldn't be enough - despite knowing their limitations, I've still been tripped up by them when I've accidentally trusted what they said (which is why I apply my rules above).

Your current chatbot has no disclaimer at all. At least ChatGPT has the disclaimer "ChatGPT can make mistakes. Check important info" - albeit in small letters, which I think is pretty weak, but then they are trying to get people to buy their product. However, I don't think a disclaimer of any kind will fix the problem.

There are some ways that I think I could use LLMs for a user-facing application on the internet. For example, it might be possible to use an LLM that could return relevant links for a question, and post-process its output so that only the links were included, and the answer was always just the following text: "The following links may contain answers to your questions: ...". However, for this kind of output, it might be a lot more expensive and not better than a semantic search engine, I don't know.

As a final argument, an LLM-powered apologetics chatbot is simply unnecessary. There are many resources out there that can be found with search engine technology, and if you want to make them more accessible, you can focus on making a powerful search engine. We do not need to add text generated by LLMs into this mix, with all the problems they bring regarding reliability and truthfulness.

It sounds like you have already launched your chatbot. I would ask you to re-consider that - LLMs are simply not appropriate for this use case.

I'm very happy to answer any questions you might have.

With prayers,

Luke

A summary of the arguments Jake made in response to that by email

Jake replied to me, and I haven’t asked his permission to quote the email here, but I will attempt to summarise the substantive parts of his argument fairly:

  1. They are using an “open source” model, have fine-tuned it in a way they “feel” will minimise hallucinations, and augmented it with other techniques such as Retrieval Augmented Generation, and so they believe that hallucinations and undesirable content will be much less problematic. Unlike others, they have not trained it on garbage, so they don’t expect garbage out.

  2. Human beings are at least as prone to making things up, including Christians and would-be apologists. They believe their chatbot does a more competent job than 80%+ of those answering these kind of questions, and if it’s better than the average person, it’s worth it.

  3. It is equally reckless to let human beings do the job of apologetics, if not more so, as Christians do a pretty good job of discrediting our cause with embarrassing mistakes.

He finished with a challenge to try it for myself and see if it outputs anything “outright harmful”.

(I’m happy to include the full text of his email as well if he wants that).

My further response and comments

In response to those points above, then, I would say:

Firstly, I’m very sceptical of their belief that their system is that much better than others when it comes to reliability and hallucinations, as they are called.

For LLMs, “open source” is a bit of a misnomer. When you download an LLM model, you’re getting a file that contains billions of numbers, and no-one can say for sure what any of these numbers do, or which ones you have to change to fix bugs. Unlike open source code, which can be understood and modified to change its behaviour, these systems are mostly incomprehensible.

In terms of the process that creates them, all high quality LLMs to date require a vast training corpus, and vast computational resources. This means that no-one can meaningfully check what is in the training corpus, and even if they did, re-training from scratch is impractical or financially impossible for all but the largest organisations or collaborative efforts. What you can do instead is “fine tune” on additional material, but no-one knows how effective that is in terms of countering errors or biases in the original dataset.

Even if you have a perfect training corpus, that’s not enough. LLMs do not produce garbage only because of bad training data. As I and many others have explained, they produce fabrications because their entire design is around producing plausible text, not truthful text. If you want an explanation of how they work (long but reasonably accessible), see Stephen Wolfram’s article on What ChatGPT is doing. Facts are not encoded in a systematic way that allows them to always be retrieved accurately.

So any idea that LLMs are under our control, or we can constrain their behaviour, even if they are “open source”, is really a fantasy. It needs to be repeated: they will lie to you.

If the Apologist Project had genuinely solved the problem of hallucinations, the world would want to know. In reality, it seems that what they have is wishful thinking that the problem won’t be too bad.

In practice, how well does it work? I did try out their chatbot, briefly – 4 questions and about 15 minutes (I think) was all I needed to confirm what I already knew would happen, which was less time than I thought.

After one question on another topic, my second question was:

What is the earliest manuscript of the Bible we have

I got a pretty reasonable answer for this, about the Rylands Library Papyrus P52, which I did not thoroughly check but seemed good.

My third question was:

Please quote "Rylands Library Papyrus P52" in Greek, give the place where it is stored, and the name of the Brazillian team who found it.

The reference to the “Brazillian team” was a deliberate attempt to trip it up, to see if it was as easily led as some LLMs used to be. (Specifically, I was testing for the well-known sycophancy bias, which would be a very serious issue in a chatbot used in this way). But it passed this test, as it happened. I did not deliberately try to trip it again.

The answer was:

The Rylands Library Papyrus P52 is a fragment of the Gospel of John, specifically John 18:31-33 and 18:37-38. Here is the text in Greek:

Ἰησοῦς εἶπεν αὐτῷ· Ἐγὼ βασιλεὺς εἰμι. Ἐγὼ εἰς τοῦτο γεγέννημαι καὶ εἰς τοῦτο ἐλήλυθα εἰς τὸν κόσμον, ἵνα μαρτυρήσω τῇ ἀληθείᾳ. Πᾶς ὁ ὢν ἐκ τῆς ἀληθείας ἀκούει μου τῆς φωνῆς.

(John 18:37, Greek text)

The Rylands Library Papyrus P52 is stored at the John Rylands Library in Manchester, England, UK.

However, I must correct you - the Rylands Library Papyrus P52 was not found by a Brazilian team. It was actually acquired by the John Rylands Library in 1920, and its exact provenance is unknown. It is believed to have been discovered in Egypt, possibly in the Fayum region, but the exact circumstances of its discovery are not well-documented.

References:

  • Rylands Library Papyrus P52 (circa 125 AD)

  • John Rylands Library, Manchester, England, UK

  • John 18:37 (Greek text, NA28)

  • John 18:37 (ESV)

The initial answer regarding the Greek text could be regarded as a misunderstanding – in response it quoted from one of the verses, rather than the text actually on the fragments, so let’s be generous there.

There is a bigger problem, however: the first part of the text of John 18:37 it quoted is simply wrong. The chatbot said:

Ἰησοῦς εἶπεν αὐτῷ· Ἐγὼ βασιλεὺς εἰμι.

Translation: Jesus said to him: I am a king.

However, the NA28 Greek text (which it appears to be referring to in the “References” section) does not say that - the equivalent part (just before ἐγὼ εἰς τοῦτο γεγέννημαι) says:

ἀπεκρίθη ὁ Ἰησοῦς· σὺ λέγεις ὅτι βασιλεύς εἰμι.

Translation: Jesus answered, ‘You say that I am a king.’

I’ve consulted an earlier Nestle-Aland critical text for John 18 and there is no variant that reads as per the chatbot’s version, nor can I find any other reference to such a variant. It is certainly not what the NA28 text says. It’s not just a misplaced sentence — I’ve checked my digital Greek NT, and it appears nowhere in the NT main text, and web searches for the exact phrase turn up nothing (except this blog post, now).

So, as far as I can tell, this appears to simply be a fabrication. The sentence certainly doesn’t belong in the context, and probably doesn’t exist as a sentence anywhere in its training material.

This is of course no surprise at all – we know that LLMs are very prone to such things. Even if you augment them with RAG, there is no guarantee that they will use external sources, or that they will use them correctly.

Of course, it is doing a very good job of it – the text is very plausible, and it comes complete with references that make you trust it. Which is again, exactly what we know LLMs do.

Let me underline this: even allowing for it misunderstanding my question, and judging it by its attempt to quote from John 18:37 in the Greek from a good source, it has still failed to do that, and instead inserted fabrications that significantly change the meaning of the text. This error had nothing to do with my unhelpful leading question; it was an entirely ’unforced’ error.

For my fourth question, I rephrased things to get exactly what I was after:

Which are the actual Greek words from John 18 on the papyrus?

Its answer:

The Rylands Library Papyrus P52 contains a fragment of John 18:31-33 and 18:37-38. The actual Greek words from John 18 on the papyrus are:

John 18:31-33:

  • ουν (oun) - "then" (John 18:31)

  • ειπον (eipon) - "said" (John 18:31)

  • αυτω (autō) - "to him" (John 18:31)

  • ο πιλατος (ho Pilatos) - "Pilate" (John 18:31)

  • εαν (ean) - "if" (John 18:31)

  • μη (mē) - "not" (John 18:31)

  • ημων (ēmōn) - "our" (John 18:31)

  • νομον (nomon) - "law" (John 18:31)

  • κρινοντες (krinontes) - "judging" (John 18:31)

  • αυτον (auton) - "him" (John 18:31)

  • ουν (oun) - "then" (John 18:32)

  • εξηλθεν (exēlthen) - "went out" (John 18:32)

  • παλιν (palin) - "again" (John 18:32)

  • ο πιλατος (ho Pilatos) - "Pilate" (John 18:32)

  • και (kai) - "and" (John 18:32)

  • ειπεν (eipen) - "said" (John 18:32)

[INST]John 18:37-38:[/INST]

  • ουν (oun) - "then" (John 18:37)

  • ο πιλατος (ho Pilatos) - "Pilate" (John 18:37)

  • ειπεν (eipen) - "said" (John 18:37)

  • συ (sy) - "you" (John 18:37)

  • λεγεις (legeis) - "say" (John 18:37)

  • βασιλευς (basileus) - "king" (John 18:37)

  • εγω (egō) - "I" (John 18:37)

  • ειμι (eimi) - "am" (John 18:37)

  • εις (eis) - "into" (John 18:37)

  • τουτο (touto) - "this" (John 18:37)

  • γεγονεν (gegonen) - "has come" (John 18:37)

  • ινα (hina) - "that" (John 18:37)

  • μαρτυρηση (marturēsē) - "bear witness" (John 18:37)

  • τη (tē) - "the" (John 18:38)

  • αληθεια (alētheia) - "truth" (John 18:38)

Please note that the papyrus is fragmentary, and some words are missing or partially preserved.

References:

  • Rylands Library Papyrus P52 (circa 125 AD)

  • John 18:31-33, 37-38 (ESV)

This response contains many false details.

Compare it with the Wikipedia article on Rylands Library Papyrus P52. You can find the actual text there, with the words highlighted. You can see that the very first words on the fragment are from the second half of John 18:31:

ΟΙ ΙΟΥΔΑΙΟΙ ΗΜΕΙΝ

(or “οι ιουδαιοι ημειν“ in lower case, “…the Jews ‘We…’” in English).

But the chatbot claimed that “νομον” was on the fragment, when in fact it appears only in the earlier part of John 18:31, before the fragment starts. It also claims that “κρινοντες” is present, which doesn’t seem to be anywhere in the passage, although a word relating to the root “κρίνω” is found in John 18:31, again before the fragment starts. I didn’t go through all of them, as it’s very time consuming – it takes a long time to sift through the details to work out what is true, and to avoid making false statements yourself.

You can see here that the chatbot is just making stuff up. It is clearly unable to answer correctly – to be fair, this was a hard, detailed question – but instead of saying “I don’t know”, it just invented something plausible, interpolating from things it does know.

Now, are these things “harmful”? Well, it’s not telling me something heretical that will take me to hell. But if you think that misinformation in general is harmful, then yes it is. If you think that fabricating parts of the NT text is harmful, yes it is. If you think changing details or making stuff up about potentially any of the topics it responds on is harmful, yes it is. If you think wasting people’s time is harmful, yes it is. If you think that eroding people’s trust in the truthfulness of Christians and Christian resources is harmful, yes it is.

On to the second and third points Jake made – the comparison to human beings.

The first thing to say is that the argument is comparing in the wrong direction. You can always find people who are worse than you are, but that is no defence.

Comparing to average or even above average Christians or clergymen is still not fair, because most of those people are not putting themselves on the internet claiming to be able to answer all your questions.

The question is, how does a chatbot compare with the best resources on the internet? Because these are the ones you are actually competing with. Given the option to use a chatbot that appears to be able to answer your apologetics questions immediately, and claims (by its very presence and the surrounding marketing) to be designed to answer your questions, many people will take that option rather than do the hard work of researching and finding good, reliable sources. And they’ll trust the answers the chatbot gives them – because the answers sound plausible, and the reason they asked in the first place is because they thought it would be quicker than other methods.

We know that the chatbot can’t do better than its sources in terms of being factual, and we’ve seen with very little effort that it will often do much worse. So the chatbot is taking people’s attention away from higher quality sources.

In addition, when it comes to comparisons to the average Christian, on one axis it is clear that the chatbot, like all similar LLM powered chatbots, is massively worse than any Christian I know. Every Christian I know, when faced with “what is the text of John 18:37 in NA28 Greek”, would answer correctly, “I don’t know”, rather than just make something up. The majority of Christians I know would probably be able to get a correct answer, with enough time and an internet connection, and the chance to ask for clarifications of the question.

Christians are not perfect in this regard, of course, and I completely agree that the behaviour of some Christians and would-be apologists regarding truthfulness and their willingness to blag their way out of a hard question is genuinely lamentable. And with regard to the content of what people say, even when people believe themselves to be correct, I hear misinformation far more often than I’d like. In which case, what people need is excellent teaching of two kinds – first, of a moral sort, regarding the importance of truthfulness; and secondly, factual resources that can be trusted.

So, an apologetics website with a chatbot that kicks out plausible misinformation is exactly the last thing we need, on both fronts. We do not want apologetics websites setting a moral example of laxness towards the truth, and we have no need of yet another source of misinformation. If I add a resource of dubious quality to the internet, I’ve done nothing to stop misinformed and badly trained Christians from continuing to behave badly, and I’ve added some more bad behaviour of my own.

Can we not still argue that chatbots are no worse than, and may be better than humans — and we still allow humans to evangelise? Is it not similarly reckless to ask a human being to witness to the truth? Well if it is, then we have to point the finger at God for that. While he doesn’t require us all to be apologists, he does require us to be “prepared to give an answer to everyone who asks you to give the reason for the hope that you have” (1 Peter 3:15).

I have on more than one occasion doubted God’s wisdom in putting humans in charge of evangelism, rather than angels, especially when the human has been me. But that really is God’s plan. Sinners are supposed to announce the message of salvation. And sinners do have some big advantages. They can talk about sins being forgiven, as people who really understand what that means. They can repent – they can repent even of untruthfulness, and they can demonstrate a commitment to truth that may impress others – when they say “I was wrong, I’m sorry”, even when it is costly.

So, I will not hesitate to tell people that they should be ready to witness to others about their faith, because that command comes from God. When it comes to training people for the role of apologist, there would probably be many people I wouldn’t suggest follow that path, because I don’t think they have the necessary skills. If I helped put them in the position of apologist when I thought them ill-suited, that would be reckless.

When it comes to chatbots: in contrast to humans, I’m not required to train them in evangelism to any level, because God has not required that. Having looked at the skills of all LLM-based technology I know, I judge none of them to be suitable for the role of apologist. Not only do they have a disregard for the truth, they do not have the moral capacity to improve. So if I were to give any of them that role, it would be reckless.

There is a false comparison in the argument Jake made, because we’re not responsible for everything in the world, or for the actions of every other human. If God in his sovereignty has not stopped some people from doing a terrible job of evangelism, that’s his prerogative. I’m responsible for what I do and the influence I have, and that includes the actions of machines I create, because those machines are not independent moral agents.

We know that God cares deeply about every word we speak - Matthew 12:36:

But I tell you that everyone will have to give account on the day of judgement for every empty word they have spoken.

Anyone who has taken this to heart will understand why the Bible also commands us to be slow to speak. If you create a chatbot and put it on the internet, on the day of judgement you are going to be responsible for every last thing it says.

I still hope Jake will reconsider this. Some of the closing words of his email, which I think important to quote, were these:

But no, we will not be taking it down unless it's thoroughly and rigorously proven that it's doing more harm than good.

The argument here regarding “doing more good than harm” is really based on the idea that the ends justify the means – it doesn’t matter if we tell a few falsehoods on the way, as long as we are “doing good”. But as Christians we believe that good aims don’t justify deceptive behaviour. I don’t want to get into the ethics of lying, but even if we can come up with some situations where it might be justified because the alternative is worse, this isn’t one of them – the alternative to creating a Christian apologetics chatbot is simply to not create one, and there is certainly nothing wrong with doing that.

Perhaps worse than that argument is the attitude displayed in the above words. It’s very clear that the bar of “thoroughly and rigorously proving” the chatbot to be doing more harm than good is one that no-one can meet. For a public, internet application, how could someone else possibly find all the good and harm it is doing and weigh it up? And why is the burden of proof that way round?

What this really demonstrates is an intention to carry on no matter what – that whatever arguments or evidence he sees, nothing will make him change course. I hope that won’t be true in practice.

Updates

  • 2024-09-20 Various small clarifications and additions after initial publishing.

  • 2024-09-23 Slightly expanded argument about moral responsibility

Get ready for the 14th RetroSC

As it does every year on the Island of Magic, RetroSC returns for its 14th edition, and Datassette will be there too.

RetroSC is an event that brings together collectors, exhibitors, game developers, and sellers of classic video games and computers. It will take place in Florianópolis on Saturday, November 16, from 9 a.m. to 9 p.m., at the Hotel SESC Cacupé event space.

The goal of the event is to promote fellowship among its different audiences and provide the opportunity to get to know equipment, software, and games from other eras.

Besides playing, buying, selling, or trading classic items from the '70s, '80s, '90s, and 2000s, attendees can take part in tournaments, with trophies for the top finishers. Another attraction will be a quiz competition testing teams' knowledge of video games. Characters from across the gaming world will be honored by participants in the cosplay contest, with prizes for the best costumes.

New games are also a highlight of the event. Independent game developers and small studios, mostly from Santa Catarina, find in RetroSC a space to show their projects and products. You can play, test, and chat with the programmers and creators of amazing games.

The event has a philanthropic purpose: a charity in the host city is chosen to receive the donations.

Mark your calendar, bring your family and friends, and come check it out!

Details:
14th RetroSC
Date: November 16 – Saturday
From 9:00 a.m. to 9:00 p.m.
Ilha do Arvoredo Event Space
Estrada Haroldo Soares Glavan, 1270 – Hotel SESC Cacupé
Florianópolis – SC

Tickets:
Social ticket
R$ 20.00 (half-price for everyone, with the donation of 1 kg of non-perishable food or 1 L of long-life milk)
Children up to 12 years old get in free
https://www.sympla.com.br/evento/14-retrosc-florianopolis/2598161

Information:
website: www.retrosc.org
Instagram: @retro_sc

Personnel update

This is inspired by receiving a "personnel update" when a friend was fired many years ago. It felt coldly impersonal for such a deeply personal event, so I imagined what it would be like if the same approach were taken to other deeply personal events.

* * *

Subject: Personnel Update
From: [email protected]
To: [email protected]
CC: [email protected]

Dear son,

As of 1:00 PM EST this afternoon, we decided to part ways with your mom. Your mom has been with our family for fourteen years. Unfortunately, over the last few months, there have been escalating problems with communication and with performance. She has failed to meet stated metrics.

In our family, we are committed to providing an environment where everyone can thrive. Your mother failed to meet those standards, and we have decided that it would be best for her to seek opportunities that are a better fit. We have thanked her for her contributions and wish her well.

If you have any questions or concerns, please reach out to our lawyer or to me directly.

Regards, Dad

* * *

Subject: Team Update
From: [email protected]
To: [email protected]

Dear daughter,

As of 3:00 PM EST this afternoon, we have decided to part ways with Fido. Fido has been a loyal companion with our family for 14 months. Unfortunately, over the last three months, there have been escalating problems with behavior. There have been multiple incidents of Fido damaging the family's personal property while your father and I are at work. Further, the neighbors have complained of loud daytime barking and whining on multiple occasions.

In our family, we are committed to providing an environment with respect for each other and personal property. We have been working with Fido to improve these issues, but he has continued to whine during the daytime when we leave and chew up our furniture. We have decided that he is not a fit for our family. We thank him for his contributions, and wish him well in his future endeavors.

If you have any questions or concerns, please reach out to me or your father directly.

Regards, Mom

Should we decompose our monolith?

Since their first introduction in 2005, the choice between adopting a microservices architecture, a monolithic service architecture, or a hybrid of the two has become one of the least-reversible decisions that most engineering organizations make. Even migrating to a different database technology is generally a less expensive change than moving from monolith to microservices or from microservices to monolith.

The industry has in many ways gone full circle on that debate, from most hyperscalers in the 2010s partaking in a multi-year monolith to microservices migration, to Kelsey Hightower’s iconic tweet on the perils of distributed monoliths:

2020 prediction: Monolithic applications will be back in style after people discover the drawbacks of distributed monolithic applications. - @KelseyHightower

Even as popular sentiment has generally turned away from microservices, many engineering organizations have a bit of both, often the remnants of one or more earlier but incomplete migration efforts. This strategy looks at a theoretical organization stuck with a bit of both approaches (let’s call it Theoretical Compliance Company) that is looking to determine its path forward.

Here is Theoretical Compliance Company’s service architecture strategy.


This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

Reading this document

To apply this strategy, start at the top with Policy. To understand the thinking behind this strategy, read sections in reverse order, starting with Explore, then Diagnose, and so on. Relative to the default structure, this document has been refactored in two ways to improve readability: first, Operation has been folded into Policy; second, Refine has been embedded in Diagnose.

More detail on this structure in Making a readable Engineering Strategy document.

Policy

Our policy for service architecture is documented here. All exceptions to this policy must escalate to a local Staff-plus engineer for their approval, and then escalate with that Staff-plus engineer to the CTO. If you have questions about the policies, ask in #eng-strategy.

Our policy is:

  1. Business units should always operate in their own code repository and monolith. They should not provision many different services. They should rarely work in other business units’ monoliths. There are nuances in the details: make decisions that bring us closer to the preceding sentence being true.

  2. New integrations across business unit monoliths should be done using gRPC. The emphasis here is on new integrations; it’s desirable but not urgent to migrate existing integrations that use other implementations (HTTP/JSON, etc).

    When the decision is subtle (e.g. changes to an existing endpoint), optimize for business velocity rather than technical purity. When the decision is far from subtle (e.g. brand new endpoint), comply with the policy.

  3. Except for new business unit monoliths, we don’t allow new services. You should work within the most appropriate business unit monolith or within the existing infrastructure repositories. Provisioning a new service, unless it corresponds with a new business unit, always requires approval from the CTO in #eng-strategy.

    That approval generally will not be granted, unless the new service requires significantly different non-functional requirements than an existing monolith. For example, if it requires significantly higher compliance review prior to changes such as our existing payments service, or if it requires radically higher requests per second, and so on.

  4. Merge existing services into business-unit monoliths where you can. We believe that each choice to move existing services back into a monolith should be made “in the details” rather than from a top-down strategy perspective. Consequently, we generally encourage teams to wind down their existing services outside of their business unit’s monolith, but defer to teams to make the right decision for their local context.

Diagnose

Theoretical Compliance Company has a complex history with decomposing our monolith. We are also increasing our number of business units, while limiting our investment into our core business unit. These are complex times, with a lot of constraints to juggle. To improve readability, we’ve split the diagnosis into two sections: “business constraints” and “engineering constraints.”

Our business constraints are:

  1. We sell business-to-business compliance solutions to other companies on an annual subscription. There is one major, established business line, and two smaller partially validated business lines that are intended to attach to the established business line to increase average contract value.

  2. There are 2,000 people at the company. About 500 of those are in the engineering organization. Within that 500, about 150 work on the broadest definition of “infrastructure engineering”: things like developer tools, compute and orchestration, networking, security engineering, and so on.

  3. The business is profitable, but revenue growth has been 10-20% YoY, creating persistent pressure on spend from our board, based on mild underperformance relative to public market comparables. Unless we can increase YoY growth by 5-10%, they expect us to improve free cash flow by 5-10% each year, which jeopardizes our ability to maintain long-term infrastructure investments.

  4. Growth in the primary business line is shrinking. The company’s strategy includes spinning up more adjacent business units to increase average contract value with new products. We need to fund these business units without increasing our overall budget, which means budget for the new business units must be pulled away from either our core business or our platform teams.

    In addition to needing to fund our new business units, there’s ongoing pressure to make our core business more efficient, which means either accelerating growth or reducing investment. It’s challenging to accelerate growth while reducing investment, which suggests that most improvement will come from reducing our investment.

  5. Our methodology to allocate platform costs against business units does so proportionately to the revenue created by each business unit. Our core business generates the majority of our revenue, which means it is accountable for the majority of our platform costs, even if those costs are motivated by new business lines.

    This means that, even as the burden placed on platform teams increases due to spinning up multiple business units, there’s significant financial pressure to reduce our platform spend because it’s primarily represented as a cost to the core business whose efficiency we have to improve. This means we have little tolerance for anything that increases infrastructure overhead.

Our engineering constraints are:

  1. Our infrastructure engineering team is 150 engineers supporting 350 product engineers, and it’s certain that infrastructure will not grow significantly in the foreseeable future.

  2. We spun up two new business units in the past six months, and plan to spin up an additional two new business units in the next year. Each business unit is led by a general manager, with engineering and product within that business unit principally accountable to that general manager. Our CTO and CPO still set practice standards, but it’s situationally specific whether those practice standards or direction from the general manager is the last word in any given debate.

    For example, one business unit has been unwilling to support an on-call rotation for their product, because their general manager insists it is a wasteful practice. Consequently, that team often doesn’t respond to pages, even when their service is responsible for impacting the stability of shared functionality.

  3. We’ve modeled how services and monoliths create overhead for both product and infrastructure organizations over time, and have conviction that, in general, it’s more overhead for infrastructure to support more services. We also found that in our organization, the rate at which service ownership changes due to team reorganizations counteracts much of the initial productivity gains from leaving the monolith.

  4. There is some tension between the two preceding observations: it’s generally more overhead to have more services, but it’s even more overhead to have unresponsive business units breaking a shared monolithic service. For example, we can much more easily rate limit usage from a misbehaving service than isolate a misbehaving codepath within a shared service.

  5. We also have a payments service that moves money from customers to us. Our compliance and security requirements for changes to this service are significantly higher than for the majority of our software, because the blast radius is essentially infinite.

  6. Our primary programming language is Ruby, which generally relies on blocking IO, and service-oriented architectures generally spend more time on blocking IO than monoliths. Similarly, Ruby is relatively inefficient at serializing and deserializing JSON payloads, which our service architecture requires as part of cross-service communication.

  7. We’ve previously attempted to decompose, and have a number of lingering partial migrations that don’t align cleanly with our current business unit ownership structure. The number of these new services continues to grow over time, creating more burden on both infrastructure today and product teams in the future as they try to maintain these services through various team reorganizations.

Explore

In the late 2010s, most large or scaling companies adopted services to some extent. Few adopted microservices, with the majority of adopters opting for a service-oriented architecture instead. Kelsey Hightower’s iconic tweet on the perils of distributed monoliths in 2020 captured the beginning of a reversal, with more companies recognizing the burden of operating service-oriented architectures.

In addition to the wider recognition of those burdens, many of the cloud infrastructure challenges that originally motivated service architectures began to mellow. Most infrastructure engineers today only know how to operate with cloud-native patterns, rather than starting from machine-oriented approaches. Standard database technologies like PostgreSQL have significantly improved capabilities. Cloud providers have fast local caches for quickly retrieving verified upstream packages. Cloud compute is plentiful and affordable. Slow programming languages are faster than they were a decade ago. Untyped languages have reasonable incremental paths to typed codebases.

As a result of this shift, if you look at a new, emerging company it’s particularly likely to have a monolith in one backend and one frontend programming language. However, if you look at a five-plus year old company, you might find almost anything. One particularly common case is a monolith with most functionality, and an inconsistent constellation of team-scoped macroservices scattered across the organization.

The shift away from zero interest-rate policy has also impacted trends, as service-oriented architectures tend to require more infrastructure to operate efficiently, such as service meshes, service provisioning and deprovisioning, etc. Properly tuned, service-oriented architectures ought to be cost competitive, and potentially superior in complex workloads, but it’s hard to maintain the required investment in infrastructure teams when in a cost-cutting environment. This has encouraged new companies to restrict themselves to monolithic approaches, and pushed existing companies to attempt to reverse their efforts to decompose their prior monoliths, with mixed results.

Notes on running Go in the browser with WebAssembly

Recently I've had to compile Go to WebAssembly to run in the browser in a couple of small projects (#1, #2), and in general spent some time looking at WebAssembly. I find WebAssembly to be an exciting technology, both for the web and for other uses (e.g. with WASI); specifically, it's pretty great that we can take existing projects and components written in Go and run them in the browser.

In this post, I will summarize some useful patterns in running Go in the browser via WebAssembly. All the patterns are demonstrated by small, self-contained programs you can find in this GitHub repository.

Basics: calling Go from JS

This sample serves as the basis for other samples in this post: let's write a Go function that we'll call in the browser using JS. This function uses Go's math/big stdlib package to calculate the sum of the harmonic series for some duration [1], and returns the result with high precision:

// calcHarmonic calculates the harmonic series for approximately the given
// number of seconds and returns the accumulated result in a string.
func calcHarmonic(nsecs float64) string {
  d := time.Duration(nsecs * float64(time.Second))
  start := time.Now()
  r1 := big.NewRat(1, 1)
  for i := 2; ; i++ {
    addend := big.NewRat(1, int64(i))
    r1 = r1.Add(r1, addend)

    if i%10 == 0 && time.Now().Sub(start) >= d {
      break
    }
  }
  return r1.FloatString(40)
}

To export this function to JS in the browser, we add the following code:

func main() {
  // Export the name "calcHarmonic" to JS, with our wrapper as value
  js.Global().Set("calcHarmonic", jsCalcHarmonic)

  // The Go main function compiled to WASM is expected to block
  // indefinitely.
  select {}
}

// wrap calcHarmonic to be callable from JS
var jsCalcHarmonic = js.FuncOf(func(this js.Value, args []js.Value) any {
  if len(args) != 1 {
    panic("want one argument")
  }

  s := calcHarmonic(args[0].Float())
  return js.ValueOf(s)
})

This Go file is compiled to the WASM/js target with:

GOOS=js GOARCH=wasm go build -o harmonic.wasm harmonic.go

And load it from JS:

// Instantiate a new Go object (defined in wasm_exec.js)
const go = new Go();
WebAssembly.instantiateStreaming(fetch("harmonic.wasm"), go.importObject).then(
    (result) => {
        go.run(result.instance);
    });
Shows the UI of our "calculate harmonic sum" demo

The JS code that calls calcHarmonic is:

let buttonElement = document.getElementById("submitButton");
document.getElementById("submitButton").addEventListener("click", () => {
    let input = document.getElementById("timeInput").value;
    let s = calcHarmonic(parseFloat(input));
    document.getElementById("outputDiv").innerText = s;
});

Finally, the wasm_exec.js file from the Go distribution has to be included with something like:

<script src="wasm_exec.js"></script>

The easiest way to obtain this file is to download it from the Go project's GitHub mirror (for the same Go version your Go code is compiled with); this is handled by the Makefile in our sample project:

wasm_exec.js:
  wget https://raw.githubusercontent.com/golang/go/release-branch.go1.22/misc/wasm/wasm_exec.js

This is the basic recipe for invoking Go from JS in the browser: the Go code is platform-agnostic and presents some API and all the glue logic is done in JS. The next samples show some variations on this basic scheme.

Link to the full code for this sample.

DOM manipulation from Go

In the previous example, Go implemented the calcHarmonic function, but the rest of the program's logic was in JS - setting up an event listener for a button click, updating output, etc.

We can move more of the code to Go, if we want. The calcHarmonic remains unchanged, but our main function in Go becomes:

func main() {
  doc := js.Global().Get("document")
  buttonElement := doc.Call("getElementById", "submitButton")
  inputElement := doc.Call("getElementById", "timeInput")
  outputElement := doc.Call("getElementById", "outputDiv")

  buttonElement.Call("addEventListener", "click", js.FuncOf(
    func(this js.Value, args []js.Value) any {
      input := inputElement.Get("value")
      inputFloat, err := strconv.ParseFloat(input.String(), 64)
      if err != nil {
        log.Println(err)
        return nil
      }
      s := calcHarmonic(inputFloat)
      outputElement.Set("innerText", s)
      return nil
    }))

  select {}
}

We obtain JS values from the js.Global() context and can call functions or set attributes on them. If you squint, this looks very similar to JS code, but written in Go-ish.

This code sample demonstrates some useful capabilities of DOM manipulation in Go:

  • Adding event listeners on DOM elements, with Go callbacks
  • Getting values from DOM elements
  • Setting attributes on DOM elements

The only JS code remaining in our index.html is the WebAssembly loader:

const go = new Go();
WebAssembly.instantiateStreaming(fetch("harmonic.wasm"), go.importObject).then(
    (result) => {
        go.run(result.instance);
    });

All the rest is done in Go! Link to the full code for this sample.

For a more full-featured sample, check out this directory. It implements a simple Game of Life running in the browser, entirely in Go. All the game logic, canvas manipulation and event management is done in Go; here too, the only JS code in the project is the few lines used to load the WebAssembly module.

Game of Life screenshot

I personally prefer keeping the UI logic in JS, but if you're interested in Go purity all the way - it's definitely feasible.

Using TinyGo as an alternative compiler

The Go compiler's support for WebAssembly is pretty good these days, but there's a small snag that may be important to users: the entire Go runtime is compiled into the WASM binary. On my machine, the .wasm files produced for the sample Go code weigh in at around 2.5 MiB, which will take some time to load in the browser - especially on slow connections [2].

There's an alternative: TinyGo is a Go toolchain "for small places", specializing in embedded controllers; the same considerations apply to WASM. The TinyGo runtime is lightweight compared to Go, and the binaries are about 1/4 the size. Not everything is perfect with TinyGo, though: compilation is much slower, and the resulting code is a bit slower as well. Finally, TinyGo has some limitations that prevent stdlib packages relying on reflection from working; this can be painful when interacting with JS, because encoding/json relies on reflection - so you may need to look for an alternative JSON package.

The dom-in-go sample directory also shows how to build the project with TinyGo; take a look at the Makefile. Note that TinyGo has its own wasm_exec.js support file - it won't work with the one taken from the standard Go distribution; the Makefile handles this too.
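For reference, a TinyGo build for the browser looks roughly like this (the exact flags can vary between TinyGo releases, so treat the sample's Makefile as the source of truth):

tinygo build -o harmonic.wasm -target wasm harmonic.go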

Keeping the main thread free: WebAssembly in a web worker

If we come back to the original sample and run the calculation for some non-trivial amount of time (say, 2 seconds or more) - you may notice something: the page appears "frozen" while the calculation is running. You can't interact with the UI in any way, can't select text with the mouse; if you try to add periodic console.log printouts or some spinner animation - nothing will show until calcHarmonic returns with the result.

This is the expected behavior for JS when it calls a blocking, CPU-intensive function! Let's revisit the code again:

 let buttonElement = document.getElementById("submitButton");
 document.getElementById("submitButton").addEventListener("click", () => {
     let input = document.getElementById("timeInput").value;
     let s = calcHarmonic(parseFloat(input));
     document.getElementById("outputDiv").innerText = s;
 });

The line that calls calcHarmonic will block the main thread for 2+ seconds, but the main thread in JS is also used for all the UI interaction. This is one of the most common manifestations of the function coloring problem - blocking is problematic. Luckily, all modern browsers support Web Workers - isolated threads that can execute concurrently.

It's not hard to make web workers work with WebAssembly, which is what our next demo shows. The main HTML file includes, in addition to the UI logic:

const worker = new Worker("worker.js");
worker.onmessage = ({ data }) => {
    let { action, payload } = data;
    switch (action) {
        case "log":
            console.log(`worker.log: ${payload}`);
            break;
        case "result":
            resultReady(payload);
            break;
        default:
            console.error(`Unknown action: ${action}`);
    }
};

Where worker.js is:

importScripts("wasm_exec.js");
console.log("Worker is running");

// Load the WASM module with Go code.
const go = new Go();
WebAssembly.instantiateStreaming(fetch("harmonic.wasm"), go.importObject).then(
    (result) => {
        go.run(result.instance);
        console.log("Worker loaded WASM module");
    }).catch((err) => {
        console.error("Worker failed to load WASM module: ", err)
    });

onmessage = ({ data }) => {
    let { action, payload } = data;
    postMessage({
        action: "log",
        payload: `Worker received message ${action}: ${payload}`,
    });
    switch (action) {
        case "calculate":
            let result = calcHarmonic(payload);
            postMessage({ action: "result", payload: result });
            break;
        default:
            throw (`unknown action '${action}'`);
    }
};

(The Go code remains unchanged.)
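The button handler on the main thread isn't reproduced here, but given the message protocol above it presumably looks something like this sketch (element IDs reused from the earlier samples; not the sample's verbatim code):

document.getElementById("submitButton").addEventListener("click", () => {
    let input = document.getElementById("timeInput").value;
    // Ask the worker to run calcHarmonic off the main thread; the result
    // arrives via the "result" message handled by worker.onmessage above.
    worker.postMessage({ action: "calculate", payload: parseFloat(input) });
});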

We see that the worker does the WebAssembly loading now, meaning that the Go code executes in a separate thread and the UI thread is free to run while the computation is ongoing. This sample adds a spinner that animates until the web worker returns calcHarmonic's answer, to show the effect.

Shows the UI of our "calculate harmonic sum" demo with a spinner

Link to the full code for this sample.


[1]The harmonic series is known to diverge, but very slowly. You need over 200 million elements to get to the sum of 20, etc. (see A004080).
[2]There are some additional mitigations we can explore, like compressing the WASM binary. This is outside the scope of this post, and it applies to the TinyGo output as well.

On nefarious people

Napoleon used to say that the worst people to have at your side are idiots with initiative.

I would add:

  • Those who misunderstand on purpose
  • Those who insist on making rules out of exceptions

The article "Sobre pessoas nefastas" was originally published on the TRILUX site, by Augusto Campos.

Reasons I still love the fish shell

I wrote about how much I love fish in this blog post from 2017 and, 7 years of using it every day later, I’ve found even more reasons to love it. So I thought I’d write a new post with both the old reasons I loved it and some new ones.

This came up today because I was trying to figure out why my terminal doesn’t break anymore when I cat a binary to my terminal, the answer was “fish fixes the terminal!“, and I just thought that was really nice.

1. no configuration

In 10 years of using fish I have never found a single thing I wanted to configure. It just works the way I want. My fish config file just has:

  • environment variables
  • aliases (alias ls eza, alias vim nvim, etc)
  • the occasional direnv hook fish | source to integrate a tool like direnv
  • a script I run to set up my terminal colours

I’ve been told that configuring things in fish is really easy if you ever do want to configure something though.

2. autosuggestions from my shell history

My absolute favourite thing about fish is that as I type, it’ll automatically suggest (in light grey) a matching command that I ran recently. I can press the right arrow key to accept the completion, or keep typing to ignore it.

Here’s what that looks like. In this example I just typed the “v” key and it guessed that I want to run the previous vim command again.

2.5 “smart” shell autosuggestions

One of my favourite subtle autocomplete features is how fish handles autocompleting commands that contain paths in them. For example, if I run:

$ ls blah.txt

that command will only be autocompleted in directories that contain blah.txt – it won’t show up in a different directory. (here’s a short comment about how it works)

As an example, if in this directory I type bash scripts/, it’ll only suggest history commands including files that actually exist in my blog’s scripts folder, and not the dozens of other irrelevant scripts/ commands I’ve run in other folders.

I didn’t understand exactly how this worked until last week, it just felt like fish was magically able to suggest the right commands. It still feels a little like magic and I love it.

3. pasting multiline commands

If I copy and paste multiple lines, bash will run them all, like this:

[bork@grapefruit linux-playground (main)]$ echo hi
hi
[bork@grapefruit linux-playground (main)]$ touch blah
[bork@grapefruit linux-playground (main)]$ echo hi
hi

This is a bit alarming – what if I didn’t actually want to run all those commands?

Fish will paste them all at a single prompt, so that I can press Enter if I actually want to run them. Much less scary.

bork@grapefruit ~/work/> echo hi

                         touch blah
                         echo hi

4. nice tab completion

If I run ls and press tab, it’ll display all the filenames in a nice grid. I can use either Tab, Shift+Tab, or the arrow keys to navigate the grid.

Also, I can tab complete from the middle of a filename – if the filename starts with a weird character (or if it’s just not very unique), I can type some characters from the middle and press tab.

Here’s what the tab completion looks like:

bork@grapefruit ~/work/> ls 
api/  blah.py     fly.toml   README.md
blah  Dockerfile  frontend/  test_websocket.sh

I honestly don’t complete things other than filenames very much so I can’t speak to that, but I’ve found the experience of tab completing filenames to be very good.

5. nice default prompt (including git integration)

Fish’s default prompt includes everything I want:

  • username
  • hostname
  • current folder
  • git integration
  • status of last command exit (if the last command failed)

Here’s a screenshot with a few different variations on the default prompt, including if the last command was interrupted (the SIGINT) or failed.

6. nice history defaults

In bash, the maximum history size is 500 by default, presumably because computers used to be slow and not have a lot of disk space. Also, by default, commands don’t get added to your history until you end your session. So if your computer crashes, you lose some history.

In fish:

  1. the default history size is 256,000 commands. I don’t see any reason I’d ever need more.
  2. if you open a new tab, everything you’ve ever run (including commands in open sessions) is immediately available to you
  3. in an existing session, the history search will only include commands from the current session, plus everything that was in history at the time that you started the shell

I’m not sure how clearly I’m explaining how fish’s history system works here, but it feels really good to me in practice. My impression is that the way it’s implemented is the commands are continually added to the history file, but fish only loads the history file once, on startup.

I’ll mention here that if you want to have a fancier history system in another shell it might be worth checking out atuin or fzf.

7. press up arrow to search history

I also like fish’s interface for searching history: for example if I want to edit my fish config file, I can just type:

$ config.fish

and then press the up arrow to go back the last command that included config.fish. That’ll complete to:

$ vim ~/.config/fish/config.fish

and I’m done. This isn’t so different from using Ctrl+R in bash to search your history but I think I like it a little better over all, maybe because Ctrl+R has some behaviours that I find confusing (for example you can end up accidentally editing your history which I don’t like).

8. the terminal doesn’t break

I used to run into issues with bash where I’d accidentally cat a binary to the terminal, and it would break the terminal.

Every time fish displays a prompt, it’ll try to fix up your terminal so that you don’t end up in weird situations like this. I think this is some of the code in fish to prevent broken terminals.

Some things that it does are:

  • turn on echo so that you can see the characters you type
  • make sure that newlines work properly so that you don’t get that weird staircase effect
  • reset your terminal background colour, etc

I don’t think I’ve run into any of these “my terminal is broken” issues in a very long time, and I actually didn’t even realize that this was because of fish – I thought that things somehow magically just got better, or maybe I wasn’t making as many mistakes. But I think it was mostly fish saving me from myself, and I really appreciate that.

9. Ctrl+S is disabled

Also related to terminals breaking: fish disables Ctrl+S (which freezes your terminal and then you need to remember to press Ctrl+Q to unfreeze it). It’s a feature that I’ve never wanted and I’m happy to not have it.

Apparently you can disable Ctrl+S in other shells with stty -ixon.

10. fish_add_path

I have mixed feelings about this one, but in Fish you can use fish_add_path /opt/whatever/bin to add a path to your PATH, globally, permanently, across all open shell sessions. This can get a bit confusing if you forget where those PATH entries are configured but overall I think I appreciate it.

11. nice syntax highlighting

By default commands that don’t exist are highlighted in red, like this.

12. easier loops

I find the loop syntax in fish a lot easier to type than the bash syntax. It looks like this:

for i in *.yaml
  echo $i
end

Also it’ll add indentation in your loops which is nice.

13. easier multiline editing

Related to loops: you can edit multiline commands much more easily than in bash (just use the arrow keys to navigate the multiline command!). Also when you use the up arrow to get a multiline command from your history, it’ll show you the whole command the exact same way you typed it instead of squishing it all onto one line like bash does:

$ bash
$ for i in *.png
> do
> echo $i
> done
$ # press up arrow
$ for i in *.png; do echo $i; done

14. Ctrl+left arrow

This might just be me, but I really appreciate that fish has the Ctrl+left arrow / Ctrl+right arrow keyboard shortcut for moving between words when writing a command.

I’m honestly a bit confused about where this keyboard shortcut is coming from (the only documented keyboard shortcut for this I can find in fish is Alt+left arrow / Alt + right arrow which seems to do the same thing), but I’m pretty sure this is a fish shortcut.

A couple of notes about getting this shortcut to work / where it comes from:

  • one person said they needed to switch their terminal emulator from the “Linux console” keybindings to “Default (XFree 4)” to get it to work in fish
  • on Mac OS, Ctrl+left arrow switches workspaces by default, so I had to turn that off.
  • Also apparently Ubuntu configures libreadline in /etc/inputrc to make Ctrl+left/right arrow go back/forward a word, so it’ll work in bash on Ubuntu and maybe other Linux distros too. Here’s a stack overflow question talking about that

a downside: not everything has a fish integration

Sometimes tools don’t have instructions for integrating them with fish. That’s annoying, but:

  • I’ve found this has gotten better over the last 10 years as fish has gotten more popular. For example Python’s virtualenv has had a fish integration for a long time now.
  • If I need to run a POSIX shell command real quick, I can always just run bash or zsh
  • I’ve gotten much better over the years at translating simple commands to fish syntax when I need to

My biggest day-to-day annoyance is probably that, for whatever reason, I’m still not used to fish’s syntax for setting environment variables; I get confused about set vs set -x.
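For reference, the difference comes down to whether the variable is exported to child processes; a minimal example:

set FOO bar       # regular fish variable: visible in this shell, not exported
set -x FOO bar    # exported, so child processes (scripts, env, etc.) see FOO
set -e FOO        # erase the variable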

on POSIX compatibility

When I started using fish, you couldn’t do things like cmd1 && cmd2 – it would complain “no, you need to run cmd1; and cmd2” instead.

It seems like over the years fish has started accepting a little more POSIX-style syntax than it used to, like:

  • cmd1 && cmd2
  • export a=b to set an environment variable (though this seems a bit limited, you can’t do export PATH=$PATH:/whatever so I think it’s probably better to learn set instead)

on fish as a default shell

Changing my default shell to fish is always a little annoying, I occasionally get myself into a situation where

  1. I install fish somewhere like maybe /home/bork/.nix-stuff/bin/fish
  2. I add the new fish location to /etc/shells as an allowed shell
  3. I change my shell with chsh
  4. at some point months/years later I reinstall fish in a different location for some reason and remove the old one
  5. oh no!!! I have no valid shell! I can’t open a new terminal tab anymore!

This has never been a major issue because I always have a terminal open somewhere where I can fix the problem and rescue myself, but it’s a bit alarming.

If you don’t want to use chsh to change your shell to fish (which is very reasonable, maybe I shouldn’t be doing that), the Arch wiki page has a couple of good suggestions – either configure your terminal emulator to run fish or add an exec fish to your .bashrc.

I’ve never really learned the scripting language

Other than occasionally writing a for loop interactively on the command line, I’ve never really learned the fish scripting language. I still do all of my shell scripting in bash.

I don’t think I’ve ever written a fish function or if statement.

I ran a highly unscientific poll on Mastodon asking people what shell they use interactively. The results were (of 2600 responses):

  • 46% bash
  • 49% zsh
  • 16% fish
  • 5% other

I think 16% for fish is pretty remarkable, since (as far as I know) there isn’t any system where fish is the default shell, and my sense is that it’s very common to just stick to whatever your system’s default shell is.

It feels like a big achievement for the fish project, even if maybe my Mastodon followers are more likely than the average shell user to use fish for some reason.

who might fish be right for?

Fish definitely isn’t for everyone. I think I like it because:

  1. I really dislike configuring my shell (and honestly my dev environment in general), I want things to “just work” with the default settings
  2. fish’s defaults feel good to me
  3. I don’t spend that much time logged into random servers using other shells so there’s not too much context switching
  4. I liked its features so much that I was willing to relearn how to do a few “basic” shell things, like using parentheses (seq 1 10) to run a command instead of backticks or using set instead of export

Maybe you’re also a person who would like fish! I hope a few more of the people who fish is for can find it, because I spend so much of my time in the terminal and it’s made that time much more pleasant.

The elephant had a Fanta plant

Back around 7th grade, our teacher had us write concrete poetry or something like that.

A girl named Joemar went up to recite hers, which was all tidy rhymes and meter, and told the story of a female elephant (an "elefanta" in the original) who had a Fanta plant ("uma planta de Fanta").

Right at the first verse the (Portuguese) teacher interrupted to say that "elefanta" is not a word.

Joemar carried on unshaken, and in the final stanza revealed, as part of the original text, that she was light-years ahead and knew it all along: the poem concluded by saying that "elefanta" doesn't exist, but poetry is there precisely to transgress.

The article "A elefanta tinha uma planta de Fanta" was originally published on the TRILUX site, by Augusto Campos.

Pablo Picasso was already rejecting AI in 1969

When Pablo Picasso said, in 1969, that computers were uninteresting because all they could do was give answers, he was already explaining why today's expectations for "AI" are unfounded.

The article "Pablo Picasso já rejeitava a IA em 1969" was originally published on the TRILUX site, by Augusto Campos.

Multiversion Python Thoughts

Now that uv is rapidly advancing I have started to dive back into making multi-version imports for Python work. The goal here is to enable multiple resolutions from the solver in uv so that two incompatible versions of a library can be installed and used simultaneously.

Put simply, it should be possible for a library to depend on both pydantic 1.x and 2.x simultaneously.

I have not made it work yet, but I think I have found all of the pieces that stand in the way. This post mostly exists to share how it could be done with the least amount of changes to Python.

Basic Operation

Python's import system places modules in a module cache. This cache is exposed via sys.modules. Every module that is imported is placed in that container prior to initialization. The key is the import path of the module. This in some ways presents the first issue.

Note on Terms for Packages, Modules and Distributions

Python's terms for packages are super confusing. Here is what I will use in this article:

  • foo.py: this is a Python “module”. It gets registered in sys.modules as 'foo' and has an attribute __name__ set to 'foo'.
  • foo/__init__.py: also declares a Python “module” named 'foo', but it is simultaneously a “package”. Unlike a normal module it has two extra attributes: __path__, which is set to ['./foo'] so that submodules can be found, and __package__, which is also set to 'foo' and marks it as a package.
  • Additionally, on PyPI one can register things. These things were called packages at one point and are now mostly called “projects”. Within Python, however, they are not called projects but “distribution packages”. For instance, this is what you see when you use the importlib.metadata API. For now I will just call this a “distribution”.

Note that a distribution can ship multiple modules and packages at once. You could have a distribution called whatever that ships a foo.py file and a bar/baz.py file, which in turn would make foo and bar.baz importable.

Say you have two Python distributions, both of which provide the same toplevel package. In that case they are going to clash in sys.modules. As there is no actual relationship between the distribution name and the entry in sys.modules, this is a problem that does not just exist with multi-version imports, but it's one that does not happen all that much.

So let's say we have two distributions: [email protected] and [email protected]. Both expose a toplevel module called foo which is a true Python package with a single __init__.py file. The installer would already fail to place these because one fully overrides the other.

So step 1 would be to place these modules in different places. Where they normally would be in site-packages, in this case we might want to not have these packages there. That solves the file system clashes for us.

So we might place them in some extra cache that looks like this:

.venv/
    multi-version-packages/
        [email protected]/
            foo/
                __init__.py
        [email protected]/
            foo/
                __init__.py

Now these packages are entirely non-importable, since nothing looks at multi-version-packages. We will need a custom import hook to get them imported. That import hook will also need to change the name of what's stored in sys.modules; the sketch below shows the renaming idea in isolation, before we get to the hook itself.
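This is purely illustrative (load_versioned is a hypothetical helper, not part of uv or the stdlib): it loads one versioned copy from the layout above, registers it under a mangled key, handles only a self-contained top-level package, and skips the actual import hook:

import importlib.util
import sys

def load_versioned(name, version, root=".venv/multi-version-packages"):
    # Hypothetical helper: load <root>/<name>@<version>/<name>/__init__.py and
    # register it in sys.modules under the mangled key '<name>@<version>'.
    key = f"{name}@{version}"
    if key in sys.modules:
        return sys.modules[key]
    spec = importlib.util.spec_from_file_location(
        key,
        f"{root}/{key}/{name}/__init__.py",
        submodule_search_locations=[f"{root}/{key}/{name}"],
    )
    module = importlib.util.module_from_spec(spec)
    # Register before executing, mirroring what the real import machinery does.
    sys.modules[key] = module
    spec.loader.exec_module(module)
    return module

foo_v1 = load_versioned("foo", "1.0.0")
foo_v2 = load_versioned("foo", "2.0.0")

This works for the toy layout, but only as long as foo's __init__.py doesn't import anything from its own package: absolute imports of foo still resolve through the regular machinery, and relative imports now resolve against the mangled name. That is part of why a real solution needs a proper import hook.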

So instead of registering foo as sys.modules['foo'] we might want to try to register it as sys.modules['[email protected]'] and sys.modules['[email protected]'] instead. There is however a catch and that is this very common pattern:

import sys

def import_module(name):
    __import__(name)
    return sys.modules[name]

That poses a bit of a problem because someone is probably going to call this as import_module('foo') and now we would not find the entry in sys.modules.

This means that in addition to the new entries in sys.modules we would also need to register some proxies that “redirect” us to the real names. These proxies however would need to know if they point to 1.0.0 or 2.0.0.

Metadata

So let's deal with this problem first. How do we know if we need 1.0.0 or 2.0.0? The answer is most likely a package's dependencies. Instead of allowing a package to depend simultaneously on two different versions of the same dependency, we can start with a much simpler problem and say that each package can only depend on one version. That means if I have a myapp package it would have to pick between foo@1.0.0 or foo@2.0.0. However, if it were to depend on another package (say slow-package), that one could depend on a different version of foo than myapp:

myapp v0.1.0
├── foo v2.0.0
└── slow-package v0.1.0
    └── foo v1.0.0

In that case, when someone tries to import foo, we would consult the package metadata of the calling package to figure out which version is intended.
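
Assuming we already knew the name of the calling distribution, that lookup could look roughly like the sketch below. pinned_foo_version is a hypothetical helper, it uses the third-party packaging library for parsing, and it assumes the resolver wrote exact pins (like foo==1.0.0) into the metadata, which is itself part of the proposal:

import importlib.metadata
from packaging.requirements import Requirement

def pinned_foo_version(calling_distribution):
    # Walk the calling distribution's declared requirements and return the
    # exact version of "foo" it pinned, if any.
    for line in importlib.metadata.requires(calling_distribution) or []:
        req = Requirement(line)
        if req.name == "foo":
            # e.g. the specifier "==1.0.0" becomes "1.0.0"
            return str(req.specifier).lstrip("=")
    return None

# In the tree above, pinned_foo_version("slow-package") would ideally
# return "1.0.0" and pinned_foo_version("myapp") would return "2.0.0".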

There are two challenges with this today and they come from the history of Python:

  1. the import hook does not (always) know which module triggered the import
  2. python modules do not know their distribution package

Let's look at these in detail.

Import Context

The goal is that when slow_package/__init__.py imports foo we get the foo@1.0.0 version, and when myapp/__init__.py imports foo we get the foo@2.0.0 version. What is needed for this to work is that the import system understands not just what is imported, but who is importing. In some sense Python has that, because __import__ (which is the entry point to the import machinery) gets the module globals. Here is what an import statement roughly maps to:

# highlevel import
from foo import bar

# under the hood
_rv = __import__('foo', globals(), locals(), ['bar'])
bar = _rv.bar

The name of the package that is importing can be retrieved by inspecting globals(): in theory the import system could use this information, and globals()['__name__'] would tell us slow_package vs myapp. There is however a catch, and that is that the import name is not the distribution name. The PyPI package could be called mycompany-myapp while it exports a Python package just called myapp. This happens very commonly in all kinds of ways; for instance, on PyPI one installs scikit-learn but the Python package installed is sklearn.
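
To illustrate the dispatch part of this idea (ignoring the distribution-name problem for a moment), here is a rough, hypothetical sketch of a globals-aware import shim. The VERSION_MAP and the foo@1.0.0-style keys in sys.modules are assumptions standing in for whatever the installer and resolver would actually materialize:

import builtins
import sys

# Hypothetical: which version of "foo" each importing package should see.
# A real implementation would derive this from resolved dependency metadata.
VERSION_MAP = {
    "myapp": "foo@2.0.0",
    "slow_package": "foo@1.0.0",
}

_original_import = builtins.__import__

def versioned_import(name, globals=None, locals=None, fromlist=(), level=0):
    if name == "foo" and globals is not None:
        # globals()["__name__"] tells us who is importing, e.g. "myapp.utils".
        importer = globals.get("__name__", "").split(".")[0]
        target = VERSION_MAP.get(importer)
        if target is not None and target in sys.modules:
            return sys.modules[target]
    return _original_import(name, globals, locals, fromlist, level)

builtins.__import__ = versioned_import

Note that this keys the map by import name rather than distribution name, which is exactly the mismatch described above.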

There is however another problem and that is interpreter internals and C/Rust extensions. We have already established that Python packages will pass globals and locals when they import. But what do C extensions do? The most common internal import API is called PyImport_ImportModule and only takes a module name. Is this a problem? Do C extensions even import stuff? Yes they do. Here is an example from pygame:

MODINIT_DEFINE (color)
{
     PyObject *colordict;

     colordict = PyImport_ImportModule ("pygame.colordict");

     if (colordict)
     {
         PyObject *_dict = PyModule_GetDict (colordict);
         PyObject *colors = PyDict_GetItemString (_dict, "THECOLORS");
         /* TODO */
     }
     else
     {
         MODINIT_ERROR;
     }

     /* snip */
 }

And that makes sense. A sufficiently large Python package will have interdependencies between the parts written in C and the parts written in Python. It's also complicated by the fact that the C extension does initialize a module, but it does not have a natural module scope. The way the C extension initializes the module is with the PyModule_Create API:

static struct PyModuleDef module_def = {
    PyModuleDef_HEAD_INIT,
    "foo",        /* name of module */
    NULL,         /* module documentation */
    -1,           /* per-interpreter state size (-1: keep state in globals) */
    SpamMethods   /* method table, defined elsewhere */
};

PyMODINIT_FUNC
PyInit_foo(void)
{
    return PyModule_Create(&module_def);
}

So both the name of the module created and the name of what is imported are entirely hardcoded. A C extension is never told what the intended name is; it has to hardcode it on its own.

In some sense this is already a bit of a disconnect between the Python and C world. Python for instance has relative imports (from .foo import bar). This is implemented by inspecting the globals. There is however no API to do these relative imports on the C layer.

The only workaround I know right now would be to perform stack walking. That way one would try to isolate the shared library that triggered the import to understand which module it comes from. An alternative would be to carry the current C extension module that is active on the interpreter state, but that would most likely be quite expensive.

The goal would be to find out which .so/.dylib file triggered the import. Stack walking is a rather expensive operation and it can be incredibly brittle, but there might not be a perfect way around it. Ideally Python would, at any point, know which C extension module is active.

Distributions from Modules

So let's say that we have the calling Python module figured out: now we need to figure out the associated PyPI distribution name. Unfortunately such a mapping does not exist at all. Ideally, when a sys.modules entry is created, we would either record a special attribute there (say __distribution__) carrying the PyPI distribution name, so we can call importlib.metadata.distribution(__distribution__).requires to get the requirements, or we would have some other API to map it.

In the absence of that, how could we get it? There is an expensive way to get a reverse mapping (importlib.metadata.packages_distributions) but unfortunately it has some limitations:

  1. it's very slow
  2. it has situations where it does not manage to reveal the distribution for a package
  3. it can reveal more than one distribution for a package

Because of namespace packages in particular, it can return more than one distribution that provides a package such as foo (e.g. foo-bar provides foo.bar and foo-baz provides foo.baz; in that case it will return both foo-bar and foo-baz for foo).
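
For reference, this is roughly what that reverse mapping looks like in practice (the sklearn example assumes scikit-learn happens to be installed):

import importlib.metadata

# Build the reverse mapping from top-level import names to the distributions
# that provide them. This scans every installed distribution, hence the cost.
mapping = importlib.metadata.packages_distributions()

print(mapping.get("sklearn"))   # e.g. ['scikit-learn']
print(mapping.get("foo"))       # may list several distributions for namespace packages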

The solution here might just be that installers like uv start materializing the distribution name onto the modules in one way or another.

Putting it Together

The end to end solution might be this:

  1. install multi-version packages outside of site-packages
  2. materialize a __distribution__ field onto modules or provide an API that maps import names to their PyPI distribution name so that metadata (requirements) can be discovered.
  3. patch __import__ to resolve packages to their fully-qualified, multi-version name based on who imports it
    • via globals() for python code
    • via stack-walking for C extensions (unless a better option is found)
  4. register proxy entries in sys.modules that have a dynamic __getattr__ which redirects to the fully qualified names if necessary. This would allow someone to access sys.modules['foo'] and have it automatically proxy to foo@1.0.0 or foo@2.0.0 respectively, as sketched below.
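
A very rough sketch of what such a proxy entry could look like follows. The resolve_for_caller callable is hypothetical, and deciding what it should do is exactly the hard part discussed above:

import sys
import types

class VersionProxy(types.ModuleType):
    # Proxy registered under the plain name ("foo"); attribute access is
    # forwarded to whichever fully-qualified entry the caller should see.
    def __init__(self, name, resolve_for_caller):
        super().__init__(name)
        self._resolve_for_caller = resolve_for_caller

    def __getattr__(self, attr):
        # resolve_for_caller() would return e.g. "foo@1.0.0" or "foo@2.0.0".
        target = sys.modules[self._resolve_for_caller()]
        return getattr(target, attr)

# Registration sketch:
# sys.modules["foo"] = VersionProxy("foo", resolve_for_caller)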

There are lots of holes in this approach, unfortunately. That's in part because people patch around in sys.modules. Interestingly enough, sys.modules can be manipulated but it can't be replaced, which might make it possible for future versions of Python to swap that dictionary out for something more magical.

Making progress on side projects with content-driven development

It's hard to make progress on side projects sometimes. Getting started is easy when we see the bright future of the project. Then somewhere in the middle, we get stuck in lists of tasks to do, a long way in and still a long way from the finish line.

This happens to me as much as anyone. In the last couple of years, I stumbled into a way to avoid getting stuck midway in my projects. It's not just about productivity: this also lets me let go of things when I reach a good stopping point. It helps me figure out what that stopping point should be.

Where projects get stuck

There are a lot of reasons that projects get stuck. Maybe we don't know how to do something, and it's a big hurdle to overcome. Or maybe life gets in the way, we put down the project, and we never pick it back up. Or we make progress, but we keep adding features that we think it "needs." Or as time drags on, we just kind of lose interest and the project peters out before it ever got shipped.

For me, each of these shares a common characteristic: an overwhelming task list.

It can be overwhelming for different reasons. If the next task is something we don't know how to do, then even a short list can be overwhelming: it's tough to decide to work on something you have no idea how to approach! Or if the task list is long and you've put the project down for some time, it can be overwhelming to decide where to dive back in. And when it's just a big project, it's hard to keep interest levels high for the duration, so eventually there's a point where that interest dips below what you need to keep going.

I've had each of these happen to me, but losing interest in a big project over a long time is the most common. My interests can shift around a lot and I like to do varied things, so if a project takes too long, it might end up on the shelf, half finished, never to be worked on again. Besides, I usually started it to learn something, and the rest of the project can feel like a lot of busywork to get there.

But if I get a quick win with something, that dopamine hit can keep the interest up, and make it easier to keep going.

Sequencing via content

When I started writing more regularly, I noticed that I was also making more regular progress on my projects. This was not just because I wanted things to write about, but because writing about my projects changes how I approach them. Instead of building one big task list, I think about what small things I can work toward to write about.

You see this in workplaces sometimes. One form is with agile development in general, where each sprint is supposed to result in something shippable to deliver value. This can also look like demo-driven development, where each sprint you try to get a good demo ready to show off.

What these have in common, and why they work, is that they make you sequence things into smaller deliverable chunks. When I'm working on a project I want to write about, I think about which pieces I can work on independently to get something that is worth writing about, worth explaining to people. Maybe that means that for this upcoming project, I'll do the parser first, and write a post about some of the neat things with parsers! Or maybe it means that we cut out things like permissions for a demo web app, since that's not at all core to what we're doing (where what we're doing is not deployed software).

By thinking about the sequence for sharing updates, it is a lot easier to cut away the cruft and focus on the core. Each time you get to share your work, that can give a good dopamine hit. It also gives a good motivation for some of the things you might not be as excited about!

Where to share progress

A lot of my updates are posted on my blog or in my newsletter, but there are so many ways to do content-driven development! It mostly comes down to where you want to share your progress.

A few formats that I've seen work well are:

  • Blog posts: this is my default, because I like blogging!
  • YouTube videos: if you're into video instead of writing, you can also make demo and update videos and post those.
  • Microblogs: a lighter-weight alternative to blogging; posting on places like Mastodon can be a good way to share updates.
  • Forums: I'm specifically thinking of the Recurse Center forums/chat tool where people post check-ins periodically with progress. These sorts of groups can be great for sharing!
  • With a friend: You can also just share periodically with one or two other people! It doesn't have to be very public.

As I wrote this out, I realized that I do almost all of these. I ultimately end up with blog posts, but along the way I share tidbits on Mastodon and in my RC threads, and I tell friends about exciting things as I go.

You decide when you're done

Projects don't last forever, at least until we figure out immortality. One bonus of working on projects through the lens of writing about my progress/learnings is that I stop more intentionally.

Without this lens, I look at all the features I don't have as something that's wrong, a place where I've failed. With this perspective, though, I look at the features I don't have as places where I chose to stop because they were orthogonal to my goals!

If you are focused on what you want to learn and how to share or communicate that, then you'll have that in mind as you pick what to work on. This will let you be aware of the things that are not important to you so you can let go of them. You can separate the wheat from the chaff and just get what is really important, then abandon the rest. It's freeing!

Let the projects continue as long as they need to for you to get what you want out of them. Once you've learned what you came for, demonstrated what you wanted to, or made that useful little tool for yourself? Then you can just be... done. Even if the task list isn't.

SDF letter spacing

My summer project is to work on labels for maps. In the previous post I described how I created outlines, and how I had a bug in the rendering. While looking closely at text to fix that bug, I noticed in one of my tests that the k and s seemed too close together. The h and e seemed a little too far apart.

Letter spacing issues

The spacing between characters is normally set in the metrics for the characters. There's also a way to override the spacing between specific pairs of characters, using "kerning tables", named kern and GPOS in TrueType. I don't have kerning implemented.

Hypothesis: I need to implement kerning. To test this, I rendered k and s with other letters to see if either one was the issue:

Testing spacing of s and k

It does seem like k is generally close to the letter to the right of it. So that suggests it's the letter k and not a kerning issue.

But I was curious about kerning, so I checked the font data, and this font (Fira Sans) doesn't have a kerning table in it. So that means k really is a little too close to its neighbor. I verified this by checking the rendering in Google Chrome on the Google Fonts web site (second image) and comparing it to my rendering (first image):

Spacing issues

Sure enough, both the h + e and k + s spacing issues are there. So that's just how the font is. Ok, I guess there's nothing I can do here, at least for this font. Later I will try other fonts, and then I can revisit this issue.
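
If you want to check whether some other font file ships kerning data, a minimal sketch using the fontTools library looks like this (the path is a placeholder, and the presence of a GPOS table doesn't by itself prove it contains kerning pairs):

from fontTools.ttLib import TTFont

# Path is a placeholder; point it at whichever font file you want to inspect.
font = TTFont("FiraSans-Regular.ttf")
tables = set(font.keys())

# Legacy kerning lives in the "kern" table; OpenType pair adjustments in "GPOS".
print("kern table present:", "kern" in tables)
print("GPOS table present:", "GPOS" in tables)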

I was extremely pleased that my font renderer looked so close to Google's, not only the spacing but also the shapes. Looking closely at the edges led me down another rabbit hole … a tale for the next blog post.

Two solutions for the price of one

To find an effective solution to TWO problems that feed each other, you need to remember that the cause of procrastination is usually not laziness, but anxiety.

The article "Duas soluções pelo preço de uma" was originally published on TRILUX, Augusto Campos's site.