Bridging theory and practice in engineering strategy.

Some people I’ve worked with have lost hope that engineering strategy actually exists within any engineering organizations. I imagine that they, reading through the steps to build engineering strategy, or the strategy for navigating private equity ownership, are not impressed. Instead, these ideas probably come across as theoretical at best. In less polite company, they might describe these ideas as fake constructs.

Let’s talk about it! Because they’re right. In fact, they’re right in two different ways. First, this book is focused on explaining how to create clean, refined, and definitive strategy documents, whereas initially most real strategy artifacts look rather messy. Second, applying these techniques in practice can require a fair amount of creativity. It might sound easy, but it’s quite difficult in practice.

This chapter will cover:

  • Why strategy documents need to be clear and definitive, especially when strategy development has been messy
  • How to iterate on strategy when there are demands for unrealistic timelines
  • Using strategy as non-executives, where others might override your strategy
  • Handling dynamic, quickly changing environments where diagnosis can change frequently
  • Working with indecisive stakeholders who don’t provide clarity on approach
  • Surviving other people’s bad strategy work

Alright, let’s dive into the many ways that praxis doesn’t quite line up with theory.

This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

Clear and definitive documents

As explored in Making engineering strategies more readable, documents that feel intuitive to write are often fairly difficult to read. That’s because thinking tends to be a linear-ish journey from a problem to a solution. Most readers, on the other hand, usually just want to know the solution and then move on. That’s because good strategies are read for direction (e.g. when a team wants to understand how they’re supposed to solve a specific issue at hand) far more frequently than they’re read to build agreement (e.g. building stakeholder alignment during the initial development of the strategy).

However, many organizations only produce writer-oriented strategy documents, and may not have any reader-oriented documents at all. If you’ve predominantly worked in those sorts of organizations, then the first reader-oriented documents you encounter will seem artificial.

There are also organizations that have many reader-oriented documents, but omit the rationale behind those documents. Those documents feel prescriptive and heavy-handed, because the infrequent reader who does want to understand the thinking can’t find it. Further, when they want to propose an alternative, they have to do so without the rationale behind the current policies: the absence of that context often transforms what was a collaborative problem-solving opportunity into a political match.

With that in mind, I’d encourage you to see the frequent absence of these documents as a major opportunity to drive strategy within your organization, rather than evidence that these documents don’t work. My experience is that they do.

Doing strategy despite unrealistic timelines

The most frequent failure mode I see for strategy is when it’s rushed, and its authors accept that thinking must stop when the artificial deadline is reached. Taking annual planning at Stripe as an example, Claire Hughes Johnson argued that planning expands to fit any timeline, and consequently set a short planning timeline of several weeks. Some teams accepted that as a fixed timeline and stopped planning when the timeline ended, whereas effective teams never stopped planning before or after the planning window.

When strategy work is given an artificially short or unrealistic timeline, you should still deliver the best draft you can. Afterwards, rather than being finished, you should view yourself as starting the refinement process. An open strategy secret is that many strategies never leave the refinement phase, and continue to be tweaked throughout their lifespan. Why should a strategy with an early deadline be any different?

Well, there is one important problem to acknowledge: I’ve often found that the executive who initially provided the unrealistic timeline intended it as a forcing function to inspire action and quick thinking. If you have a discussion with them directly, they’re usually quite open to adjusting the approach. However, the intermediate layers of leadership between that executive and you often calcify on a particular approach which they claim that the executive insists on precisely following.

Sometimes having the conversation with the responsible executive is quite difficult. In that case, you do have to work with individuals taking the strategy as literal and unalterable until either you can have the conversation or something goes wrong enough that the executive starts paying attention again. Usually, though, you can find someone who has a communication path, as long as you can articulate the issue clearly.

Using strategy as non-executives

Some engineers will argue that the only valid strategy altitude is the highest one defined by executives, because any other strategy can be invalidated by a new, higher altitude strategy. They would claim that teams simply cannot do strategy, because executives might invalidate it. Some engineering executives would argue the same thing, instead claiming that they can’t work on an engineering strategy because the missing product strategy or business strategy might introduce new constraints.

I don’t agree with this line of thinking at all. To do strategy at any altitude, you have to come to terms with the certainty that new information will show up, and you’ll need to revise your strategy to deal with that.

Uber’s service provisioning strategy is a good counterexample to the idea that you have to wait for someone else to set the strategy table. We were able to find a durable diagnosis despite being a relatively small team within a much larger organization that was relatively indifferent to helping us succeed. When it comes to using strategy, effective diagnosis trumps authority. In my experience, at least as many executives’ strategies are ravaged by reality’s pervasive details as are overridden by higher altitude strategies. The only way to be certain your strategy will fail is to wait until you’re certain that no new information might show up and require it to change.

Doing strategy in chaotic environments

How should you adopt LLMs? discusses how a company should plot a path through the rapidly evolving LLM ecosystem. Periods of rapid technology evolution are one reason why your strategy might encounter a pocket of chaos, but there are many others. Pockets of rapid hiring, as well as layoffs, create chaos. The departure of load-bearing senior leaders can change a company quickly. Slowing revenue in a company’s core business can also initiate chaotic actions in pursuit of a new business.

Strategies don’t require stable environments. Instead, strategies require awareness of the environment they’re operating in. In a stable period, a strategy might expect to run for several years with relatively little deviation from the initial approach. In a dynamic period, the strategy might recognize that you can only protect capacity in two-week chunks before a new critical initiative pops up. It’s possible to do good strategy in either scenario, but it’s impossible to do good strategy if you don’t diagnose the context effectively.

Unreliable information

Oftentimes, the way forward would be very obvious if a few key decisions were made; you know who is supposed to make those decisions, but you simply cannot get them to decide. My most visceral experience of this was conducting a layoff where the CEO wouldn’t define a target cost reduction or a thesis of how much various functions (e.g. engineering, marketing, sales) should contribute to those reductions. With those two decisions, engineering’s approach would have been obvious, and without that clarity things felt impossible.

Although I was frustrated at the time, I’ve since come to appreciate that missing decisions are the norm rather than the exception. The strategy on Navigating Private Equity ownership deals with this problem by acknowledging a missing decision, and expressly blocking one part of its execution on that decision being made. Other parts of its plan, like changing how roles are backfilled, went ahead to address the broader cost problem.

Rather than blocking on missing information, your strategy should acknowledge what’s missing, and move forward where you can. Sometimes that’s moving forward by taking risk, sometimes that’s delaying for clarity, but it’s never accepting yourself as stuck without options other than pointing a finger.

Surviving other people’s bad strategy work

Sometimes you will be told to follow something which is described as a strategy, but is really just a policy without any strategic thinking behind it. This is an unavoidable element of working in organizations and happens for all sorts of reasons. Sometimes, your organization’s leader doesn’t believe it’s valuable to explain their thinking to others, because they see themselves as the one important decision maker.

Other times, your leader doesn’t agree with a policy they’ve been instructed to roll out. Adoption of “high hype” technologies like blockchain during the crypto boom was often top-down direction from company leadership that engineering disagreed with, but was obligated to align with. In this case, your leader is finding it hard to explain a strategy that they themselves don’t understand either.

This is a frustrating situation. What I’ve found most effective is writing a strategy of my own, one that acknowledges the broader strategy I disagree with in its diagnosis as a static, unavoidable truth. From there, I’ve been able to make practical decisions that recognize the context, even if it’s not a context I’d have selected for myself.

Summary

I started this chapter by acknowledging that the steps to building engineering strategy are a theory of strategy, and one that can get quite messy in practice. Now you know why strategy documents often come across as overly pristine: because they’re trying to communicate clearly about a complex topic.

You also know how to navigate the many ways reality pulls you away from perfect strategy, such as unrealistic timelines, higher altitude strategies invalidating your own strategy work, working in a chaotic environment, and dealing with stakeholders who refuse to align with your strategy. Finally, we acknowledged that sometimes strategy work done by others is not what we’d consider strategy; it’s often unsupported policy with neither a diagnosis nor an approach to operating the policy.

That’s all stuff you’re going to run into, and it’s all stuff you’re going to overcome on the path to doing good strategy work.

System Design - Saga Pattern

The year this text was published was marked by interesting professional experiences in which I was able to solve very complex distributed-systems problems using the Saga model. So, as great as it was to compile here all the bibliographic references and material I consumed over that period, it was also extremely challenging to strip out the specifics of the work I did and keep the suggestions free of an excess of particulars from my own scenarios.

It is always wonderful to look at a finished piece on microservices, architecture, and distributed systems, but this chapter in particular was delivered with great joy. I hope it serves everyone who is looking for references and experience with this kind of implementation.


What is the SAGA model?

A Saga transaction is an architectural pattern that aims to guarantee data consistency in distributed transactions, especially in scenarios where those transactions depend on continuous execution across multiple microservices or take a long time to fully complete, and where any partial execution is undesirable.

The term Saga comes from the literal sense of a saga: the concept evokes an adventure, a story, a hero's journey, a journey spanning several chapters in which the “hero” must accomplish objectives, face challenges, push past certain limits, and complete a predestined goal. Within a Saga Pattern implementation, a Saga has a sequential nature, in which the transaction depends on several microservices to complete, with steps that must be executed one after another in an ordered, distributed fashion.

The implementation of these steps can vary between Choreographed and Orchestrated approaches, which are explored further on. Regardless of the approach chosen, the main goal is to manage transactions that touch data in different microservices and databases, or that are long-running, and to guarantee that every step executes without losing consistency and control; if any component fails, whether through systemic errors or invalid input data, the saga must be able to notify all of its participants to compensate the transaction by rolling back every step already executed.

Keep in mind that the main purpose of the Saga model is to guarantee reliability and consistency, not performance. In fact, most of its nuances pay a performance price to achieve those goals.

The Historical Origin of the Saga Pattern

It is not the habit of this series to dive too deeply into the academic and historical details of the topics covered. Still, it is worth highlighting the origins of the Saga Pattern and the problem it was originally conceived to solve.

The SAGAS paper

The Saga Pattern was first published by Hector Garcia-Molina and Kenneth Salem in 1987, in a paper for the Department of Computer Science at Princeton University titled SAGAS. The paper set out to tackle the problem of Long Lived Transactions (LLTs) on the computers of the time, when people were already looking for a way to handle processes that took longer than traditional operations and could not simply lock computational resources until they finished.

As mentioned, the term “Saga” alludes to stories that unfold in smaller chapters; in other words, the proposal was to break a Long Lived Transaction into several smaller transactions, each of which could be committed or undone independently. This turned one long atomic operation into small atomic transactions, with a pragmatic level of supervision.

So, although the Saga model was not initially designed to manage consistency across microservices, but rather to handle computational processes in databases, it has been revisited over time. As microservices and distributed systems became more common in the corporate environment, the principles of the Saga Pattern proved useful for handling failures and guaranteeing consistency in these modern, distributed architectures.


The problem of handling distributed transactions

A distributed transaction is one that must happen across multiple systems and databases in order to be completed. By definition, it needs multiple participants writing and committing their data for it to succeed, and reporting their write status to whoever is coordinating the transaction.

Let's imagine the order system of a large e-commerce site. The main job of this system is to receive an order request and execute every action needed to guarantee that the order is fully carried out, from request to delivery. To do that, it must interact with several microservices relevant to this hypothetical flow, such as an Order Service, a Payment Service, an Inventory Service, a Delivery Service, and a Notification Service that notifies the customer at every step of the order.

Example of an initial distributed process

In a complex architecture with multiple interconnected services, each isolated domain has to guarantee its part of the execution sequence for the order to complete successfully. As the number of components grows, complexity grows with it, increasing the likelihood of failures and inconsistencies.

Example of an error in a distributed transaction

Imagine that, while these steps are running, one of the services fails for some reason that isn't systemic in resilience terms, such as an item missing from inventory or the inventory service receiving invalid data. In these situations it may be impossible to continue calling downstream services, such as the delivery service, even though critical steps, such as payment processing, have already completed successfully. In that case, knowing about and undoing the earlier steps can become a complicated problem.

This scenario represents a serious distributed-consistency problem. Without proper mechanisms, the system can end up in an inconsistent state in which the payment was made but the order was never completed. The Saga Pattern is a solution that tackles exactly this kind of problem, ensuring that even in the event of failures the system maintains data integrity and returns to a consistent state across all the services that make up the transaction.


The problem of handling long-running transactions

In many scenarios, complex processes take somewhat longer to complete in full. For example, a request inside a system that must pass through several execution steps can take anywhere from milliseconds to weeks or months to finish completely.

The waiting time between one microservice's execution and the next can vary intentionally due to factors such as scheduling, external stimuli, grouping records within time windows, and so on. Examples include installment billing control, financial scheduling, consolidating usage allowances for digital products, grouping requests for batch processing, invoice closing, and tracking customers' use of a system's resources.

Managing the lifecycle of these long-running transactions is a significant architectural challenge, especially in terms of consistency and completion. You need mechanisms that let you control transactions end to end in complex scenarios, monitor every step the transaction has passed through, and determine and manage the transaction's current state in a transparent, durable way. The Saga Pattern addresses these problems by decomposing long transactions into a series of smaller, independent transactions, each managed by a specific microservice. That makes it easier to guarantee consistency and to recover from failures in terms of operational resilience.


The Saga Transaction Proposal

Wrapping up the problem statement above, the Saga Pattern is an architectural pattern designed to handle distributed transactions that rely on eventual consistency across multiple microservices.

The idea behind applying the Saga Pattern is to decompose a long, complex transaction into a sequence of smaller, coordinated transactions, which are managed to guarantee the consistency and the success or failure of the execution, and above all to guarantee data consistency across different services that follow the “One Database Per Service” model.

Each Saga corresponds to a pseudo-atomic transaction within the system, where each request corresponds to the execution of an isolated operation. These sagas consist of a group of smaller operations that happen locally in each of the saga's microservices. Besides providing the means to guarantee that every step completes, the Saga Pattern defines compensating transactions to undo the operations already executed if one of the saga's operations fails, ensuring that the system stays consistent even in the middle of a failure.

When applied in asynchronous approaches, the Saga proposal eliminates the need for long, synchronous locks, as in the case of the Two-Phase Commit (2PC), which are computationally expensive and can become performance bottlenecks in distributed environments. These kinds of long locks are also complicated to re-establish after failures.

There are two main models for implementing the Saga Pattern: the Orchestrated model and the Choreographed model. Each has different architectural characteristics for coordinating and communicating the Saga's transactions. The choice between them depends on the specific needs of how the system was designed and, above all, must take the complexity of the transactions into account.


Orchestrated Model

The Orchestrated model proposes a centralized orchestration component that manages the execution of sagas. The Orchestrator is responsible for starting the saga, coordinating the sequence of transactions, monitoring the responses, and managing the compensation flow in case of failures. It acts as a control plane that sends commands to the participating microservices and waits for their responses in order to decide the next steps or continue the saga.

Illustration of the Orchestrated model

Consider that, to complete a purchase-order transaction, you need to invoke and wait for confirmation from a series of domains such as payments, inventory, notifications, and delivery. Those are many distributed components, each with its own limitations, scaling capacity, usage modes, and contracts, and they need to be called in a sequential, logical order for the transaction to complete. Assuming an asynchronous approach, an orchestrator uses the command/response pattern to invoke those microservices and, based on each one's response, trigger the next microservice in the saga, compensate the operations already performed in case of failure, or conclude and end the saga. An orchestrator can also work synchronously if needed, but resilience mechanisms that come “natively” with messaging, such as backoff, retries, and DLQs, have to be implemented manually to keep the saga's execution healthy.

So the orchestrator's job is basically to build a “map of the saga” with every step that must be completed to finish it, send messages and events to the respective microservices and, based on their responses, move forward and trigger the next step of the Saga until it is fully complete, or compensate the operations already performed in case of failure.

The orchestrated model depends on implementing a State Machine pattern, which must be able to manage the current state and, based on responses, change that state and take an action based on the new state. That way we can control the orchestration in a centralized, safe manner, concentrating the complexity of orchestrating microservices in a single component, where we can measure every step, track the start and end of the saga's execution, keep history, change state transactionally, and so on.

Command / Response Model in Saga Transactions

In modern Saga Pattern implementations, especially in the orchestrated model, many of the interactions between the Saga's participants happen asynchronously and reactively. In this approach, the saga orchestrator (or a requesting service outside the saga pattern) sends a command for another microservice to perform an action, and waits for the response in a blocking or semi-blocking way before moving on to the next step of the Saga.

Command and response model for asynchronous flows

This presumes that the exposed services need to expose one topic for the action and another for the responses to that action, so the orchestrator or requesting service knows where to send the command and where to wait for the response reporting its success or failure.
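
As a rough illustration of the command/response idea, the sketch below stands in for the command and reply topics with in-memory queues; a real implementation would use a broker such as Kafka or RabbitMQ, and all names here are invented for the example:

import queue

# In-memory stand-ins for the two topics a payment participant would expose:
# one for commands sent by the orchestrator, one for its replies.
payment_commands = queue.Queue()
payment_replies = queue.Queue()

def payment_service_worker():
    # Consumes one command and publishes a success or failure reply.
    command = payment_commands.get()
    approved = command["amount"] <= 100      # toy business rule
    payment_replies.put({
        "saga_id": command["saga_id"],
        "status": "PAYMENT_APPROVED" if approved else "PAYMENT_DENIED",
    })

# Orchestrator side: send the command, then wait for the reply before the next step.
payment_commands.put({"saga_id": "order-42", "amount": 80})
payment_service_worker()                     # in reality this runs in another service
reply = payment_replies.get()
print(reply)                                 # {'saga_id': 'order-42', 'status': 'PAYMENT_APPROVED'}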


Choreographed Model

The Choreographed model, unlike the Orchestrated one, which proposes a centralized component that knows every step of the saga, proposes that the microservices themselves know the next and previous services. This means the saga executes in a service-mesh-like fashion where, in a complex case, a microservice that is called and finishes its processing knows the next microservice and the protocol it uses to expose its functionality. That microservice takes responsibility for triggering the next step, and so on, until the saga is finished.

Choreographed Saga model

The same logic applies to compensation and rollback operations, where the service that failed must notify the previous one or hit a “panic button” so that the whole upstream chain rolls back the steps that were already confirmed.

Choreographed Saga model, compensation flow

The choreographed model, even though at first glance it is simpler and offers fewer guarantees than the orchestrated one, also works as an enabler of synchronous flows for saga architectures.
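
As a tiny sketch of the choreographed flow, each handler below does its local work and emits the event for the next service, while a failure emits a compensation event that the previous service reacts to. The service and event names are invented for the example:

# Choreographed saga sketch: each handler only knows which event comes next.
handlers = {}

def on(event_name):
    def register(fn):
        handlers[event_name] = fn
        return fn
    return register

def emit(event_name, payload):
    handler = handlers.get(event_name)
    if handler:
        handler(payload)

@on("ORDER_CREATED")
def reserve_stock(payload):
    print("stock: reserving items")
    emit("STOCK_RESERVED", payload)

@on("STOCK_RESERVED")
def charge_payment(payload):
    print("payments: charging")
    if payload.get("card_ok", True):
        emit("PAYMENT_APPROVED", payload)
    else:
        emit("PAYMENT_FAILED", payload)      # triggers compensation upstream

@on("PAYMENT_FAILED")
def release_stock(payload):
    print("stock: compensating, releasing items")

@on("PAYMENT_APPROVED")
def schedule_delivery(payload):
    print("delivery: scheduled")

emit("ORDER_CREATED", {"order_id": "42", "card_ok": False})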


Architectural Adoptions

Saga approaches can vary and extend into several architectural patterns. In this section we'll cover some of the patterns and approaches I considered most important and relevant to keep in mind when evaluating a Saga architecture for a project.

State Machines in the Saga Model

In distributed architectures, keeping track of the state of every step a saga must perform before it can be considered complete is perhaps the most critical concern. This kind of control lets us identify which sagas are still pending or have failed, and at which step that happened, enabling mechanisms for monitoring, retries, saga resumption, compensation on errors, and so on.

Saga state transitions

A State Machine's job is to handle states, events, transitions, and actions.

The States represent the machine's current state and the possible states of the system. The current state descriptively corresponds to the transaction's status, literally something like Started, Scheduled, Payment Completed, Delivery Scheduled, Finished, and so on. The Events correspond to relevant notifications from the process that may or may not change the machine's current state. For example, one of the steps might emit the events Payment Approved or Item Not Available in Stock, events that can change the saga's planned course. These events may or may not produce a State Transition. The Transitions correspond to the change from one valid state to another valid state as the result of a received event. For example, if a record's state is Stock Reserved and the payment system emits the Payment Completed event, that can notify the machine and transition the state to Schedule Delivery. If the emitted event is Payment Refused, the machine's state can transition to Order Canceled, for example. When transitioning from one State to another, the machine executes an Action in order to move the execution forward. In the previous example, on entering the Schedule Delivery state, the machine needs to invoke the delivery microservice.

State transitions

Within a saga model, the current state corresponds to the saga itself, and events are the inputs and outputs of the microservices and steps that are invoked. A state machine must be able to store the current state and, upon receiving some kind of change event, determine whether there will be a new state transition and, if so, which action it should take as a result.

Saga Lifecycle

Imagine the saga is started, creating a new record in the state machine that represents the start of an order-closing saga. This initial state could be considered NOVO. Within the saga's mapping, we understand that when the state is NOVO, we must guarantee that the order domain has recorded all the data related to the request for analytical purposes.

Example of the saga's transition and action flow

As soon as the order service confirms the record has been written, the state can transition to RESERVANDO, where the next step of the saga takes care of reserving the item in stock. After receiving confirmation of that reservation, the state becomes RESERVADO, and the billing process then begins, changing the state to COBRANDO. At that point the payment system is notified and may take some time to respond with whether or not the payment went through.

On success, the state changes to COBRADO, and the delivery system is notified about which items must be delivered and the destination address, so the state transitions to INICIAR_ENTREGA. From there we could have several intermediate states in which additional actions, such as sending email notifications, are performed. Examples include SEPARACAO, SEPARADO, DESPACHADO, EM_ROTA, and ENTREGUE. Finally, the saga reaches the FINALIZADO state and is considered fully complete.

On the other hand, if the payment system, starting from the COBRANDO state, moves to a failure state such as PAGAMENTO_NEGADO or NAO_PAGO, the saga must notify the reservation system to release the items so they can be made available for purchase again, in addition to updating the order system's analytical state.

In general terms, the state machine follows logic along these lines:

  • What event did I just receive? → COBRADO COM SUCESSO
  • What is my current state? → COBRANDO
  • If my state is COBRANDO and I receive COBRADO COM SUCESSO, which state should I move to? → INICIAR_ENTREGA
  • What action should I take on entering the INICIAR_ENTREGA state? → Notify the delivery system.

Basically, the control works by asking: “What event is this?”, “Where am I now?”, “Where do I go next?” and, finally, “What should I do here?”.
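
To make the state-machine mechanics concrete, here is a minimal sketch in Python of that question-and-answer loop. The state names, events, and actions are illustrative placeholders, not part of any specific framework:

def notify_delivery_service(saga_id):
    print(f"[saga {saga_id}] notifying the delivery service")

def release_inventory(saga_id):
    print(f"[saga {saga_id}] releasing reserved items")

# Maps (current state, received event) -> (next state, action to run on entry).
TRANSITIONS = {
    ("COBRANDO", "COBRADO COM SUCESSO"): ("INICIAR_ENTREGA", notify_delivery_service),
    ("COBRANDO", "PAGAMENTO_NEGADO"): ("PEDIDO_CANCELADO", release_inventory),
}

class SagaStateMachine:
    def __init__(self, saga_id, initial_state="NOVO"):
        self.saga_id = saga_id
        self.state = initial_state

    def handle_event(self, event):
        # "What event is this?", "Where am I now?", "Where do I go?", "What do I do here?"
        key = (self.state, event)
        if key not in TRANSITIONS:
            return                          # event causes no transition from this state
        next_state, action = TRANSITIONS[key]
        self.state = next_state
        action(self.saga_id)

saga = SagaStateMachine("order-42", initial_state="COBRANDO")
saga.handle_event("COBRADO COM SUCESSO")    # -> INICIAR_ENTREGA, delivery notified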


Saga Logs and Transaction Traceability

Keeping records of every step of the transaction can be extremely valuable, both in simpler sagas and, especially, in more complex ones, although it can become costly if retained over the long term. The main advantage of keeping coordinated state is enabling traceability of every saga: the ones that completed, the ones still in progress, and the ones that ended in error.

We can consider data structures and models that make it possible to fully trace every step that was started and finished. That way, the centralized component, in the case of orchestrated models, records and documents the steps executed, along with their responses, making programmatic or manual control easier.

Saga log

With that in place, it is easy to check which sagas ran into errors, keeping those records in the data layer. These resources provide the inputs to build resilience mechanisms smart enough to monitor, resume, restart, or retry the steps that failed, as well as helping build an analytical view of how the service journey executes.

Saga log with an error


Action and Compensation Models in the Saga Pattern

Designing distributed systems means accepting a trade-off in which we acknowledge that we will constantly be fighting data-consistency problems. Compensation patterns within Saga transactions guarantee that all the steps, executed sequentially, can be reverted in case of failure.

Just as the Saga model exists to guarantee that every healthy transaction executes successfully, the compensation model ensures that, in the event of a failure, whether due to invalid data, availability problems that cannot be recovered within the Saga's SLA, balance, payment, or credit-limit issues, stock availability, or invalid input, the actions are fully reverted, allowing the system to return to a consistent state and preventing only part of the transaction from being confirmed while the rest fails.

Action and compensation functionality

An efficient way to design handlers that receive stimuli and execute some step of the saga, whether through API endpoints or event or message listeners, is to expose those handlers alongside their reversal methods. That way there is always one handler that performs the action and another that undoes it. For example, reservaPassagens() and liberaPassagens(), cobrarPedido() and estornarCobranca(), or incrementarUso() and decrementarUso().
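
A sketch of what such a handler pair might look like for an inventory participant; the names mirror the examples above and the storage is an in-memory dict purely for illustration, where a real service would use API endpoints or listeners backed by a database:

reservations = {}

def reserve_items(saga_id, items):
    # Action: reserve items for a saga step.
    reservations[saga_id] = items
    return {"saga_id": saga_id, "status": "RESERVED"}

def release_items(saga_id):
    # Compensation: undo the reservation if the saga fails later on.
    reservations.pop(saga_id, None)
    return {"saga_id": saga_id, "status": "RELEASED"}

print(reserve_items("order-42", ["sku-1", "sku-2"]))
print(release_items("order-42"))             # called only when compensating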

Once we have the tools the chosen orchestration model needs in order to invoke the microservices responsible for the requested actions, we can secure the saga's so-called “happy path”.

Action flow

With the Action and Compensation model in place, the saga orchestrator can also “hit the panic button” when necessary, notifying all participating microservices to undo the actions that were confirmed. In an event-driven or messaging architecture that supports this kind of transaction, we can create a saga compensation topic with multiple consumer groups, so that each one receives the same message and performs the compensation if the transaction has already been confirmed in that particular service.

Compensation flow


Dual Write Problems in Saga Transactions

Dual Write is known both as a problem and as a classic pattern in distributed architectures. It frequently shows up in scenarios where certain operations need to write data to two different places, whether a database and a cache, a database and an external API, two distinct APIs, or a database and a queue or topic. In essence, whenever we need to guarantee atomic writes across multiple places, we are facing this kind of challenge.

To illustrate the problem in practice in an application that uses the Saga Pattern, consider an example where the operation must be confirmed in one place but the other is unavailable. In that case the confirmation will not be atomic, since the two writes should have been considered together to keep the data consistent.

In the choreographed model, for an operation to be completed in full, each microservice executes its actions locally against its own database and then publishes an event to the broker for the next service to continue the flow. That would be the saga's “happy path”, with no consistency problems so far.

Choreographed model, dual write example

Consistency problems appear, for example, when the data is not saved in the database but the event is emitted anyway, or when the data is saved correctly but, because the message broker is unavailable, the event is never emitted. In both cases the system can end up in an inconsistent state.

Choreographed model, dual write failure example

In the orchestrated model the same problem can occur, if in a slightly different form. In a command-and-response scenario between the orchestrator and the microservices, if one of them fails while trying to guarantee the double write (between its dependencies and the response channel), we can end up with a lost saga, in which intermediate steps are never confirmed and get “stuck” mid-process for lack of a response or confirmation.

Orchestrated model, dual write failure example

Guaranteeing that every step executes with the proper atomicity is perhaps the biggest complexity in implementing a Saga model. The control mechanisms need enough systemic resources to deal with failures, adopting retries, saga supervision processes, and ways to identify sagas that were started long ago and have not yet completed or are in an inconsistent state. The most efficient alternative inside an ACID database, for example, is to publish the event inside a database transaction and only commit the data modification once the communication steps have completed, guaranteeing that either all of the steps take effect or none of them do.

Outbox Pattern and Change Data Capture in Saga Transactions

The Outbox Pattern has already been mentioned a few times, though in the context of other problems. Here we can use it to give the execution and control of the saga's steps a transactional quality: an additional relay process in an orchestrated model reads a synchronous “queue” in the database, checks which steps of which sagas are pending, and only removes them from that queue once all of the step's execution processes have actually completed.

This is an interesting approach for shielding against Dual Write problems and for helping the application stay resilient during total or partial outages of its dependencies.
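
A minimal sketch of the outbox idea using SQLite: the business write and the pending event are stored in the same local transaction, and a relay loop reads the outbox and deletes rows only after publishing succeeds. Table and column names are illustrative assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, saga_id TEXT, event TEXT)")

def place_order(saga_id):
    # Business write and outbox write share one local transaction,
    # so either both are persisted or neither is (no dual write gap).
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (saga_id, "CREATED"))
        conn.execute("INSERT INTO outbox (saga_id, event) VALUES (?, ?)",
                     (saga_id, "ORDER_CREATED"))

def relay_once(publish):
    # Relay: publish pending events, removing each one only after success.
    rows = conn.execute("SELECT id, saga_id, event FROM outbox").fetchall()
    for row_id, saga_id, event in rows:
        publish(saga_id, event)              # e.g. send to a broker topic
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))

place_order("order-42")
relay_once(lambda saga_id, event: print(f"published {event} for {saga_id}"))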

Change Data Capture

Change Data Capture mechanisms can be used to handle transporting the data to the next system. This approach can be implemented in either architectural variant of the Saga Pattern, although handling the transactions programmatically, manually controlling execution, fallbacks, and the business logic of the saga's steps, is the better fit for the orchestrated model, given the orchestrator's very purpose.

Two-Phase Commit in Saga Transactions

Although the examples in this chapter have assumed asynchronous orchestration to detail the Saga implementations, it is worth exploring topics that help us maintain certain levels of consistency in a synchronous context, typical of a client/server (request/reply) approach.

Two-Phase Commit (2PC) is a well-known pattern for distributed systems. It proposes that, in a transaction with several participants, there is a coordinator capable of guaranteeing that all of them are “pre-confirmed” (ready to write the transaction) before the changes are actually applied to their respective states, performing the confirmation in two phases. If any of the participants does not confirm that it is ready to commit its state, none of them receives the commit command. Beyond microservice implementations, this pattern is also widely used in replication strategies.

Two-Phase Commit executed successfully

The 2PC protocol brings a sense of atomicity to the distributed services that make up a transaction, since the coordinator sends confirmation requests to each participant before carrying out the commit. This approach can be very valuable in Saga transactions that require validating every step before the whole thing completes, especially in synchronous scenarios where the client is waiting for an immediate response and the operation can often be aborted abruptly, without the chance to compensate steps that have already executed.

Two-Phase Commit executed with an error

If any of the services does not respond successfully, or does not respond in time for the transaction's coordination mechanism, the coordinator sends a rollback signal so that none of the participants treats the pending transactions as valid.
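
A toy sketch of the two-phase flow described above: the coordinator asks every participant to prepare, and only sends commit if all of them vote yes; otherwise it sends rollback. The participant interface is an assumption for illustration:

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self):
        # Phase 1 vote: is this participant ready to commit?
        print(f"{self.name}: {'prepared' if self.can_commit else 'vote NO'}")
        return self.can_commit

    def commit(self):
        print(f"{self.name}: committed")

    def rollback(self):
        print(f"{self.name}: rolled back")

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):
        # Phase 2: every participant voted yes, apply the changes everywhere.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote (or, in a real system, a timeout) aborts the whole transaction.
    for p in participants:
        p.rollback()
    return False

two_phase_commit([Participant("payments"), Participant("inventory", can_commit=False)])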

As useful as this pattern is, it can also become a performance bottleneck in high-demand environments, since it needs to manage multiple open connections at all times across different contexts. One way to optimize this kind of approach is to adopt communication protocols that make it easier to manage long-lived connections, such as gRPC, which can keep bidirectional connections open and reuse them across many requests.

Saga Restart Mechanisms

Even though the Saga Pattern's coordination mechanisms provide plenty of guard rails for executing transactions, systemic surprises can still happen, leaving the microservices with inconsistent state. In that scenario, you need to make business decisions about how to handle significant failures among the saga's participants: opt for mass compensation or for some saga restart strategy.

In the case of a saga restart, it is essential that every microservice implements idempotency controls, so it can receive the same command multiple times without producing unexpected errors. For example, if a hotel room reservation service repeatedly receives the same reservation request for the same room and the same user, it should accept the operation without overwriting or changing state, and return the appropriate success response. That makes state resynchronization processes much easier.

When the coordination process (whether orchestrated or choreographed) receives stimuli to start a new saga with unique identifiers or idempotency keys that already exist for another saga, it can restart the saga from scratch or check which steps were left incomplete, resuming them from the point where no response was received, thereby guaranteeing the consistency of the transactions.
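
A minimal sketch of an idempotent handler keyed on an idempotency key: repeated deliveries of the same command return the stored result instead of executing again. Names are illustrative:

processed = {}                               # idempotency_key -> stored result

def reserve_room(idempotency_key, room, guest):
    if idempotency_key in processed:
        return processed[idempotency_key]    # safe replay, no state change
    result = {"room": room, "guest": guest, "status": "RESERVED"}
    processed[idempotency_key] = result
    return result

print(reserve_room("saga-42-step-3", "101", "alice"))
print(reserve_room("saga-42-step-3", "101", "alice"))   # same answer, no duplicate booking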

Thanks to the Reviewers

References

SAGAS - Department of Computer Science Princeton University

Saga distributed transactions pattern

Pattern: SAGA

The Saga Pattern in a Reactive Microservices Environment

Enhancing Saga Pattern for Distributed Transactions within a Microservices Architecture

Model: 8 types of sagas

Saga Pattern in Microservices

SAGA Pattern para microservices

Saga Pattern — Um resumo com Caso de Uso (Pt-Br)

Distributed Sagas: A Protocol for Coordinating Microservices

What is a Saga in Microservices?

Try-Confirm-Cancel (TCC) Protocol

Microservices Patterns: The Saga Pattern

Compensating Actions, Part of a Complete Breakfast with Sagas

Getting started with small-step operational semantics

Microserviços e o problema do Dual Write

Solving the Dual-Write Problem: Effective Strategies for Atomic Updates Across Systems

Outbox Pattern(Saga): Transações distribuídas com microservices

Saga Orchestration for Microservices Using the Outbox Pattern

Martin Kleppmann - Distributed Systems 7.1: Two-phase commit

Distributed Transactions & Two-phase Commit


Reverse mode Automatic Differentiation

Automatic Differentiation (AD) is an important algorithm for calculating the derivatives of arbitrary functions that can be expressed by a computer program. One of my favorite CS papers is "Automatic differentiation in machine learning: a survey" by Baydin, Pearlmutter, Radul and Siskind (ADIMLAS from here on). While this post attempts to be useful on its own, it serves best as a followup to the ADIMLAS paper - so I strongly encourage you to read that first.

The main idea of AD is to treat a computation as a nested sequence of function compositions, and then calculate the derivative of the outputs w.r.t. the inputs using repeated applications of the chain rule. There are two methods of AD:

  • Forward mode: where derivatives are computed starting at the inputs
  • Reverse mode: where derivatives are computed starting at the outputs

Reverse mode AD is a generalization of the backpropagation technique used in training neural networks. While backpropagation starts from a single scalar output, reverse mode AD works for any number of function outputs. In this post I'm going to be describing how reverse mode AD works in detail.

While reading the ADIMLAS paper is strongly recommended, it's not required; there is one mandatory prerequisite for this post: a good understanding of the chain rule of calculus, including its multivariate formulation. Please read my earlier post on the subject first if you're not familiar with it.

Linear chain graphs

Let's start with a simple example where the computation is a linear chain of primitive operations: the Sigmoid function.

\[S(x)=\frac{1}{1+e^{-x}}\]

This is a basic Python implementation:

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

To apply the chain rule, we'll break down the calculation of S(x) to a sequence of function compositions, as follows:

\[\begin{align*} f(x)&=-x\\ g(f)&=e^f\\ w(g)&=1+g\\ v(w)&=\frac{1}{w} \end{align*}\]

Take a moment to convince yourself that S(x) is equivalent to the composition v\circ(w\circ(g\circ f))(x).

The same decomposition of sigmoid into primitives in Python would look as follows:

def sigmoid(x):
    f = -x
    g = math.exp(f)
    w = 1 + g
    v = 1 / w
    return v

Yet another representation is this computational graph:

Computational graph showing sigmoid

Each box (graph node) represents a primitive operation, along with the name assigned to it (the green rectangle on the right of each box). Arrows (graph edges) represent the flow of values between operations.

Our goal is to find the derivative of S w.r.t. x at some point x_0, denoted as S'(x_0). The process starts by running the computational graph forward with our value of x_0. As an example, we'll use x_0=0.5:

Computational graph with forward calculation at 0.5

Since all the functions in this graph have a single input and a single output, it's sufficient to use the single-variable formulation of the chain rule.

\[(g \circ f)'(x_0)={g}'(f(x_0)){f}'(x_0)\]

To avoid confusion, let's switch notation so we can explicitly see which derivatives are involved. For f(x) and g(f) as before, we can write the derivatives like this:

\[f'(x)=\frac{df}{dx}\quad g'(f)=\frac{dg}{df}\]

Each of these is a function we can evaluate at some point; for example, we denote the evaluation of f'(x) at x_0 as \frac{df}{dx}(x_0). So we can rewrite the chain rule like this:

\[\frac{d(g \circ f)}{dx}(x_0)=\frac{dg}{df}(f(x_0))\frac{df}{dx}(x_0)\]

Reverse mode AD means applying the chain rule to our computation graph, starting with the last operation and ending at the first. Remember that our final goal is to calculate:

\[\frac{dS}{dx}(x_0)\]

Where S is a composition of multiple functions. The first composition we unravel is the last node in the graph, where v is calculated from w. This is the chain rule for it:

\[\frac{dS}{dw}=\frac{d(S \circ v)}{dw}(x_0)=\frac{dS}{dv}(v(x_0))\frac{dv}{dw}(x_0)\]

The formula for S is S(v)=v, so its derivative is 1. The formula for v is v(w)=\frac{1}{w}, so its derivative is -\frac{1}{w^2}. Substituting the value of w computed in the forward pass, we get:

\[\frac{dS}{dw}(x_0)=1\cdot\frac{-1}{w^2}\bigg\rvert_{w=1.61}=-0.39\]

Continuing backwards from v to w:

\[\frac{dS}{dg}(x_0)=\frac{dS}{dw}(x_0)\frac{dw}{dg}(x_0)\]

We've already calculated \frac{dS}{dw}(x_0) in the previous step. Since w=1+g, we know that w'(g)=1, so:

\[\frac{dS}{dg}(x_0)=-0.39\cdot1=-0.39\]

Continuing similarly down the chain, until we get to the input x:

\[\begin{align*} \frac{dS}{df}(x_0)&=\frac{dS}{dg}(x_0)\frac{dg}{df}(x_0)=-0.39\cdot e^f\bigg\rvert_{f=-0.5}=-0.24\\ \frac{dS}{dx}(x_0)&=\frac{dS}{df}(x_0)\frac{df}{dx}(x_0)=-0.24\cdot -1=0.24 \end{align*}\]

We're done; the value of the derivative of the sigmoid function at x=0.5 is 0.24; this can be easily verified with a calculator using the analytical derivative of this function.
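
The same backward sweep can be written out by hand in a few lines of Python, mirroring the calculation above. This is just a verification sketch for this specific chain, not the automated implementation discussed later:

import math

x0 = 0.5

# Forward pass, mirroring the decomposed sigmoid.
f = -x0
g = math.exp(f)
w = 1 + g
v = 1 / w                      # sigmoid(0.5)

# Reverse pass: multiply local derivatives from the output back to x.
dS_dv = 1.0
dS_dw = dS_dv * (-1 / w**2)    # ~ -0.39
dS_dg = dS_dw * 1.0            # ~ -0.39
dS_df = dS_dg * math.exp(f)    # ~ -0.24
dS_dx = dS_df * (-1.0)         # ~  0.24

print(round(dS_dx, 2))         # 0.24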

As you can see, this procedure is rather mechanical and it's not surprising that it can be automated. Before we get to automation, however, let's review the more common scenario where the computational graph is a DAG rather than a linear chain.

General DAGs

The sigmoid sample we worked through above has a very simple, linear computational graph. Each node has a single predecessor and a single successor; moreover, the function itself has a single input and single output. Therefore, the single-variable chain rule is sufficient here.

In the more general case, we'll encounter functions that have multiple inputs, may also have multiple outputs [1], and the internal nodes are connected in non-linear patterns. To compute their derivatives, we have to use the multivariate chain rule.

As a reminder, in the most general case we're dealing with a function that has n inputs, denoted a=a_1,a_2\cdots a_n, and m outputs, denoted f_1,f_2\cdots f_m. In other words, the function is mapping f:\mathbb{R}^{n} \to \mathbb{R}^{m}.

The partial derivative of output i w.r.t. input j at some point a is:

\[D_j f_i(a)=\frac{\partial f_i}{\partial a_j}(a)\]

Assuming f is differentiable at a, then the complete derivative of f w.r.t. its inputs can be represented by the Jacobian matrix:

\[Df(a)=\begin{bmatrix} D_1 f_1(a) & \cdots & D_n f_1(a) \\ \vdots &  & \vdots \\ D_1 f_m(a) & \cdots & D_n f_m(a) \\ \end{bmatrix}\]

The multivariate chain rule then states that if we compose f\circ g (and assuming all the dimensions are correct), the derivative is:

\[D(f \circ g)(a)=Df(g(a)) \cdot Dg(a)\]

This is the matrix multiplication of Df(g(a)) and Dg(a).

Linear nodes

As a warmup, let's start with a linear node that has a single input and a single output:

A single node f(x) with one input and one output

In all these examples, we assume the full graph output is S, and its derivative w.r.t. the node's output is \frac{\partial S}{\partial f}. We're then interested in finding \frac{\partial S}{\partial x}. Since f:\mathbb{R}\to\mathbb{R}, the Jacobian is just a scalar:

\[Df=\frac{\partial f}{\partial x}\]

And the chain rule is:

\[D(S\circ f)=DS(f)\cdot Df=\frac{\partial S}{\partial f}\frac{\partial f}{\partial x}\]

No surprises so far - this is just the single variable chain rule!

Fan-in

Let's move on to the next scenario, where f has two inputs:

A single node f(x1,x2) with two inputs and one output

Once again, we already have the derivative \frac{\partial S}{\partial f} available, and we're interested in finding the derivative of S w.r.t. the inputs.

In this case, f:\mathbb{R}^2\to\mathbb{R}, so the Jacobian is a 1x2 matrix:

\[Df=\left [ \frac{\partial f}{\partial x_1} \quad \frac{\partial f}{\partial x_2} \right ]\]

And the chain rule here means multiplying a 1x1 matrix by a 1x2 matrix:

\[D(S\circ f)=DS(f)\cdot Df= \left [ \frac{\partial S}{\partial f} \right ] \left [ \frac{\partial f}{\partial x_1} \quad \frac{\partial f}{\partial x_2} \right ] = \left [ \frac{\partial S}{\partial f} \frac{\partial f}{\partial x_1} \quad \frac{\partial S}{\partial f} \frac{\partial f}{\partial x_2} \right ]\]

Therefore, we see that the output derivative propagates to each input separately:

\[\begin{align*} \frac{\partial S}{\partial x_1}&=\frac{\partial S}{\partial f} \frac{\partial f}{\partial x_1}\\ \frac{\partial S}{\partial x_2}&=\frac{\partial S}{\partial f} \frac{\partial f}{\partial x_2} \end{align*}\]

Fan-out

In the most general case, f may have multiple inputs but its output may also be used by more than one other node. As a concrete example, here's a node with three inputs and an output that's used in two places:

A single node f(x1,x2,x3) with three inputs and two outputs

While we denote each output edge from f with a different name, f has a single output! This point is a bit subtle and important to dwell on: yes, f has a single output, so in the forward calculation both f_1 and f_2 will have the same value. However, we have to treat them differently for the derivative calculation, because it's very possible that \frac{\partial S}{\partial f_1} and \frac{\partial S}{\partial f_2} are different!

In other words, we're reusing the machinery of multi-output functions here. If f had multiple outputs (e.g. a vector function), everything would work exactly the same.

In this case, since we treat f as f:\mathbb{R}^3\to\mathbb{R}^2, its Jacobian is a 2x3 matrix:

\[Df= \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \frac{\partial f_1}{\partial x_3} \\ \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \frac{\partial f_2}{\partial x_3} \\ \end{bmatrix}\]

The Jacobian DS(f) is a 1x2 matrix:

\[DS(f)=\left [ \frac{\partial S}{\partial f_1} \quad \frac{\partial S}{\partial f_2} \right ]\]

Applying the chain rule:

\[\begin{align*} D(S\circ f)=DS(f)\cdot Df&= \left [ \frac{\partial S}{\partial f_1} \quad \frac{\partial S}{\partial f_2} \right ] \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \frac{\partial f_1}{\partial x_3} \\ \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \frac{\partial f_2}{\partial x_3} \\ \end{bmatrix}\\ &= \left [ \frac{\partial S}{\partial f_1}\frac{\partial f_1}{\partial x_1}+\frac{\partial S}{\partial f_2}\frac{\partial f_2}{\partial x_1}\qquad \frac{\partial S}{\partial f_1}\frac{\partial f_1}{\partial x_2}+\frac{\partial S}{\partial f_2}\frac{\partial f_2}{\partial x_2}\qquad \frac{\partial S}{\partial f_1}\frac{\partial f_1}{\partial x_3}+\frac{\partial S}{\partial f_2}\frac{\partial f_2}{\partial x_3} \right ] \end{align*}\]

Therefore, we have:

\[\begin{align*} \frac{\partial S}{\partial x_1}&=\frac{\partial S}{\partial f_1}\frac{\partial f_1}{\partial x_1}+\frac{\partial S}{\partial f_2}\frac{\partial f_2}{\partial x_1}\\ \frac{\partial S}{\partial x_2}&=\frac{\partial S}{\partial f_1}\frac{\partial f_1}{\partial x_2}+\frac{\partial S}{\partial f_2}\frac{\partial f_2}{\partial x_2}\\ \frac{\partial S}{\partial x_3}&=\frac{\partial S}{\partial f_1}\frac{\partial f_1}{\partial x_3}+\frac{\partial S}{\partial f_2}\frac{\partial f_2}{\partial x_3} \end{align*}\]

The key point here - which we haven't encountered before - is that the derivatives through f add up for each of its outputs (or for each copy of its output). Qualitatively, it means that the sensitivity of f's input to the output is the sum of its sensitivities across each output separately. This makes logical sense, and mathematically it's just the consequence of the dot product inherent in matrix multiplication.
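
As a quick numeric sanity check of the fan-out formula, the snippet below multiplies an example row vector DS(f) by an example 2x3 Jacobian with NumPy; the numbers are arbitrary and only illustrate that each input's derivative is a sum over the outputs:

import numpy as np

# Arbitrary example values for dS/df1, dS/df2 and for the 2x3 Jacobian of f.
dS_df = np.array([[2.0, 3.0]])           # 1x2 row vector
Df = np.array([[1.0, 4.0, 0.5],          # df1/dx1, df1/dx2, df1/dx3
               [2.0, 0.0, 1.0]])         # df2/dx1, df2/dx2, df2/dx3

dS_dx = dS_df @ Df                       # 1x3: each entry sums contributions from f1 and f2
print(dS_dx)                             # [[8. 8. 4.]]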

Now that we understand how reverse mode AD works for the more general case of DAG nodes, let's work through a complete example.

General DAGs - full example

Consider this function (a sample used in the ADIMLAS paper):

\[f(x_1, x_2)=ln(x_1)+x_1 x_2-sin(x_2)\]

It has two inputs and a single output; once we decompose it to primitive operations, we can represent it with the following computational graph [2]:

Computational graph of f as function of x_1 and x_2

As before, we begin by running the computation forward for the values of x_1,x_2 at which we're interested to find the derivative. Let's take x_1=2 and x_2=5:

Computational graph with forward calculation at 2, 5

Recall that our goal is to calculate \frac{\partial f}{\partial x_1} and \frac{\partial f}{\partial x_2}. Initially we know that \frac{\partial f}{\partial v_5}=1 [3].

Starting with the v_5 node, let's use the fan-in formulas developed earlier:

\[\begin{align*} \frac{\partial f}{\partial v_4}&=\frac{\partial f}{\partial v_5} \frac{\partial v_5}{\partial v_4}=1\cdot 1=1\\ \frac{\partial f}{\partial v_3}&=\frac{\partial f}{\partial v_5} \frac{\partial v_5}{\partial v_3}=1\cdot -1=-1 \end{align*}\]

Next, let's tackle v_4. It also has a fan-in configuration, so we'll use similar formulas, plugging in the value of \frac{\partial f}{\partial v_4} we've just calculated:

\[\begin{align*} \frac{\partial f}{\partial v_1}&=\frac{\partial f}{\partial v_4} \frac{\partial v_4}{\partial v_1}=1\cdot 1=1\\ \frac{\partial f}{\partial v_2}&=\frac{\partial f}{\partial v_4} \frac{\partial v_4}{\partial v_2}=1\cdot 1=1 \end{align*}\]

On to v_1. It's a simple linear node, so:

\[\frac{\partial f}{\partial x_1}^{(1)}=\frac{\partial f}{\partial v_1} \frac{\partial v_1}{\partial x_1}=1\cdot \frac{1}{x_1}=0.5\]

Note the (1) superscript though! Since x_1 is a fan-out node, it will have more than one contribution to its derivative; we've just computed the one from v_1. Next, let's compute the one from v_2. That's another fan-in node:

\[\begin{align*} \frac{\partial f}{\partial x_1}^{(2)}&=\frac{\partial f}{\partial v_2} \frac{\partial v_2}{\partial x_1}=1\cdot x_2=5\\ \frac{\partial f}{\partial x_2}^{(1)}&=\frac{\partial f}{\partial v_2} \frac{\partial v_2}{\partial x_2}=1\cdot x_1=2 \end{align*}\]

We've calculated the other contribution to the x_1 derivative, and the first out of two contributions for the x_2 derivative. Next, let's handle v_3:

\[\frac{\partial f}{\partial x_2}^{(2)}=\frac{\partial f}{\partial v_3} \frac{\partial v_3}{\partial x_2}=-1\cdot cos(x_2)=-0.28\]

Finally, we're ready to add up the derivative contributions for the input arguments. x_1 is a "fan-out" node, with two outputs. Recall from the section above that we just sum their contributions:

\[\frac{\partial f}{\partial x_1}=\frac{\partial f}{\partial x_1}^{(1)}+\frac{\partial f}{\partial x_1}^{(2)}=0.5+5=5.5\]

And:

\[\frac{\partial f}{\partial x_2}=\frac{\partial f}{\partial x_2}^{(1)}+\frac{\partial f}{\partial x_2}^{(2)}=2-0.28=1.72\]

And we're done! Once again, it's easy to verify - using a calculator and the analytical derivatives of f(x_1,x_2) - that these are the right derivatives at the given point.
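
For reference, differentiating f analytically and plugging in x_1=2, x_2=5 gives the same values:

\[\begin{align*} \frac{\partial f}{\partial x_1}&=\frac{1}{x_1}+x_2=0.5+5=5.5\\ \frac{\partial f}{\partial x_2}&=x_1-\cos(x_2)=2-0.28=1.72 \end{align*}\]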

Backpropagation in ML, reverse mode AD and VJPs

A quick note on reverse mode AD vs. forward mode (please read the ADIMLAS paper for much more detail):

Reverse mode AD is the approach commonly used for machine learning and neural networks, because these tend to have a scalar loss (or error) output that we want to minimize. In reverse mode, we have to run AD once per output, while in forward mode we'd have to run it once per input. Therefore, when the input size is much larger than the output size (as is the case in NNs), reverse mode is preferable.

There's another advantage, and it relates to the term vector-jacobian product (VJP) that you will definitely run into once you start digging deeper in this domain.

The VJP is basically a fancy way of saying "using the chain rule in reverse mode AD". Recall that in the most general case, the multivariate chain rule is:

\[D(f \circ g)(a)=Df(g(a)) \cdot Dg(a)\]

However, in the case of reverse mode AD, we typically have a single output from the full graph, so Df(g(a)) is a row vector. The chain rule then means multiplying this row vector by a matrix representing the node's jacobian. This is the vector-jacobian product, and its output is another row vector. Scroll back to the Fan-out sample to see an example of this.

This may not seem very profound so far, but it carries an important meaning in terms of computational efficiency. For each node in the graph, we don't have to store its complete jacobian; all we need is a function that takes a row vector and produces the VJP. This is important because jacobians can be very large and very sparse [4]. In practice, this means that when AD libraries define the derivative of a computation node, they don't ask you to register a complete jacobian for each operation, but rather a VJP.
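
To make this concrete, here's a small sketch - my own illustration using NumPy, not any particular library's registration API - contrasting the full jacobian of an elementwise sin node with its VJP:

import numpy as np

# For elementwise sin applied to a length-n vector x, the full jacobian is an
# n x n diagonal matrix, but the VJP never needs to materialize it.
def sin_jacobian(x):
    return np.diag(np.cos(x))   # O(n^2) memory, mostly zeros

def sin_vjp(v, x):
    return v * np.cos(x)        # O(n) memory, same result as v @ jacobian

x = np.array([0.1, 0.2, 0.3])
v = np.array([1.0, 1.0, 1.0])   # incoming row vector (gradient of the output)
assert np.allclose(v @ sin_jacobian(x), sin_vjp(v, x))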

This also provides an additional way to think about the relative efficiency of reverse mode AD for ML applications; since a graph typically has many inputs (all the weights), and a single output (scalar loss), accumulating from the end going backwards means the intermediate products are VJPs that are row vectors; accumulating from the front would mean multiplying full jacobians together, and the intermediate results would be matrices [5].

A simple Python implementation of reverse mode AD

Enough equations, let's see some code! The whole point of AD is that it's automatic, meaning that it's simple to implement in a program. What follows is the simplest implementation I could think of; it requires one to build expressions out of a special type, which can then calculate gradients automatically.

Let's start with some usage samples; here's the Sigmoid calculation presented earlier:

xx = Var(0.5)
sigmoid = 1 / (1 + exp(-xx))
print(f"xx = {xx.v:.2}, sigmoid = {sigmoid.v:.2}")

sigmoid.grad(1.0)
print(f"dsigmoid/dxx = {xx.gv:.2}")

We begin by building the Sigmoid expression using Var values (more on this later). We can then run the grad method on a Var, with an output gradient of 1.0 and see that the gradient for xx is 0.24, as calculated before.

Here's the expression we used for the DAG section:

x1 = Var(2.0)
x2 = Var(5.0)
f = log(x1) + x1 * x2 - sin(x2)
print(f"x1 = {x1.v:.2}, x2 = {x2.v:.2}, f = {f.v:.2}")

f.grad(1.0)
print(f"df/dx1 = {x1.gv:.2}, df/dx2 = {x2.gv:.2}")

Once again, we build up the expression, then call grad on the final value. It will populate the gv attributes of input Vars with the derivatives calculated w.r.t. these inputs.

Let's see how Var works. The high-level overview is:

  • A Var represents a node in the computational graph we've been discussing in this post.
  • Using operator overloading and custom math functions (like the exp, sin and log seen in the samples above), when an expression is constructed out of Var values, we also build the computational graph in the background. Each Var has links to its predecessors in the graph (the other Vars that feed into it).
  • When the grad method is called, it runs reverse mode AD through the computational graph, using the chain rule.

Here's the Var class:

class Var:
    def __init__(self, v):
        self.v = v
        self.predecessors = []
        self.gv = 0.0

v is the value (forward calculation) of this Var. predecessors is the list of predecessors, each of this type:

from dataclasses import dataclass

@dataclass
class Predecessor:
    multiplier: float
    var: "Var"

Consider the v5 node in the DAG sample, for example. It represents the calculation v4-v3. The Var representing v5 will have a list of two predecessors, one for v4 and one for v3. Each of these will have a "multiplier" associated with it:

  • For v3, Predecessor.var points to the Var representing v3 and Predecessor.multiplier is -1, since this is the derivative of v5 w.r.t. v3.
  • Similarly, for v4, Predecessor.var points to the Var representing v4 and Predecessor.multiplier is 1.

Let's see some overloaded operators of Var [6]:

def __add__(self, other):
    other = ensure_var(other)
    out = Var(self.v + other.v)
    out.predecessors.append(Predecessor(1.0, self))
    out.predecessors.append(Predecessor(1.0, other))
    return out

# ...

def __mul__(self, other):
    other = ensure_var(other)
    out = Var(self.v * other.v)
    out.predecessors.append(Predecessor(other.v, self))
    out.predecessors.append(Predecessor(self.v, other))
    return out
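
For completeness, a __sub__ overload consistent with the v5 example above would look something like this (my own sketch in the same style; it isn't among the excerpts shown in this post):

def __sub__(self, other):
    other = ensure_var(other)
    out = Var(self.v - other.v)
    # The multipliers are the derivatives of (self.v - other.v) w.r.t. each operand.
    out.predecessors.append(Predecessor(1.0, self))
    out.predecessors.append(Predecessor(-1.0, other))
    return out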

And some of the custom math functions:

def log(x):
    """log(x) - natural logarithm of x"""
    x = ensure_var(x)
    out = Var(math.log(x.v))
    out.predecessors.append(Predecessor(1.0 / x.v, x))
    return out


def sin(x):
    """sin(x)"""
    x = ensure_var(x)
    out = Var(math.sin(x.v))
    out.predecessors.append(Predecessor(math.cos(x.v), x))
    return out

Note how the multipliers for each node are exactly the derivatives of its output w.r.t. the corresponding input. Notice also that in some cases we use the forward-calculated value of a Var's inputs to calculate this derivative (e.g. in the case of sin(x), the derivative is cos(x), so we need the actual value of x).

Finally, this is the grad method:

def grad(self, gv):
    self.gv += gv
    for p in self.predecessors:
        p.var.grad(p.multiplier * gv)

Some notes about this method:

  • It has to be invoked on a Var node that represents the entire computation.
  • Since this function walks the graph backwards (from the output to the inputs), this is why our graph edges point in that direction: we keep track of the predecessors of each node, not its successors.
  • Since we typically want the derivative of some output "loss" w.r.t. each Var, the computation will usually start with grad(1.0), because the derivative of the loss w.r.t. itself is 1.
  • For each node, grad adds the incoming gradient to its own, and propagates the incoming gradient to each of its predecessors, using the relevant multiplier.
  • The addition self.gv += gv is key to managing nodes with fan-out. Recall our discussion from the DAG section - according to the multivariate chain rule, fan-out nodes' derivatives add up for each of their outputs.
  • This implementation of grad is very simplistic and inefficient because it will process the same Var multiple times in complex graphs. A more efficient implementation would sort the graph topologically first and then would only have to visit each Var once (see the sketch right after this list).
  • Since the gradient of each Var adds up, Vars shouldn't be reused between different computations. Once grad has been run, the Vars involved should not be used for other grad calculations.
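
Here's what that could look like: a minimal sketch (my own, not part of the linked code) that builds a topological order with a depth-first walk and then does a single backward sweep over it:

def grad_toposort(output, gv=1.0):
    """Like Var.grad, but visits each Var exactly once."""
    order, visited = [], set()

    def visit(var):
        # Post-order DFS: predecessors are appended before the node itself,
        # so `order` starts with the inputs and ends with the output.
        if id(var) in visited:
            return
        visited.add(id(var))
        for p in var.predecessors:
            visit(p.var)
        order.append(var)

    visit(output)
    output.gv += gv
    for var in reversed(order):
        # By the time we reach `var`, all of its successors have already
        # propagated into var.gv, so it's safe to push it further back.
        for p in var.predecessors:
            p.var.gv += p.multiplier * var.gv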

The full code for this sample is available here.

Conclusion

The goal of this post is to serve as a supplement for the ADIMLAS paper; once again, if the topic of AD is interesting to you, I strongly encourage you to read the paper! I hope this post added something on top - please let me know if you have any questions.

Industrial strength implementations of AD, like autograd and JAX, have much better ergonomics and performance than the toy implementation shown above. That said, the underlying principles are similar - reverse mode AD on computational graphs.

I'll discuss an implementation of a more sophisticated AD system in a followup post.


[1]In this post we're only looking at single-output graphs, however, since these are typically sufficient in machine learning (the output is some scalar "loss" or "error" that we're trying to minimize). That said, for functions with multiple outputs the process is very similar - we just have to run the reverse mode AD process for each output variable separately.
[2]Note that the notation here is a bit different from the one used for the sigmoid function. This notation is adopted from the ADIMLAS paper, which uses v_i for all temporary values within the graph. I'm keeping the notations different to emphasize they have absolutely no bearing on the math and the AD algorithm. They're just a naming convention.
[3]For consistency, I'll be using the partial derivative notation throughout this example, even for nodes that have a single input and output.
[4]For an example of gigantic, sparse jacobians see my older post on backpropagation through a fully connected layer.
[5]There are a lot of additional nuances here to explain; I strongly recommend this excellent lecture by Matthew Johnson (of JAX and autograd fame) for a deeper overview.
[6]These use the utility function ensure_var; all it does is wrap its argument in a Var if it's not already a Var. This is needed to wrap constants in the expression, to ensure that the computational graph includes everything.

Thirty Seven

Today is my birthday! 🥳

37 is prime, and also a sexy prime. It has the interesting property of being the number most likely to be offered when you ask someone to pick a number between 1 and 100¹. Thirty-seven is neat!


Dennis is 37. He’s not old!


Eta Boötes, indicated by the target reticle. Screenshot of Stellarium, an open source planetarium package.
NGC 2169. Taken by Sergio Eguivar (https://www.instagram.com/buenosaires_skies/).

Eta Boötes, 37 lightyears away, is now in my light cone. NGC 2169 is an open star cluster in Orion that looks remarkably like the number 37.


thirty-seven.org is a website dedicated to the collection of artifacts marked with the number 37.


This Veritasium video is a cool roundup of the interesting things about 37.


  1. When you control for 69 and 42, which have certain cultural significance. ↩︎

How to do your self-assessment easily – and honestly

Filling out a self-assessment is hard for people with a sharp inner critic, and also for those who lack a good point of comparison to gauge how well they are doing on each criterion.

Even when you understand the value of the self-assessment, know what it will be used for, and know how honest you are willing to be when filling it out, there remains the practical difficulty of deciding what to write.

A simple way to fill out these assessments, under those conditions, has 3 steps:

  1. Look at which criteria will be evaluated, and put them in order - from your strongest to your weakest.
  2. In the absence of an established scale, choose the lowest and the highest score you are willing to give yourself, and then distribute them across the criteria ordered in the previous step - in other words, this step becomes a choice of which criteria get the maximum you chose, which get your minimum, and whether any fall in between.
  3. While you work through Step 2, you will end up recalling evidence for each item - jot it down in summary form, because it becomes the basis of your supporting text (if needed) or of any justification requested later.

Done!

The article "How to do your self-assessment easily – and honestly" was originally published on TRILUX, the site of Augusto Campos.

Beware the illusion of the internal authority figure

I'm convinced that the indecision other people describe to me often isn't about not knowing what they want, what is best, or what is right: the person freezes because they keep waiting for the approval of an internal authority figure.

That's right: the person internalizes the expectations of their mother, their boss, someone they want to impress, and so on, and then, without realizing it, seeks the internal approval of a figure that exists only in their own imagination.

It's a sign of a deep need to recognize this and break free from that inner restraint.

The article "Beware the illusion of the internal authority figure" was originally published on TRILUX, the site of Augusto Campos.

The hidden cost of staying cool during a crisis

All my life, I - anxious as I am - have had the superpower of staying calm and in control DURING emergencies.

I worry beforehand and crash afterwards, but DURING the event I am that block of ice, rational and present, who inspires and calms everyone else.

It's useful, and every team wants someone like that around, because it's a rare and valuable trait in those moments (better still would be not going through crises at all, right?), but only on reaching 50 did I discover that this is yet another trait of my ASD. It had to have an upside!

(But it's only good during the crisis - the emotional cost accumulates and arrives later, when I'm alone and the problem is behind me.)

The article "The hidden cost of staying cool during a crisis" was originally published on TRILUX, the site of Augusto Campos.

Radon

A good ²³⁸Umbrella policy should cover it.

Beginning of a MIDI GUI in Rust

A project I'm working on (which is definitely not my SIGBOVIK submission for this year, and definitely not about computer ergonomics) requires me to use MIDI. And to do custom handling of it. So I need something that receives those MIDI events and handles them.

But... I'm going to make mistakes along the way, and a terminal program isn't very interesting for a presentation. So of course, this program also needs a UI.

This should be simple, right? Just a little UI to show things as they come in, should be easy, yeah?

Hahahaha. Haha. Ha. Ha. Whoops.

The initial plan

Who am I kidding?

There was no plan. I sat down with egui's docs open in a tab and started just writing a UI.

After a few false starts with this—it turns out, sitting down without a plan is a recipe for not doing anything at all—I finally asked some friends to talk through it. I had two short meetings with two different friends. Talking it through with them forced me to figure out ahead of time what I wanted, which made the problems much clearer.

Laying out requirements

Our goal here is twofold: to provide a debug UI to show MIDI messages and connected devices, and to serve as scaffolding for future MIDI shenanigans. A few requirements fall out of this¹:

  • Display each incoming MIDI message in a list in the UI
  • Display each connected MIDI device in a list in the UI
  • Allow filtering of MIDI messages by type in the UI
  • Provide a convenient hook to add subscribers, unrelated to the UI, which receive all MIDI messages
  • Allow the subscribers to choose which device categories they receive messages from (these categories are things like "piano," "drums", or "wind synth")
  • Dynamically detect MIDI devices, handling the attachment or removal of devices
  • Minimize duplication of responsibility (cloning data is fine, but I want one source of truth for incoming data, not multiple)

Now that we have some requirements, we can think about how this would be implemented. We'll need some sort of routing system. And the UI will need state, which I think should be separate from the state of the core application which handles the routing. That is, I want to make it so that the UI is a subscriber just like all the other subscribers are.

Jumping ahead a bit, here's what it looks like when two of my MIDI devices are connected, after playing a few notes.

Current architecture

The architecture has three main components: the MIDI daemon, the message subscribers, and the GUI.

The MIDI daemon is the core that handles detecting MIDI devices and sets up message routing. This daemon owns the connections for each port², and periodically checks whether devices are still connected. When it detects a new device, it maps it onto a type of device (is this a piano? a drum pad? a wind synth?). Each new device gets a listener, which will take every message and send it into the global routing.

The message routing and device connection handling are done in the same loop, which is fine—originally I was separating them, but then I measured the timing and each refresh takes under 250 microseconds. That's more than fast enough for my purposes, and probably well within the latency requirements of most MIDI systems.
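
To make the routing idea concrete, here's a minimal sketch of that loop's core - my own illustration, not the project's actual code; the ConnectionId alias, Msg tuple, Category enum, and Subscriber struct are simplifications assumed for the example:

use std::collections::HashMap;
use std::sync::mpsc::{Receiver, Sender};

type ConnectionId = u64;
type Msg = (ConnectionId, u64, Vec<u8>); // (connection id, timestamp, raw MIDI bytes)

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Category { Piano, Drums, WindSynth }

struct Subscriber {
    categories: Vec<Category>, // which device categories this subscriber wants
    sender: Sender<Msg>,
}

fn route(incoming: &Receiver<Msg>, conn_mapping: &HashMap<ConnectionId, Category>, subscribers: &[Subscriber]) {
    // Drain whatever arrived since the last pass of the daemon loop.
    while let Ok((conn, ts, msg)) = incoming.try_recv() {
        if let Some(cat) = conn_mapping.get(&conn) {
            // One source of truth for incoming data; cloning per subscriber is fine.
            for sub in subscribers.iter().filter(|s| s.categories.contains(cat)) {
                let _ = sub.sender.send((conn, ts, msg.clone()));
            }
        }
    }
}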

The next piece is message subscribers. Each subscriber can specify which type of messages it wants to get, and then their logic is applied to all incoming messages. Right now there's just the GUI subscriber and some debug subscribers, but there'll eventually be a better debug subscriber and there'll be the core handlers that this whole project is written around. (Is this GUI part of a giant yak shave? Maaaaybeeeee.)

The subscribers look pretty simple. Here's one that echoes every message it receives (where dbg_recv is its queue). You just receive from the queue, then do whatever you want with that information!

std::thread::spawn(move || loop {
    match dbg_recv.recv() {
        Ok(m) => println!("debug: {m:?}"),
        Err(err) => println!("err: {err:?}"),
    }
});

Finally we reach the GUI, which has its own receiver (marginally more complicated than the debug print one, but not much). The GUI has a background thread which handles message receiving, and stores these messages into state which the GUI uses. This is all separated out so we won't block a frame render if a message takes some time to handle. The GUI also contains state, in two pieces: the ports and MIDI messages are shared with the background thread, so they're in an Arc<Mutex>. And there is also the pure GUI state, like which fields are selected, which is not shared and is just used inside the GUI logic.

I think there's a rule that you can't bandy around the word "architecture" without drawing at least one diagram, so here's one diagram. This is the flow of messages through the system. Messages come in from each device, go into the daemon, then get routed to where they belong. Ultimately, they end up stored in the GUI state for updates (by the daemon) and display (by the GUI).

State of the code

This one's not quite ready to be used, but the code is available. In particular, the full GUI code can be found in src/ui.rs. Here are the highlights!

First let's look at some of the state handling. Here's the state for the daemon. CTM is a tuple of the connection id, timestamp, and message; I abbreviate it since it's just littered all over the place.

pub struct State {
    midi: MidiInput,
    ports: Vec<MidiInputPort>,
    // Open connections keyed by port id; each connection's listener gets its
    // ConnectionId and a Sender it uses to push incoming messages into the routing loop.
    connections: HashMap<String, MidiInputConnection<(ConnectionId, Sender<CTM>)>>,
    // Which device category (piano, drum pad, wind synth, ...) each connection maps to.
    conn_mapping: HashMap<ConnectionId, Category>,

    // Channel pair for the incoming CTM message stream.
    message_receive: Receiver<CTM>,
    message_send: Sender<CTM>,

    // Channel pair for sharing the current list of ports.
    ports_receive: Receiver<Vec<MidiInputPort>>,
    ports_send: Sender<Vec<MidiInputPort>>,
}

And here's the state for the GUI. Data that's written once or only by the GUI is in the struct directly, and anything which is shared is inside an Arc<Mutex>.

/// State used to display the UI. It's intended to be shared between the
/// renderer and the daemon which updates the state.
#[derive(Clone)]
pub struct DisplayState {
    pub midi_input_ports: Arc<Mutex<Vec<MidiInputPort>>>,
    pub midi_messages: Arc<Mutex<VecDeque<CTM>>>,
    pub selected_ports: HashMap<String, bool>,
    pub max_messages: usize,
    pub only_note_on_off: Arc<AtomicBool>,
}

I'm using egui, which is an immediate-mode GUI library. That means we do things a little differently, and we build what is to be rendered on each frame instead of retaining it between frames. It's a model which is different from things like Qt and GTK, but feels pretty intuitive to me, since it's just imperative code!

This comes through really clearly in the menu handling. Here's the code for the top menu bar. We show the top panel, and inside it we create a menu bar. Inside that menu bar, we create menu buttons, each of which has an if statement for its click handler.

egui::TopBottomPanel::top("menu_bar_panel").show(ctx, |ui| {
    egui::menu::bar(ui, |ui| {
        ui.menu_button("File", |ui| {
            if ui.button("Quit").clicked() {
                ctx.send_viewport_cmd(egui::ViewportCommand::Close);
            }
        });
        ui.menu_button("Help", |ui| {
            if ui.button("About").clicked() {
                // TODO: implement something
            }
        });
    });
});

Notice how we handle clicks on buttons. We don't give it a callback—we just check if it's currently clicked and then take action from there. This is run each frame, and it just... works.

After the top menu panel, we can add our left panel³.

egui::SidePanel::left("instrument_panel").show(ctx, |ui| {
    ui.heading("Connections");

    for port in ports.iter() {
        let port_name = // ...snip!
        let conn_id = port.id();

        let selected = self.state.selected_ports.get(&conn_id).unwrap_or(&false);

        if ui
            .add(SelectableLabel::new(*selected, &port_name))
            .clicked()
        {
            self.state.selected_ports.insert(conn_id, !selected);
        }
    }
});

Once again, we see pretty readable code—though very unfamiliar, if you're not used to immediate mode. We add a heading inside the panel, then we iterate over the ports and for each one we render its label. If it's selected it'll show up in a different color, and if we click on it, the state should toggle⁴.

The code for displaying the messages is largely the same, and you can check it out in the repo. It's longer, but only in tedious ways, so I'm omitting it here.

What's next?

Coming up, I'm going to focus on what I set out to in the first place and write the handlers. I have some fun logic to do for different MIDI messages!

I'm also going to build another tab in this UI to show the state of those handlers. Their logic will be... interesting... and I want to have a visual representation of it. Both for presentation reasons, and also for debugging, so I can see what state they're in while I try to use those handlers.

I think the next step is... encoding bytes in base 3 or base 4⁵, then making my handlers interpret those. And also base 7 or 11.

And then I'll be able to learn how to use this weird program I'm building.

Sooo this has been fun, but back to work!


Thank you to Anya and Mary for helping me think through some of the problems here, and especially Anya for all the pair programming!


1. I really try to avoid bulleted lists when possible, but "list of requirements" is just about the perfect use case for them.

2. A MIDI port is essentially a physically connected, available device. It's attached to the system, but it's not one we're listening to yet.

3. The order you add panels matters for the final rendering result! This can feel a little different from what we're used to in a declarative model, but I really like it.

4. This is debounced by egui, I think! Otherwise it would result in lots of clicks, since we don't usually hold a button down for precisely the length of one frame render.

5. or in bass drum

The gen auto-trait problem

One of the open questions surrounding the unstable gen {} feature is whether it should return Iterator or IntoIterator. People have had a feeling there might be good reasons for it to return IntoIterator, but couldn't necessarily articulate why. Which is why it was included in the "unresolved questions" section of the gen blocks RFC.

Because I'd like to see gen {} blocks stabilize sooner, I figured it would be worth spending some time looking into this question and seeing whether there are any reasons to pick one over the other. And I have found what I believe to be a fairly annoying issue with gen returning Iterator, which I've started calling the gen auto-trait problem. In this post I'll walk through what this problem is, as well as how gen returning IntoIterator would prevent it. So without further ado, let's dive in!

Leaking auto-traits from gen bodies

The issue I've found has to do with auto-trait impls on reified gen {} instances. Take the thread::spawn API: it takes a closure which returns a type T. If you haven't seen its definition before, here it is:

pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,

This signature states that the closure F and return type T must be Send. But what it doesn't state is that all local values created inside the closure F must be Send too - the kind of auto-trait leakage that is common in the async world. Now let's create an iterator using a gen {} block that we'll try to send across threads. We can create an iterator that yields a single u32:

let iter = gen {
    yield 12u32;
};

Now if we try to pass this to thread::spawn, there are no problems. Our iterator implements Send and things will work as expected:

// ✅ Ok
thread::spawn(move || {
    for num in iter {
        println("{num}");
    }
}).unwrap();

Now to show you the problem, let's try this again but this time with a gen {} block that internally creates a std::rc::Rc. This type is !Send, which means that any type that holds it will also be !Send. That means that iter in our example here is now no longer thread-safe:

let iter = gen { // `-> impl Iterator + !Send`
    let rc = Rc::new(...);
    yield 12u32;
    rc.do_something();
};

That means that if we try to send it across threads we'll get a compiler error:

// ❌ cannot send a `!Send` type across threads
thread::spawn(move || {   // ← `iter` is moved
    for num in iter {     // ← `iter` is used
        println("{num}");
    }
}).unwrap();

And that right there is the problem. Even though iterators are lazy and the Rc in our gen {} block isn't actually constructed until iteration begins on the new thread, the generator block needs to reserve space for it internally and so it inherits the !Send restriction. This leads to incompatibilities where locals defined entirely within gen {} itself end up affecting the public-facing API. This ends up being very subtle and tricky to debug, unless you're already familiar with the generator desugaring.
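
As a rough mental model - my own sketch, not the actual compiler desugaring - you can picture the reified gen block as a state-machine enum that reserves a slot for every local that lives across a yield point:

use std::rc::Rc;

// Rough mental model of the gen block above after the generator transform.
enum GenState {
    Start,
    // The Rc is created before the `yield` and used after it, so it has to be
    // stored across the suspension point...
    Suspended { rc: Rc<()> },
    Done,
}

// ...and because one variant can hold an Rc, the whole enum is `!Send`, even
// before any Rc has actually been created at runtime.
fn assert_send<T: Send>() {}

fn main() {
    // assert_send::<GenState>(); // ← would fail to compile, like thread::spawn above
}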

Preventing the leakage

Solving gen's auto-trait problem is not that hard. What we want is for the !Send fields in the generator to not show up in the generated type until we are ready to start iterating over it. That sounds a little scary, but in practice all we have to do is have gen {} return an impl IntoIterator rather than an impl Iterator. The actual Iterator will still be !Send, but the IntoIterator type we hand around will be Send:

let iter = gen { // `-> impl IntoIterator + Send`
    let rc = Rc::new(...);
    yield 12u32;
    rc.do_something();
};

Since our value iter implements Send, we can now happily pass it across thread bounds. And our code continues to operate as expected, since for..in operates on IntoIterator, which is implemented for every Iterator:

// ✅ Ok
thread::spawn(move || {   // ← `iter` is moved
    for num in iter {     // ← `iter` is used
        println("{num}");
    }
}).unwrap();

If you think about it, from a theoretical perspective this makes a lot of sense too. We can think of for..in as our way of handling the iteration effect, which expects an IntoIterator. gen {} is the dual of that, used to create new instances of the iteration effect. It's not at all strange for it to return the same trait that for..in expects. With the added bonus that it doesn't leak auto-traits from gen bodies into the type's signature.
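
To spell the mechanism out by hand, here's a sketch - again my own, not the proposed desugaring - of a Send IntoIterator whose !Send state only comes into existence once iteration actually starts:

use std::rc::Rc;

struct MyGen; // stands in for the reified `gen {}` value: no Rc yet, so it's Send

impl IntoIterator for MyGen {
    type Item = u32;
    type IntoIter = MyIter;
    fn into_iter(self) -> MyIter {
        // The !Send state is only created here, on the thread doing the iterating.
        MyIter { rc: Rc::new(()), yielded: false }
    }
}

struct MyIter { rc: Rc<()>, yielded: bool } // !Send because of the Rc

impl Iterator for MyIter {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.yielded { return None; }
        self.yielded = true;
        let _ = &self.rc; // pretend to use it, like rc.do_something() above
        Some(12)
    }
}

fn main() {
    let iter = MyGen; // Send: can cross the thread boundary
    std::thread::spawn(move || {
        for num in iter { // for..in calls into_iter() on the new thread
            println!("{num}");
        }
    }).join().unwrap();
}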

What about other effects and auto-traits?

This issue primarily affects effects which make use of the generator transform. That is: iteration and async. In theory I believe that, yes, async {} should probably return IntoFuture rather than Future. In WG Async we regularly see issues that have to do with auto-traits leaking from inside function bodies out into the types' signatures. If async (stabilized in 2019) had returned IntoFuture (stabilized in 2022) rather than Future, that certainly seems like it could have helped. Though there would be a higher barrier to making that change today, now that things are already stable.

On the trait side this doesn't just affect Send either: it applies to all auto-traits, present and future. Though Send is by far the most common trait people will experience issues with today, to a lesser extent this also already applies to Sync. And in the future possibly also auto-traits like Freeze, Leak, and Move. Though this shouldn't be the main motivator, preventing potential future issues is not worthless either.

Conclusion

Admittedly the worst part of gen {} returning IntoIterator is that the name of the trait kind of sucks. IntoIterator sounds auxiliary to Iterator, and so it feels a little wrong to make it the thing we return from gen {}. But on top of that: that's a lot of characters to write.

I wonder what would have happened if we'd taken an approach more similar to Swift. Swift has Sequence which is like Rust's IntoIterator and IteratorProtocol which is like Rust's Iterator. The primary interface people are expected to use is short and memorable. While the secondary interface isn't meant to be directly used, and so it has a much longer and less memorable name. As we're increasingly thinking of IntoIterator as the primary interface for async iteration, maybe we'll want to revisit the trait naming scheme in the future.

In conclusion: it seems like a good idea to prevent auto-traits from leaking out of gen bodies when they are reified. This is the kind of issue that we should prevent if we can, since it can be both difficult to diagnose and annoying to work around. Having auto-trait leakage be a non-issue for generator blocks seems worthwhile.

We need to learn from Japan's tsunami marker stones

Now that we have officially passed the calamitous mark of 1.5°C of global warming, it's time to rethink where we rebuild after cataclysmic floods, wildfires in the middle of urban areas, and other calamities that recur in ever-shorter cycles.

After major disasters, the phrase that attracts votes and support is: "we will rebuild everything." It's a good phrase, and a good idea, but when the disasters are large-scale and liable to recur without warning, the ideal would indeed be to rebuild - just in a less exposed place.

NOT rebuilding the devastated city in the same place is a lesson the Japanese learned centuries ago - they know that tsunamis recur, and that later generations tend to rebuild in the exposed area, which is why they started leaving markers built to withstand time.


Screenshot from the Atlas Obscura site showing one of the Japanese tsunami stones, in Aneyoshi, marking the highest point a past tsunami reached and warning people not to build below it.

These marker stones record the point that past tsunamis reached, with the warning: remember the calamity, and do not build below this point.

We need to imitate them, and we no longer have the luxury of multi-generational intervals - our disasters have already accelerated, and will accelerate even further.

The article "We need to learn from Japan's tsunami marker stones" was originally published on TRILUX, the site of Augusto Campos.

The origin of the cargo cult metaphor

The cargo cult metaphor is commonly used by programmers. This metaphor was popularized by Richard Feynman's "cargo cult science" talk, with a vivid description of South Seas cargo cults. However, this metaphor has three major problems. First, the pop-culture depiction of cargo cults is inaccurate and fictionalized, as I'll show. Second, the metaphor is overused and has contradictory meanings, making it a lazy insult. Finally, cargo cults are portrayed as an amusing story of native misunderstanding, but the background is much darker: cargo cults are a reaction to decades of oppression of Melanesian islanders and the destruction of their culture. For these reasons, the cargo cult metaphor is best avoided.

Members of the John Frum cargo cult, marching with bamboo "rifles". Photo adapted from The Open Encyclopedia of Anthropology (CC BY-NC 4.0).

In this post, I'll describe some cargo cults from 1919 to the present. These cargo cults are completely different from the description of cargo cults you usually find on the internet, which I'll call the "pop-culture cargo cult." Cargo cults are extremely diverse, to the extent that anthropologists disagree on the cause, definition, or even if the term has value. I'll show that many of the popular views of cargo cults come from a 1962 "shockumentary" called Mondo Cane. Moreover, most online photos of cargo cults are fake.

Feynman and Cargo Cult Science

The cargo cult metaphor in science started with Professor Richard Feynman's well-known 1974 commencement address at Caltech.1 This speech, titled "Cargo Cult Science", was expanded into a chapter in his best-selling 1985 book "Surely You're Joking, Mr. Feynman". He said:

In the South Seas there is a cargo cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he’s the controller—and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.

Richard Feynman giving the 1974 commencement address at Caltech. Photo from Wikimedia Commons.

But the standard anthropological definition of "cargo cult" is entirely different: 2

Cargo cults are strange religious movements in the South Pacific that appeared during the last few decades. In these movements, a prophet announces the imminence of the end of the world in a cataclysm which will destroy everything. Then the ancestors will return, or God, or some other liberating power, will appear, bringing all the goods the people desire, and ushering in a reign of eternal bliss.

An anthropology encyclopedia gives a similar definition:

A southwest Pacific example of messianic or millenarian movements once common throughout the colonial world, the modal cargo cult was an agitation or organised social movement of Melanesian villagers in pursuit of ‘cargo’ by means of renewed or invented ritual action that they hoped would induce ancestral spirits or other powerful beings to provide. Typically, an inspired prophet with messages from those spirits persuaded a community that social harmony and engagement in improvised ritual (dancing, marching, flag-raising) or revived cultural traditions would, for believers, bring them cargo.

As you can see, the pop-culture explanation of a cargo cult and the anthropological definition are completely different, apart from the presence of "cargo" of some sort. Have anthropologists buried cargo cults under layers of theory? Are they even discussing the same thing? My conclusion, after researching many primary sources, is that the anthropological description accurately describes the wide variety of cargo cults. The pop-culture cargo cult description, however, takes features of some cargo cults (the occasional runway) and combines them with movie scenes to yield an inaccurate and fictionalized description. It may be hard to believe that the description of cargo cults that you see on the internet is mostly wrong, but in the remainder of this article, I will explain this in detail.

Background on Melanesia

Cargo cults occur in a specific region of the South Pacific called Melanesia. I'll give a brief (oversimplified) description of Melanesia to provide important background. The Pacific Ocean islands are divided into three cultural areas: Polynesia, Micronesia, and Melanesia. Polynesia is the best known, including Hawaii, New Zealand, and Samoa. Micronesia, in the northwest, consists of thousands of small islands, of which Guam is the largest; the name "Micronesia" is Greek for "small island". Melanesia, the relevant area for this article, is a group of islands between Micronesia and Australia, including Fiji, Vanuatu, Solomon Islands, and New Guinea. (New Guinea is the world's second-largest island; confusingly, the country of Papua New Guinea occupies the eastern half of the island, while the western half is part of Indonesia.)

Major cultural areas of Oceania. Image from Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Pacific_Culture_Areas.jpg.

The inhabitants of Melanesia typically lived in small villages of under 200 people, isolated by mountainous geography. They had a simple, subsistence economy, living off cultivated root vegetables, pigs, and hunting. People tended their own gardens, without specialization into particular tasks. The people of Melanesia are dark-skinned, which will be important ("Melanesia" and "melanin" have the same root). Technologically, the Melanesians used stone, wood, and shell tools, without knowledge of metallurgy or even weaving. The Melanesian cultures were generally violent3 with ever-present tribal warfare and cannibalism.4

Due to the geographic separation of tribes, New Guinea became the most linguistically diverse country in the world, with over 800 distinct languages. Pidgin English was often the only way for tribes to communicate, and is now one of the official languages of New Guinea. This language, called Tok Pisin (i.e. "talk pidgin"), is now the most common language in Papua New Guinea, spoken by over two-thirds of the population.5

For the Melanesians, religion was a matter of ritual, rather than a moral framework. It is said that "to the Melanesian, a religion is above all a technology: it is the knowledge of how to bring the community into the correct relation, by rites and spells, with the divinities and spirit-beings and cosmic forces that can make or mar man's this-worldly wealth and well-being." This is important since, as will be seen, the Melanesians expected that the correct ritual would result in the arrival of cargo. Catholic and Protestant missionaries converted the inhabitants to Christianity, largely wiping out traditional religious practices and customs; Melanesia is now over 95% Christian. Christianity played a large role in cargo cults, as will be shown below.

European explorers first reached Melanesia in the 1500s, followed by colonization.6 By the end of the 1800s, control of the island of New Guinea was divided among Germany, Britain, and the Netherlands. Britain passed responsibility to Australia in 1906 and Australia gained the German part of New Guinea in World War I. As for the islands of Vanuatu, the British and French colonized them (under the name New Hebrides) in the 18th century.

The influx of Europeans was highly harmful to the Melanesians. "Native society was severely disrupted by war, by catastrophic epidemics of European diseases, by the introduction of alcohol, by the devastation of generations of warfare, and by the depredations of the labour recruiters."8 People were kidnapped and forced to work as laborers in other countries, a practice called blackbirding. Prime agricultural land was taken by planters to raise crops such as coconuts for export, with natives coerced into working for the planters.9 Up until 1919, employers were free to flog the natives for disobedience; afterward, flogging was technically forbidden but still took place. Colonial administrators jailed natives who stepped out of line.7

Cargo cults before World War II

While the pop-culture account explains cargo cults as a reaction to World War II, cargo cults started years earlier. One anthropologist stated, "Cargo cults long preceded [World War II], continued to occur during the war, and have continued to the present."

The first writings about cargo cult behavior date back to 1919, when it was called the "Vailala Madness":10

The natives were saying that the spirits of their ancestors had appeared to several in the villages and told them that all flour, rice, tobacco, and other trade belonged to the New Guinea people, and that the white man had no right whatever to these goods; in a short time all the white men were to be driven away, and then everything would be in the hands of the natives; a large ship was also shortly to appear bringing back the spirits of their departed relatives with quantities of cargo, and all the villages were to make ready to receive them.

The 1926 book In Unknown New Guinea also describes the Vailala Madness:11

[The leader proclaimed] that the ancestors were coming back in the persons of the white people in the country and that all the things introduced by the white people and the ships that brought them belonged really to their ancestors and themselves. [He claimed that] he himself was King George and his friend was the Governor. Christ had given him this authority and he was in communication with Christ through a hole near his village.

The Melanesians blamed the Europeans for the failure of cargo to arrive. In the 1930s, one story was that because the natives had converted to Christianity, God was sending the ancestors with cargo that was loaded on ships. However, the Europeans were going through the cargo holds and replacing the names on the crates so the cargo was fraudulently delivered to the Europeans instead of the rightful natives.

The Mambu Movement occurred in 1937. Mambu, the movement's prophet, claimed that "the Whites had deceived the natives. The ancestors lived inside a volcano on Manum Island, where they worked hard making goods for their descendants: loin-cloths, socks, metal axes, bush-knives, flashlights, mirrors, red dye, etc., even plank-houses, but the scoundrelly Whites took the cargoes. Now this was to stop. The ancestors themselves would bring the goods in a large ship." To stop this movement, the Government arrested Mambu, exiled him, and imprisoned him for six months in 1938.

To summarize, these early cargo cults believed that ships would bring cargo that rightfully belonged to the natives but had been stolen by the whites. The return of the cargo would be accompanied by the spirits of the ancestors. Moreover, Christianity often played a large role. A significant racial component was present, with natives driving out the whites or becoming white themselves.

Cargo cults in World War II and beyond

World War II caused tremendous social and economic upheavals in Melanesia. Much of Melanesia was occupied by Japan near the beginning of the war and the Japanese treated the inhabitants harshly. The American entry into the war led to heavy conflict in the area such as the arduous New Guinea campaign (1942-1945) and the Solomon Islands campaign. As the Americans and Japanese battled for control of the islands, the inhabitants were caught in the middle. Papua and New Guinea suffered over 15,000 civilian deaths, a shockingly high number for such a small region.12


The photo shows a long line of F4F Wildcats at Henderson Field, Guadalcanal, Solomon Islands, April 14, 1943. Solomon Islands was home to several cargo cults, both before and after World War II (see map). Source: US Navy photo 80-G-41099.

The impact of the Japanese occupation on cargo cults is usually ignored. One example from 1942 is a cargo belief that the Japanese soldiers were spirits of the dead, who were being sent by Jesus to liberate the people from European rule. The Japanese would bring the cargo by airplane since the Europeans were blocking the delivery of cargo by ship. This would be accompanied by storms and earthquakes, and the natives' skin would change from black to white. The natives were to build storehouses for the cargo and fill the storehouses with food for the ancestors. The leader of this movement, named Tagarab, explained that he had an iron rod that gave him messages about the future. Eventually, the Japanese shot Tagarab, bringing an end to this cargo cult.13

The largest and most enduring cargo cult is the John Frum movement, which started on the island of Tanna around 1941 and continues to the present. According to one story, a mythical person known as John Frum, master of the airplanes, would reveal himself and drive off the whites. He would provide houses, clothes, and food for the people of Tanna. The island of Tanna would flatten as the mountains filled up the valleys and everyone would have perfect health. In other areas, the followers of John Frum believed they "would receive a great quantity of goods, brought by a white steamer which would come from America." Families abandoned the Christian villages and moved to primitive shelters in the interior. They wildly spent much of their money and threw the rest into the sea. The government arrested and deported the leaders, but that failed to stop the movement. The identity of John Frum is unclear; he is sometimes said to be a white American while in other cases natives have claimed to be John Frum.14

The cargo cult of Kainantu17 arose around 1945 when a "spirit wind" caused people in the area to shiver and shake. Villages built large "cargo houses" and put stones, wood, and insect-marked leaves inside, representing European goods, rifles, and paper letters respectively. They killed pigs and anointed the objects, the house, and themselves with blood. The cargo house was to receive the visiting European spirit of the dead who would fill the house with goods. This cargo cult continued for about 5 years, diminishing as people became disillusioned by the failure of the goods to arrive.

The name "Cargo Cult" was first used in print in 1945, just after the end of World War II.15 The article blamed the problems on the teachings of missionaries, with the problems "accentuated a hundredfold" by World War II.

Stemming directly from religious teaching of equality, and its resulting sense of injustice, is what is generally known as “Vailala Madness,” or “Cargo Cult.” "In all cases the "Madness" takes the same form: A native, infected with the disorder, states that he has been visited by a relative long dead, who stated that a great number of ships loaded with "cargo" had been sent by the ancestor of the native for the benefit of the natives of a particular village or area. But the white man, being very cunning, knows how to intercept these ships and takes the "cargo" for his own use... Livestock has been destroyed, and gardens neglected in the expectation of the magic cargo arriving. The natives infected by the "Madness" sank into indolence and apathy regarding common hygiene."

In a 1946 episode, agents of the Australian government found a group of New Guinea highlanders who believed that the arrival of the whites signaled that the end of the world was at hand. The highlanders butchered all their pigs in the expectation that "Great Pigs" would appear from the sky in three days. At this time, the residents would exchange their black skin for white skin. They created mock radio antennas of bamboo and rope to receive news of the millennium.16

The New York Times described Cargo Cults in 1948 as "the belief that a convoy of cargo ships is on its way, laden with the fruits of the modern world, to outfit the leaf huts of the natives." The occupants of the British Solomon Islands were building warehouses along the beaches to hold these goods. Natives marched into a US Army camp, presented $3000 in US money, and asked the Army to drive out the British.

A 1951 paper described cargo cults: "The insistence that a 'cargo' of European goods is to be sent by the ancestors or deceased spirits; this may or may not be part of a general reaction against Europeans, with an overtly expressed desire to be free from alien domination. Usually the underlying theme is a belief that all trade goods were sent by ancestors or spirits as gifts for their descendants, but have been misappropriated on the way by Europeans."17

In 1959, The New York Times wrote about cargo cults: "Rare Disease and Strange Cult Disturb New Guinea Territory; Fatal Laughing Sickness Is Under Study by Medical Experts—Prophets Stir Delusions of Food Arrivals". The article states that "large native groups had been infected with the idea that they could expect the arrival of spirit ships carrying large supplies of food. In false anticipation of the arrival of the 'cargoes', 5000 to 7000 natives have been known to consume their entire food reserve and create a famine." As for "laughing sickness", this is now known to be a prion disease transmitted by eating human brains. In some communities, this disease, also called Kuru, caused 50% of all deaths.

A detailed 1959 article in Scientific American, "Cargo Cults", described many different cargo cults.16 It lists various features of cargo cults, such as the return of the dead, skin color switching from black to white, threats against white rule, and belief in a coming messiah. The article finds a central theme in cargo cults: "The world is about to end in a terrible cataclysm. Thereafter God, the ancestors or some local culture hero will appear and inaugurate a blissful paradise on earth. Death, old age, illness and evil will be unknown. The riches of the white man will accrue to the Melanesians."

In 1960, the celebrated naturalist David Attenborough created a documentary The People of Paradise: Cargo Cult.18 Attenborough travels through the island of Tanna and encounters many artifacts of the John Frum cult, such as symbolic gates and crosses, painted brilliant scarlet and decorated with objects such as a shaving brush, a winged rat, and a small carved airplane. Attenborough interviews a cult leader who claims to have talked with the mythical John Frum, said to be a white American. The leader remains in communication with John Frum through a tall pole said to be a radio mast, and an unseen radio. (The "radio" consisted of an old woman with electrical wire wrapped around her waist, who would speak gibberish in a trance.)

"Symbols of the cargo cult." In the center, a representation of John Frum with "scarlet coat and a white European face" stands behind a brilliantly painted cross. A wooden airplane is on the right, while on the left (outside the photo) a cage contains a winged rat. From Journeys to the Past, which describes Attenborough's visit to the island of Tanna.

"Symbols of the cargo cult." In the center, a representation of John Frum with "scarlet coat and a white European face" stands behind a brilliantly painted cross. A wooden airplane is on the right, while on the left (outside the photo) a cage contains a winged rat. From Journeys to the Past, which describes Attenborough's visit to the island of Tanna.

In 1963, famed anthropologist Margaret Mead brought cargo cults to the general public, writing Where Americans are Gods: The Strange Story of the Cargo Cults in the mass-market newspaper supplement Family Weekly. In just over a page, this article describes the history of cargo cults before, during, and after World War II.19 One cult sat around a table with vases of colorful flowers on them. Another cult threw away their money. Another cult watched for ships from hilltops, expecting John Frum to bring a fleet of ships bearing cargo from the land of the dead.

One of the strangest cargo cults was a group of 2000 people on New Hanover Island, "collecting money to buy President Johnson of the United States [who] would arrive with other Americans on the liner Queen Mary and helicopters next Tuesday." The islanders raised $2000, expecting American cargo to follow the president. Seeing the name Johnson on outboard motors confirmed their belief that President Johnson was personally sending cargo.20

A 1971 article in Time Magazine22 described how tribesmen brought US Army concrete survey markers down from a mountaintop while reciting the Roman Catholic rosary, dropping the heavy markers outside the Australian government office. They expected that "a fleet of 500 jet transports would disgorge thousands of sympathetic Americans bearing crates of knives, steel axes, rifles, mirrors and other wonders." Time magazine explained the “cargo cult” as "a conviction that if only the dark-skinned people can hit on the magic formula, they can, without working, acquire all the wealth and possessions that seem concentrated in the white world... They believe that everything has a deity who has to be contacted through ritual and who only then will deliver the cargo." Cult leaders tried "to duplicate the white man’s magic. They hacked airstrips in the rain forest, but no planes came. They built structures that look like white men’s banks, but no money materialized."21

National Geographic, in an article Head-hunters in Today's World (1972), mentioned a cargo-cult landing field with a replica of a radio aerial, created by villagers who hoped that it would attract airplanes bearing gifts. It also described a cult leader in South Papua who claimed to obtain airplanes and cans of food from a hole in the ground. If the people believed in him, their skins would turn white and he would lead them to freedom.

These sources and many others23 illustrate that cargo cults do not fit a simple story. Instead, cargo cults are extremely varied, happening across thousands of miles and many decades. The lack of common features between cargo cults leads some anthropologists to reject the idea of cargo cults as a meaningful term.24 In any case, most historical cargo cults have very little in common with the pop-culture description of a cargo cult.

Cargo beliefs were inspired by Christianity

Cargo cult beliefs are closely tied to Christianity, a factor that is ignored in pop-culture descriptions of cargo cults. Beginning in the mid-1800s, Christian missionaries set up churches in New Guinea to convert the inhabitants. As a result, cargo cults incorporated Christian ideas, but in very confusing ways. At first, the natives believed that missionaries had come to reveal the ritual secrets and restore the cargo. By enthusiastically joining the church, singing the hymns, and following the church's rituals, the people would be blessed by God, who would give them the cargo. This belief was common in the 1920s and 1930s, but as the years went on and the people didn't receive the cargo, they theorized that the missionaries had removed the first pages of the Bible to hide the cargo secrets.

A typical belief was that God created Adam and Eve in Paradise, "giving them cargo: tinned meat, steel tools, rice in bags, tobacco in tins, and matches, but not cotton clothing." When Adam and Eve offended God by having sexual intercourse, God threw them out of Paradise and took their cargo. Eventually, God sent the Flood but Noah was saved in a steamship and God gave back the cargo. Noah's son Ham offended God, so God took the cargo away from Ham and sent him to New Guinea, where he became the ancestor of the natives.

Other natives believed that God lived in Heaven, which was in the clouds and reachable by ladder from Sydney, Australia (source). God, along with the ancestors, created cargo in Heaven—"tinned meat, bags of rice, steel tools, cotton cloth, tinned tobacco, and a machine for making electric light"—which would be flown from Sydney and delivered to the natives, who thus needed to clear an airstrip (source).25

Another common belief was that symbolic radios could be used to communicate with Jesus. For instance, a Markham Valley cargo group in 1943 created large radio houses so they could be informed of the imminent Coming of Jesus, at which point the natives would expel the whites (source). The "radio" consisted of bamboo cylinders connected to a rope "aerial" strung between two poles. The houses contained a pole with rungs so the natives could climb to Jesus along with cane "flashlights" to see Jesus.

A tall mast with a flag and cross on top. This was claimed to be a special radio mast that enabled communication with John Frum. It was decorated with scarlet leaves and flowers. From Attenborough's Cargo Cult.

Mock radio antennas are also discussed in a 1943 report26 from a wartime patrol that found a bamboo "wireless house", 42 feet in diameter. It had two long poles outside with an "aerial" of rope strung between them, connected to the "radio" inside, a bamboo cylinder. Villagers explained that the "radio" was to receive messages of the return of Jesus, who would provide weapons for the overthrow of white rule. The villagers constructed ladders outside the house so they could climb up to the Christian God after death. They would shed their skin like a snake, getting a new white skin, and then they would receive the "boats and white men's clothing, goods, etc."

Mondo Cane and the creation of the pop-culture cargo cult

As described above, cargo cults expected the cargo to arrive by ships much more often than airplanes. So why do pop-culture cargo cults have detailed descriptions of runways, airplanes, wooden headphones, and bamboo control towers?27 My hypothesis is that it came from a 1962 movie called Mondo Cane. This film was the first "shockumentary", showing extreme and shocking scenes from around the world. Although the film was highly controversial, it was shown at the Cannes Film Festival and was a box-office success.

The film made extensive use of New Guinea footage, with multiple scandalous segments, such as a group of "love-struck" topless women chasing men,29 a woman breastfeeding a pig, and women in cages being fattened for marriage. The last segment in the movie showed "the cult of the cargo plane": natives forlornly watching planes at the airport, followed by scenes of a bamboo airplane sitting on a mountaintop "runway" along with bamboo control towers. The natives waited all day and then lit torches to illuminate the runway at nightfall. These scenes are very similar to the pop-culture descriptions of cargo cults so I suspect this movie is the source.

A still from the 1962 movie "Mondo Cane", showing a bamboo airplane sitting on a runway, with flaming torches acting as beacons. I have my doubts about its accuracy.

The film claims that all the scenes "are true and taken only from life", but many of the scenes are said to be staged. Since the cargo cult scenes are very different from anthropological reports and much more dramatic, I think they were also staged and exaggerated.28 It is known that the makers of Mondo Cane paid the Melanesian natives generously for the filming (source, source).

Did Feynman get his cargo cult ideas from Mondo Cane? It may seem implausible since the movie was released over a decade earlier. However, the movie became a cult classic, was periodically shown in theaters, and influenced academics.30 In particular, Mondo Cane showed at the famed Cameo theater in downtown Los Angeles on April 3, 1974, two months before Feynman's commencement speech. Mondo Cane seems like the type of offbeat movie that Feynman would see and the theater was just 11 miles from Caltech. While I can't prove that Feynman went to the showing, his description of a cargo cult strongly resembles the movie.31

Fake cargo-cult photos fill the internet

Fakes and hoaxes make researching cargo cults online difficult. There are numerous photos online of cargo cults, but many of these photos are completely made up. For instance, the photo below has illustrated cargo cults for articles such as Cargo Cult, UX personas are useless, A word on cargo cults, The UK Integrated Review and security sector innovation, and Don't be a cargo cult. However, this photo is from a Japanese straw festival and has nothing to do with cargo cults.

An airplane built from straw, one creation at a Japanese straw festival. I've labeled the photo with "Not cargo cult" to ensure it doesn't get reused in cargo cult articles.

Another example is the photo below, supposedly an antenna created by a cargo cult. However, it is actually a replica of the Jodrell Bank radio telescope, built in 2007 by a British farmer from six tons of straw (details). The farmer's replica ended up erroneously illustrating Cargo Cult Politics, The Cargo Cult & Beliefs, The Cargo Cult, Cargo Cults of the South Pacific, and Cargo Cult, among others.32

A British farmer created this replica radio telescope. Photo by Mike Peel (CC BY-SA 4.0).

Other articles illustrate cargo cults with the aircraft below, suspiciously sleek and well-constructed. However, the photo actually shows a wooden wind tunnel model of the Buran spacecraft, abandoned at a Russian airfield as described in this article. Some uses of the photo are Are you guilty of “cargo cult” thinking without even knowing it? and The Cargo Cult of Wealth.

This is an abandoned Soviet wind tunnel model of the Buran spacecraft. Photo by Aleksandr Markin.

Many cargo cult articles use one of the photos below. I tracked them down to the 1970 movie "Chariots of the Gods" (link), a dubious documentary claiming that aliens have visited Earth throughout history. The segment on cargo cults is similar to Mondo Cane with cultists surrounding a mock plane on a mountaintop, lighting fires along the runway. However, it is clearly faked, probably in Africa: the people don't look like Pacific Islanders and are wearing wigs. One participant wears leopard skin (leopards don't live in the South Pacific). The vegetation is another giveaway: the plants are from Africa, not the South Pacific.33

Two photos of a straw plane from "Chariots of the Gods".

The point is that most of the images that illustrate cargo cults online are fake or wrong. Most internet photos and information about cargo cults have just been copied from page to page. (And now we have AI-generated cargo cult photos.) If a photo doesn't have a clear source (including who, when, and where), don't believe it.

Conclusions

The cargo cult metaphor should be avoided for three reasons. First, the metaphor is essentially meaningless and heavily overused. The influential "Jargon File" defined cargo-cult programming as "A style of (incompetent) programming dominated by ritual inclusion of code or program structures that serve no real purpose."34 Note that the metaphor in cargo-cult programming is the opposite of the metaphor in cargo-cult science: Feynman's cargo-cult science has no chance of working, while cargo-cult programming works but isn't understood. Moreover, both metaphors differ from the cargo-cult metaphor in other contexts, which refers to the expectation of receiving valuables without working.35

The popular site Hacker News is an example of how "cargo cult" can be applied to anything. Agile programming, artificial intelligence, cleaning your desk, Go, hatred of Perl, key rotation, layoffs, MBA programs, microservices, new drugs, quantum computing, static linking, test-driven development, and updating the copyright year are just a few of the things that have been called "cargo cult".36 At this point, cargo cult is simply a lazy, meaningless attack.

The second problem with "cargo cult" is that the pop-culture description of cargo cults is historically inaccurate. Actual cargo cults are much more complex and include a much wider (and stranger) variety of behaviors. Cargo cults started before World War II and involve ships more often than airplanes. Cargo cults mix aspects of paganism and Christianity, often with apocalyptic ideas of the end of the current era, the overthrow of white rule, and the return of dead ancestors. The pop-culture description discards all this complexity, replacing it with a myth.

Finally, the cargo cult metaphor turns decades of harmful colonialism into a humorous anecdote. Feynman's description of cargo cults strips out the moral complexity: US soldiers show up with their cargo and planes, the indigenous residents amusingly misunderstand the situation, and everyone carries on. However, cargo cults really were a response to decades of colonial mistreatment, exploitation, and cultural destruction. Moreover, cargo cults were often harmful: expecting a bounty of cargo, villagers would throw away their money, kill their pigs, and stop tending their crops, resulting in famine. The pop-culture cargo cult erases the decades of colonial oppression, along with the cultural upheaval and deaths from World War II. Melanesians deserve to be more than the punch line in a cargo cult story.

Thus, it's time to move beyond the cargo cult metaphor.

Update: well, this sparked much more discussion on Hacker News than I expected. To answer some questions: Am I better or more virtuous than other people? No. Are you a bad person if you use the cargo cult metaphor? No. Is "cargo cult" one of many Hacker News comments that I'm tired of seeing? Yes (details). Am I criticizing Feynman? No. Do the Melanesians care about this? Probably not. Did I put way too much research into this? Yes. Is criticizing colonialism in the early 20th century woke? I have no response to that.

Notes and references

  1. As an illustration of the popularity of Feynman's "Cargo Cult Science" commencement address, it has been on Hacker News at least 15 times. 

  2. The first cargo cult definition above comes from The Trumpet Shall Sound; A Study of "Cargo" Cults in Melanesia. The second definition is from the Cargo Cult entry in The Open Encyclopedia of Anthropology. Written by Lamont Lindstrom, a professor who studies Melanesia, the entry comprehensively describes the history and variety of cargo cults, as well as current anthropological analysis.

    For an early anthropological theory of cargo cults, see An Empirical Case-Study: The Problem of Cargo Cults in "The Revolution in Anthropology" (Jarvie, 1964). This book categorizes cargo cults as an apocalyptic millenarian religious movement with a central tenet:

    When the millennium comes it will largely consist of the arrival of ships and/or aeroplanes loaded up with cargo; a cargo consisting either of material goods the natives long for (and which are delivered to the whites in this manner), or of the ancestors, or of both.
     
  3. European colonization brought pacification and a reduction in violence. The Cargo Cult: A Melanesian Type-Response to Change describes this pacification and termination of warfare as the Pax Imperii, suggesting that pacification came as a relief to the Melanesians: "They welcomed the cessation of many of the concomitants of warfare: the sneak attack, ambush, raiding, kidnapping of women and children, cannibalism, torture, extreme indignities inflicted on captives, and the continual need to be concerned with defense."

    Warfare among the Enga people of New Guinea is described in From Spears to M-16s: Testing the Imbalance of Power Hypothesis among the Enga. The Enga engaged in tribal warfare for reasons such as "theft of game from traps, quarrels over possessions, or work sharing within the group." The surviving losers were usually driven off the land and forced to settle elsewhere. In the 1930s and 1940s, the Australian administration banned tribal fighting and pacified much of the area. However, after the independence of Papua New Guinea in 1975, warfare increased along with the creation of criminal gangs known as Raskols (rascals). The situation worsened in the late 1980s with the introduction of shotguns and high-powered weapons to warfare. Now, Papua New Guinea has one of the highest crime rates in the world along with one of the lowest police-to-population ratios in the world. 

  4. When you hear tales of cannibalism, some skepticism is warranted. However, cannibalism is proved by the prevalence of kuru, or "laughing sickness", a fatal prion disease (transmissible spongiform encephalopathy) spread by consuming human brains. Also see Headhunters in Today's World, a 1972 National Geographic article that describes the baking of heads and the eating of brains. 

  5. A 1957 dictionary of Pidgin English can be found here. Linguistically, Tok Pisin is a creole, not a pidgin. 

  6. The modern view is that countries such as Great Britain acquired colonies against the will of the colonized, but the situation was more complex in the 19th century. Many Pacific islands desperately wanted to become European colonies, but were turned down for years because they were viewed as undesirable burdens.

    For example, Fiji viewed colonization as the solution to the chaos caused by the influx of white settlers in the 1800s. Fijian political leaders attempted to cede the islands to a European power that could end the lawlessness, but were turned down. In 1874, the situation changed when Disraeli was elected British prime minister. His pro-imperial policies, along with the Royal Navy's interest in obtaining a coaling station, concerns about American expansion, and pressure from anti-slavery groups, led to the annexation of Fiji by Britain. The situation in Fiji didn't particularly improve from annexation. (Fiji obtained independence almost a century later, in 1970.)

    As an example of the cost of a colony, Australia was subsidizing Papua New Guinea (with a population of 2.5 million) with over 100 million dollars a year in the early 1970s. (source)

  7. When reading about colonial Melanesia, one notices a constant background of police activity. Even when police patrols were very rare (annual in some parts), they were typically accompanied by arbitrary arrests and imprisonment. The most common cause for arrest was adultery; it may seem strange that the police were so concerned with it, but it turns out that adultery was the most common cause of warfare between tribes, and the authorities were trying to reduce the level of warfare. Cargo cult activity could be punished by six months of imprisonment. Jailing tended to be ineffective in stopping cargo cults, however, as it was viewed as evidence that the Europeans were trying to stop the cult leaders from spreading the cargo secrets that they had uncovered. 

  8. See The Trumpet Shall Sound

  9. The government imposed a head tax, which for the most part could only be paid through employment. A 1924 report states, "The primary object of the head tax was not to collect revenue but to create among the natives a need for money, which would make labour for Europeans desirable and would force the natives to accept employment." 

  10. The Papua Annual Report, 1919-20 includes a report on the "Vailala Madness", starting on page 118. It describes how villages with the "Vailala madness" had "ornamented flag-poles, long tables, and forms or benches, the tables being usually decorated with flowers in bottles of water in imitation of a white man's dining table." Village men would sit motionless with their backs to the tables. Their idleness infuriated the white men, who considered the villagers to be "fit subjects for a lunatic asylum." 

  11. The Vailala Madness is also described in The Missionary Review of the World, 1924. The Vailala Madness also involved seizure-like physical aspects, which typically didn't appear in later cargo cult behavior.

    The 1957 book The Trumpet Shall Sound: A Study of "Cargo" Cults in Melanesia is an extensive discussion of cargo cults, as well as earlier activity and movements. Chapter 4 covers the Vailala Madness in detail. 

  12. The battles in the Pacific have been extensively described from the American and Japanese perspectives, but the indigenous residents of these islands are usually left out of the narratives. This review discusses two books that provide the Melanesian perspective.

    I came across the incredible story of Sergeant Major Vouza of the Native Constabulary. While this story is not directly related to cargo cults, I wanted to include it as it illustrates the dedication and suffering of the New Guinea natives during World War II. Vouza volunteered to scout behind enemy lines for the Marines at Guadalcanal but he was captured by the Japanese, tied to a tree, tortured, bayonetted, and left for dead. He chewed through his ropes, made his way through the enemy force, and warned the Marines of an impending enemy attack.

    SgtMaj Vouza, British Solomon Islands Constabulary. From The Guadalcanal Campaign, 1949.

    Vouza described the event in a letter:

    Letter from SgtMaj Vouza to Hector MacQuarrie, 1984. From The Guadalcanal Campaign.

     

  13. The Japanese occupation and the cargo cult started by Tagareb are described in detail in Road Belong Cargo, pages 98-110. 

  14. See "John Frum Movement in Tanna", Oceania, March 1952. The New York Times described the John Frum movement in detail in a 1970 article: "On a Pacific island, they wait for the G.I. who became a God". A more modern article (2006) on John Frum is In John They Trust in the Smithsonian Magazine.

    As for the identity of John Frum, some claim that his name is short for "John from America". Others claim it is a modification of "John Broom" who would sweep away the whites. These claims lack evidence. 

  15. The quote is from Pacific Islands Monthly, November 1945 (link). The National Library of Australia has an extensive collection of issues of Pacific Islands Monthly online. Searching these magazines for "cargo cult" provides an interesting look at how cargo cults were viewed as they happened. 

  16. Scientific American had a long article titled Cargo Cults in May 1959, written by Peter Worsley, who also wrote the classic book The Trumpet Shall Sound: A Study of 'Cargo' Cults in Melanesia. The article lists the following features of cargo cults:

    • Myth of the return of the dead
    • Revival or modification of paganism
    • Introduction of Christian elements
    • Cargo myth
    • Belief that Negroes will become white men and vice versa
    • Belief in a coming messiah
    • Attempts to restore native political and economic control
    • Threats and violence against white men
    • Union of traditionally separate and unfriendly groups

    Different cargo cults contained different subsets of these features, but no single feature was common to all of them. The article is reprinted here; the detailed maps show the wide distribution of cargo cults. 

  17. See A Cargo Movement in the Eastern Central Highlands of New Guinea, Oceania, 1952. 

  18. The Attenborough Cargo Cult documentary can be watched on YouTube.

    I'll summarize some highlights with timestamps:
    5:20: A gate, palisade, and a cross all painted brilliant red.
    6:38: A cross decorated with a wooden bird and a shaving brush.
    7:00: A tall pole claimed to be a special radio mast to talk with John Frum.
    8:25: Interview with trader Bob Paul. He describes "troops" marching with wooden guns around the whole island.
    12:00: Preparation and consumption of kava, the intoxicating beverage.
    13:08: Interview with a local about John Frum.
    14:16: John Frum described as a white man and a big fellow.
    16:29: Attenborough asks, "You say John Frum has not come for 19 years. Isn't this a long time for you to wait?" The leader responds, "No, I can wait. It's you waiting for two thousand years for Christ to come and I must wait over 19 years." Attenborough accepts this as a fair point.
    17:23: Another scarlet gate, on the way to the volcano, with a cross, figure, and model airplane.
    22:30: Interview with the leader. There's a discussion of the radio, but Attenborough is not allowed to see it.
    24:21: John Frum is described as a white American.

    The expedition is also described in David Attenborough's 1962 book Quest in Paradise.  

  19. I have to criticize Mead's article for centering Americans as the heroes, almost a parody of American triumphalism. The title sets the article's tone: "Where Americans are Gods..." The article explains, "The Americans were lavish. They gave away Uncle Sam's property with a generosity which appealed mightily... so many kind, generous people, all alike, with such magnificent cargoes! The American servicemen, in turn, enjoyed and indulged the islanders."

    The article views cargo cults as a temporary stage before moving to a prosperous American-style society as islanders realized that "American things could come [...] only by work, education, persistence." A movement leader named Paliau is approvingly quoted: "We would like to have the things Americans have. [...] We think Americans have all these things because they live under law, without endless quarrels. So we must first set up a new society."

    On the other hand, by most reports, the Americans treated the residents of Melanesia much better than the colonial administrators. Americans paid the natives much more (which was viewed as overpaying them by the planters). The Americans treated the natives with much more respect; natives worked with Americans almost as equals. Finally, it appeared to the natives that black soldiers were treated as equals to white soldiers. (Obviously, this wasn't entirely accurate.)

    The Melanesian experience with Americans also strengthened Melanesian demands for independence. Following the war, the reversion to colonial administration produced a lot of discontent in the natives, who realized that their situation could be much better. (See World War II and Melanesian self-determination.) 

  20. The Johnson cult was analyzed in depth by Billings, an anthropologist who wrote about it in Cargo Cult as Theater: Political Performance in the Pacific. See also Australian Daily News, June 12, 1964, and Time Magazine, July 19, 1971. 

  21. In one unusual case, the islanders built an airstrip and airplanes did come. Specifically, the Miyanmin people of New Guinea hacked an airstrip out of the forest in 1966 using hand tools. The airstrip was discovered by a patrol and turned out to be usable, so Baptist missionaries made monthly landings, bringing medicine and goods for a store. It is pointed out that the only thing preventing this activity from being considered a cargo cult is that in this case, it was effective. See A Small Footnote to the 'Big Walk', p. 59. 

  22. See "New Guinea: Waiting for That Cargo", Time Magazine, July 19, 1971.  

  23. In this footnote, I'll list some interesting cargo cult stories that didn't fit into the body of the article.

    The 1964 US Bureau of Labor Statistics report on New Guinea describes cargo cults: "A simplified explanation of them is often given, namely that contact with Western culture has given the indigene a desire for a better economic standard of living; this desire has not been accompanied by the understanding that economic prosperity is achieved by human effort. The term cargo cult derives from the mystical expectation of the imminent arrival by sea or air of the good things of this earth. It is believed sufficient to build warehouses of leaves and prepare air strips to receive these goods. Activity in the food gardens and daily community routine chores is often neglected so that economic distress is engendered."

    Cargo Cult Activity in Tangu (Burridge) is a 1954 anthropological paper discussing stories of three cargo cults in Tangu, a region of New Guinea. The first involved dancing around a man in a trance, which was supposed to result in the appearance of "rice, canned meat, lava-lavas, knives, beads, etc." In the second story, villagers built a shed in a cemetery and then engaged in ritualized sex acts, expecting the shed to be filled with goods. However, the authorities forced the participants to dismantle the shed and throw it into the sea. In the third story, the protagonist is Mambu, who stowed away on a steamship to Australia, where he discovered the secrets of the white man's cargo. On his return, he collected money to help force the Europeans out, until he was jailed. He performed "miracles" by appearing outside jail as well as by producing money out of thin air.

    Reaction to Contact in the Eastern Highlands of New Guinea (Berndt, 1954) has a long story about Berebi, a leader who was promised a rifle, axes, cloth, knives, and valuable cowrie by a white spirit. Berebi convinces his villagers to build storehouses and they fill the houses with stones that would be replaced by goods. They take part in many pig sacrifices and various rituals, and endure attacks of shivering and paralysis, but they fail to receive any goods and Berebi concludes that the spirit deceived him. 

  24. Many anthropologists view the idea of cargo cults as controversial. One anthropologist states, "What I want to suggest here is that, similarly, cargo cults do not exist, or at least their symptoms vanish when we start to doubt that we can arbitrarily extract a few features from context and label them an institution." See A Note on Cargo Cults and Cultural Constructions of Change (1988). The 1992 paper The Yali Movement in Retrospect: Rewriting History, Redefining 'Cargo Cult' summarizes the uneasiness that many anthropologists have with the term "cargo cult", viewing it as "tantamount to an invocation of colonial power relationships."

    The book Cargo, Cult, and Culture Critique (2004) states, "Some authors plead quite convincingly for the abolition of the term itself, not only because of its troublesome implications, but also because, in their view, cargo cults do not even exist as an identifiable object of study." One paper states that the phrase is both inaccurate and necessary, proposing that it be written crossed-out (sous rature in Derrida's post-modern language). Another paper states: "Cargo cults defy definition. They are inherently troublesome and problematic," but concludes that the term is useful precisely because of this troublesome nature.

    At first, I considered the idea of abandoning the label "cargo cult" to be absurd, but after reading the anthropological arguments, it makes more sense. In particular, the category "cargo cult" is excessively broad, lumping together unrelated things and forcing them into a Procrustean ideal: John Frum has very little in common with Vailala Madness, let alone the Johnson Cult. I think that the term "cargo cult" became popular due to its catchy, alliterative name. (Journalists love alliterations such as "Digital Divide" or "Quiet Quitting".) 

  25. It was clear to the natives that the ancestors, and not the Europeans, must have created the cargo because the local Europeans were unable to repair complex mechanical devices themselves, but had to ship them off. These ships presumably took the broken devices back to the ancestral spirits to be repaired. Source: The Trumpet Shall Sound, p. 119. 

  26. The report from the 1943 patrol is discussed in Berndt's "A Cargo Movement in the Eastern Central Highlands of New Guinea", Oceania, Mar. 1953 (link), page 227. These radio houses are also discussed in The Trumpet Shall Sound, page 199. 

  27. Wooden airplanes are a staple of the pop-culture cargo cult story, but they are extremely rare in authentic cargo cults. I searched extensively, but could find just a few primary sources that involve airplanes.

    The closest match that I could find is Vanishing Peoples of the Earth, published by National Geographic in 1968, which mentions a New Guinea village that built a "crude wooden airplane", which they thought "offers the key to getting cargo".

    The photo below, from 1950, shows a cargo-house built in the shape of an airplane. (Note how abstract the construction is, compared to the realistic straw airplanes in faked photos.) The photographer mentioned that another cargo house was in the shape of a jeep, while in another village, the villagers gather in a circle at midnight to await the arrival of heavily laden cargo boats.

    The photo is from They Still Believe in Cargo Cult, Pacific Islands Monthly, May 1950.

    David Attenborough's Cargo Cult documentary shows a small wooden airplane, painted scarlet red. This model airplane is very small compared to the mock airplanes described in the pop-culture cargo cult.

    A closeup of the model airplane. From Attenborough's Cargo Cult documentary.

    The photo below shows the scale of the aircraft, directly in front of Attenborough. In the center, a figure of John Frum has a "scarlet coat and a white, European face." On the left, a cage contains a winged rat for some reason.

    David Attenborough visiting a John Frum monument on Tanna, near Sulfur Bay. From Attenborough's Cargo Cult documentary.

     

  28. The photo below shows another scene from the movie Mondo Cane that is very popular online in cargo cult articles. I suspect that the airplane is not authentic but was made for the movie.

    Screenshot from Mondo Cane, showing the cargo cultists posed in front of their airplane.

     

  29. The tale of women pursuing men was described in detail in the 1929 anthropological book The Sexual Life of Savages in North-Western Melanesia, specifically the section "Yausa—Orgiastic Assaults by Women" (pages 231-234). The anthropologist heard stories about these attacks from natives, but didn't observe them firsthand and remained skeptical. He concluded that "The most that can be said with certainty is that the yausa, if it happened at all, happened extremely rarely". Unlike the portrayal in Mondo Cane, these attacks on men were violent and extremely unpleasant (I won't go into details). Thus, it is very likely that this scene in Mondo Cane was staged, based on the stories. 

  30. The movie Mondo Cane directly influenced the pop-culture cargo cult as shown by several books. The book River of Tears: The Rise of the Rio Tinto-Zinc Mining Corporation explains cargo cults and how one tribe built an "aeroplane on a hilltop to attract the white man's aeroplane and its cargo", citing Mondo Cane. Likewise, the book Introducing Social Change states that underdeveloped nations are moving directly from ships to airplanes without building railroads, bizarrely using the cargo cult scene in Mondo Cane as an example. Finally, the religious book Open Letter to God uses the cargo cult in Mondo Cane as an example of the suffering of godless people. 

  31. Another possibility is that Feynman got his cargo cult ideas from the 1974 book Cows, Pigs, Wars and Witches: The Riddle of Culture. It has a chapter "Phantom Cargo", which starts with a description suspiciously similar to the scene in Mondo Cane:

    The scene is a jungle airstrip high in the mountains of New Guinea. Nearby are thatch-roofed hangars, a radio shack, and a beacon tower made of bamboo. On the ground is an airplane made of sticks and leaves. The airstrip is manned twenty-four hours a day by a group of natives wearing nose ornaments and shell armbands. At night they keep a bonfire going to serve as a beacon. They are expecting the arrival of an important flight: cargo planes filled with canned food, clothing, portable radios, wrist watches, and motorcycles. The planes will be piloted by ancestors who have come back to life. Why the delay? A man goes inside the radio shack and gives instructions into the tin-can microphone. The message goes out over an antenna constructed of string and vines: “Do you read me? Roger and out.” From time to time they watch a jet trail crossing the sky; occasionally they hear the sound of distant motors. The ancestors are overhead! They are looking for them. But the whites in the towns below are also sending messages. The ancestors are confused. They land at the wrong airport.
     
  32. Some other uses of the radio telescope photo as a cargo-cult item are Cargo cults, Melanesian cargo cults and the unquenchable thirst of consumerism, Cargo Cult : Correlation vs. Causation, Cargo Cult Agile, Stop looking for silver bullets, and Cargo Cult Investing

  33. Chariots of the Gods claims to be showing a cargo cult from an isolated island in the South Pacific. However, the large succulent plants in the scene are Euphorbia ingens and tree aloe, which grow in southern Africa, not the South Pacific. The rock formations at the very beginning look a lot like Matobo Hills in Zimbabwe. Note that these "Stone Age" people are astounded by the modern world but ignore the cameraman who is walking among them.

    Many cargo cult articles use photos that can be traced back to this film, such as The Scrum Cargo Cult, Is Your UX Cargo Cult, The Remote South Pacific Island Where They Worship Planes, The Design of Everyday Games, Don’t be Fooled by the Bitcoin Core Cargo Cult, The Dying Art of Design, Retail Apocalypse Not, You Are Not Google, and Cargo Cults. The general theme of these articles is that you shouldn't copy what other people are doing without understanding it, which is somewhat ironic. 

  34. The Jargon File defined "cargo-cult programming" in 1991:

    cargo-cult programming: n. A style of (incompetent) programming dominated by ritual inclusion of code or program structures that serve no real purpose. A cargo-cult programmer will usually explain the extra code as a way of working around some bug encountered in the past, but usually, neither the bug nor the reason the code avoided the bug were ever fully understood.

    The term cargo-cult is a reference to aboriginal religions that grew up in the South Pacific after World War II. The practices of these cults center on building elaborate mockups of airplanes and military style landing strips in the hope of bringing the return of the god-like airplanes that brought such marvelous cargo during the war. Hackish usage probably derives from Richard Feynman's characterization of certain practices as "cargo-cult science" in `Surely You're Joking, Mr. Feynman'.

    This definition of "cargo-cult programming" came from a 1991 Usenet post to alt.folklore.computers, quoting Kent Williams. The definition was added to the much-expanded 1991 Jargon File, which was published as The New Hacker's Dictionary in 1993. 

  35. Overuse of the cargo cult metaphor isn't specific to programming, of course. The book Cargo Cult: Strange Stories of Desire from Melanesia and Beyond describes how "cargo cult" has been applied to everything from advertisements, social welfare policy, and shoplifting to the Mormons, Euro Disney, and the state of New Mexico.

    This book, by Lamont Lindstrom, provides a thorough analysis of writings on cargo cults. It takes a questioning, somewhat trenchant look at these writings, illuminating the development of trends in them and their lack of objectivity. I recommend this book to anyone interested in the term "cargo cult" and its history. 

  36. Some more things that have been called "cargo cult" on Hacker News: the American worldview, ChatGPT fiction, copy and pasting code, hiring, HR, priorities, psychiatry, quantitative tests, religion, SSRI medication, the tech industry, Uber, and young-earth creationism

Alternatives to htmx

htmx is only one of many different libraries & frameworks that take the hypermedia oriented approach to building web applications. I have said before that I think the ideas of htmx / hypermedia are more important than htmx as an implementation.

Here are some of my favorite other takes on these ideas that I think are worth your consideration:

Unpoly

Unpoly is a wonderful, mature front end framework that has been used heavily (especially in the ruby community) for over a decade now. It offers best-in-class progressive enhancement and has many useful concepts such as layers and sophisticated form validation.

I interviewed the author, Henning Koch, here.

You can see a demo application using Unpoly here.

Datastar

Datastar started life as a proposed rewrite of htmx in typescript and with modern tooling. It eventually became its own project and takes an SSE-oriented approach to hypermedia.

Datastar combines functionality found in both htmx and Alpine.js into a single, tidy package that is smaller than htmx.

You can see many examples of Datastar in action here.

Alpine-ajax

Speaking of Alpine (which is a common library to use in conjunction with htmx), you should look at Alpine AJAX, an Alpine plugin which integrates htmx-like concepts directly into Alpine.

If you are already an Alpine enthusiast, Alpine AJAX allows you to stay in that world.

You can see many examples of Alpine AJAX in action here.

Hotwire Turbo

Turbo is a component of the Hotwire set of web development technologies by 37Signals, of Ruby on Rails fame. It is a polished front end framework that is used heavily in the rails community, but can be used with other backend technologies as well.

Some people who have had a bad experience with htmx have enjoyed Turbo.

htmz

htmz is a brilliant, tiny library that takes advantage of the fact that anchors and forms already have a target attribute that can target an iframe.

This, in combination with the location hash, is used to allow generalized transclusion.

This is the entire source of the library (I’m not joking):

  <iframe hidden name=htmz onload="setTimeout(()=>document.querySelector(contentWindow.location.hash||null)?.replaceWith(...contentDocument.body.childNodes))"></iframe>

Amazing!
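
To give a feel for the usage, here is a minimal sketch based on how the snippet above works (my own example, not one taken from the htmz docs): a link targets the hidden iframe, and the URL fragment picks which element on the current page gets replaced with the response's body content.

  <!-- loads /pages/about.html into the hidden iframe; the onload handler then
       replaces the element matching "#main" with the response's body content -->
  <a href="/pages/about.html#main" target=htmz>About</a>
  <div id="main">This gets replaced.</div>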

TwinSpark

TwinSpark is a library created by Alexander Solovyov that is similar to htmx, and includes features such as morphing.

It is being used in production on sites with 100k+ daily users.

jQuery

Finally, good ol’ jQuery has the load() function that will load a given url into an element. This method was part of the inspiration for intercooler.js, the precursor to htmx.

It is very simple to use:

  $( "#result" ).load( "ajax/test.html" );

and might be enough for your needs if you are already using jQuery.
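
For comparison, a rough htmx equivalent of that same pattern (just a sketch, using the standard hx-get and hx-target attributes) would be:

  <!-- on click, GET ajax/test.html and swap the response into #result -->
  <button hx-get="ajax/test.html" hx-target="#result">Load</button>
  <div id="result"></div>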

Conclusion

I hope that if htmx isn’t right for your application, one of these other libraries might be useful in allowing you to utilize the hypermedia model. There is a lot of exciting stuff happening in the hypermedia world right now, and these libraries each contribute to that.

Finally, if you have a moment, please give them (especially the newer ones) a star on Github: as an open source developer I know that Github stars are one of the best psychological boosts that help keep me going.

What's involved in getting a "modern" terminal setup?

Hello! Recently I ran a terminal survey and I asked people what frustrated them. One person commented:

There are so many pieces to having a modern terminal experience. I wish it all came out of the box.

My immediate reaction was “oh, getting a modern terminal experience isn’t that hard, you just need to….”, but the more I thought about it, the longer the “you just need to…” list got, and I kept thinking about more and more caveats.

So I thought I would write down some notes about what it means to me personally to have a “modern” terminal experience and what I think can make it hard for people to get there.

what is a “modern terminal experience”?

Here are a few things that are important to me, with which part of the system is responsible for them:

  • multiline support for copy and paste: if you paste 3 commands in your shell, it should not immediately run them all! That’s scary! (shell, terminal emulator)
  • infinite shell history: if I run a command in my shell, it should be saved forever, not deleted after 500 history entries or whatever. Also I want commands to be saved to the history immediately when I run them, not only when I exit the shell session (shell)
  • a useful prompt: I can’t live without having my current directory and current git branch in my prompt (shell)
  • 24-bit colour: this is important to me because I find it MUCH easier to theme neovim with 24-bit colour support than in a terminal with only 256 colours (terminal emulator)
  • clipboard integration between vim and my operating system so that when I copy in Firefox, I can just press p in vim to paste (text editor, maybe the OS/terminal emulator too)
  • good autocomplete: for example commands like git should have command-specific autocomplete (shell)
  • having colours in ls (shell config)
  • a terminal theme I like: I spend a lot of time in my terminal, I want it to look nice and I want its theme to match my terminal editor’s theme. (terminal emulator, text editor)
  • automatic terminal fixing: If a program prints out some weird escape codes that mess up my terminal, I want that to automatically get reset so that my terminal doesn’t get messed up (shell)
  • keybindings: I want Ctrl+left arrow to work (shell or application)
  • being able to use the scroll wheel in programs like less (terminal emulator and applications)

There are a million other terminal conveniences out there and different people value different things, but those are the ones that I would be really unhappy without.

how I achieve a “modern experience”

My basic approach is:

  1. use the fish shell. Mostly don’t configure it, except to (see the config sketch after this list):
    • set the EDITOR environment variable to my favourite terminal editor
    • alias ls to ls --color=auto
  2. use any terminal emulator with 24-bit colour support. In the past I’ve used GNOME Terminal, Terminator, and iTerm, but I’m not picky about this. I don’t really configure it other than to choose a font.
  3. use neovim, with a configuration that I’ve been very slowly building over the last 9 years or so (the last time I deleted my vim config and started from scratch was 9 years ago)
  4. use the base16 framework to theme everything
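
Here is a minimal sketch of what that fish config might look like (assuming neovim as the editor; it goes in ~/.config/fish/config.fish):

  # ~/.config/fish/config.fish
  set -gx EDITOR nvim          # export EDITOR for programs like git
  alias ls 'ls --color=auto'   # colours in ls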

A few things that affect my approach:

  • I don’t spend a lot of time SSHed into other machines
  • I’d rather use the mouse a little than come up with keyboard-based ways to do everything
  • I work on a lot of small projects, not one big project

some “out of the box” options for a “modern” experience

What if you want a nice experience, but don’t want to spend a lot of time on configuration? Figuring out how to configure vim in a way that I was satisfied with really did take me like ten years, which is a long time!

My best ideas for how to get a reasonable terminal experience with minimal config are:

  • shell: either fish or zsh with oh-my-zsh
  • terminal emulator: almost anything with 24-bit colour support, for example all of these are popular:
    • linux: GNOME Terminal, Konsole, Terminator, xfce4-terminal
    • mac: iTerm (Terminal.app doesn’t have 24-bit colour support)
    • cross-platform: kitty, alacritty, wezterm, or ghostty
  • shell config:
    • set the EDITOR environment variable to your favourite terminal text editor
    • maybe alias ls to ls --color=auto
  • text editor: this is a tough one, maybe micro or helix? I haven’t used either of them seriously but they both seem like very cool projects and I think it’s amazing that you can just use all the usual GUI editor commands (Ctrl-C to copy, Ctrl-V to paste, Ctrl-A to select all) in micro and they do what you’d expect. I would probably try switching to helix except that retraining my vim muscle memory seems way too hard. Also helix doesn’t have a GUI or plugin system yet.

Personally I wouldn’t use xterm, rxvt, or Terminal.app as a terminal emulator, because I’ve found in the past that they’re missing core features (like 24-bit colour in Terminal.app’s case) that make the terminal harder to use for me.

I don’t want to pretend that getting a “modern” terminal experience is easier than it is though – I think there are two issues that make it hard. Let’s talk about them!

issue 1 with getting to a “modern” experience: the shell

bash and zsh are by far the two most popular shells, and neither of them provide a default experience that I would be happy using out of the box, for example:

  • you need to customize your prompt
  • they don’t come with git completions by default, you have to set them up
  • by default, bash only stores 500 (!) lines of history and (at least on Mac OS) zsh is only configured to store 2000 lines, which is still not a lot (see the sketch after this list)
  • I find bash’s tab completion very frustrating, if there’s more than one match then you can’t tab through them
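
To give a sense of the fiddling involved, here's a rough sketch of the bash settings I'd reach for to fix the history problem (standard bash options, though the -1 "unlimited" values need bash 4.3 or newer):

  # in ~/.bashrc: keep unlimited history, and write it out after every command
  HISTSIZE=-1
  HISTFILESIZE=-1
  shopt -s histappend
  PROMPT_COMMAND="history -a; $PROMPT_COMMAND"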

And even though I love fish, the fact that it isn’t POSIX does make it hard for a lot of folks to make the switch.

Of course it’s totally possible to learn how to customize your prompt in bash or whatever, and it doesn’t even need to be that complicated (in bash I’d probably start with something like export PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$ ', or maybe use starship). But each of these “not complicated” things really does add up and it’s especially tough if you need to keep your config in sync across several systems.

An extremely popular solution to getting a “modern” shell experience is oh-my-zsh. It seems like a great project and I know a lot of people use it very happily, but I’ve struggled with configuration systems like that in the past – it looks like right now the base oh-my-zsh adds about 3000 lines of config, and often I find that having an extra configuration system makes it harder to debug what’s happening when things go wrong. I personally have a tendency to use the system to add a lot of extra plugins, make my system slow, get frustrated that it’s slow, and then delete it completely and write a new config from scratch.

issue 2 with getting to a “modern” experience: the text editor

In the terminal survey I ran recently, the most popular terminal text editors by far were vim, emacs, and nano.

I think the main options for terminal text editors are:

  • use vim or emacs and configure it to your liking, you can probably have any feature you want if you put in the work
  • use nano and accept that you’re going to have a pretty limited experience (for example I don’t think you can select text with the mouse and then “cut” it in nano)
  • use micro or helix which seem to offer a pretty good out-of-the-box experience, potentially occasionally run into issues with using a less mainstream text editor
  • just avoid using a terminal text editor as much as possible, maybe use VSCode, use VSCode’s terminal for all your terminal needs, and mostly never edit files in the terminal. Or I know a lot of people use code as their EDITOR in the terminal.

issue 3: individual applications

The last issue is that sometimes individual programs that I use are kind of annoying. For example on my Mac OS machine, /usr/bin/sqlite3 doesn’t support the Ctrl+Left Arrow keyboard shortcut. Fixing this to get a reasonable terminal experience in SQLite was a little complicated, I had to:

  • realize why this is happening (Mac OS won’t ship GNU tools, and “Ctrl-Left arrow” support comes from GNU readline)
  • find a workaround (install sqlite from homebrew, which does have readline support)
  • adjust my environment (put Homebrew’s sqlite3 in my PATH, roughly as sketched below)
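
The workaround ended up being something like this (a sketch; the exact paths depend on your Homebrew setup):

  brew install sqlite
  # Homebrew's sqlite is "keg-only", so it isn't added to the PATH automatically
  export PATH="$(brew --prefix sqlite)/bin:$PATH"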

I find that debugging application-specific issues like this is really not easy and often it doesn’t feel “worth it” – often I’ll end up just dealing with various minor inconveniences because I don’t want to spend hours investigating them. The only reason I was even able to figure this one out at all is that I’ve been spending a huge amount of time thinking about the terminal recently.

A big part of having a “modern” experience using terminal programs is just using newer terminal programs, for example I can’t be bothered to learn a keyboard shortcut to sort the columns in top, but in htop I can just click on a column heading with my mouse to sort it. So I use htop instead! But discovering new more “modern” command line tools isn’t easy (though I made a list here), finding ones that I actually like using in practice takes time, and if you’re SSHed into another machine, they won’t always be there.

everything affects everything else

Something I find tricky about configuring my terminal to make everything “nice” is that changing one seemingly small thing about my workflow can really affect everything else. For example right now I don’t use tmux. But if I needed to use tmux again (for example because I was doing a lot of work SSHed into another machine), I’d need to think about a few things, like:

  • if I wanted tmux’s copy to synchronize with my system clipboard over SSH, I’d need to make sure that my terminal emulator has OSC 52 support
  • if I wanted to use iTerm’s tmux integration (which makes tmux tabs into iTerm tabs), I’d need to change how I configure colours – right now I set them with a shell script that I run when my shell starts, but that means the colours get lost when restoring a tmux session.

and probably more things I haven’t thought of. “Using tmux means that I have to change how I manage my colours” sounds unlikely, but that really did happen to me and I decided “well, I don’t want to change how I manage colours right now, so I guess I’m not using that feature!”.

It’s also hard to remember which features I’m relying on – for example maybe my current terminal does have OSC 52 support and because copying from tmux over SSH has always Just Worked I don’t even realize that that’s something I need, and then it mysteriously stops working when I switch terminals.

change things slowly

Personally even though I think my setup is not that complicated, it’s taken me 20 years to get to this point! Because terminal config changes are so likely to have unexpected and hard-to-understand consequences, I’ve found that if I change a lot of terminal configuration all at once it makes it much harder to understand what went wrong if there’s a problem, which can be really disorienting.

So I usually prefer to make pretty small changes, and accept that changes might take me a REALLY long time to get used to. For example I switched from using ls to eza a year or two ago and while I like it (because eza -l prints human-readable file sizes by default) I’m still not quite sure about it. But also sometimes it’s worth it to make a big change, like I made the switch to fish (from bash) 10 years ago and I’m very happy I did.

getting a “modern” terminal is not that easy

Trying to explain how “easy” it is to configure your terminal really just made me think that it’s kind of hard and that I still sometimes get confused.

I’ve found that there’s never one perfect way to configure things in the terminal that will be compatible with every single other thing. I just need to try stuff, figure out some kind of locally stable state that works for me, and accept that if I start using a new tool it might disrupt the system and I might need to rethink things.

Down with toxic positivity

What builds confidence is not being told that you're going to make it (even when they know you probably won't) – that's called positivity, often toxic positivity, and it rarely helps anyone.

In fact, toxic positivity is that same attitude of insisting on seeing the glass as half full when the glass is actually already broken, bone dry, and has even been collected by the workplace safety crew.

It's also insisting on the empty talk about making lemonade when life gave the person nothing but rotten lemons, no sugar at all, not even a glass, and on top of that forbade them from using knives.

I detest it – you could tell, right?

What builds confidence is being shown, or having conditions created, so that even if you don't make it this time, that's okay – or at least so that you'll get the chance to try again, having learned something and without being harmed.

That is rare, and it's something to be valued.

The article "Abaixo a positividade tóxica" was originally published on the TRILUX site, by Augusto Campos.

Mentirinhas #2199

The post Mentirinhas #2199 appeared first on Mentirinhas.

Chess Zoo

The zoo takes special care to keep kings separated from opposite-color pieces as part of their conservation program to prevent mating in captivity.

A Real World wasm to htmx Port

When I was in college, I wrote some customer service software that tied together some custom AI models I trained, the OpenAI API, a database, and some social media APIs to make the first version of Sidekick.

Led astray

Over the next couple years I worked on adding more features and growing the user base. As a solo founder, I should have been focused on sales, marketing, and market discovery. Instead, as an engineer, I wanted to hand-craft the perfect web stack. I was firmly of the belief that the network gap between the frontend and the backend could be abstracted away, and I could make writing web apps as simple as writing native apps. Did this have anything to do with my business, product, or customers? Absolutely not, but as many technical founders do, I believed if I perfected the tech, the customers would materialize.

My design decisions were naive, but also reminiscent of what’s seen in industry today: I wanted the backend and frontend to share a language (Rust), I wanted compile-time checks across the network boundary, I wanted to write frontend code like it was an app (reactive), and I wanted nearly instant reload times. What I got out of it was a buggy mess.

I had invented a system where simple Rust functions could be tagged with a macro to generate both a backend route and a frontend request function, so you could call the function as if it were a standard function and it would run on the backend. A true poor-man’s GraphQL. My desire to write Rust on the frontend required compiling a WASM bundle. My desire for instant load times required isomorphic SSR. All of this complexity, for what was essentially a simple CRUD site.

A better way

By now, Sidekick has grown into a codebase responsible for a not-insignificant volume of traffic each day. At some point I looked into HTMX, multi-page websites, and HATEOAS, and realized that the Sidekick codebase, which had grown to ~36k lines spread over 8 different crates, could be folded into a single crate: a single binary that ran the backend, generated the frontend on demand through templating, and relied on HTMX for all the interactivity we required.

Large refactors typically have a bad track record so we wrote a quick and dirty simplified version of part of the site to convince ourselves it could work. After sufficient convincing, we undertook a full rewrite. All said and done, the rewrite took approximately 3 weeks of intense work. The results were dramatic:

  • 36k LOC -> 8k LOC
  • 8 crates -> 1 crate
  • ~5 bug reports / week -> ~1 bug report / week
  • More full nights of sleep

sidekick_port_loc.jpg

The rewrite went far better than I could have imagined. It definitely won’t be representative of every experience; our app was uniquely suited to HTMX. Axum and some custom middleware also went a long way for sharing common infrastructure across the site. Though we don’t have proper metrics, we’ve anecdotally noticed significantly improved load times.
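
To make the shape of this concrete, here's a minimal sketch (not Sidekick's actual code) of an Axum route returning an HTML fragment that htmx swaps into the page. It assumes axum 0.7 and tokio, and a hypothetical /orders endpoint:

  use axum::{response::Html, routing::get, Router};

  // The page: a button that asks htmx to GET /orders and swap the response into #orders.
  async fn index() -> Html<&'static str> {
      Html(concat!(
          "<script src=\"https://unpkg.com/htmx.org\"></script>",
          "<button hx-get=\"/orders\" hx-target=\"#orders\">Refresh</button>",
          "<div id=\"orders\"></div>"
      ))
  }

  // The fragment: in a real app this would come from the database plus a template engine.
  async fn orders() -> Html<String> {
      let rows = ["Order #1", "Order #2"];
      Html(rows.iter().map(|r| format!("<p>{r}</p>")).collect())
  }

  #[tokio::main]
  async fn main() {
      let app = Router::new()
          .route("/", get(index))
          .route("/orders", get(orders));
      let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
      axum::serve(listener, app).await.unwrap();
  }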

Reflection

I’ll finish by touching on the biggest benefit in my eyes: it’s tremendously easier to add new features as our customers request them. A feature that would have taken 2 weeks to fully implement, test and ship, now takes a day or two. As a small startup with a large number of customer demands, this is table stakes.

Sidekick hasn’t raised VC funding so I can’t afford to hire lots of devs. With HTMX we don’t need to.

Analog Memories

Patreon // inprint

Uber's service migration strategy circa 2014.

In early 2014, I joined as an engineering manager for Uber’s Infrastructure team. We were responsible for a wide range of things, including provisioning new services. While the overall team I led grew significantly over time, the subset working on service provisioning never grew beyond four engineers.

Those four engineers successfully migrated 1,000+ services onto a new, future-proofed service platform. More importantly, they did it while absorbing the majority, although certainly not the entirety, of the migration workload onto that small team rather than spreading it across the 2,000+ engineers working at Uber at the time. Their strategy serves as an interesting case study of how a team can drive strategy, even without any executive sponsor, by focusing on solving a pressing user problem, and providing effective ergonomics while doing so.

Note that after this introductory section, the remainder of this strategy will be written from the perspective of 2014, when it was originally developed.

More than a decade after this strategy was implemented, we have an interesting perspective from which to evaluate its impact. It’s fair to say that it had some meaningful, negative consequences by allowing the widespread proliferation of new services within Uber. Those services contributed to a messy architecture that had to go through cycles of internal cleanup over the following years.

As the principal author of this strategy, I’ve learned a lot from meditating on the fact that this strategy was wildly successful, that I think Uber is better off for having followed it, and that it also meaningfully degraded Uber’s developer experience over time. There’s both good and bad here; with a wide enough lens, all evaluations get complicated.

This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

Reading this document

To apply this strategy, start at the top with Policy. To understand the thinking behind this strategy, read sections in reverse order, starting with Explore, then Diagnose, and so on. Relative to the default structure, this document makes one tweak, folding the Operation section in with Policy.

More detail on this structure in Making a readable Engineering Strategy document.

Policy & Operation

We’ve adopted these guiding principles for extending Uber’s service platform:

  • Constrain manual provisioning allocation to maximize investment in self-service provisioning. The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is provisioning being permanently degraded by the influx of new service requests from newly hired product engineers.

  • Self-service must be safely usable by a new hire without Uber context. It is possible today to make a Puppet or Clusto change while provisioning a new service that negatively impacts the production environment. This must not be true in any self-service solution.

  • Move to structured requests, and out of tickets. Missing or incorrect information in provisioning requests creates significant delays in provisioning. Further, collecting this information is the first step of moving to a self-service process. As such, we can get paid twice by reducing errors in manual provisioning while also creating the interface for self-service workflows.

  • Prefer initializing new services with good defaults rather than requiring user input. Most new services are provisioned for new projects with strong timeline pressure but little certainty on their long-term requirements. These users cannot accurately predict their future needs, and expecting them to do so creates significant friction.

    Instead, the provisioning framework should suggest good defaults, and make it easy to change the settings later when users have more clarity. The gate from development environment to production environment is a particularly effective one for ensuring settings are refreshed.

We are materializing those principles into this sequenced set of tasks:

  1. Create an internal tool that coordinates service provisioning, replacing the process where teams request new services via Phabricator tickets. This new tool will maintain a schema of required fields that must be supplied, with the aim of eliminating the majority of back and forth between teams during service provisioning.

    In addition to capturing necessary data, this will also serve as our interface for automating various steps in provisioning without requiring future changes in the workflow to request service provisioning.

  2. Extend the internal tool to generate Puppet scaffolding for new services, reducing the potential for errors in two ways. First, the data supplied in the service provisioning request can be included directly in the rendered template. Second, this will eliminate most human tweaking of templates, where typos can create issues.

  3. Port allocation is a particularly high-risk element of provisioning, as reusing a port can break routing to an existing production service. As such, this will be the first area we fully automate, with the provisioning service supplying the allocated port rather than requiring requesting teams to provide an already allocated port.

    Doing this will require moving the port registry out of a Phabricator wiki page and into a database, which will allow us to guard access with a variety of checks.

  4. Manual assignment of new services to servers often leads to new services being allocated to already heavily utilized servers. We will replace the manual assignment with an automated system, and do so with the intention of migrating to the Mesos/Aurora cluster once it is available for production workloads.

Each week, we’ll review the size of the service provisioning queue, along with the service provisioning time to assess whether the strategy is working or needs to be revised.

Prolonged strategy testing

Although I didn’t have a name for this practice in 2014 when we created and implemented this strategy, the preceding paragraph captures an important truth of team-led, bottom-up strategy: the entire strategy was implemented in a prolonged strategy testing phase.

This is an important truth of all low-altitude, bottom-up strategy: you don’t have the authority to mandate compliance. An executive’s high-altitude strategy can be enforced despite not working, thanks to their organizational authority, but a team’s strategy will only endure while it remains effective.

Refine

In order to refine our diagnosis, we’ve created a systems model for service onboarding. This will allow us to simulate a variety of different approaches to our problem, and determine which approach, or combination of approaches, will be most effective.

A systems model of provisioning services at Uber circa 2014.

As we exercised the model, it became clear that:

  1. we are increasingly falling behind,
  2. hiring onto the service provisioning team is not a viable solution, and
  3. moving to a self-service approach is our only option.

While the model writeup justifies each of those statements in more detail, we’ll include two charts here. The first chart shows the status quo, where new service provisioning requests, labeled as Initial RequestedServices, quickly accumulate into a backlog.

Initial diagram of Uber service provisioning model without error states.

Second, we have a chart comparing the outcomes between the current status quo and a self-service approach.

Chart showing impact of self-service provisioning on provisioning rate.

In that chart, you can see that the service provisioning backlog in the self-service model remains steady, as represented by the SelfService RequestedServices line. Of the various attempts to find a solution, none of the others showed promise, including eliminating all errors in provisioning and increasing the team’s capacity by 500%.

Diagnose

We’ve diagnosed the current state of service provisioning at Uber as:

  • Many product engineering teams are aiming to leave the centralized monolith, which is generating two to three service provisioning requests each week. We expect this rate to increase roughly linearly with the size of the product engineering organization.

    Even if we disagree with this shift to additional services, there’s no team responsible for maintaining the extensibility of the monolith, and working in the monolith is the number one source of developer frustration, so we don’t have a practical counter proposal to offer engineers other than provisioning a new service.

  • The engineering organization is doubling every six months. Consequently, a year from now, we expect eight to twelve service provisioning requests every week.

  • Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing.

    Some additional headcount is being allocated to Service Reliability Engineers (SREs) who can take on the most nuanced, complicated service provisioning work. However, their bandwidth is already heavily constrained across many tasks, so relying on SREs is an insufficient solution.

  • The queue for service provisioning is already increasing in size as things are today. Barring some change, many services will not be provisioned in a timely fashion.

  • Today, provisioning a new service takes about a week, with numerous round trips between the requesting team and the provisioning team. Missing and incorrect information between teams is the largest source of delay in provisioning services.

    If the provisioning team has all the necessary information, and it’s accurate, then a new service can be provisioned in about three to four hours of work across configuration in Puppet, metadata in Clusto, allocating ports, assigning the service to servers, and so on.

  • There are few safeguards on port allocation, server assignment, and so on. It is easy to inadvertently cause a production outage during service provisioning unless done with attention to detail.

    Given our rate of hiring, training the engineering organization to use this unsafe toolchain is an impractical solution: even if we train the entire organization perfectly today, there will be just as many untrained individuals in six months. Further, product engineering leadership has no interest in their teams being diverted to service provisioning training.

  • It’s widely agreed across the infrastructure engineering team that essentially every component of service provisioning should be replaced as soon as possible, but there is no concrete plan to replace any of the core components. Further, there is no team accountable for replacing these components, which means the service provisioning team will either need to work around the current tooling or replace that tooling ourselves.

  • It’s urgent to unblock development of new services, but moving those new services to production is rarely urgent, and occurs after a long internal development period. Evidence of this is that requests to provision a new service generally come with significant urgency and internal escalations to management. After the service is provisioned for development, there are relatively few urgent escalations other than one-off requests for increased production capacity during incidents.

  • Another team within infrastructure is actively exploring adoption of Mesos and Aurora, but there’s no concrete timeline for when this might be available for our usage. Until they commit to supporting our workloads, we’ll need to find an alternative solution.

Explore

Uber’s server and service infrastructure today is composed of a handful of pieces. First, we run servers on-prem within a handful of colocations. Second, we describe each server in Puppet manifests to support repeatable provisioning of servers. Finally, we manage fleet and server metadata in a tool named Clusto, originally created by Digg, which allows us to populate Puppet manifests with server and cluster appropriate metadata during provisioning. In general, we agree that our current infrastructure is nearing the end of its lifespan, but it’s less obvious what the appropriate replacements are for each piece.

There’s significant internal opposition to running in the cloud, up to and including our CEO, so we don’t believe that will change in the foreseeable future. We do, however, believe there’s opportunity to change our service definitions from Puppet to something along the lines of Docker, and to change our metadata mechanism towards a more purpose-built solution like Mesos/Aurora or Kubernetes.

As a starting point, we find it valuable to read Large-scale cluster management at Google with Borg which informed some elements of the approach to Kubernetes, and Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center which describes the Mesos/Aurora approach.

If you’re wondering why there’s no mention of Borg, Omega, and Kubernetes, it’s because it wasn’t published until 2016, a year after this strategy was developed.

Within Uber, we have a number of ex-Twitter engineers who can speak with confidence to their experience operating with Mesos/Aurora at Twitter. We have been unable to find anyone to speak with that has production Kubernetes experience operating a comparably large fleet of 10,000+ servers, although presumably someone is operating–or close to operating–Kubernetes at that scale.

Our general belief of the evolution of the ecosystem at the time is described in this Wardley mapping exercise on service orchestration (2014).

Wardley map of evolution of service orchestration in 2014

One of the unknowns today is how the evolution of Mesos/Aurora and Kubernetes will look in the future. Kubernetes seems promising with Google’s backing, but there are few if any meaningful production deployments today. Mesos/Aurora has more community support and more production deployments, but the absolute number of deployments remains quite small, and there is no large-scale industry backer outside of Twitter.

Even further out, there’s considerable excitement around “serverless” frameworks, which seem like a likely future evolution, but canvassing the industry and our networks we’ve simply been unable to find enough real-world usage to make an active push towards this destination today.

Wardley mapping is introduced as one of the techniques for strategy refinement, but it can also be a useful technique for exploring a dynamic ecosystem like service orchestration in 2014.

Assembling each strategy requires exercising judgment on how to compile the pieces together most usefully, and in this case I found that the map fits most naturally with the rest of the exploration rather than in the more operationally-focused refinement section.

Service onboarding model for Uber (2014).

At the core of Uber’s service migration strategy (2014) is understanding the service onboarding process, and identifying the levers to speed up that process. Here we’ll develop a system model representing that onboarding process, and exercise the model to test a number of hypotheses about how to best speed up provisioning.

In this chapter, we’ll cover:

  1. Where the model of service onboarding suggested we focus our efforts
  2. Developing a system model using the lethain/systems package on GitHub. That model is available in the lethain/eng-strategy-models repository
  3. Exercising that model to learn from it

Let’s figure out what this model can teach us.

This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

Learnings

Even if we model this problem with a 100% success rate (e.g. no errors at all), then the backlog of requested new services continues to increase over time. This clarifies that the problem to be solved is not the quality of service the service provisioning team is providing, but rather that the fundamental approach is not working.

Initial diagram of Uber service provisioning model without error states.

Although hiring is tempting as a solution, our model suggests it is not a particularly valuable approach in this scenario. Even increasing the Service Provisioning team’s staff allocated to manually provisioning services by 500% doesn’t solve the backlog of incoming requests.

Chart showing impact of increased infrastructure engineering hiring on service provisioning.

If reducing errors doesn’t solve the problem, and increased hiring for the team doesn’t solve the problem, then we have to find a way to eliminate manual service provisioning entirely. The most promising candidate is moving to a self-service provisioning model, which our model shows solves the backlog problem effectively.

Chart showing impact of self-service provisioning on provisioning rate.

Refining our earlier statement, additional hiring may benefit the team if we are able to focus those hires on building self-service provisioning and can ramp their productivity faster than the increase in incoming service provisioning requests.

Sketch

Our initial sketch of service provisioning is a simple pipeline starting with requested services and moving step by step through to server capacity allocated. Some of these steps are likely much slower than others, but it gives a sense of the stages and where things might go wrong. It also gives us a sense of what we can measure to evaluate whether our approach to provisioning is working well.

A systems model of provisioning services at Uber circa 2014.

One element worth mentioning is the dotted lines from hiring rate to product engineers and from product engineers to requested services. These are called links: one stock influencing another without flowing directly into it.

A purist would correctly note that links should connect to flows rather than stocks. That is true! However, as we’ll encounter when we convert this sketch into a model, there are actually several counterintuitive elements here that are necessary to model this system but make the sketch less readable. As a modeler, you’ll frequently encounter these sorts of tradeoffs, and you’ll have to decide what choices serve your needs best in the moment.

The biggest element the initial model is missing is error flows, where things can sometimes go wrong in addition to going right. There are many ways things can go wrong, but we’re going to focus on modeling three error flows in particular:

  1. Missing/incorrect information occurs twice in this model, and throws a provisioning request back into the initial provisioning phase where information is collected.

    When this occurs during port assignment, this is a relatively small trip backwards. However, when it occurs in Puppet configuration, this is a significantly larger step backwards.

  2. Puppet error occurs in the second to final stock, Puppet configuration tested & merged. This sends requests back one step in the provisioning flow.

Updating our sketch to reflect these flows, we get a fairly complete, and somewhat nuanced, view of the service provisioning flow.

A systems model of provisioning services at Uber circa 2014, with error transitions

Note that the combination of these two flows introduces the possibility of a service being almost fully provisioned, but then traveling from Puppet testing back to Puppet configuration due to Puppet error, and then backwards again to the initial step due to Missing/incorrect information. This means it’s possible to lose almost all provisioning progress if everything goes wrong.

There are more nuances we could introduce here, but there’s already enough complexity here for us to learn quite a bit from this model.

Reason

Studying our sketches, a few things stand out:

  1. The hiring of product engineers is going to drive up service provisioning requests over time, but there’s no counterbalancing hiring of infrastructure engineers to work on service provisioning. This means there’s an implicit, but very real, deadline to scale this process independently of the size of the infrastructure engineering team.

    Even without building the full model, it’s clear that we have to either stop hiring product engineers, turn this into a self-service solution, or find a new mechanism to discourage service provisioning.

  2. The size of the error rates is going to influence results a great deal, particularly those for Missing/incorrect information. This is probably the most valuable place to start looking for efficiency improvements.

  3. Missing information errors are more expensive than the model implies, because they require coordination across teams to resolve. Conversely, Puppet testing errors are probably cheaper than the model implies, because they should be solvable within the same team and consequently benefit from a quick iteration loop.

Now we need to build a model that helps guide our inquiry into those questions.

Model

You can find the full implementation of this model on Github if you want to see the entirety rather than these emphasized snippets.

First, let’s get the success states working:

HiringRate(10)
ProductEngineers(1000)
[PotentialHires] > ProductEngineers @ HiringRate
[PotentialServices] > RequestedServices(10) @ ProductEngineers / 10
RequestedServices > InflightServices(0, 10) @ Leak(1.0)
InflightServices > PortNameAssigned @ Leak(1.0)
PortNameAssigned > PuppetGenerated @ Leak(1.0)
PuppetGenerated > PuppetConfigMerged @ Leak(1.0)
PuppetConfigMerged > ServerCapacityAllocated @ Leak(1.0)

As we run this model, we can see that the number of requested services grows significantly over time. This makes sense, as we’re only able to provision a maximum of ten services per round.

Initial diagram of Uber service provisioning model without error states.

However, it’s also the best case, because we’re not capturing the three error states:

  1. Unique port and name assignment can fail because of missing or incorrect information
  2. Puppet configuration can also fail due to missing or incorrect information.
  3. Puppet configurations can have errors in them, requiring rework.

Let’s update the model to include these failure modes, starting with unique port and name assignment. The error-free version looks like this:

InflightServices > PortNameAssigned @ Leak(1.0)

Now let’s add in an error rate, where 20% of requests are missing information and return to the requested services stock.

PortNameAssigned > PuppetGenerated @ Leak(0.8)
PortNameAssigned > RequestedServices @ Leak(0.2)

Then let’s do the same thing for puppet configuration errors:

# original version
PuppetGenerated > PuppetConfigMerged @ Leak(1.0)
# updated version with errors
PuppetGenerated > PuppetConfigMerged @ Leak(0.8)
PuppetGenerated > InflightServices @ Leak(0.2)

Finally, we’ll make a similar change to represent errors made in the Puppet templates themselves:

# original version
PuppetConfigMerged > ServerCapacityAllocated @ Leak(1.0)
# updated version with errors
PuppetConfigMerged > ServerCapacityAllocated @ Leak(0.8)
PuppetConfigMerged > PuppetGenerated @ Leak(0.2)

Even with relatively low error rates, we can see that the throughput of the system overall has been meaningfully impacted by introducing these errors.

Updated diagram of Uber service provisioning model with error states.

Now that we have the foundation of the model built, it’s time to start exercising the model to understand the problem space a bit better.

Exercise

We already know the errors are impacting throughput, but let’s start by narrowing down which of the errors matter most by increasing the error rate for each of them independently and comparing the impact.

To model this, we’ll create three new specifications, each of which increases one error rate from 20% to 50%, and see how the overall throughput of the system is impacted:

# test 1: port assignment errors increased
PortNameAssigned > PuppetGenerated @ Leak(0.5)
PortNameAssigned > RequestedServices @ Leak(0.5)
# test 2: puppet generated errors increased
PuppetGenerated > PuppetConfigMerged @ Leak(0.5)
PuppetGenerated > InflightServices @ Leak(0.5)
# test 3: puppet merged errors increased
PuppetConfigMerged > ServerCapacityAllocated @ Leak(0.5)
PuppetConfigMerged > PuppetGenerated @ Leak(0.5)

Comparing the impact of increasing the error rates from 20% to 50% in each of the three error loops, we can get a sense of the model’s sensitivity to each error.

Chart showing impact of increased error rates in different stages of provisioning.

This chart captures why exercising the model is so impactful: we’d assumed during sketching that errors in Puppet generation would matter the most because they cause a long trip backwards, but it turns out that a very high error rate early in the process matters even more, because there are still multiple other potential errors later on that compound its impact.

Next we can get a sense of the impact of hiring more people onto the service provisioning team to manually provision more services, which we can model by increasing the maximum size of the inflight services stock from 10 to 50.

# initial model
RequestedServices > InflightServices(0, 10) @ Leak(1.0)
# with 5x capacity!
RequestedServices > InflightServices(0, 50) @ Leak(1.0)

Unfortunately, we can see that even increasing the team’s capacity by 500% doesn’t solve the backlog of requested services.

Chart showing impact of increased infrastructure engineering hiring on service provisioning.

There’s some impact, but not that much, and the backlog of requested services remains extremely high. We can conclude that more infrastructure hiring isn’t the solution we need, but let’s see if moving to self-service is a plausible solution.

We can simulate the impact of moving to self-service by removing the maximum size from inflight services entirely:

# initial model
RequestedServices > InflightServices(0, 10) @ Leak(1.0)
# simulating self-service
RequestedServices > InflightServices(0) @ Leak(1.0)

We can see this finally solves the backlog.

Chart showing impact of self-service provisioning on provisioning rate.

At this point, we’ve exercised the model a fair amount and have a good sense of what it wants to tell us. We know which errors matter the most to invest in early, and we also know that we need to make the move to a self-service platform sometime soon.

Trimix

You don't want the nitrogen percentage to be too high or you run the risk of eutrophication.

Moon and Venus

A thin crescent moon and the bright dot of Venus shine through thin clouds. In the foreground, you can just make out the shapes of trees.

Throwback to 2019. I took this photo of the Moon and Venus with my old DSLR. It’s not perfect, but it really shows the benefits of a good lens and camera over a cellphone. In this case, I’m using my 100mm Canon Macro lens. The same one I used for taking photos of the eclipse in 2024.

Fixing Up Japanese Language Tags

Two years ago, I took a trip to Japan with Tess. I wrote up my experiences on that trip in various posts, and shared several photos.

Feeling perhaps overly-confident in my ability to read Japanese and my website engine’s ability to handle non-ASCII content, I tagged many of those pages with Japanese language tags: 日本, 日本語, 東京, 京都.

This led to some unwieldy URLs because of how Hugo rendered the Japanese language tag names into ASCII when generating file paths.

<ul>
    <li><a href="/tags/nature/">Nature</a></li>
    <li><a href="/tags/temples/">Temples</a></li>
    <li><a href="/tags/%E6%97%A5%E6%9C%AC/">日本</a></li>
</ul>

It’s not necessary to encode URLs this way when your document is UTF-8, but Hugo percent-encodes non-ASCII tag names when it generates these paths. Browsers should automatically handle encoding and decoding these URLs when sending HTTP requests over the wire. Nevertheless…
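
For what it’s worth, the encoded path is just the UTF-8 bytes of the tag name, hex-escaped. A tiny snippet (purely illustrative) reproduces the path from the listing above:

fn main() {
    // Hex-escape each UTF-8 byte of the tag. Real percent-encoding leaves
    // ASCII-safe characters alone, but every byte of 日本 is non-ASCII, so
    // escaping all of them gives the same result Hugo produced.
    let tag = "日本";
    let encoded: String = tag.bytes().map(|b| format!("%{b:02X}")).collect();
    println!("/tags/{encoded}/"); // prints /tags/%E6%97%A5%E6%9C%AC/
}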

I went through my post tags and updated all of the Japanese language tags to have English slugs and Japanese titles. Now, these URLs are more friendly for typing (and the file paths are easier to navigate in a terminal) while still showing up in posts and tag list pages with the Japanese name.

<ul>
    <li><a href="/tags/meta/">Meta</a></li>
    <li><a href="/tags/hugo/">Hugo</a></li>
    <li><a href="/tags/japanese/">日本語</a></li>
</ul>

One More Thing

I often tag posts with tags by location. For example, several of my posts from my Japan trip in 2023 are tagged with /tags/japan and with (e.g.) /tags/tokyo. For the place tags that are within a large region, like cities within countries, I added the larger region to the tag:

  • /tags/tokyo -> /tags/japan-tokyo
  • /tags/kyoto -> /tags/japan-kyoto

I only did this for the Japan tags for now. I’ll be making more such updates going forward.

Cícero #88


Great things about Rust that aren't just performance

Nearly every line of code I write for fun is in Rust. It's not because I need great performance, though that's a nice benefit. I write a lot of Rust because it's a joy to write code in. There is so much else to love about Rust beyond going fast without segfaults.

Here are a few of my favorite things about it. Note that these are not unique to Rust by any stretch! Other languages have similar combinations of features.

Expressive type safety

There are two aspects of Rust's type system that I really enjoy: type safety and expressiveness.

I got a taste of this expressiveness back when I learned Haskell, and had been seeking it. I found it in Rust. One of the other languages I use a fair amount at work1 is Go, and its type system is much harder for me to express ideas in. You can do it, but you're not getting the type system's help. Rust lets you put your design straight into types, with enums and structs and traits giving you a lot of room to maneuver.

All the while, it's also giving you good type safety! I can express a lot in Python, but I don't trust the code as much without robust tests. You don't have a compiler checking your work! It's remarkably helpful having Rust's compiler by your side, making sure that you're using types correctly and satisfying constraints on things. To call back to data races, the type system is one of the reasons we can prevent those! There are traits that tell you whether or not data is safe to send to another thread or to share with another thread. If your language doesn't have the equivalent of these traits, then you're probably relying on the programmer to ensure those properties!

That said, Rust's type system isn't an unmitigated good for me. It can take longer to get something up and running in Rust than in Python, for example, because of the rigidity of the type system: satisfy it or you don't run. And I find a lot of Rust that uses generics is very hard to read, feeling like it is a soup of traits. What we make generic is an implementation question, and a cultural question, so that isn't necessarily inherent to the language but does come strongly bundled to it.

It doesn't crash out as much

Okay, I have a beef with Go. They included Tony Hoare's "billion-dollar mistake": null pointers. Go gives you pointers, and they can be null2! This means that you can try to invoke methods on a null pointer, which can crash your program.

In contrast, Rust tries very very hard to make you never crash. You can make null pointers, but you have to use unsafe and if you do that, well, you're taking on the risk. If you have something which is nullable, you'd use an Option of it and then the type system will make sure you handle both cases.

The places you typically see crashes in Rust are when someone either intentionally panics, for an unrecoverable error, or when they unintentionally panic, if they use unwrap on an Option or Result. It's better to handle the other case explicitly.
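
As a small, generic illustration of the difference (not code from any particular project):

fn main() {
    // Read an optional port number from the environment.
    let maybe_port: Option<u16> = std::env::var("PORT").ok().and_then(|p| p.parse().ok());

    // maybe_port.unwrap() would panic if PORT is unset or invalid (the kind of
    // crash unwrap invites). Handling both cases explicitly avoids that:
    let port = match maybe_port {
        Some(p) => p,
        None => 8080, // fall back to a default instead of panicking
    };
    println!("listening on port {port}");
}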

Fortunately, you can configure the linter, clippy, to deny code that uses unwrap (or expect)! If you add this to your Cargo.toml file, it will reject any code which uses unwrap.

[lints.clippy]
unwrap_used = "deny"

Data race resistance

It's so hard to write concurrent code that works correctly. Data races are one of the biggest factors contributing to this. Rust's data race prevention is an incredible help for writing concurrent code.

Rust isn't immune to data races, but you have to work harder to make one happen. They're almost trivial to introduce in most languages, but in Rust, it's a lot harder! This happens because of the borrow checker, so it's harder to have multiple concurrent actors racing on the same data.

You get more control, when you want

With Rust, you know a lot more about what the CPU and memory will be doing than in many other languages. You can know this with C and C++, and newer systems programming languages like Zig. Rust is a little unique, to me, in being notably higher level than these languages while giving you ultimately the same amount of control (if you break glass enough, for some things).

You're still subject to the operating system, most of the time, so you can't control the CPU and memory fully, but you get a lot more control than in Python. Or even than Go, another language used when you need good performance.

This lets you predict what your code is going to do. You're not going to have surprise pauses for the garbage collector, and you're not going to have the runtime scheduler put some tasks off for a while. Instead, you know (or can determine) when memory will be deallocated. And you ultimately control when threads take tasks (though with async and the Tokio runtime, this gets much muddier and you do lose some of this control).

This predictability is really nice. It's useful in production, but it's also just really pleasant and comforting.

Mixing functional and imperative

Rust lets you write in a functional programming style, and it also lets you write in an imperative programming style. Most idiomatic code tends toward functional style, but there's a lot of code that uses imperative style effectively as well!

This is pretty unique in my experience, and I really like it. It means that I, the programmer, can pick the paradigm that best fits the problem at hand at any given moment. Code can be more expressive and clearer for the author and the team working on the code.

One of the cool things here, too, is that the two paradigms effectively translate between each other! If you use iterators in Rust, they typically compile down to the same machine code as the equivalent imperative loop. You often don't lose any efficiency from using either approach, and you're truly free to express yourself!
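
For example, a trivial illustration of the two styles computing the same thing:

// Functional style: an iterator chain.
fn sum_of_squares_functional(xs: &[i64]) -> i64 {
    xs.iter().map(|x| x * x).sum()
}

// Imperative style: an explicit loop over the same data.
fn sum_of_squares_imperative(xs: &[i64]) -> i64 {
    let mut total = 0;
    for x in xs {
        total += x * x;
    }
    total
}

fn main() {
    let xs = [1, 2, 3, 4];
    assert_eq!(sum_of_squares_functional(&xs), sum_of_squares_imperative(&xs));
    println!("both styles agree: {}", sum_of_squares_functional(&xs));
}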

Helpful compiler errors

A few other languages are renowned for their error message quality—Elm comes to mind. Rust is a standout here, as well.

Earlier in my career, I was abused by C++. Besides all the production crashes and the associated stress of that, the compiler errors for it were absolutely inscrutable. If you messed up a template, you'd sometimes get thousands of lines of errors—from one missing semicolon. And those wouldn't tell you what the error was, but rather, what came after the mistake.

In contrast, Rust's compiler error messages are usually pretty good at telling you exactly what the error is. They even provide suggestions for how to fix it, and where to read more about the error. Sometimes you get into a funny loop with these, where following the compiler's suggestions will lead to a loop of suggesting you change it another way and never get it to succeed, but that's fine. The fact that they're often useful is remarkable!

It's fun!

This is the big one for me. It's very subjective! I really like Rust for all the reasons listed above, and many more I've forgotten.

There are certainly painful times, and learning to love the borrow checker is a process (but one made faster if you've been abused by C++ before). But on balance, using Rust has been great.

It's fun having the ability to go ripping fast even when you don't need to. It's lovely having a type system that lets you express yourself (even if it lets you express yourself too much sometimes). The tooling is a joy.

All around, there's so much to love about Rust. The performance and safety are great, but they're the tip of the iceberg, and it's a language worth considering even when you don't need top performance.


1

Go was introduced at work because I advocated for it! I'd probably use Go for fun sometimes if I were not getting enough of it at my day job. It's a remarkably useful language, just not my favorite type system.

2

They're expressed as nil in Go, which is the zero value, and is the equivalent of null elsewhere.

Attention really is the great differentiator of the 21st century

Dear diary, I turned 51, so you’ll have to forgive me for having already started to have opinions about pharmacies.

After spending the pandemic years using almost exclusively the pharmacy at the gas station (for the literal convenience of getting more done on a single trip), it started letting me down frequently, and convenience is worth nothing if I always end up having to go to a second pharmacy right afterward.

Today was the day to restock my regular medications and, even though I filled up the car, I didn’t go there; I opted to go straight to the Rua das Três Farmácias, the “Street of the Three Pharmacies”, a street nearby that, curiously, has three pharmacies almost next door to each other.

The Rua das Três Farmácias is a bit out of my way. Literally, even, since I have to drive past it and then make a U-turn. But the convenience of knowing there’s almost no chance of leaving without some medication on my list makes up for it: if one is out of stock, I fill in the gaps at the other two.

Choices are renunciations, and the Rua das Três Farmácias is no different: I have to choose which pharmacy to enter first. The one I pick will most likely be the one that supplies all, or most, of the medications I came for, but each of them has its merits:

  • the first belongs to the same chain from Rio Grande do Sul as the gas station pharmacy, but has the most convenient parking;
  • the middle one belongs to a chain from my hometown, and usually has the widest selection and deepest stock;
  • the third belongs to the privacy-hostile national chain, has the best prices, and is conveniently located across from a snack bar with excellent pastéis.

When I’m in a pastel situation, the third one always wins. If it’s raining hard, the first one’s parking becomes especially inviting. But on other days, the contest between them has always been direct and open, with every one of them standing a chance.

Except that today something happened that may have permanently changed my future choices: I arbitrarily decided to stop at the middle pharmacy, and the clerk wasn’t behind the counter but near the door. He must have recognized me from a previous visit, because he promptly asked: “did you bring that little list of yours?”

Yes, dear diary, I’m organized, I bring my little list on paper. I handed it to him, and he said: “you can wait right here,” so I stayed by the register at the store entrance, without having to go to the counter in the back.

He came back from the stockroom with everything I needed, in some cases with more than one option (brand name or generic? 30 or 60 days? etc.), and in those cases he already told me which option had the lowest cost per dose.

In other words: he did everything you could expect from a professional. It even resembled that kind of service that is so rare today, the kind my parents and grandparents got at the single pharmacy they went to their entire lives, when the clerks worked there for many years and knew their customers’ habits and preferences.

Today’s labor market and service industry no longer work that way, and that’s not the clerks’ fault. But that’s the point: in this specific case, it came pretty close. That clerk’s attentiveness will certainly bring me back more often.

Except, of course, when I’m in need of a pastel.

The article “Attention really is the great differentiator of the 21st century” was originally published on TRILUX, by Augusto Campos.

Notes on 2025W01

The first week of 2025 and I am still thoroughly in vacation mode. Let me be the one millionth person to say: Happy New Year, Feliz Año Nuevo, and あけましておめでとう (akemashite omedetou).

Tess and I returned from our holiday trip to Massachusetts with Erin. We had a great time visiting Tess’ family. Erin and I were both excited to see a little snow too.

We spent New Years with some close friends eating caviar and amazing roast beef, and playing Sorry! with the girls. In the morning, we went over to some other friends’ place for brunch.

The remainder of the week, an attempt was made to relax, but it got hung up on dealing with a case of head lice, moving logistics, and some other things.

Despite all that, I played a lot of Caves of Qud.


I published some more catch-up posts and photos for 2024:


I added my StoryGraph profile to my Where Am I page. Find me there if you want to see what I’m reading.


And now, some links:

  • modernity is stupid is a great rant about technology, politics, and the state of the world. I read it a few weeks ago but forgot to share it.
  • An Unreasonable Amount of Time by Allen Pike, suggesting that magic, in the Penn and Teller sense, is simply devoting far more time to something than most other people would consider reasonable.

Pi in the Pentium: reverse-engineering the constants in its floating-point unit

Intel released the powerful Pentium processor in 1993, establishing a long-running brand of high-performance processors.1 The Pentium includes a floating-point unit that can rapidly compute functions such as sines, cosines, logarithms, and exponentials. But how does the Pentium compute these functions? Earlier Intel chips used binary algorithms called CORDIC, but the Pentium switched to polynomials to approximate these transcendental functions much faster. The polynomials have carefully-optimized coefficients that are stored in a special ROM inside the chip's floating-point unit. Even though the Pentium is a complex chip with 3.1 million transistors, it is possible to see these transistors under a microscope and read out these constants. The first part of this post discusses how the floating point constant ROM is implemented in hardware. The second part explains how the Pentium uses these constants to evaluate sin, log, and other functions.

The photo below shows the Pentium's thumbnail-sized silicon die under a microscope. I've labeled the main functional blocks; the floating-point unit is in the lower right. The constant ROM (highlighted) is at the bottom of the floating-point unit. Above the floating-point unit, the microcode ROM holds micro-instructions, the individual steps for complex instructions. To execute an instruction such as sine, the microcode ROM directs the floating-point unit through dozens of steps to compute the approximation polynomial using constants from the constant ROM.

Die photo of the Intel Pentium processor with the floating point constant ROM highlighted in red. Click this image (or any other) for a larger version.

Finding pi in the constant ROM

In binary, pi is 11.00100100001111110... but what does this mean? To interpret this, the value 11 to the left of the binary point is simply 3 in binary. (The “binary point” is the same as a decimal point, except for binary.) The digits to the right of the binary point have the values 1/2, 1/4, 1/8, and so forth. Thus, the binary value 11.001001000011... corresponds to 3 + 1/8 + 1/64 + 1/2048 + 1/4096 + ..., which matches the decimal value of pi. Since pi is irrational, the bit sequence is infinite and non-repeating; the value in the ROM is truncated to 67 bits and stored as a floating point number.

A floating point number is represented by two parts: the exponent and the significand. Floating point numbers include very large numbers such as 6.02×10^23 and very small numbers such as 1.055×10^-34. In decimal, 6.02×10^23 has a significand (or mantissa) of 6.02, multiplied by a power of 10 with an exponent of 23. In binary, a floating point number is represented similarly, with a significand and exponent, except the significand is multiplied by a power of 2 rather than 10. For example, pi is represented in floating point as 1.1001001...×2^1.
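
As an aside, you can see the same significand/exponent split in an ordinary IEEE 754 double, which is narrower than the Pentium's 67-bit internal significand but organized along the same lines. This short snippet (illustrative only, and specific to the 64-bit double format) pulls the fields apart:

fn main() {
    let pi = std::f64::consts::PI;
    let bits = pi.to_bits();

    // IEEE 754 double: 1 sign bit, 11 exponent bits (biased by 1023),
    // 52 stored fraction bits with an implicit leading 1.
    let sign = bits >> 63;
    let exponent = ((bits >> 52) & 0x7ff) as i64 - 1023;
    let fraction = bits & ((1u64 << 52) - 1);

    println!("sign = {sign}, exponent = {exponent}");
    println!("significand = 1.{fraction:052b}"); // 1.1001001000011111...
}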

The diagram below shows how pi is encoded in the Pentium chip. Zooming in shows the constant ROM. Zooming in on a small part of the ROM shows the rows of transistors that store the constants. The arrows point to the transistors representing the bit sequence 11001001, where a 0 bit is represented by a transistor (vertical white line) and a 1 bit is represented by no transistor (solid dark silicon). Each magnified black rectangle at the bottom has two potential transistors, storing two bits. The key point is that by looking at the pattern of stripes, we can determine the pattern of transistors and thus the value of each constant, pi in this case.

A portion of the floating-point ROM, showing the value of pi. Click this image (or any other) for a larger version.

The bits are spread out because each row of the ROM holds eight interleaved constants to improve the layout. Above the ROM bits, multiplexer circuitry selects the desired constant from the eight in the activated row. In other words, by selecting a row and then one of the eight constants in the row, one of the 304 constants in the ROM is accessed. The ROM stores many more digits of pi than shown here; the diagram shows 8 of the 67 significand bits.

Implementation of the constant ROM

The ROM is built from MOS (metal-oxide-semiconductor) transistors, the transistors used in all modern computers. The diagram below shows the structure of an MOS transistor. An integrated circuit is constructed from a silicon substrate. Regions of the silicon are doped with impurities to create "diffusion" regions with desired electrical properties. The transistor can be viewed as a switch, allowing current to flow between two diffusion regions called the source and drain. The transistor is controlled by the gate, made of a special type of silicon called polysilicon. Applying voltage to the gate lets current flow between the source and drain, which is otherwise blocked. Most computers use two types of MOS transistors: NMOS and PMOS. The two types have similar construction but reverse the doping; NMOS uses n-type diffusion regions as shown below, while PMOS uses p-type diffusion regions. Since the two types are complementary (C), circuits built with the two types of transistors are called CMOS.

Structure of a MOSFET in an integrated circuit.

The image below shows how a transistor in the ROM looks under the microscope. The pinkish regions are the doped silicon that forms the transistor's source and drain. The vertical white line is the polysilicon that forms the transistor's gate. For this photo, I removed the chip's three layers of metal, leaving just the underlying silicon and the polysilicon. The circles in the source and drain are tungsten contacts that connect the silicon to the metal layer above.

One transistor in the constant ROM.

The diagram below shows eight bits of storage. Each of the four pink silicon rectangles has two potential transistors. If a polysilicon gate crosses the silicon, a transistor is formed; otherwise there is no transistor. When a select line (horizontal polysilicon) is energized, it will turn on all the transistors in that row. If a transistor is present, the corresponding ROM bit is 0 because the transistor will pull the output line to ground. If a transistor is absent, the ROM bit is 1. Thus, the pattern of transistors determines the data stored in the ROM. The ROM holds 26144 bits (304 words of 86 bits) so it has 26144 potential transistors.

Eight bits of storage in the ROM.

The photo below shows the bottom layer of metal (M1): vertical metal wires that provide the ROM outputs and supply ground to the ROM. (These wires are represented by gray lines in the schematic above.) The polysilicon transistors (or gaps as appropriate) are barely visible between the metal lines. Most of the small circles are tungsten contacts to the silicon or polysilicon; compare with the photo above. Other circles are tungsten vias to the metal layer on top (M2), horizontal wiring that I removed for this photo. The smaller metal "tabs" act as jumpers between the horizontal metal select lines in M2 and the polysilicon select lines. The top metal layer (M3, not visible) has thicker vertical wiring for the chip's primary distribution of power and ground. Thus, the three metal layers alternate between horizontal and vertical wiring, with vias between the layers.

A closeup of the ROM showing the bottom metal layer.

The ROM is implemented as two grids of cells, shown below: one to hold exponents and one to hold significands. The exponent grid (on the left) has 38 rows and 144 columns of transistors, while the significand grid (on the right) has 38 rows and 544 columns. To make the layout work better, each row holds eight different constants; the bits are interleaved so the ROM holds the first bit of eight constants, then the second bit of eight constants, and so forth. Thus, with 38 rows, the ROM holds 304 constants; each constant has 18 bits in the exponent part and 68 bits in the significand section.

A diagram of the constant ROM and supporting circuitry. Most of the significand ROM has been cut out to make it fit.

The exponent part of each constant consists of 18 bits: a 17-bit exponent and one bit for the sign of the significand and thus the constant. There is no sign bit for the exponent because the exponent is stored with 65535 (0x0ffff) added to it, avoiding negative values. The 68-bit significand entry in the ROM consists of a mysterious flag bit2 followed by the 67-bit significand; the first bit of the significand is the integer part and the remainder is the fractional part.3 The complete contents of the ROM are in the appendix at the bottom of this post.

To select a particular constant, the "row select" circuitry between the two sections activates one of the 38 rows. That row provides 144+544 bits to the selection circuitry above the ROM. This circuitry has 86 multiplexers; each multiplexer selects one bit out of the group of 8, selecting the desired constant. The significand bits flow into the floating-point unit datapath circuitry above the ROM. The exponent circuitry, however, is in the upper-left corner of the floating-point unit, a considerable distance from the ROM, so the exponent bits travel through a bus to the exponent circuitry.

The row select circuitry consists of gates to decode the row number, along with high-current drivers to energize the selected row in the ROM. The photo below shows a closeup of two row driver circuits, next to some ROM cells. At the left, PMOS and NMOS transistors implement a gate to select the row. Next, larger NMOS and PMOS transistors form part of the driver. The large square structures are bipolar NPN transistors; the Pentium is unusual because it uses both bipolar transistors and CMOS, a technique called BiCMOS.4 Each driver occupies as much height as four rows of the ROM, so there are four drivers arranged horizontally; only one is visible in the photo.

ROM drivers implemented with BiCMOS.

Structure of the floating-point unit

The floating-point unit is structured with data flowing vertically through horizontal functional units, as shown below. The functional units—adders, shifters, registers, and comparators—are arranged in rows. This collection of functional units with data flowing through them is called the datapath.5

The datapath of the floating-point unit. The ROM is at the bottom.

Each functional unit is constructed from cells, one per bit, with the high-order bit on the left and the low-order bit on the right. Each cell has the same width—38.5 µm—so the functional units can be connected like Lego blocks snapping together, minimizing the wiring. The height of a functional unit varies as needed, depending on the complexity of the circuit. Functional units typically have 69 bits, but some are wider, so the edges of the datapath circuitry are ragged.

This cell-based construction explains why the ROM has eight constants per row. A ROM bit requires a single transistor, which is much narrower than, say, an adder. Thus, putting one bit in each 38.5 µm cell would waste most of the space. Compacting the ROM bits into a narrow block would also be inefficient, requiring diagonal wiring to connect each ROM bit to the corresponding datapath bit. By putting eight bits for eight different constants into each cell, the width of a ROM cell matches the rest of the datapath and the alignment of bits is preserved. Thus, the layout of the ROM in silicon is dense, efficient, and matches the width of the rest of the floating-point unit.

Polynomial approximation: don't use a Taylor series

Now I'll move from the hardware to the constants. If you look at the constant ROM contents in the appendix, you may notice that many constants are close to reciprocals or reciprocal factorials, but don't quite match. For instance, one constant is 0.1111111089, which is close to 1/9, but visibly wrong. Another constant is almost 1/13! (factorial) but wrong by 0.1%. What's going on?

The Pentium uses polynomials to approximate transcendental functions (sine, cosine, tangent, arctangent, and base-2 powers and logarithms). Intel's earlier floating-point units, from the 8087 to the 486, used an algorithm called CORDIC that generated results a bit at a time. However, the Pentium takes advantage of its fast multiplier and larger ROM and uses polynomials instead, computing results two to three times faster than the 486 algorithm.

You may recall from calculus that a Taylor series polynomial approximates a function near a point (typically 0). For example, the Taylor series for sine, through the x^9 term, is:

sin(x) ≈ x - x^3/3! + x^5/5! - x^7/7! + x^9/9!

Using the five terms shown above generates a function that looks indistinguishable from sine in the graph below. However, it turns out that this approximation has too much error to be useful.

Plot of the sine function and the Taylor series approximation.

The problem is that a Taylor series is very accurate near 0, but the error soars near the edges of the argument range, as shown in the graph on the left below. When implementing a function, we want the function to be accurate everywhere, not just close to 0, so the Taylor series isn't good enough.

The absolute error for a Taylor-series approximation to sine (5 terms), over two different argument ranges.
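
You can reproduce the flavor of this error growth yourself. The snippet below (illustrative only, using the plain 1/n! Taylor coefficients rather than anything from the Pentium's ROM) evaluates the 5-term polynomial with Horner's method and prints how the error grows as the argument moves away from zero:

// Taylor polynomial for sine with terms through x^9.
fn taylor_sin(x: f64) -> f64 {
    let x2 = x * x;
    x * (1.0
        + x2 * (-1.0 / 6.0
            + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0 + x2 * (1.0 / 362880.0)))))
}

fn main() {
    for &x in &[0.01_f64, 0.25, 0.5, 1.0, std::f64::consts::FRAC_PI_2] {
        let err = (taylor_sin(x) - x.sin()).abs();
        println!("x = {x:>6.4}  |taylor - sin| = {err:.3e}");
    }
}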

One improvement is called range reduction: shrinking the argument to a smaller range so you're in the accurate flat part.6 The graph on the right looks at the Taylor series over the smaller range [-1/32, 1/32]. This decreases the error dramatically, by about 22 orders of magnitude (note the scale change). However, the error still shoots up at the edges of the range in exactly the same way. No matter how much you reduce the range, there is almost no error in the middle, but the edges have a lot of error.7

How can we get rid of the error near the edges? The trick is to tweak the coefficients of the Taylor series in a special way that will increase the error in the middle, but decrease the error at the edges by much more. Since we want to minimize the maximum error across the range (called minimax), this tradeoff is beneficial. Specifically, the coefficients can be optimized by a process called the Remez algorithm.8 As shown below, changing the coefficients by less than 1% dramatically improves the accuracy. The optimized function (blue) has much lower error over the full range, so it is a much better approximation than the Taylor series (orange).

Comparison of the absolute error from the Taylor series and a Remez-optimized polynomial, both with maximum term x^9. This Remez polynomial is not one from the Pentium.

To summarize, a Taylor series is useful in calculus, but shouldn't be used to approximate a function. You get a much better approximation by modifying the coefficients very slightly with the Remez algorithm. This explains why the coefficients in the ROM almost, but not quite, match a Taylor series.

Arctan

I'll now look at the Pentium's constants for different transcendental functions. The constant ROM contains coefficients for two arctan polynomials, one for single precision and one for double precision. These polynomials almost match the Taylor series, but have been modified for accuracy. The ROM also holds the values for arctan(1/32) through arctan(32/32); the range reduction process uses these constants with a trig identity to reduce the argument range to [-1/64, 1/64].9 You can see the arctan constants in the Appendix.

The graph below shows the error for the Pentium's arctan polynomial (blue) versus the Taylor series of the same length (orange). The Pentium's polynomial is superior due to the Remez optimization. Although the Taylor series polynomial is much flatter in the middle, the error soars near the boundary. The Pentium's polynomial wiggles more but it maintains a low error across the whole range. The error in the Pentium polynomial blows up outside this range, but that doesn't matter.

Comparison of the Pentium's double-precision arctan polynomial to the Taylor series.

Trig functions

Sine and cosine each have two polynomial implementations, one with 4 terms in the ROM and one with 6 terms in the ROM. (Note that coefficients of 1 are not stored in the ROM.) The constant table also holds 16 constants such as sin(36/64) and cos(18/64) that are used for argument range reduction.10 The Pentium computes tangent by dividing the sine by the cosine. I'm not showing a graph because the Pentium's error came out worse than the Taylor series, so either I have an error in a coefficient or I'm doing something wrong.

Exponential

The Pentium has an instruction to compute a power of two.11 There are two sets of polynomial coefficients for exponential, one with 6 terms in the ROM and one with 11 terms in the ROM. Curiously, the polynomials in the ROM compute e^x, not 2^x. Thus, the Pentium must scale the argument by ln(2), a constant that is in the ROM. The error graph below shows the advantage of the Pentium's polynomial over the Taylor series polynomial.

The Pentium's 6-term exponential polynomial, compared with the Taylor series.

The polynomial handles the narrow argument range [-1/128, 1/128]. Observe that when computing a power of 2 in binary, exponentiating the integer part of the argument is trivial, since it becomes the result's exponent. Thus, the function only needs to handle the range [1, 2]. For range reduction, the constant ROM holds 64 values of the form 2^(n/128)-1. To reduce the range from [1, 2] to [-1/128, 1/128], the closest n/128 is subtracted from the argument and then the result is multiplied by the corresponding constant in the ROM. The constants are spaced irregularly, presumably for accuracy; some are in steps of 4/128 and others are in steps of 2/128.
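
To make this concrete, here is a minimal Python sketch of this style of range reduction. It is my reconstruction for illustration, not Intel's microcode: the table name and the short polynomial are made up, the table holds 2^(n/128) values rather than the ROM's 2^(n/128)-1 form, and the real coefficients are Remez-adjusted.

    import math

    # Hypothetical table standing in for the ROM's range-reduction constants.
    EXP2_TABLE = [2.0 ** (n / 128) for n in range(129)]

    def exp2(x):
        i = math.floor(x)           # integer part becomes the result's exponent
        f = x - i                   # fractional part, in [0, 1)
        n = round(f * 128)          # pick the closest n/128
        r = f - n / 128             # reduced argument, within [-1/128, 1/128]
        t = r * math.log(2)         # the ROM polynomials compute e^x, so scale by ln(2)
        poly = 1 + t + t*t/2 + t**3/6 + t**4/24   # short Taylor-like stand-in for e^t
        return math.ldexp(EXP2_TABLE[n] * poly, i)

    print(exp2(3.7), 2 ** 3.7)      # the two values should agree closely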

Logarithm

The Pentium can compute base-2 logarithms.12 The coefficients define polynomials for the hyperbolic arctan, which is closely related to log. See the comments for details. The ROM also has 64 constants for range reduction: log2(1+n/64) for odd n from 1 to 63. The unusual feature of these constants is that each constant is split into two pieces to increase the bits of accuracy: the top part has 40 bits of accuracy and the bottom part has 67 bits of accuracy, providing a 107-bit constant in total. The extra bits are required because logarithms are hard to compute accurately.

Other constants

The x87 floating-point instruction set provides direct access to a handful of constants—0, 1, pi, log2(10), log2(e), log10(2), and ln(2)—so these constants are stored in the ROM. (These logs are useful for changing the base for logs and exponentials.) The ROM holds other constants for internal use by the floating-point unit such as -1, 2, 7/8, 9/8, pi/2, pi/4, and 2log2(e). The ROM also holds bitmasks for extracting part of a word, for instance accessing 4-bit BCD digits in a word. Although I can interpret most of the values, there are a few mysteries such as a mask with the inscrutable value 0x3e8287c. The ROM has 34 unused entries at the end; these entries hold words that include the descriptive hex value 0xbad or perhaps 0xbadfc for "bad float constant".

How I examined the ROM

To analyze the Pentium, I removed the metal and oxide layers with various chemicals (sulfuric acid, phosphoric acid, Whink). (I later discovered that simply sanding the die works surprisingly well.) Next, I took many photos of the ROM with a microscope. The feature size of this Pentium is 800 nm, just slightly larger than visible light (380-700 nm). Thus, the die can be examined under an optical microscope, but it is getting close to the limits. To determine the ROM contents, I tediously went through the ROM images, examining each of the 26144 bits and marking each transistor. After figuring out the ROM format, I wrote programs to combine simple functions in many different combinations to determine the mathematical expression such as arctan(19/32) or log2(10). Because the polynomial constants are optimized and my ROM data has bit errors, my program needed checks for inexact matches, both numerically and bitwise. Finally, I had to determine how the constants would be used in algorithms.
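
As a rough illustration of that matching step, here is a much-simplified Python sketch. The function names and tolerance are made up for illustration; the real search combined many more functions and also compared bit patterns to tolerate bit errors.

    import math

    def candidates():
        # A few of the simple expressions that constants might match.
        for n in range(1, 33):
            yield f"arctan({n}/32)", math.atan(n / 32)
        for n in range(1, 64, 2):
            yield f"log2(1+{n}/64)", math.log2(1 + n / 64)
        for n in range(1, 20):
            yield f"1/{n}", 1 / n
            yield f"1/{n}!", 1 / math.factorial(n)

    def identify(value, rel_tol=1e-5):
        # Loose tolerance, since Remez-adjusted constants are deliberately inexact.
        return [name for name, v in candidates() if math.isclose(value, v, rel_tol=rel_tol)]

    print(identify(0.1111111089))   # matches 1/9 (an arctan polynomial coefficient)
    print(identify(0.7853981634))   # matches arctan(32/32), i.e. pi/4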

Conclusions

By examining the Pentium's floating-point ROM under a microscope, it is possible to extract the 304 constants stored in the ROM. I was able to determine the meaning of most of these constants and deduce some of the floating-point algorithms used by the Pentium. These constants illustrate how polynomials can efficiently compute transcendental functions. Although Taylor series polynomials are well known, they are surprisingly inaccurate and should be avoided. Minor changes to the coefficients through the Remez algorithm, however, yield much better polynomials.

In a previous article, I examined the floating-point constants stored in the 8087 coprocessor. The Pentium has 304 constants, compared to just 42 in the 8087, supporting more efficient algorithms. Moreover, the 8087 was an external floating-point unit, while the Pentium's floating-point unit is part of the processor. The changes between the 8087 (1980, 65,000 transistors) and the Pentium (1993, 3.1 million transistors) are due to the exponential improvements in transistor count, as described by Moore's Law.

I plan to write more about the Pentium so follow me on Bluesky (@righto.com) or RSS for updates. (I'm no longer on Twitter.) I've also written about the Pentium division bug and the Pentium Navajo rug. Thanks to CuriousMarc for microscope help. Thanks to lifthrasiir and Alexia for identifying some constants.

Appendix: The constant ROM

The table below lists the 304 constants in the Pentium's floating-point ROM. The first four columns show the values stored in the ROM: the exponent, the sign bit, the flag bit, and the significand. To avoid negative exponents, exponents are stored with the constant 0x0ffff added. For example, the value 0x0fffe represents an exponent of -1, while 0x10000 represents an exponent of 1. The constant's approximate decimal value is in the "value" column.

Special-purpose values are colored. Specifically, "normal" numbers are in black. Constants with an exponent of all 0's are in blue, constants with an exponent of all 1's are in red, constants with an unusually large or small exponent are in green; these appear to be bitmasks rather than numbers. Unused entries are in gray. Inexact constants (due to Remez optimization) are represented with the approximation symbol "≈".

This information is from my reverse engineering, so there will be a few errors.

#   exp   S F significand       value        meaning
0 00000 0 0 07878787878787878 BCD mask by 4's
1 00000 0 0 007f807f807f807f8 BCD mask by 8's
2 00000 0 0 00007fff80007fff8 BCD mask by 16's
3 00000 0 0 000000007fffffff8 BCD mask by 32's
4 00000 0 0 78000000000000000 4-bit mask
5 00000 0 0 18000000000000000 2-bit mask
6 00000 0 0 27000000000000000 ?
7 00000 0 0 363c0000000000000 ?
8 00000 0 0 3e8287c0000000000 ?
9 00000 0 0 470de4df820000000 213×1016
10 00000 0 0 5c3bd5191b525a249 2123/1017
11 00000 0 0 00000000000000007 3-bit mask
12 1ffff 1 1 7ffffffffffffffff all 1's
13 00000 0 0 0000007ffffffffff mask for 32-bit float
14 00000 0 0 00000000000003fff mask for 64-bit float
15 00000 0 0 00000000000000000 all 0's
16 0ffff 0 0 40000000000000000  1 1
17 10000 0 0 6a4d3c25e68dc57f2  3.3219280949 log2(10)
18 0ffff 0 0 5c551d94ae0bf85de  1.4426950409 log2(e)
19 10000 0 0 6487ed5110b4611a6  3.1415926536 pi
20 0ffff 0 0 6487ed5110b4611a6  1.5707963268 pi/2
21 0fffe 0 0 6487ed5110b4611a6  0.7853981634 pi/4
22 0fffd 0 0 4d104d427de7fbcc5  0.3010299957 log10(2)
23 0fffe 0 0 58b90bfbe8e7bcd5f  0.6931471806 ln(2)
24 1ffff 0 0 40000000000000000 +infinity
25 0bfc0 0 0 40000000000000000 1/4 of smallest 80-bit denormal?
26 1ffff 1 0 60000000000000000 NaN (not a number)
27 0ffff 1 0 40000000000000000 -1 -1
28 10000 0 0 40000000000000000  2 2
29 00000 0 0 00000000000000001 low bit
30 00000 0 0 00000000000000000 all 0's
31 00001 0 0 00000000000000000 single exponent bit
32 0fffe 0 0 58b90bfbe8e7bcd5e  0.6931471806 ln(2)
33 0fffe 0 0 40000000000000000  0.5 1/2! (exp Taylor series)
34 0fffc 0 0 5555555555555584f  0.1666666667 ≈1/3!
35 0fffa 0 0 555555555397fffd4  0.0416666667 ≈1/4!
36 0fff8 0 0 444444444250ced0c  0.0083333333 ≈1/5!
37 0fff5 0 0 5b05c3dd3901cea50  0.0013888934 ≈1/6!
38 0fff2 0 0 6806988938f4f2318  0.0001984134 ≈1/7!
39 0fffe 0 0 40000000000000000  0.5 1/2! (exp Taylor series)
40 0fffc 0 0 5555555555555558e  0.1666666667 ≈1/3!
41 0fffa 0 0 5555555555555558b  0.0416666667 ≈1/4!
42 0fff8 0 0 444444444443db621  0.0083333333 ≈1/5!
43 0fff5 0 0 5b05b05b05afd42f4  0.0013888889 ≈1/6!
44 0fff2 0 0 68068068163b44194  0.0001984127 ≈1/7!
45 0ffef 0 0 6806806815d1b6d8a  0.0000248016 ≈1/8!
46 0ffec 0 0 5c778d8e0384c73ab  2.755731e-06 ≈1/9!
47 0ffe9 0 0 49f93e0ef41d6086b  2.755731e-07 ≈1/10!
48 0ffe5 0 0 6ba8b65b40f9c0ce8  2.506632e-08 ≈1/11!
49 0ffe2 0 0 47c5b695d0d1289a8  2.088849e-09 ≈1/12!
50 0fffd 0 0 6dfb23c651a2ef221  0.4296133384 266/128-1
51 0fffd 0 0 75feb564267c8bf6f  0.4609177942 270/128-1
52 0fffd 0 0 7e2f336cf4e62105d  0.4929077283 274/128-1
53 0fffe 0 0 4346ccda249764072  0.5255981507 278/128-1
54 0fffe 0 0 478d74c8abb9b15cc  0.5590044002 282/128-1
55 0fffe 0 0 4bec14fef2727c5cf  0.5931421513 286/128-1
56 0fffe 0 0 506333daef2b2594d  0.6280274219 290/128-1
57 0fffe 0 0 54f35aabcfedfa1f6  0.6636765803 294/128-1
58 0fffe 0 0 599d15c278afd7b60  0.7001063537 298/128-1
59 0fffe 0 0 5e60f4825e0e9123e  0.7373338353 2102/128-1
60 0fffe 0 0 633f8972be8a5a511  0.7753764925 2106/128-1
61 0fffe 0 0 68396a503c4bdc688  0.8142521755 2110/128-1
62 0fffe 0 0 6d4f301ed9942b846  0.8539791251 2114/128-1
63 0fffe 0 0 7281773c59ffb139f  0.8945759816 2118/128-1
64 0fffe 0 0 77d0df730ad13bb90  0.9360617935 2122/128-1
65 0fffe 0 0 7d3e0c0cf486c1748  0.9784560264 2126/128-1
66 0fffc 0 0 642e1f899b0626a74  0.1956643920 233/128-1
67 0fffc 0 0 6ad8abf253fe1928c  0.2086843236 235/128-1
68 0fffc 0 0 7195cda0bb0cb0b54  0.2218460330 237/128-1
69 0fffc 0 0 7865b862751c90800  0.2351510639 239/128-1
70 0fffc 0 0 7f48a09590037417f  0.2486009772 241/128-1
71 0fffd 0 0 431f5d950a896dc70  0.2621973504 243/128-1
72 0fffd 0 0 46a41ed1d00577251  0.2759417784 245/128-1
73 0fffd 0 0 4a32af0d7d3de672e  0.2898358734 247/128-1
74 0fffd 0 0 4dcb299fddd0d63b3  0.3038812652 249/128-1
75 0fffd 0 0 516daa2cf6641c113  0.3180796013 251/128-1
76 0fffd 0 0 551a4ca5d920ec52f  0.3324325471 253/128-1
77 0fffd 0 0 58d12d497c7fd252c  0.3469417862 255/128-1
78 0fffd 0 0 5c9268a5946b701c5  0.3616090206 257/128-1
79 0fffd 0 0 605e1b976dc08b077  0.3764359708 259/128-1
80 0fffd 0 0 6434634ccc31fc770  0.3914243758 261/128-1
81 0fffd 0 0 68155d44ca973081c  0.4065759938 263/128-1
82 0fffd 1 0 4cee3bed56eedb76c -0.3005101637 2-66/128-1
83 0fffd 1 0 50c4875296f5bc8b2 -0.3154987885 2-70/128-1
84 0fffd 1 0 5485c64a56c12cc8a -0.3301662380 2-74/128-1
85 0fffd 1 0 58326c4b169aca966 -0.3445193942 2-78/128-1
86 0fffd 1 0 5bcaea51f6197f61f -0.3585649920 2-82/128-1
87 0fffd 1 0 5f4faef0468eb03de -0.3723096215 2-86/128-1
88 0fffd 1 0 62c12658d30048af2 -0.3857597319 2-90/128-1
89 0fffd 1 0 661fba6cdf48059b2 -0.3989216343 2-94/128-1
90 0fffd 1 0 696bd2c8dfe7a5ffb -0.4118015042 2-98/128-1
91 0fffd 1 0 6ca5d4d0ec1916d43 -0.4244053850 2-102/128-1
92 0fffd 1 0 6fce23bceb994e239 -0.4367391907 2-106/128-1
93 0fffd 1 0 72e520a481a4561a5 -0.4488087083 2-110/128-1
94 0fffd 1 0 75eb2a8ab6910265f -0.4606196011 2-114/128-1
95 0fffd 1 0 78e09e696172efefc -0.4721774108 2-118/128-1
96 0fffd 1 0 7bc5d73c5321bfb9e -0.4834875605 2-122/128-1
97 0fffd 1 0 7e9b2e0c43fcf88c8 -0.4945553570 2-126/128-1
98 0fffc 1 0 53c94402c0c863f24 -0.1636449102 2-33/128-1
99 0fffc 1 0 58661eccf4ca790d2 -0.1726541162 2-35/128-1
100 0fffc 1 0 5cf6413b5d2cca73f -0.1815662751 2-37/128-1
101 0fffc 1 0 6179ce61cdcdce7db -0.1903824324 2-39/128-1
102 0fffc 1 0 65f0e8f35f84645cf -0.1991036222 2-41/128-1
103 0fffc 1 0 6a5bb3437adf1164b -0.2077308674 2-43/128-1
104 0fffc 1 0 6eba4f46e003a775a -0.2162651800 2-45/128-1
105 0fffc 1 0 730cde94abb7410d5 -0.2247075612 2-47/128-1
106 0fffc 1 0 775382675996699ad -0.2330590011 2-49/128-1
107 0fffc 1 0 7b8e5b9dc385331ad -0.2413204794 2-51/128-1
108 0fffc 1 0 7fbd8abc1e5ee49f2 -0.2494929652 2-53/128-1
109 0fffd 1 0 41f097f679f66c1db -0.2575774171 2-55/128-1
110 0fffd 1 0 43fcb5810d1604f37 -0.2655747833 2-57/128-1
111 0fffd 1 0 46032dbad3f462152 -0.2734860021 2-59/128-1
112 0fffd 1 0 48041035735be183c -0.2813120013 2-61/128-1
113 0fffd 1 0 49ff6c57a12a08945 -0.2890536989 2-63/128-1
114 0fffd 1 0 555555555555535f0 -0.3333333333 ≈-1/3 (arctan Taylor series)
115 0fffc 0 0 6666666664208b016  0.2 ≈ 1/5
116 0fffc 1 0 492491e0653ac37b8 -0.1428571307 ≈-1/7
117 0fffb 0 0 71b83f4133889b2f0  0.1110544094 ≈ 1/9
118 0fffd 1 0 55555555555555543 -0.3333333333 ≈-1/3 (arctan Taylor series)
119 0fffc 0 0 66666666666616b73  0.2 ≈ 1/5
120 0fffc 1 0 4924924920fca4493 -0.1428571429 ≈-1/7
121 0fffb 0 0 71c71c4be6f662c91  0.1111111089 ≈ 1/9
122 0fffb 1 0 5d16e0bde0b12eee8 -0.0909075848 ≈-1/11
123 0fffb 0 0 4e403be3e3c725aa0  0.0764169081 ≈ 1/13
124 00000 0 0 40000000000000000 single bit mask
125 0fff9 0 0 7ff556eea5d892a14  0.0312398334 arctan(1/32)
126 0fffa 0 0 7fd56edcb3f7a71b6  0.0624188100 arctan(2/32)
127 0fffb 0 0 5fb860980bc43a305  0.0934767812 arctan(3/32)
128 0fffb 0 0 7f56ea6ab0bdb7196  0.1243549945 arctan(4/32)
129 0fffc 0 0 4f5bbba31989b161a  0.1549967419 arctan(5/32)
130 0fffc 0 0 5ee5ed2f396c089a4  0.1853479500 arctan(6/32)
131 0fffc 0 0 6e435d4a498288118  0.2153576997 arctan(7/32)
132 0fffc 0 0 7d6dd7e4b203758ab  0.2449786631 arctan(8/32)
133 0fffd 0 0 462fd68c2fc5e0986  0.2741674511 arctan(9/32)
134 0fffd 0 0 4d89dcdc1faf2f34e  0.3028848684 arctan(10/32)
135 0fffd 0 0 54c2b6654735276d5  0.3310960767 arctan(11/32)
136 0fffd 0 0 5bd86507937bc239c  0.3587706703 arctan(12/32)
137 0fffd 0 0 62c934e5286c95b6d  0.3858826694 arctan(13/32)
138 0fffd 0 0 6993bb0f308ff2db2  0.4124104416 arctan(14/32)
139 0fffd 0 0 7036d3253b27be33e  0.4383365599 arctan(15/32)
140 0fffd 0 0 76b19c1586ed3da2b  0.4636476090 arctan(16/32)
141 0fffd 0 0 7d03742d50505f2e3  0.4883339511 arctan(17/32)
142 0fffe 0 0 4195fa536cc33f152  0.5123894603 arctan(18/32)
143 0fffe 0 0 4495766fef4aa3da8  0.5358112380 arctan(19/32)
144 0fffe 0 0 47802eaf7bfacfcdb  0.5585993153 arctan(20/32)
145 0fffe 0 0 4a563964c238c37b1  0.5807563536 arctan(21/32)
146 0fffe 0 0 4d17c07338deed102  0.6022873461 arctan(22/32)
147 0fffe 0 0 4fc4fee27a5bd0f68  0.6231993299 arctan(23/32)
148 0fffe 0 0 525e3e8c9a7b84921  0.6435011088 arctan(24/32)
149 0fffe 0 0 54e3d5ee24187ae45  0.6632029927 arctan(25/32)
150 0fffe 0 0 5756261c5a6c60401  0.6823165549 arctan(26/32)
151 0fffe 0 0 59b598e48f821b48b  0.7008544079 arctan(27/32)
152 0fffe 0 0 5c029f15e118cf39e  0.7188299996 arctan(28/32)
153 0fffe 0 0 5e3daef574c579407  0.7362574290 arctan(29/32)
154 0fffe 0 0 606742dc562933204  0.7531512810 arctan(30/32)
155 0fffe 0 0 627fd7fd5fc7deaa4  0.7695264804 arctan(31/32)
156 0fffe 0 0 6487ed5110b4611a6  0.7853981634 arctan(32/32)
157 0fffc 1 0 55555555555555555 -0.1666666667 ≈-1/3! (sin Taylor series)
158 0fff8 0 0 44444444444443e35  0.0083333333 ≈ 1/5!
159 0fff2 1 0 6806806806773c774 -0.0001984127 ≈-1/7!
160 0ffec 0 0 5c778e94f50956d70  2.755732e-06 ≈ 1/9!
161 0ffe5 1 0 6b991122efa0532f0 -2.505209e-08 ≈-1/11!
162 0ffde 0 0 58303f02614d5e4d8  1.604139e-10 ≈ 1/13!
163 0fffd 1 0 7fffffffffffffffe -0.5 ≈-1/2! (cos Taylor series)
164 0fffa 0 0 55555555555554277  0.0416666667 ≈ 1/4!
165 0fff5 1 0 5b05b05b05a18a1ba -0.0013888889 ≈-1/6!
166 0ffef 0 0 680680675b559f2cf  0.0000248016 ≈ 1/8!
167 0ffe9 1 0 49f93af61f5349300 -2.755730e-07 ≈-1/10!
168 0ffe2 0 0 47a4f2483514c1af8  2.085124e-09 ≈ 1/12!
169 0fffc 1 0 55555555555555445 -0.1666666667 ≈-1/3! (sin Taylor series)
170 0fff8 0 0 44444444443a3fdb6  0.0083333333 ≈ 1/5!
171 0fff2 1 0 68068060b2044e9ae -0.0001984127 ≈-1/7!
172 0ffec 0 0 5d75716e60f321240  2.785288e-06 ≈ 1/9!
173 0fffd 1 0 7fffffffffffffa28 -0.5 ≈-1/2! (cos Taylor series)
174 0fffa 0 0 555555555539cfae6  0.0416666667 ≈ 1/4!
175 0fff5 1 0 5b05b050f31b2e713 -0.0013888889 ≈-1/6!
176 0ffef 0 0 6803988d56e3bff10  0.0000247989 ≈ 1/8!
177 0fffe 0 0 44434312da70edd92  0.5333026735 sin(36/64)
178 0fffe 0 0 513ace073ce1aac13  0.6346070800 sin(44/64)
179 0fffe 0 0 5cedda037a95df6ee  0.7260086553 sin(52/64)
180 0fffe 0 0 672daa6ef3992b586  0.8060811083 sin(60/64)
181 0fffd 0 0 470df5931ae1d9460  0.2775567516 sin(18/64)
182 0fffd 0 0 5646f27e8bd65cbe4  0.3370200690 sin(22/64)
183 0fffd 0 0 6529afa7d51b12963  0.3951673302 sin(26/64)
184 0fffd 0 0 73a74b8f52947b682  0.4517714715 sin(30/64)
185 0fffe 0 0 6c4741058a93188ef  0.8459244992 cos(36/64)
186 0fffe 0 0 62ec41e9772401864  0.7728350058 cos(44/64)
187 0fffe 0 0 5806149bd58f7d46d  0.6876855622 cos(52/64)
188 0fffe 0 0 4bc044c9908390c72  0.5918050751 cos(60/64)
189 0fffe 0 0 7af8853ddbbe9ffd0  0.9607092430 cos(18/64)
190 0fffe 0 0 7882fd26b35b03d34  0.9414974631 cos(22/64)
191 0fffe 0 0 7594fc1cf900fe89e  0.9186091558 cos(26/64)
192 0fffe 0 0 72316fe3386a10d5a  0.8921336994 cos(30/64)
193 0ffff 0 0 48000000000000000  1.125 9/8
194 0fffe 0 0 70000000000000000  0.875 7/8
195 0ffff 0 0 5c551d94ae0bf85de  1.4426950409 log2(e)
196 10000 0 0 5c551d94ae0bf85de  2.8853900818 2log2(e)
197 0fffb 0 0 7b1c2770e81287c11  0.1202245867 ≈1/(41⋅3⋅ln(2)) (atanh series for log)
198 0fff9 0 0 49ddb14064a5d30bd  0.0180336880 ≈1/(42⋅5⋅ln(2))
199 0fff6 0 0 698879b87934f12e0  0.0032206148 ≈1/(43⋅7⋅ln(2))
200 0fffa 0 0 51ff4ffeb20ed1749  0.0400377512 ≈(ln(2)/2)2/3 (atanh series for log)
201 0fff6 0 0 5e8cd07eb1827434a  0.0028854387 ≈(ln(2)/2)4/5
202 0fff3 0 0 40e54061b26dd6dc2  0.0002475567 ≈(ln(2)/2)6/7
203 0ffef 0 0 61008a69627c92fb9  0.0000231271 ≈(ln(2)/2)8/9
204 0ffec 0 0 4c41e6ced287a2468  2.272648e-06 ≈(ln(2)/2)10/11
205 0ffe8 0 0 7dadd4ea3c3fee620  2.340954e-07 ≈(ln(2)/2)12/13
206 0fff9 0 0 5b9e5a170b8000000  0.0223678130 log2(1+1/64) top bits
207 0fffb 0 0 43ace37e8a8000000  0.0660892054 log2(1+3/64) top bits
208 0fffb 0 0 6f210902b68000000  0.1085244568 log2(1+5/64) top bits
209 0fffc 0 0 4caba789e28000000  0.1497471195 log2(1+7/64) top bits
210 0fffc 0 0 6130af40bc0000000  0.1898245589 log2(1+9/64) top bits
211 0fffc 0 0 7527b930c98000000  0.2288186905 log2(1+11/64) top bits
212 0fffd 0 0 444c1f6b4c0000000  0.2667865407 log2(1+13/64) top bits
213 0fffd 0 0 4dc4933a930000000  0.3037807482 log2(1+15/64) top bits
214 0fffd 0 0 570068e7ef8000000  0.3398500029 log2(1+17/64) top bits
215 0fffd 0 0 6002958c588000000  0.3750394313 log2(1+19/64) top bits
216 0fffd 0 0 68cdd829fd8000000  0.4093909361 log2(1+21/64) top bits
217 0fffd 0 0 7164beb4a58000000  0.4429434958 log2(1+23/64) top bits
218 0fffd 0 0 79c9aa879d8000000  0.4757334310 log2(1+25/64) top bits
219 0fffe 0 0 40ff6a2e5e8000000  0.5077946402 log2(1+27/64) top bits
220 0fffe 0 0 450327ea878000000  0.5391588111 log2(1+29/64) top bits
221 0fffe 0 0 48f107509c8000000  0.5698556083 log2(1+31/64) top bits
222 0fffe 0 0 4cc9f1aad28000000  0.5999128422 log2(1+33/64) top bits
223 0fffe 0 0 508ec1fa618000000  0.6293566201 log2(1+35/64) top bits
224 0fffe 0 0 5440461c228000000  0.6582114828 log2(1+37/64) top bits
225 0fffe 0 0 57df3fd0780000000  0.6865005272 log2(1+39/64) top bits
226 0fffe 0 0 5b6c65a9d88000000  0.7142455177 log2(1+41/64) top bits
227 0fffe 0 0 5ee863e4d40000000  0.7414669864 log2(1+43/64) top bits
228 0fffe 0 0 6253dd2c1b8000000  0.7681843248 log2(1+45/64) top bits
229 0fffe 0 0 65af6b4ab30000000  0.7944158664 log2(1+47/64) top bits
230 0fffe 0 0 68fb9fce388000000  0.8201789624 log2(1+49/64) top bits
231 0fffe 0 0 6c39049af30000000  0.8454900509 log2(1+51/64) top bits
232 0fffe 0 0 6f681c731a0000000  0.8703647196 log2(1+53/64) top bits
233 0fffe 0 0 72896372a50000000  0.8948177633 log2(1+55/64) top bits
234 0fffe 0 0 759d4f80cb8000000  0.9188632373 log2(1+57/64) top bits
235 0fffe 0 0 78a450b8380000000  0.9425145053 log2(1+59/64) top bits
236 0fffe 0 0 7b9ed1c6ce8000000  0.9657842847 log2(1+61/64) top bits
237 0fffe 0 0 7e8d3845df0000000  0.9886846868 log2(1+63/64) top bits
238 0ffd0 1 0 6eb3ac8ec0ef73f7b -1.229037e-14 log2(1+1/64) bottom bits
239 0ffcd 1 0 654c308b454666de9 -1.405787e-15 log2(1+3/64) bottom bits
240 0ffd2 0 0 5dd31d962d3728cbd  4.166652e-14 log2(1+5/64) bottom bits
241 0ffd3 0 0 70d0fa8f9603ad3a6  1.002010e-13 log2(1+7/64) bottom bits
242 0ffd1 0 0 765fba4491dcec753  2.628429e-14 log2(1+9/64) bottom bits
243 0ffd2 1 0 690370b4a9afdc5fb -4.663533e-14 log2(1+11/64) bottom bits
244 0ffd4 0 0 5bae584b82d3cad27  1.628582e-13 log2(1+13/64) bottom bits
245 0ffd4 0 0 6f66cc899b64303f7  1.978889e-13 log2(1+15/64) bottom bits
246 0ffd4 1 0 4bc302ffa76fafcba -1.345799e-13 log2(1+17/64) bottom bits
247 0ffd2 1 0 7579aa293ec16410a -5.216949e-14 log2(1+19/64) bottom bits
248 0ffcf 0 0 509d7c40d7979ec5b  4.475041e-15 log2(1+21/64) bottom bits
249 0ffd3 1 0 4a981811ab5110ccf -6.625289e-14 log2(1+23/64) bottom bits
250 0ffd4 1 0 596f9d730f685c776 -1.588702e-13 log2(1+25/64) bottom bits
251 0ffd4 1 0 680cc6bcb9bfa9853 -1.848298e-13 log2(1+27/64) bottom bits
252 0ffd4 0 0 5439e15a52a31604a  1.496156e-13 log2(1+29/64) bottom bits
253 0ffd4 0 0 7c8080ecc61a98814  2.211599e-13 log2(1+31/64) bottom bits
254 0ffd3 1 0 6b26f28dbf40b7bc0 -9.517022e-14 log2(1+33/64) bottom bits
255 0ffd5 0 0 554b383b0e8a55627  3.030245e-13 log2(1+35/64) bottom bits
256 0ffd5 0 0 47c6ef4a49bc59135  2.550034e-13 log2(1+37/64) bottom bits
257 0ffd5 0 0 4d75c658d602e66b0  2.751934e-13 log2(1+39/64) bottom bits
258 0ffd4 1 0 6b626820f81ca95da -1.907530e-13 log2(1+41/64) bottom bits
259 0ffd3 0 0 5c833d56efe4338fe  8.216774e-14 log2(1+43/64) bottom bits
260 0ffd5 0 0 7c5a0375163ec8d56  4.417857e-13 log2(1+45/64) bottom bits
261 0ffd5 1 0 5050809db75675c90 -2.853343e-13 log2(1+47/64) bottom bits
262 0ffd4 1 0 7e12f8672e55de96c -2.239526e-13 log2(1+49/64) bottom bits
263 0ffd5 0 0 435ebd376a70d849b  2.393466e-13 log2(1+51/64) bottom bits
264 0ffd2 1 0 6492ba487dfb264b3 -4.466345e-14 log2(1+53/64) bottom bits
265 0ffd5 1 0 674e5008e379faa7c -3.670163e-13 log2(1+55/64) bottom bits
266 0ffd5 0 0 5077f1f5f0cc82aab  2.858817e-13 log2(1+57/64) bottom bits
267 0ffd2 0 0 5007eeaa99f8ef14d  3.554090e-14 log2(1+59/64) bottom bits
268 0ffd5 0 0 4a83eb6e0f93f7a64  2.647316e-13 log2(1+61/64) bottom bits
269 0ffd3 0 0 466c525173dae9cf5  6.254831e-14 log2(1+63/64) bottom bits
270 0badf 0 1 40badfc0badfc0bad unused
271 0badf 0 1 40badfc0badfc0bad unused
272 0badf 0 1 40badfc0badfc0bad unused
273 0badf 0 1 40badfc0badfc0bad unused
274 0badf 0 1 40badfc0badfc0bad unused
275 0badf 0 1 40badfc0badfc0bad unused
276 0badf 0 1 40badfc0badfc0bad unused
277 0badf 0 1 40badfc0badfc0bad unused
278 0badf 0 1 40badfc0badfc0bad unused
279 0badf 0 1 40badfc0badfc0bad unused
280 0badf 0 1 40badfc0badfc0bad unused
281 0badf 0 1 40badfc0badfc0bad unused
282 0badf 0 1 40badfc0badfc0bad unused
283 0badf 0 1 40badfc0badfc0bad unused
284 0badf 0 1 40badfc0badfc0bad unused
285 0badf 0 1 40badfc0badfc0bad unused
286 0badf 0 1 40badfc0badfc0bad unused
287 0badf 0 1 40badfc0badfc0bad unused
288 0badf 0 1 40badfc0badfc0bad unused
289 0badf 0 1 40badfc0badfc0bad unused
290 0badf 0 1 40badfc0badfc0bad unused
291 0badf 0 1 40badfc0badfc0bad unused
292 0badf 0 1 40badfc0badfc0bad unused
293 0badf 0 1 40badfc0badfc0bad unused
294 0badf 0 1 40badfc0badfc0bad unused
295 0badf 0 1 40badfc0badfc0bad unused
296 0badf 0 1 40badfc0badfc0bad unused
297 0badf 0 1 40badfc0badfc0bad unused
298 0badf 0 1 40badfc0badfc0bad unused
299 0badf 0 1 40badfc0badfc0bad unused
300 0badf 0 1 40badfc0badfc0bad unused
301 0badf 0 1 40badfc0badfc0bad unused
302 0badf 0 1 40badfc0badfc0bad unused
303 0badf 0 1 40badfc0badfc0bad unused

Notes and references

  1. In this blog post, I'm looking at the "P5" version of the original Pentium processor. It can be hard to keep all the Pentiums straight since "Pentium" became a brand name with multiple microarchitectures, lines, and products. The original Pentium (1993) was followed by the Pentium Pro (1995), Pentium II (1997), and so on.

    The original Pentium used the P5 microarchitecture, a superscalar microarchitecture that was advanced but still executed instructions in order like traditional microprocessors. The original Pentium went through several substantial revisions. The first Pentium product was the 80501 (codenamed P5), containing 3.1 million transistors. The power consumption of these chips was disappointing, so Intel improved the chip, producing the 80502, codenamed P54C. The P5 and P54C look almost the same on the die, but the P54C added circuitry for multiprocessing, boosting the transistor count to 3.3 million. The biggest change to the original Pentium was the Pentium MMX, with part number 80503 and codename P55C. The Pentium MMX added 57 vector processing instructions and had 4.5 million transistors. The floating-point unit was rearranged in the MMX, but the constants are probably the same.

  2. I don't know what the flag bit in the ROM indicates; I'm arbitrarily calling it a flag. My wild guess is that it indicates ROM entries that should be excluded from the checksum when testing the ROM. 

  3. Internally, the significand has one integer bit and the remainder is the fraction, so the binary point (decimal point) is after the first bit. However, this is not the only way to represent the significand. The x87 80-bit floating-point format (double extended-precision) uses the same approach. However, the 32-bit (single-precision) and 64-bit (double-precision) formats drop the first bit and use an "implied" one bit. This gives you one more bit of significand "for free" since in normal cases the first significand bit will be 1. 

  4. An unusual feature of the Pentium is that it uses bipolar NPN transistors along with CMOS circuits, a technology called BiCMOS. By adding a few extra processing steps to the regular CMOS manufacturing process, bipolar transistors could be created. The Pentium uses BiCMOS circuits extensively since they reduced signal delays by up to 35%. Intel also used BiCMOS for the Pentium Pro, Pentium II, Pentium III, and Xeon processors (but not the Pentium MMX). However, as chip voltages dropped, the benefit from bipolar transistors dropped too and BiCMOS was eventually abandoned.

    In the constant ROM, BiCMOS circuits improve the performance of the row selection circuitry. Each row select line is very long and is connected to hundreds of transistors, so the capacitive load is large. Because of the fast and powerful NPN transistor, a BiCMOS driver provides lower delay for higher loads than a regular CMOS driver.

    A typical BiCMOS inverter. From A 3.3V 0.6µm BiCMOS superscalar microprocessor.

    This BiCMOS logic is also called BiNMOS or BinMOS because the output has a bipolar transistor and an NMOS transistor. For more on BiCMOS circuits in the Pentium, see my article Standard cells: Looking at individual gates in the Pentium processor

  5. The integer processing unit of the Pentium is constructed similarly, with horizontal functional units stacked to form the datapath. Each cell in the integer unit is much wider than a floating-point cell (64 µm vs 38.5 µm). However, the integer unit is just 32 bits wide, compared to 69 (more or less) for the floating-point unit, so the floating-point unit is wider overall. 

  6. I don't like referring to the argument's range since a function's output is the range, while its input is the domain. But the term range reduction is what people use, so I'll go with it. 

  7. There's a reason why the error curve looks similar even if you reduce the range. The error from the Taylor series is approximately the next term in the Taylor series, so in this case the error is roughly -x^11/11! or O(x^11). This shows why range reduction is so powerful: if you reduce the range by a factor of 2, you reduce the error by the enormous factor of 2^11. But this also shows why the error curve keeps its shape: the curve is still x^11, just with different labels on the axes. 

  8. The Pentium coefficients are probably obtained using the Remez algorithm; see Floating-Point Verification. The advantages of the Remez polynomial over the Taylor series are discussed in Better Function Approximations: Taylor vs. Remez. A description of Remez's algorithm is in Elementary Functions: Algorithms and Implementation, which has other relevant information on polynomial approximation and range reduction. For more on polynomial approximations, see Numerically Computing the Exponential Function with Polynomial Approximations and The Eight Useful Polynomial Approximations of Sinf(3).

    The Remez polynomial in the sine graph is not the Pentium polynomial; it was generated for illustration by lolremez, a useful tool. The specific polynomial is:

    9.9997938808335731e-1 ⋅ x - 1.6662438518867169e-1 ⋅ x^3 + 8.3089850302282266e-3 ⋅ x^5 - 1.9264997445395096e-4 ⋅ x^7 + 2.1478735041839789e-6 ⋅ x^9

    The graph below shows the error for this polynomial. Note that the error oscillates between an upper bound and a lower bound. This is the typical appearance of a Remez polynomial. In contrast, a Taylor series will have almost no error in the middle and shoot up at the edges. This Remez polynomial was optimized for the range [-π,π]; the error explodes outside that range. The key point is that the Remez polynomial distributes the error inside the range. This minimizes the maximum error (minimax).

    Error from a Remez-optimized polynomial for sine.

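    As a quick check of these claims, here is a small Python sketch (mine, purely for illustration) that compares the five-term Taylor series against the lolremez polynomial quoted above over [-π, π]:

    import math

    def taylor_sin(x):    # sin(x) ≈ x - x^3/3! + x^5/5! - x^7/7! + x^9/9!
        return x - x**3/6 + x**5/120 - x**7/5040 + x**9/362880

    def remez_sin(x):     # the lolremez coefficients quoted above
        return (9.9997938808335731e-1*x - 1.6662438518867169e-1*x**3
                + 8.3089850302282266e-3*x**5 - 1.9264997445395096e-4*x**7
                + 2.1478735041839789e-6*x**9)

    xs = [math.pi * i / 1000 for i in range(-1000, 1001)]
    print(max(abs(taylor_sin(x) - math.sin(x)) for x in xs))   # roughly 7e-3
    print(max(abs(remez_sin(x) - math.sin(x)) for x in xs))    # orders of magnitude smaller
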
  9. I think the arctan argument is range-reduced to the range [-1/64, 1/64]. This can be accomplished with the trig identity arctan(x) = arctan((x-c)/(1+xc)) + arctan(c). The idea is that c is selected to be the value of the form n/32 closest to x. As a result, x-c will be in the desired range and the first arctan can be computed with the polynomial. The other term, arctan(c), is obtained from the lookup table in the ROM. The FPATAN (partial arctangent) instruction takes two arguments, x and y, and returns atan(y/x); this simplifies handling planar coordinates. In this case, the trig identity becomes arctan(y/x) = arctan((y-cx)/(x+cy)) + arctan(c). The division operation can trigger the FDIV bug in some cases; see Computational Aspects of the Pentium Affair.
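
    A minimal sketch of that reduction in Python (my reconstruction, not the Pentium's microcode; the table name and the short Taylor-like polynomial are stand-ins for the ROM constants and the Remez-adjusted coefficients):

    import math

    ARCTAN_TABLE = [math.atan(n / 32) for n in range(33)]   # stands in for the ROM's arctan(n/32) entries

    def arctan01(x):                   # for x in [0, 1]
        n = round(x * 32)
        c = n / 32
        r = (x - c) / (1 + x * c)      # reduced argument, within [-1/64, 1/64]
        poly = r - r**3/3 + r**5/5     # short stand-in for the ROM polynomial
        return poly + ARCTAN_TABLE[n]

    print(arctan01(0.6), math.atan(0.6))   # should agree closely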

  10. The Pentium has several trig instructions: FSIN, FCOS, and FSINCOS return the sine, cosine, or both (which is almost as fast as computing either). FPTAN returns the "partial tangent" consisting of two numbers that must be divided to yield the tangent. (This was due to limitations in the original 8087 coprocessor.) The Pentium returns the tangent as the first number and the constant 1 as the second number, keeping the semantics of FPTAN while being more convenient.

    The range reduction is probably based on the trig identity sin(a+b) = sin(a)cos(b)+cos(a)sin(b). To compute sin(x), select b as the closest constant in the lookup table, n/64, and then generate a=x-b. The value a will be range-reduced, so sin(a) can be computed from the polynomial. The terms sin(b) and cos(b) are available from the lookup table. The desired value sin(x) can then be computed with multiplications and addition by using the trig identity. Cosine can be computed similarly. Note that cos(a+b) =cos(a)cos(b)-sin(a)sin(b); the terms on the right are the same as for sin(a+b), just combined differently. Thus, once the terms on the right have been computed, they can be combined to generate sine, cosine, or both. The Pentium computes the tangent by dividing the sine by the cosine. This can trigger the FDIV division bug; see Computational Aspects of the Pentium Affair.
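
    Here is a small Python sketch of that identity-based reduction (my reconstruction; the table and the short polynomials are stand-ins for the ROM's sin/cos constants and Remez-adjusted coefficients):

    import math

    SINCOS_TABLE = [(math.sin(n / 64), math.cos(n / 64)) for n in range(102)]

    def sin_reduced(x):                 # for modest x >= 0
        n = round(x * 64)
        a = x - n / 64                  # range-reduced argument
        sin_a = a - a**3/6 + a**5/120   # short polynomial stand-ins
        cos_a = 1 - a*a/2 + a**4/24
        sin_b, cos_b = SINCOS_TABLE[n]
        return sin_a * cos_b + cos_a * sin_b   # sin(a + b)

    print(sin_reduced(1.2), math.sin(1.2))     # should agree closely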

    Also see Agner Fog's Instruction Timings; the timings for the various operations give clues as to how they are computed. For instance, FPTAN takes longer than FSINCOS because the tangent is generated by dividing the sine by the cosine. 

  11. For exponentials, the F2XM1 instruction computes 2^x-1; subtracting 1 improves accuracy. Specifically, 2^x is close to 1 for the common case when x is close to 0, so subtracting 1 as a separate operation causes you to lose most of the bits of accuracy due to cancellation. On the other hand, if you want 2^x, explicitly adding 1 doesn't harm accuracy. This is an example of how the floating-point instructions are carefully designed to preserve accuracy. For details, see the book The 8087 Primer by the architects of the 8086 processor and the 8087 coprocessor. 
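
    A tiny Python illustration of that cancellation (not related to the Pentium's implementation, just the numerical effect):

    import math
    x = 1e-12
    print(2.0**x - 1.0)                  # subtracting nearly equal numbers loses most significant digits
    print(math.expm1(x * math.log(2)))   # an expm1-style computation keeps them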

  12. The Pentium has base-two logarithm instructions FYL2X and FYL2XP1. The FYL2X instruction computes y log2(x) and the FYL2XP1 instruction computes y log2(x+1). The instructions include a multiplication because most logarithm operations will need to multiply to change the base; performing the multiply with internal precision increases the accuracy. The "plus-one" instruction improves accuracy for arguments close to 1, such as interest calculations.

    My hypothesis for range reduction is that the input argument is scaled to fall between 1 and 2. (Taking the log of the exponent part of the argument is trivial since the base-2 log of a base-2 power is simply the exponent.) The argument can then be divided by the largest constant 1+n/64 less than the argument. This will reduce the argument to the range [1, 1+1/32]. The log polynomial can be evaluated on the reduced argument. Finally, the ROM constant for log2(1+n/64) is added to counteract the division. The constant is split into two parts for greater accuracy.
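
    A minimal Python sketch of this hypothesis (my reconstruction; the table and the short atanh-style polynomial are stand-ins for the ROM's split constants and coefficients):

    import math

    # Stands in for the ROM's log2(1+n/64) constants (odd n), plus n=0 for no reduction.
    LOG_TABLE = {0: 0.0}
    LOG_TABLE.update({n: math.log2(1 + n / 64) for n in range(1, 64, 2)})

    def log2_mantissa(m):                  # for m in [1, 2)
        n = max(k for k in LOG_TABLE if 1 + k / 64 <= m)
        r = m / (1 + n / 64)               # reduced to roughly [1, 1+1/32]
        z = (r - 1) / (r + 1)
        atanh = z + z**3/3 + z**5/5        # short stand-in for the ROM's atanh polynomial
        return 2 * atanh / math.log(2) + LOG_TABLE[n]

    print(log2_mantissa(1.7), math.log2(1.7))   # should agree closely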

    It took me a long time to figure out the log constants because they were split. The upper-part constants appeared to be pointlessly inaccurate since the bottom 27 bits are zeroed out. The lower-part constants appeared to be minuscule semi-random numbers around ±10^-13. Eventually, I figured out that the trick was to combine the constants. 

A thread unroller for the fediverse, with special attention to privacy.

I make heavy use of online "unroll" services, which present the content of a thread in document form. There are a few of these for the fediverse, but none did exactly what I wanted, so I adapted one for myself, starting from the template made available by @[email protected] (thanks!)

It turned out nicely, and I want to share it with you: expansor.online

I mainly use it to unroll my own threads, when I want to post them in some other format as well, such as here on my blog. But it also serves me well when I want to archive a text in Obsidian.

Patrick's template already paid a lot of attention to privacy, and I expanded that focus (it's well described in the documentation). I also worked on accessibility. I'm not a JavaScript developer, so there's still plenty to improve - but for my use it's already quite functional.

To try it out, just visit https://expansor.online/ , paste the URL of a thread and, if the instance's settings and the user's privacy preferences allow it, the thread will be unrolled.

You can also use a URL that references the thread you want to unroll - for example, to unroll the thread in which I talked about organizing parties that are welcoming to autistic people, visit the URL below:

https://expansor.online/?#url=https://social.br-linux.org/@augustocc/113704287818891733

I hope it's useful! And there's documentation, too.

The article "A thread unroller for the fediverse, with special attention to privacy" was originally published on TRILUX, the site of Augusto Campos.

Autism for non-autistic people - an example of how it works for me

It's hard to explain, in day-to-day life, what the experience of having autism is like.

Even when the concepts are familiar, characteristics such as cognitive rigidity, stimming, hyperfocus, and degrees of disconnection from one's sensations tend to be perceived as isolated factors, eccentricities, or quirks, all the more so when they occur in a person who masks them well most of the time, achieving (in the view of others) a high degree of adaptation to their environment.

This morning I noticed in myself an example of the interaction between two of these characteristics, which I decided to write down because it may help illustrate both of them to people who don't have them but want to understand them, and I'll present it as a sum of factors.

  • Factor 1 - Inefficiency of the automatic link between sensations (heat, hunger, and so on) and the reactions they should trigger. In practice, I am often slow to notice (and sometimes don't notice, or at least don't identify) sensations such as cold, pain, hunger, and thirst. By the time I do notice, I also realize that the physical manifestation of that sensation (for example, the dry mouth that signals thirst) had been present for quite a while, but I hadn't "heard its call."
  • Factor 2 - Cognitive rigidity. I am like a machine for executing planned tasks. It isn't a totally inflexible rigidity, in my case: it's a determination that includes overcoming the unexpected and absorbing new developments. But there is a right way and a right order: I have a specific lane I want to drive in on each stretch of my routes, a specific order for doing my morning routines, and so on, and it is very costly and frustrating for me to have to do things differently from how they were planned.
  • Adding Factor 1 and Factor 2 together, we get the situations in which a sensation takes a while to be noticed and, once noticed, would demand an action that differs from the one demanded by cognitive rigidity - and then I tend to doubly deny (or unreasonably postpone) the reaction to the sensation in question, such as putting on a coat, for example.

On to today's practical example: I woke up very thirsty, something that isn't common for me. My morning routine already includes drinking a bottle of water (half before exercise and half after), but today that wasn't enough - in the end I drank 3 bottles, but let's take it in order:

  • After exercising I drank the second half of the usual bottle and went to take a shower. I was still thirsty, but hadn't noticed.
  • After the shower, I continued with the usual steps of the routine: boiling water for the coffee, putting grounds in the filter, and so on.
  • When I opened the fridge to get the milk to heat up, the realization arrived: I am very thirsty. IMMEDIATELY the absorption of this new fact also arrived, through the management mechanisms that cognitive rigidity allows me. There was a bottle of cold water right there at hand, I had the fridge open, but I closed it without taking anything besides the milk, because the automatic decision had already been made: "I need to remember to drink some water after I finish my coffee."

Does it make sense? Rationally, it doesn't - not even to me. But this process isn't rational; it's a mechanism of how autism (or Autism Spectrum Disorder - which isn't called a Disorder for nothing) manifests in me.

I went ahead with the routine of heating the milk, but right afterwards something happened that doesn't always happen: rationality caught up with me, and I REALIZED that it made no sense to put off the water until after the coffee. But even so, despite that realization, I didn't change the decision: I just lamented the incongruity and moved on.

Rationality kept warning me, like an alarm, but the rigidity is strong, and the way of absorbing that new development had already been settled - water only after the coffee. That's when I had to stop and negotiate with myself, in the form of a replan: "If I grab the water bottle while the milk is heating, and before the water boils, I can drink it right away, and it won't delay anything that was planned."

The negotiation between rationality and cognitive rigidity was successful, and I put an end to my thirst while watching the milk heat up, as I do every day. Afterwards, with clarity about how the process had unfolded, I took notes so I would remember to write this text later, because I saw an opportunity not only to communicate this to people who live alongside autistic people, but also to understand myself a little better and perhaps climb one more step in my own struggle against rigidity when it gets in my way.


Note: A possible follow-up text is left for a future occasion, making the connection between the account above and another behavior that ASD brings me: hyperfocus, which wouldn't let me pay attention to anything else until I had written this text, even though I had decided I would only write it in the afternoon.

The article "Autism for non-autistic people - an example of how it works for me" was originally published on TRILUX, the site of Augusto Campos.

Refining strategy with Wardley Mapping.

The first time I heard about Wardley Mapping was from Charity Majors discussing it on Twitter. Of the three core strategy refinement techniques, this is the technique that I’ve personally used the least. Despite that, I decided to include it in this book because it highlights how many different techniques can be used for refining strategy, and also because it’s particularly effective at looking at the broadest ecosystems your organization exists in.

Where the other techniques like systems thinking and strategy testing often zoom in, Wardley mapping is remarkably effective at zooming out.

In this chapter, we’ll cover:

  • A ten-minute primer on Wardley mapping
  • Recommendations for tools to create Wardley maps
  • When Wardley maps are an ideal strategy refinement tool, and when they’re not
  • The process I use to map, as well as integrate a Wardley map into strategy creation
  • Breadcrumbs to specific Wardley maps that provide examples
  • Documenting a Wardley map in the context of a strategy writeup
  • Why I limited focus on two elements of Wardley’s work: doctrines and gameplay

After working through this chapter, and digging into some of this book’s examples of Wardley Maps, you’ll have a good background to start your own mapping practice.

This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

Ten minute primer

Wardley maps are a technique created by Simon Wardley to ensure your strategy is grounded in reality. Or, as mapping practitioners would say, it's a tool for creating situational awareness. If you have a few days, you might want to start your dive into Wardley mapping by reading Simon Wardley's book on the topic, Wardley Maps. If you only have ten minutes, then this section should be enough to get you up to speed on reading Wardley maps.

Picking an example to work through, we’re going to create a Wardley map that aims to understand a knowledge base management product, along the lines of a wiki like Confluence or Notion.

Diagram showing a basic Wardley map for a knowledge base management application.

You need to know three foundational concepts to read a Wardley map:

  1. Maps are populated with three kinds of components: users, needs, and capabilities. Users exist at the top, and represent a cohort of users who will use your product. Each kind of user has a specific set of needs, generally tasks that they need to accomplish. Each need requires certain capabilities to fulfill it.

    Any box connecting directly to a user is a need. Any box connecting to a need is a capability. A capability can be connected to any number of needs, but can never connect directly to a user; they connect to users only indirectly via a need.

  2. The x-axis is divided into four segments, representing how commoditized a capability is. On the far left is genesis, which represents a brand-new capability that hasn’t existed before. On the far right is commoditized, something so standard and expected that it’s unremarkable, like turning on a switch causing electricity to flow. In between are custom and product, the two categories where most items fall on the map. Custom represents something that requires specialized expertise and operation to function, such as a web application that requires software engineers to build and maintain. Product represents something that can generally be bought.

    In this map, document reading is commoditized: it's unremarkable if your application allows its users to read content. On the other hand, document editing is somewhere on the border of product and custom. You might integrate an existing vendor for document editing needs, or you might build it yourself, but in either case document editing is less commoditized than document reading.

  3. The y-axis represents visibility to the user. In this map, reading documents is something that is extremely visible to the user. On the other hand, users depend on something indexing new documents for search, but your users will generally have no visibility into the indexing process or even that you have a search index to begin with.

Although maps can get quite complex, those three concepts are generally sufficient to allow you to decode an arbitrarily complex map.

In addition to mapping the current state, Wardley maps are also excellent at exploring how circumstances might change over time. To illustrate that, let’s look at a second iteration of our map, paying particular attention to the red arrows indicating capabilities that we expect to change in the future.

Diagram showing a basic Wardley map for a knowledge base management application.

In particular, the map now indicates that the current document creation experience will be superseded by an AI-enhanced editing process. Critically, the map also predicts that the AI-enhanced process will be more commoditized than its current authoring experience, perhaps because the AI-enhancement will be driven by commoditized foundational models from providers like Anthropic and OpenAI. Building on that, the only place left in the map for meaningful differentiation is in search indexing. Either the knowledge base company needs to accept the implication that they will increasingly be a search company, or they need to expand the user needs they service to find a new avenue for differentiation.

Some maps will show evolution of a given capability using a “pipeline”, a box that describes a series of expected improvements in a capability over time.

Diagram showing a basic Wardley map for a knowledge base management application.

Now instead of simply indicating that the authoring experience may be replaced by an AI-enhanced capability over time, we’re able to express a sequence of steps. From the starting place of a typical editing experience, the next expected step is AI-assisted creation, and then finally we expect AI-led creation where the author only provides high-level direction to a machine learning-powered agent.

For completeness, it’s also worth mentioning that some Wardley maps will have an overlay, which is a box to group capabilities or requirements together by some common denominator. This happens most frequently to indicate the responsible team for various capabilities, but it’s a technique that can be used to emphasize any interesting element of a map’s topology.

Diagram showing a basic Wardley map for a knowledge base management application, with an overlay to show which teams own which capabilities.

At this point, you have the foundation to read a Wardley map, or get started creating your own. Maps you encounter in the wild might appear significantly more complex than these initial examples, but they'll be composed of the same fundamental elements.

More Wardley Mapping resources

The Value Flywheel Effect by David Anderson

Wardley Maps by Simon Wardley on Medium, also available as PDF

Learn Wardley Mapping by Ben Mosior

wardleymaps.com’s resources and @WardleyMaps on Youtube

Tools for Wardley Mapping

Systems modeling has a serious tooling problem, which often prevents would-be adopters from developing their systems modeling practice. Fortunately, Wardley Mapping doesn't suffer from that problem. You can simply print out a Wardley Map and draw on it by hand. You can also use OmniGraffle, Miro, Figma or whatever diagramming tool you're already familiar with.

There are more focused tools as well, with Ben Mosior pulling together an excellent writeup on Wardley Mapping Tools as of 2024. Of those, I'd strongly encourage starting with Mapkeep as a simple, free, and intuitive tool for your initial mapping needs.

After you’ve gotten some practice, you may well want to move back into your most familiar diagramming tool to make it easier to collaborate with colleagues, but initially prioritize the simplest tool you can to avoid losing learning momentum on configuration, setup and so on.

When are Wardley Maps useful?

All successful strategy begins with understanding the constraints and circumstances that the strategy needs to work within. Wardley mapping labels that understanding as situational awareness, and creating situational awareness is the foremost goal of mapping.

Situational awareness is always useful, but it's particularly essential in highly dynamic environments where the industry around you, competitors you're selling against, or the capabilities powering your product are shifting rapidly. In the past several decades, there have been a number of these dynamic contexts, including the rise of web applications, the proliferation of mobile devices, and the expansion of machine learning techniques.

When you’re in those environments, it’s obvious that the world is changing rapidly. What’s sometimes easy to miss is that any strategy the needs to last longer than a year or two is build on an evolving foundation, even if things seem very stable at the time. For example, in the early 2010s, startups like Facebook, Uber and Digg were all operating in physical datacenters with their owned hardware. Over a five year period, having a presence in a physical datacenter went from the default approach for startups to a relatively unconventional solution, as cloud based infrastructure rapidly expanded. Any strategy written in 2010 that imagined the world of hosting was static, was destinated to be invalidated.

No tool is universally effective, and that's true here as well. While Wardley maps are extremely helpful at understanding broad change, my experience is that they're less helpful in the details. If you're looking to optimize your onboarding funnel, then something like systems modeling or strategy testing is likely going to serve you better.

How to Wardley Map

Learning Wardley mapping is a mix of reading others’ maps and writing your own. A variety of maps for reading are collected in the following breadcrumbs section, and I’d recommend skimming all of them. In this section are the concrete steps I’d encourage you to follow for creating the first map of your own:

  1. Commit to starting small and iterating. Simple maps are the foundation of complex maps. Even the smallest Wardley map will have enough detail to reveal something interesting about the environment you’re operating in.

    Conversely, by starting complex, it's easy to get caught up in all of your early map's imperfections. At worst, this will cause you to lose momentum in creating the map. At best, it will accidentally steer your attention rather than facilitating discovery of which details are important to focus on.

  2. List users, needs and capabilities. Identify the first one or two users for your product. Going back to the knowledge management example from the primer, your two initial users might be an author and a reader. From there, identify those users’ needs, such as authoring content, finding content, and providing feedback on which content is helpful. Finally, write down the underlying technical capabilities necessary to support those needs, which might range from indexing content in a search index to a customer support process to deal with frustrated users.

    Remember to start small! On your first pass, it’s fine to focus on a single user. As you iterate on your map, bring in more users, needs and capabilities until the map conveys something useful.

    Tooling for this can be a piece of paper or wherever you keep notes.

  3. Establish value chains. Take your list and then connect each of the components into chains. For example, the reader in the above knowledge base example would then be connected to needing to discover content. Discovering content would be linked to indexing in the search index. That sequence from reader to discovering content to search index represents one value chain.

    Convergence across chains is a good thing. As your chains get more comprehensive, it’s expected that a given capability would be referenced by multiple different needs. Similarly, it’s expected that multiple users might have a shared need.

  4. Plot value chains on a Wardley Map. You can do this using any of the tools discussed in the Tools for Wardley mapping section, including a piece of paper.

    Because you already have the value chains created, what you're focused on in this step is placing each component relative to its visibility to users (higher up is more visible to the user, lower down is less visible), and how mature the solutions are (leftward represents more custom solutions, rightward represents more commoditized solutions).

  5. Study current state of the map. With the value chains plotted on your map, it will begin to reveal where your organization’s attention should be focused, and what complexity you can delegate to vendors. Jot down any realizations you have from this topology.

  6. Predict evolution of the map, and create a second version of your map that includes these changes. (Keep the previous version so you can better see the evolution of your thinking!)

    It can be helpful to create multiple maps that contemplate different scenarios. Thinking about the running knowledge base example, you might contemplate a future where AI-powered tools become the dominant mechanism for authors creating content. Then you could explore another future where such tools are regulated out of most products, and imagine how that would shape your approach differently.

    Picking the timeframe for these changes will depend on the environment you're mapping. Always prefer a timeframe that makes it easy to believe changes will happen: maybe that's five years, or maybe it's 12 months. If you're caught up wondering whether change might take longer than a certain timeframe, then simply extend your timeframe to sidestep that issue.

  7. Study future state of the map, now that you've predicted the future. Once again, write down any unexpected implications of this evolution, and how you may need to adjust your approach as a result.

  8. Share with others for feedback. It’s impossible for anyone to know everything, which is why the best maps tend to be a communal creation. That’s not to suggest that you should perform every step in a broad community, or that your map should be the consensus of a working group. Instead, you should test your map against others, see what they find insightful and what they find artificial in the map, and include that in your map’s topology.

  9. Document what you’ve learned as discussed below in the section on documentation. You should also connect that Wardley map writeup with your overall strategy document, typically in the Refine or Explore sections.
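
To make steps 2 through 4 a bit more concrete, here is a minimal sketch, purely my own illustration rather than anything from standard Wardley tooling, of jotting down the knowledge base example’s value chain as plain data. The evolution and visibility numbers are made-up placements for illustration, not canonical values:

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    evolution: float   # x axis: 0.0 = genesis/custom, 1.0 = commodity
    visibility: float  # y axis: 0.0 = invisible to users, 1.0 = user-facing
    depends_on: list = field(default_factory=list)

# One value chain from the knowledge base example:
# Reader -> Discover content -> Search index.
reader = Component("Reader", evolution=0.7, visibility=1.0)
discover = Component("Discover content", evolution=0.6, visibility=0.8)
search_index = Component("Search index", evolution=0.85, visibility=0.3)
reader.depends_on.append(discover)
discover.depends_on.append(search_index)

def print_chain(component, depth=0):
    # Walk a value chain from a user down to its underlying capabilities.
    indent = "  " * depth
    print(f"{indent}{component.name} (evolution={component.evolution}, visibility={component.visibility})")
    for dependency in component.depends_on:
        print_chain(dependency, depth + 1)

print_chain(reader)

Even a tiny structure like this is enough to plot each component by its evolution and visibility coordinates, whether on paper or in a mapping tool.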

One downside of presenting steps to do something is that the sequence can become a fixed recipe. These are the steps that I’ve found most useful, and I’d encourage you to try them if mapping is a new tool in your toolkit, but this is far from the canonical way. Start here, then experiment with other approaches until you find the best approach for you and the strategies that you’re working on.

I’ll update these examples as I continue writing more strategies for this book. Until then, I admit that some of these examples are “what I have lying around” more so than the “ideal forms of Wardley maps.”

With the foundation in place, the best way to build on Wardley mapping is writing your own maps. The second best way is to read existing maps that others have made, a number of which exist within this book:

  • LLM evolution studies the evolution of the Large Language Model ecosystem, and how that will impact product engineering organizations attempting to validate and deploy new paradigms like agentic workflows and retrieval augmented generation
  • Gitlab strategy shows a broad Wardley Map, looking at the developer tooling industry’s evolution over time, and how Gitlab’s approach implies they believe commoditization will drive organizations to prefer bundled solutions over integrating best-in-breed offerings
  • Evolution of developer experience tooling space explores how Wardley mapping has helped me refine my understanding of how the developer experience ecosystem will evolve over time

In addition to the maps within this book, I also label maps that I create on my blog using the wardley category.

How to document a Wardley Map

As explored in how to create readable strategy documents, the default temptation is to structure documents around the creation process. However, it’s essentially always better to write in two steps: develop a writing-optimized version that’s focused on facilitating thinking, and then rework it into a reading-optimized version that supports both readers who are, and are not, interested in the details.

The writing-optimized version is what we discussed in “How to Wardley Map” above. For a reading-optimized version, I recommend:

  1. How things work today shares a map of the current environment, explains any interesting rationales or controversies behind placements on the map, and highlights the most interesting parts of the map

  2. Transition to future state starts with a second map, this one showing the transition from the current state to a projected future state. It’s very reasonable to have multiple distinct maps, each of which considers one potential evolution, or one step of a longer evolution.

  3. Users and Value chains are the first place you start creating a Wardley map, but generally the least interesting part of explaining a map’s implications. This isn’t because the value chains are unimportant; rather, it’s because the map itself tends to implicitly explain the value chain well enough that you can move directly to focusing on the map’s most interesting implications.

    In a sufficiently complex map, it’s very reasonable to split this into two sections, but generally I find it eliminates redundancy to cover users and value chains in one joint section rather than separately. This is a good example of the difference between reading and writing: splitting these two topics helps clarify thinking, but muddles reading.

This ordering may seem too brief or a bit counter-intuitive for you, as the person who has the full set of details, but my experience is that it will be simpler to read for most readers. That’s because most readers read until they agree with the conclusion, then stop reading, and are only interested in the details if they disagree with the conclusion.

This format is also fairly different than the format I recommend for documenting systems models. That is because systems model diagrams exclude much of the relevant detail, showing the relationship between stocks but not showing the magnitude of the flows. You can only fully understand a system model by seeing both the diagram and a chart showing the model’s output. Wardley maps, on the other hand, tend to be more self-explanatory, and often can stand on their own with relatively less written description.

What about doctrines and gameplay?

This book’s components of strategy are most heavily influenced by Richard Rumelt’s approach. Simon Wardley’s approach to strategy built around Wardley Mapping could be viewed as a competing lens. For each problem that Rumelt’s system solves, there is a Wardley solution as well, and it’s worth mentioning some of the components I’ve not included, and why I didn’t.

The two most important components I’ve not discussed thus far are Wardley’s ideas of doctrine and gameplay. Wardley’s doctrine is a set of universally applicable practices like knowing your users, biasing towards data, and designing for constant evolution. Gameplay is similar to doctrine, but is context-dependent rather than universal. Some examples of gameplay are talent raids (hiring from a knowledgeable competitor), bundling (selling products together rather than separately), and exploiting network effects.

I decided not to spend much time on doctrine and gameplay because I find them lightly specialized on the needs of business strategy, and consequently a bit messy to apply to the sorts of problems that this book is most interested in solving: the problems of engineering strategy.

To be explicit, I don’t personally view Rumelt’s approach and Wardley’s approaches as competing efforts. What’s most valuable is to have a broad toolkit, and pull in the pieces of that toolkit that feel most applicable to the problems at hand. I find Wardley Maps exceptionally valuable at enhancing exploration, diagnosis, and refinement in some problems. In other problems, typically shorter duration or more internally-oriented, I find the Rumelt playbook more applicable. In all problems, I find the combination more valuable than anchoring in one camp’s perspective.

Summary

No refinement technique will let you reliably predict the future, but Wardley mapping is very effective at helping you plot out the various potential futures your strategy might need to operate in. With those futures in mind, you can tune your strategy to excel in the most likely, and to weather the less desirable.

It took me years to dive into Wardley mapping. Once I finally did, it was simpler than I’d feared, and now I find myself creating Wardley maps somewhat frequently. When you’re working on your next strategy that’s impacted by the ecosystem’s evolution around it, try your hand at mapping, and soon you’ll start to build your own collection of maps.

Qud

Caves of Qud is an expansive roguelike game with a 2D top-down perspective, and tons of procedurally generated content. It has a very TDTTOE attitude, much like Nethack, and is notable for procedurally generating everything from maps to quests to characters to foundational in-game lore. Despite that, it does a really great job tying all that random content together into a cohesive world that feels rich, deep, and ripe for exploration.

A screenshot of the load game menu showing my most recent game: a character named Uumuuyushum with a turtle-like carapace.
Uumuuyushum, level 10

I was initially a little put off by it, but the recent 1.0 release, along with a bunch of praise from folks I follow on social media, persuaded me to grab a copy. I’ve put several hours into it now, including most of a 6-hour flight back from the east coast on the screenshotted game, and I am really enjoying it.

My best game so far. A vaguely turtle-shaped (?) being that can emit sleeping gas when under threat, is pretty handy with a revolver, and eats Snapjaws for breakfast.

Live and drink, friend.

Summary of reading: October - December 2024

  • "Dr. Euler's Fabulous Formula" by Paul J. Nahin - a kind of sequel to the previous book I read by this author ("An imaginary tale"). Here he collected all the interesting mathematical explorations that didn't make the cut for that book. I found this one to be much closer to a textbook on the math spectrum - most pages are crammed to the brim with integrals. Not easy bedside reading, for sure, though it's still a good book and I appreciate the author's enthusiasm about these topics.
  • "Sid Meier's Memoir!: A Life in Computer Games" by Sid Meier - an autobiography by the famous computer game designer, best known for his Civilization series. I think I was confused by what "designer" means in Meier's case, because it turns out he's a coder extraordinaire who was churning multiple solo-written games every year in the 1980s and into the 1990s. This book is very interesting and insightful; the author exudes charming good nature, optimisim and some real hard-won design wisdom. A delightful read for fans of the genre.
  • "Pnin" by Vladimir Nabokov - some notes on the life of a professor of Russian who emigrated to the US around WWII and teaches at a fictional New England college. The writing is masterful, but overall I felt like this novel's main point eluded me - there's probably certain groups it was very relevant to, but not me.
  • "Churchill - walking with destiny" by Andrew Roberts - a modern biography of Winston Churchill, augmented with documents that only became available in the last couple of decades. Even though this book is a behemoth (over 1000 pages in the printed edition, 50 hours on audio), it's surprisingly readable and engaging. The author does his best at being objective, providing detailed accounts of criticism against Churchill and his failures. Nevertheless, the character of the man emerging from this book is extremely impressive and inspiring. I should probably try reading one of Churchill's own books - I had no idea he was such a prolific and highly regarder author (with a Nobel prize in literature).
  • "The Chip" by T.R. Reid - secondary title is "How two Americans invented the microchip and launched a revolution". Tells the story of the simultaneous discovery of integrated circuits by Jack Kilby and Robert Noyce. The first half of the book is fascinating; the second half is not as good and mostly seems to be a filler.
  • "How to rule the Universe, without alerting the orderlies" by Gregory Khait - (read in Russian) yet another collection of short stories, with some new stories on the interim period of immigration when people were stuck in Europe while waiting for US visas. At least 1/3 of the stories are repeats from previous books.
  • "I Contain Multitudes: The Microbes Within Us and a Grander View of Life" by Ed Yong - talks about the symbiosis of bacteria with multi-cellular organisms, providing many examples of research findings from the animal and kingdoms, including humans. The book is interesting, but this is a topic that moves quickly, so the material gets outdated fast. Mostly, my impression was surprise at how little we know about these things. The science here is definitely in its infancy.
  • "The computational beauty of nature" by Gary William Flake - "computer explorations of fractals, chaos, complex systems and adaptation". A nice book with some interesting code samples, but most of it feels awfully outdated a quarter century after publishing. It's quite impressive that the C code it bundles is still easy to compile and run, though.
  • "Math for English majors" by Ben Orlin - alas, by far Orlin's weakest work IMHO. It's not clear what this book is even trying to do - a list of humorous definitions for various mathematical terms? Drawing parallels between elementary math and English grammar? I'll update this review if this book becomes popular with my kids, but so far it seems unlikely.
  • "The Guns of August: The Outbreak of World War I" by Barbara W. Tuchman - a detailed history of the events leading to the outbreak of WWI and its first month or so, in which Germany's attack into France was eventually blocked and the Western front settled into the years-long trench warfare deadlock. I found the first half of the book - dealing with the politics leading to the war - the most insightful. It pairs wonderfully with one of my favorite books - "All quiet on the Western front", because it's a perfect contrast of the decisions made in the upper eschelons vs. the reality on the ground for millions of soldiers. The second half is more of a classical military history of events, and I wasn't particularly impressed by it.
  • "Tom Swan and the Head of St George" by Christian Cameron - a wild adventure story set in the late crusades era. A young English knight's escapades in Italy, Greece and Turkey, full of unrealistic fighting, womanizing and drinking. Occasionally fun, but mostly pretty silly. I don't think I'll be reading the rest of the books in this series.
  • "Poverty Safari: Understanding the Anger of Britain's Underclass" by Darren McGarvey - a memoir of a Scottish rapper about growing up in a poor, dysfunctional family in Glasgow in the 1990s, sharing his views of the class-based social divides of society. The author's own struggles with alcoholism and mental health issues play a major role in the book. Some parts of this book are insightful, especially on the topic of class divides within the liberal left.

Re-reads:

  • "To Kill a Mockingbird" by Harper Lee
  • "Anne of Green Gables" by L.M. Montgomery
  • "The Luzhin Defense" by Vladimir Nabokov
  • "Man's Search for Meaning" by Viktor Frankl

Hasselblad V Lenses on Sony Mirrorless Cameras

On a recent trip to the Maniototo I was shooting the Hasselblad 501cm film camera. One evening up the Manuherikia River the skies clouded over. There wasn’t enough light to shoot without a tripod so I put away the film camera and experimented with shooting the Hasselblad lenses adapted to my digital Sony a7rii.

Shooting down the valley towards the Home Hills. 50mm, stopped down, 1/400s.

Why adapt Hasselblad lenses to a digital camera? If you already have Hasselblad lenses (or, more generally, any lens from the film era), modern mirrorless cameras are a way to continue shooting them. Hasselblad make ‘CFV’ digital backs for the V series cameras, however they’re priced similarly to a lens filter for a Leica. I’ve never shot with one (they’re probably great), but the CFV bodies lack IBIS (in-body image stabilisation). Modern mirrorless cameras have IBIS, so you can shoot in a lot less light for a fifth of the price.

Briar Rose.

Lens focal length is the same in medium format film as it is in 35mm full frame. In other words, the 50mm Zeiss Hasselblad lens is a 50mm on the Sony. 50mm is my go-to on full frame so I mainly shot with the Zeiss 50mm FLE f/4 lens, the widest lens for the Hasselblad I own. Also pictured are a couple of shots on the Zeiss 250mm shot at f/5.6.

Hawkdun Range. Taken on the 250mm shot wide open at 1/60s -- acceptable.

Shooting is straightforward and in line with any other adapted lenses. A few things to call out:

  1. Manually set the focal length of the lens you are shooting in the camera settings to get IBIS. This gives you another few stops of light, reduces image shake in your pictures, and allows you to shoot the 250mm lens handheld. If it’s bright enough out you don’t need IBIS (as when shooting film). You must remember to either disable IBIS or set it to the correct focal length. IBIS on with the incorrect focal length set will result in blurry pictures!
  2. Hasselblad V lenses have the shutter in the lens. When shooting on a digital camera body it doesn’t use the in-lens shutter, but rather the focal plane shutter in the camera. By default the lens is wide open and the lens will only stop down to the set aperture when the image is being taken. Every lens has a ‘stop down’ lever (acts as a depth of field preview). You must enable this, otherwise the lens will always shoot wide open regardless of the aperture you have selected.
  3. Turn on manual focus assist in the camera body. It highlights in-focus areas in red and, combined with the large focusing rings on the Hasselblad lenses, makes focusing really easy.

Musterer hut at the foot of the Hawkduns. Taken on the 250mm shot wide open at 1/100s -- too soft for my liking. Likely camera shake.

Apart from the standard 80mm, the Hasselblad lenses are heavy and chunky. I found myself holding the camera and lens like a Hasselblad 500cm: left hand on the bottom of the lens, right hand on the camera body, both held low with the Sony’s LCD popped out at an angle so I could look vertically down. I usually shoot cameras through the viewfinder, though this seemed apt. Old habits die hard: I was so used to the Hasselblad’s reversed image that I found it entertainingly difficult to frame images.

Background goes blurrrrr. 50mm f/4 handheld 1/60s. The IBIS worked -- no visible camera shake.

A square crop to honour the Hasselblad gods. 50mm focused as close as possible.

FotodioX manufactures many different film-era lens to modern mirrorless camera adapters. Within the Hasselblad V to Sony range there are three options:

  1. Standard adapter
  2. Shift adapter (what I purchased)
  3. Tilt and shift adapter

Aside from normal shift effects of correcting verticals or horizontals in a frame, the shift functionality appealed because of the relative sizes of the image circles. Hasselblad lenses are built to project the image onto a 60mm x 60mm film plane. My Sony camera has the typical 36mm x 24mm full frame sensor. With a standard adapter (or the default position of the shift adapter) the camera sensor lines up to the middle of the projected image giving you the best performing area of the lens.

Due to these relative sizes you can create a panoramic image by placing the camera on a tripod and taking three images: lens shifted left, lens in default position, and lens shifted right. Stitch the three images together and you have a panoramic that shows off the full angle of view of the lens.
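
As a rough back-of-the-envelope note (my own arithmetic, with the per-side shift treated as an assumed figure rather than the adapter’s published spec): shifting the sensor s mm to each side makes the stitched frame roughly 36 mm + 2 × s mm wide, so a shift of about 12 mm per side already spans the full 60 mm film-plane width the lens was designed to cover.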

Sounds good in theory, and I bought the shift adapter to do the above. In practice, though, the 42 megapixels of my camera are more than enough to give me an image that I can crop, and modern photo editing software is good enough to stitch a handheld panoramic. It wasn’t worth the hassle of setting up my tripod.

Last light. 50mm, f/4, 1/200s. Exposed for the light, shadows brought up in post.

Quality, contrast, and detail of the photographs are great. They’re comparable to the Sony 24-105mm f/4 lens that I bring on most walks, and more than good enough for my needs. Due to the size and weight of the lenses I don’t see myself going out to shoot specifically with Hasselblad lenses on the Sony; I’d pick the Sony E-mount native lenses every time.

For a given project, if my primary rig is the Hasselblad V system on film then I’ll continue to bring the Sony camera and lens adapter as a backup and alternative.

A Literature Guide to Distributed Systems (and why I don’t want you to follow it)

This is not a typical article for this blog. In fact, it’s the kind of material I’m most reluctant to produce: a recommendation. I don’t like giving general recommendations because doing so assumes a lot about many people at once, ignoring the peculiarities, nuances, and distinct circumstances of each individual, who may read and interpret something in different ways. That concern shapes how I prefer to follow people’s careers day to day and how I run mentorships: in a targeted, direct way, looking at the person in front of me, without assuming any generic mold of the kind “do it like this and it will work out for you.” The least generic thing in the world is people… I like to look at each person individually, case by case. That’s why I’ve given up on writing a post like this several times, but here it is… the first post of the year.

In fact, most of the material I bring here (not to say all of it) represents the main references for the System Design series we are developing on this blog.


Why you shouldn’t follow this guide

Recently I did a typical end-of-year exercise, with an empty mind, imagining a mentorship with myself from 6 or 7 years ago, in the way I like best: a bar table with no name badges. I looked at my bookshelf and my Kindle history and put together a guide from Matheus to Fidelis.

That’s why I don’t want you to follow this guide to the letter. However, feel free, from here on, to use it as a recommendation for whatever catches your eye and to adapt it to your own reality, since I won’t have the opportunity to help you so closely.

Very important to point out: all links pointing to Amazon are affiliate links, but you can look for each resource wherever you prefer.

Self-criticism is a good thing. I don’t insist on being a flawless person (even though I’m driven by a somewhat odd, constant sense of self-improvement), and I think that’s only possible because I assume I always have a lot to improve. I have plenty of weaknesses, even in areas where I consider myself good. One of them is precisely not knowing how to teach introductory content. I recognize that as a flaw of mine; many people do it extraordinarily well, and I admire them for that and other reasons. It’s my kryptonite. So I’m going to start from the assumption that you are not a beginner.

I assume, then, that you already have a good amount of mileage in software companies and projects, and that you’ve already done some things that worked out and many that went wrong.

In terms of technical content, I assume you are already familiar with:

  • Containers and orchestration, even at a basic level
  • Hands-on experience with some cloud, whichever one it may be
  • Some programming language (at a mid or senior+ level)
  • The “bread and butter” literature (Clean Code, Clean Architecture, Pragmatic Programmer, etc.)
  • A good number of flight hours with software in production
  • A good number of hours in war rooms, solving problems that other people caused (and, above all, problems you caused yourself)

If any of this is missing from your bag of experiences, I recommend not continuing from here. I hope this text, or something even better, finds its way back to you some time from now. Or keep going, if you prefer…


What to expect from here on?

Given that, what do I recommend that you (or I) read? Below is a list, in order, of books I consider important for understanding more about advanced software topics and distributed systems at different scales. The sequence starts at “I don’t understand anything about this, I need a basic direction” and goes all the way to denser content, with greater theoretical and practical depth.

Nothing here is focused on any specific language or technology, because the goal is to present architectural and theoretical concepts that can be absorbed after some reflection. This content should be “chewed on” by you and then applied to your own world, side by side with your personal experiences.

Some of these books aren’t even technical and don’t mention technology directly, but they make sense within the whole. They may even be the ones that add the most to your education. Not all of them have a Portuguese translation so far, but, as I mentioned, they aren’t recommended for beginners anyway. Even though this material can serve as a map, you’re the one who has to make the jumps.

Every book and article here has been read by me, so there’s a great deal of curation and care in the selection. Nothing is “missing”, unless I read something new and decide to add it here. It’s personal.



1. Domain-Driven Design: Tackling Complexity in the Heart of Software

Domain-Driven Design

I start with a book that, in my view, should be on software engineering’s “bread and butter” list. One of the biggest challenges when implementing microservices aimed at reuse is learning how to identify domains effectively, define responsibilities, delimit scopes, and then deal with all of that in the form of input and output contracts.

As much as Domain-Driven Design is, at first glance, easily associated with coding and building blocks, that’s not quite it. It teaches how model partitioning works, along with the definition of scopes, responsibilities, and resources shared between those domains.

It hardly matters how many patterns and technologies you know for building distributed systems if those systems’ functionality has responsibilities that are shared, out of scope, or “leaked” across domains.

That’s why I consider this book by Eric Evans one of the most important starting points for planting the seed of how to build systems at large scale, thinking in terms of microservices and more advanced patterns. Defining clear boundaries is the “key to the great mysteries” of distributed systems, although the concept is not limited to that topic, of course…

Book link: Domain-Driven Design: Tackling Complexity in the Heart of Software - Amazon



2. Production-Ready Microservices: Building Standardized Systems Across an Engineering Organization

Books

Here we have the first book that is literally about the topic. It sat on my Kindle for months after I bought it, since I acquired it after already having some experience with large-scale distributed environments. I turned my attention back to it when I needed to put together a list of recommendations for a mentee and wanted to make sure he would read something that actually made sense for the level of knowledge he was at.

It turns out it has excellent introductory potential for practice. It covers, without going deep, topics such as observability, tracing, communication between services, fault tolerance, theoretical scalability, how to embed these kinds of practices in the development cycle and in DevOps practices, and how to discover Conway’s Law in practice and understand how it shows up in architecture standardization. You won’t get extremely hands-on material here, but you will get a gateway to many subjects that can be explored in greater depth later.

I strongly recommend it as your first book on the topic if you have no prior knowledge at all; it was written exactly for that. The book is from 2017 and the author is Susan Rigetti, who had an excellent run at Uber and several other companies. Today she works much more with writing than with the technical side itself. I don’t know whether it was the famous “tech burnout” that makes a person drop everything and go raise ducks on a farm, but if it was, she has my respect.

Book link: Production-Ready Microservices - Amazon



3. Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith

Monolith to Microservices

This book, written by Sam Newman in 2020, is a practical, in-depth guide on how to evolve a monolithic application (usually robust, complex, and stable in many respects) into a microservices architecture. Learning how to decompose services and how to subdivide domains is perhaps the most important task when carrying out a migration to microservices in evolving environments. And executing that migration responsibly and gradually tends to be a complex exercise that involves several corporate areas.

Knowing how to decompose boundaries is a central theme in Domain-Driven Design, but this book reinforces those concepts and brings you closer to the technical side of the task, offering migration and evolution tools.

Honestly, I never sat down to read this book thinking “I’m going to spend a few hours reading this guy on this wonderful Sunday.” Instead, I treated it as a bedside reference, turning to specific chapters whenever questions came up and I needed direction. After two years, I ended up reading it in full.

Not everyone gets the opportunity to take part in a decomposition process where something big is split into several smaller parts. Many professionals arrive at already-stable monolithic environments, while others join scenarios that have already been decomposed in some way. In many places, the transition from monoliths to microservices doesn’t even make sense or isn’t treated as an evolutionary path. But having a logical line of reasoning about it can help in either scenario.

Book link: Monolith to Microservices - Amazon



4. Site Reliability Engineering: How Google Runs Production Systems

Site Reliability Engineering

Google created the concept of Site Reliability Engineering (SRE) by merging traditional operations and infrastructure tasks with a software development mindset, and the initial steps adopted by products that demanded reliability engineering are compiled in this first book on the subject.

I’d venture to say this is the most important book of my career, representing a watershed in how I modernized and directed my knowledge. Dealing with complex systems is, by definition, complex… And deeply understanding the disciplines of reliability, resilience, fallback design, and service health monitoring, always looking for opportunities for improvement and operational excellence, is the key to building and maintaining a healthy life with microservices and distributed systems.

I believe that what SRE means for the market today is both the gateway and the key to all the complex topics an engineer needs to know. It’s essential to understand the second step after building something: keeping it up, performing well, with resilience and fault tolerance.

The hard part isn’t building it; the hard part is keeping it up once it’s built. Here you’ll be introduced to extremely important concepts such as fault tolerance, resilience as organizational culture, Service Levels, Error Budgets, and so on. Practically the entire large-scale corporate market drinks from this fountain.

Book link: Site Reliability Engineering - Amazon


5. Building Microservices: Designing Fine-Grained Systems

Building Microservices

The second Sam Newman book on this list, released in 2022, shares the experiences and practices the author accumulated over the years helping companies migrate from monolithic systems to distributed architectures composed of smaller, more specialized services. As a natural successor to Monolith to Microservices, here we find additional material on security, zero trust, monolith models, microservice models and, of course, an update to the author’s views relative to the first book.

“But can I read just this one and skip the first one you mentioned?” You can, sure. But, as I said, I don’t want you to follow my list to the letter. I personally wouldn’t skip it.

Book link: Building Microservices - Amazon


6. Antifragile: Things That Gain from Disorder

Antifragile

It’s practically impossible for you to have gone out drinking with me and not heard me talk about antifragility in technology. Since at least 2017 I’ve been making this association between SRE, reliability, and antifragility. The first time I read the book, I linked the subject directly to the activities I carried out day to day to ensure performance, resilience, and operational excellence. The correlation became even stronger when I dove headfirst into the banking sector, on mission-critical technology teams.

Yes, this is one of the books I mentioned at the start of the text as having no direct relationship with technology; it doesn’t touch on any technical aspect at all. Written in 2012 by Nassim Nicholas Taleb, a mathematician and risk analyst from the financial sector, it attempts to describe “the opposite of fragile.” Something fragile, like a crystal glass, breaks and doesn’t return to its original state. The opposite would be compared to the Hydra of Lerna, which, when it suffers damage and adversity, grows more heads and becomes even stronger, in contrast with something like a diamond, which withstands constant stress but doesn’t transform.

Taleb also proposes the idea of “black swan” events (he has an entire book dedicated just to that, if you’re interested). These are random, adverse phenomena that completely change the surrounding reality, with the peculiarity of seeming “easily predictable” in hindsight, after nobody actually predicted them…

This kind of literature, which isn’t specifically aimed at technology, adds a lot to our day-to-day work if read through a certain lens. This book changed the way I work, and I recommend it, along with the same association I made, to everyone who already has some mileage behind them.

Book link: Antifragile: Things That Gain from Disorder - Amazon


7. Software Architecture: The Hard Parts: Modern Trade-Off Analyses for Distributed Architectures

Software Architecture

Software Architecture: The Hard Parts is an extremely important book in my career, because it covers and details many aspects involved in a real transition from a monolithic system to a distributed environment. It spans a wide variety of disciplines, from data modeling, modularization, and care for the codebase and the teams responsible for it, to patterns like Saga, different types of databases, the CAP theorem, communication between microservices, and resilience-related concerns.

This book got a Brazilian Portuguese translation just recently, in 2024. I have both versions on my shelf and recommend either one. Written by Neal Ford, Mark Richards, Pramod Sadalage, and Zhamak Dehghani, it goes deep into the kind of tooling you need to understand complex distributed systems with excellence. Here you’ll find even more “open doors” to explore.

Book link: Software Architecture: The Hard Parts - Amazon


8. Release It!: Design and Deploy Production-Ready Software

Release-it

I received this book as a gift from my friend Carlos Panato, one of my biggest references in the industry. And I’ll keep it forever, because it greatly expanded my repertoire on certain subjects.

Here we’re not only talking about building software, but also about the many processes that surround it, broadening what we understand as engineering, including resilience patterns, postmortems, scalability, networking, capacity management, and so on.

The version I own is from 2012. Newer editions have been released, but honestly, I haven’t read them. I’m sticking with my beloved copy, which still fully satisfies me. Written by Michael T. Nygard, I strongly recommend this book, even though there is no Portuguese version as of the time I’m writing this article. It is perhaps one of the most important works you’ll find on this list.

Book link: Release It!: Design and Deploy Production-Ready Software - Amazon


9. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Data Intensive

Here we definitively enter the list of more complex books. This is, without a doubt, the book I consult the most in my day-to-day right now. I read it for the first time in 2022, again in 2023, and revisited several of its chapters in 2024. Since I found my way on my current journey, at my current job, learning to deal with data-intensive applications, or ones that truly require high throughput, has been the most exciting challenge I’ve ever experienced. And I can say that journey is still in progress: I’m in that constant, thrilling state of learning something new and, as a bonus, discovering about four more things I don’t yet know right afterward.

Both this book and Martin Kleppmann’s blog are amazing and highly recommended for anyone who wants to go deeper into complex software architecture and engineering subjects. Here we cover database design, replication, partitioning, SSTables, LSM-Trees, B-Trees, compression, as well as data formats and communication protocols.

It’s one of my favorite books on this list. And it’s near the end for a reason: don’t bite off more than you can chew.

Book link: Designing Data-Intensive Applications - Amazon


10. Building Event-Driven Microservices: Leveraging Organizational Data at Scale

Event Driven

The second step into more complex topics.

On the journey so far, you’ve already moved beyond synchronous models. Whether because of the proposed literature or not, you already have enough tooling to evaluate, suggest, and implement solutions involving messaging and events. Building Event-Driven Microservices, written by Adam Bellemare, is a practical, strategic guide on how to design and implement event-based systems in a more structured way, with plenty of extremely valuable tips and experiences, something very hard to find outside an informal chat with someone who has been through something similar.

Event-driven architectures enable a large share of the large-scale proposals and strategies for distributed systems. Adding this kind of understanding and experience to your “arsenal” will help a lot when designing complex corporate architectures, taking into account everything that can go right and, above all, what can go wrong in event-oriented environments.

Here you’ll find tips on how to handle DevOps practices, data pipelines, and event-oriented business flows, as well as how to deal with error handling and compensations. It’s a fairly mature read, one you’ll probably need to revisit a few times.

Book link: Building Event-Driven Microservices - Amazon


11. The Site Reliability Workbook: Practical Ways to Implement SRE

The Site Reliability Workbook

Taking the next step into the more complex literature, here we build on what we saw about reliability engineering for complex systems.

The Site Reliability Workbook was written and organized by the same editors and authors as Site Reliability Engineering, among them Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, and Stephen Thorne. This book focuses on more practical guidance and more realistic examples of how to apply SRE principles in engineering teams’ day-to-day work. If the first book introduces the philosophy and pillars of Site Reliability Engineering, the Workbook takes a more hands-on stance, offering an even more practical approach to the journey.

“Can I read just this one and skip Site Reliability Engineering?” This time, no. Definitely not…

Book link: The Site Reliability Workbook - Amazon


12. Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems

Building Secure and Reliable Systems

This book served as the biggest foundation for the System Design series and also underpins the future book I plan to write on the subject. Building Secure and Reliable Systems is the most important one on this list and needs to be respected in order, because I consider it the arrival point, where we tie together everything we’ve seen so far. It’s not the most complex, but the view it presents requires a certain functional foundation to be fully understood.

In general, security, DevOps, and reliability tend to be treated as distinct areas in large organizations, or as “all the same thing” in small and mid-sized companies. In this book, the authors show how these perspectives complement one another throughout the entire development and operations process, reinforcing that no system is truly reliable unless it is also secure, and vice versa, and explaining how to handle this incrementally and with continuous learning.

It was also published by Google professionals, such as Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea, and Adam Stubblefield. Of the three books that discuss similar themes, this is the most mature and senior, aimed at both operations and strategic Staff+ roles.

Book link: Building Secure and Reliable Systems - Amazon


13. Ludwig Von Bertalanffy – General System Theory (Bonus)

General System Theory

Observing living organisms, Ludwig Von Bertalanffy realized that the whole cannot be understood merely as the sum of its parts. Factors such as feedback and adaptation show that a system’s overall organization directly influences its components. It’s what we popularly call a holistic view nowadays.

I came across this rare book during my MBA in Data Science and Analytics, in the Operations Research course. It provided a complex and complete way of evaluating any system, whether a technological one or the traffic system of a large urban center. After all, everything in the world is a system.

This book is genuinely rare and very hard to find. I found my copy at an online used bookstore; it’s older than I am: the second edition, from 1975.

I recommend it as a way to understand both complex systems and the very organization you’re part of, without mentioning a single word about technology.

Book link: Ludwig Von Bertalanffy - CIA do Saber


That’s it for now!

The future of htmx

In The Beginning…

htmx began life as intercooler.js, a library built around jQuery that added behavior based on HTML attributes.

For developers who are not familiar with it, jQuery is a venerable JavaScript library that made writing cross-platform JavaScript a lot easier during a time when browser implementations were very inconsistent, and JavaScript didn’t have many of the convenient APIs and features that it does now.

Today many web developers consider jQuery to be “legacy software.” With all due respect to this perspective, jQuery is currently used on 75% of all public websites, a number that dwarfs all other JavaScript tools.

Why has jQuery remained so ubiquitous?

Here are three technical reasons we believe contribute to its ongoing success:

  • It is very easy to add to a project (just a single, dependency-free link)
  • It has maintained a very consistent API, remaining largely backwards compatible over its life (intercooler.js works with jQuery v1, v2 and v3)
  • As a library, you can use as much or as little of it as you like: it stays out of the way otherwise and doesn’t dictate the structure of your application

htmx is the New jQuery

Now, that’s a ridiculous (and arrogant) statement to make, of course, but it is an ideal that we on the htmx team are striving for.

In particular, we want to emulate these technical characteristics of jQuery that make it such a low-cost, high-value addition to the toolkits of web developers. Alex has discussed “Building The 100 Year Web Service” and we want htmx to be a useful tool for exactly that use case.

Websites that are built with jQuery stay online for a very long time, and websites built with htmx should be capable of the same (or better).

Going forward, htmx will be developed with its existing users in mind.

If you are an existing user of htmx—or are thinking about becoming one—here’s what that means.

Stability as a Feature

We are going to work to ensure that htmx is extremely stable in both API & implementation. This means accepting and documenting the quirks of the current implementation.

Someone upgrading htmx (even from 1.x to 2.x) should expect things to continue working as before.

Where appropriate, we may add better configuration options, but we won’t change defaults.

No New Features as a Feature

We are going to be increasingly inclined to not accept new proposed features in the library core.

People shouldn’t feel pressure to upgrade htmx over time unless there are specific bugs that they want fixed, and they should feel comfortable that the htmx that they write in 2025 will look very similar to htmx they write in 2035 and beyond.

We will consider new core features when new browser features become available, for example we are already using the experimental moveBefore() API on supported browsers.

However, we expect most new functionality to be explored and delivered via the htmx extensions API, and will work to make the extensions API more capable where appropriate.

Quarterly Releases

Our release schedule is going to be roughly quarterly going forward.

There will be no death march upgrades associated with htmx, and there is no reason to monitor htmx releases for major functionality changes, just like with jQuery. If htmx 1.x is working fine for you, there is no reason to feel like you need to move to 2.x.

Promoting Hypermedia

htmx does not aim to be a total solution for building web applications and services: it generalizes hypermedia controls, and that’s roughly about it.

This means that a very important way to improve htmx — and one with lots of work remaining — is by helping improve the tools and techniques that people use in conjunction with htmx.

Doing so makes htmx dramatically more useful without any changes to htmx itself.

Supporting Supplemental Tools

While htmx gives you a few new tools in your HTML, it has no opinions about other important aspects of building your websites. A flagship feature of htmx is that it does not dictate what backend or database you use.

htmx is compatible with lots of backends, and we want to help make hypermedia-driven development work better for all of them.

One part of the hypermedia ecosystem that htmx has already helped improve is template engines. When we first wrote about how “template fragments” make defining partial page replacements much simpler, they were a relatively rare feature in template engines.

Not only are fragments much more common now, but that essay is frequently cited as an inspiration for building the feature.

There are many other ways that the experience of writing hypermedia-based applications can be improved, and we will remain dedicated to identifying and promoting those efforts.

Writing, Research, and Standardization

Although htmx will not be changing dramatically going forward, we will continue energetically evangelizing the ideas of hypermedia.

In particular, we are trying to push the ideas of htmx into the HTML standard itself, via the Triptych project. In an ideal world, htmx functionality disappears into the web platform itself.

htmx code written today will continue working forever, of course, but in the very long run perhaps there will be no need to include the library to achieve similar UI patterns via hypermedia.

Intercooler Was Right

At the end of the intercooler docs, we said this:

Many javascript projects are updated at a dizzying pace. Intercooler is not.

This is not because it is dead, but rather because it is (mostly) right: the basic idea is right, and the implementation at least right enough.

This means there will not be constant activity and churn on the project, but rather a stewardship relationship: the main goal now is to not screw it up. The documentation will be improved, tests will be added, small new declarative features will be added around the edges, but there will be no massive rewrite or constant updating. This is in contrast with the software industry in general and the front end world in particular, which has comical levels of churn.

Intercooler is a sturdy, reliable tool for web development.

Leaving aside the snark at the end of the third paragraph, this thinking is very much applicable to htmx. In fact, perhaps even more so since htmx is a standalone piece of software, benefiting from the experiences (and mistakes) of intercooler.js.

We hope to see htmx, in its own small way, join the likes of giants like jQuery as a sturdy and reliable tool for building your 100 year web services.

What I did in 2024

It's time for my annual self review. In last year's review I said I wanted to improve my site:

  1. fix broken links
  2. organize with tags
  3. improve search
  4. post to my site instead of to social media
  5. move project tracking to my own site

I didn't have any specific goals for writing articles or topics to learn. So what did I do? The biggest thing is that I'm blogging more than in recent years:

Number of blog posts per year (plot from Observable Plot)

Site management

Changes to Twitter and Reddit in recent years have made me think about how I share knowledge. When I share something, I want it to be readable by everyone, forever. I don't want it to be readable only to "members" or "subscribers", like Quora or Medium. I had posted to some of these sites because they were open. But they're sometimes closed now, requiring a login to view what I posted.

My web site has been up for 30 years. The Lindy Effect suggests that what I post to my own site will last longer than what I post to Google+, FriendFeed, MySpace, Reddit, or Twitter. I don't expect Mastodon, Threads, or Bluesky to be up forever either. The article Don't Build Your Castle in Other People's Kingdoms recommends I focus on my own site. But while my own site is easy to post to, my blog hosted by Blogger is not.

I want to make blogging easier for me. I looked at my options for blogging software, and concluded that my web site already supports many of the things I need for a blog. So I decided to write my own blogging software. How hard could it be? Famous last words, right? It's foolish in the same way as "write a game, not a game engine".

But it actually went pretty well! I only had to support the features needed for my own blog, not for everyone's blogs. I didn't need it to scale. I could reuse the existing features I have built for my web site. There are still some features I want to add, but I think I got 80% of what I wanted in <200 lines of Python.

I made it easier to post to my blog, and I posted a lot more this year than in the previous few years. I'm happy about this.

New pages

I sometimes pair a "theory" page with an "implementation" page. The A* theory page describes the algorithms and the A* implementation page describes how to implement them. The Hexagons theory page describes the math and algorithms and the Hexagons implementation page describes how to implement them.

Last year, I studied mouse+touch drag events in the browser and then wrote up a theory page with my recommendations for how to handle the browser events. I claimed that the way I structured the code led to a lot of flexibility in how to handle UI state. This year I made an implementation page with lots of runnable examples showing that flexibility. I show basic dragging, constraints, snapping, svg vs div vs canvas, handles, scrubbable numbers, drawing strokes, painting areas, sharing state, resizing, and Vue components. I show the code for each example, and also link to a runnable CodePen and JSFiddle.

Concepts and implementation pages

I'm very happy with that page, and I wrote a blog post about it.

I also wanted to write a reference page about Bresenham's Line Drawing Algorithm. This page failed. I had started in 2023 with an interactive page that lets you run different implementations of the algorithm, to see how they don't match up. But I realized this year that my motivation for writing that page was anger, not curiosity. My goal was to show that all the implementations were a mess.

But anger isn't a good motivator for me. I don't end up with a good result.

I put the project on hold to let my anger dissipate. Then I started over, wanting to learn it out of curiosity. I re-read the original paper. I read lots of implementations. I took out my interactive visualizations of brokenness. I changed my focus to the properties I might want in a line drawing algorithm.

But I lost motivation again. I asked myself: why am I doing this? and I didn't have a good answer. There are so many things I want to explore, and this topic doesn't feel like it's that interesting in the grand scheme of things. So I put it on hold again.

Updates to pages

I treat my main site like a personal wiki. I publish new pages and also improve old pages. I treat my blog differently. I post new pages, but almost never update the existing posts. This year on the main site I made many small updates:

  • Wrote up what I currently understand about "flow field" pathfinding
  • Rewrote parts of a page about differential heuristics, but I'm still quite unhappy with it and thinking about more rewrites
  • Simplified the implementation of animations in the hexagon guide, when switching from pointy-top to flat-top and back
  • Added more animation modes to my animated mapgen4. This is a fun page you can just stare at for a while.
  • Fixed a long-standing bug in A* diagrams - a reader alerted me to mouse positions not quite lining up with tiles, and I discovered that functions like getBoundingClientRect() include the border and padding of an element.
  • Added a demo of combining distance fields to my page about multiple start points for pathfinding.
  • Updated my two tutorials on how to make interactive tutorials (1 and 2) to be more consistent, point to each other, and say why you might want one or the other.
  • Updated my "hello world" opengl+emscripten code with font rendering and other fixes
  • Continued working on version 3 of my dual-mesh library. I don't plan to make it a standalone project on GitHub until I have used it in a new project, but you can browse the copy of the library inside mapgen4.
  • Made my hexagon guide printable and also savable for offline use using the browser's "Save As" feature.
  • Improved typography across my site, including some features that Safari and Firefox support but Chrome still doesn't.
  • Reduced my use of CDNs after the polyfill.io supply chain attack. I continue to use CDNs for example code that I expect readers to copy/paste.
  • Switched from yarn to pnpm. I liked yarn 1 but never followed it to yarn 2 or yarn 3, and decided it was time to move away from it.
  • Made my pages internally linkable, so you can link to a specific section instead of the whole page.
  • Used Ruffle's Flash emulator to restore some of the Flash diagrams and demos on my site. When I tried it a few years ago, it couldn't handle most of my swf files, but now it does, hooray!

I didn't remember all of these. I looked through my blog, my notes, and version control history. Here's the git command to go through all my project folders and print out commits from 2024:

for git in $(find . -name .git)
do 
    dir=$(dirname "$git")
    cd "$dir"
    echo ___ "$dir"
    git --no-pager log --since=2024-01-01 --pretty=format:"%as %s%d%n"
    cd - >/dev/null
done

Learning

Curved and stretched map labels

I decided that I should be focusing more on learning new things for myself, instead of learning things to write a tutorial. The main theme this year was maps:

  • I made a list of topics related to labels on maps. These were all potential projects.
  • I ended up spending a lot of time on basic font rendering. What a rabbit hole! Most of the blog posts in 2024 are about font rendering.
  • I did some small projects using square, triangle, hexagon tiles.
  • I experimented with generating map features and integrating them into an existing map. For example, instead of generating a map and detecting peninsulas, I might want to say "there will be a peninsula here" so that I can guarantee that one exists, and what size it is.
  • I tried my hand at gradient descent for solving the parameter dragging problem. In my interactive diagrams, I might have some internal state s that maps into a draggable "handle" on the diagram. We can represent this as a function pos(s₁) returning position p₁. When the handle is moved to a new location p₂, I want to figure out what state s₂ will have pos(s₂) closest to p₂. Gradient descent seems like a reasonable approach to this problem (there's a rough sketch of the idea just after this list). However, trying to learn it made me realize it's more complicated than it seems, and my math skills are weak.
  • I wanted to create a starter project for rot.js with Kenney tiles. I was hoping to use this for something, but then never did.
  • While learning about font rendering, I also got to learn about graphics, antialiasing, sRGB vs linear RGB, gamma correction, WebGL2. This was a rabbit hole in a rabbit hole in a rabbit hole in a rabbit hole…
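
To make that concrete, here is a rough Python sketch of the idea (mine, not the author's code; the names fit_state and pos are invented for the example). It does gradient descent on the squared distance between pos(s) and the dragged handle position, using a numerical gradient:

import numpy as np

def fit_state(pos, s1, p2, steps=200, lr=0.1, eps=1e-4):
    # Find a state s2 whose pos(s2) is close to the dragged position p2,
    # starting from the current state s1, by gradient descent on |pos(s) - p2|^2.
    s = np.asarray(s1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    for _ in range(steps):
        base = np.sum((pos(s) - p2) ** 2)
        grad = np.zeros_like(s)
        for i in range(len(s)):             # numerical gradient, one coordinate at a time
            bumped = s.copy()
            bumped[i] += eps
            grad[i] = (np.sum((pos(bumped) - p2) ** 2) - base) / eps
        s -= lr * grad
    return s

# Example: the state is (angle, radius) and the handle sits at that polar position.
pos = lambda s: np.array([s[1] * np.cos(s[0]), s[1] * np.sin(s[0])])
print(fit_state(pos, s1=[0.5, 1.0], p2=[0.0, 2.0]))   # roughly [pi/2, 2.0]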

But secondarily, I got interested in programming language implementation.

At the beginning of the year I was following my one-week timeboxing strategy. I've found it's good at keeping me from falling into rabbit holes. But my non-work life took priority, and I ended up relaxing my one-week limits for the rest of the year. I also fell into lots of rabbit holes. I am planning to resume timeboxing next year.

Next year

I want to continue learning lots of new things for myself instead of learning them for writing tutorials. The main theme for 2025 will probably be text:

  • name generators
  • large language models
  • programming languages
  • procedurally generating code

I also want to continue working on maps. It has been six years since I finished mapgen4, and I am starting to collect ideas for new map projects. I won't do all of these, but I have lots to choose from:

  • towns, nations, cultures, factions, languages
  • roads, trading routes
  • farms, oil, gold, ore
  • valleys, mountain ranges, lakes, peninsulas, plateaus
  • rivers, coral reefs, caves, chasms, fjords, lagoons
  • forests, trees, snow, waterfalls, swamps, marshes
  • soil and rock types
  • groundwater
  • atmospheric circulation
  • ocean currents
  • tectonic plates
  • animal and plant types
  • named areas
  • icons, stylized drawing
  • update the graphics code in mapgen4

I don't plan to make a full map generator (but who knows!). Instead, I want to learn techniques and write quick&dirty prototype code. I also plan to continue enhancing my web site structure and build process, including navigation, link checking, project management, bookmarks, more blog features, and maybe sidenotes. Although text and maps are the main themes, I have many more project ideas that I might work on. Happy 2025 everyone!

Books I Read in 2024

I enjoy reading quite a bit. Nevertheless, it’s something I need to be intentional about incorporating into my life. I usually finish a modest number of books in a year. This year I made it to a nice round ten.

A big theme of the year is the Murderbot Diaries. My friend Jess recommended them to me a while back, and I finally got around to All Systems Red last year. I was hooked enough that Tess bought me the subsequent three for Christmas last year. So you’re gonna see almost all the Murderbot books on this list.

Bindle Punk Bruja by Desideria Mesa
Mesa’s debut novel is set in 1920s Kansas City and centers around Rose, the daughter of Mexican immigrants who takes on the city’s mob bosses with the help of friends and some witchy magical powers. This was a gift from my sister Anna for Christmas last year.
The Deep Sky by Yume Kitasei
A murder mystery that takes place in space on a one-way mission to colonize a far-off planet. A gift from my sister.
Wolfsong by TJ Klune
Werewolves, magic, queerness, and family. A gift from my sister.
Fingersmith by Sarah Waters
Mystery, intrigue, gay ladies, and some incredible plot twists. Tess gave me this one.
Artificial Condition by Martha Wells
Murderbot investigates its past.
Rogue Protocol by Martha Wells
Murderbot goes digging for evidence against its former corporate master, GrayCris.
Exit Strategy by Martha Wells
Murderbot attempts to save its former owner from GrayCris.
Network Effect by Martha Wells
Continuing the Murderbot obsession, I picked up this one from a Books Inc in San Francisco’s Marina district.
I’m Starting to Worry About This Black Box of Doom by Jason Pargin
Anna and I decided to read this together, book club style. It’s a ridiculous romp across the US with some deep commentary on the perils of social media: how it promotes dogpiling in ways that can destroy lives, and the groupthink that arises from being terminally online.
The Full Moon Coffee Shop by Mai Mochizuki
A cute, short read about a mysterious popup coffee shop that appears in Kyōto during the full moon.

Recursive project search in Emacs

Before reading this, you might want to check it out in video form on YouTube.

Video is probably a more helpful format for demonstrating the workflow I’m talking about; otherwise, read on. If you’re coming from the video to look at the Elisp code, it is found towards the bottom of this post.

“Recursive project search” is the name I’m giving to the flow where you do some kind of search to identify things that need to be done, but each of those tasks may lead you to do another search, and so on. You need to complete all the sub-searches, but without losing your place in the parent searches.

This is extremely common in software development and maintenance, whether you are just trying to scope out a set of changes or actually doing them. In fact, just about any task can end up being some form of this, and you never know when it will turn out that way.

This post is about how I use Emacs to do this, which is not rocket science but includes some tips and Elisp tweaks that can help a lot. When it comes to other editors or IDEs I’ve tried, I’ve never come close to finding a decent workflow for this, so I’ll have to leave it to other people to describe their approaches with other editors.

Example task - pyastgrep

I’m going to take as an example my pyastgrep project and a fairly simple refactoring I needed to do recently.

For background, pyastgrep is a command line program and library that allows you to grep Python code at the level of Abstract Syntax Trees rather than just strings. At the heart of this is a function that takes the path to a Python file and converts it to an AST and also to XML.

The refactoring I want to do is make this function swappable, mostly so that users can apply different caching strategies to it. This is going to be a straightforward example of turning it into a parameter, or “dependency injection” if you want a fancy term. But that may involve modifying a number of functions in several layers of function calls.

The function in question is process_python_file.

Example workflow

The first step is a search, which in this case I will be doing using lsp-mode. I happen to use lsp-pyright for Python, but there are other options.

So I’ll kick off by opening the file, putting my cursor on the function process_python_file, and calling M-x lsp-find-references. This returns a bunch of references. I can then step through them using M-x next-error and M-x previous-error — for which there are shortcuts defined, and I also do this so much that I have F keys for them — F2 and shift-F2 respectively.

Notice that, in addition to the normal cursor, there is also a little triangle in the search results which shows the current result you are on.

/blogmedia/emacs-recursive-search-lsp-find-references.png

In this case, after the function definition itself, and an import, there is just one real usage – that last item in the search results.

The details of doing this refactoring aren’t that important, but I’ll include some of the steps for completeness. The last result brings me to code like this:

def search_python_file(
    path: Path | BinaryIO,
    query_func: XMLQueryFunc,
    expression: str,
) -> Iterable[Match | ReadError | NonElementReturned]:

    ...

    processed_python = process_python_file(path)

So I make process_python_file a parameter:

def search_python_file(
    path: Path | BinaryIO,
    query_func: XMLQueryFunc,
    expression: str,
    *,
    python_file_processor: Callable[[Path], ProcessedPython | ReadError] = process_python_file,
) -> Iterable[Match | ReadError | NonElementReturned]:

    ...

    processed_python = python_file_processor(path)

Having done this, I now need to search for all usages of the function I just modified, search_python_file, so that I can pass the new parameter — another M-x lsp-find-references. I won’t go into the details this time; in this case it involves the following:

Wherever search_python_file is used, either:

  • don’t pass python_file_processor, because the default is what we want.

  • or do pass it, usually by similarly adding a python_file_processor parameter to the calling function, and passing that parameter into search_python_file.

This quickly gets me to search_python_files (note the s), and I find that it is imported in pyastgrep.api. There are no usages to be fixed here, but it is exported in __all__. This reminds me that the new parameter to this search_python_files function is actually intended to be a part of the publicly documented API — in fact this is the whole reason I’m doing this change. This means I now need to fix the docs. Another search is needed, but this time a string-based grep in the docs folder. For this, I use ripgrep and M-x rg-project-all-files.

So now I have another buffer of results to get through – I’m about 4 levels deep at this point.

Now comes one of the critical points in this workflow. I’ve completed the docs fix, and I’ve reached the end of that ripgrep buffer of search results:

/blogmedia/emacs-recursive-search-ripgrep-end.png

So I’ve come to a “leaf” of my search. But there were a whole load of other searches that I only got half way through. What happens now?

All I do is kill these finished buffers – both the file I’m done with, and the search buffer. And that puts me back to the previous search buffer, at exactly the point I left off, with the cursor in the search buffer in the expected place.

/blogmedia/emacs-recursive-search-lsp-find-references-search_python_files_continue.png

So I just continue. This process repeats itself, with any number of additional “side quests”, such as adding tests etc., until I get to the end of the last search buffer, at which point I’m done.

Explanation

What I’ve basically done here is a depth-first recursive search over “everything that needs to be done to complete the task”. I started working on one thing, which led to another and another, and I went off and completed each one as it came up.

Doing a search like that requires keeping track of a fair amount of state, and if I were doing that in my head, I would get lost very quickly and forget what I was doing. So what I do instead is to use Emacs buffers to maintain all of that state.

The buffers themselves form a stack, and each buffer has a cursor within it which tells me how far through the list of results I am. (This is equivalent to how recursive function calls typically work in a program - there will be a stack of function calls, each with local variables stored in a frame somehow).
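
As a loose analogy in Python (not anything Emacs runs; the task names are just the ones from the example above), each call plays the role of one search buffer, and the call stack plays the role of the buffer stack:

TASKS = {
    "make process_python_file a parameter": ["update callers of search_python_file"],
    "update callers of search_python_file": ["fix the docs"],
    "fix the docs": [],
}

def process(task, depth=0):
    print("  " * depth + task)           # "open a search buffer" for this task
    for follow_up in TASKS[task]:        # each result may spawn a sub-search...
        process(follow_up, depth + 1)    # ...which is handled before returning here

process("make process_python_file a parameter")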

In the above case I only went about 4 or 5 levels deep, and each level was fairly short. But you can go much deeper and not get lost, because the buffers are maintaining all the state you need. You can be 12 levels down, deep in the woods, and then just put it to one side and come back after lunch, or the next day, and just carry on, because the buffers are remembering everything for you, and the buffer that is in front of you tells you what you were doing last.

It doesn’t even matter if you switch between buffers and get them a bit out of order – you just have to ensure that you get to the bottom of each one.

Another important feature is that you can use different kinds of search, and mix and match them as you need, such as lsp-find-references and the ripgrep searches above. In addition to these two, you can insert other "search-like" things – linters, static type checkers, compilers, and build processes – anything that will return a list of items to check. So this is not a feature of just one mode; it's a feature of how buffers work together.

At each step, you can also apply different kinds of fixes – e.g. instead of manually editing, you might be using M-x lsp-rename on each instance, and you might be using keyboard macros etc.

When I’ve attempted to use editors other than Emacs, this is one of the things I’ve missed most of all. Just getting them to do the most basic requirement of “do a search, without overwriting the previous search results” has been a headache or impossible – although I may have given up too soon.

I’m guessing people manage somehow – or perhaps not: I’ve sometimes noticed that I’ve been happy to take on tasks involving this kind of workflow that appeared daunting to other people, and completed them without problem, which apparently impressed others. I also wonder whether difficulties in using search in an editor drive a reluctance to take on basic refactoring tasks, such as manual renames, or an inability to complete them correctly. If so, it would help explain why codebases often have many basic problems (like bad names).

In any case, I don’t think I can take for granted that people can do this, so that’s why I’m bothering to post about it!

Requirements

What do you need in Emacs for this to work? Or what equivalent functionality is needed if you want to reproduce this elsewhere?

  • First, the search buffer must have the ability to remember your position. All the search modes I’ve seen in Emacs do this.

  • Second, you need an easy way to step through results, and this is provided by the really convenient M-x next-error function which does exactly what you want (see the Emacs docs for it, which describe how it works). This function is part of Compilation Mode or Compilation Minor Mode, and all the search modes I’ve seen use this correctly.

  • Thirdly, the search modes mustn’t re-use buffers for different searches, otherwise you’ll clobber earlier search results that you hadn’t finished processing. Some modes create a fresh buffer automatically; others have a habit of re-using buffers — but we can fix that:

Unique buffers for searches

If search commands re-use search buffers by default, I’ve found it’s usually pretty easy to override the behaviour so that you automatically always get a unique buffer for each new search you do.

The approach you need often varies slightly for each mode, but the basic principle is similar - different searches should get different buffer names, and you can use some Elisp Advice to insert the behaviour you want.

So here are the main ones that I override:

rg.el for ripgrep searching:

(defadvice rg-run (before rg-run-before activate)
  (rg-save-search))

This is just using the save feature built in to rg.el.

This goes in your init.el. Since I use use-package I usually put it inside the relevant (use-package :config) block.

For lsp-find-references I use the following which makes a unique buffer name based on the symbol being searched for:

(advice-add 'lsp-find-references
            :around #'my/lsp-find-references-unique-buffer)

(defun my/lsp-find-references-unique-buffer (orig-func &rest args)
  "Gives lsp-find-references a unique buffer name, to help with recursive search."
  (let
      ((xref-buffer-name (format "%s %s" xref-buffer-name (symbol-at-point))))
    (apply orig-func args)))

Then for general M-x compile or projectile-compile-project commands I use the following, which gives a unique buffer name based on the compilation command used:

(advice-add 'compilation-start
            :around #'my/compilation-unique-buffer)

(defun my/compilation-unique-buffer (orig-func &rest args)
  "Give compilation buffers a unique name so that new compilations get new
buffers. This helps with recursive search.

If a compile command is run starting from the compilation buffer,
the buffer will be re-used, but if it is started from a different
buffer a new compilation buffer will be created."
  (let* ((command (car args))
         (compilation-buffer (apply orig-func args))
         (new-buffer-name (concat (buffer-name compilation-buffer) " " command)))
      (with-current-buffer compilation-buffer
        (rename-buffer new-buffer-name t))))

I use this mode quite a lot via M-x compile or M-x projectile-compile-project to do things other than compilation – custom search commands like pyastgrep, linters, and static checkers.

Other tips

External tools

Many of the external tools that you might run via M-x compile already have output that is in exactly the format that Emacs compilation-mode expects, so that search results or error messages become hyperlinked as expected inside Emacs. For example, mypy has the right format by default, as do most older compilers like gcc.

For those tools that don’t, sometimes you can tweak the options of how they print. For example, if you run ripgrep as rg --no-heading (just using normal M-x compile, without a dedicated ripgrep mode), it produces the necessary format.

Alternatively, you can make custom wrappers that fix the format. For example, I’ve got this one-liner bash script to wrap pyright:

#!/bin/sh
# Strip ANSI colour codes and tidy whitespace in pyright's output so it works nicer in Emacs
pyright "$@" | sed 's/\x1b\[[0-9;]*[mGK]//g' | awk '{$1=$1};1' | sed 's/\(.*:[0-9]*:[0-9]*\) -/\1: -/g'

Buffer order and keeping things tidy

Emacs typically manages your buffers as a stack. However, it’s very easy for things to get out of order as you are jumping around files or search buffers. It doesn’t matter too much if you go through the search buffers in the “wrong” order – as long as you get to the bottom of all of them. But to be sure I have got to the bottom, I do two things:

  • Before starting anything that I anticipate will be more than a few levels deep, I tidy up by closing all buffers but the one I’m working on. You can do this with crux-kill-other-buffers from crux. I have my own version that is more customised for my needs — crux-kill-other-buffers only kills buffers that are files.

  • At the end, when I think I’m done, I check the buffer list (C-x C-b) for anything else I missed.

Project context

In order to limit the scope of searches to my project, I’m typically leaning on projectile.el, but there are other options.

Conclusion

I hope this post has been helpful, and if you’ve got additional tips for this kind of workflow, please leave a comment!

Reflecting on 2024, preparing for 2025

If you do things a few times, they're a tradition. This is the third time I'm writing one of these, so I guess it's an annual tradition now! This is where I reflect on the year that's been, and talk some about my hopes and goals for the next year.

Reflecting on 2024

This year has been a lot, and there are a few months of it that just feel like a black hole to me. That's because I got sick in the middle of it. I'm really proud of how much I got done in spite of being the sickest I've ever been. And I'm excited to see what I can do next year, now that I'm nearly fully recovered.

Professional

I spoke at a conference! This year marked my first ever conference talk. Technically, my first one was at SIGBOVIK 2024, but I'm really talking about !!con. I've submitted talks to conferences before and this is the first one I've ever had accepted. You can watch the recording. That link takes you to the playlist of all !!con talks from this (final) year, so please enjoy them all!

It was an incredible experience. The whole conference felt like I was with old friends who I just hadn't met yet. It made me remember the power of connecting with other nerds in physical space. And it reminded me of the joy of being on a stage. More on that in the personal section.

I wrote even more than last year. My goal for this year was to continue my status quo: publish at least one blog post each week. I overshot this again, with 60 blog posts over 90,000 words. The most important thing for me has been consistency. By writing every week, I've been able to continue to use this momentum to stretch my creative practice. This even held during my illness this summer, and it was something for me to hold on to when I could do little else.

I got paid for my writing. This is the first time I have been paid explicitly to write. I was sponsored to write a post about an open-source product (and the contract even requires that I misuse it, on purpose, since that's what I pitched). The overall experience was pretty good, but I'm also not sure I would do it again in the near future: while I'm working a day job, I don't want to spend my limited writing time on things I'm not already self-motivated to publish. But—I dearly want to find a way to get paid for my writing which isn't sponsored posts. This might look like a Patreon or similar, so let me know if you're interested, and you might be the nudge to get me over that finish line.

I started coaching people. This year saw me take on my first three coaching clients. These were all pro bono, friends of mine who needed some help with career questions and technical leadership development. It's been an incredible experience, getting to directly help people grow and overcome their challenges. (If you're interested in being coached by me on technical leadership, reach out to me! You're a particularly good fit if you're a senior or staff software engineer aiming to level up, and members of marginalized groups are who I'm most hoping to help.)

I grew as a leader. Most of my reflection here is private because it's so intertwined with specific leadership challenges at my day job, but it's been a really helpful year for me in my leadership development. I've learned a lot, and I've seen a lot of old decisions come around to their conclusion to complete my learning arc.

Personal

I got sick. It's hard to say when exactly I got sick, but my symptoms got to where I had to go to my doctor at the start of May, and my full recovery started in November. I am pretty sure I was sick before that, since I had started needing more and more sleep throughout the spring, but it's impossible to say at this point.

At the peak of my illness, I was in near constant pain (about a 6 on the pain scale) and could not stand up for 5 minutes without having tachycardia. If that happened, I had to lie on my back until my heart rate came back down. Walking around the block was an impossibility, when months prior I was running 25 miles a week and hauling the kids around on my bicycle.

I bounced through multiple doctors. We hit the end of my general practitioner's expertise, so she referred me to a GI practice, since my core symptoms were related to my liver and abdominal pain. (The liver was ultimately a red herring: what looked unusual on my ultrasound was, with further testing, not concerning.) The GI practice did a lot of tests, and I had a lot of waiting (GI docs are in such high demand, you can't see them), but ultimately... they also found nothing. Meanwhile I was still in pain, and could do very little.

Around this point, I went on medical leave with my employer. Before that I had been working as much as I was able, and contributing something of value, but it had become clear that I was going to need to focus on my recovery if I wanted to actually get to the bottom of this. Going on medical leave was terrifying, because it meant I would be without income for months. But it was ultimately the right decision.

Around this same time, I went to my third doctor. She got me a diagnosis. A friend sent me to her, since she's a specialist in conditions that present with ambiguous symptoms and chronic fatigue. This doctor ran a lot of tests—expensive tests, which insurance doesn't cover—and we ultimately got me a firm diagnosis six months into the whole ordeal. After starting treatment, it's been a very fast recovery: almost two months in, and I'm over 90% normal. How our definitions of "fast" shift. I may have relapses in the future, there's no way to know, but I'm relieved to know what is going on.

I stopped running. The pain started in May when I was running, so I had to abruptly go from running 25 miles a week to not at all. Eventually it moved from pain while running to constant pain. And then eventually... it faded away entirely, as long as I keep up on one of my medications. Now I can run if I want to—but I've decided not to.

For the last decade, I've identified as a runner. I did a few half marathons with reasonable times, and I did a marathon in 4:07, finishing in 75 F heat. For much of the decade before that, I identified as a cyclist. I was into cycling from the moment I could get on a bike, always wanting to go further and faster. In high school, I did a 100 mile bike ride.

Now that I've been forced to take a break from it, I've realized I'm pretty content with not putting in the grueling schedule needed to get back to the high level of performance I thought I wanted to target next year. I might get back into this sooner or later, but right now, I'm working on functional strength and being healthy and having more balance in my life.

I got back into music! When I was sick, I bought a wind synth and started playing music again. Then I started taking lessons with an incredible teacher who's an accomplished musician in his own right. And then I got a drum pad, another wind synth, a hand-me-down keyboard, and got my clarinet back out. I've fallen deep into this and I'm loving every single minute of it, frustration and all.

This is replacing a lot of the dedication and discipline I used to get from endurance exercise. It has the added benefit of creating art in the process, which heals my soul. The deep breathing involved in playing wind instruments certainly helps me as well.

I'm learning some music theory, and it's hard! I want to learn how to write songs and compose music, and I'm going to get there. If you have any favorite resources for this, please send them to me!

The upshot of getting back into music is that it will, hopefully, give me a way to perform again. I've been a performer in some aspect for a lot of my life. In school, I was in our concert bands and in small wind ensembles. I was on the debate team, a shock to people who knew me as the shy kid who shook when forced to speak in front of the class. Since school, though, I've lost this opportunity. I got a taste of performance again with my conference talk, and I think music will be my route to performing regularly.

I'm starting to use my voice. Advocacy and activism were always things I looked up to but didn't feel like I was able to do. But then I found I have a voice, and I realized I need to use it. I shared a few posts this year on things that are important and required me to speak up, like trans rights and the crisis that was happening in Asheville.

Organized two rated chess tournaments. This year I organized two tournaments for our local chess club! They had about 12 players each, and went off smoothly. It was a good experience all around. I'm not sure I'll have the opportunity (or desire) to do this again this year.

Kept my head up. Current events have been... a lot... and I've managed to keep my head up for most of it. I'm going to keep going, and keep trying.

Upgraded the workshop for all-year use. Historically I've only been able to use my workshop for a few months of the year when the weather is right. This year, we got it insulated, replaced the windows, and added a heat pump, and all those combine to mean I can keep it temperate all year so I can go out there whenever I want. (Some wood finishes are inadvisable in cold or hot weather since they need good ventilation, but otherwise any time for anything.)

It's changed my relationship with woodworking, since now I can pop out there and make something whenever I want. When I was rearranging my desk yesterday, I realized I really needed a headphone stand. And so I popped out to the workshop and put in a total of two hours of work (split across a few sessions for glue and finish to dry), and now I have one! Something similar happened when I needed an adapter to mount my drum pad: I just made it.

Last year's goals

This year was a lot! How does it stack up against what I wanted to do last year?

  • ❓ I wanted to keep my rights, and I did—for now, in my own state. But it's really tenuous, and there are many states where I'd be punished for using a bathroom. The state I was born in, Ohio, has banned trans folks from using the bathrooms consistent with their gender, and the incoming administration is a dark cloud. I won't call this a miss yet, but I can't call it a win.
  • ✅ No personal side projects went into production! I once again toyed with the idea and once again talked myself out of it. Good job, me!
  • ✅ I am not sure I struck a better balance with calls and making, but I embraced that I love talking to my friends and just continued to make time for them. This is going to be even more important next year.
  • ✅ I kept writing on the same schedule, and I did expand it! I did a creative writing class, and even wrote a poem as well.
  • ❌ I did not do any comedy this year, so it's a miss. But it's a happy miss, because I found other things that I was drawn to.
  • ✅ I stayed pretty active in my communities, given my health.
  • ✅ I was a good parent and partner, given my health.
  • ✅ I finished voice training! This was almost a gimme, since I was done in January this year.
  • ✅ My ergonomic setup was definitely improved. I still want to work on using Talon more, but I have made improvements there as well.
  • ❌ I did not do more technical projects this year. I've started a few, and I'm going along in the background, but health got in the way.
  • ❌ I did not get back into competitive chess, and I probably won't. It simply doesn't feel as important right now. Music is filling the role that it filled in my life.
  • ✅ I kept my mental health strong!

I think I did really well on these goals, even judged against a normal year rather than a year where I spent six months sick to varying degrees and started to feel the crushing weight of our politics and its threat to my rights and my life. This year I learned a bit about what is important to me, and where I want to spend my time. That's reflected in my hopes and goals for next year.

Hopes and goals for 2025

These aren't predictions or concrete goals, but a reflection on what I'd like the next year to be. This is what I hope 2025 looks like for me.

Keep my rights. A perennial goal at this point, it's the headliner since trans rights are near the top of the Republicans' agenda for this new administration. I'm wary, and I'm going to do what I need to do to keep myself and my family safe. I think that can be done from where I am, in a safe, supportive community, but we will keep ourselves safe while continuing to advocate for all those who need protection. In particular, I'm going to keep living my best life and being positive representation for other trans folks who are similarly under attack.

No personal-time side projects into production. This one will probably be a forever anti-goal for me. I don't want to do ops-y things in my free time (despite feeling like shaving that yak occasionally), and I don't want to support a product in my free time. My free time is more about playful exploration.

Maintain relationships with friends and family. This is the 2025 version of 2024's goal of "strike a better balance with calls and making." I'm positive I'll have time set aside for making things and for playing music. But this is going to be a challenging year, so my loved ones (given/chosen family and dear friends) will need my support and I will need theirs. So I'm going to put those relationships first and foremost.

Explore ways to make this my living. I want to do more playful exploration, the kinds of things I do on my blog, and make that my living eventually! 2025 isn't when I'll get there, but I want to try out one or two things (like a Patreon? fund myself via consulting and coaching?) to start understanding what might work for both me and my readers.

Keep my mental health strong. This is going to be a challenge, and I am in a good spot for it. I'll need to dedicate effort to it, though, what with the upcoming onslaught.

Release some recorded music. I'm working on a lot of aspects of my music. Eventually, I want to write something and release it. This year might be releasing a recording of a cover, but it might also be an original piece. I'm not sure!

Write some original music. I don't know if I'll release something of my own, but I know I need to work on it. We'll see how this goes! It's scary to me, and it's also something I'm confident I can do if I put in the effort to learn how.

Do some ridiculous fun projects with code. There are a few things I really wanted to work on in 2024 that are just playful, fun, ridiculous things. I didn't get to do them since I was, uh, kinda sick (not sure if I mentioned that yet). They're burning to get out of me, and I want at least one of them to make it out this year.

* * *

That's it! I've poured a lot of myself into this post. If you've made it this far: thank you, so much, for reading.

2024 had a lot in it, good and bad. I'm trying to hold both at the same time, to remember the good and remember the bad, as they are both important aspects of the year for different reasons.

I hope that 2025 keeps much of the good, and that we can minimize the bad. I'm going to do everything I can to hold joy in this world. Please join me in that, and let's fill 2025 with joy, even in the face of all that's being thrown at us.

datalists are more powerful than you think

by Alexis Degryse

I think we all know the <datalist> element (and if you don’t, it’s ok). It holds a list of <option> elements, offering suggested choices for its associated input field.

It’s not an alternative to the <select> element. A field associated with a <datalist> can still allow any value that is not listed in the <option> elements.

In its basic form, you point a text input at a <datalist> of <option> values using the input's list attribute, and the browser offers those values as suggestions while still accepting free text.

Pretty cool, isn't it? But what happens if we combine <datalist> with less common field types, like color and date?

<label for="favorite-color">What is your favorite color?</label>
<input type="color" list="colors-list" id="favorite-color">
<datalist id="colors-list">
<option>#FF0000</option>
<option>#FFA500</option>
<option>#FFFF00</option>
<option>#008000</option>
<option>#0000FF</option>
<option>#800080</option>
<option>#FFC0CB</option>
<option>#FFFFFF</option>
<option>#000000</option>
</datalist>

Colors listed in <datalist> are pre-selectable but the color picker is still usable by users if they need to choose a more specific one.

<label for="event-choice" class="form-label col-form-label-lg">Choose a historical date</label>
<input type="date" list="events" id="event-choice">
<datalist id="events">
<option label="Fall of the Berlin wall">1989-11-09</option>
<option label="Maastricht Treaty">1992-02-07</option>
<option label="Brexit Referendum">2016-06-23</option>
</datalist>

Same here: some dates are pre-selectable and the datepicker is still available.

Depending on the context, having pre-defined values can speed up form filling for users.

Please note that <datalist> should be seen as a progressive enhancement, for a few reasons:

  • In Firefox (tested on 133), the <datalist> element is compatible only with textual field types (think text, url, tel, email, number). There is no support for color, date, and time.
  • Safari (tested on 15.6) supports color, but not date and time.
  • With some screen reader/browser combinations there are issues. For example, suggestions are not announced in Safari, and it's not possible to navigate to the datalist with the down arrow key (until you type something that matches a suggestion). Refer to a11ysupport.io for more.

Find out more

2024W52

We’ve made it to the last week of 2024. Phew.


I’ve been working on catching up on some blog posts from this past year. It’s been hard to find the time to fill out posts the way I want to. This week, despite the hectic nature of holidays, I found some time to fill in some missing posts.


This interview with Dr. Erin A. Cech about her new book The Trouble with Passion was interesting. She’s a sociologist focusing on social inequality, and she’s written a bunch about how employers exploit people’s passion for their work to pay them less for it.


A coworker told me recently about this website, JustWatch, that aggregates listings from various streaming services to answer the question “where do I watch this thing?”. Very handy.

Intel's $475 million error: the silicon behind the Pentium division bug

In 1993, Intel released the high-performance Pentium processor, the start of the long-running Pentium line. The Pentium had many improvements over the previous processor, the Intel 486, including a faster floating-point division algorithm. A year later, Professor Thomas Nicely, a number theorist, was researching reciprocals of twin prime numbers when he noticed a problem: his Pentium sometimes generated the wrong result when performing floating-point division. Intel considered this "an extremely minor technical problem", but much to Intel's surprise, the bug became a large media story. After weeks of criticism, mockery, and bad publicity, Intel agreed to replace everyone's faulty Pentium chips, costing the company $475 million.

In this article, I discuss the Pentium's division algorithm, show exactly where the bug is on the Pentium chip, take a close look at the circuitry, and explain what went wrong. In brief, the division algorithm uses a lookup table. In 1994, Intel stated that the cause of the bug was that five entries were omitted from the table due to an error in a script. However, my analysis shows that 16 entries were omitted due to a mathematical mistake in the definition of the lookup table. Five of the missing entries trigger the bug—also called the FDIV bug after the floating-point division instruction "FDIV"—while 11 of the missing entries have no effect.

This die photo of the Pentium shows the location of the FDIV bug. Click this image (or any other) for a larger version.

Although Professor Nicely brought attention to the FDIV bug, he wasn't the first to find it. In May 1994, Intel's internal testing of the Pentium revealed that very rarely, floating-point division was slightly inaccurate.1 Since only one in 9 billion values caused the problem, Intel's view was that the problem was trivial: "This doesn't even qualify as an errata." Nonetheless, Intel quietly revised the Pentium circuitry to fix the problem.

A few months later, in October, Nicely noticed erroneous results in his prime number computations.2 He soon determined that 1/824633702441 was wrong on three different Pentium computers, but his older computers gave the right answer. He called Intel tech support but was brushed off, so Nicely emailed a dozen computer magazines and individuals about the bug. One of the recipients was Andrew Schulman, author of "Undocumented DOS". He forwarded the email to Richard Smith, cofounder of a DOS software tools company. Smith posted the email on a Compuserve forum, a 1990s version of social media.

A reporter for the journal Electronic Engineering Times spotted the Compuserve post and wrote about the Pentium bug in the November 7 issue: Intel fixes a Pentium FPU glitch. In the article, Intel explained that the bug was in a component of the chip called a PLA (Programmable Logic Array) that acted as a lookup table for the division operation. Intel had fixed the bug in the latest Pentiums and would replace faulty processors for concerned customers.3

The problem might have quietly ended here, except that Intel decided to restrict which customers could get a replacement. If a customer couldn't convince an Intel engineer that they needed the accuracy, they couldn't get a fixed Pentium. Users were irate to be stuck with faulty chips so they took their complaints to online groups such as comp.sys.intel. The controversy spilled over into the offline world on November 22 when CNN reported on the bug. Public awareness of the Pentium bug took off as newspapers wrote about the bug and Intel became a punchline on talk shows.4

The situation became intolerable for Intel on December 12 when IBM announced that it was stopping shipments of Pentium computers.5 On December 19, less than two months after Nicely first reported the bug, Intel gave in and announced that it would replace the flawed chips for all customers.6 This recall cost Intel $475 million (over a billion dollars in current dollars).

Meanwhile, engineers and mathematicians were analyzing the bug, including Tim Coe, an engineer who had designed floating-point units.7 Remarkably, by studying the Pentium's bad divisions, Coe reverse-engineered the Pentium's division algorithm and determined why it went wrong. Coe and others wrote papers describing the mathematics behind the Pentium bug.8 But until now, nobody has shown how the bug is implemented in the physical chip itself.

A quick explanation of floating point numbers

At this point, I'll review a few important things about floating point numbers. A binary number can have a fractional part, similar to a decimal number. For instance, the binary number 11.1001 has four digits after the binary point. (The binary point "." is similar to the decimal point, but for a binary number.) The first digit after the binary point represents 1/2, the second represents 1/4, and so forth. Thus, 11.1001 corresponds to 3 + 1/2 + 1/16 = 3.5625. A "fixed point" number such as this can express a fractional value, but its range is limited.

Floating point numbers, on the other hand, include very large numbers such as 6.02×10²³ and very small numbers such as 1.055×10⁻³⁴. In decimal, 6.02×10²³ has a significand (or mantissa) of 6.02, multiplied by a power of 10 with an exponent of 23. In binary, a floating point number is represented similarly, with a significand and exponent, except the significand is multiplied by a power of 2 rather than 10.

Computers have used floating point since the early days of computing, especially for scientific computing. For many years, different computers used incompatible formats for floating point numbers. Eventually, a standard arose when Intel developed the 8087 floating point coprocessor chip for use with the 8086/8088 processor. The characteristics of this chip became a standard (IEEE 754) in 1985.9 Subsequently, most computers, including the Pentium, implemented floating point numbers according to this standard. The result of a basic arithmetic operation is supposed to be accurate up to the last bit of the significand. Unfortunately, division on the Pentium was occasionally much, much worse.
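
As a quick Python aside (mine, not part of the original article), you can check the binary examples above directly:

import math

print(0b111001 / 16)        # 11.1001 in binary = 3 + 1/2 + 1/16 = 3.5625
print(math.frexp(3.5625))   # (0.890625, 2): significand times 2**exponent
                            # (frexp normalizes to [0.5, 1), unlike IEEE 754's [1, 2))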

How SRT division works

How does a computer perform division? The straightforward way is similar to grade-school long division, except in binary. That approach was used in the Intel 486 and earlier processors, but the process is slow, taking one clock cycle for each bit of the quotient. The Pentium uses a different approach called SRT,10 performing division in base four. Thus, SRT generates two bits of the quotient per step, rather than one, so division is twice as fast. I'll explain SRT in a hand-waving manner with a base-10 example; rigorous explanations are available elsewhere.

The diagram below shows base-10 long division, with the important parts named. The dividend is divided by the divisor, yielding the quotient. In each step of the long division algorithm, you generate one more digit of the quotient. Then you multiply the divisor (1535) by the quotient digit (2) and subtract this from the dividend, leaving a partial remainder. You multiply the partial remainder by 10 and then repeat the process, generating a quotient digit and partial remainder at each step. The diagram below stops after two quotient digits, but you can keep going to get as much accuracy as desired.

Base-10 division, naming the important parts.

Note that division is more difficult than multiplication since there is no easy way to determine each quotient digit. You have to estimate a quotient digit, multiply it by the divisor, and then check if the quotient digit is correct. For example, you have to check carefully to see if 1535 goes into 4578 two times or three times.

The SRT algorithm makes it easier to select the quotient digit through an unusual approach: it allows negative digits in the quotient. With this change, the quotient digit does not need to be exact. If you pick a quotient digit that is a bit too large, you can use a negative number for the next digit: this will counteract the too-large digit since the next divisor will be added rather than subtracted.

The example below shows how this works. Suppose you picked 3 instead of 2 as the first quotient digit. Since 3 is too big, the partial remainder is negative (-261). In normal division, you'd need to try again with a different quotient digit. But with SRT, you keep going, using a negative digit (-1) for the quotient digit in the next step. At the end, the quotient with positive and negative digits can be converted to the standard form: 3×10-1 = 29, the same quotient as before.

Base-10 division, using a negative quotient digit. The result is the same as the previous example.

One nice thing about the SRT algorithm is that since the quotient digit only needs to be close, a lookup table can be used to select the quotient digit. Specifically, the partial remainder and divisor can be truncated to a few digits, making the lookup table a practical size. In this example, you could truncate 1535 and 4578 to 15 and 45; the table says that 15 goes into 45 three times, so you can use 3 as your quotient digit.

Instead of base 10, the Pentium uses the SRT algorithm in base 4: groups of two bits. As a result, division on the Pentium is twice as fast as standard binary division. With base-4 SRT, each quotient digit can be -2, -1, 0, 1, or 2. Multiplying by any of these values is very easy in hardware since multiplying by 2 can be done by a bit shift. Base-4 SRT does not require quotient digits of -3 or 3; this is convenient since multiplying by 3 is somewhat difficult. To summarize, base-4 SRT is twice as fast as regular binary division, but it requires more hardware: a lookup table, circuitry to add or subtract multiples of 1 or 2, and circuitry to convert the quotient to the standard form.
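
To make the base-4 SRT loop concrete, here is a toy Python sketch (my illustration, not the Pentium's implementation). It selects each quotient digit from the exact ratio p/d rather than a truncated lookup table, then applies the same subtract-and-shift step described above:

from fractions import Fraction

def srt_radix4(dividend, divisor, steps=12):
    # Toy radix-4 SRT division: returns the signed quotient digits (each in -2..2)
    # and the value they represent.
    d = Fraction(divisor)
    p = Fraction(dividend)
    digits = []
    for _ in range(steps):
        q = max(-2, min(2, round(p / d)))   # quotient digit, kept in [-2, 2]
        p = 4 * (p - q * d)                 # subtract q*d, then shift left two bits
        digits.append(q)
    value = sum(Fraction(q, 4**i) for i, q in enumerate(digits))
    return digits, value

digits, value = srt_radix4(1, 1.5)
print(digits)         # signed digits; some are negative
print(float(value))   # ~0.6667, i.e. 1 / 1.5

The key pieces are the signed digits in -2..2 and the update p = 4×(p − q×d); the Pentium replaces the round(p/d) step with the lookup table discussed next.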

Structure of the Pentium's lookup table

The purpose of the SRT lookup table is to provide the quotient digit. That is, the table takes the partial remainder p and the divisor d as inputs and provides an appropriate quotient digit. The Pentium's lookup table is the cause of the division bug, as was explained in 1994. The table was missing five entries; if the SRT algorithm accesses one of these missing entries, it generates an incorrect result. In this section, I'll discuss the structure of the lookup table and explain what went wrong.

The Pentium's lookup table contains 2048 entries, as shown below. The table has five regions corresponding to the quotient digits +2, +1, 0, -1, and -2. Moreover, the upper and lower regions of the table are unused (due to the mathematics of SRT). The unused entries were filled with 0, which turns out to be very important. In particular, the five red entries need to contain +2 but were erroneously filled with 0.

The 2048-entry lookup table used in the Pentium for division. The divisor is along the X-axis, from 1 to 2. The partial remainder is along the Y-axis, from -8 to 8. Click for a larger version.

When the SRT algorithm uses the table, the partial remainder p and the divisor d are inputs. The divisor (scaled to fall between 1 and 2) provides the X coordinate into the table, while the partial remainder (between -8 and 8) provides the Y coordinate. The details of the table coordinates will be important, so I'll go into some detail. To select a cell, the divisor (X-axis) is truncated to a 5-bit binary value 1.dddd. (Since the first digit of the divisor is always 1, it is ignored for the table lookup.) The partial remainder (Y-axis) is truncated to a 7-bit signed binary value pppp.ppp. The 11 bits indexing into the table result in a table with 211 (2048) entries. The partial remainder is expressed in 2's complement, so values 0000.000 to 0111.111 are non-negative values from 0 to (almost) 8, while values 1000.000 to 1111.111 are negative values from -8 to (almost) 0. (To see the binary coordinates for the table, click on the image and zoom in.)
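
In code, that quantization looks roughly like the following sketch (the bit packing here is arbitrary; the real chip feeds these 11 bits and their complements directly into the PLA described below):

import math

def table_index(divisor, partial_remainder):
    # divisor is assumed scaled into [1, 2); partial_remainder into [-8, 8).
    d_bits = math.floor(divisor * 16) & 0b1111               # 1.dddd, leading 1 implicit
    p_bits = math.floor(partial_remainder * 8) & 0b1111111   # pppp.ppp, two's complement
    return (p_bits << 4) | d_bits                            # one of 2048 possible indices

print(table_index(1.0, 0.0))         # 0
print(table_index(1.9375, -0.125))   # 2047: divisor bits 1111, remainder bits 1111.111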

The lookup table is implemented in a Programmable Logic Array (PLA)

In this section, I'll explain how the lookup table is implemented in hardware in the Pentium. The lookup table has 2048 entries so it could be stored in a ROM with 2048 two-bit outputs.11 (The sign is not explicitly stored in the table because the quotient digit sign is the same as the partial remainder sign.) However, because the table is highly structured (and largely empty), the table can be stored more compactly in a structure called a Programmable Logic Array (PLA).12 By using a PLA, the Pentium stored the table in just 112 rows rather than 2048 rows, saving an enormous amount of space. Even so, the PLA is large enough on the chip that it is visible to the naked eye, if you squint a bit.

Zooming in on the PLA and associated circuitry on the Pentium die.

The idea of a PLA is to provide a dense and flexible way of implementing arbitrary logic functions. Any Boolean logic function can be expressed as a "sum-of-products", a collection of AND terms (products) that are OR'd together (summed). A PLA has a block of circuitry called the AND plane that generates the desired product terms. The outputs of the AND plane are fed into a second block, the OR plane, which ORs the terms together. The AND plane and the OR plane are organized as grids. Each gridpoint can either have a transistor or not, defining the logic functions. The point is that by putting the appropriate pattern of transistors in the grids, you can create any function. For the division PLA, there are 22 inputs (the 11 bits from the divisor and partial remainder indices, along with their complements) and two outputs, as shown below.13

A simplified diagram of the division PLA.

A PLA is more compact than a ROM if the structure of the function allows it to be expressed with a small number of terms.14 One difficulty with a PLA is figuring out how to express the function with the minimum number of terms to make the PLA as small as possible. It turns out that this problem is NP-complete in general. Intel used a program called Espresso to generate compact PLAs using heuristics.15

The diagram below shows the division PLA in the Pentium. The PLA has 120 rows, split into two 60-row parts with support circuitry in the middle.16 The 11 table input bits go into the AND plane drivers in the middle, which produce the 22 inputs to the PLA (each table input and its complement). The outputs from the AND plane transistors go through output buffers and are fed into the OR plane. The outputs from the OR plane go through additional buffers and logic in the center, producing two output bits, indicating a ±1 or ±2 quotient. The image below shows the updated PLA that fixes the bug; the faulty PLA looks similar except the transistor pattern is different. In particular, the updated PLA has 46 unused rows at the bottom while the original, faulty PLA has 8 unused rows.

The division PLA with the metal layers removed to show the silicon. This image shows the PLA in the updated Pentium, since that photo came out better.

The image below shows part of the AND plane of the PLA. At each point in the grid, a transistor can be present or absent. The pattern of transistors in a row determines the logic term for that row. The vertical doped silicon lines (green) are connected to ground. The vertical polysilicon lines (red) are driven with the input bit pattern. If a polysilicon line crosses doped silicon, it forms a transistor (orange) that will pull that row to ground when activated.17 A metal line connects all the transistors in a row to produce the output; most of the metal has been removed, but some metal lines are visible at the right.

Part of the AND plane in the fixed Pentium. I colored the first silicon and polysilicon lines green and red respectively.

By carefully examining the PLA under a microscope, I extracted the pattern of transistors in the PLA grid. (This was somewhat tedious.) From the transistor pattern, I could determine the equations for each PLA row, and then generate the contents of the lookup table. Note that the transistors in the PLA don't directly map to the table contents (unlike a ROM). Thus, there is no specific place for transistors corresponding to the 5 missing table entries.

The left-hand side of the PLA implements the OR planes (below). The OR plane determines if the row output produces a quotient of 1 or 2. The OR plane is oriented 90° relative to the AND plane: the inputs are horizontal polysilicon lines (red) while the output lines are vertical. As before, a transistor (orange) is formed where polysilicon crosses doped silicon. Curiously, each OR plane has four outputs, even though the PLA itself has two outputs.18

Part of the OR plane of the division PLA. I removed the metal layers to show the underlying silicon and polysilicon. I drew lines for ground and outputs, showing where the metal lines were.

Next, I'll show exactly how the AND plane produces a term. For the division table, the inputs are the 7 partial remainder bits and 4 divisor bits, as explained earlier. I'll call the partial remainder bits p₆p₅p₄p₃.p₂p₁p₀ and the divisor bits 1.d₃d₂d₁d₀. These 11 bits and their complements are fed vertically into the PLA as shown at the top of the diagram below. These lines are polysilicon, so they will form transistor gates, turning on the corresponding transistor when activated. The arrows at the bottom point to nine transistors in the first row. (It's tricky to tell if the polysilicon line passes next to doped silicon or over the silicon, so the transistors aren't always obvious.) Looking at the transistors and their inputs shows that the first term in the PLA is generated by p₀p₁p₂p₃p₄'p₅p₆d₁d₂.

The first row of the division PLA in a faulty Pentium.

The diagram below is a closeup of the lookup table, showing how this PLA row assigns the value 1 to four table cells (dark blue). You can think of each term of the PLA as pattern-matching to a binary pattern that can include "don't care" values. The first PLA term (above) matches the pattern P=110.1111, D=x11x, where the "don't care" x values can be either 0 or 1. Since one PLA row can implement multiple table cells, the PLA is more efficient than a ROM; the PLA uses 112 rows, while a ROM would require 2048 rows.

The first entry in the PLA assigns the value 1 to the four dark blue cells.

Geometrically, you can think of each PLA term (row) as covering a rectangle or rectangles in the table. However, the rectangle can't be arbitrary, but must be aligned on a bit boundary. Note that each "bump" in the table boundary (magenta) requires a separate rectangle and thus a separate PLA row. (This will be important later.)

One PLA row can generate a large rectangle, filling in many table cells at once, if the region happens to be aligned nicely. For instance, the third term in the PLA matches d=xxxx, p=11101xx. This single PLA row efficiently fills in 64 table cells as shown below, replacing the 64 rows that would be required in a ROM.

The third entry in the PLA assigns the value 1 to the 64 dark blue cells.

To summarize, the pattern of transistors in the PLA implements a set of equations, which define the contents of the table, setting the quotient to 1 or 2 as appropriate. Although the table has 2048 entries, the PLA represents the contents in just 112 rows. By carefully examining the transistor pattern, I determined the table contents in a faulty Pentium and a fixed Pentium.
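
To make the relationship between terms and table cells concrete, here is a minimal Python sketch of how sum-of-product terms with "don't care" bits expand into lookup-table entries. The two patterns encode the terms described above, using my own bit ordering (p6…p0 followed by d3…d0); everything else here is illustrative rather than Intel's actual tooling.

    # Expand PLA terms (patterns with 'x' = don't care) into lookup-table entries.
    def term_matches(pattern, bits):
        """True if an 11-bit input string matches a term pattern."""
        return all(p in ('x', b) for p, b in zip(pattern, bits))

    def build_table(terms, width=11):
        """Any matching term sets a cell; rows of the PLA are effectively OR'd together."""
        table = {}
        for index in range(2 ** width):
            bits = format(index, f'0{width}b')
            for pattern, quotient in terms:
                if term_matches(pattern, bits):
                    table[bits] = quotient
        return table

    # The first term above (two don't-care bits) covers 4 cells; the third term
    # (six don't-care bits) covers 64 cells, so two rows already fill 68 entries.
    table = build_table([("1101111x11x", 1), ("11101xxxxxx", 1)])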

The mathematical bounds of the lookup table

As shown earlier, the lookup table has regions corresponding to quotient digits of +2, +1, 0, -1, and -2. These regions have irregular, slanted shapes, defined by mathematical bounds. In this section, I'll explain these mathematical bounds since they are critical to understanding how the Pentium bug occurred.

The essential step of the division algorithm is to divide the partial remainder p by the divisor d to get the quotient digit. The following diagram shows how p/d determines the quotient digit. The ratio p/d will define a point on the line at the top. (The point will be in the range [-8/3, 8/3] for mathematical reasons.) The point will fall into one of the five lines below, defining the quotient digit q. However, the five quotient regions overlap; if p/d is in one of the green segments, there are two possible quotient digits. The next part of the diagram illustrates how subtracting q*d from the partial remainder p shifts p/d into the middle, between -2/3 and 2/3. Finally, the result is multiplied by 4 (shifted left by two bits), expanding19 the interval back to [-8/3, 8/3], which is the same size as the original interval. The 8/3 bound may seem arbitrary, but the motivation is that it ensures that the new interval is the same size as the original interval, so the process can be repeated. (The bounds are all thirds for algebraic reasons; the value 3 comes from base 4 minus 1.20)

The input to a division step is processed, yielding the input to the next step.

Note that the SRT algorithm has some redundancy, but cannot handle q values that are "too wrong". Specifically, if p/d is in a green region, then either of two q values can be selected. However, the algorithm cannot recover from a bad q value in general. The relevant case is that if q is supposed to be 2 but 0 is selected, the next partial remainder will be outside the interval and the algorithm can't recover. This is what causes the FDIV bug.
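
To make the recurrence concrete, here is an idealized Python sketch of radix-4 SRT division, assuming exact arithmetic, a perfect quotient-digit selection, and inputs scaled so that p/d starts within ±8/3. It is only a model of the math above; the real Pentium selects q from the lookup table and works on carry-save values.

    from fractions import Fraction

    def srt_radix4_step(p, d):
        """One idealized radix-4 SRT step: pick a digit q in {-2..2}, subtract q*d,
        then shift left two bits (multiply by 4)."""
        q = max(-2, min(2, round(p / d)))   # idealized selection; the Pentium uses the lookup table
        return q, (p - q * d) * 4

    def srt_divide(dividend, divisor, steps=8):
        """Generate radix-4 quotient digits (two bits per step) for dividend/divisor."""
        p, d, digits = Fraction(dividend), Fraction(divisor), []
        for _ in range(steps):
            q, p = srt_radix4_step(p, d)
            digits.append(q)
        return digits

    # Digit i has weight 4**-i, so the digits converge toward the true quotient:
    digits = srt_divide(1, 3)
    approx = sum(Fraction(q, 4 ** i) for i, q in enumerate(digits))   # approaches 1/3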

The diagram below shows the structure of the SRT lookup table (also called the P-D table since the axes are p and d). Each bound in the diagram above turns into a line in the table. For instance, the green segment above with p/d between 4/3 and 5/3 turns into a green region in the table below with (4/3)d ≤ p ≤ (5/3)d. These slanted lines show the regions in which a particular quotient digit q can be used.

The P-D table specifies the quotient digit for a partial remainder (Y-axis) and divisor (X-axis).

The lookup table in the Pentium is based on the above table, quantized with a q value in each cell. However, there is one more constraint to discuss.

Carry-save and carry-lookahead adders

The Pentium's division circuitry uses a special circuit to perform addition and subtraction efficiently: the carry-save adder. One consequence of this adder is that each access to the lookup table may go to the cell just below the "right" cell. This is expected and should be fine, but in very rare and complicated circumstances, this behavior causes an access to one of the Pentium's five missing cells, triggering the division bug. In this section, I'll discuss why the division circuitry uses a carry-save adder, how the carry-save adder works, and how the carry-save adder triggers the FDIV bug.

The problem with addition is that carries make addition slow. Consider calculating 99999+1 by hand. You'll start with 9+1=10, then carry the one, generating another carry, which generates another carry, and so forth, until you go through all the digits. Computer addition has the same problem. If you're adding, say, two 64-bit numbers, the low-order bits can generate a carry that then propagates through all 64 bits. The time for the carry signal to go through 64 layers of circuitry is significant and can limit CPU performance. As a result, CPUs use special circuits to make addition faster.

The Pentium's division circuitry uses an unusual adder circuit called a carry-save adder to add (or subtract) the divisor and the partial remainder. A carry-save adder speeds up addition if you are performing a bunch of additions, as happens during division. The idea is that instead of adding a carry to each digit as it happens, you hold onto the carries in a separate word. As a decimal example, 499+222 would be 611 with carries 011; you don't carry the one to the second digit, but hold onto it. The next time you do an addition, you add in the carries you saved previously, and again save any new carries. The advantage of the carry-save adder is that the sum and carry at each digit position can be computed in parallel, which is fast. The disadvantage is that you need to do a slow addition at the end of the sequence of additions to add in the remaining carries to get the final answer. But if you're performing multiple additions (as for division), the carry-save adder is faster overall.
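
Here is a small Python sketch of the binary version of this idea: a carry-save adder computes every bit position independently, producing a separate sum word and carry word with no carry chain. The function name is mine; this is just the textbook three-input form of the circuit.

    def carry_save_add(a, b, c):
        """Add three values into a (sum, carry) pair without propagating carries."""
        total = a ^ b ^ c                             # per-bit sum, ignoring carries
        carry = ((a & b) | (a & c) | (b & c)) << 1    # saved carries, to be added in later
        return total, carry

    # The true result only appears when the sum and carry words are finally added:
    s, c = carry_save_add(499, 222, 0)
    assert s + c == 499 + 222

A chain of additions just keeps passing the (sum, carry) pair along; only the final step pays for a full carry-propagating addition.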

The carry-save adder creates a problem for the lookup table. We need to use the partial remainder as an index into the lookup table. But the carry-save adder splits the partial remainder into two parts: the sum bits and the carry bits. To get the table index, we need to add the sum bits and carry bits together. Since this addition needs to happen for every step of the division, it seems like we're back to using a slow adder and the carry-save adder has just made things worse.

The trick is that we only need 7 bits of the partial remainder for the table index, so we can use a different type of adder—a carry-lookahead adder—that calculates each carry in parallel using brute force logic. The logic in a carry-lookahead adder gets more and more complex for each bit so a carry-lookahead adder is impractical for large words, but it is practical for a 7-bit value.
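
For a sense of how carry-lookahead works, here is a generic sketch (not Intel's circuit): per-bit generate and propagate signals are computed first, and each carry is then a flat function of those signals, which the hardware evaluates in parallel rather than with the loop used here.

    def carry_lookahead_add(a, b, width=7):
        """Generic carry-lookahead addition, truncated to `width` bits."""
        g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]   # generate
        p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(width)]   # propagate
        carries = [0]
        for i in range(width):
            # c[i+1] = g[i] OR (p[i] AND c[i]); expanding this recurrence gives a
            # flat expression per bit, which the hardware computes in parallel.
            carries.append(g[i] | (p[i] & carries[i]))
        bits = [((a >> i) & 1) ^ ((b >> i) & 1) ^ carries[i] for i in range(width)]
        return sum(bit << i for i, bit in enumerate(bits))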

The photo below shows the carry-lookahead adder used by the divider. Curiously, the adder is an 8-bit adder but only 7 bits are used; perhaps the 8-bit adder was a standard logic block at Intel.21 I'll just give a quick summary of the adder here, and leave the details for another post. At the top, logic gates compute signals in parallel for each of the 8 pairs of inputs: sum, carry generate, and carry propagate. Next, the complex carry-lookahead logic determines in parallel if there will be a carry at each position. Finally, XOR gates apply the carry to each bit. The circuitry in the middle is used for testing; see the footnote.22 At the bottom, the drivers amplify control signals for various parts of the adder and send the PLA output to other parts of the chip.23 By counting the blocks of repeated circuitry, you can see which blocks are 8 bits wide, 11 bits wide, and so forth. The carry-lookahead logic is different for each bit, so there is no repeated structure.

The carry-lookahead adder that feeds the lookup table. This block of circuitry is just above the PLA on the die. I removed the metal layers, so this photo shows the doped silicon (dark) and the polysilicon (faint gray).

The carry-save and carry-lookahead adders may seem like implementation trivia, but they are a critical part of the FDIV bug because they change the constraints on the table. The cause is that the partial remainder is 64 bits,24 but the adder that computes the table index is 7 bits. Since the rest of the bits are truncated before the sum, the partial remainder sum for the table index can be slightly lower than the real partial remainder. Specifically, the table index can be one cell lower than the correct cell, an offset of 1/8. Recall the earlier diagram with diagonal lines separating the regions. Some (but not all) of these lines must be shifted down by 1/8 to account for the carry-save effect, but Intel made the wrong adjustment, which is the root cause of the FDIV error. (This effect was well-known at the time and mentioned in papers on SRT division, so Intel shouldn't have gotten it wrong.)
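
The off-by-one effect is easy to demonstrate: adding only the truncated high bits of the sum and carry words can come out one unit lower than truncating their full sum, because the discarded low bits may have carried into the retained bits. A small sketch (the bit widths here are arbitrary, not the Pentium's):

    def index_from_full_sum(s, c, drop=3):
        """Add the carry-save pair first, then truncate: the 'correct' table index."""
        return (s + c) >> drop

    def index_from_truncated_inputs(s, c, drop=3):
        """Truncate first, then add: what a narrow index adder sees."""
        return (s >> drop) + (c >> drop)

    # The truncated version is never higher than the correct one, and can be exactly
    # one lower, which is why some table boundaries must be shifted down by one cell.
    for s in range(64):
        for c in range(64):
            assert index_from_full_sum(s, c) - index_from_truncated_inputs(s, c) in (0, 1)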

An interesting thing about the FDIV bug is how extremely rare it is. With 5 bad table entries out of 2048, you'd expect erroneous divides to be very common. However, for complicated mathematical reasons involving the carry-save adder, the missing table entries are almost never encountered: only about 1 in 9 billion random divisions will encounter a problem. To hit a missing table entry, you need an "unlucky" result from the carry-save adder multiple times in a row, making the odds similar to winning the lottery, if the lottery prize were a division error.25

What went wrong in the lookup table

I consider the diagram below to be the "smoking gun" that explains how the FDIV bug happens: the top magenta line should be above the sloping black line, but it crosses the black line repeatedly. The magenta line carefully stays above the gray line, but that's the wrong line. In other words, Intel picked the wrong bounds line when defining the +2 region of the table. In this section, I'll explain why that causes the bug.

The top half of the lookup table, explaining the root of the FDIV bug.

The diagram is colored according to the quotient values stored in the Pentium's lookup table: yellow is +2, blue is +1, and white is 0, with magenta lines showing the boundaries between different values. The diagonal black lines are the mathematical constraints on the table, defining the region that must be +2, the region that can be +1 or +2, the region that must be +1, and so forth. For the table to be correct, each cell value in the table must satisfy these constraints. The middle magenta line is valid: it remains between the two black lines (the redundant +1 or +2 region), so all the cells that need to be +1 are +1 and all the cells that need to be +2 are +2, as required. Likewise, the bottom magenta line remains between the black lines. However, the top magenta line is faulty: it must remain above the top black line, but it crosses the black line. The consequence is that some cells that need to be +2 end up holding 0: these are the missing cells that caused the FDIV bug.

Note that the top magenta line stays above the diagonal gray line while following it as closely as possible. If the gray line were the correct line, the table would be perfect. Unfortunately, Intel picked the wrong constraint line for the table's upper bound when the table was generated.26

But why are some diagonal lines lowered by 1/8 while others are not? As explained in the previous section, the carry-save adder truncation means that a table lookup may end up one cell lower than the actual p value would indicate, i.e. the p value used for the table index can be 1/8 lower than the actual value. Thus, both the correct cell and the cell below it must satisfy the SRT constraints. A line therefore moves down if that makes the constraints stricter, but does not move down if that would expand the redundant area. In particular, the top line must not be moved down, but Intel clearly moved it down and generated the faulty lookup table.

Intel, however, has a different explanation for the bug. The Intel white paper states that the problem was in a script that downloaded the table into a PLA: an error caused the script to omit a few entries from the PLA.27 I don't believe this explanation: the missing terms match a mathematical error, not a copying error. I suspect that Intel's statement is technically true but misleading: they ran a C program (which they called a script) to generate the table but the program had a mathematical error in the bounds.

In his book "The Pentium Chronicles", Robert Colwell, architect of the Pentium Pro, provides a different explanation of the FDIV bug. Colwell claims that the Pentium design originally used the same lookup table as the 486, but shortly before release, the engineers were pressured by management to shrink the circuitry to save die space. The engineers optimized the table to make it smaller and had a proof that the optimization would work. Unfortunately, the proof was faulty, but the testers trusted the engineers and didn't test the modification thoroughly, causing the Pentium to be released with the bug. The problem with this explanation is that the Pentium was designed from the start with a completely different division algorithm from the 486: the Pentium uses radix-4 SRT, while the 486 uses standard binary division. Since the 486 doesn't have a lookup table, the story falls apart. Moreover, the PLA could trivially have been made smaller by removing the 8 unused rows, so the engineers clearly weren't trying to shrink it. My suspicion is that since Colwell developed the Pentium Pro in Oregon but the original Pentium was developed in California, Colwell didn't get firsthand information on the Pentium problems.

How Intel fixed the bug

Intel's fix for the bug was straightforward but also surprising. You'd expect that Intel added the five missing table values to the PLA, and this is what was reported at the time. The New York Times wrote that Intel fixed the flaw by adding several dozen transistors to the chip. EE Times wrote that "The fix entailed adding terms, or additional gate-sequences, to the PLA."

However, the updated PLA (below) shows something entirely different. The updated PLA is exactly the same size as the original PLA. However, about 1/3 of the terms were removed from the PLA, eliminating hundreds of transistors. Only 74 of the PLA's 120 rows are used, and the rest are left empty. (The original PLA had 8 empty rows.) How could removing terms from the PLA fix the problem?

The updated PLA has 46 unused rows.

The explanation is that Intel didn't just fill in the five missing table entries with the correct value of 2. Instead, Intel filled all the unused table entries with 2, as shown below. This has two effects. First, it eliminates any possibility of hitting a mistakenly-empty entry. Second, it makes the PLA equations much simpler. You might think that more entries in the table would make the PLA larger, but the number of PLA terms depends on the structure of the data. By filling the unused cells with 2, the jagged borders between the unused regions (white) and the "2" regions (yellow) disappear. As explained earlier, a large rectangle can be covered by a single PLA term, but a jagged border requires a lot of terms. Thus, the updated PLA is about 1/3 smaller than the original, flawed PLA. One consequence is that the terms in the new PLA are completely different from the terms in the old PLA, so one can't point to the specific transistors that fixed the bug.

Comparison of the faulty lookup table (left) and the corrected lookup table (right).

The image below shows the first 14 rows of the faulty PLA and the first 14 rows of the fixed PLA. As you can see, the transistor pattern (and thus the PLA terms) is entirely different. The doped silicon is darkened in the second image due to differences in how I processed the dies to remove the metal layers.

Top of the faulty PLA (left) and the fixed PLA (right). The metal layers were removed to show the silicon of the transistors. (Click for a larger image.)

Impact of the FDIV bug

How important is the Pentium bug? This became a highly controversial topic. A failure of a random division operation is very rare: about one in 9 billion values will trigger the bug. Moreover, an erroneous division is still mostly accurate: the error is usually in the 9th or 10th decimal digit, with rare worst-case error in the 4th significant digit. Intel's whitepaper claimed that a typical user would encounter a problem once every 27,000 years, insignificant compared to other sources of error such as DRAM bit flips. Intel said: "Our overall conclusion is that the flaw in the floating point unit of the Pentium processor is of no concern to the vast majority of users. A few users of applications in the scientific/engineering and financial engineering fields may need to employ either an updated processor without the flaw or a software workaround."
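
As a rough sanity check on the order of magnitude (the per-day workload below is my assumption, not a figure from Intel's paper):

    # Back-of-the-envelope estimate with assumed inputs.
    error_rate = 1 / 9e9           # roughly 1 in 9 billion random divisions hits the bug
    divisions_per_day = 1_000      # assumed typical-user workload
    days_between_errors = 1 / (error_rate * divisions_per_day)
    years_between_errors = days_between_errors / 365   # about 24,700 years, the same
                                                       # ballpark as Intel's 27,000-year figure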

However, IBM performed their own analysis,29 suggesting that the problem could hit customers every few days, and IBM suspended Pentium sales. (Coincidentally, IBM had a competing processor, the PowerPC.) The battle made it to major newspapers; the Los Angeles Times split the difference with Study Finds Both IBM, Intel Off on Error Rate. Intel soon gave in and agreed to replace all the Pentiums, making the issue moot.

I mostly agree with Intel's analysis. It appears that only one person (Professor Nicely) noticed the bug in actual use.28 The IBM analysis seems contrived to hit numbers that trigger the error. Most people would never hit the bug and even if they hit it, a small degradation in floating-point accuracy is unlikely to matter to most people. Looking at society as a whole, replacing the Pentiums was a huge expense for minimal gain. On the other hand, it's reasonable for customers to expect an accurate processor.

Note that the Pentium bug is deterministic: if you use a specific divisor and dividend that trigger the problem, you will get the wrong answer 100% of the time. Pentium engineer Ken Shoemaker suggested that the outcry over the bug was because it was so easy for customers to reproduce. It was hard for Intel to argue that customers would never encounter the bug when customers could trivially see the bug on their own computer, even if the situation was artificial.

Conclusions

The FDIV bug is one of the most famous processor bugs. By examining the die, it is possible to see exactly where it is on the chip. But Intel has had other important bugs. Some early 386 processors had a 32-bit multiply problem. Unlike the deterministic FDIV bug, the 386 would unpredictably produce the wrong results under particular temperature/voltage/frequency conditions. The underlying issue was a layout problem that didn't provide enough electrical margin to handle the worst-case situation. Intel sold the faulty chips but restricted them to the 16-bit market; bad chips were labeled "16 BIT S/W ONLY", while the good processors were marked with a double sigma. Although Intel had to suffer through embarrassing headlines such as Some 386 Systems Won't Run 32-Bit Software, Intel Says, the bug was soon forgotten.

Bad and good versions of the 386. Note the labels on the bottom line. Photos (L), (R) by Thomas Nguyen, (CC BY-SA 4.0)

Another memorable Pentium issue was the "F00F bug", a problem where a particular instruction sequence starting with F0 0F would cause the processor to lock up until rebooted.30 The bug was found in 1997 and solved with an operating system update. The bug is presumably in the Pentium's voluminous microcode. The microcode is too complex for me to analyze, so don't expect a detailed blog post on this subject. :-)

You might wonder why Intel needed to release a new revision of the Pentium to fix the FDIV bug, rather than just updating the microcode. The problem was that microcode for the Pentium (and earlier processors) was hard-coded into a ROM and couldn't be modified. Intel added patchable microcode to the Pentium Pro (1995), allowing limited modifications to the microcode. Intel originally implemented this feature for chip debugging and testing. But after the FDIV bug, Intel realized that patchable microcode was valuable for bug fixes too.31 The Pentium Pro stores microcode in ROM, but it also has a static RAM that holds up to 60 microinstructions. During boot, the BIOS can load a microcode patch into this RAM. In modern Intel processors, microcode patches have been used for problems ranging from the Spectre vulnerability to voltage problems.

The Pentium PLA with the top metal layer removed, revealing the M2 and M1 layers. The OR and AND planes are at the top and bottom, with drivers and control logic in the middle.

As the number of transistors in a processor increased exponentially, as described by Moore's Law, processors used more complex circuits and algorithms. Division is one example. Early microprocessors such as the Intel 8080 (1974, 6000 transistors) had no hardware support for division or floating point arithmetic. The Intel 8086 (1978, 29,000 transistors) implemented integer division in microcode but required the 8087 coprocessor chip for floating point. The Intel 486 (1989, 1.2 million transistors) added floating-point support on the chip. The Pentium (1993, 3.1 million transistors) moved to the faster but more complicated SRT division algorithm. The Pentium's division PLA alone has roughly 4900 transistor sites, more than a MOS Technology 6502 processor—one component of the Pentium's division circuitry uses more transistors than an entire 1975 processor.

The long-term effect of the FDIV bug on Intel is a subject of debate. On the one hand, competitors such as AMD benefitted from Intel's error. AMD's ads poked fun at the Pentium's problems by listing features of AMD's chips such as "You don't have to double check your math" and "Can actually handle the rigors of complex calculations like division." On the other hand, Robert Colwell, architect of the Pentium Pro, said that the FDIV bug may have been a net benefit to Intel as it created enormous name recognition for the Pentium, along with a demonstration that Intel was willing to back up its brand name. Industry writers agreed; see The Upside of the Pentium Bug. In any case, Intel survived the FDIV bug; time will tell how Intel survives its current problems.

I plan to write more about the implementation of the Pentium's PLA, the adder, and the test circuitry. Until then, you may enjoy reading about the Pentium Navajo rug. (The rug represents the P54C variant of the Pentium, so it is safe from the FDIV bug.) Thanks to Bob Colwell and Ken Shoemaker for helpful discussions.

Footnotes and references

  1. The book Inside Intel says that Vin Dham, the "Pentium czar", found the FDIV problem in May 1994. The book "The Pentium Chronicles" says that Patrice Roussel, the floating-point architect for Intel's upcoming Pentium Pro processor, found the FDIV problem in Summer 1994. I suspect that the bug was kept quiet inside Intel and was discovered more than once. 

  2. The divisor being a prime number has nothing to do with the bug. It's just a coincidence that the problem was found during research with prime numbers. 

  3. See Nicely's FDIV page for more information on the bug and its history. Other sources are the books Creating the Digital Future, The Pentium Chronicles, and Inside Intel. The New York Times wrote about the bug: Flaw Undermines Accuracy of Pentium Chips. Computerworld wrote Intel Policy Incites User Threats on threats of a class-action lawsuit. IBM's response is described in IBM Deals Blow to a Rival as it Suspends Pentium Sales 

  4. Talk show host David Letterman joked about the Pentium on December 15: "You know what goes great with those defective Pentium chips? Defective Pentium salsa!" Although a list of Letterman-style top ten Pentium slogans circulated, the list was a Usenet creation. There's a claim that Jay Leno also joked about the Pentium, but I haven't found verification. 

  5. Processors have many more bugs than you might expect. Intel's 1995 errata list for the Pentium had "21 errata (including the FDIV problem), 4 changes, 16 clarifications, and 2 documentation changes." See Pentium Processor Specification Update and Intel Releases Pentium Errata List

  6. Intel published full-page newspaper ads apologizing for its handling of the problem, stating: "What Intel continues to believe is an extremely minor technical problem has taken on a life of its own."

    Intel's apology letter, published in Financial Times. Note the UK country code in the phone number.

  7. Tim Coe's reverse engineering of the Pentium divider was described on the Usenet group comp.sys.intel, archived here. To summarize, Andreas Kaiser found 23 failing reciprocals. Tim Coe determined that most of these failing reciprocals were of the form 3*(2^(K+30)) - 1149*(2^(K-(2*J))) - delta*(2^(K-(2*J))). He recognized that the factor of 2 indicated a radix-4 divider. The extremely low probability of error indicated the presence of a carry save adder; the odds of both the sum and carry bits getting long patterns of ones were very low. Coe constructed a simulation of the divider that matched the Pentium's behavior and noted which table entries must be faulty. 

  8. The main papers on the FDIV bug are Computational Aspects of the Pentium Affair, It Takes Six Ones to Reach a Flaw, The Mathematics of the Pentium Division Bug, The Truth Behind the Pentium Bug, Anatomy of the Pentium Bug, and Risk Analysis of the Pentium Bug. Intel's whitepaper is Statistical Analysis of Floating Point Flaw in the Pentium Processor; I archived IBM's study here

  9. The Pentium uses floating point numbers that follow the IEEE 754 standard. Internally, floating point numbers are represented with 80 bits: 1 bit for the sign, 15 bits for the exponent, and 64 bits for the significand. Externally, floating point numbers are 32-bit single-precision numbers or 64-bit double-precision numbers. Note that the number of significand bits limits the accuracy of a floating-point number. 

  10. The SRT division algorithm is named after the three people who independently created it in 1957-1958: Sweeney at IBM, Robertson at the University of Illinois, and Tocher at Imperial College London. The SRT algorithm was developed further by Atkins in his PhD research (1970).

    The SRT algorithm became more practical in the 1980s as chips became denser. Taylor implemented the SRT algorithm on a board with 150 chips in 1981. The IEEE floating point standard (1985) led to a market for faster floating point circuitry. For instance, the Weitek 4167 floating-point coprocessor chip (1989) was designed for use with the Intel 486 CPU (datasheet) and described in an influential paper. Another important SRT implementation is the MIPS R3010 (1988), the coprocessor for the R3000 RISC processor. The MIPS R3010 uses radix-4 SRT for division with 9 bits from the partial remainder and 9 bits from the divisor, making for a larger lookup table and adder than the Pentium (link).

    To summarize, when Intel wanted to make division faster on the Pentium (1993), the SRT algorithm was a reasonable choice. Competitors had already implemented SRT and multiple papers explained how SRT worked. The implementation should have been straightforward and bug-free. 

  11. The dimensions of the lookup table can't be selected arbitrarily. In particular, if the table is too small, a cell may need to hold two different q values, which isn't possible. Note that constructing the table is only possible due to the redundancy of SRT. For instance, if some values in the cell require q=1 and other values require q=1 or 2, then the value q=1 can be assigned to the cell. 

  12. In the white paper, Intel calls the PLA a Programmable Lookup Array, but that's an error; it's a Programmable Logic Array. 

  13. I'll explain a PLA in a bit more detail in this footnote. An example of a sum-of-products formula with inputs a and b is ab' + a'b + ab. This formula has three sum terms, so it requires three rows in the PLA. However, this formula can be reduced to a + b, which uses a smaller two-row PLA. Note that any formula can be trivially expressed with a separate product term for each 1 output in the truth table. The hard part is optimizing the PLA to use fewer terms. The original PLA patent is probably MOS Transistor Integrated Matrix from 1969. 
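
    As a quick check (my own snippet, not part of the original footnote), a four-row truth table confirms the reduction:

        # Verify that ab' + a'b + ab equals a + b for every input combination.
        for a in (0, 1):
            for b in (0, 1):
                sum_of_products = (a and not b) or (not a and b) or (a and b)
                assert bool(sum_of_products) == bool(a or b)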

  14. A ROM and a PLA have many similarities. You can implement a ROM with a PLA by using the AND terms to decode addresses and the OR terms to hold the data. Alternatively, you can replace a PLA with a ROM by putting the function's truth table into the ROM. ROMs are better if you want to hold arbitrary data that doesn't have much structure (such as the microcode ROMs). PLAs are better if the functions have a lot of underlying structure. The key theoretical difference between a ROM and a PLA is that a ROM activates exactly one row at a time, corresponding to the address, while a PLA may activate one row, no rows, or multiple rows at a time. Another alternative for representing functions is to use logic gates directly (known as random logic); moving from the 286 to the 386, Intel replaced many small PLAs with logic gates, enabled by improvements in the standard-cell software. Intel's design process is described in Coping with the Complexity of Microprocessor Design

  15. In 1982, Intel developed a program called LOGMIN to automate PLA design. The original LOGMIN used an exhaustive exponential search, limiting its usability. See A Logic Minimizer for VLSI PLA Design. For the 386, Intel used Espresso, a heuristic PLA minimizer that originated at IBM and was developed at UC Berkeley. Intel probably used Espresso for the Pentium, but I can't confirm that. 

  16. The Pentium's PLA is split into a top half and a bottom half, so you might expect the top half would generate a quotient of 1 and the bottom half would generate a quotient of 2. However, the rows for the two quotients are shuffled together with no apparent pattern. I suspect that the PLA minimization software generated the order arbitrarily. 

  17. Conceptually, the PLA consists of AND gates feeding into OR gates. To simplify the implementation, both layers of gates are actually NOR gates. Specifically, if any transistor in a row turns on, the row will be pulled to ground, producing a zero. De Morgan's laws show that the two approaches are the same, if you invert the inputs and outputs. I'm ignoring this inversion in the diagrams.

    Note that each square can form a transistor on the left, the right, or both. The image must be examined closely to distinguish these cases. Specifically, if the polysilicon line produces a transistor, horizontal lines are visible in the polysilicon. If there are no horizontal lines, the polysilicon passes by without creating a transistor. 

  18. Each OR plane has four outputs, so there are eight outputs in total. These outputs are combined with logic gates to generate the desired two outputs (quotient of 1 or 2). I'm not sure why the PLA is implemented in this fashion. Each row alternates between an output on the left and an output on the right, but I don't think this makes the layout any denser. As far as I can tell, the extra outputs just waste space. One could imagine combining the outputs in a clever way to reduce the number of terms, but instead the outputs are simply OR'd together. 

  19. The dynamics of the division algorithm are interesting. The computation of a particular division will result in the partial remainder bouncing from table cell to table cell, while remaining in one column of the table. I expect this could be analyzed in terms of chaotic dynamics. Specifically, the partial remainder interval is squished down by the subtraction and then expanded when multiplied by 4. This causes low-order bits to percolate upward so the result is exponentially sensitive to initial conditions. I think that the division behavior satisfies the definition of chaos in Dynamics of Simple Maps, but I haven't investigated this in detail.

    You can see this chaotic behavior with a base-10 division, e.g. compare 1/3.0001 to 1/3.0002:
    1/3.0001=0.33332222259258022387874199947368393726705454969006...
    1/3.0002=0.33331111259249383619151572689224512820860216424246...
    Note that the results start off the same but are completely divergent by 15 digits. (The division result itself isn't chaotic, but the sequence of digits is.)
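
    You can reproduce this comparison with Python's decimal module (my snippet, not from the original text):

        from decimal import Decimal, getcontext
        getcontext().prec = 50
        print(Decimal(1) / Decimal("3.0001"))
        print(Decimal(1) / Decimal("3.0002"))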

    I tried to make a fractal out of the SRT algorithm and came up with the image below. There are 5 bands for convergence, each made up of 5 sub-bands, each made up of 5 sub-sub bands, and so on, corresponding to the 5 q values.

    A fractal showing convergence or divergence of SRT division as the scale factor (X-axis) ranges from the normal value of 4 to infinity. The Y-axis is the starting partial remainder. The divisor is (arbitrarily) 1.5. Red indicates convergence; gray is darker as the value diverges faster.

  20. The algebra behind the bound of 8/3 is that p (the partial remainder) needs to be in an interval that stays the same size each step. Each step of division computes pnew = (pold - q*d)*4. Thus, at the boundary, with q=2, you have p = (p-2*d)*4, so 3p=8d and thus p/d = 8/3. Similarly, the other boundary, with q=-2, gives you p/d = -8/3. 

  21. I'm not completely happy with the 8-bit carry-lookahead adder. Coe's mathematical analysis in 1994 showed that the carry-lookahead adder operates on 7 bits. The adder in the Pentium has two 8-bit inputs connected to another part of the division circuit. However, the adder's bottom output bit is not connected to anything. That would suggest that the adder is adding 8 bits and then truncating to 7 bits, which would reduce the truncation error compared to a 7-bit adder. However, when I simulate the division algorithm this way, the FDIV bug doesn't occur. Wiring the bottom input bits to 0 would explain the behavior, but that seems pointless. I haven't examined the circuitry that feeds the adder, so I don't have a conclusive answer. 

  22. Half of the circuitry in the adder block is used to test the lookup table. The reason is that a chip such as the Pentium is very difficult to test: if one out of 3.1 million transistors goes bad, how do you detect it? For a simple processor like the 8080, you can run through the instruction set and be fairly confident that any problem would turn up. But with a complex chip, it is almost impossible to come up with an instruction sequence that would test every bit of the microcode ROM, every bit of the cache, and so forth. Starting with the 386, Intel added circuitry to the processor solely to make testing easier; about 2.7% of the transistors in the 386 were for testing.

    To test a ROM inside the processor, Intel added circuitry to scan the entire ROM and checksum its contents. Specifically, a pseudo-random number generator runs through each address, while another circuit computes a checksum of the ROM output, forming a "signature" word. At the end, if the signature word has the right value, the ROM is almost certainly correct. But if there is even a single bit error, the checksum will be wrong and the chip will be rejected. The pseudo-random numbers and the checksum are both implemented with linear feedback shift registers (LFSR), a shift register along with a few XOR gates to feed the output back to the input. For more information on testing circuitry in the 386, see Design and Test of the 80386, written by Pat Gelsinger, who became Intel's CEO years later. Even with the test circuitry, 48% of the transistor sites in the 386 were untested. The instruction-level test suite to test the remaining circuitry took almost 800,000 clock cycles to run. The overhead of the test circuitry was about 10% more transistors in the blocks that were tested.

    In the Pentium, the circuitry to test the lookup table PLA is just below the 7-bit adder. An 11-bit LFSR creates the 11-bit input value to the lookup table. A 13-bit LFSR hashes the two-bit quotient result from the PLA, forming a 13-bit checksum. The checksum is fed serially to test circuitry elsewhere in the chip, where it is merged with other test data and written to a register. If the register is 0 at the end, all the tests pass. In particular, if the checksum is correct, you can be 99.99% sure that the lookup table is operating as expected. The ironic thing is that this test circuit was useless for the FDIV bug: it ensured that the lookup table held the intended values, but the intended values were wrong.
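
    To illustrate the building block (a generic Fibonacci-style LFSR with arbitrary taps, not Intel's actual 11-bit or 13-bit registers):

        def lfsr_steps(state, taps, width, count):
            """Yield successive LFSR states; `taps` are bit positions XOR'd into the feedback."""
            mask = (1 << width) - 1
            for _ in range(count):
                feedback = 0
                for t in taps:
                    feedback ^= (state >> t) & 1
                state = ((state << 1) | feedback) & mask
                yield state

        # e.g. an 11-bit register stepping through pseudo-random table addresses:
        addresses = list(lfsr_steps(state=1, taps=(10, 8), width=11, count=16))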

    Why did Intel generate test addresses with a pseudo-random sequence instead of a sequential counter? It turns out that a linear feedback shift register (LFSR) is slightly more compact than a counter. This LFSR trick was also used in a touch-tone chip and the program counter of the Texas Instruments TMS 1000 microcontroller (1974). In the TMS 1000, the program counter steps through the program pseudo-randomly rather than sequentially. The program is shuffled appropriately in the ROM to counteract the sequence, so the program executes as expected and a few transistors are saved. 

  23. One unusual feature of the Pentium is that it uses BiCMOS technology: both bipolar and CMOS transistors. Note the distinctive square boxes in the driver circuitry; these are bipolar transistors, part of the high-speed drivers.

    Three bipolar transistors. These transistors transmit the quotient to the rest of the division circuitry.

  24. I think the partial remainder is actually 67 bits because there are three extra bits to handle rounding. Different parts of the floating-point datapath have different widths, depending on what width is needed at that point. 

  25. In this long footnote, I'll attempt to explain why the FDIV bug is so rare, using heatmaps. My analysis of Intel's lookup table shows several curious factors that almost cancel out, making failures rare but not impossible. (For a rigorous explanation, see It Takes Six Ones to Reach a Flaw and The Mathematics of the Pentium Division Bug. These papers explain that, among other factors, a bad divisor must have six consecutive ones in positions 5 through 10 and the division process must go through nine specific steps, making a bad result extremely uncommon.)

    The diagram below shows a heatmap of how often each table cell is accessed when simulating a generic SRT algorithm with a carry-save adder. The black lines show the boundaries of the quotient regions in the Pentium's lookup table. The key point is that the top colored cell in each column is above the black line, so some table cells are accessed but are not defined in the Pentium. This shows that the Pentium is missing 16 entries, not just the 5 entries that are usually discussed. (For this simulation, I generated the quotient digit directly from the SRT bounds, rather than the lookup table, selecting the digit randomly in the redundant regions.)

    A heatmap showing the table cells accessed by an SRT simulation.

    The diagram is colored with a logarithmic color scale. The blue cells are accessed approximately uniformly. The green cells at the boundaries are accessed about 2 orders of magnitude less often. The yellow-green cells are accessed about 3 orders of magnitude less often. The point is that it is hard to get to the edge cells since you need to start in the right spot and get the right quotient digit, but it's not extraordinarily hard.

    (The diagram also shows an interesting but ultimately unimportant feature of the Pentium table: at the bottom of the diagram, five white cells are above the black line. This shows that the Pentium assigns values to five table cells that can't be accessed. (This was also mentioned in "The Mathematics of the Pentium Bug".) These cells are in the same columns as the 5 missing cells, so it would be interesting if they were related to the missing cells. But as far as I can tell, the extra cells are due to using a bound of "greater or equals" rather than "greater", unrelated to the missing cells. In any case, the extra cells are harmless.)

    The puzzling factor is that if the Pentium table has 16 missing table cells, and the SRT uses these cells fairly often, you'd expect maybe 1 division out of 1000 or so to be wrong. So why are division errors extremely rare?

    It turns out that the structure of the Pentium lookup table makes some table cells inaccessible. Specifically, the table is arbitrarily biased to pick the higher quotient digit rather than the lower quotient digit in the redundant regions. This has the effect of subtracting more from the partial remainder, pulling the partial remainder away from the table edges. The diagram below shows a simulation using the Pentium's lookup table and no carry-save adder. Notice that many cells inside the black lines are white, indicating that they are never accessed. This is by coincidence, due to arbitrary decisions when constructing the lookup table. Importantly, the missing cells just above the black line are never accessed, so the missing cells shouldn't cause a bug.

    A heatmap showing the table cells accessed by an SRT simulation using the Pentium's lookup table but no carry-save adder.

    Thus, Intel almost got away with the missing table entries. Unfortunately, the carry-save adder makes it possible to reach some of the otherwise inaccessible cells. Because the output from the carry-save adder is truncated, the algorithm can access the table cell below the "right" cell. In the redundant regions, this can yield a different (but still valid) quotient digit, causing the next partial remainder to end up in a different cell than usual. The heatmap below shows the results.

    A heatmap showing the probability of ending up in each table cell when using the Pentium's division algorithm.

    In particular, five cells above the black line can be reached: these are instances of the FDIV bug. These cells are orange, indicating that they are about 9 orders of magnitude less likely than the rest of the cells. It's almost impossible to reach these cells, requiring multiple "unlucky" values in a row from the carry-save adder. To summarize, the Pentium lookup table has 16 missing cells. Purely by coincidence, the choices in the lookup table make many cells inaccessible, which almost counteracts the problem. However, the carry-save adder provides a one-in-a-billion path to five of the missing cells, triggering the FDIV bug.

    One irony is that if division errors were more frequent, Intel would have caught the FDIV bug before shipping. But if division errors were substantially less frequent, no customers would have noticed the bug. Inconveniently, the frequency of errors fell into the intermediate zone: errors were too rare for Intel to spot them, but frequent enough for a single user to spot them. (This makes me wonder what other astronomically infrequent errors may be lurking in processors.) 

  26. Anatomy of the Pentium Bug reached a similar conclusion, stating "The [Intel] White Paper attributes the error to a script that incorrectly copied values; one is nevertheless tempted to wonder whether the rule for lowering thresholds was applied to the 8D/3 boundary, which would be an incorrect application because that boundary is serving to bound a threshold from below." (That paper also hypothesizes that the table was compressed to 6 columns, a hypothesis that my examination of the die disproves.) 

  27. The Intel white paper describes the underlying cause of the bug: "After the quantized P-D plot (lookup table) was numerically generated as in Figure 4-1, a script was written to download the entries into a hardware PLA (Programmable Lookup Array). An error was made in this script that resulted in a few lookup entries (belonging to the positive plane of the P-D plot) being omitted from the PLA." The script explanation is repeated in The Truth Behind the Pentium Bug: "An engineer prepared the lookup table on a computer and wrote a script in C to download it into a PLA (programmable logic array) for inclusion in the Pentium's FPU. Unfortunately, due to an error in the script, five of the 1066 table entries were not downloaded. To compound this mistake, nobody checked the PLA to verify the table was copied correctly." My analysis suggests that the table was copied correctly; the problem was that the table was mathematically wrong. 

  28. It's not hard to find claims of people encountering the Pentium division bug, but these seem to be in the "urban legend" category. Either the problem is described second-hand, or the problem is unrelated to division, or the problem happened much too frequently to be the FDIV bug. It has been said that the game Quake would occasionally show the wrong part of a level due to the FDIV bug, but I find that implausible. The "Intel Inside—Don't Divide" Chipwreck describes how the division bug was blamed for everything from database and application server crashes to gibberish text. 

  29. IBM's analysis of the error rate seems contrived, coming up with reasons to use numbers that are likely to cause errors. In particular, IBM focuses on slightly truncated numbers, either numbers with two decimal digits or hardcoded constants. Note that a slightly truncated number is much more likely to hit a problem because its binary representation will have multiple 1's in a row, a necessity to trigger the bug. Another paper Risk Analysis of the Pentium Bug claims a risk of one in every 200 divisions. It depends on "bruised integers", such as 4.999999, which are similarly contrived. I'll also point out that if you start with numbers that are "bruised" or otherwise corrupted, you obviously don't care about floating-point accuracy and shouldn't complain if the Pentium adds slightly more inaccuracy.

    The book "Inside Intel" says that "the IBM analysis was quite wrong" and "IBM's intervention in the Pentium affair was not an example of the company on its finest behavior" (page 364). 

  30. The F00F bug happens when an invalid compare-and-exchange instruction leaves the bus locked. The instruction is supposed to exchange with a memory location, but the invalid instruction specifies a register instead, causing unexpected behavior. This is very similar to some undocumented instructions in the 8086 processor where a register is specified when memory is required; see my article Undocumented 8086 instructions, explained by the microcode

  31. For details on the Pentium Pro's patchable microcode, see P6 Microcode Can Be Patched. But patchable microcode dates back much earlier. The IBM System/360 mainframes (1964) had microcode that could be updated in the field, either to fix bugs or to implement new features. These systems stored microcode on metalized Mylar sheets that could be replaced as necessary. In that era, semiconductor ROMs didn't exist, so Mylar sheets were also a cost-effective way to implement read-only storage. See TROS: How IBM mainframes stored microcode in transformers

Departure Mono

Here’s a fun fixed-width pixel font I came across the other day: Departure Mono. It’s got a neat old-school terminal vibe: think VT100 or Commodore 64.

A screenshot of the Departure Mono website. On the left, a small caption "Departure Mono is a monospaced pixel font with a lo-fi technical vibe". On the right are two examples: a personal letter on continuous feed paper, and a notice on small stationery.

How to effectively refine engineering strategy.

In Jim Collins’ Great by Choice, he develops the concept of Fire Bullets, Then Cannonballs. His premise is that you should cheaply test new ideas before fully committing to them. Your organization can only afford to fire a small number of cannonballs, but it can bankroll far more bullets, so why not use bullets to derisk your cannonballs’ trajectories?

This chapter presents a series of concrete techniques that I have personally used to effectively refine strategies well before reaching the cannonball stage. We’ll work through an overview of strategy refinement, covering:

  • An introduction to the practice of strategy refinement
  • Why strategy refinement is the highest impact step of strategy creation
  • How mixed incentives often cause refinement to be skipped, even though skipping it leads to worse organizational outcomes
  • Building your personal toolkit for refining strategy by picking from various refinement techniques like strategy testing, systems modeling, and Wardley mapping
  • Brief introductions to each of those refinement techniques to provide enough context to pick which ones might be useful for the strategy you’re working on
  • Survey of anti-patterns that skip refinement or manufacture consent to create the illusion of refinement without providing the benefits

Each of the refinement techniques, such as systems modeling, is covered in greater detail–including concrete applications to specific engineering strategies–in the refinement section of this book.

This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

What is strategy refinement?

Most strategies succeed because they properly address a narrow set of problems within a broader approach. While it’s possible to implement the entire strategy to validate the approach, this is both inefficient and slow. Worse, it’s easy to get so distracted by miscellaneous details that you lose sight of the levers that will make your strategy impactful.

Strategy refinement is a toolkit of methods to identify those narrow problems that matter most, and to validate that your solutions to those problems will be effective. The right tool within the toolkit will vary depending on the strategy you’re working on. It might be using Wardley mapping to understand how the ecosystem’s evolution will impact your approach. Or it might be systems modeling to determine which part of a migration is the most valuable lever. In other cases, it’s delaying full commitment to your strategy until you’ve done a narrow test drive to derisk the pieces you don’t quite have conviction in yet.

Whatever tools you’ve relied on to refine strategy thus far in your work, there are always new refinement tools to pick up. This book presents a workable introduction to several tools that I find reliably useful, while providing a broader foundation for deploying other techniques that you develop for strategy refinement.

Does refinement matter?

At Stripe, the head of engineering rolled out agile techniques in one shot as the required method for engineering development. This change was aimed at our difficulties with planning over periods longer than a month, a growing challenge as we started working with enterprise businesses who wanted us to commit to specific functionality as part of signing their contracts. However, the approach worked poorly, because it assumed that the issue was engineering managers being generally unfamiliar with agile techniques. The challenge of adoption wasn’t awareness, but rather the difficulty of prioritizing asks from numerous stakeholders in an environment where saying no was frowned upon.

In this agile rollout, the lack of a shared planning paradigm was a real, correctly identified problem. However, the solution solved the easiest part of the problem without addressing the messier parts, and consequently failed to make meaningful progress. This happens surprisingly often, and can largely be avoided with a small dose of refinement.

On the opposite end, we created Uber’s service adoption strategy exclusively through refinement, because the infrastructure engineering team didn’t have any authority to mandate wider changes. Instead, we relied on two different kinds of refinement to focus our iterative efforts. First, we used systems modeling to understand which parts of adoption we needed to focus on. Second, we used strategy testing to learn by migrating individual product engineering teams over to the new platform.

In the agile adoption example, the failure to refine turned a moderately challenging problem into a strategy failure. In the service migration example, a focus on refinement turned an extremely difficult problem into a success. Refinement is, in my experience, the kernel of effective strategy.

If it matters, why is it skipped?

When a small team creates a strategy, a so-called low-altitude strategy, they almost always spend a great deal of time refining it. This isn’t because most teams believe in refinement. Rather, it’s because most teams lack the authority to force others to align with their strategy. This lack of authority means they must incrementally prove out their approach until other teams or executives believe it’s worth aligning with.

High-altitude strategy is typically the domain of executives, who generally have the ability to mandate adoption, and they routinely skip the refinement stage, even when it’s inexpensive and almost guaranteed to make them more successful. Why is that? When executives start a new role, they know that making an early impression matters. They also, unfortunately, know that sounding ambitious often resonates more loudly with leadership teams than effective work does. So, while they do hope to eventually be effective, early on they kick off a few aspirational initiatives, like a massive overhaul of the codebase, believing it’ll establish their reputation as an effective leader at the company.

This isn’t uniquely an executive failure: it also happens frequently in permissive strategy organizations that require an ambitious, high-leverage project to get promoted into senior engineering roles. For example, you might see a novel approach to networking or authorization implemented at a company, watch its adoption stall after the easier proof points, and trace its heritage back to the promotion criteria. In many cases, the promotion comes before the rollout stalls out, disincentivizing the would-be promoted engineer from worrying too deeply about whether the work was net-positive for the organization. The executive responsible for the promotion rubric will eventually recognize the flaw, but it’s not an easy tradeoff to pick between an organization that over-innovates while empowering individuals and an organization with little waste but restricted room for creativity.

Another reason refinement can get skipped is that sometimes you’re forced to urgently create and commit to a strategy, usually because your boss tells you to. This doesn’t actually prevent refinement–just say you’re committed and refine anyway–but often this interaction turns off the strategist’s mind, tricking them into thinking they can’t change their approach because they’ve already committed to it. This is never true: all decisions are up for review given proper evidence. But it takes a certain courage to refine when those around you are asking for weekly updates on completing the project.

There’s one other important reason that strategy refinement gets skipped: many people haven’t built out a toolkit to perform strategy refinement, and haven’t worked with someone who has a toolkit.

Building your toolkit

I’m eternally grateful to my father, a professor of economics, who brought me to a systems modeling workshop in Boston one summer when I was in high school. This opened my eyes to the wide world of techniques for reasoning about problems, and systems modeling became the first tool in my toolkit for strategy refinement.

The section on refinement will go into three refinement techniques in significant detail: strategy testing, systems modeling, and Wardley mapping, and will also survey a handful of other techniques more common among strategy consultants. Systems modeling I adopted early, whereas Wardley mapping I only learned while working on this book. Few individuals are proficient users of many refinement tools, but it’s extraordinarily powerful to unlock your first tool, and worthwhile to slowly expand your experience with other tools over time. All tools are flawed, and each is best at illuminating certain types of problems.

If all of these are unfamiliar, skim each of them and pick the one that seems most applicable to a problem you’re currently working on. You’ll build expertise by trying a tool against many different problems, and by talking through the results with engaged peers.

As you practice, remember that the important thing to share is the learning from these techniques, and try to avoid getting too caught up in sharing the techniques themselves. I’ve seen these techniques meaningfully change strategies, but I’ve never seen those changes successfully justified through the inherent insight of the refinement techniques themselves.

Strategy testing

Sometimes you’ll need a strategy to solve an ambiguous problem, or a problem where the issues blocking progress are poorly understood. At Carta, one strategy problem we worked on was improving code quality, which is a good example of both. It’s difficult to agree on what code quality is, and it’s equally difficult to agree on appropriate, concrete steps to improve it.

To navigate that ambiguity, we spent relatively little time thinking about the right initial solution, and a great deal of our time deploying the strategy testing technique:

  1. Identify the narrowest, deepest available slice of your strategy. Iterate on applying that slice until you see some evidence it’s working.
  2. As you iterate, identify metrics that help you verify the approach is working.
  3. Operate from the belief that people are well-meaning, and strategy failures are due to excess friction and poor ergonomics.
  4. Keep refining until you have conviction that your strategy’s details work in practice, or that the strategy needs to be approached from a new direction.

In this case, we achieved some small wins, funded a handful of specific bets that we believed would improve the problem long-term, and ended the initiative early without making a large organizational commitment. You could argue that’s a failure, but my experience is quite different: having a problem doesn’t mean you have an elegant solution, and strategy testing helps you validate whether a solution’s efficiency and ergonomics are viable.

If you’re dealing with a deeply ambiguous problem and there’s no agreement on the nature of the reality you’re operating in, strategy testing is a great technique to start with.

Systems modeling

When you’re unsure where leverage points might be in a complex system, systems modeling is an effective technique to cheaply determine which levers might be effective. For example, the systems model for onboarding drivers in a ride-share app shows that reengaging drivers who’ve left the platform matters more than bringing on new drivers in a mature market.

Similarly, in the Uber service migration example, systems modeling helped us focus on eliminating upfront steps during service onboarding, shifting to reasonable defaults and away from forcing teams to learn the new service platform before it had done anything useful for them.

[Figure: Diagram of a quality systems model]

While you can certainly reach these insights without modeling, modeling tends to make them immediately visible. In cases where your model doesn’t immediately illuminate what matters most, studying how the model’s projections conflict with real-world data will guide you to the places where your assumptions are distorting your understanding of the problem.

If you generally understand a problem, but need to determine where to focus your efforts to make the largest impact, then systems modeling is a valuable technique to deploy.

Wardley mapping

Many engineering strategies implicitly make the assumption that the ecosystem we’re operating within is static. However, that’s certainly false. Many experienced engineers and engineering leaders have great judgment, and great intuition, but nonetheless deploy flawed strategy because they’ve anchored on their memory of how things work rather than noticing how things have changed over time.

If you want to incorporate these changes into your strategy, rather than being hit over the head by them, Wardley mapping is a great tool to add to your kit.

Wardley maps allow you to plot users and their needs, and then study how the solutions to those needs will shift over time. For example, today there is a proliferation of narrow platforms built on recent advances in large language models, but studying a Wardley map of the LLM ecosystem suggests this ecosystem will likely consolidate into fewer, broader platforms rather than remaining so widely scattered across distinct vendors.

[Figure: Wardley map of the Large Language Model ecosystem]

If your strategy involves adopting a highly dynamic technology such as observability in the 2010s, or if your strategy is intended to operate across five-plus years, then Wardley mapping will help surface how industry evolution will impact your approach.

Anti-patterns in refinement

We’ve already discussed why refinement is often skipped, which is the most frequent and most damning refinement anti-pattern. At Calm, we cargo-culted the decomposition of our monolithic codebase into microservices; we had no reason to believe this was improving developer productivity, but we continued to pursue the strategy for a year before recognizing that we were suffering from skipping refinement.

The second most common anti-pattern is creating the impression of strategy refinement through manufactured consent. A new senior leader joined Uber and mandated a complete technical rearchitecture, justified in part by the evidence that a number of internal leaders had adopted the same techniques successfully on their teams. When I spoke with those internal leaders, they themselves were skeptical that the proposal made sense, even as their surface-level agreement was being used to convince the wider organization that they believed in the new approach.

Finally, refinement often occurs, but counter-evidence is discarded because the refining team is optimizing for a side-goal of some sort. My first team at Yahoo adopted Erlang for a key component of Yahoo! Build Your Own Search Service, which proved to be an excellent solution to our problem of wanting to use Erlang, but a questionable solution to the core problem at hand. Only three of the engineers on our fifteen-person team were willing to touch the Erlang codebase, but that counter-evidence was ignored because it conflicted with the side-goal.

Summary

This chapter has introduced the concept of strategy refinement, surveyed three common refinement techniques–strategy testing, systems modeling, and Wardley mapping–and provided a framework for building your personal toolkit for refinement. When you’re ready to get into more detail, further in the book there’s a section dedicated to the details of applying these techniques, starting with strategy testing.

Boost website speed with prefetching and the Speculation Rules API

by Schepp

Everybody loves fast websites, and everyone despises slow ones even more. Site speed significantly contributes to the overall user experience (UX), determining whether it feels positive or negative. To ensure the fastest possible page load times, it’s crucial to design with performance in mind. However, performance optimization is an art form in itself. While implementing straightforward techniques like file compression or proper cache headers is relatively easy, achieving deeper optimizations can quickly become complex.

But what if, instead of solely trying to accelerate the loading process, we triggered it earlier—without the user noticing?

One way to achieve this is by prefetching pages the user might navigate to next using <link rel="prefetch"> tags. These tags are typically embedded in your HTML, but they can also be generated dynamically via JavaScript, based on a heuristic of your choice. Alternatively, you can send them as an HTTP Link header if you lack access to the HTML code but can modify the server configuration. Browsers will take note of the prefetch directives and fetch the referenced pages as needed.
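
For illustration, here’s a minimal sketch of the first two variants; the URL /next.html is only a placeholder:

<!-- Static prefetch hint in the document head -->
<link rel="prefetch" href="/next.html">

<!-- Or injected dynamically, once some heuristic of yours decides /next.html is a likely next step -->
<script>
  const hint = document.createElement("link");
  hint.rel = "prefetch";
  hint.href = "/next.html";
  document.head.append(hint);
</script>

The server-driven equivalent would be a response header along the lines of Link: </next.html>; rel=prefetch.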

In addition to <link rel="prefetch">, Chromium-based browsers support <link rel="prerender">. This tag is essentially a supercharged version of <link rel="prefetch">. Known as "NoState Prefetch," it not only prefetches an HTML page but also scans it for subresources—stylesheets, JavaScript files, images, and fonts referenced via a <link rel="preload" as="font" crossorigin> — loading them as well.
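
For comparison, a NoState Prefetch hint looks almost identical; only the rel value changes (again, /next.html is just a placeholder):

<!-- Chromium only: prefetches next.html and its referenced subresources -->
<link rel="prerender" href="/next.html">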

The Speculation Rules API

A relatively new addition to Chromium browsers is the Speculation Rules API, which offers enhanced prefetching and enables actual prerendering of webpages. It introduces a JSON-based syntax for precisely defining the conditions under which preprocessing should occur.

Here’s a simple example of how to use it:

<script type="speculationrules">
{
  "prerender": [{
    "urls": ["next.html", "next2.html"]
  }]
}
</script>

The above list-rule specifies that the browser should prerender the URLs next.html and next2.html so they are ready for instant navigation. The keyword prerender means more than fetching the HTML and subresources—it instructs the browser to fully render the pages in hidden tabs, ready to replace the current page instantly when needed. This makes navigation to these pages feel seamless.

Prerendered pages also typically score excellent Core Web Vital metrics. Layout shifts and image loading occur during the hidden prerendering phase, and JavaScript execution happens upfront, ensuring a smooth experience when the user first sees the page.

Instead of listing specific URLs, the API also allows for pattern matching using where and href_matches keys:

<script type="speculationrules">
{
  "prerender": [{
    "where": { "href_matches": "/*" }
  }]
}
</script>

For more precise targeting, CSS selectors can be used with the selector_matches key:

<script type="speculationrules">
{
  "prerender": [{
    "where": { "selector_matches": ".navigation__link" }
  }]
}
</script>

These rules, called document-rules, act on link elements as soon as the user triggers a pointerdown or touchstart event, giving the referenced pages a few milliseconds' head start before the actual navigation.

If you want the preprocessing to begin even earlier, you can adjust the eagerness setting:

<script type="speculationrules">
{
  "prerender": [{
    "where": { "href_matches": "/*" },
    "eagerness": "moderate"
  }]
}
</script>

Eagerness values:

  • immediate: Executes immediately.
  • eager: Currently behaves like immediate but may be refined to sit between immediate and moderate.
  • moderate: Executes after a 200ms hover or on pointerdown for mobile devices.
  • conservative (default): Speculates based on pointer or touch interaction.

For even greater flexibility, you can combine prerender and prefetch rules with different eagerness settings:

<script type="speculationrules">
{
  "prerender": [{
    "where": { "href_matches": "/*" },
    "eagerness": "conservative"
  }],
  "prefetch": [{
    "where": { "href_matches": "/*" },
    "eagerness": "moderate"
  }]
}
</script>

Limitations and Challenges

While the Speculation Rules API is powerful, it comes with some limitations:

  1. Browser support: Only Chromium-based browsers support it. Other browsers lack this capability, so treat it as a progressive enhancement (see the feature-detection sketch after this list).
  2. Bandwidth concerns: Over-aggressive settings could waste user bandwidth. Chromium imposes limits to mitigate this: a maximum of 10 prerendered and 50 prefetched pages with immediate or eager eagerness.
  3. Server strain: Poorly optimized servers (e.g., no caching, heavy database dependencies) may experience significant load increases due to excessive speculative requests.
  4. Compatibility: Prefetching won’t work if a Service Worker is active, though prerendering remains unaffected. Cross-origin prerendering requires explicit opt-in by the target page.
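
To treat the API as a progressive enhancement, you can feature-detect it before injecting any rules. Here’s a minimal sketch; the rule set itself is just an example configuration, not a recommendation:

<script>
  // Only speculate where the Speculation Rules API actually exists
  if (HTMLScriptElement.supports?.("speculationrules")) {
    const rules = document.createElement("script");
    rules.type = "speculationrules";
    rules.textContent = JSON.stringify({
      prefetch: [{ where: { href_matches: "/*" }, eagerness: "moderate" }]
    });
    document.head.append(rules);
  }
</script>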

Despite these caveats, the Speculation Rules API offers a powerful toolset to significantly enhance perceived performance and improve UX. So go ahead and try them out!

I would like to express a big thank you to the Webperf community for always being ready to help with great tips and expertise. For this article, I would like to thank Barry Pollard, Andy Davies, and Noam Rosenthal in particular for providing very valuable background information. ❤️

Misleading Icons: Icon-Only-Buttons and Their Impact on Screen Readers

by Alexander Muzenhardt

Introduction

Imagine you’re tasked with building a cool new feature for a product. You dive into the work with full energy, and just before the deadline, you manage to finish it. Everyone loves your work, and the feature is set to go live the next day.
A few days later, you receive an email from a user who can’t access the new feature. The user points out that they don’t understand what the button does. What do they mean? You review your code, locate the button, and start digging into the problem.

<button>
  <i class="icon">📆</i>
</button>

The Problem

You find some good resources explaining that there are people with disabilities who need to be considered in these cases. This is known as accessibility. For example, some individuals have motor impairments and cannot use a mouse. In this particular case, the user is visually impaired and relies on assistive technology like a screen reader, which reads aloud the content of the website or software. The button you implemented doesn’t have any descriptive text, so only the icon is read aloud. In your case, the screen reader says, “Tear-Off Calendar button”. While it describes the appearance of the icon, it doesn’t convey the purpose of the button. This information is meaningless to the user. A button should always describe what action it will trigger when activated. That’s why we need additional descriptive text.

The Challenge

Okay, you understand the problem now and agree that it should be fixed. However, you don’t want to add visible text to the button. For design and aesthetic reasons, sighted users should only see the icon. Is there a way to keep the button “icon-only” while still providing a meaningful, descriptive text for users who rely on assistive technologies like screen readers?

The Solution

First, you need to give the button a descriptive name so that a screen reader can announce it.

<button>
  <span>Open Calendar</span>
  <i class="icon">📆</i>
</button>

The problem now is that the button’s name becomes visible, which goes against your design guidelines. To prevent this, additional CSS is required.

.sr-only {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  white-space: nowrap;
  border-width: 0;
}

<button>
  <span class="sr-only">Open Calendar</span>
  <i class="icon">📆</i>
</button>

The CSS ensures that the text inside the span element is hidden from sighted users but remains readable for screen readers. This approach is so common that well-known CSS libraries like TailwindCSS, Bootstrap, and Material-UI include such a class by default.

Although the button’s text is no longer visible, the entire content of the button will still be read aloud, including the icon, which is something you want to avoid.

In HTML you are allowed to use specific attributes for accessibility, and in this case, the attribute aria-hidden is what you need. ARIA stands for “Accessible Rich Internet Applications” and is an initiative to make websites and software more accessible to people with disabilities.

The attribute aria-hidden hides elements from screen readers so that their content isn’t read. All you need to do is add the attribute aria-hidden with the value “true” to the icon element, which in this case is the i element.

<button>
  <span class="sr-only">Open Calendar</span>
  <i class="icon" aria-hidden="true">📆</i>
</button>

Alternative

An alternative is the aria-label attribute, which lets you assign descriptive, accessible text to a button without it being visible to sighted users. The purpose of aria-label is to provide a description for interactive elements that lack a visible label or descriptive text. All you need to do is add the aria-label attribute to the button. The aria-hidden attribute and the span element can then be deleted.

<button aria-label="Open Calendar">
  <i class="icon">📆</i>
</button>

With this adjustment, the screen reader will now announce “Open calendar,” completely ignoring the icon. This clearly communicates to the user what the button will do when clicked.

Which Option Should You Use?

At first glance, the aria-label approach might seem like the smarter choice. It requires less code, reducing the likelihood of errors, and looks cleaner overall, potentially improving code readability.

However, the first option is actually the better choice. There are several reasons for this that may not be immediately obvious:

  • Some browsers do not translate aria-label
  • It is difficult to copy aria-label content or otherwise manipulate it as text
  • aria-label content will not show up if styles fail to load

These are just a few of the many reasons why you should be cautious when using the aria-label attribute. These points, along with others, are discussed in detail in the excellent article "aria-label is a Code Smell" by Eric Bailey.

The First Rule of ARIA Use

The “First Rule of ARIA Use” states:

If you can use a native HTML element or attribute with the semantics and behavior you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.

Even though the first approach also uses an ARIA attribute, it is more acceptable because aria-hidden only hides an element from screen readers. In contrast, aria-label overrides the standard HTML behavior for determining an element’s accessible name. Following this principle, aria-hidden is preferable to aria-label in this case.

Browser compatibility

Both aria-label and aria-hidden are supported by all modern browsers and can be used without concern.

Conclusion

Ensuring accessibility in web design is more than just a nice-to-have—it’s a necessity. By implementing simple solutions like combining CSS with aria-hidden, you can create a user experience that is both aesthetically pleasing and accessible for everyone, including those who rely on screen readers. While there may be different approaches to solving accessibility challenges, the key is to be mindful of all users' needs. A few small adjustments can make a world of difference, ensuring that your features are truly usable by everyone.

Cheers
Alex

Resources / Links