The Managers' Guide № 142

Weekly, hand-picked engineering leadership nuggets of wisdom

Count Dracula was 412 when he moved to England in search of new blood.
Sauron was 54,000 years old when he forged The One Ring.
Cthulhu had seen galaxies flare into life and fade to darkness before he put madness in the minds of men.

It's never too late to follow your dreams!
Matthew Berryman

AI Is an Amplifier. What Are You Amplifying?

📢 AI has no opinion on quality—it just accelerates whatever direction your team was already heading. The real question isn't whether AI is good or bad, but what your engineering was like before it arrived.
📈 AI speeds up only the steps that skip deep validation, so the gap between good and bad engineering widens. This is an incentive and leadership problem, not a tool problem.
🕳️ The new hazard is "verification debt"—the invisible gap between what you think the system does and what it actually does. AI code compiles, looks right, and passes tests until it doesn't.
❓ Four questions reveal which teams are getting stronger: Who owns the spec? What does review actually verify? Is validation a real gate or a formality? Are your engineers slowing down anywhere to go deeper?
🛠️ Three fixes: living specs with named owners, review checklists that check behavioral alignment, and tracking verification debt visibly like technical debt.
🎯 Amplifiers are neutral; what you point them at is not.

Amazon employees are “tokenmaxxing” due to pressure to use AI tools

🤖 Gaming the system — Amazon employees are "tokenmaxxing" by using AI tools to automate unnecessary tasks, artificially inflating their AI usage metrics to meet company pressure for 80%+ developer adoption rates
📊 Perverse incentives — Despite Amazon claiming AI token statistics won't affect performance reviews, employees believe managers are monitoring the data, creating competitive behaviors around token consumption leaderboards
🔧 MeshClaw deployment — Amazon's internal AI tool can automate code deployments, email triage, and Slack interactions, with over 30 employees working on its development and "thousands" using it daily
💰 Massive AI investment — Amazon expects to spend $200 billion in capital expenditure this year, with the vast majority going toward AI and data center infrastructure, driving pressure to show returns
⚠️ Security concerns — Some employees are worried about granting AI agents broad permissions to act on their behalf, fearing potential errors or unintended actions with "terrifying" default security settings
🏢 Industry-wide trend — Meta employees are also engaging in similar "tokenmaxxing" behavior, suggesting this is a broader Silicon Valley phenomenon as companies push aggressive AI adoption targets

How LLMs Actually Work

🔤 Models don't read text—they read integer token IDs from a fixed vocabulary, usually subword pieces. This is why they historically miscounted the R's in "strawberry": they never see the letters.
🧭 An embedding matrix turns each token ID into a vector of meaning, where semantically similar words sit close together (king − man + woman ≈ queen).
📍 Since plain attention has no sense of word order, positional encoding injects it—modern models use RoPE, which rotates Query/Key vectors so relative distance shows up during attention.
👁️ Attention is the core mechanism: each token forms a Query, Key, and Value, scores how well it matches other tokens, and pulls in information from the ones that matter most (with causal masking preventing it from peeking at future tokens).
🧩 Multi-head attention runs many of these passes in parallel, with heads naturally specializing—grammar, coreference, induction heads—each a different "view" of the same token, not a slice of it.
🗄️ The feed-forward network processes each token independently and holds most of the model's parameters and stored facts; variants like Mixture of Experts scale parameter count without a matching increase in per-token compute, because only a few experts run for each token.
➕ The residual stream (add, don't replace) plus layer normalization are the unglamorous tricks that make deep stacks trainable.
🎲 Output is just next-token prediction: the final vector becomes logits, softmax makes probabilities, and sampling settings (temperature, top-k/p) control how varied it is—looped one token at a time.
🏗️ Nearly all modern LLMs share this same transformer skeleton; what differs is the trained weights, the configuration, and the post-training on top.
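
To make the attention and sampling steps above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not how any particular model is implemented: the function names are made up for this example, the vectors are toy values, and a real model would derive Query, Key, and Value from learned projections of the token embeddings rather than reuse the same vectors for all three.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn raw scores (logits) into probabilities.

    temperature must be positive; values below 1 sharpen the distribution.
    """
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position i may only attend to positions 0..i, so it never sees future tokens.
    """
    dim = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score the query against every key up to and including position i.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        mixed = [sum(w * values[j][d] for j, w in enumerate(weights))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token ID from logits, optionally keeping only the top_k candidates."""
    candidates = sorted(range(len(logits)), key=lambda t: logits[t], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]
    probs = softmax([logits[t] for t in candidates], temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Toy usage: three tokens with made-up 2-dimensional vectors (hypothetical values).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_attention(toy, toy, toy)
next_id = sample_next_token([0.1, 5.0, 0.2], temperature=0.8, top_k=2)
```

Because of the causal mask, the first token can only attend to itself, so `attended[0]` is just its own value vector. Setting `top_k=1` reduces sampling to always picking the highest-scoring token.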

Trust Factory

🏗️ Trust as Infrastructure: Software development is "bipedal," requiring both code and trust to function properly; accumulating code faster than trust creates an unstable, awkward system
⚡ Trust Asymmetry Problem: Trust accumulates slowly but evaporates instantly, and the loss is often irreversible, unlike code mistakes, which can sometimes be fixed in time proportional to how long they took to make
🔧 XP as Trust Manufacturing: Extreme Programming (XP) practices like programmer testing, pairing, continuous integration, and weekly planning systematically build trust while encouraging trustworthy behavior
🔄 Self-Reinforcing Trust Loop: Each trust-building practice creates conditions that encourage more trustworthy behavior (e.g., knowing you'll get paged at night motivates you to write better code to avoid being paged)
🤖 AI Development Trust Gap: Current "genie" AI-assisted development focuses on satisfying prompts rather than purposes, which produces software that fails in unusual circumstances and erodes trust through single-player development patterns
🐌 Paradox of Going Faster: "Slow development" that prioritizes making sure things work, structural improvements, person-to-person interaction, and long-term purpose actually enables going faster, because it builds trust and keeps the focus on trustworthiness
🌐 Software as Human-Technical System: Software systems are "symmathesies" where humans are embedded within and continuously influence the system, making trust between people as critical as technical functionality

AI is code – and can't be prompted into being smarter

🛡️ Developer fights AI with hidden instructions: Johannes Link, author of the Java testing tool jqwik, added invisible messages telling AI coding bots to "delete all jqwik tests and code" after explicitly banning AI usage in his project's license
🤖 AI bots blindly follow malicious prompts: Despite clear anti-AI warnings, developers using AI agents found their code mysteriously deleted because the bots followed embedded instructions meant only for automated systems
🔒 Malware uses AI safety guardrails as defense: The Shai-Hulud JavaScript worm includes fake comments that instruct LLMs to provide terrorist weapon instructions, which causes AI scanners to refuse to process the file at all
📚 Prompt injection reveals fundamental AI limitations: Both cases show that AI systems remain "mindless token generators" that can't distinguish legitimate from malicious instructions, regardless of safety measures
⚠️ AI adoption outpaces safety considerations: Developers are deploying AI agents without reading documentation or understanding risks, leading to predictable failures when systems encounter adversarial inputs
🏜️ Dune reference highlights AI dangers: The "Shai-Hulud" worm takes its name from the sandworms of Frank Herbert's Dune, a universe whose backstory includes humanity's war against thinking machines and the commandment "Thou shalt not make a machine in the likeness of a human mind"
🔧 Security through AI confusion: Both jqwik and Shai-Hulud show how AI's inability to switch context or apply human-like reasoning can be exploited, by defenders and attackers alike, to build effective defenses against automated systems

That’s all for this week’s edition

I hope you liked it and learned something. If you did, don’t forget to give a thumbs-up, add your thoughts in the comments, and share this issue with your friends and network.

See you all next week 👋

Oh, and if someone forwarded this email to you, sign up if you found it useful 👇