The Manager's Guide – #91

Weekly Hand-Picked Collection Edition

There is one thing in life that you can always rely on: life being as it is.
- Charlotte Joko Beck

How DevOps and SRE Principles Foster Psychological Safety

  • 🛡️ Psychological safety — the ability to take risks without fear of blame — is crucial for DevOps teams, allowing people to ask questions, admit mistakes, and share ideas openly

  • 🔍 Google's Project Aristotle research revealed that psychological safety, not technical skill, was the #1 predictor of team success

  • 🎯 Four key DevOps/SRE principles that build psychological safety:

    • Blameless postmortems that focus on learning from failures

    • Transparent information sharing to level the playing field between remote and office workers

    • Automation to reduce fear of making mistakes

    • Error budgets that give teams freedom to innovate without fear

  • 💻 Remote/hybrid teams face unique challenges — like information silos and communication gaps — making psychological safety even more critical

  • ⚡ Real impact example — a team reduced miscommunication and downtime by 40% by prioritizing transparency in incident response

  • 🌱 Practical steps for building safety include: creating equal airtime for remote voices, encouraging questions, celebrating lessons from mistakes, and thorough documentation
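
To make the error-budget idea above concrete, here is a minimal sketch of the arithmetic, assuming a simple availability SLO measured in minutes over a rolling window. The function names, the 30-day window, and the 99.9% target are illustrative assumptions, not something the linked article prescribes.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # Hypothetical team with a 99.9% SLO and 10 minutes of downtime so far.
    print(f"Budget: {error_budget_minutes(0.999):.1f} min")
    print(f"Remaining: {budget_remaining(0.999, 10.0):.0%}")
```

While the remaining fraction is positive, the team can keep shipping and experimenting. Once it goes negative, the usual convention is to shift effort toward reliability work.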

Leading vs lagging indicators (how product teams move fast)

  • 📊 Product success has two types of metrics — outputs (product updates) and outcomes (business results), but outcomes can take too long to measure effectively

  • ⚡ Leading indicators are early signals that predict feature success, while lagging indicators (like business outcomes) take weeks or months to show impact

  • 🎯 Key finding — more than 50% of features fail to impact customers in their first iteration, making quick feedback crucial

  • 📈 Six essential leading indicators to track:

    • Awareness: Do users know the feature exists?

    • Adoption: Are they trying it?

    • Engagement: Are they using it as intended?

    • Satisfaction: Are they happy with it?

    • Direct feedback: What improvements do users want?

    • Usage patterns: How are users actually interacting with it?

  • ⚙️ Fast iteration benefits — teams maintain momentum, keep technical context fresh, and can fix issues within days instead of weeks

  • 💡 Example insight — waiting 30 days for lagging metrics versus acting on day 3 with leading indicators saves 27 days of potential user frustration and development time
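
As a rough illustration of the first three indicators above, the sketch below turns raw event counts into awareness, adoption, and engagement rates. The field names and the way each rate is defined (for example, treating "repeat users" as a proxy for engagement) are assumptions made for this example. Real definitions depend on the product and its analytics setup.

```python
from dataclasses import dataclass


@dataclass
class FeatureCounts:
    """Hypothetical counts for one feature over an observation period."""
    active_users: int    # users active in the product during the period
    saw_feature: int     # users who were shown the feature
    tried_feature: int   # users who used it at least once
    repeat_users: int    # users who came back and used it again


def leading_indicators(c: FeatureCounts) -> dict[str, float]:
    """Compute simple funnel ratios; each stage is relative to the previous one."""
    def ratio(numerator: int, denominator: int) -> float:
        # Avoid division by zero when a stage has no users yet.
        return numerator / denominator if denominator else 0.0

    return {
        "awareness": ratio(c.saw_feature, c.active_users),
        "adoption": ratio(c.tried_feature, c.saw_feature),
        "engagement": ratio(c.repeat_users, c.tried_feature),
    }


if __name__ == "__main__":
    day3 = FeatureCounts(active_users=1000, saw_feature=400,
                         tried_feature=100, repeat_users=25)
    for name, value in leading_indicators(day3).items():
        print(f"{name}: {value:.0%}")
```

A low awareness rate with healthy adoption suggests a discoverability problem rather than a problem with the feature itself, which is the kind of signal a team can act on within days.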

How to Make Technical Debt Your Friend

  • 💡 Key insight — Technical Debt (TD) isn't always bad; it can be a tool for learning and avoiding over-investment in unproven solutions

  • 🔄 MVA approach — Using Minimum Viable Architecture helps teams decide what TD actually needs fixing versus what can remain as is

  • 🎯 Important observation: most perceived “technical debt” never actually needs to be repaid because:

    • The initial solution might be good enough

    • The product direction might change

    • The anticipated scale/problems might never materialize

  • ⚖️ Risk management perspective — TD is more like a “contingent liability” than real debt, helping teams:

    • Launch experiments faster

    • Learn at minimal cost

    • Avoid over-engineering solutions nobody wants

  • 🏗️ Real-world parallel: like Fallingwater's famous architecture, which needed later reinforcement, some technical compromises enable bold innovations that wouldn't happen if perfection were required upfront

  • 📈 Strategic benefit — Technical debt can actually be positive when it helps teams:

    • Get faster feedback from users

    • Validate assumptions before major investment

    • Keep development momentum going
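
One way to read the "contingent liability" framing above is as an expected-cost comparison: the cost of fixing something later only counts in proportion to how likely the fix is to be needed at all. The sketch below is a simplified illustration under that assumption. The function names and all numbers are hypothetical, and real decisions involve factors (risk, morale, compounding complexity) that a single multiplication does not capture.

```python
def expected_deferred_cost(prob_needed: float, cost_if_fixed_later: float) -> float:
    """Expected cost of deferring a fix: probability it is ever needed times its later cost."""
    if not 0.0 <= prob_needed <= 1.0:
        raise ValueError("prob_needed must be between 0 and 1")
    return prob_needed * cost_if_fixed_later


def should_defer(cost_now: float, prob_needed: float,
                 cost_if_fixed_later: float) -> bool:
    """Defer when the expected later cost is lower than paying up front."""
    return expected_deferred_cost(prob_needed, cost_if_fixed_later) < cost_now


if __name__ == "__main__":
    # Hypothetical: 10 days to build it "properly" now, 20 days to fix later,
    # but only a 30% chance the fix is ever needed.
    print(should_defer(cost_now=10, prob_needed=0.3, cost_if_fixed_later=20))
```

Even when fixing later costs twice as much, deferring can still come out ahead if the feature is unlikely to survive or scale, which matches the reasons listed above for why much perceived debt is never repaid.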

Why techies leave Big Tech

  • 💰 Compensation challenges:

    • “Golden handcuffs” from high equity that vests over 4 years

    • Steep drops when initial stock grants expire

    • Stock price fluctuations dramatically affecting total compensation

  • 🔄 Stability shift:

    • Big Tech no longer seen as stable after recent mass layoffs

    • Only Apple and NVIDIA have avoided major layoffs in recent decades

    • Some companies, such as Meta, cut 25% of staff within 6 months

  • 📈 Professional growth limits:

    • Learning plateaus after mastering company-specific tech

    • Harder to reach executive positions compared to startups

    • Too much process and bureaucracy as companies mature

  • 🎯 Career path dynamics:

    • Easier path to C-level positions at scale-ups

    • Big Tech experience makes candidates attractive to smaller companies

    • Internal transfers often blocked by politics or process

  • 🌟 Market insights for 2024:

    • Market bifurcation between junior and senior roles

    • Non-public companies hiring more actively than public ones

    • Growing “tiered” system where top-tier experience opens exclusive opportunities

Manage your priorities and energy

  • ⚖️ Old framework — "Company, Team, Self" priority order led to burnout despite being conceptually correct because:

    • Most valuable work isn't always most interesting

    • Strong performers solving urgent problems often went unrecognized

    • Rigid adherence drained energy even when decisions were "right"

  • 🔋 Energy management insight — Work that energizes you is positive-sum because:

    • Energized people accomplish more overall

    • Different people get energy from different activities

    • Some "non-optimal" work can boost productivity if it's energizing

  • 🤝 New "eventual quid pro quo" approach:

    • Generally prioritize company/team needs

    • Add energizing work when becoming depleted

    • Change roles if balance can't be maintained long-term

    • Avoid demanding immediate returns for taking on work

  • ⚡ Key principle for energizing work:

    • OK to do work orthogonal to company needs (like occasional speaking)

    • Never do work opposed to company needs (like using risky tech just to learn it)

    • Keep non-core activities moderate in scope

  • 🎯 Leadership insight — Being too focused on "correct" decisions can actually reduce impact and effectiveness in senior roles

The Swiss Cheese Model

  • 🧀 Core concept — System failures happen when vulnerabilities in multiple protective layers align, like holes in Swiss cheese slices stacking up

  • 🛡️ Key defensive layers include:

    • Technical: Authentication, validation, monitoring, backups

    • Human: Code reviews, deployment procedures, communication patterns

    • Each layer will have imperfections ("holes"), but they can compensate for each other

  • 🎯 Critical insight — Perfect systems aren't the goal because:

    • Eliminating all holes is impossible

    • Focus should be on preventing holes from aligning

    • Well-arranged imperfect layers can create robust systems

  • 📊 Practical application in post-mortems:

    • Map how failures breached multiple layers

    • Look for patterns in how holes align

    • Examine interactions between layers

    • Focus on system resilience, not perfect prevention

  • 💡 Key takeaway — Success comes from:

    • Acknowledging that imperfections will exist

    • Building complementary defensive layers

    • Creating systems that fail gracefully

    • Regular monitoring of layer interactions
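
The intuition behind stacking imperfect layers can be sketched with a little probability. The example below assumes each layer fails independently, which is a strong simplification: the post-mortem advice above exists precisely because real holes are often correlated. The function name and the per-layer probabilities are hypothetical.

```python
from math import prod


def breach_probability(hole_probs: list[float]) -> float:
    """Chance a failure passes every layer, assuming layers fail independently.

    Each entry is the (hypothetical) probability that one layer misses a problem.
    """
    for p in hole_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    return prod(hole_probs)


if __name__ == "__main__":
    # Hypothetical layers: validation, code review, monitoring.
    layers = [0.1, 0.1, 0.1]
    print(f"Three layers: {breach_probability(layers):.4f}")
    # Adding a fourth, fairly leaky layer still lowers the overall risk.
    print(f"Four layers:  {breach_probability(layers + [0.5]):.4f}")
```

Under the independence assumption, even a layer that misses half of all problems halves the overall breach probability. When holes share a common cause, such as the same person writing and reviewing a change, the real probability is higher than this product suggests.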


That’s all for this week’s edition

I hope you liked it and learned something. If you did, don’t forget to give a thumbs-up and share this issue with your friends and network.

See y’all next week 👋