The Manager's Guide – #91

Weekly Hand-Picked Collection Edition

25 Feb 2025 — 4 min read

There is one thing in life that you can always rely on: life being as it is.
- Charlotte Joko Beck

How DevOps and SRE Principles Foster Psychological Safety

🛡️ Psychological safety — the ability to take risks without fear of blame — is crucial for DevOps teams, allowing people to ask questions, admit mistakes, and share ideas openly
🔍 Google's Project Aristotle research revealed that psychological safety, not technical skill, was the #1 predictor of team success
🎯 Four key DevOps/SRE principles that build psychological safety:
- Blameless postmortems that focus on learning from failures
- Transparent information sharing to level the playing field between remote and office workers
- Automation to reduce fear of making mistakes
- Error budgets that give teams freedom to innovate without fear
💻 Remote/hybrid teams face unique challenges — like information silos and communication gaps — making psychological safety even more critical
⚡ Real impact example — a team reduced miscommunication and downtime by 40% by prioritizing transparency in incident response
🌱 Practical steps for building safety include: creating equal airtime for remote voices, encouraging questions, celebrating lessons from mistakes, and thorough documentation
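
The error-budget idea above comes down to simple arithmetic. Here is a minimal sketch, assuming a hypothetical 99.9% availability target over a 30-day window. The function names and numbers are illustrative and do not come from the article.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability target allows per window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Return the unspent share of the error budget (negative once overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # Assumed example: a 99.9% target leaves roughly 43 minutes per 30 days.
    print(round(error_budget_minutes(0.999), 1))
    # After 10.8 minutes of downtime, about three quarters of the budget is left.
    print(round(budget_remaining(0.999, 10.8), 2))
```

While the remaining share is positive, the team can keep shipping changes. Once it reaches zero, the usual convention is to shift effort toward reliability work.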

📊 Product success has two types of metrics — outputs (product updates) and outcomes (business results), but outcomes can take too long to measure effectively
⚡ Leading indicators are early signals that predict feature success, while lagging indicators (like business outcomes) take weeks or months to show impact
🎯 Key finding — more than 50% of features fail to impact customers in their first iteration, making quick feedback crucial
📈 Six essential leading indicators to track:
- Awareness: Do users know the feature exists?
- Adoption: Are they trying it?
- Engagement: Are they using it as intended?
- Satisfaction: Are they happy with it?
- Direct feedback: What improvements do users want?
- Usage patterns: How are users actually interacting with it?
⚙️ Fast iteration benefits — teams maintain momentum, keep technical context fresh, and can fix issues within days instead of weeks
💡 Example insight — waiting 30 days for lagging metrics versus acting on day 3 with leading indicators saves 27 days of potential user frustration and development time
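
The first three indicators above can be read as simple funnel ratios. Here is a rough sketch, assuming hypothetical event counts: the field names and the definition of each ratio are assumptions for illustration, not the article's method.

```python
from dataclasses import dataclass


@dataclass
class FeatureFunnel:
    """Hypothetical counts for one feature over a few days (names are assumptions)."""
    active_users: int   # users active in the product
    saw_feature: int    # users who were shown the feature
    tried_feature: int  # users who used it at least once
    repeat_users: int   # users who came back and used it again


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning 0.0 when there is no denominator yet."""
    return numerator / denominator if denominator else 0.0


def leading_indicators(funnel: FeatureFunnel) -> dict:
    """Compute awareness, adoption, and engagement as step-to-step ratios."""
    return {
        "awareness": _ratio(funnel.saw_feature, funnel.active_users),
        "adoption": _ratio(funnel.tried_feature, funnel.saw_feature),
        "engagement": _ratio(funnel.repeat_users, funnel.tried_feature),
    }


def days_saved(lagging_day: int, leading_day: int) -> int:
    """Days gained by acting on a leading signal instead of a lagging one."""
    return lagging_day - leading_day


if __name__ == "__main__":
    print(leading_indicators(FeatureFunnel(1000, 400, 100, 25)))
    print(days_saved(30, 3))  # the 30-day versus day-3 example from the text
```

Satisfaction, direct feedback, and usage patterns are harder to reduce to a single ratio, so this sketch leaves them out.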

💡 Key insight — Technical Debt (TD) isn't always bad; it can be a tool for learning and avoiding over-investment in unproven solutions
🔄 MVA approach — Using Minimum Viable Architecture helps teams decide what TD actually needs fixing versus what can remain as is
🎯 Important observation: most perceived “technical debt” never actually needs to be repaid. Reasons include:
- The initial solution might be good enough
- The product direction might change
- The anticipated scale/problems might never materialize
⚖️ Risk management perspective — TD is more like a “contingent liability” than real debt, helping teams:
- Launch experiments faster
- Learn at minimal cost
- Avoid over-engineering solutions nobody wants
🏗️ Real-world parallel: like Fallingwater's famous architecture, which needed later reinforcement, some technical compromises enable bold innovations that wouldn't happen if perfection were required upfront
📈 Strategic benefit — Technical debt can actually be positive when it helps teams:
- Get faster feedback from users
- Validate assumptions before major investment
- Keep development momentum going

💰 Compensation challenges:
- “Golden handcuffs” from high equity that vests over 4 years
- Steep drops when initial stock grants expire
- Stock price fluctuations dramatically affecting total compensation
🔄 Stability shift:
- Big Tech no longer seen as stable after recent mass layoffs
- Only Apple and NVIDIA have avoided major layoffs in recent decades
- Companies like Meta cut 25% of staff in 6 months
📈 Professional growth limits:
- Learning plateaus after mastering company-specific tech
- Harder to reach executive positions compared to startups
- Too much process and bureaucracy as companies mature
🎯 Career path dynamics:
- Easier path to C-level positions at scale-ups
- Big Tech experience makes candidates attractive to smaller companies
- Internal transfers often blocked by politics or process
🌟 Market insights for 2024:
- Market bifurcation between junior and senior roles
- Non-public companies hiring more actively than public ones
- Growing “tiered” system where top-tier experience opens exclusive opportunities

⚖️ Old framework — "Company, Team, Self" priority order led to burnout despite being conceptually correct because:
- Most valuable work isn't always most interesting
- Strong performers solving urgent problems often went unrecognized
- Rigid adherence drained energy even when decisions were "right"
🔋 Energy management insight — Work that energizes you is positive-sum because:
- Energized people accomplish more overall
- Different people get energy from different activities
- Some "non-optimal" work can boost productivity if it's energizing
🤝 New "eventual quid pro quo" approach:
- Generally prioritize company/team needs
- Add energizing work when becoming depleted
- Change roles if balance can't be maintained long-term
- Avoid demanding immediate returns for taking on work
⚡ Key principle for energizing work:
- OK to do work orthogonal to company needs (like occasional speaking)
- Never do work opposed to company needs (like using risky tech just to learn it)
- Keep non-core activities moderate in scope
🎯 Leadership insight — Being too focused on "correct" decisions can actually reduce impact and effectiveness in senior roles

🧀 Core concept — System failures happen when vulnerabilities in multiple protective layers align, like holes in Swiss cheese slices stacking up
🛡️ Key defensive layers include:
- Technical: Authentication, validation, monitoring, backups
- Human: Code reviews, deployment procedures, communication patterns
- Each layer will have imperfections ("holes"), but they can compensate for each other
🎯 Critical insight — Perfect systems aren't the goal because:
- Eliminating all holes is impossible
- Focus should be on preventing holes from aligning
- Well-arranged imperfect layers can create robust systems
📊 Practical application in post-mortems:
- Map how failures breached multiple layers
- Look for patterns in how holes align
- Examine interactions between layers
- Focus on system resilience, not perfect prevention
💡 Key takeaway — Success comes from:
- Acknowledging imperfections will exist
- Building complementary defensive layers
- Creating systems that fail gracefully
- Regular monitoring of layer interactions
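
A quick calculation shows why stacked imperfect layers can still be robust. This is a minimal sketch that assumes each layer misses a fault independently of the others. That is a simplification, and the miss rates below are made-up illustrative values.

```python
import math


def breach_probability(miss_rates):
    """Probability that a fault slips through every layer.

    Assumes the layers fail independently, so the per-layer miss
    probabilities multiply. Correlated holes break this assumption.
    """
    rates = list(miss_rates)
    for rate in rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("each miss rate must be between 0 and 1")
    return math.prod(rates)


if __name__ == "__main__":
    # Hypothetical miss rates: code review 20%, tests 10%, monitoring 5%.
    layers = [0.2, 0.1, 0.05]
    print(f"{breach_probability(layers):.4f}")      # all three layers in place
    print(f"{breach_probability(layers[:2]):.4f}")  # without the monitoring layer
```

If two layers share a blind spot, for example the same person writing and reviewing a change, their holes line up more often than this product suggests. That is why the post-mortem questions about layer interactions matter.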

I hope you liked this issue and learned something. If you did, don’t forget to give a thumbs-up and share it with your friends and network.

See y’all next week 👋