Muaaz Saleem

Software Engineer, Grafana Labs
Hey everyone, I'm a software developer, big Improv fan and just starting to blog at
Maybe the kinds of problems FP languages set out to solve aren't fit to be solved by programming languages.

Personally, I feel like the guarantees that most programming languages provide are limited to a single system. The fact that Haskell needs all of my code to be in Haskell to provide the benefits it does is a big bummer.

On the other hand, the bugs that haunt my dreams are distributed.

The features that PLs do provide for writing better distributed systems (I'm thinking Erlang) can be, and are, easily replicated in other languages, e.g. Clojure and Go in the case of the actor model.

I'm surely biased by my experience, where services are small but many. Single-system codebases are merely a few thousand lines of code, if that. I can see how frontend codebases are a totally different story.
swyx funny enough, I just read "Why aren't there more programming languages startups" by Jean Yang (MIT 30 under 30, Founder of Akita Software) and she alluded to a response to this.

Right, so all metrics are collected automatically and then graphed on a Grafana dashboard (we have a central Grafana deployment) and/or sent in a weekly email to all teams. They are tracked by different teams, each tracking the one that most closely relates to its value proposition.

Builds/Dev/Week: This is the most straightforward metric. Our internal CI/CD Platform tracks the "Triggered by" and "Team" for every build. Then a monitoring check just queries and graphs the number on a Grafana dashboard.
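
To make the Builds/Dev/Week idea concrete, here's a rough sketch of how the number could be derived from build records. This is not our actual check; the record shape (`triggered_by`, `team`, `day`) and the function name are assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date


def builds_per_dev_per_week(builds):
    """Return {(team, iso_year, iso_week): builds / distinct developers}.

    `builds` is a list of dicts with hypothetical keys:
    "triggered_by" (str), "team" (str) and "day" (datetime.date).
    """
    counts = defaultdict(int)  # number of builds per team and week
    devs = defaultdict(set)    # distinct developers per team and week
    for build in builds:
        iso = build["day"].isocalendar()
        key = (build["team"], iso[0], iso[1])
        counts[key] += 1
        devs[key].add(build["triggered_by"])
    return {key: counts[key] / len(devs[key]) for key in counts}


if __name__ == "__main__":
    sample = [
        {"triggered_by": "alice", "team": "checkout", "day": date(2021, 1, 4)},
        {"triggered_by": "bob", "team": "checkout", "day": date(2021, 1, 5)},
    ]
    print(builds_per_dev_per_week(sample))
```

Only developers who triggered at least one build in a given week count towards the denominator here. Dividing by team headcount instead would be an equally reasonable choice.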

Lead Time: We have an internal tool that creates GitHub Enterprise repositories based on "templates". You can think of it as a predecessor to GitHub's Template Repositories or the new AWS Proton service. Lead Time ~= time to create a new repo with the Repo Creator.

Work In Progress: Finally, for Work In Progress, we have a dashboard that tracks how long PRs are open on our internal GitHub Enterprise. All orgs and repos are associated with teams, so it's easy to calculate a per-team metric there.
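
A per-team Work In Progress number could be sketched roughly like this. The repo-to-team mapping, the PR record keys (`repo`, `opened_at`, `closed_at`) and the choice of the median are all assumptions for illustration, not how our dashboard is actually built.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def open_hours_per_team(pull_requests, repo_team, now):
    """Return {team: median hours a PR stays open}.

    `pull_requests` is a list of dicts with hypothetical keys "repo",
    "opened_at" and "closed_at" (None while the PR is still open).
    `repo_team` maps a repo name to the team that owns it.
    """
    hours = defaultdict(list)
    for pr in pull_requests:
        # PRs that are still open are measured up to `now`.
        end = pr["closed_at"] if pr["closed_at"] is not None else now
        team = repo_team[pr["repo"]]
        hours[team].append((end - pr["opened_at"]).total_seconds() / 3600)
    return {team: median(values) for team, values in hours.items()}
```

The median is used here so that one long-forgotten PR doesn't dominate a team's number. A mean or a percentile would work too.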

Mean Time to Recovery: This is measured by tracking the "stages" on incident Jira tickets. Incident tickets are automatically opened when a high-priority monitoring check fails. Mean Time to Recovery = time for open incident tickets to be marked "recovered".
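
As a rough sketch of that calculation: take the time between a ticket being opened and being marked "recovered", then average it. The ticket shape (`opened_at`, `recovered_at`) is an assumption for illustration and is not the Jira API.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Return the average open-to-recovered duration as a timedelta.

    `incidents` is a list of dicts with hypothetical keys "opened_at" and
    "recovered_at"; tickets not yet recovered (None) are skipped.
    Returns None when no incident has recovered yet.
    """
    durations = [
        incident["recovered_at"] - incident["opened_at"]
        for incident in incidents
        if incident.get("recovered_at") is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Skipping unrecovered tickets keeps the number well-defined, but it also means a long ongoing incident only shows up once it is resolved.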

Fault Rate: I think we use weekly P1 incidents as a proxy here. P1 incidents are the highest-priority incidents and have customer impact, e.g. an order drop.

Here's an example Grafana graph:

Hope this was interesting!

Great question! Not one that I know about, and I've been looking. The two sources I often hear of are the "State of DevOps" Report, which comes out once a year.

I imagine following the authors of the Accelerate book is a good way to keep an eye on the topic, i.e. nicolefv, jezhumble & RealGeneKim.

Hoping to blog more about the topic in the coming months too.
My department at work followed the Accelerate book, measuring Builds/Dev/Week, where more builds means devs are more productive.

Mean Time to Recovery (time to recover from incidents) and Lead Time (time to the first X pull requests on a new project) were also being tracked.

We also wanted to measure more areas like:
- Fault rate (measuring reverts/re-deploys)
- Work In Progress (time pull requests remain in progress) / Team.

But these were much harder and sometimes controversial.
Hey swyx, at Zalando I saw the infra teams transforming into Dev Productivity Teams as they became more user-centric.

The book Accelerate also helped show that there is clear business value in doing that.