Debo Ray

Co-Founder, CEO

In 2016 I joined Uber as a software engineer. Back then, Uber had just started benchmarking its engineering organization. Velocity was around one PR per developer per sprint. Production wasn’t doing well either: we saw inconsistent behavior from newly introduced changes, a high number of incidents, and long times to resolution.

Many of these numbers shared the same root causes: hundreds of developers were working in a single monorepo, many code changes and refactorings happened in pre-production, and our environments were composed of hundreds of services.

To tackle these challenges, Uber went through two phases. First we focused on developer velocity; developer experience at its core came after that.

Developer Velocity

Developer velocity is, in one way or another, a manifestation of the DORA metrics: whether our engineers can push enough code, and how much of it lands in production. Then there's the remediation side of the house: how many incidents happen, and how quickly we can remediate them.
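As a rough illustration of how metrics like these can be computed, here is a minimal Python sketch. The `Change` and `Incident` record shapes, the field names, and the sample timestamps are all assumptions made up for this example; they are not Uber's data model or tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Change:
    """One code change, from merge to production (hypothetical record shape)."""
    merged_at: datetime
    deployed_at: datetime
    caused_incident: bool = False


@dataclass
class Incident:
    """One production incident (hypothetical record shape)."""
    opened_at: datetime
    resolved_at: datetime


def dora_summary(changes, incidents, window_days):
    """Summarize DORA-style metrics over a reporting window of window_days."""
    lead_times = [c.deployed_at - c.merged_at for c in changes]
    restore_times = [i.resolved_at - i.opened_at for i in incidents]
    failures = sum(c.caused_incident for c in changes)
    return {
        # Throughput side: how much code lands in production, and how fast.
        "deployments_per_day": len(changes) / window_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        # Remediation side: how often changes break things, and time to fix.
        "change_failure_rate": failures / len(changes) if changes else None,
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample data for a two-day window.
t0 = datetime(2021, 1, 1, 9, 0)
sample_changes = [
    Change(t0, t0 + timedelta(hours=1)),
    Change(t0, t0 + timedelta(hours=3), caused_incident=True),
    Change(t0, t0 + timedelta(hours=2)),
    Change(t0, t0 + timedelta(hours=6)),
]
sample_incidents = [Incident(t0, t0 + timedelta(minutes=30))]

for name, value in dora_summary(sample_changes, sample_incidents, 2).items():
    print(f"{name}: {value}")
```

The median is used here instead of the mean so that a single very slow deploy or very long incident does not dominate the summary; that is a design choice of this sketch, not something the DORA definitions require.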

The main challenges were the time it took to actually get changes into production and the number of incidents we were seeing.

To solve for that, we introduced a number of things:

Breaking monorepos, custom CI and builds

Uber had a couple of monoliths when I started there. That was a big pain point: as the organization grew to hundreds of engineers, all committing to the same two repos, merge conflicts were frequent. Each build also required instrumenting thousands of microservices, which took a while.

We broke the monoliths down into libraries so changes could be rolled out easily.
We invested heavily in custom CD tooling.
We invested heavily in build tooling to make Docker builds faster.
We invested in building our own compute infrastructure on top of Mesos.

The result was an 80% improvement in velocity, specifically in lead time to merge.

Figure: P95 time for merging code changes (in minutes)
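For readers less familiar with the metric, P95 is the value that 95% of observations fall at or below. The sketch below uses the nearest-rank method, which is one common definition among several; the function name and the sample merge times are invented for illustration and do not reflect Uber's actual numbers.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest rank such that at least pct% of the
    # observations are at or below the value at that rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up merge times in minutes, one entry per merged change.
merge_minutes = [12, 15, 9, 40, 22, 18, 95, 14, 30, 11]
print("P50:", percentile_nearest_rank(merge_minutes, 50))
print("P95:", percentile_nearest_rank(merge_minutes, 95))
```

Tracking a high percentile rather than the average keeps the focus on the slowest merges, which are usually the ones developers complain about.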

Developer Experience

Developers needed some way to do end-to-end testing of the features they were working on. How do we enable that? There's no way to run tens or even hundreds of microservices on your local computer, and Uber had over 5,000 microservices at certain points. We fundamentally needed to use production in some way to help developers do end-to-end testing.

Ephemeral production-like dev environments

We built a tool called Cerberus to enable network access for these environments. Then came the DevPod, which essentially ran just your IDE in the context of an ephemeral environment. That gave developers a disposable environment in which they could do a lot of end-to-end testing. As a by-product we also controlled cost, because it's not an always-on environment: dev environments were hibernated after a certain idle time.
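The idle-time hibernation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DevEnvironment`, `hibernate_idle`, and the one-hour limit are hypothetical names and values, and a real system would call an infrastructure API rather than flip a flag.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DevEnvironment:
    """A disposable dev environment (hypothetical record shape)."""
    name: str
    last_activity: datetime
    hibernated: bool = False


def hibernate_idle(envs, now, idle_limit):
    """Hibernate running environments idle for at least idle_limit.

    Returns the names of the environments hibernated by this call.
    """
    newly_hibernated = []
    for env in envs:
        if not env.hibernated and now - env.last_activity >= idle_limit:
            # Stand-in for a real call that would suspend the environment.
            env.hibernated = True
            newly_hibernated.append(env.name)
    return newly_hibernated


now = datetime(2021, 1, 1, 12, 0)
envs = [
    DevEnvironment("alice-feature-x", now - timedelta(hours=3)),
    DevEnvironment("bob-bugfix", now - timedelta(minutes=10)),
]
print(hibernate_idle(envs, now, idle_limit=timedelta(hours=1)))
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the policy easy to test.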

We had also instrumented all of the developer laptops really early on, and we instrumented all of our DevPods as well. The goal was to understand where things were breaking and which tooling was taking too long to run, so we could invest in fixes for that infrastructure.
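As a minimal sketch of what this kind of instrumentation can look like, the Python snippet below times named tooling steps and ranks them by total time spent. `ToolTimer` and the tool names are hypothetical and are not a description of Uber's actual instrumentation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ToolTimer:
    """Collects wall-clock durations per tool name (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so tests can use a fake clock
        self.durations = defaultdict(list)

    @contextmanager
    def measure(self, tool):
        """Record how long the wrapped block takes, even if it raises."""
        start = self._clock()
        try:
            yield
        finally:
            self.durations[tool].append(self._clock() - start)

    def slowest(self):
        """Return tool names ordered by total time spent, slowest first."""
        totals = {tool: sum(runs) for tool, runs in self.durations.items()}
        return sorted(totals, key=totals.get, reverse=True)


timer = ToolTimer()
with timer.measure("unit-tests"):
    time.sleep(0.02)  # stand-in for running a real tool
with timer.measure("lint"):
    time.sleep(0.01)
print(timer.slowest())
```

Aggregating by tool name is what makes it possible to see where developer time is going and decide which tooling to fix first.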

After years of investing in all of this tooling, with the DevPod project being the final piece, Uber's internal developer NPS finally went green.

Uber engineering KPIs in 2021

After these changes, Uber saw on average one new change landing in production every minute. Average “local” binary build times improved by 1.8x, and average container build times improved by more than 70%.

In summary, there was no single silver bullet, but rather a combination of things: some behavioral, some technical tooling.

Many companies today are Uber in 2016

No business will say, "I want to move slower." But today’s world is different from 2016: cloud is expensive, developers are more expensive, the market is more competitive, and organizations can't just keep hiring engineers left and right, especially with the rising costs of acquiring new engineers and onboarding times. So how can we do the best with the tools and the engineering teams we already have? How can we arm them with the best possible tooling? No engineer joins a company to do bad work, so how can you give engineers tools that enable them to move forward, instead of a bunch of obstacles that they constantly have to face and overcome?

A problem that used to exist only for large organizations, with their very complex architectures, thousands of services, and thousands of developers, now applies to smaller organizations too, because now more than ever, everybody cares about cost.

So step one is to get our hands around the metrics. Step two is to figure out a way to improve velocity using the same methodologies and tools that helped improve velocity at Uber.

Moreover, developers are an expensive resource: hiring is tough, onboarding takes 8-12 weeks, and retaining them is key in this competitive landscape.

The same solutions that helped Uber increase velocity and experience can be applied to many other engineering organizations.

The birth of DevZero

Our platform helps platform engineering teams establish golden paths while still giving developers the flexibility to create arbitrary resources as and when they need them. We are responsible for managing the lifecycle of the infrastructure that we spawn.

How Uber increased developer productivity and what you can learn from that
