Weekly to Daily Releases at Dropbox

We're in a race. Every AI feature we ship needs to be validated in market the next day. We make a change, we want to see if it works, and the feedback cycle from "code merged" to "users have it" determines how fast we learn. Weekly releases meant we were always a week behind our own insights.

Shipping a desktop client daily is a fundamentally different problem from shipping a web app daily. On the web, redeploying every few hours is nearly frictionless: you invalidate caches and you're done. There's no binary for users to install, so there's no updater pipeline, no client-side state to migrate, and no OS-level integrations to worry about. You just deploy and everyone gets it.

Mobile daily releases are hard too, but for different reasons (app store review, device fragmentation, cellular bandwidth constraints). Desktop sits in the worst middle ground: you're pushing native code directly to user machines with no gatekeeper, and that code has to coexist with local state, filesystem watchers, shell extensions, and sync operations that can't just be interrupted cleanly.

The Dropbox desktop client runs on Windows, macOS, and Linux. It manages a sync engine, a virtual filesystem, tray integrations, shell extensions, and smart sync overlays. When it breaks, people lose access to their files. For years, that fear kept us on a weekly release train. Here's how we broke free.

The Weekly Release Problem

The main problem with weekly releases is simple: we can't prove things out fast enough.

When you ship once a week, every release is large. A week's worth of commits from dozens of engineers bundled together. If something breaks, finding the culprit means bisecting a massive diff. Rollback means reverting everything, including the safe changes engineers spent all week on.

But the bigger problem is the queueing. Your code merges on Wednesday; it ships the following Tuesday. In between, the team is cherry-picking hotfixes, managing git flow across release branches, and coordinating go/no-go meetings. All of this ceremony exists because the cost of a bad release is "we lose a week." So you slow down to be safe, which means you learn slower, which means you're less competitive.

In an AI product race, learning slower is losing. Chrome moved from 6-week to 4-week release cycles back in 2021, and even that feels slow for the current moment.

What we wanted was to roll forward rapidly: every fix, every improvement, every experiment reaching users as fast as the quality gates allow. No cherry-picks. No release branches. Just the main branch flowing forward.

Phase 1: Weekly to Twice-Weekly

We didn't jump straight to daily. The first step was proving we could cut the cycle in half without increasing incident rate.

The bottleneck was manual QA. Our weekly process had a two-day testing phase where QA engineers ran regression suites on physical machines. We replaced it with an automated stability gate checking three things: crash-free rate above 99.8% on dogfood over 24 hours, sync correctness tests passing, and performance benchmarks within 5% of baseline on key operations.

If all three pass, the build promotes automatically. This alone got us to twice-weekly within two months. Manual QA wasn't catching things the automation missed. It was duplicating work.
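A minimal sketch of that promotion check, assuming hypothetical metric names (crash_free_rate, sync_suite_passed, perf_regression) rather than our actual telemetry plumbing:

    # Hypothetical promotion gate. Field names and how metrics are gathered are
    # illustrative; the thresholds mirror the criteria described above.
    from dataclasses import dataclass

    @dataclass
    class BuildMetrics:
        crash_free_rate: float   # fraction of dogfood sessions without a crash, last 24h
        sync_suite_passed: bool  # did the sync correctness suite pass?
        perf_regression: float   # worst slowdown vs. baseline across key operations

    def should_promote(m: BuildMetrics) -> bool:
        """Promote the build automatically only if all three gates hold."""
        return (
            m.crash_free_rate >= 0.998     # crash-free rate above 99.8% on dogfood
            and m.sync_suite_passed        # sync correctness tests green
            and m.perf_regression <= 0.05  # performance within 5% of baseline
        )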

Phase 2: The Omaha Upgrade

Twice-weekly to daily required infrastructure changes. We upgraded to Omaha 4 (Google's open-source update framework, the same system Chrome uses). It gives fine-grained control over rollout populations: target specific percentages, specific OS versions, pause mid-flight, roll back without shipping a new build.

The upgrade was necessary but not particularly interesting. The point is: you need an updater that supports percentage-based rollouts with real-time telemetry feedback. Without that, daily shipping is reckless.
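The capability that matters, roughly, is a rollout policy the updater evaluates per client and that we can change server-side without cutting a new build. A hypothetical sketch of that idea (this is not Omaha 4's actual configuration schema or API):

    # Hypothetical rollout policy; illustrates the control an updater needs,
    # not Omaha 4's real config format. Values are made up for illustration.
    import hashlib

    POLICY = {
        "build": "215.4.0",
        "paused": False,                  # pause mid-flight without shipping anything
        "rollout_percent": 10,            # fraction of the population offered this build
        "allowed_os": {"win", "mac", "linux"},
    }

    def is_offered(client_id: str, client_os: str, policy=POLICY) -> bool:
        """Deterministically bucket clients so a machine stays in or out of
        the rollout as the percentage ramps up."""
        if policy["paused"] or client_os not in policy["allowed_os"]:
            return False
        bucket = int(hashlib.sha256(client_id.encode()).hexdigest(), 16) % 100
        return bucket < policy["rollout_percent"]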

Release Rings

This is the piece that makes daily releases safe at scale. Instead of a simple dogfood/beta/stable pipeline, we use concentric release rings where each ring is a larger population that acts as a quality gate for the next:

  1. Core foundation team. The innermost ring. The team that builds the release infrastructure itself runs the latest build hours after it's cut. If it's broken enough to notice in normal use, it stops here.

  2. All teams in the app org. The broader desktop engineering organization. Roughly 50 engineers using the product daily, running diverse workflows, filing bugs against their own code.

  3. All teams in the product org. Engineering, product managers, designers, researchers. Hundreds of people with different usage patterns who will notice issues the engineering teams won't.

  4. All teams at the company. Every Dropbox employee. Thousands of people on every OS, every hardware configuration, every workflow from "I just use it for backup" to "I live in shared folders all day."

  5. Beta customers. External users who opt into early builds. Diverse hardware the internal fleet doesn't cover, real-world network conditions, usage patterns we can't replicate internally.

  6. Stable customers. The general population. A build reaches stable only after passing through every previous ring with acceptable metrics. The percentage rollout (1% to 10% to 50% to 100%) happens within this ring.

  7. Long-term stability customers. Enterprise accounts and users who've opted into the most conservative update channel. They get builds that have been stable for weeks, not days.

Each ring has its own crash-rate baseline and promotion criteria. A build advances automatically if metrics hold. The whole pipeline from code merge to 100% stable takes about 3-4 days, but most users are running code that's only 48 hours old.
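Concretely, each ring boils down to a population plus promotion criteria. A simplified sketch with illustrative soak times and baselines (the real values are per-ring and tuned over time):

    # Illustrative ring table; soak times and crash-free floors are examples,
    # not our production values.
    RINGS = [
        {"name": "core-foundation", "min_soak_hours": 4,       "crash_free_floor": 0.995},
        {"name": "app-org",         "min_soak_hours": 12,      "crash_free_floor": 0.997},
        {"name": "product-org",     "min_soak_hours": 24,      "crash_free_floor": 0.998},
        {"name": "company",         "min_soak_hours": 24,      "crash_free_floor": 0.998},
        {"name": "beta",            "min_soak_hours": 24,      "crash_free_floor": 0.998},
        {"name": "stable",          "min_soak_hours": 24,      "crash_free_floor": 0.999,
         "percent_stages": [1, 10, 50, 100]},  # staged rollout happens inside this ring
        {"name": "long-term",       "min_soak_hours": 24 * 14, "crash_free_floor": 0.999},
    ]

    def next_ring(current: str) -> str | None:
        """A build advances to the next ring only after holding the current
        ring's crash-free floor for its minimum soak time."""
        names = [r["name"] for r in RINGS]
        i = names.index(current)
        return names[i + 1] if i + 1 < len(names) else None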

Auto-Rollback

The thing that makes engineers comfortable shipping daily: if crash rate exceeds a threshold at any ring, the system automatically reverts to the previous stable version. No human in the loop. No 2am pager. No "who's the release manager this week."
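A sketch of that trigger, with a hypothetical fetch_crash_rate function and updater object standing in for the real telemetry and updater control plane:

    # Hypothetical auto-rollback watcher; `fetch_crash_rate` and `updater` are
    # placeholders for real telemetry and updater control-plane interfaces.
    import time

    CRASH_RATE_CEILING = 0.002  # illustrative threshold: 0.2% of sessions crashing

    def watch_ring(ring, build, last_good_build, fetch_crash_rate, updater):
        """Poll crash telemetry for one ring; revert to the last known-good
        build automatically if the ceiling is breached. No human in the loop."""
        while True:
            rate = fetch_crash_rate(ring=ring, build=build)
            if rate > CRASH_RATE_CEILING:
                updater.set_active_build(ring=ring, build=last_good_build)
                updater.halt_promotion(build=build)
                return "rolled_back"
            time.sleep(300)  # re-check every five minutes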

It's fired about a dozen times since we turned it on. Every time it caught a real issue. Every time the affected population was small (limited to whatever ring caught it). Compare that to the old world: a bad weekly release affected 100% of users for days.

The psychology matters as much as the mechanics. Engineers know that if their change causes problems, it'll be automatically reverted with minimal user impact. That removes the fear that made everyone treat releases as ceremonies.

The Cultural Shift

When releases are daily, behavior changes without mandates:

Smaller changes. Engineers break work into smaller, independently-shippable units. Not because anyone told them to, but because shipping a 2,000-line change that reaches beta tomorrow is scarier than five 400-line changes over a week.

Feature flags by default. Half-finished work ships to production daily, so you need flags to hide it. This went from "nice to have" to "required infrastructure"; there's a sketch of the pattern at the end of this section.

Testing discipline. The automated gates only work if engineers write tests that catch regressions before the build is cut. When consequences of a missing test arrive tomorrow morning instead of next week, people write more tests.

No more git flow. No release branches. No cherry-picks. No merge conflicts between the release branch and main. Just main rolling forward. The simplification alone saved hours of release engineering toil per week.

Higher expectations for agent-written code. As Simon Willison argues in his agentic engineering patterns guide, shipping worse code with agents is a choice. Daily releases give us the safety net to hold agents to a high bar: if something slips through, it's caught at the next ring within hours, not buried in a week-long batch.
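The feature-flag pattern mentioned above is nothing exotic. A minimal sketch, assuming a hypothetical flags service with an is_enabled check:

    # Hypothetical feature-flag gate; `flags.is_enabled` stands in for whatever
    # flag service the client actually uses.
    def run_summary_model(path: str) -> str:
        """Placeholder for the real (half-finished) summarization call."""
        return f"summary of {path}"

    def summarize_file(path: str, flags) -> str | None:
        # The unfinished feature ships in every daily build, but the code path
        # stays dark until the flag is enabled for a given population.
        if not flags.is_enabled("ai_file_summaries"):
            return None
        return run_summary_model(path)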

What We're Measuring

Time-to-ship. From commit merge to available on stable channel: was 7-10 days, now 2-3 days. For critical fixes, under 12 hours through expedited ring progression.

Rollback frequency. Up (we roll back more often), but each rollback is cheaper. Total user-hours-impacted-by-rollback dropped by roughly 60%.

Crash-free rate on stable. Improved from 99.7% to 99.9%. Bad builds get caught at inner rings with small populations.

AI iteration speed. This is the one that matters most for us right now. We can ship an AI feature change, see telemetry on real usage within 48 hours, and ship the next iteration the day after. Weekly releases meant that feedback loop was 2-3 weeks. In an AI product race, that's the difference between leading and trailing.

Making It Stick

Daily releases aren't just an infrastructure problem. Three organizational decisions made this work:

Dedicated release engineering team. Four engineers whose full-time job is making releases faster, safer, and more automated. Not a rotation, not a side project.

Executive buy-in for short-term velocity loss. Six months of building infrastructure that didn't ship features. The argument: every week of release friction is a week of delayed learning, multiplied by every engineer on the team.

Willingness to roll back more. Leadership had to accept that more frequent rollbacks signal a healthy system, not a broken one. A system that never rolls back is either never shipping or not detecting problems.

Daily releases turned out to be the foundation for everything else we're doing in AI. Birgitta Böckeler's Exploring Generative AI series documents how AI-generated code needs constant course correction, and the faster your feedback loop, the more confidently you can let agents produce code. You can't iterate on AI features at the speed the field demands if your release pipeline only lets you learn once a week.