Zalando had to fight for survival.
Retail is one of the most competitive sectors, and its players have no choice but to reinvent themselves continuously.
Zalando started in 2008, just days before the financial crisis, and expanded rapidly into new countries in the following years with its free shipping and 100-day return policy.
Like many companies, it started with a monolithic architecture, but from 2013 it faced limitations that required a change of software paradigm.
This article shares Zalando’s Quality Engineering transformation journey and the business and technology outcomes that are now part of its competitive advantage.
The initial architecture supports the business growth
Zalando started its business with a monolith architecture inside a monorepo that supported the business growth and scalability in multiple countries.
The company entered new countries at a steady pace: Austria in 2009, the Netherlands and France in 2010, and several more afterward.
In 2018, after 10 years of existence, Zalando’s key numbers were:
- 5.4 billion euros of annual revenue
- 250 million visits per month
- 26 million active customers.
The scale represented more than 2 million orders, peaking at 4,200 orders per minute, during Black Friday 2018.
The initial architecture supported the business with teams that, in their own words, “built their own system and added features for 5 years” without needing to evolve it.
But things changed in 2013.
The monolith starts to limit business scalability
Five years after its creation, Zalando started to feel the pain of its two monoliths as its headcount grew, especially in technology.
Interestingly, as in other organizations, the monoliths combined with a distributed organization reached their scalability limits at 2 billion euros of annual revenue.
The teams were facing key issues:
- Lack of freedom to make local decisions
- Decreasing ownership in the large codebase
- Overly complex system to build and evolve.
These core and underlying issues directly impacted the business in the 3 areas of Productivity, Innovation and Growth.
Technology was shifting from a business enabler to a business-limiting factor. Something had to be done to reverse the negative trend before it was too late.
Zalando wanted to accelerate its software value delivery, preferring a distributed engineering organization rather than a central one.
The traditional alternatives of architecture evolution came up:
- Restructure the monolith
- Evolve to a distributed architecture.
Zalando chose the second option, in line with Conway’s law: evolving to a distributed microservices architecture and aligning the rest of the company with it.
It was the start of their Quality Engineering journey.
Zalando initiates its journey to Quality Engineering
The transformational journey starts at Zalando with the definition of a clear vision and strong principles on organization and technology.
“We want autonomous teams to deliver amazing products efficiently at scale.” (Rodrigue Schafer, From Monolith to Microservices at Zalando)
That vision aims to unlock the flow of innovation Zalando needs to continuously improve its value proposition and meet the growth target.
Zalando’s transformation required the adoption of many practices that needed a structure to keep a clear vision of the big picture.
That’s where Quality Engineering helps.
The Quality Engineering framework MAMOS structures Quality at Speed practices on the 5 axes of Methods, Architecture, Management, Organization & Skills.
[Figure: Quality Engineering Radar at Zalando, adopted from 2015]
Methods enable Rapid Feature Development flow
The software Quality at Speed flow highly depends on the collaboration efficiency of the actors across every stage of the lifecycle, more than technology.
That’s why methodologies come first.
Zalando’s objective is to enable “Rapid Development Flow” by removing the most important limiting factors faced during development.
Strong emphasis is placed on “building the right thing” before “building the thing right”, with shift-left methods supporting faster experimentation cycles.
Three areas of improvement are identified:
- User experience
- Non-functional requirements
- Minimize late reworks.
Zalando leverages specific methodologies and counter-forces to ensure a consistent user experience especially with distributed teams and architecture.
The following methods are used for rapid value-hypothesis testing:
- Consistent User Experience with Customer Journey Maps
- Scenarios evaluation with One-Slide Problem Summary
- Prototypes extracted from Design Sprints.
Each software change is first assessed for its contribution to customer journeys; the most valuable scenario is then retained and tested through rapid user interface testing.
The second challenge is to accelerate while meeting non-functional requirements and containing technology complexity, to deliver not only Speed but Quality at Speed.
Aligned with the focus on the user experience, Zalando implements High-Perceived Performance, measuring performance and rendering from the customer’s point of view.
That measurement is the foundation for delivering non-functional requirements, starting with performance and gradually adding security and reliability through an NFR Checklist.
One example is in Compliance & Security where the 4-eyes principle, audit trails, enforced identity and access management, and data protection are defined.
Lastly, as teams must be “autonomous and deliver with efficiency”, the Tech Radar is set up to guide their technology choices through the Adopt, Trial, Assess and Hold states.
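As a rough illustration, the four radar states can be modelled as a small lookup that a team consults before picking a technology. The sketch below is in Python; the entries (`language-a`, `framework-b`, and so on), the `guidance` function and the advice wording are hypothetical and are not taken from Zalando’s actual radar.

```python
from enum import Enum


class Ring(Enum):
    """The four radar states named in the text."""
    ADOPT = "adopt"
    TRIAL = "trial"
    ASSESS = "assess"
    HOLD = "hold"


# Hypothetical radar entries, for illustration only.
RADAR = {
    "language-a": Ring.ADOPT,
    "framework-b": Ring.TRIAL,
    "database-c": Ring.ASSESS,
    "library-d": Ring.HOLD,
}

# Assumed wording for what each state means to a team.
ADVICE = {
    Ring.ADOPT: "recommended for production use",
    Ring.TRIAL: "usable on a limited scope to gain experience",
    Ring.ASSESS: "worth exploring, not yet for production",
    Ring.HOLD: "avoid for new work",
}


def guidance(technology: str) -> str:
    """Return advice for a team considering a technology."""
    ring = RADAR.get(technology)
    if ring is None:
        return "not on the radar: propose it for assessment"
    return f"{ring.value}: {ADVICE[ring]}"


print(guidance("framework-b"))
print(guidance("something-new"))
```

The point of the design is that the radar guides rather than forbids: a team still makes its own decision, but with shared information about what the rest of the organization already supports.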
Minimize late reworks
One main issue for Zalando is the late discovery of quality issues, such as bugs or unmet requirements, that require costly rework.
The cost of rework increases exponentially with time, so issues must be detected earlier to be fixed faster and at a lower cost.
The teams therefore adopt a series of reviews throughout the lifecycle:
- Architecture Reviews for making better choices early
- Peer reviews during design and implementation stages
- Production-Ready Reviews from design to before going live.
These reviews help them ensure that “new functionality becomes a new service” instead of being built into the monolith, and assess non-functional requirements.
Continuous testing techniques are also adopted to inject minimal testing for fast feedback loops, namely unit tests and functional tests, aligned with Agile Testing.
Architecture supports an accelerated distributed innovation
Zalando’s architecture evolution from a monolith to a distributed architecture aims to enable autonomous teams to iterate fast on value delivery.
Their starting point is a centralized organization with tight coupling and a complexity that a microservices architecture, if done well, distributes properly.
The target architecture is based on these three pillars:
- Technology urbanization
- Modular services
- Development platform.
Technology urbanization is the city planning of the information system: its functions, applications, services and flows are organized to meet business objectives while maintaining global consistency.
Zalando’s urbanization is based on a Distributed Architecture organized around business domains with Domain-Driven Design (DDD) for “high cohesion and low coupling”.
Each domain supports clear team ownership of services and data for that specific perimeter, and also makes teams responsible for exposing their data assets through federated data layers.
Teams also adopt the “Shared concept of core business entities” to improve the integrations and composability of services built by multiple teams.
This top-level organization is the basis for building the so-called “microservices”.
Each application in the global application landscape is then built following the MACH principles: Microservices, API-first, Cloud-native SaaS and Headless.
Service integration is facilitated by consumer-driven contracts and standard REST APIs, which offer modern connectors, a documentation portal and lightweight integration.
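To make the idea of a consumer-driven contract concrete, here is a minimal Python sketch: the consumer declares the fields it relies on, and the provider checks its responses against that declaration before releasing a change. The contract contents, field names and the `violations` helper are hypothetical; real setups typically rely on dedicated contract-testing tooling rather than a hand-written check like this one.

```python
# Hypothetical contract published by a consumer (for example a checkout
# service): the fields it reads from an order resource and their types.
CHECKOUT_CONTRACT = {
    "order_id": str,
    "total_cents": int,
    "currency": str,
}


def violations(contract: dict, response: dict) -> list:
    """List the ways a provider response breaks a consumer contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


# The provider may add fields freely; only what the consumer uses is checked.
provider_response = {
    "order_id": "A-1001",
    "total_cents": 4990,
    "currency": "EUR",
    "warehouse": "internal-detail",
}

# Renaming a field the consumer depends on is flagged before release.
broken_response = {"id": "A-1001", "total_cents": 4990, "currency": "EUR"}

print(violations(CHECKOUT_CONTRACT, provider_response))
print(violations(CHECKOUT_CONTRACT, broken_response))
```

Because the check only covers what consumers actually use, provider teams keep the freedom to evolve everything else without cross-team coordination.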
One key precondition is to build the services in an event-driven way, with hooks and event connectors, to maximize technical and temporal decoupling as well as scalability.
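The decoupling that events bring can be sketched with a tiny publish/subscribe example in Python. The `EventBus` class, the `order.placed` event name and the two subscribers are hypothetical. The bus here is in-memory and synchronous, so it only shows technical decoupling (the publisher does not know its consumers); temporal decoupling would additionally require a broker that stores events until consumers are ready.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a message broker (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher does not know who consumes the event.
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
shipments = []
emails = []

# Two independent services react to the same event.
bus.subscribe("order.placed", lambda e: shipments.append(e["order_id"]))
bus.subscribe("order.placed", lambda e: emails.append(f"confirmation for {e['order_id']}"))

bus.publish("order.placed", {"order_id": "A-1001"})
print(shipments, emails)
```

Adding a third reaction to `order.placed` means adding a subscriber, with no change to the service that publishes the event.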
The migration from the monolith to the microservices architecture is achieved with the Strangler Pattern, enabling a gradual migration that balances value and risk.
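The core of the Strangler Pattern is a routing facade placed in front of both systems: requests for already-migrated functionality go to the new services, and everything else still reaches the monolith. The Python sketch below illustrates that routing only; the paths, the `catalog_service` stand-in and the `MIGRATED` table are hypothetical and do not describe Zalando’s actual setup.

```python
def monolith(path: str) -> str:
    """Stand-in for the legacy application."""
    return f"monolith handled {path}"


def catalog_service(path: str) -> str:
    """Stand-in for a newly extracted microservice."""
    return f"catalog service handled {path}"


# Path prefixes already migrated; the table grows as the migration proceeds.
MIGRATED = {"/catalog": catalog_service}


def route(path: str) -> str:
    """Facade: try migrated services first, fall back to the monolith."""
    for prefix, service in MIGRATED.items():
        if path == prefix or path.startswith(prefix + "/"):
            return service(path)
    return monolith(path)


print(route("/catalog/shoes"))
print(route("/checkout"))
```

Each migration step is then a small, reversible change: add one prefix to the table, observe, and remove it again if the new service misbehaves.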
Autonomous development teams delivering microservices need a good degree of automation across the build, deployment and operations stages.
Automation is one of the important prerequisites to benefit from a microservices architecture—without automation, it is best to stick with a modular monolith.
Zalando’s teams can provide standard Code Bootstraps that include deployment and monitoring capabilities for approved technologies leveraging its Tech Radar.
The development industrialization is also what enables them to provide a Self-service Development Platform, and more easily add requirements like Test Automation.
The first product is named Stups.io, a secure framework for Docker-based applications on top of Amazon Web Services, covering the provisioning, deployment and monitoring of microservices.
Management drives a productive and scalable collaboration
The role of managers is more about transformational leadership to successfully manage the valuable transition from a monolith to the distributed architecture.
The key challenge is to maintain performance while a high degree of change, ambiguity and business growth are all happening at the same time.
Zalando set the management priorities on:
- Radical Agility
- Quality Culture
- Performance Management.
The Agile movement has spread across a variety of organizations, alongside many methodologies such as the Scaled Agile Framework, Sociocracy, or Scrum.
Zalando needed something with more impact than a methodology to power its autonomous teams to iterate within a defined frame across the organization.
They pick Radical Agility.
The system is based on the three pillars of Autonomy, Mastery & Purpose, which guide teams in their decisions and in the rapid adaptation required, from the user experience to their skills.
Concretely, the managers implement the following practices:
- Vision setting the north star and enabling its cascade to teams
- Empowerment delegating and supporting local decision-making
- Rules of play defining the interactions and decisions framework.
One example is the checkout vision cascading from the company vision as “Allow customers to buy seamlessly and conveniently”.
Delivering software aligned with the Quality Engineering paradigm requires building better in the first place to continuously accelerate for sustainable growth.
Autonomous teams that must iterate fast with an end-to-end responsibility cannot have the quality responsibility delegated—it must equally be their own.
“Quality is related to mindset, and it’s part of engineering. Systems that support multi-billion-Euro companies must be engineered for high quality.” (Zalando’s Engineering Quality Definition)
This quality definition clearly makes teams responsible for building quality software in the first place, using manual and automated techniques depending on their maturity and the radar.
That moment marks the change where “QA team is a thing of the past.” and “You and your team are responsible for your code’s behavior: There’s no other safety net.”
The level of autonomy and empowerment given to teams enables them to iterate with much more velocity and to make the decisions that ensure the quality at speed of their products.
The role of managers is therefore far from micromanagement: with the time now available, they must focus on the big picture and global consistency.
The Quality Engineering mantra of “linking the outputs to the outcomes” is one of the main responsibilities of managers, accountable for generating valuable outcomes.
Managers rely on performance management techniques and retrospective mechanisms to help the team assess the value delivered and find ways to deliver better in next iterations.
Organization fosters Quality at Speed interactions
The entire organization of Zalando is part of the change to achieve the objective of “autonomous teams delivering amazing products.”
From business to technology, an accountability framework is set in a decentralized organization aligned with the technology architecture and urbanization.
Minimum Viable Collaboration is the model implemented with:
- Responsibility model
- Team Topologies
- Product, Delivery & Mastery roles.
The radical agility model needs a clear responsibility model for efficient and rapid decision-making across the organization.
Each autonomous team has a defined set of decisions it can take itself, usually with guides and recommendations such as the Tech Radar.
That decision model limits the number of decisions raised to governance and other boards, which are slower and may not have enough local information.
One good example is architecture decisions, which are taken centrally in many organizations, resulting in a limiting factor and frustration for all the teams.
Zalando approaches architectural decisions as follows:
- The architecture team decides on global, cross-platform topics and APIs
- Local architecture decisions are owned by cross-functional teams
- Cross-functional teams can consult architecture teams for inputs.
The responsibility of “delivering amazing products” no longer belongs to technology or engineering alone: it becomes the responsibility of the whole team, including the business.
The Radical Agility model is about being able to move in parallel while minimizing the number of alignment points to keep iterating with speed.
The alignment between the company organization, technology architecture and processes is achieved by setting the right Team Topologies.
Firstly, cross-functional teams become the norm for creating autonomous teams with an end-to-end responsibility, sometimes in the form of Feature-teams.
A scalable microservices architecture requires automation provided by a Platform Team responsible for providing reusable software assets in a self-service way.
Lastly, interactions between teams are designed to be as few as possible, following the possible models of Team Topologies.
Quality Engineering roles
The last piece of the organization is the set of evolving roles that materialize the concepts of autonomous teams, empowerment, and mastery.
Three main roles are set:
- Product Owner
- Delivery Lead
- Practice Lead.
Each team with end-to-end responsibility is assigned a Product Owner responsible for the product roadmap, driving the increments, and adapting based on experimentation.
That first role bridges the gap between business and technology teams, which otherwise tend to work with different priorities and rituals.
The role of Delivery Lead complements that transversal organization to streamline software delivery across one or multiple teams, a needed role in the decentralization move.
Finally, the Practice Lead is responsible for helping engineers achieve Mastery in their position and reach their potential, developing their skills without holding a hierarchical role.
Skills enable a sustainable engineering growth
One of Zalando’s main issues was its old technology stack, which made its teams less attractive and created a vicious cycle that impeded the hiring and retention of talent.
Zalando leveraged its migration to a microservices architecture to adopt modern technologies and contain them with their Tech Radar.
But the company was also a pioneer with the implementation of two practices that do improve the talent attraction and retention.
The three pillars of skills adopted are:
- Cutting-Edge Technologies
- Open-source first
- Remote-friendly organization.
Driving a classic car is enjoyable on weekends, but it can be a nightmare for everyday driving or when mechanical issues occur on the road.
The same applies to software, where in addition, everything recent and trendy attracts more engineers to work in that context.
The complexity and delivery processes at Zalando were not easy to grasp; things changed with the microservices architecture, which allowed more independent deployments.
The decentralization of components with the radical agility and techniques such as the Tech Radar enabled teams to deploy modern technologies at scale.
A few open-source solutions were deployed where they were viable alternatives to commercial ones, which also helped modernize the tech stack and make it more attractive.
But Zalando went further by making it part of its technology and skills strategy.
The company grew a series of internal products through its radical agility and platform team to solve concrete engineering problems for which solutions were usually not available on the market.
The model was even set as “open-source first” for sharing services between teams, increasing the modularity and interoperability of products internally and externally.
The majority of their solutions were then open-sourced to:
- Increase the company’s technology awareness
- Contribute to the community and company brand
- Attract talent with interesting products to contribute to.
The last significant advantage of open source was to scale the development and maintenance of their open-source products through the community, with fewer internal hires.
Zalando grew its engineering teams through its main “Tech Hub” in Germany, where the company has its roots, and opened smaller ones later.
Hubs in other cities such as Helsinki, Zurich and Lisbon followed to support the growth of the teams, even though the last one later closed.
At that time, the organization was “remote-friendly”: it allowed occasional remote work, which already differed from the standard five days at the office.
In 2020, the company set clear remote guidelines, mainly about common sense and the rituals to implement as an organization, such as daily standups via chat and video, and 1:1s.
Quality Engineering Outcomes for Zalando
The transformation achieved by Zalando illustrates the impact of Quality Engineering when it acts across the entire system.
The goal was not to build microservices, grow the teams, and evolve the technological stack—these were only means enabling the company to thrive.
Black Friday 2019 brought them 840,000 new customers, while the accelerated software cycles raised the NPS from -2.90 to 10.81 between Q4 2017 and Q4 2018.
But most importantly, Zalando had sustainable growth.
Their Quality Engineering maturity enables them to reinvent the company architecture and value proposition, continuously delivering Quality at Speed software.
Zalando was fighting for survival with a canned architecture constrained by external solutions, but managed to reverse the situation by building up its QE capability.
That investment allowed them to grow their own business capabilities, which later on evolved to a platform ecosystem of business services.
The initial limitations on productivity, innovation, and growth are long gone.
Quality Engineering Roadmap at Zalando
Zalando is a strong representation of what Quality Engineering is: the holistic software paradigm that allows companies to reinvent themselves through software.
The transformation on the five domains of MAMOS made the difference to achieve sustainable growth, avoiding getting back to old habits or just creating more problems.
Quality at Speed software is the competitive advantage letting them adapt faster than competitors, testing and scaling the most valuable ideas and opportunities.
The speed at which they can adapt is like a compounding investment, and they are continuously increasing their Quality Engineering maturity.
Their internal development platform Stups.io evolved to Sunrise, a modern developer portal built on top of Backstage from Spotify.
New roles like Principal Engineer also came up, as well as an experimentation platform, personal development budget, mentoring, or micro frontends.
Yes, Quality Engineering is about challenging transformations and a never-ending continuous improvement journey to thrive in the marketplace.
But aren’t the results worth it?
Adrian Bridgwater (2016), Radical Agility As A Business-Technology Principle. Forbes.
Andrea Moretti (2022), An Introduction to the Zalando Design System. Zalando Engineering Blog.
Dan Persa (2015), From Jimmy to Microservices: Rebuilding Zalando’s Fashion Store. Zalando Engineering Blog.
Dan Woods (2017), How Platforms Are Neutralizing Conway’s Law. Forbes.
Eric Bowman (2016), Radical Agility 101: Study Notes. Zalando Engineering Blog.
Felix Müller (2016), Scaling Architecture @ Zalando. GOTO Con.
Gary Rafferty (2022), Growth Engineering at Zalando. Zalando Engineering Blog.
Gregor Ulm (2021), A Systematic Approach to Reducing Technical Debt. Zalando Engineering Blog.
Gunnar Obst (2016), Radical agility – from command & control to purpose, autonomy & mastery. SlideShare.
Henning Jacobs (2022), Cloud native developer experience at Zalando. YouTube.
Jan Brockmeyer (2021), Micro Frontends: from Fragments to Renderers (Part 1). Zalando Engineering Blog.
Jan Brockmeyer, Maxim Shirshin (2021), Micro Frontends: Deep Dive into Rendering Engine (Part 2). Zalando Engineering Blog.
Marcel Weiß (2017), 2x by 2020: Zalando on Its Growth Ambitions. EarlyMoves.
One Decade in Zalando Tech. srcode.
Pamela Canchanya (2019), Building and Running Applications at Scale in Zalando. InfoQ.
Pattern: Strangler application. Microservices.io.
Project Mosaic (2016), Microservices for the Frontend. Project Mosaic.
Radical Agility, or How To Consider Social And Ecology With Agility. TeamMood.
Rodrigue Schafer, From Monolith to Microservices at Zalando. Youtube.
Stephanie Cadieux and Miriam Lobis (2018), The journey to an agile organization at Zalando. McKinsey.
Tim Kröger, Henning Jacobs (2020), How to work remotely at Zalando. Zalando Engineering Blog.
Zalando, Our history: from start-up to grown-up. Zalando Corporate.
Zalando’s Engineering and Architecture Principles, Zalando GitHub.
The Quality Engineering Framework, Manifesto & MAMOS are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).