Reasons for the shift
The last talk of OSCamp 2024 in Nuremberg was given by adidas employees Andreia Otto and Ravi Mogulla. The topic was "SRE Challenges in Monolith to Microservices Shift at adidas E-commerce".
They started their talk by explaining why the shift from a monolithic architecture to microservices was necessary. One reason was the need to speed up release cycles. Microservices also make scaling easier and allow a higher order throughput in their shop. Last but not least, the architectural shift was meant to reduce operational costs.
They followed the MACH architecture model to make the shift happen. MACH stands for Microservices, API-first, Cloud-native, and Headless.
The Challenges
The new microservice architecture came with many challenges. Monitoring and observability are more complex for microservices than for a monolithic app. The CI/CD pipeline made deploying changes easier, but it also led to more breaking changes reaching production and causing downtime. Having multiple teams, each responsible for its own microservice, reduces complexity for the developers, but it has a downside: the teams often solve the same problems in different ways, so a lot of code is written repeatedly across teams. There are also security concerns about the increased number of API endpoints facing the internet.
Solutions
To face these challenges, adidas came up with the following solutions (among others). Canary deployments are used to spot breaking changes early, before they lead to downtime, and to reduce their impact. For observability, end-to-end tracing with APM tools was introduced. In addition, several security measures were taken to protect the public API endpoints from bots that flood the online shop.
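To illustrate the general idea behind a canary deployment, here is a minimal Python sketch. It is not taken from the talk and does not describe how adidas implements it: the names `route`, `decide`, `VersionStats`, and the thresholds `max_extra_error_rate` and `min_requests` are all hypothetical. A small share of requests goes to the new version, and its error rate is compared with the stable version before deciding whether to promote or roll back.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class VersionStats:
    """Request and error counters for one deployed version (hypothetical)."""
    requests: int = 0
    errors: int = 0

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def route(request_id: str, canary_percent: int) -> str:
    """Deterministically send roughly canary_percent % of requests to the canary."""
    # Hash the request id into one of 100 buckets so the same id
    # always lands on the same version.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"


def decide(stable: VersionStats, canary: VersionStats,
           max_extra_error_rate: float = 0.01,
           min_requests: int = 100) -> str:
    """Return 'wait', 'rollback' or 'promote' based on observed error rates."""
    # Assumed rule: do not judge the canary before it has seen enough traffic.
    if canary.requests < min_requests:
        return "wait"
    # Assumed rule: roll back if the canary is noticeably worse than stable.
    if canary.error_rate > stable.error_rate + max_extra_error_rate:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    stable = VersionStats(requests=1000, errors=5)
    canary = VersionStats(requests=200, errors=20)
    print(route("order-12345", canary_percent=5))
    print(decide(stable, canary))
```

In practice the routing is usually done by a load balancer or service mesh and the error rates come from monitoring, so this sketch only shows the decision logic in isolation.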
Incident Management
Effective incident management is necessary when dealing with microservice infrastructures. adidas recruited so-called incident commanders who call in the right teams and coordinate the mitigation or fix of incidents. Afterwards, a blameless postmortem is carried out in which the "5 Whys" technique is applied to get to the root cause of the problem. The teams then take the necessary actions to prevent that particular problem from happening again.
The talk was followed by an extensive question and answer session in which the speakers answered many questions from the audience. It was definitely one of the highlights of OSCamp 2024 in Nuremberg and a fitting close to the event.