Repository tools structure the performance of our development processes, yet we often overlook specific categories of them.
As engineers, we love tools, and we tend to lose sight of their purpose and their contribution to the organization. The repository is central to our development flows, supporting experimentation and the delivery of value.
Our mission is to build an efficient system at the service of the business through a Quality Engineering approach. We must balance the productivity of today and tomorrow against the complexity and scalability of the integrated tools.
This article covers the tools necessary to secure the management of your repository, including technical debt management. The examples mentioned are not meant to be exhaustive or prioritized. We do not cover documentation, IDE, and communication topics.
The categories of tools are ordered by the stages of our development process. So let’s start by managing the code in our repository.
It all starts with repository management
The first subject is storing our code in a shared, accessible, durable, secure, and extensible location. The evolution of available platforms and the reduction of storage costs have eased this choice.
The repository platform must be able to provide workspaces and hierarchies, particularly in the case of multirepo or polyrepo models. These models generate many repositories that need organization, like a library and its books.
For any model, search functionality covering code, projects, and components is necessary. Searching a large repository can be supported by code-specific search engines. Cody, Lerna, or Octolinker are possible solutions.
A structured organization of the repository will support proper access management. The goal is not to block access by default; that will depend on the engineering culture we want to create. The main objective is to clarify the responsibilities of the different projects for the rest of the development chain.
We must pay particular attention to the case of the monorepo. Its growth and centralization create the need for faster search, partial project downloads, and package management. Atlassian shares best practices in the articles Monorepos in Git and How to handle big repositories in Git. Microsoft also provides Rush for package management.
A split-repo model also has its specificities. We need a solution to switch from a monorepo to a multirepo before the build; Split.sh is a reference here. Many tools are also available to manage repository migrations and separation: Bazel, Tomono, or this list from Shopsys.
Teams can then start contributing to the codebase.
Version management supporting reviews
Version management relies on the concepts of commits and branching models. Coordination, orchestration, and process are the real subjects to address.
Development practices have evolved to integrate peer reviews systematically. This practice requires supporting mechanisms such as pre-merge or pre-submit checks, or more flexible hooks. CI solutions that integrate Git natively support these mechanisms. It is possible to complement this approach with Pre-commit.
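As a minimal sketch of what such a hook can do (the rule and file names are illustrative, and a real hook would read the staged file list from Git), a pre-commit check can reject changes that still contain debug statements:

```python
import re

# Hypothetical pre-commit rule: flag leftover debug statements.
# In a real hook, the files would come from `git diff --cached --name-only`.
DEBUG_PATTERN = re.compile(r"\b(pdb\.set_trace|breakpoint)\(")

def check_content(filename, content):
    """Return a list of (line_number, line) pairs that violate the rule."""
    return [
        (number, line)
        for number, line in enumerate(content.splitlines(), start=1)
        if DEBUG_PATTERN.search(line)
    ]

violations = check_content("app.py", "x = compute()\nbreakpoint()\n")
print(violations)  # → [(2, 'breakpoint()')]
```

A hook like this exits non-zero when violations are found, which blocks the commit until the developer cleans up.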
Code reviews are effective when they combine automated and manual processes.
Automation must focus on repetitive tasks with no added value for a human. It is essential to secure governance and inclusion in team processes to capture their value. Standards will also enable the replicability, scalability, and maintainability of the tools put in place.
SonarQube and Codacy are tools that fulfill this task. Alternatives specifically support reviews, such as Pull Review.
Processes have three types of execution: manual, hybrid, and automatic. Some rely entirely on human initiative, execution, and control. Others, like peer reviews, are hybrids with part of the process automated. Even fully automated processes may require a regular manual review of their results, as with an AI.
Next, we look at the tools to build our applications, without forgetting to manage their dependencies.
A build and dependency management system
The build of our application is a cornerstone in the life of our software. Its objective is to produce the executables, which will then be installed and run in the different environments. The complexity inherent in the code materializes here.
A growing monorepo needs to keep build times under control to remain usable. It is necessary to compile and test only the relevant dependencies. Bazel comes from Google, Buck from Facebook, and Pants from Twitter. Yarn-inspired solutions are also emerging, such as Baur or Oao.
The multirepo model may require dependency management depending on its implementation. Shared libraries tend to create dependencies to manage between your different projects. Anti-pattern implementations can also create cyclic dependencies, which can be detected via static analysis tools.
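A sketch of the kind of check such a static analysis tool performs, on a toy dependency graph with illustrative project names:

```python
# Detect a cyclic dependency in a project dependency graph.
# `graph` maps each project to the projects it depends on.
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            if color.get(dep, WHITE) == GRAY:          # back edge: cycle
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

deps = {"billing": ["auth"], "auth": ["core"], "core": ["billing"]}
print(find_cycle(deps))  # → ['billing', 'auth', 'core', 'billing']
```

Real tools work on parsed imports or build files rather than a hand-written dictionary, but the underlying graph traversal is the same.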
Bots have appeared to help with dependency management, such as Dependabot. They focus mainly on version updates. More complex analyses require more tooling, as is the case at Google.
We can start building applications once our dependencies are under control. Here we find traditional Continuous Integration (CI) solutions, sometimes specific to certain technologies. These include Jenkins, GitLab CI, Azure DevOps, and Bitbucket.
Mechanisms for refactoring at scale
All the changes made to the codebase create a complexity to contain. Review mechanisms focused on single changes are not sufficient; their prism is too narrow. We need tools that support a broader, refactoring-at-scale approach.
Humans can perform analysis on our codebase. However, the size, complexity, and speed of the changes require supporting tools. Imagine the time needed to identify improvements across an entire monorepo or 100 repositories manually.
The major players are aware of the risk of technical debt and the associated slowdown. They have invested significantly in their refactoring capability. Google, for example, created Rosie, which makes automated refactoring proposals. Other solutions are available, like Turbolift from Skyscanner, AutoRefactor, or Dependabot.
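A minimal sketch of a scripted refactoring pass in that spirit: apply one mechanical change across every file of a checkout, then let humans review the resulting diff. The function and file names are illustrative, and real tools work on syntax trees rather than regular expressions.

```python
import re
from pathlib import Path

def rename_call(source, old_name, new_name):
    """Rename calls to `old_name(...)` into `new_name(...)` in one source string."""
    return re.sub(rf"\b{re.escape(old_name)}\(", f"{new_name}(", source)

def refactor_tree(root, old_name, new_name):
    """Apply the rename to every .py file under `root`; return changed paths."""
    changed = []
    for path in Path(root).rglob("*.py"):
        before = path.read_text()
        after = rename_call(before, old_name, new_name)
        if after != before:
            path.write_text(after)       # the diff is then sent for review
            changed.append(path)
    return changed

print(rename_call("total = fetch_user(42)", "fetch_user", "get_user"))
# → total = get_user(42)
```

The automated part proposes the change everywhere at once; the review step stays human, as with Rosie's proposals.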
Functional refactoring remains best suited to humans today. The business context and possible functional changes are difficult to model for the moment. Speaking of modeling, the next topic combines static analysis, governance, and automation.
API and data governance
Data has evolved into an increasingly valued asset for organizations. For some, it is their main raison d’être. Data consistency is therefore a structuring factor for its collection and use.
The acceleration of data creation brings out the need for tools. Applications are at the origin of the various data available, with a data structure specified by developers in their code. The multiplication and distribution of teams require real data governance.
Data dictionaries support database consistency through conformance checking. For event models, they start to be backed by schema management. The ecosystem is converging around this data governance, hopefully with standards like AsyncAPI.
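A toy sketch of what conformance checking against a data dictionary means in practice (the dictionary, field names, and types here are invented for illustration):

```python
# Hypothetical data dictionary: each field declares its expected type.
DATA_DICTIONARY = {
    "order_id": int,
    "customer": str,
    "amount": float,
}

def conformance_errors(record):
    """Return a list of human-readable violations for one record."""
    errors = []
    for field, expected in DATA_DICTIONARY.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

print(conformance_errors({"order_id": 1, "customer": "Ada", "amount": 9.9}))  # → []
print(conformance_errors({"order_id": "1", "customer": "Ada"}))
```

Schema registries and standards like AsyncAPI industrialize exactly this kind of check across teams, so producers and consumers agree on the data structure.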
The next step is deploying the application to the different environments, without forgetting to test.
Test tools integrated into our chain
Testing is a structuring part of our delivery chain for any repository model. The important point to consider is its integration with our code organization model.
The stage of building our application must include the tests to be carried out as close as possible to the code. CI solutions support the execution of unit, integration, or other test types as part of the application’s build. A monorepo requires a specific test approach that allows testing only the impacted modules first.
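A sketch of that "test only what changed" selection, on an invented module graph: from the module dependency graph and the list of changed modules, compute every module whose tests must run, i.e. the changed modules plus all their transitive dependents.

```python
def modules_to_test(dependencies, changed):
    """`dependencies` maps a module to the modules it depends on."""
    # Invert the graph: who depends on whom.
    dependents = {}
    for module, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Walk outward from the changed modules.
    to_test, frontier = set(changed), list(changed)
    while frontier:
        module = frontier.pop()
        for dependent in dependents.get(module, ()):
            if dependent not in to_test:
                to_test.add(dependent)
                frontier.append(dependent)
    return to_test

deps = {"api": ["core"], "worker": ["core"], "core": [], "docs": []}
print(sorted(modules_to_test(deps, ["core"])))  # → ['api', 'core', 'worker']
```

Build tools like Bazel derive this graph from build files and changed paths; the payoff is that a change to `docs` here would trigger no other module's tests.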
We then come to the stages of deployment in the different environments. We need the ability to run tests as stages of our CI/CD pipeline, also known as quality gates. Other tests must instead run outside the pipeline depending on the organization, such as functional or end-to-end performance tests.
From the operational perspective, monitoring is a type of testing that we have to include in a broader observability approach. We complete our tooling for more responsiveness with feature flags.
Activation and rollback mechanisms
The ability to manage feature activation requires inclusion from the design stage. We must assess its benefits in terms of relevance, perimeter, and activation rules.
Managing feature flags requires code implementing the rule and its activation mechanism at runtime. We can do this with some basic code consisting of an “if” and a variable in a configuration file. We can also leverage advanced tooling supporting additional requirements such as continuity of service, accessibility, and access management. A solution external to the application also facilitates a decoupled approach. Solutions such as A/B Tasty, LaunchDarkly, or Split.io are recognized players.
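The basic “if and a variable” version looks like this; the flag names are illustrative, and the configuration is inlined here instead of living in a separate file:

```python
import json

# In practice this JSON would be read from a configuration file
# (or an external flag service); it is inlined to keep the sketch
# self-contained.
CONFIG = json.loads('{"flags": {"new_checkout": true, "dark_mode": false}}')

def is_enabled(flag_name):
    """Return whether a flag is on; unknown flags default to off."""
    return CONFIG["flags"].get(flag_name, False)

def checkout():
    if is_enabled("new_checkout"):
        return "new checkout flow"
    return "legacy checkout flow"

print(checkout())  # → new checkout flow
```

Dedicated solutions add what this sketch lacks: runtime updates without redeployment, targeting rules, audit trails, and access management over who can flip a flag.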
Feature flags nevertheless require lifecycle management up to their deletion. Their lifespan differs according to usage; some can be permanent. We have to combine practices and tools to get there. You can read this article on removing feature flags, Piranha from Uber, or the recommendations from Featureflags.io.
Visibility is then fundamental to our ability to manage our repositories.
Reporting, dashboard, and Process Mining
Our engineering choices must increase the value of our overall business system. We must guide its management with measurement and visualization.
We can use relatively standard graphing solutions to visualize our repository metrics. Some solutions mentioned above natively provide visualization. We can also use dashboarding and reporting solutions such as Grafana, Kibana, or Power BI.
Our activities are, moreover, rarely static. Our development chains are successions of events, stages, and components that form processes. The concepts of event-driven architectures, value streams, and Flow have emerged. This is where Process Mining appears, to measure and visualize these processes.
Tools at the service of your engineering productivity
Managing a repository requires a much broader approach than just storing code. The different types of tools to integrate show that a real architecture is necessary.
We must keep a real focus on our company’s objectives to bring value at the end of the day. A purely technological approach will create complexity by multiplying solutions, which ends up being counterproductive.
Our system must enable rapid iteration cycles, supporting the growth and maintainability of our codebase. The relevance of the features offered, activated or deactivated, remains specific to each organization.
Setting up such an engineering system may be questioned from a business point of view. Our approach must be aligned, shared, and linked to business issues.
Isn’t code one of our organization’s main assets?
References
A curated list of monorepo tooling: https://github.com/korfuri/awesome-monorepo
Turbolift, a tool for refactoring at scale: https://medium.com/@SkyscannerEng/turbolift-a-tool-for-refactoring-at-scale-70603314f7cc
11 tools to build a monorepo in 2021: https://blog.bitsrc.io/11-tools-to-build-a-monorepo-in-2021-7ce904821cc2