In the Agile world, there is a concept of a Discovery Sprint: a one-to-two-week block of time in which a team tackles a business problem, develops an understanding of the domain, and explores a few solutions.
This is often accompanied by a proof of concept ‘delivery’ phase.
This PoC should produce:
- A strong indication as to whether an idea is worth pursuing
- Demonstration that the team is capable of delivering
- A solid base to develop the project further
It’s important to not only demonstrate the technology, but also establish a way of working that will lead to both short term and long term success.
For us, this means incorporating strong DevOps principles into our data analytics projects.
DevOps is a set of practices aimed at making releases faster and more stable by automating as many operational IT tasks as possible. The idea is to invest some upfront time in developing process automation, which will yield time savings every subsequent release.
This, however, doesn’t produce any user-facing features that can be showcased.
So then, why should the team invest any time into DevOps during the discovery sprint?
Manual tasks for building, testing and deploying a codebase will sap the team of time that could be spent on feature development.
Yet time invested in automating those tasks is also time not spent developing the product.
Deciding how much time to spend on DevOps is difficult at any stage. A solid go-to method I've found: automate upfront only the tasks that will pay for themselves within the two-week sprint.
Set Up Your Git Repo
Get into the habit of starting a repo before you write a single line of code. Having the repository set up allows any team member to instantly jump onto the project when needed.
Once you’ve sent a team member the link to your repo, stop pushing directly to “master”. Establish a development workflow that allows the whole team to work on the same codebase seamlessly.
I’d recommend a simple feature branch workflow for the first few sprints.
Feature branches should only exist for a few hours before being merged into master. This minimises the risk of code conflicts and allows your team to leverage your work faster.
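The whole workflow boils down to a handful of git commands. Here's a minimal, self-contained sketch; the temp directory, file contents, and branch name `feature/smoke-test` are purely illustrative:

```shell
set -e
repo=$(mktemp -d)                          # throwaway repo for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # ensure the default branch is "master"
git config user.email dev@example.com
git config user.name Dev
echo "select 1" > model.sql
git add model.sql
git commit -q -m "Initial commit"

git checkout -q -b feature/smoke-test      # branch off master for the change
echo "select 2" > model.sql
git commit -qam "Add smoke test"

git checkout -q master                     # merge back within a few hours
git merge -q --no-ff feature/smoke-test -m "Merge feature/smoke-test"
git branch -d feature/smoke-test           # delete the short-lived branch
```

Keeping the merge step this lightweight is exactly what makes short-lived branches practical: there's no ceremony standing between finished work and master.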
Implement a Smoke Test
If we’re going to automate releases, we need some level of assurance that we won’t break the system without knowing it.
Our favorite data transformation tool, dbt, allows us to implement this using schema tests.
Implementing unique tests on primary key or surrogate key fields in our transformations helps identify bad joins or aggregations resulting in multiple records for the same entity.
Breaking changes to the codebase will also surface in these tests.
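In dbt, such a uniqueness check is a few lines of YAML in a schema file. A minimal sketch, where the model name `dim_customer` and column `customer_id` are hypothetical:

```yaml
# models/schema.yml -- model and column names are illustrative
version: 2

models:
  - name: dim_customer
    columns:
      - name: customer_id      # surrogate key for the customer entity
        tests:
          - unique             # flags fan-out from bad joins or aggregations
          - not_null
```

Running `dbt test` then fails the build whenever a join or aggregation produces duplicate customer records, which is precisely the smoke-test signal we want.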
Comprehensive tests that validate business logic should be developed in subsequent sprints as a solid understanding of the requirements is formed. In the first sprint, however, high-level signals to flag bugs will provide significant value.
Automate Your Deployments
Here’s the big time saver: automate the process of pushing your codebase to the server or cloud service that runs it.
An example approach is to create a GitLab CI/CD pipeline with three stages:
- Containerise the application (Build)
- Run the newly created container through its tests (Test)
- Deploy the new container (Deploy)
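Those three stages map directly onto a `.gitlab-ci.yml` file. A sketch under stated assumptions: the Docker image tags use GitLab's predefined CI variables, while `deploy.sh` is a hypothetical stand-in for whatever pushes the container to your runtime:

```yaml
# .gitlab-ci.yml -- job commands are illustrative
stages:
  - build
  - test
  - deploy

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

test:
  stage: test
  image: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
  script:
    - dbt test                  # run the smoke tests inside the new container

deploy:
  stage: deploy
  script:
    - ./deploy.sh "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"   # hypothetical deploy script
  only:
    - master
```

Restricting the deploy job to master pairs naturally with the feature-branch workflow: every merge to master ships automatically.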
If this saves 15 minutes every time the team wants to release an update, and the team pushes five changes a day, that's 75 minutes a day, or 12.5 hours over the ten working days of the sprint.
With practice this simple pipeline will only take an hour or two to set up, so it’s well worth the time investment.
Remember, the earlier we automate deployments, the more time savings we’ll accumulate.
Be sure to take care of this as soon as possible to get the most out of your work.