A Simple and Consistent Workflow Model
In essence, if we want to direct our lives, we must take control of our consistent actions. It’s not what we do once in a while that shapes our lives, but what we do consistently. — Tony Robbins
This is just a small story of how a simple daily process saved me a great deal of time in identifying the root cause of an issue, with exact data and a timeline of its occurrence.
I have been using GitLab for about 15 months now. It has greatly helped my team and me deliver high-quality software rapidly, on demand. I work in a startup environment, where having a good SDLC process is sometimes difficult. Sometime last year, I got the opportunity to work as a standalone backend developer on an enterprise project. I was a newbie at the time, and this role was entrusted to me pretty fast after joining, so the stakes were high for me. I got the chance to make a lot of creative decisions, from gathering requirements and data modeling to deployment and testing. I am glad that I chose a process model and stayed consistent in following it. GitLab, GitHub, and Azure DevOps all provide a way to manage the entire SDLC with a single toolchain.
I have been following certain practices using GitLab from the beginning of development.
- Every user requirement is turned into an Issue (a user story, in Agile terms).
- Every planned delivery is grouped into a Milestone (a software delivery).
- Every environment (dev, staging, etc.) has a separate branch, and code is constantly pushed and integrated.
- Every commit/MR is tagged to an Issue, and every MR to the active branch triggers a CI job (automatic quality, security, and standards checks on code push, before integration).
- Every successful CI job triggers an automated deployment to the corresponding environment.
- If the current deployment breaks anything, it can be rolled back to any previous deployment.
The premise of this post is not the entire pipeline, but the first two practices: creating Issues and tagging each commit/MR to its related Issue.
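In GitLab, tagging a commit or MR to an issue is as simple as referencing the issue number in the commit message; GitLab then cross-links the commit under that issue's activity feed. A minimal sketch in a throwaway repo; the issue number (#232) comes from the story later in this post, everything else here is hypothetical:

```shell
# Demonstration in a throwaway local repo.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.email demo@example.com
git config user.name demo

# Referencing "#232" makes GitLab cross-link this commit under issue 232;
# "Closes #232" would additionally close the issue once the MR is merged.
git commit --allow-empty -qm "Split summary widget into two widgets (#232)"

# Later, history can be searched by issue reference.
git log --oneline --grep='#232'
```

Merge request titles and descriptions support the same `#` references, so every MR also shows up on the issue it implements.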
Recently we came across an issue: an API response mismatch in one of the dashboard's widgets (there are currently 30+ widgets, each with its own unique response). A recent code push had broken one of the chart responses. This was entirely unexpected, as everything had been going smoothly. When notified, I assumed it was due to my current working changes. But to my surprise, it was not. To be honest, the issue already existed in production when I checked, although it surfaced only in certain scenarios. It had been waiting for that scenario to occur so it could show itself in all its glory and send me running haywire over what caused it.
But having an established process of Issue creation and commit tagging allowed me not only to track the root cause of the issue, but also to pinpoint when it first occurred. While we were working on milestone 03, a change request for a feature update was raised: splitting one of the widgets into two separate widgets. Both widgets would share some common functionality.
This is how it happened. …
- A feature update was raised. I worked on the change, tested it, and deployed it. Some more bugs related to the issue were raised later on; I fixed those, and it was working like a charm.
- But here is what actually happened. Instead of fixing one widget, I changed the functionality of both widgets. Yes, the mistake was mine. They looked similar, my unit tests didn't catch it, and neither did my quality engineer. It happens. Everything worked without any issues until we came across a scenario where this functionality had to be reused for another widget.
- I updated the code, pushed it, and poof… error. When the issue was raised and I looked into the code, I had no idea how it had come to look like that instead of the expected functionality.
- OK, let's skip the boring details. What I did was search for that issue/user story and the issues related to it.
- I found the 3 relevant issues and their related issues, 6 issues in total, with some 15 MRs (merge requests) linked to them. That's it. I found the origin of the requirement in issue #232, which had been registered in response to end-user issue #91, and traced it to the suspect MR !927.
- All I did was copy the functionality from the previous commit and replace the broken version.
- Tadaaa… the issue was gone. What should have been hours of crunching over why, how, and who took me only 5 minutes: pinpointing an issue that had been introduced 6 months earlier.
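The trace-and-restore steps above can be reproduced with plain git commands. A toy-repo sketch; the issue number comes from the story, while the file names and contents are purely illustrative:

```shell
# Build a toy repo with a good commit (tagged to the issue) and a bad one.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.email demo@example.com
git config user.name demo

echo "good widget logic" > widget.txt
git add widget.txt && git commit -qm "Add widget functionality (#232)"

echo "accidentally rewritten logic" > widget.txt
git add widget.txt && git commit -qm "Unrelated change that broke it"

# 1. Find the commit that introduced the functionality via its issue tag.
good=$(git log --format=%H --grep='#232')

# 2. Restore just that file from the good commit and re-commit it.
git checkout "$good" -- widget.txt
git commit -qam "Restore widget functionality (fixes regression from #232)"
```

The same lookup works from the GitLab UI: open the issue, follow its linked MRs, and browse the file at the MR's commit.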
There is nothing special or unique about this. Every developer and team in any org does something similar. But I was relatively new to a project of that scale, collaborating with others and working on other projects as well. The only challenge was to stick to the process.
Without this, I might have applied a temporary fix to logic that was already calmly and perfectly written, adding one dump of rushed temp-garbage code and prompting all further changes to follow the dumpster model, because my goal would have been to fix the issue as soon as possible. Haste pushes the need for a temporary solution, which in turn pushes the need to write ugly-looking code. And who is going to look after it? I would be done once I was out of it, and it would be up to the next developers to figure it out, bashing the monitor and continuing the haywire model. And, ladies and gentlemen, that is where the fall of nice-looking code and design patterns would have occurred.
Phew… nothing like that happened. A small, persistent process saved me, and whoever works on this code in the future, many hours of staring at the monitor and bashing our heads. So yes, fail fast and recover faster with a good SDLC. Remember: every minute an issue sits stagnant in production tarnishes your reputation with end users. It's there; it may not be visible, it's bottled up, and it will come out with a blast one day. We need not wait until then, right? One of the philosophies of DevOps: fix issues before the user reports them, and fix them immediately if the user does. And if we cannot find the root cause of an issue fast, how are we supposed to fix it fast?
- Tell me: if you were working on someone else's code with this same problem, with no idea what goes from where to where, what would you do?
- If there is no process and the code is already a mess, would you try to create one and clean it up, or would you think, well, it has been like that for a long time?
Generally, everyone is always trying to do something, to do better. But it is not what we try that matters; it is what we try consistently. We need to start somewhere: initiate it, start experimenting, accept failures, and continue experimenting until we achieve what we are looking for. That is a never-ending process of continuous improvement. We can keep setting new standards of quality and process and keep improving them. All we need, in the end, is to start a practice and remain consistent with it. I am glad that I started and followed one. Thinking about it now, it was never part of my role, but the small amount of extra time I put into following the process was definitely worth it.