Gartner published that 80 percent of unplanned downtime is caused by people and process issues, including poor change management practices. As managers, we either create or reference a change management policy and expect our teams to follow it. That policy will likely have to be general enough to address various departments, technologies, and scenarios, or it might concentrate on navigating an application or a bureaucratic workflow. It may therefore be beneficial to distill the policy down to something short, blatantly clear, and actionable. I typically publish something like the following under a title such as "Change Behaviors" or "Change Guidelines," and either include it in or reference it from the team operating procedures.
The team has a responsibility to ensure that all changes to Production are orderly and minimize disruption. For all configuration work, builds, and cutovers, the team will follow the methodology below. For work in Production or in other environments under change control, adherence also requires additional formality, including the IT change management process governed by the CAB (Change Advisory Board) | CCB (Change Control Board).
- Plan: Plan the work and think through the tasks; consider adherence to standards and best practices; consider the risks and the corresponding mitigations and contingencies; determine the time required to perform the tasks and, if necessary, to roll back; determine the measure of success and the tasks needed to validate it (test plan); if the change is not routine or involves multiple parties, document the plan
- Assess: Assess the risk and the impact of the work on Production, other important non-Production operations, peers and end-users
- Communicate: Communicate the work to be done and its potential impact to all concerned; typically, this will require communicating with the CAB | CCB and end-users when scheduling, and with the Help Desk and peers when performing the work
- Backup: Back up the configuration of the CI (configuration item) to be changed; ensure there is a working plan to roll back to the original configuration in a timely manner
- Configure: Configure the CI as planned
- Test: Test the CI change. Note that this concept of testing encompasses two scenarios. First, and ideally, the change is tested in a Test, Development, or Sandbox environment before it is implemented in Production. Second, the change should be validated with a brief test once it is implemented in Production. This is generally referred to as a "smoke test" and is a subset of the testing performed in the non-Production environment.
- Validate: Validate that the test results match the expected measure of success
- Communicate: Communicate the status of the change and its verification
- Monitor: Monitor the change over the hours or days following the work
- Document: Document the new configuration, any problems and their resolutions, and any ancillary matters noticed while performing the work (see step 5, Configure)
The team should disengage from an outage, change, or cutover event only with the approval of the event lead. If no event lead has been designated, the team lead | supervisor | manager | director will serve in that role.
Regardless of when change participants disengage from the change event, all participants, or at least the designated ones, must remain available and reachable to respond to any repercussions of the change. In addition, some or all of the participants should take on responsibility for the Monitor step above.