Neuma White Paper: Change Package-based versus Task-based Change Management of Software

Over the past few years, the Software Configuration Management (SCM) industry has recognized that designers work most naturally by updating software a change at a time rather than a file at a time. For example, fixing one problem may require modifying three files, while fixing three related problems may require changing only two files. Traditional file-based change control handled this by recording the same problem report reason against each file as it was submitted to the software library. Now that a change packaging model has been accepted, the design of the model must ensure that it does not introduce a new set of pitfalls. To this end, the authors propose a clear distinction between explicit change package-based change control and task-based (or implicit change package-based) change control.

Change Packages versus File-based Change Control

The pitfalls with the traditional file-based approach to change control are well understood by the industry today. These include:

the need to double key the reason or reasons for the change against each file,

the need to ensure that all of the files get submitted to the library,

the need to check in files individually,

the need to ensure that all files for the change are promoted through the release life-cycle, and

confusion between dependencies used to group files together into a change and those used to identify prerequisite changes.

As well, the paradigm of working on a file at a time, though convenient for a compiler, is not convenient for a designer or system integrator. For example, the designer wishes to submit a change, view a delta report for a change, run a change through the code review process, promote a change to ready status, merge a change from one stream into another and express dependencies between changes. The system integration team wants to select changes for a build, promote changes, yank changes, and so forth. In a file-based environment, a system integrator may have to look through a list of 30 or 40 file revisions and try to determine which ones to promote and which belong together. In a change-based environment, an organized list of perhaps a dozen changes is easily scanned and the appropriate ones promoted.

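Once the concept of change packages has been accepted, the next step is to identify a successful model for the changes. This paper discusses the two basic models: explicit change package-based change control and task-based (implicit change package-based) change control.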

The paper does not try to differentiate between the various checkout/checkin models. Hence, it should be possible to configure any of the models to support one or more of the "locking checkout/checkin", "recorded non-locking checkout/reconcile on checkin" or "unrecorded non-locking checkout/reconcile on checkin" models, with or without notification triggers or logic.

Change Packages versus Task-based Change Control

Change packages define changes by collecting files and other appropriate data together in a first class change object. Task-based change control works by associating a task id with each file that is checked out. The task id relates directly to a problem or feature definition (and may itself be the problem or feature identifier). On the surface it may appear that there is no major philosophical difference between the two models. However, with respect to implementing a change process, there is a significant difference. A review of each method helps to illustrate the differences.

Change Control With Change Packages

A change package is a collection of information pertinent to a change (in our example, a software change) to a (software) baseline. A change package is a first class object. It has an identifier which can be used to uniquely identify both a set of files which have been modified, and the reasons for their modification. Typically, the change package contains other information pertinent to the change. Because the change is used to "update" a software baseline, it will typically contain additional information which is process dependent. This may include information such as:

the software development stream, and perhaps a milestone within the stream, to which it is targeted (e.g., new development or older support stream),

integration instructions, including instructions for resolving an incompatibility if the change is not upward compatible (e.g., run a database conversion program),

special process-related metrics,

environment data (e.g., where are the files kept while the change is being made),

compatibility and pre-requisite change dependencies, and/or

the designer, reviewer(s) and testers for the change
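The description above can be sketched as a small data structure. This is a minimal illustration only: the class name, field names, identifier format and file paths are all hypothetical and are not taken from any particular CM tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangePackage:
    """A first class change object (all field names are illustrative)."""
    change_id: str                      # unique identifier for the change
    designer: str                       # who is making the change
    stream: str                         # target development stream
    reasons: list[str] = field(default_factory=list)        # problem/feature ids
    files: list[str] = field(default_factory=list)          # files modified by the change
    prerequisites: list[str] = field(default_factory=list)  # changes that must go in first
    integration_notes: str = ""         # e.g. "run a database conversion program"
    status: str = "open"                # current promotion level

    def check_out(self, path: str) -> None:
        """Record a file as part of this change; repeats are ignored."""
        if path not in self.files:
            self.files.append(path)


# Hypothetical usage: one change, one reason, two files.
change = ChangePackage("jsmith.0042", designer="jsmith", stream="rel2",
                       reasons=["PR-101"])
change.check_out("src/title.c")
change.check_out("src/title.h")
change.check_out("src/title.c")   # already part of the change, so nothing is added
```

Because the files, reasons and process data all hang off one identifier, later stages can work from the change identifier alone.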

The goal behind the change package philosophy is that the change can proceed through the various stages of promotion without involving participants from the previous stage. The most common involvement from the traditional file-based model was in identifying which files belong together for a change. In most shops, however, this is only a part of the change puzzle.

When changes are made using change packages, files are checked out against a change package as required. The change package can have one or several reasons. In a well-designed model, the system can optionally enforce that the reason or reasons used for the change (e.g., the problem or problems being fixed) are valid; that is, that the reasons have not disappeared over time (e.g., problems already fixed), that the developer is authorized to quote the reason in the developer's work area, and so forth.
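The optional reason check described above might look roughly like the following sketch. The problem records, status values and stream names are assumptions made for illustration; a real tool would also check the developer's authorization, which is omitted here.

```python
# Hypothetical problem records, keyed by problem id.
problems = {
    "PR-101": {"status": "open", "stream": "rel2"},
    "PR-090": {"status": "fixed", "stream": "rel2"},
    "PR-077": {"status": "open", "stream": "rel1"},
}


def invalid_reasons(reasons, stream):
    """Return {reason: why it is rejected} for reasons that may not be quoted."""
    rejected = {}
    for reason in reasons:
        record = problems.get(reason)
        if record is None:
            rejected[reason] = "unknown reason"
        elif record["status"] == "fixed":
            rejected[reason] = "already fixed"
        elif record["stream"] != stream:
            rejected[reason] = "not raised against this stream"
    return rejected


# A checkout against stream "rel2" quoting three reasons.
report = invalid_reasons(["PR-101", "PR-090", "PR-077"], stream="rel2")
```

Here only PR-101 would be accepted; the other two are reported with the cause of rejection.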

When the change is completed, a delta report can be produced by specifying the change identifier. The change can then be submitted to the software library as a single unit, and it stays together throughout the life-cycle of the software. If a change implements a feature and fixes a problem at the same time, the problem and feature states can be automatically promoted, though the promotion criteria for features typically differ from those for problems. A problem is typically fixed by a single change whereas a feature is often implemented as a series of changes, possibly by several designers.

With change identifiers, it is helpful to include the designer's "login id" or initials as part of the identifier for easy recognition. Often, changes are viewed by department (e.g. review all changes made by my staff this week). It is important that the CM tool permits easy access to changes by department, owner, or date to help maximize designer and manager effectiveness.

Tasks, Features and Problems

The term task is often used to identify work to be done. Often tasks reflect items which are typically collected into one or more categories. For example, tasks may be split into "feature" implementation tasks and "problem" resolution tasks. Features and problems each have separate process flows, which differ from the "change" process flow. Features and problems are typically tracked against functional areas of the software product. Changes are typically tracked against design architecture areas of the software design. It often makes sense to implement a feature in multiple changes rather than in a single change. The vast majority of problems are resolved using a single change. Those portions of a change which will impact the development environment the most are sometimes done early (so that there is more time to absorb the change) or even as part of a previous release; often they are done to coincide with an API definition freeze to ensure the stability of the development environment going forward. This is a critical requirement in large projects.

A "problem" definition typically consists of a problem description (which is really a mini-specification), a priority, an owner and/or an assignee, an audit trail indicating how the status, priority, responsibility, disposition, etc. of the problem has changed over time, a test case and tester. The problem may be tracked against a single software baseline or against multiple streams. The process for a problem may depend on such factors as the priority. Typically, a problem review board or a design manager reviews and accepts problems and ensures that the priority, owner/assignee and description are correct.

In a similar fashion a "feature" definition may contain information such as proposed release, a specification document, specification review comments, planned/forecast and actual target dates, dependencies upon other features, effort and cost estimates, owner/assignee, tester, test cases, approvals, and so forth. Typically, the overhead and approvals associated with a feature definition are more substantial than with a problem report, since the problem indicates that a change is needed to match an already approved specification. As such the processes for problems and features are different.

Feature definitions usually arise out of a product management function which identifies feature requests up front and prioritizes them for roll up into the product over time. The prioritization typically affects the release to which the feature is assigned. Problem reports usually arise out of problems within a given product stream, which may or may not apply to other release streams.

Task-based change control arises when a task is tied directly to a specific feature or problem (i.e. a one-to-one relationship) or (more often) when the feature or problem definitions themselves are used as the task definitions.

Task-based Change Control

When task-based change control is used, each file which is modified is related to a specific task (i.e. to the associated problem or feature). The way to identify the files associated with a task is to traverse the "file revision" database to see which files refer to a given task. This can be a slow procedure in large, long-term projects, although inverted relationship implementations, where a task to files relation exists, may be used to help improve performance.
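The two lookup strategies can be contrasted in a short sketch. The revision records and task identifiers below are invented for illustration; the point is only that the first function visits every revision record, while the second consults a task-to-files relation built ahead of time.

```python
from collections import defaultdict

# Hypothetical "file revision" records: (file, revision, task id).
revisions = [
    ("title.c", 4, "PR-101"),
    ("title.h", 2, "PR-101"),
    ("menu.c", 7, "FEAT-12"),
    ("title.c", 5, "PR-102"),
]


def files_for_task_scan(task_id):
    """Traverse every revision record; the cost grows with project history."""
    return sorted({name for name, _rev, task in revisions if task == task_id})


# Inverted relation (task -> files), built once from the revision records.
task_to_files = defaultdict(set)
for name, _rev, task in revisions:
    task_to_files[task].add(name)


def files_for_task_indexed(task_id):
    """Look the task up directly in the inverted relation."""
    return sorted(task_to_files.get(task_id, set()))
```

Both functions return the same file list; the inverted relation trades extra bookkeeping on every checkin for a faster query.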

As with change packages, the information against a task is used to describe and control a task. Unlike change packages, there is no first class object to contain information associated with a change, so the feature and problem data schema are compromised to include change data, or the required change data is not tracked, or is tracked against each file of the change.

There are two basic ways to organize task-based change control. One is to have a single table of "tasks", regardless of whether the task describes a feature, a problem or otherwise. This in turn implies that the same information is tracked for both features and problems, further compromising the data schema.

A second way to organize task-based change control is to let the task identifier be the identifier used for two separate tables, one for problems and one for features. A significant problem with this approach is that there is no longer a single type of entity through which change control and configuration status accounting can be done. As a result, a hybrid, and significantly more complex, change control process results. Many databases have difficulty dealing with hybrid sets of records and so each must be dealt with separately. For example, if a query is made to identify recent changes, the query would have to traverse both a "problems" and a "features" table. Because of this complication, it is often the case that no change information is stored in the table.

Reasons to Avoid Task-based Change Control

A typical task-based change control solution will coerce a problem or feature definition into a task definition. In this way a separate task does not have to be created for each problem or feature. However, this saving means that there is no first class object corresponding to a change. This is a fundamental difference between task-based and change package-based change control.

Task-based change control puts a different perspective on change management than is found in most software development shops. A summary of some of the problems with task-based change control follows.

Can Only Work On One Task Per Change

Change control becomes confused if an attempt is made to work on two related tasks in a single change. This is a significant concern because often a new or modified feature will inadvertently fix a number of problems. Or consider the case where three problems in a related area need to be fixed. One task identifies that a title needs its spelling corrected, another that the font size is too small, and another that the title is not centered properly. Although three tasks exist, the designer realizes that all three are easily fixed in the same program module. With task-based change control, each task is worked on independently requiring three checkout-edit-compile-test-checkin cycles. With change package-based change control, the change references the three problems to be fixed, so that only a single checkout-edit-compile-test-checkin cycle is required.
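The title example can be expressed as data to make the difference concrete. The problem ids, change id and file name are hypothetical.

```python
# Three related problems from the example above (ids are invented).
problems = {
    "PR-201": "title is misspelled",
    "PR-202": "title font size is too small",
    "PR-203": "title is not centered",
}

# Task-based: the task id is the only grouping key, so the same module
# goes through one checkout-edit-compile-test-checkin cycle per problem.
task_based_cycles = [{"task": pid, "files": ["title.c"]} for pid in problems]

# Change package: a single change object quotes all three problems as reasons,
# so one cycle covers them.
change = {"id": "jsmith.0043", "reasons": list(problems), "files": ["title.c"]}
package_cycles = [change]

# Both approaches account for the same problems; only the cycle count differs.
fixed_by_tasks = {cycle["task"] for cycle in task_based_cycles}
fixed_by_package = {reason for c in package_cycles for reason in c["reasons"]}
```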

Inability to Work on One Task Over Several Changes

Often a single feature requires several changes to implement it. For example, the first change might introduce schema and interface changes while additional changes would implement the associated feature logic. Although this is supported in a task-based scenario, it is not possible to track the separate changes, as they all look like a single change. This means that the changes cannot be independently promoted through the system. This can be mitigated somewhat by defining several tasks for a feature implementation, but this has side effects including:

the overhead of definition and association of the related tasks to assess the feature status,

authority control on the tasks table, which can no longer be controlled solely by the project management team since the design team needs to break tasks up into implementation units, and

the risk that completion of the first task or tasks erroneously signals completion of the feature implementation to others on the team.
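By contrast, when changes are first class objects, each change that quotes a feature carries its own status, and the feature status can be derived from them. The sketch below assumes an invented set of promotion levels and change ids.

```python
# Hypothetical promotion levels, lowest first.
PROMOTION_LEVELS = ["open", "ready", "integrated", "released"]

# Two changes implementing the same feature.
changes = {
    "jsmith.0050": {"reasons": ["FEAT-7"], "status": "integrated"},  # schema and interfaces
    "jsmith.0051": {"reasons": ["FEAT-7"], "status": "open"},        # feature logic
}


def promote(change_id):
    """Move one change to the next level without touching its siblings."""
    change = changes[change_id]
    position = PROMOTION_LEVELS.index(change["status"])
    if position + 1 < len(PROMOTION_LEVELS):
        change["status"] = PROMOTION_LEVELS[position + 1]


def feature_complete(feature_id, level="integrated"):
    """True only when every change quoting the feature has reached `level`."""
    wanted = PROMOTION_LEVELS.index(level)
    related = [c for c in changes.values() if feature_id in c["reasons"]]
    return bool(related) and all(
        PROMOTION_LEVELS.index(c["status"]) >= wanted for c in related
    )
```

With this arrangement, integrating the first change does not signal that the feature is finished, because the second change is still open.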

Complexity of Adding Change Control Data to Task Data

Because the change does not relate to a first class object (other than the problem or feature), additional data must be associated with problems and features to deal with change control. This further complicates the views as seen by the Project Management and the Problem Review Board perspectives, as well as complicating the view of a change as seen by a developer. There is a further confusion of data and a restriction on process implementations in that it becomes difficult to associate data such as upgrade instructions, change metrics, environment data, or other data with a change.

Variant Changes Require Duplication of Tasks and Compromise Inter-Stream Comparisons

Quite often a feature has to be implemented, or a problem fixed, in more than one development stream, for example in the current field release as well as in the new development stream. If the source code is not identical in each stream, a separate change is required in each stream. This may be further complicated by the requirement to implement a feature on more than one platform or in more than one variant of the product. Especially in the case of "problem" tasks, this results in duplication of tasks to ensure that the changes can be made in multiple streams. This further complicates the ability to identify if a single problem has been fixed in a particular stream, or to identify features or problems implemented in one stream or variant but not in another.
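With change packages, the same problem or feature id can be quoted by a separate change in each stream, so no task is duplicated and streams can be compared directly. The following sketch uses invented change ids, stream names and reasons.

```python
# Hypothetical changes; PR-301 is fixed by a separate change in each stream.
changes = [
    {"id": "jsmith.0060", "stream": "rel1", "reasons": ["PR-301"]},
    {"id": "jsmith.0061", "stream": "rel2", "reasons": ["PR-301", "PR-302"]},
    {"id": "alee.0010", "stream": "rel2", "reasons": ["FEAT-9"]},
]


def reasons_in_stream(stream):
    """All problems and features addressed by changes targeted to a stream."""
    return {r for c in changes if c["stream"] == stream for r in c["reasons"]}


def only_in(stream_a, stream_b):
    """Problems and features addressed in stream_a but not in stream_b."""
    return sorted(reasons_in_stream(stream_a) - reasons_in_stream(stream_b))
```

For the data above, `only_in("rel2", "rel1")` lists FEAT-9 and PR-302, while `only_in("rel1", "rel2")` is empty.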

Confusion of Authority For Task Table Ownership or Separate Task and Feature Tables

Whereas features are typically defined by project management early on in the life-cycle, changes are typically sculpted by designers through the middle portion of the life-cycle. Except in small projects, there are likely to be significant differences between PM-defined "feature" tasks and R&D-defined "change" tasks. This will undoubtedly lead to tension in the early parts of the development cycle, and a lack of process clarity in defining and modifying tasks, unless tasks are tracked separately for each group, in which case traceability between the two types of tasks will be a necessary overhead.

Expressing Dependencies

If a task is also the container or focal point for a change, confusion arises between task dependencies and change dependencies, assuming the environment permits specification of both. Feature dependencies are expressed at a functional level, whereas design dependencies are expressed at a design level, and these are quite different in large projects.

Conflicting Process Flows

In a change-set based environment, changes are promoted through the system. Changes have their own life-cycles which are closely related to the development and system integration and test functions and are controlled by the design and integration teams. Tasks tend to have separate life-cycles which reflect approvals, verification and acceptance and are controlled by the product management team. This results in confusion both in the life-cycle definition and in the authorities within and implementation of the process.

Viewing Most Recent Changes

One of the most frequent operations performed by a designer is to look through their most recent changes to resurrect ideas or to perform design recall. When changes are first class objects, they are created as the need to make a change arises, rather than well in advance of the change. With features and problems being defined well in advance of the corresponding software changes, it becomes very difficult in a task-based setting to identify a user's most recent changes. This has a tremendous effect on graphical interfaces and on intelligent use of defaults, where the most recent changes are otherwise used to focus the user or default values on the most pertinent information from the designer's perspective.

Observations

There is a fundamental philosophical and practical difference in the two models. The change package model treats problems, features and changes as separate objects, each having its own process. The task-based model attempts to normalize the concepts of tasks and changes.

A clear benefit of both task-based and change package-based change control is that file revisions are packaged together, allowing easy promotion of the files associated with the task or the change through the life cycle. Apart from this fact, however, the foregoing presentation appears somewhat one-sided in favor of change packages. This is because change package organization, along with problem tracking and feature planning, is a natural process reflecting the way most organizations work.

Although on the surface, task-based change control appears simple, a closer inspection reveals many of its complexities. Change package-based change control is a natural process because it allows design teams to define changes without affecting those who define problems and features. A clear change process, including the data and authorities, can be easily defined to meet the needs of the development and system integration teams. At the same time, problem tracking and feature planning processes can be similarly defined with the appropriate data and authorities. Task-based change control imposes restrictions and creates a certain level of confusion because it tries to equate a task with a change while also trying to equate a task with a problem or feature definition.

Change Control Strategies

A few guidelines are recommended for change control strategies:

File-based change control will not meet the needs of even a small project (more than one person).

Use a problem tracking system for tracking problem reports. In fact, a customer support problem tracking system should be separate from a design team system as the former deals with non-problems, duplicate problems (raised by several customers), data configuration problems and the like. Design team problems deal with problems against the product specification.

Avoid the use of task-based change control. Use change packages (i.e. first class change objects).

Identify the owner of the features, problems and changes tables.

Define the flow of a change through the product life-cycle and ensure that it is one that is suitable to your organizational behavior.

CM+ and Change Control

The CM+ family of software management products offered by Neuma Technology Corporation includes a functional Configuration Management product. The development of change packages within the CM+ product has evolved through almost twenty years of large project experience. In 1978, change packages were introduced into the largest software project at Bell-Northern Research Ltd. by Neuma's current chief technology officer. These change packages have undergone continual improvement through the experience of very small (25,000 lines of code) and very large (25,000,000 lines of code) projects.

Over this time, several practical improvements evolved so that in the CM+ product, an advanced change package functionality is made available. Change packages in CM+ are characterized by the following properties:

A change package is referred to as a software update, as it is used to update a software baseline.

Each update has a unique identifier.

An update reflects a change made by a single designer whose user id appears as part of the update identifier.

An update includes the set of files or modules modified as part of the update.

An update includes the reasons for the update. These are typically a combination of references to problem reports and approved features.

An update also includes system integration data, environment data, and other user data as may be required by the project team.

An update is targeted to a particular development stream. CM+ will automatically guide the designer through branching requirements based on the development stream. As well, the development stream may be used to help ensure that the appropriate problems or features are being addressed as part of a change.

Updates may have dependencies on one another. This can be used to ensure that system integration of an update respects the design requirements.

CM+ supports the following as a subset of a typical change control process:

A designer checks out files against an update, usually after the reasons (e.g. problem and feature numbers) have been associated with the update. The check out operation applies both to new files and those being revised.

A delta report can be produced for a software update at any point during development. This may be done through a dialog panel or through a graphical history browser.

Submitting a software update automatically checks in all of the modified files.

An update made to one development stream may be propagated to another development stream or later removed from any stream in which it exists.

System integrators may select updates for a build, promote updates through the change cycle or ensure that promotion of an update does not occur before its pre-requisite updates have been promoted.

Builds may be defined in terms of baselines plus a set of updates.

Baselines may be automatically defined based on the promotion level and stream of an update.
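Two of the steps above, promotion that respects pre-requisites and a build defined as a baseline plus a set of updates, can be illustrated generically. This sketch is not the CM+ interface; the update ids, file names and revision numbers are invented, and a real tool would track far more state.

```python
# Hypothetical updates: each lists its pre-requisites and the file
# revisions it contributes.
updates = {
    "jsmith.0070": {"prereqs": [], "promoted": False, "files": {"db.c": 3}},
    "jsmith.0071": {"prereqs": ["jsmith.0070"], "promoted": False, "files": {"ui.c": 8}},
}


def promote(update_id):
    """Promote an update only if all of its pre-requisites are already promoted."""
    blocked_by = [p for p in updates[update_id]["prereqs"]
                  if not updates[p]["promoted"]]
    if blocked_by:
        raise ValueError(f"{update_id} is blocked by {blocked_by}")
    updates[update_id]["promoted"] = True


def build(baseline, update_ids):
    """A build is a baseline (file -> revision) overlaid with selected updates."""
    contents = dict(baseline)          # leave the baseline itself untouched
    for update_id in update_ids:
        contents.update(updates[update_id]["files"])
    return contents


baseline = {"db.c": 2, "ui.c": 7, "main.c": 1}
trial_build = build(baseline, ["jsmith.0070"])
```

Attempting to promote jsmith.0071 before jsmith.0070 raises an error, which is the behavior an integrator would want from a dependency check.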

This process is easy to support in CM+ because software updates are used as the form of change packaging. As such, all information required at any stage of the life-cycle is bundled with the change. The configurability of CM+ allows the process to evolve over time without costly administrative effort or down time. The goal of having the system integration team "pull" changes through the system, for each development stream, is readily achieved so that the design team can focus on its primary design and implementation tasks. The scalable nature of CM+ makes it well-suited for large, complex projects, resulting in a natural environment for development and effective control of changes to a software product or product family.
