Neuma White Paper:
Change Package-based versus Task-based Change Control
Authors: Joseph A. Farah, Christopher A. Keyes
Neuma Technology Corporation
Submitted to: ICSE '97 CM Workshop
Boston, MA
Over the past few years, the Software Configuration
Management (SCM) industry has recognized that designers work most naturally by
updating software a change at a time rather than a file at a time. For example,
to fix a problem may require modifying three files, or to fix three related
problems may require changing two files. Traditional file-based change control
handled this by recording the same problem report reason against each file as
it was submitted to the software library. Now that a change packaging model has
been accepted, the design of the model must ensure that a new set of pitfalls
does not arise from this model. To this end, the authors propose a clear
distinction between explicit change package-based change control and task-based
(or implicit change package-based) change control.
Change Packages versus File-based Change Control
The pitfalls with the traditional file-based approach to change control are
well understood by the industry today. These include:
- the need to double key the reason or reasons for the change against each
file,
- the need to ensure that all of the files get submitted to the library,
- the need to check in files individually,
- the need to ensure that all files for the change are promoted through the
release life-cycle, and
- confusion between dependencies expressed to group files together into a
change and those used to identify prerequisite changes.
As well, the paradigm of working on a file at a time, though convenient for
a compiler, is not convenient for a designer or system integrator. For example,
the designer wishes to submit a change, to view a delta report for a change,
run a change through the code review process, promote a change to ready status,
merge a change from one stream into another and express dependencies between
changes. The system integration team wants to select changes for a build,
promote changes, yank changes, and so forth. In a file-based environment, a
system integrator may have to look through a list of 30 or 40 file revisions and
try to determine which ones to promote and which belong together. In a
change-based environment, an organized list of perhaps a dozen changes is
easily scanned and the appropriate ones may be promoted.
Once the concept of change packages has been accepted, the next step is to
identify a successful model for the changes. This paper discusses the two basic
models:
- change package-based change control (explicit change packages), and
- task-based change control (implicit change packages).
The paper does not try to differentiate between the various checkout/checkin
models. Hence, it should be possible to configure any of the models to support
one or more of the "locking checkout/checkin", "recorded non-locking
checkout/reconcile on checkin" or "unrecorded non-locking checkout/reconcile on
checkin" models, with or without notification triggers or logic.
Change Packages versus Task-based Change Control
Change packages define changes by collecting files and other appropriate
data together in a first class change object. Task-based change control works
by associating a task id with each file that is checked out. The task id
relates directly to a problem or feature definition (and may itself be the
problem or feature identifier). On the surface it may appear that there is not
a major philosophical difference between the two models. However, with respect to
implementing a change process, there is a significant difference. A review of
each method is in order to help illustrate the differences.
Change Control With Change Packages
A change package is a collection of information pertinent to a change (in
our example, a software change) to a (software) baseline. A change package is a
first class object. It has an identifier which can be used to uniquely identify
both a set of files which have been modified, and the reasons for their
modification. Typically, the change package contains other information
pertinent to the change. Because the change is used to "update" a software
baseline, it will typically contain additional information which is process
dependent. This may include information such as:
- the software development stream, and perhaps a milestone within the stream,
to which it is targeted (e.g., new development or older support stream),
- integration instructions, including instructions for resolving an
incompatibility if the change is not upward compatible (e.g., run a database
conversion program),
- special process-related metrics,
- environment data (e.g., where are the files kept while the change is being
made),
- compatibility and pre-requisite change dependencies, and/or
- the designer, reviewer(s) and testers for the change.
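The contents listed above can be pictured as a single record. The following is a minimal sketch in Python, assuming a simple in-memory representation; the class name, the field names and the identifiers such as `chg-0042` and `PR-101` are all hypothetical and are not taken from any particular CM tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangePackage:
    """A first class change object; every field name here is illustrative."""
    change_id: str                 # unique identifier for the change
    designer: str                  # who is making the change
    stream: str                    # development stream the change targets
    reasons: List[str] = field(default_factory=list)        # problem/feature ids
    files: List[str] = field(default_factory=list)          # files modified by the change
    prerequisites: List[str] = field(default_factory=list)  # changes that must go in first
    reviewers: List[str] = field(default_factory=list)
    integration_notes: str = ""    # e.g. "run a database conversion program"
    status: str = "open"           # promotion level, process dependent


# One change that fixes two problems by touching three files.
change = ChangePackage(change_id="chg-0042", designer="jdoe", stream="rel2")
change.reasons += ["PR-101", "PR-102"]
change.files += ["title.c", "font.c", "layout.c"]
```

Because the files, the reasons and the process data all hang off one identifier, later stages can work from `change_id` alone.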
The goal behind the change package philosophy is that the change can proceed
through the various stages of promotion without involving participants from the
previous stage. The most common involvement from the traditional file-based
model was in identifying which files belong together for a change. In most
shops, however, this is only a part of the change puzzle.
When changes are made using change packages, files are checked out against a
change package as required. The change package can have one or several reasons.
In a well-designed model, the system can optionally enforce that the reason or
reasons used for the change (e.g., the problem or problems being fixed) are
valid; that is, that the reasons have not disappeared over time (e.g., problems
already fixed), that the developer is authorized to quote the reason in the
developer's work area, and so forth.
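The kind of reason checking described above might look like the following sketch. It assumes that problem states are available as a simple mapping and that authorization is a set of problem identifiers per developer; the function name, the state names and the identifiers are hypothetical.

```python
def invalid_reasons(reasons, problem_status, authorized):
    """Return the reasons that may not be quoted, with an explanation for each."""
    rejected = {}
    for reason in reasons:
        if reason not in problem_status:
            rejected[reason] = "unknown"
        elif problem_status[reason] == "fixed":
            rejected[reason] = "already fixed"
        elif reason not in authorized:
            rejected[reason] = "not authorized"
    return rejected


# Hypothetical data: PR-102 has already been fixed, PR-999 does not exist.
problem_status = {"PR-101": "open", "PR-102": "fixed"}
authorized = {"PR-101", "PR-102"}
rejected = invalid_reasons(["PR-101", "PR-102", "PR-999"], problem_status, authorized)
```

A checkout against the change would proceed only when `rejected` is empty, or the check could be skipped where a project chooses not to enforce it.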
When the change is completed, a delta report can be produced by specifying
the change identifier, and the change can be submitted to the software library
as a single unit that stays together throughout the life-cycle of the software.
If a change implements a feature and fixes a problem at the same time, the
problem and feature states can be automatically promoted, though the promotion
criteria for features typically differ from those for problems. A problem is
typically fixed by a single change whereas a feature is often implemented as a
series of changes, possibly by several designers.
With change identifiers, it is helpful to include the designer's "login id"
or initials as part of the identifier for easy recognition. Often, changes are
viewed by department (e.g. review all changes made by my staff this week). It
is important that the CM tool permits easy access to changes by department,
owner, or date to help maximize designer and manager effectiveness.
Tasks, Features and Problems
The term task is often used to identify work to be done. Often tasks reflect
items which are typically collected into one or more categories. For example,
tasks may be split into "feature" implementation tasks and "problem" resolution
tasks. Features and problems each have separate process flows, which differ
from the "change" process flow. Features and problems are typically tracked
against functional areas of the software product. Changes are typically tracked
against design architecture areas of the software design. It often makes sense
to implement a feature in multiple changes rather than in a single change. The
vast majority of problems are resolved using a single change. Those portions of
a change which will impact the development environment the most are sometimes
done early (so that there is more time to absorb the change) or even as part of
a previous release; often they are done to coincide with an API definition
freeze to ensure the stability of the development environment going forward.
This is a critical requirement in large projects.
A "problem" definition typically consists of a problem description (which is
really a mini-specification), a priority, an owner and/or an assignee, an audit
trail indicating how the status, priority, responsibility, disposition, etc. of
the problem has changed over time, a test case and tester. The problem may be
tracked against a single software baseline or against multiple streams. The
process for a problem may depend on such factors as the priority. Typically, a
problem review board or a design manager reviews and accepts problems and
ensures that the priority, owner/assignee and description are correct.
In a similar fashion a "feature" definition may contain information such as
proposed release, a specification document, specification review comments,
planned/forecast and actual target dates, dependencies upon other features,
effort and cost estimates, owner/assignee, tester, test cases, approvals, and
so forth. Typically, the overhead and approvals associated with a feature
definition are more substantial than with a problem report, since the problem
indicates that a change is needed to match an already approved specification.
As such the processes for problems and features are different.
Feature definitions usually arise out of a product management function which
identifies feature requests up front and prioritizes them for roll up into the
product over time. The prioritization typically affects the release to which
the feature is assigned. Problem reports usually arise out of problems within a
given product stream, which may or may not apply to other release streams.
Task-based change control arises when a task is tied directly to a specific
feature or problem (i.e. one-to-one relationship) or (more often) when the
feature or problem definitions themselves are used as the task definitions.
Task-based Change Control
When task-based change control is used, each file which is modified is
related to a specific task (i.e. to the associated problem or feature). The way
to identify the files associated with a task is to traverse the "file revision"
database to see which files refer to a given task. This can be a slow procedure
in large, long-term projects, although inverted relationship implementations,
where a task to files relation exists, may be used to help improve
performance.
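The two lookup strategies can be contrasted in a short sketch. It assumes that each file revision record carries the task id it was checked out against; the record layout and the identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical "file revision" records, each tagged with a task id.
file_revisions = [
    {"file": "title.c", "revision": 4, "task": "PR-101"},
    {"file": "font.c", "revision": 2, "task": "PR-102"},
    {"file": "title.c", "revision": 5, "task": "PR-102"},
]


def files_for_task_scan(revisions, task_id):
    """Full traversal: the cost grows with the total number of revisions."""
    return sorted({r["file"] for r in revisions if r["task"] == task_id})


def build_task_index(revisions):
    """Inverted relation (task id -> files), built once and kept up to date."""
    index = defaultdict(set)
    for r in revisions:
        index[r["task"]].add(r["file"])
    return index


task_index = build_task_index(file_revisions)
files_by_scan = files_for_task_scan(file_revisions, "PR-102")
files_by_index = sorted(task_index["PR-102"])
```

Both calls return the same files; the index trades extra bookkeeping at checkin time for a lookup that does not depend on the size of the revision history.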
As with change packages, the information against a task is used to describe
and control a task. Unlike change packages, there is no first class object to
contain information associated with a change, so the feature and problem data
schema are compromised to include change data, or the required change data is
not tracked, or is tracked against each file of the change.
There are two basic ways to organize task-based change control. One is to
have a single table of "tasks", regardless of whether the task describes a
feature, a problem or otherwise. This in turn implies that the same information
is tracked for both features and problems, further compromising the data
schema.
A second way to organize task-based change control is to let the task
identifier be the identifier used for two separate tables, one for problems and
one for features. A significant problem with this approach is that there is no
longer a single type of entity through which change control and configuration
status accounting can be done. As a result, a hybrid, and significantly more
complex, change control process results. Many databases have difficulty dealing
with hybrid sets of records and so each must be dealt with separately. For
example, if a query is made to identify recent changes, the query would have to
traverse both a "problems" and a "features" table. Because of this
complication, it is often the case that no change information is stored in the
table.
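The hybrid query mentioned above can be illustrated as follows. This is a sketch only: it assumes that a "changed" date is actually recorded on each problem and feature record, which, as noted, is often not the case; the table layouts, identifiers and dates are hypothetical.

```python
from datetime import date

# Hypothetical task-based layout: change dates live on two separate tables.
problems = [
    {"id": "PR-101", "changed": date(1997, 3, 4)},
    {"id": "PR-102", "changed": date(1997, 1, 15)},
]
features = [
    {"id": "FT-7", "changed": date(1997, 3, 10)},
]


def recent_changes(problems, features, since):
    """Traverse both tables, then merge and sort the hybrid result."""
    rows = [("problem", p["id"], p["changed"]) for p in problems if p["changed"] >= since]
    rows += [("feature", f["id"], f["changed"]) for f in features if f["changed"] >= since]
    return sorted(rows, key=lambda row: row[2], reverse=True)


recent = recent_changes(problems, features, since=date(1997, 3, 1))
```

With a single table of first class change objects, the same question reduces to one filter over one kind of record.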
Reasons to Avoid Task-based Change Control
A typical task-based change control solution will coerce a problem or
feature definition into a task definition. In this way a problem/feature task
does not have to be created for each problem/feature. However, this savings
means that there is no first class object corresponding to a change. This is a
fundamental difference between task-based and change package-based change
control.
Task-based change control puts a different perspective on change management
than is found in most software development shops. A summary of some of the
problems with task-based change control follows.
Can Only Work On One Task Per Change
Change control becomes confused if an attempt is made to work on two related
tasks in a single change. This is a significant concern because often a new or
modified feature will inadvertently fix a number of problems. Or consider the
case where three problems in a related area need to be fixed. One task
identifies that a title needs its spelling corrected, another that the font
size is too small, and another that the title is not centered properly.
Although three tasks exist, the designer realizes that all three are easily
fixed in the same program module. With task-based change control, each task is
worked on independently requiring three checkout-edit-compile-test-checkin
cycles. With change package-based change control, the change references the
three problems to be fixed, so that only a single
checkout-edit-compile-test-checkin cycle is required.
Inability to Work on One Task Over Several Changes
Often a single feature requires several changes to implement it. For
example, the first change might introduce schema and interface changes while
additional changes would implement the associated feature logic. Although this
is supported in a task-based scenario, it is not possible to track the separate
changes, as they all look like a single change. This means that the changes
cannot be independently promoted through the system. This can be mitigated
somewhat by defining several tasks for a feature implementation, but this has
side effects including:
- the overhead of definition and association of the related tasks to assess
the feature status,
- authority control on the tasks table, which can no longer be controlled
solely by the project management team since the design team needs to break
tasks up into implementation units, and
- completion of the first task or tasks may erroneously signal completion of
the feature implementation to others on the team.
Complexity of Adding Change Control Data to Task Data
Because the change does not relate to a first class object (other than the
problem or feature), additional data must be associated with problems and
features to deal with change control. This further complicates the views as
seen by the Project Management and the Problem Review Board perspectives, as
well as complicating the view of a change as seen by a developer. There is a
further confusion of data and a restriction on process implementations in that
it becomes difficult to associate data such as upgrade instructions, change
metrics, environment data, or other data with a change.
Variant Changes Require Duplication of Tasks and Compromise Inter-Stream Comparisons
Quite often a feature has to be implemented, or a problem fixed, in more
than one development stream, for example in the current field release as well
as in the new development stream. If the source code is not identical in each
stream, a separate change is required in each stream. This may be further
complicated by the requirement to implement a feature on more than one platform
or in more than one variant of the product. Especially in the case of "problem"
tasks, this results in duplication of tasks to ensure that the changes can be
made in multiple streams. This further complicates the ability to identify if a
single problem has been fixed in a particular stream, or to identify features
or problems implemented in one stream or variant but not in another.
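By way of contrast, when each change is its own record carrying a target stream and a list of reasons, the inter-stream comparison becomes a set difference. The sketch below assumes such records; the field names, stream names and identifiers are hypothetical.

```python
# Hypothetical change records: one problem report, PR-101, is fixed by a
# separate change in each stream without duplicating the problem itself.
changes = [
    {"id": "jdoe.7", "stream": "rel1", "reasons": {"PR-101"}},
    {"id": "jdoe.9", "stream": "rel2", "reasons": {"PR-101", "PR-102"}},
]


def reasons_addressed(changes, stream):
    """Collect every problem or feature id addressed by changes in a stream."""
    found = set()
    for change in changes:
        if change["stream"] == stream:
            found |= change["reasons"]
    return found


# Problems or features handled in rel2 but not yet in rel1.
missing_in_rel1 = reasons_addressed(changes, "rel2") - reasons_addressed(changes, "rel1")
```

With duplicated tasks, the two copies of a problem carry different identifiers, so a comparison of this kind first needs a mapping between the duplicates.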
Confusion of Authority For Task Table Ownership or Separate Task and Feature Tables
Whereas features are typically defined by project management early on in the
life-cycle, changes are typically sculpted by designers through the middle
portion of the life-cycle. Except in small projects, there are likely to be
significant differences between PM-defined "feature" tasks and R&D-defined
"change" tasks. This will undoubtedly lead to tension in the early parts of the
development cycle, and a lack of process clarity in defining and modifying
tasks, unless tasks are tracked separately for each group, in which case
traceability between the two types of tasks will be a necessary overhead.
Expressing Dependencies
If a task is also the container or focal point for a change, confusion
arises between task dependencies and change dependencies, assuming the
environment permits specification of both. Feature dependencies are expressed
at a functional level, whereas design dependencies are expressed at a design
level, and these are quite different in large projects.
Conflicting Process Flows
In a change package-based environment, changes are promoted through the system.
Changes have their own life-cycles which are closely related to the development
and system integration and test functions and are controlled by the design and
integration teams. Tasks tend to have separate life-cycles which reflect
approvals, verification and acceptance and are controlled by the product
management team. This results in confusion both in the life-cycle definition
and in the authorities within and implementation of the process.
Viewing Most Recent Changes
One of the most frequent operations performed by a designer is to look
through their most recent changes to resurrect ideas or to perform design
recall. When changes are first class objects, they are created as the need to
make a change arises, rather than well in advance of the change. With features
and problems being defined well in advance of the corresponding software
changes, it becomes very difficult in a task-based setting to identify a user's
most recent changes. This has a tremendous effect on graphical interfaces and
on intelligent use of defaults, where the most recent changes are otherwise
used to focus the user or default values on the most pertinent information from
the designer's perspective.
Observations
There is a fundamental philosophical and practical difference in the two
models. The change package model treats problems, features and changes as
separate objects, each having its own process. The task-based model attempts to
normalize the concepts of tasks and changes.
A clear benefit of both task-based and change package-based change control
is that file revisions are packaged together allowing easy promotion of the
files associated with the task or the change through the life cycle. Apart from this
fact, however, the foregoing presentation appears somewhat one-sided in favor
of change packages. This is because change package organization, along with
problem tracking and feature planning, is a natural process reflecting the way
most organizations work.
Although task-based change control appears simple on the surface, a closer
inspection reveals many of its complexities. Change package-based change
control is a natural process because it allows design teams to define changes
without affecting those who define problems and features. A clear change
process, including the data and authorities, can be easily defined to meet the
needs of the development and system integration teams. At the same time,
problem tracking and feature planning processes can be similarly defined with
the appropriate data and authorities. Task-based change control imposes
restrictions and creates a certain level of confusion because it tries to
equate a task with a change while also trying to equate a task with a problem
or feature definition.
Change Control Strategies
A few guidelines are recommended for change control strategies:
- File-based change control will not meet the needs of even a small project
(more than one person).
- Use a problem tracking system for tracking problem reports. In fact, a
customer support problem tracking system should be separate from a design team
system as the former deals with non-problems, duplicate problems (raised by
several customers), data configuration problems and the like. Design team
problems deal with problems against the product specification.
- Avoid the use of task-based change control. Use change packages (i.e. first
class change objects).
- Identify the owner of the features, problems and changes tables.
- Define the flow of a change through the product life-cycle and ensure that
it is one that is suitable to your organizational behavior.
CM+ and Change Control
The CM+ family of software management products offered by
Neuma Technology Corporation includes a functional Configuration Management
product. The development of change packages within the CM+
product has evolved through almost twenty years of large project experience. In
1978, change packages were introduced into the largest software project at
Bell-Northern Research Ltd. by Neuma's current chief technology officer. These
change packages have undergone continual improvement through the experience of
very small (25,000 lines of code) and very large (25,000,000 lines of code)
projects.
Over this time, several practical improvements evolved so that in the
CM+ product, an advanced change package functionality is made
available. Change packages in CM+ are characterized by the
following properties:
- A change package is referred to as a software update, as it is used to
update a software baseline.
- Each update has a unique identifier.
- An update reflects a change made by a single designer whose user id appears
as part of the update identifier.
- An update includes the set of files or modules modified as part of the
update.
- An update includes the reasons for the update. These are typically a
combination of references to problem reports and approved features.
- An update also includes system integration data, environment data, and
other user data as may be required by the project team.
- An update is targeted to a particular development stream.
CM+ will automatically guide the designer through branching
requirements based on the development stream. As well, the development stream
may be used to help ensure that the appropriate problems or features are being
addressed as part of a change.
- Updates may have dependencies on one another. This can be used to ensure
that system integration of an update respects the design requirements.
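The dependency property above can be pictured as a check made before an update is promoted. The following sketch is not CM+ syntax or its API; it assumes a simple ordered list of promotion levels and in-memory mappings, and the level names, function names and update identifiers are all hypothetical.

```python
LEVELS = ["open", "ready", "integrated", "released"]  # hypothetical promotion levels


def blocking_prerequisites(update_id, target, status, prerequisites):
    """Return the prerequisites of update_id that have not yet reached target."""
    wanted = LEVELS.index(target)
    return [p for p in prerequisites.get(update_id, [])
            if LEVELS.index(status[p]) < wanted]


def promote(update_id, target, status, prerequisites):
    """Promote an update, refusing if a prerequisite lags behind."""
    blockers = blocking_prerequisites(update_id, target, status, prerequisites)
    if blockers:
        raise ValueError(f"{update_id} is blocked by {blockers}")
    status[update_id] = target


status = {"jdoe.1": "ready", "jdoe.2": "open"}
prerequisites = {"jdoe.2": ["jdoe.1"]}  # jdoe.2 depends on jdoe.1

# Allowed: the prerequisite jdoe.1 is already at "ready".
promote("jdoe.2", "ready", status, prerequisites)
```

A further attempt to promote `jdoe.2` to "integrated" would be refused until `jdoe.1` had reached that level.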
CM+ supports the following as a subset of a typical change
control process:
- A designer checks out files against an update, usually after the reasons
(e.g. problem and feature numbers) have been associated with the update. The
check out operation applies both to new files and those being revised.
- A delta report can be produced for a software update at any point during
development. This may be done through a dialog panel or through a graphical
history browser.
- Submitting a software update automatically checks in all of the modified
files.
- An update made to one development stream may be propagated to another
development stream or later removed from any stream in which it exists.
- System integrators may select updates for a build, promote updates through
the change cycle or ensure that promotion of updates does not occur before
pre-requisite updates have been promoted.
- Builds may be defined in terms of baselines plus a set of updates.
- Baselines may be automatically defined based on the promotion level and
stream of an update.
This process is easy to support in CM+ because software updates
serve as change packages. As such all information
required at any stage of the life-cycle is bundled with the change. The
configurability of CM+ allows the process to evolve over time
without costly administrative effort or down time. The goal of having the
system integration team "pull" changes through the system, for each development
stream, is readily achieved so that the design team can focus on its primary
design and implementation tasks. The scalable nature of CM+
makes it well-suited for large, complex projects resulting in a natural
environment for development and effective control of changes to a software
product or product family.