White Paper: CM: THE NEXT GENERATION of SCM Standards

One of the biggest problems with SCM standards is that the industry is currently too fragmented. Sure there are lots of ways to do things and plenty of "high level" standards out there. But as a whole the industry uses different terminology for the most basic concepts, and fails to understand that standards must go beyond "ability" and push the industry forward. I do like the "maturity" model as a framework for helping to move forward.

The high level standards are primarily process standards: Make sure you identify every component, manage and control change, have a means of tracking the state of your CM artifacts and ensure that you can prove that your build is what was asked for. These are important but very high level criteria. The problem has been that, as we look deeper, there is greater divergence - there are various basic capabilities without guidance.

For example, it's fine for an SCM standard to say that all revisions of a file, once registered in a CMDB, must be available for all time. And it's fine to say that parallel branches of revisions must be supported. But these capabilities, without guidance, can lead to poorer SCM. Just look at those projects where branching and/or labeling have gotten out of control.

Now one of the more frustrating things I find is the lack of a basic standard for terminology. What do you mean by "status accounting", or even worse, is it a "change package, an update, a changeset or a change"? So maybe it's time to start nailing things down a bit. Of course if we try to do this, we're in for a real battle - that's just the nature of standards.

Perhaps there are some of you who would like to add your opinions so that we can somehow march toward some better standards over time. I'm not going to try to bite off the whole standards issue in one article. It's just too big.

Terminology
So perhaps, to simplify the process of the SCM Process standards discussion, I'll focus on terminology. I hate to leave behind the "push the industry forward" part, but for now I will, while addressing that in a future discussion on standards.

Terminology is such a crucial part of a standard and often the most "visible" component. Lack of consistent terminology can cause confusion even with the best standards. Even if the standards clearly define what the terms mean, if the definitions are unnatural or divisive, the standard, clear as it might be, might lead to overall confusion and rejection.

Now terminology itself is a large domain. There are "things" (Changes, Revisions, etc.) and there are "actions" (Identify, Audit, Build, Release, etc.). Some even have the same name (Build vs Build, Release vs. Release). As actions tend to act on things, I'll focus first on things.

Below I present a bunch of terms used in SCM. I'm sure I don't have all of the terms used in all processes and tools around the world, but hopefully I've captured a number of them. In some cases, term Name Standardization is required, in others term Definition Standardization is required. I selected my preferences, and, more importantly, I've given my reasons for these selections. You may have different preferences and better reasons. Great! But let's hear from you.

Name Standardization
There are a lot of terms used to describe basic concepts. We need to drastically reduce this set of terms (not concepts).

What I'll do in each of the next sections is to identify a set of terms that are used and give my reasons why I prefer some over others. Hopefully, we can follow up with a thread on CM Crossroads, and move toward more standard terminology. I know it's hard for some tools to change terminology, but perhaps over time, if we have some agreed upon terms, we can move there. Then the process descriptions will at least seem less tool dependent.

Change Package
Change, Update, Changeset, Change Package. A Change, in this context, is a modification of a set of files/data elements for a specific purpose. It doesn't have to be source files. It can be requirements. It can be documents. But maybe it should be a coherent set of objects for any given change so that the "modified revisions" is a homogeneous set of objects.

The problem with the word Change is that it is a loose term used in various contexts. A Change Review Board looks at Change Requests. A developer creates a "change" which is a modified set of source code files for a specific purpose. Yet the word is a natural fit and I would really prefer to use it if I didn't think it would cause confusion.

"Update", used by Neuma, was chosen because it was originally used by IBM (way back), and it was more specific. But these days it seems to have a new meaning as a dynamic change to a program (usually applied over the internet).

Changeset was originally coined by "Aide-de-Camp" (subsequently bought out), and meant a set of source code modification directives that could be included or excluded from a particular configuration. That's different than the "changeset" used by Microsoft and others. "Change Package" is bulky, and also has the connotation of Package, implying it's really a collection of "changes", or at least a recursive concept, at least in my mind. It fits well when you're used to just managing files, but if you've been using changes in CM for over 30 years, the word Package tends to be unnecessary verbiage.

My Vote: Changeset, Update, Change (in that order), but not Change Package.

Revision or Version
Revision, Version. OK, so let's say a file is changed and put back into the repository. It's still the same "file", just a modified "version" of it. I used to use the term version, but for a parallel connotation, not a sequential one. I've dropped the use of that term altogether now because it is too ambiguous. Which version do you have: English or French? In my eyes, this is very different than 1.1 or 1.2. And this causes confusion especially because most software deliverables come in multiple versions. So I prefer the more exact term Revision, also used by hardware folks, and also more telling of what is going on.

My Vote: Revision

Variant
Variant, Version, Generic, Option. OK so what do you use for parallel "versions"? I'm not talking about parallel releases, but separate product configurations stemming from a given baseline. Like the Military version vs. the Commercial version. Or the Spanish version vs. the English or French versions. To use the word version is still confusing because of its sometimes normal usage as a Revision. I've seen the term Generic used at times, but this is normally for larger variations (like Military/Commercial) and is simply not a well understood term.

Variant seems to me to hit the nail on the head. It's a variation of a standard offering. And indeed, there can be dozens of variations. Option is a good word, but it is really a selection of several Options that determine a Variant. So a French Military Variant is created by selecting the Military Option and the French Option. [Make your life easier by deferring as many Option selections as possible to Run-Time, reducing the number of Variant deliverables.]
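The relationship between Options and Variants can be pictured with a small sketch. This is only an illustration in Python: the names `OPTION_GROUPS`, `Variant`, `make_variant` and `all_variants`, and the grouping of Options into "market" and "language", are assumptions made for the example and do not come from any particular CM tool.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical option groups: one Option is chosen from each group.
OPTION_GROUPS = {
    "market": ["Military", "Commercial"],
    "language": ["English", "French", "Spanish"],
}


@dataclass(frozen=True)
class Variant:
    """A Variant is identified by the set of Options selected for it."""
    options: frozenset

    def name(self) -> str:
        # Sort for a stable, readable label.
        return " ".join(sorted(self.options))


def make_variant(**selected: str) -> Variant:
    """Build a Variant from one Option per group, rejecting unknown choices."""
    for group, option in selected.items():
        if option not in OPTION_GROUPS.get(group, []):
            raise ValueError(f"unknown option {option!r} for group {group!r}")
    return Variant(frozenset(selected.values()))


def all_variants(build_time_groups):
    """Enumerate every Variant implied by the groups decided at build time."""
    choices = [OPTION_GROUPS[g] for g in build_time_groups]
    return [Variant(frozenset(combo)) for combo in product(*choices)]


french_military = make_variant(market="Military", language="French")
print(french_military.name())                     # French Military
print(len(all_variants(["market", "language"])))  # 6
print(len(all_variants(["market"])))              # 2
```

The last two lines show the point of the bracketed advice: if the language Option is deferred to run time, the number of Variant deliverables in this made-up example drops from six to two.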

My Vote: Variant

Problem
Problem Report, Problem, Defect, Bug, Issue, Non-Conformance. When a problem is found with the way software behaves, there are many different words we use to identify it. Really, what we have is either an error in the original specification, or a non-conformance with that specification. Non-conformance is fairly accurate, though a bit of a mouthful, and doesn't deal with specification errors. Issue is used strongly in the military, and outside of it too. However, "issue" does not necessarily denote a problem with the product. For example, consider a "pricing" issue or a "support" issue. It's more a statement that the customer is not satisfied. I like the term "Problem Report" but it's a mouthful, and can be confused with a Report about Problems as opposed to a statement of problem. Bug is short and historical, but not very friendly once you move outside of the developer/tester domain. Problem, as in Problem Tracking, seems to have a lot of support, and Defect as well, but to a lesser extent.

My Vote: Problem, Defect, Bug (in that order)

Activity
Feature, Activity, Task, ToDo Item. What do we call the things we do to a product that are different than fixing conformance to the original specification? They're features, right? Well, usually. Sometimes they're refactoring activities. Sometimes they're "code instrumenting". Sometimes they're creating basic architectural components upon which features will be built. Feature is a good term for what the end user will see, and it is an important and useful term. ToDo Item is more general, but perhaps too much of a bookkeeping item. But perhaps Task or Activity fit better.

I have a bias here. I like these two terms because, unlike software problems which are too numerous and usually too simple to fix to spend time planning for each one, tasks and activities require some definite specification and expected effort. In other words, they fall well into the Project Management and Product Management domains. The former has to deal with scheduling/tracking the work, whether in an Agile or Traditional manner, while the latter has to deal with what is being completed for the next release(s) of the product. So I think it's important to tie these into the CM world to ensure we don't have different Project Management and CM databases, processes and efforts.

My Vote: Activity, Task

Request
Request, Change Request, Issue, Feature Request. So what about a customer issue? How do we name it? What happens when the customer needs something done? We track it as an issue, or a request. Sometimes it's a change to the requirements, but it is the impetus for such change that we're considering here.

There are a few things to keep in mind here. One, the request is sometimes for a feature, sometimes for a problem, and the customer doesn't always know which. Two, the same request can come from multiple customers. Three, the customer may be an internal user or even a developer. Four, the request does not always require the vendor to do something, other than perhaps pointing to the correct page of the documentation.

Issue is a good word, but it implies problem. Feature Request implies feature. Change Request (CR) is good except that it doesn't always imply that change is required. So I like simply Request, although I don't mind CR as long as there is no confusion with the Change itself. A Request lies in the customer domain. A Problem or Activity lies in the product development domain, eliminating duplicate requests and change-free requests.

My Vote: Request, Change Request

Requirement
This one seems fairly straightforward. We don't have other terms for it, as long as we understand that a request may be an impetus for Requirement change as opposed to a requirement itself. But alas, although the choice of term is fairly standard, the meaning tends not to be.

There are basically two camps. One says a Requirement is a requirement on a Product. The other says a Requirement is a requirement on someone doing work. In the latter case, we start with product requirements, and then map them into the design phase and "allocate" new requirements for the design team. This may be a single step allocation, or there may be multiple steps (e.g. from high level design to software detailed design).

There's a big difference between a product Requirement and a design Requirement though. A product Requirement is a "marketing" input - either as part of a contract, or as part of a market standard. A product team doesn't control product requirements, though it does control everything derived from them. In fact it has to do project management on everything derived and as such, it would be best to turn the derived artifacts into project activities rather than having parallel activity and requirement artifacts for product management and project management. Note that there is a lot more data associated with planning and tracking activities than just the "requirement" description and a few attributes such as weight.

Still, it doesn't matter how hard we try, the requirement flow down (i.e. allocation) process is deeply rooted in many projects. As such, it might be unreasonable to expect the terminology to ever change. So instead, I recommend prefacing the term:

Customer Requirement: a requirement expressed by the customer
[Product] Requirement: a requirement for how the product will function, taking into account customer/market requirements.
Design Requirement: any lower level requirement allocated from a Product Requirement

In this scenario, I recommend that Requirement alone imply Product Requirement, that is, a requirement on how the product will function. Note the difference between Customer and Product Requirement. A customer may request the ability to automate backups once a week. A product requirement might instead say that automated backups can be done according to a custom schedule [which might include weekly]. Both of these are "product" requirements (i.e. how the product must/will/should function), but the latter is applicable to a greater market.

Also, be careful of those who distinguish between allocated requirement levels based on the amount of detail. Product Requirements and Design Requirements both need full detail. It's just that one details how a Product functions, from a black box perspective, while the other details the implementation architecture used to build the product. True enough, a Product might have sub-Products, each with its own set of Product requirements. But this again is different from Design requirements. For example, a Product might need a million pounds of thrust, and it might involve several engines, products in themselves, as sub-products. But the design identifies whether four 250,000 pound thrust engines are to be used, or five 200,000 pound thrust engines.

Now despite all of this discussion, there are still numerous labels on requirements: System Requirements, Software Requirements, High Level Requirements, Subsystem Requirements, etc. What's important here is whether these terms represent "product" or "design" requirements, and what camp you're in, because that dictates what you mean by Requirement.

My Vote: Requirement, with a prefix when it is not a Product Requirement.

Artifact
Item, Artifact, Object, File, Document, Datum (Data), Record: So what are the things we're managing in our CMDB? Files? Well if we're only managing Files, we're a first generation CM process. We have to manage all sorts of data, in different sizes, shapes and forms. Some need revision control, while this is an overkill (read overly complex) for things like Requests. Some need Change Control - well, maybe all do to some extent - and some are just stored (e.g. attachments to a Problem).

It doesn't really matter what we call these things. And we're not going to get consistency here at any time in the near future because different things have different properties - like outputs of a process being referred to as Artifacts. Good name, but so is Item, such as a list of Items making up the database subsystem. Object is not a good choice in a software environment because of the obvious other connotation. Document is all right as long as it's used in the traditional sense only, otherwise it is simply too confusing to the average reader. Similarly Data is fine if used in the small piece of data sense and not the large object sense which rarely comes to mind when the word is used. Perhaps Record is a bit more inclusive in this case, but from a relational perspective, large objects are not really historically part of a record.

The thing to avoid is that of taking an existing term with historical meaning and trying to stretch it into a wider (or narrower) meaning. This causes confusion.

My Vote: Item, Artifact, Record (in that order), but any of the others only in their traditional contexts.

Iteration
Iteration, Cycle, Internal Release, LifeCycle: The development cycle. What do we call it? LifeCycle used to fit perfectly, with a new LifeCycle for each release. Then came Agile. Now LifeCycle clearly has the meaning of Traditional LifeCycle. Cycle is a bit too generic. Iteration is good but does currently tend to imply Agile methods. Still, there's no reason why, in a traditional shop, each release cycle couldn't be considered an Iteration. Internal Release is OK except that it tends to imply "but not real releases". But a Release is really an Internal Release that is followed by full verification, audit, product documentation and other packaging. So we'll stick to dealing with the development cycle in this definition.

My Vote: Iteration, Internal Release (in that order)

Development Stream
Stream, Development Stream, Release Stream, Release Branch, MainLine, Product Branch: So what do we call the sequence of development work done leading up to a release? This may involve one or multiple Iterations. It's really a stream of work activities including design, implementation, testing, documentation, etc. So I like the term Development Stream, but that's a mouthful. I also like Release Stream, also a bit long, so I like the shorter term Stream. Some like to use the term Release Branch, generically encapsulating all of the work done along a development branch, and others MainLine, particularly if the Release Branch is the Main Trunk for the source code.

The problem with both of these terms is that there is far more than source code in a development stream, and Release Branch and MainLine simply don't seem to suffice. For example, there are Requirements, Activities, Problems, Documentation, Source Code, Test Cases, Test Results, and so forth. Some might argue that there should be branches for each of these things, but I think that's trying to shoe-horn a natural process into a square container. Yes, Requirements need branches (e.g. release 1 requirement tree vs. release 2 requirement tree). So do some forms of Documentation and Source Code. But "activities" should be specific to a development stream - no branching. Problems might apply to multiple streams, but certainly it's the solution or fix that is tracked against each.

And depending on the granularity of your test cases, it might be natural (coarse granularity) or overkill (fine granularity) to have branching, although certainly some fine-grained test cases will apply to specific releases. The more natural approach here is to tag fine-grained Test Cases with the streams to which they apply. Similarly, Activities and Changes should be tagged with the stream to which each one applies.

There's a bit more in this case though. Unlike the other terms, this term involves evolution over time. So Stream History, or Branch History, is important. For example, a Change made in release 2 will, in the vast majority of cases, be applicable to releases 3 and 4. A problem fixed in release 2 will not appear in releases 3 and 4. This assumes that release 3 is derived from release 2 and release 4 from 3 (or at least 2 in this example). This is an important term. I like Stream and Stream History. And what it comes down to is that the Stream History is the Branch History of the Product Container, whatever that may be in your CMDB/Process/Tools. That is, how does my Product evolve?
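The idea that a Change made in one stream carries forward to the streams derived from it can be sketched in a few lines of Python. The stream names and the `DERIVED_FROM` table below are hypothetical, and the sketch deliberately ignores the minority of cases where a Change does not carry forward.

```python
# Hypothetical stream ancestry: each stream names the stream it was derived from.
DERIVED_FROM = {"rel1": None, "rel2": "rel1", "rel3": "rel2", "rel4": "rel3"}


def stream_history(stream):
    """Return the stream and its ancestors, newest first."""
    history = []
    while stream is not None:
        history.append(stream)
        stream = DERIVED_FROM[stream]
    return history


def applies_to(change_stream, target_stream):
    """A Change is inherited by every stream whose history includes its stream."""
    return change_stream in stream_history(target_stream)


print(stream_history("rel4"))      # ['rel4', 'rel3', 'rel2', 'rel1']
print(applies_to("rel2", "rel4"))  # True
print(applies_to("rel3", "rel2"))  # False
```

Tagging each Change with a single stream and walking the history, as here, is one way to avoid recording the same Change separately against every later release.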

The reason I like Stream History (and Stream) over Branch History is that most, though not all, projects use a branching structure that does not reflect Stream History. Hence, Branch History is a whole other consideration that's more applicable on a file/item by file/item basis. There is a rather recent problem with the word Stream though - it has a different connotation in some existing tools (e.g. AccuRev). So maybe Product Branch might be a good alternative. Note that this term must implicitly be placed in the context of a Product (or Product Hierarchy, or whatever is "released" together).

My Vote: Stream, Development Stream, Release Stream, Product Branch

Product vs. Project
Product, Project: We all know what a product is and what a project is, don't we? So why would these two terms cause confusion in a development shop, some may ask. Others know. The primary reason is the use of the term "Project" in IDEs, such as Eclipse and Microsoft's Visual Studio. The IDEs want to focus on the project that is being completed as part of the development effort. But in the end, the IDE project outlasts the Project. In fact, Product would have been a better word, or perhaps Deliverable, Deliverable Set or Package. An IDE project creates some deliverable(s). But when the release is all done, the IDE project persists to create the deliverables for the next release.

Traditionally, a Project is a managed set of Activities, and this set of Activities culminates, frequently, in a release, be it a new Internal/External Release, a Variant Release or a milestone. The Project is identified by a Work Breakdown Structure which outlines the activities to be performed. An IDE may be used for some of these Activities (and a wider set in more recent IDE releases), but the IDE "project" is a misnomer.

A Product is a collection of deliverables which together meet a product specification. A Product may be a hierarchy of Products, with sub-Products being reusable across multiple Products. So here's my take:

My Vote: Product, Project and IDE Project

Product: A collection of deliverables which meet a product specification.
Project: A managed set of Activities culminating in a Release or other milestone.
IDE Project: The term used by some IDEs for their product file collections.

Process Flow
Process Flow, Work Flow, State Flow. I'm not really sure if there is confusion about these terms in the industry - it's hard to tell. But just to be sure, here's my take on them.

State Flow (a.k.a. State-Transition Flow). State applies to an Item/Artifact. As it moves through the process it assumes different states which indicate who may act on it and what actions are permitted. For example, a Problem moves from its original state through a triage process that puts it in one of several states. Eventually the Problem gets fixed or answered and reaches a "closed" state. Only the originator of the problem can agree that it is closed - i.e. the original problem has been addressed, and this would only be done once the problem reached some sort of "fixed and tested" or "answered" state.
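A State Flow of this kind is essentially a small state machine with permission checks. The Python sketch below is one possible rendering of the Problem example; the state names, the `TRANSITIONS` table, the `Problem` class and the user names are all assumptions made for illustration, and a real process would define its own.

```python
# Hypothetical states and transitions for a Problem; real processes will differ.
TRANSITIONS = {
    "originated": {"open", "rejected", "deferred"},  # possible triage outcomes
    "open": {"fixed_and_tested", "answered"},
    "fixed_and_tested": {"closed"},
    "answered": {"closed"},
}


class Problem:
    def __init__(self, originator):
        self.originator = originator
        self.state = "originated"

    def move_to(self, new_state, user):
        """Apply a transition, enforcing both the state flow and who may act."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"{self.state} -> {new_state} is not permitted")
        if new_state == "closed" and user != self.originator:
            raise PermissionError("only the originator may close a Problem")
        self.state = new_state


p = Problem(originator="alice")
p.move_to("open", user="triage_board")
p.move_to("fixed_and_tested", user="bob")
p.move_to("closed", user="alice")
print(p.state)  # closed
```

The state determines which actions are permitted, and the check on `user` captures the rule that only the originator can agree the Problem is closed.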

Work Flow. Work flow shows how some input flows through the system to cause some output. For example, a Request may cause a "feature" Activity to be created which may cause a change to Requirements. The Activity may lead to the creation of product Changes, and Test Cases which are to be run on the changed product. The Test Results might then cause the Request to reach an "implemented" state and then a release might result in the Request being marked "completed" and ready for "closing" by the originator of the Request. Work flow documents the interaction of Items/Artifacts according to the planned process. It highlights traceability actions that must be performed (hopefully automatically in most cases). It does not focus on the who - the state flow deals with that.

Process Flow. This is a broader term that encapsulates both State Flow and Work Flow. It also involves other aspects such as consolidation of Work Flows into an overall Release Process. For example, milestones are met by performing regular builds which collect all (or a designated subset) of Changes which have been readied by the development team for a specific Product Development Stream.

Configuration
Configuration, Alignment, Baseline. I don't think there is a lot of confusion here. But let's clarify some things. A Configuration is an aggregation of a set of item revisions. If the items are Requirements, we get a Requirements Configuration. If the items are source code files, we get a Source Code Configuration. You might want to aggregate unlike items, but let's not go there right now. A Configuration typically consists of a consistent set of items (i.e. item revisions).

The process used to create this consistent Configuration is sometimes referred to as Configuration Alignment - aligning the right revisions into a Configuration. For this reason, you might find the term Alignment used as a synonym for Configuration.

Some might also use the term Baseline as a synonym, but this is not valid. A Baseline is a frozen (i.e. never to be changed) Configuration. Configurations can change, and dynamic Configurations, expressed as a set of rules for what should be in the configuration, change as the data in the CMDB changes. Now there may be tools that only permit you to create a Configuration by creating a Baseline, and in that case the Configuration cannot change either. But a Baseline never changes - it is a frozen Configuration, hopefully, but not necessarily, a consistent one. The purpose of a Baseline, as its name implies, is to measure change. How much have we moved on from the Baseline? What's the difference between the item as it is now and how it was in the Baseline?

My Vote: Configuration and Baseline

Configuration: An aggregation of a set of item revisions.
Baseline: A frozen Configuration
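The two definitions, and the use of a Baseline to measure change, can be illustrated with a short Python sketch. The class and function names (`Configuration`, `Baseline`, `changes_since`) and the file names and revision numbers are hypothetical; the sketch models a Configuration simply as a mapping from item to revision.

```python
from types import MappingProxyType


class Configuration:
    """An aggregation of item revisions: maps item name -> revision."""

    def __init__(self, revisions=None):
        self.revisions = dict(revisions or {})

    def set_revision(self, item, revision):
        self.revisions[item] = revision

    def freeze(self):
        """Return a Baseline capturing the current contents."""
        return Baseline(self.revisions)


class Baseline:
    """A frozen Configuration: its contents are exposed read-only."""

    def __init__(self, revisions):
        # Copy first, then expose a read-only view of the copy.
        self.revisions = MappingProxyType(dict(revisions))


def changes_since(baseline, config):
    """Items whose revision differs from the Baseline (added, removed or revised)."""
    items = set(baseline.revisions) | set(config.revisions)
    return {
        item: (baseline.revisions.get(item), config.revisions.get(item))
        for item in sorted(items)
        if baseline.revisions.get(item) != config.revisions.get(item)
    }


config = Configuration({"main.c": "1.3", "util.c": "1.1"})
baseline = config.freeze()
config.set_revision("main.c", "1.4")
config.set_revision("net.c", "1.0")
print(changes_since(baseline, config))
# {'main.c': ('1.3', '1.4'), 'net.c': (None, '1.0')}
```

The Configuration keeps moving after the freeze, while the Baseline stays put and serves as the reference point for the comparison.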

Build
Build, Build Record, Build Notice, BOM. So what then is a Build? Well, Build has three different connotations: [1] an action, [2] the result of a Build action, [3] a record of a Build[1] action. Even the action sense is often used as a noun: I'm doing a Build[1].

A Build[1] takes a Configuration and transforms it into a set of (potential) deliverables (i.e. the Build[2]). Most often a Build[1] is done against a non-frozen Configuration - that is, developers creating their own Builds[2] to test their in-progress changes. It's up to the developer to know what incremental changes are being included, and this is done through a Workspace mechanism. But when a "system" Build[1] is done from the CM Repository (i.e. CMDB), it is often important to record exactly what was being built so that if there are problems they can be reproduced, fixed and retested. But some shops do system builds more than once a day. Does this mean that a new Baseline has to be created more than once a day in order to track what went into the Build? Some shops actually do this. However, if the Build is to reflect a Baseline, does that mean that all 12 Variants of that Build require 12 separate Baselines? Again, some shops actually attempt to do this.

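But a Build[3] Record is a record of what was built. It is not a Baseline. I can build the same Baseline with two different versions of a compiler to ensure that both versions give the same result. Same Baseline, different Builds. The important thing about a Build[3] Record is that it identifies:

The Baseline from which the Build was performed
The Variant Options selected for the Build
The Changes added on top of the Baseline
The revision of the Tools/Processes used for the Build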

You can see that several Builds can be performed from the same Baseline. In general, you "Manage the Baseline and Build the Subset". That is, your Baseline is an aggregate of all of the possible components that can be used for a Build. You select Variant Options to customize what you're building from the Baseline. You add in some Changes that may reflect a new Variant option but, more frequently, fix Problems and/or add or extend Features. You identify the revision of your Tools/Processes that are being used for the Build. So you might create a Baseline twice a month, but do Builds on that Baseline, adding in all other ready Changes, a couple of times a day, or multiple times for different Variants. Developer Builds are (typically) not tracked so rigorously by the CMDB. A Build[3] Record is often referred to simply as a Build[3]. A future Build[3] is sometimes referred to as a Build Notice (similar to a Change Notice in the hardware world), and a past one as a Build Record.
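To make "Manage the Baseline and Build the Subset" concrete, a Build Record can be modeled as a small immutable record that points at a Baseline and adds the Option selections, extra Changes and tool revisions. The following Python sketch is only illustrative: the `BuildRecord` class, its field names, and the identifiers such as "BL-07" and "chg-101" are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRecord:
    """A record of what was built; all field names here are illustrative."""
    baseline: str               # the Baseline the Build started from
    variant_options: frozenset  # Options selected for this Build
    added_changes: frozenset    # ready Changes added on top of the Baseline
    tool_revisions: tuple       # (tool, revision) pairs, e.g. the compiler


b1 = BuildRecord("BL-07", frozenset({"Military", "French"}),
                 frozenset({"chg-101"}), (("cc", "4.1"),))
b2 = BuildRecord("BL-07", frozenset({"Military", "French"}),
                 frozenset({"chg-101"}), (("cc", "4.2"),))

# Same Baseline, different Builds: only the compiler revision differs.
print(b1.baseline == b2.baseline)  # True
print(b1 == b2)                    # False
```

Because the record refers to the Baseline rather than copying it, a dozen Variant Builds or several Builds a day need a dozen small records, not a dozen Baselines.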

Finally, BOM is used in the hardware world to describe the set of part revisions required to build something. There are a few differences though and so it might be wise to avoid this term.

My Vote: Build, Build Notice, Build Record

Context View
Context, Views, Context Views. There are a lot of products, a few Development/Support Streams for those products, loads of Changes, etc. in a CMDB. Navigating all of that data can be overwhelming. However, any decent CM tool will understand that you're interested in working in a particular Context. And the Context can be used to simplify the view of the data. Normally it's the context of a particular Build Record, or the Latest checked in source code of a particular product for a particular Development Stream. Sometimes it's a combination of a Baseline or Build plus a set of Changes. Whatever it is, and however it is specified, the goal is to identify the context of your discussion, query, data search, or whatever.

When applied within the "context" of a tool, the Context is referred to as a Context, a View, or a Context View. There are other terms used here and there, but these are the most dominant. For the most part, I think any of these three terms suffices, with Context View preferred if there is any confusion. In particular, Context is a fairly generic term that can sometimes cause confusion - that's why I quoted it in the first sentence of this paragraph. View is a frequent database term as well, which is more specific than a Context View, usually applying to form.

My Vote: Context View, Context, View (in that order).

Upgrade Package
Service Pack, Increment, Update Package, Change Supplement. Lots of different terms are used to identify an upgrade package for something that's already been delivered. However, as this is really at the tail end of the ALM process, it won't have a big influence on the overall process or standards. Still, if we had to pick a term, I like Update Package, because it's clear, or Increment, clear but a bit more general in nature.

My Vote: Update Package, Increment (in that order).

Summing Up
So there you have it. I've hit on some of the big ones, but there are still a lot to go. Please add to this article through the reply mechanism. What terms do you use/not use for CM artifacts, and why? Do you think we can ever attain common ground? We've been further apart than we are now. Some terms, like Branch and Test Case, were agreed upon decades ago. Some may never get there.

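Just to summarize my picks (as voted above):

Change Package: Changeset, Update, Change (not Change Package)
Revision or Version: Revision
Variant: Variant
Problem: Problem, Defect, Bug
Activity: Activity, Task
Request: Request, Change Request
Requirement: Requirement (with a prefix when it is not a Product Requirement)
Artifact: Item, Artifact, Record
Iteration: Iteration, Internal Release
Development Stream: Stream, Development Stream, Release Stream, Product Branch
Product vs. Project: Product, Project, IDE Project
Configuration: Configuration, Baseline
Build: Build, Build Notice, Build Record
Context View: Context View, Context, View
Upgrade Package: Update Package, Increment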

Joe Farah is the President and CEO of Neuma Technology. Prior to co-founding Neuma in 1990 and directing the development of CM+, Joe was Director of Software Architecture and Technology at Mitel, and in the 1970s a Development Manager at Nortel (Bell-Northern Research) where he developed the Program Library System (PLS) still heavily in use by Nortel's largest projects. A software developer since the late 1960s, Joe holds a B.A.Sc. degree in Engineering Science from the University of Toronto. You can contact Joe by email at farah@neuma.com
