Neuma White Paper:
CM: THE NEXT GENERATION of Best Practices
There is process and there are tools. Generally, the tools are lacking and the process could be improved. Someone develops a new tool; someone else a new set of processes. Things get a bit better in some areas, less so in others. To make real strides in moving forward, we need to identify the Best Practices and understand why they are Best Practices. This article presents what I consider to be a number of Best Practices for CM and ALM, and gives the rationale behind them.
Perhaps the term Best Practice has been over-used. There are Best Practices, which are the proper way of doing things, and there are best practices which are, perhaps, the preferred way of doing things, or a subjective view on how things should be done. You be the judge on the Best Practices presented here.
I've selected a number of key practices Neuma uses in its own product development. These have been split up into a few basic areas of CM:
- Identification
- Branching
- Change Management
- Change Requests
As you read through these, you may notice that some build upon others. Following a set of best practices will improve your process. But it will also reveal other areas which can be improved upon.
This first group of Best Practices deals with identifying the objects being managed. Identification is a key function of a CM system. When done poorly, it can have a significantly negative impact.
Use dumb key identification, don't embed data within keys
CM and ALM have many varied objects to manage: files, problem reports, requirements, test cases, and so forth. Each object needs to be identified with a key that can be used to retrieve the object quickly. I've seen some very, let's say, creative key identification schemes. A requirement: PPPWNNN-### where PPP is the product, NNN the subsystem, W the weight and ### a sequence number to differentiate it from other requirements. What's wrong with this kind of key? It puts all sorts of data into the key. If the data changes, does the key change? And if so, how do you relate it back to the previous value of the key?
Keep your keys dumb. How dumb? Well that depends. For requirements, I like something like r.#### or rq-####. What about adding in a product prefix [r.PPP####]? This might be OK, but generally, requirements are going to be one thing that survives a product because it's used in another. In fact, various requirement sub-trees will survive from one to another. Down the road, the product prefix could then be a liability.
What about change identifiers? Is c.#### preferred? Here I like to introduce the concept of "ownership". A change (not a feature) is implemented by a developer. It may fix a problem or two. It may implement all or part of a feature, or yank a previously defined feature. If your changes are owned by the developer, it's OK to have the developer's name in the change. So c.joe112 and joe-113, etc. would all be valid.
Similarly, if I have a file named "abc.def", rather than naming the revisions of that file rev.193453, rev.195562, etc., I might want to name them "abc.def-1.1", "abc.def-1.2", etc. The fact that the revisions belong to (i.e., are owned by and don't make sense without) the file allows me to improve on my dumb key identification. "abc.def-1.2" means a lot more than "rev.195562". Similarly, branches (e.g. "abc.def-1") may be identified using the parent key because the parent owns the branch.
What about arbitrary, user-assigned names, such as for files? From a CM vendor's perspective, life would be a lot simpler if file names were just f.1232, f.15334 and so forth. In reality, filenames are complex. I might have 100 files named Makefile. The key to identify the Makefile must then include the entire path down to the Makefile, "/usr/me/myprod/mydir/Makefile", to keep it unique. But the problems don't end there. What happens if this is a shared file or a shared directory, and "/usr/me/myprod/yourdir/Makefile" is supposed to be the exact same file object? And what about when a new version of myprod has the same file moved to /usr/me/myprod/mysub/mydir/Makefile? Even if the file didn't change, it has a different key because a new revision of one or more of its ancestors included a move, rename or other structural operation.
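The key conventions described above can be sketched in a few lines. This is only an illustration, not how any particular CM tool allocates keys: the `KeyAllocator` class, its method names and the sample attribute values are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import count


class KeyAllocator:
    """Hands out dumb keys: a fixed prefix plus a sequence number, nothing else."""

    def __init__(self):
        # One independent counter per prefix (or per owner).
        self._counters = defaultdict(lambda: count(1))

    def next_key(self, prefix, width=4):
        # e.g. prefix "r." -> "r.0001"; no product, subsystem or weight is embedded.
        return f"{prefix}{next(self._counters[prefix]):0{width}d}"

    def next_owned_key(self, owner_key, separator="-"):
        # A key scoped by its owner, e.g. a change owned by a developer.
        return f"{owner_key}{separator}{next(self._counters[owner_key])}"


keys = KeyAllocator()
requirement = keys.next_key("r.")      # "r.0001"
change = keys.next_owned_key("joe")    # "joe-1"

# Data that may change lives in attributes beside the key, not inside it.
# The attribute names and values here are placeholders.
attributes = {requirement: {"product": "PPP", "subsystem": "NNN", "weight": 3}}
```

If the product or weight of a requirement changes, only the attribute record is updated; the key, and everything that refers to it, stays the same.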
To deal with this case, most file systems and most CM tools have an internal key identifier. The CM tool then does its best job to track a file so that it can distinguish a new file from one that has moved. In this way, historical information about the file is preserved. CM tools, like file systems, also have their own ways of dealing with shared files, directories, etc. So the best identification advice I can give you here is to pick a tool that does an adequate job.
Track Titles and Descriptions, including for Files
Closely related to dumb identification is the concept of using titles. A name r.1534 doesn't convey a lot of information on its own. But if it always appears beside a short title, such as "Subsystem Assembly Operating Temperature Range", it becomes very clear. All objects, including files, should have short descriptions associated with them. Don't try to fit a description into a file name. Even though some have tried to do this, it is generally a bad practice. Use a title instead.
I use the word "title" because I'm talking about a short description that will uniquely (or nearly uniquely) help a human to identify an object. If the description is vague (e.g. "Another module of the file system"), it doesn't help. I already know this by browsing the source tree. As well, long descriptions, good though they are, do not take the place of a good title. A title is visible on pull-downs, summary reports, query panels, etc. There's no real estate for long descriptions and just using the first few words of a long description is usually insufficient.
Perhaps the most important place for accuracy in titles is in defining a title for a problem report. If your problem report titles are like "Found bug in CM tool", determining whether your bug has been found before will be a long task. If instead your problem report title says "CM Tool Crashes on Checkout When No Context is Established", you will more easily be able to identify duplicates. It doesn't matter that the full description is more detailed. If you get 20 matches for your keyword search, and the titles are clear, you likely only have to browse the titles visually to identify whether there's a duplicate. Otherwise, you have to navigate and read 20 long descriptions to establish that your problem is unique.
Name files, but number requirements and test cases, grouping them hierarchically
We talked about how the world would be easier for CM vendors if all files were just uniquely numbered instead of being named. Although naming files is preferred by virtually all developers, it is actually a fair bit more work to "name" files than to "number" them. Imagine if you had to create names for each test case, for each requirement, for each activity of your WBS.
It's fine to name files, source code or documents. But don't impose that for other things. Let the CM/ALM tool number your requirements and test cases for you. And pick one that allows you to arrange them hierarchically into functional groups. Unlike files, which are used by the developer extensively, for diagnosis, debugging, coding, searching, editing, etc., requirements and test cases are more often used in aggregate forms - the requirements tree or baseline, the test suite, etc. It's still necessary to identify them individually for traceability purposes, but the identification does not have to be performed manually by a product manager or test plan developer. It can be automated. When I create a requirement, I want to specify a title and a long description. I don't want to have to invent a name. Especially because I'm about to spawn 200 requirements from my customer document.
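As a minimal sketch of what this could look like, the example below lets the tool assign the number while the author supplies only a title and an optional description. The `Requirement` class, the module-level counter and the sample titles are assumptions for illustration, not the data model of any specific tool.

```python
from dataclasses import dataclass, field
from itertools import count

_sequence = count(1)  # hypothetical tool-wide requirement counter


@dataclass
class Requirement:
    title: str
    description: str = ""
    # The key is generated, never typed in by the author.
    key: str = field(default_factory=lambda: f"r.{next(_sequence):04d}")
    children: list = field(default_factory=list)

    def add(self, title, description=""):
        """Create a child requirement under this one and return it."""
        child = Requirement(title, description)
        self.children.append(child)
        return child

    def outline(self, depth=0):
        """Indented key + title lines for this requirement and its subtree."""
        lines = [f"{'  ' * depth}{self.key}  {self.title}"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


root = Requirement("Subsystem Assembly")
thermal = root.add("Operating Temperature Range")
thermal.add("Cold Start Behaviour")
root.add("Power Consumption")
print("\n".join(root.outline()))
```

Spawning 200 requirements from a customer document then amounts to 200 calls with a title each; individual keys remain available for traceability, and the grouping is carried by the tree rather than by the names.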
2D Branching - Branch Id, Revision within a branch. Branch parent as data, not revision number.
Here's one I've never understood. Why would someone adopt a revision numbering scheme that went 1.1, 1.2, 1.2.1.1, 1.2.1.2, 1.3, etc.? Well the answer is because somebody proposed it and made a tool available that supported it. Everyone knows that 1.1, 1.2, 1.2.1, 1.2.2, 1.3 is much clearer. And then you can have 1.2.1.1 off of 1.2.1 and 1.2.1.1.1 off of 1.2.1.1, etc. This supports a full tree structure.
According to my identification rules above, this should be OK because the parent key owns each subsequent level in a real sense. And that's true. However, as time goes on, branch numbers will get longer and longer. So I would propose that this living history portion of the key is a complexity.
Instead, I would recommend identification of branches and revisions within the branch. And just as a title is an attribute of a requirement, the history should be an attribute of the branch, not part of the key identification. Maybe your branches are labelled r1 r2 r2a r2b r3 r4 r5 (release 1 through release 5), and revisions numbered within each branch. r3.12 is very clear - the 12th revision in r3.
But r3.12 doesn't show me if r3 branched off of r1, r2, r2a or r2b and if it branched off of r2b, it doesn't show me where that branched from. But that's OK. You have tools to show you branch history, and if you have a good enough CM tool, your CM tool will worry about branch history so that you don't even have to. If my r3 branch has a "branched from" attribute that specifies that revision (such as r2b.9), then my tool has all the information necessary to build history.
This is a 2-Dimensional revision numbering system: Branch id + Revision id. This will simplify your discussions on branching strategies and help you to implement other best practices.
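The 2-dimensional scheme can be sketched as follows. The `Branch` class, the `ancestry` helper and the revision counts are assumptions chosen to match the r1 through r3 example; the point is only that the "branched from" revision is stored as data, so the key never grows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    branch_id: str                        # e.g. "r3"
    branched_from: Optional[str] = None   # a revision such as "r2b.9"; data, not key
    revisions: int = 0                    # revisions created so far in this branch

    def new_revision(self):
        """Return the next 2D revision id: branch id + revision number."""
        self.revisions += 1
        return f"{self.branch_id}.{self.revisions}"


def ancestry(branches, branch_id):
    """Walk the branched_from attributes back to the root branch."""
    path = []
    current = branches[branch_id]
    while current.branched_from is not None:
        path.append(current.branched_from)
        parent_id = current.branched_from.rsplit(".", 1)[0]
        current = branches[parent_id]
    return path


branches = {
    "r1": Branch("r1", revisions=7),
    "r2": Branch("r2", "r1.7", revisions=4),
    "r2b": Branch("r2b", "r2.4", revisions=9),
    "r3": Branch("r3", "r2b.9", revisions=11),
}
latest = branches["r3"].new_revision()   # "r3.12"
history = ancestry(branches, "r3")       # ["r2b.9", "r2.4", "r1.7"]
```

However deep the history gets, a revision id stays two fields long, and the full lineage is still recoverable from the attributes whenever a tool needs to display it.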
The next group of Best Practices I've addressed deals with branching. I'd be surprised if I find wide agreement across the CM community on these items. But I'll maintain that they are critical to the advance of CM technology in general.
Main branch per release instead of a Single Main Branch
There are certainly a number of discussions going on as to whether a single main branch is better than one main branch per major release. Single Main proponents say that it makes everything clear. But remember Einstein's principle: "Everything should be made as simple as possible, but not one bit simpler." The attempt to keep a single main branch is violating this principle.
First of all, if you draw your main branch along a time scale, and color code it to reflect the releases being produced, you'll find clear starting and ending points. The problem with this is that starting and ending points have to be imposed on the development team for each release. Inside the endpoints, there are very simple rules, but outside, things are complex. With a main branch per release stream, there is no need to end one release to start up another. Yes, you have to say which stream you wish to work on, but you have to do that with a single main branch as well, because there is other data (problems, tasks, requirements, etc.) whose view you wish to reflect the stream you're working on. As well, with main branch per stream, the process is always the same. It's just the throttles that are adjusted (i.e. increase/decrease the number of problems/tasks assigned to the stream).
Do not overload branching - Don't Branch when you don't need to
Branching is a powerful capability. And like all power, it can get you into trouble quickly. Branching has been used for numerous purposes including parallel stream development (good), bookkeeping for "change packages", bookkeeping for "baselines", bookkeeping for special builds, tracking promotion levels, performing temporary parallel check-outs, and more. "What power!" is usually followed by "What a mess!".
Don't make the mistake of using branching to work around deficiencies in your CM tool. Use a database instead. Change packages should be used to group files of a change together, not branching and labelling. Don't use a branch so that your temporary parallel checkout can be visible before checkin. Signal this with your meta-data. And using branches to manage promotion levels is a pessimistic solution, which, although it handles the worst case, makes the common case more complex. Bolster up your CM environment so that you can track data in a natural manner, rather than by branching and labelling.
In the end, you'll find a much cleaner branching strategy and less complexity. If you find it too complex a task with your current tool, start looking around. The result will be a more agile and higher quality process, consuming fewer resources.
Stream-based change management with stream history
Going one step further, you could adopt a stream-based change management strategy whereby branches are done for parallel stream development only. If you choose a CM tool that will fully support this strategy, you'll find that CM complexity will all but disappear, and you can focus on change management by targeting changes to specific streams, promoting them within the stream, and propagating them to other streams if and when necessary. There's a presentation by yours truly at the 2005 ALM Expo that goes into this in more detail, so I won't bore you again with the details.
In presenting Change Management practices, I've divided it into two groups. The first group deals with making changes. The second group, which perhaps should logically come first, deals with Change Requests.
Use Change Packages
In our tool they're called Updates. In another tool, Changesets. Elsewhere Change Packages. I don't care what you call them. Just use them. Change packages perform a number of key functions. If you're not using them, you're trying to maintain these functions through other, more complex means, or maybe not at all. Change packages:
- tie files together for a particular change.
- allow you to ensure that all files of a change move through the promotion levels together.
- contain valuable traceability data that does not have to be replicated for each file changed.
- can be code reviewed (e.g. deltas), propagated (merged), checked in, etc. as a single entity.
Using change packages is the first and most fundamental step towards moving from file configuration management to change management.
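To make the idea concrete, here is a minimal sketch of a change package as a first-class object. The field names, the status values and the keys "joe-113" and "p.2041" are assumptions for the example; a real tool will have its own schema.

```python
from dataclasses import dataclass, field


@dataclass
class ChangePackage:
    key: str                       # e.g. "joe-113"
    title: str
    owner: str
    reasons: list = field(default_factory=list)     # problem/feature keys addressed
    revisions: dict = field(default_factory=dict)   # file path -> new revision
    status: str = "open"

    def add_file(self, path, revision):
        """Attach a file revision to this change while it is still open."""
        if self.status != "open":
            raise ValueError(f"{self.key} is {self.status}; files can no longer be added")
        self.revisions[path] = revision

    def check_in(self):
        """Freeze every file of the change together and return their paths."""
        self.status = "checked-in"
        return sorted(self.revisions)


change = ChangePackage(
    "joe-113",
    "Fix crash on checkout with no context",
    owner="joe",
    reasons=["p.2041"],   # traceability is recorded once, not once per file
)
change.add_file("src/checkout.c", "r3.12")
change.add_file("src/context.c", "r3.5")
frozen = change.check_in()
```

Review, propagation and check-in all operate on `change` as one entity, which is what removes the need for branch or label bookkeeping to hold the files together.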
While you're at it, don't use them just for source code. Requirements need change management just as much. Find a tool that will let you do both. Or perhaps you find other items in your development environment that could also benefit from change management.
Promote change packages
So how do you move from file-based CM to Change Management? Once you're using change packages, take your promotion status off of your file revisions and place it on to your change packages. File revisions are either being changed or frozen. They don't need any other promotion status. But the change is what moves through the system - all of the files together, as well as any other applicable data.
Ideally, you should be able to set your view to a given promotion level for a given stream, and as a change is promoted into that promotion level, you should see the file changes, added/removed or moved files and directories, and related information.
Promoting files is a dangerous practice. What happens if you promote one file but not another that was part of the same change? Not only that, but if your change has numerous files associated with it, you could be looking at significantly more work.
Change promotion has additional capabilities not available in a file-based CM scheme. For example, because changes are bundled, the explicit dependencies are more easily specified (i.e. between changes), and it is easier to detect implicit dependencies because changes may have partial overlap on the set of files modified. This means that your CM tool can do a lot more for you to ensure the integrity of your builds.
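The sketch below illustrates promotion status carried by the change rather than by its files, a view at a given promotion level, and a simple file-overlap check for implicit dependencies. The level names, the change keys and the overlap rule are assumptions for illustration; a real tool would also take check-in order and streams into account.

```python
LEVELS = ["checked-in", "integrated", "tested", "released"]  # hypothetical levels

changes = {
    "joe-112": {"level": "tested", "files": {"src/checkout.c", "src/context.c"}},
    "ann-40": {"level": "checked-in", "files": {"src/context.c", "src/ui.c"}},
    "joe-113": {"level": "checked-in", "files": {"docs/guide.txt"}},
}


def promote(changes, key):
    """Move a whole change, and therefore all of its files, up one level."""
    current = LEVELS.index(changes[key]["level"])
    changes[key]["level"] = LEVELS[min(current + 1, len(LEVELS) - 1)]
    return changes[key]["level"]


def view(changes, level):
    """Keys of the changes visible at the given promotion level or higher."""
    floor = LEVELS.index(level)
    return sorted(k for k, c in changes.items() if LEVELS.index(c["level"]) >= floor)


def implicit_dependencies(changes, key):
    """Other changes that touch at least one of the same files."""
    files = changes[key]["files"]
    return sorted(k for k, c in changes.items() if k != key and c["files"] & files)


before = view(changes, "integrated")             # ["joe-112"]
overlap = implicit_dependencies(changes, "ann-40")  # ["joe-112"]
promote(changes, "ann-40")
after = view(changes, "integrated")              # ["ann-40", "joe-112"]
```

Because there is no per-file status, it is not possible to promote one file of a change and leave another behind.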
Change-Based Process: Plan/Review/Propagate/Yank Changes
A change should be a meaningful amount of work that can, generally, stand on its own without breaking a build. If multiple developers are required, a task might be implemented through a set of dependent changes. Once you've started working with changes, think in terms of changes. Plan your changes. Instead of implementing an entire feature, it might be better to implement the key data, interface and message components as one change and then the functionality as a separate change or two. Across a whole release, this strategy might stabilize your release early on in its lifetime so that only feature capabilities and details need to be implemented on top of a stable base.
Avoid the definition of a Change that is tied to a feature request (i.e. where the change set is defined by the feature number referenced). This does not allow you to break up an implementation into more logical chunks. It also puts pressure on the developers to implement the entire feature in one go. That in turn might result in extended checkout times for some critical files that others might need. A Change must be a first order object and must be independent from Change Request objects (other than the obvious traceability).
Establish some guidelines. Try to implement minor problem fixes in a single change when convenient. Decide whether or not you want to allow bundling of fixes into a single change. Rather than propagating files between streams, learn to propagate changes, and configure your CM tool to support change-based propagation (and yanking).
Hold your code reviews around the Change, not around the feature or fix. And decide the level of review tracking you're going to do. Are you going to review changes before they're checked in (informal review), or after they're checked in (formal review), or both? Are you going to track the review against the Change package? Action items too? Configure your environment to support this process.
Handling change requests properly is a significant aspect of Change Management. Change requests fall more into the Planning and Product Management side of things, and less so into the development team side.
Separate Problems/Issues from Features
Change requests come in two basic flavors - requests to fix bugs, and requests for new functionality. Bug fixes are very different from features. Most of the time a bug is fixed by a single change, often to a single file, typically in less than a day. A bug fix request needs to be verified, but reproducing the bug is generally the largest portion of work for the developer. On the other hand, a feature may require less than a day, or it may take weeks. It often involves changing several files and sometimes it is implemented through a sequence of changes rather than as a single change. A new feature is something that current customers have not yet paid for. The functionality hindered by a bug is something they have already paid for.
In practice, these differences lead to very different processes for handling bug requests and feature requests. A bug is prioritized, assigned, fixed and integrated. The bug description is the specification. A feature has to be analyzed and specified. If it's a big feature, it may be broken down into component features and implemented over a series of releases. Because of this, features will go through sizing and scheduling in a much more extensive manner. Scheduling will include specification and review. As well, because bugs are generally more numerous and require much less developer effort, it is often the case that an average cost (i.e. planned effort) will be allocated to a bug fix, rather than sizing each one.
A feature will typically need a document. A problem typically won't. A feature will likely be part of a feature decomposition tree. A problem should reference an existing feature or requirement. And so forth. Problems and Features are different animals and will need to be treated differently in the CM tool and processes. Don't try to shoehorn features into problem-shaped containers. Use two different sets of objects.
Separate Customer Data from Development Data
All that being said, from a customer perspective, they've got a gripe list. Feature XYZ should behave like your competitor's product. ABC has a bug that won't let me do this. Customers distinguish between features and problems, but generally, a customer request is a mini-specification, whether it be a problem report or a feature suggestion. The vendor needs to track customer requests and report on the progress being made against these requests. On the customer side of the fence, a request is a request. It's not even always clear if a request is a problem or a feature request.
When you report on customer requests, you want them to be able to see the data pertinent to the request. You do not want them to see your development data, that it took you three iterations to fix the problem and that you inserted a new problem into the product.
As well, you may have the same request come in from 20 different sites. You don't want your development team going through the 20 different requests, prioritizing and sizing, etc. Perhaps your product is such that most requests are fixed by a field configuration change, such as changing a system parameter, or replacing the batteries. Again, you don't need your development team dealing with these issues.
So treat customer requests differently from development requests. Customer requests are on the outside. The data and reports regarding them should stay on the outside. You may wish to export an internal "reference number" that the customer can push back to you for your own convenience, but better still, have your internal requests, that is, your problem reports and activities, reference the requests they address.
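One way to picture this separation is sketched below: internal problem reports reference the customer requests they address, and the customer-facing report is derived from that link without exposing development data. All keys, field names, site names and status values are hypothetical.

```python
# Internal problem reports, seen only by the product team.
problems = {
    "p.2041": {
        "title": "CM Tool Crashes on Checkout When No Context is Established",
        "status": "fixed",
        "iterations": 3,                              # development data, stays internal
        "addresses": ["cr.501", "cr.517", "cr.530"],  # customer requests it answers
    },
}

# External customer requests; several sites may report the same thing.
requests = {
    "cr.501": {"site": "Site A", "title": "Crash during checkout"},
    "cr.517": {"site": "Site B", "title": "Checkout fails after login"},
    "cr.530": {"site": "Site C", "title": "Tool exits when checking out"},
    "cr.544": {"site": "Site A", "title": "Low battery warning unclear"},
}


def customer_report(requests, problems):
    """Per-request status for customers, with no internal fields copied over."""
    status_by_request = {}
    for problem in problems.values():
        for request_key in problem["addresses"]:
            status_by_request[request_key] = problem["status"]
    return {
        key: {"title": req["title"], "status": status_by_request.get(key, "under review")}
        for key, req in requests.items()
    }


report = customer_report(requests, problems)
```

The development team sizes and prioritizes one problem report, while each of the three sites still sees progress against its own request.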
Similarly, when you're talking about requirements, there are the Customer/Product Requirements, and there are the System/Software "requirements" that are generated from the Customer/Product requirements. These need to be separate. Customer/Product requirements, like customer Requests, are external input. They may be negotiated with the customer in terms of weight and priority, but as far as the development team is concerned, that's between the salesman and the customer.
System/Software "requirements" are requirements which are generated based on analysis of Customer/Product requirements. In fact, they aren't requirements at all. They're activities (features, tasks, directives, etc.) that are going to be fed to the development team so that, when executed, the Customer/Product requirements will be met. Such activities are not cast in stone. If a customer needs triple redundancy technology but it's easier to produce quadruple redundancy technology, the internal activity can be adjusted to reflect this. As well, whereas product requirements are the property of the customer or market representatives, the activities are the property of the product team. The team must work through a process that spells out the activity objectives and design, ensures that it meets the required deliverables, and prioritizes, sizes, schedules and tracks the activities.
These activities are in the realm of product development. They should not be mixed in with product requirement data. They do not have the same structure. The information does not need to be visible to the customer.
Besides, it's just plain confusing when someone refers to the feature/work breakdown structure as requirements. Use a different word, and a different data schema for activities (how about "activities" or "specifications").
Although requirements should be separate from "activities" and customer requests should be separate from "activities and problems", that is not to suggest that they should occupy different repositories. The data traceability capability is required and the more seamlessly integrated the external and internal tracking applications are, the better.
Consider Priority driven (Agile) vs schedule driven releases
Schedule-driven releases where everything is planned and properly executed are good. You get what you expect when you expected it. But what if the competition is doing more in the same time? How can you improve your competitiveness? You may wish to look at agile methods, including the use of priority-driven development, as opposed to schedule-based development.
You may also want to moderate this priority-based development. Did you know that 20% of the features take 80% of the time to develop? OK, it may vary for you. But factor sizing into your implementation priorities. Leave the effort-intensive features out of the current release and instead put in two or three times as many lower effort features at or around the same priority level. The customer will likely be happier that you provided the 20 low effort features instead of the 7 large effort ones. Of course the 7 are still going to be sore points, so if it's a specific customer you're dealing with, negotiate up front - do you want these 20 or those 7?
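As a rough sketch of factoring sizing into priorities, the function below walks features in priority order and defers any that exceed an effort cap or the remaining budget. The feature names, the numbers, the convention that priority 1 is highest, and the greedy rule itself are all assumptions; this is one simple way to do it, not a recommended planning algorithm.

```python
def plan_release(features, budget_days, max_effort_days):
    """Pick features by priority, deferring any single feature above max_effort_days."""
    selected, deferred, used = [], [], 0
    # Sort by priority first (1 = highest), then by effort within a priority.
    for name, priority, effort in sorted(features, key=lambda f: (f[1], f[2])):
        if effort > max_effort_days or used + effort > budget_days:
            deferred.append(name)
        else:
            selected.append(name)
            used += effort
    return selected, deferred, used


# (name, priority, effort in days); all values are made up for the example.
features = [
    ("bulk import", 1, 30),
    ("saved queries", 1, 4),
    ("title search", 1, 3),
    ("report export", 2, 5),
    ("audit trail", 2, 25),
    ("keyboard shortcuts", 2, 2),
]
selected, deferred, used = plan_release(features, budget_days=15, max_effort_days=10)
```

With these numbers, four smaller features fit in the release and the two large ones are deferred, which is the kind of trade-off to negotiate with the customer up front.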
And priority-based development doesn't have to exclude schedule-based development. You can have both going on at the same time, or you can alternate release styles.
As well, you can use this strategy to improve product quality. We'll often, especially near the start of a new release, take the approach of giving our development team a week or two to fix as many problems as they can, with instructions to focus on the ones that are easiest to fix and have the least risk to the product. It's a morale booster and it clears away a lot of items that otherwise would have to be managed and scheduled. Then when it's time to allocate time to problem fixes for the release, a big dent has already been made. You might even want to do this 2 or 3 times during a release cycle, especially after full verification runs.
A Problem is Not a Problem Unless it's in the CM Repository
Switching over to a problem tracking tool from an older one or from a paper-based system, or even from no system at all, can be a challenge. People are used to the old way of doing it. My recommendation: start posting metrics on problems and fixes, but use only data that are provided from the repository. Cut over all of your problem reports from the old way of tracking them to the new system and then issue the statement that "a problem is not a problem unless it's in the CM repository". You'll get a few people up in arms for a bit, but ultimately, you'll have a better flowing system that everyone will appreciate.
Now if you're doing this with customer requests, you'd better be a bit more careful. If a customer has a problem, you want to make sure it gets captured in the CM repository. So you may need to push your support staff a bit to use the new system, or even choose a system where the customers themselves can report the problems with your product.
Prevent Checkout without an approved reason, not Checkin
If you have an integrated tool that can check that the reason for a change is approved, you may be making such a check already every time someone tries to check in a change. If you are, great. But you can do a bit better too. Try preventing the check-out operation as well as, or instead of, the check-in operation. This way, developers will be aware, before they start their work, that the activity is approved.
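A check-out gate of this kind might look like the sketch below. The `check_out` function, the `CheckoutDenied` exception, the set of approved states and the record keys are all hypothetical; an integrated tool would enforce this inside its own check-out operation.

```python
class CheckoutDenied(Exception):
    """Raised when a check-out is attempted without an approved reason."""


APPROVED_STATES = {"approved", "assigned"}  # hypothetical workflow states


def check_out(path, reason_key, reasons, checked_out):
    """Allow the check-out only if the stated reason exists and is approved."""
    reason = reasons.get(reason_key)
    if reason is None:
        raise CheckoutDenied(f"{reason_key} is not in the repository")
    if reason["status"] not in APPROVED_STATES:
        raise CheckoutDenied(f"{reason_key} is {reason['status']}, not approved")
    # Record the file against the reason so traceability exists from the start.
    checked_out.setdefault(reason_key, []).append(path)
    return path


reasons = {
    "p.2041": {"status": "approved"},
    "f.77": {"status": "proposed"},
}
checked_out = {}

check_out("src/checkout.c", "p.2041", reasons, checked_out)   # allowed
try:
    check_out("src/ui.c", "f.77", reasons, checked_out)       # refused up front
    denied = False
except CheckoutDenied:
    denied = True
```

The developer finds out that f.77 is not approved before any work is done, rather than at check-in time after the work is finished.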
SUMMARY
OK, that's it for now. I'm sure there are quite a number of other Best Practices that we apply in our shop. But there's only so much ground to be covered in one article. Besides, I think I've put one or two practices in here that may spark some disagreement or discussion. If so, let's hear from you. You can post comments against this article at http://www.cmcrossroads.com/.
Joe Farah is the President and CEO of Neuma Technology. Prior to co-founding Neuma in 1990 and directing the development of CM+, Joe was Director of Software Architecture and Technology at Mitel, and in the 1970s a Development Manager at Nortel (Bell-Northern Research) where he developed the Program Library System (PLS) still heavily in use by Nortel's largest projects. A software developer since the late 1960s, Joe holds a B.A.Sc. degree in Engineering Science from the University of Toronto. You can contact Joe by email at firstname.lastname@example.org