Neuma White Paper:
CM: THE NEXT GENERATION of Metrics
I find it rare to see
a project with too many metrics. Quite the opposite is usually the
case. The problem is that establishing metrics is a lot more work than
identifying what you want to measure. It's hard to collect the data.
It's hard to mine the data, and presenting the comparisons and trends that make up the metrics is no simple feat. However, as CM tools move more and more into the realm of ALM and beyond, we'll see some changes. As integration across the lifecycle becomes more seamless, the focus will move to dashboards that not only provide standard metrics, but which are easily customized to mine and present the necessary data.
The goal is to have a rich collection of data and an
easy way of mining the data to establish the metrics for those measures
deemed important to process, team and product improvement. When you
measure something and publish the measurement regularly, improvement
happens, because attention is focused on the published results. However, the interpretation of the measure must be clear. For example, are
glaciers falling into the ocean a sign of global warming (weakening and
falling) or global cooling (expanding ice pushing the edges into the
ocean)? A wrong interpretation can be very costly. So a good rule is
to understand how to interpret a measure before publishing it.
Looking
at the software world, is the number of lines of code generated per
month a reasonable measure? I want my team to create a given set of
functionality using the minimal number of lines of code. So larger
numbers are not necessarily a good thing and may be a bad thing. Lines
of code generated is not a good measure for productivity. However,
averaged over a project, and compared from project to project (using
similar programming technology), it is a fair measure of project size
and is also useful for overall "bugs per KLOC" metrics.
The
point here is that you must select metrics that can clearly be
interpreted. And when you publish these regularly, the team will tend
toward improving the metric because they understand the interpretation
and how it ties back to what they're doing.
Creating Effective Metrics - What's Needed
I
don't intend to go into detail about which metrics I find useful and
which I don't. If you wish, you can view my article from a couple of
years back at http://www.cmcrossroads.com/content/view/6717/259/.
Instead I want to focus on how to build an infrastructure so that you
can easily generate metrics and test out their effect on your project.
The idea is that if it is easy to create metrics, they will be used to
improve your process, team and product.
So what do we need?
1. A Single Next Generation Repository Across the Life Cycle
I'm
sure there are a lot of you who can balance multiple repositories and
do magic to get data from one to another, knitting multiple data
sources together into a fountain of data. But no matter how you cut
it, there's no replacement for a single repository that can hold the
data for your entire application life cycle.
I don't really
want to mix transient data, such as compile results, with persistent
data, such as management information. But I want all of the persistent
information in one place, easily accessible. I want the schema for
this data under control of a small team or a single person who
understands that the entire team needs all of the data, and who has a
thorough understanding of the ALM processes.
I don't want to
jump through security hoops as I hop from one database to another. I
don't want to put up with server problems and network issues because my
data is distributed through multiple repositories. I don't want to
learn different access methods for each application. I want a single
uniform way to access my data. Maybe it's relational. Even better is
a next generation hybrid repository which provides both relational and
data networking so that I don't have to map from the real world model
to the relational and then back again to retrieve the data. And, this
being a CM world, if it supports data revisions (not just source code revisions), all the better, with perhaps some advanced large object handling for things such as source code, problem descriptions and
requirements.
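To make the idea concrete, here is a minimal sketch, in Python, of what such a hybrid record could look like. The class names, fields and link names are invented for illustration and do not represent any particular tool's schema: the point is simply that relational-style attributes, navigable references and a revision history live together on one persistent object.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Revision:
    """One data revision of a repository object: attributes plus large-object content."""
    rev_id: str
    author: str
    timestamp: datetime
    fields: dict                      # relational-style attributes (status, priority, ...)
    content: Optional[bytes] = None   # large object: source text, problem description, ...

@dataclass
class RepoObject:
    """A persistent ALM object: problem, change, requirement, source file, ..."""
    obj_id: str
    obj_type: str                                   # "problem", "change", "requirement", ...
    revisions: list = field(default_factory=list)   # the object's own revision history
    links: dict = field(default_factory=dict)       # navigable references, e.g. {"implements": [...]}

    def latest(self) -> Revision:
        return self.revisions[-1]

req = RepoObject("REQ-7", "requirement")
req.revisions.append(Revision("1", "joe", datetime(2024, 3, 1), {"status": "approved"}))
print(req.latest().fields["status"])    # approved
```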
2. Data mining capability
OK. Now
that I have all that data, all in one place, how do I get at it? I may
have tens of thousands of files, hundreds of thousands of file
revisions, thousands of problem reports, work breakdown structures,
requirements trees, test cases, documents, etc. Each has its large
object components and each has its data component (I don't like using
the term meta-data here - that implies that the data is used to
describe the large objects rather than the large objects having the
same rank as the rest of the data for an object).
So how do we make sense of it all? You need a data mining capability - a powerful data
query language and tools to explore and navigate, to help give hints on
how to mine the data. My job will be simpler if the data schema maps
onto my real world of CM/ALM data and my query language lets me express
things in real world terms: staff of <team member>, problems
addressed by changes of <build record>, testcases of
<requirement>. Even better, it will let me aggregate these
things: testcases of <requirements tree members>, files affected
by <changes>, etc. Then I don't have to iterate over a set of
items.
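As a toy illustration of that style of query (the repository contents, link names and the follow() helper below are all invented, not a real product's query language), navigating "testcases of <requirement>" over a single object or over a whole set might look like this:

```python
# Hypothetical, minimal in-memory repository: each object lists its outgoing
# traceability links by name ("testcases", "changes", "files", ...).
repo = {
    "REQ-1": {"type": "requirement", "links": {"testcases": ["TC-1", "TC-2"]}},
    "REQ-2": {"type": "requirement", "links": {"testcases": ["TC-3"]}},
    "TC-1":  {"type": "testcase", "links": {}},
    "TC-2":  {"type": "testcase", "links": {}},
    "TC-3":  {"type": "testcase", "links": {}},
}

def follow(link: str, obj_ids):
    """'X of Y' style navigation: follow one named link from a set of objects."""
    result = []
    for obj_id in obj_ids:
        result.extend(repo[obj_id]["links"].get(link, []))
    return result

# "testcases of <requirement>" for a single object...
print(follow("testcases", ["REQ-1"]))            # ['TC-1', 'TC-2']
# ...and aggregated over a set, with no explicit iteration by the user
print(follow("testcases", ["REQ-1", "REQ-2"]))   # ['TC-1', 'TC-2', 'TC-3']
```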
A good data mining capability requires:
- a high performance data engine
- a next generation query language
- a schema mapped closely to the real world objects
- a data summary presentation capability that can be used to zoom in to details from a higher level data summary
If
I have a high performance data engine, I can work through the process
without having to wait around and lose my train of thought. Better
yet, I can have point and click navigation tools that let me navigate
data summaries instantly, crossing traceability links rapidly. I can
analyse the data between the right-click and the pop-up menu appearing,
and select the appropriate actions to display.
The query
language must let me work with data sets easily, including the ability
to do boolean set algebra and to transform data from one domain to
another along the traceability (or other reference) links (e.g. from
"changes" to "file revisions"). In the CM world, I need to be able to
do some special operations - that is, operations that you may not
normally find in a database. These include (1) taking a file (or file
set) and identifying an ordered list of the revisions in the history of
a particular revision, (2) taking a set of changes and identifying the
list of affected files, (3) taking the root of a WBS or of a source
code tree and expanding it to get all of the members, etc. The query
language must also be sensitive to my context view. A change to a
header file might affect a file in one context, but not in a different
context. A next generation CM-based query language must be able to
work with context views. It must both work within a consistent view,
and between views, such as when I want to identify the problems fixed
between the customer's existing release and the one that we're about to
ship to them.
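Here is a small sketch, with invented change, problem and release identifiers, of what set algebra plus domain transformation between two context views could look like - for instance, finding the problems fixed between a customer's existing release and the one about to ship:

```python
# Hypothetical data: which changes went into each release context, and which
# problems and file revisions each change carries (all identifiers invented).
changes_in = {
    "rel_2.0": {"C1", "C2", "C3"},               # customer's existing release
    "rel_2.1": {"C1", "C2", "C3", "C4", "C5"},   # release about to ship
}
problems_fixed_by = {"C1": {"P10"}, "C2": set(), "C3": {"P11"},
                     "C4": {"P12", "P13"}, "C5": {"P14"}}
files_touched_by = {"C4": {"main.c@7", "util.h@3"}, "C5": {"util.c@9"}}

def transform(mapping, items):
    """Move a set from one domain to another along a reference link."""
    out = set()
    for item in items:
        out |= mapping.get(item, set())
    return out

# Boolean set algebra between two context views: what is new in 2.1?
new_changes = changes_in["rel_2.1"] - changes_in["rel_2.0"]

# Transform the change set into the problems it fixes and the file revisions it touches.
print(transform(problems_fixed_by, new_changes))   # {'P12', 'P13', 'P14'}
print(transform(files_touched_by, new_changes))    # {'main.c@7', 'util.h@3', 'util.c@9'}
```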
And the schema must map closely to real world
objects. I'd like to say: give me the members of this directory, and
not: go through all objects and find those objects whose parent is this
directory. I want to ask for the files of a change, not all files
whose change number matches a particular change identifier. I want to
be able to ask for the files modified by the changes implementing this
feature - not the more complex relational equivalent. And I would
prefer not to have to instruct the repository how to maintain inverted
index lists so that the queries go rapidly.
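The difference is easiest to see side by side. In a toy example (all identifiers invented), the relational-style query scans every object and filters on a key, while the object-mapped query is a direct navigation:

```python
# Relational-style: scan every object and filter on a foreign key.
objects = [
    {"id": "f1", "parent": "src"},
    {"id": "f2", "parent": "src"},
    {"id": "f3", "parent": "doc"},
]
members_relational = [o["id"] for o in objects if o["parent"] == "src"]

# Object-mapped style: the directory already knows its members.
directory = {"id": "src", "members": ["f1", "f2"]}
members_navigational = directory["members"]

print(members_relational, members_navigational)   # ['f1', 'f2'] ['f1', 'f2']
```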
A data summary
capability should allow me to easily select and summarize the data. For
example, graph problems for the current product and stream showing
priority by status, and let me zoom into the most interesting areas at
a click of the appropriate bar of the graph.
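As a rough sketch of that kind of summary and drill-down, assuming a small, invented extract of problem reports, a priority-by-status table and a zoom into one cell might look like this (pandas is used here purely for illustration):

```python
import pandas as pd

# Hypothetical extract of problem reports for the current product/stream.
problems = pd.DataFrame([
    {"id": "P1", "priority": "high",   "status": "open"},
    {"id": "P2", "priority": "high",   "status": "fixed"},
    {"id": "P3", "priority": "medium", "status": "open"},
    {"id": "P4", "priority": "low",    "status": "closed"},
    {"id": "P5", "priority": "medium", "status": "open"},
])

# Summary: priority by status. In an interactive dashboard, clicking a cell
# (or the corresponding bar of a graph) would drill down to the list behind it.
summary = pd.crosstab(problems["priority"], problems["status"])
print(summary)

# "Zooming in" on one cell: the open, medium-priority problems.
detail = problems[(problems["priority"] == "medium") & (problems["status"] == "open")]
print(detail)
```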
3. Easily customized metrics with dashboard presentation
Given
a wealth of data and a means of mining that data, the next step would
be to provide an easy way to express and to customize the metrics I'm
interested in tracking. This generally requires some calculation and
presentation specification language that can use the data query
language to mine the data and compute the desired metrics in a manner
that can be presented for comparison over specific points in time or
along a continuum.
Depending on my role in the project, I may
want a specifically designed dashboard. If my goal is to monitor
product quality perhaps I want to look at metrics such as:
- the number of problems introduced per 100 changes within a product development stream
- the number of problems raised each week for a given product development stream
- test suite completion and success rates over time for a given product development stream
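As a sketch of how the first two of these might be computed from mined data - the dates, counts and identifiers below are invented, and "problems introduced per 100 changes" is approximated here as problems raised per 100 changes checked in - the calculation is straightforward once the data is reliable:

```python
from collections import Counter
from datetime import date

# Hypothetical data pulled from the repository for one development stream:
# (problem id, date raised) and a count of changes checked in per ISO week.
problems_raised = [("P1", date(2024, 3, 4)), ("P2", date(2024, 3, 6)),
                   ("P3", date(2024, 3, 12)), ("P4", date(2024, 3, 14)),
                   ("P5", date(2024, 3, 15))]
changes_per_week = {"2024-W10": 40, "2024-W11": 55}

def week_key(d: date) -> str:
    year, week, _ = d.isocalendar()
    return f"{year}-W{week:02d}"

# Metric: problems raised each week for the stream.
raised_per_week = Counter(week_key(d) for _, d in problems_raised)

# Metric: problems per 100 changes, per week.
per_100_changes = {wk: 100 * raised_per_week[wk] / changes_per_week[wk]
                   for wk in changes_per_week}

print(dict(raised_per_week))   # {'2024-W10': 2, '2024-W11': 3}
print(per_100_changes)         # {'2024-W10': 5.0, '2024-W11': 5.45...}
```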
If
I'm a project manager, my mix will be different. The point is that I
don't want a few pre-canned metrics - I want the ability to easily
define which metrics are necessary for me at any one time. Ideally I
can sit down and invent a metric on the fly, interactively, as a result
of a trend that I've noticed over the past few weeks - and I can turn
it on or off as necessary. Metrics are useful for evaluating product
quality. They are also useful for evaluating process quality. What
should happen when we improve the process in this way? Let's measure
and make sure it does happen.
Metrics are also useful for
evaluating teams. This is a scary one. Nobody wants their weaknesses
pointed out. "Sure we have twice as many bugs, but we do three times
as much work." How do you establish fair team metrics? This is where
interpretation is tricky. Do you want more changes from a team member
or fewer - what does it indicate? More functionality completed?
Additional rework frequently happening? Incremental feature development
versus big bang? A lot of the interpretation will depend on your
process guidelines and procedures. But a lot will simply reflect the
different work habits of your team members.
So perhaps you want
to overload metrics here and over time prune out the ones that aren't
really useful or yield ambiguous interpretations. When your metrics
can be easily specified and collected into a dashboard or two, you'll
start asking better questions and hopefully will improve your decision
making. Maybe it will be as simple as noticing that 20% of the staff
disappear at March break, so that had better be accounted for in the planning phase.
4. CM processes and applications that can reliably collect data
Remember
that one of the key problems in establishing metrics is
interpretation. For good interpretation, you need to collect
meaningful and reliable data, and you need to have a yardstick against
which the metrics can be interpreted. This is where the CM tools and
processes must come to the rescue. A version control system that
tracks all the file revisions and marks each one with the user,
timestamp and description will not give me a lot of metrics, and the
ones it does give me will not only be difficult to mine, but will be
difficult to interpret.
The CM process will demand traceability
- from the requirements, through the tasks, to the software changes, to
the builds and on to verification. Your CM tools must be able to handle
these processes and must be able to collect data in a reliable manner.
If your CM tools/processes dictate that you should not check in your
source code until the build manager requests your changes, then it's no
use trying to use the CM tool to measure how long it takes to complete
a feature implementation. The completed code could be sitting on
somebody's personal disk for weeks. If your CM tool is a file-based
tool as opposed to a change-based tool, don't expect it to give you any
metrics with respect to logical changes. If your data is stored within
the version control file, don't expect to be able to get at it easily.
If
your processes don't require referencing an approved feature or
accepted problem report as part of starting a new software change,
expect a lot of variation on the traceability metrics. If it's not
easy to search for potentially duplicate problems in your repository,
expect a lot of duplicate problem reports, which could further skew
your data.
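One way a tool can enforce that kind of process decision as it is made - sketched here with an invented ChangeRecord type, not any particular CM tool's API - is simply to refuse to create a change that doesn't reference an accepted problem report or an approved feature:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeRecord:
    """Hypothetical change record: every change must cite a problem or a feature,
    so traceability data is collected as a by-product of normal work."""
    change_id: str
    author: str
    stream: str
    problem_ref: Optional[str] = None   # accepted problem report, e.g. "P123"
    feature_ref: Optional[str] = None   # approved feature/task, e.g. "F45"

    def __post_init__(self):
        if not (self.problem_ref or self.feature_ref):
            raise ValueError(
                f"{self.change_id}: a change must reference a problem or a feature")

ok = ChangeRecord("C101", "alice", "rel_2.1", problem_ref="P123")   # accepted
try:
    ChangeRecord("C102", "bob", "rel_2.1")                          # rejected by the process
except ValueError as e:
    print(e)
```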
It's one thing to say process first, then find the right tool, but perhaps it's better to say: give me a tool that will both
do the job and help me to define and continually refine my process so
that the process decisions are enforced as they are made. Better yet,
give me that same tool, but with a strong process supported out of the
box. And for certain, customization must be easy to do so that the
experts who want the metrics don't have to run to a technical resource
to have them supported.
Metrics - What's Useful, What's Overhead
So
when I'm developing metrics for a project, what's useful and what's
overhead? The answer depends on numerous variables which are
constantly changing. Define the metrics you might find useful. Create
a large list if you want. Then turn the presentation of them on and
off as required. Group them into dashboards according to role. Add to
the list as you discover additional problems or trends.
Make
sure that the data you're using is reliable. Make sure that the
processes support gathering good data. Make sure that the
interpretation of the data is unambiguous. Make the important metrics
widely visible, especially if the team members' performance can affect
the results. Even consider competitive metrics that pit one part of
the team or one project against another.
Good process, good tools, good presentation. And meaningful metrics. These will make your metrics useful.
Joe Farah is the President and CEO of Neuma Technology. Prior to co-founding Neuma in 1990 and directing the development of CM+, Joe was Director of Software Architecture and Technology at Mitel, and in the 1970s a Development Manager at Nortel (Bell-Northern Research) where he developed the Program Library System (PLS) still heavily in use by Nortel's largest projects. A software developer since the late 1960s, Joe holds a B.A.Sc. degree in Engineering Science from the University of Toronto. You can contact Joe by email at farah@neuma.com