Neuma White Paper:
CM: THE NEXT GENERATION - Building Reliable Applications
I came
across a paper the other day in "Better Software". It was entitled
"Code Improvement", by Jeff Grover and Zhon Johansen. It's a short but good article focusing on
developing well-designed code. My
favorite point was "Start/Finish each task by refactoring". In other words, stop the entropy of an ever-expanding
software solution by ensuring that the code stays no larger than necessary to
meet the requirements. There are
parallels in the CM world that ultimately lead to a reliable CM Process and
Tool Support capability. In this article, I'll venture where I don't usually go
- inside our own development environment - to bring out a few points that may
apply equally to general development and to a CM environment.
I am
perpetually frustrated by the "big-IT" solutions that come out of the
software industry under the guise of "it has to be big to do all it
does" and "if it's not big it's not a world class solution".
Think about it. A few things will come
to mind. Like some Database Vendors,
Business Process Software solutions, and, yes, CM solutions. In the latter area the industry has seen a
few "small-IT" tools emerge which meet the requirements better than
the "big-IT" tools, at least in some areas for some tools, in a
broader sense for at least one. And better small-IT solutions will emerge over
time across the entire software spectrum.
When I
buy a software solution, I don't want an internal Help Desk, a need for
significant consulting, a huge training bill, a large platform resource requirement,
or a solution administration team. I want
ease of use, easy customization, a small footprint, zero administration,
reliability, and loads of features that meet my requirements, out of the box if
possible. This is certainly true of CM
tools as well.
Architecture First
First and
foremost, in building reliable applications, these things have to be in the minds
of the architects. The software industry aims too low in general. Let's glue things together. Let's copy and paste and change it. We'll optimize it later, or we can use bigger
boxes to run it. Let's get it running
first, then we can look at the features we need.
If you're
in this type of environment, I don't envy you. What's missing is the building
of the underlying architecture. Some
might say architecture just happens.
But more properly, good architecture doesn't just happen. If you don't build it well, you'll have a
poor architecture.
The
philosophy that says "just make it work" is very pervasive across the
software industry - largely because it's easy to do and there are so many
people writing software. And this is not to
mention the schedule pressures to get to market. If you want reliable applications, you need
to have good architects and good architecture.
You have to replace the "pick one of the many ways to do it"
philosophy with the "what's the proper way to do this". And to answer that question, you need to
understand architecture from the ground up and develop the appropriate rules to
guide you. There may be a few in your midst that do - maybe it takes them a
couple of days to rationalize their "gut feel" about something. But it's usually worth it.
Take a
look at various large software systems - how rapidly do they evolve? The ones that evolve slowly, or perhaps not
at all may have maxed out on their architecture (though this is not always the
reason they don't evolve). From the
bottom up, the software must be well architected. If you're cutting and pasting code all over
the place, it becomes a real challenge to universally improve the
functionality of that portion of code.
Re-Use
After
designing the overall architecture, build fully re-usable libraries/APIs from
the bottom up. Design the interfaces and
review them with system architects for maximum re-usability and for
appropriate usability.
For
example, file systems have always had re-usable APIs, but not usable
ones. "Set this flag on and that
one off", call it with this action, and the routine can do about
anything you want. I want simple-to-use
routines, and so I always start a major project by introducing a file system
API. It has meaningful routines such as:
- FindFile
- CreateFile
- OpenFile
- Seek
- ReadBytes
- WriteBytes
- OpenTextFile
- AppendToTextFile
- TextRead
- TextWrite
- CloseFile
and then
a bunch of directory functions and a few other file functions in a similar vein.
I don't
have to set flags to say whether I'm working with binary or ASCII data. I don't have to set modes saying "create
if necessary, otherwise open" or vice versa. I don't have to coax the same functions into doing different read
operations for text and binary data. That's all taken care of and implicit in the
calls I use. Even more to the point, the
functions are easy for anyone to pick up, and make for clearer code
reading. OK, there are a few good file
system APIs out there, but it's just that the universal ones seem to prefer
assembly language level usability. And I
don't mind having the ability to do raw IO on devices. But please don't impose that level on
everyone!
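To make this concrete, here is a rough C sketch of what declarations for such an API might look like. The routine names are the ones listed above; the handle type, the status type, and the exact parameters are my own assumptions for illustration, not the actual Neuma definitions.

    /* filesys.h - illustrative sketch of a simple-to-use file system API.
       The handle and status types and the parameter lists are assumptions. */
    typedef struct FileHandle FileHandle;    /* opaque handle */
    typedef int FileStatus;                  /* 0 = success, non-zero = error code */

    /* Binary files: create-or-open behaviour is implicit in which call you use. */
    FileStatus  FindFile   (const char *name);
    FileHandle *CreateFile (const char *name);
    FileHandle *OpenFile   (const char *name);
    FileStatus  Seek       (FileHandle *f, long offset);
    long        ReadBytes  (FileHandle *f, void *buffer, long count);
    long        WriteBytes (FileHandle *f, const void *buffer, long count);

    /* Text files: no mode flags; the text routines know they deal with lines. */
    FileHandle *OpenTextFile     (const char *name);
    FileHandle *AppendToTextFile (const char *name);
    FileStatus  TextRead  (FileHandle *f, char *line, int maxLen);
    FileStatus  TextWrite (FileHandle *f, const char *line);

    FileStatus  CloseFile (FileHandle *f);

The point of the sketch is that the intent - binary versus text, create versus open - is carried by the routine name rather than by flag arguments.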
Having
our own API also allows us to easily add functionality to our file system, such
as using "> []" to redirect output to a window. And, it provides a
nice re-usable module which is portable across platforms (by design).
We know
when we start a major project that we'll need to: access the file system, have
a database, deal with storage allocation issues, etc. So we start by ensuring we have the
"best" APIs for these functions - APIs that are easy to use and which
are highly functional. And because
they're re-usable, we benefit more than once.
Not only do we have these advanced, portable modules ready for any
project, but their reliability has already been tested out on other
projects.
We don't
stop with a few module APIs, we go much further. For example, at Neuma we have modules for
command line interpreters, name directories, database query languages, macro
definition and substitution, networking, GUI generation, and many more. Most of these were largely developed in the
1980's and 1990's, but they still evolve to support, for example, new GUI
architecture advances. After a couple of
decades, they have become very stable and very reliable.
In
effect, generating these libraries of re-usable APIs allows us to build new
applications very rapidly. We're
basically using a very high level language - even beyond that of a Perl or
Visual Basic. And that's our goal, to
keep making it easier to build new applications, even to the point of exporting
this capability to the run-time environment.
A Glimpse Inside Neuma
So how
does Neuma do design? Or better yet, how
did Neuma build a "reliable" CM product? What did we do right and what did we do
wrong?
First, we
decided to address areas where we were very strong. These were primarily Database and
Configuration Management, where, as founder, I had over a dozen years of
experience in helping to architect a couple of major Telecom products which
were highly successful. Even before we
decided to focus on CM, Neuma began building a Next Generation Database (NGDB)
because we saw a need for it in the marketplace and we were good at doing
it. We especially saw a need for a
small-IT database.
We
focused, as we did in the telecom industry where 4 hours of downtime in 40 years
is acceptable, on reliability.
Reliability has to be planned into a system, by keeping it simple and
by minimizing the dependence on complex or non-universal OS platform features. We built our own platform-independent APIs
and ensured that they did what they were supposed to do on all platforms. And we made sure that the APIs were easy for
developers to use. We paid attention to
the order, the types and the naming of parameters.
Having
had years of experience in compiler design, database design, operating system
design and configuration management design, we started with a number of design
guidelines - you know, like don't use goto's when coding. But ours were more extensive than that. Like paying attention to how to design each
function of an API - whether to have one function with a lot of options or
several functions with few options, or both, and in what order the parameters
should occur. Like using C pointers only
in a very restricted manner - to pass arguments and to reference entire records
of internal or database data. Like being
consistent across all of our software, especially with coding standards, and
training new employees on the design rules in such a way that they could
understand how each rule helped. Like
establishing good naming standards that helped code readability rather than
hindered it. Like instituting peer code reviews with a focus on design rules
and refactoring as part of those reviews.
Some
might be surprised that on the C-language side of things, we actually
significantly restricted the use of the language. C is too flexible. We replaced the "case ... break"
with a "_case" macro, ensuring the previous case was ended before the
new case. We disallowed loops where the
conditions were at the end of the loop rather than at the beginning - simply
because it makes the code harder to read.
We severely restricted how pointers could be used. Of course we eliminated "goto" and
statement labels (other than for "switch" statements). We eliminated nested functions, simplifying
scoping rules and increasing re-use. We
assumed that #define's would eventually be replaced with real variables and so
named the defined variables contrary to the usual UPPER CASE convention. We
replaced C-Strings with our own descriptor-based strings so that strings would
not have to be copied to point to sub-strings. And so forth - whatever we could
do to reduce errors and improve readability and simplicity.
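As a minimal sketch of two of these restrictions - the actual Neuma definitions aren't shown here, so the details below are assumptions for illustration - a "_case" macro and a descriptor-based string might look something like this in C:

    /* One way to write a "_case" macro: it emits a break before the new case
       label, so the previous case can never fall through into this one.
       (The break before the first case of a switch is unreachable but legal.) */
    #define _case    break; case
    #define _default break; default

    /* A descriptor-based string: a pointer plus a length.  A sub-string is
       just another descriptor into the same buffer - nothing gets copied. */
    typedef struct {
        const char *ptr;
        int         len;
    } Str;

    static Str SubStr(Str s, int start, int len)
    {
        Str sub;
        if (start < 0)            start = 0;
        if (start > s.len)        start = s.len;
        if (len > s.len - start)  len   = s.len - start;
        if (len < 0)              len   = 0;
        sub.ptr = s.ptr + start;
        sub.len = len;
        return sub;
    }

With the macro, a switch reads "switch (kind) { _case 1: ...; _case 2: ...; _default: ...; }", and a forgotten break simply cannot cause one case to bleed into the next.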
We
introduced some basic guidelines.
Simplicity is more important than optimization. Keep functions under 50 lines, generally. Use Mixed Case for global names, including global
variables (with guidelines), function names, and type names - no abbreviations
here, unless they were well-defined acronyms (e.g. DB) - and lower case for
local names, including field names and local variables. As local names had full context within a few
lines, shorter names were recommended to make the code more concise and hence
easier to read. We settled on common local variable names,
such as str for a general string, pos for a position within a string, i, j, and k for
normal arbitrary loop indexes, and n and count for counts. We focused more on good names for boolean
locals. And we made sure that the names
chosen made the code (e.g. if statement) more readable. This is especially important with booleans,
arrays, and function names.
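As an illustration only - the function and variable names below are invented for the example, not taken from Neuma's code - these guidelines produce code that reads something like:

    /* Mixed Case for globals and functions; short lower-case locals whose
       entire context is a few lines; booleans named so the "if" reads well. */
    static int LineCount;                /* global: Mixed Case, no abbreviation */

    int CountNonBlankLines(const char *str, int len)
    {
        int pos, count;                  /* conventional short local names */
        int blank = 1;                   /* boolean: is the current line blank? */

        count = 0;
        for (pos = 0; pos < len; pos++) {
            if (str[pos] == '\n') {
                if (!blank) count++;
                blank = 1;
            } else if (str[pos] != ' ' && str[pos] != '\t') {
                blank = 0;
            }
        }
        if (!blank) count++;             /* final line with no trailing newline */
        LineCount = count;
        return count;
    }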
We
focused on constant code refactoring, because software architecture was
everything. Copy and paste was replaced
with factoring, and this not only kept the code minimal, but also eliminated the
multiplication of bugs with each copy and paste operation. It further allowed us to improve the factored
code rather than having to try to track down all of the copy/paste locations.
Getting the Requirements Right
Neuma's
CM product evolved over a period of about 20 years (so far). So how did we get the requirements right, or
did we? Well first of all, the wealth of
large project CM experience helped. We
knew, for example, that change packages (aka "updates") had to be at
the center of any solution. We also knew
the value of seamlessly integrating applications and the derived
benefits. But it was still not
easy. For one thing, GUIs were rare and
still in their infancy 20 years back. Networking dealt with connecting mainframes
together (except in the Unix world), and not users. And our CM tool experience was based on fixed
company requirements, once for an IBM mainframe supporting hundreds of users,
and once for a small network of DEC Vax mainframes supporting a couple hundred
users. Keeping the command line
interface simple was important. So was
the architecture of each language we were using, in both cases proprietary.
The
focused in-house requirements gave us a tremendous blind spot, especially
because Windows wasn't mainstream for development, and Unix was just expanding
its foothold. There were no File System
standards to adhere to (i.e. make the design architecture mimic file system architecture). As a result, our first releases of CM+
focused on a complex Folder/Module/Section paradigm, where each Module, which
shared a common base name, was composed of several Sections, identified by the
file Suffix. For example a C-module had
a .h and a .c component, and in our case a .x component as we preferred (and
still do) to keep "externals" separate from all other header
definitions. An Oracle form had a
different set of sections. An Assembler
language module had a .inc and a .asm component.
Although
the product let you define your own module types (called groups), and this was
a selling feature for some shops, we soon realized that every shop worked
differently - and it wasn't always easy to package things into neat module
groups, especially because the groupings overlapped. As a result, it became nearly impossible to
automatically load in a new project.
Even if all of the module groups were defined up front, when the system
encountered an overlapping section (i.e. one that could be part of several
different groups), it didn't know which group to assign it to.
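As an illustration of the problem - the table below is invented for the example, not CM+'s actual group definitions - a module-group table with overlapping suffixes might look like this in C:

    /* Each module group maps a set of file suffixes ("sections") onto one
       logical module sharing a common base name. */
    typedef struct {
        const char *group;          /* module group name */
        const char *sections[5];    /* suffixes making up the module, NULL-terminated */
    } ModuleGroup;

    static const ModuleGroup Groups[] = {
        { "c_module",      { ".h", ".c", ".x", NULL } },
        { "asm_module",    { ".inc", ".asm", NULL } },
        { "pascal_module", { ".pas", ".inc", NULL } },  /* ".inc" overlaps with asm_module */
    };

When a new project is loaded automatically and a lone ".inc" file is encountered, nothing in the table says whether it belongs to an asm_module or a pascal_module - exactly the ambiguity described above.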
As well,
in the old days, apart from Unix, executables were often built from all of the
files in one directory. There was no
overlapping name space, and it was easy to go from a file name to exactly which
executable (or executables, if it was shared) it belonged to. This flat name
space made things very easy and a few of the older CM tools adopted it. But in the end, as the Hierarchical File
System took precedence and users wanted the same name in different products,
and, especially with O-O Design, the same name in different subsystems of the
same product, we had to admit that our design was inadequate.
Our first
attempt to fix the problem was to allow a flat name space per product. But this was inadequate. This resulted, in the mid-1990s, in Neuma
having not only to completely redo its product file-handling architecture,
but also to improve its context specification ability. In a flat name space, some aspects of context
aren't as important from a file management perspective. In a hierarchical,
product-based, overlapping name space, they are crucial. Furthermore, through all of this, we had to
ensure that our existing customers would have the option of continuing with the
flat name space or moving to an overlapping name space.
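To make the difference concrete, here is a rough C sketch of what the change implies for a file reference. The structure and field names are assumptions for illustration, not CM+'s actual data model.

    /* Flat name space: the base name alone identifies the file. */
    typedef struct {
        char name[64];              /* e.g. "parser.c" - must be unique in the product */
    } FlatFileRef;

    /* Hierarchical, product-based, overlapping name space: the same base name
       may exist in several products or subsystems, so a reference only has
       meaning together with a context. */
    typedef struct {
        char product[32];           /* which product */
        char subsystem[32];         /* which subsystem within the product */
        char directory[128];        /* path within the subsystem */
        char name[64];              /* base name - no longer unique on its own */
    } QualifiedFileRef;

Once the bare name stops being unique, every operation needs a context - which product, which directory, which stream - to resolve it, which is why the context specification ability had to improve at the same time as the file-handling architecture.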
The point
is, it was not easy to get the requirements right. And requirements continue to evolve. So what's the solution?
Solution Architecture
One of
the main reasons we were able to weather the storm is that we focused on
architecture at different levels. We did
not need to know what the CM requirements were to understand what a NG Database
must look like to support a general set of engineering applications.
On top of
that, we knew from the start that automation and zero administration were
important goals. Even after completing
the initial NGDB architecture, we took the time to understand what potential
clients said was number one in our target market requirements - customization -
making the tool work the way the customer wanted. This molded most of our efforts beyond the
NGDB design. We would seriously consider
whether or not customization would be required for each feature and err on the
"yes" side. But we would also
consider how to build an architecture that was easy to customize.
And when
GUIs came along, this became a priority as well. If every site had different
customizations, we did not want to get into customers having to paint forms,
create dialogs, etc. We wanted the tools
to do the tedious work, while the customer just identified what information was
desired.
In fact,
with each release, one of the largest components of the release is to support
customization more easily and more widely.
If it's easier for the customer, it's easier for us to sell and to
support. So the business case for this
effort is easy.
At the
same time, we would not compromise on reliability. This meant simplicity where possible,
especially when interfacing to outside elements. A multiple site solution has to interface
with outside elements, and so must be kept simple if automation is to result. An automatic baseline capability is anything
but simple, by definition, but does not have to interface to outside elements,
as long as all of the information is in the CM repository.
Aim High
It's
complex, and yes, gut-wrenching, to bite off more than you can easily
handle. But if you don't bite off
enough, you pay for it later. The single
biggest problem with the software industry, from a solution perspective, is
that it rarely aims high enough, with notable exceptions. It says: this is what we know how to do, so we'll
provide this bit of functionality.
Eventually the architecture gets maxed out.
Instead,
identify what the solution needs are - zero administration, small footprint, etc. - and make your architecture stick to them. In the end it will pay for itself. We frequently get people asking "How can
you fit so much in so small a package?"
It's because we aimed to support a full ALM solution from the
start. We didn't have to build 10
solutions and then add additional glue to partially integrate them. We identified from the start that an ALM
solution was necessary, and that we didn't understand where the boundaries of
ALM would eventually end up.
If you
hire different people to build the different walls of your house, it will cost
more, there will be more glue, it will take longer and your overall structural
integrity will suffer. Identify the
common properties and make the tools and materials used to build all of the walls the same (drywall,
aluminum beams, insulation type, processes, etc.). Then it's easy to look at the structural
integrity up front.
Don't try
to build reliability into a software product after all the pieces have been
built - it won't happen. The increase in
complexity will negate any attempts at reliability. And so too with the other attributes: we want the product to be small-IT in the
end, not big-IT.
What About CM
So there
are some lessons in the CM world - CM vendors take note. These are simply basic requirements. The ALM applications have to work
together. We want easy-to-use
low-training applications. Zero
administration. Small footprint. Easy
branching. Easy baseline definition.
Change-based management. Support for
development streams and, in general, more support for the normal way of doing
things - make that easier than the exceptions.
Support for Multiple Site operation.
Easy backups. High reliability
and availability. Easy to customize,
extensively if necessary, while eliminating the necessity as much as possible.
And so forth.
It's not
sufficient to look at a piece of the solution and apply good architecture to
each piece. It has to be a solution-wide
effort. Making Multiple Site operation
work for the files in the solution does not give me a Multiple Site ALM
solution. Consistent backups for the
file repository do not give me consistent backups for the entire ALM
repository. If even one piece has high
administration, the solution will appear to have high administration. The same goes for reliability - the weakest
link will be the most noticeable.
The first
two generations of CM tools did not abide by these lessons. The next generation tool must, or it won't be
considered a third generation tool. Good
architecture will be much more apparent in third and fourth
generation CM tools - in their cost, risk, roll-out time, resource requirements,
reliability, and accessibility, to name a few.
To Sum Up
So if you
want to build better applications, and applications that are more reliable:
- Understand your requirements well, and expect them to change
- Aim High, not only to meet, but to exceed your requirements
- Put Architecture first, at the solution level and at the design level, using experienced architects
- Generate re-usable components that will grow in reliability over time, and learn how to build APIs that are widely re-usable
- Train your development team on your architectural requirements and guidelines, and do peer reviews against your guidelines
- Understand the processes surrounding your application, and improve on them
Yes,
you'll still have to document your product requirements well, especially so
that you can verify them against the candidate releases. But you'll also be able to better withstand
the storms of changing requirements, evolving standards and competition.
I'm sure
I've only scratched the surface, but maybe I've also ruffled a few feathers -
Let's hear from you.
Joe Farah is the President and CEO of Neuma Technology. Prior to co-founding Neuma in 1990 and directing the development of CM+, Joe was Director of Software Architecture and Technology at Mitel, and in the 1970s a Development Manager at Nortel (Bell-Northern Research) where he developed the Program Library System (PLS) still heavily in use by Nortel's largest projects. A software developer since the late 1960s, Joe holds a B.A.Sc. degree in Engineering Science from the University of Toronto. You can contact Joe by email at farah@neuma.com