Neuma White Paper:
Integrated Tools Enhance Distributed Development
When I look at the prospect of a distributed development effort, it scares me. So much depends on having the right people, good communicators, in the right places. So much depends on successfully merging cultures. But more and more distributed development is taking place.
So how do you increase your chances for success? I'd like to focus on four key areas.
- Process and Management
- Communication and meetings
- Reviews and Approvals
- Tools and Data Repository
Why only these areas? Because I feel the rest of it is not a whole lot different from development at a single site. Ideally, the goal should be to make these four areas behave as if the distributed development were being done at a single site.
Process and Management
Management deals with process. Certainly, having managers who are not territorial and who communicate easily is key. But having effective, well-defined processes is even more important. Good management happens when good processes are defined and followed. The difference between a poor manager and a great one shrinks considerably when there are good processes and good tools to support them.
Processes may be tailored to time zone differences, but they should generally be "universal" - that is, they should apply across all sites involved in the development. Universal processes make management easier.
Ideally, your process will work equally well for a single site and for distributed development. A good tool will help you support and enforce your process. If your process is defined by repository data then, depending on the type of repository solution you choose, you should be able to update your process at all sites at the same time through a normal data update transaction.
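To make "process defined by repository data" concrete, here is a minimal sketch, assuming the process is stored as a table of allowed state transitions per item type. The names `process`, `can_transition` and `update_process` are hypothetical and not part of any particular CM tool; the point is only that changing the process is an ordinary data change.

```python
# Hypothetical process definition held as ordinary repository data:
# for each item type, each state maps to the set of states it may move to.
process = {
    "problem_report": {
        "open": {"assigned"},
        "assigned": {"fixed", "open"},
        "fixed": {"verified", "assigned"},
        "verified": set(),
    }
}


def can_transition(proc, item_type, current, target):
    """Return True if the process data allows item_type to move from current to target."""
    return target in proc.get(item_type, {}).get(current, set())


def update_process(proc, item_type, state, allowed):
    """Changing the process is just another data update. Returns a new definition;
    in a replicated repository this update would be distributed like any other transaction."""
    updated = {t: dict(states) for t, states in proc.items()}
    updated.setdefault(item_type, {})[state] = set(allowed)
    return updated


# Usage: allow trivial problem reports to go straight from "open" to "fixed".
new_process = update_process(process, "problem_report", "open", {"assigned", "fixed"})
```

Because the tools consult the data rather than hard-coded rules, every site enforces the new rule as soon as the update has been applied there.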
Communication and Meetings
Communication across distance is essential - even if it's just to put a voice to a name. If you're doing distributed development these days, you will certainly want some video-conferencing capability, even if it's just from your desktop. Putting a face to a name is better than just a voice. It's important to hear someone's tone, but watching someone's reaction is also a huge part of communication. If you don't believe me, ask your spouse. For this reason, I'm really surprised video phones never caught on - maybe they will after the VoIP revolution.
Good telecommunications tools are a big help, but your repository and process tools can also help immensely with communication. If all of your data and processes are defined through your tools, your communications and meetings can focus on non-compliances and variations, because reports and status information are available before the formal communication begins. Choose your tools carefully - they should be the hub of your communications. Track meetings in your tool, or store the video and/or messenger clips directly in it. Being absent from a meeting is a lot worse across distance, so do all you can to let participants time-shift their participation.
Reviews and Approvals
Reviews and approvals are a special form of communication. Especially where there is a significant time zone shift, your tools must provide help here. To-do lists, electronic approval, document repositories and change-package based code reviews help ensure that a review can be done as necessary, even without holding a meeting. Written reviews are always recommended: they are more objective, leave an audit trail, and provide a checklist for post-review actions. Even better, they work well under the constraints of distributed development. When your CM tool can generate a change-package based delta, a single change identifier is enough to identify the code review task, and the code review can be done online by anyone at any time.
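As an illustration of a change-package based delta, the sketch below assumes a change package is simply a change identifier mapped to the (path, old text, new text) revisions it touched. `CHANGES`, `change_delta` and the identifier `chg1042` are hypothetical; a real CM tool would fetch the revisions from its repository. Python's standard `difflib.unified_diff` produces the reviewable delta.

```python
import difflib

# Hypothetical change package: one identifier maps to every file revision it touched.
# Each entry is (path, old_text, new_text).
CHANGES = {
    "chg1042": [
        ("src/timer.c", "int t = 0;\nreturn t;\n", "int t = 10;\nreturn t;\n"),
        ("doc/timer.txt", "Default is 0.\n", "Default is 10.\n"),
    ],
}


def change_delta(change_id, changes=CHANGES):
    """Build one unified diff covering every file in the change package,
    so a reviewer at any site needs only the change identifier."""
    parts = []
    for path, old, new in changes[change_id]:
        parts.extend(difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        ))
    return "".join(parts)
```

A reviewer would call `change_delta("chg1042")` and get the complete delta for both files in one piece of text, which can be attached to the review task and annotated asynchronously.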
Good tools should not eliminate oral communication. Some things will always have to be discussed, even if it's just to clear up misinterpretations of the tone of a less objective review. Significant issues, or issues that go back and forth a few times, should be resolved through a meeting or teleconference.
Tools and Data Repository
Tools interact with the data repository (or repositories). Looking at the previous three areas, it is clear that tools play a critical role in distributed development. The key, from my perspective, is ensuring that all sites, and the tools at each site, have access to all of the data, not just a portion. Data, along with good tools, is the key to making good decisions. (OK, good judgement and experience may play a part as well.)
There are three basic models for distributed development when it comes to data.
1. central repository with distributed access, through the web or otherwise
2. distributed repository with appropriate pieces at appropriate sites
3. replicated repository, with each site accessing its own local copy
I strongly recommend the third model. But if your tool sets require a number of different repositories, the recommendation is less strong - it's not easy to build distributed solutions on tools that don't share a common repository. And it's risky.
A central repository has its place, for example, if there are only a few users outside the main site. Solutions involving a thin client (e.g. remote desktop) with workspaces hosted at the main site can work reasonably well. But move the workspace (or the tools) out to the remote site and delay problems appear. Try doing a workspace differences operation against your repository over a 100 Kbps link instead of a 100 Mbps LAN. That's not a big problem if your workspace has only a few megabytes in it.
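The gap is easy to quantify. This sketch assumes the worst case, where the operation must pull the full workspace content across the link, and it ignores latency and protocol overhead. Many tools transfer less (checksums, for example), so treat the figures as rough upper bounds. The function name and workspace sizes are illustrative only.

```python
def transfer_seconds(megabytes, link_bits_per_second):
    """Idealized time to move `megabytes` of data over a link.
    Ignores latency, protocol overhead and compression (an assumption)."""
    bits = megabytes * 1_000_000 * 8
    return bits / link_bits_per_second


WAN = 100_000        # 100 Kbps remote link
LAN = 100_000_000    # 100 Mbps local network

for size_mb in (5, 500):
    print(f"{size_mb} MB: WAN {transfer_seconds(size_mb, WAN):,.0f} s, "
          f"LAN {transfer_seconds(size_mb, LAN):.1f} s")
```

Whatever the workspace size, the 100 Kbps link is 1,000 times slower than the LAN; what changes is whether the absolute delay is still tolerable.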
A distributed repository has its applications where some data must be kept away from some of the sites. This is often the case in a contractor/sub-contractor model, where the sub-contractor is only supposed to see a specific piece of the puzzle. But it's difficult to decide how to partition the data. What data goes where? How often is it synchronized back to the main site? How do I capture a consistent backup?
More often, though, we're dealing with organizations that want to share information across sites. Each site needs to make timely decisions, even when the other sites have gone to sleep for the night. In some organizations, the user's workspace is on his/her laptop, and the user moves from site to site.
A replicated repository strategy meets these needs. The repository looks the same from any site. The tools work the same at every site. At a minimum, a master site (or a quorum of sites) is required to sequence transactions so that all sites can ensure that they are processed in a consistent order, with the same results.
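To illustrate the ordering requirement, here is a minimal sketch of the receiving side, assuming transactions are simple key/value writes tagged with a sequence number assigned by the master. The `Replica` class is hypothetical; a real repository would apply far richer transactions and persist its state. The point is that two sites receiving the same transactions in different orders still end up identical.

```python
class Replica:
    """One site's copy of the repository. Transactions carry a sequence number
    assigned by the master (or quorum); the replica applies them strictly in that order."""

    def __init__(self):
        self.state = {}
        self.next_seq = 1
        self.pending = {}   # transactions that arrived early, keyed by sequence number

    def receive(self, seq, key, value):
        if seq < self.next_seq:
            return          # already applied (e.g. a retransmission); ignore it
        self.pending[seq] = (key, value)
        # Apply every transaction that is now contiguous with what has been applied.
        while self.next_seq in self.pending:
            k, v = self.pending.pop(self.next_seq)
            self.state[k] = v
            self.next_seq += 1


# Usage: the same transactions delivered in opposite orders to two sites.
txns = [(1, "x", "a"), (2, "x", "b"), (3, "y", "c")]
site_a, site_b = Replica(), Replica()
for t in txns:
    site_a.receive(*t)
for t in reversed(txns):
    site_b.receive(*t)
```

After both loops, `site_a.state` and `site_b.state` are both `{"x": "b", "y": "c"}`: site B simply holds transactions 3 and 2 until transaction 1 arrives.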
A replicated repository has other benefits as well. Each site acts as a backup of all of the other sites. This also forms an important part of any disaster recovery planning. And in the case of a disk crash, or any other interruption of service to the local repository, it's easy enough for a user to connect to a remote repository, even if at a slower speed.
Choice of Tools
When it comes to talking about distributed development, there's too much focus on the version control repository. That's important, sure, but it doesn't contain the key information. It's the management information, not the code base, that's going to make distributed development work: the process and the communication channels. A version control repository does little to help here.
Management data ranges from requirements to test data, and from documents to customer tracking data. It's not easy to share all of this across sites if you're planning on gluing a bunch of tools together, especially if they rely on different repositories.
Personal tools such as editors and compilers are not so much an issue with distributed development. The key reason is that they generally don't share corporate data, but instead work within the workspace or sandbox.
So what else is there to look out for?
Well, first of all, is there a lot of administration involved in setting up a new site? If so, something can and generally will go wrong, and coordinating complex administration across long distances can be tricky. Not only do you want low administration, you also want flexible timing. Can I create a synchronization point on day 1 and not deploy the site until day 14, when I'm physically there? Is it easy to add a new site a few weeks or even months down the road without having to create a new synchronization point?
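One way flexible timing can work is to start the new site from an old synchronization point and then replay the transactions logged since. The sketch below assumes the master keeps an ordered transaction log reaching back to the sync point, and models transactions as key/value writes; `take_sync_point` and `bring_site_online` are hypothetical names, not any product's API.

```python
import copy


def take_sync_point(state, last_seq):
    """Capture a synchronization point: a copy of the repository plus the
    sequence number of the last transaction it reflects."""
    return {"state": copy.deepcopy(state), "seq": last_seq}


def bring_site_online(sync_point, txn_log):
    """Start a new site from a (possibly old) sync point, then replay every
    logged transaction made after it. Assumes txn_log is ordered by sequence
    number and still holds everything since the sync point."""
    state = copy.deepcopy(sync_point["state"])
    for seq, key, value in txn_log:
        if seq > sync_point["seq"]:
            state[key] = value
    return state


# Day 1: take the sync point after transaction 1.
log = [(1, "x", "a")]
day1 = take_sync_point({"x": "a"}, 1)
# Days 2-13: work continues at the existing sites.
log += [(2, "y", "b"), (3, "x", "c")]
# Day 14: deploy the new site from the day-1 sync point.
new_site = bring_site_online(day1, log)
```

The same day-1 sync point could seed another site months later, provided the log has not been truncated in the meantime.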
Next, does the solution continuously verify the synchronization of data? What happens when your repository or tools are upgraded? If someone makes an upgrade mistake at one site, the data might drift out of sync. How soon will you find out? And even more important, how easy is it to recover? Recovery is likely easier if you find out sooner. Because you're dealing with multiple sites, an automatic recovery feature is a definite bonus.
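One way to check synchronization continuously is to have each site compute a digest of its data at an agreed point and compare it with the others. The sketch below assumes repository state can be represented as a JSON-serializable dictionary; `state_digest`, `drifted_sites` and the site names are hypothetical. It only detects drift; recovery (for example, re-copying the affected data from a healthy site) is not shown.

```python
import hashlib
import json


def state_digest(state):
    """Hash a site's repository contents. Canonical JSON (sorted keys, fixed
    separators) ensures identical data always yields an identical digest."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_sites(site_states, reference):
    """Return the sites whose digest differs from the reference site's digest.
    Assumes all states were captured at the same transaction sequence number,
    so in-flight transactions don't show up as false drift."""
    expected = state_digest(site_states[reference])
    return sorted(name for name, state in site_states.items()
                  if state_digest(state) != expected)


sites = {
    "site_a": {"chg1": "approved", "chg2": "open"},
    "site_b": {"chg2": "open", "chg1": "approved"},   # same data, different key order
    "site_c": {"chg1": "approved", "chg2": "closed"},  # drifted, e.g. after a bad upgrade
}
```

Here `drifted_sites(sites, "site_a")` flags only `site_c`; exchanging a short digest is cheap enough to do after every batch of transactions, which is what makes early detection practical.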
Also consider whether your solution requires each update/transaction to be processed at the master before it is processed at the slave sites. If so, a large transaction can impose significant delays on the slave sites. The master should only have to order the transactions, and perhaps help ensure they are distributed to all sites for asynchronous execution at each site.
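Here is a minimal sketch of a master that only orders transactions, assuming an in-memory queue per site stands in for the real network transport. The `Sequencer` class and site names are hypothetical.

```python
import itertools
from collections import deque


class Sequencer:
    """Hypothetical master that only orders transactions. It stamps each one with
    the next global sequence number and queues it for every site; it never
    executes the transaction itself, so a large update does not hold up the
    other sites while the master processes it."""

    def __init__(self, sites):
        self._counter = itertools.count(1)
        self.queues = {site: deque() for site in sites}

    def submit(self, txn):
        seq = next(self._counter)
        for queue in self.queues.values():
            queue.append((seq, txn))   # each site drains and applies its queue on its own schedule
        return seq


master = Sequencer(["site_a", "site_b", "site_c"])
master.submit({"op": "checkin", "files": 10_000})
master.submit({"op": "approve", "change": "chg1"})
```

Each site then applies its queue in sequence order at its own pace, so the only work serialized at the master is handing out the next number.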
Another factor: How difficult is it to upgrade your repository and tools? If new release upgrades are painful, then a replicated data solution will have painful periods. This is complicated further if your solution is a composite of tools, even if they run on a single repository.
If you identify a single-repository, end-to-end integrated solution with low administration and automatic problem detection and correction, you'll notice that distributed development is easier than you thought. Yes, you still have to pay attention to management and processes, but your solution gives you significant support there. Yes, you still have to ensure good communication, but again your solution will give you great support, especially if it supports the review process and electronic approval.
If you have reasonable bandwidth between sites, you may also find that you can connect to a remote site without realizing that you're not connected locally.
A warning, though: don't take technology at face value, especially when it's the face of a salesman. Bring it in and evaluate it for your distributed development on your own network. If that is a difficult process in itself, beware. Test some real use cases from all sites. Pay attention to administrative needs, scalability and reliability. Force errors and check out the recovery procedures. Load in 10,000 files and see what happens.
There are always going to be issues: very large files across slow links, network outages, etc. But in my experience, distributed development with a single repository and an integrated management suite can be pleasant, and is well within the reach of today's technology.
The measure of success is always going to be: How different is this from a single site scenario?
And if you need a recommendation, don't hesitate to ask. I've been using this technology for years and would be more than willing to discuss it with you. Even though I'm biased, that doesn't mean I can't be objective. I'd also welcome your own recommendations and comments as replies to this article.
P.S. Have you taken the time to look at the Spotlight Reviews?
Joe Farah is the President and CEO of Neuma Technology. Prior to co-founding Neuma in 1990, Joe was Director of Software Architecture and Technology at Mitel, and in the 1970s a Development Manager at Nortel (Bell-Northern Research) where he developed the Program Library System (PLS) still heavily in use by Nortel's largest projects. A software developer since the late 1960s, Joe holds a B.A.Sc. degree in Engineering Science from the University of Toronto.
You can contact Joe at firstname.lastname@example.org