Making Sense of Agile for Management

From my very first corporate software development project, I remember thinking that there was something profoundly wrong with the way software projects were managed: trying to set your plan in stone up front and then manage your work to that schedule always seemed impossible. It was never clear what value we were getting from our estimates and schedules. They were never accurate until the end of the project, and, as a manager, I spent inordinate amounts of time revising the schedule and resetting expectations with customers.

At Salesforce.com, we’ve transitioned to Scrum over the last year. I’d dabbled with some of the Agile methodologies in previous jobs, but this is the first time I’ve been part of a complete implementation of the practices. I have to say it’s also the first time I’ve ever felt that the things we do for project management all make sense and really add value over the course of a project. It makes us better at shipping product.

In the process, I’ve read several books, but the one I’ve found most useful as a manager has been Lean Software Development, by Mary and Tom Poppendieck. It goes beyond the individual practices to give a framework for why Agile works, and it ties that framework back to examples from other companies and industries. I found that it really helped pull the big picture into focus: it helped me understand where to spend my energy during the transition to get the most out of the practices.

Zero Tolerance Manifesto

Based on some conversations at my present job, I decided to write up what I learned during my time at BEA. These are the things it takes to keep a large development team productive. It only really works when these things are built into the culture. You will need tools and infrastructure to make this practical, but if you instill this into your culture, your team will build what you need when you need it.

Just to be clear, the most important thing you do is check new or improved code into the product. The things below are preconditions to checking in code, the things you do to be a professional developer. If you’re doing this right, they will speed you up, not slow you down. They are not an excuse to decrease your sense of urgency about moving your project forward. Have you checked in code yet?

These are in order of priority:

  1. Don’t f’ing break the build. Ever. Here are the steps you *must* take to avoid breaking the build:

    1. Build your changes before you check them in. It’s hard to believe I have to say this, but experience says I do. I know it can be tempting. But don’t do it. Make sure you sync up with the product line first; building against a month-old copy of the source doesn’t really tell you anything.
    2. Stick around after you check in and make sure the group build succeeds. If you broke it, DROP EVERYTHING AND FIX IT. Fix it like your job depends on it. Because it does.
    3. Don’t check in to a broken build. The second you check in to a broken build, you become part of the problem, and you’d better start looking for the solution. Everyone hates to wait, but that’s what you should do. Or better, find the person who broke it (or his friends) and make them fix it. Or better still, figure out what’s broken and propose the fix.

    If you want to break the build in the privacy of your own machine, that’s your business. The second you break the group build, the morale and productivity of every person in the product group dives toward zero. Unacceptable. (A minimal pre-checkin script sketch follows this list.)

  2. Don’t break tests.

    1. Run the tests for all affected areas and fix any regressions *before* you check in.
    2. Check the results of the group tests after you check in; assume any failures are yours until you explicitly rule them out. If you broke tests, fix them before you work on anything else.
    3. Broken tests are not an excuse to ignore the tests. Be sure your change doesn’t make the situation worse. If the broken tests directly cover the area you are going to be working on, you should probably fix them before you make your change. If the product isn’t currently at a 100% pass rate, then you’ll have to baseline the test results before you make your change and compare them to the results after your change (the script sketch after this list shows one way to do that). Yes, this sucks, but do it anyway.

    Tests are the team’s safety net. Having all the tests passing all the time makes it dirt simple to figure out whether you’ve broken anything with a change. When it’s dirt simple to figure that out, you will have more confidence in your ability to make changes safely; you will write more code and won’t shy away from refactoring or taking on changes in new areas.

  3. Write tests. Lots of ’em.

    1. Write unit tests for the public methods on all your classes.
    2. Write functional tests to verify that the key requirements of your feature are working.
    3. Write functional tests to verify that your dependencies on other parts of the product are still being satisfied. Think of these as diagnostics: when they break, you should know where to look to solve the problem.
    4. When something turns up broken without a test, write one (see the test sketch after this list). Don’t let the same regression happen twice.

    Tests are your safety net, even for your own code. If you want to be productive, having tests that show something is working gives you a great sanity check on your code, but it also pays dividends over the long haul: your tests keep you from breaking your own code.

    Tests are also your shield. If you write a feature and someone later discovers that some part of the feature that didn’t have a test is now broken, guess who’s going to have to fix it? You. Even if it was someone else’s change that broke the feature. The best way to avoid this is to have tests for every aspect of your code that matters. Then it’s everyone’s responsibility to keep those tests passing with every checkin, and you don’t have to worry about someone else breaking your stuff.

  4. During parallel development, integrate changes early and often.

    1. When you fix a bug or finish a feature in version X, integrate it forward to version X+1 immediately (apply rules 1-3 to integrations too). No sense waiting; it’s only going to get harder.

    It is inevitable that at some point you will have to work in two branches of the code at once, usually because one release is ramping down while another is ramping up. When this happens, it’s important not to let the backlog of differences between the two versions build up. The sooner you integrate a change forward, the more likely it is to work and the more likely you are to remember what you changed and why.

  5. Fix bugs right away.

    1. When a bug comes in, fix it. Whatever you’re working on, come to a natural stopping point, then set it aside and fix your bug. If a bug turns out to be too large to be accommodated in the slack of your current sprint, put it on the backlog, and do it first thing next sprint.
    2. Leave slack in your sprint commitments to accommodate bug fixing.
    3. Don’t defer bugs indefinitely. If a bug isn’t important enough to fix now, then you should consider closing it. This is especially true for large bugs or minor enhancements of marginal value. Keeping it around is a drain on your attention, and it invites you to keep other bugs with it. Don’t be careless, but don’t be overly cautious either; if a bug’s important, it’ll come back.

    It is inevitable that bugs will sneak through even the most exhaustive testing suite. The unfortunate part is that these bugs always come up after you’ve moved on to something else. This is not an excuse to tolerate bugs, however. If you put off fixing a bug until “someday,” chances are you won’t fix it at all, or it will take you a lot longer than if you just fixed it today.
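
Most teams end up scripting this discipline so nobody has to rely on memory. Here’s a minimal sketch, in Python, of what a pre-checkin gate for rules 1 and 2 might look like. The git and make commands and the baseline.txt/results.txt file names are assumptions for illustration, not a prescription; substitute whatever your source control, build system, and test harness actually use.

```python
#!/usr/bin/env python
"""Minimal pre-checkin gate: sync, build, test, compare against a baseline.

Everything here is a placeholder: swap in whatever your source control,
build system, and test harness actually use.
"""
import subprocess
import sys


def run(cmd):
    """Run a command, echoing it first; return its exit code."""
    print("> " + " ".join(cmd))
    return subprocess.call(cmd)


def failing_tests(results_file):
    """Parse a results file with one 'PASS <name>' or 'FAIL <name>' per line."""
    with open(results_file) as f:
        return {line.split()[1] for line in f if line.startswith("FAIL")}


def main():
    # Rule 1, step 1: sync up with the product line, then build, *before*
    # you check in.
    if run(["git", "pull", "--rebase"]) != 0:
        sys.exit("sync failed -- resolve conflicts before checking in")
    if run(["make", "build"]) != 0:
        sys.exit("build failed -- do not check in")

    # Rule 2, step 3: if the suite isn't at 100%, don't demand a clean run;
    # instead compare against a baseline captured before your change.
    # (Assumes 'make test' writes results.txt in the PASS/FAIL format above.)
    run(["make", "test"])
    regressions = failing_tests("results.txt") - failing_tests("baseline.txt")
    if regressions:
        sys.exit("new regressions -- fix before checking in: %s"
                 % ", ".join(sorted(regressions)))

    print("clean: OK to check in (then stick around for the group build)")


if __name__ == "__main__":
    main()
```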
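
And here’s rule 3, step 4 in practice: a regression test checked in alongside a bug fix, sketched with Python’s unittest module. The parse_price function is invented purely for illustration.

```python
import unittest


def parse_price(text):
    """Hypothetical function under test: parse a price string like '$1,299.99'."""
    return float(text.replace("$", "").replace(",", ""))


class ParsePriceRegressionTest(unittest.TestCase):
    def test_basic_price(self):
        # Unit test for the public method, per rule 3, step 1.
        self.assertEqual(parse_price("$19.99"), 19.99)

    def test_thousands_separator(self):
        # Regression test, per rule 3, step 4: imagine parse_price("$1,299.99")
        # once raised ValueError because of the comma. Checked in alongside
        # the fix so the same regression can't happen twice.
        self.assertEqual(parse_price("$1,299.99"), 1299.99)


if __name__ == "__main__":
    unittest.main()
```

Once that test is in the suite, the comma bug can’t sneak back in without turning the build red.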

grid sweet spot?

Grid computing seems to be climbing the hype curve. I can’t claim to be an expert on it, but I have to say it has the smell of improbability.

Distributed computing architectures are in most cases more art than science. While some problems are easily decomposed into tiny pieces that can be solved in parallel, others clearly are not. And the critical bottlenecks are often communication, not computation: you spend your time finding the right ways to package up the problem and connect the pieces so you avoid moving data around unnecessarily.
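
To make the communication bottleneck concrete, here’s a back-of-envelope calculation; every number in it is an assumption I made up for illustration.

```python
# Back-of-envelope: when does shipping work to another machine pay off?
# Every number below is a made-up assumption, not a measurement.

data_bytes = 100e6       # input you would have to ship to the remote machine
network_bps = 12.5e6     # ~100 Mbit/s of effective throughput, in bytes/sec
local_secs = 30.0        # time to just compute the answer locally
remote_secs = 10.0       # compute time on the remote (otherwise idle) machine

transfer_secs = data_bytes / network_bps       # 8 seconds spent moving data
total_remote_secs = transfer_secs + remote_secs

print("local: %.0fs, remote: %.0fs (%.0fs of that is moving data)"
      % (local_secs, total_remote_secs, transfer_secs))
# Here the grid wins (18s vs. 30s), but shrink local_secs or grow data_bytes
# a little and the grid is pure overhead -- which is the "art" part.
```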

You can certainly develop algorithms for distributing the computation that adaptively address these problems; however, any time you ask a computer to do something that has a degree of “art” to it, you have to accept a certain amount of slop.

Grid computing’s sweet spot seems to lie at an improbable location in the terrain: a place where computers are too cheap to be managed directly by people but too expensive to be left idle. The improbability of this location only grows with the still-steady march of Moore’s Law and the inevitable friction you’d have to accept in the system.

When you add the challenges of trying to develop, debug, and manage such a grid, I think you are forced to conclude that grid computing is likely to see more theory than practice.

I think the reason it’s so popular is that it is technically cool and it appeals to ideals widely held by engineers (e.g., the efficient use of resources).

tim bray, on search: metadata

Tim Bray has been running a series on search that I’ve finally gotten around to reading. Nice series: it gives a basic tour of the technology, the hard problems, and one person’s opinions on what works and what doesn’t.

I especially liked his article on Metadata. Whether it’s Yahoo’s directory or Google’s PageRank, metadata is what really makes the difference in results. Metadata is hard to come by, so you should take every chance you get to collect it. But don’t expect it to come cheap, and don’t expect your users to create it just for you.