Tuesday, August 28, 2007

Application Visibility

So it's your first week at the new job and you are charged with building a new product for one of the largest customers in your organization's history. After a short period of flattery, the gavel that is reality smacks down hard to let you know that your ass is on the line. You have much-needed guidance from your team and extremely competent management that understands software - but ultimately success or failure lands at your doorstep.

So you are in pretty deep - what's the plan?

  1. Carefully test your code
  2. Perform code reviews
  3. Put your pride away and ask for help - even if you think you have the right answer you may not!
  4. Be honest

Fast forward to a few days after you have successfully released your application into production. You stuck to the plan: you tested very carefully, you had your peers scrutinize your work, you asked for help, and you reported the state of the project honestly and consistently. The customer was surprised that things went so well. Congratulations, you just did the impossible: exceeding expectations in a situation where things could have easily gone horribly wrong.

After a week in production some issues are discovered. It's your application, so you need to fix it. Easy enough, right? Just plow through thousands of log entries and piece together what went wrong. Not exactly. You completely neglected to think about what types of problems might occur in production, so you are not logging the information needed to investigate the customer's complaint. You have no visibility into your application. The only course of action at this point is to retroactively update the code in production to give you the visibility needed to solve the problem.

The mistake was costly in terms of personal pride and hardship, but the lesson is valuable. You learn that there is no way of telling what an application is going to do in a high-volume production environment where real data is being passed in. Thus, it is absolutely essential that you build in enough visibility to resolve issues that appear in production. You also learn that nothing of value comes easy.
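To make "building in visibility" a little more concrete, here is a minimal sketch using Python's standard logging module. The logger name, the `process_order` function, and the shape of the order data are all hypothetical, made up for illustration; the point is only that each log entry carries the identifiers and inputs you would want when a customer complaint comes in.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def process_order(order: dict) -> float:
    """Hypothetical handler: total an order, logging context needed for later debugging."""
    order_id = order.get("id", "<missing>")
    logger.info("processing order id=%s", order_id)
    try:
        total = sum(item["price"] * item["qty"] for item in order["items"])
    except (KeyError, TypeError):
        # Record the offending input and the stack trace, then re-raise
        # so the caller still sees the failure.
        logger.exception("malformed order id=%s payload=%r", order_id, order)
        raise
    logger.info("order id=%s total=%.2f", order_id, total)
    return total


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    process_order({"id": 42, "items": [{"price": 9.99, "qty": 2}]})
```

What to log is a judgment call. Real payloads may contain data that should not end up in log files, so treat the `payload=%r` line as an assumption that fits this toy example, not a rule.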

1 comment:

Anonymous said...

Scary - you just described my past week :) Not the first-week-of-the-job kind of thing, but the week where some bizarre production issues arise and you have to troubleshoot and then explain what the heck happened.

Added some more logging earlier today...

I will also add that business context and visibility are also vital. That was one aspect of our system that we didn't fully understand, and it came to a head with the production issue...