Tuesday, August 30, 2005

Why corporate IT is melting down

This week, Winn Schwartau writes in his Security Awareness blog about why Windows is such a mess and why it has to fail:

http://securityawareness.blogspot.com/2005/08/mad-as-hell-xiii-reprise.html

In a comment on this article, I wrote about what happens when corporate idiocy is then combined with the WinTel problem of cheap PC's and bug-ridden software.

I think this comment is worthy of an article in its own right, so here it is, in an expanded form, since I can write more here than on a comment page.


It is human nature to not want to admit error. It is the nature of bureaucracies to flat out refuse to admit error, no matter what the cost. They would rather run the entire corporation into bankruptcy than do something that would be an admission of error. And this is with good reason. The one who admits a mistake gets blamed for everything that goes wrong, even if the mistake wasn't his decision and even if the things going wrong have nothing to do with the decision. People get fired from their jobs for admitting mistakes. People get blacklisted from whole industries if they admit mistakes in public.

This, in itself, is a disaster that affects most corporations. Now guess what happens when you get an IT department involved, an aging infrastructure, and a budget crunch.

Initially, everything is running smoothly. The corporation is using big iron for everything important. This is probably some combination of mainframes, minis, workstations, etc. PC's are used, but not for anything more critical than as terminals for accessing the equipment in the machine room. The equipment works well. Partly because very expensive equipment is designed better, partly because it is easier to design and test software when the hardware configuration is carefully controlled, and partly because the number of computers is small enough for the IT department to be able to support.

This all works great until the big iron starts costing too much money. Maybe the electric bills are too high (some old mainframes draw a LOT of power!). Maybe some parts have broken and need replacement. Maybe the annual maintenance contracts are getting too expensive. Maybe the manufacturer is dropping support for the old equipment. It could even be something as trivial as needing more hard drives.

At this point, the IT department is doomed. They would like to buy more of the same. Add more memory/disk to the mainframe. Replace one cluster of minis with the newest model. Move to the latest system software. Ideally, they want to keep everything exactly the way it is. But their bosses won't stand for this. They know an upgrade is needed, but they don't want to spend the money on new big-iron. They look through the latest Dell/Gateway/HP catalog and see that PC's cost $500 each, and PC servers cost $5000 each. They order the IT group to replace the mainframes with a network of PC's.

Sometimes, an IT manager can fight this. Most of the time, he doesn't dare. He can be fired and replaced with someone who will toe the corporate line. The decision has already been made, and made by people with absolutely no expertise.

So the PC's are installed everywhere. The IT managers get bonuses for saving money (if they can make their bosses believe the move to PC's was their idea), and the executives consider the case closed. Everybody pats themselves on the back for a job well done (except for the IT people who know exactly what's about to happen - usually the help desk staff.)

Soon, the PC's start failing, or other weird problems start happening. Users have random system crashes. Unwanted programs (spyware, viruses, worms, etc.) start installing themselves all over the place. Users bring programs in from home, even though there may be a policy forbidding it.

The IT help desk does their best to keep everything running smoothly. They patch, clean, upgrade, and reinstall the PC's as necessary. But the problem doesn't ever go away. This is partly because the hardware is cheap junk. Partly because individual (usually untrained) users are doing their own system maintenance (even possibly against corporate policy). Partly because hackers and script kiddies attack Windows far more often than any other system. Partly because the IT staff has not been properly trained to transition from mainframe maintenance to Windows maintenance. And partly because Windows really is very insecure and very expensive to maintain in a large networked environment.

So the users start complaining a lot. The IT help desk gets swamped with calls. There is never enough money in the budget to hire more help desk staff. Help desk staff burn out and quit and have to be replaced with new staff that don't have sufficient training. This forces the help desk to start using handbooks instead of analysis in order to keep up with the calls, degrading the quality of support and making users even more angry.

IT clamps down on security by installing draconian firewalls and proxies throughout the network. They lock users out of their own PC's in order to restrict who upgrades what. They download and test/review every patch from Microsoft and push the updates onto users' computers over the network.

But this isn't fast enough. Soon a virus arrives and trashes the network. It takes weeks to fully recover. Word gets around that Microsoft actually had a patch available to fix the security hole that the virus used, but it wasn't deployed across the corporate network because IT hadn't yet tested the patch against all the corporate software. Those users who had hacked their way around IT's restrictions and installed the update anyway, of course, weren't damaged by the virus.

In order to prevent this from happening again, IT turns on Windows' auto-update facility, where patches are automatically downloaded from Microsoft and installed. This prevents a recurrence of the problem, but it also eliminates any semblance of control over the network. IT no longer knows what system software is running on the PC's. Some patches will break applications, and IT won't find out until after users complain about the broken apps.

The situation spirals further and further out of control. Ultimately, the entire IT department is little more than a group of highly paid errand-boys. All of the real system maintenance is being done by the software vendors through automatic updates. The IT people will run cables and replace broken hardware, but they end up powerless to do anything else. The help desk tries valiantly to make the best of the situation, but ultimately, they are powerless to do anything more than chase down symptoms, read scripts, and apologize a lot.

Some people in IT see this happening and they know exactly why. They know that they need to get rid of the PCs and consolidate control back in the machine room. But the reasons for getting rid of the big iron (high cost) still exist, and executives refuse to include in-house support as part of the cost of running a PC-based network. And he who admits an error gets blamed for it. And users won't want to give up the freedom they were given, even if that freedom is clobbering their ability to do their jobs.

And, of course, the executives will say something like "Everybody else has switched over to PC's and they're doing fine, so the problem must be with you and your staff." Completely ignoring the fact that everybody else is also melting down and refusing to admit it.

Which is where we are today.
