CSI Problems

No, you’re not on the wrong blog; this is not about the TV show CSI. This is from Mark Minasi’s Windows Networking Tech Page, Issue #53 (February 2006), in which he talks about his 28 Rules To Troubleshoot Any Network Problem. Lots of these are common-sense stuff and things technical people have already covered but, as Mark says, it’s always good to have a refresher. This will also help some newbies and other people who haven’t learned all of the tricks of the trade yet. As always, Mark is a master of all things Windows and never fails to teach me something good in everything I have ever read of his. The quote below will help explain the title. Oh, and for all of you users out there: never assume anything, and always try to be as accurate as you can when answering questions from the technical support department.

Separate the C and Si Problems
I’ve solved a lot of network problems, but this one was a toughie.

“I’ve got a DHCP server that is delivering IP addresses to two segments. The systems on the same segment as the DHCP server are getting IP addresses with no trouble, but the systems on the other segment, none of them work!”

My first question (and probably yours, if you’re a network techie) is, “does the router between the two segments pass DHCP requests?” (In geek-ese, you may know that the other way to say this is “does the router support RFC 1542 BOOTP forwarding?”) Or alternatively, I ask, “is there a DHCP forwarder on the second segment?”

“Yes,” the person replies, explaining that the router passes BOOTP packets.

Hmmm. So what else might it be? Check IP connectivity: does the router block any particular port? If it’s in a network with an Active Directory and the DHCP server is on a 2000 or 2003 server, has that server been authorized in AD? No port blocks, and yes, it’s been authorized. That’s when I realize that it was a stupid question: if DHCP weren’t working, the first segment wouldn’t have IP addresses. Ah, but what if (a eureka moment!) somehow (1) the DHCP server hadn’t been authorized for the past six days and for some reason all of the systems on the nearby segment still had lease time left but all of the ones on the second segment had their leases run out earlier, and so were the canaries in the coal mine? So I tell the person to try to do an IPCONFIG /RENEW on one system on each segment. The one on the first segment succeeds, the one on the second doesn’t.

Ready for the answer? It’s simple: the guy had no idea what the heck BOOTP forwarding was, figured that his router guys must have allowed for that — after all, they did go to a CCNA boot camp — and just told me what I wanted to hear. In other words, it is always possible that the carbon-based parts of the network (“C” is the symbol for the element carbon) don’t report reliable information, and so the problem lay not in the silicon part of the network (“Si” is the symbol for the element silicon) but in the carbon component. To paraphrase Shakespeare, “the fault, dear Brutus, lay not in the chips but in the people.”

Don’t misunderstand me, I’m not saying that everyone lies or is incompetent. But I am saying that under stress people don’t always think as clearly as they should, and that network support people have had a lot of new things thrown in their laps in the past few years — remember when we “discovered” security in 2001, or that we all need database servers whether we want them or not in 2004? — without receiving a concomitant increase in staffing. We’re all just human. We make mistakes. Think about how we make silicon-based systems more reliable: we cluster them. The same thing works for carbon-based units: more eyeballs looking at a problem often make for a more quickly-solved problem.

And — this is important — remember that we techies tend to think of computer problems in terms of the silicon side sometimes more than we do the carbon side. In fact, sometimes we see the carbon side as being sort of minimal, and only relevant in a few cases. But if you sit back and think about most of the things that you have to fix, you’ll end up seeing that most of those problems have a carbon component that is at least as important as the silicon component. I mean, Trojans don’t write themselves, y’know?

We always referred to them as IO errors, or Idiot Operator errors, but Mark’s term is a lot more “PC”, or politically correct, hehe.