I’ve been working with computers most of my life. My first and only computer course was in 1968. For the past 25+ years computers have been an integral part of my work life.
Nowadays I wrangle around a dozen machines (see photo) at work which let me produce a forecast and feed it to a bunch of different (buzzword coming) platforms.
Mostly, I get it. I understand how computers work. That gives me a leg up. Often it’s necessary to think along with the programmer to effect a fix.
There are two things which always surprise me.
1) There’s always something that’s not working!
It might be hardware or software or even a bad piece of data which should be a temperature or cloud but ends up being interpreted as a command. The computer stops what it’s doing. There’s never a time when I can depend on everything!
Google is well known for designing its software on the assumption that hardware will always fail. Those Google guys are right.
2) Computers often continue to work when something’s wrong, though it turns out they’re really waiting to fail at a time less convenient to me!
That’s part of what’s happened today and it’s causing me to tear my hair out.
A hardware failure late last week took out a two-drive RAID array (two disks which act as one to provide constant backup or, in this case, additional speed). This particular piece of equipment was down for a day while we waited for FedEx to deliver the replacement. No problem. Like Google, we understand working around bad hardware.
Once we replaced the drives we had to repopulate them with data. In this case it was an accurate rendition of the Earth’s surface. Really. That meant nearly 200 GB of data had to move across our network. It took hours.
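For a rough sense of why moving that much data takes hours, here is a back-of-the-envelope sketch. The link speeds and the 70% efficiency figure are assumptions picked for illustration, not measurements from our network, and `transfer_hours` is just a name I made up.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.7):
    """Estimate hours needed to move size_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal link speed actually
    achieved (protocol overhead, disk speed, other traffic on the wire).
    """
    bits = size_gb * 1e9 * 8                   # decimal gigabytes to bits
    usable_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return bits / usable_bps / 3600            # seconds to hours


if __name__ == "__main__":
    for mbps in (100, 1000):  # hypothetical link speeds, in Mbit/s
        print(f"{mbps} Mbit/s: about {transfer_hours(200, mbps):.1f} hours")
```

Under those assumptions a 100 Mbit/s link needs several hours for 200 GB, while a gigabit link needs well under one. The real number depends on what else is using the network at the time.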
By late last Thursday evening we were up and running perfectly. We’d made some accommodations for the new hardware. No sweat.
Saturday was rainy and heavily tested this new configuration, which worked nearly perfectly.
It failed this morning!
Why?
Who knows.
What was different between Saturday and today? As far as I can tell, nothing!
The point is the computer seemed to be working just fine, though it obviously wasn’t. There was something still wrong that needed just the right moment… the right set of circumstances… to fail.
For whatever reason I was always under the (false) assumption that you needed perfection within these complex systems for things to work. Obviously not. And, of course, it makes you wonder what’s next… or if you really can ever fix all the problems.
I’ve still got over two hours of data transfer to go this second time. Time to think about what might be next.