Specialists at JPL are now confident that the Spirit Mars rover can be restored to full health soon. The issue was the contents of Spirit’s 256MB flash memory system (essentially the same kind of memory used in Earthbound digital cameras). The operating system for the rover was rebooting because the number of files in the flash memory was taking up so much memory that the rover couldn’t function normally.
It is hoped that deleting unnecessary data from the flash memory will stabilize the system so that Spirit can resume normal operations. The rover has indicated that it is still in the same position it was in when the problem began several weeks ago, and all other systems appear to be functioning normally.
It’s difficult enough to diagnose a computer problem like this when you’re sitting in front of it – much less when that computer is on another planet and there are ten minutes of lag between the two. The people at JPL did a fabulous job of diagnosing and fixing the problem – they represent American ingenuity at its best.
I totally disagree with your conclusion, “this is American ingenuity at its best.” Major failures like this are completely unacceptable in an $820 million project. The engineers in charge of this project should know every aspect of the rover. They should know exactly how much memory is used at all times. They should have tested it extensively before it left Earth. They should have designed backup systems, and then backup systems for the backup systems. Moreover, the rover should not have experienced any failures in the first place. The software controlling memory storage should never have been allowed to save so much data that the rover no longer had enough memory to function. This is basic engineering and basic programming. In addition, have these engineers never heard of Six Sigma? The probability of failure should be so small that it simply would not happen. I could see these mistakes being made by a college club, but professional engineers should be held to a much higher standard.
I’d tend to agree, but the team had a very limited testing window for something like this, and the OS they were using (VxWorks) has been used before in deep space missions.
Things will always go wrong, even with all the testing that was done. The fact that JPL could diagnose and fix a problem like that is still amazing to me…
They should have designed backup systems, and then backup systems for the backup systems.
Uh, they had them, dude. Total redundancy of all systems. Maybe you heard about it? It’s called “The other rover.”
All they have to do is keep Opportunity’s flash memory from filling up by deleting older files rather than keeping them – thankfully that’s a pretty easy way to keep the problem from happening again.
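For what it’s worth, the “delete older files” idea can be sketched in a few lines. This is only an illustration in Python, not the rover’s actual flight software: the names `StoredFile` and `prune_oldest` and the byte budget are made up for the example, and it assumes each file carries a creation time and a size.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical record for one file held in flash."""
    name: str     # file name (illustrative only)
    created: int  # creation time, e.g. seconds since some epoch
    size: int     # size in bytes


def prune_oldest(
    files: List[StoredFile], budget: int
) -> Tuple[List[StoredFile], List[StoredFile]]:
    """Return (kept, deleted) so that the kept files fit within `budget` bytes.

    The oldest files are dropped first; nothing is deleted if the files
    already fit within the budget.
    """
    ordered = sorted(files, key=lambda f: f.created)  # oldest first
    total = sum(f.size for f in ordered)
    deleted: List[StoredFile] = []
    index = 0
    # Drop files from the old end until the remainder fits the budget.
    while total > budget and index < len(ordered):
        victim = ordered[index]
        deleted.append(victim)
        total -= victim.size
        index += 1
    return ordered[index:], deleted
```

Calling `prune_oldest(files, budget)` on a regular schedule, with a budget set comfortably below the flash capacity, is one way to keep storage from creeping up to the point where the system can no longer function.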
What I would find difficult to accept would be if this was the same problem that Pathfinder had – see http://www.cis.ksu.edu/~hatcliff/842/Docs/Course-Overview/pathfinder-robotmag.pdf