Reliable Processor Design
Reliable Processor Design - Smruti Ranjan Sarangi
- Problem: Reliable processor design is becoming an increasingly important issue in computer science. Designers lack proper design guidelines and testing interfaces for processors. This project would look at testing and design frameworks that ensure processor reliability, and it would do two things. First, it would collect the problems that designers face today. Second, it would propose novel testing and design guidelines aimed at a bug-free design. The resulting testing and reliability engineering tool will be called ProcDbg.
- Others done before: no
- Use of Rapid Techniques: The project will start with a rapid ethnographic study of design groups. It will observe what designers do, how they do it, and how errors are introduced into the design. The ethnography would analyze log data and other persistent information to look for patterns. After that, some cultural probing will be done, using devices such as software scratchpads where designers can log their immediate concerns. The project will then create scenarios and a first prototype of a tester. The prototype will be shown to people in the field, and their feedback will be analyzed.
- Help: Related literature, analysis of the techniques that I am using.
- Personas:
- Joe is a designer at a processor company. He is fairly experienced but believes that the design process can be improved further. He uses the system developed in this project. He is also very meticulous and makes a lot of comments about the software.
- Mike is a freelance tester. He takes beta versions of processors and tests them. He relies on freely available testing mechanisms to find bugs in the processors that he works on. This is his hobby, and he is very passionate about it.
- Scenarios:
- Joe starts his typical day designing a processor, but now his schedule is slightly different. After making a small change, he tests the code he has written in the new tester that his company has bought. This tester provides very good diagnostic information and immediately points out any bugs in the code. Joe uses that information to fix the code he has just written and runs the tester again. This is an iterative process, and ultimately he is able to produce a bug-free component of the processor. After that he uses other features of the framework. The framework has a nice user interface that lets him zoom into the internals of the processor he is working on and get detailed information. He does that and looks at internal signals. He sees a possible cause of a slight performance degradation. He changes the relevant code a bit and solves the problem.
- Mike is a freelance tester of processors. He starts his day by downloading a limited free version of our processor reliability platform. It helps him run tests and debug them. The tool has an interface to a message board that allows him to interact with other freelance testers and reliability engineers. Together they can focus on problems and discuss the issues that they are facing. He uses the message board and reads posts by other people, which clears up a couple of his doubts. He has some more doubts about the usage of the tool, and he posts them on the message board.
- Gracie is an end user of the processor. She has signed up for a special program called the elite user's program. She is entitled to discounts from the company if she reports bugs or performance degradations that she experiences. She has downloaded a very limited, free version of the ProcDbg tool. It allows us to collect and analyze detailed dumps of programs that crash because of a presumed processor bug. She collates all that data and mails it to the developers of the tool. The designers of ProcDbg then analyze the data, which will hopefully give them insight into the nature of the bug. They then use the full version of ProcDbg to find the bug.
- Martha works at Dell, which produces systems based on our hypothetical processor. Dell typically takes components from different vendors and integrates them. It buys processors from Intel, disks from Seagate, motherboards from Broadcom, and so on. It needs to find bugs in the processor as well as bugs introduced in the integration process. Martha downloads a full version of ProcDbg and buys the associated hardware. She uses ProcDbg for the processor and other tools for the other parts of the computer system. ProcDbg has an interface for interacting with the testers and reliability platforms of other components, and she uses it to join all the testing interfaces together. After splicing together the system as well as the testing platforms, she begins her testing. She then uncovers some bugs using ProcDbg. This would not have been possible if the processor had been tested in an isolated environment. It works because the ProcDbg interfaces she used can communicate with the interfaces in the testing modules of the other components.

