Re: Cinematronics Testing (Was: Re: Speech board idea...)

From: Joseph J. Welser <jwelser_at_crystal.cirrus.com>
Date: Mon Jun 30 1997 - 10:36:29 EDT

> I think this is an excellent idea! The analogy between this and VLSI is a
> very good one; if you can just imagine this circuit much smaller, you would
> have a C-CPU on a chip. The C-CPU is much like taking a simplified version
> of, oh, a PIC processor, and blowing it up to a board-level design.

        Sure, this is a microprocessor, plain and simple. If anything, it is easier to define controllability and observability points on the CCPU, because if I ever want more fault coverage, I can "break a loop" by just desoldering a chip. In a VLSI microprocessor, once the chip gets fabricated and I realize I need more controllability/observability points, I'm screwed.

        Taking this point to its ultimate conclusion -- I can DEFINITELY get 100% fault coverage on the CCPU board if I desolder and pull enough chips. However, I'm hoping that I can get significantly close to that without having to desolder anything.
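
        Just to pin down what I mean by "fault coverage": it's simply detected faults over total modeled faults. Here's a toy Python sketch of the metric -- the fault universe and the detected sets are completely invented for illustration (real numbers would come from fault-simulating the actual CCPU), and "breaking a loop" is modeled as exposing the faults around one pulled chip:

# All data here is made up purely to illustrate the coverage metric.
faults = {f"U{i}.stuck_at_{v}" for i in range(1, 6) for v in (0, 1)}

# With the board intact, pretend the faults around chip U5 are
# unobservable; pulling U5 ("breaking the loop") exposes them.
detected_intact = {f for f in faults if not f.startswith("U5")}
detected_broken = faults

def coverage(detected, universe):
    return len(detected & universe) / len(universe)

print(f"intact board: {coverage(detected_intact, faults):.0%}")  # 80%
print(f"loop broken : {coverage(detected_broken, faults):.0%}")  # 100%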
 
> The only thing I disagreed on was the 50 percent instruction count on what
> I've found to be typical failures on the board. A single PROM failure will
> always take out more than 50% of the instructions; they interact so closely
> when decoding an instruction that my guess would be closer to 90%, depending
> on the PROM. I can think of two of the PROMs that, if they were to fail,
> would take out 100% of the instruction set. But I think we talk of different
> things when we use these percentages...

        I think we're looking at things totally reversed. If a single chip can do that much damage to the system, it will be easily spotted through my strategy. Problems occur if several single chips can each clobber the CCPU in the same way, and I don't think this is the case (most of the PROM outputs go through their own combinational decodes). Things get tricky when the outputs of different PROMs get ANDed together. Then my sort of scheme would be able to tell (if it observed a bad output of that AND gate) that either PROM1 or PROM2 or the AND gate is bad, but it may or may not be able to narrow things down further. (You're probably sick of hearing me say this, but it all depends on the controllability and observability points... i.e. EXACTLY where they are defined.) BUT, I'm sure anybody who has fixed Cinematronics boards before would very much appreciate something that returns, after 5 or 10 minutes of work (mostly setting up the diagnostics), a half-dozen or fewer chips that are contributing to the problem, rather than having to go through and check for stuck lines, etc. on all the chips (now you can just check for stuck lines on the chips that the diagnostics return).
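
        To make that PROM1/PROM2/AND-gate ambiguity concrete, here's a rough Python sketch of the narrowing-down step. Each observation point that reads back wrong implicates every chip feeding it (its "cone"); intersecting those sets shrinks the suspect list. The cones and point names below are invented, not the real CCPU netlist, and a single bad chip is assumed:

# Hypothetical observation points and the chips feeding each one.
cones = {
    "AND_out":  {"PROM1", "PROM2", "AND_gate"},
    "PROM1_d3": {"PROM1"},
    "PROM2_d0": {"PROM2"},
}

def suspects(failing_points):
    # Single-fault assumption: intersect the chips implicated by
    # every observation point that read back wrong.
    candidate = None
    for point in failing_points:
        cone = cones[point]
        candidate = cone if candidate is None else candidate & cone
    return candidate or set()

print(suspects(["AND_out"]))              # e.g. {'PROM1', 'PROM2', 'AND_gate'}
print(suspects(["AND_out", "PROM1_d3"]))  # e.g. {'PROM1'}

        With only the ANDed output observable you're stuck with three suspects; one more observation point gets you down to one. That's exactly why WHERE the points are defined matters so much.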

> I think the only misunderstanding has been what we considered test programs.
> My definition was to write some software that, by toggling an output line,
> could indicate which RAM chip has failed its test, then extend this
> methodology beyond RAM, using a more limited instruction set, until there
> was simply no way for the software to run well enough to indicate errors. I
> believe this sort of software would be very limited in finding hardware
> problems on the CPU card. This is based on the problems I've found in the
> past. With the exception of I/O, I have yet to fix a CPU card whose problem
> could have been found by software I might have written.
>
> Where this differs from what you describe is your probing of "vectors" to
> allow you to observe failure modes that software alone could not detect.
> I'm assuming that to "observe" these points you will need more than a logic
> probe with an LED that lights up, and to place data on the "real" inputs,
> you're going to need more than a 4.7k resistor pulled to +5V.
>
        OK, we WERE totally confused about what we each considered "fault" coverage, and about my intentions in general...

        I consider a fault to be anything that will cause a digital circuit to fail. The most common of these are stuck faults (a line is "Stuck at 0" or "Stuck at 1") and shorts (harder to find than stuck faults if, say, 2 inputs are shorted together and move together).
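
        For anybody who hasn't bumped into the stuck-at model, here's a tiny Python illustration on a made-up circuit, out = (a AND b) OR c (nothing CCPU-specific about it). A vector "detects" a fault when the good and faulty outputs differ:

def good(a, b, c):
    return (a & b) | c

def faulty(a, b, c, line, value):
    # Re-evaluate the same circuit with one line forced to 0 or 1.
    if line == "a": a = value
    if line == "b": b = value
    if line == "c": c = value
    if line == "and":           # the internal a&b node itself stuck
        return value | c
    return (a & b) | c

# (1,1,0) exposes the AND node stuck-at-0; (0,0,0) does not.
for vec in [(1, 1, 0), (0, 0, 0)]:
    detected = good(*vec) != faulty(*vec, line="and", value=0)
    print(vec, "detects AND node stuck-at-0:", detected)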

        It may or may not be necessary to have additional hardware, depending upon what it would cost to make a "diagnostic board." One of my visions was to just buy a big board (roughly the size of a CPU board) and put headers in the exact places where all the points I want to control/observe are (i.e. the ROMs, the ALUs, and those DIP shunts... basically all the socketed chips on the board). Then there would be some sort of a controller (maybe a PIC, maybe something even simpler) that would inject the proper vector streams (stored in a ROM) into the controllability points and observe the results at the observability points. The proper vector results would also be stored in a ROM and would just be compared with the observed results. If any differ, then you know that section failed. The vector(s) which failed may or may not help pinpoint the problem within that section, so I'd figure out some way to return that info too. Then I would have a diagnostic chart for people to look at which would say that, if test program 1, vectors 7, 14, and 111 failed, to check/replace chip xx. This is sort of like a dedicated Cinematronics test fixture.
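
        The controller's job would boil down to something like the Python sketch below. The stimulus ROM, the expected-response ROM, and the board stand-in are all placeholders I made up; the real thing would be a PIC (or simpler logic) clocking bits in and out of the headers:

stimulus_rom = [0b0000, 0b0101, 0b1111]   # vectors for the control points
expected_rom = [0b0001, 0b0100, 0b1110]   # known-good responses

def run_board(vector):
    # Stand-in for driving the board and latching the observed bits.
    # The canned response to the last vector is deliberately wrong,
    # as if a chip in that section were bad.
    responses = {0b0000: 0b0001, 0b0101: 0b0100, 0b1111: 0b0110}
    return responses[vector]

failing = [i for i, (vec, want) in enumerate(zip(stimulus_rom, expected_rom))
           if run_board(vec) != want]

print("failing vectors:", failing)   # -> [2]; the chart maps 2 -> chip xx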

        The degenerate case of all this is just using 1 vector, which I alluded to when I spoke of nops and logic probes. If you put one combination on the inputs/controllability points, you'd get a predictable combination on the outputs. You can just probe this with a logic probe and compare. Your fault coverage from this kind of strategy would probably be depressingly low, though...
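
        You can actually put a number on the depressing part. Sticking with the same toy circuit as above, compare one fixed vector against the exhaustive set (again, just an invented illustration):

from itertools import product

def out(a, b, c):                 # same toy circuit: (a AND b) OR c
    return (a & b) | c

lines = ["a", "b", "c"]
faults = [(l, v) for l in lines for v in (0, 1)]   # 6 stuck-at faults

def detects(vec, fault):
    line, val = fault
    forced = dict(zip(lines, vec))
    forced[line] = val
    return out(*vec) != out(forced["a"], forced["b"], forced["c"])

all_vecs = list(product((0, 1), repeat=3))
one  = sum(detects((1, 1, 0), f) for f in faults) / len(faults)
full = sum(any(detects(v, f) for v in all_vecs) for f in faults) / len(faults)
print(f"one vector: {one:.0%}   exhaustive: {full:.0%}")   # 33% vs 100%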

> Since I think we were also thinking of different things when we spoke of
> percentage of coverage, I think using your definition I'll bet you can get
> pretty damn close to 100% coverage by choosing your vectors properly.

        Yeah, it all depends on your vectors and on the points that you choose (or have available) to "break the loop." The more components there are in your little partitions, the more faults that can occur, and the harder it is to pinpoint exactly what is causing things to go bad. Although you can choose decent vectors by basically using common sense, ATPG is the way to go. ATPG programs will come up with an optimal set of vectors -- the best possible fault coverage with the shortest possible vector length (vector width is fixed at the number of inputs) -- in a reasonable amount of time. Test Pattern Generation, like almost all other VLSI CAD algorithms, is an NP-complete problem, so there are all kinds of heuristics to keep it from taking years... on some of our chips it DOES take as long as a month, but those are chips with millions of transistors.
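
        A real ATPG tool is way beyond a mailing-list post, but the flavor of those heuristics is easy to show. Here's the classic greedy trick in Python -- repeatedly keep whichever vector detects the most not-yet-covered faults (the detection table is invented):

# vector -> set of faults it detects (made-up fault dictionary)
detects = {
    "v0": {"f1", "f2"},
    "v1": {"f2", "f3", "f4"},
    "v2": {"f1", "f5"},
    "v3": {"f4"},
}

uncovered = set().union(*detects.values())
chosen = []
while uncovered:
    best = max(detects, key=lambda v: len(detects[v] & uncovered))
    if not detects[best] & uncovered:
        break                     # whatever is left is undetectable
    chosen.append(best)
    uncovered -= detects[best]

print("compacted vector set:", chosen)   # -> ['v1', 'v2']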
 
        Anyways, as excited as I am about this, it's still going to remain on the back burner indefinitely.....I want to get those multi-game sound cards done first! I'll let you guys know if/when I start working on it again.

Joe