My PAL with the LASERs

Back in the distant past, MAME started cataloguing programmable logic devices (PLDs) in addition to ROMs. This was met with considerable hostility from certain segments of the community, as it was seen as forcing them to obtain files they saw as unnecessary for emulation in order to run their precious games. However PLDs are programmable devices, and it’s important to preserve them. So far though, the PLD dumps have mainly been used by PCB owners looking to repair their games. The haven’t been used by MAME for emulation. However, PLDs are key to the operation of many arcade games, performing functions like address decoding and video mixing.

One such arcade board is Zaccaria’s Laser Battle, also released under license by Midway as Lazarian. This board uses complex video circuitry that was poorly understood. It includes:

  • TTL logic for generating two symmetrical area effects, or one area effect and one point effect
  • TTL logic for for generating an 8-colour background tilemap
  • Three Signetics S2636 Programmable Video Interfaces (PVIs), drawing four 4×4 monochrome sprites each
  • TTL logic for generating a single 32×32 4-colour sprite
  • A Signetics 82S101 16×48×8 Programmable Logic Array (PLA) for mixing the video layers and mapping colours

On top of this, they decided it was a good idea to use some clever logic to divide the master clock by four when feeding the Signetics S2621 Universal Sync Generator (USG) that generates video timings, but to divide it by three to generate the pixel clock feeding the rest of the video hardware. This lets them get one third more horizontal resolution than the Signetics chips are designed to work with. They need additional logic to line up the pixel clock with the end of horizontal blanking, because the number of pixels in a line isn’t divisible by three, and some more logic for delaying the start of the visible portion of each frame and cutting it off early because they wanted less vertical resolution than the Signetics chips are designed for. It uses an GRBGRBGR colour scheme where the most significant bits are are in the middle of the byte and the missing least significant blue bit effectively always, so it can’t produce black, only a dark blue Was this design effort worth it? I guess they must’ve made some money off the Midway license at least.

Anyway, this game has never worked properly in MAME. It’s always been missing the area and point effects, the colours have always been completely wrong, and the mixing between layers hasn’t properly either. And that’s done inside the PLA. The PLA has 48 internal logic variables, each of which can be programmed to recognise an arbitrary combination of levels on the 16 input line. Each of the internal variables can drive any combination of the eight output lines. The outputs can be configured to be inverting or non-inverting.

In theory this sounds like a job for a ROM, so why use a PLA instead? Well a ROM with 16 input bits and eight output bits would need 64kB of space. Such a ROM would likely have been prohibitively expensive when this game was produced. I mean, its program ROMs are only 2kB each, so there’s no way they’d be sourcing a ROM 32 times that size just for video mixing. The PLA maps the same number of inputs to the same number of outputs with just 1,928 bit of storage, or a little less than one of the program ROMs. It can’t produce absolutely any arbitrary input to output mapping, but it’s more than enough for this application. In fact, it turns out they didn’t even need to use all the space in the PLA.

Read the rest of the post if you want to know more about the process of decoding the PLA bitstream and examining its contents.

The hookup

By examining the schematics, we can see that the PLA’s inputs are hooked up like this:

Bit Name Description
0 NAV0 Sprite bit 0
1 NAV1 Sprite bit 1
2 CLR0 Sprite colour bit 0
3 CLR1 Sprite colour bit 0
4 LUM Sprite luminance (brightness)
5 C1* Combined PVI colour bit 1 (red) gated with blanking (active low)
6 C2* Combined PVI colour bit 2 (green) gated with blanking (active low)
7 C3* Combined PVI colour bit 3 (blue) gated with blanking (active low)
8 BKR Background tilemap red
9 BKG Background tilemap green
10 BKB Background tilemap blue
11 SHELL Shell point
12 EFF1 Effect 1 area
13 EFF2 Effect 2 area
14 COLEFF0 Area effect colour bit 0
15 COLEFF1 Area effect colour bit 1

CLR0, CLR1, LUM, COLEFF1 and COLEFF2 are relatively static bits that the game program sets by writing to I/O ports. The rest of the bits are generated dynamically based on video register and RAM contents.

Streams of bits

The PLA bitstream consists of 48 sets of 40 bits for each of the internal logic variables, followed by a final eight bits specifying which of the outputs should be active low. Each internal variable has two bits controlling how it’s affected by each of the 16 inputs (32 bits total), followed by eight bits specifying which outputs it shouldn’t activate (32+8 makes for a total of 40). If a pair of input bits are both zero (00), it’s impossible for that logic variable to be activated. This is an easy way to spot unused variables, and it’s the default state in an unprogrammed chip. If only the first bit in a pair is set (10), the corresponding input line must be high in order for the variable to be activated. If only the second of the bits is set (01), the corresponding input line must be low in order for the variable to be activated. If both bits in a pair are set (11), the input may be activated irrespective of the state of the corresponding input line (often called a “don’t care” condition).

Finally, MAME expects the bitstream in a file containing a 32-bit integer specifying the total number bits, followed by the bits themselves, packed into bytes least-significant bit first. Armed with this knowledge, we can write a some code that transforms converts the bitstream to an intermediate representation and displays it as C code:

UINT8 const *bitstream = memregion("plds")->base() + 4;
UINT32 products[48];
UINT8 sums[48];
for (unsigned term = 0; 48 > term; term++)
{
    products[term] = 0;
    for (unsigned byte = 0; 4 > byte; byte++)
    {
        UINT8 bits = *bitstream++;
        for (unsigned bit = 0; 4 > bit; bit++, bits >>= 2)
        {
            products[term] >>= 1;
            if (bits & 0x01) products[term] |= 0x80000000;
            if (bits & 0x02) products[term] |= 0x00008000;
        }
    }
    sums[term] = ~*bitstream++;
    if (PLA_DEBUG)
    {
        UINT32 const sensitive = ((products[term] >> 16) ^ products[term]) & 0x0000ffff;
        UINT32 const required = ~products[term] & sensitive & 0x0000ffff;
        UINT32 const inactive = ~((products[term] >> 16) | products[term]) & 0x0000ffff;
        printf("if (!0x%04x && ((x & 0x%04x) == 0x%04x)) y |= %02x; /* %u */\n", inactive, sensitive, required, sums[term]);
    }
}
UINT8 const mask = *bitstream;
if (PLA_DEBUG) printf("y ^= %02x;\n", mask);

Equations

When we feed this the PLA bitstream dumped from a Lazarian board, we get the following output (blank lines added for readability:

if (!0x0000 && ((x & 0x001f) == 0x0001)) y |= 01; /* 0 */
if (!0x0000 && ((x & 0x001f) == 0x0002)) y |= 03; /* 1 */
if (!0x0000 && ((x & 0x001f) == 0x0003)) y |= 04; /* 2 */
if (!0x0000 && ((x & 0x001f) == 0x0011)) y |= 08; /* 3 */
if (!0x0000 && ((x & 0x001f) == 0x0012)) y |= 18; /* 4 */
if (!0x0000 && ((x & 0x001f) == 0x0013)) y |= 20; /* 5 */
if (!0x0000 && ((x & 0x001f) == 0x0005)) y |= 07; /* 6 */
if (!0x0000 && ((x & 0x001f) == 0x0006)) y |= 03; /* 7 */
if (!0x0000 && ((x & 0x001f) == 0x0007)) y |= 04; /* 8 */
if (!0x0000 && ((x & 0x001f) == 0x0015)) y |= 38; /* 9 */
if (!0x0000 && ((x & 0x001f) == 0x0016)) y |= 18; /* 10 */
if (!0x0000 && ((x & 0x001f) == 0x0017)) y |= 20; /* 11 */
if (!0x0000 && ((x & 0x001f) == 0x0009)) y |= 05; /* 12 */
if (!0x0000 && ((x & 0x001f) == 0x000a)) y |= 04; /* 13 */
if (!0x0000 && ((x & 0x001f) == 0x000b)) y |= 06; /* 14 */
if (!0x0000 && ((x & 0x001f) == 0x0019)) y |= 28; /* 15 */
if (!0x0000 && ((x & 0x001f) == 0x001a)) y |= 20; /* 16 */
if (!0x0000 && ((x & 0x001f) == 0x001b)) y |= 30; /* 17 */
if (!0x0000 && ((x & 0x001f) == 0x000d)) y |= 07; /* 18 */
if (!0x0000 && ((x & 0x001f) == 0x000e)) y |= 04; /* 19 */
if (!0x0000 && ((x & 0x001f) == 0x000f)) y |= 04; /* 20 */
if (!0x0000 && ((x & 0x001f) == 0x001d)) y |= 38; /* 21 */
if (!0x0000 && ((x & 0x001f) == 0x001e)) y |= 20; /* 22 */
if (!0x0000 && ((x & 0x001f) == 0x001f)) y |= 20; /* 23 */

if (!0x0000 && ((x & 0x0023) == 0x0000)) y |= 01; /* 24 */
if (!0x0000 && ((x & 0x0043) == 0x0000)) y |= 02; /* 25 */
if (!0x0000 && ((x & 0x0083) == 0x0000)) y |= 04; /* 26 */
if (!0x0000 && ((x & 0x29e3) == 0x01e0)) y |= 01; /* 27 */
if (!0x0000 && ((x & 0x2ae3) == 0x02e0)) y |= 02; /* 28 */
if (!0x0000 && ((x & 0x2ce3) == 0x04e0)) y |= 04; /* 29 */

if (!0x0000 && ((x & 0x08e3) == 0x08e0)) y |= 38; /* 30 */
if (!0x0000 && ((x & 0xffe3) == 0x10e0)) y |= 84; /* 31 */
if (!0x0000 && ((x & 0xffe3) == 0x50e0)) y |= 84; /* 32 */

if (!0x0000 && ((x & 0x0000) == 0x0000)) y |= 00; /* 33 */

if (!0x0000 && ((x & 0xffe3) == 0xd0e0)) y |= 04; /* 34 */
if (!0x0000 && ((x & 0xe0e3) == 0x20e0)) y |= 80; /* 35 */
if (!0x0000 && ((x & 0xe0e3) == 0x60e0)) y |= 40; /* 36 */
if (!0x0000 && ((x & 0xe0e3) == 0xa0e0)) y |= c0; /* 37 */
if (!0x0000 && ((x & 0xe0e3) == 0xe0e0)) y |= c0; /* 38 */
if (!0x0000 && ((x & 0xffe3) == 0x90e0)) y |= 40; /* 39 */

if (!0xffff && ((x & 0x0000) == 0x0000)) y |= ff; /* 40 */
if (!0xffff && ((x & 0x0000) == 0x0000)) y |= ff; /* 41 */
if (!0xffff && ((x & 0x0000) == 0x0000)) y |= ff; /* 42 */
if (!0xffff && ((x & 0x0000) == 0x0000)) y |= ff; /* 43 */
if (!0xffff && ((x & 0x0000) == 0x0000)) y |= ff; /* 44 */
if (!0xffff && ((x & 0x0000) == 0x0000)) y |= ff; /* 45 */
if (!0xffff && ((x & 0x0000) == 0x0000)) y |= ff; /* 46 */
if (!0xffff && ((x & 0x0000) == 0x0000)) y |= ff; /* 47 */

y ^= 00;

Fortunately this PLA seems to have been programmed manually and not using an automatic logic minimiser. Can you spot the patterns yet? The last eight terms (40–47) have been left in their virgin, unprogrammed states and hence can’t affect the output. Term 33 has been programmed to have no effect on the output, so they’ve used 81.25% of the chip. They could’ve gotten clever and added more logic if they really wanted to, but I guess with everything else they did on the board they were out of tricks by this point. But anyway, on to the equations.

Sprite mapping

The first 24 terms (0–23) only depend on the sprite-related bits, so it’s a good bet they control sprite colours. Let’s make a table of the mappings these terms produce. For convenience we’ll treat NAV0/NAV1 and CLR0/CLR1 as two-bit numbers:

NAV 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
CLR 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
LUM 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1
01 03 04 08 18 20 07 03 04 38 18 20 05 04 06 28 20 30 07 04 04 38 20 20

Now the pattern should be really clear. Sprite pixel value 0 produce no output (making it dark or transparent), while the other three values are each mapped to colour depending on the value of CLR. Setting LUM shifts the colour left by 3 bits into the most significant bits of the colour output. This can be summarised in a table with NAV in rows and CLR in columns:

0 1 2 3
0 00/00 00/00 00/00 00/00
1 01/08 07/38 05/28 07/38
2 03/18 03/18 04/20 04/20
3 04/20 04/20 06/30 04/20

Or we could even use human-readable colour names since we know the output format is GRBGRBGR (there’s a distinct lack of green in this palette):

0 1 2 3
0 dark dark dark dark
1 red white magenta white
2 yellow yellow blue blue
3 blue blue cyan blue

So they used half the PLA to handle sprite colour mapping.

Backgrounds and PVIs

The next set of six terms look straightforward enough, let’s make a table and see what they produce (once again we’re treating NAV0/NAV1 as a two-bit number):

NAV 0 0 0 0 0 0
C1* 0 1 1 1
C2* 0 1 1 1
C3* 0 1 1 1
BKR 1
BKG 1
BKB 1
SHELL 0 0 0
EFF2 0 0 0
01 02 04 01 02 04

This shows that the PVIs’ outputs are mapped directly onto the low red, green and blue bits of the output (not actually the LSBs, they’re at the top of the output byte). The TTL-generated sprite has priority over both the PVI outputs and the background tilemap; additionally, the PVIs, the shot point, and area effect 2 also have priority over the background tilemap (since the OBJ/SCR lines from the PVIs are not considered, the PVIs don’t take priority with black object pixels, only with coloured pixels).

Effects

The rest of the terms relate to effects. We can look at them all at once (NAV and COLEFF are two-bit numbers):

NAV 0 0 0 0 0 0 0 0 0
C1* 1 1 1 1 1 1 1 1 1
C2* 1 1 1 1 1 1 1 1 1
C3* 1 1 1 1 1 1 1 1 1
BKR 0 0 0 0
BKG 0 0 0 0
BKB 0 0 0 0
SHELL 1 0 0 0 0
EFF1 1 1 1 1
EFF2 0 0 0 1 1 1 1 0
COLEFF 0 1 3 0 1 2 3 2
38 84 84 04 80 40 c0 c0 40

So what does this tell us? Lots of things! You might notice that the second and third columns can easily be reduced to a single term, and so could the seventh and eighth columns, clearly an oversight but not important since there’s space to spare in the PLA. More seriously, we can work out priorities from the rows:

  • Sprite and PVI output always have priority over these effects
  • Background, shell and effect 2 have priority over effect 1
  • Remember that shell and effect 2 are mutually exclusive, so there’s no priority between them

Now we can look at the output each effect actually produces:

  • The first column says the shell sets the MSBs of all three colours, making it light grey or “dark white”.
  • Effect 1 sets the background colour (depending on COLEFF) to medium blue (3), dark magenta (2), or just on the cyan side of medium blue (1 or 0).
  • Effect 2 sets the background colour (depending on COLEFF) to dark cyan (0), dark magenta (1), or dark grey (2 or 3).

Emulation

So how do we get this information into MAME? Well we could take what we’ve learned and write C++ code to draw the graphics in the appropriate sequence, but that would have several disadvantages. Firstly it’s not how the real machine works — the real machine works by composoing a 16-bit value and feeding it through the PLA to get the output colour. Secondly, implementing it in C++ wouldn’t allow someone to try a different PLA dump to see what effects is has on gameplay. Thirdly, someone couldn’t develop their own PLA program to drop into MAME to play with for homebrew purposes. Instead, we use some more code to turn our intermediate representation of the PLA terms into a 64kB mapping table:

for (UINT32 inp = 0x0000; 0xffff >= inp; inp++)
{
    m_mixing_table[inp] = 0;
    for (unsigned term = 0; 48 > term; term++)
    {
        if (!~(inp | (~inp << 16) | products[term]))
            m_mixing_table[inp] |= sums[term];
    }
    m_mixing_table[inp] ^= mask;
}

Then we render scanlines into a 16-bit bitmap and pass it through this table to convert it to colours for MAME’s video output. (Yes, it’s possible to run the PLA equations on the bitmap data directly from the compact intermediate representation. However it’s slower as it involves several logical operations, and 64kB is a small amount of memory these days for a cached lookup table. However, CPUs are getting pretty fast, and cache miss latency keeps getting worse, so perhaps sticking with the intermediate form wouldn’t be such a bad idea.)

This entry was posted on Tuesday, 15 December, 2015 at 10:54 pm and is filed under Development, MAME, Technology. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a Reply