Apple II/II+/IIe floating bus problems #1180

Closed
ryandesign opened this issue Oct 25, 2023 · 0 comments


ryandesign commented Oct 25, 2023

I've found some bugs in the way that Clock Signal implemented the Apple II floating bus in #425.

Here is the 2012 "rainbow" sample program that uses the floating bus to synchronize lores / hires mode switches between scan lines to produce hires rainbow-colored lines:

https://web.archive.org/web/20151021120119/http://hoop-la.ca/apple2/2011/rainbow/

You can load it into memory by starting a new Apple II+/IIe emulator, pressing F12 to reset and go to the Applesoft BASIC prompt, and pasting this in:

1FORI=768TO792:READN:IFN<8THENPOKEI,173:POKEI+1,80+N:N=192:I=I+2:DATA3,4,7,234,0,208,251
2POKEI,N:NEXT:HGR:HCOLOR=4:HPLOT0,0:CALL62454:Y=63:FORI=0TO4:READK:GOSUB4:GOSUB4:NEXT:HCOLOR=0:Y=123:FORI=0TO3:HPLOT0,YTO100,Y:Y=Y-1-2*(Y=122):DATA6,160,22,136,208,253,240,237,2,6,1,0,5
3NEXT:GR:COLOR=13:HLIN0,39AT14:FORI=0TO7:POKE9200+I,0:POKE13168+I,0:NEXT:COLOR=1:HLIN0,39AT13:HOME:PRINTTAB(16)"RAINBOW":PRINT:PRINT"MIXED GRAPHICS (HI-RES/COLOR)":CALL768
4HCOLOR=K:HPLOT0,YTO279,Y:Y=Y-1:RETURN

Run it with:

RUN

The original longer and less-obfuscated version of this program is here:

https://macgui.com/usenet/?group=2&id=22342#msg


Here is what it looks like on my real unenhanced Apple //e connected to a Toshiba CRT TV:

[Image: rainbow]

The CRT makes it a little hard to see, but there are six solid bands of color: red, orange, yellow, green, blue, and violet. There is no flickering.

It also looks correct in OpenEmulator 1.1.1-202203110628 and Virtual ][ 11.4.


Here is what it looks like in Clock Signal 2023-09-10:

[Image: rainbow]

The red and blue lines are totally missing, and the yellow line is flickering. Something is amiss!

(The problem that the lines are slanted instead of horizontal is #1173.)


The machine language program that the rainbow BASIC program pokes into memory starting at $300 is as follows:

0300-   AD 57 C0    LDA   $C057
0303-   AD 53 C0    LDA   $C053
0306-   AD 54 C0    LDA   $C054
0309-   AD 50 C0    LDA   $C050 ; 4 cycles -- reads the "floating bus"
                                ; value--the byte last read by the video
                                ; refresh controlled by the video scanner.
030C-   D0 FB       BNE   $0309 ; 3 cycles if not $00, keep reading the
                                ; "floating bus" -- zeroes are strategically
                                ; placed within the hi-res screen
030E-   AD 56 C0    LDA   $C056 ; 4 cycles -- switch to lo-res
0311-   A0 16       LDY   #$16  ; 2 cycles -- Y = 22
0313-   88          DEY         ; 2 cycles times 22 = 44 cycles
0314-   D0 FD       BNE   $0313 ; 3 cycles times 21 + 2 cycles = 65 cycles
0316-   EA          NOP         ; 2 cycles -- 4 + 2 + 44 + 65 + 2 = 117
                                ; cycles, long enough to display horizontal
                                ; lines of lo-res two pixels high
0317-   AD 57 C0    LDA   $C057 ; switch to hi-res
031A-   4C 09 03    JMP   $0309 ; do it all over again

It relies on the fact that the hires screen it sets up doesn't contain any zero bytes except for where it wants to switch to lores mode.
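The cycle arithmetic in the comments can be double-checked with a few lines of Python. This is a sketch assuming standard NMOS 6502 timings (LDA absolute 4, LDY immediate 2, DEY 2, NOP 2, BNE 3 when taken within a page and 2 when it falls through); the helper name is hypothetical. With Y starting at 22, DEY runs 22 times but BNE is taken only 21 times before falling through once.

```python
# Sketch: 6502 cycle count for the rainbow program's lores delay.
# Assumes standard NMOS 6502 timings; names are illustrative only.

LDA_ABS = 4   # LDA $C056 / LDA $C057
LDY_IMM = 2   # LDY #$16
NOP = 2

def dey_bne_cycles(y_start: int) -> int:
    """Cycles spent in a DEY/BNE loop entered with Y = y_start (1-255)."""
    cycles = 0
    y = y_start
    while True:
        y = (y - 1) & 0xFF   # DEY: 2 cycles
        cycles += 2
        if y != 0:
            cycles += 3      # BNE taken, same page: 3 cycles
        else:
            cycles += 2      # BNE not taken: 2 cycles
            return cycles

loop = dey_bne_cycles(0x16)
total = LDA_ABS + LDY_IMM + loop + NOP
print(loop, total)  # 109 117
```

Counting instead from the $C056 access to the $C057 access (the LDY, the loop, the NOP, and then the four-cycle LDA $C057, whose soft-switch access happens on its last cycle) also comes to 117 cycles, a little under two 65-cycle scan lines.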


Bob Bishop's 1982 article "Have an Apple Split" presents BASIC programs that prepare the hires screen with special bytes and generate a machine language program that samples the floating bus as fast as possible (one sample every eight cycles), recording the results in memory for later analysis. The preparation program puts $00 into every byte of hires page 1's 0th row, $01 into every byte of the 1st row, all the way down to $BF in the 191st row, and puts $C0 through $FF into the screen holes ($C0 into the eight holes that follow row $80, up to $FF into the eight that follow row $BF).
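The resulting test pattern can be modeled in Python. This sketch assumes the standard hires page 1 layout, in which row y starts at $2000 + (y mod 8) * $400 + ((y / 8) mod 8) * $80 + (y / 64) * $28, and each 128-byte block ends with eight screen holes that follow a row from the bottom third ($80-$BF). The function names are made up for illustration; the assembly program's hiresfill routine builds the same pattern.

```python
# Sketch of the floating-bus test pattern on hires page 1.
# Assumes the standard Apple II hires layout; names are hypothetical.

PAGE1 = 0x2000

def row_address(y: int, page: int = PAGE1) -> int:
    """Start address of hires row y (0-191)."""
    return page + (y % 8) * 0x400 + ((y // 8) % 8) * 0x80 + (y // 64) * 0x28

def build_test_pattern() -> bytearray:
    """Return the 8 KB of hires page 1 filled with the test pattern."""
    image = bytearray(0x2000)
    for y in range(192):
        base = row_address(y) - PAGE1
        image[base:base + 40] = bytes([y]) * 40          # visible bytes = row number
        if y >= 0x80:                                    # only these rows are followed by holes
            image[base + 40:base + 48] = bytes([y + 0x40]) * 8
    return image

img = build_test_pattern()
print(hex(img[0x0400]), hex(img[0x0078]), hex(img[0x03F8]))  # 0x1 0xc0 0xf8
```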

I rewrote the two programs in assembly, combined them, and made the sampling portion wait until the vertical blanking interval begins when running on a //e, so that the sampled data always begins in approximately the same place. It's not exact: my naive VBL detection method has a seven-cycle jitter, and since we sample every eight cycles, results from repeated runs may be positioned one location off.
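Why the jitter costs at most one location can be sketched numerically. This assumes the VBL flag is checked once per 7-cycle poll (BIT absolute, 4 cycles, plus a taken BMI, 3 cycles), so the loop exits 0 to 6 cycles after VBL really begins, and that sampling then starts a fixed number of cycles later (that constant is folded into `event_cycle`). Since the seven possible delays span less than one 8-cycle sample period, any given event can only land on one of two adjacent sample indices. Names are hypothetical.

```python
# Sketch: effect of a 7-cycle polling jitter on 8-cycle sampling.
# Assumptions as stated above; names are illustrative only.

POLL_CYCLES = 7     # BIT RDVBLBAR (4) + taken BMI (3)
SAMPLE_CYCLES = 8   # LDA $C057 (4) + STA abs (4)

def sample_index(event_cycle: int, detect_delay: int) -> int:
    """Index of the first sample taken at or after event_cycle (counted from
    the true start of VBL) when detection is detect_delay cycles late."""
    return -(-(event_cycle - detect_delay) // SAMPLE_CYCLES)  # ceiling division

def possible_indices(event_cycle: int) -> list[int]:
    """All indices the event can land on across the possible detection delays."""
    return sorted({sample_index(event_cycle, d) for d in range(POLL_CYCLES)})

print(possible_indices(70 * 65))  # [568, 569]
```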

Here is the source code of my program.
; SPDX-FileCopyrightText: © 2023 Ryan Carsten Schmidt <https://github.com/ryandesign>
; SPDX-License-Identifier: MIT

;save as fbtest.s and assemble and link with:
;cl65 -t apple2 -C apple2-asm.cfg --start-addr 16384 -u __EXEHDR__ -o fbtest fbtest.s

GBASL       =     $26           ;graphics base address low byte
GBASH       =     $27           ;graphics base address high byte
A1L         =     $3C           ;general purpose A1 register low byte
A1H         =     $3D           ;general purpose A1 register high byte
A2L         =     $3E           ;general purpose A2 register low byte
A2H         =     $3F           ;general purpose A2 register high byte
A3L         =     $40           ;general purpose A3 register low byte
A3H         =     $41           ;general purpose A3 register high byte
A4L         =     $42           ;general purpose A4 register low byte
A4H         =     $43           ;general purpose A4 register high byte
HGRPAGE     =     $E6           ;hires drawing base address high byte
RDVBLBAR    =   $C019           ;vertical blanking flag
TXTCLR      =   $C050           ;graphics
TXTSET      =   $C051           ;text
MIXCLR      =   $C052           ;no split
MIXSET      =   $C053           ;split
LOWSCR      =   $C054           ;page 1
HISCR       =   $C055           ;page 2
LORES       =   $C056           ;lores
HIRES       =   $C057           ;hires
IDBYTE1     =   $FBB3           ;machine identification byte 1
IDBYTE2     =   $FBC0           ;machine identification byte 2
NXTA1       =   $FCBA           ;increment A1 routine
IDROUTINE   =   $FE1F           ;machine identification routine
MONZ        =   $FF69           ;monitor warm start without bell

ROWS        =     192           ;number of hires rows
HOLESTARTROW=     128           ;first hires row that's followed by holes
BYTESPERROW =      40           ;bytes per row
HOLESPERROW =       8           ;holes per row for rows that have them
PAGE1H      =     $20           ;hires page 1 base address high byte
DATA        =   $1000           ;generated data start address
DATALEN     =    $800           ;generated data length
DATAEND     =   DATA+DATALEN-1  ;generated data end address
OPLDAABS    =     $AD           ;lda (absolute addressing) opcode
OPSTAABS    =     $8D           ;sta (absolute addressing) opcode
OPRTS       =     $60           ;rts opcode

.proc main
            bit MIXCLR          ;no split
            bit LOWSCR          ;show page 1
            bit HIRES           ;hires
            bit TXTCLR          ;show graphics

            lda #PAGE1H         ;load hires page 1 base address high byte
            sta HGRPAGE         ;store it so we draw on page 1
            jsr hiresfill       ;fill hires screen

            lda #<DATA          ;load data start address low byte
            sta A1L             ;store in A1L
            lda #>DATA          ;load data start address high byte
            sta A1H             ;store in A1H
            lda #<DATAEND       ;load data end address low byte
            sta A2L             ;store in A2L
            lda #>DATAEND       ;load data end address high byte
            sta A2H             ;store in A2H
            lda #<prog          ;load program start address low byte
            sta A3L             ;store in A3L
            lda #>prog          ;load program start address high byte
            sta A3H             ;store in A3H
            jsr genprog         ;generate program to sample floating bus

            jsr runprog         ;run generated program

            bit TXTSET          ;show text
            jmp MONZ            ;go to monitor
.endproc

;fill the hires screen: rows filled with 0-191; holes with 192-255
;input: HGRPAGE = high byte of the page address
.proc hiresfill
            ldx #ROWS-1         ;load y coordinate into X
@rowloop:   txa                 ;transfer y coordinate to A
            jsr hiresrowaddr    ;get memory address in GBASL,GBASH
            txa                 ;transfer y coordinate to A
            ldy #BYTESPERROW-1  ;load byte offset into Y
@byteloop:  sta (GBASL),Y       ;store y coordinate in screen byte
            dey                 ;decrement byte offset
            bpl @byteloop       ;loop for each byte
            cpx #HOLESTARTROW   ;check if y coordinate is row with holes
            bcc @nextrow        ;no holes on this row
            adc #63             ;carry is set, so A = y coordinate + 64
            ldy #BYTESPERROW+HOLESPERROW-1
                                ;load byte offset into Y
@holeloop:  sta (GBASL),Y       ;store value in screen hole
            dey                 ;decrement byte offset
            cpy #BYTESPERROW-1  ;check if byte offset reached the end
            bne @holeloop       ;loop for each hole
@nextrow:   dex                 ;decrement y coordinate
            cpx #$FF            ;check if y coordinate reached the end
            bne @rowloop        ;loop for each row
            rts                 ;return
.endproc

;compute the address of the hires row
;based on the first part of HPOSN in the Apple II+ ROM
;input: HGRPAGE = high byte of page address, A = y coordinate
;output: GBASL,GBASH = row address
.proc hiresrowaddr
            sta GBASH           ;save y coordinate in GBASH
            and #%11000000      ;retain high two bits of A
            sta GBASL           ;store A in GBASL
            lsr A               ;shift A right
            lsr A               ;shift A right
            ora GBASL           ;OR A with GBASL
            sta GBASL           ;store A in GBASL

            lda GBASH           ;restore y coordinate from GBASH
            asl A               ;shift A left
            asl A               ;shift A left
            asl A               ;shift A left
            rol GBASH           ;rotate GBASH left
            asl A               ;shift A left
            rol GBASH           ;rotate GBASH left
            asl A               ;shift A left
            ror GBASL           ;rotate GBASL right
            lda GBASH           ;load GBASH into A
            and #%00011111      ;retain low five bits of A
            ora HGRPAGE         ;combine with hires page base address
            sta GBASH           ;store A in GBASH

            rts                 ;return
.endproc

;generate a program that samples the floating bus every 8 cycles
;input: A1L,A1H = data start address, A2L,A2H = data end address,
;A3L,A3H = program start address
.proc genprog
            clc                 ;clear carry
@loop:      ldy #0              ;load 0 into offset

            lda #OPLDAABS       ;load lda (absolute) opcode into A
            sta (A3L),Y         ;store in program
            iny                 ;increment offset
            lda #<HIRES         ;load hires soft switch address low byte
            sta (A3L),Y         ;store in program
            iny                 ;increment offset
            lda #>HIRES         ;load hires soft switch address high byte
            sta (A3L),Y         ;store in program
            iny                 ;increment offset

            lda #OPSTAABS       ;load sta (absolute) opcode into A
            sta (A3L),Y         ;store in program
            iny                 ;increment offset
            lda A1L             ;load data address low byte
            sta (A3L),Y         ;store in program
            iny                 ;increment offset
            lda A1H             ;load data address high byte
            sta (A3L),Y         ;store in program

            lda A3L             ;load A3 low byte
            adc #6              ;increment by the number of bytes stored above
            sta A3L             ;store it back in A3 low byte
            bcc @next           ;if carry is still clear, skip A3H increment
            inc A3H             ;increment A3 high byte

@next:      jsr NXTA1           ;increment A1
            bcc @loop           ;loop if A1 hasn't reached A2

            ldy #0              ;load 0 into offset
            lda #OPRTS          ;load rts opcode into A
            sta (A3L),Y         ;store in program

            rts                 ;return
.endproc

;run the generated program
;if running on a IIe, wait for the vertical blanking interval to begin
.proc runprog
            sec                 ;set carry before identification routine
            jsr IDROUTINE       ;run machine identification routine
            bcc prog            ;if carry was cleared it's a IIgs
            lda IDBYTE1         ;load machine ID byte 1
            cmp #6              ;check for IIe or better
            bne prog            ;it's a II, II+, or III in II+ emulation
            lda IDBYTE2         ;load machine ID byte 2
            beq prog            ;it's a IIc or IIc+

@loop2:     bit RDVBLBAR        ;wait for the vertical blanking
            bpl @loop2          ; interval to end
@loop1:     bit RDVBLBAR        ;wait for the vertical blanking
            bmi @loop1          ; interval to begin

                                ;fall through to prog
.endproc

;the generated program
.proc prog                      ;self-modifying! genprog overwrites bytes
            rts                 ;starting here
.endproc
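For checking the arithmetic of hiresrowaddr without an emulator, here is a Python model that mirrors its shift-and-rotate sequence step by step, masking to 8 bits and tracking the carry flag by hand. It's a sketch, not part of the original program; the helper names are hypothetical. Its results match the usual closed-form hires row address ($2000 + (y mod 8) * $400 + ((y / 8) mod 8) * $80 + (y / 64) * $28).

```python
# Python model of hiresrowaddr (after HPOSN in the Apple II+ ROM).
# 8-bit values are masked explicitly and the carry flag is tracked by hand.

def asl(a: int) -> tuple[int, int]:
    """ASL: shift left; bit 7 goes to carry."""
    return (a << 1) & 0xFF, a >> 7

def rol(m: int, carry: int) -> tuple[int, int]:
    """ROL: rotate left through carry."""
    return ((m << 1) | carry) & 0xFF, m >> 7

def ror(m: int, carry: int) -> tuple[int, int]:
    """ROR: rotate right through carry."""
    return (m >> 1) | (carry << 7), m & 1

def hiresrowaddr(y: int, hgrpage: int = 0x20) -> int:
    gbash = y                               # sta GBASH
    gbasl = y & 0b11000000                  # and #%11000000 / sta GBASL
    gbasl |= gbasl >> 2                     # lsr A / lsr A / ora GBASL / sta GBASL

    a = gbash                               # lda GBASH
    a, c = asl(a)                           # asl A
    a, c = asl(a)                           # asl A
    a, c = asl(a)                           # asl A
    gbash, c = rol(gbash, c)                # rol GBASH
    a, c = asl(a)                           # asl A
    gbash, c = rol(gbash, c)                # rol GBASH
    a, c = asl(a)                           # asl A
    gbasl, c = ror(gbasl, c)                # ror GBASL
    gbash = (gbash & 0b00011111) | hgrpage  # and #%00011111 / ora HGRPAGE
    return (gbash << 8) | gbasl

print(hex(hiresrowaddr(0x01)), hex(hiresrowaddr(0x80)), hex(hiresrowaddr(0xB8)))
# 0x2400 0x2050 0x23d0
```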

You can poke it into memory by entering the monitor with:

CALL -151

and then pasting this in:

4000:2C 52 C0 2C 54 C0 2C 57 C0 2C
:50 C0 A9 20 85 E6 20 37 40 A9 00 85
:3C A9 10 85 3D A9 FF 85 3E A9 17 85
:3F A9 CE 85 40 A9 40 85 41 20 7C 40
:20 B2 40 2C 51 C0 4C 69 FF A2 BF 8A
:20 5A 40 8A A0 27 91 26 88 10 FB E0
:80 90 0B 69 3F A0 2F 91 26 88 C0 27
:D0 F9 CA E0 FF D0 E0 60 85 27 29 C0
:85 26 4A 4A 05 26 85 26 A5 27 0A 0A
:0A 26 27 0A 26 27 0A 66 26 A5 27 29
:1F 05 E6 85 27 60 18 A0 00 A9 AD 91
:40 C8 A9 57 91 40 C8 A9 C0 91 40 C8
:A9 8D 91 40 C8 A5 3C 91 40 C8 A5 3D
:91 40 A5 40 69 06 85 40 90 02 E6 41
:20 BA FC 90 D2 A0 00 A9 60 91 40 60
:38 20 1F FE 90 16 AD B3 FB C9 06 D0
:0F AD C0 FB F0 0A 2C 19 C0 10 FB 2C
:19 C0 30 FB 60

Run it with:

4000G

In my version, data is stored starting at $1000. I'll show some output from running this program on Virtual ][ 11.4 (this output matches my real unenhanced //e) and on Clock Signal emulating an unenhanced //e.

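Let's start with the first few displayed scanlines. Sampling begins when VBL begins, and VBL lasts 70 scanlines of 65 cycles each, so with one sample every eight cycles the data for the first displayed line starts about 70 * 65 / 8 = 568.75 bytes after the start of the data, around memory location $1238. It can be displayed by running: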

1230.126F
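The address estimate is simple arithmetic; as a quick sketch (hypothetical names, assuming an NTSC frame of 262 scan lines of which 192 are displayed, and that sampling starts exactly when VBL begins):

```python
# Sketch: where a displayed line's samples should roughly begin.

DATA = 0x1000            # where the test program stores its samples
CYCLES_PER_LINE = 65
CYCLES_PER_SAMPLE = 8
VBL_LINES = 262 - 192    # 70 scan lines of vertical blanking

def approx_sample_address(display_line: int) -> int:
    """Approximate address of the first sample for a displayed scan line."""
    cycles = (VBL_LINES + display_line) * CYCLES_PER_LINE
    return DATA + cycles // CYCLES_PER_SAMPLE

print(hex(approx_sample_address(0)), hex(approx_sample_address(1)))  # 0x1238 0x1240
```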

Here's output from Virtual ][:

1230- BF BF BF FF 3F 3F 3F 3F
1238- 80 80 C0 00 00 00 00 00
1240- 81 81 C1 01 01 01 01 01
1248- 82 82 C2 02 02 02 02 02
1250- 83 83 C3 03 03 03 03 03
1258- 84 84 C4 04 04 04 04 04
1260- 85 85 85 C5 05 05 05 05
1268- 05 86 86 C6 06 06 06 06

We start at $1238 with three bytes (80 80 C0) of horizontal blanking preceding line $0. The two $80s come from the end of visible scanline $80 (($0 - $40 + $C0) % $C0), and the $C0 comes from the eight holes that follow that scanline, so the horizontal blanking bytes really are read from consecutive memory locations, specifically $2068 through $207F. Next come five bytes (00 00 00 00 00) from the displayed portion of line $0 (memory locations $2000 through $2027). Then come three bytes (81 81 C1) from the next horizontal blanking period, taken from the end of visible scanline $81 (($1 - $40 + $C0) % $C0) and the holes that follow it, and so on.

Compare this with output from Clock Signal 2023-09-10:

1230- B7 F7 3F 3F 3F 3F 3F 59
1238- 08 A8 00 00 00 00 00 B8
1240- B8 F8 01 01 01 01 01 B9
1248- B9 F9 02 02 02 02 02 BA
1250- BA FA 03 03 03 03 03 BB
1258- BB FB 04 04 04 04 04 BC
1260- BC FC 05 05 05 05 05 BD
1268- BD BD FD 06 06 06 06 06

The horizontal blanking period for line $0 begins at $1237 here instead of $1238 because of my imprecise VBL detection, so that difference is fine. But the three bytes that start line $0's HBL here (59 08 A8) look like nonsense, and they are: Clock Signal is returning values from the memory locations that literally precede line $0's visible component in memory. In other words, it is taking them from outside of (just before) hires page 1, specifically from $1FE8 to $1FFF. Clock Signal's behavior matches the italicized conclusion in Bishop's article, quoted in the code:

// (1) a complete sixty-five-cycle scan line consists of sixty-five consecutive bytes of
// display buffer memory that starts twenty-five bytes prior to the actual data to be displayed.

but that's not correct for scan lines < 64. What he wrote in the preceding paragraph is more accurate:

a scan line consists of its visible forty-cycle component preceded by its invisible twenty-five-cycle HBL component, which is mapped from $40 display lines earlier. This model assumes a circular screen; that is, if counting up $40 lines would take you off the top of the screen, continue counting up from the bottom.
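One way to express this circular model, consistent with the Virtual ][ dump, is that the HBL bytes are the bytes immediately before the line's visible data within the same 128-byte block of memory, wrapping around at the block boundary. Each block holds one row from each third of the screen followed by the holes, so stepping back from a top-third row wraps into the bottom-third row $40 lines earlier (circularly). The sketch below assumes that wrap and a 24-byte run (the count discussed at the end of this issue) and compares it with taking the same bytes linearly; the function names are illustrative, not Clock Signal's.

```python
# Sketch: HBL source addresses, wrapped within the 128-byte block vs. linear.

def row_address(y: int) -> int:
    """Start of hires page 1 row y (standard Apple II layout)."""
    return 0x2000 + (y % 8) * 0x400 + ((y // 8) % 8) * 0x80 + (y // 64) * 0x28

def hbl_addresses(y: int, hbl_bytes: int = 24) -> list[int]:
    """Bytes just before line y's visible data, wrapping within its 128-byte block."""
    start = row_address(y)
    block = start & ~0x7F
    return [block | ((start - hbl_bytes + k) & 0x7F) for k in range(hbl_bytes)]

def linear_hbl_addresses(y: int, hbl_bytes: int = 24) -> list[int]:
    """The same number of bytes taken linearly before the visible data."""
    start = row_address(y)
    return [start - hbl_bytes + k for k in range(hbl_bytes)]

for y in (0x00, 0x01, 0x40):
    w, l = hbl_addresses(y), linear_hbl_addresses(y)
    print(f"line ${y:02X}: wrapped ${w[0]:04X}-${w[-1]:04X}, linear ${l[0]:04X}-${l[-1]:04X}")
# line $00: wrapped $2068-$207F, linear $1FE8-$1FFF
# line $01: wrapped $2468-$247F, linear $23E8-$23FF
# line $40: wrapped $2010-$2027, linear $2010-$2027
```

The two agree for lines $40 and up and differ only for lines below $40, which is where the wrap happens.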

Clock Signal appears to initialize Apple II RAM to random values, so the values it returns for line $0's horizontal blanking period, which come from just below hires page 1, are literally random. They could contain any value, including the zero that the rainbow program is looking for as its signal, which could account for it switching modes at the wrong times. We can make Clock Signal's behavior more obvious by writing a specific value to those memory locations before running the test program, e.g. $42 with these monitor commands:

1FE7:42
1FE8<1FE7.1FFEM
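The second command uses the classic monitor fill trick: the monitor's M command copies one byte at a time from the lowest address upward, so moving a range onto itself shifted up by one propagates the first byte through the whole range. A small model of that behavior (the function name is hypothetical):

```python
# Sketch of the monitor fill trick: an overlapping, ascending byte-by-byte move.

def monitor_move(mem: bytearray, dest: int, src_start: int, src_end: int) -> None:
    """Mimic dest<start.endM: copy src_start..src_end to dest, lowest byte first."""
    for i in range(src_end - src_start + 1):
        mem[dest + i] = mem[src_start + i]

mem = bytearray(0x10000)
mem[0x1FE7] = 0x42                          # 1FE7:42
monitor_move(mem, 0x1FE8, 0x1FE7, 0x1FFE)   # 1FE8<1FE7.1FFEM
print(all(b == 0x42 for b in mem[0x1FE7:0x2000]))  # True
```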

Then we get:

1230- F7 3F 3F 3F 3F 3F 42 42
1238- 42 42 00 00 00 00 00 B8
1240- B8 F8 01 01 01 01 01 B9
1248- B9 F9 02 02 02 02 02 BA
1250- BA FA 03 03 03 03 03 BB
1258- BB FB 04 04 04 04 04 BC
1260- BC FC 05 05 05 05 05 BD
1268- BD FD 06 06 06 06 06 BE

In this run we see a minor difference from the previous one: we get four garbage $42s here instead of three garbage values. That happens on one scanline in eight, because we're sampling every eight cycles and the horizontal blanking period is 25 cycles long.
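The one-in-eight figure follows from 65 = 8 * 8 + 1: the sampling phase slips by one cycle per scan line, and a 25-cycle window catches four samples only when it starts exactly on a sample cycle. A quick check in Python (names are illustrative):

```python
# Sketch: how many 8-cycle samples land in each line's 25-cycle HBL.

CYCLES_PER_LINE = 65
CYCLES_PER_SAMPLE = 8
HBL_CYCLES = 25

def hbl_sample_count(line: int, first_sample: int = 0) -> int:
    """Samples (at cycles first_sample + 8k, counted from the start of
    line 0's HBL) that fall within the given line's HBL."""
    start = line * CYCLES_PER_LINE
    return sum(1 for c in range(start, start + HBL_CYCLES)
               if (c - first_sample) % CYCLES_PER_SAMPLE == 0)

print([hbl_sample_count(n) for n in range(16)])
# [4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3]
```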

Following the five $00 bytes for scanline $0, the horizontal blanking bytes for line $1 (B8 B8 F8) are again from the wrong row. At least they're from within hires page 1 this time, but again they're taken from the 24 bytes that precede row $1 in memory: row $1 starts at $2400, and these bytes come from row $B8 (which starts at $23D0) and the holes that follow it, rather than from the correct row $81 (which starts at $2450). Returning data from the wrong row like this could also account for a program switching modes at the wrong time, since it would see its signal bytes at the wrong times.
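To check which rows particular page 1 addresses belong to, here is a small decoder that inverts the standard hires layout. It's a sketch with a hypothetical function name; for a screen-hole address it reports the row the hole follows.

```python
# Sketch: map a hires page 1 address back to its row (standard layout assumed).

def decode_hires_address(addr: int) -> tuple[int, bool]:
    """Return (row, is_hole) for an address in $2000-$3FFF."""
    offset = addr - 0x2000
    if not 0 <= offset < 0x2000:
        raise ValueError("not in hires page 1")
    low3 = offset // 0x400                 # y % 8
    mid3 = (offset % 0x400) // 0x80        # (y // 8) % 8
    pos = offset % 0x80                    # position within the 128-byte block
    third = min(pos // 0x28, 2)            # y // 64; holes ($78-$7F) go with the last third
    return third * 64 + mid3 * 8 + low3, pos >= 0x78

for addr in (0x2400, 0x23E8, 0x23F8, 0x2450):
    row, hole = decode_hires_address(addr)
    print(f"${addr:04X}: row ${row:02X}{' (hole)' if hole else ''}")
# $2400: row $01
# $23E8: row $B8
# $23F8: row $B8 (hole)
# $2450: row $81
```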


The second problem is specific to the vertical blanking period whose data was recorded starting at $1000:

1000.103F

With Virtual ][:

1000- 80 80 80 C0 00 00 00 00
1008- 81 81 81 C1 01 01 01 01
1010- 82 82 82 C2 02 02 02 02
1018- 83 83 83 C3 03 03 03 03
1020- 84 84 84 C4 04 04 04 04
1028- 85 85 85 C5 05 05 05 05
1030- 86 86 86 86 C6 06 06 06
1038- 06 87 87 87 C7 07 07 07

With Clock Signal:

1000- 0C 35 2A 00 00 00 00 00
1008- B8 B8 F8 01 01 01 01 01
1010- B9 B9 F9 02 02 02 02 02
1018- BA BA FA 03 03 03 03 03
1020- BB BB FB 04 04 04 04 04
1028- BC BC FC 05 05 05 05 05
1030- BD BD BD FD 06 06 06 06
1038- 06 BE BE FE 07 07 07 07

Clock Signal follows what Bob Bishop wrote, as quoted in the code:

// (2) During VBL the data acts just as if it were starting a whole new frame from the beginning, but
// it never finishes this pseudo-frame. After getting one third of the way through the frame (to
// scan line $3F), it suddenly repeats the previous six scan lines ($3A through $3F) before aborting
// to begin the next true frame.

But again, Bishop wasn't precise enough. Although the output he published from his sampling program shows it, his text doesn't mention that during vertical blanking the bus returns data from memory locations eight bytes earlier than it does outside vertical blanking. Looking at the correct output from Virtual ][, during the non-VBL scan of line $0 we got five $00 bytes preceded by three (or, 1 in 8 times, four) bytes:

1238- 80 80 C0 00 00 00 00 00

But during VBL we got only four $00 bytes preceded by four (or, 1 in 8 times, five) bytes:

1000- 80 80 80 C0 00 00 00 00

Clock Signal isn't implementing that difference. During non-VBL, we got the correct five $00's:

1238- 08 A8 00 00 00 00 00 B8

But during VBL we also got five $00's when we should have gotten four:

1000- 0C 35 2A 00 00 00 00 00

All of this is investigated much more thoroughly and explained much more precisely in Jim Sather's book Understanding the Apple II. Once I read the relevant passages there, I discovered one more issue with Clock Signal's implementation, one that Bishop must not have realized because his sample program was not designed to detect it: the horizontal blanking period does not read 25 consecutive bytes. It reads 24 consecutive bytes, and the first of them is read twice.
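Putting these observations together, here is a small Python model of what a sampler should see on one scan line: during HBL, the 24 bytes before the visible data (wrapping within the row's 128-byte block) with the first one read twice, then the 40 visible bytes, with everything shifted eight bytes earlier during VBL. It's a simplified sketch under those assumptions, checked only against the Virtual ][ dumps quoted above; it is not Clock Signal's code or a transcription of Sather's circuit description, and all names are hypothetical. `pattern_byte` assumes the test pattern written by the program above.

```python
# Sketch: simplified floating-bus model for one scan line.

def row_address(y: int) -> int:
    """Start of hires page 1 row y (standard Apple II layout)."""
    return 0x2000 + (y % 8) * 0x400 + ((y // 8) % 8) * 0x80 + (y // 64) * 0x28

def pattern_byte(addr: int) -> int:
    """Test-pattern value at addr: the row number, or row + $40 for the
    screen holes that follow rows $80-$BF."""
    offset = addr - 0x2000
    pos = offset % 0x80                      # position within the 128-byte block
    third = min(pos // 0x28, 2)              # holes ($78-$7F) go with the last third
    row = third * 64 + ((offset % 0x400) // 0x80) * 8 + offset // 0x400
    return row + 0x40 if pos >= 0x78 else row

def bus_addresses(y: int, in_vbl: bool = False) -> list[int]:
    """The 65 addresses read during line y: 24 HBL bytes (first read twice)
    then 40 visible bytes, wrapping within the row's 128-byte block and
    starting 8 bytes earlier during VBL."""
    base = row_address(y)
    block = base & ~0x7F
    start = base - 24 - (8 if in_vbl else 0)
    addrs = [block | ((start + k) & 0x7F) for k in range(64)]
    return [addrs[0]] + addrs                # first HBL byte is read twice

def sampled(y: int, phase: int, in_vbl: bool = False) -> list[int]:
    """Values seen by a sampler reading every 8 cycles, starting `phase`
    cycles into the line."""
    reads = bus_addresses(y, in_vbl)
    return [pattern_byte(reads[c]) for c in range(phase, 65, 8)]

print(" ".join(f"{v:02X}" for v in sampled(0, phase=1)))               # 80 80 C0 00 00 00 00 00
print(" ".join(f"{v:02X}" for v in sampled(0, phase=1, in_vbl=True)))  # 80 80 80 C0 00 00 00 00
```

When a line starts exactly on a sample (phase 0), nine samples fall within its 65 cycles, which is why the dumps occasionally show a line spilling into the next row of output.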


After fixing these issues (I'll submit a pull request shortly), the rainbow program works:

[Image: rainbow-fixed]
