Explaining the Bug

After (re)discovering the semicolon bug in Atari BASIC revision A, I thought I’d spend a bit of time trying to find out exactly why BASIC was exhibiting this behaviour. In order to do this, I had to re-learn how BASIC stores programs in memory.

Atari BASIC uses tokenization to reduce the memory footprint and increase the execution speed of programs. Tokenization replaces keyword strings (such as PRINT) with single-byte tokens (0x20 for PRINT). Ultimately, this bug is caused by the tokenization process and incorrect bounds checking.

First, let’s look at a simple PRINT statement and how Atari BASIC tokenizes it. You may want to refer to De Re Atari, which has a good explanation of the tokenizing process as well as a token table.

print-hi

0A 00        0A    0A    20     0F      02      48  49  16
line number  llen  slen  PRINT  string  strlen  H   I   eol

Our first example PRINT statement has a 2-byte string constant “HI”, followed by the token for end-of-line. 0x20 is the token for PRINT, and 0x0F marks a string constant, which is followed by its length. llen and slen are the line length and statement length.

print-hi-semicolon

0A 00        0B    0B    20     0F      02      48  49  15  16
line number  llen  slen  PRINT  string  strlen  H   I   ;   eol

Our second example adds a semicolon to the end of the line. The normal behaviour for a semicolon is to suppress the automatic carriage return. Note that the string is still 2 bytes long, followed by the token for the semicolon (0x15).

print-hi-control-u

0A 00        0B    0B    20     0F      03      48  49  15  16
line number  llen  slen  PRINT  string  strlen  H   I   ^U  eol

Our third example now has a 3-byte string constant with no semicolon. The third byte is Control-U, whose character code (0x15) is the same value as the semicolon token. The only difference between this example and the previous example is the string constant length.
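To make the three dumps easier to compare, here is a minimal Python sketch that splits each tokenized line into labelled fields. It is only an illustration: the function name decode_line and the two small lookup tables are made up for this post, and it understands only the handful of tokens that appear in the dumps above.

```python
# Hypothetical helper, not part of Atari BASIC: decode the example dumps.
# Only the tokens that appear in this article are listed.
STATEMENTS = {0x20: "PRINT"}
OPERATORS = {0x12: ",", 0x15: ";", 0x16: "eol"}
STRING_CONST = 0x0F  # marks a string constant, followed by a length byte

def decode_line(raw):
    """Return a list of (label, value) pairs for one tokenized line."""
    fields = [
        ("line number", raw[0] | (raw[1] << 8)),  # stored low byte first
        ("llen", raw[2]),
        ("slen", raw[3]),
        ("statement", STATEMENTS[raw[4]]),
    ]
    i = 5
    while i < len(raw):
        byte = raw[i]
        if byte == STRING_CONST:
            length = raw[i + 1]
            fields.append(("string", bytes(raw[i + 2 : i + 2 + length])))
            i += 2 + length  # skip the marker, the length byte and the text
        else:
            fields.append(("token", OPERATORS.get(byte, hex(byte))))
            i += 1
    return fields

hi = bytes.fromhex("0A 00 0A 0A 20 0F 02 48 49 16")
hi_semicolon = bytes.fromhex("0A 00 0B 0B 20 0F 02 48 49 15 16")
hi_control_u = bytes.fromhex("0A 00 0B 0B 20 0F 03 48 49 15 16")

for line in (hi, hi_semicolon, hi_control_u):
    print(decode_line(line))
```

The second and third lines contain exactly the same bytes apart from the length byte, yet they decode differently: one ends with a semicolon token, the other with a 3-byte string.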

Now that we’ve seen how the different lines are tokenized, let’s look at the BASIC source code. We need to look at the XPRINT function, which begins at 0xB3B6.

B3B6            XPRINT
B3B6  A5C9          LDA     PTABW           ; GET TAB VALUE
B3B8  85AF          STA     SCANT           ; SCANT
B3BA  A900          LDA     #0              ; SET OUT INDEX = 0
B3BC  8594          STA     COX
                ;
B3BE  A4A8      :XPR0   LDY    STINDEX      ; GET STMT DISPL
B3C0  B18A          LDA     [STMCUR],Y      ; GET TOKEN
                ;
B3C2  C912          CMP     #CCOM
B3C4  F053 ^B419    BEQ     :XPTAB          ; BR IF TAB
B3C6  C916          CMP     #CCR
B3C8  F07C ^B446    BEQ     :XPEOL          ; BR IF EOL
B3CA  C914          CMP     #CEOS
B3CC  F078 ^B446    BEQ     :XPEOL          ; BR IF EOL
B3CE  C915          CMP     #CSC
B3D0  F06F ^B441    BEQ     :XPNULL         ; BR IF NULL
B3D2  C91C          CMP     #CPND
B3D4  F061 ^B437    BEQ     :XPRIOD
                ;
B3D6  20E0AA        JSR     EXEXPR          ; GO EVALUATE EXPRESSION
B3D9  20F2AB        JSR     ARGPOP          ; POP FINAL VALUE
B3DC  C6A8          DEC     STINDEX         ; DEC STINDEX
B3DE  24D2          BIT     VTYPE           ; IS THIS A STRING
B3E0  3016 ^B3F8    BMI     :XPSTR          ; BR IF STRING
                ;
B3E2  20E6D8        JSR     CVFASC          ; CONVERT TO ASCII
B3E5  A900          LDA     #0
B3E7  85F2          STA     CIX
                ;
B3E9  A4F2      :XPR1   LDY     CIX         ; OUTPUT ASCII CHARACTERS
B3EB  B1F3          LDA     [INBUFF],Y      ; FROM INBUFF
B3ED  48            PHA                     ; UNTIL THE CHAR
B3EE  E6F2          INC     CIX             ; WITH THE MSB ON
B3F0  205DB4        JSR     :XPRC           ; IS FOUND
B3F3  68            PLA
B3F4  10F3 ^B3E9    BPL     :XPR1
B3F6  30C6 ^B3BE    BMI     :XPR0           ; THEN GO FOR NEXT TOKEN
B3F8            :XPSTR
B3F8  209BAB        JSR     GSTRAD          ; GO GET ABS STRING ARRAY
B3FB  A900          LDA     #0
B3FD  85F2          STA     CIX
B3FF  A5D6      :XPR2C  LDA     VTYPE+EVSLEN    ; IF LEN LOW
B401  D004 ^B407    BNE     :XPR2B          ; NOT ZERO BR
B403  C6D7          DEC     VTYPE+EVSLEN+1  ; DEC LEN HI
B405  30B7 ^B3BE    BMI     :XPR0           ; BR IF DONE
B407  C6D6      :XPR2B  DEC     VTYPE+EVSLEN    ; DEC LEN LOW
                ;
B409  A4F2      :XPR2   LDY     CIX         ; OUTPUT STRING CHARS
B40B  B1D4          LDA     [VTYPE+EVSADR],Y ; FOR THE LENGTH
B40D  E6F2          INC     CIX             ; OF THE STRING
B40F  D002 ^B413    BNE     :XPR2A
B411  E6D5          INC     VTYPE+EVSADR+1
B413            :XPR2A
B413  205FB4        JSR     :XPRC1
B416  4CFFB3        JMP     :XPR2C
                ;
B419            :XPTAB
B419  A494      :XPR3   LDY     COX         ; DO UNTIL COX+1 <SCANT
B41B  C8            INY
B41C  C4AF          CPY     SCANT
B41E  9009 ^B429    BCC     :XPR4
B420  18        :XPIC3  CLC
B421  A5C9          LDA     PTABW           ; SCANT = SCANT+TAB
B423  65AF          ADC     SCANT
B425  85AF          STA     SCANT
B427  90F0 ^B419    BCC     :XPR3
                ;
B429  A494      :XPR4   LDY     COX         ; DO UNTIL COX = SCANT
B42B  C4AF          CPY     SCANT
B42D  B012 ^B441    BCS     :XPR4A
B42F  A920          LDA     #$20            ; PRINT BLANKS
B431  205DB4        JSR     :XPRC
B434  4C29B4        JMP     :XPR4
                ;
B437  2002BD    :XPRIOD JSR     GIOPRM      ; GET DEVICE NO.
B43A  85B5          STA     LISTDTD         ; SET AS LIT DEVICE
B43C  C6A8          DEC     STINDEX         ;DEC INDEX
B43E  4CBEB3        JMP     :XPR0           ; GET NEXT TOKEN
                ;
B441            :XPR4A
B441  E6A8      :XPNULL INC     STINDEX     ; INC STINDEX
B443  4CBEB3        JMP     :XPR0
                ;
B446            :XPEOL
B446  A4A8      :XPEOS  LDY     STINDEX     ; AT END OF PRINT
B448  88            DEY
B449  B18A          LDA     [STMCUR],Y      ; IF PREV CHAR WAS
B44B  C915          CMP     #CSC            ; SEMI COLON THEN DONE
B44D  F009 ^B458    BEQ     :XPRTN          ; ELSE PRINT A CR
B44F  C912          CMP     #CCOM           ; OR A COMMA
B451  F005 ^B458    BEQ     :XPRTN          ; THEN DONE
B453  A99B          LDA     #CR
B455  205FB4        JSR     :XPRC1          ; THEN DONE
B458            :XPRTN
B458  A900          LDA     #0              ; SET PRIMARY
B45A  85B5          STA     LISTDTD         ; LIST DVC = 0
B45C  60            RTS                     ; AND RETURN

I know that’s a lot of code, but let’s follow the bouncing ball. The first key part happens at address 0xB3C8 – we look for an eol token (0x16). If we find one, we branch to XPEOL (0xB446). What’s the first thing we do at the end of line? We rewind one byte (DEY – decrement Y), and see if it’s the token for semicolon (0x15). If it is, we skip printing a carriage return.

But wait a minute. We blindly rewind one byte, even if that rewind takes us inside a string constant! There’s the bug. Before stepping back, the code should check whether the previous byte is a token in its own right or the last character of a string constant.
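The end-of-line check can be mimicked in a few lines of Python. This is a rough model of what XPEOL does rather than a translation of the routine: the function name revision_a_prints_cr is invented for this post, and it assumes a single-statement line whose last byte is the eol token.

```python
CCOM, CSC, CCR = 0x12, 0x15, 0x16  # comma, semicolon and eol tokens

def revision_a_prints_cr(raw):
    """Mimic XPEOL: look only at the byte just before the eol token."""
    stindex = len(raw) - 1       # assumption: STINDEX points at the final eol
    previous = raw[stindex - 1]  # DEY, then LDA [STMCUR],Y
    return previous not in (CSC, CCOM)

lines = {
    "print-hi": "0A 00 0A 0A 20 0F 02 48 49 16",
    "print-hi-semicolon": "0A 00 0B 0B 20 0F 02 48 49 15 16",
    "print-hi-control-u": "0A 00 0B 0B 20 0F 03 48 49 15 16",
}

for name, dump in lines.items():
    print(name, revision_a_prints_cr(bytes.fromhex(dump)))
```

Only the first line gets its carriage return. The Control-U line is treated exactly like the semicolon line, because the check never asks whether the byte it is looking at belongs to a string.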

Looking at the code, the same thing will happen with the value 0x12, which is the token for a comma: a string constant ending in the character 0x12 (Control-R) will also lose its carriage return.
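One way to express the missing check is to walk the statement forwards, token by token, and remember whether the item just before eol was a real separator. The Python sketch below does that. It is not how revision C fixes the bug (I haven't seen that code), the function name last_item_is_separator is made up, and it only understands string constants and single-byte tokens, so numeric constants and variables are out of scope.

```python
STRING_CONST, CCOM, CSC, CCR = 0x0F, 0x12, 0x15, 0x16

def last_item_is_separator(raw):
    """Report whether the item just before eol is a comma or semicolon token."""
    i = 5  # first byte after the PRINT token in the example dumps
    last_was_separator = False
    while raw[i] != CCR:
        if raw[i] == STRING_CONST:
            i += 2 + raw[i + 1]  # skip the marker, the length byte and the text
            last_was_separator = False
        else:
            last_was_separator = raw[i] in (CSC, CCOM)
            i += 1
    return last_was_separator

examples = {
    "semicolon token": "0A 00 0B 0B 20 0F 02 48 49 15 16",
    "string ending in ^U": "0A 00 0B 0B 20 0F 03 48 49 15 16",
    "comma token": "0A 00 0B 0B 20 0F 02 48 49 12 16",
    "string ending in ^R": "0A 00 0B 0B 20 0F 03 48 49 12 16",
}

for name, dump in examples.items():
    print(name, last_item_is_separator(bytes.fromhex(dump)))
```

Because the walk skips over each string constant as a unit, the bytes 0x15 and 0x12 only count as separators when they appear as tokens, and both control-character lines keep their carriage return.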

This bug has been fixed in revision C BASIC, but I’m not aware of commented source code being available for revision C.
