I spent some time looking into your logic bug, but I couldn't work out exactly what was causing it, because I got confused by the way you were shuffling values between registers and memory. I wondered how the code would look if it worked the other way around: keeping values in memory and loading them into registers only when necessary. This is what I came up with (excluding the tile map and tile data, which are the same as in your code):
```asm
  ; Let $50/$51 hold our destination address in VRAM
  ; (low byte first). Since VRAM is outside the zero page,
  ; we need two bytes.
  LDA #$00
  STA $50
  LDA #$02
  STA $51
  ; Let $52/$53 point to our current position in the tile map
  LDA #$20
  STA $52
  ; The tile map is always in the zero page,
  ; but let's use a 16-bit address
  ; so we can use indirect addressing.
  LDA #$00
  STA $53
draw_tile_loop:
  ; Each tile starts at the first pixel
  LDY #$00
  ; Read the byte at the address stored in $52/$53
  ; into the accumulator.
  ; The original 6502 has no plain (unindexed) indirect mode for LDA,
  ; so we use "indirect, Y-indexed" instead,
  ; which works because we just set Y to 0.
  LDA ($52),Y
  ; If the tile-map entry is $FF...
  CMP #$FF
  ; ...then we're at the end of the tile map
  BEQ all_done
  ; Otherwise, the entry is the address of this tile's data;
  ; store it in $54 as the low byte of our source pointer
  STA $54
  ; Set $55 to $00 (the tile data is in the zero page),
  ; so $54/$55 is a valid pointer for indirect mode
  LDA #$00
  STA $55
draw_tile_data_loop:
  ; Copy the pixel from the tile data to VRAM
  LDA ($54),Y
  STA ($50),Y
  ; Prepare to copy the next pixel
  INY
  ; Have we drawn all 8 pixels of this tile?
  CPY #$08
  ; If not, go back and do the next one.
  BNE draw_tile_data_loop
  ; Otherwise, we've finished this tile,
  ; so let's move to the next entry in the tile map.
  ; We know the tile map is entirely in the zero page,
  ; so we can get away with a single INC.
  INC $52
  ; Let's move to the next tile in VRAM too.
  ; Since VRAM is outside the zero page,
  ; we need a full 16-bit addition with carry.
  CLC
  LDA $50
  ADC #$08 ; 8 pixels in a tile!
  STA $50
  LDA $51
  ADC #$00 ; propagate the carry
  STA $51
  JMP draw_tile_loop
all_done:
  BRK
```
As a bonus, it demonstrates the "indirect, Y-indexed" addressing mode! This version might be considered a little wasteful, since it spends two zero-page bytes ($53 and $55) on values that are always 0, and it uses "LDA ($52),Y" with Y = 0 to emulate the zero-page indirect mode that the original 6502 lacks. It works, though, and I think the inner pixel-copying loop ("draw_tile_data_loop") is quite tidy!
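If the wasted byte at $53 bothers you, there is a way around it. The 65C02 added a genuine zero-page indirect mode ("LDA ($52)"), but on the original 6502 you can get the same effect for the tile map with "zero page, X-indexed" addressing, because the tile map lives entirely in the zero page. Here's a rough sketch of how the top of the outer loop could look, assuming nothing else in the program uses X. Everything from "draw_tile_data_loop" onwards stays the same, and the "LDA #$00 / STA $53" setup is no longer needed:

```asm
draw_tile_loop:
  ; Load the tile-map pointer (low byte only) into X.
  ; The pointer itself still lives in memory at $52.
  LDX $52
  ; Zero page, X-indexed: read the byte at address $00 + X.
  ; The tile map is in the zero page, so no high byte is needed.
  LDA $00,X
  ; If the tile-map entry is $FF, we're at the end of the tile map
  CMP #$FF
  BEQ all_done
  ; Otherwise, store the tile-data address as before
  STA $54
  ; Y is still the pixel counter for the inner loop
  LDY #$00
  LDA #$00
  STA $55
draw_tile_data_loop:
  ; ...unchanged from here on...
```

This keeps the "values live in memory" approach and only borrows X for a single load. $55 still has to be 0, because the inner loop needs a real two-byte pointer for "LDA ($54),Y".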