And rewriting my very first program to be much smarter
I spent some time looking into your logic bug, but I couldn't figure out exactly how it worked, since I got confused by the way you were shuffling values between registers and memory. I wondered how the code would look if it worked in the other direction - keeping values in memory and only loading them into registers when necessary. This is what I came up with (excluding the tile map and tile data, which are the same as in your code):
; Let $50+51 be our destination in VRAM
; since VRAM is outside the zero-page,
; we need two bytes
; Let $52 be our source in the tile-map
; The tile-map is always in the zero-page,
; but let's use a 16-bit address
; so we can use indirect addressing
; Each tile starts at the first pixel
; Read the value pointed to by $0052
; into the accumulator.
; Without a "vanilla indirect" variant of LDA,
; we'll use "indirect, Y-indexed"
; since we just set Y to 0.
; If the tile-start is $FF...
; ...then we're at the end of the tile map
; Otherwise, let $54 be our source in the tile
; Keep $55 as 00, so we can use indirect mode
; Copy the pixel from the tile data to VRAM
; Prepare to copy the next pixel
; Have we drawn all 8 pixels of this tile?
; If not, go back and do the next one.
; Otherwise, we've done this tile,
; let's move to the next tile in the tile map
; We know the tile map is all in zero-page,
; so we can get away with INC.
; Let's move to the next tile in VRAM too
; Since this is outside zero-page,
; we need to do a full 16-bit addition with carry
ADC #$08 ; 8 pixels in a tile!
ADC #$00 ; propagate the carry
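For reference, the two ADC lines above are the heart of a 16-bit addition. Spelled out in full it might look like the sketch below, assuming the VRAM pointer lives in $50 (low byte) and $51 (high byte) as described in the comments; the exact loads and stores in the original may differ.

```
        CLC             ; clear carry so a stale carry isn't added to the low byte
        LDA $50         ; low byte of the VRAM pointer
        ADC #$08        ; 8 pixels in a tile!
        STA $50
        LDA $51         ; high byte of the VRAM pointer
        ADC #$00        ; propagate the carry (adds 1 only if the low byte wrapped)
        STA $51
```

For example, if the pointer is $02F8, the low byte wraps from $F8 to $00 and sets the carry, so the high byte goes from $02 to $03 and the pointer ends up at $0300.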
As a bonus, this version demonstrates the "indirect Y-indexed" mode! It might be considered a little more wasteful, since it spends zero-page bytes ($53 and $55) on pointer high bytes that will always be 0, and because it uses "LDA ($52),Y" with Y set to 0 to emulate the missing "indirect zero-page" mode. It works, though, and the inner pixel-copying loop ("draw_tile_data_loop") is quite tidy, I think!
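To show why that inner loop comes out tidy, here is a minimal sketch of it, pieced together from the comments above rather than copied from the original listing. The setup values are assumptions for illustration only: VRAM is taken to start at $0200 and the tile data at a hypothetical zero-page address $80.

```
        ; Assumed setup (hypothetical values, not from the original post)
        LDA #$00
        STA $50         ; VRAM pointer, low byte
        LDA #$02
        STA $51         ; VRAM pointer, high byte -> $0200 (assumed)
        LDA #$80
        STA $54         ; tile pointer, low byte -> $0080 (hypothetical tile)
        LDA #$00
        STA $55         ; tile pointer, high byte stays 00 (zero page)

        LDY #$00        ; each tile starts at the first pixel
draw_tile_data_loop:
        LDA ($54),Y     ; read pixel Y from the tile data
        STA ($50),Y     ; write it to pixel Y of the VRAM destination
        INY             ; prepare to copy the next pixel
        CPY #$08        ; have we drawn all 8 pixels of this tile?
        BNE draw_tile_data_loop
```

Because Y indexes both the source and the destination, the loop only has to advance one register; neither pointer in memory changes until the whole tile is done.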