mel*_*bok 6 assembly z80 texas-instruments
I'm trying to write two bytes (color values) to the VRAM of my TI-84 Plus CE-T calculator, which uses the Zilog eZ80 CPU. The VRAM starts at 0xD40000 and is 0x25800 bytes long. The calculator has a built in syscall called MemSet, which fills a chunk of memory with one byte, but I want it to alternate between two different values and store these in memory. I tried using the following code:
#include "includes\ti84pce.inc"
.assume ADL=1
.org userMem-2
.db tExtTok,tAsm84CeCmp
call _homeup
call _ClrScrnFull
ld hl,13893632 ; = D40000, vram start
ld bc,153600 ; = 025800, count/vram length
j1:
ld (hl),31 ; set first byte
inc hl
dec bc
jr z,j2 ; jump to end if count==0
ld (hl),0 ; set second byte
inc hl
dec bc
jr z,j2 ; jump to end if count==0
jp j1 ; loop
j2:
call _GetKey
call _ClrScrnFull
ret
I want it to output 31 00 31 00 31 00... into memory starting at 0xD40000, but instead it seems to change only the first byte and jump to the end after doing so. Any ideas on how to fix this?
This does not work:
dec bc
jr z,j2
Only the 8-bit forms of dec and inc modify the flags. It can be fixed by properly testing whether bc is zero.
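As a rough sketch of one such fix, the loop can compare hl against the end address instead of counting down in bc. In ADL mode the register pairs are 24 bits wide, so the classic Z80 `ld a,b` / `or c` test would miss the upper byte of bc, whereas `sbc hl,de` sets the zero flag from the full 24-bit result. The label `fill` is made up for illustration, the end address $D65800 is simply 0xD40000 + 0x25800, and this has not been verified on hardware.

```asm
    ld hl,$D40000       ; VRAM start
    ld de,$D65800       ; one past the last VRAM byte ($D40000 + $25800)
fill:
    ld (hl),31          ; first byte of the pair
    inc hl
    ld (hl),0           ; second byte of the pair
    inc hl
    or a                ; clear carry so sbc is a plain subtract
    sbc hl,de           ; Z is set only when hl has reached de
    add hl,de           ; restore hl; add hl,rr leaves Z unchanged
    jr nz,fill          ; keep going until the end address is reached
```

Because the VRAM length is even, hl lands exactly on the end address after a whole number of pairs, so a single test per pair is enough.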
Here is another technique that doesn't use a manual loop:
ld hl,$D40000
ld (hl),31
inc hl
ld (hl),0
dec hl
ld de,$D40002
ld bc,$25800 - 2
ldir
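First, if you are going to move SP, you need to save and restore it. Second, you need to disable interrupts, otherwise you will have a race-condition bug: if an interrupt fires near the end of the copy, the stack will extend into whatever lies below it, which happens to be the VAT.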
; Index registers are actually fast on the eZ80
ld ix, 0
add ix, sp
di
; Do some hack using SP here
ld sp, ix
ei
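@Ped7g The eZ80 caches any instruction with an -IR/-DR suffix; unlike the Z80, it does not re-read the opcode from memory on every iteration. As a result, instructions such as LDIR can execute each iteration in just 2 bus cycles (one read and one write). So the SP hack is not only unnecessary, it is actually slower. SP hacks are still best left to more experienced programmers.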
The eZ80 is very well pipelined and its performance is limited by its lack of any cache and 1-byte-wide bus. The only instruction that runs slower than the bus is MLT, a 2-bus-cycle instruction that needs 5 clock cycles. For every other instruction, just count the number of bytes in the opcode, and the number of read and write cycles, and you've got its execution time. It's a huge pity that in the TI-84+CE series, TI decided to pair the fast eZ80 with an SRAM that somehow needs four clock cycles for each read and write (at 48 MHz)! Yes, TI, a world leader in semiconductor design, managed to design a slow SRAM. Getting on-die SRAM to perform poorly is an engineering feat.
@harold has the right answer, though I prefer optimizing for size instead of speed outside of inner loops.
#include "includes\ti84pce.inc"
.assume ADL=1
.org userMem-2
.db tExtTok,tAsm84CeCmp
call _homeup
call _ClrScrnFull
; Initialize registers
ld hl, vRam
ld bc, lcdWidth * lcdHeight * 2 - 2
push hl
pop de
; Write initial 2-byte value
ld (hl), 31
inc hl
ld (hl), 0
inc hl
ex de, hl
; Copy everything all at once. Interrupts may trigger while this instruction is processing.
ldir
call _GetKey
call _ClrScrnFull
ret
On EFnet, #ez80-dev is a good place to ask questions. cemetech.net is also a good place.
A variation of tum_'s answer, with a loop zero-test mechanism that is faster than dec bc.
LD SP,$D65800 ; <end of VRAM>: 0xD40000+0x25800
LD BC,$004B ; 0x4B many times (in C) the 256x inner loop (B=0)
; that results into 0x4B00 repeats of loop, which when 8 bytes per loop
; are set makes the total 0x25800 bytes (VRAM size)
; (if you would unroll it for more than 8 bytes, it will be a bit more
; tricky to calculate the initial BC to get correct amount of looping)
; (not that much tricky, just a tiny bit)
LD HL,31 ; H <- 0, L <- 31
.L1
PUSH HL ; (SP – 2) <- L, (SP – 1) <- H, SP <- SP - 2
PUSH HL ; set 8 bytes in each iteration
PUSH HL
PUSH HL
DJNZ .L1 ; loop by B value (in this example it starts as 0 => 256x loop)
DEC C ; loop by C ("outer" counter)
JR NZ,.L1 ; btw JP is faster than JR on original Z80, but not on eZ80
.END
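(BTW, I have never done any eZ80 programming, and I have not verified this in a debugger, so it is somewhat hypothetical. Actually, thinking about it, isn't push 32 bits wide on the eZ80? In that case the initial hl should be ld hl,$001F001F so that a single push sets 4 bytes, and the inner body of the loop should contain only two push hl.)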
(But I have done a ton of Z80 programming, which is why I bothered commenting on this topic at all, even though I had never seen eZ80 code before.)
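Edit: it turns out that push on the eZ80 is 24 bits wide, so the code above produces incorrect results. It can of course be fixed easily (the problem is an implementation detail, not the principle), for example: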
LD SP,$D65800 ; <end of VRAM>: 0xD40000+0x25800
LD BC,$0014 ; 0x14 many times (in C) the 256x inner loop (B=0)
; that results into 0x1400 repeats of loop, which with 30 bytes per
; loop set makes the total 0x25800 bytes (VRAM size)
LD HL,$1F001F ; will set bytes 31, 0, 31
LD DE,$001F00 ; will set bytes 0, 31, 0
.L1
PUSH DE
PUSH HL
; here SP = SP-6, and 6 bytes 31, 0, 31, 0, 31, 0 were set
PUSH DE
PUSH HL
PUSH DE
PUSH HL
PUSH DE
PUSH HL
PUSH DE
PUSH HL ; unrolled 5 times to set 30 bytes in total
DJNZ .L1 ; loop by B value (in this example it starts as 0 => 256x loop)
DEC C ; loop by C ("outer" counter)
JR NZ,.L1
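Combining this corrected push loop with the earlier advice about saving SP and disabling interrupts gives roughly the following. Treat it as an untested sketch: the label `fill` is made up, and it assumes IX is free to hold the old stack pointer while the fill runs.

```asm
    LD IX,0
    ADD IX,SP           ; IX = original stack pointer
    DI                  ; an interrupt must not push onto the borrowed stack
    LD SP,$D65800       ; end of VRAM: $D40000 + $25800
    LD BC,$0014         ; B = 0 (256 inner passes), C = $14 outer passes
    LD HL,$1F001F       ; pushes bytes 31, 0, 31 (low address first)
    LD DE,$001F00       ; pushes bytes 0, 31, 0 (low address first)
fill:
    PUSH DE
    PUSH HL             ; 6 bytes written: 31, 0, 31, 0, 31, 0
    PUSH DE
    PUSH HL
    PUSH DE
    PUSH HL
    PUSH DE
    PUSH HL
    PUSH DE
    PUSH HL             ; 30 bytes per pass
    DJNZ fill           ; 256 passes per outer step
    DEC C
    JR NZ,fill          ; $14 * 256 * 30 = $25800 bytes in total
    LD SP,IX            ; restore the real stack before anything uses it
    EI
```

The stack pointer is restored before `EI`, so the first interrupt after the fill pushes onto the real stack rather than into memory below VRAM.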