2009-08-26

Switch statements, 8032/8051 style

Now this next trick is a little bit more clever.

If, when disassembling 8032/8051 code, you ever see an lcall'ed routine that starts like this
ROM:0000B5BA pop DPH ; Data Pointer, High Byte
ROM:0000B5BC pop DPL ; Data Pointer, Low Byte
and then goes on to use DPTR in movc instructions, the DPTR value used in that routine is the address of the byte right after the lcall. That is because lcall pushes the return address on the stack (low byte first, then high byte), so the two pops load it straight into DPTR.
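To make the stack juggling concrete, here is a toy Python model of it. It is only a sketch, not an emulator: the function name is made up for illustration, and the address fed to it is the lcall from the example code further down.

```python
# Toy model of the 8051 stack around an lcall (a sketch, not an emulator).
LCALL_SIZE = 3  # one opcode byte plus a 16-bit target address


def lcall_then_pop_dptr(lcall_addr):
    """Return the DPTR value seen by a callee that starts with pop DPH; pop DPL."""
    stack = []
    ret = (lcall_addr + LCALL_SIZE) & 0xFFFF
    stack.append(ret & 0xFF)  # lcall pushes the low byte of the return address first
    stack.append(ret >> 8)    # ... then the high byte
    dph = stack.pop()         # pop DPH
    dpl = stack.pop()         # pop DPL
    assert not stack          # the return address is gone from the stack
    return (dph << 8) | dpl


# The lcall at 0x6CB3 in the example below leaves DPTR pointing at 0x6CB6.
print(hex(lcall_then_pop_dptr(0x6CB3)))
```

The stack is empty again once both pops have run, which is why a routine like this cannot return to its caller with a plain ret.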

This is used by, hum, some code, to implement switch statements that can be a bit tricky to detect during reverse engineering: the disassembler obviously expects the lcall to return, and will start disassembling the switch table as if it were code.

Example code:
ROM:00006CA7 ROM_6CA7:                               ; CODE XREF: ROM_6C60+15j
ROM:00006CA7 ; ROM_6C60+1Bj ...
ROM:00006CA7 mov DPTR, #0xC46A
ROM:00006CAA movx A, @DPTR
ROM:00006CAB xrl A, #0x60
ROM:00006CAD jnz ROM_6CF4
ROM:00006CAF mov DPTR, #0xB03E
ROM:00006CB2 movx A, @DPTR
ROM:00006CB3 lcall case_switch_byte
ROM:00006CB3 ; ---------------------------------------------------------------------------
ROM:00006CB6 .word ROM_6CE4
ROM:00006CB8 .byte 0
ROM:00006CB9 .word ROM_6CE4
ROM:00006CBB .byte 0x20
ROM:00006CBC .word ROM_6CE0
ROM:00006CBE .byte 0x2A
ROM:00006CBF .word ROM_6CDC
ROM:00006CC1 .byte 0x2B
ROM:00006CC2 .word ROM_6CDE
ROM:00006CC4 .byte 0x2D
ROM:00006CC5 .word ROM_6CE2
ROM:00006CC7 .byte 0x2F
ROM:00006CC8 .word ROM_6CDA
ROM:00006CCA .byte 0x39
ROM:00006CCB .word ROM_6CD2
ROM:00006CCD .byte 0x5A
ROM:00006CCE .word 0 ; null dest: end of the case list
ROM:00006CD0 .word 0x6CEC ; default case destination
ROM:00006CD2 ; ---------------------------------------------------------------------------
ROM:00006CD2
ROM:00006CD2 ROM_6CD2: ; DATA XREF: ROM_6C60+6Bo
ROM:00006CD2 mov DPTR, #0xB03E
ROM:00006CD5 mov A, #0x30 ; '0'
ROM:00006CD7 movx @DPTR, A
ROM:00006CD8 sjmp ROM_6D54
ROM:00006CDA ; ---------------------------------------------------------------------------


with case_switch_byte being:

ROM:B5BA
ROM:B5BA ; =============== S U B R O U T I N E =======================================
ROM:B5BA
ROM:B5BA ; Input: A = matching case value (byte)
ROM:B5BA
ROM:B5BA case_switch_byte: ; CODE XREF: ROM:4046p
ROM:B5BA ; ROM_4552+178p ...
ROM:B5BA pop DPH ; Data Pointer, High Byte
ROM:B5BC pop DPL ; DPTR = lcall return address
ROM:B5BE mov R0, A
ROM:B5BF
ROM:B5BF loop_through_cases: ; CODE XREF: case_switch_byte+24j
ROM:B5BF clr A
ROM:B5C0 movc A, @A+DPTR ; NB: @ is confusing. This reads the code byte at A+DPTR
ROM:B5C1 jnz valid_dest ; make sure dest @ != 0
ROM:B5C3 mov A, #1
ROM:B5C5 movc A, @A+DPTR
ROM:B5C6 jnz valid_dest
ROM:B5C8 inc DPTR
ROM:B5C9 inc DPTR ; if null dest, just use the next
ROM:B5C9 ; word as dest address
ROM:B5CA
ROM:B5CA dest_match: ; CODE XREF: case_switch_byte+1Fj
ROM:B5CA movc A, @A+DPTR
ROM:B5CB mov R0, A
ROM:B5CC mov A, #1
ROM:B5CE movc A, @A+DPTR
ROM:B5CF mov DPL, A ; Data Pointer, Low Byte
ROM:B5D1 mov DPH, R0 ; dest into DPTR
ROM:B5D3 clr A
ROM:B5D4 jmp @A+DPTR
ROM:B5D5 ; ---------------------------------------------------------------------------
ROM:B5D5
ROM:B5D5 valid_dest: ; CODE XREF: case_switch_byte+7j
ROM:B5D5 ; case_switch_byte+Cj
ROM:B5D5 mov A, #2
ROM:B5D7 movc A, @A+DPTR
ROM:B5D8 xrl A, R0 ; cmp val with parameter
ROM:B5D9 jz dest_match
ROM:B5DB inc DPTR
ROM:B5DC inc DPTR
ROM:B5DD inc DPTR ; skip 3 bytes to next switch table entry
ROM:B5DE sjmp loop_through_cases
ROM:B5DE ; End of function case_switch_byte
ROM:B5DE
ROM:B5E0
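
Read as data, the table that follows the lcall is a run of 3-byte entries (destination address, high byte first, then the case value), closed by a null destination and one last word that serves as the default destination. A minimal Python sketch of a decoder, assuming that layout as derived from the routine above; the function names are made up for illustration, and the ROM image is rebuilt by hand from the example listing rather than loaded from a real dump:

```python
def parse_byte_switch_table(rom, table_addr):
    """Decode the table following an lcall to case_switch_byte.

    Returns (cases, default, end_addr), where cases is a list of
    (case_value, dest) pairs in table order and end_addr is the first
    address after the table.
    """
    cases = []
    addr = table_addr
    while True:
        dest = (rom[addr] << 8) | rom[addr + 1]  # high byte first
        if dest == 0:
            # Null dest: the next word is the default destination.
            default = (rom[addr + 2] << 8) | rom[addr + 3]
            return cases, default, addr + 4
        cases.append((rom[addr + 2], dest))
        addr += 3


def dispatch(rom, table_addr, value):
    """Mimic case_switch_byte: the first matching entry wins, else the default."""
    cases, default, _ = parse_byte_switch_table(rom, table_addr)
    for case_value, dest in cases:
        if case_value == value:
            return dest
    return default


# Rebuild the example table at 0x6CB6 in an otherwise empty 64 KB image.
rom = bytearray(0x10000)
entries = [(0x6CE4, 0x00), (0x6CE4, 0x20), (0x6CE0, 0x2A), (0x6CDC, 0x2B),
           (0x6CDE, 0x2D), (0x6CE2, 0x2F), (0x6CDA, 0x39), (0x6CD2, 0x5A)]
addr = 0x6CB6
for dest, value in entries:
    rom[addr:addr + 3] = bytes([dest >> 8, dest & 0xFF, value])
    addr += 3
rom[addr:addr + 4] = bytes([0x00, 0x00, 0x6C, 0xEC])  # terminator + default

cases, default, end = parse_byte_switch_table(rom, 0x6CB6)
for value, dest in cases:
    print(f"case 0x{value:02X} -> ROM_{dest:04X}")
print(f"default   -> ROM_{default:04X}, code resumes at 0x{end:04X}")
```

The returned end address tells you where real code starts again, which is where the disassembler should be pointed once the table has been marked as data.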


The same kind of routine also exists for a word parameter instead of a byte.
Most of the time, the case values will follow some kind of logical order, so if you see a bunch of sequential bytes or words, interleaved with what look like offsets, and preceded by an lcall, you might want to check what's on the other end of that lcall.

Or, the preferred way once you have made your initial pass at identifying code: look for a function that starts by popping DPH and DPL, then follow all the lcall cross-references to it to identify the switch tables.
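
If you only have a raw dump at hand, the same search can be roughed out in Python. This is a sketch under loud assumptions: it matches raw byte patterns (D0 83 D0 82 for pop DPH; pop DPL, and 12 hh ll for lcall), so it can report false hits wherever those bytes happen to sit inside data or operands, and it ignores acall entirely. The function names are made up for illustration, and the disassembler's own cross references remain the reliable source.

```python
POP_DPH_DPL = bytes([0xD0, 0x83, 0xD0, 0x82])  # pop DPH ; pop DPL
LCALL_OPCODE = 0x12


def find_all(rom, pattern):
    """Every offset at which pattern occurs in rom."""
    hits = []
    pos = rom.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = rom.find(pattern, pos + 1)
    return hits


def find_dispatchers(rom):
    """Candidate routines that start by popping the return address into DPTR."""
    return find_all(rom, POP_DPH_DPL)


def find_switch_tables(rom, dispatcher):
    """Addresses right after each 'lcall dispatcher' byte pattern."""
    pattern = bytes([LCALL_OPCODE, dispatcher >> 8, dispatcher & 0xFF])
    return [pos + 3 for pos in find_all(rom, pattern)]


# Tiny synthetic image: the dispatcher prologue at 0xB5BA, one lcall at 0x6CB3.
rom = bytearray(0x10000)
rom[0xB5BA:0xB5BE] = POP_DPH_DPL
rom[0x6CB3:0x6CB6] = bytes([LCALL_OPCODE, 0xB5, 0xBA])

for routine in find_dispatchers(rom):
    tables = find_switch_tables(rom, routine)
    print(f"dispatcher at 0x{routine:04X}, tables at {[hex(t) for t in tables]}")
```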

Oh, and for those who might wonder: of course, as soon as you pop the return address that lcall pushed on the stack, the lcall never returns, and behaves exactly like a jump.

Coming next: How the hell are these bloody strings and other data sitting in standalone data sections referenced, when there does not appear to be any obvious reference to their addresses anywhere in the disassembly...
