Add MOVZ+MOVK address symbolization for ARM64 (non-PIE) #91

Open
boppitybop wants to merge 2 commits into GrammaTech:main from boppitybop:fix/arm64-movz-movk-symbolization

boppitybop commented Feb 18, 2026

Add MOVZ+MOVK address symbolization for ARM64 (non-PIE)

Fixes #90

ARM64 non-PIE executables construct absolute addresses using
MOVZ+MOVK chains (2-4 instructions, each carrying a 16-bit slice
with LSL #0/#16/#32/#48).  ddisasm had no rules for this pattern,
causing the instructions to retain raw immediates.  The reassembled
binary then contained wrong addresses.
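
A minimal sketch of how such a chain yields an address, assuming each instruction is modelled as a 16-bit slice plus its LSL amount. The names `MovSlice` and `reconstructValue` are hypothetical and are not taken from ddisasm; the address used in the usage note is likewise only an example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One MOVZ/MOVK instruction: a 16-bit slice and its LSL amount (0, 16, 32 or 48).
struct MovSlice
{
    uint16_t Imm16;
    unsigned Shift;
};

// Fold a chain into the value left in the register. Starting from zero models
// the leading MOVZ; every later entry behaves like MOVK, overwriting one
// 16-bit slice and keeping the others.
uint64_t reconstructValue(const std::vector<MovSlice>& Chain)
{
    uint64_t Value = 0;
    for(const MovSlice& S : Chain)
    {
        assert(S.Shift % 16 == 0 && S.Shift <= 48);
        Value &= ~(uint64_t{0xFFFF} << S.Shift); // clear the target slice
        Value |= uint64_t{S.Imm16} << S.Shift;   // insert the new slice
    }
    return Value;
}
```

For instance, `reconstructValue({{0x0120, 0}, {0x0041, 16}})` gives `0x410120`, the value a two-instruction chain `movz x0, #0x120` / `movk x0, #0x41, lsl #16` would leave in `x0`.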

Changes:
- Arm64Loader: record op_shifted facts for IMM operands with LSL
- arm64_symbolization.dl: add movz_movk_chain rules (2/3/4 deep),
  emit symbolic_operand_candidate with G0-G3 attributes for EXEC
  binaries when the reconstructed value matches code or data_segment
- Disassembler: add G2/G3 to AttributeMap
- Add ex_movz_movk regression test

Collaborator

junghee commented Feb 19, 2026

@boppitybop Thank you for your contribution!

Before we can move forward with this MR, please sign our Contributor License Agreement (CLA), which grants GrammaTech a license to the work, as described in https://github.com/GrammaTech/ddisasm/blob/main/CONTRIBUTING.md#contributor-license-agreement

CLA form: https://github.com/GrammaTech/ddisasm/blob/main/GrammaTech-CLA-ddisasm.pdf
Email: CLA@GrammaTech.com

Please let me know once it's done or if you have any questions.

Thank you!

Collaborator

junghee commented Feb 23, 2026

Hi @boppitybop,

Could you let me know how you are reassembling the generated assembly?

I tried binary-printing using gtirb-pprinter with --use-gcc=aarch64-linux-gnu-gcc, but I'm seeing a number of errors, including:

/tmp/fileJrYM8Z.s:132487: Error: selected processor does not support `bti'
/tmp/fileJrYM8Z.s:132492: Error: selected processor does not support `swpl w0,w0,[x1]'
/tmp/fileJrYM8Z.s:133976: Error: unexpected characters following instruction at operand 3 -- `cmge d0,d0,#0,#0'
/tmp/fileJrYM8Z.s:135177: Error: selected processor does not support `cntd x0'
...

After changing:

.arch armv8-a

to

.arch armv8.5-a+sve

the "selected processor does not support" errors went away.
However, I'm still encountering other issues, for example:

invalid addressing mode at operand 2 -- `ld1 {v1.16b,v2.16b},[x2,# 32]!'

This is happening after applying your corresponding change in gtirb-pprinter.

Could you share the exact toolchain version and flags you're using to reassemble successfully?

Author

boppitybop commented Mar 3, 2026

Thanks for the thorough testing, @junghee. It surfaced a bug in the initial implementation.

I realised Capstone aliases MOVZ as MOV when imm16 ≠ 0 (ARM preferred disassembly), reporting ARM64_INS_MOV and folding the shift into the immediate.
The Datalog rules only matched MOVZ, so _start's chains worked by coincidence (imm16=0, hw>0 → capstone keeps movz), but main's MOVZ was silently missed.

Arm64Loader.cpp now restores the canonical MOVZ mnemonic and un-folds the immediate and shift so that the Datalog rules can match.
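
As an illustration only (not necessarily how the Arm64Loader.cpp change does it), the un-folded fields can be read straight from the raw A64 instruction word, where the move-wide-immediate class keeps `hw` in bits 22:21 and `imm16` in bits 20:5. The names `MovWide`, `MovWideKind` and `decodeMovWide` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class MovWideKind { Movn, Movz, Movk };

struct MovWide
{
    MovWideKind Kind;
    uint16_t Imm16; // bits 20:5
    unsigned Shift; // hw * 16
    unsigned Rd;    // bits 4:0
};

// Decode MOVN/MOVZ/MOVK fields from a raw instruction word.
// Returns false if the word is not a valid move-wide-immediate encoding.
bool decodeMovWide(uint32_t Insn, MovWide& Out)
{
    // Bits 28:23 == 0b100101 identify the move-wide-immediate class.
    if(((Insn >> 23) & 0x3F) != 0x25)
        return false;
    const unsigned Sf = Insn >> 31;          // 1 = 64-bit register
    const unsigned Opc = (Insn >> 29) & 0x3; // 00 MOVN, 10 MOVZ, 11 MOVK
    const unsigned Hw = (Insn >> 21) & 0x3;  // shift amount / 16
    // opc == 01 is unallocated; 32-bit forms only allow hw of 0 or 1.
    if(Opc == 1 || (Sf == 0 && Hw > 1))
        return false;
    Out.Kind = Opc == 0 ? MovWideKind::Movn
                        : Opc == 2 ? MovWideKind::Movz : MovWideKind::Movk;
    Out.Imm16 = static_cast<uint16_t>((Insn >> 5) & 0xFFFF);
    Out.Shift = Hw * 16;
    Out.Rd = Insn & 0x1F;
    assert(Out.Shift % 16 == 0 && Out.Shift <= 48);
    return true;
}
```

The word `0xD2A24680` decodes to MOVZ with `Imm16 = 0x1234`, `Shift = 16`, `Rd = 0`. Under the preferred-disassembly rule described above, that same word would be printed as `mov x0, #0x12340000`, which is the folded form the Datalog rules did not match.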

For the arch, I guess ideally the pprinter would detect which extensions the binary uses and emit the appropriate .arch/.arch_extension directives.

I think errors such as "invalid addressing mode" aren't from this PR; they reproduce identically on stock ddisasm v1.9.2 + gtirb-pprinter v2.2.2 (grammatech/ddisasm:latest).

I couldn't get static binary reassembly to work on ARM64 due to these pprinter issues:

  • prfm [x1,#640] (missing prefetch type operand)
  • mov x5,v17 (should use umov for GPR vector transfers)
  • ld1 addressing mode (wrong syntax for post-index form)
  • adrp x0,#4640768 (ADRP can't take raw immediates)

For validating the MOVZ+MOVK symbolization itself, I used a dynamically-linked build of the test file (examples/arm64_asm_examples/ex_movz_movk/src.s) which kept glibc out of the binary.

aarch64-linux-gnu-gcc -no-pie src.s -o ex
ddisasm ex --ir ex.gtirb
gtirb-pprinter ex.gtirb --asm ex.s
aarch64-linux-gnu-gcc -no-pie -nostartfiles -nostdlib ex.s -o ex_rebuilt -lc
qemu-aarch64 -L /usr/aarch64-linux-gnu ./ex_rebuilt
# Output: "Hello from MOVZ+MOVK!"

Toolchain: aarch64-linux-gnu-gcc 9.4.0, binutils 2.34 (Ubuntu 20.04).

Collaborator

junghee left a comment

@boppitybop Thank you for the contribution!
I've added some inline review comments -- please take a look and address them when you get a chance so we can move this forward.

Comment on lines +339 to +353
// MOVZ/MOVK with an explicit lsl #N (N > 0).
movz_movk_insn(EA, Reg, Imm band 65535, Shift, Operation):-
instruction(EA,_,_,Operation,ImmOp,RegOp,0,0,_,_),
(Operation = "MOVZ" ; Operation = "MOVK"),
op_immediate(ImmOp, Imm, _),
op_regdirect_contains_reg(RegOp, Reg),
op_shifted(EA, 1, Shift, "LSL").

// MOVZ/MOVK without an explicit shift (lsl #0).
movz_movk_insn(EA, Reg, Imm band 65535, 0, Operation):-
instruction(EA,_,_,Operation,ImmOp,RegOp,0,0,_,_),
(Operation = "MOVZ" ; Operation = "MOVK"),
op_immediate(ImmOp, Imm, _),
op_regdirect_contains_reg(RegOp, Reg),
!op_shifted(EA, 1, _, _).

Suggested change
// MOVZ/MOVK with an explicit lsl #N (N > 0).
movz_movk_insn(EA, Reg, Imm band 65535, Shift, Operation):-
instruction(EA,_,_,Operation,ImmOp,RegOp,0,0,_,_),
(Operation = "MOVZ" ; Operation = "MOVK"),
op_immediate(ImmOp, Imm, _),
op_regdirect_contains_reg(RegOp, Reg),
op_shifted(EA, 1, Shift, "LSL").
// MOVZ/MOVK without an explicit shift (lsl #0).
movz_movk_insn(EA, Reg, Imm band 65535, 0, Operation):-
instruction(EA,_,_,Operation,ImmOp,RegOp,0,0,_,_),
(Operation = "MOVZ" ; Operation = "MOVK"),
op_immediate(ImmOp, Imm, _),
op_regdirect_contains_reg(RegOp, Reg),
!op_shifted(EA, 1, _, _).
movz_movk_insn(EA, Reg, Imm band 65535, Shift, Operation):-
instruction_get_operation(EA, Operation),
(Operation = "MOVZ" ; Operation = "MOVK"),
arch.move_reg_imm(EA,Reg,Imm,_),
(
// MOVZ/MOVK with an explicit lsl #N (N > 0).
op_shifted(EA, 1, Shift, "LSL")
;
// MOVZ/MOVK without an explicit shift (lsl #0).
!op_shifted(EA, 1, _, _),
Shift = 0
),
Shift % 16 = 0.

Since the two rules are similar, we can merge them into the one above.
Also, I think it would be good to check that Shift % 16 = 0.
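
A small sketch of what the `Shift % 16 = 0` check protects, assuming the G0 to G3 attributes from the PR description correspond to LSL #0/#16/#32/#48 in that order. The helper names `shiftToGroup` and `groupAttribute` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns 0..3 for LSL #0/#16/#32/#48, or -1 for any other shift amount.
int shiftToGroup(unsigned Shift)
{
    if(Shift % 16 != 0 || Shift > 48)
        return -1;
    return static_cast<int>(Shift / 16);
}

// Attribute name for a valid shift, e.g. 16 -> "G1".
std::string groupAttribute(unsigned Shift)
{
    const int Group = shiftToGroup(Shift);
    assert(Group >= 0 && "shift must be 0, 16, 32 or 48");
    return "G" + std::to_string(Group);
}
```

With this mapping a shift such as 8 has no group at all, so rejecting it early keeps a malformed fact from reaching the group lookup.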

const cs_arm64_op& CsOp = Details.operands[i];
// For aliased MOVZ, we fix up the immediate operand to recover the
// original 16-bit value and shift that capstone folded away.
cs_arm64_op CsOp = Details.operands[i];

Let's move this line three lines up, after the comment // Load capstone operand.

all: out.txt
check: out.txt
ex: src.s
$(CC) -no-pie $^ -o $@ -static

Suggested change
$(CC) -no-pie $^ -o $@ -static
$(CC) -no-pie $^ -o $@

I tested it locally and found it only works without -static. Without it, the command also matches the one in your comment:

aarch64-linux-gnu-gcc -no-pie src.s -o ex

@@ -0,0 +1,38 @@
// Test: MOVZ+MOVK address construction in non-PIE ARM64 binaries.
//
// This exercises the movz_movk_pair Datalog rules. In a non-PIE executable

Suggested change
// This exercises the movz_movk_pair Datalog rules. In a non-PIE executable
// This exercises the movz_movk_chain Datalog rules. In a non-PIE executable

//
// This exercises the movz_movk_pair Datalog rules. In a non-PIE executable
// the linker fills absolute addresses into the MOVZ/MOVK immediates, so
// ddisasm must recognise the pair as constructing an address and emit

Suggested change
// ddisasm must recognise the pair as constructing an address and emit
// ddisasm must recognize the pair as constructing an address and emit

// ddisasm must recognise the pair as constructing an address and emit
// symbolic operand candidates with G0/G1 attributes.
//
// Expected behaviour after the fix:

Suggested change
// Expected behaviour after the fix:
// Expected behavior:

Comment on lines +404 to +417
movz_movk_complete(EA_first, EA_last, Value):-
movz_movk_chain(EA_first, EA_last, _, Value, _),
// This chain cannot be extended by another MOVK.
(
!next(EA_last, _)
;
next(EA_last, EA_after),
!movz_movk_insn(EA_after, _, _, _, "MOVK")
;
next(EA_last, EA_after),
movz_movk_insn(EA_after, Reg2, _, _, "MOVK"),
movz_movk_chain(EA_first, EA_last, Reg, _, _),
Reg2 != Reg
).

Suggested change
movz_movk_complete(EA_first, EA_last, Value):-
movz_movk_chain(EA_first, EA_last, _, Value, _),
// This chain cannot be extended by another MOVK.
(
!next(EA_last, _)
;
next(EA_last, EA_after),
!movz_movk_insn(EA_after, _, _, _, "MOVK")
;
next(EA_last, EA_after),
movz_movk_insn(EA_after, Reg2, _, _, "MOVK"),
movz_movk_chain(EA_first, EA_last, Reg, _, _),
Reg2 != Reg
).
movz_movk_complete(EA_first, EA_last, Value):-
movz_movk_chain(EA_first, EA_last, Reg, Value, _),
// This chain cannot be extended by another MOVK.
(
!next(EA_last, _), UNUSED(Reg)
;
next(EA_last, EA_after), UNUSED(Reg),
!movz_movk_insn(EA_after, _, _, _, "MOVK")
;
next(EA_last, EA_after),
movz_movk_insn(EA_after, Reg2, _, _, "MOVK"),
Reg2 != Reg
).

movz_movk_insn(EA, _, _, Shift, _),
movz_movk_shift_group(Shift, Group).

// Suppress unpaired MOVZ/MOVK from being independently symbolized.

This predicate unlikely_have_symbolic_immediate is used in code inference (block_heuristic) checking whether the immediate could be a possible target; it is not used in symbolization.
So, I think the following comment would be sufficient:

// Immediate in movz or movk may not be symbolic unless the instruction is part of movz_movk_chain.

I think it would also be good to add the following rule for potentially better code inference:

may_have_symbolic_immediate(EA,as(Value,address)):-
    movz_movk_complete(EA_first,EA_last,Value),
    movz_movk_member(EA,EA_first,EA_last).

Comment on lines +98 to +106
if(IsAliasedMovz && CsOp.type == ARM64_OP_IMM)
{
CsOp.imm = AliasedImm16;
if(AliasedShift > 0)
{
CsOp.shift.type = ARM64_SFT_LSL;
CsOp.shift.value = AliasedShift;
}
}

Suggested change
if(IsAliasedMovz && CsOp.type == ARM64_OP_IMM)
{
CsOp.imm = AliasedImm16;
if(AliasedShift > 0)
{
CsOp.shift.type = ARM64_SFT_LSL;
CsOp.shift.value = AliasedShift;
}
}
if(IsAliasedMovz && CsOp.type == ARM64_OP_IMM && AliasedShift > 0)
{
CsOp.imm = AliasedImm16;
CsOp.shift.type = ARM64_SFT_LSL;
CsOp.shift.value = AliasedShift;
}

I think it can be simplified as above since CsOp.imm is unchanged when shift.value is 0.
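
A sketch of why the two forms agree, under the assumption stated in the comment above: when the shift is 0, the folded immediate already equals the 16-bit value. `Operand` is a hypothetical stand-in for the `cs_arm64_op` fields involved, and the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the cs_arm64_op fields the fix-up touches.
struct Operand
{
    int64_t Imm;
    bool HasLsl;
    unsigned ShiftValue;
};

// Shape of the original fix-up: always rewrite the immediate.
Operand fixupOriginal(Operand Op, uint16_t Imm16, unsigned Shift)
{
    Op.Imm = Imm16;
    if(Shift > 0)
    {
        Op.HasLsl = true;
        Op.ShiftValue = Shift;
    }
    return Op;
}

// Shape of the suggested fix-up: only touch the operand when Shift > 0.
Operand fixupSimplified(Operand Op, uint16_t Imm16, unsigned Shift)
{
    if(Shift > 0)
    {
        Op.Imm = Imm16;
        Op.HasLsl = true;
        Op.ShiftValue = Shift;
    }
    return Op;
}

// Build the folded operand (imm16 << shift, no shift recorded) and check
// that both fix-ups produce the same result.
void checkEquivalent(uint16_t Imm16, unsigned Shift)
{
    const Operand Folded{static_cast<int64_t>(uint64_t{Imm16} << Shift), false, 0};
    const Operand A = fixupOriginal(Folded, Imm16, Shift);
    const Operand B = fixupSimplified(Folded, Imm16, Shift);
    assert(A.Imm == B.Imm && A.HasLsl == B.HasLsl && A.ShiftValue == B.ShiftValue);
}
```

Calling `checkEquivalent(0x1234, 0)` and `checkEquivalent(0x1234, 16)` exercises both branches. The equivalence would not hold if the folded immediate could differ from `Imm16` at shift 0.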

Development

Successfully merging this pull request may close these issues.

ARM64: MOVZ+MOVK address construction not symbolized in non-PIE binaries

2 participants