From 8a5d1f0f83c2967d7ab7f02b1e7d7a0506881882 Mon Sep 17 00:00:00 2001 From: Christian Herber Date: Wed, 11 Dec 2024 13:26:11 +0100 Subject: [PATCH 1/6] Integrating load/store pair for RV32 with the main manual --- src/riscv-unprivileged.adoc | 1 + src/zfinx.adoc | 9 +- src/zilsd.adoc | 294 ++++++++++++++++++++++++++++++++++++ 3 files changed, 300 insertions(+), 4 deletions(-) create mode 100644 src/zilsd.adoc diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 83a9a4531..40fa3592c 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -190,6 +190,7 @@ include::v-st-ext.adoc[] include::scalar-crypto.adoc[] include::vector-crypto.adoc[] include::unpriv-cfi.adoc[] +include::zilsd.adoc[] include::rv-32-64g.adoc[] include::extending.adoc[] include::naming.adoc[] diff --git a/src/zfinx.adoc b/src/zfinx.adoc index aae57fe0f..384b5c6f9 100644 --- a/src/zfinx.adoc +++ b/src/zfinx.adoc @@ -97,10 +97,11 @@ operand is zero—i.e., `x1` is not accessed. [NOTE] ==== -Load-pair and store-pair instructions are not provided, so transferring -double-precision operands in RV32Zdinx from or to memory requires two -loads or stores. Register moves need only a single FSGNJ.D instruction, -however. +Load-pair and store-pair instructions are contained in a seaparate extension +(see Section <>). +In case this is not available, transferring double-precision operands in +RV32Zdinx from or to memory requires two loads or stores. Register moves need +only a single FSGNJ.D instruction, however. ==== === Zhinx diff --git a/src/zilsd.adoc b/src/zilsd.adoc new file mode 100644 index 000000000..9baf25194 --- /dev/null +++ b/src/zilsd.adoc @@ -0,0 +1,294 @@ +[[sec:zilsd]] +== "Zilsd", "Zclsd" Extensions for Load/Store pair for RV32, Version 1.0 + +The Zilsd & Zclsd extensions provide load/store pair instructions for RV32, reusing the existing RV64 doubleword load/store instruction encodings. 
+ +Operands containing `src` for store instructions and `dest` for load instructions are held in aligned `x`-register pairs, i.e., register numbers must be even. Use of misaligned (odd-numbered) registers for these operands is _reserved_. + +Regardless of endianness, the lower-numbered register holds the +low-order bits, and the higher-numbered register holds the high-order +bits: e.g., bits 31:0 of an operand in Zilsd might be held in register `x14`, with bits 63:32 of that operand held in `x15`. + +[[zilsd, Zilsd]] +=== Load/Store pair instructions (Zilsd) + +The Zilsd extension adds the following RV32-only instructions: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|no +|ld rd, offset(rs1) +|<<#insns-ld>> + +|yes +|no +|sd rs2, offset(rs1) +|<<#insns-sd>> + +|=== + +As the access size is 64-bit, accesses are only considered naturally aligned for effective addresses that are a multiple of 8. +In this case, these instruction are guaranteed to not raise an address-misaligned exception. +Even if naturally aligned, the memory access might not be performed atomically. + +If the effective address is a multiple of 4, then each word access is required to be performed atomically. + +To ensure fault handling is possible for the load instructions, it must be ensured that the register which is the source of the base address is not overwritten before the entire operation is complete. +This affects x2 for the stack pointer relative instruction and rs1 otherwise. +To guarantee this, if one of the destination registers of the pair is the source register containing the base, it must not be written to before the other register in the pair has been written. + +[NOTE] +==== +If an implementation performs a doubleword load access atomically and the register file implements writeback for even/odd register pairs, +the mentioned atomicity requirements are inherently fulfilled. 
+Otherwise, an implementation either needs to delay the writeback until the write can be performed atomically, +or order sequential writes to the registers to ensure the requirement above is satisfied. +==== + +[[zclsd, Zclsd]] +=== Compressed Load/Store pair instructions (Zclsd) + +Zclsd depends on Zilsd and Zca. It has overlapping encodings with Zcf and is thus incompatible with Zcf. + +Zclsd adds the following RV32-only instructions: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|no +|c.ldsp rd, offset(sp) +|<<#insns-cldsp>> + +|yes +|no +|c.sdsp rs2, offset(sp) +|<<#insns-csdsp>> + +|yes +|no +|c.ld rd', offset(rs1') +|<<#insns-cld>> + +|yes +|no +|c.sd rs2', offset(rs1') +|<<#insns-csd>> + +|=== + +=== Use of x0 as operand + +LD instructions with destination `x0` are processed as any other load, but the result is discarded entirely. Specifically, a load pair to `x0` does not cause `x1` to be written. For C.LDSP, usage of `x0` as the destination is reserved. + +When using `x0` as `src` of SD or C.SDSP, the entire 64-bit operand is zero — i.e., register `x1` is not accessed. + +=== Exception Handling + +For the purposes of RVWMO and exception handling, LD and SD instructions are +considered to be misaligned loads and stores, with one additional constraint: +an LD or SD instruction whose effective address is a multiple of 4 gives rise +to two 4-byte memory operations. + +NOTE: This definition permits LD and SD instructions giving rise to exactly one +memory access, regardless of alignment. +If instructions with 4-byte-aligned effective address are decomposed +into two 32b operations, there is no constraint on the order in which the +operations are performed and each operation is guaranteed to be atomic. +These decomposed sequences are interruptible. +Exceptions might occur on subsequent operations, making the effects of previous +operations within the same instruction visible. 
+ +NOTE: Software should make no assumptions about the number or order of +accesses these instructions might give rise to, beyond the 4-byte constraint +mentioned above. +For example, an interrupted store might overwrite the same bytes upon return +from the interrupt handler. + +<<< + +=== Instructions +[#insns-ld,reftext="Load doubleword to register pair, 32-bit encoding"] +==== ld + +Synopsis:: +Load doubleword to even/odd register pair, 32-bit encoding + +Mnemonic:: +ld rd, offset(rs1) + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 0x3, attr: ['LOAD'], type: 8}, + {bits: 5, name: 'rd', attr: ['dest, dest[0]=0'], type: 2}, + {bits: 3, name: 0x3, attr: ['width=D'], type: 8}, + {bits: 5, name: 'rs1', attr: ['base'], type: 4}, + {bits: 12, name: 'imm[11:0]', attr: ['offset[11:0]'], type: 3}, +]} +.... + +Description:: +Loads a 64-bit value into registers `rd` and `rd+1`. +The effective address is obtained by adding register rs1 to the +sign-extended 12-bit offset. + +Included in: <> + +<<< + +[#insns-sd,reftext="Store doubleword from register pair, 32-bit encoding"] +==== sd + +Synopsis:: +Store doubleword from even/odd register pair, 32-bit encoding + +Mnemonic:: +sd rs2, offset(rs1) + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 0x23, attr: ['STORE'], type: 8}, + {bits: 5, name: 'imm[4:0]', attr: ['offset[4:0]'], type: 3}, + {bits: 3, name: 0x3, attr: ['width=D'], type: 8}, + {bits: 5, name: 'rs1', attr: ['base'], type: 4}, + {bits: 5, name: 'rs2', attr: ['src, src[0]=0'], type: 4}, + {bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'], type: 3}, +]} +.... + +Description:: +Stores a 64-bit value from registers `rs2` and `rs2+1`. +The effective address is obtained by adding register rs1 to the +sign-extended 12-bit offset. 
+ +Included in: <> + +<<< + +[#insns-cldsp,reftext="Stack-pointer based load doubleword to register pair, 16-bit encoding"] +==== c.ldsp + +Synopsis:: +Stack-pointer based load doubleword to even/odd register pair, 16-bit encoding + +Mnemonic:: +c.ldsp rd, offset(sp) + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 0x2, type: 8, attr: ['C2']}, + {bits: 5, name: 'imm', type: 3, attr: ['offset[4:3|8:6]']}, + {bits: 5, name: 'rd', type: 2, attr: ['dest≠0, dest[0]=0']}, + {bits: 1, name: 'imm', type: 3, attr: ['offset[5]']}, + {bits: 3, name: 0x3, type: 8, attr: ['C.LDSP']}, +], config: {bits: 16}} +.... + +Description:: +Loads a stack-pointer relative 64-bit value into registers `rd` and `rd+1`. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, `x2`. It expands to `ld rd, offset(x2)`. C.LDSP is only valid when _rd_≠x0; the code points with _rd_=x0 are reserved. + +Included in: <> + +<<< + +[#insns-csdsp,reftext="Stack-pointer based store doubleword from register pair, 16-bit encoding"] +==== c.sdsp + +Synopsis:: +Stack-pointer based store doubleword from even/odd register pair, 16-bit encoding + +Mnemonic:: +c.sdsp rs2, offset(sp) + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 0x2, type: 8, attr: ['C2']}, + {bits: 5, name: 'rs2', type: 4, attr: ['src, src[0]=0']}, + {bits: 6, name: 'imm', type: 3, attr: ['offset[5:3|8:6]']}, + {bits: 3, name: 0x7, type: 8, attr: ['C.SDSP']}, +], config: {bits: 16}} +.... + +Description:: +Stores a stack-pointer relative 64-bit value from registers `rs2` and `rs2+1`. It computes an effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, `x2`. It expands to `sd rs2, offset(x2)`. 
+ +Included in: <> + +<<< + +[#insns-cld,reftext="Load doubleword to register pair, 16-bit encoding"] +==== c.ld + +Synopsis:: +Load doubleword to even/odd register pair, 16-bit encoding + +Mnemonic:: +c.ld rd', offset(rs1') + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 0x0, type: 8, attr: ['C0']}, + {bits: 3, name: 'rd`', type: 2, attr: ['dest, dest[0]=0']}, + {bits: 2, name: 'imm', type: 3, attr: ['offset[7:6]']}, + {bits: 3, name: 'rs1`', type: 4, attr: ['base']}, + {bits: 3, name: 'imm', type: 3, attr: ['offset[5:3]']}, + {bits: 3, name: 0x3, type: 8, attr: ['C.LD']}, +], config: {bits: 16}} +.... + +Description:: +Loads a 64-bit value into registers `rd'` and `rd'+1`. +It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. + +Included in: <> + +<<< + +[#insns-csd,reftext="Store doubleword from register pair, 16-bit encoding"] +==== c.sd + +Synopsis:: +Store doubleword from even/odd register pair, 16-bit encoding + +Mnemonic:: +c.sd rs2', offset(rs1') + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 0x0, type: 8, attr: ['C0']}, + {bits: 3, name: 'rs2`', type: 4, attr: ['src, src[0]=0']}, + {bits: 2, name: 'imm', type: 3, attr: ['offset[7:6]']}, + {bits: 3, name: 'rs1`', type: 4, attr: ['base']}, + {bits: 3, name: 'imm', type: 3, attr: ['offset[5:3]']}, + {bits: 3, name: 0x7, type: 8, attr: ['C.SD']}, +], config: {bits: 16}} +.... + +Description:: +Stores a 64-bit value from registers `rs2'` and `rs2'+1`. +It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. +It expands to `sd rs2', offset(rs1')`. 
+ +Included in: <> From dad219567a74e8a7b02cac3f21e4556722aca7e2 Mon Sep 17 00:00:00 2001 From: Christian Herber Date: Wed, 11 Dec 2024 15:35:05 +0100 Subject: [PATCH 2/6] Fixed typo --- src/zfinx.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zfinx.adoc b/src/zfinx.adoc index 384b5c6f9..fc5f8edf5 100644 --- a/src/zfinx.adoc +++ b/src/zfinx.adoc @@ -97,7 +97,7 @@ operand is zero—i.e., `x1` is not accessed. [NOTE] ==== -Load-pair and store-pair instructions are contained in a seaparate extension +Load-pair and store-pair instructions are contained in a separate extension (see Section <>). In case this is not available, transferring double-precision operands in RV32Zdinx from or to memory requires two loads or stores. Register moves need From d689eb817bf326bce6431ba60b848987f9a626b4 Mon Sep 17 00:00:00 2001 From: Christian Herber Date: Wed, 11 Dec 2024 15:57:10 +0100 Subject: [PATCH 3/6] Add myself as contributor --- src/riscv-unprivileged.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 40fa3592c..1ca2c6dce 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -87,6 +87,7 @@ Jan Gray, Gianluca Guida, Michael Hamburg, John Hauser, +Christian Herber, John Ingalls, David Horner, Bruce Hoult, From 24f5d8d230fdfa5c3d0bf0489b17c85f256664b5 Mon Sep 17 00:00:00 2001 From: Christian Herber Date: Wed, 11 Dec 2024 15:57:32 +0100 Subject: [PATCH 4/6] Add Zilsd/Zclsd extensions to preface --- src/colophon.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/colophon.adoc b/src/colophon.adoc index 42820d7e6..073007fb3 100644 --- a/src/colophon.adoc +++ b/src/colophon.adoc @@ -29,6 +29,7 @@ h|Extension h|Version h|Status |*Zihintpause* |*2.0* |*Ratified* |*Zimop* | *1.0* | *Ratified* |*Zicond* | *1.0* |*Ratified* +|*Zilsd* | *1.0* |*Ratified* |*M* |*2.0* |*Ratified* |*Zmmul* |*1.0* |*Ratified* |*A* |*2.1* |*Ratified* @@ -50,6 +51,7 @@ 
h|Extension h|Version h|Status |*Zhinxmin* |*1.0* |*Ratified* |*C* |*2.0* |*Ratified* |*Zce* |*1.0* |*Ratified* +|*Zclsd* |*1.0* |*Ratified* |*B* |*1.0* |*Ratified* |_P_ |_0.2_ |_Draft_ |*V* |*1.0* |*Ratified* @@ -72,7 +74,7 @@ h|Extension h|Version h|Status The changes in this version of the document include: -* The inclusion of all ratified extensions through March 2024. +* The inclusion of all ratified extensions through January 2025. * The draft Zam extension has been removed, in favor of the definition of a misaligned atomicity granule PMA. * The concept of vacant memory regions has been superseded by inaccessible memory or I/O regions. From 8b5fe0bb6dcc3a9f76aca31eb68067fd522a4640 Mon Sep 17 00:00:00 2001 From: Christian Herber Date: Mon, 16 Dec 2024 09:55:44 +0100 Subject: [PATCH 5/6] Fixed typo --- src/zilsd.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zilsd.adoc b/src/zilsd.adoc index 9baf25194..17807811e 100644 --- a/src/zilsd.adoc +++ b/src/zilsd.adoc @@ -34,7 +34,7 @@ The Zilsd extension adds the following RV32-only instructions: |=== As the access size is 64-bit, accesses are only considered naturally aligned for effective addresses that are a multiple of 8. -In this case, these instruction are guaranteed to not raise an address-misaligned exception. +In this case, these instructions are guaranteed to not raise an address-misaligned exception. Even if naturally aligned, the memory access might not be performed atomically. If the effective address is a multiple of 4, then each word access is required to be performed atomically. 
From ac90e31351cff686453f6e1fb22256e0c66d067e Mon Sep 17 00:00:00 2001 From: Christian Herber Date: Wed, 8 Jan 2025 14:54:33 +0100 Subject: [PATCH 6/6] Updating with latest textual changes --- docs-resources | 2 +- src/zilsd.adoc | 18 ++++++++++++++---- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/docs-resources b/docs-resources index a76dd1d39..36fc02378 160000 --- a/docs-resources +++ b/docs-resources @@ -1 +1 @@ -Subproject commit a76dd1d390cba28e3b1ce86313af03ce9b69399d +Subproject commit 36fc02378d2d2426f3870cc3c330a1db0bee7554 diff --git a/src/zilsd.adoc b/src/zilsd.adoc index 17807811e..12c980f35 100644 --- a/src/zilsd.adoc +++ b/src/zilsd.adoc @@ -39,9 +39,18 @@ Even if naturally aligned, the memory access might not be performed atomically. If the effective address is a multiple of 4, then each word access is required to be performed atomically. -To ensure fault handling is possible for the load instructions, it must be ensured that the register which is the source of the base address is not overwritten before the entire operation is complete. +The following table summarizes the required behavior: + +[%header] +|=== +|Effective address alignment |Word accesses guaranteed atomic? |Can cause misaligned trap? +|Multiple of 8 bytes |yes |no +|Multiple of 4 bytes but not 8 |yes |yes +|Otherwise |no |yes +|=== + +To ensure resumable trap handling is possible for the load instructions, the base register must still hold its original value if a trap is taken. The other register in the pair may already have been updated. This affects x2 for the stack pointer relative instruction and rs1 otherwise. -To guarantee this, if one of the destination registers of the pair is the source register containing the base, it must not be written to before the other register in the pair has been written. [NOTE] ==== @@ -89,9 +98,10 @@ Zclsd adds the following RV32-only instructions: === Use of x0 as operand -LD instructions with destination `x0` are processed as any other load, but the result is discarded entirely. 
Specifically, a load pair to `x0` does not cause `x1` to be written. For C.LDSP, usage of `x0` as the destination is reserved. +LD and C.LD instructions with destination `x0` are processed as any other load, but the result is discarded entirely and `x1` is not written. +For C.LDSP, usage of `x0` as the destination is reserved. -When using `x0` as `src` of SD or C.SDSP, the entire 64-bit operand is zero — i.e., register `x1` is not accessed. +When using `x0` as `src` of SD, C.SD, or C.SDSP, the entire 64-bit operand is zero; that is, register `x1` is not accessed. === Exception Handling
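Reviewer note: the register-pair rules touched by this series (even-numbered pairs only, low-order bits in the lower-numbered register, the `x0` special cases, and the alignment table from patch 6) can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not part of the specification text: the names `read_pair`, `write_pair`, and `classify_alignment` are hypothetical, the register file is modeled as a plain Python list of 32 integers, and reserved encodings are modeled as a `ValueError`. Writing the base register last is one possible way to keep it intact until the load completes; the patch text only requires that the base register holds its original value if a trap is taken.

```python
# Illustrative model only; all names here are hypothetical, not spec-defined.
MASK32 = 0xFFFFFFFF


def check_pair(reg):
    """Odd-numbered registers for pair operands are reserved (modeled as an error)."""
    if reg % 2 != 0:
        raise ValueError("odd-numbered register pair is reserved")


def read_pair(x, rs2):
    """Return the 64-bit store operand held in x[rs2] (low) and x[rs2 + 1] (high)."""
    check_pair(rs2)
    if rs2 == 0:
        return 0  # x0 as src: the whole operand is zero, x1 is not accessed
    return ((x[rs2 + 1] & MASK32) << 32) | (x[rs2] & MASK32)


def write_pair(x, rd, value, rs1=None):
    """Write a loaded 64-bit value to x[rd] (low) and x[rd + 1] (high)."""
    check_pair(rd)
    if rd == 0:
        return  # x0 as dest: result discarded, x1 is not written
    writes = [(rd, value & MASK32), (rd + 1, (value >> 32) & MASK32)]
    if rs1 == rd:
        writes.reverse()  # assumption: write the base register last
    for reg, word in writes:
        x[reg] = word


def classify_alignment(addr):
    """Return (word_accesses_atomic, may_raise_misaligned_trap) per the table."""
    if addr % 8 == 0:
        return (True, False)
    if addr % 4 == 0:
        return (True, True)
    return (False, True)
```

For example, with `x14 = 0x89ABCDEF` and `x15 = 0x01234567`, `read_pair(x, 14)` yields `0x0123456789ABCDEF`, matching the `x14`/`x15` example in the extension's introduction.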