How ARMv8 reads a cache line's MESI state and tag information (tag RAM, dirty RAM), with the Cortex-A55 as an example


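Obtaining the MESI state on the Cortex-A55

1. System registers and access instructions
2. Reading the MESI information of the Cortex-A55 data cache (AArch64)
- 2.1 Writing the set/way information to the Data Cache Tag Read Operation Register
- 2.2 Reading and decoding Data Register 1 and Data Register 0
References: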

1. System registers and access instructions

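This article uses the Cortex-A55 processor as an example and reads the MESI information of a cache line by accessing the processor's internal memories (the tag RAM and the dirty RAM).
The Cortex-A55 provides a mechanism for reading the internal memories used by the caches and TLBs through a set of system registers. This is useful when investigating problems where the data held in the caches has become inconsistent with the data in main memory.
The access method differs between AArch64 and AArch32 state:
In AArch64 state, you first select the cache line and memory location through write-only system registers, and then read the tag information through read-only registers. The figure below lists the relevant registers and instructions. Note that these operations are only available at EL3; executing them at any other exception level causes an Undefined Instruction exception.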
In ARMv8, EL3 in AArch64 is shown in the red box in the figure below:
Exception levels in AArch64
Registers and instructions for accessing the internal memories in AArch64:
AArch64 registers used to access internal memory
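Because these operations trap outside EL3, it can help to check the current exception level before issuing them. The sketch below is a minimal illustration assuming GCC/Clang inline assembly: the `CurrentEL` register holds the exception level in bits [3:2], and `CurrentEL` itself cannot be read at EL0. The helper names `el_from_currentel`, `current_el` and `can_read_internal_memory` are hypothetical.

```c
#include <stdint.h>

/* Decode a CurrentEL value: the exception level is in bits [3:2]. */
static inline unsigned int el_from_currentel(uint64_t currentel)
{
    return (unsigned int)((currentel >> 2) & 0x3);
}

#if defined(__aarch64__)
/* Read CurrentEL on the running core (only legal at EL1 or higher). */
static inline unsigned int current_el(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, CurrentEL" : "=r"(v));
    return el_from_currentel(v);
}

/* Hypothetical guard: the tag-read operations are only usable at EL3. */
static inline int can_read_internal_memory(void)
{
    return current_el() == 3;
}
#endif
```

Only the pure decoding function is portable; the register read itself is compiled only for AArch64.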

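In AArch32 state, you first select the cache line and memory location through write-only CP15 registers, and then read the tag information through read-only CP15 registers. The figure below lists the relevant registers and instructions. Again, these operations are only available at EL3; executing these CP15 instructions at any other exception level causes an Undefined Instruction exception.
In ARMv8, EL3 in AArch32 is shown in the red box in the figure below: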
Exception levels in AArch32
CP15 registers and instructions for accessing the internal memories in AArch32:
AArch32 CP15 registers used to access internal memory
The Cortex-A55 supports reading the following internal memories:

L1 data cache
L1 instruction cache
L2 TLB
Main TLB RAM
Walk cache
IPA cache

2. Reading the MESI information of the Cortex-A55 data cache (AArch64)

Next, using the Cortex-A55 data cache as an example, we read the tag information of one cache line. The procedure is simple and consists of two steps:

Write the set and way of the target cache line to the Data Cache Tag Read Operation Register; the way index and set index together identify the line to read.
Read the corresponding Data Register 0 and Data Register 1, and decode their contents to obtain the tag information.

2.1 Writing the set/way information to the Data Cache Tag Read Operation Register

First, we need to extract the set index from a virtual address (VA).
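The figure below shows the data cache of the Cortex-A57 as a 4-way set-associative 32KB cache, also with 64-byte cache lines. A VA is divided into several fields: Tag, Set index, word index and byte index, with Set index = VA[13:6]. Note that VA[13:6] is 8 bits (256 sets), which fits a 32KB cache with 64-byte lines only if it is 2-way; a 4-way organization has 32KB / 64B / 4 = 128 sets and uses VA[12:6], as computed below for the Cortex-A55.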
In another example, a 32KB 4-way set-associative data cache with 32-byte cache lines has Set index = VA[12:5] (32KB / 32B / 4 = 256 sets, i.e. 8 bits).
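In both examples the field boundaries follow directly from the cache geometry: the offset width is log2(line size) and the set-index width is log2(size / ways / line size). A minimal sketch, assuming power-of-two sizes; `split_va` and `struct va_fields` are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* Fields of a VA as seen by a set-associative cache. */
struct va_fields {
    uint64_t offset;  /* byte offset inside the cache line */
    uint64_t set;     /* set index */
    uint64_t tag;     /* remaining upper bits */
};

/* log2 of a power of two. */
static unsigned int log2_u64(uint64_t x)
{
    unsigned int n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* Split a VA for a cache of the given size, associativity and line size. */
static struct va_fields split_va(uint64_t va, uint64_t cache_bytes,
                                 unsigned int ways, uint64_t line_bytes)
{
    uint64_t sets = cache_bytes / ways / line_bytes;
    unsigned int off_bits = log2_u64(line_bytes);
    unsigned int set_bits = log2_u64(sets);
    struct va_fields f;
    f.offset = va & (line_bytes - 1);
    f.set = (va >> off_bits) & (sets - 1);
    f.tag = va >> (off_bits + set_bits);
    return f;
}
```

For 32KB, 4 ways and 64-byte lines this places the set index in VA[12:6]; with 32-byte lines it moves to VA[12:5].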

The Cortex-A55 data cache is 4-way set-associative. Assuming it is 32KB with 64-byte cache lines, it has 32KB / 64B / 4 = 2^7 = 128 sets, so at least 7 bits are needed to encode the set index. The upper bound S of the set index is given by the formula:

S = log2(Data cache size / 4).
S=12 For a 16KB cache.
S=13 For a 32KB cache.
S=14 For a 64KB cache.

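From this, the set index occupies VA[S-1:6], i.e. Set index = VA[12:6] for a 32KB cache.
Because the cache is 4-way set-associative, a cache line can reside in any one of the four ways, so the way number can be 0, 1, 2 or 3.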
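The formula can be checked numerically. The sketch below computes S = log2(Data cache size / 4) and confirms that VA[S-1:6] has exactly as many values as there are sets. It assumes 4 ways and 64-byte lines as in the text; `set_index_top` and `num_sets` are hypothetical helper names:

```c
#include <stdint.h>

/* S = log2(data cache size / 4); the set index is then VA[S-1:6]. */
static unsigned int set_index_top(uint64_t dcache_bytes)
{
    unsigned int s = 0;
    uint64_t x = dcache_bytes / 4;   /* bytes per way */
    while (x > 1) { x >>= 1; s++; }
    return s;
}

/* Number of sets for a 4-way cache with 64-byte lines. */
static uint64_t num_sets(uint64_t dcache_bytes)
{
    return dcache_bytes / 64 / 4;
}
```

For a 32KB cache, S is 13 and the 7-bit field VA[12:6] indexes the 128 sets.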
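Once we have the set and way indexes, we encode them and write the result to the Data Cache Tag Read Operation Register. The encoding is simple: place the set and way values in their corresponding bits (the set index in Rd[S-1:6], the way in Rd[31:30]). Rd[5:3] is the doubleword offset into the cache data; because this example reads tag information, Rd[5:3] can simply be 0.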

The value Rd to be written to the Data Cache Tag Read Operation Register can therefore be computed with the following code:

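#include <stdint.h>
uint64_t get_Rd_data(uint64_t VA, unsigned int way_num)
{
    uint64_t set_way_index = VA & 0x1FC0;             /* set index: keep VA[12:6] in place as Rd[12:6] */
    set_way_index |= (uint64_t)(way_num & 0x3) << 30; /* way_num can be 0, 1, 2 or 3 -> Rd[31:30] */
    return set_way_index;                             /* all other bits, including Rd[5:3], are 0 */
}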

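Apart from the set and way fields, all bits of Rd are 0. 0x1FC0 is the mask in which VA[12:6] are all 1, so VA & 0x1FC0 keeps exactly the set index bits in place.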
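The 0x1FC0 mask generalizes to any S from the formula S = log2(Data cache size / 4): it covers bits S-1 down to 6. The sketch below builds Rd for the 16KB, 32KB and 64KB cases; `set_mask` and `make_tag_read_rd` are hypothetical names, and the way field at Rd[31:30] follows the encoding described above. Since the way that holds a given line is not known in advance, one would typically build and issue Rd for way = 0, 1, 2, 3 in turn.

```c
#include <stdint.h>

/* Mask selecting VA[S-1:6]; for S = 13 (32KB) this is 0x1FC0. */
static uint64_t set_mask(unsigned int s)
{
    return ((1ULL << s) - 1) & ~0x3FULL;
}

/* Rd for the Data Cache Tag Read Operation Register:
 * set index in Rd[S-1:6], way in Rd[31:30], Rd[5:3] left at 0 for a tag read. */
static uint64_t make_tag_read_rd(uint64_t va, unsigned int way, unsigned int s)
{
    return (va & set_mask(s)) | ((uint64_t)(way & 0x3) << 30);
}
```

For example, VA 0x12345678 in way 2 of a 32KB cache gives Rd = 0x80001640.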
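We then write Rd with the following instruction, assuming Rd is held in X0. The operation register lies in the SYS instruction space (op0 = 1), so it is written with SYS rather than MSR: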

SYS #6, c15, c2, #0, x0    // x0 = get_Rd_data(VA, way_num)

2.2 Reading and decoding Data Register 1 and Data Register 0

Writing the set/way information to the Data Cache Tag Read Operation Register selects the cache line to operate on. Next, we read Data Register 1 and Data Register 0 to obtain the tag information of that line. Besides the tag, the two registers also provide:

MESI state information
Outer memory attributes
Valid information

The exact bit layout of these fields is given in the Cortex-A55 Technical Reference Manual.

Note that obtaining the MESI state requires both registers together: Data Register 0 [4] (Dirty) and Data Register 1 [31:30] (MESI).
Data Register 0 [4] holds the dirty bit from the dirty RAM, which indicates whether the cache line is dirty:

0: clean
1: dirty

Data Register 1 [31:30] holds the MESI information from the tag RAM:

0b00 Invalid
0b01 Shared
0b10 Unique non-transient
0b11 Unique transient
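Extracting the two fields is plain shifting and masking. In this sketch the position of the MESI field is the macro `DR1_MESI_SHIFT`, set to 30 (bits [31:30]) as an assumption to be confirmed against the TRM for your core revision; `dr0_dirty`, `dr1_mesi` and `mesi_field_name` are hypothetical names:

```c
#include <stdint.h>

/* Assumed position of the 2-bit MESI field in Data Register 1 (bits [31:30]). */
#define DR1_MESI_SHIFT 30

/* Data Register 0 [4]: dirty bit from the dirty RAM (0 = clean, 1 = dirty). */
static unsigned int dr0_dirty(uint64_t dr0)
{
    return (unsigned int)((dr0 >> 4) & 0x1);
}

/* Two-bit MESI field from the tag RAM. */
static unsigned int dr1_mesi(uint64_t dr1)
{
    return (unsigned int)((dr1 >> DR1_MESI_SHIFT) & 0x3);
}

/* Names of the raw field encodings listed above. */
static const char *mesi_field_name(unsigned int field)
{
    switch (field & 0x3) {
    case 0:  return "Invalid";
    case 1:  return "Shared";
    case 2:  return "Unique non-transient";
    default: return "Unique transient";
    }
}
```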

The concept of transient is not covered in detail here; readers can consult the Arm documentation:

In Armv8, it is IMPLEMENTATION DEFINED whether a Transient hint is supported. In an implementation that supports the Transient hint, the Transient hint is a qualifier of the cache allocation hints, and indicates that the benefit of caching is for a relatively short period. It indicates that it might be better to restrict allocation of transient entries, to avoid possibly casting-out other, less transient, entries.
A55 has a specific behavior for memory regions that are marked as Write-Back cacheable and transient, as defined in the Armv8-A architecture:

For any load that is targeted at a memory region that is marked as transient, the following occurs:
If the memory access misses in the L1 data cache, the returned cache line is allocated in the L1 data cache but is marked as transient.
On eviction, if the line is clean and marked as transient, it is not allocated into the L2 cache but is marked as invalid.

For stores that are targeted at a memory region that is marked as transient, if the store misses in the L1 data cache, the line is allocated into the L2 cache.

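For the MESI information, the reader only needs to know that the transient hint is an IMPLEMENTATION DEFINED attribute. This article assumes that the current environment does not implement the transient hint, so Unique is encoded as 0b10 in the Data Register 1 [31:30] MESI field.
As explained in my previous post, "Cache coherency solutions: a detailed look at MESI protocol state transitions", the MESI information from the tag RAM must be combined with the dirty bit to express the complete MESI state: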

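M, Modified, Unique Dirty (UD): the line exists only in this cache (unique), and its data differs from the next level of memory (dirty). In other words, the latest data is in this cache, no other cache holds a copy, and the contents differ from main memory.
E, Exclusive, Unique Clean (UC): the data exists only in this cache and is clean. It matches main memory, and no other core's cache holds a copy of this address.
S, Shared, Shared Clean (SC): the data in the cache line is not necessarily consistent with main memory, but it is up to date and may be held by several cores.
I, Invalid: the data is invalid, or the cache line holds no data.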

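Combining the MESI field (Data Register 1 [31:30]) with the dirty bit (Data Register 0 [4]) gives the following mapping:
MESI field 0b00, dirty bit 0 or 1: I (Invalid)
MESI field 0b01, dirty bit 0: S (Shared Clean)
MESI field 0b10, dirty bit 0: E (Unique Clean)
MESI field 0b10, dirty bit 1: M (Unique Dirty)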

For example, if Data Register 0 [4] reads 1 and Data Register 1 [31:30] reads 2 (0b10), the combination is Unique + Dirty, so the cache line is in the Modified state.
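Putting the two fields together, the decoding in this example can be sketched as follows. This follows the M/E/S/I definitions above under the article's assumptions: the MESI field sits at DR1[31:30] and transient is not implemented, so 0b11 is treated like 0b10. `decode_mesi` and `mesi_letter` are hypothetical names.

```c
#include <stdint.h>

enum mesi_state { MESI_I, MESI_S, MESI_E, MESI_M };

/* Combine the dirty bit (DR0[4]) with the tag-RAM MESI field (assumed DR1[31:30]). */
static enum mesi_state decode_mesi(uint64_t dr0, uint64_t dr1)
{
    unsigned int dirty = (unsigned int)((dr0 >> 4) & 0x1);
    unsigned int field = (unsigned int)((dr1 >> 30) & 0x3);

    switch (field) {
    case 0:  return MESI_I;                   /* Invalid: dirty bit is irrelevant */
    case 1:  return MESI_S;                   /* Shared: only Shared Clean is described here */
    default: return dirty ? MESI_M : MESI_E;  /* Unique: dirty -> Modified, clean -> Exclusive */
    }
}

/* Single-letter name of a decoded state. */
static char mesi_letter(enum mesi_state s)
{
    static const char letters[] = "ISEM";
    return letters[s];
}
```

With dr0 = 0x10 and dr1 = 2ULL << 30, the inputs of the example above, decode_mesi returns MESI_M.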

The complete example code for reading the MESI information is as follows:

// step 1: write the set index and way number to the Data Cache Tag Read Operation Register
SYS #6, c15, c2, #0, x0      // x0 = get_Rd_data(VA, way_num)
// step 2: read Data Register 0 and Data Register 1
MRS x1, S3_6_c15_c0_0        // x1 = Data Register 0
MRS x2, S3_6_c15_c0_1        // x2 = Data Register 1
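To call the same two steps from C, the sequence can be wrapped in GCC/Clang inline assembly. This is a sketch only: the wrapper name `a55_read_dcache_tag` is hypothetical, the ISB is a conservative assumption rather than a documented requirement, and the instructions trap on anything other than a Cortex-A55 running at EL3. Builds for other architectures simply return -1.

```c
#include <stdint.h>

/* Hypothetical wrapper: select a line with the Data Cache Tag Read Operation
 * Register, then read Data Registers 0 and 1. Returns 0 on success, -1 when
 * not built for AArch64. Must run at EL3 on a Cortex-A55. */
static int a55_read_dcache_tag(uint64_t rd, uint64_t *dr0, uint64_t *dr1)
{
#if defined(__aarch64__)
    uint64_t r0, r1;
    __asm__ volatile(
        "sys #6, c15, c2, #0, %2\n\t"  /* step 1: write Rd (set/way) */
        "isb\n\t"                       /* conservative barrier (assumption) */
        "mrs %0, s3_6_c15_c0_0\n\t"     /* step 2: Data Register 0 */
        "mrs %1, s3_6_c15_c0_1"         /*         Data Register 1 */
        : "=&r"(r0), "=&r"(r1)
        : "r"(rd)
        : "memory");
    *dr0 = r0;
    *dr1 = r1;
    return 0;
#else
    (void)rd;
    *dr0 = 0;
    *dr1 = 0;
    return -1;
#endif
}
```

A typical call passes get_Rd_data(VA, way_num) as rd and then decodes dr0 and dr1 as described in 2.2.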