The latest Intel platform, Granite Rapids, introduces a new instruction,
AMX-FP16, which performs dot products of two FP16 tiles and accumulates
the results into a packed single-precision tile. AMX-FP16 adds FP16
capability and allows an FP16 GPU-trained model to run faster without
loss of accuracy or added SW overhead.
The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 21]
Add CPUID definition for AMX-FP16.
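
As a rough illustration of the bit definition above, the check below tests
bit 21 of the EAX value returned by CPUID leaf 7, subleaf 1. The macro matches
the one added by this patch; the helper name has_amx_fp16() is hypothetical
and exists only for this sketch, and the caller is assumed to have obtained
the EAX value already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the one added by this patch. */
#define CPUID_7_1_EAX_AMX_FP16 (1U << 21)

/*
 * Hypothetical helper (not part of the patch): returns true when the
 * given CPUID.(EAX=7,ECX=1):EAX value has the AMX-FP16 bit set.
 */
bool has_amx_fp16(uint32_t cpuid_7_1_eax)
{
    return (cpuid_7_1_eax & CPUID_7_1_EAX_AMX_FP16) != 0;
}
```

On GCC or Clang for x86, the EAX value could be read on the host with
__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx) from <cpuid.h>.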
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-3-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NULL, NULL, "fzrm", "fsrs",
"fsrc", NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
- NULL, NULL, NULL, NULL,
+ NULL, "amx-fp16", NULL, NULL,
NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
},
#define CPUID_7_1_EAX_FSRS (1U << 11)
/* Fast Short REP CMPS/SCAS */
#define CPUID_7_1_EAX_FSRC (1U << 12)
+/* Support Tile Computational Operations on FP16 Numbers */
+#define CPUID_7_1_EAX_AMX_FP16 (1U << 21)
/* XFD Extend Feature Disabled */
#define CPUID_D_1_EAX_XFD (1U << 4)