Time-power-energy balance of BLAS kernels in modern FPGAs
Abstract:
Numerical Linear Algebra (NLA) is a research field that in recent decades has been characterized by the use of kernel libraries that are de facto standards. One of the most remarkable examples, particularly in the HPC field, is the Basic Linear Algebra Subprograms (BLAS). Most BLAS operations are fundamental to many scientific algorithms because they generally constitute the most computationally expensive stage. For this reason, numerous efforts have been made to optimize these operations on various hardware platforms. There is growing concern in the high-performance computing world about power consumption, which makes energy efficiency an extremely important criterion when evaluating hardware platforms. Due to their greater energy efficiency, Field-Programmable Gate Arrays (FPGAs) are today an interesting alternative to other hardware platforms for accelerating this type of operation. Our study focuses on the evaluation of FPGAs for dense NLA operations. Specifically, in this work we explore and evaluate the available options for two of the most representative BLAS kernels, namely GEMV (general matrix-vector multiplication) and GEMM (general matrix-matrix multiplication). The experimental evaluation is carried out on an Alveo U50 accelerator card from Xilinx and an Intel Xeon Silver multicore CPU. Our findings show that even for kernels where the CPU achieves better runtimes, the FPGA counterpart is more energy efficient.
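As an illustration of the two kernels named in the abstract and of why a slower device can still consume less energy, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `gemv`, `gemm`, and `energy_joules` are chosen here for illustration, the kernels follow the standard BLAS definitions (y = alpha*A*x + beta*y and C = alpha*A*B + beta*C), and the power and runtime figures are hypothetical, not measurements from the paper.

```python
# Reference semantics only; production BLAS libraries are heavily optimized.

def gemv(alpha, A, x, beta, y):
    """Return alpha * A @ x + beta * y for a dense row-major matrix A."""
    return [
        alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
        for row, y_i in zip(A, y)
    ]


def gemm(alpha, A, B, beta, C):
    """Return alpha * A @ B + beta * C for dense row-major matrices."""
    cols_B = list(zip(*B))  # columns of B, so each dot product is a simple zip
    return [
        [
            alpha * sum(a * b for a, b in zip(row_A, col_B)) + beta * c
            for col_B, c in zip(cols_B, row_C)
        ]
        for row_A, row_C in zip(A, C)
    ]


def energy_joules(avg_power_watts, runtime_seconds):
    """Energy consumed at a constant average power: E = P * t."""
    return avg_power_watts * runtime_seconds


A = [[1.0, 2.0], [3.0, 4.0]]
print(gemv(1.0, A, [1.0, 1.0], 0.0, [0.0, 0.0]))       # [3.0, 7.0]
print(gemm(1.0, A, A, 0.0, [[0.0, 0.0], [0.0, 0.0]]))  # [[7.0, 10.0], [15.0, 22.0]]

# Hypothetical numbers, NOT results from the paper: a device that takes
# twice as long but draws a fifth of the power uses less energy overall.
fast_device = energy_joules(avg_power_watts=100.0, runtime_seconds=1.0)
slow_device = energy_joules(avg_power_watts=20.0, runtime_seconds=2.0)
print(fast_device, slow_device)  # 100.0 40.0
```

The point of the last three lines is only that energy depends on the product of average power and runtime, so a better runtime does not by itself imply lower energy consumption.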
2022
The researchers were supported by the Universidad de la República and PEDECIBA. Thanks are due to ANII - MPG Independent Research Groups: "Efficient Heterogeneous Computing" - CSC group.
Dense numerical linear algebra; Energy-efficiency; HPC; Matrix-matrix multiplication
English
Universidad de la República
COLIBRI
https://link.springer.com/chapter/10.1007/978-3-031-23821-5_6
https://hdl.handle.net/20.500.12008/35893
Open access
Creative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0)