A streaming dataflow engine for sparse matrix-vector multiplication using high-level synthesis
Authors: Mohammad Hosseinabady; Jose Luis Nunez-Yanez
Abstract
Using high-level synthesis techniques, this paper proposes an adaptable, high-performance streaming dataflow engine for sparse matrix-dense vector multiplication (SpMV) suitable for embedded FPGAs. Because SpMV is a memory-bound algorithm, the engine combines three concepts (loop pipelining, dataflow graphs, and data streaming) to exploit most of the memory bandwidth available to the FPGA. The main goal of the paper is to show that, for memory-bound applications, FPGAs can deliver performance comparable to that of the corresponding CPUs and GPUs while consuming significantly less energy. The experimental results indicate that the FPGA outperforms the embedded GPU on small and medium-size matrices by an average factor of 3.25, whereas the embedded GPU is faster on larger matrices by an average factor of 1.58. In addition, the FPGA implementation is more energy efficient than the embedded CPU and GPU across the range of matrices considered, by an average factor of 8.9. A case study that adapts the proposed SpMV optimization to accelerate the support vector machine (SVM) algorithm, one of the successful classification techniques in the machine learning literature, demonstrates the benefits of the proposed FPGA-based SpMV over the embedded CPU and GPU. In this case study, the FPGA is faster by an average factor of 1.7 and consumes less energy by an average factor of 6.8 compared to the GPU.
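To make the three concepts more concrete, the following minimal Python sketch splits SpMV into a read stage, a multiply stage, and an accumulate stage connected by streams. It assumes the compressed sparse row (CSR) format, which the abstract does not specify, and uses Python generators as stand-ins for the hardware FIFOs that would link dataflow stages in an HLS design. The function names are hypothetical and this is not the authors' engine. Each nonzero is fetched once and used in a single multiply-add, which illustrates why SpMV is memory-bound.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical stage names; assumes the matrix is stored in CSR form
# (row_ptr, col_idx, val). Generators stand in for hardware streams.


def read_nonzeros(row_ptr: Sequence[int], col_idx: Sequence[int],
                  val: Sequence[float]) -> Iterator[Tuple[int, int, float]]:
    """Stage 1: stream (row, column, value) for every nonzero in CSR order."""
    for r in range(len(row_ptr) - 1):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            yield r, col_idx[k], val[k]


def multiply(stream: Iterator[Tuple[int, int, float]],
             x: Sequence[float]) -> Iterator[Tuple[int, float]]:
    """Stage 2: turn each nonzero into a partial product tagged with its row."""
    for r, c, v in stream:
        yield r, v * x[c]


def accumulate(stream: Iterator[Tuple[int, float]], n_rows: int) -> List[float]:
    """Stage 3: sum partial products per row; rows with no nonzeros stay 0.0."""
    y = [0.0] * n_rows
    for r, p in stream:
        y[r] += p
    return y


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             val: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A * x by chaining the three stages."""
    n_rows = len(row_ptr) - 1
    return accumulate(multiply(read_nonzeros(row_ptr, col_idx, val), x), n_rows)


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form, x = [1, 2, 3]
    print(spmv_csr([0, 2, 3, 5], [0, 2, 1, 0, 2],
                   [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
```

For the 3x3 example in the `__main__` block the result is [7.0, 6.0, 19.0]. Because the generators are lazy, each nonzero passes through all three stages before the next one is read, so no intermediate array is materialized; in an HLS design the stages would instead run concurrently, each with its loop pipelined.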
| Field | Value |
|---|---|
| Journal Article Type | Article |
| Online Publication Date | Apr 23, 2019 |
| Publication Date | Jun 30, 2020 |
| Deposit Date | Dec 11, 2023 |
| Journal | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
| Print ISSN | 0278-0070 |
| Publisher | Institute of Electrical and Electronics Engineers |
| Peer Reviewed | Peer Reviewed |
| Volume | 39 |
| Issue | 6 |
| Pages | 1272-1285 |
| DOI | https://doi.org/10.1109/TCAD.2019.2912923 |
| Public URL | https://uwe-repository.worktribe.com/output/11511791 |