Research Repository

All Outputs (8)

Sparse matrix-dense matrix multiplication on heterogeneous CPU+FPGA embedded system (2020)
Conference Proceeding
Hosseinabady, M., & Nunez-Yanez, J. (2020). Sparse matrix-dense matrix multiplication on heterogeneous CPU+FPGA embedded system. In Proceedings of the 11th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures / 9th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms. https://doi.org/10.1145/3381427.3381428

Embedded intelligence is becoming the primary driver for new applications in industry, healthcare, and automotive, to name a few. The main characteristics of these applications are high computational demand, real-time interaction with the environment...

Run-time power modelling in embedded GPUs with dynamic voltage and frequency scaling (2020)
Conference Proceeding
Nunez-Yanez, J., Nikov, K., Eder, K., & Hosseinabady, M. (2020). Run-time power modelling in embedded GPUs with dynamic voltage and frequency scaling. https://doi.org/10.1145/3381427.3381429

This paper investigates the application of a robust CPU-based power modelling methodology that performs an automatic search of explanatory events derived from performance counters to embedded GPUs. A 64-bit Tegra TX1 SoC is configured with DVFS enabl...

A streaming dataflow engine for sparse matrix-vector multiplication using high-level synthesis (2019)
Journal Article
Hosseinabady, M., & Nunez-Yanez, J. L. (2020). A streaming dataflow engine for sparse matrix-vector multiplication using high-level synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39(6), 1272-1285. https://doi.org/10.1109/TCAD.2019.2912923

Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As the SpMV is a memory-bound algorithm, this engine...

Pipelined streaming computation of histogram in FPGA OpenCL (2018)
Conference Proceeding
Hosseinabady, M., & Nunez-Yanez, J. L. (2018). Pipelined streaming computation of histogram in FPGA OpenCL. In S. Bassini, M. Danelutto, P. Dazzi, G. R. Joubert, & F. Peters (Eds.), Parallel Computing is Everywhere (632-641). https://doi.org/10.3233/978-1-61499-843-3-632

The emergence of High-Level Synthesis (HLS) techniques and tools, along with new features in high-end FPGAs such as multi-port memory interfaces, has enabled designers to utilize FPGAs not only for compute-bound but also for memory-bound tasks. This...

Dynamic energy management of FPGA accelerators in embedded systems (2018)
Journal Article
Hosseinabady, M., & Nunez-Yanez, J. L. (2018). Dynamic energy management of FPGA accelerators in embedded systems. ACM Transactions on Embedded Computing Systems, 17(3), Article 63. https://doi.org/10.1145/3182172

In this article, we investigate how to utilise a Field-Programmable Gate Array (FPGA) in an embedded system to save energy. For this purpose, we study the energy efficiency of a hybrid FPGA-CPU device that can switch task execution between hardware...

Multi-precision convolutional neural networks on heterogeneous hardware (2018)
Conference Proceeding
Amiri, S., Hosseinabady, M., McIntosh-Smith, S., & Nunez-Yanez, J. (2018). Multi-precision convolutional neural networks on heterogeneous hardware. In 2018 Design, Automation & Test in Europe Conference & Exhibition (419-424). https://doi.org/10.23919/DATE.2018.8342046

Fully binarised convolutional neural networks (CNNs) deliver very high inference performance using single-bit weights and activations, together with XNOR type operators for the kernel convolutions. Current research shows that full binarisation result...

Simultaneous multiprocessing in a software-defined heterogeneous FPGA (2018)
Journal Article
Nunez-Yanez, J., Amiri, S., Hosseinabady, M., Rodríguez, A., Asenjo, R., Navarro, A., …Gran, R. (2019). Simultaneous multiprocessing in a software-defined heterogeneous FPGA. Journal of Supercomputing, 75(8), 4078-4095. https://doi.org/10.1007/s11227-018-2367-9

Heterogeneous chips that combine CPUs and FPGAs can distribute processing so that the algorithm tasks are mapped onto the most suitable processing element. New software-defined high-level design environments for these chips use general purpose langua...

Energy optimization in commercial FPGAs with voltage, frequency and logic scaling (2015)
Journal Article
Nunez-Yanez, J. L., Hosseinabady, M., & Beldachi, A. (2016). Energy optimization in commercial FPGAs with voltage, frequency and logic scaling. IEEE Transactions on Computers, 65(5), 1484-1493. https://doi.org/10.1109/TC.2015.2435771

This paper investigates the energy reductions possible in commercially available FPGAs configured to support voltage, frequency and logic scalability combined with power gating. Voltage and frequency scaling is based on in-situ detectors that allow t...