Vulnerability detection through machine learning-based fuzzing: A systematic review

Bamohabbat Chafjiri, Sadegh; Legg, Phil; Hong, Jun; Tsompanas, Michail Antisthenis

doi:10.1016/j.cose.2024.103903

Authors

Sadegh Bamohabbat Chafjiri

Professor Phil Legg Phil.Legg@uwe.ac.uk
Professor in Cyber Security

Jun Hong Jun.Hong@uwe.ac.uk
Professor in Artificial Intelligence

Michail Tsompanas Antisthenis.Tsompanas@uwe.ac.uk
Senior Lecturer in Computer Science

Abstract

Modern software and networks underpin our digital society, yet the rapid growth in vulnerabilities uncovered within them threatens our cyber security posture. Addressing these issues at scale requires automated, proactive approaches that can identify and mitigate vulnerabilities in a suitable time frame. Fuzzing techniques have emerged as crucial methods to preemptively tackle these risks. However, traditional fuzzing methods encounter various challenges, such as the lack of a strategy for deep bug identification, time-intensive bug analysis, input quality, and seed scheduling, among others. To overcome these challenges, diverse Machine Learning (ML) models and optimisation techniques have been employed, including advanced feature engineering, optimised seed selection, refined predictive/fitness models, and gradient-based optimisation. Furthermore, ML architectures such as Long Short-Term Memory (LSTM), Generative Adversarial Network (GAN), Sequence-to-Sequence (Seq2Seq), and Gated Recurrent Unit (GRU) have demonstrated greater effectiveness within ML-based fuzzing. In this paper, we delve into this paradigm shift, aiming to address fundamental challenges across different ML categories. We survey popular ML categories such as Traditional Machine Learning (TML), Deep Learning (DL), Reinforcement Learning (RL), and Deep Reinforcement Learning (DRL) to investigate their potential for enhancing traditional fuzzing approaches. We explore the respective advantages of each category of ML-based fuzzing, while also analysing the challenges unique to each category. Our work provides a comprehensive survey of the fuzzing domain and of how machine learning techniques have been utilised within it, which we believe will be of use to future researchers in this domain.

Journal Article Type	Article
Acceptance Date	May 15, 2024
Online Publication Date	May 19, 2024
Publication Date	Aug 31, 2024
Deposit Date	May 16, 2024
Publicly Available Date	Jun 5, 2024
Journal	Computers and Security
Print ISSN	0167-4048
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	143
Article Number	103903
DOI	https://doi.org/10.1016/j.cose.2024.103903
Public URL	https://uwe-repository.worktribe.com/output/11997720