Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities. (arXiv:2306.06109v1 [cs.CR])

Deep learning (DL) models have become increasingly popular in identifying
software vulnerabilities. Prior studies found that vulnerabilities across
different vulnerable programs may exhibit similar vulnerable scopes, implicitly
forming discernible vulnerability patterns that can be learned by DL models
through supervised training. However, vulnerable scopes still manifest in
various spatial locations and formats within a program, posing challenges for
models to accurately identify vulnerable statements. Despite this challenge,
state-of-the-art vulnerability detection approaches fail to exploit the
vulnerability patterns that arise in vulnerable programs. To take full
advantage of these patterns and unlock the capability of DL models, we
propose a novel vulnerability-matching approach in this paper, drawing
inspiration from program analysis tools that locate vulnerabilities based on
pre-defined patterns. Specifically, a vulnerability codebook is learned, which
consists of quantized vectors representing various vulnerability patterns.
During inference, the codebook is iterated to match all learned patterns and
predict the presence of potential vulnerabilities within a given program. Our
approach was extensively evaluated on a real-world dataset comprising more than
188,000 C/C++ functions. The evaluation results show that our approach achieves
an F1-score of 94% (6% higher than the previous best) and 82% (19% higher than
the previous best) for function-level and statement-level vulnerability
identification, respectively. These substantial gains highlight the
effectiveness of our approach in identifying vulnerabilities. The training code
and pre-trained models are available at https://github.com/optimatch/optimatch.

Source: https://arxiv.org/abs/2306.06109
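The codebook-matching idea described in the abstract resembles vector quantization: statement embeddings are snapped to their nearest learned pattern vector, and the match drives the prediction. Below is a minimal NumPy sketch of that matching step, not the paper's actual implementation; the function name, shapes, and toy data are illustrative assumptions.

```python
import numpy as np

def match_codebook(stmt_embeddings, codebook):
    """Match each statement embedding to its nearest quantized
    vulnerability-pattern vector using L2 distance.

    stmt_embeddings: (num_statements, dim)
    codebook:        (codebook_size, dim)
    """
    # Pairwise distances via broadcasting: (num_statements, codebook_size)
    dists = np.linalg.norm(
        stmt_embeddings[:, None, :] - codebook[None, :, :], axis=-1
    )
    idx = dists.argmin(axis=1)  # best-matching pattern per statement
    return codebook[idx], idx, dists.min(axis=1)

# Illustrative toy data: 2-D embeddings and a 3-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
stmts = np.array([[0.1, -0.1], [0.9, 1.1]])
quantized, pattern_ids, match_dist = match_codebook(stmts, codebook)
```

In a full system, a small match distance to a given codebook entry would indicate that a statement exhibits the corresponding vulnerability pattern; a distance threshold (or a classifier over the matched patterns) would then flag the statement as potentially vulnerable.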

