Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network. (arXiv:2208.05476v1 [cs.CR])
Malicious software (malware) causes serious harm to our devices and daily lives. We are
eager to understand malware behavior and the threats it poses. Most of the
record files of malware are variable-length, text-based files with time
stamps, such as event log data and dynamic analysis profiles. Using the time
stamps, we can sort such data into sequences for subsequent
analysis. However, dealing with text-based sequences of variable length
is difficult. In addition, unlike natural language text, most sequential
data in information security has specific properties and structure, such as
loops, repeated calls, and noise. To analyze the API call sequences together with
their structure, we represent the sequences as graphs, which allow further
investigation of their information and structure, as with a Markov model. Therefore,
we design and implement an Attention Aware Graph Neural Network (AWGCN) to
analyze the API call sequences. Through AWGCN, we can obtain the sequence
embeddings to analyze the behavior of the malware. Moreover, classification
experiments show that AWGCN outperforms other classifiers on the
call-like datasets, and the embeddings can further improve the performance of
classic models.
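
The preprocessing idea in the abstract (sorting time-stamped records into a sequence, then representing that sequence as a graph in the spirit of a Markov model) can be illustrated with a minimal sketch. This is not the AWGCN model and not the authors' code. The function names, the `(timestamp, api_name)` record layout, and the sample API names are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def to_sequence(records):
    """Sort hypothetical (timestamp, api_name) records by time; return the API names in order."""
    return [name for _, name in sorted(records, key=lambda r: r[0])]


def transition_graph(sequence):
    """Count directed edges between consecutive calls.

    A repeated call (A followed by A) becomes a self-loop edge (A, A).
    """
    return Counter(zip(sequence, sequence[1:]))


def markov_probabilities(edges):
    """Normalize edge counts per source node into transition probabilities."""
    totals = defaultdict(int)
    for (src, _dst), count in edges.items():
        totals[src] += count
    return {(src, dst): count / totals[src] for (src, dst), count in edges.items()}


# Illustrative, made-up records arriving out of order.
records = [
    (3.0, "WriteFile"),
    (1.0, "CreateFile"),
    (2.0, "WriteFile"),
    (4.0, "CloseHandle"),
]

seq = to_sequence(records)
edges = transition_graph(seq)
probs = markov_probabilities(edges)

for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

In this sketch, a sequence of any length collapses into a graph whose size depends only on the number of distinct calls, which is one reason a graph view suits variable-length logs. A graph neural network would then operate on such nodes and weighted edges. How AWGCN builds its graph and applies attention is not specified in the abstract.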
Source: https://arxiv.org/abs/2208.05476