Detecting Concept Drift for the reliability prediction of Software Defects using Instance Interpretation. (arXiv:2305.16323v1 [cs.SE])
In the context of Just-In-Time Software Defect Prediction (JIT-SDP), concept
drift (CD) can occur due to changes in the software development process, in the
complexity of the software, or in user behavior, and it may affect the
stability of the JIT-SDP model over time. Additionally, class imbalance in
JIT-SDP data poses a potential risk to the accuracy of CD detection methods if
rebalancing is applied. To the best of our knowledge, this issue has not been
explored. Furthermore, methods to check the
stability of JIT-SDP models over time using labeled evaluation data have been
proposed. However, labels for future data may not always be available
promptly. We aim to develop a reliable JIT-SDP model by detecting CD points
directly, identifying changes in the interpretation of unlabeled simple and
resampled data. To evaluate our approach, we first
established baseline methods that identify CD points on labeled data by
monitoring model performance. We then compared the output of the proposed
methods with that of baseline methods that monitor threshold-dependent and
threshold-independent performance criteria, using well-known measures for CD
detection, such as accuracy, missed detection rate (MDR), mean time to detection
(MTD), mean time between false alarms (MTFA), and mean time ratio (MTR). We
also used the Friedman statistical test to assess the effectiveness of our
approach. The results show that the proposed methods agree more closely with
baseline methods based on threshold-independent criteria when applied to
rebalanced data, and with baseline methods based on threshold-dependent
criteria when applied to simple data.
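The abstract notes that rebalancing imbalanced JIT-SDP data may distort CD detection, but it does not say which rebalancing technique is used. As an illustration only, the sketch below uses random oversampling (duplicating minority-class rows), one common choice; the function name `random_oversample` and the 0/1 label encoding (1 = defect-inducing commit) are assumptions. It shows why rebalancing can matter for drift detection: the resampled data has a different class prior and repeated minority rows, so statistics computed on it shift even if the underlying process has not changed.

```python
import random


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random minority-class rows until both classes match.

    Assumes binary labels (e.g. 0 = clean, 1 = defect-inducing) and that
    both classes occur at least once.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")
    # Smaller class first, larger class second.
    (minority, min_rows), (majority, maj_rows) = sorted(
        by_class.items(), key=lambda kv: len(kv[1])
    )
    # Draw minority rows with replacement to close the gap.
    extra = [rng.choice(min_rows) for _ in range(len(maj_rows) - len(min_rows))]
    X_res = maj_rows + min_rows + extra
    y_res = [majority] * len(maj_rows) + [minority] * (len(min_rows) + len(extra))
    return X_res, y_res
```

For real projects, the imbalanced-learn package provides `RandomOverSampler` alongside other resampling strategies.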
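The proposed approach detects CD points from changes in the interpretation of unlabeled data. The abstract does not specify the interpretation technique, so the following is a minimal sketch under stated assumptions: a fixed, already-trained linear scoring model with hypothetical `weights`; per-instance attributions w_j * (x_j - m_j), which for a linear model with independent features equal the exact SHAP values when m_j is the reference mean of feature j; and a drift signal whenever the total variation distance between a window's normalized attribution profile and the reference profile exceeds a hypothetical `threshold`. No labels are needed at any step. The function names, window size, and threshold are illustrative, not the authors' method.

```python
def attribution_profile(window, weights, ref_means):
    """Normalized mean |attribution| per feature over a window of unlabeled rows.

    Attribution of feature j for one row is weights[j] * (x_j - ref_means[j]),
    i.e. the exact SHAP value of a linear model under feature independence.
    """
    totals = [0.0] * len(weights)
    for row in window:
        for j, (w, x, m) in enumerate(zip(weights, row, ref_means)):
            totals[j] += abs(w * (x - m))
    s = sum(totals)
    if s == 0:
        return [1.0 / len(weights)] * len(weights)  # uniform if nothing is attributed
    return [t / s for t in totals]


def detect_interpretation_drift(stream, weights, ref_means, reference,
                                window_size=50, threshold=0.2):
    """Hypothetical detector: return start indices of windows whose profile drifts."""
    drift_points = []
    for start in range(0, len(stream) - window_size + 1, window_size):
        profile = attribution_profile(stream[start:start + window_size], weights, ref_means)
        # Total variation distance between two distributions over features.
        tv = 0.5 * sum(abs(p - q) for p, q in zip(profile, reference))
        if tv > threshold:
            drift_points.append(start)
            reference = profile  # adopt the new concept as the reference
    return drift_points
```

Resetting the reference after each detection is one design choice; keeping the original reference instead would report every later window of the new concept as another drift point.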
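The baselines identify CD points on labeled data by monitoring model performance. The abstract does not name the detector, so the sketch below uses the Drift Detection Method (DDM) of Gama et al. (2004) as a well-known stand-in, not as the authors' baseline. DDM tracks the online error rate p and its standard deviation s = sqrt(p(1 - p)/n), remembers the minimum of p + s, and signals drift when p + s exceeds p_min + 3 * s_min; 30 warm-up instances and the 3-sigma level are the commonly used defaults. The input is assumed to be a 0/1 stream in which 1 marks a misclassified commit.

```python
import math


def ddm_drift_points(errors, min_instances=30, drift_level=3.0):
    """Return indices where DDM signals drift on a 0/1 error stream."""
    drift_points = []
    n, p = 0, 0.0
    p_min = s_min = math.inf
    for i, e in enumerate(errors):
        n += 1
        p += (e - p) / n                    # running error rate
        s = math.sqrt(p * (1 - p) / n)      # binomial standard deviation
        if n < min_instances:
            continue                        # warm-up: statistics still unreliable
        if p + s < p_min + s_min:
            p_min, s_min = p, s             # best (lowest) error level seen so far
        if p + s > p_min + drift_level * s_min:
            drift_points.append(i)
            # Restart monitoring for the new concept.
            n, p = 0, 0.0
            p_min = s_min = math.inf
    return drift_points
```

A warning level (typically 2 * s_min) can be added in the same way to mark where a replacement model should start collecting data.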
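The baselines differ in whether the monitored criterion depends on a classification threshold. The abstract does not list the specific criteria, so the sketch below uses G-mean at a fixed cutoff as an example of a threshold-dependent criterion and AUC as an example of a threshold-independent one. AUC is computed here as the Mann-Whitney probability that a randomly chosen defect-inducing commit receives a higher score than a randomly chosen clean one, with ties counting one half. Function names and the 0.5 default cutoff are assumptions.

```python
import math


def g_mean(scores, labels, cutoff=0.5):
    """Threshold-dependent: geometric mean of per-class recall at a fixed cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    pos = sum(labels)
    neg = len(labels) - pos
    return math.sqrt((tp / pos) * (tn / neg))


def auc(scores, labels):
    """Threshold-independent: P(random positive scored above random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For made-up scores `[0.9, 0.4, 0.35, 0.2]` with labels `[1, 1, 0, 0]`, the ranking is perfect (AUC = 1.0), yet G-mean at the 0.5 cutoff is only about 0.71 because one defect-inducing commit falls below the cutoff; moving the cutoff to 0.38 raises G-mean to 1.0 while AUC is unchanged.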
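The proposed detectors are compared with the baselines using MDR, MTD, MTFA, and MTR. The sketch below treats the baseline CD points as the reference drifts and the proposed method's output as the detections. The matching rule (the first unused detection after a reference drift, before the next one and within a hypothetical `max_delay`), the approximation of MTFA as stream length divided by the number of false alarms, and the function name are assumptions; MTR follows the commonly used definition MTFA / MTD * (1 - MDR).

```python
import math


def drift_detection_measures(reference, detected, stream_length, max_delay=100):
    """Compare detected CD points against reference (e.g. baseline) CD points."""
    if not reference:
        raise ValueError("need at least one reference drift point")
    reference, detected = sorted(reference), sorted(detected)
    used = set()
    delays = []
    for i, r in enumerate(reference):
        # Valid window: from r up to the next reference drift, capped by max_delay.
        nxt = reference[i + 1] if i + 1 < len(reference) else stream_length
        end = min(nxt, r + max_delay)
        hit = next((d for d in detected if r <= d < end and d not in used), None)
        if hit is not None:
            used.add(hit)
            delays.append(hit - r)
    false_alarms = len(detected) - len(used)
    mdr = 1 - len(delays) / len(reference)           # missed detection rate
    mtd = sum(delays) / len(delays) if delays else math.inf
    mtfa = stream_length / false_alarms if false_alarms else math.inf
    if mdr == 1:
        mtr = 0.0                                    # nothing detected
    elif mtd == 0:
        mtr = math.inf                               # instantaneous detection
    else:
        mtr = mtfa / mtd * (1 - mdr)
    return {"MDR": mdr, "MTD": mtd, "MTFA": mtfa, "MTR": mtr}
```

For reference points `[100, 300]`, detections `[50, 120, 310, 320]`, and a stream of 400 commits, both reference drifts are found with delays 20 and 10 (MTD = 15), the detections at 50 and 320 are false alarms (MTFA = 200), and MTR = 200 / 15, or about 13.3.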
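The Friedman test ranks the compared methods within each block (for example, each project or dataset) and checks whether their average ranks differ. The sketch below computes the statistic chi2_F = 12N / (k(k + 1)) * sum_j R_j^2 - 3N(k + 1), where N is the number of blocks, k the number of methods, and R_j the average rank of method j. It assumes lower values are better, gives tied values their average rank, and omits the tie correction; the table layout and function name are assumptions.

```python
def friedman_statistic(table):
    """Friedman chi-square for rows = blocks (datasets), columns = methods.

    Lower values receive lower (better) ranks; ties share their average rank.
    Returns the statistic and each method's average rank.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        order = sorted(range(k), key=lambda j: row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            # Extend j over a run of tied values.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average of 1-based positions i..j
            for t in range(i, j + 1):
                ranks[order[t]] = avg
            i = j + 1
        for c in range(k):
            rank_sums[c] += ranks[c]
    avg_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * sum(r * r for r in avg_ranks) - 3 * n * (k + 1)
    return chi2, avg_ranks
```

Under the null hypothesis of equal performance, the statistic approximately follows a chi-square distribution with k - 1 degrees of freedom; SciPy's `scipy.stats.friedmanchisquare` returns the statistic together with that p-value. For higher-is-better measures such as accuracy, negate the values before ranking.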
Source: https://arxiv.org/abs/2305.16323