Classifying network traffic is the basis for important network applications.
Prior research in this area has been hampered by the limited availability of
representative datasets, and many of its results cannot be readily reproduced.
This problem is exacerbated by emerging data-driven, machine-learning-based
approaches. To address this issue, we present the (Net)^2 database, with three open
datasets containing nearly 1.3M labeled flows in total, with a comprehensive
list of flow features, for the research community. We cover broad aspects
of network traffic analysis, including both malware detection and application
classification. As we continue to grow them, we expect the datasets to serve as
a common ground for AI-driven, reproducible research on network flow analytics.
We release the datasets publicly and also introduce a Multi-Task Hierarchical
Learning (MTHL) model that performs all tasks within a single model. Our results show
that MTHL is capable of accurately performing multiple tasks with hierarchical
labeling with a dramatic reduction in training time.