MalVol-25: a diverse, labeled and detailed malware volatile memory dataset for detection and response testing and validation

Dunsin, Dipo, Ghanem, Mohamed Chahine and Almeida Palmieri, Eduardo (2025) MalVol-25: a diverse, labeled and detailed malware volatile memory dataset for detection and response testing and validation. [Dataset]

Abstract

This dataset addresses the critical need for high-quality malware datasets that support advanced analysis techniques, particularly reinforcement learning (RL). Existing datasets often lack the diversity, comprehensive labelling, and complexity necessary for effective RL training. To fill this gap, we developed a systematic dataset generation approach that combines automated malware execution in controlled virtual environments with dynamic monitoring tools. The resulting dataset comprises clean and infected memory snapshots across multiple malware families and operating systems, capturing detailed behavioural and environmental features. Key design decisions include ensuring ethical and legal compliance, validating the data thoroughly with both automated and manual methods, and documenting the process comprehensively to ensure replicability and integrity. The dataset’s distinctive features enable the modelling of system states and transitions, facilitating RL-based malware detection and response strategies. This resource is significant for advancing adaptive cybersecurity defences and digital forensic research. Its scope supports diverse malware scenarios and offers potential for broader applications in incident response and automated threat mitigation.

Documents
README File.pdf - Accepted Version (550kB)