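Malware Detection in Windows Event Logs Using an Outlier Detection Approach Without Event Correlation
(Original Indonesian title: Deteksi Malware Pada Windows Event Logs Dengan Pendekatan Deteksi Outlier Tanpa Korelasi Kejadian)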
Institution
Institut Teknologi Sepuluh Nopember
Author
Achmad, Riki Mi'roj
Subject
T57.5 Data Processing 
Datestamp
2023-08-18 01:16:31 
Abstract:
Malware (malicious software) is software covertly inserted into a system with the intention of compromising the confidentiality, integrity, or availability of the victim's data, applications, or operating system. On Windows, malware can be detected using the Windows Event Log, which records the activities that occur in the operating system. However, detecting malware manually by inspecting the event log is very difficult because of the sheer volume of events it records. Machine learning is widely used across many industrial sectors, and this study applies machine learning algorithms to detect malware in the event logs generated by the Windows Event Log with the help of Sysmon (System Monitor). An outlier detection approach is used because the events caused by malware activity may be outliers, or anomalies, among the normal events on Windows; the outlier detection algorithms search the dataset built from the collected event logs for such anomalies. Three outlier detection models are used in this study: Isolation Forest, Local Outlier Factor, and One-Class SVM. The accuracy and effectiveness of each model are also analysed to determine which performs best. Based on the results and analysis, the best overall performance in terms of f1-score was obtained by the Local Outlier Factor model, with a highest f1-score of 0.9873.
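The comparison of the three detectors can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy (the abstract does not name a library), uses synthetic data in place of the Sysmon event dataset, and picks hypothetical hyperparameters (`contamination`, `nu`, `n_neighbors`). All three estimators label inliers as +1 and outliers as -1; the sketch treats an outlier as predicted malware (the positive class) and scores each model with the f1-score, F1 = 2PR / (P + R), where P is precision and R is recall. It does not reproduce the thesis results.

```python
# Assumption: scikit-learn and NumPy; synthetic data stands in for the
# pre-processed Sysmon event features, which are not available here.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical data: 475 "normal" events clustered near the origin and
# 25 "malware" events scattered far from that cluster, 20 features each.
normal = rng.normal(0.0, 1.0, size=(475, 20))
malicious = rng.uniform(-8.0, 8.0, size=(25, 20))
X = np.vstack([normal, malicious])
y_true = np.array([0] * len(normal) + [1] * len(malicious))  # 1 = malware

# Assumed known outlier fraction (5%); with real logs this must be tuned.
contamination = len(malicious) / len(X)

models = {
    "Isolation Forest": IsolationForest(contamination=contamination, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=contamination),
}

scores = {}
for name, model in models.items():
    # Each estimator returns +1 for inliers and -1 for outliers.
    labels = model.fit_predict(X)
    y_pred = (labels == -1).astype(int)  # an outlier is treated as malware
    scores[name] = f1_score(y_true, y_pred)
    print(f"{name}: f1-score = {scores[name]:.4f}")
```

Setting `contamination` and `nu` to the true outlier fraction is a simplification made for the sketch; on real event logs that fraction is unknown, and these parameters largely determine how many events each model flags.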
The second-best model was One-Class SVM, with a highest f1-score of 0.9451, followed by Isolation Forest, with a highest f1-score of 0.7620. The most influential features in the dataset were identified after pre-processing, in which features with more than 70% missing values were converted to binary values (0 or 1) and the TargetFilename feature was replaced by its string length. Feature selection was then carried out using Principal Component Analysis (PCA), resulting in 20 features: TargetFilename, EventID, TargetProcessGuid, EventType, TargetImage, PreviousCreationUtcTime, DestinationHostname, Company, Description, Product, IntegrityLevel, CreationUtcTime, StartFunction, ParentProcessGuid, User, LogonId, ParentProcessId, TerminalSessionId, RuleName, and TargetObject.
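The two pre-processing rules described above can be sketched in plain Python. This is a hypothetical illustration rather than the author's code: the record format (a list of dicts keyed by Sysmon field names), encoding a sparse feature as a presence flag (1 if a value exists, 0 otherwise), treating None and empty strings as missing, and encoding a missing TargetFilename as length 0 are all assumptions, since the abstract does not specify them.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]

# Threshold taken from the abstract: "missing value of more than 70%".
MISSING_THRESHOLD = 0.70


def is_missing(value: Optional[Any]) -> bool:
    """Treat None and empty strings as missing (an assumption)."""
    return value is None or value == ""


def missing_ratio(records: List[Record], column: str) -> float:
    """Fraction of records in which `column` is absent or missing."""
    if not records:
        return 0.0
    return sum(is_missing(r.get(column)) for r in records) / len(records)


def preprocess(records: List[Record]) -> List[Record]:
    """Hypothetical version of the two pre-processing rules in the abstract."""
    columns = sorted({c for r in records for c in r})
    # Sparse columns (>70% missing) become 0/1 presence flags; TargetFilename
    # is handled separately because it is replaced by its string length.
    sparse = {
        c for c in columns
        if c != "TargetFilename" and missing_ratio(records, c) > MISSING_THRESHOLD
    }
    rows: List[Record] = []
    for r in records:
        row: Record = {}
        for c in columns:
            value = r.get(c)
            if c == "TargetFilename":
                row[c] = 0 if is_missing(value) else len(str(value))
            elif c in sparse:
                row[c] = 0 if is_missing(value) else 1
            else:
                row[c] = value  # other columns are passed through unchanged
        rows.append(row)
    return rows


# Usage with four made-up events (field names follow Sysmon; values are invented).
events = [
    {"EventID": 1, "TargetFilename": None, "RuleName": None},
    {"EventID": 11, "TargetFilename": r"C:\Users\Public\a.exe", "RuleName": None},
    {"EventID": 3, "TargetFilename": None, "RuleName": None},
    {"EventID": 1, "TargetFilename": None, "RuleName": "-"},
]
rows = preprocess(events)
print(rows[1])  # {'EventID': 11, 'RuleName': 0, 'TargetFilename': 21}
```

In this example RuleName is missing in 3 of 4 events (75%), so it becomes a presence flag, while EventID is never missing and passes through unchanged. Any remaining non-numeric columns would still need encoding before the rows are passed to PCA (for example, scikit-learn's `sklearn.decomposition.PCA`) and then to the outlier detectors.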