Disanayaka A.D.M.M.S

it21010026@my.sliit.lk

Advanced Real-Time Log Analysis and Anomaly Detection

In the proposed system, real-time log analysis for anomaly detection refers to continuously examining log data as it is generated in order to uncover unexpected patterns or behaviors that may reveal a security threat, system failure, or illegal activity. The system applies deterministic rules to find symptoms in logs that are easy to spot using common keywords or key phrases such as "error," "failure," or "warning." Anomalies of this kind can be distinguished quickly. Beyond them, the solution comprises data collection, preprocessing, and machine learning models for identifying anomalies, covering supervised, unsupervised, and semi-supervised learning. Each type of model has different uses and performs best under particular conditions, which is the focus of this article. Such systems require an appropriate event log dataset. Because no existing dataset matched the requirements of this system, a custom dataset was created in which various system events were tracked to detect anomalies. It captures essential details such as event types, timestamps, user information, and risk levels, which are crucial for identifying patterns of normal and suspicious activity.
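
The deterministic keyword check described above can be sketched as follows. This is a minimal illustration rather than the system's actual implementation: only the three keywords come from the text, while the function name `flag_by_keyword`, the sample log lines, and the choice of case-insensitive whole-word matching are assumptions made for the example.

```python
import re

# Keywords named in the text; the matching strategy itself is an assumption.
KEYWORDS = ("error", "failure", "warning")

# Whole-word, case-insensitive pattern, so "terror" does not match "error".
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def flag_by_keyword(line: str) -> list[str]:
    """Return the distinct keywords found in a log line, lowercased,
    in order of first appearance."""
    found = []
    for match in _PATTERN.finditer(line):
        word = match.group(1).lower()
        if word not in found:
            found.append(word)
    return found


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "2024-01-01 10:00:00 INFO User logged in",
        "2024-01-01 10:05:00 ERROR Disk write failure on volume C:",
    ]
    for line in sample:
        print(flag_by_keyword(line), line)
```

A rule of this kind only catches events that announce themselves in the log text, which is why the machine learning models are needed for everything else.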

The dataset is labeled with risk categories such as "Low," "Medium," and "High." These labels are based on predefined rules that assess the severity of an event and the likelihood that it is associated with a security threat, which makes the dataset suitable for supervised learning models that classify events as normal or abnormal. By including user and system details, along with the outcome of each event, the dataset enables a comprehensive analysis that helps train models to spot unusual behavior. This is essential for building an anomaly detection system that can quickly identify and address potential security threats in VDI environments. Several suspicious activities that mainly affect Windows remote machines were used for this purpose.

The system diagram describes an anomaly detection system that uses an advanced machine learning model to filter the event logs stored in the system. The model is trained on these stored event logs. After the event data is analyzed, event feedback is produced at 5-minute intervals, and suspicious activities in the system are identified from this feedback. The identified suspicious activities are then terminated automatically by the system, and every terminated activity is reported to the admin through an alert. Alerts are also sent to the appropriate teams for anomalies flagged by the ML models that distinguish normal from anomalous data. In this way, the system handles threats quickly while the admin stays informed and ready to take further action if needed. The sensitive information in the system should be protected from most authorized users.

The use cases covered by the anomaly detection system are immediate threat detection and response, reduced false positives, continuous monitoring, proactive threat hunting, operational efficiency and performance monitoring, and data privacy and security.
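
One way to picture the 5-minute feedback cycle mentioned above is to group events by the 5-minute window in which they occurred, so that each window can be scored as a batch. The sketch below assumes events arrive as `(timestamp, description)` pairs; the function names and the sample events are hypothetical, and only the 5-minute interval comes from the text.

```python
from collections import defaultdict
from datetime import datetime

WINDOW_MINUTES = 5  # feedback interval stated in the text


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute window."""
    floored_minute = ts.minute - ts.minute % WINDOW_MINUTES
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def group_by_window(events):
    """Group (timestamp, description) pairs by 5-minute window start."""
    windows = defaultdict(list)
    for ts, description in events:
        windows[window_start(ts)].append(description)
    return dict(windows)


if __name__ == "__main__":
    # Hypothetical events, for illustration only.
    events = [
        (datetime(2024, 1, 1, 10, 1, 12), "logon"),
        (datetime(2024, 1, 1, 10, 4, 59), "process start"),
        (datetime(2024, 1, 1, 10, 7, 30), "event log cleared"),
    ]
    for start, batch in sorted(group_by_window(events).items()):
        print(start.isoformat(), batch)
```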
In developing the real-time log analysis and anomaly detection system for VDI, the authors decided to use .NET as the framework, because the system is specifically developed to operate on the Windows platform. .NET is a framework developed by Microsoft and is therefore well suited to building applications on Windows.

Since a large portion of the VDI environment is based on Windows, .NET helps ensure that the system integrates fully with the established infrastructure. This choice lets the system parse logs, identify problematic patterns, and act on potential security risks in real time while leveraging the strengths of the Windows architecture. In short, .NET was chosen so that the system operates efficiently in a Windows-based VDI infrastructure.

The model was trained on the dataset using Random Forest, which achieved the highest accuracy among all the algorithms compared. Logistic Regression, LDA, and Naive Bayes were also tried, but their performance was subpar.

To manage security risks, this research groups events into four risk levels so that risks can be identified quickly and responded to. High-risk events include privilege escalation, unusual login attempts, and event log clearing; Critical events also belong to this level. Medium-risk events include new or unknown processes, unusual network activity, and Error events. Low-risk events include file and directory changes, unexpected system changes, and Warning events. Normal events are informational; they are unremarkable but still important and must be observed continually. This grouping helps prioritize the actions to take and prevent further damage to system security.

Medium-risk activities do not meet the criteria for high risk but should not be taken lightly. Examples include the emergence of new or unidentified processes, irregular network activity, or errors that exceed the system's threshold. The timing of such activities, however, is extremely important. If medium-risk events occur outside business hours, such as late in the evening or at night, they need to be treated as critical risks. Irregular processes, deviations from normal network traffic, or errors at odd hours are probably not part of normal business processes and may therefore indicate a security breach. Activity during such periods deserves close attention, since it may signal an ongoing or imminent security threat.
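
The risk grouping and the off-hours escalation rule described above can be expressed as a small lookup plus one time check. This is a sketch under stated assumptions: the event-type strings are hypothetical labels for the categories in the text, the business hours of 08:00 to 18:00 are assumed (the text only says "late evening or night-time"), and escalated events are mapped to "High" because the text places Critical events in the high-risk level.

```python
from datetime import datetime, time

# Categories follow the text; the exact string labels are hypothetical.
RISK_BY_EVENT = {
    "privilege_escalation": "High",
    "unusual_login_attempt": "High",
    "event_log_cleared": "High",
    "critical": "High",
    "new_or_unknown_process": "Medium",
    "unusual_network_activity": "Medium",
    "error": "Medium",
    "file_or_directory_change": "Low",
    "unexpected_system_change": "Low",
    "warning": "Low",
}

# Assumed business hours; not specified in the text.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def risk_level(event_type: str, timestamp: datetime) -> str:
    """Map an event to a risk level, escalating off-hours Medium events."""
    # Anything not listed is treated as a Normal (informational) event.
    level = RISK_BY_EVENT.get(event_type, "Normal")
    in_business_hours = BUSINESS_START <= timestamp.time() < BUSINESS_END
    if level == "Medium" and not in_business_hours:
        # Off-hours medium-risk events are treated as critical,
        # represented here by the High level (an assumption).
        return "High"
    return level


if __name__ == "__main__":
    print(risk_level("error", datetime(2024, 1, 1, 10, 0)))   # business hours
    print(risk_level("error", datetime(2024, 1, 1, 23, 30)))  # night-time
```

Keeping the escalation as a separate step after the lookup means the base grouping stays readable and the business-hours window can be adjusted without touching the event table.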

Results & Discussions

The Random Forest model for real-time log analysis and anomaly detection achieved a very high overall accuracy of 99%. The classification report showed near-perfect precision, recall, and F1-scores across the classes, with classes 1 and 3 reaching an F1-score of 100%. The macro and weighted averages further support the model, with the weighted averages nearly perfect at 99%, as shown in the classification report.
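
For readers unfamiliar with the two averages in a classification report, the sketch below shows how macro and weighted F1 differ. The macro average treats every class equally, while the weighted average weights each class by its number of true samples. The function names and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} computed from paired label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Return (macro F1, support-weighted F1)."""
    scores = f1_per_class(y_true, y_pred)
    macro = sum(f1 for f1, _ in scores.values()) / len(scores)
    total = sum(n for _, n in scores.values())
    weighted = sum(f1 * n for f1, n in scores.values()) / total
    return macro, weighted


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 1]
    macro, weighted = macro_and_weighted_f1(y_true, y_pred)
    print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

When the classes are imbalanced, the two averages can diverge, so agreement between them suggests that performance is consistent across both frequent and rare classes.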

Technologies

The following technologies were used to develop the system:

HTML 100%
RF Classifier - Model Training 90%
JavaScript 75%
Python 80%
.NET 8 90%
C# 55%