- AUTHOR Mahesh Immanni
- PUBLISHED ON January 22, 2021
Machine Learning: What Is It?
Machine learning is the study of computer algorithms that allow programs to improve automatically through experience. An algorithm is a set of rules or instructions that a programmer specifies and a computer can process. In simple words, machine learning algorithms learn from experience, much as humans do.
Machine learning gives a system the ability to learn from experience without being explicitly programmed, and to keep improving as new experiences are taken into account.
Machine learning is a subset of artificial intelligence.
How Has Machine Learning Evolved?
Machine learning evolved from pattern recognition and learning theory; computers use it to learn to perform specific tasks and to improve from data. They learn from previous computations to produce reliable, repeatable decisions and results. Machine learning is now used at large scale in areas such as self-driving cars like Google's, online recommendations such as those from Amazon, analysis of customer feedback on social media sites, and security and fraud detection.
Machine Learning (ML) in the Context of Security
As technology evolves, attackers have learned to break into highly secured systems and capture confidential data. New security threats now emerge faster than ever, so anti-virus and anti-malware products must evolve just as quickly to mitigate them.
Machine learning is an important tool in the security domain for safeguarding confidential data and detecting security breaches. It helps automate the process of finding, contextualizing, and triaging relevant data at any stage of the threat intelligence lifecycle.
What Security Context Are We Talking About?
Security is a very broad term. It can refer to physical access to resources by breaking into physical infrastructure, to virtual access to resources through hacking or social engineering, or to viruses, malware, and ransomware.
The impact of cyber attacks can be reduced by upholding the following three principles:
- Confidentiality: sensitive data is disclosed only to authorized parties who have the right to access and view it.
- Integrity: sensitive data is protected from deletion or modification by unauthorized parties, and if an authorized party deletes such data through human error, the damage can be reversed.
- Availability: sensitive data can be accessed by the right people through secure access channels safeguarded by authentication systems.
Machine learning plays an important role in areas such as:
- Threat identification
- Network vulnerability analysis
- Automated response
- Alerts about malicious hackers
- Endpoint protection
- Cloud data protection
How can ML Help in the Context of Security?
There are a few ways ML can contribute to improving security:
- Detecting anomalies by learning what normal user behavior looks like and what does not
- Using classification to determine whether a given executable is potential badware
- Analyzing patterns to learn how to prevent attacks and respond to changing behavior
- Being more proactive in preventing threats and responding to active attacks in real time
- Reducing the amount of time spent on routine tasks
- Enabling organizations to use their resources more strategically
How does it work?
In anomaly detection, the system is trained on the action sequences that goodware performs. When such a model is put to the test and sees a non-standard sequence of actions, it flags that sequence as an anomaly. The problem with this approach is that it may be hard to build a data set large enough to represent the actions goodware performs.
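As a rough illustration of the anomaly detection idea, the sketch below learns which pairs of consecutive actions appear in goodware traces and scores a new trace by the share of pairs it has never seen. The action names, the bigram profile, and the 0.3 threshold are all assumptions made for this example; a real product would use far richer traces and models.

```python
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(actions: List[str]) -> List[Bigram]:
    """Return consecutive action pairs, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(actions, actions[1:]))


def train_normal_profile(sequences: Iterable[List[str]]) -> Set[Bigram]:
    """Collect every action pair observed in known-good traces."""
    profile: Set[Bigram] = set()
    for seq in sequences:
        profile.update(bigrams(seq))
    return profile


def anomaly_score(profile: Set[Bigram], actions: List[str]) -> float:
    """Fraction of action pairs in the trace that were never seen in training."""
    pairs = bigrams(actions)
    if not pairs:
        return 0.0
    unseen = sum(1 for pair in pairs if pair not in profile)
    return unseen / len(pairs)


def is_anomalous(profile: Set[Bigram], actions: List[str], threshold: float = 0.3) -> bool:
    """Flag the trace when too many of its action pairs are unfamiliar."""
    return anomaly_score(profile, actions) > threshold


# Hypothetical goodware traces (illustrative action names only).
goodware_traces = [
    ["open_file", "read_file", "close_file"],
    ["open_file", "write_file", "close_file"],
    ["connect", "send", "receive", "disconnect"],
]
profile = train_normal_profile(goodware_traces)

normal_trace = ["open_file", "read_file", "close_file"]
suspicious_trace = ["open_file", "read_file", "encrypt_file", "delete_file", "connect", "send"]

print(is_anomalous(profile, normal_trace))
print(is_anomalous(profile, suspicious_trace))
```

The sketch also shows the data problem mentioned above: any legitimate action pair that is missing from the small training set would be counted as unfamiliar.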
In classification, one possible process is to extract features from the executable and use them as the basis for training the machine learning models. This requires a large set of known goodware and known badware to form the training, validation, and test data.
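The feature-extraction step can be sketched as follows. This is a toy example, not a real malware detector: the two features (byte entropy and share of printable bytes), the nearest-centroid classifier, and the synthetic byte strings standing in for executables are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def extract_features(blob: bytes) -> List[float]:
    """Two toy features in [0, 1]: normalized byte entropy and printable-ASCII ratio."""
    if not blob:
        return [0.0, 0.0]
    total = len(blob)
    counts = Counter(blob)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    printable = sum(1 for b in blob if 32 <= b < 127) / total
    return [entropy / 8.0, printable]


def train_centroids(samples: Sequence[Tuple[bytes, str]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each label to get one centroid per class."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for blob, label in samples:
        feats = extract_features(blob)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids: Dict[str, List[float]], blob: bytes) -> str:
    """Assign the label whose centroid is closest to the sample's features."""
    feats = extract_features(blob)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Synthetic stand-ins for labelled executables (assumption: packed or encrypted
# badware looks like high-entropy bytes, goodware like readable text).
training_set = [
    (b"MZ plain installer strings and readable resources " * 8, "goodware"),
    (b"config=1; name=editor; version=2.0; " * 10, "goodware"),
    (bytes(range(256)) * 4, "badware"),
    (bytes((i * 37 + 11) % 256 for i in range(1024)), "badware"),
]
centroids = train_centroids(training_set)

unknown_text_like = b"readme: this tool prints a report " * 5
unknown_packed_like = bytes((i * 101 + 7) % 256 for i in range(512))

print(classify(centroids, unknown_text_like))
print(classify(centroids, unknown_packed_like))
```

With only four training samples, the classifier simply encodes the assumption baked into the data, which is why a large labelled set of known goodware and badware matters in practice.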
Depending on how the model should improve or learn from newly identified goodware and badware, one can choose between batch learning and online learning.
With batch learning, the vendor can train a new model and deploy it only after validating any improvements, thereby keeping strict control over the model's performance. With online learning, the model may become biased towards a particular usage pattern, which reduces its overall effectiveness.
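The difference can be illustrated with a deliberately simple model. In the sketch below, a hypothetical `ThresholdModel` keeps a running mean of a single suspiciousness score per class and flags anything above the midpoint. The scores, the hold-out set, and the skewed stream from a single site are invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple

Sample = Tuple[float, int]  # (suspiciousness score, label): 1 = badware, 0 = goodware


@dataclass
class ThresholdModel:
    good_mean: float = 0.0
    bad_mean: float = 0.0
    good_n: int = 0
    bad_n: int = 0

    def update(self, score: float, label: int) -> None:
        """Fold one labelled sample into the running mean of its class."""
        if label == 1:
            self.bad_n += 1
            self.bad_mean += (score - self.bad_mean) / self.bad_n
        else:
            self.good_n += 1
            self.good_mean += (score - self.good_mean) / self.good_n

    @property
    def threshold(self) -> float:
        """Decision boundary: midpoint between the two class means."""
        return (self.good_mean + self.bad_mean) / 2

    def predict(self, score: float) -> int:
        return 1 if score > self.threshold else 0


def train_batch(samples: Iterable[Sample]) -> ThresholdModel:
    """Batch learning: build a fresh model from the full curated data set."""
    model = ThresholdModel()
    for score, label in samples:
        model.update(score, label)
    return model


def accuracy(model: ThresholdModel, samples: Iterable[Sample]) -> float:
    items = list(samples)
    return sum(model.predict(score) == label for score, label in items) / len(items)


# Vendor side: train on curated data, validate on a hold-out set, then deploy.
vendor_data = [(0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1)]
holdout = [(0.25, 0), (0.65, 1), (0.95, 1)]
batch_model = train_batch(vendor_data)

# Online learning: a deployed copy keeps updating on one site's skewed stream,
# here benign tools that happen to score unusually high.
online_model = replace(batch_model)
for score, label in [(0.6, 0), (0.7, 0), (0.7, 0), (0.6, 0)]:
    online_model.update(score, label)

print(accuracy(batch_model, holdout), batch_model.threshold)
print(accuracy(online_model, holdout), online_model.threshold)
```

In this toy run the online copy raises its threshold after seeing one site's traffic and then misses the hold-out sample scored 0.65, which is the kind of drift the batch workflow is meant to catch before deployment.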
Challenges to achieve good efficiency
There are a few challenges in creating a good model that generalizes well enough to handle unseen scenarios:
- Having a data set that is large enough and representative of both goodware and badware to avoid sampling bias. Sampling bias leads to models that do not generalize: they perform well on the training data but may perform poorly on new instances that were not seen during training.
- Selecting features that are relevant to distinguishing goodware from badware. Too many irrelevant features add noise and result in poor training data.
- Overcoming threats requires organizations to implement strategies that may need skilled staff and can prove time consuming in the long run.
- These strategies involve gathering data, processing the data to train the algorithms, engineering the algorithms, and training them to learn from the data in a way that suits the organization’s business goals.
- False correlations occur when things that are completely independent of each other exhibit very similar behavior, which may create the illusion that they are somehow connected.
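The sampling bias and false correlation challenges can be seen together in a small, made-up example. Below, file size in KB is a hypothetical feature: in a biased training sample every badware file happens to be large, so size looks almost perfectly correlated with the label, while a broader sample shows practically no relationship. All numbers are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def size_label_correlation(samples: List[Tuple[float, int]]) -> float:
    """Correlation between file size (KB) and label (1 = badware, 0 = goodware)."""
    sizes = [size for size, _ in samples]
    labels = [label for _, label in samples]
    return pearson(sizes, labels)


# Biased sample: all collected badware happens to be large (invented numbers).
biased_sample = [(120, 0), (150, 0), (130, 0), (900, 1), (950, 1), (870, 1)]

# Broader sample: large and small files appear in both classes (invented numbers).
broader_sample = [
    (120, 0), (900, 0), (150, 1), (950, 0),
    (130, 1), (870, 1), (140, 0), (910, 1),
]

print(round(size_label_correlation(biased_sample), 3))
print(round(size_label_correlation(broader_sample), 3))
```

A model trained on the biased sample would likely lean on file size and then fail on small badware or large goodware, which is why both representative data and relevant features matter.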