Does Securing APIs through Machine Learning Make Sense?
API Security and Machine Learning
The increasing number of API breaches paints a grim picture of how the security of deployed APIs is evolving. Two recent incidents provide evidence. First, Uber was reportedly breached in 2019: an attacker could supply a victim’s phone number or email address in an API request and obtain the victim’s UUID. That UUID could then be used to query for the victim’s personal information. In addition, the resulting account takeover allowed the attacker to perform application activities such as booking rides or making other monetary transactions, all through API calls.
A similar vulnerability was found in Apple’s Sign In capability. It allowed the takeover of an arbitrary user’s account, provided the attacker knew the victim’s email address. Specifically, Apple’s API that issues the JWT required to log in to a third-party application could be triggered to issue a JWT for an arbitrary email address.
The microservice architecture has led to an explosion in the number of APIs managed by organizations both large and small, deployed in both public and private clouds. However, designs and architectures for securing these APIs against attacks have fallen short. This is where machine learning can shine. Automated threat detection built on malicious pattern recognition (supervised learning) and anomaly detection (unsupervised learning) can help rapidly close the gap between API expansion and security shortcomings.
In this article, we discuss three of the use cases where CloudVector has leveraged machine learning to improve the security of our customers’ API ecosystems, and we highlight the data insights that influence our customers’ business.
Use Case 1: PII Labeling
An API, by definition, is an interface or liaison between the client issuing the API request and the backend data sources. As part of the data exchange, the API may ferry data elements that personally identify individuals. Take, for example, the Twitter mobile application. When a user retrieves tweets for a Twitter account, the API supporting the call also returns PII such as the name or location of the account holder, in addition to the tweets and related statistics. Securing such internet-accessible APIs is therefore critical to protect against abuse and possible data breaches. Indeed, such incidents have occurred, as noted with Uber and Apple above.
Securing such APIs begins with identifying which APIs in the ecosystem carry sensitive data such as name, address, SSN, or credit card number. Once the APIs carrying sensitive data have been identified, the necessary governance and protection mechanisms can be prioritized for them. Determining labels for individual API parameters is feasible with CloudVector’s deployment architecture, since the platform allows the API payload to be inspected to derive metadata.
Figure 1: An example of CloudVector labels assigned to an API’s parameters
Figure 1 shows an example where CloudVector assigns one label to two parameters of an API call. CloudVector uses supervised learning algorithms to build a machine learning model that identifies the sensitive data label associated with each API parameter. Take, for example, the identification of Name. While supervised learning typically means distinguishing one class from another, for Name classification we use a one-class classification approach, where the goal is to decide whether a sample is a Name or not. Specifically, we use one-class SVM, Local Outlier Factor, and other relevant algorithms to build and evaluate the model.
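To make the one-class idea concrete, here is a minimal sketch of Local Outlier Factor used as a Name detector, written in plain Python. It is an illustration only, not CloudVector’s model: the three character-level features, the training names, the neighbor count k=3, and the 1.5 score threshold are all assumptions chosen for the example. The model is trained only on positive samples (known names) and flags values whose local density is much lower than that of their nearest training neighbors.

```python
import math

# Hypothetical training set: known-good Name values only (one-class setup).
KNOWN_NAMES = [
    "John Smith", "Mary Ann Jones", "Wei Chen", "Priya Patel",
    "Olga Ivanova", "Tom Lee", "Carlos Diaz Ruiz", "Alexander Hamilton",
]


def featurize(value):
    """Map a parameter value to illustrative features (assumed, not CloudVector's)."""
    tokens = value.split()
    n = max(len(value), 1)
    return (
        sum(c.isalpha() for c in value) / n,                     # share of letters
        sum(c.isdigit() for c in value) / n,                     # share of digits
        sum(t.istitle() for t in tokens) / max(len(tokens), 1),  # Title Case tokens
    )


class NameLOF:
    """Local Outlier Factor fitted on names only; high scores mean 'not a Name'."""

    def __init__(self, samples, k=3, threshold=1.5):
        self.k = k
        self.threshold = threshold
        self.points = [featurize(s) for s in samples]
        # k nearest training neighbors of every training point, as (index, distance).
        self.neighbors = [self._knn(p, skip=i) for i, p in enumerate(self.points)]
        self.k_dist = [nbrs[-1][1] for nbrs in self.neighbors]
        self.lrd = [self._lrd(nbrs) for nbrs in self.neighbors]

    def _knn(self, point, skip=None):
        dists = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(self.points)
            if j != skip
        )
        return [(j, d) for d, j in dists[: self.k]]

    def _lrd(self, nbrs):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(self.k_dist[j], d) for j, d in nbrs]
        return 1.0 / (sum(reach) / len(reach) + 1e-10)

    def score(self, value):
        # LOF: average neighbor density divided by the density around the query.
        nbrs = self._knn(featurize(value))
        return sum(self.lrd[j] for j, _ in nbrs) / len(nbrs) / self._lrd(nbrs)

    def is_name(self, value):
        return self.score(value) <= self.threshold


model = NameLOF(KNOWN_NAMES)
for candidate in ["Grace Hopper", "4111 1111 1111 1111", "user@example.com"]:
    print(candidate, round(model.score(candidate), 2), model.is_name(candidate))
```

A score close to 1 means the value sits in a region about as dense as its neighbors, while a much larger score marks it as an outlier relative to the Name class. Features this crude would misclassify lowercase names, so a production model would need richer features and far more training data.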
Use Case 2: Anomaly Detection
Typical data breaches and Distributed Denial of Service (DDoS) attacks follow a pattern that is unusual in the context of normally observed traffic. In the case of a data breach, an unusual number of records or bytes is extracted through API responses. In a DDoS attack, the victim API sees an abnormal number of requests to the endpoint, an abnormal amount of data in the API request payload, or other payload-specific quirks. Protecting the API against such attacks shields the target organization from significant financial and reputational loss.
At CloudVector, we build anomaly detection models based on various metrics that characterize an API. For example, to guard against a data breach, we build the anomaly detection model on the metric that tracks the size of the API response payload. If a typical client-API exchange produces a 50KB response payload, a 1MB response could be worrisome. Such a metric model characterizes the time series of observed values for the metric and helps forecast bounds for normal interactions with the API. We use the Holt-Winters forecasting approach to model the time series. CloudVector also captures several metadata metrics out of the box for monitoring, including API response payload size and number of incoming requests.
Monitoring a single metric, as described above, is only one of several ways to look for anomalies. CloudVector’s Anomaly Detection platform comprises a suite of systems built to look for intrusion attempts across the entire ecosystem.
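The forecast-bounds idea can be sketched with a small additive Holt-Winters implementation in plain Python. This is an illustration under stated assumptions, not CloudVector’s implementation: the sample response sizes, the season length of 4, the smoothing parameters, and the band of three residual standard deviations around the one-step forecast are all hypothetical choices.

```python
from statistics import pstdev


def holt_winters_bounds(series, season_len, alpha=0.3, beta=0.05, gamma=0.2, width=3.0):
    """Return (lower, upper) bounds for the next value of `series`.

    Additive Holt-Winters with one-step-ahead forecasts; the band is the
    forecast +/- `width` standard deviations of the in-sample residuals.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of history")

    # Initialise level, trend and seasonal offsets from the first two seasons.
    first, second = series[:season_len], series[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [value - level for value in first]

    residuals = []
    for t in range(season_len, len(series)):
        pos = t % season_len
        forecast = level + trend + seasonal[pos]
        residuals.append(series[t] - forecast)
        # Standard additive updates for level, trend and seasonality.
        new_level = alpha * (series[t] - seasonal[pos]) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal[pos] = gamma * (series[t] - new_level) + (1 - gamma) * seasonal[pos]
        level = new_level

    forecast = level + trend + seasonal[len(series) % season_len]
    margin = width * pstdev(residuals)
    return forecast - margin, forecast + margin


# Hypothetical response sizes in KB, sampled at a fixed interval, season of 4.
RESPONSE_KB = [48, 52, 50, 49, 47, 53, 51, 50, 49, 52, 50, 51,
               48, 51, 49, 50, 48, 53, 50, 49, 47, 52, 51, 50]

lower, upper = holt_winters_bounds(RESPONSE_KB, season_len=4)


def is_anomalous(size_kb):
    """Flag a response size that falls outside the forecast band."""
    return not (lower <= size_kb <= upper)


print(is_anomalous(48.0))    # a typical response size
print(is_anomalous(1024.0))  # a 1MB response
```

With history hovering around 50KB, the forecast band stays within a few KB of that level, so a 1MB response falls far outside it. In practice the band width trades off false alarms against missed exfiltration and would be tuned per endpoint.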
Use Case 3: Risk Scoring APIs
The final use case we discuss is how to score APIs based on their identified meta properties. The computed scores reflect the risk associated with an API endpoint and quantify both the impact and the likelihood of an attack succeeding against the API.
How do we compute the risk score for an API? We use regression models whose input attributes include the factors discussed above. For example, APIs carrying sensitive data are expected to carry more risk than those that do not. Similarly, anomalies observed for an API endpoint suggest intrusion attempts and push the risk score up, which should grab the attention of security admins and help them define the necessary protection policies. We use more than 10 factors to compute the risk score.
In this article, we summarized how CloudVector’s machine learning technology can help close the widening gap between API ecosystem expansion and lagging security measures. We discussed how CloudVector leverages ML for PII identification, a critical use case for understanding the API landscape and influencing business decisions. We also provided a peek into the anomaly detection capability that protects APIs against both common and impactful attacks. Finally, we discussed how risk-scoring APIs gives security administrators the insight they need to take the next steps for protection.
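As a rough illustration of regression-based scoring, the sketch below fits a linear regression by batch gradient descent in plain Python. Everything specific here is an assumption made for the example: the three factors (whether the API carries PII, its anomaly rate, whether it is internet-facing), the 0 to 100 scale, and the labeled training scores are hypothetical, and the article does not say which kind of regression CloudVector uses.

```python
# Hypothetical training data: (carries_pii, anomaly_rate, internet_facing)
# paired with an analyst-assigned risk score on a 0-100 scale.
FACTORS = [
    (0, 0.0, 0), (0, 0.5, 0), (1, 0.0, 0), (1, 1.0, 0),
    (0, 0.0, 1), (0, 1.0, 1), (1, 0.5, 1), (1, 1.0, 1),
]
SCORES = [10, 25, 50, 80, 30, 60, 85, 100]


def fit_linear_regression(rows, targets, lr=0.3, epochs=5000):
    """Fit weights and bias by batch gradient descent on mean squared error."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, targets):
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


WEIGHTS, BIAS = fit_linear_regression(FACTORS, SCORES)


def risk_score(factors):
    """Predict a risk score for one API, clamped to the 0-100 scale."""
    raw = BIAS + sum(w * x for w, x in zip(WEIGHTS, factors))
    return max(0.0, min(100.0, raw))


print(round(risk_score((1, 0.5, 0)), 1))  # PII-carrying API with some anomalies
print(round(risk_score((0, 0.5, 0)), 1))  # same anomaly rate, no PII
```

The learned weights show how much each factor contributes to the score, which is one reason a regression model suits this task: a security admin can see why an endpoint ranks high, not just that it does.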
If you would like to know more about the CloudVector platform or how we further utilize machine learning for API protection, please feel free to reach out at firstname.lastname@example.org.