Online banking is now a default service at almost every bank in the US. Early versions appeared in the late 1980s, and in 1994 Stanford Federal Credit Union became the first financial institution to offer internet banking to all of its customers. By 2000, online banking had become mainstream, which also created new security headaches because it opened a new avenue for banking fraud.

In traditional banking, transactions were done in person, and the legitimacy of a user was verified using photo IDs and/or established long-term customer relationships. A fraudster had to show up at the bank to compromise a user’s account, which is a high barrier to cross. With online banking, however, a user never interacts with a bank employee in person and is identified solely through a username and password. These credentials can be, and often are, stolen by fraudsters via phishing and social engineering scams. Security analysts have come to realize that a session cannot be blindly trusted just because it uses the correct username/password combination, and that additional attributes are needed to verify that the session is in fact being conducted by a legitimate user.

A common data attribute used by online banking fraud detection solutions is the IP address. By analyzing the location associated with an incoming IP address, along with its other characteristics, fraud detection models try to distinguish between legitimate and non-legitimate sessions. The first generation of such models were rule-based approaches that used simple rules such as “if a user appears from Russia, then generate an alert.” These rules were created in response to the large volume of fraud that banks were experiencing from IP addresses in countries outside the US, such as Russia and Nigeria. Over time, however, these rules became ineffective for two reasons:

  1. Fraudsters spoofed their IP addresses to mimic logins from the US.
  2. Legitimate sessions from outside the US (think military personnel or customers traveling abroad) triggered alerts under this model, creating a high false positive rate and many frustrated bank customers. For example, if a user has been seen logging in from Russia several times before, it does not make sense to generate an alert for them appearing “from Russia.”
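
To make the two failure modes concrete, here is a minimal sketch of what such a first-generation rule might look like. The country list, the `country_rule_alert` function, and the two sample sessions are all hypothetical and chosen only for illustration; they are not taken from any real fraud detection product.

```python
# Illustrative country list only; not taken from any real product.
FLAGGED_COUNTRIES = {"RU", "NG"}


def country_rule_alert(ip_country: str) -> bool:
    """First-generation style rule: alert when the IP geolocates to a flagged country."""
    return ip_country in FLAGGED_COUNTRIES


# Two hypothetical sessions, one for each failure mode.
sessions = [
    # Fraudster whose traffic appears to come from the US.
    {"user": "fraudster_via_us_ip", "ip_country": "US"},
    # Legitimate customer who has logged in from Russia many times before.
    {"user": "customer_in_russia", "ip_country": "RU"},
]

alerts = {s["user"]: country_rule_alert(s["ip_country"]) for s in sessions}
for user, alerted in alerts.items():
    print(user, "ALERT" if alerted else "ok")
```

In this toy setup the fraudster passes silently while the legitimate customer is flagged, which is exactly the combination of missed fraud and false positives described above.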

I experienced this frustration myself while traveling for work in 2010. I was on a business trip in Basel, Switzerland, and when I logged into my bank account to pay some bills, my access was denied. I was asked to enter a “Secure Code” that would be sent to my cellphone. However, I did not have international roaming service and could not receive the code, so the security feature simply locked me out and left me extremely frustrated.

Rule-based models continued to be enhanced, adding ways to keep track of a user’s footprint and using rules like “if a user appears from a new location, then generate an alert.” However, even these models turned out to be too simplistic and had high false positive rates. Such rules generate an alert every time a user accesses their account from a new location and do not factor in the possibility that the user might be moving around within a local area, or might be a frequent business traveler who is constantly changing location. Moreover, fraudsters have adapted to these rules by choosing IP addresses based on users’ zip codes.
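
A rough sketch of such a footprint rule shows why it is noisy for mobile users. The `NewLocationRule` class, the user name, and the list of cities are hypothetical; a real system would presumably work from IP geolocation data rather than city names typed in by hand.

```python
from collections import defaultdict


class NewLocationRule:
    """Alert whenever a user logs in from a location not seen before for that user."""

    def __init__(self) -> None:
        self.footprint = defaultdict(set)  # user -> set of locations seen so far

    def check(self, user: str, location: str) -> bool:
        seen = self.footprint[user]
        # The very first login only establishes the footprint and does not alert.
        alert = bool(seen) and location not in seen
        seen.add(location)
        return alert


rule = NewLocationRule()
trip = ["Boston", "Chicago", "Denver", "Chicago", "Seattle"]
traveler_alerts = [rule.check("frequent_traveler", city) for city in trip]
print(traveler_alerts)  # [False, True, True, False, True]
```

Three of this traveler's five logins raise an alert even though every one of them is legitimate, while a fraudster who picks an IP address near the user's usual location would raise none.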

In another personal experience, I opened a bank account and signed up for online access. I created the credentials while I was at work in Mountain View, CA, and then logged in a couple of times later in the day from home in Foster City, CA. The next morning I got a call from the bank saying that they had noticed an unusual login pattern on my account.

The fact is that every user is different, and rules by nature do not learn. Rules are based on known patterns of fraud and are eventually bound to fail, because fraudsters will sooner or later figure out the rules and change their mode of operation. More importantly, rules do not take into account that some of the patterns seen in fraud are also common among legitimate users, which can result in a high number of false alerts.

The IP address is a valuable source of data, but instead of putting effort into creating and maintaining a blacklist of IP addresses or fraud patterns, fraud detection algorithms should focus on users and try to predict why, when, and how a legitimate user might change their IP address. This would make it much harder for fraudsters to beat the system (or adapt their strategy), because doing so would require access to the user’s historical behavior, which is not easily accessible. In other words, fraud detection models would be much more effective and robust if they focused on modeling users’ behavior instead of looking for signatures found in past fraud cases.
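
As a very rough illustration of the difference in mindset, the sketch below scores a login from a new location against that user's own history instead of against a global rule. The scoring formula, the `new_location_risk` function, and the sample histories are assumptions made up for this example; this is a toy, not a description of any actual behavioral model.

```python
from typing import Sequence


def new_location_risk(history: Sequence[str], location: str) -> float:
    """Toy per-user score in [0, 1] for a login from `location`.

    Assumption for illustration: a user who often shows up in new places
    makes another new place less surprising.
    """
    if location in history:
        return 0.0  # already part of this user's footprint
    if not history:
        return 1.0  # no baseline yet, so nothing to compare against
    # Share of past sessions that introduced a location for the first time.
    novelty_rate = len(set(history)) / len(history)
    return 1.0 - novelty_rate


# Hypothetical histories: one user rarely moves, the other travels constantly.
homebody = ["Foster City"] * 19 + ["Mountain View"]
traveler = ["Boston", "Chicago", "Denver", "Seattle", "Austin",
            "Miami", "Chicago", "Portland", "Boston", "Phoenix"]

print(new_location_risk(homebody, "Basel"))
print(new_location_risk(traveler, "Basel"))
```

The same login from Basel scores roughly 0.9 for the user who almost never moves and roughly 0.2 for the frequent traveler. A fraudster cannot game a score like this without knowing the individual user's history, which is the point the paragraph above makes.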

The next blog post will give more context on why models based on behavioral analytics are effective at detecting and preventing fraud.