by Kevin Oden, Ph.D., Managing Director, RMA MVC

Scott Rayburn, Product Marketing Manager

Given the clandestine, illicit nature of money laundering, it is impossible to know how much “dirty money” enters the international banking system each year. As a point of reference, the United Nations Office of Drugs and Crime (UNODC) estimates the total amount of money laundered is somewhere between 2 and 5% of global GDP, or $800 billion to $2 trillion in US dollars, annually.

In the US, banks are required by law to comply with regulations aimed at curbing money laundering. The highest-profile example is the Bank Secrecy Act (BSA) of 1970, which requires financial institutions to assist the government in detecting and preventing money laundering. Yet more than 50 years later, despite investing over $25 billion per year to fight financial crime through anti-money laundering (AML) models and related know your customer (KYC) programs, US banks are likely still missing more money laundering activity than they are catching.

Over time, the models that screen for money laundering activity have grown more sophisticated, with advances in artificial intelligence and machine learning promising to further improve detection and prevention. Let’s take a look at a few common and some not-so-common types of AML models and their pros and cons, then consider some overall best practices for AML model validation that help ensure each model performs as intended.

Rules-Based Models

For decades, rules-based approaches to fraud detection ruled the day – and for some banks, they still do. This classic, expert-based approach is built on the experience of fraud analysts. These models, also known as rules engines, are programmed with if-then statements designed to protect the bank from suspicious activity by, for example, flagging cash transactions over a certain amount (such as $10,000) that occur more than a certain number of times in a given period (such as more than five in two weeks).
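A minimal sketch of such a rule makes the idea concrete. The function name, thresholds, and data layout below are illustrative assumptions, not any bank's actual rules:

```python
from datetime import date, timedelta

def flag_cash_transactions(transactions, amount_threshold=10_000,
                           count_threshold=5, window_days=14):
    """Flag accounts with more than `count_threshold` cash transactions
    over `amount_threshold` inside any rolling `window_days` window.

    `transactions` is a list of (account_id, date, amount) tuples.
    All thresholds are illustrative, not regulatory guidance.
    """
    large_tx_dates = {}
    for account, day, amount in transactions:
        if amount > amount_threshold:
            large_tx_dates.setdefault(account, []).append(day)

    flagged = set()
    window = timedelta(days=window_days)
    for account, days in large_tx_dates.items():
        days.sort()
        for i, start in enumerate(days):
            # count large-cash transactions inside the rolling window
            in_window = [d for d in days[i:] if d - start <= window]
            if len(in_window) > count_threshold:
                flagged.add(account)
                break
    return flagged
```

In practice a rules engine holds hundreds of such if-then branches, each maintained by hand, which is where the maintenance burden discussed below comes from.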

Fraud is dynamic: each advance in detection forces fraudsters to develop new techniques to avoid it. As a result, this deterministic approach often requires frequent manual adjustments, produces many false positives, and cannot adequately assess the relationships among different behaviors. Rules-based AML models are often only as good as the number of branches on the decision tree and the experience of the experts whose judgment the rules encode. These systems are also expensive to build, as the rules must be specified manually by fraud experts.

Regression Models

The second generation of AML models brought linear and logistic regression to the forefront. These models were more powerful in many respects than rules-based models, incorporating large data sets and estimating the cause-and-effect relationships between variables.

However, by its very nature, fraud is a rare event. Fraudsters aim to be the proverbial “needle in the haystack” of transactions, adapting their approaches as earlier “needles” are detected. This rarity creates data imbalances between non-fraudulent and fraudulent activity, which require additional and often complex data modeling and adjustment techniques (e.g., stratified sampling or SMOTE) to balance accuracy and precision.
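One of the simplest rebalancing techniques is random undersampling: shrinking the majority (non-fraud) class to the size of the minority (fraud) class before fitting the regression. The sketch below is a hedged illustration of that idea only (the `undersample` name, dict-based records, and `is_fraud` field are assumptions); production systems typically prefer stratified sampling or synthetic oversampling such as SMOTE, since undersampling discards data:

```python
import random

def undersample(records, label_key="is_fraud", seed=0):
    """Random undersampling: keep every fraud record, plus an equal-sized
    random sample of non-fraud records, yielding a balanced training set.

    `records` is a list of dicts; `label_key` marks the fraud label.
    """
    rng = random.Random(seed)
    fraud = [r for r in records if r[label_key]]
    legit = [r for r in records if not r[label_key]]
    # majority class is cut down to the minority class size
    legit_sample = rng.sample(legit, k=len(fraud))
    balanced = fraud + legit_sample
    rng.shuffle(balanced)
    return balanced
```

On a data set with 10 fraud records among 1,010, this returns a 20-record set with a 50/50 class split, at the cost of throwing away most of the legitimate-activity examples.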

Machine Learning Models

Today, banks are increasingly shifting to large-data, statistically driven approaches because of their improved detection power and their operational and cost efficiencies. Machine learning (ML) is a broad suite of modeling frameworks rather than a single model; indeed, most introductory ML treatments include logistic regression among its techniques. Broadly speaking, the most relevant division of this large and growing set of techniques is supervised learning versus unsupervised learning.

In supervised learning, fraudulent activity is labeled, and the goal is classification: fitting the labeled data in order to predict out of sample. In unsupervised learning, fraud is not labeled, and the goal is to detect deviations from normal behavior – in other words, outlier detection.
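A toy example shows what "deviation from normal behavior" means in the unsupervised setting. This z-score sketch is an assumption-laden stand-in for real outlier-detection methods (which range from clustering to isolation forests), not a production detector:

```python
from statistics import mean, stdev

def zscore_outliers(amounts, threshold=3.0):
    """Flag transaction amounts more than `threshold` standard
    deviations from the mean of the observed population.

    No fraud labels are needed: "normal" is defined by the data itself.
    """
    mu = mean(amounts)
    sigma = stdev(amounts)
    return [a for a in amounts if abs(a - mu) / sigma > threshold]
```

Given fifty $100 transactions and one $100,000 transaction, only the large one deviates far enough from the population to be flagged.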

A non-exhaustive list of supervised learning techniques includes gradient boosting (and its adaptations), random forests, and neural networks. These techniques are powerful: their inherent high dimensionality lets them fit in-sample data with great accuracy. However, overfitting and model instability tend to be common problems.
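The overfitting risk can be made concrete with a deliberately extreme toy model (everything here, including the "memorizer" itself, is hypothetical): a model flexible enough to memorize its training data perfectly scores 100% in-sample, yet no better than chance on new data when the labels are pure noise.

```python
import random

def memorizer_accuracy(seed=0, n_train=200, n_test=200):
    """Contrast in-sample and out-of-sample accuracy for a model that
    simply memorizes every (feature, label) pair it has seen.

    Labels are random coin flips, so nothing generalizable exists
    to learn -- yet in-sample accuracy is perfect.
    """
    rng = random.Random(seed)
    train = [(rng.random(), rng.randint(0, 1)) for _ in range(n_train)]
    lookup = dict(train)  # memorize every training example exactly
    in_sample = sum(lookup[x] == y for x, y in train) / n_train

    test = [(rng.random(), rng.randint(0, 1)) for _ in range(n_test)]

    def predict(x):
        # unseen inputs fall back to the nearest memorized point
        nearest = min(lookup, key=lambda t: abs(t - x))
        return lookup[nearest]

    out_sample = sum(predict(x) == y for x, y in test) / n_test
    return in_sample, out_sample
```

High-capacity models like boosted trees and neural networks sit on a spectrum between this memorizer and a rigid linear model, which is why out-of-sample validation is emphasized below.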

The art of producing an ML model that predicts well out of sample and remains stable continues to evolve, and it requires training and experience. Furthermore, interpretability (cause and effect) is often obscured by the complexity of the models developed. Unsupervised learning techniques can prove a useful complement to the supervised approach, adding a degree of interpretability.

Finally, and very much worth exploring further, is a promising ML technique called Bayesian Rule Lists (BRL), which combines the explainability of expert-based systems with the predictive power of some of the best ML techniques.

Social Network Analysis

The latest modeling technique is social network analysis, which looks for fraud characteristics in a linked network of entities. This approach adds new information to the detection problem: the relationships or connections between individuals or entities in a network.

Because fraudsters tend to learn from one another (“birds of a feather”), this added information enhances the ability to detect, and possibly prevent, fraud. The downside is that it requires additional data and more sophisticated, still-emerging analysis techniques.
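A simple way to see the idea is proximity scoring: accounts a short number of hops from known bad actors inherit elevated risk. The breadth-first sketch below is a minimal illustration under assumed inputs (undirected transfer edges, a seed set of known-fraud accounts); real network analysis uses far richer features than hop distance:

```python
from collections import deque

def risk_by_proximity(edges, known_fraud, max_hops=2):
    """Score accounts by graph distance to known-fraud accounts.

    `edges` is a list of (a, b) transfer pairs; `known_fraud` is a set
    of seed account IDs. Returns {account: hops_to_nearest_fraud} for
    every account within `max_hops` of a seed ("birds of a feather").
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    dist = {node: 0 for node in known_fraud}
    queue = deque(known_fraud)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # stop expanding beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist
```

With edges F1–A, A–B, B–C and F1 as the known-fraud seed, accounts A and B fall within two hops and receive scores, while C lies beyond the limit and is left unscored.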

AML Model Validation Best Practices

If you’ve read this far, you may be wondering, “Which model is best?” That question is almost as difficult to answer as it is to quantify the amount of money laundered each year! But what is for certain is that each type of model needs to be regularly monitored and validated to ensure it’s performing as intended – or at least making progress toward that goal.

At the RMA Model Validation Consortium (MVC), we specialize in AML model validation and follow a proven approach to ensure each financial institution’s unique needs are satisfied. At a high level, here are the nine stages of our AML model validation approach:

  1. Data validation 
  2. Methodology review
  3. Conceptual soundness
  4. Scenario testing and validation 
  5. Threshold assessment 
  6. Coverage assessment and gap analysis 
  7. Process validation 
  8. Documentation review