Detecting Complex Fraud Patterns with ArangoDB

This article presents a case study of using AQL queries for detecting complex money laundering and financial crime patterns. While there have been multiple publications about the advantages of graph databases for fraud detection use cases, few of them provide concrete examples of implementing detection of complex fraud patterns that would work in real-world scenarios.

This case study is based on a third-party transaction data generator, which is designed to simulate realistic transaction graphs of any size. The generator hides two kinds of complex financial fraud patterns in the data: circular money flows and indirect money transfers routed through multiple intermediate accounts.

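In this post, we demonstrate that AQL provides a declarative yet efficient means of detecting such patterns, and we showcase several ArangoDB features along the way, including filtered graph traversals, nested subqueries, and the built-in graph viewer of the WebUI.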

In the case study, the dataset we use is produced by a third-party generator of financial transaction data available at:

The generator can produce any data size and thus is invaluable for testing fraud detection at a large scale. A sample dataset with 1.5 million transactions that we used for testing can also be downloaded from the following link and imported using arangorestore:

Three Vertex Collection Diagram

In ArangoDB, we represent the generated dataset by 3 vertex collections, one for each entity type:

Each entity is identified by a UUID-based key and described by a JSON document. For example, we have a client entity with the key d942a1fe-ad32-46da-9630-4622c4c01354 and describe it as:

One of the money laundering patterns is a circular money flow. Money is being transferred from one account to another and eventually lands back in the originator’s account.

To detect such a circular flow, we begin at a particular transaction from account A to account B and monitor transactions that eventually link back to account A. We check for transactions from account B at a later time but with a similar amount of money. We continue following the transactions from account to account to check if we return to account A. A sequence of transactions showing this money laundering pattern would result in a graph such as this one, which results from the AQL query in the following section.

Money Laundering Pattern Graph

A very simple AQL query can detect whether there is a cycle of transactions starting at a given transaction @firstTrans:

However, such a query won’t work in practice. In a realistic dataset we will have thousands of transactions per account, so a traversal of depth 3 would already need to visit billions of paths: with roughly 1,000 transactions per account, that is about 1,000 × 1,000 × 1,000 = 10^9 candidate paths. There will also be many accidental circular paths in the transaction graph that do not represent a circular money flow.

In a circular money flow, money is retransferred after it was received, so we can filter the transactions that happen within a certain time window after the first one. Also, the amount of money being retransferred is similar to the original amount. So we can optimize our query by applying corresponding filters on the dates and amount attributes of all transactions on the path:

The AQL optimizer recognizes that these conditions must be satisfied by all edges in the path and performs an optimization that avoids traversing the edges that do not satisfy the filter. As a result, this query can perform a deep traversal on a large dataset within milliseconds.
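To make the pruning idea concrete outside of the database, here is a minimal in-memory sketch in Python. It is not the AQL query used in this case study and says nothing about how the optimizer works internally. The `Transaction` class, the `find_cycles` function, the 30-day window, and the 10% amount tolerance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Transaction:
    src: str        # sending account (hypothetical field name)
    dst: str        # receiving account (hypothetical field name)
    amount: float
    date: datetime


def find_cycles(transactions, first, max_depth=10,
                window=timedelta(days=30), tolerance=0.1):
    """Return every path that starts with `first` and leads back to first.src."""
    # Index outgoing transactions by sending account.
    outgoing = {}
    for t in transactions:
        outgoing.setdefault(t.src, []).append(t)

    def acceptable(t):
        # The same condition is applied to every edge on the path:
        # inside the time window after the first transaction,
        # and with an amount similar to the original one.
        in_window = first.date < t.date <= first.date + window
        similar = abs(t.amount - first.amount) <= tolerance * first.amount
        return in_window and similar

    cycles = []

    def extend(path):
        last = path[-1]
        if last.dst == first.src:
            cycles.append(list(path))   # money is back at the originator
            return
        if len(path) >= max_depth:
            return
        for t in outgoing.get(last.dst, []):
            # Pruning: an edge that fails the filter is never followed,
            # so none of the paths behind it are ever visited.
            if acceptable(t) and t not in path:
                path.append(t)
                extend(path)
                path.pop()

    extend([first])
    return cycles
```

Because the filter is checked before an edge is followed rather than after a full path has been built, the number of visited paths depends on how many transactions pass the filter, not on how many transactions each account has in total.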

For example, within just 10 ms we can find a cycle of length 8 beginning at
@firstTrans = 'transactions/0976884d-b460-406e-8d18-7ddf4e56fc92'

Thanks to the built-in graph viewer of the WebUI we can easily generate a graph with our results by extracting and returning the edges from our results:

Money Laundering Pattern Graph

Once a transaction has been flagged as suspicious, we may decide to investigate all transactions of the client in question for other fraudulent behavior. This is accomplished with a small change to our initial FOR loop. We change the initial transaction lookup to a graph traversal that starts at a specified client and traverses all of their transactions using the same criteria as in the first query:

Money Laundering Pattern Graph

Analyzing a single transaction or client is important, but the serious business value comes from analyzing all available transactions to see how widespread fraud is in the dataset.

As before, we can easily generate a graph with our results by extracting the edges from our results:

Now we can see an excellent visualization of all the money cycles in the dataset:

Money Transfer cycles in Dataset

Another money laundering pattern hidden in the generated data set is indirect money transfers over multiple accounts. Money is split into smaller amounts and sent to the destination over multiple layers of intermediate accounts. We can try to detect such a pattern by looking for a sequence of transactions from one account to multiple other accounts that are relatively close in time and are followed by retransfers of a similar amount of money to further accounts.

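The problem of detecting a money flow network is more complicated than detecting circular money flows, because in this case our results are not just separate paths in the graph, but rather an entire subgraph. Our approach to detecting such graphs solves the problem in two phases: first, we detect and mark suspicious outbound and inbound transfer patterns around individual accounts; second, we link the marked transactions into complete money flows with graph traversals.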

Of course, there will likely be a lot of such transaction sequences for each account. For detecting potentially fraudulent behavior, it is also important to check if the money is being transferred to multiple accounts and whether the recipient of the money retransfers it further. So we add a subquery, which checks whether each recipient has transferred at least 90% of the received money within the following 30 days.
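The retransfer check can be sketched in a few lines of Python. This is only an in-memory illustration of the rule described above, not the AQL subquery itself. The function names are hypothetical, and transactions are assumed to be dictionaries with `_from`, `_to`, `amount`, and `date` fields.

```python
from datetime import datetime, timedelta


def retransferred_share(transactions, account, received_amount, received_at,
                        window=timedelta(days=30)):
    """Share of `received_amount` that `account` sends out again within `window`."""
    sent = sum(
        t["amount"]
        for t in transactions
        if t["_from"] == account
        and received_at < t["date"] <= received_at + window
    )
    return sent / received_amount


def suspicious_recipients(transactions, incoming, min_share=0.9):
    """Keep the incoming transfers whose recipient passed on at least `min_share`."""
    return [
        t for t in incoming
        if retransferred_share(transactions, t["_to"], t["amount"], t["date"])
        >= min_share
    ]
```

Here `incoming` would be the transfers sent by the account under investigation. A recipient that keeps most of the money drops out, while one that forwards at least 90% of it within 30 days stays in the candidate pattern.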

The query identifies patterns of potential participation of the client in the money flows. Still, to visualize the pattern from the beginning to the end through all intermediate layers, we will perform such analysis for all clients. In practice, since the transactions constituting a pattern are close in time, such analysis can be applied incrementally, each time processing only relatively recent transactions.

We can also make our search more efficient by applying additional filters:

Also, instead of returning the suspicious patterns as a result, we mark the suspicious edges so that we can traverse them afterward to visualize entire money flows. More specifically, in each suspicious edge we write an attribute outboundLead pointing to the transaction that leads the pattern it belongs to:
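As a rough illustration of this marking step, the following Python sketch annotates plain dictionaries instead of updating edge documents in the database. The helper names are hypothetical, and treating the first transaction of a pattern as its lead is an assumption made for the example.

```python
def mark_outbound_pattern(pattern):
    """Write outboundLead into every edge of one suspicious pattern.

    `pattern` is a list of transaction dicts ordered by date. The first
    one is treated as the lead of the pattern (an assumption).
    """
    lead_id = pattern[0]["_id"]
    for edge in pattern:
        edge["outboundLead"] = lead_id
    return lead_id


def edges_of_pattern(transactions, lead_id):
    """Look up all edges that were marked as belonging to the given lead."""
    return [t for t in transactions if t.get("outboundLead") == lead_id]
```

Storing the lead on each edge means a later step can find every edge of a pattern by filtering on one attribute, without recomputing the pattern.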

Next, we detect suspicious inbound money flow patterns in an analogous way and annotate the suspicious edges with the attribute inboundLead:

Now we are ready to link the pieces together. Our goal is to run graph traversals for each money flow pattern from the initial source to the final destination:
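The linking step can be pictured as a traversal that only follows marked edges. The Python sketch below is an in-memory analogy rather than the AQL traversal used here. The function name `money_flow` is hypothetical, and edges are assumed to be dictionaries carrying `_from`, `_to`, and, when marked, `outboundLead` or `inboundLead`.

```python
from collections import deque


def money_flow(transactions, source):
    """Collect the marked edges reachable from `source`.

    Only edges carrying an outboundLead or inboundLead attribute are
    followed, so unmarked transactions never enter the result.
    """
    # Index the marked edges by sending account.
    outgoing = {}
    for t in transactions:
        if "outboundLead" in t or "inboundLead" in t:
            outgoing.setdefault(t["_from"], []).append(t)

    seen_accounts = {source}
    flow = []
    queue = deque([source])
    while queue:
        account = queue.popleft()
        for t in outgoing.get(account, []):
            flow.append(t)
            if t["_to"] not in seen_accounts:
                seen_accounts.add(t["_to"])
                queue.append(t["_to"])
    return flow
```

Each account is expanded once, so every marked edge appears in the result exactly once, even when several intermediate accounts forward money to the same destination.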

Graph Traversal for Money flow

To see more patterns just increase the limit in the first subquery:

Graph Traversal for Money flow

As you can see now, the bad guys can get pretty tricky with how they move around money but not tricky enough to get past a well-done AQL query!

The ability to nest graph traversals while aggregating data all in a single query allows for efficiently gaining insights that may have otherwise been impossible to obtain and shows the power of the ArangoDB query language. Although we developed relatively complex queries, they are very declarative and constitute just a small fraction of the complexity that would be necessary to implement a corresponding algorithm using a programming language.

As promised, we covered a lot in this article, and if you would like to continue learning more about fraud detection, be sure to check out our Fraud Detection white paper:
