Transaction categorisation — good vs. great
The rise of Open Banking is ripe with opportunities for adopting new apps and solutions for analysing account data. Raw data on its own, however, doesn’t equal actionable insights. The missing ingredient enabling fintech solutions to enrich raw account data is a transaction categorisation engine.
Transaction categorisation is the process of identifying the context or purpose of individual bank account records. It involves (1) using algorithms trained to recognise keywords, phrases or patterns in transaction details and (2) assigning categories based on pre-defined rules. But not all engines are created equal.
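To make the two steps above concrete, here is a minimal sketch of a keyword-based categoriser. It is not how any particular engine works: the rule table, category names and the `categorise` function are hypothetical, and hand-written regular expressions stand in for the trained pattern recognition a real engine would use.

```python
import re
from typing import Optional

# Hypothetical rule table: category -> keyword patterns (illustrative only).
RULES = {
    "groceries": [r"\bsupermarket\b", r"\bgrocer"],
    "salary": [r"\bsalary\b", r"\bpayroll\b"],
    "utilities": [r"\belectric", r"\bwater bill\b"],
}

# Step 1: compile the patterns that recognise keywords in transaction details.
COMPILED = {
    category: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for category, patterns in RULES.items()
}


def categorise(description: str) -> Optional[str]:
    """Step 2: assign the first category whose pattern matches.

    Returns None when no rule matches, i.e. the transaction
    stays uncategorised.
    """
    for category, patterns in COMPILED.items():
        if any(p.search(description) for p in patterns):
            return category
    return None
```

Returning `None` for unmatched transactions, rather than forcing a guess, is a deliberate choice in this sketch: it keeps uncategorised records separate from wrongly categorised ones.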
For developers or data scientists planning to use transaction categorisation for automated decision-making, it’s important to choose the best engine for the task at hand. Having built one from scratch, we offer our take on what separates a good transaction categorisation engine from a great one.
Comparing categorisation engines
When comparing transaction categorisation engines, there are generally five factors that must be considered:
- Categorisation rate
This is the percentage of transactions to which an engine can assign a category. While it might seem to be the most important factor, it can easily be artificially “inflated”. It’s possible to build a categorisation engine with a 95% categorisation rate, but this number alone does not reveal how many transactions were categorised correctly. To evaluate the true merit of a categorisation engine, it’s important to look at the categorisation rate together with the error rate.
- Error rate
This is the percentage of categorised transactions that were categorised incorrectly. The error rate is measured by inspecting the transactions within a specific category (often manually, by taking random samples) and counting how many do not belong in that category. This measure is critical for understanding the output quality of a categorisation engine, as it reveals the “true categorisation rate”.
- Number of categories
This number represents how many transaction categories an engine is capable of identifying. Generally, categorisation engines with fewer categories tend to have lower error rates and higher categorisation rates. This makes sense: the fewer categories an engine has to ‘choose’ from, the less likely it is to choose the wrong one. Conversely, the more categories there are, the harder it is for the engine to assign the right category with the same level of precision. However, a larger number of categories provides a greater level of insight.
- Categorisation speed
While this factor is not related to data quality, categorisation speed can be important if you’re building models on large amounts of data or if you need near-instant categorisation results when running the engine in production.
- Maintenance frequency
Transactions are constantly changing. New merchants appear in the market, new methods of payment are invented, etc. This means that the best results will be achieved by an engine that is constantly maintained and re-trained to remain relevant. When comparing categorisation engines, it’s a good idea to compare how often each of the engines is updated.
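The first two factors can be measured on a manually labelled sample, roughly as sketched below. The `evaluate` function and `Metrics` type are hypothetical names, and the "true categorisation rate" is computed under one reading of the term: the share of all transactions that were categorised correctly.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Metrics:
    categorisation_rate: float       # categorised / all transactions
    error_rate: float                # wrong / categorised
    true_categorisation_rate: float  # correct / all (assumed definition)


def evaluate(predicted: Sequence[Optional[str]], actual: Sequence[str]) -> Metrics:
    """Compare engine output with manually verified labels.

    `predicted` holds the engine's category per transaction, or None
    when the engine could not categorise it; `actual` holds the
    manually checked category for the same transaction.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = len(predicted)
    if total == 0:
        raise ValueError("the sample must not be empty")

    # Keep only the transactions the engine actually categorised.
    categorised = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    wrong = sum(1 for p, a in categorised if p != a)

    return Metrics(
        categorisation_rate=len(categorised) / total,
        error_rate=wrong / len(categorised) if categorised else 0.0,
        true_categorisation_rate=(len(categorised) - wrong) / total,
    )
```

Under this reading, the true categorisation rate equals the categorisation rate multiplied by (1 − error rate). An engine with a 95% categorisation rate and a hypothetical 20% error rate would categorise only 76% of transactions correctly.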
Good vs. great
A great transaction categorisation engine is one that has the right balance between the five factors to perform the task at hand.
For example, if you’re building a personal finance management (PFM) app, you might need a fast categorisation engine with a reasonably high categorisation rate and a large number of categories. A low error rate is less critical here, because app users are unlikely to mind if the results are not 100% correct. However, if you’re planning to use transaction categorisation for credit risk purposes (e.g. automated credit decisions), a low error rate is essential to ensure accurate credit decisions.
In conclusion, the rise of Open Banking has provided many opportunities to build incredible apps and solutions on account data. Transaction categorisation is a layer that adds ‘superpowers’ to these apps and solutions by enriching the raw account data. To best utilise these powers, it’s paramount to find the best fit for your particular use-case.
If you’re interested in seeing how your current transaction categorisation engine would perform against Nordigen’s engine, let’s talk: firstname.lastname@example.org
Nordigen is a global account data analytics provider that helps banks and lenders improve the speed and accuracy of their credit decisions. We provide solutions that include income verification, verification of liabilities, transaction categorisation and behavioural feature engineering.