Setup Environment

Set up the environment for data processing and enter the API credentials.

%%capture capt  

Pull Data From API for Processing

Pull all transaction data from the API and combine it into one large table containing every extracted record. As part of this step, consider optimizing the data structure, for example by storing repetitive text fields as categories.


Current Extraction Runtime:  14.3 minutes, 100 API calls, 990002 records extracted so far (continuing ....)
Current Extraction Runtime:  28.74 minutes, 200 API calls, 1990013 records extracted so far (continuing ....)
Current Extraction Runtime:  43.37 minutes, 300 API calls, 2990018 records extracted so far (continuing ....)
Current Extraction Runtime:  58.13 minutes, 400 API calls, 3990028 records extracted so far (continuing ....)
Original Memory Usage : 299.255 megabytes
Final Memory Usage : 195.352 megabytes
Extraction Total Runtime is: 63.63 minutes
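The category optimization suggested above can be sketched as follows. This is a minimal illustration rather than the notebook's original code: the helper names `to_categories` and `memory_mb`, the `max_unique_ratio` threshold, and the synthetic sample table are all assumptions. Only the column names come from the sample data shown in this notebook.

```python
import pandas as pd


def to_categories(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy in which repetitive text columns are stored as categories."""
    out = df.copy()
    for col in out.columns:
        # Skip numeric, datetime and already-categorical columns.
        if not pd.api.types.is_string_dtype(out[col]):
            continue
        # Convert only when values repeat often; a category column with
        # mostly unique values can use more memory than plain text.
        if out[col].nunique(dropna=False) <= max_unique_ratio * len(out):
            out[col] = out[col].astype("category")
    return out


def memory_mb(df: pd.DataFrame) -> float:
    """Deep memory usage in megabytes (includes the string payloads)."""
    return df.memory_usage(deep=True).sum() / 1e6


# Synthetic stand-in for the extracted table (illustrative values only).
sample = pd.DataFrame({
    "report.reportType": ["internationalFundsTransferInstruction"] * 1000,
    "report.reporter": ["CBA"] * 1000,
    "transaction.direction": ["incoming", "outgoing"] * 500,
    "transaction.amount": [9950.0] * 1000,
})
optimized = to_categories(sample)
print(f"Original Memory Usage : {memory_mb(sample):.3f} megabytes")
print(f"Final Memory Usage : {memory_mb(optimized):.3f} megabytes")
```

The saving depends on how repetitive each text column is, so it is worth measuring memory before and after on the real table, as the run log above does.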

Review Some Sample Data

Display some key details for a sample of transaction reports.


report.reportType                      report.reporter  transaction.direction  transaction.transactionDatetime  transaction.amount
internationalFundsTransferInstruction  CBA              incoming               2020-09-22 22:58:25+00:00        $9987.50
internationalFundsTransferInstruction  CBA              incoming               2020-02-27 05:51:06+00:00        $9985.00
internationalFundsTransferInstruction  CBA              incoming               2020-10-22 02:51:24+00:00        $9950.00
internationalFundsTransferInstruction  CBA              incoming               2020-07-20 03:57:49+00:00        $9950.00

Explore Data

Use an Exploratory Data Analysis (EDA) tool. The example below uses sweetviz.

Pandas profiling is another good option.


Chart of Transactions Per Day

Count the number of transactions per transaction day, broken down by report type (use a log scale to fit all the information on one graph).

## Create Daily Count Summary
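One possible way to build the daily count summary is sketched below. It assumes the extracted table is a pandas DataFrame with the column names shown in the sample data. The function name `daily_counts`, the tiny sample table, and the second report type `exampleOtherReportType` are placeholders introduced for illustration.

```python
import pandas as pd


def daily_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count reports per transaction day, one column per report type."""
    # Truncate each timestamp to its (UTC) calendar day.
    day = pd.to_datetime(df["transaction.transactionDatetime"], utc=True).dt.floor("D")
    return (
        df.groupby([day.rename("day"), "report.reportType"], observed=True)
        .size()
        .unstack(fill_value=0)  # days with no reports of a type become 0
    )


# Illustrative rows only; "exampleOtherReportType" is a made-up placeholder.
sample = pd.DataFrame({
    "report.reportType": [
        "internationalFundsTransferInstruction",
        "internationalFundsTransferInstruction",
        "exampleOtherReportType",
        "internationalFundsTransferInstruction",
    ],
    "transaction.transactionDatetime": [
        "2020-09-22 22:58:25+00:00",
        "2020-09-22 03:10:00+00:00",
        "2020-09-22 11:00:00+00:00",
        "2020-09-23 05:51:06+00:00",
    ],
})
counts = daily_counts(sample)
print(counts)
# With matplotlib installed, counts.plot(logy=True) draws one line per
# report type on a logarithmic y-axis.
```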

Chart of Breakdown of Report Amounts

The number of reports for each amount (rounded to the nearest whole dollar). The amount and count are both on a log scale.
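A rough sketch of the underlying tally, assuming `transaction.amount` is held as a numeric column (the `$` formatting in the sample display is assumed to be presentation only). The function name `amount_breakdown` is hypothetical, and the four amounts are the ones shown in the sample table.

```python
import pandas as pd


def amount_breakdown(amounts: pd.Series) -> pd.Series:
    """Number of reports for each amount, rounded to whole dollars."""
    # Missing amounts are dropped so the integer conversion cannot fail.
    rounded = amounts.dropna().round(0).astype("int64")
    return (
        rounded.value_counts()
        .sort_index()
        .rename_axis("amount")
        .rename("reports")
    )


# Amounts from the sample rows displayed earlier.
amounts = pd.Series([9987.50, 9985.00, 9950.00, 9950.00], name="transaction.amount")
breakdown = amount_breakdown(amounts)
print(breakdown)
# With matplotlib installed, breakdown.plot(loglog=True) puts both the
# amount axis and the count axis on a log scale.
```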


Chart of Breakdown of Report Direction

Break down the number of reports and the total dollar amount by direction.
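The breakdown by direction amounts to a grouped aggregation. The sketch below assumes a numeric `transaction.amount` column. The function name `direction_summary` and the sample rows (including the `outgoing` value) are illustrative assumptions.

```python
import pandas as pd


def direction_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Report count and total dollar amount for each transaction direction."""
    return (
        df.groupby("transaction.direction", observed=True)["transaction.amount"]
        .agg(reports="count", total_amount="sum")
    )


# Illustrative rows only.
sample = pd.DataFrame({
    "transaction.direction": ["incoming", "incoming", "outgoing", "incoming"],
    "transaction.amount": [9987.50, 9985.00, 1200.00, 9950.00],
})
summary = direction_summary(sample)
print(summary)
```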


Chart of Total Transaction Amounts Per Day

Break down the total amount and the maximum dollar amount per day.


Identify Anomalies

Process the extracted data to generate an expected pattern of activity (for transaction report counts and amounts). Then identify reporting that falls outside that pattern.

The two charts below show that transaction report counts and amounts have generally increased slowly over the period, with a few dips on public holidays.

They also show that for a period in March 2020, reporting volumes drifted well above the expected band, which warrants further investigation. It may mean some transaction reporting has been duplicated.
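The notebook does not state how the expected band is computed. As one possible approach, the sketch below builds a band from a trailing rolling median plus or minus a multiple of the trailing median absolute deviation, and flags days that fall outside it. The function name `flag_outside_band`, the 28-day window, the multiplier `k`, and the synthetic series are all assumptions, not the original method or data.

```python
import numpy as np
import pandas as pd


def flag_outside_band(daily: pd.Series, window: int = 28, k: float = 4.0) -> pd.DataFrame:
    """Flag days whose value falls outside a trailing robust band."""
    min_obs = window // 2
    # shift(1) means each day is compared only against earlier days,
    # so a spike cannot widen its own band.
    center = daily.shift(1).rolling(window, min_periods=min_obs).median()
    deviation = (daily - center).abs()
    spread = deviation.shift(1).rolling(window, min_periods=min_obs).median()
    lower = center - k * spread
    upper = center + k * spread
    # Comparisons against NaN (the warm-up period) evaluate to False.
    outside = (daily > upper) | (daily < lower)
    return pd.DataFrame({"value": daily, "lower": lower, "upper": upper, "outside": outside})


# Synthetic daily report counts: a steady weekly-style wobble, then three
# days of sharply elevated volume (positions 40 to 42).
values = 100.0 + (np.arange(45) % 5)
values[40:43] = 300.0
daily = pd.Series(values, index=pd.date_range("2020-01-01", periods=45, freq="D"))

result = flag_outside_band(daily)
print(result[result["outside"]])
```

Medians are used here because a run of duplicated reports would otherwise inflate a mean-and-standard-deviation band and hide the later days of the same run. The first `window` days produce no flags because the band is still warming up.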



Competition Unstructured Extension

Transaction reporting delay (the difference between when a transaction occurs and when the related report is submitted) may be an indicator of data processing issues at the reporter or of other significant issues. Analyse transaction reporting delay over the period and identify any anomalies that have occurred. Additionally, identify whether the average reporting delay is uniform across the reporting population.
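A starting point for the delay analysis is sketched below. The data shown in this notebook does not include a submission timestamp, so the column name `report.submissionDatetime` is a hypothetical placeholder to be replaced with the real field. The function names, the second reporter `exampleReporterB`, and the sample rows are also illustrative assumptions.

```python
import pandas as pd

# Hypothetical column name: replace with the actual submission-time field.
SUBMITTED_COL = "report.submissionDatetime"


def add_reporting_delay(df: pd.DataFrame, submitted_col: str = SUBMITTED_COL) -> pd.DataFrame:
    """Add a delay_hours column: submission time minus transaction time."""
    out = df.copy()
    occurred = pd.to_datetime(out["transaction.transactionDatetime"], utc=True)
    submitted = pd.to_datetime(out[submitted_col], utc=True)
    out["delay_hours"] = (submitted - occurred).dt.total_seconds() / 3600
    return out


def delay_by_reporter(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise delay per reporter to check whether it is uniform."""
    return (
        df.groupby("report.reporter", observed=True)["delay_hours"]
        .agg(["count", "mean", "median", "max"])
    )


# Illustrative rows only; "exampleReporterB" is a made-up reporter.
sample = pd.DataFrame({
    "report.reporter": ["CBA", "CBA", "exampleReporterB"],
    "transaction.transactionDatetime": [
        "2020-09-22 00:00:00+00:00",
        "2020-09-23 00:00:00+00:00",
        "2020-09-22 00:00:00+00:00",
    ],
    SUBMITTED_COL: [
        "2020-09-22 12:00:00+00:00",
        "2020-09-24 00:00:00+00:00",
        "2020-09-25 00:00:00+00:00",
    ],
})
delays = add_reporting_delay(sample)
by_reporter = delay_by_reporter(delays)
print(by_reporter)
```

Taking a daily median of `delay_hours` would give a time series that can be screened for anomalies in the same way as the daily counts and amounts.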