Abstract:
Fraud in purchasing affects companies around the globe and is typically addressed through audits. Because of the massive volume of data available, however, auditors cannot verify every transaction of a company, so only a small sample is checked. Since fraudulent transactions are rare compared with legitimate ones, they are frequently left out of the sample and therefore go unverified during the audit. This work presents a new approach that combines signature detection with clustering to increase the probability that fraud-related documents are included in the sample. Because no public dataset exists for fraud detection in corporate purchasing, this work uses real procurement data to compare the probability of selecting a fraudulent document under random sampling with that under sampling guided by the proposed model. We also investigate which clustering algorithm is best suited to this specific problem. Using the HDBSCAN clustering algorithm, the proposed methodology grouped the purchase orders (POs) into distinct clusters, one of which gathered the POs showing the most symptoms of a fraudulent transaction, in a completely automated way. We did not find this result in any paper on fraud detection in corporate procurement.
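
To make the sampling argument concrete, the following minimal sketch computes the probability that a simple random sample, drawn without replacement, contains at least one fraudulent document (a standard hypergeometric calculation). It is not the model proposed in this work: the function name `prob_at_least_one_fraud` and every figure in the example are hypothetical, chosen only to illustrate why restricting the draw to a suspicious cluster can raise the chance of auditing a fraud.

```python
from math import comb


def prob_at_least_one_fraud(population: int, frauds: int, sample_size: int) -> float:
    """Probability that a simple random sample (without replacement)
    contains at least one fraudulent document."""
    if not 0 <= frauds <= population:
        raise ValueError("frauds must be between 0 and population")
    if not 0 <= sample_size <= population:
        raise ValueError("sample_size must be between 0 and population")
    # P(no fraud in sample) = C(N - k, n) / C(N, n)
    p_none = comb(population - frauds, sample_size) / comb(population, sample_size)
    return 1.0 - p_none


# Hypothetical figures, for illustration only (not taken from the paper):
# 10,000 purchase orders, 10 of them fraudulent, audit sample of 100.
p_random = prob_at_least_one_fraud(population=10_000, frauds=10, sample_size=100)

# Assume a clustering step isolates 200 suspicious orders holding 8 of the frauds,
# and the same 100-document sample is drawn from that cluster alone.
p_cluster = prob_at_least_one_fraud(population=200, frauds=8, sample_size=100)

print(f"random sampling:         {p_random:.3f}")
print(f"cluster-guided sampling: {p_cluster:.3f}")
```

Under these assumed numbers the cluster-guided draw is far more likely to include a fraudulent document, which is the effect the comparison in this work is designed to measure on real procurement data.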