Fraud Detection in White-Collar Crime
White-collar crime is, and has always been, an urgent issue for society. In recent years, it has increased dramatically, driven by technological advances. Studies show that companies are affected every year by corruption, balance-sheet manipulation, embezzlement, criminal insolvency and other economic crimes. Companies are usually unable to identify the damage caused by fraudulent activities. To prevent fraud, they can use intelligent IT approaches: because most data is stored digitally today, a data analyst or investigator can use it to detect fraud.
In the age of Big Data, the volume of digital information is growing enormously. Storage is cheap and no longer a limiting factor. Estimates suggest that up to 80 percent of all operational information is now stored in the form of unstructured text documents. This bachelor thesis examines Data Mining and Text Mining as intelligent IT approaches for fraud detection in white-collar crime. Text Mining is related to Data Mining; the two are distinguished by the source and the structure of the information. Text Mining is mainly concerned with weakly structured or unstructured data, while Data Mining often relies on structured sources.
The thesis begins with an overview of white-collar crime. For this purpose, the three essential tasks of fraud management are discussed. Based on Cressey's fraud triangle, it is shown which conditions need to come together for an offender to commit a fraudulent act. Subsequently, some well-known types of white-collar crime are considered in more detail.
A Text Mining approach was used to demonstrate how potentially useful knowledge can be extracted from unstructured text. For this purpose, two self-generated e-mails were converted into a structured format. In addition, a case study on fraud detection was conducted on a credit card dataset that contains both legitimate and fraudulent transactions. Based on a literature review, Data Mining techniques were selected and then applied to the dataset, using various sampling techniques and hyperparameter optimization, with the goal of correctly identifying fraudulent transactions. The CRISP-DM reference model served as the methodological framework.
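Fraud datasets are typically highly imbalanced, which is why sampling techniques matter in such a case study. The abstract does not state which sampling techniques were used; the following is only a minimal sketch of one common option, random undersampling of the majority class. The function name, the `is_fraud` field and the toy transactions are hypothetical and not taken from the thesis.

```python
import random

def random_undersample(rows, label_key="is_fraud", seed=0):
    """Balance a dataset by randomly down-sampling the larger class.

    `rows` is a list of dicts; `label_key` holds 1 (fraud) or 0 (legitimate).
    Both names are illustrative assumptions.
    """
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    # Order the two classes by size so the smaller one comes first.
    minority, majority = sorted((fraud, legit), key=len)
    rng = random.Random(seed)  # fixed seed for reproducibility
    sampled = rng.sample(majority, len(minority))
    balanced = minority + sampled
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: 1000 transactions, 10 of them fraudulent.
transactions = [
    {"amount": float(i), "is_fraud": 1 if i % 100 == 0 else 0}
    for i in range(1000)
]
balanced = random_undersample(transactions)
```

Undersampling discards most legitimate transactions, so it trades information for balance; oversampling the minority class is the usual alternative when the dataset is small.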
The results of the case study show that Naïve Bayes and Logistic Regression on small datasets, as well as Support Vector Machines and Neural Networks, are appropriate Data Mining techniques for detecting fraud. The results were measured using several evaluation metrics, such as precision, accuracy, recall and F1 score. The data analyst can improve predictive performance by tuning the hyperparameters.
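The evaluation metrics named above can all be derived from the four cells of a confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not results from the thesis.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp: fraud correctly flagged, fp: legitimate wrongly flagged,
    fn: fraud missed, tn: legitimate correctly passed.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts: 1000 transactions, 10 fraudulent;
# the classifier catches 6 of them and raises 4 false alarms.
metrics = classification_metrics(tp=6, fp=4, fn=4, tn=986)
```

With these counts, accuracy is 0.992 although only 6 of 10 fraudulent transactions are caught (recall 0.6), which illustrates why accuracy alone is a weak measure on imbalanced fraud data.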
Text Mining can extract patterns, structures and other useful information from text documents with the help of linguistic, statistical and mathematical methods. However, applying Text Mining to unstructured data is difficult and time-consuming.
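One simple statistical way to bring unstructured text into a structured form is a bag-of-words term-document matrix. The sketch below is an illustration only and is not necessarily the method used in the thesis; the two e-mail texts and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(documents):
    """Return (vocabulary, matrix) where matrix[d][t] is the count of
    vocabulary[t] in documents[d]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

# Two hypothetical e-mails (not the ones generated for the thesis).
emails = [
    "Please transfer the invoice amount today.",
    "The invoice was changed, do not tell the auditor.",
]
vocabulary, matrix = term_document_matrix(emails)
```

Each e-mail becomes one row of term counts, which can then be fed to the same Data Mining techniques used for structured data. Real pipelines usually add steps such as stop-word removal and stemming, which are omitted here.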
If you are interested in the thesis, please contact either the author or one of the two professors.