Getting Started on Your Thesis Proposal
Step 1: Identify a general area to research. (for a quick paper for ACML)
- Financial X-ray of Customers using ML methods
- GAN-BERT (Long-term study) PhD
Step 2: Formulate a research question or questions.
- Is it feasible to produce a timely prognosis of customers’ financial welfare, so that early intervention can be offered to customers who are likely to be at risk of being unprepared for retirement? In other words, will a customer’s financial status improve or worsen (and how quickly), or remain stable over time?
Observation duration: four years (including 7 datasets belonging to CFS)
- A financial X-ray to identify the impact of the Covid-19 pandemic on CFS customers
Step 4: Provide a justification for the research.
A prognosis of customers’ financial welfare is feasible, thus enabling financial advisors (or superannuation companies/banks) to provide personalised and effective intervention to those at risk of being unprepared for retirement and to improve their financial performance. Customers’ potential for future financial burden and associated retirement issues can be detected on the basis of the interpretation of their financial interaction data. This timeline prognosis of the customer’s financial condition will be called a financial X-ray.
A prognosis is built on the basis of the normal period of a customer’s financial condition, which is influenced by an average of 90 attributes or factors. A perfect financial X-ray (or timeline prognosis) is achieved by measuring changes in financial progression, customer engagement and the customer’s financial literacy score using machine learning algorithms.
Step 5: Identify the subjects from whom or from which you will collect data. Identify how you will collect data.
The datasets used in this research belong to CFS and cover 2016 to 2019 (each year includes two datasets). The 2020 dataset can also be requested from CFS. Each dataset is characterised by the values of an average of 90 attributes related to CFS superannuation customers.
Step 6: Outline the procedure and methods you will use.
- Data pre-processing: balancing the datasets and selecting features in the pre-processing phase to reduce overfitting and training time.
- Using graph-based regression methods
- Model tuning
- Model comparison: applying seven different well-known supervised regression methods for model comparison purposes.
It should be noted that all attributes, except general customer information such as age, gender and contact information, are added to the datasets consecutively over the four years in which the financial data were recorded. Adding important features associated with customer financial engagement may affect the timely prediction of the economic welfare variable; this will be shown in the graphs in the experimental results. Therefore, the experiments will be carried out in seven steps.
The first step uses dataset 06/2016. The second step uses dataset 12/2016; the features of the remaining datasets are appended in the same way until the 7th step, which includes all the dataset attributes.
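The seven-step procedure above can be sketched as a cumulative build-up of feature lists, one list per experimental step. This is only an illustrative sketch: the function name, the snapshot labels after 12/2016 and all feature names are hypothetical placeholders, not the actual CFS attributes.

```python
from typing import Dict, List


def cumulative_feature_sets(
    static_features: List[str],
    features_by_snapshot: Dict[str, List[str]],
) -> List[List[str]]:
    """Return one feature list per experimental step.

    Step k holds the static features plus every feature introduced in
    snapshots 1..k, in order of first appearance, without duplicates.
    """
    steps: List[List[str]] = []
    current = list(static_features)
    seen = set(current)
    # Dicts preserve insertion order, so snapshots are visited chronologically.
    for _snapshot, new_features in features_by_snapshot.items():
        for name in new_features:
            if name not in seen:
                seen.add(name)
                current.append(name)
        steps.append(list(current))
    return steps


# Hypothetical example with three snapshots instead of seven.
static = ["age", "gender"]
snapshots = {
    "06/2016": ["balance"],
    "12/2016": ["contributions", "balance"],
    "06/2017": ["logins"],
}
feature_steps = cumulative_feature_sets(static, snapshots)
for step_number, features in enumerate(feature_steps, start=1):
    print(step_number, features)
```

Each list in feature_steps could then be used to select the columns for one training run, so that the error of the model can be plotted against the step number.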
Step 7: Identify the type of data you will end up with.
The output of the financial X-ray test is quantitative data associated with the RMSE value.
Step 8: State what type of analysis or testing you will carry out on the data.
The evaluation criteria used for determining the effectiveness of the regression methods include the Mean Absolute Error (MAE), the R-squared (R^2), the Root Mean Squared Error (RMSE) and the Pearson Correlation Coefficient (PCC).
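For reference, the four criteria can be written out with the standard textbook definitions. The sketch below uses only the Python standard library; the function names and the sample values are illustrative assumptions, and in practice a library such as scikit-learn would normally be used instead.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson Correlation Coefficient between targets and predictions."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up target and predicted values, for illustration only.
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.12, 0.18, 0.33, 0.37]
mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
r2_value = r_squared(y_true, y_pred)
pcc_value = pearson(y_true, y_pred)
print(mae_value, rmse_value, r2_value, pcc_value)
```

Lower MAE and RMSE indicate smaller errors, while R^2 and PCC values closer to 1 indicate a closer fit between predictions and targets.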
Step 9: Set out your research project outcomes.
Customers at financial risk can be identified by comparing their RMSE value with a defined RMSE threshold. The result therefore distinguishes progressive from non-progressive customers.
For instance, assume the threshold RMSE value is set to 0.15 before the 2018 financial year, since a confirmed target variable “financial_welfare” shows an RMSE value of 0.15. An observation above the threshold value before the 2018 financial year represents an early prediction of the customer’s financial welfare, and vice versa. Several experiments are conducted to assess the effectiveness of the proposed algorithm compared with well-known regression methods. Finally, a test dataset is used to analyse the results (including graphs and tables).
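The threshold comparison in this example can be sketched as follows. The 0.15 value comes from the example above; the function name, the customer identifiers and the per-customer RMSE values are hypothetical, and the sketch deliberately uses neutral group names rather than deciding which group counts as progressive.

```python
from typing import Dict, List, Tuple

RMSE_THRESHOLD = 0.15  # example threshold taken from the text


def split_by_threshold(
    rmse_by_customer: Dict[str, float],
    threshold: float = RMSE_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Split customer ids into those above and those at or below the threshold."""
    above = [cid for cid, value in rmse_by_customer.items() if value > threshold]
    at_or_below = [
        cid for cid, value in rmse_by_customer.items() if value <= threshold
    ]
    return above, at_or_below


# Hypothetical per-customer RMSE values.
scores = {"c001": 0.09, "c002": 0.21, "c003": 0.15}
above, at_or_below = split_by_threshold(scores)
print("above threshold:", above)
print("at or below threshold:", at_or_below)
```

A value exactly equal to the threshold is placed in the "at or below" group here; that tie-breaking rule is an assumption and would need to be fixed in the final study design.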
Step 10: Identify the problems you anticipate.
- Difficulty of setting an RMSE threshold value, because the datasets differ and are unbalanced.
- Feature importance for the high-dimensional datasets
- Prediction accuracy
Step 11: Set out possible solutions, alternatives or contingencies.
- Applying the SMOGN sampling technique before the learning process.
- Using the ExtraTreesClassifier (ET Classifier) class in the scikit-learn API.
- Model selection and model tuning: applying seven different well-known supervised regression methods to verify the effectiveness of the proposed model.
Step 12: Identify the resources required.