Business Intelligence from Web Data Analytics and Data Mining

Educational Objectives
The purpose of the final exam is to apply the theoretical concepts and technical skills that
students learn in class to a real-world dataset, and to develop analytical and critical skills in
the context of data mining. You are to work on one of the datasets provided below. You must
begin applying the six phases of the CRISP-DM life cycle (business understanding, data
understanding, data preparation, modeling, evaluation, and deployment). Students are expected to:
• Clearly convey the project objectives and requirements in terms of business
intelligence needs
• Understand and familiarize themselves with the data through exploratory analysis
techniques learned in class
• Prepare and select the variables that they need to build models
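As a rough illustration of the preparation step named in the last objective, the following R sketch converts column types, counts missing values, and selects variables. The data frame `raw` and its column names are hypothetical stand-ins, not taken from any of the exam datasets, and dropping incomplete rows is only one of several reasonable ways to handle missing values.

```r
# Hypothetical toy data standing in for one of the exam datasets.
raw <- data.frame(
  invoice  = c("A1", "A2", "A3", "A4", "A5"),
  quantity = c("6", "2", NA, "12", "3"),   # numbers stored as text
  price    = c(2.55, 3.39, 1.25, NA, 4.95),
  country  = c("France", "Germany", "France", "Spain", "Germany"),
  stringsAsFactors = FALSE
)

# Convert columns to the types the analysis needs.
raw$quantity <- as.numeric(raw$quantity)
raw$country  <- factor(raw$country)

# Count missing values per column before deciding how to handle them.
missing_per_column <- colSums(is.na(raw))
print(missing_per_column)

# One simple option (an assumption, not a requirement): keep complete rows only.
clean <- raw[complete.cases(raw), ]

# Select the variables intended for modeling.
model_vars <- clean[, c("quantity", "price", "country")]
str(model_vars)
```

Whatever choice you make for missing values, state it and justify it in the pre-processing section of the report.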
Exam Report
Using one of the datasets listed below, perform an analysis in R that extracts business
intelligence from the data.
Your report should contain at least the following sections:
• A half-page executive summary presenting your most important findings (10%)
• A business understanding section: what the variables measure and what business
intelligence you expect to gain (10%)
• A section on data pre-processing (10%)
• A section on exploratory data analysis (EDA), including the graphic representations seen
during the course and, if needed, transformation of variables (for example, qualitative
into numeric) (30%)
• Use R for the EDA. You can put your R commands and outputs in an appendix, which
does not count toward your page total. You may also include snippets of R code in the
report itself, which do count toward your page total (30%)
• Conclusions summarizing what you found in the dataset (10%)
• References, if any
If you use additional techniques (graphs, analysis, or algorithms) not shown during the
course, explain them.
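The exploratory analysis and the qualitative-to-numeric transformation mentioned in the report sections could be sketched in R roughly as follows. The data frame `sales` and its columns are hypothetical toy values, and the two encodings shown (integer codes and indicator columns) are common options rather than the method prescribed by the course.

```r
# Hypothetical toy data; replace with the dataset you chose.
sales <- data.frame(
  segment  = c("retail", "wholesale", "retail", "online", "online", "retail"),
  quantity = c(6, 40, 3, 8, 5, 2),
  price    = c(2.5, 1.1, 4.0, 3.2, 3.0, 5.5)
)

# Numeric summaries (min, quartiles, mean, max) for each numeric column.
print(summary(sales[, c("quantity", "price")]))

# Frequency table for the qualitative variable.
segment_counts <- table(sales$segment)
print(segment_counts)

# Qualitative into numeric, option 1: integer codes.
# Levels are sorted alphabetically, so the codes carry no real ordering.
sales$segment <- factor(sales$segment)
sales$segment_code <- as.integer(sales$segment)

# Option 2: one indicator (dummy) column per level, without an intercept.
segment_dummies <- model.matrix(~ segment - 1, data = sales)

# Linear association between two numeric variables.
qp_cor <- cor(sales$quantity, sales$price)

# Basic graphics: a histogram and a boxplot by group.
hist(sales$quantity, main = "Quantity", xlab = "Units per order")
boxplot(price ~ segment, data = sales, main = "Price by segment")
```

Integer codes are compact but imply an order that may not exist in the data; indicator columns avoid that at the cost of more variables.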
What you need to deliver:
• Report (15-page limit, not including appendices) in Word, RTF, or PDF format
• Appendices (if needed) in Word, RTF, or PDF format
• History of R commands (for example, an R history file in TXT, Word, RTF, or PDF
format), which may be placed in the report or an appendix
• The dataset you used
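One possible way to produce the command history and output files is sketched below. The file names are hypothetical, and `savehistory()` is only available in an interactive console that supports command history, so the call is guarded to keep the script runnable in batch mode as well.

```r
# Hypothetical file names; adjust to your own project layout.
history_file <- file.path(tempdir(), "exam_commands.Rhistory")
output_file  <- file.path(tempdir(), "exam_output.txt")

# Save the commands typed so far. This only works in an interactive
# session with history support, hence the guard and the try().
if (interactive()) {
  try(savehistory(history_file), silent = TRUE)
}

# Capture printed output as text so it can be pasted into an appendix.
captured <- capture.output(summary(c(1, 2, 3, 4, 100)))
writeLines(captured, output_file)
```

The resulting text files can be submitted as they are or copied into a Word, RTF, or PDF appendix.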

Dataset choices
All datasets are taken from the UCI Machine Learning Repository. Choose one of the following
datasets for your report. The link to the complete listing is here:
The following are the five choices for running your analysis in R and completing your report:

1. Record Linkage Comparison Patterns Data Set

2. Online Retail Data Set

3. YearPredictionMSD Data Set

4. Census-Income (KDD) Data Set

5. microblogPCU Data Set