Learning Outcomes:

1. Rationalise appropriate scenarios for Machine Learning applications and evaluate the choice of machine learning methods for given application requirements.

2. Demonstrate competency in using appropriate libraries/toolkits to solve given real-world Machine Learning problems, and develop and evaluate suitable applications.

3. Understand and apply the relevant input data preparation and processing required for the Machine Learning models used, and quantitatively evaluate and qualitatively interpret the learning outcome.

4. Recognise and critically address the ethical, legal, social and professional issues that can arise when applying Machine Learning technologies.

Plagiarism is presenting somebody else’s work as your own. It includes: copying information directly from the Web or books without referencing the material; submitting joint coursework as an individual effort; copying another student’s coursework; stealing coursework from another student and submitting it as your own work. Suspected plagiarism will be investigated and, if found to have occurred, will be dealt with according to the procedures set down by the University. Please see your student handbook for further details of what is/isn’t plagiarism.

All material copied or amended from any source (e.g. internet, books) must be referenced correctly according to the reference style you are using.

Your work will be submitted for plagiarism checking. Any attempt to bypass our plagiarism detection systems will be treated as a severe Assessment Offence.

Coursework Submission Requirements

An electronic copy of your work for this coursework must be fully uploaded on the Deadline Date using the link on the coursework Moodle page for COMP1804. For this coursework you must submit 4 separate files:

  • A single pdf file named ‘report.pdf’, which will be the written report; the written report must not exceed 3500 words, including references. It is also recommended that the report be at least 2000 words long.
  • A single csv file named ‘exclusions_dataset_taskX.csv’ (X is the number of the task you chose). The csv file will include all data from your dataset with an annotation as to whether each data point has been excluded from further analysis and why. Regarding the format.

Available sub-tasks

 
Sub-task 1: Text classification/regression – peer reviews. This task is to implement an ML solution for text classification/regression (long texts). It uses a dataset of ML paper peer reviews from the International Conference on Learning Representations (in the years between 2017 and 2020) [1,2]. Specifically, you will use as input a text document concatenating: the title of the paper, the abstract of the paper, the review comments, and the final acceptance/rejection comment. This input should be used to predict the following attributes:
  • Acceptance status (‘Accept’ or ‘Reject’).
  • Review score (integer between 1 and 10).
Note that for the latter attribute you can choose whether to use multiclass classification or regression. You can choose whether to predict both features simultaneously or separately. Additionally, the dataset is provided with a further attribute, the reviewer confidence score (an integer between 1 and 5), which is optional to use. If you want to explore the data further, a separate dataset with the text field split into the original fields “review comments”, “paper title”, “paper abstract” and “final acceptance/rejection comment” can be provided upon request.
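
If the review score is treated as a regression target, the model's continuous output has to be mapped back onto the valid 1 to 10 integer range before it can be compared with the labels. A minimal sketch of one way to do this is shown below; the function name and the choice to round and then clip are illustrative assumptions, not a required part of the coursework.

```python
def score_from_regression(prediction: float, low: int = 1, high: int = 10) -> int:
    """Map a continuous regression output to a valid integer review score.

    Hypothetical helper: rounds to the nearest integer, then clips the
    result to the [low, high] range given in the task description.
    """
    rounded = int(round(prediction))
    return max(low, min(high, rounded))


# Example: raw model outputs that fall inside and outside the valid range.
raw_outputs = [6.4, 11.7, -3.0, 7.6]
scores = [score_from_regression(p) for p in raw_outputs]
```

Rounding discards information about how close a prediction was to the boundary between two scores, so it may be worth reporting a regression metric on the raw outputs alongside any classification-style metric on the rounded ones.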
 
Sub-task 2: Image classification – skin lesions. This task is to implement an ML solution for a classification problem from images. Specifically, you are provided with images of skin lesions [3] and your task is to correctly predict the following attributes:
  • Whether a skin lesion is benign or malignant (1 for ‘is_benign’, 0 for ‘is_malign’).
  • The fine-grained diagnosis for the skin lesion (7 possible categories).
You can choose whether to predict both features simultaneously or separately. Additionally, the dataset is provided with a further attribute, the location of the skin lesion (for example, “scalp”), which is optional to use. If you want to explore the data further, a separate dataset with more attributes can be provided upon request. The dataset has been adapted to the requirements of this module; the original dataset was released under the terms of the CC BY-NC 4.0 licence by Tschandl et al. [3].
 
Sub-task 3: Image classification – advertisements. This task is to implement an ML solution for a classification problem from images. Specifically, you are provided with images of advertisements [4] and your task is to correctly predict the topic of each advertisement.
  • Images are of different sizes and there are 39 possible topic categories.
  • You may choose to group together some of the categories (keeping no fewer than 12 categories).
You should thoroughly discuss (and will be evaluated on) the reasons behind and the implications of grouping together different categories.
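
The grouping constraint above can be checked programmatically. The sketch below is only an illustration: the category names (`topic_00`, ...) and group name (`group_a`) are placeholders rather than the real topic labels, and `build_grouping` is a hypothetical helper.

```python
def build_grouping(categories, merges, min_groups=12):
    """Return {original category: grouped category}.

    `merges` maps some original categories to a new group name; categories
    not listed keep their own name. Raises ValueError if a merge refers to
    an unknown category or if fewer than `min_groups` categories remain.
    """
    unknown = set(merges) - set(categories)
    if unknown:
        raise ValueError(f"unknown categories in merges: {sorted(unknown)}")
    grouping = {c: merges.get(c, c) for c in categories}
    n_groups = len(set(grouping.values()))
    if n_groups < min_groups:
        raise ValueError(f"only {n_groups} groups left; need at least {min_groups}")
    return grouping


# Placeholder names standing in for the 39 topic categories.
categories = [f"topic_{i:02d}" for i in range(39)]
# Hypothetical merge of three topics into one group.
merges = {"topic_00": "group_a", "topic_01": "group_a", "topic_02": "group_a"}
grouping = build_grouping(categories, merges)
n_remaining = len(set(grouping.values()))
```

Keeping the mapping in one explicit dictionary makes it straightforward to report in the written discussion exactly which categories were merged and why.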
 
Sub-task 4: Text classification – Amazon reviews. This task is to implement an ML solution for a multi-task classification problem from text data (mostly short texts). Specifically, you are provided with Amazon reviews [5] (the text is the review title and the review main body joined together) and your task is to predict the following attributes:
  • The number of stars associated with the review (on a scale of 1 to 5).
  • Whether a product is from the category “Video Games” (“video_games”) or “Musical Instrument” (“musical_instrument”).
Note that for the first attribute you can choose whether to use multiclass classification or regression. You can choose whether to predict both features simultaneously or separately. Additionally, the dataset is provided with a further attribute: whether the review is verified or not (either True or False), which is optional to use. If you want to explore the data further, a separate dataset with the text field split into the original fields “review title” and “review main body” can be provided upon request.
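
If both attributes are predicted simultaneously with a classification approach, each review needs a pair of integer targets. The sketch below shows one possible encoding; the helper name, the 0-based class indices and the id assigned to each product category are assumptions made for illustration, while the category strings are those given in the task description.

```python
from typing import Tuple

# Assumed mapping from the category strings in the brief to integer ids.
CATEGORY_TO_ID = {"video_games": 0, "musical_instrument": 1}


def encode_targets(stars: int, category: str) -> Tuple[int, int]:
    """Encode one review's labels as (star class index, category id).

    Stars 1..5 become class indices 0..4, the form most classification
    losses expect. Raises ValueError on out-of-range or unknown labels.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"stars must be between 1 and 5, got {stars}")
    if category not in CATEGORY_TO_ID:
        raise ValueError(f"unknown category: {category!r}")
    return stars - 1, CATEGORY_TO_ID[category]


# Example: a 5-star review of a video game.
star_class, category_id = encode_targets(5, "video_games")
```

Validating labels at encoding time also surfaces malformed rows early, which can help when deciding which data points to record as excluded.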