
April 22, 2024

Project 2: Implementation of Supervised Learning
Instructions:
In this project, you need to apply supervised learning algorithms to analyze a data set and draw observations or conclusions from it. Please select a new data set. Extension of Project 1 is NOT allowed. The data set should be large enough, i.e., at least 100 instances and 5 attributes.
Project Requirements
In this project, you will build classification or regression models on a data set using the concepts studied in class. At least TWO models are required to analyze the data. Advanced data preparation skills are required. You also need to evaluate and compare the performance of the different models. The project MUST include the following techniques:
Data Preparation
Select at least one technique to prepare the features:
Scaling for quantitative features
A technique for handling categorical features
Select at least one advanced technique to process the data:
A technique for handling imbalanced data, such as up-sampling, down-sampling, or SMOTE (for classification).
A technique for dimensionality reduction, such as PCA.
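The data preparation steps listed above can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, not a required implementation: the helper names (standardize, one_hot, upsample) and the toy data are hypothetical, and in practice a library covered in class would usually do this work. It shows scaling of a quantitative feature, one-hot encoding of a categorical feature, and random up-sampling of the minority class. PCA and SMOTE are not shown.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def standardize(column):
    """Scale a quantitative column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:  # constant column: nothing to scale
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def one_hot(column):
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(column))
    encoded = [[1 if value == c else 0 for c in categories] for value in column]
    return categories, encoded


def upsample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_rows, new_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            new_rows.append(rng.choice(pool))
            new_labels.append(cls)
    return new_rows, new_labels


# Hypothetical toy data, for illustration only.
ages = [20.0, 30.0, 40.0, 50.0]
colors = ["red", "blue", "red", "green"]
labels = [0, 0, 0, 1]

scaled_ages = standardize(ages)
categories, encoded_colors = one_hot(colors)
rows = [[a] + e for a, e in zip(scaled_ages, encoded_colors)]
balanced_rows, balanced_labels = upsample(rows, labels)

print(categories)
print(Counter(balanced_labels))
```

When used on a real data set, scaling statistics should be computed on the training split only, and up-sampling should be applied to the training split only, so that the test data stays untouched.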
Models in Supervised Learning
The project could be either classification or regression.
At least TWO models are required to analyze the data.
Hyperparameter tuning is NOT required.
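As an illustration of the two-model requirement, the sketch below fits two simple classifiers on the same data through a shared fit/predict interface. The choice of k-nearest neighbors and nearest centroid is an assumption made for this example only (the assignment does not name specific models), and the class names and toy data are hypothetical. Default settings are used throughout, with no tuning.

```python
import math
from collections import Counter


class KNearestNeighbors:
    """Classify a point by majority vote among its k closest training points."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for query in X:
            # Sort training points by Euclidean distance to the query.
            neighbors = sorted(
                zip(self.X, self.y), key=lambda pair: math.dist(pair[0], query)
            )
            votes = Counter(label for _, label in neighbors[: self.k])
            predictions.append(votes.most_common(1)[0][0])
        return predictions


class NearestCentroid:
    """Classify a point by the closest class mean."""

    def fit(self, X, y):
        self.centroids = {}
        for cls in sorted(set(y)):
            members = [row for row, label in zip(X, y) if label == cls]
            self.centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict(self, X):
        return [
            min(self.centroids, key=lambda c: math.dist(self.centroids[c], row))
            for row in X
        ]


# Hypothetical toy data, for illustration only.
X_train = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [8.0, 8.0], [9.0, 8.5], [8.5, 9.5]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.2, 1.4], [8.7, 8.8]]

knn_pred = KNearestNeighbors(k=3).fit(X_train, y_train).predict(X_test)
centroid_pred = NearestCentroid().fit(X_train, y_train).predict(X_test)
print(knn_pred, centroid_pred)
```

Keeping both models behind the same fit/predict interface makes it easy to run them on identical training and test splits, which is what a fair comparison needs.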
Evaluation of the model.
At least TWO evaluation metrics should be applied. 
Compare the different models.
Compare results with and without the advanced data processing techniques.
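The evaluation steps above can be sketched as follows for a binary classification task. This is a minimal example under stated assumptions: the function names are hypothetical, and the labels and predictions are made up purely to show the mechanics, so the printed numbers are not results from any real model. It computes accuracy plus precision, recall, and F1, then prints one row per run so that models, and runs with and without an advanced technique, can be compared side by side. For a regression project, error metrics such as MAE or RMSE would take the place of these.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 score for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels and predictions, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
runs = {
    "model A, raw data": [1, 0, 0, 1, 0, 0, 0, 0],
    "model A, up-sampled": [1, 0, 1, 1, 0, 1, 1, 0],
    "model B, raw data": [1, 1, 1, 1, 0, 0, 0, 0],
}

results = {}
for name, y_pred in runs.items():
    acc = accuracy(y_true, y_pred)
    precision, recall, f1 = precision_recall_f1(y_true, y_pred)
    results[name] = {
        "accuracy": acc,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
    print(f"{name:<22} acc={acc:.3f} prec={precision:.3f} rec={recall:.3f} f1={f1:.3f}")
```

Reporting more than one metric matters most on imbalanced data, where a model can reach high accuracy while missing most of the minority class.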
In addition, data cleaning is optional. If you select a standard machine learning data set from Kaggle or the UCI ML Repository, data cleaning is not necessary.
Presentation Requirements    
Accordingly, each team should prepare a presentation that includes the following. Please upload the presentation individually even though you are working in a team, and be ready to present it in class. Each team will get approximately 10 minutes, so please plan your talk accordingly.
Introduction to your project and its goals, with reference to the data used in this assignment
Description of your data set along with the target
Live demo / demo snapshots of execution of the data preparation, the models, and performance evaluation of all the models
Show the results of the performance evaluation
Conclusions from all the above analysis
Submission:
The final submission should include all the source code, the data set, and the presentation slides. Each student should upload the submission to Canvas individually, even when working in a group.
About the data set:
You can find the data yourself or select from the following resources:
Kaggle
https://www.kaggle.com/
UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/index.php
Stanford Large Network Dataset Collection
https://snap.stanford.edu/data/
Dataverse Network
https://dataverse.org/
Reddit Open Data
https://www.reddit.com/r/opendata/
CDC Data
https://www.cdc.gov/nchs/tools/index.htm?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fnchs%2Fdata_access%2Fdata_tools.htm
World Bank Catalog
https://datacatalog.worldbank.org/
Metro Boston Data Common
https://datacommon.mapc.org/
COVID-19 Data Repository by Johns Hopkins University
https://github.com/CSSEGISandData/COVID-19
BELOW I HAVE ATTACHED ALL CLASS NOTES AS WELL AS A SCREENSHOT OF THE SAME INSTRUCTIONS PROVIDED ABOVE. TRY TO USE ONLY FAMILIAR THINGS LEARNED IN CLASS, NOTHING SUPER COMPLEX.
