In this assignment, we build a simple artificial neural network (ANN) model for a dataset and explore some methods to tune the model's parameters.
Dependencies
Python 2.7 or later
Scikit-learn package
Pandas toolkit (if needed)
NumPy toolkit (if needed)
Dataset
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
Note: You do not need to download the data from its original source. Two subsets of the data, one for training and one for testing, have been created and posted for download. The original dataset is also included for further testing and experimenting.
The fields, in the order of the columns in the data file, are:
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 = no diabetes, 1 = diabetes)
268 of the 768 rows are 1; the others are 0.
ANN project description
– Loading the required libraries and modules
To finish this assignment, you may use any module that you think is necessary. These include:
1- Pandas
2- NumPy
3- Matplotlib
4- Sklearn
– Reading the data and performing basic data checks
1- Use the describe() method of the DataFrame (e.g. df.describe()) to output basic statistics for each field, and df.shape to output the data dimensions.
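Below is a minimal sketch of the loading and checking steps. It assumes the data is a CSV file with a header row whose column names match the field list above. The file name `diabetes.csv` and the helper `basic_checks` are hypothetical; substitute the file you downloaded.

```python
import os

import pandas as pd

# Column names in file order, as listed in the Dataset section
COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]


def basic_checks(df):
    """Print the data dimensions and class counts; return per-field statistics."""
    print("Rows: {}, columns: {}".format(*df.shape))
    print(df["Outcome"].value_counts().sort_index())  # rows per class (0, 1)
    return df.describe()  # count, mean, std, min, quartiles, max per field


if __name__ == "__main__":
    path = "diabetes.csv"  # hypothetical file name
    if os.path.exists(path):
        # If the file has no header row, use:
        # pd.read_csv(path, header=None, names=COLUMNS)
        df = pd.read_csv(path)
        print(basic_checks(df))
```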
– Creating the train, validation, and test datasets

1- Split your data into train, validation, and test sets.

2- Split your data using the ratios train = 0.75, validation = 0.15, and test = 0.10.

3- Print out the total number of rows and the number of rows allocated to each set above.
Note 1: There are many ways to split a dataset. You can use any method as long as each set keeps a balanced ratio of classes, i.e. roughly the same share of 0s and 1s as the full data (a stratified split).

4- Use your test dataset only for final testing. You should not use the test dataset for adjusting parameters.
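One way to get the 0.75/0.15/0.10 split with a balanced class ratio is to call scikit-learn's `train_test_split` twice, with stratification both times. The first call holds out 25% of the rows. The second splits that remainder 60/40 into validation and test (0.6 × 0.25 = 0.15 and 0.4 × 0.25 = 0.10). This is a sketch, not the required method, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_75_15_10(X, y, random_state=0):
    """Stratified train/validation/test split with ratios 0.75 / 0.15 / 0.10."""
    # Step 1: hold out 25% of the rows (future validation + test), keeping class ratios
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    # Step 2: 40% of the remainder is test (0.10 overall), 60% is validation (0.15 overall)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.4, stratify=y_rest, random_state=random_state)
    return X_train, X_val, X_test, y_train, y_val, y_test


def report_split(y, y_train, y_val, y_test):
    """Print the total row count, the rows per set, and each set's share of class 1."""
    print("Total rows: {}".format(len(y)))
    for name, part in [("train", y_train), ("validation", y_val), ("test", y_test)]:
        print("{:<10} rows: {:4d}  share of class 1: {:.3f}".format(
            name, len(part), np.mean(part)))
```

Here `X` and `y` would be, for example, `df.drop(columns="Outcome").values` and `df["Outcome"].values`. With 768 rows, `train_test_split` rounds the held-out sizes up, which should give 576 training, 115 validation, and 77 test rows.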

– Building your model

1- Use the neural network classifier from sklearn.neural_network (MLPClassifier) to build your network.

2- There are a few parameters that need to be adjusted in this model:
a. hidden_layer_sizes
b. activation
c. solver
d. batch_size
e. learning_rate
f. learning_rate_init

3- Most of the parameters above have default values. You may first run your network with the default values to get an idea of how it performs.

4- You may adjust the parameters based on what we discussed in class, among other considerations.
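The sketch below first standardizes the features, fitting the scaler on the training set only. Scaling is an assumption of this sketch, since MLPs are sensitive to feature scale. It then tries every combination in a small grid over the parameters listed above and scores each fit on the validation set, so the test set is never touched. The grid values, `max_iter`, the use of accuracy as the selection score, and the helper names are all illustrative assumptions, not recommended settings.

```python
import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import ParameterGrid
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def scale(X_train, *others):
    """Fit a StandardScaler on the training rows only, then transform every set."""
    scaler = StandardScaler().fit(X_train)
    return [scaler.transform(X_train)] + [scaler.transform(X) for X in others]


# Illustrative grid over the parameters listed above (values are assumptions).
# learning_rate only has an effect when solver="sgd".
PARAM_GRID = {
    "hidden_layer_sizes": [(8,), (16, 8)],
    "activation": ["relu", "tanh"],
    "solver": ["adam", "sgd"],
    "batch_size": [32],
    "learning_rate": ["constant", "adaptive"],
    "learning_rate_init": [0.001, 0.01],
}


def tune_on_validation(X_train, y_train, X_val, y_val, grid=PARAM_GRID, max_iter=500):
    """Fit one MLPClassifier per grid point and keep the best validation accuracy.
    The test set is deliberately not an argument."""
    best_params, best_score = None, -1.0
    for params in ParameterGrid(grid):
        clf = MLPClassifier(max_iter=max_iter, random_state=0, **params)
        with warnings.catch_warnings():
            # Short runs may stop before converging; silence only that warning here
            warnings.simplefilter("ignore", category=ConvergenceWarning)
            clf.fit(X_train, y_train)
        score = clf.score(X_val, y_val)  # mean accuracy on the validation set
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A typical call sequence would be `X_train_s, X_val_s, X_test_s = scale(X_train, X_val, X_test)` followed by `best_params, best_acc = tune_on_validation(X_train_s, y_train, X_val_s, y_val)`. A default-parameter baseline is simply `MLPClassifier(random_state=0).fit(X_train_s, y_train).score(X_val_s, y_val)`. The grid grows multiplicatively (32 fits here), and for `solver="adam"` the two `learning_rate` values give identical models, so keep it small.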

– Network metrics and plots

1- To evaluate your network, calculate the confusion matrix. For this, you may use the confusion_matrix function from the sklearn.metrics module. How do you interpret precision and recall?

2- Plot the error for each iteration (epoch) for both the training and validation sets. Depending on your parameter settings, you may see an overfitting problem. Try to spot it if that happens.

3- Once you finish tuning the parameters, run your model on your test dataset and create the same plot for the test set as in the previous step.
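The metrics and plots can be sketched as follows, with three hypothetical helpers:

- `evaluate` reads precision and recall off the confusion matrix.
- `loss_per_epoch` trains a fresh `MLPClassifier` one epoch at a time with `partial_fit` (available only for the `adam` and `sgd` solvers). After every epoch it records the log loss on each data set, because the built-in `loss_curve_` attribute records only the training loss.
- `plot_history` draws one curve per data set.

Using log loss as the "error" is an assumption; the misclassification rate (1 − accuracy) would work the same way.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix, log_loss
from sklearn.neural_network import MLPClassifier


def evaluate(y_true, y_pred):
    """Return the 2x2 confusion matrix plus precision and recall for class 1."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    # Rows are true classes, columns are predictions: [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = cm.ravel()
    # Precision = TP / (TP + FP): of the patients predicted diabetic, how many are
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall = TP / (TP + FN): of the truly diabetic patients, how many are caught
    recall = tp / (tp + fn) if tp + fn else 0.0
    return cm, float(precision), float(recall)


def loss_per_epoch(X_train, y_train, eval_sets, params=None, n_epochs=200):
    """Train an MLPClassifier one epoch at a time and record the log loss on the
    training set and on each named (X, y) pair in eval_sets after every epoch."""
    clf = MLPClassifier(random_state=0, **(params or {}))
    classes = np.unique(y_train)
    history = {"train": []}
    history.update({name: [] for name in eval_sets})
    for _ in range(n_epochs):
        # Each partial_fit call makes one pass over the training data
        clf.partial_fit(X_train, y_train, classes=classes)
        history["train"].append(
            log_loss(y_train, clf.predict_proba(X_train), labels=classes))
        for name, (X, y) in eval_sets.items():
            history[name].append(log_loss(y, clf.predict_proba(X), labels=classes))
    return clf, history


def plot_history(history, title="Log loss per epoch"):
    """Plot one curve per data set; a widening train/validation gap suggests overfitting."""
    fig, ax = plt.subplots()
    for name, values in history.items():
        ax.plot(range(1, len(values) + 1), values, label=name)
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Log loss")
    ax.set_title(title)
    ax.legend()
    return fig
```

During tuning, pass only the validation set, e.g. `clf, hist = loss_per_epoch(X_train_s, y_train, {"validation": (X_val_s, y_val)}, params=best_params)`. A validation curve that turns upward while the training curve keeps falling is the usual sign of overfitting. For the final run only, add `"test": (X_test_s, y_test)` to `eval_sets` and call `evaluate(y_test, clf.predict(X_test_s))`.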

What are your deliverables?
a. A Jupyter notebook containing your model and experiment results.
