in Statistics·
14 Dec 2022

 

I will pay someone to help me finish this project

TO START: First, select a classification methodology to learn in your project.  Select either Classification Trees or Support Vector Machines.  Those topics are covered in the following chapters in the ISLR text:

(a) Chapter 8, pages 303-331, covers Tree-Based Methods (or CART, short for classification and regression trees).  Only show results for classification trees, not regression trees.

(b) Chapter 9, pages 337-368, covers SVM ("support vector machines").

       In terms of coverage, if you pick Chapter 8, Tree-Based Methods, get through boosting.

       In Chapter 9, SVM, get through the section on support vector machines. 

 

Appendix. After making your choice from above, get introduced to it by walking through the lab example provided in the ISLR text.  Show that work in an Appendix, including any graphs.  On those graphs, insert your name into the titles.  To actually get the data used in the ISLR examples, you will likely need to download an R package called ISLR; it contains the data sets used in the text. 

 

  1. [20 points]  Begin by introducing your reader to the corporation from which your stock data comes.  Tell the reader something you learned about that corporation that you found interesting, something which would demonstrate to a recruiter that you possess curiosity and the ability to employ it.

                Explain to the reader why we took two different transformations of the price data, return and risk. Illustrate your discussion with before-and-after graphics.  Review LN2.A and your LN2 homework as a refresher.

                Then, using trimmed screenshots from Excel where needed, sketch out for the reader how you converted your Yahoo-sourced stock data into a lagged stock risk data set covering the period since 2006. 
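The transformation and lagging steps above can be sketched in a few lines of Python as a cross-check on the Excel work. This is only an illustration under stated assumptions: the prices are made up, the absolute log return is used as a stand-in for "risk" (the course notes LN2 may define it differently), and the function names are hypothetical. The HiLo column is not reproduced because the assignment does not say how it is defined.

```python
import math
import statistics


def log_returns(prices):
    """Log return for each period: ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def lagged_rows(series):
    """Pair each value with its two predecessors: (today, lag1, lag2)."""
    return [(series[t], series[t - 1], series[t - 2])
            for t in range(2, len(series))]


# Made-up closing prices, for illustration only.
prices = [100.0, 101.5, 100.8, 102.3, 103.0, 101.9, 102.7]

returns = log_returns(prices)

# ASSUMPTION: absolute log return as a simple risk proxy.
risk = [abs(r) for r in returns]

rows = lagged_rows(returns)
lag1_std = standardize([row[1] for row in rows])
lag2_std = standardize([row[2] for row in rows])

for (today, _, _), l1, l2 in zip(rows, lag1_std, lag2_std):
    print(f"{today: .5f}  {l1: .3f}  {l2: .3f}")
```

The same three helpers apply unchanged to the risk series by passing `risk` instead of `returns` to `lagged_rows`.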

                                   

  2. [20] First, draw a random sample of size n=300 without replacement from your stock return data set.  Recall that your stock return data contains a HiLo return column and standardized log lag1 and log lag2 return columns.  I will call this your n=300 stock return data set.  Show and explain how this is done.

         Second, draw another random sample of size 300 without replacement, this time from your stock risk data set.  Recall that your stock risk data contains a HiLo risk column and standardized log lag1 and log lag2 columns.  I will call this your n=300 stock risk data set.

                Next, using your n=300 stock return data set, walk through the steps covered by the ISLR text for your chosen method, SVM or CART, explaining in your own words what you are doing.  Put your name into the title of any graphs you show.  Where you are unclear as to what is happening or why it is being done, say so, and document your efforts to work towards understanding.  Do not waste time trying to fake comprehension via plagiarism.

               You are welcome to read up on your chosen method in other sources, including textbooks, academic articles, and the web.  However, carefully credit the sources from which you gain insight, and do not plagiarize them!  It is natural that you will not understand much of what you read, but you can start wrestling with it and you can document that wrestling.

               Finally, run the program on your n=300 stock risk data set and compare the performance to that of your n=300 stock return data set.  Include use of the chi-square test. Discuss the differences, the reasons that these would happen, and the lessons learned about the nature of the stock market.
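The sampling step and the chi-square comparison described above can be sketched as follows. This is a minimal illustration, not the required method: the function names are hypothetical, the counts in the table are made up, and reading the chi-square test as a 2x2 table of data set (return or risk) against forecast outcome (correct or incorrect) is only one possible interpretation of the instructions.

```python
import random


def draw_sample(rows, n, seed):
    """Draw n rows without replacement; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts.

    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Stand-in for the full data set: 1000 row indices.
all_rows = list(range(1000))
sample_300 = draw_sample(all_rows, 300, seed=2022)

# MADE-UP counts: rows are data sets, columns are (correct, incorrect).
table = [
    [78, 72],   # return data, test set of 150
    [105, 45],  # risk data, test set of 150
]
stat = chi_square_2x2(table)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(f"chi-square = {stat:.3f}, exceeds 3.841: {stat > 3.841}")
```

Because `random.sample` never returns the same element twice, the result is a sample without replacement; fixing the seed lets a reader reproduce the exact draw.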

 

  3. [20]  Select one of the tuning parameters or decision criteria that lie beneath the surface of your chosen methodology, SVM or CART.  Engage with it by researching beyond the ISLR text.  Then experiment with it. Experiment with your data and with other data sets. Try decreasing or increasing n.  Look at other sources for help, documenting the sources.  When borrowing text, use quotation marks and footnote the source.  Do not plagiarize text or graphics.    

               Here are some examples of possibilities from Chapter 8 on CART.  On page 312, the Gini index is defined, but what is it?  Can you compute it yourself?  How is the G statistic used after it is computed?  What is it compared against?  Is that a parameter that you can tune?  Also, at the top of that page, the text says that the "...classification error rate is not sufficiently sensitive."  See if you can demonstrate a lack of sensitivity!  See if you can figure out what is meant in this context by "sensitive?"  To what is it not sensitive?  The text also says that entropy is an alternative to Gini and gives similar results.  Do you find that to be true?  

               Here is an example from Chapter 9 on SVM.  On page 346, a parameter denoted C is introduced, but what does it do?  Demonstrate that you worked diligently on this problem: the goal is to engage with the issue, not to produce miracles of comprehension or a plagiarism dump!  Take a hands-on, practical approach: that requires experimentation. 
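For the Chapter 8 questions about the Gini index, entropy, and the sensitivity of the classification error rate, a hand computation along the following lines may be a useful starting point. It is a sketch under assumptions: the function names are hypothetical, and the node counts are not stock data. They follow a two-class illustration given in Hastie, Tibshirani and Friedman's The Elements of Statistical Learning, where 400 observations of each class are split in two different ways.

```python
import math


def class_error(counts):
    """Classification error rate: 1 minus the largest class proportion."""
    total = sum(counts)
    return 1 - max(counts) / total


def gini(counts):
    """Gini index: sum over classes of p * (1 - p)."""
    total = sum(counts)
    return sum((c / total) * (1 - c / total) for c in counts)


def entropy(counts):
    """Entropy: minus the sum over classes of p * ln(p), skipping p = 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def split_impurity(measure, nodes):
    """Average a node-level measure over child nodes, weighted by node size."""
    total = sum(sum(node) for node in nodes)
    return sum(sum(node) / total * measure(node) for node in nodes)


# Each tuple is (class 1 count, class 2 count) in one child node.
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]  # second child node is pure

for name, measure in [("error", class_error),
                      ("gini", gini),
                      ("entropy", entropy)]:
    a = split_impurity(measure, split_a)
    b = split_impurity(measure, split_b)
    print(f"{name:<8} split A = {a:.4f}   split B = {b:.4f}")
```

Both splits have the same weighted classification error, so that criterion cannot tell them apart, while Gini and entropy both come out lower for the split that produces a pure node. That is one concrete sense in which the error rate is "not sufficiently sensitive."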

 

  4. [20] Create classification space plots for both of your n=300 data sets, using your chosen methodology, SVM or CART.  Be sure to explain how you went about this.  Create the plot using the same techniques that we did in our plots for other methods.  Work it out yourself, step by step. 

                 If you are doing SVM, you will find that the SVM software automatically outputs classification space plots.  Do not show me those plots, as they will count ZERO on this assignment.  I am requiring that you create a classification space yourself, step by step.  As we know from trying to do that ourselves, it is not easy.  Evidence of thoughtful and diligent work is more important than getting your plots to work perfectly.
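One way to build a classification space by hand is to lay a grid over the two predictors, ask the fitted classifier for a prediction at every grid point, and then draw the labelled grid. The sketch below shows only that mechanism and is not the technique used in class: `toy_predict` is a hypothetical stand-in for a fitted SVM or tree, the "Hi" and "Lo" labels and the axis ranges are assumptions, and the grid is printed as text where a real plot would colour the points.

```python
def make_grid(lo, hi, steps):
    """Evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (steps - 1)
    return [lo + i * step for i in range(steps)]


def classification_space(predict, xs, ys):
    """Label every (x, y) grid point with the classifier's prediction."""
    return [[predict(x, y) for x in xs] for y in ys]


def toy_predict(lag1, lag2):
    # HYPOTHETICAL rule standing in for a fitted model's predict step.
    return "Hi" if lag1 + 0.5 * lag2 > 0 else "Lo"


# Standardized lags mostly fall within about -3 to 3.
xs = make_grid(-3.0, 3.0, 31)  # lag1 along the horizontal axis
ys = make_grid(-3.0, 3.0, 13)  # lag2 along the vertical axis

space = classification_space(toy_predict, xs, ys)

# Print the top row (largest lag2) first so the picture is oriented
# like a normal plot.
for row in reversed(space):
    print("".join("#" if label == "Hi" else "." for label in row))
```

Swapping `toy_predict` for a function that calls the fitted model, and the print loop for a scatter plot coloured by label, gives a plot that was built step by step instead of by the software's built-in plotting routine.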

                                   

  5. [20]  Prepare a comparative study of knn, naive Bayes, logistic regression, and your selected method.  Make this comparison on your two n=300 data sets, splitting the data randomly in half to get the training and testing sets.

        Explain what you are doing as you go along, explain what you understand about what distinguishes the methods, discuss reasons why the results vary, and why there might be systematic differences in performance between return data and risk data.  Show classification space plots for knn, naïve Bayes, and logistic regression.  At the end, show a single table in which you summarize the overall correct forecast rate for the stock returns for the four methods; then another table summarizing the performance on the stock risk data.
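The bookkeeping for the comparison (a random half split, one fit per method, and a table of correct forecast rates) might be organized as below. This is a sketch under assumptions: the data are synthetic, the function names are hypothetical, and only knn is implemented, with a majority-class baseline shown for scale. Naive Bayes, logistic regression, and the chosen method (SVM or CART) would be added to `methods` as further fit functions of the same shape.

```python
import random
from collections import Counter


def split_in_half(rows, seed):
    """Shuffle a copy of the rows and cut it into training and testing halves."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fit_knn(train, k=5):
    """Return a predictor that votes among the k nearest training points."""
    def predict(point):
        nearest = sorted(
            train,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
        )[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda point: majority


def correct_rate(predict, test):
    """Share of test rows whose label is predicted correctly."""
    hits = sum(predict(x) == y for x, y in test)
    return hits / len(test)


# SYNTHETIC data: two standardized predictors and a Hi/Lo label.
rng = random.Random(0)
rows = []
for _ in range(300):
    x = (rng.gauss(0, 1), rng.gauss(0, 1))
    label = "Hi" if x[0] + rng.gauss(0, 0.5) > 0 else "Lo"
    rows.append((x, label))

train, test = split_in_half(rows, seed=1)

methods = {"knn (k=5)": fit_knn, "majority baseline": fit_majority}
results = {name: correct_rate(fit(train), test)
           for name, fit in methods.items()}

print(f"{'method':<20}{'correct rate':>12}")
for name, rate in results.items():
    print(f"{name:<20}{rate:>12.3f}")
```

Running the same loop once on the n=300 return data and once on the n=300 risk data yields the two summary tables the assignment asks for.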

                       
