

Stephens CPU Classification Prob

Last edited by Stephen O'Connell 10 years, 9 months ago


Presentation and demo code/data from last night's meeting (2010-07-08). The presentation has been updated with the output of the models and samples. The demo.zip archive includes the code we reviewed last night, with some changes so that it should run top-down from the directory where demo.zip is expanded. I have included the R console log output in case R is not available. Let me know if you have any questions.
















Comments (5)

stephen.oconnell said

at 4:39 pm on Jun 26, 2010

Mike, thanks for putting the files out there. I got distracted yesterday and then spent all day raking pine needles; my hands are killing me... I have converted the files to CSV to make things more straightforward for the MATLAB and Python folks.

stephen.oconnell said

at 5:26 pm on Jun 26, 2010

I would suggest using the CSV files and not the Rdget files. Three of the samples had an 'x' for a classification, which is invalid in this context. I have assigned them to the right class in the CSV files. If you want to use the Rdget files, run the following code after loading the labels_y.Rdget file:

lbl <- dget("labels_y.Rdget")

lbl$Y[lbl$SAMPLE == 'SAMPLE_49'] <- '5'
lbl$Y[lbl$SAMPLE == 'SAMPLE_786'] <- '5'
lbl$Y[lbl$SAMPLE == 'SAMPLE_1250'] <- '3'
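
The same correction can be written so that it also checks that no invalid labels remain. This is only a minimal sketch on toy data: it assumes, as the code above implies, that the loaded object is a data frame with SAMPLE and Y columns. The toy rows and the `fixes` lookup vector are hypothetical and introduced here for illustration.

```r
# Toy stand-in for the object returned by dget("labels_y.Rdget").
# Assumption: a data frame with SAMPLE and Y columns, as the code above implies.
lbl <- data.frame(
  SAMPLE = c("SAMPLE_48", "SAMPLE_49", "SAMPLE_786", "SAMPLE_1250"),
  Y      = c("2", "x", "x", "x"),
  stringsAsFactors = FALSE
)

# Corrections taken from the comment above, keyed by sample name.
fixes <- c(SAMPLE_49 = "5", SAMPLE_786 = "5", SAMPLE_1250 = "3")

# Overwrite Y only for the samples listed in `fixes`.
idx <- match(names(fixes), lbl$SAMPLE)
lbl$Y[idx] <- unname(fixes)

# Confirm that only the valid classes 1-6 remain.
stopifnot(all(lbl$Y %in% as.character(1:6)))
```

With the real file, the toy data frame would be replaced by the `dget` call shown above; the final check then fails loudly if any other sample still carries an 'x'.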

stephen.oconnell said

at 10:12 pm on Jun 26, 2010

I have uploaded a version of the problem presentation. I had to grey a few things out since it is being posted to the web...

mike@mbowles.com said

at 6:03 am on Jun 27, 2010

I've uploaded my copies of Stephen's CPU data. There are three files.
1. labels_y.Rdget - a vector of class labels, 1-6, in accord with Stephen's presentation in class.
2. avg_matrix.Rdget.tar.gz - a matrix where each row contains 130 measurements of the average daily load for a single server.
3. p95_matrix.Rdget.tar.gz - a matrix where each row contains 130 measurements of the 95th percentile load for a single server.

The ith entry of the label vector and the ith row of each of the two matrices of daily load measurements all correspond to the same server, so we can use the labels in conjunction with either or both of the matrices.
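
Because row i refers to the same server in every file, it is worth confirming that the row counts agree before combining anything. The sketch below illustrates that check on toy data written with `dput` and read back with `dget`. The toy values, the temporary file names, and the assumption that the unpacked archives can be read directly with `dget` are all illustrative and not taken from the uploaded files.

```r
set.seed(1)
n <- 4  # toy number of servers; the real files hold many more

# Toy stand-ins: 130 daily measurements per server, plus one label per server.
avg <- matrix(runif(n * 130), nrow = n)
p95 <- avg + matrix(runif(n * 130), nrow = n)
y   <- c("1", "3", "5", "6")

# Round-trip through dput/dget, the format the .Rdget files appear to use.
f_avg <- tempfile(fileext = ".Rdget"); dput(avg, f_avg)
f_p95 <- tempfile(fileext = ".Rdget"); dput(p95, f_p95)
f_y   <- tempfile(fileext = ".Rdget"); dput(y, f_y)
avg_in <- dget(f_avg)
p95_in <- dget(f_p95)
y_in   <- dget(f_y)

# Row i must refer to the same server in all three objects.
stopifnot(nrow(avg_in) == length(y_in), nrow(p95_in) == length(y_in))

# Use either matrix alone, or both side by side as 260 features per server.
x_both <- cbind(avg_in, p95_in)
dim(x_both)
```

Binding the two matrices column-wise keeps the row order intact, so `y_in[i]` still labels row i of `x_both`.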

mike@mbowles.com said

at 6:39 am on Jun 27, 2010

I just uploaded the .R file that runs Friedman's boosted tree program, MART. The two commands at the end of that file


have to be run separately to generate the prediction and the table. I'm not sure why.

You cannot install MART using the normal "Load Package" option. To download MART, go to http://www.stanford.edu/class/stats315b/, which is Prof. Friedman's course page for 315b, and click on "Homework 2". That page gives the download URLs and instructions. I've also uploaded the help files for MART and some of its related programs.
