Presentation and demo code/data from last night's meeting (2010-07-08). The presentation has been updated with the output of the models and samples. demo.zip includes the code we reviewed last night, with some changes so that it should run top to bottom from the directory where demo.zip is expanded. I have included the R console log output in case R is not available. Let me know if you have any questions.
20100708_UPDATED_cs229_class_project.pdf
demo.zip
labels_y.csv
p95.csv
avg.csv
MartCpu.R
R_MartHelp.txt
MART_tutorial.pdf
cs229_class_project.pdf
p95_matrix.Rdget.tar.gz
avg_matrix.Rdget.tar.gz
labels_y.Rdget
Comments (5)
stephen.oconnell said
at 4:39 pm on Jun 26, 2010
Mike, thanks for putting the files out there. I got distracted yesterday and then was raking pine needles all day, my hands are killing... I have converted the files to CSV to make things more straightforward for the MATLAB and Python folks.
stephen.oconnell said
at 5:26 pm on Jun 26, 2010
I would suggest using the CSV files and not the Rdget files. There was an issue with 3 of the samples having an 'x' for a classification, which is invalid in this context. I have assigned them to the right class in the CSV files. If you want to use the Rdget files, run the following code to load labels_y.Rdget and correct those labels:
# LOAD FILE INTO lbl
lbl <- dget("labels_y.Rdget")
# FIX THE 'x' SAMPLES
lbl$Y[lbl$SAMPLE == 'SAMPLE_49'] <- '5'
lbl$Y[lbl$SAMPLE == 'SAMPLE_786'] <- '5'
lbl$Y[lbl$SAMPLE == 'SAMPLE_1250'] <- '3'
stephen.oconnell said
at 10:12 pm on Jun 26, 2010
I have uploaded a version of the problem presentation. I had to grey a few things out since it is being posted to the web...
mike@mbowles.com said
at 6:03 am on Jun 27, 2010
I've uploaded my copies of Stephen's CPU data. There are three files.
1. labels_y.Rdget - a vector of class labels, 1-6, in accord with Stephen's presentation in class.
2. avg_matrix.Rdget.tar.gz - a matrix where each row contains 130 measurements of the average daily load for a single server.
3. p95_matrix.Rdget.tar.gz - a matrix where each row contains 130 measurements of the 95th percentile load for a single server.
The ith row of the class-label vector and the ith rows of the two daily-load matrices all correspond to the same server, so we can use the labels in conjunction with either or both of the matrices.
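The row alignment described above can be sketched in R as follows. This is only an illustration on toy stand-ins: the object names `lbl`, `avg`, `p95`, `X`, and `y` are assumptions, the values are random, and the `SAMPLE`/`Y` column layout is taken from the fix-up snippet earlier in the comments. The real files would be loaded with `dget()` after unpacking the archives.

```r
# Toy stand-ins for the real objects (shapes assumed; values are random, not the actual data)
set.seed(1)
n <- 6
lbl <- data.frame(SAMPLE = paste0("SAMPLE_", 1:n),
                  Y = c("1", "2", "3", "4", "5", "6"),
                  stringsAsFactors = FALSE)
avg <- matrix(runif(n * 130), nrow = n, ncol = 130)  # average daily load
p95 <- matrix(runif(n * 130), nrow = n, ncol = 130)  # 95th percentile load

# Row i of each object describes the same server, so the row counts must agree
stopifnot(nrow(avg) == nrow(lbl), nrow(p95) == nrow(lbl))

# Use either matrix alone, or bind both column-wise for 260 features per server
X <- cbind(avg, p95)
y <- as.integer(lbl$Y)
print(dim(X))
```

Checking the row counts before binding is a cheap guard against a label vector and a matrix drifting out of step.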
mike@mbowles.com said
at 6:39 am on Jun 27, 2010
I just uploaded the .R file that will run Friedman's boosted tree program - MART. The two commands at the end of that file
yp<-martpred(X)
table(yp,y)
have to be run separately to generate the prediction and the table. I'm not sure why.
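For anyone unfamiliar with what the `table(yp,y)` call produces, here is a minimal sketch using base R only. The label and prediction vectors are made-up values for eight hypothetical servers, not MART output, and `cm` and `accuracy` are names introduced just for this illustration.

```r
# Hypothetical true labels and predictions for eight servers (made-up values)
y  <- c(1, 1, 2, 2, 3, 3, 3, 1)
yp <- c(1, 2, 2, 2, 3, 3, 1, 1)

# Cross-tabulate: rows are predicted classes, columns are true classes
cm <- table(yp, y)
print(cm)

# Fraction classified correctly: the diagonal holds the matching pairs
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Off-diagonal cells show which classes are being confused with which. Note that `diag()` only gives the correct counts when the table is square, that is, when every class appears in both vectors.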
You cannot download MART using the normal "Load Package". To download MART, go to Prof. Friedman's course page for 315b at http://www.stanford.edu/class/stats315b/ and click on "Homework 2". That gives download URLs and instructions. I've also uploaded the help files for MART and some of its related programs.