<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://training-course-material.com/index.php?action=history&amp;feed=atom&amp;title=Machine_Learning_with_R</id>
	<title>Machine Learning with R - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://training-course-material.com/index.php?action=history&amp;feed=atom&amp;title=Machine_Learning_with_R"/>
	<link rel="alternate" type="text/html" href="https://training-course-material.com/index.php?title=Machine_Learning_with_R&amp;action=history"/>
	<updated>2026-05-13T23:38:49Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.1</generator>
	<entry>
		<id>https://training-course-material.com/index.php?title=Machine_Learning_with_R&amp;diff=24675&amp;oldid=prev</id>
		<title>Bernard Szlachta at 00:37, 17 February 2015</title>
		<link rel="alternate" type="text/html" href="https://training-course-material.com/index.php?title=Machine_Learning_with_R&amp;diff=24675&amp;oldid=prev"/>
		<updated>2015-02-17T00:37:16Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;[[Category:R]]&lt;br /&gt;
[[Category:Machine Learning]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;slideshow style=&amp;quot;nobleprog&amp;quot; headingmark=&amp;quot;⌘&amp;quot; incmark=&amp;quot;…&amp;quot; scaled=&amp;quot;false&amp;quot; font=&amp;quot;Trebuchet MS&amp;quot; &amp;gt;&lt;br /&gt;
;title: Introduction to R with exercises&lt;br /&gt;
;author: MIHALY BARASZ for NobleProg Ltd&lt;br /&gt;
&amp;lt;/slideshow&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== TABLE OF CONTENTS ⌘==&lt;br /&gt;
* Sources and further reading&lt;br /&gt;
* Machine Learning vs. Statistical Learning&lt;br /&gt;
* Linear regression&lt;br /&gt;
* Exercise for linear regression&lt;br /&gt;
* Exercise for linear regression (contd.)&lt;br /&gt;
* R best practices&lt;br /&gt;
* Logistic regression&lt;br /&gt;
* Testing, cross-validation&lt;br /&gt;
* Classification exercise&lt;br /&gt;
* Presenting the results&lt;br /&gt;
* Deploying your results&lt;br /&gt;
* Generalized Linear Models&lt;br /&gt;
* Generalized Linear Model (cont.)&lt;br /&gt;
* Regularization&lt;br /&gt;
* Regularization more generally&lt;br /&gt;
* Regularization exercise&lt;br /&gt;
* Tree based methods&lt;br /&gt;
* Unsupervised learning&lt;br /&gt;
* Principal components analysis&lt;br /&gt;
* Clustering&lt;br /&gt;
* K-means clustering&lt;br /&gt;
== 1 SOURCES AND FURTHER READING ⌘==&lt;br /&gt;
Source materials&lt;br /&gt;
* “An Introduction to Statistical Learning”&lt;br /&gt;
** Available for free in PDF form online&lt;br /&gt;
** Online course by Trevor Hastie and Rob Tibshirani&lt;br /&gt;
* Andrew Ng&amp;#039;s “Machine Learning” online course&lt;br /&gt;
Further reading&lt;br /&gt;
* “Think Stats” and “Think Bayes”&lt;br /&gt;
**both by Allen B. Downey&lt;br /&gt;
**both available for free online&lt;br /&gt;
**programming in Python&lt;br /&gt;
== 2 MACHINE LEARNING VS. STATISTICAL LEARNING ⌘==&lt;br /&gt;
* Different origins&lt;br /&gt;
* Different focus&lt;br /&gt;
* Highly convergent in recent years&lt;br /&gt;
== 3 LINEAR REGRESSION ⌘==&lt;br /&gt;
The simplest model for estimating a numerical response&lt;br /&gt;
Y=β0+β1X1+β2X2+⋯+βpXp+ε&lt;br /&gt;
Details&lt;br /&gt;
* Understanding the results&lt;br /&gt;
* Assessing the accuracy&lt;br /&gt;
* Interpreting the coefficients&lt;br /&gt;
* Understanding factors&lt;br /&gt;
* Adding higher-order terms and interactions&lt;br /&gt;
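The points above can be sketched with base R's lm(). A minimal example using the built-in mtcars data; the choice of variables is purely illustrative, not from the course materials:&lt;br /&gt;

```r
# Multiple linear regression: mpg modelled by weight and horsepower
fit = lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table, standard errors, R-squared: "understanding the results"
summary(fit)

# Higher-order term (I(wt^2)) and an interaction (wt:hp via wt * hp)
fit2 = lm(mpg ~ wt * hp + I(wt^2), data = mtcars)
coef(fit2)
```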
== 4 EXERCISE FOR LINEAR REGRESSION ⌘==&lt;br /&gt;
* Data file: Advertising.csv (from ISLR)&lt;br /&gt;
* Multivariate linear regression&lt;br /&gt;
**Which variables are important?&lt;br /&gt;
**Do the others have any predictive power?&lt;br /&gt;
**How much precision do we lose by dropping the &amp;quot;unimportant&amp;quot; variables?&lt;br /&gt;
== 5 EXERCISE FOR LINEAR REGRESSION (CONTD.) ⌘==&lt;br /&gt;
* Interactions between variables&lt;br /&gt;
* Regression with all interactions&lt;br /&gt;
**Comparing results&lt;br /&gt;
**What are interactions&lt;br /&gt;
**Visualizing interactions&lt;br /&gt;
&lt;br /&gt;
== 6 R BEST PRACTICES ⌘==&lt;br /&gt;
* Organizing your work (and data)&lt;br /&gt;
* Reusable work&lt;br /&gt;
* Plotting&lt;br /&gt;
* Learning&lt;br /&gt;
== 7 LOGISTIC REGRESSION ⌘==&lt;br /&gt;
Response is categorical: Yes or No.&lt;br /&gt;
f(X)=β0+β1X1+β2X2+⋯+βpXp&lt;br /&gt;
Find a suitable f(X) and classify to Yes if f(X)&amp;gt;0 and to No otherwise. What is a good f?&lt;br /&gt;
* Minimizes the training error? Not fine-grained enough; hard to optimize for.&lt;br /&gt;
* Map f(X) to probabilities and maximize for the likelihood of training data.&lt;br /&gt;
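The second approach is exactly what glm() with family = binomial does: it maps f(X) to a probability via the logistic function and maximises the likelihood of the training data. A sketch on simulated data (all names and coefficients here are illustrative):&lt;br /&gt;

```r
set.seed(1)
n  = 200
x1 = rnorm(n)
x2 = rnorm(n)

# Simulate data whose log-odds really are linear in the predictors
p = plogis(-0.5 + 1.2 * x1 - 0.8 * x2)
y = rbinom(n, 1, p)

# Fit by maximum likelihood
fit = glm(y ~ x1 + x2, family = binomial)

# Classify to Yes when the fitted probability exceeds 0.5,
# i.e. when f(X) is positive
pred = ifelse(predict(fit, type = "response") > 0.5, "Yes", "No")
table(pred, y)
```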
== 8 TESTING, CROSS-VALIDATION ⌘==&lt;br /&gt;
* Training vs. Test-set performance&lt;br /&gt;
* Bias-Variance trade-off (under/overfitting)&lt;br /&gt;
* Strategies for estimating test error; Cross-Validation&lt;br /&gt;
* Bootstrap&lt;br /&gt;
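A hand-rolled k-fold cross-validation loop makes the idea concrete: hold out one fold, fit on the rest, and average the held-out errors. A sketch in base R, again using mtcars as a stand-in data set:&lt;br /&gt;

```r
set.seed(2)
k     = 5
folds = sample(rep(1:k, length.out = nrow(mtcars)))
mse   = numeric(k)

for (i in 1:k) {
  train = mtcars[folds != i, ]
  held  = mtcars[folds == i, ]
  fit   = lm(mpg ~ wt + hp, data = train)
  # Error measured on data the model never saw
  mse[i] = mean((held$mpg - predict(fit, newdata = held))^2)
}

mean(mse)  # cross-validated estimate of the test MSE
```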
== 9 CLASSIFICATION EXERCISE ⌘==&lt;br /&gt;
* Data: &amp;quot;defaulters&amp;quot; from the ISLR package&lt;br /&gt;
== 10 PRESENTING THE RESULTS ⌘==&lt;br /&gt;
* Session in R Markdown&lt;br /&gt;
== 11 DEPLOYING YOUR RESULTS ⌘==&lt;br /&gt;
* Exporting a model to a spreadsheet&lt;br /&gt;
* Porting to a different programming environment&lt;br /&gt;
* Using R as a library&lt;br /&gt;
* Deploying R applications to web&lt;br /&gt;
**Shiny: http://shiny.rstudio.com/&lt;br /&gt;
== 12 GENERALIZED LINEAR MODELS ⌘==&lt;br /&gt;
* What&amp;#039;s common in linear regression and logistic regression?&lt;br /&gt;
* How do they fit under one common assumption?&lt;br /&gt;
* What is the family parameter in glm?&lt;br /&gt;
== 13 GENERALIZED LINEAR MODEL (CONT.) ⌘==&lt;br /&gt;
* Common underlying assumption: a linear function of the predictors determines the distribution of the response.&lt;br /&gt;
* The parameters of the linear function are determined in a way to maximize the likelihood of the observations.&lt;br /&gt;
f(X)=β0+β1X1+β2X2+⋯+βpXp&lt;br /&gt;
For example, given the value of predictors X we assume that the distribution of the response depends only on f(X):&lt;br /&gt;
* Linear regression: N(f(X),σ2) with a constant σ2 (its value doesn&amp;#039;t matter)&lt;br /&gt;
* Two class classification: binomial, with the probability of the Yes class being p, where log(p/(1−p))=f(X)&lt;br /&gt;
Deviance: twice the negative log-likelihood. This is what we actually minimize in practice. In the case of linear regression…&lt;br /&gt;
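The deviance can be read straight off a fitted glm object. For the gaussian family (i.e. linear regression) it coincides, up to an additive constant, with the residual sum of squares, which is why minimising deviance and minimising squared error agree there. A sketch, assuming the mtcars data:&lt;br /&gt;

```r
# Linear regression expressed as a GLM with the gaussian family
fit = glm(mpg ~ wt + hp, data = mtcars, family = gaussian)

# For the gaussian family the deviance equals the residual sum of squares
deviance(fit)
sum(residuals(fit)^2)
```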
== 14 REGULARIZATION ⌘==&lt;br /&gt;
* Prediction accuracy, especially when p&amp;gt;n (more predictors than observations).&lt;br /&gt;
* Model interpretability: removes irrelevant features. Feature selection.&lt;br /&gt;
== 15 REGULARIZATION MORE GENERALLY ⌘==&lt;br /&gt;
Methods&lt;br /&gt;
* Subset selection&lt;br /&gt;
* Shrinkage (a.k.a. regularization): ridge regression, the lasso&lt;br /&gt;
* Dimension reduction: principal components regression; partial least squares.&lt;br /&gt;
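Shrinkage is easy to see from the closed form of ridge regression, which adds lambda to the diagonal of X'X before solving the normal equations. A base-R sketch (in practice one would use a package such as glmnet; the predictors chosen here are illustrative):&lt;br /&gt;

```r
# Ridge regression via its closed form: beta = (X'X + lambda*I)^{-1} X'y
X = scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))
y = mtcars$mpg - mean(mtcars$mpg)

ridge = function(lambda) {
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}

# lambda = 0 recovers ordinary least squares;
# larger lambda shrinks the coefficients towards zero
cbind(ols = ridge(0), shrunk = ridge(50))
```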
== 16 REGULARIZATION EXERCISE ⌘==&lt;br /&gt;
* Data: regul.csv&lt;br /&gt;
== 17 TREE BASED METHODS ⌘==&lt;br /&gt;
* Decision trees&lt;br /&gt;
* Random forests (bagging, the bootstrap)&lt;br /&gt;
* Boosting&lt;br /&gt;
== 18 UNSUPERVISED LEARNING ⌘==&lt;br /&gt;
* Reasons, goals&lt;br /&gt;
* Methods&lt;br /&gt;
== 19 PRINCIPAL COMPONENTS ANALYSIS ⌘==&lt;br /&gt;
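A minimal PCA sketch with base R's prcomp(), again on the mtcars data as a stand-in; scaling the variables first is usually advisable when they are measured in different units:&lt;br /&gt;

```r
# Principal components of the (all-numeric) mtcars variables
pc = prcomp(mtcars, scale. = TRUE)

# Proportion of variance explained by each component
summary(pc)$importance["Proportion of Variance", ]

# Scores of the observations on the first two components
head(pc$x[, 1:2])
```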
== 20 CLUSTERING ⌘==&lt;br /&gt;
* Goals&lt;br /&gt;
* Examples&lt;br /&gt;
* Challenges&lt;br /&gt;
== 21 K-MEANS CLUSTERING ⌘==&lt;br /&gt;
Demonstration of R magic&lt;/div&gt;</summary>
		<author><name>Bernard Szlachta</name></author>
	</entry>
</feed>