[r] resources, tips and what-not - for Curtis


Code to Remember - should probably have these memorized by now.

[Image: Linetypes and Shape Types in R]



Example of a for loop
just a reminder of the basic syntax

for (i in VECTOR)
 {
    WHATNOT HAPPENS with "i"
}
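
A minimal runnable version of the template above, just as a sketch (the vector and the running total are made-up filler, not from any real script):

```r
# Add up the values in a vector, one at a time.
total <- 0
for (i in c(2, 4, 6)) {
  total <- total + i   # "i" takes each value of the vector in turn
}
print(total)
```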



Example of while loop
while (i != TRUE) {
    WHATNOT HAPPENS with "i"
        #keeps doing this until i == TRUE,
        #i.e. if i == TRUE, the loop stops
}



Example of if statement

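  if ( SOME STATEMENT THAT IS TRUE OR FALSE ) {
      IF TRUE, DO THIS STUFF
  } else {
      # "else" has to sit on the same line as the closing "}" of the if block,
      # otherwise R throws a syntax error when run at the top level
      IF THAT STATEMENT WAS FALSE, THEN DO THIS
  }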



Example of if and else if statements
I like organizing my multiple if statements like this:

A<-2 
if(A==1){B<-1} else 
if(A==2){B<-2} else
if(A==3){B<-3}

    # this means that B is assigned 2


Multiple Conditional Statements


    AND:
if (x > 2 & x < 5) { do this }    # x is between 2 and 5

    
    OR:
if (x < 2 | x > 5) { do this }    # x is outside 2 to 5 (note: x>2 | x<5 would always be TRUE)
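
Worth remembering: & and | work element by element on whole vectors, while if() only wants a single TRUE/FALSE, which is what && and || are for. A small sketch of the difference (x and v are made-up values):

```r
x <- 3
in_range     <- x > 2 && x < 5     # single TRUE/FALSE, the usual choice inside if()
out_of_range <- x < 2 || x > 5

v <- c(1, 3, 6)
v_in_range <- v > 2 & v < 5        # one TRUE/FALSE per element of v

if (in_range) print("x is between 2 and 5")
print(v_in_range)
```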



In the plot() call below, the first argument is a vector to be plotted on the x axis (from minimum to max), and the second is the matching vector of y values, which places those points vertically. type="n" sets up the axes without drawing the points (handy when you add lines() afterwards). ylim sets the range of the y-axis (useful, but not totally necessary). main sets the main title. 

plot(x, y, type="n", main=paste(X1,"- Continuous Time - Color=Player",sep=" "), ylim=c(0,1),
         ylab="action choice", xlab="time"
    )


Add data on top of a PLOT
lines() -- adds lines on top of existing plot. 


Hist - hist - Histogram basics

Example: rolling two d4 dice
The following are possible: c(2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8)
x = c(2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8)
hist(x)

this produces a nice histogram

Adding Breaks to Histogram: hist(x, breaks=n), where a single number "n" only suggests a number of bins. It doesn't actually set it. To control the bins, you'll want to designate a vector of breaks. These set the start and end points for each bin. 

breaks = c(0,1,2,3,4,5,6,7,8,9)
hist(x,breaks)

produces the hist you wanted!
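
To check what the breaks actually did without squinting at the picture, you can ask hist() for the counts. A sketch, assuming you want one bin per whole number (the half-integer breaks are my choice here, not something the notes above require):

```r
x <- c(2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8)

# Breaks at 1.5, 2.5, ..., 8.5 put each possible total (2 to 8) in its own bin.
breaks <- seq(1.5, 8.5, by = 1)

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(x, breaks = breaks, plot = FALSE)
print(h$counts)   # number of observations in each bin
print(h$mids)     # bin midpoints
```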

Density Hist (instead of the frequency hist above)
hist(BMI, freq=FALSE, main="Density plot")



ggplot2 Hist


HistGram <- qplot(VARIABLE, data=mydata, geom="histogram",binwidth=1)

ggplot2 relative frequency histogram (harder than it should be!!!)
ggplot(dataframe, aes(x=x)) + 
  geom_histogram(aes(y=..count../sum(..count..)),
                 binwidth=0.05,
                 colour="#808080") +
  geom_vline(xintercept = c(0.25,0.75),
             colour="#808080",
             linetype="longdash") +
  xlim(0,1) +
  #facet_wrap(~ treatment) + #adds mult hists
  theme(strip.text.x = element_text(size = 12)) + #nice look
  labs(title = "", x="X-Axis Label", y="relative frequency")



Adding a LINE to a plot or hist - abline
abline(h = 0.05)
#adds a horizontal line at 0.05

abline(v = 1.5)
#adds a vertical line at 1.5




Adding a subtitle to a PLOT or histogram
This adds the following just below the main title, and just above the plot/histogram. 
I've made it give summary statistics too: the average payoffs of the players. 
  mtext((paste
         ("Red Payoffs: ",round(payoffs[1],digits=2),
          " - Blue Payoffs: ",round(payoffs[2],digits=2),
          " - Black Payoffs: ",round(payoffs[3],digits=2),
          " - Orange Payoffs: ",round(payoffs[4],digits=2))  
          )
        )



More complicated text for plot title and elsewhere
It lets you put in basic notation, and reports numeric values in the chart itself. 


Special Plot/Hist/Graph Layouts
layout() - a little hard to explain - create a matrix, (3x3 below) and indicate where each graph/hist/plot is to go.  

layout(matrix(c(1,1,1,2,3,4,5,5,5), 3, 3, byrow = TRUE))
     


layout(matrix(c(layout), nrow, ncols, byrow = TRUE))




data.frame()
Create a matrix with a header, column label "names". 
data<-data.frame(cbind(secondsLeft=0,subject=1:4),strategy0=sample(100, 4, replace=TRUE),payoff=0)

  secondsLeft subject strategy0 payoff
1           0       1        75      0
2           0       2        96      0
3           0       3        75      0
4           0       4        58      0


Matrix Operations Stuff
Create:
a <- matrix(1,2,4)
#creates row:2 by column:4 matrix, values 1

Call a Row
a[1,]
#this called row 1

Call a Column
a[,3]
#this called column 3

Call multiple columns
a[,2:4]
#this called columns 2, 3 & 4 - so it returns a matrix of 3 columns


Convert String Name Into Callable Variable 
i.e. given a variable or data frame name in string form ("a"), call the variable a. 
Example: 
a=c(1,2,3,4)

get("a")
[1] 1 2 3 4
~surprisingly useful!


Sorting matrix by multiple (or one) columns


SET-UP
DF <- data.frame(col1 = c(1,2,3,4,5,6,7,8), 
                 col2 = c("a","a","b","b","z","z","m","n"), 
                 col3 = c(2,3,1,8,5,3,2,9))

Here, sort DF by column 2 alphabetically first, then break ties by column 3 descending
DF[with(DF,order(col2, -col3)),]

Here, sort DF by column 3 ascending. 
DF[with(DF,order(col3)),]
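
To see which key wins when sorting on two columns, here is the same set-up run end to end (a sketch; it just re-uses the DF from above and looks at where col1 ends up):

```r
DF <- data.frame(col1 = c(1,2,3,4,5,6,7,8),
                 col2 = c("a","a","b","b","z","z","m","n"),
                 col3 = c(2,3,1,8,5,3,2,9))

# col2 is the main sort key (alphabetical); -col3 only breaks ties (descending).
sorted <- DF[order(DF$col2, -DF$col3), ]
print(sorted)
```

Within each col2 group the larger col3 comes first, but the groups themselves stay in alphabetical order.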





What does "lty" mean? 
lty      "line type"
      seven pre-set styles, specified by either their integer or name  (default = “solid”)
 0 blank 
 1 solid 
 2 dashed 
 3 dotted 
 4 dotdash 
 5 longdash 
 6 twodash 


Export - write to csv file
Step 1. Set your working directory (the .csv file will be sent there). 
Step 2. Make sure the data you want to export is structured in a way the csv file can deal with (i.e., an n x m matrix, perhaps with a names row). 



write.table(data,file="NameMe.csv",sep=",",row.names=FALSE)

# "data" is the matrix or data.frame you want to export. You can also pass a subset() or a data[2:4] subset. 
# file="NameMe.csv" is the name of the output file
# sep="," saves it as a comma separated .csv file (other options: tab, space etc.)
# row.names=FALSE removes the row number index column from the output file. You want this. 

Post Script
Error: "cannot open file 'BlahBlah.csv': Permission denied" probably means you have that output file opened. 
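
A quick round trip to convince yourself the export worked. This is only a sketch: the data frame is made up, and it writes to tempdir() instead of the working directory so nothing gets left behind.

```r
out  <- data.frame(id = 1:3, score = c(0.5, 0.75, 1))
path <- file.path(tempdir(), "NameMe.csv")

# Same call as above: comma separated, no row-number column.
write.table(out, file = path, sep = ",", row.names = FALSE)

# Read it straight back in and compare.
back <- read.csv(path)
print(back)
```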

Subset with Multiple "IN" statements
#Where L1, L2 and L3 could be numbers, or characters (depending on the 
# vector type of ColName)

subset(data, subset = ColName %in% c(L1,L2,L3))

    



Import data from Google Docs
Step 1. Go to google docs file
            >> File > Publish to web
            >>  Sheets to publish > 1 Sheet
             >  Get a link to the published data > CSV (comma.....)
            >> copy and paste the full URL
e.g. "https://docs.google.com/spreadsheet/pub?key=0AqJiu95E6EGfdEVSeE80Y1dPUmZDbkRMZ1JRbzlGd2c&output=csv"
Let's call that URL just URL.
Step 2. Make sure you have the RCurl package installed. install.packages("RCurl")


library(RCurl)

options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))

Myurl <- getURL("URL_see_step_1_above")
MyData <- read.csv(textConnection(Myurl))


And now MyData is in R. 

Post Scripts 
(1)  The options(...) line may need to change as Google Docs changes its standards. This worked as of March 2013. Note that ssl.verifypeer = FALSE turns off SSL certificate checking. 
(2)  Keep in mind that as you update your Google doc, the data behind that URL isn't updated. The URL you get in step 1 points to the version of the doc from when you hit "publish" (at least that's been the case in my experience). 
Thus you may need to re-publish your Google doc each time you want to import an updated version of it into R. 



Return Multi-Line Paste Text
Suppose you have a function, and you want to return/print/paste something over multiple lines:
cat()

  cat(    
    paste(
      paste("ID is:",Variable_1, sep=" "),
      paste("Name is ",Variable_2, sep=" "),
      paste("Some other next line:",Variable_3 ,sep=" "),
            sep=" \n"
    )
  )
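
The same idea wrapped in a tiny function, as a sketch (describe(), id and name are hypothetical names just for illustration):

```r
describe <- function(id, name) {
  # Build one string with a newline between the pieces...
  msg <- paste(paste("ID is:", id),
               paste("Name is:", name),
               sep = "\n")
  cat(msg, "\n")   # ...and cat() prints it across two lines
  invisible(msg)   # hand the string back without printing it a second time
}

msg <- describe(7, "Curtis")
```

print(msg) would show the "\n" literally; cat() is what turns it into an actual line break.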



Convert a Table into a nice clean matrix
As in, the data the hist() function uses to make a histogram.
If you check out the table() function, it returns its own class ("table"), not a plain matrix or data frame. 


as.data.frame(mytable)
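
What that looks like on a small made-up vector (a sketch; "rolls" is just example data):

```r
rolls  <- c(2, 3, 3, 4, 4, 4)
counts <- as.data.frame(table(rolls))
print(counts)   # one row per distinct value, with a "Freq" column of counts

# The value column comes back as a factor; go through as.character()
# to get the original numbers back.
values <- as.numeric(as.character(counts$rolls))
```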



plyr package 
a package for all sorts of sorting subsetting and stuffs

install
install.packages("plyr")
library(plyr)


plyr arrange() - sort, and multisort
for my purposes, a function to sort data via column. 
arrange(data,...)
data is the data.frame you want to work with. 
... is the rest: the columns to sort by, separated by commas. The first column listed is the main sort; later columns only break ties. 
     default sort is ascending (A=>Z or -100=>0=>100)
     to select descending sort, desc(ColName)

Example
dd <- data.frame(b = factor(c("Aa", "Ba", "Ba", "Zz"), 
                 levels = c("Aa", "Ba", "Zz"), ordered = TRUE), 
                 x = c("A", "D", "A", "C"),
                 y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))

Example of plyr arrange: sort dd, have column "z" sorted descending, with column b sorted by level. 
arrange(dd,desc(z),b)
   b x y z
1 Zz C 9 2
2 Aa A 8 1
3 Ba D 3 1
4 Ba A 9 1

Example of plyr arrange: have column "y" sorted from least to most, with b sorted from least level to highest. 
> arrange(dd,y,b)
   b x y z
1 Ba D 3 1
2 Aa A 8 1
3 Ba A 9 1
4 Zz C 9 2

> arrange(dd,y,desc(b))
   b x y z
1 Ba D 3 1
2 Aa A 8 1
3 Zz C 9 2
4 Ba A 9 1



Clock Your Code - time how long your code takes to run.  (notes)
The first line starts the clock, and the second stops it, and returns a report. This is basically a stopwatch, how much time it took to go from the first line to the last. 
ptm <- proc.time()
{ insert code you want to clock }
proc.time() - ptm
   user  system elapsed 
   1.05    0.02    1.06 
  • user time is the CPU time spent running your R code
  • system time is the CPU time the operating system spent on behalf of your code (reading files, allocating memory, etc.)
  • elapsed time is the wall-clock time since you started the stopwatch

Returns the time to run a function
system.time(function(args))
e.g.
system.time(sum(mtcars$mpg))




Take a Random Sample of Rows from a Data.Frame (matrix)
If you want to take a sample of a larger dataset (say, to test code), and you don't want to use head(data,nrows), then use:

mydata #this is a dataframe with many rows and perhaps many columns

temp <- mydata[sample(nrow(mydata),3),]
  #this creates a new data.frame of 3 rows, randomly sampled from mydata.



Drop Columns from Data.Frame, by Name (notes)

data <- data.frame(
  x=1:10,
  y=10:1,
  z=rep(5,10),
  a=11:20
)
drops <- c("x","z")
data <- data[,!(names(data) %in% drops)]

Alternatively, Dropping a column from a data frame by column name
Given the data frame "DF" and the column "ColumnX"

DF <- subset(DF, select = -c(ColumnX))


Selecting Only A Set of Columns from a Data Frame, by Name
Given the data frame "DF" and you want to select the columns "Column1" and "ColumnY"

DF <- subset(DF, select = c("Column1", "ColumnY"))




Select Zero Rows From a Data Frame (that is, create an empty version of another dataframe)
NuDF <- DF[0,]



Rename Column Names, by Name
There's a plyr function, rename(), that I've always had issues with, so here's my fix: 

data.frame called "mydata"
   with a column "Column1"
   you want to rename it "Column1_old"
names(mydata)[which(names(mydata)=="Column1")] = "Column1_old"

# that's it




Take a large data.frame, and keep only the first row for each unique observation of a value in one column. 
If "temp" is your dataframe (with many rows) and ID is the variable that you want only one observation for:


temp <- temp[!duplicated(temp$ID),]
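
A small check of what gets kept, as a sketch with made-up data:

```r
temp <- data.frame(ID    = c(1, 1, 2, 3, 3),
                   value = c("a", "b", "c", "d", "e"),
                   stringsAsFactors = FALSE)

# duplicated() is FALSE the first time an ID shows up and TRUE after that,
# so !duplicated() keeps only the first row for each ID.
first_rows <- temp[!duplicated(temp$ID), ]
print(first_rows)
```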


Remove Columns from a Data Frame by Name
MyData 
names(MyData)
[1]  "1" "2" "3" "4"

MyData <- MyData[ , -which(names(MyData) %in% c("1","3"))]

names(MyData)
[1] "2" "4"

Get a list of column names in a nice organized way:
paste(names(DF), collapse=", ")

[1] "Col1, Col2, Col3, ..., Coln"




Change the Order of a Factor
MyData$FactorVector <- 
                factor(MyData$FactorVector,
                levels = c("06", "05", "04", "03", "02", "01"))

#ALSO
MyData$time <- factor(MyData$time, rev(levels(MyData$time)))



Define Colors in a ggplot2 graph
 ggplot(MyData, aes(x=time, y=value, group=geo,colour=factor(geo))) + 
   geom_line(size = 2) +
   geom_vline(xintercept=c(18,30)) +
   scale_colour_manual(values=c("brown", "blue", "dark blue","light blue","grey"))



Rename Factor levels (mapvalues() is from the plyr package): 
http://www.cookbook-r.com/Manipulating_data/Renaming_levels_of_a_factor/
mapvalues(x, from = c("beta", "gamma"), to = c("two", "three"))
MyData$geo <-mapvalues(MyData$geo, 
          from = c("DE", "FR", "IT", "SE", "UK"), 
          to = c("Germany", "France","Italy","Sweden","England"))



Tilt Axis Labels with ggplot2
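theme(axis.text.x = element_text(angle = -90))
# older ggplot2 versions used opts(axis.text.x=theme_text(angle=-90)), which no longer works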


Given a String/Character for an Object, Evaluate the object named by the text. 
 eval(parse(text="5+5"))
[1] 10
 class("5+5")
[1] "character"
 class(parse(text="5+5"))
[1] "expression"


Plot - add latex expression into plot title

Complicated stuff:

Simple stuff
X <- 1:10
Y <- 1:10
a <- 0.8
plot(X,Y,main=bquote(R^2 : .(a)))


When loading a file with load(...), save the object under a new name

Given that you saved an object (data frame) to an .RData file like 'saved.file.rda'
You could: load('saved.file.rda'), but that restores the object under its original name. To pick the name yourself:

bar <- get(load('saved.file.rda'))
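
A variation that also keeps the original name out of your workspace, by loading into a scratch environment first. This is a sketch: the data frame is made up, and it saves to tempdir() so it cleans up after itself.

```r
original <- data.frame(x = 1:3)
path <- file.path(tempdir(), "saved.file.rda")
save(original, file = path)

# load() returns the NAME(S) of what it restored, as a character vector.
env      <- new.env()
obj_name <- load(path, envir = env)
bar      <- get(obj_name, envir = env)   # the restored data frame, under a name you chose
print(bar)
```

This assumes the file holds a single object; with several, obj_name is a vector and you would pick the one you want.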


Add progress bar to your code, loop or while loop
total <- 20
# create progress bar
pb <- txtProgressBar(min = 0, max = total, style = 3)
for(i in 1:total){
   Sys.sleep(0.1)
   # update progress bar
   setTxtProgressBar(pb, i)
}
close(pb)


Install an R Package/Library from a specific cran repository 

install.packages("PackageNames", repos = "http://cran.cnr.Berkeley.edu/")



When using dplyr::select to rearrange a data frame's columns, how do I attach all other columns? 

dplyr::select(DF, var1, var2, dplyr::everything(), -var3)

This will take the data frame DF, make var1 the first column, var2 the second, attach all the other columns/variables that existed after them, and then remove var3. The -var3 goes last because select() works through its arguments left to right, so a catch-all placed after -var3 would add var3 back. 



Grouping collection of asynchronous events, by dates
That is, suppose you have a big flow of events all coming in at different dates. Suppose you want to put these dates into sets of, say, week1, week2, or month1, month2 etc. 
Here's one way, from the valve dataset. 

DFPlot$nuDate = cut(x = as.Date(DFDate), breaks = as.Date(7 * (-94:2), origin = "2013-05-21" ))
DFPlot <- DFPlot %>%
  dplyr::select(
    Item, Date, nuDate, matches(".")
  ) %>%
  dplyr::mutate(
    Date = as.Date(nuDate)
  )

You'll obviously need to set the breaks to the appropriate date ranges. 
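
If plain calendar weeks are good enough, cut() can work out the breaks for you. A base-R sketch with made-up dates (this is an alternative to the hand-built breaks above, not what the valve code did):

```r
dates <- as.Date(c("2013-05-21", "2013-05-23", "2013-05-28", "2013-06-05"))

# breaks = "week" bins each date into its calendar week (weeks start on Monday
# by default); the factor labels are the first day of each week.
week_bin   <- cut(dates, breaks = "week")
week_start <- as.Date(as.character(week_bin))
print(data.frame(dates, week_start))
```

breaks = "month" works the same way for monthly bins.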










Simple Stats Functions - you should probably memorize 



SUMMARY & SIMPLE STATS
mean() -- gives the arithmetic mean
sum() -- gives the total
sd() -- gives the standard deviation
min() -- minimum
max() -- maximum
length() -- count. 
str() - gives a sense of the structure of a dataframe (number of rows and columns, and a few example values)

Sequences
5:9 => 5,6,7,8,9
seq() - e.g. seq(3) => 1 2 3
seq(3,5) => 3,4,5
seq(3,7,0.5) => 3,3.5,4,4.5,5,5.5,6,6.5,7
9:5 => 9,8,7,6,5




Manipulating Vectors

c() - stands for 'combine' apparently

x = c('I','want','to')
x[4] <- 'party'

x
[1] "I" "want" "to" "party"

x[c(2,3)]
[1] "want" "to"
x[2:4]
[1] "want" "to" "party"

cbind() - attach a column to a data.frame
rbind() - attach a row to a data.frame
union() - combine two vectors into one. So, similar to cbind or rbind, but just adding vectors together (or adding values to vectors). Note that union keeps only UNIQUE values, so union(c(1,2),c(2,3)) = c(1,2,3). Use c() if you want to keep duplicates. 
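
Side by side, as a quick sketch:

```r
u <- union(c(1, 2), c(2, 3))   # duplicates dropped
k <- c(c(1, 2), c(2, 3))       # plain combine, duplicates kept
print(u)
print(k)
```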

Merging Datasets
Merging tips - rbind cbind, merge
Comparing datasets - dupsBetweenGroups, splitting, checking for duplicates

Merge two datasets by two (or more) column names
data.new <- merge(merge_data_x,
                  merge_data_y,
                  by.x = c("ID", "ID2"),
                  by.y = c("ID", "ID_2"))



Also note about dropping "missing" data - by default, merge() drops all unmatched cases. To keep the unmatched rows from both datasets, set all=TRUE
data.new <- merge(x,y,by.x = "ID",by.y = "ID" , all=TRUE)

If you are okay with dropping unmatched instances in one dataset, but not the other: suppose you want to keep all rows in x, but you can drop the unmatched rows in y. 
data.new <- merge(x,y,by.x = "ID",by.y = "ID" , all.x=TRUE)
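
The three settings on two tiny made-up data frames, as a sketch of which IDs survive each one:

```r
x <- data.frame(ID = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(ID = c(2, 3, 4), b = c("y2", "y3", "y4"))

inner <- merge(x, y, by = "ID")                # only IDs found in both
left  <- merge(x, y, by = "ID", all.x = TRUE)  # every row of x, NA where y has no match
full  <- merge(x, y, by = "ID", all = TRUE)    # every row of both

print(inner$ID)
print(left)
print(full$ID)
```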



length() 
Count the number of values/objects in a vector

hey=c(1,4,5,3,2,2,2)
length(hey)

    [1] 7

length(unique(hey))

    [1] 5


#subsets a ticks file (experimental data), for the name of the block you're interested in, then gives the numbers of the subjects in that subset: 
unique((subset(ticks20120214, name=="4p-600-1-nPL-d"))$subject)

    [1] 4 3 2 1


paste() -- print combine text and data
lets you combine variables, numbers and strings/text into a single text. Useful for plots, tables, histograms, etc. Don't forget the "sep=..." field. 
X1="POOP"
paste(X1,"- Continuous Time - Color=Player",sep=" ")

[1] "POOP - Continuous Time - Color=Player"


Round
round() -- the value, and the number of digits past the decimal point. 
round(0.123456789,digits=4)

[1] 0.1235




sample() - random plus more
Random integer between 1 and 100
sample(1:100,1,replace=T)

    [1]  47 #SOME RANDOM INT
  
sample(100, 4, replace=TRUE)

    [1] 67  4  1 45

L26=LETTERS[1:26]
sample(L26, 4, replace=TRUE)

    [1] "R" "K" "M" "R"






Fun With Variable Names, Strings, and Values
Hard to explain, but say you have dataframes called ticks2012a, and ticks2012b. Say you want to cycle through them. (Note the values in variable 'x' below are all strings!)
You wouldn't be able to loop through ticks2012a, because it would start going through the values inside that dataframe. 
ticks2012a<-c(1,2,3,4)
ticks2012b<-c(5,6,7,8)
x<-c("ticks2012a","ticks2012b")
for (i in c(1:length(x))) {
    print(get(x[i]))
}

[1] 1 2 3 4
[1] 5 6 7 8

See:
x<-c(ticks2012a,ticks2012b)
for (i in c(1:length(x))) {
    print(x)
}

[1] 1 2 3 4 5 6 7 8
[1] 1 2 3 4 5 6 7 8
[1] 1 2 3 4 5 6 7 8
[1] 1 2 3 4 5 6 7 8
[1] 1 2 3 4 5 6 7 8
[1] 1 2 3 4 5 6 7 8
[1] 1 2 3 4 5 6 7 8
[1] 1 2 3 4 5 6 7 8

Now going the other way, 
Converting a Value Name (or data name, or object name etc.) into a string:
data <- c(1,2,3,4)
 deparse(substitute(data))
 [1] "data"
     


Combinations / Permutations [pdf] - gives all orderings of a vector or a sequence of integers. 
(from the combinat package, which is not in base - install.packages("combinat"))
library(combinat)
permn(seq(3))
   #gives all 6 possible orderings (permutations) of 1,2,3



Testing for and Removing NAs or NaNs - many functions don't work if you have any NAs or NaNs in the vector you're curious about. Easy to remove them. 

 a=c(1,2,3,4,NA)
 mean(a)

[1] NA    #rough!

mean(a,na.rm = TRUE)

[1] 2.5     #good!
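
For the "testing for" and "removing" part when a function has no na.rm option, a short sketch using is.na():

```r
a <- c(1, 2, 3, 4, NA)

n_missing <- sum(is.na(a))   # how many NAs (or NaNs) there are
clean     <- a[!is.na(a)]    # the vector with them removed

print(n_missing)
print(mean(clean))
```

is.na() is also TRUE for NaN, so this catches both.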


Testing for and Removing NULLs 

is.null()

a <- c()
is.null(a)
[1] TRUE

Using as.character() to find and replace NULLs in a vector
~ this is probably a dirty way to do this, but effective, and fast enough for me. 
ifelse(test, yes, no)
ifelse(as.character(data$column)=="NULL","This was NULL!",data$column)





Which and Head - Given a vector of values, and given a number you're interested in, find the next highest and the next lowest value to your value of interest in that vector. 

data<-c(0,2,3,5,4,6,8)
x=3.5 #the number of interest
y=sort(data)
Above <-y[which(y>x)[1]]
Below <-y[max(which(y<x))]
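
The same answer without sorting first, as a sketch (it assumes there is at least one value on each side of x; otherwise min()/max() warn and return Inf/-Inf):

```r
data <- c(0, 2, 3, 5, 4, 6, 8)
x <- 3.5   # the number of interest

above <- min(data[data > x])   # smallest value larger than x
below <- max(data[data < x])   # largest value smaller than x

print(c(below = below, above = above))
```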




%in% - Check whether or not a value is inside another vector/data-frame. 

x<-c(0,1,2,3,4,5,6,8)
3 %in% x     #TRUE
7 %in% x     #FALSE



Not In - Check whether or not a value is not inside another vector/data-frame. 
x<-c(0,1,2,3,4,5,6,8)
!(3 %in% x)    #FALSE
!(7 %in% x)    #TRUE





Given two Vectors, return the values in one that are not in the other. 
# Set-up: 
Bigger <- c(1,2,3,4,5); Smaller <- c(1,4)

Missing Values - Returns Values In Bigger that are not in Smaller
Bigger[!(Bigger %in% Smaller)]
[1] 2 3 5


Count of the number of items in one list that are not in another
length(unique(Bigger[!(Bigger %in% Smaller)]))


When does a value appear in a vector? - return an index. 
match(VALUE,VECTOR)
Returns the index in which the value "VALUE" first appears in the vector "VECTOR"
a <- 1
b <- c(3,4,5,6,4,3,1,222,5)
match(a,b)
[1]  7
 


Fast Find and Replace
Original_Data[Original_Data == "Incorrect"] <- "Correct"

#suppose you want to replace all 4's with 10s
dataset <- c(1,2,3,3,3,4,5,6,7,5,4,3,2,2,3,4)
dataset[dataset == 4] <- 10

> dataset
 [1]  1  2  3  3  3 10  5  6  7  5 10  3  2  2  3 10

And you can set this up to cycle through many find-and-replace terms
Incorrect <- c("Error1","Error2","Error3","Error4")
Correct <- c("Correct1","Correct2","Correct3","Correct4")
# dataset holds all the values you want to look into....
for (i in 1:length(Incorrect)){
  dataset[dataset == Incorrect[i]] <- Correct[i]
}
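
A runnable version of the loop on a made-up character vector, as a sketch (the error/correct labels are placeholders):

```r
incorrect <- c("Error1", "Error2")
correct   <- c("Correct1", "Correct2")
dataset   <- c("Error1", "ok", "Error2", "Error1")

# Each pass swaps one find-term for its replacement; everything else is left alone.
for (i in seq_along(incorrect)) {
  dataset[dataset == incorrect[i]] <- correct[i]
}
print(dataset)
```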




ggplot2 







ggplot2 - a line plot over time, with dots and cool stuff

library(ggplot2)
ggplot(data=MyData, aes(x=xVal, y=yVal, group=Groups, colour=COLOR)) + 
  geom_point(size=4) +
  geom_line(size=1)


ggplot2 - multiplot

multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  require(grid)
  
  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)
  
  numPlots = length(plots)
  
  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
  }
  
  if (numPlots==1) {
    print(plots[[1]])
    
  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    
    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}


  multiplot(p7,p1,p2,p3,p4,p5,p6,
            layout=(matrix(c(1,1,1,1,1,1,2,3,4,5,6,7), 
            2, 6, byrow = TRUE)))
    # top row is one plot
    # second row is 6 plots


 


ggplot2 - change the order of a plot legend


ggplot2 - reverse the order or a discrete or continuous scale



ggplot2 background color
theme(panel.background = element_rect(fill='white', colour="white"))





R - String for a quotation mark
 
paste(" 'Hey!' ")




plyr id() function 






Garbage Collection: Function: gc()
Report on memory usage. Gives you a sense of the system resources at your disposal. 



Reshape Package - http://had.co.nz/reshape/




melt() - data, id, na.rm
Converts a multi-column data frame into a three column data frame. 

require(reshape2) 

nuDF <- 
melt(data = DF,
             value.name = "Value"
             )

- Where DF is obviously the data frame
- If you don't pass id.vars, melt() treats the factor and character columns as the id (stacking) columns and melts all the numeric ones. Pass id.vars yourself to be explicit. 
- Where "value.name" is really up to you. It's whatever you think is appropriate
- This will take every non-id column, and make it a row entry 
  - with a "variable" column holding the old column's name. 



acast() - data, by ids, variable, mean, 




dcast()

  dcast(DF, ID_1 ~ Direction, value.var = "ID_2", list)
DF is the data frame
ID_1 is the thing you want unique values of. 
Direction is the thing that you want converted into columns (say, Direction 0 and 1)
value.var is the value that actually gets inserted into the new columns. 
list is just a function that gets applied to those values. list just puts them in a list. Leaving it blank will just put in the values, assuming there is only one value per cell; with more than one, dcast falls back to counting them (length) and tells you so. 

This will now show each unique ID_1 + ID_2 combo
  dcast(DF, ID_1 + ID_2 ~ Direction, value.var = "ID_3", list)




Update the Version of R on Your Machine (Windows)
1) install the new version
2) Make note of all or the packages/libraries that you installed.
3) Uninstall the old version of R
4) Run install.packages("...") for all the packages you will want. 


ddply - a useful plyr function for finding stats by variable settings



subsetting with ddply
ddply(mydata, .(instrument), summarise,
      avgProfit = mean(TCurr[TCurr > 0]))


this takes the mean of the subset of TCurr where that variable is greater than 0

Conditional applying
system.time({
  NuData <- ddply(Prices.MovingAvg, 
                  .(DefID_AppID_Q), 
                  summarize,
                  NumDaysTraded = length(Turnover[Turnover!=0]))
})



Dates
as.Date

Change the Way Date is Formatted (e.g. going from "/" separated dates to "." or "-" separated date formats) 
format(as.Date(0, origin = "2011-08-09"), "%Y.%m.%d") # Converts to "2011.08.09"


Time
strptime(paste("07/10/13","10:30"), "%m/%d/%y %H:%M")

Symbol   Meaning                   Example
%d       day as a number (01-31)   01-31
%a       abbreviated weekday       Mon
%A       unabbreviated weekday     Monday
%m       month (01-12)             01-12
%b       abbreviated month         Jan
%B       unabbreviated month       January
%y       2-digit year              07
%Y       4-digit year              2007
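
Putting a few of those symbols to work, as a sketch (the date is made up; tz = "UTC" is only there so the result doesn't depend on your machine's time zone):

```r
# Read a "/" separated date, then write it back out "." separated.
d      <- as.Date("07/10/13", format = "%m/%d/%y")
dotted <- format(d, "%Y.%m.%d")

# Same idea with a time attached.
stamp <- strptime("07/10/13 10:30", "%m/%d/%y %H:%M", tz = "UTC")
clock <- format(stamp, "%H:%M")

print(dotted)
print(clock)
```

%a, %A, %b and %B depend on your locale, so their output can differ between machines.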



Noussair Model - Model with Interaction Term between a categorical variable (a factor variable) and a continuous variable. 
library(reshape2)
df <- data.frame(x = 1:300)
df$y1 <-  (0.7/df$x + 0.1*(df$x-1)/df$x + rnorm(300,0,0.015))
df$y2 <-  (0.5/df$x + 0.1*(df$x-1)/df$x + rnorm(300,0,0.015))
df$y3 <-  (0.3/df$x + 0.1*(df$x-1)/df$x + rnorm(300,0,0.015))
df <- melt(df, id = 1)
These three treatments all converge to the same value (0.1), but start at different initial values, depending on membership to the factor variable y1, y2 or y3. The following will estimate these: 

summary(lm(df$value ~ 0 + df$variable:(I(1/df$x)) + I((df$x-1)/df$x)))



Check if a subset of numbers are in a larger set, return true if yes, false if no. 

checklist <- c(1,2,3)
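
The note above stops at the set-up. One way to finish it, as a sketch (the "bigger" vector is made up, and all() wrapped around %in% is my assumption about what was intended):

```r
checklist <- c(1, 2, 3)
bigger    <- c(5, 3, 1, 2, 9)

# %in% gives one TRUE/FALSE per checklist value; all() collapses them into one answer.
all_there <- all(checklist %in% bigger)
not_all   <- all(c(1, 7) %in% bigger)

print(all_there)
print(not_all)
```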






Text Functions

Return Index of Character in String
Suppose you want to find the index number of the first instance of a character in a string? 
E.g., you have "123456_789", and you want to return the index of the character "_"?


regexpr("_","263_6")[1]

Returns: 4


Substring Function
substr(String, start, stop) - and the start and stop are inclusive

substr("123456789", 2,5)

[1] "2345"
dplyr::mutate(
    data,
    newVar = substr(Var1, nchar(as.character(Var1)) - 1, 
                    nchar(as.character(Var1)))
)

# returns the last two characters from the column of strings in Var1


SubString Surrounded by...
A function that returns the string surrounded by the first and second appearance of a particular character. 
E.g. SubStrSur("_", "123_456_789") returns "456"

SubStrSur <- function(Exp,x){
   #Given a string, returns the value between Exp
   NuStr <- substr(x, regexpr(Exp,x)[1]+1,nchar(x))
   substr(NuStr,1, regexpr(Exp,NuStr)[1]-1)
 }
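
Trying it out, with a strsplit() one-liner next to it for comparison. This is a sketch: the function is copied from above so the block runs on its own, and the strsplit version is my alternative, not part of the original.

```r
SubStrSur <- function(Exp, x) {
  # Drop everything up to and including the first Exp...
  NuStr <- substr(x, regexpr(Exp, x)[1] + 1, nchar(x))
  # ...then keep what comes before the next Exp.
  substr(NuStr, 1, regexpr(Exp, NuStr)[1] - 1)
}

middle <- SubStrSur("_", "123_456_789")

# Alternative: split on the separator and take the second piece.
middle_alt <- strsplit("123_456_789", "_", fixed = TRUE)[[1]][2]

print(middle)
print(middle_alt)
```

Exp is treated as a regular expression by regexpr(), so characters like "." or "|" would need escaping.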



Using Sapply and strsplit (string split) to find substrings
PriceObs.SM_Study$AppID = 
           sapply(strsplit(PriceObs.SM_Study$DefID_AppID_Q, "_"),"[[",2)
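
What that does on a couple of made-up IDs (a sketch; the "ids" values are hypothetical, shaped like DefID_AppID_Q):

```r
ids <- c("10_440_A", "11_570_B")

# strsplit() returns a list with one character vector per ID;
# sapply(..., "[[", 2) pulls the second piece out of each.
app_id <- sapply(strsplit(ids, "_"), "[[", 2)
print(app_id)
```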



For ggplot2 xlim/ylim: xlim() and ylim() drop the rows that fall outside the limits. If you don't want to drop rows (just zoom in), you need coord_cartesian
coord_cartesian(xlim = NULL, ylim = NULL)



dplyr tools and tips
Introduction to dplyr - http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html




Remove Grey/Gray border around legend
+ theme(legend.key = element_blank())


Smoothing Stuff - for heatmaps, and contour stuff. 

#interp converts x,y,z values into something plots can work with (x,y vector, and z matrix)
s = interp(x,y,z,
           xo=seq(0,5, length=100),
           yo=seq(0,5, length=100),
           duplicate = "median")


#smooth.2d is a function that takes a list of s's structure (which is what plotting functions use for heatmaps) and smooths those z values. 
smooth.2d(z, x=cbind(x,y), theta=0.5)

#a heatmap maker:
    x <- CoopPlayerSummary.temp$AggUndercutNum
    y <- CoopPlayerSummary.temp$CounterpartAggUndercutNum
    z <- loess(Score ~ x + y,data=CoopPlayerSummary.temp,span=0.9)
    t. <- interp(x,y,z$fitted,
                 xo=seq(0,90, length=100),
                 yo=seq(0,90, length=100), duplicate = "median")
    t.df <- data.frame(t.)
    t.df[is.na(t.df)] = 0.
    
    if (length(levels) == 0){
      zlim = range(gt$value, finite = TRUE)
      levels = pretty(zlim, NumCols )
    }

    colramp = colorRampPalette(c("#fafcef","#6b0200","#0f0000"))
    
    plot(NA,
         xlim=(c(0,90)),
         ylim=(c(0,90)),
         xlab = "...", 
         ylab = "...",
         frame=FALSE,axes=F,xaxs="i",yaxs="i")
    
    .filled.contour(x=t.df$x, 
                    y=t.df$y, 
                    z=as.matrix(t.df[3:ncol(t.df),1:length(t.df$x)]),
                    levels = levels,
                    #nlevels=length(colramp(ncols)), 
                    col = colramp(length(levels))
    )
    axis(1, seq(0.0, 90, 5), label=TRUE, tcl=-0.5, col = "white", col.ticks='grey')
    axis(2, seq(0.0, 90, 5), label=TRUE, tcl=-0.5, col = "white", col.ticks='grey',las=1)































