### R-Code -- Descriptive & Summary Statistics

Let's say you've loaded data from a CSV file (an example of how to do this is in the link). Here's the data I'm working with for this example. Here's text from my R window as I ran through this exercise. (gdoc)

First, load the data:

    nihs <- read.table("C:\\Documents and Settings\\admin\\My Documents\\temp\\NHIS 2007 data.csv", header=TRUE, sep=",")

I've read the CSV file into an object called "nihs", so R now has the object nihs. First, just type:

    nihs

This should print out in your script window what your nihs data looks like.

    str(nihs)

str stands for structure. It gives you a simple sense of the structure of your new R object: how many observations you have (observations are basically rows) and how many variables (variables are basically columns). We get the following output:

    'data.frame':   4785 obs. of  9 variables:
     $ HHX   : int  16 20 69 87 88 99 101 122 129 134 ...
     $ FMX   : int  1 1 1 1 1 1 1 1 1 1 ...
     $ FPX   : int  2 1 2 1 1 1 1 1 2 2 ...
     $ SEX   : int  1 1 2 1 2 2 1 1 2 2 ...
     $ BMI   : num  33.4 26.5 32.1 26.6 27.1 ...
     $ SLEEP : int  8 7 7 8 8 98 6 7 7 7 ...
     $ educ  : int  16 14 9 14 13 12 13 12 16 18 ...
     $ height: int  74 70 61 68 66 98 99 70 65 64 ...
     $ weight: int  260 185 170 175 168 998 172 170 147 148 ...

Next:

    fix(nihs)

This loads a spreadsheet where you can see and edit your new R object. (Note that you won't be able to work in the R script window until you've closed your 'fix' window.)

#### Mean, Median, Mode, Variance & Standard Deviation

Say we're curious about basic summary statistics (mean, median, variance, standard deviation) for our SLEEP variable.

    mean(nihs$SLEEP)

The mean is 9.506792.

    median(nihs$SLEEP)

The median is 7. (Instructions for finding the mode in R.)

    var(nihs$SLEEP)

The variance is 217.0364.

    sd(nihs$SLEEP)

The standard deviation is 14.73215.

    summary(nihs$SLEEP)

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      3.000   6.000   7.000   9.507   8.000  99.000

This gives you a simple taste of what your variable's central tendency and range look like. If you type summary(nihs), you'll get output for all the variables in your nihs object.

Getting a visual sense of your data: Using R to get a histogram and plot of your data.

#### Finding and Removing Outliers

Now, note that SLEEP's max is 99 (hours) and the standard deviation is 14.7 hours. That strikes me as a bit extreme. These are supposed to be observations of individuals' hours of sleep per day, so any value over 24 is an outlier. Here's how we remove outliers: Removing outliers from data.
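As a rough illustration of the outlier step, here is a minimal sketch that drops SLEEP values over 24 and then recomputes the summary statistics. It is not necessarily the method the linked post uses. The names `nihs_demo`, `nihs_clean`, `sleep_na`, and `stat_mode` are hypothetical, and the demo data frame holds only the ten SLEEP values shown in the `str(nihs)` output, not the full data set. Base R's `mode()` reports an object's storage type rather than the most frequent value, so the sketch uses a small helper built on `table()`.

```r
# Hypothetical stand-in for nihs: the ten SLEEP values shown by str(nihs).
nihs_demo <- data.frame(SLEEP = c(8, 7, 7, 8, 8, 98, 6, 7, 7, 7))

# Option 1: keep only rows where reported sleep is 24 hours or less.
nihs_clean <- nihs_demo[nihs_demo$SLEEP <= 24, , drop = FALSE]

# Option 2: keep every row, but mark impossible values as missing (NA).
sleep_na <- ifelse(nihs_demo$SLEEP > 24, NA, nihs_demo$SLEEP)

# Hypothetical helper: the most frequent value in x (first one wins on ties).
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Recompute the summary statistics on the cleaned data.
mean(nihs_clean$SLEEP)
median(nihs_clean$SLEEP)
sd(nihs_clean$SLEEP)
stat_mode(nihs_clean$SLEEP)

# With option 2, tell R to skip the missing values.
mean(sleep_na, na.rm = TRUE)
```

To try this on the real data, replace `nihs_demo` with `nihs`. Option 2 can be preferable if the other columns in those rows are still worth keeping, since values like 98 and 99 may be codes for a missing answer rather than real measurements.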