Lecture 10: Scatter Plots

Today in R: Scatter Plots, Segments, Small Multiples and Vector Power

  1. Scatter plots: geom_point()
  2. Segments: geom_segment()
  3. Small multiples
  4. Instead of a loop: Use vector power

1. Scatter plots

p1 <- ggplot() +
  geom_point(data = df, 
             mapping = aes(x = xvar, y = yvar)) 

Scatter plots: Shapes

p1 <- ggplot() +
  geom_point(data = df, 
             mapping = aes(x = xvar, y = yvar),
             shape = SHAPE.NUMBER) 

Scatter plots: One color

p1 <- ggplot() +
  geom_point(data = df, 
             mapping = aes(x = xvar, y = yvar),
             color = "COLOR.NAME") 

Scatter plots: Colors by Group

p1 <- ggplot() +
  geom_point(data = df, 
             mapping = aes(x = xvar, y = yvar, 
                           color = VARIABLE))

To choose the color for each group, add scale_color_manual() to the plot:

  scale_color_manual(values=c('A'='grey', 
                              'E'='red', 
                              'F'='blue'))
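
As a minimal sketch of how the two pieces fit together, the example below maps color to a grouping variable and then assigns one color per group. The data frame, its column names, and the group labels are invented for illustration; only the A/E/F color assignments come from the slide.

```r
library(ggplot2)

# Hypothetical data: column names and values are made up for illustration
df <- data.frame(xvar  = c(1, 2, 3, 4, 5, 6),
                 yvar  = c(2, 4, 3, 6, 5, 8),
                 group = c("A", "A", "E", "E", "F", "F"))

p1 <- ggplot() +
  # color inside aes(): one color per value of group
  geom_point(data = df,
             mapping = aes(x = xvar, y = yvar, color = group)) +
  # named vector: group value on the left, color on the right
  scale_color_manual(values = c("A" = "grey", "E" = "red", "F" = "blue"))
```

Naming the entries of values ties each color to a group value, so the colors do not depend on the order of the groups.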

Scatter plots: Calling out Regions

  • best fit line: use cautiously
    geom_smooth(method = lm, se = FALSE)
  • best fit curve: use cautiously, too
    geom_smooth(se = FALSE)
  • best fit curve: with shaded error region
    geom_smooth()
  • annotations
    geom_rect(), geom_segment()
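
One way these layers might be combined is sketched below: points, a straight best fit line, and a shaded rectangle calling out a region. The data are simulated and the rectangle limits are arbitrary, so every name and number here is an assumption for illustration.

```r
library(ggplot2)

# Hypothetical data: a noisy upward trend, simulated for illustration
set.seed(1)
df <- data.frame(xvar = 1:50)
df$yvar <- 2 + 0.5 * df$xvar + rnorm(50)

# Hypothetical region to call out: x between 20 and 30, full height of the plot
box <- data.frame(xmin = 20, xmax = 30, ymin = -Inf, ymax = Inf)

p1 <- ggplot() +
  geom_point(data = df, mapping = aes(x = xvar, y = yvar)) +
  # straight best fit line, no shaded error region
  geom_smooth(data = df, mapping = aes(x = xvar, y = yvar),
              method = "lm", se = FALSE) +
  # semi-transparent rectangle drawn over the region of interest
  geom_rect(data = box,
            mapping = aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            alpha = 0.2)
```

Because the plot starts from an empty ggplot(), each layer gets its own data and mapping, which is what lets the rectangle use a different data frame from the points.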

Some Examples With Property Data from Arlington, VA

  • property data for Arlington County, VA
  • observe attributes about properties
    • assessed value
    • year built
    • many other things

Some Examples With Property Data from Arlington, VA

p1 <- ggplot() +
  geom_point(data = arl.samp, 
            mapping = aes(x = PropertyYearBuilt, 
                          y = ln.TotalAssessedAmt))

Colors and Shape for Property Data from Arlington, VA

p2 <- ggplot() +
  geom_point(data = arl.samp, 
            mapping = aes(x = PropertyYearBuilt, 
                          y = ln.TotalAssessedAmt),
            color = "blue",
            shape = 17)

Colors by Value for Property Data from Arlington, VA

p2 <- ggplot() +
  geom_point(data = arl.samp, 
             mapping = aes(x = PropertyYearBuilt, 
                           y = ln.TotalAssessedAmt,
                           color = as.factor(postwar))) +
  scale_color_manual(values = c("blue","red"))

2. Drawing Segments

This is a scatterplot with segments!

Thanks to WSJ.

Code Segments

s2 <- ggplot() +
      geom_segment(data = df,
                   mapping = aes(x = VARIABLE1, 
                                 xend = VARIABLE2,
                                 y = VARIABLE3, 
                                 yend = VARIABLE4))

There is also geom_curve() for brave people.
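
To make the template concrete, here is a small sketch that draws one segment per row and adds points at both ends, so the result reads as a scatter plot with segments. The data frame and its columns (item, before, after) are hypothetical.

```r
library(ggplot2)

# Hypothetical data: one row per item, with a "before" and an "after" value
df <- data.frame(item   = c("a", "b", "c"),
                 before = c(1, 3, 2),
                 after  = c(4, 5, 1))

s2 <- ggplot() +
  # one horizontal segment per item, running from before to after
  geom_segment(data = df,
               mapping = aes(x = before, xend = after,
                             y = item, yend = item)) +
  # points at each end of the segment
  geom_point(data = df, mapping = aes(x = before, y = item)) +
  geom_point(data = df, mapping = aes(x = after, y = item))
```

Using the same variable for y and yend keeps each segment horizontal; different variables would give sloped segments.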

3. Small Multiples, or Facets

 facet_grid(rows = vars(VARIABLE))

Thanks to Winston Chang.

Facet Columns

 facet_grid(cols = vars(VARIABLE))

Or both.
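
A minimal sketch of faceting on rows and columns at once, assuming a data frame with two grouping variables. The names g1 and g2 and all values are made up for illustration.

```r
library(ggplot2)

# Hypothetical data with two grouping variables, g1 and g2
df <- data.frame(xvar = c(1, 2, 3, 4, 5, 6, 7, 8),
                 yvar = c(2, 1, 4, 3, 6, 5, 8, 7),
                 g1   = rep(c("low", "high"), each = 4),
                 g2   = rep(c("old", "new"), times = 4))

p3 <- ggplot() +
  geom_point(data = df, mapping = aes(x = xvar, y = yvar)) +
  # one row of panels per value of g1, one column per value of g2
  facet_grid(rows = vars(g1), cols = vars(g2))
```

With two values in each variable, this gives a 2 x 2 grid of panels that share the same axes.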

Faceting for Arlington

print(table(arl.samp$CommercialInd))

False  True 
12085   299 

p2 <- ggplot() +
  geom_point(data = arl.samp, 
            mapping = aes(x = PropertyYearBuilt, 
                          y = ln.TotalAssessedAmt,
                          color = as.factor(postwar))) +
  scale_color_manual(values = c("blue","red")) +
  facet_grid(rows = vars(CommercialInd))

4. Avoiding a Loop

Suppose you want to do this for many variables:

df$ln.x <- log(df$x)

This does not work!

tolog <- c(x,y,z)
for(i in tolog){
  df$ln.i <- log(df$i)  
}
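
The loop above fails for two reasons: c(x,y,z) asks for objects named x, y and z rather than column names, and df$i looks for a column literally called i. For comparison, a loop can be made to work by storing the names as strings and using [[ ]], as in this sketch. The data frame is a small made-up example.

```r
# Hypothetical data frame for illustration
df <- data.frame(x = c(1, 2, 3),
                 y = c(10, 20, 30),
                 z = c(100, 200, 300))

tolog <- c("x", "y", "z")   # column names as strings
for (i in tolog) {
  # [[ ]] looks up the column whose name is stored in i
  df[[paste0("ln.", i)]] <- log(df[[i]])
}
```

This runs, but it still handles one column per pass.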

The Elegant Solution

tolog <- c("x","y","z")
df[paste0("ln.",tolog)] <- log(df[tolog])

The Elegant Solution in Action

df <- data.frame(x = c(1, 2, 3),
                 y = c(10, 20, 30),
                 z = c(100, 200, 300))
df
  x  y   z
1 1 10 100
2 2 20 200
3 3 30 300

The Elegant Solution in Action

df <- data.frame(x = c(1, 2, 3),
                 y = c(10, 20, 30),
                 z = c(100, 200, 300))
tolog <- c("x","y","z")
df[paste0("ln.",tolog)] <- log(df[tolog])
df
  x  y   z      ln.x     ln.y     ln.z
1 1 10 100 0.0000000 2.302585 4.605170
2 2 20 200 0.6931472 2.995732 5.298317
3 3 30 300 1.0986123 3.401197 5.703782