By Bayram Sarilmaz


2018-06-13 14:40:31 8 Comments

I have a data frame with 20 columns, and I want to plot one specific column (called BB) against every other column in the data frame. The plots I need are probability density plots, and I'm using the following code to generate one plot (BB vs. AA as an example):

library(data.table) #fread
library(ggplot2)
library(viridis)    #scale_color_viridis

mydata = as.data.frame(fread("filename.txt")) #read my data as data frame

#function to calculate density
get_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x = x, y = y, n = n)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  ii <- cbind(ix, iy)
  return(dens$z[ii])
}
set.seed(1)

#define the x and y of the plot; x = column called AA; y = column called BB
xy1 <- data.frame(
  x = mydata$AA,
  y = mydata$BB
)

#call function get_density to calculate density for the defined x and y
xy1$density <- get_density(xy1$x, xy1$y) 

#Plot
ggplot(xy1) + geom_point(aes(x, y, color = density), size = 3, pch = 20) + scale_color_viridis() +
  labs(title = "BB vs. AA") +
  scale_x_continuous(name="AA") +
  scale_y_continuous(name="BB")

I would appreciate it if someone could suggest a way to produce multiple plots of BB against every other column, using the density function and ggplot command above. I tried adding a loop, but found it too complicated, especially when defining the x and y to be plotted and when calling the density function.

1 comment

@Gregor 2018-06-13 14:50:19

Since you don't provide sample data, I'll demo on mtcars. We convert the data to long format, calculate the densities, and make a faceted plot. We plot the mpg column against all others.

library(dplyr)
library(tidyr)
mtlong = gather(mtcars, key = "var", value = "value", -mpg) %>%
    group_by(var) %>%
    mutate(density = get_density(value, mpg))

ggplot(mtlong, aes(x = value, y = mpg, color = density)) +
    geom_point(pch = 20, size = 3) +
    labs(x = "") +
    facet_wrap(~ var, scales = "free")
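
If separate plots are preferred over facets, the loop the question asks about can be sketched roughly as follows. This is only an illustration on mtcars, not part of the original answer: plot_against is a hypothetical helper name, and get_density is repeated from the question so the block runs on its own.

```r
library(ggplot2)

# Same helper as in the question: 2D kernel density evaluated at each point.
get_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x = x, y = y, n = n)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  dens$z[cbind(ix, iy)]
}

# Hypothetical helper (not from the thread): one plot of `target` against `other`.
plot_against <- function(data, other, target) {
  xy <- data.frame(x = data[[other]], y = data[[target]])
  xy$density <- get_density(xy$x, xy$y)
  ggplot(xy, aes(x, y, color = density)) +
    geom_point(pch = 20, size = 3) +
    labs(title = paste(target, "vs.", other), x = other, y = target)
}

# Build one plot per column other than the target, stored in a named list.
target <- "mpg"
others <- setdiff(names(mtcars), target)
plots <- lapply(others, plot_against, data = mtcars, target = target)
names(plots) <- others
```

Each element can then be printed individually, for example with print(plots[["wt"]]). Selecting columns by name with [[ ]] is what avoids having to spell out mydata$AA, mydata$CC and so on by hand.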


@Bayram Sarilmaz 2018-06-13 15:01:47

I've tried the following: mtlong = gather(mydata, key = "var", value = "value", -mydata$BB) %>% group_by(var) %>% mutate(density = get_density(mydata$BB, value)), and got this error: Error: NULL must evaluate to column positions or names, not a double vector

@Gregor 2018-06-13 15:05:35

Don't use data$ inside dplyr verbs. You seem to have replaced mpg with mydata$BB. mpg should be replaced with BB. Only use mydata$ if my code uses mtcars$.
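
As a rough illustration of that substitution, the sketch below uses a made-up data frame (toy, with columns BB, AA and CC; these values are assumptions, not the asker's data) and refers to columns by bare name inside gather and mutate. get_density is repeated from the question so the block is self-contained.

```r
library(dplyr)
library(tidyr)

# Same helper as in the question: 2D kernel density evaluated at each point.
get_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x = x, y = y, n = n)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  dens$z[cbind(ix, iy)]
}

# Hypothetical stand-in for mydata (random values, for illustration only).
set.seed(1)
toy <- data.frame(BB = rnorm(200), AA = rnorm(200), CC = runif(200))

# Bare column names only: -BB keeps BB as its own column,
# and get_density(value, BB) puts the other column on x and BB on y.
toy_long <- gather(toy, key = "var", value = "value", -BB) %>%
  group_by(var) %>%
  mutate(density = get_density(value, BB))
```

The result has one row per (original row, other column) pair and can be passed to the ggplot call from the answer, with mpg replaced by BB.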

@Bayram Sarilmaz 2018-06-13 16:12:18

Thanks, it seems that this will work. After making the change you suggested, I got another error: Error in mutate_impl(.data, dots) : Evaluation error: bandwidths must be strictly positive. What does that mean?

@Gregor 2018-06-13 16:16:24

Sounds like an error raised by kde2d. You can find the message in the kde2d source code.

@Bayram Sarilmaz 2018-06-13 16:41:36

Any idea how to get around this?

@Gregor 2018-06-13 16:46:45

A typical debugging process would have you isolate the problem by identifying a column that triggers the error (subset your data and try different subsets), then look at what's unusual about that input. You can also search for the error message. A related question suggests that one of your columns may be constant and so can't be used. If you still have trouble, ask a new question and post a minimal example data set that reproduces the problem.
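
One way to run that isolation step is sketched below on a made-up data frame (toy; its columns are assumptions, not the asker's data). When no bandwidth is supplied, kde2d takes it from MASS::bandwidth.nrd, which returns zero if a column is constant or if its interquartile range is zero, so checking that value per column points at the likely culprits. The sketch assumes numeric columns with no missing values.

```r
# Hypothetical data: one constant column and one column that is mostly zeros.
set.seed(1)
toy <- data.frame(
  BB      = rnorm(50),
  AA      = rnorm(50),
  const   = rep(3, 50),
  mostly0 = c(rep(0, 45), 1:5)
)

# Default kde2d bandwidth for each column; kde2d stops if any value is <= 0.
bw <- vapply(toy, MASS::bandwidth.nrd, numeric(1))

# Columns that would trigger "bandwidths must be strictly positive".
bad <- names(bw)[bw <= 0]

# Drop the offending columns (keeping BB) before reshaping and plotting.
good <- toy[, !(names(toy) %in% setdiff(bad, "BB")), drop = FALSE]
```

Whether dropping such columns is appropriate depends on the data; passing an explicit bandwidth to kde2d through its h argument is another option.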
