By Julio Diaz


2011-03-06 04:20:48

I am trying to make a bar graph where the largest bar is nearest to the y axis and the shortest bar is furthest from it. This is roughly the table I have:

    Name   Position
1   James  Goalkeeper
2   Frank  Goalkeeper
3   Jean   Defense
4   Steve  Defense
5   John   Defense
6   Tim    Striker

So I am trying to build a bar graph that shows the number of players in each position:

p <- ggplot(theTable, aes(x = Position)) + geom_bar(binwidth = 1)

but the graph shows the goalkeeper bar first, then the defense bar, and finally the striker bar. I want the graph to be ordered so that the defense bar is closest to the y axis, followed by the goalkeeper bar and finally the striker bar. Thanks

@Gavin Simpson 2011-03-06 13:42:41

The key with ordering is to set the levels of the factor in the order you want. An ordered factor is not required: the extra information it carries isn't necessary, and if these data are used in a statistical model the wrong parametrisation might result, because polynomial contrasts aren't right for nominal data such as this.

## set the levels in order we want
theTable <- within(theTable, 
                   Position <- factor(Position, 
                                      levels=names(sort(table(Position), 
                                                        decreasing=TRUE))))
## plot (binwidth dropped: current versions of geom_bar() have no such parameter)
ggplot(theTable, aes(x = Position)) + geom_bar()

barplot figure

In the most general sense, we simply need to set the factor levels to be in the desired order. If left unspecified, the levels of a factor will be sorted alphabetically. You can also specify the level order within the call to factor as above, and other ways are possible as well.

theTable$Position <- factor(theTable$Position, levels = c(...))
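As a minimal sketch of the point above (the data frame is rebuilt here from the table in the question, and the names by_count and manual are only illustrative), the following shows the default alphabetical levels, levels ordered by count, and a hand-written order. It also checks that changing the level order through factor(..., levels = ) leaves the stored values untouched.

```r
# Data rebuilt from the table in the question (for illustration only)
theTable <- data.frame(
  Name = c("James", "Frank", "Jean", "Steve", "John", "Tim"),
  Position = c("Goalkeeper", "Goalkeeper", "Defense",
               "Defense", "Defense", "Striker")
)

# Default: levels are the sorted unique values
default_levels <- levels(factor(theTable$Position))

# Levels ordered by decreasing frequency
counts <- sort(table(theTable$Position), decreasing = TRUE)
by_count <- factor(theTable$Position, levels = names(counts))

# An explicit, hand-written order works the same way
manual <- factor(theTable$Position,
                 levels = c("Striker", "Goalkeeper", "Defense"))

print(default_levels)
print(levels(by_count))
print(levels(manual))

# Only the level order changes; the values in each row stay the same
print(identical(as.character(manual), theTable$Position))
```

With this particular data the count-based order happens to coincide with the alphabetical one, which is why the hand-written order is included as well.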

@Prasad Chalasani 2011-03-06 15:16:44

@Gavin: 2 simplifications: since you already are using within, there's no need to use theTable$Position, and you could just do sort(-table(...)) for decreasing order.

@Gavin Simpson 2011-03-06 15:22:08

@Prasad the former was a leftover from testing, so thanks for pointing that out. As for the latter, I prefer explicitly asking for the reversed sort rather than the - you use, as it is far easier to get the intention from decreasing = TRUE than by noticing the - in all the rest of the code.

@Prasad Chalasani 2011-03-06 15:34:11

@Gavin ok I see what you mean

@Gavin Simpson 2011-03-06 15:39:41

@Prasad - it is just personal preference after many years of writing analysis scripts in my work that I have had to revisit at times, cursing myself for not writing clearer code. There is nothing wrong with using -.

@Prasad Chalasani 2011-03-06 22:22:19

@Gavin, sure your approach makes sense. I frequently choose shorter syntax over clarity but I know it can come back to bite me sometimes!

@Ömer An 2016-12-29 02:46:01

geom_bar() no longer has a binwidth parameter. Please use geom_histogram() instead.

@Gavin Simpson 2017-05-25 16:32:39

@LéoLéopoldHertz준영 Please stop spamming me; I just told you why your claim on the reorder() solution was wrong. If you want help with your specific question, ask a new question and explain what the difference is with existing answers.

@Anton 2019-02-18 11:56:06

@GavinSimpson: I think the part about levels(theTable$Position) <- c(...) leads to undesired behaviour where the actual entries of the data frame get reordered, and not just the levels of the factor. See this question. Maybe you should modify or remove those lines?

@Gregor 2019-02-18 23:03:55

Strongly agree with Anton. I just saw this question and went poking around on where they got the bad advice to use levels<-. I'm going to edit that part out, at least tentatively.

@Gavin Simpson 2019-02-19 04:09:56

@Anton Thanks for the suggestion (and to Gregor for the edit); I would never do this via levels<-() today. This is something from 8 years back and I can't recall if things were different back then or whether I was just plain wrong, but regardless, it is wrong and should be erased! Thanks!

@indubitably 2019-02-14 11:41:42

Since we are only looking at the distribution of a single variable ("Position") as opposed to looking at the relationship between two variables, then perhaps a histogram would be the more appropriate graph. ggplot has geom_histogram() that makes it easy:

ggplot(theTable, aes(x = Position)) + geom_histogram(stat="count")

Using geom_histogram():

I think geom_histogram() is a little quirky as it treats continuous and discrete data differently.

For continuous data, you can just use geom_histogram() with no parameters. For example, if we add in a numeric vector "Score"...

    Name   Position   Score  
1   James  Goalkeeper 10
2   Frank  Goalkeeper 20
3   Jean   Defense    10
4   Steve  Defense    10
5   John   Defense    20
6   Tim    Striker    50

and use geom_histogram() on the "Score" variable...

ggplot(theTable, aes(x = Score)) + geom_histogram()

For discrete data like "Position" we have to specify stat = "count", so that the height of each bar (the y value) is computed as the number of rows in each category:

 ggplot(theTable, aes(x = Position)) + geom_histogram(stat = "count")

Note: Curiously and confusingly you can also use stat = "count" for continuous data as well and I think it provides a more aesthetically pleasing graph.

ggplot(theTable, aes(x = Score)) + geom_histogram(stat = "count")

Edits: Extended answer in response to DebanjanB's helpful suggestions.

@indubitably 2019-02-15 01:01:14

Thank you for your suggestions. I have extended my answer and hopefully improved it.

@mpalanco 2019-02-03 15:27:03

Another alternative is to use reorder to order the levels of a factor, in ascending (n) or descending (-n) order of the count. It is very similar to the approach using fct_reorder from the forcats package:

Descending order

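library(dplyr)    # count() and %>%
library(ggplot2)

# df is the data frame defined at the end of this answer
df %>%
  count(Position) %>%
  ggplot(aes(x = reorder(Position, -n), y = n)) +
  geom_bar(stat = 'identity') +
  xlab("Position")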

Ascending order

df %>%
  count(Position) %>%
  ggplot(aes(x = reorder(Position, n), y = n)) +
  geom_bar(stat = 'identity') +
  xlab("Position")

Data frame:

df <- structure(list(Position = structure(c(3L, 3L, 1L, 1L, 1L, 2L), .Label = c("Defense", 
"Striker", "Zoalkeeper"), class = "factor"), Name = structure(c(2L, 
1L, 3L, 5L, 4L, 6L), .Label = c("Frank", "James", "Jean", "John", 
"Steve", "Tim"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

@JColares 2018-08-03 07:17:19

If the chart columns come from a numeric variable as in the dataframe below, you can use a simpler solution:

# The + must end the first line; a line that starts with + is parsed as a new expression
ggplot(df, aes(x = reorder(Colors, -Qty, sum), y = Qty)) +
  geom_bar(stat = "identity")

The minus sign before the sort variable (-Qty) controls the sort direction (ascending/descending)

Here's some data for testing:

df <- data.frame(Colors = c("Green","Yellow","Blue","Red","Yellow","Blue"),  
                 Qty = c(7,4,5,1,3,6)
                )

Sample data:
  Colors Qty
1  Green   7
2 Yellow   4
3   Blue   5
4    Red   1
5 Yellow   3
6   Blue   6
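To check the resulting bar order without drawing the plot, a small sketch along these lines can help. It uses the same test data; the names totals and lv are only illustrative. Since geom_bar(stat = "identity") stacks rows that share an x value, the bar heights are the per-colour sums, which is why sum is passed to reorder.

```r
df <- data.frame(Colors = c("Green", "Yellow", "Blue", "Red", "Yellow", "Blue"),
                 Qty = c(7, 4, 5, 1, 3, 6))

# Total Qty per colour: these are the heights of the stacked bars
totals <- tapply(df$Qty, df$Colors, sum)
print(totals)

# reorder() sorts levels by ascending score, so negating Qty
# puts the colour with the largest total first
lv <- levels(reorder(df$Colors, -df$Qty, sum))
print(lv)
```

Here Blue (total 11) comes first and Red (total 1) comes last; Green and Yellow both total 7, so they sit in between.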

When I found this thread, that was the answer I was looking for. Hope it's useful for others.

@Robert McDonald 2018-02-24 04:19:22

In addition to forcats::fct_infreq, mentioned by @HolgerBrandl, there is forcats::fct_rev, which reverses the factor order.

library(ggplot2)
library(forcats)  # fct_infreq(), fct_rev()

theTable <- data.frame(
    Position= 
        c("Zoalkeeper", "Zoalkeeper", "Defense",
          "Defense", "Defense", "Striker"),
    Name=c("James", "Frank","Jean",
           "Steve","John", "Tim"))

p1 <- ggplot(theTable, aes(x = Position)) + geom_bar()
p2 <- ggplot(theTable, aes(x = fct_infreq(Position))) + geom_bar()
p3 <- ggplot(theTable, aes(x = fct_rev(fct_infreq(Position)))) + geom_bar()

gridExtra::grid.arrange(p1, p2, p3, nrow=3)             

gplot output

@Paul 2019-02-25 18:26:27

"fct_infreq(Position)" is the little thing that does so much, thanks!!

@Holger Brandl 2014-12-12 16:58:13

I think the already provided solutions are overly verbose. A more concise way to do a frequency sorted barplot with ggplot is

ggplot(theTable, aes(x=reorder(Position, -table(Position)[Position]))) + geom_bar()

It's similar to what Alex Brown suggested, but a bit shorter, and it works without an anonymous function definition.
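A rough sketch of what the second argument to reorder evaluates to, using a plain character vector in place of the data frame column (the names per_row and ordered_pos are illustrative, and as.vector() is added here only to strip the table class for printing):

```r
Position <- c("Zoalkeeper", "Zoalkeeper", "Defense",
              "Defense", "Defense", "Striker")

# table() counts each value; indexing the table by Position looks up
# each row's count by name, giving one count per row
per_row <- table(Position)[Position]
print(as.vector(per_row))

# reorder() summarises the second argument within each level
# (FUN = mean by default) and sorts the levels by that score, ascending;
# negating the counts therefore puts the most frequent level first
ordered_pos <- reorder(Position, -as.vector(per_row))
print(levels(ordered_pos))
```

Every row in a level carries the same count, so the default mean simply returns that count and no custom function is needed.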

Update

I think my old solution was good at the time, but nowadays I'd rather use forcats::fct_infreq, which sorts factor levels by frequency:

require(forcats)

ggplot(theTable, aes(fct_infreq(Position))) + geom_bar()

@user3282777 2015-09-20 05:26:46

I do not understand the second argument to the reorder function or what it does. Could you kindly explain what is happening?

@Holger Brandl 2015-09-21 06:42:59

@user3282777 have you tried the docs stat.ethz.ch/R-manual/R-devel/library/stats/html/… ?

@Dan 2018-04-25 19:13:58

thanks for coming back and updating your answer!

@Mike 2019-03-11 14:18:18

Great solution! Good to see others employing tidyverse solutions!

@user2739472 2016-12-08 13:22:49

Like reorder() in Alex Brown's answer, we could also use forcats::fct_reorder(). It sorts the factor levels specified in the 1st arg according to the values in the 2nd arg, after applying a specified function (default = median, which is what we use here as we just have one value per factor level).

It is a shame that in the OP's question, the order required is also alphabetical as that is the default sort order when you create factors, so will hide what this function is actually doing. To make it more clear, I'll replace "Goalkeeper" with "Zoalkeeper".

library(tidyverse)
library(forcats)

theTable <- data.frame(
                Name = c('James', 'Frank', 'Jean', 'Steve', 'John', 'Tim'),
                Position = c('Zoalkeeper', 'Zoalkeeper', 'Defense',
                             'Defense', 'Defense', 'Striker'))

theTable %>%
    count(Position) %>%
    mutate(Position = fct_reorder(Position, n, .desc = TRUE)) %>%
    ggplot(aes(x = Position, y = n)) + geom_bar(stat = 'identity')

@c0bra 2018-08-27 08:47:26

IMHO the best solution, as forcats, like dplyr, is a tidyverse package.

@Alexandru Papiu 2016-07-31 19:11:08

I agree with zach that counting within dplyr is the best solution. I've found this to be the shortest version:

library(dplyr)    # provides %>%, arrange(), mutate()
library(ggplot2)

dplyr::count(theTable, Position) %>%
          arrange(-n) %>%
          mutate(Position = factor(Position, Position)) %>%
          ggplot(aes(x=Position, y=n)) + geom_bar(stat="identity")

This will also be significantly faster than reordering the factor levels beforehand since the count is done in dplyr not in ggplot or using table.

@zach 2016-07-29 16:15:32

A simple dplyr based reordering of factors can solve this problem:

library(dplyr)
library(ggplot2)

#reorder the table and reset the factor to that ordering
theTable %>%
  group_by(Position) %>%                              # calculate the counts
  summarize(counts = n()) %>%
  arrange(-counts) %>%                                # sort by counts
  mutate(Position = factor(Position, Position)) %>%   # reset factor
  ggplot(aes(x=Position, y=counts)) +                 # plot 
    geom_bar(stat="identity")                         # draw the bars

@QIBIN LI 2014-12-01 13:20:16

Use scale_x_discrete(limits = ...) to specify the order of the bars.

positions <- c("Goalkeeper", "Defense", "Striker")
# geom_bar() added so that the plot actually draws bars
p <- ggplot(theTable, aes(x = Position)) + geom_bar() + scale_x_discrete(limits = positions)

@Yu Shen 2015-04-28 01:04:21

Your solution is the most suitable to my situation, as I want to program to plot with x being an arbitrary column expressed by a variable in a data.frame. The other suggestions would be harder to express the arrangement of the order of x by an expression involving the variable. Thanks! If there is interest, I can share my solution using your suggestion. Just one more issue, adding scale_x_discrete(limits = ...), I found that there is blank space as wide as the bar-chart, on the right of the chart. How can I get rid of the blank space? As it does not serve any purpose.

@geotheory 2015-08-04 09:50:03

This seems necessary for ordering histogram bars

@Dan Nguyen 2015-09-10 13:53:22

QIBIN: Wow...the other answers here work, but your answer by far seems not just the most concise and elegant, but the most obvious when thinking from within ggplot's framework. Thank you.

@user2460499 2017-05-25 18:13:31

When I tried this solution on my data, it didn't graph NAs. Is there a way to use this solution and have it graph NAs?

@Kalif Vaughn 2018-11-06 17:00:46

This is an elegant and simple solution - thank you!!

@Lauren Fitch 2018-11-09 21:02:28

This solution worked for me where the others above did not.

@Alex Brown 2012-02-10 17:13:16

@GavinSimpson: reorder is a powerful and effective solution for this:

ggplot(theTable,
       aes(x=reorder(Position,Position,
                     function(x)-length(x)))) +
       geom_bar()

@Gavin Simpson 2012-06-14 10:05:06

Indeed +1, and especially in this case where there is a logical order that we can exploit numerically. If we consider arbitrary ordering of categories and we don't want alphabetical then it is just as easy (easier?) to specify the levels directly as shown.

@Prasad Chalasani 2011-03-06 04:44:07

You just need to specify the Position column to be an ordered factor where the levels are ordered by their counts:

theTable <- transform( theTable,
       Position = ordered(Position, levels = names( sort(-table(Position)))))

(Note that the table(Position) produces a frequency-count of the Position column.)

Then your ggplot function will show the bars in decreasing order of count. I don't know if there's an option in geom_bar to do this without having to explicitly create an ordered factor.
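As a hedged sketch of how an ordered factor compares with a plain one here (the variable names are illustrative, and the contrasts shown assume R's default options("contrasts")): both carry the same level order, which is what determines the bar order, and they differ in the contrasts a statistical model would use.

```r
pos <- c("Goalkeeper", "Goalkeeper", "Defense",
         "Defense", "Defense", "Striker")

# Levels sorted by decreasing count
lv <- names(sort(table(pos), decreasing = TRUE))

f_plain <- factor(pos, levels = lv)   # unordered factor
f_ord   <- ordered(pos, levels = lv)  # ordered factor

# Same level order either way
print(identical(levels(f_plain), levels(f_ord)))

# Default contrasts differ: treatment contrasts for the plain factor,
# polynomial contrasts (.L, .Q) for the ordered one
print(colnames(contrasts(f_plain)))
print(colnames(contrasts(f_ord)))
```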

@Chase 2011-03-06 13:44:35

I didn't fully parse your code up there, but I'm pretty sure reorder() from the stats library accomplishes the same task.

@Gavin Simpson 2011-03-06 14:23:52

@Chase how do you propose using reorder() in this case? The factor requiring reordering needs to be reordered by some function of itself and I'm struggling to see a good way to do that.

@Gavin Simpson 2011-03-06 14:39:36

ok, with(theTable, reorder(Position, as.character(Position), function(x) sum(duplicated(x)))) is one way, and another with(theTable, reorder(Position, as.character(Position), function(x) as.numeric(table(x)))) but these are just as convoluted...

@Prasad Chalasani 2011-03-06 14:55:03

I simplified the answer slightly to use sort rather than order

@Chase 2011-03-06 15:45:06

@Gavin - perhaps I misunderstood Prasad's original code (I don't have R on this machine to test...) but it looked as if he was reordering the categories based on frequency, which reorder is adept at doing. I agree for this question that something more involved is needed. Sorry for the confusion.

@Léo Léopold Hertz 준영 2017-05-25 16:18:14

This proposal does not work with the data set provided in my other comments today.
