By eli-k


2012-10-16 23:38:41

Working with a data frame similar to this:

set.seed(100)  
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))             
df <- df[order(df$cat, df$val), ]  
df  

   cat        val  
1  aaa 0.05638315  
2  aaa 0.25767250  
3  aaa 0.30776611  
4  aaa 0.46854928  
5  aaa 0.55232243  
6  bbb 0.17026205  
7  bbb 0.37032054  
8  bbb 0.48377074  
9  bbb 0.54655860  
10 bbb 0.81240262  
11 ccc 0.28035384  
12 ccc 0.39848790  
13 ccc 0.62499648  
14 ccc 0.76255108  
15 ccc 0.88216552 

I am trying to add a column that numbers the rows within each group. Doing it this way obviously isn't using the power of R:

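df$num <- 1
# Increment the count while cat stays the same as in the previous row;
# rows where cat changes keep num = 1 and start a new count
for (i in 2:nrow(df)) {
  if (df[i, "cat"] == df[i - 1, "cat"]) {
    df[i, "num"] <- df[i - 1, "num"] + 1
  }
}
df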

   cat        val num  
1  aaa 0.05638315   1  
2  aaa 0.25767250   2  
3  aaa 0.30776611   3  
4  aaa 0.46854928   4  
5  aaa 0.55232243   5  
6  bbb 0.17026205   1  
7  bbb 0.37032054   2  
8  bbb 0.48377074   3  
9  bbb 0.54655860   4  
10 bbb 0.81240262   5  
11 ccc 0.28035384   1  
12 ccc 0.39848790   2  
13 ccc 0.62499648   3  
14 ccc 0.76255108   4  
15 ccc 0.88216552   5  

What would be a good way to do this?
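For comparison, here is a minimal base R sketch (not from the original post) that vectorizes the same previous-row comparison the loop performs: flag the rows where cat differs from the row before, then subtract each row's group-start position. Like the loop, it assumes the rows are already sorted so that each group is contiguous; `cat_chr`, `is_start` and `start_idx` are illustrative names only.

```r
# Sample data from the question, sorted so each group's rows are contiguous
set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)),
                 val = runif(15))
df <- df[order(df$cat, df$val), ]

n <- nrow(df)
cat_chr <- as.character(df$cat)   # works whether cat is a factor or character
# TRUE where a row starts a new group (first row, or cat differs from the previous row)
is_start <- c(TRUE, cat_chr[-1] != cat_chr[-n])
# Row position of the most recent group start at or before each row
start_idx <- cummax(ifelse(is_start, seq_len(n), 0L))
# Position within the group = distance from the group start, plus one
df$num <- seq_len(n) - start_idx + 1L
df
```

This keeps the loop's logic but does the work in a few whole-vector operations instead of indexing the data frame row by row.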


@andrii 2018-09-22 07:40:42

Here is a small improvement that also sorts 'val' within the groups:

# 1. Data set
set.seed(100)
df <- data.frame(
  cat = c(rep("aaa", 5), rep("ccc", 5), rep("bbb", 5)), 
  val = runif(15))             

# 2. 'dplyr' approach
library(dplyr)
df %>% 
  arrange(cat, val) %>% 
  group_by(cat) %>% 
  mutate(id = row_number())

@zcoleman 2019-01-09 20:40:26

Can you not sort after the group_by?

@hannes101 2018-06-18 09:28:45

I would like to add a data.table variant using the rank() function. It offers the additional possibility of changing the ordering, which makes it a bit more flexible than the seq_len() solution, and it is quite similar to the row_number functions in an RDBMS.

# Variant with ascending ordering
library(data.table)
dt <- data.table(df)
dt[, .(val, num = rank(val)), by = list(cat)][order(cat, num), ]

    cat        val num
 1: aaa 0.05638315   1
 2: aaa 0.25767250   2
 3: aaa 0.30776611   3
 4: aaa 0.46854928   4
 5: aaa 0.55232243   5
 6: bbb 0.17026205   1
 7: bbb 0.37032054   2
 8: bbb 0.48377074   3
 9: bbb 0.54655860   4
10: bbb 0.81240262   5
11: ccc 0.28035384   1
12: ccc 0.39848790   2
13: ccc 0.62499648   3
14: ccc 0.76255108   4
15: ccc 0.88216552   5

# Variant with descending ordering
dt[, .(val, num = rank(-val)), by = list(cat)][order(cat, num), ]
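A base R sketch of the same idea, using only ave() and rank() (not part of the original answer). Note that rank() defaults to ties.method = "average", so tied values get fractional ranks such as 2.5; ties.method = "first" breaks ties by position, so every row gets a distinct whole number, which is closer to ROW_NUMBER(). Negating val gives the descending variant. The column names num_asc and num_desc are illustrative.

```r
set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)),
                 val = runif(15))

# Ascending: the smallest val in each group gets 1; ties broken by row order
df$num_asc <- ave(df$val, df$cat,
                  FUN = function(v) rank(v, ties.method = "first"))
# Descending: negate val so the largest val in each group gets 1
df$num_desc <- ave(df$val, df$cat,
                   FUN = function(v) rank(-v, ties.method = "first"))

df[order(df$cat, df$num_asc), ]
```

Because ave() returns results in the original row order, the data does not need to be sorted first; the final order() call is only for display.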

@Jaap 2017-10-06 20:01:39

To make this question more complete, here is a base R alternative using sequence and rle:

df$num <- sequence(rle(df$cat)$lengths)

which gives the intended result:

> df
   cat        val num
4  aaa 0.05638315   1
2  aaa 0.25767250   2
1  aaa 0.30776611   3
5  aaa 0.46854928   4
3  aaa 0.55232243   5
10 bbb 0.17026205   1
8  bbb 0.37032054   2
6  bbb 0.48377074   3
9  bbb 0.54655860   4
7  bbb 0.81240262   5
13 ccc 0.28035384   1
14 ccc 0.39848790   2
11 ccc 0.62499648   3
15 ccc 0.76255108   4
12 ccc 0.88216552   5

If df$cat is a factor variable, you need to wrap it in as.character first:

df$num <- sequence(rle(as.character(df$cat))$lengths)
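One caveat worth illustrating: rle() counts consecutive runs, so this approach only numbers correctly when each group's rows are contiguous (as they are here after sorting). A small sketch with a made-up vector x, compared with ave(), which groups by value regardless of row order:

```r
# Made-up example where "aaa" appears in two separate runs
x <- c("aaa", "aaa", "bbb", "aaa", "bbb")

# rle() sees four runs (2, 1, 1, 1), so the second "aaa" run restarts at 1
runs_unsorted <- sequence(rle(x)$lengths)                # 1 2 1 1 1
# After sorting, each group is a single run
runs_sorted   <- sequence(rle(sort(x))$lengths)          # 1 2 3 1 2 (order of sort(x))
# ave() numbers by value, keeping the original order
by_group      <- ave(seq_along(x), x, FUN = seq_along)   # 1 2 1 3 2
```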

@mnel 2012-10-16 23:41:50

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or data.table (the most memory-efficient option, as := assigns by reference within DT):

library(data.table)
DT <- data.table(df)

# number the rows within each cat group, adding id by reference
DT[, id := seq_len(.N), by = cat]
# or, equivalently, with rowid()
DT[, id := rowid(cat)]

@Frank 2017-03-14 22:07:28

It might be worth mentioning that ave gives a float (double) instead of an integer here. Alternatively, you could change df$val to seq_len(nrow(df)). I just ran into this over here: stackoverflow.com/questions/42796857/…
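A quick base R check of this point, as a sketch using the question's data: ave() writes the group results back into a copy of its first argument, so the result takes that argument's type.

```r
set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)),
                 val = runif(15))

# Passing the double column val yields a double result
num_dbl <- ave(df$val, df$cat, FUN = seq_along)
# Passing integer row positions keeps the result integer
num_int <- ave(seq_len(nrow(df)), df$cat, FUN = seq_along)

typeof(num_dbl)  # "double"
typeof(num_int)  # "integer"
```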

@hannes101 2017-07-28 12:23:01

Interestingly, this data.table solution seems to be quicker than using frank: library(microbenchmark); microbenchmark(a = DT[, .(val, num = frank(val)), by = list(cat)], b = DT[, .(val, id = seq_len(.N)), by = list(cat)], times = 1000L)

@EcologyTom 2018-04-10 14:16:39

Thanks! The dplyr solution is good. But if, like me, you keep getting weird errors when trying this approach, make sure that you are not getting conflicts between plyr and dplyr, as explained in this post. This can be avoided by explicitly calling dplyr::mutate(...).

@chinsoon12 2018-05-23 00:14:07

Another data.table method is setDT(df)[, id := rleid(val), by = .(cat)]

@Przemyslaw Remin 2018-07-24 09:31:59

How can the plyr and dplyr answers be modified to rank the val column in descending order?

@James S. 2018-09-17 01:07:24

I tried using the plyr method and got an error: "Error in unique.default(x) : unique() applies only to vectors" - has anyone ever seen that happen?

@Markus Graf 2018-10-01 09:31:15

@PrzemyslawRemin You can simply sort the whole dataset in advance, in descending order of val: df <- df[order(df$val, decreasing = TRUE), ]

@Markus Graf 2018-10-01 10:30:14

data.table was the most effective way: it took less than a second to compute about 17000 rows. With ddply it ran forever, so I had to kill the R process.

@alittleboy 2012-10-16 23:51:06

Here is an option using a for loop over groups rather than over rows (as the OP did):

for (i in unique(df$cat)) df$num[df$cat == i] <- seq_len(sum(df$cat == i))
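The same per-group numbering can also be done without an explicit loop using base R's split() and unsplit(). This is a minimal sketch, not from the original thread; because it works on row positions, it does not require the rows to be sorted by cat.

```r
set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)),
                 val = runif(15))
df <- df[order(df$cat, df$val), ]

# Split the row positions by group, number each piece with seq_along(),
# then let unsplit() put the pieces back in the original row order
df$num <- unsplit(lapply(split(seq_len(nrow(df)), df$cat), seq_along), df$cat)
df
```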
