#### [SOLVED] Counting the number of elements with the values of x in a vector

I have a vector of numbers:

``````
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)
``````

How can I have R count the number of times a value x appears in the vector?

#### @tmfmnk 2020-06-26 12:15:49

One option is the `vec_count()` function from the `vctrs` package:

``````
library(vctrs)

vec_count(numbers)
#>    key count
#> 1  435     3
#> 2   67     2
#> 3    4     2
#> 4   34     2
#> 5   56     2
#> 6   23     2
#> 7  456     1
#> 8   43     1
#> 9  453     1
#> 10   5     1
#> 11 657     1
#> 12 324     1
#> 13  54     1
#> 14 567     1
#> 15  65     1
``````

By default (`sort = "count"`), the most frequent values come first. To sort by key instead (a `table()`-like output):

``````
vec_count(numbers, sort = "key")
#>    key count
#> 1    4     2
#> 2    5     1
#> 3   23     2
#> 4   34     2
#> 5   43     1
#> 6   54     1
#> 7   56     2
#> 8   65     1
#> 9   67     2
#> 10 324     1
#> 11 435     3
#> 12 453     1
#> 13 456     1
#> 14 567     1
#> 15 657     1
``````

#### @Nik 2020-03-12 16:40:15

This is a very fast solution for one-dimensional atomic vectors. It relies on `match()`, so it handles `NA` values too:

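``````
x <- c("a", NA, "a", "c", "a", "b", NA, "c")

fn <- function(x) {
  u <- unique.default(x)
  # match() maps every element to its position in u (NA matches NA);
  # tabulate() then counts how often each position occurs
  out <- list(x = u, freq = tabulate(match(x, u), length(u)))
  # Build the data frame by hand to avoid data.frame()'s overhead
  class(out) <- "data.frame"
  attr(out, "row.names") <- seq_along(u)
  out
}

fn(x)
#>      x freq
#> 1    a    3
#> 2 <NA>    2
#> 3    c    2
#> 4    b    1
``````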

You could also tweak the algorithm so that it skips `unique()` and returns a count for every element of `x`:

``````
fn2 <- function(x) {
  y <- match(x, x)
  # Count first-occurrence positions, then look the count up for every element
  out <- list(x = x, freq = tabulate(y, length(x))[y])
  class(out) <- "data.frame"
  attr(out, "row.names") <- seq_along(x)
  out
}

fn2(x)
#>      x freq
#> 1    a    3
#> 2 <NA>    2
#> 3    a    3
#> 4    c    2
#> 5    a    3
#> 6    b    1
#> 7 <NA>    2
#> 8    c    2
``````
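The trick in `fn2()` is that `match(x, x)` maps every element to the position of its first occurrence. Tabulating those positions therefore counts each distinct value once, and indexing back by the same positions spreads the counts over every element. Below is a base-R sketch of the intermediate steps, with no pipe needed. The variable names `first`, `counts` and `per_element` are illustrative only.

```r
x <- c("a", NA, "a", "c", "a", "b", NA, "c")

# Position of the first occurrence of each element (NA matches NA)
first <- match(x, x)

# How often each first-occurrence position appears;
# positions that are never a first occurrence get 0
counts <- tabulate(first, nbins = length(x))

# Look the count up again for every element
per_element <- counts[first]

first        # 1 2 1 4 1 6 2 4
counts       # 3 2 0 2 0 1 0 0
per_element  # 3 2 3 2 3 1 2 2
```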

If that per-element output is what you want, you probably don't need the original vector returned again; the second column is likely all you need. You can get it in one line with the pipe:

``````
library(magrittr)

match(x, x) %>% `[`(tabulate(.), .)
#> [1] 3 2 3 2 3 1 2 2
``````

#### @Taz 2020-05-25 14:00:21

Really great solution! That's also the fastest one I could come up with. It can be improved slightly for factor input by using `u <- if (is.factor(x)) x[!duplicated(x)] else unique(x)`.

#### @Pascal Martin 2020-02-21 15:09:30

A method that is relatively fast on long vectors and gives a convenient output is to use `lengths(split(numbers, numbers))` (note the S at the end of `lengths`):

``````
# Make some integer vectors of different sizes
set.seed(123)
x <- sample.int(1e3, 1e4, replace = TRUE)
xl <- sample.int(1e3, 1e6, replace = TRUE)
xxl <- sample.int(1e3, 1e7, replace = TRUE)

# Number of times each value appears in x:
a <- lengths(split(x, x))

# Number of times the value 64 appears:
a["64"]
#~ 64
#~ 15

# Occurrences of the first 10 values
a[1:10]
#~  1  2  3  4  5  6  7  8  9 10
#~ 13 12  6 14 12  5 13 14 11 14
``````

The output is simply a named vector.
The speed appears comparable to `rle` proposed by JBecker and even a bit faster on very long vectors. Here is a microbenchmark in R 3.6.2 with some of the functions proposed:

``````
library(microbenchmark)

f1 <- function(vec) lengths(split(vec,vec))
f2 <- function(vec) table(vec)
f3 <- function(vec) rle(sort(vec))
f4 <- function(vec) plyr::count(vec)

microbenchmark(split = f1(x),
table = f2(x),
rle = f3(x),
plyr = f4(x))
#~ Unit: microseconds
#~   expr      min        lq      mean    median        uq      max neval  cld
#~  split  402.024  423.2445  492.3400  446.7695  484.3560 2970.107   100  b
#~  table 1234.888 1290.0150 1378.8902 1333.2445 1382.2005 3203.332   100    d
#~    rle  227.685  238.3845  264.2269  245.7935  279.5435  378.514   100 a
#~   plyr  758.866  793.0020  866.9325  843.2290  894.5620 2346.407   100   c

microbenchmark(split = f1(xl),
table = f2(xl),
rle = f3(xl),
plyr = f4(xl))
#~ Unit: milliseconds
#~   expr       min        lq      mean    median        uq       max neval cld
#~  split  21.96075  22.42355  26.39247  23.24847  24.60674  82.88853   100 ab
#~  table 100.30543 104.05397 111.62963 105.54308 110.28732 168.27695   100   c
#~    rle  19.07365  20.64686  23.71367  21.30467  23.22815  78.67523   100 a
#~   plyr  24.33968  25.21049  29.71205  26.50363  27.75960  92.02273   100  b

microbenchmark(split = f1(xxl),
table = f2(xxl),
rle = f3(xxl),
plyr = f4(xxl))
#~ Unit: milliseconds
#~   expr       min        lq      mean    median        uq       max neval  cld
#~  split  296.4496  310.9702  342.6766  332.5098  374.6485  421.1348   100 a
#~  table 1151.4551 1239.9688 1283.8998 1288.0994 1323.1833 1385.3040   100    d
#~    rle  399.9442  430.8396  464.2605  471.4376  483.2439  555.9278   100   c
#~   plyr  350.0607  373.1603  414.3596  425.1436  437.8395  506.0169   100  b
``````
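This is a quick check of how `NA` is treated. `split()` (like `table()` with its defaults) silently drops missing values, whereas `table()` with `useNA = "ifany"` reports them as an extra cell. The sketch below uses a small made-up vector `vec`.

```r
# A small made-up vector with missing values
vec <- c(3, 1, NA, 3, NA, 2)

# split() drops the NA group, so the two NAs are not counted
lengths(split(vec, vec))

# table() counts NA as its own cell when asked to
table(vec, useNA = "ifany")

# Or append the NA count to the split() result by hand
c(lengths(split(vec, vec)), "NA" = sum(is.na(vec)))
```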

Importantly, of the functions benchmarked above, only `plyr::count` also counts missing values (`NA`). The number of `NA`s can also be obtained separately with `sum(is.na(vec))`.

#### @GWD 2018-12-17 15:52:21

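This can be done with `outer()`, which builds a matrix of pairwise equalities, followed by `rowSums()` to count the matches for each element. Because `outer()` builds an n-by-n matrix, this is only practical for fairly short vectors.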
In order to have the counts and `numbers` in the same dataset, a data.frame is first created. This step is not needed if you want separate input and output.

``````
df <- data.frame(No = numbers)
df$count <- rowSums(outer(df$No, df$No, FUN = `==`))
``````

#### @Therii 2018-11-16 16:56:04

There are several ways to count a specific element:

``````
library(plyr)
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,7,65,34,435)

print(length(which(numbers == 435)))

# sum() counts the number of TRUEs in a logical vector
print(sum(numbers == 435))
print(sum(c(TRUE, FALSE, TRUE)))

# count() comes from the plyr package;
# it returns a data frame, and freq is one of its columns
print(count(numbers[numbers == 435]))
print(count(numbers[numbers == 435])[["freq"]])
``````

#### @ishandutta2007 2017-06-07 13:14:06

``````
> numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)

> length(grep(435, numbers))   # text match, so e.g. 4350 would also match
[1] 3

> length(which(435 == numbers))
[1] 3

> library(plyr)
> df = count(numbers)
> df[df$x == 435, ]
     x freq
11 435    3

> sum(435 == numbers)
[1] 3

> sum(grepl(435, numbers))
[1] 3

> tabulate(numbers)[435]
[1] 3

> table(numbers)['435']
435
3

> length(subset(numbers, numbers == 435))
[1] 3
``````

#### @geotheory 2013-06-06 14:49:12

There is also `count(numbers)` from the `plyr` package. Much more convenient than `table` in my opinion.

#### @stevec 2020-05-09 03:41:07

Is there a dplyr equivalent of this?

You can change the number to whatever you wish in the following line:

``````
length(which(numbers == 4))
``````

#### @Berny 2015-05-15 12:35:40

If you want a running count of appearances (how many times each value has occurred up to that position), you can use the `sapply` function:

``````
# For each position i, count how many of numbers[1:i] equal numbers[i]
index <- sapply(seq_along(numbers), function(i) sum(numbers[1:i] == numbers[i]))
cbind(numbers, index)
``````
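The `sapply()` loop compares each element with all earlier ones, so its cost grows quadratically with the length of `numbers`. A vectorised alternative numbers the occurrences within each group using `ave()`. This is a swapped-in technique, not the answer's own, and it should give the same `index`:

```r
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)

# Within each group of equal values, replace the positions by 1, 2, 3, ...
# in their original order: that is exactly the running count
index_fast <- ave(seq_along(numbers), numbers, FUN = seq_along)

# The sapply() version, for comparison
index_slow <- sapply(seq_along(numbers),
                     function(i) sum(numbers[1:i] == numbers[i]))
identical(index_fast, index_slow)  # TRUE
```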

Output:

``````
      numbers index
 [1,]       4     1
 [2,]      23     1
 [3,]       4     2
 [4,]      23     2
 [5,]       5     1
 [6,]      43     1
 [7,]      54     1
 [8,]      56     1
 [9,]     657     1
[10,]      67     1
[11,]      67     2
[12,]     435     1
[13,]     453     1
[14,]     435     2
[15,]     324     1
[16,]      34     1
[17,]     456     1
[18,]      56     2
[19,]     567     1
[20,]      65     1
[21,]      34     2
[22,]     435     3
``````

#### @Garini 2018-05-30 13:24:28

Is this by any means faster than `table`?

#### @pomber 2014-12-26 17:06:41

Using table but without comparing with `names`:

``````
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435)
x <- 67
numbertable <- table(numbers)
numbertable[as.character(x)]
#67
# 2
``````
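When a precomputed table is reused for several lookups, it helps to have a lookup that returns 0 for values that never occur. The sketch below first converts the table to a plain named vector. `count_of()` is a hypothetical helper written for illustration, not part of base R.

```r
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435)
numbertable <- table(numbers)

# Hypothetical helper: look up counts by value, returning 0 for absent values
count_of <- function(tab, values) {
  counts <- setNames(as.vector(tab), names(tab))  # plain named integer vector
  out <- counts[as.character(values)]             # NA where the name is missing
  out[is.na(out)] <- 0L
  names(out) <- values
  out
}

count_of(numbertable, c(67, 435, 999))  # 67: 2, 435: 1, 999: 0
```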

`table` is useful when you are using the counts of different elements several times. If you need only one count, use `sum(numbers == x)`.

#### @Akash 2014-12-26 07:11:31

One more way I find convenient:

``````
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
(s <- summary(as.factor(numbers)))
``````

This converts the vector to a factor, and `summary()` then gives the count of each unique value (each factor level).

Output is:

``````
  4   5  23  34  43  54  56  65  67 324 435 453 456 567 657
  2   1   2   2   1   1   2   1   2   1   3   1   1   1   1
``````
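There is a caveat with `summary()` on a factor. By default it reports at most 100 entries (`maxsum = 100`) and lumps the least frequent levels into `(Other)`. It also appends an `NA's` count when missing values are present. The sketch below uses a made-up factor to show both behaviours and how to raise `maxsum`:

```r
f <- factor(1:150)            # 150 distinct values, each appearing once

s_default <- summary(f)       # capped: 99 levels plus "(Other)"
length(s_default)             # 100
s_default[["(Other)"]]        # 51

s_all <- summary(f, maxsum = nlevels(f))  # one entry per level
length(s_all)                 # 150

summary(factor(c(4, 4, NA)))  # counts "4" and also "NA's"
```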

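This can be stored as a data frame if preferred:

`as.data.frame(cbind(Number = names(s), Freq = s), stringsAsFactors = FALSE, row.names = 1:length(s))`

Here `row.names` replaces the default row names; without it, the names of `s` are used as row names in the new data frame. Note that `cbind()` creates a character matrix, so `Freq` ends up as a character column; `data.frame(Number = names(s), Freq = unname(s))` keeps it numeric.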

Output is:

``````
   Number Freq
1       4    2
2       5    1
3      23    2
4      34    2
5      43    1
6      54    1
7      56    2
8      65    1
9      67    2
10    324    1
11    435    3
12    453    1
13    456    1
14    567    1
15    657    1
``````

#### @JBecker 2012-12-13 21:43:28

My preferred solution uses `rle`, which will return a value (the label, `x` in your example) and a length, which represents how many times that value appeared in sequence.

By combining `rle` with `sort`, you have an extremely fast way to count the number of times any value appeared. This can be helpful with more complex problems.

Example:

``````
> numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
> a <- rle(sort(numbers))
> a
Run Length Encoding
lengths: int [1:15] 2 1 2 2 1 1 2 1 2 1 ...
values : num [1:15] 4 5 23 34 43 54 56 65 67 324 ...
``````

If the value you want is cut off in the printed output, or you need to store the counts for later, convert `a` to a `data.frame`.

``````
> b <- data.frame(number = a$values, n = a$lengths)
> b
   number n
1       4 2
2       5 1
3      23 2
4      34 2
5      43 1
6      54 1
7      56 2
8      65 1
9      67 2
10    324 1
11    435 3
12    453 1
13    456 1
14    567 1
15    657 1
``````
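The `sort()` step matters: without it, `rle()` counts consecutive runs rather than totals. The sketch below uses a short made-up character sequence to show both behaviours:

```r
s <- c(rep("A", 3), rep("G", 4), "A", rep("G", 2), rep("C", 10))

# Unsorted: one entry per consecutive run
runs <- rle(s)
runs$values   # "A" "G" "A" "G" "C"
runs$lengths  # 3 4 1 2 10

# Sorted first: equal values become adjacent, so each run is a total count
totals <- rle(sort(s))
setNames(totals$lengths, totals$values)  # A: 4, C: 10, G: 6
```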

I find it is rare that I want to know the frequency of one value and not all of the values, and `rle` seems to be the quickest way to get and store all the counts.

#### @Heather Stark 2013-01-31 13:54:48

Is the advantage of this, vs `table`, that it gives a result in a more readily usable format? Thanks.

#### @JBecker 2013-04-22 20:42:11

@HeatherStark I would say there are two advantages. The first is definitely that it is a more readily usable format than the `table` output. The second is that sometimes I want to count the number of elements "in a row" rather than within the whole dataset. For example, `rle(c(rep('A', 3), rep('G', 4), 'A', rep('G', 2), rep('C', 10)))` would return `values = c('A','G','A','G','C')` and `lengths = c(3, 4, 1, 2, 10)`, which is sometimes useful.

#### @ClementWalter 2016-06-21 16:54:09

Using microbenchmark, it appears that `table` is faster when the vector is long (I tried 100,000 elements) but slightly slower when it is shorter (I tried 1,000).

#### @skan 2016-12-13 19:46:17

This is going to be really slow if you have a lot of numbers.

#### @Sergej Andrejev 2012-04-19 13:13:15

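There is a standard function in R for that:

`tabulate(numbers)`

For positive integers it returns the counts of 1, 2, ..., `max(numbers)`, so the count for a value `x` is `tabulate(numbers)[x]`.

#### @omar 2016-06-01 15:55:10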

The disadvantage of `tabulate` is that it cannot deal with zero or negative numbers.

#### @Dodgie 2017-01-31 00:26:43

But it does handle values with zero occurrences (they get a count of 0), which the other solutions do not.

#### @pglpm 2019-07-05 08:36:34

Fantastically fast! And as omar says, it gives a zero count for non-appearing values, which is extremely useful when building a frequency distribution. Zero or negative integers can be handled by adding a constant before using `tabulate`. Note: sorting is not needed, since `tabulate(numbers)` and `tabulate(sort(numbers))` return the same counts.

The most direct way is `sum(numbers == x)`.

`numbers == x` creates a logical vector that is TRUE at every position where x occurs. When it is summed, the logical vector is coerced to numeric, which converts TRUE to 1 and FALSE to 0.
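The sketch below shows that coercion and the floating-point pitfall. `0.1 + 0.2` is not exactly `0.3` in binary floating point, so an exact `==` comparison misses it while a tolerance-based one does not. The `1e-6` tolerance is an arbitrary choice.

```r
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)

hits <- numbers == 435   # logical vector, TRUE where the value matches
sum(hits)                # TRUE counts as 1 and FALSE as 0, giving 3
mean(hits)               # the same idea gives the proportion of matches

y <- c(0.1 + 0.2, 0.3)
sum(y == 0.3)              # 1: the first element is not exactly 0.3
sum(abs(y - 0.3) < 1e-6)   # 2: both are within the tolerance
```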

However, for floating-point numbers it's better to use something like `sum(abs(numbers - x) < 1e-6)`.

#### @JD Long 2009-12-17 18:13:56

Good point about the floating-point issue. That bites my butt more than I generally like to admit.

#### @JBecker 2013-04-22 20:46:07

@Jason while it does answer the question directly, my guess is that folks liked the more general solution that provides the answer for all `x` in the data rather than a specific known value of `x`. To be fair, that was what the original question was about. As I said in my answer below, "I find it is rare that I want to know the frequency of one value and not all of the values..."

#### @Jesse 2009-12-17 17:55:16

I would probably do something like this

``````
length(which(numbers == x))
``````

But really, a better way is:

``````
table(numbers)
``````

#### @Ken Williams 2009-12-18 19:41:20

`table(numbers)` is going to do a lot more work than the easiest solution, `sum(numbers == x)`, because it's going to figure out the counts of all the other numbers in the list too.

#### @skan 2015-12-02 12:16:16

The problem with `table` is that it's harder to use inside more complex calculations, for example with `apply()` on data frames.

#### @Shane 2009-12-17 17:25:59

You can just use `table()`:

``````
> a <- table(numbers)
> a
numbers
  4   5  23  34  43  54  56  65  67 324 435 453 456 567 657
  2   1   2   2   1   1   2   1   2   1   3   1   1   1   1
``````

Then you can subset it:

``````
> a[names(a)==435]
435
  3
``````

Or convert it into a data.frame if you're more comfortable working with that:

``````
> as.data.frame(table(numbers))
   numbers Freq
1        4    2
2        5    1
3       23    2
4       34    2
...
``````

Don't forget about potential floating-point issues, especially with `table`, which coerces numbers to strings.

#### @Shane 2009-12-17 18:18:17

That's a great point. These are all integers, so it isn't a real issue in this example, right?

#### @Ian Fellows 2009-12-18 02:11:37

Not exactly. The elements of the table are integers (`typeof(table(numbers))` is `"integer"`), but `435` is a floating-point number. To make it an integer, you can use `435L`.

#### @Heather Stark 2013-01-31 13:52:05

@Ian - I am confused about why 435 is a float in this example. Can you clarify a bit? Thanks.

#### @baudtack 2013-11-05 05:31:43

@HeatherStark This is because numeric literals are doubles (floats) by default, unless integers are explicitly requested, e.g. with the `L` suffix.

#### @pomber 2014-12-26 17:08:17

Why not `a["435"]` instead of `a[names(a)==435]`?

#### @skan 2016-12-13 17:00:51

@pomber If you also had the count for NAs, `a["NA"]` wouldn't work.

#### @Garini 2018-05-30 13:25:10

Is the `table` option faster than a simple `sapply`, as in one of the following answers?

#### @JD Long 2009-12-17 17:27:54

Here's one quick and dirty way:

``````
x <- 23
length(subset(numbers, numbers==x))
``````
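To count several target values in one pass, including ones that never occur, you can combine `match()` with `tabulate()`. This is a different technique from the `subset()` approach. `match()` maps each element to the index of its target (or `NA`), and `tabulate()` counts those indices, ignoring the `NA`s. The `targets` vector is just an example.

```r
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)
targets <- c(4, 435, 999)   # example values; 999 does not occur

# Index of the matching target for each element, NA if it matches none
idx <- match(numbers, targets)

# Count each target index; nbins keeps a slot (with 0) for absent targets
counts <- setNames(tabulate(idx, nbins = length(targets)), targets)
counts  # 4: 2, 435: 3, 999: 0
```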

### [SOLVED] How can I count the occurrences of a list item?

• 2010-04-08 13:30:00
• weakish
• 1715705 View
• 1543 Score
• Tags:   python list count

### [SOLVED] Drop data frame columns by name

• 2011-01-05 14:34:29
• Btibert3
• 1497809 View
• 880 Score
• Tags:   r dataframe r-faq

### [SOLVED] Count the number occurrences of a character in a string

• 2009-07-20 20:00:36
• Mat
• 979377 View
• 961 Score
• Tags:   python string count

### [SOLVED] jQuery: count number of rows in a table

• 2009-07-19 14:02:41
• danjan
• 698772 View
• 493 Score
• Tags:   jquery count row

### [SOLVED] How do I erase an element from std::vector<> by index?

• 2009-05-17 17:59:36
• dau_man
• 773807 View
• 515 Score
• Tags:   c++ stl vector erase