[SOLVED] How to convert a factor to integer\numeric without loss of information?

When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.

``````f <- factor(sample(runif(5), 20, replace = TRUE))
##  [1] 0.0248644019011408 0.0248644019011408 0.179684827337041
##  [4] 0.0284090070053935 0.363644931698218  0.363644931698218
##  [7] 0.179684827337041  0.249704354675487  0.249704354675487
## [10] 0.0248644019011408 0.249704354675487  0.0284090070053935
## [13] 0.179684827337041  0.0248644019011408 0.179684827337041
## [16] 0.363644931698218  0.249704354675487  0.363644931698218
## [19] 0.179684827337041  0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218

as.numeric(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

as.integer(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2
``````

I have to resort to `paste` to get the real values:

``````as.numeric(paste(f))
##  [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
##  [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901
``````

Is there a better way to convert a factor to numeric? @Indi 2017-02-22 18:26:18

Note: this particular answer is not for converting numeric-valued factors to numerics, it is for converting categorical factors to their corresponding level numbers.

Every other answer in this post failed for me: each one produced only NAs.

``````y2 <- factor(c("A", "B", "C", "D", "A"))
as.numeric(levels(y2))[y2]
## [1] NA NA NA NA NA
## Warning message:
## NAs introduced by coercion
``````
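A minimal sketch of why the coercion above yields `NA`, using two hypothetical factors (`letters_f` and `digits_f` are illustration names, not from the original post). The lookup through `levels()` only works when the labels themselves can be parsed as numbers:

```r
# Hypothetical factors, for illustration only.
letters_f <- factor(c("A", "B", "C", "D", "A"))
digits_f  <- factor(c("5", "15", "20", "2"))

# Labels that are not numbers cannot be parsed, so every element becomes NA.
from_letters <- suppressWarnings(as.numeric(levels(letters_f))[letters_f])

# Labels that are numbers are recovered as values ...
from_digits <- as.numeric(levels(digits_f))[digits_f]

# ... whereas as.integer() returns the level codes, which follow the
# sorted order of the labels as character strings.
digit_codes <- as.integer(digits_f)

print(from_letters)
print(from_digits)
print(digit_codes)
```

So `as.integer()` is the right tool when the level codes are what you want, and the `levels()` lookup is the right tool when the labels are numbers.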

What worked for me is this -

``````as.integer(y2)
## [1] 1 2 3 4 1
`````` @MrFlick 2017-02-22 19:19:37

Are you sure you had a factor? Look at this example: `y <- factor(c("5", "15", "20", "2")); unclass(y) %>% as.numeric`. This returns 4, 1, 3, 2, not 5, 15, 20, 2. This seems like incorrect information. @Indi 2017-02-22 19:34:42

OK, this is similar to what I was trying to do today: `y2 <- factor(c("A", "B", "C", "D", "A")); as.numeric(levels(y2))[y2]` gave `NA NA NA NA NA` with the warning "NAs introduced by coercion", whereas `unclass(y2) %>% as.numeric` gave me the results that I needed. @Indi 2017-02-22 19:36:14

Let me update my scenario in the answer that I provided. @MrFlick 2017-02-22 19:37:51

OK, well, that's not the question that was asked above. In this question the factor levels are all "numeric". In your case, `as.numeric(y)` should have worked just fine, with no need for `unclass()`. But again, that's not what this question was about. This answer isn't appropriate here. @Indi 2017-02-22 19:45:10

Well, I really hope it helps someone who was in a hurry like me and read just the title! @Phil 2017-05-09 16:39:12

@jogo `%>%` is from the `magrittr` package. @Jerry T 2018-11-13 02:37:04

Late to the game: I accidentally found that `trimws()` converts `factor(3:5)` to `c("3", "4", "5")`. You can then call `as.numeric()` on the result. That is:

``````as.numeric(trimws(x_factor_var))
`````` @MrFlick 2018-11-13 18:54:39

Is there a reason you would recommend using `trimws` over `as.character` as described in the accepted answer? It seems to me like unless you actually had whitespace you needed to remove, `trimws` is just going to do a bunch of unnecessary regular expression work to return the same result. @Jerry T 2019-02-22 18:54:50

`as.numeric(levels(f))[f]` might be a bit confusing and hard to remember for beginners. `trimws` does no harm. @davsjob 2018-11-01 10:05:27

You can use `hablar::convert` if you have a data frame. The syntax is easy:

Sample df

``````library(hablar)
library(dplyr)

df <- dplyr::tibble(a = as.factor(c("7", "3")),
b = as.factor(c("1.5", "6.3")))
``````

Solution

``````df %>%
convert(num(a, b))
``````

gives you:

``````# A tibble: 2 x 2
a     b
<dbl> <dbl>
1    7.  1.50
2    3.  6.30
``````
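For comparison, here is a rough base R sketch of the same conversion without extra packages. The names `df_base` and `factor_to_numeric` are hypothetical, and the sketch assumes every column is a factor whose labels are numbers:

```r
# Same sample data as above, built as a plain data frame.
df_base <- data.frame(a = factor(c("7", "3")),
                      b = factor(c("1.5", "6.3")))

# Hypothetical helper (not part of any package): parse each level once,
# then index the parsed levels by the factor's codes.
factor_to_numeric <- function(col) as.numeric(levels(col))[col]

# Assigning into df_base[] keeps the data frame structure.
df_base[] <- lapply(df_base, factor_to_numeric)
str(df_base)
```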

Or if you want one column to be integer and one numeric:

``````df %>%
convert(int(a),
num(b))
``````

results in:

``````# A tibble: 2 x 2
a     b
<int> <dbl>
1     7  1.50
2     3  6.30
``````

The easiest way would be to use the `unfactor` function from the `varhandle` package:

``````unfactor(your_factor_variable)
``````

This example can be a quick start:

``````x <- rep(c("a", "b", "c"), 20)
y <- rep(c(1, 1, 0), 20)

class(x)  # -> "character"
class(y)  # -> "numeric"

x <- factor(x)
y <- factor(y)

class(x)  # -> "factor"
class(y)  # -> "factor"

library(varhandle)
x <- unfactor(x)
y <- unfactor(y)

class(x)  # -> "character"
class(y)  # -> "numeric"
`````` @CJB 2016-01-25 09:32:41

The `unfactor` function converts to character data type first and then converts back to numeric. Type `unfactor` at the console and you can see it in the middle of the function. Therefore it doesn't really give a better solution than what the asker already had. @CJB 2016-01-25 09:38:02

Having said that, the levels of a factor are of character type anyway, so nothing is lost by this approach. The `unfactor` function takes care of things that cannot be converted to numeric. Check the examples in `help("unfactor")` @Selrac 2016-09-28 16:35:49

Error: could not find function "unfactor"

@Selrac I've mentioned that this function is available in the varhandle package, meaning you should load the package (`library("varhandle")`) first (as I mentioned in the first line of my answer!!) @Selrac 2016-09-29 14:01:31

You are right. Sorry, I missed this. @Gregor 2016-11-08 20:03:37

I appreciate that your package probably has some other nice functions too, but installing a new package (and adding an external dependency to your code) isn't as nice or easy as typing `as.character(as.numeric())`.

@Gregor Adding a light dependency usually does no harm, and of course, if you are looking for the most efficient way, writing the code yourself might perform faster. But as you can also see in your comment, this is not trivial, since you put `as.numeric()` and `as.character()` in the wrong order ;) What your code chunk does is turn the factor's level indices into a character vector, so what you will have at the end is a character vector containing the numbers that were once assigned to the levels of your factor. Functions in that package are there to prevent this kind of confusion. @Joshua Ulrich 2010-08-05 19:01:13

See the Warning section of `?factor`:

In particular, `as.numeric` applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor `f` to approximately its original numeric values, `as.numeric(levels(f))[f]` is recommended and slightly more efficient than `as.numeric(as.character(f))`.

The FAQ on R has similar advice.
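A small sketch of the recommended idiom next to plain `as.numeric()`, using made-up values (`x`, `codes` and `recovered` are illustration names only):

```r
# Made-up data; the values are exactly representable so the round trip is exact.
x <- c(10, 2.5, 10, 0.25, 2.5)
f <- factor(x)

# as.numeric() on the factor returns the level codes, not the values.
codes <- as.numeric(f)

# Parse the level labels once, then index them by the factor's codes.
recovered <- as.numeric(levels(f))[f]

print(codes)
print(recovered)
```

Here `codes` is `3 2 3 1 2`, because the levels are sorted as 0.25, 2.5, 10, while `recovered` matches `x`.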

Why is `as.numeric(levels(f))[f]` more efficient than `as.numeric(as.character(f))`?

`as.numeric(as.character(f))` is effectively `as.numeric(levels(f)[f])`, so you are performing the conversion to numeric on `length(f)` values, rather than on `nlevels(f)` values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
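As a rough illustration of that point, the sketch below builds a hypothetical long factor with only three levels (`long_f` is not from the original answer) and checks that both forms agree; only the number of strings being parsed differs:

```r
# Hypothetical long factor with few levels.
long_f <- factor(rep(c("1.5", "20", "300"), times = 1000))

by_level   <- as.numeric(levels(long_f))[long_f]   # parses nlevels(long_f) strings
by_element <- as.numeric(as.character(long_f))     # parses length(long_f) strings

# Same result either way.
same_result <- identical(by_level, by_element)

print(same_result)
print(c(nlevels = nlevels(long_f), length = length(long_f)))
```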

Some timings

``````library(microbenchmark)
microbenchmark(
as.numeric(levels(f))[f],
as.numeric(levels(f)[f]),
as.numeric(as.character(f)),
paste0(x),
paste(x),
times = 1e5
)
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
`````` @Ari B. Friedman 2011-08-08 11:27:47

For timings see this answer: stackoverflow.com/questions/6979625/… @Sam 2014-04-18 00:25:03

Many thanks for your solution. Can I ask why `as.numeric(levels(f))[f]` is more precise and faster? Thanks. @Jonathan 2014-06-27 19:12:01

@Sam `as.character(f)` requires a "primitive lookup" to find the function `as.character.factor()`, which is effectively `levels(f)[f]`, so the numeric conversion then runs on every element rather than once per level. @maycca 2016-04-13 21:23:00

When I apply `as.numeric(levels(f))[f]` or `as.numeric(as.character(f))`, I get a warning message: "NAs introduced by coercion". Do you know where the problem could be? Thank you! @user08041991 2017-01-31 12:25:15

@maycca Did you overcome this issue? @maycca 2017-01-31 13:26:29

@user08041991, no sorry, I don't think so... @djhurio 2015-10-09 12:34:35

It is possible only in the case when the factor labels match the original values. I will explain it with an example.

Assume the data is vector `x`:

``````x <- c(20, 10, 30, 20, 10, 40, 10, 40)
``````

Now I will create a factor with four labels:

``````f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))
``````

1) `x` has type double, while `f` has type integer. This is the first unavoidable loss of information: factors are always stored as integers.

> typeof(x)
[1] "double"
> typeof(f)
[1] "integer"
``````

2) It is not possible to revert back to the original values (10, 20, 30, 40) having only `f` available. We can see that `f` holds only integer values 1, 2, 3, 4 and two attributes - the list of labels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.

``````> str(f)
Factor w/ 4 levels "A","B","C","D": 2 1 3 2 1 4 1 4
> attributes(f)
$levels
[1] "A" "B" "C" "D"

$class
[1] "factor"
``````

To revert back to the original values we have to know the values of levels used in creating the factor. In this case `c(10, 20, 30, 40)`. If we know the original levels (in correct order), we can revert back to the original values.

``````> orig_levels <- c(10, 20, 30, 40)
> x1 <- orig_levels[f]
> all.equal(x, x1)
[1] TRUE
``````

This works only when labels have been defined for all possible values in the original data.
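A short sketch of that limitation, assuming a hypothetical input `x_extra` that contains a value (50) with no matching level. That value becomes `NA` when the factor is created and cannot be recovered afterwards:

```r
# Hypothetical data: 50 is not among the levels used to build the factor.
x_extra <- c(20, 10, 50)
f_extra <- factor(x_extra, levels = c(10, 20, 30, 40),
                  labels = c("A", "B", "C", "D"))

# Same lookup as above; the unmatched value stays NA.
orig_levels <- c(10, 20, 30, 40)
x_back <- orig_levels[f_extra]

print(f_extra)
print(x_back)
```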

So if you will need the original values, you have to keep them. Otherwise there is a high chance it will not be possible to get back to them only from a factor. @Jealie 2014-03-27 23:39:05

R has a number of (undocumented) convenience functions for converting factors:

• `as.character.factor`
• `as.data.frame.factor`
• `as.Date.factor`
• `as.list.factor`
• `as.vector.factor`
• ...

But annoyingly, there is nothing to handle the factor -> numeric conversion. As an extension of Joshua Ulrich's answer, I would suggest overcoming this omission by defining your own idiomatic function:

``````as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
``````

that you can store at the beginning of your script, or even better in your `.Rprofile` file. @Joshua Ulrich 2014-04-18 12:03:14

There's nothing to handle the factor-to-integer (or numeric) conversion because it's expected that `as.integer(factor)` returns the underlying integer codes (as shown in the examples section of `?factor`). It's probably okay to define this function in your global environment, but you might cause problems if you actually register it as an S3 method. @Jealie 2014-04-18 20:11:04

That's a good point and I agree: a complete redefinition of the factor->numeric conversion is likely to mess up a lot of things. I found myself writing the cumbersome `factor->numeric` conversion a lot before realizing that it is in fact a shortcoming of R: some convenience function should be available... Calling it `as.numeric.factor` makes sense to me, but YMMV. @Joshua Ulrich 2014-04-18 22:44:21

If you find yourself doing that a lot, then you should do something upstream to avoid it altogether. @jO. 2014-08-08 07:56:02

as.numeric.factor returns NA? @Jealie 2014-08-08 14:43:53

@jO.: in the cases where you used something like `v=NA;as.numeric.factor(v)` or `v='something';as.numeric.factor(v)`, then it should, otherwise you have a weird thing going on somewhere. @TheSciGuy 2019-04-09 18:16:41

Works great! Nicely done.
