By Wencheng Lau-Medrano


2019-05-15 16:09:41 8 Comments

I have a list of names which looks like this:

c("xxxxxx xx",             "xxx yyy xxxxx",       "xxx yy xxxxxx", 
  "xxxxxxx yyyyyyy xxxxx", "xxxx xxxx",           "xxx yyyyyy xxx", 
  "xxxxx yyyyy xxxxxxxx",  "xxx yyyyyyyy xxxx",   "xx xxx", 
  "xxxxx yyyyy xxxxx",     "xxxx yy xxxxxx",      "xxxxx yyyy xxx", 
  "xxxxxxx yy xxxxx",      "xxxxx yyyyyyy xxxxx", "xxxx yyyy xxxxxx", 
  "xxxxx yyyy xxxxx",      "xxxxxxxx  xxxxx",     "xxxxxx yyyyyyyy xxxxx", 
  "xxxxxx yy xxxxx",       "xxx yyyy xxxxxx")

I need to extract (index) all the names that contain a word of 4-6 letters.

I know that I could split each string, count the characters of each word with nchar, and then index the strings that have a word with a length between 2 and 4. But is there a way to do that in a single line using regular expressions?

The expected output must be a vector, either numeric:

[1]  1  2  3  5  6  8  9 11 12 13 15 16 20

or logical:

[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE 
[11] TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE

1 comments

@kath 2019-05-15 16:21:49

Base R
You can use grepl

grepl("\\b\\w{4,6}\\b", my.text)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

stringr
You can use stringr's str_detect with

library(stringr)
str_detect(my.text, "\\b\\w{4,6}\\b")
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

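In both versions the key point is the regular expression, which matches words of length 4 to 6. \\b indicates a word boundary. \\w matches any word character (letters, digits and the underscore; [A-Za-z0-9_] for ASCII text). If you only want to match letters, you can use [A-Za-z] or [[:alpha:]] instead of \\w. Note that [A-z] is not equivalent to [A-Za-z]: it also matches the punctuation characters that sit between Z and a in ASCII, such as [, ], ^ and _.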
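As a rough sketch of the letters-only variant mentioned above, and of how to get numeric indices rather than a logical vector: the names has_word, idx and idx2 are only illustrative, and my.text is repeated from the Data section so that the snippet runs on its own.

```r
# Same sample data as in the Data section
my.text <- c("xxxxxx xx", "xxx yyy xxxxx", "xxx yy xxxxxx", "xxxxxxx yyyyyyy xxxxx",
             "xxxx xxxx", "xxx yyyyyy xxx", "xxxxx yyyyy xxxxxxxx", "xxx yyyyyyyy xxxx", "xx xxx")

# Letters-only pattern: [[:alpha:]] instead of \\w
pattern <- "\\b[[:alpha:]]{4,6}\\b"

# Logical vector: TRUE where a string contains a word of 4 to 6 letters
has_word <- grepl(pattern, my.text)

# Numeric indices: either wrap the logical vector in which() ...
idx <- which(has_word)
# ... or call grep(), which returns the indices directly
idx2 <- grep(pattern, my.text)

# \\w also accepts digits and underscores, [[:alpha:]] does not
grepl("\\b\\w{4,6}\\b", "ab12")   # TRUE
grepl(pattern, "ab12")            # FALSE
```

On this sample data both patterns give the same result, because every word consists of letters only; they differ only once digits or underscores appear inside a word.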

Data

my.text <- c("xxxxxx xx", "xxx yyy xxxxx", "xxx yy xxxxxx", "xxxxxxx yyyyyyy xxxxx", 
             "xxxx xxxx", "xxx yyyyyy xxx", "xxxxx yyyyy xxxxxxxx","xxx yyyyyyyy xxxx", "xx xxx")
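For comparison, the split-and-count approach described in the question can be sketched as follows. This is a minimal illustration, assuming words are separated by one or more spaces; word_lengths and has_word_split are hypothetical names.

```r
# Same sample data as above
my.text <- c("xxxxxx xx", "xxx yyy xxxxx", "xxx yy xxxxxx", "xxxxxxx yyyyyyy xxxxx",
             "xxxx xxxx", "xxx yyyyyy xxx", "xxxxx yyyyy xxxxxxxx", "xxx yyyyyyyy xxxx", "xx xxx")

# Split every string on runs of spaces and count the characters of each word
word_lengths <- lapply(strsplit(my.text, " +"), nchar)

# TRUE if at least one word in the string has between 4 and 6 characters
has_word_split <- vapply(word_lengths, function(n) any(n >= 4 & n <= 6), logical(1))

# Cross-check against the regular-expression version
has_word_regex <- grepl("\\b\\w{4,6}\\b", my.text)
identical(has_word_split, has_word_regex)
```

The regular expression is shorter, but the split version makes the length bounds explicit and is easy to adapt, for example by changing the limits in the any() condition.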

@Felix T. 2019-05-15 16:25:27

I don't think this will totally work, because it'll return TRUE for words that are longer than 6 characters.

@kath 2019-05-15 16:25:55

You're right... thanks I'll fix this!
