By Wencheng Lau-Medrano


2019-05-15 16:09:41 8 Comments

I have a list of names which looks like this:

c("xxxxxx xx",             "xxx yyy xxxxx",       "xxx yy xxxxxx", 
  "xxxxxxx yyyyyyy xxxxx", "xxxx xxxx",           "xxx yyyyyy xxx", 
  "xxxxx yyyyy xxxxxxxx",  "xxx yyyyyyyy xxxx",   "xx xxx", 
  "xxxxx yyyyy xxxxx",     "xxxx yy xxxxxx",      "xxxxx yyyy xxx", 
  "xxxxxxx yy xxxxx",      "xxxxx yyyyyyy xxxxx", "xxxx yyyy xxxxxx", 
  "xxxxx yyyy xxxxx",      "xxxxxxxx  xxxxx",     "xxxxxx yyyyyyyy xxxxx", 
  "xxxxxx yy xxxxx",       "xxx yyyy xxxxxx")

I need to extract (index) all the names that contain a word of 4-6 letters.

I know that I could split each string, count the characters of each word with nchar, and then index the strings that contain a word with a length between 4 and 6. But is there a way to do that in a single line using regular expressions?
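For reference, the split-and-count approach mentioned above might look like the sketch below. The vector `names_vec` and the result `has_mid_word` are hypothetical names used only for illustration, and the 4-6 range is taken from the stated requirement.

```r
# Hypothetical example data (not the original names).
names_vec <- c("ab cde", "abcd ef", "abcdefgh ij", "abc defghi")

# Split each string on whitespace, count the characters of every word,
# and flag strings that contain at least one word of 4 to 6 characters.
has_mid_word <- vapply(
  strsplit(names_vec, "[[:space:]]+"),
  function(words) any(nchar(words) >= 4 & nchar(words) <= 6),
  logical(1)
)

has_mid_word         # logical index
which(has_mid_word)  # numeric index
```

This works, but it takes several steps, which is why a single regular expression would be preferable.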

The expected output must be a vector, either numeric:

[1]  1  2  3  5  6  8  9 11 12 13 15 16 20

Or logical:

 [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
[11]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE

1 comment

@kath 2019-05-15 16:21:49

Base R
You can use grepl

grepl("\\b\\w{4,6}\\b", my.text)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

stringr
You can use stringr's str_detect with

library(stringr)
str_detect(my.text, "\\b\\w{4,6}\\b")
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

In both versions, the key point is the regular expression, which matches words of length 4 to 6. \\b indicates a word boundary. \\w matches any word character, i.e. [A-Za-z0-9_]. If you only want to match letters, you can use [A-Za-z] or [[:alpha:]] instead of \\w.
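To illustrate the difference between \\w and a letters-only class, here is a small sketch. The vector `x` is made-up example data, not part of the original question.

```r
# Made-up strings: one with digits, one plain, one with underscores, one too short.
x <- c("ab12", "abcd", "a_b_c", "abc")

# \\w also accepts digits and underscores, so "ab12" and "a_b_c" count as
# words of length 4 to 6, just like "abcd".
with_w <- grepl("\\b\\w{4,6}\\b", x)

# [[:alpha:]] accepts letters only, so just "abcd" qualifies.
with_alpha <- grepl("\\b[[:alpha:]]{4,6}\\b", x)

with_w
with_alpha
```

Which class is appropriate depends on whether digits and underscores should count as part of a word in the data at hand.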

Data

my.text <- c("xxxxxx xx", "xxx yyy xxxxx", "xxx yy xxxxxx", "xxxxxxx yyyyyyy xxxxx", 
             "xxxx xxxx", "xxx yyyyyy xxx", "xxxxx yyyyy xxxxxxxx","xxx yyyyyyyy xxxx", "xx xxx")
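
If a numeric index is wanted rather than a logical one, wrapping the call in which() is one option. A minimal sketch using the data above (the name `idx` is only for illustration):

```r
my.text <- c("xxxxxx xx", "xxx yyy xxxxx", "xxx yy xxxxxx", "xxxxxxx yyyyyyy xxxxx",
             "xxxx xxxx", "xxx yyyyyy xxx", "xxxxx yyyyy xxxxxxxx", "xxx yyyyyyyy xxxx", "xx xxx")

# which() converts the logical vector from grepl() into integer positions.
idx <- which(grepl("\\b\\w{4,6}\\b", my.text))

idx           # positions of the matching strings
my.text[idx]  # the matching strings themselves
```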

@Felix T. 2019-05-15 16:25:27

I don't think this will totally work, because it'll return TRUE for words that are longer than 6 characters.
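
A quick check may help with the concern above. In this sketch (with made-up strings), a word longer than 6 characters does not match on its own, but a string is still flagged when it also contains a word of 4 to 6 characters. If strings containing any longer word should be excluded, which is an assumption about the intended behaviour, a second pattern can filter them out.

```r
# Made-up strings: only a long word, a long word plus a 4-letter word, only a 4-letter word.
x <- c("xxxxxxx", "xxxxxxx xxxx", "xxxx")

# TRUE when the string contains at least one word of 4 to 6 characters.
has_mid <- grepl("\\b\\w{4,6}\\b", x)

# TRUE when the string contains at least one word of 7 or more characters.
has_long <- grepl("\\b\\w{7,}\\b", x)

has_mid              # the pattern from the answer
has_mid & !has_long  # stricter variant: no word longer than 6 allowed
```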
