
1 Introduction

A string is a sequence of characters usually surrounded by quotation marks (for example in R by " " or ' '). The stringr package provides functions for working with character strings and in this tutorial I’ll show you how to use it in practice. As in my other tutorials I’m using the starwars dataset from the dplyr package. So let’s load both packages.

library(stringr)
library(dplyr)

In this tutorial we will focus on the <chr> columns of the dataset.

starwars %>%
  print()

## # A tibble: 87 x 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk~    172    77 blond      fair       blue            19   male  mascu~
##  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu~
##  3 R2-D2        96    32 <NA>       white, bl~ red             33   none  mascu~
##  4 Darth V~    202   136 none       white      yellow          41.9 male  mascu~
##  5 Leia Or~    150    49 brown      light      brown           19   fema~ femin~
##  6 Owen La~    178   120 brown, gr~ light      blue            52   male  mascu~
##  7 Beru Wh~    165    75 brown      light      blue            47   fema~ femin~
##  8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu~
##  9 Biggs D~    183    84 black      light      brown           24   male  mascu~
## 10 Obi-Wan~    182    77 auburn, w~ fair       blue-gray       57   male  mascu~
## # ... with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>

2 Detect matches

This first set of functions is used to detect matches in a string. As one input we need a so-called pattern, which the functions of the stringr package interpret as a regular expression by default. We’ll learn more about regular expressions in section 8. Until then, simply think of the pattern = argument as a (sub)string that is searched for in the input given to string =.

str_detect()

With this function we can detect the presence of a pattern match in a string. It is equivalent to Base R’s grepl() function. Below I am looking for the pattern = Skywalker in the name column of the starwars dataset.

str_detect(string = starwars$name, pattern = "Skywalker")

##  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE

This function returns a logical vector containing TRUE where the pattern was matched and FALSE where it wasn’t.

We can also use the function with dplyr::mutate() and dplyr::filter() to get a more meaningful output. I’m storing the output of function str_detect() in a new column match and then subset all rows where TRUE was returned.

starwars %>%
  mutate(match = str_detect(string = name, pattern = "Skywalker")) %>%
  select(name, match) %>%
  filter(match == TRUE)

## # A tibble: 3 x 2
##   name             match
##   <chr>            <lgl>
## 1 Luke Skywalker   TRUE 
## 2 Anakin Skywalker TRUE 
## 3 Shmi Skywalker   TRUE
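
Because str_detect() returns a logical vector, it can also be used directly for subsetting, without a helper column. Here is a minimal sketch on a small made-up vector (jedi is only an illustration and not part of the starwars dataset). The negate = argument assumes stringr 1.4.0 or later.

```r
library(stringr)

# Small illustrative vector (an assumption, not taken from the dataset)
jedi <- c("Luke Skywalker", "Yoda", "Anakin Skywalker", "Mace Windu")

# Keep the elements that contain the pattern
with_sky <- jedi[str_detect(jedi, "Skywalker")]

# negate = TRUE flips the result, so the non-matches are kept
without_sky <- jedi[str_detect(jedi, "Skywalker", negate = TRUE)]

with_sky
without_sky
```

The same logical vector works inside dplyr::filter(), so the intermediate match column is optional.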

str_starts() / str_ends()

These two functions are used to detect the presence of a pattern match at the beginning or end of a string. Let’s have a look at which characters’ names start with the letter L…

starwars %>%
  mutate(match = str_starts(string = name, pattern = "L")) %>%
  select(name, match) %>%
  filter(match == TRUE)

## # A tibble: 6 x 2
##   name             match
##   <chr>            <lgl>
## 1 Luke Skywalker   TRUE 
## 2 Leia Organa      TRUE 
## 3 Lando Calrissian TRUE 
## 4 Lobot            TRUE 
## 5 Luminara Unduli  TRUE 
## 6 Lama Su          TRUE

… or ends with the letter e.

starwars %>%
  mutate(match = str_ends(string = name, pattern = "e")) %>%
  select(name, match) %>%
  filter(match == TRUE)

## # A tibble: 5 x 2
##   name                  match
##   <chr>                 <lgl>
## 1 Jabba Desilijic Tiure TRUE 
## 2 Palpatine             TRUE 
## 3 Barriss Offee         TRUE 
## 4 Taun We               TRUE 
## 5 Sly Moore             TRUE

Note that these functions refer to the beginning or end of the whole string, not of the individual words in the string!
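
If you do want to match the start of any word, one option is a word boundary in the pattern. This is only a sketch: the \\b anchor is a regular expression feature, which is covered in more detail later.

```r
library(stringr)

x <- "Luke Skywalker"

# "S" is not the first character of the whole string
starts_whole <- str_starts(x, "S")

# \\b marks a word boundary, so this finds an "S" at the start of any word
starts_word <- str_detect(x, "\\bS")

starts_whole
starts_word
```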

str_count()

Next let us count the number of matches in a string with function str_count(). Below I’m counting the occurrences of the pattern ke in the strings of column name.

starwars %>%
  mutate(count = str_count(string = name, pattern = "ke")) %>%
  select(name, count) %>%
  filter(count > 0)

## # A tibble: 4 x 2
##   name                  count
##   <chr>                 <int>
## 1 Luke Skywalker            2
## 2 Anakin Skywalker          1
## 3 Wicket Systri Warrick     1
## 4 Shmi Skywalker            1

str_which()

The function str_which() is used to find the indexes of strings that contain a pattern match. It is equivalent to Base R’s grep() function. Let’s have a look which strings contain the pattern Skywalker.

str_which(string = starwars$name, pattern = "Skywalker")

## [1]  1 11 41

The function returns 1, 11 and 41 which correspond to the row numbers in the starwars dataset.

starwars %>%
  select(name) %>%
  slice(1, 11, 41)

## # A tibble: 3 x 1
##   name            
##   <chr>           
## 1 Luke Skywalker  
## 2 Anakin Skywalker
## 3 Shmi Skywalker

The indexes might be used to subset the starwars dataset in a following step. However, note that str_which() is a poor fit for dplyr::mutate(): it returns one index per matching string rather than one value per row, so its output usually does not have the length of the column.
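
The following sketch, again on a small made-up vector, shows how the indexes relate to str_detect() and how they can be used for subsetting.

```r
library(stringr)

# Illustrative vector (an assumption, not taken from the dataset)
x <- c("Luke Skywalker", "C-3PO", "Anakin Skywalker")

# Positions of the elements that contain the pattern
idx <- str_which(x, "Skywalker")

# The same positions can be derived from the logical vector of str_detect()
same <- identical(idx, which(str_detect(x, "Skywalker")))

# Use the indexes to subset the original vector
matched <- x[idx]

idx
matched
```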

str_locate()

With this function the positions of pattern matches in a string can be located. It takes a character vector as input and returns a matrix with one row per string, where one column denotes the start position and the other the end position of the first match. Below I’m locating the string walk in the name column of the starwars dataset.

str_locate(string = starwars$name, pattern = "walk")[c(1,11,41), ]

##      start end
## [1,]     9  12
## [2,]    11  14
## [3,]     9  12

Note that I’m using [ here to subset the matrix by the rows we’ve found with str_which().

There is a related function called str_locate_all() which returns a list instead of a matrix.

str_locate_all(string = starwars$name, pattern = "walk")[c(1,11,41)]

## [[1]]
##      start end
## [1,]     9  12
## 
## [[2]]
##      start end
## [1,]    11  14
## 
## [[3]]
##      start end
## [1,]     9  12
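
The difference between the two functions becomes visible when a string contains the pattern more than once. Here is a small sketch with a single name:

```r
library(stringr)

x <- "Jar Jar Binks"

# str_locate() reports only the first match
first_match <- str_locate(x, "Jar")

# str_locate_all() reports every match, one row per match
all_matches <- str_locate_all(x, "Jar")[[1]]

first_match
all_matches
```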

3 Subset strings

Next we’ll take a look at functions that are used to subset a string.

str_sub()

This is the most basic function in this section and extracts substrings from a character vector based on position. Below I create a new column name_sub which is a substring of the strings of column name that start = at position 1 and end = at position 4.

starwars %>%
  mutate(name_sub = str_sub(string = name, start = 1, end = 4)) %>%
  select(name, name_sub)

## # A tibble: 87 x 2
##    name               name_sub
##    <chr>              <chr>   
##  1 Luke Skywalker     Luke    
##  2 C-3PO              C-3P    
##  3 R2-D2              R2-D    
##  4 Darth Vader        Dart    
##  5 Leia Organa        Leia    
##  6 Owen Lars          Owen    
##  7 Beru Whitesun lars Beru    
##  8 R5-D4              R5-D    
##  9 Biggs Darklighter  Bigg    
## 10 Obi-Wan Kenobi     Obi-    
## # ... with 77 more rows

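You may also use negative numbers, which count backwards from the end of the string. With end = -3 each substring stops at the third-last character, so the last two characters are dropped.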

starwars %>%
  mutate(name_sub = str_sub(string = name, start = 1, end = -3)) %>%
  select(name, name_sub)

## # A tibble: 87 x 2
##    name               name_sub        
##    <chr>              <chr>           
##  1 Luke Skywalker     Luke Skywalk    
##  2 C-3PO              C-3             
##  3 R2-D2              R2-             
##  4 Darth Vader        Darth Vad       
##  5 Leia Organa        Leia Orga       
##  6 Owen Lars          Owen La         
##  7 Beru Whitesun lars Beru Whitesun la
##  8 R5-D4              R5-             
##  9 Biggs Darklighter  Biggs Darklight 
## 10 Obi-Wan Kenobi     Obi-Wan Keno    
## # ... with 77 more rows
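
Negative positions work for start = as well. A short sketch on two names:

```r
library(stringr)

x <- c("Luke Skywalker", "Darth Vader")

# From the sixth-last character to the end (end defaults to the last character)
last_six <- str_sub(x, start = -6)

# Drop the first and the last character
inner <- str_sub(x, start = 2, end = -2)

last_six
inner
```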

str_subset()

This function returns only the strings that contain a pattern match. It is a wrapper around the str_detect() function from the previous section and also equivalent to Base R’s grep(pattern, x, value = TRUE).

str_subset(string = starwars$name, pattern = "Sky")

## [1] "Luke Skywalker"   "Anakin Skywalker" "Shmi Skywalker"

Here only the elements of the vector that contain the pattern Sky are returned.

str_extract()

Suppose we want to extract digits that are contained in the string of a Star Wars character’s name. The function str_extract() returns the first pattern match found in each string, as a vector.

str_extract(string = starwars$name, pattern = "[:digit:]")[1:5]

## [1] NA  "3" "2" NA  NA

There is also the function str_extract_all() which returns every pattern match as a list, for example both digits of R2-D2.

str_extract_all(string = starwars$name, pattern = "[:digit:]")[1:5]

## [[1]]
## character(0)
## 
## [[2]]
## [1] "3"
## 
## [[3]]
## [1] "2" "2"
## 
## [[4]]
## character(0)
## 
## [[5]]
## character(0)

The pattern [:digit:] is a predefined character class that matches any single digit from 0 to 9.
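
To pull out whole numbers instead of single digits, a quantifier can be added to the pattern. The sketch below uses \\d+, an alternative spelling for one or more digits, on a small made-up vector.

```r
library(stringr)

# Illustrative vector (an assumption, not taken from the dataset)
droids <- c("R4-P17", "IG-88", "Yoda")

# First run of digits in each string, NA if there is none
first_run <- str_extract(droids, "\\d+")

# Every run of digits in each string, as a list
all_runs <- str_extract_all(droids, "\\d+")

first_run
all_runs
```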

str_match()

The function str_match() returns the first pattern match found in each string as a matrix with a column for each ( ) group in a pattern. In the argument pattern = it is possible to group different patterns together inside multiple ( ). Each pattern inside a pair of parentheses is matched and returned in its own column. Below I separate the pattern walker into the groups wal and ker.

str_match(string = starwars$name, pattern = "(wal)(ker)")[c(1,11,41),]

##      [,1]     [,2]  [,3] 
## [1,] "walker" "wal" "ker"
## [2,] "walker" "wal" "ker"
## [3,] "walker" "wal" "ker"

The first column displays the complete match. The second and third columns show the individual groups.

There is also the function str_match_all(), which returns a list instead of a matrix.

str_match_all(string = starwars$name, pattern = "(wal)(ker)")[c(1,11,41)]

## [[1]]
##      [,1]     [,2]  [,3] 
## [1,] "walker" "wal" "ker"
## 
## [[2]]
##      [,1]     [,2]  [,3] 
## [1,] "walker" "wal" "ker"
## 
## [[3]]
##      [,1]     [,2]  [,3] 
## [1,] "walker" "wal" "ker"
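
Groups become more useful when they capture variable text. As a sketch, the pattern below (my own choice for illustration) captures two words separated by a space; strings without a match yield a row of NA.

```r
library(stringr)

x <- c("Luke Skywalker", "Yoda")

# \\w+ matches a run of word characters; each ( ) group gets its own column
m <- str_match(x, "(\\w+) (\\w+)")

first_name <- m[, 2]
last_name <- m[, 3]

m
```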

4 Manage lengths

Sometimes it may become necessary for you to know and manipulate the length of the strings in your dataset. The following functions will take care of this.

str_length()

The function str_length() returns the length of a string, that is the number of code points, which is generally equal to the number of characters.

starwars %>%
  mutate(name_length = str_length(string = name)) %>%
  select(name, name_length)

## # A tibble: 87 x 2
##    name               name_length
##    <chr>                    <int>
##  1 Luke Skywalker              14
##  2 C-3PO                        5
##  3 R2-D2                        5
##  4 Darth Vader                 11
##  5 Leia Organa                 11
##  6 Owen Lars                    9
##  7 Beru Whitesun lars          18
##  8 R5-D4                        5
##  9 Biggs Darklighter           17
## 10 Obi-Wan Kenobi              14
## # ... with 77 more rows

Note that blank spaces also count towards the length of the string.

str_pad()

With str_pad() you can pad strings to a minimum width. Below I’m setting the minimum width = of the name column to 20 characters and adding white spaces with pad = to the right side = of each string.

str_pad(string = starwars$name, width = 20, side = "right", pad = " ")

##  [1] "Luke Skywalker      "  "C-3PO               "  "R2-D2               " 
##  [4] "Darth Vader         "  "Leia Organa         "  "Owen Lars           " 
##  [7] "Beru Whitesun lars  "  "R5-D4               "  "Biggs Darklighter   " 
## [10] "Obi-Wan Kenobi      "  "Anakin Skywalker    "  "Wilhuff Tarkin      " 
## [13] "Chewbacca           "  "Han Solo            "  "Greedo              " 
## [16] "Jabba Desilijic Tiure" "Wedge Antilles      "  "Jek Tono Porkins    " 
## [19] "Yoda                "  "Palpatine           "  "Boba Fett           " 
## [22] "IG-88               "  "Bossk               "  "Lando Calrissian    " 
## [25] "Lobot               "  "Ackbar              "  "Mon Mothma          " 
## [28] "Arvel Crynyd        "  "Wicket Systri Warrick" "Nien Nunb           " 
## [31] "Qui-Gon Jinn        "  "Nute Gunray         "  "Finis Valorum       " 
## [34] "Jar Jar Binks       "  "Roos Tarpals        "  "Rugor Nass          " 
## [37] "Ric Olié            "  "Watto               "  "Sebulba             " 
## [40] "Quarsh Panaka       "  "Shmi Skywalker      "  "Darth Maul          " 
## [43] "Bib Fortuna         "  "Ayla Secura         "  "Dud Bolt            " 
## [46] "Gasgano             "  "Ben Quadinaros      "  "Mace Windu          " 
## [49] "Ki-Adi-Mundi        "  "Kit Fisto           "  "Eeth Koth           " 
## [52] "Adi Gallia          "  "Saesee Tiin         "  "Yarael Poof         " 
## [55] "Plo Koon            "  "Mas Amedda          "  "Gregar Typho        " 
## [58] "Cordé               "  "Cliegg Lars         "  "Poggle the Lesser   " 
## [61] "Luminara Unduli     "  "Barriss Offee       "  "Dormé               " 
## [64] "Dooku               "  "Bail Prestor Organa "  "Jango Fett          " 
## [67] "Zam Wesell          "  "Dexter Jettster     "  "Lama Su             " 
## [70] "Taun We             "  "Jocasta Nu          "  "Ratts Tyerell       " 
## [73] "R4-P17              "  "Wat Tambor          "  "San Hill            " 
## [76] "Shaak Ti            "  "Grievous            "  "Tarfful             " 
## [79] "Raymus Antilles     "  "Sly Moore           "  "Tion Medon          " 
## [82] "Finn                "  "Rey                 "  "Poe Dameron         " 
## [85] "BB8                 "  "Captain Phasma      "  "Padmé Amidala       "

Whenever the string has fewer than 20 characters blank spaces are added until the minimum width is reached. Strings that are already longer, such as Jabba Desilijic Tiure, are left unchanged.
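
Padding is not limited to blank spaces. A common use is adding leading zeros, sketched here with a made-up vector of ids:

```r
library(stringr)

# Illustrative ids (an assumption, not taken from the dataset)
ids <- c("7", "42", "1138", "12345")

# Pad on the left with zeros up to a minimum width of 4
padded <- str_pad(ids, width = 4, side = "left", pad = "0")

padded
```

The last id already exceeds the width and therefore comes back unchanged.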

str_trunc()

Instead of padding we can also truncate a string with function str_trunc(). Again we have to set the desired width = of the truncated string, which includes the replacement for the excess content given with argument ellipsis =.

str_trunc(string = starwars$name, width = 7, side = "right", ellipsis = "...")

##  [1] "Luke..." "C-3PO"   "R2-D2"   "Dart..." "Leia..." "Owen..." "Beru..."
##  [8] "R5-D4"   "Bigg..." "Obi-..." "Anak..." "Wilh..." "Chew..." "Han ..."
## [15] "Greedo"  "Jabb..." "Wedg..." "Jek ..." "Yoda"    "Palp..." "Boba..."
## [22] "IG-88"   "Bossk"   "Land..." "Lobot"   "Ackbar"  "Mon ..." "Arve..."
## [29] "Wick..." "Nien..." "Qui-..." "Nute..." "Fini..." "Jar ..." "Roos..."
## [36] "Rugo..." "Ric ..." "Watto"   "Sebulba" "Quar..." "Shmi..." "Dart..."
## [43] "Bib ..." "Ayla..." "Dud ..." "Gasgano" "Ben ..." "Mace..." "Ki-A..."
## [50] "Kit ..." "Eeth..." "Adi ..." "Saes..." "Yara..." "Plo ..." "Mas ..."
## [57] "Greg..." "Cordé"   "Clie..." "Pogg..." "Lumi..." "Barr..." "Dormé"  
## [64] "Dooku"   "Bail..." "Jang..." "Zam ..." "Dext..." "Lama Su" "Taun We"
## [71] "Joca..." "Ratt..." "R4-P17"  "Wat ..." "San ..." "Shaa..." "Grie..."
## [78] "Tarfful" "Raym..." "Sly ..." "Tion..." "Finn"    "Rey"     "Poe ..."
## [85] "BB8"     "Capt..." "Padm..."
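
The side = argument decides where the excess content is cut. Here is a small sketch comparing the three options on one name; in each case the result is 10 characters long including the ellipsis.

```r
library(stringr)

x <- "Obi-Wan Kenobi"

# Cut at the end, at the beginning, or in the middle of the string
right <- str_trunc(x, width = 10, side = "right")
left <- str_trunc(x, width = 10, side = "left")
center <- str_trunc(x, width = 10, side = "center")

c(right, left, center)
str_length(c(right, left, center))
```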

str_trim()

Often you’ll encounter unwanted whitespace in a string, which can be removed with function str_trim(). Use argument side = to indicate whether to trim the beginning and/or end of the string. Take a look at the following vector names_raw, which holds the names of Star Wars characters.

names_raw[1:20]

##  [1] "   Luke Skywalker   "  "       C-3PO        "  "       R2-D2        " 
##  [4] "    Darth Vader     "  "    Leia Organa     "  "     Owen Lars      " 
##  [7] " Beru Whitesun lars "  "       R5-D4        "  " Biggs Darklighter  " 
## [10] "   Obi-Wan Kenobi   "  "  Anakin Skywalker  "  "   Wilhuff Tarkin   " 
## [13] "     Chewbacca      "  "      Han Solo      "  "       Greedo       " 
## [16] "Jabba Desilijic Tiure" "   Wedge Antilles   "  "  Jek Tono Porkins  " 
## [19] "        Yoda        "  "     Palpatine      "

Nearly every name has annoying whitespace at the beginning and end of the string. Now let’s use str_trim() to get rid of it.

str_trim(names_raw, side = "both")[1:20]

##  [1] "Luke Skywalker"        "C-3PO"                 "R2-D2"                
##  [4] "Darth Vader"           "Leia Organa"           "Owen Lars"            
##  [7] "Beru Whitesun lars"    "R5-D4"                 "Biggs Darklighter"    
## [10] "Obi-Wan Kenobi"        "Anakin Skywalker"      "Wilhuff Tarkin"       
## [13] "Chewbacca"             "Han Solo"              "Greedo"               
## [16] "Jabba Desilijic Tiure" "Wedge Antilles"        "Jek Tono Porkins"     
## [19] "Yoda"                  "Palpatine"

Note that whitespace separating the individual words of a string is not removed.
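
The creation of names_raw is not shown above. If you want to follow along, a similar vector can be built by padding both sides with str_pad(); this is my assumption and may differ from how the original vector was made. The sketch also shows the effect of trimming only one side.

```r
library(stringr)

clean <- c("Luke Skywalker", "C-3PO", "Yoda")

# Assumption: pad both sides to a width of 20 to imitate names_raw
padded <- str_pad(clean, width = 20, side = "both")

# Trim only the left side; the trailing whitespace stays
left_only <- str_trim(padded, side = "left")

# Trim both sides to recover the clean names
both <- str_trim(padded, side = "both")

padded
left_only
both
```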

str_squish()

Like str_trim(), the function str_squish() trims whitespace from each end, and in addition it collapses multiple whitespaces inside the string into a single space.

str_squish(names_raw)[1:20]

##  [1] "Luke Skywalker"        "C-3PO"                 "R2-D2"                
##  [4] "Darth Vader"           "Leia Organa"           "Owen Lars"            
##  [7] "Beru Whitesun lars"    "R5-D4"                 "Biggs Darklighter"    
## [10] "Obi-Wan Kenobi"        "Anakin Skywalker"      "Wilhuff Tarkin"       
## [13] "Chewbacca"             "Han Solo"              "Greedo"               
## [16] "Jabba Desilijic Tiure" "Wedge Antilles"        "Jek Tono Porkins"     
## [19] "Yoda"                  "Palpatine"
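
With names_raw both functions give the same result because there is no repeated whitespace inside the names. The difference shows up in a made-up string like the one below.

```r
library(stringr)

# Illustrative string with extra whitespace at the ends and in the middle
x <- "  Luke    Skywalker  "

# str_trim() only touches the ends
trimmed <- str_trim(x)

# str_squish() also reduces the inner whitespace to a single space
squished <- str_squish(x)

trimmed
squished
```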

5 Mutate strings

When mutating a string we do not only match a pattern to a string but also supply a replacement value for it.

str_sub() <-

We’ve used the function str_sub() in section 3 to extract part of a string by position. In addition we may replace that substring with a value of our choice. Below I’m using the replacement form of str_sub() with dplyr::mutate() to create a new column name_sub which contains the Star Wars characters’ names with the first four characters replaced.

starwars %>%
  mutate(name_sub = `str_sub<-`(name, start = 1, end = 4, value = "replacement")) %>%
  select(name, name_sub)

## # A tibble: 87 x 2
##    name               name_sub                 
##    <chr>              <chr>                    
##  1 Luke Skywalker     replacement Skywalker    
##  2 C-3PO              replacementO             
##  3 R2-D2              replacement2             
##  4 Darth Vader        replacementh Vader       
##  5 Leia Organa        replacement Organa       
##  6 Owen Lars          replacement Lars         
##  7 Beru Whitesun lars replacement Whitesun lars
##  8 R5-D4              replacement4             
##  9 Biggs Darklighter  replacements Darklighter 
## 10 Obi-Wan Kenobi     replacementWan Kenobi    
## # ... with 77 more rows

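Note that replacement functions like str_sub() have to be called differently inside dplyr::mutate() because of the assignment operator <-: the function name is wrapped in backticks as `str_sub<-` and the new text is passed with the argument value =.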
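
Outside of dplyr::mutate() the usual assignment form can be used. Here is a short sketch on a plain vector:

```r
library(stringr)

x <- c("Luke Skywalker", "Darth Vader")

# Replace the first four characters of each string in place
str_sub(x, start = 1, end = 4) <- "replacement"

x
```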

str_replace()

With this function the first matched pattern in each string is replaced. Below I’m replacing the letter e with letter x.

starwars %>%
  mutate(name_new = str_replace(string = name, pattern = "e", replacement = "x")) %>%
  select(name, name_new)

## # A tibble: 87 x 2
##    name               name_new          
##    <chr>              <chr>             
##  1 Luke Skywalker     Lukx Skywalker    
##  2 C-3PO              C-3PO             
##  3 R2-D2              R2-D2             
##  4 Darth Vader        Darth Vadxr       
##  5 Leia Organa        Lxia Organa       
##  6 Owen Lars          Owxn Lars         
##  7 Beru Whitesun lars Bxru Whitesun lars
##  8 R5-D4              R5-D4             
##  9 Biggs Darklighter  Biggs Darklightxr 
## 10 Obi-Wan Kenobi     Obi-Wan Kxnobi    
## # ... with 77 more rows

We can also get rid of the first matched pattern of a string with str_remove(). The special character | means or, so the first occurrence of a, e, i, o or u is removed from the string - but only the first occurrence!

starwars %>%
  mutate(name_new = str_remove(string = name, pattern = "a|e|i|o|u")) %>%
  select(name, name_new)

## # A tibble: 87 x 2
##    name               name_new         
##    <chr>              <chr>            
##  1 Luke Skywalker     Lke Skywalker    
##  2 C-3PO              C-3PO            
##  3 R2-D2              R2-D2            
##  4 Darth Vader        Drth Vader       
##  5 Leia Organa        Lia Organa       
##  6 Owen Lars          Own Lars         
##  7 Beru Whitesun lars Bru Whitesun lars
##  8 R5-D4              R5-D4            
##  9 Biggs Darklighter  Bggs Darklighter 
## 10 Obi-Wan Kenobi     Ob-Wan Kenobi    
## # ... with 77 more rows

str_replace_all()

Furthermore, you can replace or remove all matched patterns in a string. Now all letters e are replaced with x instead of only the first match in each string.

starwars %>%
  mutate(name_new = str_replace_all(string = name, pattern = "e", replacement = "x")) %>%
  select(name, name_new)

## # A tibble: 87 x 2
##    name               name_new          
##    <chr>              <chr>             
##  1 Luke Skywalker     Lukx Skywalkxr    
##  2 C-3PO              C-3PO             
##  3 R2-D2              R2-D2             
##  4 Darth Vader        Darth Vadxr       
##  5 Leia Organa        Lxia Organa       
##  6 Owen Lars          Owxn Lars         
##  7 Beru Whitesun lars Bxru Whitxsun lars
##  8 R5-D4              R5-D4             
##  9 Biggs Darklighter  Biggs Darklightxr 
## 10 Obi-Wan Kenobi     Obi-Wan Kxnobi    
## # ... with 77 more rows

In the same way, all lowercase vowels are removed from each string of column name with function str_remove_all().

starwars %>%
  mutate(name_new = str_remove_all(string = name, pattern = "a|e|i|o|u")) %>%
  select(name, name_new)

## # A tibble: 87 x 2
##    name               name_new     
##    <chr>              <chr>        
##  1 Luke Skywalker     Lk Skywlkr   
##  2 C-3PO              C-3PO        
##  3 R2-D2              R2-D2        
##  4 Darth Vader        Drth Vdr     
##  5 Leia Organa        L Orgn       
##  6 Owen Lars          Own Lrs      
##  7 Beru Whitesun lars Br Whtsn lrs 
##  8 R5-D4              R5-D4        
##  9 Biggs Darklighter  Bggs Drklghtr
## 10 Obi-Wan Kenobi     Ob-Wn Knb    
## # ... with 77 more rows
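
Uppercase vowels such as the O in Obi-Wan survive because the pattern is case sensitive. One way to catch them as well is the regex() helper with ignore_case = TRUE, sketched below. The character class [aeiou] is just a shorter spelling of a|e|i|o|u.

```r
library(stringr)

x <- c("Obi-Wan Kenobi", "Owen Lars")

# Case-sensitive: only lowercase vowels are removed
lower_only <- str_remove_all(x, "[aeiou]")

# Case-insensitive: uppercase vowels are removed too
any_case <- str_remove_all(x, regex("[aeiou]", ignore_case = TRUE))

lower_only
any_case
```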

str_to_lower()

With this function you can convert strings to lower case.

starwars %>%
  mutate(name_new = str_to_lower(string = name)) %>%
  select(name, name_new)

## # A tibble: 87 x 2
##    name               name_new          
##    <chr>              <chr>             
##  1 Luke Skywalker     luke skywalker    
##  2 C-3PO              c-3po             
##  3 R2-D2              r2-d2             
##  4 Darth Vader        darth vader       
##  5 Leia Organa        leia organa       
##  6 Owen Lars          owen lars         
##  7 Beru Whitesun lars beru whitesun lars
##  8 R5-D4              r5-d4             
##  9 Biggs Darklighter  biggs darklighter 
## 10 Obi-Wan Kenobi     obi-wan kenobi    
## # ... with 77 more rows

str_to_upper()

The function str_to_upper() converts your strings to upper case.

starwars %>%
  mutate(name_new = str_to_upper(string = name)) %>%
  select(name, name_new)

## # A tibble: 87 x 2
##    name               name_new          
##    <chr>              <chr>             
##  1 Luke Skywalker     LUKE SKYWALKER    
##  2 C-3PO              C-3PO             
##  3 R2-D2              R2-D2             
##  4 Darth Vader        DARTH VADER       
##  5 Leia Organa        LEIA ORGANA       
##  6 Owen Lars          OWEN LARS         
##  7 Beru Whitesun lars BERU WHITESUN LARS
##  8 R5-D4              R5-D4             
##  9 Biggs Darklighter  BIGGS DARKLIGHTER 
## 10 Obi-Wan Kenobi     OBI-WAN KENOBI    
## # ... with 77 more rows

str_to_title()

Converting strings to title case with function str_to_title() is also possible.

starwars %>%
  mutate(name_lower = str_to_lower(string = name),
         name_new = str_to_title(string = name_lower)) %>%
  select(name_lower, name_new)

## # A tibble: 87 x 2
##    name_lower         name_new          
##    <chr>              <chr>             
##  1 luke skywalker     Luke Skywalker    
##  2 c-3po              C-3po             
##  3 r2-d2              R2-D2             
##  4 darth vader        Darth Vader       
##  5 leia organa        Leia Organa       
##  6 owen lars          Owen Lars         
##  7 beru whitesun lars Beru Whitesun Lars
##  8 r5-d4              R5-D4             
##  9 biggs darklighter  Biggs Darklighter 
## 10 obi-wan kenobi     Obi-Wan Kenobi    
## # ... with 77 more rows

There is another function called str_to_sentence() which only converts the first character of the string to upper case.

starwars %>%
  mutate(name_lower = str_to_lower(string = name),
         name_new = str_to_sentence(string = name_lower)) %>%
  select(name_lower, name_new)

## # A tibble: 87 x 2
##    name_lower         name_new          
##    <chr>              <chr>             
##  1 luke skywalker     Luke skywalker    
##  2 c-3po              C-3po             
##  3 r2-d2              R2-d2             
##  4 darth vader        Darth vader       
##  5 leia organa        Leia organa       
##  6 owen lars          Owen lars         
##  7 beru whitesun lars Beru whitesun lars
##  8 r5-d4              R5-d4             
##  9 biggs darklighter  Biggs darklighter 
## 10 obi-wan kenobi     Obi-wan kenobi    
## # ... with 77 more rows

6 Join and split

Some functions of the stringr package are used to combine different strings into a new one or to split one string into many.

str_c()

With this function we can join multiple strings into a single one. This function is very similar to Base R’s paste0() function. The argument sep = controls how the strings are combined, below for example with a :.

starwars %>%
  mutate(new_string = str_c(species, name, sep = ":")) %>%
  select(name, species, new_string)

## # A tibble: 87 x 3
##    name               species new_string              
##    <chr>              <chr>   <chr>                   
##  1 Luke Skywalker     Human   Human:Luke Skywalker    
##  2 C-3PO              Droid   Droid:C-3PO             
##  3 R2-D2              Droid   Droid:R2-D2             
##  4 Darth Vader        Human   Human:Darth Vader       
##  5 Leia Organa        Human   Human:Leia Organa       
##  6 Owen Lars          Human   Human:Owen Lars         
##  7 Beru Whitesun lars Human   Human:Beru Whitesun lars
##  8 R5-D4              Droid   Droid:R5-D4             
##  9 Biggs Darklighter  Human   Human:Biggs Darklighter 
## 10 Obi-Wan Kenobi     Human   Human:Obi-Wan Kenobi    
## # ... with 77 more rows
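
One difference to Base R is worth knowing when a column contains missing values: str_c() returns NA as soon as one of its inputs is NA, whereas paste() turns NA into the text "NA". Here is a sketch with made-up values:

```r
library(stringr)

# Illustrative values (an assumption, not taken from the dataset)
species <- c("Human", NA)
name <- c("Luke Skywalker", "Unknown Pilot")

# str_c() propagates the missing value
with_str_c <- str_c(species, name, sep = ":")

# paste() converts NA to the string "NA"
with_paste <- paste(species, name, sep = ":")

# str_replace_na() makes the conversion explicit before joining
explicit <- str_c(str_replace_na(species), name, sep = ":")

with_str_c
with_paste
explicit
```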

str_flatten()

The function str_flatten() collapses an input vector into a single string, separated by collapse =.

str_flatten(starwars$name, collapse = " ")

## [1] "Luke Skywalker C-3PO R2-D2 Darth Vader Leia Organa Owen Lars Beru Whitesun lars R5-D4 Biggs Darklighter Obi-Wan Kenobi Anakin Skywalker Wilhuff Tarkin Chewbacca Han Solo Greedo Jabba Desilijic Tiure Wedge Antilles Jek Tono Porkins Yoda Palpatine Boba Fett IG-88 Bossk Lando Calrissian Lobot Ackbar Mon Mothma Arvel Crynyd Wicket Systri Warrick Nien Nunb Qui-Gon Jinn Nute Gunray Finis Valorum Jar Jar Binks Roos Tarpals Rugor Nass Ric Olié Watto Sebulba Quarsh Panaka Shmi Skywalker Darth Maul Bib Fortuna Ayla Secura Dud Bolt Gasgano Ben Quadinaros Mace Windu Ki-Adi-Mundi Kit Fisto Eeth Koth Adi Gallia Saesee Tiin Yarael Poof Plo Koon Mas Amedda Gregar Typho Cordé Cliegg Lars Poggle the Lesser Luminara Unduli Barriss Offee Dormé Dooku Bail Prestor Organa Jango Fett Zam Wesell Dexter Jettster Lama Su Taun We Jocasta Nu Ratts Tyerell R4-P17 Wat Tambor San Hill Shaak Ti Grievous Tarfful Raymus Antilles Sly Moore Tion Medon Finn Rey Poe Dameron BB8 Captain Phasma Padmé Amidala"

There is also an optional argument collapse = to function str_c() which allows you to combine the input vector to a single string.

str_c(starwars$name, collapse = " ")

## [1] "Luke Skywalker C-3PO R2-D2 Darth Vader Leia Organa Owen Lars Beru Whitesun lars R5-D4 Biggs Darklighter Obi-Wan Kenobi Anakin Skywalker Wilhuff Tarkin Chewbacca Han Solo Greedo Jabba Desilijic Tiure Wedge Antilles Jek Tono Porkins Yoda Palpatine Boba Fett IG-88 Bossk Lando Calrissian Lobot Ackbar Mon Mothma Arvel Crynyd Wicket Systri Warrick Nien Nunb Qui-Gon Jinn Nute Gunray Finis Valorum Jar Jar Binks Roos Tarpals Rugor Nass Ric Olié Watto Sebulba Quarsh Panaka Shmi Skywalker Darth Maul Bib Fortuna Ayla Secura Dud Bolt Gasgano Ben Quadinaros Mace Windu Ki-Adi-Mundi Kit Fisto Eeth Koth Adi Gallia Saesee Tiin Yarael Poof Plo Koon Mas Amedda Gregar Typho Cordé Cliegg Lars Poggle the Lesser Luminara Unduli Barriss Offee Dormé Dooku Bail Prestor Organa Jango Fett Zam Wesell Dexter Jettster Lama Su Taun We Jocasta Nu Ratts Tyerell R4-P17 Wat Tambor San Hill Shaak Ti Grievous Tarfful Raymus Antilles Sly Moore Tion Medon Finn Rey Poe Dameron BB8 Captain Phasma Padmé Amidala"

str_dup()

With this function you can repeat each string as many times as given by times =.

str_dup(starwars$name, times = 2)[1:10]

##  [1] "Luke SkywalkerLuke Skywalker"        
##  [2] "C-3POC-3PO"                          
##  [3] "R2-D2R2-D2"                          
##  [4] "Darth VaderDarth Vader"              
##  [5] "Leia OrganaLeia Organa"              
##  [6] "Owen LarsOwen Lars"                  
##  [7] "Beru Whitesun larsBeru Whitesun lars"
##  [8] "R5-D4R5-D4"                          
##  [9] "Biggs DarklighterBiggs Darklighter"  
## [10] "Obi-Wan KenobiObi-Wan Kenobi"

str_split()

Three functions exist to split a vector of strings into substrings. The split is carried out at occurrences of a pattern match. First, I split the strings in the name column of the starwars dataset at each occurrence of a whitespace (pattern = " "). The function str_split() returns a list unless the argument simplify = TRUE is set, in which case a character matrix is returned.

str_split(string = starwars$name, pattern = " ", simplify = FALSE)[1:5]

## [[1]]
## [1] "Luke"      "Skywalker"
## 
## [[2]]
## [1] "C-3PO"
## 
## [[3]]
## [1] "R2-D2"
## 
## [[4]]
## [1] "Darth" "Vader"
## 
## [[5]]
## [1] "Leia"   "Organa"

The function str_split_fixed() always returns a matrix where the number of columns is set with argument n =. Each column holds one piece of the string, and strings with fewer than n pieces are filled up with empty strings. Below I’m creating 3 columns.

str_split_fixed(string = starwars$name, pattern = " ", n = 3)[1:10, ]

##       [,1]      [,2]          [,3]  
##  [1,] "Luke"    "Skywalker"   ""    
##  [2,] "C-3PO"   ""            ""    
##  [3,] "R2-D2"   ""            ""    
##  [4,] "Darth"   "Vader"       ""    
##  [5,] "Leia"    "Organa"      ""    
##  [6,] "Owen"    "Lars"        ""    
##  [7,] "Beru"    "Whitesun"    "lars"
##  [8,] "R5-D4"   ""            ""    
##  [9,] "Biggs"   "Darklighter" ""    
## [10,] "Obi-Wan" "Kenobi"      ""
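
When a string has more pieces than n, the split stops early and the rest of the string ends up in the last column. Here is a sketch with n = 2:

```r
library(stringr)

x <- c("Beru Whitesun lars", "Yoda")

# Only the first whitespace is used for splitting
parts <- str_split_fixed(x, pattern = " ", n = 2)

parts
```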

str_glue()

This function lets you create a string from strings and {expressions} to evaluate. The expressions inside the curly braces are ordinary R code, for example to create the following output.

str_glue("Luke Skywalker has {starwars$eye_color[1]} eyes.")

## Luke Skywalker has blue eyes.

You can access different columns of the same data frame even more easily with str_glue_data().

str_glue_data(starwars, "{name} is a {species} from {homeworld}.")[1:10]

## Luke Skywalker is a Human from Tatooine.
## C-3PO is a Droid from Tatooine.
## R2-D2 is a Droid from Naboo.
## Darth Vader is a Human from Tatooine.
## Leia Organa is a Human from Alderaan.
## Owen Lars is a Human from Tatooine.
## Beru Whitesun lars is a Human from Tatooine.
## R5-D4 is a Droid from Tatooine.
## Biggs Darklighter is a Human from Tatooine.
## Obi-Wan Kenobi is a Human from Stewjon.
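Besides looking up existing objects, str_glue() also accepts named arguments that act as temporary variables inside the braces. A small sketch with illustrative values (name and height are placeholders I made up for this example):

```r
library(stringr)

# name and height only exist inside this call
msg <- str_glue("{name} is {height} cm tall.", name = "Yoda", height = 66)
msg
```

This is handy when the values are not stored in a data frame.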

7 General functions

This set of functions won’t affect the values of a string but changes its appearance, for example through ordering, converting or formatting.

7.1 Order strings

str_order()

With this function you can order a character vector. By default (na_last = TRUE), NA values come last.

str_order(starwars$name)

##  [1] 26 52 11 28 44 65 62 85 47  7 43  9 21 23  2 86 13 59 58 42  4 68 64 63 45
## [26] 51 33 82 46 15 57 77 14 22 16 66 34 18 71 49 50 69 24  5 25  1 61 48 56 27
## [51] 30 32 10  6 87 20 55 84 60 40 31  3 73  8 72 79 83 37 35 36 53 75 39 76 41
## [76] 80 78 70 81 74 38 17 29 12 54 19 67

It returns a vector of indexes where each value represents the position of the string in the name column of the starwars dataset. For example the first row after sorting corresponds to the 26th row before sorting. Let’s check this below with dplyr::slice().

starwars %>% select(name) %>% arrange(name) %>% slice(1)

## # A tibble: 1 x 1
##   name  
##   <chr> 
## 1 Ackbar

starwars %>% select(name) %>% slice(26)

## # A tibble: 1 x 1
##   name  
##   <chr> 
## 1 Ackbar
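The placement of missing values can be changed with the argument na_last =. A minimal sketch on a made-up vector (x is not part of the starwars dataset):

```r
library(stringr)

# Hypothetical vector with one missing value
x <- c("b", NA, "a")

str_order(x)                   # default: the NA position (2) comes last
str_order(x, na_last = FALSE)  # the NA position (2) comes first
```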

str_sort()

With this function you can sort a character vector. It is similar to dplyr::arrange().

str_sort(starwars$name)

##  [1] "Ackbar"                "Adi Gallia"            "Anakin Skywalker"     
##  [4] "Arvel Crynyd"          "Ayla Secura"           "Bail Prestor Organa"  
##  [7] "Barriss Offee"         "BB8"                   "Ben Quadinaros"       
## [10] "Beru Whitesun lars"    "Bib Fortuna"           "Biggs Darklighter"    
## [13] "Boba Fett"             "Bossk"                 "C-3PO"                
## [16] "Captain Phasma"        "Chewbacca"             "Cliegg Lars"          
## [19] "Cordé"                 "Darth Maul"            "Darth Vader"          
## [22] "Dexter Jettster"       "Dooku"                 "Dormé"                
## [25] "Dud Bolt"              "Eeth Koth"             "Finis Valorum"        
## [28] "Finn"                  "Gasgano"               "Greedo"               
## [31] "Gregar Typho"          "Grievous"              "Han Solo"             
## [34] "IG-88"                 "Jabba Desilijic Tiure" "Jango Fett"           
## [37] "Jar Jar Binks"         "Jek Tono Porkins"      "Jocasta Nu"           
## [40] "Ki-Adi-Mundi"          "Kit Fisto"             "Lama Su"              
## [43] "Lando Calrissian"      "Leia Organa"           "Lobot"                
## [46] "Luke Skywalker"        "Luminara Unduli"       "Mace Windu"           
## [49] "Mas Amedda"            "Mon Mothma"            "Nien Nunb"            
## [52] "Nute Gunray"           "Obi-Wan Kenobi"        "Owen Lars"            
## [55] "Padmé Amidala"         "Palpatine"             "Plo Koon"             
## [58] "Poe Dameron"           "Poggle the Lesser"     "Quarsh Panaka"        
## [61] "Qui-Gon Jinn"          "R2-D2"                 "R4-P17"               
## [64] "R5-D4"                 "Ratts Tyerell"         "Raymus Antilles"      
## [67] "Rey"                   "Ric Olié"              "Roos Tarpals"         
## [70] "Rugor Nass"            "Saesee Tiin"           "San Hill"             
## [73] "Sebulba"               "Shaak Ti"              "Shmi Skywalker"       
## [76] "Sly Moore"             "Tarfful"               "Taun We"              
## [79] "Tion Medon"            "Wat Tambor"            "Watto"                
## [82] "Wedge Antilles"        "Wicket Systri Warrick" "Wilhuff Tarkin"       
## [85] "Yarael Poof"           "Yoda"                  "Zam Wesell"
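The two functions are closely related: indexing a vector with the result of str_order() should give the same result as str_sort(). A short sketch on a stand-in vector of my own:

```r
library(stringr)

# Hypothetical stand-in vector
x <- c("Yoda", "Ackbar", "Leia Organa")

# Ordering indexes applied to x reproduce the sorted vector
identical(x[str_order(x)], str_sort(x))

# decreasing = TRUE reverses the sort order
str_sort(x, decreasing = TRUE)
```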

numeric =

Both functions have the argument numeric =, which when set to TRUE sorts digits numerically. Compare this output…

str_sort(c("100a10", "100a5", "2b", "2a"), numeric = FALSE)

## [1] "100a10" "100a5"  "2a"     "2b"

…to the output below.

str_sort(c("100a10", "100a5", "2b", "2a"), numeric = TRUE)

## [1] "2a"     "2b"     "100a5"  "100a10"

7.2 Helpers

str_conv()

With this function we can override the encoding of a string. This is sometimes helpful when reading data from other sources into R and odd symbols appear inside the strings.

str_conv(string = starwars$name, encoding = "UTF-8")

##  [1] "Luke Skywalker"        "C-3PO"                 "R2-D2"                
##  [4] "Darth Vader"           "Leia Organa"           "Owen Lars"            
##  [7] "Beru Whitesun lars"    "R5-D4"                 "Biggs Darklighter"    
## [10] "Obi-Wan Kenobi"        "Anakin Skywalker"      "Wilhuff Tarkin"       
## [13] "Chewbacca"             "Han Solo"              "Greedo"               
## [16] "Jabba Desilijic Tiure" "Wedge Antilles"        "Jek Tono Porkins"     
## [19] "Yoda"                  "Palpatine"             "Boba Fett"            
## [22] "IG-88"                 "Bossk"                 "Lando Calrissian"     
## [25] "Lobot"                 "Ackbar"                "Mon Mothma"           
## [28] "Arvel Crynyd"          "Wicket Systri Warrick" "Nien Nunb"            
## [31] "Qui-Gon Jinn"          "Nute Gunray"           "Finis Valorum"        
## [34] "Jar Jar Binks"         "Roos Tarpals"          "Rugor Nass"           
## [37] "Ric Olié"              "Watto"                 "Sebulba"              
## [40] "Quarsh Panaka"         "Shmi Skywalker"        "Darth Maul"           
## [43] "Bib Fortuna"           "Ayla Secura"           "Dud Bolt"             
## [46] "Gasgano"               "Ben Quadinaros"        "Mace Windu"           
## [49] "Ki-Adi-Mundi"          "Kit Fisto"             "Eeth Koth"            
## [52] "Adi Gallia"            "Saesee Tiin"           "Yarael Poof"          
## [55] "Plo Koon"              "Mas Amedda"            "Gregar Typho"         
## [58] "Cordé"                 "Cliegg Lars"           "Poggle the Lesser"    
## [61] "Luminara Unduli"       "Barriss Offee"         "Dormé"                
## [64] "Dooku"                 "Bail Prestor Organa"   "Jango Fett"           
## [67] "Zam Wesell"            "Dexter Jettster"       "Lama Su"              
## [70] "Taun We"               "Jocasta Nu"            "Ratts Tyerell"        
## [73] "R4-P17"                "Wat Tambor"            "San Hill"             
## [76] "Shaak Ti"              "Grievous"              "Tarfful"              
## [79] "Raymus Antilles"       "Sly Moore"             "Tion Medon"           
## [82] "Finn"                  "Rey"                   "Poe Dameron"          
## [85] "BB8"                   "Captain Phasma"        "Padmé Amidala"

You can find more information by looking up the documentation on character encodings.

?stringi::`stringi-encoding`
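Since the names in the starwars dataset are already encoded correctly, the output above doesn’t change. The sketch below, adapted from the example in the stringr documentation, shows a case where the assumed encoding matters: the same single byte is read as two different characters.

```r
library(stringr)

# A single byte (0xB1) whose meaning depends on the assumed encoding
x <- rawToChar(as.raw(177))

str_conv(x, "ISO-8859-1")  # read as Latin-1: plus-minus sign
str_conv(x, "ISO-8859-2")  # read as Latin-2: a with ogonek
```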

str_view()

This function is super helpful as it lets you view an HTML rendering of the first regex match in each string.

str_view(string = starwars$name[1:4], pattern = "ke")

The function str_view_all() renders all pattern matches instead. (In stringr 1.5.0 and later, str_view() itself shows all matches and str_view_all() is deprecated.)

str_view_all(string = starwars$name[1:4], pattern = "ke")

str_wrap()

This function wraps a string into formatted paragraphs using the Knuth-Plass paragraph wrapping algorithm. First I generate a single string from all strings of the name column with the function str_flatten(). This string looks pretty messy when printed to the console.

str_flatten(starwars$name, collapse = " ") %>%
  print()

## [1] "Luke Skywalker C-3PO R2-D2 Darth Vader Leia Organa Owen Lars Beru Whitesun lars R5-D4 Biggs Darklighter Obi-Wan Kenobi Anakin Skywalker Wilhuff Tarkin Chewbacca Han Solo Greedo Jabba Desilijic Tiure Wedge Antilles Jek Tono Porkins Yoda Palpatine Boba Fett IG-88 Bossk Lando Calrissian Lobot Ackbar Mon Mothma Arvel Crynyd Wicket Systri Warrick Nien Nunb Qui-Gon Jinn Nute Gunray Finis Valorum Jar Jar Binks Roos Tarpals Rugor Nass Ric Olié Watto Sebulba Quarsh Panaka Shmi Skywalker Darth Maul Bib Fortuna Ayla Secura Dud Bolt Gasgano Ben Quadinaros Mace Windu Ki-Adi-Mundi Kit Fisto Eeth Koth Adi Gallia Saesee Tiin Yarael Poof Plo Koon Mas Amedda Gregar Typho Cordé Cliegg Lars Poggle the Lesser Luminara Unduli Barriss Offee Dormé Dooku Bail Prestor Organa Jango Fett Zam Wesell Dexter Jettster Lama Su Taun We Jocasta Nu Ratts Tyerell R4-P17 Wat Tambor San Hill Shaak Ti Grievous Tarfful Raymus Antilles Sly Moore Tion Medon Finn Rey Poe Dameron BB8 Captain Phasma Padmé Amidala"

Now let’s use the function str_wrap() with a width = of 80 characters per line. In addition I use cat() so that the line breaks (\n) added by str_wrap() are printed as actual new lines.

str_flatten(starwars$name, collapse = " ") %>%
  str_wrap(string = ., width = 80) %>%
  cat(sep = "\n")

## Luke Skywalker C-3PO R2-D2 Darth Vader Leia Organa Owen Lars Beru Whitesun lars
## R5-D4 Biggs Darklighter Obi-Wan Kenobi Anakin Skywalker Wilhuff Tarkin Chewbacca
## Han Solo Greedo Jabba Desilijic Tiure Wedge Antilles Jek Tono Porkins Yoda
## Palpatine Boba Fett IG-88 Bossk Lando Calrissian Lobot Ackbar Mon Mothma Arvel
## Crynyd Wicket Systri Warrick Nien Nunb Qui-Gon Jinn Nute Gunray Finis Valorum
## Jar Jar Binks Roos Tarpals Rugor Nass Ric Olié Watto Sebulba Quarsh Panaka Shmi
## Skywalker Darth Maul Bib Fortuna Ayla Secura Dud Bolt Gasgano Ben Quadinaros
## Mace Windu Ki-Adi-Mundi Kit Fisto Eeth Koth Adi Gallia Saesee Tiin Yarael Poof
## Plo Koon Mas Amedda Gregar Typho Cordé Cliegg Lars Poggle the Lesser Luminara
## Unduli Barriss Offee Dormé Dooku Bail Prestor Organa Jango Fett Zam Wesell
## Dexter Jettster Lama Su Taun We Jocasta Nu Ratts Tyerell R4-P17 Wat Tambor San
## Hill Shaak Ti Grievous Tarfful Raymus Antilles Sly Moore Tion Medon Finn Rey Poe
## Dameron BB8 Captain Phasma Padmé Amidala
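The function str_wrap() also has the arguments indent = and exdent = to indent the first line and all following lines, respectively. A minimal sketch with a shortened text of my own (txt is only a stand-in for the flattened name column):

```r
library(stringr)

# Hypothetical short text instead of the full flattened name column
txt <- "Luke Skywalker C-3PO R2-D2 Darth Vader Leia Organa Owen Lars Beru Whitesun lars"

# indent applies to the first line, exdent to all following lines
wrapped <- str_wrap(string = txt, width = 30, indent = 2, exdent = 4)
cat(wrapped, sep = "\n")
```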

8 Regular expressions

Throughout this tutorial we’ve already encountered many regular expressions when setting the match pattern inside a stringr function. In the package stringr all pattern arguments are interpreted as regular expressions by default, after any special characters in the R string have been parsed.

Below, I’ve created a modified version of the opening quote of the Star Wars movies:

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

This string looks pretty messy, especially because I inserted sequences of characters that have a special meaning. For example \n inserts the beginning of a new line. Remember that whenever a \ appears in a regular expression, you must write it as \\ in the string (pattern) that represents the regular expression. With the function writeLines() we can have a look at how R views my string after all special characters have been parsed.

writeLines(string)

## A (long) t1me ago \in a {galaxy}
## faaar, faar, far!? away...
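The same trick helps to check a pattern. As a small sketch, the regular expression \. (a literal dot) has to be typed as "\\." in R, and writeLines() shows what the regex engine actually receives:

```r
library(stringr)

# The R string "\\." represents the regular expression \.
writeLines("\\.")

str_count("away...", pattern = "\\.")  # counts only the literal dots
str_count("away...", pattern = ".")    # an unescaped . matches any character
```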

8.1 Basics

Let’s start with regular expressions used for character matching.

Match characters I

regexp	matches
a	a
\.	.
\!	!
\?	?
\\	\
\(	(
\)	)
\{	{
\}	}
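Here is a short sketch of how some of these escaped patterns could be applied to the string from above. Note that each \ of the regular expression is doubled in the R string, so a literal backslash ends up as four backslashes.

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

str_extract(string, pattern = "\\(")      # literal opening parenthesis
str_extract(string, pattern = "\\{")      # literal opening brace
str_extract(string, pattern = "\\!\\?")   # literal "!?"
str_detect(string, pattern = "\\\\")      # literal backslash
```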

Match characters II

regexp	matches
\n	new line (return)
\t	tab
\s	any whitespace
\S	any non-whitespace
\d	any digit
\D	any non-digit
\w	any word character
\W	any non-word character
\b	word boundaries
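A minimal sketch of a few of these patterns applied to the string from above (again with doubled backslashes in the R strings):

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

str_extract_all(string, pattern = "\\d")[[1]]  # all digits
str_count(string, pattern = "\\t")             # number of tabs
str_count(string, pattern = "\\n")             # number of new lines
str_extract(string, pattern = "\\bf\\w+")      # first word starting with f
```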

Match characters III

regexp	matches
[:digit:]	digits
[:alpha:]	letters
[:lower:]	lowercase letters
[:upper:]	uppercase letters
[:alnum:]	letters and numbers
[:punct:]	punctuation
[:graph:]	letters, numbers, punctuation
[:space:]	space characters (i.e. \s)
[:blank:]	space and tab (but not new line)
.	every character except a new line
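As far as I know, the stringr documentation on regular expressions places these classes inside the [] of a character class, which also allows combining several of them. A small sketch under that assumption:

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

str_extract_all(string, pattern = "[[:digit:]]")[[1]]           # digits
str_extract_all(string, pattern = "[[:upper:]]")[[1]]           # uppercase letters
str_extract_all(string, pattern = "[[:upper:][:digit:]]")[[1]]  # either of the two
```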

8.2 Extensions

There is a whole range of other regular expressions which you can use for quantification, grouping etc.

Alternates

regexp	matches
long|far	or
[star]	one of
[^star]	anything but
[a-r]	range
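A quick sketch of the alternates on the string from above. Keep in mind that str_extract() only returns the first match, while str_extract_all() returns every match.

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

str_extract_all(string, pattern = "long|far")[[1]]  # "long" or "far"
str_extract(string, pattern = "[star]")    # first character out of s, t, a, r
str_extract(string, pattern = "[^star]")   # first character that is none of them
str_extract(string, pattern = "[a-r]")     # first character in the range a to r
```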

Anchors

regexp	matches
^A	start of string
.$	end of string
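A minimal sketch of both anchors. By default they refer to the whole string, so the f at the beginning of the second line is not matched by ^f.

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

str_detect(string, pattern = "^A")   # string starts with A
str_detect(string, pattern = "^f")   # f only starts the second line
str_extract(string, pattern = ".$")  # last character of the string
```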

Quantifiers

regexp	matches
a?	zero or one
a*	zero or more
a+	one or more
a{3}	exactly n
a{2,}	n or more
a{1,3}	between n and m
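The words faaar, faar and far in my string are a good playground for quantifiers. A short sketch:

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

str_extract_all(string, pattern = "fa+r")[[1]]   # f, one or more a, then r
str_extract_all(string, pattern = "fa?r")[[1]]   # f, at most one a, then r
str_extract_all(string, pattern = "a{2,}")[[1]]  # runs of two or more a
str_extract_all(string, pattern = "a{3}")[[1]]   # runs of exactly three a
```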

Look arounds

regexp	matches
a(?=r)	followed by
a(?!r)	not followed by
(?<=f)a	preceded by
(?<!f)a	not preceded by
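Look arounds only check the neighbouring characters without including them in the match. In the sketch below the positive and the negative look ahead together should therefore account for every a in the string.

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

n_a      <- str_count(string, pattern = "a")       # every lowercase a
n_before <- str_count(string, pattern = "a(?=r)")  # a followed by r
n_not    <- str_count(string, pattern = "a(?!r)")  # a not followed by r
n_after  <- str_count(string, pattern = "(?<=f)a") # a preceded by f

c(n_a, n_before, n_not, n_after)
```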

Groups

With parentheses you can set precedence (order of evaluation) and create groups.

regexp	matches
(al|f)a	sets precedence

Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. The number refers to each group by its order of appearance (1 = first group, 2 = second group, etc.).

regexp	matches
(a)(w)\1	first () group
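A minimal sketch of this back reference on the string from above. The pattern reads as a, then w, then whatever the first group matched, so it should find the awa in away. With str_match() the individual groups can be inspected as well.

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

# \1 is written as "\\1" inside the R string
str_extract(string, pattern = "(a)(w)\\1")

# Full match in the first column, one column per group after that
groups <- str_match(string, pattern = "(a)(w)\\1")
groups
```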

8.3 Interpretation

By default stringr interprets every pattern as a regular expression. With the following functions it is possible to change this default behavior.

regex()

With this function we can modify a regular expression to ignore cases (ignore_case =), match the start and end of lines as well as the start and end of strings (multiline =), allow comments within the expression (comments =) and/or have . match everything including \n (dotall =).

str_detect(string = starwars$name, pattern = regex("sky", ignore_case = TRUE))

##  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE
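The other arguments work in a similar way. Below is a short sketch of multiline = and dotall =, using the string from above and a tiny two-line string of my own:

```r
library(stringr)

string <- "A (long)\tt1me ago \\in a {galaxy}\nfaaar, faar, far!? away..."

# By default ^ only matches at the start of the whole string
str_extract_all(string, pattern = "^f\\w+")[[1]]

# With multiline = TRUE it also matches at the start of each line
str_extract_all(string, pattern = regex("^f\\w+", multiline = TRUE))[[1]]

# By default . does not match a new line, with dotall = TRUE it does
str_detect("a\nb", pattern = "a.b")
str_detect("a\nb", pattern = regex("a.b", dotall = TRUE))
```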

fixed()

The function fixed() compares the raw bytes of pattern and string instead of interpreting the pattern as a regular expression. It is very fast, but usually not what you want for non-ASCII character sets. Below I use the symbol İ (\u0130), which is a Latin capital letter I with dot above.

str_detect(string = "\u0130", pattern = fixed("i", ignore_case = TRUE))

## [1] FALSE
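A more common use of fixed() is to match characters literally that would otherwise have a special meaning in a regular expression. A minimal sketch with a made-up string:

```r
library(stringr)

# As a regular expression, . matches every character
str_count("a.b.c", pattern = ".")

# With fixed() only the literal dots are counted
str_count("a.b.c", pattern = fixed("."))
```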

coll()

To compare strings with locale-specific collation rules, which recognize characters that can be represented in multiple ways, you can use coll(). Note that this function is rather slow.

str_detect("\u0130", coll("i", ignore_case = TRUE, locale = "tr"))

## [1] TRUE
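The sketch below, adapted from the stringr documentation on regular expressions, shows what “represented in multiple ways” means: the letter á can be stored as a single character or as a plain a followed by a combining accent.

```r
library(stringr)

a1 <- "\u00e1"   # á as a single code point
a2 <- "a\u0301"  # a followed by a combining acute accent

str_detect(a1, pattern = fixed(a2))  # byte comparison: no match
str_detect(a1, pattern = coll(a2))   # collation rules: match
```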

boundary()

This function matches boundaries between characters, line breaks, sentences or words.

str_split(starwars$name, boundary("word"))[1:5]

## [[1]]
## [1] "Luke"      "Skywalker"
## 
## [[2]]
## [1] "C"   "3PO"
## 
## [[3]]
## [1] "R2" "D2"
## 
## [[4]]
## [1] "Darth" "Vader"
## 
## [[5]]
## [1] "Leia"   "Organa"
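Compared with splitting at a whitespace, boundary("word") also copes with repeated spaces and punctuation. A short sketch, adapted from the example in the stringr documentation:

```r
library(stringr)

words <- "These are   some words."

str_count(words, pattern = boundary("word"))       # number of words
str_split(words, pattern = " ")[[1]]               # empty strings and "words."
str_split(words, pattern = boundary("word"))[[1]]  # only the words
```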

References

Wickham, Hadley. 2019. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.

R Tutorial: stringr package

Philipp Leppert

23.12.2021