This function takes delimiters for the beginning and (optionally, if different) end of sections of a string, and returns a vector with split string elements.
split_sandwiches(.string, start_rx, end_rx = NULL)
.string | A string |
---|---|
start_rx | A regular expression denoting the beginning of a section. Use |
end_rx | A regular expression denoting the end of a section. If none supplied, sections end when the next |
A vector of strings
The main use case for split_sandwiches() is HTML editing: you might want to separate the original text from the HTML tags, edit only the text, and then re-wrap it in the tags.
This is different from str_split() or similar, because the delimiters are preserved and remain attached to a section.
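To make the contrast concrete, here is a minimal base R sketch. It uses strsplit() as a stand-in for the "or similar" splitters mentioned above, not split_sandwiches() itself, and the names `tag_rx` and `x` are illustrative only. It shows that an ordinary split discards whatever the pattern matched.

```r
# Illustrative only: base R strsplit() consumes the delimiters it splits on.
tag_rx <- "<[^<>]*>"          # assumed pattern: any single HTML-style tag
x <- "<b>bold</b>, yay!"

dropped <- strsplit(x, tag_rx)[[1]]
dropped
#> [1] ""       "bold"   ", yay!"

# The tags can only be recovered with a second, separate pass.
tags <- regmatches(x, gregexpr(tag_rx, x))[[1]]
tags
#> [1] "<b>"  "</b>"
```

With an ordinary split, the tags and the text end up in two unrelated vectors, so the original order has to be reconstructed by hand.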
Note that split_sandwiches() is not vectorized (sorry). It accepts only a single string (a character vector of length one).
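One way to work around the lack of vectorization is to apply the function to each element with lapply(). The sketch below is self-contained, so it uses a hypothetical base R stand-in, `split_keep()`, which only roughly approximates the single-delimiter behaviour and is not the package's implementation. In real use you would pass split_sandwiches and its start_rx argument instead.

```r
# Hypothetical stand-in (NOT the package's code): splits one string on `rx`
# and keeps the matched delimiters as their own elements.
split_keep <- function(string, rx) {
  stopifnot(is.character(string), length(string) == 1)
  m <- gregexpr(rx, string)
  delims  <- regmatches(string, m)[[1]]                 # matched delimiters
  between <- regmatches(string, m, invert = TRUE)[[1]]  # text around them
  # Interleave: text pieces at odd positions, delimiters at even positions.
  out <- character(length(delims) + length(between))
  out[seq(1, by = 2, length.out = length(between))] <- between
  out[seq(2, by = 2, length.out = length(delims))] <- delims
  out[nzchar(out)]                                      # drop empty pieces
}

my_strings <- c("<b>bold</b> text", "plain", "<i>x</i>")

# One call per element; the result is a list with one vector per input string.
pieces <- lapply(my_strings, split_keep, rx = "<[^<>]*>")
```

The result is a list rather than a single vector, because each input string can yield a different number of pieces.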
my_string <- "<span style='text-color:blue'> I am blue and <b>bold</b>, yay! </span>"
split_sandwiches(my_string, "\\<[^\\>\\<]*\\>")
#> [1] "<span style='text-color:blue'>" " I am blue and "
#> [3] "<b>"                            "bold"
#> [5] "</b>"                           ", yay! "
#> [7] "</span>"
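Continuing from the output above, the edit-and-re-wrap workflow described earlier might look like the following sketch. It hard-codes the documented result as `pieces` so that it runs without the package. The tag test (`is_tag`) and the choice of toupper() as the edit are assumptions made for illustration.

```r
# The pieces as returned in the example above, copied in by hand.
pieces <- c("<span style='text-color:blue'>", " I am blue and ",
            "<b>", "bold", "</b>", ", yay! ", "</span>")

# Assumed heuristic: an element is a tag if it starts with "<" and ends with ">".
is_tag <- grepl("^<.*>$", pieces)

# Edit only the text pieces; the tags are left untouched.
pieces[!is_tag] <- toupper(pieces[!is_tag])

# Re-wrap by pasting everything back together in the original order.
edited <- paste(pieces, collapse = "")
edited
#> [1] "<span style='text-color:blue'> I AM BLUE AND <b>BOLD</b>, YAY! </span>"
```

Because the delimiters stay in place in the vector, re-wrapping is a single paste() call.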