\name{regmatches}
\alias{regmatches}
\alias{regmatches<-}
\title{Extract or Replace Matched Substrings}
\description{
  Extract or replace matched substrings from match data obtained by
  \code{\link{regexpr}}, \code{\link{gregexpr}} or
  \code{\link{regexec}}.
}
\usage{
regmatches(x, m, invert = FALSE)
regmatches(x, m, invert = FALSE) <- value
}
\arguments{
  \item{x}{a character vector}
  \item{m}{an object with match data}
  \item{invert}{a logical: if \code{TRUE}, extract or replace the
    non-matched substrings.}
  \item{value}{an object with suitable replacement values for the
    matched or non-matched substrings (see \code{Details}).}
}
\details{
  If \code{invert} is \code{FALSE} (default), \code{regmatches} extracts
  the matched substrings as specified by the match data.  For vector
  match data (as obtained from \code{\link{regexpr}}), empty matches are
  dropped; for list match data, empty matches give empty components
  (zero-length character vectors).

  If \code{invert} is \code{TRUE}, \code{regmatches} extracts the
  non-matched substrings, i.e., the strings are split according to the
  matches similar to \code{\link{strsplit}} (for vector match data, at
  most a single split is performed).

  Note that the match data can be obtained from regular expression
  matching on a modified version of \code{x} with the same numbers of
  characters.

  The replacement function can be used for replacing the matched or
  non-matched substrings.  For vector match data, if \code{invert} is
  \code{FALSE}, \code{value} should be a character vector with length the
  number of matched elements in \code{m}.  Otherwise, it should be a
  list of character vectors with the same length as \code{m}, each as
  long as the number of replacements needed.  Replacement coerces values
  to character or list and generously recycles values as needed.
  Missing replacement values are not allowed.
}
\value{
  For \code{regmatches}, a character vector with the matched substrings
  if \code{m} is a vector and \code{invert} is \code{FALSE}.  Otherwise,
  a list with the matched or non-matched substrings.

  For \code{regmatches<-}, the updated character vector.
}
\examples{
x <- c("A and B", "A, B and C", "A, B, C and D", "foobar")
pattern <- "[[:space:]]*(,|and)[[:space:]]"
## Match data from regexpr()
m <- regexpr(pattern, x)
regmatches(x, m)
regmatches(x, m, invert = TRUE)
## Match data from gregexpr()
m <- gregexpr(pattern, x)
regmatches(x, m)
regmatches(x, m, invert = TRUE)

## Consider
x <- "John (fishing, hunting), Paul (hiking, biking)"
## Suppose we want to split at the comma (plus spaces) between the
## persons, but not at the commas in the parenthesized hobby lists.
## One idea is to "blank out" the parenthesized parts to match the
## parts to be used for splitting, and extract the persons as the
## non-matched parts.
## First, match the parenthesized hobby lists.
m <- gregexpr("\\\\([^)]*\\\\)", x)
## Write a little utility for creating blank strings with given numbers
## of characters.
blanks <- function(n) {
    vapply(Map(rep.int, rep.int(" ", length(n)), n, USE.NAMES = FALSE),
           paste, "", collapse = "")
}
## Create a copy of x with the parenthesized parts blanked out.
s <- x
regmatches(s, m) <- Map(blanks, lapply(regmatches(s, m), nchar))
s
## Compute the positions of the split matches (note that we cannot call
## strsplit() on x with match data from s).
m <- gregexpr(", *", s)
## And finally extract the non-matched parts.
regmatches(x, m, invert = TRUE)
}
\keyword{character}
\keyword{utilities}
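\section{Replacement sketch}{
  The following is a minimal sketch of the replacement form described in
  the details, not part of the original examples.  The input strings,
  the pattern \code{"[0-9]+"} and the variable names \code{x}, \code{y},
  \code{z}, \code{m} and \code{g} are assumptions chosen purely for
  illustration.  It shows the two shapes of \code{value}: a character
  vector with one entry per matched element for vector match data, and
  a list of character vectors for list match data.

  \preformatted{
## Vector match data from regexpr(): "pear" has no match.
x <- c("apple 12", "pear", "fig 7")
m <- regexpr("[0-9]+", x)
regmatches(x, m)                      # "12" "7"

## One replacement value per matched element (two here).
y <- x
regmatches(y, m) <- c("<n>", "<n>")
y                                     # "apple <n>" "pear" "fig <n>"

## List match data from gregexpr(): value is a list with one
## character vector per element of the input.
z <- "a1b22c333"
g <- gregexpr("[0-9]+", z)
regmatches(z, g) <- list(c("X", "YY", "ZZZ"))
z                                     # "aXbYYcZZZ"
  }

  Working on the copies \code{y} and \code{z} keeps the original strings
  available for comparison, since the replacement function modifies its
  first argument in place.
}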