By hek2mgl


2015-04-13 19:17:00 8 Comments

I'm wondering whether it is possible to write a 100% reliable sed command to escape any regex metacharacters in an input string so that it can be used in a subsequent sed command. Like this:

#!/bin/bash
# Trying to replace one regex by another in an input file with sed

search="/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3"
replace="/xyz\n\t[0-9]\+\([^ ]\)\{2,3\}\3"

# Sanitize input
search=$(sed 'script to escape' <<< "$search")
replace=$(sed 'script to escape' <<< "$replace")

# Use it in a sed command
sed "s/$search/$replace/" input

I know that there are better tools to work with fixed strings instead of patterns, for example awk, perl or python. I would just like to prove whether it is possible or not with sed. I would say let's concentrate on basic POSIX regexes to have even more fun! :)

I have tried a lot of things, but each time I found an input that broke my attempt. I kept it abstract ('script to escape') so as not to lead anybody in the wrong direction.

Btw, the discussion came up here. I thought this could be a good place to collect solutions and probably break and/or elaborate them.


@mklement0 2015-04-13 19:34:03

Note:

  • If you're looking for prepackaged functionality based on the techniques discussed in this answer:
    • bash functions that enable robust escaping even in multi-line substitutions can be found at the bottom of this post (plus a perl solution that uses perl's built-in support for such escaping).
    • @EdMorton's answer contains a tool (bash script) that robustly performs single-line substitutions.
  • All snippets assume bash as the shell (POSIX-compliant reformulations are possible).

SINGLE-line Solutions


Escaping a string literal for use as a regex in sed:

To give credit where credit is due: I found the regex used below in this answer.

Assuming that the search string is a single-line string:

search='abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3'  # sample input containing metachars.

searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search") # escape it.

sed -n "s/$searchEscaped/foo/p" <<<"$search" # if ok, echoes 'foo'
  • Every character except ^ is placed in its own bracket expression ([...]) so that it is treated as a literal.
    • Note that ^ is the one character you cannot represent as [^], because it has special meaning in that position (negation).
  • Then, ^ characters are escaped as \^.
    • Note that you cannot simply escape every character by putting a \ in front of it, because that can turn a literal character into a metacharacter: e.g., \< and \b are word boundaries in some tools, \n is a newline, and \{ starts a RE interval such as \{1,3\}.
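To make the last point concrete, here is a small sketch that escapes the word tab both ways and tries to match it. It assumes GNU sed, in which \t in a regex matches a tab, \a a BEL character and \b a word boundary; the variable names naive and bracketed are illustrative only:

```shell
#!/usr/bin/env bash
# Assumes GNU sed: \t, \a and \b have special meanings inside a regex.
s='tab'

# Naive approach: prefix every character with a backslash -> \t\a\b
naive=$(sed 's/./\\&/g' <<<"$s")

# Bracket approach from the answer -> [t][a][b]
bracketed=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$s")

# The naive form prints no MATCH (it now means tab, BEL, word boundary);
# the bracketed form prints MATCH.
printf 'naive:     %s -> %s\n' "$naive" "$(sed -n "s/$naive/MATCH/p" <<<"$s")"
printf 'bracketed: %s -> %s\n' "$bracketed" "$(sed -n "s/$bracketed/MATCH/p" <<<"$s")"
```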

The approach is robust, but not efficient.

The robustness comes from not trying to anticipate all special regex characters (which vary across regex dialects), but focusing on only 2 features shared by all regex dialects:

  • the ability to specify literal characters inside a character set.
  • the ability to escape a literal ^ as \^
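A quick way to convince yourself of that robustness is a round-trip check: escape each sample string, then use the result as an anchored regex against the sample itself. This is a minimal sketch assuming bash and a sed that accepts the answer's script; escape_re is a hypothetical helper name wrapping the same sed command, and the samples are made up for illustration:

```shell
#!/usr/bin/env bash
# Round-trip self-check for the single-line regex escape.

escape_re() {  # hypothetical helper; same sed script as in the answer
  sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"
}

samples=(
  'abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3'
  '^caret at start and dollar at end$'
  'a.*b'
  '[^^]]'
  'back\slash \< \b \{1,3\}'
)

for s in "${samples[@]}"; do
  esc=$(escape_re "$s")
  # The ^...$ anchors ensure that only a full, literal match counts.
  if [[ $(sed -n "s/^$esc\$/MATCH/p" <<<"$s") == MATCH ]]; then
    printf 'ok   %s\n' "$s"
  else
    printf 'FAIL %s\n' "$s"
  fi
done
```

Anchoring with ^ and $ matters here: without it, an escape that only matched part of the line would still look like a success.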

Escaping a string literal for use as the replacement string in sed's s/// command:

The replacement string in a sed s/// command is not a regex, but it recognizes placeholders that refer to either the entire string matched by the regex (&) or specific capture-group results by index (\1, \2, ...), so these must be escaped, along with \ itself and the (customary) regex delimiter, /.

Assuming that the replacement string is a single-line string:

replace='Laurel & Hardy; PS\2' # sample input containing metachars.

replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$replace") # escape it

sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar" # if ok, outputs $replace as is
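The same kind of round-trip check works for the replacement side: replacing a whole line with the escaped value should reproduce the value exactly. A sketch under the same assumptions, where escape_subst is a hypothetical wrapper around the sed command above and the samples are illustrative:

```shell
#!/usr/bin/env bash
# Round-trip self-check for the replacement-string escape.

escape_subst() {  # hypothetical helper; same sed script as above
  sed 's/[&/\]/\\&/g' <<<"$1"
}

samples=(
  'Laurel & Hardy; PS\2'
  'path/to/file'
  'trailing backslash\'
  '&& \1 \\ /'
)

for r in "${samples[@]}"; do
  # Replace the entire one-line input with the escaped value.
  out=$(sed "s/.*/$(escape_subst "$r")/" <<<'x')
  if [[ $out == "$r" ]]; then
    printf 'ok   %s\n' "$r"
  else
    printf 'FAIL %s -> %s\n' "$r" "$out"
  fi
done
```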


MULTI-line Solutions


Escaping a MULTI-LINE string literal for use as a regex in sed:

Note: This only makes sense if multiple input lines (possibly ALL) have been read before attempting to match.
Since tools such as sed and awk operate on a single line at a time by default, extra steps are needed to make them read more than one line at a time.

# Define sample multi-line literal.
search='/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3
/def\n\t[A-Z]\+\([^ ]\)\{3,4\}\4'

# Escape it.
searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n')

# Use in a Sed command that reads ALL input lines up front.
# If ok, echoes 'foo'
sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/foo/p" <<<"$search"
  • The newlines in multi-line input strings must be translated to '\n' strings, which is how newlines are encoded in a regex.
  • $!a\'$'\n''\\n' appends string '\n' to every output line but the last (the last newline is ignored, because it was added by <<<)
  • tr -d '\n' then removes all actual newlines from the string (sed adds one whenever it prints its pattern space), effectively replacing all newlines in the input with '\n' strings.
  • -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop, so that subsequent commands operate on all input lines at once.

    • If you're using GNU sed (only), you can use its -z option to simplify reading all input lines at once:
      sed -z "s/$searchEscaped/foo/" <<<"$search"
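To see what the escaped multi-line regex actually looks like, the following sketch runs the same pipeline on a tiny two-line input and then matches the result against that input after slurping all lines. quote_re_ml is a hypothetical name for the same pipeline used for searchEscaped:

```shell
#!/usr/bin/env bash
# Multi-line regex escape: inspect the escaped form, then use it.

quote_re_ml() {  # hypothetical name; same pipeline as searchEscaped above
  sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'
}

text=$'a.\nb^'
esc=$(quote_re_ml "$text")
printf '%s\n' "$esc"    # prints: [a][.]\n[b]\^

# Read all lines first (POSIX idiom), then match the two-line literal.
sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$esc/MATCH/p" <<<"$text"   # prints: MATCH
```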

Escaping a MULTI-LINE string literal for use as the replacement string in sed's s/// command:

# Define sample multi-line literal.
replace='Laurel & Hardy; PS\2
Masters\1 & Johnson\2'

# Escape it for use as a Sed replacement string.
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
replaceEscaped=${REPLY%$'\n'}

# If ok, outputs $replace as is.
sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar" 
  • Newlines in the input string must be retained as actual newlines, but \-escaped.
  • -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop.
  • s/[&/\]/\\&/g escapes all &, \ and / instances, as in the single-line solution.
  • s/\n/\\&/g then \-prefixes all actual newlines.
  • IFS= read -d '' -r is used to read the sed command's output as is (to avoid the automatic removal of trailing newlines that a command substitution ($(...)) would perform).
  • ${REPLY%$'\n'} then removes a single trailing newline, which the <<< has implicitly appended to the input.
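Similarly, here is a sketch showing the escaped form of a short two-line replacement string and the result of using it. quote_subst_ml is a hypothetical name for the same steps as replaceEscaped; the added || true only keeps read's expected non-zero exit status at EOF from aborting a script that runs under set -e:

```shell
#!/usr/bin/env bash
# Multi-line replacement escape: inspect the escaped form, then use it.

quote_subst_ml() {  # hypothetical name; same steps as replaceEscaped above
  # read -d '' returns non-zero at EOF, which is expected here.
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1") || true
  printf %s "${REPLY%$'\n'}"
}

value=$'a&b\nc/d'
esc=$(quote_subst_ml "$value")
printf '%s\n' "$esc"
# prints:
# a\&b\
# c\/d

# Replacing a one-line input with the escaped value yields the original two lines.
sed "s/.*/$esc/" <<<'x'
# prints:
# a&b
# c/d
```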


bash functions based on the above (for sed):

  • quoteRe() quotes (escapes) for use in a regex
  • quoteSubst() quotes for use in the substitution string of a s/// call.
  • both handle multi-line input correctly
    • Note that because sed reads a single line at a time by default, use of quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
    • Also, using command substitutions ($(...)) to call the functions won't work for strings that have trailing newlines; in that event, use something like IFS= read -d '' -r escapedValue < <(quoteSubst "$value")
# SYNOPSIS
#   quoteRe <text>
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
# SYNOPSIS
#  quoteSubst <text>
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

Example:

from=$'Cost\(*):\n$3.' # sample input containing metachars. 
to='You & I'$'\n''eating A\1 sauce.' # sample replacement string with metachars.

# Should print the unmodified value of $to
sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quoteRe "$from")/$(quoteSubst "$to")/" <<<"$from" 

Note the use of -e ':a' -e '$!{N;ba' -e '}' to read all input at once, so that the multi-line substitution works.
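The caveat about trailing newlines can be observed directly. This sketch reuses quoteSubst as defined above (with || true added only so that read's non-zero exit status at EOF is harmless under set -e); viaSubst and viaRead are illustrative names. A value ending in a newline escapes to something ending in "\<newline>", and $(...) strips that newline, leaving a dangling backslash:

```shell
#!/usr/bin/env bash
# Command substitution vs. read -d '' for values with trailing newlines.

quoteSubst() {  # as in the answer; || true added for set -e safety
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1") || true
  printf %s "${REPLY%$'\n'}"
}

value=$'abc\n'   # note the trailing newline

viaSubst=$(quoteSubst "$value")                               # trailing newline lost
IFS= read -d '' -r viaRead < <(quoteSubst "$value") || true   # kept as is

printf 'via $(...): %d chars\n' "${#viaSubst}"   # prints: via $(...): 4 chars
printf 'via read:   %d chars\n' "${#viaRead}"    # prints: via read:   5 chars
```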



perl solution:

Perl has built-in support for escaping arbitrary strings for literal use in a regex: the quotemeta() function or its equivalent \Q...\E quoting.
The approach is the same for both single- and multi-line strings; for example:

from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You owe me $1/$& for'$'\n''eating A\1 sauce.' # sample replacement string w/ metachars.

# Should print the unmodified value of $to.
# Note that the replacement value needs NO escaping.
perl -s -0777 -pe 's/\Q$from\E/$to/' -- -from="$from" -to="$to" <<<"$from" 
  • Note the use of -0777 to read all input at once, so that the multi-line substitution works.

  • The -s option allows placing -<var>=<val>-style Perl variable definitions following -- after the script, before any filename operands.

@Tino 2017-11-27 17:49:45

FWIW, newer versions of sed allow sed -z to match NUL-separated lines, so the matches can include \n. Example use: find -print0 | sed -z ... | xargs --null script etc. Multiline regex with \n come in very handy, as Linux (or Ubuntu for Windows) allows linefeeds in filenames (like: echo help me world > $'\n\nminime\nwas here\n')

@Christian Bongiorno 2018-06-11 18:26:21

Fantastic solution - and it didn't quite work for me. I have passwords in bind variables in a script. But when I echo them bash escapes the funky characters. Compounding the problem ;-/ props though for the answer

@mklement0 2018-06-12 00:40:27

Thanks, @ChristianBongiorno. I don't quite understand the use case you're describing, however; are you talking about keyboard macros defined with bind? How does echoing values come into play? Can you give an example?

@mklement0 2018-06-12 19:24:56

@Tino: Thanks, I've added a -z-based variant to the answer, but note that it's not about older or newer per se, it's about GNU sed, which defines -z as a nonstandard option, vs. other sed implementations, such as the BSD sed found on macOS, which do not.

@Christian Bongiorno 2018-06-13 17:52:53

@mklement0 - I have a password stored in an associative array as a bind variable. When I simply echo the password out, bash actually escapes the \` for example to \\\` automatically and thus messes up the regex idea you provide. (which is pretty clever)

@mklement0 2018-06-13 18:20:47

@ChristianBongiorno: Thanks for the explanation, but I still don't get it, unfortunately; I'm curious, however: how about asking a new question focused on this aspect?

@Ed Morton 2015-04-14 11:45:49

Building upon @mklement0's answer in this thread, the following tool will replace any single-line string (as opposed to regexp) with any other single-line string using sed and bash:

$ cat sedstr
#!/bin/bash
old="$1"
new="$2"
file="${3:--}"
escOld=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< "$old")
escNew=$(sed 's/[&/\]/\\&/g' <<< "$new")
sed "s/$escOld/$escNew/g" "$file"

To illustrate the need for this tool, consider trying to replace a.*/b{2,}\nc with d&e\1f by calling sed directly:

$ cat file
a.*/b{2,}\nc
axx/bb\nc

$ sed 's/a.*/b{2,}\nc/d&e\1f/' file  
sed: -e expression #1, char 16: unknown option to `s'
$ sed 's/a.*\/b{2,}\nc/d&e\1f/' file
sed: -e expression #1, char 23: invalid reference \1 on `s' command's RHS
$ sed 's/a.*\/b{2,}\nc/d&e\\1f/' file
a.*/b{2,}\nc
axx/bb\nc
# .... and so on, peeling the onion ad nauseam until:
$ sed 's/a\.\*\/b{2,}\\nc/d\&e\\1f/' file
d&e\1f
axx/bb\nc

or use the above tool:

$ sedstr 'a.*/b{2,}\nc' 'd&e\1f' file  
d&e\1f
axx/bb\nc

The reason this is useful is that it can be easily augmented to use word-delimiters to replace words if necessary, e.g. in GNU sed syntax:

sed "s/\<$escOld\>/$escNew/g" "$file"

whereas the tools that actually operate on strings (e.g. awk's index()) cannot use word-delimiters.
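As a sketch of that augmentation, here is sedstr's logic repackaged as a bash function with \< and \> around the escaped search string. sedword is a hypothetical name, and \< and \> are GNU sed extensions, so this assumes GNU sed:

```shell
#!/usr/bin/env bash
# Word-delimited literal replacement (GNU sed only, because of \< and \>).

sedword() {  # hypothetical function; same escaping as sedstr
  local old=$1 new=$2 file=${3:--}
  local escOld escNew
  escOld=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$old")
  escNew=$(sed 's/[&/\]/\\&/g' <<<"$new")
  sed "s/\<$escOld\>/$escNew/g" "$file"
}

printf '%s\n' 'cat concat cat.' | sedword 'cat' 'dog & co'
# prints: dog & co concat dog & co.
```

Note that "concat" is left alone because its "cat" does not start at a word boundary.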
