7. KNOWN BUGS AMONG SED VERSIONS

Most versions of GNU sed and ssed contain a "buglist" in the archive source code of known errors or reported behaviors that may be misconstrued as bugs. This portion of the sed FAQ does not attempt to fully reproduce those buglist files. However, we do seek to do some substantial reporting, particularly where certain programs have no "buglist" of their own or are not being actively maintained.

As a rule of thumb, if the bug "bites" someone on the sed-users mailing list, I tend to report it.

7.1. ssed v3.59 (by Paolo Bonzini)

(1) N does not discard the contents of the pattern space upon reaching the end of file; not a bug. See section 6.7.5.A, above.

(2) If \x26 is entered into the RHS of a substitution, it is interpreted as an ampersand metacharacter, and the entire pattern matched in the "find" portion is inserted at that point. A literal ampersand should be inserted instead.

(3) Under Windows 2000, the -i switch doesn't create backup files properly. When passed one or more files to process, the source file(s) are unchanged, and the output changed files are given filenames like sedDOSxyz with no way to correspond them with the names of the source files.
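
For comparison, the behavior that item (3) expects can be checked with a short script. This is only a sketch: it assumes a sed that accepts -i with an attached backup suffix (as GNU sed 4.x does), and the file name notes.txt and the suffix .bak are made up for the illustration.

```shell
# Assumption: this sed supports -i with an attached suffix (GNU sed, ssed).
workdir=$(mktemp -d)
printf 'one\ntwo\n' > "$workdir/notes.txt"

# Edit in place; the original text should survive as notes.txt.bak.
sed -i.bak 's/one/ONE/' "$workdir/notes.txt"

edited=$(cat "$workdir/notes.txt")
backup=$(cat "$workdir/notes.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

rm -rf "$workdir"
```

If the source file is unchanged, or no notes.txt.bak appears, the -i switch is not behaving as intended.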

7.2. GNU sed v4.0 - v4.0.5

(1) N does not discard the contents of the pattern space upon reaching the end of file; not a bug. See section 6.7.5.A, above.

(2) If \x26 is entered into the RHS of a substitution, it is interpreted as an ampersand metacharacter, and the entire pattern matched in the "find" portion is inserted at that point. A literal ampersand should be inserted instead.
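
Where \x26 is affected by item (2), a backslash-escaped ampersand is the standard way to put a literal "&" into the replacement. A minimal sketch (the sample text "fish chips" is arbitrary):

```shell
# A bare "&" recalls the whole match (here, the single space).
recalled=$(echo "fish chips" | sed 's/ / & /')

# "\&" inserts a literal ampersand instead.
literal=$(echo "fish chips" | sed 's/ / \& /')

printf '[%s]\n[%s]\n' "$recalled" "$literal"
# [fish   chips]
# [fish & chips]
```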

7.3. GNU sed v3.02.80

(1) N does not discard the contents of the pattern space upon reaching the end of file; not a bug. See section 6.7.5.A, above.

(2) Same as #2 for GNU sed v4.0, above.

7.4. GNU sed v3.02

(1) Affects only v3.02 binaries compiled with DJGPP for MS-DOS and MS-Windows: 'l' (list) command does not display a lone carriage return (0x0D, ^M) embedded in a line.

(2) The expression "\<" causes problems when attempting the following types of substitutions, which should print "+aaa +bbb":

       echo aaa bbb | sed 's/\</+/g'    # prints "+a+a+a +b+b+b"
       echo aaa bbb | sed 's/\<./+&/g'  # prints "+a+a+a +b+b+b"

(3) The N command no longer discards the contents of the pattern space upon reaching the end of file. This is not a bug, it's a feature. See section 6.7.5, "Commands which operate differently".
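
Scripts that must give the same result whether or not N discards the pattern space at end of file can guard the command with "$!", so that N is never attempted on the last line. A small sketch that joins lines in pairs:

```shell
# "$!N" runs N on every line except the last, so an odd final line
# is printed by the normal end-of-script autoprint.
joined=$(printf '1\n2\n3\n' | sed '$!N;s/\n/ /')
printf '%s\n' "$joined"
# 1 2
# 3
```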

7.5. GNU sed v2.05

(1) If a number follows the substitute command (e.g., s/f/F/10) and the number exceeds the possible matches on the pattern space, the command 't label' always jumps to the specified label. 't' should jump only if the substitution was successful (or returned "true").

(2) 'l' (list) command does not convert the following characters to hex values, but passes them through unchanged: 0xF7, 0xFB, 0xFC, 0xFD, 0xFE.

(3) A range address like "/foo/,14" is supposed to match every line from the first occurrence of "foo" until line 14, inclusive, and then match only those lines containing "foo" thereafter. In gsed v2.05, if "foo" occurs later in the file, every line from there to the end of file will be matched (since gsed is looking for line 14 to occur again!).

(4) The regexes /\`/ and /\'/ are not interpreted as a backquote and apostrophe, as might be expected. Instead, they are used to represent the beginning-of-line and end-of-line (respectively), to conform with similar regexes in the GNU versions of Emacs and awk. As a consequence, there is no clear way to indicate an apostrophe, since a bare apostrophe (') has special meaning to the Unix shell and the quoted apostrophe (\') is interpreted as the EOL. A doubly escaped apostrophe (\\') was interpreted as a backslash by sed and a quote mark by the shell, again not providing the expected results. This syntax changed in the next version of gsed.

(5) Multiple occurrences of the 'w' command fail, as shown here, given that both "aaa" and "bbb" occur within the file:

       gsed -e "/aaa/w FILE" -e "/bbb/w FILE" input.txt

(6) The expression "\<" causes problems when attempting the following type of substitution, which should print "+aaa +bbb":

       echo aaa bbb | sed 's/\</+/g'    # sed hangs up with no output

The syntax 's/\<./+&/g' issues the proper output.
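
When "\<" cannot be trusted, the same result can often be had without it. The sketch below assumes that a "word" is simply a run of non-space characters, which is narrower than what "\<" means (punctuation is not treated as a boundary):

```shell
# Prefix each run of non-space characters with "+".
marked=$(echo "aaa bbb" | sed 's/[^ ][^ ]*/+&/g')
echo "$marked"    # +aaa +bbb
```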

7.6. GNU sed v1.18

(1) Same as #1 for GNU sed v2.05, above.

(2) The following command will lock the computer under Win95. Echos is an echo command that does not issue a trailing newline:

       echos any_word | gsed "s/[ ]*$//"

(3) Same as #3 for GNU sed v2.05, above.
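
The correct behavior of 't' described in item (1) can be checked as follows. This is a sketch; the label name "done" and the sample input are arbitrary. Separate -e switches are used so that the label does not depend on how a given sed parses semicolons.

```shell
# Only two "f" exist, so s/f/F/10 changes nothing and 't' must NOT jump;
# the marker text is therefore appended.
no_jump=$(echo "f f" | sed -e 's/f/F/10' -e 't done' -e 's/$/ (no jump)/' -e ':done')

# s/f/F/2 succeeds, so 't' jumps past the marker command.
jump=$(echo "f f" | sed -e 's/f/F/2' -e 't done' -e 's/$/ (no jump)/' -e ':done')

printf '%s\n%s\n' "$no_jump" "$jump"
# f f (no jump)
# f F
```

A sed with this bug would print "f f" for the first command, because 't' jumps even though nothing was substituted.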

7.7. GNU sed v1.03 (by Frank Whaley)

(1) The \w and \W escape sequences both match only nonword characters. \w is misdefined and should match word characters.

(2) The underscore is defined as a nonword character; it should be defined as a word character.

(3) Same as #3 for GNU sed v2.05, above.
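
The range-address behavior referred to in item (3) can be seen on a small input. The sketch uses "/foo/,3" instead of "/foo/,14" to keep the sample short; the input lines are invented.

```shell
# Lines 1-3 are matched as a range. "foo" on line 5 comes after line 3,
# so only that one line is matched, not the rest of the file.
matched=$(printf 'foo\nb\nc\nd\nfoo\nf\n' | sed -n '/foo/,3p')
printf '%s\n' "$matched"
# foo
# b
# c
# foo
```

A sed with this bug would also print the final "f".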

7.8. sed v1.6 (by Walter Briscoe) - still in beta version

(1) Duplicated subexpressions (still) do not match an empty set as they should. This problem was inherited from HHsed15.

       echo 123 | sed "s/\([a-z][a-z]\)*/=\1/"  # does not return '='

(2) If grouping is followed by a + operator, nothing is matched. This problem was inherited from HHsed; it fixed a bug with the * operator, but the problem with the + operator persists.

       echo aaa | sed "/\(a\)+/d"          # nothing is deleted.

(3) With the interval expressions \{1,\} and +, there is a bug related to the & replacement character. This affected the BETA release, and it's not known if it affects the final release.

       echo ab | sed "s/a[^a]*/&c/"        # returns 'abc'. Okay.
       echo ab | sed "s/a[^a]+/&c/"        # returns 'ab'. Bug!
       echo ab | sed "s/a[^a]\{1,\}/&c/"   # returns 'ab'. Bug!
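
For reference, this is what the examples above are expected to return on a sed that handles them correctly. The sketch uses only the POSIX spellings; the bare "+" forms are left out because POSIX basic regular expressions treat "+" as an ordinary character.

```shell
# A repeated group may match the empty string, so "=" is inserted at the front.
empty_group=$(echo 123 | sed 's/\([a-z][a-z]\)*/=\1/')

# The interval \{1,\} matches "b", and "&" recalls the whole match "ab".
interval=$(echo ab | sed 's/a[^a]\{1,\}/&c/')

printf '%s\n%s\n' "$empty_group" "$interval"
# =123
# abc
```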

7.9. HHsed v1.5 (by Howard Helman)

(1) If a number follows the substitute command (e.g., s/foo/bar/2), in a sed script entered from the command line, two semicolons must follow the number, or they must be separated by an -e switch. Normally, only 1 semicolon is needed to separate commands.

       echo bit bet | HHsed "s/b/n/2;;s/b/B/"          # solution 1
       echo bit bet | HHsed -e "s/b/n/2" -e "s/b/B/"   # solution 2

(2) If the substitute command is followed by a number and a "p" flag, when the -n switch is used, the "p" flag must occur first.

       echo aaa | HHsed -n "s/./B/3p"    # bug! nothing prints
       echo aaa | HHsed -n "s/./B/p3"    # prints "aaB" as expected

(3) The following commands will cause HHsed to lock the computer under MS-DOS or Win95. Note that they occur because of malformed regular expressions which will match no characters.

       sed -n "p;s/\<//g;" file
       sed -n "p;s/[char-set]*//g;" file

(4) The range command '/RE1/,/RE2/' in HHsed will match one line if both regexes occur on the same line (see section 3.4(3), above). Though this could be construed as a feature, it should probably be considered a bug since its operation differs from every other version of sed. For example, '/----/,/----/{s/^/>>/;}' should put two angle brackets ">>" before every line which is sandwiched between two rows of 4 or more hyphens. With HHsed, this command will only prefix the hyphens themselves with the angle brackets.

(5) If the hold space is empty, the H command copies the pattern space to the hold space but fails to prepend a leading newline. The H command is supposed to add a newline, followed by the contents of the pattern space, to the hold space at all times. A workaround is "{G;s/^\(.*\)\(\n\)$/\2\1/;H;s/\n$//;}", but it requires knowing that the hold space is empty and using the command only once. Another alternative is to use the G or the h command alone at key points in the script.

(6) If grouping is followed by an '*' or '+' operator, HHsed does not match the pattern, but issues no warning. See below:

       echo aaa | HHsed "/\(a\)*/d"      # nothing is deleted
       echo aaa | HHsed "/\(a\)+/d"      # nothing is deleted
       echo aaa | HHsed "s/\(a\)*/\1B/"  # nothing is changed
       echo aaa | HHsed "s/\(a\)+/\1B/"  # nothing is changed

(7) If grouping is followed by an interval expression, HHsed halts with the error message "garbled command", in all of the following examples:

       echo aaa | HHsed "/\(a\)\{3\}/d"
       echo aaa | HHsed "/\(a\)\{1,5\}/d"
       echo aaa | HHsed "s/\(a\)\{3\}/\1B/"

(8) In interval expressions, a lower bound of 0 is not supported. E.g., \{0,3\}
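
For comparison, here is how several of the commands above behave on a sed without these bugs. This is a sketch run with a plain "sed" rather than HHsed; the variable names are only labels for the illustration.

```shell
# Item (1): a single semicolon is enough after a numeric flag.
two_commands=$(echo "bit bet" | sed 's/b/n/2;s/b/B/')

# Item (2): a number followed by the "p" flag prints under -n.
nth_and_print=$(echo aaa | sed -n 's/./B/3p')

# Item (5): H on an empty hold space still prepends a newline.
held=$(echo a | sed -n 'H;x;p')

# Item (7): grouping followed by an interval expression is accepted.
group_interval=$(echo aaa | sed 's/\(a\)\{3\}/\1B/')

printf '[%s]\n' "$two_commands" "$nth_and_print" "$held" "$group_interval"
# [Bit net]
# [aaB]
# [
# a]
# [aB]
```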

7.10. sedmod v1.0 (by Hern Chen)

Technically, the following are limits (or features?) of sedmod, not bugs, since the docs for sedmod do not claim to support these missing features.

(1) sedmod does not support standard interval expressions \{...\} present in nearly all versions of sed.

(2) If grouping is followed by an '*' or '+' operator, sedmod gives a "garbled command" message. However, if the grouped expressions are string literals with no metacharacters, a partial workaround can be done like so:

       \(string\)\1*    # matches 1 or more instances of 'string'
       \(string\)\1+    # matches 2 or more instances of 'string'

(3) sedmod does not support a numeric argument after the s/// command, as in 's/a/b/3', present in nearly all versions of sed.

The following are bugs in sedmod v1.0:

(4) When the -i (ignore case) switch is used, the '/regex/d' command is not properly obeyed. Sedmod may miss one or more lines matching the expression, regardless of where they occur in the script. Workaround: use "/regex/{d;}" instead.
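
The back-reference workaround in item (2) can be tried on any sed that supports "\1" in the search pattern. In this sketch the second command writes "\1\1*" in place of "\1+", which is an assumption made so that the example also runs where "+" is not an operator; the sample strings are invented.

```shell
# \(ab\)\1* : one or more consecutive copies of "ab".
one_or_more=$(echo "abababx" | sed 's/\(ab\)\1*/<&>/')

# \(ab\)\1\1* : two or more copies; the lone "ab" at the start is skipped.
two_or_more=$(echo "abx ababx" | sed 's/\(ab\)\1\1*/<&>/')

printf '%s\n%s\n' "$one_or_more" "$two_or_more"
# <ababab>x
# abx <abab>x
```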

7.11. HP-UX sed

(1) Versions of HP-UX sed up to and including version 10.20 are buggy. According to the README file, which comes with the GNU cc at <ftp://ftp.ntua.gr/pub/gnu/sed/sed-2.05.bin.README>:

"When building gcc on a hppa*-*-hpux10 platform, the `fixincludes' step (which involves running a sed script) fails because of a bug in the vendor's implementation of sed. Currently the only known workaround is to install GNU sed before building gcc. The file sed-2.05.bin.hpux10 is a precompiled binary for that platform."

7.12. SunOS sed v4.1

(1) Bug occurs in RE pattern matching when a non-null '[char-set]*' is followed by a null '\NUM' pattern recall, illustrated here and reported by Greg Ubben:

       s/\(a\)\(b*\)cd\1[0-9]*\2foo/bar/  # between '[0-9]*' and '\2'
       s/\(a\{0,1\}\).\{0,1\}\1/bar/      # between '.\{0,1\}' and '\1'

Workaround: add a do-nothing 'X*' expression which will not match any characters on the line between the two components. E.g.,

       s/\(a\)\(b*\)cd\1[0-9]*X*\2foo/bar/
       s/\(a\{0,1\}\).\{0,1\}X*\1/bar/
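
On a sed without this bug, the original expression and the padded workaround should give the same result, since 'X*' matches the empty string. A sketch with an invented input line in which '\2' is null:

```shell
# In "acda12foo": \1 is "a", \2 is empty, and [0-9]* matches "12".
plain=$(echo "acda12foo" | sed 's/\(a\)\(b*\)cd\1[0-9]*\2foo/bar/')
padded=$(echo "acda12foo" | sed 's/\(a\)\(b*\)cd\1[0-9]*X*\2foo/bar/')

printf '%s\n%s\n' "$plain" "$padded"
# bar
# bar
```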

7.13. SunOS sed v5.6

(1) If grouping is followed by an asterisk, SunOS sed does not match the null string, which it should do. The following command:

       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

should transform "foo" to "goo" under normal versions of sed.

7.14. Ultrix sed v4.3

(1) If grouping is followed by an asterisk, Ultrix sed replies with "command garbled", as shown in the following example:

       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

(2) If grouping is followed by a numeric operator such as \{0,9\}, Ultrix sed does not find the match.

7.15. Digital Unix sed

(1) The following comes from the man pages for sed distributed with new, 1998 versions of Digital Unix (reformatted to fit our margins):

[Digital] The h subcommand for sed does not work properly. When you use the h subcommand to place text into the hold area, only the last line of the specified text is saved. You can use the H subcommand to append text to the hold area. The H subcommand and all others dealing with the hold area work correctly.

(2) "$d" command issues an error message, "cannot parse". Reported by Carlos Duarte on 8 June 1998.

[end-of-file]


Site copyright of domain owner. All rights reserved.