Web www.grok2.com
grok2.gif (391 bytes)

 


5. WHY ISN'T THIS WORKING?

5.1. Why don't my variables like $var get expanded in my sed script?

Because your sed script uses 'single quotes' instead of "double quotes." Unix shells never expand $variables in single quotes.

This is probably the most frequently-asked sed question. For more info on using variables, see section 4.30.

5.2. I'm using 'p' to print, but I have duplicate lines sometimes.

Sed prints the entire file by default, so the 'p' command might cause the duplicate lines. If you want the whole file printed, try removing the 'p' from commands like 's/foo/bar/p'. If you want part of the file printed, run your sed script with -n flag to suppress normal output, and rewrite the script to get all output from the 'p' comand.

If you're still getting duplicate lines, you are probably finding several matches for the same line. Suppose you want to print lines with the words "Peter" or "James" or "John", but not the same line twice. The following command will fail:

     sed -n '/Peter/p; /James/p; /John/p' files

Since all 3 commands of the script are executed for each line, you'll get extra lines. A better way is to use the 'd' (delete) or 'b' (branch) commands, like so (with GNU sed):

     sed '/Peter/b; /James/b; /John/b; d' files          # one way
     sed -n '/Peter/{p;d;};/James/{p;d;};/John/p' files  # a 2nd way
     sed -n '/Peter/{p;b;};/James/{p;b;};/John/p' files  # a 3rd way
     sed '/Peter\|James\|John/!d' files                  # shortest way

On standard seds, these must be broken down with -e commands:

     sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d files
     sed -n -e '/Peter/{p;d;}' -e '/James/{p;d;}' -e '/John/p' files

The 3rd line would require too many -e commands to fit on one line, since standard versions of sed require an -e command after each 'b' and also after each closing brace '}'.

5.3. Why does my DOS version of sed process a file part-way through and then quit?

First, look for errors in the script. Have you used the -n switch without telling sed to print anything to the console? Have you read the docs to your version of sed to see if it has a syntax you may have misused? (Look for an N or H command that gathers too much.)

Next, if you are sure your sed script is valid, a probable cause is an end-of-file marker embedded in the file. An EOF marker (SUB) is a Control-Z character, with the value of 1A hex (26 decimal). As soon as any DOS version of sed encounters a Ctrl-Z character, sed stops processing.

To locate the EOF character, use Vern Buerg's shareware file viewer LIST.COM <http://www.buerg.com/list.html>. In text mode, look for a right-arrow symbol; in hex mode (Alt-H), look for a 1A code. With Unix utilities ported to DOS, use 'od' (octal dump) to display hexcodes in your file, and then use sed to locate the offending character:

       od -txC badfile.txt | sed -n "/ 1a /p; / 1a$/p"

Then edit the input file to remove the offending character(s).

If you would rather NOT edit the input file, there is still a fix. It requires the DJGPP 32-bit port of 'tr', the Unix translate program (v1.22 or higher). GNU od and tr are currently at v2.0 (for DOS); they are packaged with the GNU text utilities, available at

       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/txt20b.zip
       http://www.simtel.net/gnudlpage.php?product=/gnu/djgpp/v2gnu/txt20b.zip&name=txt20b.zip

It is important to get the DJGPP version of 'tr' because other versions ported to DOS will stop processing when they encounter the EOF character. Use the -d (delete) command:

       tr -d \32 < badfile.txt | sed -f myscript.sed

5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs. stingy pattern matching")

The two most common causes for this problem are: (1) misusing the '.' metacharacter, and (2) misusing the '*' metacharacter. The RE '.*' is designed to be "greedy" (i.e., matching as many characters as possible). However, sometimes users need an expression which is "stingy," matching the shortest possible string.

(1) On single-line patterns, the '.' metacharacter matches any single character on the line. ('.' cannot match the newline at the end of the line because the newline is removed when the line is put into the pattern space; sed adds a newline automatically when the pattern space is printed.) On multi-line patterns obtained with the 'N' or 'G' commands, '.' will match a newline in the middle of the pattern space. If there are 3 lines in the pattern space, "s/.*//" will delete all 3 lines, not just the first one (leaving 1 blank line, since the trailing newline is added to the output).

Normal misuse of '.' occurs in trying to match a word or bounded field, and forgetting that '.' will also cross the field limits. Suppose you want to delete the first word in braces:

       echo {one} {two} {three} | sed 's/{.*}/{}/'       # fails
       echo {one} {two} {three} | sed 's/{[^}]*}/{}/'    # succeeds

's/{.*}/{}/' is not the solution, since the regex '.' will match any character, including the close braces. Replace the '.' with '[^}]', which signifies a negated character set '[^...]' containing anything other than a right brace. FWIW, we know that 's/{one}/{}/' would also solve our question, but we're trying to illustrate the use of the negated character set: [^anything-but-this].

A negated character set should be used for matching words between quote marks, for fields separated by commas, and so on. See also section 4.12 ("How do I parse a comma-delimited data file?").

(2) The '*' metacharacter represents zero or more instances of the previous expression. The '*' metacharacter looks for the leftmost possible match first and will match zero characters. Thus,

       echo foo | sed 's/o*/EEE/'

will generate 'EEEfoo', not 'fEEE' as one might expect. This is because /o*/ matches the null string at the beginning of the word.

After finding the leftmost possible match, the '*' is GREEDY; it always tries to match the longest possible string. When two or three instances of '.*' occur in the same RE, the leftmost instance will grab the most characters. Consider this example, which uses grouping '\(...\)' to save patterns:

       echo bar bat bay bet bit | sed 's/^.*\(b.*\)/\1/'

What will be displayed is 'bit', never anything longer, because the leftmost '.*' took the longest possible match. Remember this rule: "leftmost match, longest possible string, zero also matches."

5.5. What is CSDPMI*B.ZIP and why do I need it?

If you use MS-DOS outside of Windows and try to use GNU sed v1.18 or 3.02, you may encounter the following error message:

       no DPMI - Get csdpmi*b.zip

"DPMI" stands for DOS Protected Mode Interface; it's basically a means of running DOS in Protected Mode (as opposed to Real Mode), which allows programs to share resources in extended memory without conflicting with one another. Running HIMEM.SYS and EMM386.EXE is not enough. The "CSDPMI*B.ZIP" refers to files written by Charles Sandmann to provide DPMI services for 32-bit computers (i.e., 386SX, 386DX, 486SX, etc.). Download the binary file (the source code is also available):

       http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5b.zip  # binaries
       http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5s.zip  # source
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5b.zip # binaries
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5s.zip # source

and extract CWSDPMI.EXE, CWSDPR0.EXE and CWSPARAM.EXE from the ZIP file. Put all 3 CWS*.EXE files in the same directory as GSED.EXE and you're all set. There are DOC files enclosed, but they're nearly incomprehensible for the average computer user. (Another case of user-vicious documentation.)

If you're running Windows and you normally use a DOS session to run GNU sed (i.e., you get to a DOS prompt with a resizable window or you press Alt-Enter to switch to full-screen mode), you don't need the CWS*.EXE files at all, since Windows uses DPMI already.

5.6. Where are the man pages for GNU sed?

Prior to GNU sed v3.02, there weren't any. Until recently, man pages distributed with gsed were borrowed from old sources or from other compilations. None of them were "official." GNU sed v3.02 had the first real set of official man pages, and the documentation has greatly improved with GNU sed version 4.0, which now includes both man pages and textinfo pages.

5.7. How do I tell what version of sed I am using?

Try entering "sed" all by itself on the command line, followed by no arguments or parameters. Also, try "sed --version". In a pinch, you can also try this:

       strings sed | grep -i ver

Your version of 'strings' must be a version of the Unix utility of this name. It should not be the DOS utility STRINGS.COM by Douglas Boling.

5.8. Does sed issue an exit code?

Most versions of sed do not, but check the documentation that came with whichever version you are using. GNU sed issues an exit code of 0 if the program terminated normally, 1 if there were errors in the script, and 2 if there were errors during script execution.

5.9. The 'r' command isn't inserting the file into the text.

On most versions of sed (but not all), the 'r' (read) and 'w' (write) commands must be followed by exactly one space, then the filename, and then terminated by a newline. Any additional characters before or after the filename are interpreted as part of the filename. Thus

       /RE/r  insert.me

will would try to locate a file called ' insert.me' (note the leading space!). If the file was not found, most versions of sed say nothing, not even an error message.

When sed scripts are used on the command line, every 'r' and 'w' must be the last command in that part of the script. Thus,

       sed -e '/regex/{r insert.file;d;}' source         # will fail
       sed -e '/regex/{r insert.file' -e 'd;}' source    # will succeed

5.10. Why can't I match or delete a newline using the \n escape sequence? Why can't I match 2 or more lines using \n?

The \n will never match the newline at the end-of-line because the newline is always stripped off before the line is placed into the pattern space. To get 2 or more lines into the pattern space, use the 'N' command or something similar (such as 'H;...;g;').

Sed works like this: sed reads one line at a time, chops off the terminating newline, puts what is left into the pattern space where the sed script can address or change it, and when the pattern space is printed, appends a newline to stdout (or to a file). If the pattern space is entirely or partially deleted with 'd' or 'D', the newline is not added in such cases. Thus, scripts like

       sed 's/\n//' file       # to delete newlines from each line
       sed 's/\n/foo\n/' file  # to add a word to the end of each line

will never work, because the trailing newline is removed before the line is put into the pattern space. To perform the above tasks, use one of these scripts instead:

       tr -d '\n' < file              # use tr to delete newlines
       sed ':a;N;$!ba;s/\n//g' file   # GNU sed to delete newlines
       sed 's/$/ foo/' file           # add "foo" to end of each line

Since versions of sed other than GNU sed have limits to the size of the pattern buffer, the Unix 'tr' utility is to be preferred here. If the last line of the file contains a newline, GNU sed will add that newline to the output but delete all others, whereas tr will delete all newlines.

To match a block of two or more lines, there are 3 basic choices: (1) use the 'N' command to add the Next line to the pattern space; (2) use the 'H' command at least twice to append the current line to the Hold space, and then retrieve the lines from the hold space with x, g, or G; or (3) use address ranges (see section 3.3, above) to match lines between two specified addresses.

Choices (1) and (2) will put an \n into the pattern space, where it can be addressed as desired ('s/ABC\nXYZ/alphabet/g'). One example of using 'N' to delete a block of lines appears in section 4.13 ("How do I delete a block of specific consecutive lines?"). This example can be modified by changing the delete command to something else, like 'p' (print), 'i' (insert), 'c' (change), 'a' (append), or 's' (substitute).

Choice (3) will not put an \n into the pattern space, but it does match a block of consecutive lines, so it may be that you don't even need the \n to find what you're looking for. Since several versions of sed support this syntax:

       sed '/start/,+4d'  # to delete "start" plus the next 4 lines,

in addition to the traditional '/from here/,/to there/{...}' range addresses, it may be possible to avoid the use of \n entirely.

5.11. My script aborts with an error message, "event not found".

This error is generated by the csh or tcsh shells, not by sed. The exclamation mark (!) is special to csh/tcsh, and if you use it in command-line or shell scripts--even within single quotes--it must be preceded by a backslash. Thus, under the csh/tcsh shell:

       sed '/regex/!d'      # will fail
       sed '/regex/\!d'     # will succeed

The exclamation mark should not be prefixed with a backslash when the script is called from a file, as "-f script.file".


Site Links
  The Books I Own
  Main Page
  Vi in Emacs
  Linux on Vaio
  Study NZ
  Utilities
  Programming Fun?
  SED FAQ
  C Language
  Source Code Browsers
  C Struct Packing
  Walt Disney World
  PPP RFCs
  FSM/HSM
  Tcl/Tk
  Photographs of Flowers
  Random Photogaphs
  Put this on your site!
  SQLite
  The Sundial Bridge
  Repetitive Strain Injury (RSI)
  Selling Software Online (MicroISV)
  Tcl Tk Life-Savers
  The Experience Shows!
  Green Tips
  .htaccess tricks
  Web-Site Development Online Tools
  Blog
 

 

 

 


Site copyright of domain owner. All rights reserved.