Understanding how sed works (pattern space and hold space)

Author: sealinger  Published: February 27, 2011  Category: Making a Living (混口饭吃)

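sed is a non-interactive stream editor. "Non-interactive" means that you give sed its editing commands on the command line and then look at the output on the screen, rather than editing in an interactive session. "Stream editor" means that sed reads only one line at a time from a file (or from standard input), applies the specified commands to that line, writes the result to the screen (unless automatic printing has been turned off and no explicit print command is used), and then reads the next line. The whole file flows through sed like water: processed line by line, and output line by line.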

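sed processes one line at a time. It stores the current line in a temporary buffer called the pattern space, runs the sed commands against the contents of that buffer, and when the commands are done sends the contents of the pattern space to the screen. It then empties the pattern space and moves on to the next line, repeating this until the end of the file.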
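
As a minimal sketch of this line-by-line cycle (the `sample` helper and the variable names are made up for illustration), the first command relies on the automatic print at the end of each cycle, and the second turns it off with -n and prints explicitly:

```shell
# Three-line sample input, fed to sed on stdin (illustrative helper).
sample() { printf 'line1\nline2\nline3\n'; }

# Default: each line is edited in the pattern space, then printed automatically.
edited=$(sample | sed 's/line/LINE/')

# -n turns automatic printing off; only the explicit p command produces output.
only_second=$(sample | sed -n '2p')

printf '%s\n' "$edited" "---" "$only_second"
```

The first command changes every line because the script runs once per cycle; the second prints only line2 because nothing else is ever sent to the output.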

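The pattern space is like a workshop: this is where sed works on the contents of the stream.
The hold space is like a warehouse: half-finished work is stored here temporarily (finished work can of course be stored here too).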

How sed Works:


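Read one line, strip its trailing newline, store it in the pattern space, and execute the editing commands.
When the commands are done, print the current pattern space unless the -n option was given, and add back the newline that was stripped.
Empty the pattern space. The hold space is not touched automatically: it keeps its contents between cycles and changes only through commands such as h, H, g, G and x.
Read the next line and repeat the cycle.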
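
A small sketch of how the two buffers behave across cycles, assuming a three-line input (the `input` helper and variable names are illustrative only). The first command stores line1 in the hold space and shows it is still there on the last line. The second never touches the hold space and shows it is still empty on the last line, so nothing is copied into it automatically:

```shell
input() { printf 'line1\nline2\nline3\n'; }

# 1h : on line 1, copy the pattern space into the hold space
# $G : on the last line, append the hold space to the pattern space
# $p : on the last line, print the pattern space (-n suppresses auto-print)
kept=$(input | sed -n '1h;$G;$p')
printf '%s\n' "$kept"

# No command writes to the hold space here, so on the last line G appends
# only a newline plus an empty hold space: line3 followed by a blank line.
untouched_lines=$(input | sed -n '${G;p;}' | wc -l)
echo "lines printed by the second command: $untouched_lines"
```

The first command prints line3 followed by line1, which suggests that the pattern space was replaced on every cycle while the hold space kept what h put there.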

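One corner case: if a file consists of a single line with no trailing newline, sed just prints it and does not add a newline at the end. But if more output is then appended after it, sed first supplies the missing newline.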
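
This corner case can be checked by counting output bytes. The sketch below assumes GNU sed; other implementations may always terminate the last line with a newline, in which case the counts differ:

```shell
# Input "abc" has no trailing newline.
# With an empty script sed just prints it; GNU sed adds no newline: 3 bytes.
plain_bytes=$(printf 'abc' | sed '' | wc -c)

# p prints the line once more before the automatic print. GNU sed now emits
# the missing newline between the two copies, but still none after the last
# one, so the output is "abc", newline, "abc": 7 bytes.
doubled_bytes=$(printf 'abc' | sed 'p' | wc -c)

echo "plain: $plain_bytes bytes, doubled: $doubled_bytes bytes"
```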

A classic example, explained:


The explanation below is short and concise, but you can use it as a guideline to help you understand sed commands.

Where SED buffers data

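SED maintains two data buffers: the active pattern space and the auxiliary hold space. In "normal" operation, SED reads one line from the input stream into the pattern space, which is where text editing takes place. The hold space is initially empty, but there are commands for moving data between the pattern space and the hold space.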
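
One quick way to see that the hold space starts out empty is the G command, which appends a newline and the hold space to the pattern space. In this sketch (the variable name `out` is illustrative) nothing ever writes to the hold space, so each line simply gains an empty line after it:

```shell
# G appends "\n" + hold space to the pattern space on every line.
# The hold space is never filled, so every input line is followed by a blank line.
out=$(printf 'line1\nline2\n' | sed 'G')
printf '%s\n' "$out"
```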

Here is a small experiment with SED's "x" command:

'x'  - exchange the contents of the pattern space and the hold space

Take a file containing three lines:

#cat file
line1
line2
line3
#

After applying the SED x command:
#sed 'x' file

line1
line2
#

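Explanation:

#sed 'x' file
         <-- The first output line is empty because x exchanged the pattern space with the hold space, and the hold space is empty at the start. After the first line has been processed, the hold space contains line1.
line1    <-- The second output line is line1; the hold space now contains line2, and so on. ^_^
line2    <-- The third output line is line2; line3 is left in the hold space and is never printed.
#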
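
To make the leftover line visible, a small variation of the experiment (a sketch only; `out` is an illustrative name) appends the hold space on the last line with `$G`. The output then shows that line3 was still sitting in the hold space when the input ran out:

```shell
# x  : swap pattern space and hold space on every line
# $G : on the last line, append the hold space (which now holds line3)
out=$(printf 'line1\nline2\nline3\n' | sed 'x;$G')
printf '%s\n' "$out"
```

The output is an empty line followed by line1, line2 and line3.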

------------------

Commands that operate on the pattern space and the hold space:


$ man sed
       d      Delete pattern space.  Start next cycle.
              The pattern space is not printed, and the rest of the script is skipped for this line.

       h H    Copy/append pattern space to hold space.
              h overwrites the hold space; H adds a newline to it and then appends the pattern space.
       g G    Copy/append hold space to pattern space.
              g overwrites the pattern space; G adds a newline to it and then appends the hold space.
       x      Exchange the contents of the hold and pattern spaces.
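
A common way to combine these commands is to collect the whole input in the hold space and process it once at the end. The following is a sketch with an illustrative variable name `joined`; it joins all lines with commas:

```shell
# 1h  : line 1 starts the collection (overwrite the hold space)
# 1!H : every later line is appended to the hold space after a newline
# $   : on the last line, g copies the collection back into the pattern
#       space, s replaces the embedded newlines with commas, p prints it
joined=$(printf 'line1\nline2\nline3\n' | sed -n '1h;1!H;${g;s/\n/,/g;p;}')
printf '%s\n' "$joined"
```

Using h for the first line rather than H everywhere avoids a leading newline, since H always inserts a newline before the appended text.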

Exercises:


1) Swap the contents of lines 1 and 2


$ sed -n '1{h;n;x;H;x};p' filename
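
A traced run of the swap on a three-line input, as a sketch. The variable names are illustrative, a `;` is added before the closing brace because some non-GNU seds are stricter about it, and the shorter `1{h;d;};2G` form is an alternative offered here, not part of the original exercise:

```shell
input() { printf 'line1\nline2\nline3\n'; }

# On line 1:
#   h : hold = line1
#   n : read the next line, pattern = line2 (nothing printed because of -n)
#   x : pattern = line1, hold = line2
#   H : hold = line2\nline1
#   x : pattern = line2\nline1
# p then prints the pattern space for this and every later line.
swapped=$(input | sed -n '1{h;n;x;H;x;};p')

# Alternative sketch: keep line 1 in the hold space, delete it from the
# output, and append it after line 2.
swapped_short=$(input | sed '1{h;d;};2G')

printf '%s\n' "$swapped"
```

Both commands print line2, line1, line3 for this input.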

2) Implement tac with sed


$ sed -n -e '1!G;h;$p' filename

$ sed -e '1!G;h;$!d' filename

Both forms are equivalent to tac filename.
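
The two scripts can be traced on a three-line input. This is a sketch with illustrative variable names; it compares against a literal expected string instead of calling tac, which is not installed everywhere:

```shell
input() { printf 'line1\nline2\nline3\n'; }

# 1!G : on every line except the first, append the hold space
# h   : copy the pattern space (lines seen so far, newest first) to hold
# $p  : on the last line, print the accumulated pattern space
#   line1: pattern = line1                 hold = line1
#   line2: pattern = line2\nline1          hold = line2\nline1
#   line3: pattern = line3\nline2\nline1   printed by $p
reversed_a=$(input | sed -n '1!G;h;$p')

# Same accumulation, but without -n: $!d deletes the pattern space on every
# line except the last, so only the final accumulated result is auto-printed.
reversed_b=$(input | sed '1!G;h;$!d')

printf '%s\n' "$reversed_a"
```

Note that the whole file ends up in memory, so this approach is best suited to small inputs.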

---------------------------------

References:

# info sed

File: sed.info,  Node: Execution Cycle,  Next: Addresses,  Up: sed Programs

How `sed' Works
===============

   `sed' maintains two data buffers: the active _pattern_ space, and
the auxiliary _hold_ space. Both are initially empty.

   `sed' operates by performing the following cycle on each line of
input: first, `sed' reads one line from the input stream, removes any
trailing newline, and places it in the pattern space.  Then commands
are executed; each command can have an address associated to it:
addresses are a kind of condition code, and a command is only executed
if the condition is verified before the command is to be executed.

   When the end of the script is reached, unless the `-n' option is in
use, the contents of pattern space are printed out to the output
stream, adding back the trailing newline if it was removed.(1) Then the
next cycle starts for the next input line.

   Unless special commands (like `D') are used, the pattern space is
deleted between two cycles. The hold space, on the other hand, keeps
its data between cycles (see commands `h', `H', `x', `g', `G' to move
data between both buffers).


 

-------------------

The following explanation is short but concise; you can treat it as a guideline when you study the sed commands.

Where SED buffers data
SED maintains two data buffers: the active pattern space, and the auxiliary hold space. In "normal" operation, SED reads in one line from the input stream and places it in the pattern space. This pattern space is where text manipulations occur. The hold space is initially empty, but there are commands for moving data between the pattern and hold spaces.

Here is a small exercise with the SED command "x":
'x'  - Exchange the contents of the hold and pattern spaces.

Say a file contains the following 3 lines:
#cat file
line1
line2
line3
#
After applying the 'x' command, the output is as follows:
#sed 'x' file

line1
line2
#
Explanation:

            <-- The first line is empty because the hold space and the pattern space exchange contents, and the hold space is initially empty. After the first line has been processed, the hold space contains line1.
line1     <-- The second output line is line1; the hold space now contains line2, and so on. ^_^
line2


5 comments

  1. Interesting...

  2. bells

    hi
    In "2) Implement tac with sed", what does the "!" (exclamation mark) mean? Does it negate the address?

    1. Yes. "1!G" means: run G on every line except the first.

  3. jaze

    In 1!G;h;$p', what does $p mean?

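    1. $p prints the contents of the pattern space when the last line is reached. In this script the pattern space by then holds every line in reverse order, so that single print outputs the whole reversed file.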
