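3 must-know Linux commands for text manipulation
System administrators use countless command-line tools, and you probably use the three discussed in this article often: grep, sed, and awk. But do you know all the ways you can use them to manipulate text? If not (or if you aren't sure), read on.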
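Before I begin, here is where the command names come from:
- grep: According to Wikipedia, the name "comes from the ed command g/re/p (globally search for a regular expression and print matching lines), which has the same effect." ed is a "line-oriented text editor." (Editing files line by line may seem dated even to people who like the command line, but in ancient times people had to start somewhere.)
- sed: The name comes from its main use, as a stream editor.
- awk: Its name comes from the initials of its authors (Aho, Weinberger, and Kernighan). If the name Kernighan rings a bell (pun intended), it's because this Canadian computer scientist contributed to the creation of Unix and co-authored the first book about the C language.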
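Tracing the family tree of these commands is nice, but what really matters is that they are very helpful for text manipulation.
In the examples below, I will use a file named quotes.txt to illustrate how to use these commands. Here are the contents of this file: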
$ cat quotes.txt
"God does not play dice with the universe."
- Albert Einstein, The Born-Einstein Letters 1916-55

"Not only does God play dice but... he sometimes throws them where they cannot be seen."
- Stephen Hawking

"I regard consciousness as fundamental..."
- Max Planck

"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
- Carl Sagan

"[T]he atoms or elementary particles themselves are not real; they form a world of potentialities or possibilities rather than one of things or facts."
- Werner Heisenberg
grep
The simplest way to use grep is:
$ grep universe quotes.txt
"God does not play dice with the universe."
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
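This example provides the string to search for (universe) and the place to look for it (quotes.txt).
If the string to search for contains spaces, you must enclose it in quotes: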
$ grep "the universe" quotes.txt
"God does not play dice with the universe."
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
Some common variations when using grep are:
- Ignore case:
grep -i string-to-search filename
- Search in multiple files:
grep -i string-to-search *.txt
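To see how these two variations behave together, here is a minimal sketch. The scratch directory and the files a.txt and b.txt are made up for illustration, and -l (print only the names of matching files) is an extra flag that the list above does not cover.

```shell
# Scratch directory with two hypothetical files (not part of quotes.txt).
workdir=$(mktemp -d)
printf '%s\n' 'God does not play dice with the universe.' > "$workdir/a.txt"
printf '%s\n' 'We are a way for the Universe to know itself.' > "$workdir/b.txt"

# Case-sensitive search: only a.txt contains lowercase "universe".
sensitive=$(cd "$workdir" && grep -l universe *.txt)

# Adding -i ignores case, so b.txt ("Universe") matches as well.
insensitive=$(cd "$workdir" && grep -il universe *.txt)

printf 'case-sensitive:\n%s\n' "$sensitive"
printf 'case-insensitive:\n%s\n' "$insensitive"

rm -r "$workdir"
```

Without -l, grep prefixes each matching line with the name of the file it came from whenever it searches more than one file.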
You can search for regular expressions:
$ grep "191[0-9]" quotes.txt
- Albert Einstein, The Born-Einstein Letters 1916-55
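If you want to enable extended regular expression mode to use symbols such as +, ?, or |, you can use the egrep command, which is a shortcut for adding the -E flag to grep (recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer spelling in scripts). This also lets you search for multiple strings: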
$ egrep -i "albe|hawk" quotes.txt
- Albert Einstein, The Born-Einstein Letters 1916-55
- Stephen Hawking
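As a small sketch of the same idea, the snippet below runs the alternation with grep -E and, for comparison, with two -e patterns, which needs no extended syntax. The scratch file is an assumption that reuses four lines from quotes.txt, and the "dice" pattern is my own illustration of the + quantifier.

```shell
# Scratch file reusing four lines from quotes.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
"God does not play dice with the universe."
- Albert Einstein, The Born-Einstein Letters 1916-55
"Not only does God play dice but... he sometimes throws them where they cannot be seen."
- Stephen Hawking
EOF

# Extended regex: | means "either pattern".
with_E=$(grep -E -i "albe|hawk" "$f")

# Same search without extended syntax: one -e per pattern.
with_e=$(grep -i -e albe -e hawk "$f")

# Extended quantifier and grouping: "dice", one or more spaces, then "with" or "but".
dice_lines=$(grep -E -c "dice +(with|but)" "$f")

printf '%s\n' "$with_E"
echo "lines matching the dice pattern: $dice_lines"

rm "$f"
```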
Show the lines that contain the word "universe" along with the line that follows each one (to include the author's name):
$ grep -i universe -A 1 quotes.txt
"God does not play dice with the universe."
- Albert Einstein, The Born-Einstein Letters 1916-55
--
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
- Carl Sagan
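As you may have guessed, you can show more lines by passing a different number. Alternatively, you can use the -B flag to show the lines before each match.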
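Here is a hedged sketch of both context flags on a scratch file. The file is an assumption that reuses four lines from quotes.txt, and -A and -B are widely supported extensions (GNU and BSD grep) rather than POSIX options.

```shell
# Scratch file reusing four lines from quotes.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
"God does not play dice with the universe."
- Albert Einstein, The Born-Einstein Letters 1916-55
"Not only does God play dice but... he sometimes throws them where they cannot be seen."
- Stephen Hawking
EOF

# -B 1 prints one line before each match: here, the quote above the author.
# The second grep drops the author line itself, leaving only the quote.
quote=$(grep -B 1 "^- Stephen Hawking" "$f" | grep -v "^- ")

# -A 2 prints two lines after each match, so three lines in total here.
after_count=$(grep -A 2 universe "$f" | wc -l | tr -d ' ')

printf '%s\n' "$quote"
echo "lines printed with -A 2: $after_count"

rm "$f"
```

If you want context on both sides of a match, -C takes a single number and applies it before and after.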
So far, I've shown grep running alone, but it is very common to have it in a chain of commands:
$ echo "Authors who mentioned 'universe'"; cat quotes.txt | grep -i universe -A 1 | grep "^- "
Authors who mentioned 'universe'
- Albert Einstein, The Born-Einstein Letters 1916-55
- Carl Sagan
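The same chain can be sketched without cat, since grep reads the file on its own. One detail to watch: grep prints a -- separator between non-adjacent groups of context lines, and that separator also starts with a dash, so the sketch filters on a dash followed by a space. The scratch file is an assumption that reuses six lines from quotes.txt.

```shell
# Scratch file reusing six lines from quotes.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
"God does not play dice with the universe."
- Albert Einstein, The Born-Einstein Letters 1916-55
"Not only does God play dice but... he sometimes throws them where they cannot be seen."
- Stephen Hawking
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
- Carl Sagan
EOF

# First grep: each matching quote plus the line after it (the author).
# Second grep: keep only author lines; "^- " skips the "--" group separator.
authors=$(grep -i -A 1 universe "$f" | grep "^- ")

echo "Authors who mentioned 'universe':"
printf '%s\n' "$authors"

rm "$f"
```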
sed
My favorite use for sed is to replace strings in files. For example:
$ cat quotes.txt | sed 's/universe/Universe/g'
This will replace universe with Universe and send the result to stdout. The g flag means "replace all occurrences of the string in each line."
Some variations for this are:
- Replace the string only if it's found in the first three lines:
sed '1,3 s/universe/Universe/g' quotes.txt
- Replace the n-th occurrence of a pattern in a line (for example, the second occurrence):
sed 's/universe/Universe/2' quotes.txt
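To make the difference between these forms visible, here is a minimal sketch that feeds sed a made-up line containing the word three times (the input is an assumption, not taken from quotes.txt).

```shell
# Hypothetical input with three occurrences on one line.
line='universe, universe, universe'

# No flag: only the first occurrence on each line is replaced.
first=$(printf '%s\n' "$line" | sed 's/universe/Universe/')

# Numeric flag: only the second occurrence is replaced.
second=$(printf '%s\n' "$line" | sed 's/universe/Universe/2')

# g flag: every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/universe/Universe/g')

# Address range 1,3: the substitution runs on lines 1 to 3 only,
# so the fourth input line is left alone. grep -c counts changed lines.
ranged=$(printf 'universe\nuniverse\nuniverse\nuniverse\n' \
  | sed '1,3 s/universe/Universe/g' | grep -c Universe)

printf '%s\n' "$first" "$second" "$all"
echo "lines changed by the 1,3 range: $ranged"
```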
These examples don't change the original file. If you want sed to change the file in place, use -i:
$ sed -i 's/universe/Universe/g' quotes.txt
If you use the -i flag, make sure that you know exactly what and how many occurrences will be affected, as it will modify the original file. To find out, you can run a grep and search for the pattern first.
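One cautious way to follow that advice is sketched below: count the affected lines with grep -c first, then let sed keep a backup by attaching a suffix to -i. The scratch file and the .bak suffix are my assumptions. The -i.bak spelling works with both GNU and BSD sed, whereas a bare -i behaves differently on BSD and macOS, where the flag expects a suffix argument.

```shell
# Scratch copy so the real quotes.txt is never touched.
workdir=$(mktemp -d)
f="$workdir/quotes.txt"
cat > "$f" <<'EOF'
"God does not play dice with the universe."
- Albert Einstein, The Born-Einstein Letters 1916-55
"The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
- Carl Sagan
EOF

# Dry run: how many lines contain the pattern (lines, not occurrences)?
lines_affected=$(grep -c universe "$f" || true)

# Edit in place, keeping the original as quotes.txt.bak.
sed -i.bak 's/universe/Universe/g' "$f"

# After the edit, the file has no lowercase matches; the backup still does.
remaining=$(grep -c universe "$f" || true)
backup_hits=$(grep -c universe "$f.bak" || true)

echo "before: $lines_affected, after: $remaining, in backup: $backup_hits"

rm -r "$workdir"
```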
awk
The awk utility is very powerful, offering many options for processing text files.
Most of the situations where I use awk involve processing files with a structure (columns) that is reasonably predictable, including the character used as a column separator.
When awk processes a file, it splits each line using the "field separator" (internal variable FS, which by default is a single space; with that default, awk splits on any run of spaces or tabs). Each field is assigned to a positional variable: $1 contains the first field, $2 contains the second, and so forth, while $0 represents the full line.
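A minimal sketch of that splitting, using a made-up input line with uneven spacing (the words are placeholders, not data from this article):

```shell
# Hypothetical line: one space, then three spaces between the words.
line='alpha beta   gamma'

# $1 is the first field.
first=$(printf '%s\n' "$line" | awk '{ print $1 }')

# $3 is the third field: the run of three spaces counts as one separator.
third=$(printf '%s\n' "$line" | awk '{ print $3 }')

# $0 is the whole line, with its original spacing untouched.
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')

echo "first=$first third=$third"
echo "whole=$whole"
```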
You can also apply filters to each line. For example:
$ cat quotes.txt | awk '/universe/ { print NR " - " $0 }'
1 - "God does not play dice with the universe."
10 - "The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself."
The commands passed to awk use single quotes (it is like passing a mini-program to be interpreted):
- The /universe/ part tells awk to select only the lines that match this pattern.
- The "main" program goes between the curly brackets.
- NR is the internal variable that contains the number of the current record, which here is the current line number.
- I added the " - " string for aesthetics.
Some of the internal variables in awk are:
- NR: The total number of input records seen so far by the command
- NF: The number of fields in the current input record
- FS: The input field separator (a space by default)
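The three variables can be seen side by side in a short sketch. The input lines are invented for illustration, and setting FS in a BEGIN block is one common way to change the separator before any input is read.

```shell
# NR and NF on two hypothetical records with different field counts.
report=$(printf 'a b c\nd e\n' | awk '{ print NR ": " NF " fields" }')

# In an END block, NR holds the total number of records read.
total=$(printf 'a b c\nd e\n' | awk 'END { print NR }')

# FS set to a comma before input is read: three fields, the second is "y".
csv=$(printf 'x,y,z\n' | awk 'BEGIN { FS="," } { print NF, $2 }')

printf '%s\n' "$report"
echo "total records: $total"
echo "comma-separated: $csv"
```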
Here is an example using a more "predictable" file format:
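$ cat /etc/passwd | awk 'BEGIN { FS=":" } /nologin/ { print $1 }'
(output omitted)
...
redis
akmods
cjdns
haproxy
systemd-oom
In this last example:
- BEGIN { FS=":" } sets the field separator to : instead of the default (space) before any input is read. Assigning FS inside the main block only takes effect from the next record on, so the first matching line would still be split on whitespace.
- /nologin/ selects only the lines that contain this pattern.
- print $1 prints the first field in each line (considering that the separator is :).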
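As a sketch of a close variant, the -F option sets the field separator from the command line. The three passwd-style entries below are hypothetical stand-ins so the example does not depend on the accounts present on your system, and the second command is my own stricter filter that tests only the last field (the login shell).

```shell
# Hypothetical passwd-style entries (not read from the real /etc/passwd).
sample=$(printf '%s\n' \
  'redis:x:990:990:Redis server:/var/lib/redis:/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'haproxy:x:188:188:haproxy:/var/lib/haproxy:/sbin/nologin')

# -F: sets the field separator before the first record is read.
names=$(printf '%s\n' "$sample" | awk -F: '/nologin/ { print $1 }')

# Stricter filter: match "nologin" only at the end of the last field.
names_strict=$(printf '%s\n' "$sample" | awk -F: '$NF ~ /nologin$/ { print $1 }')

printf '%s\n' "$names"
```

Matching on $NF avoids false positives when the word nologin happens to appear in another field, such as a comment.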
Learn more
Those were some simple examples for using grep, sed, and awk. If you read the man pages for each, you will notice plenty of additional parameters and uses for these handy commands.
For simple use cases and things you do only once in a while, it is always good to have tools like these in your toolbox.
If the required action is more complex, it is worth considering if these tools still make sense for you to use. For a corporate use case or managing "everything-as-code," I recommend using Ansible. Ansible modules have similar features that let you emulate the operations described above, with the advantage that Ansible modules usually have idempotency and that the full process will be documented somewhere (such as in your internal Git repo).