Getting Started with the AWK Command [Beginner's Guide]
The AWK command dates back to the early Unix days. It is part of the POSIX standard and should be available on any Unix-like system. And even beyond that.
While AWK is sometimes dismissed because of its age or its perceived lack of features compared to multi-purpose languages like Perl, it remains a tool I like to use in my everyday work. Sometimes I use it to write relatively complex programs, but also because you can write powerful one-liners to solve problems with your data files.
That is precisely the purpose of this article: showing you how to leverage the power of AWK in fewer than 80 characters to perform useful tasks. This article is not intended to be a complete AWK tutorial, but I still include some basic commands at the start, so even if you have little or no previous experience, you can grasp the core AWK concepts.
The sample file for this AWK tutorial
All the one-liners described in this article will be tested against the same data file:
cat file
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
You can get a copy of that file online on GitHub.
Understanding predefined and automatic variables in AWK
AWK supports a couple of predefined and automatic variables to help you write your programs. Among them, you will often encounter:
RS – the record separator. AWK processes your data one record at a time. The record separator is the delimiter used to split the input data stream into records. By default, this is the newline character. So, if you don't change it, a record is one line of the input file.
NR – the current input record number. If you use the standard newline delimiter for your records, this matches the current input line number.
FS/OFS – the characters used as the field separator. Once AWK reads a record, it splits it into different fields based on the value of FS. When AWK prints a record on the output, it rejoins the fields, but this time using the OFS separator instead of FS. Usually, FS and OFS are the same, but this is not mandatory. "Whitespace" is the default value for both of them.
NF – the number of fields in the current record. If you use the standard "whitespace" delimiter for your fields, this matches the number of words in the current record.
There are other more or less standard AWK variables available, so it is worth checking the manual of your particular AWK implementation for more details. However, this subset is already enough to start writing interesting one-liners.
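A tiny self-contained sketch (using inline data rather than the sample file, so the numbers are easy to check) can make these variables concrete:

```shell
# For each record, print the record number (NR), the number of
# fields (NF), and the first field ($1). FS is set to a comma
# via a command-line assignment.
printf '%s\n' 'a,b,c' 'd,e' | awk '{ print NR, NF, $1 }' FS=,
```

This prints `1 3 a` then `2 2 d`: two records, split into three and two comma-separated fields respectively.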
A. Basic usage of the AWK command
1. Printing all lines
This example is mostly useless, but it is nevertheless a good introduction to the AWK syntax:
awk '1 { print }' file
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
An AWK program is made of one or several pattern { action } statements.
If, for a given record ("line") of the input file, the pattern evaluates to a non-zero value (the equivalent of "true" in AWK), the commands in the corresponding action block are executed. In the example above, since 1 is a non-zero constant, the { print } action block is executed for every input record.
Another trick: if you do not explicitly specify an action block, AWK will use { print } by default. So the command above can be shortened to:
awk 1 file
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
Almost as useless, the following AWK program will consume its input but will not produce anything on the output:
awk 0 file
2. Removing the file header
awk 'NR>1' file
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
Remember this is equivalent to writing explicitly:
awk 'NR>1 { print }' file
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
This one-liner writes all the records of the input file except the first one, since for that record the condition is 1>1, which is obviously not true.
Since this program uses the default value for RS, in practice it will discard the first line of the input file.
3. Printing lines in a range
This is just a generalization of the previous example, and it does not deserve much explanation, except to say that && is the logical "and" operator:
awk 'NR>1 && NR < 4' file
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
4. Removing whitespace-only lines
awk 'NF' file
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
AWK splits each record into fields based on the field separator specified in the FS variable. The default field separator is one or several whitespace characters (i.e., spaces or tabs). With those settings, any record containing at least one non-whitespace character will contain at least one field.
In other words, the only case where NF is 0 ("false") is when the record contains only spaces. So, this one-liner will only print records containing at least one non-space character.
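A quick check with inline data (not the sample file) shows how NF filters out both the whitespace-only and the completely empty lines:

```shell
# Only lines with at least one field, i.e. at least one
# non-whitespace character, evaluate to "true" and are printed.
printf '%s\n' 'a b' '   ' '' 'c' | awk 'NF'
```

The whitespace-only line and the empty line are dropped; `a b` and `c` remain.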
5. Removing all blank lines
awk '1' RS='' file
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
This one-liner is based on an obscure POSIX rule specifying that if RS is set to the empty string, "records are separated by sequences consisting of a <newline> plus one or more blank lines."
Worth mentioning, in POSIX terminology, a blank line is a completely empty line. Lines containing only whitespace do not count as "blank."
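This "paragraph mode" behavior can be observed with a minimal inline example (my own data, not the sample file):

```shell
# With RS set to the empty string, records are paragraphs separated
# by one or more blank lines; NR counts paragraphs, not lines.
printf 'a\n\n\nb\nc\n' | awk '{ print NR, $0 }' RS=
```

The two blank lines separate record 1 (`a`) from record 2, which spans the two lines `b` and `c`.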
6. Extracting fields
This is probably one of the most common use cases for AWK: extracting some columns of a data file.
awk '{ print $1, $3}' FS=, OFS=, file
CREDITS,USER
99,sylvain
52,sonia
52,sonia
25,sonia
10,sylvain
8,öle
,
,
,
17,abhishek
Here, I explicitly set both the input and output field separators to the comma. When AWK splits a record into fields, it stores the content of the first field into $1, the content of the second field into $2, and so on. I don't use it here, but worth mentioning, $0 is the entire record.
In this one-liner, you may have noticed I use an action block without a pattern. In that case, the pattern is assumed to be 1 ("true"), so the action block is executed for every record.
Depending on your needs, this may not produce what we want for blank or whitespace-only lines. In that case, this second version may be a little bit better:
awk 'NF { print $1, $3 }' FS=, OFS=, file
CREDITS,USER
99,sylvain
52,sonia
52,sonia
25,sonia
10,sylvain
8,öle
,
17,abhishek
In both cases, I passed custom values for FS and OFS on the command line. Another option is to use the special BEGIN block inside the AWK program to initialize those variables before the first record is read. So, depending on your taste, you may prefer to write it like this instead:
awk 'BEGIN { FS=OFS="," } NF { print $1, $3 }' file
CREDITS,USER
99,sylvain
52,sonia
52,sonia
25,sonia
10,sylvain
8,öle
,
17,abhishek
Worth mentioning, you can also use an END block to perform some tasks after the last record has been read, as we will see right away. That said, I admit this is far from perfect, since whitespace-only lines are still not handled nicely. We will soon see a possible solution, but before that, let's do a little bit of math...
7. Performing calculations on columns
AWK supports the standard arithmetic operators, and it converts values between text and numbers automatically, depending on the context. You can also use your own variables to store intermediate values. All of that allows you to write compact programs to perform calculations on data columns:
awk '{ SUM=SUM+$1 } END { print SUM }' FS=, OFS=, file
263
Or, equivalently, using the += shorthand syntax:
awk '{ SUM+=$1 } END { print SUM }' FS=, OFS=, file
263
Notice that AWK variables do not need to be declared before use. An undefined variable is assumed to hold the empty string, which, according to the AWK type conversion rules, is equal to the number 0. Thanks to that feature, I did not bother handling explicitly the cases where $1 contains text (in the header), spaces, or nothing at all. In all those cases, it counts as 0 and does not interfere with our summation. Things would have been different if I had performed a multiplication, of course. So, why not use the comments section to propose a solution for that case?
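Since the text above invites solutions for the multiplication case, here is one possible sketch, only one of several reasonable approaches, shown on inline data rather than the sample file:

```shell
# Product of the first column, skipping the header and any
# non-numeric lines via the unary-plus pattern. PROD starts at 1
# in a BEGIN block: an uninitialized variable would convert to 0
# and zero out the whole product.
printf '%s\n' 'CREDITS,USER' '2,a' '3,b' '4,c' | \
  awk 'BEGIN { PROD=1 } +$1 { PROD*=$1 } END { print PROD }' FS=,
```

Here the header counts as 0 and is skipped by the pattern, so the result is 2*3*4 = 24.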
8. Counting the number of non-empty lines
I have already mentioned the END rule above. Here is another possible application, counting the number of non-empty lines in a file:
awk '/./ { COUNT+=1 } END { print COUNT }' file
9
Here, I use the COUNT variable and increment it (+=1) for each line matching the regular expression /./, that is, each line containing at least one character. Finally, the END block is used to display the final result once the whole file has been processed. There is nothing special about the name COUNT: I could have used Count, count, n, xxxx, or any other name following the AWK variable naming rules.
However, is that result correct? It depends on your definition of an "empty" line. If you consider that only blank lines (as per POSIX) are empty, then it is correct. But maybe you would prefer to also count whitespace-only lines as empty?
awk 'NF { COUNT+=1 } END { print COUNT }' file
8
This time the result is different, because this latter version also ignores whitespace-only lines, whereas the initial version only ignored blank lines. Can you see the difference? I let you figure it out by yourself. Don't hesitate to use the comments section if this isn't clear enough!
Finally, if I am only interested in the data lines, and given my particular input data file, I could write this:
awk '+$1 { COUNT+=1 } END { print COUNT }' file
7
It works because of the AWK type conversion rules: the unary plus in the pattern forces the evaluation of $1 in a numerical context. In my file, data records contain a number in their first field, whereas non-data records (the header, blank lines, whitespace-only lines) contain text or nothing at all. When converted to numbers, they are all equal to 0.
Notice that with this latest solution, records of users who ended up with 0 credits would be discarded too.
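The conversion rule can be checked in isolation with inline data:

```shell
# Unary plus forces a numeric context: text converts to 0,
# numeric strings keep their value, and a whitespace-only line
# has an empty $1, which also converts to 0.
printf '%s\n' 'CREDITS' '99' '  ' | awk '{ print +$1 }'
```

The three input lines produce 0, 99, and 0 respectively.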
B. Using arrays in AWK
Arrays are a powerful feature of AWK. All arrays in AWK are associative arrays, so they allow associating an arbitrary string with another value. If you are familiar with other programming languages, you may know them as hashes, associative tables, dictionaries, or maps.
9. A simple AWK array example
Let's imagine I want to know the total credits per user. I can store an entry for each user in an associative array, and each time I encounter a record for that user, I increment the corresponding value stored in the array.
awk '+$1 { CREDITS[$3]+=$1 }
END { for (NAME in CREDITS) print NAME, CREDITS[NAME] }' FS=, file
abhishek 17
sonia 129
öle 8
sylvain 109
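The same associative-array pattern works for counting occurrences per key. Here is a minimal sketch on inline data; the output is piped through sort because the traversal order of `for (k in ...)` is unspecified:

```shell
# Count how many times each value appears in the first column.
printf '%s\n' 'a' 'b' 'a' |
  awk '{ N[$1]++ } END { for (k in N) print k, N[k] }' | sort
```

This prints `a 2` and `b 1`, one line per distinct key.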
I admit this is no longer a one-liner, mostly because of the for loop used to display the content of the array once the file has been processed. So, let's now go back to shorter examples:
10. Identifying duplicate lines using AWK
Arrays, like other AWK variables, can be used both in action blocks and in patterns. By taking advantage of that, we can write a one-liner printing only the duplicate lines:
awk 'a[$0]++' file
52,01 dec 2018,sonia,team
The ++ operator is the post-increment operator inherited from the C language family (of which AWK is a proud member, thanks to Brian Kernighan being one of its original authors).
As its name implies, the post-increment operator increments ("adds 1 to") a variable, but only after its value has been taken for the evaluation of the enclosing expression.
In that case, a[$0] is evaluated to see if the record will be printed, and once the decision has been made, in all cases, the array entry is incremented.
So, the first time a record is read, a[$0] is undefined, and thus equivalent to zero for AWK. That first record is therefore not written to the output. Then the entry is changed from zero to one.
The second time the same input record is read, a[$0] is now 1. That is "true," so the line will be printed. However, before that, the array entry is updated from 1 to 2. And so on.
11. 删除重复行
As a corollary of the previous one-liner, we may want to remove duplicate lines:
awk '!a[$0]++' file
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
The only difference is the use of the logical "not" operator (!), which reverses the truth value of the expression: what was false becomes true, and what was true becomes false. The logical not has absolutely no influence on the ++ post-increment, which works exactly as before.
C. Field and record separator magic
12. Changing the field separators
awk '$1=$1' FS=, OFS=';' file
CREDITS;EXPDATE;USER;GROUPS
99;01 jun 2018;sylvain;team:::admin
52;01 dec 2018;sonia;team
52;01 dec 2018;sonia;team
25;01 jan 2019;sonia;team
10;01 jan 2019;sylvain;team:::admin
8;12 jun 2018;öle;team:support
17;05 apr 2019;abhishek;guest
That program sets the FS and OFS variables to use a comma as the input field separator and a semicolon as the output field separator. Since AWK does not rebuild the output record as long as you did not change a field, the $1=$1 trick is used to force AWK to break the record and reassemble it using the output field separator.
Remember the default action block is { print }, so you could rewrite that more explicitly as:
awk '$1=$1 { print }' FS=, OFS=';' file
CREDITS;EXPDATE;USER;GROUPS
99;01 jun 2018;sylvain;team:::admin
52;01 dec 2018;sonia;team
52;01 dec 2018;sonia;team
25;01 jan 2019;sonia;team
10;01 jan 2019;sylvain;team:::admin
8;12 jun 2018;öle;team:support
17;05 apr 2019;abhishek;guest
You may have noticed both those examples are removing empty lines too. Why? Well, remember the AWK conversion rules: an empty string is "false"; all other strings are "true." The expression $1=$1 is an assignment that alters $1. However, it is an expression too, and it evaluates to the value of $1, which is "false" for the empty string. If you really want all the lines, you may need to write something like this instead:
awk '($1=$1) || 1 { print }' FS=, OFS=';' file
CREDITS;EXPDATE;USER;GROUPS
99;01 jun 2018;sylvain;team:::admin
52;01 dec 2018;sonia;team
52;01 dec 2018;sonia;team
25;01 jan 2019;sonia;team
10;01 jan 2019;sylvain;team:::admin
8;12 jun 2018;öle;team:support
17;05 apr 2019;abhishek;guest
Do you remember the && operator? It was the logical AND. || is the logical OR. The parentheses are necessary here because of the operator precedence rules. Without them, the pattern would have been erroneously interpreted as $1=($1 || 1) instead. I leave it as an exercise for you to test how the result would have been different then.
Finally, if you are not too keen on arithmetic, I bet you will prefer this simpler solution:
awk '{ $1=$1; print }' FS=, OFS=';' file
CREDITS;EXPDATE;USER;GROUPS
99;01 jun 2018;sylvain;team:::admin
52;01 dec 2018;sonia;team
52;01 dec 2018;sonia;team
25;01 jan 2019;sonia;team
10;01 jan 2019;sylvain;team:::admin
8;12 jun 2018;öle;team:support
17;05 apr 2019;abhishek;guest
13. Removing multiple spaces
awk '$1=$1' file
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
This is almost the same program as the preceding one; however, I left the field separators at their default values. So, runs of whitespace are used as the input field separator, but only one space is used as the output field separator. This has the nice side effect of coalescing multiple whitespace characters into one space.
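A minimal inline check of this behavior:

```shell
# Reassigning $1 forces AWK to rebuild the record: runs of
# whitespace (default FS) collapse to single spaces (default OFS),
# and leading/trailing whitespace disappears.
echo '  hello    world  ' | awk '$1=$1'
```

The output is simply `hello world`.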
14. Joining lines using AWK
We have already used OFS, the output field separator. As you may have guessed, it has an ORS counterpart to specify the output record separator:
awk '{ print $3 }' FS=, ORS=' ' file; echo
USER sylvain sonia sonia sonia sylvain öle abhishek
Here, I used a space after each record instead of a newline character. This one-liner is sufficient in some use cases, but it still has some drawbacks.
Most obviously, it does not discard whitespace-only lines (the extra spaces after öle come from that). So, I may end up using a plain regular expression instead:
awk '/[^[:space:]]/ { print $3 }' FS=, ORS=' ' file; echo
USER sylvain sonia sonia sonia sylvain öle abhishek
It is better now, but there is still a possible issue. It will be more obvious if we change the separator to something visible:
awk '/[^[:space:]]/ { print $3 }' FS=, ORS='+' file; echo
USER+sylvain+sonia+sonia+sonia+sylvain+öle+abhishek+
There is an extra separator at the end of the line, because the record separator is written after each record, including the last one.
To fix that, I will rewrite the program to display a custom separator before each record, starting from the second output record.
awk '/[^[:space:]]/ { print SEP $3; SEP="+" }' FS=, ORS='' file; echo
USER+sylvain+sonia+sonia+sonia+sylvain+öle+abhishek
Since I take care of adding the separator myself, I also set the standard AWK output record separator to the empty string. However, when you start dealing with separators or formatting yourself, it may be a sign you should consider using the printf function instead of the print statement, as we will see right now.
D. Field formatting
I have already mentioned the relationship between the AWK and C programming languages. Among other things, AWK inherits from the C standard library the powerful printf function, allowing great control over the formatting of the text sent to the output.
The printf function takes a format as its first argument, containing both plain text that will be output verbatim and wildcards used to format different sections of the output. The wildcards are identified by the % character, the most common being %s (for string formatting), %d (for integer formatting) and %f (for floating-point formatting). As this can be rather abstract, let's see an example:
awk '+$1 { printf("%s ", $3) }' FS=, file; echo
sylvain sonia sonia sonia sylvain öle abhishek
You may notice that, contrary to the print statement, the printf function does not use the OFS and ORS values. So, if you want some separator, you have to mention it explicitly, as I did by adding a space character at the end of the format string. This is the price to pay for having full control of the output.
While not a format specifier at all, this is an excellent occasion to introduce the \n notation, which can be used in any AWK string to represent a newline character.
awk '+$1 { printf("%s\n", $3) }' FS=, file
sylvain
sonia
sonia
sonia
sylvain
öle
abhishek
15. Producing tabular results
AWK enforces a record/field data format based on delimiters. However, using the printf function, you can also produce fixed-width tabular output, because each format specifier in a printf statement can accept an optional width parameter:
awk '+$1 { printf("%10s | %4d\n", $3, $1) }' FS=, file
sylvain | 99
sonia | 52
sonia | 52
sonia | 25
sylvain | 10
öle | 8
abhishek | 17
As you can see, by specifying the width of each field, AWK pads them to the left with spaces. For text, it is usually preferable to pad on the right, something that can be achieved using a negative width number. Also, for integers, we may like to pad fields with zeros instead of spaces. This can be obtained by using an explicit 0 before the field width:
awk '+$1 { printf("%-10s | %04d\n", $3, $1) }' FS=, file
sylvain | 0099
sonia | 0052
sonia | 0052
sonia | 0025
sylvain | 0010
öle | 0008
abhishek | 0017
16. Dealing with floating point numbers
The %f format does not deserve much explanation…
awk '+$1 { SUM+=$1; NUM+=1 } END { printf("AVG=%f",SUM/NUM); }' FS=, file
AVG=37.571429
… except maybe to say you almost always want to explicitly set the field width and precision of the displayed result:
awk '+$1 { SUM+=$1; NUM+=1 } END { printf("AVG=%6.1f",SUM/NUM); }' FS=, file
AVG= 37.6
Here, the field width is 6, which means the field will occupy the space of 6 characters (including the dot, padded with spaces on the left if needed, as usual). The .1 precision means we want to display the number with 1 digit after the decimal point. I let you guess what %06.1f would display instead.
E. Using string functions in AWK
In addition to the printf function, AWK contains a few other nice string manipulation functions. In that domain, modern implementations like GAWK have a richer set of built-in functions, at the price of lower portability. As for myself, I will stick here to just a few POSIX-defined functions that should work the same everywhere.
17. Converting text to upper case
I use this one a lot, because it handles internationalization issues nicely:
awk '$3 { print toupper($0); }' file
99,01 JUN 2018,SYLVAIN,TEAM:::ADMIN
52,01 DEC 2018,SONIA,TEAM
52,01 DEC 2018,SONIA,TEAM
25,01 JAN 2019,SONIA,TEAM
10,01 JAN 2019,SYLVAIN,TEAM:::ADMIN
8,12 JUN 2018,ÖLE,TEAM:SUPPORT
17,05 APR 2019,ABHISHEK,GUEST
As a matter of fact, this is probably the best and most portable solution to convert text to uppercase from the shell.
18. Changing part of a string
Using the substr function, you can split a string of characters at a given position. Here, I use it to capitalize only the first character of the third field:
awk '{ $3 = toupper(substr($3,1,1)) substr($3,2) } $3' FS=, OFS=, file
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,Sylvain,team:::admin
52,01 dec 2018,Sonia,team
52,01 dec 2018,Sonia,team
25,01 jan 2019,Sonia,team
10,01 jan 2019,Sylvain,team:::admin
8,12 jun 2018,Öle,team:support
17,05 apr 2019,Abhishek,guest
The substr function takes the initial string, the (1-based) index of the first character to extract, and the number of characters to extract. If that last argument is missing, substr takes all the remaining characters of the string.
So, substr($3,1,1) evaluates to the first character of $3, and substr($3,2) to the remaining ones.
19. Splitting fields in sub-fields
The AWK record-field data model is really nice. However, sometimes you want to split fields themselves into several parts based on some internal separator:
awk '+$1 { split($2, DATE, " "); print $1,$3, DATE[2], DATE[3] }' FS=, OFS=, file
99,sylvain,jun,2018
52,sonia,dec,2018
52,sonia,dec,2018
25,sonia,jan,2019
10,sylvain,jan,2019
8,öle,jun,2018
17,abhishek,apr,2019
Somewhat surprisingly, this works even if some of my fields are separated by more than one whitespace character. Mostly for historical reasons, when the separator is a single space, split considers that "the elements are separated by runs of whitespace," and not by just one. The FS special variable follows the same convention.
However, in the general case, a one-character separator string matches exactly that character. So, if you need something more complex, you have to remember the field separator is an extended regular expression.
As an example, let's see how the group field would be handled; it appears to be a multi-valued field using the colon as a separator:
awk '+$1 { split($4, GRP, ":"); print $3, GRP[1], GRP[2] }' FS=, file
sylvain team
sonia team
sonia team
sonia team
sylvain team
öle team support
abhishek guest
Whereas I would have expected up to two groups to be displayed per user, it shows only one for most of them. That issue is caused by the multiple consecutive occurrences of the separator. So, the solution is:
awk '+$1 { split($4, GRP, /:+/); print $3, GRP[1], GRP[2] }' FS=, file
sylvain team admin
sonia team
sonia team
sonia team
sylvain team admin
öle team support
abhishek guest
The slashes instead of the quotes denote the literal as a regular expression rather than a plain string, and the plus sign indicates this expression will match one or several occurrences of the preceding character. So, in that case, each separator is made of (the longest sequence of) one or several consecutive colons.
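Worth noting, split() also returns the number of elements produced, which this inline sketch shows:

```shell
# split() returns the element count; with /:+/ a whole run of
# colons counts as a single separator, so we get two elements.
echo 'team:::admin' | awk '{ n = split($0, g, /:+/); print n, g[1], g[n] }'
```

This prints `2 team admin`: two sub-fields, despite the three consecutive colons.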
20. Searching and replacing with AWK commands
Speaking of regular expressions, sometimes you want to perform a substitution like the sed s///g command, but only on one field. The gsub function is what you need in that case:
awk '+$1 { gsub(/ +/, "-", $2); print }' FS=, file
99 01-jun-2018 sylvain team:::admin
52 01-dec-2018 sonia team
52 01-dec-2018 sonia team
25 01-jan-2019 sonia team
10 01-jan-2019 sylvain team:::admin
8 12-jun-2018 öle team:support
17 05-apr-2019 abhishek guest
The gsub function takes a regular expression to search for, a replacement string, and the variable containing the text to be modified in place. If that latter argument is missing, $0 is assumed.
F. Working with external commands in AWK
Another great feature of AWK is that you can easily invoke external commands to process your data. There are basically two ways of doing it: using the system instruction to invoke a program and let it intermix its output with the AWK output stream, or using a pipe so AWK can capture the output of the external program for finer control of the result.
Those could be huge topics by themselves, but here are a few simple examples to show you the power behind those features.
21. Adding the date on top of a file
awk 'BEGIN { printf("UPDATED: "); system("date") } /^UPDATED:/ { next } 1' file
UPDATED: Thu Feb 15 00:31:03 CET 2018
CREDITS,EXPDATE,USER,GROUPS
99,01 jun 2018,sylvain,team:::admin
52,01 dec 2018,sonia,team
52,01 dec 2018,sonia,team
25,01 jan 2019,sonia,team
10,01 jan 2019,sylvain,team:::admin
8,12 jun 2018,öle,team:support
17,05 apr 2019,abhishek,guest
In that AWK program, I start by displaying the word "UPDATED:". Then the program invokes the external date command, which sends its result to the output, right after the text produced by AWK at that stage.
The rest of the AWK program just removes any UPDATED line possibly present in the file and prints all the other lines (with the 1 rule).
Notice the next statement: it aborts the processing of the current record. It is a standard way of ignoring some records of the input file.
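A minimal sketch of next with inline data makes the control flow explicit:

```shell
# Records matching /^#/ are skipped entirely: 'next' aborts the
# remaining rules for the current record and moves on to the next one.
printf '%s\n' '# comment' 'data' | awk '/^#/ { next } { print "kept:", $0 }'
```

Only the non-comment line reaches the second rule, printing `kept: data`.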
22. Modifying a field externally
For more complex cases, you may need to consider the | getline VARIABLE idiom of AWK:
awk '+$1 { CMD | getline $5; close(CMD); print }' CMD="uuid -v4" FS=, OFS=, file
99,01 jun 2018,sylvain,team:::admin,5e5a1bb5-8a47-48ee-b373-16dc8975f725
52,01 dec 2018,sonia,team,2b87e9b9-3e75-4888-bdb8-26a9b34facf3
52,01 dec 2018,sonia,team,a5fc22b5-5388-49be-ac7b-78063cbbe652
25,01 jan 2019,sonia,team,3abb0432-65ef-4916-9702-a6095f3fafe4
10,01 jan 2019,sylvain,team:::admin,592e9e80-b86a-4833-9e58-1fe2428aa2a2
8,12 jun 2018,öle,team:support,3290bdef-fd84-4026-a02c-46338afd4243
17,05 apr 2019,abhishek,guest,e213d756-ac7f-4228-818f-1125cba0810f
This will run the command stored in the CMD variable, read the first line of that command's output, and store it into the variable $5.
Pay special attention to the close statement. It is crucial here because we want AWK to create a new instance of the external command each time it executes the CMD | getline statement. Without the close statement, AWK would instead try to read several lines of output from the same command instance.
23. Invoking dynamically generated commands
Commands in AWK are just plain strings without anything special; it is the pipe operator that triggers the execution of external programs. So, if you need to, you can dynamically build arbitrarily complex commands by using the AWK string manipulation functions and operators.
awk '+$1 { cmd = sprintf(FMT, $2); cmd | getline $2; close(cmd); print }' FMT='date -I -d "%s"' FS=, file
99 2018-06-01 sylvain team:::admin
52 2018-12-01 sonia team
52 2018-12-01 sonia team
25 2019-01-01 sonia team
10 2019-01-01 sylvain team:::admin
8 2018-06-12 öle team:support
17 2019-04-05 abhishek guest
We have already met the printf function. sprintf is very similar, but it returns the built string rather than sending it to the output.
24. Joining data
To show you the purpose of the close statement, I let you try out that last example:
awk '+$1 { CMD | getline $5; print }' CMD='od -vAn -w4 -t x /dev/urandom' FS=, file
99 01 jun 2018 sylvain team:::admin 1e2a4f52
52 01 dec 2018 sonia team c23d4b65
52 01 dec 2018 sonia team 347489e5
25 01 jan 2019 sonia team ba985e55
10 01 jan 2019 sylvain team:::admin 81e9a01c
8 12 jun 2018 öle team:support 4535ba30
17 05 apr 2019 abhishek guest 80a60ec8
Contrary to the example above using the uuid command, here only one instance of the od process is launched for the whole lifespan of the AWK program, and while processing each record, we read one more line of the output of that same od process.
Conclusion
Of course, this quick tour of AWK cannot replace a full course or tutorial on the tool. However, for those of you who were not familiar with it, I hope it gave you enough inspiration to add AWK to your toolbox right away.
On the other hand, if you are already an AWK aficionado, you may have found here some tricks you can use to be more efficient, or simply to impress your friends.
However, I do not pretend to have been exhaustive. So, in any case, don't hesitate to share your favorite AWK one-liner or any other AWK trick using the comments section below!