Handling line wrapping with the fold and fmt commands in the Linux terminal
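When you use a word processor, formatting text so that lines fit the space available on the target device should not be an issue. But when working at the terminal, things are not so easy.
Of course, you can always break lines by hand in your favorite text editor, but that is rarely desirable, and it is out of the question for automated processing.
Fortunately, the POSIX fold utility and the GNU/BSD fmt command can help you reflow text so that lines do not exceed a given length.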
What is a “line” in Unix, again?
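Before going into the details of the fold and fmt commands, let's first define what we are talking about. In a text file, a line is made of an arbitrary number of characters followed by the special newline control sequence (sometimes called EOL, for end of line).
On Unix-like systems, the end-of-line control sequence is made of the (one and only) line feed character, sometimes abbreviated LF or written \n following a convention inherited from the C language. At the binary level, the line feed character is represented as a byte holding the hexadecimal value 0a.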
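You can easily check that with the hexdump utility, which is used a lot in this article, so this is a good occasion to become familiar with the tool. For example, you can examine the hexadecimal dumps below to find how many newline characters each echo command sent. Once you think you have the solution, just retry those commands without the | hexdump -C part to see if you guessed correctly.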
sh$ echo hello | hexdump -C
00000000 68 65 6c 6c 6f 0a |hello.|
00000006
sh$ echo -n hello | hexdump -C
00000000 68 65 6c 6c 6f |hello|
00000005
sh$ echo -e 'hello\n' | hexdump -C
00000000 68 65 6c 6c 6f 0a 0a |hello..|
00000007
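It is worth mentioning that different operating systems may follow different rules regarding the newline sequence. As we have seen above, Unix-like operating systems use the line feed character, but Windows, like most Internet protocols, uses two characters: the carriage return + line feed pair (CRLF, or 0d 0a, or \r\n). On “classic” Mac OS (up to and including Mac OS 9.2 in the early 2000s), Apple computers used CR alone as the newline character. Other legacy computers used the LF+CR pair, or even completely different byte sequences in the case of older ASCII-incompatible systems. Fortunately, the latter are relics of the past, and I doubt you will see any EBCDIC computer in use today!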
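When a file does come from one of those systems, the extra carriage returns count as characters for every tool discussed here. As a minimal sketch (the function name crlf_to_lf is only an illustration, and od is used instead of hexdump because it is specified by POSIX), one way to normalize CRLF line endings before further processing could be:

```shell
# Hypothetical helper: strip every carriage return (0d) from standard input.
# Caveat: this also removes any lone CR that is not part of a CRLF pair.
crlf_to_lf() {
    tr -d '\r'
}

# A two-line sample with Windows-style line endings, before and after.
printf 'one\r\ntwo\r\n' | od -An -tx1
printf 'one\r\ntwo\r\n' | crlf_to_lf | od -An -tx1
```

Comparing the two dumps should show the 0d bytes disappearing while the 0a bytes stay in place.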
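Speaking of history, if you are curious, the use of the “carriage return” and “line feed” control characters dates back to the Baudot code used in the teletypewriter era. You may have seen teletypewriters depicted in old movies as the interface to room-sized computers. But even before that, teletypewriters were used “standalone” for point-to-point or multipoint communication. At that time, a typical terminal looked like a heavy typewriter with a mechanical keyboard, paper, and a moving carriage holding the print head. To start a new line, the carriage had to be brought back to the far left, and the paper had to move upward by rotating the platen (sometimes called the “cylinder”). Those two moves were controlled by two independent electromechanical systems, and the line feed and carriage return control characters were wired directly to those two parts of the device. Since moving the carriage takes more time than rotating the platen, it was logical to initiate the carriage return first. Separating the two functions also had a couple of interesting side effects, such as allowing overprinting (by sending only the CR) or efficiently transmitting “double spacing” (one CR + two LF).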
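The definition given at the start of this section mostly describes what a logical line is. Most of the time, though, that “arbitrarily long” logical line must be sent to a physical device, such as a screen or a printer, where the available space is limited. Displaying short logical lines on a device with longer physical lines is not an issue: there is simply unused space to the right of the text. But what if you try to display a line of text longer than the space available on the device? There are two solutions, each with its drawbacks: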
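- First, the device can truncate lines at its physical size, hiding part of the content from the user. Some printers do that, especially dumb printers (yes, basic dot-matrix printers are still in use today, especially in harsh or dirty environments!)
- The second option is to split long logical lines into several physical lines. This is called line wrapping because lines appear to wrap around the available space, an effect that is especially visible if you can resize the display, as when working with a terminal emulator.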
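These automatic behaviors are quite useful, but there are still times when you want to break long lines at a given position regardless of the physical size of the device. For example, you may want line breaks to occur at the same position on both the screen and the printer. Or you may want to use the text in an application that does not perform line wrapping (for example, if you programmatically embed text in an SVG file). Finally, believe it or not, many communication protocols still impose a maximum line width in transmission, including popular ones like IRC and SMTP (if you have ever seen the error 550 Maximum line length exceeded, you know what I'm talking about). So there are plenty of occasions where you need to break long lines into smaller chunks. This is the job of the POSIX fold command.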
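Before folding anything, it can help to know whether a file contains overlong lines at all. The following is only a sketch: the function name long_lines is hypothetical, and depending on your awk implementation and locale, length may count bytes rather than characters.

```shell
# Hypothetical helper: report the lines of standard input longer than MAX.
# Usage: long_lines MAX < file
# Prints "line-number: length" for every offending line.
long_lines() {
    awk -v max="$1" 'length($0) > max { print NR ": " length($0) }'
}

# Only the second line exceeds 10 characters.
printf 'short\na much longer line\n' | long_lines 10
```

An empty output means no line exceeds the limit, so there is nothing for fold to do at that width.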
The fold command
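When used without any option, the fold command adds extra newline control sequences to ensure no line exceeds the 80-character limit. Just to make it clear, a line will contain at most 80 characters plus the newline sequence.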
If you have downloaded the support material for this article, you can try that by yourself:
sh$ fold POSIX.txt | head -5
The Portable Operating System Interface (POSIX)[1] is a family of standards spec
ified by the IEEE Computer Society for maintaining compatibility between operati
ng systems. POSIX defines the application programming interface (API), along wit
h command line shells and utility interfaces, for software compatibility with va
riants of Unix and other operating systems.[2][3]
# Using AWK to prefix each line by its length:
sh$ fold POSIX.txt | awk '{ printf("%3d %s\n", length($0), $0) }'
80 The Portable Operating System Interface (POSIX)[1] is a family of standards spec
80 ified by the IEEE Computer Society for maintaining compatibility between operati
80 ng systems. POSIX defines the application programming interface (API), along wit
80 h command line shells and utility interfaces, for software compatibility with va
49 riants of Unix and other operating systems.[2][3]
0
80 The standards emerged from a project that began circa 1985. Richard Stallman sug
80 gested the name POSIX to the IEEE instead of former IEEE-IX. The committee found
71 it more easily pronounceable and memorable, and thus adopted it.[2][4]
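If you do not have the sample file at hand, the same behavior can be observed on generated input. This sketch builds a single 200-character line with printf (a run of zeros, chosen only for convenience) and prints the length of each line fold produces:

```shell
# One 200-character line, folded at the default width of 80.
printf '%0200d\n' 0 | fold | awk '{ print length($0) }'

# The same line, folded at an explicit width of 50.
printf '%0200d\n' 0 | fold -w 50 | awk '{ print length($0) }'
```

With the default width, the lengths should be 80, 80 and 40; with -w 50, four lines of 50 characters each.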
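You can change the maximum output line length with the -w option. Probably more interesting is the -s option, which ensures lines are broken at a word boundary. Let's compare the result with and without the -s option when applied to the second paragraph of our sample text: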
# Without `-s` option: fold will break lines at the specified position
# Broken lines have exactly the required width
sh$ awk -vRS='' 'NR==2' POSIX.txt |
fold -w 30 | awk '{ printf("%3d %s\n", length($0), $0) }'
30 The standards emerged from a p
30 roject that began circa 1985.
30 Richard Stallman suggested the
30 name POSIX to the IEEE instea
30 d of former IEEE-IX. The commi
30 ttee found it more easily pron
30 ounceable and memorable, and t
21 hus adopted it.[2][4]
# With `-s` option: fold will break lines at the last space before the specified position
# Broken lines are shorter or equal to the required width
sh$ awk -vRS='' 'NR==2' POSIX.txt |
fold -s -w 30 | awk '{ printf("%3d %s\n", length($0), $0) }'
29 The standards emerged from a
25 project that began circa
23 1985. Richard Stallman
28 suggested the name POSIX to
27 the IEEE instead of former
29 IEEE-IX. The committee found
29 it more easily pronounceable
24 and memorable, and thus
17 adopted it.[2][4]
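Obviously, if your text contains words longer than the maximum line length, the fold command will not be able to honor the -s flag. In that case, the fold utility breaks oversized words at the maximum position, always ensuring that no line exceeds the maximum allowed width.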
sh$ echo "It's Supercalifragilisticexpialidocious!" | fold -sw 10
It's
Supercalif
ragilistic
expialidoc
ious!
Multibyte characters
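Like most, if not all, of the core utilities, the fold command was designed at a time when one character was equivalent to one byte. That is no longer the case in modern computing, especially with the widespread adoption of UTF-8, and it leads to some unfortunate issues: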
# Just in case, check first the relevant locale
# settings are properly defined
debian-9.4$ locale | grep LC_CTYPE
LC_CTYPE="en_US.utf8"
# Everything is OK, unfortunately...
debian-9.4$ echo élève | fold -w2
é
l�
�v
e
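The word “élève” (“student” in French) contains two accented letters: é (Latin small letter E with acute) and è (Latin small letter E with grave). In the UTF-8 character set, those letters are encoded using two bytes each (respectively c3 a9 and c3 a8), instead of only one byte as for non-accented Latin letters. You can check that by examining the raw bytes with the hexdump utility. You should be able to pinpoint the byte sequences corresponding to the é and è characters. By the way, you may also see in that dump our old friend the line feed character, whose hexadecimal code was mentioned earlier: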
debian-9.4$ echo élève | hexdump -C
00000000 c3 a9 6c c3 a8 76 65 0a |..l..ve.|
00000008
Let's now examine the output produced by the fold command:
debian-9.4$ echo élève | fold -w2
é
l�
�v
e
debian-9.4$ echo élève | fold -w 2 | hexdump -C
00000000 c3 a9 0a 6c c3 0a a8 76 0a 65 0a |...l...v.e.|
0000000b
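Obviously, the result produced by the fold command is slightly longer than the original string because of the extra newline characters: respectively 11 bytes and 8 bytes long, newlines included. Speaking of that, in the output of the fold command you may have noticed the line feed character (0a) appearing every two bytes. And this is exactly the issue: the fold command broke lines at byte positions, not at character positions, even when that break occurs in the middle of a multibyte character! Needless to say, the resulting output is no longer a valid UTF-8 byte stream, hence the use of the Unicode replacement character (�) by my terminal as a placeholder for the invalid byte sequences.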
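As with the cut command I wrote about a few weeks ago, this is a limitation of the GNU implementation of the fold utility, and it is clearly at odds with the POSIX specification, which explicitly states that “a line shall not be broken in the middle of a character.”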
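So it appears the GNU fold implementation only deals properly with fixed-length, one-byte character encodings (US-ASCII, Latin1, and so on). As a workaround, if a suitable character set exists, you can transcode your text to a one-byte character encoding before processing it, and transcode it back to UTF-8 afterward. However, this is cumbersome, to say the least: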
debian-9.4$ echo élève |
iconv -t latin1 | fold -w 2 |
iconv -f latin1 | hexdump -C
00000000 c3 a9 6c 0a c3 a8 76 0a 65 0a |..l...v.e.|
0000000a
debian-9.4$ echo élève |
iconv -t latin1 | fold -w 2 |
iconv -f latin1
él
èv
e
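Whatever workaround you pick, it can be useful to check that a folded stream is still valid UTF-8 before passing it along. The sketch below relies on iconv rejecting malformed input when asked to convert from UTF-8 to UTF-8; the function name is_valid_utf8 is only an illustration, and the exact exit status on failure may differ between iconv implementations.

```shell
# Hypothetical helper: succeed only if standard input is valid UTF-8.
# Output and error messages are discarded; only the exit status matters.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "é" followed by a newline: the two bytes c3 a9 stay together.
printf '\303\251\n' | is_valid_utf8 && echo "valid"

# The same two bytes separated by a newline, as a byte-based fold leaves them.
printf '\303\n\251\n' | is_valid_utf8 || echo "invalid"
```

Piping the output of fold into such a check should let a script detect a broken multibyte character instead of silently passing it on.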
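All that being quite disappointing, I decided to check the behavior of other implementations. As is often the case, the OpenBSD implementation of the fold utility is much better in that respect, since it is POSIX compliant and honors the LC_CTYPE locale setting to handle multibyte characters properly: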
openbsd-6.3$ locale | grep LC_CTYPE
LC_CTYPE=en_US.UTF-8
openbsd-6.3$ echo élève | fold -w 2
él
èv
e
openbsd-6.3$ echo élève | fold -w 2 | hexdump -C
00000000 c3 a9 6c 0a c3 a8 76 0a 65 0a |..l...v.e.|
0000000a
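As you can see, the OpenBSD implementation correctly cuts lines at character positions, regardless of the number of bytes needed to encode them. In the overwhelming majority of use cases, this is what you want. However, if you need the legacy (that is, GNU-style) behavior considering one byte as one character, you can temporarily change the current locale to the so-called POSIX locale (identified by the constant “POSIX” or, for historical reasons, “C”):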
openbsd-6.3$ echo élève | LC_ALL=C fold -w 2
é
l�
�v
e
openbsd-6.3$ echo élève | LC_ALL=C fold -w 2 | hexdump -C
00000000 c3 a9 0a 6c c3 0a a8 76 0a 65 0a |...l...v.e.|
0000000b
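Finally, POSIX specifies the -b flag, which instructs the fold utility to measure line length in bytes, but still guarantees that multibyte characters (according to the current LC_CTYPE locale setting) will not be broken.
As an exercise, I strongly encourage you to take the time to find the differences at byte level between the result obtained by changing the current locale to “C” (above) and the result obtained by using the -b flag instead (below). It may be subtle, but there is a difference: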
openbsd-6.3$ echo élève | fold -b -w 2 | hexdump -C
00000000 c3 a9 0a 6c 0a c3 a8 0a 76 65 0a |...l....ve.|
0000000b
So, did you find the difference?
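Well, by changing the locale to “C”, the fold utility does not take any care of multibyte sequences, since, by definition, when the locale is “C”, tools must assume one character is one byte. So a newline can be added anywhere, even in the middle of a sequence of bytes that would have been considered a multibyte character in another character encoding. This is exactly what happened when the tool produced the c3 0a a8 byte sequence: the two bytes c3 a8 are understood as one character when LC_CTYPE defines the character encoding to be UTF-8. But the same sequence of bytes is seen as two characters in the “C” locale: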
# Bytes are bytes. They don't change so
# the byte count is the same whatever is the locale
openbsd-6.3$ printf "%d bytes\n" $(echo -n é | LC_ALL=en_US.UTF-8 wc -c)
2 bytes
openbsd-6.3$ printf "%d bytes\n" $(echo -n é | LC_ALL=C wc -c)
2 bytes
# The interpretation of the bytes may change depending on the encoding
# so the corresponding character count will change
openbsd-6.3$ printf "%d chars\n" $(echo -n é | LC_ALL=en_US.UTF-8 wc -m)
1 chars
openbsd-6.3$ printf "%d chars\n" $(echo -n é | LC_ALL=C wc -m)
2 chars
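On the other hand, with the -b option, the tool should still be multibyte-aware. That option changes only the way it counts positions, this time in bytes rather than in characters as it does by default. In that case, since multibyte sequences are not broken up, the resulting output remains a valid character stream (according to the current LC_CTYPE locale setting):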
openbsd-6.3$ echo élève | fold -b -w 2
é
l
è
ve
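As you can see, there is no more occurrence of the Unicode replacement character (�), and we did not lose any meaningful character in the process, at the expense of ending up this time with lines containing a variable number of characters as well as a variable number of bytes. In the end, all the tool ensures is that there are no more bytes per line than requested with the -w option. Something we can check with the wc tool: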
openbsd-6.3$ echo élève | fold -b -w 2 | while read line; do
> printf "%3d bytes %3d chars %s\n" \
> $(echo -n $line | wc -c) \
> $(echo -n $line | wc -m) \
> $line
> done
2 bytes 1 chars é
1 bytes 1 chars l
2 bytes 1 chars è
2 bytes 2 chars ve
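Once again, take the time to study the example above. It makes use of the printf and wc commands, which I have not explained in detail previously. So, if things are not clear enough, don't hesitate to use the comment section to ask for explanations!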
Out of curiosity, I checked the -b flag on my Debian box with the GNU fold implementation:
debian-9.4$ echo élève | fold -w 2 | hexdump -C
00000000 c3 a9 0a 6c c3 0a a8 76 0a 65 0a |...l...v.e.|
0000000b
debian-9.4$ echo élève | fold -b -w 2 | hexdump -C
00000000 c3 a9 0a 6c c3 0a a8 76 0a 65 0a |...l...v.e.|
0000000b
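Don't spend time trying to find a difference between the -b and non -b versions of that example: we have seen the GNU fold implementation is not multibyte-aware, so both results are identical. If you are not convinced, maybe you could use the diff -s command to let your computer confirm it. If you do, please use the comment section to share the command you used with the other readers!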
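Anyway, does that mean the -b option is useless in the GNU implementation of the fold utility? Well, by reading the GNU Coreutils documentation for the fold command more carefully, I found that the -b option only deals with special characters like tab or backspace, which in normal mode count respectively for 1 to 8 positions or -1 (minus one) position, but which always count for 1 position in byte mode. Confusing? Let's take some time to explain that in more detail.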
Tab and backspace handling
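Most text files you will deal with contain only printable characters and end-of-line sequences. Occasionally, however, some control characters may find their way into your data. The tab character (\t) is one of them. Much more rarely, the backspace (\b) may also be encountered. I still mention it here because, as its name implies, it is a control character that makes the cursor move one position backward (toward the left), whereas most other characters make it go forward (toward the right).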
sh$ echo -e 'tab:[\t] backspace:[\b]'
tab:[ ] backspace:]
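That may not be visible in your browser, so I strongly encourage you to test it at your terminal. But the tab character (\t) occupies several positions in the output. And the backspace? There seems to be something strange in the output, doesn't there? So let's slow things down by breaking the text string into several parts and inserting some sleep between them: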
# For that to work, type all the commands on the same line
# or using backslashes like here if you split them into
# several (physical) lines:
sh$ echo -ne 'tab:[\t] backspace:['; \
sleep 1; echo -ne '\b'; \
sleep 1; echo -n ']'; \
sleep 1; echo ''
OK? Did you see it this time? Let's decompose the sequence of events:
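- The first string of characters is displayed “normally” up to the second opening square bracket. Because of the -n flag, the echo command does not send a newline character, so the cursor stays on the same line.
- First sleep.
- The backspace is issued, moving the cursor back one position. Still no newline, so the cursor remains on the same line.
- Second sleep.
- The closing square bracket is displayed, overwriting the opening one.
- Third sleep.
- In the absence of the -n option, the last echo command finally sends the newline character, and the cursor moves to the next line, where your shell prompt will be displayed.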
Of course, a similarly cool effect can be obtained using a carriage return, if you remember it:
sh$ echo -n 'hello'; sleep 1; echo -e '\rgood bye'
good bye
I'm pretty sure you have already seen command-line utilities like curl and wget displaying a progress bar. They perform their magic using a combination of \b and/or \r.
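To make the trick concrete, here is a minimal sketch of a countdown that keeps rewriting the same terminal line with \r. It is not how curl or wget are implemented; the function name countdown and the DELAY variable are assumptions made for this illustration, and printf is used rather than echo -e because its escape handling is more consistent across shells.

```shell
# Hypothetical helper: overwrite the same terminal line with a countdown.
# DELAY is the pause in seconds between updates (defaults to 1).
countdown() {
    for i in 3 2 1; do
        # \r moves the cursor back to the start of the line without a newline.
        printf '\r%d' "$i"
        sleep "${DELAY:-1}"
    done
    # Overwrite the last digit and finally send the newline.
    printf '\rdone\n'
}

countdown
```

On a terminal you should see a single line changing in place. Piping the output through hexdump -C should reveal the 0d bytes that make it work.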
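Interesting as that discussion may be in itself, the point here is to understand that handling those characters can be challenging for the fold utility. Fortunately, the POSIX standard defines the rules:
<backspace>: The current count of line width shall be decremented by one, although the count never shall become negative. The fold utility shall not insert a <newline> immediately before or after any <backspace>.
<carriage-return>: The current count of line width shall be set to zero. The fold utility shall not insert a <newline> immediately before or after any <carriage-return>.
<tab>: Each <tab> encountered shall advance the column position pointer to the next tab stop. Tab stops shall be at each column position n such that n modulo 8 equals 1.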
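All that special handling is disabled when the -b option is used. In that case, the control characters above all count (correctly) for one byte, and thus increase the position counter by one and only one, just like any other character.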
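The tab rule is probably the easiest one to observe. In this sketch, the input is only 3 bytes long ("a", a tab, "b"), but in column mode the tab pushes the position to the next tab stop, so "b" no longer fits within a width of 8. The expected line counts assume a fold implementation that follows the POSIX rules quoted above.

```shell
# Column mode: the tab advances to the next tab stop, so "b" lands past
# the 8-column limit and is moved to a second line.
printf 'a\tb\n' | fold -w 8 | awk 'END { print NR " line(s) in column mode" }'

# Byte mode: the tab counts for a single byte, so the 3 bytes fit on one line.
printf 'a\tb\n' | fold -b -w 8 | awk 'END { print NR " line(s) in byte mode" }'
```

The first command should report two lines and the second only one.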
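For a better understanding, I'll let you investigate the two following examples by yourself (maybe using the hexdump utility). You should now be able to find why “hello” has become “hell” and where exactly the “i” is in the output (because it is there, even if you can't see it!). As always, if you need help, or if you simply want to share your findings, the comment section is yours.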
# Why "hello" has become "hell"? where is the "i"?
sh$ echo -e 'hello\rgood bi\bye' | fold -w4
hell
good
bye
# Why "hello" has become "hell"? where is the "i"?
# Why the second line seems to be made of only two chars instead of 4?
sh$ echo -e 'hello\rgood bi\bye' | fold -bw4
hell
go
od b
ye
Other limitations
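The fold command we have studied so far was designed to break long logical lines into smaller physical lines, notably for formatting purposes.
That means it assumes each input line is self-contained and can be broken independently of the other lines. This is not always the case, however. For example, let's consider this very important mail I received: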
sh$ cat MAIL.txt
Dear friends,
Have a nice day!
We are manufactuer for event chairs and tables, more than 10 years experience.
We supply all kinds of wooden, resin and metal event chairs, include chiavari
chairs, cross back chairs, folding chairs, napoleon chairs, phoenix chairs, etc.
Our chairs and tables are of high quality and competitively priced.
If you need our products, welcome to contact me;we are happy to make you special
offer.
Best Regards
Doris
sh$ awk '{ length>maxlen && (maxlen=length) } END { print maxlen }' MAIL.txt
81
Obviously, lines were already broken to some fixed width. The awk command told me the maximum line width here was… 81 characters, excluding the newline sequence. Yes, that was sufficiently odd that I double-checked it: indeed, the longest line has 80 printable characters plus one extra space at the 81st position, and only after that comes the line feed character. The IT people working on behalf of this chair “manufactuer” could probably benefit from reading this article!
Anyway, assuming I would like to change the formatting of that email, I will have issues with the fold command because of the existing line breaks. I'll let you check the two commands below by yourself if you want, but neither of them will work as expected:
sh$ fold -sw 100 MAIL.txt
sh$ fold -sw 60 MAIL.txt
The first one will simply do nothing since all lines are already shorter than 100 characters. As for the second command, it will break lines at or before the 60th position but keep the already existing newline characters, so the result will be jagged. It is particularly visible in the third paragraph:
sh$ awk -v RS='' 'NR==3' MAIL.txt |
fold -sw 60 |
awk '{ length>maxlen && (maxlen=length); print length, $0 }'
53 We supply all kinds of wooden, resin and metal event
25 chairs, include chiavari
60 chairs, cross back chairs, folding chairs, napoleon chairs,
20 phoenix chairs, etc.
The first line of the third paragraph was broken at position 53, which is consistent with our maximum width of 60 characters per line. However, the second line broke at position 25 because that newline character was already present in the input file. In other words, to properly resize the paragraphs, we first need to rejoin the lines before breaking them at the new target position.
You can use sed or awk to rejoin the lines. As a matter of fact, as I mentioned in the introductory video, that would be a good challenge for you to take. So don't hesitate to post your solution in the comment section.
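If you want a starting point, here is one possible sketch among many, not necessarily the solution the challenge has in mind. It uses awk in paragraph mode (an empty RS makes blank lines the record separator) to rebuild each paragraph as a single line. The function name rejoin_paragraphs is hypothetical, and note that this approach also collapses runs of spaces and discards indentation.

```shell
# Hypothetical helper: join the lines of each blank-line-separated paragraph.
# With RS set to the empty string, awk reads one paragraph per record and
# treats newlines as field separators. Assigning $1 to itself rebuilds the
# record with single spaces between the fields.
rejoin_paragraphs() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" } { $1 = $1; print }'
}

# Each paragraph is now one logical line that fold can break cleanly.
printf 'first line\nsecond line\n\nnext paragraph\n' | rejoin_paragraphs | fold -sw 60
```

Applied to the mail above, that would be something like rejoin_paragraphs < MAIL.txt | fold -sw 60.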
As for myself, I will follow an easier path by looking at the fmt command. While not a POSIX standard command, it is available in both the GNU and BSD worlds, so there is a good chance it will be usable on your system. Unfortunately, the lack of standardization has some negative implications, as we will see later. But for now, let's concentrate on the good parts.
The fmt command
The fmt command is more evolved than the fold command and has more formatting options. The most interesting part is that it can identify paragraphs in the input file based on empty lines. That means all lines up to the next empty line (or the end of the file) are first joined together to form what I earlier called a “logical line” of text. Only after that does the fmt command break the text at the requested position.
Let's see now what that will change when applied to the third paragraph of my example mail:
sh$ awk -v RS='' 'NR==3' MAIL.txt |
fmt -w 60 |
awk '{ length>maxlen && (maxlen=length); print length, $0 }'
60 We supply all kinds of wooden, resin and metal event chairs,
59 include chiavari chairs, cross back chairs, folding chairs,
37 napoleon chairs, phoenix chairs, etc.
Incidentally, the fmt command managed to pack one more word into the first line. More interestingly, the second line is now filled, meaning the newline character already present in the input file after the word “chiavari” (what's this?) has been discarded. Of course, things are not perfect, and the fmt paragraph detection algorithm sometimes triggers false positives, as in the greetings at the end of the mail (line 14 of the output):
sh$ fmt -w 60 MAIL.txt | cat -n
1 Dear friends,
2
3 Have a nice day! We are manufactuer for event chairs and
4 tables, more than 10 years experience.
5
6 We supply all kinds of wooden, resin and metal event chairs,
7 include chiavari chairs, cross back chairs, folding chairs,
8 napoleon chairs, phoenix chairs, etc.
9
10 Our chairs and tables are of high quality and competitively
11 priced. If you need our products, welcome to contact me;we
12 are happy to make you special offer.
13
14 Best Regards Doris
I said earlier the fmt command was a more evolved text formatting tool than the fold utility. Indeed it is. It may not be obvious at first sight, but if you look carefully at lines 10-11, you may notice it used two spaces after the period, enforcing a much-discussed convention of using two spaces at the end of a sentence. I will not go into the debate over whether you should use two spaces between sentences, but you have no real choice here: to my knowledge, none of the common implementations of the fmt command offers a flag to disable the double space after a sentence. Unless such an option exists somewhere and I missed it? If so, I'll be happy if you let me know in the comment section: as a French writer, I never use the “double space” after a sentence…
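In the absence of such a flag, one workaround is to post-process the output of fmt. The following is only a sketch: the function name single_space_sentences is hypothetical, and the sed expression will also alter any intentional double space that follows a period, question mark or exclamation mark.

```shell
# Hypothetical helper: turn the two spaces that follow ".", "?" or "!"
# into a single space.
single_space_sentences() {
    sed 's/\([.?!]\)  /\1 /g'
}

printf 'competitively priced.  If you need our products\n' | single_space_sentences
```

You would typically append it to the pipeline, as in fmt -w 60 MAIL.txt | single_space_sentences. Keep in mind a line may end up one character shorter than what fmt computed.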
More fmt options
The fmt utility is designed with more formatting capabilities than the fold command. However, since it is not defined by POSIX, there are major incompatibilities between the GNU and BSD options.
For example, the -c option is used in the BSD world to center the text, whereas in GNU Coreutils's fmt it enables the crown margin mode, “preserving the indentation of the first two lines within a paragraph, and align the left margin of each subsequent line with that of the second line.”
I'll let you experiment by yourself with the GNU fmt -c if you want. Personally, I find the BSD text centering feature more interesting to study because of an oddity: in OpenBSD, fmt -c centers the text according to the target width, but without reflowing it! So the following command will not work as you might have expected:
openbsd-6.3$ fmt -c -w 60 MAIL.txt
Dear friends,
Have a nice day!
We are manufactuer for event chairs and tables, more than 10 years experience.
We supply all kinds of wooden, resin and metal event chairs, include chiavari
chairs, cross back chairs, folding chairs, napoleon chairs, phoenix chairs, etc.
Our chairs and tables are of high quality and competitively priced.
If you need our products, welcome to contact me;we are happy to make you special
offer.
Best Regards
Doris
If you really want to reflow the text for a maximum width of 60 characters and center the result, you will have to use two instances of the fmt command:
openbsd-6.3$ fmt -w 60 MAIL.txt | fmt -c -w60
Dear friends,
Have a nice day! We are manufactuer for event chairs and
tables, more than 10 years experience.
We supply all kinds of wooden, resin and metal event chairs,
include chiavari chairs, cross back chairs, folding chairs,
napoleon chairs, phoenix chairs, etc.
Our chairs and tables are of high quality and competitively
priced. If you need our products, welcome to contact me;we
are happy to make you special offer.
Best Regards Doris
I will not give an exhaustive list of the differences between the GNU and BSD fmt implementations… essentially because all the options are different! Except, of course, the -w option. Speaking of which, I forgot to mention that -N, where N is an integer, is a shortcut for -wN. Moreover, you can use that shortcut with both the fold and fmt commands: so, if you were persevering enough to read this article up to this point, as a reward you may now amaze your friends by saving one (!) entire keystroke the next time you use one of those utilities:
debian-9.4$ fmt -50 POSIX.txt | head -5
The Portable Operating System Interface
(POSIX)[1] is a family of standards specified
by the IEEE Computer Society for maintaining
compatibility between operating systems. POSIX
defines the application programming interface
openbsd-6.3$ fmt -50 POSIX.txt | head -5
The Portable Operating System Interface (POSIX)[1]
is a family of standards specified by the IEEE
Computer Society for maintaining compatibility
between operating systems. POSIX defines the
application programming interface (API), along
debian-9.4$ fold -sw50 POSIX.txt | head -5
The Portable Operating System Interface
(POSIX)[1] is a family of standards specified by
the IEEE Computer Society for maintaining
compatibility between operating systems. POSIX
defines the application programming interface
openbsd-6.3$ fold -sw50 POSIX.txt | head -5
The Portable Operating System Interface
(POSIX)[1] is a family of standards specified by
the IEEE Computer Society for maintaining
compatibility between operating systems. POSIX
defines the application programming interface
As a final word, you may also notice in that last example that the GNU and BSD versions of the fmt utility use different formatting algorithms, producing different results. On the other hand, the simpler fold algorithm produces consistent results across implementations. All that to say: if portability is paramount, you need to stick with the fold command, possibly complemented by some other POSIX utilities. But if you need fancier features and can afford to break compatibility, take a look at the manual for the fmt command specific to your own system. And let us know if you discover some fun or creative usage for those vendor-specific options!