Linux Unicode and HTML Characters Lookup By Name or Number
The unum program is a command-line utility written in portable Perl. It converts between decimal, octal, hexadecimal, and binary numbers; Unicode character and block names; and HTML/XHTML character entity names. That makes it a handy special-character reference for Web authors: look up a character by name or number, then insert it into a document or a text field when it is not available on your keyboard. This tutorial also covers the unicode command, which displays the properties of Unicode characters.
Tutorial details | |
---|---|
Difficulty level | Easy |
Root privileges | No |
Requirements | Linux terminal |
Category | Terminal/ssh |
Prerequisites | Perl 5.8+ |
OS compatibility | Debian • Linux • Ubuntu |
Est. reading time | 4 minutes |
Displaying unicode character properties using the unicode command
The unicode command may not be installed by default. Install the unicode package with your distribution's package manager: apk on Alpine Linux, dnf or yum on RHEL & co, apt or apt-get on Debian, Ubuntu & co, zypper on SUSE/OpenSUSE, or pacman on Arch Linux. For example, on Debian or Ubuntu:
$ sudo apt update
$ sudo apt install unicode
[sudo] password for vivek:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  unicode-data
The following NEW packages will be installed:
  unicode unicode-data
0 upgraded, 2 newly installed, 0 to remove and 10 not upgraded.
Need to get 7,786 kB of archives.
After this operation, 37.8 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://in.archive.ubuntu.com/ubuntu focal/universe amd64 unicode all 2.7-1 [18.3 kB]
Get:2 http://in.archive.ubuntu.com/ubuntu focal/universe amd64 unicode-data all 13.0.0-2build1 [7,768 kB]
Fetched 7,786 kB in 4s (1,832 kB/s)
Selecting previously unselected package unicode.
(Reading database ... 333153 files and directories currently installed.)
Preparing to unpack .../archives/unicode_2.7-1_all.deb ...
Unpacking unicode (2.7-1) ...
Selecting previously unselected package unicode-data.
Preparing to unpack .../unicode-data_13.0.0-2build1_all.deb ...
Unpacking unicode-data (13.0.0-2build1) ...
Setting up unicode-data (13.0.0-2build1) ...
Setting up unicode (2.7-1) ...
Processing triggers for man-db (2.9.1-1) ...
Examples
The syntax is:
$ unicode <string>
For example:
$ unicode n
Outputs:
U+006E LATIN SMALL LETTER N
UTF-8: 6e UTF-16BE: 006e Decimal: &#110; Octal: \0156
n (N)
Uppercase: 004E
Category: Ll (Letter, Lowercase); East Asian width: Na (narrow)
Unicode block: 0000..007F; Basic Latin
Bidi: L (Left-to-Right)
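To see where the UTF-8 field comes from, you can dump a character's bytes yourself. The following is a minimal sketch, assuming a UTF-8 encoded terminal or script file and the standard od and tr utilities. The helper name utf8_hex is made up for this illustration and is not part of the unicode package.

```shell
# utf8_hex: print the bytes of a string as lowercase hex (hypothetical helper).
# od -An -tx1 dumps one byte per column without the offset column;
# tr then strips the padding so the bytes run together as in the UTF-8 field.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    printf '\n'
}

utf8_hex n     # 6e
utf8_hex 'á'   # c3a1 (U+00E1 needs two bytes in UTF-8)
```

For plain ASCII characters such as n, the single UTF-8 byte equals the code point, which is why the output above shows 6e for U+006E.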
Try some more examples:
$ unicode ☻
$ unicode U+00E1
$ unicode ????
$ unicode 0400..
Download and Install the unum program
Create a directory for the script, then download the archive with the wget command:
$ mkdir -p ~/bin/perl
$ cd ~/bin/perl
$ wget https://www.fourmilab.ch/webtools/unum/download/unum.tar.gz
Extract unum.tar.gz with the tar command:
$ tar xvf unum.tar.gz
Create a symbolic link named unum with the ln command:
$ ln -s unum.pl unum
Add the directory to your PATH with the export command so that the shell can find unum. For instance:
$ export PATH="$PATH:$HOME/bin:$HOME/bin/perl"
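The export above only lasts for the current shell session. Here is a small sketch, assuming a POSIX-compatible shell such as bash, that appends a directory to PATH only when it is missing, so it can be pasted into a startup file such as ~/.bashrc without growing PATH on every login. The function name path_append is a hypothetical helper, not a standard command.

```shell
# path_append: add a directory to PATH unless it is already listed.
# Wrapping both sides in colons makes the match work for the first,
# last, and middle entries alike.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append "$HOME/bin/perl"
export PATH
```

Open a new terminal, or source the startup file, for the change to take effect.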
How do I use the unum program to look up Unicode and HTML characters by name or number?
The syntax is as follows:
$ unum arg
$ unum query
$ unum character
$ unum a
$ unum 9
Please note that all name queries are case-insensitive and accept regular expressions. Be sure to quote regular expressions if they contain characters with meaning to the shell.
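Quoting matters because an unquoted pattern such as n=^greek.*rho contains a *, which the shell may try to expand as a filename glob before unum ever sees it. The sketch below imitates a case-insensitive regular expression search with grep over a tiny stand-in list of character names. It only illustrates the matching and quoting behaviour; the list, the variable names, and the match_names helper are assumptions for this example, and unum itself searches the full Unicode database.

```shell
# Stand-in name list (four real Unicode character names).
names='GREEK SMALL LETTER ALPHA
GREEK SMALL LETTER RHO
GREEK RHO SYMBOL
LATIN SMALL LETTER D'

# match_names: case-insensitive extended-regex search over the list.
# The pattern is passed in double quotes so the shell leaves it alone.
match_names() {
    printf '%s\n' "$names" | grep -Ei -- "$1"
}

# Single quotes keep ^, . and * away from the shell.
match_names '^greek.*rho'
```

The equivalent unum query would be quoted the same way: unum 'n=^greek.*rho'.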
Examples
To look up the character ‘d’, run:
$ unum d
Sample outputs:
Octal  Decimal  Hex   HTML    Character  Unicode
0144   100      0x64  &#100;  "d"        LATIN SMALL LETTER D
To look up each character of a non-digit string such as ‘abc’, enter:
$ unum abc
Sample outputs:
Octal  Decimal  Hex   HTML   Character  Unicode
0141   97       0x61  &#97;  "a"        LATIN SMALL LETTER A
0142   98       0x62  &#98;  "b"        LATIN SMALL LETTER B
0143   99       0x63  &#99;  "c"        LATIN SMALL LETTER C
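The numeric columns are the same code point written in different bases. As a rough sketch, assuming ASCII input and a POSIX shell, the standard printf utility can reproduce them: an argument that starts with a single quote is read as the numeric value of the character that follows. The function name char_rows is hypothetical, and the Unicode name column is left out because it needs the character database that unum ships with.

```shell
# char_rows: print octal, decimal, hex and the HTML numeric entity
# for every character of an ASCII string (hypothetical helper).
char_rows() {
    s=$1
    while [ -n "$s" ]; do
        rest=${s#?}           # everything after the first character
        ch=${s%"$rest"}       # the first character itself
        printf '0%o %d 0x%x &#%d; "%s"\n' "'$ch" "'$ch" "'$ch" "'$ch" "$ch"
        s=$rest
    done
}

char_rows abc
# 0141 97 0x61 &#97; "a"
# 0142 98 0x62 &#98; "b"
# 0143 99 0x63 &#99; "c"
```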
Other examples:
arg | Description |
---|---|
147 | Decimal number |
0371 | Octal number |
0xfa75 | Hexadecimal number (letters may be A-F or a-f) |
0b11010011 | Binary number |
'&#8747;&#x3c0;' | One or more XHTML numeric entities (hex or decimal) |
xyz | The characters xyz (non-digit) |
c=7Y | The characters 7Y (any Unicode characters) |
b=cherokee | List Unicode blocks containing "CHEROKEE" |
h=alpha | List XHTML entities containing "alpha" |
n=aggravation | Unicode characters with "AGGRAVATION" in the name |
n=^greek.*rho | Unicode characters beginning with "GREEK" and containing "RHO" |
l=gothic | List all characters in matching Unicode blocks |
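The number prefixes follow the familiar C-style convention: a leading 0 means octal, 0x means hexadecimal, and 0b means binary. A minimal sketch, assuming a POSIX shell, that turns each of these notations into decimal so you can check what a given argument refers to. The standard printf utility already understands the octal and hexadecimal forms; binary is handled by a small loop. The function name to_decimal is a made-up helper, not part of unum.

```shell
# to_decimal: convert 147, 0371, 0xfa75 or 0b11010011 to decimal
# (hypothetical helper; expects well-formed input).
to_decimal() {
    case "$1" in
        0b*)
            bits=${1#0b}
            value=0
            while [ -n "$bits" ]; do
                rest=${bits#?}
                digit=${bits%"$rest"}          # leading binary digit
                value=$(( value * 2 + digit ))
                bits=$rest
            done
            printf '%d\n' "$value"
            ;;
        *)
            # printf reads 0... as octal and 0x... as hexadecimal
            printf '%d\n' "$1"
            ;;
    esac
}

to_decimal 147          # 147
to_decimal 0371         # 249
to_decimal 0xfa75       # 64117
to_decimal 0b11010011   # 211
```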
A note about GUI programs
GNOME desktop users can use a graphical character map instead. The gucharmap command may not be installed by default. Install the gucharmap package with your distribution's package manager: apk on Alpine Linux, dnf or yum on RHEL & co, apt or apt-get on Debian, Ubuntu & co, zypper on SUSE/OpenSUSE, or pacman on Arch Linux. Then open it from the menu:
Applications menu ▸ Accessories ▸ Character Map
Or, execute the following command:
$ gucharmap
OR
$ gnome-character-map
OR
$ gnome-characters
- Select a character set from the Script or Unicode Block list box. Example: Basic Latin
- Select a character from the Character Table tabbed section. Example: @
- Click on the Character Details tabbed section.
A note about KDE users
On the KDE desktop, use the KCharSelect utility. KCharSelect is a tool for selecting special characters from all installed fonts and copying them to the clipboard.
A note about macOS/OS X Unix users
On macOS/OS X, use the Character Viewer application.
Check out related media
This tutorial is also available in a quick video format.
References
- See unum home page.