Provide Unicode character Classification and Character name in details. #409

bcowgill · 2023-01-06T14:35:24Z

When showing the details of a Unicode character, it would be useful to also show its character classification (general category) and its official Unicode character name.

Here's an example for U+0073 and a few characters starting at U+2325:

$ utf8ls.pl U+0073 U+2325
s	U+73	[LowercaseLetter]	LATIN SMALL LETTER S
⌥	U+2325	[OtherSymbol]	OPTION KEY
⌦	U+2326	[OtherSymbol]	ERASE TO THE RIGHT
⌧	U+2327	[OtherSymbol]	X IN A RECTANGLE BOX
⌨	U+2328	[OtherSymbol]	KEYBOARD
〈	U+2329	[OpenPunctuation]	LEFT-POINTING ANGLE BRACKET
〉	U+232A	[ClosePunctuation]	RIGHT-POINTING ANGLE BRACKET

You could convert this output into a JSON lookup keyed by Unicode code point, and display the class and name alongside the character.
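As a rough illustration of such a lookup (not the utf8ls.pl implementation), here is a minimal Python sketch built on the standard `unicodedata` module. The `CATEGORY_LABELS` mapping and the `describe` helper are hypothetical names introduced for this example. The long labels are assumed to correspond to the two-letter general-category codes, and only a few of them are listed.

```python
import json
import unicodedata

# Assumption: long labels like those in the sample output, keyed by the
# two-letter general-category codes that unicodedata reports. Partial list.
CATEGORY_LABELS = {
    "Ll": "LowercaseLetter",
    "So": "OtherSymbol",
    "Ps": "OpenPunctuation",
    "Pe": "ClosePunctuation",
    "Cc": "Control",
    "Zs": "SpaceSeparator",
}


def describe(ch: str) -> dict:
    """Return code point, class label and name for a single character."""
    code = unicodedata.category(ch)
    return {
        "codepoint": f"U+{ord(ch):04X}",
        # Fall back to the raw two-letter code when no label is mapped.
        "class": CATEGORY_LABELS.get(code, code),
        # Some characters (e.g. controls) have no name in unicodedata.
        "name": unicodedata.name(ch, "<unnamed>"),
    }


# Build a small lookup table and serialise it as valid JSON.
lookup = {ch: describe(ch) for ch in "s\u2325\u2329\u00a0"}
print(json.dumps(lookup, indent=2))
```

Because `json.dumps` quotes keys and escapes non-ASCII characters by default, the result can be loaded directly by any JSON parser.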

You can generate a full table in JSON format with my Perl script here:
https://github.com/bcowgill/bsac-linux-cfg/blob/master/bin/utf8ls.pl

$ utf8ls.pl --all U+0000 | perl -ne 'BEGIN {print "{\n"} END {print "\n}\n"} chomp; next unless m{U\+(\w+)\s+(\[\w+\])\s+(.+)}; $u = substr("000$1", -4); print $sep, qq{"\\u$u": { "class": "$2", "name": "$3"}}; $sep = ",\n"' > utf8.json

{
"\u0000": { "class": "[Control]", "name": "NULL"},
"\u0001": { "class": "[Control]", "name": "START OF HEADING"},
"\u0002": { "class": "[Control]", "name": "START OF TEXT"},

Or I can provide the file if you cannot run Perl.


nbros · 2024-11-28T10:03:26Z

Yes please. It took me a while to understand what I was seeing with multibyte UTF-8 characters that represent a non-breaking space, which of course currently shows up blank. It would be obvious if the details instead showed something like "UTF-8 ' ' : No-Break Space (NBSP)".

I think the original source for the list of character names is:
https://unicode.org/Public/UNIDATA/UnicodeData.txt
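For reference, each line of that file is a semicolon-separated record whose first three fields are the code point in hex, the character name, and the two-letter general category. Below is a minimal Python sketch of reading those fields. `parse_unicode_data` is a hypothetical helper, and the `SAMPLE` lines are a small excerpt in the file's format rather than the full file. Control characters are listed as `<control>`, so the sketch assumes falling back to the older name in field 10 is acceptable for display.

```python
def parse_unicode_data(lines):
    """Map code point (int) -> {"name", "category"} from UnicodeData.txt lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        name = fields[1]
        # Controls carry the placeholder "<control>"; use the legacy
        # Unicode 1.0 name (field 10) when one is present.
        if name == "<control>" and len(fields) > 10 and fields[10]:
            name = fields[10]
        table[int(fields[0], 16)] = {"name": name, "category": fields[2]}
    return table


# Excerpt in UnicodeData.txt format (not the complete file).
SAMPLE = [
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
    "0073;LATIN SMALL LETTER S;Ll;0;L;;;;;N;;;;0053;;0053",
    "00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;",
    "2325;OPTION KEY;So;0;ON;;;;;N;;;;;",
]

table = parse_unicode_data(SAMPLE)
print(table[0x00A0])
```

With a table like this, a blank-looking character such as U+00A0 can be labelled by name instead of being shown as empty space.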
