Sunday, March 9, 2014

Typing Arbitrary Unicode Characters in Linux

OK, this one is a little strange. The X server, which provides the graphical environment on Linux, does not offer a way to input arbitrary Unicode characters by code point. It does provide a powerful feature for entering some Unicode characters through mnemonic sequences with the compose key. GTK, the toolkit used by Firefox, OpenOffice, and GNOME applications, does provide a way to input arbitrary Unicode characters. If you are using one of these and have not reconfigured its input method, you can enter any Unicode character by holding down the Ctrl and Shift keys while you type the letter u followed by the hex code for the character you want. To be clear, the only keys you hold are Ctrl and Shift; the rest are typed in sequence, and then you release Ctrl and Shift.
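If you have a character on hand (pasted from a web page, say) and want to know which hex code to type after Ctrl+Shift+u, a small shell helper can look it up. This is only a sketch: the function name codepoint is made up for illustration, and it assumes the iconv, od, tr, and sed utilities are installed and that you pass it exactly one UTF-8 character.

```shell
# Print the hex code point of a single UTF-8 character.
# "codepoint" is a hypothetical helper name, not a standard command.
codepoint() {
    printf '%s' "$1" |
        iconv -f UTF-8 -t UTF-32BE |   # one character becomes 4 big-endian bytes
        od -An -v -tx1 |               # dump those bytes as hex pairs
        tr -d ' \n' |                  # join the pairs into one string
        sed 's/^0*//'                  # drop leading zeros
    printf '\n'
}

codepoint '♪'   # prints 266a
```

With that result, holding Ctrl and Shift while typing u 2 6 6 a in a GTK application should produce the eighth note.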

Now, here is where it gets interesting. GTK overrides the default X input method and provides its own set of compose key sequences. This is nice for consistency, but the default X configuration may provide compose sequences for characters that GTK does not support. Additionally, the X compose feature is configurable: you can add new sequences for the characters you want to use. This is very convenient. It is possible to use the default X input method in GTK apps, but you lose the ability to input arbitrary Unicode characters by code point. If you prefer the default X input method, add a line like this to your ~/.Xsession:

export GTK_IM_MODULE="xim"

You can also change this for a single app by entering the same command in a terminal and then launching the desired application from that terminal. Now you will be able to use compose sequences such as Compose followed by # and e to get a musical eighth note like this: ♪. You can also add your own compose key sequences by editing /usr/share/X11/locale/en_US.UTF-8/Compose. Personally, I think having a powerful, configurable compose key is more valuable than arbitrary Unicode input, but I’m still looking for a way to have both.
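The two practical steps in this paragraph can be sketched as shell helpers. Both function names are invented for illustration, and the sample rule (Compose, then l, a, m for a lambda) is a hypothetical sequence, not one taken from the stock Compose file. The first helper uses env rather than export, so the variable applies to the launched program only. The second appends a rule in the format the Compose file uses. As far as I know, libX11 also reads a per-user ~/.XCompose file, which avoids editing the system file as root; treat that path as an assumption and check it on your system.

```shell
# Run one program with the X input method, leaving the current shell untouched.
# "with_xim" is a hypothetical helper name.
with_xim() {
    env GTK_IM_MODULE=xim "$@"
}

# Append one rule to a Compose-format file.
# Usage: add_compose_rule FILE 'KEYSYMS' 'RESULT'
# "add_compose_rule" is a hypothetical helper name.
add_compose_rule() {
    printf '<Multi_key> %s : "%s"\n' "$2" "$3" >> "$1"
}

# Example usage (commented out so nothing runs or is modified by accident):
# with_xim gedit &
# add_compose_rule "$HOME/.XCompose" '<l> <a> <m>' 'λ'
```

If you do use a per-user file, it most likely needs a first line reading include "%L" to keep the system default sequences alongside your own.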
