Friday, June 12, 2009

Handling multiple encodings in Vim

Most people edit, load and save files in a single character encoding (e.g. en_US.UTF-8), but for many this is not the case. I need to write documents and emails in Japanese UTF-8 (ja_JP.UTF-8), LaTeX in EUC-JP (ja_JP.EUC-JP) and source code comments in Shift-JIS (ja_JP.SJIS).

To handle a certain character encoding there are three things you must consider:

  • Your editor character encoding

  • Your terminal character encoding

  • Your font character encoding support

In the prehistoric era editors and terminals could support only a single character encoding, so you needed a different editor/terminal pair for each encoding you worked with. Believe me when I tell you this was not fun at all.

These days most editors and terminals support a large array of character encodings, and many free fonts cover all the character sets I need. Still, it is usually necessary to reconfigure each part (editor and terminal) or to create a separate profile in each for every character encoding you need to edit.

Today I took the time to understand how Vim's character encoding support works, and I found that it has everything I need and a lot more. By carefully setting the fenc, fencs, enc and tenc options I can edit any file in any character encoding with little effort. Here are my .vimrc settings:

"" Character encoding settings
"" By manipulating these variables it is possible to edit all files in one
"" encoding while using the terminal in a different encoding and writing/reading
"" the file in another encoding. Here we set all three variables to UTF-8.

" Default file encoding for new files
setglobal fenc=utf-8

" Auto detect file encoding when opening a file. To check what file encoding was
" selected run ":set fenc" and if you know the auto detection failed and want to
" force another one run ":edit ++enc=<your_enc>".
set fencs=utf-8,euc-jp,sjis

" Internal encoding used by vim buffers, help and commands
set enc=utf-8

" Terminal encoding used for input and terminal display
" Make sure your terminal is configured with the same encoding.
set tenc=utf-8

  • tenc:

    This is the character encoding used to display text on the terminal and to read input from it. I always configure my terminal (Konsole) for UTF-8, and as far as I know my Japanese input method (scim/anthy) also uses UTF-8, so to avoid display and input problems I leave this set to UTF-8.

  • enc:

    The encoding used internally by Vim for buffers, help and commands. It does not need to match tenc, because Vim converts from one encoding to the other when they differ. This way you can use your native-language encoding in your terminal and input method and let Vim handle everything internally in UTF-8.

  • fenc:

    This is the character encoding used for reading and writing the file in the current buffer. Again, it can differ from enc and tenc because Vim converts between them when they differ. This way you can have your terminal configured for your native language (e.g. Japanese, Russian, Chinese...), let Vim work internally in UTF-8 and still save your files in any encoding you want by setting fenc.

  • fencs:

    This is the list of encodings Vim tries, in order, to auto detect the character encoding when opening an existing file. The order of the entries is important, so read the help (":h fencs") to learn how to set it correctly. For example, if I put euc-jp first in the list, all my English documents will be detected as euc-jp instead of utf-8, because plain ASCII text is also valid euc-jp. The same goes for 8-bit encodings such as latin1 and cp1250, which accept practically any byte sequence, so make sure to put these at the end of the list. If the auto detection fails and your document is not displayed correctly, you can always reload it with a forced encoding using ":edit ++enc=euc-jp", replacing euc-jp with your desired encoding.
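The point that enc and tenc may differ, and that the order of fencs matters, can be sketched as follows. This is a variation for illustration and not the configuration used in this post: it assumes a terminal that is not set to UTF-8, and the ucs-bom and latin1 entries in the list are additions that do not appear in the settings above.

```vim
" Sketch only: work in UTF-8 internally while the terminal keeps its own encoding.
if has('multi_byte')
  if &termencoding ==# ''
    " 'encoding' still holds the locale-derived default at this point,
    " so remember it as the terminal encoding before changing it.
    let &termencoding = &encoding
  endif
  set encoding=utf-8
  setglobal fileencoding=utf-8
  " Strict encodings first; latin1 (assumed fallback) last, since it never fails.
  set fileencodings=ucs-bom,utf-8,euc-jp,sjis,latin1
endif
```

Running ":set enc? tenc? fenc? fencs?" inside Vim should show the values that ended up in effect.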

In my .vimrc above I set everything to UTF-8, which is recommended because converting between other encodings may cause loss of information. The only parameter I change is fenc, when I need to edit/save a file in a different encoding.

For example if I want to create a new file in euc-jp encoding:

- Open a new file as usual using vim
- Set file encoding using :set fenc=euc-jp
- Edit/Save as much as you like and rest assured that your file is euc-jp.
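Written out as Ex commands, the steps above might look like the sketch below. The file name notes.tex is hypothetical, and :setlocal is used instead of :set on the assumption that only the current buffer should change (with a buffer-local option such as fenc, :set also changes the global value that later new buffers start from).

```vim
" Start with an empty buffer (or launch vim on a file that does not exist yet)
:enew
" Choose the encoding the file will be written in; this buffer only
:setlocal fileencoding=euc-jp
" ... edit as usual, then save (notes.tex is a hypothetical name)
:write notes.tex
" Confirm which encoding is in effect for this buffer
:set fileencoding?
```

A one-off alternative is ":write ++enc=euc-jp", which overrides the encoding for that single write.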

To edit an existing file, simply open it and let Vim auto detect the encoding from the candidates listed in fencs. To check which encoding Vim chose, use the command ":set fenc" and it will display the detected encoding. If it is not the correct one, you can force the encoding by reloading the file with ":edit ++enc=euc-jp", replacing "euc-jp" with the encoding you want.
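The check-and-reload routine can be sketched as a short sequence of Ex commands. The last two lines are an optional extra, not something described above: they assume you also want to convert the file to UTF-8 on the next save.

```vim
" Show which encoding Vim picked from fencs for the current file
:set fileencoding?
" If the guess was wrong, reload the file with an explicit encoding
:edit ++enc=euc-jp
" Optional (assumption): convert the file to UTF-8 the next time it is written
:setlocal fileencoding=utf-8
:write
```

Note that ":edit ++enc=..." refuses to reload a buffer with unsaved changes; ":edit! ++enc=euc-jp" discards those changes and reloads anyway.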
