Working with non-english characters in linux and oracle

 

Over the past couple of years we’ve had horrendous problems with charactersets/codepages and the like when working with non ‘Western European’ characters and even sometimes with Western European ones.

I’m going to give a short example of how to get Hungarian characters (áéíöóőüúű for example) stored inside a database so they can be viewed in a sqlplus session in linux.

OK – here are the things that need to be in place:

1. Wherever the characters are being taken from this needs to be in utf8/unicode format (so a cut/paste from a website, or a load from a txt file the format of these characters needs to be correct or it won’t work from the very start – so for instance in notepad choose this option when you save the file:

image

2. transfer the file to unix using whatever tool you would normally use sftp/ftp/winscp etc – this needs to be done in ascii format to remove the windows special characters (CR,LF) that get appended to the end of the lines

3. Once the file is on unix the session must have some things set correctly:

export LANG=en_US.UTF8

And in putty make sure putty is switched to utf8 mode

image

Now we can type Hungarian characters and even create filenames with them in

image

4. Now we need to make this work inside oracle.

The first (fundamental) thing to ensure is that your database must have been created in a characterset that supports these characters – changing this afterwards (though often possible) can often be a lot of work and a pain to fix (though it is better in newer versions).

In this case I’m using :

select value from nls_database_parameters where parameter='NLS_CHARACTERSET';

VALUE
--------------------------------------------------------------------------------
UTF8

If the db itself is OK we need to set the environment to match as this is not done automatically.

export NLS_LANG=.UTF8

Once that is all in place (and the putty stuff is all done) then this should all now work.

image

There you go – hopefully this helps someone out as its been a real pain for us to deal with…..

As this is for Unicode/utf8 it should work for ‘any’ characters Japanese/Korean etc


2 comments:

  1. Hi Richard,

    The image seems all gone and can't be loaded anymore..... I am really interest in this subject..... Please

    ReplyDelete
  2. Hi Ringo,
    They appear fine for me. Have you tried a different browser maybe?

    Cheers,
    Rich

    ReplyDelete