2 posts tagged “unicode”
Last night I spent 5 hours on basic R&D for my upcoming product, a Unicode-based phonetic parser for the Bangla language. The parser interprets keystrokes phonetically (or, more precisely, by transliteration); for example, when you type "amar" you will see "AMAR" in Bangla script.
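To make the idea concrete, here is a minimal sketch of phonetic transliteration in PHP. It is not the parser described in this post: the keymap is a tiny hypothetical one, and the names CONSONANTS, VOWELS and transliterate() are made up for illustration. It only shows the core rule that a vowel key becomes a kar (dependent sign) after a consonant and an independent vowel letter otherwise. Conjuncts, Hasant, Ref and the Folas are left out entirely.

```php
<?php
// Hypothetical, deliberately tiny keymap; a real layout needs many more entries.
const CONSONANTS = [
    'k' => 'ক', 'b' => 'ব', 'm' => 'ম',
    'n' => 'ন', 'r' => 'র', 'l' => 'ল',
];

// Each vowel key maps to [independent letter, dependent sign (kar)].
const VOWELS = [
    'a' => ['আ', 'া'],
    'i' => ['ই', 'ি'],
    'u' => ['উ', 'ু'],
    'e' => ['এ', 'ে'],
];

function transliterate(string $keys): string
{
    $out = '';
    $afterConsonant = false; // true when the previous key produced a consonant

    // Input is assumed to be plain ASCII keystrokes, so byte-wise splitting is safe.
    foreach (str_split($keys) as $ch) {
        if (isset(CONSONANTS[$ch])) {
            $out .= CONSONANTS[$ch];
            $afterConsonant = true;
        } elseif (isset(VOWELS[$ch])) {
            // Use the kar after a consonant, the independent vowel elsewhere.
            $out .= VOWELS[$ch][$afterConsonant ? 1 : 0];
            $afterConsonant = false;
        } else {
            // Unknown keys (spaces, punctuation) pass through unchanged.
            $out .= $ch;
            $afterConsonant = false;
        }
    }

    return $out;
}

echo transliterate('amar'), "\n"; // prints আমার
```

Note that the rendering engine, not this code, is responsible for drawing a preceding kar such as "ি" to the left of its consonant; the sketch only emits code points in logical order.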
I was stuck on a strange problem, but I solved it. Conjuncts, vowels and consonants, preceding Kars and following Kars are now working fine. I still need to spend some time on "Khanda To", "Ref", "Hasanth" and "Jo Fola". "Oi kar", "Ou kar" and "II Kar" also need to be processed with special care.
Thanks to Paul Nelson from Microsoft (well, yes, it's Microsoft) and to the developers of the Unicode renderers for *nix distros for their Unicode rendering engines.
If everything goes well, I will release my script from Ekushey.org under a CC/LGPL dual license.
Well, just setting the collation to "utf8_general_ci" is not enough to store and retrieve Bangla Unicode text properly. There is a bit more to do. Last night, while working on a project, I found the solution.
You must run these two statements right after selecting the database, i.e. right after the mysql_select_db() call.
mysql_query('SET CHARACTER SET utf8');
mysql_query("SET SESSION collation_connection ='utf8_general_ci'");
After executing these two statements MySQL will handle the rest.
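The same two statements can be wrapped in a small helper so they run on every new connection. The sketch below is only one way to do it and assumes a few things that are not in the original post: the helper names utf8SessionStatements() and applyUtf8Session() are hypothetical, and the commented usage relies on the mysqli extension rather than the mysql_* functions used above (those were removed in PHP 7). The connection credentials are placeholders.

```php
<?php
// Returns the two statements from the post, in the order they should run.
function utf8SessionStatements(): array
{
    return [
        // Sets the client and result character sets to utf8.
        'SET CHARACTER SET utf8',
        // Sets the connection collation (and, implicitly, its character set).
        "SET SESSION collation_connection = 'utf8_general_ci'",
    ];
}

// Runs each statement through a caller-supplied query function, so the
// helper does not depend on a particular database extension.
function applyUtf8Session(callable $runQuery): void
{
    foreach (utf8SessionStatements() as $sql) {
        $runQuery($sql);
    }
}

// Usage with mysqli (assumed setup, placeholder credentials):
// $db = new mysqli('localhost', 'user', 'password', 'mydb');
// applyUtf8Session(function ($sql) use ($db) {
//     $db->query($sql);
// });
```

Taken together, the two statements leave the client, connection and result character sets all at utf8 for the session, which is why the text survives the round trip.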