Converting Japanese alphabets (Katakana & Hiragana) into Latin script

For a personal project, I needed to play around with the Japanese syllabaries (Hiragana and Katakana, that is). The project involved several simple tasks: converting words written in Katakana or Hiragana into Latin letters, finding strings written in Hiragana or Katakana, and converting voiced sounds into clear sounds (for example が into か). All of these tasks turn out to be fairly simple in PHP. Below is an approach for turning Japanese Hiragana and Katakana into the Latin alphabet:

Converting words written in Katakana or Hiragana into Latin script

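The goal is to transliterate ゴジラ into Gojira, とうきょう into Tōkyō, and so on. This might look trivial, since each kana stands for one syllable, but the combinations with the small characters ゃ, ゅ and ょ have to be matched first; otherwise きょう would be transcribed as “kiyou” where it should be “kyō”. PHP has no built-in function for this, so everything has to go into a lookup array. Because preg_replace() applies an array of patterns in order, the two-kana combinations must come before the single kana.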
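One way to see why the small characters matter, and a possible alternative to ordering the array by hand: PHP's strtr() with an array argument always tries the longest keys first. A minimal sketch, assuming a reduced table that only covers the kana in the examples above (KANA_TABLE and romanizeMini() are hypothetical names, not part of the original code):

```php
<?php
// Hypothetical reduced table: only the kana needed for the examples in the text.
// Single kana are deliberately listed before the digraph to show that
// strtr() does not depend on the order of the entries.
const KANA_TABLE = [
    'き'   => 'ki',
    'ょ'   => 'yo',   // assumption: a stray small ょ is read like よ
    'きょ' => 'kyo',  // digraph: き + small ょ
    'と'   => 'to',
    'う'   => 'u',
    'ご'   => 'go',
    'じ'   => 'ji',
    'ら'   => 'ra',
];

// strtr() with an array argument tries the longest keys first,
// so 'きょ' wins over 'き' followed by 'ょ'.
function romanizeMini(string $kana): string
{
    $roma = strtr($kana, KANA_TABLE);

    // Collapse long vowels once everything is in Latin letters.
    return preg_replace(['/o[ou]/', '/uu/'], ['ō', 'ū'], $roma);
}

echo romanizeMini('とうきょう'), "\n"; // tōkyō
echo romanizeMini('ごじら'), "\n";     // gojira
```

strtr() also never rescans text it has already replaced, so romaji produced by one key cannot be matched again by another. With preg_replace() and an array of patterns, the order of the entries is what decides.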

Note that this function doesn't handle Kanji (Chinese characters); they are left unchanged. Also, as Japanese is written without spaces between words, the result is one large block of text (apart from the spaces the table inserts after ん and around を).
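If your input may contain Kanji, it can help to detect them instead of silently passing them through. A minimal sketch, assuming PCRE's Unicode script property \p{Han}, which PHP's preg functions support together with the /u modifier; hasKanji() and kanjiRuns() are hypothetical helper names:

```php
<?php
// Hypothetical helper: true if the string contains at least one Kanji (Han script).
function hasKanji(string $str): bool
{
    // \p{Han} matches characters of the Han script; /u makes PCRE treat
    // both pattern and subject as UTF-8 instead of single bytes.
    return preg_match('/\p{Han}/u', $str) === 1;
}

// Hypothetical helper: the runs of consecutive Kanji that a kana table leaves untouched.
function kanjiRuns(string $str): array
{
    preg_match_all('/\p{Han}+/u', $str, $matches);
    return $matches[0];
}

var_dump(hasKanji('とうきょう')); // bool(false)
var_dump(hasKanji('東京'));       // bool(true)
echo implode(', ', kanjiRuns('ドイツのビールはやっぱり美味しい')), "\n"; // 美味
```

Running such a check before the conversion tells you whether the romanized result will still contain Chinese characters.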


<?php

// Lookup table for the conversion. The transcription follows the Hepburn
// romanization system. preg_replace() applies the patterns in array order,
// so the order of the entries matters.

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

$kana2roma = array(
	// digraphs: kana + small ゃ, ゅ, ょ (must come before the single kana)
	'/きゃ/' => 'kya', '/きゅ/' => 'kyu', '/きょ/' => 'kyo',
	'/しゃ/' => 'sha', '/しゅ/' => 'shu', '/しょ/' => 'sho',
	'/ちゃ/' => 'cha', '/ちゅ/' => 'chu', '/ちょ/' => 'cho',
	'/にゃ/' => 'nya', '/にゅ/' => 'nyu', '/にょ/' => 'nyo',
	'/ひゃ/' => 'hya', '/ひゅ/' => 'hyu', '/ひょ/' => 'hyo',
	'/みゃ/' => 'mya', '/みゅ/' => 'myu', '/みょ/' => 'myo',
	'/りゃ/' => 'rya', '/りゅ/' => 'ryu', '/りょ/' => 'ryo',
	'/ぎゃ/' => 'gya', '/ぎゅ/' => 'gyu', '/ぎょ/' => 'gyo',
	'/じゃ/' => 'ja',  '/じゅ/' => 'ju',  '/じょ/' => 'jo',
	'/ぢゃ/' => 'ja',  '/ぢゅ/' => 'ju',  '/ぢょ/' => 'jo',
	'/びゃ/' => 'bya', '/びゅ/' => 'byu', '/びょ/' => 'byo',
	'/ぴゃ/' => 'pya', '/ぴゅ/' => 'pyu', '/ぴょ/' => 'pyo',
	// single kana (note the spaces added around を and after ん)
	'/あ/' => 'a',  '/い/' => 'i',   '/う/' => 'u',   '/え/' => 'e',  '/お/' => 'o',
	'/か/' => 'ka', '/き/' => 'ki',  '/く/' => 'ku',  '/け/' => 'ke', '/こ/' => 'ko',
	'/さ/' => 'sa', '/し/' => 'shi', '/す/' => 'su',  '/せ/' => 'se', '/そ/' => 'so',
	'/た/' => 'ta', '/ち/' => 'chi', '/つ/' => 'tsu', '/て/' => 'te', '/と/' => 'to',
	'/な/' => 'na', '/に/' => 'ni',  '/ぬ/' => 'nu',  '/ね/' => 'ne', '/の/' => 'no',
	'/は/' => 'ha', '/ひ/' => 'hi',  '/ふ/' => 'fu',  '/へ/' => 'he', '/ほ/' => 'ho',
	'/ま/' => 'ma', '/み/' => 'mi',  '/む/' => 'mu',  '/め/' => 'me', '/も/' => 'mo',
	'/や/' => 'ya', '/ゆ/' => 'yu',  '/よ/' => 'yo',
	'/ら/' => 'ra', '/り/' => 'ri',  '/る/' => 'ru',  '/れ/' => 're', '/ろ/' => 'ro',
	'/わ/' => 'wa', '/ゐ/' => 'wi',  '/ゑ/' => 'we',  '/を/' => ' o ', '/ん/' => 'n ',
	'/が/' => 'ga', '/ぎ/' => 'gi',  '/ぐ/' => 'gu',  '/げ/' => 'ge', '/ご/' => 'go',
	'/ざ/' => 'za', '/じ/' => 'ji',  '/ず/' => 'zu',  '/ぜ/' => 'ze', '/ぞ/' => 'zo',
	'/だ/' => 'da', '/ぢ/' => 'ji',  '/づ/' => 'zu',  '/で/' => 'de', '/ど/' => 'do',
	'/ば/' => 'ba', '/び/' => 'bi',  '/ぶ/' => 'bu',  '/べ/' => 'be', '/ぼ/' => 'bo',
	'/ぱ/' => 'pa', '/ぴ/' => 'pi',  '/ぷ/' => 'pu',  '/ぺ/' => 'pe', '/ぽ/' => 'po',
	// long vowels written with a second vowel
	'/aa/' => 'ā', '/o[ou]/' => 'ō', '/uu/' => 'ū',
	// Japanese punctuation (full-width parentheses; ・ becomes an HTML entity)
	'/・/' => '&middot;', '/（/' => ' (', '/）/' => ') ', '/、/' => ', ',
	// small っ doubles the following consonant
	'/(っ)+([a-zA-Z]{1})/' => '$2$2',
	// ー lengthens the preceding vowel
	'/aー/' => 'ā', '/iー/' => 'ī', '/uー/' => 'ū', '/eー/' => 'ē', '/oー/' => 'ō',
	// any remaining っ (nothing left to double) becomes an apostrophe
	'/っ/' => "'"
);

 
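// The function:

function kana2roma($str) {
	global $kana2roma;

	// Convert Katakana (full-width and half-width) into Hiragana so that the
	// Hiragana table above covers both scripts: 'H' turns half-width Katakana
	// into Hiragana, 'V' merges separate voiced-sound marks, and 'c' turns
	// full-width Katakana into Hiragana. 'K' is left out because it targets
	// the same half-width characters as 'H', and PHP 8.2 and later throw a
	// ValueError when the two are combined.
	$str = mb_convert_kana($str, 'HVc', 'UTF-8');

	// Apply all patterns in array order.
	$str = preg_replace(array_keys($kana2roma), array_values($kana2roma), $str);

	return $str;
}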
 
 
// The string before the conversion:

$beforeString = "ドイツのビールはやっぱりおいしい";

$afterString = kana2roma($beforeString);

echo $afterString;
// output: doitsunobīruhayapparioishii
 
?>
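Katakana input works because the mb_convert_kana() call maps everything onto the Hiragana table. A small sketch of that normalization step on its own; toHiragana() is a hypothetical name, and the mode 'HVc' is used on the assumption that 'H' (half-width Katakana to Hiragana) makes 'K' unnecessary, since PHP 8.2 and later reject combining the two:

```php
<?php
// Hypothetical helper: normalize Katakana (full-width or half-width) to full-width Hiragana.
// 'H' = half-width Katakana -> full-width Hiragana
// 'V' = merge a separate voiced-sound mark (ﾞ or ﾟ) into the preceding kana
// 'c' = full-width Katakana -> full-width Hiragana
function toHiragana(string $str): string
{
    return mb_convert_kana($str, 'HVc', 'UTF-8');
}

echo toHiragana('ゴジラ'), "\n"; // ごじら
echo toHiragana('ｺﾞｼﾞﾗ'), "\n";  // ごじら (half-width input)
echo toHiragana('ビール'), "\n"; // びーる (the long-vowel mark ー is kept)
```

Keeping ー unchanged is what allows the table's /aー/ … /oー/ rules to turn it into a macron afterwards.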
