I’m currently working on an interesting project that involves lip-syncing a 3D character to Japanese speech. Since it was difficult to find English-language resources on the subject, I decided to write about it.
Firstly, visemes are the distinct facial positions required to produce phonemes, the basic sounds of a particular language. Every language has many phonemes, and because several phonemes can share the same mouth shape, a single viseme can cover multiple phonemes. In English, the phonemes are commonly grouped into about 10 visemes (a-i, e, o, u, c-d-g-k-n-r-s-y-z, f-v, th, l, m-b-p, w-oo-q), one viseme per group.
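To make the grouping concrete, here is a minimal sketch in Python that just transcribes the rough English viseme groups listed above into a lookup table. The group names are my own labels, not any standard notation:

```python
# Rough English viseme groups as listed above; each viseme (mouth shape)
# covers several phonemes. Group labels are my own, purely illustrative.
ENGLISH_VISEME_GROUPS = {
    "AI":  ["a", "i"],
    "E":   ["e"],
    "O":   ["o"],
    "U":   ["u"],
    "CDG": ["c", "d", "g", "k", "n", "r", "s", "y", "z"],
    "FV":  ["f", "v"],
    "TH":  ["th"],
    "L":   ["l"],
    "MBP": ["m", "b", "p"],
    "WOO": ["w", "oo", "q"],
}

def viseme_for(phoneme: str) -> str:
    """Return the viseme group a phoneme belongs to."""
    for viseme, phonemes in ENGLISH_VISEME_GROUPS.items():
        if phoneme in phonemes:
            return viseme
    raise KeyError(f"no viseme group for {phoneme!r}")

print(viseme_for("b"))   # MBP
print(viseme_for("th"))  # TH
```

The point of the table is that the animation rig only needs one mouth pose per group, not one per phoneme.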
When I first began looking into Japanese visemes and had no luck finding resources, I decided to develop my own solution based on my understanding of the language. In my view, Japanese is composed of a few very basic sounds that are combined to make more sounds. Although the phonemes add up, the underlying visemes stay almost the same. Combinations like じゃ (jya) can be formed accurately by blending the visemes of their component sounds, い (i) and あ (a). The following diagram depicts the 5 visemes:
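The decomposition idea can be sketched in a few lines of Python. This is only a sketch of my approach, assuming the five vowel visemes above; the rule that palatalized morae (きゃ, じゃ, しゃ, …) glide through the い (i) viseme before the target vowel is my own simplification:

```python
# Sketch: map one romaji mora to a sequence of the 5 Japanese vowel
# visemes (a, i, u, e, o). Assumption: palatalized morae like "jya"
# are animated as a brief "i" viseme blending into the final vowel.
VOWEL_VISEMES = {"a", "i", "u", "e", "o"}

def visemes_for_mora(romaji: str) -> list[str]:
    """Return the viseme sequence for one romaji mora, e.g. 'jya' -> ['i', 'a']."""
    vowel = romaji[-1]
    if vowel not in VOWEL_VISEMES:
        return []  # syllabic n, small tsu, etc.: mouth stays near neutral
    onset = romaji[:-1]
    # Palatalized or "y"-glide onsets pass through the 'i' mouth shape first.
    if "y" in onset or onset in {"sh", "ch", "j"}:
        return ["i", vowel]
    return [vowel]

print(visemes_for_mora("jya"))  # ['i', 'a']
print(visemes_for_mora("ka"))   # ['a']
```

In an actual rig, each returned viseme would become a blend-shape keyframe, with the glide pose held much shorter than the vowel pose.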
Again, I believe these 5 basic visemes let you construct every mouth pose required for Japanese speech. Unfortunately, the character I am lip-syncing will most likely never have a tongue. This matters because Japanese speech uses the tongue more than other languages do (I’m thinking of English, but similar Latin-based languages fall into that category) and requires less movement of the lips to make the language’s basic phonemes.
Finally, near the end of my research on this topic, I found this webpage that explains everything I just described and more: http://www.ordix.com/pfolio/research/
I don’t particularly like the visemes for the first two sounds, as the character’s teeth stick out far too much for my liking. It is entirely possible that the character is making the correct sound, but when I try the phoneme in her pose, it feels quite strange.