  • An end-to-end Vietnamese speech recognition recipe using the ESPnet toolkit

    Open resources for Vietnamese language processing are scarce. In this latest effort, I introduce a recipe for end-to-end Vietnamese speech recognition using the ESPnet toolkit. On the VIVOS corpus, the error rate is high compared with conventional systems because there is not enough training data. The recipe is available in the ESPnet repository and can easily be adapted to other Vietnamese corpora.

  • A curated list of Japanese, Korean and Vietnamese open speech corpora

    This is a curated list of open speech corpora in Japanese, Korean and Vietnamese for academic use. While speech processing systems have improved rapidly for major languages such as English and Chinese, development for other languages has been far less active. The list is meant to make it easier to jump-start a speech processing project and to spark interest in research and development of speech processing systems for these languages.

  • The quest for a programmable English dictionary

    Ever wanted to add a dictionary feature to your application but didn’t know where to start? It turns out the Internet has everything you need to build your own English dictionary, whether for personal use or to integrate into your products.

  • An insight into Vietnamese syllable usage

    Because Vietnamese is written in the Latin alphabet with additional letters and diacritics, it is easy and common to mix English words into Vietnamese text. This can be problematic for language processing systems, which must handle not only Vietnamese itself but also any foreign words that may appear.

  • All syllables in the Vietnamese language

    Vietnamese is a monosyllabic language, which means each syllable is written separately. Even though words can have one or more syllables, you can write any Vietnamese word just by knowing the set of all syllables. But how many syllables are there in the Vietnamese language? This post sets out to answer that question.
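Because each Vietnamese syllable is written as a separate token, a rough syllable inventory can be collected from raw text by splitting it into runs of letters. The following is a minimal Python sketch under that assumption, not the method used in the posts above; the function names `syllables`, `syllable_counts` and `looks_foreign` are hypothetical. It also flags tokens containing f, j, w or z, letters the Vietnamese alphabet does not use, as a crude hint that a word may be borrowed (for example, mixed-in English).

```python
import re
import unicodedata
from collections import Counter

# Latin letters that are not part of the Vietnamese alphabet (assumed heuristic).
NON_VIETNAMESE_LETTERS = set("fjwz")

# A run of letters only: no digits, underscores, punctuation or whitespace.
LETTER_RUN = re.compile(r"[^\W\d_]+")


def syllables(text: str) -> list[str]:
    """Hypothetical helper: split text into lowercase, whitespace-separated syllables."""
    # NFC turns base letter + combining diacritics into precomposed characters,
    # so "ệ" typed in decomposed form matches the precomposed "ệ" and is
    # recognised as a single letter by the regex.
    text = unicodedata.normalize("NFC", text).lower()
    return LETTER_RUN.findall(text)


def syllable_counts(text: str) -> Counter:
    """Hypothetical helper: count how often each syllable occurs."""
    return Counter(syllables(text))


def looks_foreign(token: str) -> bool:
    """Crude heuristic: a token with f, j, w or z is probably not native Vietnamese.

    Many English words (e.g. "can", "son") pass this check, so it only
    catches a subset of foreign words.
    """
    return any(ch in NON_VIETNAMESE_LETTERS for ch in token)


if __name__ == "__main__":
    sample = "Tôi thích đọc sách và dùng Facebook mỗi ngày."
    print(syllables(sample))
    print([t for t in syllables(sample) if looks_foreign(t)])
```

Counting over a large text corpus gives an empirical inventory of the syllables actually in use, but names, loanwords and typos will show up too, so the result still needs filtering.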