The Pronunciation Toolkit: 20+ Resources Every Language Learner Should Know
Ear training, shadowing, ASR feedback, pronunciation lookup, tonal-language tools, and three tool types I would stop using now.
Over the past weeks, while working on this pronunciation series, I started putting together a list of tools I know, use, recommend, or find interesting from a research perspective. I am sure there are more. Probably many more. So if you use a pronunciation tool that deserves a place here, please leave it in the comments. I would love this post to become a living resource for serious adult learners.
If you are new to this pronunciation series, you may want to start with my previous free post, Why Pronunciation Feels So Hard in a New Language, where I explain why adult pronunciation is so difficult in the first place: the ear filters unfamiliar sounds, the brain pulls them into old categories, and the mouth keeps returning to familiar habits.
This post is a practical toolkit for adult language learners who want to train pronunciation without getting lost in the app jungle. I’ll walk you through 20 useful resources, grouped by their primary function: ear training, shadowing, ASR feedback, visualization, and pronunciation lookup.
And if you are learning Mandarin, Vietnamese, Thai, Cantonese, Yoruba, or another tonal language, you’ll also find a few extra tools for tone training, where pronunciation needs a slightly different approach.
I’ll also show you three kinds of tools I would stop using right now, because they feel productive but rarely lead to real change.
Just to be clear, I am not affiliated with any of these companies, and this is not a sponsored post.
Here is my list for now.
Category 1: Ear training (high-variability phonetic training)
If your ears cannot yet hear a sound clearly, your mouth has no chance.
One of the most evidence-backed pronunciation techniques is high-variability phonetic training (HVPT): hearing a target sound spoken by many different voices, in many different contexts, until your brain builds a reliable category for it.
The most recent meta-analysis covered seventy-nine studies and found very large effects on perception that transferred to production (Uchihara, Karas, & Thomson, 2025).
These are the tools that actually do HVPT properly.
1. English Accent Coach (englishaccentcoach.com).
Free, web-based, research-built. Designed by Ron Thomson at Brock University, a co-author of the HVPT meta-analysis cited above. It is a game-style training tool that drills you on English vowels and consonants spoken by dozens of voices, and it has been used in dozens of published studies. The version 4.3 update added a mobile-friendly experience. If you are learning English, start here.
2. Forvo (forvo.com).
Free.
The world’s largest database of human-recorded word pronunciations, across hundreds of languages, with multiple speakers per word in most cases. Not a training tool in the strict sense, but functionally the easiest way to get HVPT-style multi-voice exposure for any word you want to learn.
Search for a word, listen to five or six different speakers, notice how the sound varies, and your brain starts building the right category.
3. YouGlish (youglish.com).
Free.
Search any word in English (or French, Spanish, German, Russian, Italian, Portuguese, Mandarin, Japanese, Korean, Polish, Dutch, and a growing list), and YouGlish pulls up YouTube clips of native speakers saying that exact word in context. Twenty, thirty, fifty clips for common words. Variability is built in because the speakers are different and the contexts are real. This is the single most underused tool in adult language learning.
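For readers who like to see the idea in code, here is a minimal sketch of what "high variability" means in practice. It is not how English Accent Coach, Forvo, or YouGlish work internally; the function names, the word pair, and the speaker labels are all made up for illustration. The point is only that every target word is paired with every voice, and the order is shuffled so your ear cannot lean on one familiar speaker.

```python
import random

def build_trials(words, speakers, repeats=1, seed=None):
    """Return a shuffled list of (word, speaker) listening trials.

    Every word is paired with every speaker, so no sound is ever
    tied to a single voice. All names here are hypothetical.
    """
    rng = random.Random(seed)
    trials = [(w, s) for w in words for s in speakers] * repeats
    rng.shuffle(trials)
    return trials

def score(trials, answers):
    """Fraction of trials where the learner picked the word that was played."""
    correct = sum(1 for (word, _), answer in zip(trials, answers) if word == answer)
    return correct / len(trials)

# Example: a minimal pair heard from five (made-up) speakers.
trials = build_trials(["ship", "sheep"], ["anna", "ben", "chloe", "dev", "emi"], seed=7)
for word, speaker in trials:
    print(f"play recording of '{word}' by {speaker}, then ask: ship or sheep?")
```

You can do the same thing by hand: pick two words you confuse, collect five or six Forvo recordings of each, and quiz yourself in random order.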
Category 2: Shadowing and active imitation
Once your ears can hear the target, the next step is connecting perception to production.
Shadowing (repeating a model speaker’s speech in near-real-time, matching their rhythm and melody) is one of the most-studied techniques in the field, and recent classroom research keeps confirming that ASR-supported shadowing produces real pronunciation gains for adult learners (Le Thi My Duyen, 2025; Conti, 2025).
4. Language Reactor (languagereactor.com).
Free with a paid Pro tier. A browser extension that adds bilingual subtitles and learner-friendly controls to Netflix and YouTube. You can pause, repeat, slow down, loop a single sentence, and shadow line by line. Works for most major languages.
5. eJOY English (ejoy-english.com).
Specifically built around shadowing. The platform pulls video clips with subtitles, lets you record yourself shadowing a line, and gives ASR feedback on how close you got.
A 2025 classroom study with Vietnamese adult learners found that eJOY-supported shadowing produced clear pre-post gains in pronunciation of word-final consonants over two months (Le Thi My Duyen, 2025).
6. Speechling (speechling.com).
Combines ASR with human coach feedback. You record yourself repeating a sentence, and the platform sends your recording to a real human coach who responds (usually within a day) with corrections and tips. It supports multiple languages, with a free tier that limits recordings and paid plans that offer unlimited submissions and faster turnaround.
The human-coach element is rare in the AI era and genuinely valuable.
7. Glossika (glossika.com).
Sentence-based audio practice that uses spaced repetition. You hear a native speaker say a sentence, you repeat it, and the system schedules it again at increasing intervals. The methodology is good for combining pronunciation with grammar in context. Available for over sixty languages.
8. Trancy AI (trancy.org).
A newer browser extension that overlays bilingual subtitles on YouTube and Netflix and adds AI Talk and AI Shadowing features with sentence-level auto-pause. Works in the player itself rather than as a separate workflow.
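A quick aside on the spaced repetition mentioned under Glossika. The sketch below shows the general idea of "increasing intervals" and nothing more: it is not Glossika's actual algorithm, and the doubling factor and the reset-to-one-day rule are my own simplifying assumptions.

```python
from datetime import date, timedelta

def next_interval(previous_days: int, recalled: bool, factor: float = 2.0) -> int:
    """Days until the next review of a sentence (assumed rule, not Glossika's)."""
    if not recalled:
        return 1  # missed it: see it again tomorrow
    return max(1, round(previous_days * factor))  # got it: wait longer next time

def schedule(start: date, results: list[bool]) -> list[date]:
    """Review dates for one sentence, given a sequence of pass/fail results."""
    interval = 1
    due = start + timedelta(days=interval)
    dates = [due]
    for recalled in results:
        interval = next_interval(interval, recalled)
        due = due + timedelta(days=interval)
        dates.append(due)
    return dates

# Example: two good repetitions, one miss, then one good repetition.
for review_day in schedule(date(2025, 1, 1), [True, True, False, True]):
    print(review_day.isoformat())
```

The gaps between reviews grow while you keep getting the sentence right and shrink back when you miss it, which is the whole trick.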
Category 3: ASR feedback (your mouth, scored)
Automatic speech recognition tools listen to you produce a word and score how close you got to a target.
ASR-based feedback produces medium-to-large gains in pronunciation accuracy, especially for segmentals and especially at intermediate levels (Ngo et al., 2024).
The caveats matter. ASR is good at scoring individual phonemes, less reliable for suprasegmentals like rhythm and intonation, and sometimes too strict (it marks pronunciations as wrong that native listeners would accept).
9. ELSA Speak (elsaspeak.com).
The heavyweight of phoneme-level pronunciation apps, with over 25 million users and AI that breaks your speech down to the individual sound.
English only. The phoneme precision is unmatched, the personalized study plans actually work, and the CEFR-level tracking is useful. The complaints are real: it can be frustratingly strict, the drill format produces fatigue, and it is biased toward American English (British, Australian, and other accents sometimes get marked down for being themselves).
For targeted segmental work, it is genuinely effective. For conversational fluency, you need other tools.
10. SpeechAce (speechace.com).
A more research-oriented ASR engine, used by some IELTS prep platforms and language schools. Less consumer-facing than ELSA, but the scoring is more transparent.
11. BoldVoice (boldvoice.com).
English only. It pairs ASR feedback with short video lessons from Hollywood accent coaches who actually train actors. It recently raised $21 million, which suggests serious investor confidence in the approach.
Best for intermediate English learners who want to refine specific sounds with both visual articulation modeling and scoring.
12. Hello Nabu (hellonabu.com).
Newer (2026 launch), free, AI-feedback-driven, and designed around story-based conversational practice rather than isolated drills. The contextual approach addresses one of the biggest limitations of ELSA-style apps: drills often do not transfer to spontaneous speech.
13. ChatGPT Voice / Claude voice mode.
Free (with paid tiers). I include these with caveats. Conversational AI voice modes do not provide pronunciation feedback in the technical sense, but they do let you have real-feeling conversations in dozens of languages with no judgment and no time pressure. Useful for transfer-to-spontaneous-speech work once your segmentals are in shape. Not a substitute for ASR-scored practice.
Category 4: Visualization tools (for the curious)
Some learners benefit from seeing their pronunciation, not just hearing it. The acoustic side of speech (pitch contours, formants, voicing) can be visualized, and such visual representations help some learners understand exactly what their mouth is doing differently from the target.
14. Praat (praat.org).
Free. The standard tool in academic phonetics, developed at the University of Amsterdam. You can record yourself and a native speaker saying the same word, then see waveforms, pitch contours, and spectrograms side by side. Steep learning curve, but if you are curious about why your version of a word sounds different from the native one, Praat will show you with millisecond precision.
15. Sounds of Speech (soundsofspeech.uiowa.edu).
University of Iowa, free, web-based. Interactive animations of how the mouth, tongue, and vocal folds produce each sound in English, German, and Spanish. Useful for visual learners who want to understand the mechanics of articulation before drilling production.
16. The Interactive IPA Chart (ipachart.com).
Free. Click any IPA symbol and hear it spoken by a phonetician. Useful as a reference once you start using IPA in your training notes (my paid pronunciation series shows you how to do that).
Category 5: Reference and lookup
These are not training tools, but they live alongside training. You will reach for them constantly when a single word stumps you.
17. Cambridge Dictionary (dictionary.cambridge.org).
Free. Every entry has both British and American audio, plus IPA transcription. The audio is professionally recorded and clean.
18. Forvo (forvo.com).
Covered above as a training tool, but use it for one-off pronunciation lookups too.
19. Oxford Learner’s Dictionary (oxfordlearnersdictionaries.com).
Free. Similar to Cambridge but with more attention to phrasal verbs, collocations, and learner-friendly examples. Audio for both British and American.
20. WordReference (wordreference.com).
Free. One of the most useful dictionary sites for language learners, especially if you study Romance languages such as Italian, Spanish, French, or Portuguese. I love it and use it for Spanish and Italian.
It provides translations, example phrases, audio pronunciations for many entries, verb conjugations, and, perhaps most importantly, forum discussions where native speakers explain tricky usage questions. It is not a pronunciation training tool, but it is excellent for checking how a word is actually used, how it sounds, and which translation fits the context.
Tonal Languages Need a Different Pronunciation Toolkit
If you are learning Mandarin, Vietnamese, Thai, Yoruba, Cantonese, or any tonal language, pronunciation training has an extra layer of difficulty. You are not only training individual sounds, stress, rhythm, and intonation. You are also training tone, which means your pitch movement can change the meaning of the word itself.
This is why many of the tools above will only take you partway. They may help with vowels, consonants, word lookup, shadowing, or general listening, but they are mostly designed for non-tonal languages. If you are learning a tonal language, your most urgent question is often not “Did I pronounce the sound correctly?” but “Did I produce the right pitch movement?”
For tonal training, you need multimodal support: sound, tone marks, pitch contours, visual feedback, gestures, and, ideally, human correction.
The research on this shows that gesture-supported tone training can outperform ear-only practice because tone is not just something learners hear. It is something they often need to see, feel, and physically map onto movement (Morett, 2023; Farran & Morett, 2024).
So, if you are training a tonal language, I would treat these tools as a separate category.
Pleco is probably the strongest learner dictionary for Mandarin. It gives you character lookup, pinyin with tone marks, native-speaker audio, example sentences, and add-ons for more serious study. It is not a full pronunciation-training system, but for checking tones, audio, characters, and usage, it is one of the most useful tools a Mandarin learner can have.
MDBG Chinese Dictionary is another useful Mandarin dictionary, especially if you want a clean web-based lookup tool with characters, pinyin, tone marks, definitions, and example-based support. It is less powerful than Pleco as a full learner ecosystem, but very convenient when you want to check a word quickly.
ChinesePod can also be useful because it gives you sentence-level listening, pinyin support, and lots of contextual audio. For tone learning, this matters because tones do not live in isolation. They shift, connect, and weaken inside real speech.
Mandarin Blueprint is worth mentioning for learners who want a more structured Mandarin system, especially because it pays attention to pronunciation, tones, characters, memory, and sentence-building as connected parts of the same learning process.
Pinyin Mate and other tone-trainer apps can help because they show pitch movement visually as you speak. This is especially useful when you think you are producing a rising tone, but your voice is actually staying flat, falling too early, or starting in the wrong pitch range.
For Vietnamese, I would use VDict and Forvo together. VDict is useful for dictionary lookup, while Forvo lets you hear real speakers. That matters a lot because Vietnamese tones vary across regions, and one clean dictionary recording will not give you enough variability.
For Thai, thai-language.com is a strong starting point because it provides dictionary entries with pronunciation, tone, transliteration, and example sentences. Again, I would pair it with Forvo or YouGlish-style listening whenever possible, because tonal pronunciation needs repeated exposure to real voices.
For Cantonese, look for tools that include Jyutping, tone numbers, and native-speaker audio. CantoDict and Jyut Dictionary are useful starting points. A dictionary without tone information is not enough. You need to see the tone category and hear it repeatedly in real words and phrases.
And for almost any tonal language, Forvo remains useful because it gives you native-speaker recordings across many languages. It will not teach you tones systematically, but it can help you compare how real speakers pronounce the same word.
The main point is this: if you are learning a tonal language, do not rely on audio alone, and definitely do not rely only on “repeat after me” practice. You need tone marks, visual pitch information, many voices, sentence-level listening, and, ideally, feedback from a teacher or tutor who understands tone training.
For tonal languages, the multimodal piece is not optional. Plan for it from the beginning.
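If you are curious what "visual pitch feedback" boils down to, here is a rough sketch. It assumes you already have a list of pitch values in hertz for one syllable (for example, exported from Praat), and it simply compares the start of the contour with the end. The function names and the two-semitone threshold are my own assumptions, not taken from any of the apps above, and a real tone trainer does far more (dipping tones such as Mandarin tone 3 would need the middle of the contour too).

```python
import math

def semitone_change(f0_hz):
    """Pitch change in semitones between the start and end of a contour.

    Expects a non-empty list of positive pitch values in Hz.
    Averages the first and last third to smooth out single bad frames.
    """
    third = max(1, len(f0_hz) // 3)
    start = sum(f0_hz[:third]) / third
    end = sum(f0_hz[-third:]) / third
    return 12 * math.log2(end / start)

def classify_contour(f0_hz, threshold=2.0):
    """Label a contour as rising, falling, or flat (threshold is an assumption)."""
    change = semitone_change(f0_hz)
    if change > threshold:
        return "rising"
    if change < -threshold:
        return "falling"
    return "flat"

# Example: you aimed for a rising tone; what did your voice actually do?
my_attempt = [200, 202, 199, 201]  # made-up pitch values in Hz
print(classify_contour(my_attempt))
```

Working in semitones rather than raw hertz matters here, because the same musical interval covers a different number of hertz in a low voice than in a high one.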
Three categories of tools to stop using right now
After the recommendations, the warnings. These are categories of tools that many learners use faithfully but that produce few real gains.
Generic “perfect your accent” apps with one voice.
If the app has one voice modeling each sound, your brain builds a category that fits only that voice. The moment another speaker produces the same sound slightly differently, you may not recognize it.
This violates the best-established principle in pronunciation training: variability. The HVPT meta-analysis (Uchihara et al., 2025) showed dramatically larger effects for multi-voice training than for single-voice training. Single-voice apps do not train your pronunciation in general. They train your imitation of one specific speaker.
Mirror-only practice.
Standing in front of a mirror and watching your own mouth move is a tool for actors and speech therapists, not for adult language learners working on their own. The mirror gives you visual feedback on lip and jaw position, but it gives you no auditory feedback, no comparison to a native model, and no objective measure of whether you actually produced the sound.
You can spend an hour in front of a mirror, feeling like you are working hard, and produce no change at all in how you actually sound.
Repeat-after-me apps with no feedback.
Some apps just play you a model speaker and ask you to repeat. No scoring, no comparison, no correction. These are exposure tools, not training tools. They might build your listening comprehension, but they do almost nothing for your pronunciation, because you have no way of knowing whether what you produced was right.
The minimum bar for a real pronunciation training tool in 2026 is some form of feedback: ASR-scored production, comparison to your own recording, or a human coach. If the tool only plays a model and waits for you to mimic it, the brain has no signal to learn from.
The tool is not the system
A great tool, used without a strategy, will not improve your pronunciation. A mediocre tool used inside a strategy will.
The diagnostic-and-strategy work I published on Thursday tells you which of these tools to reach for first, given your specific target language, your first language, and your specific problem areas.
Without that ordering, the toolkit above is just a list. With it, the list becomes a sequence: ears first (HVPT), then individual sounds (segmentals), then rhythm and stress (suprasegmentals), then melody (intonation), then shadowing to bridge perception and production, then transfer to real conversation.
On Thursday next week, I will publish a comprehensive pronunciation strategy guide, The Six Layers of Training (Part 2 of the paid system), which walks through each layer with specific strategies and implementation steps and tells you which of the tools above belongs to which layer.
For this weekend, two things to try.
If you do not have a current pronunciation training routine, open English Accent Coach in your browser if you are learning English, or set up Forvo and YouGlish bookmarks if you are learning another language.
Spend ten minutes today listening to one sound that you suspect you cannot yet perceive cleanly, played by five or six different voices. Notice what your ears do.
If you already have a training routine, run a quick audit: which tools are you actually using, and which category does each belong to? Are you working on perception, production, or transfer? Are you working with one voice or many? Do you get feedback, or are you flying blind?
A toolkit becomes a system the moment you can answer those questions.
I hope this collection saves you time, money, and months of drilling the wrong thing.
Good luck with your pronunciation training!
Thank you, as always, for reading and supporting this work!
I’d love to hear from you in the comments: What pronunciation tools have helped you most so far? Have you tried YouGlish, ELSA Speak, Speechling, Forvo, Google pronunciation feedback, a dictionary with audio, a tutor, or something else entirely?
If this felt useful, feel free to share the post and publication with someone who might need it too.
References
Conti, G. (2025, July 26). Shadowing for fluency, prosody, and listening comprehension: The what, why, and how according to SLA research. The Language Gym.
Farran, B. M., & Morett, L. M. (2024). Multimodal cues in L2 lexical tone acquisition: Current research and future directions. Frontiers in Education, 9, 1410795.
Le Thi My Duyen. (2025). The impact of shadowing technique using eJOY on improving the final sound pronunciation. Proceedings of International Academia.
Morett, L. M. (2023). When and why is gesture informative for L2 lexical tone learning? Insights from N400 event-related potentials. Bilingualism: Language and Cognition, 26(5), 999-1015.
Ngo, T., Saito, K., & Tierney, A. (2024). The effects of automatic speech recognition feedback on second language pronunciation: A meta-analysis. Studies in Second Language Acquisition.
Thomson, R. I. (2025). English Accent Coach (Version 4.3) [Computer program]. www.englishaccentcoach.com
Uchihara, T., Karas, M., & Thomson, R. I. (2025). High variability phonetic training (HVPT): A meta-analysis of L2 perceptual training studies. Studies in Second Language Acquisition.