How can I refine ASR results? Some domain-specific words are not recognized
Hi, I am building an ASR app that recognizes speech in real time. However, I have found that the audio model often fails to recognize domain-specific terms and people's names. With an LLM it is common to supply context; is there a similar solution for ASR models?