Cohere Announces Open Source Speech Model for Edge Devices
Cohere has announced Cohere Transcribe, a 2 billion parameter open source speech recognition model that can be deployed on edge devices.
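Key Points
Open source speech model released
Cohere has released a speech recognition model called Cohere Transcribe as open source.
Designed for edge devices
The model is designed to run directly on edge devices rather than in the cloud.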
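Model size
At 2 billion parameters, the model is small enough to be deployed on edge devices, and Cohere says it outperforms alternatives on the Hugging Face Open ASR Leaderboard.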
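Impact Analysis
This announcement is significant in that a large language model company is making a serious entry into speech recognition while combining two important trends: open sourcing and edge deployment. It could accelerate the practical adoption of edge AI and offer a new option for developing privacy-focused voice applications.
Editorial Comment
A 2 billion parameter open source speech model for edge devices is notable from a practical standpoint, but because comparisons with existing models and concrete use cases are lacking, its impact should be assessed cautiously at this stage.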
Cohere is looking to capitalize on an enterprise trend of embedding automatic speech recognition into applications with a 2 billion parameter open source speech model.
Cohere Transcribe, introduced on Thursday, is trained on 14 languages, including Chinese, Japanese, Polish, French and Greek. Cohere released the model under the Apache 2.0 license and said the model outperforms alternatives on the Hugging Face Open ASR Leaderboard, including ElevenLabs Scribe and Qwen3. The model will soon be integrated into Cohere's AI agent orchestration platform, North, according to the company.
Cohere Transcribe is an example of the evolution of speech recognition models. Previously, speech models were designed using deep learning techniques such as long short-term memory, recurrent neural networks, and later, transformer-based architectures, which struggled to achieve low latency because of model size. New models such as Transcribe, however, are small enough to be deployed on edge devices.
As the technology, infrastructure and capabilities have matured, ASR use cases have expanded, especially in customer service, banking, sales and marketing, which has led to an increase in ASR models from vendors such as IBM and Alibaba.
Even video conferencing company Zoom has joined the competition. In 2025, the video conferencing platform provider introduced AI Companion 3.0, which included real-time voice translation capability. It later introduced a separate feature that allowed participants to hear exchanges in their own language.
"Speech is always going to be fundamental to AI," said Lian Jye Su, an analyst at Omdia, a division of Informa TechTarget. "That's how the whole AI movement started, because humans started to be able to interact with Siri."
He pointed to a couple of Cohere Transcribe's features as being noteworthy, including its small size and the company's decision to make the model open source.
"When it's open source, you get developers to test it and then they will come back to you if they find the result to be good enough," Su said. "Then you can obviously commercialize a much better model."
Meta has found success with this business model, influencing others such as Alibaba and Nvidia to follow suit.
"Cohere is trying to copy that," Su said. But the company is focused on an area where it excels: speech recognition and speech-to-text models, he added.
While Cohere has traditionally focused on text generation, it could find an opportunity within speech recognition, especially as some enterprises look to upgrade traditional speech models that use transformers to the growing line of small ASR models that can be used on edge devices, Su continued.
About the Author
Esther Shittu, News Writer, AI Business
Esther Shittu brings four years of expertise covering artificial intelligence technologies and industry trends. As co-host of the "Targeting AI" podcast, she talks to thought leaders and practitioners exploring critical AI developments. Before AI Business, she wrote for several publications including the New York Daily News, Bklyner and the Brooklyn Daily Eagle. When she's not diving deep into the world of AI, she spends her time on passion projects and raising her three daughters.