TechCrunch AI · June 4, 2026, 00:00 · About a 7-minute read

Two founders who left Goldman Sachs and Meta are building voice AI for markets that others overlooked

#Voice AI #Small Language Models #Edge Computing #Emerging Markets

TL;DR

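Founders who left Goldman Sachs and Meta built their own models to handle the complex dialects and low-latency requirements of Africa and the Middle East, and raised $3 million in funding.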

AI in-depth analysis · June 12, 2026, 23:09

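Key points

Focus on underserved markets and fundraising

AethexAI, a voice AI startup built for markets that major players have overlooked, such as Africa and the Middle East, has raised $3 million in pre-seed funding led by 4DX Ventures.

Cutting latency with its own models

Rather than relying on existing orchestration tools such as Vapi and LiveKit, the company built its own small models and orchestration layer from scratch to handle region-specific dialects and low-bandwidth environments.

Founder backgrounds and the market problem

The founders, who previously worked at Goldman Sachs and Meta, started the company to solve a practical problem: the high latency and jitter seen in local call centers.

Developer platform rollout

Alongside a platform where enterprises can try the technology, the company is releasing APIs and SDKs so that developers can experiment with its models.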

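Small models and an in-house data collection strategy

To solve the latency problem while maintaining accuracy, the company developed its lightweight Kora series, with 300 million to 1.7 billion parameters. It collects diverse audio data at low cost by shipping hard drives to radio stations across Africa and by using a network of university students for annotation.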

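Market-specific challenges and deployment approach

Because existing systems designed for Western markets cannot handle local dialects and code-switching, the company hires forward-deployed engineers on a contract basis and partners with telecom providers, building infrastructure for local markets where plug-and-play solutions do not work.

Phased customer onboarding and main use cases

Instead of offering everything at once, the company asks each customer to start with the single most important use case (such as debt collection, customer activation, or KYC verification) and uses onsite demos and workshops to identify where automation fits.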

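Impact analysis

This article points to a shift in voice AI adoption, away from Western-centric models and toward localized optimization that accounts for regional infrastructure constraints and linguistic diversity. Against the prevailing focus on ever-larger models, it presents small models and in-house orchestration as a way to cut latency and make voice AI practical in emerging markets, a reminder of how much regional adaptation matters in global AI deployment strategy.

Editorial comment

Spreading voice AI takes more than raising model performance: an architecture adapted to local network infrastructure and linguistic diversity is essential. For companies looking to expand into emerging markets, this case study offers useful lessons.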

Customer support and service are among the hottest sectors in voice AI right now. But building a product that sounds human and responds without noticeable delay turns out to be much harder in some markets than others — and most of the major players weren’t built with Africa and the Middle East in mind.

AethexAI, a startup founded last year to close that gap, has raised $3 million in pre-seed funding led by 4DX Ventures, with participation from Enza Capital, Dorm Room Fund, Mojo Ventures, and Stanford GSB 26 Fund. Individual investors include Stanford faculty, telecom executives, and AI researchers from Anthropic.

Rather than using existing orchestration tools like Vapi and LiveKit, the company built its own small model and orchestration layer from scratch to handle the localized dialects of English, French, and Arabic spoken across its target markets — a decision driven, as we’ll get to, by the particular demands of operating in the region.

The company is also launching its platform for enterprises to try out its tech and sign up for its services, along with APIs and SDKs for developers to experiment with its models.

The startup was founded by Mariama Diallo and Ayooluwa Odemuyiwa. CEO Diallo worked at Goldman Sachs and later joined YC-backed ModelML as a product and growth hire. CTO Odemuyiwa graduated from Caltech, worked at Meta, and enrolled at Stanford Business School before co-founding the company. The pair wanted to build something for emerging markets and started looking for opportunities.

Businesses around the world are racing to adopt AI tools to automate parts of their operations. But that doesn’t always work out. In Egypt, a call center automated a significant share of its calls, but rolled the system back because of poor results, the founders found. Several support centers in Africa told them that finding and hiring engineers to automate calls at the right cost was a persistent headache.

“The latency and jitter that we saw on automated calls in this region were outrageous. If we had become orchestrators, we might have had to use large models that were hosted outside the region, resulting in higher latency. We realized that in order for this to work, we have to use very small models and cut latency at every step,” Odemuyiwa told TechCrunch about the decision to build the company’s own models and orchestration layer.
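The latency argument in the quote above can be made concrete with a rough per-turn budget. The sketch below is illustrative only: the stage names, the per-stage timings, and the network round-trip figures are hypothetical placeholders, not AethexAI measurements, and the model assumes a fixed number of network round trips per conversational turn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    compute_ms: float  # hypothetical processing time for this stage


def turn_latency_ms(stages, network_rtt_ms, round_trips=1):
    """Total latency for one conversational turn: compute time plus network time."""
    compute = sum(stage.compute_ms for stage in stages)
    return compute + network_rtt_ms * round_trips


# Every number below is a made-up placeholder chosen only to show the arithmetic.
small_in_region = [
    Stage("speech-to-text", 80),
    Stage("small model", 120),
    Stage("text-to-speech", 90),
]
large_out_of_region = [
    Stage("speech-to-text", 80),
    Stage("large model", 600),
    Stage("text-to-speech", 90),
]

in_region_ms = turn_latency_ms(small_in_region, network_rtt_ms=30)
out_of_region_ms = turn_latency_ms(large_out_of_region, network_rtt_ms=250)

print(f"small model, in region:     {in_region_ms:.0f} ms")
print(f"large model, out of region: {out_of_region_ms:.0f} ms")
```

Under these assumed figures, both the model stage and the network hop contribute to the gap, which is consistent with the stated goal of cutting latency at every step rather than in one place.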

AI labs that deploy their latest models usually spend millions training them and acquiring data. AethexAI found a solution for both. Rather than chasing the largest possible models, it decided that small models are enough to tackle the latency problem while maintaining accuracy and developed its own Kora series, with parameters ranging from 300 million to 1.7 billion. That’s a fraction of the size of the LLMs, which is precisely the point.
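To give a sense of scale for the 300 million to 1.7 billion parameter range, the sketch below estimates the memory needed just to hold the weights. The parameter counts come from the article; the numeric precisions are assumptions, since the article does not say how the Kora models are stored or served, and the estimate ignores activations, caches, and runtime overhead.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_footprint_gb(num_params, precision):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Parameter range reported for the Kora series.
KORA_RANGE = {"smallest": 300_000_000, "largest": 1_700_000_000}

for label, params in KORA_RANGE.items():
    for precision in BYTES_PER_PARAM:
        size = weight_footprint_gb(params, precision)
        print(f"{label} ({params:,} params) at {precision}: {size:.1f} GB")
```

At an assumed 16-bit precision, the range works out to roughly 0.6 GB to 3.4 GB of weights.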

To train these models, the startup used anonymized recordings from a call center partner. It also shipped hard drives to radio stations across Africa to collect more audio data. To keep costs down, it built a contributor network of university students to annotate data and pronounce local names. As a result, the startup says, it’s now handling more than 17,000 calls per day.
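The 17,000-calls-per-day figure can be turned into a rough load estimate. The sketch below is a back-of-the-envelope calculation: the daily volume is the lower bound reported by the startup, while the mean call duration is a hypothetical value chosen for illustration, and the concurrency estimate uses Little's law (average concurrency = arrival rate × mean duration).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_calls_per_second(calls_per_day):
    """Average arrival rate if calls were spread evenly across the day."""
    return calls_per_day / SECONDS_PER_DAY


def average_concurrent_calls(calls_per_day, mean_call_seconds):
    """Little's law: average concurrency = arrival rate * mean call duration."""
    return average_calls_per_second(calls_per_day) * mean_call_seconds


calls_per_day = 17_000       # lower bound reported by the startup
assumed_call_seconds = 180   # hypothetical mean call length, not from the article

rate = average_calls_per_second(calls_per_day)
concurrency = average_concurrent_calls(calls_per_day, assumed_call_seconds)

print(f"average arrival rate: {rate:.3f} calls/s")
print(f"average concurrent calls (assuming {assumed_call_seconds} s each): {concurrency:.1f}")
```

These are daily averages. Call traffic is rarely spread evenly, so peak concurrency would likely be higher than this estimate.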

On the business side, the company is taking care to walk clients who are new to voice AI through the process, offering onsite demos and workshops to help them identify the best use cases for automation.

“We always tell customers that we cannot be everything for everybody right now. We’re small. When we start talking to a company, we ask them to pick one use case that is the most important to them to start [with],” Diallo said.

The startup is open to working across all industries, but at the moment, a big part of its use cases involves calls for debt collection, customer activation, or KYC — Know Your Customer verification, the standard identity-checking process used by banks and telecoms. The company is hiring forward-deployed engineers on a contract basis to serve local markets and building channel partnerships with telecoms providers to handle telephony for voice AI calls. Plug-and-play solutions, it says, simply won’t work here.

Walter Baddoo, co-founder and managing partner of 4DX Ventures, argues that the Africa and Middle East market is fundamentally different from the markets most voice AI companies were built to serve.

“Enterprises in Africa and the Middle East process roughly three times the call volume of their Western counterparts, as voice is still the dominant channel for customer interaction,” he said. “Incumbent systems were built for Western markets characterized by high-end GPU infrastructure, standard English and European speech environments, and enterprise workflows common in the U.S. and Europe. That creates real gaps when enterprises need systems that handle dialects, code-switching, and informal speech patterns, and that work within their existing telephony infrastructure and their actual price points.”

Put another way, while companies like ElevenLabs, Deepgram, Sierra, and Cognigy are expanding globally at a fast pace, the markets they were built for and the markets they are entering aren’t always the same thing. Startups like AethexAI are betting that the gaps — models specialized in local dialects, on-the-ground partnerships, infrastructure built for the region — represent a market opening that the giants have neither the incentive nor the architecture to close.
