The Decoder · April 8, 2026, 03:52 · about 1 min read

Study finds Google's AI Overviews are right nine times out of ten

#SearchAI #LLM #AIReliability #Google #GenerativeAI #FactChecking

TL;DR

The Decoder reports on a study finding that Google's AI Overviews are correct about nine times out of ten, providing empirical data on the reliability of AI-generated search results.

AI in-depth analysis · April 8, 2026, 04:40


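Key points

Among the first findings on the accuracy of AI Overviews

A study of the accuracy of AI Overviews, Google's AI-generated search answers, found them correct roughly nine times out of ten: 91 percent with Gemini 3, up from 85 percent with Gemini 2.

Gap between Google's disclaimer and reality

Google shows a disclaimer that AI responses "may include mistakes," but how often those mistakes actually occur had barely been studied.

Why evaluating the reliability of AI-generated content matters

The study provides an important data point for assessing the usefulness and reliability of search features built on large language models.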


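Impact analysis

The findings are significant because they put concrete numbers on the reliability of AI-generated search results, something that had rarely been measured. They give users a basis for judging how much to trust AI answers and may influence the adoption and spread of AI search features.

Editorial comment

Concrete figures on the accuracy of AI-generated content are valuable, but the results need careful interpretation given the limits of the methodology, such as answers graded by an AI model and a benchmark built from deliberately hard questions.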


This article, "Google's AI Overviews are right nine times out of ten, study finds," was first published on The Decoder.


Google puts a disclaimer under every AI-generated search response: "AI responses may include mistakes." But just how often those mistakes actually happen has remained largely unstudied.

On behalf of the New York Times, AI startup Oumi examined 4,326 Google searches using the industry-standard SimpleQA benchmark. The tests ran in two rounds: once in October with Gemini 2 powering the AI, and again in February after the upgrade to Gemini 3.

The findings: with Gemini 2, AI Overviews were correct 85 percent of the time. With Gemini 3, that number climbed to 91 percent. That sounds impressive, but at Google's scale, it still means millions of wrong answers every hour.
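To make the scale argument concrete, here is a minimal sketch of the arithmetic. The accuracy figures and the sample size come from the study as reported above. The sketch assumes each round covered all 4,326 searches, which the article does not state explicitly, and the hourly query volume is a purely hypothetical placeholder, since the article gives no figure for how many AI Overviews Google serves.

```python
# Accuracy figures and sample size as reported in the article.
SAMPLE_SIZE = 4326
ACCURACY = {"Gemini 2": 0.85, "Gemini 3": 0.91}


def expected_wrong(queries: int, accuracy: float) -> int:
    """Expected number of wrong answers among `queries` responses."""
    return round(queries * (1.0 - accuracy))


# HYPOTHETICAL volume, chosen only for illustration; not from the study.
ASSUMED_OVERVIEWS_PER_HOUR = 50_000_000

for model, acc in ACCURACY.items():
    in_sample = expected_wrong(SAMPLE_SIZE, acc)
    per_hour = expected_wrong(ASSUMED_OVERVIEWS_PER_HOUR, acc)
    print(f"{model}: ~{in_sample} wrong in the sample, "
          f"~{per_hour:,} wrong per hour (assumed volume)")
```

Under these assumptions the sample contains roughly 649 wrong answers at 85 percent accuracy and 389 at 91 percent. The per-hour figure scales linearly with whatever volume is plugged in, so "millions per hour" only requires a volume in the tens of millions.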

What the study doesn't address is whether users would have gotten better answers through traditional search results or other sources. Not everything on websites is automatically correct either. The real question is whether users end up with more correct information overall than they would without Google's AI Overviews.

Accuracy is up, but verifiability is down

Another key finding: while accuracy improved with Gemini 3, verifiability actually got worse. Oumi checked whether the sources Google linked actually supported the answers it gave. With Gemini 2, 37 percent of correct answers were "ungrounded," meaning the linked websites didn't fully back up the information. With Gemini 3, that figure jumped to 56 percent. Often, there's simply no way to verify an answer based on the source Google provides.
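Combining the two reported rates shows why the drop in verifiability matters. The sketch below multiplies accuracy by the share of correct answers that were grounded. It assumes both percentages refer to the same set of answers in each round, which the article implies but does not spell out; the function and dictionary names are illustrative.

```python
# Figures as reported in the article: overall accuracy, and the share of
# correct answers whose linked sources did not fully support them.
ROUNDS = {
    "Gemini 2": {"accuracy": 0.85, "ungrounded_given_correct": 0.37},
    "Gemini 3": {"accuracy": 0.91, "ungrounded_given_correct": 0.56},
}


def correct_and_grounded(accuracy: float, ungrounded_given_correct: float) -> float:
    """Share of all answers that are both correct and backed by their sources."""
    return accuracy * (1.0 - ungrounded_given_correct)


for model, r in ROUNDS.items():
    share = correct_and_grounded(r["accuracy"], r["ungrounded_given_correct"])
    print(f"{model}: {share:.1%} of all answers were correct and verifiable")
```

On these numbers, the share of answers that were both correct and verifiable from the linked sources falls from about 54 percent to about 40 percent, even though headline accuracy rose.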

The quality of those sources is questionable too. Out of 5,380 sources Google cited, Facebook and Reddit ranked second and fourth most common. Facebook showed up as a source in five percent of correct answers and seven percent of incorrect ones. Google may have an incentive to favor sources that are less likely to sue over content use.

The New York Times highlights several examples of how things can go wrong even when the system locates the right source. In a question about the Classical Music Hall of Fame, Google identified the correct website listing Yo-Yo Ma as a member but still claimed there was no record of his induction.

When asked about the river west of Goldsboro, North Carolina, Google found the right tourism website but misread the information, naming the Neuse River instead of the actual Little River to the west.

And for a question about the Bob Marley Museum, Google's AI Overview gave the wrong opening year (1987 instead of 1986), pulling from a Facebook post, a travel blog, and a Wikipedia page with conflicting information.

Google pushes back on the study's methods

To verify answers at scale, Oumi used its own AI verification model, HallOumi. That's the only practical way to check thousands of responses, but it comes with an obvious weakness: the AI doing the checking can make mistakes too. Moreover, AI Overviews can generate different answers for identical searches, even when queries are just seconds apart.
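How much an imperfect grader distorts the result depends on its error rates, which the article does not report. As a rough sketch under made-up rates: if the grader wrongly passes some wrong answers and wrongly fails some correct ones, the reported accuracy shifts as shown below. The function name and the example rates are hypothetical and do not describe HallOumi.

```python
def measured_accuracy(true_accuracy: float,
                      false_pass_rate: float,
                      false_fail_rate: float) -> float:
    """Accuracy an imperfect grader would report.

    false_pass_rate: share of wrong answers the grader marks correct.
    false_fail_rate: share of correct answers the grader marks wrong.
    """
    passed_correct = true_accuracy * (1.0 - false_fail_rate)
    passed_wrong = (1.0 - true_accuracy) * false_pass_rate
    return passed_correct + passed_wrong


# HYPOTHETICAL grader error rates, for illustration only.
for fp, fn in [(0.0, 0.0), (0.10, 0.05), (0.30, 0.02)]:
    reported = measured_accuracy(0.91, fp, fn)
    print(f"false pass {fp:.0%}, false fail {fn:.0%} "
          f"-> reported accuracy {reported:.1%}")
```

The point of the sketch is only that grader errors can push the reported figure in either direction, so a headline accuracy number carries the grader's uncertainty with it.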

Google spokesperson Ned Adriance called the study flawed, saying it has "serious holes." The SimpleQA benchmark itself contains incorrect information and doesn't reflect what people actually search for on Google, he said.

Despite its name, SimpleQA, developed by OpenAI, is built around particularly tricky questions, ones where at least one AI model failed during a pre-screening process. That means the failure rate is naturally higher. The benchmark is also designed for scenarios without internet access.

In the Artificial Analysis Intelligence Index, Google's latest model, Gemini 3.1 Pro, shows a 38 percentage point drop in hallucination rate compared to the earlier Gemini 3, which was likely running as a less capable Flash version in Google's search at the time of testing. Google says results with web search are more accurate than those based purely on model knowledge.

The real issue is what AI answers are doing to the open web

The bigger debate around Google's AI Overviews is about what they're doing to the internet. By serving up direct answers instead of sending users to external websites, Google is cutting off traffic to publishers and undermining their economic foundation.

The open web is losing its role as a freely linked information network, increasingly replaced by a centralized AI interface under Google's control. A 90 percent accuracy rate is likely more than enough for most users and most searches to skip clicking through to the underlying website altogether.

Studies showing that AI Overviews hurt web traffic have consistently been denied by Google, which has yet to share any numbers of its own. Even OpenAI was more upfront when it first launched web features for ChatGPT, stating that "we appreciate that this is a new method of interacting with the web, and welcome feedback on additional ways to drive traffic back to sources and add to the overall health of the ecosystem," though that concern quietly faded as its search rollout progressed.
