Simon Willison Blog·2026年4月23日 01:45·約2分で読める

Qwen3.6-27B：270億パラメータの密型モデルでフラッグシップ級コーディング性能

#大規模言語モデル(LLM)#Qwen #オープンソース #ローカル推論 #GGUF量子化

TL;DR

Qwenが公開した27BパラメータのDenseモデル「Qwen3.6-27B」は、ローカル環境での量子化推論によりフラッグシップ級のコーディング性能を実現し、大規模MoEモデルを上回るベンチマーク結果を示した。

AI深層分析2026年4月23日 02:07

重要/ 5段階

深度40%

キーポイント

モデル性能とサイズ比

27BパラメータのDenseモデルが、397B/17B active MoEモデルを上回るコーディングベンチマークスコアを達成。

ローカル推論の実装例

llama-serverとunslothのQ4_K_M量子化版を用いた具体的なCLIコマンドとパラメータ設定が公開されている。

推論速度と出力品質

~25 tok/sの生成速度で、複雑なSVG画像生成タスクを局所環境でも高品質に実行可能。

影響分析・編集コメントを表示

影響分析

27Bクラスの軽量モデルがフラッグシップ級のコーディング・Agentic性能を達成したことは、ローカル環境やエッジデバイスでの大規模AI導入ハードルを大幅に下げる。特にGGUF量子化とllama.cppとの親和性が高いため、開発者は低コストで高度なAIエージェントを構築・デプロイ可能となる。これはオープンソースエコシステムにおけるモデル最適化の新たな基準を示す。

編集コメント

開発者向けの実装手順とベンチマーク結果を具体的に示した実践的なレポートであり、ローカル推論エコシステムの成熟度を裏付ける重要な事例と言える。

Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

Qwenによる最新オープンウェイトモデル（open weight model）に関する大胆な主張：

Qwen3.6-27Bは、主要なコーディングベンチマークすべてにおいて前世代のオープンソースフラッグシップモデル Qwen3.5-397B-A17B（総パラメータ数397B / 活性化MoE（Mixture of Experts）17B）を上回る、フラッグシップレベルのエージェント型コーディング性能（agentic coding performance）を実現します。

Hugging Face上では Qwen3.5-397B-A17B が807GBであるのに対し、この新しい Qwen3.6-27B は55.6GBです。

私はまず brew install llama.cpp で llama-server をインストールした後、benob on Hacker News によるこのレシピを使って、16.8GBの Unsloth Qwen3.6-27B-GGUF:Q4_K_M 量子化バージョン（quantized version）と llama-server を試しました：

code

llama-server \
    -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
    --no-mmproj \
    --fit on \
    -np 1 \
    -c 65536 \
    --cache-ram 4096 -ctxcp 2 \
    --jinja \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0 \
    --reasoning on \
    --chat-template-kwargs '{"preserve_thinking": true}'

初回実行時、この ~17GB のモデルは ~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF に保存されました。

「Generate an SVG of a pelican riding a bicycle」のトランスクリプトはこちらです。16.8GBのローカルモデルにとってこれは *素晴らしい* 結果です：

llama-server が報告したパフォーマンス数値：

読み込み（Reading）: 20 トークン (tokens), 0.4秒, 54.32 tokens/s

生成（Generation）: 4,444 トークン (tokens), 2分53秒, 25.57 tokens/s

参考までに、Generate an SVG of a NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER も試しました（以前 GLM-5.1 で実行済み）：

こちらは 6,575 トークン (tokens)、4分25秒、24.74 t/s でした。

Via Hacker News

Tags: ai, generative-ai, local-llms, llms, qwen, pelican-riding-a-bicycle, llama-cpp, llm-release, ai-in-china

原文を表示

Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

Big claims from Qwen about their latest open weight model:

Qwen3.6-27B delivers flagship-level agentic coding performance, surpassing the previous-generation open-source flagship Qwen3.5-397B-A17B (397B total / 17B active MoE) across all major coding benchmarks.

On Hugging Face Qwen3.5-397B-A17B is 807GB, this new Qwen3.6-27B is 55.6GB.

I tried it out with the 16.8GB Unsloth Qwen3.6-27B-GGUF:Q4_K_M quantized version and llama-server using this recipe by benob on Hacker News, after first installing llama-server using brew install llama.cpp:

code

llama-server \
    -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
    --no-mmproj \
    --fit on \
    -np 1 \
    -c 65536 \
    --cache-ram 4096 -ctxcp 2 \
    --jinja \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0 \
    --reasoning on \
    --chat-template-kwargs '{"preserve_thinking": true}'

On first run that saved the ~17GB model to ~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF.

Here's the transcript for "Generate an SVG of a pelican riding a bicycle". This is an *outstanding* result for a 16.8GB local model:

Bicycle has spokes, a chain and a correctly shaped frame. Handlebars are a bit detached. Pelican has wing on the handlebars, weirdly bent legs that touch the pedals and a good bill. Background details are pleasant - semi-transparent clouds, birds, grass, sun.

Performance numbers reported by llama-server:

Reading: 20 tokens, 0.4s, 54.32 tokens/s

Generation: 4,444 tokens, 2min 53s, 25.57 tokens/s

For good measure, here's Generate an SVG of a NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER (run previously with GLM-5.1):

Digital illustration in a neon Tron-inspired style of a grey cat-like creature wearing cyan visor goggles riding a glowing cyan futuristic motorcycle through a dark cityscape at night, with its long tail trailing behind, silhouetted buildings with yellow-lit windows in the background, and a glowing magenta moon on the right.

That one took 6,575 tokens, 4min 25s, 24.74 t/s.

Via Hacker News

Tags: ai, generative-ai, local-llms, llms, qwen, pelican-riding-a-bicycle, llama-cpp, llm-release, ai-in-china

この記事をシェア

Latent Space★42026年6月5日 15:44

[AINews] 今日は何も大きな出来事はありませんでした

Anthropic が RSI の兆候を示し、OpenAI の ChatGPT が月間アクティブユーザー数で 10 億人を突破。SpaceX AI は IPO について説明しているが、最も重要なのは AIE WF のチケット確保とイベント参加である。

Cloudflare Blog★42026年6月4日 21:59

Vite 開発元 VoidZero が Cloudflare に参画

Vite や Vitest を開発する企業「VoidZero」がクラウドプロバイダー「Cloudflare」に合流し、同社全従業員も Cloudflare の一員となる。ただし、主要プロジェクトは引き続きオープンソースとして運営される方針を示した。

Ars Technica AI★42026年6月4日 04:10

Google の新モデル「Gemma 4 12B」は 16GB RAM のノート PC で動作可能に設計

Google は、メモリ消費を抑えた新しい生成 AI モデル「Gemma 4 12B」を発表した。このモデルは、一般的な消費者向けノートパソコン（RAM 16GB）でも実行できるように最適化されており、ローカルでの AI 利用を促進するものである。

ニュース一覧に戻る元記事を読む

Simon Willison Blog·2026年4月23日 01:45·約2分で読める

Qwen3.6-27B：270億パラメータの密型モデルでフラッグシップ級コーディング性能

#大規模言語モデル(LLM)#Qwen #オープンソース #ローカル推論 #GGUF量子化

TL;DR

AI深層分析2026年4月23日 02:07

重要/ 5段階

深度40%

キーポイント

モデル性能とサイズ比

27BパラメータのDenseモデルが、397B/17B active MoEモデルを上回るコーディングベンチマークスコアを達成。

ローカル推論の実装例

llama-serverとunslothのQ4_K_M量子化版を用いた具体的なCLIコマンドとパラメータ設定が公開されている。

推論速度と出力品質

~25 tok/sの生成速度で、複雑なSVG画像生成タスクを局所環境でも高品質に実行可能。

影響分析・編集コメントを表示

影響分析

編集コメント

Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

Qwenによる最新オープンウェイトモデル（open weight model）に関する大胆な主張：

Hugging Face上では Qwen3.5-397B-A17B が807GBであるのに対し、この新しい Qwen3.6-27B は55.6GBです。

code

llama-server \
    -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
    --no-mmproj \
    --fit on \
    -np 1 \
    -c 65536 \
    --cache-ram 4096 -ctxcp 2 \
    --jinja \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0 \
    --reasoning on \
    --chat-template-kwargs '{"preserve_thinking": true}'

初回実行時、この ~17GB のモデルは ~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF に保存されました。

「Generate an SVG of a pelican riding a bicycle」のトランスクリプトはこちらです。16.8GBのローカルモデルにとってこれは *素晴らしい* 結果です：

llama-server が報告したパフォーマンス数値：

読み込み（Reading）: 20 トークン (tokens), 0.4秒, 54.32 tokens/s

生成（Generation）: 4,444 トークン (tokens), 2分53秒, 25.57 tokens/s

参考までに、Generate an SVG of a NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER も試しました（以前 GLM-5.1 で実行済み）：

こちらは 6,575 トークン (tokens)、4分25秒、24.74 t/s でした。

Via Hacker News

Tags: ai, generative-ai, local-llms, llms, qwen, pelican-riding-a-bicycle, llama-cpp, llm-release, ai-in-china

原文を表示

Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

Big claims from Qwen about their latest open weight model:

Qwen3.6-27B delivers flagship-level agentic coding performance, surpassing the previous-generation open-source flagship Qwen3.5-397B-A17B (397B total / 17B active MoE) across all major coding benchmarks.

On Hugging Face Qwen3.5-397B-A17B is 807GB, this new Qwen3.6-27B is 55.6GB.

code

llama-server \
    -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
    --no-mmproj \
    --fit on \
    -np 1 \
    -c 65536 \
    --cache-ram 4096 -ctxcp 2 \
    --jinja \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0 \
    --reasoning on \
    --chat-template-kwargs '{"preserve_thinking": true}'

On first run that saved the ~17GB model to ~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF.

Here's the transcript for "Generate an SVG of a pelican riding a bicycle". This is an *outstanding* result for a 16.8GB local model:

Performance numbers reported by llama-server: