KDnuggets·2026年6月16日 21:00·約7分で読める

Pandas でループを書かない：試すべき 7 つの高速代替案

#データ分析 #パフォーマンス最適化 #Python #Pandas

TL;DR

KDnuggets は、Pandas データ処理におけるループ使用の非効率性を指摘し、ベクトル化や組み込み関数など 7 つの実践的な高速代替手法を詳述している。

AI深層分析2026年6月16日 22:02

注目/ 5段階

深度40%

キーポイント

ループ処理のボトルネック解消

Python の標準ループはオーバーヘッドが大きく、Pandas のベクトル化操作に比べて著しく低速であることを示唆し、パフォーマンス改善の必要性を強調している。

7 つの高速代替手法の紹介

apply 関数の最適化、groupby の活用、NumPy ベースの計算、および pandas の組み込みメソッドなど、具体的な 7 つの実装パターンを提示している。

可読性と速度の両立

コードの簡潔さを損なわずに実行速度を劇的に向上させるためのベストプラクティスとして、ベクトル化アプローチの採用を推奨している。

影響分析・編集コメントを表示

影響分析

この記事は、データサイエンティストやエンジニアにとって、大規模データを扱う際のコードパフォーマンスを根本から改善するための具体的な指針となる。ループからの脱却を促すことで、計算リソースの削減と処理時間の短縮という実務的なメリットをもたらす。

編集コメント

AI モデルの学習前処理やデータクリーニングにおいて、この「ループ禁止」の原則は計算コスト削減の鍵となります。実務で即座に適用可能な知見です。

image**

# イントロダクション

行ごとの反復処理は、pandas コードにおける最も一般的なパフォーマンスのボトルネックの一つです。小規模なデータセットでは目立たないこともありますが、大規模データの処理においては大きな影響を及ぼします。

pandas は NumPy を基盤として構築されており、コンパイルされた C コードを使用して配列全体に対して一度に演算を実行します。Python で行をループ処理することはこれを完全に迂回し、すべての操作を Python インタープリタに戻すことになります — 1 行ずつです。

この記事では、pandas におけるループの代替手段として 7 つの方法を取り上げます。それぞれが異なる種類の変換に適しています。最後には、問題の形状に応じてどのツールを使うべきかという明確な思考マップを得られるでしょう。

Colab ノートブックは GitHub で入手できます。

# サンプルデータのセットアップ

本記事全体を通じて、現実的な e コマース注文データセットを使用します：

import pandas as pd

import numpy as np

np.random.seed(42)

n = 100_000

categories = ['Electronics', 'Clothing', 'Home & Kitchen', 'Sports', 'Books']

regions = ['North', 'South', 'East', 'West']

df = pd.DataFrame({

'order_id': range(1, n + 1),

'customer_age': np.random.randint(18, 70, n),

'product_category': np.random.choice(categories, n),

'region': np.random.choice(regions, n),

'price': np.round(np.random.uniform(5.0, 500.0, n), 2),

'quantity': np.random.randint(1, 10, n),

'days_to_ship': np.random.randint(1, 14, n),

})

display(df.head())

Output:

これで、作業に使える 10 万件の行を持つデータセットが完成しました。

# 1. 算術演算にはベクトル化演算を使用する

列に対する任意の算術演算や比較においては、ベクトル化演算 (Vectorized Operations) がまず最初に思い浮かぶべき手法です。

私たちが行いたいこと: 注文ごとの総売上高を計算する。

df['revenue'] = df['price'] * df['quantity']

display(df[['price', 'quantity', 'revenue']].head())

Output:

# 2. 条件付きロジックには関数の適用を使用する

単純な算術演算では表現できない論理を含む変換を行う場合、.apply() を使用すると、列または行に対して関数を渡すことができます。

私たちが行いたいこと: 配送までの日数に基づいて配送優先度のラベルを割り当てる。

def shipping_label(days):

if days <= 2:

return 'Express'

elif days <= 5:

return 'Standard'

else:

return 'Economy'

df['shipping_tier'] = df['days_to_ship'].apply(shipping_label)

display(df[['days_to_ship', 'shipping_tier']].head())

Output:

. apply() を使用すると、コードが清潔で読みやすく、ループよりもはるかにデバッグしやすいです。ロジックが条件分岐であり、np.where() や np.select() がネストしすぎてしまう場合にこれを使用してください。

# 3. 二値条件には np.where() を使用する

二値条件（真の場合と偽の場合でそれぞれ異なる結果を返す場合）があるときは、np.where() が清潔かつ高速な選択肢です。

実行したいこと: 顧客がシニア割引の対象となる注文にフラグを立てることです。

df['senior_discount'] = np.where(df['customer_age'] >= 60, True, False)

display(df[['customer_age', 'senior_discount']].head())

Output:

np.where() は完全にベクトル化されており、単純な真偽条件においては .apply() よりも大幅に高速です。これはベクトル化された三項演算子と考えることができます。

# 4. np.select() を使用して複数の条件から選択する

2 つ以上の条件がある場合、np.select() を使用すると、ネストされた if/elif 連鎖を一切必要とせずに、条件のリストと対応する値を定義できます。

私たちが行いたいこと**: リージョンベースの税率を割り当てることです。

conditions = [

df['region'] == 'North',

df['region'] == 'South',

df['region'] == 'East',

df['region'] == 'West',

]

tax_rates = [0.08, 0.06, 0.07, 0.09]

df['tax_rate'] = np.select(conditions, tax_rates, default=0.07)

df['tax_amount'] = df['price'] * df['tax_rate']

display(df[['region', 'price', 'tax_rate', 'tax_amount']].head())

出力:

np.select() は条件を順番に評価し、最初の一致するものを選択します。default パラメータは一致しないものを処理するため、安全網として有用です。

# 5. ディクショナリ参照による値のマッピング

カラム内の値を変換する場合 — カテゴリ名を数値コードにマッピングしたり、キーをラベルに置き換えたりする場合など — .map() をディクショナリと併用すると、すっきりとして高速です。

私たちが行いたいこと**: 製品カテゴリを内部部門コードにマッピングすることです。

category_codes = {

'Electronics': 'ELEC',

'Clothing': 'CLTH',

'Home & Kitchen': 'HOME',

'Sports': 'SPRT',

'Books': 'BOOK',

}

df['dept_code'] = df['product_category'].map(category_codes)

display(df[['product_category', 'dept_code']].head())

出力:

.map() は照合テーブルとして機能します。これは pandas において最も過小評価されているツールの一つです—私たちは .map(dict) が同じことをより高速に実行するにもかかわらず、.apply(lambda x: dict[x]) を使用しがちです。

# 6. .str アクセサによる文字列操作

文字列操作は、人々が最も頻繁にループや .apply() に頼ってしまう分野です。.str アクセサを使用すれば、これらいずれも使わずに、列全体に対して文字列操作を実行できます。

私たちがやりたいこと**: product_category 列から最初の単語を抽出し、それを小文字に変換することです。

df['category_slug'] = df['product_category'].str.split().str[0].str.lower()

display(df[['product_category', 'category_slug']].head())

出力:

# 7. .groupby() を用いたグループ集約

一般的なループのパターンは、データのサブセットを反復処理してグループレベルの統計量を計算することです。.groupby() はこれをネイティブで処理します。

私たちがやりたいこと**: 製品カテゴリごとの総売上高と出荷までの平均日数を計算することです。

summary = (

df.groupby('product_category')

.agg(

total_revenue=('revenue', 'sum'),

avg_ship_days=('days_to_ship', 'mean'),

order_count=('order_id', 'count')

)

.round(2)

.reset_index()

)

summary

# 適切なツールの選択

ループで記述するほとんどの変換は、以下のいずれかのパターンにきれいに収まります:

操作 / メソッド | 使用例 / 説明

---|---

列の算術演算 | DataFrame の列に対して、加算、減算、乗算、除算などのベクトル化された数学演算を直接実行します。

ベクトル化演算 (*, + など) | 明示的なループなしで、効率的に全列に要素ごとの演算を適用します。

単純な真偽条件 | ブール条件を評価してフィルタリングしたり、条件付きカラムを作成したりします。

np.where() | 配列や DataFrame の列に対して、条件分岐 (if-else) ロジックをベクトル化された方法で適用します。

複数の条件と複数の結果 | 複数のルールと出力を持つ複雑な条件ロジックを処理します。

np.select() | 複数の条件に基づいて値を選択し、対応する出力を返します。

ルックアップによる値の置換 | 変換用辞書を使用して値を置き換え、高速な変換を実現します。

.map(dict) | シリーズ内の値を辞書または関数でマッピングして置換します。

.apply() | カスタム関数を行単位または列単位に適用し、柔軟な変換を行います。

文字列操作

.str アクセサーを使用してベクトル化された文字列操作を行い、テキストデータのクリーニングと変換を行ってください。

.groupby() + .agg()

data をグループ化し、合計、平均、カウントなどの集計統計を計算します。

行ではなく列で考えるようになると、pandas API が単なる回避策というよりも、実際に意図された作業方法のように感じられるようになります。

Bala Priya C はインド出身のエンジニア兼テクニカルライターです。数学、プログラミング、データサイエンス、コンテンツ制作が交差する領域での作業を好んでいます。彼女の興味分野と専門知識には、DevOps、データサイエンス、自然言語処理が含まれます。読書、執筆、コーディング、そしてコーヒーを楽しむのが好きです。現在、チュートリアル、ハウツーガイド、意見記事などを執筆することで、開発者コミュニティに知識を共有し、学習に取り組んでいます。また、魅力的なリソースの概要やコーディングチュートリアルも作成しています。

原文を表示

Stop Writing Loops in pandas: 7 Faster Alternatives to Try

# Introduction

Row-by-row iteration is one of the most common performance bottlenecks in pandas code. On small datasets it goes unnoticed, but for processing large datasets, this becomes impactful.

pandas is built on top of NumPy, which executes operations on entire arrays at once using compiled C code. Looping through rows in Python bypasses that entirely and forces every operation back into the Python interpreter — one row at a time.

This article covers 7 alternatives to loops in pandas, each suited to a different kind of transformation. By the end, you'll have a clear mental map of which tool to reach for depending on the shape of the problem.

You can get the Colab notebook on GitHub.

# Setting Up the Sample Dataset

We'll use a realistic e-commerce orders dataset throughout this article:

code

import pandas as pd
import numpy as np

np.random.seed(42)
n = 100_000

categories = ['Electronics', 'Clothing', 'Home & Kitchen', 'Sports', 'Books']
regions = ['North', 'South', 'East', 'West']

df = pd.DataFrame({
    'order_id': range(1, n + 1),
    'customer_age': np.random.randint(18, 70, n),
    'product_category': np.random.choice(categories, n),
    'region': np.random.choice(regions, n),
    'price': np.round(np.random.uniform(5.0, 500.0, n), 2),
    'quantity': np.random.randint(1, 10, n),
    'days_to_ship': np.random.randint(1, 14, n),
})
display(df.head())

Output:

We now have a dataset of 100,000 rows to work with.

# 1. Using Vectorized Operations for Arithmetic

For any arithmetic or comparison on a column, vectorized operations should be your first instinct.

What we want to do**: calculate the total revenue per order.

code

df['revenue'] = df['price'] * df['quantity']
display(df[['price', 'quantity', 'revenue']].head())

Output:

Using Vectorized Operations for Arithmetic

# 2. Applying a Function for Conditional Logic

When your transformation involves some logic that can't be expressed as plain arithmetic, .apply() lets you pass a function over a column or row.

What we want to do**: assign a shipping priority label based on days to ship.

code

def shipping_label(days):
    if days <= 2:
        return 'Express'
    elif days <= 5:
        return 'Standard'
    else:
        return 'Economy'

df['shipping_tier'] = df['days_to_ship'].apply(shipping_label)
display(df[['days_to_ship', 'shipping_tier']].head())

Output:

Applying a Function for Conditional Logic

Using .apply() is clean, readable, and far easier to debug than a loop. Use it when your logic is conditional and np.where() or np.select() feels too nested.

# 3. Using np.where() for Binary Conditions

When you have a binary condition — one outcome if true, another if false — np.where() is the clean, fast choice.

What we want to do**: flag orders where the customer qualifies for a senior discount.

code

df['senior_discount'] = np.where(df['customer_age'] >= 60, True, False)
display(df[['customer_age', 'senior_discount']].head())

Output:

np.where() is fully vectorized and significantly faster than .apply() for simple true or false conditions. Think of it as a vectorized ternary operator.

# 4. Selecting Across Multiple Conditions with np.select()

When you have more than two conditions, np.select() lets you define a list of conditions and corresponding values without any need for nested if/elif chains.

What we want to do**: assign a region-based tax rate.

code

conditions = [
    df['region'] == 'North',
    df['region'] == 'South',
    df['region'] == 'East',
    df['region'] == 'West',
]
tax_rates = [0.08, 0.06, 0.07, 0.09]

df['tax_rate'] = np.select(conditions, tax_rates, default=0.07)
df['tax_amount'] = df['price'] * df['tax_rate']
display(df[['region', 'price', 'tax_rate', 'tax_amount']].head())

Output:

Selecting Across Multiple Conditions with np.select()

np.select() evaluates all conditions in order and picks the first match. The default parameter handles anything that doesn't match, which is useful as a safety net.

# 5. Mapping Values with a Dictionary Lookup

When you need to translate values in a column — like mapping category names to numeric codes, or replacing keys with labels — .map() with a dictionary is clean and fast.

What we want to do**: map product categories to internal department codes.

code

category_codes = {
    'Electronics': 'ELEC',
    'Clothing': 'CLTH',
    'Home & Kitchen': 'HOME',
    'Sports': 'SPRT',
    'Books': 'BOOK',
}

df['dept_code'] = df['product_category'].map(category_codes)
display(df[['product_category', 'dept_code']].head())

Output:

.map() works like a lookup table. It's one of the most underused tools in pandas — we often reach for .apply(lambda x: dict[x]) when .map(dict) does the same thing faster.

# 6. Manipulating Strings with the .str Accessor

String manipulation is where people most often default to loops or .apply(). The .str accessor lets you run string operations across an entire column without either.

What we want to do**: extract the first word from the product_category column and convert it to lowercase.

code

df['category_slug'] = df['product_category'].str.split().str[0].str.lower()
display(df[['product_category', 'category_slug']].head())

Output:

# 7. Aggregating Groups with .groupby()

A common loop pattern is iterating over subsets of data to compute group-level statistics. .groupby() handles this natively.

What we want to do**: calculate total revenue and average days to ship per product category.

code

summary = (
    df.groupby('product_category')
    .agg(
        total_revenue=('revenue', 'sum'),
        avg_ship_days=('days_to_ship', 'mean'),
        order_count=('order_id', 'count')
    )
    .round(2)
    .reset_index()
)
summary

Output:

# Choosing the Right Tool

Most transformations you'd write a loop for fit cleanly into one of these patterns:

Operation / Method

Use Case / Description

Arithmetic on columns

Perform vectorized math operations like addition, subtraction, multiplication, and division directly on DataFrame columns.

Vectorized operations (*, +, etc.)

Apply element-wise operations across entire columns efficiently without explicit loops.

Simple true/false condition

Evaluate boolean conditions to filter or create conditional columns.

np.where()

Apply conditional (if-else) logic in a vectorized way for arrays and DataFrame columns.

Multiple conditions, multiple outcomes

Handle complex conditional logic with multiple rules and outputs.

np.select()

Select values based on multiple conditions and return corresponding outputs.

Value substitution via lookup

Replace values using mapping dictionaries for fast transformations.

.map(dict)

Map values in a Series using a dictionary or function for substitution.

.apply()

Apply custom functions row-wise or column-wise for flexible transformations.

String manipulation

Use vectorized string operations via the .str accessor for cleaning and transforming text data.

.groupby() + .agg()

Group data and compute aggregated statistics like sum, mean, count, etc.

Once you start thinking in columns rather than rows, you'll find the pandas API starts to feel less like a workaround and more like the actual intended way to work.

Bala Priya C** is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.

この記事をシェア

Simon Willison Blog★32026年6月19日 04:03

datasette-acl 0.6a0 のリリース

Alex Garcia 氏らが中心となり、データセットのテーブル単位での権限管理から、一般リソース共有システムへと拡張された「datasette-acl」バージョン 0.6a0 が公開されました。これにより、複数ユーザー環境でリソースへのアクセス制御が細かく行えるようになります。

KDnuggets★32026年6月15日 21:00

データクリーニングと前処理のための Pandas の 3 つの技

KDnuggets が紹介する記事で、Pandas ライブラリを用いたデータクリーニングと前処理を効率化する 3 つの実用的なテクニックが解説されています。

KDnuggets★32026年6月19日 21:00

データサイエンティストが知っておくべき実用的な SQL の技

KDnuggets は、データサイエンティストが効率的にデータを処理するために役立つ実践的な SQL のテクニックを紹介している。

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

import pandas as pd import numpy as np np.random.seed(42) n = 100_000 categories = ['Electronics', 'Clothing', 'Home & Kitchen', 'Sports', 'Books'] regions = ['North', 'South', 'East', 'West'] df = pd.DataFrame({ 'order_id': range(1, n + 1), 'customer_age': np.random.randint(18, 70, n), 'product_category': np.random.choice(categories, n), 'region': np.random.choice(regions, n), 'price': np.round(np.random.uniform(5.0, 500.0, n), 2), 'quantity': np.random.randint(1, 10, n), 'days_to_ship': np.random.randint(1, 14, n), }) display(df.head())

def shipping_label(days): if days <= 2: return 'Express' elif days <= 5: return 'Standard' else: return 'Economy' df['shipping_tier'] = df['days_to_ship'].apply(shipping_label) display(df[['days_to_ship', 'shipping_tier']].head())

conditions = [ df['region'] == 'North', df['region'] == 'South', df['region'] == 'East', df['region'] == 'West', ] tax_rates = [0.08, 0.06, 0.07, 0.09] df['tax_rate'] = np.select(conditions, tax_rates, default=0.07) df['tax_amount'] = df['price'] * df['tax_rate'] display(df[['region', 'price', 'tax_rate', 'tax_amount']].head())

category_codes = { 'Electronics': 'ELEC', 'Clothing': 'CLTH', 'Home & Kitchen': 'HOME', 'Sports': 'SPRT', 'Books': 'BOOK', } df['dept_code'] = df['product_category'].map(category_codes) display(df[['product_category', 'dept_code']].head())

summary = ( df.groupby('product_category') .agg( total_revenue=('revenue', 'sum'), avg_ship_days=('days_to_ship', 'mean'), order_count=('order_id', 'count') ) .round(2) .reset_index() ) summary

キーポイント

影響分析

編集コメント

# イントロダクション

# サンプルデータのセットアップ

# 1. 算術演算にはベクトル化演算を使用する

# 2. 条件付きロジックには関数の適用を使用する

# 3. 二値条件には np.where() を使用する

# 4. np.select() を使用して複数の条件から選択する

# 5. ディクショナリ参照による値のマッピング

# 6. .str アクセサによる文字列操作

# 7. .groupby() を用いたグループ集約

# 適切なツールの選択

# Introduction

# Setting Up the Sample Dataset

# 1. Using Vectorized Operations for Arithmetic

# 2. Applying a Function for Conditional Logic

# 3. Using np.where() for Binary Conditions

# 4. Selecting Across Multiple Conditions with np.select()

# 5. Mapping Values with a Dictionary Lookup

# 6. Manipulating Strings with the .str Accessor

# 7. Aggregating Groups with .groupby()

# Choosing the Right Tool

関連記事

キーポイント

影響分析

編集コメント

# イントロダクション

# サンプルデータのセットアップ

# 1. 算術演算にはベクトル化演算を使用する

# 2. 条件付きロジックには関数の適用を使用する

# 3. 二値条件には np.where() を使用する

# 4. np.select() を使用して複数の条件から選択する

# 5. ディクショナリ参照による値のマッピング

# 6. .str アクセサによる文字列操作

# 7. .groupby() を用いたグループ集約

# 適切なツールの選択

# Introduction

# Setting Up the Sample Dataset

# 1. Using Vectorized Operations for Arithmetic

# 2. Applying a Function for Conditional Logic

# 3. Using np.where() for Binary Conditions

# 4. Selecting Across Multiple Conditions with np.select()

# 5. Mapping Values with a Dictionary Lookup

# 6. Manipulating Strings with the .str Accessor

# 7. Aggregating Groups with .groupby()

# Choosing the Right Tool

関連記事