A tutorial on map() and reduce() in Python

Overview

In day-to-day Python development you often need to process every element of a list in one pass, or to aggregate a collection of values into a single result: capitalizing the first letter of each name in a group, combining a list of digits into one integer, or totaling the amount of an order. Python's built-in map() and functools.reduce() are two simple, efficient tools for these jobs. Both come from functional programming. They share their core idea with the distributed MapReduce model described in Google's well-known paper, but they are far more lightweight and are well suited to small and medium-sized data processing on a single machine.


Map function: a good helper for batch data conversion

The core logic of map() is simple: it applies a conversion function to each element of an iterable (such as a list, tuple, string, or file object) in turn and returns a memory-efficient iterator.

Basic syntax

map(function, iterable, ...)
  • function: the conversion function, either an ordinary function or a lambda; the number of parameters it accepts must match the number of iterables passed in
  • iterable: one or more iterable objects
  • Return value: an iterator, which can be consumed with list(), tuple(), or a for loop
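
Because the return value is an iterator rather than a list, it is evaluated lazily and can be consumed only once. The following minimal sketch illustrates this; the variable names are chosen here for illustration only.

```python
numbers = [1, 2, 3]

# map() does no work yet; it only builds a lazy iterator.
doubled = map(lambda x: x * 2, numbers)

first_pass = list(doubled)    # consumes the iterator
second_pass = list(doubled)   # the iterator is already exhausted

print(first_pass)    # [2, 4, 6]
print(second_pass)   # []
```

If the results are needed more than once, convert the iterator to a list (or tuple) once and reuse that.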

Basic example: batch squaring

For example, to square the numbers 1 through 9, map() can be used like this:

# Define the conversion function (recommended when the logic is complex)
def square(x):
    return x * x

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
squared_iter = map(square, numbers)   # an iterator; no extra list is built in memory
print(list(squared_iter))            # [1, 4, 9, 16, 25, 36, 49, 64, 81]

Using an anonymous lambda function

If the conversion is a single expression, a lambda is more lightweight:

numbers = [1, 2, 3, 4, 5]
# lambda x: x**2 is a throwaway squaring function
squared_list = list(map(lambda x: x**2, numbers))
print(squared_list)  # [1, 4, 9, 16, 25]

Converting multiple iterables element by element

map() can also accept several iterables, in which case the conversion function must accept one argument per iterable. The iterables would normally have the same length; if they do not, map() stops as soon as the shortest one is exhausted. For example, to compute per-item revenue as unit price × units sold:

unit_prices = [29.9, 19.9, 49.9]   # unit prices of three products
sales_counts = [10, 5, 2]         # corresponding units sold
# Multiply matching positions to get each product's revenue
item_revenues = list(map(lambda x, y: round(x * y, 2), unit_prices, sales_counts))
print(item_revenues)   # [299.0, 99.5, 99.8]
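
When the iterables differ in length, map() silently stops at the shortest one instead of raising an error. A small sketch with made-up data (the lists below are hypothetical, not taken from the order example):

```python
prices = [1.5, 2.0, 3.0]
quantities = [2, 4]          # one entry short (hypothetical data)

# The third price has no matching quantity, so it is dropped without warning.
totals = list(map(lambda p, q: p * q, prices, quantities))
print(totals)   # [3.0, 8.0]
```

If a length mismatch would indicate a data error, it may be worth comparing len() of the inputs before mapping.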

Reduce function: expert in sequence data aggregation

reduce() has a different job from map(): it "folds" the elements of a sequence, step by step, into a single result. Since Python 3 it is no longer a built-in and lives in the functools module, so remember to import it before use.

Basic syntax

from functools import reduce

reduce(function, sequence[, initial])
  • function: a function that must accept two arguments. On the first call it receives the first two elements of the sequence; on every later call it receives the previous return value and the next element.
  • sequence: a single iterable object
  • initial (optional): the initial value. If provided, the first call receives the initial value and the first element of the sequence.
  • Return value: a single result after aggregation (can be a number, string, list, etc.)
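
To illustrate the initial parameter and a non-numeric result, here is a minimal sketch that flattens a list of lists. The data and variable names are assumptions made for this example.

```python
from functools import reduce

nested = [[1, 2], [3], [4, 5]]

# With initial=[], the first call is f([], [1, 2]).
flat = reduce(lambda acc, chunk: acc + chunk, nested, [])
print(flat)   # [1, 2, 3, 4, 5]

# With an initial value, an empty input simply returns that value.
empty_result = reduce(lambda acc, chunk: acc + chunk, [], [])
print(empty_result)   # []
```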

Intuitive understanding of workflow

For the sequence [x1, x2, x3, x4], the call reduce(f, [x1, x2, x3, x4]) proceeds as follows:

  1. Execute f(x1, x2) to get the intermediate result res1
  2. Execute f(res1, x3) to get res2
  3. Execute f(res2, x4) to get the final result
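
The three steps above can be observed directly by recording each call. In this sketch, traced_add and steps are hypothetical helpers introduced only to make the folding visible.

```python
from functools import reduce

steps = []  # records (accumulated value, next element, result) for each call

def traced_add(acc, item):
    result = acc + item
    steps.append((acc, item, result))
    return result

final = reduce(traced_add, [1, 2, 3, 4])

for acc, item, result in steps:
    print(f"f({acc}, {item}) -> {result}")
print(final)   # 10
```

Four elements produce exactly three calls, and each call receives the previous result as its first argument.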

Basic example: sum and product

Compute the sum of the odd-number sequence [1, 3, 5, 7, 9]:

from functools import reduce

# Named function
def add(x, y):
    return x + y

total = reduce(add, [1, 3, 5, 7, 9])
print(total)   # 25

# lambda version
total = reduce(lambda x, y: x + y, [1, 3, 5, 7, 9])
print(total)   # 25

To compute a product, it is best to pass an initial value of 1, which avoids an error on an empty list and is more rigorous:

product = reduce(lambda x, y: x * y, [3, 5, 7, 9], 1)
print(product)   # 945

Advanced example: converting a list of numbers to an integer

A classic use is turning [1, 3, 5, 7, 9] into the integer 13579:

from functools import reduce

def digits_to_num(digits):
    # Logic: previous result × 10 + next digit
    return reduce(lambda x, y: x * 10 + y, digits)

print(digits_to_num([1, 3, 5, 7, 9]))   # 13579
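
As written, digits_to_num() raises TypeError on an empty list because reduce() has nothing to start from. One possible variant, sketched here under the assumption that an empty list should map to 0, passes 0 as the initial value; the name digits_to_num_safe is hypothetical.

```python
from functools import reduce

def digits_to_num_safe(digits):
    # Hypothetical variant: the initial value 0 makes an empty list return 0
    # and does not change the result for non-empty input (0 * 10 + d == d).
    return reduce(lambda acc, d: acc * 10 + d, digits, 0)

print(digits_to_num_safe([1, 3, 5, 7, 9]))   # 13579
print(digits_to_num_safe([]))                # 0
```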

Small practical cases: solving real development problems

The following two examples apply map() and reduce() to practical tasks; the second one combines the two.

1. Normalizing names in batch

Convert names with inconsistent capitalization into the form "first letter uppercase, remaining letters lowercase":

def normalize(name):
    return name.capitalize()   # built-in str method

raw_names = ['adam', 'LISA', 'barT', 'jOhN dOe']
normalized_names = list(map(normalize, raw_names))
print(normalized_names)   # ['Adam', 'Lisa', 'Bart', 'John doe']
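
Note that capitalize() treats the whole string as one unit, which is why 'jOhN dOe' becomes 'John doe'. If every word should be capitalized, str.title() is one option; the sketch below also shows that a method such as str.capitalize can be passed to map() directly, without a wrapper function.

```python
raw_names = ['adam', 'LISA', 'barT', 'jOhN dOe']

# Unbound str methods work as conversion functions.
capitalized = list(map(str.capitalize, raw_names))
titled = list(map(str.title, raw_names))

print(capitalized)   # ['Adam', 'Lisa', 'Bart', 'John doe']
print(titled)        # ['Adam', 'Lisa', 'Bart', 'John Doe']
```

str.title() capitalizes after any non-letter character, so names containing apostrophes may need separate handling.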

2. Manually converting a string to a floating-point number

The str2float() below is written purely for teaching purposes (in production, call float() directly) and helps illustrate how map and reduce combine:

from functools import reduce

def str2float(s):
    # Map a digit character to its numeric value
    def char2num(c):
        return {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4,
                '5': 5, '6': 6, '7': 7, '8': 8, '9': 9}[c]

    if '.' in s:
        integer_part, decimal_part = s.split('.')
        # Integer part: fold the digits together with reduce
        integer = reduce(lambda x, y: x * 10 + y, map(char2num, integer_part))
        # Fractional part: fold the same way, then divide by 10 to the power of the number of fractional digits
        decimal = reduce(lambda x, y: x * 10 + y, map(char2num, decimal_part)) / (10 ** len(decimal_part))
        return integer + decimal
    else:
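        # No decimal point: fold the digits into an int, then convert it
        # to float so the function always returns a float
        return float(reduce(lambda x, y: x * 10 + y, map(char2num, s)))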

print(str2float('123.456'))   # 123.456
print(str2float('789'))       # 789.0

Modern Python alternatives

map() and reduce() are classics, but in modern Python (3.x) many scenarios have more intuitive alternatives:

| Scenario | Classic form | Modern alternative | Why it is recommended |
| --- | --- | --- | --- |
| Simple single-argument batch conversion | list(map(lambda x: x**2, nums)) | [x**2 for x in nums] | List comprehensions are more Pythonic and more readable |
| Single-argument transformation over very large data | map(complex_func, huge_iter) | (complex_func(x) for x in huge_iter) (generator expression) | Both are lazy iterators, and a generator expression can also include a filter condition |
| Simple sum / product / maximum or minimum | reduce(lambda x,y:x+y, nums) | sum(nums) / math.prod(nums) (Python 3.8+) / max(nums) / min(nums) | Built-in functions are more efficient and clear at a glance |
| Complex multi-step aggregation logic | reduce(complex_reduce_func, seq) | reduce() is still recommended | There is no simpler alternative here, and reduce() expresses the fold directly |
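
The equivalences in the first three rows can be checked side by side. This is a small sketch with sample data chosen for illustration; math.prod is the standard-library counterpart for products and requires Python 3.8 or later.

```python
from functools import reduce
import math

nums = [1, 2, 3, 4, 5]

# Batch conversion: map() versus a list comprehension
squares_map = list(map(lambda x: x ** 2, nums))
squares_comp = [x ** 2 for x in nums]

# Aggregation: reduce() versus built-ins
total_reduce = reduce(lambda x, y: x + y, nums)
total_builtin = sum(nums)
product_reduce = reduce(lambda x, y: x * y, nums, 1)
product_builtin = math.prod(nums)   # Python 3.8+

# A generator expression can filter while it transforms
even_squares = list(x ** 2 for x in nums if x % 2 == 0)

print(squares_map == squares_comp)        # True
print(total_reduce, total_builtin)        # 15 15
print(product_reduce, product_builtin)    # 120 120
print(even_squares)                       # [4, 16]
```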

Performance considerations and pitfalls to avoid

  1. Speed of simple operations: for simple single-argument conversions, a list comprehension is usually faster than map() with a lambda, on the order of 10%–20%, though the exact gap depends on the Python version and workload. The difference comes from the extra function call map() makes for each element.
  2. Memory with large data: neither map() nor a generator expression loads all the data into memory at once, so prefer them when working with large files (such as CSV files or logs).
  3. Empty sequences: without an initial value, reduce() raises TypeError on an empty sequence, for example reduce(lambda x, y: x + y, []). Whenever the input may be empty, remember to supply a sensible initial value.
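
The empty-sequence pitfall in item 3 is easy to reproduce. The sketch below catches the error and then shows the same call with an initial value; the flag name raised is introduced only for this demonstration.

```python
from functools import reduce

def add(x, y):
    return x + y

# Without an initial value, an empty sequence gives reduce() nothing to return.
try:
    reduce(add, [])
    raised = False
except TypeError:
    raised = True

# With an initial value, the same call returns that value.
safe_total = reduce(add, [], 0)

print(raised)       # True
print(safe_total)   # 0
```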

Summary

map() and reduce() are entry-level tools for functional programming in Python. Mastering them helps you:

  • Write concise, elegant batch conversion and aggregation code
  • Understand the core ideas of distributed MapReduce
  • Read more Python code written in a functional style

In daily development, first check whether a built-in function can do the job (such as sum() or max()); if not, prefer a list comprehension or generator expression, and reach for map() and reduce() last. With the right tool, your code will be cleaner and more efficient.