Reading and Writing Excel Files with Python, Part 2 (openpyxl Advanced)


1. Introduction to openpyxl: the first choice for modern office automation

In the previous chapter we covered how to read and write legacy .xls files, but today mainstream reports and statistical tables have moved almost entirely to the XML-based .xlsx format used by Office 2007 and later. Among the Python tools for this format, openpyxl is the natural first choice: it handles both reading and writing, has a rich feature set, and its API closely mirrors the way people already work in Excel.

With the older xlrd / xlwt toolset, reading and writing require switching between separate libraries, and each of them only supports the legacy format. openpyxl unifies all of these operations, and its advantages are clear:

  • Read, write and modify with one library: the same workbook object can read existing content and also modify it or append new data, with no need to convert back and forth between several libraries.
  • More intuitive cell addressing: you can use Excel coordinate strings such as 'A1' or 'C4' directly, instead of translating every position into row and column numbers.
  • Comprehensive feature coverage: native support for styling (fonts, borders, alignment, fills, etc.); real, calculable Excel formulas; and native, interactive charts embedded in the file (the chart is an Excel object, not a static picture).
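Coordinate strings and numeric indexes are interchangeable. As a quick illustration, the conversion helpers in openpyxl.utils.cell translate between the two forms (note that the numeric side is always 1-based):

```python
from openpyxl.utils.cell import (
    column_index_from_string,
    coordinate_from_string,
    coordinate_to_tuple,
    get_column_letter,
    range_boundaries,
)

# Column letters <-> 1-based column numbers
print(get_column_letter(3))            # C
print(column_index_from_string('AA'))  # 27

# Split a coordinate into (column letter, row number)
print(coordinate_from_string('C4'))    # ('C', 4)

# Convert a coordinate into a 1-based (row, column) tuple
print(coordinate_to_tuple('C4'))       # (4, 3)

# A range string as (min_col, min_row, max_col, max_row)
print(range_boundaries('A2:A10'))      # (1, 2, 1, 10)
```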

⚠️ Format restriction: openpyxl only supports the modern formats (.xlsx, .xlsm and the template formats .xltx/.xltm) and cannot open legacy .xls files. If you still need to process .xls, please use the dedicated tools introduced in the previous chapter.
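If a script receives files of mixed origin, it can help to reject .xls files with a clear message before openpyxl ever sees them. The function below, open_modern_workbook, is a hypothetical helper written for this example, not part of openpyxl; it simply checks the file extension and then calls the real openpyxl.load_workbook.

```python
from pathlib import Path

import openpyxl

# Extensions openpyxl can open (workbook and template formats)
SUPPORTED_SUFFIXES = {'.xlsx', '.xlsm', '.xltx', '.xltm'}


def open_modern_workbook(path):
    """Hypothetical helper: open a workbook, rejecting legacy/unsupported files up front."""
    suffix = Path(path).suffix.lower()
    if suffix == '.xls':
        raise ValueError(f'{path}: legacy .xls is not supported by openpyxl; '
                         'use an .xls tool or convert the file to .xlsx first')
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f'{path}: unsupported extension {suffix!r}')
    return openpyxl.load_workbook(path)
```

Because the check runs before any file access, a wrong extension fails fast even if the file does not exist.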

Installation instructions

pip install openpyxl

# If downloads are slow, you can use a mirror in mainland China (e.g. Tsinghua)
pip install openpyxl -i https://pypi.tuna.tsinghua.edu.cn/simple

2. Reading Excel files: flexible coordinate positioning

openpyxl offers very flexible ways to read cells. Besides ordinary row and column indexes, the most popular option is the native Excel coordinate string: you no longer need to work out in your head which numbers correspond to a given row and column, so the code is easier to read and to maintain.
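A minimal sketch on an in-memory workbook (no file needed) shows that a coordinate string and a 1-based row/column pair point at the very same cell object, and that a range string returns a tuple of rows:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active

# Write through a coordinate string...
ws['C2'] = 123.45

# ...and read the same cell back through 1-based row/column indexes
same_cell = ws.cell(row=2, column=3)
print(same_cell.coordinate, same_cell.value)  # C2 123.45

# A range string returns a tuple of rows, each row being a tuple of cells
block = ws['A1:C2']
print(len(block), len(block[0]))  # 2 3
```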

Hands-on example: reading historical stock data

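import datetime
import openpyxl

# 1. Load the workbook (read_only defaults to False, so it can be both read and modified)
wb = openpyxl.load_workbook('阿里巴巴2020年股票数据.xlsx')  # Alibaba 2020 stock data

# Get a worksheet: wb.active is the sheet that was active when the file was last saved
# (usually the first one); wb.worksheets[0] is always the first sheet
sheet = wb.active

# Check the range that actually holds data instead of hard-coding row/column counts
print(f"Used range: {sheet.dimensions}")  # e.g. A1:G255
print(f"Rows: {sheet.max_row}, columns: {sheet.max_column}")

# 2. Several ways to read a cell
# ✅ Option 1: Excel coordinate string (recommended)
open_price = sheet['C2'].value

# 📌 Option 2: the cell() method (row and column indexes start at 1, not 0!)
close_price = sheet.cell(row=2, column=5).value

# ⚙️ Option 3: take a whole column or a rectangular range
c_col_cells = sheet['C']          # every cell object in column C
dates = sheet['A2:A10']           # cell objects in the given range (a tuple of rows)

# 3. Iterate efficiently (values_only=True yields plain values instead of Cell objects)
for row in sheet.iter_rows(min_row=2, values_only=True):
    for idx, value in enumerate(row):
        if isinstance(value, datetime.datetime):
            print(value.strftime('%Y-%m-%d'), end='\t')
        elif idx >= 2 and isinstance(value, (int, float)):
            # Assume columns C onward (index 2+) hold prices: keep two decimals
            print(f"{value:.2f}", end='\t')
        else:
            # Text, or an empty cell (None) printed as blank
            print(value if value is not None else '', end='\t')
    print()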
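The example above needs the original stock file. To try the same reading techniques without it, the sketch below first writes a tiny sample workbook with hypothetical, made-up prices (not real market data), laid out so that column C is the open price and column E the close price, and then reads it back with iter_rows(values_only=True).

```python
import datetime
import os
import tempfile

import openpyxl

# Hypothetical sample rows (NOT real market data): C = Open, E = Close
sample_rows = [
    ['Date', 'Ticker', 'Open', 'High', 'Close'],
    [datetime.datetime(2020, 1, 2), 'BABA', 218.0, 221.5, 219.8],
    [datetime.datetime(2020, 1, 3), 'BABA', 217.0, 218.4, 216.2],
]

# Write the sample file into a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'sample_stock_data.xlsx')
wb = openpyxl.Workbook()
for row in sample_rows:
    wb.active.append(row)
wb.save(path)

# Read it back with the same techniques as above
sheet = openpyxl.load_workbook(path).active
print(sheet.dimensions)  # A1:E3

lines = []
for date, ticker, open_, high, close in sheet.iter_rows(min_row=2, values_only=True):
    # Dates come back as datetime objects, so they can be formatted directly
    lines.append(f"{date:%Y-%m-%d}\t{ticker}\t{open_:.2f}\t{high:.2f}\t{close:.2f}")
print('\n'.join(lines))
```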

Tip: sheet.dimensions quickly tells you the rectangular range occupied by the data, for example A1:G255, which is very useful when debugging.
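One detail worth knowing when you rely on dimensions while debugging: the range covers every cell openpyxl has created in memory, and merely reading an untouched cell through ws[...] creates it. The sketch below shows the range growing that way:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
empty_range = ws.dimensions
print(empty_range)  # A1:A1  (even an empty sheet reports A1:A1)

ws['A1'] = 'Date'
ws['G255'] = 3.14
used_range = ws.dimensions
print(used_range, ws.max_row, ws.max_column)  # A1:G255 255 7

# Caution: just *reading* an untouched cell via ws[...] creates it,
# which silently enlarges the reported range
_ = ws['Z300'].value
print(ws.dimensions)  # A1:Z300
```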


3. Writing and style customization: making reports look professional

openpyxl models each kind of style (font, alignment, border, fill, etc.) as an independent "building block": we can define each block once and then assign it to any cell as needed, which makes the style templates highly reusable.
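openpyxl can also bundle several blocks into one reusable template with NamedStyle: register it once per workbook with add_named_style, then apply the whole bundle to a cell by name. A minimal sketch (the style name 'report_header' is an arbitrary choice for this example):

```python
import openpyxl
from openpyxl.styles import Alignment, Border, Font, NamedStyle, PatternFill, Side

# Bundle the building blocks into one named template
thin = Side(style='thin', color='000000')
header_style = NamedStyle(name='report_header')
header_style.font = Font(size=14, bold=True, color='FFFFFF')
header_style.fill = PatternFill(fill_type='solid', fgColor='4472C4')
header_style.alignment = Alignment(horizontal='center', vertical='center')
header_style.border = Border(left=thin, top=thin, right=thin, bottom=thin)

wb = openpyxl.Workbook()
ws = wb.active
wb.add_named_style(header_style)  # register the template with this workbook

for col_idx, text in enumerate(['Name', 'Chinese', 'Math', 'English'], start=1):
    cell = ws.cell(row=1, column=col_idx, value=text)
    cell.style = 'report_header'  # apply font, fill, alignment and border in one step
```

Compared with assigning font, fill and border one by one, a named style also shows up in Excel's own cell-style list, so people editing the report later can reuse it.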

import openpyxl
from openpyxl.styles import Font, Alignment, Border, Side, PatternFill

# Create a new workbook (it starts with one worksheet named "Sheet")
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = 'Grade 10 Class 1 Midterm'

# ✨ Define the style "building blocks" up front so they can be reused
header_font = Font(size=14, bold=True, color='FFFFFF', name='微软雅黑')  # Microsoft YaHei
header_fill = PatternFill(fill_type='solid', fgColor='4472C4')  # classic header blue
center_align = Alignment(horizontal='center', vertical='center', wrap_text=True)
thin_side = Side(style='thin', color='000000')
full_border = Border(left=thin_side, top=thin_side, right=thin_side, bottom=thin_side)
low_score_fill = PatternFill(fill_type='solid', fgColor='FFC7CE')  # light red warning

# 1. Write the header row and apply the styles
headers = ['Name', 'Chinese', 'Math', 'English', 'Total']
for col_idx, header_text in enumerate(headers, 1):
    cell = sheet.cell(1, col_idx, header_text)
    cell.font = header_font
    cell.fill = header_fill
    cell.alignment = center_align
    cell.border = full_border

# 2. Write the students' scores
students = [
    ['张三', 92, 88, 95],
    ['李四', 85, 96, 90],
    ['王五', 78, 82, 87]
]
for row_idx, student_data in enumerate(students, 2):
    for col_idx, score in enumerate(student_data, 1):
        cell = sheet.cell(row_idx, col_idx, score)
        cell.alignment = center_align
        cell.border = full_border
        # Highlight scores below 80 with a red background
        if isinstance(score, int) and score < 80:
            cell.fill = low_score_fill

# 3. Write native Excel formulas (strings starting with '='; Excel calculates them when the file is opened)
for row_idx in range(2, 2 + len(students)):
    total_cell = sheet.cell(row_idx, 5)
    total_cell.value = f'=SUM(B{row_idx}:D{row_idx})'
    # Give the formula cells the same border and alignment
    total_cell.alignment = center_align
    total_cell.border = full_border

# 4. Rough auto-fit of column widths based on the longest value in each column
#    (formula cells are measured by their formula text, and CJK characters are
#    wider than Latin ones, so this is only an approximation)
for col in sheet.columns:
    column_letter = col[0].column_letter
    max_length = max((len(str(cell.value)) for cell in col if cell.value is not None), default=0)
    sheet.column_dimensions[column_letter].width = (max_length + 2) * 1.2

wb.save('grade10_class1_midterm_styled.xlsx')
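Note that openpyxl only stores the formula text; it never calculates the result. The sketch below writes a SUM formula, then loads the file twice: normally (you get the formula back) and with data_only=True (you get the cached result Excel stored the last time it saved the file). Because this file was written by openpyxl and never saved by Excel, there is no cached result yet, so the value is None.

```python
import os
import tempfile

import openpyxl

path = os.path.join(tempfile.mkdtemp(), 'formula_demo.xlsx')

wb = openpyxl.Workbook()
ws = wb.active
ws.append([92, 88, 95])
ws['D1'] = '=SUM(A1:C1)'
print(ws['D1'].data_type)  # f  (stored as a formula, not evaluated)
wb.save(path)

# Default load: the formula text comes back
formula_text = openpyxl.load_workbook(path)['Sheet']['D1'].value
print(formula_text)  # =SUM(A1:C1)

# data_only=True: only Excel's cached result, which does not exist yet
cached_value = openpyxl.load_workbook(path, data_only=True)['Sheet']['D1'].value
print(cached_value)  # None
```

If your script needs the computed totals, calculate them in Python as well, or open and re-save the file in Excel before reading it with data_only=True.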

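Reminder: function names and punctuation inside a formula must be half-width (English) characters. openpyxl writes the formula text as-is without checking it, so full-width Chinese punctuation such as （ ） or ， produces a formula that Excel cannot recognize.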
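When formulas are assembled from text typed with a Chinese input method, a small clean-up step before writing them can avoid this problem. The normalize_formula function below is a hypothetical helper for this example; it only maps a handful of common full-width characters to their ASCII equivalents using str.translate, so extend the table if your data contains other symbols.

```python
# Common full-width (Chinese input method) characters and their ASCII equivalents
FULLWIDTH_TO_ASCII = str.maketrans({
    '（': '(', '）': ')', '，': ',', '：': ':', '；': ';',
    '＝': '=', '“': '"', '”': '"',
})


def normalize_formula(text):
    """Hypothetical helper: replace full-width punctuation in a formula string."""
    return text.translate(FULLWIDTH_TO_ASCII)


print(normalize_formula('=SUM（B2：D2）'))                 # =SUM(B2:D2)
print(normalize_formula('=IF(E2>=240，“Pass”，“Fail”)'))  # =IF(E2>=240,"Pass","Fail")
```

Apply it right before assigning the string to cell.value.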


4. Generating professional native Excel charts

This is one of openpyxl's most popular features in office and reporting scenarios: what it generates is not a static picture but a genuine, interactive Excel chart object. After opening the file you can click the chart, edit its data source, change the color scheme and even switch the chart type (for example, from a bar chart to a line chart) in just a few clicks.

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
sheet = wb.active
sheet.title = 'Weekly Sales Ranking'

# 1. Prepare the sales data
week_sales = [
    ['Product', 'Units sold'],
    ['Wireless Bluetooth earbuds', 2100],
    ['Smart band', 1850],
    ['Noise-cancelling headphones', 3200],
    ['Laptop stand', 2700]
]
for row in week_sales:
    sheet.append(row)

# 2. Create a bar chart (several preset styles are available)
bar_chart = BarChart()
bar_chart.title = "Top 4 sellers this week"
bar_chart.y_axis.title = 'Units sold'
bar_chart.x_axis.title = 'Product'
bar_chart.style = 10  # one of Excel's built-in chart style presets

# 3. Reference the data ranges
values = Reference(sheet, min_col=2, min_row=1, max_row=5)      # includes the header row, used as the series name
categories = Reference(sheet, min_col=1, min_row=2, max_row=5)  # product names (data rows only)

bar_chart.add_data(values, titles_from_data=True)
bar_chart.set_categories(categories)

# 4. Anchor the chart's top-left corner at cell E2 of the worksheet
sheet.add_chart(bar_chart, "E2")

wb.save('weekly_sales_with_native_chart.xlsx')

If you need a line chart, just replace BarChart with LineChart (also imported from openpyxl.chart); the rest of the usage is almost identical.
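A sketch of that swap, using the same data as the bar chart example. One small change is assumed here for robustness: the Reference bounds are derived from sheet.max_row instead of the hard-coded 5, so adding products does not leave rows outside the chart. The last lines peek inside the saved .xlsx package to show that the chart is stored as its own chart part, not as an image.

```python
import os
import tempfile
import zipfile

from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

wb = Workbook()
sheet = wb.active
sheet.title = 'Weekly Sales'
for row in [['Product', 'Units sold'],
            ['Wireless Bluetooth earbuds', 2100],
            ['Smart band', 1850],
            ['Noise-cancelling headphones', 3200],
            ['Laptop stand', 2700]]:
    sheet.append(row)

line_chart = LineChart()  # the only real change from the bar chart example
line_chart.title = 'Top sellers this week'
line_chart.y_axis.title = 'Units sold'
line_chart.x_axis.title = 'Product'

# Derive the range from the sheet instead of hard-coding the last row
last_row = sheet.max_row
values = Reference(sheet, min_col=2, min_row=1, max_row=last_row)      # header + data
categories = Reference(sheet, min_col=1, min_row=2, max_row=last_row)  # labels only
line_chart.add_data(values, titles_from_data=True)
line_chart.set_categories(categories)
sheet.add_chart(line_chart, 'E2')

path = os.path.join(tempfile.mkdtemp(), 'weekly_sales_line.xlsx')
wb.save(path)

# The chart is saved as a native chart part inside the .xlsx (zip) package
with zipfile.ZipFile(path) as package:
    chart_parts = [name for name in package.namelist() if name.startswith('xl/charts/')]
print(chart_parts)
```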


5. Summary and tips for avoiding pitfalls

openpyxl is the library of choice for working with modern .xlsx files, and it is ideal whenever you need fine control over styling, want to write formulas, or need to generate charts. For large-scale data cleaning and analysis, pandas is usually more efficient, but when pandas saves an .xlsx file it often relies on openpyxl as the underlying engine.

Practical tips for avoiding pitfalls

  1. Row and column indexes start at 1: this differs from the 0-based convention of most programming languages, and mixing the two up returns the wrong cells.
  2. Turn on read-only/write-only mode for large data sets: use load_workbook(…, read_only=True) when reading millions of rows, and Workbook(write_only=True) when writing large amounts of data, to significantly reduce memory usage.
  3. Formulas must use half-width (English) punctuation: even if the rest of the file is in Chinese, the formula itself must use half-width symbols. openpyxl does not validate formula text, so the mistake only surfaces when Excel opens the file and cannot recognize the formula.
  4. Check the row and column range when referencing chart data: the min_row and max_row of each Reference must match the data source exactly, otherwise the chart may show missing or extra points.
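Tip 2 can be sketched as follows, scaled down to 10,000 rows so it runs quickly. In write-only mode the workbook starts with no worksheet, rows can only be added with append(), and the workbook can be saved only once; in read-only mode rows are parsed lazily, and the workbook should be closed explicitly when you are done.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), 'big_report.xlsx')

# Write-only mode: rows are streamed out instead of kept as Cell objects in memory
wb = Workbook(write_only=True)
ws = wb.create_sheet('data')  # a write-only workbook starts with no sheet
ws.append(['id', 'amount'])
for i in range(1, 10_001):
    ws.append([i, i % 100])
wb.save(path)  # a write-only workbook can be saved only once

# Read-only mode: rows are parsed lazily while iterating
wb = load_workbook(path, read_only=True)
total = 0
for row_id, amount in wb['data'].iter_rows(min_row=2, values_only=True):
    total += amount
wb.close()  # read-only mode keeps the file open until closed
print(total)  # 495000
```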

Once you master these techniques, you can use Python to efficiently generate Excel reports with a clear structure, polished styling and professional charts, and truly automate your office work.