Django file upload and storage - complete file processing solution | Daoman PythonAI

# Django file upload and storage - complete file processing solution

📂 Stage: Part 2 - Advanced Features 🎯 Difficulty level: Intermediate ⏰ Estimated study time: 4-5 hours


## Basic concepts of file upload

Django provides a complete framework for uploading and processing files, with support for local storage, cloud storage, and custom storage backends.

### How file upload works

The Django file upload workflow is:

  1. The client encodes the uploaded files as multipart/form-data
  2. Django receives each file and creates an UploadedFile object for it
  3. The file is held in memory or written to a temporary location, depending on its size
  4. Your code saves the file to a specified location or storage service
  5. The file path is saved in the model
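
To make step 1 concrete, the sketch below builds a tiny multipart/form-data body by hand and parses it with Python's standard `email` package. It only illustrates the wire format: Django parses requests with its own multipart parser, and the boundary string, field names, and file content here are made up for the example.

```python
from email import message_from_bytes

BOUNDARY = "example-boundary"  # hypothetical; browsers generate a random one

# Two form parts: a plain text field and a file field.
body = (
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="title"\r\n'
    "\r\n"
    "Quarterly report\r\n"
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="file"; filename="report.txt"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "hello upload\r\n"
    f"--{BOUNDARY}--\r\n"
).encode("ascii")

# Prepend the request's Content-Type header so the parser knows the boundary.
raw = (
    f"Content-Type: multipart/form-data; boundary={BOUNDARY}\r\n\r\n"
).encode("ascii") + body

message = message_from_bytes(raw)
title_part, file_part = message.get_payload()

# The file part carries the form field name, the client-side file name,
# and the raw bytes of the file.
field_name = file_part.get_param("name", header="content-disposition")
file_name = file_part.get_filename()
file_bytes = file_part.get_payload(decode=True).strip()
```

In a Django view the same information is available as `request.FILES['file'].name` and `request.FILES['file'].read()`, without any manual parsing.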

The main uploaded-file classes are:

  • UploadedFile: the base class for uploaded files
  • TemporaryUploadedFile: an upload that was streamed to a temporary file on disk
  • InMemoryUploadedFile: an upload that is held entirely in memory
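
As a rough mental model of which class you receive, the sketch below compares a size against Django's default in-memory threshold of 2.5 MB (`FILE_UPLOAD_MAX_MEMORY_SIZE`). The helper name `expected_upload_class` is hypothetical, and the real decision is made by Django's upload handlers based on the request, so treat this as a simplification.

```python
# 2.5 MB, the default value of Django's FILE_UPLOAD_MAX_MEMORY_SIZE setting
DEFAULT_MAX_MEMORY_SIZE = 2621440


def expected_upload_class(size_bytes, max_memory_size=DEFAULT_MAX_MEMORY_SIZE):
    """Return the class name a default Django setup would most likely use.

    Simplified: small uploads stay in memory, larger ones are streamed
    to a temporary file on disk.
    """
    if size_bytes <= max_memory_size:
        return "InMemoryUploadedFile"
    return "TemporaryUploadedFile"


small = expected_upload_class(10 * 1024)          # a 10 KB upload
large = expected_upload_class(50 * 1024 * 1024)   # a 50 MB upload
```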

### File upload form and model

from django import forms
from django.db import models
import os

class FileUploadForm(forms.Form):
    """Basic file upload form."""
    title = forms.CharField(max_length=100)
    file = forms.FileField(
        label='Choose a file',
        help_text='Supported file types: PDF, DOC, XLS, JPG, PNG'
    )
    
    def clean_file(self):
        """Validate the uploaded file."""
        uploaded_file = self.cleaned_data['file']
        
        # Check the file size (limit: 5 MB)
        if uploaded_file.size > 5 * 1024 * 1024:
            raise forms.ValidationError('File size must not exceed 5 MB')
        
        # Check the file extension against an allow list
        allowed_extensions = ['.pdf', '.doc', '.docx', '.xls', '.xlsx', '.jpg', '.jpeg', '.png']
        ext = os.path.splitext(uploaded_file.name)[1].lower()
        
        if ext not in allowed_extensions:
            raise forms.ValidationError(
                f'Unsupported file type: {ext}. Supported types: {", ".join(allowed_extensions)}'
            )
        
        return uploaded_file

class Document(models.Model):
    """Document model."""
    title = models.CharField(max_length=200)
    file = models.FileField(upload_to='documents/%Y/%m/')  # organized by year and month
    uploaded_at = models.DateTimeField(auto_now_add=True)
    file_size = models.PositiveIntegerField(null=True, blank=True)
    content_type = models.CharField(max_length=100, null=True, blank=True)
    
    def save(self, *args, **kwargs):
        """Record file metadata when saving."""
        if self.file:
            self.file_size = self.file.size
            # content_type lives on the wrapped UploadedFile, not on the
            # FieldFile itself, and only before the file is committed to
            # storage (_committed is a private attribute of FieldFile).
            if not self.file._committed:
                content_type = getattr(self.file.file, 'content_type', None)
                if content_type:
                    self.content_type = content_type
        super().save(*args, **kwargs)
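
The `upload_to='documents/%Y/%m/'` value contains `strftime` placeholders that are filled in with the upload date and then joined with the file name. The standard-library sketch below approximates that expansion; `expand_upload_to` is a hypothetical helper, and the storage backend may still adjust the final name, for example to avoid a collision.

```python
import posixpath
from datetime import datetime


def expand_upload_to(upload_to, filename, now):
    """Approximate how a string upload_to is combined with a file name."""
    # Fill in %Y, %m, ... from the given timestamp.
    directory = now.strftime(upload_to)
    # Stored names always use forward slashes, regardless of the OS.
    return posixpath.join(directory, filename)


example = expand_upload_to("documents/%Y/%m/", "report.pdf", datetime(2024, 3, 5))
```

With the date 5 March 2024, `example` is `documents/2024/03/report.pdf`, relative to `MEDIA_ROOT`.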

## Django file processing architecture {#django file processing architecture}

The Django file handling system consists of several core components that work together to handle file uploading and storage.

### Core components of file processing

from django.core.files.base import File, ContentFile
from django.core.files.storage import default_storage, FileSystemStorage
from django.core.files.uploadedfile import (
    UploadedFile, TemporaryUploadedFile, InMemoryUploadedFile
)
import uuid
import os

class FileUploadHandler:
    """File upload handler."""
    
    def __init__(self, request):
        self.request = request
    
    def process_file(self, uploaded_file):
        """Process a single uploaded file."""
        # Validate the file
        self.validate_file(uploaded_file)
        
        # Generate a safe file name
        safe_filename = self.generate_safe_filename(uploaded_file.name)
        
        # Save the file; the storage returns the name it actually used
        saved_path = default_storage.save(safe_filename, uploaded_file)
        
        return {
            'original_name': uploaded_file.name,
            'saved_name': safe_filename,
            'saved_path': saved_path,
            'size': uploaded_file.size,
            'content_type': getattr(uploaded_file, 'content_type', '')
        }
    
    def validate_file(self, uploaded_file):
        """Validate an uploaded file."""
        # Check the file size
        max_size = 10 * 1024 * 1024  # 10 MB
        if uploaded_file.size > max_size:
            raise ValueError(f'File is too large; the maximum is {max_size // (1024 * 1024)} MB')
        
        # Check the file name for path components (both separator styles)
        name = uploaded_file.name
        if '..' in name or '/' in name or '\\' in name:
            raise ValueError('File name contains illegal characters')
    
    def generate_safe_filename(self, original_filename):
        """Generate a safe file name: a random UUID plus the original extension."""
        name, ext = os.path.splitext(original_filename)
        safe_name = f"{uuid.uuid4().hex}{ext.lower()}"
        return safe_name
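
The name check in `validate_file` rejects anything containing `..` or `/`, which also rejects harmless names such as `report..final.pdf`. An alternative, sketched here with only the standard library, is to keep just the final path component and refuse names that end up empty. The function name `strip_directories` is hypothetical and not part of Django.

```python
import ntpath


def strip_directories(filename):
    """Return only the final path component of a client-supplied name."""
    # ntpath.basename treats both "\\" and "/" as separators, so it
    # handles Windows-style and POSIX-style paths alike.
    name = ntpath.basename(filename)
    if name in ("", ".", ".."):
        raise ValueError("file name is empty or refers to a directory")
    return name


posix_style = strip_directories("../../etc/passwd")
windows_style = strip_directories("C:\\temp\\evil.exe")
plain = strip_directories("report..final.pdf")
```

Combined with `generate_safe_filename`, the client-supplied name is then used only for its extension, never as a path.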

## File storage configuration

Configuring file storage is the first step toward managing files in Django.

### Basic storage configuration

# settings.py - basic file storage configuration
import os
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

# Media file configuration
MEDIA_URL = '/media/'  # URL prefix for media files
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')  # root directory for stored media files

# Static file configuration
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'staticfiles')
STATICFILES_DIRS = [
    os.path.join(BASE_DIR, 'static'),
]

# File upload configuration
FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440  # 2.5 MB; larger uploads are written to a temporary file
DATA_UPLOAD_MAX_MEMORY_SIZE = 2621440  # 2.5 MB; maximum request body size, not counting file uploads
FILE_UPLOAD_TEMP_DIR = os.path.join(BASE_DIR, 'tmp')  # directory for temporary upload files

# Create the required directories
os.makedirs(MEDIA_ROOT, exist_ok=True)
os.makedirs(FILE_UPLOAD_TEMP_DIR, exist_ok=True)
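
`MEDIA_ROOT` decides where a file lands on disk, while `MEDIA_URL` decides the URL under which it is served. The sketch below approximates how a stored name becomes a public URL, using only the standard library. `media_url` is a hypothetical helper: in a real project you would read `document.file.url`, and Django's own escaping rules differ slightly from `quote`.

```python
from urllib.parse import quote, urljoin

MEDIA_URL = "/media/"  # same value as in the settings above


def media_url(stored_name, base_url=MEDIA_URL):
    """Approximate the public URL of a file stored under MEDIA_ROOT."""
    # Percent-encode characters such as spaces; keep "/" as a separator.
    return urljoin(base_url, quote(stored_name, safe="/"))


report_url = media_url("documents/2024/03/report.pdf")
spaced_url = media_url("documents/my file.pdf")
```

Here `report_url` is `/media/documents/2024/03/report.pdf` and `spaced_url` is `/media/documents/my%20file.pdf`.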

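## Secure file upload

File upload is a common source of security vulnerabilities in web applications and requires special attention.

### File verification security

import os
import magic  # requires the python-magic package
from django.core.exceptions import ValidationError
from django.utils.translation import gettext_lazy as _

class SecureFileValidator:
    """Secure file validator."""
    
    def __init__(self, allowed_extensions=None, allowed_mimes=None, max_size=None):
        self.allowed_extensions = allowed_extensions or [
            '.jpg', '.jpeg', '.png', '.gif', '.pdf', '.doc', '.docx', '.txt'
        ]
        self.allowed_mimes = allowed_mimes or [
            'image/jpeg', 'image/png', 'image/gif',
            'application/pdf', 'text/plain',
            'application/msword',
            'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
        ]
        self.max_size = max_size or 10 * 1024 * 1024  # 10 MB
    
    def validate(self, uploaded_file):
        """Run the full set of checks."""
        self.validate_size(uploaded_file)
        self.validate_extension(uploaded_file)
        self.validate_mime_type(uploaded_file)
        self.validate_content(uploaded_file)
        self.validate_filename(uploaded_file)
    
    # The original listing calls the next three methods without showing
    # them; these are minimal implementations.
    def validate_size(self, uploaded_file):
        """Reject files larger than max_size."""
        if uploaded_file.size > self.max_size:
            raise ValidationError(_('File is too large'))
    
    def validate_extension(self, uploaded_file):
        """Reject extensions that are not on the allow list."""
        ext = os.path.splitext(uploaded_file.name)[1].lower()
        if ext not in self.allowed_extensions:
            raise ValidationError(
                _('File extension not allowed: %(ext)s'),
                params={'ext': ext},
            )
    
    def validate_filename(self, uploaded_file):
        """Reject names that contain path components."""
        name = uploaded_file.name
        if '..' in name or '/' in name or '\\' in name:
            raise ValidationError(_('File name contains illegal characters'))
    
    def validate_mime_type(self, uploaded_file):
        """Validate the MIME type detected from the file content."""
        # Detect the real MIME type from the content, not the client's claim
        # (python-magic recommends reading at least the first 2048 bytes)
        real_mime = magic.from_buffer(uploaded_file.read(2048), mime=True)
        uploaded_file.seek(0)  # reset the file pointer
        
        if real_mime not in self.allowed_mimes:
            raise ValidationError(
                _('File type not allowed: %(mime)s'),
                params={'mime': real_mime},
            )
    
    def validate_content(self, uploaded_file):
        """Scan the start of the file for embedded script markers."""
        # Read the file header
        header = uploaded_file.read(1024)
        uploaded_file.seek(0)  # reset the file pointer
        
        # Look for HTML/JS/server-side tags (common attack vectors).
        # This is a heuristic: it can miss payloads placed later in the
        # file and can flag legitimate files that contain these bytes.
        suspicious_patterns = [
            b'<script', b'javascript:', b'vbscript:', b'<iframe',
            b'<object', b'<embed', b'<?php', b'<?', b'<%'
        ]
        
        header_lower = header.lower()
        for pattern in suspicious_patterns:
            if pattern in header_lower:
                raise ValidationError(_('File may contain malicious code'))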
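
Content-based detection works by comparing the first bytes of a file with known signatures ("magic numbers"). The sketch below shows the idea with a handful of well-known signatures and no third-party packages. It is far less complete than python-magic and is meant only to illustrate why the client-supplied extension and content type are not trusted; `sniff_mime` is a hypothetical helper.

```python
# A few well-known file signatures; real detectors know hundreds.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"%PDF-": "application/pdf",
}


def sniff_mime(header):
    """Return a MIME type guessed from the leading bytes, or None."""
    for signature, mime in SIGNATURES.items():
        if header.startswith(signature):
            return mime
    return None


png_guess = sniff_mime(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)
pdf_guess = sniff_mime(b"%PDF-1.7\n")
# A script renamed to "photo.png" has no image signature.
fake_guess = sniff_mime(b"<script>alert(1)</script>")
```

A file named `photo.png` whose content starts with `<script>` yields `None` here, so an allow-list check on the detected type rejects it regardless of its extension.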

## Cloud Storage Integration

As an application grows, local storage often cannot keep up with demand, and a cloud storage service becomes worth considering.

### AWS S3 integration

# settings.py configuration (requires the django-storages and boto3 packages)
"""
# AWS credentials and bucket, read from the environment
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
AWS_STORAGE_BUCKET_NAME = os.environ.get('AWS_STORAGE_BUCKET_NAME')
AWS_S3_REGION_NAME = os.environ.get('AWS_S3_REGION_NAME', 'us-east-1')
AWS_S3_CUSTOM_DOMAIN = f'{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com'

# S3 settings
AWS_S3_OBJECT_PARAMETERS = {
    'CacheControl': 'max-age=86400',
}
AWS_DEFAULT_ACL = 'public-read'

# Store media files on S3
# (DEFAULT_FILE_STORAGE was deprecated in Django 4.2 in favor of the
# STORAGES setting and removed in Django 5.1)
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
"""

# Models that use S3 storage
from django.db import models
from storages.backends.s3boto3 import S3Boto3Storage

class PublicMediaStorage(S3Boto3Storage):
    """Public media storage."""
    location = 'media'
    default_acl = 'public-read'
    file_overwrite = False  # keep existing objects instead of replacing them

class S3Document(models.Model):
    """Document model stored on S3."""
    title = models.CharField(max_length=200)
    public_file = models.FileField(
        upload_to='public_docs/',
        storage=PublicMediaStorage(),
        blank=True,
        null=True
    )
    uploaded_at = models.DateTimeField(auto_now_add=True)
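
On Django 4.2 and later, the default storage backend is selected through the `STORAGES` dictionary rather than `DEFAULT_FILE_STORAGE`, which was removed in Django 5.1. The fragment below is a sketch of an equivalent configuration, assuming the same django-storages backend as above and Django's stock static files storage; adjust both entries to your project.

```python
# settings.py fragment for Django 4.2 or later (a sketch, not a full config)
STORAGES = {
    # Backend used by FileField/ImageField and default_storage
    "default": {
        "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
    },
    # Backend used by collectstatic; this is Django's built-in default
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```

The `AWS_*` settings shown earlier continue to be read by the S3 backend, so only the backend selection changes.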

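## File processing and optimization

After a file is uploaded, it often needs further processing, such as resizing or compressing an image.

### Image processing optimization

import os
from PIL import Image
from io import BytesIO
from django.core.files.base import ContentFile

class ImageProcessor:
    """Image processor."""
    
    @staticmethod
    def resize_image(image_file, max_width=800, max_height=600, quality=85):
        """Resize an image to fit within max_width x max_height."""
        # Open the image and remember its format before any conversion
        img = Image.open(image_file)
        img_format = img.format or 'JPEG'
        
        # thumbnail() resizes in place and preserves the aspect ratio
        # (Image.Resampling requires Pillow 9.1 or later)
        img.thumbnail((max_width, max_height), Image.Resampling.LANCZOS)
        
        # JPEG cannot store an alpha channel or a palette
        if img_format == 'JPEG' and img.mode not in ('RGB', 'L'):
            img = img.convert('RGB')
        
        # Save the image into an in-memory buffer
        output = BytesIO()
        img.save(output, format=img_format, quality=quality, optimize=True)
        
        # Wrap the bytes in a ContentFile that Django can store
        return ContentFile(output.getvalue(), name=image_file.name)
    
    @staticmethod
    def compress_image(image_file, target_size_kb=500):
        """Compress an image to JPEG at or below the target size."""
        img = Image.open(image_file)
        if img.mode not in ('RGB', 'L'):
            img = img.convert('RGB')  # JPEG cannot store alpha or palettes
        
        # The output is always JPEG, so give it a matching extension
        jpeg_name = os.path.splitext(image_file.name)[0] + '.jpg'
        
        # Binary search for the highest quality that fits the target
        low_quality, high_quality = 10, 95
        best_file = None
        
        while low_quality <= high_quality:
            mid_quality = (low_quality + high_quality) // 2
            
            output = BytesIO()
            img.save(output, format='JPEG', quality=mid_quality, optimize=True)
            size_kb = len(output.getvalue()) / 1024
            
            if size_kb <= target_size_kb:
                best_file = ContentFile(output.getvalue(), name=jpeg_name)
                low_quality = mid_quality + 1
            else:
                high_quality = mid_quality - 1
        
        # Fall back to the original if even the lowest quality is too large
        return best_file or image_file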
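
The quality search in `compress_image` is a standard binary search over a monotonic function: higher JPEG quality generally produces a larger file. The sketch below isolates that search from Pillow so it can be run and checked on its own. `best_quality` and `fake_encoder` are hypothetical names, and the linear size model is only a stand-in for a real encoder.

```python
def best_quality(encoded_size, target_bytes, low=10, high=95):
    """Return the highest quality whose encoded size fits, or None.

    encoded_size is a callable mapping a quality value to a size in
    bytes; it is assumed to grow as quality grows.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if encoded_size(mid) <= target_bytes:
            best = mid        # fits: remember it and try a higher quality
            low = mid + 1
        else:
            high = mid - 1    # too large: try a lower quality
    return best


def fake_encoder(quality):
    """Stand-in for a real encoder: 1000 bytes per quality step."""
    return quality * 1000


fits_midway = best_quality(fake_encoder, 50_000)
fits_always = best_quality(fake_encoder, 200_000)
fits_never = best_quality(fake_encoder, 5_000)
```

With this stand-in, a 50,000-byte target selects quality 50, a generous target selects the upper bound 95, and an impossible target returns `None`, which corresponds to the fallback branch in `compress_image`.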

## Frequently Asked Questions and Solutions

Some problems come up repeatedly when implementing file uploads. Here are two common ones and their solutions.

### Problem 1: File upload timeout

Symptom: A timeout error occurs when uploading large files.

Solution:

# settings.py
DATA_UPLOAD_MAX_NUMBER_FIELDS = 1000
FILE_UPLOAD_MAX_MEMORY_SIZE = 10 * 1024 * 1024  # uploads up to 10 MB are held in memory
DATA_UPLOAD_MAX_MEMORY_SIZE = 10 * 1024 * 1024  # 10 MB limit on the non-file request body
# Note: these settings control memory use, not timeouts. Timeout limits are
# usually set by the web server or proxy in front of Django.

# views.py - for large files, write to a temporary location first
import os
import tempfile

from django.core.files import File
from django.core.files.storage import default_storage
from django.http import JsonResponse
from django.shortcuts import render

def large_file_upload_view(request):
    """Large file upload view."""
    if request.method == 'POST':
        uploaded_file = request.FILES.get('file')
        
        if uploaded_file:
            # Stream the upload to a temporary file, one chunk at a time
            with tempfile.NamedTemporaryFile(delete=False) as temp_file:
                for chunk in uploaded_file.chunks():
                    temp_file.write(chunk)
                temp_file_path = temp_file.name
            
            try:
                # Copy to the final location; the with block closes the handle
                with open(temp_file_path, 'rb') as fh:
                    final_path = default_storage.save(uploaded_file.name, File(fh))
                return JsonResponse({'success': True, 'path': final_path})
            finally:
                # Clean up the temporary file
                os.unlink(temp_file_path)
    
    return render(request, 'large_upload.html')
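
The view relies on `chunks()` so that a large upload is never loaded into memory in one piece. The standard-library sketch below shows the same pattern on ordinary binary streams. `copy_in_chunks` is a hypothetical helper, and the 64 KB chunk size mirrors Django's default chunk size.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KB, the same as Django's default chunk size


def copy_in_chunks(source, destination, chunk_size=CHUNK_SIZE):
    """Copy a binary stream piece by piece and return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the upload and the target file.
source = io.BytesIO(b"x" * 200_000)
destination = io.BytesIO()
written = copy_in_chunks(source, destination)
```

Memory use stays bounded by `chunk_size` regardless of the upload size, which is the property that matters for very large files.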

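### Problem 2: File name conflict

Symptom: Uploading a file with the same name as an existing one overwrites the original or produces an unpredictable stored name. Django's default FileSystemStorage avoids overwriting by appending a random suffix, but backends configured to overwrite (for example django-storages S3 with file_overwrite enabled) replace the existing file.

Solution:

# Generate a unique file name with a UUID
import os
import uuid
from django.utils import timezone
from django.db import models

def unique_upload_path(instance, filename):
    """Generate a unique upload path."""
    # splitext returns '' for names without an extension, whereas
    # filename.split('.')[-1] would return the whole name
    ext = os.path.splitext(filename)[1].lower()
    filename = f"{uuid.uuid4().hex}{ext}"
    return f"uploads/{timezone.now().strftime('%Y/%m/%d')}/{filename}"

class UniqueFileModel(models.Model):
    """Model that stores files under unique names."""
    file = models.FileField(upload_to=unique_upload_path)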

## Summary of this chapter

This chapter covered the Django file upload and storage system in depth, including:

  1. Basics of file upload: how Django file upload works and the concepts behind it
  2. File processing architecture: the core components of Django file processing
  3. File fields and forms: handling file fields in models and forms
  4. Storage configuration: configuring local and cloud storage
  5. Secure upload: security practices for file verification and access control
  6. Cloud storage integration: integrating cloud storage such as AWS S3
  7. File optimization: optimization techniques such as image processing
  8. Common problem solving: solutions to common problems in the file upload process

💡 Core Points: File processing is an important part of a web application. It has to work correctly and also be safe and reliable. Using Django's file processing framework properly, together with security validation and an appropriate storage strategy, lets you build a stable and efficient file management system.

