Pydantic: Data Validation and Conversion

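Pydantic is currently the most widely used data validation library for Python.

  • As a dynamically typed language, Python is quick to develop in and easy to use
  • The trade-off is that types are not enforced at runtime, so programs benefit from stronger type checking and data validation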

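Pydantic features

  1. Customization and extensibility: validates the data types of arbitrary Python objects, including nested structures
  2. Flexible validation: a rich set of types, control over when validation runs, and a choice between strict and lax mode
  3. Serialization: Pydantic objects can be serialized to, and deserialized from, dictionaries and JSON strings
  4. High performance: the core validation logic is written in Rust, which keeps it fast and reliable under high throughput
  5. Mature ecosystem: many popular libraries (FastAPI, LangChain, and others) depend on it, and the community is active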
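
To make points 1 and 3 concrete, here is a minimal sketch of a nested model being validated from a dictionary and then round-tripped through a dict and a JSON string. The `Address` and `Customer` models and their fields are hypothetical names chosen for illustration; `model_validate`, `model_dump`, `model_dump_json`, and `model_validate_json` are the Pydantic v2 methods.

```python
from pydantic import BaseModel


class Address(BaseModel):  # hypothetical model, for illustration only
    city: str
    zip_code: str


class Customer(BaseModel):  # hypothetical model, for illustration only
    name: str
    address: Address  # nested model: the inner dict is validated recursively


# Validate a plain nested dict into model instances
customer = Customer.model_validate(
    {"name": "Ada", "address": {"city": "Berlin", "zip_code": "10115"}}
)

as_dict = customer.model_dump()        # model -> dict (nested models become dicts)
as_json = customer.model_dump_json()   # model -> JSON string
restored = Customer.model_validate_json(as_json)  # JSON string -> model
```

After the round trip, `restored` compares equal to `customer`, since both are instances of the same model with the same field values.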

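Strict mode vs lax mode

  • Strict mode: a value passes validation only if it is already an instance of the declared type or of a subtype of that type. Pydantic provides the strict types StrictBool, StrictBytes, StrictFloat, StrictInt, and StrictStr for this purpose
  • Lax mode (the default): Pydantic tries to coerce the incoming data to the declared type, which makes it more tolerant of loosely typed input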
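
The difference between the two modes can be sketched with a single field. The `LaxOrder` and `StrictOrder` models below are hypothetical; the only difference between them is the `int` versus `StrictInt` annotation.

```python
from pydantic import BaseModel, StrictInt, ValidationError


class LaxOrder(BaseModel):  # hypothetical model, for illustration only
    quantity: int  # lax (default): compatible input such as "3" is coerced


class StrictOrder(BaseModel):  # hypothetical model, for illustration only
    quantity: StrictInt  # strict: only real integers are accepted


lax = LaxOrder(quantity="3")
print(repr(lax.quantity))  # the numeric string has been converted to an int

try:
    StrictOrder(quantity="3")
    strict_rejected = False
except ValidationError as exc:
    strict_rejected = True
    print(exc)  # explains that an integer was expected
```

Lax mode is convenient at system boundaries (query strings, environment variables, CSV cells), while strict types are a reasonable choice when a silent conversion would hide a bug.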

Official documentation

Example 1: Pydantic models and validators

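from datetime import date
from enum import Enum
from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions
from uuid import UUID, uuid4

# EmailStr requires the optional email-validator package
from pydantic import BaseModel, EmailStr, Field, field_validator, model_validator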

class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"

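# A Pydantic model is similar to a Python dataclass:
# the class declares annotated fields and stores the validated data
class Employee(BaseModel):
    """Pydantic model that validates employee information."""
    employee_id: UUID = Field(default_factory=uuid4)  # a fresh UUID for every instance
    name: str = Field(min_length=1, frozen=True)  # frozen: cannot be reassigned after instantiation
    email: EmailStr  # built-in email format validation
    # email: EmailStr = Field(pattern=r".+@example\.com$")  # regex-based validation
    date_of_birth: date = Field(alias="birth_date", repr=False)  # populated via its alias, hidden from repr
    salary: float = Field(alias="compensation", gt=0, repr=False)  # must be greater than 0
    department: Department  # validated against the Department enum
    elected_benefits: bool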

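    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, date_of_birth: date) -> date:
        """Field validator example: an employee must be at least 18 years old."""
        today = date.today()
        # Subtract one year if this year's birthday has not happened yet;
        # comparing (month, day) tuples also works when today is Feb 29
        age = today.year - date_of_birth.year - (
            (today.month, today.day) < (date_of_birth.month, date_of_birth.day)
        )
        if age < 18:
            raise ValueError("Employees must be at least 18 years old.")
        return date_of_birth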

    @model_validator(mode="after")  # runs after the individual fields have been validated
    def check_it_benefits(self) -> Self:
        # IT employees are all contractors, so they do not qualify for benefits
        if self.department == Department.IT and self.elected_benefits:
            raise ValueError(
                "IT employees are contractors and don't qualify for benefits"
            )
        return self

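# Instantiating the model; aliased fields are passed under their alias names.
# IT combined with elected_benefits=True is rejected by check_it_benefits, so this
# call raises pydantic.ValidationError (a subclass of ValueError)
try:
    Employee(
        name="Chris DeTuma",
        email="cdetuma@example.com",  # placeholder address
        birth_date="1998-04-02",
        compensation=123_000.00,
        department="IT",
        elected_benefits=True,
    )
except ValueError as exc:
    print(exc)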

# Building an Employee from a dictionary (keys use the field aliases)
new_employee_dict = {
    "name": "Chris DeTuma",
    "email": "cdetuma@example.com",  # placeholder address
    "birth_date": "1998-04-02",
    "compensation": 123_000.00,
    "department": "IT",
    "elected_benefits": False,  # False, so the IT benefits check passes
}
Employee.model_validate(new_employee_dict)

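The mode parameter of model_validator controls when the validator runs. The two modes used most often are:
  (1) before: runs before Pydantic's own validation, on the raw input data; the validator must be a classmethod
  (2) after: runs after Pydantic's own validation; the validator is an instance method that receives the validated object as self and returns it
A third mode, wrap, runs around the default validation.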
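
As a rough sketch of how the two modes differ in practice, the hypothetical `Signup` model below normalizes raw input in a before validator and checks a cross-field rule in an after validator. The model and its field names are assumptions made for illustration.

```python
from typing import Any

from pydantic import BaseModel, ValidationError, model_validator


class Signup(BaseModel):  # hypothetical model, for illustration only
    username: str
    password: str
    password_repeat: str

    @model_validator(mode="before")
    @classmethod
    def strip_username(cls, data: Any) -> Any:
        # "before" sees the raw input, which is not guaranteed to be a dict
        if isinstance(data, dict) and isinstance(data.get("username"), str):
            data = {**data, "username": data["username"].strip()}
        return data

    @model_validator(mode="after")
    def passwords_match(self) -> "Signup":
        # "after" sees a fully validated instance, so fields are typed attributes
        if self.password != self.password_repeat:
            raise ValueError("passwords do not match")
        return self


signup = Signup(username="  ada ", password="s3cret", password_repeat="s3cret")

try:
    Signup(username="ada", password="s3cret", password_repeat="other")
    mismatch_rejected = False
except ValidationError:
    mismatch_rejected = True
```

A before validator has to be defensive about the shape of its input, whereas an after validator can rely on every field already having its declared type.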

Example 2: Validating function arguments with a decorator


import time
from typing import Annotated
from pydantic import PositiveFloat, Field, EmailStr, validate_call

@validate_call
def send_invoice(
    client_name: Annotated[str, Field(min_length=1)],
    client_email: EmailStr,
    items_purchased: list[str],
    amount_owed: PositiveFloat,
) -> str:

    email_str = f"""
    Dear {client_name}, \n
    Thank you for choosing xyz inc! You
    owe ${amount_owed:,.2f} for the following items: \n
    {items_purchased}
    """

    print(f"Sending email to {client_email}...")
    time.sleep(2)

    return email_str

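The @validate_call decorator is less flexible than BaseModel, but it still applies powerful validation to function arguments. This saves a lot of time and avoids hand-written boilerplate for type checking and validation.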
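
To show what happens at call time, here is a small sketch with a hypothetical `format_invoice_line` function (a cut-down stand-in for `send_invoice`, without the email field so that no extra package is needed). Valid arguments are coerced in lax mode, and invalid ones raise a `ValidationError` that reports every failing argument.

```python
from typing import Annotated

from pydantic import Field, PositiveFloat, ValidationError, validate_call


@validate_call
def format_invoice_line(  # hypothetical helper, for illustration only
    item: Annotated[str, Field(min_length=1)],
    amount_owed: PositiveFloat,
) -> str:
    return f"{item}: ${amount_owed:,.2f}"


# Lax mode: the numeric string "19.5" is coerced to the float 19.5
ok_line = format_invoice_line("widget", "19.5")

try:
    # Empty item name and a negative amount: both arguments are invalid
    format_invoice_line("", -5)
    error_count = 0
except ValidationError as exc:
    error_count = exc.error_count()  # one entry per failing argument
    print(exc)
```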

Example 3: Validating and integrating environment variables

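# First, export the environment variables (shell)
export DATABASE_HOST="http://somedatabaseprovider.us-east-2.com"
export DATABASE_USER="username"
export DATABASE_PASSWORD="asdfjl348ghl@9fhsl4"
export API_KEY="ajfsdla48fsdal49fj94jf93-f9dsal"

# Then, in Python (pydantic-settings is a separate package from pydantic)
from pydantic import HttpUrl, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

# BaseSettings looks up an environment variable matching each field name
# (case-insensitively by default)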
class AppConfig(BaseSettings):
    database_host: HttpUrl
    database_user: str = Field(min_length=5)
    database_password: str = Field(min_length=10)
    api_key: str = Field(min_length=20)    

AppConfig()
# Output preview:
# AppConfig(
#     database_host=Url('http://somedatabaseprovider.us-east-2.com/'),
#     database_user='username',
#     database_password='asdfjl348ghl@9fhsl4',
#     api_key='ajfsdla48fsdal49fj94jf93-f9dsal'
# )



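# Option 2: use SettingsConfigDict to read the settings from a .env file
class AppConfig(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=True,  # variable names must match the field names exactly, including case
        extra="forbid",  # reject variables in .env that the model does not declare
    )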

    database_host: HttpUrl
    database_user: str = Field(min_length=5)
    database_password: str = Field(min_length=10)
    api_key: str = Field(min_length=20)

References:
  • Pydantic: Simplifying Data Validation in Python
  • A Practical Guide to using Pydantic
