Time Series Feature Extraction and Modeling Techniques

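Time series data is an important data type in large-model training, and feature engineering for it calls for systematic processing and extraction. This article shares several key modeling techniques.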

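1. Lag features

This is the most basic, yet important, kind of time series feature: the values at the previous n time steps are used as inputs for predicting the current value.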

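import pandas as pd
import numpy as np

def create_lag_features(df, column, lags):
    """Add one column per lag; df must be sorted by time (oldest first)."""
    for lag in lags:
        # shift(lag) moves values down by `lag` rows, so row t holds the value from t - lag
        df[f'{column}_lag_{lag}'] = df[column].shift(lag)
    return df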

# Example usage
# df = create_lag_features(df, 'sales', [1, 2, 3, 7])
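A minimal sketch of what the lag columns look like, using a hypothetical toy `sales` series (the values are made up for illustration). The function is repeated so the snippet runs on its own. The first `max(lags)` rows have no complete history; dropping them, as done here, is one common choice, not the only one.

```python
import pandas as pd

def create_lag_features(df, column, lags):
    # Same logic as above: row t receives the value from row t - lag
    for lag in lags:
        df[f'{column}_lag_{lag}'] = df[column].shift(lag)
    return df

# Hypothetical toy daily sales series, already sorted by date
df = pd.DataFrame(
    {'sales': [10, 12, 13, 15, 14, 18, 20, 21]},
    index=pd.date_range('2024-01-01', periods=8, freq='D'),
)
df = create_lag_features(df, 'sales', [1, 2, 3])

# The first 3 rows contain NaN lags (no full history yet), so drop them before training
train = df.dropna()
print(train)
```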

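2. Rolling window statistics

Rolling means, standard deviations, and similar statistics over a sliding window capture trend and volatility. Note that a window ending at the current row includes the current value; when that value is the prediction target, shift the series by one step first so the feature does not leak the target.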

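# Rolling mean and standard deviation over past values only
def create_rolling_features(df, column, windows=(7, 14, 30)):
    past = df[column].shift(1)  # window for row t covers rows t-window .. t-1
    for window in windows:
        df[f'{column}_rolling_mean_{window}'] = past.rolling(window=window).mean()
        df[f'{column}_rolling_std_{window}'] = past.rolling(window=window).std()
    return df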
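To see why the end point of the window matters, the sketch below compares a window that ends at the current row with one computed after `shift(1)`, on a hypothetical toy series. It only illustrates pandas' default right-aligned windows; it is not tied to any particular dataset.

```python
import pandas as pd

# Hypothetical toy series
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Window ends at the current row: includes the value we want to predict
with_current = s.rolling(window=3).mean()

# Shift first so the window ends at the previous row: past values only
past_only = s.shift(1).rolling(window=3).mean()

print(pd.DataFrame({'value': s, 'with_current': with_current, 'past_only': past_only}))
```

With `shift(1)`, the feature at row 3 is the mean of rows 0 to 2 only, so it can be used to predict row 3 without seeing it. Passing `min_periods` to `rolling` would fill some of the leading NaN rows with partial-window statistics if that is preferable.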

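3. Cyclical feature encoding

Map time components onto periodic functions, such as a sine/cosine pair, so that the end of each cycle sits next to its start (December next to January, Sunday next to Monday).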

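# Date decomposition: make sure 'date' has a datetime dtype
df['date'] = pd.to_datetime(df['date'])

# Year is not cyclical, so keep it as a plain numeric feature
df['year'] = df['date'].dt.year

# Divide by the fixed cycle length rather than the observed max; dividing weekday
# (0-6) by its max of 6 would map Sunday onto the same point as Monday
periods = {'month': 12, 'day': df['date'].dt.days_in_month, 'weekday': 7}
for col, period in periods.items():
    df[col] = getattr(df['date'].dt, col)  # .dt has no getattr method; use built-in getattr
    df[f'{col}_sin'] = np.sin(2 * np.pi * df[col] / period)
    df[f'{col}_cos'] = np.cos(2 * np.pi * df[col] / period)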
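A small check of why the divisor should be the cycle length: the sketch encodes weekdays (pandas convention, Monday=0 to Sunday=6) with period 7 and with the observed maximum 6, then measures distances between neighbouring days. The helper `cyclical_encode` is a hypothetical name used only for this illustration.

```python
import numpy as np

def cyclical_encode(values, period):
    # Hypothetical helper: place each value on the unit circle at angle 2*pi*value/period
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.column_stack([np.sin(angle), np.cos(angle)])

weekdays = np.arange(7)  # Monday=0 ... Sunday=6

# Period 7: Sunday is exactly as far from Monday as Monday is from Tuesday
enc = cyclical_encode(weekdays, period=7)
dist_mon_tue = np.linalg.norm(enc[1] - enc[0])
dist_sun_mon = np.linalg.norm(enc[0] - enc[6])

# Observed max (6) as the period: Sunday collapses onto Monday's point
enc_max = cyclical_encode(weekdays, period=weekdays.max())
dist_sun_mon_max = np.linalg.norm(enc_max[0] - enc_max[6])

print(dist_mon_tue, dist_sun_mon, dist_sun_mon_max)
```

With period 7 all consecutive days are equally spaced on the circle, whereas the max-based divisor makes two different days indistinguishable.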

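In practice these features need to be tuned to the specific business scenario; it is advisable to validate their effect on a small dataset before applying them at scale.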