引言
在人工智能技术快速发展的今天,从模型训练到生产部署的完整流程已经成为AI应用成功的关键环节。随着深度学习模型规模的不断增大和应用场景的日益复杂,如何高效地部署和优化AI模型推理性能成为业界关注的核心问题。
本文将深入探讨AI模型部署与推理优化的技术实践,重点介绍TensorFlow Serving和ONNX Runtime两种主流部署方案,并结合GPU加速技术来提升模型推理性能。通过理论分析与实际代码示例相结合的方式,为读者提供一套完整的AI服务架构构建指南。
一、AI模型部署的挑战与需求
1.1 现代AI部署的核心挑战
在传统软件开发中,应用部署相对简单,但在AI领域,模型部署面临着独特的复杂性:
- 模型版本管理:不同版本的模型需要并行运行,确保服务的稳定性和可回滚性
- 性能优化:大规模模型推理需要高效的计算资源利用和低延迟响应
- 多平台兼容:需要支持多种硬件环境和操作系统
- 实时性要求:许多应用场景对推理响应时间有严格限制
1.2 部署架构的演进
AI部署架构经历了从简单到复杂的发展过程:
- 单机部署:模型直接在训练环境中运行,适合开发测试阶段
- 服务化部署:通过REST API或gRPC提供服务接口
- 容器化部署:使用Docker等技术实现环境隔离和快速部署
- 云原生部署:基于Kubernetes的微服务架构,支持弹性伸缩
二、TensorFlow Serving深度解析
2.1 TensorFlow Serving概述
TensorFlow Serving是Google开源的模型服务框架,专门用于生产环境中的机器学习模型部署。它提供了高效的模型管理、版本控制和推理服务功能。
# TensorFlow Serving的基本部署流程示例
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import grpc
# 创建TensorFlow Serving客户端
# gRPC client for a remote TensorFlow Serving instance.
class TensorFlowServingClient:
    """Thin wrapper around the TensorFlow Serving gRPC prediction API."""

    def __init__(self, host='localhost', port=8500):
        # Open an insecure channel to the serving endpoint and build the stub.
        self.channel = grpc.insecure_channel(f'{host}:{port}')
        self.stub = prediction_service_pb2_grpc.PredictionServiceStub(self.channel)

    def predict(self, model_name, input_data):
        """Send one prediction request and return the raw PredictResponse."""
        request = predict_pb2.PredictRequest()
        request.model_spec.name = model_name
        # NOTE(review): shape is hard-coded for one 224x224 RGB image —
        # confirm it matches the deployed model's input signature.
        request.inputs['input'].CopyFrom(
            tf.compat.v1.make_tensor_proto(input_data, shape=[1, 224, 224, 3])
        )
        return self.stub.Predict(request)
2.2 模型保存与版本管理
TensorFlow Serving支持多种模型格式的保存和版本控制:
# 模型保存示例
def save_model_for_serving(model, export_dir):
    """Save a trained TF2/Keras model in the SavedModel format that
    TensorFlow Serving loads.

    Args:
        model: a tf.keras.Model / tf.Module exposing serving signatures.
        export_dir: target directory for the SavedModel.
    """
    # SavedModel is the only format TensorFlow Serving consumes directly.
    tf.saved_model.save(
        model,
        export_dir=export_dir,
        signatures=model.signatures  # export the serving signature(s)
    )
    # Fixed: the original additionally ran a TF1-style tf.Session /
    # tf.train.Saver checkpoint export unconditionally.  tf.Session does
    # not exist in TF2 (AttributeError at runtime), and a .ckpt checkpoint
    # is not loadable by TensorFlow Serving anyway, so that dead/broken
    # path was removed.
# 模型版本管理示例
def manage_model_versions(base_path):
    """Create (if needed) and return a fresh timestamped version directory
    under *base_path*, e.g. ``<base_path>/20240101_120000``.

    Returns:
        The path of the version directory (created on disk).
    """
    import os
    from datetime import datetime

    # Timestamp-based version label: lexicographically sortable, unique
    # per second for a single producer.
    version = datetime.now().strftime("%Y%m%d_%H%M%S")
    version_path = os.path.join(base_path, version)
    # Fixed: the original exists()+makedirs() pair is racy — another
    # process could create the directory between the two calls, making
    # makedirs() raise FileExistsError.  exist_ok=True is atomic and also
    # creates missing parents.
    os.makedirs(version_path, exist_ok=True)
    return version_path
2.3 性能优化策略
TensorFlow Serving提供了多种性能优化选项:
# TensorFlow Serving配置示例
import json

def create_serving_config():
    """Build the TensorFlow Serving model-server configuration and return
    it as a pretty-printed JSON string."""
    # Model section: serve the three most recent versions of "my_model".
    model_config = {
        "name": "my_model",
        "base_path": "/models/my_model",
        "model_platform": "tensorflow",
        "model_version_policy": {"latest": {"num_versions": 3}},
    }
    # Server section: batch requests to raise accelerator utilisation.
    batching_parameters = {
        "max_batch_size": 64,
        "batch_timeout_micros": 1000,
        "max_enqueued_batches": 1000,
    }
    config = {
        "model_config_list": [{"config": model_config}],
        "model_server_config": {
            "enable_batching": True,
            "batching_parameters": batching_parameters,
        },
    }
    return json.dumps(config, indent=2)
# 启动TensorFlow Serving服务
"""
docker run -p 8500:8500 -p 8501:8501 \
-v /path/to/models:/models \
tensorflow/serving:latest-gpu \
--model_name=my_model \
--model_base_path=/models/my_model
"""
三、ONNX Runtime架构与应用
3.1 ONNX Runtime核心特性
ONNX Runtime是微软和社区共同开发的高性能推理引擎,支持多种深度学习框架模型的统一部署:
# ONNX Runtime基本使用示例
import onnxruntime as ort
import numpy as np
class ONNXInferenceEngine:
    """Minimal inference wrapper around an onnxruntime InferenceSession."""

    def __init__(self, model_path):
        # Creating the session loads and optimises the model once up front.
        self.session = ort.InferenceSession(model_path)
        self.input_names = [i.name for i in self.session.get_inputs()]
        self.output_names = [o.name for o in self.session.get_outputs()]

    def predict(self, inputs):
        """Run the model and return the list of output arrays.

        *inputs* may be a list (one entry per model input, positional) or a
        single array-like that is fed to every declared input name.
        """
        feed = {}
        for idx, name in enumerate(self.input_names):
            value = inputs[idx] if isinstance(inputs, list) else inputs
            feed[name] = np.array(value)
        return self.session.run(self.output_names, feed)
# 使用示例
# engine = ONNXInferenceEngine('model.onnx')
# predictions = engine.predict([input_data])
3.2 跨平台兼容性优势
ONNX Runtime支持多种硬件加速:
# 不同后端的配置示例
def configure_backend(provider='CPUExecutionProvider'):
    """Return the ONNX Runtime execution-provider priority list for the
    requested backend; unknown providers fall back to CPU only."""
    # Each accelerated provider lists its own fallbacks, ending with CPU.
    provider_chains = {
        'CUDAExecutionProvider': [
            'CUDAExecutionProvider', 'CPUExecutionProvider'],
        'TensorRTExecutionProvider': [
            'TensorRTExecutionProvider', 'CUDAExecutionProvider',
            'CPUExecutionProvider'],
    }
    # Return a fresh list so callers can mutate it safely.
    return list(provider_chains.get(provider, ['CPUExecutionProvider']))
# 性能优化配置
def optimize_model(model_path, output_path):
    """
    Optimize an ONNX model graph and write the result to *output_path*.
    Returns the optimized model object.
    """
    import onnx
    # NOTE(review): `onnxruntime.tools.optimizer` with `optimizer.optimize(model)`
    # does not match the current public API — transformer models are optimised
    # via `onnxruntime.transformers.optimizer.optimize_model(...)`, and generic
    # graph optimisation is normally enabled through SessionOptions. Verify
    # this import path against the installed onnxruntime version.
    from onnxruntime.tools import optimizer
    # Load the ONNX protobuf model from disk.
    model = onnx.load(model_path)
    # Run the optimisation passes.
    optimized_model = optimizer.optimize(model)
    # Persist the optimised graph.
    onnx.save(optimized_model, output_path)
    return optimized_model
3.3 模型转换与适配
将不同框架的模型转换为ONNX格式:
# PyTorch到ONNX转换示例
import torch
import torch.onnx
def pytorch_to_onnx(model, input_shape, onnx_path):
"""
将PyTorch模型转换为ONNX格式
"""
# 创建示例输入
dummy_input = torch.randn(*input_shape)
# 导出为ONNX
torch.onnx.export(
model,
dummy_input,
onnx_path,
export_params=True,
opset_version=11,
do_constant_folding=True,
input_names=['input'],
output_names=['output'],
dynamic_axes={
'input': {0: 'batch_size'},
'output': {0: 'batch_size'}
}
)
# TensorFlow到ONNX转换示例
def tensorflow_to_onnx(tf_model_path, onnx_path):
    """
    Convert a TensorFlow model to ONNX format.
    """
    import tf2onnx
    # NOTE(review): `tf` is referenced below but never imported in this
    # snippet — it relies on a module-level `import tensorflow as tf`.
    # Input signature: one float32 NHWC image tensor with dynamic batch.
    # NOTE(review): `spec` is built but never passed to the converter.
    spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
    # NOTE(review): tf2onnx's public Python API exposes from_keras /
    # from_function / from_graph_def; `convert.from_tensorflow` does not
    # appear in current releases — SavedModels are converted via the CLI
    # (`python -m tf2onnx.convert --saved-model ...`). Verify against the
    # installed tf2onnx version before relying on this call.
    onnx_graph = tf2onnx.convert.from_tensorflow(
        tf_model_path,
        output_path=onnx_path,
        input_names=["input"],
        output_names=["output"]
    )
四、GPU加速技术详解
4.1 GPU资源管理与优化
现代AI推理服务中,GPU加速是提升性能的关键:
# GPU资源管理示例
import tensorflow as tf
import torch
class GPUManager:
    """Configure TensorFlow GPU memory behaviour for inference processes."""

    def __init__(self):
        # Enumerate the physical GPUs visible to TensorFlow.
        gpus = tf.config.experimental.list_physical_devices('GPU')
        if gpus:
            try:
                # Memory growth must be enabled before the GPUs are
                # initialised (i.e. before any op runs) — hence in __init__.
                for gpu in gpus:
                    tf.config.experimental.set_memory_growth(gpu, True)
                print(f"检测到 {len(gpus)} 个GPU设备")
            except RuntimeError as e:
                # Raised when the runtime has already been initialised.
                print(f"GPU配置错误: {e}")

    def configure_gpu_memory(self, memory_limit=4096):
        """
        Cap GPU memory by creating a virtual device with a fixed limit (MB).
        Only configures the first GPU; no-op when none is present.
        """
        gpus = tf.config.experimental.list_physical_devices('GPU')
        if gpus:
            try:
                tf.config.experimental.set_virtual_device_configuration(
                    gpus[0],
                    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=memory_limit)]
                )
                print(f"设置GPU内存限制为 {memory_limit} MB")
            except RuntimeError as e:
                # Also raised if the device was already configured/initialised.
                print(f"GPU配置错误: {e}")
# PyTorch GPU配置
# PyTorch device-selection helper.
def setup_pytorch_gpu():
    """Return torch.device('cuda') when a GPU is available, else CPU.

    Prints which device was selected as a side effect.
    """
    if not torch.cuda.is_available():
        print("未检测到GPU,使用CPU")
        return torch.device("cpu")
    print(f"使用GPU: {torch.cuda.get_device_name(0)}")
    return torch.device("cuda")
4.2 批处理优化策略
通过合理的批处理来提升GPU利用率:
# 批处理优化示例
class BatchProcessor:
    """Accumulate inference requests and flush them in batches.

    A batch is flushed when it reaches *max_batch_size* requests, or when
    more than *batch_timeout* milliseconds have passed since the last
    flush.  NOTE: this snippet requires a module-level ``import time``.
    """

    def __init__(self, max_batch_size=32, batch_timeout=100):
        self.max_batch_size = max_batch_size
        self.batch_timeout = batch_timeout  # milliseconds
        self.batch_queue = []
        self.last_batch_time = 0            # epoch seconds of last flush

    def add_request(self, request_data):
        """Queue one request.

        Returns the batch results when this request triggers a flush,
        otherwise None.
        """
        self.batch_queue.append(request_data)
        # Size-based flush.
        if len(self.batch_queue) >= self.max_batch_size:
            return self.process_batch()
        # Time-based flush (batch_timeout is ms, time.time() is seconds).
        current_time = time.time()
        if current_time - self.last_batch_time > self.batch_timeout / 1000:
            return self.process_batch()
        return None

    def process_batch(self):
        """Flush the queue and run inference on its contents."""
        if not self.batch_queue:
            return []
        batch_data = self.batch_queue.copy()
        self.batch_queue.clear()
        self.last_batch_time = time.time()
        return self.inference_batch(batch_data)

    def inference_batch(self, batch_data):
        """Run inference item-by-item; override for true vectorised batching."""
        return [self.single_inference(item) for item in batch_data]

    def single_inference(self, data):
        """Per-item inference hook.

        Fixed: the original called ``self.single_inference`` without ever
        defining it, so every flush raised AttributeError.  Subclasses (or
        instances) must override this with the actual model call.
        """
        raise NotImplementedError("override single_inference with the model call")
# 使用示例
# processor = BatchProcessor(max_batch_size=64, batch_timeout=50)
4.3 内存优化技术
针对GPU内存不足的问题,采用多种优化策略:
# 内存优化示例
import gc
import torch.nn.utils.prune as prune
class MemoryOptimizer:
    """Reduce a model's memory footprint via quantization and pruning."""

    def __init__(self):
        # Recorded memory snapshots (not used by the methods below).
        self.memory_usage = []

    def optimize_model_memory(self, model):
        """Quantize then prune *model*; returns the optimised model."""
        return self.prune_model(self.quantize_model(model))

    def quantize_model(self, model):
        """Apply PyTorch post-training int8 quantization in place."""
        model.eval()
        # fbgemm = x86 server backend for quantized kernels.
        model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
        torch.quantization.prepare(model, inplace=True)
        torch.quantization.convert(model, inplace=True)
        return model

    def prune_model(self, model):
        """L1-unstructured pruning: zero the 30% smallest-magnitude weights
        of every Linear/Conv2d layer, then make the pruning permanent."""
        prunable = (torch.nn.Linear, torch.nn.Conv2d)
        for _, layer in model.named_modules():
            if isinstance(layer, prunable):
                prune.l1_unstructured(layer, name='weight', amount=0.3)
                prune.remove(layer, 'weight')
        return model

    def clear_cache(self):
        """Free Python garbage and the CUDA caching allocator."""
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
五、实际部署架构设计
5.1 微服务架构实现
构建基于微服务的AI推理平台:
# 基于Flask的推理服务示例
from flask import Flask, request, jsonify
import json
import time
app = Flask(__name__)
class InferenceService:
    # Facade that owns one loaded inference backend (ONNX or TF Serving)
    # and exposes a uniform predict() with timing and error capture.

    def __init__(self):
        self.model = None   # unused placeholder; the backend lives in .engine
        self.engine = None
        self.load_model()   # loads at construction (and thus at module
                            # import, since this class is instantiated at
                            # module level below)

    def load_model(self):
        """
        Load the backend selected by the hard-coded model_type.
        Relies on ONNXInferenceEngine / TensorFlowServingClient defined in
        the companion snippets of this article.
        """
        model_type = "onnx"  # or "tensorflow"
        if model_type == "onnx":
            self.engine = ONNXInferenceEngine("model.onnx")
        elif model_type == "tensorflow":
            self.engine = TensorFlowServingClient()
        print("模型加载完成")

    def predict(self, input_data):
        """
        Run inference; return a dict with result, latency and status.
        NOTE(review): `result` may be a numpy array or protobuf, which is
        not JSON-serializable as-is — the /predict route jsonify()s this
        dict, so a conversion step is likely needed; verify with callers.
        """
        start_time = time.time()
        try:
            result = self.engine.predict(input_data)
            # Wall-clock latency for this request.
            processing_time = time.time() - start_time
            return {
                "result": result,
                "processing_time": processing_time,
                "success": True
            }
        except Exception as e:
            # Broad catch is deliberate: the service must answer every
            # request; the error text is surfaced to the client.
            return {
                "error": str(e),
                "success": False
            }
# Inference service API.
# Module-level singleton: the model is loaded once, at import time.
service = InferenceService()

@app.route('/predict', methods=['POST'])
def predict():
    """
    POST /predict with JSON body {"input": ...}.
    Returns the service's result envelope; 400 on malformed input,
    500 on unexpected errors.
    """
    try:
        data = request.get_json()
        # Reject a missing body or a missing "input" field up front.
        if not data or 'input' not in data:
            return jsonify({"error": "Invalid input data"}), 400
        # Delegate to the shared service instance.
        result = service.predict(data['input'])
        return jsonify(result)
    except Exception as e:
        # Catch-all so the route always returns JSON, never an HTML 500 page.
        return jsonify({"error": str(e)}), 500
@app.route('/health', methods=['GET'])
def health_check():
    """Liveness probe: always reports healthy plus the current timestamp."""
    payload = {"status": "healthy", "timestamp": time.time()}
    return jsonify(payload)

if __name__ == '__main__':
    # Bind on all interfaces; debug disabled for production-like behaviour.
    app.run(host='0.0.0.0', port=5000, debug=False)
5.2 容器化部署方案
使用Docker进行容器化部署:
# Example Dockerfile for the inference service.
# Base image: TensorFlow 2.13 with GPU support (jupyter variant).
FROM tensorflow/tensorflow:2.13.0-gpu-jupyter
# Set the working directory
WORKDIR /app
# Copy the dependency manifest first so this layer caches well
COPY requirements.txt .
# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code
COPY . .
# Expose the Flask service port
EXPOSE 5000
# Start command
CMD ["python", "app.py"]
# docker-compose.yml example: one GPU-backed inference service.
version: '3.8'
services:
  inference-service:
    build: .
    ports:
      - "5000:5000"            # host:container Flask port
    environment:
      - CUDA_VISIBLE_DEVICES=0           # pin the container to GPU 0
      - TF_FORCE_GPU_ALLOW_GROWTH=true   # grow GPU memory on demand
    volumes:
      - ./models:/app/models   # mount model artifacts from the host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia   # requires the NVIDIA container toolkit
              count: 1
              capabilities: [gpu]
5.3 监控与日志系统
完整的监控和日志解决方案:
# 监控系统实现
import logging
from datetime import datetime
import time
class InferenceMonitor:
    """Per-request logging plus rolling metrics for the inference service."""

    def __init__(self):
        self.logger = logging.getLogger('inference_monitor')
        # Fixed: the original attached a new FileHandler on every
        # instantiation; getLogger() returns a shared singleton, so each
        # extra monitor duplicated every log line.  Attach only once.
        if not self.logger.handlers:
            handler = logging.FileHandler('inference.log')
            formatter = logging.Formatter(
                '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
            )
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)
        # Aggregated counters; avg_processing_time is a running mean.
        self.metrics = {
            'total_requests': 0,
            'successful_requests': 0,
            'failed_requests': 0,
            'avg_processing_time': 0,
            'peak_memory_usage': 0
        }

    def record_request(self, request_data, response_data, processing_time):
        """Update counters and the running mean, then log one summary line.

        Args:
            request_data: the request payload (currently unused beyond the API).
            response_data: dict; a truthy 'success' key marks success.
            processing_time: wall-clock seconds for this request.
        """
        self.metrics['total_requests'] += 1
        if response_data.get('success'):
            self.metrics['successful_requests'] += 1
        else:
            self.metrics['failed_requests'] += 1
        # Incremental mean: new_avg = (old_avg * (n - 1) + x) / n
        current_avg = self.metrics['avg_processing_time']
        total_requests = self.metrics['total_requests']
        self.metrics['avg_processing_time'] = (
            (current_avg * (total_requests - 1) + processing_time) / total_requests
        )
        self.logger.info(
            f"Request processed - "
            f"Time: {processing_time:.4f}s, "
            f"Status: {'Success' if response_data.get('success') else 'Failed'}"
        )

    def get_metrics(self):
        """Return a shallow copy of the current metrics."""
        return self.metrics.copy()

    def log_performance(self):
        """Dump the current metrics to the log as pretty-printed JSON."""
        import json  # fixed: `json` was used here but never imported in this snippet
        metrics = self.get_metrics()
        self.logger.info(f"Performance Metrics: {json.dumps(metrics, indent=2)}")
# 性能监控装饰器
# Performance-monitoring decorator.
def monitor_performance(func):
    """Wrap *func* so every call is timed and recorded.

    Fixes vs. the original:
    - one InferenceMonitor is shared across calls (the original built a
      new monitor per call, so metrics never accumulated);
    - functools.wraps preserves the wrapped function's name/docstring;
    - bare ``raise`` re-raises with the original traceback intact
      (``raise e`` reset it).
    """
    monitor = InferenceMonitor()  # shared by every call through this wrapper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        try:
            result = func(*args, **kwargs)
        except Exception as e:
            processing_time = time.time() - start_time
            monitor.record_request(args[0] if args else {}, {"error": str(e)}, processing_time)
            raise
        processing_time = time.time() - start_time
        monitor.record_request(args[0] if args else {}, result, processing_time)
        return result
    return wrapper
六、性能优化最佳实践
6.1 模型压缩与量化
# 模型压缩技术实现
class ModelCompressor:
    """Offline model compression: quantization and pruning of saved models."""

    def __init__(self):
        pass

    def quantize_model(self, model_path, output_path, quantization_type='int8'):
        """Load a pickled model, compress it, and save it to *output_path*.

        quantization_type:
            'int8'    — post-training static int8 quantization (fbgemm).
            'float16' — cast all parameters to half precision.
        Returns the compressed model.
        """
        import torch
        import torch.quantization

        model = torch.load(model_path)
        model.eval()
        if quantization_type == 'int8':
            model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
            # Fixed: prepare/convert belong to the int8 path only — the
            # original ran them unconditionally, so a model cast to half
            # precision was immediately int8-quantized afterwards.
            torch.quantization.prepare(model, inplace=True)
            torch.quantization.convert(model, inplace=True)
        elif quantization_type == 'float16':
            model = model.half()
        torch.save(model, output_path)
        return model

    def prune_model(self, model_path, output_path, pruning_ratio=0.3):
        """Zero the smallest *pruning_ratio* fraction of weights (L1) in
        every Linear/Conv2d layer, persist, and return the pruned model."""
        import torch
        import torch.nn.utils.prune as prune

        model = torch.load(model_path)
        for name, module in model.named_modules():
            if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
                prune.l1_unstructured(module, name='weight', amount=pruning_ratio)
                prune.remove(module, 'weight')
        torch.save(model, output_path)
        return model
# 使用示例
# compressor = ModelCompressor()
# compressed_model = compressor.quantize_model('original_model.pth', 'compressed_model.pth')
6.2 缓存策略优化
# 智能缓存系统实现
import hashlib
import pickle
from typing import Any, Optional
class InferenceCache:
    """In-memory TTL cache for inference results, keyed by the MD5 of the
    input's string form, evicting the stalest entry when full."""

    def __init__(self, max_size=1000, ttl=3600):
        self.cache = {}             # key -> cached result
        self.max_size = max_size
        self.ttl = ttl              # entry lifetime in seconds
        self.access_times = {}      # key -> timestamp of last set()

    def _get_key(self, input_data: Any) -> str:
        """Hash the input's string representation into a stable cache key."""
        return hashlib.md5(str(input_data).encode()).hexdigest()

    def get(self, input_data: Any) -> Optional[Any]:
        """Return the cached result, or None on miss/expiry.

        Expired entries are purged eagerly on lookup.
        """
        key = self._get_key(input_data)
        if key not in self.cache:
            return None
        age = time.time() - self.access_times[key]
        if age < self.ttl:
            return self.cache[key]
        # Expired: drop the entry and report a miss.
        del self.cache[key]
        del self.access_times[key]
        return None

    def set(self, input_data: Any, result: Any):
        """Store a result, evicting the oldest entry when at capacity."""
        key = self._get_key(input_data)
        if len(self.cache) >= self.max_size:
            stalest = min(self.access_times, key=self.access_times.get)
            del self.cache[stalest]
            del self.access_times[stalest]
        self.cache[key] = result
        self.access_times[key] = time.time()

    def clear(self):
        """Drop every cached entry."""
        self.cache.clear()
        self.access_times.clear()
# 使用示例
# cache = InferenceCache(max_size=500, ttl=1800)
# result = cache.get(input_data)
# if result is None:
# result = model.predict(input_data)
# cache.set(input_data, result)
6.3 异步处理机制
# 异步推理实现
import asyncio
import concurrent.futures
from typing import List, Dict, Any
class AsyncInferenceEngine:
    """Async facade over a thread pool running blocking ONNX inference,
    with result caching via InferenceCache (previous section)."""

    def __init__(self, max_workers=4):
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
        self.cache = InferenceCache()

    async def async_predict(self, input_data: Any) -> Any:
        """Return a cached result when available, otherwise run the
        blocking inference on the pool without blocking the event loop."""
        cached_result = self.cache.get(input_data)
        if cached_result is not None:
            return cached_result
        # Fixed: use get_running_loop() — asyncio.get_event_loop() inside a
        # coroutine is deprecated (since Python 3.10) and can bind a wrong
        # loop; we are guaranteed to be inside a running loop here.
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(
            self.executor,
            self._sync_predict,
            input_data
        )
        # Cache for subsequent identical inputs.
        self.cache.set(input_data, result)
        return result

    def _sync_predict(self, input_data: Any) -> Any:
        """Blocking inference body executed on a worker thread.

        NOTE(review): builds a fresh ONNXInferenceEngine (model load!) on
        every call — hoisting it into __init__ would avoid reloading.
        """
        engine = ONNXInferenceEngine("model.onnx")
        return engine.predict(input_data)

    async def batch_predict(self, input_batch: List[Any]) -> List[Any]:
        """Run async_predict over a batch concurrently, preserving order."""
        tasks = [self.async_predict(data) for data in input_batch]
        results = await asyncio.gather(*tasks)
        return results
# 使用示例
async def main():
    """
    Illustrative driver for AsyncInferenceEngine.
    NOTE(review): `input_data`, `data1`, `data2`, `data3` are undefined in
    this snippet — they are placeholders the reader must supply.
    """
    engine = AsyncInferenceEngine(max_workers=4)
    # Single prediction
    result = await engine.async_predict(input_data)
    # Batched predictions
    batch_results = await engine.batch_predict([data1, data2, data3])
# asyncio.run(main())
七、部署监控与维护
7.1 性能监控指标
# 完整的性能监控系统
import psutil
import time
from collections import defaultdict
class PerformanceMonitor:
    # System/GPU metrics collector (the class continues beyond this excerpt).

    def __init__(self):
        self.metrics = defaultdict(list)   # metric name -> list of samples
        self.start_time = time.time()      # monitor start time

    def collect_system_metrics(self):
        """
        Snapshot CPU, memory, GPU, network and disk counters into a dict.
        NOTE: psutil.cpu_percent(interval=1) blocks for one second.
        """
        metrics = {
            'timestamp': time.time(),
            'cpu_percent': psutil.cpu_percent(interval=1),
            'memory_percent': psutil.virtual_memory().percent,
            'gpu_utilization': self.get_gpu_utilization(),
            'gpu_memory': self.get_gpu_memory_usage(),
            'network_io': psutil.net_io_counters(),
            'disk_io': psutil.disk_io_counters()
        }
        return metrics
def get_gpu_utilization(self):
    """Return the current GPU-0 utilisation percent via NVML, or 0 when
    pynvml is missing or no GPU is reachable."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        return util.gpu
    except Exception:
        # Fixed: the bare `except:` also swallowed SystemExit and
        # KeyboardInterrupt; Exception covers ImportError and every NVML
        # error while letting process-control exceptions propagate.
        return 0
def get_gpu_memory_usage(self):
"""
获取GPU内存使用情况
"""
try:
import pynvml
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
return {
'used': mem_info.used,
'total': mem_info.total,
'percent': (mem_info.used / mem_info.total) * 100
}
except:

评论 (0)