Difficulties with Dynamic Shapes in ATC Model Conversion: Cases, Causes, and Solutions

Artificial Intelligence

2023-06-07 10:59:23

Challenges in Dynamic-Shape Model Conversion: Causes and Solutions

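Introduction

In deep learning models, a dynamic shape means that the shape of a model's inputs or outputs can change at run time. This gives the model the flexibility to handle input data of different sizes, but it also complicates model conversion. This article looks at common difficulties in converting dynamic-shape models, analyzes their causes, and offers a targeted solution for each.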
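
To make the idea concrete, the sketch below treats a shape as a tuple in which `None` marks a dimension that stays open until run time, which is the convention Keras uses. The helper name `matches_template` is hypothetical and only illustrates the concept; it is not part of ATC or TensorFlow.

```python
from typing import Optional, Sequence


def matches_template(template: Sequence[Optional[int]],
                     concrete: Sequence[int]) -> bool:
    """Return True if `concrete` is a valid instance of `template`.

    A `None` entry in the template is a dynamic dimension and accepts
    any positive size; an integer entry must match exactly.
    """
    if len(template) != len(concrete):
        return False  # the rank must agree even when sizes are dynamic
    for expected, actual in zip(template, concrete):
        if actual <= 0:
            return False
        if expected is not None and expected != actual:
            return False
    return True


# A batch of variable size holding 224x224 RGB images.
image_template = (None, 224, 224, 3)
print(matches_template(image_template, (1, 224, 224, 3)))
print(matches_template(image_template, (8, 224, 224, 3)))
print(matches_template(image_template, (8, 256, 256, 3)))
```

The first two calls print `True` because only the batch dimension differs; the last prints `False` because height and width are fixed in this template.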

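Undetermined Input Shape

Problem: The input shape of some models cannot be determined before conversion. For example, the length of the input text to a natural language processing model can vary.

Cause: A model converter (such as Huawei's Ascend Tensor Compiler, ATC) needs to know the model's input shape in advance in order to convert the model correctly.

Solution: Use a placeholder in place of the concrete input shape. A placeholder is a special marker for a dimension whose size is not yet known (`None` in Keras, `-1` in ATC's `--input_shape` option). ATC keeps that dimension open in the converted model, and the placeholder is resolved to the actual input shape once that shape is known.

Example code: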

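import tensorflow as tf

# Define a model whose input shape is not fully known: a variable-length
# sequence (None) of 16-dimensional feature vectors. Dense needs a known
# last dimension, so only the sequence length is left open.
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(None, 16)),
    tf.keras.layers.Dense(128),
    tf.keras.layers.Dense(1)
])

# Convert the model, leaving the placeholder dimension in place.
# NOTE: the `ascend` Python module and `ascend.compile` are assumed here;
# ATC is normally invoked as the `atc` command-line tool.
import ascend
ascend.compile(model, output_path="model.atc")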
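
ATC is usually driven from the command line. As a hedged sketch of how a dynamic batch dimension is commonly declared there, the helper below only assembles an argument list and does not run anything. The flag names (`--model`, `--framework`, `--output`, `--soc_version`, `--input_shape`, `--dynamic_batch_size`) are taken from the CANN ATC documentation and should be verified against the installed version; the function name, file names, input tensor name, and SoC version are illustrative assumptions.

```python
from typing import List, Sequence


def build_atc_command(model_path: str,
                      output_prefix: str,
                      input_name: str,
                      input_shape: Sequence[int],
                      batch_gears: Sequence[int],
                      soc_version: str) -> List[str]:
    """Assemble an `atc` argument list with a dynamic batch dimension.

    The batch dimension (index 0) must be -1 in `input_shape`; the
    batch sizes allowed at run time are listed in `batch_gears`.
    """
    if not input_shape or input_shape[0] != -1:
        raise ValueError("batch dimension must be -1 to be dynamic")
    if any(d <= 0 for d in input_shape[1:]):
        raise ValueError("only the batch dimension may be dynamic here")
    if not batch_gears or any(g <= 0 for g in batch_gears):
        raise ValueError("batch_gears must be positive integers")

    shape_text = ",".join(str(d) for d in input_shape)
    gears_text = ",".join(str(g) for g in batch_gears)
    return [
        "atc",
        f"--model={model_path}",
        "--framework=3",  # 3 selects TensorFlow in ATC
        f"--output={output_prefix}",
        f"--soc_version={soc_version}",
        f"--input_shape={input_name}:{shape_text}",
        f"--dynamic_batch_size={gears_text}",
    ]


# All values below are placeholders for illustration only.
cmd = build_atc_command("model.pb", "model", "input",
                        (-1, 224, 224, 3), (1, 2, 4, 8), "Ascend310")
print(" ".join(cmd))
```

On a machine with the CANN toolkit installed, the resulting list could be passed to `subprocess.run`; building it separately makes the shape and batch-size arguments easy to check before a long conversion starts.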

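Undetermined Output Shape

Problem: The output shape of some models cannot be determined before conversion. For example, the shape of the mask produced by an image segmentation model depends on the size of the input image.

Cause: As with an undetermined input shape, the ATC converter also needs to know the model's output shape.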
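
The dependence of the output shape on the input size follows from standard convolution arithmetic: with no padding, a layer with kernel size k and stride s maps a spatial size n to floor((n - k) / s) + 1. The sketch below applies this to a hypothetical stack of two 3x3 convolutions, each followed by 2x2 pooling; the function names are illustrative.

```python
def valid_output_size(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial output size of an unpadded ('valid') conv or pooling layer."""
    if size < kernel:
        raise ValueError("input is smaller than the kernel")
    return (size - kernel) // stride + 1


def feature_map_size(size: int) -> int:
    """Trace one spatial dimension through conv3 -> pool2 -> conv3 -> pool2."""
    size = valid_output_size(size, kernel=3)            # 3x3 convolution
    size = valid_output_size(size, kernel=2, stride=2)  # 2x2 max pooling
    size = valid_output_size(size, kernel=3)            # 3x3 convolution
    size = valid_output_size(size, kernel=2, stride=2)  # 2x2 max pooling
    return size


for side in (224, 256, 320):
    print(side, "->", feature_map_size(side))
```

For input sides of 224, 256, and 320 this prints feature-map sides of 54, 62, and 78, so the output shape cannot be fixed until the input size is known.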

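Solution: Use a dynamic-shape parameter in place of the concrete output shape. A dynamic-shape parameter is a special parameter that marks a shape as variable, so that the output shape is resolved to its actual value once the input size is known.

Example code: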

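import tensorflow as tf

# Define a model whose output shape depends on the input: a fully
# convolutional network that accepts images of any height and width
# and returns a one-channel mask of the same spatial size.
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(None, None, 3)),
    tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, (1, 1), activation="sigmoid")
])

# Convert the model with the dynamic-shape parameter enabled.
# NOTE: the `ascend` Python module and its `dynamic_shape` argument are
# assumed here; ATC is normally invoked as the `atc` command-line tool.
import ascend
ascend.compile(model, output_path="model.atc", dynamic_shape=True)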

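Custom Converter

Problem: If none of the methods above resolves the dynamic-shape issue, a custom converter can be used.

Cause: A custom converter is a dedicated program that converts a model from one format to another.

Solution: Inside the custom converter, the model's input and output shapes can be specified explicitly.

Example code: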

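import tensorflow as tf
import ascend  # NOTE: assumed Python module, as in the examples above

# Define a custom converter
class CustomConverter(ascend.Converter):
    def __init__(self):
        super().__init__()

    def convert(self, model):
        # Specify the model's input and output shapes.
        # NOTE: Keras exposes these attributes as read-only properties,
        # so the assignments below are illustrative only.
        model.input_shape = (None,)
        model.output_shape = (None,)
        return model

# Convert the model with the custom converter
# (`model` is a Keras model such as the ones defined above)
custom_converter = CustomConverter()
ascend.compile(model, output_path="model.atc", converter=custom_converter)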
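
One simple form that explicit shape specification can take is resolving every dynamic dimension to a fixed value before conversion, at the cost of flexibility. The sketch below is independent of any converter API; `pin_dynamic_dims` is a hypothetical helper name, and `None` marks a dynamic dimension as in Keras.

```python
from typing import Optional, Sequence, Tuple


def pin_dynamic_dims(template: Sequence[Optional[int]],
                     values: Sequence[int]) -> Tuple[int, ...]:
    """Replace each None in `template` with the next entry of `values`."""
    dynamic_count = sum(1 for d in template if d is None)
    if dynamic_count != len(values):
        raise ValueError(
            f"expected {dynamic_count} values, got {len(values)}")
    if any(v <= 0 for v in values):
        raise ValueError("pinned dimensions must be positive")
    remaining = iter(values)
    return tuple(next(remaining) if d is None else d for d in template)


# Pin batch, height, and width of an image input; channels stay at 3.
print(pin_dynamic_dims((None, None, None, 3), (1, 512, 512)))
# Pin the sequence length of a (length, features) input.
print(pin_dynamic_dims((None, 16), (32,)))
```

The two calls print `(1, 512, 512, 3)` and `(32, 16)`. A mismatch between the number of dynamic dimensions and the number of supplied values raises `ValueError`, so a wrong shape is caught before conversion starts.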