- Download the Triton Server image and run it
docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
docker run, mounting the model repository path:
docker run --gpus all --rm --net=host -v /path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 bash
Inside the container, start the server:
tritonserver --model-repository=/models
...
I0315 03:54:57.814177 829 server.cc:549]
+-------------+-----------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+--------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| openvino | /opt/tritonserver/backends/openvino/libtriton_openvino.so | {} |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {} |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {} |
+-------------+-----------------------------------------------------------------+--------+
I0315 03:54:57.814214 829 server.cc:592]
+---------------+---------+--------+
| Model | Version | Status |
+---------------+---------+--------+
| cust_py | 1 | READY |
| densenet_onnx | 1 | READY |
| fc_model_pth | 1 | READY |
| kfd_trt | 1 | READY |
+---------------+---------+--------+
I0315 03:54:57.814295 829 tritonserver.cc:1920]
+----------------------------------+------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------+
| server_id | triton |
| server_version | 2.15.0 |
| server_extensions | classification sequence model_repository model_repos |
| | itory(unload_dependents) schedule_policy model_confi |
| | guration system_shared_memory cuda_shared_memory bin |
| | ary_tensor_data statistics |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+------------------------------------------------------+
I0315 03:54:57.815274 829 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0315 03:54:57.815463 829 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0315 03:54:57.856651 829 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
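Before moving on it is worth checking that the server is reachable. A quick sketch using the readiness endpoints of the KServe v2 HTTP protocol; the address 172.17.0.2:8000 matches the requests used later in this post (with --net=host, localhost:8000 also works):
import requests

# Server-level readiness: returns HTTP 200 once the server can accept requests
print(requests.get("http://172.17.0.2:8000/v2/health/ready").status_code)

# Per-model readiness, e.g. for the kfd_trt model shown as READY above
print(requests.get("http://172.17.0.2:8000/v2/models/kfd_trt/ready").status_code)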
- Model registration (the configuration parameters are easy to get wrong)
Create a directory for each model under the model repository path on the host:
.
├── cust_py
│ ├── 1
│ │ ├── model.py
│ └── config.pbtxt
├── densenet_onnx
│ ├── 1
│ │ └── model.onnx
│ └── config.pbtxt
├── fc_model_pth
│ ├── 1
│ │ └── model.pt
│ └── config.pbtxt
└── kfd_trt
├── 1
│ └── model.plan
└── config.pbtxt
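A minimal sketch for creating this skeleton on the host (directory and file names follow the tree above; copying the actual model artifacts is left out):
import os

# one sub-directory per model, each with a "1" version folder
for model in ["cust_py", "densenet_onnx", "fc_model_pth", "kfd_trt"]:
    os.makedirs(os.path.join("model_repository", model, "1"), exist_ok=True)
# the model file (model.onnx / model.pt / model.plan / model.py) goes into the version
# folder, while config.pbtxt sits next to it at the model level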
Each model directory contains a numbered version folder and a config.pbtxt. Taking TensorRT as an example:
name: "kfd_trt" #模型名称
platform: "tensorrt_plan" #推理框架名称,不同的推理框架和platform名称对应关系如下
max_batch_size : 0
input [
{
name: "input"
data_type: TYPE_FP32
dims: [1, 3, 224, 224 ]
}
]
output [
{
name: "465"
data_type: TYPE_FP32
dims: [1, 8]
}
]
The mapping between inference frameworks and platform names:
| framework  | platform            |
| pytorch    | pytorch_libtorch    |
| tensorrt   | tensorrt_plan       |
| onnx       | onnxruntime_onnx    |
| tensorflow | tensorflow_graphdef |
If the mapping is wrong, the server fails to start with an error such as:
E0315 03:36:33.731745 792 model_repository_manager.cc:1890] Poll failed for model directory 'densenet_onnx': unexpected platform type onnxruntime for densenet_onnx
...
error: creating server: Internal - failed to load all models
In this example the fix is to set platform: "onnxruntime_onnx" in densenet_onnx/config.pbtxt, i.e. the full platform name from the table above rather than the bare framework name.
The official reference for the configuration options:
https://github.com/triton-inference-server/server/blob/main/docs/README.md#metrics
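The cust_py model in the tree above uses Triton's Python backend, where the version folder holds a model.py instead of a serialized model. A minimal pass-through sketch; the tensor names "input" and "output" are placeholders and must match the names declared in cust_py/config.pbtxt:
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        # Triton may batch several requests; return one response per request
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "input")
            data = in_tensor.as_numpy()
            out_tensor = pb_utils.Tensor("output", data.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses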
- After Triton has loaded the models successfully, the model configuration can be queried over HTTP:
import requests

model_name = "kfd_trt"  # one of the models reported as READY above
config_url = "http://172.17.0.2:8000/v2/models/{}/config".format(model_name)
res = requests.get(url=config_url)
print("model {} config:{}".format(model_name, res.json()))
The response looks like this:
{
"name": "kfd_trt",
"platform": "tensorrt_plan",
"backend": "tensorrt",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 0,
"input": [
{
"name": "input",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1,
3,
224,
224
],
"is_shape_tensor": false,
"allow_ragged_batch": false
}
],
"output": [
{
"name": "465",
"data_type": "TYPE_FP32",
"dims": [
1,
8
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "kfd_trt",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
0
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "model.plan",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {},
"model_warmup": []
}
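Besides the config endpoint, the model metadata endpoint reports the inputs and outputs in the datatype spelling that inference requests expect ("FP32" without the TYPE_ prefix, see the next step). Continuing the snippet above:
meta_url = "http://172.17.0.2:8000/v2/models/{}".format(model_name)
print("model {} metadata:{}".format(model_name, requests.get(meta_url).json()))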
- Model inference (the input datatype is easy to get wrong)
Send an inference request to the Triton service with an HTTP client:
import numpy as np
import requests

model_name = "kfd_trt"
# the model expects FP32 input; note that np.float has been removed from recent NumPy versions
input_data = np.ones((1, 3, 224, 224), dtype=np.float32)
request_data = {
    "inputs": [{
        "name": "input",
        "shape": [1, 3, 224, 224],
        "datatype": "FP32",
        "data": input_data.tolist()
    }],
    "outputs": [{"name": "465"}]
}
req_url = "http://172.17.0.2:8000/v2/models/{}/versions/1/infer".format(model_name)
res = requests.post(url=req_url, json=request_data)
print("inference result:", res.json())
Note how request_data is filled in. In step 3 the model config reported the input datatype as
"data_type": "TYPE_FP32"
but the inference request uses a different spelling: the TYPE_ prefix must be dropped ("datatype": "FP32"), otherwise the request fails with
{'error': 'invalid datatype for input input'}
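Instead of building the JSON payload by hand, the official tritonclient package (pip install tritonclient[http]) takes care of the datatype naming; a sketch against the same kfd_trt model:
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="172.17.0.2:8000")

# Declare the input/output tensors; note the datatype is "FP32", not "TYPE_FP32"
infer_input = httpclient.InferInput("input", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(np.ones((1, 3, 224, 224), dtype=np.float32))
infer_output = httpclient.InferRequestedOutput("465")

result = client.infer("kfd_trt", inputs=[infer_input], outputs=[infer_output])
print("inference result:", result.as_numpy("465"))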