Overview
When deploying and testing deep learning models, we often want a convenient web server with a clean interface so that we can focus on the model itself. A web server not only makes debugging and testing easier, it also lets us hand our work to third parties and end users so they can experience what deep learning can do. Two approaches are commonly used (TensorFlow is the example throughout):
- TensorFlow Serving
- TensorFlow Models + web server
If you have many models, you will need to pair these with a cloud AI toolkit that supports large-scale online and offline training, prediction, and publishing, typically with YARN or Kubernetes (k8s) handling node and service scheduling, and the models exposed to business lines as microservices.
TensorFlow Serving
Let's start with TensorFlow Serving. Put plainly, TensorFlow Serving decouples serving from the trained model: we stand up a TensorFlow Serving service to manage and deploy our models. It is a high-performance, open-source library for serving deep learning models, accepting external calls over gRPC (and also REST, which the examples below use).
[Figure: TensorFlow Serving architecture]

Several key concepts, described in the official TF documentation:
- Servables: the underlying objects that clients use to perform computation.
- Loaders: manage a Servable's life cycle, standardizing the APIs for loading and unloading a Servable.
- Sources: plugin modules that provide Servables.
- Managers: handle the full life cycle of Servables, including loading, serving, and unloading them.

The client sends requests to the Manager; the Manager handles model updates according to the version policy and returns results computed with the latest model to the client. In more detail:
- A Source plugin creates a Loader for a specific Version; the Loader contains all the metadata needed to load the Servable.
- The Source notifies the Manager of the current Aspired Version through a callback.
- Based on the configured Version Policy, the Manager decides what to do next: unload a previous Version, or load the new one.
- If the Manager determines the operation is safe, it gives the Loader the resources it needs and tells the Loader to load the new Version.
- A client asks the Manager for a Servable, either pinning a version or requesting the latest one; the Manager returns a handle to that Servable.
Installing TF-Serving
I recommend installing into an isolated virtual environment; the steps below were run in a Python environment on Ubuntu 16.10:
```bash
sudo apt install curl
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt update && sudo apt install tensorflow-model-server
```
The next step is to take a trained model and publish it as a managed service. For demonstration we grab a pre-trained InceptionV3 and save it:
```python
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Input

# Load ImageNet-pretrained InceptionV3 and save it in Keras HDF5 format
inception_model = InceptionV3(weights='imagenet', input_tensor=Input(shape=(224, 224, 3)))
inception_model.save('inception.h5')
```
The Inception model is now saved in Keras (HDF5) format; next we export it into the SavedModel format that TensorFlow Serving can load. The code below follows https://github.com/himanshurawlani/keras-and-tensorflow-serving:
```python
import tensorflow as tf
from keras import backend as K
from keras.models import load_model

K.set_learning_phase(0)  # inference mode: disables dropout etc.
model = load_model('./inception.h5')

# Export as a SavedModel; the trailing numeric directory is the model version
tf.saved_model.simple_save(
    K.get_session(),
    '/modelpath/image_classifier/1',
    inputs={'input_image': model.input},
    outputs={t.name: t for t in model.outputs})
```
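After the export you should see the standard SavedModel layout on disk, with the numeric directory name acting as the model version (a sketch, matching the export path above):

```
/modelpath/image_classifier/
└── 1/
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
```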
Starting the service
```bash
tensorflow_model_server --model_base_path=/modelpath/image_classifier --rest_api_port=9000 --model_name=ImageClassifier
```
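Before sending predictions, you can confirm that the model loaded successfully through the REST status endpoint (the port matches --rest_api_port above):

```bash
curl http://localhost:9000/v1/models/ImageClassifier
```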
Testing the service
```python
import argparse
import json

import numpy as np
import requests
from keras.applications import inception_v3
from keras.preprocessing import image

# Argument parser for giving input image_path from command line
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path of the image")
args = vars(ap.parse_args())
image_path = args['image']

# Preprocessing our input image
img = image.img_to_array(image.load_img(image_path, target_size=(224, 224))) / 255.

# this line is added because of a bug in tf_serving(1.10.0-dev)
img = img.astype('float16')

payload = {
    "instances": [{'input_image': img.tolist()}]
}

# sending post request to TensorFlow Serving server
r = requests.post('http://localhost:9000/v1/models/ImageClassifier:predict', json=payload)
pred = json.loads(r.content.decode('utf-8'))

# Decoding the response
# decode_predictions(preds, top=5) by default gives top 5 results
# You can pass "top=10" to get top 10 predictions
print(json.dumps(inception_v3.decode_predictions(np.array(pred['predictions']))[0]))
```
Running it produces:
```
python serving_sample_request.py -i ../test_images/car.png
Using TensorFlow backend.
[["n04285008", "sports_car", 0.998414], ["n04037443", "racer", 0.00140099], ["n03459775", "grille", 0.000160794], ["n02974003", "car_wheel", 9.57862e-06], ["n03100240", "convertible", 6.01581e-06]]
```
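The example above uses the REST API, but TensorFlow Serving also listens for gRPC calls (port 8500 by default, configurable with --port). A minimal client sketch, assuming the tensorflow-serving-api pip package is installed; a random array stands in for a real preprocessed image:

```python
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a PredictRequest against the same model name and input tensor name as above
request = predict_pb2.PredictRequest()
request.model_spec.name = 'ImageClassifier'
img = np.random.random((1, 224, 224, 3)).astype(np.float32)  # stand-in for a preprocessed image
request.inputs['input_image'].CopyFrom(tf.make_tensor_proto(img))

result = stub.Predict(request, 10.0)  # 10-second timeout
print(result.outputs)
```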
What if you have several TF models, say model1, model2, and model3?
Multi-model deployment
Create a model.config file with the following content:
```
model_config_list {
  config {
    name: "model1"
    base_path: "/models/model1"
    model_platform: "tensorflow"
  }
  config {
    name: "model2"
    base_path: "/models/model2"
    model_platform: "tensorflow"
  }
  config {
    name: "model3"
    base_path: "/models/model3"
    model_platform: "tensorflow"
  }
}
```
Once the config file is in place, start the server with --model_config_file instead of --model_base_path.
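For example (assuming the file is saved as /models/models.config; port 8501 matches the client code below):

```bash
tensorflow_model_server --rest_api_port=8501 --model_config_file=/models/models.config
```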
If a model has multiple versions, you can also pin a specific version in the request URL, as the commented-out SERVER_URL below shows:
```python
import requests

SERVER_URL = 'http://localhost:8501/v1/models/model3:predict'
# Note: 'model3' in SERVER_URL is the model name defined in the config file, not the folder name
#SERVER_URL = 'http://localhost:8501/v1/models/model1/versions/100003:predict'

def prediction():
    predict_request = '{"instances":%s}' % str([[[10] * 7] * 7])
    print(predict_request)
    response = requests.post(SERVER_URL, data=predict_request)
    print(response)
    prediction = response.json()['predictions'][0]
    print(prediction)

prediction()
```
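By default the server only serves the latest version it finds under base_path. For a versioned URL like the commented-out one above to keep working for older versions, the corresponding config entry needs a model_version_policy; a sketch (the version numbers are illustrative):

```
config {
  name: "model1"
  base_path: "/models/model1"
  model_platform: "tensorflow"
  model_version_policy {
    specific {
      versions: 100003
      versions: 100004
    }
  }
}
```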
Deploying multiple TF models with Flask
First, define and save the first model:
```python
import tensorflow as tf
import numpy as np

def model_a():
    """
    model placeholder : an actual model shall be defined, like some CNN/LSTM etc.
    But for the purpose of demonstration,
    all we need is a simple Graph with a Variable in it
    """
    # some supposed input
    A = tf.placeholder(tf.float32, shape=[10, 10])
    # "weights" of the model
    B = tf.Variable(tf.random_uniform(A.shape.as_list()), name="my_model_a_weight")
    # model (haha)
    C = tf.matmul(A, B)

    # define a dummy class for convenience
    class Model():
        pass

    # fake model instance for easy referencing
    model = Model()
    # give input to the Graph at this node
    model.input_placeholder = A
    # get output from the Graph at this node
    model.output_node = C
    return model

A = model_a()
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# train here to change weights
#training_model()
# once the model is trained, or not, save the session
saver.save(sess, "model1.ckpt")
```
Next, define the second model:
```python
import tensorflow as tf
import numpy as np

def model_b():
    """
    model placeholder : an actual model shall be defined, like some CNN/LSTM etc.
    But for the purpose of demonstration,
    all we need is a simple Graph with a Variable in it
    """
    # some supposed input
    P = tf.placeholder(tf.float32, shape=(4, 4))
    # "weights" of the model
    Q = tf.Variable(tf.constant(np.arange(16, dtype=np.float32).reshape(-1, 4)))
    # model (haha)
    R = tf.add(P, Q)

    # define a dummy class for convenience
    class Model():
        pass

    # fake model instance for easy referencing
    model = Model()
    # give input to the Graph at this node
    model.input_placeholder = P
    # get output from the Graph at this node
    model.output_node = R
    return model

# graph
A = model_b()
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# train here to change weights
#training_model()
# once the model is trained, or not, save the session
saver.save(sess, "model2.ckpt")
```
With both models defined and saved, we need wrapper classes that keep each model's Graph and Session objects together:
```python
import tensorflow as tf
from model_a import model_a
from model_b import model_b

class ModelA():
    """
    This is a wrapper class on all the complexity that one comes across.
    Principle :
    <START>
    Define graph
    Add nodes
    Assign session to the graph
    Restore sess from saved ckpt
    <DONE>
    """
    def __init__(self):
        # defines graph
        self.graph = tf.Graph()
        # the following step is very important
        with self.graph.as_default():
            # treat 'self.graph' as the default graph
            # within the lifetime of the instance of this class
            # add nodes
            self.model = model_a()
            # assign sess
            self.sess = tf.Session()
            # saver
            self.saver = tf.train.Saver()
            # restore the model
            self.saver.restore(self.sess, "model1.ckpt")

    def predict(self, input_vec):
        result = self.sess.run(
            self.model.output_node,
            feed_dict={
                self.model.input_placeholder: input_vec
            }
        )
        return result


class ModelB():
    """
    This is a wrapper class on all the complexity that one comes across.
    Principle :
    <START>
    Define graph
    Add nodes
    Assign session to the graph
    Restore sess from saved ckpt
    <DONE>
    """
    def __init__(self):
        # defines graph
        self.graph = tf.Graph()
        # the following step is very important
        with self.graph.as_default():
            # treat 'self.graph' as the default graph
            # within the lifetime of the instance of this class
            # add nodes
            self.model = model_b()
            # assign sess
            self.sess = tf.Session()
            # saver
            self.saver = tf.train.Saver()
            # restore the model
            self.saver.restore(self.sess, "model2.ckpt")

    def predict(self, input_vec):
        result = self.sess.run(
            self.model.output_node,
            feed_dict={
                self.model.input_placeholder: input_vec
            }
        )
        return result
```
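Because each wrapper owns its own tf.Graph and tf.Session, the two checkpoints can be restored side by side in one process without variable-name collisions. A quick smoke test (assuming the scripts above were saved as model_a.py, model_b.py, and model_wrappers.py, with the checkpoints in the working directory):

```python
import numpy as np
from model_wrappers import ModelA, ModelB

a = ModelA()
b = ModelB()

# model_a multiplies a (10, 10) input by its weights; model_b adds its weights to a (4, 4) input
print(a.predict(np.ones((10, 10), dtype=np.float32)))
print(b.predict(np.zeros((4, 4), dtype=np.float32)))
```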
Finally, create the Flask app that exposes the REST API:
```python
import numpy as np
from flask import Flask, jsonify, request
from flask_cors import CORS
from model_wrappers import ModelA, ModelB

a = ModelA()
b = ModelB()
app = Flask(__name__)
CORS(app)

def featurize1(*args):
    """
    Dummy featurizer for model_a
    """
    model_size = (10, 10)
    return np.arange(100, dtype=np.float32).reshape(model_size)

def featurize2(*args):
    """
    Dummy featurizer for model_b
    """
    model_size = (4, 4)
    return np.random.random(model_size)

@app.route('/get_my_predictions', methods=['POST'])
def get_predictions():
    print(request.json)
    text = request.json.get('text')
    vec1 = featurize1(text)
    vec2 = featurize2(text)
    resp_a = a.predict(vec1).tolist()
    resp_b = b.predict(vec2).tolist()
    return jsonify({"Response": [{"model_a": resp_a}, {"model_b": resp_b}]})

if __name__ == '__main__':
    app.debug = True
    app.run(host='0.0.0.0', threaded=True)
```
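With the app running (Flask's development server defaults to port 5000), both models can be queried with a single request:

```bash
curl -X POST http://localhost:5000/get_my_predictions \
  -H "Content-Type: application/json" \
  -d '{"text": "some input"}'
```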
Comparison
TensorFlow Serving and Flask each have their strengths: for large-scale AI services in the cloud, TensorFlow Serving is the better choice; if you only have one or two simple models, Flask is simpler and quicker to set up.