Overview

When we deploy and test deep learning models, we usually want a web server with a convenient interface so that we can focus on the model itself. Such a server not only lets us debug and test the model, it also lets us hand the results to third parties and end users so they can experience what deep learning can deliver. Two commonly used approaches (with TensorFlow as the example) are:

  • TensorFlow Serving
  • TensorFlow Models + web server

If you have many models, you will need to combine this with a cloud AI toolbox that supports large-scale online/offline training, prediction, and publishing, typically together with YARN and Kubernetes for node and service scheduling, so that the models can be exposed to business lines as microservices.

TensorFlow Serving

Let's start with TensorFlow Serving. Put plainly, TensorFlow Serving is a way to serve already-trained models in a decoupled fashion: we create a TensorFlow Serving service to manage and deploy them. It is a high-performance open-source library for serving deep learning models that accepts external calls over gRPC. The overall architecture is shown below:

[Figure: TensorFlow Serving architecture]

A few key concepts, which are described in the official TF documentation:

  • Servables: the underlying objects that clients use to perform computations.

  • Loaders: manage a Servable's life cycle and standardize the APIs for loading and unloading it.

  • Sources: plugin modules that provide Servables.

  • Managers: maintain the full life cycle of Servables, including:

    • loading Servables

    • serving Servables

    • unloading Servables

  • Core: the TensorFlow Serving Core manages the following aspects of Servables through the TensorFlow Serving APIs:

      • life cycle
      • metrics

      The TensorFlow Serving Core treats Servables and Loaders as opaque objects.

[Figure: TensorFlow Serving]

The client keeps sending requests to the Manager; the Manager applies its version policy to manage model updates and returns results computed with the latest model to the client. In more detail:

  1. A Source plugin creates a Loader for a specific Version; the Loader contains all the metadata needed to load the Servable.
  2. The Source notifies the Manager of the current Aspired Version through a callback.
  3. Based on the Version Policy, the Manager decides what to do next, which may be unloading a previous Version or loading a new one.
  4. If the Manager decides the operation is safe, it gives the Loader the required resources and tells the Loader to load the new Version.
  5. A client asks the Manager for a Servable, either specifying a version or requesting the latest one, and the Manager returns a handle to it.

Installing TF-Serving

I recommend installing into an isolated virtual environment; the steps below were run with Python on Ubuntu 16.10:

 echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
 sudo apt-get update && sudo apt install curl tensorflow-model-server

The next step is to take a model you have already trained and put it under service management so it can be published:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Input

inception_model = InceptionV3(weights='imagenet', input_tensor=Input(shape=(224, 224, 3)))
inception_model.save('inception.h5')

Our Inception model is now saved in Keras (HDF5) format; the next step is to export it as a SavedModel, the format the TensorFlow server can load. The full walkthrough is at https://github.com/himanshurawlani/keras-and-tensorflow-serving; in short, once the model is built/compiled you call tf.saved_model.simple_save() on the Keras session, as sketched below.
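
A minimal export sketch, assuming the TF 1.x APIs used in this article (the input key input_image must match what the client sends later; the export path image_classifier/1 and its version sub-directory are just examples):

import tensorflow as tf
from keras import backend as K

# export the Keras model as a SavedModel that tensorflow_model_server can load;
# TF Serving expects a numeric version sub-directory (here "1") under the model directory
tf.saved_model.simple_save(
    K.get_session(),
    'image_classifier/1',
    inputs={'input_image': inception_model.input},
    outputs={t.name: t for t in inception_model.outputs})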

Starting the TensorFlow Serving service

tensorflow_model_server --model_base_path=/modelpath/image_classifier --rest_api_port=9000 --model_name=ImageClassifier
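
Before sending predictions you can check that the model actually loaded, via the REST model-status endpoint (a quick sketch; it assumes the server above is listening on port 9000 and that your tensorflow_model_server is new enough to expose the status API):

import requests

# query TF Serving's model status endpoint; a healthy model reports state "AVAILABLE"
status = requests.get('http://localhost:9000/v1/models/ImageClassifier')
print(status.json())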

Testing the TensorFlow Serving service

import argparse
import json

import numpy as np
import requests
from keras.applications import inception_v3
from keras.preprocessing import image

# Argument parser for giving input image_path from command line
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path of the image")
args = vars(ap.parse_args())

image_path = args['image']
# Preprocessing our input image
img = image.img_to_array(image.load_img(image_path, target_size=(224, 224))) / 255.

# this line is added because of a bug in tf_serving(1.10.0-dev)
img = img.astype('float16')

payload = {
    "instances": [{'input_image': img.tolist()}]
}

# sending post request to TensorFlow Serving server
r = requests.post('http://localhost:9000/v1/models/ImageClassifier:predict', json=payload)
pred = json.loads(r.content.decode('utf-8'))

# Decoding the response
# decode_predictions(preds, top=5) by default gives top 5 results
# You can pass "top=10" to get top 10 predictions
print(json.dumps(inception_v3.decode_predictions(np.array(pred['predictions']))[0]))

The output is:

python serving_sample_request.py -i ../test_images/car.png
Using TensorFlow backend.
[["n04285008", "sports_car", 0.998414], ["n04037443", "racer", 0.00140099], ["n03459775", "grille", 0.000160794], ["n02974003", "car_wheel", 9.57862e-06], ["n03100240", "convertible", 6.01581e-06]]

What if there are multiple TF models, for example model1, model2, and model3?

Multi-model deployment

Create a file named models.config with the following content:

model_config_list:{
    config:{
      name:"model1",
      base_path:"/models/model1",
      model_platform:"tensorflow"
    },
    config:{
      name:"model2",
      base_path:"/models/model2",
      model_platform:"tensorflow"
    },
    config:{
      name:"model3",
      base_path:"/models/model3",
      model_platform:"tensorflow"
    } 
}

Once the file is created, point the server at it when starting up (the REST port 8501 below matches the client code that follows):

tensorflow_model_server --rest_api_port=8501 --model_config_file=/models/models.config

If a model has several versions, you can also request a specific one by putting the version number in the URL passed to the server API.
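
Note that by default tensorflow_model_server only serves the latest version it finds under base_path. To keep an older version (such as 100003 in the commented-out URL below) addressable, you can add a model_version_policy to that model's entry in the config file; a sketch, assuming a server version that supports this field:

config:{
  name:"model1",
  base_path:"/models/model1",
  model_platform:"tensorflow",
  model_version_policy:{specific:{versions:100003}}
}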

import requests 
import numpy as np 
SERVER_URL = 'http://localhost:8501/v1/models/model3:predict'  
# Note: 'model3' in SERVER_URL is the model name defined in the config file, not the directory name
# to request a specific version instead:
#SERVER_URL = 'http://localhost:8501/v1/models/model1/versions/100003:predict'

def prediction(): 
    predict_request='{"instances":%s}' % str([[[10]*7]*7]) 
    print(predict_request) 
    response = requests.post(SERVER_URL, data=predict_request) 
    print(response)
    prediction = response.json()['predictions'][0] 
    print(prediction) 


prediction()

Deploying multiple TF models with Flask

We first define and save the first model:

import tensorflow as tf
import numpy as np

def model_a():
	"""
	model placeholder : an actual model shall be defined like some CNN/LSTM etc.
	But for the purpose of demonstration, 
	all we need is a simple Graph with Variable in it
	"""
	#some supposed input
	A = tf.placeholder(tf.float32, shape=[10, 10])
	#"weights" of the model
	B = tf.Variable(tf.random_uniform(A.shape.as_list()), name="my_model_a_weight")
	#model (haha)
	C = tf.matmul(A, B)	
	
	#define a dummy class for convenience
	class Model():
		pass
	
	#fake model instance for easy referencing
	model = Model()
	
	#give input to the Graph at this node
	model.input_placeholder = A
	
	#get output from the Graph at this node
	model.output_node = C
	return model
if __name__ == '__main__':
	#build the graph and save a checkpoint only when this file is run as a script,
	#so that importing model_a from model_wrappers.py does not overwrite the checkpoint
	A = model_a()

	saver = tf.train.Saver()
	sess = tf.Session()
	sess.run(tf.global_variables_initializer())

	#train here to change weights or however
	#training_model()
	#once the model is trained, or not, save the session
	saver.save(sess, "model1.ckpt")

Next we define and save the second model:

import tensorflow as tf
import numpy as np

def model_b():
	"""
	model placeholder : an actual model shall be defined, like some CNN/LSTM etc.
	But for the purpose of demonstration, 
	all we need is a simple Graph with Variable in it
	"""
	#some supposed input
	P = tf.placeholder(tf.float32, shape=(4,4))
	#"weights" of the model
	Q = tf.Variable(tf.constant(np.arange(16, dtype=np.float32).reshape(-1, 4)))
	#model (haha)
	R = tf.add(P, Q)
	
	#define a dummy class for convenience
	class Model():
		pass
	
	#fake model instance for easy referencing
	model = Model()
	
	#give input to the Graph at this node
	model.input_placeholder = P
	
	#get output from the Graph at this node
	model.output_node = R
	return model

if __name__ == '__main__':
	#build the graph and save a checkpoint only when this file is run as a script,
	#so that importing model_b from model_wrappers.py does not overwrite the checkpoint
	A = model_b()

	saver = tf.train.Saver()
	sess = tf.Session()
	sess.run(tf.global_variables_initializer())

	#train here to change weights or however
	#training_model()
	#once the model is trained, or not, save the session
	saver.save(sess, "model2.ckpt")

Now that both models are defined and saved, we need a wrapper class that keeps each model's Graph and Session objects together.

import tensorflow as tf
import numpy as np

from model_a import model_a
from model_b import model_b

class ModelA():
	"""
	This is a wrapper class over all the complexity that one comes across.
	Principle : 
	<START>
		Define graph
		Add nodes
		Assign session to the graph
		Restore sess from saved ckpt
	<DONE>
	"""
	def __init__(self):
		#defines graph
		self.graph = tf.Graph()
		#following step is very important
		with self.graph.as_default():
			"""
			consider whatever the 'graph' has as the default graph
			within the lifetime of the instance of this class
			"""
			#add nodes
			self.model = model_a()
			#assign sess
			self.sess = tf.Session()
			#saver
			self.saver = tf.train.Saver()
			#restore the model
			self.saver.restore(self.sess, "model1.ckpt")
	
	def predict(self, input_vec):
		result = self.sess.run(
			self.model.output_node,
			feed_dict={
				self.model.input_placeholder:input_vec
			}
		)
		return result

	
class ModelB():
	"""
	This is a wrapper class over all the complexity that one comes across.
	Principle : 
	<START>
		Define graph
		Add nodes
		Assign session to the graph
		Restore sess from saved ckpt
	<DONE>
	"""
	def __init__(self):
		#defines graph
		self.graph = tf.Graph()
		#following step is very important
		with self.graph.as_default():
			"""
			consider whatever the 'graph' has as the default graph
			within the lifetime of the instance of this class
			"""
			#add nodes
			self.model = model_b()
			#assign sess
			self.sess = tf.Session()
			#saver
			self.saver = tf.train.Saver()
			#restore the model
			self.saver.restore(self.sess, "model2.ckpt")
	
	def predict(self, input_vec):
		result = self.sess.run(
			self.model.output_node,
			feed_dict={
				self.model.input_placeholder:input_vec
			}
		)
		return result

Next we create a Flask app that exposes the models through a RESTful API:

import tensorflow as tf
import numpy as np

from flask import Flask, jsonify, request, make_response
from flask_cors import CORS

from model_wrappers import ModelA, ModelB
a = ModelA()
b = ModelB()

app = Flask(__name__)
CORS(app)

def featurize1(*args):
    """
    Dummy featurizer for model_a
    """
    model_size = (10, 10)
    return np.arange(100, dtype=np.float32).reshape(model_size)

def featurize2(*args):
    """
    Dummy featurizer for model_b
    """
    model_size = (4, 4)
    return np.random.random(model_size)


@app.route('/get_my_predictions', methods=['POST'])
def get_predictions():
    print(request.json)
    text = request.json.get('text')
    vec1 = featurize1(text)
    vec2 = featurize2(text)
    resp_a = a.predict(vec1).tolist()
    resp_b = b.predict(vec2).tolist()

    return jsonify({"Response":[{"model_a": resp_a}, {"model_b":resp_b}]})

if __name__ == '__main__':
    app.debug = True
    app.run(host='0.0.0.0',threaded=True)
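
To exercise the combined endpoint, a request sketch assuming the app is running locally on Flask's default port 5000 (the 'text' value is arbitrary here because the dummy featurizers ignore their input):

import requests

# POST to the route defined above; the response bundles both models' outputs
resp = requests.post('http://localhost:5000/get_my_predictions',
                     json={'text': 'any input'})
print(resp.json())  # {"Response": [{"model_a": [...]}, {"model_b": [...]}]}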

Comparison

TensorFlow Serving and Flask each have their strengths: for large-scale cloud AI services, TensorFlow Serving is the better fit; for a single simple model, Flask is quick and straightforward.

References

  1. Flask Deploy Multi models
  2. TF-Serving models