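Cross-language serialization with protobuf: Java and Python examples
First, download and install protoc.
On OS X it can be installed directly with Homebrew: brew install protobuf
After installation, check the version with protoc --version.
Creating the proto files: version using the Any type
With Any, only the standard _pb2 Python files could be generated, not the Python 3 variant used later in this post; I have not found a workaround yet. Note that the 2 in _pb2 refers to the version of protobuf's generated Python API, not to Python 2: the _pb2 modules below run under Python 3.
An Any field gives Java something like generics: the field can hold a message of any type.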
MessageDto.proto
syntax = "proto3";
import "google/protobuf/any.proto";
message MessageDto {
string action=1;
int32 state=2;
google.protobuf.Any data=3;
}
RpcCmd.proto
syntax = "proto3";
import "MessageDto.proto";
message RpcCmd {
MessageDto message=1;
string randomKey=2;
string remoteAddressKey=3;
}
Point2PointMessage.proto
syntax = "proto3";
message Point2PointMessage {
string targetAddressKey = 1;
string message = 2;
}
BytesData.proto
syntax = "proto3";
message BytesData {
bytes content=1;
}
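Each field number in these definitions reaches the wire inside a key, computed as (field_number << 3) | wire_type, where wire type 0 is a varint (int32) and 2 is length-delimited (string, bytes, and nested messages, Any included). The plain-Python sketch below needs no protobuf library and computes the keys for MessageDto; `field_key` is my own helper name. A single key byte covers field numbers 1 to 15.

```python
# Wire types defined by the protobuf encoding
VARINT = 0            # int32, int64, bool, enum
LENGTH_DELIMITED = 2  # string, bytes, embedded messages (Any included)


def field_key(field_number: int, wire_type: int) -> int:
    """Key written before each field; fits in one byte for field numbers 1-15."""
    return (field_number << 3) | wire_type


# MessageDto: action = 1 (string), state = 2 (int32), data = 3 (Any)
keys = {
    "action": field_key(1, LENGTH_DELIMITED),
    "state": field_key(2, VARINT),
    "data": field_key(3, LENGTH_DELIMITED),
}
for name, key in keys.items():
    print(f"{name:<6} 0x{key:02x} {bytes([key])!r}")
```

It prints 0x0a, 0x10 and 0x1a, which appear in the dumps below as `\n`, `\x10` and `\x1a` (a length-delimited field 2 such as randomKey gets `\x12`). Read this way, `\x10d` is field 2 (state) as a varint with value 100, since 0x64 is ASCII `d`.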
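Generate the corresponding class files
Python:
protoc --python_out=./gen_pb2 RpcCmd.proto MessageDto.proto Point2PointMessage.proto BytesData.proto
The generated files are named XXX_pb2.py.
Java:
protoc --java_out=./gen_java RpcCmd.proto MessageDto.proto Point2PointMessage.proto BytesData.proto
The generated files are named XXXOuterClass.java (protoc appends OuterClass because each file contains a message with the same name as the file).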
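Using it in Python
First copy the generated files into a package of your choice, then fix the import paths inside them; for example, in RpcCmd_pb2.py change the MessageDto import to:
import com.tony.proto.py2.MessageDto_pb2 as MessageDto__pb2
Then use them: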
from com.tony.proto.py2 import RpcCmd_pb2, Point2PointMessage_pb2, BytesData_pb2, MessageDto_pb2


def serialize_to_file(file_path):
    p2p_msg = Point2PointMessage_pb2.Point2PointMessage()
    p2p_msg.message = "Hello, p2p from python"
    p2p_msg.targetAddressKey = "/127.0.0.1:38211"
    # bytes_data = BytesData_pb2.BytesData()
    # bytes_data.content = b"Hello, bytes data from python"

    rpc_cmd = RpcCmd_pb2.RpcCmd()
    rpc_cmd.randomKey = "random-key-key-random"
    rpc_cmd.remoteAddressKey = "/127.0.0.1:1234"
    rpc_cmd.message.action = "p2p"
    rpc_cmd.message.state = 100
    # Any.Pack stores the type URL together with the serialized p2p_msg
    rpc_cmd.message.data.Pack(p2p_msg)
    # rpc_cmd.message.data.Pack(bytes_data)

    bytes_write = rpc_cmd.SerializeToString()
    with open(file_path, mode="wb") as fw:
        fw.write(bytes_write)
    print("write bytes to file:", bytes_write)


def deserialize_from_file(file_path):
    with open(file_path, mode="rb") as fo:
        bytes_read = fo.read()
    print("read bytes from file:", bytes_read)

    rpc_cmd = RpcCmd_pb2.RpcCmd()
    rpc_cmd.ParseFromString(bytes_read)
    print(rpc_cmd)

    p2p_msg = Point2PointMessage_pb2.Point2PointMessage()
    # bytes_data = BytesData_pb2.BytesData()
    # rpc_cmd.message.data.Unpack(bytes_data)
    # Unpack returns False if the Any holds a different message type
    rpc_cmd.message.data.Unpack(p2p_msg)
    print("msg_content:", p2p_msg.message)
    print("msg_target:", p2p_msg.targetAddressKey)
    # print("bytes_data:", str(bytes_data.content, 'utf-8'))


if __name__ == "__main__":
    # relative path: writing to the filesystem root usually needs extra permissions
    serialize_file_path = "./trans-data-pb2.dat"
    serialize_to_file(serialize_file_path)
    deserialize_from_file(serialize_file_path)
Output below. To test BytesData serialization and deserialization instead, uncomment the bytes_data lines and comment out the p2p_msg ones.
write bytes to file: b'\n]\n\x03p2p\x10d\x1aT\n&type.googleapis.com/Point2PointMessage\x12*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234'
read bytes from file: b'\n]\n\x03p2p\x10d\x1aT\n&type.googleapis.com/Point2PointMessage\x12*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234'
message {
action: "p2p"
state: 100
data {
type_url: "type.googleapis.com/Point2PointMessage"
value: "\n\020/127.0.0.1:38211\022\026Hello, p2p from python"
}
}
randomKey: "random-key-key-random"
remoteAddressKey: "/127.0.0.1:1234"
msg_content: Hello, p2p from python
msg_target: /127.0.0.1:38211
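To see what Any adds on the wire, the sketch below rebuilds the bytes logged above by hand, without any protobuf library. An Any is itself a message with two fields, `type_url = 1` (string) and `value = 2` (bytes holding the serialized inner message), and the default type URL is `type.googleapis.com/` followed by the full message name; since these .proto files declare no package, that name is just Point2PointMessage. The helpers `varint`, `len_field` and `varint_field` are my own names.

```python
def varint(n: int) -> bytes:
    """Base-128 varint encoding of a non-negative integer."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def len_field(number: int, payload: bytes) -> bytes:
    """Length-delimited field (wire type 2): key, length, payload."""
    return varint(number << 3 | 2) + varint(len(payload)) + payload


def varint_field(number: int, value: int) -> bytes:
    """Varint field (wire type 0): key, value."""
    return varint(number << 3 | 0) + varint(value)


# Point2PointMessage { targetAddressKey = 1; message = 2; }
p2p = len_field(1, b"/127.0.0.1:38211") + len_field(2, b"Hello, p2p from python")

# google.protobuf.Any { type_url = 1; value = 2; } wrapping the serialized p2p message
any_data = len_field(1, b"type.googleapis.com/Point2PointMessage") + len_field(2, p2p)

# MessageDto { action = 1; state = 2; data = 3; }
message = len_field(1, b"p2p") + varint_field(2, 100) + len_field(3, any_data)

# RpcCmd { message = 1; randomKey = 2; remoteAddressKey = 3; }
rpc_cmd = (len_field(1, message)
           + len_field(2, b"random-key-key-random")
           + len_field(3, b"/127.0.0.1:1234"))

print(len(rpc_cmd), rpc_cmd)
```

The printed bytes are identical to the "write bytes to file" line above. The type_url field alone takes 40 of the 135 bytes, which is the price Any pays for carrying its type name.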
Using it in Java
Likewise, copy the generated classes into a package of your choice and adjust the package names.
import com.google.protobuf.Any;
import com.google.protobuf.ByteString;
import lombok.extern.slf4j.Slf4j;
import org.junit.Test;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

@Slf4j
public class ProtobufSerializeDemo {
    @Test
    public void serializeToFile() throws Exception {
        Point2PointMessageOuterClass.Point2PointMessage.Builder p2pMsgBuilder = Point2PointMessageOuterClass.Point2PointMessage.newBuilder();
        p2pMsgBuilder.setTargetAddressKey("/127.0.0.1:1233");
        p2pMsgBuilder.setMessage("hello from java");
        // BytesDataOuterClass.BytesData.Builder bytesBuilder = BytesDataOuterClass.BytesData.newBuilder();
        // bytesBuilder.setContent(ByteString.copyFrom("bytes data from java".getBytes(StandardCharsets.UTF_8)));
        MessageDtoOuterClass.MessageDto.Builder messageBuilder = MessageDtoOuterClass.MessageDto.newBuilder();
        messageBuilder.setAction("p2p");
        messageBuilder.setState(100);
        // Any.pack records the type URL together with the serialized message
        messageBuilder.setData(Any.pack(p2pMsgBuilder.build()));
        // messageBuilder.setData(Any.pack(bytesBuilder.build()));
        RpcCmdOuterClass.RpcCmd.Builder builder = RpcCmdOuterClass.RpcCmd.newBuilder();
        builder.setRandomKey("RANDOM_KEY_JAVA");
        builder.setRemoteAddressKey("/127.0.0.1:1234");
        builder.setMessage(messageBuilder.build());
        // try-with-resources closes the file after writing
        try (OutputStream out = new FileOutputStream("java_protobuf.dat")) {
            builder.build().writeTo(out);
        }
    }

    @Test
    public void deserializeFromFile() throws Exception {
        try (InputStream in = new FileInputStream("java_protobuf.dat")) {
            RpcCmdOuterClass.RpcCmd rpcCmd = RpcCmdOuterClass.RpcCmd.parseFrom(in);
            // unpack throws InvalidProtocolBufferException if the Any holds a different type
            Point2PointMessageOuterClass.Point2PointMessage p2pMsg = rpcCmd.getMessage().getData().unpack(Point2PointMessageOuterClass.Point2PointMessage.class);
            // BytesDataOuterClass.BytesData bytesData = rpcCmd.getMessage().getData().unpack(BytesDataOuterClass.BytesData.class);
            log.info("deserialize rpcCmd: \n{}", rpcCmd);
            log.info("deserialize p2pMsg: \n{}", p2pMsg);
            // log.info("deserialize bytesData: \n{}", bytesData);
        }
    }
}
Output below. Likewise, to test BytesData serialization and deserialization, uncomment the BytesData lines and comment out the Point2PointMessage ones.
10:22:03.118 [main] INFO com.tony.proto.ProtobufSerializeDemo - deserialize rpcCmd:
message {
action: "p2p"
state: 100
data {
type_url: "type.googleapis.com/Point2PointMessage"
value: "\n\017/127.0.0.1:1233\022\017hello from java"
}
}
randomKey: "RANDOM_KEY_JAVA"
remoteAddressKey: "/127.0.0.1:1234"
10:22:03.168 [main] INFO com.tony.proto.ProtobufSerializeDemo - deserialize p2pMsg:
targetAddressKey: "/127.0.0.1:1233"
message: "hello from java"
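Next, serializing in one language and deserializing in the other.
To test, just change the file paths so that each side reads the file written by the other.
Python deserializing Java output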
java_serialize_file_path = $path_to_java_serialized$
deserialize_from_file(java_serialize_file_path)
Output; this run uses the BytesData variant:
read bytes from file: b'\n@\n\x03p2p\x10d\x1a7\n\x1dtype.googleapis.com/BytesData\x12\x16\n\x14bytes data from java\x12\x0fRANDOM_KEY_JAVA\x1a\x0f/127.0.0.1:1234'
message {
action: "p2p"
state: 100
data {
type_url: "type.googleapis.com/BytesData"
value: "\n\024bytes data from java"
}
}
randomKey: "RANDOM_KEY_JAVA"
remoteAddressKey: "/127.0.0.1:1234"
bytes_data: bytes data from java
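When the receiver does not know in advance which type an Any holds, it can read type_url first and pick the class from there. The sketch below is a minimal wire-format walker (varint and length-delimited fields only, which is all these messages use) applied to the Java-produced bytes above; `read_varint` and `parse_fields` are illustrative helpers, not part of any protobuf API. The name after the last `/` of type_url tells which generated class to pass to Unpack or unpack.

```python
from __future__ import annotations


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> dict[int, bytes | int]:
    """Map field number -> value for one message level (last occurrence wins).

    Only wire types 0 (varint) and 2 (length-delimited) are handled.
    """
    fields: dict[int, bytes | int] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            fields[number], pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            fields[number] = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


java_bytes = (b'\n@\n\x03p2p\x10d\x1a7\n\x1dtype.googleapis.com/BytesData'
              b'\x12\x16\n\x14bytes data from java'
              b'\x12\x0fRANDOM_KEY_JAVA\x1a\x0f/127.0.0.1:1234')

rpc_cmd = parse_fields(java_bytes)      # RpcCmd
message = parse_fields(rpc_cmd[1])      # MessageDto
any_data = parse_fields(message[3])     # google.protobuf.Any
type_name = any_data[1].decode().rsplit("/", 1)[-1]
payload = parse_fields(any_data[2])     # BytesData, chosen from type_name
print(type_name, payload[1])
```

It prints `BytesData b'bytes data from java'`. The generated classes do the same walk internally; the point here is only that the type information travels inside the message itself.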
Java deserializing Python output
@Test
public void deserializeFromPythonFile() throws Exception {
    try (InputStream in = new FileInputStream($path_to_python_serialize$)) {
        RpcCmdOuterClass.RpcCmd rpcCmd = RpcCmdOuterClass.RpcCmd.parseFrom(in);
        // Point2PointMessageOuterClass.Point2PointMessage p2pMsg = rpcCmd.getMessage().getData().unpack(Point2PointMessageOuterClass.Point2PointMessage.class);
        BytesDataOuterClass.BytesData bytesData = rpcCmd.getMessage().getData().unpack(BytesDataOuterClass.BytesData.class);
        log.info("deserialize rpcCmd: \n{}", rpcCmd);
        // log.info("deserialize p2pMsg: \n{}", p2pMsg);
        log.info("deserialize bytesData: \n{}", bytesData);
    }
}
Output, again with the BytesData variant:
10:33:03.360 [main] INFO com.tony.proto.ProtobufSerializeDemo - deserialize rpcCmd:
message {
action: "p2p"
state: 100
data {
type_url: "type.googleapis.com/BytesData"
value: "\n\035Hello, bytes data from python"
}
}
randomKey: "random-key-key-random"
remoteAddressKey: "/127.0.0.1:1234"
10:33:03.402 [main] INFO com.tony.proto.ProtobufSerializeDemo - deserialize bytesData:
content: "Hello, bytes data from python"
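On the Java platform there is an easier tool that needs no hand-written proto files:
io.protostuff
Add the dependencies with Maven:
<dependency>
    <groupId>io.protostuff</groupId>
    <artifactId>protostuff-core</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>io.protostuff</groupId>
    <artifactId>protostuff-runtime</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.objenesis</groupId>
    <artifactId>objenesis</artifactId>
    <version>2.2</version>
</dependency>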
Create a serializer utility class
import io.protostuff.LinkedBuffer;
import io.protostuff.ProtobufIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.DefaultIdStrategy;
import io.protostuff.runtime.RuntimeSchema;
import lombok.extern.slf4j.Slf4j;
import org.objenesis.Objenesis;
import org.objenesis.ObjenesisStd;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
/**
 * Serializer built on Protostuff's ProtobufIOUtil, which writes the protobuf wire format,
 * so in principle the output can be read from other languages.
 *
 * @author jiangwenjie 2019/10/30
 */
@Slf4j
public class ProtobufSerializer {
    // Objenesis instantiates classes without calling their constructors
    private final static Objenesis OBJENESIS = new ObjenesisStd(true);

    private ProtobufSerializer() {
    }

    private static class SingletonHolder {
        final static ProtobufSerializer INSTANCE = new ProtobufSerializer();
    }

    public static ProtobufSerializer getInstance() {
        return ProtobufSerializer.SingletonHolder.INSTANCE;
    }

    @SuppressWarnings("unchecked")
    public <T> void serialize(T obj, OutputStream outputStream) {
        Class<T> clz = (Class<T>) obj.getClass();
        LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
        try {
            Schema<T> schema = getSchema(clz);
            ProtobufIOUtil.writeTo(outputStream, obj, schema, buffer);
        } catch (IOException e) {
            log.error("Failed to serialize object", e);
        } finally {
            buffer.clear();
        }
    }

    @SuppressWarnings("unchecked")
    public <T> byte[] serialize(T obj) {
        Class<T> clz = (Class<T>) obj.getClass();
        LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
        try (ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream()) {
            Schema<T> schema = getSchema(clz);
            ProtobufIOUtil.writeTo(arrayOutputStream, obj, schema, buffer);
            return arrayOutputStream.toByteArray();
        } catch (IOException e) {
            log.error("Failed to serialize object", e);
        } finally {
            buffer.clear();
        }
        return new byte[0];
    }

    public <T> T deSerialize(InputStream inputStream, Class<T> clazz) {
        T object = OBJENESIS.newInstance(clazz);
        Schema<T> schema = getSchema(clazz);
        try {
            ProtobufIOUtil.mergeFrom(inputStream, object, schema);
            return object;
        } catch (IOException e) {
            log.error("Failed to deserialize object", e);
        }
        return null;
    }

    public <T> T deSerialize(byte[] param, Class<T> clazz) {
        T object = OBJENESIS.newInstance(clazz);
        Schema<T> schema = getSchema(clazz);
        try (ByteArrayInputStream inputStream = new ByteArrayInputStream(param)) {
            ProtobufIOUtil.mergeFrom(inputStream, object, schema);
            return object;
        } catch (IOException e) {
            log.error("Failed to deserialize object", e);
        }
        return null;
    }

    private <T> Schema<T> getSchema(Class<T> clz) {
        // Builds a fresh schema on every call; RuntimeSchema.getSchema(clz) would cache it per class
        return RuntimeSchema.createFrom(clz, new DefaultIdStrategy());
    }
}
Create the classes to serialize
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import java.io.Serializable;
/**
 * @author jiangwenjie 2019/10/22
 */
@Data
@Slf4j
public class RpcCmd implements Serializable {
    private MessageDto message;
    private String randomKey;
    /**
     * Target address; transient, so it is not serialized or transmitted
     */
    private transient String remoteAddressKey;
}
import com.tony.serializer.impl.ProtobufSerializer;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.ToString;
import lombok.extern.slf4j.Slf4j;
import java.io.Serializable;
/**
 * Message object
 *
 * @author jiangwenjie 2019/10/22
 */
@Slf4j
@Data
@ToString
@EqualsAndHashCode
public class MessageDto implements Serializable {
    private String action;
    private int state = 100;
    /**
     * For cross-language use: complex objects are carried as bytes produced by
     * Protostuff's protobuf serialization
     */
    private byte[] bytesData;
    private Serializable serialData;

    public <T> T dataOfClazz(Class<T> clazz, boolean isStuff) {
        if (isStuff) {
            return serialDataOfClazz(clazz);
        } else {
            return bytesDataOfClass(clazz);
        }
    }

    public <T extends Serializable> void setData(T object, boolean isStuff) {
        if (isStuff) {
            // Java-only: keep the object as-is
            setSerialData(object);
        } else {
            // cross-language: serialize the object a second time into bytes
            setBytesData(object);
        }
    }

    @SuppressWarnings("unchecked")
    private <T> T serialDataOfClazz(Class<T> clazz) {
        if (serialData == null) {
            return null;
        }
        if (clazz.isInstance(serialData)) {
            return (T) serialData;
        } else {
            throw new IllegalArgumentException("data is not instance of class:" + clazz.getName());
        }
    }

    private <T> T bytesDataOfClass(Class<T> clazz) {
        if (bytesData == null) {
            return null;
        }
        try {
            return ProtobufSerializer.getInstance().deSerialize(bytesData, clazz);
        } catch (Exception e) {
            log.error("Failed to deserialize data; check that it is of type {}", clazz, e);
        }
        return null;
    }

    private <T> void setBytesData(T data) {
        this.bytesData = ProtobufSerializer.getInstance().serialize(data);
    }
}
import lombok.Data;
import java.io.Serializable;
/**
 * Data object for point-to-point messages
 *
 * @author jiangwenjie 2019/10/26
 */
@Data
public class Point2PointMessage implements Serializable {
    private String targetAddressKey;
    private String message;
}
Serialization test
import com.tony.serializer.impl.ProtobufSerializer;
import lombok.extern.slf4j.Slf4j;
import org.junit.Test;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
/**
 * @author jiangwenjie 2019/11/1
 */
@Slf4j
public class JavaProtostuffSerializeDemo {
    @Test
    public void serializeToFile() throws Exception {
        Point2PointMessage p2pMsg = new Point2PointMessage();
        p2pMsg.setTargetAddressKey("/127.0.0.1:1233");
        p2pMsg.setMessage("message from java");
        MessageDto messageDto = new MessageDto();
        messageDto.setAction("p2p");
        messageDto.setState(100);
        // false: store p2pMsg as protobuf bytes so other languages can read it
        messageDto.setData(p2pMsg, false);
        RpcCmd rpcCmd = new RpcCmd();
        rpcCmd.setMessage(messageDto);
        rpcCmd.setRandomKey("RANDOM_KEY_JAVA");
        // transient field: not written to the output
        rpcCmd.setRemoteAddressKey("/127.0.0.1:1234");
        try (OutputStream out = new FileOutputStream("java_proto_simple.dat")) {
            ProtobufSerializer.getInstance().serialize(rpcCmd, out);
        }
    }

    @Test
    public void deserializeFromFile() throws Exception {
        try (InputStream in = new FileInputStream("java_proto_simple.dat")) {
            RpcCmd rpcCmd = ProtobufSerializer.getInstance().deSerialize(in, RpcCmd.class);
            log.info("deserialize cmd:\n{}", rpcCmd);
            log.info("deserialize p2p msg:\n{}", rpcCmd.getMessage().dataOfClazz(Point2PointMessage.class, false));
        }
    }
}
Test output
11:02:45.646 [main] INFO com.tony.simple.JavaProtostuffSerializeDemo - deserialize cmd:
RpcCmd(message=MessageDto(action=p2p, state=100, bytesData=[10, 15, 47, 49, 50, 55, 46, 48, 46, 48, 46, 49, 58, 49, 50, 51, 51, 18, 17, 109, 101, 115, 115, 97, 103, 101, 32, 102, 114, 111, 109, 32, 106, 97, 118, 97], serialData=null, isFromBuff=false), randomKey=RANDOM_KEY_JAVA, remoteAddressKey=null)
11:02:45.651 [main] INFO com.tony.simple.JavaProtostuffSerializeDemo - deserialize p2p msg:
Point2PointMessage(targetAddressKey=/127.0.0.1:1233, message=message from java)
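Because the Java side has no .proto file, the field numbers come from the classes themselves. My understanding of protostuff's RuntimeSchema default is: fields are numbered 1, 2, 3, ... in the order reflection returns them (source order in practice, although Class.getDeclaredFields does not guarantee any order), and static and transient fields are skipped, which matches remoteAddressKey=null in the output above. The Python sketch below models that assumed rule; `Field` and `assign_numbers` are hypothetical names, and protostuff's `io.protostuff.Tag` annotation can pin numbers explicitly instead.

```python
from __future__ import annotations

from typing import NamedTuple


class Field(NamedTuple):
    name: str
    transient: bool = False


def assign_numbers(fields: list[Field]) -> dict[str, int]:
    """Hypothetical model of the default rule: number the non-transient
    fields 1, 2, 3, ... in declaration order."""
    numbers: dict[str, int] = {}
    for field in fields:
        if field.transient:
            continue  # transient fields get no number and are never written
        numbers[field.name] = len(numbers) + 1
    return numbers


# Field declarations of the Java classes in this post, in source order
RPC_CMD = [Field("message"), Field("randomKey"), Field("remoteAddressKey", transient=True)]
MESSAGE_DTO = [Field("action"), Field("state"), Field("bytesData"), Field("serialData")]
POINT_2_POINT = [Field("targetAddressKey"), Field("message")]

for class_name, fields in [("RpcCmd", RPC_CMD), ("MessageDto", MESSAGE_DTO),
                           ("Point2PointMessage", POINT_2_POINT)]:
    print(class_name, assign_numbers(fields))
```

These are the numbers the hand-written .proto files for Python must use: bytesData is 3 (MessageDto.data = 3), randomKey is 2, and the bytesData array printed above starts with 10 and 18 further on, the keys of fields 1 and 2 of Point2PointMessage. serialData (4) is left out of the Python proto, and a null field is not written at all. Adding or reordering a field in a Java class shifts these numbers, so the .proto files have to change with it.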
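The data in MessageDto can hold any type:
/**
 * For cross-language use: complex objects are carried as bytes produced by
 * Protostuff's protobuf serialization
 */
private byte[] bytesData;
private Serializable serialData;
When the data never has to cross a language boundary, the Serializable field can be used directly. Otherwise the data must be kept in a byte array, which means serializing it a second time on its own and deserializing it again on read. For the Java-only case you can also replace ProtobufIOUtil with ProtostuffIOUtil in the ProtobufSerializer class.
setData takes the matching path (isStuff = true keeps the object in serialData, false serializes it into bytesData):
public <T extends Serializable> void setData(T object, boolean isStuff) {
    if (isStuff) {
        setSerialData(object);
    } else {
        setBytesData(object);
    }
}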
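Reading the byte-array variant means decoding twice: first the MessageDto envelope, then data, using whatever type the business context (here the action string) implies. A minimal Python sketch of that second step, with a hand-rolled wire-format reader and a dataclass standing in for a generated class; `DECODERS`, `decode_p2p` and `data_of` are hypothetical names, not protostuff or protobuf API. The input is the MessageDto portion of the bytes written by the Java test above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> dict[int, bytes | int]:
    """Field number -> value, for varint and length-delimited fields only."""
    fields: dict[int, bytes | int] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        wire_type = key & 0x07
        if wire_type == 0:
            fields[key >> 3], pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            fields[key >> 3] = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


@dataclass
class Point2PointMessage:  # stand-in for a generated class
    target_address_key: str
    message: str


def decode_p2p(data: bytes) -> Point2PointMessage:
    f = parse_fields(data)
    # proto3 omits empty strings, so fall back to b""
    return Point2PointMessage(f.get(1, b"").decode(), f.get(2, b"").decode())


# Hypothetical registry: action -> decoder for MessageDto.data.
# A new payload type only needs a new entry; MessageDto itself stays unchanged.
DECODERS: dict[str, Callable[[bytes], object]] = {"p2p": decode_p2p}


def data_of(message_dto: bytes) -> object:
    """First pass: the MessageDto envelope. Second pass: data, chosen by action."""
    f = parse_fields(message_dto)
    action = f.get(1, b"").decode()
    return DECODERS[action](f.get(3, b""))


envelope = b'\n\x03p2p\x10d\x1a$\n\x0f/127.0.0.1:1233\x12\x11message from java'
print(data_of(envelope))
```

It prints `Point2PointMessage(target_address_key='/127.0.0.1:1233', message='message from java')`. Keying the registry on action is one possible design; the data_of helper in the Python code that follows takes the target class as a parameter instead.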
Cross-language: deserializing in Python
Create the proto files
MessageDto.proto
syntax = "proto3";
message MessageDto {
string action=1;
int32 state=2;
bytes data=3;
}
RpcCmd.proto
syntax = "proto3";
import "MessageDto.proto";
message RpcCmd {
MessageDto message = 1;
string randomKey = 2;
string remoteAddressKey = 3;
}
Point2PointMessage.proto
syntax = "proto3";
message Point2PointMessage {
bytes java_class = 127;
string targetAddressKey = 1;
string message = 2;
}
BytesData.proto
syntax = "proto3";
message BytesData {
bytes content=1;
}
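Generate the Python 3 class files
Since Any is no longer used, the files can be generated with the Python 3 generator. Note that --python3_out is not a built-in protoc option: protoc looks for a plugin executable named protoc-gen-python3 on the PATH, so a third-party generator providing it must be installed first.
protoc --python3_out=./gen RpcCmd.proto MessageDto.proto Point2PointMessage.proto BytesData.proto
Compared with the _pb2 code, the serialization method names differ:
_pb2 uses SerializeToString and ParseFromString
the Python 3 generator uses encode_to_bytes and parse_from_bytes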
Using them in Python
As before, put the generated files in a package of your choice and fix the package names (not repeated here).
#!/usr/bin/python3
# -*-coding: utf-8 -*-
from com.tony.proto.py3 import RpcCmd, Point2PointMessage, MessageDto, BytesData


def serialize_to_file(file_path):
    p2p_msg = Point2PointMessage.Point2PointMessage()
    p2p_msg.message = "Hello, p2p from python"
    p2p_msg.targetAddressKey = "/127.0.0.1:38211"
    # bytes_data = BytesData.BytesData()
    # bytes_data.content = b"bytes data from python"

    rpc_cmd = RpcCmd.RpcCmd()
    rpc_cmd.randomKey = "random-key-key-random"
    rpc_cmd.remoteAddressKey = "/127.0.0.1:1234"
    rpc_cmd.message.action = "p2p"
    rpc_cmd.message.state = 100
    # data is a plain bytes field: store the already-serialized inner message
    rpc_cmd.message.data = p2p_msg.encode_to_bytes()
    # rpc_cmd.message.data = bytes_data.encode_to_bytes()

    bytes_write = rpc_cmd.encode_to_bytes()
    with open(file_path, mode="wb") as fw:
        fw.write(bytes_write)
    print("write bytes to file:", bytes_write)


def deserialize_from_file(file_path):
    with open(file_path, mode="rb") as fo:
        bytes_read = fo.read()
    print("read bytes from file:", bytes_read)

    rpc_cmd = RpcCmd.RpcCmd()
    rpc_cmd.parse_from_bytes(bytes_read)
    print(rpc_cmd)

    msg_bytes = rpc_cmd.message.data
    print("message bytes", msg_bytes)
    p2p_msg = data_of(rpc_cmd.message, Point2PointMessage.Point2PointMessage)
    # bytes_data = data_of(rpc_cmd.message, BytesData.BytesData)
    print("msg_content:", p2p_msg.message)
    print("msg_target:", p2p_msg.targetAddressKey)
    # print("bytes_data:", bytes_data.content)


def data_of(message: MessageDto.MessageDto, message_identify):
    """Parse message.data (bytes) into a new instance of the given generated class."""
    content = message_identify()
    content.parse_from_bytes(message.data)
    return content


if __name__ == "__main__":
    serialize_file_path = "./trans-data.dat"
    serialize_to_file(serialize_file_path)
    deserialize_from_file(serialize_file_path)
Output below; as before, uncomment the bytes_data lines and comment out the p2p_msg ones to test BytesData instead.
write bytes to file: b'\n3\n\x03p2p\x10d\x1a*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234'
read bytes from file: b'\n3\n\x03p2p\x10d\x1a*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234'
:
: p2p : 100 : b'\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python'
:
random-key-key-random
:
/127.0.0.1:1234
message bytes b'\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python'
msg_content: Hello, p2p from python
msg_target: /127.0.0.1:38211
Converting between Python and Java
Again, only the serialized file path needs to change.
Python deserializing Java output
java_serialize_file_path = $path_to_java_serialized$
deserialize_from_file(java_serialize_file_path)
Output
read bytes from file: b'\n-\n\x03p2p\x10d\x1a$\n\x0f/127.0.0.1:1233\x12\x11message from java\x12\x0fRANDOM_KEY_JAVA'
:
: p2p : 100 : b'\n\x0f/127.0.0.1:1233\x12\x11message from java'
:
RANDOM_KEY_JAVA
message bytes b'\n\x0f/127.0.0.1:1233\x12\x11message from java'
msg_content: message from java
msg_target: /127.0.0.1:1233
Java deserializing Python output
@Test
public void deserializeFromPythonFile() throws Exception {
    try (InputStream in = new FileInputStream($python_serialize_path$)) {
        RpcCmd rpcCmd = ProtobufSerializer.getInstance().deSerialize(in, RpcCmd.class);
        log.info("deserialize cmd:\n{}", rpcCmd);
        log.info("deserialize p2p msg:\n{}", rpcCmd.getMessage().dataOfClazz(Point2PointMessage.class, false));
    }
}
Output
13:15:17.821 [main] INFO com.tony.simple.JavaProtostuffSerializeDemo - deserialize cmd:
RpcCmd(message=MessageDto(action=p2p, state=100, bytesData=[10, 16, 47, 49, 50, 55, 46, 48, 46, 48, 46, 49, 58, 51, 56, 50, 49, 49, 18, 22, 72, 101, 108, 108, 111, 44, 32, 112, 50, 112, 32, 102, 114, 111, 109, 32, 112, 121, 116, 104, 111, 110], serialData=null), randomKey=random-key-key-random, remoteAddressKey=null)
13:15:17.828 [main] INFO com.tony.simple.JavaProtostuffSerializeDemo - deserialize p2p msg:
Point2PointMessage(targetAddressKey=/127.0.0.1:38211, message=Hello, p2p from python)
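Summary of using io.protostuff
On the Java platform you can define plain POJOs, with no hand-written proto files and no generated classes, and serialize and deserialize them with the provided ProtobufIOUtil or ProtostuffIOUtil.
For cross-language serialization, the other language still needs proto files and generated classes, and the generic field in the Java class has to become a byte array holding a second, nested serialization so that Python and other languages can parse it. The Java side must also use ProtobufIOUtil. Python can then deserialize the data into whichever type the business case calls for, and Java serializes with that same type; the reverse direction works the same way. The goal is that once MessageDto is defined, extending it never requires changing MessageDto: you only define more data types and assign them to MessageDto's data field.
Compared with the pure protobuf approach, the code is simpler: there is no need for lots of Any.pack() and Any.unpack() calls.
Originally published on my GitHub blog.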