Google protobuf reflection: study notes

(Work in progress, continuously updated...)

What is reflection?

Basic concept

Reflection is the ability of a program to access, inspect, and modify its own state or behavior.

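An assembly contains modules, modules contain types, and types contain members. Reflection provides objects that encapsulate assemblies, modules, and types. You can use reflection to dynamically create an instance of a type, bind a type to an existing object, or get the type of an existing object, and then invoke the type's methods or access its fields and properties.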

The questions I (a C++ programmer) care about

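How to create an object at runtime from its type name (a string that is valid but whose content is unknown at compile time, e.g. read from a configuration file).
How to get and modify a field at runtime, given an object and the field's name (a valid string unknown at compile time, e.g. received in a network packet).
How to call a method at runtime, given an object and the method's name (a valid string unknown at compile time, e.g. taken from user input).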

A quick introduction to using protobuf reflection

For example, suppose you have a test.proto like this:

package T;
message Test
{
    optional int32 id = 1;
}

Creating an object from its type name

Mode 1: the .proto is compiled ahead of time

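//! Construct an object from its type name.
/*!
 * The type name is the fully qualified message name, e.g. "T.Test" (package + message).
 * The object returned by New() is heap-allocated; the caller is responsible for deleting it.
 */
#include <iostream>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>
#include "cpp/test.pb.h" // the file protoc generated for you

int main()
{
    // First get the type's Descriptor.
    auto descriptor = google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName("T.Test");
    if (nullptr == descriptor)
    {
        return 1;
    }
    // Use the Descriptor to get the registered prototype instance of the type. It is immutable.
    auto prototype = google::protobuf::MessageFactory::generated_factory()->GetPrototype(descriptor);
    if (nullptr == prototype)
    {
        return 1;
    }
    // Construct a usable message.
    auto message = prototype->New();
    // This cast only works because the Test message was compiled ahead of time and linked in.
    auto test = dynamic_cast<T::Test*>(message);
    if (nullptr == test)
    {
        delete message;
        return 1;
    }
    // Call the concrete message API directly.
    // These accessors are syntactic sugar on the generated class, so there is no reflective counterpart that calls them.
    // The reflective Set/Get XXX family belongs to Reflection and takes the Message as an argument.
    test->set_id(1);
    std::cout << test->Utf8DebugString() << std::endl;
    delete message;
    return 0;
}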

Mode 2: parsing the .proto file directly at runtime

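#include <iostream>
#include <google/protobuf/compiler/importer.h>
#include <google/protobuf/dynamic_message.h>
int main()
{
    // Set up the source tree (the "file system" the importer reads from).
    google::protobuf::compiler::DiskSourceTree sourceTree;
    // Map the current directory to the virtual root "project_root"; the name is arbitrary, any valid name works.
    sourceTree.MapPath("project_root", "./");
    // Set up the runtime compiler (NULL: no error collector is supplied).
    google::protobuf::compiler::Importer importer(&sourceTree, NULL);
    // Compile the proto source at runtime. The file lives at ./source_proto/test.proto.
    if (importer.Import("project_root/source_proto/test.proto") == nullptr)
    {
        return 1;
    }
    // Now the type's descriptor can be fetched from the importer's pool.
    auto descriptor1 = importer.pool()->FindMessageTypeByName("T.Test");
    if (descriptor1 == nullptr)
    {
        return 1;
    }
    // Create a dynamic message factory.
    google::protobuf::DynamicMessageFactory factory;
    // Get the prototype of the type from the factory.
    auto proto1 = factory.GetPrototype(descriptor1);
    // Construct a usable message.
    auto message1 = proto1->New();
    // Assign the field through the reflection interface.
    auto reflection1 = message1->GetReflection();
    auto field1 = descriptor1->FindFieldByName("id");
    reflection1->SetInt32(message1, field1, 1);
    // Print it.
    std::cout << message1->DebugString();
    // Delete the message before the factory goes out of scope.
    delete message1;
    return 0;
}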

Getting and modifying a field by object and field name

First define the message: we reuse the test.proto from above.

#include "cpp/test.pb.h"
#include <iostream>

int main()
{
    // Get hold of an object. How does not matter; it could also come from reflection.
    // Here we simply create one directly.
    T::Test p_test;
    // Get the object's descriptor.
    auto descriptor = p_test.GetDescriptor();
    // Get the object's reflection interface.
    auto reflecter = p_test.GetReflection();
    // Get the field's descriptor.
    auto field = descriptor->FindFieldByName("id");
    // Set the field's value.
    reflecter->SetInt32(&p_test, field, 5);
    // Read the field's value.
    std::cout << reflecter->GetInt32(p_test, field) << std::endl;
    return 0;
}

Calling a method by object and method name

//TODO

How protobuf reflection is implemented

Basic concepts

The Descriptor family

The ::google::protobuf::Descriptor family includes:

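Descriptor – describes a message
FieldDescriptor – describes a field
OneofDescriptor – describes a oneof
EnumDescriptor – describes an enum
EnumValueDescriptor – describes an enum value
ServiceDescriptor – describes a service
MethodDescriptor – describes a method of a service
FileDescriptor – describes a .proto file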

These Descriptor objects are populated from the corresponding DescriptorProto messages by the DescriptorBuilder helper class.
If you are interested, see https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.cc

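The DescriptorProto family is a set of messages, themselves defined in protobuf, that carry the type information of every type generated by protobuf.
The corresponding proto file is at: https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.proto
The top-level message of the DescriptorProto family is FileDescriptorProto. For each .proto file, one FileDescriptorProto containing all of that file's information is generated.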

For example:
For the test.proto above, protoc automatically fills in a descriptor message roughly like this:

::google::protobuf::FileDescriptorProto file;
file.set_name("test.proto");
file.set_package("T");
auto desc = file.add_message_type();
desc->set_name("Test");  // the unqualified name; the package is stored separately
auto id_desc = desc->add_field();
id_desc->set_name("id");
id_desc->set_number(1);
id_desc->set_label(::google::protobuf::FieldDescriptorProto::LABEL_OPTIONAL);
id_desc->set_type(::google::protobuf::FieldDescriptorProto::TYPE_INT32);
//...

and then stores it.
If you read the test.pb.cc generated by protoc, you will see code like this:

::google::protobuf::DescriptorPool::InternalAddGeneratedFile(
    "\n\ntest.proto\022\001T\"\022\n\004Test\022\n\n\002id\030\001 \001(\005", 35);

In other words, protoc's generated code hard-codes the serialized FileDescriptorProto of the .proto file and passes it directly as an argument.
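That byte string is nothing more than a FileDescriptorProto in protobuf wire format. As a minimal sketch, it can be walked by hand: each field starts with a varint tag `(field_number << 3) | wire_type`, and wire type 2 (length-delimited) is followed by a varint length and then the bytes. The helpers `ReadVarint` and `DecodeTopLevel` are hypothetical, not protobuf APIs, and only length-delimited fields are handled. In descriptor.proto, fields 1, 2 and 4 of FileDescriptorProto are `name`, `package` and `message_type`. The byte after the first tag is the length of the file name, so for the 10-character `test.proto` it must be 10 (written `\n`), which makes the whole blob 35 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: read a base-128 varint starting at data[pos] and advance pos.
// Returns false if the input ends before the varint does.
bool ReadVarint(const std::string& data, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64 && pos < data.size(); shift += 7) {
        std::uint8_t byte = static_cast<std::uint8_t>(data[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;  // high bit clear: this was the last byte
    }
    return false;
}

// Hypothetical helper: decode the top level of a message whose fields are all
// length-delimited (wire type 2), returning (field number, raw bytes) pairs.
// Any other wire type stops decoding in this sketch.
std::vector<std::pair<int, std::string>> DecodeTopLevel(const std::string& data) {
    std::vector<std::pair<int, std::string>> fields;
    std::size_t pos = 0;
    while (pos < data.size()) {
        std::uint64_t tag = 0, len = 0;
        if (!ReadVarint(data, pos, tag)) break;
        int field_number = static_cast<int>(tag >> 3);
        int wire_type = static_cast<int>(tag & 0x7);
        if (wire_type != 2 || !ReadVarint(data, pos, len)) break;
        if (len > data.size() - pos) break;  // truncated input
        fields.emplace_back(field_number, data.substr(pos, static_cast<std::size_t>(len)));
        pos += static_cast<std::size_t>(len);
    }
    return fields;
}
```

Decoding the 35-byte blob for test.proto yields three top-level fields: `(1, "test.proto")`, `(2, "T")` and `(4, <18 bytes>)`. Decoding those 18 bytes again gives the nested DescriptorProto: `name = "Test"` plus one 10-byte FieldDescriptorProto for `id`.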

offset

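Every object ultimately corresponds to a block of memory with a start address (start_addr) and an end address,
and each of its fields lives at start_addr + offset. So once the object and the field's offset are known,
the field's memory address can be computed.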


//! Get the memory offset of a field within an object of the given type.
#define GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(TYPE, FIELD)    \
  static_cast<int>(                                           \
      reinterpret_cast<const char*>(                          \
          &reinterpret_cast<const TYPE*>(16)->FIELD) -        \
      reinterpret_cast<const char*>(16))
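The macro computes how far a member lies from the start of an object. Standard C++ offers `offsetof` for the same purpose on standard-layout types; below is a minimal sketch under that assumption. The struct `FakeTest`, the table `kFakeTestOffsets` and the helper `FieldAt` are hypothetical stand-ins for a generated class, its offsets table and the reflection code that consumes it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a generated message class (standard-layout, no virtuals).
struct FakeTest {
    std::uint32_t has_bits;
    std::int32_t id;
    double score;
};

// Offsets recorded once per type, in the spirit of Test_offsets_ in generated code.
const std::size_t kFakeTestOffsets[] = {
    offsetof(FakeTest, id),     // field index 0
    offsetof(FakeTest, score),  // field index 1
};

// Field address = start_addr + offset, reinterpreted as the field's type.
template <typename T>
T* FieldAt(void* object, std::size_t offset) {
    return reinterpret_cast<T*>(static_cast<unsigned char*>(object) + offset);
}
```

Protobuf's macro does not use `offsetof` because generated message classes have virtual functions and are therefore not standard-layout, and `offsetof` is only guaranteed for standard-layout types; the arithmetic itself is the same.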

DescriptorDatabase

DescriptorDatabase is a pure virtual base class declaring a set of interfaces for finding a FileDescriptorProto by name (file name, symbol name, ...):

https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor_database.h

// (I removed the original English comments.)
class LIBPROTOBUF_EXPORT DescriptorDatabase {
    public:
        inline DescriptorDatabase() {}
        virtual ~DescriptorDatabase();
        // Find by file name.
        virtual bool FindFileByName(const string& filename,
                FileDescriptorProto* output) = 0;
        // Find the file that defines a symbol.
        virtual bool FindFileContainingSymbol(const string& symbol_name,
                FileDescriptorProto* output) = 0;
        // Find the file that defines an extension (by extended type and field number).
        virtual bool FindFileContainingExtension(const string& containing_type,
                int field_number,
                FileDescriptorProto* output) = 0;
        // Find all extension field numbers of a given type...
        virtual bool FindAllExtensionNumbers(const string& /* extendee_type */,
                vector<int>* /* output */) {
            return false;
        }
    private:
        GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(DescriptorDatabase);
};

The two core derived classes are:

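EncodedDescriptorDatabase
- supports the full DescriptorDatabase interface.
- accepts serialized FileDescriptorProtos and keeps them in a map for later lookup.
- backs the reflection of types that were compiled and linked in ahead of time.
SourceTreeDescriptorDatabase
- supports only FindFileByName from the DescriptorDatabase interface; the others simply return false.
- every lookup of a file reads the proto source from disk, parses it and returns the resulting FileDescriptorProto.
- backs reflection when proto source files are compiled at runtime.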

I won't dig into how protobuf compiles proto source files at runtime here.

DescriptorPool

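Whenever you want to look up a Descriptor, you go through a DescriptorPool.
DescriptorPool implements the following mechanism:

It caches the Descriptors of every file it has looked up.
When looking up a Descriptor, it returns the result directly if its cache has it;
otherwise it queries its DescriptorDatabase for the FileDescriptorProto,
and if found, converts it into a Descriptor, caches it and returns the result.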

https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.h


class LIBPROTOBUF_EXPORT  DescriptorPool{
 public:
  // Create a normal, empty DescriptorPool.
  DescriptorPool();

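  // (A very long comment removed; the gist is the two lines below plus some caveats.)
  // Construct a pool backed by a DescriptorDatabase.
  // Lookups check the pool first and fall back to fallback_database if not found.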
  class ErrorCollector;
  explicit DescriptorPool(DescriptorDatabase* fallback_database,
                          ErrorCollector* error_collector = NULL);

  ~DescriptorPool();

  // This returns the pool of the messages compiled into the binary; it is our
  // entry point to the precompiled, linked-in messages.
  // Get a pointer to the generated pool.  Generated protocol message classes
  // which are compiled into the binary will allocate their descriptors in
  // this pool.  Do not add your own descriptors to this pool.
  static const DescriptorPool* generated_pool();


  // Find a FileDescriptor in the pool by file name.  Returns NULL if not
  // found.
  const FileDescriptor* FindFileByName(const string& name) const;
  // .... a series of Find XXX By XXX interfaces ..., not all copied here.

  // Building descriptors --------------------------------------------

  class LIBPROTOBUF_EXPORT ErrorCollector {
  // We don't care about this error-collector class...
  };

  // Fills in a FileDescriptor from a FileDescriptorProto.
  const FileDescriptor* BuildFile(const FileDescriptorProto& proto);

  // Same as BuildFile() except errors are sent to the given ErrorCollector.
  const FileDescriptor* BuildFileCollectingErrors(
    const FileDescriptorProto& proto,
    ErrorCollector* error_collector);
  // Dependency-related options.
  void AllowUnknownDependencies() { allow_unknown_ = true; }
  void EnforceWeakDependencies(bool enforce) { enforce_weak_ = enforce; }

  // Internal stuff --------------------------------------------------
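    // ... a series of implementation-detail interfaces, not copied ...
 private:
    // ... a series of implementation-detail interfaces, not copied ...

  // When a lookup misses the pool's own tables, these try the fallback database.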
  bool TryFindFileInFallbackDatabase(const string& name) const;
  bool TryFindSymbolInFallbackDatabase(const string& name) const;
  bool TryFindExtensionInFallbackDatabase(const Descriptor* containing_type,
                                          int field_number) const;

    // ... more implementation details, not copied ...

  // See constructor.
  DescriptorDatabase* fallback_database_; // the database this pool holds
  scoped_ptr<Tables> tables_;   // the pool's own tables; caches everything looked up so far

    // ... more implementation details, not copied ...
  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(DescriptorPool);
};

The core lookup function

https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.cc


Symbol DescriptorPool::Tables::FindByNameHelper(
    const DescriptorPool* pool, const string& name) {
  MutexLockMaybe lock(pool->mutex_);
  known_bad_symbols_.clear();
  known_bad_files_.clear();
  // First look in the cache.
  Symbol result = FindSymbol(name);

  // Internal implementation detail; don't worry about it.
  if (result.IsNull() && pool->underlay_ != NULL) {
    // Symbol not found; check the underlay.
    result =
      pool->underlay_->tables_->FindByNameHelper(pool->underlay_, name);
  }

  if (result.IsNull()) {
    // Try to fetch the data from the database.
    // Symbol still not found, so check fallback database.
    if (pool->TryFindSymbolInFallbackDatabase(name)) {
        // Look in the cache again, which the database lookup has just filled.
        result = FindSymbol(name);
    }
  }

  return result;
}
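The lookup policy described above (try the pool's own cache, otherwise ask the fallback database, build the result and cache it) can be sketched in a few lines. This is a simplified analogue, not protobuf code: `Database`, `MapDatabase` and `Pool` are hypothetical, and plain strings stand in for FileDescriptorProto and FileDescriptor. The `lookups` counter shows that the database is consulted only on a cache miss.

```cpp
#include <map>
#include <string>

// Hypothetical analogue of DescriptorDatabase: find a raw file description by name.
class Database {
public:
    virtual ~Database() = default;
    virtual bool FindFileByName(const std::string& name, std::string* output) = 0;
};

// Map-backed database, loosely like EncodedDescriptorDatabase.
class MapDatabase : public Database {
public:
    std::map<std::string, std::string> files;
    int lookups = 0;  // how many times the pool had to fall back to us

    bool FindFileByName(const std::string& name, std::string* output) override {
        ++lookups;
        auto it = files.find(name);
        if (it == files.end()) return false;
        *output = it->second;
        return true;
    }
};

// Hypothetical analogue of DescriptorPool: own cache first, fallback database second.
class Pool {
public:
    explicit Pool(Database* fallback) : fallback_(fallback) {}

    // Returns nullptr if neither the cache nor the database knows the name.
    const std::string* FindFileByName(const std::string& name) {
        auto it = cache_.find(name);
        if (it != cache_.end()) return &it->second;  // cache hit
        std::string raw;
        if (fallback_ == nullptr || !fallback_->FindFileByName(name, &raw)) return nullptr;
        // "Build" step: the real pool turns a FileDescriptorProto into a FileDescriptor here.
        auto inserted = cache_.emplace(name, "built:" + raw);
        return &inserted.first->second;  // std::map keeps element addresses stable
    }

private:
    Database* fallback_;
    std::map<std::string, std::string> cache_;
};
```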

MessageFactory

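Whenever you want to get an instance of a type, you get it from a MessageFactory.
MessageFactory is a pure virtual base class that declares the interface for getting a type's instance from its Descriptor.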

class LIBPROTOBUF_EXPORT MessageFactory {
 public:
  inline MessageFactory() {}
  virtual ~MessageFactory();
  // Get the prototype instance of a type from its Descriptor.
  virtual const Message* GetPrototype(const Descriptor* type) = 0;

  // Entry point to the singleton factory of the types compiled and linked in.
  static MessageFactory* generated_factory();
  // Fills the singleton above; every file generated by protoc calls it.
  static void InternalRegisterGeneratedMessage(const Descriptor* descriptor,
                                               const Message* prototype);
 private:
  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(MessageFactory);
};

Likewise, there are two core derived classes:

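GeneratedMessageFactory
- a map from Descriptor to Message * (the prototype instance).
- backs the reflection of types that were compiled and linked in ahead of time.
DynamicMessageFactory
- keeps a simple cache of the Descriptors it has already processed.
- can construct a Message dynamically in memory from a Descriptor!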
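A DynamicMessageFactory has no generated class to rely on, so it has to work out a memory layout (field offsets and total size) at runtime from the Descriptor and then access fields through those offsets. The heavily simplified sketch below shows only that idea: `FieldSpec`, `Layout`, `ComputeLayout` and `DynamicObject` are hypothetical, a field is reduced to a name, size and alignment, and values are copied with `memcpy` rather than through protobuf's reflection classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, much-simplified "descriptor" entry for one field.
struct FieldSpec {
    std::string name;
    std::size_t size;   // bytes, e.g. sizeof(std::int32_t)
    std::size_t align;  // required alignment, must be >= 1
};

// Offsets and total size computed at runtime.
struct Layout {
    std::vector<std::size_t> offsets;
    std::size_t total_size = 0;
};

Layout ComputeLayout(const std::vector<FieldSpec>& fields) {
    Layout layout;
    std::size_t offset = 0;
    for (const FieldSpec& f : fields) {
        offset = (offset + f.align - 1) / f.align * f.align;  // round up to alignment
        layout.offsets.push_back(offset);
        offset += f.size;
    }
    layout.total_size = offset;
    return layout;
}

// A "message" whose storage is a plain byte buffer described by the layout.
class DynamicObject {
public:
    explicit DynamicObject(const std::vector<FieldSpec>& fields)
        : fields_(fields), layout_(ComputeLayout(fields)), bytes_(layout_.total_size, 0) {}

    template <typename T>
    bool Set(const std::string& name, T value) {
        int i = IndexOf(name);
        if (i < 0 || fields_[i].size != sizeof(T)) return false;  // unknown field or wrong type size
        std::memcpy(bytes_.data() + layout_.offsets[i], &value, sizeof(T));
        return true;
    }

    template <typename T>
    bool Get(const std::string& name, T* out) const {
        int i = IndexOf(name);
        if (i < 0 || fields_[i].size != sizeof(T)) return false;
        std::memcpy(out, bytes_.data() + layout_.offsets[i], sizeof(T));
        return true;
    }

private:
    int IndexOf(const std::string& name) const {
        for (std::size_t i = 0; i < fields_.size(); ++i)
            if (fields_[i].name == name) return static_cast<int>(i);
        return -1;
    }

    std::vector<FieldSpec> fields_;
    Layout layout_;
    std::vector<unsigned char> bytes_;
};
```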

How the problems are solved

Creating an object from its type name: proto precompiled and linked into the binary

Table lookup!
Yes, you guessed right: it really is just a table lookup!

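Where the data is stored
All Descriptors are stored in the singleton DescriptorPool; google::protobuf::DescriptorPool::generated_pool() returns a pointer to it.
All instances are stored in the singleton MessageFactory; google::protobuf::MessageFactory::generated_factory() returns a pointer to it.
Populating the tables with every Descriptor and instance ahead of time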

Every .cc file generated by protoc contains code like the following (protobuf v2):


// xxx stands for the file name, e.g. test for test.proto.

namespace {

//! Registers the instances of all types in this file with the MessageFactory.
void protobuf_RegisterTypes(const ::std::string&) {
   // Initialize this file's reflection data.
  protobuf_AssignDescriptorsOnce();
  ::google::protobuf::MessageFactory::InternalRegisterGeneratedMessage(
    Test_descriptor_, &Test::default_instance());
}
//! Initialization entry point for this file.
void protobuf_AddDesc_xxx_2eproto() {
  static bool already_here = false;
  if (already_here) return;
  already_here = true;
  GOOGLE_PROTOBUF_VERIFY_VERSION;
  // Register this file's descriptor blob, so the Descriptors can be found by name through generated_pool.
  ::google::protobuf::DescriptorPool::InternalAddGeneratedFile(
    "\n\013xxx.proto\022\001T\"\022\n\004Test\022\n\n\002id\030\001 \001(\005", 36);
  // Register this file's type-registration callback with the MessageFactory.
  // Registering a callback makes registration lazy: a file's types are registered only when one of them is requested.
  ::google::protobuf::MessageFactory::InternalRegisterGeneratedFile(
    "xxx.proto", &protobuf_RegisterTypes);
  // Construct and initialize all default instances.
  Test::default_instance_ = new Test();
  Test::default_instance_->InitAsDefaultInstance();
  // Register the cleanup function.
  ::google::protobuf::internal::OnShutdown(&protobuf_ShutdownFile_xxx_2eproto);
}
//! The constructor of a global object ensures the registration runs before main().
struct StaticDescriptorInitializer_xxx_2eproto {
  StaticDescriptorInitializer_xxx_2eproto() {
    protobuf_AddDesc_xxx_2eproto();
  }
} static_descriptor_initializer_xxx_2eproto_;
}
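The "table lookup" plus the static-initializer trick can be reduced to a small standard-C++ sketch. All names here (`Message`, `Registry`, `GetPrototype`, `Test`, `StaticInitializer_test`) are hypothetical simplifications, not protobuf's classes. A global object's constructor registers a default instance under its full type name before main() runs (as mainstream toolchains do for namespace-scope objects), and callers later create fresh objects by name through the prototype's virtual `New()`.

```cpp
#include <map>
#include <string>

// Hypothetical minimal message interface; New() plays the role of Message::New().
struct Message {
    virtual ~Message() = default;
    virtual Message* New() const = 0;
    virtual std::string TypeName() const = 0;
};

// Singleton table: full type name -> prototype. A function-local static avoids
// depending on the initialization order of globals in different files.
std::map<std::string, const Message*>& Registry() {
    static std::map<std::string, const Message*> registry;
    return registry;
}

// Like generated_pool() + generated_factory() combined: look the prototype up by name.
const Message* GetPrototype(const std::string& type_name) {
    auto it = Registry().find(type_name);
    return it == Registry().end() ? nullptr : it->second;
}

// Hypothetical "generated" class for message T.Test.
struct Test : Message {
    int id = 0;
    Message* New() const override { return new Test(); }
    std::string TypeName() const override { return "T.Test"; }
};

namespace {
// Like StaticDescriptorInitializer_xxx_2eproto: its constructor registers the default instance.
struct StaticInitializer_test {
    StaticInitializer_test() {
        static const Test default_instance;
        Registry()[default_instance.TypeName()] = &default_instance;
    }
} static_initializer_test_;
}  // namespace
```

Usage: `GetPrototype("T.Test")->New()` returns a new heap-allocated `Test` that the caller must delete, just as `prototype->New()` does in the first example of these notes.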

Creating an object from its type name: compiling the proto at runtime

This is where the Importer class comes in.

class LIBPROTOBUF_EXPORT Importer {
 public:
     // Constructed from a SourceTree; the SourceTree is ultimately
     // used to build a SourceTreeDescriptorDatabase.
  Importer(SourceTree* source_tree,
           MultiFileErrorCollector* error_collector);
  ~Importer();

  // The interface for loading a proto source file at runtime.
  // If the file has already been imported, the existing FileDescriptor is returned.
  const FileDescriptor* Import(const string& filename);
  // Access to the DescriptorPool.
  // Every Importer has its own DescriptorPool.
  inline const DescriptorPool* pool() const {
    return &pool_;
  }
  // Interfaces we don't care about here.
  void AddUnusedImportTrackFile(const string& file_name);
  void ClearUnusedImportTrackFiles();

 private:
  // These two data members explain nicely how Importer works:
  // a DescriptorPool is built on top of a SourceTreeDescriptorDatabase, so
  // whenever a file is looked up it is returned directly if cached; otherwise
  // the SourceTreeDescriptorDatabase loads and parses the source file.
  // Import() is simply a way to parse and load a proto into the cache up front.
  SourceTreeDescriptorDatabase database_;
  DescriptorPool pool_;

  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(Importer);
};

Getting and modifying a field by object and field name

How GeneratedMessageReflection is populated and retrieved
Every message has a corresponding GeneratedMessageReflection object,
which holds the information needed to perform reflective operations on that message.


//! Initialize all GeneratedMessageReflection objects of this file.
void protobuf_AssignDesc_xxx_2eproto() {
  protobuf_AddDesc_xxx_2eproto();
  const ::google::protobuf::FileDescriptor* file =
    ::google::protobuf::DescriptorPool::generated_pool()->FindFileByName(
      "xxx.proto");
  GOOGLE_CHECK(file != NULL);
  Test_descriptor_ = file->message_type(0);
  static const int Test_offsets_[1] = {
    // Compute the memory offset of each field.
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Test, id_),
  };
  // Build the GeneratedMessageReflection object for the Test message.
  Test_reflection_ =
    new ::google::protobuf::internal::GeneratedMessageReflection(
      Test_descriptor_,
      Test::default_instance_,
      Test_offsets_,
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Test, _has_bits_[0]),
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Test, _unknown_fields_),
      -1,
      ::google::protobuf::DescriptorPool::generated_pool(),
      ::google::protobuf::MessageFactory::generated_factory(),
      sizeof(Test));
}
inline void protobuf_AssignDescriptorsOnce() {
  ::google::protobuf::GoogleOnceInit(&protobuf_AssignDescriptors_once_,
                 &protobuf_AssignDesc_xxx_2eproto);
}

// The basic Message interface in message.h.
virtual const Reflection* GetReflection() const {
    return GetMetadata().reflection;
}
// Each message's interface for getting its own metadata.
::google::protobuf::Metadata Test::GetMetadata() const {
  protobuf_AssignDescriptorsOnce();
  ::google::protobuf::Metadata metadata;
  metadata.descriptor = Test_descriptor_;
  metadata.reflection = Test_reflection_;
  return metadata;
}
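`GoogleOnceInit` guarantees that `protobuf_AssignDesc_xxx_2eproto` runs exactly once, however many callers reach `GetMetadata()`. In standard C++11 and later, the initializer of a function-local static gives the same guarantee. A minimal sketch with hypothetical names, where strings stand in for the Descriptor and Reflection objects:

```cpp
#include <string>

// Hypothetical stand-in for ::google::protobuf::Metadata.
struct Metadata {
    const std::string* descriptor;
    const std::string* reflection;
};

namespace {
const std::string* test_descriptor = nullptr;
const std::string* test_reflection = nullptr;
int assign_calls = 0;  // counts how often the initializer actually runs

// Plays the role of protobuf_AssignDesc_xxx_2eproto(): fill in the per-type data.
void AssignDescriptors() {
    ++assign_calls;
    static const std::string descriptor = "T.Test descriptor";
    static const std::string reflection = "T.Test reflection";
    test_descriptor = &descriptor;
    test_reflection = &reflection;
}

// Plays the role of protobuf_AssignDescriptorsOnce(): the initializer of a
// function-local static runs exactly once, even with concurrent callers.
void AssignDescriptorsOnce() {
    static const bool done = (AssignDescriptors(), true);
    (void)done;
}
}  // namespace

// Like Test::GetMetadata(): make sure the data exists, then return it.
Metadata GetMetadata() {
    AssignDescriptorsOnce();
    return Metadata{test_descriptor, test_reflection};
}
```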

How GeneratedMessageReflection operates on an object's fields

Following the offsets array, it accesses the corresponding memory directly. Take SetInt32 for an int32 field as an example.

- Interface definition

#undef DEFINE_PRIMITIVE_ACCESSORS
#define DEFINE_PRIMITIVE_ACCESSORS(TYPENAME, TYPE, PASSTYPE, CPPTYPE)        \
  void GeneratedMessageReflection::Set##TYPENAME(                            \
      Message* message, const FieldDescriptor* field,                        \
      PASSTYPE value) const {                                                \
    USAGE_CHECK_ALL(Set##TYPENAME, SINGULAR, CPPTYPE);                       \
    if (field->is_extension()) {    /* ignore this for now */               \
      return MutableExtensionSet(message)->Set##TYPENAME(                    \
        field->number(), field->type(), value, field);                       \
    } else {                        /* ordinary fields go here */           \
      SetField<TYPE>(message, field, value);                                 \
    }                                                                        \
  }

DEFINE_PRIMITIVE_ACCESSORS(Int32 , int32 , int32 , INT32 )
#undef DEFINE_PRIMITIVE_ACCESSORS

- Writing the memory


// Find the field's memory address and return a pointer of the appropriate type.
template <typename Type>
inline Type* GeneratedMessageReflection::MutableRaw(
    Message* message, const FieldDescriptor* field) const {
  int index = field->containing_oneof() ?
      descriptor_->field_count() + field->containing_oneof()->index() :
      field->index();
  void* ptr = reinterpret_cast<uint8*>(message) + offsets_[index];
  return reinterpret_cast<Type*>(ptr);
}
// Set the protobuf has bit (the field-presence flag).
inline void GeneratedMessageReflection::SetBit(
    Message* message, const FieldDescriptor* field) const {
  if (has_bits_offset_ == -1) {
    return;
  }
  MutableHasBits(message)[field->index() / 32] |= (1 << (field->index() % 32));
}
// Set the value of a field.
template <typename Type>
inline void GeneratedMessageReflection::SetField(
    Message* message, const FieldDescriptor* field, const Type& value) const {
  if (field->containing_oneof() && !HasOneofField(*message, field)) {
    ClearOneof(message, field->containing_oneof()); // clear whichever oneof member is currently set
  }
  *MutableRaw<Type>(message, field) = value; // first overwrite the memory directly
  field->containing_oneof() ?
      SetOneofCase(message, field) : SetBit(message, field); // then record the oneof case or the has bit
}
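The three helpers above boil down to: locate the field through the offsets table, overwrite its bytes, then set the presence bit in word `index / 32`, bit `index % 32`. A minimal sketch of that path for non-oneof fields, assuming a standard-layout stand-in struct; `FakeTest`, `FakeReflection` and their members are hypothetical and only mirror the roles of `offsets_`, `has_bits_offset_`, `MutableRaw`, `SetBit` and `SetField`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical generated-style message: one has-bits word plus two fields.
struct FakeTest {
    std::uint32_t has_bits[1];
    std::int32_t id;     // field index 0
    std::int64_t count;  // field index 1
};

// Hypothetical per-type reflection data, mirroring offsets_ / has_bits_offset_.
struct FakeReflection {
    const std::size_t* offsets;   // byte offset of each field, by field index
    std::size_t has_bits_offset;  // byte offset of the has-bits array

    // Like MutableRaw<Type>(): start address + offset, cast to the field type.
    template <typename T>
    T* MutableRaw(void* message, int index) const {
        return reinterpret_cast<T*>(static_cast<unsigned char*>(message) + offsets[index]);
    }
    // Like SetBit(): word index / 32, bit index % 32.
    void SetBit(void* message, int index) const {
        std::uint32_t* bits = reinterpret_cast<std::uint32_t*>(
            static_cast<unsigned char*>(message) + has_bits_offset);
        bits[index / 32] |= (1u << (index % 32));
    }
    bool HasBit(const void* message, int index) const {
        const std::uint32_t* bits = reinterpret_cast<const std::uint32_t*>(
            static_cast<const unsigned char*>(message) + has_bits_offset);
        return (bits[index / 32] & (1u << (index % 32))) != 0;
    }
    // Like SetField<Type>() for a non-oneof field: overwrite, then mark as present.
    template <typename T>
    void SetField(void* message, int index, const T& value) const {
        *MutableRaw<T>(message, index) = value;
        SetBit(message, index);
    }
};

const std::size_t kFakeTestOffsets[] = {offsetof(FakeTest, id), offsetof(FakeTest, count)};
const FakeReflection kFakeTestReflection{kFakeTestOffsets, offsetof(FakeTest, has_bits)};
```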

Calling a method by object and method name

//TODO

Original article: https://blog.csdn.net/cchd0001/article/details/52452204