LLVM Compilation Principles
LLVM
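LLVM is a framework for building compilers, written in C++. It is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time. It is open to developers and compatible with existing scripts.
Traditional compiler design
Source code → frontend → optimizer → backend (code generator) → machine code, as shown in the figure below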

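iOS compiler architecture
Objective-C, C, and C++ use Clang as the compiler frontend, Swift uses the Swift compiler frontend (swiftc), and all of them use LLVM as the backend, as shown in the figure below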

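Module overview
Frontend: The compiler frontend parses the source code (the compilation stage). It performs lexical, syntactic, and semantic analysis, checks the source for errors, and builds an abstract syntax tree (AST). An LLVM frontend also generates intermediate representation (IR). You can think of LLVM as a compiler plus an optimizer: the optimizer takes IR as input and still produces IR, which is handed to the backend and translated into the target instruction set.
Optimizer: The optimizer performs various optimizations to improve the code's run time, for example by eliminating redundant computation.
Backend (Code Generator): Maps the code to the target instruction set, generates machine code, and performs machine-specific optimizations.
Sample code: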
#define add(a, b) a + b
int test(int a, int b)
{
    return a + add(b, 10);
}
int main()
{
    int a = test(1, 2);
    return 0;
}
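I. Preprocessing stage
This stage mainly handles macro expansion and header inclusion. Run the commands below; once they finish you can see the included headers and the expanded macros.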
# Print the preprocessed result directly in the terminal
clang -E main.m
# Write the preprocessed source to a file for inspection
clang -E main.m > main2.m
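Note that:
A typedef, which gives a data type an alias, is not replaced during preprocessing.
#include directives are processed at this stage and bring in the corresponding declarations.
What does #include actually do?
During preprocessing, the contents of the included header are copied into the current line.
What kinds of content most commonly appear in a header file that gets #included?
Macro definitions
typedefs
Inclusion of other headers
inline function definitions
Function declarations
struct, union, and enum type definitions
(You can open a .h file to see for yourself, e.g. /usr/include/stdio.h)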
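To make the categories above concrete, here is a minimal sketch of a header that contains one of each. The file name shapes.h and every identifier in it are hypothetical and chosen only for illustration; they do not come from any real library.

```c
/* shapes.h: hypothetical header, for illustration only */
#ifndef SHAPES_H
#define SHAPES_H

#include <stddef.h>                 /* inclusion of another header */

#define SHAPES_MAX_SIDES 8          /* macro definition */

typedef unsigned int shape_id;      /* typedef */

enum shape_kind { SHAPE_SQUARE, SHAPE_RECT };   /* enum type definition */

struct shape {                      /* struct type definition */
    shape_id id;
    enum shape_kind kind;
    int width;
    int height;
};

union shape_value {                 /* union type definition */
    int as_int;
    float as_float;
};

/* inline function definition */
static inline int shape_area(const struct shape *s) {
    return s->width * s->height;
}

/* function declaration: the definition would live in a .c file */
size_t shape_count(void);

#endif /* SHAPES_H */
```

Because #include is a textual copy, every file that includes such a header sees all of these declarations as if they had been typed in place; the include guard keeps a second copy from being processed twice.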
# 1 "main.m"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 412 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.m" 2
int test(int a, int b)
{
return a + b + 10;
}
int main()
{
int a = test(1, 2);
return 0;
}
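The preprocessed output above shows add(b, 10) replaced by b + 10 as plain text. That textual substitution is harmless in the sample, but it can change the meaning of an expression when the macro is used next to a higher-precedence operator. The sketch below illustrates this; add_safe and the two helper functions are hypothetical names introduced only for this example.

```c
/* Same unparenthesized macro as in the sample code. */
#define add(a, b) a + b

/* Hypothetical safer variant: parenthesize the arguments and the result. */
#define add_safe(a, b) ((a) + (b))

/* Expands textually to: 2 * 1 + 2 */
int scaled_unsafe(void) {
    return 2 * add(1, 2);
}

/* Expands textually to: 2 * ((1) + (2)) */
int scaled_safe(void) {
    return 2 * add_safe(1, 2);
}
```

Running clang -E on this file shows both expansions, which makes the difference visible before the compiler proper ever sees the code.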
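II. Compilation stage
The compilation stage mainly performs lexical, syntactic, and related analysis and checks, then generates IR.
1. Lexical analysis
Once preprocessing is done, lexical analysis begins. It splits the code into individual tokens, such as parentheses, braces, equals signs, and string literals.
- You can view the tokens with the following command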
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
Lexical analysis result
int 'int' [StartOfLine] Loc=<main.m:2:1>
identifier 'test' [LeadingSpace] Loc=<main.m:2:5>
l_paren '(' Loc=<main.m:2:9>
int 'int' Loc=<main.m:2:10>
identifier 'a' [LeadingSpace] Loc=<main.m:2:14>
comma ',' Loc=<main.m:2:15>
int 'int' [LeadingSpace] Loc=<main.m:2:17>
identifier 'b' [LeadingSpace] Loc=<main.m:2:21>
r_paren ')' Loc=<main.m:2:22>
l_brace '{' [StartOfLine] Loc=<main.m:3:1>
return 'return' [StartOfLine] [LeadingSpace] Loc=<main.m:4:5>
identifier 'a' [LeadingSpace] Loc=<main.m:4:12>
plus '+' [LeadingSpace] Loc=<main.m:4:14>
identifier 'b' [LeadingSpace] Loc=<main.m:4:16 <Spelling=main.m:4:20>>
plus '+' [LeadingSpace] Loc=<main.m:4:16 <Spelling=main.m:1:21>>
numeric_constant '10' [LeadingSpace] Loc=<main.m:4:16 <Spelling=main.m:4:23>>
semi ';' Loc=<main.m:4:26>
r_brace '}' [StartOfLine] Loc=<main.m:5:1>
int 'int' [StartOfLine] Loc=<main.m:7:1>
identifier 'main' [LeadingSpace] Loc=<main.m:7:5>
l_paren '(' Loc=<main.m:7:9>
r_paren ')' Loc=<main.m:7:10>
l_brace '{' [StartOfLine] Loc=<main.m:8:1>
int 'int' [StartOfLine] [LeadingSpace] Loc=<main.m:9:5>
identifier 'a' [LeadingSpace] Loc=<main.m:9:9>
equal '=' [LeadingSpace] Loc=<main.m:9:11>
identifier 'test' [LeadingSpace] Loc=<main.m:9:13>
l_paren '(' Loc=<main.m:9:17>
numeric_constant '1' Loc=<main.m:9:18>
comma ',' Loc=<main.m:9:19>
numeric_constant '2' [LeadingSpace] Loc=<main.m:9:21>
r_paren ')' Loc=<main.m:9:22>
semi ';' Loc=<main.m:9:23>
identifier 'printf' [StartOfLine] [LeadingSpace] Loc=<main.m:10:5>
l_paren '(' Loc=<main.m:10:11>
string_literal '"%d"' Loc=<main.m:10:12>
comma ',' Loc=<main.m:10:16>
identifier 'a' [LeadingSpace] Loc=<main.m:10:18>
r_paren ')' Loc=<main.m:10:19>
semi ';' Loc=<main.m:10:20>
return 'return' [StartOfLine] [LeadingSpace] Loc=<main.m:11:5>
numeric_constant '0' [LeadingSpace] Loc=<main.m:11:12>
semi ';' Loc=<main.m:11:13>
r_brace '}' [StartOfLine] Loc=<main.m:12:1>
eof '' Loc=<main.m:12:2>
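The token stream above can be approximated with a very small hand-written lexer. The following is only a sketch under simplifying assumptions: it recognizes identifiers, decimal numbers, and single-character punctuation, and it is not how Clang's lexer is implemented. All names (tok_kind, token, tokenize) are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum tok_kind { TOK_IDENT, TOK_NUMBER, TOK_PUNCT };

struct token {
    enum tok_kind kind;
    char text[32];      /* token spelling, truncated if longer */
};

/* Splits src into at most max tokens and returns how many were produced. */
size_t tokenize(const char *src, struct token *out, size_t max) {
    size_t n = 0;
    const char *p = src;
    while (*p != '\0' && n < max) {
        if (isspace((unsigned char)*p)) {   /* skip whitespace */
            p++;
            continue;
        }
        const char *start = p;
        enum tok_kind kind;
        if (isalpha((unsigned char)*p) || *p == '_') {
            kind = TOK_IDENT;               /* keyword or identifier */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else if (isdigit((unsigned char)*p)) {
            kind = TOK_NUMBER;              /* decimal constant */
            while (isdigit((unsigned char)*p)) p++;
        } else {
            kind = TOK_PUNCT;               /* one punctuation character */
            p++;
        }
        size_t len = (size_t)(p - start);
        if (len >= sizeof out[n].text) len = sizeof out[n].text - 1;
        memcpy(out[n].text, start, len);
        out[n].text[len] = '\0';
        out[n].kind = kind;
        n++;
    }
    return n;
}
```

Feeding it the line int a = test(1, 2); yields ten tokens in the same order as the corresponding entries in the dump, although without the location and flag information Clang records.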
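2. Syntax analysis
Lexical analysis is followed by syntax analysis, whose job is to verify that the syntax is correct. Building on the result of lexical analysis, it groups token sequences into grammatical phrases such as programs, statements, and expressions, then assembles all the nodes into an abstract syntax tree (AST). The parser determines whether the program is structurally correct.
You can view the result of syntax analysis with the following command:
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
The meanings of a few key node types:
- FunctionDecl: a function declaration
- ParmVarDecl: a parameter
- CallExpr: a function call
- BinaryOperator: a binary operator
For more detail, see this link: https://www.jianshu.com/p/d21c16b8953e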
TranslationUnitDecl 0x153840a08 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x153841908 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x153840fb0 '__int128'
|-TypedefDecl 0x153841978 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x153840fd0 'unsigned __int128'
|-TypedefDecl 0x1538e0e18 <<invalid sloc>> <invalid sloc> implicit SEL 'SEL *'
| `-PointerType 0x1538419d0 'SEL *'
|   `-BuiltinType 0x153841210 'SEL'
|-TypedefDecl 0x1538e0ef8 <<invalid sloc>> <invalid sloc> implicit id 'id'
| `-ObjCObjectPointerType 0x1538e0ea0 'id'
|   `-ObjCObjectType 0x1538e0e70 'id'
|-TypedefDecl 0x1538e0fd8 <<invalid sloc>> <invalid sloc> implicit Class 'Class'
| `-ObjCObjectPointerType 0x1538e0f80 'Class'
|   `-ObjCObjectType 0x1538e0f50 'Class'
|-ObjCInterfaceDecl 0x1538e1030 <<invalid sloc>> <invalid sloc> implicit Protocol
|-TypedefDecl 0x1538e13d0 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x1538e11a0 'struct __NSConstantString_tag'
|   `-Record 0x1538e1100 '__NSConstantString_tag'
|-TypedefDecl 0x1538e1438 <<invalid sloc>> <invalid sloc> implicit __SVInt8_t '__SVInt8_t'
| `-BuiltinType 0x153841230 '__SVInt8_t'
|-TypedefDecl 0x1538e14a0 <<invalid sloc>> <invalid sloc> implicit __SVInt16_t '__SVInt16_t'
| `-BuiltinType 0x153841250 '__SVInt16_t'
|-TypedefDecl 0x1538e1508 <<invalid sloc>> <invalid sloc> implicit __SVInt32_t '__SVInt32_t'
| `-BuiltinType 0x153841270 '__SVInt32_t'
|-TypedefDecl 0x1538e1570 <<invalid sloc>> <invalid sloc> implicit __SVInt64_t '__SVInt64_t'
| `-BuiltinType 0x153841290 '__SVInt64_t'
|-TypedefDecl 0x1538e15d8 <<invalid sloc>> <invalid sloc> implicit __SVUint8_t '__SVUint8_t'
| `-BuiltinType 0x1538412b0 '__SVUint8_t'
|-TypedefDecl 0x1538e1640 <<invalid sloc>> <invalid sloc> implicit __SVUint16_t '__SVUint16_t'
| `-BuiltinType 0x1538412d0 '__SVUint16_t'
|-TypedefDecl 0x1538e16a8 <<invalid sloc>> <invalid sloc> implicit __SVUint32_t '__SVUint32_t'
| `-BuiltinType 0x1538412f0 '__SVUint32_t'
|-TypedefDecl 0x1538e1710 <<invalid sloc>> <invalid sloc> implicit __SVUint64_t '__SVUint64_t'
| `-BuiltinType 0x153841310 '__SVUint64_t'
|-TypedefDecl 0x1538e1778 <<invalid sloc>> <invalid sloc> implicit __SVFloat16_t '__SVFloat16_t'
| `-BuiltinType 0x153841330 '__SVFloat16_t'
|-TypedefDecl 0x1538e17e0 <<invalid sloc>> <invalid sloc> implicit __SVFloat32_t '__SVFloat32_t'
| `-BuiltinType 0x153841350 '__SVFloat32_t'
|-TypedefDecl 0x1538e1848 <<invalid sloc>> <invalid sloc> implicit __SVFloat64_t '__SVFloat64_t'
| `-BuiltinType 0x153841370 '__SVFloat64_t'
|-TypedefDecl 0x1538e18b0 <<invalid sloc>> <invalid sloc> implicit __SVBFloat16_t '__SVBFloat16_t'
| `-BuiltinType 0x153841390 '__SVBFloat16_t'
|-TypedefDecl 0x1538e1918 <<invalid sloc>> <invalid sloc> implicit __clang_svint8x2_t '__clang_svint8x2_t'
| `-BuiltinType 0x1538413b0 '__clang_svint8x2_t'
|-TypedefDecl 0x1538e1980 <<invalid sloc>> <invalid sloc> implicit __clang_svint16x2_t '__clang_svint16x2_t'
| `-BuiltinType 0x1538413d0 '__clang_svint16x2_t'
|-TypedefDecl 0x1538e19e8 <<invalid sloc>> <invalid sloc> implicit __clang_svint32x2_t '__clang_svint32x2_t'
| `-BuiltinType 0x1538413f0 '__clang_svint32x2_t'
|-TypedefDecl 0x1538e1a50 <<invalid sloc>> <invalid sloc> implicit __clang_svint64x2_t '__clang_svint64x2_t'
| `-BuiltinType 0x153841410 '__clang_svint64x2_t'
|-TypedefDecl 0x1538e1ab8 <<invalid sloc>> <invalid sloc> implicit __clang_svuint8x2_t '__clang_svuint8x2_t'
| `-BuiltinType 0x153841430 '__clang_svuint8x2_t'
|-TypedefDecl 0x1538e1b20 <<invalid sloc>> <invalid sloc> implicit __clang_svuint16x2_t '__clang_svuint16x2_t'
| `-BuiltinType 0x153841450 '__clang_svuint16x2_t'
|-TypedefDecl 0x1538e1b88 <<invalid sloc>> <invalid sloc> implicit __clang_svuint32x2_t '__clang_svuint32x2_t'
| `-BuiltinType 0x153841470 '__clang_svuint32x2_t'
|-TypedefDecl 0x1538e1bf0 <<invalid sloc>> <invalid sloc> implicit __clang_svuint64x2_t '__clang_svuint64x2_t'
| `-BuiltinType 0x153841490 '__clang_svuint64x2_t'
|-TypedefDecl 0x1538e1c58 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat16x2_t '__clang_svfloat16x2_t'
| `-BuiltinType 0x1538414b0 '__clang_svfloat16x2_t'
|-TypedefDecl 0x1538e1cc0 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat32x2_t '__clang_svfloat32x2_t'
| `-BuiltinType 0x1538414d0 '__clang_svfloat32x2_t'
|-TypedefDecl 0x1538e1d28 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat64x2_t '__clang_svfloat64x2_t'
| `-BuiltinType 0x1538414f0 '__clang_svfloat64x2_t'
|-TypedefDecl 0x1538e1d90 <<invalid sloc>> <invalid sloc> implicit __clang_svbfloat16x2_t '__clang_svbfloat16x2_t'
| `-BuiltinType 0x153841510 '__clang_svbfloat16x2_t'
|-TypedefDecl 0x1538e2200 <<invalid sloc>> <invalid sloc> implicit __clang_svint8x3_t '__clang_svint8x3_t'
| `-BuiltinType 0x153841530 '__clang_svint8x3_t'
|-TypedefDecl 0x1538e2268 <<invalid sloc>> <invalid sloc> implicit __clang_svint16x3_t '__clang_svint16x3_t'
| `-BuiltinType 0x153841550 '__clang_svint16x3_t'
|-TypedefDecl 0x1538e22d0 <<invalid sloc>> <invalid sloc> implicit __clang_svint32x3_t '__clang_svint32x3_t'
| `-BuiltinType 0x153841570 '__clang_svint32x3_t'
|-TypedefDecl 0x1538e2338 <<invalid sloc>> <invalid sloc> implicit __clang_svint64x3_t '__clang_svint64x3_t'
| `-BuiltinType 0x153841590 '__clang_svint64x3_t'
|-TypedefDecl 0x1538e23a0 <<invalid sloc>> <invalid sloc> implicit __clang_svuint8x3_t '__clang_svuint8x3_t'
| `-BuiltinType 0x1538415b0 '__clang_svuint8x3_t'
|-TypedefDecl 0x1538e2408 <<invalid sloc>> <invalid sloc> implicit __clang_svuint16x3_t '__clang_svuint16x3_t'
| `-BuiltinType 0x1538415d0 '__clang_svuint16x3_t'
|-TypedefDecl 0x1538e2470 <<invalid sloc>> <invalid sloc> implicit __clang_svuint32x3_t '__clang_svuint32x3_t'
| `-BuiltinType 0x1538415f0 '__clang_svuint32x3_t'
|-TypedefDecl 0x1538e24d8 <<invalid sloc>> <invalid sloc> implicit __clang_svuint64x3_t '__clang_svuint64x3_t'
| `-BuiltinType 0x153841610 '__clang_svuint64x3_t'
|-TypedefDecl 0x1538e2540 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat16x3_t '__clang_svfloat16x3_t'
| `-BuiltinType 0x153841630 '__clang_svfloat16x3_t'
|-TypedefDecl 0x1538e25a8 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat32x3_t '__clang_svfloat32x3_t'
| `-BuiltinType 0x153841650 '__clang_svfloat32x3_t'
|-TypedefDecl 0x1538e2610 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat64x3_t '__clang_svfloat64x3_t'
| `-BuiltinType 0x153841670 '__clang_svfloat64x3_t'
|-TypedefDecl 0x1538e2678 <<invalid sloc>> <invalid sloc> implicit __clang_svbfloat16x3_t '__clang_svbfloat16x3_t'
| `-BuiltinType 0x153841690 '__clang_svbfloat16x3_t'
|-TypedefDecl 0x1538e26e0 <<invalid sloc>> <invalid sloc> implicit __clang_svint8x4_t '__clang_svint8x4_t'
| `-BuiltinType 0x1538416b0 '__clang_svint8x4_t'
|-TypedefDecl 0x1538e2748 <<invalid sloc>> <invalid sloc> implicit __clang_svint16x4_t '__clang_svint16x4_t'
| `-BuiltinType 0x1538416d0 '__clang_svint16x4_t'
|-TypedefDecl 0x1538e27b0 <<invalid sloc>> <invalid sloc> implicit __clang_svint32x4_t '__clang_svint32x4_t'
| `-BuiltinType 0x1538416f0 '__clang_svint32x4_t'
|-TypedefDecl 0x1538e2818 <<invalid sloc>> <invalid sloc> implicit __clang_svint64x4_t '__clang_svint64x4_t'
| `-BuiltinType 0x153841710 '__clang_svint64x4_t'
|-TypedefDecl 0x1538e2880 <<invalid sloc>> <invalid sloc> implicit __clang_svuint8x4_t '__clang_svuint8x4_t'
| `-BuiltinType 0x153841730 '__clang_svuint8x4_t'
|-TypedefDecl 0x1538e28e8 <<invalid sloc>> <invalid sloc> implicit __clang_svuint16x4_t '__clang_svuint16x4_t'
| `-BuiltinType 0x153841750 '__clang_svuint16x4_t'
|-TypedefDecl 0x1538e2950 <<invalid sloc>> <invalid sloc> implicit __clang_svuint32x4_t '__clang_svuint32x4_t'
| `-BuiltinType 0x153841770 '__clang_svuint32x4_t'
|-TypedefDecl 0x1538e29b8 <<invalid sloc>> <invalid sloc> implicit __clang_svuint64x4_t '__clang_svuint64x4_t'
| `-BuiltinType 0x153841790 '__clang_svuint64x4_t'
|-TypedefDecl 0x1538e2a20 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat16x4_t '__clang_svfloat16x4_t'
| `-BuiltinType 0x1538417b0 '__clang_svfloat16x4_t'
|-TypedefDecl 0x1538e2a88 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat32x4_t '__clang_svfloat32x4_t'
| `-BuiltinType 0x1538417d0 '__clang_svfloat32x4_t'
|-TypedefDecl 0x1538e2af0 <<invalid sloc>> <invalid sloc> implicit __clang_svfloat64x4_t '__clang_svfloat64x4_t'
| `-BuiltinType 0x1538417f0 '__clang_svfloat64x4_t'
|-TypedefDecl 0x1538e2b58 <<invalid sloc>> <invalid sloc> implicit __clang_svbfloat16x4_t '__clang_svbfloat16x4_t'
| `-BuiltinType 0x153841810 '__clang_svbfloat16x4_t'
|-TypedefDecl 0x1538e2bc0 <<invalid sloc>> <invalid sloc> implicit __SVBool_t '__SVBool_t'
| `-BuiltinType 0x153841830 '__SVBool_t'
|-TypedefDecl 0x1538e2c68 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x1538e2c20 'char *'
|   `-BuiltinType 0x153840ab0 'char'
|-TypedefDecl 0x1538e2cd8 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'
| `-PointerType 0x1538e2c20 'char *'
|   `-BuiltinType 0x153840ab0 'char'
|-FunctionDecl 0x1538e2ed0 <main.m:3:1, line:6:1> line:3:5 used test 'int (int, int)'
| |-ParmVarDecl 0x1538e2d48 <col:10, col:14> col:14 used a 'int'
| |-ParmVarDecl 0x1538e2dc8 <col:17, col:21> col:21 used b 'int'
| `-CompoundStmt 0x1538e30c0 <line:4:1, line:6:1>
|   `-ReturnStmt 0x1538e30b0 <line:5:5, col:23>
|     `-BinaryOperator 0x1538e3090 <col:12, col:23> 'int' '+'
|       |-BinaryOperator 0x1538e3050 <col:12, col:20> 'int' '+'
|       | |-ImplicitCastExpr 0x1538e3020 <col:12> 'int' <LValueToRValue>
|       | | `-DeclRefExpr 0x1538e2fe0 <col:12> 'int' lvalue ParmVar 0x1538e2d48 'a' 'int'
|       | `-ImplicitCastExpr 0x1538e3038 <col:20> 'int' <LValueToRValue>
|       |   `-DeclRefExpr 0x1538e3000 <col:20> 'int' lvalue ParmVar 0x1538e2dc8 'b' 'int'
|       `-IntegerLiteral 0x1538e3070 <col:23> 'int' 10
`-FunctionDecl 0x1538e3130 <line:8:1, line:12:1> line:8:5 main 'int ()'
  `-CompoundStmt 0x1538eef90 <line:9:1, line:12:1>
    |-DeclStmt 0x1538eef48 <line:10:5, col:23>
    | `-VarDecl 0x1538eee00 <col:5, col:22> col:9 a 'int' cinit
    |   `-CallExpr 0x1538eef18 <col:13, col:22> 'int'
    |     |-ImplicitCastExpr 0x1538eef00 <col:13> 'int (*)(int, int)' <FunctionToPointerDecay>
    |     | `-DeclRefExpr 0x1538eee68 <col:13> 'int (int, int)' Function 0x1538e2ed0 'test' 'int (int, int)'
    |     |-IntegerLiteral 0x1538eee88 <col:18> 'int' 1
    |     `-IntegerLiteral 0x1538eeea8 <col:21> 'int' 2
    `-ReturnStmt 0x1538eef80 <line:11:5, col:12>
      `-IntegerLiteral 0x1538eef60 <col:12> 'int' 0
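The return expression of test appears in the dump as two nested BinaryOperator nodes: the inner one adds a and b, the outer one adds the literal 10. A minimal sketch of such a tree in C might look like the following. The node kinds and function names are hypothetical simplifications, not Clang's actual classes, and the implicit cast nodes are left out.

```c
#include <stddef.h>

/* Hypothetical node kinds, loosely mirroring IntegerLiteral,
 * DeclRefExpr (to a parameter) and BinaryOperator '+'. */
enum node_kind { NODE_INT_LITERAL, NODE_PARAM_REF, NODE_ADD };

struct node {
    enum node_kind kind;
    int value;                  /* literal value, or parameter index */
    const struct node *lhs;     /* children, used only by NODE_ADD */
    const struct node *rhs;
};

/* Evaluates the tree bottom-up; params holds the argument values. */
int eval(const struct node *n, const int *params) {
    switch (n->kind) {
    case NODE_INT_LITERAL: return n->value;
    case NODE_PARAM_REF:   return params[n->value];
    case NODE_ADD:         return eval(n->lhs, params) + eval(n->rhs, params);
    }
    return 0; /* not reached for well-formed trees */
}

/* Builds the tree for `a + b + 10`, i.e. (a + b) + 10, and evaluates it. */
int eval_test_body(int a, int b) {
    const struct node ref_a = { NODE_PARAM_REF, 0, NULL, NULL };
    const struct node ref_b = { NODE_PARAM_REF, 1, NULL, NULL };
    const struct node ten   = { NODE_INT_LITERAL, 10, NULL, NULL };
    const struct node inner = { NODE_ADD, 0, &ref_a, &ref_b };
    const struct node outer = { NODE_ADD, 0, &inner, &ten };
    const int params[2] = { a, b };
    return eval(&outer, params);
}
```

The left-leaning shape of the tree reflects the left-to-right associativity of +, which is the same grouping visible in the dump.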
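3. Generating intermediate code (IR)
Once the steps above are complete, IR generation begins. The code generator (Code Generation) traverses the syntax tree top-down and translates it step by step into LLVM IR.
- The following command generates a .ll text file in which you can inspect the IR. For Objective-C code, this step also performs runtime bridging, such as property synthesis and ARC handling.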
clang -S -fobjc-arc -emit-llvm main.m
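Basic IR syntax:
@ global identifier
% local identifier
alloca allocate space on the stack
align memory alignment
i32 32 bits, 4 bytes
store write to memory
load read data
call call a function
ret return
IR can be optimized for Objective-C projects. In Xcode this is usually configured under target - Build Settings - Optimization Level. LLVM's optimization levels are -O0, -O1, -O2, -O3, and -Os (the first character is the uppercase letter O). The following command generates IR with optimization enabled:
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll
Generated IR (this listing carries the optnone attribute, so it is the unoptimized output)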
; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx12.0.0"
; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @test(i32 %0, i32 %1) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
store i32 %0, i32* %3, align 4
store i32 %1, i32* %4, align 4
%5 = load i32, i32* %3, align 4
%6 = load i32, i32* %4, align 4
%7 = add nsw i32 %5, %6
%8 = add nsw i32 %7, 10
ret i32 %8
}
; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 0, i32* %1, align 4
%3 = call i32 @test(i32 1, i32 2)
store i32 %3, i32* %2, align 4
ret i32 0
}
attributes #0 = { noinline nounwind optnone ssp uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+crypto,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+v8.5a,+zcm,+zcz" }
!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7, !8, !9, !10, !11, !12, !13, !14}
!llvm.ident = !{!15}
!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 12, i32 3]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!4 = !{i32 1, !"Objective-C Garbage Collection", i8 0}
!5 = !{i32 1, !"Objective-C Class Properties", i32 64}
!6 = !{i32 1, !"Objective-C Enforce ClassRO Pointer Signing", i8 0}
!7 = !{i32 1, !"wchar_size", i32 4}
!8 = !{i32 1, !"branch-target-enforcement", i32 0}
!9 = !{i32 1, !"sign-return-address", i32 0}
!10 = !{i32 1, !"sign-return-address-all", i32 0}
!11 = !{i32 1, !"sign-return-address-with-bkey", i32 0}
!12 = !{i32 7, !"PIC Level", i32 2}
!13 = !{i32 7, !"uwtable", i32 1}
!14 = !{i32 7, !"frame-pointer", i32 1}
!15 = !{!"Apple clang version 13.1.6 (clang-1316.0.21.2.5)"}
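To relate the unoptimized IR for @test back to source, each IR instruction can be written as one C statement. This is only a reading aid under the assumption that a stack slot behaves like a local variable; the function and variable names are hypothetical.

```c
/* Each local below stands in for one IR value or stack slot of @test. */
int test_unoptimized(int arg0, int arg1) {
    int slot_a;             /* %3 = alloca i32, align 4        */
    int slot_b;             /* %4 = alloca i32, align 4        */
    slot_a = arg0;          /* store i32 %0, i32* %3           */
    slot_b = arg1;          /* store i32 %1, i32* %4           */
    int v5 = slot_a;        /* %5 = load i32, i32* %3          */
    int v6 = slot_b;        /* %6 = load i32, i32* %4          */
    int v7 = v5 + v6;       /* %7 = add nsw i32 %5, %6         */
    int v8 = v7 + 10;       /* %8 = add nsw i32 %7, 10         */
    return v8;              /* ret i32 %8                      */
}
```

The store-then-load round trips through slot_a and slot_b are the kind of redundancy an optimizing build is free to remove.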
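Since Xcode 7, when bitcode is enabled, Apple performs further optimization and produces .bc intermediate code. We generate the .bc file from the optimized IR: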
clang -emit-llvm -c main.ll -o main.bc

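For a detailed explanation, see the official documentation: https://llvm.org/docs/BitCodeFormat.html
III. Backend
In the backend, LLVM optimizes the code through a series of passes, each of which does one specific job, and finally generates assembly code.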
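The idea of a pass can be sketched as a function that takes a program representation and rewrites it in place, with a driver that runs several such functions in order. The toy instruction format and all names below are hypothetical and far simpler than LLVM's real pass infrastructure; the single pass shown is constant folding, and it assumes each operand index refers to an earlier instruction.

```c
#include <stddef.h>

enum opcode { OP_CONST, OP_ADD };

struct inst {
    enum opcode op;
    int a;   /* OP_CONST: the value; OP_ADD: index of the left operand  */
    int b;   /* OP_ADD: index of the right operand; unused for OP_CONST */
};

/* A pass rewrites the instruction list in place and reports
 * how many instructions it changed. */
typedef int (*pass_fn)(struct inst *code, size_t n);

/* Constant folding: an ADD whose operands are both constants
 * is replaced by a single constant. */
int fold_constants(struct inst *code, size_t n) {
    int changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].op != OP_ADD) continue;
        const struct inst *lhs = &code[code[i].a];
        const struct inst *rhs = &code[code[i].b];
        if (lhs->op == OP_CONST && rhs->op == OP_CONST) {
            int value = lhs->a + rhs->a;   /* compute before overwriting */
            code[i].op = OP_CONST;
            code[i].a = value;
            code[i].b = 0;
            changed++;
        }
    }
    return changed;
}

/* Runs each pass once, in order, and returns the total number of changes. */
int run_passes(struct inst *code, size_t n, const pass_fn *passes, size_t count) {
    int total = 0;
    for (size_t p = 0; p < count; p++) {
        total += passes[p](code, n);
    }
    return total;
}
```

Keeping each pass small and independent is what lets a pipeline be reordered or extended without touching the other passes.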
Generating assembly code
- We generate the assembly code from the final .bc or .ll file
clang -S -fobjc-arc main.ll -o main.s
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 12, 0 sdk_version 12, 3
.globl _test ; -- Begin function test
.p2align 2
_test: ; @test
.cfi_startproc
; %bb.0:
add w8, w0, #10
add w0, w8, w1
ret
.cfi_endproc
; -- End function
.globl _main ; -- Begin function main
.p2align 2
_main: ; @main
.cfi_startproc
; %bb.0:
mov w0, #0
ret
.cfi_endproc
; -- End function
.section __DATA,__objc_imageinfo,regular,no_dead_strip
L_OBJC_IMAGE_INFO:
.long 0
.long 64
.subsections_via_symbols
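IV. Generating the object file
To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file.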
clang -fmodules -c main.s -o main.o

You can use the nm command to inspect the symbols in main.o
$xcrun nm -nm main.o
0000000000000000 (__TEXT,__text) external _test
0000000000000000 (__TEXT,__text) non-external ltmp0
000000000000000c (__TEXT,__text) external _main
0000000000000014 (__DATA,__objc_imageinfo) non-external ltmp1
0000000000000020 (__LD,__compact_unwind) non-external ltmp2
undefined means the symbol cannot yet be found in the current file
external means the symbol is accessible from outside the file
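The three symbol states can be provoked from C source. In the sketch below, which uses hypothetical function names, a static function has internal linkage, a plain function has external linkage, and a call into the C library leaves a reference that the object file itself cannot resolve. How nm reports each one depends on the compiler and optimization level; for instance, the static helper may be inlined away entirely.

```c
#include <stdlib.h>

/* Internal linkage: expected to show up as non-external, if at all. */
static int add_one(int x) {
    return x + 1;
}

/* External linkage: other object files can refer to this symbol. */
int parse_plus_one(const char *text) {
    /* strtol is defined in the C library, not in this file, so the
     * object file only records an unresolved reference to it. */
    return add_one((int)strtol(text, NULL, 10));
}
```

The unresolved reference is filled in later, when the linker combines this object file with the library that defines strtol.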
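V. Linking and binding
Linking mainly links the required dynamic and static libraries and produces an executable, where
- static libraries are merged into the executable
- dynamic libraries remain separate
The linker links the .o files produced by compilation with .dylib and .a files to generate a Mach-O file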
clang main.o -o main
Inspect the symbols after linking
$xcrun nm -nm main
The result is shown below
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003fa4 (__TEXT,__text) external _test
0000000100003fb0 (__TEXT,__text) external _main
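Binding mainly produces an executable in the Mach-O format that matches the target architecture
You can check what format main is in; at this point it is a Mach-O executable
Summary
The LLVM compilation pipeline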
