A Deep Dive into String and the String Constant Pool
Introduction
While studying ThreadLocal I came across WeakReference, so I ran the following tests on it.
Test 1:
String s = "sd";
WeakReference<String> weakReference = new WeakReference<>(s);
System.out.println(weakReference.get());
System.gc(); // not cleared
System.out.println(weakReference.get()); // still returns sd
Test 2:
String s = new String("ds");
WeakReference<String> weakReference = new WeakReference<>(s);
System.out.println(weakReference.get());
System.gc(); // not cleared
System.out.println(weakReference.get()); // still returns ds
Test 3:
WeakReference<String> weakReference = new WeakReference<>("dsa");
System.out.println(weakReference.get());
System.gc(); // not cleared
System.out.println(weakReference.get()); // still returns dsa
Test 4:
WeakReference<String> weakReference = new WeakReference<>(new String("dsa"));
System.out.println(weakReference.get());
System.gc(); // cleared
System.out.println(weakReference.get()); // returns null
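Test 1: s on the stack points to the interned sd in the string constant pool, which lives on the heap. s is a strong reference, and the literal is also kept reachable by the loaded class that declares it, so the GC cannot reclaim it. The reason is reachability, not which generation gets collected: System.gc() normally requests a full collection, not just a sweep of eden and survivor.
Test 2: s on the stack points to a String object on the heap whose value array is shared with the interned ds (a char[] up to JDK 8, a byte[] from JDK 9 on). The GC would happily reclaim this object, but the strong reference s keeps it alive.
Test 3: dsa is a literal, so it sits in the constant pool and stays reachable through the loaded class; it cannot be cleared.
Test 4: dsa goes into the constant pool, and a new String object is created on the heap whose value array is shared with the pooled dsa. Nothing except the WeakReference points to this object, so it can be cleared.
I reached these conclusions after reading many blog posts and writing my own code to inspect the constant pool. They are not guaranteed to be correct, so please point out any mistakes.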
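The tests above can be combined into one small program. This is a minimal sketch, not a definitive experiment: the class and method names (WeakStringDemo, literalRef, freshRef, requestGc) are made up for illustration, and because System.gc() is only a request, the result for the unreferenced object may vary between JVMs and collectors.

```java
import java.lang.ref.WeakReference;

class WeakStringDemo {
    // Test 3: the referent is a literal, kept reachable by this loaded class.
    static WeakReference<String> literalRef() {
        return new WeakReference<>("dsa");
    }

    // Test 4: the referent is a fresh String object that nothing else references.
    static WeakReference<String> freshRef() {
        return new WeakReference<>(new String("dsa"));
    }

    // System.gc() is only a hint, so ask a few times and give the collector a moment.
    static void requestGc() {
        for (int i = 0; i < 5; i++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        String strong = new String("ds"); // Test 2: a strong reference to a heap object
        WeakReference<String> held = new WeakReference<>(strong);
        WeakReference<String> literal = literalRef();
        WeakReference<String> fresh = freshRef();

        requestGc();

        System.out.println("literal referent: " + literal.get());
        System.out.println("strongly held referent: " + held.get());
        // Typically null on HotSpot, but not guaranteed by the specification.
        System.out.println("unreferenced referent: " + fresh.get());
        // Use 'strong' after the GC request so it is clearly still reachable above.
        System.out.println("strong length: " + strong.length());
    }
}
```

Only the first two results follow from reachability rules; the third depends on the collector actually running.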
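Inspecting the constants in the constant pool
NO0b's article (linked below) states that
String s = "abc" + "def"; puts "abcdef" directly into the string constant pool, and puts neither "abc" nor "def" there. The article gives no verification, though. I looked for many ways to check claims like this, because without verification you never know who is right. In the end I did it by writing code that parses the constant pool of the compiled class file.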
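import java.util.ArrayList;
import java.util.List;
public class ConstantPool {
    // Offset of constant_pool_count in a class file: the magic number CA FE BA BE takes
    // four bytes, and the minor and major version numbers take another four.
    private static final int CONSTANT_POOL_COUNT_INDEX = 8;
    // Tag of a CONSTANT_Utf8_info entry.
    private static final int CONSTANT_UTF8_INFO = 1;
    // Tags of the two entry types that occupy two constant pool indices.
    private static final int CONSTANT_LONG_INFO = 5;
    private static final int CONSTANT_DOUBLE_INFO = 6;
    // Total length in bytes (tag included) of each fixed-size entry; the array index is the tag.
    // See the table on page 169 of 《深入理解Java虚拟机》 (2nd edition); the values are on page 172.
    // -1 marks tags without a fixed length: tags 0, 2, 13 and 14 are unused, and tag 1 is
    // CONSTANT_Utf8_info, which is variable-length. Tags 15-20 (MethodHandle, MethodType,
    // Dynamic, InvokeDynamic, Module, Package) come from later class file versions.
    private static final int[] CONSTANT_ITEM_LENGTH =
            {-1, -1, -1, 5, 5, 9, 9, 3, 3, 5, 5, 5, 5, -1, -1, 4, 3, 5, 5, 3, 3};
    // One byte.
    private static final int u1 = 1;
    // Two bytes.
    private static final int u2 = 2;
    private byte[] classByte;
    public ConstantPool(byte[] classByte) {
        this.classByte = classByte;
    }
    // Reads bytes 8-9 of the class file as an int (see the figure on page 168).
    // 00 16 means 22, so the valid indices are 1-21 and there are 21 constants.
    private int getConstantPoolCount() {
        return ByteUtils.bytes2Int(classByte, CONSTANT_POOL_COUNT_INDEX, u2);
    }
    // Returns every CONSTANT_Utf8_info entry in the constant pool, in order.
    public List<String> getStringConstant() {
        List<String> list = new ArrayList<>();
        // constant_pool_count, read from bytes 8-9 of the class file.
        int cpc = getConstantPoolCount();
        // Jump to the first entry; +u2 skips the count field itself.
        int offset = CONSTANT_POOL_COUNT_INDEX + u2;
        // Index 0 is unused, so the entries are numbered 1..cpc-1.
        for (int i = 1; i < cpc; i++) {
            // The tag takes u1 bytes; read the tag of the current entry.
            int tag = ByteUtils.bytes2Int(classByte, offset, u1);
            if (tag == CONSTANT_UTF8_INFO) {
                // The length field of a Utf8 entry takes u2 bytes.
                int len = ByteUtils.bytes2Int(classByte, offset + u1, u2);
                // Move to the bytes of the Utf8 entry.
                offset += (u1 + u2);
                // Decode len bytes from here as a String.
                String str = ByteUtils.bytes2String(classByte, offset, len);
                list.add(str);
                offset += len;
            } else {
                if (tag >= CONSTANT_ITEM_LENGTH.length || CONSTANT_ITEM_LENGTH[tag] < 0) {
                    throw new IllegalStateException("Unknown constant pool tag " + tag + " at offset " + offset);
                }
                // Skip the whole fixed-size entry.
                offset += CONSTANT_ITEM_LENGTH[tag];
                // Long and Double entries take up two constant pool indices.
                if (tag == CONSTANT_LONG_INFO || tag == CONSTANT_DOUBLE_INFO) {
                    i++;
                }
            }
        }
        return list;
    }
}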
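import java.nio.charset.StandardCharsets;
public class ByteUtils {
    // Converts len bytes of the array, starting at index start, into an int (big-endian).
    public static int bytes2Int(byte[] bytes, int start, int len) {
        int sum = 0;
        int end = start + len;
        for (int i = start; i < end; i++) {
            int n = bytes[i] & 0xff;
            n <<= (--len) * 8;
            sum += n;
        }
        return sum;
    }
    // Converts an int into len bytes (big-endian).
    public static byte[] int2Bytes(int value, int len) {
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[len - i - 1] = (byte) ((value >> 8 * i) & 0xff);
        }
        return bytes;
    }
    // Decodes len bytes starting at index start. Class files store modified UTF-8, which
    // matches UTF-8 for ordinary text; an explicit charset avoids the platform default.
    public static String bytes2String(byte[] bytes, int start, int len) {
        return new String(bytes, start, len, StandardCharsets.UTF_8);
    }
}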
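As a quick sanity check of the big-endian conversion used here, the following sketch re-implements the two helpers in a standalone class (BigEndianDemo, toInt and toBytes are illustrative names, not part of the original code) and feeds them the 00 16 example from the comment on the constant pool count.

```java
class BigEndianDemo {
    // Combine len bytes starting at start into an int, most significant byte first.
    static int toInt(byte[] bytes, int start, int len) {
        int value = 0;
        for (int i = start; i < start + len; i++) {
            value = (value << 8) | (bytes[i] & 0xff);
        }
        return value;
    }

    // Split an int into len bytes, most significant byte first.
    static byte[] toBytes(int value, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[len - 1 - i] = (byte) (value >> (8 * i));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] count = {0x00, 0x16};
        System.out.println(toInt(count, 0, 2)); // prints 22
        byte[] roundTrip = toBytes(22, 2);
        System.out.println(roundTrip[0] + " " + roundTrip[1]); // prints 0 22
    }
}
```

The mask with 0xff matters because Java bytes are signed; without it a byte such as 0xCA would be sign-extended to a negative int.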
public class TestString {
String s = "abc"+ "def";
}
import java.nio.file.Files;
import java.nio.file.Paths;
public class Test {
    public static void main(String[] args) throws Exception {
        // Files.readAllBytes reads the whole file; a single InputStream.read call is not guaranteed to.
        byte[] bytes = Files.readAllBytes(Paths.get("F:\\lry\\project\\idea\\basic\\target\\classes\\com\\lry\\basic\\jvm\\constantPool\\TestString.class"));
        ConstantPool constantPool = new ConstantPool(bytes);
        System.out.println(constantPool.getStringConstant());
    }
}
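The scanning logic can also be tried without a compiled class on disk. The sketch below is an alternative take, not the code above: it builds a tiny hand-made constant pool in memory and walks it with DataInputStream, whose readUTF method reads a two-byte length followed by modified UTF-8, the same layout as the body of a CONSTANT_Utf8_info entry. The class name Utf8EntryScanner and the sample bytes are assumptions made for illustration; the sample is only a class file header plus constant pool, not a loadable class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class Utf8EntryScanner {
    // Bytes following the tag for each fixed-size entry, indexed by tag (0 = none/unknown).
    private static final int[] BODY_LENGTH =
            {0, 0, 0, 4, 4, 8, 8, 2, 2, 4, 4, 4, 4, 0, 0, 3, 2, 4, 4, 2, 2};

    static List<String> utf8Entries(byte[] classBytes) {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IOException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort(); // constant_pool_count
            // Entries are numbered 1..count-1.
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                if (tag == 1) {
                    result.add(in.readUTF());
                } else if (tag < BODY_LENGTH.length && BODY_LENGTH[tag] > 0) {
                    in.skipBytes(BODY_LENGTH[tag]);
                    if (tag == 5 || tag == 6) {
                        i++; // Long and Double occupy two indices
                    }
                } else {
                    throw new IOException("unknown tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // A hand-made header and constant pool: #1 Utf8 "abcdef", #2-#3 Long, #4 Utf8 "s".
    static byte[] sampleBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);  // minor version
            out.writeShort(52); // major version
            out.writeShort(5);  // constant_pool_count: indices 1..4
            out.writeByte(1);
            out.writeUTF("abcdef");
            out.writeByte(5);
            out.writeLong(42L);
            out.writeByte(1);
            out.writeUTF("s");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(utf8Entries(sampleBytes())); // prints [abcdef, s]
    }
}
```

The Long entry in the sample is there on purpose: a scanner that does not skip the extra index would try to read one entry too many.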
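Examples
First, note that the string constant pool has lived in the heap since JDK 1.7.
Examples that can be determined at compile time
Example 1:
String s = "abc" + "def"; gives the following constant pool contents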
[s, Ljava/lang/String;, <init>, ()V, Code, LineNumberTable, LocalVariableTable, this,
Lcom/lry/basic/jvm/constantPool/TestString;, SourceFile, TestString.java, abcdef,
com/lry/basic/jvm/constantPool/TestString, java/lang/Object]
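The output contains s (the field name, stored as a Utf8 entry) and abcdef, but neither abc nor def. Constant operands are concatenated by the compiler at compile time, which is why abc and def are absent.
Example 2:
String s3 = new String("a3" + "a33" + "a333");
a3, a33 and a333 are all constants known at compile time, so only a3a33a333 is put into the constant pool.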
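The folding in Examples 1 and 2 can also be observed at run time with a reference comparison, since the language specification requires constant String expressions to be interned. A minimal sketch follows; LiteralFoldingDemo and its method names are illustrative only.

```java
class LiteralFoldingDemo {
    // Example 1: the folded constant is the same object as the literal "abcdef".
    static boolean example1() {
        String s = "abc" + "def";
        return s == "abcdef";
    }

    // Example 2, argument only: three literals fold into one constant.
    static boolean example2Folded() {
        String folded = "a3" + "a33" + "a333";
        return folded == "a3a33a333";
    }

    // Example 2, whole statement: new String(...) is a separate heap object
    // with the same contents as the pooled constant.
    static boolean example2NewObject() {
        String s3 = new String("a3" + "a33" + "a333");
        return s3 != "a3a33a333" && s3.equals("a3a33a333");
    }

    public static void main(String[] args) {
        System.out.println(example1());          // prints true
        System.out.println(example2Folded());    // prints true
        System.out.println(example2NewObject()); // prints true
    }
}
```

Comparing strings with == is used here only to expose object identity; ordinary code should compare contents with equals.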
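Example 3:
final String s = "hello";
final String ss = " world";
String sss = s + ss;
hello, " world" and hello world all go into the constant pool. A final String initialized with a literal is a constant variable, so s + ss is a constant expression and sss is already hello world at compile time.
Example 4:
final String s = "hello";
String ss = s + " world";
hello and hello world go into the constant pool. " world" is never declared on its own; it appears only as an operand of the folded expression, so only the result is stored.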
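The effect of final in Examples 3 and 4 can be checked the same way. In this sketch (FinalFoldingDemo is an illustrative name) both concatenations are constant expressions, so each result is the very same object as the literal "hello world".

```java
class FinalFoldingDemo {
    // Example 3: both operands are final constant variables.
    static boolean example3() {
        final String s = "hello";
        final String ss = " world";
        String sss = s + ss;
        return sss == "hello world";
    }

    // Example 4: one final constant variable plus one literal.
    static boolean example4() {
        final String s = "hello";
        String ss = s + " world";
        return ss == "hello world";
    }

    public static void main(String[] args) {
        System.out.println(example3()); // prints true
        System.out.println(example4()); // prints true
    }
}
```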
Examples that cannot be determined at compile time
Example 1:
String s = new String("abc") + new String("def"); gives the following constant pool contents
[s, Ljava/lang/String;, <init>, ()V, Code, LineNumberTable, LocalVariableTable, this,
Lcom/lry/basic/jvm/constantPool/TestString;, SourceFile, TestString.java, java/lang/StringBuilder,
java/lang/String, abc, def, com/lry/basic/jvm/constantPool/TestString, java/lang/Object,
(Ljava/lang/String;)V, append, (Ljava/lang/String;)Ljava/lang/StringBuilder;, toString, ()Ljava/lang/String;]
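Only abc and def appear, not abcdef. StringBuilder shows up as well, which tells us the concatenation is implemented with StringBuilder. This is what javac generates for targets up to Java 8; from Java 9 on it emits an invokedynamic call to StringConcatFactory instead, so the constant pool contents look different there.
Example 2:
String s = "hello";
String ss = " world";
String sss = s + ss; // also implemented with StringBuilder
Only hello and " world" are put into the constant pool. For this case I drew a diagram of the JVM memory layout (image not included here).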
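Example 3:
StringBuilder sb = new StringBuilder().append("hello").append(" world");
Only hello and " world" are put into the constant pool; there is no hello world.
Example 4:
final String s = "hello";
String s1 = " world";
String ss = s + s1; // implemented with StringBuilder
hello and " world" are put into the constant pool; there is no hello world.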
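Example 5:
String s = "hello";
String ss = s + " world"; // implemented with StringBuilder
hello and " world" are put into the constant pool; there is no hello world.
All five examples are implemented with StringBuilder, and in every case only the operands are stored in the constant pool, not the result.
Use your imagination and verify any other puzzling cases with the code here. To keep this post short I have left out intern; the next post will cover intern in detail, with several examples to make sure it sticks.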
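A run-time counterpart to these examples: when any operand is not a compile-time constant, the concatenation produces a new String object, so it equals "hello world" in content but is not the same object as that literal. This is a sketch with illustrative names (RuntimeConcatDemo, equalButDistinct). The literal "hello world" does appear in this demo class's own constant pool, because the comparison mentions it.

```java
class RuntimeConcatDemo {
    // Example 2: two non-final variables.
    static String example2() {
        String s = "hello";
        String ss = " world";
        return s + ss;
    }

    // Example 3: explicit StringBuilder.
    static String example3() {
        return new StringBuilder().append("hello").append(" world").toString();
    }

    // Example 4: one final variable, one non-final variable.
    static String example4() {
        final String s = "hello";
        String s1 = " world";
        return s + s1;
    }

    // Example 5: a non-final variable plus a literal.
    static String example5() {
        String s = "hello";
        return s + " world";
    }

    // True when the result has the right contents but is not the pooled literal.
    static boolean equalButDistinct(String result) {
        return result.equals("hello world") && result != "hello world";
    }

    public static void main(String[] args) {
        System.out.println(equalButDistinct(example2())); // prints true
        System.out.println(equalButDistinct(example3())); // prints true
        System.out.println(equalButDistinct(example4())); // prints true
        System.out.println(equalButDistinct(example5())); // prints true
    }
}
```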
References
《深入理解Java虚拟机》 (2nd edition), 周志明
Meituan: String#intern
Constant pool
String#intern