Detecting and converting file character encodings on Linux

Tools

The command name enca is short for Extremely Naive Charset Analyser.

On Ubuntu:

# sudo apt-get install enca

Use

1. Check the encoding of the original file

# enca test.txt

Simplified Chinese National Standard; GB2312
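To try this check without a legacy-encoded file at hand, a sample can be generated with iconv. This is only a minimal sketch: the file name sample_gb.txt and the sample text are assumptions (the original test.txt is not shown), and it presumes an iconv build that supports GB2312. Detection on such a short text may not be reliable.

```shell
#!/bin/bash
# Build a small GB2312-encoded file (sample_gb.txt is a hypothetical name).
printf '你好，世界\n' | iconv -f UTF-8 -t GB2312 > sample_gb.txt

# Five two-byte characters plus the newline should give 11 bytes.
wc -c < sample_gb.txt

# Run the detector only if enca is installed; report instead of aborting
# when enca cannot identify the file.
if command -v enca >/dev/null 2>&1; then
    enca sample_gb.txt || echo "enca could not identify the sample"
fi
```

If enca reports that it cannot identify the sample, naming the language explicitly, as in the next step, may help.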

2. Restrict the candidate charsets by language

# enca --list languages

belarusian: CP1251 IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855 KOI8-U

bulgarian: CP1251 ISO-8859-5 IBM855 maccyr ECMA-113

czech: ISO-8859-2 CP1250 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK

estonian: ISO-8859-4 CP1257 IBM775 ISO-8859-13 macce baltic

croatian: CP1250 ISO-8859-2 IBM852 macce CORK

hungarian: ISO-8859-2 CP1250 IBM852 macce CORK

lithuanian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic

latvian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic

polish: ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK

russian: KOI8-R CP1251 ISO-8859-5 IBM866 maccyr

slovak: CP1250 ISO-8859-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK

slovene: ISO-8859-2 CP1250 IBM852 macce CORK

ukrainian: CP1251 IBM855 ISO-8859-5 CP1125 KOI8-U maccyr

chinese: GBK BIG5 HZ

none:

# enca -L chinese test.txt

Simplified Chinese National Standard; GB2312
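Without -L, enca infers the language from the current locale, so naming the language makes the result independent of the environment. To look up which charsets a given language covers, the listing above can be filtered. The sketch below is one possible way to do that; charsets_for is a hypothetical helper name, and the two sample lines are copied from the listing so that the example runs without enca installed.

```shell
#!/bin/bash
# charsets_for LANGUAGE: read `enca --list languages` output on stdin and
# print the charsets listed for LANGUAGE (hypothetical helper, not part of enca).
charsets_for() {
    awk -v lang="$1" -F': *' '
        { name = $1; sub(/^[ \t]+/, "", name) }   # tolerate leading padding
        name == lang { print $2 }
    '
}

# Two lines copied from the listing above.
listing='russian: KOI8-R CP1251 ISO-8859-5 IBM866 maccyr
chinese: GBK BIG5 HZ'

printf '%s\n' "$listing" | charsets_for chinese   # prints: GBK BIG5 HZ
```

With enca installed, the same helper can be fed the live listing: enca --list languages | charsets_for chinese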

3. Convert with enca

# enca -x utf8 test.txt

# enca -x UTF8 -L chinese test.txt

Universal transformation format 8 bits; UTF-8

Note: enca -x overwrites the source file in place, so back up the file before converting.
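One way to follow this advice is to wrap the conversion so that a copy is always made first. The function below is a sketch under assumptions: convert_with_backup and the .bak suffix are hypothetical names, and the default language chinese simply mirrors the earlier examples.

```shell
#!/bin/bash
# convert_with_backup FILE [LANGUAGE]
# Copy FILE to FILE.bak, then convert FILE in place to UTF-8 with enca.
# If the conversion fails (or enca is missing), the original is restored.
convert_with_backup() {
    local file=$1 lang=${2:-chinese}

    if [[ ! -f $file ]]; then
        echo "not a regular file: $file" >&2
        return 1
    fi

    # Keep an untouched copy next to the original.
    cp -p -- "$file" "$file.bak" || return 1

    if ! enca -L "$lang" -x utf8 "$file"; then
        echo "conversion failed, restoring $file" >&2
        mv -f -- "$file.bak" "$file"
        return 1
    fi
}
```

A call such as convert_with_backup test.txt leaves test.txt.bak behind on success, so the original bytes can still be recovered if the detected encoding turns out to be wrong.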

Script

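#!/bin/bash
# auth : green
# date : 2019-08-02
# func : convert a document to UTF-8 with enca
# vers : 0.0

if [[ -f "$1" ]]; then
    # Print the encoding detected before conversion.
    enca "$1"
    echo "conv $1 to utf8"
    # Convert in place; this overwrites the source file.
    enca -x utf8 "$1"
    echo "conv over!"
fi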
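The script handles one file per run. A possible extension to several files is sketched below; convert_all is a hypothetical name, the -L chinese option is an assumption carried over from the earlier examples, and the conversion still overwrites each file in place.

```shell
#!/bin/bash
# convert_all FILE...
# Convert every regular file given on the command line to UTF-8.
# Anything that is not a regular file is skipped with a message on stderr.
# Returns 1 if at least one conversion failed, 0 otherwise.
convert_all() {
    local f failed=0
    for f in "$@"; do
        if [[ ! -f $f ]]; then
            echo "skip $f (not a regular file)" >&2
            continue
        fi
        echo "conv $f to utf8"
        if ! enca -L chinese -x utf8 "$f"; then
            echo "failed: $f" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

It can be called with a glob, for example convert_all ./*.txt, after backing up the files.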