The Art of Java Programming: From Beginner to Mastery

Published: 2024-04-17 23:36

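1. Encoding

What is encoding?

A computer stores nothing but binary data, yet what we see on screen are characters such as 中国, a, or 1.

The computer never stores the characters themselves. Instead, every character is assigned a number according to an agreed rule. When the user types a, the number that corresponds to a is what gets stored. This mapping from characters to numbers is called encoding.

A computer can only work with binary data.

To let computers handle the scripts of different countries, each character is represented by a number in a one-to-one mapping. The resulting table is called a code table (also known as a character set or charset).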

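Take the Chinese character 中 as an example.

In UTF-8 it is encoded as:

utf-8 --> E4 B8 AD

In GBK the same character has a different encoding:

gbk --> D6 D0

Clearly, the same character maps to different numbers in different encodings.

Different countries and regions also use different code tables:

GBK is used in mainland China.

Big5 is used for Traditional Chinese in Taiwan, so Big5 generally cannot represent characters that exist only in Simplified Chinese.

ASCII is the American Standard Code for Information Interchange.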
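
The idea that every character is just a number can be checked directly in Java. The following is a minimal sketch; the class name CodePointDemo and the helper codePointOf are hypothetical names chosen for illustration. The character 中 is written as the Unicode escape \u4e2d so that the source file compiles the same way regardless of the encoding it is saved in.

```java
class CodePointDemo {
    // Returns the number (Unicode code point) assigned to the first character.
    static int codePointOf(String text) {
        return text.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(codePointOf("a"));      // 97
        System.out.println(codePointOf("1"));      // 49
        System.out.println(codePointOf("\u4e2d")); // 20013, i.e. U+4E2D
    }
}
```

These numbers are Unicode code points. How a code point is turned into bytes depends on the encoding, which is why the same character has different byte values in UTF-8 and GBK.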

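1.1. Code tables

Common code tables:

ASCII: American Standard Code for Information Interchange. It uses 7 bits of a single byte.

ISO8859-1: the Latin code table for Western European languages, using all 8 bits of a single byte. It is also called Latin-1. ASCII covers only English letters, digits, and symbols and leaves half of the 256 single-byte values unused, so ISO8859-1 builds on ASCII and adds 96 letters and symbols in the previously empty range 0xA0-0xFF.

These additions serve Latin-alphabet languages that use diacritics, such as German and French. ISO8859-1 is therefore still a single-byte encoding, just more complete than ASCII.

GB2312: the Chinese code table used in mainland China.

GBK: an extension of GB2312 that adds many more Chinese characters and symbols.

Unicode: the international standard that assigns a number (code point) to the characters of virtually all scripts. Java uses Unicode internally: a char is a 16-bit UTF-16 code unit, and characters outside the Basic Multilingual Plane are represented by a pair of chars.

UTF-8: a variable-length encoding of Unicode that uses one to four bytes per character.

(The ones we will meet most often are iso8859-1, gbk, and utf-8.)

Looking at these tables, the Chinese character 中 clearly has no encoding in iso8859-1. A character may also have different encodings in two code tables. For example, Big5 and GBK overlap: some characters are written the same way in Simplified and Traditional Chinese and so appear in both tables, but their numbers differ from one table to the other.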
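
Whether a code table contains a given character can be tested with a CharsetEncoder. The sketch below is illustrative only; the class name CanEncodeDemo and the helper canEncode are hypothetical. It also shows what String.getBytes does with a character the charset cannot represent: it silently substitutes the charset's replacement byte, which for ISO-8859-1 is the question mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CanEncodeDemo {
    // True if the charset has a mapping for every character of the text.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character used as the running example
        System.out.println(canEncode(StandardCharsets.ISO_8859_1, zhong)); // false
        System.out.println(canEncode(StandardCharsets.UTF_8, zhong));      // true

        // Encoding an unmappable character does not fail; it yields '?' (0x3F).
        byte[] bytes = zhong.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(bytes.length + " byte, value " + bytes[0]);     // 1 byte, value 63
    }
}
```

Because the substitution is silent, data written this way is lost for good: decoding the file later can only ever produce a question mark.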

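For example, suppose the text

中国

is saved with gbk, which stores it as the bytes D6 D0 B9 FA. If the file is then opened as Big5,

the result may be garbage such as ? ...

The same bytes stand for different characters in different encodings.

Clearly, data must be read with the same encoding that was used to write it.

ISO8859-1: one byte per character.

GBK: one byte for ASCII characters and two bytes for Chinese characters, so it is ASCII plus Chinese characters.

UTF-8: the widely promoted universal encoding. It is variable-length, using 1 to 4 bytes per character: English letters take 1 byte and most Chinese characters take 3 bytes, which saves space for mostly-English text.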
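
The byte counts listed above can be measured rather than memorized. This is a small sketch under stated assumptions: the class name EncodedLengthDemo and the helper encodedLength are hypothetical, and GBK is looked up by name and guarded with Charset.isSupported because it is not one of the charsets every Java runtime is required to provide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    // Number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println(encodedLength("a", utf8));            // 1
        System.out.println(encodedLength("\u00e9", utf8));       // 2 (e with acute accent)
        System.out.println(encodedLength("\u4e2d", utf8));       // 3
        System.out.println(encodedLength("\uD83D\uDE00", utf8)); // 4 (U+1F600, outside the BMP)
        System.out.println(encodedLength("\u00e9", StandardCharsets.ISO_8859_1)); // 1

        // GBK is optional in some runtimes, so check before using it.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            System.out.println(encodedLength("a", gbk));      // 1
            System.out.println(encodedLength("\u4e2d", gbk)); // 2
        }
    }
}
```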

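1.2. Encoding:

String ---> byte array

The getBytes() method of the String class performs encoding: it converts a string into the corresponding bytes, and it can take the code table to use. If no code table is given, the method uses the JVM's default charset, which traditionally follows the operating system's settings.

Note: on Windows systems in mainland China the default encoding has traditionally been GBK. A Java program can query the default with System.getProperty("file.encoding").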
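
Since the default differs between machines, it helps to print it and to compare the no-argument getBytes() with an explicit charset. The following is a sketch only; the class name DefaultCharsetDemo and its helpers are hypothetical. Charset.defaultCharset() is the API counterpart of the file.encoding property. Note that from JDK 18 onward the default charset is UTF-8 unless configured otherwise, so the GBK default described above mainly applies to older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {
    // Encodes with an explicitly named charset; the result is the same on every machine.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // The no-argument getBytes() is shorthand for encoding with the default charset.
    static boolean implicitMatchesDefault(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String zhong = "\u4e2d";
        System.out.println(Arrays.toString(encodeUtf8(zhong))); // [-28, -72, -83]
        System.out.println(implicitMatchesDefault(zhong));      // true
    }
}
```

Passing the charset explicitly is usually the safer habit, because the program then behaves the same no matter where it runs.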

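1.3. Decoding:

byte array ---> String

Decoding is done by the String constructors.

String(byte[] bytes) uses the default code table.

String(byte[] bytes, String charsetName) uses the specified code table.

Note: decode with the same character set (code table) that was used for encoding; otherwise the result is very likely to be garbled. The exception is a compatible character set, such as decoding GB2312 bytes as GBK.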

// Encoding and decoding.
import java.util.Arrays;

public class EncodeDecodeDemo {
    public static void main(String[] args) throws Exception {
        String value = System.getProperty("file.encoding");
        System.out.println("Default encoding: " + value);
        String str = "中";

        // Encoding
        byte[] bytes = str.getBytes();         // default charset
        byte[] bytes2 = str.getBytes("gbk");   // d6d0
        byte[] bytes3 = str.getBytes("utf-8"); // e4b8ad
        System.out.println(Arrays.toString(bytes));  // [-42, -48] when the default is GBK
        System.out.println(Arrays.toString(bytes2)); // [-42, -48]
        System.out.println(Arrays.toString(bytes3)); // [-28, -72, -83]

        // Decoding
        // Encoded as GBK, decoded as UTF-8: garbled.
        String str2 = new String(bytes2, "utf-8");
        System.out.println(str2);
        // Encoded as UTF-8, decoded as GBK: garbled.
        str2 = new String(bytes3, "gbk");
        System.out.println(str2);
        // GBK is a superset of GB2312, so this round trip works.
        str = new String("中国".getBytes("gb2312"), "gbk");
        System.out.println(str);
    }
}

A file can be saved in any encoding, but it must be decoded with the same encoding that was used to save it.
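
The String constructors never complain about a wrong code table; they quietly insert replacement characters. When it matters to detect a mismatch instead of displaying garbage, a CharsetDecoder can be told to report errors. This is a minimal sketch; the class name StrictDecodeDemo and the helper decodeStrict are hypothetical. The byte values D6 D0 are the GBK encoding of 中 given earlier, written out literally so that the example does not depend on GBK being installed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {
    // Decodes strictly: throws instead of substituting replacement characters.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0}; // GBK bytes of U+4E2D
        try {
            decodeStrict(gbkBytes, StandardCharsets.UTF_8);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            // Reached here: D6 D0 is not a valid UTF-8 sequence.
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```

A strict decoder only catches byte sequences that are invalid in the target charset. Bytes that happen to be valid in both encodings still decode to the wrong characters without any error.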

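Character streams do the encoding and decoding for us automatically. When we write a Chinese character, the character stream encodes it and stores the bytes; when we read, the character stream decodes the bytes and we get the character back. Underneath, the file holds only binary data.

When copying an image, however, the data is pure binary and does not represent characters, so a code table cannot convert it meaningfully.

Drawbacks of character streams:

1. They cannot be used to copy images or video.

2. Files should be copied with byte streams rather than character streams. A character stream decodes while reading and encodes again while writing. Both steps are unnecessary for a copy, and if the code tables used for reading and writing differ, the copy is corrupted.

For example, when FileReader reads a file and no encoding is specified, it uses the system default encoding (GBK on the system used here). If the file is actually UTF-8, it is still decoded as GBK and the text comes out wrong.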
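
Copying with byte streams avoids decoding altogether, so it works for text, images, and video alike. The sketch below is one possible way to write such a copy; the class name ByteCopyDemo and the helper copy are hypothetical, and the file names are taken from the command line.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteCopyDemo {
    // Copies the file byte for byte; no code table is involved at any point.
    static void copy(String from, String to) throws IOException {
        try (InputStream in = new FileInputStream(from);
             OutputStream out = new FileOutputStream(to)) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len); // write only the bytes actually read
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ByteCopyDemo <source> <target>");
            return;
        }
        copy(args[0], args[1]);
    }
}
```

The try-with-resources statement closes both streams even if an exception is thrown halfway through the copy.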

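1.4. Reading Chinese text with a byte stream

import java.io.FileInputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        readFileByInputStream2("c:\\a.txt");
    }

    private static void readFileByInputStream2(String path) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        int b;
        // Each byte is cast to a char on its own, so multi-byte characters break.
        while ((b = fis.read()) != -1) {
            System.out.print((char) b);
        }
        fis.close();
    }
}

This method cannot display Chinese text correctly, because a Chinese character occupies more than one byte and casting single bytes to char does not decode them.

Clearly these bytes need to be decoded. Read the bytes from the input stream into a byte array and decode them with the matching code table.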

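import java.io.FileInputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        readFileByInputStream("c:\\a.txt");
    }

    private static void readFileByInputStream(String path) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        int len;
        byte[] buffer = new byte[1024];
        while ((len = fis.read(buffer)) != -1) {
            // Decode the bytes just read as GBK. A character that straddles
            // two reads would still be garbled; fine for files under 1024 bytes.
            System.out.println(new String(buffer, 0, len, "gbk"));
        }
        fis.close();
    }
}

Note: if the code table used for decoding does not match the one used for encoding, the text is garbled.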

import java.io.FileInputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        // This file is encoded in GBK.
        readFileByInputStream("c:\\a.txt");
    }

    private static void readFileByInputStream(String path) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        int len;
        byte[] buffer = new byte[1024];
        while ((len = fis.read(buffer)) != -1) {
            // Decoding GBK bytes as UTF-8 gives the wrong result.
            System.out.println(new String(buffer, 0, len, "utf-8"));
        }
        fis.close();
    }
}
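
Decoding each buffer on its own has a subtle weakness: a multi-byte character can be cut in half at a buffer boundary, and both halves then decode to garbage even when the right code table is used. One way around this, sketched here under the assumption that the file fits in memory, is to read all bytes first and decode once. The class name WholeFileDecodeDemo and both helpers are hypothetical; decodeInChunks exists only to reproduce the problem with a deliberately tiny chunk size.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WholeFileDecodeDemo {
    // Reads every byte first, then decodes once, so no character is split.
    static String readText(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    // Decodes chunk by chunk, like a read loop with a fixed-size buffer.
    static String decodeInChunks(byte[] bytes, int chunkSize, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int len = Math.min(chunkSize, bytes.length - off);
            sb.append(new String(bytes, off, len, charset));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4e2d\u56fd"; // two characters, three UTF-8 bytes each
        Path file = Files.createTempFile("whole", ".txt");
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));

        System.out.println(readText(file, StandardCharsets.UTF_8).equals(text)); // true
        byte[] bytes = Files.readAllBytes(file);
        // Chunks of 2 bytes cut through the 3-byte characters.
        System.out.println(decodeInChunks(bytes, 2, StandardCharsets.UTF_8).equals(text)); // false
        Files.delete(file);
    }
}
```

For files too large to hold in memory, a Reader does the same job incrementally, because it keeps the undecoded tail of one read and joins it with the next.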

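1.5. Writing Chinese text with a byte stream

Writing requires encoding, and the code table can be chosen. We encode the string ourselves and then write the resulting bytes to the file through a byte stream.

Use the getBytes method of String: without arguments it encodes with the default code table, or a code table can be specified.

Using the default encoding: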

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        String path = "c:\\test.txt";
        writeFileByOutputStream(path, "世界你好");
        readFileByInputStream(path);
    }

    private static void writeFileByOutputStream(String path, String content)
            throws IOException {
        FileOutputStream fos = new FileOutputStream(path);
        // Encode the string with the default encoding.
        byte[] bytes = content.getBytes();
        // Write the bytes to the file through the byte stream.
        fos.write(bytes);
        fos.close();
    }

    private static void readFileByInputStream(String path) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        int len;
        byte[] buffer = new byte[1024];
        while ((len = fis.read(buffer)) != -1) {
            // Decode the bytes with the default encoding.
            System.out.println(new String(buffer, 0, len));
        }
        fis.close();
    }
}

Using UTF-8:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        String path = "c:\\test.txt";
        writeFileByOutputStream(path, "世界你好");
        readFileByInputStream(path);
    }

    private static void writeFileByOutputStream(String path, String content)
            throws IOException {
        FileOutputStream fos = new FileOutputStream(path);
        // Encode the string as UTF-8.
        byte[] bytes = content.getBytes("utf-8");
        // Write the bytes to the file through the byte stream.
        fos.write(bytes);
        fos.close();
    }

    private static void readFileByInputStream(String path) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        int len;
        byte[] buffer = new byte[1024];
        while ((len = fis.read(buffer)) != -1) {
            // Decode the bytes as UTF-8, matching the encoding used for writing.
            System.out.println(new String(buffer, 0, len, "utf-8"));
        }
        fis.close();
    }
}

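Once it is clear that byte streams can also handle Chinese text correctly, it follows that a character stream is simply a byte stream plus a code table (the default one unless another is specified). It performs the encoding and decoding automatically, while the file is still read and written through a byte stream underneath. The conversion streams in the next section make this explicit.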
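
The claim that a character stream is a byte stream plus a code table can be verified in memory, without touching a file. In this sketch the class name WriterIsEncoderDemo and the helper writeThroughWriter are hypothetical. The text is written through an OutputStreamWriter into a ByteArrayOutputStream and the result is compared with what getBytes produces for the same charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WriterIsEncoderDemo {
    // Sends text through a character stream and returns the bytes that come out.
    static byte[] writeThroughWriter(String text, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } // closing the writer flushes its internal buffer into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4e2d\u56fd";
        byte[] viaWriter = writeThroughWriter(text, StandardCharsets.UTF_8);
        byte[] viaGetBytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(viaWriter, viaGetBytes)); // true
    }
}
```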

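1.6. Conversion streams

InputStreamReader

The API documentation describes InputStreamReader as a bridge from byte streams to character streams. Its constructors accept a byte stream and, optionally, an encoding. What does that give us? It wraps a byte stream and decodes its bytes into characters automatically. InputStreamReader is a subclass of Reader and so belongs to the character stream family, which is why conversion streams are called the bridge between byte streams and character streams.

InputStreamReader is the bridge from byte streams to character streams.

Testing InputStreamReader:

Step 1: create a text file encoded in GBK, named gbk.txt for easy identification, and a text file encoded in UTF-8, named utf.txt.

Step 2: write the characters 中国 into each file.

Step 3: write test methods that read the files with InputStreamReader using the default encoding, GBK, and UTF-8 respectively.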

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Demo4 {
    public static void main(String[] args) throws IOException {
        File file = new File("c:\\a.txt");
        File fileGBK = new File("c:\\gbk.txt");
        File fileUTF = new File("c:\\utf.txt");
        // Default encoding
        testReadFile(file);
        // GBK file, decoded as GBK
        testReadFile(fileGBK, "gbk");
        // UTF-8 file, decoded as UTF-8
        testReadFile(fileUTF, "utf-8");
    }

    // Reads the file with an InputStreamReader that uses the default encoding.
    private static void testReadFile(File file) throws IOException {
        FileInputStream fis = new FileInputStream(file);
        InputStreamReader ins = new InputStreamReader(fis);
        int ch;
        while ((ch = ins.read()) != -1) {
            System.out.print((char) ch);
        }
        ins.close(); // also closes the wrapped FileInputStream
    }

    // Reads the file with the specified encoding.
    private static void testReadFile(File file, String encod)
            throws IOException {
        FileInputStream fis = new FileInputStream(file);
        InputStreamReader ins = new InputStreamReader(fis, encod);
        int ch;
        while ((ch = ins.read()) != -1) {
            System.out.print((char) ch);
        }
        ins.close();
    }
}

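Note: mismatched code tables

Test the following cases:

Read the UTF-8 file with the default encoding (GBK on the system used here).

Read the GBK file as UTF-8.

Read the UTF-8 file as GBK.

All three produce garbled text. Replace the calls in main with:

        // UTF-8 file, decoded with the default encoding
        testReadFile(fileUTF);
        // GBK file, decoded as UTF-8
        testReadFile(fileGBK, "utf-8");
        // UTF-8 file, decoded as GBK
        testReadFile(fileUTF, "gbk");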
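
Reading one character at a time is fine for a demonstration, but text is usually processed line by line. A common pattern, sketched here with the hypothetical class ReadLinesDemo and helper readLines, is to wrap the InputStreamReader in a BufferedReader. The input comes from a byte array so the example runs without any files, and ISO-8859-1 stands in for "the wrong code table" because it is available on every Java runtime.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReadLinesDemo {
    // Decodes the byte stream with the given charset and returns its lines.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u4e2d\u56fd\nabc".getBytes(StandardCharsets.UTF_8);

        List<String> right = readLines(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(right.size());          // 2
        System.out.println(right.get(0).length()); // 2 characters

        // Wrong code table: each of the 6 bytes becomes its own character.
        List<String> wrong = readLines(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);
        System.out.println(wrong.get(0).length()); // 6 characters of garbage
    }
}
```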

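OutputStreamWriter

Just as InputStreamReader converts an InputStream, OutputStreamWriter converts an OutputStream.

OutputStreamWriter is the bridge from character streams to byte streams.

Testing OutputStreamWriter:

1. Use OutputStreamWriter with the default encoding, GBK, and UTF-8 to write the characters 中国 into the default-encoded file, the GBK file, and the UTF-8 file respectively.

2. Read each file back with the testReadFile methods from the previous example, passing the matching code table.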

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class Demo4 {
    public static void main(String[] args) throws IOException {
        File file = new File("c:\\a.txt");
        File fileGBK = new File("c:\\gbk.txt");
        File fileUTF = new File("c:\\utf.txt");

        // Write
        // Write with the default code table
        testWriteFile(file);
        // Write GBK encoded text into the GBK file
        testWriteFile(fileGBK, "gbk");
        // Write UTF-8 encoded text into the UTF-8 file
        testWriteFile(fileUTF, "utf-8");

        // Read
        // Default encoding
        testReadFile(file);
        // GBK file, decoded as GBK
        testReadFile(fileGBK, "gbk");
        // UTF-8 file, decoded as UTF-8
        testReadFile(fileUTF, "utf-8");
    }

    // Writes text to the file using the default code table.
    private static void testWriteFile(File file) throws IOException {
        FileOutputStream fos = new FileOutputStream(file);
        OutputStreamWriter ops = new OutputStreamWriter(fos);
        ops.write("中国");
        ops.close();
    }

    // Writes text to the file using the specified code table.
    private static void testWriteFile(File file, String encod)
            throws IOException {
        FileOutputStream fos = new FileOutputStream(file);
        OutputStreamWriter ops = new OutputStreamWriter(fos, encod);
        ops.write("中国");
        ops.close();
    }

    // Reads the file with an InputStreamReader that uses the default encoding.
    private static void testReadFile(File file) throws IOException {
        FileInputStream fis = new FileInputStream(file);
        InputStreamReader ins = new InputStreamReader(fis);
        int ch;
        while ((ch = ins.read()) != -1) {
            System.out.print((char) ch);
        }
        ins.close();
    }

    // Reads the file with the specified encoding.
    private static void testReadFile(File file, String encod)
            throws IOException {
        FileInputStream fis = new FileInputStream(file);
        InputStreamReader ins = new InputStreamReader(fis, encod);
        int ch;
        while ((ch = ins.read()) != -1) {
            System.out.print((char) ch);
        }
        ins.close();
    }
}

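Note: mismatched code tables

Test the following cases:

Write UTF-8 encoded text into the GBK file.

Write GBK encoded text into the UTF-8 file.

Both files now hold bytes in the wrong encoding and can no longer be read correctly with their intended code table. Replace main in Demo4 with:

    public static void main(String[] args) throws IOException {
        File file = new File("c:\\a.txt");
        File fileGBK = new File("c:\\gbk.txt");
        File fileUTF = new File("c:\\utf.txt");

        // Write
        // // Write with the default code table
        // testWriteFile(file);
        // // Write GBK encoded text into the GBK file
        // testWriteFile(fileGBK, "gbk");
        // // Write UTF-8 encoded text into the UTF-8 file
        // testWriteFile(fileUTF, "utf-8");

        // Write with the default code table (overwritten by the next call)
        testWriteFile(fileGBK);
        // Write UTF-8 encoded text into the GBK file
        testWriteFile(fileGBK, "utf-8");
        // Write GBK encoded text into the UTF-8 file
        testWriteFile(fileUTF, "gbk");

        // Read
        // Default encoding
        testReadFile(file);
        // GBK file, decoded as GBK
        testReadFile(fileGBK, "gbk");
        // UTF-8 file, decoded as UTF-8
        testReadFile(fileUTF, "utf-8");
    }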
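
When a file "cannot be read correctly", looking at its raw bytes usually settles which encoding was really written. The following sketch is one way to do that; the class name HexDumpDemo and the helpers writeText and toHex are hypothetical, and a temporary file is used instead of the c:\ paths above. With UTF-8 the two characters 中国 produce six bytes, the first three being the E4 B8 AD shown earlier.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HexDumpDemo {
    // Writes text through an OutputStreamWriter using the given charset.
    static void writeText(Path file, String text, Charset charset) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file.toFile()), charset)) {
            writer.write(text);
        }
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("hexdump", ".txt");
        writeText(file, "\u4e2d\u56fd", StandardCharsets.UTF_8);
        System.out.println(toHex(Files.readAllBytes(file))); // E4 B8 AD E5 9B BD
        Files.delete(file);
    }
}
```

If the dump of a supposedly GBK file shows three bytes per Chinese character, the file was written as UTF-8, and no choice of decoder for GBK will fix it.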

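InputStreamReader: the bridge from bytes to characters.

OutputStreamWriter: the bridge from characters to bytes.

They perform a conversion, yet they are themselves character streams. That is why a byte stream object must be passed in when they are constructed.

Constructors:

InputStreamReader(InputStream)

Initializes the reader with the system default code table (GBK on the system used here).

InputStreamReader(InputStream, String charsetName)

Initializes the reader with the specified code table.

OutputStreamWriter(OutputStream)

Initializes the writer with the system default code table (GBK on the system used here).

OutputStreamWriter(OutputStream, String charsetName)

Initializes the writer with the specified code table.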
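
The constructors that take a charset name throw UnsupportedEncodingException when the name is unknown. Both classes also have overloads that take a java.nio.charset.Charset object, which cannot fail that way. The sketch below illustrates both; the class name CharsetArgumentDemo, the helper acceptsName, and the name "no-such-charset" are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class CharsetArgumentDemo {
    // True if the String-based constructor accepts the given charset name.
    static boolean acceptsName(String charsetName) throws IOException {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), charsetName)) {
            return true;
        } catch (UnsupportedEncodingException e) {
            return false; // the name does not refer to an installed charset
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(acceptsName("utf-8"));           // true
        System.out.println(acceptsName("no-such-charset")); // false

        // The Charset overload needs no name lookup at construction time.
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), StandardCharsets.UTF_8)) {
            System.out.println(reader.read()); // -1, the stream is empty
        }
    }
}
```

Using the constants in StandardCharsets also guards against typos such as "uft-8", which the compiler cannot catch in a string.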

Note:

The character stream classes that operate on files are subclasses of the conversion streams.

Reader
  |--InputStreamReader
       |--FileReader

Writer
  |--OutputStreamWriter
       |--FileWriter
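
This hierarchy can be confirmed with reflection. The sketch is purely illustrative; the class name HierarchyDemo and the helper parentOf are hypothetical.

```java
import java.io.FileReader;
import java.io.FileWriter;

class HierarchyDemo {
    // Name of the direct superclass of the given class.
    static String parentOf(Class<?> type) {
        return type.getSuperclass().getName();
    }

    public static void main(String[] args) {
        System.out.println(parentOf(FileReader.class)); // java.io.InputStreamReader
        System.out.println(parentOf(FileWriter.class)); // java.io.OutputStreamWriter
    }
}
```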

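Note:

When FileReader is used to read text data, it uses the default code table.

To use a specific code table, use a conversion stream.

If the system default encoding is GBK:

FileReader fr = new FileReader("a.txt"); // reads a.txt with the system default, GBK

The following line also reads a.txt with the system default, GBK:

InputStreamReader isr = new InputStreamReader(new FileInputStream("a.txt"));

The two statements mean the same thing.

However, if the character data in a.txt is encoded in UTF-8, a FileReader created this way cannot read it correctly. The code table must be specified when reading, which calls for the conversion stream. (Since Java 11, FileReader also has constructors that accept a Charset.)

InputStreamReader isr =
        new InputStreamReader(new FileInputStream("a.txt"), "utf-8");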
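
On Java 7 and later, java.nio.file.Files offers a shortcut for the same idea: Files.newBufferedReader takes the charset as an argument and returns a buffered character stream. The following is a sketch under stated assumptions; the class name ExplicitCharsetReadDemo and the helper firstLine are hypothetical, and a temporary file takes the place of a.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetReadDemo {
    // Reads the first line of the file, decoding with the given charset.
    static String firstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("explicit", ".txt");
        Files.write(file, "\u4e2d\u56fd".getBytes(StandardCharsets.UTF_8));

        // The result does not depend on the machine's default encoding.
        String line = firstLine(file, StandardCharsets.UTF_8);
        System.out.println(line.equals("\u4e2d\u56fd")); // true
        Files.delete(file);
    }
}
```

Unlike InputStreamReader, the reader returned by Files.newBufferedReader throws an exception on malformed input instead of substituting replacement characters, so a wrong code table tends to fail loudly.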
