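1. Preparing the data: converting images into test vectors
To use the classify0() classifier written earlier, each image must first be formatted as a vector: the 32x32 binary image matrix is converted into a 1x1024 vector.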
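import numpy as np

# Prepare the data: convert an image into a test vector
def img2vector(filename):
    returnvect = np.zeros((1, 1024))  # create a 1x1024 NumPy array
    with open(filename) as fr:  # the file is closed automatically on exit
        for i in range(32):  # read the first 32 lines of the file
            linestr = fr.readline()
            for j in range(32):  # read the first 32 characters of each line
                returnvect[0, 32*i+j] = int(linestr[j])
    return returnvect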
testvector = img2vector('digits/testDigits/0_13.txt')
View the text data (the first two rows of the image):
print(testvector[0, 0:32])
print(testvector[0, 32:64])
Output:

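As a separate illustration of the conversion described in this section, the following minimal sketch builds a synthetic 32x32 text image in a temporary directory and flattens it into a 1x1024 vector. The function name img2vector_compact, the file name 0_0.txt, and the synthetic pattern are assumptions made for this example; they are not part of the original code or of the digits data set.

```python
import os
import tempfile

import numpy as np


def img2vector_compact(filename):
    """Hypothetical variant: read 32 lines of 32 '0'/'1' characters into a 1x1024 array."""
    with open(filename) as fr:
        rows = [[int(ch) for ch in fr.readline()[:32]] for _ in range(32)]
    # Row-major reshape puts image row i at positions 32*i .. 32*i+31.
    return np.array(rows, dtype=float).reshape(1, 1024)


# Synthetic image (an assumption for this demo): row i starts with i ones.
lines = ["1" * i + "0" * (32 - i) for i in range(32)]
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "0_0.txt")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    vec = img2vector_compact(path)

print(vec.shape)
print(vec[0, 32:64])  # second image row: a single 1 followed by zeros
```

Because the reshape is row-major, the slice [0, 32:64] corresponds to the second line of the text file, which is the same indexing as 32*i+j in the loop version.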
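2. Testing the algorithm: using k-nearest neighbors to recognize handwritten digits
Before writing the code, make sure that from os import listdir appears at the top of the file (the listdir function lists the file names in a given directory).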
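The test function relies on the file-name convention digit_index.txt to obtain the true label of each sample. A minimal sketch of that step, assuming a hypothetical helper label_from_filename and a few empty placeholder files created in a temporary directory for the demonstration:

```python
import os
import tempfile
from os import listdir


def label_from_filename(filename):
    """Hypothetical helper: '9_45.txt' -> 9 (the digit before the underscore)."""
    stem = filename.split('.')[0]      # drop the extension: '9_45'
    return int(stem.split('_')[0])     # keep the class digit: 9


with tempfile.TemporaryDirectory() as tmpdir:
    # Placeholder files that only mimic the naming scheme (assumption for this demo).
    for name in ["0_13.txt", "3_7.txt", "9_45.txt"]:
        open(os.path.join(tmpdir, name), "w").close()
    names = sorted(listdir(tmpdir))    # listdir returns names in arbitrary order
    labels = [label_from_filename(n) for n in names]

print(names)
print(labels)
```

Note that listdir returns bare file names without the directory prefix, which is why the test function below prepends the directory when calling img2vector.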
def handwritingclasstest():
    hwlabels = []
    trainingfilelist = listdir('digits/trainingDigits')  # get the directory contents
    m = len(trainingfilelist)
    trainingmat = np.zeros((m, 1024))  # one row per training image
    for i in range(m):
        filenamestr = trainingfilelist[i]
        filestr = filenamestr.split('.')[0]  # strip the extension
        classnumstr = int(filestr.split('_')[0])  # the digit before '_' is the label
        hwlabels.append(classnumstr)
        trainingmat[i, :] = img2vector('digits/trainingDigits/%s' % filenamestr)
    testfilelist = listdir('digits/testDigits')
    errorcount = 0.0
    mtest = len(testfilelist)
    for i in range(mtest):
        filenamestr = testfilelist[i]
        filestr = filenamestr.split('.')[0]
        classnumstr = int(filestr.split('_')[0])
        vectorundertest = img2vector('digits/testDigits/%s' % filenamestr)
        classifierresult = classify0(vectorundertest, trainingmat, hwlabels, 3)
        print("the classifier came back with: %d, the real answer is: %d"
              % (classifierresult, classnumstr))
        if classifierresult != classnumstr:
            errorcount += 1.0
    print("the total number of errors is: %d" % errorcount)
    print("the total error rate is: %f" % (errorcount/float(mtest)))

Because the values in the files are already between 0 and 1, no normalization is needed.
Output:

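Separately from the run on the digits data, the classifier call and the error-rate bookkeeping used in the test loop can be illustrated on toy 2-D points. This is only a sketch: the function name knn_vote, the sample points, and their labels are assumptions made for this example, and it uses NumPy broadcasting and collections.Counter rather than the np.tile and sorted approach of classify0.

```python
from collections import Counter

import numpy as np


def knn_vote(in_x, dataset, labels, k):
    """Hypothetical compact kNN: Euclidean distance plus majority vote."""
    # Broadcasting subtracts in_x from every row of dataset.
    distances = np.sqrt(((dataset - in_x) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]          # indices of the k closest points
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]          # label with the most votes


# Toy training set (assumption for this demo): two points per class.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
group_labels = ['A', 'A', 'B', 'B']

test_points = [[0.0, 0.2], [0.9, 1.0]]
truth = ['B', 'A']
predictions = [knn_vote(np.array(p), group, group_labels, 3) for p in test_points]

# Same bookkeeping as the test function: count mismatches, divide by the test size.
errors = sum(p != t for p, t in zip(predictions, truth))
print(predictions, errors / len(truth))
```

With k = 3, each toy test point has two neighbors of its own class among the three closest, so the vote is decided by a 2-to-1 majority.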
Note: the classify0 function called in the code above is the k-nearest neighbors algorithm defined in an earlier section. Its code is as follows:
import operator
import numpy as np

def classify0(inX, dataset, labels, k):
    datasetsize = dataset.shape[0]
    # compute the Euclidean distance from inX to every training sample
    diffmat = np.tile(inX, (datasetsize, 1)) - dataset
    sqdiffmat = diffmat**2
    sqdistances = sqdiffmat.sum(axis=1)
    distances = sqdistances**0.5
    sorteddistindices = distances.argsort()  # indices ordered by increasing distance
    classcount = {}
    for i in range(k):  # count the labels of the k nearest points
        voteilabel = labels[sorteddistindices[i]]
        classcount[voteilabel] = classcount.get(voteilabel, 0) + 1
    # sort the labels by vote count, highest first
    sortedclasscount = sorted(classcount.items(),
                              key=operator.itemgetter(1),
                              reverse=True)
    return sortedclasscount[0][0]

Copyright notice: this is an original article by GH0602, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.