目录
1. 矩阵分解
2. 基于协同过滤的推荐系统
2.1. 相似度计算
2.2. 推荐未评分的物品
2.3. 利用SVD提高推荐效果
3. 基于SVD的图像压缩
矩阵分解
import numpy as np
U, Sigma, VT = np.linalg.svd([[1, 1], [7, 7]])
U
array([[-0.14142136, -0.98994949],
[-0.98994949, 0.14142136]])
Sigma #Sigma是一个对角矩阵,这样返回是为了节省空间
array([ 1.00000000e+01, 2.82797782e-16])
VT
array([[-0.70710678, -0.70710678],
[ 0.70710678, -0.70710678]])
import svdRec
Data = svdRec.loadExData()
U, Sigma, VT = np.linalg.svd(Data)
Sigma
array([ 9.72140007e+00, 5.29397912e+00, 6.84226362e-01,
1.97142765e-16, 1.57042877e-16])
矩阵分解的原理如下:
Sigma3 = np.mat([[Sigma[0], 0, 0], [0, Sigma[1], 0], [0, 0, Sigma[2]]])
retMat = U[:,:3] * Sigma3 * VT[:3,:]
retMat[np.abs(retMat)<1e-8] = 0
retMat
matrix([[ 1., 1., 1., 0., 0.],
[ 2., 2., 2., 0., 0.],
[ 1., 1., 1., 0., 0.],
[ 5., 5., 5., 0., 0.],
[ 1., 1., 0., 2., 2.],
[ 0., 0., 0., 3., 3.],
[ 0., 0., 0., 1., 1.]])
基于协同过滤的推荐系统
相似度计算
reload(svdRec)
<module 'svdRec' from 'svdRec.pyc'>
myMat = np.mat(svdRec.loadExData())
svdRec.ecludSim(myMat[:,0], myMat[:,4])
0.13367660240019172
svdRec.ecludSim(myMat[:,0], myMat[:,0])
1.0
svdRec.cosSim(myMat[:,0], myMat[:,4])
0.54724555912615336
svdRec.cosSim(myMat[:,0], myMat[:,0])
0.99999999999999989
svdRec.pearsSim(myMat[:,0], myMat[:,4])
0.23768619407595826
svdRec.pearsSim(myMat[:,0], myMat[:,0])
1.0
推荐未评分的物品
reload(svdRec)
<module 'svdRec' from 'svdRec.pyc'>
myMat = np.mat(svdRec.loadExData())
myMat[0, 1] = myMat[0, 0] = myMat[1, 0] = myMat[2, 0] = 4
myMat[3, 2] = 2
myMat
matrix([[4, 4, 1, 0, 0],
[4, 2, 2, 0, 0],
[4, 1, 1, 0, 0],
[5, 5, 2, 0, 0],
[1, 1, 0, 2, 2],
[0, 0, 0, 3, 3],
[0, 0, 0, 1, 1]])
svdRec.recommend(myMat, 2)
[(3, 2.5), (4, 2.5)]
svdRec.recommend(myMat, 2, simMeas=svdRec.ecludSim)
[(3, 2.5), (4, 2.5)]
svdRec.recommend(myMat, 2, simMeas=svdRec.pearsSim)
[(3, 2.5), (4, 2.5)]
利用SVD提高推荐效果
reload(svdRec)
<module 'svdRec' from 'svdRec.pyc'>
myMat = np.mat(svdRec.loadExData2())
U, Sigma, VT = np.linalg.svd(myMat)
Sigma
array([ 15.77075346, 11.40670395, 11.03044558, 4.84639758,
3.09292055, 2.58097379, 1.00413543, 0.72817072,
0.43800353, 0.22082113, 0.07367823])
Sig2 = Sigma ** 2
Sig2
array([ 2.48716665e+02, 1.30112895e+02, 1.21670730e+02,
2.34875695e+01, 9.56615756e+00, 6.66142570e+00,
1.00828796e+00, 5.30232598e-01, 1.91847092e-01,
4.87619735e-02, 5.42848136e-03])
np.sum(Sig2)
541.99999999999955
sum(Sig2) * 0.9
487.79999999999961
sum(Sig2[:2])
378.82955951135796
sum(Sig2[:3])
500.50028912757932
所以我们可以将一个11维矩阵转换成一个3维矩阵,使得这三个元素包含的能量高于总能量的90%。svdEst函数中以下语句:
xformedItems = dataMat.T * U[:, :4] * Sig4.I #构建转换后的物品
这条语句就是用来计算矩阵奇异值分解后的$V$矩阵,因此需要先将$Data_{m \times n}$转置成$Data_{m \times n}^{T}$的$n \times m$的矩阵。
reload(svdRec)
<module 'svdRec' from 'svdRec.pyc'>
svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst)
the 0 and 3 similarity is : 0.490950
the 0 and 5 similarity is : 0.484274
the 0 and 10 similarity is : 0.512755
the 1 and 3 similarity is : 0.491294
the 1 and 5 similarity is : 0.481516
the 1 and 10 similarity is : 0.509709
the 2 and 3 similarity is : 0.491573
the 2 and 5 similarity is : 0.482346
the 2 and 10 similarity is : 0.510584
the 4 and 3 similarity is : 0.450495
the 4 and 5 similarity is : 0.506795
the 4 and 10 similarity is : 0.512896
the 6 and 3 similarity is : 0.743699
the 6 and 5 similarity is : 0.468366
the 6 and 10 similarity is : 0.439465
the 7 and 3 similarity is : 0.482175
the 7 and 5 similarity is : 0.494716
the 7 and 10 similarity is : 0.524970
the 8 and 3 similarity is : 0.491307
the 8 and 5 similarity is : 0.491228
the 8 and 10 similarity is : 0.520290
the 9 and 3 similarity is : 0.522379
the 9 and 5 similarity is : 0.496130
the 9 and 10 similarity is : 0.493617
[(4, 3.3447149384692283), (7, 3.3294020724526971), (9, 3.328100876390069)]
svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst, simMeas=svdRec.pearsSim)
the 0 and 3 similarity is : 0.341942
the 0 and 5 similarity is : 0.124132
the 0 and 10 similarity is : 0.116698
the 1 and 3 similarity is : 0.345560
the 1 and 5 similarity is : 0.126456
the 1 and 10 similarity is : 0.118892
the 2 and 3 similarity is : 0.345149
the 2 and 5 similarity is : 0.126190
the 2 and 10 similarity is : 0.118640
the 4 and 3 similarity is : 0.450126
the 4 and 5 similarity is : 0.528504
the 4 and 10 similarity is : 0.544647
the 6 and 3 similarity is : 0.923822
the 6 and 5 similarity is : 0.724840
the 6 and 10 similarity is : 0.710896
the 7 and 3 similarity is : 0.319482
the 7 and 5 similarity is : 0.118324
the 7 and 10 similarity is : 0.113370
the 8 and 3 similarity is : 0.334910
the 8 and 5 similarity is : 0.119673
the 8 and 10 similarity is : 0.112497
the 9 and 3 similarity is : 0.566918
the 9 and 5 similarity is : 0.590049
the 9 and 10 similarity is : 0.602380
[(4, 3.3469521867021732), (9, 3.3353796573274699), (6, 3.307193027813037)]
上述代码的运行效率并不高,一个原因是不必在每次估计评分时都做SVD分解。此外,在上面的矩阵中,含有很多0元素,在实际系统中0元素更多。我们可以只存储非零元素来节省内存和计算开销。另一个潜在的计算资源浪费来自于相似度得分。解决方法可以是离线计算并保存相似度得分。
基于SVD的图像压缩
reload(svdRec)
<module 'svdRec' from 'svdRec.pyc'>
svdRec.imgCompress(2)
****origin matrix*****
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
****reconstructed matrix using 2 singular values******
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0