矩阵分解¶

import numpy as np

U, Sigma, VT = np.linalg.svd([[1, 1], [7, 7]])

U

array([[-0.14142136, -0.98994949],
       [-0.98994949,  0.14142136]])

Sigma #Sigma是一个对角矩阵，这样返回是为了节省空间

array([  1.00000000e+01,   2.82797782e-16])

VT

array([[-0.70710678, -0.70710678],
       [ 0.70710678, -0.70710678]])

import svdRec

Data = svdRec.loadExData()

U, Sigma, VT = np.linalg.svd(Data)

Sigma

array([  9.72140007e+00,   5.29397912e+00,   6.84226362e-01,
         1.97142765e-16,   1.57042877e-16])

矩阵分解的原理如下： $$Data_{m \times n} \approx U_{m \times 3} \Sigma_{3 \times 3} V_{3 \times n}^{T}$$

Sigma3 = np.mat([[Sigma[0], 0, 0], [0, Sigma[1], 0], [0, 0, Sigma[2]]])

retMat = U[:,:3] * Sigma3 * VT[:3,:]

retMat[np.abs(retMat)<1e-8] = 0

retMat

matrix([[ 1.,  1.,  1.,  0.,  0.],
        [ 2.,  2.,  2.,  0.,  0.],
        [ 1.,  1.,  1.,  0.,  0.],
        [ 5.,  5.,  5.,  0.,  0.],
        [ 1.,  1.,  0.,  2.,  2.],
        [ 0.,  0.,  0.,  3.,  3.],
        [ 0.,  0.,  0.,  1.,  1.]])

基于协同过滤的推荐系统¶

相似度计算¶

reload(svdRec)

<module 'svdRec' from 'svdRec.pyc'>

myMat = np.mat(svdRec.loadExData())

svdRec.ecludSim(myMat[:,0], myMat[:,4])

0.13367660240019172

svdRec.ecludSim(myMat[:,0], myMat[:,0])

1.0

svdRec.cosSim(myMat[:,0], myMat[:,4])

0.54724555912615336

svdRec.cosSim(myMat[:,0], myMat[:,0])

0.99999999999999989

svdRec.pearsSim(myMat[:,0], myMat[:,4])

0.23768619407595826

svdRec.pearsSim(myMat[:,0], myMat[:,0])

1.0

利用SVD提高推荐效果¶

reload(svdRec)

<module 'svdRec' from 'svdRec.pyc'>

myMat = np.mat(svdRec.loadExData2())

U, Sigma, VT = np.linalg.svd(myMat)

Sigma

array([ 15.77075346,  11.40670395,  11.03044558,   4.84639758,
         3.09292055,   2.58097379,   1.00413543,   0.72817072,
         0.43800353,   0.22082113,   0.07367823])

Sig2 = Sigma ** 2

Sig2

array([  2.48716665e+02,   1.30112895e+02,   1.21670730e+02,
         2.34875695e+01,   9.56615756e+00,   6.66142570e+00,
         1.00828796e+00,   5.30232598e-01,   1.91847092e-01,
         4.87619735e-02,   5.42848136e-03])

np.sum(Sig2)

541.99999999999955

sum(Sig2) * 0.9

487.79999999999961

sum(Sig2[:2])

378.82955951135796

sum(Sig2[:3])

500.50028912757932

所以我们可以将一个11维矩阵转换成一个3维矩阵，使得这三个元素包含的能量高于总能量的90%。svdEst函数中以下语句：

xformedItems = dataMat.T * U[:, :4] * Sig4.I  #构建转换后的物品

这条语句就是用来计算矩阵奇异值分解后的$V$矩阵，因此需要先将$Data_{m \times n}$转置成$Data_{m \times n}^{T}$的$n \times m$的矩阵。

reload(svdRec)

<module 'svdRec' from 'svdRec.pyc'>

svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst)

the 0 and 3 similarity is : 0.490950
the 0 and 5 similarity is : 0.484274
the 0 and 10 similarity is : 0.512755
the 1 and 3 similarity is : 0.491294
the 1 and 5 similarity is : 0.481516
the 1 and 10 similarity is : 0.509709
the 2 and 3 similarity is : 0.491573
the 2 and 5 similarity is : 0.482346
the 2 and 10 similarity is : 0.510584
the 4 and 3 similarity is : 0.450495
the 4 and 5 similarity is : 0.506795
the 4 and 10 similarity is : 0.512896
the 6 and 3 similarity is : 0.743699
the 6 and 5 similarity is : 0.468366
the 6 and 10 similarity is : 0.439465
the 7 and 3 similarity is : 0.482175
the 7 and 5 similarity is : 0.494716
the 7 and 10 similarity is : 0.524970
the 8 and 3 similarity is : 0.491307
the 8 and 5 similarity is : 0.491228
the 8 and 10 similarity is : 0.520290
the 9 and 3 similarity is : 0.522379
the 9 and 5 similarity is : 0.496130
the 9 and 10 similarity is : 0.493617

[(4, 3.3447149384692283), (7, 3.3294020724526971), (9, 3.328100876390069)]

svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst, simMeas=svdRec.pearsSim)

the 0 and 3 similarity is : 0.341942
the 0 and 5 similarity is : 0.124132
the 0 and 10 similarity is : 0.116698
the 1 and 3 similarity is : 0.345560
the 1 and 5 similarity is : 0.126456
the 1 and 10 similarity is : 0.118892
the 2 and 3 similarity is : 0.345149
the 2 and 5 similarity is : 0.126190
the 2 and 10 similarity is : 0.118640
the 4 and 3 similarity is : 0.450126
the 4 and 5 similarity is : 0.528504
the 4 and 10 similarity is : 0.544647
the 6 and 3 similarity is : 0.923822
the 6 and 5 similarity is : 0.724840
the 6 and 10 similarity is : 0.710896
the 7 and 3 similarity is : 0.319482
the 7 and 5 similarity is : 0.118324
the 7 and 10 similarity is : 0.113370
the 8 and 3 similarity is : 0.334910
the 8 and 5 similarity is : 0.119673
the 8 and 10 similarity is : 0.112497
the 9 and 3 similarity is : 0.566918
the 9 and 5 similarity is : 0.590049
the 9 and 10 similarity is : 0.602380

[(4, 3.3469521867021732), (9, 3.3353796573274699), (6, 3.307193027813037)]

上述代码的运行效率并不高，一个原因是不必在每次估计评分时都做SVD分解。此外，在上面的矩阵中，含有很多0元素，在实际系统中0元素更多。我们可以只存储非零元素来节省内存和计算开销。另一个潜在的计算资源浪费来自于相似度得分。解决方法可以是离线计算并保存相似度得分。

基于SVD的图像压缩¶

reload(svdRec)

<module 'svdRec' from 'svdRec.pyc'>

svdRec.imgCompress(2)

****origin matrix*****
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0  
****reconstructed matrix using 2 singular values******
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0  
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

矩阵分解¶

基于协同过滤的推荐系统¶

相似度计算¶

推荐未评分的物品¶

利用SVD提高推荐效果¶

基于SVD的图像压缩¶