
Page 1

Lecture Notes in Artificial Intelligence 3169

Edited by J. G. Carbonell and J. Siekmann

Subseries of Lecture Notes in Computer Science

Page 2

Bamshad Mobasher, Sarabjot Singh Anand (Eds.)

Intelligent Techniques for Web Personalization

IJCAI 2003 Workshop, ITWP 2003
Acapulco, Mexico, August 11, 2003

Revised Selected Papers

Page 165

158 C.P. Lam

We then recommend the top N items based on the elements of p^a, excluding those items that the active user has already rated.

We will refer to collaborative filtering using linear associative memory as CLAM from now on. The first observation we make is that W completely represents the underlying model and is an n × n symmetric matrix, where n is the number of items. Its complexity is independent of m, the number of users. In general, m ≫ n, so the difference can be significant.
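As a concrete sketch of the model just described, a minimal NumPy implementation might look as follows. The function names, the dense m × n rating-matrix layout (rows are users, zeros mark unrated items), and exposing the α of Equation 5 as an `alpha` keyword are our own assumptions, not code from the paper:

```python
import numpy as np

def build_memory(X, alpha=0.0):
    """Build the n x n memory matrix W from an m x n rating matrix X
    (one row per user, zeros for unrated items)."""
    norms = np.linalg.norm(X, axis=1) ** alpha  # ||x^i||^alpha per user
    norms[norms == 0] = 1.0                     # guard against all-zero rows
    return (X / norms[:, None]).T @ X           # sum_i x^i (x^i)^T / ||x^i||^alpha

def recommend(W, x_a, top_n=3):
    """Rank items by p^a = W x^a, excluding items the active user rated."""
    p = W @ x_a
    p[x_a != 0] = -np.inf                       # mask already-rated items
    return np.argsort(p)[::-1][:top_n]
```

With `alpha=0` this reduces to W = XᵀX; the masking step implements the rule of excluding items the active user has already rated. Note that W is n × n and never grows with the number of users.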

Another notable property of CLAM is its support for incremental learning. Other model-based CF algorithms tend to support only batch learning, and their models must be completely rebuilt when more data becomes available. It is a simple observation that if W(m) denotes the memory matrix given m users, and the data for a new, (m + 1)th, user becomes available, then the memory matrix can be updated by

\[
W(m+1) = W(m) + \frac{x^{m+1} \cdot (x^{m+1})^T}{\|x^{m+1}\|^{\alpha}}.
\]

Other simple update rules can be devised for the cases where existing users in the training set add, modify, or delete their ratings.
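The update rule above can be sketched in a few lines of NumPy (a hedged sketch; the function name and the `alpha` keyword mirroring the α of Equation 5 are our own):

```python
import numpy as np

def update_memory(W, x_new, alpha=0.0):
    """Incremental update W(m+1) = W(m) + x x^T / ||x||^alpha
    for one new user's rating vector x, without rebuilding the model."""
    norm = np.linalg.norm(x_new) ** alpha
    return W + np.outer(x_new, x_new) / (norm if norm else 1.0)
```

Starting from a zero matrix and applying the update once per user reproduces the batch W of Equation 5, which is exactly the additivity exploited in the analysis below.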

3.1 Effect of Training Set Size

The fact that each user's information enters the memory model additively also enables some theoretical analysis. One can imagine a universe of all users from whom one could obtain ratings (e.g., the population of all shoppers). Say the size of this universe is M. If one could actually collect the ratings from all M users, then one could create the ideal CLAM model,

\[
W^* = \frac{1}{M} \sum_{i=1}^{M} x^i \cdot (x^i)^T.
\]

Here we assume α = 0; the 1/M multiplier has no impact on the recommendation but adds clarity to our analysis.

Of course, one does not have the ratings of all possible users, so we assume that what one has is a random subset of size m. The CLAM model built from this subset is

\[
W(m) = \frac{1}{m} \sum_{i=1}^{m} X^i \cdot (X^i)^T. \tag{6}
\]

(Again, the multiplicative constant has no effect on recommendation, and we capitalize X to emphasize that it is a random sample.) Under this view, one is effectively trying to estimate W^* from a random sample of size m drawn from a population of size M. It is well known that this is an unbiased estimator: the expectation of W(m) is in fact W^*. Another known result [12] for this estimator is that the variance of each element of W(m) is proportional to

\[
\frac{1}{m}\left(1 - \frac{m-1}{M-1}\right).
\]

For m ≪ M, as is usually the case, the variance of the elements of W(m) is approximately proportional to 1/m, and the standard error, the more useful measure, is approximately proportional to 1/√m.

Page 166

Collaborative Filtering Using Associative Neural Memory 159

Thus a larger sample of users will yield a better model with lower standard error, although the rate of improvement decreases. This fits the intuition of many practitioners, but to our knowledge this is the first formal analysis of such a phenomenon for a collaborative filtering algorithm.
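The 1/√m scaling is easy to check with a small Monte Carlo simulation. The sketch below is illustrative only: the synthetic rating universe is our own assumption, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 5000, 4
# Synthetic universe of M users with integer ratings 0..5 on n items
universe = rng.integers(0, 6, size=(M, n)).astype(float)

def std_error(m, trials=200):
    """Empirical standard error of one element of W(m), here W(m)[0, 1],
    over repeated random samples of m users (alpha = 0, 1/m scaling kept)."""
    estimates = []
    for _ in range(trials):
        sample = universe[rng.choice(M, size=m, replace=False)]
        estimates.append((sample[:, 0] * sample[:, 1]).mean())
    return float(np.std(estimates))

# Quadrupling the sample size should roughly halve the standard error.
ratio = std_error(25) / std_error(100)
```

With m ≪ M the finite-population correction is negligible, so the ratio comes out near 2, matching the 1/√m prediction.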

3.2 User-Based Interpretation of CLAM

Earlier, we described the user-based collaborative filtering algorithm for ranking in Equation 2. Let x^i be a column vector with its elements as defined in Equation 4, and let p^a be a column vector whose jth element is p^a_j of Equation 2. The user-based ranking algorithm can be written in vector form as

\[
(p^a)^T = \sum_{i=1}^{m} w(a, i) \cdot (x^i)^T.
\]

Several forms of the weighting function w(a, i) exist [2]. One popular form takes the vector similarity of user i's and the active user's rating vectors. It is defined as

\[
w(a, i) = \frac{(x^a)^T \cdot x^i}{\|x^a\|\,\|x^i\|}.
\]

Plugging this into the user-based ranking algorithm, we get

\[
(p^a)^T = \sum_{i=1}^{m} \frac{(x^a)^T \cdot x^i \cdot (x^i)^T}{\|x^a\|\,\|x^i\|}
        = \frac{(x^a)^T}{\|x^a\|} \cdot \sum_{i=1}^{m} \frac{x^i \cdot (x^i)^T}{\|x^i\|}.
\]

The division by ‖x^a‖ can be dropped because it affects the predicted value of every item equally and thus has no effect on the ranking of items. The summation is simply W as defined in Equation 5 with α set to 1. Noting the symmetry of W, we have

\[
(p^a)^T = (x^a)^T W, \qquad p^a = W x^a,
\]

which is exactly our CLAM algorithm.
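This equivalence can be verified numerically. The sketch below (our own, on random synthetic ratings) confirms that the user-based scores with cosine weights and the CLAM scores with α = 1 differ only by the constant factor 1/‖x^a‖ and therefore rank items identically:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 8
X = rng.integers(1, 6, size=(m, n)).astype(float)  # rows are user rating vectors x^i
x_a = rng.integers(1, 6, size=n).astype(float)     # the active user's ratings x^a

# User-based CF: p^a = sum_i w(a, i) x^i with cosine-similarity weights
w = (X @ x_a) / (np.linalg.norm(X, axis=1) * np.linalg.norm(x_a))
p_user = w @ X

# CLAM with alpha = 1: W = sum_i x^i (x^i)^T / ||x^i||, then p^a = W x^a
W = (X / np.linalg.norm(X, axis=1)[:, None]).T @ X
p_clam = W @ x_a

# Scores agree up to the factor ||x^a||, so the induced rankings are identical.
assert np.allclose(p_user * np.linalg.norm(x_a), p_clam)
assert (np.argsort(p_user) == np.argsort(p_clam)).all()
```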

4 Experiment

4.1 Methodology

To test the effectiveness of the CLAM algorithm, we ran experiments on the MovieLens dataset [13], collected by the GroupLens Research Project at the University of Minnesota. The dataset consists of 100,000 ratings, on a scale of one to five, from 943 users on 1682 movies. Each user has rated at least 20 movies, and each movie has been rated at least once. The ratings were gathered at a Web site during a seven-month period from 1997 to 1998.
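As a hedged illustration of how such data could be turned into the m × n rating matrix used by CLAM: the published MovieLens 100K `u.data` file is tab-separated with `user id, item id, rating, timestamp` fields and 1-indexed ids, and the helper below (our own, demonstrated on synthetic lines rather than the real file) builds the dense matrix from records in that layout.

```python
import numpy as np

def ratings_to_matrix(lines, n_users, n_items):
    """Parse tab-separated 'user item rating timestamp' records
    (1-indexed ids) into a dense matrix; 0 marks an unrated item."""
    X = np.zeros((n_users, n_items))
    for line in lines:
        user, item, rating, _timestamp = line.split("\t")
        X[int(user) - 1, int(item) - 1] = float(rating)
    return X

# Tiny synthetic records in the same layout as u.data
sample = ["1\t2\t4\t874965758", "2\t1\t5\t876893171"]
X = ratings_to_matrix(sample, n_users=2, n_items=3)
```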

Page 330

Web Personalisation for Users Protection: A Multi-agent Method 323

67%). This evaluation also has important consequences for the applicability of the system in an industrial context.


Page 331

Author Index

Aknine, Samir, 306

Anand, Sarabjot Singh, 1, 272

Berendt, Bettina, 69

Burke, Robin, 133

Chan, Keith C.C., 169

Dogac, Asuman, 289

Eirinaki, Magdalini, 272

Greiner, Russ, 241

Häubl, Gerald, 241

Hay, Birgit, 187

Keenoy, Kevin, 201

Kritikopoulos, Apostolos, 229

Lam, Chuck P., 153

Levene, Mark, 201

Lorenzi, Fabiana, 89

McCarthy, Kevin, 255

McGinty, Lorraine, 114

Miller, Craig S., 37

Mobasher, Bamshad, 1

Price, Bob, 241

Quenum, Ghislain, 306

Ramakrishnan, Naren, 53

Reilly, James, 255

Ricci, Francesco, 89

Sideri, Martha, 229

Slodzian, Aurélien, 306

Smyth, Barry, 114, 255

Tang, Tiffany Ya, 169

Teltzrow, Maximilian, 69

Toroslu, I. Hakki, 289

Tumer, Arif, 289

Vanhoof, Koen, 187

Vlachakis, Joannis, 272

Wets, Geert, 187

Winoto, Pinata, 169

Zhu, Tingshao, 241
