Title: Thesis Beyond Keyword Search: Representations and Models for Personalization
Language: English
File Size: 7.7 MB
Total Pages: 149
Table of Contents
1 Introduction
	1.1 Personalization and its Discontents
	1.2 Interactive Concept Coverage
	1.3 Thesis Statement and Contributions
	1.4 Outline
2 Background
	2.1 Probabilistic Graphical Models
		2.1.1 Representation
		2.1.2 Inference
	2.2 Sparsity in Machine Learning
		2.2.1 Penalized Loss Minimization
		2.2.2 Sparse Bayesian Methods
3 Interactive Concept Coverage with Simple Interactions
	3.1 Concept Representation and Coverage
	3.2 Optimizing Set Coverage
	3.3 Personalizing with Simple Interaction
		3.3.1 Interaction Models
		3.3.2 Personalization by Minimizing Regret
		3.3.3 Learning a User's Preferences
	3.4 Experimental Results
		3.4.1 Evaluating Coverage
		3.4.2 Personalization
	3.5 Extensions
	3.6 Related Work
		3.6.1 Incentivizing Diversity
		3.6.2 Taming Information Overload in the Blogosphere
	3.7 Conclusions
	3.8 Appendix: No-Regret Learning
	3.9 Appendix: Data Preprocessing
	3.10 Appendix: Setting the Concept Granularity Parameter
4 Complex Queries and Trust Preferences
	4.1 Problem Description
	4.2 Modeling Scientific Influence
		4.2.1 Defining edge weights
		4.2.2 Calculating influence
	4.3 Selecting Articles
		4.3.1 Influence-based Coverage
		4.3.2 Optimization
	4.4 Trust and Personalization
	4.5 Experimental Results
	4.6 Related Work
	4.7 Conclusions
	4.8 Appendix: Data Details and Preprocessing
	4.9 Appendix: User Study Details
	4.10 Appendix: Selected Papers
5 Transparent User Models for Personalization
	5.1 Modeling Badges
		5.1.1 Generating labels
		5.1.2 Generating actions
		5.1.3 Prior probabilities
		5.1.4 Badge inference
	5.2 Experimental Results
		5.2.1 Data
		5.2.2 Evaluation
	5.3 Related Work
	5.4 Conclusions
	5.5 Appendix: Derivations
		5.5.1 Sampling bi(u)
		5.5.2 Sampling bg,j
		5.5.3 Sampling sij
		5.5.4 Sampling ij
	5.6 Appendix: Experimental Details
		5.6.1 Hyperparameters
		5.6.2 Initialization
	5.7 Appendix: Badge Visualizations
6 Representing Documents Through Their Readers
	6.1 Documents and Their Readers
	6.2 Approach Summary
	6.3 The Badge Model
		6.3.1 Learning the Dictionary
		6.3.2 Coding the Documents
		6.3.3 Incorporating Relations among Badges
	6.4 Experimental Results
		6.4.1 Data Processing and Experimental Setup
		6.4.2 Examples
		6.4.3 Case Study with Political Columnists
		6.4.4 Quantitative Comparisons
	6.5 Related Work
	6.6 Conclusions
	6.7 Appendix: Data Processing
	6.8 Appendix: Optimization
		6.8.1 Dictionary Learning
		6.8.2 Coding the Documents
	6.9 Appendix: Experimental Details
7 Conclusion
	7.1 Thesis Summary
	7.2 Recommendations
8 Future Work
	8.1 Concept Hierarchies and Cuts Over Time
	8.2 Modeling the Knowledge Remainder
	8.3 Automatic Fact Checking of the Web
	8.4 Interactive Concept Coverage Beyond Text
	8.5 Richer User Interactions
Bibliography
                        
Document Text Contents
Page 1

Thesis
Beyond Keyword Search:
Representations and Models for Personalization

Khalid El-Arini

CMU-CS-13-102

January 29, 2013

School of Computer Science
Carnegie Mellon University

Pittsburgh, PA 15213

Thesis Committee:
Carlos Guestrin, Chair
Zoubin Ghahramani
Tom Mitchell
Noah Smith

Thorsten Joachims, Cornell University

Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.

Copyright © 2013 Khalid El-Arini

This research was partially supported by the Office of Naval Research under MURI N000141010934, MURI N000140710747,
YIP N000140810752 and PECASE N000141010672, the National Science Foundation under CAREER IIS0644225,
NeTS-NOSS CNS0625518 and NeTS-SCAN CNS0721591, and by the Army Research Office under MURI W911NF0710287 and
W911NF0810242.

Page 2

Keywords: personalization, recommendation, transparency, user studies, social networks, Twitter,
document representation, content analysis, topic modeling, graphical models, sparsity, machine learning,
information retrieval

Page 74

Table 4.3: Selected papers for PNAS example (as a plant biologist)

Rank Title Year Volume Pages
1 Nitric oxide and salicylic acid signaling in plant defense 2000 97 8849-8855
2 Ancient origins of nitric oxide signaling in biological systems 1999 96 14206-14207
3 The Arabidopsis dnd1 “defense, no death” gene encodes a mutated cyclic nucleotide-gated ion channel 2000 97 9323-9328
4 Roles for mannitol and mannitol dehydrogenase in active oxygen-mediated plant defense 1998 95 15129-15133
5 Defective localization of the NADPH phagocyte oxidase to Salmonella-containing phagosomes in tumor necrosis factor p55 receptor-deficient macrophages 2001 98 2561-2565
6 Arabidopsis RelA/SpoT homologs implicate (p)ppGpp in plant signaling 2000 97 3747-3752
7 A fatty acid desaturase modulates the activation of defense signaling pathways in plants 2001 98 9448-9453
8 Reactive oxygen and nitrogen intermediates in the relationship between mammalian hosts and microbial pathogens 2000 97 8841-8848
9 Virulent Salmonella typhimurium has two periplasmic Cu, Zn-superoxide dismutases 1999 96 7502-7507
10 A highly conserved sequence is a novel gene involved in de novo vitamin B6 biosynthesis 1999 96 9374-9378

Table 4.4: Selected papers for PNAS example (as an immunologist)

Rank Title Year Volume Pages
1 Defective localization of the NADPH phagocyte oxidase to Salmonella-containing phagosomes in tumor necrosis factor p55 receptor-deficient macrophages 2001 98 2561-2565
2 Virulent Salmonella typhimurium has two periplasmic Cu, Zn-superoxide dismutases 1999 96 7502-7507
3 Reactive oxygen and nitrogen intermediates in the relationship between mammalian hosts and microbial pathogens 2000 97 8841-8848
4 Helicobacter pylori arginase inhibits nitric oxide production by eukaryotic cells: A strategy for bacterial survival 2001 98 13844-13849
5 Nitric oxide and salicylic acid signaling in plant defense 2000 97 8849-8855
6 Nitric oxide in plant immunity 1998 95 10345-10347
7 Peptide methionine sulfoxide reductase from Escherichia coli and Mycobacterium tuberculosis protects bacteria against oxidative damage from reactive nitrogen intermediates 2001 98 9901-9906
8 Ancient origins of nitric oxide signaling in biological systems 1999 96 14206-14207
9 The oxyhemoglobin reaction of nitric oxide 1999 96 9027-9032
10 A mechanism of paraquat toxicity involving nitric oxide synthase 1999 96 12760-12765

Page 75

Table 4.5: Selected papers for example in Figure 4.1D

Rank Title Year Volume Pages
1 Defense gene induction in tobacco by nitric oxide, cyclic GMP, and cyclic ADP-ribose 1998 95 10328-10333
2 Ancient origins of nitric oxide signaling in biological systems 1999 96 14206-14207
3 Periplasmic superoxide dismutase protects Salmonella from products of phagocyte NADPH-oxidase and nitric oxide synthase 1997 94 13997-14001
4 A mechanism of paraquat toxicity involving nitric oxide synthase 1999 96 12760-12765
5 Nitroreductase A is regulated as a member of the soxRS regulon of Escherichia coli 1999 96 3537-3539
6 Nitric oxide and salicylic acid signaling in plant defense 2000 97 8849-8855
7 S-nitrosothiol repletion by an inhaled gas regulates pulmonary function 2001 98 5792-5797
8 Cysteine-3635 is responsible for skeletal muscle ryanodine receptor modulation by NO 2001 98 11158-11162
9 The oxyhemoglobin reaction of nitric oxide 1999 96 9027-9032
10 Protection from nitrosative stress by yeast flavohemoglobin 2000 97 4672-4676
11 Hemoglobin induction in mouse macrophages 1999 96 6643-6647
12 Physiological reactions of nitric oxide and hemoglobin: A radical rethink 1999 96 9967-9969
13 Cochlear mechanisms from a phylogenetic viewpoint 2000 97 11736-11743
14 Plant mitogen-activated protein kinase cascades: Negative regulatory roles turn out positive 2001 98 784-786
15 Flavohemoglobin denitrosylase catalyzes the reaction of a nitroxyl equivalent with molecular oxygen 2001 98 10108-10112
16 Relative role of heme nitrosylation and β-cysteine 93 nitrosation in the transport and metabolism of nitric oxide by hemoglobin in the human circulation 2000 97 9943-9948
17 Role of circulating nitrite and S-nitrosohemoglobin in the regulation of regional blood flow in humans 2000 97 11482-11487
18 Modulation of nitric oxide bioavailability by erythrocytes 2001 98 11771-11776
19 Nitric oxide prevents cardiovascular disease and determines survival in polyglobulic mice overexpressing erythropoietin 2000 97 11609-11613
20 Plasma nitrite rather than nitrate reflects regional endothelial nitric oxide synthase activity but lacks intrinsic vasodilator action 2001 98 12814-12819

Page 148

5.2.2, 6.5

Daniel Ramage, Susan Dumais, and Dan Liebling. Characterizing microblogs with topic models. In International Conference on
Weblogs and Social Media (ICWSM), 2010. 5.3

Sidney Redner. How popular is your paper? An empirical study of the citation distribution. European Physical Journal B,
4:131–134, 1998. 4.6

Nicholas D. Rizzolo and Dan Roth. Modeling discriminative global inference. In Proceedings of the First International Conference
on Semantic Computing (ICSC), 2007. 3.4

Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer, 2005. 2.1.2

Martin Rosvall and Carl T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of
the National Academy of Sciences, 105:1118–1123, 2008. 4.6

Michal Rosen-Zvi, Thomas L. Griffiths, Mark Steyvers, and Padhraic Smyth. The author-topic model for authors and documents.
In 20th Conference on Uncertainty in Artificial Intelligence (UAI), 2004. 4.6

G. Salton, A. Wong, and C. S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11),
November 1975. 1.2, 3.1

Ross D. Shachter. Bayes-ball: The rational pastime (for determining irrelevance and requisite information in belief networks and
influence diagrams). In 14th Conference on Uncertainty in Artificial Intelligence (UAI), 1998. 2.1.1, 2.2

Benyah Shaparenko and Thorsten Joachims. Information genealogy: uncovering the flow of ideas in non-hyperlinked document
databases. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2007. 4.5, 4.6

Ben Smith. The hair’s still perfect. Politico, April 16, 2007. 3

Noah A. Smith. Linguistic Structure Prediction. Synthesis Lectures on Human Language Technologies. Morgan and Claypool,
May 2011. 4

Alexander J. Smola and Shravan Narayanamurthy. An architecture for parallel topic models. Proceedings of Very Large Data
Bases (PVLDB), 3(1):703–710, 2010. 5.4

Victoria Stodden. Model selection when the number of variables exceeds the number of observations. PhD thesis, Stanford
University, 2006. 2.2.1

Kazunari Sugiyama, Kenji Hatano, and Masatoshi Yoshikawa. Adaptive web search based on user profile constructed without any
effort from users. In 13th International World Wide Web Conference (WWW), 2004. 6.5

Romain Thibaux and Michael I. Jordan. Hierarchical beta processes and the Indian buffet process. In Proceedings of the 12th
International Conference on Artificial Intelligence and Statistics (AISTATS), 2007. 5

Robert Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B,
58(1):267–288, 1996. 2.2, 2.2.1, 2.2.2

Roberto Torres, Sean M. McNee, Mara Abel, Joseph A. Konstan, and John Riedl. Enhancing digital libraries with TechLens+. In
ACM/IEEE Joint Conference on Digital Libraries, 2004. 4.6

Leslie G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410–421, 1979.
4.2.2

Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and
Trends in Machine Learning, 1(1-2):1–305, 2008. 2.1

Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Understanding belief propagation and its generalizations. In Exploring
artificial intelligence in the new millennium. Morgan Kaufmann, 2003. 2.1.2

Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical
Society, Series B, 68(1):49–67, 2006. 2.2.1

Yisong Yue and Carlos Guestrin. Linear submodular bandits and their application to diversified retrieval. In J. Shawe-Taylor, R.S.
Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems (NIPS) 24,
2011. 3.5, 6.1

Yisong Yue and Thorsten Joachims. Predicting diverse subsets using structural SVMs. In 25th International Conference on
Machine Learning (ICML), 2008. 3.6.1

Jeffrey Zaslow. If TiVo thinks you are gay, here’s how to set it straight. The Wall Street Journal, November 26, 2002. 5, 5.3

Chengxiang Zhai, William W. Cohen, and John Lafferty. Beyond independent relevance: methods and metrics for subtopic

Page 149

retrieval. In 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR),
2003. 3.6.1

Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, and Wei-Ying Ma. Improving web search results using
affinity graph. In 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
(SIGIR), 2005. 3.6.1
