
UNIVERSITY OF LATVIA

NATĀLIJA KOZMINA

METADATA-BASED PERSONALIZATION IN DATA WAREHOUSES

Doctoral thesis for Ph.D. (Dr. sc. comp.) academic degree

Field: computer science
Subfield: data processing systems and computer networks

Advisor:
Assoc. professor, Dr. sc. comp.
LAILA NIEDRĪTE

Rīga, 2014


Doctoral thesis ‘Metadata-based Personalization in Data Warehouses’



CONTENTS
1. INTRODUCTION
    1.1. Motivation, Topicality and Novelty of the Subject
        1.1.1. Motivation
        1.1.2. Topicality and Novelty
    1.2. Goals and Tasks of the Thesis
    1.3. Hypotheses Formulated in the Research
    1.4. Research Methods Applied
    1.5. Main Results of the Research
    1.6. Approbation of the Results
    1.7. Outline of the Thesis
2. LITERATURE REVIEW ON DATA WAREHOUSE PERSONALIZATION
    2.1. The Intent of the Section
    2.2. Research Directions in OLAP Personalization
        2.2.1. OLAP Schema, its Elements and Basic OLAP Operations
        2.2.2. A Description of OLAP Personalization Directions
        2.2.3. A Comparison of Existing OLAP Personalization Approaches
        2.2.4. Hard and Soft Constraints as User Preferences
        2.2.5. Approaches for Collecting User Preference Data
        2.2.6. Methods for Obtaining User Preferences
    2.3. Summary of the Section
3. REQUIREMENT FORMALIZATION TO DEVELOP THE CONCEPTUAL MODEL OF A DATA WAREHOUSE IN COMPLIANCE WITH USER NEEDS
    3.1. The Intent of the Section
    3.2. Methods to Construct Conceptual Models for Data Warehouses
    3.3. Existing Methods for Formalization of Data Warehouse Requirements
    3.4. Requirement Formalization Metamodel and Examples
        3.4.1. Principles of Requirement Reformulation
        3.4.2. Extending a Requirement Formalization Metamodel
        3.4.3. Two Versions of the Requirement Formalization Metamodel
        3.4.4. An Example of a Formalized Requirement
        3.4.5. Requirement Prioritization
    3.5. Summary of the Section
4. USER-DESCRIBING PROFILES IN OLAP
    4.1. The Intent of the Section
    4.2. The Concept of User-describing Profiles
    4.3. The Method for Construction of User-describing Profiles
        4.3.1. User-describing Profile Connections and Data Sources
        4.3.2. A Concept of the Preferential Profile
        4.3.3. A Concept of the Recommendational Profile
    4.4. Summary of the Section
5. OLAP REPORTING TOOL AND ITS METADATA
    5.1. The Intent of the Section
    5.2. Metadata Layers
        5.2.1. Physical Metadata
        5.2.2. Logical Metadata
        5.2.3. Reporting Metadata
        5.2.4. Semantic Metadata
        5.2.5. OLAP Preferences Metadata
    5.3. Technical Details on the OLAP Reporting Tool
    5.4. Summary of the Section
6. METHODS FOR GENERATION OF RECOMMENDATIONS IN THE OLAP REPORTING TOOL


number of activity records is lower than some pre-defined threshold value). The cold-start method consists of two components: first, a structural analysis of existing reports is performed; second, the likeliness between each pair of reports is determined.

The cold-start method addresses two issues that are common in recommender systems: the new item (or long-tail, as in [PT08]) issue and the cold-start user issue (i.e. a user with no previous activity in the system). The main point of the new item or long-tail issue is that items that are either newly added to the system or unpopular (i.e. have received too few user ratings) are practically of no use, because the overall rating score based on user ratings is either absent or too low. As a result, the number of items that are never recommended to users (the long tail) increases. The cold-start method described in this section solves both the new item issue and the cold-start user issue, since the likeliness between reports is defined irrespective of user activity. More precisely, the similarity scores that reflect likeliness are recalculated each time a new report is created, an existing report is deleted, or an existing report is changed in any way.
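
The recalculation policy described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `ReportSimilarityIndex`, its method names, and the representation of a report as a set of element names are all hypothetical, and the Jaccard overlap used here is a stand-in for whatever similarity measure is actually applied to report structures.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Similarity = Callable[[FrozenSet[str], FrozenSet[str]], float]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Stand-in measure (assumption): shared elements / all elements."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ReportSimilarityIndex:
    """Keeps pairwise report scores in sync with the current set of reports."""

    def __init__(self, similarity: Similarity = jaccard) -> None:
        self._similarity = similarity
        self._reports: Dict[str, FrozenSet[str]] = {}
        self.scores: Dict[Tuple[str, str], float] = {}

    def save_report(self, report_id: str, elements: Iterable[str]) -> None:
        """Create a report or overwrite a changed one, then refresh scores."""
        self._reports[report_id] = frozenset(elements)
        self._recalculate()

    def delete_report(self, report_id: str) -> None:
        """Remove a report (if present), then refresh scores."""
        self._reports.pop(report_id, None)
        self._recalculate()

    def _recalculate(self) -> None:
        # No user activity is consulted: scores depend on report structure only.
        self.scores = {
            (a, b): self._similarity(self._reports[a], self._reports[b])
            for a, b in combinations(sorted(self._reports), 2)
        }


index = ReportSimilarityIndex()
index.save_report("R1", {"time", "faculty", "AVG"})
index.save_report("R2", {"time", "program", "COUNT"})
print(index.scores)  # one score for the pair ("R1", "R2")
```

Recomputing every pair on each change is the simplest way to keep the scores consistent; a larger system would more likely refresh only the pairs that involve the affected report.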

In the cold-start method, report structure denotes the data warehouse schema elements and acceptable aggregate functions that are related to the items of a certain report. The OLAP schema elements used in a report are discovered as described in the section “Interconnection of Report Items and OLAP Schema Elements”, and the report structure is then defined. Each report is represented as a Report Structure Vector (RSV) by Formula 6, which is of the following form:

$$RSV = (e_{11}, e_{12}, \ldots, e_{1k_1}, \ldots, e_{n1}, e_{n2}, \ldots, e_{nk_n}), \qquad (6)$$

where $e_{ik_i}$ is a vector coordinate, i.e. a binary value that indicates the presence (equals 1) or absence (equals 0) of an instance of a report structure element, $k_i$ is the number of elements in the i-th structure, i is the index number of each structure (i = 1, 2, …, n), and n is the total number of distinct kinds of structure elements in reports. In a typical case, n = 7, as there is a finite set S of 7 elements, S = {attribute, measure, fact table, dimension, schema, acceptable aggregation, hierarchy}.

Two instances of RSV depicted in Figure 6.2.2.1 provide an example of RSV application:

• Vector $\vec{r}_1$ describes the structure of the report R1: Average student count for each faculty per semester,

• Vector $\vec{r}_2$ describes the structure of the report R2: Total PhD student count for each study program per year.


[Figure: the binary vectors r1 and r2 shown as two rows of 0/1 values. Columns are grouped as Attributes (year, semester, faculty, program), Dimensions (time, program), Fact Tables (registrations), Measures (student, PhD student), Acceptable Aggregation (AVG, COUNT), Hierarchies (time, faculty) and OLAP Schemas (students).]

Fig. 6.2.2.1. Two instances of report structure vector (RSV)

Both reports belong to the same OLAP schema (Students), utilize common hierarchies (Time: year → semester, Faculty: faculty → program), and share a fact table (Registrations) and dimensions (Time, Program). The sets of attributes, measures and acceptable aggregations in the reports R1 and R2 are not equal. Note that an RSV includes all OLAP schemas, their elements (attributes, measures, etc.) and acceptable aggregations. Other elements of the report structure are replaced with “…”, because they are not essential for the current analysis.
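
As a minimal sketch of how such vectors could be assembled, the Python snippet below encodes a report as a binary RSV over a fixed, ordered vocabulary of structure elements. The vocabulary and the element sets assigned to R1 and R2 are illustrative assumptions loosely based on the labels in Fig. 6.2.2.1 and the report titles; they are not taken from the actual metadata, and `build_rsv` is a hypothetical helper name.

```python
from typing import List, Set, Tuple

Element = Tuple[str, str]  # (structure type, element name)

# Ordered vocabulary of structure elements (assumed for illustration only).
VOCABULARY: List[Element] = [
    ("attribute", "year"), ("attribute", "semester"),
    ("attribute", "faculty"), ("attribute", "program"),
    ("dimension", "time"), ("dimension", "program"),
    ("fact table", "registrations"),
    ("measure", "student"), ("measure", "PhD student"),
    ("acceptable aggregation", "AVG"), ("acceptable aggregation", "COUNT"),
    ("hierarchy", "time"), ("hierarchy", "faculty"),
    ("schema", "students"),
]


def build_rsv(report_elements: Set[Element]) -> List[int]:
    """Return 1 for each vocabulary element present in the report, else 0."""
    return [1 if element in report_elements else 0 for element in VOCABULARY]


# Elements shared by both example reports (as stated in the text).
SHARED: Set[Element] = {
    ("dimension", "time"), ("dimension", "program"),
    ("fact table", "registrations"),
    ("hierarchy", "time"), ("hierarchy", "faculty"),
    ("schema", "students"),
}

# Hypothetical report-specific elements, guessed from the report titles.
r1 = build_rsv(SHARED | {
    ("attribute", "semester"), ("attribute", "faculty"),
    ("measure", "student"), ("acceptable aggregation", "AVG"),
})
r2 = build_rsv(SHARED | {
    ("attribute", "year"), ("attribute", "program"),
    ("measure", "PhD student"), ("acceptable aggregation", "COUNT"),
})

print(r1)
print(r2)
```

Fixing the order of the vocabulary is what makes two vectors comparable coordinate by coordinate; any element added to the warehouse metadata would extend every vector by one position.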

To measure likeliness (also referred to as similarity), it is proposed to use Cosine/Vector similarity. Salton et al. [SM83] introduced Cosine/Vector similarity in the field of information retrieval in order to calculate the similarity between a pair of documents by interpreting each document as a vector of term frequency values. Later, Breese et al. [BHK98] adopted this formalism in collaborative filtering: in [BHK98], users were treated as documents and user rating values of items as term frequency values. In the recommender systems literature, Cosine/Vector similarity is extensively used ([VM04, RKR05, AMK11], etc.) to compute a similarity coefficient for a pair of users (in collaborative filtering) or items (in content-based filtering).

In order to estimate quantitatively the difference between the reports R1 and R2, Cosine/Vector similarity is applied. The Cosine/Vector similarity of the vectors $\vec{r}_1$ and $\vec{r}_2$ is calculated by Formula 7:

$$sim = \frac{\vec{r}_1 \cdot \vec{r}_2}{\lVert \vec{r}_1 \rVert * \lVert \vec{r}_2 \rVert}, \qquad (7)$$

where “⋅” is the dot product of two vectors (i.e., the sum of pairwise products of vector coordinates) and $\lVert \vec{r}_i \rVert$ is the length of each vector (i = 1, 2).
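
The calculation in Formula 7 can be illustrated with a short Python sketch. The function name `cosine_similarity`, the two four-coordinate example vectors, and the convention of returning 0 when a vector has zero length are assumptions made for this example only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of coordinates")
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    if length_a == 0 or length_b == 0:
        return 0.0  # assumed convention for a vector with no elements set
    return dot / (length_a * length_b)


# Two small binary vectors (illustrative values, not the vectors of Fig. 6.2.2.1).
r1 = [1, 1, 0, 1]
r2 = [1, 0, 1, 1]
sim = cosine_similarity(r1, r2)
print(round(sim, 4))  # 2 shared elements / (sqrt(3) * sqrt(3)) = 2/3
```

For binary vectors the dot product is simply the number of structure elements the two reports share, and each length is the square root of the number of elements present in that report, so the score lies between 0 (nothing in common) and 1 (identical structure).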


Appendix 6. User survey results grouped by user experience.

Question / Answer        Novice    Advanced user & Expert

1. How would you evaluate the complexity of the 1st task?
    Very easy             0,00%      3,33%
    Easy                 16,67%     30,00%
    Average              23,33%     20,00%
    Hard                  6,67%      0,00%
    Very hard             0,00%      0,00%

2. How would you evaluate the clarity of the 1st task?
    Clear                26,67%     36,67%
    Mostly clear         16,67%     16,67%
    Mostly confusing      3,33%      0,00%
    Confusing             0,00%      0,00%

3. In your opinion, did the report recommendations help you to complete the 1st task?
    Yes                  30,00%     46,67%
    Mostly yes           16,67%      6,67%
    Mostly no             0,00%      0,00%
    No                    0,00%      0,00%

4. While completing the 1st task, have you used Top3 report recommendation in most of the cases?
    Yes                   6,67%      3,33%
    Mostly yes           16,67%     26,67%
    Mostly no            20,00%     16,67%
    No                    3,33%      6,67%

5. How would you evaluate the complexity of the 2nd task?
    Very easy             0,00%      0,00%
    Easy                  6,67%     20,00%
    Average              26,67%     33,33%
    Hard                  6,67%      0,00%
    Very hard             6,67%      0,00%

6. How would you evaluate the clarity of the 2nd task?
    Clear                13,33%     23,33%
    Mostly clear         26,67%     26,67%
    Mostly confusing      6,67%      3,33%
    Confusing             0,00%      0,00%

7. In your opinion, did the report recommendations help you to complete the 2nd task?
    Yes                  23,33%     26,67%
    Mostly yes           23,33%     26,67%
    Mostly no             0,00%      0,00%
    No                    0,00%      0,00%

8. While completing the 2nd task, have you used Top3 report recommendation in most of the cases?
    Yes                   6,67%      3,33%
    Mostly yes           26,67%     33,33%
    Mostly no            10,00%     16,67%
    No                    3,33%      0,00%

9. How would you evaluate the complexity of the 3rd task?
    Very easy             3,33%      0,00%
    Easy                 23,33%     33,33%
    Average              13,33%     13,33%
    Hard                  6,67%      6,67%
    Very hard             0,00%      0,00%




10. How would you evaluate the clarity of the 3rd task?
    Clear                20,00%     40,00%
    Mostly clear         23,33%     13,33%
    Mostly confusing      3,33%      0,00%
    Confusing             0,00%      0,00%

11. In your opinion, did the report recommendations help you to complete the 3rd task?
    Yes                   6,67%      6,67%
    Mostly yes           36,67%     30,00%
    Mostly no             3,33%     13,33%
    No                    0,00%      3,33%

12. While completing the 3rd task, have you used Top3 report recommendation in most of the cases?
    Yes                   0,00%      3,33%
    Mostly yes           13,33%     13,33%
    Mostly no            23,33%     20,00%
    No                   10,00%     16,67%

13. How would you evaluate your experience with reporting tools in general?
    Novice               46,67%      0,00%
    Advanced user         0,00%     40,00%
    Expert                0,00%     13,33%

14. In your opinion, is it easier to complete the tasks employing any of the recommendation modes (1st – 3rd tasks) than to complete the task without any recommendations (Test task)?
    Yes                  20,00%     33,33%
    Mostly yes           26,67%     20,00%
    Mostly no             0,00%      0,00%
    No                    0,00%      0,00%

15. While completing which of the tasks have you used the report recommendations most of all? (may tick 1 or 2 answers)
    1st task             10,00%     10,00%
    2nd task             10,00%     20,00%
    3rd task              3,33%      3,33%
    1st & 2nd task       16,67%     16,67%
    2nd & 3rd task        6,67%      3,33%
    3rd & 1st task        0,00%      0,00%

16. While completing which of the tasks have you received the most precise recommendations? (may tick 1 or 2 answers)
    1st task              6,67%     16,67%
    2nd task             23,33%     26,67%
    3rd task              3,33%      3,33%
    1st & 2nd task       13,33%      6,67%
    2nd & 3rd task        0,00%      0,00%
    3rd & 1st task        0,00%      0,00%
