GTU Information Technology (Semester 7)
Data Warehousing And Data Mining
June 2015
INSTRUCTIONS
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary


1 (a) Explain the different OLAP operations with examples.
7 M
1 (b) (i) What are the major challenges of mining a huge amount of data in comparison with mining a small amount of data?
4 M
1 (b) (ii) Why is a strong association rule not always interesting? Explain with an example.
3 M

2 (a) Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit.
1) Draw a star schema diagram for the data warehouse.
2) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004?
7 M
2 (b) Define sampling. Explain the different types of sampling techniques with examples.
7 M
2 (c) What is noise? Explain the different techniques for removing noise from data.
7 M

3 (a) How is the dissimilarity between objects computed when they are described by the following types of variables?
1) Interval-scaled variables
2) Asymmetric binary variables
3) Categorical variables.
7 M
3 (b) How can multilevel association rules be mined efficiently using a concept hierarchy?
7 M
3 (c) Suppose that the data mining task is to cluster the following eight points (with (x, y) representing location) into three clusters:
A1 (2, 10), A2 (2, 5), A3 (8, 4), B1 (5, 8), B2 (7, 5), B3 (6, 4), C1 (1, 2), C2 (4, 9).
The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the k-means algorithm to show:
1) The three cluster centers after the first round of execution
2) The final three clusters
7 M
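A hand-worked answer to 3 (c) can be checked with a minimal sketch like the one below. It assumes plain k-means (assign each point to its nearest center, then recompute each center as the mean of its members) and stops when the centers no longer change. The function names `assign`, `update`, and `kmeans` are illustrative only, and the sketch does not handle a cluster becoming empty.

```python
from math import dist  # Euclidean distance, Python 3.8+

# The eight points from the question.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}

def assign(points, centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters

def update(points, clusters):
    """Recompute each center as the mean of its members (assumes no empty cluster)."""
    centers = []
    for members in clusters:
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centers

def kmeans(points, centers):
    """Return a list of (clusters, new_centers) pairs, one per round."""
    history = []
    while True:
        clusters = assign(points, centers)
        new_centers = update(points, clusters)
        history.append((clusters, new_centers))
        if new_centers == centers:  # centers stopped moving
            return history
        centers = new_centers

# Initial centers are A1, B1, and C1, as stated in the question.
history = kmeans(points, [points["A1"], points["B1"], points["C1"]])
for round_no, (clusters, centers) in enumerate(history, start=1):
    print(round_no, clusters, centers)
```

The first printed line corresponds to part 1) and the last to part 2).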
3 (d) Explain linear regression. What are the reasons for not using the linear regression model to estimate the output data?
7 M

4 (a) What is decision tree induction? Write the basic algorithm for inducing a decision tree from training tuples.
7 M
4 (b) (i) List the strengths and weaknesses of a neural network as a classifier.
4 M
4 (b) (ii) How can distance be computed for attributes that have missing values in a k-nearest-neighbour classifier?
3 M
4 (c) A database has 5 transactions. Let min_sup = 60% and min_conf = 80%.
TID items_bought
T100 {M,O,N,K,E,Y}
T200 {D,O,N,K,E,Y}
T300 {M,A,K,E}
T400 {M,U,C,K,Y}
T500 {C,O,O,K,I,E}

1) Find all frequent itemsets using the Apriori algorithm.
2) List all the association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers and each item_i denotes a variable representing an item (e.g., "A", "B", etc.):
∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]
7 M
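A hand-worked answer to 4 (c) can be checked with a minimal level-wise Apriori sketch like the one below. It assumes that items are single letters, that T100 reads {M,O,N,K,E,Y}, and that the repeated O in T500 counts once. With 5 transactions, min_sup = 60% means a support count of at least 3. The names `support_count`, `apriori`, and `rules` are illustrative only.

```python
from itertools import combinations

# Transactions as sets of single-letter items (assumption: T100 = {M,O,N,K,E,Y}).
transactions = {
    "T100": set("MONKEY"),
    "T200": set("DONKEY"),
    "T300": set("MAKE"),
    "T400": set("MUCKY"),
    "T500": set("COOKIE"),  # duplicate O collapses in a set
}
MIN_COUNT = 3   # 60% of 5 transactions
MIN_CONF = 0.8  # 80%

def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions.values() if itemset <= t)

def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items
             if support_count(frozenset([i])) >= MIN_COUNT]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support_count(s)
        prev = set(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent; then count support.
        level = [c for c in candidates
                 if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
                 and support_count(c) >= MIN_COUNT]
    return frequent

def rules(frequent):
    """Rules item1 AND item2 => item3 drawn from frequent 3-itemsets."""
    found = []
    for itemset, count in frequent.items():
        if len(itemset) != 3:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            conf = count / frequent[antecedent]
            if conf >= MIN_CONF:
                support = count / len(transactions)
                found.append((sorted(antecedent), consequent, support, conf))
    return found

frequent = apriori()
found = rules(frequent)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
for antecedent, consequent, support, conf in found:
    print(antecedent, "=>", consequent, support, conf)
```

Only frequent 3-itemsets are expanded into rules because the metarule fixes two items on the left and one on the right.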
4 (d) What are the methods for evaluating the accuracy of a classifier or predictor?
7 M

5 (a) Write a short note on web usage mining.
7 M
5 (b) Discuss the basic principle of attribute-oriented induction.
7 M
5 (c) What is a time-series database? How can time-series data be characterized using trend analysis?
7 M
5 (d) (i) What are the measures for assessing the quality of a text retrieval system?
3 M
5 (d) (ii) What are the terminating conditions for stopping the training process of a neural network classifier?
4 M


