GTU Data Warehousing And Data Mining - June 2015 Exam Question Paper

GTU Information Technology (Semester 7)
Data Warehousing And Data Mining
June 2015

Total marks: --
Total time: --

INSTRUCTIONS
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary

1 (a) Explain the different OLAP operations with examples.

7 M

1 (b) (i) What are the major challenges of mining a huge amount of data in comparison with mining a small amount of data?

4 M

1 (b) (ii) Why is a strong association rule not always interesting? Explain with an example.

3 M

2 (a) Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit.
1) Draw a star schema diagram for the data warehouse.
2) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004?

7 M

2 (b) Define sampling. Explain the different types of sampling techniques with examples.

7 M

2 (c) What is noise? Explain the different techniques to remove the noise from data.

7 M

3 (a) How is the dissimilarity computed between objects described by the following types of variables?
1) Interval-scaled variables
2) Asymmetric binary variables
3) Categorical variables.

7 M

3 (b) How can multilevel association rules be mined efficiently using a concept hierarchy?

7 M

3 (c) Suppose that the data mining task is to cluster the following eight points (with (x, y) representing location) into three clusters:
A₁(2, 10), A₂(2, 5), A₃(8, 4), B₁(5, 8), B₂(7, 5), B₃(6, 4), C₁(1, 2), C₂(4, 9).
The distance function is Euclidean distance. Suppose initially we assign A₁, B₁, and C₁ as the center of each cluster, respectively. Use the k-means algorithm to show:
1) The three cluster centers after the first round execution
2) The final three clusters

7 M
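
For question 3 (c), a hand-worked answer can be cross-checked with a short script. The following is a minimal sketch of plain k-means on the eight given points, not a model answer. The names `points`, `assign`, `recompute`, and `kmeans` are illustrative choices, and the code assumes no cluster ever becomes empty, which holds for this data.

```python
from math import dist  # Euclidean distance, Python 3.8+

# The eight points from the question.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}
initial_centers = [points["A1"], points["B1"], points["C1"]]


def assign(centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters


def recompute(clusters):
    """Return the mean point of each cluster (assumes no empty cluster)."""
    centers = []
    for members in clusters:
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centers


def kmeans(centers):
    """Iterate until the assignment stops changing.

    Returns the final clusters, the final centers, and the list of
    centers produced after each round.
    """
    clusters, history = None, []
    while True:
        new_clusters = assign(centers)
        if new_clusters == clusters:
            return clusters, centers, history
        clusters = new_clusters
        centers = recompute(clusters)
        history.append(centers)


final_clusters, final_centers, history = kmeans(initial_centers)
first_round_centers = history[0]
print("Centers after round 1:", first_round_centers)
print("Final clusters:", final_clusters)
```

`first_round_centers` corresponds to part 1) and `final_clusters` to part 2).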

3 (d) Explain linear regression. What are the reasons for not using the linear regression model to estimate the output data?

7 M

4 (a) What is decision tree induction? Write the basic algorithm for inducing a decision tree from training tuples.

7 M

4 (b) (i) List the strengths and weaknesses of a neural network as a classifier.

4 M

4 (b) (ii) How can distance be computed for attributes that have missing values in a k-nearest-neighbour classifier?

3 M

4 (c) A database has 5 transactions. Let min_sup = 60% and min_conf = 80%.

TID	items_bought
T100	{M,O,N,K,E,Y}
T200	{D,O,N,K,E,Y}
T300	{M,A,K,E}
T400	{M,U,C,K,Y}
T500	{C,O,O,K,I,E}

1) Find all frequent itemsets using the Apriori algorithm.
2) List all the association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers, and item1, item2, and item3 denote variables representing items (e.g., "A", "B", etc.):
∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) → buys(X, item3) [s, c]

7 M
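
For question 4 (c), the level-wise Apriori search can be sketched in a few lines to check a hand-worked answer. This is an illustrative sketch, not a model answer. It assumes T100 is read as {M,O,N,K,E,Y}, treats each transaction as a set (so the repeated O in T500 counts once), and uses hypothetical names such as `support_count`, `apriori`, and `metarule_matches`. Thresholds are compared as integer percentages to avoid floating-point issues.

```python
from itertools import combinations

# Transactions from the question, stored as sets of single-letter items.
transactions = {
    "T100": set("MONKEY"),
    "T200": set("DONKEY"),
    "T300": set("MAKE"),
    "T400": set("MUCKY"),
    "T500": set("COOKIE"),
}
MIN_SUP_PCT = 60   # min_sup = 60%
MIN_CONF_PCT = 80  # min_conf = 80%
N = len(transactions)


def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def is_frequent(count):
    return count * 100 >= MIN_SUP_PCT * N


def apriori():
    """Return {frequent itemset: support count}, found level by level."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items
             if is_frequent(support_count(frozenset([i])))]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support_count(s)
        k += 1
        prev = set(level)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then count support.
        level = [c for c in candidates
                 if all(frozenset(sub) in prev
                        for sub in combinations(c, k - 1))
                 and is_frequent(support_count(c))]
    return frequent


def metarule_matches(frequent):
    """Rules item1 AND item2 -> item3 that meet the confidence threshold."""
    found = []
    for itemset, count in frequent.items():
        if len(itemset) != 3:
            continue
        for consequent in sorted(itemset):
            antecedent = itemset - {consequent}
            ante_count = frequent[antecedent]
            if count * 100 >= MIN_CONF_PCT * ante_count:
                found.append((tuple(sorted(antecedent)), consequent,
                              count / N, count / ante_count))
    return found


frequent = apriori()
rules = metarule_matches(frequent)
for antecedent, consequent, s, c in rules:
    print(f"{antecedent} -> {consequent}  [s={s:.0%}, c={c:.0%}]")
```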

4 (d) What are the methods to evaluate the accuracy of a classifier/predictor?

7 M

5 (a) Write a short note on web usage mining.

7 M

5 (b) Discuss the basic principle of attribute-oriented induction.

7 M

5 (c) What is a time-series database? How can time-series data be characterized using trend analysis?

7 M

5 (d) (i) What are the measures for assessing the quality of a text retrieval system?

3 M

5 (d) (ii) What are the terminating conditions for stopping the training process of a neural network classifier?

4 M
