IT446 Data mining
May 5, 2024
College of Computing and Informatics

Question One (2 Marks)
Learning Outcome(s): 4. Evaluate the performance of data mining algorithms.

An online streaming platform wants to analyze whether there is a significant association between users' subscription plans (Basic, Premium, Family) and their preferred device for streaming (Smartphone, Tablet, Laptop). The platform samples 300 subscribers randomly and records their subscription plans and preferred streaming devices. Based on the data, can the streaming platform conclude that there is a relationship between subscription plans and preferred streaming devices among its users? (Use the chi-square test with a significance level (α) of 0.05.)

             Smartphone   Tablet   Laptop
Basic        50           30       20
Premium      60           40       20
Family       30           20       30

Question Two (2 Marks)
Learning Outcome(s): 2. Demonstrate a wide range of clustering, estimation, prediction, and classification algorithms to solve a specific program or application.

It is important to define/select similarity measures in data analysis. However, there is no commonly accepted subjective similarity measure. Results can vary depending on the similarity measures used. Nonetheless, seemingly different similarity measures may be equivalent after some transformation.

Suppose we have the following 2-D data set:

Consider the data as a pair of data points. Given a new data point, x = (1.4, 1.6), as a query, rank the database points based on similarity with the query using Euclidean distance, Manhattan distance, supremum distance, and cosine similarity.

Question Three (2 Marks)
Learning Outcome(s): 3. Employ data mining and data warehousing techniques to solve real-world problems.

Suppose a group of 12 sales price records has been sorted as follows:

5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215

Partition them into three bins by each of the following methods:
(a) equal-frequency (equal-depth) partitioning
(b) equal-width partitioning

Question Four (2 Marks)
Learning Outcome(s): 1. Define different data mining tasks, problems, and the algorithms most appropriate for addressing them.

Suppose that a data warehouse consists of four dimensions (date, spectator, location, and game) and two measures (count and charge), where charge is the fare that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate.
(a) Draw a star schema diagram for the data warehouse.
(b) Starting with the base cuboid [date, spectator, location, game], what specific OLAP operations should be performed to list the total charge paid by student spectators at GM Place in 2010?
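
For Question One, the chi-square test of independence can be sketched in plain Python as below. This is an illustrative sketch, not an official solution: the names `observed`, `chi_square_statistic`, and `CRITICAL_VALUE_DF4_ALPHA_05` are assumptions introduced here, and the critical value 9.488 is the standard chi-square table entry for 4 degrees of freedom at α = 0.05.

```python
# Observed counts from Question One: rows are plans, columns are devices.
observed = {
    "Basic":   {"Smartphone": 50, "Tablet": 30, "Laptop": 20},
    "Premium": {"Smartphone": 60, "Tablet": 40, "Laptop": 20},
    "Family":  {"Smartphone": 30, "Tablet": 20, "Laptop": 30},
}


def chi_square_statistic(table):
    """Return (chi2, degrees_of_freedom, expected) for a two-way table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())

    # Expected count under independence: row total * column total / grand total.
    expected = {
        r: {c: row_totals[r] * col_totals[c] / grand_total for c in cols}
        for r in rows
    }

    # Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
    chi2 = sum(
        (table[r][c] - expected[r][c]) ** 2 / expected[r][c]
        for r in rows
        for c in cols
    )
    dof = (len(rows) - 1) * (len(cols) - 1)
    return chi2, dof, expected


chi2, dof, expected = chi_square_statistic(observed)

# Table value for dof = 4, alpha = 0.05 (hard-coded assumption, not computed).
CRITICAL_VALUE_DF4_ALPHA_05 = 9.488
reject_null = chi2 > CRITICAL_VALUE_DF4_ALPHA_05

print(f"chi2 = {chi2:.3f}, dof = {dof}, reject H0: {reject_null}")
```

With these counts the statistic comes out at about 12.72 on 4 degrees of freedom, which exceeds 9.488, so under this sketch the null hypothesis of independence between subscription plan and preferred device would be rejected at the 0.05 level.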
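
For Question Three, the two partitioning methods can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `equal_frequency_bins` and `equal_width_bins` are hypothetical, the equal-width bins are taken as half-open intervals of width (max - min) / 3 with the maximum value placed in the last bin, and the record count is assumed to divide evenly by the number of bins.

```python
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]


def equal_frequency_bins(values, k):
    """Split sorted values into k bins holding the same number of records."""
    ordered = sorted(values)
    size = len(ordered) // k  # assumes len(values) is divisible by k
    return [ordered[i * size:(i + 1) * size] for i in range(k)]


def equal_width_bins(values, k):
    """Split values into k bins covering intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # Clamp the index so the maximum value falls into the last bin.
        index = min(int((v - lo) / width), k - 1)
        bins[index].append(v)
    return bins


freq_bins = equal_frequency_bins(prices, 3)
width_bins = equal_width_bins(prices, 3)

print("equal-frequency:", freq_bins)
print("equal-width:    ", width_bins)
```

Here the range is 215 - 5 = 210, so each equal-width interval spans 70 units: [5, 75), [75, 145), and [145, 215]. The skew in the data shows up clearly, since nine of the twelve records land in the first equal-width bin while the equal-frequency bins each hold four.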