
Computer Science & Engineering
CSE 4412
Aijun An

York University
CSE 4412: Data Mining
Solutions to Assignment 1 (Fall 2012)

1. We prove this by contradiction. Assume there is a frequent itemset of length k+1, {i_1, i_2, ..., i_k, i_{k+1}}, that is missed by the join step. Since {i_1, i_2, ..., i_k, i_{k+1}} is frequent, its subsets {i_1, i_2, ..., i_{k-1}, i_k} and {i_1, i_2, ..., i_{k-1}, i_{k+1}} must also be frequent according to the Apriori property. This means that both subsets are in L_k. Because they are in L_k and they share the first k-1 items, they will be joined into {i_1, i_2, ..., i_k, i_{k+1}} when generating C_{k+1}. This contradicts the assumption made at the beginning.

2. (a) Finding all frequent itemsets containing item i.

min_sup_count = min_sup * (number of transactions) = 30% * 10 = 3

The frequent 1-itemsets are listed in the header table below. Since we are only looking for frequent itemsets containing item i, we put i last in the header table in order to prune the tree: we then only need to scan the i-conditional pattern base, read from the leaves of the FP-tree.

Original F-list:        F-list after moving i to the end:
Item  Count              Item  Count
a     7                  a     7
i     6                  c     5
c     5                  r     5
r     5                  s     4
s     4                  e     3
e     3                  l     3
l     3                  i     6

The transactions after removing infrequent items, with the remaining items ordered according to the modified F-list:

TID  Items bought               Ordered frequent items
1    {s, u, s, a, n}            {a, s}
2    {s, a, r, a}               {a, r, s}
3    {s, a, m}                  {a, s}
4    {r, i, c, h, a, r, d}      {a, c, r, i}
5    {e, r, i, c}               {c, r, e, i}
6    {n, i, c, k}               {c, i}
7    {p, a, t, r, i, c, k}      {a, c, r, i}
8    {e, m, i, l, y}            {e, l, i}
9    {c, h, a, r, l, e, s}      {a, c, r, s, e, l}
10   {l, i, d, a}               {a, l, i}

In the following step, we could have built the tree using only the transactions that contain item i, removing every path whose leaf is an item other than i. Here we build the whole tree anyway.
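The header-table construction and transaction reordering described above can be sketched in Python (a minimal sketch; the variable names are mine and not part of the assignment solution):

```python
from collections import Counter

# Transactions from the table above.
transactions = [
    ['s', 'u', 's', 'a', 'n'],
    ['s', 'a', 'r', 'a'],
    ['s', 'a', 'm'],
    ['r', 'i', 'c', 'h', 'a', 'r', 'd'],
    ['e', 'r', 'i', 'c'],
    ['n', 'i', 'c', 'k'],
    ['p', 'a', 't', 'r', 'i', 'c', 'k'],
    ['e', 'm', 'i', 'l', 'y'],
    ['c', 'h', 'a', 'r', 'l', 'e', 's'],
    ['l', 'i', 'd', 'a'],
]

min_sup_count = 3  # 30% of 10 transactions

# Count each item once per transaction (set semantics), keep frequent ones.
counts = Counter(item for t in transactions for item in set(t))
frequent = {item: c for item, c in counts.items() if c >= min_sup_count}

# F-list: descending support, with 'i' forced to the end as in the solution.
flist = sorted((x for x in frequent if x != 'i'),
               key=lambda x: (-frequent[x], x)) + ['i']
rank = {item: pos for pos, item in enumerate(flist)}

# Project each transaction onto the F-list order, dropping duplicates
# and infrequent items.
ordered = [sorted(set(t) & frequent.keys(), key=rank.__getitem__)
           for t in transactions]

print(flist)       # ['a', 'c', 'r', 's', 'e', 'l', 'i']
print(ordered[0])  # ['a', 's']
```

The ordered transactions are exactly the rows fed into the FP-tree in the next step.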
The FP-tree is as follows: (FP-tree figure not reproduced in this copy.)

i-conditional pattern base:
acr: 2,  al: 1,  cre: 1,  c: 1,  el: 1

After removing locally infrequent items (e and l each have count 2 < 3):
acr: 2,  a: 1,  cr: 1,  c: 1

In this step, we first generate the frequent itemsets {a, i}, {r, i} and {c, i}.

To mine longer patterns containing i, we build the i-conditional FP-tree and partition the patterns to be mined:

To find patterns containing ri, generate the ri-conditional pattern base:
ca: 2,  c: 1
Local frequent item: c (count 2 + 1 = 3)  =>  frequent itemset {c, r, i}

To find patterns containing ai but not ri, generate the ai-conditional pattern base:
c: 2
Local frequent items: none

Finally, all frequent itemsets that contain item i are:
{i}: 6,  {a, i}: 3,  {c, i}: 4,  {r, i}: 3,  {c, r, i}: 3

(b) Finding all strong association rules with i as the consequent:

Itemset        Rule with i as consequent   Confidence    Strong?
{a, i}: 3      a -> i                      3/7 = 0.43    No
{c, i}: 4      c -> i                      4/5 = 0.8     Yes
{r, i}: 3      r -> i                      3/5 = 0.6     Yes
{c, r, i}: 3   cr -> i                     3/4 = 0.75    Yes

(c) Finding misleading associations:

Among the strong association rules found above, r -> i (30%, 60%) is a misleading rule. The lift of this rule is 30% / (50% * 60%) = 1, which means r and i are independent: the percentage of transactions containing i is exactly the same as the percentage of transactions containing i given that the transaction contains r.

3. In order to incorporate such a constraint (i.e., finding all frequent itemsets that satisfy both the support threshold and an average price of at
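The confidence and lift figures for the rule r -> i can be checked directly from the supports (a minimal sketch; the support counts are taken from the mining result above):

```python
n = 10  # number of transactions
sup = {('r',): 5, ('i',): 6, ('r', 'i'): 3}  # support counts from part (a)

# Confidence of r -> i: P(i | r) = sup(r, i) / sup(r)
confidence = sup[('r', 'i')] / sup[('r',)]

# Lift of r -> i: P(r, i) / (P(r) * P(i))
lift = (sup[('r', 'i')] / n) / ((sup[('r',)] / n) * (sup[('i',)] / n))

print(confidence)  # 0.6
print(lift)        # 1.0  -> r and i are statistically independent
```

A lift of exactly 1 confirms the conclusion in part (c): the rule is strong by confidence yet carries no correlation information.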