
data science course


Association Rules Market Basket Analysis

Relationship Mining

Affinity Analysis

© 2013 ExcelR Solutions. All Rights Reserved

Market Basket Analysis

• Large number of transaction records through data collected using bar-code scanners

• Each record = All items purchased on a single purchase transaction


Association Rules

• What item goes with what?
• Are certain groups of items consistently purchased together?
• What business strategies will you devise with this knowledge?

Association Rules

• Product shelf placement – a specific product beside another
• Selling of prominent shelves – Slotting Fees
• Stocking – Supply Chain Management
• Price Bundling – Combo offers. How?

Source:
http://www.economist.com/news/business/21654601-supplier-rebates-are-heart-some-supermarket-chains-woes-buying-up-shelves
http://en.wikipedia.org/wiki/Association_rule_learning

Association Rules – Cell phone faceplates

• A store that sells accessories for cellular phones runs a promotion on faceplates
• OFFER! Buy multiple faceplates from a choice of 6 different colors & get a discount
• How would you help store managers devise a strategy to become more profitable?

Association Rules – Cell phone faceplates

List Format

Transaction #   Faceplate colors purchased
1               Red, White, Green
2               White, Orange
3               White, Blue
4               Red, White, Orange
5               Red, Blue
6               White, Blue
7               White, Orange
8               Red, White, Blue, Green
9               Red, White, Blue
10              Yellow

Binary Matrix Format

Transaction #   Red   White   Blue   Orange   Green   Yellow
1               1     1       0      0        1       0
2               0     1       0      1        0       0
3               0     1       1      0        0       0
4               1     1       0      1        0       0
5               1     0       1      0        0       0
6               0     1       1      0        0       0
7               0     1       0      1        0       0
8               1     1       1      0        1       0
9               1     1       1      0        0       0
10              0     0       0      0        0       1

Association Rules are probabilistic “if-then” statements

2 Main Ideas:

• Examine all possible “if-then” rule formats
• Select only the rules that indicate true dependence
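The binary matrix format can be derived mechanically from the list format. The following is a minimal Python sketch of that conversion; the names `transactions`, `COLORS`, and `to_binary_matrix` are illustrative choices and do not come from the slides.

```python
# Faceplate transactions in list format, copied from the slide data.
transactions = [
    ["Red", "White", "Green"],
    ["White", "Orange"],
    ["White", "Blue"],
    ["Red", "White", "Orange"],
    ["Red", "Blue"],
    ["White", "Blue"],
    ["White", "Orange"],
    ["Red", "White", "Blue", "Green"],
    ["Red", "White", "Blue"],
    ["Yellow"],
]

# Column order follows the binary matrix on the slide.
COLORS = ["Red", "White", "Blue", "Orange", "Green", "Yellow"]


def to_binary_matrix(rows, items):
    """Return one 0/1 row per transaction and one column per item."""
    return [[1 if item in row else 0 for item in items] for row in rows]


matrix = to_binary_matrix(transactions, COLORS)
for number, row in enumerate(matrix, start=1):
    print(number, row)
```

Each row of `matrix` corresponds to one transaction, so reading a single column top to bottom gives the purchase pattern for one color.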

Association Rules – Cell phone faceplates

Rules for {Red, White, Green}:

1. If {Red, White} then {Green}
2. If {Red, Green} then {White}
3. If {White, Green} then {Red}
4. If {Red} then {White, Green}
5. If {White} then {Red, Green}
6. If {Green} then {Red, White}

Problem

• Many rules are possible
• How to select the TRUE/GOOD rules from all generated rules?

Association Rules – Terminology

“IF” part = Antecedent = A
“THEN” part = Consequent = C

• If {Red, White} then {Green}
• If Red & White phone faceplates are purchased, then Green faceplate is purchased
  - Antecedent: Red & White
  - Consequent: Green

Association Rules – Performance Measures

1. Support

2. Confidence

3. Lift

Association Rules – Support

• Consider only combinations that occur with higher frequency in the database
• Support is the frequency-based criterion: the percentage (or number) of transactions in which the IF/Antecedent & THEN/Consequent both appear in the data

Mathematically:

Support = (# transactions in which A & C appear together) / (Total no. of transactions)

Support - Calculation

Transaction #   Faceplate colors purchased
1               Red, White, Green
2               White, Orange
3               White, Blue
4               Red, White, Orange
5               Red, Blue
6               White, Blue
7               White, Orange
8               Red, White, Blue, Green
9               Red, White, Blue
10              Yellow

• What is the support for “if White then Blue”?
  1. 4
  2. 40%
  3. 2
  4. 90%

• What is the support for “if Blue then White”?
  1. 4
  2. 40%
  3. 2
  4. 90%
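The support questions above can be checked with a few lines of Python. This is a minimal sketch on the ten faceplate transactions; the helper names `support_count` and `support` are illustrative and not part of the slides.

```python
# Faceplate transactions from the slides, one set of colors per transaction.
transactions = [
    {"Red", "White", "Green"},
    {"White", "Orange"},
    {"White", "Blue"},
    {"Red", "White", "Orange"},
    {"Red", "Blue"},
    {"White", "Blue"},
    {"White", "Orange"},
    {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"},
    {"Yellow"},
]


def support_count(itemset, rows):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for row in rows if set(itemset) <= row)


def support(itemset, rows):
    """Support as a fraction of all transactions."""
    return support_count(itemset, rows) / len(rows)


# A rule's support depends only on the combined item set A & C, so
# "if White then Blue" and "if Blue then White" share the same support.
print(support_count({"White", "Blue"}, transactions))  # 4
print(support({"White", "Blue"}, transactions))        # 0.4
```

Because support counts the combined item set, swapping antecedent and consequent never changes it; only confidence and the direction of the rule differ.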

Support - Problem

• Generating all possible rules is exponential in the number of distinct items
• Solution: Frequent item sets using Apriori Algorithm

Apriori Algorithm

For k products:

1. Set minimum support criteria
2. Generate list of one-item sets that meet the support criterion
3. Use list of one-item sets to generate list of two-item sets that meet support criterion
4. Use list of two-item sets to generate list of three-item sets that meet support criterion
5. Continue up through k-item sets
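The steps above can be sketched in Python as follows. This is one simple way to write the level-wise search, not a tuned implementation; the function name `apriori` and its parameters are illustrative, and the data is the faceplate example with a minimum support count of 2.

```python
from itertools import combinations

# Faceplate transactions from the slides.
transactions = [
    {"Red", "White", "Green"}, {"White", "Orange"}, {"White", "Blue"},
    {"Red", "White", "Orange"}, {"Red", "Blue"}, {"White", "Blue"},
    {"White", "Orange"}, {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"}, {"Yellow"},
]


def apriori(rows, min_count):
    """Return {frozenset(item set): support count} for all frequent item sets."""
    rows = [frozenset(row) for row in rows]

    # Step 2: one-item sets that meet the support criterion.
    counts = {}
    for row in rows:
        for item in row:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    # Steps 3-5: build k-item sets from the frequent (k-1)-item sets.
    k = 2
    while current:
        # Join: unions of two frequent (k-1)-item sets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-item subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for candidate in candidates:
            n = sum(1 for row in rows if candidate <= row)
            if n >= min_count:
                current[candidate] = n
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori(transactions, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is what keeps the search from becoming exponential: an item set is only counted if all of its smaller subsets already met the support criterion.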

Support – Criterion = 2

Transaction #   Faceplate colors purchased
1               Red, White, Green
2               White, Orange
3               White, Blue
4               Red, White, Orange
5               Red, Blue
6               White, Blue
7               White, Orange
8               Red, White, Blue, Green
9               Red, White, Blue
10              Yellow

Create rules from frequent item sets only

Item set               Support (Count)
{Red}                  5
{White}                8
{Blue}                 5
{Orange}               3
{Green}                2
{Red, White}           4
{Red, Blue}            3
{Red, Green}           2
{White, Blue}          4
{White, Orange}        3
{White, Green}         2
{Red, White, Blue}     2
{Red, White, Green}    2

Support Criterion Example

Rules for {Red, White, Green}:

1. If {Red, White} then {Green}
2. If {Red, Green} then {White}
3. If {White, Green} then {Red}
4. If {Red} then {White, Green}
5. If {White} then {Red, Green}
6. If {Green} then {Red, White}

How good are these rules beyond the point that they have high support?

Association Rules – Confidence

• Percentage of If/Antecedent transactions that also have the Then/Consequent item set

Mathematically:

Confidence = P(Consequent | Antecedent) = P(C & A) / P(A)
           = (# transactions in which A & C appear together) / (# transactions with A)

Confidence - Calculation

Transaction #   Faceplate colors purchased
1               Red, White, Green
2               White, Orange
3               White, Blue
4               Red, White, Orange
5               Red, Blue
6               White, Blue
7               White, Orange
8               Red, White, Blue, Green
9               Red, White, Blue
10              Yellow

• What is the confidence for “if White then Blue”?
  1. 4/5
  2. 5/8
  3. 5/4
  4. 4/8

• What is the confidence for “if Blue then White”?
  1. 4/5
  2. 5/8
  3. 5/4
  4. 4/8
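The confidence questions above can be worked through in Python. This is a minimal sketch on the faceplate data; the helper names `count` and `confidence` are illustrative and not part of the slides.

```python
# Faceplate transactions from the slides.
transactions = [
    {"Red", "White", "Green"}, {"White", "Orange"}, {"White", "Blue"},
    {"Red", "White", "Orange"}, {"Red", "Blue"}, {"White", "Blue"},
    {"White", "Orange"}, {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"}, {"Yellow"},
]


def count(itemset, rows):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for row in rows if set(itemset) <= row)


def confidence(antecedent, consequent, rows):
    """(# transactions with A & C) / (# transactions with A)."""
    both = count(set(antecedent) | set(consequent), rows)
    return both / count(antecedent, rows)


# Unlike support, confidence depends on the direction of the rule.
print(confidence({"White"}, {"Blue"}, transactions))  # 4/8 = 0.5
print(confidence({"Blue"}, {"White"}, transactions))  # 4/5 = 0.8
```

The numerator is the same in both directions (4 transactions contain White and Blue); only the denominator changes, from 8 White transactions to 5 Blue transactions.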

Confidence - Weakness

• If the antecedent and the consequent both have high support, confidence will be high (biased) even when the two are independent

Association Rules – Lift Ratio

Lift Ratio = Confidence / Benchmark confidence

Benchmark assumes independence between antecedent & consequent:

P(antecedent & consequent) = P(antecedent) × P(consequent)

Benchmark confidence:

P(C|A) = P(C & A) / P(A) = P(C) × P(A) / P(A) = P(C)
       = (# transactions with consequent item sets) / (# transactions in database)

Interpreting Lift

• Lift > 1 indicates a rule that is useful in finding consequent item sets
• Such a rule finds the consequent more often than selecting transactions at random would

Lift - Calculation

Transaction #   Faceplate colors purchased
1               Red, White, Green
2               White, Orange
3               White, Blue
4               Red, White, Orange
5               Red, Blue
6               White, Blue
7               White, Orange
8               Red, White, Blue, Green
9               Red, White, Blue
10              Yellow

• What is the Lift for “if White then Blue”?
  1. 4/8
  2. 5/10
  3. 4/5
  4. 1
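The lift question above can be checked in Python. This is a minimal sketch on the faceplate data; the helper names `count` and `lift` are illustrative. The ratio confidence / benchmark confidence is rearranged into a single division so the arithmetic stays exact for these small counts.

```python
# Faceplate transactions from the slides.
transactions = [
    {"Red", "White", "Green"}, {"White", "Orange"}, {"White", "Blue"},
    {"Red", "White", "Orange"}, {"Red", "Blue"}, {"White", "Blue"},
    {"White", "Orange"}, {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"}, {"Yellow"},
]


def count(itemset, rows):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for row in rows if set(itemset) <= row)


def lift(antecedent, consequent, rows):
    """Confidence divided by benchmark confidence.

    confidence = count(A & C) / count(A)
    benchmark  = count(C) / N
    lift       = N * count(A & C) / (count(A) * count(C))
    """
    both = count(set(antecedent) | set(consequent), rows)
    return len(rows) * both / (count(antecedent, rows) * count(consequent, rows))


print(lift({"White"}, {"Blue"}, transactions))          # (4/8) / (5/10) = 1.0
print(lift({"Green"}, {"Red", "White"}, transactions))  # (2/2) / (4/10) = 2.5
```

A lift of 1.0 for "if White then Blue" means that knowing White was purchased does not change the chance of Blue at all, even though the rule's support and confidence look respectable.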

Rules selection process

Generate all rules that meet the specified Support & Confidence:

• Find frequent item sets by applying the specified minimum support cutoff
• From these item sets, generate rules and keep only those that meet the specified minimum Confidence

Rules

Inputs

# Transactions in Input Data    10
# Columns in Input Data         6
# Items in Input Data           6
# Association Rules             8
Minimum Support                 2
Minimum Confidence              70.00%

List of Rules

Rule: If all Antecedent items are purchased, then with Confidence percentage Consequent items will also be purchased.

Row ID   Confidence %   Antecedent (A)   Consequent (C)   Support for A   Support for C   Support for A&C   Lift Ratio
8        100            green            red & white      2               4               2                 2.5
4        100            green            red              2               5               2                 2
6        100            white & green    red              2               5               2                 2
3        100            orange           white            3               8               3                 1.25
5        100            green            white            2               8               2                 1.25
7        100            red & green      white            2               8               2                 1.25
1        80             red              white            5               8               4                 1
2        80             blue             white            5               8               4                 1
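The rule list above can be reproduced with a short Python sketch. For brevity it enumerates every item set by brute force instead of running Apriori, which is only reasonable because there are six items; the constant names `MIN_COUNT` and `MIN_CONFIDENCE` and the tuple layout of `rules` are illustrative choices.

```python
from itertools import combinations

# Faceplate transactions from the slides.
transactions = [
    {"Red", "White", "Green"}, {"White", "Orange"}, {"White", "Blue"},
    {"Red", "White", "Orange"}, {"Red", "Blue"}, {"White", "Blue"},
    {"White", "Orange"}, {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"}, {"Yellow"},
]
MIN_COUNT = 2          # minimum support (count), as in the inputs above
MIN_CONFIDENCE = 0.70  # minimum confidence, as in the inputs above


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for row in transactions if itemset <= row)


items = sorted(set().union(*transactions))
n = len(transactions)
rules = []

for size in range(2, len(items) + 1):
    for combo in combinations(items, size):
        itemset = frozenset(combo)
        both = count(itemset)
        if both < MIN_COUNT:
            continue  # not a frequent item set, so no rules from it
        # Split the frequent item set into every antecedent/consequent pair.
        for k in range(1, size):
            for left in combinations(combo, k):
                a = frozenset(left)
                c = itemset - a
                conf = both / count(a)
                if conf >= MIN_CONFIDENCE:
                    lift = n * both / (count(a) * count(c))
                    rules.append((sorted(a), sorted(c), conf, lift))

# Highest lift first, as in the table.
rules.sort(key=lambda r: (-r[3], -r[2]))
for a, c, conf, lift in rules:
    print(f"{' & '.join(a)} -> {' & '.join(c)}: confidence {conf:.0%}, lift {lift:.2f}")
```

With these inputs the sketch yields the same eight rules as the table, including the two rules with 80% confidence whose lift is exactly 1.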

Profusion of rules

Alarming!

• Random data can generate apparently interesting association rules
• The more rules you produce, the greater the danger
• Rules based on large numbers of records are less subject to this danger

Applications

• What if Product & Store are selected as a tuple for analysis?
• What if the crimes in different geographies for each week are known?

[Figure: crime categories shown are Narcotics, Battery, Assault, Robbery, Public Peace Violation]

Recap with an example

• How can you use the information if you know about the purchase history of customers in a specific geography?
• Supermarket database has 100,000 POS transactions
• 2000 transactions include both Strepsils & Orange Juice
• 800 of the above 2000 include Soup purchases

Recap with an example

• What is the support for rule “IF (Orange Juice & Strepsils) are purchased THEN (Soup) is purchased on the same trip”?
  1. 0.8 %
  2. 2 %
  3. 40 %

• What is the confidence for rule “IF (Orange Juice & Strepsils) are purchased THEN (Soup) is purchased on the same trip”?
  1. 0.8 %
  2. 2 %
  3. 40 %

Recap with an example • What is the lift ratio for rule “IF (Orange Juice & Strepsils) are purchased THEN (Soup) is purchased on the same trip”?
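The recap arithmetic can be laid out in Python as a sketch. Support and confidence follow directly from the three counts given in the example. The lift ratio also needs the total number of transactions that include Soup, which the example does not state, so `soup_count` is left as a parameter and the value 5,000 used in the last line is purely hypothetical.

```python
total_transactions = 100_000  # POS transactions in the database
antecedent_count = 2_000      # include both Strepsils & Orange Juice
both_count = 800              # of those, also include Soup

# Support: transactions with antecedent AND consequent over all transactions.
support = both_count / total_transactions
# Confidence: transactions with antecedent AND consequent over antecedent transactions.
confidence = both_count / antecedent_count

print(f"support = {support:.1%}")        # support = 0.8%
print(f"confidence = {confidence:.1%}")  # confidence = 40.0%


def lift_ratio(soup_count):
    """Lift = confidence / (soup_count / total_transactions).

    soup_count is the number of transactions that include Soup at all.
    It is NOT given in the example and must be supplied by the reader.
    """
    return both_count * total_transactions / (antecedent_count * soup_count)


# Hypothetical value, for illustration only: if 5,000 transactions had Soup.
print(lift_ratio(5_000))  # 8.0
```

Whatever the real Soup count is, the rule is useful only if the resulting lift is above 1, that is, if fewer than 40% of all transactions include Soup.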


Sequential Pattern Mining

IT IS NOT: purchases / events that occur at the same time

• If person X has taken “Data Mining Unsupervised” training in the 1st Quarter, person X has also taken “Data Mining Supervised” training in the 2nd Quarter
• Based on the statement above, recommend “Data Mining Supervised” training to those who have enrolled for “Data Mining Unsupervised”

Association Rules vs. Sequential Pattern Mining

• Look for temporal patterns
• Order/sequence of a & b matters for a rule “b follows a”
• However, what happens in between a & b doesn’t matter
• In phone faceplates dataset:
  - Associations among items that were bought within the same week were discovered
  - How about finding what customers would buy next week or the week after, if they had bought ‘x’ this week?
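The “b follows a” idea can be illustrated with a small Python sketch. The purchase histories below are made up for illustration (the slides give no sequential data), with one faceplate color per week in time order; the names `follows` and `sequence_support` are likewise hypothetical.

```python
# Hypothetical week-by-week purchase histories, oldest purchase first.
# This data is invented for illustration and does not come from the slides.
histories = {
    "customer_1": ["Red", "Orange", "Blue"],
    "customer_2": ["Blue", "Red"],
    "customer_3": ["Red", "Blue"],
    "customer_4": ["White"],
}


def follows(sequence, a, b):
    """True if b is bought at some point after the first purchase of a.

    Whatever is bought in between a and b is ignored.
    """
    if a not in sequence:
        return False
    first_a = sequence.index(a)
    return b in sequence[first_a + 1:]


def sequence_support(rows, a, b):
    """Fraction of customers whose history contains 'b follows a'."""
    return sum(follows(seq, a, b) for seq in rows.values()) / len(rows)


# Order matters: "Blue follows Red" and "Red follows Blue" differ.
print(sequence_support(histories, "Red", "Blue"))  # 0.5
print(sequence_support(histories, "Blue", "Red"))  # 0.25
```

Customer 1 still counts toward "Blue follows Red" despite the Orange purchase in between, while customer 2 bought the same two colors in the opposite order and does not.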

Applications

• Identify the appropriate Basket
• Identify popular taxi routes
  - Sequential pattern from GPS tracks; spatiotemporal records of taxi trajectories
  - First cluster collocated customers

THANK YOU
