
association rules

Association rules measure the likelihood that two or more factor levels (categories) appear together in the same observation (row). You could imagine that Aidan would want to know how likely it is that particular beers are purchased on the same bill…

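library(arules)
library(magrittr)  ## provides the %>% pipe used below

## titanic.raw is assumed to be a data frame of factors already loaded in the session
rules.surv <- titanic.raw %>% apriori(
             control = list(verbose=FALSE),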
             parameter = list(minlen=2, supp=0.005, conf=0.8),
             appearance = list(rhs=c("Survived=No",
                                     "Survived=Yes"),
                               default="lhs"))
## keep three decimal places
quality(rules.surv) <- rules.surv %>% quality() %>% round(digits=3)
## sort rules by lift
rules.surv.sorted <- rules.surv %>% sort(by="lift")

## ----inspect rules-------------------------------------------------------
rules.surv.sorted %>% inspect() ## print rules

This gives you a nice output:

lhs                                  rhs            support  confidence  lift
Class=2nd, Age=Child              => Survived=Yes   0.011    1.000       3.096
Class=2nd, Sex=Female, Age=Child  => Survived=Yes   0.006    1.000       3.096
Class=1st, Sex=Female             => Survived=Yes   0.064    0.972       3.010

Definitions:

support -> Fraction of transactions/obs that contain both LHS and RHS

confidence -> Fraction of the transactions/obs containing LHS that also contain RHS

lift -> Confidence divided by the overall fraction of transactions/obs that contain RHS. A lift greater than 1 indicates that LHS and RHS appear together more often than expected if they were independent. A lift smaller than 1 indicates that they appear together less often than expected
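
To make the three measures concrete, here is a minimal sketch that computes them by hand in base R for a single rule. The data frame `toy` and its values are invented for illustration and are not the Titanic data. The rule checked is Sex=Female => Survived=Yes.

```r
## Toy data, invented for illustration only (not the Titanic data)
toy <- data.frame(
  Sex      = c("Female", "Female", "Female", "Male", "Male", "Male", "Male", "Male"),
  Survived = c("Yes",    "Yes",    "No",     "No",   "No",   "No",   "Yes",  "No")
)

## Rule under test: Sex=Female => Survived=Yes
lhs <- toy$Sex == "Female"      ## rows that match the left-hand side
rhs <- toy$Survived == "Yes"    ## rows that match the right-hand side

support    <- mean(lhs & rhs)             ## share of all rows with both LHS and RHS
confidence <- sum(lhs & rhs) / sum(lhs)   ## share of LHS rows that also have RHS
lift       <- confidence / mean(rhs)      ## confidence relative to the base rate of RHS

round(c(support = support, confidence = confidence, lift = lift), 3)
## support = 0.250, confidence = 0.667, lift = 1.778
```

The same arithmetic can be read back from the output above: a rule with confidence 1.000 and lift 3.096 implies that roughly 1 / 3.096 ≈ 0.323 of all observations have Survived=Yes.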

One can use association rules to predict/model future combinations…