What is the Apriori Method in Machine Learning?
One fresh morning I visited the “Reliance Fresh” market, planning to buy only the items on my list. Instead, I came home with several extra items that I had never planned to buy, and somehow it felt as if I really needed them too. But why did that happen?
Aspirants, the story is still incomplete!
The twist in this scenario is that the manager had placed the items that go hand in hand next to each other. The logic behind it: a shopper who visits a mall tends to buy only the things he planned for, and if he doesn’t roam the complete mall he never sees anything else. But once related items sit together, he is tempted to pick them up as well. This is a typical marketing strategy, applied almost everywhere in retail. Now let me show you what we are going to do with this marketing example.
What is the Apriori method?
The Apriori algorithm was proposed by Rakesh Agrawal and Ramakrishnan Srikant in 1994. It is designed to operate on databases of transactions (for example, collections of items bought by customers), where each transaction is seen as a set of items (an itemset). Given a threshold, it finds the itemsets that occur in at least that share of the transactions.
The algorithm helps trim down the list of candidate items from the given transactions and yields a set of possible rules. It eliminates the item combinations that fall below the threshold and builds association rules between the remaining items. It relies on three important measures, “support, confidence and lift”, which are covered in the association rules content on InsideAIML.
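As a rough illustration of these three measures, here is a minimal Python sketch. The function names (`support`, `confidence`, `lift`) and the toy baskets are assumptions made up for this example; they are not taken from the article's dataset or from any library.

```python
# Toy transactions, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of the transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent alone."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


print(support({"bread", "milk"}, baskets))
print(confidence({"bread"}, {"milk"}, baskets))
print(lift({"bread"}, {"milk"}, baskets))
```

A lift above 1 suggests the two sides of a rule appear together more often than chance would predict, and a lift below 1 suggests the opposite.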
The name Apriori comes from the fact that the algorithm uses prior knowledge about frequent itemsets: what it learned about smaller itemsets in earlier passes decides which larger itemsets are worth evaluating. It mines frequent itemsets for Boolean association rules from a given dataset, and it relies on a mathematical property called the “Downward Closure Property”.
Let’s begin by understanding:
- What is the Apriori algorithm?
- What is the downward closure property?
- How are these two linked with each other, using the same example?
Example:- This is the list of grocery transactions that took place in the “Reliance Fresh” market, which links back to the story and is used throughout this content.
Now, let’s take the same set of items and understand the method step by step, with the transaction IDs mentioned below.
For simplicity, let’s create a short representation:
- sausage → 1st item (1)
- whole milk → 2nd item (2)
- canned beer → 3rd item (3)
- soda → 4th item (4)
- curd → 5th item (5)
From the above image, I want to keep only those items whose support is at least 50%; anything below that is eliminated.
Step:-1
- The method starts from the raw transactions and eliminates the items that appear in less than 50% of them.
- TID 100 contains items 1, 3 and 4 bought together, TID 200 contains items 2, 3 and 5, and so on, up to TID 400, which contains items 2 and 5.
Step:-2
After this check, it counts how often each of the products 1, 2, 3, 4 and 5 was purchased.
Here, the second image (C1, scan D) shows that item 1 was bought in two of the four transactions, a support of 2/4, while the second item was bought three times, a support of 3/4, and so on.
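The counting in this step can be sketched in a few lines of Python. Note one assumption: the text lists TID 100, 200 and 400 but not TID 300, so the basket {1, 2, 3, 5} used here is assumed. It is a basket consistent with the counts quoted in the walk-through (item 1 in 2/4, item 2 in 3/4, final itemset {2, 3, 5}).

```python
from collections import Counter

# Transactions keyed by TID, using the numeric item codes from the list above.
transactions = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},  # assumed: this TID is not listed in the text
    400: {2, 5},
}

# One scan of the database: count the transactions that contain each item.
item_counts = Counter(item for basket in transactions.values() for item in basket)

n = len(transactions)
for item in sorted(item_counts):
    print(f"item {item}: {item_counts[item]}/{n}")
```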
Step:-3
- The items with a support of at least 50% are kept and the rest are eliminated, as shown in the picture (L1), where the 4th item is eliminated since it could not meet the set threshold.
Step:-4
After the elimination, it pairs up the remaining items into the possible itemsets (figure C2), such as (1,2), (1,3), (1,5) and so on, and again eliminates the pairs whose support is below 50%.
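The pairing and elimination in this step might look like the sketch below. The third basket is an assumption (TID 300 is not listed in the text), and the `support` helper is a name introduced only for this illustration.

```python
from itertools import combinations

# Baskets for TID 100, 200, 300, 400; the third one is assumed.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_support = 0.5

# Items that survived the previous step (item 4 was eliminated).
frequent_items = [1, 2, 3, 5]


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


# Candidate pairs: every 2-item combination of the surviving items.
candidate_pairs = [frozenset(pair) for pair in combinations(frequent_items, 2)]

# Keep only the pairs that meet the threshold.
frequent_pairs = [pair for pair in candidate_pairs if support(pair) >= min_support]

for pair in candidate_pairs:
    status = "kept" if pair in frequent_pairs else "eliminated"
    print(sorted(pair), support(pair), status)
```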
Step:-5
- Follow the images step by step; the elimination stops at the last picture (scan L3), leaving only the itemsets that satisfy the set threshold.
- So the final itemset left over from the sheet is {2, 3, 5}.
- The elimination ends when no itemsets below the set threshold are left and no larger itemsets can be formed.
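Putting the five steps together, a minimal level-by-level sketch of the procedure could look like this. It is a simplified teaching version written under stated assumptions, not a tuned implementation: the function name `apriori` is chosen for this example, and the third basket is assumed because TID 300 is not listed in the text.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: single items that meet the threshold.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            current[single] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join: merge frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Scan: keep the candidates that meet the threshold.
        scored = {c: support(c) for c in candidates}
        current = {c: s for c, s in scored.items() if s >= min_support}
    return frequent


# Baskets for TID 100, 200, 300, 400; the third one is assumed.
baskets = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(baskets, min_support=0.5)
largest = max(result, key=len)
print(sorted(largest), result[largest])
```

On these four baskets the largest surviving itemset is {2, 3, 5}, matching the walk-through.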
Limitations of the Apriori algorithm:-
- It is a time-consuming process: as the number of transactions increases, the more candidate itemsets and rules it has to evaluate to produce the output, which can also flood the result with junk rules.
- Apriori becomes very slow and inefficient when memory capacity is limited and the number of transactions is large.
- The algorithm scans the database many times, which reduces the overall performance.
To overcome these issues, different approaches can be used, as listed below.
Approaches to overcome the limitations of the Apriori algorithm
- Increase the RAM of the system, that is, throw more compute and memory at the problem.
- Otherwise, increase the minimum support value, so that fewer itemsets survive each pass.
- Use another algorithm called FP-Growth.
- We can also use Naive Bayes rules.
But the most important rule of thumb to keep in mind with the Apriori algorithm is:
“Rules with low support might be valuable”, so raising the support threshold too far can throw away useful rules.
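The trade-off behind raising the support value can be seen on the four grocery transactions from the example. The sketch below simply enumerates every itemset by brute force, which is only workable for a handful of items. The helper name `frequent_itemsets` is made up for this illustration, and the third basket is an assumption because TID 300 is not listed in the text.

```python
from itertools import combinations

# Baskets for TID 100, 200, 300, 400; the third one is assumed.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def frequent_itemsets(min_support):
    """Brute-force list of all itemsets meeting `min_support` (toy sizes only)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    found = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if sum(set(combo) <= t for t in transactions) / n >= min_support:
                found.append(combo)
    return found


for threshold in (0.5, 0.75):
    itemsets = frequent_itemsets(threshold)
    print(threshold, len(itemsets), itemsets)
```

Raising the threshold from 50% to 75% leaves less work, but itemsets such as (2, 3, 5) no longer appear in the output at all.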
To cut down the work without giving up on the threshold, Apriori relies on one of the simplest ideas in association rule mining, the “Downward Closure Property”.
Let’s understand this concept with the same example as above.
What is the Downward Closure Property?
In simple terms, this property helps us speed up the search by trimming down the combinations of items that need to be checked, and it starts eliminating right from the beginning. The principle states that all subsets of a frequent itemset must also be frequent. Equivalently, if an itemset is infrequent, every larger itemset that contains it is infrequent too.
Let’s understand this step by step, with the items labelled A, B, C and D.
For simplicity, let’s make a small change: I am going to rename the products as follows:
- sausage → ‘A’
- whole milk → ‘B’
- semi-finished bread → ‘C’
- yogurt → ‘D’
Example:-
Let’s understand this concept with the help of the example below. Suppose there are 1000 transactions. Every transaction in which {A, B} was bought together also counts as a transaction containing A and as one containing B, so A on its own may appear in up to all 1000 transactions, but never in fewer transactions than {A, B} does. The same holds for B. This works in our favour: if I find at the very beginning that product A alone does not meet the support I am interested in, then I need not proceed with any combination in which A is matched with another product. This way of trimming down the search is what the “Downward Closure Property” gives us.
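A small sketch of this trimming, under a purely hypothetical outcome in which product A turned out to be infrequent on its own. The `prune` helper and the set of frequent single items are assumptions made for illustration, not results from real data.

```python
from itertools import combinations


def prune(candidates, frequent_smaller):
    """Keep the candidates whose every (k-1)-subset is already known to be frequent."""
    kept = []
    for candidate in candidates:
        k = len(candidate)
        subsets = combinations(candidate, k - 1)
        if all(frozenset(sub) in frequent_smaller for sub in subsets):
            kept.append(candidate)
    return kept


# Hypothetical result of the first pass: A did not meet the support threshold.
frequent_singles = {frozenset("B"), frozenset("C"), frozenset("D")}

items = ["A", "B", "C", "D"]
candidate_pairs = [frozenset(pair) for pair in combinations(items, 2)]

# Every pair containing A is dropped without scanning the transactions again.
surviving = prune(candidate_pairs, frequent_singles)
print([sorted(pair) for pair in surviving])
```

Of the six possible pairs, only the three that do not contain A remain to be counted against the transactions.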
More examples of the downward closure property:
- Suppose {A,B} has a certain frequency (f).
- Since each occurrence of {A,B} includes both A and B, both A and B must also have frequency >= f.
- A similar argument holds for larger itemsets.
- So, if a k-itemset meets a cut-off frequency, all its subsets (the (k-1)-itemsets, (k-2)-itemsets and so on) also meet this cut-off frequency.
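The argument above can be checked on the grocery transactions from the Apriori example: every subset of {2, 3, 5} should occur at least as often as {2, 3, 5} itself. This is a minimal sketch; the `frequency` helper is a name chosen for this example, and the third basket is an assumption because TID 300 is not listed in the text.

```python
from itertools import combinations

# Baskets for TID 100, 200, 300, 400; the third one is assumed.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def frequency(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)


itemset = (2, 3, 5)
f = frequency(itemset)

# Frequency of every proper, non-empty subset of the itemset.
subset_frequencies = {
    subset: frequency(subset)
    for k in range(1, len(itemset))
    for subset in combinations(itemset, k)
}

for subset, count in subset_frequencies.items():
    print(subset, count, ">=", f)

holds = all(count >= f for count in subset_frequencies.values())
print("downward closure holds:", holds)
```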
Conclusion:-
The item placement technique that the manager used in the opening story follows the Apriori method and the downward closure property: items that are frequently bought together are found from past transactions and placed together to increase the sales of the mall. These machine learning techniques helped the manager increase sales with less manual effort.
Hope this content gave detailed information about this topic. For more such informative content, visit:
https://medium.com/@saurabhmirgane007/what-are-ensembles-in-python-a84c599c3791
https://insideaiml.com/article-details/Evolution-of-MachineLearning-267
https://saurabhmirgane007.medium.com/what-are-object-oriented-programming-in-python-63562574f94d
Be creative, think positive.
Happy learning.