Validating cluster structures in data mining tasks
From this data, it could be found whether certain age groups (22-30 year olds, for example) have a higher propensity to order a certain color of BMW M5 (75 percent buy blue). Similarly, it can be shown that a different age group (55-62, for example) tends to order silver BMWs (65 percent buy silver, 20 percent buy gray).

Regression models can answer a question with a numerical answer. A regression model would use past sales data on BMWs and M5s to determine how much people paid for previous cars from the dealership, based on the attributes and selling features of the cars sold.

The data can be mined to show that when people come and purchase a BMW M5, they also tend to purchase the matching luggage. Using this data, the car dealership can move the promotions for the matching luggage to the front of the dealership, or even offer a newspaper ad for free/discounted matching luggage when customers buy the M5, in an effort to increase sales.

Classification (also known as classification trees or decision trees) is a data mining algorithm that creates a step-by-step guide for how to determine the output of a new data instance.

However, I included it in the comparisons and descriptions for this article to make the discussions complete. Before we get into the specific details of each method and run them through WEKA, I think we should understand what each model strives to accomplish: what type of data each model works with and what goals it attempts to achieve.
Question: "When people purchase the BMW M5, what other options do they tend to buy at the same time?"
Think of this another way: If you only used regression models, which produce a numerical output, how would Amazon be able to tell you "Other Customers Who Bought X Also Bought Y"? There's no numerical function that could give you this type of information.
So let's delve into the two additional models you can use with your data.
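Before moving on, it may help to see why the "Other Customers Who Bought X Also Bought Y" question is about counting items that appear together rather than fitting a numerical function. The following is only a minimal sketch, not how Amazon or WEKA does it: the `orders` list is made-up toy data, and `also_bought` is a hypothetical helper name I chose for illustration.

```python
from collections import Counter

# Hypothetical purchase records (toy data): each set is one customer's order.
orders = [
    {"BMW M5", "matching luggage"},
    {"BMW M5", "matching luggage", "floor mats"},
    {"BMW M5", "floor mats"},
    {"BMW 3 Series", "floor mats"},
]


def also_bought(orders, item):
    """For orders containing `item`, return the share that also contain each other product."""
    with_item = [order for order in orders if item in order]
    # Count every other product that appears alongside `item`.
    counts = Counter(
        other for order in with_item for other in order if other != item
    )
    # If no order contains `item`, counts is empty and an empty dict is returned.
    return {other: n / len(with_item) for other, n in counts.items()}


shares = also_bought(orders, "BMW M5")
for product, share in sorted(shares.items()):
    print(f"{product}: bought with the M5 in {share:.0%} of M5 orders")
```

The result is a set of co-occurrence shares per product, which is the kind of output a regression model, with its single numerical answer, is not designed to produce.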
In this article, I will also make repeated references to the data mining method called "nearest neighbor," though I won't actually delve into the details until Part 3.

Possible nodes on the tree would be age, income level, current number of cars, marital status, kids, and homeowner or renter.
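To picture how a tree built from nodes like these acts as a step-by-step guide, here is a small hand-written sketch. Everything specific in it is an assumption for illustration: the `Customer` fields, the target question (whether a customer is likely to buy an M5), and every threshold are invented, whereas a real tree would be learned from the dealership's data by a tool such as WEKA.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    age: int
    income: int        # annual income in dollars (hypothetical unit)
    cars_owned: int    # current number of cars
    homeowner: bool    # True for homeowner, False for renter


def likely_to_buy_m5(c: Customer) -> bool:
    """Walk a hand-written tree; every threshold is invented for illustration."""
    if c.income < 80_000:           # root node: income level
        return False
    if c.age < 30:                  # second node: age
        return c.cars_owned >= 1    # leaf decided by current number of cars
    return c.homeowner              # leaf decided by homeowner or renter


# Classify one new, made-up data instance by following the tree from the root.
prediction = likely_to_buy_m5(
    Customer(age=45, income=120_000, cars_owned=2, homeowner=True)
)
print(prediction)
```

Each `if` corresponds to one node, and classifying a new customer means following a single path from the root down to a leaf.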