Preliminaries
library(mosaic)
library(broom)
library(tidyverse)
library(openintro)
library(Stat2Data)
#library(car)
data("Clothing")
The Clothing dataset represents a random of 60 customers from a large clothing retailer. The manager of the store is interested in predicting how much a customer will spend on his or her next purchase based on one or more of the available explanatory variables.
Question 1
What does each cloumn of this dataset represent? Which one is the response variable?
Clothing
Question 2
Is there any abnormal/outlier value in the response variable?
Yes, the first three rows and the last row.
Question 3
Please run filter to remove all rows with abnormal/outlier value of amount.
Clothing <- filter(Clothing, Amount > 0 & Amount < 1000)
Clothing
Question 4
Please run cor to calculate a matrix of correlation (do not include the column ID and Card) and then identify the predictor variables which have high correlations with the reponse variable.
cor(Clothing[, 2:7])
Amount Recency Freq12 Dollar12 Freq24 Dollar24
Amount 1.00000000 -0.2208139 0.05159862 0.8036804 0.1017170 0.6773219
Recency -0.22081392 1.0000000 -0.58382258 -0.4538679 -0.5490906 -0.4323811
Freq12 0.05159862 -0.5838226 1.00000000 0.5558579 0.7099506 0.4214650
Dollar12 0.80368037 -0.4538679 0.55585788 1.0000000 0.4849486 0.8274500
Freq24 0.10171701 -0.5490906 0.70995061 0.4849486 1.0000000 0.5962226
Dollar24 0.67732192 -0.4323811 0.42146503 0.8274500 0.5962226 1.0000000
Question 5
Draw a scatterplot between Amount and the preditor with the highest correlation.
ggplot(Clothing, aes(y = Amount, x = Dollar12)) +
geom_point() +
geom_smooth(method="lm")

Question 5
Run a simple linear regression between Amount and Dollar12.
Clothing_lm <- lm(Amount ~ Dollar12, data = Clothing)
tidy(Clothing_lm)
Calculate the R^2 of this model.
glance(Clothing_lm)
Question 6
Run a multiple linear regression between Amount and three predictors with highest correlations.
Clothing1_lm <- lm(Amount ~ Dollar12 + Dollar24 + Recency, data = Clothing)
tidy(Clothing1_lm)
Calculate the R^2 of this model. How much of R^2 is improved comparing with the simple model in question 5.
glance(Clothing1_lm)
Question 7
Run a multiple linear regression between Amount and all predictors.
Which predictor are significant at 5% level?
Clothing2_lm <- lm(Amount ~ Dollar12 + Dollar24 + Recency + Freq12 + Freq24 + Card, data = Clothing)
tidy(Clothing2_lm)
Calculate the R^2 of this model. How much of R^2 is improved comparing with the model in question 6.
glance(Clothing2_lm)
Question 8
Run a multiple linear regression between Amount and significant predictors.
Clothing3_lm <- lm(Amount ~ Dollar12 + Freq12, data = Clothing)
tidy(Clothing3_lm)
Calculate the R^2 of this model. How much of R^2 is improved comparing with the model in question 7. Should you use this model or the model in question 7?
glance(Clothing3_lm)
Question 9
Add a new column AveSpent12 to dataset Clothing using mutate to reflect the average amount spent on each visit over the past 12 month.
Clothing <- mutate(Clothing, AveSpent12 = Dollar12/Freq12)
Clothing
Is there any abnormal value in the new column AveSpent12? Why? Please run filter to remove all rows with NaN value.
Clothing <- filter(Clothing, Freq12 > 0)
Clothing <- mutate(Clothing, AveSpent12 = Dollar12/Freq12)
Clothing
Run a multiple linear regression between Amount and AveSpent12.
Clothing4_lm <- lm(Amount ~ AveSpent12, data = Clothing)
tidy(Clothing4_lm)
Calculate the R^2 of this model. How much of R^2 is improved comparing with the model in question 8. Should you use this model or the model in question 8?
glance(Clothing4_lm)
Question 10
Draw a scatterplot between Amount and AveSpent12. Is there any curvature in the pattern between Amount and AveSpent12? How should we modify the model in question 9 if there is a curvature in this scatterplot?
ggplot(Clothing, aes(y = Amount, x = AveSpent12)) +
geom_point() +
geom_smooth(method="lm")

Question 11
Add a quadratic term of AveSpent12 to the model.
Clothing5_lm <- lm(Amount ~ AveSpent12 + I(AveSpent12^2), data = Clothing)
tidy(Clothing5_lm)
Calculate the R^2 of this model. How much of R^2 is improved comparing with the model in question 9.
glance(Clothing5_lm)
CLothing5_aug <- augment(Clothing5_lm)
CLothing5_aug
Question 12
Draw a residual plot of the model in question 11.
ggplot(CLothing5_aug, aes(y = .resid, x = .fitted )) +
geom_point() +
geom_hline(yintercept = 0)

Draw a quadratic curve of the model in question 11 over the scatter plot between Amount and AveSpent12.
ggplot(Clothing, aes(y = Amount, x = AveSpent12)) +
geom_point() +
geom_line(aes(x=AveSpent12, y=CLothing5_aug$.fitted), color="red")

A log log:
Clothing6_lm <- lm(log(Amount) ~ log(AveSpent12), data = Clothing)
tidy(Clothing6_lm)
glance(Clothing6_lm)
LS0tCnRpdGxlOiAiREFUQSAzNTAtIE11bHRpcGxlIFJlZ3Jlc3Npb24gLSBDYXNlIFN0dWR5IC0gYW5zd2VyIgpvdXRwdXQ6CiAgICBodG1sX25vdGVib29rOgogICAgICAgIHRvYzogeWVzCiAgICAgICAgdG9jX2Zsb2F0OiB5ZXMKLS0tCgoKIyMgUHJlbGltaW5hcmllcwoKYGBge3IsIG1lc3NhZ2UgPSBGQUxTRX0KbGlicmFyeShtb3NhaWMpCmxpYnJhcnkoYnJvb20pCmxpYnJhcnkodGlkeXZlcnNlKQpsaWJyYXJ5KG9wZW5pbnRybykKbGlicmFyeShTdGF0MkRhdGEpCiNsaWJyYXJ5KGNhcikKZGF0YSgiQ2xvdGhpbmciKQpgYGAKCgpUaGUgYENsb3RoaW5nYCBkYXRhc2V0IHJlcHJlc2VudHMgYSByYW5kb20gb2YgNjAgY3VzdG9tZXJzIGZyb20gYSBsYXJnZSBjbG90aGluZyByZXRhaWxlci4gVGhlIG1hbmFnZXIgb2YgdGhlIHN0b3JlIGlzIGludGVyZXN0ZWQgaW4gcHJlZGljdGluZyBob3cgbXVjaCBhIGN1c3RvbWVyIHdpbGwgc3BlbmQgb24gaGlzIG9yIGhlciBuZXh0IHB1cmNoYXNlIGJhc2VkIG9uIG9uZSBvciBtb3JlIG9mIHRoZSBhdmFpbGFibGUgZXhwbGFuYXRvcnkgdmFyaWFibGVzLgoKIyMjIyBRdWVzdGlvbiAxCgpXaGF0IGRvZXMgZWFjaCBjbG91bW4gb2YgdGhpcyBkYXRhc2V0IHJlcHJlc2VudD8gV2hpY2ggb25lIGlzIHRoZSByZXNwb25zZSB2YXJpYWJsZT8gCgpgYGB7cn0KQ2xvdGhpbmcKYGBgCgojIyMjIFF1ZXN0aW9uIDIKCklzIHRoZXJlIGFueSBhYm5vcm1hbC9vdXRsaWVyIHZhbHVlIGluIHRoZSByZXNwb25zZSB2YXJpYWJsZT8KClllcywgdGhlIGZpcnN0IHRocmVlIHJvd3MgYW5kIHRoZSBsYXN0IHJvdy4KCiMjIyMgUXVlc3Rpb24gMwoKUGxlYXNlIHJ1biBgZmlsdGVyYCB0byByZW1vdmUgYWxsIHJvd3Mgd2l0aCBhYm5vcm1hbC9vdXRsaWVyIHZhbHVlIG9mIGFtb3VudC4gCgpgYGB7cn0KQ2xvdGhpbmcgPC0gZmlsdGVyKENsb3RoaW5nLCBBbW91bnQgPiAwICYgQW1vdW50IDwgMTAwMCkKQ2xvdGhpbmcKYGBgCgojIyMjIFF1ZXN0aW9uIDQKClBsZWFzZSBydW4gYGNvcmAgdG8gY2FsY3VsYXRlIGEgbWF0cml4IG9mIGNvcnJlbGF0aW9uIChkbyBub3QgaW5jbHVkZSB0aGUgY29sdW1uIGBJRGAgYW5kIGBDYXJkYCkgYW5kIHRoZW4gaWRlbnRpZnkgdGhlIHByZWRpY3RvciB2YXJpYWJsZXMgd2hpY2ggaGF2ZSBoaWdoIGNvcnJlbGF0aW9ucyB3aXRoIHRoZSByZXBvbnNlIHZhcmlhYmxlLgoKYGBge3J9CmNvcihDbG90aGluZ1ssIDI6N10pCmBgYAoKIyMjIyBRdWVzdGlvbiA1CgpEcmF3IGEgc2NhdHRlcnBsb3QgYmV0d2VlbiBBbW91bnQgYW5kIHRoZSBwcmVkaXRvciB3aXRoIHRoZSBoaWdoZXN0IGNvcnJlbGF0aW9uLgoKYGBge3J9CmdncGxvdChDbG90aGluZywgYWVzKHkgPSBBbW91bnQsIHggPSBEb2xsYXIxMikpICsKICAgIGdlb21fcG9pbnQoKSArCiAgICBnZW9tX3Ntb290aChtZXRob2Q9ImxtIikKYGBgCgoKIyMjIyBRdWVzdGlvbiA1CgpSdW4gYSBzaW1wbGUgbGluZWFyIHJlZ3Jlc3Npb24gYmV0d2VlbiBBbW91bnQgYW5kIERvbGxhcjEyLgoKYGBge3J9CkNsb3RoaW5nX2xtIDwtIGxtKEFtb3VudCB+IERvbGxhcjEyLCBkYXRhID0gQ2xvdGhpbmcpCnRpZHkoQ2xvdGhpbmdfbG0pCmBgYAoKCkNhbGN1bGF0ZSB0aGUgUl4yIG9mIHRoaXMgbW9kZWwuCgpgYGB7cn0KZ2xhbmNlKENsb3RoaW5nX2xtKQpgYGAKCiMjIyMgUXVlc3Rpb24gNgoKUnVuIGEgbXVsdGlwbGUgbGluZWFyIHJlZ3Jlc3Npb24gYmV0d2VlbiBBbW91bnQgYW5kIHRocmVlIHByZWRpY3RvcnMgd2l0aCBoaWdoZXN0IGNvcnJlbGF0aW9ucy4gCgpgYGB7cn0KQ2xvdGhpbmcxX2xtIDwtIGxtKEFtb3VudCB+IERvbGxhcjEyICsgRG9sbGFyMjQgKyBSZWNlbmN5LCBkYXRhID0gQ2xvdGhpbmcpCnRpZHkoQ2xvdGhpbmcxX2xtKQpgYGAKCkNhbGN1bGF0ZSB0aGUgUl4yIG9mIHRoaXMgbW9kZWwuIEhvdyBtdWNoIG9mIFJeMiBpcyBpbXByb3ZlZCBjb21wYXJpbmcgd2l0aCB0aGUgc2ltcGxlIG1vZGVsIGluIHF1ZXN0aW9uIDUuIAoKYGBge3J9CmdsYW5jZShDbG90aGluZzFfbG0pCmBgYAoKCiMjIyMgUXVlc3Rpb24gNwoKUnVuIGEgbXVsdGlwbGUgbGluZWFyIHJlZ3Jlc3Npb24gYmV0d2VlbiBBbW91bnQgYW5kIGFsbCBwcmVkaWN0b3JzLiAKCldoaWNoIHByZWRpY3RvciBhcmUgc2lnbmlmaWNhbnQgYXQgNSUgbGV2ZWw/CgpgYGB7cn0KQ2xvdGhpbmcyX2xtIDwtIGxtKEFtb3VudCB+IERvbGxhcjEyICsgRG9sbGFyMjQgKyBSZWNlbmN5ICsgRnJlcTEyICsgRnJlcTI0ICsgQ2FyZCwgZGF0YSA9IENsb3RoaW5nKQp0aWR5KENsb3RoaW5nMl9sbSkKYGBgCgpDYWxjdWxhdGUgdGhlIFJeMiBvZiB0aGlzIG1vZGVsLiBIb3cgbXVjaCBvZiBSXjIgaXMgaW1wcm92ZWQgY29tcGFyaW5nIHdpdGggdGhlIG1vZGVsIGluIHF1ZXN0aW9uIDYuCgpgYGB7cn0KZ2xhbmNlKENsb3RoaW5nMl9sbSkKYGBgCgojIyMjIFF1ZXN0aW9uIDgKClJ1biBhIG11bHRpcGxlIGxpbmVhciByZWdyZXNzaW9uIGJldHdlZW4gQW1vdW50IGFuZCBzaWduaWZpY2FudCBwcmVkaWN0b3JzLiAKCmBgYHtyfQpDbG90aGluZzNfbG0gPC0gbG0oQW1vdW50IH4gRG9sbGFyMTIgKyAgRnJlcTEyLCBkYXRhID0gQ2xvdGhpbmcpCnRpZHkoQ2xvdGhpbmczX2xtKQpgYGAKCkNhbGN1bGF0ZSB0aGUgUl4yIG9mIHRoaXMgbW9kZWwuIEhvdyBtdWNoIG9mIFJeMiBpcyBpbXByb3ZlZCBjb21wYXJpbmcgd2l0aCB0aGUgbW9kZWwgaW4gcXVlc3Rpb24gNy4gU2hvdWxkIHlvdSB1c2UgdGhpcyBtb2RlbCBvciB0aGUgbW9kZWwgaW4gcXVlc3Rpb24gNz8KCgpgYGB7cn0KZ2xhbmNlKENsb3RoaW5nM19sbSkKYGBgCgojIyMjIFF1ZXN0aW9uIDkKCkFkZCBhIG5ldyBjb2x1bW4gYEF2ZVNwZW50MTJgIHRvIGRhdGFzZXQgYENsb3RoaW5nYCB1c2luZyBgbXV0YXRlYCB0byByZWZsZWN0IHRoZSBhdmVyYWdlIGFtb3VudCBzcGVudCBvbiBlYWNoIHZpc2l0IG92ZXIgdGhlIHBhc3QgMTIgbW9udGguCgpgYGB7cn0KQ2xvdGhpbmcgPC0gbXV0YXRlKENsb3RoaW5nLCBBdmVTcGVudDEyID0gRG9sbGFyMTIvRnJlcTEyKQpDbG90aGluZwpgYGAKCklzIHRoZXJlIGFueSBhYm5vcm1hbCB2YWx1ZSBpbiB0aGUgbmV3IGNvbHVtbiBBdmVTcGVudDEyPyBXaHk/IFBsZWFzZSBydW4gZmlsdGVyIHRvIHJlbW92ZSBhbGwgcm93cyB3aXRoIE5hTiB2YWx1ZS4KCmBgYHtyfQpDbG90aGluZyA8LSBmaWx0ZXIoQ2xvdGhpbmcsIEZyZXExMiA+IDApCkNsb3RoaW5nIDwtIG11dGF0ZShDbG90aGluZywgQXZlU3BlbnQxMiA9IERvbGxhcjEyL0ZyZXExMikKQ2xvdGhpbmcKYGBgCgpSdW4gYSBtdWx0aXBsZSBsaW5lYXIgcmVncmVzc2lvbiBiZXR3ZWVuIEFtb3VudCBhbmQgQXZlU3BlbnQxMi4gCgpgYGB7cn0KQ2xvdGhpbmc0X2xtIDwtIGxtKEFtb3VudCB+IEF2ZVNwZW50MTIsIGRhdGEgPSBDbG90aGluZykKdGlkeShDbG90aGluZzRfbG0pCmBgYAoKQ2FsY3VsYXRlIHRoZSBSXjIgb2YgdGhpcyBtb2RlbC4gSG93IG11Y2ggb2YgUl4yIGlzIGltcHJvdmVkIGNvbXBhcmluZyB3aXRoIHRoZSBtb2RlbCBpbiBxdWVzdGlvbiA4LiBTaG91bGQgeW91IHVzZSB0aGlzIG1vZGVsIG9yIHRoZSBtb2RlbCBpbiBxdWVzdGlvbiA4PwoKYGBge3J9CmdsYW5jZShDbG90aGluZzRfbG0pCmBgYAoKIyMjIyBRdWVzdGlvbiAxMAoKRHJhdyBhIHNjYXR0ZXJwbG90IGJldHdlZW4gQW1vdW50IGFuZCBBdmVTcGVudDEyLiBJcyB0aGVyZSBhbnkgY3VydmF0dXJlIGluIHRoZSBwYXR0ZXJuIGJldHdlZW4gQW1vdW50IGFuZCBBdmVTcGVudDEyPyBIb3cgc2hvdWxkIHdlIG1vZGlmeSB0aGUgbW9kZWwgaW4gcXVlc3Rpb24gOSBpZiB0aGVyZSBpcyBhIGN1cnZhdHVyZSBpbiB0aGlzIHNjYXR0ZXJwbG90PwoKCmBgYHtyfQpnZ3Bsb3QoQ2xvdGhpbmcsIGFlcyh5ID0gQW1vdW50LCB4ID0gQXZlU3BlbnQxMikpICsKICAgIGdlb21fcG9pbnQoKSArCiAgICBnZW9tX3Ntb290aChtZXRob2Q9ImxtIikKYGBgCgojIyMjIFF1ZXN0aW9uIDExCgpBZGQgYSBxdWFkcmF0aWMgdGVybSBvZiBBdmVTcGVudDEyIHRvIHRoZSBtb2RlbC4gCgpgYGB7cn0KQ2xvdGhpbmc1X2xtIDwtIGxtKEFtb3VudCB+IEF2ZVNwZW50MTIgKyBJKEF2ZVNwZW50MTJeMiksIGRhdGEgPSBDbG90aGluZykKdGlkeShDbG90aGluZzVfbG0pCmBgYAoKQ2FsY3VsYXRlIHRoZSBSXjIgb2YgdGhpcyBtb2RlbC4gSG93IG11Y2ggb2YgUl4yIGlzIGltcHJvdmVkIGNvbXBhcmluZyB3aXRoIHRoZSBtb2RlbCBpbiBxdWVzdGlvbiA5LgoKYGBge3J9CmdsYW5jZShDbG90aGluZzVfbG0pCmBgYAoKCmBgYHtyfQpDTG90aGluZzVfYXVnIDwtIGF1Z21lbnQoQ2xvdGhpbmc1X2xtKQpDTG90aGluZzVfYXVnCmBgYAoKIyMjIyBRdWVzdGlvbiAxMgoKRHJhdyBhIHJlc2lkdWFsIHBsb3Qgb2YgdGhlIG1vZGVsIGluIHF1ZXN0aW9uIDExLgoKYGBge3J9CmdncGxvdChDTG90aGluZzVfYXVnLCBhZXMoeSA9IC5yZXNpZCwgeCA9IC5maXR0ZWQgKSkgKyAKICBnZW9tX3BvaW50KCkgKwogIGdlb21faGxpbmUoeWludGVyY2VwdCA9IDApCmBgYAoKRHJhdyBhIHF1YWRyYXRpYyBjdXJ2ZSBvZiB0aGUgbW9kZWwgaW4gcXVlc3Rpb24gMTEgb3ZlciB0aGUgc2NhdHRlciBwbG90IGJldHdlZW4gQW1vdW50IGFuZCBBdmVTcGVudDEyLiAKCmBgYHtyfQpnZ3Bsb3QoQ2xvdGhpbmcsIGFlcyh5ID0gQW1vdW50LCB4ID0gQXZlU3BlbnQxMikpICsKICAgIGdlb21fcG9pbnQoKSArCiAgICBnZW9tX2xpbmUoYWVzKHg9QXZlU3BlbnQxMiwgeT1DTG90aGluZzVfYXVnJC5maXR0ZWQpLCBjb2xvcj0icmVkIikKYGBgCgpBIGxvZyBsb2c6CgpgYGB7cn0KQ2xvdGhpbmc2X2xtIDwtIGxtKGxvZyhBbW91bnQpIH4gbG9nKEF2ZVNwZW50MTIpLCBkYXRhID0gQ2xvdGhpbmcpCnRpZHkoQ2xvdGhpbmc2X2xtKQpgYGAKCgpgYGB7cn0KZ2xhbmNlKENsb3RoaW5nNl9sbSkKYGBgCgoKCgo=