Preliminaries

library(mosaic)
library(broom)
library(tidyverse)
library(openintro)
library(Stat2Data)
#library(car)
data("Clothing")

The Clothing dataset represents a random of 60 customers from a large clothing retailer. The manager of the store is interested in predicting how much a customer will spend on his or her next purchase based on one or more of the available explanatory variables.

Question 1

What does each cloumn of this dataset represent? Which one is the response variable?

Clothing

Question 2

Is there any abnormal/outlier value in the response variable?

Yes, the first three rows and the last row.

Question 3

Please run filter to remove all rows with abnormal/outlier value of amount.

Clothing <- filter(Clothing, Amount > 0 & Amount < 1000)
Clothing

Question 4

Please run cor to calculate a matrix of correlation (do not include the column ID and Card) and then identify the predictor variables which have high correlations with the reponse variable.

cor(Clothing[, 2:7])
              Amount    Recency      Freq12   Dollar12     Freq24   Dollar24
Amount    1.00000000 -0.2208139  0.05159862  0.8036804  0.1017170  0.6773219
Recency  -0.22081392  1.0000000 -0.58382258 -0.4538679 -0.5490906 -0.4323811
Freq12    0.05159862 -0.5838226  1.00000000  0.5558579  0.7099506  0.4214650
Dollar12  0.80368037 -0.4538679  0.55585788  1.0000000  0.4849486  0.8274500
Freq24    0.10171701 -0.5490906  0.70995061  0.4849486  1.0000000  0.5962226
Dollar24  0.67732192 -0.4323811  0.42146503  0.8274500  0.5962226  1.0000000

Question 5

Draw a scatterplot between Amount and the preditor with the highest correlation.

ggplot(Clothing, aes(y = Amount, x = Dollar12)) +
    geom_point() +
    geom_smooth(method="lm")

Question 5

Run a simple linear regression between Amount and Dollar12.

Clothing_lm <- lm(Amount ~ Dollar12, data = Clothing)
tidy(Clothing_lm)

Calculate the R^2 of this model.

glance(Clothing_lm)

Question 6

Run a multiple linear regression between Amount and three predictors with highest correlations.

Clothing1_lm <- lm(Amount ~ Dollar12 + Dollar24 + Recency, data = Clothing)
tidy(Clothing1_lm)

Calculate the R^2 of this model. How much of R^2 is improved comparing with the simple model in question 5.

glance(Clothing1_lm)

Question 7

Run a multiple linear regression between Amount and all predictors.

Which predictor are significant at 5% level?

Clothing2_lm <- lm(Amount ~ Dollar12 + Dollar24 + Recency + Freq12 + Freq24 + Card, data = Clothing)
tidy(Clothing2_lm)

Calculate the R^2 of this model. How much of R^2 is improved comparing with the model in question 6.

glance(Clothing2_lm)

Question 8

Run a multiple linear regression between Amount and significant predictors.

Clothing3_lm <- lm(Amount ~ Dollar12 +  Freq12, data = Clothing)
tidy(Clothing3_lm)

Calculate the R^2 of this model. How much of R^2 is improved comparing with the model in question 7. Should you use this model or the model in question 7?

glance(Clothing3_lm)

Question 9

Add a new column AveSpent12 to dataset Clothing using mutate to reflect the average amount spent on each visit over the past 12 month.

Clothing <- mutate(Clothing, AveSpent12 = Dollar12/Freq12)
Clothing

Is there any abnormal value in the new column AveSpent12? Why? Please run filter to remove all rows with NaN value.

Clothing <- filter(Clothing, Freq12 > 0)
Clothing <- mutate(Clothing, AveSpent12 = Dollar12/Freq12)
Clothing

Run a multiple linear regression between Amount and AveSpent12.

Clothing4_lm <- lm(Amount ~ AveSpent12, data = Clothing)
tidy(Clothing4_lm)

Calculate the R^2 of this model. How much of R^2 is improved comparing with the model in question 8. Should you use this model or the model in question 8?

glance(Clothing4_lm)

Question 10

Draw a scatterplot between Amount and AveSpent12. Is there any curvature in the pattern between Amount and AveSpent12? How should we modify the model in question 9 if there is a curvature in this scatterplot?

ggplot(Clothing, aes(y = Amount, x = AveSpent12)) +
    geom_point() +
    geom_smooth(method="lm")

Question 11

Add a quadratic term of AveSpent12 to the model.

Clothing5_lm <- lm(Amount ~ AveSpent12 + I(AveSpent12^2), data = Clothing)
tidy(Clothing5_lm)

Calculate the R^2 of this model. How much of R^2 is improved comparing with the model in question 9.

glance(Clothing5_lm)
CLothing5_aug <- augment(Clothing5_lm)
CLothing5_aug

Question 12

Draw a residual plot of the model in question 11.

ggplot(CLothing5_aug, aes(y = .resid, x = .fitted )) + 
  geom_point() +
  geom_hline(yintercept = 0)

Draw a quadratic curve of the model in question 11 over the scatter plot between Amount and AveSpent12.

ggplot(Clothing, aes(y = Amount, x = AveSpent12)) +
    geom_point() +
    geom_line(aes(x=AveSpent12, y=CLothing5_aug$.fitted), color="red")

A log log:

Clothing6_lm <- lm(log(Amount) ~ log(AveSpent12), data = Clothing)
tidy(Clothing6_lm)
glance(Clothing6_lm)
LS0tCnRpdGxlOiAiREFUQSAzNTAtIE11bHRpcGxlIFJlZ3Jlc3Npb24gLSBDYXNlIFN0dWR5IC0gYW5zd2VyIgpvdXRwdXQ6CiAgICBodG1sX25vdGVib29rOgogICAgICAgIHRvYzogeWVzCiAgICAgICAgdG9jX2Zsb2F0OiB5ZXMKLS0tCgoKIyMgUHJlbGltaW5hcmllcwoKYGBge3IsIG1lc3NhZ2UgPSBGQUxTRX0KbGlicmFyeShtb3NhaWMpCmxpYnJhcnkoYnJvb20pCmxpYnJhcnkodGlkeXZlcnNlKQpsaWJyYXJ5KG9wZW5pbnRybykKbGlicmFyeShTdGF0MkRhdGEpCiNsaWJyYXJ5KGNhcikKZGF0YSgiQ2xvdGhpbmciKQpgYGAKCgpUaGUgYENsb3RoaW5nYCBkYXRhc2V0IHJlcHJlc2VudHMgYSByYW5kb20gb2YgNjAgY3VzdG9tZXJzIGZyb20gYSBsYXJnZSBjbG90aGluZyByZXRhaWxlci4gVGhlIG1hbmFnZXIgb2YgdGhlIHN0b3JlIGlzIGludGVyZXN0ZWQgaW4gcHJlZGljdGluZyBob3cgbXVjaCBhIGN1c3RvbWVyIHdpbGwgc3BlbmQgb24gaGlzIG9yIGhlciBuZXh0IHB1cmNoYXNlIGJhc2VkIG9uIG9uZSBvciBtb3JlIG9mIHRoZSBhdmFpbGFibGUgZXhwbGFuYXRvcnkgdmFyaWFibGVzLgoKIyMjIyBRdWVzdGlvbiAxCgpXaGF0IGRvZXMgZWFjaCBjbG91bW4gb2YgdGhpcyBkYXRhc2V0IHJlcHJlc2VudD8gV2hpY2ggb25lIGlzIHRoZSByZXNwb25zZSB2YXJpYWJsZT8gCgpgYGB7cn0KQ2xvdGhpbmcKYGBgCgojIyMjIFF1ZXN0aW9uIDIKCklzIHRoZXJlIGFueSBhYm5vcm1hbC9vdXRsaWVyIHZhbHVlIGluIHRoZSByZXNwb25zZSB2YXJpYWJsZT8KClllcywgdGhlIGZpcnN0IHRocmVlIHJvd3MgYW5kIHRoZSBsYXN0IHJvdy4KCiMjIyMgUXVlc3Rpb24gMwoKUGxlYXNlIHJ1biBgZmlsdGVyYCB0byByZW1vdmUgYWxsIHJvd3Mgd2l0aCBhYm5vcm1hbC9vdXRsaWVyIHZhbHVlIG9mIGFtb3VudC4gCgpgYGB7cn0KQ2xvdGhpbmcgPC0gZmlsdGVyKENsb3RoaW5nLCBBbW91bnQgPiAwICYgQW1vdW50IDwgMTAwMCkKQ2xvdGhpbmcKYGBgCgojIyMjIFF1ZXN0aW9uIDQKClBsZWFzZSBydW4gYGNvcmAgdG8gY2FsY3VsYXRlIGEgbWF0cml4IG9mIGNvcnJlbGF0aW9uIChkbyBub3QgaW5jbHVkZSB0aGUgY29sdW1uIGBJRGAgYW5kIGBDYXJkYCkgYW5kIHRoZW4gaWRlbnRpZnkgdGhlIHByZWRpY3RvciB2YXJpYWJsZXMgd2hpY2ggaGF2ZSBoaWdoIGNvcnJlbGF0aW9ucyB3aXRoIHRoZSByZXBvbnNlIHZhcmlhYmxlLgoKYGBge3J9CmNvcihDbG90aGluZ1ssIDI6N10pCmBgYAoKIyMjIyBRdWVzdGlvbiA1CgpEcmF3IGEgc2NhdHRlcnBsb3QgYmV0d2VlbiBBbW91bnQgYW5kIHRoZSBwcmVkaXRvciB3aXRoIHRoZSBoaWdoZXN0IGNvcnJlbGF0aW9uLgoKYGBge3J9CmdncGxvdChDbG90aGluZywgYWVzKHkgPSBBbW91bnQsIHggPSBEb2xsYXIxMikpICsKICAgIGdlb21fcG9pbnQoKSArCiAgICBnZW9tX3Ntb290aChtZXRob2Q9ImxtIikKYGBgCgoKIyMjIyBRdWVzdGlvbiA1CgpSdW4gYSBzaW1wbGUgbGluZWFyIHJlZ3Jlc3Npb24gYmV0d2VlbiBBbW91bnQgYW5kIERvbGxhcjEyLgoKYGBge3J9CkNsb3RoaW5nX2xtIDwtIGxtKEFtb3VudCB+IERvbGxhcjEyLCBkYXRhID0gQ2xvdGhpbmcpCnRpZHkoQ2xvdGhpbmdfbG0pCmBgYAoKCkNhbGN1bGF0ZSB0aGUgUl4yIG9mIHRoaXMgbW9kZWwuCgpgYGB7cn0KZ2xhbmNlKENsb3RoaW5nX2xtKQpgYGAKCiMjIyMgUXVlc3Rpb24gNgoKUnVuIGEgbXVsdGlwbGUgbGluZWFyIHJlZ3Jlc3Npb24gYmV0d2VlbiBBbW91bnQgYW5kIHRocmVlIHByZWRpY3RvcnMgd2l0aCBoaWdoZXN0IGNvcnJlbGF0aW9ucy4gCgpgYGB7cn0KQ2xvdGhpbmcxX2xtIDwtIGxtKEFtb3VudCB+IERvbGxhcjEyICsgRG9sbGFyMjQgKyBSZWNlbmN5LCBkYXRhID0gQ2xvdGhpbmcpCnRpZHkoQ2xvdGhpbmcxX2xtKQpgYGAKCkNhbGN1bGF0ZSB0aGUgUl4yIG9mIHRoaXMgbW9kZWwuIEhvdyBtdWNoIG9mIFJeMiBpcyBpbXByb3ZlZCBjb21wYXJpbmcgd2l0aCB0aGUgc2ltcGxlIG1vZGVsIGluIHF1ZXN0aW9uIDUuIAoKYGBge3J9CmdsYW5jZShDbG90aGluZzFfbG0pCmBgYAoKCiMjIyMgUXVlc3Rpb24gNwoKUnVuIGEgbXVsdGlwbGUgbGluZWFyIHJlZ3Jlc3Npb24gYmV0d2VlbiBBbW91bnQgYW5kIGFsbCBwcmVkaWN0b3JzLiAKCldoaWNoIHByZWRpY3RvciBhcmUgc2lnbmlmaWNhbnQgYXQgNSUgbGV2ZWw/CgpgYGB7cn0KQ2xvdGhpbmcyX2xtIDwtIGxtKEFtb3VudCB+IERvbGxhcjEyICsgRG9sbGFyMjQgKyBSZWNlbmN5ICsgRnJlcTEyICsgRnJlcTI0ICsgQ2FyZCwgZGF0YSA9IENsb3RoaW5nKQp0aWR5KENsb3RoaW5nMl9sbSkKYGBgCgpDYWxjdWxhdGUgdGhlIFJeMiBvZiB0aGlzIG1vZGVsLiBIb3cgbXVjaCBvZiBSXjIgaXMgaW1wcm92ZWQgY29tcGFyaW5nIHdpdGggdGhlIG1vZGVsIGluIHF1ZXN0aW9uIDYuCgpgYGB7cn0KZ2xhbmNlKENsb3RoaW5nMl9sbSkKYGBgCgojIyMjIFF1ZXN0aW9uIDgKClJ1biBhIG11bHRpcGxlIGxpbmVhciByZWdyZXNzaW9uIGJldHdlZW4gQW1vdW50IGFuZCBzaWduaWZpY2FudCBwcmVkaWN0b3JzLiAKCmBgYHtyfQpDbG90aGluZzNfbG0gPC0gbG0oQW1vdW50IH4gRG9sbGFyMTIgKyAgRnJlcTEyLCBkYXRhID0gQ2xvdGhpbmcpCnRpZHkoQ2xvdGhpbmczX2xtKQpgYGAKCkNhbGN1bGF0ZSB0aGUgUl4yIG9mIHRoaXMgbW9kZWwuIEhvdyBtdWNoIG9mIFJeMiBpcyBpbXByb3ZlZCBjb21wYXJpbmcgd2l0aCB0aGUgbW9kZWwgaW4gcXVlc3Rpb24gNy4gU2hvdWxkIHlvdSB1c2UgdGhpcyBtb2RlbCBvciB0aGUgbW9kZWwgaW4gcXVlc3Rpb24gNz8KCgpgYGB7cn0KZ2xhbmNlKENsb3RoaW5nM19sbSkKYGBgCgojIyMjIFF1ZXN0aW9uIDkKCkFkZCBhIG5ldyBjb2x1bW4gYEF2ZVNwZW50MTJgIHRvIGRhdGFzZXQgYENsb3RoaW5nYCB1c2luZyBgbXV0YXRlYCB0byByZWZsZWN0IHRoZSBhdmVyYWdlIGFtb3VudCBzcGVudCBvbiBlYWNoIHZpc2l0IG92ZXIgdGhlIHBhc3QgMTIgbW9udGguCgpgYGB7cn0KQ2xvdGhpbmcgPC0gbXV0YXRlKENsb3RoaW5nLCBBdmVTcGVudDEyID0gRG9sbGFyMTIvRnJlcTEyKQpDbG90aGluZwpgYGAKCklzIHRoZXJlIGFueSBhYm5vcm1hbCB2YWx1ZSBpbiB0aGUgbmV3IGNvbHVtbiBBdmVTcGVudDEyPyBXaHk/IFBsZWFzZSBydW4gZmlsdGVyIHRvIHJlbW92ZSBhbGwgcm93cyB3aXRoIE5hTiB2YWx1ZS4KCmBgYHtyfQpDbG90aGluZyA8LSBmaWx0ZXIoQ2xvdGhpbmcsIEZyZXExMiA+IDApCkNsb3RoaW5nIDwtIG11dGF0ZShDbG90aGluZywgQXZlU3BlbnQxMiA9IERvbGxhcjEyL0ZyZXExMikKQ2xvdGhpbmcKYGBgCgpSdW4gYSBtdWx0aXBsZSBsaW5lYXIgcmVncmVzc2lvbiBiZXR3ZWVuIEFtb3VudCBhbmQgQXZlU3BlbnQxMi4gCgpgYGB7cn0KQ2xvdGhpbmc0X2xtIDwtIGxtKEFtb3VudCB+IEF2ZVNwZW50MTIsIGRhdGEgPSBDbG90aGluZykKdGlkeShDbG90aGluZzRfbG0pCmBgYAoKQ2FsY3VsYXRlIHRoZSBSXjIgb2YgdGhpcyBtb2RlbC4gSG93IG11Y2ggb2YgUl4yIGlzIGltcHJvdmVkIGNvbXBhcmluZyB3aXRoIHRoZSBtb2RlbCBpbiBxdWVzdGlvbiA4LiBTaG91bGQgeW91IHVzZSB0aGlzIG1vZGVsIG9yIHRoZSBtb2RlbCBpbiBxdWVzdGlvbiA4PwoKYGBge3J9CmdsYW5jZShDbG90aGluZzRfbG0pCmBgYAoKIyMjIyBRdWVzdGlvbiAxMAoKRHJhdyBhIHNjYXR0ZXJwbG90IGJldHdlZW4gQW1vdW50IGFuZCBBdmVTcGVudDEyLiBJcyB0aGVyZSBhbnkgY3VydmF0dXJlIGluIHRoZSBwYXR0ZXJuIGJldHdlZW4gQW1vdW50IGFuZCBBdmVTcGVudDEyPyBIb3cgc2hvdWxkIHdlIG1vZGlmeSB0aGUgbW9kZWwgaW4gcXVlc3Rpb24gOSBpZiB0aGVyZSBpcyBhIGN1cnZhdHVyZSBpbiB0aGlzIHNjYXR0ZXJwbG90PwoKCmBgYHtyfQpnZ3Bsb3QoQ2xvdGhpbmcsIGFlcyh5ID0gQW1vdW50LCB4ID0gQXZlU3BlbnQxMikpICsKICAgIGdlb21fcG9pbnQoKSArCiAgICBnZW9tX3Ntb290aChtZXRob2Q9ImxtIikKYGBgCgojIyMjIFF1ZXN0aW9uIDExCgpBZGQgYSBxdWFkcmF0aWMgdGVybSBvZiBBdmVTcGVudDEyIHRvIHRoZSBtb2RlbC4gCgpgYGB7cn0KQ2xvdGhpbmc1X2xtIDwtIGxtKEFtb3VudCB+IEF2ZVNwZW50MTIgKyBJKEF2ZVNwZW50MTJeMiksIGRhdGEgPSBDbG90aGluZykKdGlkeShDbG90aGluZzVfbG0pCmBgYAoKQ2FsY3VsYXRlIHRoZSBSXjIgb2YgdGhpcyBtb2RlbC4gSG93IG11Y2ggb2YgUl4yIGlzIGltcHJvdmVkIGNvbXBhcmluZyB3aXRoIHRoZSBtb2RlbCBpbiBxdWVzdGlvbiA5LgoKYGBge3J9CmdsYW5jZShDbG90aGluZzVfbG0pCmBgYAoKCmBgYHtyfQpDTG90aGluZzVfYXVnIDwtIGF1Z21lbnQoQ2xvdGhpbmc1X2xtKQpDTG90aGluZzVfYXVnCmBgYAoKIyMjIyBRdWVzdGlvbiAxMgoKRHJhdyBhIHJlc2lkdWFsIHBsb3Qgb2YgdGhlIG1vZGVsIGluIHF1ZXN0aW9uIDExLgoKYGBge3J9CmdncGxvdChDTG90aGluZzVfYXVnLCBhZXMoeSA9IC5yZXNpZCwgeCA9IC5maXR0ZWQgKSkgKyAKICBnZW9tX3BvaW50KCkgKwogIGdlb21faGxpbmUoeWludGVyY2VwdCA9IDApCmBgYAoKRHJhdyBhIHF1YWRyYXRpYyBjdXJ2ZSBvZiB0aGUgbW9kZWwgaW4gcXVlc3Rpb24gMTEgb3ZlciB0aGUgc2NhdHRlciBwbG90IGJldHdlZW4gQW1vdW50IGFuZCBBdmVTcGVudDEyLiAKCmBgYHtyfQpnZ3Bsb3QoQ2xvdGhpbmcsIGFlcyh5ID0gQW1vdW50LCB4ID0gQXZlU3BlbnQxMikpICsKICAgIGdlb21fcG9pbnQoKSArCiAgICBnZW9tX2xpbmUoYWVzKHg9QXZlU3BlbnQxMiwgeT1DTG90aGluZzVfYXVnJC5maXR0ZWQpLCBjb2xvcj0icmVkIikKYGBgCgpBIGxvZyBsb2c6CgpgYGB7cn0KQ2xvdGhpbmc2X2xtIDwtIGxtKGxvZyhBbW91bnQpIH4gbG9nKEF2ZVNwZW50MTIpLCBkYXRhID0gQ2xvdGhpbmcpCnRpZHkoQ2xvdGhpbmc2X2xtKQpgYGAKCgpgYGB7cn0KZ2xhbmNlKENsb3RoaW5nNl9sbSkKYGBgCgoKCgo=