The goal of this analysis is to explore the relationship between miles per gallon (MPG) and a set of variables such as number of cylinders, displacement, and gross horsepower. The analysis uses the 'Motor Trend Car Road Tests' dataset (`mtcars`) that ships with R. The data was extracted from the 1974 Motor Trend US magazine. The analysis answers the following two questions:

- Is an automatic or manual transmission better for MPG?
- How large is the MPG difference between automatic and manual transmissions?

Loading the required packages

`library(ggplot2)`

Loading the `mtcars` data set and converting the appropriate variables into factors.

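```
data("mtcars")
# Convert the categorical variables to factors so that lm() treats them as groups
mtcars$cyl <- factor(mtcars$cyl)
# vs: 0 = V-shaped engine, 1 = straight engine
mtcars$vs <- factor(mtcars$vs, labels = c('V-Engine', 'Straight Engine'))
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
# am: 0 = automatic, 1 = manual
mtcars$am <- factor(mtcars$am, labels = c('Automatic', 'Manual'))
```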

Refer to the appendix for the exploratory plots.

Checking the variance of MPG in each of the two samples.

`var(mtcars$mpg[mtcars$am == 'Automatic'])`

`## [1] 14.6993`

`var(mtcars$mpg[mtcars$am == 'Manual'])`

`## [1] 38.02577`
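The comparison of the two variances can also be made formally. The following is a minimal sketch, assuming an F test is an acceptable check here (it relies on approximate normality within each group); `cars` and `vt` are names introduced only for this illustration.

```r
# Local copy so that the session's own `mtcars` object is left untouched
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# F test of the null hypothesis that the two group variances are equal
vt <- var.test(mpg ~ am, data = cars)

vt$estimate  # ratio of variances, Automatic / Manual
vt$p.value   # a small p-value speaks against equal variances
```

Note that `t.test` already defaults to `var.equal = FALSE`, so it runs the Welch version unless told otherwise.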

As the variances aren't equal, we perform Welch's t-test.

`t.test(mtcars$mpg ~ mtcars$am, conf.level = 0.95)`

The summary of Welch's t-test is in the appendix. As the p-value (0.001374) is below 0.05, we reject the null hypothesis that the mean MPG is the same for both transmission types.
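To make the test less of a black box, the statistic, the degrees of freedom, and the p-value can be recomputed by hand from the group means and variances. This is a sketch of the standard Welch formulas, not the internals of `t.test`; the variable names are introduced here for illustration.

```r
# Local copy; in the raw data, am is coded 0 = automatic, 1 = manual
cars <- datasets::mtcars
auto   <- cars$mpg[cars$am == 0]
manual <- cars$mpg[cars$am == 1]

# Squared standard error of each group mean
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df)

c(t = t_stat, df = df, p = p_value)
```

These values should agree with the `t`, `df`, and `p-value` lines of the `t.test` output in the appendix.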

Fitting a linear model with `mpg` as the response and `am` as the regressor.

```
model1 <- lm(mpg~am, data = mtcars)
summary(model1)
```
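With a single two-level factor as the only regressor, the fitted coefficients have a direct reading: the intercept is the mean MPG of the reference group (automatic) and the `amManual` coefficient is the difference between the two group means. A small sketch that checks this, using illustrative names (`cars`, `fit`, `group_means`):

```r
# Local copy so that the session's own `mtcars` object is left untouched
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am, data = cars)

# Mean MPG per transmission type
group_means <- tapply(cars$mpg, cars$am, mean)

coef(fit)                                         # intercept and amManual
group_means["Automatic"]                          # equals the intercept
group_means["Manual"] - group_means["Automatic"]  # equals amManual
```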

The summary for this model can be found in the appendix. As the R-squared value is 0.3598, this model accounts for only 36% of the variability in `mpg`. Hence this model isn't a good fit. A linear model with `mpg` as the response and all the remaining variables as regressors is likely to overfit, given that there are only 32 observations. Hence we obtain the best model by backward selection, using the `step` function in R, which by default compares candidate models by AIC.

```
model2 <- lm(mpg~., data = mtcars)
best_model <- step(model2, direction = "backward")
```

`summary(best_model)`

```
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.9387 -1.2560 -0.4013  1.1253  5.0513
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *
## cyl8        -2.16368    2.28425  -0.947  0.35225
## hp          -0.03211    0.01369  -2.345  0.02693 *
## wt          -2.49683    0.88559  -2.819  0.00908 **
## amManual     1.80921    1.39630   1.296  0.20646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
## F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
```

As the R-squared value is 0.8659, the model accounts for 86.6% of the variability in `mpg` (adjusted R-squared: 0.8401). This model is a good fit for the data. Various diagnostic plots are included in the appendix.
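R-squared never decreases when regressors are added, so on its own it cannot tell whether the smaller model is preferable to the full one. A penalized criterion such as AIC, which `step` uses by default, is better suited to that comparison. The sketch below repeats the selection quietly (`trace = 0`) on a local copy and compares the two fits; `cars`, `full`, and `selected` are illustrative names.

```r
# Local copy with the same categorical variables converted to factors
cars <- datasets::mtcars
for (v in c("cyl", "vs", "am", "gear", "carb")) {
  cars[[v]] <- factor(cars[[v]])
}

full <- lm(mpg ~ ., data = cars)

# Backward elimination; trace = 0 suppresses the step-by-step log
selected <- step(full, direction = "backward", trace = 0)

formula(selected)    # terms kept by the selection
AIC(full, selected)  # lower AIC indicates the preferred model
```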

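According to Welch's t-test and the regression model, a manual transmission is better for MPG. On its own, the difference in group means is 7.245 MPG in favor of manual cars. In the selected model, a manual transmission is associated with an increase of 1.8092 MPG, holding cylinders, horsepower, and weight constant. However, this adjusted effect is not statistically significant at the 5% level (p = 0.206).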
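One way to express the uncertainty around the adjusted transmission effect is a confidence interval for the `amManual` coefficient. The sketch below refits the selected model on a local copy (the names `cars`, `fit`, and `ci` are illustrative) and extracts the 95% interval; since the p-value is above 0.05, the interval is expected to span zero.

```r
# Local copy with the factors needed by the selected model
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# 95% confidence interval for the manual-vs-automatic coefficient
ci <- confint(fit, "amManual", level = 0.95)
ci
```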

Appendix

Summary of Welch's t-test

```
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual
##                17.14737                24.39231
```

```
model1 <- lm(mpg~am, data = mtcars)
summary(model1)
```

```
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
```

Pair plot of `mtcars`

`pairs(mtcars)`

Box plot of `mpg` ~ `am`

```
g <- ggplot(data = mtcars, aes(am, mpg))
g + geom_boxplot() + labs(x = 'Transmission', y = 'Miles per Gallon')
```

Diagnostic plots for the best model

```
# Arrange the four diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(best_model)
```