BAQM supplies functions developed by Babson College instructors for AQM 1000 and AQM 2000 courses using R in the curriculum. The primary functions include:
-
stat_desc()
- a function to produce descriptive statistics for data frames -
summary.lm()
- a function to summarize linear model results -
lm_plot.4way()
- a function to produce diagnostic plots for linear models -
print.summary.regsubsets()
- a function to format a “best subsets” report created by theregsubsets
function from the leaps package
Installation
You can install the development version of BAQM from GitHub with:
# install.packages("pak")
# pak::pak("CPA-wrk/BAQM")
Example
Here are summaries of built-in R data sets swiss
and iris
, with
- a best-subsets analysis of modeling Fertility in
swiss
and - a linear model with analytics of Sepal Length in
iris
.
(Variable names are truncated in swiss
to narrow the output.)
load_all()
#> ℹ Loading BAQM
#> Registered S3 methods overwritten by 'BAQM':
#> method from
#> print.summary.lm stats
#> summary.lm stats
# library(BAQM)
names(swiss) # Show original variable names
#> [1] "Fertility" "Agriculture" "Examination" "Education"
#> [5] "Catholic" "Infant.Mortality"
names(swiss) <- substr(names(swiss), 1, 4) # Narrows output
stat_desc(swiss)
#> Fert Agri Exam Educ Cath Infa
#> n.val 47 47 47 47 47 47
#> n.na 0 0 0 0 0 0
#> min 35 1.2 3 1 2.15 10.8
#> Q1 64.4 35.3 12 6 5.16 18
#> median 70.4 54.1 16 8 15.14 20
#> mean 70.14 50.66 16.49 10.98 41.14 19.94
#> Q3 79.3 67.8 22 12.5 93.4 22.2
#> max 92.5 89.7 37 53 100 26.6
#> std.dev 12.49 22.71 7.978 9.615 41.7 2.913
#
regs <- leaps::regsubsets(Fert ~ ., data = swiss, nbest = 3)
#> Registered S3 method overwritten by 'leaps':
#> method from
#> print.summary.regsubsets BAQM
print.summary.regsubsets(regs)
#>
#> Call: regsubsets.formula(Fert ~ ., data = swiss, nbest = 3)
#> _k_i.best rsq adjr2 see cp Agri Exam Educ Cath Infa
#> 1 1 ( 1 ) 0.4406 0.4282 9.44603 35.20 *
#> 2 1 ( 2 ) 0.4172 0.4042 9.64200 38.48 *
#> 3 1 ( 3 ) 0.2150 0.1976 11.18995 66.75 *
#> 4 2 ( 1 ) 0.5745 0.5552 8.33144 18.49 * *
#> 5 2 ( 2 ) 0.5648 0.5450 8.42614 19.85 * *
#> 6 2 ( 3 ) 0.5363 0.5152 8.69745 23.83 * *
#> 7 3 ( 1 ) 0.6625 0.6390 7.50542 8.18 * * *
#> 8 3 ( 2 ) 0.6423 0.6173 7.72776 11.01 * * *
#> 9 3 ( 3 ) 0.6191 0.5925 7.97396 14.25 * * *
#> 10 4 ( 1 ) 0.6993 0.6707 7.16817 5.03 * * * *
#> 11 4 ( 2 ) 0.6639 0.6319 7.57936 9.99 * * * *
#> 12 4 ( 3 ) 0.6498 0.6164 7.73642 11.96 * * * *
#> 13 5 ( 1 ) 0.7067 0.6710 7.16537 6.00 * * * * *
#
stat_desc(iris) # Includes non-numeric variable
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> n.val 150 150 150 150 150
#> n.na 0 0 0 0 0
#> min 4.3 2 1 0.1 n.lvls=4
#> Q1 5.1 2.7 1.6 0.2 setos:50
#> median 5.8 3 4.35 1.3 vrscl:50
#> mean 5.843 3.057 3.758 1.199 vrgnc:50
#> Q3 6.45 3.4 5.1 1.8
#> max 7.9 4.4 6.9 2.5
#> std.dev 0.8281 0.4359 1.765 0.7622
#
mdl <- lm(Sepal.Length ~ ., data = iris)
# mdl <- lm(mpg ~ wt + hp, data = mtcars)
print.summary.lm(summary.lm(mdl))
#>
#> Summary Statistics:
#> Value Performance Measure Err(Resids) Metric
#> Observations = 150 R-Squared = 0.86731 MAPE = 0.041785
#> F-Statistic = 188.25 Adj-R2 = 0.86271 MAD = 0.24286
#> Pr(b's=0),% = <2e-16 *** Std.Err.Est = 0.30683 RMSE = 0.30063
#>
#> Analysis of Variance:
#> Deg.Frdm Sum.of.Sqs Mean.Sum.Sqs F.statistic p-value(F)
#> Regression 5 88.612 17.722370 188.25 <2e-16 ***
#> Error(Resids) 144 13.556 0.094142
#> Total 149 102.168
#>
#> Coefficients:
#> Coefficient Std.Error t-stat p-value VIF
#> (Intercept) 2.17127 0.279794 7.7602 1.43e-12 ***
#> Sepal.Width 0.49589 0.086070 5.7615 4.87e-08 *** 2.2275
#> Petal.Length 0.82924 0.068528 12.1009 < 2e-16 *** 23.1616
#> Petal.Width -0.31516 0.151196 -2.0844 0.03889 * 21.0214
#> Species_versicolor -0.72356 0.240169 -3.0127 0.00306 ** 20.4234
#> Species_virginica -1.02350 0.333726 -3.0669 0.00258 ** 39.4344
#>
#> Signif.Levels: 0 '***' 0.001 '** ' 0.01 '* ' 0.05 '. ' 0.1 ' ' 1
#>
#> Summary of Min 1Q Mean Median 3Q Max
#> Residuals: -0.7942 -0.2187 <3e-14 0.008987 0.2025 0.731
#>
#> Call: lm(formula = Sepal.Length ~ ., data = iris)
lm_plot.lst <- lm_plot.4way(mdl)
lm_plot.lst$p_4way
#> Warning: ggrepel: 66 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps