The explainr R package
Mar 6, 2016

I heard about the explainr R package in episode 10 of the Not So Standard Deviations podcast and thought I would give it a try.

The main idea of the package is to help beginners understand the output of a hypothesis test performed in R by returning a paragraph or two that explains the output in plain English. The explanation also makes a handy template to paste into reports or manuscripts, requiring only minimal manual editing.

Using the explainr package

After installing the package from its GitHub repository, I first tried the example from the project’s README. It explains the output of a one-sample test of the null hypothesis that the proportion of successes is 0.5:

library(explainr)
ptest <- prop.test(x = 500, n = 1008)
explain(ptest)

This yields the following output:

This was a one-sample proportion test of the null hypothesis that the true population proportion is equal to 0.5. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that true population proportion is different than 0.5. The observed sample proportion is 0.496031746031746 (500 events out of a total sample size of 1,008).

The confidence interval for the true population proportion is (0.464746, 0.5273481). This interval will contain the true population proportion 95 times out of 100.

The p-value for this test is 0.8254979. This, formally, is defined as the probability – if the null hypothesis is true – of observing a sample proportion that is as or more extreme than the sample proportion from this data set. In this case, this is the probability – if the true population proportion is 0.5 – of observing a sample proportion that is greater than 0.503968253968254 or less than 0.496031746031746.
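The numbers in this explanation can be cross-checked against the object that prop.test() returns. The following is a minimal sketch using only base R (explainr itself is not needed); the variable names p_hat, ci, lower and upper are my own, and the null value of 0.5 is hard-coded to match the example above.

```r
# Recompute the README example with base R only.
ptest <- prop.test(x = 500, n = 1008)

# Observed sample proportion: 500 / 1008
p_hat <- unname(ptest$estimate)

# 95 percent confidence interval for the true proportion
ci <- as.numeric(ptest$conf.int)

# The two-sided "as or more extreme" region mirrors p_hat
# around the null value (0.5 in this example).
null_p <- 0.5
lower <- null_p - abs(p_hat - null_p)
upper <- null_p + abs(p_hat - null_p)

print(c(estimate = p_hat, p_value = ptest$p.value))
print(c(ci_lower = ci[1], ci_upper = ci[2]))
print(c(extreme_below = lower, extreme_above = upper))
```

The printed values should agree with the estimate, confidence interval, p-value and the two cut-offs quoted in the explanation.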

Applying explainr’s explain() function to the output of a t-test works as follows:

# No seed is set, so the numbers differ from run to run
t.res <- t.test(rnorm(10), rnorm(10))
explain(t.res)

However, the output is only a minimal explanation:

This hypothesis test had a p-value of 0.2665616.

The default print of the t-test result provides more information, though of course not in plain English:

t.res

        Welch Two Sample t-test

data:  rnorm(10) and rnorm(10)
t = -1.1479, df = 17.362, p-value = 0.2666
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.2359499  0.3640234
sample estimates:
   mean of x    mean of y
-0.007502282  0.428460926
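To illustrate what a fuller plain-English explanation of a t-test could look like, here is a minimal sketch built on the htest object that t.test() returns. The function explain_ttest() is a hypothetical helper of my own, not part of explainr and not how the package is implemented. It assumes a two-sided test, and I set a seed for reproducibility, so the numbers will not match the run above.

```r
# Hypothetical helper (NOT part of explainr): turn the result of
# t.test() into a short plain-English summary.
# Assumes a two-sided test.
explain_ttest <- function(res, alpha = 0.05) {
  stopifnot(inherits(res, "htest"))
  decision <- if (res$p.value < alpha) "reject" else "do not reject"
  ci <- res$conf.int
  paste0(
    "This was a ", trimws(res$method), " of the null hypothesis that the true ",
    names(res$null.value), " is equal to ", res$null.value, ". ",
    "Using a significance level of ", alpha, ", we ", decision,
    " the null hypothesis. ",
    "The ", 100 * attr(ci, "conf.level"),
    " percent confidence interval is (",
    signif(ci[1], 4), ", ", signif(ci[2], 4), "). ",
    "The p-value for this test is ", signif(res$p.value, 4), "."
  )
}

set.seed(1)  # seed chosen arbitrarily for reproducibility
t.res <- t.test(rnorm(10), rnorm(10))
msg <- explain_ttest(t.res)
cat(msg, "\n")
```

All the pieces used here (method, null.value, conf.int, p.value) are standard components of an htest object, so the same approach should carry over to other tests that return one.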

Summary

Overall, I think the idea behind the explainr package is great: hypothesis tests are often misunderstood, and many users only look at the p-value a test returns. The explainr package remedies this by explaining both what the test is designed to do and what output it produces, for instance by mentioning confidence intervals.

As the t-test example shows, explainr still needs to be extended to yield concise summaries of more hypothesis tests, but it’s a step in the right direction. I am sure the package developers welcome pull requests on GitHub that add new features to explainr.


