Hypothesis Testing - KZHU.ai 🚀

Typically we start with a question such as “could the value of the parameter be X?” We may have some idea of what the value is, but we are not sure whether it is correct. We then use data to support or contradict that claim.

Test One Population Proportion

With any hypothesis test, you state your hypotheses before you collect any data. The first is called the null hypothesis, H₀. The second is called the alternative hypothesis, H_a. Finally, we set the significance level α, which is typically 0.05. It is the cut-off below which a p-value is considered significant.

There are assumptions we need to check. First, whether the sample is random. Second, whether the sample size is large enough to ensure that the distribution of sample proportions is approximately normal.

After we’ve set up our test for one population proportion, we carry it out by:

calculating a test statistic
then getting a p-value and
finally, drawing a conclusion from that p-value

A test statistic is determined by:

(Best estimate - Hypothesized estimate) / Standard error of estimate

Best estimate is p̂, hypothesized estimate is p₀, and the standard error of the estimate p̂ is defined as √(p(1-p)/n). Since we don’t know what p is, we substitute p₀ for it. The result is called the null standard error, i.e. the standard error under the null hypothesis. So the test statistic is calculated as:

(p̂ - p₀) / √( p₀ (1 - p₀) / n )

This test statistic (also called the z test statistic) tells us how many null standard errors our observed sample proportion is above or below the hypothesized population proportion. The z test statistic is a random variable. Under the null hypothesis it approximately follows the standard normal distribution, N(0, 1), because the sampling distribution of p̂ is approximately normal for a large enough sample, and we centered it (by subtracting p₀ from p̂) and scaled it (by dividing by the null standard error).

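Then we can find the p-value from the test statistic: the probability, assuming the null hypothesis is true, of seeing a test statistic at least as extreme as the one observed. With the p-value, we can draw a conclusion: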

p-value < α	Reject the null hypothesis
p-value ≥ α	Fail to reject the null hypothesis
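The steps above (test statistic, p-value, conclusion) can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_proportion_z_test`, its `alternative` argument and the counts are made up for illustration and are not taken from the course.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0, using the null standard error."""
    p_hat = successes / n                  # best estimate
    null_se = sqrt(p0 * (1 - p0) / n)      # standard error assuming H0 is true
    z = (p_hat - p0) / null_se
    std_normal = NormalDist()              # N(0, 1)
    if alternative == "greater":           # Ha: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":            # Ha: p < p0
        p_value = std_normal.cdf(z)
    else:                                  # Ha: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Hypothetical data: 56 successes in a sample of 100, H0: p = 0.5, Ha: p > 0.5
alpha = 0.05
z, p_value = one_proportion_z_test(56, 100, 0.5, alternative="greater")
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```

With these made-up counts the sample proportion is 1.2 null standard errors above 0.5, which is not unusual enough to reject the null hypothesis at α = 0.05.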

Test Difference in Population Proportions

We want to check two assumptions before doing any calculation:

We have two independent random samples.
We need large enough sample sizes to assure that the distribution of our estimate is normal.

From the generic formula, with a hypothesized difference of zero, the test statistic is:

(Best estimate - Hypothesized estimate) / Standard error of estimate
⟹ ( (p̂₁ - p̂₂) - 0 ) / √( p̂ (1 - p̂) × (1/n₁ + 1/n₂) )

where
p̂ : estimate of the common proportion for both populations, pooled from both samples, i.e. total successes divided by total sample size (n₁ + n₂)

Then we use the test statistic to calculate the p-value. The test statistic tells us how many estimated standard errors our observed difference in sample proportions (p̂₁ - p̂₂) is below or above the hypothesized difference of 0. The p-value is the probability of seeing a test statistic at least that extreme if the null hypothesis is true. Next we compare the p-value to the significance level α to decide whether we can reject the null hypothesis (p-value < α) or fail to reject it (p-value ≥ α).
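As a rough illustration of the two-sample case, the sketch below pools the two samples to estimate the common proportion and then computes a two-sided p-value. The function name and the counts are hypothetical, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 - p2 = 0; x1 and x2 are success counts."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)       # common proportion under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = ((p1_hat - p2_hat) - 0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 of 200 successes in group 1, 45 of 200 in group 2
alpha = 0.05
z, p_value = two_proportion_z_test(60, 200, 45, 200)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}: {decision}")
```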

Alternative Approaches

One of the alternative approaches is the Chi-Square (χ²) test. For the Chi-Square test, the hypotheses are slightly different. You typically look for a difference in distributions, although there are a couple of other ways to formulate it within a Chi-Square test. In addition, Chi-Square tests only really work with a two-sided alternative hypothesis. In general, though, you will reach the same conclusion. The p-value from the Chi-Square test (without a continuity correction) should be the same as in the difference in population proportions test, as long as that test used a two-sided alternative hypothesis.
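The agreement between the two tests can be checked numerically. The sketch below computes the Pearson Chi-Square statistic for a 2×2 table of hypothetical counts, without a continuity correction, and compares it with the two-sided z test on the same counts. The helper names are made up for illustration. Note that some libraries apply a continuity correction to 2×2 tables by default, in which case the p-values differ slightly.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return chi2, erfc(sqrt(chi2 / 2))


def two_sided_z_test(x1, n1, x2, n2):
    """Two-sided z test for a difference in proportions (pooled standard error)."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, erfc(abs(z) / sqrt(2))       # P(|Z| > |z|)


# Hypothetical counts: rows are groups, columns are (successes, failures)
table = [[60, 140], [45, 155]]
chi2, p_chi2 = chi_square_2x2(table)
z, p_z = two_sided_z_test(60, 200, 45, 200)
print(f"chi2 = {chi2:.4f}, z squared = {z * z:.4f}")
print(f"p (chi-square) = {p_chi2:.4f}, p (z test) = {p_z:.4f}")
```

The Chi-Square statistic equals the square of the z statistic, which is why the two p-values coincide.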

Another approach is Fisher’s Exact Test, which lets you run the test when you have small sample sizes. You can have a one-sided alternative hypothesis; in other words it doesn’t have to be ≠, it can be > or <. It doesn’t rely on the normal distribution assumption for the test statistic, so it produces slightly different p-values.
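To show why no normal approximation is needed, here is a minimal sketch of a one-sided Fisher’s Exact Test that sums exact hypergeometric probabilities. The function name and the small table are hypothetical. In practice a statistics library would normally be used instead.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher p-value for the 2x2 table [[a, b], [c, d]].

    Ha: group 1 (first row) has the higher success proportion.
    Rows are groups, columns are (successes, failures).
    """
    row1, row2 = a + b, c + d
    col1 = a + c                           # total successes, held fixed
    total = row1 + row2
    max_a = min(row1, col1)
    # Sum the probabilities of every table with at least `a` successes in group 1
    favourable = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, max_a + 1)
    )
    return favourable / comb(total, col1)


# Hypothetical small samples: 3 of 4 successes in group 1, 1 of 4 in group 2
p_value = fisher_exact_greater(3, 1, 1, 3)
print(f"one-sided p-value = {p_value:.4f}")
```

For this table the p-value is 17/70, because only two tables (3 or 4 successes in group 1) are at least as extreme as the observed one.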

Test a Population Mean

One of the first steps in performing our test is to make sure we’re defining the two competing theories called the null hypothesis and the alternative hypothesis. Before we actually conduct the test though, we should also examine some of the assumptions:

the sample could be considered a random sample
the model for our population can be assumed to be bell-shaped or normal. The normality condition is not as crucial if our sample size is large enough, because we can then rely on the Central Limit Theorem.

To measure how far away sample means tend to be from that true mean, we would need to know the standard error of the sample mean (σ / √n). Since we do not know the population standard deviation σ for the original population of all measurements, we will have to still use our data that we have (our sample of observations). We can measure the variability of our sample and use that sample standard deviation s in our standard error expression, i.e. estimated standard error of the sample mean (s / √n).

So the test statistic is:

( x̄ - Null hypothesis estimate ) / ( s / √n )

We often convert our test statistic to a probability value (p-value) that measures whether the distance we’re seeing is unusual or not. The p-value is the probability of seeing a test statistic as extreme as ours or more extreme, assuming the null hypothesis is true.

If the null hypothesis is true, the t test statistic follows a t-distribution with n - 1 degrees of freedom. T-distributions are still bell-shaped; they just have somewhat thicker tails than the normal distribution we used for tests about proportions. The degrees of freedom indicate how thick those tails are: the fewer the degrees of freedom, the thicker the tails.

If our p-value is big compared to our significance level, we do not have enough evidence to reject the null hypothesis; in other words, we fail to reject the null hypothesis.
If the p-value turns out to be smaller than the significance level, meaning our data would be unusual under the null hypothesis, we reject the null hypothesis.
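The one-sample t test can be sketched as follows. The sample values and the hypothesized mean are invented for illustration, and the sketch assumes SciPy is installed. It computes the statistic by hand from the formula above and then compares it with `scipy.stats.ttest_1samp`.

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats

# Hypothetical sample and null hypothesis H0: mu = 5.0 (two-sided alternative)
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.0, 5.7]
mu0 = 5.0
alpha = 0.05

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)                          # sample standard deviation (n - 1)
est_se = s / sqrt(n)                       # estimated standard error of the mean
t_stat = (x_bar - mu0) / est_se
# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# The library routine should give the same numbers
check = stats.ttest_1samp(sample, popmean=mu0)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}: {decision}")
print(f"scipy: t = {check.statistic:.3f}, p-value = {check.pvalue:.4f}")
```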

Alternative Approaches

If we’re not convinced that the normal model is reasonable and we’re skeptical that our sample size is large enough, there are alternative techniques we can use: non-parametric tests that don’t assume normality. The Wilcoxon Signed Rank Test is a non-parametric analog of the one-sample t test. It uses the median rather than the mean as its measure of center. We calculate a test statistic, get a p-value, and use it to make the decision in the same way as before.
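A hedged sketch of the Wilcoxon Signed Rank Test, assuming SciPy is available: `scipy.stats.wilcoxon` tests whether a set of differences is symmetric about zero, so the hypothesized median is subtracted from each observation first. The data and the hypothesized median are made up.

```python
from scipy import stats

# Hypothetical measurements and null hypothesis H0: median = 11
sample = [12, 15, 9, 18, 14, 20, 17, 16]
m0 = 11
alpha = 0.05

# Differences from the hypothesized median; the test ranks their absolute values
diffs = [x - m0 for x in sample]
result = stats.wilcoxon(diffs)             # two-sided alternative by default

decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
print(f"W = {result.statistic}, p-value = {result.pvalue:.4f}: {decision}")
```

The values were chosen so that no difference is zero and no two absolute differences are tied, which keeps the example away from the tie-handling options of the function.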

Test Difference in Means for Paired Data

We need to assume that we have a random sample of differences. We also need the population of differences to be normally distributed. We can get around this assumption if we have a large sample size (about 25+).

If we assume that the sampling distribution of the sample mean difference is normal, then we can use that t-test statistic.

( x̄_d - Null hypothesis estimate ) / ( s_d / √n )

Next we find the p-value. Because we have a t-test statistic, we know that it follows a t-distribution with n - 1 degrees of freedom, where n is the number of pairs. The p-value is the area in the tail(s) of that t-distribution beyond our test statistic. The t-distribution looks very similar to the normal distribution; it just has slightly heavier tails because we used the estimated standard deviation s_d rather than a true population standard deviation σ.

We then compare the p-value to our significance level α. If the p-value is less than α, we have enough evidence against the null hypothesis to reject it.
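A paired test is a one-sample t test on the differences, which the sketch below makes explicit. The before/after values are hypothetical and SciPy is assumed to be installed. The hand computation is compared with `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats

# Hypothetical paired measurements on the same 8 subjects
before = [72, 68, 75, 80, 66, 71, 77, 69]
after = [75, 70, 74, 85, 70, 73, 80, 72]
alpha = 0.05

# Work with the differences; H0: the mean difference is 0
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
x_bar_d = mean(diffs)
s_d = stdev(diffs)
t_stat = (x_bar_d - 0) / (s_d / sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided

# ttest_rel(after, before) tests the mean of (after - before)
check = stats.ttest_rel(after, before)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}: {decision}")
```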

Test Difference in Means for Independent Groups

We’re going to actually see if we have a significant difference between two populations. The steps to perform a hypothesis test are as follows:

define the null and alternative hypotheses
examine our data and check that our assumptions hold:
- Samples are considered simple random samples
- Samples are independent from one another
- Both populations of responses are approximately normal (or sample sizes are both large enough).
calculate a test statistic for our data, which we use to determine the corresponding p-value
make a decision about our null hypothesis based on this p-value

The test statistic is a measure of how far our sample statistic is from our hypothesized population parameter, in units of estimated standard error. The further away our sample statistic is, the less confidence we have in the null hypothesized value.

Remember the estimated standard error for two means can change depending on which approach we are going to use:

Pooled approach	The variances of the two populations are assumed to be equal: `σ₁² = σ₂²`
Unpooled approach	The assumption of equal variance is dropped.

For the pooled approach, the test statistic is calculated as:

( ( x̄₁ - x̄₂ ) - Null hypothesis estimate ) / ( s_p × √(1/n₁ + 1/n₂) )

where
s_p : √( [ (n₁-1)s₁² + (n₂-1)s₂² ] / (n₁ + n₂ - 2) )

For the unpooled approach,

( ( x̄₁ - x̄₂ ) - Null hypothesis estimate ) / √(s₁²/n₁ + s₂²/n₂)

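With the test statistic, we calculate our p-value, which is the probability of observing a test statistic as extreme as ours or more extreme, assuming the null hypothesis is true. For the pooled approach we use a t-distribution with n₁ + n₂ - 2 degrees of freedom. For the unpooled approach the degrees of freedom are generally not n₁ + n₂ - 2; they are usually approximated, for example with the Welch-Satterthwaite formula that statistical software applies.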

If the p-value is larger than the significance level, we fail to reject the null hypothesis. If it is smaller, we reject the null hypothesis.
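The pooled and unpooled approaches can be compared side by side. The sketch below uses invented data and assumes SciPy is installed. For the unpooled case it uses the Welch-Satterthwaite approximation for the degrees of freedom, which is an assumption of this sketch (it matches what `scipy.stats.ttest_ind` does with `equal_var=False`) rather than something stated above.

```python
from math import sqrt
from statistics import mean, variance

from scipy import stats

# Hypothetical independent samples; H0: mu1 - mu2 = 0 (two-sided alternative)
group1 = [23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.9, 24.8]
group2 = [21.0, 22.5, 20.1, 24.9, 19.8, 23.3, 21.7]

n1, n2 = len(group1), len(group2)
x1, x2 = mean(group1), mean(group2)
v1, v2 = variance(group1), variance(group2)     # sample variances s1^2, s2^2

# Pooled approach: assumes equal population variances
s_p = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
t_pooled = ((x1 - x2) - 0) / (s_p * sqrt(1 / n1 + 1 / n2))
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=n1 + n2 - 2)

# Unpooled approach: no equal-variance assumption, Welch-Satterthwaite df
t_unpooled = ((x1 - x2) - 0) / sqrt(v1 / n1 + v2 / n2)
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)
p_unpooled = 2 * stats.t.sf(abs(t_unpooled), df=df_welch)

# Library cross-checks
pooled_check = stats.ttest_ind(group1, group2, equal_var=True)
welch_check = stats.ttest_ind(group1, group2, equal_var=False)

print(f"pooled:   t = {t_pooled:.3f}, p-value = {p_pooled:.4f}")
print(f"unpooled: t = {t_unpooled:.3f}, df = {df_welch:.2f}, p-value = {p_unpooled:.4f}")
```

When the sample variances differ noticeably, the unpooled degrees of freedom fall below n₁ + n₂ - 2, so the two approaches can give different p-values for the same data.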

The Importance of Well-Formulated Research Questions

Wherever the data come from, inferences based on our analyses will tend to miss the mark if we don’t have a well-formulated research question underlying the study. Before we start applying statistical procedures to data, we need a good idea of what question we’re answering in the first place.

What defines a good research question? There are some key aspects to think about:

What is the target population of interest?
Is the research question descriptive or analytic?
- Descriptive: we are interested in estimating a descriptive parameter. E.g.: what is the mean income in a specific population?
- Analytic: we are interested in the relationship between different variables. E.g.: what is the relationship between income and quality of life in a specific population?
Has the question been asked before? Will a new study add knowledge that did not exist before?
Are variables readily available, measured appropriately or feasible to measure using well-established tools?

If we craft the research question following those four key aspects, and we use an appropriate statistical procedure that’s well aligned with that research question given the four properties, we can make very good inferences related to that question.

Without a good research question, blindly running analyses, generating different results and writing them up can very easily lead to poor insights and incorrect decisions. We need to make sure that the analyses we run are well aligned with a carefully crafted research question, which will maximize the quality of the inferences we make.

My Certificate

For more on Hypothesis Testing, please refer to the wonderful course here https://www.coursera.org/learn/inferential-statistical-analysis-python/

My 126th certificate from Coursera

Related Quick Recap

Confidence Intervals

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai
