If Something Is True, Does It Mean It's Important? Understanding Statistical Significance

Nov 18, 2021

The statistical perspective of significance should not be confused with the practical sense of significance. Consider the difference between something having strategic importance versus something being statistically significant. Statistical significance means that there is enough evidence to suggest that the relationship observed in the collected sample also exists in the broader population. In other words, the observed effect is unlikely to be the result of chance alone.

What does statistical significance involve?

In an experiment, conclusions about a population are usually extrapolated from a representative sample. Since not every data point in the population is included, there will naturally be sampling error.

For example, assume there is a set of cohorts within a membership program. The success of a membership initiative is being measured based on a sample taken from each cohort. The measurement being gauged may appear higher for one cohort than for the others. However, the sample drawn may not have sufficiently represented the population for that cohort.

Both the variation in the original population and the sample size contribute to sampling error. The effect of sampling error increases with smaller samples. Generally, with larger samples, a statistically significant result is less likely to be an artifact of random sampling.

With a more varied population, confidence that the findings are statistically significant decreases. When the data is more widely dispersed from its mean, as shown in the red distribution in Figure 2, there is more variation and therefore higher sampling error. Based on the red distribution, the number of research requests differs more from member to member.

With the narrower distribution shown in black, it can be assumed that most members have around the same number of research requests. The confidence in the findings is greater since the data is not as scattered. Also, in this case, the sample most likely better resembles the underlying population.
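The effect of variation and sample size on sampling error can be illustrated with a small simulation. The sketch below is only an illustration: the mean of 20 research requests, the standard deviations of 2 and 8, and the sample sizes are all made-up values, and a normal distribution is assumed for simplicity. It repeatedly draws samples and measures how much the sample mean moves from one sample to the next.

```python
import random
import statistics


def spread_of_sample_means(pop_sd, sample_size, trials=1000, seed=0):
    """Estimate sampling error as the standard deviation of repeated sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Hypothetical population: requests per member, mean 20, spread pop_sd.
        sample = [rng.gauss(20.0, pop_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


# Narrow ("black") versus wide ("red") population, same small sample size.
narrow_small = spread_of_sample_means(pop_sd=2.0, sample_size=25)
wide_small = spread_of_sample_means(pop_sd=8.0, sample_size=25)
# Wide population again, but with a much larger sample.
wide_large = spread_of_sample_means(pop_sd=8.0, sample_size=400)

print(f"narrow population, n=25:  {narrow_small:.2f}")
print(f"wide population,   n=25:  {wide_small:.2f}")
print(f"wide population,   n=400: {wide_large:.2f}")
```

Under these assumptions, the sample mean should wander more for the wide population than for the narrow one, and a larger sample should pull that spread back down, in line with the standard error of a mean shrinking in proportion to the square root of the sample size.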

How is statistical significance determined?

Determining statistical significance involves establishing a null hypothesis. A null hypothesis is a statement initially assumed to be true. Regarding the red and black distributions above, a null hypothesis may be that “There isn’t a difference in the average number of research requests for the two populations.” You are trying to determine if this null hypothesis is false. 

An alternative hypothesis should also be established. This is the statement you are trying to find evidence for. Given the above distributions, an alternative hypothesis might be that “There is a difference in the average number of research requests for the two populations.”

Another component of assessing statistical significance is the significance level, a threshold for deciding whether the null hypothesis should be rejected. The significance level commonly used is 0.05, although other values can be used. There isn’t a single threshold value that always confirms statistical significance.

A probability known as the p-value is generated from an applicable statistical test and is compared against this threshold value. The p-value is the probability of observing a result at least as extreme as the one in the sample if the null hypothesis were true. If the p-value is smaller than the threshold, the null hypothesis is rejected, indicating a significant result that is unlikely to have arisen by chance alone. A smaller p-value provides stronger evidence against the null hypothesis. If the p-value is larger than the threshold, the result is considered non-significant, and the null hypothesis is not rejected.
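As a rough illustration of comparing a p-value against a significance level, the sketch below uses a permutation test. This is one of several tests that could apply here and is not necessarily the one an analyst would choose. The two cohorts and their request counts are invented for the example, and the function name permutation_p_value is hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are interchangeable, so the
    labels are shuffled repeatedly and the function counts how often the
    shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value from being exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up research request counts for two hypothetical cohorts.
cohort_a = [12, 15, 14, 10, 13, 16, 11, 14, 15, 13]
cohort_b = [18, 17, 20, 16, 19, 21, 18, 17, 20, 19]

alpha = 0.05  # significance level
p_value = permutation_p_value(cohort_a, cohort_b)
reject_null = p_value < alpha

print(f"p-value: {p_value:.4f}")
print("reject the null hypothesis" if reject_null else "do not reject the null hypothesis")
```

With these invented numbers the cohorts barely overlap, so the p-value falls below 0.05 and the null hypothesis of no difference is rejected. Comparing a cohort against itself gives a p-value of 1, since every shuffled difference is at least as large as the observed difference of zero.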

What other factors surround statistical significance?

Non-sampling error also occurs when samples are used to generalize about a population. This includes the bias that can arise from factors such as poorly worded survey questions, ill-suited sampling methods, or low response rates. While p-values produced from statistical tests help account for sampling error, quantifying non-sampling error poses more of a challenge. Minimizing non-sampling error involves structuring the analysis in a way that allows the results to be validated. This may involve introducing an element into the design that will reduce the effect of the error.

Confidence intervals are tied to significance levels and are affected by variation and sample size. They convey how precise a calculated statistic is likely to be. They are wider for a population that is more varied and narrower with bigger samples. As an example, with a 95% confidence level, if 100 samples were drawn and a confidence interval were calculated from each, about 95 of those intervals would be expected to include the true population value and about 5 would not.
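The repeated-sampling reading of a confidence interval can be checked with a simulation. The following is a minimal sketch that assumes a normally distributed population with a known true mean of 20 and a standard deviation of 5 (both invented), and uses a normal-approximation interval rather than a t-based one. The helper mean_confidence_interval is a hypothetical name.

```python
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = statistics.fmean(sample)
    margin = z * statistics.stdev(sample) / len(sample) ** 0.5
    return mean - margin, mean + margin


rng = random.Random(7)
true_mean = 20.0    # assumed, so that coverage can be checked
n_intervals = 1000

covered = 0
for _ in range(n_intervals):
    # Draw a fresh sample of 50 members and build an interval from it.
    sample = [rng.gauss(true_mean, 5.0) for _ in range(50)]
    low, high = mean_confidence_interval(sample)
    if low <= true_mean <= high:
        covered += 1

coverage = covered / n_intervals
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

In practice the true mean is unknown, which is why a single interval cannot be checked this way. The simulation only shows that, across many samples, the share of intervals capturing the true value should sit close to the stated confidence level.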

What are a few particulars on statistical significance?

It is possible to have statistically significant results where the effect is so small that the results are not important. A small p-value does not necessarily imply importance. When a finding is statistically significant, it is unlikely to be due to plain luck, but statistical significance should not be used on its own to judge whether an impact is meaningful.
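The gap between statistical and practical significance can be sketched with a very large sample and a tiny true difference. Everything in the example below is assumed for illustration: two hypothetical groups of 100,000 members whose average research requests differ by only 0.2 (20.0 versus 20.2), compared with a large-sample two-sided z-test. The function name two_sample_z_test is hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def two_sample_z_test(group_a, group_b):
    """Two-sided z-test for a difference in means (large-sample approximation)."""
    diff = statistics.fmean(group_b) - statistics.fmean(group_a)
    std_error = (
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    ) ** 0.5
    z = diff / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p_value


rng = random.Random(1)
# Invented data: average requests differ by only 0.2 between the groups.
group_a = [rng.gauss(20.0, 5.0) for _ in range(100_000)]
group_b = [rng.gauss(20.2, 5.0) for _ in range(100_000)]

diff, p_value = two_sample_z_test(group_a, group_b)
print(f"observed difference in means: {diff:.2f} requests")
print(f"p-value: {p_value:.2g}")
```

Under these assumptions the p-value comes out far below 0.05, yet the difference is a fraction of one research request per member. Whether a difference that small matters is a business judgment that the p-value cannot make.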

Author
Nina Anderson
Data Scientist