Data Byte: Fatness and ‘Sex’

Intro

I copied this data from The Handbook of Small Data Sets [1] (page 13, data set 17), where it is titled Human age and fatness. According to the book, the data came from a study done in the 1980s in which researchers were investigating a new method of measuring body fat percentage. They recorded 18 data points detailing age, sex, and fat percentage.

Initial Glance

The data has 18 rows and 3 columns. It is also tidy: each row is a separate observation and each column is a separate variable. This makes our lives much easier, since we won’t have to reshape anything. There is no missing data, and every column holds the values we would expect, i.e. the ages are all integers, the fat percentages are all floats, and the ‘sex’ values are all strings.

There are 4 ‘Male’ rows and 14 ‘Female’ rows. While we certainly couldn’t draw any major conclusions from our samples, especially the ‘Male’ data, there is enough here for us to run calculations on.

Our variables are:

  • Age
  • ‘Sex’
  • Fat Percentage

Summary Statistics

‘Male’

Count: 4

Mean: 15.6

Standard deviation: 7.8

Min: 7.8

Max: 27.4

‘Female’

Count: 14

Mean: 32.3

Standard deviation: 4.72

Min: 25.2

Max: 42
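
For anyone who wants to recompute these figures, here is a minimal sketch using only Python's built-in statistics module. The variable names and the summarize helper are my own for illustration; the values are the fat percentages listed in the Hypothesis Test section. It reports both forms of the standard deviation, since the population form divides by n while the sample form (the one a t-test relies on) divides by n - 1.

```python
from statistics import mean, pstdev, stdev

# Fat percentages by group, copied from the Hypothesis Test section.
male_fat = [9.5, 7.8, 17.8, 27.4]
female_fat = [27.9, 31.4, 25.9, 25.2, 31.1, 34.7, 42.0,
              29.1, 32.5, 30.3, 33.0, 33.8, 41.1, 34.5]


def summarize(values):
    """Return count, mean, both standard deviations, min and max."""
    return {
        "count": len(values),
        "mean": mean(values),
        "population_sd": pstdev(values),  # divides by n
        "sample_sd": stdev(values),       # divides by n - 1
        "min": min(values),
        "max": max(values),
    }


for label, values in [("Male", male_fat), ("Female", female_fat)]:
    stats = summarize(values)
    print(label, {name: round(value, 2) for name, value in stats.items()})
```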

Hypothesis

According to Active.com [2], the acceptable range of body fat percentage is 18-25% for ‘Male’ and 25-31% for ‘Female’.
My hypothesis is that the means of our two groups fall within the ‘Acceptable’ range. To make calculations easy we will take the midpoint of each range. This gives us 21.5 for ‘Male’ and 28 for ‘Female’.
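
The arithmetic behind those two reference values is just the midpoint of each range. A small sketch, where the dictionary and the midpoint helper are names I made up for illustration:

```python
# Acceptable body fat ranges (percent) as quoted from Active.com above.
acceptable_ranges = {"Male": (18, 25), "Female": (25, 31)}


def midpoint(low, high):
    """Return the value halfway between low and high."""
    return (low + high) / 2


# One reference value per group, used as the hypothesized mean.
reference_values = {
    sex: midpoint(low, high) for sex, (low, high) in acceptable_ranges.items()
}
print(reference_values)  # {'Male': 21.5, 'Female': 28.0}
```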


Null Hypothesis: Each group's mean body fat percentage equals the midpoint of its acceptable range (21.5 for ‘Male’, 28 for ‘Female’).
Alt Hypothesis: Each group's mean body fat percentage differs from the midpoint of its acceptable range.

Hypothesis Test

To test our hypothesis we will use Python 3 and ttest_1samp from the SciPy package. Very briefly, ttest_1samp is the t-test for one group of data: it compares the mean of a sample against a single hypothesized value. We are operating at a 95% confidence level, so in order for us to reject the null hypothesis our p-value must be less than 0.05.

from scipy.stats import ttest_1samp

# Fat percentages for each group
Male_data = [9.5, 7.8, 17.8, 27.4]
Female_data = [27.9, 31.4, 25.9, 25.2, 31.1, 34.7, 42.0, 29.1, 32.5, 30.3, 33.0, 33.8, 41.1, 34.5]

# Test the 'Male' mean against the midpoint 21.5
print(ttest_1samp(Male_data, 21.5))

>>Ttest_1sampResult(statistic=-1.3079057076248999, pvalue=0.28210004133913985)

# Test the 'Female' mean against the midpoint 28
print(ttest_1samp(Female_data, 28))

>>Ttest_1sampResult(statistic=3.2998921731172364, pvalue=0.005748911939385525)
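
To make the test statistic less of a black box, here is a rough sketch that recomputes it by hand with the standard library, using the textbook one-sample formula t = (sample mean - hypothesized mean) / (sample SD / sqrt(n)). The one_sample_t helper is a name I chose for illustration, not part of SciPy, and it only covers the statistic, not the p-value.

```python
from math import sqrt
from statistics import mean, stdev

Male_data = [9.5, 7.8, 17.8, 27.4]
Female_data = [27.9, 31.4, 25.9, 25.2, 31.1, 34.7, 42.0,
               29.1, 32.5, 30.3, 33.0, 33.8, 41.1, 34.5]


def one_sample_t(values, popmean):
    """Return the one-sample t statistic for values against popmean."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev divides by n - 1
    return (mean(values) - popmean) / standard_error


print(one_sample_t(Male_data, 21.5))
print(one_sample_t(Female_data, 28))
```

The two printed values should agree with the statistic fields reported by ttest_1samp above, up to floating-point rounding.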

Conclusions

The data we had for the ‘Male’ category was insufficient to reject the null hypothesis at the 95% confidence level. We can tell because the p-value is high, i.e. it's over 0.05.
However, the data we had for the ‘Female’ category was sufficient to reject our null hypothesis. In this case the p-value is far below 0.05.

My guess is that the ‘Female’ category is more representative of ‘obese’ fat percentages. That is, fat percentages at or above 32%.

print(ttest_1samp(Female_data, 32))

>>Ttest_1sampResult(statistic=0.24544652527318647, pvalue=0.8099429524799272)

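Since the p-value is much higher than 0.05, we cannot reject the hypothesis that the ‘Female’ mean is 32%. The data we currently have is therefore consistent with my second hypothesis, although failing to reject a hypothesis does not prove it.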
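
Another way to look at all three tests at once is a 95% confidence interval for each group mean. The sketch below is only an illustration: the helper name is mine, and the critical values are hard-coded from a standard Student's t table (two-sided, 3 and 13 degrees of freedom) rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

Male_data = [9.5, 7.8, 17.8, 27.4]
Female_data = [27.9, 31.4, 25.9, 25.2, 31.1, 34.7, 42.0,
               29.1, 32.5, 30.3, 33.0, 33.8, 41.1, 34.5]

# Assumption: two-sided 95% critical values of Student's t, keyed by
# degrees of freedom (n - 1), copied from a standard t-table.
T_CRITICAL_95 = {3: 3.182, 13: 2.160}


def confidence_interval_95(values):
    """Return (low, high) bounds of a 95% interval for the mean."""
    n = len(values)
    margin = T_CRITICAL_95[n - 1] * stdev(values) / sqrt(n)
    centre = mean(values)
    return centre - margin, centre + margin


for label, values in [("Male", Male_data), ("Female", Female_data)]:
    low, high = confidence_interval_95(values)
    print(f"{label}: {low:.2f} to {high:.2f}")
```

Under these assumptions the ‘Male’ interval is wide enough to contain 21.5, while the ‘Female’ interval excludes 28 and contains 32, which lines up with the three p-values above.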

References

[1] https://books.google.com/books/about/A_Handbook_of_Small_Data_Sets.html?id=vWu-MJM_obsC

[2] https://www.active.com/fitness/calculators/bodyfat
