Dr Charles Martin
age = 999
Quantitative Analysis: find magnitude, amounts or size of something and make rigorous comparisons
Qualitative Analysis: find the nature of things, themes, patterns, stories
What do we do with quantitative data once we have some?
what’s the average of a set of values?
Example: [2, 2, 3, 4, 873]
The mean of this example is 176.8 but the median is 3: a single outlier drags the mean far from the typical value, so the median is often more useful.
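A quick sketch of the comparison, using Python's built-in `statistics` module on the example values above (the variable names are just for illustration):

```python
from statistics import mean, median

values = [2, 2, 3, 4, 873]  # the example set from above

mean_value = mean(values)      # sum of the values divided by how many there are
median_value = median(values)  # middle value once the set is sorted

print(f"mean   = {mean_value}")    # 176.8, dragged up by the outlier 873
print(f"median = {median_value}")  # 3, unaffected by the outlier
```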
how spread out is a set of values?
Range: max - min
Just as the median is a robust central measure, the interquartile range is a measure of spread that is robust against outliers.
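To see the difference, here is a small sketch computing both spread measures for the same example values. It assumes the "inclusive" quartile method from Python's `statistics` module; other tools may use slightly different quartile conventions and give slightly different numbers.

```python
from statistics import quantiles

values = [2, 2, 3, 4, 873]  # same example set as before

# Range: uses only the two extreme values, so one outlier dominates it
value_range = max(values) - min(values)

# Quartiles split the sorted data into four parts; method="inclusive"
# is an assumption here, chosen so the quartiles fall on actual data points
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle half of the data

print(f"range = {value_range}")  # 871
print(f"IQR   = {iqr}")          # 2.0
```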
The distribution of the data is how it is spread out and where it is bunched up.
This matters because many statistical tests assume the data is normally distributed; if it is not, their findings might be misleading.
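One possible way to check this assumption is a normality test. The sketch below uses the Shapiro-Wilk test from scipy on two synthetic samples, which are generated purely for illustration and are not course data. A small p-value suggests the sample is unlikely to come from a normal distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic samples, made up for illustration only
samples = {
    "roughly normal": rng.normal(loc=50, scale=10, size=200),
    "skewed": rng.exponential(scale=10, size=200),
}

results = {}
for name, sample in samples.items():
    # shapiro returns the test statistic W and a p-value
    statistic, p_value = stats.shapiro(sample)
    results[name] = p_value
    print(f"{name}: W = {statistic:.3f}, p = {p_value:.4f}")
```

A test like this is only one piece of evidence; plotting the distribution is usually worth doing as well.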
First thing to do after loading the data: look at the raw rows. This may not be the most helpful approach, but it is still important to check that the data is not garbled and the columns make sense.
interactive activities | attend in person | watch online | degree | time in CBR |
---|---|---|---|---|
5 | 2 | 2 | undergraduate | 1-3 years |
3 | 5 | 1 | postgraduate | 3+ years |
4 | 5 | 1 | postgraduate | <1 year |
5 | 2 | 4 | undergraduate | <1 year |
4 | 3 | 1 | undergraduate | <1 year |
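A minimal sketch of this first look with pandas. The five rows are typed in from the table above so the example is self-contained. With a real file you would load it instead (for example with `pd.read_csv`), and the name `survey_preview` is just a placeholder.

```python
import pandas as pd

# The five preview rows from the table above, entered by hand
survey_preview = pd.DataFrame({
    "interactive activities": [5, 3, 4, 5, 4],
    "attend in person": [2, 5, 5, 2, 3],
    "watch online": [2, 1, 1, 4, 1],
    "degree": ["undergraduate", "postgraduate", "postgraduate",
               "undergraduate", "undergraduate"],
    "time in CBR": ["1-3 years", "3+ years", "<1 year", "<1 year", "<1 year"],
})

print(survey_preview.head())        # first rows: do they look sensible?
print(survey_preview.dtypes)        # are numeric columns really numeric?
print(survey_preview.isna().sum())  # any missing values per column?
```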
Second thing to do when loading up data for analysis: calculate summary statistics.
Ask: Are these values what you expected? Do they suggest any interesting points about your data?
stat | interactive activities | attend in person | watch online |
---|---|---|---|
count | 75 | 75 | 75 |
mean | 3.36 | 3.15 | 2.84 |
std | 1.30 | 1.24 | 1.39 |
min | 1 | 1 | 1 |
25% | 2 | 2 | 2 |
50% | 3 | 3 | 3 |
75% | 4 | 4 | 4 |
max | 5 | 5 | 5 |
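A table like this can be produced with pandas' `describe()`. The sketch below runs it on only the five preview rows shown earlier, so its numbers will differ from the 75-response table above; the name `ratings` is a placeholder.

```python
import pandas as pd

# Numeric rating columns from the five preview rows (not the full 75 responses)
ratings = pd.DataFrame({
    "interactive activities": [5, 3, 4, 5, 4],
    "attend in person": [2, 5, 5, 2, 3],
    "watch online": [2, 1, 1, 4, 1],
})

# describe() reports count, mean, std, min, quartiles and max per column
summary = ratings.describe()
print(summary.round(2))
```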
Third thing to do when loading data: plot it.
If plots show something interesting then you can investigate.
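As a simple starting point, the sketch below counts how often each rating appears in one column. It again uses only the five preview rows, and the non-interactive backend is an assumption so that it runs without a display. In a notebook you would normally drop that line and call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# One rating column from the five preview rows (illustration only)
ratings = pd.DataFrame({"interactive activities": [5, 3, 4, 5, 4]})

fig, ax = plt.subplots(figsize=(6, 4))
# One bar per distinct rating, with height equal to the number of responses
sns.countplot(data=ratings, x="interactive activities", ax=ax)
ax.set_ylabel("number of responses")
```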
You can get more plots into one plot. Good for surfacing contrasts or telling a story about the data graphically.
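import matplotlib.pyplot as plt
import seaborn as sns

# Assumes survey_data is the pandas DataFrame of survey responses loaded earlier
# and that the plots/ directory already exists.
sns.set_theme(style="ticks", palette="Set2")
plt.figure(figsize=(10, 6))
# One group of boxes per degree program, split by time in Canberra
sns.boxplot(data=survey_data, x='degree_program',
            y='interactive_activities_likert',
            hue='time_in_canberra',
            medianprops={'linewidth': 2, 'color': 'black'})
plt.savefig('plots/fake_data_complex_boxplot.png',
            bbox_inches='tight', dpi=300)
plt.show()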
Lots of ways to do data analysis:
In this class we’ll use Python, numpy, pandas, scipy, seaborn, and matplotlib as a default stack for data analysis (yes, libraries are a problem in Python…)
Let’s do some data analysis
A theme is a high level finding from qualitative analysis, but what that means can differ.
Who has a question?