Wine Quality Datasets
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
The data includes two datasets:
- winequality-red.csv – red wine preference samples;
- winequality-white.csv – white wine preference samples.
The datasets are available here: winequality.zip
I will use the “winequality-red.csv” dataset for this assignment.
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 – fixed acidity
2 – volatile acidity
3 – citric acid
4 – residual sugar
5 – chlorides
6 – free sulfur dioxide
7 – total sulfur dioxide
8 – density
9 – pH
10 – sulphates
11 – alcohol
Output variable (based on sensory data):
12 – quality (score between 0 and 10)
First few data points in the dataset
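As a rough illustration of how the first few data points could be inspected, here is a minimal Python sketch. It assumes the CSV is semicolon-delimited with a quoted header row. The helper name `load_wine` is hypothetical, and the sample rows are made-up placeholders, not values from the real file.

```python
import csv
import io

def load_wine(source):
    """Read a semicolon-delimited wine-quality file into a list of dicts of floats."""
    reader = csv.DictReader(source, delimiter=";")
    return [{name: float(value) for name, value in row.items()} for row in reader]

# Illustrative rows only (made-up values, NOT taken from the real file).
SAMPLE = io.StringIO(
    '"residual sugar";"chlorides";"pH";"quality"\n'
    "2.0;0.08;3.3;5\n"
    "2.5;0.07;3.2;6\n"
    "1.8;0.09;3.4;5\n"
)

rows = load_wine(SAMPLE)
for row in rows[:2]:  # show the first few data points
    print(row)
```

With the real file, the same function could be called as `load_wine(open("winequality-red.csv", newline=""))`, assuming the file sits in the working directory.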
Running regression with a moderator
Procedure: we regress “quality” (the dependent variable) on residual sugar and chlorides (the independent variables), with pH as a moderator.
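A moderated regression of this kind is usually fitted by adding product terms between the moderator and each predictor. The sketch below shows one way to do that with NumPy least squares. It is not necessarily the procedure used for this assignment: the function name `fit_moderated` is hypothetical, and the data are synthetic with known coefficients, not the wine data.

```python
import numpy as np

def fit_moderated(sugar, chlorides, ph, quality):
    """OLS fit of quality on sugar, chlorides, pH and the two pH interactions."""
    X = np.column_stack([
        np.ones_like(sugar),   # intercept
        sugar, chlorides, ph,  # main effects
        sugar * ph,            # moderation of residual sugar by pH
        chlorides * ph,        # moderation of chlorides by pH
    ])
    beta, *_ = np.linalg.lstsq(X, quality, rcond=None)
    return beta

# Synthetic, noise-free data with known coefficients (NOT the wine data).
rng = np.random.default_rng(0)
n = 200
sugar = rng.uniform(1.0, 10.0, n)
chlorides = rng.uniform(0.02, 0.4, n)
ph = rng.uniform(2.8, 3.9, n)
true_beta = np.array([5.0, 0.3, -4.0, 0.5, -0.1, 1.0])
quality = (true_beta[0] + true_beta[1] * sugar + true_beta[2] * chlorides
           + true_beta[3] * ph + true_beta[4] * sugar * ph
           + true_beta[5] * chlorides * ph)

beta = fit_moderated(sugar, chlorides, ph, quality)
print(np.round(beta, 3))  # order: intercept, sugar, chlorides, pH, sugar:pH, chlorides:pH
```

The last two coefficients are the interaction (moderation) terms. A nonzero value means the effect of that predictor on quality changes with pH.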
Here are the results:
ANOVA with moderator.
Using the above variables, here are the results:
We can see that pH significantly moderates the effect of chlorides (at the 95% confidence level) and of residual sugar (at the 90% confidence level) on wine quality.
Task 2: Chi-square test for the same model
With the interaction of pH with chlorides and residual sugar included, the chi-square test result is as follows:
Since the p-value is below 0.05, we can say that pH has an interaction effect with both chlorides and residual sugar.
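One common way to get a chi-square test for interaction terms is a likelihood-ratio test between the model without them and the model with them. Whether the test reported here was computed this way is an assumption. The sketch uses synthetic data (not the wine data) and hypothetical function names. It relies on the fact that, for a Gaussian linear model, the statistic n·ln(RSS0/RSS1) is approximately chi-square distributed, with 2 degrees of freedom here because two interaction terms are added.

```python
import math
import numpy as np

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def interaction_lr_test(sugar, chlorides, ph, quality):
    """Likelihood-ratio chi-square test of the two pH interaction terms."""
    n = len(quality)
    X0 = np.column_stack([np.ones(n), sugar, chlorides, ph])  # main effects only
    X1 = np.column_stack([X0, sugar * ph, chlorides * ph])    # plus interactions
    stat = n * math.log(rss(X0, quality) / rss(X1, quality))
    p_value = math.exp(-stat / 2.0)  # chi-square upper tail, valid for 2 df only
    return stat, p_value

# Synthetic data with a built-in sugar:pH interaction (NOT the wine data).
rng = np.random.default_rng(1)
n = 300
sugar = rng.uniform(1.0, 10.0, n)
chlorides = rng.uniform(0.02, 0.4, n)
ph = rng.uniform(2.8, 3.9, n)
quality = 5.0 + 0.3 * sugar + 0.5 * ph + 1.0 * sugar * ph + rng.normal(0.0, 0.5, n)

stat, p_value = interaction_lr_test(sugar, chlorides, ph, quality)
print(f"chi-square = {stat:.1f}, p = {p_value:.3g}")
```

The closed-form tail probability exp(-x/2) holds only for 2 degrees of freedom. For any other number of added terms, a general chi-square survival function would be needed instead.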