Wine Quality Datasets

These datasets are publicly available for research purposes only. The details are described in [Cortez et al., 2009].


P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

The data includes two datasets:


  • winequality-red.csv – red wine preference samples;
  • winequality-white.csv – white wine preference samples.

The datasets are available here:

and at:


I will be using the “winequality-red.csv” dataset for this assignment.

Attribute Information:

For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 – fixed acidity
2 – volatile acidity
3 – citric acid
4 – residual sugar
5 – chlorides
6 – free sulfur dioxide
7 – total sulfur dioxide
8 – density
9 – pH
10 – sulphates
11 – alcohol
Output variable (based on sensory data):
12 – quality (score between 0 and 10)

# First few data points in the dataset
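
A minimal sketch of how the first few rows could be printed, using only the Python standard library. It assumes the file has been downloaded to the working directory and uses `;` as the field separator, as the UCI copies of these files do; the helper name `head` is hypothetical and not part of the original analysis.

```python
import csv
from itertools import islice


def head(lines, n=5, delimiter=";"):
    """Return the header and the first n data rows from an iterable of CSV lines."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)  # first line holds the column names
    # Every column in this dataset is numeric, so each field is parsed as float.
    rows = [[float(value) for value in row] for row in islice(reader, n)]
    return header, rows


# Usage (assumed local path; adjust to wherever the file was saved):
# with open("winequality-red.csv", newline="") as f:
#     header, rows = head(f)
# print(header)
# for row in rows:
#     print(row)
```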


Running regression with a moderator

Procedure: let us run a regression with “quality” as the dependent variable, residual sugar and chlorides as the independent variables, and pH as a moderator.
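
One way to set up such a moderated regression is to add the products of pH with each predictor to an ordinary least-squares model. The sketch below uses NumPy and assumes the model quality ~ sugar + chlorides + pH + sugar×pH + chlorides×pH. The function name and the coefficient labels are hypothetical, and this is not necessarily the tool or specification behind the results reported here.

```python
import numpy as np


def fit_moderated_ols(sugar, chlorides, ph, quality):
    """Least-squares fit of quality on sugar, chlorides, pH and both pH interactions."""
    s, c, p, y = (np.asarray(a, dtype=float) for a in (sugar, chlorides, ph, quality))
    # Design matrix: intercept, main effects, then the two moderator (product) terms.
    X = np.column_stack([np.ones_like(y), s, c, p, s * p, c * p])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept", "sugar", "chlorides", "pH", "sugar:pH", "chlorides:pH"]
    return dict(zip(names, beta))
```

In this parameterization the coefficient labelled `sugar:pH` describes how the slope of residual sugar changes per unit of pH, and likewise for `chlorides:pH`. Centering the predictors before forming the products is a common variation that changes the main-effect estimates but not the interaction coefficients.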

Here are the results:


Task 1: ANOVA with moderator

Using the above variables, here are the results:


We can see that, in the above analysis for wine quality, pH significantly moderates the effect of chlorides (at the 95% confidence level) and of residual sugar (at the 90% confidence level).
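
The per-term significance statements above correspond to testing each interaction term separately. As a hedged illustration, the sketch below computes a partial F statistic for each pH interaction by comparing the full model against the same model with that one term removed. The function names are hypothetical, and the ANOVA table in this write-up may have been produced with a different sum-of-squares convention.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def interaction_f_stats(sugar, chlorides, ph, quality):
    """Partial F statistic (1 numerator df) for each pH interaction term."""
    s, c, p, y = (np.asarray(a, dtype=float) for a in (sugar, chlorides, ph, quality))
    base = [np.ones_like(y), s, c, p]
    inter = {"sugar:pH": s * p, "chlorides:pH": c * p}
    X_full = np.column_stack(base + list(inter.values()))
    rss_full = rss(X_full, y)
    df_den = len(y) - X_full.shape[1]  # residual degrees of freedom of the full model
    stats = {}
    for name in inter:
        # Reduced model: everything except the term under test.
        kept = [col for key, col in inter.items() if key != name]
        X_reduced = np.column_stack(base + kept)
        stats[name] = (rss(X_reduced, y) - rss_full) / (rss_full / df_den)
    return stats, df_den
```

Each statistic would be compared against an F distribution with 1 and `df_den` degrees of freedom to obtain a p-value, for example with `scipy.stats.f.sf` if SciPy is available.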

Task 2: Chi-square test for the same interaction

With the interaction of pH with chlorides and residual sugar included, the chi-square test result is as follows:


Since the p-value is below 0.05, we can say that pH has an interaction effect with both chlorides and residual sugar.
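
The text does not say which chi-square test was run. One plausible reading, taken here as an assumption, is a likelihood-ratio test comparing the model without the pH interactions to the model with them. The sketch below implements that reading with NumPy; the function names are hypothetical. For Gaussian linear models the statistic is n·ln(RSS_reduced / RSS_full), and with two added terms it is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail probability is exactly exp(−x/2).

```python
import math

import numpy as np


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def interaction_lr_test(sugar, chlorides, ph, quality):
    """Likelihood-ratio chi-square test of both pH interaction terms jointly (df = 2)."""
    s, c, p, y = (np.asarray(a, dtype=float) for a in (sugar, chlorides, ph, quality))
    X_main = np.column_stack([np.ones_like(y), s, c, p])
    X_full = np.column_stack([X_main, s * p, c * p])
    # LR statistic for nested Gaussian linear models.
    stat = len(y) * math.log(rss(X_main, y) / rss(X_full, y))
    # Chi-square survival function with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

A p-value below 0.05 from this test would indicate that the two interaction terms jointly improve the fit, but it does not by itself show that each term matters individually.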