Interactive Scatter Plot for Boston Housing Dataset

yuzhang21

Last edited Oct 21, 2019
Created on Oct 06, 2019

A scatter plot of any two variables in the Boston Housing Dataset.

In this dataset, the target variable is "medv", the median value of owner-occupied homes. Because there are 14 columns in total, we need to look at the correlations among the variables to make sense of the data. Selection menus for both X and Y offer a convenient way to review the relationship between any two variables, not only each variable's relationship to the target. In addition to the X and Y channels, point color is fixed to the "chas" variable, which indicates whether the house is near the Charles River, and point size is mapped to "ptratio", the pupil-teacher ratio of the town.
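To put a number on what the scatter plot shows for a chosen X/Y pair, one option is the Pearson correlation coefficient. The sketch below is only an illustration and not part of the visualization itself: the function name `pearson` is hypothetical, and it assumes the two selected columns have already been loaded as plain lists of numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same constant factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)
```

A value near 1 or -1 corresponds to points lying close to a straight line in the plot, while a value near 0 indicates no linear trend. A binary column such as "chas" can be passed as 0/1 values.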

Legends for the color and size variables have been added, with interactive effects. Point opacity can now take three values: 20% (faded out), 50% (normal), and 80% (highlighted). When the mouse hovers over an entry in the color legend, which is "Charles" in this case, the points of the unselected color, along with their legend entry, are faded out. The points in the hovered group stay at normal opacity and so stand out visually. All points return to normal opacity when the mouse leaves. When the mouse hovers over the size legend, which is "P/T Ratio" here, the unselected points are faded out and the selected points are also highlighted. This is because the selected group is relatively small compared to the whole dataset. There is still a scaling issue to fix: not all P/T ratio groups show up.
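The hover rules above can be summarized as a small lookup. This is a minimal sketch, not the project's actual code: the function name, the channel labels `"color"` and `"size"`, and the reading of the three percentages as opacity values are all assumptions made for illustration.

```python
# Assumed opacity levels, taken from the three percentages in the description.
FADE, NORMAL, HIGHLIGHT = 0.2, 0.5, 0.8


def point_opacity(hovered_channel, in_hovered_group):
    """Return the opacity for one point.

    hovered_channel: None when no legend is hovered, else "color" or "size"
                     (hypothetical labels for the two legends).
    in_hovered_group: True if the point belongs to the hovered legend entry.
    """
    if hovered_channel is None:
        # Mouse is not over any legend: every point is at normal opacity.
        return NORMAL
    if not in_hovered_group:
        # Points outside the hovered group fade out for both legends.
        return FADE
    if hovered_channel == "color":
        # Color groups are large, so the hovered group simply stays normal.
        return NORMAL
    if hovered_channel == "size":
        # Size groups are small, so the hovered group is also highlighted.
        return HIGHLIGHT
    raise ValueError(f"unknown legend channel: {hovered_channel!r}")
```

Keeping the rule in one pure function makes the asymmetry between the two legends explicit: only the size legend raises the selected points above normal opacity.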

The original data comes from GitHub: Boston Housing Data

MIT Licensed