Boston Housing Dataset Part I

yuzhang21

Last edited Sep 18, 2019
Created on Sep 10, 2019

A visualization constructed using the vega-lite-api.

In the Boston Housing Dataset, the target variable is medv, the median value of owner-occupied homes. Because there are 14 columns in total, we need to look at the correlations among the variables to make sense of the data. On a single page, we show 4 plots to review the information in the dataset.
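
One way to put a number on the correlation between two columns is the Pearson coefficient. The following is a minimal sketch, not the code behind this visualization: the `pearson` helper is a hypothetical name, and the sample rows are made up for illustration rather than taken from the Boston data.

```javascript
// Pearson correlation coefficient between two equal-length numeric arrays.
// Returns a value in [-1, 1]; NaN if either array has zero variance.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("pearson() needs two arrays of equal length >= 2");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;

  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Made-up rows for illustration only, NOT real Boston Housing values.
const sampleRows = [
  { lstat: 5, medv: 32 },
  { lstat: 10, medv: 24 },
  { lstat: 15, medv: 19 },
  { lstat: 25, medv: 12 },
];

const r = pearson(
  sampleRows.map((d) => d.lstat),
  sampleRows.map((d) => d.medv)
);
console.log(r < 0 ? "negative correlation" : "non-negative correlation");
```

A value near -1 or 1 indicates a strong linear relationship, and a value near 0 indicates a weak one. The scatter plots remain useful because Pearson only captures linear association.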

  1. In the first plot, medv and lstat show the strongest negative correlation. lstat is the percentage of the population with lower status, so regions with higher housing prices usually have fewer lower-status residents.
  2. We visualize the pupil-teacher ratio (ptratio) against the housing price. The data show that their correlation is low. On the left side of the plot, towns with a low ptratio of around 12 have no house prices below a relatively high value of around 22. On the opposite side, where ptratio is higher than 20, no house price exceeds 30.
  3. The age field is the proportion of owner-occupied units built prior to 1940, so the higher the age value, the more old houses there are. The plot shows that old houses span a wide range of values: they can be cheaper than new houses because of their age, but some are still high priced.
  4. The nox value is related to NO, a major pollutant gas in industrial areas, so the nox and indus fields are likely to be more strongly correlated. However, the plot shows that the indus value, defined as the proportion of non-retail business acres per town, tops out around 20%, with a few exceptions.
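
The four plots described above could be laid out with a spec along these lines. This is a hedged sketch written as a plain Vega-Lite JSON spec rather than with the vega-lite-api builder used for the actual visualization. The data path `boston.csv`, the `scatter` helper, the 2-column layout, and the choice of which field goes on which axis are all assumptions.

```javascript
// Field pairs for the four plots, as [x, y].
// Axis assignment is an assumption; the text only names the pairs.
const pairs = [
  ["lstat", "medv"],
  ["ptratio", "medv"],
  ["age", "medv"],
  ["indus", "nox"],
];

// Hypothetical helper: one scatter-plot view for a pair of numeric fields.
function scatter(xField, yField) {
  return {
    mark: "point",
    encoding: {
      x: { field: xField, type: "quantitative" },
      y: { field: yField, type: "quantitative" },
    },
  };
}

// Top-level spec: four views concatenated into a 2-column grid.
const spec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: { url: "boston.csv" }, // placeholder path, not the real data URL
  columns: 2,
  concat: pairs.map(([x, y]) => scatter(x, y)),
};

console.log(JSON.stringify(spec, null, 2));
```

The printed JSON can be pasted into any Vega-Lite renderer once the `data.url` placeholder points at the actual CSV.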

See Data on Gist: Housing Values at Boston Suburbs

The original data comes from GitHub: Boston Housing Data

MIT Licensed