Visualization Basics in Seaborn, Matplotlib, Altair

Trevor Kapuvari Voter Turnout 2018

Importing Libraries

import altair as alt
import geopandas as gpd
import hvplot.pandas
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
import os
%matplotlib inline

#Loading the CSV and reading it
voterturnout2018df = pd.read_csv("./Data/voter_turnout_primary_election_2018.csv")
voterturnout2017df = pd.read_csv("./Data/voter_turnout_primary_election_2017.csv")
voterturnout2016df = pd.read_csv("./Data/voter_turnout_primary_election_2016.csv")
voterturnout2015df = pd.read_csv("./Data/voter_turnout_primary_election_2015.csv")

Data Wrangling and Database Framing

voterturnout2018df

MainParties18 = voterturnout2018df[voterturnout2018df['political_party'].isin(['DEMOCRATIC','REPUBLICAN'])]
MainParties17 = voterturnout2017df[voterturnout2017df['political_party'].isin(['DEMOCRATIC','REPUBLICAN'])]
MainParties16 = voterturnout2016df[voterturnout2016df['political_party'].isin(['DEMOCRATIC','REPUBLICAN'])]
MainParties15 = voterturnout2015df[voterturnout2015df['political_party'].isin(['DEMOCRATIC','REPUBLICAN'])]

merged_elections = pd.concat([MainParties18, MainParties17, MainParties16, MainParties15], axis=0)
merged_elections

	election	election_date	precinct_description	precinct_code	political_party	voter_count
0	2018 GENERAL PRIMARY	5/15/2018	PHILA WD 01 DIV 01	101	DEMOCRATIC	158
3	2018 GENERAL PRIMARY	5/15/2018	PHILA WD 01 DIV 01	101	REPUBLICAN	9
4	2018 GENERAL PRIMARY	5/15/2018	PHILA WD 01 DIV 02	102	DEMOCRATIC	174
8	2018 GENERAL PRIMARY	5/15/2018	PHILA WD 01 DIV 02	102	REPUBLICAN	8
9	2018 GENERAL PRIMARY	5/15/2018	PHILA WD 01 DIV 03	103	DEMOCRATIC	243
...	...	...	...	...	...	...
6473	2015 MUNICIPAL PRIMARY	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 44	6644	REPUBLICAN	61
6477	2015 MUNICIPAL PRIMARY	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 45	6645	REPUBLICAN	48
6478	2015 MUNICIPAL PRIMARY	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 45	6645	DEMOCRATIC	67
6482	2015 MUNICIPAL PRIMARY	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 46	6646	REPUBLICAN	94
6483	2015 MUNICIPAL PRIMARY	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 46	6646	DEMOCRATIC	113

12959 rows × 6 columns

merged_elections.set_index(['precinct_code', 'election', 'political_party'], append=True)
#Setting a multi-level index groups the rows by precinct, election, and party, which suggests how to tidy/pivot the table to our liking

				election_date	precinct_description	voter_count
	precinct_code	election	political_party
0	101	2018 GENERAL PRIMARY	DEMOCRATIC	5/15/2018	PHILA WD 01 DIV 01	158
3	101	2018 GENERAL PRIMARY	REPUBLICAN	5/15/2018	PHILA WD 01 DIV 01	9
4	102	2018 GENERAL PRIMARY	DEMOCRATIC	5/15/2018	PHILA WD 01 DIV 02	174
8	102	2018 GENERAL PRIMARY	REPUBLICAN	5/15/2018	PHILA WD 01 DIV 02	8
9	103	2018 GENERAL PRIMARY	DEMOCRATIC	5/15/2018	PHILA WD 01 DIV 03	243
...	...	...	...	...	...	...
6473	6644	2015 MUNICIPAL PRIMARY	REPUBLICAN	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 44	61
6477	6645	2015 MUNICIPAL PRIMARY	REPUBLICAN	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 45	48
6478	6645	2015 MUNICIPAL PRIMARY	DEMOCRATIC	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 45	67
6482	6646	2015 MUNICIPAL PRIMARY	REPUBLICAN	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 46	94
6483	6646	2015 MUNICIPAL PRIMARY	DEMOCRATIC	05/19/2015 12:00:00 AM	PHILA WD 66 DIV 46	113

12959 rows × 3 columns

tidyElections = pd.pivot(merged_elections, columns = ("election", "political_party"), index = "precinct_code", values = "voter_count")
tidyElections

election	2018 GENERAL PRIMARY		2017 MUNICIPAL PRIMARY		2016 GENERAL PRIMARY		2015 MUNICIPAL PRIMARY
political_party	DEMOCRATIC	REPUBLICAN	DEMOCRATIC	REPUBLICAN	REPUBLICAN	DEMOCRATIC	DEMOCRATIC	REPUBLICAN
precinct_code
101	158.0	9.0	135.0	4.0	13.0	207.0	157.0	3.0
102	174.0	8.0	178.0	6.0	38.0	288.0	187.0	11.0
103	243.0	16.0	205.0	10.0	37.0	318.0	245.0	17.0
104	183.0	28.0	147.0	21.0	65.0	242.0	170.0	24.0
105	84.0	7.0	62.0	8.0	11.0	124.0	98.0	10.0
...	...	...	...	...	...	...	...	...
6642	49.0	40.0	47.0	32.0	95.0	122.0	101.0	63.0
6643	84.0	63.0	79.0	44.0	176.0	168.0	147.0	100.0
6644	98.0	44.0	71.0	33.0	122.0	175.0	159.0	61.0
6645	31.0	26.0	29.0	19.0	100.0	80.0	67.0	48.0
6646	58.0	69.0	73.0	78.0	164.0	119.0	113.0	94.0

1688 rows × 8 columns
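The pivot step can be sketched on a handful of rows. This is a minimal illustration rather than the notebook's full pipeline: the `sample` frame is a hypothetical stand-in built from four rows of the merged table shown earlier, and passing a list to `columns` assumes pandas 1.1 or newer.

```python
import pandas as pd

# Four long-form rows copied from the merged table preview (hypothetical stand-in frame).
sample = pd.DataFrame({
    "election": ["2018 GENERAL PRIMARY"] * 4,
    "precinct_code": [101, 101, 102, 102],
    "political_party": ["DEMOCRATIC", "REPUBLICAN", "DEMOCRATIC", "REPUBLICAN"],
    "voter_count": [158, 9, 174, 8],
})

# One row per precinct, one column per (election, party) pair.
wide = sample.pivot(
    index="precinct_code",
    columns=["election", "political_party"],
    values="voter_count",
)

# Nested columns are addressed with an (election, party) tuple.
dem_2018 = wide[("2018 GENERAL PRIMARY", "DEMOCRATIC")]
print(dem_2018)
```

Selecting with a tuple is equivalent to the chained `tidyElections['2018 GENERAL PRIMARY']['DEMOCRATIC']` lookup used in the plotting cell.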

Matplotlib Chart, Data & Plotting

Here we compare Democratic voter participation in the 2016 and 2018 general primaries.

Matplotlib was used for the scatter chart because of its simplicity and direct approach to plotting large sets of data. Its basic plotting interface produces visualizations that are easy for viewers to comprehend.

Here, we can tell that the precincts that had more votes in 2016 also had more votes in 2018, with some notable outliers.

x = tidyElections['2018 GENERAL PRIMARY']['DEMOCRATIC']  # 2018 votes
y = tidyElections['2016 GENERAL PRIMARY']['DEMOCRATIC']  # 2016 votes

#matplotlib

fig, ax = plt.subplots()
ax.scatter(x, y, c='blue')
ax.set_title('Democratic Votes in General Primary, 2018  vs 2016, by Precinct')
ax.set_xlabel('Votes in 2018')
ax.set_ylabel('Votes in 2016')
plt.ylim(0,400)
plt.show()

The scatter plot compares Democratic votes in the 2018 and 2016 general primaries. Each dot represents one precinct, positioned by its vote count in each year. You may also notice that the cluster runs off the top of the chart as it trends upward. This cut-off was done on purpose: the x and y axes are fixed to the same range to show the difference in voter participation. The 2016 primary, held in a presidential election year, drew significantly more participation than the 2018 primary, held in a midterm election year. Looking at individual points, you will notice that there are generally more votes counted in 2016 than in 2018.
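The year-over-year comparison can also be checked numerically. The sketch below uses only the ten precincts visible in the `tidyElections` preview, so it is a spot check under that assumption, not a result for all 1,688 precincts. The `sample` frame and its column names are hypothetical.

```python
import pandas as pd

# Democratic vote counts for the ten precincts shown in the tidyElections preview.
sample = pd.DataFrame(
    {
        "dem_2018": [158, 174, 243, 183, 84, 49, 84, 98, 31, 58],
        "dem_2016": [207, 288, 318, 242, 124, 122, 168, 175, 80, 119],
    },
    index=[101, 102, 103, 104, 105, 6642, 6643, 6644, 6645, 6646],
)

# True for each precinct that recorded more Democratic votes in 2016 than in 2018.
higher_in_2016 = sample["dem_2016"] > sample["dem_2018"]
share = higher_in_2016.mean()

print(f"{higher_in_2016.sum()} of {len(sample)} sampled precincts had more votes in 2016")
```

Running the same comparison on the full `x` and `y` series would give the share across every precinct.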

Seaborn Charts

To build the seaborn chart, we first need a flat table with one column per quantity. tidyElections is awkward to pass to seaborn directly because its columns are nested, with election on the top level and political party beneath it. The election-level totals are therefore collected in a small, flat DataFrame.

Electiondf = pd.DataFrame({'Election':['2018 GENERAL PRIMARY','2017 MUNICIPAL PRIMARY','2016 GENERAL PRIMARY','2015 MUNICIPAL PRIMARY'],
                            'Democratic Votes':[154206,161534,350914,241611],
                            'Republican Votes':[16378, 12237, 47684, 20411],
                            'Total':[170584,173771,398598,262022]})

Electiondf.reset_index()

	index	Election	Democratic Votes	Republican Votes	Total
0	0	2018 GENERAL PRIMARY	154206	16378	170584
1	1	2017 MUNICIPAL PRIMARY	161534	12237	173771
2	2	2016 GENERAL PRIMARY	350914	47684	398598
3	3	2015 MUNICIPAL PRIMARY	241611	20411	262022
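The election-level totals could also be derived from the long-form data instead of being typed in. The sketch below is one possible approach, shown on a hypothetical `sample` frame built from eight rows of the merged table preview; applied to the full `merged_elections` frame, the same steps would be expected to produce the city-wide totals, though that has not been verified here.

```python
import pandas as pd

# Eight rows in the same layout as merged_elections (values copied from the preview).
sample = pd.DataFrame({
    "election": ["2018 GENERAL PRIMARY"] * 4 + ["2015 MUNICIPAL PRIMARY"] * 4,
    "precinct_code": [101, 101, 102, 102, 6645, 6645, 6646, 6646],
    "political_party": ["DEMOCRATIC", "REPUBLICAN"] * 2 + ["REPUBLICAN", "DEMOCRATIC"] * 2,
    "voter_count": [158, 9, 174, 8, 48, 67, 94, 113],
})

# Sum votes per election and party, then spread the parties into flat columns.
totals = (
    sample.groupby(["election", "political_party"])["voter_count"]
    .sum()
    .unstack("political_party")
    .rename(columns={"DEMOCRATIC": "Democratic Votes", "REPUBLICAN": "Republican Votes"})
)
totals["Total"] = totals["Democratic Votes"] + totals["Republican Votes"]

# Drop the leftover column-level name and turn the election index into a column.
totals = totals.rename_axis(index="Election", columns=None).reset_index()
print(totals)
```

The result has the same four columns as `Electiondf`, with no nested column levels for seaborn to trip over.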

Republican Votes in Primary Elections compared to Total Votes

f, ax = plt.subplots(figsize=(15,6))

sns.set_color_codes("pastel")
sns.barplot(y="Election", x="Total", data=Electiondf,
            label="Total", color="g")

sns.set_color_codes("muted")
sns.barplot(y="Election", x="Republican Votes", data=Electiondf,
            label="Republican Votes", color="r")

ax.legend(ncol=2, loc="lower right", frameon=True)
ax.set(ylabel="",
       xlabel="Republican Votes in Elections")
sns.despine(left=True, bottom=True)

Regardless of political opinions, it is evident that Republican participation in the Philadelphia area is significantly overshadowed by the Democratic Party, and that Republicans generally have a weaker base in the city. The point of the chart is to visualize that minority vote in each election. The chart measures the number of people who voted in primary elections, general and municipal, while also showing what portion is Republican. While this does not support any election forecast, what these results can reasonably show is that few registered Republicans vote in primaries compared to Democrats.
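The Republican portion can be expressed as a percentage of each election's total. This is a small sketch that rebuilds `Electiondf` from the values above so it runs on its own; the `Republican Share (%)` column name is an assumption added for illustration.

```python
import pandas as pd

# Same totals as Electiondf above, rebuilt so the example is self-contained.
Electiondf = pd.DataFrame({
    "Election": ["2018 GENERAL PRIMARY", "2017 MUNICIPAL PRIMARY",
                 "2016 GENERAL PRIMARY", "2015 MUNICIPAL PRIMARY"],
    "Republican Votes": [16378, 12237, 47684, 20411],
    "Total": [170584, 173771, 398598, 262022],
})

# Republican votes as a percentage of all Democratic and Republican votes cast.
Electiondf["Republican Share (%)"] = (
    100 * Electiondf["Republican Votes"] / Electiondf["Total"]
).round(1)

print(Electiondf[["Election", "Republican Share (%)"]])
```

In every one of the four primaries the Republican share stays below 13 percent of the two-party vote.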

Altair Chart 1

Pie Chart of Election Participation Overall

alt.Chart(Electiondf, title="Voting Comparisons by Election").mark_arc(innerRadius=50).encode(
    theta="Total",
    color="Election:N",
)

The pie chart shows each election's share of the total votes cast across the four primaries.

We notice two major changes over time from the chart. The first is that voter turnout peaked in 2016 and was lower in 2018 than in 2015.

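The second is that turnout fell within each election type: municipal primaries dropped from 262,022 votes in 2015 to 173,771 in 2017, and general primaries dropped from 398,598 in 2016 to 170,584 in 2018, the larger decline of the two.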
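How far turnout fell between like-for-like elections can be checked with a few lines of arithmetic. The sketch below uses the `Total` values from `Electiondf`; the `totals` series and the `pct_change` helper are hypothetical names introduced for this example.

```python
import pandas as pd

# Total Democratic plus Republican votes per primary, keyed by year (from Electiondf).
totals = pd.Series({2015: 262022, 2016: 398598, 2017: 173771, 2018: 170584})

def pct_change(start, end):
    """Percent change in total votes from one election year to another."""
    return 100 * (totals[end] - totals[start]) / totals[start]

municipal = pct_change(2015, 2017)  # municipal primary to municipal primary
general = pct_change(2016, 2018)    # general primary to general primary

print(f"Municipal 2015 to 2017: {municipal:.1f}%")
print(f"General 2016 to 2018: {general:.1f}%")
```

Comparing municipal with municipal and general with general avoids mixing a presidential-year primary with an off-year one.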

Altair 2

Voter Participation in Elections, Visualized Through Bar Chart

#altair 2
alt.Chart(Electiondf, title="Voting Comparisons by Election (Bar Form)").mark_bar().encode(
    x="Election",
    y="Total",
).properties(
    width=alt.Step(40),
)

The bar graph shows the same data as the pie chart but makes it easier to compare individual years to one another. We notice here that turnout declined within both types of elections: the 2017 municipal primary drew fewer voters than the 2015 one, and the 2018 general primary drew fewer than the 2016 one.

Altair 3

Scatter Plot, Democrats vs Republicans in 10 Sampled Precincts

Here we look at 10 sampled precincts, specifically those where Republicans accumulated their largest number of total votes.

Election2018 = pd.read_csv("./Data/voters2018top9.csv")
Election2018

	Precinct	Sum of REPUBLICAN	Sum of DEMOCRATIC
0	3505	69	118
1	4503	81	118
2	4520	71	99
3	4524	69	89
4	5824	77	148
5	5841	90	142
6	6311	86	180
7	6520	87	255
8	6617	81	24
9	6646	69	58

#altair  3
brush = alt.selection_interval()
alt.Chart(Election2018).mark_point().encode(
    x='Sum of DEMOCRATIC',
    y='Sum of REPUBLICAN',
    color=alt.condition(brush, 'Precinct:N', alt.value('grey')),  # :N treats precinct codes as categories, not numbers
).add_params(brush)

This chart shows the precincts where Republicans accumulated the most votes in their primaries. In two precincts, 6617 and 6646, Republican participation was larger than Democratic participation. This comparison can help inform predictions of future election results in these particular precincts. Meanwhile, the other precincts still have a heavy Democratic lean in terms of votes.
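The precincts where Republicans outvoted Democrats can be picked out with a boolean filter. The sketch below rebuilds the table from the values displayed above so it does not depend on the CSV file; `rep_lead` is a hypothetical name.

```python
import pandas as pd

# Values copied from the Election2018 table displayed above.
Election2018 = pd.DataFrame({
    "Precinct": [3505, 4503, 4520, 4524, 5824, 5841, 6311, 6520, 6617, 6646],
    "Sum of REPUBLICAN": [69, 81, 71, 69, 77, 90, 86, 87, 81, 69],
    "Sum of DEMOCRATIC": [118, 118, 99, 89, 148, 142, 180, 255, 24, 58],
})

# Keep only the precincts where the Republican count exceeds the Democratic count.
rep_lead = Election2018[
    Election2018["Sum of REPUBLICAN"] > Election2018["Sum of DEMOCRATIC"]
]

print(rep_lead["Precinct"].tolist())
```

On the scatter plot, these are the points that sit above the line where both axes are equal.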

Altair Extra Credit

Population Pyramid of Planet X

This dataset gives us the population of an unknown planet we will call Planet X. Here we examine how its population has grown over the past 100 years. Over that time span, you can watch the baby boomer generation gradually move up the population pyramid, followed by a noticeably lower birth rate. The resulting top-heavy pyramid indicates an aging population with a lower birth rate and potential demographic challenges in the future.

(The original dataset contains U.S. population figures starting in 1850. As a proof of concept, I will be changing as many variables as possible while also explaining the function of each line.)

Link to Original: https://altair-viz.github.io/gallery/us_population_pyramid_over_time.html

import altair as alt
from vega_datasets import data

source = data.population.url
source

'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/population.json'

slider = alt.binding_range(min=1900, max=2000, step=10)
#Creates the slider that drives the chart: its lowest and highest values and the interval ("step") between stops.

years = alt.selection_point(fields=['year'], name='Select', bind=slider, value={'year': 1950})
#The 'years' selection decides which year's data is shown. `fields` is the column to select on. `name` is optional, but it sets the
#slider's label; removing it shows the auto-generated parameter number instead. To keep it intuitive for the user, I changed it to "Select".
#`bind=slider` is the most important argument because it attaches the slider, and `value` sets the starting year.
#For this example, I set it to 1950, the midpoint of the range.

base = alt.Chart(source).add_params(
    years
).transform_filter(
    years
).transform_calculate(
    species=alt.expr.if_(alt.datum.sex == 1, 'Minions', 'Smurfs')
).properties(
    width=250
)
#The base chart holds what both sides of the pyramid share: the data source, the year filter, and the width. The data only labels the two groups as sex 1 and sex 2, so we define 'species' to give them names.
#species uses the Altair expression "if_". Because we aren't using pandas here, the syntax differs slightly from what another package would use.
#In pandas, for example, the same step on a DataFrame named df (a hypothetical name) would be:
#    df.loc[df['sex'] == 1, 'species'] = 'Minions'
#    df.loc[df['sex'] != 1, 'species'] = 'Smurfs'
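For comparison, the pandas counterpart of the `alt.expr.if_` call can be sketched with `numpy.where`, which applies the same if/else rule to a whole column at once. The `df` frame and its values are hypothetical; only the `sex` coding (1 versus anything else) follows the chart code above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using the same sex coding as the population dataset (1 or 2).
df = pd.DataFrame({"sex": [1, 2, 1, 2], "people": [100, 120, 90, 95]})

# Same rule as alt.expr.if_(alt.datum.sex == 1, 'Minions', 'Smurfs').
df["species"] = np.where(df["sex"] == 1, "Minions", "Smurfs")

print(df)
```

The Altair version differs in that the rule runs inside the chart specification, so the source data is never modified in Python.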

color_scale = alt.Scale(domain=['Minions', 'Smurfs'],
                        range=['#ede72d', '#8acaf2'])
#Sets a color scale for the values: domain lists the category values to be marked on the graph, and range lists the color assigned to each. If the field were quantitative, the domain would instead be a numeric [min, max] interval and the range a sequence of colors or a named color scheme.

left = base.transform_filter(
    alt.datum.species == 'Smurfs'
).encode(
    # 'O' specifies ordinal data, so age acts as a category rather than a measurement in this scenario.
    # 'Q' also works because the ages are numbers, but the bars get thin and the graph is not as legible.
    alt.Y('age:O').axis(None)
        .sort('descending'), # oldest age group at the top
    alt.X('people:Q') # 'Q' as in quantitative, needed to show the number of people per age "category"
        .title('Total Smurfs')
        .axis(values=[0,4000000,8000000,12000000])
        .sort('descending'), # reverses the x-axis so zero sits on the right and the bars extend leftward
    alt.Color('species:N').scale(color_scale).legend(None)
).mark_bar().properties(title='Smurfs')
#This creates a graph of its own, showing the Smurf population by age group.

age_axis = base.encode(
    alt.Y('age:O').axis(None)
    .sort('descending'), # oldest age group at the top, matching the bar charts
    alt.Text('age:Q'),
).mark_text().properties(title='Age', width=25)
#This graph stands on its own and displays nothing but the age labels. Age can be encoded as Ordinal or Quantitative here because the values are numbers, but the choice must stay consistent with the left and right graphs.

right = base.transform_filter(
    alt.datum.species == 'Minions'
).encode(
    alt.Y('age:O').axis(None)
    .sort('descending'),
    alt.X('people:Q')
    .axis(values=[0,4000000,8000000,12000000])
    .title('Total Minions')
    .sort('ascending'),
    alt.Color('species:N').scale(color_scale).legend(None)
).mark_bar().properties(title='Minions')

PopPyr = alt.concat(age_axis, left, right, age_axis, spacing=0, title =("Population Pyramid of Planet X"))
PopPyr