Mastering the Art of Visualization: Python and Seaborn Unleashed
Hey there! Welcome to my blog where I post about my journey as a self-taught developer. You can find my GitHub by clicking HERE.
AI Assistance Disclosure: As a writer with over a decade of experience in programming and a deep understanding of the subject matter, I have leveraged AI technology to enhance the clarity and articulation of this content. While the core knowledge and insights are my own, AI assistance was employed to refine grammar and presentation, ensuring the highest quality reading experience.
Seaborn is not just another data visualization library in the Python world. It streamlines the process of creating insightful visualizations, which makes it a good fit for web developers who are experienced in other languages and frameworks but new to Seaborn.
This tutorial assumes Python and the required dependencies (Seaborn, Matplotlib, Pandas, and NumPy) are installed correctly.
What is Seaborn?
Seaborn is designed to seamlessly integrate with your existing Python data tools, building on top of Matplotlib and closely connecting with Pandas data structures. As a web developer, you’ll appreciate its compatibility with your existing Python workflow.
Seaborn tries to make hard things very easy to do — Michael Waskom
If you do not yet know, Michael Waskom is the creator of Seaborn.
Seaborn simplifies data exploration and comprehension. Its plotting functions operate on dataframes and arrays containing entire datasets, automatically handling semantic mapping and statistical aggregation to produce meaningful plots. This means you can focus on the essence of your data and its interpretation rather than getting bogged down in the minutiae of plot creation.
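To make "statistical aggregation" concrete, here is a minimal sketch. The tiny DataFrame is made up for illustration (it is not the real tips data). `sns.barplot` collapses the repeated rows for each day into one bar whose height is the mean, which should be the same number that a pandas `groupby(...).mean()` gives.

```python
import matplotlib
matplotlib.use("Agg")  # Off-screen backend so the sketch also runs without a display
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only (not the real "tips" dataset)
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sun"],
    "tip": [2.0, 4.0, 3.0, 5.0, 7.0],
})

# barplot aggregates repeated x values; the default estimator is the mean
ax = sns.barplot(data=df, x="day", y="tip")

# One bar per day; each height should match the per-day mean from pandas
bar_heights = sorted(patch.get_height() for patch in ax.patches)
pandas_means = sorted(df.groupby("day")["tip"].mean())
print(bar_heights, pandas_means)
```

You never compute the mean yourself here: you hand Seaborn the raw rows and it does the grouping for you.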
Let’s get our hands dirty
Script Example Below:
import seaborn as sns # Import Seaborn
import matplotlib.pyplot as plt # Import Matplotlib
# Load an example dataset
tips = sns.load_dataset("tips")
# Create a visualization
plot = sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size"
)
# Show the plot using Matplotlib
plt.show()
Script Result Below:
Alright, let’s pause for a moment. What is happening in the script above? And where in the world is the data coming from? Both valid questions.
Talking Tips!
The “tips” dataset is a commonly used example dataset in data visualization tutorials and examples. It contains information about tips left by restaurant customers and various attributes related to those tips, making it a suitable dataset for demonstrating Seaborn’s capabilities in creating different types of plots, particularly for exploring relationships between variables.
Here are some of the columns typically found in the “tips” dataset:
total_bill: The total bill amount, including the cost of the meal and any additional charges.
tip: The tip amount left by the customer.
sex: The gender of the customer (e.g., Male or Female).
smoker: Whether the customer is a smoker (e.g., Yes or No).
day: The day of the week (e.g., Thur, Fri, Sat, Sun).
time: Whether the meal was lunch or dinner.
size: The number of people in the dining party.
Script Breakdown!
1. import seaborn as sns: This line imports the Seaborn library and gives it the alias `sns`. Seaborn is a powerful data visualization library for Python.
2. import matplotlib.pyplot as plt: This line imports the `pyplot` module from the Matplotlib library and gives it the alias `plt`. Matplotlib is another popular data visualization library in Python. In this code, we are using it to display the Seaborn plot.
3. tips = sns.load_dataset("tips"): Seaborn provides several built-in example datasets, and this line loads the “tips” dataset using the `sns.load_dataset()` function. The dataset contains information about tips left in a restaurant, and it’s often used as an example dataset for data visualization.
4. plot = sns.relplot(…): This line creates a relational plot using Seaborn’s `relplot` function, which draws scatter plots by default and can also draw line plots. We pass the data, the x-axis and y-axis variables, the column used to split the figure into subplots (`col`), the point color (`hue`), the marker shape (`style`), and the point size (`size`).
5. plt.show(): Finally, this line uses Matplotlib to display the plot. The `plt.show()` function is called to open a window and show the Seaborn plot.
By adding the `plt.show()` line, you ensure that the plot is displayed correctly, even in environments like Visual Studio Code. Matplotlib is used as the backend for plot rendering, allowing you to view the Seaborn plot in a separate window or within your IDE.
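If you are working somewhere without a display (a remote server or a CI job, for example), `plt.show()` has no window to open. One possible alternative, sketched here with made-up rows that only mimic the tips columns, is to save the figure to a file. `relplot` returns a `FacetGrid`, and that object has a `savefig` method. The file name and the temporary directory are arbitrary choices for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # Off-screen backend: render to files, not windows
import pandas as pd
import seaborn as sns

# Made-up rows that mimic the shape of the "tips" columns used above
demo = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.2, 30.1, 8.4, 25.0],
    "tip": [1.5, 3.0, 2.0, 5.0, 1.0, 4.0],
    "time": ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
})

# relplot returns a FacetGrid; col="time" yields one subplot per value of "time"
grid = sns.relplot(
    data=demo,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker",
)
print(grid.axes.shape)  # one row of subplots, one column per "time" value

# Write the figure to a PNG file instead of opening a window
out_path = os.path.join(tempfile.mkdtemp(), "relplot_demo.png")
grid.savefig(out_path)
print(os.path.exists(out_path))
```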
Deeper Dive: Unveiling the power of visualization
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import datetime
# Generate synthetic financial data
np.random.seed(0)
# Calculate the start date as today's date minus 6 months
current_date = datetime.datetime.now()
start_date = current_date - datetime.timedelta(days=6*30)
# Creating date range for the past 6 months
date_range = pd.date_range(start=start_date, end=current_date, freq='D')
# Generate random customer transactions in dollars with an average ticket size of $10.00
average_ticket_size = 10
customers = np.cumsum(np.random.normal(average_ticket_size, 2, len(date_range)))
# Generate random merchant transactions in dollars with an average ticket size of $20.00
merchants = np.cumsum(np.random.normal(20, 3, len(date_range)))
# Generate random SaaS company transactions in dollars with an average ticket size of $15.00
saas_companies = np.cumsum(np.random.normal(15, 2, len(date_range)))
# Define fee categories
fee_categories = [
{'Capture Fee (%)': 2.9, 'Auth Fee (Cents)': 30},
{'Capture Fee (%)': 2.8, 'Auth Fee (Cents)': 15},
{'Capture Fee (%)': 2.75, 'Auth Fee (Cents)': 35}
]
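# Apply fees to the transaction data
total_fees = 0.0  # Running total of the fees charged on the final cumulative amounts
for fee_category in fee_categories:
    for dataset in [customers, merchants, saas_companies]:
        # Percentage fee on the amount plus a fixed fee converted from cents to dollars
        fees = (dataset * fee_category['Capture Fee (%)'] / 100) + fee_category['Auth Fee (Cents)'] / 100
        dataset += fees  # In-place update, so the original arrays change too
        total_fees += fees[-1]  # Add up fees across every dataset and fee category
# Calculate the total amounts
total_customers = customers[-1]
total_merchants = merchants[-1]
total_saas_companies = saas_companies[-1]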
# Create a DataFrame for the totals
totals_data = {
'Category': ['Customers', 'Merchants', 'SaaS Companies', 'Fees'],
'Total Amount ($)': [total_customers, total_merchants, total_saas_companies, total_fees]
}
totals_df = pd.DataFrame(totals_data)
# Create a bar plot
sns.set_style("whitegrid")
plt.figure(figsize=(12, 6))
# Define colors
colors = sns.color_palette('husl', 4)
# Plot one bar per category, each in its own color
categories = list(totals_df['Category'])
for i, category in enumerate(['Transaction Net'] * len(categories)):
    col_name = f"{category} {categories[i]}"
    total_amount = totals_df.iloc[i]['Total Amount ($)']
    # Plot only row i; `order` keeps every bar at its own x position
    sns.barplot(data=totals_df.iloc[[i]], x='Category', y='Total Amount ($)', order=categories, color=colors[i], label=col_name)
    plt.text(i, total_amount, f"${total_amount:.2f}", ha='center', va='bottom', fontsize=12, color='black')
# Set the title, labels, and rotation
plt.title('Financial Details for Customers, Merchants, SaaS Companies, and Total Fees', fontsize=14)
plt.xlabel('Category', fontsize=12)
plt.ylabel('Amount ($)', fontsize=12)
plt.xticks(fontsize=10)
# Add a legend
plt.legend(fontsize=12)
# Show the plot
plt.show()
In this script, we generate and visualize financial data over the past six months to understand the transactions, fees, and totals for customers, merchants, and SaaS companies in a simulated financial ecosystem. The script begins by generating synthetic data, where we define the starting date as six months ago and create a date range spanning that period. Three distinct types of transactions are simulated, representing customer, merchant, and SaaS company interactions. These transactions reflect different average ticket sizes, i.e., the amount per transaction, each with its own characteristic.
Furthermore, the script introduces fee categories, each combining a percentage-based capture fee with a fixed authorization fee. Fees are then applied to the transaction data, altering the total amounts accordingly. The script concludes by calculating and visualizing the total amounts for each category (customers, merchants, and SaaS companies) and the overall fees. The resulting bar plot shows the net transaction amount for each category next to the total fee amount, which makes it easy to compare them in this hypothetical ecosystem. Because the data is randomly generated, the chart illustrates the technique rather than real financial trends, but the same steps of data generation, fee application, and visualization carry over to real transaction data.
Unlocking Financial Insights: Explore the Dynamics of Synthetic Transactions and Fees
Still here? Cool, let’s dissect the last Python script. Lots of content to cover.
Final Result!
This script is a small simulation aimed at the payments industry. It models a simplified transaction lifecycle so analysts can see how fees change the totals for each party. By generating synthetic financial data over a 6-month period and applying transaction fees, it gives a feel for the ebb and flow of financial activity. The resulting bar plot shows transaction net amounts for customers, merchants, and SaaS companies, along with a separate bar for total fees. With real transaction data in place of the synthetic series, the same visualization could support analysis of payment trends and monetization strategies.
DO NOT GET OVERWHELMED!
Import Libraries:
— The script begins by importing the necessary libraries: `pandas` for data manipulation, `numpy` for numerical operations, `seaborn` and `matplotlib` for data visualization, and `datetime` for date arithmetic.
Generate Synthetic Data:
— It sets a random seed (to ensure reproducibility) and calculates the current date. It then calculates the start date, which is 6 months (approximately 180 days) before the current date.
— A date range for the past 6 months is created using `pd.date_range`.
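Here is a small sketch of the date range step on its own. It uses a fixed end date (picked arbitrarily for this example) instead of today's date, so the output is the same every time you run it. Note that `pd.date_range` includes both endpoints, so a 180-day span produces 181 daily entries.

```python
import datetime

import pandas as pd

# A fixed end date (chosen arbitrarily) keeps the output reproducible
end_date = datetime.datetime(2024, 1, 31)
start_date = end_date - datetime.timedelta(days=6 * 30)  # 180 days earlier

# freq="D" means one entry per calendar day
date_range = pd.date_range(start=start_date, end=end_date, freq="D")

print(len(date_range))  # both endpoints are included, so 181
print(date_range[0].date(), date_range[-1].date())
```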
Generate Synthetic Transaction Data:
— Random transactions are generated for customers, merchants, and SaaS companies.
— For each category, the `np.cumsum` function accumulates the transaction amounts. In this step, the code specifies an average ticket size for each category to control the magnitude of transaction amounts.
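If `np.cumsum` is new to you, this tiny sketch shows what it does. The three daily amounts are made up for illustration. Each output element is the running total up to that day, which is why the last element equals the sum of everything.

```python
import numpy as np

# Three made-up daily amounts in dollars
daily_amounts = np.array([10.0, 12.0, 8.0])

# Each element is the sum of all amounts up to and including that day
running_total = np.cumsum(daily_amounts)

print(running_total)      # [10. 22. 30.]
print(running_total[-1])  # 30.0, the same as daily_amounts.sum()
```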
Define Fee Categories:
— Three fee categories are defined. Each category includes a capture fee as a percentage and an authorization fee in cents.
Apply Fees to Transaction Data:
— A loop iterates through each fee category and applies the specified fees to the transaction data for customers, merchants, and SaaS companies.
— The fees are calculated based on the given capture fee percentage and authorization fee in cents.
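To see the fee arithmetic for a single transaction, here is a worked example. The helper `transaction_fee` is hypothetical (it does not appear in the script), and it assumes the authorization fee really is meant in cents, so it divides by 100 to get dollars before adding it to the percentage part.

```python
def transaction_fee(amount_dollars, capture_fee_percent, auth_fee_cents):
    """Hypothetical helper: percentage fee plus a fixed fee, returned in dollars."""
    percentage_part = amount_dollars * capture_fee_percent / 100
    fixed_part = auth_fee_cents / 100  # convert cents to dollars
    return percentage_part + fixed_part


# A $100.00 transaction at 2.9% + 30 cents: 2.90 + 0.30 dollars
fee = transaction_fee(100.00, capture_fee_percent=2.9, auth_fee_cents=30)
print(round(fee, 2))  # 3.2
```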
Calculate Total Amounts:
— The code calculates the total amounts for customers, merchants, SaaS companies, and total fees. It captures the final transaction amounts for each category and the total fees using the `[-1]` index, indicating the last element in each data series.
Create a DataFrame for Totals:
— A DataFrame named `totals_df` is created to store the total amounts for customers, merchants, SaaS companies, and fees. It is structured with two columns: ‘Category’ and ‘Total Amount ($)’.
Create a Bar Plot:
— The script uses Seaborn to create a bar plot. It sets the style and figure size for the plot and defines a color palette for the bars.
— A loop iterates through each category to plot the bars for ‘Transaction Net,’ using data from the `totals_df` DataFrame. It labels each bar with the respective category name and the total amount in dollars.
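A variation on the labeling step, sketched with made-up totals: instead of relying on the loop index for the x position, you can read the position and height from each bar that Seaborn drew. This is just one way to do it, not necessarily better than the approach in the script.

```python
import matplotlib
matplotlib.use("Agg")  # Off-screen backend so the sketch also runs without a display
import pandas as pd
import seaborn as sns

# Made-up totals for illustration only
totals = pd.DataFrame({
    "Category": ["Customers", "Merchants", "Fees"],
    "Total Amount ($)": [1800.0, 3600.0, 150.5],
})

ax = sns.barplot(data=totals, x="Category", y="Total Amount ($)")

# Place a dollar label just above each bar, centered on the bar
for patch in ax.patches:
    x_center = patch.get_x() + patch.get_width() / 2
    height = patch.get_height()
    ax.text(x_center, height, f"${height:.2f}", ha="center", va="bottom")

labels = [text.get_text() for text in ax.texts]
print(labels)
```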
Set Plot Labels and Title:
— The script defines labels for the x and y-axes, as well as a title for the plot.
Display the Plot:
— The `plt.show()` function is called to display the generated plot.
This script provides a visual representation of transaction totals for different categories and the associated fees over the past 6 months, making it useful for financial analysis and reporting.
As always, I love learning and sharing my knowledge within the web development world. I hope this post helped someone and shed some light on a new strategy to help improve your code.
Happy Coding!