Tableau Prep + Python 101
In this post, I will show you how to integrate Python scripts to your Tableau Prep flow. If you’re curious about how to integrate Python to Tableau Desktop, then go to my previous article here.
Tableau Prep ≠ Tableau Desktop
Yes, me too. Until recently, I also was not aware of its existence. So, what is Tableau Prep? It is a data preparation tool allows users to cleanse, aggregate, merge, and do other preparations to their data before it goes to Tableau Desktop for analysis.
Talk Python to me 🐍
That being said, you can also use Python in Tableau Prep to further prepare your data and here I’ll show you how to do it.
First, you have to be running tabpy
on your server (click here to see how).
Next, you’ll load your data as you normally would and then click on Script.
Once you do that, click on Tableau Python (TabPy) Server and put in your credentials. If you’re using your local machine, then just do Server: localhost
and Port: 9004
and leave the rest empty.
Then you’ll click on browse to find your Python script and specify the function name where you’re doing all the transformations.
What about the script?
Creating the script is somewhat cumbersome and I hope Tableau Prep improves this in future releases. But for now, here’s what you have to keep in mind:
- You can have as many functions as you wish, but be sure to have a function that will return your desired data frame output. This function is the one you’ll call in the Function Name field on Tableau Prep. In this case, I called it
my_fun
. - If you’re going to add more columns than what you originally had, you have to create a function called
get_output_schema
where you specify for all your column their types exactly as shown below. I know, annoying.
import pandas as pd
import numpy as np
def sqrt_root(number):
return np.sqrt(number)
def my_fun(df):
df['SqrtRevenue'] = df['Revenue'].apply(lambda n: sqrt_root(n))
return df
# Create Schema
def get_output_schema():
return pd.DataFrame({
'Store': prep_string(),
'Revenue': prep_int(),
'SqrtRevenue': prep_decimal()
})
- TRICK:
In case you’re only adding a couple of columns, here’s a trick so you don’t have to create theget_output_schema
function. Before loading your Script on Tableau Prep, add a Clean Step where you create a new column (calculated metric) with just a “0” in it, and name it exactly as your new column on the script. Tadah!
That’s all for now, thank you for reading! Feel free to drop questions or comments in the box below if you’d like to learn something new about Tableau!