What digital marketer likes doing repetitive work that could be broken down into steps? This is where Python comes in handy to Automate the Boring Stuff. When completing a monthly trend analysis report, I did just that, increasing my work efficiency and freeing up time to invest in higher-profit activities.
- Saved roughly 5 hours per month
- Increased the volume of sales materials for representatives
- Replaced a time-consuming, repetitive trend report
- Leveraged Jupyter Notebook to write a Python script
- Used the Pandas library to manipulate Excel spreadsheet data as dataframes
Challenge: Process Documentation
To automate this process, I wrote down all of the repeatable steps so that I could develop them into code.
1. Import Data
The first step I took was to import the correct dataframe to analyze from my local machine. I did this by exporting the data from our server with a third-party tool (Stonefield). I set up a consistent naming convention so that I could simply export, overwrite, and run the program whenever we needed the report.
First, I import the Python libraries for manipulating dataframes and for doing math on arrays and matrices. Pandas does most of the heavy lifting in the program, reading and manipulating the spreadsheets. It is best practice to alias these libraries with shorthand names for easier code readability.
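A minimal sketch of that setup cell, using the community-standard aliases (the sample dataframe is just an illustration):

```python
import pandas as pd   # dataframe reading and manipulation
import numpy as np    # mathematical operations on arrays and matrices

# Quick sanity check that the shorthand aliases work together:
df = pd.DataFrame({"qty": np.array([1, 2, 3])})
total = df["qty"].sum()
```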
At this point in the program I imported the Item Invoice History report to find the most popular items from the past month. I created a df (short for dataframe) variable to hold the local file from the data export, then used Pandas' built-in read method to load it. It is extremely important that you have the correct path, or you will get an error message because the file cannot be found. You can check the current working directory by calling pwd. Lastly, I had to set the ISO 8859-1 single-byte encoding for the exported file.
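A self-contained sketch of that import step; the filename and column names here are hypothetical stand-ins for the real Stonefield export:

```python
import pandas as pd

# Hypothetical export path -- in practice this is the Stonefield export
# saved under the consistent naming convention.
path = "item_invoice_history.csv"

# Create a tiny stand-in export so this sketch runs on its own.
with open(path, "w", encoding="iso-8859-1") as f:
    f.write("Item No.,Web Description,Price,UOM\n")
    f.write("A100,Widget,9.99,EA\n")

# The encoding must match the single-byte ISO 8859-1 export, and the path
# must be correct or read_csv raises FileNotFoundError.
df = pd.read_csv(path, encoding="ISO-8859-1")
```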
2. Clean Data
Next, I drop the irrelevant columns from the raw report that wouldn't end up on the final PDF. I call the df variable that I created earlier and use the drop method on it, passing the columns parameter so I can name each unnecessary column in the dataframe. I printed each line of code to check that I was getting the proper output while debugging (results omitted for data security).
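A sketch of that cleanup, assuming a made-up "Internal Memo" column stands in for the real columns that were dropped:

```python
import pandas as pd

df = pd.DataFrame({
    "Item No.": ["A100", "B200"],
    "Web Description": ["Widget", "Gadget"],
    "Internal Memo": ["x", "y"],   # hypothetical column not needed on the PDF
})

# Name each unnecessary column in the columns= parameter.
dfDrop = df.drop(columns=["Internal Memo"])
```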
3. Retrieve Top Items
Now I set a variable called abc to sort the item SKUs alphabetically. It uses the sort_values method on dfDrop, with the parameters 'Item No.' and ascending set to True.
Next, I did a subtotal equivalent using the groupby and count methods: at each change in 'Item No.' it counts the rows. This is why the data was sorted alphabetically.
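Those two steps can be sketched like this; the sort mirrors an Excel-style subtotal layout (groupby itself does not require it), and the sample SKUs are invented:

```python
import pandas as pd

dfDrop = pd.DataFrame({"Item No.": ["B200", "A100", "A100", "B200", "A100"]})

# Sort SKUs alphabetically so each item's rows sit together,
# as they would for an Excel subtotal.
abc = dfDrop.sort_values("Item No.", ascending=True)

# Subtotal equivalent: at each change in 'Item No.', count the rows.
counts = abc.groupby("Item No.")["Item No."].count()
```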
4. Formatting Data
The next step in the process is to sort all item SKUs by their title, or Web Description in this case. Then I use the head method with a parameter of 25 to grab the top 25 SKUs.
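A sketch of grabbing the top 25, assuming the counted dataframe has a per-SKU invoice count and "top" means most frequent (column names here are illustrative):

```python
import pandas as pd

# Hypothetical per-SKU counts produced by the earlier groupby step.
counted = pd.DataFrame({
    "Web Description": [f"Item {i}" for i in range(40)],
    "Count": list(range(40)),
})

# Sort by invoice count, most popular first, and keep the top 25 SKUs.
top25 = counted.sort_values("Count", ascending=False).head(25)
```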
Here I had to merge the previous abc dataframe with the new top25 one, because I needed the price, description, and unit of measure (UOM) columns. I used the merge method with top25 as the parameter and joined them on the unique item numbers.
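That merge might look like this, with invented sample rows; joining on the item number pulls price, description, and UOM onto each top-25 row:

```python
import pandas as pd

top25 = pd.DataFrame({"Item No.": ["A100", "B200"], "Count": [12, 9]})
abc = pd.DataFrame({
    "Item No.": ["A100", "B200", "C300"],
    "Web Description": ["Widget", "Gadget", "Gizmo"],
    "Price": [9.99, 4.50, 2.00],
    "UOM": ["EA", "EA", "BX"],
})

# Merge on the unique item number so each top-25 row picks up
# its price, description, and unit of measure.
merged = top25.merge(abc, on="Item No.")
```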
Clean Merged Dataframes
When the dataframes were merged, this brought in excess columns such as Web Description_y, Price_y, and UOM_y. I dropped them via the drop method.
Here I drop all duplicate iterations of item numbers. I set this as the variable finalItems and used the drop_duplicates method: subset watches the Item No. column, and keep is set to the string 'first'.
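A sketch of both cleanup steps together, with made-up post-merge columns showing the _y suffixes pandas adds to overlapping column names:

```python
import pandas as pd

merged = pd.DataFrame({
    "Item No.": ["A100", "A100", "B200"],
    "Web Description_x": ["Widget", "Widget", "Gadget"],
    "Price_x": [9.99, 9.99, 4.50],
    "Web Description_y": ["Widget", "Widget", "Gadget"],
    "Price_y": [9.99, 9.99, 4.50],
    "UOM_y": ["EA", "EA", "EA"],
})

# Drop the duplicate columns the merge suffixed with _y.
cleaned = merged.drop(columns=["Web Description_y", "Price_y", "UOM_y"])

# Keep only the first occurrence of each item number.
finalItems = cleaned.drop_duplicates(subset="Item No.", keep="first")
```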
2nd Data Import
I changed the column names to match the original dataframe. In Jupyter Notebook you often have the variable on both sides of the assignment on renames like this, because rename returns a new dataframe that must be assigned back.
I use the same method as before to read the Item Invoice History and retrieve the piece prices of the items. Then I changed the column names to have no spaces, to avoid syntax errors when referencing them.
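The rename pattern from the two steps above can be sketched like this (the PCprice variable and its columns are stand-ins for the real report):

```python
import pandas as pd

PCprice = pd.DataFrame({"Item No.": ["A100"], "Piece Price": [0.55]})

# rename returns a new dataframe, so assign the result back to the same
# variable -- the name appears on both sides of the assignment.
PCprice = PCprice.rename(columns={"Item No.": "Item_Num",
                                  "Piece Price": "Piece_Price"})
```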
Piece Price Merge
After that, I created a merge variable to merge the finalItems and PCprice dataframes on the Item_Num column, so that each item SKU would have its correct piece price.
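A sketch of the piece-price merge, with invented sample prices:

```python
import pandas as pd

finalItems = pd.DataFrame({"Item_Num": ["A100", "B200"],
                           "Web Description": ["Widget", "Gadget"]})
PCprice = pd.DataFrame({"Item_Num": ["A100", "B200"],
                        "Piece_Price": [0.55, 0.30]})

# Merge on Item_Num so each SKU carries its correct piece price.
merge = finalItems.merge(PCprice, on="Item_Num")
```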
In this section I have to change the data type of Item_Num, because it is being read as an integer and the key columns must all share the same datatype in order to merge. So I used the astype(str) method to convert it. Finally, I drop the duplicate item SKUs from the merged piece prices.
Lastly, I drop the extra columns using the drop method.
The output variable holds the final file destination on the local machine. The program writes the file automatically, saving the massive amount of time it would take me to do this process by hand.
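The last three steps can be sketched end to end; the "Warehouse" column and output filename are hypothetical, and the write here uses to_csv as one plausible export method:

```python
import pandas as pd

finalItems = pd.DataFrame({"Item_Num": ["100", "200"],
                           "Web Description": ["Widget", "Gadget"]})
PCprice = pd.DataFrame({"Item_Num": [100, 100, 200],     # read in as integers
                        "Piece_Price": [0.55, 0.55, 0.30],
                        "Warehouse": ["W1", "W2", "W1"]})  # hypothetical extra column

# Key columns must share a dtype before merging, so convert int -> str.
PCprice["Item_Num"] = PCprice["Item_Num"].astype(str)

merged = finalItems.merge(PCprice, on="Item_Num")
merged = merged.drop_duplicates(subset="Item_Num", keep="first")
merged = merged.drop(columns=["Warehouse"])

# The output variable holds the final file destination on the local machine.
output = "monthly_trend_report.csv"   # hypothetical filename
merged.to_csv(output, index=False)
```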
In conclusion, blending programming skills into marketing objectives lets me focus on what really matters. I am just scratching the surface of Python and data science, and I will continue to find bottlenecks in business processes and improve efficiency!