Ted Giraitis-Quaye

Welcome To My Blog!

I will use this blog to discuss and document the latest in digital marketing, both through case studies from my career and through trending topics where I can add value. Check out some content below!

Python Script Increases Trend Analysis Efficiency

What digital marketer likes doing repetitive work that could be broken down into steps? This is where Python comes in handy to Automate The Boring Stuff. I did just that with a monthly trend analysis report, speeding up the work and freeing my time for higher-value, profit-center activities.

Results

  • 5 hours saved per month
  • Increased volume of sales materials for representatives

Problem

  • Time-consuming, repetitive trend report

Solution

  • Leveraged Jupyter Notebook to write a Python script
  • Used the Pandas library to manipulate Excel spreadsheet dataframes

Challenge: Process Documentation

To automate this process, I first wrote down all of the repeatable steps so that I could develop them into code.

1. Import Data

[Screenshot: data import Python code in Jupyter Notebook]

The first step I took was to import the correct dataframe to analyze from my local machine. I did this by exporting the data from our server with third-party software (Stonefield). I set up a consistent naming convention so that I could just export, overwrite, and run the program whenever we needed the report.

In [6]: I import the Python libraries for manipulating dataframes and for working with arrays, matrices, and math. Pandas does most of the heavy lifting in the program, reading and manipulating the spreadsheets. It is best practice to import these libraries under shorthand aliases for easier code readability.
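A minimal sketch of that import cell, using the standard shorthand aliases:

```python
# Shorthand aliases keep the rest of the notebook readable.
import pandas as pd   # reads and manipulates the spreadsheet dataframes
import numpy as np    # arrays, matrices, and general math helpers
```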

In [39]: At this point in the program I imported the Item Invoice History report to find the most popular items from the past month. I created a df (short for dataframe) variable to hold the local file from the data export, then used a built-in Pandas method to read it. It is extremely important that the path is correct, or you will get an error because the file cannot be found; you can check the working directory by calling pwd. Lastly, I had to set an ISO-8859-1 single-byte encoding for the file.
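Roughly, the read looks like the sketch below, assuming a CSV export; the file name is a placeholder for the real Stonefield export.

```python
# pwd (run on its own in a Jupyter cell) prints the working directory, which
# helps confirm the path below is correct before reading the file.
df = pd.read_csv(
    "ItemInvoiceHistory.csv",   # placeholder name for the Stonefield export
    encoding="ISO-8859-1",      # single-byte encoding so the file loads cleanly
)
```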

2. Clean Data

In [40]: Next, I drop irrelevant columns from the raw report that wouldn't end up on the final PDF. I call the df variable that I created earlier and use the drop method on it, feeding it the columns parameter so I can name each unnecessary column in the dataframe. I printed each line of code to check that I was getting the proper output during debugging (results omitted for data security).
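In sketch form, with placeholder names standing in for the real columns that were dropped:

```python
# Drop columns that won't appear on the final PDF; these names are placeholders.
dfDrop = df.drop(columns=["Customer No.", "Warehouse", "Salesperson"])
print(dfDrop.head())   # quick sanity check while debugging (output omitted)
```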

3. Retrieve Top Items

[Screenshot: retrieving the top items in the dataframe]

In [41]: Now I set a variable called abc to sort the item SKUs alphabetically. It calls the sort_values method on dfDrop with the parameters 'Item No.' and ascending set to True.

In [42]: Next, I did the equivalent of a subtotal using the groupby and count methods. At each change in 'Item No.' it tallies the count, which is why the data was sorted alphabetically first.
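Those two cells look roughly like this; itemCounts is a placeholder name for the counted dataframe:

```python
# Sort the SKUs alphabetically, then count the invoice lines for each 'Item No.'.
abc = dfDrop.sort_values("Item No.", ascending=True)
itemCounts = abc.groupby("Item No.", as_index=False).count()   # subtotal-style counts per SKU
```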

4. Formatting Data

[Screenshot: sorting the dataframe with Python methods]

In [43]: The next step in the process is to sort the item SKUs by their title, or Web Description in this case. Then I use the head method with a parameter of 25 to grab the top 25 SKUs.
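A sketch of that cell, assuming the counts are sorted in descending order so the most frequently ordered SKUs land on top:

```python
# After the count, every column holds a line count, so sorting on the
# 'Web Description' count column ranks the SKUs by popularity.
top25 = itemCounts.sort_values("Web Description", ascending=False).head(25)
```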

Merge Dataframes

[Screenshot: merging dataframes in a Jupyter Notebook]

In [44]: In this cell I had to merge the earlier abc dataframe with the new top25 one, because I needed the price, description, and unit of measure (UOM) columns. I used the merge method, passing it top25 and merging on the unique item numbers.
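Something like the following, where merged is a placeholder name for the result:

```python
# Join the top 25 SKUs back to the cleaned rows to recover price, description,
# and UOM; 'Item No.' is the shared key in both dataframes.
merged = abc.merge(top25, on="Item No.")
```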

Clean Merged Dataframes

In [45]: When the dataframes were merged, this brought in excess columns such as Web Description_y, Price_y, and UOM_y, which I dropped via the drop method.

In [46]: Here I drop all duplicate occurrences of item numbers. I store the result in a variable called finalItems using the drop_duplicates method, with subset watching the Item No. column and keep set to the string 'first'.
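Roughly, those two cells look like this (merged is the same placeholder name from the previous sketch):

```python
# Drop the duplicate '_y' columns the merge created, then keep only the first
# row for each item number.
merged = merged.drop(columns=["Web Description_y", "Price_y", "UOM_y"])
finalItems = merged.drop_duplicates(subset=["Item No."], keep="first")
```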

2nd Data Import

[Screenshot: Jupyter Notebook data import with Pandas]

In [47]: I changed the column names to match the original dataframe. In Jupyter Notebook a rename like this usually needs the result assigned back to the same variable, so the name ends up on both sides of the assignment.

In [48]: I use the same method as before to grab the Item Invoice History again, this time to retrieve the piece prices of the items. Then I changed the column names to have no spaces to avoid syntax errors.
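A sketch of those two cells; the file name and the space-free column names are placeholders:

```python
# Rename the '_x' columns back to their original names, assigning the result
# back to the same variable so the change sticks.
finalItems = finalItems.rename(columns={
    "Web Description_x": "Web Description",
    "Price_x": "Price",
    "UOM_x": "UOM",
})

# Second export: piece prices, with space-free placeholder column names.
PCprice = pd.read_csv("ItemInvoiceHistoryPrices.csv", encoding="ISO-8859-1")
PCprice.columns = ["Item_Num", "Piece_Price"]
```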

Piece Price Merge

In [49]: After that, I created a merge variable to merge the finalItems dataframe and the PCprice dataframe on the Item_Num column, so that each item SKU would have its correct piece price.

In [50]: In this section I have to change the data type of Item_Num because it is being read as an integer, and the data types need to match in order to merge, so I used the astype(str) method to convert it. Finally, I drop the duplicate item SKUs from the merged piece prices.
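In sketch form; the extra line that builds a matching Item_Num key on finalItems is an assumption so the two key columns line up, and mergedPrice is a placeholder name:

```python
# Both key columns must share a dtype before merging, so cast the integer
# Item_Num values to strings, then merge and drop any duplicate SKUs.
PCprice["Item_Num"] = PCprice["Item_Num"].astype(str)
finalItems["Item_Num"] = finalItems["Item No."].astype(str)   # matching string key (assumed)
mergedPrice = finalItems.merge(PCprice, on="Item_Num")
mergedPrice = mergedPrice.drop_duplicates(subset=["Item_Num"], keep="first")
```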

Output File

[Screenshot: the final output file]

In [51]: Lastly, I drop the remaining extra columns using the drop method.

In [38]: The output variable holds the destination path for the final file on my local machine. Running the cell creates the file automatically and saves a massive amount of time compared with building this report by hand.
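Roughly, the last two cells come down to this; the path and the dropped column are placeholders, and to_excel would be the analogue if an Excel file is needed instead of a CSV:

```python
output = "MonthlyTopItems.csv"                     # placeholder destination path
finalReport = mergedPrice.drop(columns=["Qty"])    # placeholder for the extra columns
finalReport.to_csv(output, index=False)            # writes the finished report automatically
```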

Conclusion

In conclusion, blending programming skills into my marketing objectives allows me to focus on what really matters. I am just scratching the surface with Python and data science, and I will continue to find bottlenecks in business processes and improve efficiency!
