Transforming Several Dozen Excel Sheets into an Easily Queryable Database using Python and SQLite

    When I started my job last year, I had no idea how to derive insights from the data I could access. However, after getting more familiar with my regular responsibilities and those of my team, I began to get some ideas. I realized that everything our small office worked on was stored on a local network drive, including Excel sheets going back years that contained sales data on everything our Ecommerce department had shipped out. These sheets sat unused in a folder, serving only as a record of previous years.

    So I decided that my first big project would be finding a way to use these Excel sheets to provide useful insight into our Ecommerce department's sales. Ideally, I would be able to give my team regular reports on how our products perform across Ecommerce channels such as Amazon and Wayfair, and then use that data to make demand projections for inventory purposes. I would accomplish this by automatically compiling the data from the Excel sheets with Python and importing it into a SQLite database for later querying.

    The first step in this process was to take a quick look at the Excel sheets I would be importing into the SQLite database. Each row I would be importing holds the following values:

    OID - order ID unique to each row (text)
    DATE - date of shipment (date)
    COMPANY - company that ordered the shipment (text)
    PO - purchase order number of the shipment (text)
    ITEM - SKU of the item shipped (text)
    QUANTITY - number of items shipped (integer)

    Knowing the structure of the rows I wanted to import made it easy to get started. First, I wrote a Python script to create a SQLite table whose columns match the types of the values above.

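    A minimal sketch of that script (the database filename and table name, "sales.db" and "shipments", are placeholder choices here):

        import sqlite3

        # Connect to (or create) the database file.
        conn = sqlite3.connect("sales.db")
        cur = conn.cursor()

        # One column per field in the Excel rows. SQLite has no dedicated
        # date type, so DATE is stored as text.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS shipments (
                OID      TEXT PRIMARY KEY,
                DATE     TEXT,
                COMPANY  TEXT,
                PO       TEXT,
                ITEM     TEXT,
                QUANTITY INTEGER
            )
        """)

        conn.commit()
        conn.close()

    Since each OID is unique to a row, it can double as the table's primary key.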

    Next, I would load the Excel data into the SQLite database using a Python script. Each Excel file is named with the pattern "1_Cheungs Internet Main [Year].xlsx" for its respective year, and each file contains 12 sheets named in the date pattern "mm-yy", one for each month of the year. The uniformity of the file and sheet names makes it easy for a Python script to find them and read them into a Pandas dataframe.

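    A sketch of that collection step, assuming all the workbooks sit together in one folder (the folder path below is a placeholder):

        import glob
        import os

        import pandas as pd

        # Placeholder path to the network folder holding the yearly workbooks.
        FOLDER = r"\\nas\ecommerce"

        frames = []
        # Match every yearly workbook, e.g. "1_Cheungs Internet Main 2019.xlsx".
        pattern = os.path.join(FOLDER, "1_Cheungs Internet Main *.xlsx")
        for path in glob.glob(pattern):
            # sheet_name=None reads all twelve monthly "mm-yy" sheets at once,
            # returning a dict of dataframes keyed by sheet name.
            sheets = pd.read_excel(path, sheet_name=None)
            frames.extend(sheets.values())

        # Stack every month of every year into one dataframe.
        df = pd.concat(frames, ignore_index=True)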
    Once every Excel sheet has been appended to the Pandas dataframe, I import that dataframe into the SQLite database.

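    The import itself is a single call; the only wrinkle is avoiding duplicate rows when the script re-runs every day. One simple way to handle that is to clear the table before each full re-read:

        import sqlite3

        conn = sqlite3.connect("sales.db")

        # Clear out the old rows so the daily full re-read of every workbook
        # doesn't insert duplicates, then append the fresh data.
        conn.execute("DELETE FROM shipments")
        df.to_sql("shipments", conn, if_exists="append", index=False)

        conn.commit()
        conn.close()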
    By running this script automatically on a daily basis, my team has an up-to-date database that can be brought into Power BI through an ODBC connection using this driver: http://www.ch-werner.de/sqliteodbc/. Ever since creating this simple database, I have been regularly using this data for reports such as projecting the demand for certain items and looking into their seasonality.

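    As an example of the kind of query the database makes easy, here is a quick look at monthly totals per SKU (this assumes the dates were written as ISO-formatted text, which is how Pandas stores them in SQLite):

        import sqlite3

        import pandas as pd

        conn = sqlite3.connect("sales.db")

        # Total quantity shipped per item per month, a starting point for
        # spotting seasonality in individual SKUs.
        monthly = pd.read_sql_query(
            """
            SELECT ITEM,
                   strftime('%Y-%m', DATE) AS month,
                   SUM(QUANTITY) AS total_shipped
            FROM shipments
            GROUP BY ITEM, month
            ORDER BY ITEM, month
            """,
            conn,
        )

        conn.close()
        print(monthly.head())

    From there it is a short step to charting those totals in Power BI.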