Basic Sales Data EDA
Here I explore a sales dataset from Kaggle using Python for data preprocessing and visualisation. The goal is to uncover patterns and trends in the sales data through basic exploratory data analysis.
I started by cleaning the data, removing unnecessary columns such as contact names and exact order dates, since the relevant time information is already available in the MONTH_ID, QTR_ID, and YEAR_ID columns. This makes the dataset more manageable and focused.
For this analysis, I focussed on the SALES, QTR_ID, MONTH_ID, YEAR_ID and STATUS columns.
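The cleaning step can be sketched roughly as follows. This is a minimal illustration with pandas, not the exact code behind this post. The small `raw` frame is made-up stand-in data, and the dropped column names (ORDERNUMBER, ORDERDATE, CONTACTLASTNAME, CONTACTFIRSTNAME) are assumptions about what the Kaggle file contains. With the real file you would load it with `pd.read_csv` first.

```python
import pandas as pd

# Columns kept for the analysis, as listed in the text above.
KEEP = ["SALES", "QTR_ID", "MONTH_ID", "YEAR_ID", "STATUS"]


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df restricted to the analysis columns."""
    missing = [c for c in KEEP if c not in df.columns]
    if missing:
        raise KeyError(f"missing expected columns: {missing}")
    return df[KEEP].copy()


# Made-up rows standing in for the real dataset (values are illustrative).
# Column names outside KEEP are assumptions, not confirmed by the post.
raw = pd.DataFrame({
    "ORDERNUMBER": [1, 2, 3],
    "SALES": [2871.0, 2765.9, 3884.34],
    "ORDERDATE": ["2/24/2003 0:00", "5/7/2003 0:00", "7/1/2003 0:00"],
    "STATUS": ["Shipped", "Shipped", "Shipped"],
    "QTR_ID": [1, 2, 3],
    "MONTH_ID": [2, 5, 7],
    "YEAR_ID": [2003, 2003, 2003],
    "CONTACTLASTNAME": ["A", "B", "C"],
    "CONTACTFIRSTNAME": ["X", "Y", "Z"],
})

sales = clean_sales(raw)
print(sales.head())
```

Selecting the columns to keep, rather than listing every column to drop, means the step still works if the file has extra columns I have not accounted for.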
Basic Visualisation
The graph titled “Monthly Sales by Year” illustrates sales trends across different months for the years 2003, 2004, and 2005. Key observations from the graph reveal significant insights into the business’s sales performance:
1. Incomplete Data for 2005: The 2005 data is incomplete and stops abruptly after month 5 (May). This prevents a full-year analysis and leaves sales trends in the second half of 2005 unknown.
2. Growth from 2003 to 2004: Sales rise clearly from 2003 to 2004, indicating business growth. The 2004 peaks are higher, and most months in 2004 outperform the same months in 2003.
3. Promising Start for 2005 with Early Drop: Despite the incomplete data, the first months of 2005 show higher sales than the same months in previous years. However, sales dip significantly in April 2005, interrupting what would otherwise look like consistent growth.
4. Q3 Sales Increase: The third quarter shows a sharp rise in sales, most noticeably in 2004. This surge might be explained by seasonal marketing campaigns or promotions, such as those timed around back-to-school events or early holiday shopping, which tend to boost consumer spending.
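A chart like “Monthly Sales by Year” can be produced roughly as below. This is a sketch under assumptions rather than the original notebook code: the `sample` frame is made-up data, and the helper names `monthly_sales_by_year` and `plot_monthly_sales` are mine. Months with no rows for a year come out as NaN, which is why the line for an incomplete year simply stops.

```python
import pandas as pd


def monthly_sales_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Total SALES with one row per month and one column per year."""
    return df.pivot_table(
        index="MONTH_ID", columns="YEAR_ID", values="SALES", aggfunc="sum"
    )


def plot_monthly_sales(monthly: pd.DataFrame) -> None:
    """Draw one line per year; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    ax = monthly.plot(marker="o", title="Monthly Sales by Year")
    ax.set_xlabel("Month")
    ax.set_ylabel("Total sales")
    plt.show()


# Made-up rows for illustration only (not figures from the real dataset).
sample = pd.DataFrame({
    "YEAR_ID": [2003, 2003, 2003, 2004, 2004, 2005],
    "MONTH_ID": [1, 1, 2, 1, 2, 1],
    "SALES": [100.0, 50.0, 150.0, 120.0, 180.0, 200.0],
})

monthly = monthly_sales_by_year(sample)

# Last month with any data per year: a quick check for a truncated year.
last_month = sample.groupby("YEAR_ID")["MONTH_ID"].max()

print(monthly)
print(last_month)
# plot_monthly_sales(monthly)  # uncomment to draw the chart
```

The `last_month` check is a cheap way to confirm an incomplete year before reading anything into the shape of its line.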
Why did sales drop in April 2005?
The histogram titled “Sales by Status for April Across Different Years” breaks down April sales by order status for 2003, 2004, and 2005. It reveals several critical insights:
- Consistent Shipment in Early Years: In 2003 and 2004, all recorded sales were successfully shipped, indicating smooth operations and effective fulfilment of orders during these periods.
- Emergence of On-Hold Orders in 2005: A significant shift occurs in 2005, where the data shows a substantial portion of sales being put on hold, surpassing the value of shipped orders. This change marks a stark contrast from the previous years and suggests operational or logistical challenges that impacted the ability to fulfil orders.
- Impact on Business Operations: The high volume of on-hold orders in April 2005, together with the data ending in May, raises concerns about possible disruptions to the business’s operations. One speculative reading is that these disruptions led to the business closing, with the on-hold orders representing unfulfilled commitments at the time of closure. The data alone cannot confirm this.
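The April breakdown by status can be sketched like this. Again, this is a hedged illustration and not the original code: the `sample` rows are made up, the status labels "Shipped" and "On Hold" are assumed spellings, and `april_sales_by_status` is a hypothetical helper name.

```python
import pandas as pd


def april_sales_by_status(df: pd.DataFrame, month: int = 4) -> pd.DataFrame:
    """Total SALES for one month, with years as rows and statuses as columns."""
    in_month = df[df["MONTH_ID"] == month]
    return (
        in_month.groupby(["YEAR_ID", "STATUS"])["SALES"]
        .sum()
        .unstack(fill_value=0.0)  # a status absent in a year counts as 0
    )


# Made-up rows for illustration only; the May row is there to show
# that other months are excluded from the result.
sample = pd.DataFrame({
    "YEAR_ID": [2003, 2004, 2005, 2005, 2005],
    "MONTH_ID": [4, 4, 4, 4, 5],
    "STATUS": ["Shipped", "Shipped", "Shipped", "On Hold", "Shipped"],
    "SALES": [100.0, 200.0, 80.0, 120.0, 300.0],
})

by_status = april_sales_by_status(sample)
print(by_status)
# by_status.plot(kind="bar")  # grouped bars per year; needs matplotlib
```

Filling missing statuses with zero keeps every year on the same set of columns, so years where everything shipped can be compared directly with 2005.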
This introductory exploratory data analysis (EDA) lays the groundwork for understanding key patterns and fluctuations in sales performance. Through preprocessing and focused visual analysis, I have identified clear trends and raised questions about operational continuity, particularly in 2005.
Looking ahead, more detailed analysis could shed light on the operational or logistical challenges the data hints at. It might pinpoint the specific periods or products that experienced issues, giving a clearer view of how and why sales performance varied so sharply. A more granular approach could also explore the root causes behind the changes in order status, potentially leading to actionable strategies for avoiding such issues in the future. This foundational EDA is only the starting point for a deeper dive that could inform operational strategy and business decision-making.