Name: Lei Yang
The goal of this project is to make hospitality businesses more competitive by using data to support decision-making. Li (2018) argues that data-driven companies are much more likely to improve their decision-making. In the hotel business, maximizing revenue and guest satisfaction are the top priorities. This study uses hotel reservation data to extract useful insights and identify growth opportunities. We examine booking patterns, guest preferences, and cancellation rates to see whether consistent patterns hold across hotel types. We will also build a predictive model for reservation cancellations, which will help hotels improve how they handle reservations. The end goal is to improve guest experiences and streamline operations by incorporating advanced analytics and machine learning into Revenue Management Systems (RMS), a direction motivated by how much big data has changed the industry (Provost, 2013). The approach aims not only to predict future customer behavior but also to improve business performance and customer satisfaction.
The data come from a collection of hotel bookings on Kaggle spanning three years. The datasets capture room demand for two distinct types of hotels: a "City Hotel" in Lisbon and a "Resort Hotel" in the Algarve. They record various aspects of each booking for 2015, 2016, and 2017, including the date the reservation was made, the composition of the party (number of adults, children, and babies), the length of stay (weekend and weekday nights), and the number of car parking spaces required.
The ETL (Extract, Transform, Load) process begins by combining the datasets for 2015, 2016, and 2017 into a single dataframe so that the three years can be analyzed together. The transformation step computes the mean Average Daily Rate (ADR) for each year and prepares data for visualization: it filters bookings to 2016, normalizes month names so they can be ordered chronologically, and counts bookings by hotel type and month. The transformed data, namely the mean ADR per year and the 2016 monthly booking counts, is then ready to load into analytical tools for further analysis and visualization.
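The month-ordering step matters because month names sort alphabetically by default, which would place April before January. A minimal sketch of chronological ordering, using only the standard library's `calendar` module, might look like the following. The three counts are the City Hotel figures for those months from this notebook's 2016 results; the variable names are illustrative, and the sketch assumes an English locale for `calendar.month_name`.

```python
import calendar

# City Hotel booking counts for three months of 2016 (from this notebook's output),
# deliberately listed out of calendar order.
bookings = {"April": 3561, "January": 1364, "February": 2371}

# calendar.month_name is ['', 'January', ..., 'December'], so each name's
# position in the sequence is its month number (assumes an English locale).
month_number = {name: i for i, name in enumerate(calendar.month_name) if name}

alphabetical = sorted(bookings)                           # default string sort
chronological = sorted(bookings, key=month_number.get)    # calendar order

print("Alphabetical: ", alphabetical)
print("Chronological:", chronological)
```

Reindexing against an explicit list of month names, as the notebook code does, achieves the same ordering; the lookup-table approach is just one alternative.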
As part of our collaboration plan, we meet once a week on Zoom to discuss the project's direction, set clear goals, and support each other. Between meetings, we keep in touch by text message to share real-time updates on progress. We've set up a live, shared schedule in Google Docs so that everyone can see it at any time and keep track of our to-dos. We've also set up a shared GitHub repository as the hub for all of our code, which makes it easy to keep our work in sync. The goal of this approach is to help us work faster and more efficiently so that we reach our project goals on time.
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
data_2015 = pd.read_csv('hotel_bookings_2015.csv')
data_2016 = pd.read_csv('hotel_bookings_2016.csv')
data_2017 = pd.read_csv('hotel_bookings_2017.csv')
# Combining the datasets for some analyses; ignore_index gives the result a clean 0..n-1 index
combined_data = pd.concat([data_2015, data_2016, data_2017], ignore_index=True)
# b. Calculating one interesting statistic: Average Daily Rate (ADR) per year
average_adr_per_year = combined_data.groupby('arrival_date_year')['adr'].mean()
# c. Preparing data for one graph: Count of bookings by month for a selected year (2016)
# Filter data for 2016 to visualize; .copy() avoids assigning into a slice of combined_data
data_2016 = combined_data[combined_data['arrival_date_year'] == 2016].copy()
# Normalize month names (parsed with '%B') to help with ordering in the plot
data_2016['arrival_date_month_name'] = pd.to_datetime(data_2016['arrival_date_month'], format='%B').dt.month_name()
# Count the number of bookings for each hotel type per month
monthly_bookings_2016 = data_2016.groupby(['arrival_date_month_name', 'hotel']).size().unstack()
# Ordering the data by month
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
monthly_bookings_2016 = monthly_bookings_2016.reindex(months_order)
average_adr_df = average_adr_per_year.reset_index()
average_adr_df.columns = ['Year', 'Average_ADR']
average_adr_df
| | Year | Average_ADR |
|---|---|---|
| 0 | 2015 | 87.178515 |
| 1 | 2016 | 98.325863 |
| 2 | 2017 | 114.637950 |
monthly_bookings_2016
arrival_date_month_name | City Hotel | Resort Hotel |
---|---|---|
January | 1364 | 884 |
February | 2371 | 1520 |
March | 3046 | 1778 |
April | 3561 | 1867 |
May | 3676 | 1802 |
June | 3923 | 1369 |
July | 3131 | 1441 |
August | 3378 | 1685 |
September | 3871 | 1523 |
October | 4219 | 1984 |
November | 3122 | 1332 |
December | 2478 | 1382 |
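To make the table easier to read at a glance, the counts can be condensed into a few summary figures. The sketch below hard-codes the 2016 values from the table above and is only an illustration; the helper `summarize` and the variable names are hypothetical and not part of the notebook.

```python
months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# 2016 monthly booking counts, copied from the table above
city = [1364, 2371, 3046, 3561, 3676, 3923, 3131, 3378, 3871, 4219, 3122, 2478]
resort = [884, 1520, 1778, 1867, 1802, 1369, 1441, 1685, 1523, 1984, 1332, 1382]


def summarize(counts):
    """Return (total bookings, busiest month, quietest month) for one hotel type."""
    total = sum(counts)
    busiest = months[counts.index(max(counts))]
    quietest = months[counts.index(min(counts))]
    return total, busiest, quietest


city_summary = summarize(city)
resort_summary = summarize(resort)

# Share of all 2016 bookings that went to the City Hotel
city_share = city_summary[0] / (city_summary[0] + resort_summary[0])

print("City Hotel  :", city_summary)
print("Resort Hotel:", resort_summary)
print(f"City Hotel share of bookings: {city_share:.1%}")
```

In the notebook itself, the same figures could be read from `monthly_bookings_2016` directly, for example with `sum()` and `idxmax()` on the dataframe.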
The average daily rate (ADR) rose in each of the three years, from 87.18 in 2015 to 98.33 in 2016 and 114.64 in 2017.
This upward trend suggests that room prices have been increasing year over year, which may reflect higher demand, improved hotel offerings, or inflationary pressures in the hospitality sector.
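The size of the increase can be quantified as a year-over-year percentage change. The sketch below uses the three yearly averages reported in the ADR table; the names `average_adr` and `growth_pct` are illustrative and not part of the notebook.

```python
# Mean ADR per year, copied from the table produced earlier in the notebook
average_adr = {2015: 87.178515, 2016: 98.325863, 2017: 114.637950}

years = sorted(average_adr)

# Percentage change of each year relative to the previous one
growth_pct = {
    year: (average_adr[year] - average_adr[prev]) / average_adr[prev] * 100
    for prev, year in zip(years, years[1:])
}

for year, pct in growth_pct.items():
    print(f"{year}: {pct:+.2f}% vs. {year - 1}")
```

With these values the change works out to roughly 12.8% for 2016 and 16.6% for 2017, so the rate of increase itself grew over the period. On the notebook's `average_adr_per_year` series, `pct_change()` would give the same ratios.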
For the graph portion, we have prepared the data for the monthly booking counts for each hotel type in 2016.
import matplotlib.pyplot as plt
import seaborn as sns
# Setting the aesthetic style of the plots
sns.set_style("whitegrid")
# Plotting the monthly bookings for each hotel type in 2016
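# Create one figure and draw on its axes; calling plt.figure() first would leave an empty extra figure
fig, ax = plt.subplots(figsize=(14, 7))
monthly_bookings_2016.plot(kind='bar', ax=ax, width=0.8)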
plt.title('Monthly Bookings for Each Hotel Type in 2016', fontsize=16)
plt.xlabel('Month', fontsize=14)
plt.ylabel('Number of Bookings', fontsize=14)
plt.xticks(rotation=45, fontsize=12)
plt.yticks(fontsize=12)
plt.legend(title='Hotel Type', fontsize=12)
plt.tight_layout()
plt.show()
[Figure: Monthly Bookings for Each Hotel Type in 2016. Bar chart; x-axis: Month, y-axis: Number of Bookings, legend: Hotel Type.]
From the graph, we can observe several trends: