In this article, we will design and implement an online hotel reservation booking system. A similar solution with minor changes could also be used for flight reservations, concerts and events, or reserving unique Airbnb/Vrbo stays. We will start with user requirements and functional edge cases to consider. We will then dig into the backend services and APIs, persistence, transactionality, user and backend flows, the two frontend web applications for users and hotel managers respectively, and conclude with performance and scaling optimizations.

Contents:

  1. Requirements
  2. Backend Services, APIs, and Databases
  3. Frontend Web Applications (Guest and Admin)
  4. User and Backend Flows
  5. Optimizing Performance and Scaling

Requirements

All reservation systems, whether for hotels, flights, or even event tickets, have several features in common. Users can search for inventory, view details, create a reservation, and review past reservations. While searching, users should only see inventory that is available for the dates they select. Once a reservation is made, the available inventory should be reduced. Optionally, a hold may be placed on the inventory during checkout to prevent another user from consuming it in parallel; no one likes to be in the process of checking out and have their inventory taken. If a user fails to check out within the allotted time, the inventory should become available again.

Turning our attention to hotel booking systems in particular, hotels typically group individual rooms together under a room-type and overbook some percentage accounting for cancellations. We will design this to be configurable by hotel management.

User Requirements:

  1. Search rooms by hotel property, date, room type, and availability
  2. View hotel property details
  3. View room type details
  4. Create room reservation
  5. Pay for a reservation
  6. View upcoming and recent reservation details
  7. Dynamic prices based on available inventory, increasing as inventory becomes low; extensibility for future integration with machine learning (ML) pricing models which account for parameters such as seasonality
  8. Support a configurable percentage of overbooking of rooms assuming cancellations
  9. Support 10,000 hotels, 10 million rooms, and 70% average daily reservation fill
  10. Admin portal for adding/removing/updating room types

A key element in this design is the need for mutual exclusion when a guest is reserving a room, as well as preventing double-bookings from a single user. We will also need to consider database access patterns and indexes which are performant. The client-facing frontend web application and search will need lower latency to maintain a great guest experience; different pages and functionality will have varying levels of traffic volume and acceptable latency. For example, the search and listing pages will have high throughput and low latency requirements, whereas the reservation-booking transaction will have far lower throughput and more acceptance for a few seconds of latency while making all the necessary constraint checks.

Additional topics like payment processing, internationalization/translations, and multi-region failover are covered in separate articles.

Backend Services, APIs, and Databases

We design our services following Domain Driven Design principles and cellular architecture patterns using the Hotel domain and cell. In the future, this domain/cell would be broken into multiple independent cells as the team(s) scale. The Hotel Gateway Service is the primary service through which external requests will flow after going through an upstream Edge Router. The Edge Router will protect the underlying services from DDoS attacks and provide user authentication. The Hotel Gateway Service authorizes users based on their roles and then coordinates calls to the backend internal services. We design five internal micro-services within the Hotel cell: (1) the Hotel Service, (2) the Hotel Pricing Service, (3) the Hotel Payment Service, (4) the Hotel Guest Service, and (5) the Hotel Reporting Service. These are stateless horizontally scalable services with primary data externalized within the persistence layer. Inter-service requests leverage remote procedure calls (RPC) and validate requests are authorized using JSON Web Tokens (JWTs). Each API endpoint is versioned independently for the greatest flexibility as schemas evolve with changing business requirements. Compute infrastructure leverages containers for quick portable CI/CD deployments with cross-environment parity.

Services:

  1. [External] Edge Router - network firewall, DDoS protection, block list, authentication
  2. [External] Hotel Gateway Service - external REST APIs, top-level permissions, calls to internal domain services, termination of public-certificate TLS with traffic to internal services re-encrypted using private certificates
  3. [Internal] Hotel Service - hotel property details, room type details, reservations, and inventory
  4. [Internal] Hotel Pricing Service - dynamic pricing and rates for rooms
  5. [Internal] Hotel Payment Service - payment processing (details covered in subsequent article)
  6. [Internal] Hotel Guest Service - guest details
  7. [Internal] Hotel Reporting Service - collects reservation events and generates reports, maintains historical reservation data

Architecture Diagram

The scale of this system is sizable but not massive. We only need to support 10 million hotel rooms at 70% fill, so in the worst case, where every reserved room is booked on the same day and the requests arrive evenly throughout it, that is approximately 81 transactions per second (10m * 0.7 / 86,400 seconds per day). Let’s look at the access patterns we need to support and estimate call volume, erring on the high side.

Access patterns:

  1. Search rooms by hotel property, date, room type, and availability (500 TPS, read)
  2. View hotel property details (500 TPS, read)
  3. View room type details (500 TPS, read)
  4. View upcoming and past reservations (150 TPS, read)
  5. Create a reservation (81 TPS, write)
  6. Pay for a reservation (81 TPS, write)
  7. Update or cancel a reservation (40 TPS, write)

The access pattern is clearly read-heavy (1,650 TPS) with far fewer writes (202 TPS), and the data can naturally be modeled relationally. We also require atomic transactions when creating reservations. The need for ACID properties and the nature and size of this data fit a relational database well. We will go with MySQL on Amazon Aurora, though other relational databases such as PostgreSQL could also be considered.
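The arithmetic behind these estimates can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the figures are the ones quoted in this article, and the dictionary keys are shorthand labels chosen for illustration.

```python
# Figures quoted in the article; nothing here is measured.
ROOMS = 10_000_000
DAILY_FILL = 0.70
SECONDS_PER_DAY = 24 * 60 * 60

# Average reservation rate if one day's bookings are spread evenly.
reservation_tps = ROOMS * DAILY_FILL / SECONDS_PER_DAY

# (kind, estimated TPS) per access pattern, taken from the access pattern list.
ACCESS_PATTERNS = {
    "search rooms": ("read", 500),
    "view hotel property": ("read", 500),
    "view room type": ("read", 500),
    "view reservations": ("read", 150),
    "create reservation": ("write", 81),
    "pay for reservation": ("write", 81),
    "update or cancel reservation": ("write", 40),
}


def total_tps(kind: str) -> int:
    """Sum the estimated TPS of every pattern of the given kind."""
    return sum(tps for k, tps in ACCESS_PATTERNS.values() if k == kind)


read_tps = total_tps("read")
write_tps = total_tps("write")

print(f"reservations: ~{reservation_tps:.0f} TPS")
print(f"reads: {read_tps} TPS, writes: {write_tps} TPS")
print(f"read/write ratio: ~{read_tps / write_tps:.1f}:1")
```

Real traffic is rarely uniform, so peak-hour load would sit well above these averages; the estimates are useful mainly for choosing the class of database and the order of magnitude to plan for.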

Each of our micro-services is designed to be stateless for easy horizontal scaling, so the database will likely be the bottleneck as the system scales. We will design three scaling improvements: (1) sharding the database by Hotel-Id, (2) implementing TTLs (time-to-live) to remove old reservation data after 7 days, with the history kept in archived cold storage within the Reporting Service, and (3) leveraging read-replicas for data which can afford eventual consistency (e.g. not when creating reservations). In a later section we discuss in-memory caching as well.
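As an illustration of sharding by Hotel-Id, the sketch below maps a hotel to a shard index with a stable hash. The function name and the choice of SHA-256 with a simple modulo are assumptions made for this example, not something the design prescribes.

```python
import hashlib


def shard_for_hotel(hotel_id: str, shard_count: int) -> int:
    """Map a hotel id to a shard index in the range [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # A cryptographic digest is used only because it is stable across
    # processes; Python's built-in hash() is randomized per process for str.
    digest = hashlib.sha256(hotel_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for hotel in ("hotel-a", "hotel-b", "hotel-c"):
        print(hotel, "-> shard", shard_for_hotel(hotel, 8))
```

Because every inventory and reservation row carries the HotelId, all rows touched by a single booking land on the same shard, which keeps the reservation transaction local to one database. A plain modulo does mean that changing the shard count moves most hotels; consistent hashing or a lookup table are common ways to soften that.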

Hotel Gateway Service

  • APIs
    • Search
      • GET:/hotel/search/v1?{propertyId}&{dateStart}&{dateEnd}&{roomType} - search for rooms
    • Hotel Property
      • POST:/hotel/property/v1 - create new hotel property location (admin)
      • GET:/hotel/property/v1/{uuid} - view hotel property details
      • PUT:/hotel/property/v1/{uuid} - update hotel property details (admin)
    • Hotel Room Type
      • POST:/hotel/room/v1 - create new room type (admin)
      • GET:/hotel/room/v1/{uuid} - view room type details
      • PUT:/hotel/room/v1/{uuid} - update room type details (admin)
    • Hotel Reservation
      • POST:/hotel/reservation/v1 - create a new reservation
      • GET:/hotel/reservation/v1 - view a list of reservations
      • GET:/hotel/reservation/v1/{uuid} - view a reservation
      • PUT:/hotel/reservation/v1/{uuid} - update or cancel a reservation
    • Hotel Payment (details covered in a subsequent article)
      • POST:/hotel/payment/v1 - submit payment for a reservation
      • GET:/hotel/payment/v1/{uuid} - get payment details
    • Hotel Guest
      • POST:/hotel/guest/v1 - create new guest profile
      • GET:/hotel/guest/v1/{uuid} - get guest details
      • PUT:/hotel/guest/v1/{uuid} - update guest details
    • Hotel Reporting
      • POST:/hotel/report/v1 - generate a new report async (admin)
      • GET:/hotel/report/v1 - view list of generated reports (admin)
      • GET:/hotel/report/v1/{uuid} - view a generated report or its status (admin)

Hotel Service

  • APIs
    • GetAvailableRooms
    • GetHotelDetails, UpdateHotelDetails
    • GetHotelRoomTypeDetails, UpdateHotelRoomTypeDetails
    • GetHotelReservations, GetHotelReservation, UpdateHotelReservation
  • See table schema in the Appendix below

Hotel Guest Service

  • APIs
    • GetGuestDetails, UpdateGuestDetails, CreateGuestProfile
  • See table schema in the Appendix below

Hotel Pricing Service

  • APIs
    • GetPrice
  • See table schema in the Appendix below

Hotel Payment Service

  • APIs
    • SubmitPayment
    • GetPaymentDetails
  • Payment processing details are covered in a subsequent article

Hotel Reporting Service

  • APIs
    • GetReport, CreateReport, UpdateReport

Hotel Tables

Database tables and schemas are detailed further below in the appendix.

Frontend Web Applications (Guest and Admin)

We will design two web applications for this solution. The primary, guest-facing web application will be server-side rendered for optimal search engine indexing and SEO. A backend server will serve the webpages, fronted by a CDN such as Cloudflare, Fastly, Akamai, or Amazon CloudFront. We will use React for frontend interactivity, coordinating the async AJAX requests to the backend Hotel Gateway Service, and maintaining the frontend state. The admin web application will have far lower traffic and has no SEO needs. It will not need a CDN and can be a purely frontend application stored in S3 following the Single-Page-Application (SPA) model. It too will use React.

For those curious about internationalization and translations see: Software Architecture - Translations Service & Clients

User and Backend Flows

Workflow - Searching and Viewing Hotel Room Details

The first step for users looking to book a hotel room is viewing the hotel's primary website landing page and searching for available rooms by specific hotel location, room type, and date. The search request will flow through the Edge Router and Hotel Gateway Service to the Hotel Service, where the search is executed. Search is often implemented with Elasticsearch atop Lucene, but given the basic search capabilities and the modest dataset we are supporting, we will start by simply querying the database using carefully designed indexes. The database query will leverage the RoomType and RoomInventory tables. Only inventory below the overbooking threshold will be shown. The search results are returned to the user and populated in the user-facing web frontend. Further below, we enhance this with an in-memory cache for all eventually-consistent reads including search, greatly reducing read load on these tables.
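To make the index-backed search concrete, here is a minimal sketch of the availability query. It runs on SQLite purely so it is self-contained; the design itself targets MySQL. The column names follow the RoomInventory schema in the appendix, while the function name, the integer `overbook_pct` parameter, and the interpretation that capacity equals AvailableCount + ReservedCount are assumptions made for this example.

```python
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE room_inventory (
    hotel_id        TEXT    NOT NULL,
    room_type_id    TEXT    NOT NULL,
    date            TEXT    NOT NULL,   -- ISO day, e.g. 2020-01-02
    available_count INTEGER NOT NULL,
    reserved_count  INTEGER NOT NULL,
    PRIMARY KEY (hotel_id, room_type_id, date)
)
"""

# A room type qualifies only if every night of the stay has an inventory row
# and one more reservation still fits under capacity * (1 + overbook_pct/100).
# Assumption: capacity = available_count + reserved_count.
SEARCH_SQL = """
SELECT room_type_id
FROM room_inventory
WHERE hotel_id = :hotel_id AND date >= :start AND date < :end
GROUP BY room_type_id
HAVING COUNT(*) = :nights
   AND MIN((available_count + reserved_count) * (100 + :overbook_pct)
           - (reserved_count + 1) * 100) >= 0
ORDER BY room_type_id
"""


def search_room_types(conn, hotel_id, start, end, overbook_pct):
    """Return room types bookable for every night in [start, end)."""
    nights = (date.fromisoformat(end) - date.fromisoformat(start)).days
    params = {
        "hotel_id": hotel_id,
        "start": start,
        "end": end,
        "nights": nights,
        "overbook_pct": overbook_pct,
    }
    return [row[0] for row in conn.execute(SEARCH_SQL, params)]
```

The composite primary key doubles as the search index: the query filters on its leading column (HotelId) and reads the matching dates in one pass. Integer arithmetic is used for the overbooking check so that fractional rooms never count as bookable.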

Search Workflow

Workflow - Booking a Reservation

The most interesting workflow of this solution is the reservation workflow. When a user selects a room and begins the reservation process, we want to mark that room inventory as no longer available. Upon beginning the reservation process, we will deduct one unit of inventory in the database, and if inventory falls to zero or lower, allowing for up to X% overbooking, the room type will be marked unavailable. As part of this, we will create a reservation record with the status of “PENDING”.

When creating the reservation, we must protect against concurrent requests overbooking a room. We will consider four options: (1) pessimistic locking, (2) optimistic locking, (3) database table constraints, and (4) update expressions.

1) Pessimistic Locking

Pessimistic locking, sometimes simply referred to as locking, locks the database record for the duration of the transaction. If another request tries to update a locked record, it must wait until the lock is released. One must be careful when locking multiple records not to create a deadlock, where two threads or processes are each stuck waiting for the other to release its lock. In this scenario, (1) a lock would be obtained on the inventory table record, (2) inventory would be checked, (3) the reservation record would be created and inventory reduced, and (4) the lock would be released. During this transaction any other attempt to modify the inventory table record would be blocked, which limits scaling.
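The four steps above can be sketched as follows. SQLite is used only to keep the example self-contained: its `BEGIN IMMEDIATE` takes a database-wide write lock, whereas on MySQL one would typically lock just the inventory row with `SELECT ... FOR UPDATE`. The table layout is a simplified assumption (single-night stay, no overbooking), and the function names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE room_inventory (hotel_id TEXT, room_type_id TEXT, "
        "date TEXT, available_count INTEGER, reserved_count INTEGER, "
        "PRIMARY KEY (hotel_id, room_type_id, date))"
    )
    conn.execute(
        "CREATE TABLE reservation (reservation_id TEXT PRIMARY KEY, "
        "hotel_id TEXT, room_type_id TEXT, date TEXT, status TEXT)"
    )
    return conn


def reserve_with_lock(conn, reservation_id, hotel_id, room_type_id, day) -> bool:
    """Return True if a room was reserved, False if none was available."""
    key = (hotel_id, room_type_id, day)
    conn.execute("BEGIN IMMEDIATE")  # (1) take the write lock before reading
    try:
        row = conn.execute(
            "SELECT available_count FROM room_inventory "
            "WHERE hotel_id = ? AND room_type_id = ? AND date = ?",
            key,
        ).fetchone()
        if row is None or row[0] <= 0:  # (2) check inventory under the lock
            conn.execute("ROLLBACK")
            return False
        conn.execute(  # (3) create the reservation and reduce inventory
            "INSERT INTO reservation VALUES (?, ?, ?, ?, 'PENDING')",
            (reservation_id, *key),
        )
        conn.execute(
            "UPDATE room_inventory SET available_count = available_count - 1, "
            "reserved_count = reserved_count + 1 "
            "WHERE hotel_id = ? AND room_type_id = ? AND date = ?",
            key,
        )
        conn.execute("COMMIT")  # (4) committing releases the lock
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Every other writer waits between steps (1) and (4), which is exactly the throughput cost described above.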

2) Optimistic Locking

Optimistic locking checks at the end of the transaction and does not commit if a conflict is detected. When a conflict is detected the entire transaction must then be retried. This can be more or less performant than pessimistic locking depending on the scenario. Optimistic locking becomes unsuitable when contention is high as transactions have to be repeated over-and-over again, but is great when contention is low as database performance is higher without locking records. This can be implemented with version numbers or timestamps, though versioning is typically preferred to avoid issues with clock skew between servers.
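A version-number variant might look like the sketch below. The `version` column is an assumption added for this example (it does not appear in the appendix schema), the stay is simplified to a single night without overbooking, and SQLite stands in for MySQL so the code is self-contained.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE room_inventory (hotel_id TEXT, room_type_id TEXT, "
        "date TEXT, available_count INTEGER, reserved_count INTEGER, "
        "version INTEGER NOT NULL DEFAULT 0, "
        "PRIMARY KEY (hotel_id, room_type_id, date))"
    )
    conn.execute(
        "CREATE TABLE reservation (reservation_id TEXT PRIMARY KEY, "
        "hotel_id TEXT, room_type_id TEXT, date TEXT, status TEXT)"
    )
    return conn


def read_inventory(conn, key):
    """Return (available_count, version) for the row, or None if absent."""
    return conn.execute(
        "SELECT available_count, version FROM room_inventory "
        "WHERE hotel_id = ? AND room_type_id = ? AND date = ?",
        key,
    ).fetchone()


def try_reserve(conn, reservation_id, key, expected_version) -> bool:
    """Commit only if the row still carries the version we read earlier."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE room_inventory SET available_count = available_count - 1, "
            "reserved_count = reserved_count + 1, version = version + 1 "
            "WHERE hotel_id = ? AND room_type_id = ? AND date = ? AND version = ?",
            (*key, expected_version),
        )
        if cur.rowcount != 1:
            return False  # another writer changed the row first
        conn.execute(
            "INSERT INTO reservation VALUES (?, ?, ?, ?, 'PENDING')",
            (reservation_id, *key),
        )
        return True


def reserve_optimistic(conn, reservation_id, key, max_attempts=3) -> bool:
    """Read, then compare-and-set; retry from the start on a conflict."""
    for _ in range(max_attempts):
        row = read_inventory(conn, key)
        if row is None or row[0] <= 0:
            return False
        if try_reserve(conn, reservation_id, key, row[1]):
            return True
    raise RuntimeError("gave up after repeated version conflicts")
```

No lock is held between the read and the write; the `version = ?` predicate is what detects a conflicting update, and the retry cap (an arbitrary choice here) keeps a hot row from causing unbounded retries.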

3) Database Table Constraints

Database table constraints are implemented when creating the database tables. The database will reject updates which break table constraints; it is a little like optimistic locking. Support for table constraints varies based on the specific database.
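As a small illustration, the sketch below declares a CHECK constraint so the database itself refuses to let the available count go negative. It uses SQLite to stay self-contained, ignores overbooking for simplicity, and the function name is hypothetical; constraint syntax and enforcement should be verified for the database actually deployed.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE room_inventory (
            hotel_id        TEXT    NOT NULL,
            room_type_id    TEXT    NOT NULL,
            date            TEXT    NOT NULL,
            available_count INTEGER NOT NULL CHECK (available_count >= 0),
            reserved_count  INTEGER NOT NULL CHECK (reserved_count >= 0),
            PRIMARY KEY (hotel_id, room_type_id, date)
        )
        """
    )
    return conn


def take_one_room(conn, key) -> bool:
    """Decrement inventory; the CHECK constraint rejects overselling."""
    try:
        with conn:  # rolls back automatically if the constraint fires
            cur = conn.execute(
                "UPDATE room_inventory "
                "SET available_count = available_count - 1, "
                "    reserved_count = reserved_count + 1 "
                "WHERE hotel_id = ? AND room_type_id = ? AND date = ?",
                key,
            )
            return cur.rowcount == 1
    except sqlite3.IntegrityError:
        return False  # the update would have broken the constraint
```

The application never reads the count first; it attempts the write and treats a constraint violation as "sold out".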

4) Update Expressions

Many databases offer the ability to perform atomic read-and-update operations on a single record as a single operation. For example, DynamoDB using update expressions allows one to: add an item to a list, remove an item from a list, decrement a value, or increment a value without reading the record first. This eliminates certain cases where one may have relied on a locking mechanism, but obviously does not address all such cases including when updating multiple records at once. For this case, we need to update multiple records as part of one atomic update and so update expressions won’t satisfy our requirement.
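The same idea can be shown with a relational analog: a single conditional UPDATE that decrements the value without a prior read. This is not DynamoDB's update expression API, only a SQL stand-in run on SQLite for illustration, with a hypothetical function name and a trimmed-down table.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE room_inventory (hotel_id TEXT, room_type_id TEXT, "
        "date TEXT, available_count INTEGER, "
        "PRIMARY KEY (hotel_id, room_type_id, date))"
    )
    return conn


def decrement_if_available(conn, key) -> bool:
    """Atomically take one room from a single record, with no prior read."""
    with conn:
        cur = conn.execute(
            "UPDATE room_inventory SET available_count = available_count - 1 "
            "WHERE hotel_id = ? AND room_type_id = ? AND date = ? "
            "AND available_count > 0",
            key,
        )
    # rowcount is 1 only when the guard in the WHERE clause held.
    return cur.rowcount == 1
```

This is safe for one record on its own, but it says nothing about the reservation row that must be written alongside it, which is the multi-record limitation noted above.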

We will use optimistic locking with versioning here, as contention is not expected to be high and it will scale better. In practice, this requires the Hotel Service to (1) check the inventory and then (2) create the reservation and reduce inventory, provided the version number of the inventory record has not changed. If the version number changed after the record was read, the update fails and the service retries from the start. Once the reservation is created, the user pays, which updates the reservation record status and completes the booking process. A background job will check pending reservations and clean them up if the time has expired with no payment received. Payment processing itself will be covered in a subsequent article.
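The background cleanup job could be sketched roughly as below. The `created_at` column (epoch seconds), the 120-second hold, and the function name are assumptions for illustration; SQLite is used only to keep the example runnable on its own.

```python
import sqlite3

HOLD_SECONDS = 120  # assumed payment window


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE room_inventory (hotel_id TEXT, room_type_id TEXT, "
        "date TEXT, available_count INTEGER, reserved_count INTEGER, "
        "PRIMARY KEY (hotel_id, room_type_id, date))"
    )
    conn.execute(
        "CREATE TABLE reservation (reservation_id TEXT PRIMARY KEY, "
        "hotel_id TEXT, room_type_id TEXT, start_date TEXT, end_date TEXT, "
        "status TEXT, created_at INTEGER)"
    )
    return conn


def expire_pending(conn, now: int, hold_seconds: int = HOLD_SECONDS) -> int:
    """Cancel unpaid reservations past the hold and return their rooms."""
    cutoff = now - hold_seconds
    released = 0
    with conn:  # cancel and restore inventory in one transaction
        stale = conn.execute(
            "SELECT reservation_id, hotel_id, room_type_id, start_date, end_date "
            "FROM reservation WHERE status = 'PENDING' AND created_at <= ?",
            (cutoff,),
        ).fetchall()
        for rid, hotel_id, room_type_id, start, end in stale:
            cur = conn.execute(
                "UPDATE reservation SET status = 'CANCELED' "
                "WHERE reservation_id = ? AND status = 'PENDING'",
                (rid,),
            )
            if cur.rowcount != 1:
                continue  # paid or canceled in the meantime
            conn.execute(
                "UPDATE room_inventory "
                "SET available_count = available_count + 1, "
                "    reserved_count = reserved_count - 1 "
                "WHERE hotel_id = ? AND room_type_id = ? "
                "AND date >= ? AND date < ?",
                (hotel_id, room_type_id, start, end),
            )
            released += 1
    return released
```

Passing `now` in as a parameter keeps the job easy to test; a scheduler would call it periodically with the current time.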

We also need to guard against a single user double-booking by mistakenly sending the same reservation request multiple times, for example by clicking the “reserve” button twice. To address this, the frontend will pass in a reservation-id, a UUID identifying that specific user’s reservation request. If the request is resubmitted, it will contain the same reservation-id and be rejected with an HTTP 409 (Conflict) status code. This provides idempotency, protecting against double-bookings.
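One simple way to enforce this is to make the client-supplied reservation-id the primary key, so that a repeated request collides with the first. The sketch below assumes that approach; the function names and the "created"/"duplicate" return values are illustrative, and the gateway would translate the duplicate outcome into the rejection response.

```python
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservation (reservation_id TEXT PRIMARY KEY, "
        "user_id TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def new_reservation_id() -> str:
    """Generated once by the frontend and reused on every retry."""
    return str(uuid.uuid4())


def create_reservation(conn, reservation_id: str, user_id: str) -> str:
    """Insert the reservation once; report a repeat of the same id."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO reservation VALUES (?, ?, 'PENDING')",
                (reservation_id, user_id),
            )
        return "created"
    except sqlite3.IntegrityError:
        # Primary-key collision: this request was already processed.
        return "duplicate"
```

The key point is that the id is minted before the first submit, so a double click or a network retry carries the same value and cannot create a second row.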

After the initial reservation has been created, the user has two minutes to complete payment. If they do not, the reservation is marked with the status of “CANCELED” and the user must restart the process. Once paid, the reservation moves to the “PAID” status. Other statuses include: “CHECKED_IN” once the guest checks in and has their room assigned, “COMPLETED” after the guest checks out, and “REFUNDED” if the guest is granted a refund after already paying. These states can be thought of as a finite state machine.
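A minimal sketch of that state machine follows. The status names come from the Reservation table; the set of allowed transitions is an assumption inferred from the prose (for example, that a paid reservation can still be canceled or refunded), not a definitive specification.

```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    PAID = "PAID"
    CHECKED_IN = "CHECKED_IN"
    COMPLETED = "COMPLETED"
    CANCELED = "CANCELED"
    REFUNDED = "REFUNDED"


# Assumed transition table; terminal states map to an empty set.
ALLOWED = {
    Status.PENDING: {Status.PAID, Status.CANCELED},
    Status.PAID: {Status.CHECKED_IN, Status.CANCELED, Status.REFUNDED},
    Status.CHECKED_IN: {Status.COMPLETED},
    Status.COMPLETED: set(),
    Status.CANCELED: set(),
    Status.REFUNDED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


if __name__ == "__main__":
    status = Status.PENDING
    for step in (Status.PAID, Status.CHECKED_IN, Status.COMPLETED):
        status = transition(status, step)
        print(status.name)
```

Centralizing the table this way lets the Hotel Service reject out-of-order updates, such as paying for a reservation that has already expired.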

Booking Workflow

Workflow - Viewing or Canceling a Reservation

Viewing and canceling reservations is rather simple. The guest will view the reservation and then click cancel. Both requests flow through the Edge Router and Hotel Gateway Service to the Hotel Service.

Cancel Workflow

Workflow - Admin Report Generation

Production systems often require a number of asynchronous backend processing jobs to generate analytics, dashboards, and reports. Our dedicated Hotel Reporting Service is responsible for managing these backend processing jobs and generating reports. A map-reduce cluster can be used here for large-scale data processing.

Reporting Workflow

Closing Thoughts, Optimizing Performance and Scaling

Though this design is suitable, there are further enhancements that can be made. The first is implementing caches for search and room type details. Given these requests will see much higher volume, we will want to reduce load on the database and ensure low latencies. These requests do not require strong consistency and can rely on eventual consistency.

Therefore, in addition to the relational database we will leverage an in-memory distributed cache to improve performance. Redis will work great here. For inventory data, the key will be the HotelId + RoomTypeId + Date, whereas the value will be the available number of rooms. This can cause a brief data inconsistency between the relational database record-of-truth and the cache. This is acceptable as the database is always used as the source-of-truth when creating reservations, therefore in the worst-case a guest sees a room that is no longer available when they attempt to create the reservation.
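The read path can be sketched as a cache-aside lookup. To stay self-contained, the example uses a small in-process dictionary with a TTL as a stand-in for Redis (which provides key expiry natively); the key format, class name, and TTL value are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def inventory_key(hotel_id: str, room_type_id: str, day: str) -> str:
    """Build the cache key from HotelId + RoomTypeId + Date."""
    return f"inventory:{hotel_id}:{room_type_id}:{day}"


class TtlCache:
    """In-process stand-in for a distributed cache with expiring keys."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, int]] = {}

    def get(self, key: str) -> Optional[int]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: int) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def available_rooms(cache: TtlCache, key: str,
                    load_from_db: Callable[[], int]) -> int:
    """Cache-aside read: serve from cache, fall back to the database."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db()
    cache.set(key, value)
    return value
```

A shorter TTL narrows the window in which search results can show a room that has just sold out, at the cost of more database reads; the reservation path bypasses the cache entirely and always checks the database.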

As the teams and services scale, the Hotel cell would be split into multiple cells, and services like the Hotel Service would be broken into multiple microservices. This may require distributed transactions to ensure strong consistency between different microservices and their databases. We avoid this for now, for simplicity's sake, until the solution reaches a scale where it is deemed necessary. This concludes the system design for a hotel reservation booking solution.

Now to book that next hotel trip!

Appendix - Database Table Schemas

Hotel Table

  • HotelId (primary key) - uuid
  • Name - string
  • Address - string

RoomType Table

  • HotelId (primary key) - uuid
  • RoomTypeId (primary key) - uuid
  • BedCount - integer
  • Name - string

RoomInventory Table

  • HotelId (primary key) - uuid
  • RoomTypeId (primary key) - uuid
  • Date (primary key) - date
  • AvailableCount - integer
  • ReservedCount - integer

RoomInventory Table Example:

hotel_id  room_type_id  date        available_count  reserved_count
00001     100           2020-01-02  782              162
00001     100           2020-01-03  601              201
00001     101           2020-01-02  9                22
00002     100           2020-01-02  73               0

Reservation Table

  • ReservationId (primary key) - uuid
  • HotelId - uuid
  • RoomTypeId - uuid
  • StartDate - date
  • EndDate - date
  • UserId - uuid
  • RoomNumber - integer
  • Status - enum [PENDING, PAID, CHECKED_IN, COMPLETED, CANCELED, REFUNDED]
  • Secondary Indexes:
    • UserId
    • HotelId + RoomTypeId

Reservation Table Example:

reservation_id  hotel_id  room_type_id  start_date  end_date    user_id  room_number  status
10000001        100       243           2020-01-02  2020-01-04  1000000  123          CHECKED_IN
20000002        100       187           2020-01-02  2020-01-05  1002000  nil          PAID
30000003        101       001           2020-01-03  2020-01-04  2000000  nil          PENDING
80000004        100       082           2020-01-04  2020-01-06  3000000  nil          PENDING

Hotel Guest

  • UserId (primary key) - uuid
  • FullName - string
  • Address - string
  • Email - string
  • Phone - string

Hotel Pricing

  • HotelId (primary key) - uuid
  • RoomTypeId (primary key) - uuid
  • Date (primary key) - date
  • Price - integer