Scraping and Analyzing British Airways Customer Reviews
Tech Stack: Python, BeautifulSoup, requests, pandas, matplotlib, seaborn, NLTK (VADER), WordCloud
This project scraped 3,877 customer reviews of British Airways (BA) from the Skytrax website and analyzed them for insights into customer satisfaction, service quality, and sentiment.
Problem Statement
British Airways (BA) is the flag carrier of the United Kingdom, operating daily international and domestic flights. With customer satisfaction being central to its brand, BA relies on customer feedback to inform business strategy and service improvements. It receives thousands of reviews across third-party platforms like Skytrax. These reviews contain valuable but unstructured data on customer experiences. The challenge is to extract, clean, and analyze this data to uncover actionable insights for decision-makers at BA.
Overview
- Goal: Scrape and analyze customer reviews to evaluate satisfaction and highlight key areas of feedback
- Data: 3,877 reviews scraped from Skytrax, including ratings, travel context, and full-text reviews
- Tools: BeautifulSoup for scraping, pandas for cleaning, NLTK VADER for sentiment analysis, matplotlib/seaborn for visualization
- Outcome: Identified satisfaction trends, category ratings, traveler preferences, and sentiment distribution
Approach
Data Collection: Used BeautifulSoup and requests to scrape 50 pages of British Airways reviews from Skytrax, extracting review text, country, cabin class, route, traveler type, category ratings, and recommendation status.
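A minimal sketch of this collection step is shown here. The URL pattern, the `pagesize` query parameter, and the `text_content` class name are assumptions about Skytrax's markup, not details confirmed by this project, and `parse_reviews` and `scrape_reviews` are hypothetical helper names. The sketch extracts only the review text; the other fields would need their own selectors.

```python
import requests
from bs4 import BeautifulSoup

# Assumed base URL for the BA review listing; verify against the live site.
BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"


def parse_reviews(html: str) -> list[str]:
    """Return the review body text found in one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # "text_content" is an assumed class name for the review body container.
    blocks = soup.find_all("div", class_="text_content")
    return [block.get_text(" ", strip=True) for block in blocks]


def scrape_reviews(pages: int = 50, page_size: int = 100) -> list[str]:
    """Fetch `pages` listing pages and collect every review body."""
    reviews: list[str] = []
    for page in range(1, pages + 1):
        response = requests.get(
            f"{BASE_URL}/page/{page}/",
            # Assumed query parameters controlling sort order and page size.
            params={"sortby": "post_date:Desc", "pagesize": page_size},
            timeout=30,
        )
        response.raise_for_status()
        reviews.extend(parse_reviews(response.text))
    return reviews
```

Keeping the parsing in its own function means it can be checked against a saved HTML page without hitting the network.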
Data Cleaning: Handled missing values with median imputation for numeric ratings and placeholder categories for categorical fields. Dates were parsed into datetime values, and review text was cleaned using regular expressions.
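The cleaning step might look roughly like the following sketch. The column names (`seat_comfort`, `route`, `date`, `review`) and the "Trip Verified |" prefix pattern are illustrative assumptions, as are the helper names `clean_text` and `clean_reviews`.

```python
import re

import pandas as pd


def clean_text(text: str) -> str:
    """Strip an assumed verification prefix and collapse whitespace."""
    # Assumption: some reviews start with "Trip Verified |" or "Not Verified |".
    text = re.sub(r"^\W*(Trip Verified|Not Verified)\s*\|\s*", "", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the scraped reviews (hypothetical columns)."""
    out = df.copy()
    # Median imputation for a numeric category rating.
    out["seat_comfort"] = out["seat_comfort"].fillna(out["seat_comfort"].median())
    # Placeholder category for a missing categorical field.
    out["route"] = out["route"].fillna("Unknown")
    # Unparseable or missing dates become NaT instead of raising.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out["review"] = out["review"].fillna("").map(clean_text)
    return out
```

The text cleaning is deliberately light: VADER uses punctuation and capitalization as intensity cues, so lowercasing or stripping punctuation before scoring would discard signal.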
Exploratory Data Analysis: Visualized rating distributions, passenger recommendations, cabin-level satisfaction, and cross-country comparisons. Reviewed category-wise average scores (e.g., food, WiFi, ground service).
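As an illustration of the cabin-level and cross-country comparisons, a grouped summary like the one below could feed the charts. The column names `cabin` and `overall_rating` and the helper `mean_rating_by` are assumptions made for this sketch.

```python
import pandas as pd


def mean_rating_by(
    df: pd.DataFrame, group_col: str, rating_col: str = "overall_rating"
) -> pd.DataFrame:
    """Average rating and review count per group, highest average first."""
    summary = df.groupby(group_col)[rating_col].agg(["mean", "count"])
    return summary.sort_values("mean", ascending=False)
```

Calling `mean_rating_by(df, "cabin")["mean"].plot.barh()` would then draw a horizontal bar chart through pandas' matplotlib integration. Including the count alongside the mean helps flag groups, such as countries with only a handful of reviews, where the average is unreliable.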
Text and Sentiment Analysis: Generated a word cloud to highlight common review themes. Applied NLTK’s VADER sentiment analyzer to classify reviews into Positive, Neutral, or Negative and analyzed the sentiment distribution.
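The sentiment labeling can be sketched as follows. The ±0.05 cutoffs on VADER's compound score are the commonly used convention and are an assumption here, since the thresholds used in this project are not stated; `label_from_compound` and `classify_reviews` are hypothetical helper names.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def classify_reviews(texts: list[str]) -> list[str]:
    """Score each review with NLTK's VADER analyzer and label it."""
    # Imported here so label_from_compound can be used without NLTK installed.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # Downloads the lexicon on first use; requires network access.
    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()
    return [
        label_from_compound(analyzer.polarity_scores(text)["compound"])
        for text in texts
    ]
```

The sentiment distribution then follows from counting the labels, for example with `collections.Counter(classify_reviews(texts))`.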
Highlights
- Scraped and structured 3,877 detailed reviews from Skytrax, including 15 different attributes
- 57% of reviews were classified as positive, but many showed dissatisfaction in areas like WiFi and food
- Business Class and First Class travelers gave significantly higher average scores than Economy travelers
- Passengers from 75 unique countries shared their feedback, highlighting global perspectives on BA's service
- Produced visualizations suited to data storytelling and executive-level presentations
Key Insights
- “Value for money” and “WiFi connectivity” were the lowest-rated service features
- Travelers in higher cabins reported better experiences and gave significantly higher ratings
- Review sentiment aligns closely with star ratings, which suggests VADER is a reasonable fit for this data
- Text feedback reveals consistent complaints around delays, baggage handling, and poor meals
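One way to check how well sentiment labels line up with explicit feedback is a normalized cross-tabulation. This is a sketch only: the `sentiment` and `recommended` column names and the helper `sentiment_vs_recommendation` are assumptions, and the project may have compared sentiment against star ratings differently.

```python
import pandas as pd


def sentiment_vs_recommendation(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each recommendation value within every sentiment label."""
    # normalize="index" makes each sentiment row sum to 1.
    return pd.crosstab(df["sentiment"], df["recommended"], normalize="index")
```

If most Positive reviews fall in the recommended column and most Negative reviews do not, the labels agree with what reviewers reported; large off-diagonal shares would point to reviews VADER misreads.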