Gymshark PostgreSQL Database

Used Python to scrape Gymshark product data and the Faker library to generate synthetic order and customer data


Challenge

My initial approach to data collection presented several unexpected technical hurdles. When I began this project, I had planned to use traditional web scraping methods to gather product information from the Gymshark website. However, I quickly discovered that Gymshark's website is built using dynamic rendering technologies that load content through JavaScript after the initial page load, rather than delivering all content in the initial HTML response. This meant that conventional Python web scraping libraries like Beautiful Soup and Selenium—tools I was already proficient with—were ineffective for capturing the complete dataset I needed.

After researching alternatives, I identified Playwright as the most suitable library for handling dynamically loaded content. This presented a significant learning curve, as I had no prior experience with this technology. I spent considerable time studying Playwright's documentation, understanding its browser automation capabilities, and adapting my code to work with its asynchronous programming model. The process involved writing custom scripts to navigate through various product pages, wait for dynamic content to load, and extract structured data while handling potential timeouts and errors.
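
To give a sense of the approach, here is a minimal sketch of a Playwright scraper along those lines. The URL, CSS selectors, and field names are hypothetical placeholders, not the actual values used against the Gymshark site:

import asyncio
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError

async def scrape_products(url: str) -> list[dict]:
    """Load a product listing page, wait for the JS-rendered cards, extract fields."""
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")
        products = []
        try:
            # Wait for the dynamically rendered product cards (selector is hypothetical).
            await page.wait_for_selector("article.product-card", timeout=15_000)
            for card in await page.query_selector_all("article.product-card"):
                name = await card.query_selector(".product-card__title")
                price = await card.query_selector(".product-card__price")
                products.append({
                    "name": await name.inner_text() if name else None,
                    "price": await price.inner_text() if price else None,
                })
        except PlaywrightTimeoutError:
            # Content never rendered in time; return whatever was collected.
            pass
        await browser.close()
        return products

if __name__ == "__main__":
    # Placeholder URL; substitute the listing page being scraped.
    print(asyncio.run(scrape_products("https://www.gymshark.com/collections/all-products")))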

Further complications arose when setting up the database infrastructure to store the scraped data. Creating a database from scratch led to numerous configuration issues within my terminal environment. I encountered dependency conflicts, permission problems, and connectivity issues that required extensive troubleshooting. Each step of the database setup process—from schema design to connection handling—presented its own set of technical challenges that needed to be overcome through research and experimentation.

Results

Despite facing these significant technical obstacles, the project ultimately proved to be an exceptionally valuable learning experience that expanded my skill set in several important directions. Working with Playwright opened my eyes to the reality of modern web scraping in an era where most commercial websites use JavaScript frameworks and dynamic content loading. I gained practical expertise in handling these complex scraping scenarios, which represents a substantial advancement in my data collection capabilities.

The project also provided unexpected insights into web development practices. By examining how the Gymshark website was structured and rendered, I developed a deeper understanding of frontend architecture, AJAX requests, and the interaction between client-side and server-side components. This knowledge will be invaluable for future projects that involve web technologies, whether for data collection or application development.

One of the most significant takeaways came from my work with database implementation. Through the process of setting up and optimizing my database, I gained first-hand experience with the critical importance of efficient query design. I observed notable differences in loading times between my initial queries and my optimized versions, even with the relatively modest dataset I was working with. This experience provided a tangible demonstration of database performance principles that are often discussed theoretically.

While my current dataset wasn't large enough for inefficient queries to cause significant performance problems, I recognized that the same design patterns would become crucial when scaling to larger datasets. This realization has prepared me for future work with big data environments, where query optimization can mean the difference between analyses completing in seconds versus hours. I've developed both technical skills and a performance-oriented mindset that will serve me well in data-intensive professional contexts.
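
As a generic illustration of the kind of rewrite involved (these are not the project's actual queries, and all table and column names beyond the four table names listed below are assumptions): a correlated subquery can be replaced by a single join-plus-aggregate pass backed by an index on the join key, and EXPLAIN ANALYZE makes the difference visible.

import psycopg2

conn = psycopg2.connect(dbname="gymshark")  # connection details are placeholders
cur = conn.cursor()

# Naive version: the correlated subquery re-scans order_items once per product row.
cur.execute("""
    EXPLAIN ANALYZE
    SELECT p.product_name,
           (SELECT COUNT(*) FROM order_items oi
            WHERE oi.product_id = p.product_id) AS times_ordered
    FROM products p;
""")
print("\n".join(row[0] for row in cur.fetchall()))

# Optimized version: one aggregate pass over a join, with an index on the join key.
cur.execute("CREATE INDEX IF NOT EXISTS idx_order_items_product ON order_items (product_id);")
cur.execute("""
    EXPLAIN ANALYZE
    SELECT p.product_id, p.product_name, COUNT(oi.product_id) AS times_ordered
    FROM products p
    LEFT JOIN order_items oi ON oi.product_id = p.product_id
    GROUP BY p.product_id, p.product_name;
""")
print("\n".join(row[0] for row in cur.fetchall()))

conn.commit()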

The technical skills I acquired—from advanced web scraping techniques to database optimization—represent valuable additions to my professional toolkit that will be applicable across numerous data science and engineering projects in the future.

Process

01

Web Scraping

The first step was understanding the structure of the Gymshark product pages and designing a Python script to parse them, extracting the columns relevant to my analysis.

02

Database Creation

PostgreSQL was installed on macOS using Homebrew, and the database was built from the ground up. Four tables were created: orders, products, customer, and order_items. Each table was designed with appropriate data types for each column, and constraints were applied to keep the data consistent and queries efficient.
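
A condensed sketch of what that setup can look like, combined with Faker-generated rows for the synthetic customers and orders mentioned earlier. Two of the four tables are shown, and the column names, constraints, and row counts are illustrative assumptions rather than the project's exact schema:

import psycopg2
from faker import Faker

fake = Faker()
conn = psycopg2.connect(dbname="gymshark")  # connection details are placeholders
cur = conn.cursor()

# Illustrative DDL: explicit types plus key and uniqueness constraints.
cur.execute("""
    CREATE TABLE IF NOT EXISTS customer (
        customer_id SERIAL PRIMARY KEY,
        full_name   TEXT NOT NULL,
        email       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE IF NOT EXISTS orders (
        order_id    SERIAL PRIMARY KEY,
        customer_id INT  NOT NULL REFERENCES customer (customer_id),
        order_date  DATE NOT NULL
    );
""")

# Synthetic customers generated with Faker.
for _ in range(500):
    cur.execute(
        "INSERT INTO customer (full_name, email) VALUES (%s, %s);",
        (fake.name(), fake.unique.email()),
    )

# Synthetic orders tied to randomly chosen existing customers.
cur.execute("SELECT customer_id FROM customer;")
ids = [row[0] for row in cur.fetchall()]
for _ in range(2000):
    cur.execute(
        "INSERT INTO orders (customer_id, order_date) VALUES (%s, %s);",
        (fake.random_element(ids), fake.date_between(start_date="-1y", end_date="today")),
    )

conn.commit()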

03

Advanced SQL Querying

Ran SQL queries using window functions, common table expressions, joins, and ranking functions to better understand product popularity and customer sentiment.
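
For example, a query in this style aggregates order counts in a CTE and then ranks products with a window function. The identifiers are illustrative assumptions, not the project's actual column names:

import psycopg2

conn = psycopg2.connect(dbname="gymshark")  # connection details are placeholders
cur = conn.cursor()

# CTE aggregates order counts per product; RANK() then orders products by popularity.
cur.execute("""
    WITH product_orders AS (
        SELECT p.product_id, p.product_name,
               COUNT(oi.product_id) AS times_ordered
        FROM products p
        LEFT JOIN order_items oi ON oi.product_id = p.product_id
        GROUP BY p.product_id, p.product_name
    )
    SELECT product_name, times_ordered,
           RANK() OVER (ORDER BY times_ordered DESC) AS popularity_rank
    FROM product_orders;
""")
for name, n, rank in cur.fetchall():
    print(f"{rank:>3}  {name}  ({n} orders)")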

04

Data Visualization

The main goal was to understand how product ratings affect purchase probability. A Power BI dashboard was built to present my findings in a simplified manner.
