6 Engineers but 11 Million Users — How? The Story of Pinterest

FadinGeek
5 min read · Dec 2, 2024


Pinterest’s journey from a small startup to a massive platform with 11 million users is truly impressive. Imagine this: a man is sitting in an uninspiring washroom, thinking, “Man, this washroom sucks.” Desperate for inspiration, he thinks, “If only there was a place for me to get some ideas.” Suddenly, a lightbulb moment strikes — “Wait a minute, why don’t I…?”

And that’s how Pinterest was born. (Not a true story!)

This little story points to a universal truth: some of the best ideas come to us in the most unexpected places. It’s in those quiet, solitary moments, like in a washroom, that our minds wander freely and creativity flourishes. Pinterest’s incredible journey from its early days to a platform with millions of users proves that simple yet revolutionary ideas can spring from the most mundane situations.

Let’s dive into their story and see how they managed this feat with only six engineers.

Background Story

Pinterest was first launched in March 2010, a time when web development was simpler. Back then, the LAMP stack (Linux, Apache, MySQL, PHP) was the go-to technology stack for many startups. Pinterest started with two founders and one engineer, with a tech stack consisting of a small web engine and a single MySQL database.

Initial Growth and AWS Transition

In January 2011, Pinterest was still building the product in stealth mode, evolving it based on user feedback. They brought in another engineer and moved to Amazon Web Services (AWS). Their updated tech stack included:

  • Amazon EC2 and S3, with CloudFront as the CDN
  • NGINX as a web server, reverse proxy, load balancer, mail proxy, and HTTP cache
  • Four web engines
  • One MySQL database with one replica
  • A task queue with two task processors (a sketch of this pattern follows the list)
  • One MongoDB instance for tracking analytics
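The article doesn’t say what powered that task queue, so here is a minimal sketch of the pattern itself, one shared queue feeding two workers, using only Python’s standard library. The job names and payloads are made up for illustration.

```python
# A sketch of the pattern behind "a task queue with two task processors":
# web engines enqueue background work, and two workers drain the queue.
# The queue technology and job names are illustrative, not Pinterest's.
import queue
import threading

task_queue = queue.Queue()

def task_processor(worker_id: int) -> None:
    """Pull tasks off the shared queue until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:          # sentinel: shut this worker down
            task_queue.task_done()
            break
        name, payload = task
        print(f"worker {worker_id} handling {name}: {payload}")
        task_queue.task_done()

# Two task processors, as in Pinterest's early stack.
workers = [threading.Thread(target=task_processor, args=(i,)) for i in (1, 2)]
for w in workers:
    w.start()

# A web engine would enqueue work like this instead of doing it inline.
task_queue.put(("send_email", {"to": "user@example.com"}))
task_queue.put(("resize_image", {"pin_id": 42}))

task_queue.join()                 # wait for all queued tasks to finish
for _ in workers:                 # then stop the workers
    task_queue.put(None)
for w in workers:
    w.join()
```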

Explosive Growth and Technological Expansion

By September 2011, Pinterest experienced explosive growth, with users doubling every month and a half. They hired another engineer and expanded their tech stack significantly:

  • Amazon EC2, S3, and CloudFront
  • Two NGINX instances, 16 web servers, and two API engines
  • Five MySQL databases with nine read replicas
  • Four Cassandra nodes
  • 15 Membase nodes across three clusters
  • 10 Redis nodes
  • Three task routers with four task processors
  • Four Elasticsearch nodes
  • Three MongoDB clusters

During this period, they experimented with various technologies and pushed their systems to the limit. As the databases grew rapidly, they ran into problems: databases running out of memory, and data that had to be distributed across multiple servers.

Addressing Scaling Challenges: Clustering vs. Sharding

Pinterest halted new feature development to focus on scaling solutions, exploring clustering and sharding as potential strategies.

Clustering: Database clustering involves distributing data across multiple servers that work together as a single system. While Pinterest already used Cassandra and Membase for clustering, they faced several challenges:

  1. Data rebalance issues, where replication would get stuck when adding new servers.
  2. Data corruption due to bugs in the cluster management algorithm.
  3. Improper load balancing, requiring manual redistribution of data.
  4. Data authority failures, leading to data loss during replication.

Due to these issues, Pinterest decided to drop Cassandra, Membase, and MongoDB from their stack, opting to stick with MySQL and implement sharding.

Sharding: Sharding involves partitioning a large database into smaller, more manageable pieces called shards, each hosted on a separate server. Pinterest denormalized its database, removing foreign keys and joins, and introduced read replicas to handle queries more efficiently.

They implemented caching to reduce database traffic and addressed replication lag by storing recently updated data in the cache. Sharding aimed to minimize database queries for rendering a single page, ensuring all user-associated data resided on the same shard.
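The article doesn’t name the cache, so here is a generic cache-aside sketch of that idea: reads check the cache before touching a read replica, and writes refresh the cache right away so a lagging replica can’t serve stale data. The database helpers are hypothetical stand-ins.

```python
# Cache-aside reads plus an immediate cache refresh on writes: a sketch of
# how caching can both cut database traffic and hide replication lag.
# The cache is an in-memory stand-in, and the db helpers are hypothetical.
import json
import time

CACHE: dict = {}            # stand-in for a real cache tier
TTL_SECONDS = 300

def cache_get(key: str):
    entry = CACHE.get(key)
    if entry and entry[0] > time.time():
        return json.loads(entry[1])
    return None             # missing or expired

def cache_set(key: str, value) -> None:
    CACHE[key] = (time.time() + TTL_SECONDS, json.dumps(value))

def get_user(user_id: int, replica_db) -> dict:
    """Read path: cache first, read replica only on a miss."""
    key = f"user:{user_id}"
    user = cache_get(key)
    if user is None:
        user = replica_db.query_user(user_id)   # hypothetical helper
        cache_set(key, user)
    return user

def update_user(user: dict, primary_db) -> None:
    """Write path: write to the primary, then refresh the cache so a read
    that lands before the replica catches up still sees fresh data."""
    primary_db.write_user(user)                 # hypothetical helper
    cache_set(f"user:{user['id']}", user)
```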

Pinterest initially created virtual shards with eight physical servers, each hosting 512 databases. They used a multi-master replication approach, assigning each master to a different availability zone. They eventually adopted an ID sharding strategy, where a record’s ID determined its shard.
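Here is a sketch of what ID sharding can look like: the shard is packed into the record’s 64-bit ID, so the ID alone says where the row lives. The 16/10/36-bit split below comes from Pinterest’s public write-ups rather than this article, so treat the exact layout and server math as assumptions.

```python
# ID sharding sketch: pack the shard into the record's own 64-bit ID so
# the ID alone locates the row. The 16/10/36-bit split and the server
# mapping are illustrative assumptions, not details from this article.
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36

def make_id(shard_id: int, type_id: int, local_id: int) -> int:
    assert shard_id < (1 << SHARD_BITS) and type_id < (1 << TYPE_BITS)
    assert local_id < (1 << LOCAL_BITS)
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id

def unpack_id(record_id: int) -> tuple:
    local_id = record_id & ((1 << LOCAL_BITS) - 1)
    type_id = (record_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = record_id >> (TYPE_BITS + LOCAL_BITS)
    return shard_id, type_id, local_id

# Eight physical servers, each hosting 512 databases (as in the article),
# gives 4096 virtual shards; a shard ID maps straight to its server.
DATABASES_PER_SERVER = 512

def locate(record_id: int) -> tuple:
    shard_id, _, _ = unpack_id(record_id)
    return shard_id // DATABASES_PER_SERVER, shard_id   # (server, database)

pin_id = make_id(shard_id=1225, type_id=1, local_id=7)
print(locate(pin_id))   # -> (2, 1225): server 2, database 1225
```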

Transition to Sharded Database

Pinterest transitioned to the sharded database by continuing to write to the old system while replicating data to the new shards. Once the new system was confirmed to work correctly, they cut over and retired the old one.
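That cutover pattern, dual writes with the old system staying authoritative until the new one is verified, can be sketched like this; every class and method name here is invented for illustration.

```python
# Dual-write cutover sketch: keep the legacy database authoritative while
# mirroring every write into the shards, then flip reads once the new
# system is verified. All names here are invented for illustration.
class DualWriteStore:
    def __init__(self, legacy_db, sharded_db):
        self.legacy_db = legacy_db
        self.sharded_db = sharded_db
        self.reads_from_shards = False    # flipped after verification

    def write(self, record: dict) -> None:
        self.legacy_db.write(record)      # old system stays the source of truth
        try:
            self.sharded_db.write(record) # best-effort mirror into the shards
        except Exception:
            pass                          # a real migration would log and backfill

    def read(self, record_id: int) -> dict:
        store = self.sharded_db if self.reads_from_shards else self.legacy_db
        return store.read(record_id)

    def cut_over(self) -> None:
        """Serve reads from the shards; the legacy system can now be retired."""
        self.reads_from_shards = True
```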

Dealing with Sharding Issues

Once in the sharded environment, Pinterest faced increased load on specific shards. They monitored the system and, if needed, split a database to distribute the load. They replicated primary nodes onto different servers, moved data to new shards, and updated the application code accordingly.

Challenges with Sharding:

  • Loss of foreign keys, joins, and constraints, which pushed that logic into the application layer (see the sketch after this list).
  • Loss of transaction capabilities, making it challenging to ensure atomicity and consistency across shards.
  • Increased complexity in schema changes, requiring careful planning.
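Here is what a “join” looks like once it moves into the application layer: fetch one side, collect the related IDs, and fetch the other side from whichever shards hold it. The router and query helpers are hypothetical.

```python
# Application-layer "join" sketch: with cross-shard SQL joins gone, the
# app fetches each side separately and merges the rows in memory.
# get_shard (an ID-to-connection router) and the row layout are hypothetical.
def board_with_pins(board_id: int, get_shard) -> dict:
    board_shard = get_shard(board_id)
    board = board_shard.query_one(
        "SELECT * FROM boards WHERE id = %s", (board_id,)
    )

    pins = []
    for pin_id in board["pin_ids"]:       # denormalized list of pin IDs
        pin_shard = get_shard(pin_id)     # each pin may live on another shard
        pins.append(pin_shard.query_one(
            "SELECT * FROM pins WHERE id = %s", (pin_id,)
        ))

    board["pins"] = pins                  # the "join", done in app code
    return board
```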

Final Takeaways

By January 2012, Pinterest had simplified its tech stack and continued to see traffic growth. They learned valuable lessons from their scaling journey:

  1. Your architecture is effective if you can solve problems by adding more servers.
  2. Minimize data movement across nodes for a more stable architecture.
  3. Keep it simple, as everything can fail.

Pinterest’s story is a testament to the power of focus, efficient resource use, and scalable infrastructure. By prioritizing core features, leveraging open-source tools, and employing an iterative development approach, they managed to grow rapidly while maintaining a high-quality user experience.

You’re Awesome :)

FadinGeek
