Core Goal
Implement a simplified Twitter with:
postTweet(userId, tweetId)
follow(followerId, followeeId)
unfollow(followerId, followeeId)
getNewsFeed(userId): return 10 most recent tweetIds from userId + followees, ordered by recency.
The entire optimization focus is getNewsFeed, because it dominates runtime as data grows.
Baseline Approach (AC but not scalable)
Data Structures (baseline)
followMap_[u] -> unordered_set<int>: followees of user u
tweets_[tweetId] -> Tweet(tweetId, ts) global lookup
userTweet_[u] -> unordered_set<int>: tweetIds posted by user u
ts uses a monotonic counter gTimestamp_++ (preferred over wall clock)
getNewsFeed Baseline
Collect all tweetIds from:
each followee of userId
userId themself
Convert tweetIds to Tweet objects using tweets_
Sort all tweets by timestamp descending
Take first 10
Complexity
Let M be number of collected tweets (sum of all followees’ tweets + self tweets):
Collect: O(M)
Sort: O(M log M)
Return 10: O(10)
This passes the problem but is overkill, because we only need top 10.
Optimization 1: Avoid Full Sort — Keep Top 10 with a Heap
Key Observation
We don’t need a full ordering of M tweets. We only need the largest 10 by timestamp.
Idea: Min-Heap of Size 10
While scanning candidate tweets:
push Tweet into min-heap ordered by timestamp
if heap size > 10, pop the smallest (oldest)
At the end, heap contains the newest 10 tweets (unordered); pop into output and reverse.
Complexity
scan: O(M)
heap operations: O(M log 10) ≈ O(M)
space: O(10)
This is already a major improvement:
from O(M log M) → O(M)
Tradeoff
Still scans all tweets of all followees. If a user follows many accounts with huge histories, scanning remains expensive.
Optimization 2 (Best): K-Way Merge on Per-User Chronological Streams
Key Observation
Tweets for each user are naturally time-ordered at creation.
So instead of storing tweetIds in an unordered container, store tweets in append-only chronological order.
Data Structure Change
Replace:
userTweet_[u] -> unordered_set<int>
With:
userTweets_[u] -> vector<Tweet> (append-only, time increasing)
Now each user has a sorted stream by timestamp_.
K-Way Merge Strategy
We need the newest tweets among (F + 1) streams (F followees + self).
This is identical to merging k sorted lists from the end.
Algorithm:
For each stream (self + each followee), push its latest tweet into a max-heap.
Heap node tracks: (uid, idx, tweetId, ts)
Repeat up to 10 times:
pop heap top (newest tweet)
add tweetId to answer
move that user’s pointer one step earlier (idx-1) and push the previous tweet (if exists)
Why Node is Required
If the heap only stores Tweet objects, you lose:
which user stream it came from
where to fetch the “next candidate” from that stream
So heap elements must carry stream state (cursor).
Complexity
Let F be number of followees:
heap init: O((F+1) log(F+1))
extract 10: O(10 log(F+1))
does NOT depend on total tweet history
This is the “interview-ready” answer:
avoids scanning all tweets
avoids sorting all tweets
only touches as much data as needed to output 10 results
Practical Notes (C++ gotchas)
Avoid
unordered_map::operator[]
for read-only access
operator[] inserts default values when key is missing
it requires default-constructible value types
Use find() / at() when reading.
Prefer logical timestamp over wall clock
gTimestamp_++ is monotonic, deterministic, fast
avoids clock rollback / NTP adjustments
perfect for ordering in this simplified system
Final Recommended getNewsFeed (Optimized Pattern)
per-user vector<Tweet> chronological
max-heap (priority_queue) + cursor Node
pop/push up to 10 times
This is the strongest optimization story:
brute-force collect + sort
heap top-10 without full sort
k-way merge without scanning full histories