The Problem We're Solving
Imagine you're scrolling through Twitter during a major news event. Without search, you'd be lost in an endless stream of random tweets. Without trending topics, you'd miss the biggest conversations happening worldwide. Without recommendations, you'd never discover interesting people to follow.
This is exactly what early social media platforms faced - users posting content into a void with no way to find or discover relevant conversations. The solution? A sophisticated search and discovery engine that processes millions of queries daily while detecting trending topics in real-time.
Today, we're building the system that makes Twitter addictive - the ability to instantly find any conversation and discover what's happening in the world.
What We're Building Today
Today we're implementing the discovery engine that makes users stick around - the search functionality that lets people find conversations, trending topics, and interesting accounts to follow. We'll build three core components: lightning-fast text search, real-time trending detection, and a basic recommendation system.
Our Target: Handle 1,000 concurrent searches with sub-second response times while detecting trending hashtags in real-time.
Why Search Matters in Social Media
Think about how you use Twitter - you probably search for breaking news, specific topics, or interesting people to follow. Without good search, users get lost in the noise. Twitter processes over 500 million searches daily, making search as crucial a feature as posting itself.
The challenge? Social media search isn't like Google. People expect real-time results, trending topics change by the minute, and relevance depends on recency, not just content quality.
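One way to capture "relevance depends on recency" is to decay a tweet's text-relevance score by its age. A minimal sketch (the function name and half-life constant are illustrative, not from the lesson's code):

```typescript
// Recency-weighted ranking: combine text relevance with exponential time decay.
// `halfLifeHours` is an illustrative tuning knob, not a value from the lesson.
function rankWithRecency(relevance: number, ageHours: number, halfLifeHours = 6): number {
  const decay = Math.pow(0.5, ageHours / halfLifeHours); // score halves every half-life
  return relevance * decay;
}

// A fresh tweet keeps its full relevance score...
console.log(rankWithRecency(1.0, 0)); // 1
// ...while a six-hour-old tweet with the same text relevance scores half as much.
console.log(rankWithRecency(1.0, 6)); // 0.5
```

Tuning the half-life trades freshness against quality: a short half-life surfaces breaking news, a long one favors well-matched older content.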
Core Concepts We'll Master
1. Full-Text Search with PostgreSQL
Unlike simple LIKE queries, full-text search understands language - it handles variations, rankings, and can search across multiple fields simultaneously. PostgreSQL's built-in search is surprisingly powerful and perfect for our 1,000-user scale.
Key Insight: We'll use tsvector and tsquery for blazing-fast searches that automatically handle word variations and ranking.
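To see the word-variation handling in action, try this in psql - both "running" and "runs" reduce to the same lexeme ('run'), so the match succeeds:

```sql
-- Stemming in action: 'running' and 'runs' both normalize to the lexeme 'run'
SELECT to_tsvector('english', 'I was running while she runs')
       @@ plainto_tsquery('english', 'run');
-- expected: t
```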
2. Trending Detection Algorithm
Trending isn't just about volume - it's about acceleration. A hashtag used 100 times today might be trending if it was only used 5 times yesterday. We'll implement a time-windowed counting system that detects velocity, not just volume.
Real-world Context: Twitter's trending algorithm considers multiple factors including account diversity, geographic clustering, and spam detection.
3. Content Recommendation Engine
We'll build a simple but effective recommendation system using collaborative filtering - "users who liked this also liked that." It's the same principle Netflix uses for movie recommendations.
Architecture Overview
Our search system consists of three main components:
Search Service: Handles text queries and returns ranked results
Trending Service: Continuously monitors hashtag usage patterns
Recommendation Service: Analyzes user behavior to suggest content
These services work together but remain independent - if trending detection fails, search still works perfectly.
System Integration
Our search system plugs into the existing Twitter architecture we've built:
Database Layer: PostgreSQL with search indexes
API Layer: New search endpoints in our Express server
Frontend: React search components with real-time updates
Caching: Redis for trending topics and frequent searches
Implementation Deep Dive
GitHub Link:
https://github.com/sysdr/twitterdesign/tree/main/lesson8/twitter-search-system
PostgreSQL Full-Text Search Setup
We'll create specialized indexes that transform tweet content into searchable vectors. When someone types "climate," PostgreSQL instantly finds tweets containing stemmed variations like "climates" or "climatic." Note that full-text search handles word forms, not synonyms: a search for "climate change" won't match "global warming" unless you configure a synonym dictionary.
// Search query example - the ranking happens in the database.
// Pass the same 'english' config used when the search vectors were built.
const searchResults = await db.query(`
SELECT *, ts_rank(search_vector, plainto_tsquery('english', $1)) as rank
FROM tweets
WHERE search_vector @@ plainto_tsquery('english', $1)
ORDER BY rank DESC, created_at DESC
`, [searchTerm]);
Trending Detection Logic
Our trending algorithm works in time windows:
Current Window (last hour): Count hashtag usage
Previous Window (previous hour): Baseline comparison
Trending Score: Calculate acceleration percentage
Threshold Check: Mark as trending if score exceeds threshold
This approach catches both gradually rising topics and sudden viral content.
Recommendation System
We'll implement collaborative filtering by tracking user interactions (likes, retweets, follows) and finding similar users. If User A and User B like many same tweets, we can recommend User A's other liked tweets to User B.
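A minimal sketch of that idea, using Jaccard similarity over sets of liked tweet IDs (the helper names are ours, not from the repo):

```typescript
// Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of tweet IDs each user liked.
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((id) => b.has(id)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

// Recommend tweets liked by the most similar user that the target hasn't seen yet.
function recommend(target: Set<string>, others: Map<string, Set<string>>): string[] {
  let best: Set<string> = new Set();
  let bestScore = 0;
  for (const liked of others.values()) {
    const score = jaccard(target, liked);
    if (score > bestScore) { bestScore = score; best = liked; }
  }
  return [...best].filter((id) => !target.has(id));
}

const userA = new Set(['t1', 't2', 't3']);
const neighbors = new Map([['userB', new Set(['t1', 't2', 't4'])]]);
console.log(recommend(userA, neighbors)); // ['t4']
```

At production scale this pairwise comparison is replaced by approximate methods, but at 1,000 users the brute-force version is perfectly adequate.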
Performance Considerations
Search Response Time: Our target is sub-second response. We achieve this through:
Proper database indexing on search vectors
Redis caching for popular queries
Limiting search results to top 50 matches
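The caching piece follows the cache-aside pattern: check the cache, fall back to the database, store the result with a TTL. Here a Map stands in for Redis so the sketch stays self-contained; in production you'd swap in Redis GET/SETEX calls:

```typescript
// Cache-aside with TTL: return a cached result if it's still fresh,
// otherwise run the real query and store the result. The Map stands in
// for Redis to keep this sketch self-contained.
type Entry<T> = { value: T; expiresAt: number };

class QueryCache<T> {
  private store = new Map<string, Entry<T>>();
  constructor(private ttlMs: number) {}

  async getOrFetch(key: string, fetchFn: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
    const value = await fetchFn(); // cache miss: run the real search
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

// Usage: the second identical query never reaches the database.
(async () => {
  const cache = new QueryCache<string[]>(60_000);
  let dbCalls = 0;
  const runSearch = async () => { dbCalls++; return ['tweet-1']; };
  await cache.getOrFetch('climate', runSearch);
  await cache.getOrFetch('climate', runSearch);
  console.log(dbCalls); // 1
})();
```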
Trending Updates: We update trending topics every 10 minutes - frequent enough for real-time feel, infrequent enough to avoid database overload.
Memory Usage: Our recommendation system keeps user similarity scores in memory using a simple cache with LRU eviction.
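JavaScript's Map iterates in insertion order, which makes a compact LRU eviction sketch possible (this class is illustrative, not the repo's code):

```typescript
// LRU cache built on Map's insertion-order iteration: on read, re-insert the key
// to mark it most-recent; on insert past capacity, evict the oldest entry.
class LRUCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // re-insert so this key becomes most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.capacity) {
      // Evict the least recently used entry: the first key in insertion order.
      this.map.delete(this.map.keys().next().value!);
    }
    this.map.set(key, value);
  }
}

const cache = new LRUCache<string, number>(2);
cache.set('userA', 0.9);
cache.set('userB', 0.7);
cache.get('userA');      // touch A so B becomes the eviction candidate
cache.set('userC', 0.5); // evicts userB
console.log(cache.get('userB')); // undefined
console.log(cache.get('userA')); // 0.9
```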
Real-World Production Insights
Twitter's Evolution: Twitter started with simple MySQL LIKE queries but moved to specialized search infrastructure (Earlybird) handling billions of searches. Our PostgreSQL approach scales comfortably to thousands of users.
Spam Prevention: In production, trending detection includes spam filters - we'll add basic duplicate detection to prevent artificial trending.
Search Quality: Real systems use machine learning for result ranking. Our timestamp + relevance approach provides good results at our scale.
Building Your Search System
Now let's implement this step by step. We'll build the backend services first, then create the React frontend components.
Phase 1: Database Foundation
First, we need to set up PostgreSQL with full-text search capabilities.
Step 1: Create Search Indexes
Run the database migration to add search vectors:
-- Add full-text search indexes for tweets
ALTER TABLE tweets ADD COLUMN search_vector tsvector;
CREATE INDEX tweets_search_idx ON tweets USING gin(search_vector);
-- Update existing tweets with search vectors
UPDATE tweets SET search_vector = to_tsvector('english', content);
-- Create auto-update trigger
CREATE OR REPLACE FUNCTION update_tweets_search_vector()
RETURNS trigger AS $$
BEGIN
NEW.search_vector := to_tsvector('english', NEW.content);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tweets_search_vector_update
BEFORE INSERT OR UPDATE ON tweets
FOR EACH ROW EXECUTE FUNCTION update_tweets_search_vector();
Expected Output: Tables updated with search vectors and indexes created
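You can sanity-check the migration with a quick query - a populated search_vector column shows stemmed lexemes with their positions (e.g. 'climat':1 'chang':2):

```sql
-- Verify the vectors were populated by the UPDATE and trigger
SELECT content, search_vector FROM tweets LIMIT 3;
```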
Step 2: Set Up Hashtag Tracking
Create tables for trending detection:
CREATE TABLE hashtags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
hashtag VARCHAR(100) UNIQUE NOT NULL,
usage_count INTEGER DEFAULT 0,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE TABLE hashtag_usage (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
hashtag_id UUID REFERENCES hashtags(id),
tweet_id UUID REFERENCES tweets(id),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
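The trending service we'll write in Step 4 queries a hashtag_counts relation. One way to derive it from hashtag_usage is a view over the two hourly windows - this definition is our assumption; check the lesson repo for the exact version:

```sql
-- Per-hashtag counts for the current hour vs. the previous hour,
-- derived from the hashtag_usage rows inserted with each tweet.
CREATE VIEW hashtag_counts AS
SELECT h.hashtag,
       COUNT(*) FILTER (WHERE u.created_at >= date_trunc('hour', NOW()))
         AS current_count,
       COUNT(*) FILTER (WHERE u.created_at >= date_trunc('hour', NOW()) - INTERVAL '1 hour'
                          AND u.created_at <  date_trunc('hour', NOW()))
         AS previous_count
FROM hashtags h
JOIN hashtag_usage u ON u.hashtag_id = h.id
WHERE u.created_at >= date_trunc('hour', NOW()) - INTERVAL '1 hour'
GROUP BY h.hashtag;
```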
Phase 2: Backend Services
Step 3: Implement Search Service
Create the core search logic that handles both tweet and user searches:
// Assumes a pg `client` is initialized elsewhere in the service
export class SearchService {
  async searchContent(query: string, limit = 20): Promise<SearchResult[]> {
    const tweetResults = await client.query(`
      SELECT t.*, u.username, u.display_name,
             ts_rank(t.search_vector, plainto_tsquery('english', $1)) as rank
      FROM tweets t
      JOIN users u ON t.author_id = u.id
      WHERE t.search_vector @@ plainto_tsquery('english', $1)
      ORDER BY rank DESC, t.created_at DESC
      LIMIT $2
    `, [query, limit]);
    return tweetResults.rows;
  }
}
Step 4: Build Trending Detection
Implement the velocity-based trending algorithm:
export class TrendingService {
  // Assumes a pg `client` and `redisClient` are initialized elsewhere, and that
  // hashtag_counts exposes per-hashtag counts for the current and previous hour.
  async updateTrendingTopics(): Promise<void> {
    const trendingQuery = `
      SELECT hashtag,
             current_count,
             previous_count,
             CASE
               WHEN previous_count = 0 THEN 1000 -- brand-new hashtag: treat as highly trending
               -- * 100.0 keeps the division in numeric, avoiding integer truncation
               ELSE ((current_count - previous_count) * 100.0 / previous_count)
             END as velocity
      FROM hashtag_counts
      WHERE current_count >= 3
      ORDER BY velocity DESC LIMIT 10
    `;
    const { rows: results } = await client.query(trendingQuery);
    // Cache results in Redis for 10 minutes
    await redisClient.setEx('trending:topics', 600, JSON.stringify(results));
  }
}
Step 5: Create API Endpoints
Set up the REST endpoints for search functionality:
router.get('/search', async (req, res) => {
  const { q: query, limit = 20 } = req.query;
  if (typeof query !== 'string' || query.trim().length === 0) {
    return res.status(400).json({ error: 'Query parameter q is required' });
  }
  const results = await searchService.searchContent(query, Number(limit));
  res.json({ results, query });
});
router.get('/trending', async (req, res) => {
  const topics = await trendingService.getTrendingTopics();
  res.json({ trending: topics });
});
Phase 3: Frontend Components
Step 6: Build Search Bar Component
Create a React component with real-time suggestions:
// Assumes a `debounce` helper (e.g. from lodash) and a `searchApi` client module
export const SearchBar: React.FC<{ onSearch: (query: string) => void }> = ({ onSearch }) => {
  const [query, setQuery] = useState('');
  const [suggestions, setSuggestions] = useState<string[]>([]);
  const fetchSuggestions = useCallback(
    debounce(async (searchQuery: string) => {
      if (searchQuery.length < 2) return;
      const response = await searchApi.getSuggestions(searchQuery);
      setSuggestions(response.suggestions);
    }, 300), []
  );
  return (
    <div className="relative">
      <input
        value={query}
        onChange={(e) => {
          setQuery(e.target.value);
          fetchSuggestions(e.target.value);
        }}
        onKeyDown={(e) => e.key === 'Enter' && onSearch(query)}
        placeholder="Search Twitter..."
        className="search-input"
      />
      {/* Suggestions dropdown */}
    </div>
  );
};
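The search bar relies on a debounce helper, typically imported from lodash. If you'd rather not pull in a dependency, a minimal trailing-edge version looks like this:

```typescript
// Trailing-edge debounce: each call resets the timer, so the wrapped function
// only fires once the caller has been quiet for `waitMs` milliseconds.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Three rapid keystrokes produce a single suggestion fetch after the user pauses.
let calls = 0;
const fetchOnce = debounce(() => { calls++; }, 300);
fetchOnce();
fetchOnce();
fetchOnce();
console.log(calls); // 0 - nothing fires until the user stops typing
setTimeout(() => console.log(calls), 400); // 1
```

This keeps one API request per pause in typing instead of one per keystroke.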
Step 7: Create Trending Topics Display
Build the trending topics sidebar:
export const TrendingTopics: React.FC = ({ onHashtagClick }) => {
const [trending, setTrending] = useState([]);
useEffect(() => {
fetchTrendingTopics();
const interval = setInterval(fetchTrendingTopics, 600000); // 10 minutes
return () => clearInterval(interval);
}, []);
return (
<div className="trending-container">
<h2>What's happening</h2>
{trending.map((topic, index) => (
<div key={topic.hashtag} onClick={() => onHashtagClick(topic.hashtag)}>
<span>#{topic.hashtag}</span>
<span>{topic.count} posts</span>
</div>
))}
</div>
);
};
Testing Your Implementation
Performance Testing
Test search response times:
time curl -s "http://localhost:3001/api/search?q=climate" > /dev/null
# Target: <500ms response time
Test concurrent users:
ab -n 10000 -c 1000 "http://localhost:3001/api/search?q=test"
# Target: sustain 1,000 concurrent requests (our lesson goal)
Functionality Testing
Verify search accuracy:
SELECT content, ts_rank(search_vector, plainto_tsquery('technology')) as rank
FROM tweets
WHERE search_vector @@ plainto_tsquery('technology')
ORDER BY rank DESC LIMIT 5;
Test trending updates:
# Monitor logs for trending updates every 10 minutes
docker-compose logs -f backend | grep "trending"
Success Criteria
By lesson end, you'll have:
✅ Full-text search across all tweets and users
✅ Real-time trending hashtags updating every 10 minutes
✅ Basic recommendation system suggesting users to follow
✅ Search API responding in <500ms for typical queries
✅ React frontend with search bar and trending sidebar
Integration with Previous Lessons
This builds on our previous work:
Database Models: Extends our tweet and user schemas with search indexes
API Design: Adds new search endpoints following RESTful patterns
Rate Limiting: Applies existing rate limiting to prevent search abuse
Real-time Updates: Uses our WebSocket system for live trending updates
Next Steps Preview
Next lesson covers media handling - uploading images, generating thumbnails, and serving content through CDNs. Our search system will then index image metadata and captions, making media discoverable too.
Assignment Challenge
Implement a "saved searches" feature where users can save frequent queries and get notifications when new content matches. This combines our search system with the notification system from earlier lessons.
Bonus: Add search filters (date range, user type, media content) to make search even more powerful.
The beauty of search is immediate feedback - type a query, see instant results. It's one of the most satisfying features to build and use!