The Problem We're Solving
Imagine you're scrolling through Twitter during a major news event. Without search, you'd be lost in an endless stream of random tweets. Without trending topics, you'd miss the biggest conversations happening worldwide. Without recommendations, you'd never discover interesting people to follow.
This is exactly what early social media platforms faced - users posting content into a void with no way to find or discover relevant conversations. The solution? A sophisticated search and discovery engine that processes millions of queries daily while detecting trending topics in real-time.
Today, we're building the system that makes Twitter addictive - the ability to instantly find any conversation and discover what's happening in the world.
What We're Building Today
Today we're implementing the discovery engine that makes users stick around - the search functionality that lets people find conversations, trending topics, and interesting accounts to follow. We'll build three core components: lightning-fast text search, real-time trending detection, and a basic recommendation system.
Our Target: Handle 1,000 concurrent searches with sub-second response times while detecting trending hashtags in real-time.
Why Search Matters in Social Media
Think about how you use Twitter - you probably search for breaking news, specific topics, or interesting people to follow. Without good search, users get lost in the noise. Twitter processes over 500 million searches daily, making search as crucial a feature as posting itself.
The challenge? Social media search isn't like Google. People expect real-time results, trending topics change by the minute, and relevance depends on recency, not just content quality.
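One way to capture "relevance depends on recency" is to decay a tweet's text-relevance score by its age. A minimal sketch (the function name and half-life constant are illustrative, not from the lesson's code):

```typescript
// Recency-weighted ranking: combine text relevance with exponential time decay.
// `halfLifeHours` is an illustrative tuning knob, not a value from the lesson.
function rankWithRecency(relevance: number, ageHours: number, halfLifeHours = 6): number {
  const decay = Math.pow(0.5, ageHours / halfLifeHours); // score halves every half-life
  return relevance * decay;
}

// A fresh tweet keeps its full relevance score...
console.log(rankWithRecency(1.0, 0)); // 1
// ...while a six-hour-old tweet with the same text relevance scores half as much.
console.log(rankWithRecency(1.0, 6)); // 0.5
```

Tuning the half-life trades freshness against quality: a short half-life surfaces breaking news, a long one favors well-matched older content.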
Core Concepts We'll Master
1. Full-Text Search with PostgreSQL
Unlike simple LIKE queries, full-text search understands language - it handles variations, rankings, and can search across multiple fields simultaneously. PostgreSQL's built-in search is surprisingly powerful and perfect for our 1,000-user scale.
Key Insight: We'll use tsvector and tsquery for blazing-fast searches that automatically handle word variations and ranking.
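To see the word-variation handling in action, try this in psql - both "running" and "runs" reduce to the same lexeme ('run'), so the match succeeds:

```sql
-- Stemming in action: 'running' and 'runs' both normalize to the lexeme 'run'
SELECT to_tsvector('english', 'I was running while she runs')
       @@ plainto_tsquery('english', 'run');
-- expected: t
```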
2. Trending Detection Algorithm
Trending isn't just about volume - it's about acceleration. A hashtag used 100 times today might be trending if it was only used 5 times yesterday. We'll implement a time-windowed counting system that detects velocity, not just volume.
Real-world Context: Twitter's trending algorithm considers multiple factors including account diversity, geographic clustering, and spam detection.
3. Content Recommendation Engine
We'll build a simple but effective recommendation system using collaborative filtering - "users who liked this also liked that." It's the same principle Netflix uses for movie recommendations.
Architecture Overview
Our search system consists of three main components:
Search Service: Handles text queries and returns ranked results
Trending Service: Continuously monitors hashtag usage patterns
Recommendation Service: Analyzes user behavior to suggest content
These services work together but remain independent - if trending detection fails, search still works perfectly.
System Integration
Our search system plugs into the existing Twitter architecture we've built:
Database Layer: PostgreSQL with search indexes
API Layer: New search endpoints in our Express server
Frontend: React search components with real-time updates
Caching: Redis for trending topics and frequent searches
Implementation Deep Dive
GitHub Link:
https://github.com/sysdr/twitterdesign/tree/main/lesson8/twitter-search-system
PostgreSQL Full-Text Search Setup
We'll create specialized indexes that transform tweet content into searchable vectors. When someone types "climate," PostgreSQL instantly finds tweets containing stemmed variations like "climates" or "climatic." Note that full-text search handles word forms, not synonyms: a search for "climate change" won't match "global warming" unless you configure a synonym dictionary.
// Search query example - the ranking happens in the database.
// Pass the same 'english' config used when the search vectors were built.
const searchResults = await db.query(`
SELECT *, ts_rank(search_vector, plainto_tsquery('english', $1)) as rank
FROM tweets
WHERE search_vector @@ plainto_tsquery('english', $1)
ORDER BY rank DESC, created_at DESC
`, [searchTerm]);
Trending Detection Logic
Our trending algorithm works in time windows:
Current Window (last hour): Count hashtag usage
Previous Window (previous hour): Baseline comparison
Trending Score: Calculate acceleration percentage
Threshold Check: Mark as trending if score exceeds threshold
This approach catches both gradually rising topics and sudden viral content.
Recommendation System
We'll implement collaborative filtering by tracking user interactions (likes, retweets, follows) and finding similar users. If User A and User B like many same tweets, we can recommend User A's other liked tweets to User B.
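A minimal sketch of that idea, using Jaccard similarity over sets of liked tweet IDs (the helper names are ours, not from the repo):

```typescript
// Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of tweet IDs each user liked.
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((id) => b.has(id)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

// Recommend tweets liked by the most similar user that the target hasn't seen yet.
function recommend(target: Set<string>, others: Map<string, Set<string>>): string[] {
  let best: Set<string> = new Set();
  let bestScore = 0;
  for (const liked of others.values()) {
    const score = jaccard(target, liked);
    if (score > bestScore) { bestScore = score; best = liked; }
  }
  return [...best].filter((id) => !target.has(id));
}

const userA = new Set(['t1', 't2', 't3']);
const neighbors = new Map([['userB', new Set(['t1', 't2', 't4'])]]);
console.log(recommend(userA, neighbors)); // ['t4']
```

At production scale this pairwise comparison is replaced by approximate methods, but at 1,000 users the brute-force version is perfectly adequate.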
Performance Considerations
Search Response Time: Our target is sub-second response. We achieve this through:
Proper database indexing on search vectors
Redis caching for popular queries
Limiting search results to top 50 matches
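The caching piece follows the cache-aside pattern: check the cache, fall back to the database, store the result with a TTL. Here a Map stands in for Redis so the sketch stays self-contained; in production you'd swap in Redis GET/SETEX calls:

```typescript
// Cache-aside with TTL: return a cached result if it's still fresh,
// otherwise run the real query and store the result. The Map stands in
// for Redis to keep this sketch self-contained.
type Entry<T> = { value: T; expiresAt: number };

class QueryCache<T> {
  private store = new Map<string, Entry<T>>();
  constructor(private ttlMs: number) {}

  async getOrFetch(key: string, fetchFn: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
    const value = await fetchFn(); // cache miss: run the real search
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

// Usage: the second identical query never reaches the database.
(async () => {
  const cache = new QueryCache<string[]>(60_000);
  let dbCalls = 0;
  const runSearch = async () => { dbCalls++; return ['tweet-1']; };
  await cache.getOrFetch('climate', runSearch);
  await cache.getOrFetch('climate', runSearch);
  console.log(dbCalls); // 1
})();
```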
Trending Updates: We update trending topics every 10 minutes - frequent enough for real-time feel, infrequent enough to avoid database overload.
Memory Usage: Our recommendation system keeps user similarity scores in memory using a simple cache with LRU eviction.
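JavaScript's Map iterates in insertion order, which makes a compact LRU eviction sketch possible (this class is illustrative, not the repo's code):

```typescript
// LRU cache built on Map's insertion-order iteration: on read, re-insert the key
// to mark it most-recent; on insert past capacity, evict the oldest entry.
class LRUCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // re-insert so this key becomes most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.capacity) {
      // Evict the least recently used entry: the first key in insertion order.
      this.map.delete(this.map.keys().next().value!);
    }
    this.map.set(key, value);
  }
}

const cache = new LRUCache<string, number>(2);
cache.set('userA', 0.9);
cache.set('userB', 0.7);
cache.get('userA');      // touch A so B becomes the eviction candidate
cache.set('userC', 0.5); // evicts userB
console.log(cache.get('userB')); // undefined
console.log(cache.get('userA')); // 0.9
```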
Real-World Production Insights
Twitter's Evolution: Twitter started with simple MySQL LIKE queries but moved to specialized search infrastructure (Earlybird) handling billions of searches. Our PostgreSQL approach scales comfortably to thousands of users.
Spam Prevention: In production, trending detection includes spam filters - we'll add basic duplicate detection to prevent artificial trending.
Search Quality: Real systems use machine learning for result ranking. Our timestamp + relevance approach provides good results at our scale.
Building Your Search System
Now let's implement this step by step. We'll build the backend services first, then create the React frontend components.
Phase 1: Database Foundation
First, we need to set up PostgreSQL with full-text search capabilities.
Step 1: Create Search Indexes
Run the database migration to add search vectors:
-- Add full-text search indexes for tweets
ALTER TABLE tweets ADD COLUMN search_vector tsvector;
CREATE INDEX tweets_search_idx ON tweets USING gin(search_vector);
-- Update existing tweets with search vectors
UPDATE tweets SET search_vector = to_tsvector('english', content);
-- Create auto-update trigger
CREATE OR REPLACE FUNCTION update_tweets_search_vector()
RETURNS trigger AS $$
BEGIN
NEW.search_vector := to_tsvector('english', NEW.content);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tweets_search_vector_update
BEFORE INSERT OR UPDATE ON tweets
FOR EACH ROW EXECUTE FUNCTION update_tweets_search_vector();
Expected Output: Tables updated with search vectors and indexes created
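You can sanity-check the migration with a quick query - a populated search_vector column shows stemmed lexemes with their positions (e.g. 'climat':1 'chang':2):

```sql
-- Verify the vectors were populated by the UPDATE and trigger
SELECT content, search_vector FROM tweets LIMIT 3;
```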
Step 2: Set Up Hashtag Tracking
Create tables for trending detection:
CREATE TABLE hashtags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
hashtag VARCHAR(100) UNIQUE NOT NULL,
usage_count INTEGER DEFAULT 0,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE TABLE hashtag_usage (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
hashtag_id UUID REFERENCES hashtags(id),
tweet_id UUID REFERENCES tweets(id),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
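The trending service we'll write in Step 4 queries a hashtag_counts relation. One way to derive it from hashtag_usage is a view over the two hourly windows - this definition is our assumption; check the lesson repo for the exact version:

```sql
-- Per-hashtag counts for the current hour vs. the previous hour,
-- derived from the hashtag_usage rows inserted with each tweet.
CREATE VIEW hashtag_counts AS
SELECT h.hashtag,
       COUNT(*) FILTER (WHERE u.created_at >= date_trunc('hour', NOW()))
         AS current_count,
       COUNT(*) FILTER (WHERE u.created_at >= date_trunc('hour', NOW()) - INTERVAL '1 hour'
                          AND u.created_at <  date_trunc('hour', NOW()))
         AS previous_count
FROM hashtags h
JOIN hashtag_usage u ON u.hashtag_id = h.id
WHERE u.created_at >= date_trunc('hour', NOW()) - INTERVAL '1 hour'
GROUP BY h.hashtag;
```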
Phase 2: Backend Services
Step 3: Implement Search Service
Create the core search logic that handles both tweet and user searches:
// Assumes a pg `client` is initialized elsewhere in the service
export class SearchService {
  async searchContent(query: string, limit = 20): Promise<SearchResult[]> {
    const tweetResults = await client.query(`
      SELECT t.*, u.username, u.display_name,
             ts_rank(t.search_vector, plainto_tsquery('english', $1)) as rank
      FROM tweets t
      JOIN users u ON t.author_id = u.id
      WHERE t.search_vector @@ plainto_tsquery('english', $1)
      ORDER BY rank DESC, t.created_at DESC
      LIMIT $2
    `, [query, limit]);
    return tweetResults.rows;
  }
}
Step 4: Build Trending Detection
Implement the velocity-based trending algorithm:
export class TrendingService {
  // Assumes a pg `client` and `redisClient` are initialized elsewhere, and that
  // hashtag_counts exposes per-hashtag counts for the current and previous hour.
  async updateTrendingTopics(): Promise<void> {
    const trendingQuery = `
      SELECT hashtag,
             current_count,
             previous_count,
             CASE
               WHEN previous_count = 0 THEN 1000 -- brand-new hashtag: treat as highly trending
               -- * 100.0 keeps the division in numeric, avoiding integer truncation
               ELSE ((current_count - previous_count) * 100.0 / previous_count)
             END as velocity
      FROM hashtag_counts
      WHERE current_count >= 3
      ORDER BY velocity DESC LIMIT 10
    `;
    const { rows: results } = await client.query(trendingQuery);
    // Cache results in Redis for 10 minutes
    await redisClient.setEx('trending:topics', 600, JSON.stringify(results));
  }
}
Step 5: Create API Endpoints
Set up the REST endpoints for search functionality:
router.get('/search', async (req, res) => {
  const { q: query, limit = 20 } = req.query;
  if (typeof query !== 'string' || query.trim().length === 0) {
    return res.status(400).json({ error: 'Query parameter q is required' });
  }
  const results = await searchService.searchContent(query, Number(limit));
  res.json({ results, query });
});
router.get('/trending', async (req, res) => {
  const topics = await trendingService.getTrendingTopics();
  res.json({ trending: topics });
});
Phase 3: Frontend Components
Step 6: Build Search Bar Component
Create a React component with real-time suggestions:
// Assumes a `debounce` helper (e.g. from lodash) and a `searchApi` client module
export const SearchBar: React.FC<{ onSearch: (query: string) => void }> = ({ onSearch }) => {
  const [query, setQuery] = useState('');
  const [suggestions, setSuggestions] = useState<string[]>([]);
  const fetchSuggestions = useCallback(
    debounce(async (searchQuery: string) => {
      if (searchQuery.length < 2) return;
      const response = await searchApi.getSuggestions(searchQuery);
      setSuggestions(response.suggestions);
    }, 300), []
  );
  return (
    <div className="relative">
      <input
        value={query}
        onChange={(e) => {
          setQuery(e.target.value);
          fetchSuggestions(e.target.value);
        }}
        onKeyDown={(e) => e.key === 'Enter' && onSearch(query)}
        placeholder="Search Twitter..."
        className="search-input"
      />
      {/* Suggestions dropdown */}
    </div>
  );
};
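The search bar relies on a debounce helper, typically imported from lodash. If you'd rather not pull in a dependency, a minimal trailing-edge version looks like this:

```typescript
// Trailing-edge debounce: each call resets the timer, so the wrapped function
// only fires once the caller has been quiet for `waitMs` milliseconds.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Three rapid keystrokes produce a single suggestion fetch after the user pauses.
let calls = 0;
const fetchOnce = debounce(() => { calls++; }, 300);
fetchOnce();
fetchOnce();
fetchOnce();
console.log(calls); // 0 - nothing fires until the user stops typing
setTimeout(() => console.log(calls), 400); // 1
```

This keeps one API request per pause in typing instead of one per keystroke.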
Step 7: Create Trending Topics Display
Build the trending topics sidebar:
export const TrendingTopics: React.FC = ({ onHashtagClick }) => {
const [trending, setTrending] = useState([]);
useEffect(() => {
fetchTrendingTopics();
const interval = setInterval(fetchTrendingTopics, 600000); // 10 minutes
return () => clearInterval(interval);
}, []);
return (
<div className="trending-container">
<h2>What's happening</h2>
{trending.map((topic, index) => (
<div key={topic.hashtag} onClick={() => onHashtagClick(topic.hashtag)}>
<span>#{topic.hashtag}</span>
<span>{topic.count} posts</span>
</div>
))}
</div>
);
};
Testing Your Implementation
Performance Testing
Test search response times:
time curl -s "http://localhost:3001/api/search?q=climate" > /dev/null
# Target: <500ms response time
Test concurrent users:
ab -n 10000 -c 1000 "http://localhost:3001/api/search?q=test"
# Target: sustain 1,000 concurrent requests (our lesson goal)
Functionality Testing
Verify search accuracy:
SELECT content, ts_rank(search_vector, plainto_tsquery('technology')) as rank
FROM tweets
WHERE search_vector @@ plainto_tsquery('technology')
ORDER BY rank DESC LIMIT 5;
Test trending updates:
# Monitor logs for trending updates every 10 minutes
docker-compose logs -f backend | grep "trending"
Success Criteria
By lesson end, you'll have:
✅ Full-text search across all tweets and users
✅ Real-time trending hashtags updating every 10 minutes
✅ Basic recommendation system suggesting users to follow
✅ Search API responding in <500ms for typical queries
✅ React frontend with search bar and trending sidebar
Integration with Previous Lessons
This builds on our previous work:
Database Models: Extends our tweet and user schemas with search indexes
API Design: Adds new search endpoints following RESTful patterns
Rate Limiting: Applies existing rate limiting to prevent search abuse
Real-time Updates: Uses our WebSocket system for live trending updates
Next Steps Preview
Next lesson covers media handling - uploading images, generating thumbnails, and serving content through CDNs. Our search system will then index image metadata and captions, making media discoverable too.
Assignment Challenge
Implement a "saved searches" feature where users can save frequent queries and get notifications when new content matches. This combines our search system with the notification system from earlier lessons.
Bonus: Add search filters (date range, user type, media content) to make search even more powerful.
The beauty of search is immediate feedback - type a query, see instant results. It's one of the most satisfying features to build and use!