A workflow for using AI embeddings to build a recommendation engine that suggests similar videos based on content
You can build a content-based recommendation system that suggests similar videos by converting video transcripts into AI embeddings and performing a nearest neighbor search.
The core concept is to convert text (your video transcripts) into high-dimensional vectors (embeddings) that capture semantic meaning. Videos with similar content will have embeddings that are close together in vector space, allowing you to find and recommend similar content.
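Closeness in vector space is usually measured with cosine similarity, and similarity scores like the ones in the example response below are typically computed this way. As a minimal sketch in plain JavaScript:

// Cosine similarity: values near 1.0 mean the two embeddings point in almost
// the same direction (very similar content); values near 0 mean unrelated content.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}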
The idea is that, for a given source video, your recommendation system will return a ranked list of similar videos, for example:
{
  "id": "abc123",
  "recommendations": [
    {
      "id": "def456",
      "title": "Similar Video Title",
      "similarity_score": 0.89,
      "mux_asset_id": "mux-asset-id-123",
      "mux_playback_id": "mux-playback-id-456"
    },
    {
      "id": "ghi789",
      "title": "Another Related Video",
      "similarity_score": 0.82,
      "mux_asset_id": "mux-asset-id-789",
      "mux_playback_id": "mux-playback-id-1011"
    }
  ]
}
The transcript data from Mux's auto-generated captions provides the textual content needed to create meaningful embeddings for your recommendation engine.
Here's how to build a content-based video recommendation system:
Start by collecting transcripts. Listen for the video.asset.track.ready webhook, which tells you that the auto-generated captions track has finished being created; once it fires, you can download the transcript text for that video.
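As a rough sketch of this step, a webhook handler might look like the following. It assumes an Express app, a hypothetical lookupPlaybackId helper backed by your own database for mapping a Mux asset to its playback ID, a hypothetical processTranscript helper that runs the embedding steps described below, and that the transcript is downloaded from Mux's plain-text text-track endpoint. Double-check the payload field names against the Mux webhooks documentation before relying on them.

import express from 'express';

const app = express();
app.use(express.json());

app.post('/webhooks/mux', async (req, res) => {
  const event = req.body;

  // Only act once a captions track has finished being created
  if (event.type === 'video.asset.track.ready') {
    const track = event.data;        // the track object (verify field names in the Mux docs)
    const assetId = track.asset_id;  // assumption: the payload exposes the parent asset ID

    // Hypothetical helper: look up the playback ID you stored for this asset
    const playbackId = await lookupPlaybackId(assetId);

    // Download the transcript as plain text from the text-track endpoint
    const transcriptUrl = `https://stream.mux.com/${playbackId}/text/${track.id}.txt`;
    const transcript = await (await fetch(transcriptUrl)).text();

    // Hypothetical helper: create and store the embedding (see the steps below)
    await processTranscript(assetId, transcript);
  }

  res.status(200).send('ok');
});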
Next, convert each transcript into an embedding. Embeddings are numerical representations of text that capture semantic meaning; words or phrases with similar meanings will have similar embedding vectors. There are many services, APIs, and models available for creating embeddings from text:
// Example with OpenAI
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function createEmbedding(transcript) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: transcript,
    encoding_format: "float",
  });

  // The embedding vector for the transcript
  return response.data[0].embedding;
}
// Example with Amazon Bedrock (Titan Text Embeddings)
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

async function createEmbedding(transcript) {
  const command = new InvokeModelCommand({
    modelId: "amazon.titan-embed-text-v1",
    contentType: "application/json",
    body: JSON.stringify({
      inputText: transcript
    })
  });

  const response = await client.send(command);
  const responseBody = JSON.parse(new TextDecoder().decode(response.body));

  // The embedding vector for the transcript
  return responseBody.embedding;
}
You need to store your embeddings in a way that allows for efficient similarity search. Here are the main approaches:
For applications already using PostgreSQL, you can add vector similarity search with the pgvector extension.
For an example with Supabase, see this guide about using pgvector with Supabase.
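If you take the PostgreSQL route, a minimal sketch with the node-postgres (pg) client might look like this. It assumes a videos table you define yourself, with an embedding vector(1536) column (1536 matches text-embedding-3-small; use your model's dimension), and it uses pgvector's <=> cosine-distance operator for the search:

import pg from 'pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// One-time setup (assumes the pgvector extension is available on your database):
//   CREATE EXTENSION IF NOT EXISTS vector;
//   CREATE TABLE videos (
//     id text PRIMARY KEY,
//     title text,
//     mux_asset_id text,
//     mux_playback_id text,
//     embedding vector(1536)
//   );

async function storeEmbedding(video, embedding) {
  await pool.query(
    `INSERT INTO videos (id, title, mux_asset_id, mux_playback_id, embedding)
     VALUES ($1, $2, $3, $4, $5)
     ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding`,
    [video.id, video.title, video.muxAssetId, video.muxPlaybackId, JSON.stringify(embedding)]
  );
}

async function findSimilarVideos(sourceVideoId, queryEmbedding, topK = 5) {
  // <=> is pgvector's cosine distance; 1 - distance gives a similarity score
  const { rows } = await pool.query(
    `SELECT id, title, mux_asset_id, mux_playback_id,
            1 - (embedding <=> $1::vector) AS similarity_score
     FROM videos
     WHERE id <> $2
     ORDER BY embedding <=> $1::vector
     LIMIT $3`,
    [JSON.stringify(queryEmbedding), sourceVideoId, topK]
  );
  return rows;
}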
Dedicated vector databases are optimized for similarity search:
// Example with Pinecone
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY,
});

const index = pinecone.index('video-recommendations');

async function storeEmbedding(videoId, embedding, metadata) {
  await index.upsert([{
    id: videoId,
    values: embedding,
    metadata: metadata
  }]);
}

async function findSimilarVideos(queryEmbedding, topK = 5) {
  const queryResponse = await index.query({
    vector: queryEmbedding,
    topK: topK,
    includeMetadata: true,
    includeValues: true
  });

  return queryResponse.matches;
}
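Putting the pieces together, a recommendation endpoint can query for a video's nearest neighbors and shape the result into the response format shown at the top of this guide. Here's a rough sketch reusing the Pinecone index above; it assumes the metadata stored with each vector includes title, mux_asset_id, and mux_playback_id, and it queries by record ID so the transcript doesn't need to be re-embedded:

async function getRecommendations(videoId) {
  // Query by record ID; topK is 6 because the source video itself
  // will come back as the closest match.
  const queryResponse = await index.query({
    id: videoId,
    topK: 6,
    includeMetadata: true,
  });

  const recommendations = queryResponse.matches
    .filter((match) => match.id !== videoId) // drop the source video
    .slice(0, 5)
    .map((match) => ({
      id: match.id,
      title: match.metadata.title,
      similarity_score: match.score,
      mux_asset_id: match.metadata.mux_asset_id,
      mux_playback_id: match.metadata.mux_playback_id,
    }));

  return { id: videoId, recommendations };
}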
This approach gives you a content-based recommendation system that understands the semantic meaning of your videos, helping viewers discover relevant content based on what they're actually watching.