What is video encoding and how to encode videos

Video encoding is the process of creating a video file using a given codec. The video codec chosen is like a template that describes how the resulting data should be stored, compressed, and eventually decoded for playback.

For example, H.264 is a common codec for encoding video. To play back a video file that has been created with it, you need an H.264 decoder. Decoding can be done in software or with dedicated hardware to make it faster and more efficient.

What’s the difference between encoding and transcoding?

You may see the term “transcoding” used interchangeably with “encoding”. Transcoding is the process of decoding (decompressing) video from one codec and then encoding (compressing) it into another. Strictly speaking, encoding on its own only happens when the video is first captured, turning captured light into digital data.

Even though most people call it encoding, they’re usually referring to transcoding because one video format is being turned into a different one.

It's also common to hear someone say they are "converting" the video into a different format instead of using the word "transcoding".

What's the difference between a codec and container?

Codecs describe how to compress and store a video stream. Container formats control how video and audio tracks are grouped together into the same file, sometimes including text tracks for captions as well. Audio and video tracks use different codecs, so something else is needed to package them together.

For example, H.264 is a codec that is often delivered in an MP4 (MPEG-4) container. Your video file will likely have the .mp4 extension for this. Confusingly, there are both codecs and containers that have "MPEG" in the name.

Common web codecs include H.264, H.265 (aka HEVC), AV1, VP8, and VP9.

The most common container formats on the web are MP4, WebM, and MOV.

There are many other specialist codecs and containers used for traditional video recording and editing, where very high quality source files are required.

Why encode a video for video streaming?

Encoding with an efficient codec makes viewing video over the internet possible. Uncompressed video, which is video stored "raw", produces massive files. They are so large that very little footage fits on physical media like DVDs or Blu-ray discs, and storing them usually requires very large hard drives.

For example, uncompressed 1080p video can have a bitrate of 2.98 Gbit/s (gigabits per second). The average home internet connection very rarely reaches gigabit speeds, so it could not receive uncompressed video fast enough to play it in real time.
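A figure of this size follows from simple arithmetic. The sketch below assumes 1920×1080 frames, 24 bits per pixel (8 bits per color channel) and 60 frames per second. The article doesn't state these parameters, so treat them, the function name and the 100 Mbit/s connection speed as assumptions for illustration.

```python
def uncompressed_bitrate_bps(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Bits per second needed to store every pixel of every frame."""
    return width * height * bits_per_pixel * fps


# Assumed parameters: 1080p, 24 bits per pixel, 60 frames per second.
bitrate = uncompressed_bitrate_bps(1920, 1080, 24, 60)
print(bitrate)  # 2985984000 bits per second, just under 3 Gbit/s

# Assumed home connection of 100 Mbit/s: how many times too slow is it
# to receive this stream in real time?
connection_bps = 100_000_000
print(bitrate / connection_bps)
```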

Internet video streaming is more complicated than just encoding one input file into one output file, though. To deliver a great user experience across a variety of internet connections, you actually need to encode the input file to many different output files at different resolutions and compression rates.

This approach is referred to as adaptive bitrate streaming, or ABR. This is what allows you to switch between different resolutions when watching video on the internet: all of these different resolutions have already been encoded, ready for you to switch between during playback.

You should carefully consider what encoding settings to use for each rendition (output file) of the original video so that playback is smooth for a variety of internet connection speeds.
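To make the idea of renditions concrete, here is a minimal sketch of how a player might choose between them. The ladder values, the `Rendition` type, the `pick_rendition` function and the 80% headroom factor are all hypothetical and chosen only for illustration; real players and services use their own, more sophisticated logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target video bitrate


# Illustrative ladder only; real ladders are tuned per service and per video.
LADDER = [
    Rendition("1080p", 1080, 5000),
    Rendition("720p", 720, 2800),
    Rendition("480p", 480, 1400),
    Rendition("360p", 360, 800),
]


def pick_rendition(measured_kbps: float, ladder=LADDER, headroom: float = 0.8) -> Rendition:
    """Pick the highest-bitrate rendition that fits within a share of the measured bandwidth."""
    budget = measured_kbps * headroom
    candidates = [r for r in ladder if r.bitrate_kbps <= budget]
    if candidates:
        return max(candidates, key=lambda r: r.bitrate_kbps)
    # Nothing fits: fall back to the lowest-bitrate rendition.
    return min(ladder, key=lambda r: r.bitrate_kbps)


print(pick_rendition(4000).name)  # 720p
```

A player would repeat this choice as its bandwidth estimate changes, which is why every rendition has to exist before playback starts.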

If you've made it this far, you might be interested in the Mux Video API to encode your video files.

How does video encoding work?

Uncompressed video is very simple: store the color of every pixel of every frame, and you have a perfect image for every frame of the video, ready to play. The big drawback is the huge file sizes this produces.

Encoding works by analysing the input file for patterns and then storing those patterns in the output file in a way that results in smaller file sizes. It takes up much less space to say "the next 1000 pixels are blue" than to write "blue" 1000 times in a row, once for each pixel.
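The "next 1000 pixels are blue" idea is known as run-length encoding. The sketch below is a toy version of it, with hypothetical function names. Real video codecs rely on far more sophisticated techniques, so this only illustrates the principle of storing a pattern instead of every pixel.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse runs of identical values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


row = ["blue"] * 1000 + ["white"] * 3
encoded = rle_encode(row)
print(encoded)  # [('blue', 1000), ('white', 3)]

# Decoding gives back exactly what went in.
assert rle_decode(encoded) == row
```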

Different codecs introduce increasingly complex pattern recognition to compress the image more efficiently. When the compressed video can be decoded into an exact copy of the original input, we say the compression is "lossless". If data is thrown away during compression in order to make the file smaller, we call it "lossy", because there's no way of getting the original data back from the compressed version.
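The difference can be seen on a handful of sample values. This is only a sketch: it uses Python's general-purpose `zlib` module as a stand-in for a lossless codec, and a crude "drop the low bits" step as a stand-in for lossy compression. Neither is how a video codec actually works.

```python
import zlib

samples = bytes([10, 11, 12, 13, 200, 201, 202, 203])

# Lossless: decompressing returns exactly the original bytes.
restored = zlib.decompress(zlib.compress(samples))
print(restored == samples)  # True

# Lossy (toy example): drop the 4 least significant bits of each sample.
# Nearby values collapse into one, which compresses better, but the
# original values can no longer be recovered.
quantised = bytes(value & 0xF0 for value in samples)
print(list(quantised))  # [0, 0, 0, 0, 192, 192, 192, 192]
```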

Codecs don't just look at nearby pixels, though. They also look at how patterns change over multiple frames of video and encode the differences that happen over time.

This results in different types of frames being stored in the final video. "I-frames" store all the information needed to produce a whole image, while "P-frames" and "B-frames" only store the differences from other frames: P-frames refer back to earlier frames, and B-frames can refer to both earlier and later frames. The trade-off is that much more information needs to be discarded to produce these in-between frames, so quality can sometimes suffer if there are too many of them.
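As a rough illustration of storing a full frame followed by differences, the sketch below treats a frame as a flat list of pixel values and records only the pixels that changed. The function names are hypothetical, and real codecs predict blocks of pixels using motion estimation rather than comparing pixels one by one.

```python
def frame_delta(previous, current):
    """Record only the pixels that changed, as {index: new_value}."""
    return {i: new for i, (old, new) in enumerate(zip(previous, current)) if old != new}


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the delta."""
    frame = list(previous)
    for i, value in delta.items():
        frame[i] = value
    return frame


full_frame = [0, 0, 0, 0, 5, 5, 0, 0]  # stored whole, like an I-frame
next_frame = [0, 0, 0, 0, 0, 5, 5, 0]  # the bright patch moved one pixel right

delta = frame_delta(full_frame, next_frame)
print(delta)  # {4: 0, 6: 5}

# Two changed pixels are enough to rebuild all eight.
assert apply_delta(full_frame, delta) == next_frame
```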

How long a video plays between I-frames is often referred to as the keyframe distance or GOP (group of pictures) size. A keyframe distance of 2 seconds at a frame rate of 30 frames per second means that there is a full I-frame every 60 frames. If this distance is too large, compression artefacts often appear because the video has been playing for too long without reaching a full frame.
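The keyframe arithmetic can be written out in a few lines. The helper names below are hypothetical, and the sketch assumes a constant frame rate with the first frame of the video being an I-frame.

```python
def gop_size_frames(keyframe_distance_s: float, fps: float) -> int:
    """Number of frames from one I-frame to the next."""
    return round(keyframe_distance_s * fps)


def keyframe_indices(total_frames: int, gop_size: int) -> list:
    """Frame numbers (starting at 0) that are stored as I-frames."""
    return list(range(0, total_frames, gop_size))


gop = gop_size_frames(2, 30)
print(gop)  # 60

# A 5-second clip at 30 frames per second has 150 frames.
print(keyframe_indices(150, gop))  # [0, 60, 120]
```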

How long does video encoding take?

For the vast majority of use cases, encoding is the most time-consuming part of processing video. How long it takes depends on a lot of variables, but generally a short video could take a few seconds, while a very long video with a large file size and an advanced codec could take many hours.

Most video encoding services encode the whole video before it can be used for playback. This is a big bottleneck for fast playback: after you upload a video file, you have to wait until the whole video has finished encoding before you can play any of it back. Mux specifically can play back a video of any length almost immediately because of a unique feature called just-in-time video encoding.

Instant video encoding for web streaming

Just-in-time encoding is an encoding process that lets viewers start watching a video even though the whole file hasn't been encoded yet. When a viewer requests a video for playback, the transcoding process starts and bytes of the video are delivered to the viewer as soon as the first frame of the video is encoded. Every Mux Video customer gets just-in-time encoding. Create an account to test it out.

The best encoding settings for file-size and quality

Encoding requires you to decide what trade-offs you’re willing to make between quality and file size.

For example, you may choose to encode a higher quality video that you can deliver to users with a fast internet connection. Or, you might limit the quality of the video so you can save money on storing it or to make it faster to deliver to users on slower connection speeds.

Quality in video encoding is normally controlled by:

Bitrate: how many "bits" of information are stored per second of video. It's normally expressed in Kbps (kilobits per second), Mbps (megabits per second) or Gbps (gigabits per second). Not to be confused with "megabytes" and "gigabytes", which normally describe the resulting total file size.
"Constant" or "variable" bitrate: whether the bitrate stays fixed or is allowed to fluctuate across the duration of the video.
Bit depth: how many bits are used to describe each color channel of each pixel. 8-bit is usual; 10-bit is used for higher quality encodes like UHD (ultra-HD) 4K Blu-rays.
Dynamic range (HDR): specialist codecs like HEVC are often needed to encode high dynamic range content, which usually requires a higher bit depth to accommodate the extra range.
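Because bitrate is measured in bits and file size in bytes, converting between them means dividing by 8. The sketch below uses a hypothetical helper and an assumed 5 Mbps, 10-minute video. It ignores audio and container overhead, so it is only an estimate.

```python
def file_size_bytes(bitrate_bps: float, duration_s: float) -> float:
    """Approximate size of a stream: bits per second x seconds, 8 bits per byte."""
    return bitrate_bps * duration_s / 8


# Assumed example: 5 Mbps video that runs for 10 minutes (600 seconds).
size = file_size_bytes(5_000_000, 600)
print(size / 1_000_000)  # 375.0 (megabytes)
```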

If you're using a video platform like Mux, then these settings have all been carefully chosen for you and encoding will be optimised for streaming to give you the best balance between speed, quality and cost.

Configuring all of these settings can often take a lot of trial and error to achieve the results that you're looking for. There are a few tools available to help with encoding your videos:

FFmpeg: a command line utility that can encode or transcode just about any codec imaginable into any other one. It has a steep learning curve, but you will have complete control over every aspect of the encoding process.
HandBrake: an open source tool for encoding video into many formats.
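As an illustration of what configuring these settings looks like with FFmpeg, the sketch below builds (but does not run) a command line for a single H.264 rendition. The helper function, file names and chosen values are hypothetical. The flags are standard FFmpeg options, but the sketch assumes your FFmpeg build includes the libx264 encoder and that the source plays at the frame rate you pass in.

```python
import shlex


def build_ffmpeg_command(source, output, height, video_kbps,
                         audio_kbps=128, fps=30, keyframe_seconds=2):
    """Return an FFmpeg argument list for one H.264/AAC rendition."""
    gop = fps * keyframe_seconds  # keyframe distance expressed in frames
    return [
        "ffmpeg",
        "-i", source,
        "-vf", f"scale=-2:{height}",  # resize to the target height, keep aspect ratio
        "-c:v", "libx264",            # H.264 video encoder
        "-b:v", f"{video_kbps}k",     # target video bitrate
        "-g", str(gop),               # maximum frames between I-frames
        "-c:a", "aac",                # AAC audio encoder
        "-b:a", f"{audio_kbps}k",     # target audio bitrate
        output,
    ]


command = build_ffmpeg_command("input.mov", "output_720p.mp4", 720, 2800)
print(shlex.join(command))
```

To run it, you could pass the list to `subprocess.run(command, check=True)` on a machine with FFmpeg installed. Producing a full set of renditions would mean calling the helper once per resolution and bitrate.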

Advancements in encoding quality

Normally, video platforms apply the same encoding settings to every single video uploaded. The drawback is that one size doesn't fit all when it comes to encoding videos. With generic settings, depending on how complex the content is, you might be using too many bits to store the video, or throwing away too much information and reducing the quality.

Per-title video encoding is a process that adjusts the encoding settings automatically for each video by analysing its unique patterns and visual complexity.

Advanced techniques like machine learning are often used to pre-process the video and work out which encoding settings would maximise quality for each individual video. Per-title encoding is included for every Mux account, so you can be confident that we're squeezing as much quality as possible out of every byte. Sign up today to see the difference.

Why use a video encoding service instead of doing it all yourself?

If you have just a few videos, you can use encoding software to encode the videos on your desktop by hand. If you need to encode lots of videos, though, this can become very tedious. You will also need to encode lots of different versions of each video so that users can pick different resolutions for playback.

On top of this, you will ideally want:

Your videos delivered by a CDN for quick delivery
To produce thumbnails, poster images and GIFs for previews
Automatic encoding to multiple resolutions with optimal settings
The option to allow users to upload videos directly instead of sending them to you first
Restricted playback, so that users can only watch videos that you grant access to
And lots of other considerations...

If you’re building an app with user-generated content (UGC), like the next YouTube, you’re going to need a process that can scale video upload and playback very quickly. It wouldn’t be feasible to manually encode each video, one at a time, on your desktop. A cloud video encoding service like Mux can encode thousands of videos quickly and reliably.
