Developer Blog

Tips and tricks for developers and IT enthusiasts

FFmpeg | Compress Video Files

Compress and Convert MP4 to WMV
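
As a minimal sketch, FFmpeg's built-in wmv2 and wmav2 encoders can handle this conversion (the bitrate values below are illustrative assumptions, not tuned recommendations):

ffmpeg -i input.mp4 -c:v wmv2 -b:v 1024k -c:a wmav2 -b:a 128k output.wmv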

Compress and Convert MP4 to WebM for YouTube, Instagram, Facebook

ffmpeg -i source.mp4 -c:v libvpx-vp9 -b:v 0.33M -c:a libopus -b:a 96k \
  -filter:v scale=960:540 target.webm

Compress and Convert H.264 to H.265 for Higher Compression

ffmpeg -i input.mp4 -vcodec libx265 -crf 28 output.mp4

Set CRF in FFmpeg to Reduce Video File Size

ffmpeg -i input.mp4 -vcodec libx264 -crf 24 output.mp4

Reduce video frame size to make 4K/1080P FHD video smaller

ffmpeg -i input.avi -vf scale=1280:720 output.avi

Command-line – resize video in FFmpeg to reduce video size

ffmpeg -i input.avi -vf scale=852:480 output.avi

Resize video in FFmpeg but keep the original aspect ratio

Specify only one component, width or height, and set the other component to -1, for example:

ffmpeg -i input.mp4 -vf scale=852:-1 output.mp4
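
Note that most video encoders, libx264 included, require even frame dimensions, and -1 can produce an odd value. Using -2 instead keeps the aspect ratio while rounding the computed size to an even number:

ffmpeg -i input.mp4 -vf scale=852:-2 output.mp4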

Converting WebM to MP4

ffmpeg -i video.webm video.mp4

Since MP4 players generally expect H.264 video and AAC audio, when the WebM file contains VP8 or VP9 video you have no choice but to transcode both the video and the audio.

Video conversion can be a lengthy and CPU-intensive process, depending on file size, video and audio quality, video resolution, and so on, but FFmpeg provides a series of presets and controls to help you optimize for quality or for faster conversions.

A note on video quality

When encoding video with H.264, the video quality can be controlled using a quantizer scale (the crf value; crf stands for Constant Rate Factor), which takes values from 0 to 51: 0 is lossless, 23 the default and 51 the worst possible. So the lower the value, the better the quality. You can leave the default value or, for a smaller file, raise it:

ffmpeg -i video.webm -crf 26 video.mp4

Video presets

FFmpeg also provides several quality presets, each calibrated to a certain trade-off between encoding speed and compression ratio: a slower preset gives better compression. The following presets are available, from fastest to slowest: ultrafast, superfast, veryfast, faster, fast, medium, slow, slower and veryslow. The default preset is medium, but you can choose a faster one:

ffmpeg -i video.webm -preset veryfast video.mp4

Placing the MOOV atom at the beginning

All MP4 files contain a moov atom, which holds index information such as the length of the video. If it's at the beginning of the file, a streaming video player can start playing and scrubbing the MP4 immediately. By default FFmpeg places the moov atom at the end of the MP4 file, but it can place it at the beginning with the -movflags faststart option, like this:

ffmpeg -i video.webm -movflags faststart video.mp4

Using Levels and Profiles when encoding H.264 video with FFmpeg

To ensure the highest compatibility with older iOS or Android devices you will need to use certain encoding profiles and levels. For example, a video encoded with the High profile and Level 4.2 will play on the iPhone 5S and newer, but not on older iOS devices.

ffmpeg -i video.webm -movflags faststart -profile:v high -level 4.2 video.mp4

Converting WebM with H.264 video to MP4

In some rare cases the .webm file will contain H.264 video and Vorbis or Opus audio (for example, .webm files created using the MediaRecorder API on Chrome 52+). In such cases you don't have to re-encode the video data, since it's already in the desired H.264 format. Re-encoding is also not recommended, since you'd lose some quality while consuming CPU cycles, so we're just going to copy over the video data.

To copy the video data and transcode the audio in FFmpeg you use the -c:v copy option:

ffmpeg -i video.webm -c:v copy video.mp4

BeautifulSoup | Complete Cheatsheet with Examples

Installation

pip install beautifulsoup4
from bs4 import BeautifulSoup

Creating a BeautifulSoup Object

Parse HTML string:

html = "<p>Example paragraph</p>"
soup = BeautifulSoup(html, 'html.parser')

Parse from file:

with open("index.html") as file:
  soup = BeautifulSoup(file, 'html.parser')

BeautifulSoup Object Types

When parsing documents and navigating the parse trees, you will encounter the following main object types:

Tag

A Tag corresponds to an HTML or XML tag in the original document:

soup = BeautifulSoup('<p>Hello World</p>', 'html.parser')
p_tag = soup.p

p_tag.name # 'p'
p_tag.string # 'Hello World'

Tags contain nested Tags and NavigableStrings.
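
For example, nested elements can be reached by chaining tag names (a tiny sketch with made-up markup):

soup = BeautifulSoup('<div><p>Hi</p></div>', 'html.parser')
soup.div.p.string # 'Hi'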

NavigableString

A NavigableString represents text content without tags:

soup = BeautifulSoup('Hello World', 'html.parser')
text = soup.string

text # 'Hello World'
type(text) # bs4.element.NavigableString

BeautifulSoup

The BeautifulSoup object represents the parsed document as a whole. It is the root of the tree:

soup = BeautifulSoup('<html>...</html>', 'html.parser')

soup.name # '[document]'
soup.head # <head> Tag element

Comment

Comments in HTML are also available as Comment objects:

<!-- This is a comment -->

import re

comment = soup.find(text=re.compile('This is'))
type(comment) # bs4.element.Comment

Knowing these core object types helps when analyzing, searching, and navigating parsed documents.
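
A quick way to see these types side by side (markup made up for illustration):

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>Hello <!-- hidden --></p>', 'html.parser')

type(soup) # bs4.BeautifulSoup
type(soup.p) # bs4.element.Tag
type(soup.p.contents[0]) # bs4.element.NavigableString
type(soup.p.contents[1]) # bs4.element.Comment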

Searching the Parse Tree

By Name

HTML:

<div>
  <p>Paragraph 1</p>
  <p>Paragraph 2</p>
</div>

Python:

paragraphs = soup.find_all('p')
# <p>Paragraph 1</p>, <p>Paragraph 2</p>

By Attributes

HTML:

<div id="content">
  <p>Paragraph 1</p>
</div>

Python:

div = soup.find(id="content")
# <div id="content">...</div>

By Text

HTML:

<p>This is some text</p>

Python:

text = soup.find(text="This is some text")
# returns the matching NavigableString, not the tag
p = text.parent
# <p>This is some text</p>

Searching with CSS Selectors

CSS selectors provide a very powerful way to search for elements within a parsed document.

Some examples of CSS selector syntax:

By Tag Name

Select all <p> tags:

soup.select("p")

By ID

Select element with ID “main”:

soup.select("#main")

By Class Name

Select elements with class “article”:

soup.select(".article")

By Attribute

Select tags with a “data-category” attribute:

soup.select("[data-category]")

Descendant Combinator

Select paragraphs inside divs:

soup.select("div p")

Child Combinator

Select direct children paragraphs:

soup.select("div > p")

Adjacent Sibling

Select an h2 immediately after an h1:

soup.select("h1 + h2")

General Sibling

Select any h2 sibling that follows an h1:

soup.select("h1 ~ h2")

By Text

Select elements containing text (a non-standard selector provided by the soupsieve backend in BeautifulSoup 4.7+; the bare :contains form is a deprecated alias):

soup.select("p:-soup-contains('Some text')")

By Attribute Value

Select input with type submit:

soup.select("input[type='submit']")

Pseudo-classes

Select first paragraph:

soup.select("p:first-of-type")

Chaining

Select first article paragraph:

soup.select("article > p:nth-of-type(1)")

Accessing Data

HTML:

<p class="content">Some text</p>

Python:

p = soup.find('p')
p.name # "p"
p.attrs # {"class": ["content"]} (class values are always lists)
p.string # "Some text"

The Power of find_all()

The find_all() method is one of the most useful and versatile searching methods in BeautifulSoup.

Returns All Matches

find_all() will find and return a list of all matching elements:

all_paras = soup.find_all('p')

This gives you all paragraphs on a page.

Flexible Queries

You can pass a wide range of queries to find_all():

Name – find_all('p')
Attributes – find_all('a', class_='external')
Text – find_all(text=re.compile('summary'))
Limit – find_all('p', limit=2)

And more!

Useful Features

Some useful things you can do with find_all():

Get a count – len(soup.find_all('p'))
Iterate through results – for p in soup.find_all('p'):
Convert to text – [p.get_text() for p in soup.find_all('p')]
Extract attributes – [a['href'] for a in soup.find_all('a')]

Why It’s Useful

In summary, find_all() is useful because:

It returns all matching elements
It supports diverse and powerful queries
It makes it easy to extract and process result data

Whenever you need to get a collection of elements from a parsed document, find_all() will likely be your go-to tool.
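
Here is a short end-to-end sketch of these patterns (the HTML is made up for illustration):

from bs4 import BeautifulSoup

html = """
<div>
  <p>Intro</p>
  <p>Details</p>
  <a href="/home">Home</a>
  <a href="/about">About</a>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

len(soup.find_all('p')) # 2
[p.get_text() for p in soup.find_all('p')] # ['Intro', 'Details']
[a['href'] for a in soup.find_all('a')] # ['/home', '/about']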

Navigating Trees

Besides searching, you can traverse the parse tree directly: down to children, up to parents, and sideways to siblings, as the sketch below shows.
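
A minimal sketch of the most common navigation properties (markup made up for illustration):

from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><p>One</p><p>Two</p></div>', 'html.parser')
p = soup.find('p')

p.parent.name # 'div' (up)
p.next_sibling # <p>Two</p> (sideways)
list(p.parent.children) # [<p>One</p>, <p>Two</p>] (down)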

Modifying the Parse Tree

BeautifulSoup provides several methods for editing and modifying the parsed document tree.

HTML:

<p>Original text</p>

Python:

p = soup.find('p')
p.string = "New text"

Edit Tag Names

Change an existing tag name:

tag = soup.find('span')
tag.name = 'div'

Edit Attributes

Add, modify or delete attributes of a tag:

tag['class'] = 'header' # set attribute
tag['id'] = 'main'

del tag['class'] # delete attribute

Edit Text

Change text of a tag:

tag.string = "New text"

Append text to a tag:

tag.append("Additional text")

Insert Tags

Insert a new tag:

new_tag = soup.new_tag("h1")
tag.insert_before(new_tag)

Delete Tags

Remove a tag entirely:

tag.extract()

Wrap/Unwrap Tags

Wrap a tag in a new parent tag:

tag.wrap(soup.new_tag('div'))

Unwrap its contents:

tag.unwrap()

Modifying the parse tree is very useful for cleaning up scraped data or extracting the parts you need.
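
For example, a small cleanup sketch (markup made up for illustration) that removes every script tag and unwraps span wrappers:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p><span>Keep me</span><script>alert(1)</script></p>', 'html.parser')

for script in soup.find_all('script'):
  script.decompose() # remove the tag and its contents entirely

for span in soup.find_all('span'):
  span.unwrap() # keep the contents, drop the wrapper

print(soup) # <p>Keep me</p>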

Outputting HTML

Input HTML:

<p>Hello World</p>

Python:

print(soup.prettify())

# <p>
#  Hello World
# </p>

Integrating with Requests

Fetch a page:

import requests

res = requests.get("https://example.com")
soup = BeautifulSoup(res.text, 'html.parser')

Parsing Only Parts of a Document

When dealing with large documents, you may want to parse only a fragment rather than the whole thing. BeautifulSoup allows for this using SoupStrainers.

There are a few ways to parse only parts of a document:

By Tag Name

A SoupStrainer filters with the same arguments as find(), so the simplest filter is a tag name. Parse only specific tags:

from bs4 import SoupStrainer

only_tables = SoupStrainer("table")
soup = BeautifulSoup(doc, 'html.parser', parse_only=only_tables)

This will parse only the <table> tags from the document. The same pattern works for any tag name:

only_divs = SoupStrainer("div")
soup = BeautifulSoup(doc, 'html.parser', parse_only=only_divs)

By Function

Pass a function to test if a tag should be parsed:

def is_short_string(string):
  return len(string) < 20

only_short_strings = SoupStrainer(string=is_short_string)
soup = BeautifulSoup(doc, 'html.parser', parse_only=only_short_strings)

This keeps only the strings whose text passes the test function.

By Attributes

Parse tags that contain specific attributes:

has_data_attr = SoupStrainer(attrs={"data-category": True})
soup = BeautifulSoup(doc, 'html.parser', parse_only=has_data_attr)

Multiple Conditions

You can combine multiple strainers:

strainer = SoupStrainer("div", id="main")
soup = BeautifulSoup(doc, 'html.parser', parse_only=strainer)

This will parse only <div> tags with id="main".

Parsing only parts you need can help reduce memory usage and improve performance when scraping large documents.

Dealing with Encoding

When parsing documents, you may encounter encoding issues. Here are some ways to handle encoding:

Specify at Parse Time

Pass the from_encoding parameter when creating the BeautifulSoup object:

soup = BeautifulSoup(doc, 'html.parser', from_encoding='utf-8')

This handles the decoding needed when the input document is a byte string.

Encode Tag Contents

You can encode the contents of a tag:

tag.string.encode("utf-8")

Use this when outputting tag strings.

Encode Entire Document

To encode the entire BeautifulSoup document:

soup.encode("utf-8")

This returns a byte string with the encoded document.

Pretty Print with Encoding

Specify an encoding when pretty printing (prettify() then returns bytes rather than a string):

print(soup.prettify(encoding="utf-8"))

Unicode Dammit

BeautifulSoup’s UnicodeDammit class can detect and convert incoming documents to Unicode:

from bs4 import UnicodeDammit

dammit = UnicodeDammit(doc)
markup = dammit.unicode_markup # the document decoded to a Unicode string
soup = BeautifulSoup(markup, 'html.parser')

This converts even poorly encoded documents to Unicode.

Properly handling encoding ensures your scraped data is decoded and output correctly when using BeautifulSoup.