Sai A
Updated date Aug 03, 2023
In this blog, we will learn how to convert HTML to plain text in PHP. Discover multiple methods, including the built-in strip_tags() function, regular expressions, the DOMDocument class, and the HTML Purifier library.

Introduction:

Converting HTML to plain text is a common task whenever you need only the readable content of a page or snippet, without its markup. In this blog, we will explore several PHP techniques for this conversion. For each method we provide sample code and an explanation, so you can choose the one that fits your specific use case.

Method 1: Stripping Tags Using strip_tags() Function

One of the simplest and most straightforward methods to convert HTML to plain text in PHP is by using the built-in strip_tags() function. This function removes all HTML and PHP tags from a given string, leaving only the plain text content.

$htmlContent = '<p>Hello, <strong>World!</strong></p>';
$plainText = strip_tags($htmlContent);
echo $plainText;

Output:

Hello, World!

The strip_tags() function takes two parameters: the input string containing HTML content and an optional second parameter to specify allowed tags if you want to retain specific tags. In our example, we used the default behavior, which removes all tags and returns the plain text.
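To illustrate the optional second parameter, here is a small sketch (the sample markup is made up for this example). Passing an array of tag names to keep requires PHP 7.4 or later; older versions accept a string such as '<a>' instead. Because strip_tags() leaves HTML entities untouched, the sketch also runs the fully stripped result through html_entity_decode().

```php
<?php
// Sample markup invented for this illustration.
$html = '<p>Fish &amp; Chips <em>today</em></p><br>Tomorrow: <a href="/menu">menu</a>';

// Keep <a> tags (including their attributes) and strip every other tag.
$withLinks = strip_tags($html, ['a']);

// Strip every tag, then turn entities such as &amp; back into plain characters.
$plain = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $withLinks, PHP_EOL; // Fish &amp; Chips todayTomorrow: <a href="/menu">menu</a>
echo $plain, PHP_EOL;     // Fish & Chips todayTomorrow: menu
```

Notice that "today" and "Tomorrow" run together: strip_tags() deletes the tags between them without inserting any whitespace, so you may need to add spaces or line breaks around block elements yourself.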

Method 2: Using Regular Expressions

Another approach is to use regular expressions to remove the tags from the HTML content. Regular expressions give you more control, for example if you want to transform the text further or treat particular tags differently. However, they match patterns rather than parse HTML, so they are best suited to simple, predictable input.

$htmlContent = '<p>Hello, <strong>World!</strong></p>';
$plainText = preg_replace('/<[^>]*>/', '', $htmlContent);
echo $plainText;

Output:

Hello, World!

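In this method, we used preg_replace() to replace every HTML tag with an empty string. The pattern /<[^>]*>/ matches a <, then any run of characters other than >, then a closing >, which covers ordinary tags together with their attributes. Because the pattern does not understand HTML, it can misfire on edge cases such as a > inside an attribute value, and it keeps the contents of <script> and <style> elements as text.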
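If you do take the regular-expression route, a somewhat sturdier sketch removes <script> and <style> blocks first, replaces tags with a space rather than nothing, decodes entities, and collapses whitespace. The function name regexHtmlToText() is hypothetical, and this is still pattern matching rather than parsing, so treat it as suitable only for simple, trusted input.

```php
<?php
/**
 * Hypothetical helper: regex-based HTML-to-text conversion.
 * Suitable for simple, trusted snippets; it is not a real HTML parser.
 */
function regexHtmlToText(string $html): string
{
    // 1. Drop <script> and <style> elements together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);

    // 2. Replace every remaining tag with a space so adjacent words do not merge.
    $text = preg_replace('/<[^>]*>/', ' ', $html);

    // 3. Decode entities such as &amp; and &lt; into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 4. Collapse runs of whitespace into single spaces and trim the ends.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo regexHtmlToText('<style>p{color:red}</style><p>Hello,</p><p><strong>World!</strong></p>');
// prints: Hello, World!
```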

Method 3: Using DOMDocument Class

PHP's DOMDocument class parses HTML into a tree of nodes, which makes it more robust than pattern matching on the raw markup. We can use it to extract the combined text content of all the nodes in the document.

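$htmlContent = '<p>Hello, <strong>World!</strong></p>';
$dom = new DOMDocument();
// Suppress libxml warnings for markup it does not recognize (e.g. HTML5 tags)
libxml_use_internal_errors(true);
$dom->loadHTML($htmlContent);
// Read the text of the root <html> element that loadHTML() wraps around the fragment
$plainText = $dom->documentElement->textContent;
echo $plainText;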

Output:

Hello, World!

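Here, we created a new DOMDocument object and loaded the HTML content into it using the loadHTML() method, which parses the fragment and wraps it in <html> and <body> elements. The textContent property then returns the concatenated text of all descendant nodes, with every tag removed and entities such as &amp; decoded. Note that loadHTML() treats input as ISO-8859-1 unless the markup declares a charset, so UTF-8 text containing non-ASCII characters can come out garbled.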
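Building on this, a slightly fuller sketch (the function name domHtmlToText() is hypothetical) hides libxml parse warnings, prepends a charset declaration so UTF-8 input is read correctly, deletes <script> and <style> elements before reading textContent, and collapses whitespace. The meta-tag approach to declaring the encoding is one common workaround, not the only one.

```php
<?php
/**
 * Hypothetical helper: DOM-based HTML-to-text conversion.
 * Assumes UTF-8 input and PHP's dom extension (enabled by default).
 */
function domHtmlToText(string $html): string
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Declare UTF-8 so the HTML parser does not fall back to ISO-8859-1.
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove <script> and <style> elements so their code does not end up in the text.
    foreach (['script', 'style'] as $tag) {
        $nodes = $dom->getElementsByTagName($tag);
        // Iterate backwards because the node list changes as nodes are removed.
        for ($i = $nodes->length - 1; $i >= 0; $i--) {
            $node = $nodes->item($i);
            $node->parentNode->removeChild($node);
        }
    }

    // textContent concatenates the text of every descendant node.
    $text = $dom->documentElement ? $dom->documentElement->textContent : '';

    // Collapse runs of whitespace into single spaces and trim the ends.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo domHtmlToText('<p>Caf&eacute; &amp; <strong>Bar</strong></p><script>track();</script>');
// prints: Café & Bar
```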

Method 4: Using HTML Purifier Library

If your HTML content is untrusted, consider a dedicated library such as HTML Purifier. HTML Purifier is primarily a sanitizer: it removes malicious code and outputs standards-compliant markup. By configuring it to allow no elements at all, you can also use it to strip every tag from the input.

First, you need to install the HTML Purifier library using Composer:

composer require ezyang/htmlpurifier

Then, you can use the following PHP code:

require 'vendor/autoload.php';

$config = \HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', '');
$purifier = new \HTMLPurifier($config);

$htmlContent = '<p>Hello, <strong>World!</strong></p>';
$plainText = $purifier->purify($htmlContent);
echo $plainText;

Output:

Hello, World!

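In this method, we used HTML Purifier to sanitize the HTML content and strip it down to text. Setting HTML.Allowed to an empty string permits no elements, so every tag is removed. Keep in mind that the result is still HTML-escaped text (for example, & is output as &amp;), so pass it through html_entity_decode() if you need true plain text, and escape it again before printing it into a web page.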

Conclusion:

Converting HTML to plain text is a common task in web development and data analysis. In this blog, we explored several ways to do it in PHP: we started with the simple strip_tags() function, moved on to regular expressions, used the DOMDocument class to parse the markup properly, and finally introduced the HTML Purifier library for untrusted input. For quick jobs on trusted markup, strip_tags() is usually enough; DOMDocument copes better with messy HTML, and HTML Purifier is the better fit when the input cannot be trusted.
