There are times when HTML gets in the way, whether you’re parsing the text or saving it. Removing HTML tags by hand is a pain; that’s why the “HTML to Text” action is a godsend. It removes the HTML tags and gives us the “raw” text that we can use however we want. Let’s explore it.
Where to find it?
You can find it under Standard.
Here’s what it looks like.
You can pass any value to the “HTML to text” action: for example, a trigger parameter, a variable, the output of a compose action, or raw HTML that you type into the field yourself.
There are a few things to keep in mind: first how the conversion works, and then the size of the HTML.
I’ll list them as they are in Microsoft’s Reference and then provide some context:
- The max line length is 80 characters; afterward, a line break will follow.
- For link elements that follow the structure `<a href='link'>text</a>`, the result becomes `text[link]`. If text and link are the same, only text will be present.
- Headers (`<h2>`, etc.) are uppercased.
- Heading cells (`<th>`) are uppercased.
- Empty lines will be trimmed as a space-saving measure.
- Unordered lists will use * as a prefix.
- There will be 3 spaces between data table columns.
- There will be 0 empty lines between data table rows.
- `href='#...'` will be ignored.
- New lines `\n` from the input HTML will be collapsed into a space, like any other HTML whitespace characters.
These are not limitations per se, but notice a pattern in the conversion. For example, links will be converted to `text[link]`, which resembles Markdown’s link notation (standard Markdown writes links as `[text](link)`, but this is how Microsoft represents them). The same goes for the *, which represents an unordered list in Markdown. This does not mean that the “HTML to text” action converts HTML to Markdown, but since you know what will be converted, you can use these patterns to your advantage.
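To make a few of these rules concrete, here is a minimal Python sketch that mimics only three of them: the link rule, the uppercasing of headers and heading cells, and the * prefix for list items. It is not Microsoft’s implementation, and the names `RuleSketch` and `sketch_convert` are made up for this illustration. It ignores line wrapping, tables, and whitespace collapsing entirely.

```python
from html.parser import HTMLParser


class RuleSketch(HTMLParser):
    """Toy converter that mimics three of the listed rules, nothing more."""

    UPPERCASED = {"h1", "h2", "h3", "h4", "h5", "h6", "th"}

    def __init__(self):
        super().__init__()
        self.parts = []       # converted text fragments, in document order
        self.in_link = False  # True while inside an <a> element
        self.href = None      # href of the open <a>, if it has one
        self.link_text = []   # text collected inside the open <a>
        self.upper = False    # True while inside a header or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.href = dict(attrs).get("href")
            self.link_text = []
        elif tag in self.UPPERCASED:
            self.upper = True
        elif tag == "li":
            self.parts.append("* ")

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            text = "".join(self.link_text)
            href = self.href
            self.in_link = False
            # No href, an in-page anchor, or text equal to the link: keep only the text.
            if not href or href.startswith("#") or href == text:
                self.parts.append(text)
            else:
                self.parts.append(f"{text}[{href}]")
        elif tag in self.UPPERCASED:
            self.upper = False
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n")

    def handle_data(self, data):
        if self.upper:
            data = data.upper()
        if self.in_link:
            self.link_text.append(data)
        else:
            self.parts.append(data)


def sketch_convert(html: str) -> str:
    """Run the toy converter over an HTML string and return the text."""
    parser = RuleSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    sample = "<h2>Title</h2><ul><li>one</li><li><a href='https://example.com'>site</a></li></ul>"
    print(sketch_convert(sample))
```

Running a few small inputs through a sketch like this is a cheap way to build intuition for what the real action will return, but always check the actual output in your Flow.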
Since HTML files can be quite big, Microsoft imposes a limit of 5 MB on the file size and a depth of 250 on the HTML DOM tree. DOM stands for “Document Object Model,” and you can think of it this way: an HTML file is made of elements nested inside other elements, for example a document that contains a body, which contains a paragraph, which in turn contains text and links.
In other words, it’s a tree structure. Inside the text, you can have more links, tables, images, and more. The limit means that elements can only be nested up to 250 levels deep. That may look like a lot, but HTML can get very complex very quickly, so be aware of this limit if you’re not getting all values converted correctly.
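If you want a rough idea of whether a piece of HTML is near these limits before sending it to the action, a sketch like the following can estimate the size and the nesting depth. It assumes that the 5 MB limit is measured in bytes of UTF-8 text and that the 250 limit refers to nesting depth; both are interpretations, not confirmed details. The names `DepthCounter`, `max_depth`, and `within_limits` are hypothetical, and the depth count is only reliable for well-formed HTML where every non-void element is closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they never contain children.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

MAX_BYTES = 5 * 1024 * 1024  # assumed reading of the 5 MB limit
MAX_DEPTH = 250              # assumed reading of the 250 DOM tree limit


class DepthCounter(HTMLParser):
    """Tracks the deepest element nesting seen while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # A void element is a leaf, one level below its parent.
            self.max_depth = max(self.max_depth, self.depth + 1)
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def max_depth(html: str) -> int:
    """Return the deepest nesting level found in the HTML."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


def within_limits(html: str) -> bool:
    """Check the HTML against the assumed size and depth limits."""
    size_ok = len(html.encode("utf-8")) <= MAX_BYTES
    return size_ok and max_depth(html) <= MAX_DEPTH


if __name__ == "__main__":
    page = "<html><body><p>Hi <a href='x'>link</a><br></p></body></html>"
    print(max_depth(page), within_limits(page))
```

Treat the result as an early warning only; the action itself is the final judge of what it accepts.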
Here are some things to keep in mind.
First, convert and then format.
As you can notice, the “HTML to Text” allows for the formatting of the value.
I would recommend, though, that you leave it “as is,” convert the HTML to text, and in the next action, do the necessary formatting.
It’s important to keep the “source” text unchanged. The “HTML to text” action produces a converted value that you can refer to anywhere in your Flow. This matters because, in the future, you may need the same converted text formatted in a different way, and with conversion and formatting already separated, you only have to add or change the formatting step.
Also, it’s important to have only one “step” per action. In this action, we’re converting the text, and this enables us to debug to see if the text was converted properly. If we apply any formatting, then we won’t know if it was the conversion or the formatting that made the changes that we see.
Be careful with the limitations.
Both file size and DOM depth can be a problem with complex HTML. Break it down and parse it in sections if you have to. Find a point where you can safely break your HTML into sections, like a `<br>` tag, and then combine all of the converted sections at the end.
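The split-and-combine idea can be sketched in a few lines of Python. Here `convert` is a hypothetical stand-in for whatever does the real conversion (in a Flow, that would be the “HTML to Text” action applied to each section), and `split_at_breaks` and `convert_in_sections` are names invented for this example. Note that splitting at `<br>` is only safe when the tag is not nested inside an element that would be left open in one section and closed in the next.

```python
import re
from typing import Callable, List

# Matches <br>, <br/>, <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def split_at_breaks(html: str) -> List[str]:
    """Split HTML at <br> tags, dropping sections that are only whitespace."""
    return [part for part in BR_TAG.split(html) if part.strip()]


def convert_in_sections(html: str, convert: Callable[[str], str]) -> str:
    """Convert each section separately and join the results with line breaks."""
    return "\n".join(convert(section) for section in split_at_breaks(html))


if __name__ == "__main__":
    # Placeholder converter for the demo only: it just strips tags.
    def strip_tags(section: str) -> str:
        return re.sub(r"<[^>]+>", "", section).strip()

    html = "<b>one</b><br><i>two</i><BR/>  <br />three"
    print(convert_in_sections(html, strip_tags))
```

Joining with a line break keeps the visual separation that the `<br>` tags provided in the original HTML.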
Don’t try to do it yourself.
It’s complex to parse HTML, so don’t try to do it yourself. If you have huge files, break them down and use the “HTML to Text” action to parse them. There are many hidden complexities in parsing HTML, so always defer to the actions to do the heavy lifting for you.
Name it correctly
The name is super important in this case since we’re getting information from somewhere. It’s important to know where it comes from and what it is. Always build the name so that other people can understand what you are using without opening the action and checking the details.
Always add a comment.
Adding a comment will also help to avoid mistakes. Indicate where the HTML comes from and what you are expecting to get. It’s important to enable faster debugging when something goes wrong.
Always deal with errors.
Have your Flow fail gracefully when the file doesn’t exist, and notify someone that something failed. It’s horrible to have failing Flows in Power Automate since they may go unnoticed for a while or generate even worse errors. I have a template that you can use to help you make your Flow resistant to issues. You can check all details here.