PowerShell: How to replace accents in strings?

Working with text can be hard, especially when you need “clean” text to create a resource in a system that doesn’t support special characters. Say, for example, that you want to create a site for each of your employees. You can write a simple script that uses the person’s name as the site name, but people may have accents or other special characters in their names. In that case, we would like to replace each accented character with its plain counterpart rather than remove it.

There are many ways to solve this, so let’s look at two of them.

Nuclear. Remove them all

One solution could be to simply remove them, as demonstrated here with the following function:

function Remove-SpecialCharacters {
    param ([String]$sourceStringToClean = [String]::Empty)
    return $sourceStringToClean -replace '[^a-zA-Z0-9]', ''
}

If you run it:

Remove-SpecialCharacters "António"

You’ll get:

Antnio

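The function removes every character that is not a letter from A to Z (capitalized or not) or a digit from 0 to 9. Accented letters, spaces, and punctuation are all dropped, which is why the “ó” disappears.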
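As a quick check of what that pattern keeps and drops, here is a minimal sketch that applies the same regular expression directly to a made-up string (the sample text is only an illustration, not something from a real system):

```powershell
# Same pattern as in Remove-SpecialCharacters: keep A-Z, a-z and 0-9, drop the rest.
$sample  = 'Room 42, Building B-7'
$cleaned = $sample -replace '[^a-zA-Z0-9]', ''

$cleaned   # Room42BuildingB7
```

The digits survive, while the spaces, the comma, and the hyphen are removed.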

This may be what you want, but in this case, it would make more sense to replace the special character with its “not so special” counterpart.

Replace the accents

Accents are also called “diacritics,” so here’s how to replace accented characters with their “normal” counterparts. I assembled this Frankenstein script over time from the two sources credited in its comments, and I have used it in my projects.

So we’ll replace, for example, “Á” with “A”, simplifying how it’s displayed.

function Update-SpecialCharacters {
    param ([String]$sourceStringToClean = [String]::Empty)

    # Step 1, from https://stackoverflow.com/questions/7836670/how-remove-accents-in-powershell
    # Decompose each accented character into a base letter plus combining marks (FormD),
    # then keep every character except the combining marks.
    $normalizedString = $sourceStringToClean.Normalize([Text.NormalizationForm]::FormD)
    $stringBuilder = New-Object Text.StringBuilder
    $normalizedString.ToCharArray() | ForEach-Object {
        if ([Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$stringBuilder.Append($_)
        }
    }

    # Step 2, from https://lazywinadmin.com/2015/05/powershell-remove-diacritics-accents.html
    # Round-trip through the "Cyrillic" encoding to map leftover letters such as "Ł" to ASCII.
    [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($stringBuilder.ToString()))
}

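We’re combining two types of cleaning here. The first “block” decomposes each accented character into a base letter plus a combining mark and then drops the mark. The second re-encodes the result through the “Cyrillic” encoding and reads the bytes back as ASCII. Despite the encoding’s name, this step is not about Cyrillic text: it catches Latin letters such as “Ł” that have no decomposed form and so pass through the first block unchanged. Here’s an example: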
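To see why one pass is not enough, here is a small sketch of what the normalization step does to two individual characters. The variable names are mine, and the characters are built from their code points so the result does not depend on how the script file is saved:

```powershell
# Build the characters from Unicode code points to avoid file-encoding surprises.
$oAcute  = [string][char]0x00F3   # "ó"
$lStroke = [string][char]0x0141   # "Ł"

$oDecomposed = $oAcute.Normalize([Text.NormalizationForm]::FormD)
$lDecomposed = $lStroke.Normalize([Text.NormalizationForm]::FormD)

$oDecomposed.Length   # 2: the letter "o" followed by a combining acute accent
[Globalization.CharUnicodeInfo]::GetUnicodeCategory($oDecomposed[1])   # NonSpacingMark
$lDecomposed.Length   # 1: "Ł" does not decompose, so there is no mark to drop
```

The “ó” splits into a plain “o” and a mark that the loop can skip, while “Ł” stays a single character and needs a different treatment.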

Update-SpecialCharacters "António"
Update-SpecialCharacters "Łagiewnicki"

We’ll get:

Antonio
Lagiewnicki

I understand that the name now doesn’t make sense to my Polish friends, but some systems will appreciate it better than its original form.

Final thoughts

There are many reasons to replace the accents in a string. A common one is generating a username or an email address for a person in a system that doesn’t allow special characters but still needs a letter in each position.
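As an illustration of that use case, here is a minimal sketch that turns a full name into a lowercase site or user name. `ConvertTo-SiteName` is a hypothetical helper, not part of the script above, and it swaps the character loop for a regular expression on the `\p{Mn}` (non-spacing mark) category. It only covers characters that decompose, so a letter like “Ł” would be dropped by the final cleanup instead of being mapped to “L”.

```powershell
# Hypothetical helper: strips accents, then removes anything that is not a letter or digit.
function ConvertTo-SiteName {
    param ([String]$FullName = [String]::Empty)

    # Split accented characters into base letter + combining mark.
    $decomposed = $FullName.Normalize([Text.NormalizationForm]::FormD)

    # Drop the combining marks (Unicode category Mn).
    $withoutMarks = $decomposed -replace '\p{Mn}', ''

    # Remove spaces and any other leftover special characters, then lowercase.
    ($withoutMarks -replace '[^a-zA-Z0-9]', '').ToLowerInvariant()
}

ConvertTo-SiteName "António Silva"   # antoniosilva
```

Combining the accent replacement with the “nuclear” cleanup in this order keeps the letters readable while still guaranteeing a plain ASCII result.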

I hope the script above will help you, but if you find something that can be improved, please let me know.



Manuel Gomes

I'm a former Project Manager and Developer, now focused on delivering quality articles and projects here on the site. I've worked for companies like Bayer, Sybase (now SAP), and Pestana Hotel Group, and I use that knowledge to help you automate your daily tasks.
