November 21, 2024

PowerShell: How to replace accents in strings?

Working with text can be hard, especially when you need “clean” text to create a resource or something else that doesn’t support special characters. Suppose, for example, that you want to create a site for each of your employees. You can write a simple script that uses the person’s name as the site name, but people may have accents or special characters in their names. Ideally, we’d replace each accented character with its plain equivalent rather than remove it.

There are many ways to solve this, so let’s look at two of them.

Nuclear. Remove them all

One solution is to simply remove them, as the following function does:

function Remove-SpecialCharacters {
    param ([String]$sourceStringToClean = [String]::Empty)
    return $sourceStringToClean -replace '[^a-zA-Z0-9]', ''
}

If you run it:

Remove-SpecialCharacters "António"

You’ll get:

Antnio

The function removes every character that is not a letter from A to Z (upper or lower case) or a digit from 0 to 9. Digits are kept; accented letters, spaces, and punctuation are dropped.
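To make the effect of that pattern concrete, here is a small sketch that applies the same regular expression to a few sample strings. The sample values and variable names are made up for illustration.

```powershell
# Same pattern as in Remove-SpecialCharacters: keep only a-z, A-Z and 0-9.
$pattern = '[^a-zA-Z0-9]'

# Accented letter and spaces are removed, digits are kept.
$clean1 = 'António Silva 2024' -replace $pattern, ''
# The leading "Ł" is not in A-Z, so it is removed as well.
$clean2 = 'Łagiewnicki' -replace $pattern, ''
# Underscores and hyphens are removed too.
$clean3 = 'user_name-01' -replace $pattern, ''

$clean1   # AntnioSilva2024
$clean2   # agiewnicki
$clean3   # username01
```

Note that the result can lose letters entirely, which is why this approach is rarely a good fit for people’s names.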

This may be what you want, but in this case, it would make more sense to replace the special character with its “not so special” counterpart.

Replace the accents

Accents are also called “diacritics,” so here’s how to replace accented characters with their “normal” counterparts. I stitched this Frankenstein script together over time and have used it in my projects.

So we’ll replace, for example, “Á” with “A”, simplifying how it’s displayed.

function Update-SpecialCharacters {
    param ([String]$sourceStringToClean = [String]::Empty)

    # Step 1, from https://stackoverflow.com/questions/7836670/how-remove-accents-in-powershell
    # Decompose each accented character into its base letter plus combining marks (FormD),
    # then keep everything except the combining marks.
    $normalizedString = $sourceStringToClean.Normalize([Text.NormalizationForm]::FormD)
    $stringBuilder = New-Object Text.StringBuilder
    $normalizedString.ToCharArray() | ForEach-Object {
        if ([Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$stringBuilder.Append($_)
        }
    }

    # Step 2, from https://lazywinadmin.com/2015/05/powershell-remove-diacritics-accents.html
    # Round-trip through the Cyrillic code page so that leftover letters such as "Ł" end up as plain ASCII.
    [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($stringBuilder.ToString()))
}

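Here we’re combining two cleaning techniques. The first block decomposes each accented character into a base letter plus a combining mark and drops the mark, which handles letters such as “ó”. The second re-encodes the result through the Cyrillic code page and reads the bytes back as ASCII, which catches letters such as “Ł” that cannot be split into a base letter and an accent. Here’s an example: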

Update-SpecialCharacters "António"
Update-SpecialCharacters "Łagiewnicki"

We’ll get:

Antonio
Lagiewnicki

I understand that the name no longer looks right to my Polish friends, but some systems will handle it better than its original form.
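If you want to see what the normalization step does on its own, the sketch below inspects the decomposed string character by character. It is only an illustration; the variable names are mine, and the characters are built from their code points so the result doesn’t depend on how the script file is encoded.

```powershell
# "António", with the "ó" written as code point U+00F3.
$name = "Ant$([char]0x00F3)nio"
$decomposed = $name.Normalize([Text.NormalizationForm]::FormD)

$name.Length         # 7
$decomposed.Length   # 8, because "ó" is now "o" followed by U+0301 (combining acute accent)

# List every character with its code point and Unicode category.
# The combining accent shows up as NonSpacingMark, which is what the function filters out.
foreach ($c in $decomposed.ToCharArray()) {
    '{0:X4} {1}' -f [int]$c, [Globalization.CharUnicodeInfo]::GetUnicodeCategory($c)
}

# "Ł" (U+0141) has no decomposition, so FormD leaves it as a single character.
$polishL = [string][char]0x0141
$polishLDecomposed = $polishL.Normalize([Text.NormalizationForm]::FormD)
$polishLDecomposed.Length   # 1
```

This is why the normalization step alone is not enough for “Łagiewnicki” and the function needs its second step.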

Final thoughts

There are many reasons to remove the accents from a string, such as generating a username or an email address for a person in a system that doesn’t allow special characters but still needs the underlying letters to be there.
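As a rough sketch of the username case, the hypothetical helper below (New-UserAlias is my own name, not part of the script above) strips the accents with the normalization technique only, then lower-cases the result and replaces spaces with dots. The alias format is an assumption; adapt it to whatever your target system accepts.

```powershell
# Hypothetical helper for illustration only.
function New-UserAlias {
    param ([String]$displayName = [String]::Empty)

    # 1. Decompose accented letters and drop the combining marks.
    $decomposed = $displayName.Normalize([Text.NormalizationForm]::FormD)
    $builder = New-Object Text.StringBuilder
    foreach ($c in $decomposed.ToCharArray()) {
        if ([Globalization.CharUnicodeInfo]::GetUnicodeCategory($c) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$builder.Append($c)
        }
    }

    # 2. Trim, lower-case, and turn runs of whitespace into a single dot.
    $alias = $builder.ToString().Trim().ToLowerInvariant() -replace '\s+', '.'

    # 3. Drop anything that is still outside a-z, 0-9 and dots.
    return $alias -replace '[^a-z0-9.]', ''
}

New-UserAlias 'António Conceição'   # antonio.conceicao
```

Because this sketch skips the code page step, a letter such as “Ł” would be removed in step 3 rather than turned into “l”.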

I hope the script above will help you, but if you find something that can be improved, please let me know.

Have a suggestion or a better script to recommend? Leave a comment or interact on Twitter and check out other PowerShell-related articles here.


Manuel Gomes

I have 18 years of experience in automation, project management, and development. In addition to that, I have been writing for this website for over 3 years now, providing readers with valuable insights and information. I hope my expertise allows me to create compelling, informative content that resonates with the audience.


One thought on “PowerShell: How to replace accents in strings?”

  1. Thank you for sharing this! I will reference your site when I add it to my scripts. My next step will be to learn what each call is doing, but at least I can use it to handle folders and files that have diacritics in the name.
