By M. Dudley


2011-04-08 15:02:39 8 Comments

Out-File seems to force the BOM when using UTF-8:

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "UTF8" $MyPath

How can I write a file in UTF-8 with no BOM using PowerShell?

15 comments

@mklement0 2016-01-23 21:44:57

Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition (v6+), UTF-8 without BOM is the default encoding, across all cmdlets.
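
For example, in PowerShell Core you can quickly verify the BOM-less default yourself (a sketch; the file name is arbitrary):

'hü' | Out-File test.txt                                      # v6+: UTF-8 without BOM by default
(Get-Content test.txt -AsByteStream -TotalCount 3) -join ','  # no 239,187,191 (EF BB BF) prefix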

To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):

For convenience, here's advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

  • you can use it just like Out-File in a pipeline.
  • input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.

Example:

(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath

Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before sending the result through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for 2 reasons: (a) the whole file must fit into memory and (b) if the command is interrupted, data will be lost.
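
To mitigate drawback (b), one possible workaround is to write to a temporary file first and replace the original only after the write has succeeded (the .tmp name here is just an illustration):

$tmp = "$MyPath.tmp"                    # hypothetical temporary path
Get-Content $MyPath | Out-FileUtf8NoBom $tmp
Move-Item -Force $tmp $MyPath           # replace the original only on success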

A note on memory use:

  • M. Dudley's own answer requires that the entire file contents be built up in memory first, which can be problematic with large files.
  • The function below improves on this only slightly: all input objects are still buffered first, but their string representations are then generated and written to the output file one by one.

Source code of Out-FileUtf8NoBom (also available as an MIT-licensed Gist):

<#
.SYNOPSIS
  Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).

.DESCRIPTION
  Mimics the most important aspects of Out-File:
  * Input objects are sent to Out-String first.
  * -Append allows you to append to an existing file, -NoClobber prevents
    overwriting of an existing file.
  * -Width allows you to specify the line width for the text representations
     of input objects that aren't strings.
  However, it is not a complete implementation of all Out-File parameters:
  * Only a literal output path is supported, and only as a parameter.
  * -Force is not supported.

  Caveat: *All* pipeline input is buffered before writing output starts,
          but the string representations are generated and written to the target
          file one by one.

.NOTES
  The raison d'être for this advanced function is that, as of PowerShell v5,
  Out-File still lacks the ability to write UTF-8 files without a BOM:
  using -Encoding UTF8 invariably prepends a BOM.

#>
function Out-FileUtf8NoBom {

  [CmdletBinding()]
  param(
    [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
    [switch] $Append,
    [switch] $NoClobber,
    [AllowNull()] [int] $Width,
    [Parameter(ValueFromPipeline)] $InputObject
  )

  #requires -version 3

  # Make sure that the .NET framework sees the same working dir. as PS
  # and resolve the input path to a full path.
  [System.IO.Directory]::SetCurrentDirectory($PWD) # Caveat: .NET Core doesn't support [Environment]::CurrentDirectory
  $LiteralPath = [IO.Path]::GetFullPath($LiteralPath)

  # If -NoClobber was specified, throw an exception if the target file already
  # exists.
  if ($NoClobber -and (Test-Path $LiteralPath)) {
    Throw [IO.IOException] "The file '$LiteralPath' already exists."
  }

  # Create a StreamWriter object.
  # Note that we take advantage of the fact that the StreamWriter class by default:
  # - uses UTF-8 encoding
  # - without a BOM.
  $sw = New-Object IO.StreamWriter $LiteralPath, $Append

  $htOutStringArgs = @{}
  if ($Width) {
    $htOutStringArgs += @{ Width = $Width }
  }

  # Note: By not using begin / process / end blocks, we're effectively running
  #       in the end block, which means that all pipeline input has already
  #       been collected in automatic variable $Input.
  #       We must use this approach, because using | Out-String individually
  #       in each iteration of a process block would format each input object
  #       with an individual header.
  try {
    $Input | Out-String -Stream @htOutStringArgs | % { $sw.WriteLine($_) }
  } finally {
    $sw.Dispose()
  }

}

@delianmc 2018-07-19 16:19:03

Had the same issue. That did the trick for me:

$MyFile | Out-File -Encoding Oem $MyPath

While opening the file with Visual Studio Code or Notepad++ it shows as UTF-8

Actually, this only "appears" to work. When opened in some editors, the file shows as UTF-8 without BOM, but that is not actually the case. Use the solution at the top of the thread; that one works for real.

@Paulo Merson 2018-10-24 15:19:22

When converting letters with accent (áãàç...), OEM encoding did not produce the corresponding UTF-8 encoding characters on my Windows 10 machine.

@sc911 2019-03-09 12:59:18

Starting from version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.

So in the above example it should simply be like this:

$MyFile | Out-File -Encoding UTF8NoBOM $MyPath

@John Bentley 2019-09-16 05:03:32

@RaúlSalinas-Monteagudo what version are you on?

@KCD 2019-10-29 02:48:04

Nice. FYI check version with $PSVersionTable.PSVersion

@SATO Yusuke 2017-05-24 13:35:09

If you want to use [System.IO.File]::WriteAllLines(), you should cast the second parameter to String[] (if the type of $MyFile is Object[]), and also specify an absolute path with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), like:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
[System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)

If you want to use [System.IO.File]::WriteAllText(), you sometimes should pipe the content through Out-String first to add CRLFs to the end of each line explicitly (especially when you use them with ConvertTo-Csv):

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
[System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)

Or you can use [Text.Encoding]::UTF8.GetBytes() with Set-Content -Encoding Byte:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"

see: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM

@mklement0 2018-02-19 16:05:47

Good pointers; suggestions: the simpler alternative to $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath) is Convert-Path $MyPath; if you want to ensure a trailing CRLF, simply use [System.IO.File]::WriteAllLines() even with a single input string (no need for Out-String).

@Lucero 2018-04-23 17:48:31

When using Set-Content instead of Out-File, you can specify the encoding Byte, which can be used to write a byte array to a file. This in combination with a custom UTF8 encoding which does not emit the BOM gives the desired result:

# This variable can be reused
$utf8 = New-Object System.Text.UTF8Encoding $false

$MyFile = Get-Content $MyPath -Raw
Set-Content -Value $utf8.GetBytes($MyFile) -Encoding Byte -Path $MyPath

The difference from using [IO.File]::WriteAllLines() or similar is that it should work fine with any type of item and path, not only actual file paths.

@frank tan 2017-02-08 05:47:40

    [System.IO.FileInfo] $file = Get-Item -Path $FilePath 
    $sequenceBOM = New-Object System.Byte[] 3 
    $reader = $file.OpenRead() 
    $bytesRead = $reader.Read($sequenceBOM, 0, 3) 
    $reader.Dispose() 
    #A UTF-8+BOM string will start with the three following bytes. Hex: 0xEF0xBB0xBF, Decimal: 239 187 191 
    if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191) 
    { 
        $utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False) 
        [System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding) 
        Write-Host "Remove UTF-8 BOM successfully" 
    } 
    Else 
    { 
        Write-Warning "Not UTF-8 BOM file" 
    }  

Source How to remove UTF8 Byte Order Mark (BOM) from a file using PowerShell

@xdhmoore 2017-02-01 06:08:46

For whatever reason, the WriteAllLines calls were still producing a BOM for me, both with and without the BOM-less UTF8Encoding argument. But the following worked for me:

$bytes = gc -Encoding byte BOMthetorpedoes.txt
[IO.File]::WriteAllBytes("$(pwd)\BOMthetorpedoes.txt", $bytes[3..($bytes.length-1)])

I had to make the file path absolute for it to work. Otherwise it wrote the file to my Desktop. Also, I suppose this only works if you know your BOM is 3 bytes. I have no idea how reliable it is to expect a given BOM format/length based on encoding.

Also, as written, this probably only works if your file fits into a PowerShell array, which seems to have a length limit somewhat lower than [int32]::MaxValue on my machine.

@mklement0 2018-02-19 16:55:35

WriteAllLines without an encoding argument never writes a BOM itself, but it's conceivable that your string happened to start with the BOM character (U+FEFF), which on writing effectively created a UTF-8 BOM; e.g.: $s = [char] 0xfeff + 'hi'; [io.file]::WriteAllText((Convert-Path t.txt), $s) (omit the [char] 0xfeff + to see that no BOM is written).

@mklement0 2018-02-19 16:59:33

As for unexpectedly writing to a different location: the problem is that the .NET framework typically has a different current directory than PowerShell; you can either sync them first with [Environment]::CurrentDirectory = $PWD.ProviderPath, or, as a more generic alternative to your "$(pwd)\..." approach (better: "$pwd\...", even better: "$($pwd.ProviderPath)\..." or (Join-Path $pwd.ProviderPath ...)), use (Convert-Path BOMthetorpedoes.txt)

@xdhmoore 2018-02-21 20:35:25

Thanks, I didn't realize there could be a single BOM character to UTF-8 BOM conversion like that.

@mklement0 2018-02-21 20:42:08

All BOM byte sequences (Unicode signatures) are in fact the respective encoding's byte representation of the abstract single Unicode character U+FEFF.
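
To see this concretely, you can inspect each encoding's preamble (the bytes it uses to represent U+FEFF) in any PowerShell session:

[Text.Encoding]::UTF8.GetPreamble()             # 239 187 191 (EF BB BF)
[Text.Encoding]::Unicode.GetPreamble()          # 255 254 (FF FE, UTF-16 LE)
[Text.Encoding]::BigEndianUnicode.GetPreamble() # 254 255 (FE FF, UTF-16 BE)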

@xdhmoore 2018-02-21 22:00:25

Ah ok. That does seem to make things simpler.

@M. Dudley 2011-04-08 15:02:53

Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

$MyFile = Get-Content $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)

@Scott Muc 2011-05-24 06:16:17

Ugh, I hope that's not the only way.

@Roman Kuzmin 2011-11-08 19:42:18

One line [System.IO.File]::WriteAllLines($MyPath, $MyFile) is enough. This WriteAllLines overload writes exactly UTF8 without BOM.

@sschuberth 2017-01-04 15:38:36

Note that WriteAllLines seems to require $MyPath to be absolute.

@Bender the Greatest 2017-01-20 20:17:22

@sschuberth I just tried WriteAllLines with a relative path, works fine for me. Does it give you an error with a relative path?

@sschuberth 2017-01-20 22:17:56

@AlexanderMiles It "works", but the file ends up being in some weird directory (not relative to the current working directory). IIRC it was the path of the PowerShell interpreter binary.

@xdhmoore 2017-02-01 06:17:35

For me, it seems to write the file to my Desktop even if I'm currently in another directory.

@Rosberg Linhares 2017-06-17 01:03:23

If you don't want an extra new line in the end of the file, you can do this: [IO.File]::WriteAllText($MyPath, $MyFile).

@Shayan Toqraee 2017-09-30 19:00:56

@xdhmoore WriteAllLines gets the current directory from [System.Environment]::CurrentDirectory. If you open PowerShell and then change your current directory (using cd or Set-Location), then [System.Environment]::CurrentDirectory will not be changed and the file will end up being in the wrong directory. You can work around this by [System.Environment]::CurrentDirectory = (Get-Location).Path.

@watery 2018-03-09 16:37:08

This looks to be the solution still in 2018 with Out-File from PowerShell 6; but Notepad++ states the file has no encoding, any hint?

@Or Ohev-Zion 2018-06-20 12:49:21

The $MyFile variable does not have to be an object created by Get-Content. It can also be a plain string, i.e. $MyFile = "utf8 string of some kind..."

@pholpar 2019-04-17 09:22:13

Instead of New-Object System.Text.UTF8Encoding $False you can simply use New-Object System.Text.UTF8Encoding, since "This constructor creates an instance that does not provide a Unicode byte order mark", see docs.microsoft.com/en-us/dotnet/api/…

@PolarBear 2019-07-26 08:55:55

As @RosbergLinhares noted, WriteAllLines adds an extra new line at the end of a file. But to make WriteAllText work you have to use -Raw parameter for Get-Content, otherwise all text will be squashed into a single line. $fileContent = Get-Content -Raw "$fileFullName"; [System.IO.File]::WriteAllText($fileFullName, $fileContent)

@Lenny 2016-12-02 00:26:54

I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work...

Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext

For me this results in a utf-8 without bom file regardless of the source format.

@Chim Chimz 2017-01-12 14:53:09

This worked for me, except I used -encoding utf8 for my requirement.

@user1529294 2017-04-07 05:50:00

Thank you very much. I am working with dump logs of a tool - which had tabs inside it. UTF-8 was not working. ASCII solved the problem. Thanks.

@mklement0 2017-04-07 13:51:02

Yes, -Encoding ASCII avoids the BOM problem, but you obviously only get 7-bit ASCII characters. Given that ASCII is a subset of UTF-8, the resulting file is technically also a valid UTF-8 file, but all non-ASCII characters in your input will be converted to literal ? characters.

@TheDudeAbides 2020-01-02 18:40:41

@ChimChimz I accidentally up-voted your comment, but -encoding utf8 still outputs UTF-8 with a BOM. :(

@Jaume Suñer Mut 2016-10-03 13:59:08

Change multiple files by extension to UTF-8 without BOM:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach($i in ls -recurse -filter "*.java") {
    $MyFile = Get-Content $i.fullname 
    [System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
}

@Erik Anderson 2016-09-22 19:36:20

One technique I utilize is to redirect output to an ASCII file using the Out-File cmdlet.

For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output will be in UTF-16 which is not recognized by SQLPlus. To work around this:

sqlplus -s / as sysdba "@create_sql_script.sql" |
Out-File -FilePath new_script.sql -Encoding ASCII -Force

The generated script can then be executed via another SQLPlus session without any Unicode worries:

sqlplus / as sysdba "@new_script.sql" |
tee new_script.log

@mklement0 2018-02-19 17:03:06

Yes, -Encoding ASCII avoids the BOM problem, but you obviously only get support for 7-bit ASCII characters. Given that ASCII is a subset of UTF-8, the resulting file is technically also a valid UTF-8 file, but all non-ASCII characters in your input will be converted to literal ? characters.

@Amit Naidu 2018-03-08 00:06:17

This answer needs more votes. The sqlplus incompatibility with BOM is a cause of many headaches.

@ForNeVeR 2015-10-05 15:03:51

The proper way as of now is to use a solution recommended by @Roman Kuzmin in comments to @M. Dudley answer:

[IO.File]::WriteAllLines($filename, $content)

(I've also shortened it a bit by stripping unnecessary System namespace clarification - it will be substituted automatically by default.)

@Liam 2016-06-17 10:31:33

This (for whatever reason) did not remove the BOM for me, whereas the accepted answer did.

@ForNeVeR 2016-06-17 14:58:13

@Liam, probably some old version of PowerShell or .NET?

@Bender the Greatest 2017-01-23 16:38:26

I believe older versions of the .NET WriteAllLines function did write the BOM by default. So it could be a version issue.

@ForNeVeR 2017-01-24 03:37:53

@AlexanderMiles best I can tell from .NET 2.0 documentation, it still uses BOMless UTF-8 there.

@BobHy 2017-09-24 17:09:18

Can confirm this writes UTF-8 without BOM on Win10 / .NET 4.6. But it still needs an absolute path.

@chazbot7 2017-10-30 22:31:26

Confirmed: this writes with a BOM in PowerShell 3, but without a BOM in PowerShell 4. I had to use M. Dudley's original answer.

@Johny Skovdal 2018-01-12 07:05:53

So it works on Windows 10 where it's installed by default. :) Also, suggested improvement: [IO.File]::WriteAllLines(($filename | Resolve-Path), $content)

@Robin Wang 2015-09-22 20:43:38

Could use below to get UTF8 without BOM

$MyFile | Out-File -Encoding ASCII

@ForNeVeR 2015-10-05 15:05:32

No, it will convert the output to current ANSI codepage (cp1251 or cp1252, for example). It is not UTF-8 at all!

@Greg 2015-12-10 22:34:45

Thanks Robin. This may not have worked for writing a UTF-8 file without the BOM but the -Encoding ASCII option removed the BOM. That way I could generate a bat file for gvim. The .bat file was tripping up on the BOM.

@mklement0 2016-01-21 06:01:53

@ForNeVeR: You're correct that encoding ASCII is not UTF-8, but it's also not the current ANSI codepage - you're thinking of Default; ASCII truly is 7-bit ASCII encoding, with codepoints >= 128 getting converted to literal ? instances.

@ForNeVeR 2016-01-21 09:03:53

@mklement0 AFAIK ASCII really means the default single-byte encoding in this API and generally in Windows. Yes, it is not in sync with the official ASCII definition, but that is just a historical legacy.

@mklement0 2016-01-21 15:07:39

@ForNeVeR: You're probably thinking of "ANSI" or "extended ASCII". Try this to verify that -Encoding ASCII is indeed 7-bit ASCII only: 'äb' | out-file ($f = [IO.Path]::GetTempFilename()) -encoding ASCII; '?b' -eq $(Get-Content $f; Remove-Item $f) - the ä has been transliterated to a ?. By contrast, -Encoding Default ("ANSI") would correctly preserve it.

@TNT 2016-08-25 19:25:10

@rob This is the perfect answer for everybody who just doesn't need UTF-8 or anything else different from ASCII and is not interested in understanding encodings and the purpose of Unicode. You can use it as UTF-8 because the UTF-8 equivalents of all ASCII characters are identical (meaning that converting an ASCII file to a UTF-8 file results in an identical file, if it gets no BOM). For all who have non-ASCII characters in their text this answer is just false and misleading.

@Krzysztof 2015-05-06 12:34:44

This one works for me (use "Default" instead of "UTF8"):

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "Default" $MyPath

The result is ASCII without BOM.

@M. Dudley 2015-05-06 13:21:17

Per the Out-File documentation specifying the Default encoding will use the system's current ANSI code page, which is not UTF-8, as I required.

@eythort 2016-08-05 11:00:00

This does seem to work for me, at least for Export-CSV. If you open the resulting file in a proper editor, the file encoding is UTF-8 without BOM, and not Western Latin ISO 9 as I would have expected with ASCII

@emptyother 2017-07-22 09:40:32

Many editors open the file as UTF-8 if they can't detect the encoding.

@jamhan 2013-05-01 05:22:46

This script will convert, to UTF-8 without BOM, all .txt files in DIRECTORY1 and output them to DIRECTORY2

foreach ($i in ls -name DIRECTORY1\*.txt)
{
    $file_content = Get-Content "DIRECTORY1\$i";
    [System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
}

@darksoulsong 2013-09-08 13:34:17

This one fails without any warning. What version of PowerShell should I use to run it?

@BermudaLamb 2015-03-25 15:44:26

The WriteAllLines solution works great for small files. However, I need a solution for larger files. Every time I try to use this with a larger file I'm getting an OutOfMemory error.
