Ever typed a character into your Windows command prompt only to see a strange symbol appear? Or perhaps run a script that unexpectedly outputs garbled text? The culprit is often a mismatch between the encoding used by cmd.exe and the encoding of the text being input or displayed. Understanding how cmd.exe handles encoding matters for anyone working with text files, scripts, or international characters in the Windows command-line environment. This post looks at how cmd.exe encoding works, why it matters, and how to troubleshoot common encoding issues.
Decoding the Default: Understanding Cmd.exe’s Encoding
By default, cmd.exe uses the system’s active code page. The code page is a legacy mechanism inherited from the days of DOS, and it determines the character set available for use. Think of it as a map that links specific numerical codes to characters. On many Western systems, the active code page is typically set to 437 (United States) or 850 (Multilingual Latin I). These code pages handle standard English characters without a hitch. However, they struggle with characters from other languages, especially those with accents or different alphabets.
You can check your system’s active code page by typing chcp into the command prompt. The command returns the number of the active code page. Knowing this number is important when dealing with text files or scripts that might be using a different encoding.
For instance, if you’re working with a UTF-8 encoded file and your cmd.exe is using code page 437, you might run into problems displaying or processing certain characters correctly.
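To make the mismatch concrete, here is a minimal Java sketch that decodes UTF-8 bytes as if they were code page 850 text, which is roughly what a console does when the two encodings disagree. The class and method names are made up for illustration, and it assumes the JDK ships the `IBM850` charset (Java’s name for code page 850).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Encodes text as UTF-8, then decodes those bytes with the wrong charset. */
    static String misread(String text, String codePageName) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Assumption: the named charset is available in this JDK.
        return new String(utf8, Charset.forName(codePageName));
    }

    public static void main(String[] args) {
        // Each accented letter is two bytes in UTF-8, so it turns into
        // two unrelated code page 850 characters.
        System.out.println(misread("\u00e4\u00f6\u00fc", "IBM850"));
    }
}
```

Each umlaut becomes a box-drawing character followed by an unrelated letter, the same pattern that appears in the German line of the console transcript later in this post.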
The Importance of Code Page Matching
Matching the encoding of your input, your output, and the cmd.exe environment is important for preventing data corruption and ensuring your scripts run as expected. A mismatch can lead to several problems:
- Incorrect character display: Characters may appear as question marks, boxes, or other unexpected symbols.
- Script malfunction: Scripts relying on specific characters might fail to execute correctly or produce unexpected results.
- Data loss: In severe cases, encoding mismatches can lead to data corruption, especially when dealing with files containing non-ASCII characters.
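The data-loss case can be sketched in Java as follows. Encoding text into a charset that lacks some of its characters silently replaces them, and the originals cannot be recovered afterwards. ISO-8859-1 stands in here for a legacy code page purely because every JDK includes it; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class LossyRoundTrip {
    /** Encodes to ISO-8859-1 and decodes again; unmappable characters come back as '?'. */
    static String roundTripLatin1(String text) {
        // String.getBytes(Charset) substitutes the charset's replacement byte
        // for characters it cannot represent instead of throwing.
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The German letters survive; the Polish letters do not.
        System.out.println(roundTripLatin1("\u00e4\u00f6\u00fc \u0105\u0119\u017a"));
    }
}
```

Once the replacement has happened, no later change of code page will bring the original characters back, which is why a mismatch at write time is more serious than one at display time.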
Troubleshooting Encoding Issues in Cmd.exe
If you encounter encoding problems, here are a few troubleshooting steps:
- Identify the encoding of the problematic text: This might involve checking the file properties or using a text editor that can display encoding information.
- Change the active code page: Use the chcp command followed by the desired code page number (e.g., chcp 65001 for UTF-8).
- Use a more Unicode-friendly environment: Consider using PowerShell, which has better support for Unicode and different encodings.
For example, if you are working with a UTF-8 encoded file, you would run chcp 65001 before running any commands that interact with the file. This tells the console to interpret program output as UTF-8, which avoids most garbled text, although some programs still misbehave under code page 65001.
Unicode and the Future of Cmd.exe
While cmd.exe is primarily based on legacy code pages, the increasing prevalence of Unicode has led to improvements in its handling of international characters. The introduction of chcp 65001 for UTF-8 support is a significant step forward. However, PowerShell offers a more robust and comprehensive solution for working with different encodings in the Windows environment. Its native support for Unicode and its cmdlets for encoding manipulation make it a powerful tool for anyone working with multilingual text or international character sets.
Consider this: A script designed to process user input from different regions might fail in cmd.exe if the input contains characters outside the active code page. The same script is more likely to work in PowerShell because of its Unicode support.
Frequently Asked Questions
Q: What is the difference between a code page and Unicode?
A: A code page is a limited mapping of characters to numerical values, usually 256 of them, whereas Unicode is a universal character set that aims to encompass all characters from all languages.
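A short Java sketch can illustrate the difference: the same single byte means a different character under each code page, while a Unicode code point always names one character. The class and method names are invented for this example, and it assumes the JDK provides the `IBM437`, `IBM850`, and `windows-1252` charsets.

```java
import java.nio.charset.Charset;

class CodePageVsUnicode {
    /** Decodes one byte value under the named code page. */
    static String decodeByte(int value, String charsetName) {
        return new String(new byte[] {(byte) value}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Assumption: these three charsets are available in this JDK.
        for (String name : new String[] {"IBM437", "IBM850", "windows-1252"}) {
            String ch = decodeByte(0xE4, name);
            // Print the Unicode code point that byte 0xE4 maps to in each code page.
            System.out.println(String.format("%s: 0xE4 -> U+%04X", name, (int) ch.charAt(0)));
        }
    }
}
```

Byte 0xE4 comes out as a Greek capital sigma, an o with tilde, or an a with diaeresis depending on the code page, so a byte on its own is ambiguous until you know which code page produced it.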
Understanding how encoding works in cmd.exe is essential for avoiding frustrating character display issues and ensuring your scripts function correctly. While the legacy system of code pages can be cumbersome, using the correct chcp command can help mitigate many problems. For a more modern and flexible approach, consider exploring PowerShell and its stronger Unicode handling. By grasping these concepts, you’ll be better equipped to navigate character encoding in the Windows command-line environment. For further reading on character encoding, see the Unicode Consortium, Microsoft’s CHCP documentation, and IANA Character Sets.
Question & Answer:
When I open cmd.exe on Windows, what encoding is it using?
How can I check which encoding it is currently using?
Does it depend on my regional setting or are there any environment variables to check?
What happens when you type a file with a certain encoding? Sometimes I get garbled characters (because of incorrect encoding) and sometimes it kind of works.
However, I don’t trust anything as long as I don’t know what’s going on. Can anyone explain?
Yes, it’s frustrating: sometimes type and other programs print gibberish, and sometimes they do not.
First of all, Unicode characters will only display if the current console font contains the characters. So use a TrueType font like Lucida Console instead of the default Raster Font.
But if the console font doesn’t contain the character you’re trying to display, you’ll see question marks instead of gibberish. When you get gibberish, there’s more going on than just font settings.
When programs use standard C-library I/O functions like printf, the program’s output encoding must match the console’s output encoding, or you will get gibberish. chcp shows and sets the current codepage. All output using standard C-library I/O functions is treated as if it is in the codepage displayed by chcp.
Matching the program’s output encoding with the console’s output encoding can be accomplished in two different ways:
- A program can get the console’s current codepage using chcp or GetConsoleOutputCP, and configure itself to output in that encoding, or
- You or a program can set the console’s current codepage using chcp or SetConsoleOutputCP to match the default output encoding of the program.
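The first option can be sketched in Java roughly like this: the program is told which charset the console is using and wraps its output stream so that text is encoded accordingly. This is only a sketch under stated assumptions. The class and method names are hypothetical, the charset name is passed on the command line rather than queried from the console, and the `PrintStream(OutputStream, boolean, Charset)` constructor requires Java 10 or later.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;

class ConsoleEncodingSketch {
    /** Wraps a byte sink so that printed text is encoded in the given charset. */
    static PrintStream printerFor(OutputStream sink, String charsetName) {
        return new PrintStream(sink, true, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Assumption: the caller passes a Java charset name matching the console,
        // e.g. IBM850 when chcp reports 850; otherwise fall back to the JVM default.
        String charsetName = args.length > 0 ? args[0] : Charset.defaultCharset().name();
        PrintStream out = printerFor(System.out, charsetName);
        out.println("German \u00e4\u00f6\u00fc \u00c4\u00d6\u00dc \u00df");
    }
}
```

Running it as `java ConsoleEncodingSketch IBM850` in a console whose chcp reports 850 should encode the umlauts as the single bytes that code page expects. Characters the code page lacks would still come out as replacement characters.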
However, programs that use Win32 APIs can write UTF-16LE strings directly to the console with WriteConsoleW. This is the only way to get correct output without setting codepages. And even when using that function, if a string is not in the UTF-16LE encoding to begin with, a Win32 program must pass the correct codepage to MultiByteToWideChar. Also, WriteConsoleW will not work if the program’s output is redirected; more fiddling is needed in that case.
type works some of the time because it checks the start of each file for a UTF-16LE Byte Order Mark (BOM), i.e. the bytes 0xFF 0xFE. If it finds such a mark, it displays the Unicode characters in the file using WriteConsoleW regardless of the current codepage. But when typing any file without a UTF-16LE BOM, or when using non-ASCII characters with any command that doesn’t call WriteConsoleW, you will need to set the console codepage and program output encoding to match each other.
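The BOM check described above can be modeled in a few lines of Java. This is a sketch of the described behavior, not cmd.exe’s actual code: the class and method names are invented, and a Java charset stands in for the console codepage.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    /** True if the data starts with the UTF-16LE byte order mark 0xFF 0xFE. */
    static boolean hasUtf16LeBom(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xFF) == 0xFF
            && (data[1] & 0xFF) == 0xFE;
    }

    /** Decodes as UTF-16LE when a BOM is present, otherwise with the console codepage. */
    static String decodeLikeType(byte[] data, Charset consoleCodePage) {
        if (hasUtf16LeBom(data)) {
            // Skip the two BOM bytes and treat the rest as UTF-16LE.
            return new String(data, 2, data.length - 2, StandardCharsets.UTF_16LE);
        }
        return new String(data, consoleCodePage);
    }

    public static void main(String[] args) {
        byte[] withBom = "\ufeffCJK \u4f60\u597d".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(hasUtf16LeBom(withBom)); // prints "true"
    }
}
```

Note that a check like this also fires for a UTF-32LE BOM, whose first two bytes are likewise 0xFF 0xFE.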
How can we find this out?
Here’s a test file containing Unicode characters:
    ASCII abcde xyz
    German äöü ÄÖÜ ß
    Polish ąęźżńł
    Russian абвгдеж эюя
    CJK 你好
Here’s a Java program to print out the test file in a bunch of different Unicode encodings. It could be in any programming language; it only prints ASCII characters or encoded bytes to stdout.
    import java.io.*;

    public class Foo {

        // U+FEFF is the byte order mark; each encoding turns it into different bytes.
        private static final String BOM = "\ufeff";
        private static final String TEST_STRING
            = "ASCII abcde xyz\n"
            + "German äöü ÄÖÜ ß\n"
            + "Polish ąęźżńł\n"
            + "Russian абвгдеж эюя\n"
            + "CJK 你好\n";

        public static void main(String[] args)
            throws Exception
        {
            String[] encodings = new String[] {
                "UTF-8", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE" };

            for (String encoding: encodings) {
                System.out.println("== " + encoding);

                for (boolean writeBom: new boolean[] {false, true}) {
                    System.out.println(writeBom ? "= bom" : "= no bom");

                    // Write the raw encoded bytes to stdout and to a file.
                    String output = (writeBom ? BOM : "") + TEST_STRING;
                    byte[] bytes = output.getBytes(encoding);
                    System.out.write(bytes);
                    FileOutputStream out = new FileOutputStream("uc-test-"
                        + encoding + (writeBom ? "-bom.txt" : "-nobom.txt"));
                    out.write(bytes);
                    out.close();
                }
            }
        }
    }
The output in the default codepage? Total garbage!
    Z:\andrew\projects\sx\1259084>chcp
    Active code page: 850

    Z:\andrew\projects\sx\1259084>java Foo
    == UTF-8
    = no bom
    ASCII abcde xyz
    German ├ñ├Â├╝ ├ä├û├£ ├ƒ
    Polish ─à─Ö┼║┼╝┼ä┼é
    Russian ð░ð▒ð▓ð│ð┤ðÁð ÐìÐÄÐÅ
    CJK õ¢áÕÑ¢
    = bom
    ´╗┐ASCII abcde xyz
    German ├ñ├Â├╝ ├ä├û├£ ├ƒ
    Polish ─à─Ö┼║┼╝┼ä┼é
    Russian ð░ð▒ð▓ð│ð┤ðÁð ÐìÐÄÐÅ
    CJK õ¢áÕÑ¢
    == UTF-16LE
    = no bom
    A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ♣☺↓☺z☺|☺D☺B☺
    R u s s i a n 0♦1♦2♦3♦4♦5♦6♦ M♦N♦O♦
    C J K `O}Y
    = bom
    ■A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ♣☺↓☺z☺|☺D☺B☺
    R u s s i a n 0♦1♦2♦3♦4♦5♦6♦ M♦N♦O♦
    C J K `O}Y
    == UTF-16BE
    = no bom
    A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ☺♣☺↓☺z☺|☺D☺B
    R u s s i a n ♦0♦1♦2♦3♦4♦5♦6 ♦M♦N♦O
    C J K O`Y}
    = bom
    ■ A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ☺♣☺↓☺z☺|☺D☺B
    R u s s i a n ♦0♦1♦2♦3♦4♦5♦6 ♦M♦N♦O
    C J K O`Y}
    == UTF-32LE
    = no bom
    A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ♣☺ ↓☺ z☺ |☺ D☺ B☺
    R u s s i a n 0♦ 1♦ 2♦ 3♦ 4♦ 5♦ 6♦ M♦ N ♦ O♦
    C J K `O }Y
    = bom
    ■ A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ♣☺ ↓☺ z☺ |☺ D☺ B☺
    R u s s i a n 0♦ 1♦ 2♦ 3♦ 4♦ 5♦ 6♦ M♦ N ♦ O♦
    C J K `O }Y
    == UTF-32BE
    = no bom
    A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ☺♣ ☺↓ ☺z ☺| ☺D ☺B
    R u s s i a n ♦0 ♦1 ♦2 ♦3 ♦4 ♦5 ♦6 ♦M ♦N ♦O
    C J K O` Y}
    = bom
    ■ A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ☺♣ ☺↓ ☺z ☺| ☺D ☺B
    R u s s i a n ♦0 ♦1 ♦2 ♦3 ♦4 ♦5 ♦6 ♦M ♦N ♦O
    C J K O` Y}
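The stray characters at the start of each "= bom" section are the byte order mark itself, shown through code page 850. A small Java sketch (class and method names are illustrative only) prints the BOM bytes for each encoding used by the test program:

```java
import java.nio.charset.Charset;

class BomBytes {
    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Returns the bytes that U+FEFF becomes in the given encoding, as hex. */
    static String bomHex(String encoding) {
        return hex("\ufeff".getBytes(Charset.forName(encoding)));
    }

    public static void main(String[] args) {
        String[] encodings = {"UTF-8", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE"};
        for (String encoding : encodings) {
            System.out.println(encoding + ": " + bomHex(encoding));
        }
    }
}
```

In code page 850 the bytes EF BB BF display as an acute accent and two box-drawing corners, and FE displays as a black square, which accounts for the extra characters in front of "ASCII" in the transcript.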
However, what if we type the files that got saved? They contain the exact same bytes that were printed to the console.
    Z:\andrew\projects\sx\1259084>type *.txt

    uc-test-UTF-16BE-bom.txt

    ■ A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ☺♣☺↓☺z☺|☺D☺B
    R u s s i a n ♦0♦1♦2♦3♦4♦5♦6 ♦M♦N♦O
    C J K O`Y}

    uc-test-UTF-16BE-nobom.txt

    A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ☺♣☺↓☺z☺|☺D☺B
    R u s s i a n ♦0♦1♦2♦3♦4♦5♦6 ♦M♦N♦O
    C J K O`Y}

    uc-test-UTF-16LE-bom.txt

    ASCII abcde xyz
    German äöü ÄÖÜ ß
    Polish ąęźżńł
    Russian абвгдеж эюя
    CJK 你好

    uc-test-UTF-16LE-nobom.txt

    A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ♣☺↓☺z☺|☺D☺B☺
    R u s s i a n 0♦1♦2♦3♦4♦5♦6♦ M♦N♦O♦
    C J K `O}Y

    uc-test-UTF-32BE-bom.txt

    ■ A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ☺♣ ☺↓ ☺z ☺| ☺D ☺B
    R u s s i a n ♦0 ♦1 ♦2 ♦3 ♦4 ♦5 ♦6 ♦M ♦N ♦O
    C J K O` Y}

    uc-test-UTF-32BE-nobom.txt

    A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ☺♣ ☺↓ ☺z ☺| ☺D ☺B
    R u s s i a n ♦0 ♦1 ♦2 ♦3 ♦4 ♦5 ♦6 ♦M ♦N ♦O
    C J K O` Y}

    uc-test-UTF-32LE-bom.txt

    A S C I I a b c d e x y z
    G e r m a n ä ö ü Ä Ö Ü ß
    P o l i s h ą ę ź ż ń ł
    R u s s i a n а б в г д е ж э ю я
    C J K 你 好

    uc-test-UTF-32LE-nobom.txt

    A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ♣☺ ↓☺ z☺ |☺ D☺ B☺
    R u s s i a n 0♦ 1♦ 2♦ 3♦ 4♦ 5♦ 6♦ M♦ N ♦ O♦
    C J K `O }Y

    uc-test-UTF-8-bom.txt

    ´╗┐ASCII abcde xyz
    German ├ñ├Â├╝ ├ä├û├£ ├ƒ
    Polish ─à─Ö┼║┼╝┼ä┼é
    Russian ð░ð▒ð▓ð│ð┤ðÁð ÐìÐÄÐÅ
    CJK õ¢áÕÑ¢

    uc-test-UTF-8-nobom.txt

    ASCII abcde xyz
    German ├ñ├Â├╝ ├ä├û├£ ├ƒ
    Polish ─à─Ö┼║┼╝┼ä┼é
    Russian ð░ð▒ð▓ð│ð┤ðÁð ÐìÐÄÐÅ
    CJK õ¢áÕÑ¢
The only thing that works is the UTF-16LE file, with a BOM, printed to the console via type.
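If you control how a file is written, that finding suggests one workaround: save it as UTF-16LE with a leading BOM. A minimal Java sketch follows; the class name, method name, and default file name `type-me.txt` are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WriteForType {
    /** Encodes text as UTF-16LE, prefixed with the BOM bytes 0xFF 0xFE. */
    static byte[] utf16LeWithBom(String text) {
        // Java's UTF-16LE charset does not emit a BOM on its own,
        // so U+FEFF is prepended explicitly.
        return ("\ufeff" + text).getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output path; pass a different one as the first argument.
        Path target = Paths.get(args.length > 0 ? args[0] : "type-me.txt");
        Files.write(target, utf16LeWithBom("German \u00e4\u00f6\u00fc \u00c4\u00d6\u00dc \u00df\r\n"));
    }
}
```

The trade-off is that many other tools expect UTF-8, so a file written this way is convenient for viewing in the console but less so for general interchange.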
If we use anything other than type to print the file, we get garbage:
    Z:\andrew\projects\sx\1259084>copy uc-test-UTF-16LE-bom.txt CON
    ■A S C I I a b c d e x y z
    G e r m a n õ ÷ ³ ─ Í ▄ ▀
    P o l i s h ♣☺↓☺z☺|☺D☺B☺
    R u s s i a n 0♦1♦2♦3♦4♦5♦6♦ M♦N♦O♦
    C J K `O}Y
    1 file(s) copied.
From the fact that copy CON does not display Unicode correctly, we can conclude that the type command has logic to detect a UTF-16LE BOM at the start of the file, and use special Windows APIs to print it.
We can see this by opening cmd.exe in a debugger when it goes to type out a file.
After type opens a file, it checks for a BOM of 0xFEFF (i.e., the bytes 0xFF 0xFE in little-endian), and if there is such a BOM, type sets an internal fOutputUnicode flag. This flag is checked later to decide whether to call WriteConsoleW.
But that’s the only way to get type to output Unicode, and only for files that have BOMs and are in UTF-16LE. For all other files, and for programs that don’t have special code to handle console output, your files will be interpreted according to the current codepage, and will likely show up as gibberish.
You can emulate how type outputs Unicode to the console in your own programs like so:
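    #include <stdio.h>
    #include <string.h>
    #define UNICODE
    #include <windows.h>

    /* This source file must be saved as UTF-8 for the literal to match CP_UTF8. */
    static LPCSTR lpcsTest =
        "ASCII abcde xyz\n"
        "German äöü ÄÖÜ ß\n"
        "Polish ąęźżńł\n"
        "Russian абвгдеж эюя\n"
        "CJK 你好\n";

    int main() {
        int n;
        DWORD written;
        wchar_t buf[1024];

        HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);

        /* Convert UTF-8 to UTF-16; the last argument is a count of
           wide characters, not bytes. */
        n = MultiByteToWideChar(CP_UTF8, 0,
                lpcsTest, (int) strlen(lpcsTest),
                buf, sizeof(buf) / sizeof(buf[0]));

        /* With UNICODE defined, WriteConsole resolves to WriteConsoleW. */
        WriteConsole(hConsole, buf, n, &written, NULL);

        return 0;
    }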
This program works for printing Unicode on the Windows console using the default codepage.
For the sample Java program, we can get a little bit of correct output by setting the codepage manually, though the output gets messed up in weird ways:
    Z:\andrew\projects\sx\1259084>chcp 65001
    Active code page: 65001

    Z:\andrew\projects\sx\1259084>java Foo
    == UTF-8
    = no bom
    ASCII abcde xyz
    German äöü ÄÖÜ ß
    Polish ąęźżńł
    Russian абвгдеж эюя
    CJK 你好
    ж эюя
    CJK 你好
    你好
    好
    �
    = bom
    ASCII abcde xyz
    German äöü ÄÖÜ ß
    Polish ąęźżńł
    Russian абвгдеж эюя
    CJK 你好
    еж эюя
    CJK 你好
    你好
    好
    �
    == UTF-16LE
    = no bom
    A S C I I a b c d e x y z
    …
However, a C program that sets a Unicode UTF-8 codepage:
    #include <stdio.h>
    #include <windows.h>

    int main() {
        size_t n;
        UINT oldCodePage;
        char buf[1024];

        /* Remember the current codepage so it can be restored on exit. */
        oldCodePage = GetConsoleOutputCP();
        if (!SetConsoleOutputCP(65001)) {
            printf("error\n");
        }

        /* Copy the UTF-8 file to stdout byte for byte. */
        freopen("uc-test-UTF-8-nobom.txt", "rb", stdin);
        n = fread(buf, sizeof(buf[0]), sizeof(buf), stdin);
        fwrite(buf, sizeof(buf[0]), n, stdout);

        SetConsoleOutputCP(oldCodePage);

        return 0;
    }
does have correct output:
    Z:\andrew\projects\sx\1259084>.\test
    ASCII abcde xyz
    German äöü ÄÖÜ ß
    Polish ąęźżńł
    Russian абвгдеж эюя
    CJK 你好
The moral of the story?
- type can print UTF-16LE files with a BOM regardless of your current codepage.
- Win32 programs can be programmed to output Unicode to the console, using WriteConsoleW.
- Other programs which set the codepage and adjust their output encoding accordingly can print Unicode on the console regardless of what the codepage was when the program started.
- For everything else you will have to mess around with chcp, and will probably still get weird output.