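The default character encoding is not applied. This occurs when editing XML files. Even when SJIS is set as the default in the type-specific settings, the file is read as UTF-8 every time and the text comes out garbled. This did not happen in versions before 2.2.0.1.
There is a bug where UTF-8 is chosen when the encoding name, such as MS932, cannot be recognized. -- Moca 2016-04-05 (Tue) 13:44:53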
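With this procedure, the file is read as UTF-8.
Once the file has been reopened with the correct encoding, the problem does not recur as long as the file remains in the history.
Note that the file was also read as UTF-8 in a fresh, first-launch state.
With a type-specific setting for XML configured to use SJIS, I think you will always run into this when opening an XML document that has never been opened before.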
The problem appears to be in the code that determines the encoding from the XML declaration.
When this file is opened, the encoding detection runs CESI::AutoDetectByXML.
Inside it, the string read from the declaration is compared against the encoding definitions (encodingNameToCode), and if it matches none of them the loop simply ends.
The code that runs after the loop assumes that "the XML declaration ended without an encoding specification" and reports UTF-8 to the caller, which is why the file was opened with the wrong encoding.
In addition, the XML specification states that an XML document without an encoding specification is treated as either UTF-8 or UTF-16, but this code does not consider the UTF-16 possibility. A document created in UTF-16 with the encoding specification omitted (which the specification allows) would therefore hit the same problem.
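To make the distinction concrete, here is a minimal sketch of reading the encoding pseudo-attribute so that "no encoding given" and "an encoding name was given" stay separate results. The function name `extractXmlEncoding` is made up for illustration and this is not sakura's actual code; it also assumes the head of the file is ASCII-compatible and does only a simplified scan of the declaration.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not from sakura): returns the value of the encoding
// pseudo-attribute in the XML declaration at the start of `head`.
// std::nullopt means "no XML declaration" or "declaration without encoding".
std::optional<std::string> extractXmlEncoding(std::string_view head)
{
    constexpr auto npos = std::string_view::npos;
    constexpr std::string_view blanks = " \t\r\n";

    if (head.substr(0, 5) != "<?xml") return std::nullopt;
    const auto end = head.find("?>");
    if (end == npos) return std::nullopt;
    const std::string_view decl = head.substr(0, end);

    const auto key = decl.find("encoding");
    if (key == npos) return std::nullopt;

    // Expect: optional blanks, '=', optional blanks, then a quoted value.
    auto pos = decl.find_first_not_of(blanks, key + 8);
    if (pos == npos || decl[pos] != '=') return std::nullopt;
    pos = decl.find_first_not_of(blanks, pos + 1);
    if (pos == npos) return std::nullopt;

    const char quote = decl[pos];
    if (quote != '"' && quote != '\'') return std::nullopt;
    const auto close = decl.find(quote, pos + 1);
    if (close == npos) return std::nullopt;

    return std::string(decl.substr(pos + 1, close - pos - 1));
}
```

With a result like this, a caller can tell a name it does not recognize (for example MS932) apart from a declaration that has no encoding at all, instead of collapsing both cases into UTF-8.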
Version: 2.2.0.1
The same symptom occurred in 2.3.0.0 as well. -- anonymous 2016-04-05 (Tue) 10:59:32
I was also able to reproduce it locally with the latest master at this time.
Fix for the bug above → upatchid:1050. When the XML has no encoding, it also raises the priority of the normal auto-detection. -- Moca 2016-04-05 (Tue) 13:54:10
On sf.net, patchunicode#1050 has already been proposed as a patch for this issue, but it does not consider the case where the document is UTF-16 and the encoding declaration is omitted.
It also adds aliases for encoding names, but the added aliases do not appear to be correct according to the XML specification.
As an aside, encodingNameToCode contains one duplicate entry (windows-1252). patchunicode#1050 adds one more (shift_jis).
The above is a transcription from BugReport/195 and patchunicode#1050, together with a report of my own investigation.
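> the added aliases do not appear to be correct according to the XML specification

Writing this in the issue body would make it too long, so I am splitting it into a comment.
To review patchunicode#1050, I looked at the XML specification.
According to section 4.3.3, the encoding declaration in an XML declaration has the following form.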
EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
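As a quick way to check candidate names against this production, here is a small sketch of a validator. The function name `isValidEncName` is my own and does not exist in sakura; it only checks the syntax of the name, not whether the encoding is registered anywhere.

```cpp
#include <string_view>

// Checks a name against the EncName production quoted above:
//   EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
bool isValidEncName(std::string_view name)
{
    auto isAlpha = [](char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    };
    auto isDigit = [](char c) { return c >= '0' && c <= '9'; };

    // The first character must be an ASCII letter.
    if (name.empty() || !isAlpha(name.front())) return false;

    // The rest may be letters, digits, '.', '_' or '-'.
    for (char c : name.substr(1)) {
        if (!(isAlpha(c) || isDigit(c) || c == '.' || c == '_' || c == '-')) {
            return false;
        }
    }
    return true;
}
```

For example, `Shift_JIS` and `x-sjis` pass, while a name containing a colon, such as `ISO_8859-1:1987`, does not satisfy the production even though it is a registered alias.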
Besides UTF-8/UTF-16, it seems that names defined by IANA and names starting with the "x-" prefix can be used, as long as they satisfy the format above.
The IANA definitions include the names usable in MIME and their aliases.
Aliases beginning with "cs" carry a note saying they were defined in RFC3808 for use in MIBs. A MIB is apparently a database used by SNMP and similar protocols.
HTML/CSS also seem to have referred to the IANA definitions in the past, but according to a W3C document they now refer to the list of encodings provided by WHATWG (in section 4.2 of the linked page), and the document says to choose names from the left-hand column.
Comparing the IANA definitions with the changes in patchunicode#1050, none of the added names were included except "MS_Kanji" and those beginning with "cs".
If aliases are to be added, I think they should at least be chosen from the IANA definitions or the WHATWG list.
It looks like two problems are described here.
Can I treat this as an issue that solves the problem stated in the title?
The problem not mentioned in the title seems quite important, so I am a bit unsure how to handle it.
The default character encoding is not applied. SJIS is set, but the file is read as UTF-8.
Cause: encoding="SJIS" is not supported.
Fix 1: The user rewrites it to a supported encoding name, Windows-31J.
Fix 2: The user rewrites it to a supported encoding name, x-sjis.
Fix 3: The developers add SJIS to the supported encodings.
Realistically, fix 3 is the only choice. :smile:
There is a bug where UTF-8 is chosen when the encoding name, such as MS932, cannot be recognized.
Cause: The logic that decides the encoding when the encoding name cannot be recognized is wrong?
Fix: By specification such a document should be treated as either UTF-8 or UTF-16, so fix the logic.
|No.|encoding|encoding alias|Condition|
|--|--|--|--|
|1|UTF-16|UTF16LE|The first 2 bytes match { 0xff, 0xfe }|
|2|UTF-16|UTF16BE|The first 2 bytes match { 0xfe, 0xff }|
|3|UTF-8|UTF8BOM|The first 3 bytes match { 0xef, 0xbb, 0xbf }|
|4|UTF-8|UTF8N|None of the above|
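The table above could be written roughly as follows. This is only a sketch: the enum `XmlDefaultEncoding` and the function `detectByBom` are names I made up after the aliases in the table, not identifiers from CESI.cpp.

```cpp
#include <cstddef>

// Hypothetical result type named after the aliases in the table above.
enum class XmlDefaultEncoding { Utf16Le, Utf16Be, Utf8Bom, Utf8N };

// Applies rows 1 to 4 of the table in order to the first bytes of a file.
XmlDefaultEncoding detectByBom(const unsigned char* data, std::size_t size)
{
    if (size >= 2 && data[0] == 0xff && data[1] == 0xfe) {
        return XmlDefaultEncoding::Utf16Le;   // row 1
    }
    if (size >= 2 && data[0] == 0xfe && data[1] == 0xff) {
        return XmlDefaultEncoding::Utf16Be;   // row 2
    }
    if (size >= 3 && data[0] == 0xef && data[1] == 0xbb && data[2] == 0xbf) {
        return XmlDefaultEncoding::Utf8Bom;   // row 3
    }
    return XmlDefaultEncoding::Utf8N;         // row 4: none of the above
}
```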
This function is the one that needs fixing, but it feels like the whole thing would be better off refactored.
https://github.com/sakura-editor/sakura/blob/8f58ec825d2cc29c192725b13e6820fd89718e8d/sakura_core/charset/CESI.cpp#L879-L884
Fixing it is easy, but reviewing it is a pain...
Ah, I may have misread things a little.
> 2. At this point, specify SJIS as the character encoding to use.

In short, this is problem 2. The encoding decision table is not correct. :sob:
|No.|encoding|encoding alias|Condition|
|--|--|--|--|
|1|UTF-16|UTF16LE|The first 2 bytes match { 0xff, 0xfe }|
|2|UTF-16|UTF16BE|The first 2 bytes match { 0xfe, 0xff }|
|3|UTF-8|UTF8BOM|The first 3 bytes match { 0xef, 0xbb, 0xbf }|
|4|(depends on settings)|-|None of the above, and the type-specific settings have a default encoding|
|5|UTF-8|UTF8N|None of the above|
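As a sketch, the revised table is the same chain with one extra fallback before UTF8N. The function `chooseEncoding` and its `typeDefault` parameter are hypothetical; passing the default in as an argument is an assumption about what the detection code would need to receive, not a description of the current code.

```cpp
#include <algorithm>
#include <initializer_list>
#include <optional>
#include <string>
#include <vector>

// Returns the alias from the table above for the given file head.
// `typeDefault` stands for the default encoding of the type-specific
// settings, if one is available to the caller (hypothetical parameter).
std::string chooseEncoding(const std::vector<unsigned char>& head,
                           const std::optional<std::string>& typeDefault)
{
    auto startsWith = [&head](std::initializer_list<unsigned char> bom) {
        if (head.size() < bom.size()) return false;
        return std::equal(bom.begin(), bom.end(), head.begin());
    };

    if (startsWith({0xff, 0xfe})) return "UTF16LE";        // row 1
    if (startsWith({0xfe, 0xff})) return "UTF16BE";        // row 2
    if (startsWith({0xef, 0xbb, 0xbf})) return "UTF8BOM";  // row 3
    if (typeDefault) return *typeDefault;                  // row 4
    return "UTF8N";                                        // row 5
}
```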
Currently, type-specific settings are selected by file extension, but the context of the encoding detection that works from the XML document content cannot refer to the type-specific settings of the data being analyzed.
So, well, applying the type-specific default encoding based on the file extension is not possible with the current design.
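Thanks for your work.
> Can I treat this as an issue that solves the problem stated in the title?
> The problem not mentioned in the title seems quite important, so I am a bit unsure how to handle it.

This is how I interpreted the sf.net page.
I think "even though SJIS is set" refers to the type-specific settings screen.
The Window tab of the type-specific settings has a place to set the default character encoding, and I took the report to mean that SJIS was selected there.
The defaults are line ending: CR+LF, encoding: UTF-8, no BOM, no CP, and "do not prefer CESU-8 during auto-detection".
The choices are the seven encodings SJIS/EUC-JP/Latin1/UTF-16/UTF-16BE/UTF-8/CESU-8.
I think the report means that SJIS is selected here and yet the file is read as UTF-8.
(The reproduction steps in the issue body are written to select SJIS with this in mind.)
As for problem ①, I think adding the names is fine.
Since there are also situations where someone edits the declaration, I think it is fine to add plausible names, including ones that are not in the IANA definitions.
However, some names that are in the IANA definitions have not been added either, and I am unsure what to do about those (there are a lot of them).
> Currently, type-specific settings are selected by file extension, but the context of the encoding detection that works from the XML document content cannot refer to the type-specific settings of the data being analyzed.

...is what I had thought.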
Regarding problem ② (I split this into a separate comment): when I tried it, the problem I was worried about did not occur for UTF-16 without an encoding specification. My concern turned out to be unfounded.
CESI::DetectUnicodeBom is called before CESI::AutoDetectByXML, so when a BOM is present the file is determined to be Unicode at that stage and is handled correctly.
When there is no BOM, execution does enter CESI::AutoDetectByXML, but a separate step identified the file as Unicode before the matching loop was reached.
So the only pattern that still needs handling seems to be a file that has no encoding specification, or specifies an unrecognized encoding, and is written in a non-Unicode encoding.
Incidentally, reading the patch to see how Moca-san, who was in charge at the time, dealt with this, the approach is as follows.
- Add aliases
- Return CODE_NONE instead of CODE_UTF8 when the encoding is not recognized
- Return CODE_AUTODETECT when there is no encoding specification so that auto-detection runs, and return CODE_UTF8 only if that cannot decide either

Judging from the code, when encoding detection ends with the result still CODE_NONE, the default encoding setting appears to be used. If so, the encoding from the type-specific settings should be applied.
I think this approach is fine.
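My reading of that approach, written out as a sketch. The names CODE_NONE, CODE_AUTODETECT and CODE_UTF8 come from the patch discussion, but the enum below is a stand-in with placeholder values, and `classifyXmlEncoding` and `resolveEncoding` are hypothetical functions, not the patch itself.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Stand-in enum; the values are placeholders, not sakura's real ones.
enum ECodeTypeSketch { CODE_NONE, CODE_AUTODETECT, CODE_UTF8, CODE_SJIS };

// Step 1: classify what the XML declaration says.
ECodeTypeSketch classifyXmlEncoding(const std::optional<std::string>& declared)
{
    if (!declared) return CODE_AUTODETECT;  // no encoding specification
    std::string lower = *declared;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (lower == "utf-8") return CODE_UTF8;
    if (lower == "shift_jis") return CODE_SJIS;
    return CODE_NONE;                       // name given but not recognized
}

// Step 2: combine with auto-detection and the type-specific default.
ECodeTypeSketch resolveEncoding(ECodeTypeSketch fromXml,
                                ECodeTypeSketch autoDetected,
                                ECodeTypeSketch typeDefault)
{
    if (fromXml == CODE_AUTODETECT) {
        // Only fall back to UTF-8 when auto-detection cannot decide.
        return autoDetected != CODE_NONE ? autoDetected : CODE_UTF8;
    }
    if (fromXml == CODE_NONE) return typeDefault;  // default setting applies
    return fromXml;
}
```

Under these assumptions, a declaration with `encoding="MS932"` ends up with the type-specific default instead of UTF-8.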
As for the aliases to add:
- An encoding that Sakura Editor can select when saving
- Present in either the IANA list or the WHATWG list

So I extracted the names that satisfy these two conditions. Even so, there are 39 of them.
"cscesu8", "cscesu-8"
"cseucpkdfmtjapanese", "extended_unix_code_packed_format_for_japanese", "x-euc-jp"
"csiso2022jp"
"cp819", "csisolatin1", "ibm819", "iso-ir-100", "iso8859-1", "iso88591", "iso_8859-1", "iso_8859-1:1987", "l1"
"csshiftjis", "ms932", "ms_kanji", "shift-jis", "sjis"
"unicodefffe", "utf-16be"
"csunicode", "iso-10646-ucs-2", "ucs-2", "unicode", "unicodefeff", "utf-16", "utf-16le"
"csutf7"
"unicode-1-1-utf-8", "unicode11utf8", "unicode20utf8", "utf8", "x-unicode20utf8"
"csWindows31J"
"cp1252", "cswindows1252", "x-cp1252"
If it is only about 39, I think we could just add them as they are.
I believe the relationships are IANA list ⊂ WHATWG list (the IANA list is contained in the WHATWG list) and
IANA list ⊂ Windows code page list (the IANA list is contained in the Windows code page list).
The problem is that Sakura Editor supports CESU-8 (!?), a code page that exists in neither the IANA list nor the Windows code page list, and I feel this makes the range of code pages Sakura Editor supports look unduly narrow.
With the "CP" checkbox, Sakura Editor can handle every code page in the Windows code page list. So if an encoding name can be converted to a Windows code page, it should be possible to support quite a large number of encodings.
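A rough sketch of what such a name-to-code-page conversion could look like. The function `encodingNameToCodePage` and the table are illustrative only: the table is deliberately tiny, and the numbers are the Windows code page identifiers as I understand them (932, 1252 and 28591 also appear in this thread).

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical lookup from an XML encoding name to a Windows code page.
std::optional<unsigned> encodingNameToCodePage(std::string_view name)
{
    struct Entry { std::string_view name; unsigned codePage; };
    static constexpr Entry table[] = {
        {"shift_jis", 932},    {"windows-31j", 932},
        {"windows-1252", 1252},
        {"iso-8859-1", 28591}, {"iso-8859-15", 28605},
        {"euc-jp", 51932},
        {"utf-8", 65001},
    };

    // Encoding names are compared case-insensitively.
    std::string lower(name);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    for (const Entry& e : table) {
        if (e.name == lower) return e.codePage;
    }
    return std::nullopt;  // unknown name: let the caller decide the fallback
}
```

A name such as CESU-8, which has no Windows code page, would still need to be handled separately.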
On rereading carefully, shift_jis was not duplicated. My apologies.
(Belatedly, I have reflected the correction in the issue body.)
I did not know about the CP feature, so I tried various things.
In the process I noticed that ISO-8859-1 is assigned code page 28591.
That made me wonder whether CODE_LATIN1 refers to 1252 or 28591.
In encodingNameToCode, windows-1252 is written as follows.
{ "windows-1252", 12, CODE_LATIN1 },
(omitted)
{"windows-1252", 12, 1252},
When a windows-1252 declaration is actually encountered, it seems to be detected as CODE_LATIN1.
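That behavior is what you would expect from a front-to-back scan of a table with a duplicated key: the first entry wins and the second is never reached. A minimal sketch, assuming a linear scan; the struct, the table and the placeholder value of CODE_LATIN1 below are stand-ins, not the real definitions.

```cpp
#include <cstddef>
#include <cstring>

constexpr int CODE_LATIN1 = 5;  // placeholder value for this sketch only

struct EncodingEntry { const char* name; int length; int code; };

// Two entries with the same key, in the same order as quoted above.
static const EncodingEntry kTable[] = {
    { "windows-1252", 12, CODE_LATIN1 },
    { "windows-1252", 12, 1252 },
};

// Returns the code of the first matching entry, or -1 if there is none.
int lookupFirstMatch(const char* name)
{
    const std::size_t len = std::strlen(name);
    for (const EncodingEntry& e : kTable) {
        if (len == static_cast<std::size_t>(e.length) &&
            std::memcmp(name, e.name, len) == 0) {
            return e.code;  // the later duplicate is unreachable
        }
    }
    return -1;
}
```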
> That made me wonder whether CODE_LATIN1 refers to 1252 or 28591.

It is Windows-1252 (= ISO-8859-15, a revision of ISO-8859-1).
https://github.com/sakura-editor/sakura/blob/8f58ec825d2cc29c192725b13e6820fd89718e8d/sakura_core/charset/CLatin1.cpp#L80-L81
> In encodingNameToCode, windows-1252 is written as follows.
> { "windows-1252", 12, CODE_LATIN1 }, (omitted) {"windows-1252", 12, 1252},
> When a windows-1252 declaration is actually encountered, it seems to be detected as CODE_LATIN1.

The two entries mean different things.
When the result is CODE_LATIN1, CLatin1, the class dedicated to Latin1 conversion, is used.
When the result is 1252, CCodePage, the general-purpose code page conversion class, is used.
The difference between the two is whether a custom-built conversion is used; I do not know which one is actually faster.
> It is Windows-1252 (= ISO-8859-15, a revision of ISO-8859-1).

Windows-1252 is different from ISO-8859-15.
https://ja.wikipedia.org/wiki/Windows-1252
https://ja.wikipedia.org/wiki/ISO/IEC_8859-15
Their code point assignments differ.
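To illustrate with the euro sign: ISO-8859-15 replaces eight positions of ISO-8859-1 and puts the euro sign at 0xA4, whereas Windows-1252 keeps 0xA4 as the currency sign and places the euro sign at 0x80. Below is a small sketch of the ISO-8859-15 side only; the function name `latin9ToUnicode` is my own.

```cpp
// Maps one ISO-8859-15 byte to its Unicode code point.
// Only eight positions differ from ISO-8859-1, where byte == code point.
char32_t latin9ToUnicode(unsigned char b)
{
    switch (b) {
    case 0xA4: return 0x20AC;  // EURO SIGN
    case 0xA6: return 0x0160;  // LATIN CAPITAL LETTER S WITH CARON
    case 0xA8: return 0x0161;  // LATIN SMALL LETTER S WITH CARON
    case 0xB4: return 0x017D;  // LATIN CAPITAL LETTER Z WITH CARON
    case 0xB8: return 0x017E;  // LATIN SMALL LETTER Z WITH CARON
    case 0xBC: return 0x0152;  // LATIN CAPITAL LIGATURE OE
    case 0xBD: return 0x0153;  // LATIN SMALL LIGATURE OE
    case 0xBE: return 0x0178;  // LATIN CAPITAL LETTER Y WITH DIAERESIS
    default:   return b;       // same as ISO-8859-1
    }
}
```

So a byte 0xA4 read as ISO-8859-15 is U+20AC, while the same byte read as Windows-1252 is U+00A4.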
I have submitted a change in #1428 so that the character encoding configured in the type-specific settings is applied.
Thank you for taking care of this.