file name: skf_1.94_man.txt
last update: 2006-07-30 23:27
type: Plain Text
editor: Seiji Kaneko
description: skf 1.94 English man page
language: English

SKF(1)                                                                  SKF(1)



NAME
       skf - simple Kanji Filter (v1.94)

SYNOPSIS
       skf [-AEIJKNQRSXZabdehjknqrsuvxz] [ long_format_options ] [infiles..]

DESCRIPTION
       skf is yet another i18n-capable kanji filter, designed for reading
       various CJK-coded files on the Net.  It converts input kanji text or
       streams into a character stream in the designated codeset and writes
       the result to standard output.  Specifically, skf is designed to be a
       versatile filter for reading documents in various codesets, and has no
       fancy features that are not directly related to code conversion.

       Like nkf, skf automatically recognizes the input file code when it is
       a kind of ISO-2022 compliant code, and also detects EUC-variant codes
       if the input file is Japanese text without X0201 kana.  skf 1.9x can
       read various iso-2022 compliant charsets, including JIS Kanji code
       (X0208, X0212 and X0213), EUC encoding (euc-jp (with x-0213 support),
       euc-cn, euc-kr and euc-tw), ISO European latins (ISO-8859-1 to 11,
       13/14/15/16), BS 4730, NF Z 62-010 and X0201 kana with ESC-(-I, SS0,
       Locking shift.  skf also supports some non-iso2022 compliant sets,
       including Microsoft Shift-JIS code, KOI-8-R/U, GB2312 (HZ), big5,
       VISCII (rfc1456, including VIQR), the Unicode standard (UCS2/UTF-16,
       UTF7 and UTF8), some MS codesets (cp1250 etc.) and some other
       vendor-specific codes (KEIS83, JEF etc).

       Supported output codesets include X-0208/X-0212/X-0213 JIS, X-0201 JIS,
       ASCII, Microsoft Shift-JIS, EUC-jp/-kr/-cn, HZ,  iso-2022-jp/kr,  big5,
       VISCII and Unicode.

       skf also provides basic decoding features for some common encodings,
       including MIME, Punycode and URI codepoint.

       Unlike nkf, skf is designed to convert input code  into  some  kind  of
       human-readable  form  under a local environment (i.e. codeset), and has
       several extra conversion features like GNU  recode.   Such  conversions
       include  Windows/Macintosh  specific  code  swap  and old-new jis glyph
       change, html-format/TeX format conversion and variant unifications.

       If file names are given, skf reads the files and writes the converted
       stream to stdout.  If no file names are given, input is taken from
       stdin and output goes to stdout.  Options are taken from the
       environment variables SKFENV and skfenv and from the command line, in
       this order.  Environment variables are not used when skf is running as
       a privileged user.  skf does not use LOCALE-related environment
       variables for conversion, but error message output is controlled by
       the given LOCALES.

OPTIONS
       skf-1.9 is written from scratch, and inherits no code from nkf.
       However, skf is intended to be a drop-in replacement for nkf (v1.4)
       and has a similar set of the commonly used nkf options.
       skf 1.9x recognizes the following options.  Options are off by default
       unless explicitly specified.

   buffering control
       -b     use buffered output. This is default.

       -u     use  unbuffered  output. Code detection feature is disabled when
              this option is on.

   Input/Output codeset options
       --ic=input_code_set
              Specify input_code_set as the input codeset.  Possible
              candidates are shown below.

       --oc=output_code_set
              Specify output_code_set as the output codeset.  Possible
              candidates are shown below.  The default codeset in the
              distribution package is euc-jp, but this depends on a compile
              option. Default codeset is shown by

     Supported codesets
       skf recognizes the following codesets as input/output codesets.  These
       codeset names are case insensitive, and minus ('-') and underscore
       ('_') are ignored.  Note that an iso-2022 escape-based input codeset
       (registered with IANA) is recognized automatically, even when a
       non-iso2022 codeset is specified.  An o in the in column means the
       named codeset can be specified as input, and an x means it cannot.
       The out column has the same meaning for output.

       in out  name            description
       o  o    iso8859-1       ascii + iso-8859-1 (latin-1)
       o  o    iso8859-2       ascii + iso-8859-2 (latin-2)
       o  o    iso8859-3       ascii + iso-8859-3 (latin-3)
       o  o    iso8859-4       ascii + iso-8859-4 (latin-4)
       o  o    iso8859-5       ascii + iso-8859-5 (Cyrillic)
       o  o    iso8859-6       ascii + iso-8859-6 (Arabic)
       o  o    iso8859-7       ascii + iso-8859-7 (Greek)
       o  o    iso8859-8       ascii + iso-8859-8 (Hebrew)
       o  o    iso8859-9       ascii + iso-8859-9 (latin-5)
       o  o    iso8859-10      ascii + iso-8859-10 (latin-6)
       o  o    iso8859-11      ascii + iso-8859-11 (Thai)
       o  o    iso8859-13      ascii + iso-8859-13 (Baltic Rim)
       o  o    iso8859-14      ascii + iso-8859-14 (Celtic)
       o  o    iso8859-15      ascii + iso-8859-15 (Latin-9)
       o  o    iso8859-16      ascii + iso-8859-16
       o  o    koi-8r          koi-8r (Russian)
       o  o    cp1251          Cyrillic latin MS cp1251
       o  o    jis             iso-2022-jp (rfc1468 7bit JIS)
       o  o    iso-2022-jp-x0213 iso-2022-jp-3 (JIS X-0213:2000).
                      a.k.a. jis-x0213
       o  o    jis-x0213-strict iso-2022-jp-3-strict
       o  o    iso-2022-jp-2004 iso-2022-jp-2004(JIS X-0213:2004)
                      a.k.a. jis-x0213-2004
       o  o    oldjis          iso-2022-jp-1978(JIS X-0208:1978)
       o  o    cp50220         JIS-encoded Microsoft codepage 932.
       o  o    euc-jp          EUC-encoded JIS X-0208:1997
       o  o    euc-x0213       EUC-encoded JIS X-0213:2000
       o  o    euc-jis-2004    EUC-encoded JIS X-0213:2004
       o  o    cp51932         EUC-encoded Microsoft codepage 932
       o  o    euc-kr          EUC-encoded KS X-1001 Korean
       o  o    euc7-kr         7bit EUC-encoded KS X-1001 Korean
       o  o    johab           KS X-1001-johab Korean
       o  o    euc-cn          EUC-encoded GB2312 Chinese
       o  o    euc7-cn         7bit EUC-encoded GB2312 Chinese
       o  o    hz              HZ-encoded GB2312 Chinese
       o  o    euc-tw          EUC-encoded CNS 11643 Chinese
       o  o    gb12345         EUC-encoded GB12345 Chinese
       o  o    gbk             GB2312 Extension(cp936) Chinese
       o  o    gb18030         GB18030 chinese
       o  o    big5            BIG5 (with Eten extension + EURO)
       o  o    cp950           BIG5 (Microsoft cp950 + EURO)
       o  o    big5p           BIG5 plus (with HKSCS)
       o  o    sjis            Shift-jis (Microsoft cp943)
       o  o    shift_jis-x0213 Shift-jis-encoded JIS X-0213:2000
       o  o    shift_jis-2004  Shift-jis-encoded JIS X-0213:2004
       o  x    sjis-cellular   Shift-jis-encoded JIS X-0208
                        with NTT Docomo, Vodafone phone glyph
       o  o    cp932           Shift-jis-encoded MS cp932
       o  o    cp50220         Jis-encoded MS cp50220
       o  o    cp51932         EUC-jp-encoded MS cp51932
       o  o    oldsjis         Shift-jis (JIS X-0208:1978)
       o  o    viscii          VISCII (rfc1456) Vietnamese
       o  o    viqr            VISCII (rfc1456-VIQR) Vietnamese
       o  o    keis            Hitachi KEIS83/90
       o  x    jef             Fujitsu JEF (basic support only)
       o  x    ibm930          IBM EBCDIC DBCS Japanese
       o  x    ibm931          IBM EBCDIC DBCS Japanese w.latin
       o  x    ibm933          IBM EBCDIC DBCS Korean
       o  x    ibm935          IBM EBCDIC DBCS Simpl. Chinese
       o  x    ibm937          IBM EBCDIC DBCS Trad. Chinese
       o  o    ucs2            Unicode(TM) UCS-2/UTF-16LE
       o  o    utf7            Unicode(TM) UTF-7
       o  o    utf8            Unicode(TM) UTF-8
       o  x    transparent     Transparent mode (see below)
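
       The name-matching rule stated above (case insensitive, with '-' and
       '_' ignored) can be pictured with a short Python sketch.  This is an
       illustration only, not skf's own code; the function names are
       hypothetical.

```python
def normalize_codeset_name(name: str) -> str:
    """Fold case and drop '-' and '_' so equivalent spellings compare equal."""
    return name.lower().replace("-", "").replace("_", "")


def same_codeset(a: str, b: str) -> bool:
    """True when two spellings name the same codeset under the rule above."""
    return normalize_codeset_name(a) == normalize_codeset_name(b)


if __name__ == "__main__":
    for spelling in ("EUC-JP", "euc_jp", "EucJp"):
        print(spelling, "->", normalize_codeset_name(spelling))
```

       Under this rule, spellings such as EUC-JP, euc_jp and eucjp would all
       select the same entry of the table.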


     Codeset explanations
       iso-8859-*
              a.k.a. latin*. When specified as output, G0 = GL is ascii and G1
              = GR is iso-8859-*. 8bit encoding is used.

       iso-2022-jp, jis
              Encoding is iso-2022-jp-2 (RFC1554). G0 = GL is JIS x0201 roman,
              G1  = GR is JIS x0201 kana, G2 is iso-8859-1 and G3 is JIS x0212
              Supplementary Kanji.

       jis-x0213, iso-2022-jp-3, iso-2022-jp-2003
              Encoding is iso-2022-jp-3.  For output, G0 = GL is JIS x0201
              roman, G1 = GR is JIS x0201 kana, G2 is iso-8859-1 and G3 is JIS
              x0213 plane2 Kanji.

       jis-x0213-strict
              Encoding is subset of iso-2022-jp-3-strict (uses Plane 1  only).
              For  output,  G0  =  GL is JIS x0201 roman, G1 = GR is JIS x0201
              kana, G2 is iso-8859-1 and G3 is not set.  Output  code  as  JIS
              x0208  whenever possible. JIS X-0213 input is automatically rec-
              ognized.

       jis-x0213-2004, iso-2022-jp-2004
              Encoding is iso-2022-jp-2004. For output, G0 =  GL  is  JIS
              x0201  roman, G1 = GR is JIS x0201 kana, G2 is iso-8859-1 and G3
              is JIS x0213 plane2 Kanji.

       oldjis
              Encoding is iso-2022-jp (JIS X-0208:1978). G0 = GL is JIS  x0201
              roman, G1 = GR is JIS x0201 kana, G2 is iso-8859-1 and G3 is JIS
              x0212 Supplementary Kanji.

       euc-jp, euc
              Encoding is 8-bit EUC using JIS X-0208:1997 character set.  G0 =
              GL  is  ascii, G1 = GR is JIS x0208, G2 is JIS x0201 kana and G3
              is JIS x0212 Supplementary Kanji.

       euc-x0213, euc-jis-2003
              Encoding is 8-bit EUC-based JIS X0213:2000.  G0 = GL  is  ascii,
              G1  =  GR is X0213 plane 1, G2 is iso-8859-1 and G3 is JIS x0213
              plane2 Kanji.

       euc-jis-2004
              Encoding is 8-bit EUC-based JIS X0213:2004.  G0 = GL  is  ascii,
              G1  =  GR  is X0213:2004 plane 1, G2 is iso-8859-1 and G3 is JIS
              x0213 plane2 Kanji.

       euc-kr
              Encoding is 8-bit EUC using KS X-1001 Wansung character set.  G0
              = GL is KS X1003, G1 = GR is KS X1001, G2 and G3 are not set.

       euc7-kr, iso-2022-kr
              Encoding is iso-2022-kr (rfc1557): 7-bit EUC using KS X-1001
              Wansung character set.  G0 = GL is KS X1003, G1 is KS X1001, G2
              and G3 are not set.

       euc-cn
              Encoding is 8-bit EUC using GB 2312 Simplified Chinese character
              set.  G0 = GL is ASCII, G1 = GR is GB2312, G2 and G3 are not
              set.

       euc7-cn
              Encoding is 7-bit EUC using GB 2312 Simplified Chinese character
              set.  G0 = GL is ASCII, G1 is GB2312, G2 and G3 are not set.

       hz
              Encoding is HZ encoded (rfc1842) GB 2312 Simplified Chinese
              character set.  G0 = GL is ASCII, G1 = GR is GB2312, G2 and G3
              are not set.

       euc-tw
              Encoding is EUC encoded CNS11643 Plane1/2 Traditional Chinese
              character set. Subset of iso-2022-cn.  G0 = GL is ASCII, G1 = GR
              is CNS11643 plane 1, G2 is CNS11643 plane 2 and G3 is not set.

       gb12345
              Encoding is 8-bit EUC using GB 12345 (GBF) Traditional Chinese
              character set.  G0 = GL is ASCII, G1 = GR is GB12345, G2 and G3
              are not set.

       gbk, cp936
              Encoding is GBK Simplified Chinese character set.  G0 = GL is
              ASCII and G1 = GR is GBK. G2 and G3 are not set.

       gb18030
              Encoding is GB18030 (ibm-1392, Windows cp54936) Chinese
              character set.  G0 = GL is ASCII and G1 = GR is GB18030. G2 and
              G3 are not set.

       big5
              Encoding is Big5 Traditional Chinese character set with ETen
              extension.  Includes Euro mapping.  Uses ASCII as latin part.

       big5-cp950
              Encoding is cp950-Big5 Traditional Chinese character set.  Uses
              ASCII as latin part.

       big5p
              Encoding is cp950-Big5 Traditional Chinese character set with
              HKSCS extension.  Uses ASCII as latin part.

       VISCII (experimental)
              Vietnamese VISCII (rfc1456) character set. Not TCVN-5712.

       VIQR (experimental)
              Vietnamese VISCII character set with VIQR encoding (rfc1456).

       sjis
              Encoding is Shift-encoded JIS X-0208:1997 character  set.   Note
              that this is not cp932. Uses JIS x-0201 latin as latin(GL) part.

       sjis-x0213, shift_jis-2003
              Encoding is Microsoft JIS using JIS X0213:2000 character set.

       sjis-x0213-2004, shift_jis-2004
              Encoding is Microsoft JIS using JIS X0213:2004 character set.
              Ten newly defined characters are added, but the Unicode mapping
              is the same as JIS X0213:2000.  Uses JIS x-0201 latin as
              latin(GL) part.

       sjis-cellular (experimental)
              Encoding is Shift-encoded JIS X-0208:1997 character set with NTT
              Docomo/Vodafone cellular phone glyph mapping.

       cp932
              Encoding  is Microsoft SJIS cp932 with NEC/IBM gaiji area, based
              on Windows XP mapping.  Uses JIS x-0201 latin as latin(GL) part.

       cp51932
              Encoding is Microsoft EUC-based cp51932 with NEC/IBM gaiji area,
              based on Windows XP mapping.  Uses JIS x-0201 latin  as  EUC  G2
              part.

       cp50220
              Encoding is Microsoft JIS-based cp50220 with NEC/IBM gaiji area,
              based on Windows XP mapping.  For input,  skf  accepts  cp50220,
              50221  and 50222.  Note that this codeset is NOT compatible with
              iso-2022.

       oldsjis
              Encoding is Microsoft SJIS (JIS  X-0208:1978  a.k.a.  old  JIS).
              Uses JIS x-0201 latin as latin(GL) part.

       johab
              Encoding  is  KS X1001(Johab) character set. Uses KS X1003 latin
              as latin(GL) part.

       uhc
              Encoding is UHC (cp949) character set. Uses KS  X1003  latin  as
              latin(GL) part.

       ucs2, utf16
              Encoding is Unicode UTF-16 (v4.1).  The default input/output
              byte order is little endian, and an input byte order mark is
              recognized.  Output includes an endian mark by default unless
              --disable-endian-mark is specified.  The output range is the
              full UTF-32 range, using surrogate pairs, unless --limit-to-ucs2
              is specified.  Note that ucs2 is not supported in the perl/ruby
              extensions for either input or output, because of a data
              structure limitation; specifying ucs2 there generates an error.

       utf8
              Encoding is UTF-8 encoded Unicode (v4.1). Output doesn't include
              byte  order mark unless --enable-endian-mark is specified.  Out-
              put range is within UTF-32 unless --limit-to-ucs2 is  specified.

       utf7
              Encoding is UTF-7 encoded Unicode (v4.1).  The output range is
              limited to UTF-16, and values above U+10000 are regarded as
              undefined.

       keis (experimental)
              Encoding is Hitachi KEIS83/90. Output range is limited to EBCDIK
              and JIS X-0208 area.

       jef (experimental)
              Encoding is Fujitsu JEF. Input only. Only  basic  part  is  sup-
              ported.

       ibm930 (experimental)
              Encoding is IBM DBCS Japanese with EBCDIC Kana.

       ibm931 (experimental)
              Encoding is IBM DBCS Japanese with EBCDIC latin (ibm037).

       ibm933 (experimental)
              Encoding is IBM DBCS Korean with EBCDIC Wansung character set.

       ibm935 (experimental)
              Encoding is IBM DBCS Simplified Chinese with EBCDIC Chinese.

       ibm937 (experimental)
              Encoding is IBM DBCS Traditional Chinese with EBCDIC Chinese.

       koi8r
              Russian KOI-8R code.

       cp1250
              Central European latin Microsoft cp1250 code.

       cp1251
              Eastern European cyrillic Microsoft cp1251 code.

       transparent
              Transparent mode.  Various code control features, including
              folding and line end code conversion, are ignored.


     Shortcuts
       -n -j  same as --oc=jis

       -s -x  same as --oc=sjis

       -a -e  same as --oc=euc-jp

       -q     same as --oc=ucs2

       -z     same as --oc=utf8

       -y     same as --oc=utf7

       -k     same as --oc=keis


       -A, -E same as --ic=euc-jp. Assume input code set is EUC-JP.

       -N     same as --ic=jis. Assume input code set is iso-2022-jp.

       -S, -X same as --ic=sjis. Assume input code set is Microsoft JIS.

       -Q     same as --ic=ucs2.

       -Y     same as --ic=utf7.

       -Z     same as --ic=utf8.

       -K     same as --ic=keis.


     ISO-2022 Specific controls
       These options replace G0-3, after they have been set up according to
       the specified input codeset, with the assigned character set.

       --set-g0=`charset name'
              Predefine  specified code set to plane 0 (G0). Also set to GL at
              initial state.

       --set-g1=`charset name'
              Predefine specified code set to right plane (G1). Also set to GR
              at initial state.

       --set-g2=`charset name'
              Predefine specified code set to right plane (G2).

       --set-g3=`charset name'
              Predefine specified code set to right plane (G3).


       Supported `charset name' values are as follows.  'o' means the codeset
       can be specified for that plane; 'x' means it cannot.


       g0 g1 g2 g3    codeset name   description
       o  o  o  o     ascii          ANSI X3.4 ASCII
       o  o  o  o     x0201          JIS X 0201 (latin part)
       x  o  o  o     iso8859-1      ISO 8859-1 latin
       x  o  o  o     iso8859-2      ISO 8859-2 latin
       x  o  o  o     iso8859-3      ISO 8859-3 latin
       x  o  o  o     iso8859-4      ISO 8859-4 latin
       x  o  o  o     iso8859-5      ISO 8859-5 Cyrillic
       x  o  o  o     iso8859-6      ISO 8859-6 Arabic
       x  o  o  o     iso8859-7      ISO 8859-7 Greek-latin
       x  o  o  o     iso8859-8      ISO 8859-8 Hebrew
       x  o  o  o     iso8859-9      ISO 8859-9 latin
       x  o  o  o     iso8859-10     ISO 8859-10 latin
       x  o  o  o     iso8859-11     ISO 8859-11 Thai
       x  o  o  o     iso8859-13     ISO 8859-13 latin
       x  o  o  o     iso8859-14     ISO 8859-14 latin
       x  o  o  o     iso8859-15     ISO 8859-15 latin
       x  o  o  o     iso8859-16     ISO 8859-16 latin
       x  o  o  o     tcvn5712       TCVN 5712 (Vietnamese)
       x  o  o  o     ecma113        ECMA 113 Cyrillic
       o  o  o  o     x0212          JIS X-0212:1990
       o  o  o  o     x0208          JIS X-0208:1997
       o  o  o  o     x0213          JIS X-0213 Plane 1:2000
       o  o  o  o     x0213-2        JIS X-0213 Plane 2:2000
       o  o  o  o     x0213n         JIS X-0213 Plane 1:2004
       o  o  o  o     gb2312         Simplified Chinese GB2312
       o  o  o  o     gb1988         Chinese GB1988(latin)
       o  o  o  o     gb12345        Traditional Chinese GB12345
       o  o  o  o     ksx1003        Korean KS X 1003(latin)
       o  o  o  o     ksx1001        Korean KS X 1001
       x  o  o  o     koi8-r         Cyrillic KOI-8R
       x  o  o  o     koi8-u         Ukrainian Cyrillic KOI-8U
       o  o  o  o     cns11643       Traditional Chinese CNS11643-1
       x  o  o  o     viscii-r       RFC1456 VISCII (right plane)
       o  o  o  o     viscii-l       RFC1456 VISCII (left plane)
       o  o  o  o     vni            Vietnamese VNI
       x  o  o  o     cp437          Microsoft cp437 (US latin)
       x  o  o  o     cp737          Microsoft cp737
       x  o  o  o     cp775          Microsoft cp775
       x  o  o  o     cp850          Microsoft cp850
       x  o  o  o     cp852          Microsoft cp852
       x  o  o  o     cp855          Microsoft cp855
       x  o  o  o     cp857          Microsoft cp857
       x  o  o  o     cp860          Microsoft cp860
       x  o  o  o     cp861          Microsoft cp861
       x  o  o  o     cp862          Microsoft cp862
       x  o  o  o     cp863          Microsoft cp863
       x  o  o  o     cp864          Microsoft cp864
       x  o  o  o     cp865          Microsoft cp865
       x  o  o  o     cp866          Microsoft cp866
       x  o  o  o     cp869          Microsoft cp869
       x  o  o  o     cp874          Microsoft cp874
       x  o  o  o     cp932          Microsoft cp932 (Japanese)
       x  o  o  o     cp1250         Microsoft cp1250 (Central Europe)
       x  o  o  o     cp1251         Microsoft cp1251 (Cyrillic)
       x  o  o  o     cp1252         Microsoft cp1252 (Latin-1)
       x  o  o  o     cp1253         Microsoft cp1253 (Greek)
       x  o  o  o     cp1254         Microsoft cp1254 (Turkish)
       x  o  o  o     cp1255         Microsoft cp1255
       x  o  o  o     cp1258         Microsoft cp1258

       --euc-protect-g1
              In EUC input mode, suppress sequences to set a  charset  to  G1.
              Such sequences are discarded.

       --add-annon
              Add  announcer for JIS X-0208:1997 to X-0208 designate sequence.
              This option works only with iso-2022-based output.

       --disable-jis90
              Disable 2 added characters of JIS X-0208:1997. If this option is
              specified,  these two characters are replaced by Kanji variants.
              This option is off by default.

       --input-detect-jis78
              Distinguish JIS X-0208:1978 codeset and JIS X-0208:1997 codeset.
              By default, these two charsets are both regarded as X-0208:1997.
              This option is valid only when input encoding is JIS (ISO-2022).


     JIS X-0212(Supplement Kanji code) Support
       --x0212-enable
              skf by default does not output JIS X-0212 code.  This option
              enables use of the JIS X-0212 part.  The output codeset must be
              neither a Microsoft codeset nor KEIS.  For Unicode variant
              encodings, this option is on by default.  This option is
              supported for backward compatibility and may not be supported in
              future versions.


     Unicode coding specific control options
       --use-compat
              When output is one of the Unicode transformation formats, enable
              characters in the compatibility plane (0xfxxx).  If disabled,
              these characters are converted to variants or treated as
              undefined.

       --use-ms-compat
              When output is Unicode, make the translation Microsoft Windows
              compatible.  This only affects some symbols in JIS-Kanji, and
              adding the --use-compat option is recommended.

       --use-cde-compat
              When output is Unicode, make translation  CDE  standard  codeset
              compatible.

       --little-endian
              When  output  is  Unicode, use little endian byte-order. This is
              default.

       --big-endian
              When output is Unicode, use big endian byte-order.

       --disable-endian-mark
              When output is UTF-16, do not use byte order  marking.  To  make
              UTF-16N,  use  this  option with --little-endian. This is off by
              default.

       --enable-endian-mark
              When output is UTF-8, output byte order marking. This is off  by
              default.

       --input-little-endian
              When  input  is  Unicode,  assume  input  is little endian byte-
              ordered.  This is default, but skf respects byte-order mark.

       --input-big-endian
              When input is Unicode, assume input is big endian  byte-ordered.
              Note that skf respects byte-order mark.

       --endian-protect
              Do  not use endian mark in the input stream. Endian mark is just
              discarded.  This is off by default.

       --use-replace-char
              skf by default converts undefined (except 0x2xxx  part)  charac-
              ters into "geta (U+3013)" code in Japanese codeset.  This option
              specifies skf to use replacement char (U-fffc) instead.

       --limit-to-ucs2
              Do not use > 0x10000 area code in Unicode (i.e. limits  code  to
              ucs2 area).  This is off by default.

       --disable-cjk-extension
              Treat  CJK  extension  A/B  area as undefined. This is off (i.e.
              these areas are enabled) by default.

       --old-hangul-location
              Treat input U-3400 area as hangul (Unicode  1.0  compatibility).
              This is off by default.
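
       The effect of the endian and byte-order-mark options can be seen at
       the byte level with a small Python sketch.  It uses only the Python
       standard library and does not call skf; decode_utf16 is a hypothetical
       helper that mimics the "respect the mark, else assume a byte order"
       behaviour described above.

```python
import codecs

text = "\u6f22\u5b57"  # the two kanji U+6F22 U+5B57

# UTF-16 output: little endian with a byte order mark (the stated default).
utf16_le_with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# What dropping the mark would leave (cf. --disable-endian-mark).
utf16_le_no_bom = text.encode("utf-16-le")
# Big endian byte order (cf. --big-endian).
utf16_be_with_bom = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
# UTF-8 output: no mark by default, a mark with --enable-endian-mark.
utf8_no_bom = text.encode("utf-8")
utf8_with_bom = codecs.BOM_UTF8 + text.encode("utf-8")


def decode_utf16(data: bytes, assume: str = "little") -> str:
    """Honour a byte order mark if present, else use the assumed order."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    codec = "utf-16-le" if assume == "little" else "utf-16-be"
    return data.decode(codec)


if __name__ == "__main__":
    print(utf16_le_with_bom.hex(" "))
    print(utf16_be_with_bom.hex(" "))
    print(utf8_with_bom.hex(" "))
```

       A reader that honours the mark decodes both byte orders identically;
       the assumed order only matters for input that carries no mark.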


     Codeset/Vendor Specific codeset handling flags
       skf by default assumes machine specific parts of kanji code are
       Microsoft Windows compatible.  Here are some options that control this
       behavior.  Options in this category are valid when the output codeset
       is a Japanese codeset, except --disable-chart.

       --use-apple-gaiji
              Assume machine specific part in input file is Macintosh  (System
              7,8,9 or OS X) compatible.

       --disable-ibm-gaiji
              Disable machine specific part in input file.

       --disable-chart
              Do  not  use  Moji-keisen  characters. This is for old Macintosh
              system (System 6.x or older) compatibility.


     Miscellaneous codeset related options
       --old-nec-compat
              Enable old NEC kanji sequence (ESC-K,H).  Needs  compile  option
              --enable-oldnec at configuration.

       --no-utf7
              Assume  input  code  set  is  *NOT*  UTF-7 encoded Unicode. This
              option disables input utf7 testing.

       --no-kana
              Assume input code set does *NOT* include JIS  x0201  kana.  Also
              suppresses Unicode half width variants.


   Output conversion options
       skf has various features to fit output files to the local environment,
       and many of these are controlled by the extended control switches
       described in this section.

       --use-g0-ascii
              set  G0(=GL) for output encoding to ASCII, ignoring codeset des-
              ignation.

     X-0201 Kana/latin conversions
       skf by default converts X-0201 kana to X-0208 kana.  To output X-0201
       kana as it is, use one of the following options.  When output is
       designated as EUC or SJIS, these three options enable X-0201 kana
       output by the means each codeset provides.  When Unicode output is
       specified, output of the (equivalent) kana part is controlled by
       --use-compat, not by the following switches.  They are valid only when
       the output codeset is a non-Unicode Japanese codeset.

       --kana-jis7
              Use the SI/SO locking shift sequence to designate X-0201 kana.
              This switch is valid for jis, jis-x0213 and cp50220 (i.e.
              cp50221) encoding.  For other codesets, this option is ignored.

       --kana-jis8
              Output X-0201 kana using the 8-bit code right plane.  This
              switch is valid for jis and jis-x0213 encoding.  For other
              codesets, this option is ignored.

       --kana-esci --kana-call
              Use ESC-(-I to designate X-0201 kana.  This switch is valid for
              jis, jis-x0213 and cp50220 (i.e. cp50222) encoding.  For other
              codesets, this option is ignored.

       --kana-enable
              use  X-0201 kana when EUC (with G2) or SJIS output code is used.
              When JIS output, it is same as --kana-call.
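
       The default X-0201 to X-0208 kana conversion can be approximated on
       Unicode text, where X-0201 kana correspond to the half-width katakana
       block (U+FF61 to U+FF9F).  The Python sketch below is an analogy built
       on Unicode NFKC normalization, not skf's own mapping tables; the
       function name is hypothetical.

```python
import re
import unicodedata

# Runs of half-width katakana and their sound marks (U+FF61..U+FF9F).
_HALFWIDTH_KANA_RUN = re.compile("[\uff61-\uff9f]+")


def widen_halfwidth_kana(text: str) -> str:
    """Replace half-width kana with full-width kana, leaving other text alone.

    NFKC is applied per run so that a kana followed by a half-width voiced
    sound mark is composed into a single full-width character.
    """
    return _HALFWIDTH_KANA_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group(0)), text
    )


if __name__ == "__main__":
    sample = "\uff83\uff9e\uff70\uff80"  # half-width "de-ta"
    print(widen_halfwidth_kana(sample))
```

       Restricting normalization to kana runs keeps full-width latin letters
       and other compatibility characters unchanged.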


     URI/TeX conversion feature options
       With Unicode(tm) family output codings, skf outputs the non-ascii
       latin character part as it is, but with other output codings, skf
       converts these characters using the following rules:

       (1) If the code is defined in the specified output codeset, it is
       output in this codeset.
       (2) If one of the following html convert modes is enabled (i.e.
       --convert-html --convert-sgml) and the code is defined in the html/sgml
       codeset, it is converted to an entity reference or codepoint reference.
       (3) If tex convert mode is enabled and the code is defined in the tex
       codeset, it is converted to tex format.
       (4) If the code is a kind of combined ligature, it is shown by a set of
       characters.
       (5) A kind of replacement character is shown, with warning.
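
       The order of rules (1) to (5) amounts to a fallback chain.  The Python
       sketch below illustrates that order only; it is not skf's code.  The
       TeX table, the use of NFKD to split ligatures and the '?' replacement
       are assumptions made for the example.

```python
import unicodedata
from html.entities import codepoint2name

# Hypothetical two-entry TeX table; a real table would be far larger.
TEX_TABLE = {"\u00e9": "\\'e", "\u00fc": '\\"u'}


def render_char(ch, output_codec, html_mode=False, tex_mode=False):
    """Render one character by trying rules (1) to (5) in order."""
    # (1) Defined in the output codeset: emit it directly.
    try:
        return ch.encode(output_codec)
    except UnicodeEncodeError:
        pass
    # (2) html mode: use a named entity reference when one exists.
    if html_mode and ord(ch) in codepoint2name:
        return ("&" + codepoint2name[ord(ch)] + ";").encode("ascii")
    # (3) TeX mode: use the TeX notation when the table has one.
    if tex_mode and ch in TEX_TABLE:
        return TEX_TABLE[ch].encode("ascii")
    # (4) Ligature: show it as its component characters (assumed via NFKD).
    parts = unicodedata.normalize("NFKD", ch)
    if parts != ch and parts.isascii():
        return parts.encode("ascii")
    # (5) Nothing applies: emit a replacement character.
    return b"?"


if __name__ == "__main__":
    print(render_char("\u00e9", "iso8859-1"))
    print(render_char("\u00e9", "ascii", html_mode=True))
    print(render_char("\ufb01", "ascii"))
```

       Each rule is consulted only when every earlier rule has failed, so
       enabling html or TeX mode never changes characters the output codeset
       can already represent.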

       --convert-html --convert-sgml
              Enable html convert mode. This mode is cleared by --reset. These
              two options are synonyms, and are treated as same option.

       --convert-html-decimal
              Enable html  code-point  decimal  convert  mode.  This  mode  is
              cleared by --reset.

       --convert-html-hexadecimal
              Enable  html  code-point  hexadecimal convert mode. This mode is
              cleared by --reset.

       --convert-tex
              Enable TeX convert mode. This mode is cleared by --reset.

       --use-iso8859-1
              Enable iso-8859-1 output. Iso-8859-1 is invoked to G1 and set to
              GR plane.

       --use-iso8859-1-right
              Enable  7-bit  iso-8859-1  output.  Iso-8859-1  is invoked to G1
              plane.

   Encoding control options
       --decode=`encoding scheme'
              Specify the encoding scheme of the input stream.  Supported
              scheme names are 'hex' (CAP hex-code), 'mime' (mime), 'mime_q'
              (mime Q-encoding), 'mime_b' (mime B-encoding), 'uri_encode'
              (uri character reference), 'puny' (ACE punycode) and
              'hex_perc_encode' (uri percent notation); base64, Q-encoding,
              rfc2231 and rot13/47 are also supported.  Only one decode option
              is valid; if more than one is specified, the last one is used.
              When mime decoding is specified, the base text is assumed to be
              EUC encoded unless specified otherwise.  Except for rot, which
              assumes the input stream is Shift_JIS, EUC or iso-2022-jp, these
              encodings assume the input stream is ascii (as defined in
              RFC2045).  Some encodings may co-exist with encoding, but this
              is not guaranteed.  In particular, if the input is UTF-16/UCS2
              code, these encodings are ignored by skf.
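
       To show what three of these schemes look like on the wire, the Python
       sketch below decodes a MIME encoded-word, a percent-encoded string and
       a punycode label with the Python standard library.  It is an
       illustration of the encodings themselves and does not call skf; the
       helper names are hypothetical.

```python
from email.header import decode_header
from urllib.parse import unquote


def decode_mime_word(word: str) -> str:
    """Decode MIME encoded-words (B or Q encoding) into text."""
    parts = []
    for payload, charset in decode_header(word):
        if isinstance(payload, bytes):
            parts.append(payload.decode(charset or "ascii"))
        else:
            parts.append(payload)
    return "".join(parts)


def decode_percent(text: str, charset: str = "utf-8") -> str:
    """Decode URI percent notation, interpreting the bytes in charset."""
    return unquote(text, encoding=charset)


def decode_punycode_label(label: str) -> str:
    """Decode one ACE label, with or without its 'xn--' prefix."""
    if label.lower().startswith("xn--"):
        label = label[4:]
    return label.encode("ascii").decode("punycode")


if __name__ == "__main__":
    print(decode_mime_word("=?UTF-8?B?44GT44KT44Gr44Gh44Gv?="))
    print(decode_percent("%E6%BC%A2%E5%AD%97"))
    print(decode_punycode_label("xn--mnchen-3ya"))
```

       In each case the transport encoding is removed first, and the
       resulting bytes are then interpreted in a character set, which is why
       the base text codeset matters for mime decoding.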

   End of line control options
       --lineend-thru
              Output end of line code as it is. Also output ^Z code as it  is.
              This is default.

       --lineend-cr --lineend-mac
              Use  CR  as  end  of  line  code. Also delete ^Z code from input
              stream.

       --lineend-lf --lineend-unix
              Use LF as end of line code.  Also  delete  ^Z  code  from  input
              stream.

       --lineend-crlf --lineend-windows
              Use  CR+LF  as  end of line code. Also delete ^Z code from input
              stream.  This option doesn't preserve original order of  cr  and
              lf.

       --input-cr
              Assume input stream uses CR as end of line code.

       --input-lf
              Assume input stream uses LF as end of line code.

       --input-crlf
              Assume input stream uses CR+LF as end of line code.

       -F[line_length[-kinsoku]]

       -f[line_length[-kinsoku]] -f[line_length[+kinsoku]]
              Wrap input lines at line_length columns. The f option deletes
              CR/LFs in the input, and the F option doesn't delete them. For
              Japanese convention, both gyoutou-kinsoku (by burasage-gumi)
              and gyoumatsu-kinsoku (by oidasi-gumi) are supported. The
              burasage-length is controlled by the kinsoku option. The default
              value for line_length is 66, and it must be < 1000. The default
              value for kinsoku is 5, and it must be <= 10. With the 'f'
              option, skf autodetects paragraphs and retains some CR/LFs. The
              2nd 'f' option format (with '+') disables this behaviour. In nkf
              compatible mode, fold behavior changes as follows.
              (1) Default line_length is set to 60, and kinsoku value is 10.
              (2) Alphanumeric characters become gyoutou-kinsoku characters.
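
              Folding by columns rather than by characters can be sketched
              in Python as below. It is not skf code: the names char_width
              and fold are hypothetical, counting wide and fullwidth
              characters as two columns is an assumption, and kinsoku
              handling and paragraph detection are left out entirely.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Return 2 for East Asian wide/fullwidth characters, otherwise 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def fold(text: str, line_length: int = 66) -> list:
    """Split one logical line into pieces of at most `line_length` columns.

    The input is assumed to contain no CR/LF. No kinsoku rule is applied,
    so a piece may begin with a character that a real implementation
    would keep on the previous line.
    """
    lines = []
    current = []
    width = 0
    for ch in text:
        w = char_width(ch)
        if current and width + w > line_length:
            lines.append("".join(current))
            current = []
            width = 0
        current.append(ch)
        width += w
    if current:
        lines.append("".join(current))
    return lines
```

              The default of 66 columns mirrors the documented default for
              line_length.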

   File control options
       --filewise-detect --force-reset
              Reset and re-detect input code set at the start of each file.

       --linewise-detect
              Reset  and  re-detect  input code set at the start of each line.
              This option needs -DKUNIMOTO at compile time.


   Compatibility options
       --nkf-compat
              Interpret the following options in an nkf-compatible manner.

       --skf-compat
              Interpret the following options in the skf-native manner.


   Misc. Control options
       --disable-space-convert
              By default, skf converts an ideographic space into two ascii
              spaces.  This option disables this behavior.

       --html-sanitize
              Convert several characters in an HTML document to entity
              reference expressions. Specifically, "!#$&%()/<>:;?' are
              escaped as entity references.
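
              A rough Python sketch of this kind of escaping follows. It is
              not skf code: the name html_sanitize is hypothetical, reading
              the character list as including both quote marks is an
              assumption, and decimal numeric character references are used
              only for illustration (the references skf actually emits may
              differ).

```python
# Characters to escape, read from the option description above.
# Treating both quote characters as part of the set is an assumption.
SANITIZE_CHARS = "\"!#$&%()/<>:;?'"


def html_sanitize(text: str) -> str:
    """Replace each listed character with a decimal character reference."""
    # Each input character is examined exactly once, so the '&', '#' and ';'
    # of a generated reference are never escaped a second time.
    return "".join(
        f"&#{ord(ch)};" if ch in SANITIZE_CHARS else ch
        for ch in text
    )


sample = html_sanitize("<a>")
```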

       --filewise-detect --force-reset
              If  multiple  input  files are given, detect input code for each
              file.

       --linewise-detect
              Detect input code line-wise. Note that this option weakens
              code detection.  It requires the configure-time compile option
              --enable-kunimoto.

       --reset
              Reset all flags specified by extended controls and  given  input
              code.

       --inquiry --guess
              skf detects the input code and outputs the detection result to
              stdout. No filtering output is performed. If multiple input
              files are given, --show-filename is automatically enabled.

       --hard-inquiry
              Similar to --inquiry, but reports both the code and the line
              end character.

       --suppress-filename
              When inquiry(--inquiry) is on, this option  disables  file  name
              output.  This option overrides --show-filename.

       --show-filename
              When  inquiry(--inquiry)  is on, this option adds each file name
              to output.

       --invis-strip
              Delete all escape  sequences  not  belonging  to  ISO-2022  code
              extension.  This  is intended to replace invisstrip command bun-
              dled in inews package.

       -I     Warn if input has unassigned code points.

       -v     print version and exit.

       -h --help
              print brief help.

       --show-supported-codeset
              Display supported codeset (input) and exit.

       --show-supported-charset
              Display supported character set (output) and exit.

       -%[debug_level]
              Enable skf debugging. Debug level is one digit. 0 is  the  least
              verbose,  and with -%9 you'll get whole traces within skf.  This
              option needs compile option --enable-debug.


FILES
       /usr/(local/)share/skf/lib/   (Unices)

       /Program Files/skf/share/lib (MS Windows)
              These directories are where external codeset conversion tables
              go.  The location that the current skf assumes is shown by the
              -h option.


AUTHOR
       skf is written by Seiji Kaneko (skaneko@a2.mbn.or.jp), based on an idea
       from nkf written by Itaru Ichikawa (ichikawa@flab.fujitsu.co.jp).  The
       X-0213 code table is derived from the work of earthian@tama.or.jp.
       Some codeset mappings are derived from various sources. Detailed
       origins are shown in the copyright document included in this
       distribution.


ACKNOWLEDGEMENT
       skf  is  inspired  by   works   or   requests   by   shinoda@cs.titech,
       kato@cs.titech,  uematsu@cs.titech, void@global ohta@ricoh, Hinata(HKE)
       Ashizawa(CRL) Kunimoto(SDL) Oohara(Univ of Kyoto), Jokagi(elf2000)  and
       naruse (at sourceforge.jp). Thanks.


BUGS AND LIMITATIONS
       1.  skf  can  handle  mixed coding with some limitations. However, code
       detection tends to fail for mixed code, and giving explicit input  code
       set is strongly encouraged, if codeset is known beforehand.
       If needed, the --linewise-detect option may help, but it is more likely
       to fail to detect codes.

       2. When using UCS2, UTF-16, UTF-8 and UTF-7, skf tries to detect  input
       code,  but giving explicit code set is encouraged.  skf doesn't support
       UCS4, but does support UTF-16/UTF-32 (i.e. surrogate pairs).  skf just
       passes composite characters to output. No further normalization is
       performed.

       3. skf implements ISO-2022 with the following exceptions:
        i) GL 0x20 is always space, even when a 96-character codeset is
       invoked to GL.
        ii) Sequences for setting codes to C1 and C2 are always ignored.
        iii) If an unknown sequence is given to G0, G0 is set to ascii, and
       locking/single shift is cleared. An unknown sequence call to G1-G3 is
       just ignored.
        iv) Sequences for 96-character multibyte coding are ignored
       (currently, no codeset is registered).
        v) Calling the UTF-8 and UTF-16 coding systems from iso-2022 is
       supported, and the standard return goes back to the previous coding
       system.
        Calls and returns to/from other coding schemes are ignored.
        vi) Because of cellular phone glyph support, several private (not
       registered) codesets are defined in skf, and can be called by the
       appropriate sequence.

       4.  Since  skf by default tests input stream to detect utf7 coding, skf
       sometimes misdetects pure ascii text  as  utf7.  If  this  occurs,  use
       --no-utf7 option.

       5. Error output coding is controlled by LOCALE environment variables on
       UN*X systems. skf doesn't check whether stdout and stderr are
       redirected into the same stream, so this case should be handled by the
       user.

       6. skf-1.9x converts KEIS/JIS X-0213 code using CJK-extension B and CJK
       compatibility area. For this reason, X-0213 and KEIS conversion results
       vary depending on the --use-compat and --limit-to-ucs2 switches.

       7.  JIS  X-0207(1979) is not supported. JIS X-0211(1987) is designed to
       be supported (i.e. common terminal control sequence will  be  transpar-
       ently passed to output).

       8.  Even  if  unbuffer  option(-u)  is specified, some code-translation
       related bufferings are still performed (in MIME, kana, VIQR etc.).

       9. skf-1.9x recognizes and  handles  languages  in  iso639-1(alpha  2).
       iso639-2 is not supported as a valid language set.

       10. Ucs2 is not supported within the perl/ruby extension for either
       input or output, because of a data structure limitation. Specifying
       ucs2 will generate an error. This is a limitation of the language
       itself, rather than a limitation of skf.


Notes
       1. Extended options have changed extensively since skf-1.9. Some
       archaic options (e.g. -B, -@ and -r) have been deleted from this
       version.

       2. skf is a project derived from nkf, but doesn't contain nkf code.
       The copyright notice is retained as an honor.

       3. From version 1.9, default Japanese character set assumed by skf  has
       changed  to JIS X-0208:1990 with Microsoft Japanese Windows gaiji (i.e.
       CP932).

       4. Code autodetection is not perfect by design. If  it  has  failed  to
       detect  input code properly, please give input code information explic-
       itly.

       5. Some ligatures in Unicode, cp932  gaiji  and  KEIS83  are  converted
       using  JIS  X-0124  and  other convention.  During this conversion, its
       byte length is not preserved.

       6. skf is intended to  pass  ANSI  compatible  terminal  control  codes
       transparently, but this is not guaranteed.

       7. nkf's -i and -o options still work in 1.94, but are obsolete and
       valid only with iso-2022-jp, without considering output codeset
       specifications.  Using these options is strongly discouraged.

       8. For unconverted characters, skf uses geta and undefined character as
       --use-replace-char option.  If the output codeset doesn't contain a
       geta code, skf prefers the 'black square character', and then uses '.'.

       9. There are some undocumented options. These options should be consid-
       ered as highly experimental.

       10. In lineend_thru mode with folding, skf remembers the order in which
       cr and lf appear in the stream, and uses that order.  Because of this
       design, if skf needs to output a line-end character before any
       line-end character appears in the input stream, the input order may
       not be preserved.


Notice
       Unicode(TM) is a trademark of Unicode, Inc. Microsoft and Windows are
       registered trademarks of Microsoft Corporation. Macintosh is a
       registered trademark of Apple Computer Inc. Vodafone is a trademark of
       Vodafone K.K.  Other names and terms may be trademarks or registered
       trademarks of their respective owners.  The trademark symbol (TM) may
       be omitted in this manual page.




                                  30/JUL/2006                           SKF(1)