/se3-unattended/var/se3/unattended/install/linuxaux/opt/perl/lib/5.10.0/ -> utf8.pm (source)

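package utf8;

# Bit in $^H that tells the parser to treat source text as UTF-8.
$utf8::hint_bits = 0x00800000;

our $VERSION = '1.07';

# "use utf8": set the hint bit for the scope currently being compiled.
sub import {
    $^H |= $utf8::hint_bits;
    # Remember an optional argument, keyed by the calling package.
    $enc{caller()} = $_[1] if $_[1];
}

# "no utf8": clear the hint bit for the scope currently being compiled.
sub unimport {
    $^H &= ~$utf8::hint_bits;
}

# Load the heavier helper code on demand, then jump to the requested
# subroutine if loading defined it.
sub AUTOLOAD {
    require "utf8_heavy.pl";
    goto &$AUTOLOAD if defined &$AUTOLOAD;
    require Carp;
    Carp::croak("Undefined subroutine $AUTOLOAD called");
}

1;
__END__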

=head1 NAME

utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code

=head1 SYNOPSIS

    use utf8;
    no utf8;

    # Convert a Perl scalar to/from UTF-8.
    $num_octets = utf8::upgrade($string);
    $success    = utf8::downgrade($string[, FAIL_OK]);

    # Change the native bytes of a Perl scalar to/from UTF-8 bytes.
    utf8::encode($string);
    utf8::decode($string);

    $flag = utf8::is_utf8(STRING); # since Perl 5.8.1
    $flag = utf8::valid(STRING);

=head1 DESCRIPTION

The C<use utf8> pragma tells the Perl parser to allow UTF-8 in the
program text in the current lexical scope (allow UTF-EBCDIC on EBCDIC based
platforms).  The C<no utf8> pragma tells Perl to switch back to treating
the source text as literal bytes in the current lexical scope.

B<Do not use this pragma for anything other than telling Perl that your
script is written in UTF-8.> The utility functions described below are
directly usable without C<use utf8;>.

Because it is not possible to reliably tell UTF-8 from native 8-bit
encodings, you need either a Byte Order Mark at the beginning of your
source code, or C<use utf8;>, to instruct perl.

When UTF-8 becomes the standard source format, this pragma will
effectively become a no-op.  For convenience, in what follows the term
I<UTF-X> is used to refer to UTF-8 on ASCII and ISO Latin based
platforms and UTF-EBCDIC on EBCDIC based platforms.

See also the effects of the C<-C> switch and its cousin, the
C<$ENV{PERL_UNICODE}>, in L<perlrun>.

Enabling the C<utf8> pragma has the following effect:

=over 4

=item *

Bytes in the source text that have their high bit set will be treated
as being part of a literal UTF-X sequence.  This includes most
literals such as identifier names, string constants, and constant
regular expression patterns.

On EBCDIC platforms characters in the Latin 1 character set are
treated as being part of a literal UTF-EBCDIC character.

=back

Note that if you have bytes with the eighth bit on in your script
(for example embedded Latin-1 in your string literals), C<use utf8>
will be unhappy since the bytes are most probably not well-formed
UTF-X.  If you want to have such bytes under C<use utf8>, you can disable
this pragma until the end of the block (or file, if at top level) by
C<no utf8;>.
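
The lexical scoping described above can be seen in a minimal sketch.
It assumes that the file containing it is saved as UTF-8 on an ASCII
based platform; the variable names are illustrative only.  The same
literal is read as two bytes where the pragma is not in effect and as
one character inside a block that says C<use utf8>:

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8.
my $as_bytes = "é";      # no pragma in effect: the two bytes 0xC3 0xA9

my $as_chars;
{
    use utf8;            # applies only until the end of this block
    $as_chars = "é";     # parsed as the single character U+00E9
}

printf "outside block: length %d\n", length $as_bytes;
printf "inside block:  length %d\n", length $as_chars;
```

Only the way the literal is parsed differs between the two
assignments; no utility function is called.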

=head2 Utility functions

The following functions are defined in the C<utf8::> package by the
Perl core.  You do not need to say C<use utf8> to use these and in fact
you should not say that unless you really want to have UTF-8 source code.

=over 4

=item * $num_octets = utf8::upgrade($string)

Converts in-place the internal octet sequence in the native encoding
(Latin-1 or EBCDIC) to the equivalent character sequence in I<UTF-X>.
Calling it on a I<$string> that is already encoded as characters does no
harm.  Returns the number of octets necessary to represent the string as
I<UTF-X>.  Can be used to make sure that the UTF-8 flag is on, so that
C<\w> or C<lc()> work as Unicode on strings containing characters in the
range 0x80-0xFF (on ASCII and derivatives).

B<Note that this function does not handle arbitrary encodings.>
Therefore Encode is recommended for general purposes; see also
L<Encode>.

=item * $success = utf8::downgrade($string[, FAIL_OK])

Converts in-place the internal octet sequence in I<UTF-X> to the
equivalent octet sequence in the native encoding (Latin-1 or EBCDIC).
Calling it on a I<$string> that is already encoded as native 8-bit does
no harm.  Can be used to make sure that the UTF-8 flag is off, e.g. when
you want to make sure that the substr() or length() function works with
the usually faster byte algorithm.

Fails if the original I<UTF-X> sequence cannot be represented in the
native 8-bit encoding.  On failure dies or, if the value of C<FAIL_OK> is
true, returns false.

Returns true on success.

B<Note that this function does not handle arbitrary encodings.>
Therefore Encode is recommended for general purposes; see also
L<Encode>.

=item * utf8::encode($string)

Converts in-place the character sequence to the corresponding octet
sequence in I<UTF-X>.  The UTF-8 flag is turned off, so that after this
operation, the string is a byte string.  Returns nothing.

B<Note that this function does not handle arbitrary encodings.>
Therefore Encode is recommended for general purposes; see also
L<Encode>.

=item * $success = utf8::decode($string)

Attempts to convert in-place the octet sequence in I<UTF-X> to the
corresponding character sequence.  The UTF-8 flag is turned on only if
the source string contains multiple-byte I<UTF-X> characters.  If
I<$string> is invalid as I<UTF-X>, returns false; otherwise returns
true.

B<Note that this function does not handle arbitrary encodings.>
Therefore Encode is recommended for general purposes; see also
L<Encode>.

=item * $flag = utf8::is_utf8(STRING)

(Since Perl 5.8.1)  Test whether STRING is in UTF-8 internally.
Functionally the same as Encode::is_utf8().

=item * $flag = utf8::valid(STRING)

[INTERNAL] Test whether STRING is in a consistent state regarding
UTF-8.  Returns true if the string is well-formed UTF-8 and has the
UTF-8 flag on B<or> if the string is held as bytes (both these states
are 'consistent').  The main reason for this routine is to allow Perl's
testsuite to check that operations have left strings in a consistent
state.  You most probably want to use utf8::is_utf8() instead.

=back
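
As a hedged illustration of C<utf8::upgrade>, C<utf8::downgrade> and
C<utf8::is_utf8> as described in the list above, the following sketch
assumes an ASCII based platform, where the native 8-bit encoding is
Latin-1.  The variable names are illustrative only.

```perl
use strict;
use warnings;

my $s = "caf\xE9";                        # native byte string, 4 characters
my $flag_before = utf8::is_utf8($s) ? 1 : 0;

my $octets     = utf8::upgrade($s);       # internal form becomes UTF-8
my $flag_after = utf8::is_utf8($s) ? 1 : 0;
my $len_after  = length $s;               # the character count is unchanged

my $down_ok = utf8::downgrade($s);        # fits in Latin-1, so this succeeds

my $wide = "\x{263A}";                    # cannot be represented in 8 bits
my $soft = utf8::downgrade($wide, 1);     # FAIL_OK true: returns false
my $copy = $wide;
my $died = eval { utf8::downgrade($copy); 1 } ? 0 : 1;   # no FAIL_OK: dies

print "flag $flag_before -> $flag_after, octets $octets, length $len_after\n";
print "downgrade ok: ", ($down_ok ? 1 : 0),
      ", soft failure: ", ($soft ? 0 : 1),
      ", died: $died\n";
```

Neither call changes the characters the string contains; only the
internal representation and the UTF-8 flag differ.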

C<utf8::encode> is like C<utf8::upgrade>, but the UTF-8 flag is
cleared.  See L<perlunicode> for more on the UTF-8 flag and the C API
functions C<sv_utf8_upgrade>, C<sv_utf8_downgrade>, C<sv_utf8_encode>,
and C<sv_utf8_decode>, which are wrapped by the Perl functions
C<utf8::upgrade>, C<utf8::downgrade>, C<utf8::encode> and
C<utf8::decode>.  Also, the functions C<utf8::is_utf8>, C<utf8::valid>,
C<utf8::encode>, C<utf8::decode>, C<utf8::upgrade>, and
C<utf8::downgrade> are actually internal, and thus always available,
without a C<require utf8> statement.
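
A short sketch of a C<utf8::encode> and C<utf8::decode> round trip,
written without any C<use utf8> or C<require utf8> statement, may make
the difference from C<utf8::upgrade> concrete.  It assumes an ASCII
based platform, and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "\x{263A}";                  # one character, U+263A

my $bytes = $text;
utf8::encode($bytes);                   # now three octets, UTF-8 flag off
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

my $round      = $bytes;
my $decoded_ok = utf8::decode($round);  # true; one character again

my $ascii    = "abc";
my $ascii_ok = utf8::decode($ascii);    # true, but the flag stays off

my $bad    = "\xFF\xFE";
my $bad_ok = utf8::decode($bad);        # false: not well-formed UTF-8

print "encoded octets: $hex\n";
print "round trip ok\n" if $decoded_ok && $round eq $text;
```

Unlike C<utf8::upgrade>, C<utf8::encode> changes what C<length> reports,
because the result is a string of octets rather than of characters.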

=head1 BUGS

One can have Unicode in identifier names, but not in package/class or
subroutine names.  While some limited functionality towards this does
exist as of Perl 5.8.0, that is more accidental than designed; use of
Unicode for the said purposes is unsupported.

One reason this remains unfinished is its (currently) inherent
unportability: since both package names and subroutine names may need
to be mapped to file and directory names, the Unicode capability of
the filesystem becomes important, and unfortunately there are no
portable answers.
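
As a minimal sketch of the supported case only, assuming the file is
saved as UTF-8, variable names may contain non-ASCII letters under
C<use utf8>.  The names below are purely illustrative; package and
subroutine names are deliberately kept in ASCII.

```perl
use strict;
use warnings;
use utf8;                 # assumption: this file is saved as UTF-8

my $größe = 42;           # lexical variable with non-ASCII letters
my %café  = (prix => 3);  # the same applies to hash names

my $answer = $größe + $café{prix};
print "answer: $answer\n";
```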

=head1 SEE ALSO

L<perlunitut>, L<perluniintro>, L<perlrun>, L<bytes>, L<perlunicode>

=cut

Generated: Tue Mar 17 22:47:18 2015 Cross-referenced by PHPXref 0.7.1