=head1 NAME

perlfilter - Source Filters

=head1 DESCRIPTION

This article is about a little-known feature of Perl called
I<source filters>. Source filters alter the program text of a module
before Perl sees it, much as a C preprocessor alters the source text of
a C program before the compiler sees it. This article tells you more
about what source filters are, how they work, and how to write your
own.

The original purpose of source filters was to let you encrypt your
program source to prevent casual piracy. This isn't all they can do, as
you'll soon learn. But first, the basics.

=head1 CONCEPTS

Before the Perl interpreter can execute a Perl script, it must first
read it from a file into memory for parsing and compilation. If that
script itself includes other scripts with a C<use> or C<require>
statement, then each of those scripts will have to be read from their
respective files as well.

Now think of each logical connection between the Perl parser and an
individual file as a I<source stream>. A source stream is created when
the Perl parser opens a file; it continues to exist as the source code
is read into memory, and it is destroyed when Perl is finished parsing
the file. If the parser encounters a C<require> or C<use> statement in
a source stream, a new and distinct stream is created just for that
file.

The diagram below represents a single source stream, with the flow of
source from a Perl script file on the left into the Perl parser on the
right. This is how Perl normally operates.

    file -------> parser

There are two important points to remember:

=over 5

=item 1.

Although there can be any number of source streams in existence at any
given time, only one will be active.

=item 2.

Every source stream is associated with only one file.

=back

A source filter is a special kind of Perl module that intercepts and
modifies a source stream before it reaches the parser. A source filter
changes our diagram like this:

    file ----> filter ----> parser

If that doesn't make much sense, consider the analogy of a command
pipeline. Say you have a shell script stored in the compressed file
I<trial.gz>. The simple pipeline command below runs the script without
needing to create a temporary file to hold the uncompressed file.

    gunzip -c trial.gz | sh

In this case, the data flow from the pipeline can be represented as
follows:

    trial.gz ----> gunzip ----> sh

With source filters, you can store the text of your script compressed
and use a source filter to uncompress it for Perl's parser:

     compressed           gunzip
    Perl program ---> source filter ---> parser

=head1 USING FILTERS

So how do you use a source filter in a Perl script? Above, I said that
a source filter is just a special kind of module. Like all Perl
modules, a source filter is invoked with a use statement.

Say you want to pass your Perl source through the C preprocessor before
execution. Older versions of Perl offered the C<-P> command line option
for this (it was removed in Perl 5.12), but as it happens, the source
filters distribution comes with a C preprocessor filter module called
Filter::cpp. Let's use that instead.

Below is an example program, C<cpp_test>, which makes use of this filter.
Line numbers have been added to allow specific lines to be referenced
easily.

    1: use Filter::cpp;
    2: #define TRUE 1
    3: $a = TRUE;
    4: print "a = $a\n";

When you execute this script, Perl creates a source stream for the
file. Before the parser processes any of the lines from the file, the
source stream looks like this:

    cpp_test ---------> parser

Line 1, C<use Filter::cpp>, includes and installs the C<cpp> filter
module. All source filters work this way.
The use statement is compiled
and executed at compile time, before any more of the file is read, and
it attaches the cpp filter to the source stream behind the scenes. Now
the data flow looks like this:

    cpp_test ----> cpp filter ----> parser

As the parser reads the second and subsequent lines from the source
stream, it feeds those lines through the C<cpp> source filter before
processing them. The C<cpp> filter simply passes each line through the
real C preprocessor. The output from the C preprocessor is then
inserted back into the source stream by the filter.

                   .-> cpp --.
                   |         |
                   |         |
                   |       <-'
    cpp_test ----> cpp filter ----> parser

The parser then sees the following code:

    use Filter::cpp;
    $a = 1;
    print "a = $a\n";

Let's consider what happens when the filtered code includes another
module with use:

    1: use Filter::cpp;
    2: #define TRUE 1
    3: use Fred;
    4: $a = TRUE;
    5: print "a = $a\n";

The C<cpp> filter does not apply to the text of the Fred module, only
to the text of the file that used it (C<cpp_test>). Although the use
statement on line 3 will pass through the cpp filter, the module that
gets included (C<Fred>) will not. The source streams look like this
after line 3 has been parsed and before line 4 is parsed:

    cpp_test ---> cpp filter ---> parser (INACTIVE)

    Fred.pm ----> parser

As you can see, a new stream has been created for reading the source
from C<Fred.pm>. This stream will remain active until all of C<Fred.pm>
has been parsed. The source stream for C<cpp_test> will still exist,
but is inactive. Once the parser has finished reading Fred.pm, the
source stream associated with it will be destroyed. The source stream
for C<cpp_test> then becomes active again and the parser reads line 4
and subsequent lines from C<cpp_test>.
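
The one-stream-per-file rule can be checked without Filter::cpp. The
sketch below is a self-contained Perl script that writes three throwaway
files into a temporary directory and runs the main one in a separate
C<perl> process. All three files are hypothetical: C<Swap.pm> is a tiny
filter written with the closure form of C<filter_add()> from
C<Filter::Util::Call> (bundled with Perl since 5.8) that rewrites C<XYZ>
as C<PQR>, standing in for the C<cpp> filter; C<Fred.pm> plays the module
from the example above; and C<main.pl> plays C<cpp_test>.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical files for the demonstration. Swap.pm is a closure-style
# filter (Filter::Util::Call) that rewrites "XYZ" as "PQR" in the rest of
# whichever file says "use Swap;", standing in for Filter::cpp.
my %files = (
    'Swap.pm' => <<'END_SWAP',
package Swap;
use Filter::Util::Call;
sub import {
    filter_add(sub {
        my $status = filter_read();    # appends the next line to $_
        s/XYZ/PQR/g if $status > 0;
        return $status;
    });
}
1;
END_SWAP
    'Fred.pm' => <<'END_FRED',
package Fred;
our $msg = "XYZ from Fred";
1;
END_FRED
    'main.pl' => <<'END_MAIN',
use Swap;
use Fred;
print "XYZ from main\n";
print "$Fred::msg\n";
END_MAIN
);

my $dir = tempdir(CLEANUP => 1);
for my $name (sort keys %files) {
    my $path = File::Spec->catfile($dir, $name);
    open(my $fh, '>', $path) or die "Cannot write $path: $!\n";
    print {$fh} $files{$name};
    close($fh) or die "Cannot close $path: $!\n";
}

# Run main.pl in a separate perl so that main.pl, Swap.pm and Fred.pm
# are each compiled from their own file, i.e. their own source stream.
my $main = File::Spec->catfile($dir, 'main.pl');
open(my $run, '-|', $^X, "-I$dir", $main) or die "Cannot run $main: $!\n";
my @output = <$run>;
close($run) or die "main.pl failed (exit status $?)\n";
print @output;
```

On a Unix-like system the script should print C<PQR from main> and then
C<XYZ from Fred>. The C<use Fred;> line itself passed through the
C<Swap> filter, but the text of C<Fred.pm> was read through its own,
unfiltered source stream.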

You can use more than one source filter on a single file. Similarly,
you can reuse the same filter in as many files as you like.

For example, if you have a uuencoded and compressed source file, it is
possible to stack a uudecode filter and an uncompression filter like
this:

    use Filter::uudecode; use Filter::uncompress;
    M'XL(".H<US4''V9I;F%L')Q;>7/;1I;_>_I3=&E=%:F*I"T?22Q/
    M6]9*<IQCO*XFT"0[PL%%'Y+IG?WN^ZYN-$'J.[.JE$,20/?K=_[>
    ...

Once the first line has been processed, the flow will look like this:

    file ---> uudecode ---> uncompress ---> parser
               filter         filter

Data flows through filters in the same order they appear in the source
file. The uudecode filter appeared before the uncompress filter, so the
source file will be uudecoded before it's uncompressed.

=head1 WRITING A SOURCE FILTER

There are three ways to write your own source filter. You can write it
in C, use an external program as a filter, or write the filter in Perl.
I won't cover the first two in any great detail, so I'll get them out
of the way first. Writing the filter in Perl is most convenient, so
I'll devote the most space to it.

=head1 WRITING A SOURCE FILTER IN C

The first of the three available techniques is to write the filter
completely in C. The external module you create interfaces directly
with the source filter hooks provided by Perl.

The advantage of this technique is that you have complete control over
the implementation of your filter. The big disadvantage is the
increased complexity required to write the filter: not only do you
need to understand the source filter hooks, but you also need a
reasonable knowledge of Perl guts. One of the few times it is worth
going to this trouble is when writing a source scrambler.
The C<decrypt> filter (which unscrambles the source before Perl parses
it) included with the source filter distribution is an example of a C
source filter (see Decryption Filters, below).

=over 5

=item B<Decryption Filters>

All decryption filters work on the principle of "security through
obscurity." Regardless of how well you write a decryption filter and
how strong your encryption algorithm, anyone determined enough can
retrieve the original source code. The reason is quite simple: once
the decryption filter has decrypted the source back to its original
form, fragments of it will be stored in the computer's memory as Perl
parses it. The source might only be in memory for a short period of
time, but anyone possessing a debugger, skill, and lots of patience can
eventually reconstruct your program.

That said, there are a number of steps that can be taken to make life
difficult for the potential cracker. The most important: Write your
decryption filter in C and statically link the decryption module into
the Perl binary. For further tips to make life difficult for the
potential cracker, see the file I<decrypt.pm> in the source filters
distribution.

=back

=head1 CREATING A SOURCE FILTER AS A SEPARATE EXECUTABLE

An alternative to writing the filter in C is to create a separate
executable in the language of your choice. The separate executable
reads from standard input, does whatever processing is necessary, and
writes the filtered data to standard output. C<Filter::cpp> is an
example of a source filter implemented as a separate executable: the
executable is the C preprocessor bundled with your C compiler.

The source filter distribution includes two modules that simplify this
task: C<Filter::exec> and C<Filter::sh>. Both allow you to run any
external executable.
Both use a coprocess to control the flow of data
into and out of the external executable. (For details on coprocesses,
see Stevens, W. R., "Advanced Programming in the UNIX Environment,"
Addison-Wesley, ISBN 0-201-56317-7, pages 441-445.) The difference
between them is that C<Filter::exec> spawns the external command
directly, while C<Filter::sh> spawns a shell to execute the external
command. (Unix uses the Bourne shell; NT uses the cmd shell.) Spawning
a shell allows you to make use of the shell metacharacters and
redirection facilities.

Here is an example script that uses C<Filter::sh>:

    use Filter::sh 'tr XYZ PQR';
    $a = 1;
    print "XYZ a = $a\n";

The output you'll get when the script is executed:

    PQR a = 1

Writing a source filter as a separate executable works fine, but a
small performance penalty is incurred. For example, if you execute the
small example above, a separate subprocess will be created to run the
Unix C<tr> command. Each use of the filter requires its own subprocess.
If creating subprocesses is expensive on your system, you might want to
consider one of the other options for creating source filters.

=head1 WRITING A SOURCE FILTER IN PERL

The easiest and most portable option available for creating your own
source filter is to write it completely in Perl. To distinguish this
from the previous two techniques, I'll call it a Perl source filter.

To help understand how to write a Perl source filter, we need an
example to study: a complete source filter that performs rot13
decoding. (Rot13 is a very simple encryption scheme used in Usenet
postings to hide the contents of offensive posts. It moves every letter
forward thirteen places, so that A becomes N, B becomes O, and Z
becomes M.)
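
Because 13 is exactly half of the 26 letters in the alphabet, applying
rot13 twice gives back the original text, so one letter mapping serves
for both encoding and decoding. The standalone sketch below is not part
of the filter: it wraps the two C<tr> tables used in this article (the
decoding one in the Rot13 filter and the encoding one in the
C<mkrot13> script shown later) in two hypothetical helper subs,
C<rot13_encode> and C<rot13_decode>, and checks them on a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper: rot13-encode with the tr/// table used by mkrot13.
sub rot13_encode {
    my ($text) = @_;
    (my $out = $text) =~ tr/a-zA-Z/n-za-mN-ZA-M/;
    return $out;
}

# Hypothetical helper: rot13-decode with the tr/// table used by the
# Rot13 filter's filter() method.
sub rot13_decode {
    my ($text) = @_;
    (my $out = $text) =~ tr/n-za-mN-ZA-M/a-zA-Z/;
    return $out;
}

my $plain = q{print "hello fred\n";};   # q{} keeps \n as two characters
my $coded = rot13_encode($plain);

print "$coded\n";                       # cevag "uryyb serq\a";
print rot13_decode($coded), "\n";       # print "hello fred\n";

# 13 is half of 26, so encoding twice restores the text: both tables
# describe the same letter mapping.
print rot13_encode($coded) eq $plain ? "round trip ok\n" : "mismatch\n";
```

Note that the backslash in C<\n> is left alone, so the escape becomes
C<\a> in the encoded file and turns back into C<\n> before Perl parses
it. The complete Rot13 filter follows.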

    package Rot13;

    use Filter::Util::Call;

    sub import {
        my ($type) = @_;
        my ($ref) = [];
        filter_add(bless $ref);
    }

    sub filter {
        my ($self) = @_;
        my ($status);

        tr/n-za-mN-ZA-M/a-zA-Z/
            if ($status = filter_read()) > 0;
        $status;
    }

    1;

All Perl source filters are implemented as Perl classes and have the
same basic structure as the example above.

First, we include the C<Filter::Util::Call> module, which exports a
number of functions into your filter's namespace. The filter shown
above uses two of these functions, C<filter_add()> and
C<filter_read()>.

Next, we create the filter object and associate it with the source
stream by defining the C<import> function. If you know Perl well
enough, you know that C<import> is called automatically every time a
module is included with a use statement. This makes C<import> the ideal
place to both create and install a filter object.

In the example filter, the object (C<$ref>) is blessed just like any
other Perl object. Our example uses an anonymous array, but this isn't
a requirement. Because this example doesn't need to store any context
information, we could have used a scalar or hash reference just as
well. The next section demonstrates context data.

The association between the filter object and the source stream is made
with the C<filter_add()> function. This takes a filter object as a
parameter (C<$ref> in this case) and installs it in the source stream.

Finally, there is the code that actually does the filtering. For this
type of Perl source filter, all the filtering is done in a method
called C<filter()>. (It is also possible to write a Perl source filter
using a closure. See the C<Filter::Util::Call> manual page for more
details.) It's called every time the Perl parser needs another line of
source to process.
The C<filter()> method, in turn, reads lines from
the source stream using the C<filter_read()> function.

If a line was available from the source stream, C<filter_read()>
returns a status value greater than zero and appends the line to C<$_>.
A status value of zero indicates end-of-file, less than zero means an
error. The filter function itself is expected to return its status in
the same way, and put the filtered line it wants written to the source
stream in C<$_>. The use of C<$_> accounts for the brevity of most Perl
source filters.

In order to make use of the rot13 filter we need some way of encoding
the source file in rot13 format. The script below, C<mkrot13>, does
just that.

    use strict;
    use warnings;

    die "usage: mkrot13 filename\n" unless @ARGV;
    my $in  = $ARGV[0];
    my $out = "$in.tmp";
    open(my $in_fh,  '<', $in)  or die "Cannot open file $in: $!\n";
    open(my $out_fh, '>', $out) or die "Cannot open file $out: $!\n";

    # The first line of the encoded file installs the decoding filter.
    print $out_fh "use Rot13;\n";
    while (<$in_fh>) {
        tr/a-zA-Z/n-za-mN-ZA-M/;    # rot13-encode the line in $_
        print $out_fh $_;
    }

    close $in_fh;
    close $out_fh or die "Cannot close file $out: $!\n";

    # Replace the original file with its encoded version.
    unlink $in;
    rename $out, $in or die "Cannot rename $out to $in: $!\n";

If we encrypt this with C<mkrot13>:

    print "hello fred\n";

the result will be this:

    use Rot13;
    cevag "uryyb serq\a";

Running it produces this output:

    hello fred

=head1 USING CONTEXT: THE DEBUG FILTER

The rot13 example was a trivial example. Here's another demonstration
that shows off a few more features.

Say you wanted to include a lot of debugging code in your Perl script
during development, but you didn't want it available in the released
product. Source filters offer a solution. In order to keep the example
simple, let's say you wanted the debugging output to be controlled by
an environment variable, C<DEBUG>. Debugging code is enabled if the
variable exists; otherwise it is disabled.

Two special marker lines will bracket debugging code, like this:

    ## DEBUG_BEGIN
    if ($year > 1999) {
        warn "Debug: millennium bug in year $year\n";
    }
    ## DEBUG_END

When the C<DEBUG> environment variable exists, the filter ensures that
Perl parses the code between the C<DEBUG_BEGIN> and C<DEBUG_END>
markers. That means that when C<DEBUG> does exist, the code above
should be passed through the filter unchanged. The marker lines can
also be passed through as-is, because the Perl parser will see them as
comment lines. When C<DEBUG> isn't set, we need a way to disable the
debug code. A simple way to achieve that is to convert the lines
between the two markers into comments:

    ## DEBUG_BEGIN
    #if ($year > 1999) {
    #    warn "Debug: millennium bug in year $year\n";
    #}
    ## DEBUG_END

Here is the complete Debug filter:

    package Debug;

    use strict;
    use warnings;
    use Filter::Util::Call;

    use constant TRUE => 1;
    use constant FALSE => 0;

    sub import {
        my ($type) = @_;
        my (%context) = (
            Enabled      => defined $ENV{DEBUG},
            InTraceBlock => FALSE,
            Filename     => (caller)[1],
            # start from the line of the "use Debug;" statement so that
            # error messages report real line numbers
            LineNo       => (caller)[2],
            LastBegin    => 0,
        );
        filter_add(bless \%context);
    }

    sub Die {
        my ($self) = shift;
        my ($message) = shift;
        my ($line_no) = shift || $self->{LastBegin};
        die "$message at $self->{Filename} line $line_no.\n"
    }

    sub filter {
        my ($self) = @_;
        my ($status);
        $status = filter_read();
        ++ $self->{LineNo};

        # deal with EOF/error first
        if ($status <= 0) {
            $self->Die("DEBUG_BEGIN has no DEBUG_END")
                if $self->{InTraceBlock};
            return $status;
        }

        if ($self->{InTraceBlock}) {
            if (/^\s*##\s*DEBUG_BEGIN/ ) {
                $self->Die("Nested DEBUG_BEGIN", $self->{LineNo})
            } elsif (/^\s*##\s*DEBUG_END/) {
                $self->{InTraceBlock} = FALSE;
            }

            # comment out the debug lines when the filter is disabled
            s/^/#/ if ! $self->{Enabled};
        } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
            $self->{InTraceBlock} = TRUE;
            $self->{LastBegin} = $self->{LineNo};
        } elsif ( /^\s*##\s*DEBUG_END/ ) {
            $self->Die("DEBUG_END has no DEBUG_BEGIN", $self->{LineNo});
        }
        return $status;
    }

    1;

The big difference between this filter and the previous example is the
use of context data in the filter object. The filter object is based on
a hash reference, and is used to keep various pieces of context
information between calls to the filter function. All but two of the
hash fields are used for error reporting. The first of those two,
Enabled, is used by the filter to determine whether the debugging code
should be given to the Perl parser. The second, InTraceBlock, is true
when the filter has encountered a C<DEBUG_BEGIN> line, but has not yet
encountered the following C<DEBUG_END> line.

If you ignore all the error checking that most of the code does, the
essence of the filter is as follows:

    sub filter {
        my ($self) = @_;
        my ($status);
        $status = filter_read();

        # deal with EOF/error first
        return $status if $status <= 0;
        if ($self->{InTraceBlock}) {
            if (/^\s*##\s*DEBUG_END/) {
                $self->{InTraceBlock} = FALSE
            }

            # comment out debug lines when the filter is disabled
            s/^/#/ if ! $self->{Enabled};
        } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
            $self->{InTraceBlock} = TRUE;
        }
        return $status;
    }

Be warned: just as the C preprocessor doesn't know C, the Debug filter
doesn't know Perl. It can be fooled quite easily:

    print <<EOM;
    ##DEBUG_BEGIN
    EOM

Such things aside, you can see that a lot can be achieved with a modest
amount of code.

=head1 CONCLUSION

You now have a better understanding of what a source filter is, and you
might even have a possible use for them.
If you feel like playing with
source filters but need a bit of inspiration, here are some extra
features you could add to the Debug filter.

First, an easy one. Rather than having debugging code that is
all-or-nothing, it would be much more useful to be able to control
which specific blocks of debugging code get included. Try extending the
syntax for debug blocks to allow each to be identified. The contents of
the C<DEBUG> environment variable can then be used to control which
blocks get included.

Once you can identify individual blocks, try allowing them to be
nested. That isn't difficult either.

Here is an interesting idea that doesn't involve the Debug filter.
At the time of writing, Perl subroutines had fairly limited support for
formal parameter lists. A prototype let you specify the number of
parameters and their type, but you still had to take them out of the
C<@_> array yourself. Write a source filter that allows you to have a
named parameter list. Such a filter would turn this:

    sub MySub ($first, $second, @rest) { ... }

into this:

    sub MySub($$@) {
        my ($first) = shift;
        my ($second) = shift;
        my (@rest) = @_;
        ...
    }

Finally, if you feel like a real challenge, have a go at writing a
full-blown Perl macro preprocessor as a source filter. Borrow the
useful features from the C preprocessor and any other macro processors
you know. The tricky bit will be choosing how much knowledge of Perl's
syntax you want your filter to have.

=head1 THINGS TO LOOK OUT FOR

=over 5

=item Some Filters Clobber the C<DATA> Handle

Some source filters use the C<DATA> handle to read the calling program.
When using these source filters you cannot rely on this handle, nor expect
any particular kind of behavior when operating on it.
Filters based on
Filter::Util::Call (and therefore Filter::Simple) do not alter the C<DATA>
filehandle.

=back

=head1 REQUIREMENTS

The Source Filters distribution is available on CPAN, in

    CPAN/modules/by-module/Filter

Starting from Perl 5.8, Filter::Util::Call (the core part of the
Source Filters distribution) is part of the standard Perl distribution.
Also included is a friendlier interface called Filter::Simple, by
Damian Conway.

=head1 AUTHOR

Paul Marquess E<lt>Paul.Marquess@btinternet.comE<gt>

=head1 Copyrights

This article originally appeared in The Perl Journal #11, and is
copyright 1998 The Perl Journal. It appears courtesy of Jon Orwant and
The Perl Journal. This document may be distributed under the same terms
as Perl itself.