Perl Compatible Regular Expressions




  Latest Release Version 7.3
  Latest Release Date 2007-08-28
  Programming Language C
  Operating System Cross-platform
  Genre Pattern Matching Library
  License BSD
  Website http://www.pcre.org/


Perl Compatible Regular Expressions ('''PCRE''') is a regular expression C library inspired by Perl's external interface, written by Philip Hazel. PCRE's syntax is much more powerful and flexible than that of POSIX regular expressions and of classic regular expression libraries, which is why it has been adopted by many modern programming languages. The name is nevertheless something of a misnomer, because PCRE is "Perl Compatible" only if you consider a subset of PCRE's settings and a subset of Perl's regular expression facilities.

PCRE settings also permit PCRE to emulate regular expression libraries other than Perl's, for example by selecting whether a backslash enables (Emacs-like) or disables (Perl-like) the special meaning of characters such as the vertical bar. The library itself provides C and C++ interfaces.

The PCRE library is incorporated into a number of prominent open-source programs, such as the Apache web server and the PHP scripting language. As of Perl 5.9.4, PCRE is also available as a replacement for Perl's default regular expression engine through the re::engine::PCRE module.


FEATURES

PCRE has developed an extensive and in some ways unique feature set. Although it was originally intended to be feature-equivalent with Perl, over time a number of features were first implemented in PCRE and only much later added to Perl. During the PCRE 7.x and Perl 5.9.x (development track) phase, the two projects have coordinated development and are, to the extent possible, feature-equivalent. In some cases PCRE has included in mainline releases features that originated with Perl 5.9.x, and in some cases Perl 5.9.x has included features that were previously available only in PCRE. [1]

Currently, the following features are available:

;Consistent escaping rules: Like Perl, PCRE has consistent escaping rules: any non-alphanumeric character may be escaped to mean its literal value by prefixing it with a \ (backslash), and conversely, preceding an alphanumeric character with a backslash typically gives it a special meaning. Where such a sequence has not been defined as special, it is treated as a literal; however, this usage is not forward compatible, because new versions of PCRE may give such sequences a special meaning. A good example is \R, which has no special meaning prior to PCRE 7. In POSIX regular expressions, a backslash sometimes escapes a non-alphanumeric character (e.g. \.) and sometimes introduces a special feature (e.g. \(\)).
;Extended character classes :Single-letter character classes are supported in addition to the longer POSIX names. For example, \d matches any digit exactly as [[:digit:]] would in POSIX regular expressions.
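;Minimal matching (a.k.a. "ungreedy") :A ? may be placed after any repetition quantifier to request the shortest match rather than the longest. For example, "a.*?b" would match "ab" in "ababab", where "a.*b" would match the entire string.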

;Unicode character properties :Patterns in PCRE can match Unicode character properties. For example, \p{Ps}.*?\p{Pe} would match a string that was delimited by any "opening punctuation" and any "close punctuation", such as "[abc]".

;Multiline matching :^ and $ can match at the beginning and end of a string only, or at the start and end of each "line" within the string, depending on what options are set.
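
The escaping, minimal-matching and multiline behaviours described above can be tried out with a short sketch. It uses Python's standard re module rather than PCRE itself, on the assumption that the two agree for these particular constructs; the helper names (literal_dot_matches, first_match, line_initial_words) are hypothetical and exist only for this illustration.

```python
import re


def literal_dot_matches(text):
    """Return True if text is exactly 'a.b'.

    The backslash before the dot (a non-alphanumeric character)
    makes the dot literal instead of "any character".
    """
    return re.fullmatch(r"a\.b", text) is not None


def first_match(pattern, text):
    """Return the text of the first match of pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def line_initial_words(text, multiline):
    """Return the words matched by ^\\w+ with or without multiline mode."""
    flags = re.MULTILINE if multiline else 0
    return re.findall(r"^\w+", text, flags)


if __name__ == "__main__":
    # Escaped dot is literal; an unescaped dot would also accept "axb".
    print(literal_dot_matches("a.b"), literal_dot_matches("axb"))

    # Minimal (lazy) quantifier versus the default greedy quantifier.
    print(first_match(r"a.*?b", "ababab"))
    print(first_match(r"a.*b", "ababab"))

    # ^ anchors at the start of the string only, or at every line start.
    sample = "one\ntwo\nthree"
    print(line_initial_words(sample, multiline=False))
    print(line_initial_words(sample, multiline=True))
```

In a C program using PCRE, the multiline switch would be supplied as a compile option or as an inline (?m) setting rather than as a Python flag.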

;Newline/linebreak options :When PCRE is compiled, a newline default is selected. The newline/linebreak convention in effect determines where PCRE detects line beginnings (^) and line ends ($) in multiline mode, as well as what the dot matches (regardless of multiline mode, unless the dotall option (?s) is set). It also affects PCRE's matching procedure (since version 7.0): when an unanchored pattern fails to match at the start of a newline sequence, PCRE advances past the entire newline sequence before retrying the match. If the newline convention in effect includes CRLF as one of the valid linebreaks, PCRE does not skip the \n in a CRLF if the pattern contains specific \r or \n references (since version 7.3).
:The newline option can be altered with external options when a pattern is compiled as well as when it is run. Few applications using PCRE give users the means to apply this external option, so, new in version 7.3, the newline option can also be stated at the start of the pattern using one of the following:

  • (*LF) Newline is a linefeed character. Corresponding linebreaks can be matched with \n.

  • (*CR) Newline is a carriage return. Corresponding linebreaks can be matched with \r.

  • (*CRLF) Newline/linebreak is a carriage return followed by a linefeed. Corresponding linebreaks can be matched with \r\n.

  • (*ANYCRLF) Any of the above encountered in the data will trigger newline processing. Corresponding linebreaks can be matched with (?>\r\n|\n|\r).
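
What it means to treat a linefeed, a carriage return, or a carriage return followed by a linefeed each as a single linebreak can be illustrated with a small sketch. Python's standard re module does not provide PCRE's pattern-level newline settings, so this only emulates the idea with an ordinary alternation; the function names are hypothetical. The point it tries to show is that the two-character sequence must be tried before the single characters, otherwise a CRLF is counted as two linebreaks.

```python
import re

# Assumption for illustration: CRLF is listed first so that "\r\n"
# is consumed as one linebreak rather than as "\r" followed by "\n".
ANY_CRLF = re.compile(r"\r\n|\n|\r")

# A naive ordering that tries the single characters first.
NAIVE = re.compile(r"\r|\n")


def split_lines(text):
    """Split text on LF, CR, or CRLF, treating CRLF as one linebreak."""
    return ANY_CRLF.split(text)


def split_lines_naive(text):
    """Split text on CR or LF separately; CRLF yields an empty line."""
    return NAIVE.split(text)


if __name__ == "__main__":
    data = "a\r\nb\nc\rd"
    print(split_lines(data))
    print(split_lines_naive(data))
```

The naive version produces a spurious empty string between "a" and "b", which is the kind of discrepancy a fixed newline convention is meant to avoid.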

  • Recursive matches are atomic in PCRE and non-atomic in Perl. This means that <code>"<<!>!>!>><>>!>!>!>" =~ /^(<(?:[^<>]+|(?3)|(?1))*>)()(!>!>!>)$/</code> will match in Perl but not in PCRE.