This document is unfortunately outdated but is still the best documentation we have.
Julien's plug-ins for PowerPro
* Regex
If you're familiar with PowerPro and its plug-ins, here is a quick start with everything you need to know. But you're still advised to read the entire document and get acquainted with important details. If you're not familiar at all with PowerPro and its plug-ins and/or want the full explanation, then you should definitely skip this first table and read all the rest:
REGEX plug-in syntax (PowerPro 3.7):
plugin.service(arguments)
Example: regex.match(string, pattern, replace, "output_var")
Services:
match = match first occurrence
matchg = match all occurrences
replace = replace first occurrence
replaceg = replace all occurrences
Input variables:
string = the string to be parsed
pattern = the pattern applied to the string
replace = the replacement string
"output_var" = the name of the PowerPro variable that should contain the output string
In PowerPro 3.7 syntax, string, pattern, replace and output_var are named that way only to make their roles clear. It's the position of each argument in the list that identifies its role, not its name. In other words, these variables can have any name; only their position matters. So string must be the first argument, pattern the second and replace the third. The output_var string is optional, but if present it must be the fourth argument.
Output variables:
return_code = numeric code indicating the result of the operation
output_var = the output string, possibly transformed
The names of all these variables are arbitrary. The values they're supposed to hold can be assigned to any PowerPro variable you wish. The actual strings can also be typed directly into the argument list without the use of any variables at all:
return_code = regex.match(string, pattern, replace)
________________________________________________
myRetCode = regex.replace("take this string", "find this", "replace with that", "myOutput")
myTransformedString = myOutput
____________________________________________
myNewReturn = regex.replaceg(myInputString, myMatchPattern, myReplaceString, "myNewOutput")
myNewTransformedString = myNewOutput
The REGEX plug-in makes it possible to parse short or long strings with PowerPro, either as a standalone operation (to test a regular expression, for example) or in the context of a PowerPro script. PowerPro already provides native functions to select, remove or replace characters in a string, but these native functions are quite limited because they rely on predictable characteristics of the text that are not available in every situation. Only regular expressions, provided by this handy plug-in, can really find, match and replace any possible pattern. Another excellent thing about this plug-in is that it supports PCRE (Perl-Compatible Regular Expressions), allowing even more flexible and accurate patterns.
This special functionality is added to PowerPro with two DLLs: pcre.dll, a library written by Philip Hazel, and regex.dll, written by Julien Pierrehumbert. The latter is the plug-in that PowerPro calls every time a match and/or replace operation is needed, using the syntax provided below. PowerPro can probably work with any custom-made plug-in DLL, which makes its range of functionality virtually unlimited. Note that it is possible to change the #include in the source code of the plug-in to compile it with another POSIX-like regex library. Further information regarding licenses, documentation and sources of the PCRE library can be found at: http://gnuwin32.sourceforge.net/.
There are two ways to install the plug-in:
- Just copy pcre.dll and regex.dll to PowerPro's base directory, as explained in the chapter about plug-ins of PowerPro's documentation.
- Copy regex.dll to PowerPro's "plugins" directory, as explained in the chapter about plug-ins of PowerPro's documentation. But if you do that, pcre.dll must be placed in a directory included in your PATH environment variable.
Any given PowerPro plug-in can offer several different and even unrelated functionalities, called "services". The REGEX plug-in has four services: match, replace, matchg and replaceg. As explained in PowerPro's documentation, any plug-in is called this way:
plugin.service(arguments)
So there are four possible ways to call the REGEX plug-in, one for each service:
regex.match(string, pattern, replace)
regex.replace(string, pattern, replace, "Output_var")
regex.matchg(string, pattern, replace)
regex.replaceg(string, pattern, replace, "Output_var")
Note that the fourth element in the argument list, "Output_var", is always optional. If you only want to test a match, you would probably use one of the "match" services and probably wouldn't need the fourth argument.
However, if you want to change text and use one of the "replace" services, you probably want to retrieve the output. That's when we use the fourth argument, "Output_var". That argument will be the name of the PowerPro variable that will automatically hold the output of the operation. That's why the fourth argument is only included in the "replace" operations in the syntax demonstration above.
Of course, the REGEX plug-in cannot do anything until it is given some input. This is provided by means of three mandatory arguments: string, pattern and replace. Each of them can be typed directly in quotes or represented by a variable that contains the respective string. Here is the description of each argument's role:
string must contain the string that you want to analyze, i.e. the text on which the regular expression will be applied;
pattern must contain the regular expression that will be applied to string;
replace must contain the replacement string.
The REGEX plug-in has four services: match, matchg, replace and replaceg.
match: "MATCH" operation.
Checks whether string contains any portion that matches pattern. The check is performed only once, so only the first match is found. The result of the operation is stored automatically in one or two output variables*.
matchg: "MATCH GLOBAL" operation.
Checks whether string contains any portion that matches pattern. The check is repeated as many times as necessary, until all matches are found. The result of the operation is stored automatically in one or two output variables*.
replace: "REPLACE" operation.
Checks whether string contains any portion that matches pattern. If a match is found, the first matching portion is replaced with the text stored in replace. The result of the operation is stored automatically in one or two output variables*.
replaceg: "REPLACE GLOBAL" operation.
Checks whether string contains any portion that matches pattern. If a match is found, every matching portion is replaced with the text stored in replace. The result of the operation is stored automatically in one or two output variables*.
* Detailed information on the output variables is provided below.
After the input arguments are provided and the REGEX plug-in is run, it returns at least one useful value in one or two variables: the return_code and the "output" variables (note the quotes used in this sentence). You can specify one, two or even none of these variables, although specifying none would be totally useless.
The "output" variable contains the output of the match or replace operation, i.e. an actual string, which may even be an exact copy of the original string, depending on the values given to the input arguments.
Be careful when you use the "output" variable. You'll probably want to choose a name for a variable that you know you will use later in the script, and put that name in quotes in the plug-in call:
return_code = regex.replace(string, pattern, replace, "myOutput")
If you use an actual variable instead of just a name enclosed in quotes, some string value must be assigned to that variable beforehand:
The return code variable stores the value of the REGEX plug-in's return code. It is obtained by assigning the whole plug-in call to any user-defined variable:
return_code = regex.match(string, pattern, replace)
return_code contains the result of the expression's evaluation, but in the form of a return code, not a string. Here is the meaning of each possible code:
0 = Full Match
1 = Partial Match
2 = No Match
8 = Error: Invalid Pattern
9 = Error: No Input
A more detailed explanation of each return code is given in the table "Return Codes - Simple Examples" below.
Remember that return_code is only referred to as such to make its role clear. Any other variable name could be used for that purpose, like result, myReturn or JustChecking.
In other words...
return_code is the variable that will tell you what happened after the plug-in was run. It informs you (and the running script) whether or not there was a successful match in the operation.
- If you are replacing things in a string, the "output" variable is the name of the PowerPro variable that will contain the original text, possibly modified by the plug-in after looking for a given pattern in it and replacing everything that matches that pattern with some other text (replace). Depending on what you feed the arguments, many modifications or none at all may apply, turning the string contained in the "output" variable into something slightly or completely different from the original string, or leaving it exactly as it was.
IMPORTANT: if you are not replacing anything in a string, i.e. if you're just matching, "output" is the name of the variable that will contain whatever was matched by pattern. That is another good reason to use the fourth argument every time, even when we're not replacing anything. This issue and the actual mechanism required to retrieve a match will be explained later, in the detailed examples.
Return Codes - Simple Examples
0 - Full Match
Example: string = "223557864", pattern = "[0-9]+", return_code = 0
The pattern looks for a sequence of one or more digits and nothing else, so the entire string matches the pattern.
1 - Partial Match
Example: string = "Today is 24/09/2002 03:28", pattern = "[0-9]+", return_code = 1
The pattern looks for a sequence of one or more digits and nothing else. "Today is ", the slashes, the colon and the spaces do not match, so only some portions of string match the pattern.
2 - No Match
Example: string = "Today is 24/09/2002 03:28", pattern = ".* anytime!", return_code = 2
The pattern looks for any sequence of characters (.*) followed by a space, the word "anytime" and an exclamation point. The missing exclamation point alone is enough to rule out any match. The word "anytime" is not present in string either, so there is no match at all.
8 - Error: Invalid Pattern
Example: string = "Today is 24/09/2002 03:28", pattern = "(open parentheses... [or brackets...", return_code = 8
Leaving parentheses or brackets open without closing or escaping them is a mistake according to regular expression syntax rules. The REGEX plug-in cannot process an invalid regular expression, so it aborts the operation and returns the corresponding error code.
9 - Error: No Input
Example: string = "", pattern = "", return_code = 9
Neither string nor pattern can be left empty. If either is empty, the REGEX plug-in does not have enough data to do its job.
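To check expectations outside PowerPro, the return codes above can be approximated in Python. This is a minimal sketch, not the plug-in's code: it assumes that "full match" means the pattern covers the entire string and "partial match" means it matches only part of it, and it uses Python's re module, whose syntax is close to, but not identical to, PCRE. The function name classify is hypothetical.

```python
import re

# Return codes as listed in this document.
FULL_MATCH, PARTIAL_MATCH, NO_MATCH, INVALID_PATTERN, NO_INPUT = 0, 1, 2, 8, 9


def classify(string: str, pattern: str) -> int:
    """Hypothetical approximation of the REGEX plug-in's return code.

    Assumption: "full match" = the pattern matches the whole string,
    "partial match" = it matches only some portion of it.
    """
    # Either argument left empty: not enough data to evaluate.
    if not string or not pattern:
        return NO_INPUT
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Unbalanced parentheses, unterminated brackets, etc.
        return INVALID_PATTERN
    if compiled.fullmatch(string):
        return FULL_MATCH
    if compiled.search(string):
        return PARTIAL_MATCH
    return NO_MATCH


# The five examples from the table above.
print(classify("223557864", "[0-9]+"))                               # 0
print(classify("Today is 24/09/2002 03:28", "[0-9]+"))               # 1
print(classify("Today is 24/09/2002 03:28", ".* anytime!"))          # 2
print(classify("Today is 24/09/2002 03:28",
               "(open parentheses... [or brackets..."))              # 8
print(classify("", ""))                                              # 9
```

Patterns that rely on PCRE-only features may behave differently in Python, so treat the sketch as a rough cross-check rather than a reference.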
Let's see practical examples of each service, their behavior, return code and output in detail. You can always try your own examples with the REGEX plug-in test script provided at the end of this document. You may want to copy the test script and run it as you read, to try each example yourself.
MATCH/MATCHG:
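We use the match and match global services to check whether some text contains any sequence of characters, like words, phrases, numbers or symbols. Keep two things in mind when using these services:
1 - A return code lower than 2 (0 = full match, 1 = partial match) means that something was matched.
2 - Whenever a match is found, the "output" variable receives the replacement string (replace), not the matched text.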
The first point is relevant if you just want to test the presence of some pattern in a string and don't need to know exactly what was matched. In other words, "If (myReturn lt 2)" would mean "If something matches...".
The second point is relevant if you want to use the portion of the string that matches pattern. That's due to the perhaps unexpected behavior of the REGEX plug-in, placing the replacement string (replace) in the "output" variable whenever a match is found. For example:
Plain English: take the phrase "Today is 24/09/2002, 03:28", check if it contains the pattern "[0-9]{4}", i.e. a sequence of exactly four digits, and obtain the matched portion so as to verify exactly what four-digit number was found.
string = "Today is 24/09/2002, 03:28"
pattern = "[0-9]{4}"
replace = ""
return_code = regex.match(string, pattern, replace, "output")
or:
return_code = regex.match("Today is 24/09/2002, 03:28", "[0-9]{4}", "", "output")
return_code => 1
output => (empty)
Hmmm... something is not right. Part of the string matches the pattern, so return_code is 1. But the output is empty. Why? Because the replacement string replace is empty! Let's try again:
string = "Today is 24/09/2002, 03:28"
pattern = "[0-9]{4}"
replace = "XXX"
return_code = regex.match(string, pattern, replace, "output")
or:
return_code = regex.match("Today is 24/09/2002, 03:28", "[0-9]{4}", "XXX", "output")
return_code => 1
output => XXX
OK... so replace becomes output. But we're just matching. We're not replacing. Return code is 1 so we have a match, but what is the four-digit number that the plug-in found? How do we obtain the portion of the string that was matched?
There are two ways to achieve that, and both require that we use the match service as if it were a replacement service:
- First, an obvious workaround: we can (group) the whole pattern and get the match with the first backreference:
string = "Today is 24/09/2002, 03:28"
pattern = "([0-9]{4})"
replace = "\1"
return_code = regex.match(string, pattern, replace, "output")
or:
return_code = regex.match("Today is 24/09/2002, 03:28", "([0-9]{4})", "\1", "output")
return_code => 1
output => 2002
OK, now we found the four-digit number matched by "([0-9]{4})"! It's "2002"!
- Second, an actual mechanism provided by the REGEX plug-in rather than a workaround: whether or not the whole pattern is grouped, the match can always be obtained with the zeroth backreference:
string = "Today is 24/09/2002, 03:28"
pattern = "[0-9]{4}"
replace = "\0"
return_code = regex.match(string, pattern, replace, "output")
or:
return_code = regex.match("Today is 24/09/2002, 03:28", "[0-9]{4}", "\0", "output")
return_code => 1
output => 2002
Not very intuitive, but no rocket science either. Mystery solved. That's what documentation is for!
Note that although we use the match service as if it were a replacement service, no actual replacement takes place. We use the replace argument, but we're not changing the original string at all: replace only serves as a template that the plug-in fills in with the matched portion, thus extracting it. Replacing involves changing part or all of the original string and getting back the entire string with the modifications. The match service outputs only the result obtained from the matched portion; its output never includes the parts of the string that do not match.
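The match behavior described above (find the first match, then fill in replace from it) can be modeled in a few lines of Python. This is a sketch under that assumption, not the plug-in's implementation; the name match_output is hypothetical. One difference matters: Python does not accept \0 as a whole-match reference in replacement templates, so the sketch writes it as \g<0>.

```python
import re
from typing import Optional


def match_output(string: str, pattern: str, replace: str) -> Optional[str]:
    """Hypothetical model of what the match service puts in "output".

    Assumption: the plug-in finds the first match and stores only the
    replace template expanded against that match (backreferences included).
    """
    m = re.search(pattern, string)
    if m is None:
        # The document does not say what the plug-in stores when nothing
        # matches, so this sketch simply returns None.
        return None
    # Expand the template: \1 = first group, \g<0> = whole match.
    return m.expand(replace)


s = "Today is 24/09/2002, 03:28"
print(repr(match_output(s, "[0-9]{4}", "")))     # '' (empty replace, empty output)
print(match_output(s, "[0-9]{4}", "XXX"))        # XXX
print(match_output(s, "([0-9]{4})", r"\1"))      # 2002
print(match_output(s, "[0-9]{4}", r"\g<0>"))     # 2002
```

The four calls mirror the four examples above: the output is always the expanded template, which is why an empty or literal replace hides the matched text.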
But, wait! There is more! We still haven't tried the matchg service.
What if we want to find all occurrences of a pattern that may occur more than once? Let's see:
Plain English: take the phrase "Today is 24/09/2002, 03:28", check if it contains the pattern "[0-9]{2}", i.e. a sequence of exactly two digits, and obtain every matched portion so as to verify exactly which two-digit numbers were found.
string = "Today is 24/09/2002, 03:28"
pattern = "[0-9]{2}"
replace = "\0"
return_code = regex.matchg(string, pattern, replace, "output")
or:
return_code = regex.matchg("Today is 24/09/2002, 03:28", "[0-9]{2}", "\0", "output")
return_code => 1
output => 240920020328
The REGEX plug-in finds 24, 09, 20, 02, 03 and 28 and outputs them all in sequence, in the order they are found. If you want to separate the matches, just add a space to the replacement string:
string = "Today is 24/09/2002, 03:28"
pattern = "[0-9]{2}"
replace = "\0 "
return_code = regex.matchg(string, pattern, replace, "output")
or:
return_code = regex.matchg("Today is 24/09/2002, 03:28", "[0-9]{2}", "\0 ", "output")
return_code => 1
output => 24 09 20 02 03 28
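matchg can be modeled the same way as match, except that replace is filled in once for every match and the results are joined in order. A minimal Python sketch under that assumption (matchg_output is a hypothetical name, and \g<0> again stands in for \0):

```python
import re


def matchg_output(string: str, pattern: str, replace: str) -> str:
    """Hypothetical model of the matchg service's "output" variable:
    replace is expanded for each non-overlapping match, in order, and
    the expansions are concatenated with nothing in between."""
    return "".join(m.expand(replace) for m in re.finditer(pattern, string))


s = "Today is 24/09/2002, 03:28"
print(matchg_output(s, "[0-9]{2}", r"\g<0>"))          # 240920020328
print(repr(matchg_output(s, "[0-9]{2}", r"\g<0> ")))   # '24 09 20 02 03 28 '
```

Because the separator is part of the template, it is also added after the last match, so the second result ends with a space. If the plug-in works as modeled here, its output variable would end with a space too.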
If you want to unload the plug-in at the end of the script, here is how it's done:
regex.unload
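REPLACE/REPLACEG:
We use the replace and replace global services to check whether some text contains any sequence of characters, like words, phrases, numbers or symbols, and replace the matched pattern with some other text. You may also want to keep two things in mind when using the replacement services:
1 - The replace services return the same codes as the match services, so return_code also tells you whether anything was matched.
2 - The "output" variable receives the entire string, with the matched portion (or portions) replaced, not just the matched text.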
The first point means that we can also use the replace services to test matches. Usually, the replace services are used when a match is already expected and intended to be replaced right away. But it is possible to use replace operations to test a match and, if that match is found, replace it with something else, all in a single operation.
The second point is related to something that has been said before in this document and is repeated now:
"Note that although we use the match service as if it were a replacement service, no actual replacement takes place. We use the replace argument, but we're not changing the original string at all. Replacing involves changing part or all of the original string and getting back the entire string with the modifications. The match service outputs only the result obtained from the matched portion; its output never includes the parts of the string that do not match."
In contrast to the behavior described above, the replace and replace global services take the whole string, replace the matched portion with whatever is provided in replace, and put into the "output" variable not just the matches but the complete string: the unmatched text combined with the matched and replaced text. While the match services required the replace argument just to extract matches, this time the replace argument actually does the job of replacing something and modifying the original string.
OK, let's see some examples:
Plain English: take the phrase "Today is 24/09/2002, 03:28", check if it contains the pattern "[0-9]{4}", i.e. a sequence of exactly four digits. If it is found, we're going to replace it with "XXXX".
string = "Today is 24/09/2002, 03:28"
pattern = "[0-9]{4}"
replace = "XXXX"
return_code = regex.replace(string, pattern, replace, "output")
or:
return_code = regex.replace("Today is 24/09/2002, 03:28", "[0-9]{4}", "XXXX", "output")
return_code => 1
output => Today is 24/09/XXXX, 03:28
Plain English: take the phrase "Today is 24/09/2002, 03:28", check if it contains the pattern "[0-9]{2}", i.e. a sequence of exactly two digits. If it is found, we're going to replace it with "XX".
string = "Today is 24/09/2002, 03:28"
pattern = "[0-9]{2}"
replace = "XX"
return_code = regex.replace(string, pattern, replace, "output")
or:
return_code = regex.replace("Today is 24/09/2002, 03:28", "[0-9]{2}", "XX", "output")
return_code => 1
output => Today is XX/09/2002, 03:28
Plain English: take the phrase "Today is 24/09/2002, 03:28", check if it contains the pattern "[0-9]{2}", i.e. a sequence of exactly two digits. This time, we're going to replace every such sequence with "XX".
string = "Today is 24/09/2002, 03:28"
pattern = "[0-9]{2}"
replace = "XX"
return_code = regex.replaceg(string, pattern, replace, "output")
or:
return_code = regex.replaceg("Today is 24/09/2002, 03:28", "[0-9]{2}", "XX", "output")
return_code => 1
output => Today is XX/XX/XXXX, XX:XX
Plain English: take the phrase "Today is 24/09/2002, 03:28", check if it contains the pattern "[A-Za-z ]+", i.e. a sequence of one or more characters, each of which is an upper or lower case letter or a space. If it is found, we're going to replace it with "", i.e. nothing. In other words, remove all letters and spaces.
string = "Today is 24/09/2002, 03:28"
pattern = "[A-Za-z ]+"
replace = ""
return_code = regex.replaceg(string, pattern, replace, "output")
or:
return_code = regex.replaceg("Today is 24/09/2002, 03:28", "[A-Za-z ]+", "", "output")
return_code => 1
output => 24/09/2002,03:28
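Both replacement services behave like a standard regex substitution, limited to the first occurrence for replace and applied to every occurrence for replaceg. In Python this corresponds to re.sub with count=1 or count=0 (which means no limit). The sketch below reproduces the four examples above; replace_output and its all_occurrences flag are hypothetical names, and Python's re is only an approximation of PCRE.

```python
import re


def replace_output(string: str, pattern: str, replace: str,
                   all_occurrences: bool = False) -> str:
    """Hypothetical model of replace (first occurrence) and replaceg
    (all occurrences): the whole string comes back, with the matched
    portions substituted by replace."""
    # re.sub treats count=0 as "replace every occurrence".
    return re.sub(pattern, replace, string, count=0 if all_occurrences else 1)


s = "Today is 24/09/2002, 03:28"
print(replace_output(s, "[0-9]{4}", "XXXX"))                       # Today is 24/09/XXXX, 03:28
print(replace_output(s, "[0-9]{2}", "XX"))                         # Today is XX/09/2002, 03:28
print(replace_output(s, "[0-9]{2}", "XX", all_occurrences=True))   # Today is XX/XX/XXXX, XX:XX
print(replace_output(s, "[A-Za-z ]+", "", all_occurrences=True))   # 24/09/2002,03:28
```

Unlike the match sketches, the unmatched text ("Today is ", the slashes, the comma) is kept in the result, which is exactly the difference between the two families of services.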
One last thing: long strings. For a long time, PowerPro and its plug-ins could only handle strings with up to 263 characters, recently increased to 531. Most of this limitation has been removed, unleashing all the power the Regex plug-in can have. One must be aware of a couple of things, though:
1 - The native PowerPro clip variable can only hold one line, with up to 531 characters. Two plug-ins provide a suitable way to get the entire content of the clipboard:
And this is it. Go ahead and do your own experiments! Use the script below to run any tests you like. The script is ready to be used: just copy it, paste it into a text file (preferably with the ".powerpro" extension) and run it. The values assigned to String, Pattern, Replace and Service at the top of the script can and should be changed to suit your needs.
REGEX PLUG-IN TEST SCRIPT
;LINES STARTING WITH A SEMICOLON ARE COMMENTS
;THEY ARE NOT EXECUTED BY POWERPRO
;String = the string on which you want to run the expression
;Pattern = the regular expression
;Replace = replacement string
;Service: match, matchg, replace or replaceg
String = "Today is 24/09/2002 03:28"
Pattern = "[0-9]+"
Replace = "XX"
Service = "replace"
;# DO NOT EDIT BELOW THIS POINT
;# ----------------------------------------------------------------
Return_code = regex.&(Service)(string, pattern, replace, "output")
;# ----------------------------------------------------------------
If (Return_code == 0) do
MsgPopup = Return_code ++ " - Full match!"
ElseIf (Return_code == 1)
MsgPopup = Return_code ++ " - Partial_match"
ElseIf (Return_code == 2)
MsgPopup = Return_code ++ " - No_match"
ElseIf (Return_code == 8)
MsgPopup = Return_code ++ " - Invalid_regexp. Pay attention!!!!"
ElseIf (Return_code == 9)
MsgPopup = Return_code ++ " - No_input."
Else
MsgPopup = Return_code ++ " - Yikes! Weird return code!"
EndIf
;# ----------------------------------------------------------------
Debug RETURN CODE: &(MsgPopup), OUTPUT: &(output)
;# ----------------------------------------------------------------
;THE NEXT LINES ARE OPTIONAL. USE THEM IF YOU WANT
;A LOG FILE CALLED "DEBUG-REGEX.TXT" ON YOUR DESKTOP
myLogFile = "c:\windows\desktop\debug-regex.txt"
*Exec ToFile &(myLogFile) Testing myService: &(Service)
*Exec ToFile &(myLogFile)
*Exec ToFile &(myLogFile) string = &(String)
*Exec ToFile &(myLogFile) pattern = &(Pattern), replace = &(Replace)
*Exec ToFile &(myLogFile) return = &(Return_code), output = &(output)
*Exec ToFile &(myLogFile)
*Exec ToFile &(myLogFile) ====================================
*Exec ToFile &(myLogFile)
Bruce Switzer