From 23d6c43d524aa2df08851a4739f3a1eef9ef3b77 Mon Sep 17 00:00:00 2001 From: danrega <16720986+danrega@users.noreply.github.com> Date: Fri, 29 Nov 2024 16:28:28 +0100 Subject: [PATCH] Update --- 07_String_Processing.md | 371 +--------- 28_Regular_Expressions.md | 1375 +++++++++++++++++++++++++++++++++++++ README.md | 1 + 3 files changed, 1386 insertions(+), 361 deletions(-) create mode 100644 28_Regular_Expressions.md diff --git a/07_String_Processing.md b/07_String_Processing.md index ea1f4b1..5453019 100644 --- a/07_String_Processing.md +++ b/07_String_Processing.md @@ -34,11 +34,6 @@ - [Pattern-Based Searching and Replacing in Strings](#pattern-based-searching-and-replacing-in-strings) - [Simple Pattern-Based Searching Using Comparison Operators](#simple-pattern-based-searching-using-comparison-operators) - [Complex Searching and Replacing Using Regular Expressions](#complex-searching-and-replacing-using-regular-expressions) - - [Excursion: Common Regular Expressions](#excursion-common-regular-expressions) - - [Searching Using Regular Expressions](#searching-using-regular-expressions) - - [System Classes for Regular Expressions](#system-classes-for-regular-expressions) - - [Replacing Using Regular Expressions](#replacing-using-regular-expressions) - - [Overview/Examples: Using PCRE Regular Expressions in Various Contexts](#overviewexamples-using-pcre-regular-expressions-in-various-contexts) - [More String Functions](#more-string-functions) - [Checking the Similarity of Strings](#checking-the-similarity-of-strings) - [Repeating Strings](#repeating-strings) @@ -1687,279 +1682,17 @@ IF s1 NP `i+`. ... "true; sy-fdpos = 11 (length of searched string) ### Complex Searching and Replacing Using Regular Expressions -#### Excursion: Common Regular Expressions - -There are several ways to perform complex searches in strings using PCRE expressions. They can be quite complex. The following overview shows common PCRE expressions with simple examples. It is not comprehensive. For more details, see [here](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenregex_pcre_syntax_specials.htm). - -Characters and character types - -| Expression | Represents | Example Regex | Example String | Matches | Does not Match | -|---|---|---|---|---|---| -| `x` | Specific character | `a` | abcdef | a | Anything else | -| `.` | Anything except a line break | `.` | ab 1# | a, b, the blank, 1, # | ab, 1# | -| `\d` | Any digit (0-9), alternative: `[0-9]` | `\d` | a1-b2 3-4c9 | 1, 2, 3, 4, 9 | a, b, c, the blank and hyphens | -| `\D` | Any non-digit, alternative: `[^0-9]` | `\D` | a1-b2 3-4c9 | a, b, c, the blank and hyphens | 1, 2, 3, 4, 9 | -| `\s` | Any whitespace character such as a blank, tab and new line | `\s` | (hi X ) | The blanks | h, i, X, (, ) | -| `\S` | Any character that is not a whitespace | `\S` | (hi X ) | h, i, X, (, ) | The blanks | -| `\w` | Any word character (letter, digit or the underscore), alternative: `[a-zA-Z0-9_]` | `\w` | (ab 12_c) | a, b, c, 1, 2, _ | (, ), the blank | -| `\W` | Any character that is not a word character, alternative: `[^a-zA-Z0-9_]` | `\W` | (ab 12_c) | (, ), the blank | a, b, c, 1, 2, _ | -| `\` | To include special characters like `[] \ / ^`, use `\` to escape them. Use `\.` to match a period ("."). | `.\.` | ab.cd.ef | a**b.**c**d.**ef | ab**.c**d**.e**f | - -Repetitions and Alternatives - -| Expression | Represents | Example Regex | Example String | Matches | Does not Match | -|---|---|---|---|---|---| -| `x*` | Zero or more repetitions of `x` | `ab*` | abc abbc abbbc a ac | **ab**c **abb**c **abbb**c **a** **a**c | **abc** **abbc** **abbbc** a **ac** | -| `x+` | One or more repetitions of `x` | `ab+` | abc abbc abbbc a ac | **ab**c **abb**c **abbb**c a ac | ... **a** **a**c | -| `x{m,n}` | Between `m` and `n` repetitions of `x` | `ab{2,3}` | abc abbc abbbc a ac | abc **abb**c **abbb**c a ac | **ab**c ... | -| `x{m}` | Exactly `m` repetitions | `ab{3}` | abc abbc abbbc a ac | abc abbc **abbb**c a ac | abc **abb**c ... | -| `x{m,}` | Exactly `m` or more repetitions | `ab{2,}` | abc abbc abbbc a ac | abc **abb**c **abbb**c a ac | **ab**c ... | -| `x?` | Optional `x`, i.e. zero or one time | `ab?` | abc abbc abbbc a ac | **ab**c **ab**bc **ab**bbc **a** **a**c | ... **ac** | -| `x\|y` | Matching alternatives, i. e. `x` or `y` | 1. `b\|2`
2. `b(a\|u)t` | 1. abc 123
2. bit bat but bet | 1. b, 2
2. bat, but | 1. a, c, 1, 3
2. bit, bet | -| `x*?` | `x*` captures greedily, i.e. as much as possible, while `x*?` captures non-greedily, i.e. as few as possible | 1. `bc*?`
2. `a.*?#` | 1. abcd abccccd ab
2. abc#defgh#i | 1. a**b**cd a**b**ccccd a**b**
2. **abc#**defgh#i | 1. a**bc**d a**bcccc**d a**b** (result for `bc*`)
2. **abc#defgh#**i (result for `a.*#`) | -| `x+?` | Same as above: `x+` (greedy), `x+?` (non-greedy) | 1. `bc+?`
2. `<.+?>` | 1. abcd abccccd ab
2. <span>Hallo</span> html. | 1. a**bc**d a**bc**cccd ab
2. **<span>**Hallo**</span>** html. | 1. a**bc**d a**bcccc**d ab (result for `bc+`)
2. **<span>Hallo</span>** html. (result for `<.+>`) | - -Character Sets, Ranges, Subgroups and Lookarounds -| Expression | Represents | Example Regex | Example String | Matches | Does not Match | -|---|---|---|---|---|---| -| `[xy]` | Character set, matches a single character present in the list | `b[iu]` | bit bat but bet | **bi**t bat **bu**t bet | bit **ba**t but **be**t | -| `[x-y]` | Character range, matches a single character in the specified range, note that ranges may be locale-dependent | `a[a-c0-5]` | aa1 ab2 ba3 cac4 da56 a7 |**aa**1 **ab**2 b**a3** c**ac**4 d**a5**6 a7 | aa1 ab2 ba3 cac4 da56 **a7** | -| `[^xy]` | Negation, matches any single character not present in the list | `[^Ap]` | ABap | B, a | A, p | -| `[^x-y]` | Negation, matches any single character not within the range | `[^A-Ca-c1-4]` | ABCDabcd123456 | D, d, 5, 6 | A, B, C, a, b, c, 1, 2, 3, 4 | -| `(...)` | Capturing group to group parts of patterns together | `b(a\|u)t` | bit bat but bet | bat, but | bit, bet | -| `(?=...)` | Positive lookahead, returns characters that are followed by a specified pattern without including this pattern | `a(?=b)` | abc ade | **a**bc ade | abc **a**de | -| `(?!...)` | Negative lookahead, returns characters that are not followed by a specified pattern without including this pattern | `a(?!b)` | abc ade | abc **a**de | **a**bc ade | -| `(?<=...)` | Positive lookbehind, returns characters that are preceded by a specified pattern without including this pattern | `(?<=\s)c` | ab c abcd | ab **c** abcd (it is preceded by a blank) | ab c ab**c**d | -| `(?**c**d (it is not preceded by a blank) | ab **c** abcd | -| `\1` | Backreference, refers to a previous capturing group; 1 represents the number of the group index (the group index starts with 1) | `(a.)(\w*)\1` | abcdefabghij | **abcdefab**ghij
Note: Capturing group 1 holds `ab` in the example. The second capturing group captures all word characters until `ab` is found. | **ab**cdefabghij | -| `\K` | Resets the starting point of a match, i.e. findings are excluded from the final match | `a.\Kc` | abcd | ab**c**d | **abc**d | - -> **💡 Note**
-> - Subgroups are useful in replacements. By using an expression with `$` and a number, such as `$1`, you can refer to a specific group. For example, you have a string `abcde`. A PCRE expression might be -`(ab|xy)c(d.)`, where two subgroups are specified within two pairs of parentheses. In a replacement pattern, you can refer to the first group with `$1` and the second group with `$2`. Thus, the replacement pattern `$2Z$1` results in `deZab`. -> - `(?:x)` creates a group but it is not captured. Example regular expression: `(?:ab)(ap)`. Example string: 'abap'. It matches 'abap', but `$1` will only contain 'ap'. -> - Regarding special characters, check the [Special Characters](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenregex_pcre_syntax_specials.htm) topic in the ABAP Keyword Documentation. For example, a non-breaking space whose hex code is *U+00A0*. You can replace all of the non-breaking space occurrences in a string as follows: -> ```abap -> REPLACE ALL OCCURRENCES OF PCRE `\x{00A0}` IN some_string WITH ``. -> "Alternative -> REPLACE ALL OCCURRENCES OF PCRE `(*UTF)\N{U+00A0}` IN some_string WITH ``. -> ``` - -Anchors and Positions - -| Expression | Represents | Example Regex | Example String | Matches | Does not Match | -|---|---|---|---|---|---| -| `^` | Start of line | `^.` | abc def | **a**bc def | abc **d**ef | -| `$` | End of line | `.$` | abc def | abc de**f** | **a**bc def | -| `\b` | Start or end of word | 1. `\ba.`
2. `\Dd\b`
3. `\b.d\b` | abcd a12d ed | 1. **ab**cd **a1**2d ed
2. ab**cd** a12d **ed**
3. abcd a12d **ed** | 1. ab**cd** a1**2d** ed
2. abcd a1**2d** ed
3. **abcd** **a12d** ed | -| `\B` | Negation of `\b`, not at the start or end of words | `\Be\B` | see an elefant | s**e**e an el**e**fant | s**ee** an **e**lefant | - -> **💡 Note**
-> Find more information [here](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenregex_pcre_syntax_specials.htm). Note that there are further anchors available such as `\A` and `\Z` for the start and end of strings. - -

⬆️ back to top

- -#### Searching Using Regular Expressions - -- Multiple string functions support PCRE expressions by offering the - `pcre` parameter, which you can use to specify such an expression. -`FIND` and `REPLACE` statements support regular -expressions with the `PCRE` addition. -- The string function -[`match`](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenmatch_functions.htm) -works only with regular expressions. It returns a substring that -matches a regular expression within a string. -- For comparisons, you can -also use the [predicate -function](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenpredicate_function_glosry.htm "Glossary Entry") -[`matches`](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenmatches_functions.htm), which returns true or false if a string matches a given pattern or not. - -Syntax examples: -``` abap -DATA(s1) = `Cathy's black cat on the mat played with Matt.`. - -"Determining the position of the first occurrence -"Here, the parameter occ is 1 by default. -DATA(int) = find( val = s1 pcre = `at.` ). "1 - -"Determining the number of all occurrences. -"Respects all 'a' characters not followed by 't', all 'at' plus 'att' -int = count( val = s1 pcre = `at*` ). "6 - -"Respects all 'at' plus 'att' -int = count( val = s1 pcre = `at+` ). "4 - -"Extracting a substring matching a given pattern -DATA(s2) = match( val = `The email address is jon.doe@email.com.` - pcre = `\w+(\.\w+)*@(\w+\.)+(\w{2,4})` ). "jon.doe@email.com - -"Predicate function matches -"Checking the validitiy of an email address -IF matches( val = `jon.doe@email.com` - pcre = `\w+(\.\w+)*@(\w+\.)+(\w{2,4})` ). "true -... -ENDIF. - -"Examples with the FIND statement -"SUBMATCHES addition: Storing submatches in variables -"Pattern: anything before and after ' on ' -FIND PCRE `(.*)\son\s(.*)` IN s1 IGNORING CASE SUBMATCHES DATA(a) DATA(b). -"a: 'Cathy's black cat' / b: 'the mat played with Matt.'. - -"Determining the number of letters in a string -FIND ALL OCCURRENCES OF PCRE `[A-Za-z]` IN s1 MATCH COUNT DATA(c). "36 - -"Searching in an internal table and retrieving line, offset, length information -DATA(itab) = value string_table( ( `Cathy's black cat on the mat played with the friend of Matt.` ) ). -"Pattern: 't' at the beginning of a word followed by another character -FIND FIRST OCCURRENCE OF PCRE `\bt.` IN TABLE itab - IGNORING CASE MATCH LINE DATA(d) MATCH OFFSET DATA(e) MATCH LENGTH DATA(f). "d: 1, e: 21, f: 2 - -"The objective of the following example is to extract the content of the segments that -"are positioned within /.../ in a URL. The segments are stored in an internal table. -DATA(url) = `https://help.sap.com/docs/abap-cloud/abap-concepts/controlled-sap-luw/`. -DATA url_parts TYPE string_table. - -FIND ALL OCCURRENCES OF PCRE `(?<=/)([^/]+)(?=/)` IN url RESULTS DATA(res). - -"Details on the regular expression: -"- Positive lookbehind (?<=/) that determines that the content is preceded by `/` -"- Positive lookahead (?=/) that determines that the content is followed by `/ -"- ([^/]+) in between determines that any sequence of characters that are not `/` are matched -"- The match is put in parentheses to store the submatch - -"The RESULTS addition stores findings in an internal table of type match_result_tab. -"Submatches (i.e. length and offset values of the submatches) are stored in internal -"tables themselves. Therefore, the example uses nested loops and the substring function -"to retrieve the strings. - -LOOP AT res INTO DATA(finding). - LOOP AT finding-submatches INTO DATA(sub). - DATA(url_part) = substring( val = url off = sub-offset len = sub-length ). - APPEND url_part TO url_parts. - ENDLOOP. -ENDLOOP. - -"The following statement uses nested iteration expressions with FOR instead of nested -"LOOP statements. -DATA(url_parts_for_loop) = VALUE string_table( FOR wa1 IN res - FOR wa2 IN wa1-submatches - ( substring( val = url off = wa2-offset len = wa2-length ) ) ). - -ASSERT url_parts = url_parts_for_loop. -*Content: -*help.sap.com -*docs -*abap-cloud -*abap-concepts -*controlled-sap-luw -``` -

⬆️ back to top

- -#### System Classes for Regular Expressions - -- You can create an object-oriented representation of regular expressions using the `CL_ABAP_REGEX` system class. -- For example, the `CREATE_PCRE` method creates instances of regular expressions with PCRE syntax. -- The instances can be used, for example, with the `CL_ABAP_MATCHER` class, which applies the regular expressions. -- A variety of methods and parameters can be specified to accomplish various things and to further specify the handling of the regular expression. -- More information can be found [here](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenregex_system_classes.htm) and in the class documentation (choose F2 on the class in ADT). - - -``` abap -DATA(str) = `a1 # B2 ? cd . E3`. - -"Creating a regex instance for PCRE regular expressions -"In the example, regex_inst has the type ref to cl_abap_regex. -DATA(regex_inst) = cl_abap_regex=>create_pcre( pattern = `\D\d` "any-non digit followed by a digit - ignore_case = abap_true ). - -"Creating an instance of CL_ABAP_MATCHER using the method CREATE_MATCHER of the class CL_ABAP_REGEX -"You can also specify internal tables with the 'table' parameter and more. -DATA(matcher) = regex_inst->create_matcher( text = str ). - -"Finding all results using the 'find_all' method -"In the example, result has the type match_result_tab containing the findings. -DATA(result) = matcher->find_all( ). - -"Using method chaining -DATA(res) = cl_abap_regex=>create_pcre( pattern = `\s\w` "any blank followed by any word character - ignore_case = abap_true )->create_matcher( text = str )->find_all( ). -``` - -

⬆️ back to top

- -#### Replacing Using Regular Expressions - -- To perform replacement operations using regular expressions, you can use both -the string function `replace` and `REPLACE` statements with the `pcre` parameter or the `PCRE` addition. -- Like the `find` function, among others, and -`FIND` statements, the `replace` function and -`REPLACE` statements offer a number of parameters and additions that you can use to further restrict the area to be replaced. -- For more detailed information, refer to the ABAP -Keyword Documentation. -- The executable example covers many of the PCRE expressions listed above. - -Syntax examples: -``` abap -DATA(s1) = `ab apppc app`. -DATA s2 TYPE string. - -"Replaces 'p' with 2 - 4 repetitions, all occurences -s2 = replace( val = s1 pcre = `p{2,4}` with = `#` occ = 0 ). "ab a#c a# - -"Replaces any single character not present in the list, all occurences -s2 = replace( val = s1 pcre = `[^ac]` with = `#` occ = 0 ). " "a##a###c#a## - -"Replaces first occurence of a blank -s2 = replace( val = s1 pcre = `\s` with = `#` ). "ab#apppc app - -"Greedy search -"The pattern matches anything before 'p'. The matching is carried out as -"often as possible. Hence, in this example the search stretches until the -"end of the string since 'p' is the final character, i. e. this 'p' and -"anything before is replaced. -s2 = replace( val = s1 pcre = `.*p` with = `#` ). "# - -"Non-greedy search -"The pattern matches anything before 'p'. The matching proceeds until -"the first 'p' is found and does not go beyond. It matches as few as -"possible. Hence, the first found 'p' including the content before -"is replaced. -s2 = replace( val = s1 pcre = `.*?p` with = `#` ). "#ppc app - -"Replacements with subgroups -"Replaces 'pp' (case-insensitive here) with '#', the content before and after 'pp' is switched -s2 = replace( val = s1 - pcre = `(.*?)PP(.*)` - with = `$2#$1` - case = abap_false ). "pc app#ab a - -"Changing the source field directly with a REPLACE statement; same as above -REPLACE PCRE `(.*?)PP(.*)` IN s1 WITH `$2#$1` IGNORING CASE. "pc app#ab a - -"ALL OCCURRENCES addition -REPLACE ALL OCCURRENCES OF PCRE `\s` IN s1 WITH `?`. "pc?app#ab?a - -REPLACE ALL OCCURRENCES OF PCRE `p.` IN s1 WITH `XY` "XY?aXY#ab?a - REPLACEMENT COUNT DATA(repl_cnt) "2 - RESULTS DATA(repl_res). - "repl_res: - "LINE OFFSET LENGTH - "0 0 2 - "0 4 2 -``` - -

⬆️ back to top

- -#### Overview/Examples: Using PCRE Regular Expressions in Various Contexts - -- As also covered in other sections, there are multiple contexts where regular expressions are possible, for example, PCRE regular expressions. Among them are: - - Statements (`PCRE` addition): `FIND`, `REPLACE` - - Classes: `CL_ABAP_REGEX`, `CL_ABAP_MATCHER` - - Built-in functions having the `pcre` parameter: `find`, `find_end`, `count`, `match`, `replace`, `substring_from`, `substring_after`, `substring_before`, `substring_to` - - Note: Built-in string functions dealing with regular expressions are also available for ABAP SQL and CDS. See the [Misc Built-In Functions](24_Misc_Builtin_Functions.md) cheat sheet. - +- For more information refer to the [Regular Expressions in ABAP](28_Regular_Expressions.md) cheat sheet. +- It covers the following topics: + - An excursion covering common regular expressions (the focus is on PCRE) + - Regular expressions used in ABAP in the following contexts: + - `FIND` and `REPLACE` statements (with the `PCRE` addition) + - Built-in functions in ABAP with the `pcre` parameter, such as `find`, `find_end`, `count`, `match`, `replace`, `substring_from`, `substring_after`, `substring_before`, `substring_to` + - Built-in functions in ABAP SQL and CDS (e.g. `like_regexpr`, `locate_regexpr`, `locate_regexpr_after`, `occurrences_regexpr`, `replace_regexpr`, `substring_regexpr` in ABAP SQL) + - `CL_ABAP_REGEX` and `CL_ABAP_MATCHER` classes + Examples: + ```abap DATA(some_string) = `aa bb cc dd ee`. DATA(original_string) = some_string. @@ -2080,90 +1813,6 @@ REPLACE ALL OCCURRENCES OF PCRE `(

)(.*?)(<\/p>)` IN html_a WITH `$1Hi$3`. "Regular expression: any character or a new line with zero or more repretitions REPLACE ALL OCCURRENCES OF PCRE `(

)(.|\n)*?(<\/p>)` IN html_b WITH `$1Hi$3`. *

Hi

Hi

Hi

- -"The following strangely formatted demo HTML code is to be processed. It -"intentionally and randomly includes new lines and returns. Some HTML tags -"have attributes. Suppose you want to proceed with the following tasks: -"- Getting the content between the table tags -"- Putting the content in one line by removing all new lines and returns -"- Removing all attributes so that only the tags are available - -DATA(nl) = |\n|. -DATA(rt) = |\r|. -DATA(html) = -`` && nl && -`` && nl && -` ` && -` ` && rt && -` ` && nl && -` ` && -` ` && nl && -` ` && nl && -` ` && rt && -` ` && nl && -` ` && nl && -` ` && nl && -` ` && nl && -` ` && nl && -` ` && nl && -`
CARRID` && nl && -`CARRNAME
` && -` XY` && -` ` && -` XY Airlines` && nl && -`
` && rt && -` YZ` && rt && -` ` && nl && -` YZ Airways` && nl && -`
` && rt && -`` && nl && -``. - -"See the following examples of how to proceed. Other regular expressions may be chosen, too. - -"Getting the content/all tags between the table tags -"The regular expression considers the attributes of the table tag that could be anything. -"A non-greedy search up to the next '>' would not work with just '.' in this case because -"of the new lines and returns. So, a regular expression to retrieve the content might be -"as follows: -"[\s|\S]*? -"- Matches a single character specified within [...] -"- In this case it is either \s or \S (i.e. any whitespace or non-whitespace character -"- 'or' is represented by '|' -"- '*?' represents non-greedy search (zero or multiple occurrences) up to the next '>' in -" the example -"- The same regular expression is used between the ...
tags to capture -" anyting. -"- It is put within a pair of parentheses to capture a group. This group (i.e. the offset -" and length values for the finding) is available in the data object following RESULTS -" in the FIND statement. As only one group is available in the regular expression, -" the 'submatches" table contains one line. -"- Note the escaping of '/' for '/' in -"- The examples use FIND/REPLACE statements. You can also use string functions. -FIND PCRE `([\s|\S]*?)<\/table>` IN html RESULTS DATA(res). -DATA(content_bw_table_tag) = substring( val = html off = res-submatches[ 1 ]-offset len = res-submatches[ 1 ]-length ). -"Removing all new lines and returns -REPLACE ALL OCCURRENCES OF PCRE `\n|\r` IN content_bw_table_tag WITH ``. -"Removing all whitespace characters between start and end HTML tags -REPLACE ALL OCCURRENCES OF PCRE `>(\s*?)<` IN content_bw_table_tag WITH `><`. -"Removing all whitespace characters in front of the first HTML tag, and behind -"the last tag -REPLACE PCRE `^\s*?<` IN content_bw_table_tag WITH `<`. -REPLACE PCRE `>\s*$` IN content_bw_table_tag WITH `>`. -"Removing all attributes within tags -"Regex: Non-greedy searches up to the next whitespace character, and from that -"whitespace character up to the next '>'. Using $n, you refer to the capturing -"group. In this case, the first capturing group enclosed by '<...>' shall be -"inserted. -REPLACE ALL OCCURRENCES OF PCRE `<(.*?)\s(.*?)>` IN content_bw_table_tag WITH `<$1>`. -"span tags are not needed in the example any more as the attributes are removed -REPLACE ALL OCCURRENCES OF `` IN content_bw_table_tag WITH ``. -REPLACE ALL OCCURRENCES OF `` IN content_bw_table_tag WITH ``. ```

⬆️ back to top

diff --git a/28_Regular_Expressions.md b/28_Regular_Expressions.md new file mode 100644 index 0000000..93e8f5d --- /dev/null +++ b/28_Regular_Expressions.md @@ -0,0 +1,1375 @@ + + +# Regular Expressions in ABAP + +- [Regular Expressions in ABAP](#regular-expressions-in-abap) + - [Introduction](#introduction) + - [Excursion: Common Regular Expressions](#excursion-common-regular-expressions) + - [Characters and Character Types](#characters-and-character-types) + - [Escaped Characters and Special Variants](#escaped-characters-and-special-variants) + - [Quantifiers, Repetitions and Alternatives](#quantifiers-repetitions-and-alternatives) + - [Character Sets and Ranges](#character-sets-and-ranges) + - [Anchors and Positions](#anchors-and-positions) + - [Capturing Groups, Replacements and Backreferences](#capturing-groups-replacements-and-backreferences) + - [Lookarounds](#lookarounds) + - [Case Conversions in Replacement Patterns](#case-conversions-in-replacement-patterns) + - [Option Settings and Control Verbs](#option-settings-and-control-verbs) + - [Callouts](#callouts) + - [Coniditional Patterns](#coniditional-patterns) + - [ABAP Statements Using Regular Expressions](#abap-statements-using-regular-expressions) + - [Built-In Functions in ABAP Using Regular Expressions](#built-in-functions-in-abap-using-regular-expressions) + - [Built-In Functions in ABAP SQL and CDS Using Regular Expressions](#built-in-functions-in-abap-sql-and-cds-using-regular-expressions) + - [System Classes for Regular Expressions: CL\_ABAP\_REGEX and CL\_ABAP\_MATCHER](#system-classes-for-regular-expressions-cl_abap_regex-and-cl_abap_matcher) + +## Introduction + +Regular expressions +- Often abbreviated as regex +- Define a pattern of literal and special characters that describe a set of character strings +- Find one or more occurrences of the character strings defined by the expression +- Offer a powerful way to perform complex searching, replacing, and matching +- Can be used in ABAP in the following contexts (with a focus on PCRE): + - `FIND` and `REPLACE` statements (with the `PCRE` addition) + - Built-in functions in ABAP with the `pcre` parameter, such as `find`, `find_end`, `count`, `match`, `replace`, `substring_from`, `substring_after`, `substring_before`, `substring_to` + - Built-in functions in ABAP SQL and CDS (e.g. `like_regexpr`, `locate_regexpr`, `locate_regexpr_after`, `occurrences_regexpr`, `replace_regexpr`, `substring_regexpr` in ABAP SQL) + - `CL_ABAP_REGEX` and `CL_ABAP_MATCHER` classes (note that you can also use objects of `CL_ABAP_REGEX` in `FIND` and `REPLACE` statements) +- Are supported in ABAP with the following syntaxes: + - Perl Compatible Regular Expressions (PCRE) + - ABAP supports PCRE syntax via the PCRE2 library in the ABAP kernel, and ABAP SQL supports PCRE by using the PCRE1 library in the SAP HANA database. + - XPath + - XSD + - XPath and XSD regular expressions are internally and automatically converted into PCRE syntax. + - The syntax of regular expressions is generally standardized but varies with different syntax flavors. + - POSIX + - Obsolete; `REGEX` and `regex` are obsolete additions/parameters + - PCRE regular expressions perform better + - A syntax warning for POSIX can be hidden with the pragma `##regex_posix` + +> **💡 Note**
+> - You can perform complex searches using regular expressions. For simple pattern-based searches, refer to comparison operators (`CP`, `NP`) in the [String Processing](07_String_Processing.md) cheat sheets. +> - The cheat sheet and examples focus on PCRE regular expressions. For other syntax types, find more information and links in the [ABAP Keyword Documentation](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/ABENREGEX_SYNTAX.html). +> - If you have access to a system supporting [classic ABAP](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenclassic_abap_glosry.htm), you can check out the `demo_regex_toy` program for experimenting with regular expressions in ABAP. + +

⬆️ back to top

+ +## Excursion: Common Regular Expressions + +> **💡 Note**
+> - The sections below provide an overview of common PCRE expressions with examples. This is not a comprehensive overview; only selected expressions are included. +> - For a complete guide to PCRE syntax, refer to the [official documentation](https://perldoc.perl.org/perlre). Note that ABAP-specific restrictions or modifications may apply to the standard syntax. +> - The code snippets use `replace` functions to show the effects of PCRE regular expressions. Many examples use the `occ = 0` parameter to replace all occurrences. +> - When using regular expressions, note that exceptions such as of type `cx_sy_invalid_regex` (invalid regex) or `cx_sy_regex_too_complex` (too complex regex) can occur. + +### Characters and Character Types + +| Expression | Represents | Example Regex | Example String | Matches | Does not Match | +|---|---|---|---|---|---| +| `x` | Specific character | `a` | abcdef | a | Anything else | +| `.` | Any character except a line break (unless in dotall mode, then any character) | `.` | ab 1# | a, b, the blank, 1, # | ab, 1# | +| `\d` | Any digit (0-9), alternative: `[0-9]` | `\d` | a1-b2 3-4c9 | 1, 2, 3, 4, 9 | a, b, c, the blank and hyphens | +| `\D` | Any non-digit, alternative: `[^0-9]` | `\D` | a1-b2 3-4c9 | a, b, c, the blank and hyphens | 1, 2, 3, 4, 9 | +| `\s` | Any whitespace character such as a blank, tab and new line | `\s` | (hi X ) | The blanks | h, i, X, (, ) | +| `\S` | Any character that is not a whitespace | `\S` | (hi X ) | h, i, X, (, ) | The blanks | +| `\w` | Any word character (letter, digit or the underscore), alternative: `[a-zA-Z0-9_]` | `\w` | (ab 12_c) | a, b, c, 1, 2, _ | (, ), the blank | +| `\W` | Any character that is not a word character, alternative: `[^a-zA-Z0-9_]` | `\W` | (ab 12_c) | (, ), the blank | a, b, c, 1, 2, _ | + +Examples: + + +```abap +DATA string_chars_types TYPE string. + +"Specific character +"#bcdef +string_chars_types = replace( val = `abcdef` pcre = `a` with = `#` occ = 0 ). + +"Any character except a line break +"_______ (7 replacements before the line break) +"____ (4 replacements after the line break) +string_chars_types = replace( val = |ab 1#?C\n D23| pcre = `.` with = `_` occ = 0 ). + +"Any digit +"a#-b# #-#c# +string_chars_types = replace( val = `a1-b2 3-4c9` pcre = `\d` with = `#` occ = 0 ). + +"Any non-digit +"#1##2#3#4#9 +string_chars_types = replace( val = `a1-b2 3-4c9` pcre = `\D` with = `#` occ = 0 ). + +"Any whitespace character such as a blank, tab and new line +"ab#cd#ef#gh +string_chars_types = replace( val = |ab cd\tef\ngh| pcre = `\s` with = `#` occ = 0 ). + +"Any character that is not a +"String template representation of the result: |## ##\t##\n##| +string_chars_types = replace( val = |ab cd\tef\ngh| pcre = `\S` with = `#` occ = 0 ). + +"Any word character (letter, digit or the underscore) +"(## ####) +string_chars_types = replace( val = `(ab 12_c)` pcre = `\w` with = `#` occ = 0 ). + +"Any character that is not a word character +"#ab#12_c# +string_chars_types = replace( val = `(ab 12_c)` pcre = `\W` with = `#` occ = 0 ). +``` + +

⬆️ back to top

+ +### Escaped Characters and Special Variants + +Note that some of the example strings are represented by string templates. + +| Expression | Represents | Example Regex | Example String | Matches | Does not Match | +|---|---|---|---|---|---| +| `\...` | To include special characters like `[] \ / ^`, use `\` to escape them. Use `\.` to match a period ("."). | `.\.` | ab.cd.ef | a**b.**c**d.**ef | ab**.c**d**.e**f | +| `\n` | Line feed (negation `\N` for non-line feeds) | `\n\D` | \|\nabc\n123\| | \|\na\| | \|\n1\| | +| `\t` | Tab | `\t\d` | \|\tabc\t123\| | \|\t1\| | \|\ta\| | +| `\r` | Carriage return | `\r\s` | \|\rabc\r 123\| | \|\r \| | \|\ra\| | +| `\R` | Line feed sequence, ensuring that regular expression matches all kinds of newlines, such as `\n`, `\r`, or `\r\n` (carriage return followed by line feed) | `\R.` | \|abc\rdef\nghi\r\njkl mno\| | \|\rd\|, \|\ng\|, \|\r\nj\| | \| m\| | +| `\x{...}` | Character with hex code | `\x{00A0}#` (non-breaking space and #) | \|#{ cl_abap_conv_codepage=>create_in( codepage = \`UTF-16BE\` )->convert( source = CONV xstring( \`00A0\` ) ) }#\| | The non-breaking space plus the second # character | The first # character | +| `\N{U+...}` | Character with Unicode code point | `\N{U+00A0}#` (non-breaking space and # as above) | See the example below | See the example below | | +| `\p{..}` | Character with a specified Unicode character property; see the [documentation](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/abenregex_pcre_syntax_specials.html) for options; among them, there are, for example, properties for lower (`Ll`) and uppercase (`Lu`) letters; negation: `\P{..}` | 1. `\p{Ll}+`
2. `\p{Lu}+`
3. `\P{Lu}+` | Hello ABAP | 1. H**ello** ABAP
2. **H**ello **ABAP**
3. H**ello **ABAP (the space is also matched) | 1. Upper case letter sequences
2. Lower case letter sequences
3. Upper case letter sequences | + +Examples: + + +```abap +DATA string_esc_chars TYPE string. + +"Special characters +"a#b#c#d#e#f#g +string_chars_types = replace( val = `a[b]c\d/e^f.g` pcre = `\[|\]|\\|\/|\^|\.` with = `#` occ = 0 ). + +"Line feeds +"a#b#c +string_chars_types = replace( val = |a\nb\nc| pcre = `\n` with = `#` occ = 0 ). + +"Line feed negation +"String template representation of the result: |#\n#\n#| +string_chars_types = replace( val = |a\nb\nc| pcre = `\N` with = `#` occ = 0 ). + +"Tabs +"a#b#c +string_chars_types = replace( val = |a\tb\tc| pcre = `\t` with = `#` occ = 0 ). + +"Carriage return +"d#e#f +string_chars_types = replace( val = |d\re\rf| pcre = `\r` with = `#` occ = 0 ). + +"Characters with hex code +"The example string includes a non-breaking space. +"#x# +string_chars_types = replace( + val = |#{ cl_abap_conv_codepage=>create_in( codepage = `UTF-16BE` )->convert( source = CONV xstring( `00A0` ) ) }#| + pcre = `\x{00A0}` with = `x` occ = 0 ). + +"Characters with Unicode code point +"As above, the example string includes a non-breaking space. +"The PCRE syntax uses the control verb (*UTF) to enable UTF mode. +"#y# +string_chars_types = replace( + val = |#{ cl_abap_conv_codepage=>create_in( codepage = `UTF-16BE` )->convert( source = CONV xstring( `00A0` ) ) }#| + pcre = `(*UTF)\N{U+00A0}` with = `y` occ = 0 ). + +"Characters with a specified Unicode character property +"H# ABAP +string_chars_types = replace( val = `Hello ABAP` pcre = `\p{Ll}+` with = `#` occ = 0 ). + +"#ello # +string_chars_types = replace( val = `Hello ABAP` pcre = `\p{Lu}+` with = `#` occ = 0 ). + +"#ello# +string_chars_types = replace( val = `Hello ABAP` pcre = `\P{Ll}+` with = `#` occ = 0 ). + +"H#ABAP +string_chars_types = replace( val = `Hello ABAP` pcre = `\P{Lu}+` with = `#` occ = 0 ). +``` + +

⬆️ back to top

+ +### Quantifiers, Repetitions and Alternatives + +| Expression | Represents | Example Regex | Example String | Matches | Does not Match | +|---|---|---|---|---|---| +| `x*` | Zero or more repetitions of `x` | `ab*` | abc abbc abbbc a ac | **ab**c **abb**c **abbb**c **a** **a**c | **abc** **abbc** **abbbc** a **ac** | +| `x+` | One or more repetitions of `x` | `ab+` | abc abbc abbbc a ac | **ab**c **abb**c **abbb**c a ac | ... **a** **a**c | +| `x{m,n}` | Between `m` and `n` repetitions of `x` | `ab{2,3}` | abc abbc abbbc a ac | abc **abb**c **abbb**c a ac | **ab**c ... | +| `x{m}` | Exactly `m` repetitions | `ab{3}` | abc abbc abbbc a ac | abc abbc **abbb**c a ac | abc **abb**c ... | +| `x{m,}` | Exactly `m` or more repetitions | `ab{2,}` | abc abbc abbbc a ac | abc **abb**c **abbb**c a ac | **ab**c ... | +| `x?` | Optional `x`, i.e. zero or one time | `ab?` | abc abbc abbbc a ac | **ab**c **ab**bc **ab**bbc **a** **a**c | ... **ac** | +| `x\|y` | Matching alternatives, i. e. `x` or `y` | 1. `b\|2`
2. `b(a\|u)t` | 1. abc 123
2. bit bat but bet | 1. b, 2
2. bat, but | 1. a, c, 1, 3
2. bit, bet | +| `x*?` | `x*` captures greedily, i.e. as much as possible, while `x*?` captures non-greedily, i.e. as few as possible | 1. `bc*?`
2. `a.*?#` | 1. abcd abccccd ab
2. abc#defgh#i | 1. a**b**cd a**b**ccccd a**b**
2. **abc#**defgh#i | 1. a**bc**d a**bcccc**d a**b** (result for `bc*`)
2. **abc#defgh#**i (result for `a.*#`) | +| `x+?` | Same as above: `x+` (greedy), `x+?` (non-greedy) | 1. `bc+?`
2. `<.+?>` | 1. abcd abccccd ab
2. <span>Hallo</span> html. | 1. a**bc**d a**bc**cccd ab
2. **<span>**Hallo**</span>** html. | 1. a**bc**d a**bcccc**d ab (result for `bc+`)
2. **<span>Hallo</span>** html. (result for `<.+>`) | + + +Examples: + + +```abap +DATA string_quan_rep_alt TYPE string. + +"Zero or more repetitions +"#c #c #c # #c +string_quan_rep_alt = replace( val = `abc abbc abbbc a ac` pcre = `ab*` with = `#` occ = 0 ). + +"One or more repetitions +"#c #c #c a ac +string_quan_rep_alt = replace( val = `abc abbc abbbc a ac` pcre = `ab+` with = `#` occ = 0 ). + +"Between x and y repetitions +"abc #c #c a ac +string_quan_rep_alt = replace( val = `abc abbc abbbc a ac` pcre = `ab{2,3}` with = `#` occ = 0 ). + +"Exactly n repetitions +"abc abbc #c a ac +string_quan_rep_alt = replace( val = `abc abbc abbbc a ac` pcre = `ab{3}` with = `#` occ = 0 ). + +"Exactly n or more repetitions +"abc #c #c a ac +string_quan_rep_alt = replace( val = `abc abbc abbbc a ac` pcre = `ab{2,}` with = `#` occ = 0 ). + +"Optional matches +"#c #bc #bbc # #c +string_quan_rep_alt = replace( val = `abc abbc abbbc a ac` pcre = `ab?` with = `#` occ = 0 ). + +"Alternative matches +"#z#y#xdwevfu +string_quan_rep_alt = replace( val = `azbycxdwevfu` pcre = `a|b|c` with = `#` occ = 0 ). + +"Capturing non-greedily (zero or more repetitions) +"a#cd a#ccccd a# +string_quan_rep_alt = replace( val = `abcd abccccd ab` pcre = `bc*?` with = `#` occ = 0 ). + +"_defghxi +string_quan_rep_alt = replace( val = `abcxdefghxi` pcre = `\A.*?x` with = `_` ). + +"Example to compare with greedy capturing +"_i +string_quan_rep_alt = replace( val = `abcxdefghxi` pcre = `\A.*x` with = `_` ). + +"Capturing non-greedily (one or more repetitions) +"a#d a#cccd ab +string_quan_rep_alt = replace( val = `abcd abccccd ab` pcre = `bc+?` with = `#` occ = 0 ). + +"Example to compare with greedy capturing +"a#d a#d ab +string_quan_rep_alt = replace( val = `abcd abccccd ab` pcre = `bc+` with = `#` occ = 0 ). + +"#Hallo# +string_quan_rep_alt = replace( val = `Hallo` pcre = `<.+?>` with = `#` occ = 0 ). +``` + +

⬆️ back to top

+ +### Character Sets and Ranges + +| Expression | Represents | Example Regex | Example String | Matches | Does not Match | +|---|---|---|---|---|---| +| `[...]` | Character set, matches a single character present in the list | `b[iu]` | bit bat but bet | **bi**t bat **bu**t bet | bit **ba**t but **be**t | +| `[x-y]` | Character range, matches a single character in the specified range, note that ranges may be locale-dependent | `a[a-c0-5]` | aa1 ab2 ba3 cac4 da56 a7 |**aa**1 **ab**2 b**a3** c**ac**4 d**a5**6 a7 | aa1 ab2 ba3 cac4 da56 **a7** | +| `[^...]` | Negation, matches any single character not present in the list | `[^Ap]` | ABap | B, a | A, p | +| `[^x-y]` | Negation, matches any single character not within the range | `[^A-Ca-c1-4]` | ABCDabcd123456 | D, d, 5, 6 | A, B, C, a, b, c, 1, 2, 3, 4 | + +Examples: + + +```abap +DATA string_char_sets_ranges TYPE string. + +"Defining a character set +"#t bat #t bet +string_char_sets_ranges = replace( val = `bit bat but bet` pcre = `b[iu]` with = `#` occ = 0 ). + +"Defining a character range +"#1 #2 b# c#4 d#6 a7 +string_char_sets_ranges = replace( val = `aa1 ab2 ba3 cac4 da56 a7` pcre = `a[a-c0-5]` with = `#` occ = 0 ). + +"Matching any single character not present in the list +"A##p +string_char_sets_ranges = replace( val = `ABap` pcre = `[^Ap]` with = `#` occ = 0 ). + +"Matching any single character not within the range +"ABC#abc#1234## +string_char_sets_ranges = replace( val = `ABCDabcd123456` pcre = `[^A-Ca-c1-4]` with = `#` occ = 0 ). +``` + +

⬆️ back to top

+ +### Anchors and Positions + +| Expression | Represents | Example Regex | Example String | Matches | Does not Match | +|---|---|---|---|---|---| +| `\A` | Start of a subject | `\A.` | abc def | **a**bc def | abc **d**ef | +| `^` | Start of a subject (particularly relevant in multi line mode) | `^.` | See above and more details in the option settion section | | | +| `\Z` | End of a subject | `.\Z` | abc def | abc de**f** | **a**bc def | +| `$` | End of a subject (particularly relevant in multi line mode) | `.$` | See above and more details in the option settion section | | | +| `\b` | Start or end of word | 1. `\ba.`
2. `\Dd\b`
3. `\b.d\b` | abcd a12d ed | 1. **ab**cd **a1**2d ed
2. ab**cd** a12d **ed**
3. abcd a12d **ed** | 1. ab**cd** a1**2d** ed
2. abcd a1**2d** ed
3. **abcd** **a12d** ed | +| `\B` | Negation of `\b`, not at the start or end of words | `\Be\B` | see an elefant | s**e**e an el**e**fant | s**ee** an **e**lefant | +| `\K` | Resets the starting point of a match, i.e. findings are excluded from the final match | `a.\Kc` | abcd | ab**c**d | **abc**d | + + +Examples: + + +```abap +DATA string_anchors_pos TYPE string. + +"Start of subject, syntax \A +"#abc def +string_anchors_pos = replace( val = `abc def` pcre = `\A` with = `#` occ = 0 ). + +"Start of subject, syntax ^; find more information below regarding multi line mode +"#abc +"#def +"#ghi +string_anchors_pos = replace( val = |abc\ndef\nghi| pcre = `(?m)^` with = `#` occ = 0 ). + +"The following examples uses ^ without enabling the multi line mode +"#abc +"def +"ghi +string_anchors_pos = replace( val = |abc\ndef\nghi| pcre = `^` with = `#` occ = 0 ). + +"End of subject, syntax \Z +"abc def# +string_anchors_pos = replace( val = `abc def` pcre = `\Z` with = `#` occ = 0 ). + +"End of subject, syntax $ +"The example uses multi line mode +"abc# +"def# +"ghi# +string_anchors_pos = replace( val = |abc\ndef\nghi| pcre = `(?m)$` with = `#` occ = 0 ). + +"Start or end of word +"#cd #2d ed +string_anchors_pos = replace( val = `abcd a12d ed` pcre = `\ba.` with = `#` occ = 0 ). + +"ab# a12d # +string_anchors_pos = replace( val = `abcd a12d ed` pcre = `\Dd\b` with = `#` occ = 0 ). + +"abcd a12d # +string_anchors_pos = replace( val = `abcd a12d ed` pcre = `\b.d\b` with = `#` occ = 0 ). + +"Not at the start or end of words +"s#e an el#fant +string_anchors_pos = replace( val = `see an elefant` pcre = `\Be\B` with = `#` occ = 0 ). + +"Resetting the starting point of a match +"ab#d +string_anchors_pos = replace( val = `abcd` pcre = `a.\Kc` with = `#` occ = 0 ). +``` + +

⬆️ back to top

+ +### Capturing Groups, Replacements and Backreferences + +| Expression | Represents | Example Regex | Example String | Matches | Does not Match | +|---|---|---|---|---|---| +| `(...)` | Capturing group to group parts of patterns together | `b(a\|u)t` | bit bat but bet | bat, but | bit, bet | +| `(?...)`, `(?'name'...)` | Named capturing group | `(?..)(?'y'..)` | abcd | `$x` refers to *ab*, `$y` to *cd*
The reference can also be made using curly brackets: `${x}`, `${y}` | `$x` refers to *cd*, `$y` to *ab* | +| `$id`, `${id}` | Represents the substitute for a capturing group, `id` stands for a number or name of a capturing group | `(..)(..)` | abcd | `$1` refers to *ab*, `$2` to *cd*
`$0` refers to the content of the whole match
As above, the reference can also be made using curly brackets: `${0}`, `${1}`, `${2}` | `$1` refers to *cd*, `$2` to *ab* | +| `(?:...)` | Creates a group but it is not captured | `(?:ab)(ap)` | abap | It matches *abap*, but in a replacements and when referring to the subgroup `$1`, it refers to *ap*. | ab | +| `\1` | Backreference, refers to a previous capturing group; 1 represents the number of the group index (the group index starts with 1); more back reference syntax options are possible such as the specification of named groups | `(a.)(\w*)\1` | abcdefabghij | **abcdefab**ghij
Note: Capturing group 1 holds `ab` in the example. The second capturing group captures all word characters until `ab` is found. | **ab**cdefabghij | + +Examples: + + +```abap +DATA string_capt_group TYPE string. + +"bit # # bet +string_capt_group = replace( val = `bit bat but bet` pcre = `b(a|u)t` with = `#` occ = 0 ). + +"------- Replacements with capturing groups ------- + +"ABAP +string_capt_group = replace( val = `APAB` pcre = `(..)(..)` with = `$2$1` ). + +"$0 representing the content of the whole match +"#abcd# +string_capt_group = replace( val = `abcd` pcre = `(..)(..)` with = `#$0#` ). + +"Alternative replacement syntax with the curly brackets +"cdab +string_capt_group = replace( val = `APAB` pcre = `(..)(..)` with = `${2}${1}` ). + +"------- Named capturing groups ------- + + +"PCRE syntax (?...), replacement syntax $name +"Examplatory result: Day: 28, month: 11, year: 2024 +string_capt_group = replace( val = xco_cp=>sy->date( )->as( xco_cp_time=>format->abap )->value + pcre = `(?\d{4})(?\d{2})(?\d{2})` + with = `Day: $d, month: $mo, year: $yr` ). + + +"Alternative PCRE syntax (?'name'...), replacement syntax ${name} +"Examplatory result: 28.11.2024 +string_capt_group = replace( val = xco_cp=>sy->date( )->as( xco_cp_time=>format->abap )->value + pcre = `(?'yr'\d{4})(?'mo'\d{2})(?'d'\d{2})` + with = `${d}.${mo}.${yr}` ). + +"Creating a group but not capturing it +"#cd# +string_capt_group = replace( val = `abcd` pcre = `(?:..)(..)` with = `#$1#` ). + +"The following example raises an exception. $2 cannot be referred to. +TRY. + string_capt_group = replace( val = `abcd` pcre = `(?:..)(..)` with = `#$1#$2#` ). + CATCH cx_sy_invalid_regex_format. +ENDTRY. + +"#abcd# +string_capt_group = replace( val = `abcd` pcre = `(?:..)(..)` with = `#$0#` ). + +"------- Back reference ------- + +"In the example, the capturing group 1 holds `ab`. The second capturing group captures +"all word characters until `ab` is found, including `ab`. The reference to the first +"capturing group is made using \1. +"#ghij +string_capt_group = replace( val = `abcdefabghij` pcre = `(a.)(\w*)\1` with = `#` ). +``` + +

⬆️ back to top

+ +### Lookarounds + +| Expression | Represents | Example Regex | Example String | Matches | Does not Match | +|---|---|---|---|---|---| +| `(?=...)` | Positive lookahead, returns characters that are followed by a specified pattern without including this pattern | `a(?=b)` | abc ade | **a**bc ade | abc **a**de | +| `(?!...)` | Negative lookahead, returns characters that are not followed by a specified pattern without including this pattern | `a(?!b)` | abc ade | abc **a**de | **a**bc ade | +| `(?<=...)` | Positive lookbehind, returns characters that are preceded by a specified pattern without including this pattern | `(?<=\s)c` | ab c abcd | ab **c** abcd (it is preceded by a blank) | ab c ab**c**d | +| `(?**c**d (it is not preceded by a blank) | ab **c** abcd | + +Examples: + +```abap +DATA string_look_arounds TYPE string. + +"Positive lookahead +"#bc ade +string_look_arounds = replace( val = `abc ade` pcre = `a(?=b)` with = `#` occ = 0 ). + +"Negative lookahead +"abc #de +string_look_arounds = replace( val = `abc ade` pcre = `a(?!b)` with = `#` occ = 0 ). + +"Positive lookbehind +"ab # abcd +string_look_arounds = replace( val = `ab c abcd` pcre = `(?<=\s)c` with = `#` occ = 0 ). + +"Negative lookbehind +"ab c ab#d +string_look_arounds = replace( val = `ab c abcd` pcre = `(?⬆️ back to top

+ +### Case Conversions in Replacement Patterns + +You can use the following syntax options for case conversion in replacement patterns: + +- `\u`: Converts the first character after this marker to uppercase. +- `\U`: Converts all characters to uppercase until the next `\L` or `\E` marker. +- `\l`: Converts the first character after this marker to lowercase. +- `\L`: Converts all characters to lowercase until the next `\U` or `\E` marker. +- `\E`: Ends the current uppercase or lowercase transformation. + + +Examples: + + +```abap +DATA string_case_conv TYPE string. + +"\u syntax +"abcDefg +string_case_conv = replace( val = `abcdefg` pcre = `c(.*)` with = `c\u$1` ). + +"\U syntax +"abcDEFG +string_case_conv = replace( val = `abcdefg` pcre = `c(.*)` with = `c\U$1` ). + +"\l syntax +"HIJKlMNO +string_case_conv = replace( val = `HIJKLMNO` pcre = `K(.*)` with = `K\l$1` ). + +"\L syntax +"HIJKlmno +string_case_conv = replace( val = `HIJKLMNO` pcre = `K(.*)` with = `K\L$1` ). + +"\E syntax + +"abcDEfg +string_case_conv = replace( val = `abcdefg` pcre = `c(..)(..)` with = `c\U$1\E$2` ). + +"The following example is a comparison to similar syntax without specifying \E +"abcDEFG +string_case_conv = replace( val = `abcdefg` pcre = `c(..)(..)` with = `c\U$1$2` ). +``` + +

⬆️ back to top

+ +### Option Settings and Control Verbs + +There are various [setting options](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/abenregex_pcre_syntax_specials.html) for specifying PCRE syntax. The following examples demonstrate a selection: + +*Multi- and single-line modes* +- `(?m)` enables multi-line mode. +- `(?s)` enables single-line mode. +- Note in this context: + - The `.` character does not match new lines unless single-line mode is set. + - The anchors `^` and `$` become especially relevant in multi-line mode. + - These settings apply to the subsequent part of the group in which they are specified, such as the top-level pattern or a capturing group. + + +Examples: + + +```abap +DATA string_opt_set TYPE string. + +"----------- Examples with `.` and single line mode ----------- + +"`.` matches new lines feeds when the single line option is set +"Result (9 replacements): +"### +"### +"### +string_opt_set = replace( val = |abc\ndef\nghi| pcre = `.` with = `#` occ = 0 ). + +"Result (11 replacements including the new line feeds): +"########### +string_opt_set = replace( val = |abc\ndef\nghi| pcre = `(?s).` with = `#` occ = 0 ). + +"## +string_opt_set = replace( val = |\n\n| pcre = `(?s).` with = `#` occ = 0 ). + +"----------- Examples for the start of subjects ----------- + +"Multi line mode not enabled +"Result: +"#bc +"def +"ghi +string_opt_set = replace( val = |abc\ndef\nghi| pcre = `^.` with = `#` occ = 0 ). + +"Multi line mode enabled +"Result: +"#bc +"#ef +"#hi +string_opt_set = replace( val = |abc\ndef\nghi| pcre = `(?m)^.` with = `#` occ = 0 ). + +"\A for comparison +"Result: +"#bc +"def +"ghi +string_opt_set = replace( val = |abc\ndef\nghi| pcre = `(?m)\A.` with = `#` occ = 0 ). + +"----------- Examples for the end of subjects ----------- + +"Result: +"abc +"def +"gh# +string_opt_set = replace( val = |abc\ndef\nghi| pcre = `.$` with = `#` occ = 0 ). + +"Result: +"ab# +"de# +"gh# +string_opt_set = replace( val = |abc\ndef\nghi| pcre = `(?m).$` with = `#` occ = 0 ). + +"\Z syntax (the result is the same as with `.$`) +string_opt_set = replace( val = |abc\ndef\nghi| pcre = `(?m).\Z` with = `#` occ = 0 ). +``` + +*Extended Mode* + +- Regular expressions with PCRE can be processed in normal or extended mode. +- In extended mode, unescaped whitespaces in the pattern are ignored. +- The extended mode is the default for ABAP statements and built-in functions, but you can disable it in the regex pattern with `(?-x)`. The `create_pcre` method of the `CL_ABAP_REGEX` class includes an `extended` parameter for this purpose. +- To handle whitespaces in regular expressions, escape it with `\ `, match it using `\s`, or enable extended mode with `(?-x)`. + +The following code snippet shows extended mode enabled by default and then explicitly disabled in a PCRE regular expression. + + +```abap +DATA some_string TYPE string. + +"Extended mode is enabled by default +"No replacement in the example. +"Result: abc def +some_string = replace( val = `abc def` pcre = `abc def` with = `#` ). + +"The following example works because there is no whitespace in the source string. +"The whitespace in the regular expression is ignored. +"Result: # +some_string = replace( val = `abcdef` pcre = `abc def` with = `#` ). + +"To match whitespace characters, you can use the pattern \s +"Result: # +some_string = replace( val = `abc def` pcre = `abc\sdef` with = `#` ). + +"Escaping a whitespace using \ +"Result: # +some_string = replace( val = `abc def` pcre = `abc\ def` with = `#` ). +"The following example also performs the replacement as the unescaped whitespaces +"are ignored. +some_string = replace( val = `abc def` pcre = `abc\ def` with = `#` ). + +"Disabling the extended mode, so whitespaces are not ignored +"Result: # +some_string = replace( val = `abc def` pcre = `(?-x)abc def` with = `#` ). +``` + +*Control verbs* + +Control verbs, for example, for ... + +- enabling UTF mode using `(*UTF)`. In PCRE syntax, you can define UTF-16 character strings. +- line breaks such as the following: + - `(*CR)`: Only carriage returns + - `(*LF)` Only line feed + - `(*CRLF)`: Only carriage returns followed by a line feed + - `(*ANYCRLF)`: All of the above-mentioned options + + +```abap +DATA ctrl_verb_string TYPE string. + +"Using (*UTF); the source string contains a non-breakable space that is to be replaced +"Result: #x# +ctrl_verb_string = replace( + val = |#{ cl_abap_conv_codepage=>create_in( codepage = `UTF-16BE` )->convert( source = CONV xstring( `00A0` ) ) }#| + pcre = `(*UTF)\N{U+00A0}` with = `x` occ = 0 ). + +"Line breaks +"The results are demonstrated using string templates. +"In the examples, the multi line mode is enabled in addition. + +"|_abc\ndef\r_ghi\r_\njkl| +ctrl_verb_string = replace( val = |abc\ndef\rghi\r\njkl| + pcre = `(*CR)(?m)^` with = `_` occ = 0 ). + +ctrl_verb_string = replace( val = |abc\ndef\rghi\r\njkl| + pcre = `(*LF)(?m)^` with = `_` occ = 0 ). + +"|_abc\n_def\rghi\r\n_jkl| +ctrl_verb_string = replace( val = |abc\ndef\rghi\r\njkl| + pcre = `(*CRLF)(?m)^` with = `_` occ = 0 ). + +"|_abc\n_def\r_ghi\r\n_jkl| +ctrl_verb_string = replace( val = |abc\ndef\rghi\r\njkl| + pcre = `(*ANYCRLF)(?m)^` with = `_` occ = 0 ). +``` + + +

⬆️ back to top

+ +### Callouts + +- Callouts invoke ABAP code during pattern matching. +- They access the current matcher state for further processing and can influence regex processing. +- In ABAP, object-oriented representations of PCRE regular expressions can use callouts. +- Relevant PCRE syntax includes `(?Cn)` (*n* is a number) and `(?C"text")`. + +The following example shows how to use callouts to call an ABAP method from a PCRE regular expression. It creates an object-oriented representation of a PCRE regex using the `CL_ABAP_REGEX` class, applying different PCRE syntaxes for callout specifications. The class implements the `IF_ABAP_MATCHER_CALLOUT` interface, and the `callout` method uses a demo class instance as the callout handler. If the regex matches, the method is called for each callout position. The `callout` method populates a string table with accessible details. This demonstrates that regex processing can be influenced, and in the example, processing stops when a condition is met. As a result, `found` is false because the regex is not fully processed. For more details, refer to the class documentation. + +To try the example out, create a demo class named `zcl_some_class` and paste the code into it. After activation, choose *F9* in ADT to execute the class. The example is set up to display output in the console. + +```abap +CLASS zcl_some_class DEFINITION + PUBLIC + FINAL + CREATE PUBLIC . + + PUBLIC SECTION. + INTERFACES if_oo_adt_classrun. + INTERFACES if_abap_matcher_callout. + PROTECTED SECTION. + PRIVATE SECTION. + CLASS-DATA callout TYPE i. + CLASS-DATA callout_tab TYPE string_table. +ENDCLASS. + + + +CLASS zcl_some_class IMPLEMENTATION. + + METHOD if_oo_adt_classrun~main. + DATA(text_to_search) = `abcdefghijklmnopq`. + DATA(regex) = cl_abap_regex=>create_pcre( pattern = `(...)(?C1)(..)(?C2)(....)(?C3)(.)(?C"D")(....)(?C"E")(...)(?C"F")` ). + DATA(matcher) = regex->create_matcher( text = text_to_search ). + DATA(handler) = NEW zcl_some_class( ). + matcher->set_callout( handler ). + DATA(found) = matcher->match( ). + + IF found = abap_false. + out->write( |Pattern not found.\n\n| ). + ENDIF. + + out->write( callout_tab ). + + ENDMETHOD. + + METHOD if_abap_matcher_callout~callout. + DATA(ts) = utclong_current( ). + callout += 1. + + IF callout_string = `F`. + callout_result = if_abap_matcher_callout=>c_callout_result-abort. + ELSE. + callout_result = if_abap_matcher_callout=>c_callout_result-pass. + + APPEND INITIAL LINE TO callout_tab. + APPEND |----------------- Callout { callout } -----------------| TO callout_tab. + APPEND INITIAL LINE TO callout_tab. + APPEND |callout_num: { callout_num }| TO callout_tab. + APPEND |callout_string: { callout_string }| TO callout_tab. + APPEND |regex: { regex }| TO callout_tab. + APPEND |subject: { subject }| TO callout_tab. + APPEND |current_subject_pos: { current_subject_pos }| TO callout_tab. + APPEND |current_pattern_pos: { current_pattern_pos }| TO callout_tab. + APPEND |capture_last: { capture_last }| TO callout_tab. + APPEND |capture_last_len: { capture_last_len }| TO callout_tab. + APPEND |capture_last_off: { capture_last_off }| TO callout_tab. + APPEND |start_match_off: { start_match_off }| TO callout_tab. + + TRY. + APPEND |Content of match: { subject+capture_last_off(capture_last_len) }| TO callout_tab. + CATCH cx_sy_range_out_of_bounds. + ENDTRY. + APPEND |This callout was called at { ts }| TO callout_tab. + ENDIF. + + ENDMETHOD. + +ENDCLASS. +``` + +

⬆️ back to top

+ +### Coniditional Patterns + +- Used to check whether certain capture groups are matched +- When matched or not matched, you can specify replacement patterns; you can also specify replacement patterns with conditions +- Relevant PCRE syntax for searches: `(?(condition)yes-pattern)` or `(?(condition)yes-pattern|no-pattern)` +- The `condition` can, for example, be the number or name of a capturing group. +- See the code snippet for replacement patterns. + +```abap +DATA string_cond_pattern TYPE string. + +"------- Conditions in the search pattern ------- + +"Example pattern: +"- The first capturing group matches a digit. +"- The '?' character specifies that the capturing group participates +" in the match optionally. +"- A condition is specified, introduced by '(?...' and referring to +" the first capturing group (the digit) indicated by 1. +"- If the condition is met, i.e. a digit is found, then the yes pattern +" is applied ('.pdf'), which participates in the match. +"- If the condition is not met, i.e. a digit is not found, then the no +" pattern is applied ('.md'), which participates in the match. +"- As a result, '2.pdf' and the two '.md' occurrences (also the one +" without the digit) are replaced. + +"1.txt,a.txt,#,b.pdf,3.gif,c.gif,4#,d#,5.jpg,e.jpg +string_cond_pattern = replace( val = `1.txt,a.txt,2.pdf,b.pdf,3.gif,c.gif,4.md,d.md,5.jpg,e.jpg` + pcre = `(\d)?(?(1)\.pdf|\.md)` with = `#` occ = 0 ). + +"------- Conditions in the replacement pattern ------- + +"Search pattern: +"- The '?' character specifies that the capturing group participates in the match optionally. +"Replacements pattern: +"- '$' followed by the specification of the capturing group that is referred to in curly brackets +"- The capturing group number is followed by a ':' and '+...' that denotes the value if true, +" the value following the other colon denotes the value if false + +"This is very good. +string_cond_pattern = replace( val = `correct` pcre = `(in)?correct` with = `This is ${1:+not:very} good.` ). + +"This is not good. +string_cond_pattern = replace( val = `incorrect` pcre = `(in)?correct` with = `This is ${1:+not:very} good.` ). + +"Syntax {n:-default} +"Shortcut for the PCRE replacement pattern by specifying +"a default if the capture group does not participate in the match + +"abcd +"'ab' participates in the match, so the default replacement pattern is not +"applied +string_cond_pattern = replace( val = `abcd` pcre = `(ab)?cd` with = `${1:-yz}cd` ). + +"yzcd +"'ab' does not participate in the match, so the default replacement pattern is +"applied +string_cond_pattern = replace( val = `cd` pcre = `(ab)?cd` with = `${1:-yz}cd` ). +``` + +## ABAP Statements Using Regular Expressions + +The following example demonstrates a selection of syntax options with `FIND` and `REPLACE`. There are multiple options as there is a rich variety of additions. For more details, refer to the [String Processing](07_String_Processing.md) cheat sheet. + +```abap +"----------------------------------------------------------------------------- +"------------------------------- FIND ---------------------------------------- +"----------------------------------------------------------------------------- + +DATA(str) = `Cathy's black cat on the mat played with Matt.`. + +"----- ALL OCCURRENCES addition: Finding all occurrences ----- +"The MATCH COUNT addition stores the number of occurrences in a data object. + +"Determining the number of letters in a string +"a: 36 +FIND ALL OCCURRENCES OF PCRE `[A-Za-z]` IN str MATCH COUNT DATA(a). + +"----- RESULTS addition: Finding all occurrences ----- +"The example also uses the ALL OCCURRENCES addition. The findings are +"stored in a table of type match_result_tab. +FIND ALL OCCURRENCES OF PCRE `\s` IN str RESULTS DATA(b). + +*LINE OFFSET LENGTH SUBMATCHES +*0 7 1 OFFSET LENGTH +* +*0 13 1 OFFSET LENGTH +* +*0 17 1 OFFSET LENGTH +* +*0 20 1 OFFSET LENGTH +* +*0 24 1 OFFSET LENGTH +* +*0 28 1 OFFSET LENGTH +* +*0 35 1 OFFSET LENGTH +* +*0 40 1 OFFSET LENGTH + +"----- SUBMATCHES addition: Storing capturing group content in variables ----- +"Pattern: anything before and after ' on ' +FIND PCRE `(.*)\son\s(.*)` IN str IGNORING CASE SUBMATCHES DATA(c) DATA(d). +"a: 'Cathy's black cat' / b: 'the mat played with Matt.'. + +"Using the RESULTS addition to get offset and length information of +"all capturing groups, among others +FIND PCRE `(.*)\son\s(.*)` IN str IGNORING CASE RESULTS DATA(e). + +*LINE OFFSET LENGTH SUBMATCHES +*0 0 46 OFFSET LENGTH +* 0 17 +* 21 25 + +"----- MATCH OFFSET/LENGTH additions ----- +"The following examples find the last occurence of a comma and the following content. +"\K denotes that the comma is excluded from the result. Using the offset and length +"values, the part of the string is extracted. The examples underscore that you can +"achieve the same thing differently with different PCRE syntax patterns (which also +"applies to ABAP demonstrated with the use of the substring function and specifying +"the syntax string+off(len) to extract the substring). + +str = `abc,def,ghi,jkl,mno,pqr,stu,vwx,yz`. + +"Negative lookahead +FIND PCRE `,(?!.*,)\K.*` IN str MATCH OFFSET DATA(off_a) MATCH LENGTH DATA(len_a). +"yz +DATA(content_a) = substring( val = str off = off_a len = len_a ). + +"Positive lookahead +FIND PCRE `,(?=[^,]*$)\K.*` IN str MATCH OFFSET DATA(off_b) MATCH LENGTH DATA(len_b). +DATA(content_b) = str+off_b(len_b). + +ASSERT content_a = `yz`. +ASSERT content_b = content_a. + +"----- IN TABLE addition: Searching in internal tables ----- +"The internal table must have a character-like line type. + +"Searching in an internal table and retrieving line, offset, length information +"The example only searches for the first occurrence. See another example below +"that uses the ALL OCCURRENCES and RESULTS additions. +DATA(itab) = VALUE string_table( ( `Cathy's black cat on the mat played with the friend of Matt.` ) ). +"Pattern: 't' at the beginning of a word followed by another character +"d: 1, e: 21, f: 2 +FIND FIRST OCCURRENCE OF PCRE `\bt.` IN TABLE itab + IGNORING CASE MATCH LINE DATA(f) MATCH OFFSET DATA(g) MATCH LENGTH DATA(h). + +"----- Searching in internal tables ----- +"The RESULTS addition stores findings in an internal table of type match_result_tab when +"ALL OCCURRENCES is used. +"Submatches (i.e. length and offset values of the submatches) are stored in internal +"tables themselves. Therefore, the example uses nested loops and the substring function +"to retrieve the strings. + +"The objective of the following example is to extract the content of the segments that +"are positioned within /.../ in a URL. The segments are stored in an internal table. +DATA(url) = `https://help.sap.com/docs/abap-cloud/abap-concepts/controlled-sap-luw/`. +DATA url_parts TYPE string_table. + +FIND ALL OCCURRENCES OF PCRE `(?<=/)([^/]+)(?=/)` IN url RESULTS DATA(res). + +"Details on the regular expression: +"- Positive lookbehind (?<=/) that determines that the content is preceded by `/` +"- Positive lookahead (?=/) that determines that the content is followed by `/ +"- ([^/]+) in between determines that any sequence of characters that are not `/` are matched +"- The match is put in parentheses to store the submatch + +LOOP AT res INTO DATA(finding). + LOOP AT finding-submatches INTO DATA(sub). + DATA(url_part) = substring( val = url off = sub-offset len = sub-length ). + APPEND url_part TO url_parts. + ENDLOOP. +ENDLOOP. + +"The following statement uses nested iteration expressions with FOR instead of nested +"LOOP statements. +DATA(url_parts_for_loop) = VALUE string_table( FOR wa1 IN res + FOR wa2 IN wa1-submatches + ( substring( val = url off = wa2-offset len = wa2-length ) ) ). + +ASSERT url_parts = url_parts_for_loop. +*Content: +*help.sap.com +*docs +*abap-cloud +*abap-concepts +*controlled-sap-luw + +"----------------------------------------------------------------------------- +"------------------------------- REPLACE ------------------------------------- +"----------------------------------------------------------------------------- + +DATA(str_replace) = `ab apppc app`. +DATA(str_replace_copy) = str_replace. + +"Changing the source field directly with a REPLACE statement; same as above +"pc app#ab a +REPLACE PCRE `(.*?)PP(.*)` IN str_replace WITH `$2#$1` IGNORING CASE. + +str_replace = str_replace_copy. + +"ALL OCCURRENCES addition +"ab#apppc#app +REPLACE ALL OCCURRENCES OF PCRE `\s` IN str_replace WITH `#`. + +str_replace = str_replace_copy. + +"ab a## a# +REPLACE ALL OCCURRENCES OF PCRE `p.` IN str_replace WITH `#` + REPLACEMENT COUNT DATA(repl_cnt) "3 + RESULTS DATA(repl_res). +"repl_res: +"LINE OFFSET LENGTH +"0 4 1 +"0 5 1 +"0 8 1 + +DATA(str_table_original) = VALUE string_table( ( `a1bc2` ) ( `d3ef` ) ( `4ghi` ) ( `jkl` ) ). +DATA(str_table) = str_table_original. + +"Replacing all occurrences in a table +"RESULTS addition: Storing information in an internal table of type repl_result_tab +REPLACE ALL OCCURRENCES OF PCRE `\d` + IN TABLE str_table + WITH `#` + RESULTS DATA(res_table). + +"str_table: a#bc# / d#ef / #ghi / jkl +"res_table: +"LINE OFFSET LENGTH +"1 1 1 +"1 4 1 +"2 1 1 +"3 0 1 + +str_table = str_table_original. + +"Replacing the first occurrence in a table +"RESULTS addition: Storing information in a structure of type repl_result +REPLACE FIRST OCCURRENCE OF PCRE `\d` + IN TABLE str_table + WITH `#` + RESULTS DATA(res_structure). + +"str_table: a#bc2 / d3ef / 4ghi / jkl +"res_structure: +"LINE OFFSET LENGTH +"1 1 1 + +str_table = str_table_original. + +"Restricting the search range in an internal table +REPLACE ALL OCCURRENCES OF PCRE `\d` + IN TABLE str_table + FROM 1 TO 2 + WITH `#`. + +"str_table: a#bc# / d#ef / 4ghi / jkl + +str_table = str_table_original. + +"Offsets can be optionally specified (also only the offset of start or end line possible) +REPLACE ALL OCCURRENCES OF PCRE `\d` + IN TABLE str_table + FROM 1 OFFSET 3 TO 2 OFFSET 2 + WITH `#`. + +"str_table: a1bc# / d#ef / 4ghi / jkl +``` + +

⬆️ back to top

+ + +## Built-In Functions in ABAP Using Regular Expressions + +Examples: + + +```abap +DATA(text) = `Pieces of cakes.`. + +"---------------- find ---------------- +"The find function searches for the subtexting specified and returns the offset + +"15 +DATA(find) = find( val = text pcre = `\.` ). + +"---------------- find_end ---------------- +"find_end returns the sum of the offset of the occurrence plus the length of the match + +"7 (6 + 1) +DATA(find_end) = find_end( val = text pcre = `\s` ). + +"---------------- count ---------------- + +"2 +DATA(count_a) = count( val = text pcre = `\s` ). + +"16 +DATA(count_b) = count( val = text pcre = `.` ). + +"---------------- match ---------------- + +"jon.doe@email.com +DATA(match_a) = match( val = `The email address is jon.doe@email.com.` + pcre = `\w+(\.\w+)*@(\w+\.)+(\w{2,4})` ). + +"Find blank (without inlcuding it in the result indicated by \K) and +"the following 2 characters, second occurrence +"ad +DATA(match_b) = match( val = `The email address is jon.doe@email.com.` + pcre = `\s\K..` + occ = 2 ). + +"---------------- replace ---------------- +"Pieces#of cakes. +DATA(replace_a) = replace( val = text pcre = `\s` with = `#` ). + +"Pieces of#cakes. +DATA(replace_b) = replace( val = text pcre = `\s` occ = 2 with = `#` ). + +"---------------- substring_* ---------------- + +text = `Lorem ipsum dolor sit amet`. + +"Extracting a substring ... +"... after a matching regular expression + +"dolor sit amet +DATA(substring_after) = substring_after( val = text pcre = `\s` occ = 2 ). + + +"... before a matching regular expression +"Lorem ipsum +DATA(substring_before) = substring_before( val = text pcre = `\s` occ = 2 ). + +"... from a matching regular expression on including the match +"' dolor sit amet' +DATA(substring_from) = substring_from( val = text pcre = `\s` occ = 2 ). + +"... up to a matching regular expression including the match +"'Lorem ipsum ' +DATA(substring_to) = substring_to( val = text pcre = `\s` occ = 2 ). +``` + +

⬆️ back to top

+ + +## Built-In Functions in ABAP SQL and CDS Using Regular Expressions + +The following example demonstrates [built-in functions using regular expressions in ABAP SQL](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/ABENSQL_FUNCTIONS_STRING.html). +For more information on the built-in functions for ABAP CDS, refer to the [ABAP Keyword Documentation](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenddic_builtin_functions.htm). The example assumes that the ABAP cheat sheet repository has been imported into the system as it uses some of its repository objects. + +```abap +"Populating demo database tables +zcl_demo_abap_aux=>fill_dbtabs( ). + +SELECT SINGLE + carrid, "LH + carrname, "Lufthansa + url, "http://www.lufthansa.com + + "Checks if expression contains a PCRE expression; + "case-sensitive by default (case_sensitive parameter can be specified) + "Notes on the 1 = found, 0 = not found + "1 + like_regexpr( pcre = '\..', "Period that is followed by any character + value = url ) AS like_regex, + + "Searches a PCRE pattern, returns offset of match; + "many optional parameters: occurrence, case_sensitive, start, group + "21 + locate_regexpr( pcre = '\..', "Period followed by any character + value = url, + occurrence = 2 ) "2nd occurrence in the string + AS locate_regexpr, + + "Searches a PCRE pattern, returns offset of match + 1; + "many optional parameters: occurrence, case_sensitive, start, group + "2 + locate_regexpr_after( pcre = '.', "Any character + value = url, + occurrence = 1 ) AS locate_regexpr_after, + + "Counts all occurrences of found PCRE patterns + "2 + occurrences_regexpr( pcre = '\..', "Period that is followed by any character + value = url ) AS occ_regex, + + "Replaces a found PCRE expression; + "more parameters possible: occurrence, case_sensitive, start + "http://www#ufthansa#om + replace_regexpr( pcre = '\..', "Period that is followed by any character + value = url, + with = '#' ) AS replace_regex, + + "Searches for a PCRE expression and returns the matched substring + "More parameters possible: occurrence, case_sensitive, start, group + ".lu + substring_regexpr( pcre = '\...', "Period that is followed by any two characters + value = url ) AS substring_regexpr + + FROM zdemo_abap_carr + WHERE carrid = 'LH' + INTO @DATA(builtin_func_regex). +``` + +

⬆️ back to top

+ +## System Classes for Regular Expressions: CL_ABAP_REGEX and CL_ABAP_MATCHER + +- `CL_ABAP_REGEX` + - Creates an object-oriented representation of regular expressions. + - The `create_pcre` method creates instances of regular expressions with PCRE syntax. + - Instances of the class can be used with the ... + - `CL_ABAP_MATCHER` class to create a matcher, which applies regular expressions or + - `REGEX` addition (not `PCRE`) in `FIND` and `REPLACE` statements. + - A variety of methods and parameters can be specified to accomplish various things and to further specify the handling of the regular expression. + +- `CL_ABAP_MATCHER` + - Applies regular expressions to strings or internal tables. + - Offers methods to search, replace, or match regular expressions. + - Instances of the class can be created with ... + - methods of the class itself or + - the `create_matcher` method of the `CL_ABAP_REGEX` class. + - You get access to the processing state of the regular expression using `get...` methods. + - Supports callouts as outlined in [this section](#callouts), requiring the `IF_ABAP_MATCHER_CALLOUT` interface implementation. + - For reusing regular expressions, use `CL_ABAP_MATCHER` instances created with `CL_ABAP_REGEX`. + +- Find more information in the class documentation (press F2 on the class in ADT). + + +Examples: + + +``` abap +"Example string +DATA(str) = `a1 # B2 ? cd . E3`. + +"----------- Creating an instance of a regular expression ----------- + +"Creating an instance of a regular expression with PCRE syntax +"using cl_abap_regex + +"Example pattern: Any-non digit followed by a digit +DATA(regex_1) = cl_abap_regex=>create_pcre( pattern = `\D\d` + ignore_case = abap_true ). + +"----------- Creating matchers ----------- + +"Two ways are possible (both ways return references of type +"ref to cl_abap_matcher): +"- create_matcher method of the cl_abap_regex class +"- create_pcre method of the cl_abap_matcher class +"Note that several importing parameters are available to enable +"further settings of the regular expression, e.g. ignoring the +"case, using the extended mode, etc. The example pass a string +"to the 'text' parameter. You can also specify internal tables +"with the 'table' parameter and more. + +"Creating a matcher using the create_matcher method of the cl_abap_regex class +DATA(matcher_1) = regex_1->create_matcher( text = str ). +"Creating a matcher in one go using method chaining +DATA(matcher_2) = cl_abap_regex=>create_pcre( pattern = `\D\d` + ignore_case = abap_true + )->create_matcher( text = str ). + +"Creating a matcher using the create_pcre method of the cl_abap_matcher class +DATA(matcher_3) = cl_abap_matcher=>create_pcre( pattern = `\D\d` + text = str + ignore_case = abap_true ). + +"----------- Exploring searching and replacing ----------- + +"--- Finding all occurrences using the find_all method --- +"In the example, result has the type match_result_tab containing the findings. +DATA(result_1) = matcher_1->find_all( ). + +*LINE OFFSET LENGTH SUBMATCHES +*0 0 2 OFFSET LENGTH +* +*0 5 2 OFFSET LENGTH +* +*0 15 2 OFFSET LENGTH + +DATA(result_2) = matcher_2->find_all( ). +ASSERT result_2 = result_1. + +"Getting the result in one go using method chaining with cl_abap_matcher +DATA(result_3) = cl_abap_matcher=>create_pcre( pattern = `\D\d` + text = str + ignore_case = abap_true + )->find_all( ). +ASSERT result_3 = result_1. + +"--- Example with subgroups --- + +str = `XabcdXXefgXXhXXijklmnXX`. + +DATA(result_4) = cl_abap_matcher=>create_pcre( pattern = `X(.*?)X` + text = str + ignore_case = abap_true + )->find_all( ). + +*LINE OFFSET LENGTH SUBMATCHES +*0 0 6 OFFSET LENGTH +* 1 4 +*0 6 5 OFFSET LENGTH +* 7 3 +*0 11 3 OFFSET LENGTH +* 12 1 +*0 14 8 OFFSET LENGTH +* 15 6 + +"--- Replacing all occurrences using the 'replace_all' method --- + +DATA(matcher_4) = cl_abap_regex=>create_pcre( pattern = `X(.*?)X` + )->create_matcher( text = str ). + +"4 +DATA(repl_count_1) = matcher_4->replace_all( newtext = `#` ). + +"####X +DATA(repl_result_1) = matcher_4->text. + +"Using cl_abap_matcher +DATA(matcher_5) = cl_abap_matcher=>create_pcre( pattern = `X(.*?)X` + text = str ). +DATA(repl_count_2) = matcher_5->replace_all( newtext = `#` ). +DATA(repl_result_3) = matcher_5->text. + +"---- Sequential processing of the regular expression --- +"---- using the find_next method ------------------------ +"The example explores various other methods, and writes +"information to a string table. + +str = `a1bc2def3ghij45klm67opqr8stuvwx90yz`. + +DATA(matcher_6) = cl_abap_matcher=>create_pcre( pattern = `\d(\D.)` + text = str ). + +DATA strtab TYPE string_table. +WHILE matcher_6->find_next( ) = abap_true. + APPEND |---- Finding { sy-index } -----| TO strtab. + + "Type match_result + DATA(match_result) = matcher_6->get_match( ). + + DATA(offset) = matcher_6->get_offset( ). + DATA(length) = matcher_6->get_length( ). + DATA(matched_content) = str+offset(length). + + APPEND |Match offset: { offset }| TO strtab. + APPEND |Match length: { length }| TO strtab. + APPEND |Match content: { matched_content }| TO strtab. + + "Type match_result + DATA(subgroup) = matcher_6->get_match( )-submatches. + + LOOP AT subgroup INTO DATA(wa). + DATA(sub_tabix) = sy-tabix. + DATA(submatch_line) = wa. + DATA(submatch_offset) = wa-offset. + DATA(submatch_length) = wa-length. + DATA(submatch) = matcher_6->get_submatch( sub_tabix ). + APPEND |Submatch { sub_tabix } offset: { submatch_offset }| TO strtab. + APPEND |Submatch { sub_tabix } length: { submatch_length }| TO strtab. + APPEND |Submatch { sub_tabix } content: { submatch }| TO strtab. + ENDLOOP. + +ENDWHILE. + +"---- Using an object of type cl_abap_regex in ABAP --- +"---- statements with the REGEX addition -------------- + +DATA(result_5) = cl_abap_matcher=>create_pcre( pattern = `\d(\D.)` + text = str + )->find_all( ). +DATA(result_6) = cl_abap_regex=>create_pcre( pattern = `\d(\D.)` + )->create_matcher( text = str + )->find_all( ). + +DATA(regex_2) = cl_abap_regex=>create_pcre( pattern = `\d(\D.)` ). + +FIND ALL OCCURRENCES OF REGEX regex_2 IN str RESULTS DATA(result_7). + +*LINE OFFSET LENGTH SUBMATCHES +*0 1 3 OFFSET LENGTH +* 2 2 +*0 4 3 OFFSET LENGTH +* 5 2 +*0 8 3 OFFSET LENGTH +* 9 2 +*0 14 3 OFFSET LENGTH +* 15 2 +*0 19 3 OFFSET LENGTH +* 20 2 +*0 24 3 OFFSET LENGTH +* 25 2 +*0 32 3 OFFSET LENGTH +* 33 2 + +ASSERT result_7 = result_5. +ASSERT result_7 = result_6. + +"Note that the REGEX addition is obsolete when using (POSIX) syntax patterns +"A syntax warning is displayed for the following example. +"FIND ALL OCCURRENCES OF REGEX `\d(\D.)` IN str RESULTS DATA(result_8). + +"The syntax warning can be suppressed using a pragma +FIND ALL OCCURRENCES OF REGEX `\d(\D.)` IN str RESULTS DATA(result_9) ##REGEX_POSIX. + +"Using PCRE instead +FIND ALL OCCURRENCES OF PCRE `\d(\D.)` IN str RESULTS DATA(result_10). +ASSERT result_10 = result_7. +``` + +

⬆️ back to top

\ No newline at end of file diff --git a/README.md b/README.md index 097906a..ed4fb9b 100644 --- a/README.md +++ b/README.md @@ -95,6 +95,7 @@ ABAP cheat sheets[^1] ... |[Authorization Checks](25_Authorization_Checks.md)|Provides a high-level overview of explicit and implicit authorization checks in ABAP|- (The cheat sheet includes a copy and paste example class)| |[ABAP Dictionary](26_ABAP_Dictionary.md)|Covers a selection of repository objects in the ABAP Dictionary (DDIC) that represent global types|- (The cheat sheet includes a copy and paste example class)| |[Exceptions and Runtime Errors](27_Exceptions.md)|Provides an overview on exceptions and runtime errors|- (The cheat sheet includes a copy and paste example class)| +|[Regular Expressions in ABAP](28_Regular_Expressions.md)|Includes an overview of common regular expressions and their use in ABAP through statements, built-in functions, and system classes|- (The cheat sheet includes copy and paste sample code snippets)|