danrega
2024-11-29 16:28:28 +01:00
parent 5085810a52
commit 23d6c43d52
3 changed files with 1386 additions and 361 deletions


@@ -34,11 +34,6 @@
- [Pattern-Based Searching and Replacing in Strings](#pattern-based-searching-and-replacing-in-strings)
- [Simple Pattern-Based Searching Using Comparison Operators](#simple-pattern-based-searching-using-comparison-operators)
- [Complex Searching and Replacing Using Regular Expressions](#complex-searching-and-replacing-using-regular-expressions)
- [Excursion: Common Regular Expressions](#excursion-common-regular-expressions)
- [Searching Using Regular Expressions](#searching-using-regular-expressions)
- [System Classes for Regular Expressions](#system-classes-for-regular-expressions)
- [Replacing Using Regular Expressions](#replacing-using-regular-expressions)
- [Overview/Examples: Using PCRE Regular Expressions in Various Contexts](#overviewexamples-using-pcre-regular-expressions-in-various-contexts)
- [More String Functions](#more-string-functions)
- [Checking the Similarity of Strings](#checking-the-similarity-of-strings)
- [Repeating Strings](#repeating-strings)
@@ -1687,279 +1682,17 @@ IF s1 NP `i+`. ... "true; sy-fdpos = 11 (length of searched string)
### Complex Searching and Replacing Using Regular Expressions
#### Excursion: Common Regular Expressions
There are several ways to perform complex searches in strings using PCRE regular expressions, and the expressions themselves can become quite complex. The following overview shows common PCRE expressions with simple examples. It is not comprehensive. For more details, see [here](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenregex_pcre_syntax_specials.htm).
Characters and character types
| Expression | Represents | Example Regex | Example String | Matches | Does not Match |
|---|---|---|---|---|---|
| `x` | Specific character | `a` | abcdef | a | Anything else |
| `.` | Anything except a line break | `.` | ab 1# | a, b, the blank, 1, # | ab, 1# |
| `\d` | Any digit (0-9), alternative: `[0-9]` | `\d` | a1-b2 3-4c9 | 1, 2, 3, 4, 9 | a, b, c, the blank and hyphens |
| `\D` | Any non-digit, alternative: `[^0-9]` | `\D` | a1-b2 3-4c9 | a, b, c, the blank and hyphens | 1, 2, 3, 4, 9 |
| `\s` | Any whitespace character such as a blank, tab and new line | `\s` | (hi X ) | The blanks | h, i, X, (, ) |
| `\S` | Any character that is not a whitespace | `\S` | (hi X ) | h, i, X, (, ) | The blanks |
| `\w` | Any word character (letter, digit or the underscore), alternative: `[a-zA-Z0-9_]` | `\w` | (ab 12_c) | a, b, c, 1, 2, _ | (, ), the blank |
| `\W` | Any character that is not a word character, alternative: `[^a-zA-Z0-9_]` | `\W` | (ab 12_c) | (, ), the blank | a, b, c, 1, 2, _ |
| `\` | To include special characters like `[] \ / ^`, use `\` to escape them. Use `\.` to match a period ("."). | `.\.` | ab.cd.ef | a<ins>**b.**</ins>c<ins>**d.**</ins>ef | ab<ins>**.c**</ins>d<ins>**.e**</ins>f |
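
As a quick check of the character types above, the following minimal sketch counts matches with the built-in `count` function and its `pcre` parameter. The demo strings are taken from the table; the variable names are illustrative only.

```abap
"Demo string from the table
DATA(demo) = `a1-b2 3-4c9`.
"\d: digits 1, 2, 3, 4, 9
DATA(digit_cnt) = count( val = demo pcre = `\d` ). "5
"\D: a, b, c, the blank and both hyphens
DATA(non_digit_cnt) = count( val = demo pcre = `\D` ). "6
"\s: the single blank
DATA(blank_cnt) = count( val = demo pcre = `\s` ). "1
"\w: letters, digits and the underscore in '(ab 12_c)'
DATA(word_cnt) = count( val = `(ab 12_c)` pcre = `\w` ). "6
"\.: any character followed by a literal period ('b.' and 'd.')
DATA(period_cnt) = count( val = `ab.cd.ef` pcre = `.\.` ). "2
```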
Repetitions and Alternatives
| Expression | Represents | Example Regex | Example String | Matches | Does not Match |
|---|---|---|---|---|---|
| `x*` | Zero or more repetitions of `x` | `ab*` | abc abbc abbbc a ac | <ins>**ab**</ins>c <ins>**abb**</ins>c <ins>**abbb**</ins>c <ins>**a**</ins> <ins>**a**</ins>c | <ins>**abc**</ins> <ins>**abbc**</ins> <ins>**abbbc**</ins> a <ins>**ac**</ins> |
| `x+` | One or more repetitions of `x` | `ab+` | abc abbc abbbc a ac | <ins>**ab**</ins>c <ins>**abb**</ins>c <ins>**abbb**</ins>c a ac | ... <ins>**a**</ins> <ins>**a**</ins>c |
| `x{m,n}` | Between `m` and `n` repetitions of `x` | `ab{2,3}` | abc abbc abbbc a ac | abc <ins>**abb**</ins>c <ins>**abbb**</ins>c a ac | <ins>**ab**</ins>c ... |
| `x{m}` | Exactly `m` repetitions | `ab{3}` | abc abbc abbbc a ac | abc abbc <ins>**abbb**</ins>c a ac | abc <ins>**abb**</ins>c ... |
| `x{m,}` | At least `m` repetitions | `ab{2,}` | abc abbc abbbc a ac | abc <ins>**abb**</ins>c <ins>**abbb**</ins>c a ac | <ins>**ab**</ins>c ... |
| `x?` | Optional `x`, i.e. zero or one time | `ab?` | abc abbc abbbc a ac | <ins>**ab**</ins>c <ins>**ab**</ins>bc <ins>**ab**</ins>bbc <ins>**a**</ins> <ins>**a**</ins>c | ... <ins>**ac**</ins> |
| `x\|y` | Matching alternatives, i.e. `x` or `y` | 1. `b\|2` <br> 2. `b(a\|u)t` | 1. abc 123 <br> 2. bit bat but bet | 1. b, 2 <br> 2. bat, but | 1. a, c, 1, 3 <br> 2. bit, bet |
| `x*?` | `x*` matches greedily, i.e. as many characters as possible, while `x*?` matches non-greedily, i.e. as few as possible | 1. `bc*?` <br> 2. `a.*?#` | 1. abcd abccccd ab<br> 2. abc#defgh#i | 1. a<ins>**b**</ins>cd a<ins>**b**</ins>ccccd a<ins>**b**</ins><br> 2. <ins>**abc#**</ins>defgh#i | 1. a<ins>**bc**</ins>d a<ins>**bcccc**</ins>d a<ins>**b**</ins> (result for `bc*`) <br> 2. <ins>**abc#defgh#**</ins>i (result for `a.*#`) |
| `x+?` | Same as above: `x+` (greedy), `x+?` (non-greedy) | 1. `bc+?` <br> 2. `<.+?>` | 1. abcd abccccd ab<br> 2. &lt;span>Hallo&lt;/span> html. | 1. a<ins>**bc**</ins>d a<ins>**bc**</ins>cccd ab<br> 2. <ins>**&lt;span>**</ins>Hallo<ins>**&lt;/span>**</ins> html. | 1. a<ins>**bc**</ins>d a<ins>**bcccc**</ins>d ab (result for `bc+`) <br> 2. <ins>**&lt;span>Hallo&lt;/span>**</ins> html. (result for `<.+>`) |
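
The difference between the repetition operators, and between greedy and non-greedy matching, can be observed with `count` and `match`. The sketch below reuses strings from the table; `match` returns the first substring that matches the expression. Variable names are illustrative.

```abap
DATA(reps) = `abc abbc abbbc a ac`.
"ab*: ab, abb, abbb, a, a
DATA(star_cnt) = count( val = reps pcre = `ab*` ). "5
"ab+: ab, abb, abbb
DATA(plus_cnt) = count( val = reps pcre = `ab+` ). "3
"ab{2,3}: the first match is 'abb' in 'abbc'
DATA(bounded) = match( val = reps pcre = `ab{2,3}` ). "abb
"Greedy: .* extends up to the last '#'
DATA(greedy) = match( val = `abc#defgh#i` pcre = `a.*#` ). "abc#defgh#
"Non-greedy: .*? stops at the first '#'
DATA(non_greedy) = match( val = `abc#defgh#i` pcre = `a.*?#` ). "abc#
```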
Character Sets, Ranges, Subgroups and Lookarounds
| Expression | Represents | Example Regex | Example String | Matches | Does not Match |
|---|---|---|---|---|---|
| `[xy]` | Character set, matches a single character present in the list | `b[iu]` | bit bat but bet | <ins>**bi**</ins>t bat <ins>**bu**</ins>t bet | bit <ins>**ba**</ins>t but <ins>**be**</ins>t |
| `[x-y]` | Character range, matches a single character in the specified range, note that ranges may be locale-dependent | `a[a-c0-5]` | aa1 ab2 ba3 cac4 da56 a7 |<ins>**aa**</ins>1 <ins>**ab**</ins>2 b<ins>**a3**</ins> c<ins>**ac**</ins>4 d<ins>**a5**</ins>6 a7 | aa1 ab2 ba3 cac4 da56 <ins>**a7**</ins> |
| `[^xy]` | Negation, matches any single character not present in the list | `[^Ap]` | ABap | B, a | A, p |
| `[^x-y]` | Negation, matches any single character not within the range | `[^A-Ca-c1-4]` | ABCDabcd123456 | D, d, 5, 6 | A, B, C, a, b, c, 1, 2, 3, 4 |
| `(...)` | Capturing group to group parts of patterns together | `b(a\|u)t` | bit bat but bet | bat, but | bit, bet |
| `(?=...)` | Positive lookahead, returns characters that are followed by a specified pattern without including this pattern | `a(?=b)` | abc ade | <ins>**a**</ins>bc ade | abc <ins>**a**</ins>de |
| `(?!...)` | Negative lookahead, returns characters that are not followed by a specified pattern without including this pattern | `a(?!b)` | abc ade | abc <ins>**a**</ins>de | <ins>**a**</ins>bc ade |
| `(?<=...)` | Positive lookbehind, returns characters that are preceded by a specified pattern without including this pattern | `(?<=\s)c` | ab c abcd | ab <ins>**c**</ins> abcd (it is preceded by a blank) | ab c ab<ins>**c**</ins>d |
| `(?<!...)` | Negative lookbehind, returns characters that are not preceded by a specified pattern without including this pattern | `(?<!\s)c` | ab c abcd | ab c ab<ins>**c**</ins>d (it is not preceded by a blank) | ab <ins>**c**</ins> abcd |
| `\1` | Backreference, refers to a previous capturing group; 1 represents the number of the group index (the group index starts with 1) | `(a.)(\w*)\1` | abcdefabghij | <ins>**abcdefab**</ins>ghij <br>Note: Capturing group 1 holds `ab` in the example. The second capturing group captures all word characters until `ab` is found. | <ins>**ab**</ins>cdefabghij |
| `\K` | Resets the starting point of a match, i.e. the characters matched before `\K` are excluded from the final match | `a.\Kc` | abcd | ab<ins>**c**</ins>d | <ins>**abc**</ins>d |
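
Lookarounds, `\K`, and backreferences affect which part of the text is reported as the match. In the following minimal sketch, `find` returns the offset of the first match and `match` returns the matched substring; the strings come from the table, and the variable names are illustrative.

```abap
DATA(look) = `abc ade`.
"Positive lookahead: the 'a' followed by 'b' is at offset 0
DATA(off_ahead) = find( val = look pcre = `a(?=b)` ). "0
"Negative lookahead: the 'a' not followed by 'b' is at offset 4
DATA(off_not_ahead) = find( val = look pcre = `a(?!b)` ). "4
"Positive lookbehind: the 'c' preceded by a blank is at offset 3
DATA(off_behind) = find( val = `ab c abcd` pcre = `(?<=\s)c` ). "3
"\K: 'a' plus any character must precede 'c', but only 'c' is part of the match
DATA(k_match) = match( val = `abcd` pcre = `a.\Kc` ). "c
"Backreference: \1 repeats the text captured by group 1 ('ab')
DATA(backref) = match( val = `abcdefabghij` pcre = `(a.)(\w*)\1` ). "abcdefab
```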
> **💡 Note**<br>
> - Subgroups are useful in replacements. By using an expression with `$` and a number, such as `$1`, you can refer to a specific group. For example, consider the string `abcde` and the PCRE expression `(ab|xy)c(d.)`, which specifies two subgroups within two pairs of parentheses. In a replacement pattern, you can refer to the first group with `$1` and the second group with `$2`. Thus, the replacement pattern `$2Z$1` results in `deZab`.
> - `(?:x)` creates a group that is not captured (a non-capturing group). Example regular expression: `(?:ab)(ap)`. Example string: `abap`. The expression matches `abap`, but `$1` only contains `ap`.
> - Regarding special characters, check the [Special Characters](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenregex_pcre_syntax_specials.htm) topic in the ABAP Keyword Documentation. Consider, for example, the non-breaking space, whose code point is *U+00A0*. You can replace all non-breaking space occurrences in a string as follows:
> ```abap
> REPLACE ALL OCCURRENCES OF PCRE `\x{00A0}` IN some_string WITH ``.
> "Alternative
> REPLACE ALL OCCURRENCES OF PCRE `(*UTF)\N{U+00A0}` IN some_string WITH ``.
> ```
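
The subgroup replacements described in the note can be reproduced with the `replace` function. This minimal sketch uses the note's expressions; the second input string `xycd!` is an additional, illustrative input that takes the `xy` alternative.

```abap
"$1 = 'ab', $2 = 'de'; the subgroups are swapped around a literal 'Z'
DATA(swapped) = replace( val = `abcde` pcre = `(ab|xy)c(d.)` with = `$2Z$1` ). "deZab
"Same pattern, matching the alternative 'xy': $1 = 'xy', $2 = 'd!'
DATA(swapped_xy) = replace( val = `xycd!` pcre = `(ab|xy)c(d.)` with = `$2Z$1` ). "d!Zxy
"Non-capturing group: (?:ab) is matched but not numbered, so $1 refers to 'ap'
DATA(non_capt) = replace( val = `abap` pcre = `(?:ab)(ap)` with = `$1` ). "ap
```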
Anchors and Positions
| Expression | Represents | Example Regex | Example String | Matches | Does not Match |
|---|---|---|---|---|---|
| `^` | Start of line | `^.` | abc def | <ins>**a**</ins>bc def | abc <ins>**d**</ins>ef |
| `$` | End of line | `.$` | abc def | abc de<ins>**f**</ins> | <ins>**a**</ins>bc def |
| `\b` | Start or end of word | 1. `\ba.` <br>2. `\Dd\b` <br>3. `\b.d\b` | abcd a12d ed | 1. <ins>**ab**</ins>cd <ins>**a1**</ins>2d ed <br>2. ab<ins>**cd**</ins> a12d <ins>**ed**</ins> <br> 3. abcd a12d <ins>**ed**</ins> | 1. ab<ins>**cd**</ins> a1<ins>**2d**</ins> ed <br> 2. abcd a1<ins>**2d**</ins> ed <br> 3. <ins>**abcd**</ins> <ins>**a12d**</ins> ed |
| `\B` | Negation of `\b`, not at the start or end of words | `\Be\B` | see an elephant | s<ins>**e**</ins>e an el<ins>**e**</ins>phant | s<ins>**ee**</ins> an <ins>**e**</ins>lephant |
> **💡 Note**<br>
> Find more information [here](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenregex_pcre_syntax_specials.htm). Note that there are further anchors available such as `\A` and `\Z` for the start and end of strings.
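
The anchors can be checked in the same way. The following sketch uses the table's example strings (with `see an elephant` as the `\B` input); the variable names are illustrative.

```abap
DATA(words) = `abcd a12d ed`.
"\Dd\b: a non-digit plus 'd' at the end of a word ('cd' and 'ed')
DATA(end_cnt) = count( val = words pcre = `\Dd\b` ). "2
"\b.d\b: a two-character word ending in 'd'
DATA(two_char) = match( val = words pcre = `\b.d\b` ). "ed
"^ and $: first and last character of the string
DATA(first_char) = match( val = `abc def` pcre = `^.` ). "a
DATA(last_char) = match( val = `abc def` pcre = `.$` ). "f
"\B: 'e' characters that are neither at the start nor at the end of a word
DATA(inner_e) = count( val = `see an elephant` pcre = `\Be\B` ). "2
```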
<p align="right"><a href="#top">⬆️ back to top</a></p>
#### Searching Using Regular Expressions
- Multiple string functions support PCRE expressions by offering the
`pcre` parameter, which you can use to specify such an expression.
`FIND` and `REPLACE` statements support regular
expressions with the `PCRE` addition.
- The string function
[`match`](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenmatch_functions.htm)
works only with regular expressions. It returns a substring that
matches a regular expression within a string.
- For comparisons, you can
also use the [predicate
function](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenpredicate_function_glosry.htm "Glossary Entry")
[`matches`](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenmatches_functions.htm), which returns true if a string matches a given pattern and false otherwise.
Syntax examples:
``` abap
DATA(s1) = `Cathy's black cat on the mat played with Matt.`.
"Determining the position of the first occurrence
"Here, the parameter occ is 1 by default.
DATA(int) = find( val = s1 pcre = `at.` ). "1
"Determining the number of all occurrences.
"Respects all 'a' characters not followed by 't', all 'at' plus 'att'
int = count( val = s1 pcre = `at*` ). "6
"Respects all 'at' plus 'att'
int = count( val = s1 pcre = `at+` ). "4
"Extracting a substring matching a given pattern
DATA(s2) = match( val = `The email address is jon.doe@email.com.`
pcre = `\w+(\.\w+)*@(\w+\.)+(\w{2,4})` ). "jon.doe@email.com
"Predicate function matches
"Checking the validity of an email address
IF matches( val = `jon.doe@email.com`
pcre = `\w+(\.\w+)*@(\w+\.)+(\w{2,4})` ). "true
...
ENDIF.
"Examples with the FIND statement
"SUBMATCHES addition: Storing submatches in variables
"Pattern: anything before and after ' on '
FIND PCRE `(.*)\son\s(.*)` IN s1 IGNORING CASE SUBMATCHES DATA(a) DATA(b).
"a: 'Cathy's black cat' / b: 'the mat played with Matt.'
"Determining the number of letters in a string
FIND ALL OCCURRENCES OF PCRE `[A-Za-z]` IN s1 MATCH COUNT DATA(c). "36
"Searching in an internal table and retrieving line, offset, length information
DATA(itab) = VALUE string_table( ( `Cathy's black cat on the mat played with the friend of Matt.` ) ).
"Pattern: 't' at the beginning of a word followed by another character
FIND FIRST OCCURRENCE OF PCRE `\bt.` IN TABLE itab
IGNORING CASE MATCH LINE DATA(d) MATCH OFFSET DATA(e) MATCH LENGTH DATA(f). "d: 1, e: 21, f: 2
"The objective of the following example is to extract the content of the segments that
"are positioned within /.../ in a URL. The segments are stored in an internal table.
DATA(url) = `https://help.sap.com/docs/abap-cloud/abap-concepts/controlled-sap-luw/`.
DATA url_parts TYPE string_table.
FIND ALL OCCURRENCES OF PCRE `(?<=/)([^/]+)(?=/)` IN url RESULTS DATA(res).
"Details on the regular expression:
"- Positive lookbehind (?<=/) that determines that the content is preceded by `/`
"- Positive lookahead (?=/) that determines that the content is followed by `/`
"- ([^/]+) in between determines that any sequence of characters that are not `/` is matched
"- The match is put in parentheses to store the submatch
"The RESULTS addition stores findings in an internal table of type match_result_tab.
"Submatches (i.e. length and offset values of the submatches) are stored in internal
"tables themselves. Therefore, the example uses nested loops and the substring function
"to retrieve the strings.
LOOP AT res INTO DATA(finding).
LOOP AT finding-submatches INTO DATA(sub).
DATA(url_part) = substring( val = url off = sub-offset len = sub-length ).
APPEND url_part TO url_parts.
ENDLOOP.
ENDLOOP.
"The following statement uses nested iteration expressions with FOR instead of nested
"LOOP statements.
DATA(url_parts_for_loop) = VALUE string_table( FOR wa1 IN res
FOR wa2 IN wa1-submatches
( substring( val = url off = wa2-offset len = wa2-length ) ) ).
ASSERT url_parts = url_parts_for_loop.
*Content:
*help.sap.com
*docs
*abap-cloud
*abap-concepts
*controlled-sap-luw
```
<p align="right"><a href="#top">⬆️ back to top</a></p>
#### System Classes for Regular Expressions
- You can create an object-oriented representation of regular expressions using the `CL_ABAP_REGEX` system class.
- For example, the `CREATE_PCRE` method creates instances of regular expressions with PCRE syntax.
- The instances can be used, for example, with the `CL_ABAP_MATCHER` class, which applies the regular expressions.
- A variety of methods and parameters are available to further specify the handling of the regular expression, for example, whether case is ignored.
- More information can be found [here](https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/index.htm?file=abenregex_system_classes.htm) and in the class documentation (choose F2 on the class in ADT).
``` abap
DATA(str) = `a1 # B2 ? cd . E3`.
"Creating a regex instance for PCRE regular expressions
"In the example, regex_inst has the type ref to cl_abap_regex.
DATA(regex_inst) = cl_abap_regex=>create_pcre( pattern = `\D\d` "any non-digit followed by a digit
ignore_case = abap_true ).
"Creating an instance of CL_ABAP_MATCHER using the method CREATE_MATCHER of the class CL_ABAP_REGEX
"You can also specify internal tables with the 'table' parameter and more.
DATA(matcher) = regex_inst->create_matcher( text = str ).
"Finding all results using the 'find_all' method
"In the example, result has the type match_result_tab containing the findings.
DATA(result) = matcher->find_all( ).
"Using method chaining
DATA(res) = cl_abap_regex=>create_pcre( pattern = `\s\w` "any blank followed by any word character
ignore_case = abap_true )->create_matcher( text = str )->find_all( ).
```
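
Instead of collecting all findings at once with `find_all`, a matcher can also be advanced one match at a time. The following sketch assumes the `CL_ABAP_MATCHER` methods `find_next` (returns `abap_true` while a further match is found) and `get_submatch` (returns the content of a capturing group); check the class documentation for the exact signatures and exceptions. The variable names are illustrative.

```abap
DATA(demo_text) = `a1 # B2 ? cd . E3`.
"Two capturing groups: a non-digit followed by a digit
DATA(pair_matcher) = cl_abap_regex=>create_pcre( pattern = `(\D)(\d)` )->create_matcher( text = demo_text ).
DATA letters TYPE string_table.
"Each call of find_next moves on to the next match
WHILE pair_matcher->find_next( ) = abap_true.
  "Content of capturing group 1, i.e. the non-digit character
  DATA(letter) = pair_matcher->get_submatch( 1 ).
  APPEND letter TO letters.
ENDWHILE.
"letters: a, B, E
```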
<p align="right"><a href="#top">⬆️ back to top</a></p>
#### Replacing Using Regular Expressions
- To perform replacement operations using regular expressions, you can use both
the string function `replace` and `REPLACE` statements with the `pcre` parameter or the `PCRE` addition.
- Like the `find` function (among others) and `FIND` statements, the `replace` function and `REPLACE` statements offer a number of parameters and additions that you can use to further restrict the area to be replaced.
- For more detailed information, refer to the ABAP
Keyword Documentation.
- The executable example covers many of the PCRE expressions listed above.
Syntax examples:
``` abap
DATA(s1) = `ab apppc app`.
DATA s2 TYPE string.
"Replaces 2 - 4 repetitions of 'p', all occurrences
s2 = replace( val = s1 pcre = `p{2,4}` with = `#` occ = 0 ). "ab a#c a#
"Replaces any single character not present in the list, all occurrences
s2 = replace( val = s1 pcre = `[^ac]` with = `#` occ = 0 ). "a##a###c#a##
"Replaces the first occurrence of a blank
s2 = replace( val = s1 pcre = `\s` with = `#` ). "ab#apppc app
"Greedy search
"The pattern matches any characters followed by 'p'. The quantifier '*' matches
"as many characters as possible. Hence, in this example the match stretches
"to the end of the string since 'p' is the final character, i.e. this 'p' and
"everything before it is replaced.
s2 = replace( val = s1 pcre = `.*p` with = `#` ). "#
"Non-greedy search
"The pattern matches any characters followed by 'p'. The quantifier '*?'
"matches as few characters as possible, i.e. the match ends at the first 'p'
"found and does not go beyond. Hence, the first 'p' including the content
"before it is replaced.
s2 = replace( val = s1 pcre = `.*?p` with = `#` ). "#ppc app
"Replacements with subgroups
"Replaces 'pp' (case-insensitive here) with '#', the content before and after 'pp' is switched
s2 = replace( val = s1
pcre = `(.*?)PP(.*)`
with = `$2#$1`
case = abap_false ). "pc app#ab a
"Changing the source field directly with a REPLACE statement; same as above
REPLACE PCRE `(.*?)PP(.*)` IN s1 WITH `$2#$1` IGNORING CASE. "pc app#ab a
"ALL OCCURRENCES addition
REPLACE ALL OCCURRENCES OF PCRE `\s` IN s1 WITH `?`. "pc?app#ab?a
REPLACE ALL OCCURRENCES OF PCRE `p.` IN s1 WITH `XY` "XY?aXY#ab?a
REPLACEMENT COUNT DATA(repl_cnt) "2
RESULTS DATA(repl_res).
"repl_res:
"LINE OFFSET LENGTH
"0 0 2
"0 4 2
```
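
The `occ` parameter and the replacement pattern offer further control. The following minimal sketch reuses the demo string from above; it assumes that negative `occ` values count occurrences from the right and that `$0` stands for the entire match in ABAP replacement patterns. Variable names are illustrative.

```abap
DATA(repl_src) = `ab apppc app`.
"occ = 2: only the second run of 'p' characters is replaced
DATA(second_repl) = replace( val = repl_src pcre = `p+` with = `#` occ = 2 ). "ab apppc a#
"Negative occ values count from the right; -1 replaces the last 'a'
DATA(last_repl) = replace( val = repl_src pcre = `a` with = `#` occ = -1 ). "ab apppc #pp
"$0 stands for the entire match; occ = 0 replaces all occurrences
DATA(marked) = replace( val = repl_src pcre = `p+` with = `[$0]` occ = 0 ). "ab a[ppp]c a[pp]
```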
<p align="right"><a href="#top">⬆️ back to top</a></p>
#### Overview/Examples: Using PCRE Regular Expressions in Various Contexts
- As also covered in other sections, regular expressions, such as PCRE regular expressions, can be used in multiple contexts. Among them are:
  - Statements (`PCRE` addition): `FIND`, `REPLACE`
  - Classes: `CL_ABAP_REGEX`, `CL_ABAP_MATCHER`
  - Built-in functions having the `pcre` parameter: `find`, `find_end`, `count`, `match`, `replace`, `substring_from`, `substring_after`, `substring_before`, `substring_to`
  - Note: Built-in string functions dealing with regular expressions are also available for ABAP SQL and CDS. See the [Misc Built-In Functions](24_Misc_Builtin_Functions.md) cheat sheet.
- For more information, refer to the [Regular Expressions in ABAP](28_Regular_Expressions.md) cheat sheet.
  - It covers the following topics:
    - An excursion covering common regular expressions (the focus is on PCRE)
    - Regular expressions used in ABAP in the following contexts:
      - `FIND` and `REPLACE` statements (with the `PCRE` addition)
      - Built-in functions in ABAP with the `pcre` parameter, such as `find`, `find_end`, `count`, `match`, `replace`, `substring_from`, `substring_after`, `substring_before`, `substring_to`
      - Built-in functions in ABAP SQL and CDS (e.g. `like_regexpr`, `locate_regexpr`, `locate_regexpr_after`, `occurrences_regexpr`, `replace_regexpr`, `substring_regexpr` in ABAP SQL)
      - `CL_ABAP_REGEX` and `CL_ABAP_MATCHER` classes
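
Several of the built-in functions listed above return substrings or offsets relative to the first match of the `pcre` expression. The following minimal sketch uses an illustrative path-like string; the variable names are hypothetical.

```abap
DATA(path) = `docs/abap-cloud/abap-concepts`.
"Substring before and after the first '/'
DATA(sub_before) = substring_before( val = path pcre = `/` ). "docs
DATA(sub_after) = substring_after( val = path pcre = `/` ). "abap-cloud/abap-concepts
"substring_from starts with the match, substring_to ends with it
DATA(sub_from) = substring_from( val = path pcre = `abap-\w+` ). "abap-cloud/abap-concepts
DATA(sub_to) = substring_to( val = path pcre = `abap-\w+` ). "docs/abap-cloud
"find_end returns the offset directly after the first match ('abap-cloud' ends before offset 15)
DATA(end_off) = find_end( val = path pcre = `abap-\w+` ). "15
```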
Examples:
```abap
DATA(some_string) = `aa bb cc dd ee`.
DATA(original_string) = some_string.
@@ -2080,90 +1813,6 @@ REPLACE ALL OCCURRENCES OF PCRE `(<p>)(.*?)(<\/p>)` IN html_a WITH `$1Hi$3`.
"Regular expression: any character or a new line with zero or more repetitions
REPLACE ALL OCCURRENCES OF PCRE `(<p>)(.|\n)*?(<\/p>)` IN html_b WITH `$1Hi$3`.
*<p>Hi</p><p>Hi</p><p>Hi</p>
"The following strangely formatted demo HTML code is to be processed. It
"intentionally and randomly includes new lines and returns. Some HTML tags
"have attributes. Suppose you want to proceed with the following tasks:
"- Getting the content between the table tags
"- Putting the content in one line by removing all new lines and returns
"- Removing all attributes so that only the tags are available
DATA(nl) = |\n|.
DATA(rt) = |\r|.
DATA(html) =
`<html lang="EN">` && nl &&
`<body>` && nl &&
` <table border="1" summary="data display" ` && rt &&
`title="ABAP ` && nl &&
`Data">` &&
` <tr class="header">` && rt &&
` <th>CARRID` && nl &&
`</th>` && nl &&
` <th>CARRNAME</th>` &&
` </tr>` && nl &&
` <tr class="body">` && nl &&
` <td>` &&
` <span ` && nl &&
`class="nprpnwrp">XY</span>` &&
` </td>` && rt &&
` <td>` &&
` <span class="nprpnwrp"` && rt &&
`>XY Airlines</span>` && nl &&
` </td>` && nl &&
` </tr>` && nl &&
` <tr class="body">` && nl &&
` <td>` && rt &&
` <span class="nprpnwrp">YZ</span>` && rt &&
` </td>` && nl &&
` <td>` && nl &&
` <span class=` && rt &&
`"nprpnwrp">YZ Airways</span>` && nl &&
` </td>` && nl &&
` </tr>` && nl &&
` </table>` && rt &&
`</body>` && nl &&
`</html>`.
"See the following examples of how to proceed. Other regular expressions may be chosen, too.
"Getting the content/all tags between the table tags
"The regular expression considers the attributes of the table tag that could be anything.
"A non-greedy search up to the next '>' would not work with just '.' in this case because
"of the new lines and returns. So, a regular expression to retrieve the content might be
"as follows:
"[\s\S]*?
"- Matches a single character specified within [...]
"- In this case, it is either \s or \S (i.e. any whitespace or non-whitespace character),
"  so any character including new lines and returns is matched
"- Within [...], '|' does not mean 'or' but represents a literal '|', so it is not needed
"- '*?' represents non-greedy search (zero or multiple occurrences) up to the next '>' in
"  the example
"- The same regular expression is used between the <table ...> ... </table> tags to capture
"  anything.
"- It is put within a pair of parentheses to capture a group. This group (i.e. the offset
"  and length values for the finding) is available in the data object following RESULTS
"  in the FIND statement. As only one group is available in the regular expression,
"  the 'submatches' table contains one line.
"- Note the escaping of '/' as '\/' in </table>
"- The examples use FIND/REPLACE statements. You can also use string functions.
FIND PCRE `<table[\s\S]*?>([\s\S]*?)<\/table>` IN html RESULTS DATA(res).
DATA(content_bw_table_tag) = substring( val = html off = res-submatches[ 1 ]-offset len = res-submatches[ 1 ]-length ).
"Removing all new lines and returns
REPLACE ALL OCCURRENCES OF PCRE `\n|\r` IN content_bw_table_tag WITH ``.
"Removing all whitespace characters between start and end HTML tags
REPLACE ALL OCCURRENCES OF PCRE `>(\s*?)<` IN content_bw_table_tag WITH `><`.
"Removing all whitespace characters in front of the first HTML tag, and behind
"the last tag
REPLACE PCRE `^\s*?<` IN content_bw_table_tag WITH `<`.
REPLACE PCRE `>\s*$` IN content_bw_table_tag WITH `>`.
"Removing all attributes within tags
"Regex: Non-greedy searches up to the next whitespace character, and from that
"whitespace character up to the next '>'. Using $n, you refer to the capturing
"group. In this case, the content of the first capturing group is inserted,
"enclosed by '<...>'.
REPLACE ALL OCCURRENCES OF PCRE `<(.*?)\s(.*?)>` IN content_bw_table_tag WITH `<$1>`.
"The span tags are no longer needed in the example since the attributes have been removed
REPLACE ALL OCCURRENCES OF `<span>` IN content_bw_table_tag WITH ``.
REPLACE ALL OCCURRENCES OF `</span>` IN content_bw_table_tag WITH ``.
```
<p align="right"><a href="#top">⬆️ back to top</a></p>

1375
28_Regular_Expressions.md Normal file

File diff suppressed because it is too large


@@ -95,6 +95,7 @@ ABAP cheat sheets[^1] ...
|[Authorization Checks](25_Authorization_Checks.md)|Provides a high-level overview of explicit and implicit authorization checks in ABAP|- (The cheat sheet includes a copy and paste example class)|
|[ABAP Dictionary](26_ABAP_Dictionary.md)|Covers a selection of repository objects in the ABAP Dictionary (DDIC) that represent global types|- (The cheat sheet includes a copy and paste example class)|
|[Exceptions and Runtime Errors](27_Exceptions.md)|Provides an overview on exceptions and runtime errors|- (The cheat sheet includes a copy and paste example class)|
|[Regular Expressions in ABAP](28_Regular_Expressions.md)|Includes an overview of common regular expressions and their use in ABAP through statements, built-in functions, and system classes|- (The cheat sheet includes copy and paste sample code snippets)|
<br>