regex pattern matching unicode characters for file renaming
I'm trying to understand which regex features are supported in the file renaming syntax, specifically how to match the full range of Unicode letters with `\p{L}`. The only way I've found to match letters is `[a-zA-Z0-9_]`. Standard shorthand classes like `\w` or `\S` don't seem to work, and neither does the one I actually need, `\p{Letter}` or `\p{L}`. Any help on how I could make this work? I did include the `u` flag for Unicode support in `regexOpts`, but to no avail.
The context is this problem: I'm trying to construct a regex pattern that extracts all initials of given names. The following pattern works for names in Latin script that don't have any diacritics:
{{ creators max="1" name="given" replaceFrom="(^| )([a-zA-Z0-9_])([a-zA-Z0-9_.])*" replaceTo="$2" regexOpts="gu" }}
But it fails in all other cases.
I thought I could use `\p{L}` to include all Unicode characters defined as letters, but it doesn't work:
{{ creators max="1" name="given" replaceFrom="(^| )(\p{L})(\p{L}|.)*" replaceTo="$2" regexOpts="gu" }}
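For comparison, here is a minimal sketch of what I would expect the two patterns to do in a plain JavaScript/TypeScript regex engine. It assumes (and this is only my assumption) that the template hands `replaceFrom` and `regexOpts` to something like `new RegExp(...)` and `replaceTo` to a string replace. The helper name `applyReplace` is made up for illustration and is not part of the renaming syntax. I've written the dot as `\.` in the Unicode pattern, because outside a character class an unescaped `.` matches any character.

```typescript
// Hypothetical stand-in for what the template might do internally (an assumption,
// not the actual implementation): build a RegExp from the two strings and replace.
function applyReplace(
  input: string,
  replaceFrom: string,
  replaceTo: string,
  regexOpts: string
): string {
  const re = new RegExp(replaceFrom, regexOpts);
  return input.replace(re, replaceTo);
}

// ASCII-only pattern: the one that works for unaccented Latin names.
const ASCII_PATTERN = "(^| )([a-zA-Z0-9_])([a-zA-Z0-9_.])*";

// Unicode-aware pattern: \p{L} needs the "u" flag, and the dot is escaped.
const UNICODE_PATTERN = "(^| )(\\p{L})(\\p{L}|\\.)*";

// The ASCII pattern stops at the first accented letter and leaves the rest behind.
console.log(applyReplace("Jos\u00e9 Mar\u00eda", ASCII_PATTERN, "$2", "gu")); // "JéMía"

// With \p{L} every letter is consumed, so only the initials remain.
console.log(applyReplace("Jos\u00e9 Mar\u00eda", UNICODE_PATTERN, "$2", "gu")); // "JM"
console.log(applyReplace("Анна Мария", UNICODE_PATTERN, "$2", "gu")); // "АМ"
console.log(applyReplace("John R.", UNICODE_PATTERN, "$2", "gu")); // "JR"
```

So in a standalone engine the `\p{L}` version does what I need, which makes me think the problem is in how the template parses or passes on the pattern rather than in the regex itself.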
Any help understanding the regex options (or suggestions for alternative ways to accomplish what I need) much appreciated.
EDIT:
FWIW, I've also tried explicit Unicode ranges inside a character class, but to no avail:
[\u0041-\u007A\u00C0-\u00FF\u0100-\u017F\u0400-\u04FF\u0370-\u03FF]