
Regex Unicode and Non-Unicode

Posted: Sun Feb 19, 2017 9:18 am
by Debugger
It does not work in Everything:

Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
or
\p{Han} for CJK ideographs

Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])

Re: Regex Unicode and Non-Unicode

Posted: Sun Feb 19, 2017 11:45 pm
by void
Everything uses Perl Compatible Regular Expressions.

Please try:

\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]

Unicode is supported.
\p{...} is not supported.

Re: Regex Unicode and Non-Unicode

Posted: Wed Mar 01, 2017 5:08 pm
by Debugger
void wrote:Please try:

\b (?>) Matches a word boundary (the start or end of a word).

Regex enabled:
\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]
It still does not work:
0 objects!

Re: Regex Unicode and Non-Unicode

Posted: Thu Mar 02, 2017 8:09 am
by void
regex:"\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]" is working correctly here.

\b = word boundary (the start or end of a word)
\x0D = carriage return
\x0A = line feed (new line)
| = OR (all text before this is one search, all text after this another search)
[] = a set (matches any single character listed inside)
\x{85} = next line (alternate new line)
\x{2028} = line separator
\x{2029} = paragraph separator

Combining them all together you get:
(a carriage return followed by a newline, starting at a word boundary) OR (a single character in the range \x0A to \x0D, which covers newline, vertical tab, form feed and carriage return, or an alternate newline, Unicode separator 2028 or Unicode separator 2029)
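The breakdown above can be checked outside Everything with a rough Python sketch. Python's re module is not PCRE and has no \x{...} syntax, so the three Unicode code points are written as \x85, \u2028 and \u2029 here; the helper name has_line_break and the sample names are made up for illustration, and Everything's own matching may differ in details.

```python
import re

# Same structure as the Everything pattern, translated to Python's escape syntax:
# (CR LF at a word boundary) OR (one character from the set).
LINE_BREAK = re.compile(r"\b\x0D\x0A|[\x0A-\x0D\x85\u2028\u2029]")


def has_line_break(name: str) -> bool:
    """Return True if the name contains any of the line-break characters."""
    return LINE_BREAK.search(name) is not None


# Hypothetical sample names, not taken from a real file system.
samples = ["report.txt", "a\rb", "a\x0bb", "a\x85b", "a\u2028b", "a b"]
for sample in samples:
    print(repr(sample), has_line_break(sample))
```

Note that the range \x0A-\x0D also covers vertical tab (\x0B) and form feed (\x0C), which is why "a\x0bb" is reported as a match while a name with an ordinary space is not.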

What exactly are you trying to search for?

Please try without the word boundary:
regex:[\x0A-\x0D\x{85}\x{2028}\x{2029}]

Make sure regex is disabled from the Search menu if you use the regex: modifier.
Also if you use the regex: modifier, please make sure you escape | with double quotes.

You can also use the built-in macros to find Unicode characters, which should be faster. With regex disabled, search for:
#x0a:|#x0d:|#x85:|#x2028:|#x2029:

Re: Regex Unicode and Non-Unicode

Posted: Fri Mar 03, 2017 6:52 am
by Debugger
0 objects


Re: Regex Unicode and Non-Unicode

Posted: Sat Mar 04, 2017 10:35 am
by void
Are you certain you have a filename with one of the above characters?

Does the following search find any results:
#x0a:|#x0d:|#x85:|#x2028:|#x2029:

Re: Regex Unicode and Non-Unicode

Posted: Mon Mar 06, 2017 1:35 pm
by Debugger
It does not work for me.
I want a correct regex that shows names containing Unicode characters.
I want a correct regex that shows all names without Unicode characters.

Re: Regex Unicode and Non-Unicode

Posted: Wed Mar 08, 2017 4:27 am
by void
I've tested creating filenames with 0x0a, 0x0d, U+2028 and U+2029 characters and the above searches would find them.

It's not clear what you are searching for.

To search for files with non-ASCII characters, search for:
regex:[^\x{00}-\x{7f}]

To search for files with only non-ASCII characters, search for:
!regex:[\x{00}-\x{7f}]

To search for files with only ASCII characters, search for:
regex:^[\x{00}-\x{7f}]*$
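As a rough illustration of how these three searches differ, here is a small Python sketch. It assumes Python's re module behaves like Everything's regex for these simple character classes, writes \x{00} and \x{7f} as \x00 and \x7f, and uses hypothetical function names; the leading ! in the second search is modelled with a plain `not`.

```python
import re


def has_non_ascii(name: str) -> bool:
    # regex:[^\x{00}-\x{7f}]  -> at least one character above 0x7F
    return re.search(r"[^\x00-\x7f]", name) is not None


def only_non_ascii(name: str) -> bool:
    # !regex:[\x{00}-\x{7f}]  -> no ASCII character anywhere in the name
    return re.search(r"[\x00-\x7f]", name) is None


def only_ascii(name: str) -> bool:
    # regex:^[\x{00}-\x{7f}]*$  -> every character is ASCII
    return re.search(r"^[\x00-\x7f]*$", name) is not None


for name in ["readme.txt", "Mąka", "極上スマイル", "極上スマイル(brz_"]:
    print(name, has_non_ascii(name), only_non_ascii(name), only_ascii(name))
```

A name that mixes both kinds of characters, such as the last sample, is found by the first search only.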

Re: Regex Unicode and Non-Unicode

Posted: Wed Mar 08, 2017 11:08 pm
by skribb
Debugger wrote:It does not work in Everything:

Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
or
\p{Han} for CJK ideographs

Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])

I don't know anything about regex, but as far as I understand it, I don't see why those strings would find folder and file names containing characters from non-Latin character sets.

Re: Regex Unicode and Non-Unicode

Posted: Fri Mar 10, 2017 9:32 am
by Debugger
regex:[^\x{00}-\x{7f}]

It works, but I do not want to include Polish alphabet (native OS Polish)
https://en.wikipedia.org/wiki/Polish_alphabet
Show only English + Unicode.

Re: Regex Unicode and Non-Unicode

Posted: Sat Mar 11, 2017 1:21 pm
by void
Debugger wrote:It works, but I do not want to include Polish alphabet (native OS Polish)
regex:[^\x{00}-\x{7f}\x{104}\x{106}\x{118}\x{141}\x{143}\x{d3}\x{15a}\x{179}\x{17b}\x{105}\x{107}\x{119}\x{142}\x{144}\x{f3}\x{15b}\x{17a}\x{17c}]

Debugger wrote:Show only English + Unicode.
What do you mean by English? Does this include spaces? Numbers?
What do you mean by Unicode? I assume you mean characters with a code > 7f.

To search for names that contain only the letters a-z, search for:
regex:^[a-zA-Z]*$
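The two searches in this post can be sketched in Python as follows. This is only an approximation under the assumption that Python's re treats these character classes the same way Everything does; the Polish letters are written literally instead of as \x{...} codes, and the function names and sample names are hypothetical.

```python
import re

# The 18 Polish-specific letters listed by code point in the search above.
POLISH = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

# Any character that is neither ASCII nor one of the Polish letters.
OTHER_UNICODE = re.compile(r"[^\x00-\x7f" + POLISH + "]")


def has_other_unicode(name: str) -> bool:
    """True if the name has a non-ASCII character that is not a Polish letter."""
    return OTHER_UNICODE.search(name) is not None


def is_letters_only(name: str) -> bool:
    """True if the whole name consists of a-z / A-Z (like ^[a-zA-Z]*$)."""
    return re.fullmatch(r"[a-zA-Z]*", name) is not None


for name in ["Accelerator", "Mąka ćwikłowa", "極上スマイル(brz_", "readme.txt"]:
    print(name, has_other_unicode(name), is_letters_only(name))
```

In this sketch the letters-only check rejects any name that contains a dot, digit, space or underscore, so a name such as "readme.txt" does not match it.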

Re: Regex Unicode and Non-Unicode

Posted: Sat Mar 11, 2017 3:45 pm
by Debugger
English
Aa Cc Ee
Accelerator

-----------------------
Polish
AĄaą CĆcć EĘeę
Mąka ćwikłowa

------------------------
Unicode -> Languages other than native Polish + special characters ★ Hozda ★


¡ ¦ 
гвинея-спорт_олимпиада_мюнхен-72(1972)
極上スマイル(brz_



regex:^[a-zA-Z]*$
It does not show all the folders
It does not show all the files