Stripping out non-ASCII characters

harryray2
Posts: 1056
Joined: Sat Oct 15, 2016 9:56 am

Stripping out non-ASCII characters

Post by harryray2 »

Just a general question on stripping out non-ASCII characters. I use this regex in various text editors to strip out the characters. How would I go about also taking out some ASCII characters, such as @, ? and so on?

Also, there's a lot of text, normally with a black background, such as NUL or STX etc. I assume that these are some sort of control characters. How can I remove these as well?
void
Developer
Posts: 15675
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to List Filenames Containing Non-ASCII Characters

Post by void »

Please try including the following regex search:

regex:[^\x20!\x22#$%&'()*+,\-./0-9:;<=>?@A-Z\[\\\]^_`a-z{\x7c}~]

\ = escape character.
\x20 = space
\x22 = literal double quote (")
\x7c = |

Take out any of the ASCII characters you want to include in the matches (for example, remove ?@ from the brackets to match ? and @ as well).
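
As a rough illustration of how this class behaves when used for stripping rather than searching, here is a minimal Python sketch. It assumes the pattern is used without Everything's "regex:" prefix and that | is written as \x7c, as in the legend above. Python's re module is not PCRE, but it treats these particular constructs the same way. The sample string and the helper name strip_matches are made up for the example.

```python
import re

# Negated class from the post: matches any character that is NOT printable ASCII.
# The "regex:" prefix is Everything search syntax and is not part of the pattern.
NOT_PRINTABLE_ASCII = re.compile(
    r"[^\x20!\x22#$%&'()*+,\-./0-9:;<=>?@A-Z\[\\\]^_`a-z{\x7c}~]"
)

# Same class with ? and @ taken out, so those two characters now match as well.
ALSO_QUESTION_AND_AT = re.compile(
    r"[^\x20!\x22#$%&'()*+,\-./0-9:;<=>A-Z\[\\\]^_`a-z{\x7c}~]"
)

def strip_matches(pattern, text):
    """Delete every character that the pattern matches (hypothetical helper)."""
    return pattern.sub("", text)

# Sample with an accented letter plus NUL (\x00) and STX (\x02) control characters.
sample = "caf\u00e9 \x00\x02user@example.com?"
print(strip_matches(NOT_PRINTABLE_ASCII, sample))
print(strip_matches(ALSO_QUESTION_AND_AT, sample))
```

Control characters such as NUL and STX are not listed in the class, so they are removed along with the non-ASCII characters. The same applies to tabs and line breaks, which may not be what you want in a text editor.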
harryray2
Posts: 1056
Joined: Sat Oct 15, 2016 9:56 am

Re: How to List Filenames Containing Non-ASCII Characters

Post by harryray2 »

Thanks, I tried this on Notepad++, and it worked, but to remove the characters ? and @ I had to remove them from the line:
regex:[^\x20!\x22#$%&'()*+,\-./0-9:;<=>?@A-Z\[\\\]^_`a-z{\x7c}~]

The above worked, but there are still a few characters I would like to remove. How do I do this?
Is the regex syntax used in Notepad++ and other programmes slightly different from the regex syntax in Everything?

I think what I need to do is remove everything so that I'm left with just letters and numbers. Also, I'm not sure if this is possible, but I'd like to keep @ when it's part of an email address.
void
Developer
Posts: 15675
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to List Filenames Containing Non-ASCII Characters

Post by void »

"The above worked, but there are still a few characters I would like to remove. How do I do this?"
What characters would you like to show?

Remove the characters you want to show from:
regex:[^\x20!\x22#$%&'()*+,\-./0-9:;<=>?@A-Z\[\\\]^_`a-z{\x7c}~]

Note: a-z, A-Z and 0-9 specify a range and \ starts an escape sequence.
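
To make the note about ranges and escapes concrete, here is a small Python sketch. The helper name chars_matched and the sample strings are invented for illustration, and Python's re module is used only as a stand-in; these constructs behave the same way in PCRE.

```python
import re

def chars_matched(pattern, text):
    """Return every single-character match of pattern in text (hypothetical helper)."""
    return re.findall(pattern, text)

# An unescaped "-" between two characters forms a range: a, b or c.
print(chars_matched(r"[a-c]", "a-b-c"))

# Escaping the hyphen makes it literal: a, - or c (b no longer matches).
print(chars_matched(r"[a\-c]", "a-b-c"))

# A leading ^ negates the class: anything that is not a lowercase letter or digit.
print(chars_matched(r"[^a-z0-9]", "abc_1.txt"))
```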


"Is the Regex structure used in Notepad++, and other programmes, slightly different from Regex in Everything?"
Everything uses Perl Compatible Regular Expressions (PCRE)
harryray2
Posts: 1056
Joined: Sat Oct 15, 2016 9:56 am

Re: How to List Filenames Containing Non-ASCII Characters

Post by harryray2 »

I think what I need to do is remove everything so that I'm left with just letters and numbers. Also, I'm not sure if this is possible, but I'd like to keep @ when it's part of an email address.
void
Developer
Posts: 15675
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to List Filenames Containing Non-ASCII Characters

Post by void »

I'm not sure what you want.

The following will show almost all files because of the . in the file extension:
regex:[^a-z0-9]


To skip the extension, try something like:
regex:[^a-z0-9].*\.[a-z]*$


This will still match most filenames because of spaces, - and .
To allow space, - and . as well:
regex:[^a-z0-9\-\x20\.].*\.[a-z]*$


To ignore filenames that match a basic email address, include the following in your search:
!regex:^[a-z0-9\._\-]+@[a-z0-9\._\-]+$
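
The last two searches can be tried outside Everything with a short Python sketch. This is only an approximation under stated assumptions: the "regex:" prefix and the leading ! (negation) are Everything search syntax, so the negation is written here as "not", and case-insensitive matching is assumed because the classes list only a-z. The function name is_listed and the sample filenames are made up.

```python
import re

# Patterns copied from the post, without Everything's "regex:" prefix.
# IGNORECASE is an assumption; the post's classes list only lowercase a-z.
ODD_CHAR_BEFORE_EXT = re.compile(r"[^a-z0-9\-\x20\.].*\.[a-z]*$", re.IGNORECASE)
BASIC_EMAIL = re.compile(r"^[a-z0-9\._\-]+@[a-z0-9\._\-]+$", re.IGNORECASE)

def is_listed(filename):
    """True if the name contains an unusual character and is not a bare email address."""
    has_odd_char = ODD_CHAR_BEFORE_EXT.search(filename) is not None
    looks_like_email = BASIC_EMAIL.search(filename) is not None
    return has_odd_char and not looks_like_email

names = ["report 2020.txt", "caf\u00e9 menu.txt", "notes_v2.txt", "user@example.com"]
for name in names:
    print(name, is_listed(name))
```

In this sketch a name such as notes_v2.txt is still listed because _ is not in the allowed set. Add _ inside the brackets if underscores should be ignored too.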