Search for DOCX using Verdana font

Discussion related to "Everything" 1.5 Alpha.
w64bit
Posts: 233
Joined: Wed Jan 09, 2013 9:06 am

Search for DOCX using Verdana font

Post by w64bit »

I tried to search for DOCX files which are using Verdana font
ext:docx content:Verdana
but not all files are found.
Is there anything I can do to find them?
tuska
Posts: 937
Joined: Thu Jul 13, 2017 9:14 am

Re: Search for DOCX using Verdana font

Post by tuska »

What makes you think that Everything can search for a font in a Word document?
Content Indexing
horst.epp
Posts: 1350
Joined: Fri Apr 04, 2014 3:24 pm

Re: Search for DOCX using Verdana font

Post by horst.epp »

Font names are not part of the content.
Maybe a binary search can help here.
froggie
Posts: 297
Joined: Wed Jun 12, 2013 10:43 pm

Re: Search for DOCX using Verdana font

Post by froggie »

Word .docx files are compressed very much like .zip files.

If a .docx file, say wd.docx, is renamed to .zip, then this will work to find fonts:

wd.zip content:acme

(Most (all?) word documents contain some default formats, such as Times New Roman.)
w64bit
Posts: 233
Joined: Wed Jan 09, 2013 9:06 am

Re: Search for DOCX using Verdana font

Post by w64bit »

froggie, this is a good one. Thanks.

Now, how can I search inside docx files (without renaming them to zip) by using the zip iFilter instead of the docx iFilter?
something like:
ext:docx ifilter:zip content:Verdana
NotNull
Posts: 5296
Joined: Wed May 24, 2017 9:22 pm

Re: Search for DOCX using Verdana font

Post by NotNull »

froggie wrote (Fri Jul 21, 2023 6:39 pm):
    Word .docx files are compressed very much like .zip files.
Very, very much :D

When you open a docx file in a text or hex editor, you'll see that the first two characters are PK.
Those are the initials of Phil Katz, the developer of the original PKzip and "founder" of the zip format.
All zip files start with this PK identifier, although I'm not certain about self-extracting zip files.
(end of Useless Fact)
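
The PK observation above can be checked with a few lines of Python. This is only a minimal sketch: the helper name starts_with_pk and the "MZ" stub bytes are assumptions made for illustration, and the archive is built in memory so nothing on disk is touched.

```python
import io
import zipfile


def starts_with_pk(data: bytes) -> bool:
    """Return True if the data begins with the two ASCII bytes 'PK'."""
    return data[:2] == b"PK"


# Build a tiny zip archive in memory so the example needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
archive = buf.getvalue()

# Hypothetical stand-in for a self-extracting archive: some stub bytes
# placed in front of the zip data (real stubs are whole executables).
prefixed = b"MZ\x90\x00" + archive

print(starts_with_pk(archive))    # True: the first local file header is PK\x03\x04
print(starts_with_pk(prefixed))   # False: the stub hides the PK at offset 0

# Python's zipfile still opens the prefixed data, because it locates the
# archive from the record at the end of the file rather than from offset 0.
print(zipfile.ZipFile(io.BytesIO(prefixed)).namelist())
```

So a file with something prepended to the zip data would not show PK in its first two bytes, even though zip tools that read from the end of the file can still open it.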


I am surprised the font names could be found in the docx file.
Inside the docx "zipfile" are many XML files, among them a font declaration file (fontTable.xml), but reading a zip with the content: function should not read through all the files inside the zip. At least that is how I thought it works.

I would like to be wrong here though, as that gives loads of new search opportunities!


An alternative would be to extract 'fontTable.xml' from the docx file and parse it (with a script).
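
A script along those lines could look roughly like the following. It is a sketch, not a tested tool: it assumes the font table sits in the part word/fontTable.xml and uses the standard WordprocessingML namespace, and the function name declared_fonts is made up here. Note that the font table lists the fonts a document declares, which is not necessarily the same as the fonts actually applied to text.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in the XML parts).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def declared_fonts(docx) -> list:
    """Return the font names declared in the font table of a .docx.

    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        # Match the part name case-insensitively to be tolerant.
        part = next(
            (n for n in zf.namelist() if n.lower() == "word/fonttable.xml"),
            None,
        )
        if part is None:
            return []
        root = ET.fromstring(zf.read(part))
    names = (font.get(f"{{{W_NS}}}name") for font in root.iter(f"{{{W_NS}}}font"))
    return [name for name in names if name]


# Self-contained demo: a minimal stand-in for a .docx, built in memory.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(
        "word/fontTable.xml",
        f'<w:fonts xmlns:w="{W_NS}">'
        '<w:font w:name="Verdana"/><w:font w:name="Times New Roman"/>'
        "</w:fonts>",
    )

print(declared_fonts(sample))  # ['Verdana', 'Times New Roman']
```

For real files, something like `"Verdana" in declared_fonts(path)` over a list of .docx paths would give the set of documents that declare the font.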
froggie
Posts: 297
Joined: Wed Jun 12, 2013 10:43 pm

Re: Search for DOCX using Verdana font

Post by froggie »

I was aware of the "PK", but was (and am) unsure of what modifications Microsoft might have made, thus my comment.

Nevertheless, content searches for fonts and text in documents (renamed to zip files) work for me.

Maybe it won't be necessary to rename the files:
Re: Is it possible to search for content within an archive file

Post by void » Sat Jul 15, 2023 9:29 pm
I will trial a change in the next alpha update to treat any file (with a zip footer) as a zip file.

I'll report back here once this is ready for testing.
Last edited by froggie on Sat Jul 22, 2023 12:41 am, edited 1 time in total.
void
Developer
Posts: 15464
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search for DOCX using Verdana font

Post by void »

Everything can't do this yet.

I will consider a zipcontent: search function.
Thank you for the suggestions.
void
Developer
Posts: 15464
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search for DOCX using Verdana font

Post by void »

I will trial a change in the next alpha update to treat any file (with a zip footer) as a zip file.
Everything will only read zip filenames, not zip content.
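
For readers wondering what a "zip footer" check might involve: the sketch below assumes it refers to the zip end-of-central-directory record (signature PK\x05\x06), which sits at the end of every zip file. That reading is an assumption, this is not Everything's code, and the function name find_zip_footer is invented for illustration.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory signature
EOCD_MIN = 22              # size of the record without a trailing comment
MAX_COMMENT = 0xFFFF       # the comment length field is 16 bits


def find_zip_footer(data: bytes):
    """Return (offset, total_entries) of the end record, or None if absent."""
    # The record can only start within the last 22 + 65535 bytes.
    tail_start = max(0, len(data) - (EOCD_MIN + MAX_COMMENT))
    pos = data.rfind(EOCD_SIG, tail_start)
    if pos < 0 or pos + EOCD_MIN > len(data):
        return None
    # signature, disk no., disk with directory, entries on this disk,
    # total entries, directory size, directory offset, comment length
    fields = struct.unpack("<4sHHHHIIH", data[pos:pos + EOCD_MIN])
    return pos, fields[4]


# Demo with an in-memory archive whose member names mimic a .docx.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<doc/>")
    zf.writestr("word/fontTable.xml", "<fonts/>")
data = buf.getvalue()

print(find_zip_footer(data))            # offset of the footer and entry count
print(find_zip_footer(b"not a zip"))    # None
# Listing names needs only the central directory, not the member contents.
print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

Listing the member names this way reads only the directory at the end of the file, which matches the "filenames, not content" distinction above.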
froggie
Posts: 297
Joined: Wed Jun 12, 2013 10:43 pm

Re: Search for DOCX using Verdana font

Post by froggie »

void wrote (Sat Jul 22, 2023 12:14 am):
    Everything can't do this yet.
@void: Maybe I am missing something, but when there is an XML structure inside a zip (and only then), Everything seems to read the content. Is it treating the zip as a directory? Is something else strange going on? Other content within a zip is not found, as expected, but XML seems to be. I created several examples and they all behave like this:
[Attachments: c1.JPG, c2.JPG]

<?xml version="1.0" encoding="UTF-8" standalone="true"?>

<w:document mc:Ignorable="w14 w15 wp14" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
  <w:body>
    <w:p w:rsidRDefault="001F53A3" w:rsidR="00E65461">
      <w:r>
        <w:t>Greatbigword</w:t>
      </w:r>
      <w:bookmarkStart w:name="_GoBack" w:id="0"/>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
    <w:sectPr w:rsidR="00E65461" w:rsidSect="007A34C0">
      <w:pgSz w:w="12240" w:h="15840"/>
      <w:pgMar w:gutter="0" w:footer="720" w:header="720" w:left="1440" w:bottom="1440" w:right="1440" w:top="1440"/>
      <w:cols w:space="720"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:body>
</w:document>
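
As a rough illustration of where the searchable text lives in a listing like the one above, the following sketch pulls the text out of the w:t elements. It assumes the XML is the part word/document.xml inside the archive and that the w: prefix maps to the standard WordprocessingML namespace (the listing above is cut off before that declaration); document_text is a name made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed mapping for the "w:" prefix (standard WordprocessingML namespace).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_text(docx) -> str:
    """Concatenate the text of every w:t element in word/document.xml."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return "".join(t.text or "" for t in root.iter(f"{{{W_NS}}}t"))


# Demo: a stripped-down stand-in for the document above, built in memory.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        "<w:t>Greatbigword</w:t>"
        "</w:r></w:p></w:body></w:document>",
    )

print(document_text(sample))  # Greatbigword
```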
void
Developer
Posts: 15464
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search for DOCX using Verdana font

Post by void »

Everything will use the system-wide docx/zip iFilter to search docx/zip content.

Typically there's no zip iFilter so Everything will fall back to a binary content search.
Everything will look for the following text encodings when performing a binary content search:
ASCII
ANSI
UTF-8
UTF-16 LE
UTF-16 LE (with an offset of 1)
UTF-16 BE
UTF-16 BE (with an offset of 1)

My guess is the font name is stored as raw text in your zip file somewhere..



To check the content Everything reads, include the following in your search:

new .zip regex:dotall:content:^(.*)$ addcol:regmatch1

The read content is shown in the regmatch1 column.
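
To make the binary fallback described above more concrete, here is a small sketch of searching raw bytes for a term in several encodings. It is not Everything's implementation: ANSI is left out because it depends on the system code page, the match is case-sensitive, and the names encoded_variants, binary_contains and make_zip are invented for this example. A plain byte-substring search already matches at any byte offset, so the "offset of 1" cases need no special handling here.

```python
import io
import zipfile

CODECS = [
    ("ASCII", "ascii"),
    ("UTF-8", "utf-8"),
    ("UTF-16 LE", "utf-16-le"),
    ("UTF-16 BE", "utf-16-be"),
]


def encoded_variants(term: str) -> dict:
    """Map each encoding name to the byte pattern of the term."""
    variants = {}
    for name, codec in CODECS:
        try:
            variants[name] = term.encode(codec)
        except UnicodeEncodeError:
            pass  # term not representable in this encoding
    return variants


def binary_contains(data: bytes, term: str) -> list:
    """Return the names of the encodings in which the term occurs in data."""
    return [name for name, pat in encoded_variants(term).items() if pat in data]


def make_zip(compression: int) -> bytes:
    """Build an in-memory zip holding repeated font declarations."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("word/fontTable.xml", '<w:font w:name="Verdana"/>' * 50)
    return buf.getvalue()


# Stored (uncompressed) member: the raw text is visible in the file bytes.
print(binary_contains(make_zip(zipfile.ZIP_STORED), "Verdana"))
# Deflated member: the text is compressed, so a raw byte search misses it.
print(binary_contains(make_zip(zipfile.ZIP_DEFLATED), "Verdana"))
# UTF-16 LE text starting at an odd byte offset is still found.
print(binary_contains(b"\x01" + "Verdana".encode("utf-16-le"), "Verdana"))
```

The point of the two zip cases is that a raw byte search only finds text that is stored uncompressed in the file; anything inside a deflated member has to be decompressed first, for example by an iFilter.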
froggie
Posts: 297
Joined: Wed Jun 12, 2013 10:43 pm

Re: Search for DOCX using Verdana font

Post by froggie »

All of the Microsoft Word XML from the WordDoc.zip is in the regmatch1 column (as far as I can expand the width of the column).
So Everything can match whatever is found in the XML (including text in documents)

Using the Nirsoft filter listing tool, there is a "Microsoft Office Open XML Format Filter" (offfiltx.dll) which processes XML and only XML in ZIP files.

So this is all working the way it does because of the filter.

Now if I only knew exactly which release of Office it came from.

Thank you @Void.
Last edited by froggie on Sat Jul 22, 2023 1:52 pm, edited 1 time in total.
NotNull
Posts: 5296
Joined: Wed May 24, 2017 9:22 pm

Re: Search for DOCX using Verdana font

Post by NotNull »

froggie wrote (Sat Jul 22, 2023 12:12 am):
    I was aware of the "PK", but was (and am) unsure of what modifications Microsoft might have made, thus my comment.
I wasn't sure either (due to Microsoft's infamous "Embrace, Extend, and Extinguish" policy). It was just a fun fact.
froggie wrote (Sat Jul 22, 2023 2:24 am):
    Now if I only knew exactly which release of Office it came from.
You could check the version of offfiltx.dll. The file version is typically in line with the Office version:

8.0 = Office 97
9.0 = Office 2000
10.0 = Office XP
11.0 = Office 2003
12.0 = Office 2007
14.0 = Office 2010
15.0 = Office 2013
16.0 = Office 2016
(lost track after that)
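
The table above is easy to turn into a lookup. The sketch below assumes you already have the DLL's file version as a string (for example from the file's Properties dialog); the function name office_release and the sample version string are made up for illustration.

```python
# Major version number -> Office release, as listed above.
OFFICE_BY_MAJOR = {
    8: "Office 97",
    9: "Office 2000",
    10: "Office XP",
    11: "Office 2003",
    12: "Office 2007",
    14: "Office 2010",
    15: "Office 2013",
    16: "Office 2016",
}


def office_release(file_version: str) -> str:
    """Map a file version such as '15.0.1234.5678' to an Office release name."""
    major = int(file_version.split(".")[0])
    return OFFICE_BY_MAJOR.get(major, "unknown")


# Hypothetical version string, used only to show the call.
print(office_release("15.0.1234.5678"))  # Office 2013
```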
froggie
Posts: 297
Joined: Wed Jun 12, 2013 10:43 pm

Re: Search for DOCX using Verdana font

Post by froggie »

Good idea.

Offfiltx.dll is installed in two different places - one from Office 2010 and one from Office 2013. On another system, there are two from Office 2016 (x86 & x64) installed in yet different locations.

.docx came out with Office 2007, so that is the earliest release I would expect to (perhaps) have this iFilter.

(I originally left out the "l". The file name is offfiltx.dll )