
checksums calculated from primary data streams only?

Posted: Mon Oct 31, 2022 5:56 pm
by jimspoon
What if you have two files with identical primary data streams, but one of them has an alternate data stream (ADS)? Will a checksum generator show the same checksum for both files? I made a copy of a file and then added an ADS to one of the two copies. NirSoft HashMyFiles still shows the same checksum for each file, so I am guessing that HashMyFiles, at least, generates checksums based only on the data in the primary data stream. I don't know if there are any checksum generators that take ADS into account.

So if you see the same checksum for two files, the primary streams are almost certainly identical, but the files may have different ADS. To identify files that appear identical but have different ADS, you'll have to look at other properties, such as the file modification date. Adding an ADS won't change the displayed file size (which shows the size of the primary stream only), but it will change the file modification date.
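The experiment above can be reproduced with a minimal Python sketch. It assumes Windows and an NTFS volume, where `path:name` opens a named stream; the function names and the stream name `note` are hypothetical. Opening the plain path reads only the primary stream, so the copy with an ADS hashes the same as the original unless the ADS is hashed explicitly.

```python
from __future__ import annotations

import hashlib
import shutil


def sha256_of(path: str, stream: str | None = None) -> str:
    """Hash one data stream of a file.

    stream=None reads the primary (unnamed) stream, which is all that opening
    the plain path gives you. On NTFS, "path:name" opens the named ADS instead.
    """
    target = path if stream is None else f"{path}:{stream}"
    h = hashlib.sha256()
    with open(target, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_tag(original: str) -> tuple[str, str, str]:
    """Copy a file, attach an ADS to the copy, and return three hashes:
    original primary stream, copy primary stream, and the copy's ADS."""
    copy = original + ".copy"
    shutil.copyfile(original, copy)  # copy the file's contents
    with open(f"{copy}:note", "wb") as f:  # "note" is a hypothetical stream name
        f.write(b"extra metadata")
    return sha256_of(original), sha256_of(copy), sha256_of(copy, "note")
```

If HashMyFiles behaves as observed, the first two digests returned by `copy_and_tag` should match, while the third (the ADS) differs.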

Re: checksums calculated from primary data streams only?

Posted: Mon Oct 31, 2022 6:10 pm
by horst.epp
No, programs that write an ADS can preserve the modification date.
At least, I do.
For example, a tag system which changes the original file modification date would be almost useless.
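For illustration only (this is not horst.epp's plugin or script), a minimal Python sketch of an ADS writer that preserves the modification date: save the host file's timestamps, write the stream, then restore them with `os.utime`. It assumes Windows and NTFS, where `path:name` addresses a named stream; the function name is hypothetical.

```python
from __future__ import annotations

import os


def write_ads_preserving_mtime(path: str, stream: str, data: bytes) -> None:
    """Write data to the named stream path:stream, then restore the host
    file's access and modification times to their values before the write."""
    before = os.stat(path)
    with open(f"{path}:{stream}", "wb") as f:
        f.write(data)
    # Restore the saved times only after the stream handle has been closed.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

Usage, with a hypothetical stream name: `write_ads_preserving_mtime("photo.jpg", "tags", b"holiday;2022")`.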

Re: checksums calculated from primary data streams only?

Posted: Mon Oct 31, 2022 6:12 pm
by therube
The way I look at it is, a file is a file, & an ADS is an ADS.

And as most copy programs & alternative OSes are not ADS-aware...

I've posted about ADS (do a search), but don't recall offhand in what manner it applied...

And as far as date goes, it certainly cannot be used as any sort of equivalency - except for date itself.

Re: checksums calculated from primary data streams only?

Posted: Mon Oct 31, 2022 6:17 pm
by horst.epp
therube wrote: Mon Oct 31, 2022 6:12 pm And as far as date goes, it certainly cannot be used as any sort of equivalency - except for date itself.
I disagree.
It is, for example, the main criterion for comparing backup versions with their sources.

Re: checksums calculated from primary data streams only?

Posted: Mon Oct 31, 2022 6:34 pm
by therube
That may be - if you assume that content is the same - in which case date/time is just fine.
But the whole reason for a backup is to ensure that content is the same.

And content need not be the same, even if date & time are (i.e. data corruption or whatnot).

When Voidhash runs, it does not touch the directory's date/time. So if you assumed that two directory structures are the same because they have the same date/time, you could be wrong: one may contain the hash file and the other not, and date/time will not tell you that.

Many backup/sync programs do use date/time to determine "diff", but doing so does not ensure "exactness".
So there is a trade-off between the speed of date/time checks and the cost of performing some sort of actual comparison (hash/content checks) on sets of files.

And plenty of people back up based on date/time (under the impression that media does not go bad, or that files are never modified silently...).
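To make the trade-off concrete, here is a minimal Python sketch of the two strategies; the function names are hypothetical. Like HashMyFiles in the first post, the content check reads only the primary data stream, so it would still miss a difference confined to an ADS.

```python
from __future__ import annotations

import hashlib
import os


def same_by_metadata(a: str, b: str) -> bool:
    """Fast heuristic used by many backup/sync tools: size and mtime only."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime_ns == sb.st_mtime_ns


def _digest(path: str) -> bytes:
    """SHA-256 of the primary data stream, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.digest()


def same_by_content(a: str, b: str) -> bool:
    """Slower but exact check: compare sizes, then full-content hashes."""
    return os.path.getsize(a) == os.path.getsize(b) and _digest(a) == _digest(b)
```

`same_by_metadata` needs only two `stat` calls per pair, whereas `same_by_content` reads every byte of both files; two files with equal size and date but silently corrupted content pass the first check and fail the second.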

Re: checksums calculated from primary data streams only?

Posted: Mon Oct 31, 2022 7:29 pm
by raccoon
ADS can, and in my opinion should, be read from the NTFS $MFT (which is fast) and treated as distinct objects (like files and folders). But Everything only collects attributes/properties/metadata, such as checksum hashes, on distinct objects, and ADS are currently treated only as a property or metadata, not as distinct objects.

In the future I would like to see Everything support alternate data streams as distinct objects. I would also like to see archive (zip;rar;7z) contents indexed as objects.
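A minimal Python sketch of what treating each stream as a distinct object could mean for an index: one file expands into one entry per data stream, each with its own name and size. This is not how Everything works internally; `IndexEntry` and `entries_for` are hypothetical names, the `path:name` syntax assumes NTFS, and the stream names must be supplied by the caller (on Windows they can be listed with, for example, the Win32 `FindFirstStreamW`/`FindNextStreamW` functions or PowerShell's `Get-Item -Stream *`).

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One searchable object: a file's primary stream or one of its ADS."""
    name: str  # e.g. "report.docx" or "report.docx:Zone.Identifier"
    size: int  # size in bytes of this particular stream


def _stream_size(target: str) -> int:
    """Size of a single stream, found by seeking to its end."""
    with open(target, "rb") as f:
        return f.seek(0, os.SEEK_END)


def entries_for(path: str, stream_names: list[str]) -> list[IndexEntry]:
    """Expand one file into distinct entries, one per data stream."""
    base = os.path.basename(path)
    entries = [IndexEntry(base, _stream_size(path))]
    for s in stream_names:
        entries.append(IndexEntry(f"{base}:{s}", _stream_size(f"{path}:{s}")))
    return entries
```

With entries like these, per-object properties such as a checksum could be computed for each stream separately, the same way they are for files.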

Re: checksums calculated from primary data streams only?

Posted: Tue Nov 01, 2022 12:23 am
by jimspoon
therube wrote: Mon Oct 31, 2022 6:12 pm And as far as date goes, it certainly cannot be used as any sort of equivalency - except for date itself.
What I meant was, if you have two files with identical hashes, but different modification dates, that could be a sign that the files are not in fact really identical, e.g. that an ADS was added to one of them. But Everything of course provides more direct ways to determine this.
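This heuristic can be sketched in Python as follows (function names are hypothetical): group files by primary-stream hash, then flag groups whose modification times disagree. As horst.epp points out, tools that preserve the modification date will slip through, so an empty result proves nothing.

```python
from __future__ import annotations

import hashlib
import os
from collections import defaultdict


def _sha256(path: str) -> str:
    """Hash only the primary data stream."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def same_hash_different_mtime(paths: list[str]) -> list[list[str]]:
    """Return groups of files whose primary-stream hashes match but whose
    modification times are not all equal. A hit is only a hint that
    something else (for example an ADS) may have changed."""
    by_hash: defaultdict[str, list[str]] = defaultdict(list)
    for p in paths:
        by_hash[_sha256(p)].append(p)
    flagged = []
    for members in by_hash.values():
        mtimes = {os.stat(p).st_mtime_ns for p in members}
        if len(members) > 1 and len(mtimes) > 1:
            flagged.append(sorted(members))
    return flagged
```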

Re: checksums calculated from primary data streams only?

Posted: Tue Nov 01, 2022 12:35 am
by jimspoon
horst.epp wrote: Mon Oct 31, 2022 6:10 pm No, programs that write an ADS can preserve the modification date.
At least, I do.
For example, a tag system which changes the original file modification date would be almost useless.
That's good to know. I guess it depends entirely on the tool being used. When I used 7-Zip to add an ADS to a file, the file's modification date was changed. As an experiment, I just used PowerShell's Set-Content cmdlet to add an ADS to a file, and the file's LastWriteTime (shown by the Get-ChildItem cmdlet) was changed. What tool do you use to write an ADS?

Re: checksums calculated from primary data streams only?

Posted: Tue Nov 01, 2022 12:44 am
by jimspoon
raccoon wrote: Mon Oct 31, 2022 7:29 pm ADS can, and in my opinion should, be read from the NTFS $MFT (which is fast) and treated as distinct objects (like files and folders). But Everything only collects attributes/properties/metadata, such as checksum hashes, on distinct objects, and ADS are currently treated only as a property or metadata, not as distinct objects.

In the future I would like to see Everything support alternate data streams as distinct objects. I would also like to see archive (zip;rar;7z) contents indexed as objects.
I'd like to see that too! The Files file manager (https://github.com/files-community/Files) does give you the option to view ADS alongside their containing files in the same directory listing, and so does the V File Viewer. The 7-Zip file manager lets you navigate from a file down to a listing of its ADS.

I think the best solution would let us (optionally) view ADS as distinct objects AND let metadata contained in the ADS be viewed in columns for the primary stream.

Re: checksums calculated from primary data streams only?

Posted: Tue Nov 01, 2022 5:31 am
by void
I will consider adding properties to calculate the checksum of alternate data streams, and of data + alternate data streams.
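One possible way a "data + alternate data streams" checksum could be defined, as a minimal Python sketch; this is not Everything's implementation, and the function name and hashing layout are assumptions. It assumes NTFS and a caller-supplied list of stream names.

```python
from __future__ import annotations

import hashlib


def combined_checksum(path: str, stream_names: list[str]) -> str:
    """SHA-256 over the primary data stream followed by each named ADS."""
    h = hashlib.sha256()
    # The primary stream gets the empty name; ADS are taken in sorted order so
    # the result does not depend on the order in which they were listed.
    parts = [("", path)] + [(s, f"{path}:{s}") for s in sorted(stream_names)]
    for name, target in parts:
        with open(target, "rb") as f:
            data = f.read()  # whole stream in memory; fine for a sketch
        # Prefix each part with its name and length so bytes cannot be moved
        # between streams without changing the digest.
        h.update(name.encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()
```

Two files with identical primary streams would then get different combined checksums as soon as one of them carries an ADS the other lacks.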

Thank you for the suggestion.

Re: checksums calculated from primary data streams only?

Posted: Tue Nov 01, 2022 9:07 am
by horst.epp
jimspoon wrote: Tue Nov 01, 2022 12:35 am
horst.epp wrote: Mon Oct 31, 2022 6:10 pm No, programs that write an ADS can preserve the modification date.
At least, I do.
For example, a tag system which changes the original file modification date would be almost useless.
That's good to know. I guess it depends entirely on the tool being used. When I used 7-Zip to add an ADS to a file, the file's modification date was changed. As an experiment, I just used PowerShell's Set-Content cmdlet to add an ADS to a file, and the file's LastWriteTime (shown by the Get-ChildItem cmdlet) was changed. What tool do you use to write an ADS?
A plugin in Total Commander and a script in XYplorer.
This allows me to have tags available through the file system.
These tags are also indexed by Everything and can be searched quickly in both file managers.
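The thread doesn't give the stream name or format used by the Total Commander plugin or the XYplorer script. As a minimal Python sketch of reading such tags back, assuming Windows/NTFS, a hypothetical stream named `tags`, and a hypothetical `;`-separated UTF-8 format:

```python
from __future__ import annotations


def read_tags(path: str, stream: str = "tags") -> list[str]:
    """Read tags stored as text in a named stream.

    Both the default stream name and the ";"-separated format are assumptions
    for illustration; real tagging tools each use their own name and layout.
    """
    try:
        with open(f"{path}:{stream}", "r", encoding="utf-8") as f:
            text = f.read()
    except FileNotFoundError:
        return []  # this file has no tag stream
    return [t.strip() for t in text.split(";") if t.strip()]
```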