Question about the fast sort code

finneon · Post by **finneon** » Tue Mar 22, 2022 5:28 pm

Hello David,

a couple months ago we had a task in our programming class where we needed to find a solution for a specific problem:

Given are two lists A and B. A is already sorted and B contains only elements found in list A, but not necessarily all and they're not sorted. Also all elements are unique, so no duplicates in either list. For example the lists might look like this:

List A: 1, 3, 6, 7, 8, 9, 12, 21, 84

List B: 9, 3, 21, 84, 8

The task was to sort list B by using the order from list A as a reference. Our solution worked really well, with the only drawback being that it required more memory. I'll add it to the end of the post for everyone curious.

So what's this got to do with Everything? Well, a few days ago I noticed that the fast sort in Everything basically solves exactly this problem. Say you have 500.000 results from the current search term sorted by name out of 1.000.000 database elements in total. When you then click on sort by path, Everything probably uses the fast path sort index to more efficiently sort the 500.000 results, which is exactly what this problem is about. I noticed that Everything does this really really quickly (so probably no binary searching to determine the order) and there doesn't seem to be a memory spike during the sorting (so probably not our solution with a temporary hash map either), Hence I was curious how it solves that problem?

And by the way, thank you very much for this great application. It's been my daily companion for years!

Our solution: If the lists have the same length, list A is the solution, so we can just copy it over to B. Otherwise we first walk through list B and remember every element we find in a hash map. Then we walk list A and for each element we check if it's in the hash map, if yes we overwrite the nth element in list B with it and increment n by one, otherwise we ignore it. At the end B is sorted according to list A.

The drawbacks of this solution was basically the additional memory cost for the hash table, otherwise it was benchmarked to be the fastest solution in our class.

raccoon · Post by **raccoon** » Tue Mar 22, 2022 11:35 pm

Do you mean "stable sort"?

Post by **void** » Tue Mar 22, 2022 11:44 pm

Everything doesn't sort at all with fast sorts.

Instead, Everything uses the fast sort index when searching for results.

For example:
Name sort index: A, B, C
Path sort index: B, C, A
Size sort index: C, B, A

When you search using the Name sort, Everything will search with the fast Name sort index.
When you search using the Path sort, Everything will search with the fast Path sort index.
When you search using the Size sort, Everything will search with the fast Size sort index.

This means there's multiple indexes for each property (name, path, size, date modified and any others you have enabled)
Each index will only use a few MB.

Everything 1.4 does flip the results if the sort order is different to the index order (ascending vs descending) which can take a few milliseconds.
Everything 1.5 doesn't flip the results and just flips any item reference (instant).

Searching with the name index is very fast in Everything as the data is also stored in order of name.
A single CPU cache line can hold several files.

Searching with any other fast sort index is slightly slower as a CPU cache line will only hold one file because they are referenced out of name order.
Access to each file data will be a CPU cache line miss.

finneon · Post by **finneon** » Wed Mar 23, 2022 10:40 am

Thank you for the detailed reply!

Everything doesn't sort at all with fast sorts.

Instead, Everything uses the fast sort index when searching for results.

Ah, that's clever. But how does the code handle this case. We have:

Name sort index: A, B, C
Path sort index: B, C, A
Size sort index: C, B, A

Then the user enters some search text, while sort by name is active, which produces:

Result (by name): A, C

What happens when the user then clicks on the path column? Like how does the code know that the result list now needs to look like:

Result (by path): C, A

Does Everything then simply search again in the path sort index, to generate the results in the correct order?

Post by **void** » Wed Mar 23, 2022 10:52 am

When you change the sort order:
Everything will 'mark' the current results.
Iterate over every item in the new sort index.
If an item is 'marked', add it to the new result list.

finneon · Post by **finneon** » Wed Mar 23, 2022 11:00 am

void wrote: ↑Wed Mar 23, 2022 10:52 am When you change the sort order:
Everything will 'mark' the current results.
Iterate over every item in the new sort index.
If an item is 'marked', add it to the new result list.

Thank you! That explains it. It seems like that's indeed what we did in our solution, we just marked the items from list B in a hash map, because we can't mark single value items like integers otherwise, whereas I suppose Everything probably has some space reserved in every database item for the mark.

voidtools forum

Question about the fast sort code

Question about the fast sort code

Re: Question about the fast sort code

Re: Question about the fast sort code

Re: Question about the fast sort code

Re: Question about the fast sort code

Re: Question about the fast sort code