Misc sort improvements #1706
Hi, thanks for your PR submission. Setting aside performance, I think it is a good idea to use this collate package anyway as it makes the code more readable.
That being said, I tried your branch very briefly and found that if dirfirst is disabled then the sorting is skipped entirely since sort.SliceStable is never called. Please do more extensive testing to ensure the various sorting options are covered.
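To illustrate the failure mode described in this comment, here is a minimal sketch that always calls sort.SliceStable and lets dirfirst only change the comparison. The entry type and the sortEntries function are hypothetical stand-ins, not lf's actual code, and the loop in main is just one way to exercise both settings of the option.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for lf's file type.
type entry struct {
	name  string
	isDir bool
}

// sortEntries always calls sort.SliceStable; the dirfirst option only
// changes how two entries are compared, never whether sorting happens.
func sortEntries(files []entry, dirfirst bool) {
	sort.SliceStable(files, func(i, j int) bool {
		if dirfirst && files[i].isDir != files[j].isDir {
			return files[i].isDir
		}
		return files[i].name < files[j].name
	})
}

func main() {
	// Exercise both settings of the option on the same input.
	for _, dirfirst := range []bool{false, true} {
		files := []entry{{"b.txt", false}, {"docs", true}, {"a.txt", false}}
		sortEntries(files, dirfirst)
		fmt.Println(dirfirst, files)
	}
}
```

With dirfirst disabled the names still come out in order, which is the case the branch was missing.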
0ea3551
to
51144aa
Compare
Thanks for the quick review, and you're right, I was a bit tired when I made this.
nav.go
Outdated
case naturalSort:
	sort.SliceStable(dir.files, func(i, j int) bool {
		s1, s2 := normalize(dir.files[i].Name(), dir.files[j].Name(), dir.ignorecase, dir.ignoredia)
		if !dir.reverse {
			return naturalLess(s1, s2)
		} else {
			return naturalLess(s2, s1)
		}
	})
	lessfun = func(i, j int) bool {
		return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
	}
case nameSort:
	sort.SliceStable(dir.files, func(i, j int) bool {
		s1, s2 := normalize(dir.files[i].Name(), dir.files[j].Name(), dir.ignorecase, dir.ignoredia)
		if !dir.reverse {
			return s1 < s2
		} else {
			return s2 < s1
		}
	})
	lessfun = func(i, j int) bool {
		return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
	}
These two cases have the same code, so you should be able to combine them:
case nameSort, naturalSort:
	lessfun = func(i, j int) bool {
		return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
	}
It's probably also worth putting a comment stating that naturalSort is already handled above.
This actually raises the question of whether natural sorting should have its own option, instead of being lumped under sortby, since it is really a sorting modifier and not a property (e.g. name, size, time) that is used as a basis for sorting. But changing the configuration options is a breaking change, so I think it is not worth worrying about for now.
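As a rough illustration of why natural sorting behaves like a modifier, the sketch below keeps "name" as the property being compared and lets a flag choose how two names are compared. Everything here is an assumption for illustration: naturalLess is a simplified ASCII-only comparison written for this example (not lf's own function and not the collate package), and nameLess and its natural flag are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

func isDigit(c byte) bool { return '0' <= c && c <= '9' }

// naturalLess reports whether a sorts before b, comparing runs of ASCII
// digits by numeric value and everything else byte by byte.
func naturalLess(a, b string) bool {
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if isDigit(a[i]) && isDigit(b[j]) {
			// Find the end of each digit run.
			si, sj := i, j
			for i < len(a) && isDigit(a[i]) {
				i++
			}
			for j < len(b) && isDigit(b[j]) {
				j++
			}
			// Without leading zeros, a longer run is a larger number.
			na := strings.TrimLeft(a[si:i], "0")
			nb := strings.TrimLeft(b[sj:j], "0")
			if len(na) != len(nb) {
				return len(na) < len(nb)
			}
			if na != nb {
				return na < nb
			}
			continue
		}
		if a[i] != b[j] {
			return a[i] < b[j]
		}
		i++
		j++
	}
	// The string that runs out first sorts first.
	return len(a)-i < len(b)-j
}

// nameLess picks the name comparison; natural is the hypothetical modifier.
func nameLess(natural bool) func(a, b string) bool {
	if natural {
		return naturalLess
	}
	return func(a, b string) bool { return a < b }
}

func main() {
	names := []string{"file10", "file2", "file1"}
	for _, natural := range []bool{false, true} {
		sorted := append([]string(nil), names...)
		less := nameLess(natural)
		sort.SliceStable(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
		fmt.Println(natural, sorted)
	}
}
```

In both runs the basis for sorting is the name; only the comparison rule differs, which is the distinction the comment draws between a modifier and a property.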
nav.go
Outdated
// Finally sort after filtering it all
sort.SliceStable(dir.files, lessfun)
I haven't thought this through, but does it make sense to do all the filtering (i.e. dironly, hidden, filter) first before doing any sorting? It would make the logic cleaner if all of the filtering could be moved to the top of this function, but I don't know if it will cause any issues.
Since you are pretty much rewriting the entire function, it's probably a good time to clean it up as much as possible.
I bumped some dependencies yesterday, and it looks like that causes a merge conflict.
OK, so I decided to play around with this a bit more. I wrote the following code in a rush and haven't tested it much yet. I wonder if this might be more desirable:
func (dir *dir) sort() {
	dir.sortby = getSortBy(dir.path)
	dir.dirfirst = getDirFirst(dir.path)
	dir.dironly = getDirOnly(dir.path)
	dir.hidden = getHidden(dir.path)
	dir.reverse = getReverse(dir.path)
	dir.hiddenfiles = gOpts.hiddenfiles
	dir.ignorecase = gOpts.ignorecase
	dir.ignoredia = gOpts.ignoredia

	dir.files = dir.allFiles

	// filterFiles returns the files for which f reports true.
	filterFiles := func(files []*file, f func(*file) bool) (result []*file) {
		for _, file := range files {
			if f(file) {
				result = append(result, file)
			}
		}
		return result
	}

	// Do all of the filtering first so that only the remaining files are sorted.
	if dir.dironly {
		dir.files = filterFiles(dir.files, func(file *file) bool {
			return file.IsDir()
		})
	}

	if !dir.hidden {
		dir.files = filterFiles(dir.files, func(file *file) bool {
			return !isHidden(file, dir.path, dir.hiddenfiles)
		})
	}

	if len(dir.filter) != 0 {
		dir.files = filterFiles(dir.files, func(file *file) bool {
			return !isFiltered(file, dir.filter)
		})
	}

	// Build the collator from the name-comparison options.
	collopts := []collate.Option{}
	if dir.ignorecase {
		collopts = append(collopts, collate.IgnoreCase)
	}
	if dir.ignoredia {
		collopts = append(collopts, collate.IgnoreDiacritics)
	}
	if dir.sortby == naturalSort {
		collopts = append(collopts, collate.Numeric)
	}
	coll := collate.New(language.Und, collopts...)

	// Pick the base comparison for the selected sort property.
	var lessfun func(i, j int) bool

	switch dir.sortby {
	case nameSort, naturalSort:
		lessfun = func(i, j int) bool {
			return coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
		}
	case sizeSort:
		lessfun = func(i, j int) bool {
			return dir.files[i].TotalSize() < dir.files[j].TotalSize()
		}
	case timeSort:
		lessfun = func(i, j int) bool {
			return dir.files[i].ModTime().Before(dir.files[j].ModTime())
		}
	case atimeSort:
		lessfun = func(i, j int) bool {
			return dir.files[i].accessTime.Before(dir.files[j].accessTime)
		}
	case ctimeSort:
		lessfun = func(i, j int) bool {
			return dir.files[i].changeTime.Before(dir.files[j].changeTime)
		}
	case extSort:
		// Compare by extension, falling back to the name on ties.
		lessfun = func(i, j int) bool {
			cmp := coll.CompareString(dir.files[i].ext, dir.files[j].ext)
			return cmp == -1 || cmp == 0 && coll.CompareString(dir.files[i].Name(), dir.files[j].Name()) == -1
		}
	}

	// Wrap the comparison: reverse swaps the arguments.
	if dir.reverse {
		oldlessfun := lessfun
		lessfun = func(i, j int) bool {
			return oldlessfun(j, i)
		}
	}

	// dirfirst is applied last, so directories stay first even when reversed.
	if dir.dirfirst {
		oldlessfun := lessfun
		lessfun = func(i, j int) bool {
			if dir.files[i].IsDir() == dir.files[j].IsDir() {
				return oldlessfun(i, j)
			}
			return dir.files[i].IsDir()
		}
	}

	sort.SliceStable(dir.files, lessfun)

	dir.ind = max(dir.ind, 0)
	dir.ind = min(dir.ind, len(dir.files)-1)
}
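The wrapping of the comparison function in the proposal above can be tried out in isolation. The following is a minimal sketch using only the standard library; the entry type, lessFunc, reversed, dirsFirst, and sortEntries are hypothetical names, and the comparison works on elements rather than on slice indices as the proposal does.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for lf's file type.
type entry struct {
	name  string
	isDir bool
}

// lessFunc compares two entries directly instead of by slice index.
type lessFunc func(a, b entry) bool

func byName(a, b entry) bool { return a.name < b.name }

// reversed flips a comparison by swapping its arguments.
func reversed(less lessFunc) lessFunc {
	return func(a, b entry) bool { return less(b, a) }
}

// dirsFirst places directories before files and defers to less otherwise.
func dirsFirst(less lessFunc) lessFunc {
	return func(a, b entry) bool {
		if a.isDir == b.isDir {
			return less(a, b)
		}
		return a.isDir
	}
}

func sortEntries(files []entry, less lessFunc) {
	sort.SliceStable(files, func(i, j int) bool { return less(files[i], files[j]) })
}

func main() {
	files := []entry{{"a.txt", false}, {"src", true}, {"b.txt", false}, {"bin", true}}
	// Wrapping order matters: dirsFirst goes outermost so that reversing
	// the name order does not push directories to the end.
	sortEntries(files, dirsFirst(reversed(byName)))
	fmt.Println(files)
}
```

Applying the wrappers in the opposite order, reversed(dirsFirst(byName)), would reverse the directory grouping as well, so the order in which the two options are applied is a design choice worth keeping explicit.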
Thanks, this was actually in my head at some point, but I was a bit wary of making changes that big given my limited knowledge of Go and its slices. Good thinking about I included your changes, except that I made the filtering go through the file list only once.
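One way to make the filtering go through the file list only once is to combine the enabled predicates into a single one before iterating. The sketch below is an assumption about how that could look, not the code that was pushed to the branch; entry, predicate, allOf, buildFilter, and the dot-prefix hidden check are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical stand-in for lf's file type.
type entry struct {
	name  string
	isDir bool
}

type predicate func(entry) bool

// allOf combines predicates so that one pass over the list applies them all.
func allOf(preds ...predicate) predicate {
	return func(e entry) bool {
		for _, p := range preds {
			if !p(e) {
				return false
			}
		}
		return true
	}
}

// buildFilter collects only the predicates that the options enable.
func buildFilter(dironly, showHidden bool) predicate {
	var preds []predicate
	if dironly {
		preds = append(preds, func(e entry) bool { return e.isDir })
	}
	if !showHidden {
		// Simplified hidden check: a leading dot (assumption for this sketch).
		preds = append(preds, func(e entry) bool { return !strings.HasPrefix(e.name, ".") })
	}
	return allOf(preds...)
}

// filter walks the list once and keeps the entries accepted by keep.
func filter(files []entry, keep predicate) []entry {
	var result []entry
	for _, f := range files {
		if keep(f) {
			result = append(result, f)
		}
	}
	return result
}

func main() {
	files := []entry{{".git", true}, {"src", true}, {".env", false}, {"main.go", false}}
	fmt.Println(filter(files, buildFilter(true, false)))
}
```

Compared with chaining one filtering pass per option, this allocates a single result slice and visits each entry once.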
6640438
to
581185e
Compare
In fact, sort after all the filtering is done, makes more sense
f138d49
to
abfb1fd
Compare
I didn't realize it was possible to combine filters in the way you have done, nice. Anyway, the code looks quite good now. There's a minor issue with formatting that is causing the CI build to fail, but otherwise it is probably close to being in a state where it can be merged. Just out of interest, how much of an increase in performance is there with the new changes?
Not very different:
Before:
After:
Basically, a little overhead for small directories, though negligible in absolute terms. Significant gains for the big ones (not even counting the trick of skipping dirfirst when dironly is set, or having less to sort when a lot is filtered out).
Mostly authored by @joelim-work
abfb1fd
to
c152029
Compare
Sorry to trouble you, but I did some more testing and I found that the The following examples all print

c := collate.New(language.Und)
fmt.Println(c.CompareString("a", "B")) // "B" should come first since it is uppercase

c := collate.New(language.Und, collate.IgnoreCase)
fmt.Println(c.CompareString("a", "B"))

c := collate.New(language.Und)
fmt.Println(c.CompareString("é", "f")) // "é" should come last since it is a special character

c := collate.New(language.Und, collate.IgnoreDiacritics)
fmt.Println(c.CompareString("é", "f"))

I'm not sure how to go about solving this though, I haven't used the
Not troubling me, I'm quite happy you're doing so much testing (which I should be doing!). Filed golang/go#67296 for now, because it sure as hell looks like a bug to me.
Hello, first contribution and first time touching Go here, so don't be nice and roast me well if I did some dumb things.
I was looking at the sort function because I want to implement some features and noticed some less than optimal code, be it in performance or "looking nice to me".
I didn't add any dependency (though text isn't indirect anymore), but https://github.com/jeandeaual/go-locale might be a good idea to use the system locale instead of hardcoding English. The other improvement that might be a good idea is to use slices.SortStableFunc, as recommended by Go, though it's only built in starting with 1.21.
Here are the performance improvements I measured by timing sort and logging
Before:
After:
The large test directory was created with touch test/{1..100000} as a more throughput-oriented benchmark, since the other directories are sorted basically instantaneously.
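For reference, here is a minimal sketch of what timing a sort with slices.SortStableFunc could look like. It assumes a Go toolchain whose standard library includes the slices and cmp packages (Go 1.21 or later), sorts plain strings rather than lf's file entries, and uses the hypothetical helpers makeNames and sortNames to mimic the 100000-entry test directory. No timings are claimed here; the numbers depend on the machine.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"strconv"
	"time"
)

// makeNames builds n numeric names in descending order, loosely mimicking
// the entries created by touch test/{1..100000}.
func makeNames(n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = strconv.Itoa(n - i)
	}
	return names
}

// sortNames sorts in place with slices.SortStableFunc and returns the
// elapsed wall-clock time.
func sortNames(names []string) time.Duration {
	start := time.Now()
	slices.SortStableFunc(names, func(a, b string) int {
		return cmp.Compare(a, b)
	})
	return time.Since(start)
}

func main() {
	names := makeNames(100000)
	elapsed := sortNames(names)
	fmt.Printf("sorted %d names in %v\n", len(names), elapsed)
}
```

Unlike sort.SliceStable, the comparison receives the elements themselves and returns a negative, zero, or positive integer, so existing index-based less functions would need to be adapted.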