Improve the performance of company-etags
#903
Comments
Good news. Looks Tested with Linux kernel, my new simple string match algorithm takes 6 seconds (read the file and build the list), while
Not inefficient, or not efficient? If it's the latter, do you have a smaller patch for I'm a bit skeptical, though, considering I looked into its performance a couple of years ago.
See my pull request; when I started coding, I changed the design a little bit. Someone has been using my patched version for two weeks and the feedback is very positive. See https://emacs-china.org/t/c-emacs/10068/11 in Chrome, and use Google Translate to translate the page. Here is the user's translated comment, I used a similar partition algorithm on tumashu/pyim#277 before working on company. So it's very mature. The key idea comes from MySQL's partition algorithm.
That's not something that can be applied to Emacs, though.
I'm not sure what you mean. Points 2, 3, and 6 from the first post are used in the final implementation. The others are not implemented. I could explain in more detail if you have any questions.
A patch against Emacs core?
You said that your "new simple string match algorithm" speeds things up by itself (aside from the other improvements). Or that's at least how I understood it.
It's a pure lisp algorithm which is part of Old code is like The new code is like,
I divide In the Linux kernel, there are 3 million tag names. I divide them into 52 partitions (a..z, A..Z). For a given prefix, only one partition is searched and the other 51 partitions are skipped. So, assuming the names are spread evenly across partitions, about 2,942,307 string comparisons (3,000,000 * 51 / 52) are skipped. That's the algorithm implemented in I did say before that I could speed up the code from Emacs core. But later I gave up the idea after investigation (like many of my other ideas).
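To make the partitioning idea concrete, here is a minimal sketch in Emacs Lisp. It is not the code from the pull request: the names `demo-partition-tags` and `demo-partition-candidates` are hypothetical, and the sketch keys a hash table on whatever the first character is rather than on exactly 52 letter partitions.

```elisp
(require 'cl-lib)

(defun demo-partition-tags (tag-names)
  "Group TAG-NAMES into a hash table keyed by each name's first character.
Hypothetical helper for illustration; empty names are skipped."
  (let ((table (make-hash-table :test #'eql)))
    (dolist (name tag-names)
      (unless (string= name "")
        ;; Each bucket holds only the names sharing one first character.
        (push name (gethash (aref name 0) table))))
    table))

(defun demo-partition-candidates (prefix table)
  "Return the names in TABLE that start with PREFIX.
Only the bucket for PREFIX's first character is scanned; all other
buckets are skipped without any string comparison."
  (unless (string= prefix "")
    (cl-remove-if-not (lambda (name) (string-prefix-p prefix name))
                      (gethash (aref prefix 0) table))))
```

Building the table costs one pass over all names, so this trades a slightly longer setup for cheaper lookups on every later prefix query.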
OK, so it's not string matching, it's matching with partitioning. I'm familiar with the idea. It doesn't speed up the creation of the completion table at all, right? If anything, it might make that phase longer. It can speed subsequent completion, though. Do you have any performance numbers, with and without partitioning?
Because..?
"speed up the creation of the completion table" is another algorithm. Let me explain the technical details. In old code,
It's slow simply because of overhead from extra layer For example, consider Here is its definition:
In my new code, The key algorithm is actually just 4 lines,
Then I found my code is fast enough. Besides, it's Lisp and more flexible than C. So I gave up on submitting a patch to the Emacs core team. I can give you the performance numbers with and without partitioning later.
I'd really like to see the performance numbers from that change alone.
Code to test linux kernel (tags file is 202M):
Link to the full test video: https://youtu.be/9t29LfYx10w
I wonder how much of that speedup comes from the removal of
No, I don't support implicit tag names. I dropped support for etags. Everybody is using Ctags these days.
So if it's ctags, why is the file name
The file name is created by ctags. Quote from ctags manual,
I think the name of Currently Exuberant Ctags is the default standard. But Universal Ctags is gaining popularity. These two are compatible. Other tag programs are based on ctags (https://github.com/dan-t/rusty-tags, for example).
OK, but if we're using the etags format, we have to support it fully. Does Anyway,
Could have two modes,
The problem with making etags mode the default is that most users won't bother switching to ctags mode even when they are using ctags only. So the best solution is to make ctags mode the default and warn the user when a tags file generated by etags is detected. The only problem is I don't know anybody using etags. Even for Ruby, https://www.google.com/search?q=ctags+ruby&oq=ctags+ruby So maybe it's not worth the effort. I understand etags can produce two kinds of tag lines. The verbose kind is the format used by both etags and ctags. The compact kind (implicit tag name) is used only by etags.
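For readers unfamiliar with the two kinds of tag lines, here is a rough sketch of how they can be told apart. It assumes the usual TAGS layout, where a tag line is the definition text, a DEL character (`\177`), an optional tag name followed by `\001`, and then `line,offset`. The function name `demo-explicit-tag-name` is hypothetical and is not part of company-etags.

```elisp
(defun demo-explicit-tag-name (line)
  "Return the explicit tag name stored in LINE, or nil if there is none.
LINE is one tag line from a TAGS file.  A verbose line looks like
  TEXT \\177 NAME \\001 LINE,OFFSET
while a compact line omits NAME and \\001, leaving the name implicit."
  (when (string-match
         "\177\\([^\177\001\n]+\\)\001[0-9]*,[0-9]*\\'"
         line)
    (match-string 1 line)))

;; Verbose line: the name is stored explicitly after the DEL character.
(demo-explicit-tag-name "struct list_head {\177list_head\00142,1234")

;; Compact line: no explicit name, so the function returns nil and the
;; name would have to be derived from the definition text instead.
(demo-explicit-tag-name "int foo(\17710,200")
```

A ctags-only mode can stop at the explicit name, which is one plausible reason it can be faster; full etags support also has to handle the nil case.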
@dgutov, check my latest code. It now supports both etags and ctags. User can also Besides, users can customize the tags file name now.
The variable name could use some work, but ok. How's the performance in the "slow" case?
Benchmark code:
;; Test in Linux kernel code
(let* ((gc-cons-threshold most-positive-fixnum))
  (message "%S vs %S"
           (benchmark-run-compiled 1
             ;; Run `find . -name "*.[ch]" | etags -` to create tags file
             (setq raw-content (with-temp-buffer
                                 (insert-file-contents "~/projs/linux-master/TAGS.etags")
                                 (buffer-string)))
             (setq company-etags-support-ctags-only nil)
             (company-etags-extract-tagnames raw-content))
           (benchmark-run-compiled 1
             ;; Run `find . -name "*.[ch]" | ctags -e -L -` to create tags file
             (setq raw-content (with-temp-buffer
                                 (insert-file-contents "~/projs/linux-master/TAGS")
                                 (buffer-string)))
             (setq company-etags-support-ctags-only t)
             (company-etags-extract-tagnames raw-content))))
Result:
@dgutov, is there anything else I need to do before merging?
@dgutov, I noticed you reverted my pull request. What else do I need to do to get my code merged? I could create a new pull request if required.
For posterity: see the linked PR. |
Per discussion #877
There is much room for improvement:
1. Use the producer/consumer pattern. The consumer only searches for matches in the candidate list and should know nothing about the tags file. The producer is responsible for converting the tags file into the candidate list (maybe in a different thread if possible).
2. Assign all the tag names to different partitions; each partition contains only the strings with the same first character.
3. As I mentioned before, we could use the CLI program `diff` to get the minimal update patch for the candidate list.
4. More efficient APIs and 3rd party CLI tools should be used. For example, if the candidate list is just a plain string containing lines, the CLI program `sort` could be used to sort the lines, which might be faster than sorting a Lisp list of strings.
5. Build a cache system, like a CPU cache. Say the user inputs "get" to get all the tags starting with "get". It's very possible she will need the same tags next time.
6. Cache the candidates. For example, when the user inputs `test1` to search for all tag names starting with `test1`, it's very possible that the candidates starting with `tes` have already been returned, so why not cache the `tes` result in a variable? A search for `test1` could then reuse the `tes` result.

@dgutov, if you are interested, I can send you a new pull request asap.