I want to know about CPD's principle. I guess:
And if it is based on the tokens' hash codes, how can I get the hash code from CPD?
@cyw3 that's an interesting question, with a hateful answer: it depends.

At the core, CPD takes tokens from a "lexer source" and hashes their contents over a rolling window of the requested length, set via the `--minimum-tokens` argument. The general matching algorithm is implemented in `MatchAlgorithm`.

However, this "lexer source" is provided by each language module, and the particulars may vary. The basic lexer (`AnyTokenizer`) just tokenizes the text of the analyzed file with no knowledge of the grammar itself. No language officially supported by PMD uses it, but it can be used to analyze any text file while normalizing whitespace.

Officially supported languages use actual lexers for the language, which not only normalize whitespace but give us one extra benefit: we can ignore comments. Moreover, some languages further refine these token sources, allowing extra behaviors, such as ignoring literals or identifiers.

So, bottom line: it depends on the language module.

As for your last question, the hash of each window is not part of the public API; it is just computed during analysis and discarded immediately afterwards. It can be obtained directly from the …
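To make the rolling-window idea concrete, here is a small illustrative sketch in Java. It is *not* PMD's actual `MatchAlgorithm` (which uses a more elaborate rolling hash and bookkeeping); the class and method names below are made up for demonstration. It hashes every window of `minTokens` consecutive tokens, and on a hash match re-compares the tokens to rule out collisions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy demo of windowed token hashing, loosely in the spirit of CPD.
// Not PMD code: names and structure here are illustrative assumptions.
public class RollingWindowDemo {

    // Returns pairs of start indices [first, second] of identical windows.
    static List<int[]> duplicateWindows(List<String> tokens, int minTokens) {
        Map<Integer, List<Integer>> seen = new HashMap<>();
        List<int[]> dupes = new ArrayList<>();
        for (int i = 0; i + minTokens <= tokens.size(); i++) {
            // Hash the window's contents (List.hashCode is content-based).
            int h = tokens.subList(i, i + minTokens).hashCode();
            List<Integer> starts = seen.computeIfAbsent(h, k -> new ArrayList<>());
            for (int j : starts) {
                // Hash match: confirm token-by-token to rule out collisions.
                if (tokens.subList(j, j + minTokens)
                          .equals(tokens.subList(i, i + minTokens))) {
                    dupes.add(new int[] { j, i });
                }
            }
            starts.add(i);
        }
        return dupes;
    }

    public static void main(String[] args) {
        // Token stream for: int a = 1; int b = 2; int a = 1;
        List<String> tokens = Arrays.asList(
            "int", "a", "=", "1", ";",
            "int", "b", "=", "2", ";",
            "int", "a", "=", "1", ";");
        for (int[] d : duplicateWindows(tokens, 5)) {
            // prints: duplicate window at 0 and 10
            System.out.println("duplicate window at " + d[0] + " and " + d[1]);
        }
    }
}
```

A real implementation hashes incrementally (a rolling hash updates in O(1) as the window slides, instead of rehashing the whole window), and the language modules decide what the tokens look like before they ever reach this stage.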