Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory leaks #94

Open
phrazer opened this issue Sep 9, 2014 · 27 comments
Open

memory leaks #94

phrazer opened this issue Sep 9, 2014 · 27 comments

Comments

@phrazer
Copy link

phrazer commented Sep 9, 2014

when i try it with > 100kb files, it almost always gets to this message:

  • FATAL ERROR: JS Allocation failed - process out of memory

it also runs long time, before showing this error (few minutes). Would it be possible to process diff with multiple child processes. Every child would process part of html, use less memory and we would utilize multiple CPUs (since all servers nowadays have few processors).

Or some other memory handling, to prevent leaks? Otherwise, great code, very usefull!

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

Actually, I've tried it with > 100kb files and it works.

But I suppose that your problem concerns the recursive algorithm which html differ uses.
The recursion depth is too big, so the tool fails.

In the version 1.0.0 I do not use any recursion, so I think that html-differ will work much faster and do not break at your case.

It would be great if you give me an example of a file which now fails.

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

Or you can even help me and test the prototype of version 1.0.0 )

Checkout the branch support/1.0.0 (be careful! I've broken the API) and test it on your files.

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

I tried to install version 1.0.0 but it always says its running 0.5 when you check with -v (version)

Also it runs long and then dies. What is the procedure to install version 1.0.0. correctly? I downloaded branch support/1.0.0 and then went to that folder and executed "npm install ."

I can send you files, can you give me your email, or something to comunicate? Best,

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

Yep, version 1.0.0 is in developing mode.

You should clone the branch support/1.0.0 from github, go to the cloned folder and run

npm install

bin/html-differ --help

my email:
[email protected]

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

Hmm strange, it still doesnt work. And you email doesnt work either, it says no such user. best,

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

Send here. I'll think about this problem.

[email protected]

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

I sended. I think it gots problems with bigger htmls because like you said recursive algorithm. But with 1.0.0 it doesnt seem to be any diffrent. Maybe we could split html to few slices, and process them separatly.

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

Thank you! I'll figure it out as soon as possible!

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

OK, thanks.

If I can help in anyway, please tell me. Best regards,

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

I've understood why this happens.

It is so many diffs in the given HTMLs, so the work of the module jsdiff (I use it as dependence in html-differ) becomes really slow! Yes! It works, but the process of the comparison is so long that the Operating System thinks that the program fails and stop it in order not to break the work of OS in general.

I've opened an issue in jsdiff == > issue34

P.S. If you run html-differ on files > 100kb and they will have not such many diffs - the tool will not fail. (even if you use version 0.5.0)

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

Hmmm,

The problem is that a lot of smaller files (>40+50 kb) also failes. If jsdiff adds maximum loop, we didnt solve the problem, because we dont show all diffs. The files I sended are real life webpages, so they should make problems.

Maybe the solution would be to split html to chunks and parallel process this chunks. Just an idea. Im more into php, so dont have very good knowledge about node.js.

Best regards,

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

The problem is not in sizes of HTMLs, but in the amount of diffs between them.

How can you split the HTML? What criteria to use? If you split it , you can break it and the comparison will be not fair.

The idea of the differ was to compare not absolutely different HTMLs.

Is it necessary to compare HTMLs if they are absolutely different and you can see the diffs without any tools?

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

As the variant. jsdiff adds maximum loop.

You see not all diffs, fix them or do something else what you want and run html-differ again.

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

I've found the solution!
Seems, it is the best!

In current implementation of html-differ the module jsdiff compares texts by words (there are several mods: charDiff, wordDiff, lineDiff ...). If we call jsdiff in mode lineDiff, the amount of diffs becomes less, so jsDiff does not fails.

Solution:

This mode of jsdiff will be configurable in html-differ through an option, so you can choose how to compare HTMLs (by chars, byWords, byLines).

Does this variant suit you?

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

Hey,

Great 👍 Is the result still the same if you compare line by line instead of word by word? Because of classes, and other things you meantioned in README?

How can I apply this patch, what file should I change to enable this, so I can test some more files? :)

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

I am going to do this in version 1.0.0. There are some moments concerning this option which should be handled.

@eGavr eGavr added this to the 1.0.0 milestone Sep 10, 2014
@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

If I will compare texts by lines, the option ignoreWhitespaces will work in another way - it will ignore all whitespaces except \ns. If more than two \ns go successive, they will turn into one \n.

For example, if you set the option which compares files by lines, ignoreWhitespaces will work in the following way:

<span>


</span>

turns to

<span>
</span>

Does this variant suit you?

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

Yep, it should do it, as long as both are changed same way. Commit changes asap, and I will test with few diffrent htmls.

I had another idea, what if we first check line by line, and then if lines are the same go on, if not we also check word by word that line. Maybe then you get best from both worlds - speed and great diff tool. Just an idea, dont know if its possible to make it.

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

Hmmm...

Not bad, not bad...

I'll think about it and fix it ASAP!

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

Great,

Thanks for fast replay. I will prepare diffrent (sizes, tags, etc...)
html examples files and try and debug.

Best regards,

@eGavr
Copy link
Member

eGavr commented Sep 10, 2014

I will do the implementation of a new option (my solution), it should solve your problem too.

And after that in minor version I 'll think about the enhancement that you have proposed, because I also need ASAP something working ) and after release of 1.0.0, I'will think about speed and quality enhancements.

Created an issue concerning your idea == > change the comparison algorithm

@phrazer
Copy link
Author

phrazer commented Sep 10, 2014

Yeah, I agree with you 👍 👍

@eGavr
Copy link
Member

eGavr commented Sep 11, 2014

I can not understand, why do you need differ in tasks when you can find diffs without any tools?
I tried to implement the idea of comparison by lines, but there are still so many diffs that jsdiff fails.

Also see the comments about you ideas concerning the new algoritm of comparison ==> #95

@eGavr eGavr removed this from the 1.0.0 milestone Sep 11, 2014
@phrazer
Copy link
Author

phrazer commented Sep 11, 2014

Because I really like your tool, and it was working great. I dont understand, why you dont want it to work on bigger files too. Can you just add option to choose between lines, words as option? I tried to do it myself but could find what function is calling diff.js. Thanks,

@eGavr
Copy link
Member

eGavr commented Sep 11, 2014

It is not a problem to compare texts by lines!

I've implemented this option, but it will not help you.

Because as I've already said you that even in this mode! there are so many diffs that jsdiff fails.

When we compare by words, we receive more than 50000 diffs - it is too big for jsdiff.
And even when we compare be line, we recieve more than 6000 diffs - it is very big result for jsdiff too.

@phrazer
Copy link
Author

phrazer commented Sep 11, 2014

Hmm, I see. In my tests no problem to 2000 - 3000, then it stops. Could we optimise something else?

@eGavr
Copy link
Member

eGavr commented Sep 11, 2014

It seems No! It is absolutely the task of jsdiff(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants