
Add a way to transfer JavaScript data to .NET faster than via JSON #254

AllNamesRTaken opened this issue Apr 29, 2021 · 9 comments

@AllNamesRTaken

Hello,
Great library!
But I make calls to APIs and build data in the form of a table that I return to the .NET host.
The format of the data is something like {[key: string]: string | number | boolean | null | undefined}[]
This format is straightforward to convert into a DataTable, but the performance is abysmal due to slow reads of property data.
I guess it is due to the following code in the GetProperty call:
engine.MarshalToHost(engine.ScriptInvoke(() => target.GetProperty(index)), false);
Is there a way to tell the engine to return something like Array<Dictionary>, or just not a DynamicObject?

If not, are there workarounds or plans to solve this?

@ClearScriptLib
Collaborator

Hi @AllNamesRTaken,

Thanks for your kind words!

If we understand correctly, you're building an array of dictionaries in script code and passing it to a .NET method for processing. The .NET method uses a separate dynamic call to retrieve each dictionary entry. This results in poor performance due to a large number of hops across the host-script boundary.

If that's correct, our first suggestion would be as follows. In script code, once you've constructed your data, transform it into an array of JSON strings using JSON.stringify. Pass the result to your .NET method. On the .NET side, use a library like Newtonsoft.Json to deserialize each JSON string into a .NET dictionary.

If performance is still not up to par – for example, if your array is so large that retrieving each JSON string separately is a problem – you should be able to convert all the data into one JSON string and pass it to .NET in a single hop.
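A minimal JavaScript sketch of both variants described above (the row shape here is invented for illustration):

```javascript
// Hypothetical sample rows: an array of flat objects with
// primitive values, like the table data in question.
const rows = [
  { id: 1, name: "alpha", score: 9.5, active: true },
  { id: 2, name: "beta", score: null, active: false },
];

// Option 1: one JSON string per row (one host hop per row).
const rowJson = rows.map(row => JSON.stringify(row));

// Option 2: the whole array as a single JSON string (one hop total).
const allJson = JSON.stringify(rows);
```

On the .NET side, Newtonsoft.Json can rebuild each row with JsonConvert.DeserializeObject<Dictionary<string, object>>(...), or deserialize the single-string payload all at once.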

It's also possible that your data is so large that converting it all into a JSON string is impossible. In that case, you might have to get more creative. One possibility might be to use a JavaScript typed array as your data transfer medium. ClearScript offers fast copying to and from typed arrays.
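As a rough sketch of the typed-array route — applicable only when the cells are numeric (the column layout below is assumed for illustration):

```javascript
// Flatten a purely numeric table into a single Float64Array so the
// host can copy it across the boundary in one fast operation,
// instead of reading each cell through a proxy.
const columns = ["id", "score"];
const rows = [
  { id: 1, score: 9.5 },
  { id: 2, score: 3.25 },
];

const flat = new Float64Array(rows.length * columns.length);
rows.forEach((row, r) => {
  columns.forEach((col, c) => {
    flat[r * columns.length + c] = row[col];
  });
});
```

The host would then need the column list and row count separately to reassemble the table.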

Thoughts?

@AllNamesRTaken
Author

AllNamesRTaken commented Apr 30, 2021

Thank you for responding so quickly.
I believe you hit the nail on it's head. The JSON.stringify + json.net convert is the solution i use as a workaround. But for large tables it is still a costly solution when compared to the speed of V8, which is why i had hoped for a solution where i.e. leafs / object properties with basic data types could have their values automatically converted. That way the marshaling would only happen per row in a datatable (edit. js object array).

Is that of interest or a bad idea?
[EDIT] words are difficult :)

@ClearScriptLib
Collaborator

ClearScriptLib commented Apr 30, 2021

Hi @AllNamesRTaken,

That way the marshaling would only happen per row

Please clarify. Our understanding is that each dictionary in the source data ends up being a row in the final table. If that's correct, and if you're transferring each dictionary as a JSON string, you're already marshaling only once per row.

But for large tables it is still a costly solution when compared to the speed of V8

Since you're starting with a pure JavaScript data structure, marshaling is unavoidable.

As a general principle, ClearScript favors proxy marshaling over automatic conversion. The goal is to avoid data loss, but as you've noticed, accessing data across the host-script boundary is expensive.

In performance-sensitive scenarios such as yours, where that expense is unacceptable, the best solution is to alter your data access patterns in order to minimize hops across that boundary. By switching to JSON strings for the dictionaries, you've taken a big step in that direction (incidentally, we'd love to get an idea of the improvement it yielded over your original approach).

Beyond that, here are some ideas for further gains:

  1. Transfer your entire array as a single JSON string. The performance gain is likely to depend on the array size.

  2. Can you think of any way to optimize your data pre-transfer? For example, do the rows in your final table use a common schema? If so, it may be more efficient to transfer the data as a list of value arrays rather than dictionaries.

  3. Use a format that's more efficient than JSON. Standard JavaScript only supports JSON, but there are libraries out there for things like UBJSON and MessagePack. If you serialized your data to a binary buffer (Uint8Array et al), ClearScript could transfer it to a .NET array very efficiently. On the other hand, it's possible that V8's native JSON serializer is faster than anything one could code in JavaScript, even if the resulting data is larger.
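To illustrate idea 2 above — if every row shares a schema, the key names need not be repeated per row (the schema and row shapes here are invented):

```javascript
// Transfer one header array plus a value array per row, rather than
// repeating the key names in every dictionary.
const schema = ["id", "name", "score"];
const rows = [
  { id: 1, name: "alpha", score: 9.5 },
  { id: 2, name: "beta", score: 1.25 },
];

const payload = JSON.stringify({
  schema,
  rows: rows.map(row => schema.map(col => row[col])),
});
```

For a 19-column table this trims 19 key strings from every serialized row.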
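And a toy version of idea 3 — packing cells into a binary buffer. A real implementation would use a library like MessagePack; this sketch assumes purely numeric data and encodes every cell as a float64:

```javascript
// Write each numeric cell as a little-endian float64 into an
// ArrayBuffer; the resulting Uint8Array can then be transferred
// to a .NET byte array in a single fast copy.
const rows = [[1, 9.5], [2, 3.25]];
const cols = 2;

const buffer = new ArrayBuffer(rows.length * cols * 8);
const view = new DataView(buffer);
rows.forEach((row, r) => {
  row.forEach((value, c) => {
    view.setFloat64((r * cols + c) * 8, value, true);
  });
});
const bytes = new Uint8Array(buffer);
```

The .NET side would decode with BitConverter or a BinaryReader, given the agreed row/column counts.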

Finally, as you suggested, ClearScript could natively implement some form of fast structured data transfer. That's something we've had on our backlog for a while, and we'll definitely look into it for the next release.

Please send any additional thoughts or findings!

Thanks!

@AllNamesRTaken
Author

Hi,
The data was about 2400 rows long and 19 columns wide. We returned it as a js array of objects, with each row being one js object with 19 keys. Values were a mix of number and string types.

The original solution was to loop over the dynamic object representing the array and create DataRows in a DataTable, one per dynamic representing a row object, then per row loop over the keys and fill the cells with values. It was obvious from VS performance metrics that property access was the performance drain.
Total time on my machine while debugging, though fluctuating, was about 200-300 ms for this method.

The current method is to JSON.stringify the whole array of objects in V8 and then deserialize to a DataTable with Json.NET.
Total time in debug mode for the Json.NET approach was about 50 ms, pretty deterministic.

For 2400×19 cells this feels slow.
Optimal would be iterating over the data as a pure array of dictionaries or something similar, but I can understand the goal of not marshaling automatically in order to avoid type-conversion errors. Many use cases, however, are not in the risk zone for these errors.

The middle ground I was pondering was to automatically translate basic value properties such as string, number, null, and bool on objects. This would reduce the need to marshal on every property access when transforming into a DataTable. If you still wish to allow pure access to V8 types, there could be a separate function on the ClearScript object to access the pre-translated data, falling back to marshaling if no value was found.

This would reduce the time in my example by close to 20 times, to a more reasonable total of 10-15 ms. The dream, of course, would be a way to ask for automatic translation of arrays as well, which would allow for very large data sets at low overhead.

Regards
Joel

@ClearScriptLib
Collaborator

ClearScriptLib commented Apr 30, 2021

Hi Joel,

Thanks for providing that information! We'll run some experiments and get back to you. Hopefully your current JSON-based solution is good enough in the short term.

The middle ground I was pondering was to automatically translate basic value properties such as string, number, null, and bool on objects. This would reduce the need to marshal on every property access when transforming into a DataTable.

It would appear that, by "translate", you mean transfer all primitive-valued properties during the initial proxy handshake, so that retrieving them no longer requires a round trip to the script engine. Is that correct? If so, it's an interesting idea; the problem is that access to those properties would no longer be "live", and the proxy could diverge from the original JavaScript object unless some coherency protocol were in place.

Quick question: We'd like to provide some kind of fast structured data transfer, but for practical reasons we're thinking of limiting it to JSON-compatible data. In your case, that would exclude undefined as a legal property value. Would that be a big problem for you?

Thanks again!

@AllNamesRTaken
Author

AllNamesRTaken commented Apr 30, 2021 via email

@ClearScriptLib
Collaborator

Hi again @AllNamesRTaken,

We've now run a bunch of tests with randomly generated JavaScript data similar in "shape" to the data in your scenario. Our goal was to find a way to transfer it more rapidly than your current solution involving JSON.stringify and JsonConvert.DeserializeObject.

Unfortunately, we've had to abandon this effort. The fundamental problem is that V8's public C++ API incurs enough overhead to offset any performance gain we can get from using a better format than JSON. In fact, in our tests, JSON.stringify consistently produced its results in less time than it took ClearScript to simply iterate the data via the public API.

Interestingly, we then found that JavaScript code could iterate the data much faster than C++, but script-based serialization was much slower. So we tried a hybrid solution, where a JavaScript function iterated the data and called out to C++ for serialization. That approach managed to pull out a slight win over JSON.stringify, but only as long as the data was all numeric. When we added strings to the mix, JSON.stringify again won easily.

We'll keep this issue open as a future enhancement, and we'll watch for new V8 APIs that might help. In the meantime, please don't hesitate to send any additional thoughts or findings our way.

Thank you!

@ClearScriptLib ClearScriptLib changed the title Performance of converting to datatable is bad Add a way to transfer JavaScript data to .NET faster than via JSON serialization May 10, 2021
@ClearScriptLib ClearScriptLib changed the title Add a way to transfer JavaScript data to .NET faster than via JSON serialization Add a way to transfer JavaScript data to .NET faster than via JSON May 10, 2021
@promontis

This is such an interesting discussion! Have you tried https://github.com/lucagez/slow-json-stringify or https://github.com/fastify/fast-json-stringify?

Did you also try Protobuf?

@ClearScriptLib
Collaborator

Thanks @promontis! We haven't tested any external serialization libraries on the JavaScript side, but we certainly encourage such experimentation, and we welcome any findings.
