Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Bug) Slow or no XML export with "Child elements prefix" enabled #132

Open
OhNoxius opened this issue Dec 16, 2020 · 19 comments
Open

(Bug) Slow or no XML export with "Child elements prefix" enabled #132

OhNoxius opened this issue Dec 16, 2020 · 19 comments
Labels

Comments

@OhNoxius
Copy link

Describe the bug
For a while now (since update v61 I believe?) this plugin fails to export some of my large Google Sheets with the "Child element prefix" enabled, or does so at an extremely slow speed (talking several minutes). Examples:
NO EXPORT: https://docs.google.com/spreadsheets/d/1UjLKdSLz0bE2vR0zxUIEN3MDP351Fx71Cxg7vAOCn7c/edit?usp=sharing
SLOW EXPORT: https://docs.google.com/spreadsheets/d/1GW-fs45EtO0ODb3riHRF9A4DHszCQd_r0AyvBw5veqw/edit?usp=sharing

For clarification: Problem persists in various browsers (Chrome, Firefox, Edge). Export with "Child element prefix" de-activated works just fine, also export to JSON (default settings) encounters no problems.

These sheets are pretty large, and make heavy use of the IMPORTRANGE function to import data from other sheets. They are also managed by different accounts, two of which I am currently logged in to. Before the specific update I would export these sheets regularly, and although there were a bit smaller in size then (the size keeps growing over time) they were roughly in the same order of magnitude. The export would usually complete in less than a minute.

I'm sorry for only reporting this issue now, since it started occuring some time (and some versions) ago and I forgot which specific update started it.

Love this add-on! Used to use it every day but now that's not possible anymore. Hope a solution is possible!
Karel

@OhNoxius OhNoxius added the bug label Dec 16, 2020
@Synthoid
Copy link
Owner

Interesting. V61 is when I started messing with a few XML features so that lines up with your stated timeframe. Likely that means this has something to do with either exporting booleans as integers or namespace support. Very strange that it only occurs when exporting child elements though... I'll look around and see if anything stands out.

@Synthoid
Copy link
Owner

Finally managed to get some time to test this. Both files do take some time to export, but both do indeed export (with or without child elements). I did have to copy the file contents manually since I did not have permission to access the sheets that were being imported from. That could mean that the IMPORTRANGE function is messing up the export somehow, as you theorized. I'll try creating duplicate sheets that import their data as well so see what happens there. I had the following export times:

Name Child Elements Time (Approx)
jazz_BE_concerts false 3:14
jazz_BE_concerts true 4:40
belgianjazzwiki false 0:23
belgianjazzwiki true 0:39

It is troubling that child elements take longer to export, but I think I can clean that up a bit by caching key parameters (including namespace and prefix data). Would you be able to provide the settings you used to export the data? Maybe I'm missing something that you have enabled.

@OhNoxius
Copy link
Author

Ok, thanks for addressing this so quickly. Other settings that are not default:

  • Include first column
  • Root element: "jazz_BE_concerts"
  • Replace existing file (sometimes)
  • Minify data (sometimes)

Rest should be as is. The sheets from where I import data go down multiple levels - all in all it's around 30 sheets that are combined in this 1 document. Sharing them all with you is probably not the most elegant solution?

@Synthoid
Copy link
Owner

Ha, those were pretty much the options I was testing with, minus root tag. With some column name/namespace caching the export time for belgianjazzwiki has shortened to around 24 seconds with or without child elements so those export times appear to be more standardized. jazz_BE_concerts seems to be throwing an error now for some reason, so I'm looking at that currently. I think it's coming along though!

You don't need to share those files. I've copied the content out by hand and can just set up my own sheets to test the IMPORTRANGE function.

@Synthoid
Copy link
Owner

After some testing and digging around I think I know what is going on. Google imposes quotas on Apps Script execution, the most important of which is a 6 minute kill switch on any single script execution. Basically, if an export process takes longer than 360 seconds, google will force it it to stop and fail. It seems like multiple thousands of rows is the current operating limit on ESD due to that timeout constraint. This means that I'll need to optimize even more in order to support larger files.

It seems like the export process has a rather large range of possible times, some taking only 300 seconds, others taking longer. Since the code is executed on Google's servers, it's a little difficult to see what resources are being allocated to understand why there's such a huge potential gap (just exported again, and it only took 220 seconds). Specifically, this does seem to be a horrendous oversight in XML exports since JSON can complete similar operations in ~3-4 seconds. There were a couple promising leads on breaking script execution out into multiple executions which may help solve the issue (and allow for faster export times) but those may end up forcing ESD to hit a different quota limitation. That does make me think the IMPORTRANGE function doesn't actually affect export time much however.

TL;DR ESD is taking too long to export large data sets in XML and the export process needs to be optimized.

@OhNoxius
Copy link
Author

Would be nice if Google showed a notification when a quotum is breached. Or is that something that the app (your code) receives and can display?

For clarification, do you still manage to export the large sheet (jazz_BE_concerts)? Because my last succesful attempt dates back to December 3rd. Since then I basically tried every day without luck. Since the code runs on Google's servers, it's funny that it would work for you and not for me. Maybe the IMPORTRANGE takes a bit longer to fetch the data?

Thanks for now at least. Hopefully some optimization is still possible? (...and in the meantime I'll start exploring the possibility of using JSON - not a good match for XSLT 1.0 code, but who knows)

@Synthoid
Copy link
Owner

I wonder if part of the slowdown is because XML exports use XmlService while JSON uses native JavaScript functionality? I don't think my XML code is particularly inefficient and it is much less complicated than the JSON export code. It does make heavier use of strings though, particularly for formatting tag names to comply with XML standards. It's possible that is the culprit? I'll dig around some more to see if I can clean that up or cache some more values.

As for the error, the export process should detect and gracefully fail when errors are thrown, but the problem is google just stops the script execution. There may be a way for me to listen for that, I'll need to check.

@OhNoxius
Copy link
Author

I guess Google's own functions should be pretty efficient? But indeed maybe calling a service instead of using 'local' JS slows it down.

*on a side note: I'm curious to how you format your tag names to be XML compatible. I've tried myself in JS with some simple regex replacing obvious forbidden characters ('cause I don't really know regex), but it's far from universal.

@Synthoid
Copy link
Owner

I can believe that a lot of string formatting would slow the execution down a bit. XML name validation is largely based on the standards laid out on W3schools. ie Letters, numbers, dashes (-), dots (.), and underscores (_) only with the starting char needing to be a letter or underscore.

@OhNoxius
Copy link
Author

OhNoxius commented Jan 7, 2021

with v65 still not able to export my big sheet, but thanks for the effort though. One thing that's more of a feature, but could decrease export time significantly, is the ability to choose a custom tagname, instead of formatting each tag from the first column. Less computation, and in most cases smaller filesize, as shown in my case below:
xmltags
When "Include first column" is enabled, the tagname could be a custom name, or generated from the column header or sheet name, maybe include the row number...

@Synthoid
Copy link
Owner

Synthoid commented Jan 8, 2021

Sorry the optimizations are taking so long! v65 was getting pretty big and I wanted to push out what I had before making more changes. The key caching was really the only optimization that made it into v65. I'm looking into more optimizations now.

As for the tag names, I'd recommend adding a column called "tag" or something similar as column A then making that be "row", for example:

tag id date
row 1 1/1/2021
row 2 1/2/2021

The main reason this is the suggested approach is because it's very likely that separate sheets, or even rows within the same sheet would have separate tag names. Adding a sheetwide setting would be a fair amount of work to add a less flexible feature than ESD is already capable of doing. You are correct that it would be faster to have a flat shared tag name, but I think I would prefer to optimize across more broadly used and expensive processes first.

@Synthoid
Copy link
Owner

Synthoid commented Jan 8, 2021

Cached a lot of values last night, but it is super hard to test if caching actually has an impact or not. Because google dynamically allocates resources to apps script processes, exporting just the largest tab from the jazz_BE sheet can vary between 270 and >360 seconds. Complicated even more by file creation adding extra time on top of the export process.

In theory, caching column names and formatting should cut down on string generation by a factor of (columns * (rows - 1)) but the inconsistent export times make optimizations like that frustrating to test. I basically need to optimize enough that the slowest possible export (ie the process with the least resources allocated) takes less than 360 seconds.

I'll take another pass at optimizing tonight. Maybe the final string generation is taking up a good chunk of the time?

@Synthoid
Copy link
Owner

Synthoid commented Jan 9, 2021

After throwing in a bunch of console logs to measure times, I think I know what the problem is. ESD uses XmlService to generate XML from the sheet's contents. According to this page about apps script best practices, apps script code should avoid calls to external services, presumably including XmlService.

I confirmed that XML is roughly 100x slower than exporting sheets as JSON arrays (~3.7 seconds), with each row taking ~0.03 seconds when exporting XML. This may seem like a small amount of time, and for sheets with only a few hundred rows it would be, but sheets with thousands of rows will compound that horribly. Unfortunately, it seems that apps script does not have access to the standard XML functionality accessible in JavaScript run in browsers.

I'm going to look around some more, but this is pointing to me having to roll my own apps script XML parser if I'm going to speed things up...

VaderNo

@OhNoxius
Copy link
Author

Yikes... I have no idea what your code looks like, or even what language it's in, but you'd basically have to write the XML export again from scratch? Like you did with with JSON?

I'll try with a separate first column, I understand it's not a priority to implement something like that. But it was something that was on my mind for a while now (actually since I started using your add-on), so I thought I'd just share it with you.

@Synthoid
Copy link
Owner

I actually wouldn't need to adjust the export code much, I just need to make my own version of XmlService that runs in ESD instead of a separate service. It's just a lot of work to duplicate existing functionality purely to speed things up.

@Synthoid
Copy link
Owner

Quick status update. Initial testing with a custom XmlService is much faster. Using the standard XmlService and a custom version to create and format the following XML 10,000 times:

<data><!--This is a comment--><child>This is a text node!</child><child test="1" /></data>

With the results:

XmlService: 84.817 seconds
FastXmlService: 0.098 seconds

This is a little bitter-sweet. The custom version is waaaaaaay faster, but now I'll have to recreate XmlService from the ground up. That'll probably take a bit for me to set up. I've laid out the skeleton of the custom service code, but there's a lot of missing functionality currently so it can't be used yet.

vb-brock-getting-stupid

@OhNoxius
Copy link
Author

Crazy...what a performance increase. No hurry of course, I'm waiting in antcicipation :)

@Synthoid
Copy link
Owner

Finally getting some more time to work on XML updates, albeit months later. Sorry for the slow progress on this.

My initial attempts at improving XML export time have proven somewhat successful. Export times on the files used in this thread have dropped from 300+ seconds (aka operation timeout limits) to ~5 seconds with identical results for exporting individual sheets:

jazz_BE_concerts_test - concerts (old).txt
jazz_BE_concerts_test - concerts (fast).txt

image

For some reason, exporting the entire sheet results in incorrectly formatted values. Not malformed XML, just incorrect settings. Specifically it seems like some attributes are interpreted as child elements when they shouldn't be. There are a few minor differences in my FastXmlService and Google's internal XmlService classes, so I can only assume one of those small differences is causing the issue. On the bright side, the entire sheet only took about 12 seconds to export. That's still quite a while, but considering it's exporting 10's of 1000's of rows, that's a significant improvement.

I'll continue working on this as I'd like to optimize XML export before moving on to automation then eventually v2 of nested elements (with XML support). Sorry again for the delay.

@OhNoxius
Copy link
Author

OhNoxius commented Dec 6, 2021

Looking forward to the new version! thanks for your work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants