Make abstract Bulk API methods better! #1370
Comments
Hey, we are refactoring the bulk v1 & v2 wrappers for jsforce v3 here:

The initial bulk v2 implementation added in jsforce v2 does that: Line 1112 in 588c6cf

Unfortunately, that leads to memory issues when working with big chunks of records (which is expected if you are using bulk APIs), so we are reverting it to returning a record stream in v3. The stream API is better suited to this kind of wrapper that deals with tons of data.

jsforce v2 added an API for this: it lets you keep the job ID and manually resume polling later.

This would require jsforce to cache job results on the user's machine, which IMO is not something a library should deal with.

This could work for Bulk v1, but the Bulk V2 API doesn't allow you to create batches: job data can be uploaded once, then the org will create batches as it needs.
Thanks @cristiand391 ! If the memory issues are bad enough, returning a stream makes sense. Since Bulk API V2 has very different batching behavior, it may not be necessary to add batch size specification, since most clients will probably opt to use Bulk API V2 once it's available.

I do agree documentation and examples could be improved upon. For example, the scenario in my original post: a bulk query + bulk update requirement. A good example would address how to structure the code so as to avoid memory issues in the event of large data volumes, how to cache job IDs and resume polling if something (e.g. a timeout) interrupts the initial script run, etc. I'm glad SFDC has taken over the project!
Methods like `Bulk.prototype.query` and `SObject.prototype.upsertBulk` are very helpful in that they abstract away a lot of the complexity of creating jobs and batches, dealing with certain events and streaming logic, etc. Ideally, the developer should never be obligated to think of Salesforce record data in Node.js as anything other than simple arrays of JavaScript objects, and they should be able to use `async`/`await` to send and receive those arrays.

But there are some critical gaps to achieving this ideal state:

- `Bulk.prototype.query` doesn't return a `Promise<Record[]>` but a `Parsable<Record>`. You have to listen for a `record` event on that object, which takes us away from simple `async`/`await` statements.
- The bulk load methods don't return a `Promise<Record[]>` or `Promise<BulkIngestBatchResult>`. You have to deal with polling, events and streaming to get your results back.

These abstract methods were built for simplicity, and they have amazing potential. They just need a few extra API features to be more broadly useful:

- Return a `Record[]` response.
- Expose the `jobId` (or maybe the entire `JobInfo`?) as soon as the job is opened. This `jobId` can be stored in a local file, a database, etc. for later use in case the Node process times out or crashes.
- Given a `jobId` alone, resume polling on the job and return data similar to their corresponding methods for starting those jobs.

Some sample TypeScript code to illustrate usage: