
Question: 'REPLICATE_ATTRIBUTES' for > 10.000 records #42

Open
efrister opened this issue Jan 17, 2020 · 9 comments
Assignees
efrister
Labels
enhancement (New feature or request)

Comments

@efrister

Hi,

I hope this is the right "forum" for this question; if not, please feel free to point me to the correct one and I'll post this question there.

I need to replicate attributes from one collection to another, but the number of affected records is > 10,000. I have had problems before with Firestore and lots of concurrent writes to the same collection, and I was wondering whether this library addresses that problem; from reading the source, it seems all update queries are triggered in parallel.
Do you have any thoughts or plans for this?

@anishkny
Owner

Hi @efrister, this is the correct forum, don't worry 😄

Yes, you are right - the library fires off all of its writes in parallel. It seems that Firestore has a limit of at most 10k writes/s per database.

There are a few strategies I can think of to mitigate the issue:

  1. Paginate the affected documents, batch up the writes, and throttle them (rough sketch after this list)
  2. Retry writes to overcome possible rejections, perhaps with exponential backoff
  3. Use some sort of pub/sub queue to throttle the writes, if latency is not an issue
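
For illustration, here is a rough, untested sketch of option 1 using the Admin SDK's query cursors - the collection name and field are made up for the example, not integrify's actual API:

```ts
// Untested sketch of option 1: page through the affected documents with a
// query cursor, then write each page as a single batched write. The
// collection "detail" and field "masterId" are placeholders.
import * as admin from 'firebase-admin';

const db = admin.firestore();

async function replicateInPages(
  masterId: string,
  attrs: { [field: string]: any }
): Promise<void> {
  const pageSize = 500; // a Firestore batched write holds at most 500 ops
  let cursor: admin.firestore.QueryDocumentSnapshot | null = null;

  for (;;) {
    let query = db
      .collection('detail')
      .where('masterId', '==', masterId)
      .orderBy(admin.firestore.FieldPath.documentId())
      .limit(pageSize);
    if (cursor) query = query.startAfter(cursor);

    const page = await query.get();
    if (page.empty) break;

    const batch = db.batch();
    page.docs.forEach((doc) => batch.update(doc.ref, attrs));
    await batch.commit(); // sequential commits naturally throttle the writes

    cursor = page.docs[page.docs.length - 1];
  }
}
```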

What are your thoughts?

@efrister
Author

Hey @anishkny, thanks for your reply!

I was thinking along the lines of "1. Paginate affected documents"; that has worked well for me before when doing custom schema migrations on those collections - though retries will probably have to happen anyway, to make sure failed attempts / rejections don't get dropped.
Would you prefer this "edge case" be taken care of outside of your library, or as part of it? If having this feature as part of Integrify fits your vision for where you want this lib to go, I can try my hand at a pull request to add it.
I am a bit concerned about the 500 writes/s limit for collections with sequential values, though. I'm not sure how to best "auto-detect" that and adjust the batch size accordingly; off the top of my head, I'd check the number of affected documents and, if it exceeds the 10k limit, start batching away. But that would still get rejected for collections with sequential values - those would have to be batched a lot earlier.

What do you think?

@efrister
Author

Addendum: I just saw that retries should apparently work out of the box when using the SDK:
https://firebase.google.com/docs/firestore/best-practices#transactions_retries

It might suffice to just chunk the updates into batches of 500 and fire those off at the same time - if we're lucky, Firebase takes care of the rest without us having to explicitly retry or throttle (see the sketch below). Definitely worth a try before bloating the code.
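
Roughly what I have in mind (untested, and the names are illustrative):

```ts
// Untested sketch: chunk the affected document refs into groups of 500 (the
// per-batch limit) and commit all batches in parallel, relying on the SDK's
// built-in retries to handle transient rejections.
import * as admin from 'firebase-admin';

const db = admin.firestore();

async function batchedUpdate(
  refs: admin.firestore.DocumentReference[],
  attrs: { [field: string]: any }
): Promise<void> {
  const commits: Promise<admin.firestore.WriteResult[]>[] = [];
  for (let i = 0; i < refs.length; i += 500) {
    const batch = db.batch();
    refs.slice(i, i + 500).forEach((ref) => batch.update(ref, attrs));
    commits.push(batch.commit());
  }
  await Promise.all(commits); // fails fast if any batch is rejected
}
```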

@anishkny
Owner

I like both your ideas. How about this:

Short term - enhance integrify to batch writes 500 at a time (I'll accept a PR for it if you want to take a stab, or I can do it - see #43).

Long term - in the spirit of not bloating integrify - perhaps there is room for a new library that accepts a bunch of Firestore write requests and does the "right" thing, including batching, retries, etc. integrify would then use that library to actually perform the writes.

WDYT?

@efrister
Author

Sounds like a plan to me. I can take a stab at the PR by the end of this week and submit it as part of #43.

@anishkny
Owner

@efrister coincidentally, I faced this same issue in another project (more than 500 writes/sec with consecutive keys) and managed to address it using the throttled-queue package. You could consider using that here too.
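
For example (an untested sketch; the 500-per-second numbers are chosen to match the sequential-value limit discussed above):

```ts
// Untested sketch using the throttled-queue package to stay under the
// ~500 writes/sec limit for collections with sequential indexed values.
import * as admin from 'firebase-admin';
// Import style depends on your TS config (esModuleInterop).
import throttledQueue from 'throttled-queue';

const db = admin.firestore();

// Execute at most 500 queued callbacks per 1000 ms window.
const throttle = throttledQueue(500, 1000);

function throttledUpdate(
  ref: admin.firestore.DocumentReference,
  attrs: { [field: string]: any }
): void {
  throttle(() => ref.update(attrs));
}
```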

@anishkny
Owner

Assigned you to this issue since GH won't let me assign you to #43. Closing the other one - let's use this one to track. Thanks!

@efrister
Author

@anishkny I took a stab at implementing this feature here:
https://github.com/efrister/integrify/tree/feature/batch-writes

However, I'm having a tough time testing this, and I struggled to try it out on Firebase without pushing it to an npm registry. I think I'm misusing the Firebase shell somehow; my integrify() hooks don't fire when I'm using it.
So what you see in the branch is really an untested approach; I wanted to get it out to you before too long so you can give some feedback. I'll keep trying to figure out how to test my cloned repository properly. My last idea is to just run "npm run build" and commit the "lib" folder so that I can reference the GitHub repository in my package.json, and to remove it again before submitting the PR. If you have any pointers, a good reference on how to test efficiently, or even an insight into your own dev flow, I'd highly appreciate it. Sorry for the delay.

@anishkny
Owner

anishkny commented Feb 1, 2020

Thanks for taking a shot!

To test locally, you will need to set up a Firebase project, generate a service account JSON, calculate its base64 value, and set the env var SERVICE_ACCOUNT_KEY_BASE64. Then you can run npm test.
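
A quick one-off Node snippet to compute that value (the JSON path is a placeholder):

```ts
// Untested one-off helper: print the base64 of a downloaded service account
// JSON so it can be exported as SERVICE_ACCOUNT_KEY_BASE64 before `npm test`.
import { readFileSync } from 'fs';

console.log(readFileSync('./service-account.json').toString('base64'));
```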

To test the deployed version, I think you can deploy test/functions/index.js using the Firebase CLI and it should deploy your code. You can verify your version is deployed by putting a console.log in there.

anishkny added the enhancement label on Dec 2, 2020