
Question: 'REPLICATE_ATTRIBUTES' for > 10.000 records #42

Open
efrister opened this issue Jan 17, 2020 · 9 comments
Assignees
efrister
Labels
enhancement (New feature or request)

Comments

@efrister

Hi,

I hope this is the right "forum" for this question; if not, please feel free to point me to the correct one and I'll post this question there.

I need to replicate attributes from one collection to another, but the number of affected records is > 10,000. I have had problems before with Firestore and lots of concurrent writes to the same collection, and I was wondering whether this library addresses that problem; from reading the source, it seems all update queries are triggered in parallel.
Do you have any thoughts or plans for this?

@anishkny
Owner

Hi @efrister, this is the correct forum, don't worry 😄

Yes, you are right - the library fires off all of its writes in parallel. It seems that Firestore has a limit of at most 10k writes/s per database.

There are a few strategies I can think of to mitigate the issue:

  1. Paginate the affected documents, batch up the writes, and throttle them (rough sketch after this list)
  2. Retry writes to overcome possible rejections, perhaps with exponential backoff
  3. Use some sort of pub/sub queue to throttle the writes, if latency is not an issue
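
For illustration, here is a rough, untested sketch of option 1 using the Admin SDK's query cursors - the collection name and field are made up for the example, not integrify's actual API:

```ts
// Untested sketch of option 1: page through the affected documents with a
// query cursor, then write each page as a single batched write. The
// collection "detail" and field "masterId" are placeholders.
import * as admin from 'firebase-admin';

const db = admin.firestore();

async function replicateInPages(
  masterId: string,
  attrs: { [field: string]: any }
): Promise<void> {
  const pageSize = 500; // a Firestore batched write holds at most 500 ops
  let cursor: admin.firestore.QueryDocumentSnapshot | null = null;

  for (;;) {
    let query = db
      .collection('detail')
      .where('masterId', '==', masterId)
      .orderBy(admin.firestore.FieldPath.documentId())
      .limit(pageSize);
    if (cursor) query = query.startAfter(cursor);

    const page = await query.get();
    if (page.empty) break;

    const batch = db.batch();
    page.docs.forEach((doc) => batch.update(doc.ref, attrs));
    await batch.commit(); // sequential commits naturally throttle the writes

    cursor = page.docs[page.docs.length - 1];
  }
}
```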

What are your thoughts?

@efrister
Author

Hey @anishkny, thanks for your reply!

I was thinking along the lines of "1. Paginate affected documents"; that has worked well for me before when doing custom schema migrations on those collections - though retries will probably have to happen anyway, to make sure failed attempts / rejections don't get dropped.
Would you prefer this "edge case" be taken care of outside of your library, or as part of it? If having this feature as part of Integrify fits your vision for where you want this lib to go, I can try my hand at a pull request to add it.
I am a bit concerned about the 500 writes/s limit for collections with sequential values, though. I'm not sure how to best "auto-detect" that and adjust the batch size accordingly; off the top of my head, I'd check the number of affected documents and, if it exceeds the 10k limit, start batching away. But that would still get rejected for collections with sequential values - those would have to be batched a lot earlier.

What do you think?

@efrister
Author

Addendum: I just saw that retries should apparently work out of the box when using the SDK:
https://firebase.google.com/docs/firestore/best-practices#transactions_retries

It might suffice to just chunk the updates into batches of 500 and fire those off at the same time - if we're lucky, Firebase takes care of the rest without us having to explicitly retry or throttle (see the sketch below). Definitely worth a try before bloating the code.
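
Roughly what I have in mind (untested, and the names are illustrative):

```ts
// Untested sketch: chunk the affected document refs into groups of 500 (the
// per-batch limit) and commit all batches in parallel, relying on the SDK's
// built-in retries to handle transient rejections.
import * as admin from 'firebase-admin';

const db = admin.firestore();

async function batchedUpdate(
  refs: admin.firestore.DocumentReference[],
  attrs: { [field: string]: any }
): Promise<void> {
  const commits: Promise<admin.firestore.WriteResult[]>[] = [];
  for (let i = 0; i < refs.length; i += 500) {
    const batch = db.batch();
    refs.slice(i, i + 500).forEach((ref) => batch.update(ref, attrs));
    commits.push(batch.commit());
  }
  await Promise.all(commits); // fails fast if any batch is rejected
}
```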

@anishkny
Owner

I like both your ideas. How about this:

Short term - enhance integrify to batch writes 500 at a time (I'll accept a PR for it if you want to take a stab, or I can do it - see #43).

Long term - in the spirit of not bloating integrify - perhaps there is room for a new library that accepts a bunch of Firestore write requests and does the "right" thing, including batching, retries, etc. integrify would then use that library to actually perform the writes.

WDYT?

@efrister
Author

Sounds like a plan to me. I can take a stab at the PR by the end of this week and submit it as part of #43.

@anishkny
Owner

@efrister coincidentally, I faced this same issue in another project (more than 500 writes/sec with consecutive keys) and managed to address it using the throttled-queue package. You could consider using that here too.
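
For example (an untested sketch; the 500-per-second numbers are chosen to match the sequential-value limit discussed above):

```ts
// Untested sketch using the throttled-queue package to stay under the
// ~500 writes/sec limit for collections with sequential indexed values.
import * as admin from 'firebase-admin';
// Import style depends on your TS config (esModuleInterop).
import throttledQueue from 'throttled-queue';

const db = admin.firestore();

// Execute at most 500 queued callbacks per 1000 ms window.
const throttle = throttledQueue(500, 1000);

function throttledUpdate(
  ref: admin.firestore.DocumentReference,
  attrs: { [field: string]: any }
): void {
  throttle(() => ref.update(attrs));
}
```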

@anishkny
Owner

Assigned you to this issue since GH won't let me assign you to #43. Closing the other one - let's use this one to track. Thanks!

@efrister
Author

@anishkny I took a stab at implementing this feature here:
https://github.com/efrister/integrify/tree/feature/batch-writes

However, I'm having a tough time testing this, and I struggled to try it out on Firebase without pushing it to an npm registry. I think I'm misusing the Firebase shell somehow; my integrify() hooks don't fire when I'm using it.
So what you see in the branch is really an untested approach; I wanted to get it out to you before too long so you can give some feedback. I'll keep trying to figure out how to test my cloned repository properly. My last idea is to just run "npm run build" and commit the "lib" folder so that I can reference the GitHub repository in my package.json, and to remove it again before submitting the PR. If you have any pointers, a good reference on how to test efficiently, or even an insight into your own dev flow, I'd highly appreciate it. Sorry for the delay.

@anishkny
Owner

anishkny commented Feb 1, 2020

Thanks for taking a shot!

To test locally, you will need to set up a Firebase project, generate a service account JSON, calculate its base64 value, and set the env var SERVICE_ACCOUNT_KEY_BASE64. Then you can run npm test.
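
A quick one-off Node snippet to compute that value (the JSON path is a placeholder):

```ts
// Untested one-off helper: print the base64 of a downloaded service account
// JSON so it can be exported as SERVICE_ACCOUNT_KEY_BASE64 before `npm test`.
import { readFileSync } from 'fs';

console.log(readFileSync('./service-account.json').toString('base64'));
```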

To test the deployed version, I think you can deploy test/functions/index.js using the Firebase CLI and it should deploy your code. You can verify your version is deployed by putting a console.log in there.

anishkny added the enhancement label on Dec 2, 2020