Fix XBRL extraction clobber #3026
@@ -2,6 +2,7 @@
 import io
 from datetime import date
 from pathlib import Path
+from urllib.parse import urlparse

 from dagster import Field, Noneable, op
 from ferc_xbrl_extractor.cli import run_main
@@ -85,12 +86,17 @@ def xbrl2sqlite(context) -> None:
             logger.info(f"Dataset ferc{form}_xbrl is disabled, skipping")
             continue

+        sql_path = Path(urlparse(PudlPaths().sqlite_db(f"ferc{form.value}_xbrl")).path)
+
+        if clobber:
+            sql_path.unlink(missing_ok=True)
+
         convert_form(
             settings,
             form,
             datastore,
             output_path=output_path,
-            clobber=clobber,
+            sql_path=sql_path,
             batch_size=batch_size,
             workers=workers,
         )
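The new sql_path line above swaps the old removeprefix("sqlite:///") string trick for urllib.parse.urlparse. Below is a minimal sketch of that conversion. It assumes an absolute-path SQLite URL of the form sqlite:////some/dir/ferc1_xbrl.sqlite; the exact string PudlPaths().sqlite_db() returns isn't shown in this diff. The helper names sqlite_url_to_path and sqlite_url_to_path_old are hypothetical and are not part of PUDL.

```python
from pathlib import Path
from urllib.parse import urlparse


def sqlite_url_to_path(url: str) -> Path:
    """Hypothetical helper: turn a sqlite:/// URL into a filesystem Path."""
    parsed = urlparse(url)
    if parsed.scheme != "sqlite":
        raise ValueError(f"not a SQLite URL: {url!r}")
    # For "sqlite:////abs/dir/x.sqlite", urlparse gives netloc="" and
    # path="//abs/dir/x.sqlite". pathlib preserves exactly two leading
    # slashes on POSIX; Linux resolves "//abs" the same as "/abs".
    return Path(parsed.path)


def sqlite_url_to_path_old(url: str) -> Path:
    """The string-prefix approach the diff removes, kept for comparison."""
    return Path(url.removeprefix("sqlite:///"))


# Assumed example URL; PudlPaths().sqlite_db() may format it differently.
url = "sqlite:////tmp/pudl_out/ferc1_xbrl.sqlite"
new = sqlite_url_to_path(url)
old = sqlite_url_to_path_old(url)
```

Both variants end at the same file name. One design difference is that str.removeprefix silently returns the string unchanged when the prefix is missing. The sketch's urlparse-based helper checks the scheme and fails loudly instead.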
@@ -101,7 +107,7 @@ def convert_form(
     form: XbrlFormNumber,
     datastore: FercXbrlDatastore,
     output_path: Path,
-    clobber: bool,
+    sql_path: Path,
jdangerx marked this conversation as resolved.
     batch_size: int | None = None,
     workers: int | None = None,
 ) -> None:
@@ -128,10 +134,8 @@ def convert_form(

         run_main(
             instance_path=filings_archive,
-            sql_path=PudlPaths()
-            .sqlite_db(f"ferc{form.value}_xbrl")
-            .removeprefix("sqlite:///"),  # Temp hacky solution
-            clobber=clobber,
+            sql_path=sql_path,
+            clobber=False,  # if we set clobber=True, clobbers on *every* call to run_main
It seems like this makes the

I think the

Previously when I failed to clobber, the XBRL extractor would write duplicate data - when I ran

Yeah, the database gets deleted at the

DELETED. Strongbad would be proud. I guess the other half of the logic we've previously associated with

I don't think this is the current behavior though. If you don't say clobber and there is an existing DB, what happens?

In the past, if you don't say clobber and there's an existing DB, we actually just append to the existing DB, which is pretty bad. I'll add a quick check to just bail out if there's an existing DB instead.
             taxonomy=taxonomy_archive,
             entry_point=taxonomy_entry_point,
             form_number=form.value,
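The last comment in the review thread proposes a different behavior when the database already exists and clobber wasn't requested: bail out instead of appending. Below is a minimal sketch of such a check. prepare_output_db is a hypothetical name, and the check actually added in the PR isn't shown in this diff. The sketch assumes the check runs once per form, before the per-year calls to run_main, which is where the diff already unlinks the file.

```python
from pathlib import Path


def prepare_output_db(sql_path: Path, clobber: bool) -> None:
    """Hypothetical guard run once per form, before any per-year extraction.

    - clobber=True: delete any existing database so the run starts fresh.
    - clobber=False with an existing database: refuse to run rather than
      silently appending duplicate rows to it.
    """
    if clobber:
        sql_path.unlink(missing_ok=True)
    elif sql_path.exists():
        raise FileExistsError(
            f"{sql_path} already exists; rerun with clobber=True to replace it."
        )
```

Raising early keeps the failure next to its cause. Otherwise the duplicate rows would only surface later as failing downstream checks.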
okay so this is now the guy that removes the db and we always run clobber=True in pudl when we call the extractor's run_main bc each run is per year. cool cool!
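Because the extractor is called once per year, clobbering inside each call would wipe out every earlier year. A toy model makes this concrete. fake_run_main and extract_form are hypothetical stand-ins that only mimic the clobber semantics discussed in this thread, writing one row per year to a SQLite table. They are not the ferc_xbrl_extractor API. The default path deletes the file once per form and then makes every per-year call with clobber=False, matching the diff.

```python
import sqlite3
import tempfile
from pathlib import Path


def fake_run_main(sql_path: Path, year: int, clobber: bool) -> None:
    """Hypothetical stand-in for an extractor call that writes one year of data."""
    if clobber:
        sql_path.unlink(missing_ok=True)  # also wipes every earlier year
    conn = sqlite3.connect(sql_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS filings (year INTEGER)")
        conn.execute("INSERT INTO filings (year) VALUES (?)", (year,))
        conn.commit()
    finally:
        conn.close()


def extract_form(
    sql_path: Path, years: list[int], clobber: bool, clobber_each_call: bool = False
) -> list[int]:
    """Hypothetical per-form driver; returns the years stored in the database."""
    if clobber:
        sql_path.unlink(missing_ok=True)  # delete once, before the per-year loop
    for year in years:
        fake_run_main(sql_path, year, clobber=clobber_each_call)
    conn = sqlite3.connect(sql_path)
    try:
        rows = conn.execute("SELECT year FROM filings ORDER BY year").fetchall()
    finally:
        conn.close()
    return [y for (y,) in rows]


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "ferc1_xbrl.sqlite"
    print(extract_form(db, [2021, 2022], clobber=True))  # [2021, 2022]
    # No clobber on an existing DB: rows are appended (the old duplicate-data bug).
    print(extract_form(db, [2021, 2022], clobber=False))  # [2021, 2021, 2022, 2022]
    # Clobbering on every call: only the last year survives.
    print(extract_form(db, [2021, 2022], clobber=True, clobber_each_call=True))  # [2022]
```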