Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Register DAG #692

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Register DAG #692

wants to merge 6 commits into from

Conversation

spencerseale
Copy link
Collaborator

@spencerseale spencerseale commented Dec 30, 2024

Adds functionality to register a DAG instance:

from tiledb.cloud.dag import DAG

graph = DAG(name="dag-testing-tg-deps-2", namespace="spencerseale")

stagea = graph.submit(func=lambda x: x)
stageb = graph.submit(func=lambda x: x, args=(stagea,))
stagec = graph.submit(func=lambda x: x, args=(stagea,))
staged = graph.submit(func=lambda x, y: (x, y), args=(stageb, stagec,))

graph.register()

# OR

from tiledb.cloud.taskgraphs.registration import register

register(
    graph=graph,
    namespace="spencerseale"
)

The DAG.register method is to be prioritized, but also usable with registration.register as that's what's used in the DAG.register method.

Open question about type hinting

I need to update type hint in tiledb.cloud.taskgraphs.registration.register to graph: Union[tgbuilder, DAG]. So I then add the import to that module from tiledb.cloud.dag.dag import DAG. But it's interesting, when I go to do that using my editable install and then run:

python -m src.tiledb.cloud.dag.dag

I get:

(tiledb-cloud-py-py39) spencerseale 12:27 PM $ TileDB-Cloud-Py python -m src.tiledb.cloud.dag.dag

Traceback (most recent call last):
  File "/Users/spencerseale/miniconda3/envs/tiledb-cloud-py-py39/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/spencerseale/miniconda3/envs/tiledb-cloud-py-py39/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/spencerseale/tiledb_apis/TileDB-Cloud-Py/src/tiledb/cloud/__init__.py", line 3, in <module>
    from . import array
  File "/Users/spencerseale/tiledb_apis/TileDB-Cloud-Py/src/tiledb/cloud/array.py", line 7, in <module>
    from . import client
  File "/Users/spencerseale/tiledb_apis/TileDB-Cloud-Py/src/tiledb/cloud/client.py", line 13, in <module>
    import tiledb.cloud._common.api_v2.models as models_v2
ImportError: cannot import name 'cloud' from 'tiledb' (/Users/spencerseale/miniconda3/envs/tiledb-cloud-py-py39/lib/python3.9/site-packages/tiledb/__init__.py)

I get the same stack trace when I update the module tiledb.cloud.taskgraphs.registration to import anything. I think this is the way we have cloud-py wired up and how the Py system path is getting created. Switching to poetry or something similar would fix the dev experience a lot so the current dev library is installed into the system path as a Python library to simulate how cloud-py will execute as an installed library.

@sgillies any ideas on this?

@spencerseale spencerseale self-assigned this Dec 30, 2024
@spencerseale spencerseale marked this pull request as draft December 30, 2024 20:51
@spencerseale spencerseale removed the request for review from Shelnutt2 January 9, 2025 23:54
@spencerseale spencerseale removed their assignment Jan 10, 2025
@spencerseale spencerseale marked this pull request as ready for review January 10, 2025 00:55
Copy link
Collaborator

@sgillies sgillies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerseale It's an excellent idea! I've got a suggestion and a question inline.

src/tiledb/cloud/dag/dag.py Outdated Show resolved Hide resolved
src/tiledb/cloud/dag/dag.py Show resolved Hide resolved
src/tiledb/cloud/dag/dag.py Outdated Show resolved Hide resolved
@sgillies
Copy link
Collaborator

sgillies commented Jan 16, 2025

@spencerseale if registration.py imports dag.py and dag.py imports registration.py we have a circular import, no? Let's avoid doing that, especially if only to have perfect type hints. I think it's okay to relax the typing for registration.register() and use graph: Any for now.

But if we do want better type hints without a circular import... registration.register() can actually accept any argument that has a _tdb_to_json() method. So, we could create an abstract base class (in a new module) with that method and register DAG with the ABC, or we could use Python structural typing.

In general, I'm not concerned about whether python -m src.tiledb.cloud.dag.dag runs or not. We're not using it like that in production.

@sgillies
Copy link
Collaborator

@spencerseale about test failures: tiledb versions 0.33.1 and 0.33.2 broke the CloudArray interface. If you see errors like AttributeError: 'SparseArrayImpl' object has no attribute 'apply' (for example) in our autobuild jobs, they aren't caused by your changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants