Error when running prune2df #138

Closed
jpromeror opened this issue Feb 11, 2020 · 3 comments

jpromeror commented Feb 11, 2020

Hello,

I'm currently running your pipeline on a dataset (~15K cells), and everything ran perfectly until I tried the prune2df function. I've checked the other issues related to this function and haven't been able to solve the problem.

There are 19812 genes in the expression matrix.

Here are the variables and the call to the function:

dbs
[FeatherRankingDatabase(name="hg38__refseq-r80__10kb_up_and_down_tss.mc9nr"), FeatherRankingDatabase(name="hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr")]

len(modules)
8432

MOTIF_ANNOTATIONS_FNAME
'/home/jpromero/Data/PyScenic/Resources/motifs-v9-nr.hgnc-m0.001-o0.0.tbl'

df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME, num_workers=6)

A lot of warnings appear when running the script. One example:

2020-02-11 13:27:32,745 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for RELB could be mapped to hg38__refseq-r80__10kb_up_and_down_tss.mc9nr. Skipping this module.

At some point it throws the following error:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/jpromero/PythonLib/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/home/jpromero/PythonLib/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/home/jpromero/PythonLib/dask/base.py", line 165, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/jpromero/PythonLib/dask/base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/jpromero/PythonLib/dask/multiprocessing.py", line 222, in get
    **kwargs
  File "/home/jpromero/PythonLib/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/home/jpromero/PythonLib/dask/local.py", line 316, in reraise
    raise exc
  File "/home/jpromero/PythonLib/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/home/jpromero/PythonLib/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/jpromero/PythonLib/dask/dataframe/utils.py", line 657, in check_meta
    check_matching_columns(meta, x)
  File "/home/jpromero/PythonLib/dask/dataframe/utils.py", line 682, in check_matching_columns
    " Missing: %s" % (extra, missing)
ValueError: The columns in the computed data do not match the columns in the provided metadata
Extra: []
Missing: []

After that, it continues for a while and then just suddenly stops. I've tried increasing the memory, but that doesn't seem to fix the problem. Is there anything I am missing or not seeing?

Thanks in advance!

jp
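
The "Less than 80% of the genes ... could be mapped" warnings quoted above can be checked by hand for a single module. The snippet below is only a rough sketch of that kind of check, not pySCENIC's own code: `mapped_fraction`, the toy gene lists, and the exact `< 0.8` comparison are assumptions taken from the wording of the warning.

```python
def mapped_fraction(module_genes, db_genes):
    """Return the fraction of a module's genes that also appear in the database."""
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return len(module_genes & set(db_genes)) / len(module_genes)


# Hypothetical toy data: in practice these would be the gene names of one
# module and the gene names covered by one ranking database.
db_genes = ["RELB", "NFKB1", "NFKB2", "TRAF1", "BIRC3"]
module_genes = ["RELB", "NFKB2", "TRAF1", "LOC123", "AC0001.1"]

frac = mapped_fraction(module_genes, db_genes)
# Assumed threshold, read off the warning text ("Less than 80%").
would_be_skipped = frac < 0.8
print(f"{frac:.0%} of genes mapped; module would be skipped: {would_be_skipped}")
```

A low fraction for many modules usually points to a naming mismatch between the expression matrix and the database (for example, different gene symbol conventions), which is separate from the ValueError in the traceback.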

@smanne07

Hi,
I get the same error message when running prune2df on mm9.
Is there any possible reason this is happening?

Best regards
Sasi

@zehualilab

It seems that downgrading to dask==1.0.0 and distributed==1.28.1 solved the same problem for me.
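
Before changing anything, it can help to record which versions are actually installed in the environment that runs prune2df. This is a small sketch using only the standard library (Python 3.8+); `installed_versions` is a helper name made up for this example, and the package list is an assumption based on the packages mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("dask", "distributed", "pyscenic")):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting this output into a report makes it easier to compare a failing environment with one where the downgrade helped.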

Contributor

cflerin commented May 18, 2020

This looks like a Dask version issue. See #163 for suggestions.
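
One thing that makes the traceback confusing is that both `Extra` and `Missing` are empty. A possible reading, which this thread does not confirm, is that the check first compares the column sequences for equality and only then reports set differences, so two frames with the same columns in a different order would fail with both lists empty. The pure-Python sketch below illustrates that situation; `column_diff` and the column names are hypothetical and this is not Dask's actual code.

```python
def column_diff(meta_cols, computed_cols):
    """Compare two column sequences by set membership and by exact order."""
    extra = sorted(set(computed_cols) - set(meta_cols))
    missing = sorted(set(meta_cols) - set(computed_cols))
    same_sequence = list(meta_cols) == list(computed_cols)
    return extra, missing, same_sequence


# Hypothetical column names: same set, different order.
meta_cols = ["a", "b", "c"]
computed_cols = ["b", "a", "c"]

extra, missing, same_sequence = column_diff(meta_cols, computed_cols)
print("Extra:", extra)
print("Missing:", missing)
print("Same sequence:", same_sequence)
```

If that is what happens here, the mismatch would come from how a particular Dask version validates results against the provided metadata, which would be consistent with a version change making the error go away.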

cflerin closed this as completed May 18, 2020