Error when running prune2df #138

jpromeror · 2020-02-11T13:10:02Z

Hello,

I'm currently running your pipeline on a dataset (~15K cells), and everything ran perfectly until I tried to run the prune2df function. I've checked the other issues related to this function but haven't been able to solve the problem.

There are 19812 genes in the expression matrix.

Here are the variables and the call to the function:

dbs
[FeatherRankingDatabase(name="hg38__refseq-r80__10kb_up_and_down_tss.mc9nr"), FeatherRankingDatabase(name="hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr")]

len(modules)
8432

MOTIF_ANNOTATIONS_FNAME
'/home/jpromero/Data/PyScenic/Resources/motifs-v9-nr.hgnc-m0.001-o0.0.tbl'

df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME, num_workers=6)

Many warnings appear when running the script, for example:

2020-02-11 13:27:32,745 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for RELB could be mapped to hg38__refseq-r80__10kb_up_and_down_tss.mc9nr. Skipping this module.

At some point it throws the following error:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/jpromero/PythonLib/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/home/jpromero/PythonLib/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/home/jpromero/PythonLib/dask/base.py", line 165, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/jpromero/PythonLib/dask/base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/jpromero/PythonLib/dask/multiprocessing.py", line 222, in get
    **kwargs
  File "/home/jpromero/PythonLib/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/home/jpromero/PythonLib/dask/local.py", line 316, in reraise
    raise exc
  File "/home/jpromero/PythonLib/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/home/jpromero/PythonLib/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/jpromero/PythonLib/dask/dataframe/utils.py", line 657, in check_meta
    check_matching_columns(meta, x)
  File "/home/jpromero/PythonLib/dask/dataframe/utils.py", line 682, in check_matching_columns
    " Missing: %s" % (extra, missing)
ValueError: The columns in the computed data do not match the columns in the provided metadata
Extra: []
Missing: []
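
A note on the empty `Extra` and `Missing` lists: they suggest that the computed and expected column sets contain the same names, so the mismatch may lie elsewhere, for example in column order. The sketch below is a simplified stand-in using plain Python lists, not dask's actual implementation. `compare_columns` is a hypothetical helper and the column names are placeholders; it only illustrates how an order-sensitive comparison can fail while both set differences come back empty.

```python
def compare_columns(expected, actual):
    """Compare two column lists the way an order-sensitive check might.

    Returns (matches, extra, missing). `extra` and `missing` are computed
    as set differences, so they ignore ordering. Hypothetical helper,
    not part of dask or pyscenic.
    """
    matches = list(expected) == list(actual)
    extra = sorted(set(actual) - set(expected))
    missing = sorted(set(expected) - set(actual))
    return matches, extra, missing


# Same names in a different order: the comparison fails,
# yet nothing is reported as extra or missing.
expected = ["a", "b", "c"]  # placeholder column names
actual = ["b", "a", "c"]
print(compare_columns(expected, actual))  # (False, [], [])
```

If this is what is happening, the error message alone cannot show the difference, which is consistent with the thread's later suggestion that the installed Dask version is involved.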

After that, it continues for a while and then just suddenly stops. I've tried increasing the memory, but that doesn't seem to fix the problem. Is there anything I am missing or not seeing?

Thanks in advance!

jp


smanne07 · 2020-03-12T02:25:00Z

Hi,
I get the same error message when running prune2df on mm9.
Is there any possible reason this is happening?

Best regards
Sasi

zehualilab · 2020-03-27T01:52:35Z

I ran into the same problem, and downgrading to dask==1.0.0 and distributed==1.28.1 seems to have solved it.
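
To see which versions are installed before and after pinning, a minimal sketch along these lines may help. It assumes Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` is made up for illustration, and packages that are not installed are reported as `None`.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["dask", "distributed", "pyscenic"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the same environment that calls prune2df shows whether the pinned versions are actually the ones being imported.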

cflerin · 2020-05-18T16:02:31Z

This looks like a Dask version issue. See #163 for suggestions.

TobiTekath mentioned this issue Feb 17, 2020

prune2df warning messages #106

Closed

cflerin closed this as completed May 18, 2020
