Expected partition of type DataFrame but got NoneType #117

Closed
anlin00007 opened this issue Dec 16, 2019 · 4 comments

Comments

@anlin00007

Hello,

I have been trying SCENIC on our data and got the error message mentioned in the title. The dataset I am using is a 22057 × 17057 matrix processed with the scanpy package. If I use the whole matrix, it gives me the error; however, if I only use part of the matrix (10000 × 17057), it finishes. At first I thought it was a memory issue, but I am using a cluster with 128 GB of memory and SCENIC only takes 32 GB before it shows the error. Thus, I have two questions:

  1. Please advise me on how to solve this error.
  2. If I chop the matrix into pieces, run GRNboost on each piece to get regulons, and then take the union and run AUCell: is that equivalent to running GRNboost on the whole matrix and then applying the resulting regulons to AUCell? (A sketch of this chunked approach follows this list.)
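
For concreteness, here is a minimal sketch of the chunked approach from question 2. This is an illustration only: the chunk count is an arbitrary assumption, and data_expr_all / tf_names are the same objects used in the code further below.

# Minimal sketch of the chunked GRNBoost2 run described in question 2.
# Assumptions: `data_expr_all` is the cells-x-genes DataFrame built below,
# `tf_names` is the TF list, and `n_chunks` is an arbitrary choice.
import numpy as np
import pandas as pd
from arboreto.algo import grnboost2

n_chunks = 3
chunk_size = int(np.ceil(len(data_expr_all) / n_chunks))

# Run GRNBoost2 on each row-wise (per-cell) chunk of the matrix.
parts = [
    grnboost2(data_expr_all.iloc[i:i + chunk_size], tf_names=tf_names, verbose=True)
    for i in range(0, len(data_expr_all), chunk_size)
]

# Take the union of the per-chunk adjacency lists.
adjacencies = pd.concat(parts, ignore_index=True)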

The detailed code and error message are shown below.

Thanks

data_expr_all = pd.DataFrame(adata.X.toarray(), index=adata.obs.index, columns=adata.var.index)
adjacencies = grnboost2(data_expr_all, tf_names=tf_names, verbose=True)
 preparing dask client
parsing input
creating dask graph
/usr/local/lib/python3.6/dist-packages/arboreto/algo.py:214: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  expression_matrix = expression_data.as_matrix()
4 partitions
computing dask graph
shutting down client and local cluster
finished
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-33af1126a435> in <module>
      1 data_expr_all = pd.DataFrame(adata.X.toarray(), index=adata.obs.index, columns=adata.var.index)
----> 2 adjacencies = grnboost2(data_expr_all, tf_names=tf_names, verbose=True)
      3 modules = list(modules_from_adjacencies(adjacencies, data_expr_all))
      4 # Calculate a list of enriched motifs and the corresponding target genes for all modules.
      5 with ProgressBar():

/usr/local/lib/python3.6/dist-packages/arboreto/algo.py in grnboost2(expression_data, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
     39     return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
     40                gene_names=gene_names, tf_names=tf_names, client_or_address=client_or_address,
---> 41                early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
     42 
     43 

/usr/local/lib/python3.6/dist-packages/arboreto/algo.py in diy(expression_data, regressor_type, regressor_kwargs, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
    133 
    134         return client \
--> 135             .compute(graph, sync=True) \
    136             .sort_values(by='importance', ascending=False)
    137 

/usr/local/lib/python3.6/dist-packages/distributed/client.py in compute(self, collections, sync, optimize_graph, workers, allow_other_workers, resources, retries, priority, fifo_timeout, actors, **kwargs)
   2756 
   2757         if sync:
-> 2758             result = self.gather(futures)
   2759         else:
   2760             result = futures

/usr/local/lib/python3.6/dist-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct, asynchronous)
   1820                 direct=direct,
   1821                 local_worker=local_worker,
-> 1822                 asynchronous=asynchronous,
   1823             )
   1824 

/usr/local/lib/python3.6/dist-packages/distributed/client.py in sync(self, func, *args, **kwargs)
    751             return future
    752         else:
--> 753             return sync(self.loop, func, *args, **kwargs)
    754 
    755     def __repr__(self):

/usr/local/lib/python3.6/dist-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    329             e.wait(10)
    330     if error[0]:
--> 331         six.reraise(*error[0])
    332     else:
    333         return result[0]

~/.local/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

/usr/local/lib/python3.6/dist-packages/distributed/utils.py in f()
    314             if timeout is not None:
    315                 future = gen.with_timeout(timedelta(seconds=timeout), future)
--> 316             result[0] = yield future
    317         except Exception as exc:
    318             error[0] = sys.exc_info()

~/.local/lib/python3.6/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

~/.local/lib/python3.6/site-packages/tornado/gen.py in run(self)
    740                     if exc_info is not None:
    741                         try:
--> 742                             yielded = self.gen.throw(*exc_info)  # type: ignore
    743                         finally:
    744                             # Break up a reference to itself

/usr/local/lib/python3.6/dist-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1651                             six.reraise(CancelledError, CancelledError(key), None)
   1652                         else:
-> 1653                             six.reraise(type(exception), exception, traceback)
   1654                     if errors == "skip":
   1655                         bad_keys.add(key)

~/.local/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

/usr/local/lib/python3.6/dist-packages/dask/dataframe/utils.py in check_meta()
    519     raise ValueError("Metadata mismatch found%s.\n\n"
    520                      "%s" % ((" in `%s`" % funcname if funcname else ""),
--> 521                              errmsg))
    522 
    523 

ValueError: Metadata mismatch found in `from_delayed`.

Expected partition of type `DataFrame` but got `NoneType`

distributed.nanny - WARNING - Worker process still alive after 3 seconds, killing
distributed.nanny - WARNING - Worker process still alive after 3 seconds, killing
distributed.nanny - WARNING - Worker process 3142 was killed by unknown signal
distributed.nanny - WARNING - Worker process 3145 was killed by unknown signal
distributed.nanny - WARNING - Worker process still alive after 3 seconds, killing
distributed.nanny - WARNING - Worker process 3147 was killed by unknown signal
@JPcerapio

Hello, did you solve your problem? I'm having exactly the same error message.
Thanks,
Pablo

@franciscogrisanti

Me too! @anlin00007 @JPcerapio @aertslab

Any ideas on a solution?

@sameelab

sameelab commented May 1, 2020

Me too! :( Hi Francisco! Olga

@cflerin
Contributor

cflerin commented May 2, 2020

Hi @anlin00007 , @JPcerapio , @franciscogrisanti , @sameelab ,

I think your issue could be solved with one of the suggestions in #163. Feel free to re-open or leave another comment if none of these options work for you.
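
For reference, one workaround commonly suggested for this type of Dask failure is to create the Dask client explicitly and pass it through the client_or_address parameter visible in the grnboost2 signature in the traceback above. The sketch below illustrates that pattern; the worker count and memory limit are illustrative assumptions, not values taken from #163.

from distributed import LocalCluster, Client
from arboreto.algo import grnboost2

# Create a local Dask cluster explicitly instead of letting grnboost2
# spin one up internally; all numbers here are illustrative.
local_cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit='16GB')
custom_client = Client(local_cluster)

adjacencies = grnboost2(data_expr_all, tf_names=tf_names,
                        client_or_address=custom_client, verbose=True)

# Shut down the client and cluster when done.
custom_client.close()
local_cluster.close()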

@cflerin closed this as completed May 2, 2020