add jupyter-structured scripts
lingjzhu committed May 11, 2023
1 parent 966a101 commit 588df43
Showing 4 changed files with 848 additions and 0 deletions.
17 changes: 17 additions & 0 deletions preprocessing/jupyter-structured/README.md
@@ -0,0 +1,17 @@
## Creating the Jupyter-structured dataset

### Step 1
Parse the Jupyter notebooks from The Stack.
```
python jupyter-segment-notebooks.py
```
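
As a rough illustration of what parsing a notebook can involve, the sketch below splits one notebook into a flat list of cells. It is not taken from `jupyter-segment-notebooks.py`; it assumes the notebook is an nbformat v4 JSON string, and the name `segment_notebook` and the returned dictionary layout are hypothetical.

```python
import json


def segment_notebook(nb_json):
    """Split an nbformat v4 notebook (JSON string) into a list of cell dicts.

    Hypothetical helper for illustration; the real script may differ.
    """
    nb = json.loads(nb_json)
    cells = []
    for cell in nb.get("cells", []):
        # nbformat stores "source" either as one string or as a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        entry = {"type": cell.get("cell_type"), "source": source}
        if cell.get("cell_type") == "code":
            texts = []
            for out in cell.get("outputs", []):
                kind = out.get("output_type")
                if kind == "stream":
                    text = out.get("text", "")
                elif kind in ("execute_result", "display_data"):
                    text = out.get("data", {}).get("text/plain", "")
                else:
                    continue  # skip error tracebacks and unknown output types
                texts.append("".join(text) if isinstance(text, list) else text)
            entry["output"] = "".join(texts)
        cells.append(entry)
    return cells


# A tiny hand-written notebook used only to exercise the helper.
example_nb = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Intro text"]},
        {
            "cell_type": "code",
            "source": "print('hi')",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
        },
    ],
})
segments = segment_notebook(example_nb)
```

Only plain-text outputs are kept here; images and other rich outputs are dropped, which is a simplification chosen for the sketch.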

### Step 2
Generate markdown-code-output triplets.
```
python jupyter-generate-triplets.py
```
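
One plausible reading of a markdown-code-output triplet is a markdown cell, the code cell that directly follows it, and that code cell's output. The sketch below pairs cells under that assumption; it is not the logic of `jupyter-generate-triplets.py`, and `generate_triplets` and the cell dictionary layout are hypothetical.

```python
def generate_triplets(cells):
    """Pair each markdown cell with the code cell right after it.

    `cells` is a list of dicts with "type", "source" and, for code cells,
    an optional "output" key. This layout is an assumption for illustration.
    """
    triplets = []
    for prev, cur in zip(cells, cells[1:]):
        if prev["type"] == "markdown" and cur["type"] == "code":
            triplets.append({
                "markdown": prev["source"],
                "code": cur["source"],
                "output": cur.get("output", ""),
            })
    return triplets


# Hand-written cells used only to exercise the helper.
example_cells = [
    {"type": "markdown", "source": "Add two numbers."},
    {"type": "code", "source": "print(1 + 2)", "output": "3\n"},
    {"type": "code", "source": "x = 5"},
    {"type": "markdown", "source": "Trailing note."},
]
triplets = generate_triplets(example_cells)
```

Code cells without a preceding markdown cell, and markdown cells without a following code cell, produce no triplet in this sketch.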

### Step 3
Create the notebook-level structured dataset using `jupyter-structured.ipynb`.
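
A notebook-level record could, for example, join all cells of one notebook into a single string with a marker in front of each cell. The sketch below shows that idea only; what `jupyter-structured.ipynb` actually does is not described here, and `render_notebook` and the marker strings `<text>`, `<code>` and `<output>` are hypothetical placeholders.

```python
def render_notebook(cells, text_tag="<text>", code_tag="<code>", output_tag="<output>"):
    """Concatenate the cells of one notebook into a single tagged string.

    The tags are placeholder values chosen for this sketch, not the
    markers used by the actual dataset.
    """
    parts = []
    for cell in cells:
        if cell["type"] == "markdown":
            parts.append(text_tag + cell["source"])
        elif cell["type"] == "code":
            parts.append(code_tag + cell["source"])
            # Only emit an output segment when the cell produced text.
            if cell.get("output"):
                parts.append(output_tag + cell["output"])
    return "".join(parts)


# Hand-written cells used only to exercise the helper.
notebook_cells = [
    {"type": "markdown", "source": "# Load data"},
    {"type": "code", "source": "print(1)", "output": "1\n"},
    {"type": "code", "source": "x = 2"},
]
rendered = render_notebook(notebook_cells)
```

Keeping the markers as parameters makes it easy to swap in whatever delimiters a downstream tokenizer expects.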
