diff --git a/examples/BOILERPLATE.ipynb b/examples/BOILERPLATE.ipynb index e2196b7..142ddd5 100644 --- a/examples/BOILERPLATE.ipynb +++ b/examples/BOILERPLATE.ipynb @@ -1,6 +1,7 @@ { "cells": [ { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -8,7 +9,7 @@ "\n", "---\n", "\n", - "**Objective:** The file provides a simple *boilerplate* to concentrate on what is necessary, and stop doing same tasks! The boilerplate is also configured with certain [**nbextensions**](https://gitlab.com/ZenithClown/computer-configurations-and-setups) that I personally use. Install them, if required, else ignore them as they do not participate in any type of code-optimizations. For any new project *edit* this file or `File > Make a Copy` to get started with the project. Some settings and configurations are already provided, as mentioned below." + "**Objective:** The file provides a simple *boilerplate* to concentrate on what is necessary, and stop doing same tasks (`DRY` - Don't Repeat Yourself)! The boilerplate is also configured with certain [**nbextensions**](https://gitlab.com/ZenithClown/computer-configurations-and-setups) that I personally use. Install them, if required, else ignore them as they do not participate in any type of code-optimizations. For any new project *edit* this file or `File > Make a Copy` to get started with the project. Some settings and configurations are already provided, as mentioned below. In addition, some user-defined modules are available to import. Check `CHANGELOG.md` for more details, however specific *user-defined* imports may be documented/versioned separately. Any dependent [**`submodule(s)`**](https://www.atlassian.com/git/tutorials/git-submodule) is available under the `../utilities/submodules` directory." 
] }, { @@ -16,8 +17,8 @@ "execution_count": 1, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:37:54.170074Z", - "start_time": "2024-02-21T17:37:54.144014Z" + "end_time": "2024-08-29T11:55:51.958257Z", + "start_time": "2024-08-29T11:55:51.936102Z" } }, "outputs": [ @@ -30,13 +31,15 @@ } ], "source": [ - "# use the code release version for tracking and code modifications. use the\n", - "# CHANGELOG.md file to keep track of version features, and/or release notes.\n", - "# the version file is avaiable at project root directory, check the\n", - "# global configuration setting for root directory information.\n", - "# the file is already read and is available as `__version__`\n", - "__version__ = open(\"../VERSION\", \"rt\").read() # bump codecov\n", - "print(f\"Current Code Version: {__version__}\") # TODO : author, contact" + "# show current code version using https://semver.org/ convention\n", + "# version release information is also available under CHANGELOG.md\n", + "__version__ = open(\"../VERSION\", 'rt').read() # bump codecov\n", + "print(f\"Current Code Version: {__version__}\")\n", + "\n", + "# the author name is skipped, however copyright is provided as such\n", + "# commit level author is available on git commits, and details can be setup\n", + "# the template repository is designed to keep code simple, create or edit copyright\n", + "__copyright__ = \"Copyright © 2023 Debmalya Pramanik\"" ] }, { @@ -61,15 +64,15 @@ "execution_count": 2, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:37:54.185266Z", - "start_time": "2024-02-21T17:37:54.173225Z" + "end_time": "2024-08-29T11:55:51.973432Z", + "start_time": "2024-08-29T11:55:51.960257Z" } }, "outputs": [], "source": [ "import os # miscellaneous os interfaces\n", "import sys # configuring python runtime environment\n", - "import time # library for time manipulation, and logging" + "# import time # library for time manipulation, and logging" ] }, { @@ -77,8 +80,8 @@ "execution_count": 3, 
"metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:37:54.200706Z", - "start_time": "2024-02-21T17:37:54.187380Z" + "end_time": "2024-08-29T11:55:51.988623Z", + "start_time": "2024-08-29T11:55:51.974330Z" } }, "outputs": [], @@ -93,8 +96,8 @@ "execution_count": 4, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:37:54.215817Z", - "start_time": "2024-02-21T17:37:54.203320Z" + "end_time": "2024-08-29T11:55:52.003674Z", + "start_time": "2024-08-29T11:55:51.990610Z" } }, "outputs": [], @@ -104,10 +107,26 @@ "# from uuid import uuid4 as UUID # unique identifier for objs" ] }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "ExecuteTime": { + "end_time": "2024-08-29T11:55:52.019705Z", + "start_time": "2024-08-29T11:55:52.004675Z" + } + }, + "outputs": [], + "source": [ + "# import warnings # module for warnings management" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ + "### Code Debugging & Logging\n", + "\n", "[**`logging`**](https://docs.python.org/3/howto/logging.html) is a standard python module that is meant for tracking any events that happen during any software/code operations. This module is super powerful and helpful for code debugging and other purposes. The next section defines a `logging` configuration in **`../logs/`** directory. Modify the **`LOGS_DIR`** variable under *Global Arguments* to change the default directory. The module is configured with a simplistic approach, such that any `print()` statement can be updated to `logging.LEVEL_NAME()` and the code will work. 
Use logging operations like:\n", "\n", "```python\n", @@ -123,11 +142,11 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:37:54.231028Z", - "start_time": "2024-02-21T17:37:54.218114Z" + "end_time": "2024-08-29T11:55:52.034985Z", + "start_time": "2024-08-29T11:55:52.020708Z" } }, "outputs": [], @@ -146,25 +165,20 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:37:56.762665Z", - "start_time": "2024-02-21T17:37:54.234767Z" + "end_time": "2024-08-29T11:55:53.612283Z", + "start_time": "2024-08-29T11:55:52.035971Z" } }, "outputs": [], "source": [ + "# import swifter # https://github.com/jmcarpenter2/swifter\n", "import numpy as np\n", "import pandas as pd\n", - "import seaborn as sns\n", - "import matplotlib.pyplot as plt\n", "\n", "%precision 3\n", - "%matplotlib inline\n", - "sns.set_style('whitegrid');\n", - "# plt.style.use('default-style'); # http://tinyurl.com/mpl-default-style\n", - "\n", "pd.set_option('display.max_rows', 50) # max. rows to show\n", "pd.set_option('display.max_columns', 17) # max. 
cols to show\n", "np.set_printoptions(precision = 3, threshold = 15) # set np options\n", @@ -173,11 +187,30 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, + "metadata": { + "ExecuteTime": { + "end_time": "2024-08-29T11:55:55.678849Z", + "start_time": "2024-08-29T11:55:53.613296Z" + } + }, + "outputs": [], + "source": [ + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "%matplotlib inline\n", + "sns.set_style('whitegrid');\n", + "# plt.style.use('default-style'); # http://tinyurl.com/mpl-default-style" + ] + }, + { + "cell_type": "code", + "execution_count": 9, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:37:56.778531Z", - "start_time": "2024-02-21T17:37:56.763532Z" + "end_time": "2024-08-29T11:55:55.694228Z", + "start_time": "2024-08-29T11:55:55.679871Z" } }, "outputs": [], @@ -190,20 +223,22 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 10, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:38:21.876118Z", - "start_time": "2024-02-21T17:38:14.564185Z" - } + "end_time": "2024-08-29T11:56:02.051762Z", + "start_time": "2024-08-29T11:55:55.696321Z" + }, + "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Tensorflow Version: 2.9.0\n", - "GPU Computing Available. EXPERIMENTAL : {'device_name': 'NVIDIA GeForce GTX 1650', 'compute_capability': (7, 5)}\n" + "Tensorflow Version: 2.12.0\n", + "GPU Computing Not Available. If `GPU` is present, check configuration. 
Detected Devices:\n", + " > [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]\n" ] } ], @@ -228,6 +263,27 @@ " print(\" > \", tf.config.list_physical_devices())" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Additional Libraries" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "ExecuteTime": { + "end_time": "2024-08-29T11:56:02.067087Z", + "start_time": "2024-08-29T11:56:02.054824Z" + } + }, + "outputs": [], + "source": [ + "# import xlwings as xw # https://www.xlwings.org/" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -248,44 +304,89 @@ "echo %VARNAME%\n", "```\n", "\n", - "Once you've setup your system with [`PYTHONPATH`](https://bic-berkeley.github.io/psych-214-fall-2016/using_pythonpath.html) as per [*python documentation*](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH) is an important directory where any `import` statements looks for based on their order of importance. If a source code/module is not available check necessary environment variables and/or ask the administrator for the source files. For testing purpose, the module boasts the use of `src`, `utils` and `config` directories. However, these directories are available at `ROOT` level, and thus using `sys.path.append()` to add directories while importing.\n", - "\n", - "**Getting Started** with **`submodules`**\n", - "\n", - "A [`submodule`](https://git-scm.com/book/en/v2/Git-Tools-Submodules) provides functionality to integrate a seperate project in the current repository - this is typically useful to remove code-duplicacy and central repository to control dependent modules. More information on initializing and using submodule is available [here](https://www.youtube.com/watch?v=gSlXo2iLBro). Check [Github-GISTS/ZenithClown](https://gist.github.com/ZenithClown) for more information." 
+ "Once you've setup your system with [`PYTHONPATH`](https://bic-berkeley.github.io/psych-214-fall-2016/using_pythonpath.html) as per [*python documentation*](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH) is an important directory where any `import` statements looks for based on their order of importance. If a source code/module is not available check necessary environment variables and/or ask the administrator for the source files. For testing purpose, the module boasts the use of `src`, `utils` and `config` directories. However, these directories are available at `ROOT` level, and thus using `sys.path.append()` to add directories while importing." ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 12, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:38:03.157445Z", - "start_time": "2024-02-21T17:38:03.142400Z" + "end_time": "2024-08-29T11:56:02.082534Z", + "start_time": "2024-08-29T11:56:02.068672Z" } }, "outputs": [], "source": [ "# append `src` and sub-modules to call additional files these directory are\n", "# project specific and not to be added under environment or $PATH variable\n", - "sys.path.append(os.path.join(\"..\", \"src\", \"agents\")) # agents for reinforcement modelling\n", - "sys.path.append(os.path.join(\"..\", \"src\", \"engine\")) # derivative engines for model control\n", - "sys.path.append(os.path.join(\"..\", \"src\", \"models\")) # actual models for decision making tools" + "# sys.path.append(os.path.join(\"..\", \"src\", \"agents\")) # agents for reinforcement modelling\n", + "# sys.path.append(os.path.join(\"..\", \"src\", \"engine\")) # derivative engines for model control\n", + "# sys.path.append(os.path.join(\"..\", \"src\", \"models\")) # actual models for decision making tools" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 13, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:38:03.329013Z", - "start_time": "2024-02-21T17:38:03.324996Z" + 
"end_time": "2024-08-29T11:56:02.097776Z", + "start_time": "2024-08-29T11:56:02.085321Z" } }, "outputs": [], "source": [ "# also append the `utilities` directory for additional helpful codes\n", - "sys.path.append(os.path.join(\"..\", \"utilities\"))" + "# sys.path.append(os.path.join(\"..\", \"utilities\"))\n", + "\n", + "# you may also want to append the `utilities/submodules` directory\n", + "# sys.path.append(os.path.join(\"..\", \"utilities\", \"submodules\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "DISCLAIMER: The following code is designed and created by the author of this repository.\n", + "Please read the CODE OF CONDUCT and\n", + "CONTRIBUTING guidelines for more information.\n", + "
\n", + "\n", + "
\n", + "NOTE: More information on Alert Box is available here for Markdown/Jupyter Notebooks.\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "ExecuteTime": { + "end_time": "2024-08-29T11:56:02.113713Z", + "start_time": "2024-08-29T11:56:02.099680Z" + } + }, + "outputs": [], + "source": [ + "# libraries hosted in pypi\n", + "# import nlpurify # natural language utility functions, https://pypi.org/project/nlpurify/\n", + "# import pandaswizard as pdw # wrapper function for the pandas, https://pypi.org/project/pandas-wizard/" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "ExecuteTime": { + "end_time": "2024-08-29T11:56:02.159469Z", + "start_time": "2024-08-29T11:56:02.115701Z" + } + }, + "outputs": [], + "source": [ + "import sqlparser # https://gist.github.com/ZenithClown/3fc21f94cf9567003b153bcfca738f6d\n", + "import datetime_ as dt_ # https://gist.github.com/ZenithClown/d2dd294c5f528459e16b139c04c0b182" ] }, { @@ -299,11 +400,11 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 16, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:38:03.951382Z", - "start_time": "2024-02-21T17:38:03.937365Z" + "end_time": "2024-08-29T11:56:02.175470Z", + "start_time": "2024-08-29T11:56:02.160457Z" } }, "outputs": [], @@ -318,44 +419,41 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 17, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:38:04.292374Z", - "start_time": "2024-02-21T17:38:04.281394Z" + "end_time": "2024-08-29T11:56:02.205764Z", + "start_time": "2024-08-29T11:56:02.176457Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Code Execution Started on: Wed, Feb 21 2024\n" - ] - } - ], + "outputs": [], "source": [ "# long projects can be overwhelming, and keeping track of files, outputs and\n", "# saved models can be intriguing! to help this out, `today` can be used. 
for\n", "# instance output can be stored at `output//` etc.\n", "# `today` is so configured that it permits windows/*.nix file/directory names\n", - "today = dt.datetime.strftime(dt.datetime.strptime(time.ctime(), \"%a %b %d %H:%M:%S %Y\"), \"%a, %b %d %Y\")\n", - "print(f\"Code Execution Started on: {today}\") # only date, name of the sub-directory" + "\n", + "# also, if used, update the `OUTPUT_DIR` configuration as required\n", + "\n", + "# today = dt.datetime.strftime(dt.datetime.strptime(time.ctime(), \"%a %b %d %H:%M:%S %Y\"), \"%a, %b %d %Y\")\n", + "# print(f\"Code Execution Started on: {today}\") # only date, name of the sub-directory" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 18, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:38:04.870173Z", - "start_time": "2024-02-21T17:38:04.851885Z" + "end_time": "2024-08-29T11:56:02.220921Z", + "start_time": "2024-08-29T11:56:02.206750Z" } }, "outputs": [], "source": [ - "OUTPUT_DIR = os.path.join(ROOT, \"output\", today)\n", - "os.makedirs(OUTPUT_DIR, exist_ok = True) # create dir if not exist\n", + "OUTPUT_DIR = os.path.join(ROOT, \"output\")\n", + "\n", + "# OUTPUT_DIR = os.path.join(ROOT, \"output\", today)\n", + "# os.makedirs(OUTPUT_DIR, exist_ok = True) # create dir if not exist\n", "\n", "# also create directory for `logs`\n", "# LOGS_DIR = os.path.join(ROOT, \"logs\", open(\"../VERSION\", 'rt').read())\n", @@ -364,11 +462,11 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 19, "metadata": { "ExecuteTime": { - "end_time": "2024-02-21T17:38:05.773540Z", - "start_time": "2024-02-21T17:38:05.756542Z" + "end_time": "2024-08-29T11:56:02.236922Z", + "start_time": "2024-08-29T11:56:02.221927Z" } }, "outputs": [], @@ -389,6 +487,44 @@ "\n", "A typical machine learning project revolves around six important stages (as available in [Amazon ML Life Cycle 
Documentation](https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/well-architected-machine-learning-lifecycle.html)). This notebook boilerplate can be used to understand the data file, perform statistical tests and other EDA as required for any AI/ML application. Later, using the study below, a *full-fledged* application can be generated using other sections of the boilerplate." ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reporting & End Note(s)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "ExecuteTime": { + "end_time": "2024-08-29T11:56:02.252956Z", + "start_time": "2024-08-29T11:56:02.238910Z" + } + }, + "outputs": [], + "source": [ + "# wb = xw.Book(os.path.join(ROOT, \"template\", \"template.xlsx\"))\n", + "\n", + "# # populate the sheets with sheet selection, and defining output cell, like:\n", + "# wb.sheets[\"sheet\"][\"cell\"].options(header = False, index = False).value = data\n", + "\n", + "# # finally, close and save the object like:\n", + "# outfile = f\"[{dt.datetime.now().date()} #{str(UUID()).upper()[:3]}] Output File Name.xlsx\"\n", + "# print(f\"Output File Generated as: {outfile}\")\n", + "\n", + "# wb.save(os.path.join(OUTPUT_DIR, outfile))\n", + "# wb.close()" + ] } ], "metadata": { @@ -408,7 +544,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.13" + "version": "3.10.9" }, "latex_envs": { "LaTeX_envs_menu_present": true,