Merge pull request #9 from ducdetronquito/feature/0.3.0-release
Feature/0.3.0 release
ducdetronquito authored Aug 8, 2019
2 parents 98caa67 + b9a4903 commit 5987cef
Showing 15 changed files with 1,143 additions and 975 deletions.
7 changes: 6 additions & 1 deletion .gitignore
@@ -2,12 +2,14 @@
__pycache__/*
*.pyc

# Virtual Environment
env/*

# Packaging
build/*
dist/*
scalpl.egg-info/*


# Mypy
.mypy_cache/*

@@ -18,3 +20,6 @@ htmlcov/*
# Tests
reddit.json
.cache/*

# IDE
.vscode/
4 changes: 2 additions & 2 deletions .travis.yml
@@ -9,5 +9,5 @@ install:
- pip3 install black

script:
- pytest tests.py
- black --check scalpl.py tests.py setup.py
- pytest
- black --check scalpl tests setup.py
93 changes: 45 additions & 48 deletions README.rst
@@ -1,4 +1,4 @@
.. image:: https://raw.githubusercontent.com/ducdetronquito/scalpl/master/scalpl.png
.. image:: https://raw.githubusercontent.com/ducdetronquito/scalpl/master/assets/scalpl.png
    :target: https://github.com/ducdetronquito/scalpl

Scalpl
@@ -10,7 +10,7 @@ Scalpl
.. image:: https://img.shields.io/badge/coverage-100%25-green.svg
    :target: #

.. image:: https://img.shields.io/badge/pypi-v0.2.6-blue.svg
.. image:: https://img.shields.io/badge/pypi-v0.3.0-blue.svg
    :target: https://pypi.python.org/pypi/scalpl/

.. image:: https://travis-ci.org/ducdetronquito/scalpl.svg?branch=master
@@ -54,40 +54,34 @@ such as `Addict <https://github.com/mewwts/addict>`_ or
`Box <https://github.com/cdgriffith/Box>`_ , but if you give **Scalpl**
a try, you will find it:

* ⚡ Fast
* 🚀 Powerful as the standard dict API
* ⚡ Lightweight
* 👌 Well tested


Installation
~~~~~~~~~~~~

**Scalpl** is a Python3-only module that you can install via ``pip``
**Scalpl** is a Python3 library that you can install via ``pip``

.. code:: sh

    pip3 install scalpl


Usage
~~~~~
**Scalpl** provides two classes that can wrap around your dictionaries:

- **LightCut**: a wrapper that handles operations on nested ``dict``.
- **Cut**: a wrapper that handles operations on nested ``dict`` and
  that can cut across ``list`` items.
**Scalpl** provides a simple class named **Cut** that wraps around your dictionary,
handles operations on nested ``dict`` and can cut across ``list`` items.

Usually, you will only need to use the ``Cut`` wrapper, but if you do
not need to operate through lists, you should work with the ``LightCut``
wrapper as its computation overhead is a bit smaller.

These two wrappers strictly follow the standard ``dict``
`API <https://docs.python.org/3/library/stdtypes.html#dict>`_, that
This wrapper strictly follows the standard ``dict``
`API <https://docs.python.org/3/library/stdtypes.html#dict>`_, which
means you can operate seamlessly on ``dict``,
``collections.defaultdict`` or ``collections.OrderedDict``.

``collections.defaultdict`` or ``collections.OrderedDict`` by using their methods
with dot-separated keys.
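
To make the idea of dot-separated keys concrete, here is a minimal sketch of how such a path can be resolved against nested dictionaries and lists. It is not Scalpl's implementation: ``split_path``, ``get_by_path`` and ``set_by_path`` are hypothetical helpers written for this illustration only, and the path syntax is assumed to match the ``data.children[0].data.author`` form used in the benchmark of this commit.

```python
import re

# Hypothetical helpers for illustration only; they are NOT part of Scalpl.
# A token is either a plain key ("data") or a list index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def split_path(path):
    """Turn "a.b[0].c" into ["a", "b", 0, "c"]."""
    tokens = []
    for key, index in _TOKEN.findall(path):
        tokens.append(int(index) if index else key)
    return tokens


def get_by_path(data, path):
    """Follow each token of the path through nested dicts and lists."""
    current = data
    for token in split_path(path):
        current = current[token]
    return current


def set_by_path(data, path, value):
    """Walk to the parent container, then assign the last token."""
    *parents, last = split_path(path)
    current = data
    for token in parents:
        current = current[token]
    current[last] = value


data = {"data": {"modhash": "", "children": [{"data": {"author": "aphoenix"}}]}}
print(get_by_path(data, "data.children[0].data.author"))
set_by_path(data, "data.modhash", "dunno")
print(data["data"]["modhash"])
```

The sketch raises the usual ``KeyError`` or ``IndexError`` when a path segment is missing, which mirrors how plain ``dict`` and ``list`` lookups behave.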

Let's see what it looks like with a toy dictionary ! 👇
Let's see what it looks like with an example ! 👇

.. code:: python
@@ -215,6 +209,7 @@ remove all keys.
proxy.clear()
Benchmark
~~~~~~~~~

@@ -223,56 +218,57 @@ of `Scalpl <https://github.com/ducdetronquito/scalpl>`_ compared to `Addict <htt
`Box <https://github.com/cdgriffith/Box>`_ and the built-in ``dict``.

It will summarize the *number of operations per second* that each library is
able to perform on the JSON dump of the `Python subreddit main page <https://www.reddit.com/r/Python.json>`_.
able to perform on a portion of the JSON dump of the `Python subreddit main page <https://www.reddit.com/r/Python.json>`_.

You can run this benchmark on your machine with the following command:

python3 ./performance_tests.py
python3 ./benchmarks/performance_comparison.py

Here are the results obtained on an Intel Core i5-7500U CPU (2.50GHz) with **Python 3.6.4**.

**Addict**::

instanciate:-------- 18,485 ops per second.
get:---------------- 18,806 ops per second.
get through list:--- 18,599 ops per second.
set:---------------- 18,797 ops per second.
set through list:--- 18,129 ops per second.
**Addict** 2.2.1::

instanciate:-------- 271,132 ops per second.
get:---------------- 276,090 ops per second.
get through list:--- 293,773 ops per second.
set:---------------- 300,324 ops per second.
set through list:--- 282,149 ops per second.


**Box**::
**Box** 3.4.2::

instanciate:--------- 4,150,396 ops per second.
get:----------------- 1,424,529 ops per second.
get through list:---- 110,926 ops per second.
set:----------------- 1,332,435 ops per second.
set through list:---- 110,833 ops per second.
instanciate:--------- 4,093,439 ops per second.
get:----------------- 957,069 ops per second.
get through list:---- 164,013 ops per second.
set:----------------- 900,466 ops per second.
set through list:---- 165,522 ops per second.


**Scalpl**::
**Scalpl** latest::

instanciate:-------- 136,517,371 ops per second.
get:---------------- 24,918,648 ops per second.
get through list:--- 12,624,630 ops per second.
set:---------------- 26,409,542 ops per second.
set through list:--- 13,765,265 ops per second.
instanciate:-------- 183,879,865 ops per second.
get:---------------- 14,941,355 ops per second.
get through list:--- 14,175,349 ops per second.
set:---------------- 11,320,968 ops per second.
set through list:--- 11,956,001 ops per second.


**dict**::

instanciate:--------- 92,119,547 ops per second.
get:----------------- 186,290,996 ops per second.
get through list:---- 178,747,154 ops per second.
set:----------------- 159,224,669 ops per second.
set through list :--- 79,294,520 ops per second.
instanciate:--------- 37,816,714 ops per second.
get:----------------- 84,317,032 ops per second.
get through list:---- 62,480,474 ops per second.
set:----------------- 146,484,375 ops per second.
set through list :--- 122,473,974 ops per second.


As a conclusion and despite being ~10 times slower than the built-in
``dict``, **Scalpl** is ~20 times faster than Box on simple read/write
operations, and ~100 times faster when it traverses lists. **Scalpl** is
also ~1300 times faster than Addict.
As a conclusion and despite being an order of magnitude slower than the built-in
``dict``, **Scalpl** is faster than Box and Addict by an order of magnitude for any operation.
Besides, the gap increases in favor of **Scalpl** when wrapping large dictionaries.

However, do not trust benchmarks and test it on a real use-case.
Keeping in mind that this benchmark may vary depending on your use-case, it is very unlikely that
**Scalpl** will become a bottleneck of your application.


Frequently Asked Questions:
@@ -295,6 +291,7 @@ Frequently Asked Questions:
proxy = Cut(data)
proxy['it works perfectly'] = 'fine'


How to Contribute
~~~~~~~~~~~~~~~~~

File renamed without changes
168 changes: 168 additions & 0 deletions benchmarks/perfomance_comparison.py
@@ -0,0 +1,168 @@
from copy import deepcopy
import json
from timeit import timeit
import unittest

from scalpl import Cut

from addict import Dict
from box import Box
import requests


class TestDictPerformance(unittest.TestCase):
    """
    Base class to test performance of different
    dict wrapper regarding insertion and lookup.
    """

    # We use a portion of the JSON dump of the Python Reddit page.

    PYTHON_REDDIT = {
"kind": "Listing",
"data": {
"modhash": "",
"dist": 27,
"children": [
{
"kind": "t3",
"data": {
"approved_at_utc": None, "subreddit": "Python",
"selftext": "Top Level comments must be **Job Opportunities.**\n\nPlease include **Location** or any other **Requirements** in your comment. If you require people to work on site in San Francisco, *you must note that in your post.* If you require an Engineering degree, *you must note that in your post*.\n\nPlease include as much information as possible.\n\nIf you are looking for jobs, send a PM to the poster.",
"author_fullname": "t2_628u", "saved": False,
"mod_reason_title": None, "gilded": 0, "clicked": False, "title": "r/Python Job Board", "link_flair_richtext": [],
"subreddit_name_prefixed": "r/Python", "hidden": False, "pwls": 6, "link_flair_css_class": None, "downs": 0, "hide_score": False, "name": "t3_cmq4jj",
"quarantine": False, "link_flair_text_color": "dark", "author_flair_background_color": "", "subreddit_type": "public", "ups": 11, "total_awards_received": 0,
"media_embed": {}, "author_flair_template_id": None, "is_original_content": False, "user_reports": [], "secure_media": None,
"is_reddit_media_domain": False, "is_meta": False, "category": None, "secure_media_embed": {}, "link_flair_text": None, "can_mod_post": False,
"score": 11, "approved_by": None, "thumbnail": "", "edited": False, "author_flair_css_class": "", "author_flair_richtext": [], "gildings": {},
"content_categories": None, "is_self": True, "mod_note": None, "created": 1565124336.0, "link_flair_type": "text", "wls": 6, "banned_by": None,
"author_flair_type": "text", "domain": "self.Python",
"allow_live_comments": False,
"selftext_html": "&lt;!-- SC_OFF --&gt;&lt;div class=\"md\"&gt;&lt;p&gt;Top Level comments must be &lt;strong&gt;Job Opportunities.&lt;/strong&gt;&lt;/p&gt;\n\n&lt;p&gt;Please include &lt;strong&gt;Location&lt;/strong&gt; or any other &lt;strong&gt;Requirements&lt;/strong&gt; in your comment. If you require people to work on site in San Francisco, &lt;em&gt;you must note that in your post.&lt;/em&gt; If you require an Engineering degree, &lt;em&gt;you must note that in your post&lt;/em&gt;.&lt;/p&gt;\n\n&lt;p&gt;Please include as much information as possible.&lt;/p&gt;\n\n&lt;p&gt;If you are looking for jobs, send a PM to the poster.&lt;/p&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;",
"likes": None, "suggested_sort": None, "banned_at_utc": None, "view_count": None, "archived": False, "no_follow": False, "is_crosspostable": False, "pinned": False,
"over_18": False, "all_awardings": [], "media_only": False, "can_gild": False, "spoiler": False, "locked": False, "author_flair_text": "reticulated",
"visited": False, "num_reports": None, "distinguished": None, "subreddit_id": "t5_2qh0y", "mod_reason_by": None, "removal_reason": None, "link_flair_background_color": "",
"id": "cmq4jj", "is_robot_indexable": True, "report_reasons": None, "author": "aphoenix", "num_crossposts": 0, "num_comments": 2, "send_replies": False, "whitelist_status": "all_ads",
"contest_mode": False, "mod_reports": [], "author_patreon_flair": False, "author_flair_text_color": "dark",
"permalink": "/r/Python/comments/cmq4jj/rpython_job_board/", "parent_whitelist_status": "all_ads", "stickied": True,
"url": "https://www.reddit.com/r/Python/comments/cmq4jj/rpython_job_board/", "subreddit_subscribers": 399170, "created_utc": 1565095536.0,
"discussion_type": None, "media": None, "is_video": False
}
}
]
}
}

    namespace = {
        'Wrapper': dict
    }

    def setUp(self):
        self.data = deepcopy(self.PYTHON_REDDIT)
        self.namespace.update(self=self)

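    def execute(self, statement, method):
        n = 1000
        # timeit returns the total time, in seconds, for n executions.
        time = timeit(statement, globals=self.namespace, number=n)
        print(
            '# ',
            self.namespace['Wrapper'],
            ' - ',
            method,
            ': ',
            # Executions per second: n runs divided by the total elapsed time.
            int(n / time),
            ' ops per second.'
        )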

    def test_init(self):
        self.execute('Wrapper(self.data)', 'instanciate')

    def test_getitem(self):
        self.execute("Wrapper(self.data)['data']['modhash']", 'get')

    def test_getitem_through_list(self):
        statement = (
            "Wrapper(self.data)['data']['children'][0]['data']['author']"
        )
        self.execute(statement, 'get through list')

    def test_setitem(self):
        statement = "Wrapper(self.data)['data']['modhash'] = 'dunno'"
        self.execute(statement, 'set')

    def test_setitem_through_list(self):
        statement = (
            "Wrapper(self.data)['data']['children'][0]"
            "['data']['author'] = 'Captain Obvious'"
        )
        self.execute(statement, 'set through list')


class TestCutPerformance(TestDictPerformance):

    namespace = {
        'Wrapper': Cut
    }

    def test_getitem(self):
        self.execute("Wrapper(self.data)['data.modhash']", 'get')

    def test_getitem_through_list(self):
        statement = (
            "Wrapper(self.data)['data.children[0].data.author']"
        )
        self.execute(statement, 'get through list')

    def test_setitem(self):
        statement = "Wrapper(self.data)['data.modhash'] = 'dunno'"
        self.execute(statement, 'set')

    def test_setitem_through_list(self):
        statement = (
            "Wrapper(self.data)['data.children[0]"
            ".data.author'] = 'Captain Obvious'"
        )
        self.execute(statement, 'set through list')


class TestBoxPerformance(TestDictPerformance):

    namespace = {
        'Wrapper': Box
    }

    def test_getitem(self):
        self.execute("Wrapper(self.data).data.modhash", 'get - 1st lookup')
        self.execute("Wrapper(self.data).data.modhash", 'get - 2nd lookup')

    def test_getitem_through_list(self):
        statement = (
            "Wrapper(self.data).data.children[0].data.author"
        )
        self.execute(statement, 'get through list - 1st lookup')
        self.execute(statement, 'get through list - 2nd lookup')

    def test_setitem(self):
        statement = "Wrapper(self.data).data.modhash = 'dunno'"
        self.execute(statement, 'set - 1st lookup')
        self.execute(statement, 'set - 2nd lookup')

    def test_setitem_through_list(self):
        statement = (
            "Wrapper(self.data).data.children[0]"
            ".data.author = 'Captain Obvious'"
        )
        self.execute(statement, 'set through list - 1st lookup')
        self.execute(statement, 'set through list - 2nd lookup')


class TestAddictPerformance(TestDictPerformance):

    namespace = {
        'Wrapper': Dict
    }


if __name__ == '__main__':
    unittest.main()
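
For readers who want to check the arithmetic behind an "ops per second" figure, the following is a minimal standalone sketch, assuming the rate is defined as the number of executions divided by the total elapsed seconds reported by ``timeit``. The names ``ops_per_second`` and ``sample`` are hypothetical and do not appear in the commit.

```python
from timeit import timeit


def ops_per_second(statement, namespace, number=1000):
    """Return how many times per second `statement` runs, on average."""
    # timeit returns the total elapsed seconds for `number` executions.
    elapsed = timeit(statement, globals=namespace, number=number)
    # Guard against a zero reading on clocks with coarse resolution.
    return int(number / max(elapsed, 1e-9))


# Hypothetical sample data, much smaller than the Reddit dump used above.
sample = {"data": {"modhash": ""}}
rate = ops_per_second("sample['data']['modhash']", {"sample": sample})
print(f"get: {rate:,} ops per second.")
```

Absolute numbers depend heavily on the machine and the Python version, so only ratios between wrappers measured in the same run are meaningful.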