Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider a more Pythonic API #26

Closed
di opened this issue Mar 9, 2020 · 13 comments
Closed

Consider a more Pythonic API #26

di opened this issue Mar 9, 2020 · 13 comments
Assignees

Comments

@di
Copy link
Member

di commented Mar 9, 2020

Hi all, first: thanks for providing this library. It's going to be really useful to have a standard way to work with CloudEvents from Python.

I took a preliminary look at it and my first impression was that the interface provided by this library isn't very "Pythonic", meaning that it doesn't provide an interface in a way that (I believe) would be expected by most Python developers. There are some small things (like using method names like FromRequest instead of from_request, using New in class/method names) but also larger things (like the use of method chaining, setters/getters, etc.)

I'm hoping that since this project is still in alpha, there's still time to improve the interface here (and that y'all are open to suggestions). I'll try to provide some examples below of what I'm talking about but I'm happy to clarify if anything's unclear. I'm also not intimately familiar with the CloudEvents spec, so I may have some misunderstandings as well.

Parsing upstream Event from HTTP Request

The current example for turning an HTTP request into an event:

import io

from cloudevents.sdk.event import v02
from cloudevents.sdk import marshaller

m = marshaller.NewDefaultHTTPMarshaller()
event = m.FromRequest(
    v02.Event(),
    {
        "content-type": "application/cloudevents+json",
        "ce-specversion": "0.2",
        "ce-time": "2018-10-23T12:28:22.4579346Z",
        "ce-id": "96fb5f0b-001e-0108-6dfe-da6e2806f124",
        "ce-source": "<source-url>",
        "ce-type": "word.found.name",
    },
    io.BytesIO(b"this is where your CloudEvent data"), 
    lambda x: x.read()
)
  • This makes the assumption that the user is only interested in supporting a single spec, but I think it's possible that either they would want to support multiple versions (and automatically detect which one is being used), or be able to upgrade to newer versions of the spec (and this module) without having to change their code.
  • There's only one type of marshaller provided by the library right now, HTTPMarshaller, and if it's the default maybe we should just get it by default when creating an Event, instead of making the user initialize it every time?
  • The data_unmarshaller is maybe not necessary? We're taking a bytestring, turning it into a BytesIO object to pass as data, and then using data_unmarshaller to turn it back into a bytestring. Why not just have the user do any data unmarshalling themselves before constructing the event?

Instead, consider:

from cloudevents.sdk import Event

event = Event(
    headers={
        "content-type": "application/cloudevents+json",
        "ce-specversion": "0.2",
        "ce-time": "2018-10-23T12:28:22.4579346Z",
        "ce-id": "96fb5f0b-001e-0108-6dfe-da6e2806f124",
        "ce-source": "<source-url>",
        "ce-type": "word.found.name",
    },
    body=b"this is where your CloudEvent data",
)
  • This initializes a generic Event, detects which spec it matches, and gives back a corresponding Event subclass. (If the user knew which spec they were supporting, instead of constructing an Event they could construct a v02.Event and skip the event detection)
  • The HTTPMarshaller is used by default. This also helps avoid issues like Reusing a marshaller causes failures? #12.
  • The data_unmarshaller field is removed (or at the very least, optional)

Creating a minimal CloudEvent

The provided example is:

from cloudevents.sdk.event import v01

event = (
    v01.Event().
    SetContentType("application/json").
    SetData('{"name":"john"}').
    SetEventID("my-id").
    SetSource("from-galaxy-far-far-away").
    SetEventTime("tomorrow").
    SetEventType("cloudevent.greet.you")
)
  • Similar to above, the user has to specify the exact event type they're expecting
  • Method chaining / fluent API: while this pattern is used by some Python libraries (pandas, various ORMs) it's not typical for the majority of Python libraries and generally not recommended.

Instead, consider the following based on the same Event class proposed above:

from cloudevents.sdk import Event

event = Event(
    data=b'{"name":"john"}',
    content_type="application/json",
    event_id="my-id",
    event_time="tomorrow",
    event_type="cloudevent.greet.you",
    source="from-galaxy-far-far-away",
)
  • The spec is automatically determined, but the user can also use a specific subclass if they prefer.
  • The constructor uses keyword arguments, so that an Event object can be created with a single method call. If the user needs to modify the event after initialization, they can act on the attributes directly (e.g. event.event_time = "yesterday").

Getting event attributes

Currently, after creating an event, if the user wanted to get one of the fields such as the event time, the user would have to do something like:

event = ...
data = event.ce__eventTime.value
  • The ce-prefix is a bit unexpected and the double-underscore is unconventional
  • Having to call .value to get the value is also unexpected

Instead, consider something like:

event = ...
data = event.event_time
  • No prefix or double underscore
  • Access the attribute value directly

Final thoughts

I realize this is proposing a pretty big overhaul of the current interface, but I think doing so would probably go a long way to lend towards the usability and maintainability of this library, and it'd be better to do it sooner than later. I'm happy to help out here, including designing/implementing these changes and helping maintain them afterwards, if it makes sense. Thanks!

@di
Copy link
Member Author

di commented Apr 2, 2020

Seems like there is a similar discussion on the Java SDK here: cloudevents/sdk-java#108

(cc @slinkydeveloper)

@MarcBoissonneault
Copy link

MarcBoissonneault commented Apr 13, 2020

Here is a super basic implementation of a more pythonic CloudEvent

Expand to see
from datetime import datetime
from datetime import timezone
from typing import Any
from typing import Dict
from typing import Iterable
from typing import Optional

_EVENT_KNOWN_ATTRIBUTES: Iterable[str] = (
    "spec_version",
    "id",
    "source",
    "type",
    "subject",
    "time",
    "data_content_type",
    "data_schema",
    "data",
)

_MANGLED_EVENT_KNOWN_ATTRIBUTES: Iterable[str] = tuple([f"_Event__{attr}" for attr in _EVENT_KNOWN_ATTRIBUTES])

class Event:
    __spec_version: str = "1.0"
    __id: str
    __source: str
    __type: str

    __subject: Optional[str] = None
    __time: Optional[datetime] = None
    __data_content_type: Optional[str] = None
    __data_schema: Optional[str] = None
    __data: Optional[Any] = None

    __extensions: Optional[Dict[str, Any]] = {}

    def __init__(
            self,
            spec_version: str,
            id: str,
            source: str,
            type: str,
            subject: Optional[str] = None,
            time: Optional[datetime] = None,
            data_content_type: Optional[str] = None,
            data_schema: Optional[str] = None,
            data: Optional[Any] = None,
            **extensions: Any
    ) -> None:
        self.__spec_version = spec_version
        self.__id = id
        self.__source = source
        self.__type = type

        self.__time = time or datetime.now(timezone.utc)

        if subject is not None:
            self.__subject = subject

        if data_content_type is not None:
            self.__data_content_type = data_content_type
        if data_schema is not None:
            self.__data_schema = data_schema
        if data is not None:
            self.__data = data
        if extensions is not None:
            for extension_key, extension_value in extensions.items():
                setattr(self, extension_key, extension_value)

    @property
    def spec_version(self) -> str:
        return self.__spec_version

    @spec_version.setter
    def spec_version(self, value: str) -> None:
        self.__spec_version = value

    @property
    def id(self) -> str:
        return self.__id

    @id.setter
    def id(self, value: str) -> None:
        self.__id = value

    @property
    def source(self) -> str:
        return self.__source

    @source.setter
    def source(self, value: str) -> None:
        self.__source = value

    @property
    def type(self) -> str:
        return self.__type

    @type.setter
    def type(self, value: str) -> None:
        self.__type = value

    @property
    def subject(self) -> Optional[str]:
        return self.__subject

    @subject.setter
    def subject(self, value: str) -> None:
        self.__subject = value

    @property
    def time(self) -> Optional[datetime]:
        return self.__time

    @time.setter
    def time(self, value: datetime) -> None:
        self.__time = value

    @property
    def data_content_type(self) -> Optional[str]:
        return self.__data_content_type

    @data_content_type.setter
    def data_content_type(self, value: str) -> None:
        self.__data_content_type = value

    @property
    def data_schema(self) -> Optional[str]:
        return self.__data_schema

    @data_schema.setter
    def data_schema(self, value: str) -> None:
        self.__data_schema = value

    @property
    def data(self) -> Optional[Any]:
        return self.__data

    @data.setter
    def data(self, value: Any) -> None:
        self.__data = value

    def __getattr__(self, item: str) -> Any:
        return self.__extensions[item]

    def __setattr__(self, key: str, value: Any) -> None:
        if key in _EVENT_KNOWN_ATTRIBUTES or key in _MANGLED_EVENT_KNOWN_ATTRIBUTES:
            super().__setattr__(key, value)
        else:
            self.__extensions[key] = value

This implementation allow to use pythonic style Event creation with keyword arguments for all Required, Optional or Extras attributes.

event = Event(spec_version="v1",
              id=str(uuid.uuid4()),
              source="event-source",
              type="event-type",
              subject="event-subject",
              time=datetime(2020, 4, 10, tzinfo=timezone.utc),
              data=["abc", 123, True],
              extra_parameter="extra-parameter-value")

print(event.spec_version)                  # --> v1
print(event.subject)                       # --> event-subject
print(event.time.isoformat())              # --> 2020-04-10T00:00:00+00:00
print(event.data)                          # --> ['abc', 123, True]
print(event.extra_parameter)               # --> extra-parameter-value

# Changing subject (Normal attributes)
event.subject = "new-subject"
print(event.subject)                       # --> new-subject

# Adding extra attributes
event.other_parameter = "other-value"
print(event.other_parameter)               # --> other-value

The big missing part is the serialization and deserialization

@TBBle
Copy link

TBBle commented Apr 22, 2020

One other thing to deal with when talking about the API is to be careful not to lock-in the HTTP binding's headers as the interchange format. We are using this SDK with RabbitMQ, and the AMQP bindings have different header mappings in Binary mode, for example.

The JSON Event Format support for the Structured content mode also assumes HTTP (it's in the name of the class...) but happens to have the same header name in AMQP, so we're lucky there.

At some point soon, we'll be using this with Kafka (eventually replacing RabbitMQ) and I haven't checked how well those bindings align in Structured mode. As part of this, we may want the AVRO Event Format support too.

This might mean that the current converter/marshaller/event relationship needs readjustment, since Structured mode is co-operation between an Event Format and a Protocol Binding, while Binary mode is entirely up to the Protocol Binding. Currently the JSON Event Format implementation has HTTP-specific bits inside. It should be returning a collection of Cloud Events attributes, not HTTP headers.

Note: I've only looked at the v1.0 spec, it's possible that this stuff was differently-structured in earlier releases.

@evankanderson
Copy link
Contributor

A slight caveat on #26 (comment)

It might make more sense to clearly split the Context Attributes from the Event Payload.

It would also be handy to default or require the required attributes, which are:

  • id: Non-empty, default to random string
  • source: Non-empty, should be meaningful (required param)
  • specversion: Required value, should default
  • type: Non-empty, should be meaningful (required param)

@cumason123
Copy link
Contributor

cumason123 commented Jun 26, 2020

@di The current CloudEvent class #47 allows users to pass in http headers/data content as dictionaries and it'll automatically parse the dicts for cloudevent data.

event = CloudEvent(data, headers=headers)

But this issue declares a more pythonic constructor to be the following:

event = CloudEvent(spec_version="v1",
              id=str(uuid.uuid4()),
              source="event-source",
              type="event-type",
              subject="event-subject",
              time=datetime(2020, 4, 10, tzinfo=timezone.utc),
              data=["abc", 123, True],
              extra_parameter="extra-parameter-value",
              isbinary=True
)

This constructor while more descriptive of what it wants as arguments, can provide a little more overhead when receiving events as a user must first extract data from the http request. Naturally, **kwargs could provide a slightly more convenient constructor via:

event = CloudEvent(**headers, **body, isbinary=True)

So I suppose my question is I plan on refactoring the current CloudEvent constructor to take more key named arguments instead of a vague dict, but for canonical sample code is it appropriate to show instantiating CloudEvents with **kwargs for convenience?

@evankanderson
Copy link
Contributor

Coming here from #47 , I think there's been a bit of a misunderstanding (either by myself or by one of the other authors here). Looking at the new CloudEvent class, it looks like it assumes that the payload of a CloudEvent is necessarily JSON.

According to my reading of the spec, the CloudEvent payload is not always JSON:

This specification does not place any restriction on the type of this information. It is encoded into a media format which is specified by the datacontenttype attribute (e.g. application/json), and adheres to the dataschema format when those respective attributes are present.

Furthermore, when encoding a CloudEvent with non-JSON data, the data may need to be converted to a base-64 encoded string (if Binary), or serialized as a JSON string (if not). I suspect that the current signature of "CloudEvent" in #47 and mentioned in #26 (comment) probably won't work for e.g. datacontenttype="image/png", because the type of data in the constructor is typing.Union[dict, None], rather than typing.Any (or some sort of type.HasStrMethod).

@evankanderson
Copy link
Contributor

I'm probably going to try to add this to #34, since I'm needing to rework http_events.py to get the tests to pass.

@evankanderson
Copy link
Contributor

evankanderson commented Jun 27, 2020

Proposed API. Note that the core data is version- and encoding- agnostic, based on the v1 interface, and that serialization specifies (with a default) whether structured or binary encoding should be used.

class CloudEvents():
  @classmethod
  def FromHttp(cls, headers, body, unmarshaller=json.loads) -> CloudEvent:
    """Automatically determines version and encoding, decodes to standard format.
    Throws an exception if required fields are missing."""

  def  __init__(self, attributes:dict={}, data=typing.Any):
    """Constructs an object from cloudevents attributes (no "ce-" prefix) and optional data.
    Fills in defaults as needed."""

  def ToHTTP(self, format: base.Converter=converters.HTTPStructured) -> (dict, typing.IO):
    "Creates HTTP header & payload for the specified encoding format."

  # The `data` attribute in the CluodEvent may be used to access the stored data.

  # Implement map type for attributes
  def __getitem__(self, attribute: string) -> string

  def __setitem__(self, attribute: string, value: string)

  def __delitem__(self, attribute: string)

  def __iter__(self) -> typing.Iterable[string]

  def __len__(self) -> int

  def __contains__(self, attribute: string) -> bool

@evankanderson
Copy link
Contributor

Rationale for implementing Mapping rather than free-form attributes with __getitem__: it's hard to perform iteration based on attributes.

@evankanderson
Copy link
Contributor

I've updated #34 with an implementation of my proposed API.

@grant
Copy link
Member

grant commented Jun 29, 2020

Thanks for the comments and raising the regression @evankanderson.
And thank you for the extensive effort to update #34.

I've pinged @cumason123 to look at #34 and we should aim to merge that PR next.

@lovejoy
Copy link

lovejoy commented Sep 27, 2020

is there a demo for use this with kafka like https://github.com/cloudevents/sdk-java/tree/master/examples/kafka

@TBBle
Copy link

TBBle commented Sep 27, 2020

Not as far as I know. As I noted earlier (and it hasn't changed, that I can see) the only implementations for both 'structured' and 'binary' Content Modes are for the HTTP Protocol Binding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants