Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental branch on aggregation operators for spaces #7

Open
ghost opened this issue Oct 18, 2017 · 12 comments
Open

Experimental branch on aggregation operators for spaces #7

ghost opened this issue Oct 18, 2017 · 12 comments

Comments

@ghost
Copy link

ghost commented Oct 18, 2017

The branch aggregation contains a new interface and new operators for aggregation:

  • t, _ := s.PutAgg(f, x...): Put a tuple to space s obtained by using a user defined aggregation function f on the tuples matched by a template x.... If no tuples are matched, the template x... is converted to its intrinsic tuple, that is, the concrete values read from the template itself, and put in s instead.
  • t, _ := s.GetAgg(f, x...): Get a tuple from space s obtained by using a user defined aggregation function f on the tuples matched by a template x..., while removing the tuples in the process.
  • t, _ := s.QueryAgg(f, x...): Query for a tuple from space s obtained by using a user defined aggregation function f on the tuples matched by a template x..., leaving space s intact.

All operators are non-blocking and return a copy of aggregate tuple t. The returned copy can be ignored.

Below is a visualisation of the three operators PutAgg(), GetAgg() and QueryAgg(). t corresponds to a tuple returned, while template T = x.... The empty tuple is (), and small crosses, circles and squares indicate tuples of possibly different type.

aggregation

The interface referred to here will be referred to as the Star Library.

@ghost ghost self-assigned this Oct 18, 2017
@ghost
Copy link
Author

ghost commented Oct 28, 2017

Example:

package main

import (
	"fmt"
	. "github.com/pspaces/gospace"
)

func sum(a Tuple, b Tuple) Tuple {
	s := make([]interface{}, a.Length())

	for i := 0; i < a.Length(); i += 1 {
		va := a.GetFieldAt(i).(int)
		vb := b.GetFieldAt(i).(int)
		s[i] = va + vb
	}

	return CreateTuple(s...)
}

func main() {
	spc := NewSpace("space")

	spc.Put(1, 2, 3, 4, 5)
	spc.Put(2, 3, 4, 5, 6)
	spc.Put(3, 4, 5, 6, 7)
	spc.Put(4, 5, 6, 7, 8)
	spc.Put(5, 6, 7, 8, 9)

	var i, j, k, l, m int

	q, _ := spc.Query(1, &i, &j, &k, &l)
	fmt.Printf("Query: %v\n", q)
	fmt.Printf("Size: %v\n", spc.Size())

	a, _ := spc.QueryAgg(sum, &i, &j, &k, &l, &m)
	fmt.Printf("QueryAgg: %v\n", a)
	fmt.Printf("Size: %v\n", spc.Size())

	b, _ := spc.GetAgg(sum, &i, &j, &k, &l, &m)
	fmt.Printf("GetAgg: %v\n", b)
	fmt.Printf("Size: %v\n", spc.Size())

	c, _ := spc.PutAgg(sum, "a", &a, "b", &b)
	fmt.Printf("PutAgg: %v\n", c)
	fmt.Printf("Size: %v\n", spc.Size())

	spc.Put(1, 2, 3, 4, 5)
	spc.Put(2, 3, 4, 5, 6)
	spc.Put(3, 4, 5, 6, 7)
	spc.Put(4, 5, 6, 7, 8)
	spc.Put(5, 6, 7, 8, 9)

	d, _ := spc.PutAgg(sum, &i, &j, &k, &l, &m)
	fmt.Printf("PutAgg: %v\n", d)
	fmt.Printf("Size: %v\n", spc.Size())
}

Outputs:

Query: (1, 2, 3, 4, 5)
Size: 5
QueryAgg: (15, 20, 25, 30, 35)
Size: 5
GetAgg: (15, 20, 25, 30, 35)
Size: 0
PutAgg: ("a", (15, 20, 25, 30, 35), "b", (15, 20, 25, 30, 35))
Size: 1
PutAgg: (15, 20, 25, 30, 35)
Size: 2

@ghost
Copy link
Author

ghost commented Oct 28, 2017

@albertolluch @mshPolo

See commit 3cc3d77.

Comments are welcome.

@albertolluch
Copy link
Member

Operation PutAgg is not clear to me.

@ghost
Copy link
Author

ghost commented Oct 29, 2017

Description has been clarified and visualisation of the operators are provided for easier understanding.

@albertolluch
Copy link
Member

albertolluch commented Oct 29, 2017

It seems to me that PutAgg can be implemented/defined via other operations as follows.

t, _ := s.PutAgg(f, x...)

can be obtained with:

t, _ := s.GetAgg(f, x...)
s.Put(t)

If this is indeed the case, I don't see the need of PutAgg. I guess it is a matter of minimality vs symmetry in the API. Is that the point?

The meaning of the template x... is converted to its intrinsic tuple is not cleary to me. What is the intrinsic tuple of template &x?

On a related note, the behaviour of aggregations when the set of matching tuples is the empty set should be explained. I would say that every aggregation function f should be defined for multisets and hence, the value of f(nil) should be well defined.

@sequenze
Copy link
Member

t, _ := s.PutAgg(f, x...): Put a tuple to space s obtained by using a user defined aggregation function f on the tuples matched by a template x.... If no tuples are matched, the template x... is converted to its intrinsic tuple, that is, the concrete values read from the template itself, and put in s instead.

If im understanding correctly, PutAgg takes a function f, and a template to match the tuples in space s. For all tuples which match the template will be aggregated over. The result is then stored in the tuple space. If however, no tuples match the template, then the template is instead considered a tuple itself and subsequently stored in the tuple space?

a) I get the first part. That is a natural aggregation. But I do not understand the second part. What is the utility in storing the pattern / template?

b) Im going out on a limb here. But it looks like PutAgg is trying to do too much in one operation. not unlike a god-functions / god classes.

  1. This only works for local spaces right?

Other than that it looks nice :D

@albertolluch
Copy link
Member

@sequenze: me and @luhac discussed this offline and the description will be updated. Long story short: putAgg is getAgg+put in one shot, atomically.

I can see this as a frequent pattern when you want do reduce/abstract large amounts of data. A name like "reduce" or "compress" could be more faithful to the intended semantics.

As for having it in the API, I posed the same question. Minimality of the API would argue against including the operation. @luhac argues here for some form of symmetry/homogeneity in the API (same argument for the current name). I guess that efficiency could be also an issue here: atomicity of the operation can be better handled directly wrt. requiring the programmer to get it using lock tuples, for the same reason as we have query (which can be obtained as get+put).

@sequenze
Copy link
Member

sequenze commented Nov 11, 2017

@sequenze: me and @luhac discussed this offline and the description will be updated. Long story short: putAgg is getAgg+put in one shot, atomically.

I can see this as a frequent pattern when you want do reduce/abstract large amounts of data. A name like "reduce" or "compress" could be more faithful to the intended semantics.

I get that part. That makes sense. Having the possibility of aggregating and putting in one atomic operation is nice.

My question was merely related to the storing of the pattern?
As far as i understand, it stores the pattern/template when no tuples match the pattern/template.
Wouldn't it be more true to just not put any tuple at all? considering its a non-blocking call then perhaps have it return a boolean. true if a tuple was created; otherwise false.

Edit: Why is PuttAgg also returning the values that was put'ted?

@albertolluch
Copy link
Member

We also discussed this.

In my view aggregation functions f must have multisets (unordereded lists) of tuples as domain and tuples as co-domain.

If there is no matching tuple, the resulting aggregated tuple should just be f(\emptyset).

Recall that f is user-defined. In my view it is the responsibility of the programmer to provide a well-formed f. For many aggregation functions, the result of f(\emptyset) is the identity of the function. Examples: sum(\emptyset) is 0, max(\emptyset) is the minimumu value in the corresponding domain, dually for min().

For other functions we have different story. For example, avg(\emptyset) is nonsense. The result should be undefined. When it comes to reducing a set of tuples I would go here for just not adding tuple to the tuple space. This could be accomplished by requiring f() to return an error in those cases, so that putAgg knows that nothing is to be done.

Alternatively, and more simply: the semantics of putAgg when no tuple is matched is just to do nothing.

I see advantages/disadvantages in both solutions.

@albertolluch
Copy link
Member

PS. the rationale behind putAgg returning the tuple added is the same as for the normal put in Go. @luhac proposed this to make put have a similar return signature as get/query. I said ok for trying this experimental feature. It may be interesting for instance if one applies some form of access control that may accept tuples after modifying them (e.g. sanitizing/anonymizing some fields) and may notify the user how the modified tuple is actually stored.

@sequenze
Copy link
Member

This quote just popped into my head: "Better to remain silent and be thought a fool than to speak and to remove all doubt."

So it occurs to me that i am better of being silent :D

@albertolluch
Copy link
Member

I don't agree. @luhac and me forgot to update this thread, and all your concerns/doubts were more than reasonable. Thanks for your feedback!

@ghost ghost removed the help wanted label Nov 12, 2017
@ghost ghost removed the in-process label Jan 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants