Does unipop support modeling a single table data as a 'virtual' graph? #111

Open
sorryya opened this issue Nov 8, 2017 · 3 comments
sorryya commented Nov 8, 2017

For example, if an Elasticsearch document contains several vertices and edges, each represented by a set of fields from the document, how should the mapping file be written?

@seanbarzilay
Member

@sorryya Did you mean the kind of mapping described under "Inner Edges"?

Author

sorryya commented Nov 10, 2017

I mean an Elasticsearch document like this:

{
    "_index": "xxx",
    "_type": "yyy",
    "_id": "AV-VSXTUbcKGrP6qekMg",
    "_source": {
        "field_1": "1111",
        "field_2": "2222",
        "field_3": "3333",
        "field_4": "4444",
        "field_5": "5555",
        "field_6": "6666",
        "field_7": "7777",
        "field_8": "8888",
        "field_9": "9999"
    }
}

My scenario:

  1. Each document represents an event, and I want to model a graph of the co-occurrence relations between the objects in the event.
  2. Some fields describe the event and map to edges; others describe the objects and map to vertices.
  3. So a single field may serve as an id or a property for several edges or vertices, and a field used as a vertex id may have duplicate values across documents.
  4. An "id" may be composed of several fields.
  5. The "index" should cover all or some of the indexes in Elasticsearch.

Can the mapping file look like this?

{
  "class": "org.unipop.elastic.ElasticSourceProvider",
  "clusterName": "escluster",
  "addresses": "http://localhost:9200",
  "edges": [
    {
      "index": "*",
      "id": {
        "fields": ["some_value", "@_id"],
        "delimiter": "+"
      },
      "label": "lable_e1",
      "properties": {
        "field_1": "@field_1",
        "field_2": "@field_2",
        "field_3": "@field_3"
      },
      "outVertex":{
        "ref": false,
        "id": "@field_4",
        "label": "lable_v1",
        "properties": {
          "field_5": "@field_5"
        }
      },
      "inVertex":{
        "ref": false,
        "id": {
          "fields": ["@field_6", "@field_7"],
          "delimiter": "+"
        },
        "label": "lable_v2",
        "properties": {
          "property_name": {
            "fields": ["@field_6", "@field_7"],
            "delimiter": "+"
          }
        }
      }
    },
    {
      "index": "*",
      "id": "@_id",
      "label": "lable_e2",
      "properties": {
        "field_1": "@field_1",
        "field_2": "@field_2",
        "field_5": "@field_4",
        "field_7": "@field_8"
      },
      "outVertex":{
        "ref": false,
        "id": "@field_4",
        "label": "lable_v3",
        "properties": {
          "field_5": "@field_5"
        }
      },
      "inVertex":{
        "ref": false,
        "id": "@field_9",
        "label": "lable_v4",
        "properties": {
          "field_9": "@field_9"
        }
      }
    }
  ]
}
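
To make the intended model concrete, here is a minimal Python sketch of how one event document would expand into an edge with two embedded vertices under the first edge mapping above. It is not unipop code and says nothing about how unipop resolves mappings internally. It assumes, as the mapping is written, that a value starting with "@" reads a document field, a bare string is a constant, and a "fields"/"delimiter" object joins several values. The names `resolve`, `expand_vertex` and `expand_edge` are hypothetical.

```python
def resolve(spec, hit):
    """Resolve one mapping value against an Elasticsearch hit (assumed semantics)."""
    if isinstance(spec, dict):
        # Composite value: resolve each part, then join with the delimiter.
        parts = [str(resolve(part, hit)) for part in spec["fields"]]
        return spec["delimiter"].join(parts)
    if isinstance(spec, str) and spec.startswith("@"):
        name = spec[1:]
        # "@_id" reads the document id; any other "@name" reads from _source.
        return hit["_id"] if name == "_id" else hit["_source"][name]
    return spec  # bare string: treated as a constant


def expand_vertex(vertex_map, hit):
    """Build the vertex embedded in an edge mapping."""
    return {
        "id": resolve(vertex_map["id"], hit),
        "label": vertex_map["label"],
        "properties": {k: resolve(v, hit) for k, v in vertex_map["properties"].items()},
    }


def expand_edge(edge_map, hit):
    """Build one edge and its two endpoint vertices from a single hit."""
    return {
        "id": resolve(edge_map["id"], hit),
        "label": edge_map["label"],
        "properties": {k: resolve(v, hit) for k, v in edge_map["properties"].items()},
        "outV": expand_vertex(edge_map["outVertex"], hit),
        "inV": expand_vertex(edge_map["inVertex"], hit),
    }


# The sample document from this issue, trimmed to the fields the mapping uses.
hit = {
    "_id": "AV-VSXTUbcKGrP6qekMg",
    "_source": {"field_1": "1111", "field_4": "4444", "field_5": "5555",
                "field_6": "6666", "field_7": "7777"},
}

# The first edge mapping from this issue, trimmed to one edge property.
edge_map = {
    "id": {"fields": ["some_value", "@_id"], "delimiter": "+"},
    "label": "lable_e1",
    "properties": {"field_1": "@field_1"},
    "outVertex": {"id": "@field_4", "label": "lable_v1",
                  "properties": {"field_5": "@field_5"}},
    "inVertex": {"id": {"fields": ["@field_6", "@field_7"], "delimiter": "+"},
                 "label": "lable_v2", "properties": {}},
}

edge = expand_edge(edge_map, hit)
```

Under these assumptions, every document yields exactly one edge per edge mapping, while two documents with the same field_4 value yield the same out-vertex id.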

With this mapping, I have run into the following problems:

  1. If I set "ref" to false in "outVertex" or "inVertex", it throws java.lang.NullPointerException.
  2. The number of edges returned is far lower than expected: g.E().count() returns 9881, but there are 7242721 Elasticsearch documents.
  3. If I define all vertices inside edges in the mapping file, g.V().count() returns 0.
  4. If I define the vertices independently (not inside edges), the number of vertices is still too low: g.V().values("field_4").count() returns 251, but the distinct count of field_4 (used as the vertex id and a property) in Elasticsearch is 753.
  5. When I query with has(...), I get no results.
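
One way to sanity-check the counts in items 2 and 4 is to compute, outside unipop, how many distinct ids the mapping should produce. The sketch below is plain Python over a list of exported hits, not a unipop or Elasticsearch API. It assumes that edges and vertices with the same id collapse into one element, so the expected totals are distinct-id counts rather than document counts. The helper name `expected_counts` and the sample data are hypothetical.

```python
def expected_counts(hits, edge_key, vertex_keys):
    """Return (distinct edge ids, distinct vertex ids) for a list of hits.

    edge_key    -- function mapping a hit to its edge id
    vertex_keys -- functions mapping a hit to the id of each vertex it embeds
    """
    edge_ids = {edge_key(hit) for hit in hits}
    vertex_ids = {key(hit) for hit in hits for key in vertex_keys}
    return len(edge_ids), len(vertex_ids)


# Three made-up events; two share field_4 and two share field_9.
hits = [
    {"_id": "a", "_source": {"field_4": "x", "field_9": "p"}},
    {"_id": "b", "_source": {"field_4": "x", "field_9": "q"}},
    {"_id": "c", "_source": {"field_4": "y", "field_9": "q"}},
]

# Ids as in the second edge mapping: edge id "@_id", vertex ids "@field_4" and "@field_9".
# Each vertex id is paired with its label so the two vertex types stay separate.
counts = expected_counts(
    hits,
    edge_key=lambda hit: hit["_id"],
    vertex_keys=[
        lambda hit: ("lable_v3", hit["_source"]["field_4"]),
        lambda hit: ("lable_v4", hit["_source"]["field_9"]),
    ],
)
```

If "_id" is unique across the queried indexes, an edge id of "@_id" should give one edge per document, so these figures are the baseline to compare g.E().count() and g.V().count() against.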

@seanbarzilay
Member

@sorryya I haven't tested a schema where both vertices are non-reference vertices, so I will fix this and release a patch in the next few days.

seanbarzilay self-assigned this Nov 15, 2017