biggest parts done
jitsedesmet committed Jun 9, 2024
1 parent eab6625 commit c4716af
Showing 5 changed files with 128 additions and 89 deletions.
12 changes: 12 additions & 0 deletions presentation/final-assets/custom.css
@@ -0,0 +1,12 @@
.fragment.current-bold {
&.current-fragment {
font-weight: bold;
}
}

.fragment.out-bold {
font-weight: bold;
&.visible {
font-weight: normal;
}
}
Binary file added presentation/final-assets/flow-rdf-create.png
Binary file added presentation/final-assets/flow-rdf-update.png
Binary file added presentation/final-assets/sgv-graph.png
205 changes: 116 additions & 89 deletions presentation/final-presentation.html
@@ -9,6 +9,7 @@
<link rel="stylesheet" href="/static/revieljs/dist/reset.css">
<link rel="stylesheet" href="/static/revieljs/dist/reveal.css">
<link rel="stylesheet" href="/static/revieljs/dist/theme/white.css">
<link rel="stylesheet" href="/presentation/final-assets/custom.css">

<!-- Theme used for syntax highlighted code -->
<!-- <link rel="stylesheet" href="plugin/highlight/monokai.css">-->
@@ -208,82 +209,69 @@ <h3>Research Question and Hypothesis</h3>
</div>

<aside class="notes" data-markdown>
I therefore **ask the question**: "How to abstract data updates in a permissioned decentralized environment behind a query abstraction layer?"
I therefore **ask the question**: "How can we abstract data updates over a document-oriented interface of a permissioned decentralized environment behind a query abstraction layer?"
Let's investigate that question in more detail ...
</aside>
</section>

<section data-auto-animate>
<h3><span data-id="past">The Past:</span> Getting to Work</h3>
<h3>Solution</h3>

<ul>
<li>What is LDP?</li>
<li>Can we use Shape Trees for updates?</li>
<li>Heterogeneity of Data →
<span class="fragment custom out-bold" data-fragment-index="1">Describe Data</span>
</li>
<li>
<span class="fragment custom current-bold" data-fragment-index="1">Heterogeneity of Structure</span>
<span class="fragment custom current-bold" data-fragment-index="2"> Describe structure</span>
</li>
</ul>

<div class="r-stack" style="font-size: 0.6em">
<div style="width: 100%; border: red solid 2px;" class="fragment fade-out" data-fragment-index="1">
Example: LDP Container
<pre style="margin-top: 0;"><code class="language-plaintext">
&lt;http://example.org/c1/&gt;
a ldp:BasicContainer;
dcterms:title "A very simple container";
ldp:contains &lt;r1&gt;, &lt;r2&gt;, &lt;r3&gt;.
Can use a shape description language like <a href="https://shex.io/">ShEx</a> or <a href="https://www.w3.org/TR/shacl/">SHACL</a>.
<br/>
Example: SHACL Shape Description of a social media post.
<pre style="margin-top: 0;"><code class="language-plaintext" style="padding: 0 20px; margin: 0; overflow-y: clip">
Wow such a nice social media example
</code></pre>
</div>
<div style="width: 100%; border: red solid 2px;" class="fragment fade-in-then-out" data-fragment-index="1">
Example: LDP Structure
<div style="display: grid; grid-gap: 5px; grid-template-columns: 1fr 1fr">
<pre style="margin-top: 0; padding: 20px">
pictures/
|- Valencia/
| |- one.ttl
| |- two.ttl
posts/
|- Valencia
| |- #one
| |- #two
|- Ghent/
| |- one.ttl
| |- two.ttl
| |- #one
| |- #two
|- Paris/
| |- one.ttl
| |- two.ttl
| |- three.ttl
|- missing.ttl
| |- #one
| |- #two
| |- #three
</pre>
<pre style="margin-top: 0; padding: 20px">
pictures/
posts/
|- 30-01-2024/
| |- one.ttl
| |- two.ttl
| |- #one
| |- #two
|- 14-02-2024/
| |- one.ttl
| |- two.ttl
| |- #one
| |- #two
|- 17-05-2023/
| |- one.ttl
| |- two.ttl
| |- three.ttl
| |- four.ttl
| |- #one
| |- #two
| |- #three
| |- #four
</pre>
</div>
</div>
<div style="width: 100%; border: red solid 2px;" class="fragment fade-in-then-out">
Example: SHACL Shape Description
<pre style="margin-top: 0;"><code class="language-plaintext" style="padding: 0 20px; margin: 0; overflow-y: clip">
ex:PictureShape
a sh:NodeShape;
sh:targetClass ex:Picture ;
sh:property [
sh:path ex:depicts ;
sh:minCount 1 ;
sh:maxCount 1 ;
sh:datatype xsd:string ;
] ;
sh:property [
sh:path ex:contains ;
sh:nodeKind sh:IRI ;
] .
</code></pre>
</div>
<div style="width: 100%; border: red solid 2px;" class="fragment fade-in-then-out">
Example: Shape Trees
<div style="width: 100%; border: red solid 2px;" class="fragment fade-in-then-out" data-fragment-index="2">
Can use indexes like <a href="https://solid.github.io/type-indexes/">Type Indexes</a> or <a href="https://shapetrees.org/">Shape Trees</a>.
<br/>
Example: Shape Trees describing a list of files
<pre style="margin-top: 0;"><code class="language-plaintext" style="padding: 0 20px; margin: 0; overflow-y: clip">
<#PicturesTree>
a st:ShapeTree ;
@@ -317,9 +305,8 @@ <h3><span data-id="past">The Past:</span> Getting to Work</h3>


<aside class="notes" data-markdown>
As I've mentioned before, to limit the scope of my thesis, I focus on the current tech stack of Solid.
Solid uses the **LDP interface** and **adds structural information** through Shape Trees used as an index.
LDP provides some nice interface to essentially model a file system using Linked Data.
To create an automated client, we essentially need to figure out what files hold what data and why.


Such a **file system can structure files in a variety of ways**.
With the help of Shape Trees we can understand the structure.
@@ -333,33 +320,33 @@ <h3><span data-id="past">The Past:</span> Getting to Work</h3>
</section>

<section data-auto-animate>
<h3><span data-id="past">The Past:</span> Getting to Work</h3>
<h3>What are we missing?</h3>

<div>
<ol style="column-count: 2; padding: 5px; font-size: 20pt; list-style-position: inside">
<li>What if multiple directories match?
<ul>
<ul style="display: inline-block">
<li>Do I duplicate?</li>
<li>Is one canonical and the other one links to the resource saved in the canonical?</li>
<li>And how do I decide which one is canonical?</li>
</ul>
</li>
<li>What if no directories match?</li>
<li>How are resources grouped?
<ul>
<ul style="display: inline-block">
<li>Can I just infer that the picture-by-date example is exactly that?</li>
<li>What if I need to create a new date directory?</li>
</ul>
</li>

<li>
Is that new directory I created a leaf?
<ul>
<ul style="display: inline-block">
<li>Or should I make even more directories? (Can be inferred from Shape Tree)</li>
</ul>
</li>
<li>What to do if a resource is changed?
<ul>
<ul style="display: inline-block">
<li>Should I alter the Shape Tree?</li>
<li>Should I move the resource?</li>
<li>Do I have a distance metric, and do I move when the distance is too great?</li>
@@ -383,56 +370,96 @@ <h3><span data-id="past">The Past:</span> Getting to Work</h3>
</section>

<section data-auto-animate>
<h3><span data-id="future">The Future:</span> Overview</h3>
<h3>Storage Guidance Vocabulary</h3>

<ul>
<li>Adapt <a href="https://comunica.dev/">Comunica</a> to allow update queries by interpreting SGV</li>
<li>Alter <a href="https://github.com/SolidBench/SolidBench.js">SolidBench</a>, so we can measure</li>
<li>Feedback Loop: Measure and Adapt</li>
</ul>
<picture>
<img src="final-assets/sgv-graph.png" style="max-height: 600px; max-width: 700px; object-fit: contain"
alt="Schematic overview of the Storage Guidance Vocabulary">
</picture>

<aside class="notes" data-markdown>
- *Resource Collection*: Corresponds to a group of RDF resources.
- *Unstructured Collection*: Corresponds to a classical LDP container or HTTP resource.
- *Structured Collection*: A canonical or derived collection. (below)
- *Canonical Collection*: A resource collection containing resources.
- *Derived Collection*: A resource collection that stores resources already stored by one or more other structured collections.
- *Resource Description*: A way of describing resources, for example through ShEx or SHACL.
- *Group Strategy*: A description of how resources should be grouped together, for example: my images are grouped per creation date.
- *Store Condition*: When multiple collections are eligible to store a resource, the store condition decides which collection(s) actually store the resource. This allows the creation of a store priority system.
- *Update Condition*: Describes what to do when a contained resource is changed.
- *Client Control*: Describes the amount of freedom a client has when trying to store a resource.
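
The concepts above can be pictured as a small data model. This is only an illustrative Python sketch, not the SGV itself: every class and field name is hypothetical, the resource description is reduced to a plain predicate instead of ShEx or SHACL validation, and attaching client control to a collection is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResourceDescription:
    # Stand-in for a ShEx/SHACL shape: a predicate over a resource.
    matches: Callable[[dict], bool]
    # Hypothetical labels; the real vocabulary defines its own terms.
    store_condition: str = "always"
    update_condition: str = "keep-in-place"


@dataclass
class StructuredCollection:
    uri: str
    descriptions: List[ResourceDescription]
    # Group strategy: maps a resource to a relative location.
    group_strategy: Callable[[dict], str]
    # Assumption: client control is attached to the collection.
    client_control: str = "none"


@dataclass
class CanonicalCollection(StructuredCollection):
    """Holds the resources themselves."""


@dataclass
class DerivedCollection(StructuredCollection):
    """Stores resources already stored by other structured collections."""
    derived_from: List[StructuredCollection] = field(default_factory=list)


# Example: pictures grouped per creation date.
pictures = CanonicalCollection(
    uri="https://pod.example/pictures/",
    descriptions=[ResourceDescription(matches=lambda r: r.get("type") == "Picture")],
    group_strategy=lambda r: r["created"] + "/",
)
```

The unstructured collection is left out on purpose: it carries no description or group strategy, so there is nothing to model beyond its URI.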
</aside>
</section>

<section data-auto-animate>
<h3 data-id="eval"><span data-id="future">The Future:</span> Evaluation</h3>

<div style="text-align: left">
Experiments using SolidBench:
<ul>
<li>Extend SolidBench with SGV descriptions</li>
<li>Implement manual update scripts for each structure</li>
<li>Reason how to generalize the different scripts</li>
<li>Evaluate updating a single pod using queries</li>
<li>Evaluate updating multiple pods using queries</li>
</ul>
</div>
<h3>Storage Guidance Vocabulary</h3>

<picture>
<img src="final-assets/flow-rdf-create.png" style="max-height: 600px; max-width: 700px; object-fit: contain"
alt="Schematic overview of an SGV creation flow">
</picture>

<aside class="notes" data-markdown>
**SolidBench** is an existing Benchmark that can **generate many data stores with different structures**.
For the different structures, **I will add the SGV**.
I will then perform multiple experiments.
1. The client gets the SGV description of the storage space (can be cached).
2. The client checks all canonical collections and checks if the resource to be inserted matches a resource description of the collection.
3. If the resource matches a description, the client checks the store condition of the description given the eligible collections.
4. For each collection that stores the resource:
1. The client checks the group strategy of the collection and groups the resource accordingly, deciding on the name of the new resource.
2. The client checks the collections that are derived from this collection.
Step 4 is repeated for every collection derived from this collection whose description the resource matches.
5. The client performs the store operation.

Also reason about the HTTP overhead caused.
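
The creation steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual client: `Collection`, `store_all`, `plan_targets` and `create` are hypothetical names, the resource description is a plain predicate, the store condition simply accepts every eligible collection, and the HTTP request is replaced by an optional `put` callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Collection:
    uri: str
    matches: Callable[[dict], bool]   # stand-in for the resource description
    group: Callable[[dict], str]      # group strategy: resource to relative name
    derived: List["Collection"] = field(default_factory=list)


def store_all(eligible):
    """Hypothetical store condition: every eligible collection stores the resource."""
    return eligible


def plan_targets(collection, resource, targets):
    """Step 4: name the resource in this collection, then visit derived collections."""
    targets.append(collection.uri + collection.group(resource))
    for child in collection.derived:
        if child.matches(resource):
            plan_targets(child, resource, targets)


def create(canonical, resource, store_condition=store_all, put=None):
    # Step 1 (fetching the SGV description) is assumed done: `canonical` is its result.
    eligible = [c for c in canonical if c.matches(resource)]   # step 2
    targets = []
    for collection in store_condition(eligible):               # step 3
        plan_targets(collection, resource, targets)            # step 4
    for url in targets:                                        # step 5
        if put is not None:
            put(url, resource)
    return targets


# Example: posts grouped per city, with a derived view grouped per date.
by_date = Collection(
    "https://pod.example/posts-by-date/",
    lambda r: True,
    lambda r: r["created"] + "/" + r["id"],
)
posts = Collection(
    "https://pod.example/posts/",
    lambda r: r["type"] == "Post",
    lambda r: r["city"] + "#" + r["id"],
    derived=[by_date],
)
```

Each entry in the returned list corresponds to one write, so its length gives a rough lower bound on the HTTP overhead of a single insert.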
</aside>
</section>

<section data-auto-animate>
<h3 data-id="eval"><span data-id="future">The Future:</span> Evaluation</h3>

<div style="text-align: left">
Possible metrics:
<ul>
<li>Execution time</li>
<li>Number of HTTP requests</li>
<li>String difference between queries that want the same modification over different data stores</li>
<li>What ratio of queries leaves the data store inconsistent when introducing random server failures</li>
</ul>
</div>
<h3>Storage Guidance Vocabulary</h3>

<picture>
<img src="final-assets/flow-rdf-update.png" style="max-height: 600px; max-width: 700px; object-fit: contain"
alt="Schematic overview of an SGV update flow">
</picture>

<aside class="notes" data-markdown>
1. The client gets the SGV description of the storage space and the HTTP resource containing the updated RDF resource.
2. The client virtually constructs the resource that would result from the requested operation.
3. The client checks the update condition of the original matching resource description. The following action depends on the update condition.
Typically, the update condition will say whether an RDF resource is moved or not.
4. Move required: remove the existing resource and follow the steps described in the create resource flow.
5. No move required: just update the resource as requested by the user.

Also reason about the HTTP overhead caused.
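
The update steps can be sketched in the same spirit. Again this is only a Python sketch under assumptions: the pod is a plain dictionary from URL to resource instead of HTTP, `locate` is a hypothetical stand-in for the whole create flow, and `move_if_regrouped` is one possible update condition (move only when the group strategy would place the resource elsewhere).

```python
def update(store, url, changes, locate, must_move):
    """Apply `changes` to the resource at `url`, moving it when the update condition says so."""
    original = store[url]                  # step 1: fetch the current resource
    updated = {**original, **changes}      # step 2: virtually construct the result
    if must_move(original, updated):       # step 3: evaluate the update condition
        del store[url]                     # step 4: remove, then store as if newly created
        new_url = locate(updated)
        store[new_url] = updated
        return new_url
    store[url] = updated                   # step 5: update in place
    return url


def locate(resource):
    # Hypothetical stand-in for the create flow: posts are grouped per city.
    return "https://pod.example/posts/" + resource["city"] + "/" + resource["id"]


def move_if_regrouped(old, new):
    # Hypothetical update condition: move when the grouping changes.
    return locate(old) != locate(new)
```

Changing only the text of a post leaves it where it is, while changing its city triggers a delete followed by a create, which is where the extra HTTP requests come from.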
</aside>
</section>

<section data-auto-animate>
<h3>Empirical Evaluation</h3>

Our tables, concluding it does indeed verify the hypothesis.

<aside class="notes" data-markdown>
</aside>
</section>

<section data-auto-animate>
<h3>Conclusion</h3>

<ul>
<li>Creating an automated client with limited overhead is possible.</li>
<li>Lack of server-side control might be an (inherent) problem</li>
<li>Inter-pod Updates - What if I want to move data between pods?</li>
<li>Other Interfaces - Is document-oriented the best?</li>
<li>View Creation and Discovery - Since structure has a high influence on execution time</li>
<li>Smart Access control - Now that we know what a document is</li>
<li>CAP / ACID - transaction / BASE - CRDTs</li>
</ul>

<aside class="notes" data-markdown>
Related to other interfaces, there is a [blog post by Ruben Verborgh challenging REST and other interfaces](https://ruben.verborgh.org/blog/2024/05/30/the-webs-data-triad/).

</aside>
</section>

<section data-auto-animate>
<h3>Time for Questions</h3>