biggest parts done

jitsedesmet · Jun 9, 2024 · c4716af
1 parent eab6625
commit c4716af

Showing 5 changed files with 128 additions and 89 deletions.
diff --git a/presentation/final-assets/custom.css b/presentation/final-assets/custom.css
@@ -0,0 +1,12 @@
+.fragment.current-bold {
+    &.current-fragment {
+        font-weight: bold;
+    }
+}
+
+.fragment.out-bold {
+    font-weight: bold;
+    &.visible {
+        font-weight: normal;
+    }
+}
diff --git a/presentation/final-assets/flow-rdf-create.png b/presentation/final-assets/flow-rdf-create.png
diff --git a/presentation/final-assets/flow-rdf-update.png b/presentation/final-assets/flow-rdf-update.png
diff --git a/presentation/final-assets/sgv-graph.png b/presentation/final-assets/sgv-graph.png
diff --git a/presentation/final-presentation.html b/presentation/final-presentation.html
@@ -9,6 +9,7 @@
     <link rel="stylesheet" href="/static/revieljs/dist/reset.css">
     <link rel="stylesheet" href="/static/revieljs/dist/reveal.css">
     <link rel="stylesheet" href="/static/revieljs/dist/theme/white.css">
+    <link rel="stylesheet" href="/presentation/final-assets/custom.css">
 
     <!-- Theme used for syntax highlighted code -->
     <!--		<link rel="stylesheet" href="plugin/highlight/monokai.css">-->
@@ -208,82 +209,69 @@ <h3>Research Question and Hypothesis</h3>
     </div>
 
     <aside class="notes" data-markdown>
-        I therefore **ask the question**: "How to abstract data updates in a permissioned decentralized environment behind a query abstraction layer?"
+        I therefore **ask the question**: "How can we abstract data updates over a document-oriented interface of a permissioned decentralized environment behind a query abstraction layer?"
         Let's investigate that question, in more detail ...
     </aside>
 </section>
 
 <section data-auto-animate>
-    <h3><span data-id="past">The Past:</span> Getting to Work</h3>
+    <h3>Solution</h3>
 
     <ul>
-        <li>What is LDP?</li>
-        <li>Can we use Shape Trees for updates?</li>
+        <li>Heterogeneity of Data →
+            <span class="fragment custom out-bold" data-fragment-index="1">Describe Data</span>
+        </li>
+        <li>
+            <span class="fragment custom current-bold" data-fragment-index="1">Heterogeneity of Structure</span> →
+            <span class="fragment custom current-bold" data-fragment-index="2"> Describe structure</span>
+        </li>
     </ul>
 
     <div class="r-stack" style="font-size: 0.6em">
         <div style="width: 100%; border: red solid 2px;" class="fragment fade-out" data-fragment-index="1">
-            Example: LDP Container
-            <pre style="margin-top: 0;"><code class="language-plaintext">
-&lthttp://example.org/c1/&gt
-   a ldp:BasicContainer;
-   dcterms:title "A very simple container";
-   ldp:contains &ltr1>, &ltr2&gt, &ltr3&gt.
+            Can use a shape description language like <a href="https://shex.io/">ShEx</a> or <a href="https://www.w3.org/TR/shacl/">SHACL</a>.
+            <br/>
+            Example: SHACL Shape Description of a social media post.
+            <pre style="margin-top: 0;"><code class="language-plaintext" style="padding: 0 20px; margin: 0; overflow-y: clip">
+Wow such a nice social media example
             </code></pre>
         </div>
         <div style="width: 100%; border: red solid 2px;" class="fragment fade-in-then-out" data-fragment-index="1">
             Example: LDP Structure
             <div style="display: grid; grid-gap: 5px; grid-template-columns: 1fr 1fr">
             <pre style="margin-top: 0; padding: 20px">
-pictures/
-  |- Valencia/
-  |  |- one.ttl
-  |  |- two.ttl
+posts/
+  |- Valencia
+  |  |- #one
+  |  |- #two
   |- Ghent/
-  |  |- one.ttl
-  |  |- two.ttl
+  |  |- #one
+  |  |- #two
   |- Paris/
-  |  |- one.ttl
-  |  |- two.ttl
-  |  |- three.ttl
-  |- missing.ttl
+  |  |- #one
+  |  |- #two
+  |  |- #three
             </pre>
             <pre style="margin-top: 0; padding: 20px">
-pictures/
+posts/
   |- 30-01-2024/
-  |  |- one.ttl
-  |  |- two.ttl
+  |  |- #one
+  |  |- #two
   |- 14-02-2024/
-  |  |- one.ttl
-  |  |- two.ttl
+  |  |- #one
+  |  |- #two
   |- 17-05-2023/
-  |  |- one.ttl
-  |  |- two.ttl
-  |  |- three.ttl
-  |  |- four.ttl
+  |  |- #one
+  |  |- #two
+  |  |- #three
+  |  |- #four
             </pre>
             </div>
         </div>
-        <div style="width: 100%; border: red solid 2px;" class="fragment fade-in-then-out">
-            Example: SHACL Shape Description
-            <pre style="margin-top: 0;"><code class="language-plaintext" style="padding: 0 20px; margin: 0; overflow-y: clip">
-ex:PictureShape
-    a sh:NodeShape;
-    sh:targetClass ex:Picture ;
-    sh:property [
-       sh:path ex:depicts ;
-       sh:minCount 1 ;
-       sh:maxCount 1 ;
-       sh:datatype xsd:string ;
-    ] ;
-    sh:property [
-        sh:path ex:contains ;
-        sh:nodeKind sh:IRI ;
-    ] .
-            </code></pre>
-        </div>
-        <div style="width: 100%; border: red solid 2px;" class="fragment fade-in-then-out">
-            Example: Shape Trees
+        <div style="width: 100%; border: red solid 2px;" class="fragment fade-in-then-out" data-fragment-index="2">
+            Can use indexes like <a href="https://solid.github.io/type-indexes/">Type Indexes</a> or <a href="https://shapetrees.org/">Shape Trees</a>.
+            <br/>
+            Example: Shape Trees describing a list of files
             <pre style="margin-top: 0;"><code class="language-plaintext" style="padding: 0 20px; margin: 0; overflow-y: clip">
 <#PicturesTree>
   a st:ShapeTree ;
@@ -317,9 +305,8 @@ <h3><span data-id="past">The Past:</span> Getting to Work</h3>
 
 
     <aside class="notes" data-markdown>
-        As I've mentioned before, to limit the scope of my thesis, I focus on the current tech stack of Solid.
-        Solid uses the **LDP interface** and **adds structural information** through Shape Trees used as an index.
-        LDP provides some nice interface to essentially model a file system using Linked Data.
+        To create an automated client, we essentially need to figure out what files hold what data and why.
+
 
         Such a **file system can structure files in a variety of ways**.
         With the help of Shape Trees we can understand the structure.
@@ -333,33 +320,33 @@ <h3><span data-id="past">The Past:</span> Getting to Work</h3>
 </section>
 
 <section data-auto-animate>
-    <h3><span data-id="past">The Past:</span> Getting to Work</h3>
+    <h3>What are we missing?</h3>
 
     <div>
         <ol style="column-count: 2; padding: 5px; font-size: 20pt; list-style-position: inside">
             <li>What if multiple directories match?
-                <ul>
+                <ul style="display: inline-block">
                     <li>Do I duplicate?</li>
                     <li>Is one canonical and the other one links to the resource saved in the canonical?</li>
                     <li>And how do I decide which one is canonical?</li>
                 </ul>
             </li>
             <li>What if no directories match?</li>
             <li>How are resources grouped?
-                <ul>
+                <ul style="display: inline-block">
                     <li>Can I just infer that picture-by-date example is just that?</li>
                     <li>What if I need to create a new date directory?</li>
                 </ul>
             </li>
 
             <li>
                 Is that new directory I created a leaf?
-                <ul>
+                <ul style="display: inline-block">
                     <li>Or should I make even more directories? (Can be inferred from Shape Tree)</li>
                 </ul>
             </li>
             <li>What to do if a resource is changed?
-                <ul>
+                <ul style="display: inline-block">
                     <li>Should I alter the Shape Tree?</li>
                     <li>Should I move the resource?</li>
                     <li>Do I have a distance metric, and do I move when the distance is too great?</li>
@@ -383,56 +370,96 @@ <h3><span data-id="past">The Past:</span> Getting to Work</h3>
 </section>
 
 <section data-auto-animate>
-    <h3><span data-id="future">The Future:</span> Overview</h3>
+    <h3>Storage Guidance Vocabulary</h3>
 
-    <ul>
-        <li>Adapt <a href="https://comunica.dev/">Comunica</a> to allow update queries by interpreting SGV</li>
-        <li>Alter <a href="https://github.com/SolidBench/SolidBench.js">SolidBench</a>, so we can measure</li>
-        <li>Feedback Loop: Measure and Adapt</li>
-    </ul>
+    <picture>
+        <img src="final-assets/sgv-graph.png" style="max-height: 600px; max-width: 700px; object-fit: contain"
+             alt="Schematic overview of the Storage Guidance Vocabulary">
+    </picture>
 
     <aside class="notes" data-markdown>
+        - *Resource Collection*: Corresponds to a group of RDF resources.
+        - *Unstructured Collection*: Corresponds to a classical LDP container or HTTP resource.
+        - *Structured Collection*: A canonical or derived collection. (below)
+        - *Canonical Collection*: A resource collection containing resources.
+        - *Derived Collection*: A resource collection that stores resources already stored by one or more other structured containers.
+        - *Resource Description*: A way of describing resources, for example through ShEx or SHACL.
+        - *Group Strategy*: A description of how resources should be grouped together, for example: my images are grouped per creation date.
+        - *Store Condition*: When multiple collections are eligible to store a resource, the store condition decides which collection(s) actually store the resource. This allows the creation of a store priority system.
+        - *Update Condition*: Describes what to do when a contained resource is changed.
+        - *Client Control*: Describes the amount of freedom a client has when trying to store a resource.
     </aside>
 </section>
 
 <section data-auto-animate>
-    <h3 data-id="eval"><span data-id="future">The Future:</span> Evaluation</h3>
-
-    <div style="text-align: left">
-        Experiments using SolidBench:
-        <ul>
-            <li>Extend SolidBench with SGV descriptions</li>
-            <li>Implement manual update scripts for each structure</li>
-            <li>Reason how to generalize the different scripts</li>
-            <li>Evaluate updating a single pod using queries</li>
-            <li>Evaluate updating multiple pods using queries</li>
-        </ul>
-    </div>
+    <h3>Storage Guidance Vocabulary</h3>
+
+    <picture>
+        <img src="final-assets/flow-rdf-create.png" style="max-height: 600px; max-width: 700px; object-fit: contain"
+             alt="Schematic overview of an SGV creation flow">
+    </picture>
 
     <aside class="notes" data-markdown>
-        **SolidBench** is an existing Benchmark that can **generate many data stores with different structures**.
-        For the different structures, **I will add the SGV**.
-        I will then perform multiple experiments.
+        1. The client gets the SGV description of the storage space (can be cached).
+        2. The client checks all canonical collections and checks if the resource to be inserted matches a resource description of the collection.
+        3. If the resource matches a description, the client checks the store condition of the description given the eligible collections.
+        4. For each collection that stores the resource:
+            1. The client checks the group strategy of the collection and groups the resource accordingly, deciding on the name of the new resource.
+            2. The client checks the collections that are derived from this collection.
+                Step 4 is executed for every collection that is derived from this collection and whose description the resource matches.
+        5. The client performs the store operation.
+
+        Also reason about the HTTP overhead caused.
     </aside>
 </section>
 
 <section data-auto-animate>
-    <h3 data-id="eval"><span data-id="future">The Future:</span> Evaluation</h3>
-
-    <div style="text-align: left">
-        Possible metrics:
-        <ul>
-            <li>Execution time</li>
-            <li>Number of http requests</li>
-            <li>String difference between queries that want the same modification over different data stores</li>
-            <li>What ratio of queries leaves the data store inconsistent when introducing random server failures</li>
-        </ul>
-    </div>
+    <h3>Storage Guidance Vocabulary</h3>
+
+    <picture>
+        <img src="final-assets/flow-rdf-update.png" style="max-height: 600px; max-width: 700px; object-fit: contain"
+             alt="Schematic overview of an SGV update flow">
+    </picture>
+
+    <aside class="notes" data-markdown>
+        1. The client gets the SGV description of the storage space and the HTTP resource containing the updated RDF resource.
+        2. The client virtually constructs the resource that would result from the requested operation.
+        3. The client checks the update condition of the original matching resource description. The following action depends on the update condition.
+        Typically, the update-condition will say whether an RDF resource is moved or not.
+        4. Move required: remove the existing resource and follow the steps described in the create resource flow.
+        5. No move required: just update the resource as requested by the user.
+
+        Also reason about the HTTP overhead caused.
+    </aside>
+</section>
+
+<section data-auto-animate>
+    <h3>Empirical Evaluation</h3>
+
+    Our tables, concluding it does indeed verify the hypothesis.
 
     <aside class="notes" data-markdown>
     </aside>
 </section>
 
+<section data-auto-animate>
+    <h3>Conclusion</h3>
+
+    <ul>
+        <li>Creating an automated client with limited overhead is possible.</li>
+        <li>Lack of server-side control might be an (inherent) problem</li>
+        <li>Inter-pod Updates - What if I want to move data between pods?</li>
+        <li>Other Interfaces - Is document oriented the best?</li>
+        <li>View Creation and Discovery - Since structure has a high influence on execution time</li>
+        <li>Smart Access control - Now that we know what a document is</li>
+        <li>CAP / ACID - transaction / BASE - CRDTs</li>
+    </ul>
+
+    <aside class="notes" data-markdown>
+        Related to other interfaces we have a [blog post of Ruben Verborgh challenging REST and other interfaces](https://ruben.verborgh.org/blog/2024/05/30/the-webs-data-triad/).
+
+    </aside>
+</section>
 
 <section data-auto-animate>
     <h3>Time for Questions</h3>