-
@balacij this looks like a great analysis to me. The research questions sound good. My slides for my talk tomorrow tell a part of the story on where Drasil will shine.

There is a pattern in helpful computing tools - they capture information so that they can provide automation. Org mode captures rudimentary documentation information, like the section headings, and uses this to automatically create LaTeX and html documents with formatted section headings, along with a table of contents. Code generation in Maple works by capturing knowledge on symbolic manipulation through a computer algebra system. With this information, we can automate symbolic calculations and code generation.

Drasil can capture information about physics, about math, about computing, about documentation, etc. The information that we capture can be used to automatically generate software artifacts. The limiting factor is the knowledge capture. The nice thing about Drasil in Drasil is that we can extend Drasil as we understand different branches of knowledge better.
-
So I'm going to play devil's advocate here: I find that the "problem statement" is full of Drasil jargon and somehow describes everything in terms of low-level operational details. I'm not even sure I understand what the problem statement truly means! Rather than relying on Drasil terminology for communicating the ideas, I think some concrete example(s) of what the problem really is would help a lot. The purpose statement inherits the flaws of the problem statement.

I was really surprised by RQ1. It seemed to "come out of nowhere", i.e. to me the previous discussion wasn't at all leading to "Drasil in Drasil"! RQs 2-4 are nice 'speculative' questions, but a little too philosophical to make them good CS/SE research questions. It would be quite hard to know that they've been 'answered'.
-
@balacij Is that Problem/Purpose Statements and Question (PPS&Q)? As I am learning this writing technique, I understand that the problem is typically limited to 1-2 sentences.
-
I think that a detailed discussion of what 'Drasil in Drasil' entails, or even an abstract vision of it, would be worthwhile. I searched for 'Drasil in Drasil' in the search field and this discussion is one of the few pages that includes this phrase, even though I have heard it in discussions many times in the past year. This might be tacit knowledge at this point. I think it would benefit everyone if it was well documented. Thank you @balacij for creating this discussion and listing the "what does" question here.

The reason this came up is that I was looking through the Body.hs (and other example-specific) files and I couldn't help but wonder why we have the user do some of the things that they do. For example, having them use some of the combinators that we have when constructing sentences. On one hand I do understand why we do this for the sake of our recipes, capturing knowledge, and removing redundancy, but on the other hand, thinking about end-product usability, I can't help but notice that this increases the on-boarding time of new users. It is certainly a great process for Drasil-related research and for the sake of students working on Drasil, but if in the long term we want Drasil to be effortlessly used by practitioners, then Drasil will probably have to do some of this translation from basic text input, at least in some cases. This led me to wonder if this is one of the issues that 'Drasil in Drasil' is meant to tackle. Just a thought.
-
I've been trying out using a tablet to take notes. One of the nice things about it is that I can share my scribbles. I wrote it in "Dark Mode" with "Samsung Notes," so it looks a bit off in a standard PDF reader without also enabling dark/night mode. This is my current re-write as of right now; I will eventually re-write it here too: De-embedding Drasil (DraDrasilsil).pdf
-
When we write software, we're writing instructions that the computer interprets. Typically, we expect the computer to somehow give some sort of feedback while running and when it's done. We provide computers a means of giving us feedback by hooking them up to monitors, stereos, and other computers (external processing, networking with other computers, etc.). This is wonderful. With software, we can quickly offload monotonous tasks to machines. For example, we might have them process data for us, sanitizing it and noting irregularities. We might also have them hooked up to other machines and have them work together to, say, move packages in shipping warehouses to specific locations for trucks to deliver to their destinations. The machines just follow our instructions to a T. The instructions are defined through assembly languages. Of course, programming in nearly any large, raw assembly language is stressful and difficult because we have to keep track of what CPU architecture is being used, what quirks exist, what optimizations you can make, managing memory, etc., all while also worrying about why we chose to build the program in the first place (i.e., the requirements of the program).

Procedure Capture

To make software development easier (and less stressful), we built programming languages that sit atop the assembler languages. By doing this, we also gained information about what that assembler was previously doing as a general procedure. In other words, we gained the ability to similarly translate those higher-level languages into other assembler languages. We obtained procedural reuse and came one step closer to "primarily focusing on the real issue at hand" -- the requirements that the software must satisfy. For example, C is an abstraction over some assembly code (PDP 7?). But C is still too "close" to the machine. Manual memory management, garbage collection, and the like are all cruft we don't want to worry about. They should also be monotonous tasks because they're generally well-understood too (or we at least have general schemes we can follow).

Abstraction over Procedural Cruft

To remedy that situation, languages with automated garbage collection, memory management, etc. came around. For example, Java, Python, D, and Rust all handle different aspects of those same garbage collection and memory management issues. However, these languages remain procedure-oriented, abstracting over only the procedural cruft. These procedures are still the same steps and calculations that somehow compute solutions to the problems we were interested in, but with even less overhead and worry about maintaining the "machine." So, these other languages are just abstractions/"smarter" versions of the same procedures. Does it affect how we think about the formulation of solutions? In the way that we think about the procedural solution, yes. But in the context of the focal problem knowledge, not really. Similar to how we recognized the cruft of memory management and the like, can we make converting that problem knowledge into the solution easier too? Well, in a first attempt, procedures are shared through functions, libraries, frameworks, etc. However, these exhibit their own issues.
These issues are all symptoms of manually transcribing software as a "view" of our requirements/problem knowledge, and not defining what it means for it to be a "view" and then generating the solution. Succinctly, the issue is that we're still manually building the solution implementations of our well-understood problems!

Abstraction over Families of Problem Knowledge and Software Artifacts

In order to resolve these issues, we, again, need to look at the procedures as more "monotonous cruft" and see how we can abstract over them. Well, thankfully, we already have logical "models" (theories) that we think about when building the software. So, we look to capture that. When building software, we have some sort of document that developers and product owners share that stores the requirements of the software. Developers interpret these requirements and form related solutions. Unfortunately, these requirements are typically textual and are only verified by the eyes of the implementors and creators.
[Aside: I'm not entirely sure what software you would try to build without some sort of vague idea of the requirements, but in the event that it's possible, we will constrain the set of desired software to those that are well-understood.]

For a subset of well-understood [cite] knowledge, we have a means of auditing the software and the logic behind forming it. We should be able to reduce the whole process to just writing and auditing the logic now. Then, the "development" of the software would just be a particular view (e.g., generation up to design specifications and artifact configuration). Here again, we have a recurring idea of "going up an abstraction," but not quite in the same direction as earlier. Earlier, we were abstracting over procedural cruft. Here, we desire an abstraction over what was left over -- the solution as it pertains to a problem. Similar to removing the procedural cruft of a particular language, we have to remove the equivalent "procedural cruft" (the problem/solution) by the family of problems/solutions it pertains to. Previously, we were removing the cruft of one particular language. Here, we remove the cruft of one particular software family. Previously, you could only do that for one language at a time. Here, you can apply it to any language, but only for one kind of software. These abstractions are, in some sense, orthogonal. [Aside: it might have been better for me to talk about "machine" vs "business" logic to get the point across.]

Back to the focus: to make this monotonous (and hence automatable), we need to look at capturing the procedures one level "higher." Drasil is a prime example of this in practice. It converts problem knowledge into related solutions.

Drasil

Drasil is a software artifact generation suite. Drasil uses DSLs to encode domain-specific knowledge and to create opportunities for domain-specific interpretations (this is largely an abstraction over the business-oriented logic, allowing us to remove ourselves one level further from the "code"). Drasil is deeply embedded in Haskell, and requires developers (users of Drasil) to use Haskell to encode their ideas. Unfortunately, asking developers to use Haskell is a bit of an uphill fight (P1) for many reasons (tooling maturity, stability, complexity compared to languages used in businesses, etc.). So, how does Drasil really work? Drasil has 5 "major" components (at least 3 are directly mentioned in #2883, but I believe there are two more now) that make it special: chunk types, chunk transformers, chunk instances, compilers, and a runtime.
So, what do these 5 things look like in Drasil?

Examples

Chunk Types
Drasil/code/drasil-lang/lib/Language/Drasil/Chunk/DefinedQuantity.hs Lines 23 to 34 in 2178e68

Drasil/code/drasil-lang/lib/Language/Drasil/Chunk/Unital.hs Lines 25 to 60 in 2178e68
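To make the pattern concrete, here is a minimal, self-contained sketch -- not Drasil's actual code, and the class/field names are made up -- of the general shape the chunk types above take: a record that wraps a simpler chunk, plus typeclass instances that merely forward to the wrapped fields through lenses.

```haskell
{-# LANGUAGE TemplateHaskell #-}
-- Hypothetical sketch only; names do not correspond to real Drasil classes.
module ChunkSketch where

import Control.Lens (Lens', makeLenses, view)

type UID = String

class HasUIDSketch c where
  uidSketch :: Lens' c UID

-- A "concept"-like chunk: an identifier plus a human-readable term.
data ConceptSketch = ConceptSketch { _cid :: UID, _cterm :: String }
makeLenses ''ConceptSketch

instance HasUIDSketch ConceptSketch where
  uidSketch = cid

-- A "quantity"-like chunk layered on top: it adds a symbol and a unit, and
-- its instances are boilerplate that just forwards to the wrapped concept.
data QuantitySketch = QuantitySketch
  { _qconcept :: ConceptSketch
  , _qsymbol  :: String
  , _qunit    :: Maybe String
  }
makeLenses ''QuantitySketch

instance HasUIDSketch QuantitySketch where
  uidSketch = qconcept . cid

demo :: UID
demo = view uidSketch
  (QuantitySketch (ConceptSketch "len" "length of the rod") "l" (Just "m"))
```

Every extra layer repeats the same forwarding instances, which is the boilerplate being pointed at here.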
Expr: Drasil/code/drasil-lang/lib/Language/Drasil/Expr/Lang.hs Lines 93 to 144 in 2178e68

ModelExpr: Drasil/code/drasil-lang/lib/Language/Drasil/ModelExpr/Lang.hs Lines 86 to 152 in 2178e68

CodeExpr: Drasil/code/drasil-code-base/lib/Language/Drasil/Code/Expr.hs Lines 71 to 143 in 2178e68

Chunk Transformers

Sometimes we define them as free-floating "functions": [Aside: I think that "constructors" are also transformers. Really, almost all of our functions are transformers. The important thing is that they somehow involve chunks and don't do any sort of IO operations.]

Drasil/code/drasil-lang/lib/Language/Drasil/Chunk/Unital.hs Lines 64 to 79 in 2178e68

Anything from: https://github.com/JacquesCarette/Drasil/tree/master/code/drasil-printers/lib/Language/Drasil/Printing/Import

Drasil/code/drasil-printers/lib/Language/Drasil/Printing/Import/Expr.hs Lines 111 to 138 in 2178e68

There are also some that are defined through typeclasses; it might be best to gather them all into typeclasses so that we invert the current dependencies (I discussed this further in #2873 and #2896).

Drasil/code/drasil-theory/lib/Theory/Drasil/MultiDefn.hs Lines 75 to 78 in 2178e68

Drasil/code/drasil-theory/lib/Theory/Drasil/GenDefn.hs Lines 40 to 41 in 2178e68

etc. (unfortunately, there are very few typeclass examples that I can recall right now)

Chunk Instances

Chunks are instantiated using deeply embedded DSLs in Drasil:

Drasil/code/drasil-example/glassbr/lib/Drasil/GlassBR/Unitals.hs Lines 74 to 83 in 2178e68

Drasil/code/drasil-example/glassbr/lib/Drasil/GlassBR/IMods.hs Lines 41 to 53 in 2178e68

But then we need to manually gather all relevant chunks into a single database, for auditing and usage:

Drasil/code/drasil-example/glassbr/lib/Drasil/GlassBR/Body.hs Lines 129 to 151 in 2178e68

And then we also need to manually define a "system" that we're currently interested in:

Drasil/code/drasil-example/glassbr/lib/Drasil/GlassBR/Body.hs Lines 63 to 83 in 2178e68

Compilers

Note: the compiler does not take in external data. All "source code" inputs are contained within the Haskell binaries and source code.

Drasil/code/drasil-example/glassbr/app/Main.hs Lines 8 to 15 in 2178e68

I think of these as Drasil's "compilers." [Aside: note that a compiler is any coherent, meaningful collection of transformers connected together to form some sort of product from some source input.]

A runtime

Finally, the most important part: the part where we run Drasil. We run Drasil by compiling the Haskell source code and then running the produced executables. Every time we run those executables, we get the exact same outputs, ignoring things inputted to the executable (e.g., the "current" time, etc.). So, the compiled Haskell code is our runtime. [ASIDE: This is similar to this original topic #1 idea where I talked about partial evaluation, the chunk db, the executables, and whatnot.]

Observations

So, what can we observe from these specific cases?

Individually, from the examples

Chunk Types

We have a common record syntax with very similar boilerplate spread around (P2). The pattern of typeclasses mirroring the functionality of an internally held chunk is also widespread, but we have no way of automating it, or of designating it, without the excess Haskell cruft (P2.5). However, for other chunk types, such as Expr, ModelExpr, and CodeExpr, we have significant duplication among them (P3). We also have manually written "UIDs" for every chunk (P4), and we don't have any guarantee that they are unique (P5, but when #2873 is implemented, this issue becomes with respect to a particular ChunkDB).
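As a toy illustration of the duplication behind P3 (the constructors below are made up, not the real Expr/ModelExpr/CodeExpr definitions):

```haskell
-- Hypothetical toy languages; only the shape of the duplication matters.
module ExprDuplicationToy where

data ExprToy
  = LitE Double
  | SymE String
  | AddE ExprToy ExprToy

data ModelExprToy
  = LitM Double
  | SymM String
  | AddM ModelExprToy ModelExprToy
  | DerivM String ModelExprToy   -- a "model-only" construct

-- Mining the higher-level language from the lower-level one is pure
-- structural recursion -- more boilerplate that grows with every constructor.
toModelToy :: ExprToy -> ModelExprToy
toModelToy (LitE d)   = LitM d
toModelToy (SymE s)   = SymM s
toModelToy (AddE l r) = AddM (toModelToy l) (toModelToy r)
```

A third near-identical copy (for the code-oriented language) and another translation like toModelToy would complete the picture.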
Chunk reference maps and "referenced by" maps are also something that we should be able to automate from the structure of the types (P6). Interestingly, with both the types and the transformers, we're missing out on something important (and it's a bit more evident with Expr/ModelExpr/CodeExpr): theories (~ modules as records, typeclasses, etc. -- we will hopefully be talking more about this in the next in-person meeting).

Chunk Transformers

Our chunk transformers follow the same general scheme but are spread around everywhere. Analyzing what things we can do at any particular typed "hole" is difficult (P7) because it requires us to use our Haddock documentation and look for general functions. I would like us to be able to have options shown to us, similar to what typed-hole tooling provides elsewhere.

Chunk Instances

We build our chunk instances using DSLs and manually gather them for a ChunkDB. However, we've run into issues with conflicting UIDs. Additionally, when chunks are built and type-checked, we don't necessarily have a guarantee that they will actually work against the staged integrity checks we have (P8) until we actually run the whole program (which wastes our time).

Compilers

The "compiler" is really just a defined, meaningful gathering of transformers that can produce some sort of encoding of some set of artifacts that we can dump/print onto host computers. Right now, this isn't very clear in Drasil, and we aren't able to really analyze any compiler (though we really only have one at the moment) to show what knowledge (chunk types, etc.) it actually depends on (P9).

Runtime

[ASIDE: This is similar to my original post where I analyzed the Haskell runtime.] The runtime is a bit peculiar. It's what we use to look for information about Drasil's staged checks. But what's really going on during the runtime? All chunk instances, types, etc. are held in the executable. How can we get that information about stages and constraint checks earlier? Recall: partial evaluation. W.r.t. Drasil, all data is static since there is no interactivity, no input IO, no command-line arguments, etc. In other words, GHC should, in theory, be able to residualize the "compilers" (as shown above) down to some code that (a) prints feedback to the console, and (b) dumps software artifacts to the host machine if everything went well in the staged checks and compilation. Note that the feedback and the software artifact representations should be fully computed in the binaries already because Drasil has no external inputs. In other words, there is a sense of programming interactivity with the staged checks and the compiler that we are missing out on (P9), because it is completely deferred to the runtime of the Haskell binaries. How can we obtain that back-and-forth interactivity that we have with Haskell, with our Drasil programs?

Together

Looking at these together, what are we observing? We are observing the mechanical cruft Drasil has (as well as a lack of some analysis and other IDE-like tooling). [This sentence is meant to be an aggregation of the earlier "P" problems.] What are all of these a symptom of? These are all a symptom of the fact that Drasil is deeply embedded in Haskell. Is Haskell the right syntax we should be using to convey our ideas? Similar to what past and current researchers have done, can we abstract over it? Well, we can try to apply Haskell reflection, quotation, evaluation, and other (meta)programming concepts here, but we'll still be encoding something in Haskell which we won't really have much of an internal understanding of.
How can we obtain that? That would be a step towards the Drasil in Drasil long-term goal as well.

Up to this point, many languages and toolkits before Drasil focused on abstracting over monotonous tasks. The work involved capturing what they deemed the "important" bits and showing how you can mechanically obtain the other things. They captured the key ideas from a more general syntax and made a domain-specific one from it with their key ideas. Drasil is really the same idea (to me), whereby we have software artifacts, a flow that's generally well-understood, and a bunch of cruft around it that we don't want to deal with, so we have Drasil to help us improve that workflow of developing the software artifacts, and also to let us improve the quality of the software artifacts while we do it. Those 5 major ideas form a powerful ideology about software construction, capturing every conceivable facet of the design, development, contribution (i.e., requirements), and maintenance. [I wrote a lot about this in chapter 2 of my Master's thesis, so I will spare you the details here.]

Problem Statement

Drasil abstracts over families of software by their related problems, allowing users to describe software problems and generate solutions to them. However, as Drasil is constructed, there is growing cruft in the implementation, data is internal, encoded knowledge is hard to analyze, and the Haskell syntax conflicts with what we want from Drasil (e.g., theories and theory combinators, better analysis tooling, interactivity with staging, and the like). Can we create a better syntax, specialized tools, and an interpreter to relieve us of these issues?

Potential Research Work/Questions
-
About the start of the long comment: This is a long description of the large distance between what we actually want computers to do for us (the tasks, described in the words humans use) and the low-level instructions that machines actually follow.
In some sense, it is straightforward to abstract over assembly but, as you say, it gets harder and harder to abstract over higher level languages. This is where we need to be guided by the reason why we're abstracting, i.e. the eventual goal. That's where the top-down picture comes in. We want to look at the actual tasks we want the computer to perform; and we want to look at the highest-level description of those tasks, i.e. the words we use between humans to describe those tasks.

We need to add an additional idea (which is implicit here and in Drasil): that the collection of tasks that we wish to perform, in general, contains a huge amount of similarities. There might be tens of thousands of uses for computers, but these uses contain a lot of repetition of various pieces (call them features, components, whatever). From Draco onwards, we know that "domain knowledge" is a key idea behind this. It sits somewhere close to the highest level of abstraction. But that kind of knowledge is too raw; it captures only some aspects of things, generally missing the "know-how" (also called "procedural knowledge" by philosophers and in cognitive science, but that's just very confusing for us, as we've overloaded 'procedural' a bit too much).

Drasil then starts from the idea that a lot of what is traditionally written by hand by humans could in fact be the output of some other program. We still have some non-trivial gaps.
One of the things that is sub-optimal in the 5 major components description of Drasil is that it misses one level, and it's the one that sits underneath the chunks: the basic kinds of data that are actually gathered together into chunks. That is really where the analysis needs to start. I do agree with the view that a very large part of our code base is 'transformers'. It is still useful to classify them using
Chunk instances then occur because our 'chunks' are like types (for information) and then the instances are actual instances (i.e. like terms of a type). This is good. But we do need to revisit our construction methods to make this nicer.

The notion of 'run time' is actually quite a bit more complicated. There is the run-time of Drasil as a generator, as well as the run-time of the generated code (and the compile-time of the generated code in some cases too). Finding good words to describe that is hard. It's worth reading up on partial evaluation (especially the 'cogen' idea that comes out of the Futamura projections) to get a better handle on that.

At the end, there are some excellent questions, like "What does Drasil really do?". A really solid description of that is still missing. A proper answer is big and complicated, and needs to encompass all of Drasil.

Re: "Is Haskell the right language for Drasil?" To me the answer is a resounding 'yes' as well as a resounding 'no'! "Huh?" you might say. This is because, to me, Drasil is the name for a collection of languages and language processors. The first thing we need to have in hand is a proper name for all of the languages, and all of the processors. Then, for every single one, we can ask the question of what language it should be written in (i.e. what meta-language we should use for each). Because Drasil is a piece of software that performs a task that we want performed, if we go back to the start of this (now long) comment, we see how Drasil itself can be seen as fitting in that gap, and how it could be the target of abstraction.

About the very last comment: "can we abstract over what that manually created generator does [...]". My answer: once we thoroughly understand what the domain of it is, what its input, output and process are, yes we can. We are several steps away from that thorough understanding. We have lots of simpler pieces that we still don't thoroughly understand.

All in all: great stuff.
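For reference, a small type-level sketch of the partial-evaluation / Futamura-projection idea mentioned above (the names are illustrative, not from any particular library):

```haskell
module FutamuraSketch where

-- A program that takes a "static" input s and a "dynamic" input d.
type Prog s d a = s -> d -> a

-- A partial evaluator ("mix"): given a program and its static input, produce
-- a residual program needing only the dynamic input. (A real mix would also
-- aggressively specialize; only the type is shown here.)
mix :: Prog s d a -> s -> (d -> a)
mix prog s = prog s

-- An interpreter: a program whose static input is source code and whose
-- dynamic input is that source program's own input.
type Interp src input output = Prog src input output

-- First Futamura projection: specializing an interpreter to one source
-- program yields a "compiled" version of that program. The second projection
-- (mix applied to mix and an interpreter) yields a compiler; the third yields
-- a compiler generator ("cogen").
compileVia :: Interp src input output -> src -> (input -> output)
compileVia = mix
```

One suggestive reading for Drasil: the generator plus its entirely static chunk data should residualize to a program that only prints feedback and dumps artifacts, which is what the earlier "runtime" observations gesture at.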
-
Thank you @JacquesCarette 😄 I will respond shortly after I write this comment — I just need to get this thought out of my head and onto something before I forget. On my drive home, I realized it might be good to write out what I think the work might look like in stages:
other tasks might include:
Aside from steps (1) and (2), the order of the rest of the two lists doesn't matter much, I think. Note: one super critical thing that I haven't really spoken about (or fully understood well enough), which would greatly affect the above steps, is @JacquesCarette's question about "what sits beneath chunks." I think these approximate steps are in tune with what you're mentioning, too, @JacquesCarette?
-
I have a bit of a mind dump regarding the past discussion about encoding transformers as typeclasses; I think there's a bit more nuance to it than I previously understood. My previous understanding was that, since we mine "higher-level" encodings from the "lower-level" ones, it should be a property of the higher-level ones that the lower-level ones can be recreated from them. I think that's a fair assessment, but I also claimed that we could capture all transformers using typeclasses. For example, I thought we could use (approximately):
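A rough, hypothetical sketch of the kind of typeclass in question (the class names and toy types are illustrative only, not the real Drasil classes):

```haskell
-- Hypothetical sketch: capture a transformer as a typeclass, so that any
-- encoding that can be re-expressed in a "higher-level" language advertises
-- that fact through an instance.
module TransformerSketch where

data LoExpr = LoLit Double | LoAdd LoExpr LoExpr
data HiExpr = HiLit Double | HiAdd HiExpr HiExpr | HiDeriv String HiExpr

-- The transformer-as-typeclass: "c can be expressed as a HiExpr".
class ExpressHi c where
  expressHi :: c -> HiExpr

instance ExpressHi LoExpr where
  expressHi (LoLit d)   = HiLit d
  expressHi (LoAdd l r) = HiAdd (expressHi l) (expressHi r)

-- The "property" direction -- recreating the lower-level encoding -- is only
-- partial, since the higher-level language has constructs with no
-- lower-level counterpart.
recreateLo :: HiExpr -> Maybe LoExpr
recreateLo (HiLit d)     = Just (LoLit d)
recreateLo (HiAdd l r)   = LoAdd <$> recreateLo l <*> recreateLo r
recreateLo (HiDeriv _ _) = Nothing
```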
Some transformers might be "properties" of the encodings, such as an encoding being expressible in a higher-level one. So, we have two transformers we want to encode: (i, a property) recreating the lower-level encoding from the higher-level one, and (ii) mining the higher-level encoding from the lower-level one.

Ok, now, am I thinking too much about this (note that I thought about this a while ago, but only typed it up now)? Is this just an oddity of Haskell? Is this a conflict between Haskell syntax and our desired syntax for building Drasil? Should all "transformers" have "configuration" knowledge? In other words, should properties be captured in this same style as general transformers (maybe properties need to be captured differently)?

All this being said, I think it might be better to look at this through the lens of theories, theory extensions, and theory morphisms (amongst other kinds of theory-related concepts) and re-evaluate.

I am going to start driving to McMaster now. See y'all soon!
-
De-embedding Drasil's Implementation
Towards "Drasil in Drasil"
Problem Statement
With each case study, we have a general understanding of what occurs during its runtime (and what we will ever do in it): create a "ChunkDB" -> register chunks in the ChunkDB -> repeatedly generate things using pre-written "generation directives" with a basic set of "input choices." The runtime-registered chunks rely on type information described in Haskell, but these types (and the information we have about the types) are not available at Drasil's runtime -- they are hidden from the runtime, exposed only to the Haskell compiler. Re-creation of the type information at runtime requires delving into complex reflection (specifically relying on GHC now, if I understand correctly).
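As a rough sketch of that flow (every type and function name below is hypothetical, chosen only to make the described shape explicit):

```haskell
-- Hypothetical sketch of the per-case-study flow: build a ChunkDB, register
-- chunks, then repeatedly generate artifacts from pre-written generation
-- directives and a basic set of input choices.
type UID = String
data Chunk = Chunk { chunkUID :: UID, chunkBody :: String }

newtype ChunkDB = ChunkDB [(UID, Chunk)]

emptyDB :: ChunkDB
emptyDB = ChunkDB []

register :: Chunk -> ChunkDB -> ChunkDB
register c (ChunkDB cs) = ChunkDB ((chunkUID c, c) : cs)

data Directive = GenSRS | GenCode
newtype Choices = Choices { targetLang :: String }

generate :: ChunkDB -> Choices -> Directive -> String
generate (ChunkDB cs) _  GenSRS  = "SRS over " ++ show (length cs) ++ " chunks"
generate (ChunkDB cs) ch GenCode =
  "code (" ++ targetLang ch ++ ") over " ++ show (length cs) ++ " chunks"

main :: IO ()
main = do
  let db = register (Chunk "pendulum.length" "length of the rod") emptyDB
  putStrLn (generate db (Choices "Python") GenSRS)
  putStrLn (generate db (Choices "Python") GenCode)
```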
As we are compiling the current Drasil source code with deeply embedded instances of data, and then running each compiled case study, we know that each execution of the compiled binary will give us the same generated results with each run. Realistically, since all data we input is known at compile-time, GHC should be able to discard nearly the entire "runtime" by evaluating the whole program during compilation (other than the final IO-performing action that either errors out or dumps the final Doc values onto the working directory; i.e., the data it dumps should be fully evaluated and in the compiled binary itself). Hence, compiling Drasil and its input information is peculiar. It appears that Drasil wants to become an interpreter for its input information rather than something that is compiled alongside its input information.

Purpose Statement
The purpose of this research is to understand what we are building in the Haskell implementation and what is needed of a host language through (i) building a Drasil language and interpreter, (ii) re-writing the knowledge contained in Drasil's Haskell implementation in the new language (ensuring that the same artifacts can also be generated), and (iii) describing Drasil's syntax and interpreter in the new language such that we can generate the interpreter. In order for (iii) to be feasible, we will first need to (iv) bootstrap a Drasil language interpreter in another language.
Research Questions