diff --git a/NAMESPACE b/NAMESPACE
new file mode 100644
index 00000000..f9017b79
--- /dev/null
+++ b/NAMESPACE
@@ -0,0 +1,19 @@
+# Generated by roxygen2: do not edit by hand
+
+S3method("$",module)
+S3method(print,module)
+export("?")
+export(help)
+export(import)
+export(import_)
+export(import_package)
+export(import_package_)
+export(module_file)
+export(module_help)
+export(module_name)
+export(register_S3_method)
+export(reload)
+export(set_script_path)
+export(unload)
+importFrom(stats,setNames)
+importFrom(utils,lsf.str)
diff --git a/inst/doc/basic-usage.R b/inst/doc/basic-usage.R
new file mode 100644
index 00000000..1b096698
--- /dev/null
+++ b/inst/doc/basic-usage.R
@@ -0,0 +1,66 @@
+## ----include=FALSE-------------------------------------------------------
+devtools::load_all()
+import('source-file')
+
+## ------------------------------------------------------------------------
+seq = import('utils/seq')
+ls()
+
+## ------------------------------------------------------------------------
+ls(seq)
+
+## ----eval=FALSE----------------------------------------------------------
+# ?seq$seq
+
+## ------------------------------------------------------------------------
+s = seq$seq(c(foo = 'GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC',
+              bar = 'CATAGCAACTGACATCACAGCG'))
+s
+
+## ------------------------------------------------------------------------
+seq$print.seq
+
+## ------------------------------------------------------------------------
+# We can unload loaded modules that we assigned to an identifier:
+unload(seq)
+
+options(import.path = 'utils')
+import('seq', attach = TRUE)
+
+## ------------------------------------------------------------------------
+search()
+
+## ------------------------------------------------------------------------
+detach('module:seq') # Name is optional
+local({
+    import('seq', attach = TRUE)
+    table('GATTACA')
+})
+
+## ------------------------------------------------------------------------
+search()
+table('GATTACA')
+
+## ----file='utils/__init__.r'---------------------------------------------
+
+## ------------------------------------------------------------------------
+options(import.path = NULL) # Reset search path
+utils = import('utils')
+ls(utils)
+ls(utils$seq)
+utils$seq$revcomp('CAT')
+
+## ----eval=FALSE----------------------------------------------------------
+# export_submodule('./seq')
+
+## ----file='info.r'-------------------------------------------------------
+
+## ------------------------------------------------------------------------
+info = import('info')
+
+## ------------------------------------------------------------------------
+import('info')
+
+## ------------------------------------------------------------------------
+reload(info)
+
diff --git a/inst/doc/basic-usage.html b/inst/doc/basic-usage.html
new file mode 100644
index 00000000..1efa1965
--- /dev/null
+++ b/inst/doc/basic-usage.html
@@ -0,0 +1,1685 @@
+seq module
For the purpose of this tutorial, we are going to use the toy module utils/seq, which is implemented in the file utils/seq.r. The module implements some very basic mechanisms to deal with DNA sequences (character strings consisting entirely of the letters A, C, G and T).
First, we load the module.
+seq = import('utils/seq')
+ls()
+## [1] "seq"
+utils serves as a supermodule here, which groups several submodules (but for now, seq is the only one).
To see which functions a module exports, use ls:
+ls(seq)
## [1] "print.seq" "revcomp" "seq"
+## [4] "table" "valid_seq" "valid_seq.default"
+## [7] "valid_seq.seq"
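+And we can display interactive help for individual functions:
+?seq$seq
This function creates a biological sequence. We can use it:
+s = seq$seq(c(foo = 'GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC',
+              bar = 'CATAGCAACTGACATCACAGCG'))
+s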
+ +## >foo
+## GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC
+## >bar
+## CATAGCAACTGACATCACAGCG
+Notice how we get a pretty-printed, FASTA-like output because the print method is redefined for the seq class in utils/seq:
+seq$print.seq
## function (seq, columns = 60)
+## {
+## lines = strsplit(seq, sprintf("(?<=.{%s})", columns), perl = TRUE)
+## print_single = function(seq, name) {
+## if (!is.null(name))
+## cat(sprintf(">%s\n", name))
+## cat(seq, sep = "\n")
+## }
+## names = if (is.null(names(seq)))
+## list(NULL)
+## else names(seq)
+## Map(print_single, lines, names)
+## invisible(seq)
+## }
+## <environment: 0x7fcca2b58b08>
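The line-wrapping step in the listing above relies on a look-behind regular expression. As a rough illustration of the same idea, the sketch below wraps a single sequence using substring() instead. It is not the module's code: wrap_sequence is a hypothetical helper introduced here, and the column width of 10 is chosen only to keep the example short.

```r
# Hypothetical helper, not part of utils/seq: split one sequence into chunks
# of at most `columns` characters using substring().
wrap_sequence = function (x, columns = 60) {
    starts = seq.int(1, max(nchar(x), 1), by = columns)
    substring(x, starts, pmin(starts + columns - 1, nchar(x)))
}

original = 'GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC'
wrapped = wrap_sequence(original, columns = 10)

# Print one chunk per line, as a FASTA-style printer would.
cat(wrapped, sep = '\n')
```

Joining the chunks back together yields the original sequence, which is a convenient sanity check for any wrapping routine.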
+That’s it for basic usage. In order to understand more about the module mechanism, let’s look at an alternative usage:
+# We can unload loaded modules that we assigned to an identifier:
+unload(seq)
+
+options(import.path = 'utils')
+import('seq', attach = TRUE)
After unloading the already loaded module, the options function call sets the module search path: this is where import searches for modules. If more than one path is given, import searches them all until a module of matching name is found.
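To make the first-match idea concrete, here is a minimal sketch in base R. It assumes that paths are tried in the order given and that a module named seq lives in a file seq.r; find_module_file is a hypothetical helper, and the package's real lookup logic is not shown in this document and may differ.

```r
# Hypothetical helper illustrating first-match lookup over a search path.
# Assumption: paths are tried in order and modules are stored as <name>.r.
find_module_file = function (name, search_path) {
    candidates = file.path(search_path, paste0(name, '.r'))
    hits = candidates[file.exists(candidates)]
    if (length(hits) == 0L) stop('Unable to find module "', name, '"')
    hits[1L]
}

# Build two throwaway directories that both contain a seq.r file.
first = file.path(tempdir(), 'first')
second = file.path(tempdir(), 'second')
dir.create(first, showWarnings = FALSE)
dir.create(second, showWarnings = FALSE)
file.create(file.path(first, 'seq.r'), file.path(second, 'seq.r'))

# The file from the first matching directory wins.
found = find_module_file('seq', c(first, second))
```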
The import statement can now simply specify seq instead of utils/seq as the module name. We also specify attach=TRUE. This has an effect similar to package loading (or attaching an environment): all the module’s names are now available for direct use without necessitating the seq$ qualifier.
However, unlike the attach function, module attachment happens in local scope only. Since the above code was executed in global scope, the local scope here is the global scope, and the module shows up in the search path:
+search()
## [1] ".GlobalEnv" "module:seq" "devtools_shims"
+## [4] "package:modules" "package:stats" "package:graphics"
+## [7] "package:grDevices" "package:utils" "package:datasets"
+## [10] "rprofile" "Autoloads" "package:base"
+Notice the second position, which reads “module:seq”. But now let’s undo that, and attach (and use) the module locally instead.
+detach('module:seq') # Name is optional
+local({
+ import('seq', attach = TRUE)
+ table('GATTACA')
+})
## [[1]]
+##
+## A C G T
+## 3 1 1 2
+Note that this uses seq’s table function, rather than base::table (which would have a different output). Furthermore, note that outside the local scope, the module is not attached:
+search()
## [1] ".GlobalEnv" "devtools_shims" "package:modules"
+## [4] "package:stats" "package:graphics" "package:grDevices"
+## [7] "package:utils" "package:datasets" "rprofile"
+## [10] "Autoloads" "package:base"
+table('GATTACA')
+##
+## GATTACA
+## 1
+This is very powerful, as it isolates separate scopes more effectively than the attach function. What is more, modules which are imported and attached inside another module remain inside that module and are not visible outside the module by default.
Nevertheless, the normal, recommended usage of a module is with attach=FALSE (the default), as this makes it clearer which names we are referring to.
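The scoping behaviour described above can be imitated in base R with plain environments. The sketch below is only an analogy and makes no claim about how the package implements attachment: seq_module is a hypothetical stand-in for a loaded module, and its table function is written here for illustration.

```r
# Hypothetical stand-in for a module: an environment with its own `table`.
seq_module = new.env(parent = globalenv())
seq_module$table = function (x) base::table(strsplit(x, '')[[1]])

# Evaluate code in a scope whose enclosing environment is the module, so the
# module's names are visible inside the block only.
local_counts = local({
    table('GATTACA')
}, envir = new.env(parent = seq_module))

# Outside the block, `table` still refers to base::table.
global_counts = table('GATTACA')
```

Inside the block the call counts nucleotides (A 3, C 1, G 1, T 2), while the call outside counts the whole string once, mirroring the two outputs shown earlier.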
Modules can also be nested in hierarchies. In fact, here is the implementation of utils (in utils/__init__.r: since utils is a directory rather than a file, the module implementation resides in the nested file __init__.r):
The submodule is specified as './seq' rather than 'seq': the explicitly provided relative path prevents lookup in the import search path (that we set via options(import.path=…) earlier); instead, only the current directory is considered.
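We can now use the utils module:
+options(import.path = NULL) # Reset search path
+utils = import('utils')
+ls(utils)
## [1] "seq"
+ls(utils$seq)
+## [1] "print.seq" "revcomp" "seq"
+## [4] "table" "valid_seq" "valid_seq.default"
+## [7] "valid_seq.seq"
+utils$seq$revcomp('CAT')
+## ATG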
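For readers unfamiliar with reverse complements, the following base R sketch shows one way such a function could be written. The actual source of revcomp in utils/seq.r is not shown in this document, so revcomp_sketch is a hypothetical re-implementation and may differ from the module's version (for example, it returns a plain character vector rather than a seq object).

```r
# Hypothetical re-implementation for illustration only.
revcomp_sketch = function (x) {
    # Swap each base for its complement, then reverse each sequence.
    complement = chartr('ACGT', 'TGCA', x)
    vapply(strsplit(complement, ''),
           function (bases) paste(rev(bases), collapse = ''),
           character(1))
}

revcomp_sketch('CAT')  # "ATG", matching the module output above
```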
+We could also have implemented utils as follows:
+export_submodule('./seq')
This would have made all of seq’s definitions immediately available in utils. This is sometimes useful, but should be employed with care.
utils/seq.r is, by and large, a normal R source file. In fact, there are only two things worth mentioning:
Documentation. Each function in the module file is documented using the roxygen2 syntax. It works the same as for packages. The modules package parses the documentation and makes it available via module_help and ?.
The module exports S3 functions. The modules package takes care to register such functions automatically, but this only works for user generics that are defined inside the same module. When overriding “known generics” (such as print), we need to register these manually via register_S3_method (this is necessary since these functions are inherently ambiguous and there is no automatic way of finding them).
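Why registration matters can be seen with base R alone. In the sketch below a print method lives in a private environment, much as it would inside a module, so S3 dispatch cannot find it until it is registered. The example uses base R's registerS3method; whether register_S3_method is built on it is not stated in this document, and make_seq and private are hypothetical names.

```r
# Hypothetical constructor for a small `seq`-classed object.
make_seq = function (x) structure(x, class = 'seq')

# Define the method inside a private environment, as a module would.
private = new.env()
private$print.seq = function (x, ...) {
    cat(sprintf('>%s\n%s\n', names(x), unclass(x)), sep = '')
    invisible(x)
}

s = make_seq(c(foo = 'GATTACA'))

# Register the method for the known generic `print`, then dispatch on it.
registerS3method('print', 'seq', private$print.seq)
after = capture.output(print(s))
```

After registration, print(s) produces the two FASTA-like lines even though print.seq is not visible on the search path.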
Module files can contain arbitrary code. It is executed when the module is loaded for the first time: subsequent imports in the same session, regardless of whether they occur in a different scope, refer to the loaded, cached module and do not reload it.
We can illustrate this by loading a module which has side effects, 'info'.
message('Loading module "', module_name(), '"')
+message('Module path: "', basename(module_file()), '"')
Let’s load it:
+info = import('info')
+## Loading module "info"
+## Module path: "vignettes"
+We have imported the module, and get the diagnostic messages. Let’s re-import the module:
+import('info')
… no messages are displayed. However, we can explicitly reload a module. This clears the cache, and loads the module again:
+reload(info)
+## Loading module "info"
+## Module path: "vignettes"
+And this displays the messages again. The reload function is a shortcut for unload followed by import (using the exact same arguments as used on the original import call).
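The load-once behaviour and the relationship between reload, unload and import can be modelled with a small cache. This is a sketch under stated assumptions, not the package's implementation: the functions carry a _sketch suffix to make clear they are hypothetical, and a counter stands in for the module's side effects.

```r
# Hypothetical model of a module cache; not the package's code.
cache = new.env()
state = new.env()
state$loads = 0L

import_sketch = function (name) {
    if (!exists(name, envir = cache, inherits = FALSE)) {
        # Side effect: runs only when the module is not cached yet.
        state$loads = state$loads + 1L
        assign(name, list(name = name), envir = cache)
    }
    get(name, envir = cache, inherits = FALSE)
}

unload_sketch = function (name) rm(list = name, envir = cache)

# reload is modelled as unload followed by import.
reload_sketch = function (name) {
    unload_sketch(name)
    import_sketch(name)
}

invisible(import_sketch('info'))  # Loads the module
invisible(import_sketch('info'))  # Served from the cache
loads_before_reload = state$loads
invisible(reload_sketch('info'))  # Clears the entry and loads again
loads_after_reload = state$loads
```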
The info module also showcases two important helper functions:
module_name returns the name under which the module was loaded. This is especially handy because, outside of a module, module_name returns NULL. We can harness this in a similar way to Python’s __name__ mechanism.
module_file works like system.file: it returns the full path to any file within a module. This is helpful when distributing data files with modules, which are loaded from within the module. When invoked without arguments, module_file returns the full path to the directory containing the module source file.
Modules don’t have a built-in foreign function interface yet, but it is possible to integrate C++ code via the excellent Rcpp package.
+As an example, take a look at the rcpp module found under inst/doc; the module consists of a C++ source file which is loaded inside the __init__.r file:
Here’s the C++ code itself (the example is taken from the Rcpp documentation):
+#include "Rcpp.h"
+
+using Rcpp::NumericVector;
+
+// [[Rcpp::export]]
+NumericVector convolve(NumericVector a, NumericVector b) {
+ int na = a.size(), nb = b.size();
+ int nab = na + nb - 1;
+ NumericVector xab(nab);
+ for (int i = 0; i < na; i++)
+ for (int j = 0; j < nb; j++)
+ xab[i + j] += a[i] * b[j];
+ return xab;
+}
This module can be used like any normal module:
+ +## [1] "convolve"
+
+## [1] 1 4 10 16 22 22 15
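As a cross-check of the C++ routine, the same double loop can be written in plain R. The call that produced the output above is not shown in this document; the inputs 1:3 and 1:5 used below are an assumption, chosen because they reproduce that output. convolve_r is a hypothetical name and assumes both inputs are non-empty.

```r
# Plain R counterpart of the C++ convolve function, for cross-checking.
# R indexes from 1, hence the `- 1` in the output position.
convolve_r = function (a, b) {
    xab = numeric(length(a) + length(b) - 1)
    for (i in seq_along(a))
        for (j in seq_along(b))
            xab[i + j - 1] = xab[i + j - 1] + a[i] * b[j]
    xab
}

# Assumed inputs (not taken from the original vignette).
result = convolve_r(1:3, 1:5)
result  # 1 4 10 16 22 22 15
```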
+Unfortunately, this has a rather glaring flaw: the code is recompiled for each new R session. In order to avoid this, we need to compile the code once and save the resulting dynamic library. There’s no straightforward way of doing this, but Rcpp wraps R CMD SHLIB.
For the time being, we need to trigger compilation manually by executing the __install__.r file found in the inst/doc/rcpp module path.
Once that’s done, the actual module code is easy enough:
+# Load compiled module meta information, and load R wrapper code, which, in
+# turn, loads the compiled module via `dyn.load`.
+load_dynamic = function (prefix) {
+ context = readRDS(module_file(sprintf('%s.rds', prefix)))
+ source(context$rSourceFilename, local = parent.frame())
+}
+
+load_dynamic('convolve')
We can use it like any other module:
+ +## [1] 1 4 10 16 22 22 15
+