forked from UNM-CARC/QuickBytes
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdocker_singularity_intro.html
335 lines (272 loc) · 11.9 KB
/
docker_singularity_intro.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>Docker and Singularity</title>
<style type="text/css">
body {
font-family: Helvetica, arial, sans-serif;
font-size: 14px;
line-height: 1.6;
padding-top: 10px;
padding-bottom: 10px;
background-color: white;
padding: 30px; }
</style>
</head>
<body>
<h1>Docker and Singularity</h1>
<h2>Introduction</h2>
<p>Have you ever installed software on your own system to process data, but then find it is not so easy to setup the same environment on a CARC cluster when you want to process larger data sets? CARC staff will work with you to get your software running, but occasionally we run across fundamental incompatabilities. Docker allows us to circumvent those incompatabilities.</p>
<p>Docker and Singularity are free tools that allow you to run software with dependencies on its environment that are difficult to satisfy natively on the CARC clusters. For example, your software might depend on a particular flavor of linux that differs from the one installed on the cluster you want to use. Perhaps your software needs to make global changes to the operating system that are incompatible with the needs of the center or other users. Docker allows us to deploy software with these requirements.</p>
<p>In this guide you will learn how to setup your software to run in a docker container, and how to convert the container to singularity and run it at CARC. The R software package party.</p>
<p>
In the example below commands to enter are given in <code><font color="Green">Green</font></code> and the resulting output is given in <code><font color="Navy">Navy</font></code>.
</p>
<h2>What is Docker?</h2>
<p>
Docker allows you to setup a virtual environment for your program that you can configure however you like. You can choose the underlying operating system (so long as it is linux based), install any packages you need, and make any other changes the root user could make. The custom OS environment is stored in a <i>docker image</i> file that can be loaded by any docker installation. Docker has online repositories with many pre-built images available to download.
</p>
<p>When loaded, a docker image provides a <i>container</i> that allows the software to have complete control of its environment without effecting the host operating system.</p>
<p>If you have used a virtual machine (such as virtualbox, or vmware) the description of docker images will sound familiar. The main difference is that docker just containerizes the environment but still uses the host operating system's kernel. This means there is very little performance impact.
<h2>What is Singularity?</h2>
<p>Singularity is able to convert and run docker images into a secure form suitable for multiuser machines such as the CARC clusters.</p>
<h2>Installing Docker</h2>
<p>Docker runs on Microsoft Windows, Apple OS X, and Linux. All the systems at CARC are linux based so you will need to create and configure a linux based docker container. OS X and Linux will allow you to do this. Windows 10 can as well but you need to install the Windows Subsystem for Linux and a linux distribution.</p>
<p>
Docker can be downloaded from <a href="https://www.docker.com">www.docker.com</a>.
</p>
<h2>Running Docker</h2>
<p>To start docker under Windows or OS X just launch the program like you would any other. To start the docker daemon under linux enter:</p>
<p>
<code><font color="Green">sudo systemctl start docker</font></code>
</p>
<h2>Creating a Docker Image</h2>
<h3>Pulling an existing base docker image</h3>
For this example we are going to use an existing docker image that has the CentOS linux distribution preconfigured. We could just as easily use Ubuntu by replacing <q>centos</q> with <q>ubuntu</q> below.
<p>
<code><font color="Green">
docker pull centos
</font></code>
</p>
<p>
<code><font color="Navy">
Using default tag: latest
Trying to pull repository docker.io/library/centos ... <br>
latest: Pulling from docker.io/library/centos<br>
a02a4930cb5d: Pull complete<br>
Digest: sha256:184e5f35598e333bfa7de10d8fb1cebb5ee4df5bc0f970bf2b1e7c7345136426<br>
Status: Downloaded newer image for docker.io/centos:latest</br>
</font></code>
</p>
<p>
We can now issue a command that runs inside the docker container and returns output. Below we run the <code><font color="Green">ls -la</font></code> command to get a file listing inside the container.
</p>
<p>
<code><font color="Green">
docker run centos ls
</font></code>
</p>
<p>
<code><font color="Navy">
anaconda-post.log<br>
bin<br>
dev<br>
etc<br>
home<br>
lib<br>
lib64<br>
media<br>
mnt<br>
opt<br>
proc<br>
root<br>
run<br>
sbin<br>
srv<br>
sys<br>
tmp<br>
usr<br>
var<br>
</font></code>
</p>
<h3>Customizing the docker image</h3>
<p>Now we can install the program we need inside the docker container and any dependencies it needs. It is convenient to do this interactively inside the container. Configuring the container in interactive mode allows us to do everything we need to do before saving the changes to a docker image. If we issued commands one-by-one with <code><font color="Green">run</font></code> we would lose the state of the container after each command.</p>
<p>
<code><font color="Green">
docker run -it centos
</font></code>
</p>
<p>
<code><font color="Navy">
[root@a264d0f7b8c9 /]#
</font></code>
</p>
<p>
The long string after <code><font color="Navy">@</font></code> is the name of the container we are running in. Notice it is not the same name as the docker image.
</p>
<p>
In a <i>new</i> terminal we can list the running docker containers and see that our container is running.
</p>
<p>
<code><font color="Green">
docker container list
</font></code>
</p>
<code><font color="Navy">
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES<br>
a264d0f7b8c9 centos <q>/bin/bash</q> About a minute ago Up About a minute striking_mahavira<br>
</font></code>
</p>
<h3>Example: Installing the <q>party</q> R package</h3>
<p>As an example, we will install party. Party is an R package for recursive partitioning. It produces regression trees for machine learning. But you can install whatever tools you wish. This is the beauty of docker, it gives you control over your environment.</p>
<p>
<code><font color="Navy">
[root@a264d0f7b8c9 /]# <font color="Green">yum update</font>
[root@a264d0f7b8c9 /]# <font color="Green">yum -y install epel-release</font>
[root@a264d0f7b8c9 /]# <font color="Green">yum install R</font>
</font></code>
</p>
<p>
Once R has finished installing:
</p>
<p>
<code><font color="Navy">
[root@a264d0f7b8c9 /]# <font color="Green">R</font>
</font></code>
</p>
<p>
<code><font color="Navy">
R version 3.5.0 (2018-04-23) -- "Joy in Playing"<br>
Copyright (C) 2018 The R Foundation for Statistical Computing<br>
Platform: x86_64-pc-linux-gnu (64-bit)<br>
<br>
R is free software and comes with ABSOLUTELY NO WARRANTY.<br>
You are welcome to redistribute it under certain conditions.<br>
Type 'license()' or 'licence()' for distribution details.<br>
<br>
Natural language support but running in an English locale<br>
<br>
R is a collaborative project with many contributors.<br>
Type 'contributors()' for more information and<br>
'citation()' on how to cite R or R packages in publications.<br>
<br>
Type 'demo()' for some demos, 'help()' for on-line help, or<br>
'help.start()' for an HTML browser interface to help.<br>
Type 'q()' to quit R.<br>
<br>
></font><font color="Green">install.packages("party",deps=YES)</font>
</p>
</code>
<h2>Commit customised container to image file</h2>
<p>Once party has finished installing, our Docker <i>container</i> is ready. Now we need to commit those changes to a docker <i>image file</i> so we can load the image that has R and party installed for future use.</p>
<p>
Exit R:
</p>
<p>
<code><font color="Navy">
[root@a264d0f7b8c9 /]# > <font color="Green">quit()</font>
</font></code>
</p>
<p>
Exit the docker container:
</p>
<p>In general:</p>
</p>
<code><font color="Navy">
[root@a264d0f7b8c9 /]#</font> <font color="Green">exit</font><br>
<font color="Navy">$</font> <font color="Green">sudo docker commit <container id> <new image name></font>
</code>
</p>
<p>
For our example:
</p>
<p>
<code><font color="Navy">
[root@a264d0f7b8c9 /]#</font> <font color="Green">exit</font><br>
<font color="Navy">$</font> <font color="Green">sudo docker commit a264d0f7b8c9 r_party</font>
</code>
</p>
<p>Now we have a docker image saved.</p>
<h2>Running Commands using the Docker Image</h2>
<p>Now that we have a docker image we can run it on our own machines as before with:</p>
<p>
<code><font color="Green">
docker run r_party ls
</font></code>
</p>
<p>
But now we can also execute R commands such as <code><font color="Green>Rscript</font></code>.
</p>
<p>
<code><font color="Green">
docker run r_party Rscript
</font></code>
</p>
<h3></h3>
<p>For our party example we can write the following R script and save it to a file called <i>test_party.R</i>:</p>
<code><font color="Green">
library("party")<br>
set.seed(290875)<br>
### honest (i.e., out-of-bag) cross-classification of<br>
### true vs. predicted classes<br>
data("mammoexp", package = "TH.data")<br>
table(mammoexp$ME, predict(cforest(ME ~ ., data = mammoexp,<br>
control = cforest_unbiased(ntree = 50)),<br>
OOB = TRUE))<br>
### fit forest to censored response<br>
if (require("TH.data") && require("survival")) {<br>
data("GBSG2", package = "TH.data")<br>
bst <- cforest(Surv(time, cens) ~ ., data = GBSG2,<br>
control = cforest_unbiased(ntree = 50))<br>
### estimate conditional Kaplan-Meier curves<br>
treeresponse(bst, newdata = GBSG2[1:2,], OOB = TRUE)<br>
party:::prettytree(bst@ensemble[[1]], names(bst@data@get("input")))<br>
</code></font></p>
<p>
And run it using the docker image with:
</p>
<p>
<code><font color="Navy">
$ </font> <font color="Green">docker run -v < path to test_party.R folder > :/mnt r_party Rscript /mnt/test_party.R</font>
</font></code>
</p>
<h1>Singularity</h1>
<h2>Converting Docker Images to Singularty Images</h2>
<p>Once we are happy with the docker image we created we will convert it to a singularity image so we can use it on the CARC clusters.</p>
<p>We do the conversion by using a docker image provided to us that contains the necessary tools:</p>
<p>
<code><font color="Navy">
$ </font> <font color="Green">docker pull singularityware/docker2singularity</font>
</font></code>
</p>
<p>
<code><font color="Navy">
$ </font> <font color="Green">docker run -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/output --privileged -t --rm singularityware/docker2singularity r_party</font>
</font></code>
</p>
<p>This will produce a singularity image in the /tmp directory that we can upload to a CARC cluster using our favorite file transfer program.</p>
<h2>Running a Singularity Image at CARC</h2>
<p>First login to a CARC cluster head node.</p>
<p>Next we will load the singularity module:</p>
<p>
<code><font color="Navy">
$ </font> <font color="Green">module load singularity-2.4.1-intel-17.0.4-sjwoqj4</font>
<font color="Navy">
$ </font></code>
</p>
<p>The syntax for executing Singularity images are similar to those we used for docker:</p>
<p>
<code><font color="Navy">
$ </font> <font color="Green">singularity exec r_party.simg Rscript test_party.R
</font></code>
</p>
<h2>Mapping Directories</h2>
<p>
<tt><font color="Navy">
$ </font> <font color="Green">singularity -B $PBS_O_WORKDIR:/mnt exec r_.simg Rscript /mnt/$CODE_DIR/process_lidar.R {} /mnt/$ORTHO_DATA_DIR/{/.}.img /mnt/$OUTPUT_DIR</font>
</font></code>
</p>
</body>
</html>