Data Publishing - first draft documentation for the data publishing service #191

Open: wants to merge 29 commits into base `main`.

**Commits (29)**

- `0d029df` Data publishing doc draft. (marioa, Feb 4, 2025)
- `b46427b` Added an image to create a dataset. (marioa, Feb 4, 2025)
- `1176b6a` Moved project file content to assumptions. (marioa, Feb 4, 2025)
- `c543202` Removed metadata for data publishing. (marioa, Feb 4, 2025)
- `bbc53c1` Renamed files. (marioa, Feb 4, 2025)
- `d871f96` Added more generic instructions. (marioa, Feb 4, 2025)
- `37eb8f1` Content asking for feedback. (marioa, Feb 4, 2025)
- `8f61b71` Capitalised Data Catalogue. (marioa, Feb 4, 2025)
- `69409be` Feedback for missing content. (marioa, Feb 4, 2025)
- `27b72df` Moved dataset content to the catalogue file. (marioa, Feb 4, 2025)
- `e98a40b` Corrected link to the image. (marioa, Feb 4, 2025)
- `a81859f` Added an adominition about content. (marioa, Feb 4, 2025)
- `1676c2d` Added some linking text to the next section. (marioa, Feb 4, 2025)
- `8efe0ca` Moved the downloading content to uploading. (marioa, Feb 4, 2025)
- `79af0e0` Renamed uploading to accessing. (marioa, Feb 4, 2025)
- `4bc0899` Uploading/Downloading -> Manipulating. (marioa, Feb 4, 2025)
- `f5ce201` Added note on providing documentation on how to use published data. (marioa, Feb 4, 2025)
- `91767f7` Added something on downloading using the aws client. (marioa, Feb 4, 2025)
- `3ffcad6` Removed empty file. (marioa, Feb 5, 2025)
- `af7708a` Fixed typos. (marioa, Feb 5, 2025)
- `5a89d3f` Changed the admonition types. (marioa, Feb 5, 2025)
- `08d573b` Changed section titles. (marioa, Feb 5, 2025)
- `245722a` Moved S3 content to the S3 section. (marioa, Feb 11, 2025)
- `b02b376` Corrections and improvements. (marioa, Feb 11, 2025)
- `d52e5ed` Minor changes. (marioa, Feb 11, 2025)
- `8d110e5` Merge with main, fixed conflict. (marioa, Feb 11, 2025)
- `62eac3b` Merge branch 'main' into publishing (marioa, Feb 25, 2025)
- `b51c107` S3->https for publick S3 buckets. (marioa, Feb 25, 2025)
- `be13d1a` Fixed typo. (marioa, Feb 25, 2025)
Binary file added docs/images/CreateDataset.png
56 changes: 56 additions & 0 deletions docs/services/datapublishing/catalogue.md
@@ -0,0 +1,56 @@
# Data Publishing

## Customising your entry in the EIDF Data Catalogue

Once your project is approved and you are close to publishing your data, a CKAN organisation will be created for you in the [EIDF Data Catalogue](https://catalogue.eidf.ac.uk/). We do not create organisations automatically, to avoid the catalogue filling up with organisations that have no published data.

You can log in to the EIDF Data Catalogue using your SAFE credentials; there is a "Log in" link at the top right. Find your organisation, then customise it by clicking the "Manage" button at the top right. For example, you can replace the EIDF project number with a friendlier name, add a description of your organisation, provide a logo or image representing it, and associate metadata pairs to aid discovery. Customising your organisation makes it more attractive to potential users of your data and also aids discovery.

!!! warning "**Do NOT use the CKAN interface to create Datasets**"
    The data ingest process creates these for you and associates S3 links with your data. You can provide additional metadata once the Dataset records are in CKAN, but please do not add datasets through the CKAN interface. Contact us if you would like anything removed.

## Creating your dataset(s)

Once your project is approved, go to your project in the EIDF portal at this link:

* [https://projects.eidf.ac.uk/ingest/](https://projects.eidf.ac.uk/ingest/)

Select the project which you want to use to ingest data. The list of `Ingest Datasets` will be empty unless you have already created Datasets.

Create a Dataset by pressing the `New` button. You will need to provide the following information:

* **Name**: The name for your dataset.
* **S3 Bucket name**: this field is populated automatically from your dataset name. You can customise it by editing the field directly, subject to the constraints listed below the text box. Note, however, that changing the dataset name will overwrite any customised bucket name, and the project ID prefix at the start cannot be changed.
* **Number of buckets**: you may want to distribute your data over a number of S3 buckets if your dataset is big.
* **Description**: a description of your dataset.
* **Link**: a link describing your group/contact information.
* **Contact email**: a contact email to answer queries about your data set (this is optional).
* **License**: the license under which you will distribute your data.
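As a rough illustration, the automatic derivation of the bucket name can be pictured as a slugification step with the project ID prefixed. This is only a sketch under assumed rules: the authoritative constraints are those displayed below the text box in the portal, and `suggest_bucket_name` is a hypothetical helper, not part of the EIDF portal.

```python
import re

def suggest_bucket_name(project_id: str, dataset_name: str) -> str:
    """Sketch of deriving an S3-friendly bucket name from a dataset name.

    NOTE: the real portal rules (length limits, allowed characters) are
    shown below the text box in the portal and may differ from this sketch.
    """
    slug = dataset_name.strip().lower()
    # keep only letters and digits; collapse everything else into dashes
    slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
    return f"{project_id}-{slug}"

print(suggest_bucket_name("eidf123", "My Walking Dataset"))
# eidf123-my-walking-dataset
```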

An example of the form is given below.

![Interface to create a dataset](../../images/CreateDataset.png)

Once you are happy with the content, press the `Create` button. This information will be used to create an S3 bucket to which you will be able to migrate your data.

This creates a Dataset within your organisation in the EIDF Data Catalogue and the corresponding data bucket(s) in S3.

You should now be able to click on a link to your dataset to see a copy of the information you provided. Once the S3 bucket has been created and you have added your data, you can add the S3 link to the catalogue entry. You can supplement your Dataset entry in the EIDF catalogue with additional metadata once you have logged in to the Data Catalogue using your SAFE credentials.

## Metadata format

Metadata for resources in your dataset is added directly through the EIDF Data Catalogue.

Make sure you are logged in to the [EIDF Data Catalogue](https://catalogue.eidf.ac.uk). Open your dataset's page and click "Manage" at the top right. Open the "Resources" tab and press the "+ Add new resource" button. You can now fill in the form and describe your data as you wish. Some entries are required; these are marked with a red "\*" in the EIDF Data Catalogue:

* **Name**: a descriptive name for your resource.
* **Access URL**: a link to a file in S3, or to a set of files with a common prefix, that you uploaded as explained above.
* **Description**: a human-readable description of your data set.
* **Resource format**: the type of data included in your resource.
* **Unique Identifier**
* **Licence**: the licence under which you are releasing your data.

Having created an S3 bucket and provided metadata for your data set in the EIDF Data Catalogue, please consult the [S3 tutorial section](../s3/tutorial.md) for an overview of the commands you will require, together with some examples.

!!! note
    If it will not be immediately obvious to a third party how your data can be used, please provide a link to documentation explaining how to unpack and use it. Not everyone who wants to use your data will be a domain expert in your field.
36 changes: 36 additions & 0 deletions docs/services/datapublishing/service.md
@@ -0,0 +1,36 @@
# Data Publishing

## Service provision

The EIDF guarantees, to the best of its ability, to continue its services until at least 31-Mar-2032, and aims to continue beyond 2032 subject to funding. Should we have to terminate the service, we will give you at least 3 months' notice to retrieve your data, so it is important to keep your contact details up to date. The publishing service is not an archiving service, and we recommend that, where possible, you keep a backup copy of your data outside the EIDF.

We reserve the right to remove any data that is illegal or inappropriate, as well as any data that remains at the end of service provision.

Some basic assumptions:

* You already have a [SAFE](https://safe.epcc.ed.ac.uk/) account and can access the [EIDF portal](https://portal.eidf.ac.uk/). Otherwise consult the [EIDF portal documentation](https://docs.eidf.ac.uk/access/project/) before proceeding.
* To qualify for the free data publishing service your data must be open and freely available to all. If you want to control access to your data you should use the [S3 service](https://epcced.github.io/eidf-docs/services/s3/) instead, which is not a free service.
* The service is free up to a generous threshold data volume. If the data you wish to publish exceeds this threshold, we will get in touch when you apply for a data publishing project.

If you find anything in this documentation that is unclear, missing or wrong, please let us know via the EIDF query system.

## Applying for a data project

To start the process you will need to apply for an EIDF data project which is slightly different from other EIDF project applications.

!!! note
A data publishing version of the EIDF portal will be deployed in the near future. For now, you will have to use the generic portal.

In the [EIDF portal](https://portal.eidf.ac.uk/):

* Press the `Your project applications` link.
* Press the `New Application` link and put in an application for us to host your data.
* You will be asked to supply a title for your application, a start date (when you hope to start publishing your data) and a proposed end date (at the moment this cannot be later than 31-Dec-2032).

For the EIDF services you require, choose "*ingest data formally into EIDF for long-term hosting*". Note that all the other EIDF services have a [cost](https://edinburgh-international-data-facility.ed.ac.uk/access), so adding any other EIDF service will incur a charge. Data publishing only incurs a cost if you exceed a threshold; we will get in touch if you pass it.

Be sure to describe the dataset(s) you wish to ingest, then submit your application. Your application will be reviewed and you will be notified whether your project has been approved or rejected; someone may be in touch to clarify points in your application.

Once your data project has been approved we will create an organisation in our EIDF [Data Catalogue](https://catalogue.eidf.ac.uk). The Data Catalogue is a customised instance of [CKAN](https://ckan.org/), an open-source data management application. We map EIDF projects to CKAN organisations. A CKAN organisation allows you to brand your organisation, to provide metadata that aids the discovery of your data, and to publish your datasets together with metadata specific to those data sets.
53 changes: 52 additions & 1 deletion docs/services/s3/tutorial.md
@@ -67,6 +67,56 @@ To read from a public bucket without providing credentials, add the option `--no-sign-request`:

```bash
aws s3 ls s3://<bucketname> --no-sign-request
```

### Examples

Suppose you want to upload selected files from a subdirectory to your S3 bucket:

```bash
aws s3 cp ./mydir s3://mybucket --recursive --exclude "*" \
--include "*.dat"
```

Here, only the `*.dat` files in `mydir` will be uploaded to `s3://mybucket`.

You can check your upload using:

```bash
aws s3 ls --summarize --human-readable --recursive s3://mybucket/
```

You can get help on the options for any command using:

```bash
aws s3 help
```

or for a particular command:

```bash
aws s3 ls help
```

For public S3 buckets, such as those provided by the data publishing service, you can construct a downloadable HTTPS link from an S3 URI. For example, taking:

```text
s3://eidfXXX-my-dataset/mydatafile.csv
```

and applying the following transformation:

```text
https://s3.eidf.ac.uk/eidfXXX-my-dataset/mydatafile.csv
```
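The transformation above can be captured in a small helper; the endpoint `https://s3.eidf.ac.uk` is taken from the example above, and `s3_to_https` is an illustrative function, not part of any EIDF tooling:

```python
def s3_to_https(s3_uri: str, endpoint: str = "https://s3.eidf.ac.uk") -> str:
    """Rewrite an s3:// URI as a public HTTPS download link."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    # the bucket/key path carries over unchanged; only the scheme and host change
    return f"{endpoint}/{s3_uri[len(prefix):]}"

print(s3_to_https("s3://eidfXXX-my-dataset/mydatafile.csv"))
# https://s3.eidf.ac.uk/eidfXXX-my-dataset/mydatafile.csv
```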

You can use your browser to download a particular file. Alternatively, you can use the AWS CLI to download an entire data set:

```bash
aws s3 cp --recursive s3://eidf158-walkingtraveltimemaps/ ./walkingtraveltimemaps \
--no-sign-request
```

will copy the entire contents of the S3 bucket to your `walkingtraveltimemaps` subdirectory. Note that you must use `--no-sign-request` when accessing buckets owned by other people.

## Python using `boto3`

The following examples use the Python library `boto3`.
@@ -158,7 +208,8 @@ Buckets owned by an EIDF project are placed in a tenancy in the EIDF S3 Service.
The project code is a prefix on the bucket name, separated by a colon (`:`), for example `eidfXX1:somebucket`.
Note that some S3 client libraries do not accept bucket names in this format.

Bucket permissions use IAM (Identity and Access Management) policies. You can grant other accounts (within the same project or from other projects) read or write access to your buckets.

For example, to grant permissions to put, get, delete and list objects in bucket `eidfXX1:somebucket` to the account `account2` in project `eidfXX2`:

```json
```
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -76,6 +76,9 @@ nav:
- "Overview": services/s3/index.md
- "Manage": services/s3/manage.md
- "Tutorial": services/s3/tutorial.md
- "Data Publishing":
- "Getting started": services/datapublishing/service.md
- "Your Data Catalogue entry": services/datapublishing/catalogue.md
- "Data Catalogue":
- "Metadata information": services/datacatalogue/metadata.md
#- "Managed File Transfer":