TOOLS-2543 MNX Tooling updates (#153)
Reviewed by: Dan McDonald <[email protected]>
bahamat authored Jul 9, 2022
1 parent 6553d4a commit 5a6ab17
Showing 152 changed files with 701 additions and 701 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -226,7 +226,7 @@ formal writing that it has come to represent.)
| predraft | [RFD 180 Linux Compute Node Containers](./rfd/0180/README.md) |
| draft | [RFD 181 Improving Manta Storage Unit Cost (MinIO)](./rfd/0181/README.md) |
| draft | [RFD 182 Altering system pool detection in SmartOS/Triton](./rfd/0182/README.md) |
| predraft | [RFD 183 Triton Volume Replication and Back up](./rfd/0183/README.md) |
| predraft | [RFD 183 Triton Volume Replication and Backup](./rfd/0183/README.md) |

## Contents of an RFD

@@ -406,7 +406,7 @@ then the subject would be `RFD 169 Overlay Networks for Triton`.
In the body, make sure to include a link to the RFD.

If an RFD is in the `predraft` or `draft` state, you should also [open an
issue](https://github.com/joyent/rfd/issues) to allow for additional
issue](https://github.com/TritonDataCenter/rfd/issues) to allow for additional
opportunity for discussion of the RFD. This issue should have the synopsis
that reflects its purpose (e.g. "RFD 169: Discussion") and the body should
explain its intent (e.g. "This issue represents an opportunity for discussion
@@ -418,7 +418,7 @@ points to an issue query for the RFD number. For example:
---
authors: Chewbacca <[email protected]>
state: draft
discussion: https://github.com/joyent/rfd/issues?q="RFD+169"
discussion: https://github.com/TritonDataCenter/rfd/issues?q="RFD+169"
---
```

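The README's convention is that the `discussion` field in an RFD's front matter points to a GitHub issue search for the phrase "RFD <number>". A minimal Python sketch of building that URL for the post-commit repository location, assuming the percent-encoded form used in `prototypes/prototype.md`; `discussion_url` is a hypothetical helper, not part of any RFD tooling:

```python
from urllib.parse import quote_plus

# Base URL of the RFD issue tracker after this commit (hypothetical constant).
RFD_ISSUES = "https://github.com/TritonDataCenter/rfd/issues"


def discussion_url(rfd_number: int) -> str:
    """Build an issue-search URL for the exact phrase "RFD <n>".

    quote_plus() encodes the double quotes as %22 and the space as '+',
    matching the form used in prototypes/prototype.md.
    """
    query = quote_plus('"RFD {}"'.format(rfd_number))
    return "{}?q={}".format(RFD_ISSUES, query)


print(discussion_url(169))
```

The README example writes the quotes literally (`q="RFD+169"`); browsers percent-encode them to `%22` when sending the request, so both forms should reach the same search.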
2 changes: 1 addition & 1 deletion prototypes/prototype.md
@@ -1,7 +1,7 @@
---
authors: Han Solo <[email protected]>, Frodo Baggins <[email protected]>
state: predraft
discussion: https://github.com/joyent/rfd/issues?q=%22RFD+<Number>%22
discussion: https://github.com/TritonDataCenter/rfd/issues?q=%22RFD+<Number>%22
---

<!--
2 changes: 1 addition & 1 deletion rfd/0001/README.md
@@ -641,7 +641,7 @@ in this document, with some small changes.
* The removal hysteresis function has not yet been implemented.
* The `triton.cns.services` tag has had its syntax enhanced to support
additional metadata, such as a port number for generating SRV records.
See [the relevant documentation](https://github.com/joyent/triton-cns/blob/master/docs/metadata.md) for details.
See [the relevant documentation](https://github.com/TritonDataCenter/triton-cns/blob/master/docs/metadata.md) for details.
* Support for generating SRV records was added.
* Some support for custom PTR record generation was added, but has not been
fully documented and deployed as yet.
2 changes: 1 addition & 1 deletion rfd/0002/README.md
@@ -261,7 +261,7 @@ take the docker logger code and use that. This way:

This now lives at:

https://github.com/joyent/sdc-dockerlogger
https://github.com/TritonDataCenter/sdc-dockerlogger

and does in fact work to write logs to syslog, fluentd and gelf targets.

4 changes: 2 additions & 2 deletions rfd/0007/README.md
@@ -57,7 +57,7 @@ Today, SmartOS ships a basic link layer discover protocol (LLDP) daemon
openlldp. The `lldpneighbors` command is useful today for being consumed
by an operator; however, it is not something which we can easily
consume from other programs. As part of what's been discussed in [RFD
6](https://github.com/joyent/rfd/tree/master/rfd/0006) we'd like to
6](https://github.com/TritonDataCenter/rfd/tree/master/rfd/0006) we'd like to
start consuming this information in a more programmatic way and allowing
it to help us get a sense of the data center or use it to better
understand the impact of failures.
@@ -228,7 +228,7 @@ This project will deliver the following components:

This work is expected to build upon part of the information that we
discussed in [RFD
6](https://github.com/joyent/rfd/tree/master/rfd/0006). It will be used
6](https://github.com/TritonDataCenter/rfd/tree/master/rfd/0006). It will be used
as the foundation of the Ethernet RAS related sections and the
combination of the datalink state and the LLDP information will be used
to augment and form the base of a series of new datalink entries in the
4 changes: 2 additions & 2 deletions rfd/0010/README.md
@@ -7,7 +7,7 @@ state: publish

## Overview

With [RFD 0002](https://github.com/joyent/rfd/tree/master/rfd/0002), we've added
With [RFD 0002](https://github.com/TritonDataCenter/rfd/tree/master/rfd/0002), we've added
support for docker logging modes which for the most part leave log files in the:

```
@@ -28,7 +28,7 @@ most recent log. In order to deal with these problems, we wanted to instead
rotate the logs to Manta so that customers could access all their logs (using
any Manta tools) and so that these logs will not fill up their container.

Per [RFD 0002](https://github.com/joyent/rfd/tree/master/rfd/0002),
Per [RFD 0002](https://github.com/TritonDataCenter/rfd/tree/master/rfd/0002),
these will be the main source of "reliable" logs, as such we would also like to
get them into Manta in a timely manner.

6 changes: 3 additions & 3 deletions rfd/0011/README.md
@@ -286,9 +286,9 @@ how to properly boot.
[OS-4802]: https://smartos.org/bugview/OS-4802
[FWAPI-212]: https://smartos.org/bugview/FWAPI-212
[FWAPI-225]: https://smartos.org/bugview/FWAPI-225
[NAPI-308]: https://devhub.joyent.com/jira/browse/NAPI-308
[NAPI-395]: https://devhub.joyent.com/jira/browse/NAPI-395
[NAPI-414]: https://devhub.joyent.com/jira/browse/NAPI-414
[NAPI-308]: https://mnx.atlassian.net/browse/NAPI-308
[NAPI-395]: https://mnx.atlassian.net/browse/NAPI-395
[NAPI-414]: https://mnx.atlassian.net/browse/NAPI-414
[ZAPI-598]: https://smartos.org/bugview/ZAPI-598

<!-- RFCs -->
10 changes: 5 additions & 5 deletions rfd/0012/README.md
@@ -5,10 +5,10 @@ state: draft

# RFD 12 Bedtime for node-smartdc

[node-smartdc](https://github.com/joyent/node-smartdc) is Joyent's current
[node-smartdc](https://github.com/TritonDataCenter/node-smartdc) is Joyent's current
and venerable CLI for CloudAPI. It is pretty basic (UX-wise) and we want to
replace it with the more useful and usable
[node-triton](https://github.com/joyent/node-triton). That means we need full
[node-triton](https://github.com/TritonDataCenter/node-triton). That means we need full
coverage of CloudAPI (along with rosetta stone docs, and general user docs,
etc.). This RFD is about nailing down the work to get there.

@@ -164,7 +164,7 @@ Notes:
'Account Config'. Cloudapi rev to rename them? Endpoints can otherwise be the
same.
- RenameMachine: This is async and don't yet have a '-w,--wait' on it.
<https://github.com/joyent/node-triton/issues/146> for that.
<https://github.com/TritonDataCenter/node-triton/issues/146> for that.
- UpdateMachineMetadata: '-a' for "add". *Is* this really about *adding*
metadata keys? I.e. excluding a key doesn't delete it? But it *does* allow
overwrite. Note that AddMachineTags does NOT allow overwrite. IOW, slight
@@ -173,13 +173,13 @@ Notes:
This is to attempt to align the differing semantics of this and
`UpdateMachineMetadata`.
- NICS: See sdc-nics from node-smartdc for some inspiration.
<https://devhub.joyent.com/jira/browse/PUBAPI-1292> for this.
<https://mnx.atlassian.net/browse/PUBAPI-1292> for this.


## RBAC

Herein the plan and discussion for RBAC support. See
<https://github.com/joyent/node-triton/issues/54> for implementation.
<https://github.com/TritonDataCenter/node-triton/issues/54> for implementation.

triton rbac ...

6 changes: 3 additions & 3 deletions rfd/0013/README.md
@@ -281,7 +281,7 @@ feel for the hierarchy and rules for these things:
- Projects have membership: an *account and a role*. The role defines the
access level.
- Roles are basically the same as now: A role has a set of polices, which have a
list of [aperture](https://github.com/joyent/node-aperture) rules like "CAN
list of [aperture](https://github.com/TritonDataCenter/node-aperture) rules like "CAN
CreateInstance" to define what actions (called "RBAC Actions") can be
performed.

@@ -443,7 +443,7 @@ role has a set of *policies*. A policy looks like this:
},

`CAN createmachine` is a **rule** (defined by the
[aperture](https://github.com/joyent/node-aperture) policy language).
[aperture](https://github.com/TritonDataCenter/node-aperture) policy language).
`createmachine` is an example of an RBAC **action**. In RBAC v1 the RBAC actions
(mostly) map one-to-one to CloudAPI and Muskie endpoint names.

@@ -720,5 +720,5 @@ currently because the docker request payload to vmapi is missing the
```

The caller info for docker operations are returned as "operator" in
[MachineAudit](https://github.com/joyent/sdc-cloudapi/blob/master/lib/audit.js#L74-L81).
[MachineAudit](https://github.com/TritonDataCenter/sdc-cloudapi/blob/master/lib/audit.js#L74-L81).
This needs to be fixed as part of the RBAC feature.
20 changes: 10 additions & 10 deletions rfd/0016/README.md
@@ -42,11 +42,11 @@ affected.
## Background

The jobs described above are all driven hourly or daily by the "ops" zone, which
is built from the [manta-mola](https://github.com/joyent/manta-mola) repository,
is built from the [manta-mola](https://github.com/TritonDataCenter/manta-mola) repository,
with metering implemented by the
[manta-mackerel](https://github.com/joyent/manta-mackerel) by reference. The
[manta-mackerel](https://github.com/TritonDataCenter/manta-mackerel) by reference. The
various jobs and their relationships are described in the [documentation in that
repository](https://github.com/joyent/manta-mola/blob/6f3b46703d9c906ee76ae884755acd377c815b1f/docs/index.md).
repository](https://github.com/TritonDataCenter/manta-mola/blob/6f3b46703d9c906ee76ae884755acd377c815b1f/docs/index.md).

The jobs are much more interdependent than it might seem. The daily process
broadly works like this:
@@ -74,7 +74,7 @@ All of these jobs are driven by cron, and each job knows nothing about its
logical dependencies. As a result, correctness depends critically on all jobs
completing on time. The discrete jobs and their schedules are described in the
["System
Crons"](https://github.com/joyent/manta-mola/blob/6f3b46703d9c906ee76ae884755acd377c815b1f/docs/system-crons.md)
Crons"](https://github.com/TritonDataCenter/manta-mola/blob/6f3b46703d9c906ee76ae884755acd377c815b1f/docs/system-crons.md)
documentation.

It's probably worth reading through the Mola documentation to better understand
@@ -91,31 +91,31 @@ These are listed in rough order of impact today:
takes too long or experiences an error, then all of the other jobs fail or
produce bad data. If the daily metering jobs fail, the subsequent summary
jobs also fail. See
[MANTA-2531](https://devhub.joyent.com/jira/browse/MANTA-2531).
[MANTA-2531](https://mnx.atlassian.net/browse/MANTA-2531).
2. **Error handling, particularly for metering jobs, is not clear.** Metering
jobs can experience a variety of issues of varying severity. They mostly
exit non-zero when these happen, causing the job to produce an error. But
the issue often only affects that one entry, and at most one user. It's not
clear to subsequent stages (e.g., summary jobs and monitoring systems)
whether the output of that job is valid. When debugging them, it's not easy
to identify the issues. See
[MANTA-2759](https://devhub.joyent.com/jira/browse/MANTA-2759) and
[MANTA-2756](https://devhub.joyent.com/jira/browse/MANTA-2756).
[MANTA-2759](https://mnx.atlassian.net/browse/MANTA-2759) and
[MANTA-2756](https://mnx.atlassian.net/browse/MANTA-2756).
3. **The results of each job are not very observable.** Questions that are
impossible to answer include: how many objects were scanned? How many users?
How many objects were processed normally? How many experienced a non-fatal
error? How many fatal errors were experienced?
4. **The jobs themselves are not very observable.** This applies to all of
these jobs. It's not easy to look at the last N days' worth of each kind of
job and see what happened. See
[MANTA-2593](https://devhub.joyent.com/jira/browse/MANTA-2593).
[MANTA-2593](https://mnx.atlassian.net/browse/MANTA-2593).
5. There are some unproven concerns about **scalability of some of the metering
processes**. In various cases in the past, we've observed metering processes
running out of memory without an obvious leak, where the process itself was
just attempting to keep track of an enormous amount of data (e.g., metadata
for each object owned by a given account). We can conceivably address this
by tuning up the number of reducers, but that only goes so far. See
[MANTA-2780](https://devhub.joyent.com/jira/browse/MANTA-2780).
[MANTA-2780](https://mnx.atlassian.net/browse/MANTA-2780).
6. **Many of the tasks used by these jobs are very long-running.** Manta jobs
were designed around tasks that would complete in a few seconds to a few
minutes. When very large tasks are used, jobs get less parallelization,
@@ -128,7 +128,7 @@ Solutions to these problems can be grouped into a few broad categories:

* Items (1) (the brittle execution pipeline) and (4) (recent job observability)
can be addressed using a system like
[Chronos](https://github.com/joyent/chronos), possibly coupled with triggers,
[Chronos](https://github.com/TritonDataCenter/chronos), possibly coupled with triggers,
to manage job execution and dependencies.
* Items (2) (error handling) and (3) (error reporting) likely require
considerable re-work of the bodies of the metering jobs. Each task should
4 changes: 2 additions & 2 deletions rfd/0017/README.md
@@ -334,7 +334,7 @@ Broadly, we'd break this up into:
supported, but expensive. Users would configure queries they'd like to be
answer quickly, and dashboards and reports that are made up of these queries.
The historical part of EVAPI is embodied today as a command-line tool called
[Dragnet](http://github.com/joyent/dragnet).
[Dragnet](http://github.com/TritonDataCenter/dragnet).
* Integration into a portal and alarming system.

There are prototype end user docs and design docs for both IAPI and EVAPI:
@@ -350,7 +350,7 @@ At this time, it's not expected that we'll necessarily tackle much of this
project, but in order to alleviate the critical monitoring problems we have
today, we'd suggest implementing pieces in this order:

* [Dragnet](http://github.com/joyent/dragnet), a system for historical analysis
* [Dragnet](http://github.com/TritonDataCenter/dragnet), a system for historical analysis
of data stored in Manta. This is largely functional today, but with very
limited support for data formats and queries.
* A Node.js library for reporting metrics that are automatically uploaded to
2 changes: 1 addition & 1 deletion rfd/0017/data-design.md
@@ -137,7 +137,7 @@ See "Implementation details" below.

### Use case 1: plugging in a custom data source

We use [pgstatsmon](https://github.com/joyent/pgstatsmon) as a representative
We use [pgstatsmon](https://github.com/TritonDataCenter/pgstatsmon) as a representative
example of an agent that users may already be using that uses a totally custom
instrumentation mechanism and reports it in the widely-used statsd format. We
assume that our data service already has a way to ingest statsd data over the
10 changes: 5 additions & 5 deletions rfd/0018/README.md
@@ -157,8 +157,8 @@ networks when it comes to specifying networks in these network arguments.

## Tickets

* [DOCKER-502](https://devhub.joyent.com/jira/browse/DOCKER-502) -- adding support for selecting packages
* [DOCKER-585](https://devhub.joyent.com/jira/browse/DOCKER-585) -- adding support for selecting networks
* [DOCKER-897](https://devhub.joyent.com/jira/browse/DOCKER-897) -- expanding support to non-fabric networks
* [DOCKER-936](https://devhub.joyent.com/jira/browse/DOCKER-936) -- expanding support to multiple networks
* [DOCKER-1020](https://devhub.joyent.com/jira/browse/DOCKER-1020) -- adding support for selecting network for exposed ports
* [DOCKER-502](https://mnx.atlassian.net/browse/DOCKER-502) -- adding support for selecting packages
* [DOCKER-585](https://mnx.atlassian.net/browse/DOCKER-585) -- adding support for selecting networks
* [DOCKER-897](https://mnx.atlassian.net/browse/DOCKER-897) -- expanding support to non-fabric networks
* [DOCKER-936](https://mnx.atlassian.net/browse/DOCKER-936) -- expanding support to multiple networks
* [DOCKER-1020](https://mnx.atlassian.net/browse/DOCKER-1020) -- adding support for selecting network for exposed ports
4 changes: 2 additions & 2 deletions rfd/0020/README.md
@@ -15,7 +15,7 @@ state: draft

# RFD 20 Manta Slop-Aware Zone Scheduling

Tickets: [MANTA-2801](https://devhub.joyent.com/jira/browse/MANTA-2801)
Tickets: [MANTA-2801](https://mnx.atlassian.net/browse/MANTA-2801)

## Problem summary (for background, see below)

@@ -69,7 +69,7 @@ There are several points where a scheduling decision is made:

The details of the algorithm are explained in the [Big Theory Statement for the
Marlin
agent](https://github.com/joyent/manta-marlin/blob/3203685ae50c9f8941e9c05c721f5c36b50e602e/agent/lib/agent/agent.js#L27-L183).
agent](https://github.com/TritonDataCenter/manta-marlin/blob/3203685ae50c9f8941e9c05c721f5c36b50e602e/agent/lib/agent/agent.js#L27-L183).
That explanation describes the competing design goals (maximizing resource
utilization while maintaining fairness), how our approach achieves that, and
several examples worked out to show how it works.
2 changes: 1 addition & 1 deletion rfd/0021/README.md
@@ -222,5 +222,5 @@ _latest_ version of the schema repo(s) at all times).
<!-- Link -->

[connected component]: https://en.wikipedia.org/wiki/Connected_component_(graph_theory)
[RFD 58]: https://github.com/joyent/rfd/tree/master/rfd/0058
[RFD 58]: https://github.com/TritonDataCenter/rfd/tree/master/rfd/0058
[NAPI-327]: https://smartos.org/bugview/NAPI-327
4 changes: 2 additions & 2 deletions rfd/0022/README.md
@@ -154,5 +154,5 @@ chunk out the implementation. Here are the top ones to consider:

## Related Tickets

- [PUBAPI-1201](https://devhub.joyent.com/jira/browse/PUBAPI-1201) -- Expose error details for failed requests
- [PORTAL-1530](https://devhub.joyent.com/jira/browse/PORTAL-1530) -- A way to assist user to handle errors
- [PUBAPI-1201](https://mnx.atlassian.net/browse/PUBAPI-1201) -- Expose error details for failed requests
- [PORTAL-1530](https://mnx.atlassian.net/browse/PORTAL-1530) -- A way to assist user to handle errors
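
Apart from the README title fix ("Back up" to "Backup"), every hunk shown above applies one of two link substitutions: `github.com/joyent/` becomes `github.com/TritonDataCenter/` (the original `http` or `https` scheme is kept), and `https://devhub.joyent.com/jira/browse/` becomes `https://mnx.atlassian.net/browse/`. `smartos.org/bugview` links are left unchanged. The Python sketch below reproduces that mapping with regular expressions; the rewrite table and `rewrite_links` are hypothetical and inferred from the diff, not the tool actually used for TOOLS-2543.

```python
import re

# Hypothetical rewrite table inferred from the hunks in this commit.
# The two patterns do not overlap, so their order does not matter.
REWRITES = [
    # http(s)://github.com/joyent/<repo> -> same scheme, TritonDataCenter org
    (re.compile(r"(https?://github\.com/)joyent/"), r"\g<1>TritonDataCenter/"),
    # devhub Jira ticket links -> MNX Atlassian Jira ticket links
    (re.compile(r"https://devhub\.joyent\.com/jira/browse/"),
     "https://mnx.atlassian.net/browse/"),
]


def rewrite_links(text: str) -> str:
    """Apply every substitution in REWRITES to text and return the result."""
    for pattern, replacement in REWRITES:
        text = pattern.sub(replacement, text)
    return text


sample = ("[Dragnet](http://github.com/joyent/dragnet), "
          "[MANTA-2531](https://devhub.joyent.com/jira/browse/MANTA-2531)")
print(rewrite_links(sample))
```

Matching on the URL prefix rather than the bare word `joyent` leaves prose such as "Joyent's current" untouched, which is consistent with the RFD 12 hunk.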