Merge branch 'dev'
koltyakov committed Oct 31, 2023
2 parents b366868 + efd2d0b commit 6f530ff
Showing 10 changed files with 345 additions and 141 deletions.
6 changes: 5 additions & 1 deletion Makefile
@@ -15,4 +15,8 @@ fmt:

.PHONY: build
build:
go build -o bin/cq-source-sharepoint -v
go build -o bin/cq-source-sharepoint -v

.PHONY: package
package:
go run main.go package --docs-dir docs -m @CHANGELOG.md v2.0.0 .
4 changes: 3 additions & 1 deletion README.md
@@ -1,6 +1,8 @@
# cq-source-sharepoint

[CloudQuery](https://github.com/cloudquery/cloudquery) SharePoint Source community plugin. To sync SharePoint data to any database destination.
<!-- ![Downloads](https://img.shields.io/github/downloads/koltyakov/cq-source-sharepoint/total.svg) -->

[CloudQuery](https://github.com/cloudquery/cloudquery) SharePoint Source community plugin.

<p float="left">
<img height="40px" src="./assets/cq.svg" />
146 changes: 146 additions & 0 deletions assets/intro.md
@@ -0,0 +1,146 @@
# Welcome SharePoint to the CloudQuery family

![](./intro/banner.jpg)

## Introduction

[CloudQuery](https://github.com/cloudquery/cloudquery) is a data extraction and transformation tool that allows you to sync your cloud resources to target databases. CloudQuery is an open source, vendor-agnostic tool with a wide range of source and destination [integrations](https://www.cloudquery.io/integrations) delivered via an ecosystem of plugins. CloudQuery currently supports AWS, Azure, GCP, GitHub, GitLab, and more. It's built in the Go programming language and is lightweight, fast, and easy to use. For the last couple of weeks I have been seeing a lot of interest in CloudQuery: it keeps appearing in GitHub trends, sometimes overtaking even the fancy ChatGPT libraries.

![trends](./intro/trends.jpg)

> Yep, I have a filter by Go language, yet without the filter it's still in the top 10 as of today.

At the beginning of the previous week I was wondering how to grab some AWS Step Functions metadata with the tool and was also researching whether I could use Jira as a data source. Unfortunately, Jira is not an option yet, but the maintainers are such good guys: they show transparency and react to questions almost instantly. So I offered my help, not with Jira, but with something I'm really good with - SharePoint.

CloudQuery's policy for introducing new plugins requires a plan for future official or community support of the plugin, and the existence of a stable Go SDK for the integrated system.

After I expressed my willingness to help, the maintainers of the project provided an initial starter version of the plugin for me with unimaginable speed. So I decided to keep up with that pace and push on with the plugin development. Can you imagine, v1.0.0 is already out and available for use in less than a week.

Glad to introduce the first version of the SharePoint plugin for CloudQuery. The plugin is available on [GitHub](https://github.com/koltyakov/cq-source-sharepoint) and can be installed via the CloudQuery CLI.

The current version supports SharePoint Online and SharePoint On-Premise, with Lists and Document Libraries as data sources, and fine-grained configuration of what to sync.

It is not intended to eat an elephant in one bite, so the plugin does not cover all SharePoint features yet, but it's a good start. I'm planning to add more features in the future, such as Managed Metadata, User Profile services, and Search queries, or embedding the SharePoint change API to process incremental syncs in the blink of an eye. So stay tuned.

You can start using SharePoint with CloudQuery today. Let me show you how to get started.

## Getting started

![demo](./demo.gif)

### Prerequisites

- [CloudQuery CLI installed](https://www.cloudquery.io/docs/quickstart)
- SharePoint Online or On-Premise site
- Credentials for the site using one of the [supported auth strategies](https://go.spflow.com/auth/strategies)
- Basic knowledge of the SharePoint REST API (entity and field naming)
- Target destination (PostgreSQL, MySQL, etc.) configured

### Datasource schema

The integration schema is stored in a `.yml` file, e.g. `sharepoint.yml`:

```yaml
kind: source
spec:
  name: "sharepoint"
  registry: "github"
  path: "koltyakov/sharepoint"
  # provide the latest stable version
  # https://github.com/koltyakov/cq-source-sharepoint/releases
  version: "v1.0.0"
  destinations: ["sqlite"] # provide the list of used destinations
  spec:
    # Spec is mandatory
    # This plugin follows the ideology of explicit configuration
    auth:
      strategy: "azurecert"
      creds:
        siteUrl: "https://contoso.sharepoint.com/sites/cloudquery"
        tenantId: "e1990a0a-dcf7-4b71-8b96-2a53c7e323e0"
        clientId: "2a53c7e323e0-e1990a0a-dcf7-4b71-8b96"
        certPath: "/path/to/cert.pfx"
        certPass: "certpass"
    # A map of URIs to the list configuration
    # If no lists are provided, nothing will be fetched
    lists:
      # List or Document library URI - a relative path without a site URL
      # Can be checked in the browser URL (exclude site URL and view page path)
      Lists/ListEntityName:
        # REST's `$select` OData modifier, an array of entity property names
        # Wildcard selectors `*` are intentionally not supported
        # If not provided, only default fields will be fetched (ID, Created, AuthorId, Modified, EditorId)
        select:
          - Title
          - Author/Title
        # REST's `$expand` OData modifier, an array of entity property names
        # When expanding an entity, select the nested entity's property(s)
        # Optional, and in most cases we recommend avoiding it and
        # mapping nested entities to separate tables instead
        expand:
          - Author
        # Optional, an alias for the table name
        # Don't map different lists to the same table - such a scenario is not supported
        alias: "my_table"
      Lists/AnotherList:
        select:
          - Title
```

#### Authentication

The plugin is powered by the [gosip](https://github.com/koltyakov/gosip) library, so its full variety of authentication options is available. Shameless plug: I'll be super happy if you star it on GitHub.

`creds` options are specific to each auth strategy. See more details in [Auth strategies](https://go.spflow.com/auth/strategies).

I always recommend Azure AD (`azurecert`) or Add-In (`addin`) auth for production scenarios with SharePoint Online. Other auth strategies, e.g. `saml` and `device`, are available for testing and development purposes.

SharePoint On-Premise auth is also supported; depending on your farm configuration you can use `ntlm` or `adfs`, to name a few.

#### Lists configuration

The plugin is designed to be flexible and configurable: you choose what to sync and how to sync it. The plugin uses the SharePoint REST API, so you can use the `$select` and `$expand` OData modifiers to configure which fields are fetched.
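
To make the mapping from the `select` and `expand` options to an OData query concrete, here is a minimal Go sketch. It is not the plugin's actual code: the `listQuery` type and `odataQuery` function are hypothetical names, and prepending the default fields to every query is an assumption based on the spec comments.

```go
package main

import (
	"fmt"
	"strings"
)

// listQuery mirrors the `select` and `expand` options of one list entry.
// Hypothetical type, for illustration only.
type listQuery struct {
	Select []string
	Expand []string
}

// defaultFields are the fields the spec comments describe as fetched by default.
var defaultFields = []string{"ID", "Created", "AuthorId", "Modified", "EditorId"}

// odataQuery renders the `$select` and `$expand` parts of a REST query string.
// Assumption: the default fields are always included ahead of the selected ones.
func odataQuery(q listQuery) string {
	fields := append([]string{}, defaultFields...)
	fields = append(fields, q.Select...)

	parts := []string{"$select=" + strings.Join(fields, ",")}
	if len(q.Expand) > 0 {
		parts = append(parts, "$expand="+strings.Join(q.Expand, ","))
	}
	return strings.Join(parts, "&")
}

func main() {
	q := listQuery{
		Select: []string{"Title", "Author/Title"},
		Expand: []string{"Author"},
	}
	fmt.Println(odataQuery(q))
}
```

Note that a nested property such as `Author/Title` only resolves when its parent entity (`Author`) is also listed under `expand`, which is why the two options appear together in the spec example.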

You can also use the `alias` option to give a list's table a name you like in the destination database.
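
Since mapping different lists to the same table is not supported, it can be worth checking a spec for repeated aliases before running a sync. The following Go sketch shows one way to do that; the `duplicateAliases` helper and the plain map of list URI to alias are hypothetical and not part of the plugin.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateAliases returns, in sorted order, every alias that is assigned
// to more than one list URI. Empty aliases (no alias set) are ignored.
// Hypothetical helper, for illustration only.
func duplicateAliases(aliases map[string]string) []string {
	count := map[string]int{}
	for _, alias := range aliases {
		if alias != "" {
			count[alias]++
		}
	}

	var dups []string
	for alias, n := range count {
		if n > 1 {
			dups = append(dups, alias)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	// List URI -> alias, as it might be read from the `lists` section of the spec.
	lists := map[string]string{
		"Lists/ListEntityName": "my_table",
		"Lists/AnotherList":    "my_table", // clashes with the entry above
		"Lists/Third":          "",         // no alias configured
	}
	fmt.Println(duplicateAliases(lists))
}
```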

### Destination configuration

There is a variety of [targets](https://www.cloudquery.io/docs/plugins/destinations/overview) to sync to. I'd recommend starting with something simple like SQLite. You can also use PostgreSQL or Snowflake, but you'll need to have a database configured.

```yaml
# sqlite.yml
kind: destination
spec:
  name: sqlite
  path: cloudquery/sqlite
  version: "v1.3.0"
  spec:
    connection_string: ./db.sql
```

### Run CloudQuery sync

```bash
# With auth environment variables exported
cloudquery sync sharepoint.yml sqlite.yml
```

You should see the following output:

```bash
Loading spec(s) from sharepoint_reg.yml, sqlite.yml
Downloading https://github.com/koltyakov/cq-source-sharepoint/releases/download/v1.0.0/cq-source-sharepoint_darwin_arm64.zip
Downloading 100% |█████████████████████████████████████████████████████████| (5.2/5.2 MB, 10 MB/s)
Starting migration with 5 tables for: sharepoint (v1.0.0) -> [sqlite (v1.3.0)]
Migration completed successfully.
Starting sync for: sharepoint (v1.0.0) -> [sqlite (v1.3.0)]
Sync completed successfully. Resources: 37478, Errors: 0, Panics: 0, Time: 21s
```

That's it! You can now query your SharePoint data in the destination database.

## Conclusion

It is my firm belief that a diversity of tools and integrations only improves a platform. Even though the SharePoint and Microsoft lands have their own set of tools (such as Power BI, Power Automate connectors, and SSIS, to name a few), I believe that CloudQuery can be a great addition to the toolset, one that allows damn cheap, cloud-native and lightweight integrations of SharePoint data. Such tools can also expose SharePoint to a larger and not necessarily SharePoint Pro audience.

I hope you'll find the plugin useful and simple to use. I'm looking forward to your feedback and feature requests. Please feel free to open an issue on [GitHub](https://github.com/koltyakov/cq-source-sharepoint) and share your thoughts and use cases.
Binary file added assets/intro/banner.jpg
Binary file added assets/intro/trends.jpg
2 changes: 1 addition & 1 deletion go.mod
@@ -6,7 +6,7 @@ require (
github.com/AlecAivazis/survey/v2 v2.3.7
github.com/apache/arrow/go/v14 v14.0.0-20231030205031-cb11e44d878f
github.com/brianvoe/gofakeit/v6 v6.24.0
github.com/cloudquery/plugin-sdk/v4 v4.17.0
github.com/cloudquery/plugin-sdk/v4 v4.17.1
github.com/google/uuid v1.4.0
github.com/koltyakov/gosip v0.0.0-20231003001958-007c8072d71c
github.com/koltyakov/gosip-sandbox v0.0.0-20230410140555-1211f873b91c
4 changes: 2 additions & 2 deletions go.sum
@@ -79,8 +79,8 @@ github.com/cloudquery/plugin-pb-go v1.13.1 h1:UR07rJgiExsY6TSDNvSHyaYsZl/QSIK62I
github.com/cloudquery/plugin-pb-go v1.13.1/go.mod h1:dpnHh8INCc+TYrOCHFKmnEFjerTrpIHCJ3u9NRGB2h8=
github.com/cloudquery/plugin-sdk/v2 v2.7.0 h1:hRXsdEiaOxJtsn/wZMFQC9/jPfU1MeMK3KF+gPGqm7U=
github.com/cloudquery/plugin-sdk/v2 v2.7.0/go.mod h1:pAX6ojIW99b/Vg4CkhnsGkRIzNaVEceYMR+Bdit73ug=
github.com/cloudquery/plugin-sdk/v4 v4.17.0 h1:R+6M4Unf+zkhwW6nOvhqRLeNUCF0Cos+CvkieFRNM2A=
github.com/cloudquery/plugin-sdk/v4 v4.17.0/go.mod h1:vGiAHFS2sOodpk1NC8rwuYjRp53oFuEk2mEq4MXgAJc=
github.com/cloudquery/plugin-sdk/v4 v4.17.1 h1:BQkDpWThRfqq5jKld9r7FAwfoXHV3+kMqaWTO+Wr//M=
github.com/cloudquery/plugin-sdk/v4 v4.17.1/go.mod h1:vGiAHFS2sOodpk1NC8rwuYjRp53oFuEk2mEq4MXgAJc=
github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
github.com/creack/pty v1.1.17 h1:QeVUsEDNrLBW4tMgZHvxy18sKtr6VI492kBhUfhDJNI=
147 changes: 60 additions & 87 deletions resources/plugin/client.go
@@ -4,126 +4,99 @@ import (
"context"
"fmt"

"github.com/cloudquery/plugin-sdk/v4/message"
"github.com/cloudquery/plugin-sdk/v4/plugin"
"github.com/cloudquery/plugin-sdk/v4/scheduler"
"github.com/cloudquery/plugin-sdk/v4/schema"
"github.com/cloudquery/plugin-sdk/v4/transformers"
"github.com/koltyakov/cq-source-sharepoint/resources/auth"
"github.com/koltyakov/cq-source-sharepoint/resources/services/ct"
"github.com/koltyakov/cq-source-sharepoint/resources/services/lists"
"github.com/koltyakov/cq-source-sharepoint/resources/services/mmd"
"github.com/koltyakov/cq-source-sharepoint/resources/services/profiles"
"github.com/koltyakov/cq-source-sharepoint/resources/services/search"
"github.com/rs/zerolog"
)

type Client struct {
lists *lists.Lists
mmd *mmd.MMD
profiles *profiles.Profiles
search *search.Search
contentTypes *ct.ContentTypesRollup
logger zerolog.Logger
spec Spec
tables schema.Tables
scheduler *scheduler.Scheduler

options plugin.NewClientOptions

plugin.UnimplementedDestination
}

func NewClient(ctx context.Context, logger zerolog.Logger, cnfg []byte, opts plugin.NewClientOptions) (plugin.Client, error) {
spec, err := getSpec(cnfg)
if err != nil {
return nil, fmt.Errorf("failed to unmarshal spec: %w", err)
func (*Client) ID() string {
return Name
}
func (c *Client) Sync(ctx context.Context, options plugin.SyncOptions, res chan<- message.SyncMessage) error {
if c.options.NoConnection {
return fmt.Errorf("no connection")
}

sp, err := auth.GetSP(spec.Auth)
tt, err := c.Tables(ctx, plugin.TableOptions{
Tables: options.Tables,
SkipTables: options.SkipTables,
SkipDependentTables: options.SkipDependentTables,
})

if err != nil {
return nil, err
return err
}

if _, err := sp.Web().Select("Title").Get(); err != nil {
return nil, fmt.Errorf("failed to connect to SharePoint: %w", err)
}
return c.scheduler.Sync(ctx, c, tt, res, scheduler.WithSyncDeterministicCQID(options.DeterministicCQID))
}

client := &Client{
lists: lists.NewLists(sp, logger),
mmd: mmd.NewMMD(sp, logger),
profiles: profiles.NewProfiles(sp, logger),
search: search.NewSearch(sp, logger),
contentTypes: ct.NewContentTypesRollup(sp, logger),
func (c *Client) Tables(_ context.Context, options plugin.TableOptions) (schema.Tables, error) {
if c.options.NoConnection {
return schema.Tables{}, nil
}

tables, err := client.getTables(spec)
tt, err := c.tables.FilterDfs(options.Tables, options.SkipTables, options.SkipDependentTables)
if err != nil {
return nil, fmt.Errorf("failed to retrieve tables: %w", err)
}

if opts.NoConnection {
return &Plugin{
logger: logger,
tables: tables,
}, nil
return nil, err
}

return &Plugin{
logger: logger,
spec: *spec,
tables: tables,
scheduler: scheduler.NewScheduler(scheduler.WithLogger(logger)),
client: client,
}, nil
return tt, nil
}

func (c *Client) getTables(config *Spec) (schema.Tables, error) {
tables := schema.Tables{}

// Tables from lists config
for listURI, listSpec := range config.Lists {
table, err := c.lists.GetDestTable(listURI, listSpec)
if err != nil {
return nil, fmt.Errorf("failed to get list '%s': %w", listURI, err)
}
tables = append(tables, table)
}
func (*Client) Close(context.Context) error {
// ToDo: Add your client cleanup here
return nil
}

// Tables from mmd config
for terSetID, mmdSpec := range config.MMD {
table, err := c.mmd.GetDestTable(terSetID, mmdSpec)
if err != nil {
return nil, fmt.Errorf("failed to get term set '%s': %w", terSetID, err)
}
tables = append(tables, table)
}
func NewClient(_ context.Context, logger zerolog.Logger, cnfg []byte, opts plugin.NewClientOptions) (plugin.Client, error) {
logger = logger.With().Str("plugin", "sharepoint").Logger()

// Tables from profiles config
if config.Profiles.Enabled {
table, err := c.profiles.GetDestTable(config.Profiles)
if err != nil {
return nil, fmt.Errorf("failed to get profiles: %w", err)
}
tables = append(tables, table)
if opts.NoConnection {
// no spec could be present
return &Client{
logger: logger,
options: opts,
}, nil
}

// Tables from search config
for searchName, searchSpec := range config.Search {
table, err := c.search.GetDestTable(searchName, searchSpec)
if err != nil {
return nil, fmt.Errorf("failed to get search '%s': %w", searchName, err)
}
tables = append(tables, table)
spec, err := getSpec(cnfg)
if err != nil {
return nil, fmt.Errorf("failed to unmarshal spec: %w", err)
}

// Tables from content types config
for ctName, ctSpec := range config.ContentTypes {
table, err := c.contentTypes.GetDestTable(ctName, ctSpec)
if err != nil {
return nil, fmt.Errorf("failed to get content type '%s': %w", ctName, err)
}
tables = append(tables, table)
sp, err := auth.GetSP(spec.Auth)
if err != nil {
return nil, err
}

if err := transformers.TransformTables(tables); err != nil {
return nil, err
if _, err := sp.Web().Select("Title").Get(); err != nil {
return nil, fmt.Errorf("failed to connect to SharePoint: %w", err)
}

for _, table := range tables {
schema.AddCqIDs(table)
tables, err := spec.getTables(sp, logger)
if err != nil {
return nil, fmt.Errorf("failed to retrieve tables: %w", err)
}

return tables, nil
return &Client{
logger: logger,
spec: *spec,
tables: tables,
scheduler: scheduler.NewScheduler(scheduler.WithLogger(logger)),
options: opts,
}, nil
}