Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Initial spec conversion to Sphinx #16540

Open
wants to merge 3 commits into
base: sycl
Choose a base branch
from

Conversation

gmlueck
Copy link
Contributor

@gmlueck gmlueck commented Jan 7, 2025

The upstream clang project uses reStructuredText and Sphinx for its
documentation, so it makes sense to use these doc tools also for the
SYCL extension specifications. Converting these documents to Sphinx
should remove one of the barriers to upstreaming our SYCL extensions.

This PR converts only a few specifications in order to get feedback on
the style and document structure.

I also took this as an opportunity to update the template we use for
writing new specifications. We've gained a lot of experience since the
template was first written, and these updates demonstrate our new
preferred style for writing specifications. I've also expanded the
template to show sample formatting for common scenarios that occur when
writing SYCL extensions. For example, there is a sample section showing
an extension that adds member functions to an existing class, a sample
section showing an extension that adds new non-member functions, etc.

The upstream clang project uses reStructuredText and Sphinx for its
documentation, so it makes sense to use these doc tools also for the
SYCL extension specifications.  Converting these documents to Sphinx
should remove one of the barriers to upstreaming our SYCL extensions.

This PR converts only a few specifications in order to get feedback on
the style and document structure.

I also took this as an opportunity to update the template we use for
writing new specifications.  We've gained a lot of experience since the
template was first written, and these updates demonstrate our new
preferred style for writing specifications.  I've also expanded the
template to show sample formatting for common scenarios that occur when
writing SYCL extensions.  For example, there is a sample section showing
an extension that adds member functions to an existing class, a sample
section showing an extension that adds new non-member functions, etc.
Come up with a strategy that allows cross-files references to work when
the specifications are viewed either from GitHub or from the Sphinx
generated HTML.  See comments in "conf.py" for details.
@gmlueck gmlueck marked this pull request as ready for review January 7, 2025 17:04
@gmlueck gmlueck requested review from a team as code owners January 7, 2025 17:04
@gmlueck
Copy link
Contributor Author

gmlueck commented Jan 7, 2025

Here are a few specific questions I'd like feedback on:

  • Any comments in general about using reStructuredText instead of Asciidoc?

  • I intentionally omitted the copyright notice from the new specification documents because I think "Copyright Intel. All rights reserved" would be a problem for upstreaming these documents. Other Sphinx documentation in the llvm project does not seem to have any copyright notice, so I think it would be consistent for our documentation also to have no copyright notice. Note that the other Sphinx documentation does generate a copyright notice in the footer of the generated HTML ("Copyright LLVM Project"). I have not done this as part of this PR, but it probably should be done.

  • Many of the specification documents refer to the implementation as "DPC++". I'm not sure what to do about this. Should these be changed to say that the implementation is "clang"?

  • Should the name of an extension be in lower case or upper case? Previously, we always referred to extensions in lower case (e.g. "sycl_ext_oneapi_kernel_compiler". However, I noticed that the official SYCL KHR extension names are in upper case (e.g. "SYCL_KHR_DEFAULT_CONTEXT"). I don't think this was a conscious decision, but I'd like to be consistent. This PR uses upper case for the newly formatted extensions, but I'm not sure I like that better. Opinions on upper vs. lower case?

  • I added a TOC at the top of each specification, which is consistent with other Sphinx documentation in the llvm project. I like it. Is anyone opposed?

Also, picking out a couple specific people for feedback:

@Pennycook: I'd appreciate your opinion on the formatting style for the .rst versions of the extensions and also on the guidelines in the updated "template.rst".

@bader: I'd appreciate your opinion on the tooling changes I made in this PR (e.g. "conf.py").

Of course, comments are welcome from everyone also!

:align: left

=========== ========== =================
Device Type Backend(s) Number of Domains
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pennycook: Is "Number of Domains" the right title for this column? This is what you had in the original Asciidoc version, but it seems like "Number of Compute Units" might be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. You're right; I based this extension on the abandoned "execution domains" extension, and missed this.

Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any comments in general about using reStructuredText instead of Asciidoc?

IIRC, we choose Asciidoc to make it easier to integrate these documents into SYCL spec. Don't we need to preserve the Asciidoc format?

The more general question: Is the LLVM repository the right place to host SYCL specification extensions? Would it be more appropriate to host them in Khronos organization as we do for SPIR-V extensions?

# ways:
#
# * From the HTML that is generated from Sphinx, or
# * By using a web browser to navigate to the .rst file in the repo.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this option?
LLVM and Clang documentation uses :doc: syntax and I guess all users rely on HTML documentation for links to work.

You suggest we use different links style for writing SYCL documentation to enable cross-document links for GitHub rendered documents. I'm not sure how we can catch the use of wrong notation.

Can we ignore the GitHub rendering issue and follow LLVM/Clang link notation (i.e. use https://intel.github.io/llvm/ rendered documentation)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is extremely common for us to point people to the GitHub documentation for extensions. We do this all the time in email and internal trackers too. Does https://intel.github.io/llvm/ get updated automatically to reflect the top of the "sycl" (main) branch in the repo? If so, I guess we could get in the habit of pointing people there instead.

One problem could occur if we need to point people to the documentation in a particular branch of the repo. Maybe this is uncommon enough that it doesn't matter, though.

On the other hand, using the alternate link syntax seems easy to do. Is it really so offensive? It would be nice to be able to continue pointing people to the documentation in the repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does https://intel.github.io/llvm/ get updated automatically to reflect the top of the "sycl" (main) branch in the repo?

Yes. We update it every night and each time anything changes in clang/docs/ or sycl/doc/ directories. Exact rules:

schedule:
- cron: 0 1 * * *
pull_request:
branches:
- sycl
paths:
- '.github/workflows/sycl-docs.yml'
- 'clang/docs/**'
- 'sycl/doc/**'
push:
branches:
- sycl
paths:
- '.github/workflows/sycl-docs.yml'
- 'clang/docs/**'
- 'sycl/doc/**'

I don't know if using alternative style for SYCL documentation references is a real issue for the project, but is using Asciidoc format is a real blocker for upstreaming SYCL?
MLIR project managed to get accepted although they use Markdown format.
To be fair, LLVM and Clang sub-project also use Markdown alongside reStructuredText format, but MLIR rendered version has completely different look than LLVM/Clang documentation as they use Hugo framework to generate website instead of Sphinx.

Copy link
Contributor Author

@gmlueck gmlueck Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if using alternative style for SYCL documentation references is a real issue for the project, but is using Asciidoc format is a real blocker for upstreaming SYCL?

I'm not sure. When I talked to @AaronBallman, he seemed to think that converting to RST/Sphinx was important. I'm not sure if it's an absolute requirement, though.

I'd be OK with any of these options, but they all have some sort of down-side:

Option 1

  • Upstream only the "supported" and KHR extensions (not the "experimental" ones)
  • Move the "supported" specifications to the Khronos repo and use Asciidoc
  • Keep the "experimental" and "proposed" extensions only in intel/llvm and use Asciidoc for their specs
  • Con: Someone needs to refactor the code to remove all the "experimental" extensions before upstreaming

Option 2

  • Upstream only the KHR extensions (not the "supported" or "experimental" ones)
  • Keep the "supported", "experimental", and "proposed" extensions only in intel/llvm and use Asciidoc for their specs
  • Con: Someone needs to refactor the code to remove all the "supported" and "experimental" extensions before upstreaming

Option 3

  • Upstream the "supported", "experimental", and KHR extensions
  • Upstream the "supported" and "experimental" extension specs also and convert them to RST/Sphinx
  • Con: Someone needs to reformat the extension specs into RST/Sphinx; We need to convert them back to Asciidoc later in order to make them into KHRs.

Option 4

  • Upstream the "supported", "experimental", and KHR extensions
  • Upstream the "supported" and "experimental" extension specs also but leave them in Asciidoc format
  • Con: Unclear if clang community will accept this

To give some idea of the magnitude of the work, there are over 100 extensions ("supported", "experimental", and "proposed" combined), and their specifications total over 40K lines of text.

@bader, have you given any thought to what we will do with all the other SYCL documents in "sycl/doc" that are not extension specifications? Will we convert this to RST/Sphinx? Will the generated HTML be hosted someplace, or will the files just live in the repo?

EDIT: The total count of extension specifications includes "supported", "experimental", and "proposed". A previous version of this comment omitted the word "supported" from that list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bader, have you given any thought to what we will do with all the other SYCL documents in "sycl/doc" that are not extension specifications? Will we convert this to RST/Sphinx?

I didn't plan any changes in sycl/doc for the upstreaming. From day one, we use Markdown format for SYCL documentation to be upstream-ready. LLVM sub-projects use Markdown in addition to RST (even more extensively after moving to GitHub to be more friendly for first-time/irregular contributors).

Will the generated HTML be hosted someplace, or will the files just live in the repo?

I expect documentation for SYCL library will be in HTML format and will be hosted as at llvm.org as other sub-projects. See https://releases.llvm.org/ for examples. To make this happen we must upstream the minimal version of the working implementation fist.

Tagging @sergey-semenov, in case he has other plans. According to my understanding @sergey-semenov is working on upstreaming everything in sycl directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! That's an interesting point -- at some point, we're going to upstream the library stuff at which point we will basically need a separate website (same as many of the other top-level projects in the monorepo).

Yes, all of the extensions I'm talking about are extensions to the SYCL library. None of them are extensions to the clang compiler proper. I don't suppose that changes your opinion on using Asciidoc? :-)

My thinking is that if the experimental extension is implemented/being implemented in upstream, then we'd want documentation upstream that explains it

That makes sense, and it's aligned with my thinking also.

Really? The website sure reads to me like you can take any format they support as input and output RST (because of ↔︎ = conversion from and to in the docs).

That's not how I read the website. The line for Asciidoc has a right-pointing arrow, which the legend defines as "conversion to". If they supported conversion both ways, I think that line would have a double-headed arrow. In addition, the release notes make several mentions of "asciidoc writer", but there is no mention of "asciidoc reader". For comparison, the release notes mention both "rst writer" and "rst reader".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment for now ... I'm not sure if this is a question to @sergey-semenov or to the group as a whole. How do we expect the upstreamed SYCL documentation to be organized? Looking at the LLVM download page, it seems like there are 7 top-level "entry points" for the documentation:

  • llvm
  • clang
  • lld
  • clang-extra
  • libc++
  • poly
  • flang

Are we thinking that SYCL would be an 8th top-level entry point? Or, would SYCL support be part of "clang"? For comparison, OpenMP documentation is under "clang", so it seems (to me) that it would make sense for SYCL documentation to be there also.

In order for this to work, would we move the SYCL documentation files in the repo to "clang/docs" instead of "sycl/doc"? Or, would we change the CMake rules to build the "clang" documentation from both the "clang/docs" and "sycl/doc" directories?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OpenMP provides compiler pragmas to access features, so it makes sense to have OpenMP documentation as part of the compiler documentation.
SYCL provides a library API, so I think adding another top-level entry fits better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all of the extensions I'm talking about are extensions to the SYCL library. None of them are extensions to the clang compiler proper. I don't suppose that changes your opinion on using Asciidoc? :-)

It does -- for the SYCL library documentation, I think MLIR is a reasonable precedent to point to for not using RST because it will be its own top-level project in the monorepo and has different needs. I was opposed to mixing and matching within Clang.

That's not how I read the website. The line for Asciidoc has a right-pointing arrow, which the legend defines as "conversion to".

Ah, you may be right! Drat.

SYCL provides a library API, so I think adding another top-level entry fits better.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! It sounds like we may be converging on Option 4, then:

Option 4

  • Upstream the "supported", "experimental", and KHR extensions
  • Upstream the "supported" and "experimental" extension specs also but leave them in Asciidoc format

Is anyone opposed to this?

Still to do, then, is to add build rules to generate HTML from the Asciidoc files and integrate this into the top-level SYCL website.

Comment on lines +17 to +20
.. toctree::
:maxdepth: 1

supported/sycl_ext_oneapi_free_function_queries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until now, we've directed people to GitHub to read our extensions. This syntax doesn't render correctly on GitHub, producing this:

image

I'm used to finding and reading extensions on GitHub, so I don't really like this. If we did decide to switch everything over to RST, I think we'd need to make sure we do a good job of highlighting somewhere that people should check the online documentation instead. I see pros and cons to this: only the stable documentation would be visible on the web, which is good; but it would be harder for people to read through extensions that are implemented in open-source but not yet reflected in the HTML documentation.

Comment on lines +258 to +259
* *Preconditions:* This paragraph tell the conditions that the application must
obey when calling the function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* *Preconditions:* This paragraph tell the conditions that the application must
obey when calling the function.
* *Preconditions:* This paragraph lists the conditions that the application must
obey when calling the function.

Comment on lines +262 to +270
* *Effects:* Tells what the function does.

* *Returns:* Tells what value the function returns.

* *Throws:* Tells what exceptions the function is required to throw and the
circumstances under which they are thrown.

* *Remarks:* Tells other information about the function if it is not covered by
the previous paragraphs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Something about "tells" feels off to me. I'd prefer a word like "Lists" but I can't really explain why.

several items.
When this happens, use a bulleted list.

Only include a paragraphs if its term is needed to specify the function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Only include a paragraphs if its term is needed to specify the function.
Only include a paragraph if its term is needed to specify the function.

The main section titled "Specification" (and its subsections) are the normative
part of the specification document.
All other sections (e.g. "Overview", "Examples", etc.) are non-normative.
In addition, examples are non-normative even if they appears inside the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In addition, examples are non-normative even if they appears inside the
In addition, examples are non-normative even if they appear inside the

"Specification" section.
This applies to example code and also to sentences that describe an example.
(Such sentences often start with "For example, ...".)
Note that the code synopses are not examples, and therefore they are normative.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth noting here (or nearby) that there's a precedent in normative synopses for:

  • Unspecified types, etc, and the format we use to describe those; and
  • "Exposition only" functions and members, which may be included to improve readability but should not be implemented

@Pennycook
Copy link
Contributor

  • Any comments in general about using reStructuredText instead of Asciidoc?

I don't really like it. I've highlighted one instance in the extension you've already migrated to reStructuredText where things don't render properly. I think we'll find more (e.g., I know the forward progress extension uses a Mermaid diagram, with the expectation that things will be rendered on GitHub).

I already struggle to remember whether to use backticks or [code]+ blocks when migrating between DPC++ extensions and KHRs, and adding another syntax is going to introduce another source of confusion. I agree with @bader that we should think carefully about where extensions should go. One suggestion that might make things easier for everybody:

  • SYCL extensions are defined in the SYCL-Docs repository, using the same (AsciiDoc) syntax as the SYCL specification (but rendered separately). This would significantly simplify the process of incorporating extensions to SYCL, and would ensure all SYCL extensions were documented in one place.
  • The compiler documentation (in reStructuredText) would be limited to: 1) a list of published SYCL extensions that are supported (which may include extensions from multiple vendors); 2) any limitations in the compiler's handling of an extension (e.g., backend support); 3) any changes to the compiler itself required to interact with certain extensions (e.g., compiler options); and 4) any changes to the compiler's actual language parsing (e.g., new attributes, new keywords, etc).

I would also suggest that we split out extension specifications (for SYCL-Docs) from implementation notes (for the compiler), and make a pass to ensure we don't talk about DPC++/clang in the extension specifications. I think this would be cleaner in the long run, and would remove the need for us to upstream extensions in their entirety to Clang.

  • I intentionally omitted the copyright notice from the new specification documents because I think "Copyright Intel. All rights reserved" would be a problem for upstreaming these documents. Other Sphinx documentation in the llvm project does not seem to have any copyright notice, so I think it would be consistent for our documentation also to have no copyright notice. Note that the other Sphinx documentation does generate a copyright notice in the footer of the generated HTML ("Copyright LLVM Project"). I have not done this as part of this PR, but it probably should be done.

Splitting things as suggested above could also give us another option here: we could retain the copyright notice on extensions uploaded to SYCL-Docs, but drop the copyright notice from documentation of compiler options, etc.

  • Many of the specification documents refer to the implementation as "DPC++". I'm not sure what to do about this. Should these be changed to say that the implementation is "clang"?

I'm hopeful that splitting things out would allow us to minimize mention of specific compilers. But if we did need to add some limitation/information for a specific implementation, we could do that in the reStructuredText part that we intend to upstream. I think we'd want to refer to "DPC++" if a feature hasn't been upstreamed, and replace it with "clang" as part of the upstreaming process.

  • Should the name of an extension be in lower case or upper case? Previously, we always referred to extensions in lower case (e.g. "sycl_ext_oneapi_kernel_compiler". However, I noticed that the official SYCL KHR extension names are in upper case (e.g. "SYCL_KHR_DEFAULT_CONTEXT"). I don't think this was a conscious decision, but I'd like to be consistent. This PR uses upper case for the newly formatted extensions, but I'm not sure I like that better. Opinions on upper vs. lower case?

I personally prefer lower case, and I think we should probably fix the SYCL KHRs. There's precedent for lower case in other Khronos extensions (e.g., OpenCL), and it would be helpful to differentiate between the name of an extension (lower case) and the associated macro (upper case). If the extension and macro name are identical, the preprocessor might prevent us from using the extension name in certain contexts.

  • I added a TOC at the top of each specification, which is consistent with other Sphinx documentation in the llvm project. I like it. Is anyone opposed?

I agree it's nice. If we stick with AsciiDoc, we could add :toc: at the start of each extension to accomplish the same thing.

@gmlueck
Copy link
Contributor Author

gmlueck commented Jan 8, 2025

I agree with @bader that we should think carefully about where extensions should go.

We discussed this in one of our Friday meetings before the break. My recollection is that we concluded that moving the "experimental" extension specifications to a separate repo was problematic. These extensions change frequently, and they are not versioned. If we moved them to another repo, we would have constant problems with the specification being out-of-sync with the implementation. By keeping them together in the same repo, we can easily keep the specification and implementation in sync.

I completely agree that KHR extension specifications should be stored in the Khronos repo, and the llvm repo should just have a list of these KHR's that we support. I suppose we could do the same for the "supported" extensions (which are versioned). However, the majority of our extensions are "experimental", so this doesn't really solve our problem.

I've highlighted one instance in the extension you've already migrated to reStructuredText where things don't render properly.

I think you are referring to the comment you made about "doc/extensions/index.rst"? This file didn't exist before this PR, and I wasn't thinking that people would view it via GitHub. If we think that it should be rendered well in GitHub, then I think it could be restructured. Note that even before this PR, we had "doc/index.rst", which also does not render well in GitHub.

@Pennycook
Copy link
Contributor

We discussed this in one of our Friday meetings before the break. My recollection is that we concluded that moving the "experimental" extension specifications to a separate repo was problematic. These extensions change frequently, and they are not versioned. If we moved them to another repo, we would have constant problems with the specification being out-of-sync with the implementation. By keeping them together in the same repo, we can easily keep the specification and implementation in sync.

Yes, I agree with that.

I completely agree that KHR extension specifications should be stored in the Khronos repo, and the llvm repo should just have a list of these KHR's that we support. I suppose we could do the same for the "supported" extensions (which are versioned). However, the majority of our extensions are "experimental", so this doesn't really solve our problem.

But "experimental" extensions are never going to be upstreamed as-is, right? Maybe this gives us an opportunity to separate things out more clearly. "supported" extensions that other vendors are invited to implement in their implementations would be stored in one place, and things we're experimenting with would be stored somewhere else.

I appreciate this could make some more work for us, so I don't feel strongly about this. I'm just pointing out there might be some advantages.

I think you are referring to the comment you made about "doc/extensions/index.rst"? This file didn't exist before this PR, and I wasn't thinking that people would view it via GitHub. If we think that it should be rendered well in GitHub, then I think it could be restructured. Note that even before this PR, we had "doc/index.rst", which also does not render well in GitHub.

I think it would be nice if it rendered well, but if that wasn't your intent then I don't think it's required. It strikes me as slightly odd to pick and choose which aspects of the documentation are supposed to be visible on GitHub, but since GitHub itself provides an index of all the extensions I don't want to spend too much time on this.

I'm more concerned about cases like this, where we're reliant on GitHub-specific rendering. This might be the only instance -- and we could swap it out for an image made in InkScape or something -- but again, I'm just pointing out that there may be cases where we would have to do more than just change some text formatting.

@gmlueck
Copy link
Contributor Author

gmlueck commented Jan 9, 2025

But "experimental" extensions are never going to be upstreamed as-is, right?

The current thinking is that we would upstream the "experimental" extensions in their current form, and that we will refine them in the upstream repo over time. The group felt this would be acceptable by the clang community as long as the extensions were clearly identified as "experimental", which they are already. The main advantage of this approach is that it avoids the need to refactor the code to remove these extensions before upstreaming.

I do agree that it could be an option to revisit this decision. I've listed various options above in another comment which I think could be acceptable solutions. Unfortunately, they all have some sort of down-side.

I'm more concerned about cases like this, where we're reliant on GitHub-specific rendering.

I hadn't thought of this use of Mermaid until you pointed it out. There are a couple uses of Mermaid also in sycl_ext_oneapi_graph, but they are relatively small diagrams. I agree we'd need to do something here, but it could be as simple as drawing some ascii-art.

@Pennycook
Copy link
Contributor

I do agree that it could be an option to revisit this decision. I've listed various options above in another comment which I think could be acceptable solutions. Unfortunately, they all have some sort of down-side.

I'll respond above so things can be threaded as part of the same conversation.

I hadn't thought of this use of Mermaid until you pointed it out. There are a couple uses of Mermaid also in sycl_ext_oneapi_graph, but they are relatively small diagrams. I agree we'd need to do something here, but it could be as simple as drawing some ascii-art.

Yeah, I agree it might not be too hard, just wanted to point it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants