Site indexation (discoverability of content) is an important SEO impact area: content must be crawlable to produce SEO results, and XML sitemaps are a good centralized way to enable that. We also need an HTML sitemap to manage site structure, clean up content, etc.
We need an XML and HTML sitemap generator for Adobe.com, HelpX, Acrobat, and other sites currently on the Dexter Platform, consolidating URLs from all AEM versions.
That will help the SEO and Authoring teams see all currently published pages, analyze the site structure, and familiarize themselves with all the pages on the site.
For the XML format needed for SEO purposes, we need: http://adobe-consulting-services.github.io/acs-aem-commons/features/simple-sitemap.html
Please also provide the capability to add a URL manually in case a page is not hosted on AEM.
We need an XML format for SEO and an HTML format for Design
HTML format – production publish instance URLs
URLs outside of AEM 6.0 need to be manually added to the list
Need to figure out a way to show only public URLs, rather than full-path URLs, in /sitemap.xml
Can the sitemaps be split by geo? – Yes: http://www.adobe.com/robots.txt
Current sitemap URLs on Adobe.com:
https://www.stage.adobe.com/content/acom/us/en.sitemap.html?allowfullpath=true
http://www.stage.adobe.com/content/acom/us/en.sitemap.xml?allowfullpath=true
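For context, per-geo sitemaps are typically advertised through Sitemap: directives in robots.txt (which is what the adobe.com robots.txt above does). A minimal sketch, with illustrative paths rather than the actual adobe.com entries:

```
User-agent: *
Sitemap: https://www.adobe.com/ca/sitemap.xml
Sitemap: https://www.adobe.com/uk/sitemap.xml
```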
Need to validate that we can generate standard XML sitemap files with AEM that improve site indexation.
Requirements:
Auto-generation of the XML sitemap files (a sketch illustrating several of these requirements follows this list)
One sitemap file per geo - www.adobe.com/ca/sitemap.xml, www.adobe.com/uk/sitemap.xml
Auto-publish new URLs into the sitemap file (approximately within 1 hour, the cache refresh time).
Auto-removal of non-canonical URLs (3XX and 404 URLs should be wiped from the XML sitemaps).
Provide a way for authors to override the page URL value that shows up in the sitemap, instead of the absolute path.
For example, the URL shown for the home page should be www.env.adobe.com instead of www.env.adobe.com/index.html.
For such pages, the current workaround is to modify the XML manually; it would be nice to have a field in which authors could specify the URL that should appear in the XML.
Separate implementation of the "Remove from sitemap" checkbox for the HTML and XML sitemaps
Automatically exclude non-HTML URLs: https://www.adobe.com/1, https://www.adobe.com/1/creative-2015-07-20-mascha
Enforce removal of pages with meta robots noindex, e.g. http://www.adobe.com/confirmation.html, https://www.adobe.com/search.html
Include rewrite paths per the canonical tag, e.g. http://www.adobe.com/leaders.html (per the canonical) rather than http://www.adobe.com/about-adobe/leaders.html (the actual resolving URL)
List URLs in alphabetical order
Possible to verify DNS to exclude pages that 404 on the live site? e.g. https://www.adobe.com/qa_test_020.html
The generated sitemap should be served over HTTP but should also be able to be generated over HTTPS.
The generated sitemap should also take into account floodgated content that is available to visitors.
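A minimal TypeScript sketch of how several of these requirements could fit together: per-geo splitting, the author/canonical URL override, noindex and non-HTML exclusion, and alphabetical ordering. All names and the /xx/ geo-path convention are illustrative assumptions, not the actual Dexter/AEM implementation.

```ts
// Hypothetical sketch of the per-geo XML sitemap generation described above.
interface PageRecord {
  path: string;                   // e.g. "/ca/products/photoshop.html"
  lastModified: Date;             // page update/activation time
  canonicalUrl?: string;          // author override / canonical tag value
  noindex?: boolean;              // page has meta robots noindex
  removeFromXmlSitemap?: boolean; // per-format "Remove from sitemap" flag
}

const HOST = "https://www.adobe.com"; // assumed production publish host

// Exclude noindex pages, explicitly removed pages, and non-HTML resources.
function isIncluded(page: PageRecord): boolean {
  if (page.noindex || page.removeFromXmlSitemap) return false;
  return page.path.endsWith(".html") || page.path.endsWith("/");
}

// Prefer the author/canonical override; otherwise use the public path,
// dropping a trailing /index.html (the home-page case above).
function publicUrl(page: PageRecord): string {
  if (page.canonicalUrl) return page.canonicalUrl;
  return HOST + page.path.replace(/\/index\.html$/, "/");
}

// Derive the geo from the first path segment, defaulting to "us".
function geoOf(page: PageRecord): string {
  const m = page.path.match(/^\/([a-z]{2})\//);
  return m ? m[1] : "us";
}

function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build one <urlset> per geo, with URLs listed in alphabetical order.
function buildSitemaps(pages: PageRecord[]): Map<string, string> {
  const byGeo = new Map<string, PageRecord[]>();
  for (const page of pages.filter(isIncluded)) {
    const geo = geoOf(page);
    if (!byGeo.has(geo)) byGeo.set(geo, []);
    byGeo.get(geo)!.push(page);
  }
  const sitemaps = new Map<string, string>();
  for (const [geo, geoPages] of byGeo) {
    const entries = geoPages
      .map((p) => ({ loc: publicUrl(p), lastmod: p.lastModified }))
      .sort((a, b) => a.loc.localeCompare(b.loc)) // alphabetical order
      .map((e) =>
        `  <url>\n    <loc>${escapeXml(e.loc)}</loc>\n` +
        `    <lastmod>${e.lastmod.toISOString().slice(0, 10)}</lastmod>\n  </url>`)
      .join("\n");
    sitemaps.set(`/${geo}/sitemap.xml`,
      '<?xml version="1.0" encoding="UTF-8"?>\n' +
      '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
      entries + "\n</urlset>");
  }
  return sitemaps;
}
```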
Acceptance Criteria:
Sitemaps are generated / refreshed on the fly when the page is accessed (see the check sketch after this list).
verify new pages get added to author sitemaps when created
verify pages get updated on author sitemaps when moved or renamed
verify timestamps get properly updated in the XML sitemap when a page is updated/activated (author & publish)
verify pages are added to publish sitemaps when activated and the cache is flushed
verify a name update makes it to publish sitemaps
verify deactivated pages are removed from publish sitemaps
Pages can be excluded from the sitemap via a page property (or a similar mechanism in helix)
verify new pages are included in sitemaps by default
verify that authors can remove pages from the sitemap
verify that a page can be deactivated and removed from the publish sitemap
verify child pages are also removed from the sitemap (config is inherited; overriding inheritance was NOT tested)
verify that the fragments folder can be excluded from the sitemap at the folder level, and that the exclusion is properly inherited
verify that the config is available on the FW, Lobby, Lobby tab, and fragment templates
verify the HTML sitemap has meta: (???)
verify that the sitemaps are Floodgate-aware and also consider floodgated content that is visible to the end user
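Several of these criteria can be checked against a published instance by polling the sitemap itself. A minimal hedged sketch in TypeScript; the polling window mirrors the ~1 hour cache-refresh requirement above, and the function names and timings are assumptions, not the real test harness:

```ts
// Hypothetical acceptance-check helper: fetch a published sitemap and assert
// that a given URL is present (after activation) or absent (after deactivation).
async function sitemapContains(sitemapUrl: string, pageUrl: string): Promise<boolean> {
  const res = await fetch(sitemapUrl);
  if (!res.ok) throw new Error(`Failed to fetch ${sitemapUrl}: ${res.status}`);
  const xml = await res.text();
  return xml.includes(`<loc>${pageUrl}</loc>`);
}

// Example: after activating a page, poll for up to ~1 hour (the cache refresh
// window from the requirements) for it to appear in the geo sitemap.
async function waitForInclusion(sitemapUrl: string, pageUrl: string): Promise<void> {
  const deadline = Date.now() + 60 * 60 * 1000;
  while (Date.now() < deadline) {
    if (await sitemapContains(sitemapUrl, pageUrl)) return;
    await new Promise((r) => setTimeout(r, 60 * 1000)); // retry every minute
  }
  throw new Error(`${pageUrl} never appeared in ${sitemapUrl}`);
}
```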
@dominique-pfister do you see anything missing in the current implementation?
Looking at the list of Requirements above, most are already built in or can be done by using a separate helix-sitemap sheet in the index; the remaining items are not available.
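For reference, a sketch of deriving sitemap entries from such an index sheet, assuming it is exposed as a query-index.json endpoint returning a data array of { path, lastModified } rows; the endpoint shape and field names are assumptions, not a documented helix API:

```ts
// Hypothetical: turn helix index rows into sitemap <url> entries.
interface IndexRow {
  path: string;
  lastModified: string; // assumed epoch seconds, as a string
}

async function entriesFromIndex(indexUrl: string, host: string): Promise<string[]> {
  const res = await fetch(indexUrl);
  const { data } = (await res.json()) as { data: IndexRow[] };
  return data
    .sort((a, b) => a.path.localeCompare(b.path)) // alphabetical order
    .map((row) => {
      const lastmod = new Date(Number(row.lastModified) * 1000)
        .toISOString().slice(0, 10);
      return `<url><loc>${host}${row.path}</loc><lastmod>${lastmod}</lastmod></url>`;
    });
}
```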