## CMX V4.1.2

- fixed error reporting in cm/cmx info artifact if artifact not found
- added "cmx get repo" or "cmx get repo {repo alias}" or "cm get repo" (mlcommons#1405)
- added support for PAT in "cmx pull repo {url}" (mlcommons#1381)

ctuning · Feb 18, 2025
commit 4501247 · 1 parent 73ff733

Showing 10 changed files with 380 additions and 12 deletions.
diff --git a/README.md b/README.md
@@ -23,15 +23,16 @@ It includes the following sub-projects.
 
 ### Collective Mind project (MLCommons CM)
 
-The [Collective Mind automation framework (CM)](https://github.com/mlcommons/ck/tree/master/cmind)
+The [Collective Mind automation framework (CM)](https://github.com/mlcommons/ck/tree/master/cm)
 was developed to support open science and facilitate
 collaborative, reproducible, and reusable research, development, 
 and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data).
 
 It helps users non-intrusively convert their software projects 
 into file-based repositories of portable and reusable artifacts 
-(code, data, models, scripts) with extensible metadata, 
-a unified command-line interface, and a simple Python API.
+(code, data, models, scripts) with extensible metadata
+and reusable automations, a unified command-line interface, 
+and a simple Python API.
 
 Such artifacts can be easily chained together into portable 
 and technology-agnostic automation workflows, enabling users to 

diff --git a/cm/CHANGES.md b/cm/CHANGES.md
@@ -1,9 +1,14 @@
+## CMX V4.1.2
+   - fixed error reporting in cm/cmx info artifact if artifact not found
+   - added "cmx get repo" or "cmx get repo {repo alias}" or "cm get repo" (#1405)
+   - added support for PAT in "cmx pull repo {url}" (#1381)
+
 ## CMX V4.1.1
    - fixed legacy interfaces
 
 ## CMX V4.1.0
    - added -v flag to print version
-   - improve help
+   - improved help
    - added support for legacy CM front-end for MLPerf (mlc, mlcr, mlcflow)
 
 ## CMX V4.0.2

diff --git a/cm/README.CMX.md b/cm/README.CMX.md
@@ -22,7 +22,7 @@ and the [Collective Knowledge concept](https://learning.acm.org/techtalks/reprod
 It helps users non-intrusively convert their software projects,
 directories, and Git(Hub) repositories into file-based repositories
 of portable and reusable artifacts (code, data, models, scripts) 
-with extensible metadata, a unified command-line interface, 
+with extensible metadata, reusable automations, a unified command-line interface, 
 and a simple Python API.
 
 Such artifacts can be easily chained together into portable and technology-agnostic automation workflows,

diff --git a/cm/cmind/__init__.py b/cm/cmind/__init__.py
@@ -9,7 +9,7 @@
 # White paper: https://arxiv.org/abs/2406.16791
 # Project contributors: https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md
 
-__version__ = "4.1.1"
+__version__ = "4.1.2"
 
 from cmind.core import access
 from cmind.core import x

diff --git a/cm/cmind/automation.py b/cm/cmind/automation.py
@@ -1198,7 +1198,11 @@ def info(self, i):
 
         lst = r['list']
         if len(lst)==0:
-            return {'return':16, 'error':'artifact not found: {}'.format(i)}
+            import json
+            import copy
+            x = copy.deepcopy(i)
+            if 'control' in x: del x['control']
+            return {'return':16, 'error':'artifact not found for the CM/CMX input:\n{}'.format(json.dumps(x, indent=2))}
 
         cid1 = ''
 

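Note on the `info` hunk above: the new error path can be read as a small standalone function. The following is a minimal sketch of that logic, not the framework's code. The helper name `not_found_error` and the sample input are hypothetical; the return code `16`, the removal of the `control` key, and the message text are taken from the hunk.

```python
import copy
import json


def not_found_error(i):
    """Build a CM-style error dict for an artifact that was not found.

    `not_found_error` is a hypothetical name used for illustration;
    the commit inlines this logic in `info`.
    """
    x = copy.deepcopy(i)      # work on a copy so the caller's input is untouched
    x.pop('control', None)    # drop the 'control' key if present, as the hunk does
    return {
        'return': 16,
        'error': 'artifact not found for the CM/CMX input:\n{}'.format(
            json.dumps(x, indent=2)),
    }


if __name__ == '__main__':
    # Hypothetical input dict, for illustration only
    r = not_found_error({'action': 'info', 'automation': 'repo',
                         'artifact': 'missing', 'control': {'out': 'con'}})
    print(r['error'])
```

Dumping the input as indented JSON makes the message readable when the input dict is large, which the previous `format(i)` call did not.
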
diff --git a/cm/cmind/repo/automation/repo/module.py b/cm/cmind/repo/automation/repo/module.py
@@ -90,8 +90,6 @@ def pull(self, i):
 
                 url += alias.replace('@', '/')
 
-                if pat != '' and url.startswith('https://'):
-                    url = url[:8]+pat+'@'+url[8:]
         else:
             if alias == '':
                 # Get alias from URL
@@ -117,6 +115,18 @@ def pull(self, i):
                             if j1 >= 0:
                                 alias = alias[j1+1:].replace('/', '@')
 
+        if pat != '' and url != '' and url.startswith('https://'):
+            patx = pat
+            urlx = url[8:]
+
+            j = urlx.find('@')
+            if j > 0:
+                username = urlx[:j]
+                patx = username + ':' + pat
+                urlx = urlx[j+1:]
+
+            url = url[:8] + patx + '@' + urlx
+
         if url == '':
             pull_repos = []
 
@@ -1238,6 +1248,115 @@ def reindex(self, i):
 
         return {'return': 0, 'self_time': t2}
 
+    ############################################################
+
+    def get(self, i):
+        """
+        Get Git info with branch to help pull it on another machine
+
+        Args:
+            (CM input dict)
+
+            (verbose) (bool): If True, print index
+
+        Returns: 
+            (CM return dict):
+
+            * return (int): return code == 0 if no error and >0 if error
+            * (error) (str): error string if return>0
+
+        """
+
+        verbose = i.get('verbose', False)
+
+        console = i.get('out') == 'con'
+
+        artifact = i.get('artifact', '').strip()
+
+        if artifact == '' or artifact == '.':
+            r = self.cmind.access({'action': 'detect',
+                                   'automation': self.meta['alias']+','+self.meta['uid']})
+            if r['return'] > 0:
+                return r
+
+            repo_meta = r['meta']
+
+            path = r['path_to_repo']
+
+        else:
+            r = self.cmind.access({'action': 'load',
+                                   'automation': self.meta['alias']+','+self.meta['uid'],
+                                   'artifact':artifact})
+            if r['return'] > 0:
+                return r
+
+            repo_meta = r['meta']
+
+            path = r['path']
+
+        is_git = repo_meta.get('git', False)
+        repo_alias = repo_meta.get('alias', '')
+
+        # Go to this path
+        rr = {'return':0, 'path': path, 'meta': repo_meta, 'is_git': is_git}
+
+        if is_git:
+            # Go to repo directory
+            url = ''
+            branch = ''
+
+            import subprocess
+
+            cur_dir = os.getcwd()
+
+            os.chdir(path)
+
+            try:
+                url = subprocess.check_output(
+                    'git config --get remote.origin.url', shell=True).decode("utf-8").strip()
+            except subprocess.CalledProcessError as e:
+                url = ''
+
+            try:
+                branch = subprocess.check_output(
+                    'git rev-parse --abbrev-ref HEAD', shell=True).decode("utf-8").strip()
+            except subprocess.CalledProcessError as e:
+                branch = ''
+
+            if url != '':
+                rr['url'] = url
+                if branch != '':
+                    rr['branch'] = branch
+
+            url2 = ''
+            if url.startswith('https://'):
+                url2 = 'git@' + url[8:]
+
+                j = url2.find('/')
+                if j>0:
+                    url2 = url2[:j] + ':' + url2[j+1:]
+
+            cmd1 = f'cmx pull repo {url}'
+            cmd2 = f'cmx pull repo {url2}' if url2 != '' else cmd1
+
+            if branch != '':
+                cmd1 += f' --branch={branch}'
+                cmd2 += f' --branch={branch}'
+
+
+            rr['cmd'] = cmd1
+            rr['cmd2'] = cmd2
+
+            if console:
+                print ('You may retrieve this repository using the following CMX command:') 
+                print ('')
+
+                print (cmd1)
+                if cmd2 != cmd1:
+                    print (cmd2)
+
+        return rr
+
 
 ##############################################################################
 def convert_ck_dir_to_cm(rpath):
@@ -1325,7 +1444,7 @@ def convert_ck_dir_to_cm(rpath):
 
     return {'return': 0}
 
-
+#############################################################
 def print_warnings(warnings):
 
     if len(warnings) > 0:
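
Note on the `module.py` hunks above: the two URL rewrites introduced in this file (embedding a PAT in `pull`, and deriving an SSH URL in `get`) can be sketched as pure functions. This is a minimal sketch under stated assumptions, not the framework's API. The names `add_pat_to_url` and `https_to_ssh`, the placeholder token, and the example URLs are hypothetical; the string handling follows the diff.

```python
def add_pat_to_url(url, pat):
    """Return `url` with `pat` embedded as HTTPS credentials.

    Hypothetical helper name; mirrors the block added to `pull`.
    Non-HTTPS URLs and empty tokens are returned unchanged.
    """
    prefix = 'https://'
    if pat == '' or not url.startswith(prefix):
        return url

    rest = url[len(prefix):]
    credentials = pat

    j = rest.find('@')
    if j > 0:
        # The URL already carries a user name: keep it and append the token
        credentials = rest[:j] + ':' + pat
        rest = rest[j + 1:]

    return prefix + credentials + '@' + rest


def https_to_ssh(url):
    """Return the SSH form of an HTTPS Git URL, or '' if `url` is not HTTPS.

    Hypothetical helper name; mirrors how `get` builds `url2`.
    """
    prefix = 'https://'
    if not url.startswith(prefix):
        return ''

    ssh = 'git@' + url[len(prefix):]
    j = ssh.find('/')
    if j > 0:
        # The first '/' separates host from path; the SSH form uses ':' there
        ssh = ssh[:j] + ':' + ssh[j + 1:]
    return ssh


if __name__ == '__main__':
    # 'TOKEN' and the URLs below are placeholders, not real credentials
    print(add_pat_to_url('https://github.com/mlcommons/ck', 'TOKEN'))
    print(add_pat_to_url('https://user@github.com/mlcommons/ck', 'TOKEN'))
    print(https_to_ssh('https://github.com/mlcommons/ck'))
```

Keeping both rewrites free of side effects makes them easy to check in isolation, separately from the Git calls that surround them in `pull` and `get`.
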

diff --git a/cmx/README.md b/cmx/README.md
@@ -1,7 +1,7 @@
 # Collective Mind eXtension aka Common Metadata eXchange (CMX)
 
 We are developing the extension to the MLCommons Collective Mind automation framework 
-called [Common Metadata eXchange (CMX)](https://github.com/mlcommons/ck/tree/master/cmx)
+called [Collective Mind eXtension or Common Metadata eXchange (CMX)](https://github.com/mlcommons/ck/tree/master/cmx)
 to support open science and facilitate
 collaborative, reproducible, and reusable research, development, 
 and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data)
@@ -10,7 +10,7 @@ and the [Collective Knowledge concept](https://learning.acm.org/techtalks/reprod
 It helps users non-intrusively convert their software projects 
 into file-based repositories of portable and reusable artifacts 
 (code, data, models, scripts) with extensible metadata, 
-a unified command-line interface, and a simple Python API.
+reusable automations, a unified command-line interface, and a simple Python API.
 
 Such artifacts can be easily chained together into portable and technology-agnostic automation workflows,
 enabling users to  rerun, reproduce, and reuse complex experimental setups across diverse and rapidly 

diff --git a/cmx/mlperf-inference/v5.0/benchmarks/language/get-llama2-70b-data.md b/cmx/mlperf-inference/v5.0/benchmarks/language/get-llama2-70b-data.md
@@ -0,0 +1,35 @@
+---
+hide:
+  - toc
+---
+
+# Text Summarization using LLAMA2-70b
+
+## Dataset
+
+The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
+
+=== "Validation"
+    LLAMA2-70b validation run uses the Open ORCA dataset.
+
+    ### Get Validation Dataset
+    ```
+    mlcr get,dataset,openorca,validation -j
+    ```
+
+## Model
+The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.
+
+Get the Official MLPerf LLAMA2-70b Model
+
+=== "Pytorch"
+
+    ### Pytorch
+    ```
+    mlcr get,ml-model,llama2-70b,_pytorch -j
+    ```
+
+!!! tip
+
+    Downloading llama2-70B model from Hugging Face will prompt you to enter the Hugging Face username and password. Please note that the password required is the [**access token**](https://huggingface.co/settings/tokens) generated for your account. Additionally, ensure that your account has access to the [llama2-70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) model.
+