## CMX V4.1.2
   - fixed error reporting in cm/cmx info artifact if artifact not found
   - added "cmx get repo" or "cmx get repo {repo alias}" or "cm get repo" (mlcommons#1405)
   - added support for PAT in "cmx pull repo {url}" (mlcommons#1381)
gfursin committed Feb 18, 2025
1 parent 73ff733 commit 4501247
Showing 10 changed files with 380 additions and 12 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -23,15 +23,16 @@ It includes the following sub-projects.

### Collective Mind project (MLCommons CM)

The [Collective Mind automation framework (CM)](https://github.com/mlcommons/ck/tree/master/cmind)
The [Collective Mind automation framework (CM)](https://github.com/mlcommons/ck/tree/master/cm)
was developed to support open science and facilitate
collaborative, reproducible, and reusable research, development,
and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data).

It helps users non-intrusively convert their software projects
into file-based repositories of portable and reusable artifacts
(code, data, models, scripts) with extensible metadata,
a unified command-line interface, and a simple Python API.
(code, data, models, scripts) with extensible metadata
and reusable automations, a unified command-line interface,
and a simple Python API.

Such artifacts can be easily chained together into portable
and technology-agnostic automation workflows, enabling users to
7 changes: 6 additions & 1 deletion cm/CHANGES.md
@@ -1,9 +1,14 @@
## CMX V4.1.2
- fixed error reporting in cm/cmx info artifact if artifact not found
- added "cmx get repo" or "cmx get repo {repo alias}" or "cm get repo" (#1405)
- added support for PAT in "cmx pull repo {url}" (#1381)

## CMX V4.1.1
- fixed legacy interfaces

## CMX V4.1.0
- added -v flag to print version
- improve help
- improved help
- added support for legacy CM front-end for MLPerf (mlc, mlcr, mlcflow)

## CMX V4.0.2
2 changes: 1 addition & 1 deletion cm/README.CMX.md
@@ -22,7 +22,7 @@ and the [Collective Knowledge concept](https://learning.acm.org/techtalks/reprod
It helps users non-intrusively convert their software projects,
directories, and Git(Hub) repositories into file-based repositories
of portable and reusable artifacts (code, data, models, scripts)
with extensible metadata, a unified command-line interface,
with extensible metadata, reusable automations, a unified command-line interface,
and a simple Python API.

Such artifacts can be easily chained together into portable and technology-agnostic automation workflows,
2 changes: 1 addition & 1 deletion cm/cmind/__init__.py
@@ -9,7 +9,7 @@
# White paper: https://arxiv.org/abs/2406.16791
# Project contributors: https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md

__version__ = "4.1.1"
__version__ = "4.1.2"

from cmind.core import access
from cmind.core import x
6 changes: 5 additions & 1 deletion cm/cmind/automation.py
@@ -1198,7 +1198,11 @@ def info(self, i):

        lst = r['list']
        if len(lst)==0:
            return {'return':16, 'error':'artifact not found: {}'.format(i)}
            import json
            import copy
            x = copy.deepcopy(i)
            if 'control' in x: del x['control']
            return {'return':16, 'error':'artifact not found for the CM/CMX input:\n{}'.format(json.dumps(x, indent=2))}

        cid1 = ''
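
The new error path in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the framework's code: the function name `artifact_not_found` and the sample input dictionary are hypothetical. It shows the three steps the added lines take: copy the input, drop the `control` key, and pretty-print the rest as JSON inside the error string.

```python
import copy
import json


def artifact_not_found(i):
    """Build a CM-style error dict for a missing artifact (hypothetical helper)."""
    # Work on a deep copy so the caller's input dict is left untouched.
    x = copy.deepcopy(i)
    # Drop the 'control' section before printing, as the commit does.
    x.pop('control', None)
    return {'return': 16,
            'error': 'artifact not found for the CM/CMX input:\n{}'.format(
                json.dumps(x, indent=2))}


# Hypothetical input, for illustration only.
query = {'action': 'info', 'automation': 'repo', 'artifact': 'missing',
         'control': {'out': 'con'}}
result = artifact_not_found(query)
print(result['error'])
```

Copying before deleting matters here because the same input dictionary may still be used by the caller after the error is returned.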

125 changes: 122 additions & 3 deletions cm/cmind/repo/automation/repo/module.py
@@ -90,8 +90,6 @@ def pull(self, i):

url += alias.replace('@', '/')

if pat != '' and url.startswith('https://'):
url = url[:8]+pat+'@'+url[8:]
else:
if alias == '':
# Get alias from URL
@@ -117,6 +115,18 @@ def pull(self, i):
if j1 >= 0:
alias = alias[j1+1:].replace('/', '@')

        if pat != '' and url != '' and url.startswith('https://'):
            patx = pat
            urlx = url[8:]

            j = urlx.find('@')
            if j > 0:
                username = urlx[:j]
                patx = username + ':' + pat
                urlx = urlx[j+1:]

            url = url[:8] + patx + '@' + urlx

        if url == '':
            pull_repos = []
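
The PAT handling added in this hunk can be read as a small pure function. The sketch below mirrors that logic under stated assumptions: the helper name `add_pat_to_url` is hypothetical, and `TOKEN` is a placeholder rather than a real personal access token. If the HTTPS URL already carries a username before `@`, the token is appended as its password; otherwise the token alone is used as the credential.

```python
def add_pat_to_url(url, pat):
    """Return url with the PAT embedded as HTTPS credentials (hypothetical helper)."""
    prefix = 'https://'
    # Only HTTPS URLs are rewritten; anything else is returned unchanged.
    if pat == '' or not url.startswith(prefix):
        return url

    rest = url[len(prefix):]
    credentials = pat

    j = rest.find('@')
    if j > 0:
        # The URL already has a username: keep it and use the PAT as password.
        credentials = rest[:j] + ':' + pat
        rest = rest[j + 1:]

    return prefix + credentials + '@' + rest


print(add_pat_to_url('https://github.com/mlcommons/ck', 'TOKEN'))
print(add_pat_to_url('https://user@github.com/mlcommons/ck', 'TOKEN'))
```

Moving the check after alias resolution, as the commit does, means the same rewriting applies whether the URL was given directly or derived from an alias.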

@@ -1238,6 +1248,115 @@ def reindex(self, i):

return {'return': 0, 'self_time': t2}

############################################################

    def get(self, i):
        """
        Get Git info with branch to help pull it on another machine
        Args:
          (CM input dict)
          (verbose) (bool): If True, print index
        Returns:
          (CM return dict):
          * return (int): return code == 0 if no error and >0 if error
          * (error) (str): error string if return>0
        """

        verbose = i.get('verbose', False)

        console = i.get('out') == 'con'

        artifact = i.get('artifact', '').strip()

        if artifact == '' or artifact == '.':
            r = self.cmind.access({'action': 'detect',
                                   'automation': self.meta['alias']+','+self.meta['uid']})
            if r['return'] > 0:
                return r

            repo_meta = r['meta']

            path = r['path_to_repo']

        else:
            r = self.cmind.access({'action': 'load',
                                   'automation': self.meta['alias']+','+self.meta['uid'],
                                   'artifact':artifact})
            if r['return'] > 0:
                return r

            repo_meta = r['meta']

            path = r['path']

        is_git = repo_meta.get('git', False)
        repo_alias = repo_meta.get('alias', '')

        # Go to this path
        rr = {'return':0, 'path': path, 'meta': repo_meta, 'is_git': is_git}

        if is_git:
            # Go to repo directory
            url = ''
            branch = ''

            import subprocess

            cur_dir = os.getcwd()

            os.chdir(path)

            try:
                url = subprocess.check_output(
                    'git config --get remote.origin.url', shell=True).decode("utf-8").strip()
            except subprocess.CalledProcessError as e:
                url = ''

            try:
                branch = subprocess.check_output(
                    'git rev-parse --abbrev-ref HEAD', shell=True).decode("utf-8").strip()
            except subprocess.CalledProcessError as e:
                branch = ''

            if url != '':
                rr['url'] = url
            if branch != '':
                rr['branch'] = branch

            url2 = ''
            if url.startswith('https://'):
                url2 = 'git@' + url[8:]

                j = url2.find('/')
                if j>0:
                    url2 = url2[:j] + ':' + url2[j+1:]

            cmd1 = f'cmx pull repo {url}'
            cmd2 = f'cmx pull repo {url2}' if url2 != '' else cmd1

            if branch != '':
                cmd1 += f' --branch={branch}'
                cmd2 += f' --branch={branch}'


            rr['cmd'] = cmd1
            rr['cmd2'] = cmd2

            if console:
                print ('You may retrieve this repository using the following CMX command:')
                print ('')

                print (cmd1)
                if cmd2 != cmd1:
                    print (cmd2)

        return rr
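
The command-building part of `get` can be followed more easily as a standalone function. The sketch below is an illustration under assumptions: the helper name `build_pull_commands` is hypothetical, and the URL and branch are sample values. It reproduces how the method derives an SSH-style URL from an HTTPS one (replace the scheme with `git@` and the first `/` with `:`) and appends `--branch` when a branch is known.

```python
def build_pull_commands(url, branch=''):
    """Return (cmd1, cmd2): pull commands for the given URL and its SSH form."""
    url2 = ''
    if url.startswith('https://'):
        # https://host/org/repo -> git@host/org/repo
        url2 = 'git@' + url[8:]

        # Replace the first '/' (after the host) with ':' -> git@host:org/repo
        j = url2.find('/')
        if j > 0:
            url2 = url2[:j] + ':' + url2[j + 1:]

    cmd1 = f'cmx pull repo {url}'
    # Fall back to the first command when no SSH form could be derived.
    cmd2 = f'cmx pull repo {url2}' if url2 != '' else cmd1

    if branch != '':
        cmd1 += f' --branch={branch}'
        cmd2 += f' --branch={branch}'

    return cmd1, cmd2


cmd1, cmd2 = build_pull_commands('https://github.com/mlcommons/ck', 'master')
print(cmd1)
print(cmd2)
```

When the remote is already in SSH form, both commands are identical, which is why the method prints the second one only if it differs from the first.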


##############################################################################
def convert_ck_dir_to_cm(rpath):
@@ -1325,7 +1444,7 @@ def convert_ck_dir_to_cm(rpath):

return {'return': 0}


#############################################################
def print_warnings(warnings):

if len(warnings) > 0:
4 changes: 2 additions & 2 deletions cmx/README.md
@@ -1,7 +1,7 @@
# Collective Mind eXtension aka Common Metadata eXchange (CMX)

We are developing the extension to the MLCommons Collective Mind automation framework
called [Common Metadata eXchange (CMX)](https://github.com/mlcommons/ck/tree/master/cmx)
called [Collective Mind eXtension or Common Metadata eXchange (CMX)](https://github.com/mlcommons/ck/tree/master/cmx)
to support open science and facilitate
collaborative, reproducible, and reusable research, development,
and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data)
@@ -10,7 +10,7 @@ and the [Collective Knowledge concept](https://learning.acm.org/techtalks/reprod
It helps users non-intrusively convert their software projects
into file-based repositories of portable and reusable artifacts
(code, data, models, scripts) with extensible metadata,
a unified command-line interface, and a simple Python API.
reusable automations, a unified command-line interface, and a simple Python API.

Such artifacts can be easily chained together into portable and technology-agnostic automation workflows,
enabling users to rerun, reproduce, and reuse complex experimental setups across diverse and rapidly
@@ -0,0 +1,35 @@
---
hide:
- toc
---

# Text Summarization using LLAMA2-70b

## Dataset

The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.

=== "Validation"
    LLAMA2-70b validation run uses the Open ORCA dataset.

    ### Get Validation Dataset
    ```
    mlcr get,dataset,openorca,validation -j
    ```

## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.

Get the Official MLPerf LLAMA2-70b Model

=== "Pytorch"

    ### Pytorch
    ```
    mlcr get,ml-model,llama2-70b,_pytorch -j
    ```

!!! tip

    Downloading llama2-70B model from Hugging Face will prompt you to enter the Hugging Face username and password. Please note that the password required is the [**access token**](https://huggingface.co/settings/tokens) generated for your account. Additionally, ensure that your account has access to the [llama2-70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) model.
