Packaging Improvements (weld-project#103)
sppalkia authored Mar 14, 2017
1 parent 5701dce commit 5a22f46
Showing 21 changed files with 229 additions and 200 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -11,3 +11,6 @@ rusty-tags.vi
run
notes
bench.csv
*.dylib
*.so
*.dll
5 changes: 1 addition & 4 deletions Cargo.toml
@@ -1,10 +1,7 @@
[package]
name = "weld"
version = "0.1.0"
authors = ["Matei Zaharia <[email protected]>",
"Shoumik Palkar <[email protected]>",
"Deepak Narayanan <[email protected]>",
"James Thomas <[email protected]>"]
authors = ["Weld Developers <[email protected]>"]
build = "build.rs"

[dependencies]
66 changes: 45 additions & 21 deletions README.md
@@ -1,12 +1,27 @@
# Weld

Weld is a language and runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a common intermediate representation, and optimizing across each framework.

Modern analytics applications combine multiple functions from different libraries and frameworks to build complex workflows. Even though individual functions can achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. Weld addresses this problem by lazily building up a computation for the entire workflow, and then optimizing and evaluating it only when a result is needed.

## Contents

* [Building](#building)
- [MacOS LLVM Installation](#macos-llvm-installation)
- [Ubuntu LLVM Installation](#ubuntu-llvm-installation)
- [Building Weld](#building-weld)
* [Documentation](#documentation)
* [Grizzly](#grizzly)
* [Running an Interactive REPL](#running-an-interactive-repl)
* [Benchmarking](#benchmarking)

## Building

To build Weld, you need [Rust 1.13 or higher](http://rust-lang.org) and [LLVM](http://llvm.org) 3.8.

To install Rust, follow the steps [here](https://rustup.rs). You can verify that Rust was installed correctly on your system by typing `rustc` into your shell.

### MacOS Installation
#### MacOS LLVM Installation

To install LLVM on macOS, first install [brew](https://brew.sh/). Then:

@@ -23,22 +38,7 @@ $ ln -s /usr/local/bin/llvm-config-3.8 /usr/local/bin/llvm-config

To make sure this worked correctly, run `llvm-config --version`. You should see `3.8.x`.

With LLVM and Rust installed, you can build Weld. Clone this repository and build using `cargo`:

```bash
$ git clone https://www.github.com/weld-project/weld
$ cd weld/
$ cargo build
```

Set the `WELD_HOME` environment variable and run tests:

```bash
$ export WELD_HOME=/path/to/weld/directory
$ cargo test
```

### Ubuntu Installation
#### Ubuntu LLVM Installation

To install LLVM on Ubuntu:

@@ -55,21 +55,42 @@ $ ln -s /usr/bin/llvm-config-3.8 /usr/local/bin/llvm-config

To make sure this worked correctly, run `llvm-config --version`. You should see `3.8.x`.

With LLVM and Rust installed, you can build Weld. Clone this repository and build using `cargo`:
#### Building Weld

With LLVM and Rust installed, you can build Weld. Clone this repository, set the `WELD_HOME` environment variable, and build using `cargo`:

```bash
$ git clone https://www.github.com/weld-project/weld
$ cd weld/
$ export WELD_HOME=`pwd`
$ cargo build --release
```

Set the `WELD_HOME` environment variable and run tests:
Weld builds two dynamically linked libraries (`.so` files on Linux and `.dylib` files on macOS): `libweld` and `libweldrt`. Both of these libraries must be on the `LD_LIBRARY_PATH`. By default, the libraries are in `$WELD_HOME/target/release` and `$WELD_HOME/weld_rt/target/release`. Set up the `LD_LIBRARY_PATH` as follows:

```bash
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$WELD_HOME/weld_rt/target/release:$WELD_HOME/target/release
```
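
As a quick sanity check on the paths above, the following Python sketch lists the library files one would expect after a release build and reports any that are missing. The helper names (`expected_weld_libraries`, `missing_weld_libraries`) are hypothetical and not part of Weld, and the file names assume the usual `lib<name>.so` / `lib<name>.dylib` naming convention.

```python
import os
import sys


def expected_weld_libraries(weld_home, platform=sys.platform):
    """Return the library paths a release build is expected to produce.

    Assumption: files are named lib<name>.dylib on macOS and lib<name>.so
    elsewhere, in the default directories named in this README.
    """
    ext = ".dylib" if platform == "darwin" else ".so"
    return [
        os.path.join(weld_home, "target", "release", "libweld" + ext),
        os.path.join(weld_home, "weld_rt", "target", "release", "libweldrt" + ext),
    ]


def missing_weld_libraries(weld_home, platform=sys.platform):
    """Return the expected library paths that do not exist on disk."""
    return [path for path in expected_weld_libraries(weld_home, platform)
            if not os.path.isfile(path)]


if __name__ == "__main__":
    # Report missing libraries for the current WELD_HOME, if it is set.
    home = os.environ.get("WELD_HOME", "")
    for path in missing_weld_libraries(home):
        print("missing: " + path)
```

If the script prints nothing, both libraries were found in their default locations; the dynamic loader still needs those directories on `LD_LIBRARY_PATH`.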

Finally, run the unit and integration tests:

```bash
$ export WELD_HOME=/path/to/weld/directory
$ cargo test
```

## Documentation

The `docs/` directory contains documentation for the different components of Weld.

* [language.md](https://github.com/weld-project/weld/blob/master/docs/language.md) describes the syntax of the Weld IR.
* [api.md](https://github.com/weld-project/weld/blob/master/docs/api.md) describes the low-level C API for interfacing with Weld.
* [python.md](https://github.com/weld-project/weld/blob/master/docs/python.md) gives an overview of the Python API.
* [tutorial.md](https://github.com/weld-project/weld/blob/master/docs/tutorial.md) contains a tutorial for how to build a small vector library using Weld.

## Grizzly

**Grizzly** is a port of the [Pandas](http://pandas.pydata.org/) framework. Details on how to use Grizzly are under `python/grizzly`.

## Running an Interactive REPL

* `cargo test` runs unit and integration tests. A test name substring filter can be used to run a subset of the tests:
@@ -117,5 +138,8 @@ Expression type: vec[i32]
* `cargo bench` runs benchmarks under the `benches/` directory. The results of the benchmarks are written to a file called `benches.csv`. To run specific benchmarks:

```
cargo bench --bench benches -- -t <comma-seperated-benchmarks>
cargo bench [benchmark-name]
```

If a benchmark name is not provided, all benchmarks are run.

17 changes: 17 additions & 0 deletions c/README.md
@@ -0,0 +1,17 @@
# Weld C API

This directory contains the C headers for the Weld API.

To use in a C program:

```C
#include "weld.h"
```

and when building:

```bash
$ clang my_program.c -lweld
```

See the [API documentation](https://github.com/weld-project/weld/blob/master/docs/api.md) for details on the API.
14 changes: 10 additions & 4 deletions api/c/weld.h → c/weld.h
@@ -146,16 +146,22 @@ weld_conf_new();
* @param key the key to look up.
* @return the string value for the key, or NULL if the key does not exist.
*/
const char *
extern "C" weld_conf_get(weld_conf_t, const char *key);
extern "C" const char*
weld_conf_get(weld_conf_t, const char *key);

/** Set a value for a Weld configuration key.
*
* @param key the key
* @param value the value
*/
void
extern "C" weld_conf_get(weld_conf_t, const char *key, const char *value);
extern "C" void
weld_conf_set(weld_conf_t, const char *key, const char *value);

/** Free a Weld configuration.
*
*/
extern "C" void
weld_conf_free(weld_conf_t);

#endif

14 changes: 10 additions & 4 deletions docs/api.md
@@ -204,16 +204,22 @@ weld_conf_new();
* @param key the key to look up.
* @return the string value for the key, or NULL if the key does not exist.
*/
const char *
extern "C" weld_conf_get(weld_conf_t, const char *key);
extern "C" const char *
weld_conf_get(weld_conf_t, const char *key);
/** Set a value for a Weld configuration key.
*
* @param key the key
* @param value the value
*/
void
extern "C" weld_conf_get(weld_conf_t, const char *key, const char *value);
extern "C" void
weld_conf_set(weld_conf_t, const char *key, const char *value);
/** Free a Weld configuration.
*
*/
extern "C" void
weld_conf_free(weld_conf_t);
```

59 changes: 59 additions & 0 deletions docs/python.md
@@ -0,0 +1,59 @@
# Python API

The Python API is found under the `python/` directory. It provides convenient wrapper objects for the low-level C API, as well as utilities for building Weld computations and composing Python libraries.

To use the Python API, set the `PYTHONPATH` environment variable so it can find the `weld` Python package:

```bash
$ export PYTHONPATH=$PYTHONPATH:$WELD_HOME/python
```

You should also follow the setup instructions [here](https://github.com/weld-project/weld/blob/master/README.md) (in particular, build Weld and make sure its dynamic libraries are on the `LD_LIBRARY_PATH`).

### Bindings

The `weld.bindings` module contains bindings for the [C API](https://github.com/weld-project/weld/blob/master/docs/api.md). Each type in the C API is wrapped as a Python object. Methods on the Python objects call the corresponding C API functions.

As an example, the code below creates a new `WeldConf`, sets a value on it, and then gets the value back:

```python
>>> import weld.bindings as bnd
>>> conf = bnd.WeldConf() # calls weld_conf_new()
>>> conf.set("myKey", "myValue") # calls weld_conf_set(...)
>>> print conf.get("myKey") # calls weld_conf_get(...)
"myValue"
```

### WeldObject API

The `WeldObject` API is included in the `weld.weldobject` module, and an example of how to use it is described [here](https://github.com/weld-project/weld/blob/master/docs/tutorial.md).

This API provides an interface for _composing_ Python programs by building a lazily evaluated Weld computation. The `WeldObject` tracks the values it operates over (called "dependencies"), and its `evaluate` method builds and runs a Weld module. Dependencies are added using the `update` method, which requires a Weld type for the dependency being added; types are available in the `weld.types` module.

The table below describes the `WeldObject` API in brief:

Method/Field | Description
------------- | -------------
`update(value, ty)` | Adds `value` (which has Weld type `ty`) as a dependency of the object. Returns a string name which can be used in the object's Weld code to refer to this value.
`evaluate(ty)` | Evaluates the object and returns a value. `ty` is the expected Weld type of the return value.
`weld_code` | A string field representing the Weld IR for this object. This string is modified to register a computation with this object. See [this](https://github.com/weld-project/weld/blob/master/docs/language.md) document for a description of the language.


The general usage pattern for a WeldObject is to initialize it, add some dependencies and Weld code to register a computation, and then evaluate it to get a return value. Here's an example, where we add two numbers:

```python
>>> import weld.weldobject as wo
>>> import weld.encoders as enc
>>> obj = wo.WeldObject(enc.WeldScalarEncoder(), enc.WeldScalarDecoder()) # See more about encoders below
>>> name1 = obj.update(1, WeldI32())
>>> name2 = obj.update(2, WeldI32())
>>> obj.weld_code = name1 + " + " + name2 # Weld IR to add two numbers.
```
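
To illustrate the pattern (not the actual implementation), here is a minimal sketch built around a hypothetical `ToyLazyObject` class. It mimics only the bookkeeping described above: `update` registers a dependency and returns a generated name, and `evaluate` runs the registered code. The generated names (`_inp0`, `_inp1`), the `code` field, and the use of Python's `eval` in place of compiling Weld IR are all assumptions made for illustration.

```python
class ToyLazyObject(object):
    """Hypothetical stand-in for a lazily evaluated object; NOT WeldObject."""

    def __init__(self):
        self.context = {}  # maps generated names to dependency values
        self.code = ""     # expression registered for later evaluation

    def update(self, value):
        # Register a dependency and return the name that refers to it.
        name = "_inp%d" % len(self.context)
        self.context[name] = value
        return name

    def evaluate(self):
        # Nothing is computed until this call. A real implementation would
        # compile and run Weld IR; Python's eval stands in for that here.
        return eval(self.code, {"__builtins__": {}}, dict(self.context))


obj = ToyLazyObject()
a = obj.update(1)
b = obj.update(2)
obj.code = a + " + " + b  # only records the computation
result = obj.evaluate()   # the addition happens here
```

The point of the design is that registering code is cheap and side-effect free, so several libraries can keep extending the expression before anything is executed.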

### Encoders and Decoders

When data is passed into Weld, it must be marshalled into a binary format which Weld understands (these formats are described in the [C API doc](https://github.com/weld-project/weld/blob/master/docs/api.md)). In general, values are formatted using C scalars and structs; Python's `ctypes` module allows constructing these kinds of representations.

To support custom formats, the `WeldObject` API takes an encoder, which allows encoding a Python object as a Weld object, and a decoder, which allows decoding a Weld object into a Python object. These encoders and decoders are interfaces which must be implemented by a library writer.

Weld provides some commonly used encoders and decoders in the `weld.encoders` module. NumPy arrays, for example, are a common way to represent C-style arrays in Python. Weld thus includes `WeldNumPyEncoder` and `WeldNumPyDecoder` classes to marshal 1-dimensional NumPy arrays.
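
As a rough sketch of what marshalling with `ctypes` can look like, the example below encodes a Python list of integers into a C struct and decodes it back. The class names (`VecI32`, `ListEncoder`, `ListDecoder`) are hypothetical and do not reproduce the interfaces in `weld.encoders`. The struct layout (a data pointer followed by a 64-bit length) is an assumption for illustration; the C API doc is the authority on the formats Weld actually expects.

```python
import ctypes


class VecI32(ctypes.Structure):
    # Assumed layout: pointer to the elements, then a 64-bit element count.
    _fields_ = [("ptr", ctypes.POINTER(ctypes.c_int32)),
                ("size", ctypes.c_int64)]


class ListEncoder(object):
    """Hypothetical encoder: Python list of ints -> VecI32."""

    def encode(self, values):
        # Copy the values into a contiguous C array of 32-bit integers.
        buf = (ctypes.c_int32 * len(values))(*values)
        vec = VecI32(ctypes.cast(buf, ctypes.POINTER(ctypes.c_int32)),
                     len(values))
        # Hold an explicit reference so the buffer outlives this call.
        vec._buffer = buf
        return vec


class ListDecoder(object):
    """Hypothetical decoder: VecI32 -> Python list of ints."""

    def decode(self, vec):
        # Read each element back through the pointer.
        return [vec.ptr[i] for i in range(vec.size)]


vec = ListEncoder().encode([1, 2, 3])
roundtrip = ListDecoder().decode(vec)
```

Keeping the buffer referenced from the struct matters in practice: if the Python side drops the array while native code still holds the pointer, the memory may be reclaimed.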
30 changes: 23 additions & 7 deletions easy_ll/src/lib.rs
@@ -115,9 +115,7 @@ pub fn load_library(libname: &str) -> Result<(), LlvmError> {
/// Compile a string of LLVM IR (in human readable format) into a `CompiledModule` that can then
/// be executed. The LLVM IR should contain an entry point function called `run` that takes `i64`
/// and returns `i64`, which will be called by `CompiledModule::run`.
pub fn compile_module(code: &str,
static_lib_file: Option<&str>)
-> Result<CompiledModule, LlvmError> {
pub fn compile_module(code: &str, bc_file: Option<&[u8]>) -> Result<CompiledModule, LlvmError> {
unsafe {
// Initialize LLVM
ONCE.call_once(|| initialize());
@@ -141,10 +139,10 @@ pub fn compile_module(code: &str,
// Parse the IR to get an LLVMModuleRef
let module = try!(parse_module_str(context, code));

if static_lib_file != None {
let merger_module = try!(parse_module_file(context, static_lib_file.unwrap()));
if let Some(s) = bc_file {
let bc_module = try!(parse_module_bytes(context, s));
llvm::linker::LLVMLinkModules(module,
merger_module,
bc_module,
llvm::linker::LLVMLinkerMode::LLVMLinkerDestroySource,
std::ptr::null_mut());
}
@@ -202,6 +200,25 @@ unsafe fn parse_module_helper(context: LLVMContextRef,
Ok(module)
}

/// Parse a buffer of IR bytecode into an `LLVMModuleRef` for the given context.
unsafe fn parse_module_bytes(context: LLVMContextRef,
code: &[u8])
-> Result<LLVMModuleRef, LlvmError> {
// Create an LLVM memory buffer around the code
let code_len = code.len();
let name = try!(CString::new("module"));
let buffer = llvm::core::LLVMCreateMemoryBufferWithMemoryRange(code.as_ptr() as *const i8,
code_len,
name.as_ptr(),
0);

if buffer.is_null() {
return Err(LlvmError::new("LLVMCreateMemoryBufferWithMemoryRange failed"));
}

parse_module_helper(context, buffer)
}

/// Parse a string of IR code into an `LLVMModuleRef` for the given context.
unsafe fn parse_module_str(context: LLVMContextRef,
code: &str)
@@ -264,7 +281,6 @@ unsafe fn check_run_function(module: LLVMModuleRef) -> Result<(), LlvmError> {
let run = CString::new("run").unwrap();
let func = llvm::core::LLVMGetNamedFunction(module, run.as_ptr());
if func.is_null() {
println!("EEEK");
return Err(LlvmError::new("No run function in module"));
}
let c_str = llvm::core::LLVMPrintTypeToString(llvm::core::LLVMTypeOf(func));
12 changes: 7 additions & 5 deletions examples/cpp/add_repl/add_repl.cpp
@@ -10,17 +10,17 @@
int main() {
// Compile Weld module.
weld_error_t e = NULL;
weld_module_t m = weld_module_compile("|x:i64| x+5L", "configuration", &e);
weld_conf_t conf = weld_conf_new();
weld_module_t m = weld_module_compile("|x:i64| x+5L", conf, &e);
weld_conf_free(conf);

if (weld_error_code(e)) {
const char *err = weld_error_message(e);
printf("Error message: %s\n", err);
exit(1);
}

// Create a Weld Object for the argument.

while(true) {

char buf[4096];
char *c;
printf(">>> ");
@@ -40,13 +40,15 @@ int main() {
weld_value_t arg = weld_value_new(&input);

// Run the module and get the result.
weld_value_t result = weld_module_run(m, arg, &e);
weld_conf_t conf = weld_conf_new();
weld_value_t result = weld_module_run(m, conf, arg, &e);
void *result_data = weld_value_data(result);
printf("Answer: %lld\n", *(int64_t *)result_data);

// Free the values.
weld_value_free(result);
weld_value_free(arg);
weld_conf_free(conf);
}

weld_error_free(e);
12 changes: 0 additions & 12 deletions examples/cpp/mem_mgmt/Makefile

This file was deleted.

1 change: 0 additions & 1 deletion examples/cpp/mem_mgmt/README.md

This file was deleted.
