Release 1.3

We are proud to announce release 1.3 of Data Workspaces. This release includes three new features:

  1. Support for git-lfs to manage large files within Git repositories
  2. Improved snapshot history reports as a part of the Jupyter integration
  3. Support for exporting a resource from one workspace and importing it (with lineage) into one or more consumer workspaces

Git-lfs Support

It can be nice to manage your golden source data in a Git repository. Unfortunately, due to its architecture and focus as a source code tracking system, Git can have significant performance issues with large files. Furthermore, hosting services like GitHub place limits on the size of individual files and on commit sizes (for example, GitHub rejects individual files larger than 100 MB). To get around this, various extensions to Git have sprung up. Data Workspaces has supported git-fat since last year, and now, with version 1.3, also integrates with git-lfs.

Git-lfs (large file storage) is a utility which interacts with a Git hosting service using a special protocol. This protocol is supported by most popular Git hosting services/servers, including GitHub and GitLab. Git-lfs works with both cloud-hosted and locally-hosted versions of these services. Data Workspaces automatically determines whether a particular Git repository is using git-lfs from the repository’s .gitattributes file(s). For more details, see the large file support section of the Data Workspaces documentation.
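
To give a feel for what such a check involves, here is a minimal Python sketch. It is not the Data Workspaces implementation: the function name uses_git_lfs is made up for illustration, and the sketch relies only on the fact that git-lfs marks tracked patterns in .gitattributes with the filter=lfs attribute. It does not handle quoted patterns that contain spaces.

```python
from pathlib import Path


def uses_git_lfs(repo_root):
    """Return True if any .gitattributes file under repo_root assigns
    the git-lfs filter (filter=lfs) to at least one path pattern.

    Illustrative only; this is not how Data Workspaces is implemented.
    """
    root = Path(repo_root)
    for attrs_file in root.rglob(".gitattributes"):
        # Ignore anything that lives inside the .git directory itself.
        if ".git" in attrs_file.relative_to(root).parts:
            continue
        text = attrs_file.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # blank line or comment
            # An entry looks like "<pattern> attr1 attr2 ...".
            if "filter=lfs" in line.split()[1:]:
                return True
    return False
```

Calling uses_git_lfs(".") from the top of a repository would report whether any tracked pattern is routed through git-lfs.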

Snapshot History Reports

Data Workspaces can add several “magic” commands to Jupyter as a part of the DWS Jupyter kit. In 1.3, the %dws_history command now supports color-coding of metrics to make it quick to see which experiments have the best/worst results.

There are two styles of color coding. The --heatmap option will color code the background of metrics cells from dark red (worst results) to dark green (best results). For common metrics, like accuracy and loss, DWS knows the directionality of the metric (higher is better vs. lower is better). For less common or custom metrics, you can use the --maximize-metrics and --minimize-metrics options to specify this. Here is an example heatmap:
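
The color scale can also be sketched in a few lines of Python. This is only an approximation of the idea, not the code behind %dws_history: the function name heatmap_colors, the exact RGB values, and the straight-line interpolation between the two end colors are all assumptions made for illustration.

```python
def heatmap_colors(values, higher_is_better=True):
    """Map each metric value to an (r, g, b) background color ranging
    from dark red (worst result) to dark green (best result).

    Illustrative only; the colors and interpolation are assumptions.
    """
    dark_red, dark_green = (139, 0, 0), (0, 100, 0)
    lo, hi = min(values), max(values)
    colors = []
    for v in values:
        # Normalize to 0.0 (worst) .. 1.0 (best); if all values are
        # equal there is no ordering, so use the midpoint.
        score = 0.5 if hi == lo else (v - lo) / (hi - lo)
        if not higher_is_better:
            score = 1.0 - score  # e.g. loss: smaller values are better
        colors.append(tuple(
            round(a + (b - a) * score) for a, b in zip(dark_red, dark_green)
        ))
    return colors
```

The higher_is_better flag plays the role that the --maximize-metrics and --minimize-metrics options play for metrics whose direction is not known in advance.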

The second style of coloring takes a baseline snapshot. Any metric values better than the baseline have their text colored green, any metric values close to the baseline are shown in bold black text, and any metric values worse than the baseline are colored red. The --baseline=SNAPSHOT option enables this display mode. Here is an example:
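
The baseline comparison can also be sketched in Python. Again, this is not the DWS implementation: the function name classify_against_baseline is hypothetical, and the 1% relative tolerance used to decide what counts as "close" is an assumption, since the threshold DWS actually uses is not stated here.

```python
def classify_against_baseline(value, baseline, higher_is_better=True,
                              rel_tol=0.01):
    """Return the text style for a metric value relative to a baseline:
    'green' (better), 'bold-black' (close), or 'red' (worse).

    rel_tol is an assumed threshold for "close", not the one DWS uses.
    """
    if abs(value - baseline) <= rel_tol * abs(baseline):
        return "bold-black"
    if higher_is_better:
        better = value > baseline
    else:
        better = value < baseline
    return "green" if better else "red"
```

For an accuracy baseline of 0.90, a value of 0.92 would be classified as green and a value of 0.85 as red under these assumptions.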

More details on the %dws_history command may be found in the Data Workspaces documentation.

Exporting and Importing Resources

In version 1.3, multiple workspaces can share resources in a loosely-coupled manner. If you add an intermediate data or results resource with the --export option, the lineage for the resource will be written to the resource’s root path each time a snapshot is taken. This resource can then be imported into another workspace as source data by using the --import option when adding it to the “importing” workspace. The lineage from the exporting workspace will then be incorporated into the lineage graph of the importing workspace.
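
Conceptually, the flow resembles the following Python sketch. Everything in it is hypothetical and chosen for illustration: the file name lineage.json, the dictionary representation of the lineage graph (each node mapped to the list of its inputs), and the function names do not reflect the actual DWS file format or API.

```python
import json
from pathlib import Path

# Hypothetical file name; DWS may store exported lineage differently.
LINEAGE_FILE = "lineage.json"


def export_lineage(resource_root, lineage):
    """Exporting side: write the resource's lineage to its root path,
    as would happen each time a snapshot is taken."""
    Path(resource_root, LINEAGE_FILE).write_text(json.dumps(lineage, indent=2))


def import_lineage(resource_root, workspace_graph):
    """Importing side: read the exported lineage and merge it into a
    copy of the importing workspace's lineage graph."""
    exported = json.loads(Path(resource_root, LINEAGE_FILE).read_text())
    merged = {node: list(inputs) for node, inputs in workspace_graph.items()}
    for node, inputs in exported.items():
        existing = merged.setdefault(node, [])
        for item in inputs:
            if item not in existing:
                existing.append(item)
    return merged
```

The point of the design is the loose coupling: the two workspaces never talk to each other directly, and the only thing they share is the resource together with the lineage stored alongside it.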

Installing 1.3 or Upgrading from 1.2

This version is 100% backward compatible with DWS 1.2. To upgrade (or install in a new environment), run:

  pip install --upgrade dataworkspaces