# Python Packaging Utilities

The Python packaging utilities let users analyze their Python scripts and create Conda environments built specifically to contain the dependencies their applications need to run. In distributed computing systems such as Work Queue, it is often difficult to maintain homogeneous environments for Python applications, because scripts rely on many external resources at runtime, such as the Python interpreter and imported libraries. The Python packaging collection provides three easy-to-use tools that address this problem. They help users analyze their Python programs and build Conda environments that give tasks a consistent runtime across the Work Queue system.

The `python_package_analyze` tool analyzes a Python script to determine all of its top-level module dependencies and the interpreter version it uses. It then writes a concise, human-readable JSON file containing the information needed to build a self-contained Conda virtual environment for the script.

The `python_package_create` tool takes the JSON file generated by `python_package_analyze` and creates the corresponding Conda environment, preinstalled with the required libraries and the correct Python interpreter version. It then packs the environment into a tarball that can be relocated to a different machine in the system to run the Python task.

The `python_package_run` tool acts as a wrapper script for the Python task: it unpacks and activates the Conda environment and runs the task inside it.






# python_package_analyze(1)

## NAME

`python_package_analyze` - command-line utility for analyzing Python scripts for library and interpreter dependencies

## SYNOPSIS

`python_package_analyze [options] <python-script ...> <json-output-file>`

## DESCRIPTION

`python_package_analyze` is a simple command-line utility that analyzes Python scripts for their external dependencies. It generates an output file that can be used with `python_package_create` to build a self-contained Conda environment for the Python application.

The `python-script ...` argument is the path(s) to the Python script(s) to be analyzed. The `json-output-file` argument is the path to the output JSON file that will be generated by the command. Specifying `-` for either will use `stdin`/`stdout` instead of a file.

## OPTIONS

-h, --help                   Show this help message
--toplevel                   Only include imports at the top level of the script.
--function FUNCTION          Only include imports in the given function.
--pkg-mapping IMPORT=NAME    Specify that the module imported as IMPORT in the
                             code is provided by the pip/conda package NAME.
--extra-pkg PKG              Also include the pip/conda package PKG, even if
                             it does not appear in the sources. May be useful
                             for scripts that execute other (possibly
                             non-Python) components that must also be included.
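To make the `--toplevel`, `--function`, `--pkg-mapping`, and `--extra-pkg` options more concrete, here is a minimal sketch of how import collection and name mapping could work using Python's standard `ast` module. It is not the tool's actual implementation: the helpers `collect_imports`, `parse_mapping`, and `to_packages` are hypothetical, and reading "top level" as "statements directly in the module body" is an assumption.

```python
import ast
from typing import Dict, Iterable, List, Optional, Set


def collect_imports(source: str, toplevel: bool = False,
                    function: Optional[str] = None) -> Set[str]:
    """Return the top-level module names imported by `source`.

    toplevel=True -> only import statements directly in the module body
    function=NAME -> only import statements inside the function NAME
    otherwise     -> import statements anywhere in the file
    """
    tree = ast.parse(source)
    if toplevel:
        nodes: Iterable[ast.AST] = tree.body
    elif function is not None:
        nodes = [node
                 for fn in ast.walk(tree)
                 if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef))
                 and fn.name == function
                 for node in ast.walk(fn)]
    else:
        nodes = ast.walk(tree)

    modules: Set[str] = set()
    for node in nodes:
        if isinstance(node, ast.Import):
            # "import a.b.c as x" depends on the top-level package "a".
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports ("from . import x") are local code; skip them.
            modules.add(node.module.split(".")[0])
    return modules


def parse_mapping(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn ["IMPORT=NAME", ...] into {"IMPORT": "NAME", ...}."""
    return dict(pair.split("=", 1) for pair in pairs)


def to_packages(modules: Iterable[str], mapping: Dict[str, str],
                extra: Iterable[str] = ()) -> List[str]:
    """Map import names to package names and add any extra packages."""
    packages = {mapping.get(module, module) for module in modules}
    packages.update(extra)
    return sorted(packages)
```

The mapping step matters because import names and package names do not always match: `import yaml` is provided by the `pyyaml` package and `import sklearn` by `scikit-learn`, which is the kind of mismatch `--pkg-mapping yaml=pyyaml` expresses.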

## EXIT STATUS

On success, returns zero. On failure, returns non-zero.

## EXAMPLE

Suppose a Python script `example.py` contains the following code:

```python
import os
import sys
import pickle

import antigravity
import matplotlib


if __name__ == "__main__":
    print("example")
```

To analyze the `example.py` script for its dependencies and generate the output JSON dependencies file `dependencies.json`, run the following command:

`$ python_package_analyze example.py dependencies.json`

Once the command completes, the `dependencies.json` file in the current working directory will contain a Conda environment specification (suitable for use with `conda env create`).
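Conda environment files list a set of `channels` and a `dependencies` list, in which pip-installed packages go in a nested `pip` entry. The sketch below builds a specification of that general shape with a hypothetical `environment_spec` helper; the exact fields `python_package_analyze` writes may differ, and both the `conda-forge` channel and the split into Conda and pip packages are assumptions made for illustration.

```python
import json
from typing import Dict, List, Union

Dependency = Union[str, Dict[str, List[str]]]


def environment_spec(python_version: str, conda_pkgs: List[str],
                     pip_pkgs: List[str]) -> Dict[str, object]:
    """Build a dict shaped like a Conda environment file (hypothetical helper)."""
    deps: List[Dependency] = [f"python={python_version}", *conda_pkgs]
    if pip_pkgs:
        # List pip itself so Conda installs it for the nested pip section.
        deps.append("pip")
        deps.append({"pip": list(pip_pkgs)})
    # The channel choice is an assumption for this sketch.
    return {"channels": ["conda-forge"], "dependencies": deps}


print(json.dumps(environment_spec("3.7.3", ["matplotlib"], []), indent=2))
```

Because JSON maps directly onto this structure, the same dict can be written with `json.dump` and read back without loss.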

Note that system-level (standard library) modules, such as `os`, `sys`, and `pickle` in the example above, are considered part of the Python package installed into the Conda environment.
Additionally, imports that are not managed by pip or Conda are not allowed.
This includes other modules in the current working directory and user-written packages.
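Deciding which imports are "system-level" amounts to checking whether a module ships with the Python interpreter. The following is a minimal sketch with a hypothetical `is_stdlib` helper, assuming Python 3.10+ for `sys.stdlib_module_names` and falling back to a path-based check on older interpreters; it illustrates the idea and is not necessarily how `python_package_analyze` decides.

```python
import importlib.util
import sys
import sysconfig


def is_stdlib(module: str) -> bool:
    """Return True if the top-level package of `module` ships with Python."""
    top = module.split(".")[0]
    names = getattr(sys, "stdlib_module_names", None)  # Python 3.10+
    if names is not None:
        return top in names
    # Fallback for older interpreters: built-in modules, or modules whose
    # source lives in the stdlib directory but not under site-packages.
    if top in sys.builtin_module_names:
        return True
    spec = importlib.util.find_spec(top)
    if spec is None or not spec.origin:
        return False
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    return spec.origin.startswith(stdlib_dir) and "site-packages" not in spec.origin


# Imports from the example.py script.
imports = ["os", "sys", "pickle", "antigravity", "matplotlib"]
third_party = [m for m in imports if not is_stdlib(m)]
print(third_party)
```

Only the modules that survive this filter need to be resolved to pip or Conda packages; everything else comes with the `python` package itself.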


# python_package_create(1)

## NAME

`python_package_create` - command-line utility for creating a Conda virtual environment given a Python dependencies file

## SYNOPSIS

`python_package_create [options] <dependency-file> <output-path>`

## DESCRIPTION

`python_package_create` is a simple command-line utility that creates a local Conda environment from an input JSON dependency file generated by `python_package_analyze`.
The command creates an environment tarball at `output-path` that can be sent to and run on different machines with the same architecture.

The `dependency-file` argument is the path (relative or absolute) to the JSON dependency file created by `python_package_analyze`. The `output-path` argument specifies where the environment tarball is written; it should usually end in `.tar.gz`.
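The packing step can be pictured as archiving the environment directory with paths stored relative to its root. The sketch below uses Python's `tarfile` module with a hypothetical `pack_environment` helper; it is not `python_package_create`'s implementation. A plain archive like this is not necessarily relocatable on its own, because Conda environments embed absolute paths to their original prefix, which tools such as conda-pack fix up after unpacking.

```python
import os
import tarfile


def pack_environment(env_dir: str, output_path: str) -> None:
    """Archive the contents of env_dir into a gzip-compressed tarball.

    Entries are stored relative to env_dir, so the archive can be unpacked
    into any directory (hypothetical helper, not the tool's own code).
    """
    with tarfile.open(output_path, "w:gz") as tar:
        for entry in sorted(os.listdir(env_dir)):
            # tar.add recurses into subdirectories by default.
            tar.add(os.path.join(env_dir, entry), arcname=entry)
```

Storing relative names is what lets the receiving side choose any unpack directory, temporary or persistent.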

## OPTIONS

-h        Show this help message

## EXIT STATUS

On success, returns zero. On failure, returns non-zero.

## EXAMPLE

A dependencies file `dependencies.json` should first be generated with `python_package_analyze`.

Suppose `dependencies.json` was generated from the `example.py` script shown for `python_package_analyze`, which uses a Python 3.7.3 interpreter. To create a Conda environment with the Python 3.7.3 interpreter and the `antigravity` and `matplotlib` modules preinstalled, packed as `example_venv.tar.gz`, run the following command:

`$ python_package_create dependencies.json example_venv.tar.gz`

This creates the environment tarball `example_venv.tar.gz` in the current working directory, which can then be copied to other machines for execution.



# python_package_run(1)

## NAME

`python_package_run` - wrapper script that executes a Python script within an isolated Conda environment

## SYNOPSIS

`python_package_run [options] --environment <file> command and args ...`

## DESCRIPTION

The `python_package_run` tool acts as a wrapper script for a Python task, running the task within the specified Conda environment. `python_package_run` can be used on different machines in the Work Queue system to unpack and activate a Conda environment and run a task within that isolated environment.

The `--environment <file>` argument is the path to the Conda environment tarball in which to run the Python task.
`command and args` (the `COMMAND`) are interpreted as the `ARGV` of a command to be run inside the Conda environment.
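Running a command inside an unpacked environment mostly means putting the environment's `bin` directory first on `PATH`, so that a bare `python3` resolves to the environment's interpreter. A minimal sketch under that assumption, with a hypothetical `run_in_env` helper: unlike `conda activate`, it does not run the environment's activation scripts, and it relies on POSIX `subprocess` behavior, where a bare executable name is looked up on the `PATH` passed in `env`.

```python
import os
import subprocess
from typing import Sequence


def run_in_env(env_dir: str, argv: Sequence[str]) -> int:
    """Run argv with env_dir/bin first on PATH; return the exit status.

    Hypothetical helper: it does not run the environment's activate.d hooks.
    """
    env = dict(os.environ)
    env["PATH"] = os.path.join(env_dir, "bin") + os.pathsep + env.get("PATH", "")
    env["CONDA_PREFIX"] = env_dir  # variable that conda activate also sets
    # On POSIX, subprocess resolves a bare argv[0] (e.g. "python3") using the
    # PATH in `env`, so the environment's interpreter is found first.
    return subprocess.run(list(argv), env=env).returncode
```

For example, `run_in_env("/tmp/unpacked_env", ["python3", "example.py"])` would run the interpreter from `/tmp/unpacked_env/bin` if one is present there (the path is illustrative).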

By default, the Conda environment is unpacked into a temporary directory that is removed at the end of execution. If `--unpack-to <dir>` is given, the environment is unpacked to `<dir>` instead, and `<dir>` is not removed at the end of execution. Later executions of `python_package_run`, even simultaneous ones, will not unpack the environment again if `<dir>` is already populated. Instances of `python_package_run` coordinate through a write lock on `<dir>`. By default, an instance waits up to 300 seconds to acquire the lock; this limit can be changed with the `--wait-for-lock <secs>` option.

If the directory given to `--unpack-to` does not exist, it is created as an empty directory. If it is an existing directory that is not empty, unpacking is skipped, regardless of whether the directory contains a valid Conda environment.
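The unpack-once behavior described above can be sketched with an exclusive `fcntl.flock` lock and a bounded wait. This is a minimal, Unix-only sketch: the helper `unpack_once` and the lock-file location (`<dir>.lock`, placed beside the target directory so the lock itself never makes it look populated) are hypothetical, and the real tool may lock differently.

```python
import fcntl
import os
import tarfile
import time


def unpack_once(tarball: str, dest: str, wait_for_lock: float = 300.0) -> bool:
    """Unpack tarball into dest unless dest is already populated.

    Returns True if this call unpacked the archive, False if dest was
    already non-empty. Raises TimeoutError if the lock is not acquired
    within wait_for_lock seconds. Unix-only (uses fcntl.flock).
    """
    os.makedirs(dest, exist_ok=True)  # a missing dest becomes an empty dir
    # Hypothetical lock location: beside dest, not inside it.
    lock_path = dest.rstrip(os.sep) + ".lock"
    deadline = time.monotonic() + wait_for_lock
    with open(lock_path, "a") as lock_file:
        while True:
            try:
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"no lock on {dest} after {wait_for_lock}s")
                time.sleep(0.1)
        try:
            if os.listdir(dest):
                return False  # already populated: skip, valid env or not
            with tarfile.open(tarball) as tar:
                tar.extractall(dest)  # archive assumed to be trusted
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Checking for an empty directory only after acquiring the lock is what keeps two simultaneous instances from both unpacking; the second one sees the populated directory and skips.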


## OPTIONS 

-e, --environment <file>   Conda environment as a tar file. (Required.)
-d, --unpack-to <dir>      Directory to unpack the environment. If not given,
                           a temporary directory is used.
-w, --wait-for-lock <secs> Number of seconds to wait for the write lock
                           on <dir>. Default is 300.
-h, --help                 Show the help screen.
command and args           Command to execute inside the given environment.

## EXIT STATUS

On success, returns 0. On failure, returns non-zero.

## EXAMPLE

A Python script `example.py` has been analyzed using `python_package_analyze`, and a corresponding Conda environment, packed as `example_venv.tar.gz`, has been created with all the necessary dependencies preinstalled. To execute the script within the environment, run the following command:

`python_package_run --environment example_venv.tar.gz python3 example.py`

This will run the command `python3 example.py` within the Conda environment in `example_venv.tar.gz`. Note that this command can be performed either locally, on the same machine that analyzed the script and created the environment, or remotely, on a different machine that contains the Conda environment tarball and the `example.py` script.

To keep the unpacked environment between runs, pass `--unpack-to` with a persistent directory:

`python_package_run --unpack-to my_persistent_env --environment example_venv.tar.gz python3 example.py`

The previous command will run faster the second time it is executed, as the
environment is only unpacked once to `my_persistent_env`.


# HOW TO TEST OVERALL FUNCTIONALITY

Assume the Python script to run is `hi.py`.

1. `./python_package_analyze hi.py output.json`
   - Generates the JSON dependency file `output.json` in the current working directory.
2. `./python_package_create output.json venv.tar.gz`
   - Creates a packed tarball of the environment named `venv.tar.gz` in the current working directory.
3. `./python_package_run --environment venv.tar.gz python3 hi.py`
   - Runs the command `python3 hi.py` within the `venv.tar.gz` Conda environment.
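These three steps can also be scripted so that the run stops at the first failing step, relying on the documented exit status (zero on success). A minimal Python sketch, assuming the tools sit in the current directory as in the steps above; `pipeline_commands` and `run_pipeline` are hypothetical helpers, and `run` is injectable so the logic can be exercised without the tools installed.

```python
import subprocess
from typing import Callable, List


def pipeline_commands(script: str, spec: str = "output.json",
                      tarball: str = "venv.tar.gz",
                      tool_dir: str = ".") -> List[List[str]]:
    """The three documented steps as argv lists (hypothetical helper)."""
    return [
        [f"{tool_dir}/python_package_analyze", script, spec],
        [f"{tool_dir}/python_package_create", spec, tarball],
        [f"{tool_dir}/python_package_run", "--environment", tarball,
         "python3", script],
    ]


def run_pipeline(commands: List[List[str]],
                 run: Callable = subprocess.run) -> int:
    """Run each command in order; stop at the first non-zero exit status."""
    for argv in commands:
        status = run(argv).returncode
        if status != 0:
            return status
    return 0
```

With the tools present, `run_pipeline(pipeline_commands("hi.py"))` returns 0 only if analysis, environment creation, and the final run all succeed.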
