This plugin type allows a plugin implemented in Python to be loaded into MLDB to extend its functionality.
A new plugin of type `python` named `<id>` can be created as follows:
mldb.put("/v1/plugins/"+<id>, {
"type": "python",
"params": {
"address": <string>,
"source": <PackageElementSources>,
"args": <Any>
}
})
with the following key-value definitions for `params`:
Field, Type, Default | Description |
---|---|
address | URI or location of script to run (use this parameter OR 'source') |
source | source of script to run (use this parameter OR 'address') |
args | Arguments to pass to script |
with `PackageElementSources` defined as:
Field, Type, Default | Description |
---|---|
main | source for the main element of the plugin |
routes | source for the routes element of the plugin |
status | source for the status element of the plugin |
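
As an illustration of the `source` form of the payload, the sketch below assembles a configuration with inline code. It assumes that the `main` and `routes` elements are passed as strings of Python source; the helper name `build_python_plugin_config`, the plugin code, and the plugin id `example` are hypothetical and not part of MLDB.

```python
import json


def build_python_plugin_config(main_source, routes_source=None, args=None):
    """Build a payload for a plugin of type "python" using inline source.

    Hypothetical helper: it only assembles the dictionary described above.
    """
    source = {"main": main_source}
    if routes_source is not None:
        source["routes"] = routes_source
    # Only 'source' is set, never 'address': the two are alternatives.
    params = {"source": source}
    if args is not None:
        params["args"] = args
    return {"type": "python", "params": params}


# Hypothetical plugin code, assumed to be passed as plain strings.
MAIN_SOURCE = 'mldb.log("plugin loaded")\n'
ROUTES_SOURCE = 'mldb.plugin.set_return("hello")\n'

config = build_python_plugin_config(
    MAIN_SOURCE, ROUTES_SOURCE, args={"greeting": "hi"}
)
print(json.dumps(config, indent=2))

# Inside MLDB, where the `mldb` object exists, the payload would be sent with:
# mldb.put("/v1/plugins/example", config)  # "example" is a hypothetical id
```

Keeping the payload construction separate from the `mldb.put` call makes it possible to inspect the configuration outside of an MLDB instance.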
If the `address` parameter is used, it may contain:

- `file:///mldb_data/<directory_within_mldb_data>`: a directory within the Docker container's mapped directory (the directory you specified with your `docker run` command) will be copied and its `main.py` file will be run, and `routes.py` will be run to handle REST requests.
- `git://` or `gist://`: the repo will be cloned and its `main.py` file will be run to initialize the plugin, and `routes.py` will be run to handle REST requests. To check out a specific commit or tag, add the following at the end of the address: `#hash`.
- `file:///mldb_data/<file_within_mldb_data>`: a Python file will be run from the Docker container's mapped directory.
- `http://<url>` or `https://<url>`: a Python file will be downloaded via HTTP(S) and executed.

If the `source` parameter is used, `main` will be executed to initialize the plugin and `routes` will be executed to handle REST requests (see `mldb.plugin.rest_params` below).
If the `args` parameter is used, its contents are available to the `main` Python code via the `mldb.plugin.args` variable (see below).
Plugins and scripts running within the MLDB instance will have access to an `mldb` object which lets them manipulate the database more efficiently than via HTTP network round-trips.
`mldb` object (available to plugins and scripts)

- `mldb.log(message)`: basic logging facility.
- `mldb.create_dataset(dataset_config)`: creates and returns a dataset object (see below). Equivalent to an HTTP `POST /v1/datasets`.
- `mldb.perform(verb, uri, [[query_string_key, query_string_value],...], payload, [[header_name, header_value],...])`: efficiently emulates HTTP requests. See the REST API documentation for available routes and payloads.
  - The `async:true` header is supported to perform an asynchronous call when creating expensive resources. When this header is used, the call will return immediately and the object will be created in the background. One can track the progress of the operation by performing a `GET` on the resource. The `state` field, part of the `response` field, will be set to `initializing` while the object is being created. Once the creation is completed, the `state` field will be set to `ok`.

There are two functions that allow access to the virtual filesystem of MLDB:
- `mldb.read_lines(uri, max_lines)`: opens the given URI and returns a list with the first `max_lines` lines of the file. When `max_lines=-1` (default), all the lines are returned.
- `mldb.ls(uri)`: lists the directory-like `uri` (currently `file://` and `s3://` URIs support listing) and returns the content. The returned object has two fields: `dirs` contains an array of subdirectories, as URIs, and `objects` contains an associative array of the objects in the directory, with the URI as the key and the following structure as the value:
Field, Type, Default | Description |
---|---|
exists | Does the object exist? |
lastModified | Date the object was last modified |
size | Size in bytes of the object |
etag | Entity tag (unique hash) for object |
storageClass | Storage class of object if applicable |
ownerId | ID of owner |
ownerName | Name of owner |
objectMetadata | Metadata about the object |
userMetadata | Metadata placed there by the user |
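
To make the shape of the `mldb.ls` result concrete, here is a minimal sketch that summarizes a listing. It assumes the returned object can be treated as a plain dictionary with the `dirs` and `objects` fields described above; the helper `summarize_listing` and the sample URIs and sizes are hypothetical.

```python
def summarize_listing(listing):
    """Count subdirectories and objects, and total the object sizes in bytes."""
    dirs = listing.get("dirs", [])
    objects = listing.get("objects", {})
    # Each value in 'objects' follows the structure in the table above.
    total_bytes = sum(info.get("size", 0) for info in objects.values())
    return {
        "num_dirs": len(dirs),
        "num_objects": len(objects),
        "total_bytes": total_bytes,
    }


# Hypothetical listing, shaped like the documented return value.
sample_listing = {
    "dirs": ["file:///mldb_data/plugins/"],
    "objects": {
        "file:///mldb_data/a.csv": {"exists": True, "size": 120},
        "file:///mldb_data/b.csv": {"exists": True, "size": 80},
    },
}

summary = summarize_listing(sample_listing)
print(summary)

# Inside MLDB this would presumably be called as:
# summarize_listing(mldb.ls("file:///mldb_data/"))
```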
`dataset` object (available to plugins and scripts)

- `dataset.record_row(row_name, [[col_name, value, timestamp],...])`: records a row in the dataset.
- `dataset.record_rows([ [ row_name, [[col_name, value, timestamp],...] ], ... ])`: records multiple rows in the dataset. It is more efficient than `record_row` in most circumstances.
- `dataset.record_column(column_name, [[row_name, value, timestamp],...])`: records a column in the dataset. Not all dataset types support recording of columns.
- `dataset.record_columns([ [ column_name, [[row_name, value, timestamp],...] ], ... ])`: records multiple columns in the dataset. Not all dataset types support recording of columns.
- `dataset.commit()`: commits a dataset. The behavior of committing varies by dataset type and some types may allow committing only once; see the documentation for the dataset type for more details.

`mldb.script` object (available to scripts)

- `mldb.script.args`: contains the value of the `args` key in the JSON payload of the HTTP request.
- `mldb.script.set_return(return_val)`: sets the return value of the script to be sent with the HTTP response.

`mldb.plugin` object (available to plugins)

- `mldb.plugin.args`: contains the value of the `args` key in the JSON payload of the HTTP request.
- `mldb.plugin.serve_static_folder(route, dir)`: serves static content under `dir` on the given plugin route `GET /v1/plugins/<id>/routes/<route>`.
- `mldb.plugin.serve_documentation_folder(dir)`: serves documentation under `dir` on the plugin's documentation route (`GET /v1/plugins/<id>/doc`). This will render files with a `.md` extension as HTML. See the Documentation Serving page for more details.
- `mldb.plugin.rest_params`: object available within `routes.py` which represents an HTTP REST call. It has the following fields and methods:
  - `verb`, `remaining`, `rest_params`: route and query-string details such that for `GET /v1/plugins/X/routes/hello?who=you`
    - `verb` = `GET`
    - `remaining` = `hello`
    - `rest_params` = `[['who', 'you']]`
  - `headers`: HTTP headers
  - `payload`: HTTP body
  - `contentType`: content type of HTTP body
  - `contentLength`: content length of HTTP body
- `mldb.plugin.set_return(body)`: available within `routes.py`, function called to write to the HTTP response body and HTTP return code.
- `mldb.plugin.get_plugin_dir()`: returns the absolute path of the plugin's installation directory on disk.

Calling `/v1/plugins/<id>/routes/<route>` will trigger the execution of the code in `routes.py`. The plugin developer must handle the `(verb, remaining)` tuple, available in the `mldb.plugin.rest_params` object. If it represents a valid route, the `set_return` function must be called with a non-null body, which will be returned in the response. If the function is not called or called with a null body, the HTTP response code will be 404.
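
The dispatch on `(verb, remaining)` described above could be organized along the following lines. This is a sketch, not the actual contents of any MLDB plugin: the function `handle_request` is hypothetical, the request object is a stand-in built with `SimpleNamespace` so the code runs outside MLDB, and it assumes that a dictionary is an acceptable response body.

```python
from types import SimpleNamespace


def handle_request(rest_params, set_return):
    """Dispatch on (verb, remaining); unknown routes never call set_return."""
    route = (rest_params.verb, rest_params.remaining)
    if route == ("GET", "hello"):
        # rest_params.rest_params is a list of [key, value] pairs.
        query = dict(rest_params.rest_params)
        set_return({"greeting": "hello " + query.get("who", "world")})
    # Any other (verb, remaining) falls through without calling set_return,
    # which per the text above leads to a 404 response.


# Stand-ins for mldb.plugin.rest_params and mldb.plugin.set_return.
responses = []
fake_request = SimpleNamespace(
    verb="GET", remaining="hello", rest_params=[["who", "you"]]
)
handle_request(fake_request, responses.append)
print(responses)

# In an actual routes.py the call would presumably be:
# handle_request(mldb.plugin.rest_params, mldb.plugin.set_return)
```

Passing the request object and the return function as arguments keeps the routing logic testable without a running MLDB instance.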
`GET /v1/plugins/<id>/routes/lastoutput` will return the output of the latest plugin code to have run: either the plugin loader or any subsequent calls to request handlers (see `mldb.plugin.set_request_handler()` above).
Script output will be returned as the HTTP response.
In either case, the output will look like this:
{
"logs": [<logs>],
"result": <return_value>,
}
where `<logs>` come from calls to `mldb.log()` or from using the Python `print` statement to write to standard output or standard error, and `<return_value>` comes from a call to `mldb.script.set_return()` or `mldb.plugin.set_return()` (see above).
If an exception was thrown, the output will look like
{
"logs": [<logs>],
"exception": <exception details>,
}
The actual output object looks like this:
Field, Type, Default | Description |
---|---|
result | Result of running script |
logs | Log entries created by script |
exception | Exception thrown by script |
extra | Extra information from language |
with log entries looking like
Field, Type, Default | Description |
---|---|
ts | Timestamp at which message was received |
s | Stream on which message was received |
c | Content of stream |
closed | Stream is closed |
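
A short sketch of how the log entries in such an output object might be rendered as text lines follows. The helper `format_logs` is hypothetical, and the sample values (timestamp format and the stream names `stdout` and `stderr`) are assumptions made for illustration; only the field names `ts`, `s`, and `c` come from the table above.

```python
def format_logs(output):
    """Render each log entry as '<ts> [<s>] <c>'."""
    lines = []
    for entry in output.get("logs", []):
        lines.append(
            "{} [{}] {}".format(
                entry.get("ts", ""), entry.get("s", ""), entry.get("c", "")
            )
        )
    return lines


# Hypothetical output object; the values are illustrative only.
sample_output = {
    "logs": [
        {"ts": "2016-10-19T20:37:34.535Z", "s": "stdout", "c": "hello"},
        {"ts": "2016-10-19T20:37:34.540Z", "s": "stderr", "c": "warning"},
    ],
    "result": "success",
}

log_lines = format_logs(sample_output)
for line in log_lines:
    print(line)
```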
Exceptions are represented as
Field, Type, Default | Description |
---|---|
message | Exception description |
where | Full description of where exception came from |
scriptUri | URI of script that caused the exception |
lineNumber | Number of the line that caused the exception |
columnStart | Start column in the line |
columnEnd | End column in the line |
lineContents | Contents of the line indicating where the exception was |
context | What we were doing when we threw the exception |
stack | Call stack for exception |
extra | Extra information from language about exception |
with stack frames like
Field, Type, Default | Description |
---|---|
scriptUri | URI of script in this frame |
functionName | Name of function in this frame |
where | Where is the frame, in natural format for language |
lineNumber | Line number |
columnStart | Column number of error |
columnEnd | End column number of error |
extra | Extra stack from information from language |
Note that the Python plugin only fills in the `where` field of stack frame entries.
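
The exception structure above can be turned into a readable report along these lines. This is a sketch under assumptions: `describe_exception` is a hypothetical helper, the sample exception is invented for illustration, and only the `where` field of each stack frame is used, in line with the note above.

```python
def describe_exception(output):
    """Return a short text report for an output object, or None if no exception."""
    exc = output.get("exception")
    if exc is None:
        return None
    lines = [exc.get("message", "<no message>")]
    if "lineNumber" in exc:
        lines.append("  at line {}".format(exc["lineNumber"]))
    # The Python plugin only fills in 'where' for stack frames.
    for frame in exc.get("stack", []):
        lines.append("  " + frame.get("where", "<unknown frame>"))
    return "\n".join(lines)


# Hypothetical failed output; all values are illustrative only.
sample_failure = {
    "logs": [],
    "exception": {
        "message": "NameError: name 'x' is not defined",
        "lineNumber": 3,
        "stack": [{"where": 'File "main.py", line 3, in <module>'}],
    },
}

report = describe_exception(sample_failure)
print(report)
```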
The following are useful for debugging MLDB, but should not be used in normal use of MLDB:

- `mldb.debugSetPathOptimizationLevel(level)`: controls whether MLDB takes optimized or generic paths. It can be used to unit-test the equivalence of optimized and non-optimized paths. Setting to `"always"` (the default) will make MLDB always use optimized implementations when possible. Setting to `"never"` has the opposite effect. Setting to `"sometimes"` will randomly and non-deterministically choose whether or not to use an optimized path each time that one is encountered (50% probability of each). Note that this setting applies to the entire MLDB instance, and so should not be used in production.

A higher level object named `mldb_wrapper` can be used to wrap the `mldb` object. Key features are:

- `mldb_wrapper.ResponseException`, which eliminates the need to validate the status code after each call.
- `log(thing)` function, which acts differently based on the type of `thing`: `dict`s and `list`s are formatted using `json.dumps`, `str` and `unicode` are output as is, and any other type outputs the string representation of `thing`.
- `query(query)` function, which is a shorthand for `GET /v1/query?q=<query>&format=table`. It returns a list of the rows. Whenever you work without dates, that is likely the go-to function for querying.
- `post_run_and_track_procedure(payload, refresh_rate_sec)`, which creates a procedure based on `payload`, runs it and prints its progress status every `refresh_rate_sec` seconds. It returns as soon as the procedure stops running. Useful to see what's going on for long-running procedures.
- `run_tests()` function, which executes the Python unit tests of the current context.

Code
```python
# Wrap the mldb object provided to scripts and plugins.
wrapper = mldb_wrapper.wrap(mldb)

# Create a dataset, record two rows, then commit.
ds = wrapper.create_dataset({'id' : 'ds', 'type' : 'sparse.mutable'})
ds.record_row('row 1', [['ColA', 'A', 0]])
ds.record_row('row 2', [['ColB', 'B', 0]])
ds.commit()

# Query the dataset and log the rows.
res = wrapper.query("SELECT * FROM ds")
wrapper.log(res)
mldb.script.set_return("success")
```
Output
```
2016-10-19 20:37:34.535 script runner plugin [
[
"_rowName",
"ColB",
"ColA"
],
[
"row 2",
"B",
null
],
[
"row 1",
null,
"A"
]
]
```
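
The logged result above is a list of rows whose first row holds the column names. As a sketch, and assuming `query` returns that same structure, the rows can be converted to dictionaries keyed by column name; the helper `table_to_dicts` is hypothetical and not part of `mldb_wrapper`.

```python
def table_to_dicts(table):
    """Convert a header-first list of rows into a list of dictionaries."""
    if not table:
        return []
    header, rows = table[0], table[1:]
    return [dict(zip(header, row)) for row in rows]


# The same data as the logged output above (JSON null becomes None).
table = [
    ["_rowName", "ColB", "ColA"],
    ["row 2", "B", None],
    ["row 1", None, "A"],
]

records = table_to_dicts(table)
print(records)
```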
For more examples, refer to the Python tests, which are mostly written with the `mldb_wrapper`.