Interactions with MLDB occur via a REST API (fully documented here), where REST stands for Representational State Transfer.
The structure of a REST API call is <verb> <resource> <arguments>, e.g. PUT /v1/datasets {"type": "sparse.mutable"}. There are four verbs: GET, POST, PUT and DELETE. These four verbs correspond to those available in the Hypertext Transfer Protocol (HTTP), but it is possible to work with the MLDB API in-process without making use of HTTP or the network (see below).
The quickest way to get started with making HTTP calls to the MLDB API is with your web browser, using our REST API Interactive Documentation.
Beyond that, if MLDB is running on a server named <host> on port <port> then you can use any of a widely-available variety of tools to make HTTP calls to http://<host>:<port>/<resource>, for example:
- curl is a generic command-line tool available for many platforms which will allow you to make HTTP calls
- requests is an easy-to-use generic Python library for making HTTP requests
- httr is an easy-to-use generic R library for making HTTP requests
If you are using the built-in Notebook interface or want to work with MLDB from Python, you can install pymldb, which gives you access to an MLDB-specific library to interact with the API over HTTP, while hiding the details of HTTP from you. The Using pymldb Tutorial will show you how to use pymldb.
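For instance, here is a minimal sketch using the Python requests library to issue the PUT from the example above; the host, port and the dataset id "example" are assumptions to be replaced with your own:
import requests

mldb = "http://localhost:8080"   # assumed <host>:<port>

# Create a mutable dataset named "example" (a hypothetical id),
# mirroring the PUT /v1/datasets example above
resp = requests.put(mldb + "/v1/datasets/example",
                    json={"type": "sparse.mutable"})
print(resp.status_code, resp.text)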
For most predictive applications, you will want to expose a function over
REST that implements the specific functionality required. There are two
ways of doing this: by calling the GET /v1/functions/<function>/application
route, or via a plugin exposing a custom route.
GET /v1/functions/<function>/application
As soon as a function is created, that route is automatically available, so this version requires no extra work. However, because MLDB does not know the data types that the function will be called with, it must re-bind the function on every call, which can be very expensive and can reduce performance by an order of magnitude.
In order to avoid this penalty, it is possible to pre-bind function calls
using the sql.expression
function type as follows:
PUT /v1/functions/wrapper {
    "type": "sql.expression",
    "params": {
        "expression": "original_function({arg1, arg2, ...})[output] AS *",
        "prepared": true
    }
}
This will create a new function, /v1/functions/wrapper, which has an application route that will forward arg1 and arg2 (these should be replaced with the real function arguments) to the original_function function, and return the output field to the caller. The key element here is the "prepared": true, which causes the call to original_function to be pre-bound and so the call is very fast; rates in the hundreds of thousands of calls per second can be expected on a modern server for a simple function.
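As a sketch, the wrapper could be created and exercised from Python with requests. The host/port, the arg1/arg2 argument names, and the shape of the application call (arguments wrapped in an input parameter and sent as a JSON body on the GET) are assumptions here:
import requests

mldb = "http://localhost:8080"   # assumed <host>:<port>

# Create the pre-bound wrapper (arg1/arg2 stand in for the real
# arguments of original_function)
requests.put(mldb + "/v1/functions/wrapper", json={
    "type": "sql.expression",
    "params": {
        "expression": "original_function({arg1, arg2})[output] AS *",
        "prepared": True
    }
})

# Call the wrapper's application route; requests.request allows a
# JSON body to be sent with a GET
resp = requests.request("GET", mldb + "/v1/functions/wrapper/application",
                        json={"input": {"arg1": 1, "arg2": 2}})
print(resp.json())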
The /v1/functions/<function>/batch
REST route allows for multiple
predictions to be made in one call. This call will take a JSON object
or array as input, apply the function to each element, and recreate
the same structure with the result of the predictions. For example,
if we set up the following function:
PUT /v1/functions/score_one {
    "type": "sql.expression",
    "params": {
        "expression": "horizontal_sum(input)",
        "prepared": true,
        "raw": true,
        "autoInput": true
    }
}
(which effectively makes the horizontal_sum
builtin available as a
REST function, and uses autoInput
and raw
parameters to automatically
associate the input
parameter with the passed input value and the output
of the horizontal_sum
function to the result), then we can call it like
so:
GET /v1/functions/score_one/batch { "input": [[1,2,3],[4,5],[6],[]] }
which will return us
[6,9,6,0]
with one output per input entry. Alternatively, we could call it with
GET /v1/functions/score_one/batch {
    "input": {
        "one": [1,2,3],
        "two": [4,5],
        "three": [6],
        "four": []
    }
}
which will return us
{ "one": 6, "two": 9, "three": 6, "four": 0 }
with the function being applied to each member of the object.
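The same batch call can be made from Python; here is a sketch with requests, assuming a local host and port, and sending the JSON body with the GET as in the first example above:
import requests

mldb = "http://localhost:8080"   # assumed <host>:<port>

# Batch call with an array input, as in the first example above
resp = requests.request("GET", mldb + "/v1/functions/score_one/batch",
                        json={"input": [[1, 2, 3], [4, 5], [6], []]})
print(resp.json())   # [6, 9, 6, 0]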
In some cases, it may make sense to batch several predictions together
into the same REST call (especially to reduce network overhead). This
can also be achieved using the sql.query
function type and the
row_dataset()
table function:
PUT /v1/functions/batcher {
    "type": "sql.query",
    "params": {
        "query": "SELECT original_function(value) as value, column FROM row_dataset($input)",
        "output": "NAMED_COLUMNS"
    }
}
The batcher function, when called, will take an argument input that should be either an array or an object. It will apply original_function to each element of the array or object, and return the same structure with each element replaced by the function's result. For example, if we use horizontal_sum for the original function, then applying the batched version to [[1,2,3],[4],[5,6,7],[]] will return [6,4,18,NULL].
For maximum performance, it can be combined with the wrapper
trick above.
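Here is a sketch of the same flow from Python with requests. The host/port are assumed, and so is the shape of the application call, where the outer input is the application route's parameter and the inner input is the function's own argument name taken from $input in the query:
import requests

mldb = "http://localhost:8080"   # assumed <host>:<port>

# Create the batching function (original_function is a placeholder
# for an existing function)
requests.put(mldb + "/v1/functions/batcher", json={
    "type": "sql.query",
    "params": {
        "query": "SELECT original_function(value) as value, column "
                 "FROM row_dataset($input)",
        "output": "NAMED_COLUMNS"
    }
})

# Apply it to several inputs at once (call shape is an assumption)
resp = requests.request("GET", mldb + "/v1/functions/batcher/application",
                        json={"input": {"input": [[1, 2, 3], [4], [5, 6, 7], []]}})
print(resp.json())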
By defining a custom plugin, it is also possible to define completely custom
routes for MLDB to operate on. The following example shows how to make a
given function available via REST to respond to the GET /routes/predict
request.
First, we create a Javascript plugin that will set itself up to respond to the given route:
var fnconfig = {
    type: "sql.expression",
    params: {
        expression: "horizontal_sum({*})",
        prepared: true
    }
};
// Create the function that the /predict route will expose
var predictfn = mldb.createFunction(fnconfig);
function handleRequest(relpath, verb, resource, params, payload, contentType,
                       contentLength, headers)
{
    if (verb == "GET" && relpath == "/predict") {
        // params is the list of query parameters as [name, value] pairs;
        // the first one carries the JSON-encoded input for the function
        return predictfn.callJson(JSON.parse(params[0][1]));
    }
    throw "Unknown route " + verb + " " + relpath;
}
plugin.setRequestHandler(handleRequest);
Next, we PUT that code as a plugin to create the API:
PUT /v1/plugins/myapi {
    "type": "javascript",
    "params": {
        "source": {
            "main": "<above javascript code>"
        }
    }
}
Finally, we can GET
the predict
route in order to make a prediction:
GET /v1/plugins/myapi/routes/predict { "input": [ 1, 2, 3 ] }
which returns 6.
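Here is a sketch of the same request from Python: because the handler above reads the first query parameter and parses it as JSON, the input is passed as a JSON-encoded query parameter (host and port assumed):
import json
import requests

mldb = "http://localhost:8080"   # assumed <host>:<port>

# Call the custom route; the plugin's handler JSON-parses the first
# query parameter it receives
resp = requests.get(mldb + "/v1/plugins/myapi/routes/predict",
                    params={"input": json.dumps([1, 2, 3])})
print(resp.json())   # 6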
If you need to send a body with a GET request to MLDB and your interface doesn't support it, you can POST to the special route /v1/redirect/get, which was created exactly for that case. It acts as a proxy that forwards POST requests as GET requests to internal MLDB routes. Here is an example:
POST /v1/redirect/get {
    "target": "/v1/query",
    "body": {
        "q": "SELECT 'foo' AS bar"
    }
}
which returns
[{"rowName":"result","columns":[["bar","foo","-Inf"]]}]