MongoDB Dataset

The MongoDB Dataset is a read-only dataset backed by a MongoDB collection. It acts as a bridge between MLDB and MongoDB, allowing MLDB SQL queries to run directly over a MongoDB collection.

Caveat

Configuration

A new dataset of type mongodb.dataset named <id> can be created as follows:

mldb.put("/v1/datasets/"+<id>, {
    "type": "mongodb.dataset",
    "params": {
        "uriConnectionScheme": <string>,
        "collection": <string>
    }
})

with the following key-value definitions for params:

Field (Type, Default): Description

uriConnectionScheme (string): MongoDB connection scheme, of the form mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database]]

collection (string): The collection to import
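The connection scheme above can be assembled from its parts. The following is a minimal sketch, not part of MLDB: build_mongodb_uri and dataset_config are hypothetical helper names introduced here for illustration, and percent-escaping the credentials is an assumption about what a MongoDB URI parser expects when a password contains characters such as '@' or ':'.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(hosts, database=None, username=None, password=None):
    """Assemble a mongodb:// URI from its parts (hypothetical helper)."""
    if not hosts:
        raise ValueError("at least one host is required")
    if (username is None) != (password is None):
        raise ValueError("username and password must be given together")

    # Each host is either a bare name or a (name, port) pair.
    host_parts = []
    for host in hosts:
        if isinstance(host, tuple):
            name, port = host
            host_parts.append(f"{name}:{port}")
        else:
            host_parts.append(host)

    credentials = ""
    if username is not None:
        # Assumption: reserved characters in credentials are percent-escaped.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"

    uri = "mongodb://" + credentials + ",".join(host_parts)
    if database is not None:
        uri += "/" + database
    return uri


def dataset_config(uri, collection):
    """Build the payload passed to mldb.put for a mongodb.dataset."""
    return {
        "type": "mongodb.dataset",
        "params": {"uriConnectionScheme": uri, "collection": collection},
    }


uri = build_mongodb_uri([("somehost.mldb.ai", 11712)], database="zips")
config = dataset_config(uri, "zips")
print(uri)
```

The resulting config dictionary has the same shape as the second argument of the mldb.put call shown above, so it could be passed to it unchanged.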

Example

For this example, we use a MongoDB database populated with the sample data from the book MongoDB In Action. The zipped JSON file is available at http://mng.bz/dOpd.

Here we create a dataset named mongodb_zips_bridge.

mldb.put('/v1/datasets/mongodb_zips_bridge', {
    'type' : 'mongodb.dataset',
    'params' : {
        'uriConnectionScheme': 'mongodb://somehost.mldb.ai:11712/zips',
        'collection': 'zips'
    }
})

We can then query it directly with MLDB SQL.

mldb.query("SELECT * NAMED zip FROM mongodb_zips_bridge ORDER BY pop DESC LIMIT 5")
_rowName  _id                       city          loc.x       loc.y      pop     state  zip
60623     57d2f5eb21af5ee9c4e22302  CHICAGO       87.715700   41.849015  112047  IL
11226     57d2f5eb21af5ee9c4e24f28  BROOKLYN      73.956985   40.646694  111396  NY
10021     57d2f5eb21af5ee9c4e24e7f  NEW YORK      73.958805   40.768476  106564  NY
10025     57d2f5eb21af5ee9c4e24e4f  NEW YORK      73.968312   40.797466  100027  NY
90201     57d2f5eb21af5ee9c4e21258  BELL GARDENS  118.172050  33.969177  99568   CA
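To make the shape of this result easier to follow, here is a rough sketch in plain Python of what the query appears to do: the nested loc sub-document shows up as the dotted columns loc.x and loc.y, rows are ordered by pop in descending order, and each row is named after its zip value. This is an illustration only, not how MLDB implements the query. The functions flatten and top_by_population are hypothetical, and the sample documents reuse values from the output above with the zip field assumed to be a string equal to the row name.

```python
def flatten(document, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def top_by_population(documents, limit):
    """Mimic SELECT * NAMED zip ... ORDER BY pop DESC LIMIT <limit>."""
    rows = [flatten(doc) for doc in documents]
    rows.sort(key=lambda row: row["pop"], reverse=True)
    # Pair each row with its name, taken from the zip column.
    return [(row["zip"], row) for row in rows[:limit]]


# Sample documents; the zip values are assumed to match the row names.
documents = [
    {"city": "BELL GARDENS", "loc": {"x": 118.172050, "y": 33.969177},
     "pop": 99568, "state": "CA", "zip": "90201"},
    {"city": "CHICAGO", "loc": {"x": 87.715700, "y": 41.849015},
     "pop": 112047, "state": "IL", "zip": "60623"},
    {"city": "BROOKLYN", "loc": {"x": 73.956985, "y": 40.646694},
     "pop": 111396, "state": "NY", "zip": "11226"},
]

result = top_by_population(documents, limit=2)
for name, row in result:
    print(name, row["city"], row["loc.x"], row["loc.y"], row["pop"], row["state"])
```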