The MongoDB Dataset is a read-only dataset backed by a MongoDB collection. It acts as a bridge between MLDB and MongoDB, allowing MLDB SQL queries to run directly over a MongoDB collection.
A new dataset of type mongodb.dataset named <id> can be created as follows:
mldb.put("/v1/datasets/"+<id>, {
    "type": "mongodb.dataset",
    "params": {
        "uriConnectionScheme": <string>,
        "collection": <string>
    }
})
with the following key-value definitions for params:
Field, Type, Default | Description |
---|---|
uriConnectionScheme, string | MongoDB connection URI, of the form mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database]] |
collection, string | The MongoDB collection to query |
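The connection string pattern above can be assembled programmatically. The following is a minimal sketch in plain Python; build_mongodb_uri and dataset_config are hypothetical helper names used for illustration only (they are not part of MLDB or pymldb), and the host, port and database are placeholders.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(hosts, database=None, username=None, password=None):
    """Assemble mongodb://[username:password@]host1[:port1][,...][/[database]].

    hosts is a list of (host, port) pairs; port may be None.
    Hypothetical helper, for illustration only.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    credentials = ""
    if username is not None and password is not None:
        # Percent-encode credentials so characters such as '@' or ':'
        # cannot be confused with the URI delimiters.
        credentials = "{}:{}@".format(quote_plus(username), quote_plus(password))
    host_list = ",".join(
        host if port is None else "{}:{}".format(host, port)
        for host, port in hosts
    )
    suffix = "" if database is None else "/" + database
    return "mongodb://" + credentials + host_list + suffix


def dataset_config(uri, collection):
    """Build the request body for a mongodb.dataset (hypothetical helper)."""
    return {
        "type": "mongodb.dataset",
        "params": {"uriConnectionScheme": uri, "collection": collection},
    }


# Placeholder host, port and database.
uri = build_mongodb_uri([("somehost.mldb.ai", 11712)], database="zips")
config = dataset_config(uri, "zips")
print(uri)
```

The resulting config dictionary has the same shape as the body passed to mldb.put above, so it could be supplied as the second argument of that call.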
For this example, we will use a MongoDB database populated with data provided by the book MongoDB in Action. The zipped JSON file is available at http://mng.bz/dOpd.
Here we create a dataset named mongodb_zips_bridge.
mldb.put('/v1/datasets/mongodb_zips_bridge', {
    'type': 'mongodb.dataset',
    'params': {
        'uriConnectionScheme': 'mongodb://somehost.mldb.ai:11712/zips',
        'collection': 'zips'
    }
})
We can directly query it.
mldb.query("SELECT * NAMED zip FROM mongodb_zips_bridge ORDER BY pop DESC LIMIT 5")
_rowName | _id | city | loc.x | loc.y | pop | state | zip |
---|---|---|---|---|---|---|---|
60623 | 57d2f5eb21af5ee9c4e22302 | CHICAGO | 87.715700 | 41.849015 | 112047 | IL | 60623 |
11226 | 57d2f5eb21af5ee9c4e24f28 | BROOKLYN | 73.956985 | 40.646694 | 111396 | NY | 11226 |
10021 | 57d2f5eb21af5ee9c4e24e7f | NEW YORK | 73.958805 | 40.768476 | 106564 | NY | 10021 |
10025 | 57d2f5eb21af5ee9c4e24e4f | NEW YORK | 73.968312 | 40.797466 | 100027 | NY | 10025 |
90201 | 57d2f5eb21af5ee9c4e21258 | BELL GARDENS | 118.172050 | 33.969177 | 99568 | CA | 90201 |
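To make the query's semantics concrete, the sketch below emulates NAMED zip, ORDER BY pop DESC and LIMIT over a few in-memory documents in plain Python. It is not how MLDB executes the query. The flatten and top_by_pop helpers are hypothetical, the sample values are copied from the output above, and the nested shape of loc (an object with x and y keys) and the string type of zip are assumptions inferred from the loc.x and loc.y column names and the row names.

```python
def flatten(document, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. loc.x."""
    row = {}
    for key, value in document.items():
        name = prefix + key
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row


def top_by_pop(documents, limit):
    """Emulate SELECT * NAMED zip ... ORDER BY pop DESC LIMIT <limit>."""
    rows = [flatten(doc) for doc in documents]
    rows.sort(key=lambda row: row["pop"], reverse=True)
    # NAMED zip: the value of the zip column becomes the row name.
    return [(row["zip"], row) for row in rows[:limit]]


# Sample documents: values from the result table, nesting of loc assumed.
documents = [
    {"city": "BROOKLYN", "loc": {"x": 73.956985, "y": 40.646694},
     "pop": 111396, "state": "NY", "zip": "11226"},
    {"city": "BELL GARDENS", "loc": {"x": 118.172050, "y": 33.969177},
     "pop": 99568, "state": "CA", "zip": "90201"},
    {"city": "CHICAGO", "loc": {"x": 87.715700, "y": 41.849015},
     "pop": 112047, "state": "IL", "zip": "60623"},
]

result = top_by_pop(documents, limit=2)
for row_name, row in result:
    print(row_name, row["city"], row["pop"])
```

With these three documents and a limit of 2, the two most populous entries are returned in descending order of pop, each keyed by its zip value.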