The MongoDB Import Procedure type is used to import a MongoDB collection into a dataset.
A new procedure of type `mongodb.import` named `<id>` can be created as follows:
mldb.put("/v1/procedures/" + <id>, {
    "type": "mongodb.import",
    "params": {
        "runOnCreation": <bool>,
        "uriConnectionScheme": <string>,
        "collection": <string>,
        "outputDataset": <OutputDatasetSpec>,
        "limit": <int>,
        "offset": <int>,
        "ignoreParsingErrors": <bool>,
        "select": <SqlSelectExpression>,
        "where": <string>,
        "named": <string>
    }
})
with the following key-value definitions for `params`:
Field | Type | Description |
---|---|---|
runOnCreation | bool | If true, the procedure will be run immediately. The response will contain an extra field called |
uriConnectionScheme | string | MongoDB connection scheme, of the form `mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database]]` |
collection | string | The collection to import. |
outputDataset | OutputDatasetSpec | Output dataset configuration. This may refer either to an existing dataset or to a fully specified but non-existing dataset, which will be created by the procedure. |
limit | int | Maximum number of records to process. |
offset | int | Skip the first n records. |
ignoreParsingErrors | bool | If true, any record causing an error will be skipped. Any record containing a BSON regex or BSON internal data type will cause an error. |
select | SqlSelectExpression | Which columns to use. |
where | string | Which records to use to create rows. |
named | string | Row name expression for the output dataset. Note that each row must have a unique name and that names cannot be objects. The default value is `oid()`, which uses each document's `_id` as the row name. |
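The `uriConnectionScheme` value follows the MongoDB connection-string pattern shown in the table. As an illustration, the sketch below assembles such a URI from its parts; `build_mongodb_uri` is a hypothetical helper, not part of MLDB. Credentials are percent-encoded with `urllib.parse.quote_plus` so that characters such as `@` or `:` in a username or password do not break the URI.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(hosts, database=None, username=None, password=None):
    """Hypothetical helper: build a URI of the form
    mongodb://[username:password@]host1[:port1][,host2[:port2],...][/[database]]

    `hosts` is a list of (host, port) tuples; pass None as the port to omit it.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if (username is None) != (password is None):
        raise ValueError("username and password must be given together")

    credentials = ""
    if username is not None:
        # Percent-encode so '@', ':' or '/' in credentials stay unambiguous.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"

    # Comma-separated host list, appending ':port' only when a port is given.
    host_list = ",".join(
        host if port is None else f"{host}:{port}" for host, port in hosts
    )

    uri = "mongodb://" + credentials + host_list
    if database:
        uri += "/" + database
    return uri


print(build_mongodb_uri([("somehost.mldb.ai", 11712)], database="zips"))
```

With the host and database used in the example below, this prints `mongodb://somehost.mldb.ai:11712/zips`.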
For this example, we use a MongoDB database populated with the data provided with the book MongoDB in Action. The zipped JSON file is available at http://mng.bz/dOpd.
Here we import the `zips` collection into an MLDB dataset called `mongodb_zips`.
mldb.post('/v1/procedures', {
    'type': 'mongodb.import',
    'params': {
        # the parameter name is uriConnectionScheme, as listed in the table above
        'uriConnectionScheme': 'mongodb://somehost.mldb.ai:11712/zips',
        'collection': 'zips',
        'outputDataset': {
            'id': 'mongodb_zips',
            'type': 'sparse.mutable'
        }
    }
})
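Because a misspelled key in `params` is easy to overlook, one option is to build the request body in a small helper that rejects names not listed in the parameter table. The sketch below is an illustration only: `mongodb_import_config` is a hypothetical function, and the `where` value `"state = 'WY'"` assumes the field accepts an SQL boolean expression written as a string, which the table's description suggests but does not spell out.

```python
# Parameter names taken from the params table of the mongodb.import procedure.
OPTIONAL_PARAMS = {
    "runOnCreation", "limit", "offset", "ignoreParsingErrors",
    "select", "where", "named",
}


def mongodb_import_config(uri, collection, output_id,
                          dataset_type="sparse.mutable", **optional):
    """Hypothetical helper: return the JSON body for a mongodb.import procedure.

    Raises ValueError on any optional parameter name not in OPTIONAL_PARAMS.
    """
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {
        "uriConnectionScheme": uri,
        "collection": collection,
        "outputDataset": {"id": output_id, "type": dataset_type},
    }
    params.update(optional)  # only validated optional parameters are added
    return {"type": "mongodb.import", "params": params}


config = mongodb_import_config(
    "mongodb://somehost.mldb.ai:11712/zips",
    "zips",
    "mongodb_zips_wy",   # hypothetical output dataset name
    limit=100,
    where="state = 'WY'",  # assumed SQL expression string
)
# In an MLDB notebook, the body would then be posted with:
# mldb.post('/v1/procedures', config)
```

Passing a key such as `connectionScheme` to this helper raises a `ValueError` before any request is sent.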
We can now query the imported data as we would any other MLDB dataset.
mldb.query("SELECT * FROM mongodb_zips LIMIT 5")
_rowName | _id | city | loc.x | loc.y | pop | state |
---|---|---|---|---|---|---|
57d2f5eb21af5ee9c4e27f08 | 57d2f5eb21af5ee9c4e27f08 | BONDURANT | 110.335287 | 43.223798 | 116 | WY |
57d2f5eb21af5ee9c4e27f07 | 57d2f5eb21af5ee9c4e27f07 | KAYCEE | 106.563230 | 43.723625 | 876 | WY |
57d2f5eb21af5ee9c4e27f05 | 57d2f5eb21af5ee9c4e27f05 | CLEARMONT | 106.458071 | 44.661010 | 350 | WY |
57d2f5eb21af5ee9c4e27f03 | 57d2f5eb21af5ee9c4e27f03 | ARVADA | 106.109191 | 44.689876 | 107 | WY |
57d2f5eb21af5ee9c4e27f01 | 57d2f5eb21af5ee9c4e27f01 | COKEVILLE | 110.916419 | 42.057983 | 905 | WY |
Here we did not provide a `named` parameter, so the default `oid()` was used. This is why `_rowName` and `_id` have the same values.
Note also how the `loc` object was imported: the sub-object was flattened into the two columns `loc.x` and `loc.y`.
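The effect on `loc` can be pictured with a small pure-Python function that turns nested objects into dotted column names. This is an illustrative sketch of the behaviour visible in the query output, not MLDB's actual import code; the `flatten` helper is hypothetical, and how MLDB handles arrays or other BSON types is not modelled here. The sample document reuses the BONDURANT row from the output, with `_id` shown as its hex string.

```python
def flatten(document, prefix=""):
    """Hypothetical sketch: flatten nested dicts into dotted column names.

    {"loc": {"x": 1, "y": 2}} becomes {"loc.x": 1, "loc.y": 2}.
    Values that are not dicts (including lists) are copied unchanged.
    """
    columns = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the sub-object, extending the dotted prefix.
            columns.update(flatten(value, prefix=name + "."))
        else:
            columns[name] = value
    return columns


doc = {
    "_id": "57d2f5eb21af5ee9c4e27f08",  # ObjectId shown as its hex string
    "city": "BONDURANT",
    "loc": {"x": 110.335287, "y": 43.223798},
    "pop": 116,
    "state": "WY",
}

# Without a `named` parameter, the example output shows _rowName equal to _id.
row_name = doc["_id"]
columns = flatten(doc)
print(row_name, columns)
```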