# Token Split Function

The token split function is a utility that inserts a separator character before and after specified tokens in a string. This can be useful when preparing bag-of-words data for training. The separator is inserted only where a split character is not already present.

## Configuration

A new function of type tokensplit named <id> can be created as follows:

    mldb.put("/v1/functions/"+<id>, {
        "type": "tokensplit",
        "params": {
            "tokens": <InputQuery>,
            "splitChars": <string>,
            "splitCharToInsert": <string>
        }
    })

with the following key-value definitions for params:

| Field | Type | Default | Description |
|---|---|---|---|
| tokens | InputQuery | | An SQL expression specifying the list of tokens to separate. |
| splitChars | string | "<space>," | A string containing the list of possible split characters. Each character in the string is interpreted as a split character. |
| splitCharToInsert | string | "<space>" | A string containing the split character to insert if none of the characters in splitChars is already present. |
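As a sketch of how these parameters fit together, the request body can be built in plain Python with the defaults made explicit. The helper name `tokensplit_config` is hypothetical and not part of MLDB, and reading the `"<space>,"` default as a space followed by a comma is an assumption based on the table above.

```python
import json


def tokensplit_config(tokens_query, split_chars=" ,", split_char_to_insert=" "):
    """Build the JSON body for a tokensplit function (hypothetical helper).

    The default values mirror the parameter table; " ," assumes that
    "<space>," means a space character followed by a comma.
    """
    if not tokens_query:
        # The table lists no default for tokens, so require it here.
        raise ValueError("tokens must be provided")
    return {
        "type": "tokensplit",
        "params": {
            "tokens": tokens_query,
            "splitChars": split_chars,
            "splitCharToInsert": split_char_to_insert,
        },
    }


config = tokensplit_config("select ':P', '(>_<)', ':-)'")
print(json.dumps(config, indent=4))
```

Inside an MLDB Python environment, the resulting dictionary could then be passed as the second argument of `mldb.put("/v1/functions/"+<id>, config)`.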

## Input and Output Values

The function takes a single input named text that contains the string to parse and returns a single output named output that contains the input string with the split characters inserted.

## Example

As an example, consider a function of type tokensplit defined this way:

    mldb.put("/v1/functions/split_smiley", {
        "type": "tokensplit",
        "params": {
            "tokens": "select ':P', '(>_<)', ':-)'",
            "splitChars": " ",
            "splitCharToInsert": " "
        }
    })


Given this call:

    mldb.get("/v1/query",
        q="select split_smiley({text: ':PGreat day!!! (>_<)(>_<) :P :P :P:-)'}) as x"
    )


the function split_smiley adds a space before and after each emoticon in the token list, leaving untouched any side that is already separated by a space:

    ":P Great day!!! (>_<) (>_<) :P :P :P :-)"
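To make the behavior above concrete, here is a minimal pure-Python sketch that reproduces the result of this example without MLDB. It is an approximation written for illustration, not MLDB's implementation: the name `token_split`, its signature, the longest-token-first matching rule and the single-character separator are all assumptions.

```python
def token_split(text, tokens, split_chars=" ,", split_char_to_insert=" "):
    """Approximate the tokensplit behavior described above (illustrative only).

    Assumes split_char_to_insert is a single character.
    """
    # Assumption: try longer tokens first so a short token never
    # pre-empts a longer one starting at the same position.
    ordered = sorted(tokens, key=len, reverse=True)
    result = ""
    i = 0
    while i < len(text):
        match = next((t for t in ordered if text.startswith(t, i)), None)
        if match is None:
            # Not a token: copy the character through unchanged.
            result += text[i]
            i += 1
            continue
        # Insert a separator before the token unless we are at the start
        # of the string or a split character is already there.
        if result and result[-1] not in split_chars:
            result += split_char_to_insert
        result += match
        i += len(match)
        # Insert a separator after the token unless we are at the end
        # of the string or a split character already follows.
        if i < len(text) and text[i] not in split_chars:
            result += split_char_to_insert
    return result


tokens = [":P", "(>_<)", ":-)"]
text = ":PGreat day!!! (>_<)(>_<) :P :P :P:-)"
print(token_split(text, tokens, split_chars=" "))
# prints: :P Great day!!! (>_<) (>_<) :P :P :P :-)
```

The sketch scans the string once and only inserts a separator on a side of a token that lacks one, which is why the emoticons already surrounded by spaces come through unchanged.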