Querying BIDS data

One very useful function to be able to perform on any large dataset is to be able to pull out or find the data corresponding to a set of conditions you may have. BIDSHandler allows you to query data in a simple manner and extract a list of objects that you require. To understand how this works we can look at the function used:

>>> BIDSTree.query(obj, token, condition, value)

For the full details on allowed values etc see bidshandler.QueryMixin.QueryMixin.query(). We will however summarise the process and meanings here.

  • obj is the name of the object type you want and can be one of ‘project’, ‘subject’, ‘session’ or ‘scan’. The object name must be the of the same level as the object you are querying or lower.

  • token is the name of the data you want to compare. This can be a value like ‘task’, ‘group’, or ‘sessions’ for example, or you can use any value that is used as a key in the sidecar.json file.

    To see the full list you can see the docstring by entering

    >>> from BIDSHandler import BIDSTree
    >>> help(BIDSTree.query)
    
  • condition is one of (‘<’, ‘<=’, ‘=’, ‘!=’, ‘!!=’, ‘=>’, ‘>’) where ‘!!=’ here means none equal. This is distinctly different from ‘!=’ in that if you ask for the list of subjects that do not contain ‘task’==’resting’ using ‘!=’ you will get the list of all subjects that contains tasks that aren’t ‘task’==’resting’. ‘!!=’ will however return the list of subjects that have no instances of ‘task’==’resting’.

    The condition must also be valid for the data type. So for example using a condition of < for a value that is a string will not return any meaningful result or may raise an error.

  • value is the value to check against.

    This value can either be the actual value in the file name, so for token = ‘task’, the value will be the string in task-value. For token’s such as age it will be the age found in the participant.tsv.

Because of the structure of the arguments, it is quite simple to construct queries, as well as read what it is expected that they ask.

All BIDS objects can be queried, and only queries which make sense can be made on these objects. For example the question “what are all the projects in this session that have one subject” obviously makes no sense and will raise an error.

Queries can also be componded to allow for multiple queries to be chained. This is possible because the object that the query function returns is actually a QueryList object which has all the functionality of a normal python list, however it also has a query method which can be called to apply a query to every member of the list.

Examples

Task: Find a list of all subjects that are Female.

>>> folder2.query('subject', 'sex', '=', 'F')

If we let subjects 1 and 3 be female, then this would return the list of those two Subject objects.

Task: Find all the projects containing recordings after the 1st of January 2018:

>>> folder2.query('project', 'rec_date_', '>=', '2018-01-01')

When querying recording dates the date value must be in the format YYYY-MM-DD when specifying a particular date.

The date can also be specified to the second if need be by using a date value of the format “%Y-%m-%dT%H:%M:%S” (the previous date format is “%Y-%m-%d”).

Task: Find all the subjects that do not have any resting state tasks:

>>> folder2.query('subject', 'task', '!!=', 'restingstate')

Where a key-value pair in the BIDS filename of task-restingstate corresponds to resting state data.

Task: Find all the subjects that are female and that have 1 session: First we will find all the subjects that are female, then we will find all the subjects in this list that have only one session.

>>> subjects = folder2.query('subject', 'sex', '=', 'F')
>>> subjects.query('subject', 'sessions', '=', 1)