The `racemodel` Module

`gen_cdf`	Estimate the cumulative frequency polygon from response time data.
`gen_cdfs_from_list`	Estimate the empirical CDFs for a list of arrays.
`gen_percentiles`	Calculate n equally spaced percentiles.
`get_percentiles_from_cdf`	Interpolate the percentile boundaries.
`gen_step_fun`	Generate a step function from an observed response time distribution.

Race model inequality analysis implementation, based on Ulrich, Miller, and Schröter (2007): ‘Testing the race model inequality: An algorithm and computer programs’, published in Behavior Research Methods 39 (2), pp. 291-302.

pphelper.racemodel.gen_cdf(rts, t_max=None)

Estimate the cumulative frequency polygon from response time data.

Parameters:	rts (array_like) – The raw response time data. The data do not need to be ordered and may contain duplicate values.
	t_max (int, optional) – The time point (in milliseconds) up to which the CDF should be calculated. If not specified, the maximum value of the supplied input data will be used.
Returns:	A Series containing the estimated cumulative frequency polygon, indexed by the time points in ms.
Return type:	DataFrame or Series

Notes

Response times will be rounded to a resolution of 1 millisecond. The algorithm is heavily adapted from the one described by Ulrich, Miller, and Schröter (2007): ‘Testing the race model inequality: An algorithm and computer programs’, published in Behavior Research Methods 39 (2), pp. 291-302.

Examples

>>> from pphelper.racemodel import gen_cdf
>>> import numpy as np
>>> RTs = np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280])
>>> gen_cdf(RTs, t_max=RTs.max())
t
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
...
266    0.856643
267    0.863636
268    0.870629
269    0.877622
270    0.884615
271    0.892308
272    0.900000
273    0.907692
274    0.915385
275    0.923077
276    0.930769
277    0.938462
278    0.946154
279    0.953846
280    1.000000
Length: 281, dtype: float64
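
The polygon estimate can be reproduced without pphelper. The following is a minimal sketch in plain Python, assuming the formula of Ulrich, Miller, and Schröter (2007): for the distinct sorted RTs d_i with counts c_i, the CDF rises linearly between the midpoints of successive cumulative counts. The name `cdf_polygon` is hypothetical and not part of pphelper. Because `gen_cdf` is described as heavily adapted from that algorithm, its values may differ in detail, particularly near the smallest RT.

```python
from collections import Counter


def cdf_polygon(rts, t_max=None):
    """Cumulative frequency polygon evaluated at 0, 1, ..., t_max (in ms).

    Hypothetical helper, not part of pphelper.
    """
    rts = [int(round(rt)) for rt in rts]    # 1 ms resolution
    n = len(rts)
    counts = Counter(rts)
    d = sorted(counts)                      # distinct response times
    c = [counts[v] for v in d]              # number of occurrences of each
    s = [0]                                 # observations before each d[i]
    for ci in c[:-1]:
        s.append(s[-1] + ci)
    if t_max is None:
        t_max = d[-1]

    cdf = []
    i = 0
    for t in range(t_max + 1):
        if t < d[0]:
            cdf.append(0.0)
        elif t >= d[-1]:
            cdf.append(1.0)
        else:
            while t >= d[i + 1]:            # advance to the segment containing t
                i += 1
            frac = (t - d[i]) / (d[i + 1] - d[i])
            cdf.append((s[i] + c[i] / 2 + (c[i] + c[i + 1]) / 2 * frac) / n)
    return cdf


rts = [234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280]
cdf = cdf_polygon(rts)
print(len(cdf))               # 281
print(round(cdf[266], 6))     # 0.856643
print(round(cdf[279], 6))     # 0.953846
print(cdf[280])               # 1.0
```

For this data set the sketch gives the same values at t = 266 to 280 as the tail of the `gen_cdf` output above.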

pphelper.racemodel.gen_cdfs_from_dataframe(data, rt_column=u'RT', modality_column=u'Modality', names=None)

Create cumulative distribution functions (CDFs) for response time data.

Parameters:	data (DataFrame) – A DataFrame containing at least two columns: one with response times, and another one specifying the corresponding modalities.
	rt_column (string, optional) – The name of the column containing the response times. Defaults to `RT`.
	modality_column (string, optional) – The name of the column containing the modalities corresponding to the response times. Defaults to `Modality`.
	names (list, optional) – A list of length 4, supplying the names of the modalities. The first three elements specify the modalities in the input data to consider. These three and the fourth argument are also used to label the columns in the returned DataFrame. If this argument is not supplied, a default list `['A', 'B', 'AB']` will be used.
Returns:	results – A DataFrame containing the empirical cumulative distribution functions generated from the input, one CDF per column. The number of columns depends on the number of unique values in the modality_column or on the names argument.
Return type:	DataFrame

Notes

This function calls gen_cdf internally. See gen_cdf for additional optional keyword arguments.

Examples

>>> from pphelper.racemodel import gen_cdfs_from_dataframe
>>> import pandas as pd
>>> import numpy as np
>>> data = pd.DataFrame({'RT': np.array([244, 249, 257, 260, 264, 268, 271, 274, 277, 291,
... 245, 246, 248, 250, 251, 252, 253, 254, 255, 259, 263, 265, 279, 282, 284, 319,
... 234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280]),
... 'Modality': ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x',
... 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y',
... 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', ]})
>>> gen_cdfs_from_dataframe(data)
            x         y  z
t
0    0.000000  0.000000  0
1    0.000000  0.000000  0
2    0.000000  0.000000  0
3    0.000000  0.000000  0
4    0.000000  0.000000  0
5    0.000000  0.000000  0
6    0.000000  0.000000  0
7    0.000000  0.000000  0
8    0.000000  0.000000  0
9    0.000000  0.000000  0
10   0.000000  0.000000  0
11   0.000000  0.000000  0
12   0.000000  0.000000  0
13   0.000000  0.000000  0
14   0.000000  0.000000  0
15   0.000000  0.000000  0
16   0.000000  0.000000  0
17   0.000000  0.000000  0
18   0.000000  0.000000  0
19   0.000000  0.000000  0
20   0.000000  0.000000  0
21   0.000000  0.000000  0
22   0.000000  0.000000  0
23   0.000000  0.000000  0
24   0.000000  0.000000  0
25   0.000000  0.000000  0
26   0.000000  0.000000  0
27   0.000000  0.000000  0
28   0.000000  0.000000  0
29   0.000000  0.000000  0
..        ...       ... ..
290  0.942857  0.916964  1
291  1.000000  0.918750  1
292  1.000000  0.920536  1
293  1.000000  0.922321  1
294  1.000000  0.924107  1
295  1.000000  0.925893  1
296  1.000000  0.927679  1
297  1.000000  0.929464  1
298  1.000000  0.931250  1
299  1.000000  0.933036  1
300  1.000000  0.934821  1
301  1.000000  0.936607  1
302  1.000000  0.938393  1
303  1.000000  0.940179  1
304  1.000000  0.941964  1
305  1.000000  0.943750  1
306  1.000000  0.945536  1
307  1.000000  0.947321  1
308  1.000000  0.949107  1
309  1.000000  0.950893  1
310  1.000000  0.952679  1
311  1.000000  0.954464  1
312  1.000000  0.956250  1
313  1.000000  0.958036  1
314  1.000000  0.959821  1
315  1.000000  0.961607  1
316  1.000000  0.963393  1
317  1.000000  0.965179  1
318  1.000000  0.966964  1
319  1.000000  1.000000  1

[320 rows x 3 columns]

pphelper.racemodel.gen_cdfs_from_list(data, t_max=None, names=None, return_type=u'dataframe')

Estimate the empirical CDFs for a list of arrays.

This is a convenience function that wraps gen_cdf.

Parameters:	data (list of array_like objects) – A list of raw response time arrays. The RTs do not have to be ordered and may contain duplicate values.
	t_max (int, optional) – The time point (in milliseconds) up to which the CDFs should be calculated. If not specified, the maximum value of the supplied input data will be used.
	names (list, optional) – Names for the CDFs, one per array in data. In the example below they label the columns of the returned DataFrame.
	return_type ({'dataframe', 'list'}) – The format of the returned object. `dataframe` returns a DataFrame, `list` returns a list of Series.
Returns:	The estimated empirical CDFs as columns of a DataFrame (default) or as a list of Series (if return_type='list').
Return type:	DataFrame or list of Series
Raises:	`ValueError` – If the names parameter does not have the same length as the data list.

Examples

>>> from pphelper.racemodel import gen_cdfs_from_list
>>> import numpy as np
>>> RTs = [np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270,
... 280]), np.array([244, 249, 257, 260, 264, 268, 271, 274, 277, 291])]
>>> gen_cdfs_from_list(RTs, names=['CondA', 'CondB'])
        CondA     CondB
t
0    0.000000  0.000000
1    0.000000  0.000000
2    0.000000  0.000000
3    0.000000  0.000000
4    0.000000  0.000000
5    0.000000  0.000000
6    0.000000  0.000000
7    0.000000  0.000000
8    0.000000  0.000000
9    0.000000  0.000000
10   0.000000  0.000000
11   0.000000  0.000000
12   0.000000  0.000000
13   0.000000  0.000000
14   0.000000  0.000000
15   0.000000  0.000000
16   0.000000  0.000000
17   0.000000  0.000000
18   0.000000  0.000000
19   0.000000  0.000000
20   0.000000  0.000000
21   0.000000  0.000000
22   0.000000  0.000000
23   0.000000  0.000000
24   0.000000  0.000000
25   0.000000  0.000000
26   0.000000  0.000000
27   0.000000  0.000000
28   0.000000  0.000000
29   0.000000  0.000000
..        ...       ...
262  0.828671  0.400000
263  0.835664  0.425000
264  0.842657  0.450000
265  0.849650  0.475000
266  0.856643  0.500000
267  0.863636  0.525000
268  0.870629  0.550000
269  0.877622  0.583333
270  0.884615  0.616667
271  0.892308  0.650000
272  0.900000  0.683333
273  0.907692  0.716667
274  0.915385  0.750000
275  0.923077  0.783333
276  0.930769  0.816667
277  0.938462  0.850000
278  0.946154  0.857143
279  0.953846  0.864286
280  1.000000  0.871429
281  1.000000  0.878571
282  1.000000  0.885714
283  1.000000  0.892857
284  1.000000  0.900000
285  1.000000  0.907143
286  1.000000  0.914286
287  1.000000  0.921429
288  1.000000  0.928571
289  1.000000  0.935714
290  1.000000  0.942857
291  1.000000  1.000000

[292 rows x 2 columns]

pphelper.racemodel.gen_percentiles(n=10)

Calculate n equally spaced percentiles.

Parameters:	n (int, optional) – The number of percentiles to generate. Defaults to 10. Floats will be rounded.
Returns:	p – 1-dimensional array of the calculated percentiles.
Return type:	ndarray
Raises:	`TypeError` – If the supplied percentile number could not be converted to a rounded integer.

See also

get_percentiles_from_cdf()

Examples

>>> from pphelper.racemodel import gen_percentiles
>>> gen_percentiles()
array([ 0.05,  0.15,  0.25,  0.35,  0.45,  0.55,  0.65,  0.75,  0.85,  0.95])
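
The default output above is consistent with taking the midpoints of n equal-width bins on [0, 1], that is p_i = (i - 0.5) / n for i = 1, ..., n. The sketch below implements that rule in plain Python. The rule is inferred from the example output rather than from the pphelper source, and the function name `equally_spaced_percentiles` is hypothetical.

```python
def equally_spaced_percentiles(n=10):
    """Midpoints of n equal-width bins on [0, 1] (hypothetical helper)."""
    n = int(round(n))                       # floats are rounded to an integer
    return [(i + 0.5) / n for i in range(n)]


print(equally_spaced_percentiles(5))        # [0.1, 0.3, 0.5, 0.7, 0.9]
```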

pphelper.racemodel.gen_step_fun(rts)

Generate a step function from an observed response time distribution.

Parameters:	rts (array_like) – The input data (usually response times) to generate a step function from. Does not have to be ordered and may contain duplicates.
Returns:	A Series of the ordered response times (smallest to largest), indexed by their respective percentiles.
Return type:	Series

Examples

>>> from pphelper.racemodel import gen_step_fun
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> RTs = np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280])
>>> sf = gen_step_fun(RTs)
>>> plt.step(sf, sf.index, where='post'); plt.show()

pphelper.racemodel.get_percentiles_from_cdf(cdf, p=None, num_p=10, time_index=u't')

Interpolate the percentile boundaries.

Parameters:	cdf (Series) – The cumulative distribution polygon. Usually generated by gen_cdf().
	p (array_like, optional) – The percentiles for which to get values from the polygon. Usually generated by gen_percentiles(). If this is supplied, the num_p argument will be ignored.
	num_p (int, optional) – The number of equally spaced percentiles to generate. Will be ignored if p is supplied. Defaults to 10.
	time_index (str, optional) – The name of the index storing the time (in milliseconds). This will only be used if the supplied CDF is a pandas Series with a MultiIndex. Defaults to `t`.
Returns:	Returns a Series of interpolated percentile boundaries (fictive response times).
Return type:	Series
Raises:	`TypeError` – If the supplied percentile object could not be cast into an array, or if the CDF object is not a Series.

Examples

>>> from pphelper.racemodel import gen_cdf, gen_percentiles, get_percentiles_from_cdf
>>> import numpy as np
>>> RTs = np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280])
>>> cdf = gen_cdf(RTs)
>>> percentiles = gen_percentiles(5)
>>> get_percentiles_from_cdf(cdf, percentiles)
p
0.1    237.20
0.3    241.35
0.5    245.00
0.7    255.20
0.9    272.00
dtype: float64
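
The interpolation step can be illustrated in plain Python. The sketch below is not the pphelper implementation: it assumes the polygon is described by its knots (the distinct RTs and the midpoint cumulative proportions at those RTs, following Ulrich, Miller, and Schröter, 2007) and inverts it by linear interpolation. The name `percentiles_from_cdf` is hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def percentiles_from_cdf(times, cdf, p):
    """Time at which a piecewise-linear, strictly increasing CDF reaches each p."""
    out = []
    for q in p:
        j = bisect_left(cdf, q)             # first knot with cdf[j] >= q
        if j == 0:
            out.append(float(times[0]))     # q lies below the first knot
        elif j == len(cdf):
            out.append(float(times[-1]))    # q lies above the last knot
        else:
            t0, t1 = times[j - 1], times[j]
            g0, g1 = cdf[j - 1], cdf[j]
            out.append(t0 + (q - g0) / (g1 - g0) * (t1 - t0))
    return out


rts = [234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280]
counts = Counter(rts)
times = sorted(counts)                      # knots: the distinct RTs
n = len(rts)
cdf, seen = [], 0
for t in times:
    cdf.append((seen + counts[t] / 2) / n)  # midpoint cumulative proportion
    seen += counts[t]

result = percentiles_from_cdf(times, cdf, [0.1, 0.3, 0.5, 0.7, 0.9])
print([round(t, 2) for t in result])        # [237.2, 241.35, 245.0, 255.2, 272.0]
```

For this data set the sketch gives the same boundaries as the `get_percentiles_from_cdf` output above.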

pphelper.racemodel.sum_cdfs(cdfs)

Calculate the sum of multiple cumulative distribution functions.

Parameters:	cdfs (list) – A list of CDFs generated with `gen_cdf`, `gen_cdfs_from_list`, or `gen_cdfs_from_dataframe`.
Returns:	The sum of the CDFs, capped so that it stays in the interval [0, 1], indexed by the time in milliseconds.
Return type:	Series
Raises:	`ValueError` – If the supplied CDFs have unequal lengths. `IndexError` – If the indices of the supplied CDF Series objects do not match.

Notes

First calculates the sum of the CDFs, then returns the element-wise minimum min(sum, 1).

Examples

>>> from pphelper.racemodel import gen_cdfs_from_list, sum_cdfs
>>> import numpy as np
>>> RTs = [np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280]), np.array([244, 249, 257, 260, 264, 268, 271, 274, 277, 291])]
>>> cdfs = gen_cdfs_from_list(RTs, names=['A', 'B'])
>>> sum_cdfs([cdfs['A'], cdfs['B']])
t
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
...
277    1
278    1
279    1
280    1
281    1
282    1
283    1
284    1
285    1
286    1
287    1
288    1
289    1
290    1
291    1
Length: 292, dtype: float64
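
The capped sum is the upper bound in the race model inequality, G_AB(t) ≤ G_A(t) + G_B(t): wherever the CDF of the redundant condition lies above the bound, the inequality is violated. The sketch below shows the capping and the comparison in plain Python. The function name `capped_cdf_sum` is hypothetical, the CDF values are made up for illustration, and the pphelper function operates on pandas Series rather than lists.

```python
def capped_cdf_sum(cdfs):
    """Element-wise min(sum of CDFs, 1) for equally long sequences."""
    if len({len(c) for c in cdfs}) != 1:
        raise ValueError("All CDFs must have the same length.")
    return [min(sum(values), 1.0) for values in zip(*cdfs)]


# Made-up CDFs, all sampled at the same four time points.
cdf_a = [0.0, 0.25, 0.5, 1.0]
cdf_b = [0.0, 0.125, 0.5, 0.75]
cdf_ab = [0.0, 0.5, 0.9, 1.0]

bound = capped_cdf_sum([cdf_a, cdf_b])
print(bound)                                # [0.0, 0.375, 1.0, 1.0]

# Time points at which the redundant-condition CDF exceeds the bound.
violations = [g > b for g, b in zip(cdf_ab, bound)]
print(violations)                           # [False, True, False, False]
```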
