The racemodel Module

gen_cdf Estimate the cumulative frequency polygon from response time data.
gen_cdfs_from_list Estimate the empirical CDFs for a list of arrays.
gen_percentiles Calculate n equally spaced percentiles.
get_percentiles_from_cdf Interpolate the percentile boundaries.
gen_step_fun Generate a step function from an observed response time distribution.

Race model inequality analysis implementation, based on Ulrich, Miller, and Schröter (2007): ‘Testing the race model inequality: An algorithm and computer programs’, published in Behavior Research Methods 39 (2), pp. 291-302.

pphelper.racemodel.gen_cdf(rts, t_max=None)

Estimate the cumulative frequency polygon from response time data.

Parameters:
  • rts (array_like) – The raw response time data. Data does not need to be ordered and may contain duplicate values.
  • t_max (int, optional) – Up to which time point (in milliseconds) the model should be calculated. If not specified, the maximum value of the supplied input data will be used.
Returns:

A Series containing the estimated cumulative frequency polygon, indexed by the time points in ms.

Return type:

DataFrame or Series

Notes

Response times will be rounded to 1 millisecond. The algorithm is heavily adapted from the one described by Ulrich, Miller, and Schröter (2007): ‘Testing the race model inequality: An algorithm and computer programs’, published in Behavior Research Methods 39 (2), pp. 291-302.

Examples

>>> from pphelper.racemodel import gen_cdf
>>> import numpy as np
>>> RTs = np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280])
>>> gen_cdf(RTs, t_max=RTs.max())
t
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
...
266    0.856643
267    0.863636
268    0.870629
269    0.877622
270    0.884615
271    0.892308
272    0.900000
273    0.907692
274    0.915385
275    0.923077
276    0.930769
277    0.938462
278    0.946154
279    0.953846
280    1.000000
Length: 281, dtype: float64
pphelper.racemodel.gen_cdfs_from_dataframe(data, rt_column=u'RT', modality_column=u'Modality', names=None)

Create cumulative distribution functions (CDFs) for response time data.

Parameters:
  • data (DataFrame) – A DataFrame with containing at least two columns: one with response times, and another one specifying the corresponding modalities.
  • rt_column (string, optional) – The name of the column containing the response times. Defaults to RT.
  • modality_column (string, optional) – The name of the column containing the modalities corresponding to the response times. Defaults to Modality.
  • names (list, optional) – A list of length 4, supplying the names of the modalities. The first three elements specify the modalities in the input data to consider. These three and the fourth argument are also used to label the columns in the returned DataFrame. If this argument is not supplied, a default list ['A', 'B', 'AB'] will be used.
Returns:

results – A DataFrame containing the empirical cumulative distribution functions generated from the input, one CDF per column. The number of columns depends on the number of unique values in the modality_column or on the names argument,

Return type:

DataFrame

Notes

This function internally calls gen_cdf. Please see this function to find out about additional optional keyword arguments.

Examples

>>> from pphelper.racemodel import gen_cdfs_from_dataframe
>>> import pandas as pd
>>> import numpy as np
>>> data = pd.DataFrame({'RT': np.array([244, 249, 257, 260, 264, 268, 271, 274, 277, 291,
... 245, 246, 248, 250, 251, 252, 253, 254, 255, 259, 263, 265, 279, 282, 284, 319,
... 234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280]),
... 'Modality': ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x',
... 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y',
... 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', 'z', ]})
>>> gen_cdfs_from_dataframe(data)
            x         y  z
t
0    0.000000  0.000000  0
1    0.000000  0.000000  0
2    0.000000  0.000000  0
3    0.000000  0.000000  0
4    0.000000  0.000000  0
5    0.000000  0.000000  0
6    0.000000  0.000000  0
7    0.000000  0.000000  0
8    0.000000  0.000000  0
9    0.000000  0.000000  0
10   0.000000  0.000000  0
11   0.000000  0.000000  0
12   0.000000  0.000000  0
13   0.000000  0.000000  0
14   0.000000  0.000000  0
15   0.000000  0.000000  0
16   0.000000  0.000000  0
17   0.000000  0.000000  0
18   0.000000  0.000000  0
19   0.000000  0.000000  0
20   0.000000  0.000000  0
21   0.000000  0.000000  0
22   0.000000  0.000000  0
23   0.000000  0.000000  0
24   0.000000  0.000000  0
25   0.000000  0.000000  0
26   0.000000  0.000000  0
27   0.000000  0.000000  0
28   0.000000  0.000000  0
29   0.000000  0.000000  0
..        ...       ... ..
290  0.942857  0.916964  1
291  1.000000  0.918750  1
292  1.000000  0.920536  1
293  1.000000  0.922321  1
294  1.000000  0.924107  1
295  1.000000  0.925893  1
296  1.000000  0.927679  1
297  1.000000  0.929464  1
298  1.000000  0.931250  1
299  1.000000  0.933036  1
300  1.000000  0.934821  1
301  1.000000  0.936607  1
302  1.000000  0.938393  1
303  1.000000  0.940179  1
304  1.000000  0.941964  1
305  1.000000  0.943750  1
306  1.000000  0.945536  1
307  1.000000  0.947321  1
308  1.000000  0.949107  1
309  1.000000  0.950893  1
310  1.000000  0.952679  1
311  1.000000  0.954464  1
312  1.000000  0.956250  1
313  1.000000  0.958036  1
314  1.000000  0.959821  1
315  1.000000  0.961607  1
316  1.000000  0.963393  1
317  1.000000  0.965179  1
318  1.000000  0.966964  1
319  1.000000  1.000000  1

[320 rows x 3 columns]

pphelper.racemodel.gen_cdfs_from_list(data, t_max=None, names=None, return_type=u'dataframe')

Estimate the empirical CDFs for a list of arrays.

The is a convenience function that wraps gen_cdf.

Parameters:
  • data (list of array_like objects) – A list of raw response time arrays. The RTs do not have to be ordered and may contain duplicate values.
  • t_max (int, optional) – Up to which time point (in milliseconds) the model should be calculated. If not specified, the maximum value of the supplied input data will be used.
  • return_type ({'dataframe', 'list'}) – The format of the returned object. dataframe returns a DataFrame, list returns a list of Series.
Returns:

The estimated empirical CDFs as columns of a DataFrame (default) or as a list of Series (if return_type=’list’).

Return type:

DataFrame or list of Series

Raises:

ValueError – If the name parameter does not have the same lengths as the data list.

Examples

>>> from pphelper.racemodel import gen_cdfs_from_list
>>> import numpy as np
>>> RTs = [np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270,
 280]), np.array([244, 249, 257, 260, 264, 268, 271, 274, 277, 291])]
>>> gen_cdfs_from_list(RTs, names=['CondA', 'CondB'])
        CondA     CondB
t
0    0.000000  0.000000
1    0.000000  0.000000
2    0.000000  0.000000
3    0.000000  0.000000
4    0.000000  0.000000
5    0.000000  0.000000
6    0.000000  0.000000
7    0.000000  0.000000
8    0.000000  0.000000
9    0.000000  0.000000
10   0.000000  0.000000
11   0.000000  0.000000
12   0.000000  0.000000
13   0.000000  0.000000
14   0.000000  0.000000
15   0.000000  0.000000
16   0.000000  0.000000
17   0.000000  0.000000
18   0.000000  0.000000
19   0.000000  0.000000
20   0.000000  0.000000
21   0.000000  0.000000
22   0.000000  0.000000
23   0.000000  0.000000
24   0.000000  0.000000
25   0.000000  0.000000
26   0.000000  0.000000
27   0.000000  0.000000
28   0.000000  0.000000
29   0.000000  0.000000
..        ...       ...
262  0.828671  0.400000
263  0.835664  0.425000
264  0.842657  0.450000
265  0.849650  0.475000
266  0.856643  0.500000
267  0.863636  0.525000
268  0.870629  0.550000
269  0.877622  0.583333
270  0.884615  0.616667
271  0.892308  0.650000
272  0.900000  0.683333
273  0.907692  0.716667
274  0.915385  0.750000
275  0.923077  0.783333
276  0.930769  0.816667
277  0.938462  0.850000
278  0.946154  0.857143
279  0.953846  0.864286
280  1.000000  0.871429
281  1.000000  0.878571
282  1.000000  0.885714
283  1.000000  0.892857
284  1.000000  0.900000
285  1.000000  0.907143
286  1.000000  0.914286
287  1.000000  0.921429
288  1.000000  0.928571
289  1.000000  0.935714
290  1.000000  0.942857
291  1.000000  1.000000

[292 rows x 2 columns]

pphelper.racemodel.gen_percentiles(n=10)

Calculate n equally spaced percentiles.

Parameters:n (int, optional) – The number of percentiles to generate. Defaults to 10. Floats will be rounded.
Returns:p – 1-dimensional array of the calculated percentiles.
Return type:ndarray
Raises:TypeError – If the supplied percentile number could not be converted to a rounded integer.

Examples

>>> from pphelper.racemodel import gen_percentiles
>>> gen_percentiles()
array([ 0.05,  0.15,  0.25,  0.35,  0.45,  0.55,  0.65,  0.75,  0.85,  0.95])
pphelper.racemodel.gen_step_fun(rts)

Generate a step function from an observed response time distribution.

Parameters:rts (array_like) – The input data (usually response times) to generate a step function from. Does not have to be ordered and may contain duplicates.
Returns:A Series of the ordered response times (smallest to largest), indexed by their respective percentiles.
Return type:Series

Examples

>>> from pphelper.racemodel import gen_step_fun
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> RTs = np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280])
>>> sf = gen_step_fun(RTs)
>>> plt.step(sf, sf.index, where='post'); plt.show()
pphelper.racemodel.get_percentiles_from_cdf(cdf, p=None, num_p=10, time_index=u't')

Interpolate the percentile boundaries.

Parameters:
  • cdf (Series) – The cumulative distribution polygon. Usually generated by gen_cdf().
  • p (array_like, optional) – The percentiles for which to get values from the polygon. Usually generated by gen_percentiles(). If this is supplied, the num_p argument will be ignored.
  • num_p (int, optional) – The number of equally spaced percentiles to generate. Will be ignored if p is supplied. Defaults to 10.
  • time_index (str, optional) – The name of the index storing the time (in milliseconds). This will only be used if the supplied CDF is a pandas Series with a MultiIndex. Defaults to t.
Returns:

Returns a Series of interpolated percentile boundaries (fictive response times).

Return type:

Series

Raises:

TypeError – If the supplied percentile object could not be cast into an array, or if the CDF object is not a Series.

Examples

>>> from pphelper.racemodel import gen_cdf, gen_percentiles, get_percentiles_from_cdf
>>> import numpy as np
>>> RTs = np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280])
>>> cdf = gen_cdf(RTs)
>>> percentiles = gen_percentiles(5)
>>> get_percentiles_from_cdf(cdf, percentiles)
p
0.1    237.20
0.3    241.35
0.5    245.00
0.7    255.20
0.9    272.00
dtype: float64
pphelper.racemodel.sum_cdfs(cdfs)

Calculate the sum of multiple cumulative distribution functions.

Parameters:

cdfs (list) – A list of CDFs generated with gen_cdf, gen_cdfs_from_list, or gen_cdfs_from_dataframe.

Returns:

The sum of the CDFs in the interval [0, 1], indexed by the time in milliseconds.

Return type:

Series

Raises:
  • ValueError – If the supplied CDFs have unequal lengths.
  • IndexError – If the indices of the supplied CDF Series objects do not match.

Notes

First calculates the sum of the CDFs, and returns the element-wise minima min[(sum, 1).

Examples

>>> from pphelper.racemodel import gen_cdfs_from_list, sum_cdfs
>>> import numpy as np
>>> RTs = [np.array([234, 238, 240, 240, 243, 243, 245, 251, 254, 256, 259, 270, 280]), np.array([244, 249, 257, 260, 264, 268, 271, 274, 277, 291])]
>>> cdfs = gen_cdfs_from_list(RTs, names=['A', 'B'])
>>> sum_cdfs([cdfs['A'], cdfs['B']])
t
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
...
277    1
278    1
279    1
280    1
281    1
282    1
283    1
284    1
285    1
286    1
287    1
288    1
289    1
290    1
291    1
Length: 292, dtype: float64