ekpy.analysis package¶

Submodules¶

ekpy.analysis.core module¶

class ekpy.analysis.core.Data(initializer)¶

Bases: object

Data class for maintaining and manipulating the real data. Typically retrieved via Dataset.get_data().

Parameters: initializer (dict) – a dict (with the form shown below) of the data.

Examples

>>> data
> {0: {
        'definition': {'frequency': {'1067hz'},
                                   'amplitude': {'500ua'},
                                   'nave': {5},
                                   'low_current': {-2.5},
                                   'high_current': {2.5},
                                   'delay': {1},
                                   'time_constant': {'30ms'},
                                   'ramp_rate': {0.05},
                                   'ramp_up_first': {True},
                                   'identifier': {'B1'},
                                   'sensitivity': {'50mv/na'},
                                   'trial': {0}},
        'data': {'H_mean': array([-403.524  , -407.18   , -395.602  , -382.146  , -367.444  , ... ]),
                   'H_std': array([10.39320663,  4.7261697 ,  5.3691951 ,  5.71281577,  5.9829578 , ... ]),
                   'R_mean': array([0.0324881 , 0.03248582, 0.03248772, 0.03248544, 0.03248354, ... ]),
                   'R_std': array([1.69941166e-06, 1.42182981e-06, 1.42182981e-06, 3.31276320e-06, ...]),
                   'Theta_mean': array([0.0324881 , 0.03248582, 0.03248772, 0.03248544, 0.03248354, ...]),
                   'Theta_std': array([0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , ...])}
        }
}

# one can access data and/or definition as an attribute

>>> data.definition
>{'frequency': {'1067hz'},
   'amplitude': {'500ua'},
   'nave': {5},
   'low_current': {-2.5},
   'high_current': {2.5},
   'delay': {1},
   'time_constant': {'30ms'},
   'ramp_rate': {0.05},
   'ramp_up_first': {True},
   'identifier': {'B1'},
   'sensitivity': {'50mv/na'},
   'trial': {0}
}
>>> data.data
> {'H_mean': array([-403.524  , -407.18   , -395.602  , -382.146  , -367.444  , ... ]),
   'H_std': array([10.39320663,  4.7261697 ,  5.3691951 ,  5.71281577,  5.9829578 , ... ]),
   'R_mean': array([0.0324881 , 0.03248582, 0.03248772, 0.03248544, 0.03248354, ... ]),
   'R_std': array([1.69941166e-06, 1.42182981e-06, 1.42182981e-06, 3.31276320e-06, ...]),
   'Theta_mean': array([0.0324881 , 0.03248582, 0.03248772, 0.03248544, 0.03248354, ...]),
   'Theta_std': array([0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , ...])}

# one can also access individual attributes of definition or data:
>>> data.H_mean
> array([-403.524  , -407.18   , -395.602  , -382.146  , -367.444  , ... ])

apply(func: callable, pass_defn: bool = False, pass_trials_iteratively: bool = True, ignore_errors: bool = True, ignore_coerce_warnings: bool = True, **kwargs)¶

Apply func to the data in each index. **kwargs will be passed to func. If func returns None, that piece of data will be dropped.

Parameters

func (callable) – f(dict) -> dict. Function is passed the data_dict for each index.
pass_defn (bool) – Whether or not to pass the definition to func. If True, it will be passed along with the other kwargs.
pass_trials_iteratively (bool) – True for functions which operate on a single trial. False for functions which operate across trials. (Only used for grouped data.)
ignore_errors (bool) – If True, errors in func will be printed, but not raised, and the resulting data will be the original data. If False, errors will be raised.
ignore_coerce_warnings (bool) – Whether or not to ignore coerce warnings in the data_array_builder() class. Most likely you want this False.

Returns

the new data after operating on it

Return type

(Data)

Examples

>>> some_data
>
{
        0: {'definition': {'param1': {'10V'},
                'param2': {'100ns', '10ns', '50ns'},
                'param3': {'1mv', '2mv'}},
                'data': {'raw_data': array([[1, 2, 3],
                          [1, 2, 3],
                          [1, 2, 3],
                          [1, 2, 3]], dtype=int64)}},
        1: {'definition': {'param1': {'5V'}, 'param2': {'100ns'}, 'param3': {'1mv'}},
                'data': {'raw_data': array([1, 2, 3], dtype=int64)}}
}

#some function will square the data

def some_function(data_dict):
        "a function which operates on the data dict and returns a data dict"
        out = dict()
        for key in data_dict:
                out.update({key:data_dict[key]**2})
        return out

>>> some_data.apply(some_function)
>
{
        0: {'definition': {'param1': {'10V'},
                'param2': {'100ns', '10ns', '50ns'},
                'param3': {'1mv', '2mv'}},
                'data': {'raw_data': array([[1, 4, 9],
                  [1, 4, 9],
                  [1, 4, 9],
                  [1, 4, 9]], dtype=int64)}},
        1: {'definition': {'param1': {'5V'}, 'param2': {'100ns'}, 'param3': {'1mv'}},
                'data': {'raw_data': array([1, 4, 9], dtype=int64)}}}

Passing trials iteratively

>>> _dict = {
    0: {
        'data':{'x':np.array([1])},
        'definition':{'param':{'a'}}
    },
    1: {
        'data':{'x':np.array([0])},
        'definition':{'param':{'a'}}
    }
}

>>> data = analysis.Data(_dict)
>>> def subtract_offset(data_dict):
...     return {'x':data_dict['x']-np.mean(data_dict['x'])}

>>> data.apply(subtract_offset)
> {     0: {'data': {'x': array([0.])}, 'definition': {'param': {'a'}}},
                1: {'data': {'x': array([0.])}, 'definition': {'param': {'a'}}}}

>>> data.groupby('param')
> {0:{'data': {'x': array([[1], [0]])}, 'definition': {'param': {'a'}}}}

>>> data.groupby('param').apply(subtract_offset)
> {0: {'data': {'x': array([[0.], [0.]])}, 'definition': {'param': {'a'}}}}

>>> data.groupby('param').apply(subtract_offset, pass_trials_iteratively=False)
> {0: {'data': {'x': array([[ 0.5], [-0.5]])},'definition': {'param': {'a'}}}}
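
The drop-on-None and ignore_errors behaviour described above can be sketched without ekpy. The following is a minimal pure-Python illustration on a plain dict of records. `apply_sketch` and `keep_positive` are hypothetical names, not part of ekpy, and keeping the original index numbers after a drop is an assumption.

```python
def apply_sketch(records, func, ignore_errors=True, **kwargs):
    """Hypothetical stand-in for Data.apply, operating on a plain dict of records."""
    out = {}
    for index, record in records.items():
        try:
            new_data = func(record["data"], **kwargs)
        except Exception as err:
            if not ignore_errors:
                raise
            # Report the error, then fall back to the original data.
            print(f"index {index}: {err!r}")
            new_data = record["data"]
        if new_data is None:
            continue  # a None result drops this index
        out[index] = {"definition": record["definition"], "data": new_data}
    return out


def keep_positive(data_dict, threshold=0):
    """Return None (drop the record) when the mean of 'x' is not above threshold."""
    mean = sum(data_dict["x"]) / len(data_dict["x"])
    return data_dict if mean > threshold else None


records = {
    0: {"definition": {"param": {"a"}}, "data": {"x": [1, 2, 3]}},
    1: {"definition": {"param": {"b"}}, "data": {"x": [-1, -2, -3]}},
    2: {"definition": {"param": {"c"}}, "data": {"x": []}},  # raises ZeroDivisionError
}
result = apply_sketch(records, keep_positive)
```

Index 1 is dropped because the function returned None, while index 2 keeps its original data because the error was ignored.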

collapse(data_key)¶

Return collapsed (numpy.array) data corresponding to data_key. This will return all data for all indices concatenated into a single array.

Parameters: data_key (key) – Key for data you wish to collapse
Returns: Concatenated array of all data corresponding to data_key for all indices in self.
Return type: (numpy.array)
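
As a rough illustration of the concatenation, here is a pure-Python sketch using lists in place of numpy arrays. `collapse_sketch` is a hypothetical name, and the sketch assumes each index holds 1D data; how ekpy treats grouped (2D) data is not covered.

```python
def collapse_sketch(records, data_key):
    """Concatenate records[i]['data'][data_key] over all indices into one list."""
    out = []
    for index in sorted(records):
        out.extend(records[index]["data"][data_key])
    return out


records = {
    0: {"definition": {"trial": {0}}, "data": {"R": [1, 2, 3]}},
    1: {"definition": {"trial": {1}}, "data": {"R": [4, 5]}},
}
collapsed = collapse_sketch(records, "R")
```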

contains(condition)¶

Returns data specified by condition.

Parameters: condition (dict) – key is definition key, value is value to find. Multiple values provided will be joined by logical OR. Multiple keys will be joined with logical AND.
Returns: data satisfying condition
Return type: (Data)

Examples

#Providing multiple values will be joined by logical or, i.e.
Data.contains({'high_voltage_v':{'100mv', '200mv'}})
#will search for all data with '100mv' OR '200mv' for high_voltage_v.

#Multiple keys provided will be joined with logical AND. i.e.
Data.contains({'x':1, 'y':2})
#will search for x = 1 AND y = 2.
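
The OR/AND rule can be sketched in a few lines of plain Python. `contains_sketch` and its helpers are hypothetical names, written under the assumption that definition values are stored as sets, as in the examples above.

```python
def _as_set(value):
    """Wrap a scalar condition value in a set; pass collections through."""
    if isinstance(value, (set, frozenset, list, tuple)):
        return set(value)
    return {value}


def matches(definition, condition):
    """True if every key matches (AND) on at least one of its values (OR)."""
    for key, wanted in condition.items():
        if not _as_set(wanted) & set(definition.get(key, set())):
            return False
    return True


def contains_sketch(records, condition):
    return {i: r for i, r in records.items() if matches(r["definition"], condition)}


records = {
    0: {"definition": {"high_voltage_v": {"100mv"}, "trial": {0}}, "data": {}},
    1: {"definition": {"high_voltage_v": {"200mv"}, "trial": {1}}, "data": {}},
    2: {"definition": {"high_voltage_v": {"300mv"}, "trial": {0}}, "data": {}},
}
either = contains_sketch(records, {"high_voltage_v": {"100mv", "200mv"}})
both = contains_sketch(records, {"high_voltage_v": {"100mv", "200mv"}, "trial": 0})
```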

property data_keys¶

Return a list of keys corresponding to data.

Returns: data keys
Return type: (list)

dropna()¶: Drop nans from data

filter(data_condition_function_dict, definition_condition_dict=None, additional_data_keys_to_filter=None)¶

Filter the data.

Parameters

data_condition_function_dict (dict) – a dict with one entry. key is data key, value is function which operates on a single value.
definition_condition_dict (dict) – a dict specifying specific definition conditions, which when satisfied can allow their data to be operated on. If None, all data will be filtered.
additional_data_keys_to_filter (str or key or array-like) – Additional data keys that you wish to filter based on data_condition_function_dict.

Returns

filtered Data

Return type

(Data)

Examples

Filter the data such that the data key ‘R’ only contains values >10. This leaves all other data keys unchanged.

>>> Data.filter({'R': lambda x: x>10})

Filter data corresponding to an amplitude of ‘500ua’ such that data key ‘R’ contains only values >10.

>>> Data.filter({'R': lambda x: x>10}, {'amplitude':'500ua'})

Filter data based on ‘saturation’ but also filter the ‘switching_time’ data.

>>> Data.filter({'saturation': lambda x: x > .003}, additional_data_keys_to_filter='switching_time')

You may use this method to remove outliers, for example.
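
One way to picture additional_data_keys_to_filter is as a mask that is computed on one data key and then applied to several keys. The sketch below is plain Python on a single record's data dict. `filter_sketch` is a hypothetical name, and the element-wise mask is an assumption about how the filtering is carried out.

```python
def filter_sketch(record_data, condition, additional_keys=()):
    """Keep only the positions where the condition holds for the chosen data key."""
    (key, func), = condition.items()  # the condition dict must have exactly one entry
    mask = [bool(func(value)) for value in record_data[key]]
    out = dict(record_data)
    for k in (key, *additional_keys):
        out[k] = [v for v, keep in zip(record_data[k], mask) if keep]
    return out


data = {
    "saturation": [0.001, 0.004, 0.005],
    "switching_time": [10, 20, 30],
    "other": [1, 2, 3],
}
filtered = filter_sketch(data, {"saturation": lambda x: x > 0.003},
                         additional_keys=("switching_time",))
```

Here 'other' is left untouched because it was not listed as an additional key.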

groupby(key: str)¶

Group data by key. Similar to Dataset.get_data(groupby=key), but allows one to apply functions to individual data files before grouping.

Parameters

data (ekpy.analysis.core.Data) – The data to group
key (str) – Key to group on

Returns

(ekpy.analysis.core.Data)
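
A minimal sketch of what grouping might look like on plain dicts: indices that share the same value for the key are combined, their trials are stacked row by row, and their definitions are unioned. `groupby_sketch` is a hypothetical name, and lists stand in for numpy arrays.

```python
def groupby_sketch(records, key):
    """Stack the data of all indices that share the same definition value for key."""
    groups = {}  # group value -> merged record, in first-seen order
    for index in sorted(records):
        record = records[index]
        group_value = frozenset(record["definition"][key])
        merged = groups.setdefault(group_value, {"definition": {}, "data": {}})
        for k, values in record["definition"].items():
            merged["definition"].setdefault(k, set()).update(values)
        for k, trial in record["data"].items():
            merged["data"].setdefault(k, []).append(list(trial))  # one row per trial
    return dict(enumerate(groups.values()))


records = {
    0: {"definition": {"param": {"a"}, "trial": {0}}, "data": {"x": [1]}},
    1: {"definition": {"param": {"a"}, "trial": {1}}, "data": {"x": [0]}},
}
grouped = groupby_sketch(records, "param")
```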

property iloc¶

An indexer as in pandas .iloc. Usage is Data.iloc[index].

Returns: indexer for indexing
Return type: (iDataIndexer)

Example

>>> Data.iloc[0]

mean(axis=0)¶

Return mean of Data. If 1d data is supplied, mean will be performed over the trial, otherwise mean will be performed across trials.

Returns: (Data)

Examples

# mean over a trial
>>> X = np.array([1,2,3])
>>> data = Data({0:{'definition':{},'data':{'X':X}}})
>>> data.mean().X
> 2.0

# mean across trials
>>> X = np.array([[1,2,3], [3,4,5]])
>>> data = Data({0:{'definition':{},'data':{'X':X}}})
>>> data.mean().X
> array([2., 3., 4.])
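
The 1D versus 2D rule can be reproduced with the standard library alone. The sketch below mirrors the two examples above using lists. `mean_sketch` is a hypothetical helper, not the ekpy implementation.

```python
from statistics import fmean


def mean_sketch(values):
    """Mean over a single trial (1D) or across trials (2D, one row per trial)."""
    if values and isinstance(values[0], (list, tuple)):
        return [fmean(column) for column in zip(*values)]  # across trials
    return fmean(values)  # over the single trial


over_trial = mean_sketch([1, 2, 3])
across_trials = mean_sketch([[1, 2, 3], [3, 4, 5]])
```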

plot(x=None, y=None, ax=None, color=None, cmap='viridis', labelby=None, **kwargs)¶

Plot the data. If ax is provided returns ax, otherwise returns fig, ax.

Parameters

x (key) – data dict key for x axis.
y (key or array-like) – data dict key for y axis
ax (matplotlib.axis) – axis to plot on
color (str) – Color of plot. (Override colormap)
cmap (str) – Color map. See matplotlib.cm.cmaps_listed for allowed colormaps.
labelby (str) – Definition key to use for plot legend.

Returns

fig, the figure of the plot, and ax, the axis of the plot. If ax is provided as an argument, only ax is returned.

Return type

(matplotlib.figure, matplotlib.axis) or (matplotlib.axis)

scatter(x=None, y=None, ax=None, color=None, cmap='viridis', labelby=None, **kwargs)¶

Scatter plot the data. If ax is provided returns ax, otherwise returns fig, ax.

Parameters

x (key) – data dict key for x axis.
y (key or array-like) – data dict key for y axis
ax (matplotlib.axis) – axis to plot on
color (str) – Color of plot. (Override colormap)
cmap (str) – Color map. See matplotlib.cm.cmaps_listed for allowed colormaps.
labelby (str) – Definition key to use for plot legend.

Returns

fig, the figure of the plot, and ax, the axis of the plot. If ax is provided as an argument, only ax is returned.

Return type

(matplotlib.figure, matplotlib.axis) or (matplotlib.axis)

sort(by, key=None, reverse=False)¶

Sort Data by definition key. This might be useful, for example, in plotting with color maps. This method sorts Data across multiple indices; it does not sort the data within an index.

Parameters

by (str or key) – Definition key. The definition key that you are sorting on must be unique for each index in your Data object.
key (function) – Method for accessing value to sort.
reverse (bool) – Reverse order

Returns

Sorted Data

Return type

(Data)

Examples

Sort Data on parameter ‘test’

>>> data = dset.get_data()
>>> data.summary
>
{'param': {'v'},
 'test': {0, 1, 2, 3, 4},
 'test2': {'100mv', '15mv', '1mv', '20mv', '32mv'}}

# Sort by definition key 'test'
>>> data = data.sort(by='test')

# Confirm sorted:
>>> data.test
> {0: {0}, 1: {1}, 2: {2}, 3: {3}, 4: {4}}

# Sort in reverse order:

>>> data = dset.get_data()
>>> data.sort(by = 'test', reverse = True).test
> {0: {4}, 1: {3}, 2: {2}, 3: {1}, 4: {0}}

Sort Data on a key that is not int or float:

>>> data = dset.get_data()
>>> data.summary
>
{'param': {'v'},
 'test': {0, 1, 2, 3, 4},
 'test2': {'100mv', '15mv', '1mv', '20mv', '32mv'}}

# unsorted data
>>> data.test2
> {0: {'1mv'}, 1: {'100mv'}, 2: {'20mv'}, 3: {'15mv'}, 4: {'32mv'}}

# Sort by definition key 'test2'
>>> sort_by = lambda x: float(x.replace('mv', '')) # get rid of 'mv' suffix and convert to float
>>> data = data.sort(by='test2', key=sort_by)
>>> data.test2
> {0: {'1mv'}, 1: {'15mv'}, 2: {'20mv'}, 3: {'32mv'}, 4: {'100mv'}}
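
To see why the key function matters for string-valued definitions, the following pure-Python sketch sorts the same 'test2' values with and without it. `sort_sketch` is a hypothetical stand-in that renumbers indices after sorting, as the outputs above suggest.

```python
def sort_sketch(records, by, key=None, reverse=False):
    """Reorder indices by the (single) definition value stored under `by`."""
    def sort_value(index):
        (value,) = records[index]["definition"][by]  # exactly one value per index
        return key(value) if key else value

    ordered = sorted(records, key=sort_value, reverse=reverse)
    return {new: records[old] for new, old in enumerate(ordered)}


values = ["1mv", "100mv", "20mv", "15mv", "32mv"]
records = {i: {"definition": {"test2": {v}}, "data": {}} for i, v in enumerate(values)}

by_number = sort_sketch(records, "test2", key=lambda x: float(x.replace("mv", "")))
by_string = sort_sketch(records, "test2")  # plain string comparison

numeric_order = [next(iter(r["definition"]["test2"])) for r in by_number.values()]
string_order = [next(iter(r["definition"]["test2"])) for r in by_string.values()]
```

Without the key, the strings are compared character by character, so '100mv' sorts before '15mv'.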

std()¶

Return standard deviation of Data. If 1d data is supplied, std will be performed over the trial, otherwise std will be performed across trials.

Returns: (Data)

property summary¶

Return a summary of the definitions in data

Returns: summary of the data included
Return type: (dict)
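
Conceptually, a summary of this kind is the union of the definition sets over all indices. A minimal sketch, assuming definitions are stored as sets (`summary_sketch` is a hypothetical name):

```python
def summary_sketch(records):
    """Union the definition values of every index, key by key."""
    summary = {}
    for record in records.values():
        for key, values in record["definition"].items():
            summary.setdefault(key, set()).update(values)
    return summary


records = {
    0: {"definition": {"param": {"v"}, "test": {0}}, "data": {}},
    1: {"definition": {"param": {"v"}, "test": {1}}, "data": {}},
    2: {"definition": {"param": {"v"}, "test": {2}}, "data": {}},
}
summary = summary_sketch(records)
```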

to_DataFrame(how='lump_mean', include_defn_keys=[], defn_converter=None)¶

Convert Data to pandas.DataFrame. Each index in Data will correspond to a single row in the resulting DataFrame.

Parameters

how (function) – Method for converting data. f(ndarray, key) -> value. Default ‘lump_mean’ averages all data for each index in Data corresponding to each data key. ndarray is data array corresponding to data key key. how should operate on data corresponding to a single Data index.
include_defn_keys (str, key or array-like) – Definition key(s) to include in resulting dataframe. i.e., each key in include_defn_keys will be a column name with values corresponding to the value for each index in Data.
defn_converter (function or array-like) – Optional. Methods for converting definition values to alternative type, perhaps from str to float.

Returns

(pandas.DataFrame)

Examples

>>> _dict = {
                                0: {
                                        'definition':{
                                                'frequency':{'1khz'},
                                                'amplitude':{'500mv'}
                                        },
                                        'data':{
                                                'R':np.array([1,2,2,2]),
                                                'theta':np.array([0,0,0,0])
                                        },
                                },
                                1: {
                                        'definition':{
                                                'frequency':{'1khz'},
                                                'amplitude':{'800mv'}
                                        },
                                        'data':{
                                                'R':np.array([2,3,3,3]),
                                                'theta':np.array([0,0,0,0])
                                        },
                                }
                        }

>>> data = Data(_dict)
>>> data.to_DataFrame()
>
          R  theta
0  1.75    0.0
1  2.75    0.0

>>> data.to_DataFrame(how=lambda x,key: x[0])
>
          R  theta
0  1.0    0.0
1  2.0    0.0

>>> data.to_DataFrame(include_defn_keys='frequency')
>
          R  theta frequency
0  1.75    0.0      1khz
1  2.75    0.0      1khz

>>> data.to_DataFrame(include_defn_keys=['frequency', 'amplitude'], defn_converter=[
        lambda x: float(x.replace('khz', 'e3')), lambda x: float(x.replace('mv', 'e-3'))
])
>
          R  theta  frequency  amplitude
0  1.75    0.0     1000.0        0.5
1  2.75    0.0     1000.0        0.8
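
The row-building logic behind the last example can be approximated without pandas. The sketch below produces one dict per index (one per DataFrame row), averaging each data key as in 'lump_mean' and converting the requested definition values. `to_rows_sketch` is a hypothetical helper, and the handling of a single converter function is an assumption.

```python
from statistics import fmean


def to_rows_sketch(records, include_defn_keys=(), defn_converter=None):
    """Build one row dict per index: mean of each data key plus chosen definitions."""
    if isinstance(include_defn_keys, str):
        include_defn_keys = [include_defn_keys]
    if defn_converter is None or callable(defn_converter):
        converters = [defn_converter] * len(include_defn_keys)
    else:
        converters = list(defn_converter)

    rows = {}
    for index, record in records.items():
        row = {k: fmean(v) for k, v in record["data"].items()}
        for defn_key, convert in zip(include_defn_keys, converters):
            (value,) = record["definition"][defn_key]  # single value per index
            row[defn_key] = convert(value) if convert else value
        rows[index] = row
    return rows


records = {
    0: {"definition": {"frequency": {"1khz"}, "amplitude": {"500mv"}},
        "data": {"R": [1, 2, 2, 2], "theta": [0, 0, 0, 0]}},
    1: {"definition": {"frequency": {"1khz"}, "amplitude": {"800mv"}},
        "data": {"R": [2, 3, 3, 3], "theta": [0, 0, 0, 0]}},
}
rows = to_rows_sketch(
    records,
    include_defn_keys=["frequency", "amplitude"],
    defn_converter=[lambda x: float(x.replace("khz", "e3")),
                    lambda x: float(x.replace("mv", "e-3"))],
)
```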

to_dict()¶

Return a dict of the Data class.

Returns: a dict class with identical structure
Return type: (dict)

to_ekpdat(file)¶

Save the Data to an .ekpdat file.

Parameters: file (str) – Path to file

class ekpy.analysis.core.Dataset(path, initializer, readfileby=<function read_ekpy_data>, pointercolumn='filename')¶

Bases: object

Dataset class for analysis. Used to manipulate meta data while keeping track of location for the real data, which can be retrieved when necessary.

Parameters

path (str or dict) – Path to the real data.
initializer (pandas.DataFrame or dict) – Initializer for a DataFrame. The meta data.
readfileby (function) – How to read the data. Default is ekpy.utils.read_ekpy_data()
pointercolumn (str or index) – Column name which holds name of file. Default is 'filename'

Examples

>>> meta_data = pd.DataFrame(
        {
                'voltageApplied':['1v','2v'],
                'filename':['t1.csv','t2.csv']
        }
)

#Assuming the data is stored in './data/', create the Dataset
>>> dset = Dataset('./data/', meta_data,)

#Query the meta data for when voltageApplied is '1v':
>>> dset.query('voltageApplied == "1v"')

#return the Data
>>> dset.get_data()

add_calculated_column(column_name, how)¶

Add a calculated column to the Dataset.

Parameters

column_name (str) – The new column name
how (function) – f(self) -> column data. A function which operates on self (pandas.DataFrame) and returns new column data.

Returns

Updated Dataset.

Return type

(Dataset)

Examples

Convert 25um and 10um to measured areas. This will only work if no other diameters are present in the Dataset.

>>> def how(dataframe):
                nominal_diameter_to_measured_area_dict = {'25um':190, '10um':60}
                return [nominal_diameter_to_measured_area_dict[x] for x in dataframe['diameter'].values]

>>> dset.add_calculated_column('measured_area_um', how = how)

add_column(column_name, column_data)¶

Add a column to a Dataset.

Parameters

column_name (str) – The new column name.
column_data (array-like) – The data for the column

property columns¶

filter_on_column(column, function, **kwargs_for_function)¶

Filter Dataset. Keeps rows with column satisfying function.

Parameters

column (str or index) – Specify column
function (function) – Filter function. function(value) -> bool
kwargs_for_function (kwargs) – kwargs to pass to function

Returns

Filtered Dataset.

Return type

(Dataset)

Examples

>>> dset.summary
{
        'identifier': {'185um'},
        'pulsewidth_ns': {10.0, 50.0, 100.0},
        'delay_ns': {10.0, 20.0, 50.0, 200.0, 300.0, 500.0},
        'high_voltage_v': {0.125, 0.25, 0.375, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5},
        'preset_voltage_v': {0.5},
        'preset_pulsewidth_ns': {10000.0},
        'trial': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
}

#Return only rows where high_voltage_v is greater than 1 and print the summary
>>> dset.filter_on_column('high_voltage_v', lambda x: x>1).summary
{   'identifier': {'185um'},
        'pulsewidth_ns': {50.0},
        'delay_ns': {10.0},
        'high_voltage_v': {1.5, 2.0, 2.5},
        'preset_voltage_v': {0.5},
        'preset_pulsewidth_ns': {10000.0},
        'trial': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
}
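
The example above uses a lambda, but the signature also forwards keyword arguments to the filter function. A small sketch of that pattern on a list of row dicts, where `filter_rows_sketch` and `above` are hypothetical names and the rows stand in for the Dataset's meta data:

```python
def filter_rows_sketch(rows, column, function, **kwargs_for_function):
    """Keep the rows whose value in `column` satisfies function(value, **kwargs)."""
    return [row for row in rows if function(row[column], **kwargs_for_function)]


def above(value, threshold=1.0):
    """Filter function with a tunable threshold supplied as a keyword argument."""
    return value > threshold


rows = [{"high_voltage_v": v, "trial": 0} for v in (0.5, 1.0, 1.5, 2.0, 2.5)]
kept = filter_rows_sketch(rows, "high_voltage_v", above, threshold=1.5)
kept_voltages = [row["high_voltage_v"] for row in kept]
```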

get_data(groupby=None, labelby=None)¶

Return the data for the current Dataset as a Data object. If the groupby kwarg is used, the resulting Data will vstack all data corresponding to each group. (See examples.)

Parameters

groupby (str, label, index or array-like of these) – what to group on
labelby (str, label, index or array-like of these) – what to label the output data by. This will change ‘definition’ in the output Data class

Returns

the data

Return type

(Data)

Examples

>>> dset.summary
{
        'identifier': {'185um'},
        'pulsewidth_ns': {10.0, 50.0, 100.0},
        'delay_ns': {10.0, 20.0, 50.0, 200.0, 300.0, 500.0},
        'high_voltage_v': {0.125, 0.25, 0.375, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5},
        'preset_voltage_v': {0.5},
        'preset_pulsewidth_ns': {10000.0},
        'trial': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
}

>>> data = dset.get_data() #no groupby yet
>>> data.data_keys
['time', 'p1', 'p2']

>>> data.iloc[0].p1 # This corresponds to a single trial (1D data)
array([  0.00898495,  0.00765674,  0.00351585, ..., -0.00679731,
                -0.00101569, -0.00039065])

# now we will group by high_voltage_v. There are many .csv files that correspond to such a case
# for example how many are there for a high_voltage_v of .125? (there are 5, see here)
>>> len(dset.query('high_voltage_v == .125'))
5

# let's retrieve the data but with grouping
>>> data = dset.get_data(groupby = 'high_voltage_v')
>>> data.iloc[0]['p1'] #vstack of all different .csv files grouped by high_voltage_v (a meta data parameter)
array([[ 0.00898495,  0.00765674,  0.00351585, ..., -0.00679731,
                -0.00101569, -0.00039065],
           [-0.02172014, -0.02773615, -0.03695549, ..., -0.0203138 ,
                -0.0085943 , -0.00117195],
           [-0.0351585 , -0.02859558, -0.02289209, ..., -0.03093948,
                -0.01968876, -0.01289145],
           [-0.02765802, -0.02453282, -0.02070445, ..., -0.00679731,
                -0.00257829, -0.0093756 ],
           [-0.0375024 , -0.04656548, -0.04930003, ..., -0.03789305,
                -0.02609542, -0.01632917]])

# no longer is it 1D data.
>>> data[0]['data']['p1'].shape
(5, 500)

#recall that there were 5 rows in the Dataset corresponding to a high_voltage_v of .125. Is this that grouping? we can check:
>>> data[0]['definition']
# indeed it is!!!
{
        'identifier': {'185um'},
        'pulsewidth_ns': {10.0},
        'delay_ns': {20.0},
        'high_voltage_v': {0.125},
        'preset_voltage_v': {0.5},
        'preset_pulsewidth_ns': {10000.0},
        'trial': {0, 1, 2, 3, 4}
}

head(*args, **kwargs)¶

property index_to_path¶

Index to path pandas.Series

Returns: Index to path
Return type: (pandas.Series)

property path¶: Return path to data.

property pretty_summary¶: Print a summary of dataset in an easier to read fashion

query(*args, **kwargs)¶

Query the columns of a Dataset with a boolean expression.

Parameters: expr (str) – The query string to evaluate. You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b. You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuation (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named “Area (cm^2)” would be referenced as `Area (cm^2)`.) Column names which are Python keywords (like “list”, “for”, “import”, etc) cannot be used. For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.
Returns: the result of the query
Return type: (Dataset)

remove_index(index)¶

Remove an index or array-like of indices.

Parameters: index (index or array-like) – index to be removed
Returns: updated Dataset
Return type: (Dataset)

remove_nonexistent_files_from_metadata()¶: Remove references to files that do not exist in path. This may occur, for example, if you know certain data files are bad (and thus delete them from the data dir), but did not delete them while collecting data.
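
The idea can be sketched with the standard library: keep only the meta data rows whose pointer column names a file that actually exists in the data directory. `existing_rows_sketch` is a hypothetical helper working on a list of row dicts, not the ekpy implementation.

```python
import os
import tempfile


def existing_rows_sketch(path, rows, pointercolumn="filename"):
    """Drop rows whose pointed-to file is missing from the directory `path`."""
    return [row for row in rows
            if os.path.isfile(os.path.join(path, row[pointercolumn]))]


with tempfile.TemporaryDirectory() as data_dir:
    # Only t1.csv exists on disk; t2.csv was "deleted" after collection.
    open(os.path.join(data_dir, "t1.csv"), "w").close()
    rows = [
        {"voltageApplied": "1v", "filename": "t1.csv"},
        {"voltageApplied": "2v", "filename": "t2.csv"},
    ]
    kept = existing_rows_sketch(data_dir, rows)
```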

save_meta_data()¶: Save the current meta_data as pandas.DataFrame to path. This is not allowed for merged datasets i.e. Dataset resulting from analysis.utils.merge_Datasets. To save a Dataset (including merged) see to_ekpds.

select_index(index)¶

Return dataset with single index specified.

Parameters: index (int or index) – Index to select
Returns: Single row Dataset.
Return type: (Dataset)

property summary¶

Return a brief summary of the data in your Dataset.

Returns: a summary of the Dataset. Keys are column names, values are sets of values appearing in the Dataset.
Return type: (Dict)

Examples

>>> dset.head()
  identifier  pulsewidth_ns  delay_ns  high_voltage_v  preset_voltage_v
0      185um          100.0     200.0            0.25               0.5
1      185um          100.0     200.0            0.25               0.5
2      185um          100.0     200.0            0.25               0.5
3      185um          100.0     200.0            0.25               0.5
4      185um          100.0     200.0            0.25               0.5

   preset_pulsewidth_ns                                      filename  trial
0               10000.0  185um100e-9_200e-9_0x25V_500mv_10000ns_0.csv      0
1               10000.0  185um100e-9_200e-9_0x25V_500mv_10000ns_1.csv      1
2               10000.0  185um100e-9_200e-9_0x25V_500mv_10000ns_2.csv      2
3               10000.0  185um100e-9_200e-9_0x25V_500mv_10000ns_3.csv      3
4               10000.0  185um100e-9_200e-9_0x25V_500mv_10000ns_4.csv      4

>>> dset.summary
{
        'identifier': {'185um'},
        'pulsewidth_ns': {10.0, 50.0, 100.0},
        'delay_ns': {10.0, 20.0, 50.0, 200.0, 300.0, 500.0},
        'high_voltage_v': {0.125, 0.25, 0.375, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5},
        'preset_voltage_v': {0.5},
        'preset_pulsewidth_ns': {10000.0},
        'trial': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
}

to_ekpds(path)¶

Save Dataset to file (extension .ekpds).

Parameters: path (str) – Path to save location

Example

dset.to_ekpds('./dset1.ekpds')

ekpy.analysis.data_utils module¶

ekpy.analysis.data_utils.get_vals_by_definition(data, definition_key, data_key)¶

Returns a dict whose keys are the values of definition_key and whose values are the data_key data from each Data index.

Parameters

definition_key (str or key) – Definition key whose value will correspond to the key for the returned dict.
data_key (str or key) – Data key whose value will correspond to the value for the returned dict.

Returns

Values by definition dict.

Return type

out (dict)

Examples

>>> data
> {0: {'definition': {'type': {'preset2pulse'},
   'identifier': {'125um2'},
   'pulsewidth_ns': {500.0},
   'delay_ns': {100000000.0},
   'high_voltage_v': {1.0},
   'preset_voltage_v': {1.0},
   'preset_pulsewidth_ns': {1000.0},
   'diameter': {12.5},
   'area': {122.7184630308513},
   'trial': {0}},
  'data': {'p1': array([-0.003, -0.003, -0.001, ..., -0.003, -0.003, -0.001]),
   'time': array([0.0000e+00, 4.0000e-02, 8.0000e-02, ..., 1.9988e+02, 1.9992e+02,
                  1.9996e+02]),
   'p2': array([-0.001, -0.003, -0.005, ..., -0.003,  0.001,  0.003])}},
 1: {'definition': {'type': {'preset2pulse'},
   'identifier': {'125um2'},
   'pulsewidth_ns': {500.0},
   'delay_ns': {100000000.0},
   'high_voltage_v': {0.5},
   'preset_voltage_v': {1.0},
   'preset_pulsewidth_ns': {1000.0},
   'diameter': {12.5},
   'area': {122.7184630308513},
   'trial': {0}},
  'data': {'p1': array([0.0004, 0.0012, 0.0004, ..., 0.002 , 0.002 , 0.0004]),
   'time': array([0.0000e+00, 4.0000e-02, 8.0000e-02, ..., 1.9988e+02, 1.9992e+02,
                  1.9996e+02]),
   'p2': array([-0.0004, -0.0004,  0.0004, ...,  0.0012,  0.0012,  0.0004])}},}

#retrieve 'p1' data keyed by high_voltage_v
>>> analysis.get_vals_by_definition(data, 'high_voltage_v', 'p1')
> {     1.0: [-0.003,-0.003, ... ],
        0.5: [0.0004, 0.0012, ...]}
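
The mapping itself is simple to sketch on plain dicts. `vals_by_definition_sketch` is a hypothetical name; it assumes one definition value per index, and in this sketch a later index with the same value overwrites an earlier one, which may differ from what ekpy does.

```python
def vals_by_definition_sketch(records, definition_key, data_key):
    """Map each index's definition value to that index's data for data_key."""
    out = {}
    for record in records.values():
        (value,) = record["definition"][definition_key]  # single value per index
        out[value] = record["data"][data_key]
    return out


records = {
    0: {"definition": {"high_voltage_v": {1.0}}, "data": {"p1": [-0.003, -0.003]}},
    1: {"definition": {"high_voltage_v": {0.5}}, "data": {"p1": [0.0004, 0.0012]}},
}
vbd = vals_by_definition_sketch(records, "high_voltage_v", "p1")
```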

ekpy.analysis.data_utils.vals_by_definition_to_2darray(vals_by_definition, converter='Default', ascending=True)¶

Convert vals by definition to 2d array. Typically used for plotting and after .get_vals_by_definition().

Parameters

vals_by_definition (dict, vbd) – vals by definition dict. See .get_vals_by_definition()
converter (function) – Function to which each value for each key from vals_by_definition is passed. Default is to convert to float: converter = lambda x: float(x)
ascending (bool) – Return X in ascending or descending order

Returns

X, Y

Return type

(numpy.array (2D))

Examples

>>> vbd
> { 1.0 : [1,2,3],
        0.5 : [2,1,1]}

>>> X, Y = analysis.vals_by_definition_to_2darray(vbd)
>>> X
> [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]

>>> Y
> [1, 2, 3, 2, 1, 1]

ekpy.analysis.load module¶

ekpy.analysis.load.generate_meta_data(path, mapper: f(str)->dict, pass_path=False, pointercolumn='filename', overwrite=False, ignore_errors=True)¶

Generate meta_data from a path for a given mapper function. Important: the dict returned by mapper must include the pointer column, i.e. the entry (key, value) = ('<pointer column name>', <filename>). By default this column is called filename, e.g. {'filename': 'a.csv'}.

Parameters

path (str) – Path to the directory.
mapper (function) – f(filename: str) -> dict. A function which operates on a single file name in order to get the columns (dict keys) and values (dict values) of the meta_data for that file.
pass_path (bool) – Pass the path of each file to mapper. This is used to parse meta data from within the file, as one can then open the file within mapper. If True, mapper must take the argument path.
pointercolumn (str) – The name of the pointer column in the created meta_data.
overwrite (bool) – True will overwrite any existing meta_data in path.
ignore_errors (bool) – False will halt generation of meta data if a single file fails. Default is True (ignore).

Examples

Basic usage where mapper operates only on filename:

def mapper(file):
        spl = file.split('_')

        meta_data = {
                'param1': spl[0],  # the first parameter of interest is located at the first split location
                'filename': file   # must include the filename (or other `pointercolumn`)
        }

        return meta_data

generate_meta_data(path, mapper, pointercolumn='filename')

Parse the data file itself for metadata:

def mapper(file, path): # must accept the argument path!
        full_path = path+file

        with open(full_path, 'r') as f:
                lines = f.readlines()

        ### extract metadata from lines ###
        param1 = lines[0].replace('\n', '')

        meta_data = {
                'param1': param1,
                'filename': file
        }

        return meta_data

generate_meta_data(path, mapper, pass_path=True)
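
To make the role of mapper, pointercolumn and ignore_errors concrete, here is a rough standard-library sketch of the collection loop. `collect_meta_data_sketch` is a hypothetical function that only returns a list of row dicts; it does not write a meta_data file or handle overwrite as the real function presumably does.

```python
import os
import tempfile


def collect_meta_data_sketch(path, mapper, pass_path=False,
                             pointercolumn="filename", ignore_errors=True):
    """Call mapper on every file name in `path` and collect the resulting dicts."""
    rows = []
    for name in sorted(os.listdir(path)):
        try:
            row = mapper(name, path=path) if pass_path else mapper(name)
            if pointercolumn not in row:
                raise KeyError(f"mapper output for {name!r} lacks {pointercolumn!r}")
            rows.append(row)
        except Exception as err:
            if not ignore_errors:
                raise
            print(f"skipping {name!r}: {err!r}")  # failed files are skipped
    return rows


def mapper(file):
    """Parse names of the (hypothetical) form '<param1>_<trial>.csv'."""
    spl = file.split("_")
    return {"param1": spl[0], "trial": int(spl[1].split(".")[0]), "filename": file}


with tempfile.TemporaryDirectory() as data_dir:
    for name in ("1v_0.csv", "2v_1.csv", "notes.txt"):
        open(os.path.join(data_dir, name), "w").close()
    rows = collect_meta_data_sketch(data_dir, mapper)
```

The file 'notes.txt' does not follow the naming scheme, so mapper raises and the file is skipped rather than halting the run.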

ekpy.analysis.load.load_Dataset(path, meta_data=None, readfileby=<function read_ekpy_data>)¶

Load a dataset from path. Path must contain (pickle or .csv) file 'meta_data'.

Parameters

path (str) – Path to data
meta_data (pandas.DataFrame) – meta_data if one wishes to provide different meta_data from that provided in path.
readfileby (callable) – Method for reading data.

Returns

Dataset

Return type

(Dataset)

ekpy.analysis.load.read_ekpdat(filename)¶

Read Data from .ekpdat file.

Parameters: filename (str) – Path to file
Returns: Data
Return type: (Data)

ekpy.analysis.load.read_ekpds(filename)¶

Read a Dataset from .ekpds file.

Parameters: filename (str) – Path to file
Returns: Dataset
Return type: (Dataset)

ekpy.analysis.plotting module¶

ekpy.analysis.plotting.add_legend_element(ax, label, color, fontsize='auto', **kwargs)¶

Add element to legend for matplotlib.axis. For **kwargs see matplotlib.lines.Line2D.

Parameters

ax (matplotlib.axis) – Axis to add legend element.
label (str or int or float) – Label for legend element.
color (str or color) – Color for legend element.
fontsize (str or float) – Fontsize for the legend. Default is ‘auto’.

Returns

Axis with updated legend

Return type

(matplotlib.axis)

ekpy.analysis.plotting.format_legend(ax, **kwargs)¶

ekpy.analysis.utils module¶

ekpy.analysis.utils.concat_Datas(datas)¶

Concatenate data.

Parameters: datas (iter of Data) – Iterable of Data objects to merge.
Returns: Concatenated data.
Return type: (Data)

ekpy.analysis.utils.concat_Datasets(datasets)¶

Concatenate datasets.

Parameters: datasets (iter of Dataset) – Iterable of Dataset objects to merge.
Returns: Concatenated dataset.
Return type: (Dataset)

ekpy.analysis.utils.merge_Datas(tpl, by: str)¶

Merge tpl of Data on definition key (by).

Parameters

tpl (array-like) – Array-like of Data objects
by (str) – Definition key to merge on

Returns

Merged Data.

Return type

(Data)

Example

>> data1 = Data({
        0 : {
                'definition': {'param1':{'eric'}, 'param2':{'merge_on'}},
                'data':{'data1':[0,1,2]}
        }
})

>> data2 = Data({
        0 : {
                'definition': {'param1':{'othername'}, 'param2':{'merge_on'}},
                'data':{'data1':[3,4,5]}
        }
})

>> merge_Datas((data1, data2), by='param2')
> {0: {  'data': {'data1_0': [0, 1, 2], 'data1_1': [3, 4, 5]},
                 'definition': {'param1_0': {'eric'},
                                                'param1_1': {'othername'},
                                                'param2': {'merge_on'}}}}

>> data3 = Data({
        0 : {
                'definition': {'param1':{'eric'}, 'param2':{'this will be its own index'}},
                'data':{'data1':[0,1,2]}
        }
})

>> merge_Datas((data1, data2, data3), by='param2')
> {0: {'data': {'data1_0': [0, 1, 2], 'data1_1': [3, 4, 5]},
         'definition': {'param1_0': {'eric'},
                                        'param1_1': {'othername'},
                                        'param2': {'merge_on'}}},
 1: {'data': {'data1_0': [0, 1, 2]},
         'definition': {'param1_0': {'eric'},
                                        'param2': {'this will be its own index'}}}}
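
The suffixing pattern visible in the outputs above can be reproduced on plain dicts. In this sketch, records sharing the same value for the merge key are combined into one index, and their other definition and data keys get a suffix with their position within that group. `merge_sketch` is a hypothetical name, and the suffix rule is inferred from the example output rather than from the ekpy source.

```python
def merge_sketch(records_tuple, by):
    """Merge plain-dict records that share the same definition value for `by`."""
    groups = {}  # merge-key value -> list of member records, in first-seen order
    for records in records_tuple:
        for index in sorted(records):
            record = records[index]
            groups.setdefault(frozenset(record["definition"][by]), []).append(record)

    merged = {}
    for new_index, (by_value, members) in enumerate(groups.items()):
        entry = {"definition": {by: set(by_value)}, "data": {}}
        for position, record in enumerate(members):
            for k, v in record["definition"].items():
                if k != by:  # the merge key itself stays unsuffixed
                    entry["definition"][f"{k}_{position}"] = v
            for k, v in record["data"].items():
                entry["data"][f"{k}_{position}"] = v
        merged[new_index] = entry
    return merged


data1 = {0: {"definition": {"param1": {"eric"}, "param2": {"merge_on"}},
             "data": {"data1": [0, 1, 2]}}}
data2 = {0: {"definition": {"param1": {"othername"}, "param2": {"merge_on"}},
             "data": {"data1": [3, 4, 5]}}}
data3 = {0: {"definition": {"param1": {"eric"},
                            "param2": {"this will be its own index"}},
             "data": {"data1": [0, 1, 2]}}}
merged = merge_sketch((data1, data2, data3), by="param2")
```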

ekpy.analysis.utils.merge_Datasets(datasets)¶
