ekpy.analysis package¶
Submodules¶
ekpy.analysis.core module¶
-
class
ekpy.analysis.core.Data(initializer)¶ Bases: object
Data class for maintaining and manipulating the real data. Typically retrieved via Dataset.get_data().
- Parameters
initializer (dict) – a dict (with the form shown below) of the data.
Examples
>>> data > {0: { 'definition': {'frequency': {'1067hz'}, 'amplitude': {'500ua'}, 'nave': {5}, 'low_current': {-2.5}, 'high_current': {2.5}, 'delay': {1}, 'time_constant': {'30ms'}, 'ramp_rate': {0.05}, 'ramp_up_first': {True}, 'identifier': {'B1'}, 'sensitivity': {'50mv/na'}, 'trial': {0}}, 'data': {'H_mean': array([-403.524 , -407.18 , -395.602 , -382.146 , -367.444 , ... ]), 'H_std': array([10.39320663, 4.7261697 , 5.3691951 , 5.71281577, 5.9829578 , ... ]), 'R_mean': array([0.0324881 , 0.03248582, 0.03248772, 0.03248544, 0.03248354, ... ]), 'R_std': array([1.69941166e-06, 1.42182981e-06, 1.42182981e-06, 3.31276320e-06, ...]), 'Theta_mean': array([0.0324881 , 0.03248582, 0.03248772, 0.03248544, 0.03248354, ...]), 'Theta_std': array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , ...])} } }
# one can access data and/or definition as an attribute >>> data.definition >{'frequency': {'1067hz'}, 'amplitude': {'500ua'}, 'nave': {5}, 'low_current': {-2.5}, 'high_current': {2.5}, 'delay': {1}, 'time_constant': {'30ms'}, 'ramp_rate': {0.05}, 'ramp_up_first': {True}, 'identifier': {'B1'}, 'sensitivity': {'50mv/na'}, 'trial': {0} } >>> data.data > {'H_mean': array([-403.524 , -407.18 , -395.602 , -382.146 , -367.444 , ... ]), 'H_std': array([10.39320663, 4.7261697 , 5.3691951 , 5.71281577, 5.9829578 , ... ]), 'R_mean': array([0.0324881 , 0.03248582, 0.03248772, 0.03248544, 0.03248354, ... ]), 'R_std': array([1.69941166e-06, 1.42182981e-06, 1.42182981e-06, 3.31276320e-06, ...]), 'Theta_mean': array([0.0324881 , 0.03248582, 0.03248772, 0.03248544, 0.03248354, ...]), 'Theta_std': array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , ...])} # one can also access individual attributes of definition or data: >>> data.H_mean > array([-403.524 , -407.18 , -395.602 , -382.146 , -367.444 , ... ])
-
apply(func: callable, pass_defn: bool = False, pass_trials_iteratively: bool = True, ignore_errors: bool = True, ignore_coerce_warnings: bool = True, **kwargs)¶ Apply func to the data in each index. **kwargs will be passed to func. If func returns None, that piece of data will be dropped.
- Parameters
func (function) – f(dict) -> dict. Function is passed the data_dict for each index.
pass_defn (bool) – Whether or not to pass the definition to func. If True, it will be passed along with the other kwargs.
pass_trials_iteratively (bool) – True for functions which operate on a single trial. False for functions which operate across trials. (Only used for grouped data.)
ignore_errors (bool) – If True, errors in func will be printed, but not raised, and the resulting data will be the original data. If False, errors will be raised.
ignore_coerce_warnings (bool) – Whether or not to ignore coerce warnings in the data_array_builder() class. You will most likely want this set to False.
- Returns
the new data after operating on it
- Return type
(Data)
Examples
>>> some_data > { 0: {'definition': {'param1': {'10V'}, 'param2': {'100ns', '10ns', '50ns'}, 'param3': {'1mv', '2mv'}}, 'data': {'raw_data': array([[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]], dtype=int64)}}, 1: {'definition': {'param1': {'5V'}, 'param2': {'100ns'}, 'param3': {'1mv'}}, 'data': {'raw_data': array([1, 2, 3], dtype=int64)}} } #some function will square the data def some_function(data_dict): "a function which operates on the data dict and returns a data dict" out = dict() for key in data_dict: out.update({key:data_dict[key]**2}) return out >>> some_data.apply(some_function) > { 0: {'definition': {'param1': {'10V'}, 'param2': {'100ns', '10ns', '50ns'}, 'param3': {'1mv', '2mv'}}, 'data': {'raw_data': array([[1, 4, 9], [1, 4, 9], [1, 4, 9], [1, 4, 9]], dtype=int64)}}, 1: {'definition': {'param1': {'5V'}, 'param2': {'100ns'}, 'param3': {'1mv'}}, 'data': {'raw_data': array([1, 4, 9], dtype=int64)}}}
Passing trials iteratively
>>> _dict = {
...     0: {'data': {'x': np.array([1])}, 'definition': {'param': {'a'}}},
...     1: {'data': {'x': np.array([0])}, 'definition': {'param': {'a'}}}
... }
>>> data = analysis.Data(_dict)
>>> def subtract_offset(data_dict):
...     return {'x': data_dict['x'] - np.mean(data_dict['x'])}
>>> data.apply(subtract_offset) > { 0: {'data': {'x': array([0.])}, 'definition': {'param': {'a'}}}, 1: {'data': {'x': array([0.])}, 'definition': {'param': {'a'}}}}
>>> data.groupby('param') > {0:{'data': {'x': array([[1], [0]])}, 'definition': {'param': {'a'}}}}
>>> data.groupby('param').apply(subtract_offset) > {0: {'data': {'x': array([[0.], [0.]])}, 'definition': {'param': {'a'}}}}
>>> data.groupby('param').apply(subtract_offset, pass_trials_iteratively=False) > {0: {'data': {'x': array([[ 0.5], [-0.5]])},'definition': {'param': {'a'}}}}
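The difference between the two modes on grouped data can be sketched without ekpy. The helper apply_to_grouped below is hypothetical: it only mimics the dispatch described above using plain Python lists and is not the library's implementation.

```python
from statistics import fmean

def subtract_offset(values):
    """Subtract the mean of all numbers in `values` (a 1D list or a list of rows)."""
    is_2d = isinstance(values[0], list)
    flat = [v for row in values for v in row] if is_2d else list(values)
    offset = fmean(flat)
    if is_2d:
        return [[v - offset for v in row] for row in values]
    return [v - offset for v in values]

def apply_to_grouped(trials, func, pass_trials_iteratively=True):
    """Hypothetical stand-in for how apply() treats one grouped index."""
    if pass_trials_iteratively:
        # func sees one trial (one row) at a time
        return [func(trial) for trial in trials]
    # func sees all trials of the group at once
    return func(trials)

grouped = [[1.0], [0.0]]  # two single-point trials, as in the example above
per_trial = apply_to_grouped(grouped, subtract_offset)
across = apply_to_grouped(grouped, subtract_offset, pass_trials_iteratively=False)
print(per_trial)  # [[0.0], [0.0]]
print(across)     # [[0.5], [-0.5]]
```

With one point per trial, subtracting each trial's own mean always gives zero, whereas subtracting the group mean keeps the difference between trials.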
-
collapse(data_key)¶ Return collapsed (numpy.array) data corresponding to data_key. This will return all data for all indices concatenated into a single array.
- Parameters
data_key (key) – Key for data you wish to collapse
- Returns
Concatenated array of all data corresponding to data_key for all indices in self.
- Return type
(numpy.array)
-
contains(condition)¶ Returns data specified by condition.
- Parameters
condition (dict) – key is definition key, value is value to find. Multiple values provided will be joined by logical OR. Multiple keys will be joined with logical AND.
- Returns
data satisfying condition
- Return type
(Data)
Examples
#Providing multiple values will be joined by logical or, i.e. Data.contains({'high_voltage_v':{'100mv', '200mv'}}) #will search for all data with '100mv' OR '200mv' for high_voltage_v. #Multiple keys provided will be joined with logical AND. i.e. Data.contains({'x':1, 'y':2}) #will search for x = 1 AND y = 2.
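The OR/AND rule can be illustrated on plain definition dicts. This is a minimal sketch; matches is a hypothetical helper, not part of ekpy, and it assumes a bare value such as 1 is treated like a one-element set.

```python
def matches(definition, condition):
    """Hypothetical helper: True if `definition` satisfies `condition`.

    Values given for one key are OR-ed; different keys are AND-ed.
    """
    for key, wanted in condition.items():
        # Treat a bare value such as 1 or '100mv' as a one-element set.
        if not isinstance(wanted, (set, frozenset, list, tuple)):
            wanted = {wanted}
        if not set(wanted) & set(definition.get(key, set())):
            return False  # this key matched none of the wanted values
    return True

definitions = {
    0: {'high_voltage_v': {'100mv'}, 'trial': {0}},
    1: {'high_voltage_v': {'200mv'}, 'trial': {1}},
    2: {'high_voltage_v': {'300mv'}, 'trial': {0}},
}
either = [i for i, d in definitions.items()
          if matches(d, {'high_voltage_v': {'100mv', '200mv'}})]
both = [i for i, d in definitions.items()
        if matches(d, {'high_voltage_v': {'100mv', '300mv'}, 'trial': 0})]
print(either)  # [0, 1]
print(both)    # [0, 2]
```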
-
property
data_keys¶ Return a list of keys corresponding to data.
- Returns
data keys
- Return type
(list)
-
dropna()¶ Drop nans from data
-
filter(data_condition_function_dict, definition_condition_dict=None, additional_data_keys_to_filter=None)¶ Filter the data.
- Parameters
data_condition_function_dict (dict) – a dict with one entry. key is data key, value is function which operates on a single value.
definition_condition_dict (dict) – a dict specifying specific definition conditions, which when satisfied can allow their data to be operated on. If None, all data will be filtered.
additional_data_keys_to_filter (str or key or array-like) – Additional data keys that you wish to filter based on data_condition_function_dict.
- Returns
filtered Data
- Return type
(Data)
Examples
Filter the data such that the data key ‘R’ only contains values >10. This leaves all other data keys unchanged.
>>> Data.filter({'R': lambda x: x>10})
Filter data corresponding to an amplitude of ‘500ua’ such that data key ‘R’ contains only values >10.
>>> Data.filter({'R': lambda x: x>10}, {'amplitude':'500ua'})
Filter data based on ‘saturation’ but also filter the ‘switching_time’ data.
>>> Data.filter({'saturation': lambda x: x > .003}, additional_data_keys_to_filter='switching_time')
You may use this method to remove outliers, for example.
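As a rough illustration of the last example, the sketch below assumes the condition is evaluated element-wise on one data key and that the resulting mask is reused for the additional keys. filter_data is a hypothetical stand-in written with plain lists, not ekpy's implementation.

```python
def filter_data(data, key, condition, additional_keys=()):
    """Hypothetical sketch: keep positions where condition(data[key][i]) is True."""
    mask = [bool(condition(value)) for value in data[key]]
    out = dict(data)  # keys that are not filtered are left unchanged
    for name in (key, *additional_keys):
        out[name] = [v for v, keep in zip(data[name], mask) if keep]
    return out

data = {
    'saturation': [0.001, 0.004, 0.005],
    'switching_time': [10, 20, 30],
    'R': [1, 2, 3],
}
filtered = filter_data(data, 'saturation', lambda x: x > .003,
                       additional_keys=['switching_time'])
print(filtered)
```

Here 'saturation' and 'switching_time' stay aligned after filtering, while 'R' keeps all three values.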
-
groupby(key: str)¶ Group data by key. Similar to Dataset.get_data(groupby=key).get_data(), but allows one to apply functions to individual data files before grouping.
- Parameters
data (ekpy.analysis.core.Data) – The data to group
key (str) – Key to group on
- Returns
(ekpy.analysis.core.Data)
-
property
iloc¶ An indexer as in pandas .iloc. Usage is Data.iloc[index].
- Returns
indexer for indexing
- Return type
(iDataIndexer)
Example
>>> Data.iloc[0]
-
mean(axis=0)¶ Return mean of Data. If 1d data is supplied, mean will be performed over the trial, otherwise mean will be performed across trials.
- Returns
(Data)
Examples
# mean over a trial
>>> X = np.array([1,2,3])
>>> data = Data({0:{'definition':{},'data':{'X':X}}})
>>> data.mean().X
> 2.0
# mean across trials
>>> X = np.array([[1,2,3], [3,4,5]])
>>> data = Data({0:{'definition':{},'data':{'X':X}}})
>>> data.mean().X
> array([2., 3., 4.])
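The 1D versus 2D rule can be mimicked with the standard library alone. mean_of is a hypothetical helper that only reproduces the behaviour described above on plain lists; it says nothing about how ekpy computes the mean internally.

```python
from statistics import fmean

def mean_of(values):
    """Hypothetical sketch of the 1D / 2D behaviour described for Data.mean()."""
    if values and isinstance(values[0], (list, tuple)):
        # 2D: one row per trial, so average column-wise (across trials)
        return [fmean(column) for column in zip(*values)]
    # 1D: a single trial, so average over its points
    return fmean(values)

over_trial = mean_of([1, 2, 3])
across_trials = mean_of([[1, 2, 3], [3, 4, 5]])
print(over_trial)     # 2.0
print(across_trials)  # [2.0, 3.0, 4.0]
```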
-
plot(x=None, y=None, ax=None, color=None, cmap='viridis', labelby=None, **kwargs)¶ Plot the data. If ax is provided returns ax, otherwise returns fig, ax.
- Parameters
x (key) – data dict key for x axis.
y (key or array-like) – data dict key for y axis
ax (matplotlib.axis) – axis to plot on
color (str) – Color of plot. (Override colormap)
cmap (str) – Color map. See matplotlib.cm.cmaps_listed for allowed colormaps.
labelby (str) – Definition key to use for plot legend.
- Returns
fig (matplotlib.figure) – figure of plot.
ax (matplotlib.axis) – axis of plot. If ax is provided as an argument, only ax is returned.
-
scatter(x=None, y=None, ax=None, color=None, cmap='viridis', labelby=None, **kwargs)¶ Scatter plot the data. If ax is provided returns ax, otherwise returns fig, ax.
- Parameters
x (key) – data dict key for x axis.
y (key or array-like) – data dict key for y axis
ax (matplotlib.axis) – axis to plot on
color (str) – Color of plot. (Override colormap)
cmap (str) – Color map. See matplotlib.cm.cmaps_listed for allowed colormaps.
labelby (str) – Definition key to use for plot legend.
- Returns
fig (matplotlib.figure) – figure of plot.
ax (matplotlib.axis) – axis of plot. If ax is provided as an argument, only ax is returned.
-
sort(by, key=None, reverse=False)¶ Sort Data by definition key. This might be useful, for example, in plotting with color maps. This method sorts Data across multiple indices; it does not sort data within an index.
- Parameters
by (str or key) – Definition key. The definition key that you are sorting on must be unique for each index in your Data object.
key (function) – Method for accessing value to sort.
reverse (bool) – Reverse order
- Returns
Sorted Data
- Return type
(Data)
Examples
Sort Data on parameter ‘test’
>>> data = dset.get_data() >>> data.summary > {'param': {'v'}, 'test': {0, 1, 2, 3, 4}, 'test2': {'100mv', '15mv', '1mv', '20mv', '32mv'}} # Sort by definition key 'test' >>> data = data.sort(by='test') # Confirm sorted: >>> data.test > {0: {0}, 1: {1}, 2: {2}, 3: {3}, 4: {4}} ### Sort in reverse order: >>> data = dset.get_data() >>> data.sort(by = 'test', reverse = True).test > {0: {4}, 1: {3}, 2: {2}, 3: {1}, 4: {0}}
Sort Data on a key that is not int or float:
>>> data = dset.get_data() >>> data.summary > {'param': {'v'}, 'test': {0, 1, 2, 3, 4}, 'test2': {'100mv', '15mv', '1mv', '20mv', '32mv'}} # unsorted data >>> data.test2 > {0: {'1mv'}, 1: {'100mv'}, 2: {'20mv'}, 3: {'15mv'}, 4: {'32mv'}} # Sort by definition key 'test2' >>> sort_by = lambda x: float(x.replace('mv', '')) # get rid of 'mv' suffix and convert to float >>> data = data.sort(by='test2', key=sort_by) >>> data.test2 > {0: {'1mv'}, 1: {'15mv'}, 2: {'20mv'}, 3: {'32mv'}, 4: {'100mv'}}
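The role of the key argument can be seen on plain definition dicts. sort_definitions below is a hypothetical sketch, not ekpy code; it assumes each index holds exactly one value for the sort key, as the parameter description requires.

```python
def sort_definitions(definitions, by, key=None, reverse=False):
    """Hypothetical sketch: reorder indices by the single value stored under `by`."""
    key = key or (lambda value: value)

    def sort_value(definition):
        values = definition[by]
        if len(values) != 1:
            raise ValueError(f"definition key {by!r} must be unique per index")
        (value,) = values  # unpack the only element of the set
        return key(value)

    ordered = sorted(definitions.values(), key=sort_value, reverse=reverse)
    return dict(enumerate(ordered))  # re-number indices 0..n-1

defs = {0: {'test2': {'1mv'}}, 1: {'test2': {'100mv'}}, 2: {'test2': {'20mv'}}}
by_string = sort_definitions(defs, 'test2')
by_millivolts = sort_definitions(
    defs, 'test2', key=lambda x: float(x.replace('mv', '')))
print([next(iter(d['test2'])) for d in by_string.values()])      # ['100mv', '1mv', '20mv']
print([next(iter(d['test2'])) for d in by_millivolts.values()])  # ['1mv', '20mv', '100mv']
```

Without a key the strings are compared lexicographically, which is why a converter is needed for values that carry a unit suffix.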
-
std()¶ Return standard deviation of Data. If 1d data is supplied, std will be performed over the trial, otherwise std will be performed across trials.
- Returns
(Data)
-
property
summary¶ Return a summary of the definitions in data
- Returns
summary of the data included
- Return type
(dict)
-
to_DataFrame(how='lump_mean', include_defn_keys=[], defn_converter=None)¶ Convert Data to pandas.DataFrame. Each index in Data will correspond to a single row in the resulting DataFrame.
- Parameters
how (function) – Method for converting data. f(ndarray, key) -> value. Default ‘lump_mean’ averages all data for each index in Data corresponding to each data key. ndarray is data array corresponding to data key key. how should operate on data corresponding to a single Data index.
include_defn_keys (str, key or array-like) – Definition key(s) to include in resulting dataframe. i.e., each key in include_defn_keys will be a column name with values corresponding to the value for each index in Data.
defn_converter (function or array-like) – Optional. Methods for converting definition values to alternative type, perhaps from str to float.
- Returns
(pandas.DataFrame)
Examples
>>> _dict = { 0: { 'definition':{ 'frequency':{'1khz'}, 'amplitude':{'500mv'} }, 'data':{ 'R':np.array([1,2,2,2]), 'theta':np.array([0,0,0,0]) }, }, 1: { 'definition':{ 'frequency':{'1khz'}, 'amplitude':{'800mv'} }, 'data':{ 'R':np.array([2,3,3,3]), 'theta':np.array([0,0,0,0]) }, } }
>>> data = Data(_dict)
>>> data.to_DataFrame()
      R  theta
0  1.75    0.0
1  2.75    0.0
>>> data.to_DataFrame(how=lambda x,key: x[0])
     R  theta
0  1.0    0.0
1  2.0    0.0
>>> data.to_DataFrame(include_defn_keys='frequency')
      R  theta frequency
0  1.75    0.0      1khz
1  2.75    0.0      1khz
>>> data.to_DataFrame(include_defn_keys=['frequency', 'amplitude'],
...                   defn_converter=[
...                       lambda x: float(x.replace('khz', 'e3')),
...                       lambda x: float(x.replace('mv', 'e-3'))
...                   ])
      R  theta  frequency  amplitude
0  1.75    0.0     1000.0        0.5
1  2.75    0.0     1000.0        0.8
-
to_dict()¶ Return a dict of the Data class.
- Returns
a dict class with identical structure
- Return type
(dict)
-
to_ekpdat(file)¶ Save file as .ekpdat file.
- Parameters
file (str) – Path to file
-
class
ekpy.analysis.core.Dataset(path, initializer, readfileby=<function read_ekpy_data>, pointercolumn='filename')¶ Bases: object
Dataset class for analysis. Used to manipulate meta data while keeping track of the location of the real data, which can be retrieved when necessary.
- Parameters
path (str or dict) – Path to the real data.
initializer (pandas.DataFrame or dict) – Initializer for a DataFrame. The meta data.
readfileby (function) – How to read the data. Default is ekpy.utils.read_ekpy_data().
pointercolumn (str or index) – Column name which holds the name of the file. Default is 'filename'.
Examples
>>> meta_data = pd.DataFrame(
...     {
...         'voltageApplied': ['1v', '2v'],
...         'filename': ['t1.csv', 't2.csv']
...     }
... )
# Assuming the data is stored in './data/', create the Dataset
>>> dset = Dataset('./data/', meta_data)
# Query the meta data for when voltageApplied is '1v':
>>> dset.query('voltageApplied == "1v"')
# Return the Data
>>> dset.get_data()
-
add_calculated_column(column_name, how)¶ Add a calculated column to the Dataset.
- Parameters
column_name (str) – The new column name
how (function) – f(self) -> column data. A function which operates on self (pandas.DataFrame) and returns new column data.
- Returns
Updated Dataset.
- Return type
(Dataset)
Examples
Convert 25um and 10um to measured areas. This will only work if no other diameters are present in the Dataset.
>>> def how(dataframe):
...     nominal_diameter_to_measured_area_dict = {'25um': 190, '10um': 60}
...     return [nominal_diameter_to_measured_area_dict[x] for x in dataframe['diameter'].values]
>>> dset.add_calculated_column('measured_area_um', how=how)
-
add_column(column_name, column_data)¶ Add a column to a Dataset.
- Parameters
column_name (str) – The new column name.
column_data (array-like) – The data for the column
-
property
columns¶
-
filter_on_column(column, function, **kwargs_for_function)¶ Filter Dataset. Keeps rows with column satisfying function.
- Parameters
column (str or index) – Specify column
function (function) – Filter function, function(value) -> bool.
kwargs_for_function (kwargs) – kwargs to pass to function.
- Returns
Filtered Dataset.
- Return type
(Dataset)
Examples
>>> dset.summary
{
 'identifier': {'185um'},
 'pulsewidth_ns': {10.0, 50.0, 100.0},
 'delay_ns': {10.0, 20.0, 50.0, 200.0, 300.0, 500.0},
 'high_voltage_v': {0.125, 0.25, 0.375, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5},
 'preset_voltage_v': {0.5},
 'preset_pulsewidth_ns': {10000.0},
 'trial': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
}
# Return only rows where high_voltage_v is greater than 1 and print the summary
>>> dset.filter_on_column('high_voltage_v', lambda x: x>1).summary
{
 'identifier': {'185um'},
 'pulsewidth_ns': {50.0},
 'delay_ns': {10.0},
 'high_voltage_v': {1.5, 2.0, 2.5},
 'preset_voltage_v': {0.5},
 'preset_pulsewidth_ns': {10000.0},
 'trial': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
}
-
get_data(groupby=None, labelby=None)¶ Return data in Data (Data class) for the current Dataset. If using groupby kwarg, resulting Data will vstack all data which corresponds to that grouping. (See examples)
- Parameters
groupby (str, label, index or array-like of these) – what to group on.
labelby (str, label, index or array-like of these) – what to label the output data by. This will change ‘definition’ in the output Data class.
- Returns
the data
- Return type
(Data)
Examples
>>> dset.summary { 'identifier': {'185um'}, 'pulsewidth_ns': {10.0, 50.0, 100.0}, 'delay_ns': {10.0, 20.0, 50.0, 200.0, 300.0, 500.0}, 'high_voltage_v': {0.125, 0.25, 0.375, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5}, 'preset_voltage_v': {0.5}, 'preset_pulsewidth_ns': {10000.0}, 'trial': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} } >>> data = dset.get_data() #no groupby yet >>> data.data_keys ['time', 'p1', 'p2'] >>> data.iloc[0].p1 # This corresponds to a single trial (1D data) array([ 0.00898495, 0.00765674, 0.00351585, ..., -0.00679731, -0.00101569, -0.00039065]) # now we will group by high_voltage_v. There are many .csv files that correspond to such a case # for example how many are there for a high_voltage_v of .125? (there are 5, see here) >>> len(dset.query('high_voltage_v == .125')) 5 # let's retrieve the data but with grouping >>> data = dset.get_data(groupby = 'high_voltage_v') >>> data.iloc[0]['p1'] #vstack of all different .csv files grouped by high_voltage_v (a meta data parameter) array([[ 0.00898495, 0.00765674, 0.00351585, ..., -0.00679731, -0.00101569, -0.00039065], [-0.02172014, -0.02773615, -0.03695549, ..., -0.0203138 , -0.0085943 , -0.00117195], [-0.0351585 , -0.02859558, -0.02289209, ..., -0.03093948, -0.01968876, -0.01289145], [-0.02765802, -0.02453282, -0.02070445, ..., -0.00679731, -0.00257829, -0.0093756 ], [-0.0375024 , -0.04656548, -0.04930003, ..., -0.03789305, -0.02609542, -0.01632917]]) # no longer is it 1D data. >>> data[0]['data']['p1'].shape (5, 500) #recall that there were 5 rows in the Dataset corresponding to a high_voltage_v of .125. Is this that grouping? we can check: >>> data[0]['definition'] # indeed it is!!! { 'identifier': {'185um'}, 'pulsewidth_ns': {10.0}, 'delay_ns': {20.0}, 'high_voltage_v': {0.125}, 'preset_voltage_v': {0.5}, 'preset_pulsewidth_ns': {10000.0}, 'trial': {0, 1, 2, 3, 4} }
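The grouping step can be pictured as collecting the traces of all rows that share a metadata value and stacking them, one row per file. The sketch below uses plain lists and a hypothetical group_rows helper; the column names and numbers are made up for illustration and it is not ekpy's implementation.

```python
def group_rows(rows, traces, groupby):
    """Hypothetical sketch: stack traces whose metadata share the same `groupby` value."""
    groups = {}
    for row, trace in zip(rows, traces):
        group = groups.setdefault(row[groupby], {'definition': {}, 'data': []})
        for column, value in row.items():
            # the definition collects every value seen within the group
            group['definition'].setdefault(column, set()).add(value)
        group['data'].append(trace)  # one row per file, like a vstack
    return dict(enumerate(groups.values()))

rows = [
    {'high_voltage_v': 0.125, 'trial': 0},
    {'high_voltage_v': 0.125, 'trial': 1},
    {'high_voltage_v': 0.25, 'trial': 0},
]
traces = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
grouped = group_rows(rows, traces, 'high_voltage_v')
print(grouped[0]['definition'])  # {'high_voltage_v': {0.125}, 'trial': {0, 1}}
print(grouped[0]['data'])        # [[0.1, 0.2], [0.3, 0.4]]
```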
-
head(*args, **kwargs)¶
-
property
index_to_path¶ Index to path (pandas.Series).
- Returns
Index to path
- Return type
(pandas.Series)
-
property
path¶ Return path to data.
-
property
pretty_summary¶ Print a summary of dataset in an easier to read fashion
-
query(*args, **kwargs)¶ Query the columns of a Dataset with a boolean expression.
- Parameters
expr (str) – The query string to evaluate. You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b. You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuation (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named “Area (cm^2)” would be referenced as `Area (cm^2)`.) Column names which are Python keywords (like “list”, “for”, “import”, etc.) cannot be used. For example, if one of your columns is called `a a` and you want to sum it with b, your query should be `a a` + b.
- Returns
the result of the query
- Return type
(Dataset)
-
remove_index(index)¶ Remove an index or array-like of indices.
- Parameters
index (index or array-like) – index to be removed
- Returns
updated Dataset
- Return type
(Dataset)
-
remove_nonexistent_files_from_metadata()¶ Remove references to files that do not exist in path. This may occur, for example, if you know certain data files are bad (and thus delete them from the data dir), but did not delete them while collecting data.
-
save_meta_data()¶ Save the current meta_data as a pandas.DataFrame to path. This is not allowed for merged datasets, i.e. a Dataset resulting from analysis.utils.merge_Datasets. To save a Dataset (including a merged one), see to_ekpds.
-
select_index(index)¶ Return dataset with single index specified.
- Parameters
index (int or index) – Index to select
- Returns
Single row Dataset.
- Return type
(Dataset)
-
property
summary¶ Return a brief summary of the data in your Dataset.
- Returns
a summary of the Dataset. keys are columns names, values are sets of values appearing in the Dataset.
- Return type
(Dict)
Examples
>>> dset.head() identifier pulsewidth_ns delay_ns high_voltage_v preset_voltage_v 0 185um 100.0 200.0 0.25 0.5 1 185um 100.0 200.0 0.25 0.5 2 185um 100.0 200.0 0.25 0.5 3 185um 100.0 200.0 0.25 0.5 4 185um 100.0 200.0 0.25 0.5 preset_pulsewidth_ns filename trial 0 10000.0 185um100e-9_200e-9_0x25V_500mv_10000ns_0.csv 0 1 10000.0 185um100e-9_200e-9_0x25V_500mv_10000ns_1.csv 1 2 10000.0 185um100e-9_200e-9_0x25V_500mv_10000ns_2.csv 2 3 10000.0 185um100e-9_200e-9_0x25V_500mv_10000ns_3.csv 3 4 10000.0 185um100e-9_200e-9_0x25V_500mv_10000ns_4.csv 4 >>> dset.summary { 'identifier': {'185um'}, 'pulsewidth_ns': {10.0, 50.0, 100.0}, 'delay_ns': {10.0, 20.0, 50.0, 200.0, 300.0, 500.0}, 'high_voltage_v': {0.125, 0.25, 0.375, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5}, 'preset_voltage_v': {0.5}, 'preset_pulsewidth_ns': {10000.0}, 'trial': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} }
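A summary of this kind is a mapping from each column to the set of values it takes. The sketch below is a hypothetical reimplementation on a list of row dicts. Skipping the 'filename' column is an assumption taken from the example output above, which does not list it.

```python
def summarize(rows, skip=('filename',)):
    """Hypothetical sketch: map each column name to the set of values it takes."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            if column not in skip:
                summary.setdefault(column, set()).add(value)
    return summary

rows = [
    {'identifier': '185um', 'high_voltage_v': 0.25, 'filename': 'a_0.csv', 'trial': 0},
    {'identifier': '185um', 'high_voltage_v': 0.25, 'filename': 'a_1.csv', 'trial': 1},
    {'identifier': '185um', 'high_voltage_v': 0.5, 'filename': 'b_0.csv', 'trial': 0},
]
summary = summarize(rows)
print(summary)
```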
-
to_ekpds(path)¶ Save Dataset to file (extension .ekpds).
- Parameters
path (str) – Path to save location
Example
dset.to_ekpds('./dset1.ekpds')
ekpy.analysis.data_utils module¶
-
ekpy.analysis.data_utils.get_vals_by_definition(data, definition_key, data_key)¶ Returns a dict whose keys are the values of definition_key and whose values are the data_key data from each Data index.
- Parameters
definition_key (str or key) – Definition key whose value will correspond to the key for the returned dict.
data_key (str or key) – Data key whose value will correspond to the value for the returned dict.
- Returns
Values by definition dict.
- Return type
out (dict)
Examples
>>> data > {{0: {'definition': {'type': {'preset2pulse'}, 'identifier': {'125um2'}, 'pulsewidth_ns': {500.0}, 'delay_ns': {100000000.0}, 'high_voltage_v': {1.0}, 'preset_voltage_v': {1.0}, 'preset_pulsewidth_ns': {1000.0}, 'diameter': {12.5}, 'area': {122.7184630308513}, 'trial': {0}}, 'data': {'p1': array([-0.003, -0.003, -0.001, ..., -0.003, -0.003, -0.001]), 'time': array([0.0000e+00, 4.0000e-02, 8.0000e-02, ..., 1.9988e+02, 1.9992e+02, 1.9996e+02]), 'p2': array([-0.001, -0.003, -0.005, ..., -0.003, 0.001, 0.003])}}, 1: {'definition': {'type': {'preset2pulse'}, 'identifier': {'125um2'}, 'pulsewidth_ns': {500.0}, 'delay_ns': {100000000.0}, 'high_voltage_v': {0.5}, 'preset_voltage_v': {1.0}, 'preset_pulsewidth_ns': {1000.0}, 'diameter': {12.5}, 'area': {122.7184630308513}, 'trial': {0}}, 'data': {'p1': array([0.0004, 0.0012, 0.0004, ..., 0.002 , 0.002 , 0.0004]), 'time': array([0.0000e+00, 4.0000e-02, 8.0000e-02, ..., 1.9988e+02, 1.9992e+02, 1.9996e+02]), 'p2': array([-0.0004, -0.0004, 0.0004, ..., 0.0012, 0.0012, 0.0004])}},} #retrieve 'p1' data keyed by high_voltage_v >>> analysis.get_vals_by_definition(data, 'high_voltage_v', 'p1') > { 1.0: [-0.003,-0.003, ... ], 0.5: [0.0004, 0.0012, ...]}
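The same lookup can be written for a plain nested dict. vals_by_definition is a hypothetical stand-in, assuming each index stores exactly one value under the definition key, and the short arrays are made-up data.

```python
def vals_by_definition(data, definition_key, data_key):
    """Hypothetical sketch: key the data_key values by each index's definition value."""
    out = {}
    for entry in data.values():
        (label,) = entry['definition'][definition_key]  # assumes one value per index
        out[label] = entry['data'][data_key]
    return out

data = {
    0: {'definition': {'high_voltage_v': {1.0}}, 'data': {'p1': [-0.003, -0.003]}},
    1: {'definition': {'high_voltage_v': {0.5}}, 'data': {'p1': [0.0004, 0.0012]}},
}
vbd = vals_by_definition(data, 'high_voltage_v', 'p1')
print(vbd)  # {1.0: [-0.003, -0.003], 0.5: [0.0004, 0.0012]}
```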
-
ekpy.analysis.data_utils.vals_by_definition_to_2darray(vals_by_definition, converter='Default', ascending=True)¶ Convert vals by definition to 2d array. Typically used for plotting and after .get_vals_by_definition().
- Parameters
vals_by_definition (dict, vbd) – vals by definition dict. See .get_vals_by_definition()
converter (function) – Function to which each key from vals_by_definition is passed. Default is to convert to float: converter = lambda x: float(x)
ascending (bool) – Return X in ascending or descending order.
- Returns
X, Y
- Return type
(numpy.array (2D))
Examples
>>> vbd
> {1.0: [1, 2, 3], 0.5: [2, 1, 1]}
>>> X, Y = analysis.vals_by_definition_to_2darray(vbd)
>>> X
> [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]
>>> Y
> [1, 2, 3, 2, 1, 1]
ekpy.analysis.load module¶
-
ekpy.analysis.load.generate_meta_data(path, mapper: f(str)->dict, pass_path=False, pointercolumn='filename', overwrite=False, ignore_errors=True)¶ Generate meta_data from a path for a given mapper function. Important: the dict returned by mapper must include the pointer column, i.e. the entry (key, value) = ('<pointer column name>', <filename>). By default this column is called filename, e.g. {'filename': 'a.csv'}.
- Parameters
path (str) – Path to the directory.
mapper (function) – filename (str) -> dict. A function which operates on a single file name in order to get the columns (dict keys) and values (dict values) for the meta_data of that file.
pass_path (bool) – Pass the path of each file to mapper. This is used to parse meta data from within the file, as one can then open the file within mapper. If True, mapper must take the argument path.
pointercolumn (str) – The name of the pointer column in the created meta_data.
overwrite (bool) – True will overwrite any existing meta_data in path.
ignore_errors (bool) – False will halt generation of meta data if a single file fails. Default is True (ignore).
Examples
Basic usage where mapper operates only on filename:
def mapper(file):
    spl = file.split('_')
    meta_data = {
        'param1': spl[0],  # the first parameter of interest is located at the first split location.
        'filename': file   # must include the filename (or other `pointercolumn`)
    }
    return meta_data

generate_meta_data(path, mapper, pointercolumn='filename')
Parse the data file itself for metadata:
def mapper(file, path):  # must accept the argument path!
    full_path = path + file
    with open(full_path, 'r') as f:
        lines = f.readlines()
    ### extract metadata from lines ###
    param1 = lines[0].replace('\n', '')
    meta_data = {
        'param1': param1,
        'filename': file
    }
    return meta_data

generate_meta_data(path, mapper, pass_path=True)
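A mapper can be tested on a few file names before pointing generate_meta_data at a real directory. In the sketch below the '<identifier>_<amplitude>_<trial>.csv' naming scheme and the build_rows loop are hypothetical. build_rows only imitates the per-file loop and the ignore_errors behaviour described above, and it neither touches the file system nor calls ekpy.

```python
def mapper(file):
    """Parse a hypothetical '<identifier>_<amplitude>_<trial>.csv' file name."""
    stem = file.rsplit('.', 1)[0]
    identifier, amplitude, trial = stem.split('_')
    return {'identifier': identifier, 'amplitude': amplitude,
            'trial': int(trial), 'filename': file}

def build_rows(filenames, mapper, ignore_errors=True):
    """Hypothetical sketch of the per-file loop; not ekpy's implementation."""
    rows = []
    for name in filenames:
        try:
            rows.append(mapper(name))
        except Exception as error:
            if not ignore_errors:
                raise  # stop at the first file that cannot be parsed
            print(f'skipping {name}: {error}')
    return rows

rows = build_rows(['B1_500ua_0.csv', 'B1_500ua_1.csv', 'notes.txt'], mapper)
print(rows)
```

Every row carries the 'filename' entry, which plays the role of the pointer column.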
-
ekpy.analysis.load.load_Dataset(path, meta_data=None, readfileby=<function read_ekpy_data>)¶ Load a dataset from path. Path must contain a (pickle or .csv) file 'meta_data'.
- Parameters
path (str) – Path to data
meta_data (pandas.DataFrame) – meta_data if one wishes to provide different meta_data from that provided in path.
readfileby (callable) – Method for reading data.
- Returns
Dataset
- Return type
(Dataset)
ekpy.analysis.plotting module¶
-
ekpy.analysis.plotting.add_legend_element(ax, label, color, fontsize='auto', **kwargs)¶ Add element to legend for matplotlib.axis. For **kwargs see matplotlib.lines.Line2D.
- Parameters
ax (matplotlib.axis) – Axis to add legend element.
label (str or int or float) – Label for legend element.
color (str or color) – Color for legend element.
fontsize (str or float) – Fontsize for the legend. Default is ‘auto’.
- Returns
Axis with updated legend
- Return type
(matplotlib.axis)
-
ekpy.analysis.plotting.format_legend(ax, **kwargs)¶
ekpy.analysis.utils module¶
-
ekpy.analysis.utils.concat_Datas(datas)¶ Concatenate data.
- Parameters
datas (iter of Data) – Iterable of Data objects to merge.
- Returns
Concatenated data.
- Return type
(Data)
-
ekpy.analysis.utils.concat_Datasets(datasets)¶ Concatenate datasets.
- Parameters
datasets (iter of Dataset) – Iterable of Dataset objects to merge.
- Returns
Concatenated dataset.
- Return type
(Dataset)
-
ekpy.analysis.utils.merge_Datas(tpl, by: str)¶ Merge tpl of Data on definition key (by).
- Parameters
tpl (array-like) – Array-like of Data objects
by (str) – Definition key to merge on
- Returns
Merged Data.
- Return type
(Data)
Example
>>> data1 = Data({
...     0: {
...         'definition': {'param1': {'eric'}, 'param2': {'merge_on'}},
...         'data': {'data1': [0, 1, 2]}
...     }
... })
>>> data2 = Data({
...     0: {
...         'definition': {'param1': {'othername'}, 'param2': {'merge_on'}},
...         'data': {'data1': [3, 4, 5]}
...     }
... })
>>> merge_Datas((data1, data2), by='param2')
> {0: {'data': {'data1_0': [0, 1, 2], 'data1_1': [3, 4, 5]}, 'definition': {'param1_0': {'eric'}, 'param1_1': {'othername'}, 'param2': {'merge_on'}}}}
# data3 has a different 'param2', so it ends up in its own index
>>> data3 = Data({
...     0: {
...         'definition': {'param1': {'eric'}, 'param2': {'this will be its own index'}},
...         'data': {'data1': [0, 1, 2]}
...     }
... })
>>> merge_Datas((data1, data2, data3), by='param2')
> {0: {'data': {'data1_0': [0, 1, 2], 'data1_1': [3, 4, 5]}, 'definition': {'param1_0': {'eric'}, 'param1_1': {'othername'}, 'param2': {'merge_on'}}}, 1: {'data': {'data1_0': [0, 1, 2]}, 'definition': {'param1_0': {'eric'}, 'param2': {'this will be its own index'}}}}
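The suffixing pattern in the outputs above can be reproduced on plain dicts. merge_entries is a hypothetical sketch, not ekpy's implementation. It assumes that entries with the same value under the merge key share one index, and that the numeric suffix counts entries within that index; both assumptions are inferred from the example output.

```python
def merge_entries(entries, by):
    """Hypothetical sketch: merge entries whose definition[by] match, suffixing other keys."""
    groups = {}  # merge value -> (merged entry, number of entries merged so far)
    for entry in entries:
        merge_value = frozenset(entry['definition'][by])
        merged, count = groups.get(
            merge_value, ({'data': {}, 'definition': {by: set(merge_value)}}, 0))
        for key, value in entry['data'].items():
            merged['data'][f'{key}_{count}'] = value
        for key, value in entry['definition'].items():
            if key != by:  # the merge key itself is kept without a suffix
                merged['definition'][f'{key}_{count}'] = value
        groups[merge_value] = (merged, count + 1)
    return {i: merged for i, (merged, _) in enumerate(groups.values())}

e1 = {'definition': {'param1': {'eric'}, 'param2': {'merge_on'}}, 'data': {'data1': [0, 1, 2]}}
e2 = {'definition': {'param1': {'othername'}, 'param2': {'merge_on'}}, 'data': {'data1': [3, 4, 5]}}
e3 = {'definition': {'param1': {'eric'}, 'param2': {'other'}}, 'data': {'data1': [0, 1, 2]}}
result = merge_entries((e1, e2, e3), by='param2')
print(result[0]['data'])        # {'data1_0': [0, 1, 2], 'data1_1': [3, 4, 5]}
print(result[1]['definition'])
```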
-
ekpy.analysis.utils.merge_Datasets(datasets)¶