The datesy package

The datesy package is divided into five main components:

Subpackages

Everything for interacting with files is found here:

File I/O subpackage

Everything for interacting with databases is found here:

Database I/O subpackage
- Database
- Table
- Row
- Item

Submodules

datesy.convert module

Everything for transforming data between different file formats is found here.

datesy.convert.rows_to_dict(rows, main_key_position=0, null_value='delete', header_line=0, contains_open_ends=False)

Convert a list of rows (e.g. from a csv) to a dictionary

Parameters:

rows (list) – the row-based data to convert to a dict
main_key_position (int, optional) – position of the main_key, if it is not in the top left
null_value (any, optional) – how an empty field in the lists shall be represented in the dictionary
header_line (int, optional) – position of the header_line, if it is not the first one
contains_open_ends (bool, optional) – if the rows do not all have the same length (because each row ends with its last set entry), the length check for corrupted data can be skipped
Returns:	dictionary containing the information from row-based data
Return type:	dict
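The conversion described above can be sketched in plain Python. This is not datesy's implementation: the function name `rows_to_dict_sketch`, the nested `{main_key: {header: value}}` output shape, and the reading of `null_value='delete'` as "drop empty cells" are all assumptions made for illustration.

```python
def rows_to_dict_sketch(rows, main_key_position=0, header_line=0):
    """Illustrative only: map row-based data to {main_key: {header: value}}."""
    header = rows[header_line]
    result = {}
    for row in rows[header_line + 1:]:
        main_key = row[main_key_position]
        # Pair every other cell with its column header and drop empty cells
        # (an assumed reading of null_value='delete').
        result[main_key] = {
            header[i]: cell
            for i, cell in enumerate(row)
            if i != main_key_position and cell not in ("", None)
        }
    return result


rows = [
    ["name", "age", "city"],
    ["alice", "31", "Berlin"],
    ["bob", "", "Paris"],
]
converted = rows_to_dict_sketch(rows)
```

Here `converted` maps `"alice"` to both of her fields, while `"bob"` only keeps `"city"` because his `"age"` cell is empty.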

datesy.convert.dict_to_rows(data, main_key_name=None, main_key_position=None, if_empty_value=None, order=None)

Convert a dictionary to rows (a list of lists)

Parameters:

data (dict) – the data to convert, in the form of a dictionary
main_key_name (str, optional) – name of the main key; needs to be specified if the data is not provided as {main_key: data}
main_key_position (int, optional) – position of the main_key, if it shall not be in the top left of the data
if_empty_value (any, optional) – value to use instead of a blank if a main_key’s sub_key is not set
order (dict, list, None, optional) – a specific order for the keys, if required
Returns:	list of rows representing the csv, arranged according to main_key_position
Return type:	list(lists)
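A rough sketch of the reverse direction, again in plain Python and not taken from datesy itself. The function name `dict_to_rows_sketch`, the first-seen ordering of sub-keys, and the placement of the main key in the first column are assumptions.

```python
def dict_to_rows_sketch(data, main_key_name="key", if_empty_value=""):
    """Illustrative only: flatten {main_key: {sub_key: value}} into a list of rows."""
    # Collect sub-keys in first-seen order so the header row is deterministic.
    sub_keys = []
    for entry in data.values():
        for sub_key in entry:
            if sub_key not in sub_keys:
                sub_keys.append(sub_key)

    rows = [[main_key_name] + sub_keys]
    for main_key, entry in data.items():
        # Missing sub-keys are filled with if_empty_value.
        rows.append([main_key] + [entry.get(k, if_empty_value) for k in sub_keys])
    return rows


data = {"alice": {"age": "31", "city": "Berlin"}, "bob": {"city": "Paris"}}
rows = dict_to_rows_sketch(data, main_key_name="name")
```

The first row is the header, and the missing `"age"` for `"bob"` is filled with the empty-value placeholder.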

datesy.convert.pandas_data_frame_to_dict(data_frame, main_key_position=0, null_value='delete', header_line=0)

Convert a pandas DataFrame to a dictionary

Parameters:

data_frame (pandas.core.frame.DataFrame) –
main_key_position (int, optional) –
null_value (any, optional) –
header_line (int, optional) –
Returns:	the dictionary representing the xlsx based on main_key_position
Return type:	dict

datesy.convert.dict_to_pandas_data_frame(data, main_key_name=None, order=None, inverse=False)

Convert a dictionary to a pandas.DataFrame

Parameters:

data (dict) – the dictionary to convert
main_key_name (str, optional) – needs to be specified if the json or dict does not have the main key present as a single {main_element : dict}
order (dict, list, optional) – list with the column names in order, or dict with specified key positions
inverse (bool, optional) – if columns and rows shall be switched
Returns:	DataFrame representing the dictionary
Return type:	pandas.DataFrame

datesy.convert.xml_to_standard_dict(ordered_data, reduce_orderedDicts=False, reduce_lists=False, manual_selection_for_list_reduction=False)

Convert an xml/OrderedDict structure to a normal dictionary

Parameters:

ordered_data (OrderedDict) – input xml data to convert to a standard dict
reduce_orderedDicts (bool, optional) – if collections.OrderedDict instances shall be converted to normal dicts
reduce_lists (bool, list, set, optional) – if lists in the dictionary shall be converted to dictionaries with transformed keys (list_key + unique key from the dictionary in the list element); if a list or set is provided, only these values will be reduced
manual_selection_for_list_reduction (bool, optional) – if list reduction shall be decided manually; all keys in `reduce_lists` will be reduced automatically
Returns:	the normalized dictionary
Return type:	dict
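The idea behind `reduce_orderedDicts` can be sketched with the standard library alone. The helper name `to_standard_dict` is hypothetical, and the sketch does not attempt the list reduction described for `reduce_lists`; it only shows a recursive replacement of `collections.OrderedDict` instances by plain dicts.

```python
from collections import OrderedDict


def to_standard_dict(node):
    """Recursively replace OrderedDict instances with plain dicts."""
    if isinstance(node, dict):  # OrderedDict is a dict subclass
        return {key: to_standard_dict(value) for key, value in node.items()}
    if isinstance(node, list):
        return [to_standard_dict(item) for item in node]
    return node


# A nested structure of the kind an xml parser might return.
parsed = OrderedDict(
    [("root", OrderedDict([("item", [OrderedDict([("id", "1")]), OrderedDict([("id", "2")])])]))]
)
plain = to_standard_dict(parsed)
```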

datesy.inspect module

Everything for inspecting data is found here.

datesy.inspect.find_header_line(data, header_keys)

Find the header line in a row-based data structure. NOT IMPLEMENTED YET: Version 0.9 feature

Parameters:

data (list, pandas.DataFrame) –
header_keys (str, list, set) – some key(s) to find in a row
Returns:	the header_line
Return type:	int

datesy.inspect.find_key(data, key=None, regex_pattern=None)

Find a key in a complex dictionary

Parameters:

data (dict) – the data structure in which to find the key
key (str, optional) – a string to be found
regex_pattern (str, optional) – a regex match to be found
Returns:	all matches and their path in the structure `{found_key: path_to_key}`
Return type:	dict
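A minimal sketch of such a key search, assuming the path is reported as a tuple of keys from the root down to the match. The actual path format returned by datesy.inspect.find_key is not shown in this section, and the name `find_key_sketch` is hypothetical.

```python
import re


def find_key_sketch(data, key=None, regex_pattern=None, _path=()):
    """Illustrative only: return {found_key: path} for keys matching key or regex_pattern."""
    matches = {}
    for current_key, value in data.items():
        current_path = _path + (current_key,)
        is_match = (key is not None and current_key == key) or (
            regex_pattern is not None
            and isinstance(current_key, str)
            and re.search(regex_pattern, current_key) is not None
        )
        if is_match:
            # A key found at several places keeps only the last path seen.
            matches[current_key] = current_path
        if isinstance(value, dict):
            matches.update(find_key_sketch(value, key, regex_pattern, current_path))
    return matches


data = {"a": {"b": {"target": 1}}, "c": {"target_2": 2}}
exact = find_key_sketch(data, key="target")
by_regex = find_key_sketch(data, regex_pattern=r"^target")
```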

datesy.matching module

Everything for mapping data to other data, along with helper functions for that purpose, is found here.

datesy.matching.simplify_strings(to_simplify, lower_case=True, simplifier=True)

Simplify a string, a set of strings, a list of strings, or the keys of a dict. Options for simplifying include: lower-casing capitals, removing separators, both (standard), or a custom set of simplifiers.

Parameters:

to_simplify (list, set, string) – the string(s) to simplify, presented by itself or as part of another data format
lower_case (bool, optional) – if the input shall be converted to lower case only (standard: True)
simplifier (str, optional) – the chars to be removed from the string; if of type bool and True, the standard chars `_ , \| \n ' & " % * - \` are used
Returns:	simplified values `{simplified_value: input_value}`
Return type:	dict
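The `{simplified_value: input_value}` mapping can be illustrated with a small stand-in. The function name `simplify_strings_sketch` and its `remove_chars` parameter are hypothetical; `remove_chars` plays the role of `simplifier` with a much shorter default character set than the one listed above.

```python
def simplify_strings_sketch(to_simplify, lower_case=True, remove_chars="_ -"):
    """Illustrative only: return {simplified_value: input_value}."""
    values = [to_simplify] if isinstance(to_simplify, str) else list(to_simplify)
    simplified = {}
    for value in values:
        reduced = value.lower() if lower_case else value
        # Drop every character listed in remove_chars.
        reduced = "".join(ch for ch in reduced if ch not in remove_chars)
        simplified[reduced] = value
    return simplified


mapping = simplify_strings_sketch(["First_Name", "last-name", "E Mail"])
```

Keeping the original value on the right-hand side makes it possible to translate a match on simplified strings back to the untouched input.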

datesy.matching.ease_match_similar(list_for_matching, list_to_be_matched_to, simplified=False, similarity_limit_for_matching=0.6, print_auto_matched=False)

Return a dictionary with list_for_matching as keys and list_to_be_matched_to as values, based on highest similarity. Matching twice to the same value is possible! The similarity limit below which matching stops is set by similarity_limit_for_matching. Faster than datesy.matching.match_comprehensive, but more likely to contain errors when the strings are very similar.

Parameters:

list_for_matching (list, set) – Iterable of strings which shall be matched
list_to_be_matched_to (list, set) – Iterable of strings which shall be matched to
simplified (False, "capital", "separators", "all", list, str, optional) – For reducing the values by lower-casing all letters, unifying and deleting separators, or removing any other list of strings provided
print_auto_matched (bool, optional) – Printing the matched entries during process (most likely for debugging)
similarity_limit_for_matching (float, optional) – For not matching the most irrelevant match which could exist

Returns:

match (dict) – {value_for_matching: value_to_be_mapped_to}
no_match (set) – A set of all values from list_for_matching that could not be matched
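The per-entry matching described above could look roughly like the following. This section does not state which similarity measure datesy uses, so `difflib.SequenceMatcher` from the standard library is a stand-in, and the name `match_most_similar` and its `similarity_limit` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def match_most_similar(list_for_matching, list_to_be_matched_to, similarity_limit=0.6):
    """Illustrative only: pair each entry with its most similar candidate."""
    match, no_match = {}, set()
    for value in list_for_matching:
        # Score every candidate and keep the best one.
        best_ratio, best_candidate = max(
            (SequenceMatcher(None, value, candidate).ratio(), candidate)
            for candidate in list_to_be_matched_to
        )
        if best_ratio >= similarity_limit:
            match[value] = best_candidate  # the same candidate may be reused
        else:
            no_match.add(value)
    return match, no_match


match, no_match = match_most_similar(
    ["first_name", "lastname", "zip"], ["firstname", "last_name", "postal_code"]
)
```

Because every entry is decided on its own, two entries can end up mapped to the same candidate, which is the trade-off noted in the description.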

datesy.matching.match_comprehensive(list_for_matching, list_to_be_matched_to, simplified=False)

Return a dictionary with list_for_matching as keys and list_to_be_matched_to as values, based on highest similarity. All values of both iterables are compared to each other and the highest similarities are picked. Slower than datesy.matching.ease_match_similar but more precise.

Parameters:

list_for_matching (list, set) – Iterable of strings which shall be matched
list_to_be_matched_to (list, set) – Iterable of strings which shall be matched to
simplified (False, "capital", "separators", "all", list, str, optional) – For reducing the values by lower-casing all letters, unifying and deleting separators, or removing any other list of strings provided

Returns:

match (dict) – {value_for_matching: value_to_be_mapped_to}
no_match (set) – A set of all values from list_for_matching that could not be matched
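One way to read "all values get compared and highest similarities are picked" is to score every pair first and then assign the best pairs greedily. The sketch below does that under several assumptions: `difflib.SequenceMatcher` stands in for the unspecified similarity measure, each candidate is used at most once, and the name `match_all_pairs` and its `similarity_limit` parameter are hypothetical (the documented signature has no such parameter).

```python
from difflib import SequenceMatcher
from itertools import product


def match_all_pairs(list_for_matching, list_to_be_matched_to, similarity_limit=0.6):
    """Illustrative only: score every pair, then assign the best pairs first."""
    scored = sorted(
        (
            (SequenceMatcher(None, value, candidate).ratio(), value, candidate)
            for value, candidate in product(list_for_matching, list_to_be_matched_to)
        ),
        reverse=True,
    )
    match, used = {}, set()
    for ratio, value, candidate in scored:
        if ratio < similarity_limit:
            break  # all remaining pairs are even less similar
        if value not in match and candidate not in used:
            match[value] = candidate
            used.add(candidate)
    no_match = set(list_for_matching) - set(match)
    return match, no_match


match, no_match = match_all_pairs(["name", "names"], ["names", "nam"])
```

In this example a per-entry approach would map both inputs to `"names"`; scoring all pairs first lets the exact match claim `"names"` and leaves `"nam"` for `"name"`.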

datesy.matching.match_similar_with_manual_selection(list_for_matching, list_to_be_matched_to, simplified=False, minimal_distance_for_automatic_matching=0.1, print_auto_matched=False, similarity_limit_for_manual_checking=0.6)

Return a dictionary with list_for_matching as keys and list_to_be_matched_to as values, based on highest similarity. All possible matches not matched automatically (set the limit with minimal_distance_for_automatic_matching) can be handled interactively. The similarity limit below which candidates are no longer considered is set by similarity_limit_for_manual_checking.

Parameters:

list_for_matching (list, set) – Iterable of strings which shall be matched
list_to_be_matched_to (list, set) – Iterable of strings which shall be matched to
simplified (False, "capital", "separators", "all", list, str, optional) – For reducing the values by lower-casing all letters, unifying and deleting separators, or removing any other list of strings provided
print_auto_matched (bool, optional) – Printing the matched entries during process (most likely for debugging)
minimal_distance_for_automatic_matching (float, optional) – If there is a vast difference between the most and second most similar value, the match is made automatically. This parameter sets the similarity distance that must be reached for automatic matching
similarity_limit_for_manual_checking (float, optional) – For not showing/matching the most irrelevant match which could exist

Returns:

match (dict) – {value_for_matching: value_to_be_mapped_to}
no_match (set) – A set of all values from list_for_matching that could not be matched
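The split between automatic and manual matching can be sketched without any interactive prompt by returning the close calls for later review. As before, `difflib.SequenceMatcher` is a stand-in for the unspecified similarity measure, and `split_auto_and_manual` with its shortened parameter names is a hypothetical helper, not part of datesy.

```python
from difflib import SequenceMatcher


def split_auto_and_manual(values, candidates, minimal_distance=0.1, similarity_limit=0.6):
    """Illustrative only: auto-match clear winners, queue close calls for review."""
    auto, needs_review, no_match = {}, {}, set()
    for value in values:
        ranked = sorted(
            ((SequenceMatcher(None, value, c).ratio(), c) for c in candidates),
            reverse=True,
        )
        # Only candidates above the similarity limit are worth considering.
        relevant = [(ratio, c) for ratio, c in ranked if ratio >= similarity_limit]
        if not relevant:
            no_match.add(value)
        elif len(relevant) == 1 or relevant[0][0] - relevant[1][0] >= minimal_distance:
            auto[value] = relevant[0][1]  # clear gap to the runner-up
        else:
            needs_review[value] = [c for _, c in relevant]
    return auto, needs_review, no_match


auto, needs_review, no_match = split_auto_and_manual(
    ["postal code", "name", "zip"], ["names", "nam", "postal_code"]
)
```

In this sketch `"postal code"` has a single relevant candidate and is matched automatically, `"name"` has two candidates that score too close together and is queued for review, and `"zip"` has no candidate above the limit.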