The datesy package
The datesy package is divided into 5 main components:
Subpackages
All actions of interacting with files are to be found here:
All actions of interacting with databases are to be found here:
Submodules
datesy.convert module
All actions of transforming data from different file formats are to be found here
datesy.convert.rows_to_dict(rows, main_key_position=0, null_value='delete', header_line=0, contains_open_ends=False)
Convert row-based data (e.g. csv) to a dictionary
Parameters: - rows (list) – the row based data to convert to dict
- main_key_position (int, optional) – if the main_key is not on the top left, its position can be specified
- null_value (any, optional) – if an empty field in the lists shall be represented somehow in the dictionary
- header_line (int, optional) – if the header_line is not the first one, its position can be specified
- contains_open_ends (bool, optional) – if the rows do not all have the same length (because each row ends with its last set entry), the length check for corrupted data can be skipped
Returns: dictionary containing the information from row-based data
Return type: dict
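To make the row-to-dictionary idea concrete, here is a minimal stand-alone sketch. It is not datesy's implementation: the helper name rows_to_dict_sketch is hypothetical, the exact nesting datesy produces may differ, and dropping empty fields is only an assumption about what null_value='delete' means.

```python
def rows_to_dict_sketch(rows, main_key_position=0, header_line=0):
    """Illustrative only: map each data row to {main_key: {header: value}}."""
    header = rows[header_line]
    result = {}
    for row in rows[header_line + 1:]:
        main_key = row[main_key_position]
        # Pair every other column with its header name.
        # Assumption: empty fields are dropped from the result.
        result[main_key] = {
            header[i]: value
            for i, value in enumerate(row)
            if i != main_key_position and value != ""
        }
    return result


rows = [
    ["name", "age", "city"],
    ["alice", "31", "Berlin"],
    ["bob", "", "Paris"],
]
converted = rows_to_dict_sketch(rows)
```

In this sketch the entry for "bob" has no "age" key because its field was empty.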
datesy.convert.dict_to_rows(data, main_key_name=None, main_key_position=None, if_empty_value=None, order=None)
Convert a dictionary to rows (list(lists))
Parameters: - data (dict) – the data to convert in form of a dictionary
- main_key_name (str, optional) – if the data isn’t provided as {main_key: data} the key needs to be specified
- main_key_position (int, optional) – if the main_key shall not be on top left of the data the position can be specified
- if_empty_value (any, optional) – if a main_key’s sub_key is not set something different than blank can be defined
- order (dict, list, None, optional) – if a special order for the keys is required
Returns: list of rows representing the csv based on the main_element_position
Return type: list(lists)
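The reverse direction can be sketched in the same spirit. The helper dict_to_rows_sketch below is hypothetical and only illustrates the general shape (a header row followed by one row per main key); the column ordering and the handling of main_key_position and order in datesy itself are not reproduced here.

```python
def dict_to_rows_sketch(data, main_key_name="key", if_empty_value=""):
    """Illustrative only: flatten {main_key: {sub_key: value}} into a header row plus data rows."""
    # Collect sub keys in first-seen order so the columns are stable
    sub_keys = []
    for entry in data.values():
        for sub_key in entry:
            if sub_key not in sub_keys:
                sub_keys.append(sub_key)

    rows = [[main_key_name] + sub_keys]
    for main_key, entry in data.items():
        # Missing sub keys are filled with the placeholder value
        rows.append([main_key] + [entry.get(k, if_empty_value) for k in sub_keys])
    return rows


data = {"alice": {"age": 31, "city": "Berlin"}, "bob": {"city": "Paris"}}
table = dict_to_rows_sketch(data, main_key_name="name")
```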
datesy.convert.pandas_data_frame_to_dict(data_frame, main_key_position=0, null_value='delete', header_line=0)
Convert a pandas.DataFrame to a dictionary
Parameters: - data_frame (pandas.core.frame.DataFrame) –
- main_key_position (int, optional) –
- null_value (any, optional) –
- header_line (int, optional) –
Returns: the dictionary representing the xlsx based on main_key_position
Return type: dict
datesy.convert.dict_to_pandas_data_frame(data, main_key_name=None, order=None, inverse=False)
Convert a dictionary to pandas.DataFrame
Parameters: - data (dict) – the dictionary to convert
- main_key_name (str, optional) – if the json or dict does not have the main key as a single {main_element : dict} present, it needs to be specified
- order (dict, list, optional) – list with the column names in order or dict with specified key positions
- inverse (bool, optional) – if columns and rows shall be switched
Returns: DataFrame representing the dictionary
Return type: pandas.DataFrame
datesy.convert.xml_to_standard_dict(ordered_data, reduce_orderedDicts=False, reduce_lists=False, manual_selection_for_list_reduction=False)
Convert an xml/OrderedDict to a normal dictionary
Parameters: - ordered_data (orderedDict) – input xml data to convert to standard dict
- reduce_orderedDicts (bool, optional) – if collections.OrderedDicts shall be converted to normal dicts
- reduce_lists (bool, list, set, optional) – if lists in the dictionary shall be converted to dictionaries with transformed keys (list_key + unique key from dictionary from list_element); if a list or set is provided, only these values will be reduced
- manual_selection_for_list_reduction (bool, optional) – if the decision on list reduction shall be made manually; all keys in reduce_lists will be reduced automatically
Returns: the normalized dictionary
Return type: dict
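The OrderedDict reduction described above can be illustrated with a short recursive sketch. The helper to_standard_dict_sketch is hypothetical and covers only the conversion of nested collections.OrderedDict objects to plain dicts; the list reduction controlled by reduce_lists is not shown.

```python
from collections import OrderedDict


def to_standard_dict_sketch(node):
    """Illustrative only: recursively replace OrderedDicts with plain dicts."""
    if isinstance(node, dict):  # OrderedDict is a dict subclass
        return {key: to_standard_dict_sketch(value) for key, value in node.items()}
    if isinstance(node, list):
        return [to_standard_dict_sketch(item) for item in node]
    return node


ordered = OrderedDict(
    [("root", OrderedDict([("item", [OrderedDict([("id", "1")]), OrderedDict([("id", "2")])])]))]
)
plain = to_standard_dict_sketch(ordered)
```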
datesy.inspect module
All actions of inspecting data are to be found here
datesy.inspect.find_header_line(data, header_keys)
Find the header line in a row-based data structure
NOT IMPLEMENTED YET: Version 0.9 feature
Parameters: - data (list, pandas.DataFrame) –
- header_keys (str, list, set) – some key(s) to find in a row
Returns: the header_line
Return type: int
datesy.inspect.find_key(data, key=None, regex_pattern=None)
Find a key in a complex dictionary
Parameters: - data (dict) – the data structure to find the key
- key (str, optional) – a string to be found
- regex_pattern (str, optional) – a regex match to be found
Returns: all matches and their path in the structure: {found_key: path_to_key}
Return type: dict
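A recursive search of this kind might look like the following sketch. It is an assumption-laden illustration, not datesy's code: the helper name find_key_sketch, the representation of the path as a list of keys, and the behaviour for repeated keys (the last occurrence wins) are all choices made for this example.

```python
import re


def find_key_sketch(data, key=None, regex_pattern=None, _path=()):
    """Illustrative only: return {found_key: path_to_key} for a nested dict."""
    matches = {}
    for current_key, value in data.items():
        path = _path + (current_key,)
        is_match = (key is not None and current_key == key) or (
            regex_pattern is not None
            and re.search(regex_pattern, str(current_key)) is not None
        )
        if is_match:
            # A key occurring several times overwrites earlier hits in this sketch
            matches[current_key] = list(path)
        if isinstance(value, dict):
            matches.update(find_key_sketch(value, key, regex_pattern, path))
    return matches


config = {"server": {"host": "localhost", "ports": {"http": 80, "https": 443}}}
exact = find_key_sketch(config, key="http")
by_pattern = find_key_sketch(config, regex_pattern=r"^http")
```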
datesy.matching module
All actions of mapping data to other data as well as the functions helpful for that are to be found here
datesy.matching.simplify_strings(to_simplify, lower_case=True, simplifier=True)
Simplify a string, set(strings), list(strings) or the keys of a dict
Options for simplifying include: lowering capitals, removing separators, both (standard), or an own set of simplifiers
Parameters: - to_simplify (list, set, string) – the string(s) to simplify presented by itself or as part of another data format
- lower_case (bool, optional) – if the input shall be converted to only lower_case (standard: True)
- simplifier (str, optional) – the chars to be removed from the string. If of type bool and True, the standard chars _ , | \n ' & " % * - \ are used
Returns: simplified values: {simplified_value: input_value}
Return type: dict
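The simplification and the shape of the returned mapping can be sketched as follows. The helper simplify_strings_sketch and the constant STANDARD_CHARS are hypothetical; the character list is only a reading of the standard chars named above (whether a blank is part of that set is not clear from the text, so it is left out here).

```python
# Assumed reading of the standard chars listed in the parameter description
STANDARD_CHARS = ["_", ",", "|", "\n", "'", "&", '"', "%", "*", "-", "\\"]


def simplify_strings_sketch(to_simplify, lower_case=True, chars=STANDARD_CHARS):
    """Illustrative only: return {simplified_value: input_value}."""
    if isinstance(to_simplify, str):
        to_simplify = [to_simplify]
    simplified = {}
    for original in to_simplify:
        value = original.lower() if lower_case else original
        for char in chars:
            value = value.replace(char, "")
        # Keep the original so the caller can map back after matching
        simplified[value] = original
    return simplified


result = simplify_strings_sketch(["First_Name", "E-Mail", "city"])
```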
datesy.matching.ease_match_similar(list_for_matching, list_to_be_matched_to, simplified=False, similarity_limit_for_matching=0.6, print_auto_matched=False)
Return a dictionary with list_for_matching as keys and list_to_be_matched_to as values based on most similarity. Matching twice to the same value is possible! The similarity limit for stopping the matching is set by similarity_limit_for_matching. Faster than datesy.matching.match_comprehensive, but more likely to contain errors when the strings are very similar.
Parameters: - list_for_matching (list, set) – Iterable of strings which shall be matched
- list_to_be_matched_to (list, set) – Iterable of strings which shall be matched to
- simplified (False, "capital", "separators", "all", list, str, optional) – For reducing the values by all small letters or unifying & deleting separators or any other list of strings provided
- print_auto_matched (bool, optional) – Printing the matched entries during process (most likely for debugging)
- similarity_limit_for_matching (float, optional) – For not matching the most irrelevant match which could exist
Returns: - match (dict) – {value_for_matching: value_to_be_mapped_to}
- no_match (set) – A set of all values from list_for_matching that could not be matched
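As a rough illustration of this kind of best-candidate matching, the sketch below uses difflib.SequenceMatcher from the Python standard library as the similarity measure. That choice, the helper name match_best_sketch and the comparison against the limit with >= are assumptions made for the example; the text above does not say how datesy computes similarity.

```python
from difflib import SequenceMatcher


def match_best_sketch(list_for_matching, list_to_be_matched_to, similarity_limit=0.6):
    """Illustrative only: pick the most similar candidate for each value independently."""
    match, no_match = {}, set()
    for value in list_for_matching:
        best_candidate, best_ratio = None, 0.0
        for candidate in list_to_be_matched_to:
            ratio = SequenceMatcher(None, value, candidate).ratio()
            if ratio > best_ratio:
                best_candidate, best_ratio = candidate, ratio
        # Values whose best candidate stays below the limit remain unmatched
        if best_candidate is not None and best_ratio >= similarity_limit:
            match[value] = best_candidate
        else:
            no_match.add(value)
    return match, no_match


match, no_match = match_best_sketch(
    ["colour", "adress", "zzz"], ["color", "address", "phone"]
)
```

Because every value picks its best candidate on its own, two values can end up matched to the same candidate, which mirrors the warning in the description.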
datesy.matching.match_comprehensive(list_for_matching, list_to_be_matched_to, simplified=False)
Return a dictionary with list_for_matching as keys and list_to_be_matched_to as values based on most similarity. All values of both iterables get compared to each other and highest similarities are picked. Slower than datesy.matching.ease_match_similar but more precise.
Parameters: - list_for_matching (list, set) – Iterable of strings which shall be matched
- list_to_be_matched_to (list, set) – Iterable of strings which shall be matched to
- simplified (False, "capital", "separators", "all", list, str, optional) – For reducing the values by all small letters or unifying & deleting separators or any other list of strings provided
Returns: - match (dict) – {value_for_matching: value_to_be_mapped_to}
- no_match (set) – A set of all values from list_for_matching that could not be matched
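One way to read "compare everything and pick the highest similarities" is a global greedy assignment over all pairs, sketched below. This is only an interpretation: the helper match_all_pairs_sketch, the use of difflib.SequenceMatcher, the one-to-one assignment and the similarity_limit parameter are assumptions for the example and are not taken from datesy.

```python
from difflib import SequenceMatcher
from itertools import product


def match_all_pairs_sketch(list_for_matching, list_to_be_matched_to, similarity_limit=0.6):
    """Illustrative only: score every pair, then assign the best pairs first."""
    scored = sorted(
        (
            (SequenceMatcher(None, value, candidate).ratio(), value, candidate)
            for value, candidate in product(list_for_matching, list_to_be_matched_to)
        ),
        reverse=True,
    )
    match, used_candidates = {}, set()
    for ratio, value, candidate in scored:
        if ratio < similarity_limit:
            break  # all remaining pairs are even less similar
        if value in match or candidate in used_candidates:
            continue
        match[value] = candidate
        used_candidates.add(candidate)
    no_match = set(list_for_matching) - set(match)
    return match, no_match


match, no_match = match_all_pairs_sketch(["apple", "apples"], ["apples", "maple"])
```

Here "apples" claims its exact counterpart first, so "apple" falls back to "maple" instead of also being matched to "apples". The price is that every pair has to be scored.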
datesy.matching.match_similar_with_manual_selection(list_for_matching, list_to_be_matched_to, simplified=False, minimal_distance_for_automatic_matching=0.1, print_auto_matched=False, similarity_limit_for_manual_checking=0.6)
Return a dictionary with list_for_matching as keys and list_to_be_matched_to as values based on most similarity. All possible matches not matched automatically (set the limit with minimal_distance_for_automatic_matching) can be handled interactively. The similarity limit for stopping the matching is set by similarity_limit_for_manual_checking.
Parameters: - list_for_matching (list, set) – Iterable of strings which shall be matched
- list_to_be_matched_to (list, set) – Iterable of strings which shall be matched to
- simplified (False, "capital", "separators", "all", list, str, optional) – For reducing the values by all small letters or unifying & deleting separators or any other list of strings provided
- print_auto_matched (bool, optional) – Printing the matched entries during process (most likely for debugging)
- minimal_distance_for_automatic_matching (float, optional) – If there is a vast difference between the best and the second-best matching value, the match is made automatically. This parameter sets the similarity distance that must be reached for automatic matching
- similarity_limit_for_manual_checking (float, optional) – For not showing/matching the most irrelevant match which could exist
Returns: - match (dict) – {value_for_matching: value_to_be_mapped_to}
- no_match (set) – A set of all values from list_for_matching that could not be matched
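The split between automatic and manual matching can be illustrated without any interactive prompt. In the sketch below, a value is matched automatically when its best candidate leads the runner-up by at least the minimal distance; otherwise the relevant candidates are returned for a manual decision. The helper split_auto_and_manual_sketch, the difflib.SequenceMatcher similarity and the exact comparison rules are assumptions for this example, not datesy's behaviour.

```python
from difflib import SequenceMatcher


def split_auto_and_manual_sketch(values, candidates, minimal_distance=0.1, similarity_limit=0.6):
    """Illustrative only: decide per value between automatic match and manual review."""
    auto, manual = {}, {}
    for value in values:
        ranked = sorted(
            ((SequenceMatcher(None, value, c).ratio(), c) for c in candidates),
            reverse=True,
        )
        # Candidates below the limit are too dissimilar to be offered at all
        relevant = [(ratio, c) for ratio, c in ranked if ratio >= similarity_limit]
        if not relevant:
            continue
        best_ratio, best = relevant[0]
        runner_up_ratio = ranked[1][0] if len(ranked) > 1 else 0.0
        if best_ratio - runner_up_ratio >= minimal_distance:
            auto[value] = best
        else:
            manual[value] = [c for _, c in relevant]
    return auto, manual


auto, manual = split_auto_and_manual_sketch(
    ["colour", "apple"], ["color", "apples", "applet"]
)
```

In this example "colour" has one clear winner, while "apple" is equally close to "apples" and "applet" and is therefore left for manual selection.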