Cleaner Module

A module for cleaning tables, fields and values.

field_cleaner

def field_cleaner(field: str) -> str

Convert field string from an html response into a snake case variable.

Arguments:

  • field str - A dirty field string.

Example:

Input              Output
Previous Close     previous_close
Avg. Volume        avg_volume
Beta (5Y Monthly)  beta_five_year_monthly
PE Ratio (TTM)     pe_ratio_ttm

Returns:

  • str - lowercased and converted to snake case.
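The conversion in the example table can be approximated with the sketch below. It is a minimal stand-in, not the module's actual implementation: `field_cleaner_sketch` is a hypothetical name, and the `5y` to `five_year` substitution is an assumption inferred from the `Beta (5Y Monthly)` row.

```python
import re


def field_cleaner_sketch(field: str) -> str:
    """Hypothetical stand-in for field_cleaner, written for illustration."""
    # Lowercase first so the remaining steps only deal with one case.
    text = field.lower()
    # Assumption: "5y" is spelled out, as the example table suggests.
    text = text.replace("5y", "five_year")
    # Keep runs of letters, digits and underscores; drop punctuation.
    words = re.findall(r"[a-z0-9_]+", text)
    return "_".join(words)


print(field_cleaner_sketch("Beta (5Y Monthly)"))  # beta_five_year_monthly
```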

table_cleaner

def table_cleaner(html_table: HTML) -> Optional[Dict]

Clean table with two fields.

Arguments:

  • html_table HTML - HTML object parsed from a table section.

Returns:

  • dict - cleaned field names as keys and strings as values.
  • None - if html_table does not contain table elements.

cleaner

  • A partial that wraps the pydantic.validator function with common arguments.

CommonCleaners

Contains the most commonly used methods for cleaning values.

remove_comma

 | def remove_comma(value: str) -> str

Remove commas from strings and strip whitespace.

Arguments:

  • value str - A number with more than 3 digits.

Example:

Input Output
'5,000' '5000'

Returns:

  • str - Commas removed and whitespace stripped.

remove_brakets

 | def remove_brakets(value: str) -> str

Remove () brackets from strings and strip whitespace.

Arguments:

  • value str - A string containing () brackets, normally surrounding a percent change.

Example:

Input            Output
+19.60 (+1.36%)  +19.60 +1.36%

Returns:

  • str - () brackets removed and whitespace stripped.

remove_percent_sign

 | def remove_percent_sign(value: str) -> str

Remove percent sign % from string and strip whitespace.

Arguments:

  • value str - Containing percent sign.

Example:

Input Output
+1.36% +1.36

Returns:

  • str - Percent sign % removed and whitespace stripped.

remove_brakets_and_percent_sign

 | def remove_brakets_and_percent_sign(cls, value: str) -> str

Remove () brackets and % percent signs from string.

Arguments:

  • value str - Contains () brackets and a % percent sign.

Example:

Input            Output
+19.60 (+1.36%)  +19.60 +1.36

Returns:

  • str - () brackets and % percent sign removed.
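The string helpers above are simple enough to sketch with `str.replace` and `str.strip`. The functions below are hypothetical stand-ins (note the `_sketch` suffix), written as plain functions rather than class methods, and are only meant to show how the combined cleaner could be composed from the two simpler ones.

```python
def remove_comma_sketch(value: str) -> str:
    """Drop thousands separators, e.g. '5,000' becomes '5000'."""
    return value.replace(",", "").strip()


def remove_brackets_sketch(value: str) -> str:
    """Drop ( and ) but leave everything else in place."""
    return value.replace("(", "").replace(")", "").strip()


def remove_percent_sign_sketch(value: str) -> str:
    """Drop every % character."""
    return value.replace("%", "").strip()


def remove_brackets_and_percent_sign_sketch(value: str) -> str:
    """Compose the two helpers: brackets first, then percent signs."""
    return remove_percent_sign_sketch(remove_brackets_sketch(value))


print(remove_brackets_and_percent_sign_sketch("+19.60 (+1.36%)"))  # +19.60 +1.36
```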

value_is_missing

 | def value_is_missing(value: str) -> bool

Check if value has missing data.

Checks whether "N/A" or another missing-data string is present in the value.

Arguments:

  • value str - A value parsed from Yahoo Finance.

Example:

Input Output
"Ex-Dividend Date N/A" True

Returns:

  • bool - True if a missing data string is found in value else False.
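A substring check like the one described could look as follows. This is a sketch under assumptions: the documentation only names "N/A" explicitly, so `MISSING_MARKERS` contains just that one string, and both names are hypothetical.

```python
# Assumption: only "N/A" is documented; the real list of markers is not shown.
MISSING_MARKERS = ("N/A",)


def value_is_missing_sketch(value: str) -> bool:
    """Return True if any missing-data marker appears anywhere in value."""
    return any(marker in value for marker in MISSING_MARKERS)


print(value_is_missing_sketch("Ex-Dividend Date N/A"))  # True
print(value_is_missing_sketch("5,000"))                 # False
```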

has_large_number_suffix

 | def has_large_number_suffix(value: str) -> bool

Check if string representation of a number has T,B,M,K suffix.

Arguments:

  • value str - A value to check if contains a T,B,M,K suffix.

Example:

Input Output
225.0M True

Returns:

  • bool - True if string value contains T,B,M,K suffix else False.

clean_large_number

 | def clean_large_number(value: str) -> Optional[int]

Convert a string representation of a number with a T,B,M,K suffix to an int.

Arguments:

  • value str - A value which contains T,B,M,K suffix.

Example:

Input Output
2.5B 2_500_000_000

Returns:

  • int - suffix removed and used as multiplier to convert to an int.
  • None - Returned only if value has no recognized suffix. This should not happen in practice, because has_large_number_suffix should be called first to decide whether this method needs to run.
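The check-then-convert pattern described for these two methods can be sketched as below. The names and the multiplier table are assumptions made for illustration; `Decimal` is used instead of `float` so that values such as "2.5B" convert without rounding surprises.

```python
from decimal import Decimal
from typing import Optional

# Assumed mapping from suffix to multiplier.
MULTIPLIERS = {
    "K": 1_000,
    "M": 1_000_000,
    "B": 1_000_000_000,
    "T": 1_000_000_000_000,
}


def has_large_number_suffix_sketch(value: str) -> bool:
    """True if the last character is one of T, B, M, K."""
    # Slicing with [-1:] is safe on an empty string.
    return value.strip()[-1:] in MULTIPLIERS


def clean_large_number_sketch(value: str) -> Optional[int]:
    """Convert e.g. '2.5B' to 2_500_000_000, or None without a suffix."""
    value = value.strip()
    if not has_large_number_suffix_sketch(value):
        return None
    number, suffix = value[:-1], value[-1]
    return int(Decimal(number) * MULTIPLIERS[suffix])


print(clean_large_number_sketch("2.5B"))  # 2500000000
```

In this sketch a non-numeric prefix (for example a ticker ending in "M") would raise `decimal.InvalidOperation`, so it assumes the caller only passes numeric strings.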

common_value_cleaner

 | def common_value_cleaner(cls, value: str) -> Union[int, str]

The most common method for cleaning values from Yahoo Finance.

Removes commas and converts number if it has a suffix.

Arguments:

  • value str - value to be cleaned.

Example:

Input    Output     Type
"5,000"  "5000"     str
"2.5M"   2_500_000  int

Returns:

  • str - If value is cleaned and doesn't have a suffix.
  • int - If value is cleaned and has a suffix.

clean_common_values

 | def clean_common_values(cls, value: str) -> Optional[Union[int, str]]

The most common method for cleaning values, with an added check for missing values.

Arguments:

  • value str - value to be cleaned.

Example:

Input    Output     Type
"5,000"  "5000"     str
"2.5M"   2_500_000  int
"N/A"    None       None

Returns:

  • str - If value is cleaned and doesn't have a suffix.
  • int - If value is cleaned and has a suffix.
  • None - If value is missing.
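The three outcomes in the table (str, int, None) follow from chaining a missing-value check, comma removal, and suffix conversion. The sketch below shows one way that chain could be written; every name in it is hypothetical, and the "N/A" check and multiplier table are assumptions rather than the class's real internals.

```python
from decimal import Decimal
from typing import Optional, Union

# Assumed suffix multipliers.
MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def common_value_cleaner_sketch(value: str) -> Union[int, str]:
    """Remove commas, then convert to int only if a suffix is present."""
    cleaned = value.replace(",", "").strip()
    if cleaned[-1:] in MULTIPLIERS:
        return int(Decimal(cleaned[:-1]) * MULTIPLIERS[cleaned[-1]])
    return cleaned


def clean_common_values_sketch(value: str) -> Optional[Union[int, str]]:
    """Same as above, but return None for missing data."""
    if "N/A" in value:  # assumption: "N/A" is the missing-data marker
        return None
    return common_value_cleaner_sketch(value)


for raw in ("5,000", "2.5M", "N/A"):
    print(repr(raw), "->", repr(clean_common_values_sketch(raw)))
```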

clean_basic_percentage

 | def clean_basic_percentage(cls, value: str) -> Optional[str]

Clean a single percentage value.

Arguments:

  • value str - value to be cleaned.

Example:

Input    Output  Type
"-3.4%"  "-3.4"  str
"N/A"    None    None

Returns:

  • str - cleaned value if value is not missing.
  • None - if value is missing.

clean_date

 | def clean_date(cls, value: str) -> DateTime

Clean and convert a string date.

Uses the pendulum parse method to extract datetime information. Yahoo Finance sometimes gives multiple dates in one value field, normally in the Earnings Date section on the Summary page, where the Earnings Date may be a single date or an estimated date range. Although this method could easily return a pendulum.period.Period object representing the range, it returns only the start of the earnings period for consistency.

Arguments:

  • value str - Date string to be converted to datetime object.

Example:

Input Output
"Earnings Date Oct 26, 2020 - Oct 30, 2020" DateTime 2020-10-26T00:00:00+00:00

Returns:

  • DateTime - parsed date (the start of the range when a range is given) if value is not missing.
  • None - if value is missing.
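The "return the start of the range" decision can be illustrated with the standard library alone. The sketch below is not the pendulum-based implementation: it swaps in `re` and `datetime.strptime` as a stand-in, assumes dates look like "Oct 26, 2020", assumes UTC, and uses a hypothetical function name. `%b` parsing also assumes an English locale.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed date shape: abbreviated month, day, four-digit year.
DATE_PATTERN = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4}")


def clean_date_sketch(value: str) -> Optional[datetime]:
    """Return the first date found in value, or None if there is none."""
    match = DATE_PATTERN.search(value)  # search() stops at the first date
    if match is None:
        return None
    parsed = datetime.strptime(match.group(), "%b %d, %Y")
    return parsed.replace(tzinfo=timezone.utc)


start = clean_date_sketch("Earnings Date Oct 26, 2020 - Oct 30, 2020")
print(start.isoformat())  # 2020-10-26T00:00:00+00:00
```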

clean_symbol

 | def clean_symbol(cls, value: str) -> str

Make symbol uppercase.

Arguments:

  • value str - Stock symbol.

Example:

Input Output
aapl AAPL

Returns:

  • str - Uppercased.

clean_first_value_split_by_dash

 | def clean_first_value_split_by_dash(cls, value: str) -> str

Split value separated by '-' and return the first value.

Arguments:

  • value str - A string with multiple values separated by a '-'.

Example:

Input Output
"2.3400 - 2.4900" "2.3400"

Returns:

  • str - First value parsed from the string.
  • None - if value is missing.

clean_second_value_split_by_dash

 | def clean_second_value_split_by_dash(cls, value: str) -> str

Split value separated by '-' and return the second value.

Arguments:

  • value str - A string with multiple values separated by a '-'.

Example:

Input Output
"2.3400 - 2.4900" "2.4900"

Returns:

  • str - Second value parsed from the string.
  • None - if value is missing.
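Both dash-splitting methods reduce to the same split followed by picking an index. A minimal sketch, assuming "N/A" marks missing data and using a hypothetical helper name:

```python
from typing import Optional


def split_by_dash_sketch(value: str, index: int) -> Optional[str]:
    """Split on '-' and return the part at index, or None if missing."""
    if "N/A" in value:  # assumption: "N/A" is the missing-data marker
        return None
    # Strip each part so the spaces around the dash are discarded.
    parts = [part.strip() for part in value.split("-")]
    return parts[index]


print(split_by_dash_sketch("2.3400 - 2.4900", 0))  # 2.3400
print(split_by_dash_sketch("2.3400 - 2.4900", 1))  # 2.4900
```

A plain split on "-" would mis-handle negative numbers, so this sketch is only suitable for ranges of non-negative values such as the one in the example.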

clean_first_value_split_by_space

 | def clean_first_value_split_by_space(cls, value: str) -> str

Clean first string containing a change and percent.

This strips brackets and the percent sign, leaving only whitespace between the change and the percent change, then splits the values and returns the first one. It is normally used for the Forward Dividend & Yield field on the Summary page.

Arguments:

  • value str - Normally a string containing change and percent change.

Example:

Input Output
"0.82 (0.73%)" "0.82"

Returns:

  • str - The first value.
  • None - if value is missing.

clean_second_value_split_by_space

 | def clean_second_value_split_by_space(cls, value: str) -> str

Clean second string containing a change and percent.

This strips brackets and the percent sign, leaving only whitespace between the change and the percent change, then splits the values and returns the second one. It is normally used for the Forward Dividend & Yield field on the Summary page.

Arguments:

  • value str - Normally a string containing change and percent change.

Example:

Input Output
"0.82 (0.73%)" "0.73"

Returns:

  • str - The second value.
  • None - if value is missing.
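The strip-then-split steps described for the space-separated cleaners can be sketched as one helper that takes the position to return. The helper name and the "N/A" check are assumptions for illustration only.

```python
from typing import Optional


def split_change_and_percent_sketch(value: str, index: int) -> Optional[str]:
    """Strip ( ) and %, split on whitespace, return the part at index."""
    if "N/A" in value:  # assumption: "N/A" is the missing-data marker
        return None
    stripped = value.replace("(", "").replace(")", "").replace("%", "")
    # str.split() with no argument splits on any run of whitespace.
    return stripped.split()[index]


print(split_change_and_percent_sketch("0.82 (0.73%)", 0))  # 0.82
print(split_change_and_percent_sketch("0.82 (0.73%)", 1))  # 0.73
```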

clean_first_value_split_by_x

 | def clean_first_value_split_by_x(cls, value: str) -> str

Split value separated by 'x' and return the first value.

Arguments:

  • value str - A string with multiple values separated by a 'x'.

Example:

Input Output
"2.4000 x 21500" "2.4000"

Returns:

  • str - First value parsed from the string.
  • None - if value is missing.

clean_second_value_split_by_x

 | def clean_second_value_split_by_x(cls, value: str) -> str

Split value separated by 'x' and return the second value.

Arguments:

  • value str - A string with multiple values separated by a 'x'.

Example:

Input Output
"2.4000 x 21500" "21500"

Returns:

  • str - Second value parsed from the string.
  • None - if value is missing.