sanskrit_text package
Submodules
sanskrit_text.cli module
Console Script for sanskrit-text
Module contents
Sanskrit Text Utility
- sanskrit_text.ord_unicode(ch: str) str [source]
Get the 4-character Unicode identifier corresponding to a character
- Parameters:
ch (str) – Single character
- Returns:
4-character unicode identifier
- Return type:
str
- sanskrit_text.chr_unicode(u: str) str [source]
Get the Unicode character corresponding to a 4-character identifier
- Parameters:
u (str) – 4-character unicode identifier
- Returns:
Single character
- Return type:
str
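The two helpers above are inverses of each other. The round trip can be sketched with the standard library alone. The `_sketch` names are hypothetical, and the uppercase, zero-padded hexadecimal format is an assumption about the identifier rather than confirmed behaviour of the package.

```python
def ord_unicode_sketch(ch: str) -> str:
    """Return the code point of a single character as 4 hex digits."""
    # Assumption: the identifier is uppercase hexadecimal, zero-padded to 4.
    return f"{ord(ch):04X}"


def chr_unicode_sketch(u: str) -> str:
    """Return the character for a 4-digit hexadecimal code point."""
    return chr(int(u, 16))


identifier = ord_unicode_sketch("अ")            # DEVANAGARI LETTER A
print(identifier)                               # 0905
print(chr_unicode_sketch(identifier) == "अ")    # True
```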
- sanskrit_text.form_pratyaahaara(letters: List[str]) str [source]
Form a pratyaahaara from a list of letters
- sanskrit_text.resolve_pratyaahaara(pratyaahaara: str) List[List[str]] [source]
Resolve a pratyaahaara into all possible lists of characters
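A pratyaahaara names a run of letters in the Maheshvara sutras by its first letter and a closing marker. Because some letters and markers occur more than once, one pratyaahaara can denote several runs, which is presumably why the return type is a list of lists. The sketch below is an independent illustration of that idea, not the package's implementation. The function name, the table layout and the assumed input form (start letter, marker consonant, virama) are all assumptions.

```python
from typing import List

# The 14 Maheshvara sutras as (letters, closing marker) pairs.
# Markers are stored without the trailing virama.
SUTRAS = [
    ("अ इ उ", "ण"),
    ("ऋ ऌ", "क"),
    ("ए ओ", "ङ"),
    ("ऐ औ", "च"),
    ("ह य व र", "ट"),
    ("ल", "ण"),
    ("ञ म ङ ण न", "म"),
    ("झ भ", "ञ"),
    ("घ ढ ध", "ष"),
    ("ज ब ग ड द", "श"),
    ("ख फ छ ठ थ च ट त", "व"),
    ("क प", "य"),
    ("श ष स", "र"),
    ("ह", "ल"),
]


def resolve_pratyaahaara_sketch(pratyaahaara: str) -> List[List[str]]:
    """Return every letter list that the pratyaahaara could denote."""
    # Assumed input form: start letter, marker consonant, virama.
    start, marker = pratyaahaara[0], pratyaahaara[1]
    groups = [letters.split() for letters, _ in SUTRAS]
    results = []
    for i, letters in enumerate(groups):
        if start not in letters:
            continue
        head = letters[letters.index(start):]
        # Try every later (or same) sutra that closes with the marker.
        for k in range(i, len(SUTRAS)):
            if SUTRAS[k][1] != marker:
                continue
            tail = [ch for group in groups[i + 1:k + 1] for ch in group]
            results.append(head + tail)
    return results


VIRAMA = "\u094D"
print(resolve_pratyaahaara_sketch("अच" + VIRAMA))        # one list: the nine vowels
print(len(resolve_pratyaahaara_sketch("अण" + VIRAMA)))   # 2, the marker ण occurs twice
```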
- sanskrit_text.clean(text: str, punct: bool = False, digits: bool = False, spaces: bool = True, allow: Optional[list] = None) str [source]
Clean a line of Sanskrit (Devanagari) text
- Parameters:
text (str) – Input string
punct (bool, optional) – If True, punctuation is kept. The default is False.
digits (bool, optional) – If True, digits are kept. The default is False.
spaces (bool, optional) – If False, spaces are removed. Changing the default is not recommended unless a specific use case requires it. The default is True.
allow (list, optional) – List of characters to allow. The default is None.
- Returns:
Clean version of the string
- Return type:
str
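How the four options above could interact is sketched below. This is a stand-alone approximation, not the package's code. The function name and the exact character classes are assumptions: Devanagari block U+0900 to U+097F as letters, danda and ASCII punctuation as punctuation, Devanagari and ASCII digits as digits. Collapsing leftover whitespace is also an assumption.

```python
import string
from typing import List, Optional

DANDA = "\u0964\u0965"   # । and ॥
DEVANAGARI_DIGITS = "".join(chr(c) for c in range(0x0966, 0x0970))


def clean_sketch(text: str, punct: bool = False, digits: bool = False,
                 spaces: bool = True, allow: Optional[List[str]] = None) -> str:
    """Keep Devanagari letters and signs; keep other classes on request."""
    allowed = set(allow or [])
    if punct:
        allowed.update(DANDA + string.punctuation)
    if digits:
        allowed.update(DEVANAGARI_DIGITS + string.digits)

    kept = []
    for ch in text:
        is_letter = ("\u0900" <= ch <= "\u097F"
                     and ch not in DANDA and ch not in DEVANAGARI_DIGITS)
        if is_letter or ch in allowed or (spaces and ch.isspace()):
            kept.append(ch)
    result = "".join(kept)
    # Assumption: runs of whitespace left by removed characters are collapsed.
    return " ".join(result.split()) if spaces else result


sample = "रामः, 123 abc।"
print(clean_sketch(sample))                 # रामः
print(clean_sketch(sample, punct=True))     # रामः, ।
print(clean_sketch(sample, digits=True))    # रामः 123
```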
- sanskrit_text.split_lines(text: str, pattern='[।॥\\r\\n]+') List[str] [source]
Split a string into a list of strings using regular expression
- Parameters:
text (str) – Input string
pattern (regexp, optional) – Regular expression corresponding to the split points. The default is r'[।॥\r\n]+'.
- Returns:
List of strings
- Return type:
List[str]
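The default pattern treats any run of danda, double danda, carriage return or newline as a single split point. A minimal sketch with `re.split` follows. The function name is hypothetical, and dropping empty pieces and surrounding whitespace is an assumption, not documented behaviour.

```python
import re
from typing import List


def split_lines_sketch(text: str, pattern: str = r"[।॥\r\n]+") -> List[str]:
    """Split text at every run of characters matched by the pattern."""
    parts = re.split(pattern, text)
    # Assumption: surrounding whitespace and empty pieces are discarded.
    return [part.strip() for part in parts if part.strip()]


print(split_lines_sketch("क ख।ग॥\nघ"))   # ['क ख', 'ग', 'घ']
```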
- sanskrit_text.toggle_matra(syllable: str) str [source]
Change a Laghu syllable to Guru and a Guru syllable to Laghu (if possible)
- sanskrit_text.get_anunaasika(ch: str) str [source]
Get the appropriate anunaasika from the character’s group
- sanskrit_text.fix_anuswara(text: str) str [source]
Check every anuswaara in the text and change to anunaasika if applicable
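The substitution described above follows a standard orthographic convention: an anusvara written before a consonant of one of the five vargas can be replaced by the nasal of that varga followed by a virama. The sketch below illustrates that rule only. The function names are hypothetical, and returning None for letters outside the vargas is an assumption; the package may handle those letters, and other edge cases, differently.

```python
from typing import Optional

ANUSVARA = "\u0902"   # ं
VIRAMA = "\u094D"     # ्

# The five vargas; the last letter of each is its nasal (anunaasika).
VARGAS = ["कखगघङ", "चछजझञ", "टठडढण", "तथदधन", "पफबभम"]


def get_anunaasika_sketch(ch: str) -> Optional[str]:
    """Return the nasal of the varga that contains ch, if any."""
    for varga in VARGAS:
        if ch in varga:
            return varga[-1]
    return None


def fix_anusvara_sketch(text: str) -> str:
    """Replace an anusvara before a varga consonant with nasal + virama."""
    out = []
    for i, ch in enumerate(text):
        nasal = None
        if ch == ANUSVARA and i + 1 < len(text):
            nasal = get_anunaasika_sketch(text[i + 1])
        out.append(nasal + VIRAMA if nasal else ch)
    return "".join(out)


print(fix_anusvara_sketch("स\u0902गीत"))   # सङ्गीत
```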
- sanskrit_text.get_syllables_word(word: str, technical: bool = False) List[str] [source]
Get syllables from a Sanskrit (Devanagari) word
- Parameters:
word (str) – Sanskrit (Devanagari) word to get syllables from. Spaces, if present, are ignored.
technical (bool, optional) – If True, ensures that each element contains at most one Swara or Vyanjana. The default is False.
- Returns:
List of syllables
- Return type:
List[str]
- sanskrit_text.get_syllables(text: str, technical: bool = False) List[List[List[str]]] [source]
Get syllables from a Sanskrit (Devanagari) text
- Parameters:
text (str) – Sanskrit (Devanagari) text to get syllables from
technical (bool, optional) – If True, ensures that each element contains at most one Swara or Vyanjana. The default is False.
- Returns:
List of syllables in a nested list format. Nesting Levels: Text -> Lines -> Words
- Return type:
List[List[List[str]]]
- sanskrit_text.split_varna_word(word: str, technical: bool = True) List[str] [source]
Obtain the Varna decomposition of a Sanskrit (Devanagari) word
- Parameters:
word (str) – Sanskrit (Devanagari) word to be split.
technical (bool, optional) – If True, vowels and vowel signs are treated independently in the split, which is more useful for analysis. The default is True.
- Returns:
List of Varna
- Return type:
List[str]
- sanskrit_text.split_varna(text: str, technical: bool = True, flat: bool = False) List[List[List[str]]] [source]
Obtain the Varna decomposition of a Sanskrit (Devanagari) text
- Parameters:
text (str) – Sanskrit (Devanagari) text to be split.
technical (bool, optional) – If True, vowels and vowel signs are treated independently in the split, which is more useful for analysis. The default is True.
flat (bool, optional) – If True, a single list is returned instead of nested lists. The default is False.
- Returns:
Varna decomposition of the text in a nested list format. Nesting Levels: Text -> Lines -> Words
The Varna decomposition of each word is a List[str].
Each line is a list of the Varna decompositions of its words.
The text is a list of the Varna decompositions of its lines.
If flat=True, the Varna decomposition of the entire text is returned as a single list that also contains whitespace markers. Lines are separated by a newline character '\n' and words are separated by a space character ' '.
- Return type:
List[List[List[str]]] or List[str]
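The relationship between the nested and the flat return shapes can be sketched as a plain flattening step. The helper name is hypothetical and the strings in the example are placeholders, not real Varna output. The sketch only illustrates where the '\n' and ' ' markers would fall.

```python
from typing import List


def flatten_sketch(nested: List[List[List[str]]]) -> List[str]:
    """Flatten Text -> Lines -> Words into one list with separators."""
    flat: List[str] = []
    for line_index, line in enumerate(nested):
        if line_index:
            flat.append("\n")      # marker between lines
        for word_index, word in enumerate(line):
            if word_index:
                flat.append(" ")   # marker between words
            flat.extend(word)
    return flat


# Two lines: the first has two words, the second has one.
nested = [[["v1", "v2"], ["v3"]], [["v4", "v5"]]]
print(flatten_sketch(nested))
# ['v1', 'v2', ' ', 'v3', '\n', 'v4', 'v5']
```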
- sanskrit_text.join_varna(viccheda: str, technical: bool = True) str [source]
Join Varna decomposition to form a Sanskrit (Devanagari) word
- Parameters:
viccheda (list) – Viccheda output obtained from split_varna_word with technical=True (or from split_varna with technical=True and flat=True). IMPORTANT: technical=True is necessary.
technical (bool) – WARNING: Currently unused. Value of the same parameter passed to split_varna_word.
Note
Currently works only for a viccheda generated with technical=True.
- Returns:
s – Sanskrit word
- Return type:
str
- sanskrit_text.get_ucchaarana_vector(letter: str, abbrev=False) Dict[str, int] [source]
Get ucchaarana sthaana and prayatna based vector of a letter
- Parameters:
letter (str) – Sanskrit letter
abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.
- Returns:
vector – One-hot vector indicating utpatti sthaana, aabhyantara prayatna and baahya prayatna of a letter
- Return type:
Dict[str, int]
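A one-hot vector of this kind maps every possible category name to 0 or 1. The sketch below shows only the general shape of such a dictionary. The helper name and the romanised category names are placeholders chosen for illustration; the actual keys returned by the package are not shown in this reference.

```python
from typing import Dict, Iterable


def one_hot_sketch(categories: Iterable[str],
                   active: Iterable[str]) -> Dict[str, int]:
    """Mark each category with 1 if it applies to the letter, else 0."""
    active_set = set(active)
    return {name: int(name in active_set) for name in categories}


# Placeholder category names (hypothetical, heavily abridged).
CATEGORIES = ["kantha", "taalu", "oshtha", "sprishta", "ghosha", "aghosha"]
vector = one_hot_sketch(CATEGORIES, ["kantha", "sprishta", "aghosha"])
print(vector)
# {'kantha': 1, 'taalu': 0, 'oshtha': 0, 'sprishta': 1, 'ghosha': 0, 'aghosha': 1}
```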
- sanskrit_text.get_ucchaarana_vectors(word: str, abbrev: bool = False) List[Tuple[str, Dict[str, int]]] [source]
Get ucchaarana sthaana and prayatna based vector of a word or text
- Parameters:
word (str) – Sanskrit word (or text)
abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.
- Returns:
vectors – List of (letter, vector)
- Return type:
List[Tuple[str, Dict[str, int]]]
- sanskrit_text.get_signature_letter(letter: str, abbrev: bool = False) Dict[str, str] [source]
Get ucchaarana sthaana and prayatna based signature of a letter
- Parameters:
letter (str) – Sanskrit letter
abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.
- Returns:
signature – utpatti sthaana, aabhyantara prayatna and baahya prayatna of a letter
- Return type:
Dict[str, str]
- sanskrit_text.get_signature_word(word: str, abbrev: bool = False) List[Tuple[str, Dict[str, str]]] [source]
Get ucchaarana sthaana and prayatna based signature of a word
- Parameters:
word (str) – Sanskrit word (or text). Caution: If multiple words are provided, the spaces are not included in the output list.
abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.
- Returns:
List of (letter, signature)
- Return type:
List[Tuple[str, Dict[str, str]]]
- sanskrit_text.get_signature(text: str, abbrev: bool = False) List[List[List[Tuple[str, Dict[str, str]]]]] [source]
Get ucchaarana sthaana and prayatna based signature of a Sanskrit text
- Parameters:
text (str) – Sanskrit text (can contain newlines, spaces)
abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.
- Returns:
List of (letter, signature) for words in a nested list format. Nesting Levels: Text -> Lines -> Words
- Return type:
List[List[List[Tuple[str, Dict[str, str]]]]]
- sanskrit_text.get_ucchaarana_letter(letter: str, dimension: int = 0, abbrev: bool = False) str [source]
Get ucchaarana sthaana or prayatna of a letter
- Parameters:
letter (str) – Sanskrit letter
dimension (int) –
0: sthaana
1: aabhyantara prayatna
2: baahya prayatna
The default is 0.
abbrev (bool) –
- If True,
The output will contain English abbreviations
- Otherwise,
The output will contain Sanskrit names
The default is False.
- Returns:
ucchaarana sthaana or prayatna of a letter
- Return type:
str
- sanskrit_text.get_ucchaarana_word(word: str, dimension: int = 0, abbrev: bool = False) List[Tuple[str, str]] [source]
Get ucchaarana of a word
- Parameters:
word (str) –
Sanskrit word (or text)
Caution: If multiple words are provided, the spaces are not included in the output list
dimension (int) –
0: sthaana
1: aabhyantara prayatna
2: baahya prayatna
The default is 0.
abbrev (bool) –
- If True,
The output will contain English abbreviations
- Otherwise,
The output will contain Sanskrit names
The default is False.
- Returns:
List of (letter, ucchaarana)
- Return type:
List[Tuple[str, str]]
- sanskrit_text.get_ucchaarana(text: str, dimension: int = 0, abbrev: bool = False) List[List[List[Tuple[str, str]]]] [source]
Get ucchaarana list of a Sanskrit text
- Parameters:
text (str) – Sanskrit text (can contain newlines, spaces)
dimension (int) –
0: sthaana
1: aabhyantara prayatna
2: baahya prayatna
The default is 0.
abbrev (bool) –
- If True,
The output will contain English abbreviations
- Otherwise,
The output will contain Sanskrit names
The default is False.
- Returns:
List of (letter, ucchaarana) for words in a nested list format. Nesting Levels: Text -> Lines -> Words
- Return type:
List[List[List[Tuple[str, str]]]]
- sanskrit_text.get_sthaana_letter(letter: str, abbrev: bool = False)[source]
Wrapper for get_ucchaarana_letter for sthaana
- sanskrit_text.get_sthaana_word(word: str, abbrev: bool = False)[source]
Wrapper for get_ucchaarana_word for sthaana
- sanskrit_text.get_sthaana(text: str, abbrev: bool = False)[source]
Wrapper for get_ucchaarana for sthaana
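The wrapper functions presumably fix the dimension argument of the general function: 0 for sthaana, 1 for aabhyantara prayatna, 2 for baahya prayatna. The sketch below shows that pattern with `functools.partial` over a stand-in lookup. The table, its romanised values and all `_sketch` names are hypothetical placeholders, not the package's data or output strings.

```python
from functools import partial

# Hypothetical, heavily abridged table:
# letter -> (sthaana, aabhyantara prayatna, baahya prayatna)
UCCHAARANA = {
    "क": ("kantha", "sprishta", "aghosha"),
    "ग": ("kantha", "sprishta", "ghosha"),
    "प": ("oshtha", "sprishta", "aghosha"),
}


def get_ucchaarana_letter_sketch(letter: str, dimension: int = 0) -> str:
    """Look up one dimension (0, 1 or 2) of a letter's articulation."""
    return UCCHAARANA[letter][dimension]


# Each wrapper is the general function with the dimension fixed.
get_sthaana_letter_sketch = partial(get_ucchaarana_letter_sketch, dimension=0)
get_aabhyantara_letter_sketch = partial(get_ucchaarana_letter_sketch, dimension=1)
get_baahya_letter_sketch = partial(get_ucchaarana_letter_sketch, dimension=2)

print(get_sthaana_letter_sketch("प"))   # oshtha
print(get_baahya_letter_sketch("ग"))    # ghosha
```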
- sanskrit_text.get_aabhyantara_letter(letter: str, abbrev: bool = False)[source]
Wrapper for get_ucchaarana_letter for aabhyantara
- sanskrit_text.get_aabhyantara_word(word: str, abbrev: bool = False)[source]
Wrapper for get_ucchaarana_word for aabhyantara
- sanskrit_text.get_aabhyantara(text: str, abbrev: bool = False)[source]
Wrapper for get_ucchaarana for aabhyantara
- sanskrit_text.get_baahya_letter(letter: str, abbrev: bool = False)[source]
Wrapper for get_ucchaarana_letter for baahya