sanskrit_text package

Submodules

sanskrit_text.cli module

Console Script for sanskrit-text

sanskrit_text.cli.main()

Console Script for sanskrit-text

Module contents

Sanskrit Text Utility

sanskrit_text.ord_unicode(ch: str) -> str

Get Unicode 4-character-identifier corresponding to a character

Parameters:

ch (str) – Single character

Returns:

4-character unicode identifier

Return type:

str

sanskrit_text.chr_unicode(u: str) -> str

Get the Unicode character corresponding to a 4-character identifier

Parameters:

u (str) – 4-character unicode identifier

Returns:

Single character

Return type:

str
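
The round trip between a character and its identifier can be sketched with the built-ins ord and chr. This is a minimal illustration, not the package's implementation: it assumes the identifier is the code point written as four uppercase hexadecimal digits, and the function names with the _sketch suffix are hypothetical.

```python
def ord_unicode_sketch(ch: str) -> str:
    """Return the code point of a single character as 4 hex digits.

    Assumption: uppercase, zero-padded hexadecimal (e.g. '0905').
    """
    return format(ord(ch), "04X")


def chr_unicode_sketch(u: str) -> str:
    """Return the character whose code point is the hex identifier u."""
    return chr(int(u, 16))


print(ord_unicode_sketch("अ"))      # 0905
print(chr_unicode_sketch("0905"))   # अ
```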

sanskrit_text.form_pratyaahaara(letters: List[str]) -> str

Form a pratyaahaara from a list of letters

sanskrit_text.resolve_pratyaahaara(pratyaahaara: str) -> List[List[str]]

Resolve pratyaahaara into all possible lists of characters

sanskrit_text.clean(text: str, punct: bool = False, digits: bool = False, spaces: bool = True, allow: Optional[list] = None) -> str

Clean a line of Sanskrit (Devanagari) text

Parameters:
  • text (str) – Input string

  • punct (bool, optional) – If True, punctuation is kept. The default is False.

  • digits (bool, optional) – If True, digits are kept. The default is False.

  • spaces (bool, optional) – If False, spaces are removed. It is recommended to not change the default value unless it is specifically relevant to a use-case. The default is True.

  • allow (list, optional) – List of characters to allow. The default is None.

Returns:

Clean version of the string

Return type:

str
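
The effect of the flags can be illustrated with a small stand-alone filter. This is a sketch under stated assumptions, not the package's code: it treats only the danda and double danda as punctuation, only Devanagari digits as digits, keeps other characters from the Devanagari block (U+0900 to U+097F), and collapses runs of spaces. The name clean_sketch is hypothetical.

```python
import re

DANDAS = "\u0964\u0965"  # danda and double danda (assumed punctuation set)
DEV_DIGITS = "".join(chr(c) for c in range(0x0966, 0x0970))  # Devanagari 0-9


def clean_sketch(text, punct=False, digits=False, spaces=True, allow=None):
    """Keep Devanagari characters; other classes are kept only if enabled."""
    extra = set(allow or [])
    kept = []
    for ch in text:
        if ch in extra:
            kept.append(ch)            # explicitly allowed characters
        elif ch in DANDAS:
            if punct:
                kept.append(ch)
        elif ch in DEV_DIGITS:
            if digits:
                kept.append(ch)
        elif ch.isspace():
            if spaces:
                kept.append(" ")
        elif "\u0900" <= ch <= "\u097F":
            kept.append(ch)            # letters, vowel signs, virama, ...
    # Collapse repeated spaces left behind by removed characters
    return re.sub(r" +", " ", "".join(kept)).strip()


print(clean_sketch("रामः, (abc) 123 गच्छति।"))
print(clean_sketch("रामः, (abc) 123 गच्छति।", punct=True))
```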

sanskrit_text.split_lines(text: str, pattern='[।॥\\r\\n]+') -> List[str]

Split a string into a list of strings using a regular expression

Parameters:
  • text (str) – Input string

  • pattern (regexp, optional) – Regular expression corresponding to the split points. The default is r'[।॥\r\n]+'.

Returns:

List of strings

Return type:

List[str]
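
The default pattern splits on any run of dandas, double dandas, carriage returns and newlines. A minimal sketch with re.split is shown below; stripping whitespace and dropping empty pieces are assumptions about the behaviour, and split_lines_sketch is a hypothetical name.

```python
import re


def split_lines_sketch(text, pattern=r"[।॥\r\n]+"):
    """Split text at the pattern and drop empty fragments (assumed)."""
    parts = re.split(pattern, text)
    return [part.strip() for part in parts if part.strip()]


verse = "धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः।\nमामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय॥"
for line in split_lines_sketch(verse):
    print(line)
```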

sanskrit_text.trim_matra(line: str) -> str

Trim matra from the end of a string

sanskrit_text.is_laghu(syllable: str) -> bool

Checks if the current syllable is Laghu

sanskrit_text.toggle_matra(syllable: str) -> str

Change the Laghu syllable to Guru and Guru to Laghu (if possible)

sanskrit_text.marker_to_swara(m: str) -> str

Convert a Matra to the corresponding Swara

sanskrit_text.swara_to_marker(s: str) -> str

Convert a Swara to the corresponding Matra
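
The Matra/Swara correspondence is a one-to-one mapping between dependent vowel signs and independent vowels. The sketch below covers the ten common pairs only; the table, the function names and the choice to return an empty string for unknown input are assumptions made for illustration.

```python
# Dependent vowel sign (Matra) -> independent vowel (Swara)
MATRA_TO_SWARA = {
    "\u093E": "\u0906",  # ा -> आ
    "\u093F": "\u0907",  # ि -> इ
    "\u0940": "\u0908",  # ी -> ई
    "\u0941": "\u0909",  # ु -> उ
    "\u0942": "\u090A",  # ू -> ऊ
    "\u0943": "\u090B",  # ृ -> ऋ
    "\u0947": "\u090F",  # े -> ए
    "\u0948": "\u0910",  # ै -> ऐ
    "\u094B": "\u0913",  # ो -> ओ
    "\u094C": "\u0914",  # ौ -> औ
}
# Inverse table for the opposite direction
SWARA_TO_MATRA = {swara: matra for matra, swara in MATRA_TO_SWARA.items()}


def marker_to_swara_sketch(m: str) -> str:
    return MATRA_TO_SWARA.get(m, "")


def swara_to_marker_sketch(s: str) -> str:
    return SWARA_TO_MATRA.get(s, "")


print(marker_to_swara_sketch("\u093E"))  # आ
```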

sanskrit_text.get_anunaasika(ch: str) -> str

Get the appropriate anunaasika from the character’s group

sanskrit_text.fix_anuswara(text: str) -> str

Check every anuswaara in the text and change to anunaasika if applicable
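
The two functions above can be pictured as follows: each of the five consonant groups (varga) ends in its own nasal, and an anuswaara directly before a group consonant is replaced by that nasal plus a virama. This is a simplified sketch, assuming only the five varga rows are handled; the names ending in _sketch are hypothetical.

```python
ANUSWARA = "\u0902"
VIRAMA = "\u094D"
# The five varga rows; the last letter of each row is its nasal
VARGAS = ["कखगघङ", "चछजझञ", "टठडढण", "तथदधन", "पफबभम"]


def get_anunaasika_sketch(ch: str) -> str:
    """Return the nasal of the varga containing ch, or '' if none."""
    if len(ch) != 1:
        return ""
    for varga in VARGAS:
        if ch in varga:
            return varga[-1]
    return ""


def fix_anuswara_sketch(text: str) -> str:
    """Replace an anuswaara before a varga consonant with nasal + virama."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        nasal = get_anunaasika_sketch(nxt) if ch == ANUSWARA else ""
        out.append(nasal + VIRAMA if nasal else ch)
    return "".join(out)


print(fix_anuswara_sketch("संगम"))
```

An anuswaara before a non-varga consonant (for example before स) is left unchanged by this sketch.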

sanskrit_text.get_syllables_word(word: str, technical: bool = False) -> List[str]

Get syllables from a Sanskrit (Devanagari) word

Parameters:
  • word (str) – Sanskrit (Devanagari) word to get syllables from. Spaces, if present, are ignored.

  • technical (bool, optional) – If True, ensures that each element contains at most one Swara or Vyanjana. The default is False.

Returns:

List of syllables

Return type:

List[str]

sanskrit_text.get_syllables(text: str, technical: bool = False) -> List[List[List[str]]]

Get syllables from a Sanskrit (Devanagari) text

Parameters:
  • text (str) – Sanskrit (Devanagari) text to get syllables from

  • technical (bool, optional) – If True, ensures that each element contains at most one Swara or Vyanjana. The default is False.

Returns:

List of syllables in a nested list format. Nesting Levels: Text -> Lines -> Words

Return type:

List[List[List[str]]]

sanskrit_text.split_varna_word(word: str, technical: bool = True) -> List[str]

Obtain the Varna decomposition of a Sanskrit (Devanagari) word

Parameters:
  • word (str) – Sanskrit (Devanagari) word to be split.

  • technical (bool, optional) – If True, vowels and vowel signs are treated independently in the split, which is more useful for analysis. The default is True.

Returns:

List of Varna

Return type:

List[str]
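
A Varna decomposition separates each consonant from the vowel that follows it. The sketch below is a simplified illustration and not the package's algorithm: it assumes every consonant is emitted with a virama, a following vowel sign is kept as a vowel sign, and the inherent vowel is emitted as अ. The function name is hypothetical.

```python
VIRAMA = "\u094D"
A = "\u0905"  # independent vowel अ, used for the inherent vowel
CONSONANTS = {chr(c) for c in range(0x0915, 0x093A)}  # क .. ह
MATRAS = {chr(c) for c in range(0x093E, 0x094D)}      # dependent vowel signs


def split_varna_word_sketch(word: str) -> list:
    varnas = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in CONSONANTS:
            varnas.append(ch + VIRAMA)       # bare consonant
            nxt = word[i + 1] if i + 1 < len(word) else ""
            if nxt == VIRAMA:
                i += 1                       # explicit virama: no vowel
            elif nxt in MATRAS:
                varnas.append(nxt)           # vowel sign kept as-is
                i += 1
            else:
                varnas.append(A)             # inherent vowel
        else:
            varnas.append(ch)                # vowels, anuswaara, visarga, ...
        i += 1
    return varnas


print(split_varna_word_sketch("राम"))
```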

sanskrit_text.split_varna(text: str, technical: bool = True, flat: bool = False) -> List[List[List[str]]]

Obtain the Varna decomposition of a Sanskrit (Devanagari) text

Parameters:
  • text (str) – Sanskrit (Devanagari) text to be split.

  • technical (bool, optional) – If True, vowels and vowel signs are treated independently in the split, which is more useful for analysis. The default is True.

  • flat (bool, optional) – If True, a single list is returned instead of nested lists. The default is False.

Returns:

Varna decomposition of the text in a nested list format. Nesting Levels: Text -> Lines -> Words

  • Varna decomposition of each word is a List[char].

  • List of Varna decomposition of each word from a line.

  • List of Varna decomposition of each line from the text.

If flat=True, the Varna decomposition of the entire text is presented as a single list that also contains whitespace markers. Lines are separated by a newline character '\n' and words are separated by a space character ' '.

Return type:

List[List[List[str]]] or List[str]
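
The relation between the nested and the flat return formats can be sketched as a plain list transformation. The helper name flatten_sketch and the hand-written input are hypothetical, and placing markers only between items (no trailing marker) is an assumption.

```python
def flatten_sketch(nested):
    """Flatten Text -> Lines -> Words into one list with ' ' and '\\n' markers."""
    flat = []
    for line_index, line in enumerate(nested):
        if line_index:
            flat.append("\n")          # marker between lines
        for word_index, word in enumerate(line):
            if word_index:
                flat.append(" ")       # marker between words
            flat.extend(word)
    return flat


# Hand-written stand-in for a nested result: 2 lines, 3 words in total
nested = [[["a", "b"], ["c"]], [["d", "e"]]]
print(flatten_sketch(nested))  # ['a', 'b', ' ', 'c', '\n', 'd', 'e']
```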

sanskrit_text.join_varna(viccheda: str, technical: bool = True) -> str

Join Varna decomposition to form a Sanskrit (Devanagari) word

Parameters:
  • viccheda (list) – Viccheda output obtained from split_varna_word with technical=True, or from split_varna with technical=True and flat=True. IMPORTANT: technical=True is necessary.

  • technical (bool) – WARNING: Currently unused. Value of the same parameter passed to split_varna_word

Note

Currently only works for the viccheda generated with technical=True

Returns:

s – Sanskrit word

Return type:

str
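
Joining reverses the decomposition: a consonant followed by अ regains its inherent vowel, a consonant followed by a vowel sign absorbs the sign, and any other consonant keeps its virama. The sketch below assumes the input format described for technical=True (consonants carrying a virama, vowel signs kept as signs); it is an illustration with a hypothetical name, not the package's implementation.

```python
VIRAMA = "\u094D"
A = "\u0905"  # independent vowel अ, standing for the inherent vowel
MATRAS = {chr(c) for c in range(0x093E, 0x094D)}  # dependent vowel signs


def join_varna_sketch(viccheda) -> str:
    out = []
    i = 0
    while i < len(viccheda):
        v = viccheda[i]
        nxt = viccheda[i + 1] if i + 1 < len(viccheda) else ""
        if len(v) == 2 and v[1] == VIRAMA:
            if nxt == A:
                out.append(v[0])         # inherent vowel: drop the virama
                i += 1
            elif nxt in MATRAS:
                out.append(v[0] + nxt)   # attach the vowel sign
                i += 1
            else:
                out.append(v)            # part of a conjunct or word-final
        else:
            out.append(v)
        i += 1
    return "".join(out)


print(join_varna_sketch(["र्", "ा", "म्", "अ"]))
```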

sanskrit_text.get_ucchaarana_vector(letter: str, abbrev=False) -> Dict[str, int]

Get ucchaarana sthaana and prayatna based vector of a letter

Parameters:
  • letter (str) – Sanskrit letter

  • abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.

Returns:

vector – One-hot vector indicating utpatti sthaana, aabhyantara prayatna and baahya prayatna of a letter

Return type:

Dict[str, int]
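
The shape of such a vector can be sketched as a dictionary with one 0/1 entry per category name. The category inventory below is deliberately abbreviated and its romanised key names are hypothetical; the actual keys returned by the package are not reproduced here.

```python
# Abbreviated, hypothetical category inventory for illustration only
CATEGORIES = {
    "sthaana": ["kantha", "taalu", "muurdhaa", "danta", "oshtha"],
    "aabhyantara": ["sprishta", "iishatsprishta", "vivrita"],
    "baahya": ["ghosha", "aghosha", "alpapraana", "mahaapraana"],
}

# Hand-written signature of the letter क (velar, unvoiced, unaspirated stop)
KA_SIGNATURE = {
    "sthaana": ["kantha"],
    "aabhyantara": ["sprishta"],
    "baahya": ["aghosha", "alpapraana"],
}


def to_vector_sketch(signature):
    """Mark every category name present in the signature with 1, else 0."""
    active = {name for names in signature.values() for name in names}
    return {
        name: int(name in active)
        for names in CATEGORIES.values()
        for name in names
    }


vector = to_vector_sketch(KA_SIGNATURE)
print(vector["kantha"], vector["taalu"])  # 1 0
```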

sanskrit_text.get_ucchaarana_vectors(word: str, abbrev: bool = False) -> List[Tuple[str, Dict[str, int]]]

Get ucchaarana sthaana and prayatna based vector of a word or text

Parameters:
  • word (str) – Sanskrit word (or text)

  • abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.

Returns:

vectors – List of (letter, vector)

Return type:

List[Tuple[str, Dict[str, int]]]

sanskrit_text.get_signature_letter(letter: str, abbrev: bool = False) -> Dict[str, str]

Get ucchaarana sthaana and prayatna based signature of a letter

Parameters:
  • letter (str) – Sanskrit letter

  • abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.

Returns:

signature – utpatti sthaana, aabhyantara prayatna and baahya prayatna of a letter

Return type:

Dict[str, str]

sanskrit_text.get_signature_word(word: str, abbrev: bool = False) -> List[Tuple[str, Dict[str, str]]]

Get ucchaarana sthaana and prayatna based signature of a word

Parameters:
  • word (str) – Sanskrit word (or text). Caution: If multiple words are provided, the spaces are not included in the output list.

  • abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.

Returns:

List of (letter, signature)

Return type:

List[Tuple[str, Dict[str, str]]]

sanskrit_text.get_signature(text: str, abbrev: bool = False) -> List[List[List[Tuple[str, Dict[str, str]]]]]

Get ucchaarana sthaana and prayatna based signature of a Sanskrit text

Parameters:
  • text (str) – Sanskrit text (can contain newlines, spaces)

  • abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.

Returns:

List of (letter, signature) for words in a nested list format. Nesting Levels: Text -> Lines -> Words

Return type:

List[List[List[Tuple[str, Dict[str, str]]]]]

sanskrit_text.get_ucchaarana_letter(letter: str, dimension: int = 0, abbrev: bool = False) -> str

Get ucchaarana sthaana or prayatna of a letter

Parameters:
  • letter (str) – Sanskrit letter

  • dimension (int) –

    • 0: sthaana

    • 1: aabhyantara prayatna

    • 2: baahya prayatna

    The default is 0.

  • abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.

Returns:

ucchaarana sthaana or prayatna of a letter

Return type:

str

sanskrit_text.get_ucchaarana_word(word: str, dimension: int = 0, abbrev: bool = False) -> List[Tuple[str, str]]

Get ucchaarana of a word

Parameters:
  • word (str) –

    Sanskrit word (or text)

    Caution: If multiple words are provided, the spaces are not included in the output list

  • dimension (int) –

    • 0: sthaana

    • 1: aabhyantara prayatna

    • 2: baahya prayatna

    The default is 0.

  • abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.

Returns:

List of (letter, ucchaarana)

Return type:

List[Tuple[str, str]]

sanskrit_text.get_ucchaarana(text: str, dimension: int = 0, abbrev: bool = False) -> List[List[List[Tuple[str, str]]]]

Get ucchaarana list of a Sanskrit text

Parameters:
  • text (str) – Sanskrit text (can contain newlines, spaces)

  • dimension (int) –

    • 0: sthaana

    • 1: aabhyantara prayatna

    • 2: baahya prayatna

    The default is 0.

  • abbrev (bool) – If True, the output will contain English abbreviations; otherwise, the output will contain Sanskrit names. The default is False.

Returns:

List of (letter, ucchaarana) for words in a nested list format. Nesting Levels: Text -> Lines -> Words

Return type:

List[List[List[Tuple[str, str]]]]

sanskrit_text.get_sthaana_letter(letter: str, abbrev: bool = False)

Wrapper for get_ucchaarana_letter for sthaana

sanskrit_text.get_sthaana_word(word: str, abbrev: bool = False)

Wrapper for get_ucchaarana_word for sthaana

sanskrit_text.get_sthaana(text: str, abbrev: bool = False)

Wrapper for get_ucchaarana for sthaana

sanskrit_text.get_aabhyantara_letter(letter: str, abbrev: bool = False)

Wrapper for get_ucchaarana_letter for aabhyantara

sanskrit_text.get_aabhyantara_word(word: str, abbrev: bool = False)

Wrapper for get_ucchaarana_word for aabhyantara

sanskrit_text.get_aabhyantara(text: str, abbrev: bool = False)

Wrapper for get_ucchaarana for aabhyantara

sanskrit_text.get_baahya_letter(letter: str, abbrev: bool = False)

Wrapper for get_ucchaarana_letter for baahya

sanskrit_text.get_baahya_word(word: str, abbrev: bool = False)

Wrapper for get_ucchaarana_word for baahya

sanskrit_text.get_baahya(text: str, abbrev: bool = False)

Wrapper for get_ucchaarana for baahya
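
The wrapper pattern used by the sthaana, aabhyantara and baahya helpers can be sketched with functools.partial, fixing the dimension argument of a generic lookup (0: sthaana, 1: aabhyantara prayatna, 2: baahya prayatna). The lookup table, its romanised labels and all names ending in _sketch are hypothetical stand-ins; how the package actually defines its wrappers is not shown in this reference.

```python
from functools import partial

# Tiny hypothetical table: letter -> (sthaana, aabhyantara, baahya)
_TABLE = {
    "\u0915": ("kantha", "sprishta", "aghosha"),  # क
    "\u092A": ("oshtha", "sprishta", "aghosha"),  # प
}


def ucchaarana_letter_sketch(letter: str, dimension: int = 0) -> str:
    """Look up one dimension of a letter's articulation."""
    return _TABLE[letter][dimension]


# Each wrapper fixes the dimension of the generic lookup
get_sthaana_letter_sketch = partial(ucchaarana_letter_sketch, dimension=0)
get_aabhyantara_letter_sketch = partial(ucchaarana_letter_sketch, dimension=1)
get_baahya_letter_sketch = partial(ucchaarana_letter_sketch, dimension=2)

print(get_sthaana_letter_sketch("\u0915"))  # kantha
```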