| Title: | Custom 'MetaphoneBR' Phonetic Encoding for Brazilian Names |
|---|---|
| Description: | Simplifies Brazilian names phonetically using a custom 'metaphoneBR' algorithm that preserves ending vowels. Useful for name matching, since ending vowels in Portuguese generally carry gender information. Mation (2025) <doi:10.6082/uchicago.15104>. |
| Authors: | Rodrigo Borges [aut, cre] (ORCID: <https://orcid.org/0000-0003-2076-1424>), Lucas Mation [aut] (ORCID: <https://orcid.org/0000-0002-7461-932X>), Ipea - Institute for Applied Economic Research [cph, fnd] |
| Maintainer: | Rodrigo Borges <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.5 |
| Built: | 2026-05-28 07:52:53 UTC |
| Source: | https://github.com/ipeadata-lab/metaphonebr |
Applies a series of phonetic transformations to a vector of person names to generate a code that represents each name's approximate pronunciation in Brazilian Portuguese. The goal is to group similar-sounding names even when they are spelled differently.
metaphonebr(fullnames, verbose = FALSE)
| Argument | Description |
|---|---|
| fullnames | A character vector of names to be processed. |
| verbose | Logical; if TRUE, progress messages are shown. Defaults to FALSE. |
Processing involves the following steps:

1. Preprocessing: removal of accents and numbers, and capitalization.
2. Removal of silent letters (initial H).
3. Simplification of common digraphs (LH, NH, CH, SC, QU, etc.).
4. Simplification of similar-sounding consonants (C/K/S, G/J, Z/S, etc.).
5. Simplification of ending nasal sounds.
6. Removal of duplicated vowels.
7. Trimming of spaces and removal of duplicated letters.
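To make the shape of such a pipeline concrete, here is a minimal base R sketch of a few of the steps above. It is not the package's implementation: the function name `sketch_phonetic` is hypothetical, and every replacement rule in it (for example PH to F, CH to X, Y to I) is an assumption chosen for illustration. The actual rules and output of `metaphonebr()` may differ.

```r
# Illustrative sketch only: NOT the metaphonebr implementation.
# All specific replacement rules below are assumptions made for demonstration.
sketch_phonetic <- function(x) {
  # Preprocessing: uppercase, strip common accents, drop digits
  x <- toupper(x)
  x <- chartr("\u00c1\u00c0\u00c2\u00c3\u00c9\u00ca\u00cd\u00d3\u00d4\u00d5\u00da\u00c7",
              "AAAAEEIOOOUC", x)
  x <- gsub("[0-9]", "", x)

  # Silent letters: drop an H at the start of each word
  x <- gsub("\\bH", "", x, perl = TRUE)

  # Digraphs (assumed mappings)
  x <- gsub("PH", "F", x, fixed = TRUE)
  x <- gsub("CH", "X", x, fixed = TRUE)
  x <- gsub("QU", "K", x, fixed = TRUE)

  # Similar-sounding letters (assumed mappings)
  x <- gsub("Y", "I", x, fixed = TRUE)
  x <- gsub("C(?=[EI])", "S", x, perl = TRUE)  # soft C before E or I
  x <- gsub("C", "K", x, fixed = TRUE)         # remaining hard C
  x <- gsub("Z", "S", x, fixed = TRUE)

  # Ending nasal sounds: word-final N treated like M (assumed)
  x <- gsub("N\\b", "M", x, perl = TRUE)

  # Collapse repeated letters, then trim and squeeze whitespace
  x <- gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)
  gsub("\\s+", " ", trimws(x))
}

sketch_phonetic(c("Helena", "Elena", "Philippe", "Filipe"))
```

The order of the rules matters in a sketch like this: digraphs such as CH have to be rewritten before the single-letter C rules run, otherwise the C would be consumed first.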
This is an adaptation that does not strictly follow any published Metaphone algorithm, but was inspired by them and tailored to the Brazilian Portuguese context.
A character vector with the corresponding phonetic representation for each entry.

    example_names <- c("Jo\u00e3o Silva", "Joao da Silva", "Maria", "Marya",
                       "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
    phonetic_codes <- metaphonebr(example_names)
    print(data.frame(Original = example_names, metaphonebr = phonetic_codes))

    # With progress messages
    phonetic_codes_verbose <- metaphonebr("Exemplo \u00fanico", verbose = TRUE)
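
Because the function returns one code per input name, a common next step in name matching is to group the names that share a code. The sketch below shows one way this might be done in base R. The helper `group_by_code` is hypothetical and not part of the package, and the demonstration uses `toupper` as a stand-in encoder so that it runs without the package installed.

```r
# Group names that share a code.
# `encoder` is any function that maps a character vector to a vector of codes
# of the same length; with the package installed, metaphonebr could be passed.
group_by_code <- function(names, encoder) {
  codes <- encoder(names)
  stopifnot(length(codes) == length(names))
  split(names, codes)
}

# Stand-in encoder for demonstration only: toupper is NOT a phonetic algorithm.
demo_groups <- group_by_code(c("maria", "MARIA", "Elena"), toupper)
print(demo_groups)

# With metaphonebr installed, the same helper could be used as:
# groups <- group_by_code(example_names, metaphonebr::metaphonebr)
# Filter(function(g) length(g) > 1, groups)  # codes shared by two or more names
```

Passing the encoder as an argument keeps the grouping logic separate from the phonetic rules, so the same helper can be reused to compare different encodings.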