Skip to content

Importation and standardization (ETL) of the Electoral History of the Brazilian TSE.

Notifications You must be signed in to change notification settings

renovabr/electoral-history

Repository files navigation

Brazilian Electoral History

RenovaBR Foundation

ETL project that collects data from the Brazilian Electoral Data Repository and imports it into a MySQL database, and then the data is placed in Elasticsearch using the ELK stack. Electoral years available: 2010, 2012, 2014, 2016, 2018, 2020. The full import of every year totals 28.842.859 million lines in the database.

Synopsis

This is a raw data processing (ETL) project for standardizing information for a MySQL Database. The Brazilian Electoral History data is made available by TSE on the website TSE Electoral Data Repository.

The process of importing the data into an SQL database aims to facilitate the development of other systems as well as for statistical analysis.

Architecture

This project uses the following technologies:

Five tables are created in the database called (electoral_history) to store the data. Are they:

Table Lines Description
raw_tse_consult_candidates 16.177.96 Contains all candidates.
raw_tse_voting_cand_city 20.954.033 Contains candidate votes by city.
raw_tse_voting_party_city 16.671.05 Contains party votes by city.
raw_tse_cand_goods_declared 31.428.04 Contains the goods declared candidates.
cand_info 14.611.21 Merge with candidate information.

Requeriments

You must have it installed on your workstation:

And preferably use a GNU Linux distribution.

1. Get the code and data

Now run the commands below to compile the project:

$ git clone https://github.com/renovabr/electoral-history.git
$ cd electoral-history

Enter pipenv at the root of the electoral-history folder for the tests:

$ pipenv shell
$ pipenv install

The tse-data script downloads the raw data using the year as a parameter. Example to download the year 2020:

$ ./tse-data 2020

You can also download them all using the command:

$ for i in "2010" "2012" "2014" "2016" "2018" "2020"; do ./tse-data ${i}; done

Wait for the data to download! The data is downloaded in the project folder at: data/tse/YEAR

2. Set MySQL database

Access the database with user root check if everything is working.

$ mysql -h 127.0.0.1 -u root -p

Then run the sql/schema.sql that is within the project.

$ mysql -h 127.0.0.1 -u root -p < sql/schema.sql

Note: A database will be created called: electoral_historal with three tables and a user with access to a database called winston. The database connection settings can be changed in the config.ini file.

3. Import all data to MySQL

The scripts below import the data from the 4 table:

  • consulta-cand -> raw_tse_consult_candidates
  • votacao-candidato -> raw_tse_voting_cand_city
  • votacao-partido -> raw_tse_voting_party_city
  • bem-candidato -> raw_tse_cand_goods_declared

The tables with indication raw_ are the raw data. At the end a merge of all tables is created, this final result is available in the cand_info table.

3.1 Importing candidate information

To import the candidates data for all years, use the script: run-import-all.

$ ./01-run-import-all 'consulta-cand'

If you want to import a specific year and state you can use the command:

$ cd src
$ ./01-consulta-cand.py -y 2020 \
  -s 'AC' -p ../data/tse/2020/consulta_cand_2020/ -e csv

Or import the year all from all states:

$ cd src
$ ./01-consulta-cand.py -y 2020 \
  -p ../data/tse/2020/consulta_cand_2020/ -e csv

Note: The years (2010 and 2012) the extension is txt, the others are csv.

3.2 Import of candidates' votes

The same procedure applies to the other tables. To import all candidates voting data, use the command:

$ ./01-run-import-all 'votacao-candidato'

If you want to import a specific year and state you can use the command:

$ cd src
$ ./02-votacao-candidato-munzona.py -y 2020 \
  -s 'AC' -p ../data/tse/2020/votacao_candidato_munzona_2020/ -e csv

Or import the year all from all states:

$ cd src
$ ./02-votacao-candidato-munzona.py -y 2020 \
  -p ../data/tse/2020/votacao_candidato_munzona_2020/ -e csv

Note: The years (2010 and 2012) the extension is txt, the others are csv.

3.3 Importing of votes by parties

For the import of votes by party:

$ ./01-run-import-all 'votacao-partido'

If you want to import a specific year and state you can use the command:

$ cd src 
$ ./03-votacao-partido-munzona.py -y 2020 \
  -s 'AC' -p ../data/tse/2020/votacao_partido_munzona_2020/ -e csv

Or import the year all from all states:

$ cd src 
$ ./03-votacao-partido-munzona.py -y 2020 \
  -p ../data/tse/2020/votacao_partido_munzona_2020/ -e csv

Note: The years (2010 and 2012) the extension is txt, the others are csv.

3.4 Import of candidates' declared goods

To import candidates' declared goods:

$ ./01-run-import-all 'bem-candidato'

If you want to import a specific year and state you can use the command:

$ cd src
$ ./04-bens-candidato.py -y 2020 \
  -s 'AC' -p ../data/tse/2020/bem_candidato_2020/ -e csv

Or import the year all from all states:

$ cd src
$ ./04-bens-candidato.py -y 2020 \
  -p ../data/tse/2020/bem_candidato_2020/ -e csv

Note: The years (2010 and 2012) the extension is txt, the others are csv.

4. Computing candidate votes and information

This table is a junction of candidate data with the voting base.

For the import of all years:

$ ./02-run-cand-info

Note: The script automatically imports data from the shift 1 and after the shift 2.

If you want a specific year, shift 1:

$ cd src
$ ./05-cand-info.py -y 2020 -t 1

Same year for shift 2:

$ cd src
$ ./05-cand-info.py -y 2020 -t 2

If you want to listen to only one state and save it in a CSV file, use the command:

$ cd src
$ ./05-cand-info.py -y 2020 -t 1 -s 'AC' -e ac.csv

Or to process every year use the command:

$ cd src
$ for i in "2010" "2012" "2014" "2016" "2018" "2020"; do ./04-cand-info.py -y ${i} -t 1; done
$ for i in "2010" "2012" "2014" "2016" "2018" "2020"; do ./04-cand-info.py -y ${i} -t 2; done

The script must be executed per year and the results are saved in table cand_info

5. Processing Vice-mayor Data

To process the number of votes and other information for vice-mayor positions, it is necessary to use the script:

$ ./03-run-cand-info-vices

Note: The script automatically imports data from the shift 1 and after the shift 2.

If you want a specific year, shift 1:

$ cd src
$ ./05-cand-info-vices.py -y 2020 -t 1

Same year for shift 2:

$ cd src
$ ./05-cand-info-vices.py -y 2020 -t 2

6. Checking and analyzing data

There is a data dictionary that can be downloaded here: Dictionary.

All tables are partitioned by the year of election.

SELECT COUNT(1) FROM raw_tse_consult_candidates PARTITION (p2020);

Checking positions in the 2016 election:

SELECT
  ds_position 
FROM
  raw_tse_voting_cand_city 
WHERE
  election_year = '2016' 
GROUP BY
  1 
ORDER BY
  1;
2016
Prefeito
Vereador

Checking positions in the 2018 election:

SELECT
  ds_position 
FROM
  raw_tse_voting_cand_city 
WHERE
  election_year = '2018' 
GROUP BY
  1 
ORDER BY
  1;
2018
Deputado Distrital
Deputado Estadual
Deputado Federal
Governador
Senador

Checking the total number of votes of the Governors of the state of Santa Catarina in the city of Florianópolis in the first shift of the 2018 elections.

SELECT
  sq_candidate AS SQ,
  nm_ballot_candidate AS Name,
  ds_position AS Position,
  nm_city AS City,
  format(sum(qt_votes_nominal), 0, 'de_DE') AS Votes 
FROM
  raw_tse_voting_cand_city 
WHERE
  election_year = '2018' 
  AND sg_uf = 'SC' 
  AND cd_city = 81051 
  AND nr_shift = 1 
  AND cd_position = 3 
GROUP BY
  1,
  2,
  3,
  4 
ORDER BY
  sum(qt_votes_nominal) DESC;
SQ Name Position City Votes
240000609724 COMANDANTE MOISÉS Governador FLORIANÓPOLIS 73.947
240000621321 GELSON MERÍSIO Governador FLORIANÓPOLIS 59.524
240000609537 MAURO MARIANI Governador FLORIANÓPOLIS 43.796
240000624336 DÉCIO LIMA Governador FLORIANÓPOLIS 39.144
240000601841 CAMASÃO Governador FLORIANÓPOLIS 19.362
240000616318 PORTANOVA Governador FLORIANÓPOLIS 4.844
240000610038 INGRID ASSIS Governador FLORIANÓPOLIS 1.644
240000614244 JESSE PEREIRA Governador FLORIANÓPOLIS 1.281

Checking the total number of votes of the Governors of the State of Santa Catarina in the second shit of the 2018 elections.

SELECT
  sq_candidate AS SQ,
  nm_ballot_candidate AS Name,
  ds_position AS Position,
  format(sum(qt_votes_nominal), 0, 'de_DE') AS Votes 
FROM
  raw_tse_voting_cand_city 
WHERE
  election_year = '2018' 
  AND sg_uf = 'SC' 
  AND nr_shift = 2 
  AND cd_position = 3 
GROUP BY
  1,
  2,
  3 
ORDER BY
  sum(qt_votes_nominal) DESC;
SQ Name Position Votes
240000609724 COMANDANTE MOISÉS Governador 2.644.179
240000621321 GELSON MERÍSIO Governador 1.075.242

Checking the total number of votes of the Governors of the State of São Paulo in the first shift of the 2018 elections.

SELECT
  sq_candidate AS SQ,
  nm_ballot_candidate AS Name,
  ds_position AS Position,
  format(sum(qt_votes_nominal), 0, 'de_DE') AS Votes 
FROM
  raw_tse_voting_cand_city 
WHERE
  election_year = '2018' 
  AND sg_uf = 'SP' 
  AND nr_shift = 1 
  AND cd_position = 3 
GROUP BY
  1,
  2,
  3 
ORDER BY
  sum(qt_votes_nominal) DESC;
SQ Name Position Votes
250000612596 JOÃO DORIA Governador 6.431.555
250000615141 MARCIO FRANÇA Governador 4.358.998
250000604077 PAULO SKAF Governador 4.269.865
250000623884 LUIZ MARINHO Governador 2.563.922
250000612133 MAJOR COSTA E SILVA Governador 747.462
250000601939 ROGERIO CHEQUER Governador 673.102
250000615464 RODRIGO TAVARES Governador 649.729
250000601522 PROFESSORA LISETE Governador 507.236
250000617766 PROF. CLAUDIO FERNANDO Governador 28.666
250000609174 TONINHO FERREIRA Governador 16.202

Which top 10 city has the most votes for a distinguished candidate example the candidate (250000612596) to governor of the state of São Paulo, for the second shift.

SELECT
  nm_city AS City,
  sum(qt_votes_nominal) AS Votes 
FROM
  raw_tse_voting_cand_city 
WHERE
  election_year = '2018' 
  AND sg_uf = 'SP' 
  AND nr_shift = 2 
  AND cd_position = 3 
  AND sq_candidate = 250000612596 
GROUP BY
  1 
ORDER BY
  2 DESC LIMIT 10;
City Votes
SÃO PAULO 2447309
CAMPINAS 315524
GUARULHOS 240825
SÃO JOSÉ DOS CAMPOS 232775
SOROCABA 207470
SANTO ANDRÉ 202125
SÃO BERNARDO DO CAMPO 196202
OSASCO 176109
RIBEIRÃO PRETO 166728
JUNDIAÍ 143028

Checking the total amount of mayors of Rio Branco in Acre for the 2º shift (2020 year).

SELECT
  sq_candidate AS SQ,
  nm_ballot_candidate AS Name,
  cd_position AS CodePosition,
  ds_position AS Position,
  nm_city AS City,
  format(sum(qt_votes_nominal), 0, 'de_DE') AS Votes 
FROM
  raw_tse_voting_cand_city 
WHERE
  election_year = '2020' 
  AND sg_uf = 'AC' 
  AND cd_city = 1392 
  AND nr_shift = 2 
  AND cd_position = 11 
GROUP BY
  1,
  2,
  3,
  4,
  5 
ORDER BY
  sum(qt_votes_nominal) DESC;
SQ Name CodePosition Position Votes
10001003637 TIÃO BOCALOM 11 Prefeito 104.746
10001154203 SOCORRO NERI 11 Prefeito 61.702

Checks all elected councilors of Rio Branco in Acre:

SELECT
  sq_candidate AS SQ,
  nm_ballot_candidate AS Name,
  cd_position AS CodePosition,
  ds_position AS Position,
  ds_situ_tot_shift AS Situation,
  nm_city AS City,
  format(sum(qt_votes_nominal), 0, 'de_DE') AS Votes
FROM
  raw_tse_voting_cand_city 
WHERE
  election_year = '2020' 
  AND sg_uf = 'AC' 
  AND cd_city = 1392 
  AND nr_shift = 1
  AND cd_position = 13
  AND ((cd_situ_tot_shift = 1) OR (cd_situ_tot_shift = 2) OR (cd_situ_tot_shift = 3)) 
GROUP BY
  1,
  2,
  3,
  4,
  5,
  6
ORDER BY
  sum(qt_votes_nominal) DESC;
SQ Name CodePosition Position Situation City Votes
10000851435 DRA. MICHELLE MELO 13 Vereador ELEITO POR QP RIO BRANCO 3.576
10000848795 SAMIR BESTENE 13 Vereador ELEITO POR QP RIO BRANCO 3.403
10000969481 ANTONIO MORAIS 13 Vereador ELEITO POR QP RIO BRANCO 2.933
10000699918 EMERSON JARUDE 13 Vereador ELEITO POR QP RIO BRANCO 2.765
10000848788 VER. CAP. N LIMA' 13 Vereador ELEITO POR MÉDIA RIO BRANCO 2.559
10000969483 RAIMUNDO NENÉM 13 Vereador ELEITO POR MÉDIA RIO BRANCO 2.555
10000851433 JOAQUIM FLORÊNCIO 13 Vereador ELEITO POR MÉDIA RIO BRANCO 2.411
10000848803 RUTÊNIO SÁ 13 Vereador ELEITO POR MÉDIA RIO BRANCO 2.271
10000969498 ADAILTON CRUZ 13 Vereador ELEITO POR MÉDIA RIO BRANCO 2.222
10000968150 ARNALDO BARROS 13 Vereador ELEITO POR MÉDIA RIO BRANCO 2.132
10000851443 FÁBIO ARAÚJO 13 Vereador ELEITO POR MÉDIA RIO BRANCO 2.086
10001014911 ISMAEL MACHADO 13 Vereador ELEITO POR QP RIO BRANCO 1.928
10000670251 LENE PETECÃO 13 Vereador ELEITO POR QP RIO BRANCO 1.789
10000701028 FRANCISCO PIABA 13 Vereador ELEITO POR MÉDIA RIO BRANCO 1.659
10000699922 CÉLIO GADELHA 13 Vereador ELEITO POR MÉDIA RIO BRANCO 1.293
10001014897 DR RAIMUNDO CASTRO 13 Vereador ELEITO POR MÉDIA RIO BRANCO 1.268
10001079047 HILDEGARD PASCOAL 13 Vereador ELEITO POR MÉDIA RIO BRANCO 1.130

Authors

About

Importation and standardization (ETL) of the Electoral History of the Brazilian TSE.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published