The goal of dadosCVM is to obtain and adjust data of investments funds registered on Comissão de Valores Mobiliários (Security and Exchange Commissions from Brazil). Data is collected from files available at http://dados.cvm.gov.br/dados/.
Main goals/ideas are descripted below:
- cad_fi(): Download the latest Registration Data (‘Dados Cadastrais’) available on CVM at http://dados.cvm.gov.br/dados/FI/CAD/DADOS/ - 90%;
- dados_diarios(): Download the daily fund information (CNPJ, fund raisings and withdraws, investors’ amount, equity value, share value) of selected funds between two dates. Data is available at http://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/ - under development;
- dados_fidc(): Develop a function that downloads monthly FIDC data between two dates from a specified table of the FIDC files located at http://dados.cvm.gov.br/dados/FIDC/DOC/INF_MENSAL/DADOS/ - under development;
- simpl_names(): Develop a function that simplifies funds names (i.e. transforms ‘FUNDO DE INVESTIMENTO EM DIREITOS CREDITORIOS’ to ‘FIDC’) - 0%.
You can install the development version of dadosCVM from GitHub with:
# install.packages("devtools")
devtools::install_github("lucasan93/dadosCVM")
The function cad_fi() downloads the latest registration data available on CVM.
library(dadosCVM)
library(dplyr)
library(kableExtra)
Below the top-10 funds in operation are extracted, sorted by their equity and identified by their CNPJ (Brazil National Registry of Legal Entities number). Note that the data.frame contains 40 columns and more than 60 thousand observations.
dados_cadastrais <- cad_fi() %>%
filter(situacao == 'EM FUNCIONAMENTO NORMAL') %>%
select(cnpj,
classe,
tipo,
inicio_atv,
pl_data,
pl) %>%
arrange(desc(pl)) %>%
slice_head(n = 10) %>%
mutate(pl = format(pl, big.mark = ','))
kable(dados_cadastrais)
cnpj | classe | tipo | inicio\_atv | pl\_data | pl |
---|---|---|---|---|---|
01.608.573/0001-65 | Fundo de Renda Fixa | FI | 1997-01-02 | 2022-03-23 | 159,651,227,089 |
27.146.328/0001-77 | Fundo de Renda Fixa | FI | 2017-05-03 | 2022-03-23 | 137,429,549,362 |
07.593.972/0001-86 | Fundo de Renda Fixa | FI | 2005-09-21 | 2022-03-23 | 122,607,768,840 |
22.985.157/0001-56 | FIP Multi | FIP | 2015-09-01 | 2021-12-31 | 107,568,311,434 |
01.597.187/0001-15 | Fundo de Renda Fixa | FI | 1996-12-30 | 2022-03-23 | 95,925,774,092 |
00.822.055/0001-87 | Fundo de Renda Fixa | FI | 1995-10-02 | 2022-03-23 | 94,753,353,219 |
42.592.302/0001-46 | Fundo de Renda Fixa | FI | 2021-09-30 | 2022-03-23 | 72,417,562,362 |
42.592.315/0001-15 | Fundo de Renda Fixa | FI | 2021-09-30 | 2022-03-23 | 72,412,668,571 |
04.288.966/0001-27 | Fundo de Renda Fixa | FI | 2002-07-15 | 2022-03-23 | 59,114,466,417 |
03.737.219/0001-66 | Fundo de Renda Fixa | FI | 2005-08-24 | 2022-03-23 | 55,208,583,595 |
The function dados_diarios() downloads daily data (CNPJ, portfolio value, share value, equity, withdraws, fund raisings, and number of investors) for specifics fund (identified by theis CNPJs) between two given dates. Note that daily data is only available for funds of type ‘FI’ and that initial date must be greater than 2005-01-01.
library(dadosCVM)
library(dplyr)
library(ggplot2)
library(scales)
Below we obtain data from funds displayed in the previous example and plot their equity value over time:
dados_diarios(cnpj = dados_cadastrais$cnpj,
start = as.Date('2005-01-01'),
end = as.Date('2022-03-01')) %>%
select(data, cnpj, pl) %>%
ggplot() +
aes(x = data, y = pl, color = cnpj) +
geom_line() +
scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
xlab('Date') +
ylab('Equity (BRL)') +
theme_minimal()
#> [1] "Obtaining data between 2021-01-01 and 2022-03-01"
#> [1] "Obtaining data between 2005-01-01 and 2022-03-01"
The function dados_fidc() downloads monthly data for a group of funds (identified by theis CNPJs) between two given dates and for a specified database (identified in CVM’s FIDC’s META file) out of 16 different databases. Initial date must be greater than 2013-01-01.
This function is currently under development. I’m working on simplifying column names and adding the opting to pivot the data.frame format from wide to long.
library(dadosCVM)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(scales)
library(kableExtra)
First, let’s select the top-10 FIDCs in operation with the highest equity value and extract their CNPJs and the operations’ start date using the cad_fi function:
infos <- cad_fi() %>%
filter(situacao == 'EM FUNCIONAMENTO NORMAL',
tipo == 'FIDC') %>%
arrange(desc(pl)) %>%
slice_head(n = 10) %>%
select(cnpj,
inicio_atv)
Now we obtain the database which provides us with information about the top-10 funds’ aggregated portfolio. CVM has a ‘META’ file describing the content of each FIDC database. A quick look at it and we find that the one we are interested in is the ‘II’ table. Soon the package will provide a reference database so this search can be done quicker. Note that although the oldest fund started in 2009, the function automatically set the start date to 2013-01-01 since there’s no data prior to that date. Note that the function pivots the CVM databases to longer and provides two identification columns (‘segment’ and ‘item’), which greatly facilitates analysis and plots.
dados_fidc(cnpj = infos$cnpj,
start = min(infos$inicio_atv),
end = as.Date('2022-02-01'),
table = 'II') %>%
replace(is.na(.), 0) %>%
filter(value > 0) %>%
group_by(data, segment) %>%
summarise(value = sum(value)) %>%
ggplot() +
aes(x = data, y = value, fill = segment) +
geom_bar(position = 'stack', stat = 'identity') +
scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
xlab('Date') +
ylab('Value (BRL)') +
theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-01"
#> [1] "Obtaining data between 2013-01-01 and 2018-12-31"
If we are interested in analyzing the items within a specific segment, let’s say ‘financeiro’, we can plot the following:
dados_fidc(cnpj = infos$cnpj,
start = min(infos$inicio_atv),
end = as.Date('2022-02-01'),
table = 'II') %>%
replace(is.na(.), 0) %>%
filter(value > 0, segment == 'financeiro') %>%
group_by(data, item) %>%
summarise(value = sum(value)) %>%
ggplot() +
aes(x = data, y = value, fill = item) +
geom_bar(position = 'stack', stat = 'identity') +
scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
xlab('Date') +
ylab('Value (BRL)') +
theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-01"
#> [1] "Obtaining data between 2013-01-01 and 2018-12-31"
And in order to verify how the segments and its items are organized based on their CVM categories, take a look at the defs_fidc database within the package. Note that the observations in which item == ‘total’ are excluded when running the dados_fidc() function in order to avoid double counting.
defs_fidcs %>%
filter(str_detect(category, '^TAB_II_')) %>%
kable(align = c('l', 'l')) %>%
kable_styling(latex_options = 'striped')
category | table | id | base | segment | class | item |
---|---|---|---|---|---|---|
TAB\_II\_VL\_CARTEIRA | II | II | carteira | carteira | total | total |
TAB\_II\_A\_VL\_INDUST | II | II.A | carteira | industrial | industrial | industrial |
TAB\_II\_B\_VL\_IMOBIL | II | II.B | carteira | imobiliario | imobiliario | imobiliario |
TAB\_II\_C\_VL\_COMERC | II | II.C | carteira | comercial | total | total |
TAB\_II\_C1\_VL\_COMERC | II | II.C.1 | carteira | comercial | comercial | comercial |
TAB\_II\_C2\_VL\_VAREJO | II | II.C.2 | carteira | comercial | varejo | varejo |
TAB\_II\_C3\_VL\_ARREND | II | II.C.3 | carteira | comercial | arrendamento mercantil | arrendamento mercantil |
TAB\_II\_D\_VL\_SERV | II | II.D | carteira | servicos | total | total |
TAB\_II\_D1\_VL\_SERV | II | II.D.1 | carteira | servicos | servicos | servicos |
TAB\_II\_D2\_VL\_SERV\_PUBLICO | II | II.D.2 | carteira | servicos | servicos publicos | servicos publicos |
TAB\_II\_D3\_VL\_SERV\_EDUC | II | II.D.3 | carteira | servicos | servicos educacionais | servicos educacionais |
TAB\_II\_D4\_VL\_ENTRET | II | II.D.4 | carteira | servicos | servicos entretenimento | servicos entretenimento |
TAB\_II\_E\_VL\_AGRONEG | II | II.E | carteira | agronegocio | agronegocio | agronegocio |
TAB\_II\_F\_VL\_FINANC | II | II.F | carteira | financeiro | total | total |
TAB\_II\_F1\_VL\_CRED\_PESSOA | II | II.F.1 | carteira | financeiro | credito pessoal | credito pessoal |
TAB\_II\_F2\_VL\_CRED\_PESSOA\_CONSIG | II | II.F.2 | carteira | financeiro | credito pessoal consignado | credito pessoal consignado |
TAB\_II\_F3\_VL\_CRED\_CORP | II | II.F.3 | carteira | financeiro | credito corporativo | credito corporativo |
TAB\_II\_F4\_VL\_MIDMARKET | II | II.F.4 | carteira | financeiro | middle market | middle market |
TAB\_II\_F5\_VL\_VEICULO | II | II.F.5 | carteira | financeiro | veiculos | veiculos |
TAB\_II\_F6\_VL\_IMOBIL\_EMPRESA | II | II.F.6 | carteira | financeiro | carteira imobiliaria empresarial | carteira imobiliaria empresarial |
TAB\_II\_F7\_VL\_IMOBIL\_RESID | II | II.F.7 | carteira | financeiro | carteira imobiliaria residencial | carteira imobiliaria residencial |
TAB\_II\_F8\_VL\_OUTRO | II | II.F.8 | carteira | financeiro | outros | outros |
TAB\_II\_G\_VL\_CREDITO | II | II.G | carteira | cartao de credito | cartao de credito | cartao de credito |
TAB\_II\_H\_VL\_FACTOR | II | II.H | carteira | factoring | total | total |
TAB\_II\_H1\_VL\_PESSOA | II | II.H.1 | carteira | factoring | pessoal | pessoal |
TAB\_II\_H2\_VL\_CORP | II | II.H.2 | carteira | factoring | corporativo | corporativo |
TAB\_II\_I\_VL\_SETOR\_PUBLICO | II | II.I | carteira | setor publico | total | total |
TAB\_II\_I1\_VL\_PRECAT | II | II.I.1 | carteira | setor publico | precatorios | precatorios |
TAB\_II\_I2\_VL\_TRIBUT | II | II.I.2 | carteira | setor publico | creditos tributarios | creditos tributarios |
TAB\_II\_I3\_VL\_ROYALTIES | II | II.I.3 | carteira | setor publico | royalties | royalties |
TAB\_II\_I4\_VL\_OUTRO | II | II.I.4 | carteira | setor publico | outros | outros |
TAB\_II\_J\_VL\_JUDICIAL | II | II.J | carteira | acoes judiciais | acoes judiciais | acoes judiciais |
TAB\_II\_K\_VL\_MARCA | II | II.K | carteira | prop intelectual | prop intelectual | prop intelectual |
Now let’s start developing a fund’s balance sheet. We can see through the tabs_fidc database available in the dadosCVM package that the assets’ data are located in the first table:
tabs_fidcs %>%
kable(align = c('l', 'l')) %>%
kable_styling(latex_options = 'striped')
tabela | conteudo |
---|---|
I\_TODOS | informacoes cadastrais dos fundos e ativos |
II | classificacoes da carteira |
III | passivos |
IV | patrimonio liquido |
V | direitos creditorios classificados por prazo de vencimento inadimplencia e pagamentos antecipados |
VI | direitos creditorios classificados por prazo de vencimento inadimplencia e pagamentos antecipados |
VII | aquisicoes substituicoes e recompras |
IX | taxas de desconto e de juros das compras e vendas |
X\_1 | informacoes sobre cotistas por classe e serie |
X\_1\_1 | classificacoes dos cotistas |
X\_2 | informacoes sobre as cotas |
X\_3 | rentabilidade |
X\_4 | transacoes de cotas do fundo |
X\_5 | ativos classificados pela liquidez |
X\_6 | taxas de desempenho real e esperada |
X\_7 | garantias |
Let’s then gather the largest FIDC’s assets data and plot its main segments:
dados_fidc(cnpj = infos$cnpj[1],
start = infos$inicio_atv[1],
end = as.Date('2022-03-31'),
table = 'I') %>%
filter(base == 'ativo', value != 0) %>%
group_by(data, segment) %>%
summarise(valor = sum(value)) %>%
ggplot() +
aes(x = data, y = valor, fill = segment) +
geom_bar(position = 'stack', stat = 'identity') +
scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
xlab('Date') +
ylab('Value (BRL)') +
theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-28"
#> [1] "Obtaining data between 2016-04-20 and 2018-12-31"
We can also analyze the items within each asset segment:
dados_fidc(cnpj = infos$cnpj[1],
start = infos$inicio_atv[1],
end = as.Date('2022-03-31'),
table = 'I') %>%
filter(base == 'ativo', value != 0) %>%
group_by(data, segment, item) %>%
summarise(valor = sum(value)) %>%
ggplot() +
aes(x = data, y = valor, fill = item) +
geom_bar(position = 'stack', stat = 'identity') +
scale_y_continuous(labels = unit_format(unit = "MM", scale = 1e-6)) +
xlab('Date') +
ylab('Value (BRL)') +
theme_minimal() +
facet_wrap(~segment, scales = 'free_y')
#> [1] "Obtaining data between 2019-01-01 and 2022-02-28"
#> [1] "Obtaining data between 2016-04-20 and 2018-12-31"