dadosCVM

The goal of dadosCVM is to obtain and adjust data of investments funds registered on Comissão de Valores Mobiliários (Security and Exchange Commissions from Brazil). Data is collected from files available at http://dados.cvm.gov.br/dados/.

Main goals/ideas are descripted below:

cad_fi(): Download the latest Registration Data (‘Dados Cadastrais’) available on CVM at http://dados.cvm.gov.br/dados/FI/CAD/DADOS/ - 90%;
dados_diarios(): Download the daily fund information (CNPJ, fund raisings and withdraws, investors’ amount, equity value, share value) of selected funds between two dates. Data is available at http://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/ - under development;
dados_fidc(): Develop a function that downloads monthly FIDC data between two dates from a specified table of the FIDC files located at http://dados.cvm.gov.br/dados/FIDC/DOC/INF_MENSAL/DADOS/ - under development;
simpl_names(): Develop a function that simplifies funds names (i.e. transforms ‘FUNDO DE INVESTIMENTO EM DIREITOS CREDITORIOS’ to ‘FIDC’) - 0%.

Installation

You can install the development version of dadosCVM from GitHub with:

# install.packages("devtools")
devtools::install_github("lucasan93/dadosCVM")

Examples

1: Downloading the latest registration data from CVM:

The function cad_fi() downloads the latest registration data available on CVM.

library(dadosCVM)
library(dplyr)
library(kableExtra)

Below the top-10 funds in operation are extracted, sorted by their equity and identified by their CNPJ (Brazil National Registry of Legal Entities number). Note that the data.frame contains 40 columns and more than 60 thousand observations.

dados_cadastrais <- cad_fi() %>%
                      filter(situacao == 'EM FUNCIONAMENTO NORMAL') %>% 
                      select(cnpj,
                             classe,
                             tipo,
                             inicio_atv,
                             pl_data,
                             pl) %>% 
                      arrange(desc(pl)) %>% 
                      slice_head(n = 10) %>% 
                      mutate(pl = format(pl, big.mark = ','))

kable(dados_cadastrais)

cnpj	classe	tipo	inicio\_atv	pl\_data	pl
01.608.573/0001-65	Fundo de Renda Fixa	FI	1997-01-02	2022-03-23	159,651,227,089
27.146.328/0001-77	Fundo de Renda Fixa	FI	2017-05-03	2022-03-23	137,429,549,362
07.593.972/0001-86	Fundo de Renda Fixa	FI	2005-09-21	2022-03-23	122,607,768,840
22.985.157/0001-56	FIP Multi	FIP	2015-09-01	2021-12-31	107,568,311,434
01.597.187/0001-15	Fundo de Renda Fixa	FI	1996-12-30	2022-03-23	95,925,774,092
00.822.055/0001-87	Fundo de Renda Fixa	FI	1995-10-02	2022-03-23	94,753,353,219
42.592.302/0001-46	Fundo de Renda Fixa	FI	2021-09-30	2022-03-23	72,417,562,362
42.592.315/0001-15	Fundo de Renda Fixa	FI	2021-09-30	2022-03-23	72,412,668,571
04.288.966/0001-27	Fundo de Renda Fixa	FI	2002-07-15	2022-03-23	59,114,466,417
03.737.219/0001-66	Fundo de Renda Fixa	FI	2005-08-24	2022-03-23	55,208,583,595

2: Obtaining daily data

The function dados_diarios() downloads daily data (CNPJ, portfolio value, share value, equity, withdraws, fund raisings, and number of investors) for specifics fund (identified by theis CNPJs) between two given dates. Note that daily data is only available for funds of type ‘FI’ and that initial date must be greater than 2005-01-01.

library(dadosCVM)
library(dplyr)
library(ggplot2)
library(scales)

Below we obtain data from funds displayed in the previous example and plot their equity value over time:

dados_diarios(cnpj  = dados_cadastrais$cnpj,
              start = as.Date('2005-01-01'),
              end   = as.Date('2022-03-01')) %>% 
      select(data, cnpj, pl) %>% 
      ggplot() +
      aes(x = data, y = pl, color = cnpj) +
      geom_line() +
      scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
      xlab('Date') +
      ylab('Equity (BRL)') +
      theme_minimal()
#> [1] "Obtaining data between 2021-01-01 and 2022-03-01"
#> [1] "Obtaining data between 2005-01-01 and 2022-03-01"

3: Obtaining FIDC’s monthly portfolio

The function dados_fidc() downloads monthly data for a group of funds (identified by theis CNPJs) between two given dates and for a specified database (identified in CVM’s FIDC’s META file) out of 16 different databases. Initial date must be greater than 2013-01-01.

This function is currently under development. I’m working on simplifying column names and adding the opting to pivot the data.frame format from wide to long.

library(dadosCVM)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(scales)
library(kableExtra)

First, let’s select the top-10 FIDCs in operation with the highest equity value and extract their CNPJs and the operations’ start date using the cad_fi function:

infos <- cad_fi() %>% 
          filter(situacao == 'EM FUNCIONAMENTO NORMAL',
                 tipo     == 'FIDC') %>% 
          arrange(desc(pl)) %>% 
          slice_head(n = 10) %>% 
          select(cnpj,
                 inicio_atv)

Now we obtain the database which provides us with information about the top-10 funds’ aggregated portfolio. CVM has a ‘META’ file describing the content of each FIDC database. A quick look at it and we find that the one we are interested in is the ‘II’ table. Soon the package will provide a reference database so this search can be done quicker. Note that although the oldest fund started in 2009, the function automatically set the start date to 2013-01-01 since there’s no data prior to that date. Note that the function pivots the CVM databases to longer and provides two identification columns (‘segment’ and ‘item’), which greatly facilitates analysis and plots.

dados_fidc(cnpj  = infos$cnpj,
           start = min(infos$inicio_atv),
           end   = as.Date('2022-02-01'),
           table = 'II') %>% 
        replace(is.na(.), 0) %>% 
        filter(value > 0) %>% 
        group_by(data, segment) %>% 
        summarise(value = sum(value)) %>% 
        ggplot() +
        aes(x = data, y = value, fill = segment) +
        geom_bar(position = 'stack', stat     = 'identity') +
        scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
        xlab('Date') +
        ylab('Value (BRL)') +
        theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-01"
#> [1] "Obtaining data between 2013-01-01 and 2018-12-31"

If we are interested in analyzing the items within a specific segment, let’s say ‘financeiro’, we can plot the following:

dados_fidc(cnpj  = infos$cnpj,
           start = min(infos$inicio_atv),
           end   = as.Date('2022-02-01'),
           table = 'II') %>% 
        replace(is.na(.), 0) %>% 
        filter(value > 0, segment == 'financeiro') %>% 
        group_by(data, item) %>% 
        summarise(value = sum(value)) %>% 
        ggplot() +
        aes(x = data, y = value, fill = item) +
        geom_bar(position = 'stack', stat     = 'identity') +
        scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
        xlab('Date') +
        ylab('Value (BRL)') +
        theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-01"
#> [1] "Obtaining data between 2013-01-01 and 2018-12-31"

And in order to verify how the segments and its items are organized based on their CVM categories, take a look at the defs_fidc database within the package. Note that the observations in which item == ‘total’ are excluded when running the dados_fidc() function in order to avoid double counting.

defs_fidcs %>% 
    filter(str_detect(category, '^TAB_II_')) %>% 
    kable(align = c('l', 'l')) %>% 
    kable_styling(latex_options = 'striped')

category	table	id	base	segment	class	item
TAB\_II\_VL\_CARTEIRA	II	II	carteira	carteira	total	total
TAB\_II\_A\_VL\_INDUST	II	II.A	carteira	industrial	industrial	industrial
TAB\_II\_B\_VL\_IMOBIL	II	II.B	carteira	imobiliario	imobiliario	imobiliario
TAB\_II\_C\_VL\_COMERC	II	II.C	carteira	comercial	total	total
TAB\_II\_C1\_VL\_COMERC	II	II.C.1	carteira	comercial	comercial	comercial
TAB\_II\_C2\_VL\_VAREJO	II	II.C.2	carteira	comercial	varejo	varejo
TAB\_II\_C3\_VL\_ARREND	II	II.C.3	carteira	comercial	arrendamento mercantil	arrendamento mercantil
TAB\_II\_D\_VL\_SERV	II	II.D	carteira	servicos	total	total
TAB\_II\_D1\_VL\_SERV	II	II.D.1	carteira	servicos	servicos	servicos
TAB\_II\_D2\_VL\_SERV\_PUBLICO	II	II.D.2	carteira	servicos	servicos publicos	servicos publicos
TAB\_II\_D3\_VL\_SERV\_EDUC	II	II.D.3	carteira	servicos	servicos educacionais	servicos educacionais
TAB\_II\_D4\_VL\_ENTRET	II	II.D.4	carteira	servicos	servicos entretenimento	servicos entretenimento
TAB\_II\_E\_VL\_AGRONEG	II	II.E	carteira	agronegocio	agronegocio	agronegocio
TAB\_II\_F\_VL\_FINANC	II	II.F	carteira	financeiro	total	total
TAB\_II\_F1\_VL\_CRED\_PESSOA	II	II.F.1	carteira	financeiro	credito pessoal	credito pessoal
TAB\_II\_F2\_VL\_CRED\_PESSOA\_CONSIG	II	II.F.2	carteira	financeiro	credito pessoal consignado	credito pessoal consignado
TAB\_II\_F3\_VL\_CRED\_CORP	II	II.F.3	carteira	financeiro	credito corporativo	credito corporativo
TAB\_II\_F4\_VL\_MIDMARKET	II	II.F.4	carteira	financeiro	middle market	middle market
TAB\_II\_F5\_VL\_VEICULO	II	II.F.5	carteira	financeiro	veiculos	veiculos
TAB\_II\_F6\_VL\_IMOBIL\_EMPRESA	II	II.F.6	carteira	financeiro	carteira imobiliaria empresarial	carteira imobiliaria empresarial
TAB\_II\_F7\_VL\_IMOBIL\_RESID	II	II.F.7	carteira	financeiro	carteira imobiliaria residencial	carteira imobiliaria residencial
TAB\_II\_F8\_VL\_OUTRO	II	II.F.8	carteira	financeiro	outros	outros
TAB\_II\_G\_VL\_CREDITO	II	II.G	carteira	cartao de credito	cartao de credito	cartao de credito
TAB\_II\_H\_VL\_FACTOR	II	II.H	carteira	factoring	total	total
TAB\_II\_H1\_VL\_PESSOA	II	II.H.1	carteira	factoring	pessoal	pessoal
TAB\_II\_H2\_VL\_CORP	II	II.H.2	carteira	factoring	corporativo	corporativo
TAB\_II\_I\_VL\_SETOR\_PUBLICO	II	II.I	carteira	setor publico	total	total
TAB\_II\_I1\_VL\_PRECAT	II	II.I.1	carteira	setor publico	precatorios	precatorios
TAB\_II\_I2\_VL\_TRIBUT	II	II.I.2	carteira	setor publico	creditos tributarios	creditos tributarios
TAB\_II\_I3\_VL\_ROYALTIES	II	II.I.3	carteira	setor publico	royalties	royalties
TAB\_II\_I4\_VL\_OUTRO	II	II.I.4	carteira	setor publico	outros	outros
TAB\_II\_J\_VL\_JUDICIAL	II	II.J	carteira	acoes judiciais	acoes judiciais	acoes judiciais
TAB\_II\_K\_VL\_MARCA	II	II.K	carteira	prop intelectual	prop intelectual	prop intelectual

4: Obtaining FIDC’s assets

Now let’s start developing a fund’s balance sheet. We can see through the tabs_fidc database available in the dadosCVM package that the assets’ data are located in the first table:

tabs_fidcs %>% 
kable(align = c('l', 'l')) %>% 
    kable_styling(latex_options = 'striped')

tabela	conteudo
I\_TODOS	informacoes cadastrais dos fundos e ativos
II	classificacoes da carteira
III	passivos
IV	patrimonio liquido
V	direitos creditorios classificados por prazo de vencimento inadimplencia e pagamentos antecipados
VI	direitos creditorios classificados por prazo de vencimento inadimplencia e pagamentos antecipados
VII	aquisicoes substituicoes e recompras
IX	taxas de desconto e de juros das compras e vendas
X\_1	informacoes sobre cotistas por classe e serie
X\_1\_1	classificacoes dos cotistas
X\_2	informacoes sobre as cotas
X\_3	rentabilidade
X\_4	transacoes de cotas do fundo
X\_5	ativos classificados pela liquidez
X\_6	taxas de desempenho real e esperada
X\_7	garantias

Let’s then gather the largest FIDC’s assets data and plot its main segments:

dados_fidc(cnpj  = infos$cnpj[1],
           start = infos$inicio_atv[1],
           end   = as.Date('2022-03-31'),
           table = 'I') %>%
      filter(base == 'ativo', value != 0) %>% 
      group_by(data, segment) %>% 
              summarise(valor = sum(value)) %>% 
      ggplot() +
      aes(x = data, y = valor, fill = segment) +
      geom_bar(position = 'stack', stat     = 'identity') +
      scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
      xlab('Date') +
      ylab('Value (BRL)') +
      theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-28"
#> [1] "Obtaining data between 2016-04-20 and 2018-12-31"

We can also analyze the items within each asset segment:

dados_fidc(cnpj  = infos$cnpj[1],
           start = infos$inicio_atv[1],
           end   = as.Date('2022-03-31'),
           table = 'I') %>%
      filter(base == 'ativo', value != 0) %>% 
      group_by(data, segment, item) %>% 
              summarise(valor = sum(value)) %>% 
      ggplot() +
      aes(x = data, y = valor, fill = item) +
      geom_bar(position = 'stack', stat     = 'identity') +
      scale_y_continuous(labels = unit_format(unit = "MM", scale = 1e-6)) +
      xlab('Date') +
      ylab('Value (BRL)') +
      theme_minimal() +
      facet_wrap(~segment, scales = 'free_y')
#> [1] "Obtaining data between 2019-01-01 and 2022-02-28"
#> [1] "Obtaining data between 2016-04-20 and 2018-12-31"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

dadosCVM

Installation

Examples

1: Downloading the latest registration data from CVM:

2: Obtaining daily data

3: Obtaining FIDC’s monthly portfolio

4: Obtaining FIDC’s assets

Files

README.md

Latest commit

History

README.md

File metadata and controls

dadosCVM

Installation

Examples

1: Downloading the latest registration data from CVM:

2: Obtaining daily data

3: Obtaining FIDC’s monthly portfolio

4: Obtaining FIDC’s assets