Skip to content

Obtains investment fund's data from CVM

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

lucasan93/dadosCVM

Repository files navigation

dadosCVM

The goal of dadosCVM is to obtain and adjust data of investments funds registered on Comissão de Valores Mobiliários (Security and Exchange Commissions from Brazil). Data is collected from files available at http://dados.cvm.gov.br/dados/.

Main goals/ideas are descripted below:

  • cad_fi(): Download the latest Registration Data (‘Dados Cadastrais’) available on CVM at http://dados.cvm.gov.br/dados/FI/CAD/DADOS/ - 90%;
  • dados_diarios(): Download the daily fund information (CNPJ, fund raisings and withdraws, investors’ amount, equity value, share value) of selected funds between two dates. Data is available at http://dados.cvm.gov.br/dados/FI/DOC/INF_DIARIO/DADOS/ - under development;
  • dados_fidc(): Develop a function that downloads monthly FIDC data between two dates from a specified table of the FIDC files located at http://dados.cvm.gov.br/dados/FIDC/DOC/INF_MENSAL/DADOS/ - under development;
  • simpl_names(): Develop a function that simplifies funds names (i.e. transforms ‘FUNDO DE INVESTIMENTO EM DIREITOS CREDITORIOS’ to ‘FIDC’) - 0%.

Installation

You can install the development version of dadosCVM from GitHub with:

# install.packages("devtools")
devtools::install_github("lucasan93/dadosCVM")

Examples

1: Downloading the latest registration data from CVM:

The function cad_fi() downloads the latest registration data available on CVM.

library(dadosCVM)
library(dplyr)
library(kableExtra)

Below the top-10 funds in operation are extracted, sorted by their equity and identified by their CNPJ (Brazil National Registry of Legal Entities number). Note that the data.frame contains 40 columns and more than 60 thousand observations.

dados_cadastrais <- cad_fi() %>%
                      filter(situacao == 'EM FUNCIONAMENTO NORMAL') %>% 
                      select(cnpj,
                             classe,
                             tipo,
                             inicio_atv,
                             pl_data,
                             pl) %>% 
                      arrange(desc(pl)) %>% 
                      slice_head(n = 10) %>% 
                      mutate(pl = format(pl, big.mark = ','))

kable(dados_cadastrais)
cnpj classe tipo inicio\_atv pl\_data pl
01.608.573/0001-65 Fundo de Renda Fixa FI 1997-01-02 2022-03-23 159,651,227,089
27.146.328/0001-77 Fundo de Renda Fixa FI 2017-05-03 2022-03-23 137,429,549,362
07.593.972/0001-86 Fundo de Renda Fixa FI 2005-09-21 2022-03-23 122,607,768,840
22.985.157/0001-56 FIP Multi FIP 2015-09-01 2021-12-31 107,568,311,434
01.597.187/0001-15 Fundo de Renda Fixa FI 1996-12-30 2022-03-23 95,925,774,092
00.822.055/0001-87 Fundo de Renda Fixa FI 1995-10-02 2022-03-23 94,753,353,219
42.592.302/0001-46 Fundo de Renda Fixa FI 2021-09-30 2022-03-23 72,417,562,362
42.592.315/0001-15 Fundo de Renda Fixa FI 2021-09-30 2022-03-23 72,412,668,571
04.288.966/0001-27 Fundo de Renda Fixa FI 2002-07-15 2022-03-23 59,114,466,417
03.737.219/0001-66 Fundo de Renda Fixa FI 2005-08-24 2022-03-23 55,208,583,595

2: Obtaining daily data

The function dados_diarios() downloads daily data (CNPJ, portfolio value, share value, equity, withdraws, fund raisings, and number of investors) for specifics fund (identified by theis CNPJs) between two given dates. Note that daily data is only available for funds of type ‘FI’ and that initial date must be greater than 2005-01-01.

library(dadosCVM)
library(dplyr)
library(ggplot2)
library(scales)

Below we obtain data from funds displayed in the previous example and plot their equity value over time:

dados_diarios(cnpj  = dados_cadastrais$cnpj,
              start = as.Date('2005-01-01'),
              end   = as.Date('2022-03-01')) %>% 
      select(data, cnpj, pl) %>% 
      ggplot() +
      aes(x = data, y = pl, color = cnpj) +
      geom_line() +
      scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
      xlab('Date') +
      ylab('Equity (BRL)') +
      theme_minimal()
#> [1] "Obtaining data between 2021-01-01 and 2022-03-01"
#> [1] "Obtaining data between 2005-01-01 and 2022-03-01"

3: Obtaining FIDC’s monthly portfolio

The function dados_fidc() downloads monthly data for a group of funds (identified by theis CNPJs) between two given dates and for a specified database (identified in CVM’s FIDC’s META file) out of 16 different databases. Initial date must be greater than 2013-01-01.

This function is currently under development. I’m working on simplifying column names and adding the opting to pivot the data.frame format from wide to long.

library(dadosCVM)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(scales)
library(kableExtra)

First, let’s select the top-10 FIDCs in operation with the highest equity value and extract their CNPJs and the operations’ start date using the cad_fi function:

infos <- cad_fi() %>% 
          filter(situacao == 'EM FUNCIONAMENTO NORMAL',
                 tipo     == 'FIDC') %>% 
          arrange(desc(pl)) %>% 
          slice_head(n = 10) %>% 
          select(cnpj,
                 inicio_atv)

Now we obtain the database which provides us with information about the top-10 funds’ aggregated portfolio. CVM has a ‘META’ file describing the content of each FIDC database. A quick look at it and we find that the one we are interested in is the ‘II’ table. Soon the package will provide a reference database so this search can be done quicker. Note that although the oldest fund started in 2009, the function automatically set the start date to 2013-01-01 since there’s no data prior to that date. Note that the function pivots the CVM databases to longer and provides two identification columns (‘segment’ and ‘item’), which greatly facilitates analysis and plots.

dados_fidc(cnpj  = infos$cnpj,
           start = min(infos$inicio_atv),
           end   = as.Date('2022-02-01'),
           table = 'II') %>% 
        replace(is.na(.), 0) %>% 
        filter(value > 0) %>% 
        group_by(data, segment) %>% 
        summarise(value = sum(value)) %>% 
        ggplot() +
        aes(x = data, y = value, fill = segment) +
        geom_bar(position = 'stack', stat     = 'identity') +
        scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
        xlab('Date') +
        ylab('Value (BRL)') +
        theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-01"
#> [1] "Obtaining data between 2013-01-01 and 2018-12-31"

If we are interested in analyzing the items within a specific segment, let’s say ‘financeiro’, we can plot the following:

dados_fidc(cnpj  = infos$cnpj,
           start = min(infos$inicio_atv),
           end   = as.Date('2022-02-01'),
           table = 'II') %>% 
        replace(is.na(.), 0) %>% 
        filter(value > 0, segment == 'financeiro') %>% 
        group_by(data, item) %>% 
        summarise(value = sum(value)) %>% 
        ggplot() +
        aes(x = data, y = value, fill = item) +
        geom_bar(position = 'stack', stat     = 'identity') +
        scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
        xlab('Date') +
        ylab('Value (BRL)') +
        theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-01"
#> [1] "Obtaining data between 2013-01-01 and 2018-12-31"

And in order to verify how the segments and its items are organized based on their CVM categories, take a look at the defs_fidc database within the package. Note that the observations in which item == ‘total’ are excluded when running the dados_fidc() function in order to avoid double counting.

defs_fidcs %>% 
    filter(str_detect(category, '^TAB_II_')) %>% 
    kable(align = c('l', 'l')) %>% 
    kable_styling(latex_options = 'striped')
category table id base segment class item
TAB\_II\_VL\_CARTEIRA II II carteira carteira total total
TAB\_II\_A\_VL\_INDUST II II.A carteira industrial industrial industrial
TAB\_II\_B\_VL\_IMOBIL II II.B carteira imobiliario imobiliario imobiliario
TAB\_II\_C\_VL\_COMERC II II.C carteira comercial total total
TAB\_II\_C1\_VL\_COMERC II II.C.1 carteira comercial comercial comercial
TAB\_II\_C2\_VL\_VAREJO II II.C.2 carteira comercial varejo varejo
TAB\_II\_C3\_VL\_ARREND II II.C.3 carteira comercial arrendamento mercantil arrendamento mercantil
TAB\_II\_D\_VL\_SERV II II.D carteira servicos total total
TAB\_II\_D1\_VL\_SERV II II.D.1 carteira servicos servicos servicos
TAB\_II\_D2\_VL\_SERV\_PUBLICO II II.D.2 carteira servicos servicos publicos servicos publicos
TAB\_II\_D3\_VL\_SERV\_EDUC II II.D.3 carteira servicos servicos educacionais servicos educacionais
TAB\_II\_D4\_VL\_ENTRET II II.D.4 carteira servicos servicos entretenimento servicos entretenimento
TAB\_II\_E\_VL\_AGRONEG II II.E carteira agronegocio agronegocio agronegocio
TAB\_II\_F\_VL\_FINANC II II.F carteira financeiro total total
TAB\_II\_F1\_VL\_CRED\_PESSOA II II.F.1 carteira financeiro credito pessoal credito pessoal
TAB\_II\_F2\_VL\_CRED\_PESSOA\_CONSIG II II.F.2 carteira financeiro credito pessoal consignado credito pessoal consignado
TAB\_II\_F3\_VL\_CRED\_CORP II II.F.3 carteira financeiro credito corporativo credito corporativo
TAB\_II\_F4\_VL\_MIDMARKET II II.F.4 carteira financeiro middle market middle market
TAB\_II\_F5\_VL\_VEICULO II II.F.5 carteira financeiro veiculos veiculos
TAB\_II\_F6\_VL\_IMOBIL\_EMPRESA II II.F.6 carteira financeiro carteira imobiliaria empresarial carteira imobiliaria empresarial
TAB\_II\_F7\_VL\_IMOBIL\_RESID II II.F.7 carteira financeiro carteira imobiliaria residencial carteira imobiliaria residencial
TAB\_II\_F8\_VL\_OUTRO II II.F.8 carteira financeiro outros outros
TAB\_II\_G\_VL\_CREDITO II II.G carteira cartao de credito cartao de credito cartao de credito
TAB\_II\_H\_VL\_FACTOR II II.H carteira factoring total total
TAB\_II\_H1\_VL\_PESSOA II II.H.1 carteira factoring pessoal pessoal
TAB\_II\_H2\_VL\_CORP II II.H.2 carteira factoring corporativo corporativo
TAB\_II\_I\_VL\_SETOR\_PUBLICO II II.I carteira setor publico total total
TAB\_II\_I1\_VL\_PRECAT II II.I.1 carteira setor publico precatorios precatorios
TAB\_II\_I2\_VL\_TRIBUT II II.I.2 carteira setor publico creditos tributarios creditos tributarios
TAB\_II\_I3\_VL\_ROYALTIES II II.I.3 carteira setor publico royalties royalties
TAB\_II\_I4\_VL\_OUTRO II II.I.4 carteira setor publico outros outros
TAB\_II\_J\_VL\_JUDICIAL II II.J carteira acoes judiciais acoes judiciais acoes judiciais
TAB\_II\_K\_VL\_MARCA II II.K carteira prop intelectual prop intelectual prop intelectual

4: Obtaining FIDC’s assets

Now let’s start developing a fund’s balance sheet. We can see through the tabs_fidc database available in the dadosCVM package that the assets’ data are located in the first table:

tabs_fidcs %>% 
kable(align = c('l', 'l')) %>% 
    kable_styling(latex_options = 'striped')
tabela conteudo
I\_TODOS informacoes cadastrais dos fundos e ativos
II classificacoes da carteira
III passivos
IV patrimonio liquido
V direitos creditorios classificados por prazo de vencimento inadimplencia e pagamentos antecipados
VI direitos creditorios classificados por prazo de vencimento inadimplencia e pagamentos antecipados
VII aquisicoes substituicoes e recompras
IX taxas de desconto e de juros das compras e vendas
X\_1 informacoes sobre cotistas por classe e serie
X\_1\_1 classificacoes dos cotistas
X\_2 informacoes sobre as cotas
X\_3 rentabilidade
X\_4 transacoes de cotas do fundo
X\_5 ativos classificados pela liquidez
X\_6 taxas de desempenho real e esperada
X\_7 garantias

Let’s then gather the largest FIDC’s assets data and plot its main segments:

dados_fidc(cnpj  = infos$cnpj[1],
           start = infos$inicio_atv[1],
           end   = as.Date('2022-03-31'),
           table = 'I') %>%
      filter(base == 'ativo', value != 0) %>% 
      group_by(data, segment) %>% 
              summarise(valor = sum(value)) %>% 
      ggplot() +
      aes(x = data, y = valor, fill = segment) +
      geom_bar(position = 'stack', stat     = 'identity') +
      scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
      xlab('Date') +
      ylab('Value (BRL)') +
      theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-28"
#> [1] "Obtaining data between 2016-04-20 and 2018-12-31"

We can also analyze the items within each asset segment:

dados_fidc(cnpj  = infos$cnpj[1],
           start = infos$inicio_atv[1],
           end   = as.Date('2022-03-31'),
           table = 'I') %>%
      filter(base == 'ativo', value != 0) %>% 
      group_by(data, segment, item) %>% 
              summarise(valor = sum(value)) %>% 
      ggplot() +
      aes(x = data, y = valor, fill = item) +
      geom_bar(position = 'stack', stat     = 'identity') +
      scale_y_continuous(labels = unit_format(unit = "MM", scale = 1e-6)) +
      xlab('Date') +
      ylab('Value (BRL)') +
      theme_minimal() +
      facet_wrap(~segment, scales = 'free_y')
#> [1] "Obtaining data between 2019-01-01 and 2022-02-28"
#> [1] "Obtaining data between 2016-04-20 and 2018-12-31"

About

Obtains investment fund's data from CVM

Topics

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages