Performance could be better #40

Open
jgm opened this issue Jan 26, 2020 · 9 comments
Labels
help wanted Extra attention is needed

Comments

@jgm
Contributor

jgm commented Jan 26, 2020

One pandoc user has run into an issue with a large (100k line) bibliography in YAML format (for details see jgm/pandoc#6084). Prior to pandoc 2.8 (when we used the yaml package), this was handled fairly quickly, but now that we use HsYAML it takes 18 seconds to read the bibliography. I confirmed that the slowdown is due to HsYAML by loading the file in a GHCi session as b and trying

GHCI> :set +s
GHCI> let x = decodeNode b in x `seq` 3 -- this is just to ensure it's evaluated
(25.28 secs, 82,135,579,376 bytes)
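For anyone who wants to repeat this measurement outside GHCi, here is a minimal sketch of a timing helper that uses only `base`. The name `timeForced` is hypothetical and not part of HsYAML. Note that `seq` (and `evaluate`) force only to weak head normal form, so the caller passes a function that decides how much of the result to force.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- | Apply @force@ to @input@, evaluate the outcome to weak head normal
-- form, and return it together with the CPU time spent, in seconds.
-- (Hypothetical helper for illustration; not part of HsYAML.)
timeForced :: (a -> b) -> a -> IO (b, Double)
timeForced force input = do
  start  <- getCPUTime                 -- CPU time in picoseconds
  result <- evaluate (force input)     -- forces to WHNF only
  end    <- getCPUTime
  pure (result, fromIntegral (end - start) / 1e12)
```

With the file loaded as `b`, something like `timeForced (either (const 0) length . decodeNode) b` should force at least the list of documents, assuming `decodeNode` returns an `Either` whose `Right` side is a list.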

What are the performance expectations for HsYAML? Have you made efforts to optimize here? aeson claimed decoding speeds of 46M/sec on a slower machine than mine; this file is 3M. I wouldn't expect that YAML parsing could be as fast as JSON parsing, but it would be nice to get in the 4M/sec range (10x slower than aeson).

EDIT: 82G allocated with 1G max residency seems an awful lot to parse a 3M file!
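To put those numbers in perspective, here is the arithmetic as a small sketch. It assumes the "3M" file is 3 * 10^6 bytes, which is only an approximation of the real size.

```haskell
-- Figures quoted in this comment. The file size is the rounded "3M"
-- figure, taken here as 3 * 10^6 bytes (an assumption).
fileBytes, allocatedBytes, ghciSeconds :: Double
fileBytes      = 3.0e6
allocatedBytes = 82135579376
ghciSeconds    = 25.28

-- | Observed throughput in the GHCi run, in MB per second (about 0.12).
observedMBps :: Double
observedMBps = fileBytes / 1.0e6 / ghciSeconds

-- | Bytes allocated per byte of input (about 27,000).
allocPerInputByte :: Double
allocPerInputByte = allocatedBytes / fileBytes

-- | Time the same file would take at the hoped-for 4M/sec (0.75 s).
targetSeconds :: Double
targetSeconds = fileBytes / 4.0e6
```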

Profiling reports these as the biggest cost centers:

COST CENTRE                  MODULE          SRC                                        %time %alloc
applyParser                  Data.YAML.Token src/Data/YAML/Token.hs:220:1-30             32.5    0.4
*>.\                         Data.YAML.Token src/Data/YAML/Token.hs:(435,5)-(439,67)      9.4   25.3
^.                           Data.YAML.Token src/Data/YAML/Token.hs:73:1-30               9.3    0.8
&                            Data.YAML.Token src/Data/YAML/Token.hs:567:1-44              4.3    4.0
<|>.decideParser             Data.YAML.Token src/Data/YAML/Token.hs:(599,7)-(609,95)      3.9    5.2
nextIf.consumeNextIf         Data.YAML.Token src/Data/YAML/Token.hs:(791,5)-(817,52)      3.1    0.4
prefixErrorWith.\            Data.YAML.Token src/Data/YAML/Token.hs:(913,5)-(917,95)      2.7    7.8
prefixErrorWith.\.reply      Data.YAML.Token src/Data/YAML/Token.hs:913:9-49              2.4    0.0
append                       Data.DList      src/Data/DList.hs:34:1-46                    1.8    2.7
/                            Data.YAML.Token src/Data/YAML/Token.hs:572:1-68              1.8    0.1
reject.\                     Data.YAML.Token src/Data/YAML/Token.hs:673:5-67              1.7    8.1
*>                           Data.YAML.Token src/Data/YAML/Token.hs:(434,3)-(439,67)      1.6    0.0
returnReply                  Data.YAML.Token src/Data/YAML/Token.hs:(387,1)-(390,52)      1.5    7.9

Heap profiling shows that the DLists account for a lot of the allocation.
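For context, a function-based difference list looks roughly like the sketch below. This is a minimal model written for illustration, not a copy of src/Data/DList.hs. Every `append` builds a new closure, which is one plausible reason such values show up prominently in a heap profile.

```haskell
-- | A difference list: a function that prepends its contents to a tail.
newtype DList a = DList ([a] -> [a])

empty :: DList a
empty = DList id

singleton :: a -> DList a
singleton x = DList (x :)

-- | O(1) append, but each call allocates a closure for the composition.
append :: DList a -> DList a -> DList a
append (DList f) (DList g) = DList (f . g)

-- | Materialise the list by applying the function to the empty tail.
toList :: DList a -> [a]
toList (DList f) = f []
```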

@jgm
Contributor Author

jgm commented Feb 13, 2020

Unfortunately, unless something can be done, this issue is probably going to force me to switch back to using yaml in pandoc, which I'm unhappy about -- but people have some large YAML files to process. I tried doing some profiling with explicit SCC annotations. This seems to indicate that most of the time is spent in Data.YAML.Token c_l_block_seq_entry, which I suppose is what you'd expect for this input, but I wasn't yet able to pin it down further and nothing obvious has jumped out...
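For readers who have not used them, explicit SCC annotations are pragmas placed in front of an expression. The toy function below is hypothetical and unrelated to HsYAML; it only shows the syntax. The cost-centre names appear in the .prof report when the program is built with profiling enabled.

```haskell
-- | Toy example: each annotated expression becomes its own cost centre.
sumSquares :: [Int] -> Int
sumSquares xs =
  {-# SCC "sumSquares.sum" #-}
    sum ({-# SCC "sumSquares.map" #-} map (^ (2 :: Int)) xs)
```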

@hasufell

Performance is 10 times worse than yaml package: https://gitlab.haskell.org/haskell/ghcup-hs/-/issues/270

@hasufell

My tests seem to indicate that it's Data.DList.toList:

[image: profiling screenshot]

@jgm
Contributor Author

jgm commented Oct 20, 2021

There's no problem with dlist, as far as I can see, so this profile doesn't tell us where the problem really lies. I tried replacing dlist with Data.Sequence from containers (which is a dependency of this package anyway), and this didn't affect performance significantly. After that, profiling says

97.3  97.3  96.7            Data.YAML.Token tokenize (0)

@sjakobi sjakobi added the help wanted Extra attention is needed label May 11, 2022
@sjakobi
Collaborator

sjakobi commented May 11, 2022

It would be nice to make progress on this issue. Maybe the -fprof-late-ccs option announced for GHC 9.4 could help getting more insights on this.

@sjakobi
Collaborator

sjakobi commented May 11, 2022

I've had a brief look at the Core of Data.YAML.Token. What I noticed so far is that there's a lot of reboxing of Reply values. For example:


-- RHS size: {terms: 12, types: 23, coercions: 6, joins: 0/0}
tokenize135 :: State -> Reply ()
[GblId,
 Arity=1,
 Str=<L>,
 Cpr=1,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=False)}]
tokenize135
  = \ (w :: State) ->
      case $w$c*>
             @() @() (tokenize153 `cast` <Co:3>) (tokenize136 `cast` <Co:3>) w
      of
      { (# ww1, ww2, ww3, ww4 #) ->
      Reply @() ww1 ww2 ww3 ww4
      }

-- RHS size: {terms: 12, types: 23, coercions: 6, joins: 0/0}
tokenize134 :: State -> Reply ()
[GblId,
 Arity=1,
 Str=<L>,
 Cpr=1,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=False)}]
tokenize134
  = \ (w :: State) ->
      case $w$c*>
             @() @() (tokenize154 `cast` <Co:3>) (tokenize135 `cast` <Co:3>) w
      of
      { (# ww1, ww2, ww3, ww4 #) ->
      Reply @() ww1 ww2 ww3 ww4
      }

I also thought that it was weird that *> isn't inlined. I'm not sure whether this is entirely prevented by its recursive nature or whether an INLINE pragma or just -O2 could fix that.

Rec {
-- RHS size: {terms: 16, types: 29, coercions: 0, joins: 0/0}
$fApplicativeParser2 [InlPrag=[2]]
  :: forall {a} {b}. Parser a -> Parser b -> State -> Reply b
[GblId,
 Arity=3,
 Str=<1C1(P(1L,L,L,L))><L><L>,
 Cpr=1,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=3,unsat_ok=True,boring_ok=False)}]
$fApplicativeParser2
  = \ (@a) (@b) (w :: Parser a) (w1 :: Parser b) (w2 :: State) ->
      case $w$c*> @a @b w w1 w2 of { (# ww1, ww2, ww3, ww4 #) ->
      Reply @b ww1 ww2 ww3 ww4
      }

-- RHS size: {terms: 34, types: 60, coercions: 5, joins: 0/0}
$w$c*> [InlPrag=[2], Occ=LoopBreaker]
  :: forall {a} {b}.
     Parser a
     -> Parser b
     -> State
     -> (# Result b, DList Token, Maybe Decision, State #)
[GblId, Arity=3, Str=<1C1(P(1L,L,L,L))><L><L>, Unf=OtherCon []]
$w$c*>
  = \ (@a) (@b) (w :: Parser a) (w1 :: Parser b) (w2 :: State) ->
      case (w `cast` <Co:2>) w2 of { Reply ds206 ds207 ds208 ds209 ->
      case ds206 of {
        Failed message2 -> (# Failed @b message2, ds207, ds208, ds209 #);
        Result ds210 -> (# More @b w1, ds207, ds208, ds209 #);
        More parser6 ->
          (# More @b (($fApplicativeParser2 @a @b parser6 w1) `cast` <Co:3>),
             ds207, ds208, ds209 #)
      }
      }
end Rec }

Maybe it would also be helpful to define Reply as an unlifted type.

I also noticed that this package uses parsec instead of megaparsec which should be more optimized.
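To make the shape of `$w$c*>` easier to follow, here is a simplified source-level model reconstructed from the Core above rather than taken from the library's source. `State` and `Token` are toy stand-ins (the real `State` has many more fields), and `andThen`, `anyChar` and `run` are hypothetical names.

```haskell
-- Toy stand-ins for the real types (assumptions for illustration).
type State = String
type Token = String
type DList a = [a] -> [a]
data Decision = Decision

data Result a
  = Failed String      -- parsing failed with a message
  | Result a           -- parsing finished with a value
  | More (Parser a)    -- parsing continues with this parser

-- The four fields match the unboxed tuple returned by $w$c*>.
data Reply a = Reply (Result a) (DList Token) (Maybe Decision) State

newtype Parser a = Parser { applyParser :: State -> Reply a }

-- | Sequencing as it appears in the Core: run one step of @left@,
-- then return a freshly boxed 'Reply' describing what to do next.
andThen :: Parser a -> Parser b -> Parser b
andThen left right = Parser $ \state ->
  case applyParser left state of
    Reply res tokens commit state' ->
      let next = case res of
            Failed msg -> Failed msg
            Result _   -> More right
            More left' -> More (andThen left' right)
      in Reply next tokens commit state'

-- | Toy primitive: consume one character and emit it as a token.
anyChar :: Parser Char
anyChar = Parser $ \state -> case state of
  []       -> Reply (Failed "unexpected end of input") id Nothing state
  (c : cs) -> Reply (Result c) ([c] :) Nothing cs

-- | Driver loop: keep applying parsers until no 'More' is returned.
run :: Parser a -> State -> (Either String a, [Token])
run parser state =
  case applyParser parser state of
    Reply res tokens _ state' -> case res of
      Failed msg -> (Left msg, tokens [])
      Result x   -> (Right x, tokens [])
      More next  -> let (r, rest) = run next state' in (r, tokens rest)
```

In this model every step of a sequence allocates a `More` constructor, a new parser closure and a `Reply`, which is the kind of reboxing visible in the Core.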

@sjakobi
Collaborator

sjakobi commented May 11, 2022

https://wg21.link/index.yaml could be used for benchmarking.

@sjakobi
Collaborator

sjakobi commented May 25, 2022

> It would be nice to make progress on this issue. Maybe the -fprof-late-ccs option announced for GHC 9.4 could help getting more insights on this.

I've given this a spin, building with cabal build -w ghc-9.4 --enable-profiling --profiling-detail=none and

--- a/HsYAML.cabal
+++ b/HsYAML.cabal
@@ -108,7 +108,7 @@ library
   if !impl(ghc >= 7.10)
     build-depends:     nats         >= 1.1.2    && < 1.2
 
-  ghc-options:         -Wall
+  ghc-options:         -Wall -fprof-late
 
 executable yaml-test
   hs-source-dirs: src-test
@@ -133,7 +133,7 @@ executable yaml-test
   else
     buildable: False
 
-  ghc-options: -rtsopts
+  ghc-options: -rtsopts -fprof-late
 
 test-suite tests
   default-language: Haskell2010

I then profiled the following command:

cat wg21.yaml | yaml-test yaml2event0 +RTS -p

…where wg21.yaml is the file from https://wg21.link/index.yaml.

Results:

COST CENTRE       MODULE          SRC                              %time %alloc
$w$c*>            Data.YAML.Token <no location info>                47.1   17.3
$c*>              Data.YAML.Token <no location info>                18.6    7.4
$wdecideParser    Data.YAML.Token src/Data/YAML/Token.hs:599:7-18    5.2    4.1
$wnextIf          Data.YAML.Token <no location info>                 4.1   18.4
$sprefixErrorWith Data.YAML.Token <no location info>                 2.7    7.1
$wchoiceParser    Data.YAML.Token <no location info>                 1.9    4.0
$wrejectParser    Data.YAML.Token src/Data/YAML/Token.hs:675:5-16    1.4    0.0
choiceParser      Data.YAML.Token <no location info>                 1.3    1.2
$wfinishToken     Data.YAML.Token <no location info>                 1.2    4.2
$wwithParser      Data.YAML.Token <no location info>                 1.2    3.5
sol               Data.YAML.Token <no location info>                 1.0    1.8
value             Data.YAML.Token <no location info>                 0.9    1.8
c_forbidden       Data.YAML.Token <no location info>                 0.8    1.8
$srecovery        Data.YAML.Token <no location info>                 0.7    1.8
$w$stoken         Data.YAML.Token <no location info>                 0.6    3.8
$wemptyToken      Data.YAML.Token <no location info>                 0.5    3.0
append            Data.DList      <no location info>                 0.4    1.6
$w$stoken         Data.YAML.Token <no location info>                 0.2    1.2

I'm not quite sure what $c*> is – I can't find it in the generated Core or STG. Maybe it's an artifact of the -fprof-late mode.

STG for the other top cost centers:

$w$c*>
Rec {
$fApplicativeParser3 [InlPrag=[2]]
  :: forall {a} {b}. Parser a -> Parser b -> State -> Reply b
[GblId,
 Arity=3,
 Str=<1C1(P(1L,L,L,L))><L><L>,
 Cpr=1,
 Unf=OtherCon []] =
    {} \r [left_sknD right_sknE state_sknF]
        case
            case left_sknD of left_tnCK {
            __DEFAULT -> $w$c*> left_tnCK right_sknE state_sknF;
            }
        of
        {
        (#,,,#) ww_sknH [Occ=Once1]
                ww1_sknI [Occ=Once1]
                ww2_sknJ [Occ=Once1]
                ww3_sknK [Occ=Once1] ->
        Reply [ww_sknH ww1_sknI ww2_sknJ ww3_sknK];
        };

$w$c*> [InlPrag=[2], Occ=LoopBreaker]
  :: forall {a} {b}.
     Parser a
     -> Parser b
     -> State
     -> (# Result b, DList Token, Maybe Decision, State #)
[GblId[StrictWorker([!, ~, ~])],
 Arity=3,
 Str=<1C1(P(1L,L,L,L))><L><L>,
 Unf=OtherCon []] =
    {} \r [left_sknL right_sknM state_sknN]
        case left_sknL state_sknN of {
        Reply ds203_sknP [Occ=Once1!]
              ds204_sknQ [Occ=Once3]
              ds205_sknR [Occ=Once3]
              ds206_sknS [Occ=Once3] ->
        case ds203_sknP<TagProper> of {
          Failed message2_sknU [Occ=Once1] ->
              (#,,,#) [wild1_sknT ds204_sknQ ds205_sknR ds206_sknS];
          Result _ [Occ=Dead] ->
              let {
                sat_sknX [Occ=Once1] :: Result b_shVR
                [LclId] =
                    CCCS More! [right_sknM];
              } in  (#,,,#) [sat_sknX ds204_sknQ ds205_sknR ds206_sknS];
          More parser6_sknY [Occ=OnceL1] ->
              let {
                sat_sknZ [Occ=Once1] :: Parser b_shVR
                [LclId] =
                    {parser6_sknY, right_sknM} \r [eta_B0]
                        $fApplicativeParser3 parser6_sknY right_sknM eta_B0; } in
              let {
                sat_sko0 [Occ=Once1] :: Result b_shVR
                [LclId] =
                    CCCS More! [sat_sknZ];
              } in  (#,,,#) [sat_sko0 ds204_sknQ ds205_sknR ds206_sknS];
        };
        };
end Rec }
$wdecideParser
Rec {
$wdecideParser [InlPrag=[2], Occ=LoopBreaker]
  :: forall {result}.
     State
     -> DList Token
     -> Parser result
     -> Parser result
     -> State
     -> (# Result result, DList Token, Maybe Decision, State #)
[GblId[StrictWorker([~, ~, !, ~, ~])],
 Arity=5,
 Str=<ML><L><1C1(P(1L,L,L,L))><L><L>,
 Unf=OtherCon []] =
    {} \r [point_skbQ tokens_skbR left_skbS right_skbT state_skbU]
        case left_skbS state_skbU of {
        Reply ds203_skbW [Occ=Once1!]
              ds204_skbX [Occ=OnceL3]
              ds205_skbY [Occ=Once2!]
              ds206_skbZ [Occ=Once3] ->
        case ds203_skbW<TagProper> of wild1_skc0 [Occ=Once2] {
          Failed _ [Occ=Dead] ->
              case point_skbQ of conrep_skc2 [Occ=Once1] {
              State _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead]
                    _ [Occ=Dead] ->
              let {
                sat_skcl [Occ=Once1] :: Result result_shQT
                [LclId] =
                    CCCS More! [right_skbT];
              } in  (#,,,#) [sat_skcl id Nothing conrep_skc2];
              };
          Result _ [Occ=Dead] ->
              let {
                sat_skcp [Occ=Once1] :: DList Token
                [LclId] =
                    {ds204_skbX, tokens_skbR} \r [x_skcn]
                        let {
                          sat_skco [Occ=Once1] :: [Token]
                          [LclId] =
                              {x_skcn, ds204_skbX} \u [] ds204_skbX x_skcn;
                        } in  tokens_skbR sat_skco;
              } in  (#,,,#) [wild1_skc0 sat_skcp ds205_skbY ds206_skbZ];
          More ds207_skcq [Occ=Once1] ->
              case ds205_skbY<TagProper> of wild2_skcr [Occ=Once1] {
                Nothing ->
                    let {
                      sat_skcu [Occ=Once1] :: DList Token
                      [LclId] =
                          {ds204_skbX, tokens_skbR} \r [x_skcs]
                              let {
                                sat_skct [Occ=Once1] :: [Token]
                                [LclId] =
                                    {x_skcs, ds204_skbX} \u [] ds204_skbX x_skcs;
                              } in  tokens_skbR sat_skct;
                    } in 
                      case ds207_skcq of ds207_tnBK {
                      __DEFAULT ->
                      $wdecideParser
                          point_skbQ sat_skcu ds207_tnBK right_skbT ds206_skbZ;
                      };
                Just _ [Occ=Dead] ->
                    let {
                      sat_skcy [Occ=Once1] :: DList Token
                      [LclId] =
                          {ds204_skbX, tokens_skbR} \r [x_skcw]
                              let {
                                sat_skcx [Occ=Once1] :: [Token]
                                [LclId] =
                                    {x_skcw, ds204_skbX} \u [] ds204_skbX x_skcw;
                              } in  tokens_skbR sat_skcx;
                    } in  (#,,,#) [wild1_skc0 sat_skcy wild2_skcr ds206_skbZ];
              };
        };
        };
end Rec }
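Read back into source form, the `$wdecideParser` STG above corresponds roughly to the sketch below. It is reconstructed from the dump, not copied from Token.hs; `State` and `Token` are toy stand-ins and the name `decide` is hypothetical.

```haskell
-- Toy stand-ins for the real types (assumptions for illustration).
type State = String
type Token = String
type DList a = [a] -> [a]
data Decision = Decision

data Result a = Failed String | Result a | More (Parser a)
data Reply a = Reply (Result a) (DList Token) (Maybe Decision) State
newtype Parser a = Parser { applyParser :: State -> Reply a }

-- | Try @left@ step by step. On failure, discard its tokens, rewind to
-- the saved @point@ and continue with @right@.
decide :: State -> DList Token -> Parser a -> Parser a -> State -> Reply a
decide point tokens left right state =
  case applyParser left state of
    Reply res tokens' commit state' ->
      -- Appending difference lists builds a new closure on every step.
      let acc = tokens . tokens'
      in case res of
        Failed _   -> Reply (More right) id Nothing point
        Result _   -> Reply res acc commit state'
        More left' -> case commit of
          Nothing -> decide point acc left' right state'
          Just _  -> Reply res acc commit state'
```

The three `sat_...` closures of type `DList Token` in the STG are the `acc` value here, allocated once per branch.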
$wnextIf
$wnextIf [InlPrag=[2]]
  :: (Char -> Bool) -> State -> (# Result (), DList Token, State #)
[GblId[StrictWorker([~, !])],
 Arity=2,
 Str=<MCM(L)><1P(L,L,L,SL,L,L,L,L,L,L,L,L,L,L,L,L,L,L)>,
 Unf=OtherCon []] =
    {} \r [test_skhc eta_skhd]
        case eta_skhd<TagProper> of wild_skhe [Occ=Once2] {
        State ds203_skhf
              ds204_skhg
              bx_skhh
              ds205_skhi [Occ=Once1!]
              ds206_skhj [Occ=Once1]
              ds207_skhk
              ds208_skhl
              bx1_skhm
              bx2_skhn
              bx3_skho
              bx4_skhp
              bx5_skhq
              bx6_skhr
              bx7_skhs
              bx8_skht
              ds209_skhu
              bx9_skhv
              ds210_skhw ->
        let-no-escape {
          limitedNextIf_skhx [Occ=Once2!T[1], Dmd=MCM(!P(L,L,L))]
            :: State -> (# Result (), DList Token, State #)
          [LclId[JoinId(1)(Nothing)],
           Arity=1,
           Str=<1P(L,L,SL,L,L,L,L,L,L,L,L,L,L,L,L,L,L,L)>,
           Unf=OtherCon []] =
              {test_skhc} \r [state_skhy]
                  case state_skhy of wild1_skhz [Occ=Once2] {
                  State ds211_skhA [Occ=Once1]
                        ds212_skhB [Occ=Once1]
                        bx10_skhC [Occ=Once1!]
                        ds213_skhD [Occ=Once1]
                        ds214_skhE [Occ=Once1]
                        ds215_skhF [Occ=Once1]
                        ds216_skhG [Occ=Once1]
                        bx11_skhH [Occ=Once1]
                        bx12_skhI [Occ=Once1]
                        bx13_skhJ [Occ=Once1]
                        bx14_skhK [Occ=Once1]
                        bx15_skhL [Occ=Once1]
                        bx16_skhM [Occ=Once1]
                        bx17_skhN [Occ=Once1]
                        bx18_skhO [Occ=Once1]
                        ds217_skhP [Occ=Once1]
                        bx19_skhQ [Occ=Once1]
                        ds218_skhR [Occ=Once1] ->
                  let-no-escape {
                    consumeNextIf_skhS [Occ=Once2!T[1], Dmd=MCM(!P(L,L,L))]
                      :: State -> (# Result (), DList Token, State #)
                    [LclId[JoinId(1)(Nothing)],
                     Arity=1,
                     Str=<1P(L,L,L,L,L,L,L,L,L,L,L,L,L,L,L,L,L,SL)>,
                     Unf=OtherCon []] =
                        {test_skhc} \r [state1_skhT]
                            case state1_skhT of wild2_skhU [Occ=Once2] {
                            State ds219_skhV [Occ=Once3]
                                  ds220_skhW [Occ=Once3]
                                  bx20_skhX [Occ=Once3]
                                  ds221_skhY [Occ=Once3]
                                  ds222_skhZ
                                  ds223_ski0 [Occ=Once1]
                                  ds224_ski1
                                  bx21_ski2 [Occ=Once1]
                                  bx22_ski3 [Occ=Once1]
                                  bx23_ski4 [Occ=Once1]
                                  bx24_ski5 [Occ=Once1]
                                  bx25_ski6 [Occ=Once1]
                                  bx26_ski7
                                  bx27_ski8
                                  bx28_ski9
                                  ds225_skia [Occ=Once3]
                                  _ [Occ=Dead]
                                  ds226_skic [Occ=Once1!] ->
                            case ds226_skic<TagProper> of {
                              [] -> (#,,#) [lvl48_rjLH id wild2_skhU];
                              : ds227_skie [Occ=Once1!] rest_skif [Occ=Once3] ->
                                  case ds227_skie of {
                                  (,) offset_skih [Occ=Once3!] char_skii ->
                                  case test_skhc char_skii of {
                                    False ->
                                        let {
                                          sat_skil [Occ=Once1] :: [Char]
                                          [LclId] =
                                              {char_skii} \u []
                                                  let {
                                                    sat_skik [Occ=Once1] :: [Char]
                                                    [LclId] =
                                                        CCCS :! [char_skii lvl46_rjLF];
                                                  } in 
                                                    unpackAppendCString# tokenize5 sat_skik; } in
                                        let {
                                          sat_skim [Occ=Once1] :: Result ()
                                          [LclId] =
                                              CCCS Failed! [sat_skil];
                                        } in  (#,,#) [sat_skim id wild2_skhU];
                                    True ->
                                        case char_skii of wild6_skin [Occ=Once1] {
                                        C# x_skio ->
                                        let-no-escape {
                                          $j_skip [Occ=Once2!T[1], Dmd=1C1(!P(L,L,L))]
                                            :: Bool %1 -> (# Result (), DList Token, State #)
                                          [LclId[JoinId(1)(Nothing)],
                                           Arity=1,
                                           Str=<1L>,
                                           Unf=OtherCon []] =
                                              {ds219_skhV, ds220_skhW, bx20_skhX, ds221_skhY,
                                               bx24_ski5, bx27_ski8, ds225_skia, x_skio, bx26_ski7,
                                               bx28_ski9, rest_skif, offset_skih, ds224_ski1,
                                               ds222_skhZ, bx23_ski4, bx22_ski3, bx21_ski2,
                                               bx25_ski6, wild6_skin} \r [arg_skiq]
                                                  let-no-escape {
                                                    $j1_skir [Occ=Once2!T[1], Dmd=1C1(!P(L,L,L))]
                                                      :: [Char]
                                                         %1 -> (# Result (), DList Token, State #)
                                                    [LclId[JoinId(1)(Nothing)],
                                                     Arity=1,
                                                     Str=<1L>,
                                                     Unf=OtherCon []] =
                                                        {ds219_skhV, ds220_skhW, bx20_skhX,
                                                         ds221_skhY, bx24_ski5, bx27_ski8,
                                                         ds225_skia, x_skio, bx26_ski7, bx28_ski9,
                                                         rest_skif, offset_skih, arg_skiq,
                                                         ds224_ski1, ds222_skhZ, bx23_ski4,
                                                         bx22_ski3, bx21_ski2,
                                                         bx25_ski6} \r [arg1_skis]
                                                            let-no-escape {
                                                              $w$j_skit [InlPrag=[2],
                                                                         Occ=Once3!T[1],
                                                                         Dmd=1C1(!P(L,L,L))]
                                                                :: Int#
                                                                   %1 -> (# Result (), DList Token,
                                                                            State #)
                                                              [LclId[JoinId(1)(Nothing)],
                                                               Arity=1,
                                                               Str=<L>,
                                                               Unf=OtherCon []] =
                                                                  {ds219_skhV, ds220_skhW,
                                                                   bx20_skhX, ds221_skhY, bx24_ski5,
                                                                   bx27_ski8, ds225_skia, x_skio,
                                                                   bx26_ski7, bx28_ski9, rest_skif,
                                                                   offset_skih, arg1_skis, arg_skiq,
                                                                   ds224_ski1, ds222_skhZ,
                                                                   bx23_ski4,
                                                                   bx22_ski3} \r [ww_skiu]
                                                                      let-no-escape {
                                                                        $w$j1_skiv [InlPrag=[2],
                                                                                    Occ=Once3!T[1],
                                                                                    Dmd=1C1(!P(L,L,L))]
                                                                          :: Int#
                                                                             %1 -> (# Result (),
                                                                                      DList Token,
                                                                                      State #)
                                                                        [LclId[JoinId(1)(Nothing)],
                                                                         Arity=1,
                                                                         Str=<L>,
                                                                         Unf=OtherCon []] =
                                                                            {ds219_skhV, ds220_skhW,
                                                                             bx20_skhX, ds221_skhY,
                                                                             ww_skiu, bx24_ski5,
                                                                             bx27_ski8, ds225_skia,
                                                                             x_skio, bx26_ski7,
                                                                             bx28_ski9, rest_skif,
                                                                             offset_skih, arg1_skis,
                                                                             arg_skiq, ds224_ski1,
                                                                             ds222_skhZ,
                                                                             bx23_ski4} \r [ww1_skiw]
                                                                                let-no-escape {
                                                                                  $w$j2_skix [InlPrag=[2],
                                                                                              Occ=Once3!T[1],
                                                                                              Dmd=1C1(!P(L,L,L))]
                                                                                    :: Int#
                                                                                       %1 -> (# Result
                                                                                                  (),
                                                                                                DList
                                                                                                  Token,
                                                                                                State #)
                                                                                  [LclId[JoinId(1)(Nothing)],
                                                                                   Arity=1,
                                                                                   Str=<L>,
                                                                                   Unf=OtherCon []] =
                                                                                      {ds219_skhV,
                                                                                       ds220_skhW,
                                                                                       bx20_skhX,
                                                                                       ds221_skhY,
                                                                                       ww_skiu,
                                                                                       ww1_skiw,
                                                                                       bx24_ski5,
                                                                                       bx27_ski8,
                                                                                       ds225_skia,
                                                                                       x_skio,
                                                                                       bx26_ski7,
                                                                                       bx28_ski9,
                                                                                       rest_skif,
                                                                                       offset_skih,
                                                                                       arg1_skis,
                                                                                       arg_skiq,
                                                                                       ds224_ski1,
                                                                                       ds222_skhZ} \r [ww2_skiy]
                                                                                          case
                                                                                              ds222_skhZ<TagProper>
                                                                                          of
                                                                                          { False ->
                                                                                                case
                                                                                                    ds224_ski1<TagProper>
                                                                                                of
                                                                                                { [] ->
                                                                                                      case
                                                                                                          arg_skiq
                                                                                                      of
                                                                                                      conrep_skiB [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      case
                                                                                                          arg1_skis
                                                                                                      of
                                                                                                      conrep1_skiC [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      case
                                                                                                          offset_skih
                                                                                                      of
                                                                                                      {
                                                                                                      I# unbx_skiE [Occ=Once1] ->
                                                                                                      case
                                                                                                          rest_skif
                                                                                                      of
                                                                                                      conrep3_skiF [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      case
                                                                                                          +# [bx28_ski9
                                                                                                              1#]
                                                                                                      of
                                                                                                      sat_skiH [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      case
                                                                                                          +# [bx26_ski7
                                                                                                              1#]
                                                                                                      of
                                                                                                      sat_skiG [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      let {
                                                                                                        sat_skiI [Occ=Once1]
                                                                                                          :: State
                                                                                                        [LclId] =
                                                                                                            CCCS State! [ds219_skhV
                                                                                                                         ds220_skhW
                                                                                                                         bx20_skhX
                                                                                                                         ds221_skhY
                                                                                                                         False
                                                                                                                         conrep_skiB
                                                                                                                         conrep1_skiC
                                                                                                                         ww_skiu
                                                                                                                         ww1_skiw
                                                                                                                         ww2_skiy
                                                                                                                         bx28_ski9
                                                                                                                         unbx_skiE
                                                                                                                         sat_skiG
                                                                                                                         bx27_ski8
                                                                                                                         sat_skiH
                                                                                                                         ds225_skia
                                                                                                                         x_skio
                                                                                                                         conrep3_skiF];
                                                                                                      } in 
                                                                                                        (#,,#) [tokenize8
                                                                                                                id
                                                                                                                sat_skiI];
                                                                                                      };
                                                                                                      };
                                                                                                      };
                                                                                                      };
                                                                                                      };
                                                                                                      };
                                                                                                  : _ [Occ=Dead]
                                                                                                    _ [Occ=Dead] ->
                                                                                                      case
                                                                                                          arg_skiq
                                                                                                      of
                                                                                                      conrep_skiL [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      case
                                                                                                          arg1_skis
                                                                                                      of
                                                                                                      conrep1_skiM [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      case
                                                                                                          offset_skih
                                                                                                      of
                                                                                                      {
                                                                                                      I# unbx_skiO [Occ=Once1] ->
                                                                                                      case
                                                                                                          rest_skif
                                                                                                      of
                                                                                                      conrep3_skiP [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      case
                                                                                                          +# [bx28_ski9
                                                                                                              1#]
                                                                                                      of
                                                                                                      sat_skiR [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      case
                                                                                                          +# [bx26_ski7
                                                                                                              1#]
                                                                                                      of
                                                                                                      sat_skiQ [Occ=Once1]
                                                                                                      {
                                                                                                      __DEFAULT ->
                                                                                                      let {
                                                                                                        sat_skiS [Occ=Once1]
                                                                                                          :: State
                                                                                                        [LclId] =
                                                                                                            CCCS State! [ds219_skhV
                                                                                                                         ds220_skhW
                                                                                                                         bx20_skhX
                                                                                                                         ds221_skhY
                                                                                                                         False
                                                                                                                         conrep_skiL
                                                                                                                         conrep1_skiM
                                                                                                                         ww_skiu
                                                                                                                         ww1_skiw
                                                                                                                         ww2_skiy
                                                                                                                         bx24_ski5
                                                                                                                         unbx_skiO
                                                                                                                         sat_skiQ
                                                                                                                         bx27_ski8
                                                                                                                         sat_skiR
                                                                                                                         ds225_skia
                                                                                                                         x_skio
                                                                                                                         conrep3_skiP];
                                                                                                      } in 
                                                                                                        (#,,#) [tokenize8
                                                                                                                id
                                                                                                                sat_skiS];
                                                                                                      };
                                                                                                      };
                                                                                                      };
                                                                                                      };
                                                                                                      };
                                                                                                      };
                                                                                                };
                                                                                            True ->
                                                                                                case
                                                                                                    arg_skiq
                                                                                                of
                                                                                                conrep_skiT [Occ=Once1]
                                                                                                {
                                                                                                __DEFAULT ->
                                                                                                case
                                                                                                    arg1_skis
                                                                                                of
                                                                                                conrep1_skiU [Occ=Once1]
                                                                                                {
                                                                                                __DEFAULT ->
                                                                                                case
                                                                                                    offset_skih
                                                                                                of
                                                                                                {
                                                                                                I# unbx_skiW [Occ=Once1] ->
                                                                                                case
                                                                                                    rest_skif
                                                                                                of
                                                                                                conrep3_skiX [Occ=Once1]
                                                                                                {
                                                                                                __DEFAULT ->
                                                                                                case
                                                                                                    +# [bx28_ski9
                                                                                                        1#]
                                                                                                of
                                                                                                sat_skiZ [Occ=Once1]
                                                                                                {
                                                                                                __DEFAULT ->
                                                                                                case
                                                                                                    +# [bx26_ski7
                                                                                                        1#]
                                                                                                of
                                                                                                sat_skiY [Occ=Once1]
                                                                                                {
                                                                                                __DEFAULT ->
                                                                                                let {
                                                                                                  sat_skj0 [Occ=Once1]
                                                                                                    :: State
                                                                                                  [LclId] =
                                                                                                      CCCS State! [ds219_skhV
                                                                                                                   ds220_skhW
                                                                                                                   bx20_skhX
                                                                                                                   ds221_skhY
                                                                                                                   True
                                                                                                                   conrep_skiT
                                                                                                                   conrep1_skiU
                                                                                                                   ww_skiu
                                                                                                                   ww1_skiw
                                                                                                                   ww2_skiy
                                                                                                                   -1#
                                                                                                                   unbx_skiW
                                                                                                                   sat_skiY
                                                                                                                   bx27_ski8
                                                                                                                   sat_skiZ
                                                                                                                   ds225_skia
                                                                                                                   x_skio
                                                                                                                   conrep3_skiX];
                                                                                                } in 
                                                                                                  (#,,#) [tokenize8
                                                                                                          id
                                                                                                          sat_skj0];
                                                                                                };
                                                                                                };
                                                                                                };
                                                                                                };
                                                                                                };
                                                                                                };
                                                                                          };
                                                                                } in 
                                                                                  case
                                                                                      ds222_skhZ<TagProper>
                                                                                  of
                                                                                  { False ->
                                                                                        case
                                                                                            ds224_ski1<TagProper>
                                                                                        of
                                                                                        { [] ->
                                                                                              $w$j2_skix
                                                                                                  bx27_ski8;
                                                                                          : _ [Occ=Dead]
                                                                                            _ [Occ=Dead] ->
                                                                                              $w$j2_skix
                                                                                                  bx23_ski4;
                                                                                        };
                                                                                    True ->
                                                                                        $w$j2_skix
                                                                                            -1#;
                                                                                  };
                                                                      } in 
                                                                        case
                                                                            ds222_skhZ<TagProper>
                                                                        of
                                                                        { False ->
                                                                              case
                                                                                  ds224_ski1<TagProper>
                                                                              of
                                                                              { [] ->
                                                                                    $w$j1_skiv
                                                                                        bx26_ski7;
                                                                                : _ [Occ=Dead]
                                                                                  _ [Occ=Dead] ->
                                                                                    $w$j1_skiv
                                                                                        bx22_ski3;
                                                                              };
                                                                          True -> $w$j1_skiv -1#;
                                                                        };
                                                            } in 
                                                              case ds222_skhZ<TagProper> of {
                                                                False ->
                                                                    case ds224_ski1<TagProper> of {
                                                                      [] -> $w$j_skit bx25_ski6;
                                                                      : _ [Occ=Dead] _ [Occ=Dead] ->
                                                                          $w$j_skit bx21_ski2;
                                                                    };
                                                                True -> $w$j_skit -1#;
                                                              };
                                                  } in 
                                                    case ds222_skhZ<TagProper> of {
                                                      False ->
                                                          let {
                                                            sat_skje [Occ=Once1] :: [Char]
                                                            [LclId] =
                                                                CCCS :! [wild6_skin ds224_ski1];
                                                          } in  $j1_skir sat_skje;
                                                      True -> $j1_skir [];
                                                    };
                                        } in 
                                          case x_skio<TagProper> of {
                                            __DEFAULT -> $j_skip False;
                                            '\65279'# -> $j_skip ds223_ski0;
                                          };
                                        };
                                  };
                                  };
                            };
                            };
                  } in 
                    case bx10_skhC<TagProper> of ds219_skjg [Occ=Once1] {
                      __DEFAULT ->
                          case -# [ds219_skjg 1#] of sat_skjh [Occ=Once1] {
                          __DEFAULT ->
                          let {
                            sat_skji [Occ=Once1] :: State
                            [LclId] =
                                CCCS State! [ds211_skhA
                                             ds212_skhB
                                             sat_skjh
                                             ds213_skhD
                                             ds214_skhE
                                             ds215_skhF
                                             ds216_skhG
                                             bx11_skhH
                                             bx12_skhI
                                             bx13_skhJ
                                             bx14_skhK
                                             bx15_skhL
                                             bx16_skhM
                                             bx17_skhN
                                             bx18_skhO
                                             ds217_skhP
                                             bx19_skhQ
                                             ds218_skhR];
                          } in  consumeNextIf_skhS sat_skji;
                          };
                      -1# -> consumeNextIf_skhS wild1_skhz;
                      0# -> (#,,#) [lvl67_rjM1 id wild1_skhz];
                    };
                  };
        } in 
          case ds205_skhi<TagProper> of {
            Nothing -> limitedNextIf_skhx wild_skhe;
            Just parser6_skjk [Occ=Once1] ->
                let {
                  sat_skjm [Occ=Once1] :: State
                  [LclId] =
                      CCCS State! [ds203_skhf
                                   ds204_skhg
                                   bx_skhh
                                   Nothing
                                   True
                                   ds207_skhk
                                   ds208_skhl
                                   bx1_skhm
                                   bx2_skhn
                                   bx3_skho
                                   bx4_skhp
                                   bx5_skhq
                                   bx6_skhr
                                   bx7_skhs
                                   bx8_skht
                                   ds209_skhu
                                   bx9_skhv
                                   ds210_skhw]; } in
                let {
                  sat_skjl [Occ=Once1] :: State
                  [LclId] =
                      CCCS State! [ds203_skhf
                                   ds204_skhg
                                   bx_skhh
                                   Nothing
                                   ds206_skhj
                                   ds207_skhk
                                   ds208_skhl
                                   bx1_skhm
                                   bx2_skhn
                                   bx3_skho
                                   bx4_skhp
                                   bx5_skhq
                                   bx6_skhr
                                   bx7_skhs
                                   bx8_skht
                                   ds209_skhu
                                   bx9_skhv
                                   ds210_skhw];
                } in 
                  case
                      case parser6_skjk of parser6_tnCe {
                      __DEFAULT ->
                      $wrejectParser sat_skjl lvl64_rjLY parser6_tnCe sat_skjm;
                      }
                  of
                  wild2_skjn [Occ=Once1]
                  {
                  (#,,#) ww_skjo [Occ=Once1!] _ [Occ=Dead] _ [Occ=Dead] ->
                  case ww_skjo<TagProper> of {
                    Failed _ [Occ=Dead] -> (#,,#) [ww_skjo ww1_skjp ww2_skjq];
                    Result _ [Occ=Dead] -> limitedNextIf_skhx wild_skhe;
                    More _ [Occ=Dead] -> lvl61_rjLV;
                  };
                  };
          };
        };
$sprefixErrorWith
Rec {
$sprefixErrorWith_rjNu
  :: Parser (Int, Chomp) -> Pattern -> State -> Reply (Int, Chomp)
[GblId,
 Arity=3,
 Str=<1C1(P(SL,L,L,L))><LCL(P(1L,L,L,L))><L>,
 Unf=OtherCon []] =
    {} \r [pattern_skw4 prefix_skw5 state_skw6]
        case pattern_skw4 state_skw6 of wild_skw7 [Occ=Once1] {
        Reply ds203_skw8 [Occ=Once1!]
              ds204_skw9 [Occ=Once2]
              ds205_skwa [Occ=Once2]
              ds206_skwb [Occ=Once2] ->
        case ds203_skw8<TagProper> of wild1_skwc [Occ=OnceL1] {
          Failed _ [Occ=Dead] ->
              let {
                right_skwe [Occ=OnceL1] :: State -> Reply (Int, Chomp)
                [LclId, Arity=1, Str=<1L>, Unf=OtherCon []] =
                    {wild1_skwc} \r [state1_skwf]
                        case state1_skwf of conrep_skwg [Occ=Once1] {
                        State _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead]
                              _ [Occ=Dead] ->
                        Reply [wild1_skwc id Nothing conrep_skwg];
                        }; } in
              let {
                sat_skwF [Occ=Once1] :: Parser (Int, Chomp)
                [LclId] =
                    {prefix_skw5, right_skwe} \r [state1_skwz]
                        case
                            case prefix_skw5 of prefix_tnD8 {
                            __DEFAULT -> $w$c*> prefix_tnD8 right_skwe state1_skwz;
                            }
                        of
                        {
                        (#,,,#) ww_skwB [Occ=Once1]
                                ww1_skwC [Occ=Once1]
                                ww2_skwD [Occ=Once1]
                                ww3_skwE [Occ=Once1] ->
                        Reply [ww_skwB ww1_skwC ww2_skwD ww3_skwE];
                        }; } in
              let {
                sat_skwG [Occ=Once1] :: Result (Int, Chomp)
                [LclId] =
                    CCCS More! [sat_skwF];
              } in  Reply [sat_skwG ds204_skw9 ds205_skwa ds206_skwb];
          Result _ [Occ=Dead] -> wild_skw7;
          More more_skwI [Occ=OnceL1] ->
              let {
                sat_skwJ [Occ=Once1] :: Parser (Int, Chomp)
                [LclId] =
                    {more_skwI, prefix_skw5} \r [eta_B0]
                        $sprefixErrorWith_rjNu more_skwI prefix_skw5 eta_B0; } in
              let {
                sat_skwK [Occ=Once1] :: Result (Int, Chomp)
                [LclId] =
                    CCCS More! [sat_skwJ];
              } in  Reply [sat_skwK ds204_skw9 ds205_skwa ds206_skwb];
        };
        };
end Rec }

@sjakobi
Collaborator

sjakobi commented May 25, 2022

I think some allocations could be avoided by turning Reply and Result into unlifted, unboxed data structures. State is probably allocated with similar frequency, but since it has so many fields, making it unlifted and unboxed might result in too much register pressure.

-- | The internal parser state. We don't bother with parameterising it with a
-- \"UserState\", we just bundle the generic and specific fields together (not
-- that it is that easy to draw the line - is @sLine@ generic or specific?).
data State = State {
    sEncoding        :: !Encoding,        -- ^ The input UTF encoding.
    sDecision        :: !Decision,        -- ^ Current decision name.
    sLimit           :: !Int,             -- ^ Lookahead characters limit.
    sForbidden       :: !(Maybe Pattern), -- ^ Pattern we must not enter into.
    sIsPeek          :: !Bool,            -- ^ Disables token generation.
    sIsSol           :: !Bool,            -- ^ Is at start of line?
    sChars           :: ![Char],          -- ^ (Reversed) characters collected for a token.
    sCharsByteOffset :: !Int,             -- ^ Byte offset of first collected character.
    sCharsCharOffset :: !Int,             -- ^ Char offset of first collected character.
    sCharsLine       :: !Int,             -- ^ Line of first collected character.
    sCharsLineChar   :: !Int,             -- ^ Character in line of first collected character.
    sByteOffset      :: !Int,             -- ^ Offset in bytes in the input.
    sCharOffset      :: !Int,             -- ^ Offset in characters in the input.
    sLine            :: !Int,             -- ^ Builds on YAML's line break definition.
    sLineChar        :: !Int,             -- ^ Character number in line.
    sCode            :: !Code,            -- ^ Of token we are collecting chars for.
    sLast            :: !Char,            -- ^ Last matched character.
    sInput           :: ![(Int, Char)]    -- ^ The decoded input characters.
  }

It might be helpful to compress some of these fields into a bit field.
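As a rough illustration of the bit-field idea, here is a minimal sketch that packs the two `Bool` fields (`sIsPeek`, `sIsSol`) into a single `Word8`. The `Flags` type, the bit positions, and the helper names are all assumptions made for this example; none of them exist in HsYAML, and whether the saving is measurable would need profiling.

```haskell
module Main (main) where

import Data.Bits (clearBit, setBit, testBit)
import Data.Word (Word8)

-- Hypothetical packed replacement for the sIsPeek and sIsSol fields.
-- Stored in State as {-# UNPACK #-} !Flags, it would occupy one word
-- instead of two pointer-sized Bool fields.
newtype Flags = Flags Word8
  deriving (Eq, Show)

-- Bit positions are arbitrary choices for this sketch.
isPeekBit, isSolBit :: Int
isPeekBit = 0
isSolBit  = 1

-- All flags cleared.
emptyFlags :: Flags
emptyFlags = Flags 0

-- Set or clear one flag, leaving the others untouched.
setFlag :: Int -> Bool -> Flags -> Flags
setFlag bit True  (Flags w) = Flags (setBit w bit)
setFlag bit False (Flags w) = Flags (clearBit w bit)

-- Read one flag.
getFlag :: Int -> Flags -> Bool
getFlag bit (Flags w) = testBit w bit

main :: IO ()
main = do
  let fs = setFlag isSolBit True emptyFlags
  print (getFlag isSolBit fs)   -- True
  print (getFlag isPeekBit fs)  -- False
```

Small enumerations such as `Encoding` or `Decision` could in principle be folded into the same word by reserving a few bits each, at the cost of explicit encode and decode steps at every access.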

I wonder how far we can get by tweaking the existing code though. It's clear that the priority has been to fully comply with the YAML spec. Getting good performance out of the same code might be rather tricky.
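To make the suggestion about Reply and Result a bit more concrete, here is a toy sketch of a parser whose reply is an unboxed tuple containing an unboxed sum, using GHC's `UnboxedTuples` and `UnboxedSums` extensions. It is not HsYAML's actual design: `ResultU`, `Parser`, `satisfy`, `andThen`, and `parse` are hypothetical names, the input is a plain `String`, and the `More` continuation case of the real `Result` is left out entirely (a lazily stored continuation is exactly the part that would be harder to unbox).

```haskell
{-# LANGUAGE UnboxedSums #-}
{-# LANGUAGE UnboxedTuples #-}
module Main (main) where

-- Hypothetical stand-in for Result: an unboxed sum of an error message
-- or a parsed value, so no Failed/Result constructor is heap-allocated.
type ResultU a = (# String | a #)

-- Hypothetical stand-in for Reply: the result and the remaining input
-- are returned together in an unboxed tuple instead of a boxed record.
newtype Parser a = Parser (String -> (# ResultU a, String #))

-- Accept one character that satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser (\input -> case input of
  (c : rest) | p c -> (# (# | c #), rest #)
  _                -> (# (# "unexpected input" | #), input #))

-- Run two parsers in sequence and pair their results.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen (Parser pa) (Parser pb) = Parser (\input ->
  case pa input of
    (# (# err | #), rest #) -> (# (# err | #), rest #)
    (# (# | a #), rest #)   -> case pb rest of
      (# (# err | #), rest' #) -> (# (# err | #), rest' #)
      (# (# | b #), rest' #)   -> (# (# | (a, b) #), rest' #))

-- Convert to an ordinary boxed value only at the API boundary.
parse :: Parser a -> String -> Either String (a, String)
parse (Parser p) input = case p input of
  (# (# err | #), _ #)  -> Left err
  (# (# | a #), rest #) -> Right (a, rest)

main :: IO ()
main = do
  let twoChars = satisfy (== 'a') `andThen` satisfy (== 'b')
  print (parse twoChars "abc")
  print (parse twoChars "xyz")
```

The STG dump above shows GHC's worker/wrapper transformation already producing unboxed tuples such as `(#,,,#)` in some workers and then re-boxing them into `Reply`; the point of a sketch like this would be to avoid that re-boxing step by making the unboxed form the primary representation.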
