Tuesday, October 19, 2010

Rough organizational overview.

The exact details are constantly changing but here's a rough overview of the LHC pipeline.
  1. External Core.
    We've designed our compiler to use GHC as its frontend. This means that GHC will handle the parsing and type-checking of the Haskell code in addition to some of the optimization (GHC particularly excels at high-level local optimizations). LHC benefits greatly by automatically supporting many of the Haskell extensions offered by GHC.
    Notable characteristics: Non-strict, local functions, complex let-bindings. Pretty much just Haskell code with zero syntactic sugar.
    Example snippet:

    base:Data.Either.$fShowEither :: ghc-prim:GHC.Types.Int =
    ghc-prim:GHC.Types.I# (11::ghc-prim:GHC.Prim.Int#);

  2. Simple Core.
    Since External Core isn't immediately ready to be processed into GRIN code, we first translate it to Simple Core by removing or simplifying out a couple of features. The most noticeable feature of External Core is locally scoped functions which simply does not fit in with the GRIN model. When translating to Simple Core, we hoist out all local functions to the top-level.
    Notable characteristics: Non-strict, no local functions, simplified let-bindings.
  3. Grin Stage 1.
    Let me start by introducing GRIN: GRIN (Graph Reduction Intermediate Notation) is a first order, strict, (somewhat) functional language.
    The purpose of this first stage of grin code is to encode the laziness explicitly. It turns out that you can translate a lazy language (like Simple Core) to a strict language (like GRIN) using only two primitives: Eval and apply. The 'eval' primitives takes a closure, evaluates it if need be and returns the resulting object. The 'apply' primitives simply adds an argument to a closure. Haskell compilers such as GHC, JHC and UHC all use this model for implementing laziness.
    Notable characteristics: Strict, explicit laziness, opaque closures.
    Example snippet:

    base:Foreign.C.Types.@lifted_exp@ w ws =
    do x2508 <- @eval ws
    case x2508 of
    (Cbase:GHC.Int.I32# x#)
    -> do x2510 <- unit 11
    base:GHC.Show.$wshowSignedInt x2510 x# w

  4. Grin Stage 2.
    At the time of writing, each of the mentioned compilers stop at the previous stage (or at what would be their equivalent of that stage).[1] LHC follows in the footsteps of the original GRIN compiler and applies a global control-flow analysis to eliminate/inline all eval/apply primitives. In the end, a lazy/suspended function taking, say, two arguments simply becomes a data constructor with two fields.
    Notable characteristics: Strict, transparent closures.
    Example snippet:

    base:Foreign.Marshal.Utils.toBool1_caf =
    do [x2422] <- constant 0
    [x2423] <- @realWorld#
    [x2424 x2425] <- (foreign lhc_mp_from_int) x2422 x2423
    [x2426] <- constant Cinteger-gmp:GHC.Integer.Type.Integer
    unit [x2426 x2425]

  5. Grin Stage 3.
    Things are starting to get fairly low-level already at stage 2. However, stage 2 is still a bit too high-level for some optimizations to be easily implemented. Stage 3 breaks the code into smaller blocks that can easily be moved, inlined and short-circuited. The code is now sufficiently low-level that it can be pretty-printed as C.
    Notable characteristics: Functions are broken down to functional units. Otherwise same as stage 2.
    Example snippet:

    base:GHC.IO.Encoding.Iconv.@lifted@_lvl60swYU38 rb3 rb4 =
    do [x21578] <- @-# rb4 rb3
    case x21578 of
    0 -> constant Cghc-prim:GHC.Bool.False
    () -> constant Cghc-prim:GHC.Bool.True

  6. Grin--.
    Grin-- is the latest addition to the heap and not much is known about it for certain. It is even up for debate whether it belongs to the GRIN family at all since it diverge from the SSA style.
    The purpose of Grin-- is to provide a vessel for expressing stack operations.
    Notable characteristics: Operates on global virtual registers, enables explicit stack management.
    Example snippet:

    base:GHC.IO.Encoding.Iconv.@lifted@_lvl60swYU38:
    do x21578 := -# rb4 rb3
    case x21578 of
    0 -> do x88175 := Cghc-prim:GHC.Bool.False
    ret
    () -> do x88175 := Cghc-prim:GHC.Bool.True
    ret


Feel free to ask if you have any questions on the how and why of LHC.

[1] UHC does have the mechanics for lowering the eval/apply primitives but it is not enabled by default.

2 comments: