Copyright | (c) 2010 - 2011 Simon Meier |
---|---|
License | BSD3-style (see LICENSE) |
Maintainer | Simon Meier <iridcode@gmail.com> |
Stability | unstable, private |
Portability | GHC |
Safe Haskell | Unsafe |
Language | Haskell98 |
- Warning:* this module is internal. If you find that you need it then please contact the maintainers and explain what you are trying to do and discuss what you would need in the public API. It is important that you do this as the module may not be exposed at all in future releases.
Core types and functions for the
Builder
monoid and its generalization,
the
Put
monad.
The design of the
Builder
monoid is optimized such that
- buffers of arbitrary size can be filled as efficiently as possible and
-
sequencing of
Builder
s is as cheap as possible.
We achieve (1) by completely handing over control over writing to the buffer
to the
BuildStep
implementing the
Builder
. This
BuildStep
is just told
the start and the end of the buffer (represented as a
BufferRange
). Then,
the
BuildStep
can write to as big a prefix of this
BufferRange
in any
way it desires. If the
BuildStep
is done, the
BufferRange
is full, or a
long sequence of bytes should be inserted directly, then the
BuildStep
signals this to its caller using a
BuildSignal
.
We achieve (2) by requiring that every
Builder
is implemented by a
BuildStep
that takes a continuation
BuildStep
, which it calls with the
updated
BufferRange
after it is done. Therefore, only two pointers have
to be passed in a function call to implement concatenation of
Builder
s.
Moreover, many
Builder
s are completely inlined, which enables the compiler
to sequence them without a function call and with no boxing at all.
This design gives the implementation of a
Builder
full access to the
IO
monad. Therefore, utmost care has to be taken to not overwrite anything
outside the given
BufferRange
s. Moreover, further care has to be taken to
ensure that
Builder
s and
Put
s are referentially transparent. See the
comments of the
builder
and
put
functions for further information.
Note that there are
no safety belts
at all, when implementing a
Builder
using an
IO
action: you are writing code that might enable the next
buffer-overflow attack on a Haskell server!
Synopsis
- data Buffer = Buffer !( ForeignPtr Word8 ) ! BufferRange
- data BufferRange = BufferRange !( Ptr Word8 ) !( Ptr Word8 )
- newBuffer :: Int -> IO Buffer
- bufferSize :: Buffer -> Int
- byteStringFromBuffer :: Buffer -> ByteString
-
data
ChunkIOStream
a
- = Finished Buffer a
- | Yield1 ByteString ( IO ( ChunkIOStream a))
- buildStepToCIOS :: AllocationStrategy -> BuildStep a -> IO ( ChunkIOStream a)
- ciosUnitToLazyByteString :: AllocationStrategy -> ByteString -> ChunkIOStream () -> ByteString
- ciosToLazyByteString :: AllocationStrategy -> (a -> (b, ByteString )) -> ChunkIOStream a -> (b, ByteString )
- data BuildSignal a
- type BuildStep a = BufferRange -> IO ( BuildSignal a)
- finalBuildStep :: BuildStep ()
- done :: Ptr Word8 -> a -> BuildSignal a
- bufferFull :: Int -> Ptr Word8 -> BuildStep a -> BuildSignal a
- insertChunk :: Ptr Word8 -> ByteString -> BuildStep a -> BuildSignal a
- fillWithBuildStep :: BuildStep a -> ( Ptr Word8 -> a -> IO b) -> ( Ptr Word8 -> Int -> BuildStep a -> IO b) -> ( Ptr Word8 -> ByteString -> BuildStep a -> IO b) -> BufferRange -> IO b
- data Builder
- builder :: ( forall r. BuildStep r -> BuildStep r) -> Builder
- runBuilder :: Builder -> BuildStep ()
- runBuilderWith :: Builder -> BuildStep a -> BuildStep a
- empty :: Builder
- append :: Builder -> Builder -> Builder
- flush :: Builder
- ensureFree :: Int -> Builder
- byteStringCopy :: ByteString -> Builder
- byteStringInsert :: ByteString -> Builder
- byteStringThreshold :: Int -> ByteString -> Builder
- lazyByteStringCopy :: ByteString -> Builder
- lazyByteStringInsert :: ByteString -> Builder
- lazyByteStringThreshold :: Int -> ByteString -> Builder
- shortByteString :: ShortByteString -> Builder
- maximalCopySize :: Int
- byteString :: ByteString -> Builder
- lazyByteString :: ByteString -> Builder
- toLazyByteStringWith :: AllocationStrategy -> ByteString -> Builder -> ByteString
- data AllocationStrategy
- safeStrategy :: Int -> Int -> AllocationStrategy
- untrimmedStrategy :: Int -> Int -> AllocationStrategy
- customStrategy :: ( Maybe ( Buffer , Int ) -> IO Buffer ) -> Int -> ( Int -> Int -> Bool ) -> AllocationStrategy
- smallChunkSize :: Int
- defaultChunkSize :: Int
- chunkOverhead :: Int
- data Put a
- put :: ( forall r. (a -> BuildStep r) -> BuildStep r) -> Put a
- runPut :: Put a -> BuildStep a
- putToLazyByteString :: Put a -> (a, ByteString )
- putToLazyByteStringWith :: AllocationStrategy -> (a -> (b, ByteString )) -> Put a -> (b, ByteString )
- hPut :: forall a. Handle -> Put a -> IO a
- putBuilder :: Builder -> Put ()
- fromPut :: Put () -> Builder
Buffer management
A
Buffer
together with the
BufferRange
of free bytes. The filled
space starts at offset 0 and ends at the first free byte.
Buffer !( ForeignPtr Word8 ) ! BufferRange |
data BufferRange Source #
A range of bytes in a buffer represented by the pointer to the first byte of the range and the pointer to the first byte after the range.
BufferRange !( Ptr Word8 ) !( Ptr Word8 ) |
bufferSize :: Buffer -> Int Source #
Combined size of the filled and free space in the buffer.
byteStringFromBuffer :: Buffer -> ByteString Source #
Convert the filled part of a
Buffer
to a strict
ByteString
.
data ChunkIOStream a Source #
A stream of chunks that are constructed in the
IO
monad.
This datatype serves as the common interface for the buffer-by-buffer
execution of a
BuildStep
by
buildStepToCIOS
. Typical users of this
interface are
ciosToLazyByteString
or iteratee-style libraries like
enumerator
.
Finished Buffer a |
The partially filled last buffer together with the result. |
Yield1 ByteString ( IO ( ChunkIOStream a)) |
Yield a
non-empty
strict
|
:: AllocationStrategy |
Buffer allocation strategy to use |
-> BuildStep a |
|
-> IO ( ChunkIOStream a) |
Convert a
BuildStep
to a
ChunkIOStream
stream by executing it on
Buffer
s allocated according to the given
AllocationStrategy
.
ciosUnitToLazyByteString :: AllocationStrategy -> ByteString -> ChunkIOStream () -> ByteString Source #
Convert a
to a lazy
ChunkIOStream
()
ByteString
using
unsafeDupablePerformIO
.
ciosToLazyByteString :: AllocationStrategy -> (a -> (b, ByteString )) -> ChunkIOStream a -> (b, ByteString ) Source #
Convert a
ChunkIOStream
to a lazy tuple of the result and the written
ByteString
using
unsafeDupablePerformIO
.
Build signals and steps
data BuildSignal a Source #
BuildSignal
s abstract signals to the caller of a
BuildStep
. There are
three signals:
done
,
bufferFull
, or 'insertChunks signals
type BuildStep a = BufferRange -> IO ( BuildSignal a) Source #
BuildStep
s may be called *multiple times* and they must not rise an
async. exception.
finalBuildStep :: BuildStep () Source #
The final build step that returns the
done
signal.
:: Ptr Word8 |
Next free byte in current
|
-> a |
Computed value |
-> BuildSignal a |
Signal that the current
BuildStep
is done and has computed a value.
:: Int |
Minimal size of next
|
-> Ptr Word8 |
Next free byte in current
|
-> BuildStep a |
|
-> BuildSignal a |
Signal that the current buffer is full.
:: Ptr Word8 |
Next free byte in current
|
-> ByteString |
Chunk to insert. |
-> BuildStep a |
|
-> BuildSignal a |
Signal that a
ByteString
chunk should be inserted directly.
:: BuildStep a |
Build step to use for filling the
|
-> ( Ptr Word8 -> a -> IO b) |
Handling the
|
-> ( Ptr Word8 -> Int -> BuildStep a -> IO b) |
Handling the
|
-> ( Ptr Word8 -> ByteString -> BuildStep a -> IO b) |
Handling the
|
-> BufferRange |
Buffer range to fill. |
-> IO b |
Value computed while filling this
|
Fill a
BufferRange
using a
BuildStep
.
The Builder monoid
Builder
s denote sequences of bytes.
They are
Monoid
s where
mempty
is the zero-length sequence and
mappend
is concatenation, which runs in
O(1)
.
:: ( forall r. BuildStep r -> BuildStep r) |
A function that fills a
This function must be referentially transparent; i.e., calling it
multiple times with equally sized
|
-> Builder |
:: Builder |
|
-> BuildStep () |
|
Run a
Builder
with the
finalBuildStep
.
Run a
Builder
.
Primitive combinators
ensureFree :: Int -> Builder Source #
ensures that there are at least
ensureFree
n
n
free bytes
for the following
Builder
.
byteStringCopy :: ByteString -> Builder Source #
Construct a
Builder
that copies the strict
ByteString
.
Use this function to create
Builder
s from smallish (
<= 4kb
)
ByteString
s or if you need to guarantee that the
ByteString
is not
shared with the chunks generated by the
Builder
.
byteStringInsert :: ByteString -> Builder Source #
Construct a
Builder
that always inserts the strict
ByteString
directly as a chunk.
This implies flushing the output buffer, even if it contains just
a single byte. You should therefore use
byteStringInsert
only for large
(
> 8kb
)
ByteString
s. Otherwise, the generated chunks are too
fragmented to be processed efficiently afterwards.
byteStringThreshold :: Int -> ByteString -> Builder Source #
Construct a
Builder
that copies the strict
ByteString
s, if it is
smaller than the treshold, and inserts it directly otherwise.
For example,
byteStringThreshold 1024
copies strict
ByteString
s whose size
is less or equal to 1kb, and inserts them directly otherwise. This implies
that the average chunk-size of the generated lazy
ByteString
may be as
low as 513 bytes, as there could always be just a single byte between the
directly inserted 1025 byte, strict
ByteString
s.
lazyByteStringCopy :: ByteString -> Builder Source #
Construct a
Builder
that copies the lazy
ByteString
.
lazyByteStringInsert :: ByteString -> Builder Source #
Construct a
Builder
that inserts all chunks of the lazy
ByteString
directly.
lazyByteStringThreshold :: Int -> ByteString -> Builder Source #
Construct a
Builder
that uses the thresholding strategy of
byteStringThreshold
for each chunk of the lazy
ByteString
.
shortByteString :: ShortByteString -> Builder Source #
Construct a
Builder
that copies the
ShortByteString
.
maximalCopySize :: Int Source #
The maximal size of a
ByteString
that is copied.
2 *
to guarantee that on average a chunk is of
smallChunkSize
smallChunkSize
.
byteString :: ByteString -> Builder Source #
Create a
Builder
denoting the same sequence of bytes as a strict
ByteString
.
The
Builder
inserts large
ByteString
s directly, but copies small ones
to ensure that the generated chunks are large on average.
lazyByteString :: ByteString -> Builder Source #
Create a
Builder
denoting the same sequence of bytes as a lazy
ByteString
.
The
Builder
inserts large chunks of the lazy
ByteString
directly,
but copies small ones to ensure that the generated chunks are large on
average.
Execution
:: AllocationStrategy |
Buffer allocation strategy to use |
-> ByteString |
Lazy
|
-> Builder |
|
-> ByteString |
Resulting lazy
|
Heavy inlining.
Execute a
Builder
with custom execution parameters.
This function is inlined despite its heavy code-size to allow fusing with
the allocation strategy. For example, the default
Builder
execution
function
toLazyByteString
is defined as follows.
{-# NOINLINE toLazyByteString #-} toLazyByteString = toLazyByteStringWith (safeStrategy
smallChunkSize
defaultChunkSize
) L.empty
where
L.empty
is the zero-length lazy
ByteString
.
In most cases, the parameters used by
toLazyByteString
give good
performance. A sub-performing case of
toLazyByteString
is executing short
(<128 bytes)
Builder
s. In this case, the allocation overhead for the first
4kb buffer and the trimming cost dominate the cost of executing the
Builder
. You can avoid this problem using
toLazyByteStringWith (safeStrategy 128 smallChunkSize) L.empty
This reduces the allocation and trimming overhead, as all generated
ByteString
s fit into the first buffer and there is no trimming
required, if more than 64 bytes and less than 128 bytes are written.
data AllocationStrategy Source #
A buffer allocation strategy for executing
Builder
s.
:: Int |
Size of first buffer |
-> Int |
Size of successive buffers |
-> AllocationStrategy |
An allocation strategy that guarantees that at least half of the allocated memory is used for live data |
Use this strategy for generating lazy
ByteString
s whose chunks are
likely to survive one garbage collection. This strategy trims buffers
that are filled less than half in order to avoid spilling too much memory.
:: Int |
Size of the first buffer |
-> Int |
Size of successive buffers |
-> AllocationStrategy |
An allocation strategy that does not trim any of the filled buffers before converting it to a chunk |
Use this strategy for generating lazy
ByteString
s whose chunks are
discarded right after they are generated. For example, if you just generate
them to write them to a network socket.
:: ( Maybe ( Buffer , Int ) -> IO Buffer ) |
Buffer allocation function. If
|
-> Int |
Default buffer size. |
-> ( Int -> Int -> Bool ) |
A predicate
|
-> AllocationStrategy |
Create a custom allocation strategy. See the code for
safeStrategy
and
untrimmedStrategy
for examples.
smallChunkSize :: Int Source #
The recommended chunk size. Currently set to 4k, less the memory management overhead
defaultChunkSize :: Int Source #
The chunk size used for I/O. Currently set to 32k, less the memory management overhead
chunkOverhead :: Int Source #
The memory management overhead. Currently this is tuned for GHC only.
The Put monad
A
Put
action denotes a computation of a value that writes a stream of
bytes as a side-effect.
Put
s are strict in their side-effect; i.e., the
stream of bytes will always be written before the computed value is
returned.
Put
s are a generalization of
Builder
s. The typical use case is the
implementation of an encoding that might fail (e.g., an interface to the
zlib
compression library or the conversion from Base64 encoded data to
8-bit data). For a
Builder
, the only way to handle and report such a
failure is ignore it or call
error
. In contrast,
Put
actions are
expressive enough to allow reportng and handling such a failure in a pure
fashion.
actions are isomorphic to
Put
()
Builder
s. The functions
putBuilder
and
fromPut
convert between these two types. Where possible, you should
use
Builder
s, as sequencing them is slightly cheaper than sequencing
Put
s because they do not carry around a computed value.
:: ( forall r. (a -> BuildStep r) -> BuildStep r) |
A function that fills a
This function must be referentially transparent; i.e., calling it
multiple times with equally sized
|
-> Put a |
:: Put a |
Put to run |
-> BuildStep a |
|
Run a
Put
.
Execution
:: Put a |
|
-> (a, ByteString ) |
Result and lazy
|
Execute a
Put
and return the computed result and the bytes
written during the computation as a lazy
ByteString
.
This function is strict in the computed result and lazy in the writing of the bytes. For example, given
infinitePut = sequence_ (repeat (putBuilder (word8 1))) >> return 0
evaluating the expression
fst $ putToLazyByteString infinitePut
does not terminate, while evaluating the expression
L.head $ snd $ putToLazyByteString infinitePut
does terminate and yields the value
1 :: Word8
.
An illustrative example for these strictness properties is the implementation of Base64 decoding ( http://en.wikipedia.org/wiki/Base64 ).
type DecodingState = ... decodeBase64 ::ByteString
-> DecodingState ->Put
(Maybe DecodingState) decodeBase64 = ...
The above function takes a strict
ByteString
supposed to represent
Base64 encoded data and the current decoding state.
It writes the decoded bytes as the side-effect of the
Put
and returns the
new decoding state, if the decoding of all data in the
ByteString
was
successful. The checking if the strict
ByteString
represents Base64
encoded data and the actual decoding are fused. This makes the common case,
where all data represents Base64 encoded data, more efficient. It also
implies that all data must be decoded before the final decoding
state can be returned.
Put
s are intended for implementing such fused
checking and decoding/encoding, which is reflected in their strictness
properties.
putToLazyByteStringWith Source #
:: AllocationStrategy |
Buffer allocation strategy to use |
-> (a -> (b, ByteString )) |
Continuation to use for computing the final result and the tail of its side-effect (the written bytes). |
-> Put a |
|
-> (b, ByteString ) |
Resulting lazy
|
Execute a
Put
with a buffer-allocation strategy and a continuation. For
example,
putToLazyByteString
is implemented as follows.
putToLazyByteString =putToLazyByteStringWith
(safeStrategy
smallChunkSize
defaultChunkSize
) (x -> (x, L.empty))