License | BSD-style |
---|---|
Maintainer | Vincent Hanquez <vincent@snarc.org> |
Stability | experimental |
Portability | portable |
Safe Haskell | None |
Language | Haskell2010 |
Opaque packed String encoded in UTF8.
The type is an instance of IsString and IsList, which allow OverloadedStrings
for string literal, and
fromList
to convert a [Char] (Prelude String) to a packed
representation
{-# LANGUAGE OverloadedStrings #-} s = "Hello World" :: String
s = fromList ("Hello World" :: Prelude.String) :: String
Each unicode code point is represented by a variable encoding of 1 to 4 bytes,
For more information about UTF8: https://en.wikipedia.org/wiki/UTF-8
Synopsis
- data String
-
data
Encoding
- = ASCII7
- | UTF8
- | UTF16
- | UTF32
- | ISO_8859_1
- fromBytes :: Encoding -> UArray Word8 -> ( String , Maybe ValidationFailure , UArray Word8 )
- fromBytesLenient :: UArray Word8 -> ( String , UArray Word8 )
- fromBytesUnsafe :: UArray Word8 -> String
- toBytes :: Encoding -> String -> UArray Word8
- data ValidationFailure
- lines :: String -> [ String ]
- words :: String -> [ String ]
- upper :: String -> String
- lower :: String -> String
- replace :: String -> String -> String -> String
- indices :: String -> String -> [ Offset8 ]
- toBase64 :: String -> String
- toBase64URL :: Bool -> String -> String
- toBase64OpenBSD :: String -> String
- breakLine :: String -> Either Bool ( String , String )
Documentation
Opaque packed array of characters in the UTF8 encoding
Instances
Various String Encoding that can be use to convert to and from bytes
Instances
Bounded Encoding | |
Enum Encoding | |
Defined in Basement.String succ :: Encoding -> Encoding Source # pred :: Encoding -> Encoding Source # toEnum :: Int -> Encoding Source # fromEnum :: Encoding -> Int Source # enumFrom :: Encoding -> [ Encoding ] Source # enumFromThen :: Encoding -> Encoding -> [ Encoding ] Source # enumFromTo :: Encoding -> Encoding -> [ Encoding ] Source # enumFromThenTo :: Encoding -> Encoding -> Encoding -> [ Encoding ] Source # |
|
Eq Encoding | |
Data Encoding | |
Defined in Basement.String gfoldl :: ( forall d b. Data d => c (d -> b) -> d -> c b) -> ( forall g. g -> c g) -> Encoding -> c Encoding Source # gunfold :: ( forall b r. Data b => c (b -> r) -> c r) -> ( forall r. r -> c r) -> Constr -> c Encoding Source # toConstr :: Encoding -> Constr Source # dataTypeOf :: Encoding -> DataType Source # dataCast1 :: Typeable t => ( forall d. Data d => c (t d)) -> Maybe (c Encoding ) Source # dataCast2 :: Typeable t => ( forall d e. ( Data d, Data e) => c (t d e)) -> Maybe (c Encoding ) Source # gmapT :: ( forall b. Data b => b -> b) -> Encoding -> Encoding Source # gmapQl :: (r -> r' -> r) -> r -> ( forall d. Data d => d -> r') -> Encoding -> r Source # gmapQr :: forall r r'. (r' -> r -> r) -> r -> ( forall d. Data d => d -> r') -> Encoding -> r Source # gmapQ :: ( forall d. Data d => d -> u) -> Encoding -> [u] Source # gmapQi :: Int -> ( forall d. Data d => d -> u) -> Encoding -> u Source # gmapM :: Monad m => ( forall d. Data d => d -> m d) -> Encoding -> m Encoding Source # gmapMp :: MonadPlus m => ( forall d. Data d => d -> m d) -> Encoding -> m Encoding Source # gmapMo :: MonadPlus m => ( forall d. Data d => d -> m d) -> Encoding -> m Encoding Source # |
|
Ord Encoding | |
Defined in Basement.String |
|
Show Encoding | |
fromBytes :: Encoding -> UArray Word8 -> ( String , Maybe ValidationFailure , UArray Word8 ) Source #
Convert a ByteArray to a string assuming a specific encoding.
It returns a 3-tuple of:
- The string that has been succesfully converted without any error
- An optional validation error
- The remaining buffer that hasn't been processed (either as a result of an error, or because the encoded sequence is not fully available)
Considering a stream of data that is fetched chunk by chunk, it's valid to assume that some sequence might fall in a chunk boundary. When converting chunks, if the error is Nothing and the remaining buffer is not empty, then this buffer need to be prepended to the next chunk
fromBytesLenient :: UArray Word8 -> ( String , UArray Word8 ) Source #
Convert a UTF8 array of bytes to a String.
If there's any error in the stream, it will automatically insert replacement bytes to replace invalid sequences.
In the case of sequence that fall in the middle of 2 chunks, the remaining buffer is supposed to be preprended to the next chunk, and resume the parsing.
fromBytesUnsafe :: UArray Word8 -> String Source #
Convert a Byte Array representing UTF8 data directly to a string without checking for UTF8 validity
If the input contains invalid sequences, it will trigger runtime async errors when processing data.
In doubt, use
fromBytes
toBytes :: Encoding -> String -> UArray Word8 Source #
Convert a String to a bytearray in a specific encoding
if the encoding is UTF8, the underlying buffer is returned without extra allocation or any processing
In any other encoding, some allocation and processing are done to convert.
data ValidationFailure Source #
Possible failure related to validating bytes of UTF8 sequences.
Instances
Eq ValidationFailure | |
Defined in Basement.UTF8.Types (==) :: ValidationFailure -> ValidationFailure -> Bool Source # (/=) :: ValidationFailure -> ValidationFailure -> Bool Source # |
|
Show ValidationFailure | |
Defined in Basement.UTF8.Types |
|
Exception ValidationFailure | |
Defined in Basement.UTF8.Types |
lines :: String -> [ String ] Source #
Split lines in a string using newline as separation.
Note that carriage return preceding a newline are also strip for maximum compatibility between Windows and Unix system.
replace :: String -> String -> String -> String Source #
Replace all the occurrencies of
needle
with
replacement
in
the
haystack
string.
indices :: String -> String -> [ Offset8 ] Source #
Finds where are the insertion points when we search for a
needle
within an
haystack
.
toBase64URL :: Bool -> String -> String Source #
Transform string
src
to URL-safe base64 binary representation.
The result will be either padded or unpadded, depending on the boolean
padded
argument.
toBase64OpenBSD :: String -> String Source #
Transform string
src
to OpenBSD base64 binary representation.
breakLine :: String -> Either Bool ( String , String ) Source #
Same as break but cut on a line feed with an optional carriage return.
This is the same operation as 'breakElem LF' dropping the last character of the string if it's a CR.
Also for efficiency reason (streaming), it returns if the last character was a CR character.