megaparsec-9.2.1: Monadic parser combinators
Copyright © 2015–present Megaparsec contributors
© 2007 Paolo Martini
© 1999–2001 Daan Leijen
License FreeBSD
Maintainer Mark Karpov <markkarpov92@gmail.com>
Stability experimental
Portability non-portable
Safe Haskell Safe
Language Haskell2010

Text.Megaparsec.Char

Description

Commonly used character parsers.

Simple parsers

newline :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a newline character.

crlf :: forall e s m. (MonadParsec e s m, Token s ~ Char) => m (Tokens s)

Parse a carriage return character followed by a newline character. Return the sequence of characters parsed.

eol :: forall e s m. (MonadParsec e s m, Token s ~ Char) => m (Tokens s)

Parse a CRLF (see crlf) or LF (see newline) end of line. Return the sequence of characters parsed.
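
As a rough illustration of eol accepting both line-ending styles, here is a minimal sketch. The Parser alias and the name wordsByLine are hypothetical and chosen only for this example; they are not part of the library.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, sepBy, some)
import Text.Megaparsec.Char (eol, letterChar)

-- Hypothetical parser type for this example: no custom errors, String input.
type Parser = Parsec Void String

-- Runs of letters separated by either LF or CRLF line endings.
wordsByLine :: Parser [String]
wordsByLine = some letterChar `sepBy` eol
```

With these definitions, parseMaybe wordsByLine "ab\r\ncd\nef" should yield Just ["ab","cd","ef"], since each separator is matched by eol regardless of its style.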

tab :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a tab character.

space :: (MonadParsec e s m, Token s ~ Char) => m ()

Skip zero or more white space characters.

See also: skipMany and spaceChar.

hspace :: (MonadParsec e s m, Token s ~ Char) => m ()

Like space, but does not accept newlines or carriage returns.

Since: 9.0.0

space1 :: (MonadParsec e s m, Token s ~ Char) => m ()

Skip one or more white space characters.

See also: skipSome and spaceChar.

Since: 6.0.0
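
A small sketch of space1 used as a mandatory separator may help here. The Parser alias and the name spacedWords are assumptions made for this example only.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, sepBy, some)
import Text.Megaparsec.Char (letterChar, space1)

-- Hypothetical parser type for this example.
type Parser = Parsec Void String

-- Words made of letters, separated by at least one white space character
-- of any kind (spaces, tabs, newlines, and so on).
spacedWords :: Parser [String]
spacedWords = some letterChar `sepBy` space1
```

Because space1 requires at least one white space character, it fails without consuming input at the end of the string, which lets sepBy stop cleanly after the last word.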

hspace1 :: (MonadParsec e s m, Token s ~ Char) => m ()

Like space1, but does not accept newlines or carriage returns.

Since: 9.0.0
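
The difference between the horizontal variants and space matters in line-oriented formats. The following is a minimal sketch under that assumption; the Parser alias and the keyValue parser are hypothetical names introduced for illustration.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, some)
import Text.Megaparsec.Char (alphaNumChar, char, hspace, letterChar)

-- Hypothetical parser type for this example.
type Parser = Parsec Void String

-- A "key = value" pair that must stay on a single line: hspace skips
-- spaces and tabs around '=' but stops at newlines and carriage returns.
keyValue :: Parser (String, String)
keyValue = do
  key <- some letterChar
  hspace
  _ <- char '='
  hspace
  value <- some alphaNumChar
  pure (key, value)
```

Replacing hspace with space in this sketch would let a pair span several lines, which is often not what a line-based format wants.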

Categories of characters

controlChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a control character (a non-printing character of the Latin-1 subset of Unicode).

spaceChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a Unicode space character, and the control characters: tab, newline, carriage return, form feed, and vertical tab.

upperChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse an upper-case or title-case alphabetic Unicode character. Title case is used by a small number of letter ligatures like the single-character form of Lj.

lowerChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a lower-case alphabetic Unicode character.

letterChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse an alphabetic Unicode character: a lower-case, upper-case, or title-case letter, a letter of a caseless script, or a modifier letter.

alphaNumChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse an alphabetic or numeric Unicode character.

Note that the numeric digits outside the ASCII range are parsed by this parser but not by digitChar. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.
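
The note about non-ASCII digits can be checked with a short sketch. The Parser alias and the names asciiDigit and anyAlphaNum are hypothetical; the test character is U+0663 (ARABIC-INDIC DIGIT THREE), a decimal digit outside the ASCII range.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (alphaNumChar, digitChar)

-- Hypothetical parser type for this example.
type Parser = Parsec Void String

-- digitChar specialised to the example parser type: ASCII digits only.
asciiDigit :: Parser Char
asciiDigit = digitChar

-- alphaNumChar specialised to the example parser type: any Unicode
-- letter or number character.
anyAlphaNum :: Parser Char
anyAlphaNum = alphaNumChar
```

Under these definitions, parseMaybe anyAlphaNum "\x663" succeeds while parseMaybe asciiDigit "\x663" returns Nothing; both accept "7".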

printChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a printable Unicode character: letter, number, mark, punctuation, symbol or space.

digitChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse an ASCII digit, i.e. between “0” and “9”.

binDigitChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a binary digit, i.e. “0” or “1”.

Since: 7.0.0

octDigitChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse an octal digit, i.e. between “0” and “7”.

hexDigitChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a hexadecimal digit, i.e. between “0” and “9”, or “a” and “f”, or “A” and “F”.
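
To show how the digit parsers compose, here is a minimal sketch of a hexadecimal literal parser built from hexDigitChar. The Parser alias, the name hexLiteral, and the "0x" prefix convention are assumptions of this example, not something the module prescribes.

```haskell
import Data.Char (digitToInt)
import Data.List (foldl')
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, some)
import Text.Megaparsec.Char (char, char', hexDigitChar)

-- Hypothetical parser type for this example.
type Parser = Parsec Void String

-- Parse "0x" or "0X" followed by one or more hexadecimal digits and
-- fold the digits into an Int, most significant digit first.
hexLiteral :: Parser Int
hexLiteral = do
  _ <- char '0'
  _ <- char' 'x'
  digits <- some hexDigitChar
  pure (foldl' (\acc d -> acc * 16 + digitToInt d) 0 digits)
```

For example, parseMaybe hexLiteral "0xFF" should give Just 255. For real code, the hexadecimal parser in Text.Megaparsec.Char.Lexer is likely a better starting point than this hand-rolled fold.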

markChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a Unicode mark character (accents and the like), which combines with preceding characters.

numberChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a Unicode numeric character, including digits from various scripts, Roman numerals, etc.

punctuationChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a Unicode punctuation character, including various kinds of connectors, brackets and quotes.

symbolChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a Unicode symbol character, including mathematical and currency symbols.

separatorChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a Unicode space or separator character.

asciiChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a character from the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

latin1Char :: (MonadParsec e s m, Token s ~ Char) => m (Token s)

Parse a character from the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

charCategory :: (MonadParsec e s m, Token s ~ Char) => GeneralCategory -> m (Token s)

charCategory cat parses a character in the Unicode General Category cat; see GeneralCategory.
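
As a brief illustration, charCategory can be specialised to a single category from Data.Char. The Parser alias and the name currency are hypothetical names used only in this sketch.

```haskell
import Data.Char (GeneralCategory (..))
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (charCategory)

-- Hypothetical parser type for this example.
type Parser = Parsec Void String

-- Accept any single character whose general category is CurrencySymbol,
-- such as '$' or '€'.
currency :: Parser Char
currency = charCategory CurrencySymbol
```

Here parseMaybe currency "$" succeeds, while parseMaybe currency "+" returns Nothing because '+' belongs to the MathSymbol category.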

categoryName :: GeneralCategory -> String

Return the human-readable name of a Unicode General Category.

Single character

char :: (MonadParsec e s m, Token s ~ Char) => Token s -> m (Token s)

A type-constrained version of single.

semicolon = char ';'

char' :: (MonadParsec e s m, Token s ~ Char) => Token s -> m (Token s)

The same as char, but case-insensitive. This parser returns the character actually parsed, preserving its case.

>>> parseTest (char' 'e') "E"
'E'
>>> parseTest (char' 'e') "G"
1:1:
unexpected 'G'
expecting 'E' or 'e'

Sequence of characters

string' :: (MonadParsec e s m, FoldCase (Tokens s)) => Tokens s -> m (Tokens s)

The same as string, but case-insensitive. On success, returns the string with the casing of the parsed input.

>>> parseTest (string' "foobar") "foObAr"
"foObAr"