> module MailProcessor (processEmail) where
>
> import Data.Char
> import Data.Maybe
> import Data.List (isPrefixOf)
>
> -- import NLP.Stemmer

> triads n = [(x, y, z)
>   | x<-[1..], y<-[1..n], z<-[1..n],
>   z^2+y^2==x^2]

processEmail is the main function that processes the email. It goes from the full content of the email (typed String) to the list of the normalized words of the input email. It does three steps:

- deleteChars :: String -> String deletes the punctuation characters, specifically the ones defined in the list punc (fill in the definition of the function)
- words :: String -> [String] is a standard Prelude function that splits a String into its whitespace-separated words
- process :: [String] -> [String] processes each word according to processWord (fill in the definition of the function)

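
The three steps above can be sketched as follows. This is one possible reading, not the intended solution: the names puncSketch, deleteCharsSketch, tokenize and processWith are hypothetical, and the per-word function is passed in as a parameter so that the sketch stays self-contained.

```haskell
import Data.Maybe (mapMaybe)

-- Punctuation to drop; the same characters as punc in the skeleton.
puncSketch :: String
puncSketch = ['!', '?', '.', ',']

-- Step 1 (assumed definition): keep only the non-punctuation characters.
deleteCharsSketch :: String -> String
deleteCharsSketch = filter (`notElem` puncSketch)

-- Steps 1 and 2: delete punctuation, then split on whitespace.
tokenize :: String -> [String]
tokenize = words . deleteCharsSketch

-- Step 3 (assumed definition): apply a per-word function and drop
-- every word that it maps to Nothing.
processWith :: (String -> Maybe String) -> [String] -> [String]
processWith = mapMaybe
```

For example, tokenize "Hello, world! How are you?" yields the five words Hello, world, How, are, you with the punctuation removed.
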
> processEmail :: String -> [String]
> processEmail = process . words . deleteChars
>   where
>     deleteChars :: String -> String
>     deleteChars = undefined
>     punc = ['!', '?', '.', ',']
>
>     process :: [String] -> [String]
>     process = undefined

processWord is the function that actually processes each word. It proceeds in four steps:

- converts the word to lower case letters (fill in toLowerWord),
- normalizes the word,
- strips out the HTML code, and
- stems the word according to the NLP algorithm.

Stemming keeps only the roots of the words, i.e. it makes transformations such as the following, which are crucial for email classification:

am, are, is -> be
car, cars, car's, cars' -> car

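
To make the transformations above concrete, here is a deliberately naive stand-in for a stemmer. It is not the NLP algorithm referred to above: the names toyStem, irregular, suffixes and stripSuffixOf are hypothetical, the irregular-form table covers only the three verb forms listed, and the suffix rule over-strips words such as "bus".

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe, listToMaybe, mapMaybe)

-- Irregular forms that cannot be handled by suffix stripping (assumed table).
irregular :: [(String, String)]
irregular = [("am", "be"), ("are", "be"), ("is", "be")]

-- Suffixes to try, in order (assumed list).
suffixes :: [String]
suffixes = ["'s", "s'", "s"]

-- Remove a suffix if present, by stripping the reversed suffix
-- from the front of the reversed word.
stripSuffixOf :: String -> String -> Maybe String
stripSuffixOf suf w = reverse <$> stripPrefix (reverse suf) (reverse w)

-- Look the word up in the irregular table first; otherwise strip the
-- first matching suffix, or return the word unchanged.
toyStem :: String -> String
toyStem w =
  case lookup w irregular of
    Just root -> root
    Nothing   -> fromMaybe w (listToMaybe (mapMaybe (`stripSuffixOf` w) suffixes))
```

Under these assumptions, toyStem maps am, are and is to be, and car, cars, car's and cars' to car.
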
> processWord :: String -> Maybe String
> processWord = stripHTML . normalize . toLowerWord

> toLowerWord :: String -> String
> toLowerWord = undefined

Normalization of a word normalizes URLs, emails, dollars, and numbers:

> normalize :: String -> String
> normalize = normalizeURL . normalizeEmail . normalizeNumber . normalizeDollar

Next, you should fill in the definitions for the normalization functions.

Function normalizeDollar replaces the character ‘$’ with the word “dollar”:

normalizeDollar "$" = "dollar"
normalizeDollar "foo" = "foo"

> normalizeDollar :: String -> String
> normalizeDollar = undefined

Function normalizeURL replaces URLs with the word “httpaddr”:

normalizeURL "http://google.com" = "httpaddr"
normalizeURL "https://google.com" = "httpaddr"
normalizeURL "foo" = "foo"

> normalizeURL :: String -> String
> normalizeURL = undefined

Function normalizeEmail replaces email addresses with the word “email”:

normalizeEmail "nvazou@cs.ucsd.edu" = "email"
normalizeEmail "foo" = "foo"

> normalizeEmail :: String -> String
> normalizeEmail = undefined

Finally, function normalizeNumber replaces numbers with the word “number”:

normalizeNumber "42" = "number"
normalizeNumber "42$" = "number$"

> normalizeNumber :: String -> String
> normalizeNumber = undefined

> stripHTML :: String -> Maybe String
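> stripHTML x | isNonWord x = Nothing
>             | otherwise   = Just x
>
> -- A non-word is an HTML tag or a lone ">".
> isNonWord x = isHTML x || x == ">"
> -- The null check keeps head and last from failing on an empty string.
> isHTML x = not (null x) && head x == '<' && last x == '>'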
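
Returning to the four normalization functions specified above, the following is a minimal sketch that satisfies the listed examples. It is one possible reading of the specification rather than the intended solution. All names ending in Sketch are hypothetical, and two points are assumptions: that a "number" is any maximal run of digits, and that an "email" is any word with a non-empty part on each side of an '@'.

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf)

-- Replace every '$' character with the word "dollar".
normalizeDollarSketch :: String -> String
normalizeDollarSketch = concatMap (\c -> if c == '$' then "dollar" else [c])

-- Replace each maximal run of digits with the word "number" (assumption).
normalizeNumberSketch :: String -> String
normalizeNumberSketch [] = []
normalizeNumberSketch s@(c:cs)
  | isDigit c = "number" ++ normalizeNumberSketch (dropWhile isDigit s)
  | otherwise = c : normalizeNumberSketch cs

-- Treat a word as an email address if it has text on both sides
-- of its first '@' (assumed heuristic).
normalizeEmailSketch :: String -> String
normalizeEmailSketch w =
  case break (== '@') w of
    (_:_, '@':_:_) -> "email"
    _              -> w

-- Treat a word as a URL if it starts with an http or https scheme.
normalizeURLSketch :: String -> String
normalizeURLSketch w
  | any (`isPrefixOf` w) ["http://", "https://"] = "httpaddr"
  | otherwise                                     = w

-- Same composition order as normalize: the rightmost function runs first.
normalizeSketch :: String -> String
normalizeSketch =
  normalizeURLSketch . normalizeEmailSketch . normalizeNumberSketch . normalizeDollarSketch
```

Because composition applies the rightmost function first, normalizeSketch "42$" gives "numberdollar" under these definitions, while normalizeNumberSketch "42$" on its own gives "number$" as in the example above.
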