conllu.parse

conll-xform

transforms a seq of lines from a conll file into a seq of sentences, where each sentence is a seq of lines. comment lines are ignored.
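The grouping described above can be sketched as a transducer. This is a minimal sketch, not the library's implementation: the name sentences-xform-sketch is hypothetical, and it assumes that comment lines start with # and that sentences are separated by blank lines, as in the conllu format.

```clojure
(require '[clojure.string :as str])

(def sentences-xform-sketch
  "hypothetical transducer: lines -> sentences (vectors of lines)."
  (comp (remove #(str/starts-with? % "#"))   ; drop comment lines
        (partition-by str/blank?)            ; cut the stream at blank lines
        (remove #(str/blank? (first %)))))   ; drop the blank-line groups themselves

(into [] sentences-xform-sketch
      ["# sent_id = 1" "1\tHello" "2\tworld" ""
       "# sent_id = 2" "1\tBye" ""])
;; => [["1\tHello" "2\tworld"] ["1\tBye"]]
```

Because it is a transducer, the same sketch can be used with into, sequence, or transduce without building intermediate seqs.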

parse-avm

(parse-avm s c kf vf)

parses an attribute-value matrix s into a map. each attribute is linked to its value by c (for example \=), and the attribute-value pairs are separated by |. the keys are transformed by kf and the values by vf.

(parse-avm "a=1|b=2|c=3" \= keyword parse-nat-int)
#_=> {:a 1, :b 2, :c 3}
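To make the roles of c, kf, and vf concrete, here is a hedged re-implementation under a hypothetical name, parse-avm-sketch. It is not the library's code. It assumes every pair contains c at least once and splits on the first occurrence only.

```clojure
(require '[clojure.string :as str])

(defn parse-avm-sketch
  "hypothetical re-implementation, for illustration only."
  [s c kf vf]
  (into {}
        (map (fn [pair]
               ;; split at the first c only, so a value may itself contain c
               (let [i (str/index-of pair (str c))]
                 [(kf (subs pair 0 i)) (vf (subs pair (inc i)))])))
        (str/split s #"\|")))

(parse-avm-sketch "Case=Nom|Number=Sing" \= keyword identity)
;; => {:Case "Nom", :Number "Sing"}
```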

parse-file

(parse-file file)

parses a conllu file. file can be anything that clojure.java.io/reader accepts, such as a path string, a java.io.File, a URL, or an open Reader.
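As an illustration of what clojure.java.io/reader accepts, the sketch below reads lines from an in-memory Reader. The helper read-lines-sketch is hypothetical and only shows the input side; it does not parse anything.

```clojure
(require '[clojure.java.io :as io])

(defn read-lines-sketch
  "hypothetical helper: eagerly reads all lines from anything io/reader accepts."
  [source]
  (with-open [r (io/reader source)]
    (vec (line-seq r))))   ; realize the lines before the reader is closed

(read-lines-sketch (java.io.StringReader. "1\tHello\n2\tworld\n"))
;; => ["1\tHello" "2\tworld"]
```

Note that io/reader treats a plain string as a URL or a file path, not as content, so in-memory text has to be wrapped in a Reader as shown.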

parse-nat-int

(parse-nat-int s)

parses s as a nat-int (a non-negative integer). refuses to parse anything else.

parse-pos-int

(parse-pos-int s)

parses s as a pos-int (a strictly positive integer). refuses to parse anything else.
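The two integer parsers can be sketched as follows. The source does not say how a refusal is signalled, so this sketch assumes nil is returned; the names parse-nat-int-sketch and parse-pos-int-sketch are hypothetical.

```clojure
(defn parse-nat-int-sketch
  "hypothetical: returns a non-negative long, or nil when s is not one."
  [s]
  ;; at most 18 digits, so Long/parseLong cannot overflow
  (when (and (string? s) (re-matches #"\d{1,18}" s))
    (Long/parseLong s)))

(defn parse-pos-int-sketch
  "hypothetical: like parse-nat-int-sketch, but also refuses zero."
  [s]
  (let [n (parse-nat-int-sketch s)]
    (when (and n (pos? n)) n)))

(parse-nat-int-sketch "0")   ;; => 0
(parse-nat-int-sketch "-1")  ;; => nil
(parse-pos-int-sketch "0")   ;; => nil
(parse-pos-int-sketch "12")  ;; => 12
```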

parse-word

(parse-word line)

parses a line of conllu, which must not be empty or a comment. to ensure well-formedness, turn on clojure.spec/check-asserts; this only has an effect if clojure.spec/*compile-asserts* is true.
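The interplay of the two assertion switches can be sketched as below. This is not the library's spec: ::fields and split-word-sketch are hypothetical, and the sketch only checks that a line has the ten tab-separated fields of the conllu format. It requires clojure.spec.alpha, the namespace under which released Clojure versions ship spec.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

;; hypothetical spec: a conllu word line has exactly ten fields
(s/def ::fields (s/coll-of string? :kind vector? :count 10))

(defn split-word-sketch
  "hypothetical: splits a line on tabs and asserts the field count."
  [line]
  ;; s/assert is compiled in only when *compile-asserts* is true (the default)
  (s/assert ::fields (str/split line #"\t")))

;; runtime switch: without this, s/assert returns its argument unchecked
(s/check-asserts true)

(split-word-sketch "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_")
;; returns the vector of ten fields

;; (split-word-sketch "1\tDogs") would throw an ExceptionInfo
```

Keeping the check behind s/assert means the validation cost is paid only when checks are switched on, which suits large corpora.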

specified?

(specified? s)

tests whether the field s is specified. in conllu, an underscore means unspecified.
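A minimal sketch of such a predicate, assuming it simply compares the field against a single underscore; the name specified?-sketch is hypothetical.

```clojure
(defn specified?-sketch
  "hypothetical: true unless the field is the conllu placeholder _."
  [s]
  (not= "_" s))

(map specified?-sketch ["NOUN" "_" "Number=Plur"])
;; => (true false true)
```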