Skip to content
This repository was archived by the owner on Nov 17, 2020. It is now read-only.

Latest commit

 

History

History
39 lines (39 loc) · 1.52 KB

File metadata and controls

39 lines (39 loc) · 1.52 KB

-*- mode:org -*-

Basic idea

Similar to java nlp framework, but in python

Prototype design in python, implement any performace-dependent components in C

Goals

Easy to use

Pythonic

Fast

Timeline

Prototype counter / countermap [DONE]

Naive Bayes [DONE]

Test that basic counter stuff works [DONE]

FunctionMinimizer (as iterable?) / DifferentiableFunction [DONE]

Encoding / Decoding / IndexLinearizer (fast vector indexing operations) [NOT NECESSARY]

Maximum Entropy Model [DONE]

Fast math operations

Approximate operations [DONE]

Frozen Counters [Performance isn’t as expected… Rearchitect]

Double arrays paired with PyString arrays & dictionary lookup Support for BLAS routines for ultra-fast vector math? Internal numpy compatibility? Goal - lightweight wrapper over numpy.core.array [Slower than C counter?! Shelving this one]

Optimized vector operations

Normalize / Log normalize [DONE]

Thread worker pools

Explore idea with background thread in counter for tracking sum / log sum

Chain models (HMM / MEMM)

Testing mostly

Hard EM

Viterbi decoding

Lots of stuff to think about: How much in C versus Python? Data structure? Should the mesh be explicitly represented? Maybe provide an iterable over ranked decodings? How to handle pruning / approximate decoding schemes?

Baum-Welch updates (soft EM)

Computationally intensive chain-models (CRF)

Test how well the framework scales (can it handle encodings / low-level access etc.)