The Text2Summary API uses on-device methods to perform text summarization in Android applications. It uses extractive text summarization to return the most important sentences from a given text.
You may read the story of Text2Summary on Medium.
For the latest fat JAR, see Releases.
The text to be summarized must be a String object. Pass it to the Text2Summary.summarize() method to generate the summary. Text2Summary.summarize() is a suspend function, so it must be called from a coroutine or from another suspend function:
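import kotlinx.coroutines.runBlocking

// runBlocking blocks the calling thread until the summary is ready.
val summary = runBlocking {
    Text2Summary.summarize(someLongText, compressionRate = 0.7)
}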
The number 0.7 is referred to as the compression factor. For example, given a text of 10 sentences, a summary of 7 sentences will be produced. This number must lie in the open interval (0, 1).
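The arithmetic behind the compression factor can be sketched as follows. This is only an illustration: `expectedSummaryLength` is a hypothetical helper, not part of the Text2Summary API, and both the rounding rule and the one-sentence minimum are assumptions, since the library's exact behavior is not described here.

```kotlin
import kotlin.math.roundToInt

// Hypothetical helper for illustration only; not part of the Text2Summary API.
fun expectedSummaryLength(sentenceCount: Int, compressionRate: Double): Int {
    require(sentenceCount > 0) { "the text must contain at least one sentence" }
    require(compressionRate > 0.0 && compressionRate < 1.0) {
        "compressionRate must lie in the open interval (0, 1)"
    }
    // Assumption: round to the nearest whole sentence and keep at least one.
    return (sentenceCount * compressionRate).roundToInt().coerceAtLeast(1)
}
```

Under these assumptions, `expectedSummaryLength(10, 0.7)` returns 7, matching the example above.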
You may also read text from a file and pass it to Text2Summary:
// Read the whole file; use {} closes the reader afterwards.
val text = File("poems.txt").bufferedReader().use { it.readText() }
// summarize() is a suspend function, so call it from within a coroutine.
val summary = Text2Summary.summarize(text, 0.7)
The summarizeAsync() method internally calls the summarize() method, wrapped in an AsyncTask.
Text2Summary uses the TF-IDF algorithm for extractive text summarization. Note that this is not abstractive text summarization, which uses neural networks such as the Seq2Seq model. Because TensorFlow Lite does not fully support the conversion of Embedding and LSTM layers, Text2Summary relies on the TF-IDF algorithm instead.
- The text given to Text2Summary.summarize() is broken down into sentences. These sentences are further broken down into words (tokens).
- Using the TF-IDF algorithm, a TF-IDF score is calculated for each word.
- Next, the scores of all words present in a sentence are summed to give that sentence's score.
- Finally, the N highest-scoring sentences are selected. These sentences hold most of the information present in the text. They are then concatenated and returned as the summary.
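The steps above can be sketched in plain Kotlin. This is a minimal illustration under stated assumptions, not the library's actual implementation: `sketchSummarize` is a hypothetical name, each sentence is treated as one "document" when computing IDF, the tokenizer and sentence splitter are simple regular expressions, and the selected sentences are joined in their original order.

```kotlin
import kotlin.math.ln
import kotlin.math.roundToInt

// Hypothetical sketch of TF-IDF sentence scoring; not the Text2Summary source.
fun sketchSummarize(text: String, compressionRate: Double): String {
    require(compressionRate > 0.0 && compressionRate < 1.0) {
        "compressionRate must lie in the open interval (0, 1)"
    }

    // Step 1: split into sentences, then into lowercase word tokens.
    val sentences = text.split(Regex("(?<=[.!?])\\s+"))
        .map { it.trim() }
        .filter { it.isNotEmpty() }
    if (sentences.isEmpty()) return ""
    val tokens = sentences.map { s ->
        Regex("[A-Za-z0-9']+").findAll(s.lowercase()).map { it.value }.toList()
    }

    // Step 2: document frequency per word (assumption: one sentence = one document).
    val docFreq = HashMap<String, Int>()
    for (words in tokens) {
        for (w in words.toSet()) docFreq[w] = (docFreq[w] ?: 0) + 1
    }
    val n = sentences.size.toDouble()

    // Step 3: sentence score = sum of TF * IDF over the words in the sentence.
    val scores = tokens.map { words ->
        if (words.isEmpty()) 0.0
        else words.groupingBy { it }.eachCount().entries.sumOf { (w, count) ->
            val tf = count.toDouble() / words.size
            val idf = ln(n / docFreq.getValue(w))
            tf * idf
        }
    }

    // Step 4: keep the top N sentences (rounding is an assumption), in original order.
    val keep = (sentences.size * compressionRate).roundToInt().coerceAtLeast(1)
    val chosen = scores.indices.sortedByDescending { scores[it] }.take(keep).sorted()
    return chosen.joinToString(" ") { sentences[it] }
}
```

Because the sort is stable, sentences with equal scores are picked in the order they appear in the text. Words that occur in every sentence get an IDF of zero in this sketch, so they do not contribute to any sentence's score.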
If you are facing any issues, open an issue on the repository.